Wednesday, May 29, 2013

Redis and Twit-arr

Just experienced my first PEBKAC while using my library - code reloading doesn't work when the server is in production mode.

Over the weekend, I started eating my own dog food where Kul is concerned. Last year, my wife and I went on JoCo Cruise Crazy - and it was awesome. Highly recommended. Cruising is something that IMHO requires friends, and 800+ fellow nerds are a great group of soon-to-be friends to go with.

(I also got to meet John Scalzi - and got to watch Wil Wheaton play Artemis. That's hard to beat, you gotta admit)

Anyway, the group brought along their own server with its own microblogging instance. And while it was neat and all, it wasn't exactly set up for the cruise. It tended to overwhelm the wifi capacity whenever the entire group was together, and it didn't exactly make the best use of small screens, etc. The consensus was that on the next trip Twit-arr (the microblog) would have its software overhauled.

So over the weekend I started putting together a new version. I started with Kul (of course), and it's helped me to see some of the flaws / weak points in what I'm building. I've already made some changes to the framework based on things that were difficult to use.

I'm using Redis as a backend for the service as well. Redis is awesome, and its Ruby integration is even more awesome (if that's possible). The installation guide in the quickstart was one of the most complete I've ever used, and if it took me an hour to set Redis up it was only because I was going slowly. Awesome, awesome, awesome.
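
For the curious, here's a taste of what the redis gem gives you - a minimal sketch, with key names that are purely illustrative (not Twit-arr's actual schema):

    require 'redis'

    redis = Redis.new  # connects to localhost:6379 by default

    # Plain string keys for simple values
    redis.set('twitarr:motd', 'Welcome aboard!')
    redis.get('twitarr:motd')  # => "Welcome aboard!"

    # Lists are a natural fit for a stream of posts
    redis.lpush('twitarr:posts', 'first post!')
    redis.lrange('twitarr:posts', 0, 9)  # the ten most recently pushed posts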

Basically, I got to mess around with Redis (which I've been wanting to for awhile), use my own framework, and build something that (hopefully) will be of use to a bunch of people.

The code for my new twit-arr instance is up on GitHub, of course. Still in early development but I'm having a blast with it.

Ruby / Sinatra stuff

I had trouble finding this little nugget out there, but it seems that Ruby's require statement is now thread-safe. From what I can find on the web, it wasn't back in 2009 or so.

load is still not thread-safe, which makes my auto-loading unsafe. I also learned about autoload, which does something similar to what I'm doing, but is being deprecated because it's unsafe. That makes sense, since it should have the same problems I'm experiencing.
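
For reference, here's what autoload looks like in use (JSON is just a stand-in example):

    # Kernel#autoload registers a constant to be loaded lazily: the file
    # isn't required until the constant is first referenced.
    autoload :JSON, 'json'

    JSON.parse('{"posts": 1}')  # this first reference triggers require 'json'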

Also, I couldn't find anything that described what Sinatra does when it's in development mode - here are the few things I found browsing through the source:

  • Sets the error pages to the Sinatra versions
  • Something with the session secret
  • Turns on template reloading
  • Binds only to localhost
  • Turns on showing exceptions
... and that's it.
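
Put another way, development mode behaves roughly as if you'd set these yourself - a sketch based on my read of the source, not an official list (and it skips the session-secret bit):

    require 'sinatra'

    configure :development do
      set :show_exceptions, true   # show Sinatra's error page with a backtrace
      set :reload_templates, true  # recompile templates on every request
      set :bind, 'localhost'       # only listen on the loopback interface
    end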

Sunday, May 26, 2013

Code reloading

For a web framework, code reloading is a crucial feature (especially for a framework designed for rapid development!). Minutes spent waiting for a server to reload add up quickly. Some of that can be mitigated by doing comprehensive unit testing, but there is no replacement for being able to see your code running in situ.

There's a great explanation of code reloading by Konstantin Haase, some of which I actually understand! It's not the Rails deep-dive I'm interested in, but the explanation of code reloading itself. This is also the link Sinatra uses to explain code reloading, so I'm not the only person using it as a reference.

The bottom line is that if you want true code reloading, you need a separate Ruby context for each request - which is what Shotgun (or shotgonne, if you're a Pratchett fan) does behind the scenes. It's the only real way to guarantee that your code reloads.

Initially, Kul handled code reloading by simply re-loading each file from disk every time it was needed. It's brute-force, and it has a few problems. In production you can't re-read the entire website from disk on every request, but in development mode it really isn't that big of an issue.

The bigger issue is that I was basically abusing the open nature of Ruby classes. Effectively, load re-opens the class and redefines it from the code in the file. This works well for a few specific things, and badly (or not at all) for others. For example, adding a method works fine; removing a method doesn't work at all.
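
A quick demonstration of the problem (Widget is a made-up class for illustration):

    class Widget
      def old_method; 'still here'; end
    end

    # Suppose the file is edited to delete old_method and add new_method,
    # then loaded again. Ruby just re-opens the existing class:
    class Widget
      def new_method; 'added'; end
    end

    Widget.new.new_method  # => "added" - additions work fine
    Widget.new.old_method  # => "still here" - the deleted method survives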

Plus, as much as I justified it, reloading all of the code on every request did bother me.

Today I started replacing the reloading inside of Kul. It's a small improvement, but I think it should help in the long run. There are some limitations, but it should work for 99% of the expected use of the framework. In the other circumstances, you'll just have to restart the server - sorry.

The first thing I wanted to fix was the brute-force nature of the code reloading. That was simple enough - I just keep track of the file date and don't bother reloading the file if it hasn't been modified. I did find out that (at least on Windows) the modified time is only tracked to the second - I'm sure that's fine for almost everyone else but I had to do some jiggery-pokery to trick the unit tests into passing. It was either that or put a one-second sleep in there, and that's just wrong.
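
The check looks roughly like this - a minimal sketch, with names that are illustrative rather than Kul's actual internals:

    # Skip re-loading any file whose mtime hasn't changed since the last load.
    class FileReloader
      def initialize
        @mtimes = {}
      end

      def load_if_changed(path)
        mtime = File.mtime(path)
        return false if @mtimes[path] == mtime  # unchanged - skip it
        @mtimes[path] = mtime
        load path
        true
      end
    end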

The second thing I did was to assume that require statements are unlikely in user code. That's probably the most fragile assumption I'm making, but it should cover most usage (once I get the models in, at least). As long as all of the code you're using is either:

  1. inside of a library (which shouldn't change during runtime), or
  2. in framework files such as server.rb or controller.rb
then the framework should be able to gracefully handle changes to your code. Basically, the user has to follow the framework's rules about naming files and placing code.

(This is actually the same assumption I was making before, but this time it's a conscious decision, where before it was just incidental.)

There are a few limitations even with that big ol' assumption up there. If a file contains more than the framework-expected class, you'll be falling back on the original code reloading (i.e. it won't remove methods / constants / etc). Also, any metaprogramming you've done outside of that file will be gone. Basically, don't manipulate classes outside of their own file.
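
The heart of the improvement is defining the class fresh instead of re-opening it. Something along these lines - an assumed sketch, not Kul's exact code:

    # Removing the constant first means `load` defines the class from
    # scratch, so deleted methods and constants actually go away.
    def reload_class(const_name, path)
      Object.send(:remove_const, const_name) if Object.const_defined?(const_name)
      load path
      Object.const_get(const_name)  # the freshly-defined class
    end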

Also, any instances hanging around will not change their code. That's the one that I think is most likely to cause confusion. Code reloading could also do some really interesting things with class methods, depending on how they're called. And lastly, this code reloading will work horribly in a multi-threaded environment. I think that it would eventually reach a steady-state, but it's hard to say for sure.

At the end of the day, it's not perfect, but it should make development easier, and that's the final goal. It's not for production anyway. The reloading code will be exposed so that if anyone does want to make use of it from their application, they can call Kul::FrameworkFactory.load_class with the class and path, and the framework will handle the reloading.
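
Usage would look something like this (the exact signature here is my guess from the description, not settled API):

    # Hypothetical call - hand the framework the class and its backing file,
    # and it decides whether anything actually needs re-loading.
    Kul::FrameworkFactory.load_class(MyController, 'site/controllers/my_controller.rb')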

Tuesday, May 21, 2013

User Interfaces

This is kind of a follow-up to a post from a couple of months ago about rich internet apps.

Today, if you want to build an application that runs on as many systems as possible, the answer is simple - you build a web application. There are other options: you could build a Java app, for example. Java runs on a huge range of systems, and you can build it once and run the same executable anywhere.

However, you have to install Java on the client. Many people don't want to do that. And you have to install the application on the client. More people don't want to do that. A non-sandboxed application is a scary thing to install on a system these days.

I feel that there's limited utility in actually trying to build and distribute a Java application. And the market pretty much bears that out - how many Java apps are actually being worked on these days? I can't think of one. I'm sure there are some out there, but not very many. And they probably have a very limited / controlled distribution channel.

It makes sense if I think about it. In order for it to be a program, it has to have a set of instructions to run. Those instructions need to either be in the language of the machine, or there needs to be another program on that machine to interpret those instructions. Therefore, you have to have something that can interpret your language of choice on the range of machines you want to run on.

It's simple, Computers 101 stuff - computers run instructions. But it has consequences when you're trying to build something that runs across architectures: there has to be an interpreter for your language on the remote machine. That's all there is to it.

In the case of Java, the JVM provides that interpreter - from bytecode to machine instructions. But again, Java has too much power - it allows a malicious programmer to execute instructions that are harmful.

There is one interpreter that both runs in a sandbox and has an even bigger install base than Java: JavaScript. Or ECMAScript, if you're so inclined.

This is why web applications are so popular - the sandboxed interpreter they run on has the single largest install base of any runtime. Heck, most modern phones ship with multiple browsers - take that, Java!

(Also, most browsers don't install the Ask Toolbar along with themselves)

Web apps also have an incredibly rich graphical interface. It's not the most consistent across browsers, but having written applications in Win32, MFC, Swing, WPF, Qt, and HTML - I'll take the web any day.

Cross platform, common GUI applications - what's not to like?

(Well, there are quite a few things, but there's quite a few other places to hear that sort of griping.)

Monday, May 13, 2013

Snake oil peddlers

Great article on Quartz about big data. It includes some figures on the size of data-processing jobs on clusters at Yahoo and Facebook. If those guys don't need clusters for "big data", why do smaller companies?

I'm not saying that no company needs them. But it's a simple question that should be answered before you go down that path: why do you need a multi-node cluster running MapReduce to process a few gigabytes of data? If you can't answer that question, then you're probably wasting money on servers, and even more money on developers to build frameworks on those servers.

Architecture should be as simple as possible.

The backing paper has a great summary: "...analytic jobs — in particular Hadoop MapReduce jobs — are often better served by a scale-up server than a scale-out cluster"

...

It seems to me that somewhere along the line, the "cloud" went from being service-oriented to being data-oriented. Having a cluster that provides services is an incredibly useful thing, especially if those services are accessed in a standard way. Both Amazon and Google have infrastructures like this - Google App Engine, although not my favorite, does exactly this, as does Amazon Web Services.

Those are clusters that run many people's services all on the same hardware. Instead of having a cluster of machines that are all focused on one giant problem, you have a cluster that solves many problems at once.

What it seems many people are selling is the quest for the holy grail of data: take all the data, run a giant complex algorithm on it, and all of the answers will be made clear. In other words - snake oil peddlers.

I keep MapReduce on my resume - it's a great paradigm to know. But I am very cautious about how I use that technology - there are just too many companies using Hadoop as a catch-all answer for every problem.

Wednesday, May 8, 2013

Travis CI and RACK_ENV

I'm still working on a major refactor. It's going well, but it's not quite there yet.

Tonight I decided to check in my code, because... I'm a developer. Anyway, unit tests were passing, everybody was happy.

Code was checked in, and I promptly received an email from Travis CI telling me that my tests were failing. Except that I ran them before I pushed the code, and they ran fine!

Obviously a configuration difference, but I couldn't figure out what it was, since I have my Gemfile.lock checked in and was running the same "bundle exec rake" command that Travis was. I had actually figured out the problem and how to correct the test before I could even reproduce the failure locally.

Eventually I stumbled onto the Travis environment documentation - and saw the light. It turns out that Travis sets the RACK_ENV=test environment variable, which causes Rack to actually complain about unhandled exceptions instead of turning them into a 500 response like my code was doing. I quickly added the variable to the top of my spec_helper.rb and voila! - my unit tests began failing!
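
For anyone else who hits this, the fix is a one-liner at the top of the spec helper (a minimal sketch - your require list will differ):

    # spec_helper.rb
    # Pin the Rack environment so local runs match what Travis does.
    ENV['RACK_ENV'] = 'test'

    require 'rack/test'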

Never been so happy to see a unit test fail before. Committed my fix and the build works again. Winning!