Dreams from Rylath: 2013

Monday, December 9, 2013

Java

One: "So we've got this strongly and statically typed, compiled language here. We call it 'Java'."

Two: "Sounds neat! How do I configure components in it?"

One: "We prefer that you do all of your configuration via deeply nested, fragile XML that is not validated in any way."

Two: "Why don't we just configure the components in ... code?"

One: "Because then you can't change the configuration after compile time."

Two: "We need to dynamically change which factory class creates the database pool?"

One: "JUST GO TYPE! MANAGEMENT LIKES TO HEAR LOTS OF TYPING!"

Two: ...

Wednesday, December 4, 2013

Next: Scala

I just finished reading this great article about Play / Scala and async.

http://engineering.linkedin.com/play/play-framework-async-io-without-thread-pool-and-callback-hell

For most of the applications I build, I don't need the sort of performance that requires an async framework. And from what I've seen of node.js, I'm not going to mess with it unless I need something that fast.

But the things in that article are pretty. Those comprehensions are nice.

As much as I love Ruby, there are a few things that it doesn't do well. It is designed for code writing speed, not execution speed. And it does lack some of the safety features of a compiled language.

After I finish with Twitarr, it'll be time to start messing with some Scala. And probably Play. Looks like fun.

Monday, December 2, 2013

Ember.js views

In the view code (i.e. the code that extends Ember.View) you have to prefix any controller property accesses with 'controller.' - i.e. if you're getting the foo property of the controller, you have to use @get('controller.foo') - view properties you can get without any prefix, as one would expect.

This kind of makes sense, although I'd point out that there are many other places where Ember automatically looks to parent elements for data.

What confuses me is that in the template that is used by the view you have to prefix view accesses with 'view.' - controller properties are bound without a prefix though.

Yes, the template is EXACTLY opposite the view code.

...?

Friday, September 27, 2013

Thread safety

(I wrote this some days ago but didn't post it. I realized that it reinforced my Ruby camps post so I'm putting it up now)

A few days ago I found this great online book by Avdi Grimm (author of Ruby Tapas) named Objects on Rails. The link goes to the free (as in beer) version of the book, but I highly recommend spending the five bucks and supporting the author. (Sadly the downloadable version does not contain an HTML version, which would be really useful for referencing)

I spent a couple of days deep-diving into the book. Mostly it reinforces a bunch of patterns that I've been using in my own Rails projects, like the Draper gem. (Avdi uses exhibits, which are very similar to what Draper provides as discussed by the authors here)

Something was bothering me about it though. As I started implementing some of his patterns in my own code, I realized what the problem was. The top-level blog class is not thread-safe.

This is not much of a problem in MRI: 1.8 used green threads, and since 1.9 the global interpreter lock only lets one thread run Ruby code at a time. However in JRuby, with its truly parallel system threads, using a single global object like this has the potential for race conditions. It's very unlikely that it could cause a problem in this little blog app - there's very little state in the Blog class and it's mostly initialized when the class is initialized.

But it would be easy for a new developer to see this class and add something like a caching layer. Any mutable state on the classes is going to be a problem, and it's unlikely to be caught on a development system, either.
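
To make the risk concrete, here is a minimal sketch of the sort of cache a new developer might bolt on. The PostCache class and its methods are hypothetical (they are not from the book); the point is only that class-level mutable state needs a lock once real OS threads are involved.

```ruby
# Hypothetical class-level cache. The Mutex keeps concurrent threads from
# interleaving the check-then-write on the shared Hash.
class PostCache
  @entries = {}
  @lock = Mutex.new

  class << self
    # Returns the cached value for key, computing it at most once.
    def fetch(key)
      @lock.synchronize do
        @entries.fetch(key) { @entries[key] = yield }
      end
    end

    def size
      @lock.synchronize { @entries.size }
    end
  end
end

calls = 0
counter_lock = Mutex.new

# Eight threads all ask for the same hundred keys.
threads = 8.times.map do
  Thread.new do
    100.times do |i|
      PostCache.fetch(i) do
        counter_lock.synchronize { calls += 1 }
        "post #{i}"
      end
    end
  end
end
threads.each(&:join)

puts "computed #{calls} values for #{PostCache.size} keys"
```

Without the synchronize calls, two threads could both miss the cache for the same key and both compute it, and that kind of bug rarely shows up on a development box.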

I've ended up refactoring the top-level blog class out of my app. Instead of leveraging a root object to facilitate unit testing, I'm using DCI patterns to help. It's working well with Redis objects, which are very similar to basic Ruby types. I'll post more about that later.

A tale of two Ruby communities

I've lately begun to realize that there are two Ruby communities.

The first runs MRI - they use Mac OS and occasionally Linux if forced to. They have the global interpreter lock (green threads, before 1.9) so they don't worry about ugly things like threading or mutexes or deadlocks. They sometimes write libraries in Ruby, and sometimes in C or C++.

The other community runs JRuby on whatever platform they feel comfortable in (including *gasp* WINDOWS). They have OS-level threads so they avoid things like global variables and mutable class state. Most of the time they write libraries in Ruby, but occasionally they'll include dependencies on Java libraries.

(As a side note, I have no experience with Rubinius. I think it's in the second camp but not sure.)

When you find a new gem that you'd like to use, it's important to figure out what camp it belongs in. Some are very easy to figure out - if it uses Java, you're in the JRuby camp. If it uses C or C++ code, it's the first. Other gems are harder to figure out. Gems that are written in Ruby are theoretically compatible with either implementation. But they aren't always 100% compatible.

Personally I belong in the second camp. I write multi-threaded code. When I run across a nifty gem that uses class variables to hold database connections, I am a sad programmer.

My primary development box runs Windows. That's a certain special level of hell. Many times a gem that otherwise works perfectly fine simply won't run unit tests. Or gems that load their configuration from the local database install. (actually can't blame that one on MRI - that project is entirely JRuby based.)

(There's a certain attitude that Ruby developers get when you say you're running Windows. Comments such as "get yourself a real development machine" are commonplace.)

Ruby's a great language. But it's very difficult to migrate developers from other languages when the barriers to entry are so specific. Ruby developers are proud that they don't have to use an IDE (and right to be proud, I believe), but how much more of a barrier to entry is having to use a MacBook?

And why the hell are people ignoring OS threads?

Won't someone please think about the OS threads?!

Tuesday, September 24, 2013

Which user?

"As someone who builds software, it's important to keep in mind the users."

Most of the time when someone uses that phrase they are referring to the end users of software - the people typing and clicking and being annoyed by bugs. But there are other users as well - I see at least three kinds.

The most overlooked group are the administrators. It's a fuzzy collection (sometimes in more than one way - at least half the sysadmins I've ever met don't hardly shave) of people, but it can include people who install our software, people who maintain the software and hardware underneath, and people who manage end user experience.

I say it's an overlooked group because often the installation is the last thing a developer thinks of. It's natural to want to get the idea down and to see it running (on our dev boxes of course), and sometimes that gets away from us. The funny part is that we're often the group that complains the loudest when something is hard to work with - then we go and pester the systems team to install it for us.

Software that is easy to install and maintain is a treasure.

Apache Tomcat is dead simple to get working, works on just about any OS, and it's generally obvious what's wrong with it when something's broken. Configuration is XML, which I think is its largest downside.
Redis doesn't run in Windows, but I've already gone on about how easy it is to install.
I love the way Chrome / Chromium patches itself unobtrusively.

It's important to have a deployment plan for software, and for it to be easy to follow. Limit dependencies as much as possible. Have a good configuration scheme (e.g. not XML). Have a simple logging strategy.

(I have the hardest time finding the correct logs in Hadoop. This may be due to the fact that it's been set up by developers, but I'd say that goes back to having a good configuration scheme.)

The second (occasionally) overlooked group are developers. Remembering this group is generally a function of experience - usually the experience of trying to fix someone else's spaghetti code. "If it was hard to write, it should be hard to read" is thankfully a mantra that has fallen by the wayside, and I hope it stays dead. Code that is well constructed and readable is also maintainable, and that helps everybody.

Ruby developers have the edge over Java developers here. Java code can be (and usually is) well constructed, but the Java language limits how readable the code is. Ruby code can be a thing of beauty, and there are discussions about how to make it so. Such conversations would be anathema to Java developers, and I think that is a mistake. I've never gotten much out of poetry, but well-written prose pleases me on an aesthetic level, and well-written code is just the same. Seeing a function that solves a problem in a smart manner - feeling that "a-ha" when you capture the essence of a block in your head - provides an emotional satisfaction to our occupation that is under-valued.

Last are the end users. These are arguably the most important users. If we don't have end users, then our software really isn't doing much. There's no shortage of blogs and discussion of the whys and the hows for user experience, and I don't have much to add that couldn't be found elsewhere. End user experience is something that developers sometimes overlook. I've met a number of developers who are happy to push those responsibilities off onto the designers, as though it wasn't a part of their job.

All developers should be able to involve themselves in UX. There's really no excuse for not being able to do so. If we have a strong design team and product owner(s) then we may not have to do much in this area, but developers need to be cognizant of the techniques. After all, we are closest to the computer - we are often able to see things that the computer can do that others can't.

Another interesting point is that the groups can become muddled depending on what kind of software we're working on. If we're writing a library, then the end users are really the developers using the library - they're the ones who have the experience of using the interfaces we're providing. (And library interfaces are an art form all of their own) If writing infrastructure software then the end users become the admins who will be deploying other packages on top.

It's important to keep in mind the people using your software, no matter how they're involved.

Thursday, August 8, 2013

Strategy pattern in Ruby - x 4

One of my weaknesses as a developer is that I don't really know design patterns as well as I should. I use some of them frequently (Builder, Factory, Adapter, Decorator) but I've never used some of the others, and it caused me some pain the other day. So I'm going through the various design patterns and trying to implement them in various ways in Ruby.

I started playing this evening with the Strategy pattern. The most obvious implementation of the strategy pattern in Ruby is just using modules, like so:
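
A minimal sketch of that idea, with hypothetical module and class names (not necessarily the code originally posted): each strategy is a module, and the object picks one at runtime by extending itself with it.

```ruby
# Hypothetical strategies: each module supplies its own version of #transform.
module PlainStrategy
  def transform(text)
    text
  end
end

module ShoutStrategy
  def transform(text)
    text.upcase + '!'
  end
end

# The context object chooses a strategy at runtime by extending itself
# with the chosen module.
class Report
  def initialize(body, strategy)
    @body = body
    extend(strategy)
  end

  def render
    transform(@body)
  end
end

puts Report.new('hello', PlainStrategy).render
puts Report.new('hello', ShoutStrategy).render
```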

Kul vs Rails

After having used it for a while, I think the time has come to ~~kill~~ deprecate Kul. Building it has taught me a great deal about many things, not least of which is just how complex Rails is.

The big sticking point I had with Rails is the router. Everything else in Rails is workable, and there's no reason to discard all of that code just to fix the router. I'm sure that I can convince Rails to look in the views folders for the controller, and skipping ActiveRecord is simple and well-documented.

What I discovered the other day is that the Rails router can delegate straight to a Rack application. Which means that it's just a single line in the routes.rb (and an entire rack implementation I suppose) that will put my nifty routing / controller DSL plan in place.
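
For reference, a Rack application is just an object that responds to call(env) and returns a status, headers and body. The sketch below uses a hypothetical KulRouter class and a made-up '/kul' mount point to show roughly what that hand-off could look like; it is not the planned DSL itself.

```ruby
# A Rack application is any object that responds to #call(env) and returns
# a [status, headers, body] triple. KulRouter is a hypothetical name.
class KulRouter
  def call(env)
    path = env['PATH_INFO'].to_s
    if path == '/hello'
      [200, { 'Content-Type' => 'text/plain' }, ['Hello from the custom router']]
    else
      [404, { 'Content-Type' => 'text/plain' }, ["No route for #{path}"]]
    end
  end
end

# In config/routes.rb the hand-off would be roughly one line, e.g.:
#   mount KulRouter.new => '/kul'

# Exercising the app directly with a hand-built env hash:
status, _headers, body = KulRouter.new.call({ 'PATH_INFO' => '/hello' })
puts status
puts body.first
```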

Like Kul, this will probably be more of an experiment for learning purposes than a serious project. The Rails router and controllers really do a lot of work, and it's probably not a good idea to hijack them in production. But it should be fun to play with.

First thing is to switch Twitarr over to Rails. Unfortunately at this point I'm busy trying to learn Ember.js, which is not a trivial web framework. The value is there, but the documentation makes it very difficult.

Monday, July 22, 2013

Ember and Ember Data

I've been modding my version of Twitarr and checking to see if it works in Ember.js.

The killer feature in Ember is the router, although the integration with templates (and being able to break your UI into pieces) is a close second in the feature race. As much as I enjoy working with KnockoutJS, it doesn't implicitly have those features. Sure, you could integrate a template engine such as Mustache and use some patterns to break up your JavaScript, but why do all that work when someone has already built the framework for you?

I've had two basic problems with Ember so far. First is the documentation. It's bad. The Ember team acknowledges how bad it is, but it's still just... bad. It doesn't help that MVC used for webapps bears no resemblance to MVC for client-side code. (Technically the MVC pattern used in Ember is closer to the original ideal)

But it is workable. And when it works, it's honestly amazing. I spent the money on the tutorial from PeepCode - most of the information I had already gleaned by working the Ember tutorial, but there were some things that the Peep guys clarified. It's a good tutorial if you're just starting in Ember.

The second problem is Ember Data. The Ember tutorial as well as the PeepCode tutorial will lead you to use Ember Data, and it is officially Not Done Yet. It also has some serious logistical problems in my mind. One pattern I often use is to have users input data into a form, then send that data to the server to be enriched and sent back. Ember Data has validation, but that's a simple binary save / no save. There's no simple way (that I've found) to be able to modify a record on the server side and update that data back to the client.

It's particularly frustrating in a microblog where I want to timestamp the post server side and add the username server side (I suppose I could check that the username for a post matches the session, but why? - then I'd have to write code to close a security hole that I don't need to open). Rule number one of webapps is to never trust data from the client, and validation doesn't always fit the bill.

Then there's refreshing the data, and having multiple arrays containing the same model type but with different parameters - none of it is really enabled with Ember Data. I just couldn't find simple ways of solving the problems I was running into.

Then I ran into this post by Robin Ward which talks about kicking Ember Data to the curb. And the heavens opened and the light shone down. I had looked through the Discourse GitHub and had seen that they weren't using Ember Data, but it helped to have it laid out in black and white.

(Also a great post by Robin is this one that compares Ember and Angular)

So, yeah. Getting rid of Ember Data removes 99% of the roadblocks I'm running into. I assume that the Ember team is basing their documentation on Ember Data to be future-compatible, but I wonder if they're hurting overall adoption. I know that my frustration level was getting to the point where I was questioning Ember as a choice.

Wednesday, May 29, 2013

Redis and Twit-arr

Just experienced my first PEBKAC while using my library - code reloading doesn't work when the server is in production mode.

Over the weekend, I started eating my own dog food where Kul is concerned. Last year, my wife and I went on the Joco Cruise Crazy - and it was awesome. Highly recommended. Cruising is something that IMHO requires friends, and 800+ fellow nerds are a great group of soon-to-be friends to go with.

(I also got to meet John Scalzi - and got to watch Wil Wheaton play Artemis. That's hard to beat, you gotta admit)

Anyway, the group brought along their own server with its own microblogging instance. And while it was neat and all, it wasn't exactly set up for the cruise. It tended to overwhelm the wifi capacity whenever the entire group was together, and it didn't exactly make the best use of small screens, etc. The consensus was that on the next trip Twit-arr (the microblog) would have its software overhauled.

So over the weekend I started putting together a new version. I started with Kul (of course), and it's helped me to see some of the flaws / weak points in what I'm building. I've already made some changes to the framework based on things that were difficult to use.

I'm using Redis as a backend for the service as well. Redis is awesome, and its Ruby integration is even more awesome (if that's possible). The installation guide in the quickstart was one of the most complete I've ever used, and if it took me an hour to set Redis up it was only because I was going slowly. Awesome, awesome, awesome.

Basically, I got to mess around with Redis (which I've been wanting to for a while), use my own framework, and build something that (hopefully) will be of use to a bunch of people.

The code for my new twit-arr instance is up on GitHub, of course. Still in early development but I'm having a blast with it.

Ruby / Sinatra stuff

I had trouble finding this little nugget out there, but it seems that Ruby's require statement is now thread-safe. Looking at the web, it seems that it didn't use to be back in 2009 or so.

load is still not thread-safe. This makes my auto-loading unsafe. Also learned about autoload, which does some similar things to what I'm doing, but is being deprecated because it's unsafe. Which makes sense, as it should have the same problems I'm experiencing.

Also, I couldn't find anything that described what Sinatra did when it was in development mode - there are a few things I found browsing through the source:

Sets the error pages to the Sinatra versions
Something with the session secret
Turns on template reloading
Binds only to localhost
Turns on showing exceptions

... and that's it.

Sunday, May 26, 2013

Code reloading

For a web framework, code reloading is a crucial feature (especially for a framework designed for rapid development!). Minutes spent waiting for a server to reload add up quickly. Some of that can be mitigated by doing comprehensive unit testing, but there is no replacement for being able to see your code running in situ.

There's a great explanation of code reloading by Konstantin Haase, some of which I actually understand! The Rails deep-diving isn't what I'm interested in so much as the explanation of code reloading itself. This is also the link that Sinatra uses to explain code reloading, so I'm not the only person that uses it as a reference.

The bottom line is that if you want true code reloading, you need to actually have a separate Ruby context for each request - which is what Shotgun (or shotgonne if you're a Pratchett fan) does behind the scenes. It's the only real way to guarantee that your code reloads.

Initially the way Kul did code reloading was just to re-load the files every time it loaded one. It's both brute-force and has a few problems. In production, you can't reload the entire website from disk for every request, but in development mode it really isn't that big of an issue.

The bigger issue is that I was basically abusing the open nature of Ruby classes. Effectively, the load will reopen the class and redefine it from the code in the file. This works well for a few specific things, and badly (or not at all) for others. For example, adding a method would work fine; removing a method would not work at all.
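
A small sketch of that behavior, using a made-up Greeter class and two class bodies in one script to stand in for two loads of the same file:

```ruby
# First "load" of the file: the class has two methods.
class Greeter
  def hello; 'hello'; end
  def goodbye; 'goodbye'; end
end

# Second "load" after editing the file: goodbye was deleted and
# welcome was added. Ruby reopens the existing class rather than
# replacing it.
class Greeter
  def hello; 'hello again'; end
  def welcome; 'welcome'; end
end

g = Greeter.new
puts g.hello                 # redefined method picks up the new body
puts g.respond_to?(:welcome) # added method is available
puts g.respond_to?(:goodbye) # "removed" method is still there
```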

Plus, as much as I justified it, reloading all of the code on every request did bother me.

Today I started replacing the reloading inside of Kul. It's a small improvement, but I think it should help in the long run. There are some limitations, but it should work for 99% of the expected use of the framework. In the other circumstances, you'll just have to restart the server - sorry.

The first thing I wanted to fix was the brute-force nature of the code reloading. That was simple enough - I just keep track of the file date and don't bother reloading the file if it hasn't been modified. I did find out that (at least on Windows) the modified time is only tracked to the second - I'm sure that's fine for almost everyone else but I had to do some jiggery-pokery to trick the unit tests into passing. It was either that or put a one-second sleep in there, and that's just wrong.
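
The modified-time check could look something like the sketch below. MtimeReloader is a hypothetical name and this is not Kul's actual code; it also shows one way around the one-second resolution, by setting the mtime explicitly with File.utime rather than sleeping.

```ruby
require 'tempfile'

# Hypothetical reloader: remembers the mtime seen at the last load and
# only calls Kernel#load again when the file's mtime has changed.
class MtimeReloader
  def initialize
    @loaded_at = {}
  end

  # Returns true if the file was (re)loaded, false if it was skipped.
  def load_if_changed(path)
    mtime = File.mtime(path)
    return false if @loaded_at[path] == mtime

    load path
    @loaded_at[path] = mtime
    true
  end
end

# A throwaway "controller" file that counts how often it gets loaded.
file = Tempfile.new(['controller', '.rb'])
File.write(file.path, "$reload_count = ($reload_count || 0) + 1\n")

reloader = MtimeReloader.new
first  = reloader.load_if_changed(file.path)   # loads
second = reloader.load_if_changed(file.path)   # unchanged, skipped

# Bump the mtime explicitly instead of sleeping, since some filesystems
# only record modification times to the second.
later = File.mtime(file.path) + 2
File.utime(later, later, file.path)
third = reloader.load_if_changed(file.path)    # changed, loads again

puts [first, second, third].inspect
```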

The second thing I did was to assume that require statements are not likely in user code. That's probably the most fragile assumption I'm making, but it should cover most of the usages (once I get the models in, at least). As long as all of the code you're using is either:

inside of a library (which shouldn't change during runtime), or
in framework files such as server.rb or controller.rb

then the framework should be able to gracefully handle changes to your code. Basically, the user has to follow the framework's rules as far as naming files and placing code.

(This is actually the same assumption I was making before, but I made a more conscious decision whereas before it was just incidental)

There are a few limitations even with that big ol' assumption up there. If you have more than the framework-expected class in the file, you'll be falling back on the original code reloading (i.e. it won't remove methods / constants / etc). Also, any metaprogramming you've done outside of that file will be gone. Basically don't manipulate classes outside of the file.

Also, any instances hanging around will not change their code. That's the one that I think is most likely to cause confusion. Code reloading could also do some really interesting things with class methods, depending on how they're called. And lastly, this code reloading will work horribly in a multi-threaded environment. I think that it would eventually reach a steady-state, but it's hard to say for sure.
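
One common way to get removed methods to really disappear is to drop the class constant before loading the file again. The sketch below illustrates that general technique along with the stale-instance caveat; reload_class and Widget are hypothetical, and this is an assumption about the approach rather than Kul's implementation.

```ruby
require 'tempfile'

# Hypothetical helper: drop the old class constant (if any), then load
# the file so the class is rebuilt from scratch instead of reopened.
def reload_class(name, path)
  Object.send(:remove_const, name) if Object.const_defined?(name, false)
  load path
  Object.const_get(name)
end

file = Tempfile.new(['widget', '.rb'])
File.write(file.path, "class Widget\n  def old_method; end\nend\n")
old_class = reload_class(:Widget, file.path)
stale = old_class.new

# "Edit" the file: old_method is gone, new_method appears.
File.write(file.path, "class Widget\n  def new_method; end\nend\n")
new_class = reload_class(:Widget, file.path)

puts new_class.method_defined?(:old_method)  # removed method is really gone
puts stale.respond_to?(:old_method)          # existing instance keeps the old class
```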

At the end of the day, it's not perfect, but it should make development easier, and that's the final goal. It's not for production anyway. The reloading code will be exposed so that if anyone does want to make use of it from their application, they can call Kul::FrameworkFactory.load_class with the class and path, and the framework will handle the reloading.

Tuesday, May 21, 2013

User Interfaces

This is kind of a followup to a post from a couple months ago about rich internet apps.

Today, if you want to build an application that runs on as many systems as possible, the answer is simple - you build a web application. There are other options: you could build a Java app for example. Java runs on so many systems, and you can build it once and the executable runs anywhere.

However, you have to install Java on the client. Many people don't want to do that. And you have to install the application on the client. More people don't want to do that. A non-sandboxed application is a scary thing to install on a system these days.

I feel that there's limited utility in actually trying to build and distribute a Java application. And the market pretty much bears that out - how many Java apps are actually being worked on these days? I can't think of a one. I'm sure there are some out there, but not very many. And they probably have a very limited / controlled distribution channel.

It makes sense if I think about it. In order for it to be a program, it has to have a set of instructions to run. Those instructions need to either be in the language of the machine, or there needs to be another program on that machine to interpret those instructions. Therefore, you have to have something that can interpret your language of choice on the range of machines you want to run on.

It's simple, basic computers 101 - computers run instructions. But it has consequences when trying to build something that can be run across architectures. There has to be an interpreter for that language on the remote machine. That's all there is to it. You need an interpreter.

In the case of Java, the JVM provides that interpreter - from bytecode to machine instructions. But again, Java has too much power - it allows a malicious programmer to execute instructions that are harmful.

There is one interpreter that both runs in a sandbox and has an even bigger install base than Java does: JavaScript. Or ECMAScript, if you're so inclined.

This is why web applications are so popular - they have the single largest install base for the sandbox / interpreter that they run. Heck, most modern phones have multiple browsers - take that Java!

(Also, most browsers don't install the ask toolbar along with themselves)

Web apps also have an incredibly rich graphical interface. It's not the most stable between browsers, but having written applications in Win32, MFC, Swing, WPF, Qt, and HTML - I'll take the web any day.

Cross platform, common GUI applications - what's not to like?

(Well, there are quite a few things, but there's quite a few other places to hear that sort of griping.)

Monday, May 13, 2013

Snake oil peddlers

Great article on Quartz about big data. It includes some data about data processing size on clusters at Yahoo and Facebook. If those guys don't need clusters for "big data", why do smaller companies?

Not saying that some companies don't. But it's a simple question that should be answered before you go down that path. Why do you need a multi-node cluster running MapReduce in order to process a few gigabytes of data? If you can't answer that question, then you probably are just wasting money on servers and even more money on developers to build frameworks on those servers.

Architecture should be as simple as possible.

The backing paper has a great summary: "...analytic jobs — in particular Hadoop MapReduce jobs — are often better served by a scale-up server than a scale-out cluster"

...

It seems to me that somewhere along the line, the "cloud" went from being service-oriented to being data-oriented. Having a cluster that provides services that can be accessed is an incredibly useful thing, especially if those services are accessed in a standard way. Both Amazon and Google have infrastructures like this - Google App Engine, although not my favorite, does exactly this, as does Amazon's Web Services.

Those are clusters that run many people's services all on the same hardware. Instead of having a cluster of machines that are all focused on one giant problem, you have a cluster that solves many problems at once.

What it seems many people are selling is the quest for the holy grail of data. Take all the data, run a giant complex algorithm on it, and all of the answers will be made clear - i.e. Snake oil peddlers.

I keep MapReduce on my resume - it's a great paradigm to know. But I am very cautious about how I use that technology - there are just too many companies using Hadoop as a catch-all answer for every problem.

Wednesday, May 8, 2013

Travis CI and RACK_ENV

I'm still working on a major refactor. It's going well, but it's not quite there yet.

Tonight I decided to check in my code, because... I'm a developer. Anyway, unit tests were passing, everybody was happy.

Code was checked in, and I promptly received an email from Travis-CI, telling me that my tests were failing. Except that I ran them before I pushed the code, and it ran fine!

Obviously a configuration difference, but I couldn't figure out what the difference was, since I have my Gemfile.lock checked in and was running the same "bundle exec rake" command that Travis was. I had actually figured out the problem and had figured out how to correct the test before I could reproduce the test failure.

Eventually I stumbled onto the Travis environment documentation - and saw the light. Turns out that Travis sets the RACK_ENV=test environment variable, which causes Rack to actually complain about unhandled exceptions, instead of turning them into a 500 like it was in my code. I quickly added the parameter to the top of my spec_helper.rb and voila! - my unit tests began failing!
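
The change amounts to something like this at the top of spec_helper.rb (a sketch; the rest of the file is assumed to follow as before):

```ruby
# spec_helper.rb (sketch): set the environment before the app is loaded,
# so local runs behave the same way Travis does.
ENV['RACK_ENV'] = 'test'

# ... the application and RSpec configuration are required after this point
```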

Never been so happy to see a unit test fail before. Committed my fix and the build works again. Winning!

Sunday, April 28, 2013

Controllers ... again

I realized there was a problem with the controllers the other day, and I've been trying to figure out a way of fixing it. Technically, I have two very similar problems.

First is that users would probably like to be able to control which methods in a module are actions. Currently, if a function exists in a module, then it gets applied to the request and run. I could use the private / public status to control which methods are visible, but it's not terribly elegant.

The other problem is that there's no way to restrict HTTP verbs for an action. Currently only GET is implemented anyway, but that's ratcheting ever higher on my list of things to fix. Once that happens, I'll have to figure out some way of being able to restrict an action to a specific verb.

Interesting side question: does it make any sense to be able to set multiple verbs on an action? I suppose that makes sense from a REST type perspective - you'd want to decide what action to take based on what verb was passed in. Okay, so... multiple verbs.

There are quite a few different ways to approach this. First, you could just add a constant to the module that contains a list of actions. The framework would only call functions that are in that list.

That is what's known as "ugly as hell".

Plan B: define a DSL like Sinatra does:

module MyApp
  module MyController
    actionize!

    get 'foo' do
      # controller action
    end
  end
end

The actionize! call is there for a very specific reason. Since we don't have a base class to derive from, we have to hack our way in from Ruby's Module class. Adding 'get' and 'post' functions to every module would be a bad and possibly hazardous thing to do, so we add a single unique method that adds the required functions to this specific module instance.
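
Here is one way actionize! might be wired up. This is only a sketch: ActionDSL, the actions table and the verb list are all assumptions, not what the framework actually does.

```ruby
# Hypothetical DSL methods that get mixed into a single controller module.
module ActionDSL
  # Table of action name => { 'VERB' => block }.
  def actions
    @actions ||= {}
  end

  %w[get post put delete].each do |verb|
    define_method(verb) do |name, &block|
      (actions[name] ||= {})[verb.upcase] = block
    end
  end
end

# The only thing added to every module is actionize!, which extends
# just the receiver with the DSL.
class Module
  def actionize!
    extend ActionDSL
  end
end

module MyApp
  module MyController
    actionize!

    get 'foo' do
      'GET foo'
    end

    post 'foo' do
      'POST foo'
    end
  end
end

puts MyApp::MyController.actions['foo']['POST'].call
puts Module.new.respond_to?(:get)   # other modules are untouched
```

Because each verb stores its own block under the same action name, multiple verbs on one action fall out naturally.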

So that works, it looks pretty clean, and it has the added benefit of being a pattern that many people use and like. (i.e. Sinatra) It's also extensible, in that we can add options after the action name, the same way that Sinatra does. It also handles multiple verbs on an action cleanly, since you're not actually defining a function per se.

The downside is that you have to understand the paradigm in order to build a controller. It's built around the language.

Plan C - The last way I thought to do it was more annotations-based:

module MyApp
  module MyController
    actionize!

    action :verb => POST
    def foo
      # controller action
    end
  end
end

What I like about this plan is that it's using generic Ruby methods. We could default the verb to GET and much of the boilerplate goes away. The options we could pass in for plan B could also be passed in here. We could do interesting things like look at the function parameters and pull those parameters out of the params hash - syntactic sugar, but nice to have. It seems like some fun things could be done here to make these actions easier to work with.
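
A sketch of how the annotation could attach itself to the next method, using Ruby's method_added hook. The ActionAnnotations module and action_table are hypothetical, and verbs are symbols here rather than the POST constant used above.

```ruby
# Hypothetical annotation-style DSL: `action` stores options that are
# attached to the next method defined in the module.
module ActionAnnotations
  def action(options = {})
    @pending_action = { verb: :get }.merge(options)
  end

  def action_table
    @action_table ||= {}
  end

  # Ruby calls this hook on the module each time an instance method is defined.
  def method_added(name)
    super
    return unless @pending_action

    action_table[name] = @pending_action
    @pending_action = nil
  end
end

module MyController
  extend ActionAnnotations

  action verb: :post
  def foo
    # controller action
  end

  action            # verb defaults to GET
  def bar
  end

  def helper        # no annotation, so not an action
  end
end

p MyController.action_table
```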

One downside is that since it's a Ruby method we can't really overload it. We could pass an array of verbs in to the action, but now we'd have to put a big ugly switch inside the function to handle each verb. It's also an extra line that is not grammatically attached to the function definition - it could get lost or deleted and the code would still parse.

I think the Sinatra DSL plan is probably the best plan. It's simpler and easier to see what is being built. I'll keep thinking on it, but that is probably the route I'll end up taking.

Tuesday, April 23, 2013

Controller verbs

Messing with the framework today, I remembered that controller actions currently only respond to GET. Which is not a terribly complex problem to solve from a framework perspective - I can simply add a Sinatra handler for the other verbs that routes to the controller actions the same way that GET does.

The question is, should I?

If I do this, then all controller actions will respond to all HTTP verbs. Which is not the typical way of working with one of these frameworks. Rails routing makes each routing action very specific to a given verb. In MVC.net you annotate the controller methods with the particular verb.

We're really talking about a filter here. What I don't want is the framework user to have to build in boilerplate code for each controller that filters out verbs that they don't want to deal with for a given method.

There's probably some nifty Ruby language feature I could use to filter these methods. But that would be adding configuration, I think. Also, the ways I can think of to build this filter would be a little bit fragile.

What I'm actually thinking about is building the filter into the name. If the action function starts with "get_", then it's a get method. If it starts with "post_" then it's a post method. If it doesn't start with a verb, then it would handle any verb. The URL for the action would not include the verb, of course.

Interestingly, that would actually allow overloads of a given action - you could easily have a put version and a get version, which would normally not be allowed in Ruby.

The only real downside I can think of is that now you could not have an action that began with a verb. There's probably a way around it (explicit routing would be one option) but I have trouble thinking of a reason why you'd need to start an action with a verb.
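A minimal sketch of that name-based lookup, assuming the router already has the HTTP verb and the action name from the URL. The resolve_action helper and WidgetController are hypothetical, not part of the framework.

```ruby
# Hypothetical sketch: resolve_action and WidgetController are illustrative.
VERBS = %w[get post put delete].freeze

# Prefer a verb-specific method ("post_foo"), then fall back to a plain
# method ("foo") that handles any verb. Returns nil when nothing matches.
def resolve_action(controller, verb, action)
  # Only methods defined on the controller class itself count as actions.
  available = controller.class.public_instance_methods(false)
  specific = :"#{verb.downcase}_#{action}"
  return specific if available.include?(specific)
  # The URL never carries the verb, so "get_foo" is not addressable directly.
  return nil if VERBS.any? { |v| action.start_with?("#{v}_") }
  available.include?(action.to_sym) ? action.to_sym : nil
end

class WidgetController
  def get_widget
    'show widget'
  end

  def put_widget
    'update widget'
  end

  def ping
    'pong'
  end
end

controller = WidgetController.new
p resolve_action(controller, 'GET', 'widget')   # the get version
p resolve_action(controller, 'PUT', 'widget')   # the put "overload"
p resolve_action(controller, 'POST', 'ping')    # no prefix, so any verb works
p resolve_action(controller, 'POST', 'widget')  # no post version exists
```

The last call comes back nil, which is where the framework would answer with something like a 404 or 405 instead of making each controller filter verbs by hand.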

I think I like this plan.

I also need to remember ... the thing that I forgot. crap.

Thursday, April 18, 2013

Github tasks

I've been quite distracted over the last week, and haven't really gotten anything done since Saturday.

But today I put all the outstanding issues I could think of into the github repo. Even checked off a couple easy ones. I've got two milestones - a basic "get the key features working" and the bigger "kick out a 1.0 release".

The next tasks to knock out are documentation and some examples. Right now, anyone looking at the gem is going to be very confused unless they go to all the trouble of reading the code. If there's documentation then maybe a few people will start using the gem and find me MORE features to implement.

Monday, April 15, 2013

It lives!

Kul v0.0.1 is up on rubygems.

The damned thing actually works. Probably less than a hundred hours of development time, and I've developed a fairly full-featured web application server.

The biggest feature missing at this point is any sort of authentication. There's probably a rack middleware for that. Yup - omniauth. To get that working, a person would have to hack into the router pretty deeply. Something to work on for the future.

I've tested the gem on both Win7 / JRuby and on Ubuntu / MRI. It worked in both places, although I did have to install nodejs on the Linux box in order to provide the JavaScript runtime for coffee compilation.

I also noticed that the default website points back at the rubygems site, which is something I hate whenever I hit a gem page. It's been changed to github, although I'm not bothering to update the gem since nobody's using the thing.

Need to add a history file now.

I'm giddy now. My app server works.

Sunday, April 14, 2013

Kul controller design

Controllers are a love/hate relationship for me. They're a great place to put code that brings the models to the views. They're also great for figuring out what to do with a given request. All of those things could probably be done implicitly, but the rules would be hard to understand, I think. Or would break the code up into many small distributed chunks which would be painful to read / follow / debug.

I hate controllers because they don't make sense. What is the life cycle of a controller object? In most frameworks, it's the lifetime of the request. It exists only so that you can do dynamic dispatch from the request parameters. You could put initialization code in the controller, but in most frameworks, the controller constructor can't take parameters, so you're limited on what initialization you can do. So most of the time the controller exists only to hold similar functions together, and is really a stateless object. Stateless objects seem like a bad design choice to me. (Stateless does not equal immutable. Immutable objects are good design - particularly in threaded environments.)

Surprising things in Ruby #1

Try this in your irb:

irb(main):001:0> def foo
irb(main):002:1> 'hello world'
irb(main):003:1> end
=> nil
irb(main):004:0> 'a'.foo
=> "hello world"

If you're not used to Ruby, this may seem surprising. This may help make it clearer:

Rich internet applications

Looking at a WebSockets implementation this evening, I was hit with the light.

The app we're building at work is not really an HTML page. That is the underlying transport layer, but really what we have is a javascript application that has its UI written in HTML. There is a single page, and different parts of it are loaded and unloaded in response to a user's actions. When the application needs data, it AJAXes back to the server for some JSON, and occasionally an HTML template or some new script.

This paradigm corresponds with a number of MVVM frameworks that are very popular these days, such as Backbone, Knockout, or Ember. Your web layer is really a javascript application running in the browser, and the app server just feeds it the data it needs to run.

What bothered me is that I felt that was unwieldy. Most of the really good app layer frameworks are MVC. You put an MVC framework on the app server with an MVVM framework in the client and you have way too many layers. I've been working on Kul as kind of a way of making simpler MVC frameworks, but it still feels to me that adding MVVM on top would be more complexity than you need.

Then I thought about web sockets. If you think of hitting a website as the initial application download, then web sockets become the data connection for your application. You could open a web socket, keep it open for the life of the app, and have that be your data connection.

That's really what Google does with docs and gmail. It's no longer a webapp, it's a heavy client app with a continuous server connection and an HTML UI.

You could build your app on top of something like Ruby's EM-WebSocket, or maybe something simpler like SinatraWebsocket. The possibilities are endless, really. Instead of building huge REST interfaces for a web app that doesn't need them, build a client application! It's a horrible practice to re-use that structure for a REST client, anyway.

You can completely minimize your web app footprint, have better testing methodologies, and build a much less complex webapp this way. Instead of having AJAX calls scattered about your javascript (and you know they all work slightly differently) you streamline your server communication through one point. Data all gets requested through the same pipe.
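To make the "one pipe" idea concrete, here is a small sketch of the server side: every frame that arrives on the socket goes through one dispatcher. The MessageRouter class and the message shape (a JSON object with "type" and "payload") are assumptions for illustration. The socket itself is left out, so this is only the routing piece you would call from whatever WebSocket library you pick.

```ruby
require 'json'

# Hypothetical sketch: MessageRouter and the {"type", "payload"} message
# shape are assumptions, not a real protocol.
class MessageRouter
  def initialize
    @handlers = {}
  end

  # Register a handler block for one message type.
  def on(type, &handler)
    @handlers[type] = handler
  end

  # Take one raw JSON frame from the socket and return the JSON reply.
  def dispatch(raw)
    message = JSON.parse(raw)
    handler = @handlers[message['type']]
    return JSON.generate('error' => 'unknown type') unless handler
    JSON.generate('type' => message['type'],
                  'data' => handler.call(message['payload']))
  end
end

router = MessageRouter.new
router.on('widgets.list') { |_payload| %w[sprocket gear] }
puts router.dispatch('{"type": "widgets.list", "payload": {}}')
```

Since all the data requests funnel through dispatch, that one method is also the natural place to test, log, or authorize them.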

I'm sure this is old hat for the rest of the world, but I thought it was amazing when I realized the benefits. This is the right way to do RIA.

Friday, March 29, 2013

Almost there

Have Kul working fairly well. Getting the Rspec tests in place helped a ton - I've been using TDD since they've been working and the code is developing rapidly. Libraries like this are a great place to practice TDD - they tend to be very friendly to the process.

The framework does html, js, and css files, along with coffeescript and sass compilation on the fly. Templates with a .html.erb extension get rendered by the server base class and have a server context, an app context, and the request parameters if applicable. MVC-type routes are also processed by the server and pass the same parameters. Still need to actually connect the router to the server - just realized I forgot to do that.

Once that's all in place then all I need to do is bundle the thing up in a gem. My goal is to be able to install the gem "gem install kul" and then be able to "kul run" and have the server start right up. From that point it should dynamically handle any code in the run folder. I.e. the simplest possible execution of a web application that I've ever heard of. :)

Just thought of a nifty feature. "kul routes" - should dynamically search the folder structure (basically a BFS of the tree) and determine which files are accessible via routing. That's something I've genuinely wished I had for just regular http servers, much less app servers. Similar to "rake routes" in rails, but it'd be an even more critical feature for a dynamic app server like this.
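A quick sketch of what that breadth-first search might look like. The list_routes helper, the ROUTABLE extension list, and the idea that a route is just the file path relative to the run folder are all assumptions, since the command doesn't exist yet.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical sketch: the extension list and the route format are assumptions.
ROUTABLE = %w[.html .erb .js .css .coffee .scss].freeze

# Breadth-first walk of the folder tree, returning routable paths
# relative to root. Shallow files are listed before deeper ones.
def list_routes(root)
  routes = []
  queue = [root]
  until queue.empty?
    dir = queue.shift
    Dir.children(dir).sort.each do |entry|
      path = File.join(dir, entry)
      if File.directory?(path)
        queue.push(path)
      elsif ROUTABLE.include?(File.extname(entry))
        routes << path.delete_prefix(root)
      end
    end
  end
  routes
end

# Build a throwaway folder tree and list what would be reachable.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'app', 'widgets'))
  FileUtils.touch(File.join(root, 'index.html'))
  FileUtils.touch(File.join(root, 'app', 'site.css'))
  FileUtils.touch(File.join(root, 'app', 'widgets', 'list.html.erb'))
  FileUtils.touch(File.join(root, 'app', 'notes.txt'))
  puts list_routes(root)
end
```

The .txt file is skipped because its extension isn't in the list; a real version would also have to know about controller actions, which this sketch ignores.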

Saturday, March 16, 2013

Kul Testing

I've been putting off unit testing Kul as I really wasn't sure what sort of testing was possible against a web framework. Turns out that Rack apps (such as Sinatra) have this really awesome testing framework called 'rack-test'. It works really well, and allows you to do nifty things like this:

  it 'serves the favicon' do
    get '/favicon.ico'
    last_response.should be_ok
  end

Cool, right? You can also look at last_response.body for the actual body of the HTML returned, examine the headers, etc.

This was the simple part. The difficult part was setting up the external files for my tests.

Kul proof of concept

So my web app framework proof of concept is up and working. The github repo has the code I'm working with. It's basically a science project at the moment just to see if it makes sense.

Which I think it does. Currently it has a server / app / controller folder structure, dynamically reloads the code for those objects, and renders an HTML erb for the path given (in context of the controller if it exists). And that's the thing I love about Ruby - so little code and it's already working.

Strata Santa Clara 2013

The Strata conference I've been attending for the last three days is over. Now all I have to do is collect my suitcase and make my way back to Maryland and home.

It's been an interesting conference, and I learned quite a bit. Things I learned:

I got some insight as to how R works - it's a different paradigm than languages I'm used to.
I know more about data science than I thought.
I know far less about data science than many other people.
Which ML approaches are valid for which questions you're trying to answer.
There are a whole bunch of nifty technologies out there that I need to explore.
There are a whole bunch of nifty companies out there using big data that I need to learn about.
Julia is the name of a programming language. :)

The conference didn't push me that much, interestingly. I'm not exhausted or brain-dead the way I thought I would be. That may be due to actually getting enough sleep before each day (with one exception) but I think it was really just a process of putting together pieces of things that I've been learning for the last few months.

There's a whole bunch of techniques that I now understand how to use on my Kaggle data. Some of the things that I've thought about Hadoop and the surrounding architectures turned out to be validated by some clearly very smart people. That gives me confidence in the way I approach problems.

In summary, the conference gave me a perspective on where I fit within the data science community at large, and the feeling that I'm on the right track with my experiments and research.

From here, I plan to spend some time messing with various ML techniques and getting down and dirty with some more statistics. From there, maybe I can find and join a data science team that needs a good developer?

Forward the data science!

... but home first.

Kul initial hacking

So I hacked together a few lines of Ruby that do a little of what I talked about in the last post. For a given path, it checks for the presence of a .rb at that path, runs it, then tries to run a template at the same path.
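Roughly, the hack looks like the sketch below. This is a reconstruction under assumptions rather than the original lines: render_path is a made-up name, templates are assumed to be plain .erb files, and the script shares state with the template through instance variables.

```ruby
require 'erb'
require 'tmpdir'

# Hypothetical sketch: render_path and the file layout are assumptions.
# For a request path, run "<path>.rb" if present, then render "<path>.erb".
def render_path(root, path)
  script   = File.join(root, "#{path}.rb")
  template = File.join(root, "#{path}.erb")
  context  = Object.new
  # Run the script inside the context so its instance variables
  # are visible to the template.
  context.instance_eval(File.read(script), script) if File.exist?(script)
  return nil unless File.exist?(template)
  ERB.new(File.read(template)).result(context.instance_eval { binding })
end

# Try it against a throwaway folder with one script and one template.
Dir.mktmpdir do |root|
  File.write(File.join(root, 'hello.rb'), "@name = 'world'")
  File.write(File.join(root, 'hello.erb'), 'Hello, <%= @name %>!')
  puts render_path(root, 'hello')
end
```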

I already don't like it. For starters, I can already see it'll generate too many files. If every path has a .rb and a .erb, there's gonna be a ton of files, and while it puts connected files near each other, I'm not sure it'll be clear to new users why the separation exists.

It's also completely NOT object-oriented. My previous posts go into great detail about my feelings on OO, and it seems silly to throw that away.

Finally, it's completely convention - there is no way to easily override that convention.

I think I'm going to put in something similar to the Rails model, where there's a controller.rb that holds the code for the various actions. That gives us good separation of concerns, limits the number of files, and gives us a place to override the defaults.

Kul Application Framework

As much as I like Rails (and I do!) there are times when it's kind of a pain in the ass. If I'm just trying to hack something together, Rails can kinda be overkill. Sinatra is MUCH more lightweight, but it also provides very little structure to be able to do anything. You have to build everything you want to do in it, other than just being able to handle a given URL. And then you have to deploy the whole package, which is sub-optimal for hacking.

Here's my use case: I've got an app server that I can run things on. I don't have much of anything running on it, but I'd like to be able to put up some simple bits of code, for demonstration purposes. In particular, I'd like to be able to put up some code that runs Ruby on the server. I don't have an "app" per se, and I don't really want all the overhead of "deploying" one, either in Ruby or JRuby. That's way more overhead than I need.

What is this statistical analysis you speak of?

I've spent the last month or so reading everything I can about data science. It's been fun and interesting, but I've come to the conclusion that I don't know a damn thing about statistical analysis. I feel that's going to be a problem going forward.

Code I understand. Heck, I've been writing code for most of my life, and particularly code that needed to be far more robust than most of what I've seen in the field. Being able to hack an algorithm together is going to be my strong point.

Vocabulary is one of the most important things you can learn about a new field. Being able to communicate effectively about statistics is not a skill I have. I took a statistics class when I was in college, but my retention after fifteen years of not using it is pretty poor. I have a vague recollection of a few probability concepts, but that's about it. I'm almost certain that I didn't even learn anything about analysis.

I found this blog post from Andy Mueller that describes his standard approach for new data - I don't have context for half the words on there. But it provides me a great place to start looking for things to learn.

I've also grabbed myself a couple of statistics books, and want to try implementing a few of these algorithms. I've found that I understand concepts much better when I've implemented them in code.

Anyone have any other ideas for learning statistics?

Thursday, January 17, 2013

Wall Street Journal

I saw a link to this today and wasn't entirely certain if it was a joke or not.

I do finally understand why Romney thinks that middle income is $250K a year - he's been reading the Wall Street Journal. Here's a link to a more accurate graph of income in the US - provided by the US Census Bureau. Note the part that shows the over $250K income as being the TOP TWO PERCENT OF HOUSEHOLDS.

The easy target from the WSJ article is the single parent (with the picture showing a mother) making $260K a year. I've known many single parents throughout my life, and NONE of them made anywhere near $260K a year. Most of them were in borderline poverty. Admittedly, I know very few people making $260K a year or over anyway.

Do a search for "single mother income" and have a look at the articles like this, or this. "Half of single mother families have an annual income less than $25,000." - "Two fifths of single mother families are poor, triple the poverty rate for the rest of the population." - "Three quarters of homeless families are single mother families."

What part of that sounds like a family making $260K a year?

Oh, and according to the article by The Nation, 80% of single parents are mothers. So the WSJ got the PICTURE right, even if they had the income off by an order of magnitude.

I live in one of the highest-income areas of the US, and I think I make a large amount of money. Which is not even half of the incomes from that WSJ article. Let me run that by you again - I consider myself wealthy (which the Census graph supports) and I make LESS THAN HALF of what the WSJ implies is AVERAGE.

Granted, nobody making $26K a year reads the WSJ. It's probably because they are too busy working multiple jobs to feed their families, but I'd like to think that those people are too smart to bother reading that load of crap.

...

There may be a few single parents out there actually making $260K a year, and I'm glad you have the income to be able to provide for your children. Your taxes are going up about $280 a month. My suggestion is to lose the Lexus and get a Honda instead - they're much cheaper and still really good cars.

...

Side note: why do the brunette / redhead couple have four blonde children? Guy needs to ask some questions, I think.

Sunday, January 13, 2013

Gun Control

I'm going to try to bring this around to statistics, although I'm not sure how successful I'll be. This is an incredibly divisive issue - fortunately nobody reads my blog so this is really just getting my thoughts down.

First, a disclaimer: I know how to shoot, and enjoy shooting. I don't own any guns. My father owned several pistols growing up, and I learned gun safety from him. I was in the army for five years, and in two trips to Iraq I never had to fire my M16 at a target that wasn't paper, although I once had to chamber a round to get an Iraqi to stop advancing. I would have shot at them, and I'm incredibly glad that I didn't have to.

Hatin' on Forbes

Interesting attack on data science in Forbes here. Short rebuttal here.

While the paper by Ray Rivera is mostly stupid, it actually has a point that I agree with - that data science shouldn't be considered the oracle at Delphi that has all the answers. Granted, that's true of any technology or set of buzz words.

Heck, if you want to be sold snake oil - try asking a consultant what sort of "analytics" he would recommend. (Interesting note: Ray appears to work for SAP. SAP does analytics consulting. Correlation!)

Magic voodoo snake oil algorithms are WHY more people should be doing data science. If everyone could examine data and competently discuss the results, nobody would be able to sell that kind of crap. The more people who can think logically, the fewer shady consultants we have.

ggplot

Next task: integrate ggplot into workflow!

...

That is all.

...

Seriously - that's my next data science goal. That's all. Baby steps.