Friday, March 29, 2013

Almost there

I have Kul working fairly well. Getting the RSpec tests in place helped a ton - I've been using TDD ever since they started working, and the code is developing rapidly. Libraries like this are a great place to practice TDD - they tend to be very friendly to the process.

The framework serves HTML, JS, and CSS files, and compiles CoffeeScript and Sass on the fly. Templates with a .html.erb extension get rendered by the server base class and have access to a server context, an app context, and the request parameters if applicable. MVC-type routes are also processed by the server and are passed the same parameters. I still need to actually connect the router to the server - just realized I forgot to do that.
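To make the template rendering a bit more concrete, here is a minimal sketch of the general idea using Ruby's standard ERB library. It is not Kul's actual code: the TemplateContext class, its server, app, and params readers, and the render_template helper are all names assumed for illustration.

```ruby
require 'erb'

# Hypothetical stand-in for the object a template is rendered against.
# The names server, app, and params are assumptions, not Kul's real API.
class TemplateContext
  attr_reader :server, :app, :params

  def initialize(server:, app:, params: {})
    @server = server
    @app = app
    @params = params
  end

  # Expose this object's binding so ERB can call the readers above.
  def template_binding
    binding
  end
end

# Render an ERB source string against the given context object.
def render_template(source, context)
  ERB.new(source).result(context.template_binding)
end

context = TemplateContext.new(server: 'kul', app: 'demo', params: { 'name' => 'World' })
puts render_template('<h1>Hello, <%= params["name"] %> from <%= app %></h1>', context)
```

Rendering against an object's binding means the template only sees what that object chooses to expose, which keeps the "context" idea explicit.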

Once that's all in place, all I need to do is bundle the thing up in a gem. My goal is to be able to install it with "gem install kul" and then "kul run" to have the server start right up. From that point it should dynamically handle any code in the run folder. In other words, the simplest possible execution of a web application that I've ever heard of.  :)

Just thought of a nifty feature. "kul routes" - should dynamically search the folder structure (basically a BFS of the tree) and determine which files are accessible via routing. That's something I've genuinely wished I had for just regular HTTP servers, much less app servers. Similar to "rake routes" in Rails, but it'd be an even more critical feature for a dynamic app server like this.
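A rough sketch of what that breadth-first search could look like, using only the standard library. None of this exists in Kul yet: the list_routes name and the set of extensions treated as routable are assumptions made for the example.

```ruby
require 'tmpdir'
require 'fileutils'

# Extensions treated as routable are an assumption for illustration only.
ROUTABLE_EXTENSIONS = %w[.html .erb .js .css .coffee .scss .sass].freeze

# Walk the tree under root breadth-first and return URL-style paths
# for every file whose extension is in ROUTABLE_EXTENSIONS.
def list_routes(root)
  routes = []
  queue = [root]
  until queue.empty?
    dir = queue.shift
    Dir.children(dir).sort.each do |name|
      path = File.join(dir, name)
      if File.directory?(path)
        queue.push(path) # visit subfolders after the current level
      elsif ROUTABLE_EXTENSIONS.include?(File.extname(name))
        routes << path.delete_prefix(root)
      end
    end
  end
  routes
end

# Usage: build a throwaway folder and list what would be routable.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'blog'))
  FileUtils.touch(File.join(root, 'index.html'))
  FileUtils.touch(File.join(root, 'notes.txt'))
  FileUtils.touch(File.join(root, 'blog', 'post.html.erb'))
  puts list_routes(root)
end
```

Because the queue is processed level by level, files near the top of the tree are listed before deeper ones, which seems like a sensible order for a routes listing.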

Saturday, March 16, 2013

Kul Testing

I've been putting off unit testing Kul as I really wasn't sure what sort of testing was possible against a web framework. Turns out that Rack apps (such as Sinatra) have this really awesome testing framework called 'rack-test'. It works really well, and allows you to do nifty things like this:
  it 'serves the favicon' do
    get '/favicon.ico'
    last_response.should be_ok
  end
Cool, right? You can also look at last_response.body for the actual body of the HTML returned, examine the headers, etc.

This was the simple part. The difficult part was setting up the external files for my tests.

Tuesday, March 5, 2013

Kul proof of concept

So my web app framework proof of concept is up and working. The GitHub repo has the code I'm working with right now. It's basically a science project at this point, just to see if it makes sense.

Which I think it does. Currently it has a server / app / controller folder structure, dynamically reloads the code for those objects, and renders an HTML erb for the path given (in context of the controller if it exists). And that's the thing I love about Ruby - so little code and it's already working.

Thursday, February 28, 2013

Strata Santa Clara 2013

The Strata conference I've been attending for the last three days is over. Now all I have to do is collect my suitcase and make my way back to Maryland and home.

It's been an interesting conference, and I learned quite a bit. Things I learned:
  • I got some insight as to how R works - it's a different paradigm than languages I'm used to.
  • I know more about data science than I thought.
  • I know far less about data science than many other people.
  • Which ML approaches are valid for which questions you're trying to answer.
  • There are a whole bunch of nifty technologies out there that I need to explore.
  • There are a whole bunch of nifty companies out there using big data that I need to learn about.
  • Julia is the name of a programming language.  :)
The conference didn't push me that much, interestingly. I'm not exhausted or brain-dead the way I thought I would be. That may be due to actually getting enough sleep before each day (with one exception) but I think it was really just a process of putting together pieces of things that I've been learning for the last few months. 

There's a whole bunch of techniques that I now understand how to use on my Kaggle data. Some of the things that I've thought about Hadoop and the surrounding architectures turned out to be validated by some clearly very smart people. That gives me confidence in the way I approach problems.

In summary, the conference gave me a perspective on where I fit within the data science community at large, and the feeling that I'm on the right track with my experiments and research.

From here, I plan to spend some time messing with various ML techniques and getting down and dirty with some more statistics. From there, maybe I can find and join a data science team that needs a good developer?

Forward the data science!

... but home first.

Kul initial hacking

So I hacked together a few lines of Ruby that do a little of what I talked about in the last post. For a given path, it checks for the presence of a .rb at that path, runs it, then tries to run a template at the same path.
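For reference, the convention can be sketched in a few lines. This is a reconstruction of the idea rather than the code I actually wrote: the Scope class and handle_path helper are made-up names, and it does nothing to sanitize the path.

```ruby
require 'erb'
require 'tmpdir'

# Hypothetical object that both the script and the template run against.
class Scope
  def get_binding
    binding
  end
end

# Handle a request path by convention: evaluate <path>.rb if it exists,
# then render <path>.erb against the same object. Returns nil when there
# is no template. No path sanitizing is done here; this is only a sketch.
def handle_path(root, path)
  script = File.join(root, "#{path}.rb")
  template = File.join(root, "#{path}.erb")
  scope = Scope.new.get_binding
  scope.eval(File.read(script), script) if File.exist?(script)
  return nil unless File.exist?(template)

  ERB.new(File.read(template)).result(scope)
end

# Usage: one script and one template sitting side by side.
Dir.mktmpdir do |root|
  File.write(File.join(root, 'hello.rb'), "@greeting = 'hi there'")
  File.write(File.join(root, 'hello.erb'), '<p><%= @greeting %></p>')
  puts handle_path(root, '/hello')
end
```

The script passes data to the template through instance variables on the shared object, which is about the only coupling between the two files.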

I already don't like it. For starters, I can already see it'll generate too many files. If every path has a .rb and a .erb, there's gonna be a ton of files, and while it puts connected files near each other, I'm not sure it'll be clear to new users why the separation exists.

It's also completely NOT object-oriented. My previous posts go into great detail about my feelings on OO, and it seems silly to throw that away.

Finally, it's completely convention - there is no way to easily override that convention.

I think I'm going to put in something similar to the Rails model, where there's a controller.rb that holds the code for the various actions. That gives us good separation of concerns, limits the number of files, and gives us a place to override the defaults.
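Something along these lines is what I have in mind. It's only a sketch of the shape, not a design: the Controller base class, the process and default_action methods, and BlogController are all hypothetical names.

```ruby
# Hypothetical base class; every name here is an assumption for illustration.
class Controller
  # Call the action if the subclass itself defines a public method with that
  # name; otherwise fall back to the default convention.
  def process(action, params = {})
    if self.class.public_instance_methods(false).include?(action.to_sym)
      public_send(action, params)
    else
      default_action(action, params)
    end
  end

  # Default convention: with no matching action, just render the template.
  # A real version would render it; this sketch only reports the decision.
  def default_action(action, _params)
    "render #{action} template"
  end
end

# One controller.rb per folder would hold the actions for that folder.
class BlogController < Controller
  def show(params)
    "post #{params['id']}"
  end
end

controller = BlogController.new
puts controller.process('show', { 'id' => '42' })
puts controller.process('index')
```

Only methods defined directly on the subclass count as actions, so inherited methods can't be reached by a crafted URL, and overriding default_action gives a single place to change the convention.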

Kul Application Framework

As much as I like Rails (and I do!) there are times when it's kind of a pain in the ass. If I'm just trying to hack something together, Rails can kinda be overkill. Sinatra is MUCH more lightweight, but it also provides very little structure to be able to do anything. You have to build everything you want to do in it, other than just being able to handle a given URL. And then you have to deploy the whole package, which is sub-optimal for hacking.

Here's my use case: I've got an app server that I can run things on. I don't have much of anything running on it, but I'd like to be able to put up some simple bits of code for demonstration purposes. In particular, I'd like to be able to put up some code that runs Ruby on the server. I don't have an "app" per se, and I don't really want all the overhead of "deploying" one, either in Ruby or JRuby. That's way more overhead than I need.

Tuesday, January 29, 2013

What is this statistical analysis you speak of?

I've spent the last month or so reading everything I can about data science. It's been fun and interesting, but I've come to the conclusion that I don't know a damn thing about statistical analysis. I feel that's going to be a problem going forward.

Code I understand. Heck, I've been writing code for most of my life, and particularly code that needed to be far more robust than most of what I've seen in the field. Being able to hack an algorithm together is going to be my strong point.

Vocabulary is one of the most important things you can learn about a new field. Being able to communicate effectively about statistics is not a skill I have. I took a statistics class when I was in college, but my retention after fifteen years of not using it is pretty poor. I have a vague recollection of a few probability concepts, but that's about it. I'm almost certain that I didn't even learn anything about analysis.


I found this blog post from Andy Mueller that describes his standard approach for new data - I don't have context for half the words on there. But it provides me a great place to start looking for things to learn.

I've also grabbed myself a couple of statistics books, and want to try implementing a few of these algorithms. I've found that I understand concepts much better when I've implemented them in code.

Anyone have any other ideas for learning statistics?