Thursday, February 28, 2013

Strata Santa Clara 2013

The Strata conference I've been attending for the last three days is over. Now all I have to do is collect my suitcase and make my way back to Maryland and home.

It's been an interesting conference, and I learned quite a bit. Things I learned:
  • I got some insight as to how R works - it's a different paradigm than languages I'm used to.
  • I know more about data science than I thought.
  • I know far less about data science than many other people.
  • I learned which ML approaches are valid for which kinds of questions.
  • There are a whole bunch of nifty technologies out there that I need to explore.
  • There are a whole bunch of nifty companies out there using big data that I need to learn about.
  • Julia is the name of a programming language.  :)
The conference didn't push me that much, interestingly. I'm not exhausted or brain-dead the way I thought I would be. That may be due to actually getting enough sleep before each day (with one exception), but I think it was really just a process of putting together pieces of things that I've been learning for the last few months.

There are a whole bunch of techniques that I now understand how to use on my Kaggle data. Some of the things that I've thought about Hadoop and the surrounding architectures were validated by some clearly very smart people. That gives me confidence in the way I approach problems.

In summary, the conference gave me a perspective on where I fit within the data science community at large, and the feeling that I'm on the right track with my experiments and research.

From here, I plan to spend some time messing with various ML techniques and getting down and dirty with some more statistics. From there, maybe I can find and join a data science team that needs a good developer?

Forward the data science!

... but home first.

Kul initial hacking

So I hacked together a few lines of Ruby that do a little of what I talked about in the last post. For a given path, it checks for the presence of a .rb at that path, runs it, then tries to run a template at the same path.
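The original few lines aren't shown here, so what follows is only a rough sketch of the lookup described above, not the actual code. The name render_path, the root argument, and the choice to share state between the script and the template through instance variables are all assumptions made for illustration.

```ruby
require "erb"

# Hypothetical helper (not the original code): for a given path, run
# <root>/<path>.rb if it exists, then render <root>/<path>.erb if it exists.
def render_path(root, path)
  # A throwaway object whose binding is shared by the script and the template,
  # so instance variables set in the .rb file are visible in the .erb file.
  context = Object.new
  scope = context.instance_eval { binding }

  script   = File.join(root, "#{path}.rb")
  template = File.join(root, "#{path}.erb")

  # Step 1: run the Ruby file, if there is one.
  eval(File.read(script), scope, script) if File.exist?(script)

  # Step 2: render the template in the same scope, if there is one.
  return nil unless File.exist?(template)
  ERB.new(File.read(template)).result(scope)
end
```

With a hello.rb that sets @greeting and a hello.erb that prints it, render_path(root, "hello") would return the rendered text. The sketch does nothing to sanitize the path, which a real server would need to do.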

I already don't like it. For starters, I can already see it'll generate too many files. If every path has a .rb and a .erb, there's gonna be a ton of files, and while it puts connected files near each other, I'm not sure it'll be clear to new users why the separation exists.

It's also completely NOT object-oriented. My previous posts go into great detail about my feelings on OO, and it seems silly to throw that away.

Finally, it's entirely convention-driven - there is no easy way to override that convention.

I think I'm going to put in something similar to the Rails model, where there's a controller.rb that holds the code for the various actions. That gives us good separation of concerns, limits the number of files, and gives us a place to override the defaults.
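Nothing is built yet, so this is just one guess at what that could look like. BaseController, dispatch, and default_action are hypothetical names, and the fallback behavior is an assumption about what "the defaults" would mean.

```ruby
# Hypothetical sketch: a base class supplies the default behavior, and a
# single controller.rb overrides it only for the actions it cares about.
class BaseController
  # Call the action if the controller subclass defines it; otherwise fall back.
  def dispatch(action)
    name = action.to_sym
    # Only methods added by the subclass count as actions.
    custom = self.class.public_instance_methods - BaseController.public_instance_methods
    custom.include?(name) ? public_send(name) : default_action(name)
  end

  # Assumed default: in the real thing this might render the template at that path.
  def default_action(name)
    "default response for #{name}"
  end
end

# What a controller.rb might contain: one file holding several actions.
class Controller < BaseController
  def index
    "custom index"
  end
end
```

Here Controller.new.dispatch("index") runs the custom action, while an action the file doesn't define falls through to default_action.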

Kul Application Framework

As much as I like Rails (and I do!) there are times when it's kind of a pain in the ass. If I'm just trying to hack something together, Rails can kinda be overkill. Sinatra is MUCH more lightweight, but it also provides very little structure to be able to do anything. You have to build everything you want to do in it, other than just being able to handle a given URL. And then you have to deploy the whole package, which is sub-optimal for hacking.

Here's my use case: I've got an app server that I can run things on. I don't have much of anything running on it, but I'd like to be able to put up some simple bits of code, for demonstration purposes. In particular, I'd like to be able to put up some code that runs Ruby on the server. I don't have an "app" per se, and I don't really want all the overhead of "deploying" one, either in Ruby or JRuby. That's way more overhead than I need.