Dreams from Rylath

Saturday, May 24, 2014

Ember take two

I went back to Rails for my latest project (personal / family photo store) and have been having quite a bit of fun with it.

The project had reached a bit of a crossroads, where the basic proof of concept was in and working, and it was actually functional. I have an idea of what the app should look like, and I've used it enough to know a few things that work well and a few things that won't work the way I had initially thought they would.

Basically, I had a prototype. And we all know the rule for a prototype: throw it away. But that means I need to know what to replace it with.

While my last project with Ember.js was a success, it performed poorly on mobile devices. However, this project is not going to be mobile-specific, and I can have a basic (js-lite) front end with minimal functionality for mobile. (Basically, I want to be able to upload photos from my phone, but I don't need the photo management tools on the mobile version.)

(I hear the word "mobile" in my head with a British accent. It makes me feel proper.)

Another developer I've been working with recently has been advocating vanilla web development with turbolinks for speed. And I think that could work well, for a certain class of application. But that's not the application I build.

I wish I could Scala

I've decided on my next project. I'm going to build a website that my wife and I can upload our photos to and have it store the photos and a database of metadata about the photos. Basically Instagram but without the internet. It should be a pretty simple thing to hack together - I have half the code already written in Twit-arr. It will allow us to take our photos from our various devices (two phones and four+ computers) and put them all into a single spot that I can back up.

Since I've been wanting to use Scala for something and have been playing with it here and there, I decided to fire up Typesafe Activator and start a Play project.

Unfortunately I now realize that plan was a mistake.

Redis autocomplete

Autocomplete is a very useful tool in a webapp. So when I found the use case, I began looking for autocomplete solutions in Redis.

There are some very neat posts about how to build one, and there's also a gem: seatgeek/soulmate. It's very nice and feature complete, but it's pretty much overkill for what I'm trying to do. It's pretty much a data store in itself. And it'll generate lots and lots of keys, which I don't really want to do - I'd like to keep all of my data in the same store, and the key explosion will make that very difficult.

If you read over the various options for Redis autocomplete, you'll find quite a few descriptions of indexing via hashkey. And it's pretty straightforward if you think about it. But I was convinced that there was a way of doing autocomplete that relied on sorted sets.

And eventually I figured it out. There are two basic requirements to get this to work. First, you have to have a unique description for every key you want to index. Second, you have to invent a mapping from each autocomplete word to an integer that preserves alphabetical order, so that the weights work out like this: a < aa < ab < b ... and so forth.

The mapping function is fun, because it's basic computer science. Instead of hexadecimal, where each digit takes one of sixteen values, I'm building a number system where each digit takes one of 27 values. Then I right-pad (or truncate) each value I want to index so that they're all the same length. This provides the property listed above, where a < aa < ab < b - and now I can store those values as scores in a sorted set.
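A minimal sketch of such a mapping, not necessarily the one used in the app. The names autocomplete_score, BASE and MAX_LENGTH are made up for illustration, and the alphabet (a padding digit plus 'a' through 'z', everything else stripped) is an assumption that happens to give 27 values per digit.

```ruby
BASE = 27        # assumption: one padding digit plus the letters 'a'..'z'
MAX_LENGTH = 10  # assumed index width; longer words are truncated

# Map a word to an integer whose numeric order matches alphabetical order.
# Padding counts as digit 0 and 'a'..'z' as 1..26, so "a" < "aa" < "ab" < "b".
def autocomplete_score(text, length = MAX_LENGTH)
  # Normalise: lowercase, drop anything outside a-z, truncate to the width.
  letters = text.downcase.gsub(/[^a-z]/, '')[0, length]
  (0...length).reduce(0) do |score, i|
    # Positions past the end of the word are right-padding (digit 0).
    digit = i < letters.length ? letters[i].ord - 'a'.ord + 1 : 0
    score * BASE + digit
  end
end

sorted = %w[b ab a aa].sort_by { |word| autocomplete_score(word) }
# sorted is ["a", "aa", "ab", "b"]
```

Making the padding the smallest digit is what lets a shorter word sort ahead of every longer word that starts with it.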

This is why we need a unique description for each value - the description becomes the member of the sorted set, and members of a sorted set have to be unique. For example, say we want to index the value 'foo' and associate it with a post id: 123. We run our autocomplete value 'foo' through the function above, yielding some (big) integer. We take that integer as the score, and for the member we store a tuple ('foo', 123). This allows us to build an index that maps multiple values to multiple ids.

When searching for a value, the process is very similar. Let's say the user starts by typing an 'f', since he's searching for the 'foo' value we stored above. The system will first compute the value for 'f', which given our function will be less than the value for 'foo'. Next the system will increment the last character of the search string and compute the value of the result. In this case, the system will compute the value for 'g'. Again, given our function, this is guaranteed to be greater than the value we're looking for.

Now we simply get from Redis any set members with a score between those two values (leaving out the upper bound itself, since that score belongs to 'g'). Limiting the number of results is a good idea, since in most cases users are probably only interested in the first ten results anyway. They can always type more characters to refine the autocomplete.
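Here is a rough sketch of the index and the search together. To keep it runnable without a server, TinySortedSet is a hypothetical in-memory stand-in for a single Redis sorted set; score_for and prefix_range are illustrative names as well, input is assumed to be lowercase a-z, and the members are stored as JSON strings only because that is a convenient way to serialise a tuple.

```ruby
require 'json'

BASE = 27         # padding digit plus 'a'..'z' (assumed alphabet)
MAX_LENGTH = 10   # assumed index width

# Order-preserving score: padding is digit 0, 'a'..'z' are 1..26.
def score_for(text)
  (0...MAX_LENGTH).reduce(0) do |score, i|
    digit = i < text.length ? text[i].ord - 'a'.ord + 1 : 0
    score * BASE + digit
  end
end

# Half-open score range [low, high) covering every word starting with prefix.
# Adding one unit at the last typed position equals "incrementing the last
# character" (for 'f' the high bound is the score of 'g') and also copes with 'z'.
def prefix_range(prefix)
  prefix = prefix[0, MAX_LENGTH]
  low = score_for(prefix)
  [low, low + BASE**(MAX_LENGTH - prefix.length)]
end

# Hypothetical in-memory stand-in for one Redis sorted set.
class TinySortedSet
  def initialize
    @scores = {} # member => score; members are unique, as in Redis
  end

  # Plays the role of ZADD.
  def add(score, member)
    @scores[member] = score
  end

  # Plays the role of ZRANGEBYSCORE with an exclusive upper bound and a LIMIT.
  def range_by_score(low, high, limit)
    @scores.select { |_, score| score >= low && score < high }
           .sort_by { |member, score| [score, member] }
           .first(limit)
           .map(&:first)
  end
end

index = TinySortedSet.new
[['foo', 123], ['foo', 456], ['food', 7], ['bar', 9], ['g', 1]].each do |word, id|
  index.add(score_for(word), JSON.generate([word, id]))
end

low, high = prefix_range('f')
matches = index.range_by_score(low, high, 10).map { |member| JSON.parse(member) }
# matches is [["foo", 123], ["foo", 456], ["food", 7]]
```

Against a real server the two methods on the stand-in would become one ZADD per indexed word and one ZRANGEBYSCORE per keystroke.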

So what is the performance? In Redis, ZRANGEBYSCORE is listed as O(log N + M) for N elements in the set and M elements returned, so with a fixed limit it is effectively O(log N). ZADD is O(log N) for each element added as well. For a searching algorithm, that's about as fast as you're going to be able to get.

There are a couple of weaknesses to this scheme that keep it from being perfect for everyone. This scheme won't work if you need to weight your terms. That's a good use case for something like soulmate, which supports that.

Also, this algorithm is length-limited. Given a 64-bit (signed) long value as the score and a 37-character set, by my calculations you'll overflow the score if you go over 12 characters. For my use case, that's a lot of characters. I've got my length set to 10 characters, and I can't imagine I'll exceed that.
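The length limit can be checked with a few lines of arithmetic. This is only a sketch: max_length is a made-up helper, and the second limit is my own addition, based on the fact that Redis stores sorted-set scores as double-precision floats, which hold integers exactly only up to 2**53.

```ruby
ALPHABET_SIZE = 37         # character-set size quoted in the post
INT64_MAX = 2**63 - 1      # largest signed 64-bit integer
DOUBLE_EXACT_MAX = 2**53   # doubles represent every integer up to here exactly

# Longest word length whose largest possible score (base**length - 1)
# still fits under the given limit.
def max_length(base, limit)
  length = 0
  length += 1 while base**(length + 1) - 1 <= limit
  length
end

int64_limit = max_length(ALPHABET_SIZE, INT64_MAX)          # 12
double_limit = max_length(ALPHABET_SIZE, DOUBLE_EXACT_MAX)  # 10
```

So 12 characters is the ceiling for a signed 64-bit integer, and a length of 10 also stays inside the range where a double-precision score is still exact.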

Monday, December 9, 2013

Java

One: "So we've got this strongly and statically typed, compiled language here. We call it 'Java'."

Two: "Sounds neat! How do I configure components in it?"

One: "We prefer that you do all of your configuration via deeply nested, fragile XML that is not validated in any way."

Two: "Why don't we just configure the components in ... code?"

One: "Because then you can't change the configuration after compile time."

Two: "We need to dynamically change which factory class creates the database pool?"

One: "JUST GO TYPE! MANAGEMENT LIKES TO HEAR LOTS OF TYPING!"

Two: ...

Wednesday, December 4, 2013

Next: Scala

I just finished reading this great article about Play / Scala and async.

http://engineering.linkedin.com/play/play-framework-async-io-without-thread-pool-and-callback-hell

For most of the applications I build, I don't need the sort of performance that requires an async framework. And from what I've seen of node.js, I'm not going to mess with it unless I need something that fast.

But the things in that article are pretty. Those comprehensions are nice.

As much as I love Ruby, there are a few things that it doesn't do well. It is designed for code writing speed, not execution speed. And it does lack some of the safety features of a compiled language.

After I finish with Twitarr, it'll be time to start messing with some Scala. And probably Play. Looks like fun.

Monday, December 2, 2013

Ember.js views

In the view code (i.e. the code that extends Ember.View) you have to prefix any controller property accesses with 'controller.' - i.e. if you're getting the foo property of the controller, you have to use @get('controller.foo') - view properties you can get without any prefix, as one would expect.

This kind of makes sense, although I'd point out that there are many other places where Ember automatically looks to parent elements for data.

What confuses me is that in the template that is used by the view you have to prefix view accesses with 'view.' - controller properties are bound without a prefix though.

Yes, the template is EXACTLY opposite the view code.

...?

Friday, September 27, 2013

Thread safety

(I wrote this some days ago but didn't post it. I realized that it reinforced my Ruby camps post, so I'm putting it up now.)

A few days ago I found this great online book by Avdi Grimm (author of Ruby Tapas) named Objects on Rails. The link goes to the free (as in beer) version of the book, but I highly recommend spending the five bucks and supporting the author. (Sadly the downloadable version does not contain an HTML version, which would be really useful for referencing)

I spent a couple of days deep-diving into the book. Mostly it reinforces a bunch of patterns that I've been using in my own Rails projects, like the Draper gem. (Avdi uses exhibits, which are very similar to what Draper provides as discussed by the authors here)

Something was bothering me about it though. As I started implementing some of his patterns in my own code, I realized what the problem was. The top-level blog class is not thread-safe.

This is much less of a problem in MRI, where the global interpreter lock lets only one thread run Ruby code at a time (MRI has used system threads since 1.9; green threads were a 1.8 thing). However, in JRuby, where threads really do run in parallel, using a single global object like this has the potential for race conditions. It's very unlikely that it could cause a problem in this little blog app - there's very little state in the Blog class and it's mostly initialized when the class is initialized.

But it would be easy for a new developer to see this class and add something like a caching layer. Any mutable state on the classes is going to be a problem, and it's unlikely to be caught on a development system, either.
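To make the risk concrete, here is a small sketch of the kind of caching layer someone might bolt onto a shared object. PostCache is a hypothetical class, not code from the book or from my app. The unsafe method does a check-then-write that two threads can interleave; the safer one wraps the same logic in a Mutex.

```ruby
# Hypothetical cache living on a single object shared by every request thread.
class PostCache
  def initialize
    @entries = {}
    @lock = Mutex.new
  end

  # Unsafe: two threads can both see a missing key, both run the block,
  # and both write to the shared hash.
  def fetch_unsafe(key)
    @entries[key] ||= yield
  end

  # Safer: the whole check-then-write happens while holding the lock.
  def fetch(key)
    @lock.synchronize { @entries[key] ||= yield }
  end
end

cache = PostCache.new
loads = 0

threads = 8.times.map do
  Thread.new do
    100.times do |i|
      cache.fetch(i % 10) do
        loads += 1          # runs while the cache lock is held
        "post #{i % 10}"
      end
    end
  end
end
threads.each(&:join)
# loads is 10: each of the ten keys was loaded exactly once
```

Swapping fetch for fetch_unsafe would probably still pass on a development machine most of the time, which is exactly why this sort of bug slips through.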

I've ended up refactoring the top-level blog class out of my app. Instead of leveraging a root object to facilitate unit testing, I'm using DCI patterns to help. It's working well with Redis objects, which are very similar to basic Ruby types. I'll post more about that later.