Saturday, May 24, 2014
I went back to Rails for my latest project (a personal/family photo store) and have been having quite a bit of fun with it.
The project had reached a bit of a crossroads: the basic proof of concept was in, working, and actually functional. I have an idea of what the app should look like, and I've used it enough to know a few things that work well and a few things that won't work the way I had initially thought they would.
Basically, I had a prototype. And we all know the rule for a prototype: throw it away. But that means I need to know what to replace it with.
While my last project with Ember.js was a success, it performed poorly on mobile devices. This project, however, is not going to be mobile-specific, and I can have a basic (JS-lite) front end with minimal functionality for mobile. (Basically, I want to be able to upload photos from my phone, but I don't need the photo management tools in the mobile version.)
(I hear the word "mobile" in my head with a British accent. It makes me feel proper.)
Another developer I've been working with recently has been advocating vanilla web development with Turbolinks for speed. I think that could work well for a certain class of application - but that's not the application I build.
Sunday, April 6, 2014
I wish I could Scala
I've decided on my next project. I'm going to build a website that my wife and I can upload our photos to; it will store the photos along with a database of metadata about them. Basically Instagram, but without the internet. It should be a pretty simple thing to hack together - I have half the code already written in Twit-arr. It will allow us to take our photos from our various devices (two phones and four+ computers) and put them all into a single spot that I can back up.
Since I've been wanting to use Scala for something and have been playing with it here and there, I decided to fire up Typesafe Activator and start a Play project.
Unfortunately I now realize that plan was a mistake.
Thursday, February 6, 2014
Redis autocomplete
Autocomplete is a very useful tool in a webapp, so when I found a use case for it, I began looking for autocomplete solutions in Redis.
There are some very neat posts about how to build one, and there's also a gem: seatgeek/soulmate. It's very nice and feature-complete, but it's overkill for what I'm trying to do - it's practically a data store in itself. And it generates lots and lots of keys, which I don't really want - I'd like to keep all of my data in the same store, and the key explosion would make that very difficult.
If you read over the various options for Redis autocomplete, you'll find quite a few descriptions of indexing via hash keys, and that approach is pretty straightforward if you think about it. But I was convinced that there was a way of doing autocomplete that relied on sorted sets.
And eventually I figured it out. There are two basic requirements to make this work. First, you have to have a unique description for every key you want to index. Second, you have to invent a mapping from each autocomplete word to an integer such that the integers preserve the strings' lexicographic order: a < aa < ab < b ... and so forth.
The mapping function is fun, because it's basic computer science. Instead of a hexadecimal number system with sixteen values per digit, I'm building a number system with 27 values per digit: one for each of the 26 letters, plus a padding value. Then I right-pad (or truncate) each value I want to index so that they're all the same length. This provides the property listed above, where a < aa < ab < b - and now I can store those values as scores in a sorted set.
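Here's a minimal sketch of that mapping in Ruby (the function name, the base-27 digits, and the 10-character cap are illustrative choices, not the exact code I'm running):

```ruby
# Map a word onto a base-27 integer: 'a'..'z' become digits 1..26, and 0 is
# the right-padding digit. Padding every word to MAX_LEN digits is what makes
# 'a' < 'aa' < 'ab' < 'b' hold numerically.
MAX_LEN = 10

def autocomplete_score(word)
  digits = word.downcase[0, MAX_LEN].chars.map { |ch| ch.ord - 'a'.ord + 1 }
  digits += [0] * (MAX_LEN - digits.size)         # right-pad (truncation handled above)
  digits.reduce(0) { |score, d| score * 27 + d }  # read the digits as base 27
end

autocomplete_score('a')  < autocomplete_score('aa') # => true
autocomplete_score('ab') < autocomplete_score('b')  # => true
```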
This is why we need a unique description for each value - the description becomes the member of the sorted set. Take an example: we want to index the value 'foo' and associate it with a post id, 123. We run our autocomplete value 'foo' through the function above, yielding some (big) integer. We take that integer as the score, and for the member we store the tuple ('foo', 123). This allows us to build an index that maps multiple values to multiple ids.
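The write side is then a single ZADD per word. Something like this sketch, using the redis-rb gem (the key name 'autocomplete:index' and the 'word:id' member encoding are my own stand-ins for the tuple):

```ruby
require 'redis'

redis = Redis.new

# Store the word under its computed score. Combining the word with the id
# keeps each member unique, so one word can point at many posts.
def index_word(redis, word, id)
  redis.zadd('autocomplete:index', autocomplete_score(word), "#{word}:#{id}")
end

index_word(redis, 'foo', 123)
```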
Searching for a value is a very similar process. Let's say the user starts by typing an 'f', since he's searching for the 'foo' value we stored above. The system first computes the score for 'f', which, given our function, will be less than the score for 'foo'. Next it computes the score for the search string with its last character incremented - in this case, 'g'. Again, given our function, this is guaranteed to be greater than the score of the value we're looking for.
Now we simply ask Redis for any set members with a score between those two values. Limiting the number of results is a good idea, but in most cases users are only interested in the first ten results anyway. They can always type more characters to refine the autocomplete.
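Here's what the lookup might look like, continuing the sketch above (incrementing the last character's code point gives the exclusive upper bound; 'f' becomes 'g'):

```ruby
# Return up to `limit` members whose word starts with `prefix`.
def autocomplete(redis, prefix, limit = 10)
  low = autocomplete_score(prefix)
  # 'f' -> 'g': everything starting with the prefix scores below this bound.
  bumped = prefix[0..-2] + (prefix[-1].ord + 1).chr
  high   = autocomplete_score(bumped)
  # The "(" prefix makes the upper bound exclusive, so 'g...' itself is excluded.
  redis.zrangebyscore('autocomplete:index', low, "(#{high}", limit: [0, limit])
end

autocomplete(redis, 'f') # => ["foo:123"]
```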
So what is the performance? In Redis, ZRANGEBYSCORE is documented as O(log(N) + M) for N elements in the set and M results returned, so with a small fixed limit it's effectively O(log N). ZADD is O(log N) as well. For a search algorithm, that's about as fast as you're going to get.
There are a couple of weaknesses that keep this scheme from being perfect for everyone. It won't work if you need to weight your terms - that's a good use case for something like soulmate, which supports weighting.
Also, this algorithm is length-limited. Redis sorted-set scores are double-precision floats, which only represent integers exactly up to 2^53, so with a 37-character set, by my calculations you'll lose exactness if you go over 10 characters. For my use case, that's a lot of characters. I've got my length set to 10, and I can't imagine I'll exceed that.
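A quick irb sanity check of that bound (taking the 37-character set from above and 2**53 as the largest span of exactly-representable integers in a double):

```ruby
max_exact = 2**53    # doubles hold integers exactly up to here
37**10 <= max_exact  # => true  (~4.8e15, still exact)
37**11 <= max_exact  # => false (~1.8e17, precision is gone)
```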