Saturday, November 24, 2012

Ruby Kaggle

I promised I'd put up my Ruby code from the Kaggle tutorial, and then I never did. Bad me.

So here's what I ended up with for the normalization code - my Ruby code had been way off from what it needed to be:
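# Median of the known ages (the name says "average", but the value is a median)
average_age = all_data.select { |x| !x.age.nil? }.map { |x| x.age }.median
# Most common port: count passengers per port, then pick the port with the largest count
common_embarkation = all_data.select { |x| !x.port_of_embarkation.nil? }.group_by { |x| x.port_of_embarkation }.each_with_object({}) { |(port, list), totals| totals[port] = list.size }.max_by { |_port, count| count }.first
# Median fare for each passenger class
average_class_prices = all_data.group_by { |x| x.pclass }.each_with_object({}) { |(pclass, list), totals| totals[pclass] = list.map { |x| x.fare }.median }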

# Fill the gaps in the training set with the values computed above
training_data.each { |x| x.age = average_age if x.age.nil? }
training_data.each { |x| x.port_of_embarkation = common_embarkation if x.port_of_embarkation.nil? }
training_data.each { |x| x.fare = average_class_prices[x.pclass] if x.fare == 0 }
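
To make the fill-in steps easier to try out, here is a minimal standalone sketch of the same idea. The `Passenger` struct, the `median_of` helper and the sample rows are all made up for illustration (the real passenger class isn't shown in this post), and the port is picked with `max_by` on the count, since calling plain `max` on a hash compares the `[key, value]` pairs starting with the key.

```ruby
# Hypothetical stand-in for the passenger objects; only the field names
# are taken from the code above.
Passenger = Struct.new(:age, :port_of_embarkation, :pclass, :fare)

# Local median helper for this sketch: averages the two middle values
# when the number of values is even.
def median_of(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.even? ? (sorted[mid - 1] + sorted[mid]) / 2.0 : sorted[mid]
end

# Made-up sample rows, purely for illustration.
passengers = [
  Passenger.new(22, "S", 3, 7.25),
  Passenger.new(38, "C", 1, 71.0),
  Passenger.new(nil, "S", 3, 8.05),
  Passenger.new(35, nil, 1, 0),
  Passenger.new(30, "S", 1, 53.0)
]

# Median of the ages that are actually present.
fill_age = median_of(passengers.map(&:age).compact)

# Count passengers per known port, then take the port with the highest count.
port_counts = Hash.new(0)
passengers.map(&:port_of_embarkation).compact.each { |port| port_counts[port] += 1 }
fill_port = port_counts.max_by { |_port, count| count }.first

# Median fare per class (zero fares are included, as in the code above).
fill_fares = passengers.group_by(&:pclass).each_with_object({}) do |(pclass, list), medians|
  medians[pclass] = median_of(list.map(&:fare))
end

# Replace missing ages and ports, and zero fares, with the computed values.
passengers.each do |passenger|
  passenger.age ||= fill_age
  passenger.port_of_embarkation ||= fill_port
  passenger.fare = fill_fares[passenger.pclass] if passenger.fare.zero?
end
```

With these sample rows, the missing age is filled from the four known ages, the missing port from the three "S" passengers, and the zero first-class fare from the first-class median.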

I also opened up Enumerable and added a "median" method:
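module Enumerable
  def median
    sorted = sort            # Enumerable#sort always returns an Array
    mid = sorted.size / 2
    if sorted.size.even?
      # Even count: average the two middle values (indices mid - 1 and mid)
      (sorted[mid - 1] + sorted[mid]) / 2.0
    else
      sorted[mid]
    end
  end
end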
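
For an even number of values, the start index matters when the two middle values are sliced out with `Array#[start, length]`. A small sketch with made-up numbers (the variable names are just for illustration):

```ruby
sorted = [40, 10, 30, 20].sort # four values, sorted ascending

# Starting at length / 2 takes the upper two values, not the middle two.
upper_pair = sorted[sorted.length / 2, 2]

# Starting one index earlier takes the two values around the midpoint.
middle_pair = sorted[sorted.length / 2 - 1, 2]

# The median of an even-sized list is the mean of that middle pair.
median = middle_pair.reduce(:+) / middle_pair.size.to_f
```

Here `upper_pair` holds 30 and 40, while `middle_pair` holds 20 and 30, so the median works out to 25.0.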

Maybe not the prettiest code (the common embarkation is icky), but it makes a whole lot more sense than the Python version, at least to me.

Update: found and fixed a bug.
