So here's what I ended up with for the normalization code. My Ruby code was way off from what it needed to be.
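    # Median age over every passenger with a known age
    # (the variable says "average", but it holds the median).
    average_age = all_data.select { |x| !x.age.nil? }.map { |x| x.age }.median

    # Most frequent port of embarkation among passengers that have one.
    # max_by compares the counts; a bare max would compare the port names instead.
    common_embarkation = all_data.select { |x| !x.port_of_embarkation.nil? }
                                 .group_by { |x| x.port_of_embarkation }
                                 .each_with_object({}) { |(port, list), totals| totals[port] = list.size }
                                 .max_by { |_port, count| count }
                                 .first

    # Median fare for each passenger class.
    average_class_prices = all_data.group_by { |x| x.pclass }
                                   .each_with_object({}) { |(pclass, list), totals| totals[pclass] = list.map { |x| x.fare }.median }

    # Fill in the gaps in the training data.
    training_data.each { |x| x.age = average_age if x.age.nil? }
    training_data.each { |x| x.port_of_embarkation = common_embarkation if x.port_of_embarkation.nil? }
    training_data.each { |x| x.fare = average_class_prices[x.pclass] if x.fare == 0 }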
I also opened up Enumerable and added a "median" method:
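    module Enumerable
      def median
        sorted = sort
        mid = sorted.length / 2
        if sorted.length.even?
          # Even count: average the two middle values.
          (sorted[mid - 1] + sorted[mid]) / 2.0
        else
          # Odd count: the middle value itself.
          sorted[mid]
        end
      end
    end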
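If reopening Enumerable feels too invasive, the same median can live in a plain helper instead. This is only a sketch: the `Stats` module and its `median` method are names made up for illustration, not part of Ruby or of the code above, and returning nil for an empty collection is my own choice.

```ruby
# Hypothetical helper module; the name Stats is an assumption for this sketch.
module Stats
  # Returns the median of a collection of numbers, or nil when it is empty.
  def self.median(values)
    sorted = values.sort
    return nil if sorted.empty?

    mid = sorted.length / 2
    if sorted.length.even?
      # Even count: average the two middle values.
      (sorted[mid - 1] + sorted[mid]) / 2.0
    else
      # Odd count: the middle value itself.
      sorted[mid]
    end
  end
end

puts Stats.median([22, 38, 26])      # 26
puts Stats.median([22, 38, 26, 35])  # 30.5
```

The trade-off is that call sites read `Stats.median(ages)` rather than `ages.median`, but nothing else in the program picks up a new method by surprise.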
Maybe not the prettiest code (the common embarkation bit is icky), but it makes a whole lot more sense than the Python version, at least to me.
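The common embarkation step can probably be tidied up. Here is a rough sketch, assuming Ruby 2.7 or newer (for `Enumerable#tally`). The `Passenger` struct and the sample ports are stand-ins I made up, since the real record class is not shown in this post.

```ruby
# Stand-in record type; the real data objects are not shown in this post.
Passenger = Struct.new(:port_of_embarkation)

# Made-up sample data, including one record with a missing port.
passengers = %w[S C S Q S C].map { |port| Passenger.new(port) }
passengers << Passenger.new(nil)

# Count how often each port appears, skipping missing values.
port_counts = passengers.map(&:port_of_embarkation).compact.tally

# Pick the port with the highest count.
common_embarkation = port_counts.max_by { |_port, count| count }.first

puts common_embarkation # S
```

`compact` drops the nils up front, and `tally` replaces the `group_by` plus `each_with_object` pair, so the whole thing reads as "count the ports, take the biggest".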
Update: found and fixed a bug.