Tuesday, January 29, 2013

What is this statistical analysis you speak of?

I've spent the last month or so reading everything I can about data science. It's been fun and interesting, but I've come to the conclusion that I don't know a damn thing about statistical analysis. I feel that's going to be a problem going forward.

Code I understand. Heck, I've been writing code for most of my life, and particularly code that needed to be far more robust than most of what I've seen in the field. Being able to hack an algorithm together is going to be my strong point.

Vocabulary is one of the most important things you can learn about a new field. Being able to communicate effectively about statistics is not a skill I have. I took a statistics class when I was in college, but my retention after fifteen years of not using it is pretty poor. I have a vague recollection of a few probability concepts, but that's about it. I'm almost certain that I didn't even learn anything about analysis.


I found this blog post from Andy Mueller that describes his standard approach for new data - I don't have context for half the words on there. But it provides me a great place to start looking for things to learn.

I've also grabbed myself a couple of statistics books, and want to try implementing a few of these algorithms. I've found that I understand concepts much better when I've implemented them in code.
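As a rough sketch of what implementing the basics by hand might look like, here are the mean, sample variance, and sample standard deviation written from scratch. The choice of Python, the function names, and the toy data are all assumptions made for illustration; none of it comes from the books or the blog post mentioned above.

```python
import math


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def sample_std(values):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Made-up numbers, purely for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

if __name__ == "__main__":
    print("mean:", mean(data))
    print("sample variance:", sample_variance(data))
    print("sample std:", sample_std(data))
```

Dividing by n - 1 rather than n is the usual convention when the data is a sample rather than the whole population. Python's built-in `statistics` module does the same in `statistics.variance`, which makes it a handy way to check a hand-rolled version.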

Anyone have any other ideas for learning statistics?

Thursday, January 17, 2013

Wall Street Journal

I saw a link to this today and wasn't entirely certain if it was a joke or not.

I do finally understand why Romney thinks that middle income is $250K a year - he's been reading the Wall Street Journal. Here's a link to a more accurate graph of income in the US - provided by the US Census Bureau. Note the part that shows the over $250K income as being the TOP TWO PERCENT OF HOUSEHOLDS.

The easy target from the WSJ article is the single parent (with the picture showing a mother) making $260K a year. I've known many single parents throughout my life, and NONE of them made anywhere near $260K a year. Most of them were in borderline poverty. Admittedly, I know very few people making $260K a year or over anyway.

Do a search for "single mother income" and have a look at the articles like this, or this. "Half of single mother families have an annual income less than $25,000." - "Two fifths of single mother families are poor, triple the poverty rate for the rest of the population." - "Three quarters of homeless families are single mother families."

What part of that sounds like a family making $260K a year?

Oh, and according to the article by The Nation, 80% of single parents are mothers. So the WSJ got the PICTURE right, even if they had the income off by an order of magnitude.

I live in one of the highest-income areas of the US, and I think I make a large amount of money, yet it's not even half of the incomes from that WSJ article. Let me run that by you again - I consider myself wealthy (which the Census graph supports) and I make LESS THAN HALF of what the WSJ implies is AVERAGE.

Granted, nobody making $26K a year reads the WSJ. It's probably because they are too busy working multiple jobs to feed their families, but I'd like to think that those people are too smart to bother reading that load of crap.

...

There may be a few single parents out there actually making $260K a year, and I'm glad you have the income to be able to provide for your children. Your taxes are going up about $280 a month. My suggestion is to lose the Lexus and get a Honda instead - they're much cheaper and still really good cars.

...

Side note: why do the brunette / redhead couple have four blonde children? Guy needs to ask some questions, I think.

Sunday, January 13, 2013

Gun Control

I'm going to try to bring this around to statistics, although I'm not sure how successful I'll be. This is an incredibly divisive issue - fortunately nobody reads my blog so this is really just getting my thoughts down.

First, a disclaimer: I know how to shoot, and enjoy shooting. I don't own any guns. My father owned several pistols when I was growing up, and I learned gun safety from him. I was in the army for five years, and in two trips to Iraq I never had to fire my M16 at a target that wasn't paper, although I once had to chamber a round to get an Iraqi to stop advancing. I would have shot at him, and I'm incredibly glad that I didn't have to.

Thursday, January 10, 2013

Hatin' on Forbes

Interesting attack on data science in Forbes here. Short rebuttal here.

While the article by Ray Rivera is mostly stupid, it actually has a point that I agree with - that data science shouldn't be considered the oracle at Delphi that has all the answers. Granted, that's true of any technology or set of buzzwords.

Heck, if you want to be sold snake oil - try asking a consultant what sort of "analytics" he would recommend. (Interesting note: Ray appears to work for SAP. SAP does analytics consulting. Correlation!)

Magic voodoo snake oil algorithms are WHY more people should be doing data science. If everyone could examine data and competently discuss the results, nobody would be able to sell that kind of crap. The more people who can think logically, the fewer shady consultants we'll have.

Tuesday, January 1, 2013

ggplot

Next task: integrate ggplot into workflow!

...

That is all.

...

Seriously - that's my next data science goal. That's all. Baby steps.