Big Data's Predictive Blind Spots
Statistician Nate Silver isn't famous because he's a mathematical genius. (Although, he is.) Silver's well-known because he knows how to apply his craft to the real world. The country's most popular data cruncher is known for his spot-on election predictions -- he accurately called the winner in all 50 states of November's presidential election; in 2008, he went 49 for 50 -- but Silver's big data analytics have also translated to the worlds of sports (March Madness, Major League Baseball), gambling (Silver will play in his third World Series of Poker event this summer), and even dating. Silver once wrote for the baseball website Baseball Prospectus but has since expanded his offerings; he is now a published author, a political pundit, and the creator of his very own New York Times blog, FiveThirtyEight.

Silver was in San Francisco Thursday to talk analytics as the keynote speaker at Lithium Technologies' annual LiNC Conference. Fortune sat down with him to talk about big data's limitations, its role in the stock market, how it applies to dating, and even his predictions for the 2016 presidential election. A lightly edited transcript follows.

Fortune: I'm sure you get people coming up to you all the time to discuss how you helped them win their NCAA March Madness pool.

Nate Silver: I went against my bracket in my own pool because I thought other people would be using it. I would have gotten second place if I had taken my own advice.

Maybe take a small royalty fee next year?

Absolutely. Or we need to put out a fake bracket [first], and then put out a real one [later]. Oops, there was a coding error! [Laughs]

You started out using stats to better understand and predict success in baseball -- why did you move towards politics?

Of course it's easy to say in retrospect why you did certain things instead of what rational motivations were pushing you in that direction in real time, but I think part was that I was involved working for Baseball Prospectus for about five years -- 2003 to 2008 -- and you saw a great amount of progress in the baseball industry during that time. The start of that era was the era described in [the book-turned-movie] Moneyball where you really had a lot of tension between stat-heads and traditionalists. People were terrified that nerds would come over and take their jobs. And really now that's been totally reversed, where it's not just that you have some stat-head that you've hired and have locked into a closet somewhere, but that every team -- almost every team, there are some exceptions -- understands analytics at different levels of the organization.

But seeing how quickly that progressed in a span of just a few years, and how behind politics coverage seemed to be where it's all about the narrative -- there's a lot of bullshit basically both in the news coverage of politics and from politicians themselves -- so it seemed like it was ripe to apply some very basic analytics tools to the coverage of elections.