In an era where almost everything is touted as “big data,” how do we define what we actually mean by the term, and what precisely counts as a “big data” analysis? Does merely keyword searching a multi-petabyte dataset count? Does using a date filter to extract a few million tweets from the full trillion-tweet archive count as “big data”?
Back in 2013, I used to open my data science talks by saying that the previous day I had run several hundred analyses over a 100-petabyte database holding more than 30 trillion rows, incorporating more than 200 indicators into the analysis. When I asked the audience whether this counted as a “big data” analysis, the assent was typically unanimous.