Every day we hear more about Big Data and Data Science, buzzwords that are widely claimed to be changing the world as we know it. But where does all that data really come from, and how can you access it yourself and extract meaningful insights?
According to Eric Schmidt, every two days we create as much information as we did from the dawn of civilization up until 2003. During a July 2012 earnings call, Mark Zuckerberg noted that one billion pieces of content are shared via Facebook’s Open Graph daily. Look more closely at a single Facebook post and you find that it can carry many separate pieces of information: the location from which it was authored, the time it was authored, the ID of the user who authored it, hashtags, links, photos, users mentioned in the post, users tagged in the photos, and, oh yeah, the actual words in the message. Insights can be derived from each of these types of information.
From social media to healthcare and finance, we are creating new kinds of information that allow the sophisticated analyst to learn how people interact, what they like, what they don’t like, where they are at different times of the day, what their social circles are like, what events take place, and on and on. All this new information gives us insight into what’s really happening in the world, holistically, in real time, and on an unprecedented scale.
The exciting part is that accessing this data is becoming increasingly simple. It now takes no more than 30 minutes for a novice to learn how to pull this information from the social media APIs of companies like Twitter and LinkedIn, the APIs of consumer businesses such as Best Buy and Expedia, and the plethora of APIs available through the government’s open data initiative at Data.gov.
But how can individuals with no background in computer science or math use all this information to make their lives richer? The answer is data science. Free packages are available online that allow regular people to do amazing things. For example, you can draw a network graph in R, an open source programming language, to visualize the relationships among people, companies, or any other set of interconnected entities. A PageRank package, which measures centrality, can quickly tell you who the most important people or elements are in a network. That can help you decide how to market your products most effectively, how to expand your influence, and who you should send Christmas cards to this year. Businesses can manage their brand image and customer service quality not just through controlled surveys but also through real-time consumer feedback posted on social networks, the most honest and important feedback of all. In fact, IBM has recently come out with an enterprise-grade product that does exactly that. But individuals can do it themselves with free and widely available R code.
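To make the PageRank idea a little more concrete, here is a minimal sketch of how such a centrality score can be computed. It is written in Python rather than R, it uses a plain power iteration rather than any particular package, and the people and "who links to whom" edges are invented purely for illustration. The function name, the damping value of 0.85, and the stopping tolerance are all assumptions, so treat this as one way to picture the technique, not as the method any specific tool uses.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Estimate PageRank scores for a directed graph by power iteration.

    edges: list of (source, target) pairs, e.g. "source follows target".
    Returns a dict mapping each node to a score; scores sum to about 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform guess

    for _ in range(iterations):
        # Nodes with no outgoing links spread their score evenly over everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every other node splits its score equally among the nodes it points to.
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share

        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:  # stop early once the scores have settled
            break
    return rank


# Hypothetical network: three people point to carol, and carol points to alice.
edges = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
]

ranks = pagerank(edges)
for person, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{person}: {score:.3f}")
```

In this toy network carol comes out on top because the most people point to her, and alice ranks second because the most central person points to her. That second effect is what separates PageRank from simply counting connections.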
The tools are out there, and the insights they can provide are the new frontier. The educated professional of the 21st century will have to become acquainted with basic data science tools, much as any business analyst today must be fluent in Excel.
While we don’t all have to become experts, we should all learn data science, at least a little bit.
Dmitri Adler is the Founder of Data Society, a community dedicated to providing Data Science education and networking opportunities to working professionals.