Data scientists are in short supply! Or at least that’s a headline you can find nearly everywhere. Some people are trying desperately to hire them, while others are working hard to jump into the perceived gap and become data scientists themselves. Meanwhile, there’s plenty of skepticism over whether the role is real or simply a product of the hype.
Inside, not outside
Matt Asay, writing in Data Informed, argues that companies can’t look outside for data scientists, because outsiders would lack the business context needed to understand what they’re seeing in enterprise data.
Part of the problem lies in the very name “Big Data.” Enterprises become so intent on the sheer volume of data being collected that they lose sight of the much more essential act of intelligently querying the data for insights. In other words, the goal of the data scientist isn’t to ask bigger questions, but rather to ask better questions.
In Asay’s view, the best data scientists come from extending existing teams with people who already know the business, and generic ‘data science’ skills aren’t useful.
Asking a data scientist
Fabless Labs cofounder and CTO John West will participate in next week’s Big Data Workshop at InterOp in Las Vegas. He may be one of the most bona fide data scientists in the world. Over three decades, he has built and operated some of the most data-intensive systems around, most of them centered on consumer preference data. Asked whether data scientist is the right term, he said, “That’s a hard question. There are definitely people who need to examine data to see what information it has. In that sense, there definitely are data scientists.”
With that question out of the way, he was asked whether companies can simply hire data scientists on the open market, and he offered this caution:
“The trick with data science is that when you’re looking at something data appears to be telling you, you have to be able to think about whether the relationship you’re finding is coincidence or causal. It isn’t hard and fast and it takes a special person. When people hire computer programmers, it is much simpler to know what you’re getting because you can look at skills and experience. Data science requires applying mathematical skills in very creative ways and measuring talent plus creativity can be far more challenging.”
West says that pure mathematicians may be able to create great models, but if they can’t code those models, things break down very quickly. In his view, a great data scientist is a ‘code jockey’ as well as a math and statistics person, and such people are very hard to find.
West added, “Data science is distinct from other jobs in technology because it isn’t as mathematical as people think. Knowing the difference between what to act on and what not to takes some amount of knowledge about the business and how data behaves.”
The necessary skills
His advice for companies trying to exploit big data is to look for people with the following characteristics:
- Enough math background to build models, including a grasp of linear algebra
- Programming skills in data mining tools such as R or SAS
- Programming skills in Python or Ruby to mine data at massive scale
- ‘Data judgement’: the ability to look at surprising results and know what to do next
- Enough curiosity to keep running tests until an answer emerges
- Knowing what to do with data sets that are too small or not very clean
Where to look
West doesn’t have a definitive answer for where to find such people, but he advises companies to look for those who have done performance-oriented computing. He believes strongly that people with that background have tuned large databases, tweaked code and dealt with a lot of adversity in ways that make them great data scientists. “At least that’s a place to start,” West said.
Join John West and me next week at InterOp in Las Vegas, where we’ll be talking about the myths versus the realities of big data. We hope to see you there.