Sorry, you can’t simply hire a data scientist

Data scientists are in short supply! Or at least that’s a headline you can find nearly everywhere. There are people trying desperately to hire them and also people trying hard to jump into the perceived gap and become one. Meanwhile, there’s plenty of skepticism over whether the role is real or a function of all of the hype.

Inside, not outside

Matt Asay, writing in Data Informed, argues that there is no external source for data scientists because outsiders would lack the business context for understanding what they’re seeing in enterprise data.

Part of the problem lies in the very name “Big Data.” Enterprises become so intent on the sheer volume of data being collected that they lose sight of the much more essential act of intelligently querying the data for insights. In other words, the goal of the data scientist isn’t to ask bigger questions, but rather to ask better questions.

In Asay’s opinion, the best data scientists come from extending existing teams with people who are familiar with the business, and generic ‘data science’ skills aren’t useful.

Asking a data scientist 

Fabless Labs Cofounder and CTO John West will participate in next week’s Big Data Workshop at InterOp in Las Vegas. He may be the most bona fide data scientist in the world. His background includes three decades of building and operating some of the most data-intensive systems, mostly centered on consumer preference data. Asked if data scientist is the right term, he said, “That’s a hard question. There are definitely people who need to examine data to see what information it has. In that sense, there definitely are data scientists.”

With that question out of the way, he was asked if companies can simply hire data scientists from the marketplace, and he offered this caution:

“The trick with data science is that when you’re looking at something data appears to be telling you, you have to be able to think about whether the relationship you’re finding is coincidence or causal. It isn’t hard and fast and it takes a special person. When people hire computer programmers, it is much simpler to know what you’re getting because you can look at skills and experience. Data science requires applying mathematical skills in very creative ways, and measuring talent plus creativity can be far more challenging.”
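West’s point about coincidence can be made concrete with a small experiment. The sketch below is my own illustration, not something West described: it compares one series of pure noise against many other series of pure noise and reports the strongest correlation it stumbles on. The function names, the 12 data points, and the 1,000 candidates are all assumptions chosen for the demonstration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_chance_correlation(n_candidates=1000, n_points=12, seed=0):
    """Return the strongest correlation between one random series and
    n_candidates unrelated random series (hypothetical sizes)."""
    rng = random.Random(seed)
    # The "metric we care about": pure noise, so nothing can truly explain it.
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate "explanation" is also pure noise.
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    r = best_chance_correlation()
    print(f"strongest correlation found among pure noise: {r:+.2f}")
```

None of the series are related, yet searching enough of them against a short series tends to turn up at least one that looks like a strong relationship. Deciding whether a finding like that deserves action is the judgment West is describing.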

West says that pure mathematical people may be able to create great models, but if they can’t code those models, things break down very quickly. In his mind, a great data scientist is a ‘code jockey’ as well as a math and statistics person and these are very tricky people to find.

West also offered, “Data science is distinct from other jobs in technology because it isn’t as mathematical as people think. Knowing the difference of what to act on and what not to takes some amount of knowledge about the business and how data behaves.”

The necessary skills

His advice for companies trying to exploit big data? Look for people who have the following characteristics:

  • The minimal math background needed to build models, including a grasp of linear algebra
  • Programming skills in data mining technologies like R or SAS
  • Programming skills in Python or Ruby, to be able to mine at massive scale
  • ‘Data judgement’ to look at surprising results and know what to do next
  • Enough curiosity to keep running tests until an answer emerges
  • Knowledge of what to do with data sets that are too small or not very clean

Where to look

West doesn’t have a great answer for where to find this type of resource and advises companies to look for people who’ve done performance-oriented computing. West felt strongly that people who have that background have tuned large databases, tweaked code and dealt with a lot of adversity in ways that make them great data scientists. Said West, “At least that’s a place to start.”

Join both me and John West next week at InterOp in Las Vegas. We’ll be talking about the myths versus the realities of big data and we hope to see you there.

Categories: Data Analytics / Big Data

Author: Chris Taylor

Reimagining the way work is done through big data, analytics, and event processing. There's no end to what we can change and improve. I wear myself out...

4 Comments on “Sorry, you can’t simply hire a data scientist”

  1. May 2, 2013 at 3:43 pm #

    Chris, great article. Would you add ‘soft skills’ to the list of attributes a Data Scientist needs to have? A lot of the conversations we’re having at the moment revolve around the Data Scientist being able to integrate with different teams and see problems from their angle.

    • May 2, 2013 at 6:57 pm #

      Very good point and yes, I would add soft skills.

  2. Mike Zilton
    May 3, 2013 at 6:17 am #

    >Programming in Python or Ruby to be able to mine at massive scale

    What? I’m curious what your definition is for massive scale. For massive scale you’re normally talking about Java (Hadoop, Storm), C++ (if you work at Google) and, to a lesser degree, Scala (Spark). Hadoop streaming is notoriously tedious and difficult to debug.

  3. May 29, 2013 at 7:05 am #

    I am surprised you didn’t mention astrophysicists. That’s the first place to look if you are looking for a data scientist. They have been looking at GIANT data about the universe for hundreds of years. Surely they have a little to teach us about data. 🙂
