Recently, Big Analytics 2012 in NY hosted an interesting live
panel discussion. The title was quite catchy: "Do you believe in Santa? How
about Data Scientist?" Four guests were invited to the panel, including
Geoffrey Guerdat, Director of the Data Engineering Group at Gilt.
The guests and the audience had a heated discussion around the topic.
Personally, I enjoyed almost all of Geof's comments and opinions. He described
precisely what I have been observing and thinking this year. Below,
I highlight a few points that really resonated with me.
The Data Engineering group has 12 members, working in 3
main areas: Business Intelligence (BI), Data Engineering (DE) and Data Science (DS). This is
the mix I'd like to see and be involved with as well. Data engineers are extremely
important, in my opinion, sometimes even more important than both BI and DS analysts.
They are in charge of the plumbing (ETL) and of making sure everything works.
Organizations normally start their data team with BI, so BI has a longer history
and more "credibility and reputation" than the fancy DS. Having a
good BI sub-team ensures that the company has access to vital measurements
and makes smarter decisions. The DS sub-team is crucial as well. As Geof pointed
out, given the amount of data and the time it takes to process it, DS analysts
bridge the gap between BI and DE. They are also aware of more
techniques/tools than traditional BI analysts.
Ideally, I'd like to see the mix of those 3 functions change
over time. At the beginning, one might want more BI and DE people and far fewer DS
people (though not zero, because DS people need time to get trained on the
company's data). This mix focuses on sorting things out and serving
other departments inside the organization. As things become more stable, one would
bring in more DS people and fewer BI people, so that the team can work closely with a few
teams to solve harder problems.
Geof builds his team around two Data Scientists, one
strong in Statistics and one strong in Computer Science. They provide
guidance and act as quarterbacks. This solution sounds very clever to me. My
ideal team would include 1 director, who is very good at working inside an
organization (aka politics, as some call it), and 2 tech leads (1 stat and 1
CS). All other team members are recruited around this golden triangle. However,
I see many companies hire "managers" to manage data teams who have
never written a single line of code. They have a hard time identifying problems
and bottlenecks; they even have a hard time recognizing or accepting suggestions from
the data scientists inside the group. All of this is because they often don't know
the data as well as the people who work with it 40 hours a week.
I'd like data managers to have good
"listening" and "summarizing" skills, and data scientists
to have a curious nature and the ability to prove or implement their own
ideas.
- As more people become data scientists by clicking buttons inside "tools", is that good or bad?
Some companies in the tool business have actually made
it their goal to "let everyone be a data scientist". Please allow me to
frown at such claims. It would be very dangerous if everybody were a data scientist.
"Data people" are armed with more and more powerful data mining
weapons, so they are capable of doing more harm. Not only do they need to
understand the underlying models well enough to explain them to others, they also need to know
them well enough to recognize where to rebuild and optimize those models.
People need to be trained to understand the input data (meaning being familiar with
the business and knowing what's available) and the output data (to identify how
to act upon the insights).
- How can someone tie a dollar amount to data teams?
All the panelists agreed that it's hard to do so. Well, my response
is "don't even go there". At both of my jobs, the company
tried to tie revenue goals/gains to the data team. Both attempts failed. Whenever I see
companies try to put a price tag on data teams, it only tells me that they
haven't realized the value of their data, or truly recognized that the
guidance and advice a data team provides is helpful and important. They probably
still think of data as an accessory, something supplementary rather than necessary.
In my opinion, however, data should be treated as one of the organization's product
lines. It's as important as all the other products. With the amount of data we have
on our users, and the amount of insight we have about them, we have just
started the data journey.
- Tools data scientists use
Geof mentioned R/SQL/vi/emacs/shell/Java. These seem rather
primitive, but they are really powerful. I hate to see teams become
tool-dependent, which naturally creates bottlenecks, because it's hard for
others to maintain the system and make changes, particularly when the tool
experts are not around.
- What makes good data scientists?
"Moving the info around, reconstructing info in some
other way, and making use out of it ...", Geof summarized. This truly
describes what I have been working on for the past few months: consolidating data in a
way that is easy to consume and makes sense to both analysts and the entire
company. I believe that, without a solid foundation, nothing built on top should be
called a "success". So data plumbing and pumping is really the key to
everything.
One audience member raised an interesting point: the
shortage of data scientists is just a gap in education. Right now some schools are
teaching statistics to elementary school students. It's going to be fun to teach my
kindergartener "averages" and tell him that the "average American
kindergartener" actually doesn't exist!