Business analytics is a hot career right now. Here’s why I’m smiling and nodding as it goes whizzing past:
Analytics projects start as short-term experiments. This sounds awesome at first for consultants because you can parachute in, jump on the data, and give the business actionable information. You can use whatever tools you want right now – and boy, do you have choices.
There’s an analytics gold rush, which means many competing tools. To make money during a gold rush, you don’t mine gold – you sell mining supplies. Companies see the rush to analytics, so they’re slinging all kinds of tools out there to see what sticks. Think back through the last several PASS keynotes, and every one of them had a wildly different strategy for what data consumption meant. I don’t see the market settling on a winner this year or next year.
The real work in any analytics project: ETL. When the company points you at the data, you have to figure out how to make sense of it. 99% of all data is garbage. (See, I just made that statistic up, and it’s garbage too.) The big problem with big data is big cleansing – even the most basic tutorials devolve into data cleansing. I hate dealing with letters where I’m supposed to find numbers, or trying to figure out what Combo #3 meant four years ago.
ETL means boring meetings. Be prepared to spend hours around conference room tables clarifying the options for a particular field over time, what a birthdate of 00/00/1900 means in someone’s medical record, or why Mrs. Jones’ orders don’t really map up to Mrs. Jones. Booooring.
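To make that kind of cleansing concrete, here is a minimal Python sketch. It assumes a hypothetical feed where amounts arrive as text and unknown birthdates are stored as 00/00/1900; the function names and the placeholder convention are illustrative assumptions, not anything from a real system.

```python
from datetime import date, datetime
from typing import Optional

# Assumption: the source system writes this placeholder when the birthdate is unknown.
SENTINEL_BIRTHDATES = {"00/00/1900"}


def parse_birthdate(raw: str) -> Optional[date]:
    """Return a date, or None when the value is a placeholder or unparseable."""
    value = raw.strip()
    if not value or value in SENTINEL_BIRTHDATES:
        return None
    try:
        return datetime.strptime(value, "%m/%d/%Y").date()
    except ValueError:
        return None


def parse_amount(raw: str) -> Optional[float]:
    """Return a number, or None when letters show up where a number should be."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None
```

The code is the easy part. Deciding in a meeting what a None birthdate should mean downstream is the part that takes hours.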
Bottom line: that’s not for me. Short-lived ETL projects with disposable, immature tools run by easily replaceable staff – that’s where I click unsubscribe. Most of you probably aren’t interested in it either – if you’re reading this, odds are you’ve already got a decently paying job in databases or development or systems administration. Be careful what bets you place in your career – catch a ride going up the salary ladder, not going down.
So who’s doing it? People who love helping businesses get that “Eureka!” moment, and can afford to place some risky short-term bets. There’s a lot of money in this at the moment, and if you do it right, you can build a consulting business around it. Once the money’s coming in, you can subcontract out the boring parts like ETL and meetings, and spend your time learning about The Next Great Thing. That way, when today’s expensive analytics project becomes yesterday’s commoditized business intelligence project, you’re ready to surf the next high-value wave.
I would basically agree that 99% of analytics is ETL and another 30% is data quality. Now let’s wait to see what Richie has to say. I know he loves Informatica.
And at the same time the analytics boom is happening, there is a concurrent trend of not wanting to spend any money on ETL (one vendor even promises ‘no more ETL’) or data warehousing. In my personal experience with a few projects, this translates into the same expensive and confusing data cleansing tasks being redone for each project, and the project ending with clients asking why it takes $100k to build a report.
I believe there is some heavy Dunning-Kruger and projection happening here. Your belief that Analytics is fly-by-night or risky is born from your lack of interest in it. I know plenty of technically illiterate folks who think the same thing about all tech sectors.
Your belief that your boredom with Analytics is shared by everyone else is crazy bananas. Although, if your only experience with Analytics is through software vendors trying to cram their latest Excel add-on that automagically connects to BIG-CRM-SOFTWARE… and I just got too bored to continue typing that scenario out… there’s a point in there somewhere.
Furthermore, “Analytics” can’t be mostly ETL, mostly data cleansing, mostly keeping up with new technology, mostly deciding on one of the myriad tools, and mostly doing small disposable projects. ETL is, no doubt, a huge portion of Analytics. That’s why on most Analytics teams you will find that half of the folks are dedicated to ETL.
You’ll also find folks that are dedicated to understanding the processes and technology the business uses to get the data into their CRM/Sales/HR/Supply Chain/Identity/Manufacturing software as well as the data in those systems. You’ll find people that are dedicated to architecture of the underlying data structure on which reports will be built, and other folks that are crazy into data visualization and will talk your damned ear off about Edward Tufte and effective use of white space. Rarely are any of these people bored with their area of expertise.
Again, the same could be said for other tech sectors: Web development is boring because it’s mostly mucking around with server builds, mostly trying to get every browser to render your page the same, mostly trying to decide which technologies to use, mostly long boring meetings with your client about which color to paint the bike shed, mostly throwing together quick wins for the customer and then dumping your piece of crap Shared VPS, single instance LAMP, beat-into-submission WordPress glorp on them and running off with their money.
Oh, you used Node.js, jQuery, MySQL, Ruby, Python, Nginx, LevelDB, Backbone… ugh. Don’t invest too much time in those… they change every few years and the market has definitely not settled on the technology.
John – nope. You wrote:
“Your belief that Analytics is fly-by-night or risky is born from your lack of interest in it.”
To the contrary – I’ve been interested in analytics for years. I was blogging about data mining with SQL Server back in 2008. I love the idea – I just don’t like today’s execution, and your comment goes on to explain why.
Thanks for stopping by, though!
I think that most of analytics (which I don’t consider ETL to be a part of) is understanding a domain well. The stats part is general knowledge, but transforming that data into something more than its source requires specific knowledge.
While I think analytics is a good area, I think it’s really more valuable when you’re an FTE, or a specific industry consultant, where you can bring additional value into information analysis. The presentation, patterns, and mining are all important, but the bar for learning that stuff, and adding limited value, is low. The truly important stuff comes from being tunnel-focused on one area, which isn’t likely what Brent Ozar Unlimited focuses on.
Steve – ETL is embedded in analytics now because of unstructured data. The whole “big data” thing involves huge amounts of incoming data with unreliable formats, and no time to do traditional ETL processing. Now, you dump your raw data files directly into HDFS (or whatever), and your querying process has to be able to account for the lack of structure (or the changing structure).
Old-school analytics could sit in an ivory tower and query structured data, but that’s not what the buzzword bingo analytics stuff does these days.
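Here is a rough Python sketch of what "the querying process has to account for the changing structure" can look like. It assumes raw JSON lines where an older feed used a field called cust and a newer one uses customer_id; both names and the sample rows are made up for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Assumption: older files call the field "cust", newer ones "customer_id".
CUSTOMER_KEYS = ("customer_id", "cust")


def read_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Apply structure at read time; skip lines that are not JSON objects."""
    for line in lines:
        try:
            raw = json.loads(line)
        except json.JSONDecodeError:
            continue  # unreliable format: drop the line rather than fail the query
        if not isinstance(raw, dict):
            continue
        # Take the first known spelling of the customer field that is present.
        customer = next((raw[k] for k in CUSTOMER_KEYS if k in raw), None)
        yield {"customer_id": customer, "total": raw.get("total")}


raw_lines = [
    '{"cust": 7, "total": 19.5}',
    '{"customer_id": 8}',
    "not json at all",
]
records = list(read_records(raw_lines))
```

Nothing was cleaned before landing, so every reader of the raw files has to carry logic like this, which is exactly why the ETL work ends up embedded in the analytics.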
It’s kind of a temperament thing. You either love it and live for it (and tolerate the boring meetings) or it isn’t your thing. Like many professions (although I guess the other setting is ‘tolerate it to make big money’).
Part of the ‘Gold Rush’ will be increasingly shady operations offering ‘become a Data Scientist in 30 days!’ programs.
Personally I enjoy so-called ‘Data Jujitsu’ and the detective work of hunting down and acquiring data, pulling insight out of it, etc and so on. So maybe I’ll see if I can get on the gravy train, haha. But your warnings are all worthy of consideration. As you mentioned, you like it too, but a really crummy environment can ruin anything.
It’s probably already occurred to you, but I think a great idea for a blog post might be how you decide what you *are* going to learn. There’s so much to know compared to the amount of time most people can get out of a day…
James – indeed it has! Here’s a 2010 post I wrote about it: http://www.brentozar.com/archive/2010/04/what-skill-should-i-learn-next/
I think this is really the crux of it. I don’t want to digress too much but I’ll tell you what I’m thinking about because I think the topic is interesting.
I work as a DBA in Operations and we have ongoing discussions about two things:
1) Where do we draw the line about what we look after and what we don’t? Obviously the database engine is ours but what about SSRS, SSIS, and SSAS, where we have varying (but low) levels of experience?
– On the one hand, these come with SQL Server and should be patched along with SQL Server. So they’re “ours”.
– On the other hand they are (IMHO) quite specialist disciplines which we have little experience with. Having us look after them is not very logical.
– If we did invest our personal time into getting better at them, well we can just as well argue our time is better spent learning about HA/DR, AlwaysOn, and Column Store. You know, the important stuff.
– And on a fourth hand, it seems that NOBODY is managing them. Are those Reporting Services encryption keys backed up? Well they can’t be… because there’s nowhere to put them. This concerns us.
2) When should we lend our resources for short-term projects within the organisation?
If a project needs an SSRS resource for a few weeks, sure I can likely do that (there are exceptions, like if it involves geo-spatial data or SharePoint or farms, which are not my forte, but native mode… no problem).
But what about ETL packages? What if they need some back-end wiring for PowerShell (which we’re really good at)?
What happens to Operations while my time is spent on those projects? Yeah it earns make-believe money for my team (you know, from inter-departmental charging, not that any of that is my business), but where do we draw the line? How long is too long? How specialist do I have to be to be the one to do the work?
Long story short, focus is important, and it seems whether you work at a small company (and have to be an all-rounder) or at a large company (and are loaned between departments), the general perception is, “Databases? Oh, DBAs can do that!”
There’s not much understanding of the breadth of the field and the specialties we each have. I mean, seriously, I like the people who are good at fundamentals – backups, monitoring, and investigating and documenting problems when they find them. You may or may not be surprised but that seems extremely lacking in most DBAs I meet. Of course watching many of your videos, I think your standards are much higher 😉
Gotta disagree with you on two points, Brent:
(1) I don’t see ETL as the “real work” for analytics, because I spend much of my time analyzing scrubbed, loaded, and consistent data sets. I get you – it CAN be a big chunk of work the first time someone sits down to analyze a new data stream. But that cleanup workload should die down over time.
(2) I don’t see analytics as being about the tool. It’s about the results. How you get to the result is up to you, your team, your company, your budget. I would argue that IT should almost never be about the tool/package/software. Call me an idealist!
I’m a pretty skeptical guy about the newest, latest thing coming down the pike. Remember SOAP/HTTP endpoints in SQL Server 2005? But I’ve been doing data analysis for businesses since before version 6.5, and I see the overall mission for data analysis — call it BI, call it machine learning, call it analytics — persisting as a strong need.
Spend some quality time with R and PL/R and Postgres and see the beauty of running your analytics inside your database.
Robert – yep, just as DBAs have long enjoyed the pleasures of running reports in their live OLTP databases, too. Gotta love performance tuning for mixed workloads!
Brent, I don’t see that you meant this as the last word on Analytics, just your take on it, your perspective, maybe your experience. It did not sound to me that you were discounting the other pieces entirely and I think I got your intended sarcasm about ETL. There are lots of us detail-oriented types that saw the need and the fit and jumped on the ETL wagon. We are the ones smiling at your article. We get it.
Craig – thanks, glad you enjoyed it!
Interesting article and opinion, as usual! However, I think you’re not giving Analytics a fair shake. You have to consider the potential reach and audience for BI. A BI consultant has the chance to quickly learn the customer’s domain, and be in a position to make strategic recommendations, with data and charts to back it up. There’s an upward trajectory as you start to know the business’ problems as well as any manager, and get visibility and recognition for your work. Compare that to a DBA working in the back office, doing boring tasks like fixing failed backups, upgrading software versions, planning storage needs, and managing security. While important, these functions are only interesting to the IT Director, and are good candidates to outsource to your cloud vendor.
It’s true there is a huge bottleneck in BI: if you require every new dataset to be added to ETL, new DW schemas, deployed after dev/test/prod approvals, and all code in source control, the rate of change will grind to a crawl.
A good BI professional does not need fancy tools: you can “ETL” virtually anything with a Linked Server, Stored Procedure, and Agent job. Throw in a Python script to manage flat files and you’re all set. Create and publish some friendly Views, and show users Excel external data feeds and Pivot Charts. Or, invest in one of the modern, leading Viz tools, for more impact. Work fast, deploy changes right into production, and seek constant feedback. BAM, you have a model for a very successful consulting operation.
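As a sketch of what such a flat-file script might look like, here is a small Python example that sets aside rows it cannot trust instead of loading everything. The column names and sample rows are hypothetical, and the checks are deliberately basic.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Assumption: a hypothetical order feed with these three columns.
REQUIRED_COLUMNS = ("order_id", "customer", "amount")

Row = Dict[str, str]


def load_flat_file(handle: Iterable[str]) -> Tuple[List[Row], List[Row]]:
    """Split rows into loadable ones and rejects instead of loading everything."""
    good: List[Row] = []
    rejected: List[Row] = []
    for row in csv.DictReader(handle):
        # A row is complete only when every required column has a non-blank value.
        complete = all((row.get(col) or "").strip() for col in REQUIRED_COLUMNS)
        try:
            float(row.get("amount") or "")
            numeric = True
        except ValueError:
            numeric = False
        (good if complete and numeric else rejected).append(row)
    return good, rejected


sample = io.StringIO(
    "order_id,customer,amount\n"
    "1,Jones,12.50\n"
    "2,,8.00\n"
    "3,Smith,Combo #3\n"
)
good_rows, rejected_rows = load_flat_file(sample)
```

Here only order 1 is loadable; orders 2 and 3 land in the reject list, which someone still has to review.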
Rob – thanks, glad you like it!
But when you say “you can ETL virtually anything” – you’re cutting to the heart of the problem. Beautiful charts atop untested, unorganized, dirty data are the biggest problem I see in analytics today.
I came across this link when I googled “Bored with ETL” 🙂
I have about 6.5 years of experience with ETL and Data Warehousing, and just moved into Big Data about 2 years ago.
Out of frustration, I quit my last company, where I had a Data Engineering/ETL profile and we had meetings to discuss exactly what you mentioned above: “what a birthdate of 00/00/1900 means in someone’s medical record, or why Mrs. Jones’ orders don’t really map up to Mrs. Jones.” 😀 I had never felt so bored in 6 years, so I left soon after, and I’m taking a break now to figure out what to do next.
Reading this article really put a smile on my face, thanks!