Faster than the speed of…data!


By Jagdeep ‘Jag’ Bedi, Vice President – Data Science | Analytics | AI, Purchasing Power

Imagine walking into a thirty-minute meeting and spending the first ten minutes comparing whose data is the most accurate. “I got mine from finance, and I know the numbers are correct because Matt pulled them himself,” or “Mine was pulled directly from our database, and it’s the source.”

So, what is the source of truth in this case – a business unit’s data or the database? And that was the problem – there was no source of truth – no single data warehouse where everybody went to pull their data.

We were good at identifying what happened, but only after the fact. We were always caught in a reactive state – everybody running around identifying and confirming what had happened. Conversations got increasingly difficult when we tried to explain why it happened, more complicated still when we asked whether it would happen again, and hardest of all when we asked how to prevent it in the future. These four questions are the foundation of all data analysis – descriptive, diagnostic, predictive, and prescriptive analytics.

At that moment, we realized that we couldn’t continue operating a $500M enterprise (and getting to $1B and beyond) without a hard focus on our data and centralizing the organization to have a data-first mentality – which wouldn’t be easy.

So, how do you do this? How do you go from spreadsheets in meetings to beautiful visualizations explaining our customers’ buying behaviors, what they are buying, when they buy, and everything else you would expect in the e-tail industry?

Well, as I said, this would not be easy, and I expected it to take a lot of time and not to be inexpensive!

I will make this an action-focused article – something you read and walk away with items you can start to implement, hopefully immediately.

First – get buy-in!

I cannot stress enough how critical this step is. If this step takes months, then so be it. But you have to start at the top – the board or CXOs. I had buy-in from the CTO, which made this a little easier, and together we put together a hard-hitting presentation on the state of our data – a SWOT analysis, if you will. We outlined where we were, where we wanted to be, and what would happen if we didn’t get there, and for kicks, I threw in a timeline – 4-6 months (you need a goal to shoot for – right?).

The presentation exposed a lot of things – silos, for one! Every team had its own definitions of company terminology and pulled data based on those definitions from sources that lived in yet more spreadsheets or personal data marts (again, more spreadsheets), which was confusing. The second was the lack of a central repository (a data warehouse, data lake, etc.) where everyone could go to pull their data.

There were others, but these two were glaring – especially the silos. How do you break down the silos and get everyone on the same sheet of music with a data-first mentality? This is where the buy-in comes in – you have to get the top-level leaders on board and have them support the initiative from the top down. We made numerous presentations to CXOs and our board to get their buy-in. Luckily, this didn’t take long – the board blessed our initiative.

Second – get more buy-in!

Take your data transformation to the masses at town halls and all-hands, and even plan a few roadshows. We spent most of our time at roadshows educating the various business units on what was coming, how it would impact them, and what every business unit wants to know – what type of commitment will this require of me and my team? We were very transparent about the time needed from each business unit. The level of effort varied by unit – approximately 4-5 hours a week for understanding processes and how data was used in their specific area (a disclaimer – these hours will vary depending on the complexity of your organization).

Spend time communicating how the data warehouse will help each business unit. This is not about relinquishing control but gaining more control over your data. In addition, instead of spending cycles on earning trust in the data, you can focus more on narrating a story around what the data tells us. Eventually, with self-service visualizations, everyone can answer their own questions about the ‘what’ instead of bombarding business units, and can focus on the ‘how this happened’ and ‘how do we prevent this from happening again.’

Third – The Achievement!

Now that we had everyone on board and excited about this new data platform and how it would change our lives forever, we had to identify how to get this done within our timeline (remember that unrealistic-looking timeline from earlier?).

So, we didn’t have a data team to do this work. Remember, we were all trying to stand this up as we went along. It comes down to who needs to be informed, involved, and responsible/accountable – your typical RACI matrix. I agreed to put together a plan to hire a new data team stacked with data engineers/ETL, data analysts, data scientists (this came later), and business intelligence developers – starting with the data engineers.

I took a small team of three, myself and two additional senior data engineers, to help identify a platform, set up real-time data streaming, and (eventually) push the data to Tableau, our visualization tool. The initial focus of this micro-team was sourcing data and establishing a single source of truth.

We focused on setting up the data warehouse in Elastic, hosted in AWS (not Redshift – for reasons beyond this article), with Kafka (and other open-source technologies) continuously streaming our data into it.
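To make the continuous-load idea concrete, here is a minimal sketch using only the Python standard library. It is not our production setup: an in-memory queue stands in for a Kafka topic, a dictionary stands in for the warehouse index, and every name in it (`produce`, `drain`, `order_id`) is hypothetical. The point it illustrates is keying each record by a stable ID so that a replayed message overwrites its earlier copy instead of creating a duplicate row.

```python
import json
import queue

# Hypothetical stand-ins: the Queue plays the role of a Kafka topic and the
# dict plays the role of the warehouse index. Neither is a real client.
topic = queue.Queue()
warehouse = {}


def produce(event):
    """Serialize an event and append it to the topic."""
    topic.put(json.dumps(event))


def drain(source, sink):
    """Load every pending event into the sink and return how many were read.

    Records are keyed by order_id, so a replayed event overwrites the
    earlier copy rather than adding a second row.
    """
    loaded = 0
    while True:
        try:
            raw = source.get_nowait()
        except queue.Empty:
            return loaded
        event = json.loads(raw)
        sink[event["order_id"]] = event
        loaded += 1


produce({"order_id": "A1", "amount": 120.0})
produce({"order_id": "A2", "amount": 75.5})
produce({"order_id": "A1", "amount": 120.0})  # replayed delivery of A1

count = drain(topic, warehouse)
```

Here three messages are read but only two records end up in the sink, which is the behavior you want when a stream can deliver the same event more than once.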

We identified all of the various business units. We spent time talking to each one, pinning down their data challenges and building dimensions inside the data platform that would serve multiple business units simultaneously – instead of repeatedly duplicating data in tables. The conversations and the build-out went on for a few more weeks. All the while, I was giving weekly updates to senior leadership so they were always kept abreast of issues and progress – communication was vital. We kept repeating the same cycle – talk to the business/stakeholders, set up data pipelines as necessary, and push the data to our data warehouse (of course, a few more steps were involved in between).
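As an illustration of a dimension that serves more than one business unit, here is a small sketch using Python's built-in `sqlite3` module. The table and column names (`dim_customer`, `fact_orders`, `fact_payments`, `segment`) and the sample rows are made up for the example and are not our actual schema. Two fact tables, one per business unit, join to the same customer dimension, so both units report against one definition of a customer segment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    segment     TEXT
);
-- One fact table per business unit, both pointing at the shared dimension.
CREATE TABLE fact_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    amount      REAL
);
CREATE TABLE fact_payments (
    payment_id  INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    paid        REAL
);
INSERT INTO dim_customer VALUES (1, 'new'), (2, 'repeat');
INSERT INTO fact_orders VALUES (10, 1, 100.0), (11, 2, 250.0), (12, 2, 50.0);
INSERT INTO fact_payments VALUES (20, 1, 40.0), (21, 2, 300.0);
""")


def total_by_segment(table, column):
    """Sum a fact column per customer segment via the shared dimension.

    table and column are fixed by the calls below, never user input.
    """
    sql = (
        f"SELECT d.segment, SUM(f.{column}) "
        f"FROM {table} f JOIN dim_customer d ON d.customer_id = f.customer_id "
        "GROUP BY d.segment ORDER BY d.segment"
    )
    return dict(conn.execute(sql).fetchall())


orders_by_segment = total_by_segment("fact_orders", "amount")
payments_by_segment = total_by_segment("fact_payments", "paid")
```

Because the segment lives in one place, changing how a customer is classified changes both reports at once, rather than leaving two copies to drift apart.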

The unveiling!

By the time we released the data warehouse to the organization, it had taken us eight months – not bad! All the while, we were still hiring analysts across the business, ramping up the data team, addressing issues, and fixing and re-loading data. Oh, and did I mention that we were also in the process of opening a new offshore center while all this was going on? It was a hectic year indeed.

We continued to evolve but never lost sight of the fact that the focus was data and that it needed to be part of every conversation.

In the end, keep the following in mind: get buy-in from the top down, because without this support the enterprise will not commit to completing the work; champion the cause and sell it to the organization; and communicate, then over-communicate. We had many parallel activities running and were lucky enough to have a supportive organization that realized this was a gap that needed to be addressed.

Now, on to our next adventure…harnessing AI!