Data Democratization — How to democratize data and unleash it on your customers

The Data Wall
10 min read · Dec 15, 2021


This is the first article in my series “What I Think When I Talk About Data”. I began with Data Democratization as it sets the foundation of everything data stands for. Why have data if we cannot use it to its full potential?

What Is Data Democratization?

Per Bernard Marr (author of Data Strategy — How to profit from a world of Big Data, Analytics and the Internet of Things):

“Data democratization means that everybody has access to data and there are no gatekeepers that create a bottleneck at the gateway to the data. The goal is to have anybody use data at any time to make decisions with no barriers to access or understanding.”

Why Data Democratization?

When we think of data democratization, what comes to mind is data that is easy to access, documented, trusted, and readily available through all the standard means of data reporting.

Let’s look at some of the barriers that prevent us from reaching this democratization nirvana within our organizations:

Most organizations struggle with data silos, i.e. decentralized data access. By silos we mean that each business unit owns and manages data within its unit and may or may not support the data and reporting needs of the rest of the organization. Though silos may solve problems in the short term (getting that business unit to insights quicker), in the longer term they cause larger problems: the inability to look at data holistically (especially when trying to see cross-unit impacts) and inconsistency in the reporting of key metrics (the published numbers are not vetted by any data council, only by the unit owning the data).

In addition, if data is needed by other business units within the organization, it is passed through potentially non-standard, non-governed routes, leading later to challenges with data security and privacy. Data silos also create a larger TCO (total cost of ownership), since each business unit has its own data and reporting infrastructure, data teams, and operations.

Beyond data silos, organizations also struggle with multiple reporting tools managed by different business teams/units. Executive and operational reporting not only comes from various sources; to view a particular dashboard or report, users have to access multiple reporting tool portals. Like data silos, this increases TCO and operational costs for the organization.

Assume that you have resolved the issue of data silos and have all the data available to access, but the data is not properly organized for analysis. Not having a semantic/business layer adds the burden of additional data wrangling and processing for the data end user, pulling their focus away from deriving the benefits the data can provide. Studies suggest that typical data analysts and scientists within organizations spend over 70% of their time wrangling data and making it ready for consumption. Data that is not readily accessible for further analysis leads teams to load data into their own data stores and then summarize and aggregate it to suit their needs. This again causes inconsistency in metrics and data proliferation, along with the same issues around lack of governance, privacy, and security controls.

Data within data warehouses or lakes that is not documented or defined correctly can lead to misinterpretation and cause inaccuracies in data product implementations and inconsistent metrics and KPIs for reporting.

KPIs and metrics can be misinterpreted if not documented

So, in essence, if your data ecosystem is fragmented, undocumented, ungoverned, and not easy to access, query, report on, or analyze, then the first focus should be on resolving those problems before moving toward Data Science/ML/AI. Those are cool to do and know, but if the foundation of your data is not solid, any data product built on it will not be trustworthy.

So, what do we need to do to bring some order to the chaos above?

How to Democratize Data

  1. An Enterprise Data Strategy is key to understanding the direction for the next 3–5 years. This strategy will contain the driving factors for why the initiative is required for an organization, and it requires the support of the C-suite through its multi-year journey. The first 12–18 months need a clear execution plan. Priorities need to be set within the organization regarding what the pilot will be and what will be executed subsequently. Success metrics need to be defined at each phase to ensure that the goals of the implementation are being met. Time to insight is a key KPI that should continue to be measured throughout this initiative. Note that support for this initiative is critical from start to finish for a successful implementation.
  2. Choosing an architecture that is forward-looking for at least 3–5 years helps you stay ahead of the technology curve. It is important that the infrastructure you choose and the architecture you build support the organization’s use cases now and in the future. This involves deciding whether data will flow in near real-time, in batch, or both, and whether the platform should be implemented on-premise or in the cloud. Considerations and checklists for architectures will be detailed in a separate article.
  3. In a siloed organization, data for the enterprise should reside in a central location and should be owned by a central data engineering team. The data engineering team will be responsible for building, operationalizing, and maintaining the pipelines that pull data from various sources. They will be responsible for ensuring that the quality of data coming in from the source is maintained through to the target. They will work closely with both the source data teams and the business users to understand the data and how it will be used, so they can architect it for success. For this process to work end to end, data engineering needs to be part of project conversations from the beginning, not as an afterthought. Any metric that data engineering builds will have a data steward in the business who owns its definition, and those data stewards will be responsible for communicating the definition and ensuring it is aligned across the organization.
  4. The importance of Data Quality cannot be emphasized enough. If the data is not trustworthy, any outputs or data products built from it will not be trustworthy. Hence, significant effort needs to go into ensuring that the quality of the data is sustained. Regular profiling, data validations between source and target, alerts and thresholds, and anomaly detection are just some of the key elements of ensuring that your data ecosystem produces quality data. It is best practice to be alerted of an issue before the end user or customer finds it.
  5. The data layer exposed to end users should be a business- and user-friendly layer with all the capabilities needed for self-service. Standard naming conventions, availability of a data dictionary, a business glossary that specifies the definitions of the different metrics within the data, governed data sets (e.g., a golden customer record; see the chapter on Data Governance and Master Data Management), data privacy controls (access to data based on a role within the organization; read about PCI/PII data in the Data Privacy chapter), and, most importantly, data quality are key requirements for this layer. Organizations typically call this the semantic layer. It becomes the foundation for all analytics, data science, and AI/ML efforts going forward, and therefore needs to be built with a lot of thought and understanding of its end purpose.
  6. Data query and reporting tools used by end users to access this data should be standardized within the organization, with appropriate platform support and training available. Typically, platform support, training, and access control for these tools are maintained and managed by the data engineering/business intelligence teams. The existing standardized tool set may not suffice for the needs of data science teams; new tools can be added in alignment with the data engineering team so that platform, access, and training support is still managed through that central team. This removes the burden of operational tasks from business teams.
  7. Continuous end-user training on the data layer, access protocols, and data policies should be delivered in forums on a cadence (monthly is most common). These forums also provide a platform to showcase product implementations, answer questions, share what’s next in the pipeline, and offer refresher education. Typically, a power user within each business unit is identified and becomes the champion for the data team within that unit. They help the data team manage and prioritize the multitude of requests that keep flowing in. A process for requesting changes/additions to the layer, prioritization, and SLAs (service level agreements) for bug fixes and changes needs to be defined clearly, with continuous communication about it. Data Literacy is a more formal way of educating the entire organization and permeating the data culture across all teams, not only business and data teams. Once the layer has been set up, a data literacy plan can be created to educate the broader organization. More in an upcoming chapter.
  8. Center of Excellence teams can be set up to solve problems that span business units and, in very large organizations, to manage more complicated requests that require specialized skill sets (this will be detailed in a separate chapter).
  9. Access to third-party data (census, weather, macroeconomic data, etc., based on business need) should be made available for broader analysis.
  10. We talked briefly about data stewards owning the definitions of metrics. These are typically documented in a Business Glossary shared across the entire organization. Any change to a definition is reflected in the glossary and is allowed only by the responsible data steward; that change needs to trickle down the data and user chain to ensure continued alignment. This is just one part of Data Governance. In addition, the lineage of data elements within the semantic layer needs to be tracked, and all data and reporting objects available to end users should be cataloged. Organizations typically invest in a Data Governance tool, but the key is ensuring that the process to maintain governance is repeatable, scalable, and simple enough that it continues to be followed.
  11. Sensitive data, specifically PII (personally identifiable information such as email, SSN, driver’s license, etc.), needs to be encrypted, and access to it needs to be managed and controlled. With laws like CCPA and GDPR, it has become critical for organizations that store and use customer data to be able to tell their customers how their data is being used and to delete it if required. Hence Data Privacy is a critical part of the data architecture and needs to be taken very seriously. More on this in the Data Privacy chapter.
  12. One of the essential requirements is to communicate monthly with the stakeholders supporting the initiative. It’s a meeting to discuss priorities (if they have shifted), successes, failures, and of course the success metrics. Have your power users communicate any successes to their business heads beforehand so that those successes get shared with the broader executive teams. This meeting is essential and should never be missed.
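The source-to-target validations described in point 4 can be sketched in a few lines. This is a minimal illustration, not a full data-quality framework: the table profiles, tolerances, and the idea of comparing row counts and a checksum on a key measure are all assumptions for the sake of example.

```python
# Hypothetical sketch of source-to-target validation (point 4 above).
# Profiles, tolerances, and field choices are illustrative assumptions.

def validate_load(source_rows: int, target_rows: int,
                  source_sum: float, target_sum: float,
                  row_tolerance: float = 0.0,
                  sum_tolerance: float = 0.01) -> list:
    """Compare simple profiles of a source extract and its loaded target."""
    issues = []
    if source_rows == 0:
        issues.append("source returned zero rows (possible upstream outage)")
    elif abs(source_rows - target_rows) / source_rows > row_tolerance:
        issues.append(f"row count mismatch: source={source_rows}, "
                      f"target={target_rows}")
    if source_sum and abs(source_sum - target_sum) / abs(source_sum) > sum_tolerance:
        issues.append(f"checksum drift on key measure: {source_sum} vs {target_sum}")
    return issues  # an empty list means the load passed these checks

# Example: a load that silently dropped rows should be flagged by the
# pipeline's alerting before an end user or customer notices.
problems = validate_load(source_rows=1_000_000, target_rows=998_500,
                         source_sum=52_340.10, target_sum=52_340.10)
```

In practice these checks would run as part of the pipeline, with any non-empty result routed to the alerting channel the data engineering team monitors.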
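The semantic layer in point 5 can be illustrated as a governed mapping from raw warehouse columns to business-friendly names, with metric definitions owned by data stewards in the business glossary. All column names, the metric, and its definition below are hypothetical.

```python
# Illustrative sketch of a semantic layer (point 5 above): raw warehouse
# columns are exposed under governed, business-friendly names, and key
# metrics carry the steward-owned definition from the business glossary.
# Every name here is a made-up example.

SEMANTIC_LAYER = {
    "cust_acct_id": "customer_id",
    "ord_dt":       "order_date",
    "net_rev_amt":  "net_revenue",
}

BUSINESS_GLOSSARY = {
    "net_revenue": "Gross sales minus returns and discounts (owner: Finance data steward)",
}

def to_business_view(raw_row: dict) -> dict:
    """Rename raw columns to governed names; drop anything unmapped
    (e.g. internal ETL bookkeeping columns)."""
    return {SEMANTIC_LAYER[k]: v for k, v in raw_row.items() if k in SEMANTIC_LAYER}

row = to_business_view({"cust_acct_id": 42, "ord_dt": "2021-12-01",
                        "net_rev_amt": 99.5, "etl_batch_id": 7})
```

The point of the sketch is the separation of concerns: end users only ever see the governed names, and any change to a metric definition happens in one place, under the responsible steward.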
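The role-based PII controls in point 11 can be sketched as a masking step applied before data reaches a user. The roles, field list, and use of a truncated hash are assumptions purely for illustration; a real deployment would rely on managed encryption or tokenization services, not a bare hash.

```python
# Hedged sketch of role-based PII masking (point 11 above). Roles, the
# PII field list, and the hashing choice are illustrative assumptions.
import hashlib

PII_FIELDS = {"email", "ssn", "drivers_license"}

def mask_record(record: dict, role: str) -> dict:
    """Return the record with PII pseudonymized unless the role is cleared."""
    if role == "privacy_officer":  # a cleared role sees raw values
        return dict(record)
    masked = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            # Deterministic pseudonym: same input always maps to same token,
            # so joins on the masked column still work for analysts.
            masked[field] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            masked[field] = value
    return masked

analyst_view = mask_record({"customer_id": 42, "email": "a@b.com"}, role="analyst")
```

Keeping the masking deterministic is a deliberate trade-off: analysts can still count and join on customers without ever seeing the raw identifier.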

So, this is a good checklist to ensure that you can successfully democratize data across the organization and help enable it to be data-driven.

Checklist to Democratize Data

  1. Do you have an Enterprise Data Strategy with a prioritized list of data initiatives for the next 12–18 months and their corresponding success metrics defined?
  2. Do you have an understanding of the type of modern data architecture you need to implement? Batch/streaming? On-premise/cloud? Integration with tools, and the ability to extend into Data Science/ML/AI in the future?
  3. Do you have a central data team in place that will build/own/maintain the data pipelines into the data lake/data warehouse?
  4. Do you have the right data quality processes as defined above in place?
  5. Do you have a clear understanding of how the data will be used by end users so that it can be modeled and presented in a manner that lets business users self-serve?
  6. Have you defined a standard set of reporting tools for the organization, and are you providing the requisite platform support, access, and training?
  7. Do you meet regularly with end users where you can share information about the data available, access protocols and data policies? Have you identified power users/data champions for each of your business units?
  8. Have you identified data stewards for the datasets that you are going to be piping into the data platform (they will be key to setting up Data Governance, defined metrics, business glossary, etc.)? Are you looking to invest in a Data Governance tool? Is the data governance process well defined?
  9. Do you have clear access controls for sensitive data? Is your data organization set up to support CCPA and GDPR requirements?
  10. Do you have monthly meetings with your executive team to walk them through the continued status of this large data initiative?

If you want to talk more or provide inputs/feedback about enabling data-driven organizations or any of the topics above contact me at thedatawall@gmail.com

If you’re interested in connecting, follow me on LinkedIn and Medium.
