What is Data Mesh and why do I care? – Part III.
In the first part of our series on Data Mesh, we introduced the concept and principles of Data Mesh. In the second part, we looked at the technology enablers for introducing Data Mesh to your organization and at typical objections to it. In this final part, we outline a plan for starting with Data Mesh in your organization.
How do we start with Data Mesh?
1. Assess organizational data maturity, pain points, and plans
Do a quick assessment to measure organizational maturity in data areas such as:
- Data modeling – what modeling standards are used, how are models reviewed, what tools are used and how they are integrated, what artifacts are generated from models…
- DataOps – state of CI/CD for data flows, batch jobs monitoring, logging analysis and reporting, data quality monitoring, infrastructure monitoring, and scaling…
- Data Security – how security policies are defined and enforced, how data are classified and how that classification is retained through transformation processes, data lineage analysis, how user identities and roles are identified and managed…
- AI/ML Review – how (if at all) AI/ML is used within the organization, what datasets are required for model training, what data are produced…
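To make the classification-retention point concrete, here is a minimal sketch (all column names and classification tags are hypothetical, not a specific tool's API) of how a derived column can inherit the classification of its inputs through a transformation:

```python
# Hypothetical column-level classification tags attached to a source dataset.
CLASSIFICATION = {
    "customer_id": "internal",
    "email": "pii",
    "order_total": "internal",
}

def transform(row: dict) -> tuple[dict, dict]:
    """Derive a new record and propagate classification to derived columns."""
    out = {
        "customer_id": row["customer_id"],
        "email_domain": row["email"].split("@")[1],  # derived from a PII column
        "order_total": row["order_total"],
    }
    # A derived column inherits the classification of the column it came from;
    # with multiple inputs, it would inherit the strictest one.
    out_tags = {
        "customer_id": CLASSIFICATION["customer_id"],
        "email_domain": CLASSIFICATION["email"],     # stays "pii"
        "order_total": CLASSIFICATION["order_total"],
    }
    return out, out_tags
```

The key idea the assessment probes for is exactly this: classification metadata travels with the data instead of being re-assigned (or lost) at every hop.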
Part of the assessment is also to capture the current data stack and identify potential risks and pain points (legacy tools, expensive licenses preventing wider tool usage, performance or stability issues, etc.).
The assessment should also gather long-term strategy and key ongoing or short-term planned business projects that either impact the data area or require critical data inputs. Such assessment typically takes 4-6 weeks.
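As one example of what the DataOps part of the assessment looks for, the sketch below (rules and column names are illustrative, not a specific tool) shows the kind of automated row-level data-quality check a mature pipeline runs and reports on:

```python
def check_quality(rows: list[dict]) -> list[str]:
    """Return human-readable data-quality violations for a batch of records."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Completeness and uniqueness rule on the key column.
        if row.get("customer_id") is None:
            issues.append(f"row {i}: missing customer_id")
        elif row["customer_id"] in seen_ids:
            issues.append(f"row {i}: duplicate customer_id {row['customer_id']}")
        else:
            seen_ids.add(row["customer_id"])
        # Validity rule: order_total must be a non-negative number.
        total = row.get("order_total")
        if not isinstance(total, (int, float)) or total < 0:
            issues.append(f"row {i}: invalid order_total {total!r}")
    return issues
```

In practice such rules live in a dedicated data-quality framework and feed the monitoring dashboards the assessment asks about; the point is that they exist, run automatically, and their results are visible.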
2. Plan tactical and strategic data stack and activities
Based on the assessment and gathered inputs prepare:
- Data platform strategy – a high-level outline of how the data platform should operate, what capabilities it should provide, and what its key interactions with other organizational projects are
- Tactical (next 3 months) and strategic (1-2 years) data stack – which tools should be used, which deprecated, and how they should be integrated with each other and with other systems
- Domain model – prepare the initial data domain model (L1) and break it into sub-domains (L2) where possible. We suggest using organization structure and IT systems architecture for the initial domain split. In other words – leverage “Conway’s law” rather than trying to fight it.
- Define governance model – the data platform governance structure (incl. mapping domains onto the org chart, and key roles and processes to define and approve Data Products, enforce security rules, monitor operations, audit data access, etc.)
This activity should take 2-4 weeks including review and approval by key stakeholders.
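An initial domain model can be captured as simple, reviewable data. The sketch below (all domain, owner, and system names are hypothetical) shows one way to express L1 domains, L2 sub-domains, and their org-chart owners, and how a first Data Product catalog listing can be derived from it:

```python
# Hypothetical initial domain model: L1 domains follow the org structure,
# L2 sub-domains follow the IT systems architecture (leveraging Conway's law).
DOMAIN_MODEL = {
    "sales": {                       # L1 domain, aligned with the Sales department
        "owner": "head-of-sales",    # mapped from the org chart
        "subdomains": {              # L2 sub-domains, aligned with source systems
            "orders": {"source_system": "ERP", "data_products": ["orders_daily"]},
            "crm": {"source_system": "CRM", "data_products": ["customer_360"]},
        },
    },
    "finance": {
        "owner": "cfo-office",
        "subdomains": {
            "invoicing": {"source_system": "ERP", "data_products": ["invoices"]},
        },
    },
}

def list_data_products(model: dict) -> list[str]:
    """Flatten the domain model into a simple catalog listing."""
    return [
        f"{l1}/{l2}/{dp}"
        for l1, domain in model.items()
        for l2, sub in domain["subdomains"].items()
        for dp in sub["data_products"]
    ]
```

Keeping the model this lightweight at the start makes the governance review cheap: domain splits, owners, and planned Data Products can be discussed and approved from a single page.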
3. Identify pilot and staff pilot team
Select the domain and the pilot Data Products (and reports) to be constructed. Allocate the necessary team – prefer fully dedicated allocation where possible to ensure the team's full focus. The pilot typically also validates new technologies and tooling. For those, make sure appropriate support from IT Operations is committed – installation support, network setup (firewalls, etc.), access to source data systems, credentials provisioning, etc. Allow ample time to resolve issues and stabilize each tool before onboarding users to it.
The goal is to deliver pilot Data Products and Reports within 3 months (ideally 2 months – depending on lead time for new tooling setup, if any).
4. Evaluate and scale
Evaluate issues encountered during pilot delivery – focus especially on classifying whether each issue was a one-off (due to new methodology, tooling, and/or team) or has a more fundamental root cause that needs to be addressed.
Decide on the next data domains and outline the initial set of new Data Products to build. Communicate the project and the Data Mesh concept to a wider audience – in particular, where users can find the new Data Product Catalog and relevant reports – and provide a contact for the expert team.
The critical step is to establish an in-house “black belts factory” – a program to train the trainers who can then support the Data Mesh rollout organization-wide.
We are providing our customers with the necessary knowledge, training, assets, and resources to quickly start the Data Mesh journey.
Our services typically consist of:
- A quick, focused 2-day pre-assessment identifying key focus areas and the areas where Data Mesh can bring the most business value.
- Running a 2-3 day “data hackathon” integrating real systems with the proposed tooling to demonstrate its feasibility and efficiency.
- Driving the data assessment, presenting outcomes, and proposing plans to C-level stakeholders to gain broad support for the Data Mesh roll-out.
- Designing the tactical and strategic data architecture and recommended data stack, preparing guidelines (modeling methodology, ingestion patterns, CI/CD pipelines, etc.), and setting up architectural ceremonies (data forum, architecture approval committee, etc.).
- Providing resources to lead and deliver the Data Mesh pilot where the organization cannot staff it internally quickly enough.
- Setting up and running a training program for internal teams to ensure organizational self-sufficiency and keep key know-how “in-house”.
Grow2FIT BigData Consultant
Miloš has more than ten years of experience designing and implementing BigData solutions in both cloud and on-premise environments. He focuses on distributed systems, data processing, and data science using the Hadoop tech stack and in the cloud (AWS, Azure). Together with the team, Miloš delivered many batch and streaming data processing applications.
He is experienced in providing solutions for both enterprise clients and start-ups. He follows the principles of transparent architecture, cost-effectiveness, and sustainability within a specific client’s environment, aligned with enterprise strategy and related business architecture.
The entire Grow2FIT consulting team: Our team