February 14, 2023
/
8 minutes

Integrating Ellie and VaultSpeed for business-driven Data Vault automation

Product
Guide
Juha Korpela and Jonas De Keuster
Co-written by CPO of Ellie and VP Product Marketing of Vaultspeed

Intro

The true value of data can only be achieved by ensuring that data is relevant to your real business. Data Vault (or DV for short) is a methodology built around the idea that data needs to represent reality, not technical systems. At its core is the Data Vault model.

A Data Vault model that doesn’t represent reality is useless.

Your data model needs to contain the entities, attributes, and relationships that are familiar to the people who work in the real business. For example, customers buy products, product purchases are billed to customers, companies store product inventory, etc. These are not technical details - far from it!

Capturing this reality is sometimes a challenging task. One excellent and well-tested way of doing that is building a Conceptual Data Model (or CDM for short). A CDM that really reflects your business has a lot of value - even if it’s not always perfect, it helps all stakeholders to communicate with each other, and it guides you towards building actually useful data structures.

A conceptual model can be put into practice by incorporating it into the physical Data Vault model. The physical DV model is the actual design blueprint for your relational database, including columns, column lengths, primary keys, and foreign keys. Its core structure should be derived from the CDM, and its details filled in from the source systems that actually contain all that data. Here, automation is vitally important.

This article explains how to easily build a conceptual data model and then incorporate it into the physical DV model using two efficient and modern data solutions - Ellie and VaultSpeed.

Understanding the real business with Ellie

Ellie is an intuitive data modeling tool with enterprise-level data modeling and information architecture features. Moreover, it’s cloud-native, so it’s easy to set up and use, and to collaborate with your team.

The point of Ellie in this process is to capture the real business needs in a structured way, so that the DV design is built on solid ground. Ensuring a shared understanding of the core business entities and relationships is crucially important - otherwise, we might be building things that are in the end difficult or even impossible to get value from.

Taxonomy is a word you might encounter in Data Vault methodology. In simple terms, it means capturing the structure of how the business identifies and categorizes things of importance.

Consider, for example, the business case of a car-motor-bike products store. Below you can see how they categorize their offering. This is an example of a business taxonomy:

The same structure built in Ellie looks like this:

The “boxes” on this canvas are entities - reusable business objects that have definitions and other metadata in Ellie’s repository. They are connected to each other in different ways, representing the reality of this store’s business.

Describing this structure as a conceptual model in Ellie is useful for all kinds of projects, but here we have a clear path forward: we want to build our Data Vault based on this.

Feeding the Ellie Conceptual Data Model into VaultSpeed

Building integrations between tools is made so much easier when you can use REST APIs. Fortunately, both Ellie and VaultSpeed have well-documented REST APIs in place. In this example, we’re using a Python script that does the following:

  • Reads a conceptual data model from Ellie using Ellie’s Model Export API
  • Generates hub groups for VaultSpeed from the business entities in Ellie’s model, based on their metadata
  • Builds the hub group objects in VaultSpeed using VaultSpeed’s API

This results in a pre-filled canvas for hub groups in VaultSpeed, saving us the time it would have taken to create them manually in VaultSpeed - and ensuring we’re actually following the business model!

Ellie allows you to create custom metadata in the glossary. For this particular integration, 3 fields were added:

  • “implementation level”: this shows whether you want to implement your taxonomy at this level in the data vault
  • “short name”: used to name the hub group
  • “abbreviation”: used to name the hub group

Example: integrating with a Python script

Python is the easiest way to translate the Ellie model JSON file coming from the Model Export API into readable input for VaultSpeed’s API endpoints.

The script does several things:

  • authenticate and get the Ellie model JSON
  • authenticate to VaultSpeed
  • iterate through the Ellie JSON to find all the elements that are to be implemented as a hub group
  • automatic hub group creation using the metadata.

Find more details and the full script in this VaultSpeed community post.

Conclusion

When facing enterprise-level data integration challenges, lots of data models need to be joined together. The scale of this effort requires automation. Data Vault is great for both integration and automation. 

It might be tempting to base yourself on purely the source metadata to build the DV model, but you would end up with a model that represents only the source and not your actual business. This anti-pattern is often referred to as “fake Data Vault”. The result is that business will not really receive an integrated picture of their situation - merely a repeat of what they happen to have across their system landscape.

To deliver true integration, you need to start from a business perspective and blend that perspective with the reality that exists in various data sources.

That means there are 2 main criteria for building a good Data Vault model:

  • automation to map source models to an integrated Data Vault model
  • a conceptual data model to make sure that your Data Vault model accurately represents your business.

An integration that helps to integrate the CDM into the automation process brings value by reducing manual modeling work and leaving less margin for errors. 

This use case is just one example of how Ellie and VaultSpeed can interact with each other. Other examples where integration would reduce manual work could be: 

  • Exporting object and attribute names from Logical Models in Ellie and importing those into VaultSpeed
  • Exporting VaultSpeed metadata and pushing back the physical object details to Ellie’s glossary.

The beauty of this is that as both tools have flexible and powerful API endpoints, nearly anything can be done in between.

Ellie’s overall integration philosophy is to provide (and keep adding) APIs that allow users and partners to utilize the models and other metadata in all kinds of tools. Ellie’s API specifications are publicly available here. File-based exports and imports from the Business Glossary are provided for human-friendly purposes.

VaultSpeed’s vision on integration varies on the type of integration:

  • First, on the output side, there is a native integration for supported data platform vendors like Snowflake , Azure Synapse, or Databricks, and for ETL solutions like Matillion or dbt. VaultSpeed generates all the SQL code, some form of ETL mapping, and the workflows to orchestrate your runtime environment.
  • Second, VaultSpeed supports metadata harvesting for any kind of source. As long as your source supports jdbc, VaultSpeed’s agent can read its metadata. 
  • Finally, the API allows users to build tailor-made integrations for any other integration type like data modeling or data governance. This requires the customer to build the integration scripts using for example python, or javascript. It gives endless flexibility in setting up any integration users can think of. Read more about VaultSpeed’s API.

Using these two excellent tools in unison ensures a business-focused design that is fast and efficient to implement. It means that you are always doing the “right thing”, and at the same time doing it the “right way”!