Data Product Efficacy
Prerequisite: The reader is expected to have basic knowledge of Data Mesh. This article continues the two previous articles on Data Mesh: "Are we Ready for Data Mesh?" and "Data Product Canvas".
As organizations move away from fragmented data silos, where data is hard to trust and has no clear owner, toward trusted Data Assets owned and used by the business, they embrace Data Mesh Architecture for their Data Platforms. The Data Product is a key element of Data Mesh Architecture because it is the only element perceived as a tangible asset that the business can leverage in a self-serve manner to derive insights and make data-driven decisions. The other 3 principles (Domain Alignment, Federated Governance and Self-Serve Platform) are means to create efficient Data Products.
In this article, we will talk about the usefulness of a Data Product. Let's say you have created a Data Mesh Architecture and started building Data Products. Over time, many Data Products are created for self-serve use, and more are added every month. Of the 3 kinds of Data Products (Source Aligned, Aggregated and Consumer Aligned), the number of Source Aligned data products depends on the number of sources and hence tends to stabilize. However, the number of Aggregated and Consumer Aligned data products may continue to rise as self-service usage by the business grows.
A Data Product can have multiple interfaces to offer its data to a diverse set of data users such as Data Scientists, Data Analysts and Business Users. The same Data Product can offer columnar, relational or time-series data depending on its users and access mechanism. It can expose an API, provide a secure view onto its data, or publish a stream of data, so that different access patterns are served for different user profiles. Users who train machine learning models typically need data in a columnar format or, in some cases, in a temporal format. Users who generate reports may need data in a relational table format, while real-time app developers might need data in the form of an event stream.
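To make the idea of one dataset behind several interfaces concrete, here is a minimal sketch. The class name `OrdersDataProduct`, its methods and the sample records are all hypothetical and only illustrate the shape of each output; a real Data Product would sit behind an API, a secure view or a streaming platform. The columnar conversion assumes every record carries the same attributes.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


class OrdersDataProduct:
    """Hypothetical data product serving one dataset through three interfaces."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def as_rows(self) -> List[Row]:
        # Relational-style output: one record per row, suited to reporting.
        return [dict(row) for row in self._rows]

    def as_columns(self) -> Dict[str, List[Any]]:
        # Columnar output: one list per attribute, suited to model training.
        columns: Dict[str, List[Any]] = {}
        for row in self._rows:
            for key, value in row.items():
                columns.setdefault(key, []).append(value)
        return columns

    def as_stream(self) -> Iterator[Row]:
        # Event-style output: records are yielded one at a time.
        for row in self._rows:
            yield dict(row)


# Illustrative sample data, not taken from any real system.
product = OrdersDataProduct([
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": 75.5},
])
```

Each method corresponds to one interface, which is why some KPIs are more naturally tracked per interface than per Data Product.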
Every Data Product needs to be maintained and consumes compute and storage, so it is imperative that we also measure the usefulness of each Data Product and identify similar but sparsely used Data Products so that they can be rationalized. We also need to ensure that Data Products meet the business's non-functional requirements in terms of performance and quality.
In other words, we need to identify and monitor KPIs that measure Data Product Efficacy.
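As a rough illustration of what "sparsely used" could mean in practice, the sketch below flags Data Products whose usage falls under a threshold. The `UsageSummary` fields, the 30-day window and the threshold values are assumptions made for this example, not values prescribed by Data Mesh. Detecting which products overlap in content is a separate problem and is not attempted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UsageSummary:
    # Hypothetical usage roll-up for one Data Product.
    product_name: str
    active_consumers: int
    accesses_last_30_days: int


def sparsely_used(
    summaries: List[UsageSummary],
    min_consumers: int = 2,     # assumed threshold
    min_accesses: int = 10,     # assumed threshold
) -> List[str]:
    """Return names of products that miss either usage threshold."""
    flagged = []
    for summary in summaries:
        too_few_consumers = summary.active_consumers < min_consumers
        too_few_accesses = summary.accesses_last_30_days < min_accesses
        if too_few_consumers or too_few_accesses:
            flagged.append(summary.product_name)
    return flagged


# Illustrative data only.
candidates = sparsely_used([
    UsageSummary("customer_360", active_consumers=5, accesses_last_30_days=400),
    UsageSummary("legacy_orders", active_consumers=1, accesses_last_30_days=3),
    UsageSummary("promo_uplift", active_consumers=3, accesses_last_30_days=4),
])
```

Products flagged this way are candidates for review with their owners rather than for automatic retirement.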
The KPIs below can be used to measure Data Product Efficacy. Note that some KPIs are observed at the level of an individual interface.
The above data needs to be stored in a product meta-store for each Data Product. This allows each Data Product to maintain an Efficacy Card showing how it is doing against the above KPIs.
Below is an example of one such Efficacy Card.
It has 3 buttons with the following functionality:
1. Submit Feedback: This opens a new form where a consumer can give a rating on a scale of 1 to 5 and enter free text. Sentiment analysis is performed on the text and the sentiment pie chart is regenerated. If a consumer has already provided a rating, their previous rating is overwritten.
2. New Defect: This takes the consumer to the ITSM tool, where they can log a new defect against the data product.
3. Subscribe: After viewing the Efficacy Card of the Data Product, a user can decide to become a consumer of the Data Product. On clicking this button, the user is taken to the Data Marketplace to request access to the Data Product.
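The overwrite rule for ratings can be sketched as a store keyed by consumer, so that a second submission replaces the first instead of adding another vote. `FeedbackStore` and its method names are hypothetical, and the sentiment analysis step is left out because the article does not specify how it is done.

```python
from typing import Dict, Optional, Tuple


class FeedbackStore:
    """Hypothetical in-memory store holding one rating per consumer."""

    def __init__(self) -> None:
        # consumer_id -> (rating, free-text comment)
        self._feedback: Dict[str, Tuple[int, str]] = {}

    def submit(self, consumer_id: str, rating: int, text: str = "") -> None:
        # Ratings are restricted to the 1 to 5 scale described above.
        if not isinstance(rating, int) or not 1 <= rating <= 5:
            raise ValueError("rating must be an integer from 1 to 5")
        # Keying by consumer overwrites any earlier rating from the same consumer.
        self._feedback[consumer_id] = (rating, text)

    def rating_count(self) -> int:
        return len(self._feedback)

    def average_rating(self) -> Optional[float]:
        # None signals that no consumer has rated the product yet.
        if not self._feedback:
            return None
        total = sum(rating for rating, _ in self._feedback.values())
        return total / len(self._feedback)


store = FeedbackStore()
store.submit("alice", 2, "Data arrives late")
store.submit("bob", 4)
store.submit("alice", 5, "Latency issue fixed")  # replaces alice's earlier 2
```

After these three submissions the store holds two ratings, because the second submission from the same consumer replaced the first.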
To create the Efficacy Card above, a meta-store needs to be maintained. Below is the data model for such a meta-store.
Below is a short description of the entities and how data gets populated into them:
1. Domain: This entity holds Domain information. These details are populated when a new domain is registered on the Data Marketplace.
2. DataProduct: This entity holds information about the Data Product and its Service Level Objectives. It is also populated when a new Data Product is registered on the Data Marketplace.
3. DataProductDefects: This entity holds the defects raised against the Data Product. It is populated offline from the ITSM tool.
4. Consumers: This entity holds information about the consumers of the Data Product. It is first populated when a consumer requests access to a data product. The ‘Status’ attribute can hold the values ‘Requested’, ‘Active’ or ‘Revoked’, depending on whether the consumer has requested access to the Data Product via the Data Marketplace, has been granted access, or no longer uses the Data Product and has had access revoked. The ‘ConsumerType’ attribute identifies the type of consumer: another Data Product or an end user accessing the data directly.
5. DataProductInterface: This entity holds the different interfaces of the Data Product, such as API, SQL or Streaming.
6. DataProductAccess: A record is added to this entity every time a Data Product is accessed. For a streaming interface, the ‘DataLatency’ attribute is filled; it records the delay between the creation of the data and its serving via the Data Product. The size of data served and the start and end times are filled only for the API and SQL interfaces.
7. DataProductFailures: A record is added to this entity every time a failure makes the Data Product unavailable. ‘FailureType’ captures whether the failure affected only a particular interface or the complete data product. Monitoring tools need to be enabled to detect failures and insert these records.
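The sketch below shows one way the DataProductAccess records could be rolled up into per-interface figures for the Efficacy Card. Only the entity name and the ‘DataLatency’ idea come from the model above; the Python field names, the units and the `summarize_access` function are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class DataProductAccess:
    # Hypothetical Python view of one DataProductAccess record.
    interface: str                                # "API", "SQL" or "Streaming"
    data_latency_seconds: Optional[float] = None  # filled for Streaming only
    size_served_bytes: Optional[int] = None       # filled for API and SQL only
    start_time: Optional[datetime] = None         # filled for API and SQL only
    end_time: Optional[datetime] = None           # filled for API and SQL only


def summarize_access(records: List[DataProductAccess]) -> Dict[str, Dict[str, float]]:
    """Aggregate access records into simple per-interface statistics."""
    summary: Dict[str, Dict[str, float]] = {}
    for interface in sorted({r.interface for r in records}):
        subset = [r for r in records if r.interface == interface]
        stats: Dict[str, float] = {"access_count": len(subset)}

        # Streaming: average delay between data creation and serving.
        latencies = [r.data_latency_seconds for r in subset
                     if r.data_latency_seconds is not None]
        if latencies:
            stats["avg_data_latency_s"] = sum(latencies) / len(latencies)

        # API and SQL: average response time derived from start and end times.
        durations = [(r.end_time - r.start_time).total_seconds() for r in subset
                     if r.start_time is not None and r.end_time is not None]
        if durations:
            stats["avg_response_time_s"] = sum(durations) / len(durations)

        # API and SQL: total volume of data served.
        sizes = [r.size_served_bytes for r in subset
                 if r.size_served_bytes is not None]
        if sizes:
            stats["total_bytes_served"] = sum(sizes)

        summary[interface] = stats
    return summary


# Illustrative records only.
t0 = datetime(2024, 1, 1, 10, 0, 0)
summary = summarize_access([
    DataProductAccess("API", size_served_bytes=1000,
                      start_time=t0, end_time=datetime(2024, 1, 1, 10, 0, 2)),
    DataProductAccess("API", size_served_bytes=3000,
                      start_time=t0, end_time=datetime(2024, 1, 1, 10, 0, 4)),
    DataProductAccess("Streaming", data_latency_seconds=1.5),
    DataProductAccess("Streaming", data_latency_seconds=2.5),
])
```

Keeping the optional fields as `None` when they do not apply mirrors the rule that latency is filled only for streaming and size and timing only for API and SQL, so each interface reports just the statistics that are meaningful for it.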
Enterprise Architect — Data, Cloud and AI/ML