AI-driven Data Catalog

Problem

A data catalog is an inventory of datasets. It provides detailed descriptions of the data (i.e. metadata), such as when a dataset was created, to help users find and understand a relevant dataset. For many traders in financial services, a data catalog makes investment decision-making more data-driven and evidence-based.

Take Environmental, Social and Governance (ESG) data, for example. Effective metadata tagging enables financial institutions to evaluate which ESG dataset is better, understand the underlying methodology and weighted metrics, and make well-informed sustainable investments.

The emerging trend of socially responsible investing has brought a significant increase in the number of ESG vendors and the amount of ESG data on offer. For instance, there are more than 600 ESG rating frameworks, as reported by the think tank SustainAbility. There are thousands of datasets composed of thousands of fields, and each field may contain millions of attributes. This diversity brings complexity and metadata management challenges. Traditional rule-based metadata tagging is a manual and often laborious process that can no longer tag metadata efficiently and effectively at this scale.

80% of analysts' time is spent discovering & preparing data instead of driving insights.
- CrowdFlower Survey

Solution

An AI-powered data catalog can create metadata tags much faster and more accurately. AI can automatically discover, profile, organise and document metadata and make it easily searchable. As a result, financial institutions gain a holistic view of their usable data, and can better understand and compare data quality to choose the most appropriate dataset.
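To make the "profile, tag and search" idea concrete, here is a minimal Python sketch. It is not EvoML's API or method. The function names, the keyword table and the sample data are all hypothetical, and the hand-written `suggest_tags` rules only stand in for the trained model an AI-powered catalog would use.

```python
from collections import defaultdict
from datetime import datetime


def profile_field(values):
    """Summarise one field: inferred type, share of missing values, distinct count."""
    present = [v for v in values if v not in (None, "")]
    missing_ratio = 1 - len(present) / len(values) if values else 0.0

    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    def is_date(v):
        try:
            datetime.strptime(str(v), "%Y-%m-%d")
            return True
        except ValueError:
            return False

    if present and all(is_number(v) for v in present):
        inferred = "numeric"
    elif present and all(is_date(v) for v in present):
        inferred = "date"
    else:
        inferred = "text"
    return {"type": inferred, "missing_ratio": missing_ratio,
            "distinct": len(set(present))}


def suggest_tags(field_name, profile):
    """Placeholder for a trained tagging model (hypothetical keyword rules)."""
    tags = {profile["type"]}
    keywords = {"co2": "environment", "emission": "environment",
                "board": "governance", "diversity": "social"}
    for keyword, tag in keywords.items():
        if keyword in field_name.lower():
            tags.add(tag)
    if profile["missing_ratio"] > 0.2:
        tags.add("incomplete")
    return tags


def build_catalog(datasets):
    """Return per-field metadata plus a tag -> fields index used for search."""
    catalog, index = {}, defaultdict(set)
    for dataset_name, fields in datasets.items():
        for field_name, values in fields.items():
            profile = profile_field(values)
            tags = suggest_tags(field_name, profile)
            catalog[(dataset_name, field_name)] = {**profile, "tags": sorted(tags)}
            for tag in tags:
                index[tag].add((dataset_name, field_name))
    return catalog, index


# Hypothetical sample data, for illustration only.
datasets = {
    "vendor_a_esg": {
        "co2_emissions_tonnes": ["120.5", "98.1", "", "101.0"],
        "board_independence_pct": ["55", "60", "48", "52"],
        "report_date": ["2021-03-31", "2021-06-30", "2021-09-30", "2021-12-31"],
    }
}
catalog, index = build_catalog(datasets)
print(sorted(index["environment"]))
```

Looking up a tag in `index` returns every field that carries it, which is the search-and-discovery step in miniature. In a real catalog a learned model would replace the keyword table, so that tags are not limited to pre-defined rules.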

With EvoML, Data Governance Teams and ESG data vendors can quickly build AI models that automate data catalog processes based on their own frameworks, and easily integrate new datasets into their existing workflows. Furthermore, EvoML can automatically optimise existing models, keeping the data catalog adaptable and usable. These AI models can create critical tags for data that falls outside pre-defined information, and drive enhanced search and discovery.

Benefits

  • Faster Metadata Tagging

    AI enables automated tagging at a scale beyond human capability, accelerating the whole process and speeding up trading decisions. Furthermore, EvoML’s Evolutionary Optimisation can optimise models to run even faster, which speeds up tagging further.

  • Quicker Data Discovery

    AI enables a search-based data catalog process. Investment professionals can quickly discover the dataset that best addresses their investment questions among massive amounts of data.

  • Cost Saving

    An automated process reduces reliance on resources and saves the cost of inefficient manual labour.

EvoML can be used by:

  • Data Governance Teams
  • Tech Experts in ESG Data Vendors

Find out more use cases in data management