May 7, 2024

Setting a New Standard: ADI Introduces Groundbreaking LLM for Semantic Type Detection

90% of data remains underutilized because it is isolated, disconnected, or not of sufficient quality to be trusted. Understanding this data is essential to making it usable and to getting the most insight from it. Semantic data modeling, which organizes data structures according to the meaning of their data elements, is therefore becoming increasingly important. Despite the availability of various data catalog solutions, challenges persist: curated data is scarce, data curation requires extensive manual labor, and data usage and ownership are often ambiguous across departments.

Automated Entity Matching Solution by Automated Data Inc. (ADI)

ADI has developed an automated entity matching solution built on a Large Language Model (LLM) that detects and classifies over 50 different semantic types within data, regardless of where the data resides or how it is formatted. Compared with open-source LLMs and Generative Pre-trained Transformers (GPTs), ADI's proprietary LLM was significantly more accurate at predicting the semantic type and label of each data element.

Outperforming GPT-4 and Open-Source Models

The performance results are based on evaluations of ADI's LLM, GPT-4, and seven open-source zero-shot classification models on over 600 data elements. This report highlights the metrics for the best-performing zero-shot models. ADI's LLM identifies the correct semantic type of a data element with over 96% accuracy, compared with 81.3% for GPT-4 and 34.9% for the best open-source model. ADI's LLM also leads the other models when accuracy is measured separately for individual semantic types.
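To make the comparison concrete, here is a minimal sketch of how overall and per-type accuracy could be computed from a labeled evaluation set. The function name, the label strings, and the toy data are illustrative assumptions and are not taken from ADI's evaluation.

```python
from collections import defaultdict


def accuracy_report(gold, predicted):
    """Return (overall accuracy, per-type accuracy) for paired label lists.

    Hypothetical helper: `gold` holds the curated semantic type of each
    data element, `predicted` holds a model's output for the same element.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one labeled data element is required")

    totals = defaultdict(int)  # number of elements per gold semantic type
    hits = defaultdict(int)    # number of correct predictions per gold type
    for g, p in zip(gold, predicted):
        totals[g] += 1
        if g == p:
            hits[g] += 1

    overall = sum(hits.values()) / len(gold)
    per_type = {t: hits[t] / totals[t] for t in totals}
    return overall, per_type


# Toy labels for illustration only, not ADI's benchmark data.
gold = ["company", "person", "address", "company"]
predicted = ["company", "person", "company", "company"]
overall, per_type = accuracy_report(gold, predicted)
print(overall)
print(per_type)
```

Reporting per-type accuracy alongside the overall figure shows whether a model is uniformly strong or only strong on the most common semantic types.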

ADI's LLM is also the fastest of the models tested, completing semantic classification in less than 0.1 seconds per data element or column. In comparison, GPT-4 requires 4 seconds and the best open-source model takes 1.6 seconds per data element on average. Although speed was not the primary focus during the development of this model, it becomes critical when handling large volumes of data or when the system must operate in real time.
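As a rough illustration of why per-element latency matters at scale, the sketch below multiplies the latencies quoted above by a hypothetical workload. The workload size of 100,000 data elements and the assumption of strictly sequential processing are ours, and the 0.1-second figure is treated as an upper bound.

```python
# Per-element latencies in seconds, as quoted in the text above.
LATENCY_SECONDS = {
    "ADI LLM (upper bound)": 0.1,
    "GPT-4": 4.0,
    "best open-source model": 1.6,
}


def sequential_hours(num_elements, seconds_per_element):
    """Hours needed to classify num_elements one after another."""
    return num_elements * seconds_per_element / 3600


NUM_ELEMENTS = 100_000  # hypothetical workload size, chosen for illustration

for name, latency in LATENCY_SECONDS.items():
    hours = sequential_hours(NUM_ELEMENTS, latency)
    print(f"{name}: {hours:.1f} hours")
```

Under these assumptions the same job takes under 3 hours at 0.1 seconds per element, about 44 hours at 1.6 seconds, and about 111 hours at 4 seconds. Parallel execution would shorten all three, but the ratio between them stays the same.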

A Powerful Language Model for Semantic Classification

ADI leverages its deep semantic understanding of data to automatically detect semantically overlapping data elements across data sources, enabling effective entity matching and linkage. These entities typically include, but are not limited to, companies, people, addresses, financial instruments, and brands or products. The matching engine can identify over 50 semantic types and match entities across diverse datasets, even when they are stored in different systems such as Snowflake, Databricks, and Excel spreadsheets.
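The idea of using semantic types to find overlapping data elements can be sketched as follows. This is not ADI's matching engine: it assumes a classifier has already assigned a semantic type to each column, and the source names, column names, and the `candidate_links` function are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_links(column_types):
    """Pair up columns from different sources that share a semantic type.

    column_types maps (source, column) -> semantic type. The result is a
    list of (semantic_type, column_a, column_b) tuples, which are only
    candidates for entity matching, not confirmed matches.
    """
    by_type = defaultdict(list)
    for column, semantic_type in column_types.items():
        by_type[semantic_type].append(column)

    links = []
    for semantic_type, columns in by_type.items():
        for a, b in combinations(sorted(columns), 2):
            if a[0] != b[0]:  # skip pairs that live in the same source
                links.append((semantic_type, a, b))
    return links


# Hypothetical classifier output for columns in three systems.
column_types = {
    ("snowflake.customers", "cust_name"): "company",
    ("databricks.vendors", "vendor"): "company",
    ("excel.contacts", "full_name"): "person",
    ("snowflake.customers", "hq_address"): "address",
    ("excel.contacts", "mailing_addr"): "address",
}

for link in candidate_links(column_types):
    print(link)
```

Grouping by semantic type first narrows the search to columns that could plausibly describe the same kind of entity. Comparing the actual values in those columns is a separate step that this sketch does not cover.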

Unlocking Data Value with Automated Semantic Understanding

The challenges of siloed, unconnected, and low-quality data have long prevented organizations from fully leveraging their data assets. However, ADI's innovative LLM presents a game-changing solution. Its rapid processing capabilities and ability to seamlessly identify and match entities across disparate data sources make it an invaluable tool for organizations aiming to dismantle data silos, streamline entity resolution, and unlock unprecedented insights from their data. With ADI's approach, the journey towards true data democratization and enhanced data value is now more direct and faster than ever.

Learn more about how you can unlock clarity and accuracy with your data. Contact us today for a free trial.