Every data practitioner knows that data quality is of the utmost importance, hence the prevalent expression “garbage in, garbage out.” Characterizing the noise-to-signal ratio of data is difficult in general, and data quality problems are particularly acute when working with natural language data, which can contain sparse information and lack context or substance. Traditional approaches to assessing the quality of natural language data rely on problematic heuristics, such as character counts or entropy-based measures, which do not directly capture whether a message can be understood on its own. This talk will introduce a model-based approach to measuring the quality of a natural language message, which we call InfoQ. This model addresses a range of use cases, from improving the quality of data used for model training to ensuring that only the most valuable data is scored by targeted models or included in context windows for generative AI.
Key topics to be covered:
This talk is intended for data industry professionals working in the natural language space who are interested in learning how to develop their own solutions for detecting meaningful message data and improving the quality of their targeted or generative AI models.
Nicole Basinski is a Data Scientist on the Behavioral Intelligence team at Aware, a SaaS startup founded in Columbus, Ohio. She has industry experience developing product-driven machine learning models in the natural language space and handling massive datasets. Nicole obtained a BS in Data Analytics from The Ohio State University in 2021 and completed the Erdos Institute Data Science Bootcamp in 2022. She has also been involved with the DataConnect conference for three years and is honored to be a part of the WIA community.
As a Senior Product Manager at Aware, I lead the ideation, development, and delivery of our Behavioral Insights platform for Employee Experience & Listening, which leverages AI, NLP, and ML to analyze and surface insights from unstructured text generated by employees. I have deep experience building and launching innovative 0-to-1 products, including generative AI products, that transform massive amounts of data to enable better decision-making for leaders and organizations. I'm passionate about creating responsible AI solutions that distill meaningful insights for the enterprise and about building products that enrich end users' workflows.