Apr 24, 2023

Using AI to Translate Human Language into Data. An Interview with Data Scientist Dr. Melodie Du

The rush to develop AI is leading to some fascinating applications. Take, for example, Natural Language Processing (NLP), a subfield of AI that, according to Inuvo’s Data Scientist, Dr. Melodie Du, deals with the interaction between computers and human language, enabling computers to interpret human language and then generate data from that interpretation.

Her company is expanding on the concept of NLP by creating a language-model-based, generative AI capable of identifying the words associated with the audience for any product, service, or brand. “The result is the ability to reach those audiences utilizing our AI systems without requiring any client or third-party data,” she explained.

Charlene Weisler: How does Inuvo’s NLP facilitate consumer intent measurement?

Dr. Melodie Du: We convert the entire open web into an interconnected language model of what we call intent signals. Then we assign categories, sentiment, and deduced demographic information to these signals based on a series of interconnected AI systems. This means our AI breaks down, in real time, any piece of information into its core signals and then aggregates them to determine whether it matches the custom intent models our AI builds for each client’s product, service, or brand. This approach allows for meaningful analysis of all ad impressions even when there isn’t a cookie or an ID-based profile available, including impressions from Safari.
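As a purely illustrative sketch (not Inuvo’s production system), the core idea of matching a page’s extracted signals against a client intent model can be pictured in a few lines of Python; the tokenizer, the sample page, and the client_intent_model set here are all hypothetical:

```python
# Hypothetical sketch of the matching step described above, not Inuvo's production
# system: break a page into crude "signals" (lowercased tokens) and score how well
# they overlap with a hand-built client intent model.
from collections import Counter

def extract_signals(text: str) -> Counter:
    # Very crude signal extraction: count words longer than three characters.
    return Counter(word for word in text.lower().split() if len(word) > 3)

# Hypothetical client intent model for a running-shoe brand.
client_intent_model = {"trail", "running", "shoes", "marathon"}

page = "Review of the best trail running shoes for your first marathon season"
signals = extract_signals(page)

# Score: fraction of the client's intent terms that appear among the page's signals.
score = sum(term in signals for term in client_intent_model) / len(client_intent_model)
print(f"intent match score: {score:.2f}")  # 1.00 for this page
```

In practice, Du describes the signals as coming from interconnected AI systems rather than simple word counts; the sketch only shows the matching step.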

Weisler: You speak of vectorization. What is that?

Du: In NLP, vectorization refers to the process of converting text data into numerical vectors or arrays that can be processed by machine-learning algorithms. Vectorization is a critical step in many NLP tasks, such as text classification, sentiment analysis, and language modeling.

Vectorization is essential in NLP because it enables machine-learning algorithms to process and analyze textual data, which is typically unstructured and difficult for computers to understand. By converting text into numerical vectors, machine-learning algorithms can more readily identify patterns and relationships in the data and make predictions or classifications based on them.
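To make the idea concrete, here is a minimal, generic example using scikit-learn’s TF-IDF vectorizer, one common vectorization technique (not necessarily the one Inuvo uses); the sample documents are invented:

```python
# Minimal, generic TF-IDF vectorization with scikit-learn; the documents are
# invented and this is not necessarily the technique Inuvo uses.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "running shoes for trail marathons",
    "best hiking boots for rocky terrain",
    "sprint interval training plan",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)   # sparse matrix: one numerical vector per document

print(X.shape)                       # (3, number of unique terms in the corpus)
print(vectorizer.get_feature_names_out())
```

Each document becomes a sparse numerical vector whose entries weight how distinctive each term is, which downstream classifiers can consume directly.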

Weisler: How does it meet or improve measurement in a privacy-compliant manner?

Du: Without cookies, attribution on a per-source basis is difficult for anyone, especially in channels like display and CTV, where the majority of conversions are not associated with clicks and therefore cannot carry click IDs. We’ve approached this with a different kind of machine learning, an approach we call our Inuvo Media Mix Models. By analyzing the constant variations in spend across traffic and spend sources, our AI is able to probabilistically determine which channels and tactics are driving conversions, totally agnostic of cookies and identifiers.
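The details of Inuvo’s Media Mix Models aren’t public, but the general family of techniques can be sketched as a regression of conversions on per-channel spend; the channel names, spend figures, and effect sizes below are invented for illustration:

```python
# Illustrative sketch only, not the Inuvo Media Mix Models: regress simulated daily
# conversions on per-channel spend to estimate each channel's contribution, with no
# cookies or identifiers involved. Channel names and numbers are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
days = 90
spend = rng.uniform(100, 1000, size=(days, 3))     # columns: display, CTV, search
true_effect = np.array([0.02, 0.01, 0.05])         # conversions per dollar (made up)
conversions = spend @ true_effect + rng.normal(0, 2, size=days)

model = LinearRegression().fit(spend, conversions)
for channel, coef in zip(["display", "CTV", "search"], model.coef_):
    print(f"{channel}: ~{coef:.3f} estimated conversions per dollar")
```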

Weisler: Can you give me an example of how this all works?

Du: When conducting a contextual analysis of a web page, hundreds or thousands of signals may be returned, but most of them are usually not related to the main topic. We use a combination of methods, including frequency-based probabilities, concept-graph weights, taxonomy, and vectorization of the concepts, to filter out irrelevant signals and suggest relevant ones that may not be explicitly mentioned in the text. This approach can be especially valuable in a consumer-privacy-first world devoid of cookies, where inferring user intent from a single page visit is crucial. By using these methods and our language model, our AI can more accurately infer user intent and suggest relevant content, even if it is not explicitly mentioned on the page.
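One generic way to picture the filtering step (again, a hypothetical sketch rather than Inuvo’s method) is to score each candidate signal by its vector similarity to the page and drop the ones that fall below a threshold; the page text, candidate signals, and 0.1 cutoff here are all made up:

```python
# Hypothetical sketch of similarity-based signal filtering, not Inuvo's method:
# keep only candidate signals whose TF-IDF vectors are sufficiently close to the
# page's vector. The page text, signals, and 0.1 threshold are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

page = "Comparing lightweight trail running shoes and their cushioning for long races"
candidate_signals = ["trail running", "shoe cushioning", "stock market", "long races"]

vectorizer = TfidfVectorizer().fit([page] + candidate_signals)
page_vec = vectorizer.transform([page])
signal_vecs = vectorizer.transform(candidate_signals)

scores = cosine_similarity(signal_vecs, page_vec).ravel()
relevant = [s for s, score in zip(candidate_signals, scores) if score > 0.1]
print(relevant)  # "stock market" is filtered out as unrelated to the page
```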

Weisler: What are the challenges with these methods (NLP and vectorization)?

Du: One of the main challenges of NLP is the ambiguity and complexity of natural language. Natural language is full of nuances, idioms, expressions, and sarcasm, which are difficult for machines to understand. The meaning of a sentence can vary depending on the context, and sometimes even subtle changes in wording can alter its meaning. The challenge of vectorization is to represent complex data in a way that can be processed efficiently by algorithms, and the difficulty there is scale: when you’re dealing with high-dimensional data, such as text, the number of features can be in the millions or billions. Based on Inuvo's domain expertise, we developed our own model to make this procedure less time-consuming while maintaining compactness and precision.
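Inuvo’s own model is proprietary, but a standard, generic way to keep text features compact as vocabulary grows is the hashing trick, which caps dimensionality up front; the sketch below is illustrative only:

```python
# Generic illustration of keeping text features compact at scale, not Inuvo's
# proprietary model: the hashing trick fixes the feature space at 4,096 dimensions
# no matter how large the vocabulary grows.
from sklearn.feature_extraction.text import HashingVectorizer

docs = [
    "millions of possible tokens appear across the open web",
    "a fixed-size vector keeps downstream processing tractable",
]

vectorizer = HashingVectorizer(n_features=2**12)   # no vocabulary is stored
X = vectorizer.transform(docs)
print(X.shape)                                     # (2, 4096)
```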

Weisler: How can these methods be best implemented?

Du: Our team has implemented custom pipelines for scaling our AI algorithms to handle big data, using technologies like Spark, Hadoop, and MapReduce, and has implemented our models in both Java and Python. We hold several key patents for these components.
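For a flavor of what a distributed text-featurization step can look like on Spark (a hypothetical PySpark sketch, not Inuvo’s patented pipeline; the app name, sample rows, and feature size are invented):

```python
# Hypothetical PySpark sketch, not Inuvo's patented pipeline: distribute
# tokenization and feature hashing over a DataFrame of page text.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF

spark = SparkSession.builder.appName("signal-features").getOrCreate()
df = spark.createDataFrame(
    [("page about trail running shoes",),
     ("page about connected tv advertising",)],
    ["text"],
)

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="tokens"),
    HashingTF(inputCol="tokens", outputCol="features", numFeatures=1 << 12),
])
pipeline.fit(df).transform(df).select("features").show(truncate=False)
```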

Weisler: Is anyone else doing this?

Du: No. We are not aware of any company that has created a similar AI system. Inuvo has an end-to-end advertising solution that leverages AI language models at its core to generate audience insights, deliver media and determine attribution across channels. Others might have pieces or parts, but no one is able to generate audience models purely from an AI.

Weisler: How can this be rolled out in scale?

Du: Inuvo has invested a significant amount of resources and capital to build out a world-class infrastructure that allows us to process the open web’s daily traffic signals across North America and have our AI regenerate its learning from all that data at a breakneck pace: every five minutes. Our AI is then able to create custom client intent models in real time to ensure the client’s message reaches its intended audience at the right time and before others.

This article first appeared in Mediapost.

 
