The question of data quality is becoming critical. Which datasets are useful and accurate, and which might lead us to false or misleading conclusions? Scott McKinley, CEO of Truthset, has set out to clarify and define the term "quality data" and set the standard for its future usage. "The world increasingly runs on person-level data," he explained. "Marketing, advertising, personalization, dynamic pricing, and customer analytics all rely heavily on person-level data to power every interaction between a consumer-facing business and its customers and prospects. But all that data has substantial error, which hurts performance and costs businesses money."
Charlene Weisler: What does your company do?
Scott McKinley: Truthset is the first company to measure the accuracy of record-level consumer data at scale, so that companies using record-level data for any use case can understand the quality of that data, make better decisions, and improve the performance of any data-driven activity. We believe that bringing accuracy to the data ecosystem will help all boats rise. Data providers, data marketplaces and platforms, marketers, and even consumers will benefit from improving accuracy in the data used to understand, profile, target, and activate.
Weisler: What is your definition of quality data?
McKinley: There are many aspects to data quality. We focus exclusively on the accuracy of record-level data. Accuracy is not binary, so we have developed a method to measure the likelihood that a given key-value pair is true on a spectrum of 0.00 to 0.99. As an example, if a data provider makes an assertion as to the gender of a specific ID, we can measure the likelihood that their assertion is true. A higher likelihood of truth equates to higher quality. Our mantra is that there is no perfect data set out there, and that even a good data set can have portions that are not good. Data sets can be measured at both the aggregate and record level, so that buyers can compare entire data sets and users can pick their own level of accuracy, because scale and price are still highly relevant in decisions about data purchases. For data sellers/providers, we measure the accuracy of their data at the record level and provide both an absolute metric that the provider can use internally and a relative index that can be used externally to help them differentiate in a sales environment.
Weisler: Does quality data vary by company, use, timing etc? How do you manage the changeable nature of what is actually quality data?
McKinley: Absolutely. From a data provider standpoint, they certainly believe that data quality varies, and many sell on that point to differentiate themselves from competitors. The tough part, up to this point, is that they haven't had an independent third party to back them up. We created our Truthscore Index to give the market a relative look at how data providers are performing comparatively, and an easier way to say they are X% better than the average score. It is very much the same on the data buyer/marketer side. Everyone has their own threshold for what level of data quality they will accept. Throughout the year, campaign by campaign, marketers choose different segments at varying granularity or scale, and with each, they have to make individual decisions, balancing scale, their budget, and now the quality of the data. With Truthset scoring at the record level, the marketer/data buyer can experiment with the level of quality that meets their needs and feel confident that they are getting what they are paying for. Regarding the changing nature of data, we require that every data provider we work with be measured quarterly to ensure that we always have the freshest possible data.
Weisler: Does quality data span first to third party?
McKinley: The measurement and rating of data quality that we are focused on is record-level data. This means that whether the data is zero-party, first-party, third-party, or anywhere in between, collected via SDKs or pixels, by aggregating other data sources, CRM files, etc., we can reliably assign a Truthscore to the record-level data so that the owner, buyer, or user of that data can understand the accuracy of each record. Truthscores are a numerical value between 0.00 and 1.00 that quantifies the probability that consumer-level data is truthful and accurate.
Weisler: What data sets do you process and vet?
McKinley: We have built what may be the largest cooperative of consumer data with leading data providers, and we use that data to compile the most accurate view of demographic assignments for most of the US population. Today, Truthset keys off of hashed emails and the demographic attributes that describe those records. For example, a record would have my email address (hashed, of course) with "female, age 35-44, Hispanic, etc." as the descriptors. We evaluate whether each one of those attribute values is correctly assigned to that hashed email. To do this, we work with multiple data providers and have them all contribute a weighted vote on whether that association is correct. The weighting in their vote comes from comparing each data provider to validation sets (these are our "truth sets"), giving each data provider, at each attribute value, a weighted vote. All of the data providers come back together to vote, giving each record a Truthscore (0.0-1.0). We then offer the market an index of these Truthscores, data provider by data provider, attribute value by attribute value, to enable the relative comparisons that we talked about earlier.
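The weighted-vote mechanism described above can be sketched as follows. This is an illustrative simplification, not Truthset's actual methodology: the provider names, weights, and votes are invented, and the real system presumably derives weights per attribute value from its validation sets.

```python
def truthscore(votes: dict[str, bool], weights: dict[str, float]) -> float:
    """Combine each provider's yes/no vote on an attribute assertion,
    weighted by that provider's accuracy against a validation set.
    Returns a score in [0.0, 1.0] (hypothetical sketch)."""
    total = sum(weights[p] for p in votes)
    if total == 0:
        return 0.0
    # Sum the weights of providers who agree the assertion is true.
    agree = sum(weights[p] for p, vote in votes.items() if vote)
    return round(agree / total, 2)

# Three hypothetical providers vote on whether a hashed email
# is correctly labeled "female, age 35-44":
weights = {"provider_a": 0.90, "provider_b": 0.60, "provider_c": 0.75}
votes = {"provider_a": True, "provider_b": True, "provider_c": False}
print(truthscore(votes, weights))  # 0.67
```

Here a more historically accurate provider (higher weight) moves the score more than a less accurate one, which matches the article's point that votes are weighted by performance against the "truth sets."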
Weisler: How can quality data best be used by a company?
McKinley: In so many ways. The desire to use sets of consumer IDs combined with attributes ("audiences") is expanding as more companies rely on data to inform processes such as marketing and advertising, offering financial services, attracting tenants for real estate, or recruiting talent. One example is digital advertising. A major alcoholic beverage campaign may be targeting Hispanic beer drinkers. We have seen that many of the IDs that end up being targeted are either under the legal drinking age or not Hispanic. By applying Truthscores to that ID pool before the ads are delivered, the brand can suppress unqualified IDs and make sure their advertising dollars are spent on IDs that are actually in the target and might actually convert.
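The suppression step in that campaign example amounts to filtering an ID pool by score thresholds before activation. A minimal sketch, assuming hypothetical field names and a 0.8 accuracy floor (the article does not specify how scores are keyed or what threshold a campaign would choose):

```python
# Hypothetical pre-delivery ID pool with per-attribute Truthscores.
target_pool = [
    {"id": "h1", "score_hispanic": 0.92, "score_21_plus": 0.95},
    {"id": "h2", "score_hispanic": 0.40, "score_21_plus": 0.99},
    {"id": "h3", "score_hispanic": 0.88, "score_21_plus": 0.55},
]

THRESHOLD = 0.8  # the campaign's chosen accuracy floor (assumed)

# Keep only IDs likely to be both Hispanic and of legal drinking age;
# suppress the rest before any ad spend.
qualified = [
    rec["id"] for rec in target_pool
    if rec["score_hispanic"] >= THRESHOLD
    and rec["score_21_plus"] >= THRESHOLD
]
print(qualified)  # ['h1']
```

This is the trade-off McKinley describes: raising the threshold shrinks the pool (less scale) but raises the expected share of in-target impressions.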
Another example is data enrichment. Many large enterprises acquire third-party demographic data to append to their CRM records. We have seen error rates of up to 40% in commodity demo data, even for the most common demographics. That error causes the enterprise to draw incorrect conclusions about its customers and leads to waste in advertising, as target ID pools are built on incorrect data. Truthset can "filter" the appended file so the enterprise can understand the probability of each third-party record being accurate, and use that information to hold the provider accountable, negotiate pricing based on accuracy, and suppress incorrect records. Better data leads to better results for all downstream use cases.
Weisler: How can you track the use of data through a process to see if at any point the data becomes compromised?
McKinley: This is another great reason why Truthset was created. A number of us have been at companies that specialize in identity, have used data science to transform data, or have bought data and inventory based on one assumption, only to learn after the fact that measurement told a different story. It's not just that data can be good or not-so-good; in fact, a number of data providers we work with have great data at scale, but as it moves through "the hops," accuracy can be degraded (or improved, in some cases!). Truthset believes we should be inserted at every hop, to ensure transparency into whether what occurred improved or maintained the quality of the records handled.
Weisler: Should the industry have a standard for data quality and if so how to implement and monitor? Or is it not possible given all of the walled gardens and silos?
McKinley: This is a really great question, and one that hits on so many things. First, yes. In order to make things better, you have to measure and understand, as well as have ways to continue to improve. In our estimation, to be successful, a data quality measurement solution needs to hit upon six key points:
- Independent, unbiased, unconflicted
- 100% transparent methodology
- Agnostic to ID spaces, attributes, and marketing channels
- Prescriptive, measuring BEFORE and not after-the-fact
- Supports any data-driven use cases
- The phrase “all boats should rise” should be true; the entire advertising ecosystem should benefit from transparency in data accuracy
Second, as we drive toward a consumer-privacy-first ID space, even one that may be fragmented, Truthset's focus on record-level data means we can score whenever and wherever these IDs exist. It's one of the reasons we chose to start with hashed emails. Lastly, markets run better with measurement. When there is opacity and uncertainty in any market, there is friction. Standards bring transparency between buyers and sellers and remove friction, so the market can grow faster.
Weisler: Has the pandemic created issues with data quality?
McKinley: The pandemic has created opportunity with data quality. Now, more than ever, the data driving marketing decisions has to be accurate, both to be efficient with spend and to ensure a good customer experience continues. As people's individual lives change, targeting someone with a past income range they may no longer have will waste budget and leave that customer with a negative feeling toward the brand.
Weisler: Can we rely on COVID-19 tracking data for the US - is it of high quality?
McKinley: While we aren't engaged in the COVID-19 data tracking space, the situation does underscore that transparency and granularity in data (within consumer privacy guidelines) are critical to making decisions and acting in the best interest of creating solutions. Also, measurement of data is crucial to determining whether there is actual success. Much like understanding COVID's daily funnel of metrics (tests, cases, hospitalizations, recoveries, and deaths), setting a benchmark for data quality via an independent third party and continuing to track and measure is important to determining whether actions are resulting in success. We have encountered a number of data providers who are about to go through data cleansing, bringing on new sources, or other data science transformations, and some have hesitated to be measured at this point because they want their best data (assumed to be post-cleansing) to be scored. Our response is that they should measure twice: be scored before the changes, make the change, and score afterwards.
Weisler: Where is Truthset now and where do you see the company in 2 years?
McKinley: We want to become the standard for how buyers, sellers, and users of consumer data measure the accuracy of record-level data. The Truthset flywheel starts with data cooperation as the first step. We are squarely focused on bringing accuracy to the marketplace for hashed email addresses as consumer IDs and the attributes that describe those records. Next, we want to help brands and enterprises understand how bad data hurts marketing performance. We are already engaged with major CPGs and will be producing case studies to demonstrate the cost of error in data, and how we can help fix it.
In the upcoming months (this year and early next year), we'll be working with leading sell-side and buy-side companies in the ad ecosystem to bring data accuracy scoring into the equation. We want Truthscores to be available wherever data is available to build audiences and activate. From there, we expand to additional attributes (interest-based segments, purchase, etc.), move into new environments like CTV, and push further into the programmatic exchange of buying and selling.
This article first appeared on www.Mediapost.com.