
Accuracy in big data isn’t a single number; it’s context-dependent, metric-specific, and only meaningful when validated against the purpose it’s meant to serve.
In today’s connected world, traffic engineers and agencies have access to more data than ever, and much of this information comes from probe-based datasets, such as Floating Car Data (FCD). These sources deliver unprecedented visibility into real-world traffic conditions at scale.
However, access to more data does not automatically resolve the long-standing question of accuracy: how closely a dataset reflects the true traffic conditions on the road. Accuracy is often treated as a single, absolute metric, something a dataset either has or doesn’t.
To unpack what accuracy really means, we recently hosted the webinar, Big Data That Delivers: Accuracy, Governance, and Agency Best Practices, featuring Craig Smith (TomTom), Jesse Coleman (City of Toronto), Jesse Newberry (HNTB), and Matthew Konski (Altitude by Geotab).
The discussion made one point unmistakably clear: accuracy is nuanced and contextual; it depends on what you’re measuring, the problem you’re trying to solve, and how well you understand the data behind the chart.
Why accuracy in big data isn’t one thing
This context dependence is especially important with Floating Car Data. Each metric, such as speed, travel time, volume, and origin-destination flows, has its own determinants of accuracy, shaped by factors such as data sources, sampling rates, penetration, and modeling methods.
But understanding what influences accuracy is only half of the equation. The more important question is: What level of accuracy is actually required? Accuracy requirements differ widely depending on the metric and the purpose.
In other words, accuracy isn’t a single universal standard. It varies by context, by use case, and by the decision being supported, and it must be established through transparency, validation, and fit-for-purpose evaluation.
Accuracy means something different for every metric
One of the clearest takeaways from the panel was that accuracy requirements vary depending on the use case. What counts as accurate for a given metric depends on how the data is collected, processed, and modeled, and on what the analysis asks of it.
Let’s look at an example involving different metrics on different road types: freeway travel-time monitoring versus turning-movement estimation at urban intersections. Both analyses rely on the same underlying data attribute, ping frequency. But the required level of that attribute differs. On highways, minute-level data may be sufficient to capture stable travel-time patterns. At urban intersections, however, estimating turning movements requires much higher-frequency, second-by-second data to reflect stop-and-go behaviour, short movements, and rapid fluctuations.
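To make the idea concrete, here is a rough sketch in Python of how an analyst might check whether a probe dataset’s ping frequency is fit for a given use case. The column names ('vehicle_id', 'timestamp') and the thresholds are hypothetical placeholders, not values from the webinar or any specific provider.

```python
import pandas as pd

# Hypothetical maximum median ping interval (in seconds) that each analysis
# can tolerate; real requirements would come from the agency's own evaluation.
MAX_PING_INTERVAL_S = {
    "freeway_travel_time": 60,            # minute-level pings may be enough
    "intersection_turning_movement": 5,   # needs near second-by-second traces
}

def median_ping_interval(pings: pd.DataFrame) -> float:
    """Median gap (seconds) between consecutive pings of the same vehicle."""
    pings = pings.sort_values(["vehicle_id", "timestamp"])
    gaps = pings.groupby("vehicle_id")["timestamp"].diff().dt.total_seconds()
    return gaps.median()

def is_fit_for_purpose(pings: pd.DataFrame, use_case: str) -> bool:
    """True if the dataset's ping frequency meets the use case's requirement."""
    return median_ping_interval(pings) <= MAX_PING_INTERVAL_S[use_case]

# Toy example: probes reporting roughly every 30 seconds
pings = pd.DataFrame({
    "vehicle_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2024-05-01 08:00:00", "2024-05-01 08:00:30", "2024-05-01 08:01:00",
        "2024-05-01 08:00:10", "2024-05-01 08:00:40",
    ]),
})
print(is_fit_for_purpose(pings, "freeway_travel_time"))            # True
print(is_fit_for_purpose(pings, "intersection_turning_movement"))  # False
```

The same dataset passes one test and fails the other, which is exactly the point: fitness depends on the question being asked.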
The same principle applies across the board: every metric, from speeds and volumes to origin-destination flows, comes with its own determinants of accuracy and its own threshold for what is good enough.

Data accuracy requirement is determined by the purpose
Beyond the technical determinants of accuracy, the required level of accuracy is also shaped by the purpose of the analysis. Different decisions demand different levels of precision. For example, if an agency is responding to a citizen complaint about suspected speeding on a neighbourhood street, a few days of historical speed and volume data may provide sufficient evidence to evaluate the concern and guide an operational response.
In contrast, a Transportation Master Plan carries long-term implications, often shaping policy and investment decisions involving tens of millions of dollars in infrastructure spending for the next decade. In this context, agencies require far more rigorous inputs, such as high temporal coverage and carefully calibrated models capable of generating reliable, network-wide volume estimates. The stakes are higher, the analysis is broader, and the tolerance for uncertainty is much lower.
In other words, accuracy requirements scale with the importance, impact, and financial consequences of the decision being made. Fit-for-purpose accuracy, not maximum accuracy in all cases, is what ensures data is used responsibly and effectively.
Best practices: accuracy comes from repeatable validation, not one-off checks
Building on this foundation, accuracy is not something achieved once; it must be maintained continually. Trust in big data, particularly when working with Floating Car Data, grows when validation becomes a routine part of the workflow rather than an exception.
Here are five best practices transportation agencies can adopt to embrace big data responsibly and confidently.
1. Embed quality checks directly into your data pipelines
Accuracy isn’t a one-time verification; it’s a continuous discipline. The City of Toronto’s data science team, for example, builds automated quality checks into its sensor data pipelines, monitors feeds for anomalies, and recalibrates permanent counters every six months. Quality is sustained through process, not assumptions.
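As a minimal illustration of what an automated check might look like, the Python sketch below validates an hourly count feed for completeness, plausibility, and stuck-sensor behaviour. The schema ('hour', 'count') and the limits are illustrative assumptions, not the City of Toronto’s actual pipeline.

```python
import pandas as pd

def quality_check_hourly_counts(df: pd.DataFrame,
                                max_plausible_count: int = 5000) -> list[str]:
    """Return a list of issues found in an hourly sensor-count feed.

    Assumes columns 'hour' (datetime) and 'count'. An empty list means
    the feed passed these basic checks.
    """
    issues = []

    # 1. Completeness: every hour between the first and last record is present.
    expected = pd.date_range(df["hour"].min(), df["hour"].max(), freq="h")
    missing = expected.difference(pd.DatetimeIndex(df["hour"]))
    if len(missing) > 0:
        issues.append(f"{len(missing)} missing hourly intervals")

    # 2. Plausibility: counts must be non-negative and below a sanity ceiling.
    implausible = df[(df["count"] < 0) | (df["count"] > max_plausible_count)]
    if not implausible.empty:
        issues.append(f"{len(implausible)} records with implausible counts")

    # 3. Flatlines: long runs of identical values often indicate a stuck sensor.
    run_ids = (df["count"] != df["count"].shift()).cumsum()
    longest_run = df.groupby(run_ids)["count"].transform("size").max()
    if longest_run >= 6:
        issues.append("possible stuck sensor: 6+ identical consecutive counts")

    return issues
```

Checks like these can run on every refresh of the feed, so problems surface as alerts instead of as surprises during analysis.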
2. Corroborate across datasets
Corroborating probe-based insights with independent data sources is a reliable way to validate accuracy. Agencies should regularly compare probe data with counts and speeds from pneumatic road tubes, embedded loop detectors, and Bluetooth, camera, and radar sensors installed permanently or temporarily, to confirm patterns and flag discrepancies. Consistency across datasets reinforces trust in the results.
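A hedged sketch of such a comparison is shown below: it joins probe-based segment speeds with speeds from agency sensors and summarizes the differences. The column names and the 10 percent band are illustrative assumptions rather than a standard.

```python
import pandas as pd

def corroborate_speeds(probe: pd.DataFrame, sensor: pd.DataFrame) -> pd.Series:
    """Compare probe-based speeds against independent sensor speeds.

    Both frames are assumed to share 'segment_id', 'hour', and 'speed_kph'
    columns (hypothetical schema). Returns summary statistics of the gaps.
    """
    merged = probe.merge(sensor, on=["segment_id", "hour"],
                         suffixes=("_probe", "_sensor"))
    diff = merged["speed_kph_probe"] - merged["speed_kph_sensor"]
    pct_error = diff.abs() / merged["speed_kph_sensor"] * 100

    return pd.Series({
        "matched_observations": len(merged),
        "mean_bias_kph": diff.mean(),                    # systematic over/underestimation
        "mape_percent": pct_error.mean(),                # average relative error
        "share_within_10pct": (pct_error <= 10).mean(),  # fraction of close matches
    })
```

Running the same comparison on every new data delivery turns corroboration into a habit rather than a one-off exercise.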
3. Define your thresholds up front
Accuracy is only meaningful when agencies are clear about what “good enough” looks like. Establishing acceptance thresholds (what deviation is tolerable, which conditions break the metric, and what level of confidence decision-makers require) creates a shared standard for evaluating data. Without defined thresholds, quality becomes subjective; with them, it becomes measurable and manageable.
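Written down as configuration, those thresholds also become something a pipeline can enforce automatically. The sketch below pairs naturally with a corroboration summary like the one above; every number in it is a made-up placeholder that each agency would set for itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AcceptanceThresholds:
    """Agency-defined acceptance criteria for a probe-data metric (illustrative)."""
    max_mape_percent: float        # tolerable mean absolute percentage error
    min_matched_observations: int  # minimum matched probe/sensor observations
    min_share_within_10pct: float  # required fraction of close matches

# Example: planning-grade volumes demand tighter criteria than routine
# operational speed checks (values are placeholders only).
OPERATIONAL_SPEEDS = AcceptanceThresholds(15.0, 20, 0.70)
PLANNING_VOLUMES = AcceptanceThresholds(10.0, 200, 0.85)

def passes(summary: dict, t: AcceptanceThresholds) -> bool:
    """Check a validation summary against the agreed thresholds."""
    return (summary["mape_percent"] <= t.max_mape_percent
            and summary["matched_observations"] >= t.min_matched_observations
            and summary["share_within_10pct"] >= t.min_share_within_10pct)
```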
4. Document data lineage and metadata
Knowing who owns the data, where it comes from, and what each field represents may sound basic, but it becomes essential in large organizations where many teams rely on the same datasets. Clear documentation of data lineage and metadata builds internal trust, reduces misinterpretation, and ensures analysts understand the context and limitations behind every metric.
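One lightweight way to do this is to attach a small lineage record to every shared dataset. The structure below is purely illustrative, with made-up names and fields; the point is that ownership, source, and field meanings are written down somewhere analysts can find them.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetLineage:
    """Minimal lineage and metadata record for a shared dataset (illustrative only)."""
    name: str
    owner: str               # team accountable for the dataset
    source: str              # where the raw data comes from
    refresh_cadence: str     # how often it is updated
    field_definitions: dict  # column name -> plain-language meaning
    known_limitations: list = field(default_factory=list)

# Hypothetical example entry
probe_speeds = DatasetLineage(
    name="segment_speeds_hourly",
    owner="Traffic Data Science",
    source="FCD provider feed, map-matched to the agency's road network",
    refresh_cadence="daily",
    field_definitions={
        "segment_id": "agency network segment identifier",
        "speed_kph": "mean of probe speeds over the hour",
    },
    known_limitations=["small samples on local streets overnight"],
)
```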
5. Pre-validate your data providers
Agencies no longer accept vague claims of “accuracy”; they expect transparent, defensible validation. That includes clear visibility into sample size and penetration, data frequency, vehicle composition, what is measured versus modeled, independent validation studies, and a willingness to benchmark against agency sensors.
As the industry evolves and customers demand accountability, data providers must become increasingly open. Once a provider has been thoroughly validated, revalidating for every use case may not be necessary. Integrating a trusted, well-understood data provider seamlessly into agency workflows reduces friction, saves time, and accelerates how insights reach decision-makers.

Conclusion
Accuracy in big data is not a fixed number or a one-time achievement; it is the outcome of deliberate validation by both data providers and end users. Agencies that invest the time to build and follow this process will ultimately make the most confident and defensible data-driven decisions.
For a deeper look at what agencies should expect from data providers and how to assess data transparency and credibility, read Top 6 questions to ask a data provider and why they matter.
If you’re interested in a deeper discussion, including real-world examples, agency perspectives, and insights from TomTom, Altitude by Geotab, the City of Toronto, HNTB, and SMATS, watch the full webinar recording here.
