Why accuracy in big data isn’t one thing – and 5 best practices for validating it

Accuracy in big data isn’t a single number; it’s context-dependent, metric-specific, and only meaningful when validated against the purpose it’s meant to serve.

SMATS Traffic Solutions

In today’s connected world, traffic engineers and agencies have access to more data than ever, and much of this information comes from probe-based datasets, such as Floating Car Data (FCD). These sources deliver unprecedented visibility into real-world traffic conditions at scale.

However, access to more data does not automatically resolve the long-standing question of accuracy: how closely a dataset reflects the true traffic conditions on the road. Accuracy is often treated as a single, absolute metric, something a dataset either has or lacks.

To unpack what accuracy really means, we recently hosted the webinar, Big Data That Delivers: Accuracy, Governance, and Agency Best Practices, featuring Craig Smith (TomTom), Jesse Coleman (City of Toronto), Jesse Newberry (HNTB), and Matthew Konski (Altitude by Geotab).

The discussion made one point unmistakably clear: accuracy is nuanced and contextual. It depends on what you’re measuring, the problem you’re trying to solve, and how well you understand the data behind the chart.

Why accuracy in big data isn’t one thing

That context-dependence is especially important with Floating Car Data. Each metric, such as speed, travel time, volume, and origin-destination flows, has its own determinants of accuracy, shaped by factors such as data sources, sampling rates, penetration rates, and modeling methods.

But understanding what influences accuracy is only half of the equation. The more important question is: What level of accuracy is actually required? Accuracy requirements differ widely depending on the metric and the purpose.  

In other words, accuracy isn’t a single universal standard. It varies by context, by use case, and by the decision being supported, and it must be established through transparency, validation, and fit-for-purpose evaluation.

Accuracy means something different for every metric

One of the clearest takeaways from the panel was that accuracy requirements vary depending on the use case. Each metric has its own determinants of accuracy, shaped by how the data is collected, processed, and modeled.

Let’s look at an example involving different metrics on different road types: freeway travel-time monitoring versus turning-movement estimation at urban intersections. Both analyses rely on the same underlying data attribute, ping frequency. But the required level of that attribute differs. On highways, minute-level data may be sufficient to capture stable travel-time patterns. At urban intersections, however, estimating turning movements requires much higher-frequency, second-by-second data to reflect stop-and-go behaviour, short movements, and rapid fluctuations.
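To make that concrete, a quick check of reporting intervals is often the first step in judging whether a probe dataset fits a use case. The sketch below uses pandas with made-up pings, and the cut-offs (roughly second-level data for intersection work, minute-level data for freeway travel times) are illustrative assumptions rather than standards:

```python
import pandas as pd

# Hypothetical probe pings: one row per GPS report (device ID and timestamp).
pings = pd.DataFrame({
    "device_id": ["a", "a", "a", "b", "b", "b"],
    "timestamp": pd.to_datetime([
        "2024-05-01 08:00:00", "2024-05-01 08:00:05", "2024-05-01 08:00:10",
        "2024-05-01 08:00:00", "2024-05-01 08:01:00", "2024-05-01 08:02:00",
    ]),
})

# Median reporting interval per device, in seconds.
per_device = (
    pings.sort_values(["device_id", "timestamp"])
         .groupby("device_id")["timestamp"]
         .apply(lambda t: t.diff().dt.total_seconds().median())
)
print(per_device)

# Illustrative cut-offs only, not industry thresholds.
print("fit for intersection turning movements:", per_device[per_device <= 5].index.tolist())
print("fit for freeway travel-time monitoring:", per_device[per_device <= 60].index.tolist())
```

The same dataset can pass one check and fail the other, which is exactly the point: the attribute is shared, but the required level of it is not.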

The same principle applies across the board:

  • Volume estimation accuracy: Volume estimates are generated by applying statistical models to observed counts, and these models must be calibrated with ground-truth data to achieve meaningful accuracy. Here, the key determinants of accuracy are the representativeness of the observed sample and the quality and calibration of the underlying model (a minimal calibration check is sketched after this list).
  • Speed accuracy: Speed metrics depend on how frequently vehicles report their positions and how many probes are present on the road. Higher probe density and consistent reporting intervals create more reliable speed profiles, particularly across varying road types where traffic conditions change at different rates.
  • Origin–destination (O-D) accuracy: O-D metrics, such as trip counts, O-D matrices, trip durations, and travel times, require the ability to reconstruct full trips from probe traces. Accuracy depends on clear data lineage and sufficient ping frequency to reliably infer complete trips and movement patterns.
  • Travel time accuracy: Travel time metrics depend on consistent and continuous probe traces along a route. Adequate sampling density and stable ping intervals ensure that segment-level speeds and delays are captured accurately, while gaps or sparse observations increase uncertainty, especially in areas with complex traffic patterns or frequent stops.
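As a simple illustration of the volume point above, validation usually comes down to comparing probe-derived estimates against ground-truth counters and, if needed, fitting a correction. The station volumes below are invented, and the least-squares scaling factor is one crude form of calibration, not any provider’s actual method:

```python
import numpy as np

# Hypothetical daily volumes at four count stations: probe-derived estimates
# vs. permanent-counter ground truth (vehicles per day).
probe_estimates = np.array([10000.0, 7600.0, 12300.0, 6100.0])
ground_truth = np.array([12500.0, 9000.0, 14800.0, 7600.0])

# Mean absolute percentage error of the raw estimates.
mape = np.mean(np.abs(probe_estimates - ground_truth) / ground_truth) * 100

# One crude calibration: a least-squares scaling factor fit against ground truth.
scale = probe_estimates @ ground_truth / (probe_estimates @ probe_estimates)
calibrated = scale * probe_estimates
mape_calibrated = np.mean(np.abs(calibrated - ground_truth) / ground_truth) * 100

print(f"MAPE before calibration: {mape:.1f}%")
print(f"Scale factor: {scale:.3f}, MAPE after calibration: {mape_calibrated:.1f}%")
```

Even this toy example shows why the representativeness of the ground-truth stations matters: the scale factor is only as good as the counters it was fit against.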

Accuracy requirements are determined by the purpose

Beyond the technical determinants of accuracy, the required level of accuracy is also shaped by the purpose of the analysis. Different decisions demand different levels of precision. For example, if an agency is responding to a citizen complaint about suspected speeding on a neighbourhood street, a few days of historical speed and volume data may provide sufficient evidence to evaluate the concern and guide an operational response.

In contrast, a Transportation Master Plan carries long-term implications, often shaping policy and investment decisions involving tens of millions of dollars in infrastructure spending for the next decade. In this context, agencies require far more rigorous inputs, such as high temporal coverage and carefully calibrated models capable of generating reliable, network-wide volume estimates. The stakes are higher, the analysis is broader, and the tolerance for uncertainty is much lower.

In other words, accuracy requirements scale with the importance, impact, and financial consequences of the decision being made. Fit-for-purpose accuracy, not maximum accuracy in all cases, is what ensures data is used responsibly and effectively.

Best practices: accuracy comes from repeatable validation, not one-off checks

Building on this foundation, accuracy is not something achieved once; it must be maintained continually. Trust in big data, particularly when working with Floating Car Data, grows when validation becomes a routine part of the workflow rather than an exception.  

Here are five best practices transportation agencies can adopt to embrace big data responsibly and confidently.

1. Embed quality checks directly into your data pipelines

Accuracy isn’t a one-time verification; it’s a continuous discipline. The City of Toronto’s data science team, for example, builds automated quality checks into its sensor data pipelines, monitors feeds for anomalies, and recalibrates permanent counters every six months. Quality is sustained through process, not assumptions.
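As an illustration only, here is a minimal sketch of the kind of automated check that can run on an hourly count feed, flagging flat-lined sensors and sudden spikes. The window, ratios, and data are assumptions for the example, not the City of Toronto’s actual implementation:

```python
import pandas as pd

def flag_anomalies(counts: pd.Series, window: int = 6) -> pd.Series:
    """Flag counts that flat-line at zero or jump far above the recent median.

    Window size and ratios are illustrative assumptions, not agency settings.
    """
    baseline = counts.shift(1).rolling(window, min_periods=3).median()
    spike = (baseline > 0) & (counts > 3 * baseline)          # sudden jump vs. recent history
    stuck = counts.eq(0) & counts.shift(1).eq(0) & counts.shift(2).eq(0)  # three zero hours in a row
    return spike | stuck

# Example: an hourly feed from one hypothetical counter.
feed = pd.Series(
    [420, 435, 410, 0, 0, 0, 445, 3900, 430],
    index=pd.date_range("2024-05-01", periods=9, freq="h"),
)
print(feed[flag_anomalies(feed)])  # flags the flat-lined stretch and the spike
```

The specific rules matter less than the fact that they run automatically, every time new data arrives.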

2. Corroborate across datasets

Corroborating probe-based insights with independent data sources is a reliable way to validate accuracy. Agencies should regularly compare probe data with pneumatic road tubes, embedded loop detectors, as well as Bluetooth, camera and radar sensors installed permanently or temporarily to confirm patterns and flag discrepancies. Consistency across datasets reinforces trust in the results.
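One lightweight form of corroboration is to join probe and detector measurements for the same segments and time bins and look at the error and bias. A minimal sketch, assuming 15-minute speed bins in km/h and invented values:

```python
import pandas as pd

# Hypothetical 15-minute average speeds (km/h) on one corridor,
# from the probe dataset and from a permanent loop detector.
merged = pd.DataFrame({
    "probe_kph": [62, 58, 41, 35, 55],
    "loop_kph":  [60, 57, 45, 33, 57],
})

diff = merged["probe_kph"] - merged["loop_kph"]
print(f"Mean absolute error: {diff.abs().mean():.1f} km/h")
print(f"Bias (probe minus loop): {diff.mean():+.1f} km/h")
print(f"Correlation: {merged['probe_kph'].corr(merged['loop_kph']):.2f}")
```

A small, consistent bias is easy to correct for; a low correlation is a sign the probe data is not capturing the pattern at all.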

3. Define your thresholds up front

Accuracy is only meaningful when agencies are clear about what “good enough” looks like. Establishing acceptance thresholds (what deviation is tolerable, which conditions break the metric, and what level of confidence decision-makers require) creates a shared standard for evaluating data. Without defined thresholds, quality becomes subjective; with them, it becomes measurable and manageable.
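Writing those thresholds down, even in a simple machine-readable form, keeps the standard explicit and reusable across validation runs. The metric names and limits below are purely illustrative:

```python
# Hypothetical acceptance thresholds, agreed on before any validation runs.
ACCEPTANCE_THRESHOLDS = {
    "segment_speed_abs_error_kph": 5.0,   # tolerable average speed deviation
    "daily_volume_pct_error": 15.0,       # tolerable daily volume error
    "travel_time_pct_error": 10.0,        # tolerable corridor travel-time error
}

def evaluate(results: dict) -> dict:
    """Compare observed validation errors against the agreed thresholds."""
    return {
        name: results[name] <= limit
        for name, limit in ACCEPTANCE_THRESHOLDS.items()
        if name in results
    }

# Example: outcomes from a corroboration study against agency sensors.
print(evaluate({"segment_speed_abs_error_kph": 3.8, "daily_volume_pct_error": 21.0}))
# {'segment_speed_abs_error_kph': True, 'daily_volume_pct_error': False}
```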

4. Document data lineage and metadata

Knowing who owns the data, where it comes from, and what each field represents may sound basic, but it becomes essential in large organizations where many teams rely on the same datasets. Clear documentation of data lineage and metadata builds internal trust, reduces misinterpretation, and ensures analysts understand the context and limitations behind every metric.
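In practice, this can be as simple as a shared record per dataset. A minimal sketch, with entirely illustrative field names and values:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """Minimal lineage and metadata record; every field here is illustrative."""
    name: str
    owner: str                # team accountable for the dataset
    source: str               # where the data originates and how it is processed
    update_frequency: str
    fields: dict = field(default_factory=dict)       # column name -> meaning and units
    known_limitations: list = field(default_factory=list)

segment_speeds = DatasetRecord(
    name="probe_segment_speeds",
    owner="Data & Analytics Unit",
    source="Vendor FCD feed, map-matched to the agency road network",
    update_frequency="hourly",
    fields={
        "avg_speed_kph": "mean probe speed per segment and hour, km/h",
        "sample_size": "number of probe trips contributing to the hour",
    },
    known_limitations=["low sample sizes overnight on local streets"],
)
print(segment_speeds.owner, "|", segment_speeds.update_frequency)
```

Whatever the format, the point is that the owner, source, field definitions, and limitations are written down once and shared, rather than living in one analyst’s head.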

5. Pre-validate your data provider(s)

Agencies no longer accept vague claims of “accuracy”; they expect transparent, defensible validation. That includes clear visibility into sample size and penetration, data frequency, vehicle composition, what is measured versus modeled, independent validation studies, and a willingness to benchmark against agency sensors.

As the industry evolves and customers demand accountability, data providers must become increasingly open. Once a provider has been thoroughly validated, revalidating for every use case may not be necessary. Integrating a trusted, well-understood data provider seamlessly into agency workflows reduces friction, saves time, and accelerates how insights reach decision-makers.

Conclusion

Accuracy in big data is not a fixed number or a one-time achievement; it is the outcome of deliberate validation by both data providers and end users. Agencies that invest the time to build and follow this process will ultimately make the most confident and defensible data-driven decisions.

For a deeper look at what agencies should expect from data providers and how to assess data transparency and credibility, read Top 6 questions to ask a data provider and why they matter.


If you’re interested in a deeper discussion, including real-world examples, agency perspectives, and insights from TomTom, Altitude by Geotab, the City of Toronto, HNTB, and SMATS, watch the full webinar recording here.  
