Case Study

Enabling Precise Historical Truck Load Factor Calculation with Data Cleansing and Enrichment

Project Background

As data becomes the most important asset of the 21st century, businesses invest substantial resources in leveraging it to the fullest. However, before a company can extract insights from its data and use them to improve operations, it needs to pay close attention to data quality. In the linehaul business, many organizations do not have reliable KPIs showing how full or empty their trailers and trucks are.

The root of the problem lies in data collection and validation, which reinforce the simple principle of “garbage in, garbage out” when it comes to extracting insights.

By applying modern technologies such as Artificial Intelligence and Machine Learning algorithms, logistics companies can increase the accuracy of their data to over 90% and make competitive decisions that are truly data-driven.



Companies operating in the logistics industry often suffer from very poor data quality, regularly much worse than one finds in, say, a bank or a healthcare provider. There are a couple of reasons for that. First, the data is mostly generated by people, since the penetration of Electronic Data Interchange (EDI) in logistics is still in its early stages. There is also no consensus yet on standards for EDI records or data transfers, which further complicates the implementation and use of such systems.

Second, the number of shipments being transported every day is extremely high. If you look at the top logistics companies, they are moving hundreds of millions of shipments per month now, especially with the growth of e-commerce, which has accelerated even more due to the Covid-19 pandemic. Having people enter all this information manually into the IT systems with high reliability is enormously costly and most companies cannot afford that. Thus, it is a common pattern for logistics companies to use the existing data that is sparse and imprecise and simply make the best of it.

Here are four examples of data quality issues:

Duplicated Data - Various systems hold duplicated data, making it a challenge to make sense of it. This problem usually creeps in by accident over time, when merging data sources or migrating systems. A clear example of duplicated data is an accidental double scan, e.g. one scan shows that a shipment is on Truck A, while the next scan of the same shipment places it on Truck B.
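As an illustration (not Transmetrics' actual pipeline), a minimal sketch of detecting such double scans: flag consecutive scans that place the same shipment on two different trucks within an implausibly short time window. The five-minute window and the record layout are assumptions.

```python
# Sketch: flag "double scans" — consecutive scan events that put the same
# shipment on two different trucks within a short time window.
from datetime import datetime, timedelta

def find_double_scans(scans, window=timedelta(minutes=5)):
    """scans: list of (shipment_id, timestamp, truck_id), any order."""
    conflicts = []
    by_shipment = {}
    for shipment, ts, truck in scans:
        by_shipment.setdefault(shipment, []).append((ts, truck))
    for shipment, events in by_shipment.items():
        events.sort()  # chronological order
        for (t1, truck1), (t2, truck2) in zip(events, events[1:]):
            if truck1 != truck2 and t2 - t1 <= window:
                conflicts.append((shipment, truck1, truck2))
    return conflicts

scans = [
    ("S1", datetime(2021, 3, 1, 8, 0), "Truck A"),
    ("S1", datetime(2021, 3, 1, 8, 2), "Truck B"),  # double scan
    ("S2", datetime(2021, 3, 1, 9, 0), "Truck A"),
]
print(find_double_scans(scans))  # → [('S1', 'Truck A', 'Truck B')]
```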

Unstructured Data - In this case, the data is usually correct; however, it does not provide the needed insights because it is not organized into a data set. A common instance of this issue arises when different offices or regions structure their data in different ways, making it difficult to maintain a single version of the truth. When there is email communication with the customer about orders, Natural Language Processing (NLP) can help transform the unstructured data (email text) into structured data (a data set).

Inconsistent Data - In the context of logistics, inconsistent data means the inability to recreate the whole lifecycle of a shipment or container. For instance, according to the data, a shipment never left Hub A, yet two weeks later it is located in Hub B. This issue is generally caused by human error.
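A hypothetical sketch of such a consistency check: verify that every arrival at a new hub is preceded by a departure scan from the previous one. The event names are illustrative, not a real TMS schema.

```python
# Sketch: detect gaps in a shipment's lifecycle, e.g. an arrival at Hub B
# with no departure scan from Hub A.
def lifecycle_gaps(events):
    """events: chronological list of (event_type, hub) for one shipment."""
    gaps = []
    last = None
    for event_type, hub in events:
        if event_type == "ARRIVED" and last is not None:
            prev_type, prev_hub = last
            if prev_type != "DEPARTED":
                gaps.append(f"arrived at {hub} without departing {prev_hub}")
        last = (event_type, hub)
    return gaps

history = [
    ("ARRIVED", "Hub A"),
    # missing ("DEPARTED", "Hub A") — per the data, it "never left" Hub A
    ("ARRIVED", "Hub B"),
]
print(lifecycle_gaps(history))  # → ['arrived at Hub B without departing Hub A']
```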


Incorrect Data - Incorrect data is a common issue in heavily manual processes. For example, an employee of a transport company can enter two extra zeros on top of the real number. Street addresses are another example, as they are prone to errors due to spelling mistakes.



Our Client is a top 5 provider of groupage overland transportation in Europe. They didn’t have direct statistics of the utilization (load factors) of their groupage linehauls (trucks moving between hubs). While warehouse staff perceived average linehaul utilization to be 95% or more, this number was based on a few spot-checks and was not backed up by consistent and detailed operational data. Thus, management felt that valuable optimization opportunities to improve linehauls, cancel unnecessary trips, and increase revenue per cubic meter were being missed.

From a business perspective, Transmetrics' assignment was to achieve data-driven transparency into linehaul load factors, along with the ability to spot problem areas and patterns in linehaul utilization. In a second phase, this would turn into predictive optimization of future linehaul capacity. In this case study, we focus on the process and the results achieved in the first phase.

From a technical perspective, we were tasked with determining whether the quality of the Transport Management System (TMS) data could be improved sufficiently that linehaul load factors measured from it would match what is observed on the hub floor.

Here is what Transmetrics did to overcome the challenges:



Transmetrics has developed a way to improve data quality on the back end, after the information is extracted from the client's TMS. This AI-driven data quality framework delivers reliable results at a relatively low cost, with a one-time set-up effort.

Furthermore, because the framework is algorithmic, it is set up once and then runs continuously on a daily basis, cleansing the data on the fly in parallel to established operations, without affecting the company's existing processes in any way.

The AI data quality algorithm was applied to the client's case. Since data quality was quite low (by the estimation of both the client and our team, less than 20% of the data could be used as-is), we had to use most of the tools at our disposal:

Natural Language Processing (NLP)

It was used to group different spellings of the same customer name (e.g. “Company A.G.” and “Company AG”) together, as well as to group different commodity spellings (“WEIN” and “Wine”). It was also used to extract shipment dimensions, which some customers entered as part of free-text comments in the data.
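The grouping step can be sketched with simple fuzzy string matching (the actual NLP pipeline likely goes further): normalize names, then merge spellings whose similarity exceeds a threshold. The 0.85 threshold is an assumption; the similarity measure is stdlib `difflib`.

```python
# Sketch: group alternative spellings of the same customer name by
# normalizing punctuation/case and comparing string similarity.
from difflib import SequenceMatcher

def normalize(name):
    # lowercase, drop punctuation, collapse whitespace
    kept = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def similar(a, b, threshold=0.85):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def group_names(names):
    groups = []
    for name in names:
        for group in groups:
            if similar(name, group[0]):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

print(group_names(["Company A.G. ", "Company AG", "Other GmbH"]))
# → [['Company A.G. ', 'Company AG'], ['Other GmbH']]
```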

Machine Learning Algorithms

They were used to teach the model the typical sizes of commonly seen shipment types, so that it could complete missing values and detect and correct imprecise user entries. After training on the 20% of the data identified as correct, the AI filled in and corrected the missing values in the remaining 80%.
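The core idea can be illustrated with a deliberately simple stand-in for the ML model: learn the typical volume per commodity type from the trusted subset, then use those learned values to fill gaps in the rest. The commodity names and volumes are made up.

```python
# Sketch: learn typical shipment volumes from the trusted 20% of the data,
# then impute missing volumes in the remaining records.
from statistics import median

def fit_typical_volumes(trusted_rows):
    """trusted_rows: list of (commodity, volume_m3) with verified values."""
    by_commodity = {}
    for commodity, volume in trusted_rows:
        by_commodity.setdefault(commodity, []).append(volume)
    return {c: median(v) for c, v in by_commodity.items()}

def impute(rows, model):
    """Fill in missing (None) volumes using the learned typical values."""
    return [(c, v if v is not None else model.get(c)) for c, v in rows]

model = fit_typical_volumes([("pallet", 1.2), ("pallet", 1.0), ("parcel", 0.1)])
print(impute([("pallet", None), ("parcel", 0.2)], model))
# the missing pallet volume is filled with the learned median
```

A real system would use richer features (customer, route, weight) and a trained regression model rather than a per-commodity median, but the train-on-trusted, impute-the-rest pattern is the same.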

Computer Optimization Algorithms

These algorithms were used to reconcile different predictions with each other: e.g. if the volume of a particular shipment was predicted to be 0.5 m3 but its dimensions were predicted to be 1m x 1m x 1m (= 1 m3), the two values would be adjusted to be consistent with each other, and the learning would be propagated to adjust the predictions for other shipments as well. Optimization was then used to calculate how the shipments would fit the truck contour in a 3D environment.



With the data cleansing process signed off, we were able to show our client that the actual load factors were significantly below the reported 95% average. In fact, the average across all the lines analysed was closer to 55%, with some lines showing load factors of 20-40% during the analysed period. Furthermore, each load factor measurement was backed up by precise shipment-level data (as shown in Figure 2 below).
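Once shipment-level volumes are reliable, the KPI itself is straightforward; a minimal sketch, assuming a volume-based definition and a hypothetical 100 m3 trailer capacity (not the client's actual figure):

```python
# Sketch: load factor as the share of the truck's volume capacity occupied
# by the shipments on board.
def load_factor(shipment_volumes_m3, truck_capacity_m3=100.0):
    return sum(shipment_volumes_m3) / truck_capacity_m3

# e.g. a linehaul carrying shipments totalling 55 m3:
print(f"{load_factor([20.0, 15.0, 12.0, 8.0]):.0%}")  # → 55%
```

In practice, load factors may also be constrained by weight or loading metres rather than volume alone, which is where the 3D fitting step above comes in.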

These results demonstrated to our client that, in order to achieve higher and much more efficient load factors, the approach to transport data has to change drastically. Thanks to the Transmetrics Data Cleansing module, however, the process of data cleansing and enrichment becomes effortless and opens the door to accurate forecasting and predictive capacity utilization. In turn, clients who have implemented it can move to proactive, data-driven logistics operations within several months, while also improving the bottom line.

Transmetrics actively supports the development of this tool and has a roadmap of further improvements and features in place. We welcome new projects and are ready to help other linehaul companies overcome the challenges they face. Our industry experts are standing by to help you out!