Although torque and angle measurement have greatly reduced errors when tightening nuts and bolts, they can’t catch every fault. At Ford, we wondered if machine learning could help detect faulty run-downs. To do that, however, we first had to determine what constitutes an anomaly during a normal run-down.
Developing a system to detect anomalies on assembly lines is challenging. For starters, labeled anomaly data are often difficult to obtain. There are many potential failure modes, each of which is usually rare and difficult to interpret in time-series data. Furthermore, there is a lack of publicly available datasets that can be used to develop anomaly detection methods for assembly lines.
Manufacturers must therefore develop their own training datasets and solve complex processing challenges that require expertise in both data science and the target process. This R&D is time-consuming, and any given solution may not be transferable to other assembly processes, even if the processes seem similar. These challenges often make it difficult to estimate the return on investment (ROI) of analytics projects. As a result, the value of machine learning is yet to be fully realized in the automotive industry, which typically focuses on short-term ROI projects.
Throughout the engine assembly process, various in-process and end-of-line tests are performed to ensure quality. Many of these tests use static process limits to identify faults. These limits are set by experienced engineers with considerable knowledge of the process, and they are reviewed and updated regularly.
For many test processes, data is displayed visually on dashboards in real time. This can help identify when a process is drifting out of spec, particularly when the data are clean, well-structured and highly regular, and the failure modes are well understood.
However, there are opportunities to improve the process for detecting anomalies. First, the traditional approach is not well-suited to identifying new anomalies where faults may occur within the specified limits. Second, the traditional approach requires regular tuning whenever the operating parameters of the test or machinery are changed. Automating the process would reduce the burden on test engineers to evaluate and maintain their current methods. Automation would also help to increase error detection rates in complex processes that exhibit high variability.
An assembler uses a DC electric nutrunner to tighten a bolt on an engine. Photo courtesy Ford Motor Co.
To that end, we sought to develop machine learning algorithms that could automatically detect errors when installing threaded fasteners with DC electric nutrunners.
At Ford, we are already using machine learning to monitor our manufacturing processes. Specifically, we use a single unsupervised algorithm to detect anomalies in various assembly and test processes. We use principal component analysis (PCA) to reduce the dimensionality of the time series data. We then perform a cluster analysis using density-based spatial clustering (DBSCAN) under the assumption that any noise points are anomalies.
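That two-step pipeline can be sketched in a few lines of scikit-learn. The synthetic "run-down" traces below are a hypothetical stand-in, not Ford's actual data, and the DBSCAN parameters are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Hypothetical stand-in for run-down traces: 200 normal ramps plus 3 reversed ramps.
normal = rng.normal(0.0, 0.05, size=(200, 50)) + np.linspace(0, 1, 50)
outliers = rng.normal(0.0, 0.05, size=(3, 50)) + np.linspace(1, 0, 50)
X = np.vstack([normal, outliers])

# Step 1: reduce each 50-sample trace to its first two principal components.
X2 = PCA(n_components=2).fit_transform(X)

# Step 2: cluster in the reduced space. DBSCAN labels sparse points as noise (-1),
# and this approach treats those noise points as candidate anomalies.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X2)
anomaly_idx = np.where(labels == -1)[0]
```

As discussed below, this works well when anomalies really are outliers in the reduced space; the nutrunner data violated that assumption.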
This approach is successful with a range of end-of-line tests, outperforming the traditional static limit approach. However, it is not effective at identifying anomalies in fastening processes.
Detecting anomalies when tightening threaded fasteners is challenging. Yes, a transducer in the nutrunner can measure torque and angle over time, and that data can be analyzed to detect process anomalies. The challenge lies in the variability of that data.
An engine contains numerous fasteners with different thread forms, torque and angle specifications, and run-down times. Normal variation in the parts can create different torque and angle signatures. A handheld fastening tool might produce different data than a set of fixtured spindles. A person may pause for a short period between fastening stages, causing normal torque measurements to shift in time. Similarly, an automated process might pause between assemblies for tool changes or geometrical differences between product variants.
This variability in both normal and anomalous data makes traditional unsupervised clustering approaches, such as one-class support vector machines (SVM) and PCA, ineffective, since anomalies are not always outliers. Reconstruction methods, such as encoder-decoders, are also infeasible because the data can shift in time at multiple stages, making it difficult to model the underlying probability distributions.
We looked at three semi-supervised clustering approaches to identify outliers in nutrunner data. Each approach uses a Gaussian mixture model (GMM) in combination with different dimensionality reduction methods: PCA, t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). PCA and t-SNE are well-established. UMAP is a relatively new technique that shows promise, but has yet to be explored for detecting manufacturing errors using real-world data.
After applying dimensionality reduction, we then trained a GMM using a semi-supervised approach. The GMM is a common approach to clustering data that assumes the generative processes to produce the dataset can be described by a mixture of isotropic Gaussian probability density functions. By training the GMM on normal data, threshold regions can be defined, assuming any process that generates anomalies will fall outside of these regions.
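A minimal sketch of this semi-supervised idea, using scikit-learn's GaussianMixture on synthetic 2-D features (the data, the two-component choice and the percentile threshold are all illustrative assumptions, not the study's settings):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Stand-in for dimensionality-reduced features: two normal operating modes.
train_normal = np.vstack([
    rng.normal([0, 0], 0.3, size=(150, 2)),
    rng.normal([3, 1], 0.3, size=(150, 2)),
])

# Fit the mixture on normal data only; "spherical" gives isotropic components.
gmm = GaussianMixture(n_components=2, covariance_type="spherical",
                      random_state=0).fit(train_normal)

# Define the threshold region from the training log-likelihoods,
# e.g. the 1st percentile of scores seen on normal data.
threshold = np.percentile(gmm.score_samples(train_normal), 1)

# At test time, anything scoring below the threshold is flagged as anomalous.
test_points = np.array([[0.1, -0.1], [3.2, 0.9], [8.0, 8.0]])
flags = gmm.score_samples(test_points) < threshold
```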
Anomaly Detection
Anomaly detection is the process of finding, removing, describing or extracting observations in a dataset that are generated by a different process than the one that produces the majority of normal data. Anomaly detection in time-series data has been studied for more than 50 years in various domains, including fraud detection, cybersecurity, stock market prediction, cardiology, engine monitoring, fault detection and condition monitoring, and manufacturing.
Approaches to anomaly detection vary, depending on the task. However, following advances in neural network architectures and computational statistics, most researchers have focused on machine learning for anomaly detection.
There are three approaches for anomaly detection:
- Supervised: Training data are labeled and include both the nominal and anomalous data.
- Clean semi-supervised: Training data only include nominal data, while test data also include anomalies.
- Unsupervised: Training data are unlabeled and include both the nominal and anomalous data.
Supervised methods frame the anomaly detection task as a binary classification problem (normal vs. anomaly). They use labeled data to train classifiers that distinguish between nominal and anomalous data. This can be effective when the percentage of anomalies is high.
However, in most cases anomalies are rare (less than 1 percent of data points), making supervised approaches infeasible as it is both difficult and time-consuming to obtain sufficient labeled anomalous data. Furthermore, the supervised approach assumes that the distribution of anomalous data can be well-defined, and that this distribution can be used to train a statistical model. This is known as the well-defined anomaly distribution (WDAD) assumption.
In manufacturing, this assumption can be used to detect repeated machine failures for which the problem space is well-understood and sufficient data are available to define the distribution. This is the theoretical basis for Six Sigma, in which time-invariant data are modeled to fit a well-defined Gaussian distribution. If a measurement deviates from the mean by more than six standard deviations (±6σ), it is flagged as anomalous.
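As an illustration of that rule, here is a hypothetical torque series with one gross fault, flagged with a plain six-sigma check (the numbers are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
torque = rng.normal(50.0, 2.0, size=1000)   # stable process: mean 50 Nm, sigma 2 Nm
torque = np.append(torque, 75.0)            # one gross fault, far outside the spread

mu, sigma = torque.mean(), torque.std()
# Flag any reading more than six standard deviations from the mean.
flags = np.abs(torque - mu) > 6 * sigma
```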
Unfortunately, the WDAD assumption is rarely applicable in the real world, since the anomalous and nominal distributions can seldom be modeled accurately. This is especially true in manufacturing, due to the increased complexity and variance of the data.
In cases where the WDAD assumption does not hold, and the fraction of training points that are anomalies is small, unsupervised or clean semi-supervised methods can be used to detect outliers, although these methods may also fail if anomalies are not outliers or if the distribution of the nominal data has long tails.

Fastening data can be challenging for machine learning algorithms, because varying amounts of torque are applied at different stages of the tightening process. Graph courtesy Ford Motor Co.
Types of Anomalies
Anomaly detection in manufacturing deals with time-series data, which requires different statistical approaches than those that assume constant variance and independence of observations. Time-series data are a sequence of observations taken by continuous measurement over time, usually collected at equidistant time intervals. Time-series data can have properties such as trends, seasonality, cycles and levels, which can be used to predict future behavior and identify anomalies that deviate from the norm.
Most research has focused on three types of anomalies in time-series data: point anomalies, collective anomalies and contextual anomalies.
Point anomalies are instances where a single point in time deviates significantly from most of the time series. An example of a point anomaly in historic weather patterns could be a single day of heavy snowfall in British springtime. Point anomalies have been studied extensively. Most approaches assume that anomalies are scarce and occur independently of each other. Neural networks, tree-based approaches, SVM and long short-term memory (LSTM) have been used successfully to identify point anomalies.
Collective anomalies are where multiple data points in the time series may be considered normal when analyzed individually, but when viewed as a collective they demonstrate a pattern of unusual characteristics. Continuing with the weather example, a collective anomaly would be if the snowfall continues for multiple days.
Contextual anomalies are cases where data points may deviate from most of the data, but are dismissed as normal due to the context. Contextual anomalies are defined by two attributes:
- A spatial attribute that describes the local context of a data point relative to its neighbors.
- A behavioral attribute that describes the normality of the data point.
Clustering algorithms can be used to identify contextual anomalies in a range of real-world and synthetic data. A common example of contextual anomalies is credit card data. For example, if an individual’s credit card expenditure is unusually high over the course of a week in April, it might be considered a collective anomaly and flagged as fraudulent activity. The same transaction behavior the week before Christmas, however, may be considered normal given the context.
In the credit card example, we can see that there can be an overlap between different types of anomalies. Therefore, it is necessary to develop a system that identifies all three types of anomalies.

To train their AI model, the researchers asked a fastening expert to review sets of run-down data like the ones shown here, looking for instances of good and bad assemblies. Photo courtesy Ford Motor Co.
Dimensionality Reduction and Clustering
The first step in any anomaly detection task is to use domain knowledge to extract meaningful features from the raw data. These features can then be analyzed using statistical tools to highlight outliers, which are potential anomalies.
The number of meaningful features a dataset has determines whether it has high or low dimensionality. As the dimensionality of data increases, it becomes more difficult to draw relationships between features. Training models to find defects then requires more data and more processing power, and the trained models become more susceptible to overfitting due to noise across all dimensions.
Dimensionality reduction methods aim to represent high-dimensional data in a lower-dimensional space to visualize data in two or three dimensions and apply cluster analysis approaches that are more suited to lower-dimensional datasets. The most common cluster analysis approaches that have been applied to anomaly detection in time-series data include K-means clustering, fuzzy C-means clustering, GMM, and hierarchical clustering.
K-means and fuzzy C-means clustering involve making initial guesses at the centroid positions of a given number of clusters. Stochastic approaches are then applied to iteratively optimize the centroid locations by minimizing the distances to the points within each centroid’s cluster. K-means clustering is a hard clustering approach in which each point is assigned to a single cluster. C-means is a soft clustering approach that assigns each data point a membership probability for every cluster, so that data can belong to multiple clusters.
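A short scikit-learn sketch of the hard-clustering case on synthetic data (fuzzy C-means is not part of scikit-learn, so only K-means is shown, and the two groups are invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Two well-separated synthetic groups.
X = np.vstack([
    rng.normal([0, 0], 0.3, size=(40, 2)),
    rng.normal([4, 4], 0.3, size=(40, 2)),
])

# K-means: guess k centroids, then iteratively reassign points and move the
# centroids to minimize within-cluster distances. Each point receives exactly
# one (hard) cluster label.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```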
GMM is a similar clustering approach that assumes that the process can be described by several sub-processes, each of which may generate a Gaussian component in the lower dimensional representation. GMM is a probabilistic approach for which maximum-likelihood estimation algorithms, such as expectation maximization, are used for model fitting.
Unsupervised anomaly detection is commonly used because it avoids the need to build high-quality labeled datasets to develop the screening system. However, in real-world applications, testing datasets must still be developed to validate models before implementation. When the fraction of training points that are anomalies is small, any testing dataset will be highly imbalanced, with significantly more normal data than anomalies. In these cases, it is practical to use the surplus normal data as part of a semi-supervised approach.
In hierarchical clustering, the initial number of clusters equals the number of data points. At each iteration, the two closest clusters are merged, until a single cluster remains. This bottom-up approach is called agglomerative hierarchical clustering. The process is used to construct a dendrogram, in which branches join at a depth corresponding to the iteration at which those clusters were merged. The resulting dendrogram explains the relationship between all the data points in the system and can be sliced horizontally at any depth to return the required number of clusters. Small clusters may indicate anomalous system behavior.
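The dendrogram-slicing idea can be sketched with SciPy's agglomerative clustering; the isolated point below is an invented stand-in for an anomalous observation:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
# Two dense groups plus one isolated point.
X = np.vstack([
    rng.normal([0, 0], 0.2, size=(20, 2)),
    rng.normal([5, 5], 0.2, size=(20, 2)),
    [[10.0, -10.0]],
])

# Agglomerative (bottom-up): start with one cluster per point and repeatedly
# merge the closest pair; the merge heights define the dendrogram.
Z = linkage(X, method="ward")

# Slice the dendrogram to recover three clusters; the tiny cluster is suspect.
labels = fcluster(Z, t=3, criterion="maxclust")
sizes = np.bincount(labels)[1:]
```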
Dimension reduction techniques can be split into two categories: matrix factorization and neighbor graph approaches. Matrix factorization includes algorithms such as linear autoencoders, generalized low-rank models, and PCA.
PCA is one of the most common methods for dimensionality reduction across a range of scientific disciplines, dating back to the early 1900s. PCA uses the eigenvectors and eigenvalues of the dataset’s covariance matrix to construct linear representations of the data in latent space. These linear representations are called principal components. Those with the highest variance capture the most information of the original data and can be retained for further analysis or plotting, while components with low variance can be discarded.
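That eigenvector construction can be reproduced directly in NumPy; the correlated dataset below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(5)
# Correlated 3-D data: most variance lies along a single direction.
base = rng.normal(size=(500, 1))
X = np.hstack([base, 0.5 * base, rng.normal(scale=0.1, size=(500, 1))])

# Center the data, then eigendecompose its covariance matrix.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Sort components by descending variance and keep the top two.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]
scores = Xc @ components                    # data expressed in the top components

explained = eigvals[order] / eigvals.sum()  # fraction of variance per component
```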
PCA has been widely applied in a range of time series anomaly detection tasks. One limitation of PCA is that if the correlations between features are non-linear or unrelated, the resultant transformation may result in false positives or fail to draw any useful relationships.
In recent years, there have been multiple advancements in the development of learning-based neighbor graph algorithms, such as t-SNE and UMAP.
While PCA retains global structure through eigenvectors with high variance, t-SNE reduces dimensionality by modeling high dimensional data neighbor points as a probability distribution in low dimensional space, thus retaining a more detailed local structure with the loss of some global information. It has been used to detect bearing faults and superconductor defects.
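A small scikit-learn example of that local-structure preservation on synthetic clusters (the dimensions and perplexity are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(6)
# Two well-separated groups in 20 dimensions.
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 20)),
    rng.normal(5.0, 0.5, size=(50, 20)),
])

# t-SNE converts pairwise neighbor similarities into probabilities and embeds
# the points in 2-D so that neighbors stay neighbors.
emb = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(X)

# The separation between the groups should survive the embedding.
gap = np.linalg.norm(emb[:50].mean(axis=0) - emb[50:].mean(axis=0))
spread = max(emb[:50].std(), emb[50:].std())
```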
UMAP is a recent advancement in dimensionality reduction that has drawn much attention since its publication in 2018. UMAP has been shown to improve on t-SNE in preserving both local and global structure of data, while also achieving superior run time performance. UMAP outperforms PCA in clustering time-series data based on cyclical and seasonal characteristics, and it has been used in combination with density-based clustering approaches to highlight periods of anomalous behavior in time-series data.

These graphs show the labeled plots of the first two principal components for datasets 1 and 2. The plots show that, when applying the PCA transform to nutrunner data, not all noise points are anomalies, anomalies can form tight clusters, and not all anomalies are outliers. Graphs courtesy Ford Motor Co.
Methodology
We tested three semi-supervised approaches to detect anomalies in nutrunner data. Each method uses a GMM trained on normal data to generate outlier thresholds in a reduced feature space. Three dimensionality reduction approaches were used in combination with the GMM to compare performance.
Nutrunner anomalies are rare, and historical process data are not always stored long-term. This makes it challenging to obtain sufficient data to develop training and testing datasets. If historical fault data do exist, these still need to be reviewed by a process expert to ensure sufficiently high-quality datasets.
The task of labeling nutrunner data was therefore our first major hurdle. Even fully unsupervised methods require high-quality datasets to validate accuracy. In fault detection applications, training datasets will likely be highly imbalanced, since fault instances are rare. Therefore, large amounts of data must be reviewed by process experts to gather sufficient data to validate the model.
To create our dataset, we gave three fastening experts data from 5,000 run-downs to label. We developed a dashboard to speed up the labeling process. The dashboard presented each expert with a set of 12 run-down graphs to label. Given that anomalies were presumed to be rare, the experts were informed that all the data were examples of normal operating conditions.
If the experts saw any instances that could be considered anomalous, they were asked to label them as follows:
- True anomaly. True anomalies are instances where either a known process anomaly has occurred that has compromised part quality, or an unknown anomaly has occurred that requires further inspection before the assembly is released.
- Anomaly-no-concern (ANC). An ANC is an anomalous observation for which no action is needed. This may be because the anomaly can be explained by a known process error that is unlikely to have compromised quality.
- Re-hit. A re-hit is an instance where no torque data were recorded and the amplitude of the time series remains constant at zero.
If none of the 12 examples fell within one of these categories, a refresh button was used to label all observations as “normal,” and the display was refreshed with a new batch of 12 images.
This approach proved to be fast, since less than 1 percent of the run-downs included true anomalies. Using this system, our experts were able to label data at a rate of approximately 1,000 observations every 30 minutes.
Early on, we had to overcome confusion around what constitutes an anomaly. Test engineers consider anomalies to be any observation that would result in a part being rejected. In contrast, data analysts view anomalies as having features or characteristics not found in most of the data.
That’s why we included the ANC category in the labeling process. The judgement between a true anomaly and an ANC is based on experience, so there is some level of subjectivity in this category. For this reason, the task of labeling was given only to engineers who have a high level of understanding of the fastening process.
While this does introduce some uncertainty into our testing datasets, engineers will typically err on the side of caution, since quality is prioritized over all other production metrics. For this reason, any anomaly detection system should flag both true anomalies and ANCs. That said, an algorithm’s ability to detect ANCs should never be improved at the expense of finding true anomalies.
With our labeling dashboard, we overcame two major hurdles of developing machine learning algorithms for fault detection: the time needed to label quality data, and the knowledge gap between fastening experts and data analytics experts.
Ultimately, we created two testing datasets: dataset 1 and dataset 2. Dataset 1 is a handheld nutrunner with high variability. Dataset 2 is an automated process in which a staging problem can be clearly observed.

Simplices can be combined into a simplicial complex to describe a multi-dimensional feature space. Photo courtesy Ford Motor Co.
Machine Learning Models
We decided to apply a GMM for anomaly detection in nutrunner data. Three dimension reduction approaches were compared prior to training the GMM: PCA, t-SNE and UMAP.
During model development, the visualizations produced by these approaches proved useful in communicating the results of our analysis to other team members. This was particularly helpful when discussing the importance of high-quality data, and it revealed early on that test engineers would often disagree on data labels. Visualizing the results made it easier to identify and communicate potential labeling mistakes and to obtain feedback from test engineers.
PCA is a dimensionality reduction technique that aims to preserve the global structure of the data by preserving pairwise distance among all data samples. This is achieved by applying linear mapping using matrix factorization.
Currently, Ford identifies anomalies by using DBSCAN to find noise points in the first two principal components. When applying the PCA transform to nutrunner data, however, not all noise points are anomalies, anomalies can form tight clusters, and not all anomalies are outliers. In Dataset 1, many anomalies lie within the nominal distribution of the data, sometimes forming small clusters within it. In Dataset 2, most anomalies form tight clusters, and the ANC class overlaps significantly with the nominal data.
Because PCA is sensitive to variance, the data were first normalized. Typically, when applying PCA, the first two or three components are selected, as these components retain the most information on the original structure of the data. However, initial experiments with nutrunner data found that using different combinations of principal components resulted in more distinct clusters of anomalies. For this reason, the principal components were also varied when optimizing the hyperparameters for the semi-supervised approach.
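A sketch of that preprocessing step: normalize, fit PCA, then treat the pair of retained components as a tunable choice rather than always taking the first two (the data and the pair chosen are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
# Synthetic features on wildly different scales, as raw sensor data often is.
X = rng.normal(size=(300, 8)) * np.array([100.0, 50, 10, 5, 1, 1, 1, 1])

# Normalize first, because PCA is driven by variance.
Xn = StandardScaler().fit_transform(X)
pcs = PCA(n_components=6).fit_transform(Xn)

# Instead of always keeping components 1 and 2, sweep other pairs during
# hyperparameter optimization, e.g. components 3 and 5 (zero-based 2 and 4).
i, j = 2, 4
reduced = pcs[:, [i, j]]
```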
We also compared t-SNE and UMAP with PCA for reducing the data to two dimensions before applying the semi-supervised GMM. T-SNE and UMAP are neighbor graph approaches that determine the similarity between data points before projecting the data onto the lower-dimensional space.
UMAP is by far the most complex approach. At a high level, UMAP applies manifold approximation together with local set representations to map the data onto lower dimensional space. These high-dimensional set representations, known as simplicial sets, describe the high-dimensional feature space by combining multiple simplices defined by the data points.
Because t-SNE is a probabilistic approach, and both t-SNE and UMAP use stochastic processes, we had to combine the training and testing data and perform dimensionality reduction on both datasets to ensure they were mapped onto the same lower dimensional feature space. The 2D outputs of the dimensionality reduction are then split back into the training and testing sets before applying GMM for semi-supervised clustering.
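A sketch of that combine-embed-split workflow on synthetic traces (the anomaly fraction, perplexity and percentile threshold are illustrative assumptions):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
train = rng.normal(0.0, 0.5, size=(120, 10))            # normal-only training data
test = np.vstack([
    rng.normal(0.0, 0.5, size=(30, 10)),                # normal test traces
    rng.normal(6.0, 0.5, size=(5, 10)),                 # five anomalies
])

# t-SNE has no transform() for unseen data, so embed train and test together...
emb = TSNE(n_components=2, perplexity=20,
           random_state=0).fit_transform(np.vstack([train, test]))

# ...then split back before fitting the GMM on the training portion only.
emb_train, emb_test = emb[:len(train)], emb[len(train):]
gmm = GaussianMixture(n_components=1, random_state=0).fit(emb_train)
threshold = np.percentile(gmm.score_samples(emb_train), 1)
flags = gmm.score_samples(emb_test) < threshold
```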
Our GMM was then trained in a semi-supervised manner on 400 normal training data points, with the number of Gaussian components treated as a tunable hyperparameter.

Because both t-SNE and UMAP use stochastic processes, the researchers combined the training and testing data and performed dimensionality reduction on both datasets to ensure they are mapped onto the same lower dimensional feature space. This graph shows how t-SNE was applied to the training and testing nutrunner data. Graph courtesy Ford Motor Co.
Results
In this study, anomalies were treated as the positive class. With this in mind, we had the following goals for our algorithm:
- Minimize false negative rate. The end goal was to develop an anomaly detection system to improve overall product quality. Therefore, our main objective was to reduce the number of true anomalies incorrectly identified as normal.
- Identify a high percentage of true anomalies. Reducing the false negative rate should not come at the expense of identifying a low percentage of true anomalies.
- Near real time. Any system must be able to identify a potential anomalous reading before the part continues onto the next process. While this time varies, we set a target of under 5 seconds to perform the analysis.
- Adaptable and transferable. Since processes change over time, any AI system must be re-trainable with minimal additional development by engineers. Furthermore, any system must be demonstrated to be effective on multiple nutrunner datasets to demonstrate its transferability to multiple use cases.
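For reference, the metrics behind these goals follow directly from the confusion counts, with anomalies as the positive class (the counts below are invented for illustration, not results from the study):

```python
# Hypothetical confusion counts: tp = anomalies caught, fp = false alarms,
# fn = anomalies missed (the false negatives the first goal minimizes).
tp, fp, fn = 11, 9, 4

precision = tp / (tp + fp)                        # share of flags that were real
recall = tp / (tp + fn)                           # share of anomalies caught
f_score = 2 * precision * recall / (precision + recall)
```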
When evaluating the performance of our models, we treated ANCs as true anomalies. This approach makes sense, since ANCs are still outliers and should be reviewed by test engineers to err on the side of caution and ensure the highest output quality. However, we argue that this ANC information can provide additional insights into the performance of the models and should be further considered when analyzing performance. This is particularly true for nutrunner data, where the quality of the datasets is difficult to assure given that there is some level of subjective judgement when labeling the data.

The researchers applied a GMM to their data. These graphs show a plot of datasets 1 and 2 with the outlier thresholds visualized. Graphs courtesy Ford Motor Co.
For this reason, we considered two situations:
- ANCs are considered as true anomalies.
- ANCs are considered normal.
The results of our semi-supervised GMM algorithm largely depended on the dimensionality reduction approach used to prepare the data. For Dataset 1a, where ANCs are considered true anomalies, PCA-GMM performs best, achieving an F-score of 0.55, compared with 0.25 for t-SNE-GMM and 0.27 for UMAP-GMM. The t-SNE and UMAP approaches are less desirable in comparison: t-SNE returns a low recall, while UMAP returns a low precision. However, all three methods outperform the PCA-DBSCAN approach currently used for anomaly detection.
When considering Dataset 1b, where ANCs are considered normal, the PCA-GMM method sees a drop in the true positive rate and a corresponding rise in the false negative rate. The t-SNE approach is the only method for which the F-score increases when ANCs are treated as normal data. This aligns with findings during model development, where t-SNE was found to be useful for identifying true anomalies contaminating the normal training data. However, the method was not as good at distinguishing between ANC and normal data.
The UMAP-GMM achieves the highest recall of all methods on Dataset 1. However, the very low precision makes the approach undesirable in practice, since it would result in considerable added work for test engineers to review all the false positives.
For Dataset 2a, the PCA-GMM performs the best of all methods, achieving the highest recall and second-highest precision. Furthermore, when considering Dataset 2b, this method identifies 100 percent of the true anomalies. The t-SNE-GMM also performs well on Dataset 2, with similar results to the PCA-GMM method. The UMAP-GMM is once again the worst-performing method, due to high false positive rates. However, it still achieves a higher F-score than the original PCA-DBSCAN approach.
Editor’s note: This article is a summary of a research paper co-authored by James Simon Flynn, Ph.D., and Cinzia Giannetti, Ph.D., associate professor of mechanical engineering, at Swansea University in Swansea, UK. To read the entire paper, click here.