Quality by coincidence or by design? Data mining for quality control


By David F. Nettleton (Ph.D.) and Elodie Bugnicourt (Ph.D.)

Quality by coincidence? This is unlikely. Quality is much more likely to result from the right combination of diverse processes, raw materials, and external parameters. Any quality control manager knows that for complex products, whatever the number and type of final tests, there may always be a few batches that appeared to be in spec yet ultimately failed to satisfy the customer. Moreover, human cognition reaches its limits when trying to correlate all the individual parameters that could have led to an issue.

There are different ways to approach this:
● Developing ever more complex offline quality tests that directly represent the customer’s application. These tests can be tricky in view of their variability, are not always available, and are time-consuming and costly.
● Monitoring additional parameters in-line. This is of interest in many cases, but a smarter approach is often needed to get the full picture, e.g. data fusion of the available process data from different sensors.
● Data mining the available data (the industry already generates a myriad of values) and using it to predict the probability of receiving a customer complaint.

In the following, we describe three real B2B examples in which data mining the available data (together with additional data capture when necessary) was the chosen approach:
a. Caterpillar Inc.: Component Failure Prediction for industrial machine equipment.
b. Semiconductor manufacture QA.
c. Food processing industry: Data Mining for improving the quality of cottage cheese manufacturing.

(a) Caterpillar is a leading manufacturer of construction and mining equipment, which has used data mining methods to predict quality outcomes [1]. According to the principal manufacturing engineer, products can fail quality tests for many reasons. One key issue identified was “tribal know-how”, that is, unwritten rules which are common knowledge among experienced workers. This issue was mitigated by using software techniques (such as decision tree/rule induction) to extract implicit knowledge from empirical data generated by Caterpillar’s manufacturing processes in relation to product performance. This gave management a more accurate idea of the factors that contribute to product compliance.



In order to solve a vibration problem, 113 different assembly features were measured during the manufacturing process. From these, the subset of predictor variables that most affected the trim balance of an engine, and thus the degree of vibration, was identified; it included clearances and fits in the rotating assembly. Thanks to this study, Caterpillar has been able to reduce rotating machinery anomalies by nearly 45%.
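One common way to rank candidate predictor variables, and the criterion behind many decision tree and rule induction tools, is information gain. The following is a minimal sketch of that idea only; it is not Caterpillar's actual method, and the records, feature names ("clearance", "fit") and values are made up purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Reduction in entropy of `target` after splitting `rows` on `feature`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_features(rows, features, target):
    """Order features from most to least informative about the target."""
    return sorted(features,
                  key=lambda f: information_gain(rows, f, target),
                  reverse=True)


# Hypothetical toy records (NOT real manufacturing data).
rows = [
    {"clearance": "tight", "fit": "loose", "vibration": "low"},
    {"clearance": "tight", "fit": "press", "vibration": "low"},
    {"clearance": "wide", "fit": "loose", "vibration": "high"},
    {"clearance": "wide", "fit": "press", "vibration": "high"},
]

print(rank_features(rows, ["fit", "clearance"], "vibration"))
```

In this toy data the clearance value fully determines the vibration level while the fit value carries no information, so clearance is ranked first. With 113 measured features, the same ranking step would simply be applied to every column.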

(b) Semiconductor manufacturers have to keep strict delivery schedules while maintaining high quality and low spoilage. However, issues such as micro-contamination can have a significant impact on yield. A study was conducted to optimise a new wafer cleaning procedure [2] for sub-micron particle removal using a laser-based dry cleaning technology. This cleaning technology can be very effective but requires careful parameterization in order to give optimum results. Data mining played a key role in this study and was applied with two objectives: (i) to understand which factors most influence the process and (ii) to predict the expected precision for a complex set of parameters. Among the input factors considered were the number of pulses in each cleaning session, the number of cleaning passes over the wafer, the angle between the laser beam and the substrate surface, and the characteristics of ozone and oxygen flow.

Different data mining techniques were used in the data exploration and modelling phases, including C4.5 rule induction, Kohonen SOM (to search for hidden data clusters), and predictive neural networks. An “ensemble” classifier, combining the output of a rule-based data model and a neural network model, was found to give the best results. Thanks to this study, the laser intensity and the delay between subsequent laser pulse trains to the same spot were confirmed as the two most important factors in cleaning effectiveness. In addition, the efficiency prediction model for the process parameters obtained a precision of over 85% for particle removal, thus complying with the operational requirements.
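To give a feel for what combining a rule-based model with a neural network can look like, here is a minimal sketch that averages the scores of two models (often called soft voting). How the models were actually combined in the study is not described here, so this is an assumption; both models, their weights and thresholds, and the 0-to-1 scaling of the inputs are hypothetical stand-ins.

```python
import math


def rule_model(sample):
    """Hypothetical hand-written rule standing in for an induced rule set."""
    if sample["laser_intensity"] >= 0.5 and sample["pulse_delay"] >= 0.5:
        return 0.9  # rule fires: cleaning is probably effective
    return 0.2


def neural_model(sample):
    """Hypothetical single logistic unit standing in for a trained network."""
    z = 4.0 * sample["laser_intensity"] + 2.0 * sample["pulse_delay"] - 3.0
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_predict(sample, models, threshold=0.5):
    """Average the model scores and compare the mean with a threshold."""
    score = sum(model(sample) for model in models) / len(models)
    return score, score >= threshold


models = [rule_model, neural_model]

# Inputs are assumed to be normalised to the range 0..1 (illustrative values).
strong = {"laser_intensity": 1.0, "pulse_delay": 1.0}
weak = {"laser_intensity": 0.0, "pulse_delay": 0.0}

for name, sample in (("strong", strong), ("weak", weak)):
    score, effective = ensemble_predict(sample, models)
    print(name, round(score, 3), effective)
```

Averaging is only one option. Weighted voting, or training a further model on the outputs of the two base models, are common alternatives when one model is known to be more reliable than the other.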



(c) Manufacturing cottage cheese is one of the most complicated processes in producing dairy products. The process, which may take up to 20 hours, typically involves many stages, commencing with skimming the milk in a centrifuge. As with every dairy product, there is a chance that a specific batch will be found sour by the customer before the end of the product’s shelf life. Thus, the plant’s production processes were analysed in order to identify batches with a high probability of becoming sour based on the process variables [3].


Data mining tools were applied to the process data in order to discover novel and useful patterns, which were subsequently used to improve manufacturing quality. One challenge was that the data had an unbalanced distribution for the target attribute, and a small training set relative to the number of input features. For this data, conventional methods were found to be inaccurate. Thus, a new feature set decomposition methodology was used which is capable of dealing with class imbalance while processing a small number of records. A new algorithm called BOW (Breadth-Oblivious-Wrapper) was developed for this, which performs a breadth-first search while using an F-measure splitting criterion for building multiple “oblivious” trees. The new algorithm was tested on several real cottage cheese manufacturing datasets and was found to give superior results compared with two state-of-the-art algorithms (C4.5 and Naïve Bayes).

Thanks to this study, insights were obtained into the probability of the cheese becoming sour (measured as pH below 4.9) before the shelf life expired. For example, a dependence was found on factors such as average cooling temperature, scalding duration, calcium quantity and culture quantity.
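The class imbalance mentioned above is the reason an F-measure is preferred to plain accuracy: when sour batches are rare, a model that never predicts "sour" still looks accurate. The sketch below illustrates this with made-up labels (5 sour batches out of 100); it shows the metric only and is not the BOW algorithm or the study's data.

```python
def accuracy(actual, predicted):
    """Fraction of batches whose predicted label matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def f_measure(actual, predicted, positive, beta=1.0):
    """F-measure for the `positive` class (beta=1 gives the usual F1 score)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    if tp == 0:
        return 0.0  # no sour batch was correctly identified
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels: 5 sour batches followed by 95 good ones.
actual = ["sour"] * 5 + ["ok"] * 95

# Predictor A: always says "ok".
always_ok = ["ok"] * 100

# Predictor B: finds 4 of the 5 sour batches and raises 2 false alarms.
model = ["sour"] * 4 + ["ok"] + ["sour"] * 2 + ["ok"] * 93

for name, predicted in (("always ok", always_ok), ("model", model)):
    print(name, accuracy(actual, predicted),
          round(f_measure(actual, predicted, "sour"), 3))
```

Predictor A reaches 95% accuracy while being useless for finding sour batches, which its F-measure of zero makes obvious. This is the kind of behaviour an F-measure based criterion is meant to penalise.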

To summarise, these three case studies show how data mining is already helping to strengthen quality control in diverse manufacturing and operational sectors. At IRIS, our know-how of in-line monitoring for the food processing and pharmaceutical industries is now helping companies in new sectors to exploit new data analytical capabilities in order to gain a competitive advantage in their markets.

1. Predicting quality outcomes through data mining (Caterpillar Inc.: a case study), Tony Grichnik, Thomas Hill, Mike Seskin, http://www.qualitydigest.com/sept06/articles/04_article.shtml
2. Data Mining for Improving a Cleaning Process in the Semiconductor Industry, Dan Braha, Armin Shmilovici, IEEE Transactions on Semiconductor Manufacturing, Vol. 15, No. 1.
3. Data mining for improving the quality of manufacturing: a feature set decomposition approach, Lior Rokach, Oded Maimon, Journal of Intelligent Manufacturing, Vol. 17, Issue 3, pp 285–299.

Title picture source: http://www.mining-technology.com/features/feature-the-worlds-top-10-biggest-diamond-mines/

Carlota Feliu
Marketing Department
