International Journal of Hydrogen Energy

Towards deep computer vision for in-line defect detection in polymer electrolyte membrane fuel cell materials

Article details
Authors: Alfred Yan, Peter Rupnowski, Nalinrat Guba, Ambarish Nag
Journal: International Journal of Hydrogen Energy
DOI: 10.1016/j.ijhydene.2023.01.257
Table of Contents
Abstract
Article History:
Keywords:
Introduction
Methods
Detection And Localization Of Pinholes
Detection Of Pinholes
Detection Of Scratches And Scuffs
Results
Discussion
[Table: Faster-RCNN pinhole detection results, presumably columns shots / AP / AP50 / AP75 — 5-shot: 33.6 ± 2.8, 65.3 ± 5.1, 30.7 ± 3.6; 10-shot: 40.3 ± 1.3, 72.1 ± 1.8, 40.3 ± 2.9]
Conclusion
Declaration Of Competing Interest
Acknowledgements
Abstract
Polymer Electrolyte Membrane (PEM) fuel cells are a promising source of alternative energy. However, their production is limited by a lack of well-established methods for quality control of their constituent materials, like the membrane-electrode assembly, during roll-to-roll manufacturing. One potential solution is the implementation of deep learning methods to detect unwanted defects in scanned images. We explore the detection of defects like scratches, pinholes, and scuffs in a sample dataset of PEM optical images using two deep learning algorithms: Patch Distribution Modeling (PaDiM) for unsupervised anomaly detection and Faster-RCNN for supervised object detection. Both methods achieve scores on performance metrics (ROC-AUC and PRO-AUC for PaDiM and AP for Faster-RCNN) that are comparable to their scores on benchmark datasets. These methods also have the potential to detect a wider range of defects compared to IR thermography and previous optical inspection methods. Overall, deep learning shows promise at detecting relevant defects of interest and has the potential to achieve real-time defect detection. © 2023 Hydrogen Energy Publications LLC. Published by Elsevier Ltd. All rights reserved.
International Journal of Hydrogen Energy 48 (2023) 18978–18995

Towards deep computer vision for in-line defect detection in polymer electrolyte membrane fuel cell materials

Alfred Yan a, Peter Rupnowski b, Nalinrat Guba a, Ambarish Nag a,*

a Computational Science Center, National Renewable Energy Laboratory, 15013 Denver West Pkwy, Golden, CO 80401, USA
b Material Science Center, National Renewable Energy Laboratory, 15013 Denver West Pkwy, Golden, CO 80401, USA

Highlights:
Deep learning models were used to detect defects in fuel cell electrode images.
Anomaly detection and object detection algorithms were tested.
Deep learning can detect a wider range of defects than previous methods.
Inference times show feasibility of real-time defect detection.
Article history:
Received 29 July 2022; Received in revised form 15 January 2023; Accepted 21 January 2023; Available online 24 February 2023
Keywords:
Anomaly detection; PEM; Object detection; Pinhole; Defect detection; Electrode

* Corresponding author. E-mail address: ambarish.nag@nrel.gov (A. Nag).
Introduction
The rising threat of climate change and growing worldwide energy demand necessitates the development of clean energy technologies that have a minimal carbon footprint and environmental impact. Polymer Electrolyte Membrane (PEM) fuel cells are a promising example of such a clean energy technology with the potential for a wide range of applications. However, one barrier to their deployment is the need for quality control methods for relevant components such as the PEM, catalyst layer, and gas diffusion layer during manufacturing, which have been extensively documented in a recent work by Yuan et al. [1]. In particular, Yuan et al. outline a "book of attributes", or a list of the different functions of each component, the material property that needs to be optimized for the specific function, and methods for measuring the property for quality control. Such properties include the presence of surface defects, which need to be detected during manufacturing as well as during operation, where they arise from chemical and mechanical degradation [2,3]. The PEM is a fuel cell component that facilitates diffusion of protons through the fuel cell, and its relevant material properties/defects are outlined in the book of attributes. Some noted defects include tears, bubbles, contaminants, pinholes, and cracks. Furthermore, for the catalyst-coated media (CCM) consisting of the integrated PEM and catalyst layer, key defects similarly include cracks [4,5], orientation, delaminations [6–8], electrolyte clusters, platinum clusters, and thickness variations [9–11]. Much work has been done on documenting the effects of artificially created pinhole defects on performance in situ, where the PEM, CCM, and gas diffusion layer are integrated into the membrane-electrode assembly (MEA) [12–16].
In particular, pinholes have been shown to accelerate hydrogen crossover, which can be measured in both fuel cells and stacks [17] through electrochemical methods like linear sweep voltammetry, I–V polarization curves, and electrical impedance spectroscopy [18], as well as through a decrease in open-circuit voltage (OCV) that can be detected through open-circuit experiments [19–22]. For example, measuring the OCV has been shown to be able to detect and discriminate between electrical shorts and pinholes [17] and to enable in-situ detection of bubbles and cracks [23]. Some other methods include magnetic tomography for non-invasively detecting material degradation in a fuel cell stack [24] and X-ray computed tomography (XCT) for visualizing catalyst layer thinning [25]. Some ex-situ methods for failure diagnosis include scanning electron microscopy (SEM), ion chromatography, and measuring the membrane's selective permeability, many of which have also been applied to cell-level and stack-level diagnosis [26,27]. The propagation of cracks in the CCM is often characterized using these ex-situ methods, especially when combined with experiments subjecting the CCM to varying stresses, temperatures, and humidities [28–33]. Other defect detection methods include inspecting for leaks through the bubble test and pressure drop measurements, which have been performed in-situ and ex-situ [27,34,35]. A well-studied quality control method is infrared (IR) thermography. IR thermography utilizes a gas that reacts with the material when a defect is present and creates a measurable heat signal, and this has been frequently used to detect artificial bare spots on the gas diffusion electrode catalyst layer [36–38]. For example, in Phillips et al. [39], MEAs are induced with artificial bare spots ("irregularities") in the cathode catalyst layer. The MEAs are subjected to an accelerated stress test (AST) where they are exposed to dry and wet humidities while the OCV is recorded to detect cell failure.
The cell apparatus consists of a removable cathode flow field that allows quasi in-situ IR thermography [40]: at different AST operating times, cell operation halts, operating conditions are returned to ambient conditions, and IR thermography is utilized to detect failure points. It was found that under counter-flow conditions, catalyst layer irregularities near the H2 inlet were associated with reductions in MEA lifetimes, and that some failure points were located near the irregularities. Other IR thermography studies have also generated a signal through resistive heating [41,42]. Lastly, in the through-plane reactive excitative technique developed by Ulsh et al. [43], artificially-induced pinholes in MEAs are detected by pulsing hydrogen-containing gas at the membrane and measuring the heat signature that arises if the hydrogen crosses through the membrane and contacts the catalyst layer. While many of these methods are frequently used to detect defects that occur from cell degradation during operation, defects are also created in the manufacturing stage during processing or handling [9,10,23,43,44]. For example, in Wang et al. [45], hot-pressing the GDE to a membrane during cell assembly is shown through SEM to sometimes result in process-induced morphology changes in the MEA near defects like cracks in the microporous layer and from fibers from the gas diffusion layer penetrating into the PEM. These morphology changes later developed into defects like pinholes, shortening MEA lifetime. Thus, it is desired to develop in-line methods to detect defects in a sheet of material during roll-to-roll (R2R) manufacturing. Such methods would need to be not only accurate but also fast enough to detect defects in 'real time', i.e., almost instantaneously, thus allowing the processing of large volumes of material. IR thermography methods have received special attention in this area because of their relatively fast speed and scalability.
Optical inspection methods are also a potential candidate for this task. Some optical methods include X-ray fluorescence for monitoring the chemical composition and film thickness of the catalyst layer [36,46], microscopy for non-destructively viewing real CCM defects [44], and other spectroscopy methods [47,48] for viewing changes in molecular structure during degradation or monitoring ionomer content. However, factors like slow acquisition times, small spatial coverage, and destructiveness can hinder their use in roll-to-roll PEM fuel cell manufacturing lines. Some other optical methods include multispectral imaging, in which incident light generates a measured thermal response from a material. This has applications in thickness mapping of Li-ion battery electrodes [49] and general-purpose membrane inspection [50–52]. One promising optical defect detection method involves rolling the material across a camera feed, so that images of the membrane can be processed by computer vision (CV) algorithms that locate and identify defects [53,54]. Such optical methods have been investigated for the detection of defects in PEM materials, many of which involve predetermining a set of defects in the optical image followed by the application of CV algorithms like intensity thresholding to isolate the bright or dark defects. For example, in Rupnowski et al. [55], defects are artificially induced on two sample membranes: one membrane with scratches and scuffs, and the other with debris sprinkled onto the surface. CV algorithms based on binary thresholding show potential at identifying sprinkled debris but are not investigated for detecting scratches and scuffs due to their complex morphology. In Johnson [56], a thresholding algorithm is used to find potential objects of interest on a membrane, and a neural network is used to classify the objects based on characteristics like shape and brightness. Another method, used on polymer films by Tolba et al.
[57], calculates the similarity of each region of an image on a rectangular grid with a defect-free baseline, where similarity is calculated by comparing the luminance and contrast between the image region and the baseline as well as their covariances. While high scores are obtained using this method, a potential flaw is that its performance may be sensitive to noise in the images: if the images are induced with Gaussian noise, such that the baseline is not solid gray, performance greatly deteriorates. Additionally, since small sliding windows are analyzed one at a time, the method may not be able to analyze regions in a more global context. The aforementioned study by Johnson marks the beginning of a trend of applying machine learning to optical defect detection. However, it still leaves open challenges: careful feature selection is required before the neural network can be trained, which may be cumbersome, and the neural network can only classify defects that have already had their features carefully studied. Challenges also exist in the work by Rupnowski et al. [55], in that the binary thresholding algorithms were not successful in detecting some faint artificial slits and scuffs because of their complexity. Overall, algorithms trained to detect defects by filtering regions of interest by parameters like size, shape, or brightness may miss other defects that do not fit the specified parameters, and such algorithms may struggle to detect defects with complex shapes. Given this problem, optical defect detection may be enhanced by more advanced deep learning methods. Machine learning already has many applications in PEM fuel cells [58], ranging from on-line fault diagnosis [59–64] and aiding in fuel cell design [65–67] to predicting optimal operating conditions [68–70].
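As a concrete illustration, the kind of intensity-thresholding step these classical pipelines rely on can be sketched in a few lines. This is a minimal numpy version; the function name and the deviation multiplier `k` are illustrative choices, not values from the cited studies.

```python
import numpy as np

def threshold_defect_mask(img, k=3.0):
    """Flag pixels that deviate strongly from the image's mean intensity.

    A minimal stand-in for intensity-thresholding defect isolation: pixels
    more than k standard deviations from the mean gray level are flagged
    as candidate bright or dark defects.
    """
    mu, sigma = img.mean(), img.std()
    return np.abs(img - mu) > k * sigma

# Toy example: a uniform gray field with one bright "debris" spot
img = np.full((64, 64), 128.0)
img[30:33, 30:33] = 255.0
mask = threshold_defect_mask(img)
```

This works well for high-contrast debris on a uniform baseline, but, as noted above, faint or complex-shaped defects that do not stand out in raw intensity will be missed.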
However, to the authors' knowledge, there have been no studies using machine learning, besides that by Johnson, centered on detecting defects from sample images during the roll-to-roll manufacturing process. A recent study by Wei et al. [71] applies a convolutional neural network combined with IR imaging to detect defects in the glass solder of solid oxide fuel cell stacks during cooling, but this may have limited applicability for non-glass materials. Nevertheless, there is a variety of studies investigating deep learning for defect detection through image inspection for other materials that serve as inspiration [72–78]. Unsupervised anomaly detection is a deep learning framework that is well applied in this area. Unsupervised anomaly detection models only require baseline anomaly-free images for training and a relatively small test dataset of defective and defect-free images. After training, the models can highlight any anomalous pixels or regions in an image that deviate from the training baseline images. Thus, it is expected that these models have a greater ability to generalize to unseen defects than previous methods that require feature selection, and can detect defects with arbitrarily complex shapes. The amount of image preprocessing is also minimized, as the model can use baseline images that are not necessarily monotonically one color and can account for many irregularities that may appear in the baseline images. An example of an anomaly detection algorithm is Patch Distribution Modeling (PaDiM), developed by Defard et al. [79], which creates a matrix of normal distributions from the training data after extracting features from a pre-trained convolutional neural network. The Mahalanobis distances from the normal distributions are then used to calculate how anomalous a test image is. PaDiM achieves excellent performance on the MVTec-AD dataset, which is commonly used to test anomaly detection models.
MVTec-AD contains pristine and defective images of different materials (e.g. leather, carpet, wood) and objects (e.g. hazelnut, toothbrush, zipper), and thus allows rigorous testing of new anomaly detection models [80]. Another type of deep learning model is the object detection model. In object detection, a dataset of images labeled with bounding boxes identifying certain objects is used to train a model to detect and identify the same objects in new images. Object detection is a popular machine learning problem, with many different model architectures created, like Faster-RCNN [81] and YOLO [82]. Object detection is also particularly appealing because of existing methods for 'few-shot' detection, where a small dataset showing very few instances of a particular object is used to train a network. Object detection models may also have advantages in defect detection over previous methods because they likewise do not involve feature selection, and thus may be able to detect defects with arbitrarily complex morphologies. In this study, PaDiM and an object detection model with the Faster-RCNN architecture are trained and tested on datasets of optical images of electrode samples with their defects labeled, and their performances are evaluated and discussed.
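The core scoring step of PaDiM can be sketched as follows. This is a minimal numpy version of the per-patch Gaussian fit and Mahalanobis scoring in the spirit of Defard et al.; the helper names are ours, and the small covariance regularizer is an assumption (PaDiM similarly adds a scaled identity term).

```python
import numpy as np

def fit_gaussian(patch_features):
    """Fit a Gaussian (mean, inverse covariance) to training embeddings
    for ONE patch position, as PaDiM does per spatial location.

    patch_features: (n_train, d) array of embedding vectors.
    A small regularizer keeps the covariance invertible.
    """
    mu = patch_features.mean(axis=0)
    cov = np.cov(patch_features, rowvar=False) + 0.01 * np.eye(patch_features.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(x, mu, cov_inv):
    """Anomaly score of a test embedding x for this patch position."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(200, 8))          # normal patch embeddings
mu, cov_inv = fit_gaussian(train)
normal_score = mahalanobis(rng.normal(0.0, 1.0, 8), mu, cov_inv)
anomalous_score = mahalanobis(np.full(8, 6.0), mu, cov_inv)  # far from baseline
```

A map of such per-patch scores over a test image yields the anomaly map that is later thresholded.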
Methods
The following sections separately outline the procedures used to train PaDiM for anomaly detection and Faster-RCNN for object detection.

Detection of defects as anomalies using PaDiM

PaDiM models are first trained and tested for defect detection in electrode images. Model architecture details are as follows: in every PaDiM model in this study, EfficientNet-B4 was used for feature extraction, and activation vectors were extracted from the 3rd, 7th, and 17th blocks [83,84]. Full details on the implementation of PaDiM can be found in Defard et al. [79].

Detection and localization of scratches and scuffs

The first image dataset investigated in this study featured scratches and scuffs that were artificially induced on some electrodes. To create the scratches/scuffs dataset, four 8″ × 11″ fuel cell electrode specimens were used, which were originally featured in Rupnowski et al. [55]. Two of these specimens contained artificially induced defects in the forms of scratches and scuffs arranged in a 6 × 3 grid, of which one had defects induced prior to tacking a polymer electrolyte membrane, and the other had defects induced after. The scratches and scuffs had varying sizes and brightness and so are believed to represent a wide range of cases. Two other electrode specimens were identified as defect-free baselines, of which one had a polymer electrolyte membrane, and the other did not. A large high-resolution image was taken of each specimen using reflectance mapping, a method that is well-established for rapidly imaging MEA samples in-line for inspection, and labels around the artificially induced defects were manually drawn. These images were then split up into smaller sub-images to later input into the PaDiM model. However, it may be useful to investigate how the model performs at different image resolutions, because higher resolutions may make faint defects easier to detect.
Thus, each membrane image was split into a 6c × 3c grid of smaller sub-images, where c was an integer constant, and each sub-image was resized to 224 × 224 pixels. Three values of c (1, 3, 5) were tested, which were arbitrarily chosen but ensured that a large range of resolutions was tested so that the effect of resolution on model performance could be illustrated well. As c increased, each sub-image was shrunk less relative to its original size during resizing, and thus image resolution increased. For c = 1, each of the four membrane specimens was split into a 6 × 3 grid of sub-images. This yielded 36 images containing defects and 36 defect-free images. Because the defects were originally created in a grid on the membrane, each defective image contained 1 defect with little overlap. Two different procedures were used to organize the images into training and test data and evaluate the model. The first, which will be referred to as the "standard" procedure, followed Defard et al. [79], in which the model was trained on the defect-free training data and then evaluated on the test dataset containing both defective and defect-free images. In this study, 18 of the total 36 defect-free images were chosen randomly to create the training set, and the remaining 18 were used as test data. To combat class imbalance between the defective and defect-free images in the test data, each of the 18 defect-free test images was duplicated once to create 36 defect-free test images in total. Finally, after all images were reduced from sizes of 1359–1365 pixels × 2041–2124 pixels to 224 × 224 pixels to be processed by the PaDiM model, each image in the training dataset was augmented through rotations of 90°, 180°, and 270° clockwise, as well as random flips (horizontal, vertical, and both horizontal and vertical). This helps combat the small size of the dataset by effectively multiplying the size of the training dataset by 8 to yield 144 images.
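One way to realize the eightfold training-set expansion described above is to generate all eight axis-aligned symmetries of each square sub-image (the identity, three rotations, and four mirrored variants). This is a minimal numpy sketch; the function name is illustrative.

```python
import numpy as np

def dihedral_augment(img):
    """Return the 8 axis-aligned symmetries of a square image: the
    original, rotations of 90/180/270 degrees, and the mirrored copy of
    each, giving an eightfold expansion of the training set."""
    rots = [np.rot90(img, k) for k in range(4)]       # 0/90/180/270 degrees
    return rots + [np.flip(r, axis=1) for r in rots]  # plus mirrored copies

variants = dihedral_augment(np.arange(16.0).reshape(4, 4))
```

For a generic (asymmetric) image the eight variants are all distinct, which is what makes 18 training images behave like 144.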
Finally, the model is trained on the defect-free training images and evaluated on the test images. Different accuracy scores were used for testing, including the average ROC-AUC, PRO-AUC score, and threshold PRO. The ROC-AUC describes the performance of the model at a variety of anomaly thresholds. It is calculated first by plotting the number of true positives vs. the number of false positives obtained at different anomaly thresholds to create a curve known as the receiver operating characteristic (ROC), and then by calculating the area (i.e. Riemann sum) under the normalized ROC. The ROC-AUC was evaluated at the pixel level and image level. The ROC for the pixel-level ROC-AUC is obtained by plotting the number of pixels in the test images that are true positives vs. the number of pixels in the test images that are false positives. Thus the pixel-level ROC-AUC summarizes the model's ability to locate defects and identify pixels that are part of a defect. The ROC for the image-level ROC-AUC score is obtained by assigning each test image its maximum pixel anomaly score, and then plotting the number of images that are true positives (i.e., the number of images that are correctly identified as containing a defect) vs. false positives (i.e., the number of images that are incorrectly identified as containing a defect) at different anomaly thresholds. Thus, the image-level ROC-AUC describes the model's ability to detect defects (classify whether an image contains a defect overall) without necessarily being able to locate the defect in the image. The PRO-AUC is calculated by integrating the normalized per-region overlap (PRO) curve, where different PRO scores are plotted as a function of different anomaly thresholds. To calculate the PRO score for a certain threshold, for each defective region, the fraction of that region that is labeled by the model as anomalous is calculated, and all the fractions from all the defective regions in the test images are averaged.
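The image-level ROC-AUC described above can be sketched compactly using the rank-statistic (Mann-Whitney) form of the AUC, which avoids tracing the curve explicitly. This is a minimal numpy version; the helper name is ours.

```python
import numpy as np

def image_level_auc(anomaly_maps, labels):
    """Image-level ROC-AUC: score each image by its maximum pixel anomaly
    value, then compute the probability that a random defective image
    outranks a random defect-free one (equivalent to the area under the
    image-level ROC).

    anomaly_maps: (n, H, W) array of per-pixel scores; labels: (n,) 0/1."""
    scores = anomaly_maps.reshape(len(anomaly_maps), -1).max(axis=1)
    pos, neg = scores[labels == 1], scores[labels == 0]
    # Pairwise comparisons; ties count as half a win
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

maps = np.zeros((4, 8, 8))
maps[0, 2, 2] = 5.0   # defective image with a hot pixel
maps[1, 3, 3] = 4.0   # defective image with a hot pixel
labels = np.array([1, 1, 0, 0])
auc = image_level_auc(maps, labels)
```

Perfect separation of max scores, as in this toy case, gives an AUC of 1.0.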
The PRO-AUC is described in more detail by Bergmann et al. [85] and Shi et al. [86]. The last accuracy metric, the threshold PRO, is the PRO score calculated after optimizing the anomaly threshold. In this study, the chosen threshold was the one that maximized the F1 score, which is defined in Lipton et al. [87] as follows:

F1 = 2·tp / (2·tp + fp + fn)

Here, tp is the number of true positives, fp is the number of false positives, and fn is the number of false negatives. While the smaller dataset sizes of 18 training and 54 testing images may appear concerning, it should be noted that data augmentation greatly increased the effective size of the training data. Furthermore, the size of the test dataset of 18 pristine and 36 defective images is well in the range of what has been investigated for PaDiM. For example, in the MVTec-AD dataset, the test dataset for toothbrush defects consists of 12 pristine and 30 defective images. Nevertheless, to address potential instability resulting from the small dataset sizes in this study, 36 repeated trials were taken with different random samples of 18 defect-free images to use as training data, and the average of the scores was reported at a 95% confidence interval. These 36 repeated trials were conducted once without reducing the embedding vector size and once with the embedding vector size reduced by randomly selecting 100 features. Details on this dimensionality reduction procedure can be found in Defard et al. [79]. Because the dataset sizes are small, it is worth investigating whether they prevent the final model and anomaly thresholds from generalizing to defects outside the test dataset. This can be done by quantifying the variance in model performance across the different images in the test set. To do this, another procedure was used to evaluate the models, to be referred to as leave-one-out cross-validation (LOOCV).
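The F1-maximizing threshold selection above can be sketched in a few lines. This is a minimal numpy version; the brute-force search over the observed scores is our illustrative choice, not a detail taken from the paper.

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Pick the anomaly threshold that maximizes F1 = 2*tp / (2*tp + fp + fn).

    Candidate thresholds are the distinct observed scores; `pred` flags
    samples at or above the threshold as anomalous."""
    best_t, best_f1 = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

scores = np.array([0.1, 0.2, 0.8, 0.9, 0.3])
labels = np.array([0, 0, 1, 1, 0])
t, f1 = best_f1_threshold(scores, labels)
```

The returned threshold is then fixed and used to compute the threshold PRO on the test images.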
LOOCV is performed as follows: 36 runs were repeated as in the standard procedure with c = 1. In each run, the baseline sub-images are split in half between train and test images just as in the standard procedure, as shown in Fig. 1a). However, in each run, a different defective sub-image was removed from the rest of the defective sub-images, as shown in Fig. 1b). During the training and testing process, all 71 remaining images were first used to train and evaluate the model to obtain accuracy scores and select a pixel-level anomaly threshold, just as in the standard procedure. This process will be referred to as the "inner loop." The accuracy metrics evaluated on the inner test dataset were the ROC-AUC, PRO-AUC, and threshold PRO scores. Afterwards, the threshold PRO was evaluated on the removed image alone, using the selected threshold and trained model obtained in the inner loop, a process which will be referred to as the "outer loop." The combined inner and outer loop training and testing process that is repeated 36 times is shown in Fig. 1c). Finally, after the 36 repeated runs, the average of all 36 outer loop threshold PRO scores is calculated at a 95% confidence interval and compared to the average threshold PRO scores obtained on the inner loops to quantify the model's performance degradation. Different random train/test splits of the defect-free membrane sub-images were used in the 36 repeated runs. This procedure was also tested without dimensionality reduction as well as with 100 randomly selected features. For c = 3 and c = 5, the resolution was higher and more sub-images were created from the larger membrane images. Thus, all the sub-images created from splitting the defect-free baseline membranes into a 6c × 3c grid were used for training data, and all the sub-images created from splitting the defect-containing membranes were used for test data.
Data augmentation through rotating and flipping the images was not used because a larger number of sub-images was available, close to the number of images per class in the MVTec-AD benchmark dataset [80]. Only the standard procedure was used (no LOOCV), and runs were not repeated because of the large dataset size.
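The inner/outer-loop structure of the LOOCV evaluation can be sketched as follows. This is a structural sketch only: `evaluate` is a hypothetical stand-in for the full PaDiM train / threshold-selection / scoring pipeline, and the dummy used here exists only to make the control flow concrete.

```python
import numpy as np

def loocv(defective, baselines, evaluate, seed=0):
    """Hold out one defective image per run; select the model/threshold on
    the remaining images (inner loop), then score the held-out image with
    that fixed threshold (outer loop).

    `evaluate(train, test)` returns a selected threshold;
    `evaluate(train, test, thr)` returns the outer-loop threshold PRO."""
    rng = np.random.default_rng(seed)
    outer_scores = []
    for i in range(len(defective)):
        inner = defective[:i] + defective[i + 1:]     # hold one defect out
        perm = rng.permutation(len(baselines))        # random half-split
        train = [baselines[j] for j in perm[: len(baselines) // 2]]
        thr = evaluate(train, inner)                  # inner loop
        outer_scores.append(evaluate(train, [defective[i]], thr))  # outer loop
    return outer_scores

# Dummy evaluate for illustration only
dummy = lambda train, test, thr=None: 0.7 if thr is None else thr
scores = loocv(list(range(36)), list(range(36)), dummy)
```

Averaging the 36 outer-loop scores and comparing them to the inner-loop scores quantifies how much performance degrades on unseen defects.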
Detection and localization of pinholes
PaDiM was also investigated in its ability to detect and localize pinholes on the membrane. The pinhole dataset was created from four 8″ × 11″ electrode samples, of which one was described in Rupnowski et al. [55], with each sample containing 9 artificial pinholes. These pinholes looked like either gray rings or solid circles, and the smallest pinholes were approximately 150 μm in diameter. These images were also divided into a grid of sub-images, except instead of dividing a whole membrane image into a grid of images with a preset dimension (e.g., 6c × 3c), the membrane image was kept at its original resolution and divided into a large grid of 224 × 224 pixel sub-images, with no resizing, after each dimension of each membrane image was slightly cropped to become a multiple of 224. This was because a far larger resolution was required for the small pinholes to become visible. 39 sub-images containing pinholes were then manually isolated and segmented to create a test dataset. There were more images than the number of pinholes because some pinholes were located on the boundary between two sub-images, causing two images to contain the same pinhole. 36 random images from the set of pinhole-free images were also added to the test set, so that the total size of the test set was 75. The rest of the pinhole-free images were used for training. Both the standard and LOOCV evaluation methods were used, each once with no dimensionality reduction and once with 100 randomly selected features. In the standard method, 20 runs were repeated. In both the standard and LOOCV methods, scores were averaged at a 95% confidence interval. Data augmentation on the training data was not used because of its larger size. In each of the 20 pinhole runs during the standard procedure, the testing process, which was parallelized with 1 GPU, was timed to investigate the feasibility of real-time in-line quality inspection. Times from the 20 runs were averaged at a 95% confidence interval.
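The crop-and-tile step described above can be sketched as follows (a minimal numpy version; the function name is illustrative).

```python
import numpy as np

def tile_image(img, tile=224):
    """Crop each dimension down to a multiple of `tile`, then split the
    image into non-overlapping tile x tile sub-images with no resizing,
    preserving the original resolution so small pinholes stay visible."""
    h, w = img.shape[:2]
    img = img[: h - h % tile, : w - w % tile]
    rows, cols = img.shape[0] // tile, img.shape[1] // tile
    return [img[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            for r in range(rows) for c in range(cols)]

tiles = tile_image(np.zeros((500, 700)))   # cropped to 448 x 672, a 2 x 3 grid
```

Pinholes falling on a tile boundary appear in two adjacent tiles, which is why the test set holds more pinhole images than pinholes.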
Two different metrics were used: the total inference time and the total testing processing time. The total inference time is the amount of time elapsed while Mahalanobis distances are calculated for all the patches' feature vectors in all the test images, and the total testing processing time describes the time elapsed from when sub-images are loaded to when the inference step finishes. The latter most realistically represents the total time it takes for the model to make a prediction on an image, starting from when sub-images are acquired and ending when the model outputs an anomaly map for each image. Timing was only investigated for the pinholes and not the scratches/scuffs, as it is expected that real-time defect detection would be harder for the pinholes because of their higher image resolution.

Detection of defects and objects using Faster-RCNN

Object detection models with the Faster-RCNN architecture were trained to detect defects as well. Faster-RCNN is a type of convolutional neural network architecture that is well-established in achieving high performance on object detection tasks, and details on its implementation can be found in Ren et al. [81]. While anomaly detection methods may already be tailored for detecting defects, object detection methods may be suitable for when defects need to be classified. The methods used in this study are particularly inspired by recent advances in few-shot object detection. Few-shot object detection is a framework where models are optimized to detect objects after being trained on very few images. Some recent work on few-shot detection includes a study by Wang et al. [88], in which a "two-stage fine-tuning" approach (TFA) is developed. In this two-stage approach, a Faster-RCNN model is first trained on a large base dataset and then fine-tuned on a smaller novel dataset. The base dataset contains many different object classes, each of which contains many images.
The novel dataset contains all the object classes in the base dataset as well as some new classes, but there are only K examples of each class in the novel dataset, where K is a very small number like 1, 5, 7, etc. This method, while illuminating some effective strategies for learning from small datasets, cannot be optimally implemented in this study because it requires a large base dataset during the first stage of training, which should ideally be similar to the novel dataset. However, aside from the small hand-labeled datasets of scratches, scuffs, and pinholes used in the PaDiM study, there is no obvious candidate for a closely related base dataset that is well-labeled. Thus, it is not immediately obvious how to exactly replicate TFA. Nevertheless, we can borrow some insights by using a more rudimentary single-stage fine-tuning approach. By using a technique that is similar to but less optimal than TFA, the results from this study can establish a baseline performance for object detection in PEMs that can be expected to improve as more data becomes available to fully implement TFA. To train a model to detect defects, a rudimentary fine-tuning approach inspired by TFA is used. A model with the Faster-RCNN [81] architecture pre-trained on a section of the Pascal-VOC benchmark dataset is taken from an online model repository [88]. The ResNet backbone, region proposal network, and feature extractor are frozen according to TFA, and only the box classifier and regressor are given randomized weights and then trained on a novel dataset that contained only images of defects. Through this process, knowledge from the Pascal-VOC training dataset could be used for transfer learning to improve prediction performance on detecting defects. A cosine similarity-based box classifier with α = 20 was used. In this study, two novel datasets were used: the pinholes and the scratches/scuffs.
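The cosine similarity-based box classification used in TFA can be sketched as scaled-cosine logits. This is a minimal numpy version; in the real model the feature vector comes from the frozen Faster-RCNN feature extractor and the class weights are learned, whereas the vectors here are toy values.

```python
import numpy as np

def cosine_class_scores(feature, class_weights, alpha=20.0):
    """Scaled cosine-similarity logits for a proposal's feature vector:
    both the feature and each class weight vector are L2-normalized, so
    the logit magnitude is controlled by alpha (alpha = 20 per the text)
    rather than by vector norms."""
    f = feature / np.linalg.norm(feature)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    return alpha * (w @ f)

feat = np.array([1.0, 0.0])
weights = np.array([[2.0, 0.0], [0.0, 3.0]])   # two hypothetical classes
logits = cosine_class_scores(feat, weights)
```

Normalizing both sides reduces the influence of feature-norm differences between the large base classes and the few-shot novel classes, which is part of why TFA fine-tunes well on tiny datasets.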
Model architectures and training hyperparameters were the same as those used to train on the Pascal-VOC novel classes in Wang et al. [88], which can be found in their publicly available code repository.
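To illustrate the cosine similarity-based box classifier mentioned above, the following is a minimal numpy sketch (function and variable names are ours, not taken from the TFA code): scaled cosine logits replace the usual dot-product logits in the classification head.

```python
import numpy as np

def cosine_scores(features, weights, alpha=20.0):
    """Scaled cosine-similarity logits, as in TFA's cosine box classifier.

    features: (N, D) RoI feature vectors
    weights:  (C, D) one weight vector per object class
    Returns an (N, C) array of logits alpha * cos(theta).
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return alpha * f @ w.T

# a feature perfectly aligned with the class-0 weight vector gets logit alpha = 20
scores = cosine_scores(np.array([[3.0, 0.0]]),
                       np.array([[2.0, 0.0], [0.0, 5.0]]))
```

Because both vectors are normalized, the logits are bounded by ±α, which helps control intra-class feature variance in low-shot regimes.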
Detection of pinholes
To create an image dataset from the sample images containing pinholes, a 66 × 66 rectangular boundary was drawn around each pinhole, and the image inside each boundary was resized to 800 × 800, which is the input resolution for Faster-RCNN, and then added to the training dataset. Box labels were then manually drawn around the pinhole in each image. To acquire baseline images, each sample was divided into a grid of 66 × 66 sub-images. Gaussian adaptive thresholding was used to keep baseline sub-images that contained bright spots and discard the rest. Each baseline sub-image was also resized to 800 × 800, and sub-images containing a pinhole were discarded. It should be noted that the drastic resizing was done to make the pinhole appear larger in each sub-image, as object detection models are known to perform worse at detecting small objects [89]. Finally, to perform 5-shot and 10-shot learning, either 5 or 10 pinhole images were randomly chosen from the 36 total pinhole images to be used as training images; the rest were used for testing, and 36 baseline images that contained bright spots were also randomly selected to be added to the test dataset. For each shot (5 or 10), stable results were obtained by repeating runs 30 times with different random seeds and calculating the average at a 95% confidence interval. At each new random seed, a new random sample of 5 or 10 images was taken from the total set of defective images to serve as the training data. The object detection metrics evaluated were AP, AP50, and AP75, which are calculated by finding the area under the precision-recall curve at a specified Intersection over Union (IoU) threshold [90].
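The grid-splitting step described above can be sketched as follows (a simplified numpy illustration; the actual preprocessing also applies adaptive thresholding and resizing, which are omitted here):

```python
import numpy as np

def split_into_tiles(img, tile=66):
    """Split a grayscale image into a grid of tile x tile sub-images,
    cropping any remainder at the right and bottom edges."""
    rows, cols = img.shape[0] // tile, img.shape[1] // tile
    cropped = img[:rows * tile, :cols * tile]
    # reshape to (rows, cols, tile, tile), then flatten the grid dimensions
    grid = cropped.reshape(rows, tile, cols, tile).swapaxes(1, 2)
    return grid.reshape(-1, tile, tile)

# a 132 x 198 image yields a 2 x 3 grid of 66 x 66 tiles
tiles = split_into_tiles(np.arange(132 * 198).reshape(132, 198))
```

In practice each tile would then be filtered by thresholding and resized to the detector's 800 × 800 input resolution.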
Detection of scratches and scuffs
For the scratches/scuffs sample images, the same procedure was used with a few modifications. A lower resolution was used: each defect-containing membrane was split into a 6 × 3 grid of sub-images, such that each sub-image contained approximately one defect, and each defect was manually labeled. In the original study [55], 12 scuffs and 24 scratches were reported to be created on the membrane, so ideally there should be corresponding numbers of images containing scratches and scuffs; however, some images were labeled with both, which will be explained in the Discussion section of this paper. One of the baseline membranes (membrane B2 in the original study) was also split into 18 sub-images, to be added to the test set in each run. Only 5-shot learning was used for the scratches and scuffs because of the lower data availability. Just like with the pinholes, the object detection metrics evaluated were AP, AP50, and AP75, and mean scores across the 30 repeated runs were recorded at a 95% confidence interval. Finally, the inference time was recorded in every run to determine the feasibility of real-time detection, yielding 90 different inference time measurements. These were all averaged at a 95% confidence interval to determine an expected inference time.
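The IoU threshold underlying the AP50 and AP75 metrics compares each predicted box against a ground truth box; a minimal sketch, with boxes in (x1, y1, x2, y2) pixel coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

AP50 and AP75 count a prediction as correct when this value is at least 0.5 and 0.75, respectively, before computing the area under the precision-recall curve.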
Results
For PaDiM, performance metrics obtained from the standard procedure are displayed for the scratches/scuffs dataset in Table 1 and for the pinhole dataset in Table 2. Performance metrics obtained from LOOCV are displayed in Table 3 for the scratches/scuffs dataset and in Table 4 for the pinholes. Details on rounding are as follows: all margins of error are rounded up to the nearest hundredth; averages of accuracy metrics like ROC-AUC and PRO-AUC are rounded down to the previous hundredth, while averages of the time metrics (inference time, processing time) are rounded to the nearest hundredth (up or down). The scratches/scuffs scores with c = 3 and c = 5 were not averaged, but were also rounded down to the nearest hundredth.

Table 1 – Performance metrics using PaDiM for anomaly segmentation on the scratches and scuffs dataset, using the standard evaluation procedure.

Features  c  Image ROC-AUC  Pixel ROC-AUC  PRO-AUC score  Threshold PRO score
100       1  0.9 ± 0.02     0.95 ± 0.01    0.82 ± 0.01    0.47 ± 0.01
248       1  0.9 ± 0.01     0.95 ± 0.01    0.83 ± 0.01    0.48 ± 0.01
100       3  0.85           0.94           0.84           0.53
248       3  0.82           0.94           0.84           0.53
100       5  0.81           0.95           0.86           0.52
248       5  0.79           0.95           0.86           0.52

Table 2 – Performance metrics using PaDiM for anomaly segmentation on the pinhole dataset, using the standard evaluation procedure. 20 runs were repeated and scores are recorded at a 95% confidence interval.

Features  Image ROC-AUC  Pixel ROC-AUC  PRO-AUC score  Threshold PRO score  Inference time (s)  Total processing time (s)
100       0.99 ± 0.01    0.99 ± 0.0     0.99 ± 0.0     0.81 ± 0.01          0.42 ± 0.01         1.19 ± 0.05
248       0.99 ± 0.01    0.99 ± 0.0     0.99 ± 0.01    0.81 ± 0.01          0.63 ± 0.02         1.48 ± 0.06

Table 3 – Performance metrics using PaDiM for anomaly segmentation on the scratches and scuffs dataset, using the LOOCV procedure. Scores are averaged over 36 runs at a 95% confidence interval.

Features  Inner Image ROC-AUC  Inner Pixel ROC-AUC  Inner PRO-AUC score  Inner threshold PRO  Mean outer threshold PRO
100       0.89 ± 0.02          0.95 ± 0.01          0.83 ± 0.01          0.46 ± 0.01          0.63 ± 0.12
248       0.9 ± 0.01           0.95 ± 0.01          0.83 ± 0.01          0.48 ± 0.01          0.65 ± 0.11

Table 4 – Performance metrics using PaDiM for anomaly segmentation on the pinhole dataset, using the LOOCV procedure. Scores are averaged over 39 runs at a 95% confidence interval.

Features  Inner Image ROC-AUC  Inner Pixel ROC-AUC  Inner PRO-AUC score  Inner threshold PRO  Mean outer threshold PRO
100       0.99 ± 0.01          0.99 ± 0.01          0.99 ± 0.01          0.8 ± 0.01           0.8 ± 0.09
248       0.99 ± 0.01          0.99 ± 0.01          0.99 ± 0.01          0.81 ± 0.01          0.8 ± 0.08

The model performs best on the pinhole dataset, and the high image-level ROC-AUC scores indicate that the model can effectively distinguish whether an image contains a pinhole. However, as evidenced by the large confidence interval in the outer threshold PRO, there can be some variance in model performance when making new predictions. This can also be seen in Fig. 2, which shows a histogram of the 39 different outer PRO scores achieved in a LOOCV run where only 100 random features were included for dimensionality reduction. In this histogram, one pinhole is completely missed and obtains a PRO score of 0, and for another pinhole, only 16% of its area is detected by the model. These two are shown in Fig. 3. For each pinhole, the original image is shown, along with the ground truth that highlights the pinhole in the image and tells the model that it is anomalous. The predicted heatmap shows anomaly scores predicted by the model, where red indicates high anomaly scores. The predicted mask shows areas where the anomaly score of the image is greater than the chosen anomaly threshold. Finally, the segmentation result circles areas that were highlighted in the predicted mask.
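The anomaly heatmaps described above are built from per-patch Mahalanobis distances between a test patch's embedding and the Gaussian fitted at that patch position during training. A minimal numpy sketch of this inference step (array shapes and names are ours, simplified from the PaDiM formulation):

```python
import numpy as np

def mahalanobis_map(features, mean, cov_inv):
    """Per-patch Mahalanobis distances, forming one anomaly map.

    features: (H, W, D) patch embeddings for one test image
    mean:     (H, W, D) per-position Gaussian means from training
    cov_inv:  (H, W, D, D) per-position inverse covariance matrices
    """
    diff = features - mean
    # d(x)^2 = (x - mu)^T Sigma^{-1} (x - mu) at every grid position
    d2 = np.einsum("hwi,hwij,hwj->hw", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))

# toy example: a 2 x 2 grid of 3-dimensional patch embeddings scored
# against a standard-normal Gaussian (mean 0, identity covariance)
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 2, 3))
amap = mahalanobis_map(feats,
                       np.zeros((2, 2, 3)),
                       np.broadcast_to(np.eye(3), (2, 2, 3, 3)))
```

Thresholding the resulting map produces the predicted mask; with identity covariance the distance reduces to the plain Euclidean norm of each patch embedding.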
False positives are also prevalent in the pinhole baseline sub-images. Inspecting the results from a separate standard run with dimensionality reduction, there are three false positives among all 75 test images. Some seem to be from debris like lint, which still may be worth detecting. These false positives are circled in red in Fig. 4. There are no other false positives in that one run specifically, so the model appears to have good specificity. Model performance on the scratches and scuffs dataset is weaker: the outer threshold PRO scores of the scratches and scuffs dataset are lower than those of the pinholes, and there is more variation in performance. A histogram of threshold PRO scores for each image evaluated in the outer loop during LOOCV with dimensionality reduction is shown in Fig. 5; 3 defects have a value of 0 (were missed entirely). The three defects that had a PRO score of 0 were indeed anticipated in Rupnowski et al. [55] to be challenging to detect. However, as the resolution increases, more low-contrast defects are localized by the model, including the three defects that were not localized during LOOCV. These defects are displayed at low resolution (c = 1) in Fig. 6, and at higher resolution (c = 3) in Fig. 7. In Fig. 6, some of the defects may be extremely difficult to spot by eye because of the low image resolution. While faint defects are more easily segmented at higher resolutions, the model also incorrectly segments more regions as defects, resulting in more false positives. Small bright spots are ubiquitous in the sample images, and as resolution increases, they become more visible and are more likely seen as defects by the model. For example, Fig. 7a shows model predictions for a sub-image containing some scratches. While the model successfully detects the faint scratches, it also segments outside regions that do not belong to any labeled defect.
One (top) appears to be a bright spot, while the other (bottom) does not seem to have any distinguishing features. Another example is seen in Fig. 8b–d, which show three sub-images taken at c = 1, c = 3, and c = 5, respectively, covering the same area of the membrane. Here, it is clear that with increasing c, new regions of the membrane become falsely segmented that were not noticed at lower resolutions. In Fig. 8b–d, there is another feature, a triangular discoloration, that is predicted to be a defect, although the origin of the discoloration is unknown. For Faster-RCNN, mean scores across the 30 repeated runs are recorded for pinholes in Table 5 and for scratches and scuffs in Table 6, where all averages and confidence intervals were rounded to the nearest tenth. The average inference time is displayed in Table 7. Additionally, some success and failure cases are shown in Fig. 9. The top row shows successes and the bottom row shows failures. The first three images in each row show examples from the images of scratches and scuffs, while the last two show examples of pinholes. Failure cases include not predicting a bounding box with sufficiently high confidence and overlapping predictions; for example, a scratch can be predicted to be both a scratch and a scuff.
Discussion
Overall, the performance metrics of the PaDiM models on the scratches/scuffs and pinholes datasets are comparable to those that PaDiM achieves on the benchmark MVTec-AD dataset. Nevertheless, there are a few trends worth noting. First, trends are analyzed for the performance of PaDiM on detecting scratches and scuffs. Inspecting results from the LOOCV runs on the scratches/scuffs, the inner threshold PRO scores are noticeably lower than the outer threshold PRO scores, which is surprising: it was initially expected that model performance would degrade when generalizing to the outer loop image, leading to a slightly lower outer threshold PRO score. The observed trend is caused by differences in how the inner and outer PRO scores are calculated. As described in Bergmann et al. [80], the inner threshold PRO score is calculated for a dataset by looping through all the connected components in the ground truth masks and calculating the relative area of each connected component that is segmented by the model at a chosen threshold. The relative areas are then averaged over all the connected components, such that each connected component has equal weight on the final score, and defects with many connected components have a larger impact. In contrast, the outer threshold PRO score is found by calculating an individual threshold PRO score for each defective image and then averaging over all the defective images' PRO scores uniformly. Thus, each defect has equal impact on the final score regardless of how many connected components it is made of. Here, the scuffs, each consisting of a cluster of small scratches and discolorations, have far more connected components than the scratches, so if the model performs poorly on a scuff defect, the inner threshold PRO is lowered disproportionately compared to scratch defects.
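The difference between the two averaging schemes can be made concrete with a small sketch (assuming ground-truth connected components have already been extracted; names are ours):

```python
import numpy as np

def threshold_pro(pred_masks, components_per_image, outer=False):
    """Average per-component overlap, inner-style or outer-style.

    pred_masks: one boolean predicted mask per image
    components_per_image: for each image, a list of boolean masks,
        one per ground-truth connected component
    outer=False: average over all components pooled together (inner PRO)
    outer=True:  average per image first, then across images (outer PRO)
    """
    per_image, pooled = [], []
    for pred, comps in zip(pred_masks, components_per_image):
        overlaps = [(pred & c).sum() / c.sum() for c in comps]
        pooled.extend(overlaps)
        per_image.append(np.mean(overlaps))
    return float(np.mean(per_image if outer else pooled))

# image A: one fully detected single-component scratch;
# image B: a three-component "scuff" that is missed entirely
comp_a = np.zeros((4, 4), bool); comp_a[:2, :2] = True
comps_b = []
for i in range(3):
    c = np.zeros((4, 4), bool); c[3, i] = True
    comps_b.append(c)
masks = [comp_a.copy(), np.zeros((4, 4), bool)]
gts = [[comp_a], comps_b]
inner = threshold_pro(masks, gts)              # 0.25: the scuff dominates
outer = threshold_pro(masks, gts, outer=True)  # 0.5: per-defect weighting
```

The missed multi-component defect drags the inner score down three times as hard as the outer score, which is exactly the scuff-weighting effect described above.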
So far, there is no reason to use a score that gives the scuffs more weight when evaluating model performance, because although they have more connected components, the components are generally fainter and smaller than those of the scratches, so the scuffs are not necessarily more serious defects than the scratches. Thus, it can be argued that the outer threshold PRO score is a more useful indicator of performance than the threshold PRO score obtained in the standard procedure. It should be noted that this issue does not occur with the pinhole dataset, as seen in Tables 2 and 4, where all the threshold PRO scores are very close. This is because the ground truth mask for each defect consists of only one connected component, so each defect has equal weight when calculating the inner and standard threshold PRO.

Table 5 – Faster-RCNN AP, AP50, and AP75 scores for detecting pinholes.

Shot  AP          AP50        AP75
5     33.6 ± 2.8  65.3 ± 5.1  30.7 ± 3.6
10    40.3 ± 1.3  72.1 ± 1.8  40.3 ± 2.9

Table 6 – Faster-RCNN AP, AP50, and AP75 scores for detecting scratches and scuffs.

Object     AP         AP50        AP75
scratches  8.0 ± 0.9  17.7 ± 1.7  6.0 ± 1.3
scuffs     3.9 ± 1.2  22.2 ± 5.6  2.0 ± 1.6

Table 7 – Average inference time of the Faster-RCNN model, using 1 GPU.

Inference time (milliseconds): 39.8 ± 4390 × 10^−5

Another noteworthy trend is that the image-level ROC-AUC decreases as c increases, and becomes rather low at c = 5. This is probably because as c increases, the entire membrane image is divided into more sub-images, and more defects are cut off and spread out among multiple sub-images. Some sub-images may contain such a small portion of a defect that they are identical to defect-free sub-images, causing the model to incorrectly classify them as defect-free. An example is shown in Fig. 10, which shows a sub-image created from splitting the membrane image with c = 3 that has a small sliver of a scratch defect near the bottom right corner. The lack of segmentations on the predicted mask shows that the model does not notice the defect, and the defect is virtually unrecognizable by eye. While the image-level ROC-AUC decreases with increasing resolution, the pixel-level ROC-AUC is not impacted by the choice of c. As noted, this is likely because while higher c allows fainter defects to be detected, the number of false positives also increases because small bright spots become more conspicuous.
It is unlikely that these bright spots are real defects; they have been postulated to be a result of specular reflection or scattering from slight variations in surface roughness, and were not seriously treated as defects in the original study [55]. Thus, the absence of these bright spots would be greatly beneficial. It is possible that alternative methods for optical imaging of the membrane could reduce scattering and lead to fewer problematic bright spots. Additionally, using image processing algorithms to remove them may be helpful, but such algorithms may also accidentally remove real defects. Lastly, false positives could be reduced by raising the anomaly threshold, though this may sacrifice some of the model's ability to detect faint defects. For the pinholes dataset, PaDiM performance is stronger, and many of the scores are very close to 1.0. However, the high pixel-level ROC-AUC scores are difficult to interpret because very few pixels belong to a pinhole compared to the number of pixels belonging to the baseline, leading to class imbalance in the pixel labels and thus making it easier to achieve very high scores. The PRO-AUC accounts for the imbalance by ignoring areas that are not labeled as defective [80], but the PRO-AUC scores are very high as well. The only scores that are not very close to 1.0 are the threshold PROs. However, they still indicate that most pinholes were detected even if their entire areas were not segmented. Lastly, the outer threshold PRO scores from LOOCV are close to the threshold PRO scores from the standard procedure. This suggests that the model generalizes well when localizing new pinholes that have similar appearances to those in this study. Overall, the model performs better on the pinholes for several reasons. There were fewer problematic bright spots in the pinhole scanned images that would have otherwise led to many false positives.
While there are some false positives, many seemed like debris contamination, which may be worth detecting anyway and could be subject to further study. Furthermore, most of the pinholes had a distinctive appearance as a gray ring [55], which made them easy to detect. The pinholes that were not easy to detect did not have the characteristic ring and thus were less unusual-looking, which led them to have lower anomaly scores. Nevertheless, PaDiM detected the majority of pinholes, including those without the ring shape. Overall, unsupervised anomaly detection methods like PaDiM may have a few advantages over traditional defect detection methods in the future, despite some flawed performance in this study. The current best method for detecting scratches and pinholes is through-plane reactive excitation IR thermography [43]. However, this method is unable to detect pinholes that do not fully penetrate the PEM, because they do not allow the reactive gas to cross over and react with the catalyst layer. PaDiM may be able to circumvent this issue, because defects that do not fully penetrate the PEM may still be detected through imaging. This is evidenced by PaDiM's success at detecting some scratches which appear fainter in the scanned images than the scratches that were missed by reactive excitation, which were also characterized by reflectance mapping. However, it is not known for certain whether the defects in this study did not fully penetrate the membrane, so it is worth trying to confirm this advantage in future studies. Another potential advantage of PaDiM is the ability to detect a wider range of defects than before. This is because it is trained to detect anomalies in an image instead of merely finding instances of a predefined type of defect. Thus, although a test dataset usually contains a few predefined classes of anomalies, it is still possible for the model to find defects outside those that appeared in the test dataset.
For example, there is the possibility of detecting discolorations, folds, or indentations in an image. In fact, this is evidenced by a discoloration that was detected, previously shown in Fig. 8. This is useful because for quality control, it may be worth detecting anything in a sample that looks anomalous rather than confining the possible space of defects to those which have already been documented. This is especially relevant from a high-volume manufacturing standpoint, as faults in manufacturing equipment can cause defects that may not necessarily be documented in the literature, where defects are usually carefully generated under controlled lab conditions. This advantage applies more immediately to the pinholes, because the baseline is plain enough that any anomalies detected by PaDiM are less likely to be random noise. This could also apply to the scratches/scuffs dataset if the amount of random noise can be reduced. This capability also gives PaDiM a potential advantage over other image processing algorithms that rely on "feature selection" and thus are optimized to detect one defect and cannot generalize to others. For example, binary thresholding algorithms have been used to detect black carbon debris defects by isolating any pixels that are darker than a certain threshold [55], but this method would be unable to generalize to other possible contaminations. Other contaminations could have a wide range of colors, shapes, and sizes, such that creating a binary thresholding algorithm for each specific type of contamination may be impossible. However, PaDiM may be able to detect them, because such contaminations can be expected to appear anomalous relative to the baseline, provided they are visible. This is already evidenced by some of the false positives found by the pinhole model, which look like pieces of debris.
Future studies may investigate the nature and effects of these detected features; however, if they have no effect on cell performance, manufacturing conditions may need to be controlled so that such features do not appear on the membrane and cause false positives. For the object detection models, the performance metrics are comparable to those seen in the literature. While the models underperform on the scratches and scuffs datasets, this is not too surprising, as the TFA method by Wang et al. achieves AP scores on the COCO dataset that also appear to be in the single digits [88]. The strongest score is the AP50, with lower AP and AP75. This likely indicates that the model struggles to predict bounding boxes that precisely overlap with the ground truth boxes, probably because many scuffs, being loose clusters of scratches, had ambiguous borders such that drawing ground truth bounding boxes required heuristic judgement. Thus, it is not expected that the model would predict boxes that precisely intersect with the ground truth bounding boxes. For the object detection models, the issues that lead to worse performance on the scratches and scuffs than on the pinholes are similar to those seen with PaDiM. As noted previously, the membrane images containing the artificially induced scratches and scuffs are 'noisier', with more bright spots in the background. Some clusters of bright spots may look like scuffs, which may confuse the model. There are also several stray white streaks that become more apparent at higher resolutions. Some of those streaks may in fact be scratches from other sources, but it is impossible to know their true causes. These streaks may also confuse the model. Following the convention outlined in the original study by Rupnowski et al. [55], most objects that resembled scratches and scuffs but whose nature was unknown and that did not seem artificially created were not labeled; only artificially created defects were counted.
Avoiding drawing a box arbitrarily around any object that resembled a defect minimized human bias during the labeling process. However, heuristic judgement still had to be used in some cases. For example, there were two long curved streaks on a membrane that were labeled as scratches. They were not grouped with other scratches and were shaped significantly differently from the other scratches. They were also located near a scuff, so they may have been created accidentally when the scuff was being created. Nevertheless, they were still labeled as scratches because they stood out greatly in the image, even if they were unlikely to be significant. The ground truth-labeled image containing these defects is shown in Fig. 11. To avoid such confusion in future studies, it would be helpful to increase collaboration with experimentalists with domain expertise who could assist with labeling the training data. Another obstacle is the overlap in the morphology of scuffs and scratches. Although the scratches and scuffs were created differently, many scuffs resemble a large, tight cluster of scratches, and a group of scratches may also be seen as a very small, sparse scuff. If the scratches inside the scuff defects are not labeled as scratches, the model is discouraged from identifying scratches based on appearance alone, and its performance degrades. Lastly, many scratches were carved close to each other, so overlap between different bounding boxes around scratches likely degraded performance. Inspecting performance for the pinholes, the AP scores are relatively high, and in fact surpass the scores seen on the benchmarks in Wang et al. [88]. This is unexpected, because a rudimentary fine-tuning approach was used, keeping the backbone frozen the entire time without following the two-stage fine-tuning procedure.
The source dataset used to train the model, which was the Pascal VOC training data containing color photographs, was largely dissimilar to the grayscale 2-D pinhole images, which could have also decreased the effectiveness of transfer learning. However, in other respects the high pinhole performance is unsurprising, as most of the pinholes look very similar, usually being either a large white dot or a large black/gray dot with a whiter border. Also, pinholes were the only object class, so there was less misclassification error. However, for these models to be useful, higher scores are still desired. While the average scores themselves need improvement, another issue for detecting both scratches/scuffs and pinholes is the large confidence intervals of the different AP scores, which are larger than those seen in the benchmark scores. This is likely due to the small size of the test data. For example, 12 images of scuffs were available in total, so 7 images were used for testing in each run (as 5 images were used for training). Such a small number of images causes individual differences in each image to make a larger relative impact on the overall AP score, leading to higher variance. Furthermore, a different sample was taken from the images to serve as test data in each repeated run, so the differing test datasets also lead to higher variance. Overall, much more work is necessary before Faster-RCNN can achieve the accuracy required for deployment. In the future, Faster-RCNN may share some of PaDiM's advantages over IR thermography, like the ability to detect defects that do not fully penetrate the membrane. However, larger datasets would need to be acquired, both to increase model accuracy and to allow more rigorous testing. Furthermore, Faster-RCNN does not have the ability to detect anomalies like PaDiM does, though it does have the ability to classify defects, which may be useful. Finally, detection times are estimated to assess viability for real-time defect detection.
It should be noted that the PaDiM inference times in this study are significantly shorter than those reported in the original study due to our parallel GPU implementation. For example, the original study reports an inference time of 0.23 s per image using a CPU [79], while for the pinhole dataset, the inference time is on average 0.42 s for 75 images, or around 0.01 s/image. Nevertheless, for deep learning optical methods to be viable, they need to process large volumes of a membrane quickly: facilities may have a coating speed of 10 feet per minute [55], which needs to be matched by the algorithm to find defects in real time. Without a manufacturing line to test our models, the detection time is roughly estimated using the time-complexity metrics for PaDiM and the inference times for Faster-RCNN reported in the Results section. For PaDiM, we focused on the pinhole dataset, which should be more time-consuming to inspect than the scratches, as higher resolutions were needed. For the membranes with pinholes, each 8″ × 11″ membrane image was divided into 224 × 224 pixel sub-images for processing by the PaDiM algorithm. This corresponds to 3924 sub-images that need to be inspected to find defects on the four membrane specimens. However, this was after cropping each scanned image so that its height and width were a multiple of 224 pixels, so a small part of the membrane on the border was removed. If the border was included and the membrane was not cropped, then splitting the membranes into 224 × 224-sized sub-images would result in 4181 sub-images total. This amount more realistically represents the number of tiles that need to be processed so that the entire membrane can be inspected. It should be noted that this is not an exact multiple of 4 because the different scanned membrane images had slightly different sizes.
The time taken to extract features for all these images and create an anomaly map is roughly estimated using Equation 1:

T_total ≈ (N_total / N_testing) × T_processing,testing    (1)

Here, T_total represents the total amount of time taken to create an anomaly map for N_total sub-images, which is assumed to be 4181; N_testing is the number of images in the test dataset (75); and T_processing,testing is the processing time for the test dataset as described in the Results. This estimation assumes that the time duration is linear in the number of images to be processed. For the pinholes, the total processing time for the test dataset was 1.5 s with no dimensionality reduction. Thus, the amount of time T_total to inspect the four membrane specimens is calculated as:

T_total ≈ (4181 / 75) × 1.5 ≈ 84 seconds

84 s is a long time and does not account for other steps like image calibration or splitting up the images into sub-images; to reach the goal of 10 ft/min, the four 11″-long membranes in this study would need to be inspected in around 22 s. The reactive excitation method developed by Ulsh et al. [43] can find pinholes in 7.5 × 7.5 cm samples in a little over 5 s. A direct comparison between the two methods is difficult because studies have not yet been performed on how the time duration of the reactive excitation method would scale as the sample area increases to the sizes of the membranes used in this study. However, it can be estimated that a 7.5 × 7.5 cm portion of the pinhole membrane would correspond to approximately 104 sub-images, so the amount of time to inspect it would be approximately:

T_total ≈ (104 / 75) × 1.5 ≈ 2.1 seconds

Thus, PaDiM's inspection time seems comparable to some IR thermography methods. Several other measures can be taken to improve processing speed. Other, more recent unsupervised anomaly detection algorithms like Fastflow [91] and CFLOW-AD [92] might yield improved accuracy and inference time over PaDiM.
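The linear extrapolation used for these estimates can be checked numerically (a sketch; the function name is ours):

```python
def estimated_inspection_time(n_total, n_testing, t_processing_testing):
    """Linearly extrapolate the test-set processing time to n_total tiles."""
    return n_total / n_testing * t_processing_testing

# all four membrane specimens (~84 s) and one 7.5 x 7.5 cm area (~2.1 s),
# using the measured 1.5 s processing time for the 75-image test set
t_membranes = estimated_inspection_time(4181, 75, 1.5)
t_sample = estimated_inspection_time(104, 75, 1.5)
```

This ignores fixed per-run costs such as image calibration and tiling, so it is a lower bound on the real in-line inspection time.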
However, the parallelized PaDiM implementation in this study is also an order of magnitude faster than the original serial implementation in Defard et al. [79], so Fastflow and CFLOW-AD are not guaranteed to be faster. Future research could also investigate parallelizing the inspection process with multiple cameras and/or computers, which could divide the processing time by 2 or more. Minimizing the real-time image pre-processing needed, by training the PaDiM models on raw unprocessed images, could also optimize processing time. For example, Rupnowski et al. [55] describe how sample images may need to be calibrated due to differences in light intensity at different angles from the light source. Training the models on uncalibrated images instead may negate the need for calibration. A similar calculation on processing time is made for the object detection algorithms. In this study, Faster-RCNN has a reported inference time of 0.04 s for 1 image. The images used to train and test the model were created by dividing each of the four pinhole-containing membrane images into 66 × 66 pixel sub-images, which resulted in 46,219 sub-images total. For the inference stage, it would take approximately 30.8 min to make a prediction for all 46,219 sub-images, which is too slow to achieve real-time defect detection of pinholes. It is clear that increasing the image resolution greatly slows down the inspection time, so smaller image resolutions could be investigated in the future. More recent algorithms could also be applied, like YOLOv7, with reported inference times as fast as 286 FPS [93].
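The object detection estimate above follows the same linear scaling (a quick check; variable names are ours):

```python
n_tiles = 46219       # 66 x 66 pixel sub-images across the four membranes
t_per_image = 0.04    # reported Faster-RCNN inference time per image, in seconds
total_minutes = n_tiles * t_per_image / 60  # roughly half an hour, far from real time
```

Shrinking the per-tile resolution, or batching many tiles per forward pass, would be the first levers to pull before switching architectures.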
Conclusion
The application of deep learning for in-line defect detection is a promising area overall. The anomaly detection model PaDiM shows more promise than the supervised Faster-RCNN model, detecting more defects while being within the right order of magnitude for real-time detection speed. PaDiM could detect faint scratches, scuffs, and artificially induced pinholes, highlighting the possibility of detecting defects that cannot be found using infrared thermography, such as those that do not fully penetrate the membrane. Unsupervised anomaly detection methods in general may be able to detect a wider range of defects than previous algorithms, since they do not rely on feature selection. However, the utility of deep computer vision is limited to visible defects that can be seen on camera. Invisible defects, like catalyst layer thickness irregularities on gas diffusion electrodes [36], will be less straightforward to detect through deep computer vision. The main obstacle for detecting visible defects is "noisy" baselines: sample images may have very bright spots which are not supposed to be defects and can cause false positives. Although they are not of interest as harmful defects, many were still labeled as anomalous. A lower frequency of bright spots likely allowed PaDiM to perform well on the pinhole-containing samples, while a high frequency caused the model to perform more poorly on the scratches/scuffs-containing samples. Thus, higher quality datasets with less noise are needed. New datasets are also necessary to apply deep learning methods to other types of fuel cells, materials, and/or defects, such as naturally occurring defects that were not artificially induced. However, the methods in this study are otherwise agnostic to the specific fuel cell type/material. Further studies could also compare the energy and resource consumption of IR thermography and optical detection methods.
Infrared thermography methods need a steady supply of reactive gas such as H2, energy to pump the gas through the material and purge it, as well as an IR camera. The optical detection methods in this study, on the other hand, need only a camera and light source to capture the images and a computer with GPUs for model inference. Optical methods thus negate the need for reactive gas and might be more resource-efficient than IR thermography. In conclusion, the application of deep learning for defect detection in PEMs will likely require more collaborative and interdisciplinary effort, but, if successful, it could greatly enhance the applicability of optical methods for detecting defects.
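The anomaly-scoring idea underlying the discussion above — that a patch whose appearance departs from the defect-free training distribution receives a high score, whether it is a harmful defect or merely a bright spot — can be sketched with the per-patch Gaussian modeling at PaDiM's core (following Defard et al. [79]). Everything in this sketch is synthetic: the grid size, embedding dimension, and the injected out-of-distribution patch are illustrative assumptions, not the paper's data or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "features": an H x W grid of d-dimensional patch embeddings
# extracted from N defect-free training images (synthetic here).
N, H, W, d = 200, 8, 8, 4
train = rng.normal(0.0, 1.0, size=(N, H, W, d))

# PaDiM-style modeling: fit one multivariate Gaussian per patch position,
# with a small regularization term on the covariance.
mu = train.mean(axis=0)                                    # (H, W, d)
cov = np.empty((H, W, d, d))
for i in range(H):
    for j in range(W):
        x = train[:, i, j, :] - mu[i, j]
        cov[i, j] = x.T @ x / (N - 1) + 0.01 * np.eye(d)
cov_inv = np.linalg.inv(cov)

def anomaly_map(img):
    """Mahalanobis distance of each patch embedding to its position's Gaussian."""
    diff = img - mu                                        # (H, W, d)
    return np.sqrt(np.einsum("ijk,ijkl,ijl->ij", diff, cov_inv, diff))

# A "bright spot" shifts one patch's embedding far from the training distribution,
# so it is flagged regardless of whether it is a defect of interest.
test = rng.normal(0.0, 1.0, size=(H, W, d))
test[3, 3] += 6.0
scores = anomaly_map(test)
print(np.unravel_index(scores.argmax(), scores.shape))  # the shifted patch at (3, 3)
```

This is also why noisy baselines are damaging: the score is distance from "normal", not harmfulness, so frequent benign bright spots raise the false-positive rate unless the threshold or the training set accounts for them.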
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was authored by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy and the Hydrogen and Fuel Cell Technologies Office. This work was also sponsored by the Office of Science and the Office of Workforce Development for Teachers and Scientists (WDTS) under the Science Undergraduate Laboratory Internship (SULI) program. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.

References

[1] Yuan X-Z, Nayoze-Coynel C, Shaigan N, Fisher D, Zhao N, Zamel N, Gazdzicki P, Ulsh M, Friedrich KA, Girard F, Groos U. A review of functions, attributes, properties and measurements for the quality control of proton exchange membrane fuel cell components. J Power Sources 2021;491:229540. https://doi.org/10.1016/j.jpowsour.2021.229540.
[2] Xing Y, Li H, Avgouropoulos G. Research progress of proton exchange membrane failure and mitigation strategies. Materials 2021;14:2591. https://doi.org/10.3390/ma14102591.
[3] Nguyen HL, Han J, Nguyen XL, Yu S, Goo Y-M, Le DD. Review of the durability of polymer electrolyte membrane fuel cell in long-term operation: main influencing parameters and testing protocols. Energies 2021;14:4048. https://doi.org/10.3390/en14134048.
[4] Shi S, Li J, Gao B, Dai H, Lin Q, Chen X. Effect of catalyst layer on fatigue life and fracture mechanisms of fuel cell membrane. Fatig Fract Eng Mater Struct 2022;45:687–700. https://doi.org/10.1111/ffe.13626.
[5] Choi SR, Kim DY, An WY, Choi S, Park K, Yim S-D, Park J-Y. Assessing the degradation pattern and mechanism of membranes in polymer electrolyte membrane fuel cells using open-circuit voltage hold and humidity cycle test protocols. Mater Sci Energy Technol 2022;5:66–73. https://doi.org/10.1016/j.mset.2021.12.001.
[6] Tavassoli A, Lim C, Kolodziej J, Lauritzen M, Knights S, Wang GG, Kjeang E. Effect of catalyst layer defects on local membrane degradation in polymer electrolyte fuel cells. J Power Sources 2016;322:17–25. https://doi.org/10.1016/j.jpowsour.2016.05.016.
[7] Ma S, Qin Y, Liu Y, Sun L, Guo Q, Yin Y. Delamination evolution of PEM fuel cell membrane/CL interface under asymmetric RH cycling and CL crack location. Appl Energy 2022;310:118551. https://doi.org/10.1016/j.apenergy.2022.118551.
[8] Banan R, Zu J, Bazylak A. Humidity and temperature cycling effects on cracks and delaminations in PEMFCs. Fuel Cell 2015;15:327–36. https://doi.org/10.1002/fuce.201400118.
[9] Wang M, Rome G, Medina S, Pfeilsticker JR, Kang Z, Pylypenko S, Ulsh M, Bender G. Impact of electrode thick spot irregularities on polymer electrolyte membrane fuel cell initial performance. J Power Sources 2020;466:228344. https://doi.org/10.1016/j.jpowsour.2020.228344.
[10] Phillips A, Ulsh M, Porter J, Bender G. Utilizing a segmented fuel cell to study the effects of electrode coating irregularities on PEM fuel cell initial performance. Fuel Cell 2017;17:288–98. https://doi.org/10.1002/fuce.201600214.
[11] Kundu S, Fowler MW, Simon LC, Grot S. Morphological features (defects) in fuel cell membrane electrode assemblies. J Power Sources 2006;157:650–6. https://doi.org/10.1016/j.jpowsour.2005.12.027.
[12] Kreitmeier S, Michiardi M, Wokaun A, Büchi FN. Factors determining the gas crossover through pinholes in polymer electrolyte fuel cell membranes. Electrochim Acta 2012;80:240–7. https://doi.org/10.1016/j.electacta.2012.07.013.
[13] Cho Y-H, Park H-S, Kim J, Cho Y-H, Cha SW, Sung Y-E.
The operation characteristics of MEAs with pinholes for polymer electrolyte membrane fuel cells. Electrochem Solid State Lett 2008;11:B153. https://doi.org/10.1149/1.2937450.
[14] Reshetenko TV, Bender G, Bethune K, Rocheleau R. Application of a segmented cell setup to detect pinhole and catalyst loading defects in proton exchange membrane fuel cells. Electrochim Acta 2012;76:16–25. https://doi.org/10.1016/j.electacta.2012.04.138.
[15] Bodner M, Hochenauer C, Hacker V. Effect of pinhole location on degradation in polymer electrolyte fuel cells. J Power Sources 2015;295:336–48. https://doi.org/10.1016/j.jpowsour.2015.07.021.
[16] Lü W, Liu Z, Wang C, Mao Z, Zhang M. The effects of pinholes on proton exchange membrane fuel cell performance. Int J Energy Res 2011;35:24–30. https://doi.org/10.1002/er.1728.
[17] Niroumand AM, Homayouni H, Goransson G, Olfert M, Eikerling M. In-situ diagnostic tools for hydrogen transfer leak characterization in PEM fuel cell stacks part III: manufacturing applications. J Power Sources 2020;448:227359. https://doi.org/10.1016/j.jpowsour.2019.227359.
[18] Tang Z, Huang Q-A, Wang Y-J, Zhang F, Li W, Li A, Zhang L, Zhang J. Recent progress in the use of electrochemical impedance spectroscopy for the measurement, monitoring, diagnosis and optimization of proton exchange membrane fuel cell performance. J Power Sources 2020;468:228361. https://doi.org/10.1016/j.jpowsour.2020.228361.
[19] Yuan X-Z, Zhang S, Wang H, Wu J, Sun JC, Hiesgen R, Friedrich KA, Schulze M, Haug A. Degradation of a polymer exchange membrane fuel cell stack with Nafion® membranes of different thicknesses: Part I. In situ diagnosis. J Power Sources 2010;195:7594–9. https://doi.org/10.1016/j.jpowsour.2010.06.023.
[20] Zheng W, Xu L, Hu Z, Ding Y, Li J, Ouyang M. Dynamic modeling of chemical membrane degradation in polymer electrolyte fuel cells: effect of pinhole formation. J Power Sources 2021;487:229367. https://doi.org/10.1016/j.jpowsour.2020.229367.
[21] Ehlinger VM, Kusoglu A, Weber AZ. Modeling coupled durability and performance in polymer-electrolyte fuel cells: membrane effects. J Electrochem Soc 2019;166:F3255. https://doi.org/10.1149/2.0281907jes.
[22] Tang Q, Li B, Yang D, Ming P, Zhang C, Wang Y. Review of hydrogen crossover through the polymer electrolyte membrane. Int J Hydrogen Energy 2021;46:22040–61. https://doi.org/10.1016/j.ijhydene.2021.04.050.
[23] Phillips A, Ulsh M, Mackay J, Harris T, Shrivastava N, Chatterjee A, Porter J, Bender G. The effect of membrane casting irregularities on initial fuel cell performance. Fuel Cell 2020;20:60–9. https://doi.org/10.1002/fuce.201900149.
[24] Ifrek L, Rosini S, Cauffet G, Chadebec O, Rouveyre L, Bultel Y. Fault detection for polymer electrolyte membrane fuel cell stack by external magnetic field. Electrochim Acta 2019;313:141–50. https://doi.org/10.1016/j.electacta.2019.04.193.
[25] White RT, Wu A, Najm M, Orfino FP, Dutta M, Kjeang E. 4D in situ visualization of electrode morphology changes during accelerated degradation in fuel cells by X-ray computed tomography. J Power Sources 2017;350:94–102. https://doi.org/10.1016/j.jpowsour.2017.03.058.
[26] Obermaier M, Jozwiak K, Rauber M, Bauer A, Scheu C. Comparative study of pinhole detection methods for automotive fuel cell degradation analysis. J Power Sources 2021;488:229405. https://doi.org/10.1016/j.jpowsour.2020.229405.
[27] Yuan X-Z, Zhang S, Ban S, Huang C, Wang H, Singara V, Fowler M, Schulze M, Haug A, Andreas Friedrich K, Hiesgen R. Degradation of a PEM fuel cell stack with Nafion® membranes of different thicknesses. Part II: ex situ diagnosis. J Power Sources 2012;205:324–34. https://doi.org/10.1016/j.jpowsour.2012.01.074.
[28] Shi J, Zhan Z, Zhang D, Yu Y, Yang X, He L, Pan M. Effects of cracks on the mass transfer of polymer electrolyte membrane fuel cell with high performance membrane electrode assembly. J Wuhan Univ Technol-Materials Sci Ed 2021;36:318–30.
https://doi.org/10.1007/s11595-021-2412-z.
[29] Shi S, Sun X, Lin Q, Chen J, Fu Y, Hong X, Li C, Guo X, Chen G, Chen X. Fatigue crack propagation behavior of fuel cell membranes after chemical degradation. Int J Hydrogen Energy 2020;45:27653–64. https://doi.org/10.1016/j.ijhydene.2020.07.113.
[30] Lin Q, Shi S, Wang L, Chen X, Chen G. Biaxial fatigue crack propagation behavior of perfluorosulfonic-acid membranes. J Power Sources 2018;384:58–65. https://doi.org/10.1016/j.jpowsour.2018.02.002.
[31] Singh Y, Khorasany RMH, Kim WHJ, Alavijeh AS, Kjeang E, Rajapakse RKND, Wang GG. Ex situ characterization and modelling of fatigue crack propagation in catalyst coated membrane composites for fuel cell applications. Int J Hydrogen Energy 2019;44:12057–72. https://doi.org/10.1016/j.ijhydene.2019.03.108.
[32] Ramani D, Singh Y, Orfino FP, Dutta M, Kjeang E. Characterization of membrane degradation growth in fuel cells using X-ray computed tomography. J Electrochem Soc 2018;165:F3200. https://doi.org/10.1149/2.0251806jes.
[33] Soleymani AP, Chen J, Ricketts M, Waldecker J, Jankovic J. Failure analysis and defects characterization of polymer electrolyte membrane fuel cells after relative humidity cycling. ECS Trans 2020;98:109. https://doi.org/10.1149/09809.0109ecst.
[34] De Moor G, Bas C, Charvin N, Dillet J, Maranzana G, Lottin O, Caqué N, Rossinot E, Flandin L. Perfluorosulfonic acid membrane degradation in the hydrogen inlet region: a macroscopic approach. Int J Hydrogen Energy 2016;41:483–96. https://doi.org/10.1016/j.ijhydene.2015.10.066.
[35] Lai Y-H, Fly GW. In-situ diagnostics and degradation mapping of a mixed-mode accelerated stress test for proton exchange membranes. J Power Sources 2015;274:1162–72. https://doi.org/10.1016/j.jpowsour.2014.10.116.
[36] Zenyuk IV, Englund N, Bender G, Weber AZ, Ulsh M. Reactive impinging-flow technique for polymer-electrolyte fuel-cell electrode-defect detection. J Power Sources 2016;332:372–82.
https://doi.org/10.1016/j.jpowsour.2016.09.109.
[37] Ulsh M, Porter JM, Bittinat DC, Bender G. Defect detection in fuel cell gas diffusion electrodes using infrared thermography. Fuel Cell 2016;16:170–8. https://doi.org/10.1002/fuce.201500137.
[38] Das PK, Weber AZ, Bender G, Manak A, Bittinat D, Herring AM, Ulsh M. Rapid detection of defects in fuel-cell electrodes using infrared reactive-flow-through technique. J Power Sources 2014;261:401–11. https://doi.org/10.1016/j.jpowsour.2013.11.124.
[39] Phillips A, Ulsh M, Neyerlin KC, Porter J, Bender G. Impacts of electrode coating irregularities on polymer electrolyte membrane fuel cell lifetime using quasi in-situ infrared thermography and accelerated stress testing. Int J Hydrogen Energy 2018;43:6390–9. https://doi.org/10.1016/j.ijhydene.2018.02.050.
[40] Bender G, Felt W, Ulsh M. Detecting and localizing failure points in proton exchange membrane fuel cells using IR thermography. J Power Sources 2014;253:224–9. https://doi.org/10.1016/j.jpowsour.2013.12.045.
[41] De Moor G, Charvin N, Bas C, Caqué N, Rossinot E, Flandin L. In situ quantification of electronic short circuits in PEM fuel cell stacks. IEEE Trans Ind Electron 2015;62:5275–82. https://doi.org/10.1109/TIE.2015.2395390.
[42] Ulsh M, Sopori B, Aieta NV, Bender G. Challenges to high-volume production of fuel cell materials: quality control. ECS Trans 2013;50:919. https://doi.org/10.1149/05002.0919ecst.
[43] Ulsh M, DeBari A, Berliner JM, Zenyuk IV, Rupnowski P, Matvichuk L, Weber AZ, Bender G. The development of a through-plane reactive excitation technique for detection of pinholes in membrane-containing MEA sub-assemblies. Int J Hydrogen Energy 2019;44:8533–47. https://doi.org/10.1016/j.ijhydene.2018.12.181.
[44] Arcot MP, Zheng K, McGrory J, Fowler MW, Pritzker MD. Investigation of catalyst layer defects in catalyst-coated membrane for PEMFC application: non-destructive method. Int J Energy Res 2018;42:3615–32. https://doi.org/10.1002/er.4107.
[45] Wang M, Medina S, Ochoa-Lozano J, Mauger S, Pylypenko S, Ulsh M, Bender G. Visualization, understanding, and mitigation of process-induced-membrane irregularities in gas diffusion electrode-based polymer electrolyte membrane fuel cells. Int J Hydrogen Energy 2021;46:14699–712. https://doi.org/10.1016/j.ijhydene.2021.01.186.
[46] Su R, Kirillin M, Chang EW, Sergeeva E, Yun SH, Mattsson L. Perspectives of mid-infrared optical coherence tomography for inspection and micrometrology of industrial ceramics. Opt Express 2014;22:15804–19. https://doi.org/10.1364/OE.22.015804.
[47] Danilczuk M, Lancucki L, Schlick S, Hamrock SJ, Haugen GM. In-depth profiling of degradation processes in a fuel cell: 2D spectral-spatial FTIR spectra of Nafion membranes. ACS Macro Lett 2012;1:280–5. https://doi.org/10.1021/mz200100s.
[48] Ohma A, Yamamoto S, Shinohara K. Membrane degradation mechanism during open-circuit voltage hold test. J Power Sources 2008;182:39–47. https://doi.org/10.1016/j.jpowsour.2008.03.078.
[49] Rupnowski P, Ulsh M, Sopori B, Green BG, Wood DL, Li J, Sheng Y. In-line monitoring of Li-ion battery electrode porosity and areal loading using active thermal scanning - modeling and initial experiment. J Power Sources 2018;375:138–48. https://doi.org/10.1016/j.jpowsour.2017.07.084.
[50] Sopori B, Rupnowski P, Ulsh M. On-line, continuous monitoring in solar cell and fuel cell manufacturing using spectral reflectance imaging. Golden, CO (United States): National Renewable Energy Lab. (NREL); 2016. https://www.osti.gov/doepatents/biblio/1234545-line-continuousmonitoring-solar-cell-fuel-cell-manufacturing-usingspectral-reflectance-imaging. [Accessed 12 May 2022].
[51] Sopori BL, Ulsh MJ, Rupnowski P, Bender G, Penev MM, Li J, Wood III DL, Daniel C. Batch and continuous methods for evaluating the physical and thermal properties of films. US10684128B2; 2020. https://patents.google.com/patent/US10684128B2/en. [Accessed 16 May 2022].
[52] Rupnowski P, Ulsh MJ.
Thickness mapping using multispectral imaging. US10480935B2; 2019. https://patents.google.com/patent/US10480935B2/en. [Accessed 12 May 2022].
[53] Choi D, Jeon Y-J, Kim SH, Moon S, Yun JP, Kim SW. Detection of pinholes in steel slabs using Gabor filter combination and morphological features. ISIJ Int 2017;57:1045–53. https://doi.org/10.2355/isijinternational.ISIJINT-2016-160.
[54] Sopori B, Ulsh M, Rupnowski P. Optical techniques for monitoring continuous manufacturing of proton exchange membrane fuel cell components. US20130226330A1; 2013. https://patents.google.com/patent/US20130226330A1/en. [Accessed 16 May 2022].
[55] Rupnowski P, Ulsh M, Sopori B. High throughput and high resolution in-line monitoring of PEMFC materials by means of visible light diffuse reflectance imaging and computer vision. American Society of Mechanical Engineers Digital Collection; 2015. https://doi.org/10.1115/FUELCELL201549212.
[56] johnson_jay_t_200912_mast.pdf. n.d. https://smartech.gatech.edu/bitstream/handle/1853/37234/johnson_jay_t_200912_mast.pdf. [Accessed 11 May 2022].
[57] Tolba AS, Raafat HM. Multiscale image quality measures for defect detection in thin films. Int J Adv Manuf Technol 2015;79:113–22. https://doi.org/10.1007/s00170-014-6758-7.
[58] Wang Y, Seo B, Wang B, Zamel N, Jiao K, Adroher XC. Fundamentals, materials, and machine learning of polymer electrolyte membrane fuel cell technology. Energy AI 2020;1:100014. https://doi.org/10.1016/j.egyai.2020.100014.
[59] Xie J, Wang C, Zhu W, Yuan H. A multi-stage fault diagnosis method for proton exchange membrane fuel cell based on support vector machine with binary tree. Energies 2021;14:6526. https://doi.org/10.3390/en14206526.
[60] Zhou S, Shearing PR, Brett DJL, Jervis R. Machine learning as an online diagnostic tool for proton exchange membrane fuel cells. Curr Opin Electrochem 2022;31:100867. https://doi.org/10.1016/j.coelec.2021.100867.
[61] Xing Y, Wang B, Gong Z, Hou Z, Xi F, Mou G, Du Q, Gao F, Jiao K.
Data-driven fault diagnosis for PEM fuel cell system using sensor pre-selection method and artificial neural network model. IEEE Trans Energy Convers 2022:1–1. https://doi.org/10.1109/TEC.2022.3143163.
[62] Park JY, Lim IS, Choi EJ, Kim MS. Fault diagnosis of thermal management system in a polymer electrolyte membrane fuel cell. Energy 2021;214:119062. https://doi.org/10.1016/j.energy.2020.119062.
[63] Liu J, Li Q, Chen W, Yan Y, Wang X. A fast fault diagnosis method of the PEMFC system based on extreme learning machine and Dempster-Shafer evidence theory. IEEE Trans Transp Electrification 2019;5:271–84. https://doi.org/10.1109/TTE.2018.2886153.
[64] Zhou S, Lu Y, Bao D, Wang K, Shan J, Hou Z. Real-time data-driven fault diagnosis of proton exchange membrane fuel cell system based on binary encoding convolutional neural network. Int J Hydrogen Energy 2022;47:10976–89. https://doi.org/10.1016/j.ijhydene.2022.01.145.
[65] Wang B, Xie B, Xuan J, Jiao K. AI-based optimization of PEM fuel cell catalyst layers for maximum power density via data-driven surrogate modeling. Energy Convers Manag 2020;205:112460. https://doi.org/10.1016/j.enconman.2019.112460.
[66] Fan W, Xu B, Li H, Lu G, Liu Z. A novel surrogate model for channel geometry optimization of PEM fuel cell based on Bagging-SVM Ensemble Regression. Int J Hydrogen Energy 2022;47:14971–82. https://doi.org/10.1016/j.ijhydene.2022.02.239.
[67] Briceno-Mena LA, Venugopalan G, Romagnoli JA, Arges CG. Machine learning for guiding high-temperature PEM fuel cells with greater power density. Patterns 2021;2:100187. https://doi.org/10.1016/j.patter.2020.100187.
[68] Morán-Durán A, Martínez-Sibaja A, Rodríguez-Jarquin JP, Posada-Gómez R, González OS. PEM fuel cell voltage neural control based on hydrogen pressure regulation. Processes 2019;7:434. https://doi.org/10.3390/pr7070434.
[69] Derbeli M, Napole C, Barambones O.
Machine learning approach for modeling and control of a commercial Heliocentris FC50 PEM fuel cell system. Mathematics 2021;9:2068. https://doi.org/10.3390/math9172068.
[70] Wang J, Ding R, Cao F, Li J, Dong H, Shi T, Xing L, Liu J. Comparison of state-of-the-art machine learning algorithms and data-driven optimization methods for mitigating nitrogen crossover in PEM fuel cells. Chem Eng J 2022;442:136064. https://doi.org/10.1016/j.cej.2022.136064.
[71] Wei Z, Osman A, Gross D, Netzelmann U. Artificial intelligence for defect detection in infrared images of solid oxide fuel cells. Infrared Phys Technol 2021;119:103815. https://doi.org/10.1016/j.infrared.2021.103815.
[72] Si B, Yasengjiang M, Wu H. Deep learning-based defect detection for hot-rolled strip steel. J Phys Conf Ser 2022;2246:012073. https://doi.org/10.1088/1742-6596/2246/1/012073.
[73] Czimmermann T, Ciuti G, Milazzo M, Chiurazzi M, Roccella S, Oddo CM, Dario P. Visual-based defect detection and classification approaches for industrial applications: a survey. Sensors 2020;20:1459. https://doi.org/10.3390/s20051459.
[74] Wang T, Chen Y, Qiao M, Snoussi H. A fast and robust convolutional neural network-based defect detection model in product quality control. Int J Adv Manuf Technol 2018;94:3465–71. https://doi.org/10.1007/s00170-017-0882-0.
[75] Fang X, Luo Q, Zhou B, Li C, Tian L. Research progress of automated visual surface defect detection for industrial metal planar materials. Sensors 2020;20:5136. https://doi.org/10.3390/s20185136.
[76] Shahrabadi S, Castilla Y, Guevara M, Magalhães LG, Gonzalez D, Adão T. Defect detection in the textile industry using image-based machine learning methods: a brief review. J Phys Conf Ser 2022;2224:012010. https://doi.org/10.1088/1742-6596/2224/1/012010.
[77] Westphal E, Seitz H. A machine learning method for defect detection and visualization in selective laser sintering based on convolutional neural networks. Addit Manuf 2021;41:101965.
https://doi.org/10.1016/j.addma.2021.101965.
[78] Bhatt PM, Malhan RK, Rajendran P, Shah BC, Thakar S, Yoon YJ, Gupta SK. Image-based surface defect detection using deep learning: a review. J Comput Inf Sci Eng 2021;21. https://doi.org/10.1115/1.4049535.
[79] Defard T, Setkov A, Loesch A, Audigier R. PaDiM: a patch distribution modeling framework for anomaly detection and localization. arXiv, 2020. https://doi.org/10.48550/arXiv.2011.08785.
[80] Bergmann P, Batzner K, Fauser M, Sattlegger D, Steger C. The MVTec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection. Int J Comput Vis 2021;129:1038–59. https://doi.org/10.1007/s11263-020-01400-4.
[81] Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. arXiv, 2016. https://doi.org/10.48550/arXiv.1506.01497.
[82] Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. arXiv, 2016. https://doi.org/10.48550/arXiv.1506.02640.
[83] Tan M, Le QV. EfficientNet: rethinking model scaling for convolutional neural networks. arXiv, 2020. https://doi.org/10.48550/arXiv.1905.11946.
[84] yyj. PaDiM-EfficientNet. https://github.com/youngjaeavikus/PaDiM-EfficientNet. [Accessed 9 January 2023].
[85] Bergmann P, Fauser M, Sattlegger D, Steger C. Uninformed students: student-teacher anomaly detection with discriminative latent embeddings. In: 2020 IEEE/CVF Conf Comput Vis Pattern Recognit (CVPR); 2020. p. 4182–91. https://doi.org/10.1109/CVPR42600.2020.00424.
[86] Shi Y, Yang J, Qi Z. Unsupervised anomaly segmentation via deep feature reconstruction. Neurocomputing 2021;424:9–22. https://doi.org/10.1016/j.neucom.2020.11.018.
[87] Lipton ZC, Elkan C, Narayanaswamy B. Thresholding classifiers to maximize F1 score. arXiv, 2014. https://doi.org/10.48550/arXiv.1402.1892.
[88] Wang X, Huang TE, Darrell T, Gonzalez JE, Yu F. Frustratingly simple few-shot object detection. arXiv, 2020. https://doi.org/10.48550/arXiv.2003.06957.
[89] Nguyen N-D, Do T, Ngo TD, Le D-D. An evaluation of deep learning methods for small object detection. J Electr Comput Eng 2020;2020:e3189691. https://doi.org/10.1155/2020/3189691.
[90] Cartucho J, Ventura R, Veloso M. Robust object recognition through symbiotic deep learning in mobile robots. In: 2018 IEEE/RSJ Int Conf Intell Robots Syst (IROS); 2018. p. 2336–41. https://doi.org/10.1109/IROS.2018.8594067.
[91] Yu J, Zheng Y, Wang X, Li W, Wu Y, Zhao R, Wu L. FastFlow: unsupervised anomaly detection and localization via 2D normalizing flows. arXiv, 2021. https://doi.org/10.48550/arXiv.2111.07677.
[92] Gudovskiy D, Ishizaka S, Kozuka K. CFLOW-AD: real-time unsupervised anomaly detection with localization via conditional normalizing flows. arXiv, 2021. https://doi.org/10.48550/arXiv.2107.12571.
[93] Wang C-Y, Bochkovskiy A, Liao H-YM. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv, 2022. https://doi.org/10.48550/arXiv.2207.02696.
 