The rapid development of Ultraviolet Light-Induced Fluorescence (UV-LIF) instruments for atmospheric sampling, such as the Wideband Integrated Bioaerosol Sensor (WIBS), has enabled the collection of increasingly large datasets. It is therefore imperative to evaluate how well different algorithms differentiate between bacteria, fungal spores and pollen across a wide range of aerobiology applications.
We apply Hierarchical Cluster Analysis (HCA), an unsupervised method, and Gradient Boosting (GB), a supervised method, to two randomly generated datasets: one consisting of five types of polystyrene latex spheres (PSLs) and one containing a variety of laboratory-generated aerosols.
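To make the unsupervised step concrete, the following is a naive sketch of agglomerative HCA, assuming Ward linkage on Euclidean feature vectors (for example, per-particle fluorescence intensities and size). The linkage, the feature representation and the function name `ward_hca` are assumptions for illustration, not the configuration used in this study. At every step the two clusters whose merger least increases the within-cluster sum of squares, n_A n_B / (n_A + n_B) * ||c_A - c_B||^2, are joined, until the requested number of clusters remains.

```python
from itertools import combinations


def _centroid(points):
    """Mean vector of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_hca(data, n_clusters):
    """Naive Ward agglomerative clustering (hypothetical helper).

    Returns one integer cluster label per row of `data`.
    """
    if not 1 <= n_clusters <= len(data):
        raise ValueError("n_clusters must be between 1 and len(data)")
    # Start with every particle in its own cluster (lists of row indices).
    clusters = [[i] for i in range(len(data))]
    centroids = [list(row) for row in data]
    while len(clusters) > n_clusters:
        # Ward's criterion: find the pair whose union increases the
        # within-cluster sum of squares the least.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            na, nb = len(clusters[a]), len(clusters[b])
            cost = na * nb / (na + nb) * _sq_dist(centroids[a], centroids[b])
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        # combinations() guarantees a < b, so deleting b keeps index a valid.
        del clusters[b], centroids[b]
        clusters[a] = merged
        centroids[a] = _centroid([data[i] for i in merged])
    labels = [0] * len(data)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]]
print(ward_hca(data, 2))  # → [0, 0, 0, 1, 1]
```

The exhaustive pair search is O(n^3) and only suits toy inputs; for datasets of WIBS size an optimised routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would be the practical choice.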
We highlight a possible limitation of the Calinski-Harabasz (CH) index for selecting the number of clusters. When the clusters differ substantially in size, the index frequently selects the wrong number of clusters. For bioaerosol, where bacterial concentrations may be expected to be much larger than those of fungal spores and pollen, this is an important limitation that must be circumvented in future analyses.
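For reference, the CH index of a partition of n points into k clusters is CH = [B / (k - 1)] / [W / (n - k)], where B = sum_j n_j ||c_j - c||^2 is the between-cluster dispersion (c_j the centroid and n_j the size of cluster j, c the overall mean) and W is the total within-cluster sum of squares; the candidate k with the largest CH is selected. Because each centroid's contribution to B is weighted by its cluster size n_j, a small cluster adds comparatively little to B, which is one plausible way strong size imbalance could mislead the index. The sketch below is pure Python; the function name and the toy data are hypothetical.

```python
def calinski_harabasz(data, labels):
    """CH = [B / (k - 1)] / [W / (n - k)] for one flat clustering."""
    n, k = len(data), len(set(labels))
    if not 1 < k < n:
        raise ValueError("CH is defined only for 1 < k < n clusters")
    dim = len(data[0])
    overall = [sum(row[d] for row in data) / n for d in range(dim)]
    groups = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row)
    between = within = 0.0
    for members in groups.values():
        centroid = [sum(r[d] for r in members) / len(members) for d in range(dim)]
        # B: squared centroid-to-overall-mean distance, weighted by cluster size.
        between += len(members) * sum((c - m) ** 2 for c, m in zip(centroid, overall))
        # W: squared distances of members to their own centroid.
        within += sum(sum((x - c) ** 2 for x, c in zip(r, centroid)) for r in members)
    if within == 0.0:
        # Perfectly tight clusters: treat the ratio as unbounded.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Select k by maximising CH over candidate partitions (e.g. HCA cuts).
data = [[0.0], [1.0], [10.0], [11.0]]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
scores = {k: calinski_harabasz(data, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)  # → {2: 200.0, 3: 100.5} 2
```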
We also find that some samples, such as common aspen and poplar pollen, could not be effectively differentiated by GB from others, such as puffball samples. It remains unclear whether this applies to other pollen samples. Further data, preferably at higher spectral resolution, will be needed to determine whether this issue persists across a wider variety of samples.
Overall, GB achieves correct classification rates of 99.0% and 94.4% for the PSL and laboratory-generated aerosol datasets, respectively.
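The supervised step and the reported percentages can be illustrated with a minimal sketch of binary gradient boosting, assuming logistic loss and depth-one regression trees (stumps) fitted to the negative gradient y - sigmoid(F). This is a simplified two-class stand-in: the study's multi-class set-up and hyperparameters are not given here, so `n_rounds`, `learning_rate` and all helper names are hypothetical. Classification rate is taken to mean accuracy, the fraction of particles whose predicted label matches the true one.

```python
import math


def _fit_stump(X, r):
    """Least-squares regression stump: (feature, threshold, left, right)."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # every feature is constant: single leaf
        m = sum(r) / len(r)
        return (0, math.inf, m, m)
    return best[1:]


def _stump_predict(stump, row):
    f, t, lm, rm = stump
    return lm if row[f] <= t else rm


def fit_gb(X, y, n_rounds=50, learning_rate=0.1):
    """Binary gradient boosting; y must contain both classes 0 and 1."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # initial log-odds
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to F.
        r = [yi - 1 / (1 + math.exp(-Fi)) for yi, Fi in zip(y, F)]
        stump = _fit_stump(X, r)
        stumps.append(stump)
        F = [Fi + learning_rate * _stump_predict(stump, row) for Fi, row in zip(F, X)]
    return f0, learning_rate, stumps


def predict_gb(model, row):
    f0, lr, stumps = model
    score = f0 + lr * sum(_stump_predict(s, row) for s in stumps)
    return 1 if score > 0 else 0


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


X = [[0.1], [0.2], [0.3], [0.9], [1.0], [1.1]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gb(X, y)
preds = [predict_gb(model, row) for row in X]
print(accuracy(y, preds))  # → 1.0
```

Accuracy on training data, as here, overstates performance; a held-out test split would be needed for figures comparable to those above. For multi-class problems, scikit-learn's `GradientBoostingClassifier` is one established implementation.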