VN September 2025

[2016], which uses the known human body temperature to identify people in thermal images. This technique is not, however, used within an active learning algorithm. While De Oliveira and Wehrmeister [2016] knew the human body temperature with reasonably high precision, we face the challenge of not knowing the temperature of rhino middens. We do know, however, that they are often warmer than the surrounding ground and vegetation. Thus, in our active learning system, we prioritise for labelling those images that have higher pixel values in the thermal band. With this method, the system learns more quickly to distinguish between middens and non-middens.

Setup

Data

The remote sensing data used for this project was collected by a DJI M600 multicopter flown over a 284-hectare site in Kruger National Park in January 2020. This drone was equipped with an animal landscape observatory sensor package, consisting of a FLIR Tau-2 thermal camera, a Sony A6000 camera, and a Riegl VUX-1LR LiDAR scanner, which simultaneously collected thermal, RGB, and LiDAR data, respectively, throughout the drone's flight. The thermal imagery was rectified and mosaicked at a resolution of 0.5 m, the RGB imagery at 0.05 m, and the LiDAR imagery at 0.25 m, yielding the orthomosaics shown in Figure 3.

The ecologists on our team identified candidate middens in the thermal and RGB orthomosaics and confirmed their presence on the ground, yielding a list of the x and y coordinates of the centres of 52 rhino middens. We mapped these middens onto the orthomosaics and then cropped the mosaics using a window of 20 m (40 pixels for thermal, 400 for RGB, and 80 for LiDAR) and a stride of 5 m (10 pixels for thermal, 100 for RGB, and 20 for LiDAR). Each cropped image was assigned a label of 1 if it contained the centre of a midden and 0 otherwise.
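As a concrete illustration, the sliding-window cropping and labelling for a single modality might look like the sketch below. The function and variable names are ours, not from the project; the 40-pixel window and 10-pixel stride correspond to the thermal band.

```python
import numpy as np

def crop_and_label(mosaic, centres, window=40, stride=10):
    """Slide a window over a single-band orthomosaic, labelling each
    crop 1 if it contains a midden centre and 0 otherwise."""
    crops, labels = [], []
    h, w = mosaic.shape
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            crop = mosaic[y:y + window, x:x + window]
            # A crop is positive if any midden centre falls inside it.
            has_midden = any(
                y <= cy < y + window and x <= cx < x + window
                for cx, cy in centres
            )
            crops.append(crop)
            labels.append(1 if has_midden else 0)
    return np.stack(crops), np.array(labels)
```

Because the stride is a quarter of the window size, a single midden centre can fall inside several overlapping crops, which is why 52 middens yield more than 52 positive images.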
We downshifted the pixel values of each thermal image so that the cropped thermal images all had a minimum of 0, enabling a meaningful comparison among them. After removing images with all zeros in either the thermal or RGB bands, we were left with 89 images with middens and 9,683 empty images, giving a dataset of 9,772 images (in each modality), 0.91% of which contain a midden. We fused the images using the blend function in the Python Imaging Library (PIL), yielding fusions of thermal and RGB, thermal and LiDAR, RGB and LiDAR, and thermal, RGB, and LiDAR images, with each data modality weighted equally.

Model

We employ transfer learning with a VGG16 model pretrained on the ImageNet dataset [Simonyan and Zisserman, 2015]. We freeze all the parameters in the model except for those in the classifier. We alter the final linear layer to have a single output feature and then end with a sigmoid function, so that the output of the model represents the probability that an image contains a midden. For all our models, we use a batch size of 10, the Binary Cross Entropy loss function in PyTorch, and an Adam optimiser with a learning rate of 0.0001.

Active Learning Methodology

MultimodAL Algorithm

Active learning aims to reduce the number of instances that need to be labelled to train a model by requesting labels for those that are most useful for its learning. The general procedure works as follows: (i) a small batch of labelled instances is used to begin training a model; (ii) the model then uses some criterion to select the next batch of instances to be labelled, typically those about which the model is least certain [Lewis and Gale, 1994]; (iii) this process continues until a labelling budget is reached; and (iv) the trained model can then be used for inference on the remaining unlabelled instances. Problematically, however, traditional active learning approaches can perform poorly in the presence of severe class imbalance.
To address this challenge, our active learning algorithm, MultimodAL, is designed to detect as many of the rare positive samples as possible in each round. To achieve this, rather than have the model predict on the entire training dataset as is done in many typical active learning systems, we propose constraining the set of instances on which the model predicts through a novel technique that exploits some characteristic of the object of interest that can be used for ranking. Furthermore, we propose a dynamic method for combining the outputs of several models trained on different data modalities to further speed up learning.

We first describe our method assuming a single data modality, diagrammed in Figure 4. We assume each instance in the dataset can be assigned a value corresponding to a metric (e.g. temperature, colour, etc.) that is associated with the desired rare signal of interest. We then rank all the instances by the distance between their value of the informative metric and a specified target value (e.g. human body temperature, colour of grass, etc.) characteristic of the desired signal (top left box of Figure 4).

Once the images are ranked, we select a subset to be labelled. Let b be the size of a batch of images selected by the active learning system for labelling. (1) To produce each of the batches, we first compute the output of the model on the sample with the highest ranking out of those remaining unlabelled. To reflect the uncertainty captured in the model's output, we do not simply assign the instance the highest probability class. Instead, we classify it by randomly sampling a class according to the model's output (i.e., the output specifies the parameters of a multinomial distribution). If the instance is ultimately predicted to be a positive sample, we add it to the batch. We then feed the sample with the next highest ranking to the model and continue this process until we have a batch with b samples predicted by the model to be positive.
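The ranking and stochastic batch-selection steps just described can be sketched as follows. The function names are ours, and `predict_proba` stands in for the model's sigmoid output on a single instance; in the binary case, sampling from the model's output reduces to a Bernoulli draw.

```python
import numpy as np

def rank_by_metric(values, target):
    """Rank instance indices by how close their informative metric
    (e.g. peak thermal pixel value) lies to a target value."""
    values = np.asarray(values, dtype=float)
    return np.argsort(np.abs(values - target))

def select_batch(ranked, predict_proba, b, rng):
    """Walk down the ranking; classify each instance by sampling a
    class with probability given by the model's output, and keep it
    if the sampled class is positive. Stop at b predicted positives."""
    batch = []
    for idx in ranked:
        # Bernoulli draw from the model's predicted probability.
        if rng.random() < predict_proba(idx):
            batch.append(idx)
        if len(batch) == b:
            break
    return batch
```

Because instances closest to the target value are visited first, the batch is biased towards high-ranking images even before the model has learned much.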
In this way, we bias the model towards selecting high-ranking images that we know are more likely to be positive instances. (2) Next, the batch is sent to the labeller. (3) The batch is labelled by the annotator and then (4) added to the set of instances queried so far. Because positive samples may be so rare, batches can be imbalanced toward the negative class(es). (5) If all the instances queried so far are negative, then we select all of them for training. If any are positive, we take all of those for training and randomly select an equivalent number of negative instances to get a balanced training set. At this point, the model weights are reset to their initial values to prevent overfitting on a small labelled dataset, and the model is then trained on the selected instances.

Figure 4: Active learning cycle where the images are ranked by their brightness. (1) Predict on highest-ranked images. (2) Query images predicted to be positive. (3) Assign labels to queried images. (4) Add the newly labelled images to the set of all labelled images. (5) Train the model on a selection of the labelled images. (6) Restart.
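Step (5) of the cycle, building a balanced training set from the labelled pool, might look like the following sketch (the helper name is ours):

```python
import random

def build_training_set(labelled, rng=random):
    """If any queried instances are positive, keep all of them and
    subsample an equal number of negatives; otherwise train on
    everything labelled so far."""
    positives = [(x, y) for x, y in labelled if y == 1]
    negatives = [(x, y) for x, y in labelled if y == 0]
    if not positives:
        return negatives
    k = min(len(positives), len(negatives))
    return positives + rng.sample(negatives, k)
```

After this selection, the model's weights would be restored from a saved copy of their initial values (e.g. a `state_dict` snapshot in PyTorch) before retraining, so that each round starts from the same initialisation rather than compounding overfitting on the small labelled set.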
