A deep learning-based method for marine oil spill detection and its application in UAV imagery

  1. Introduction
    With the intensifying global exploitation of marine resources, frequent oil spill incidents during offshore extraction, transportation, and storage have emerged as a critical environmental threat, jeopardizing both marine ecosystem security and coastal socio-economic stability (Dong et al., 2022; Zhou et al., 2018; Taleghani and Tyagi, 2017; Fingas, 2016). Oil dispersion in marine environments not only leads to large-scale surface slicks, which inhibit photosynthesis and reduce dissolved oxygen levels, but also causes toxic bioaccumulation that affects various biological communities, including plankton, fish, and seabirds. These impacts can ultimately result in severe ecological damage and pose risks to human health through transmission along the food chain (Sun et al., 2021; Ainsworth et al., 2021; Forsgren et al., 2009; Banks, 2003).
In real-world emergency response scenarios, rapid detection and precise localization in the early stages of oil spills are crucial for containing pollution spread, assessing environmental impact, and deploying timely remediation measures (Wu et al., 2025; Yang et al., 2022; Fingas, 2016). However, traditional oil spill monitoring methods rely primarily on satellite remote sensing, airborne infrared scanning, and manual inspections, which often perform poorly in detecting weak targets under complex environmental conditions (Zhuang et al., 2024). These approaches also suffer from long latency, inadequate spatial resolution, and strong sensitivity to meteorological and marine conditions, which render them ineffective for meeting the stringent requirements of high-precision and time-sensitive detection in emergent spill incidents (Valavan et al., 2025; Qiu et al., 2024; Mishra et al., 2023; Jafarzadeh et al., 2021).
In recent years, advancements in remote sensing imaging and unmanned aerial vehicle (UAV) technologies have enabled image-based oil spill detection to emerge as a key method in emergency monitoring (Temitope Yekeen and Balogun, 2020; Al-Ruzouq et al., 2020; Fingas and Brown, 2014). However, conventional image processing approaches typically rely on manually crafted features such as texture, color, or edge indicators, including the Gray-Level Co-occurrence Matrix (GLCM), edge gradient operators, and regional threshold segmentation. These methods often show poor stability under complex marine conditions such as sun reflection, wave fluctuations, and cloud shadows (Li et al., 2022; Cai et al., 2022; Temitope Yekeen and Balogun, 2020; Zhao et al., 2014). While some machine learning-based approaches have introduced classifiers such as Support Vector Machines (SVM) and Random Forests (RF), these methods often rely heavily on manual feature engineering, demonstrate limited generalization ability, and show low processing efficiency when applied to large-scale image datasets (Zhou et al., 2023; Genovez et al., 2023; Seydi et al., 2021; Al-Ruzouq et al., 2020). In contrast, deep learning techniques, especially end-to-end visual models based on Convolutional Neural Networks (CNNs), have demonstrated superior performance in semantic image understanding, object detection, and spatial feature extraction (Lei et al., 2024). Thus, these models have been widely adopted in various remote sensing tasks, including urban pollution monitoring, water eutrophication assessment, and coastline extraction (Z. Zhang et al., 2025; Weng et al., 2025; Xing et al., 2024; Rabie et al., 2024). Building upon this foundation, object detection networks, which simultaneously perform object localization and classification, offer a promising approach for the automated identification of marine oil slicks (Jiang et al., 2024). Among them, the YOLO (You Only Look Once) model series has become a core component in the construction of real-time maritime monitoring systems due to its lightweight architecture, rapid inference speed, and robust end-to-end optimization capability (X. Wang et al., 2025; Wanga et al., 2025; Mao and Hong, 2025; Chen et al., 2023; Jiang et al., 2022).
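Before turning to the YOLO family in detail, the sketch below makes the hand-crafted baseline described above concrete: GLCM texture features fed to an SVM classifier. It is a minimal illustration only; the patches, labels, and feature choices are hypothetical placeholders rather than the setup of any cited study.

```python
# Minimal sketch of a classical oil-slick classifier: GLCM texture
# features (scikit-image) fed to an SVM (scikit-learn). All data here
# are random placeholders standing in for labeled sea-surface patches.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(patch: np.ndarray) -> np.ndarray:
    """Contrast/homogeneity/energy statistics of an 8-bit grayscale patch."""
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "energy")])

rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(40, 32, 32), dtype=np.uint8)  # toy patches
labels = rng.integers(0, 2, size=40)            # 0 = clean water, 1 = oil slick

X = np.stack([glcm_features(p) for p in patches])
clf = SVC(kernel="rbf").fit(X, labels)          # manual features + shallow classifier
print(clf.predict(X[:5]))
```

Every stage of such a pipeline (feature design, classifier choice, thresholds) must be re-tuned by hand for new sea states, which is precisely the brittleness that motivates the end-to-end CNN detectors discussed next.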
As an end-to-end real-time object detection framework, the YOLO (You Only Look Once) series has undergone multiple iterations since the introduction of YOLOv1, achieving a favorable trade-off between detection speed and accuracy (Wang and Liao, 2024; Huang et al., 2017). YOLOv4 introduced the CSPDarknet backbone and the Mish activation function, enhancing deep feature extraction, but it still faced limitations in boundary precision and attention modeling under noisy conditions (Alhassan and Yılmaz, 2025). YOLOv5 adopted a lightweight modular design to reduce inference time, making it well suited for deployment on edge devices, though it lacked integrated attention mechanisms and fine-grained spatial modeling (Jani et al., 2023). Subsequent versions, from YOLOv8 to YOLOv11, introduced partial attention modules and anchor-free mechanisms, improving small object detection but still falling short in tasks involving blurred boundaries or texture-sparse targets, which are common in marine oil spill imagery (Nikouei et al., 2025; Mao and Hong, 2025; Sapkota et al., 2024; Hussain, 2024). The latest YOLOv12 architecture incorporates Dynamic Feature Aggregation and an Edge-aware Attention Module, and leverages Neural Architecture Search (NAS) to optimize the model structure (Guo et al., 2025). These innovations enable further compression of model size and acceleration of inference speed without compromising detection accuracy, particularly benefiting deployment on computationally constrained platforms such as UAVs and embedded systems (Zheng et al., 2025; Khanam and Hussain, 2025; Sapkota et al., 2024). Although some studies have recently applied YOLO models to marine oil spill image recognition with encouraging preliminary results, most of them focus on static target detection in remote sensing images and have not fully addressed the challenges of real-time deployment and robustness under complex background conditions (Wanga et al., 2025; Cai et al., 2024; Xu et al., 2023). At present, there is still a lack of deep learning detection frameworks tailored for real-world marine monitoring that effectively balance robustness, edge deployment compatibility, and real-time processing capabilities (Hu et al., 2024). Therefore, there is an urgent need to leverage the architectural advantages of YOLOv12 to develop a customized framework for marine oil spill detection that can provide real-time and precise monitoring of polluted areas under complex sea conditions (Alazeb et al., 2024).
In response to the above research gaps and technical demands, this study aims to develop a marine oil spill detection method that is capable of achieving high-precision and real-time performance in complex marine environments, while also exploring its practical applicability on UAV platforms (Chen et al., 2025). Therefore, a customized detection framework is proposed based on an improved YOLOv12 architecture. To address challenges such as significant target scale variation, fuzzy boundaries, and severe background interference in sea surface oil slicks, the model integrates an edge-aware mechanism and multi-scale feature fusion modules to enhance detection sensitivity and robustness in low-contrast pollution areas (Deng et al., 2025). In terms of data, a UAV image dataset has been constructed to include diverse lighting conditions, wave disturbances, and oil spill morphologies, thereby enhancing the model's adaptability to real-world marine scenarios (Huang and Li, 2023). In terms of deployment, lightweight model optimization and adaptive anchor mechanisms have been incorporated to reduce computational overhead and enable embedded deployment on resource-constrained platforms, including marine observation units and patrol UAVs (Li et al., 2024). A series of quantitative experiments have been conducted across multiple real-world marine image scenarios to systematically evaluate the proposed method with respect to detection accuracy, inference speed, and operational stability (Y. Wang et al., 2025). The results are expected to provide technical support for intelligent monitoring of marine oil spills and contribute theoretical and engineering insights for the cross-scenario application of deep learning models in marine pollution detection. To better illustrate the workflow, Fig. 1 provides a graphical summary of the proposed framework, which includes dataset construction, annotation, training, evaluation, test-set validation, and scenario validation under dynamic and multi-scale conditions, ultimately leading to potential applications and engineering deployment in real-world marine environments.
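As a rough illustration of the train-evaluate-deploy loop summarized in Fig. 1, the snippet below fine-tunes a YOLOv12 checkpoint with the open-source ultralytics API and exports it for embedded inference. It is a sketch under assumptions: the dataset file oil_spill.yaml and all hyperparameters are hypothetical placeholders, not the configuration actually used in this study.

```python
# Hypothetical fine-tuning sketch with the ultralytics API.
# "oil_spill.yaml" would list the UAV image paths and class names.
from ultralytics import YOLO

model = YOLO("yolo12n.pt")                 # nano scale suits UAV-class hardware
model.train(data="oil_spill.yaml",         # placeholder dataset config
            epochs=100, imgsz=640, batch=16)
metrics = model.val()                      # precision / recall / mAP on the val split
model.export(format="onnx")                # ONNX export for embedded deployment
```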

  2. YOLOv12 model
    2.1. Overview and recent advances in YOLO algorithms
    YOLOv12 (You Only Look Once version 12), released in 2025, represents the latest generation of real-time object detection models, offering systematic improvements targeting the challenges of small object recognition, feature entanglement, and processing efficiency in complex environments. Building upon YOLO's hallmark efficiency-oriented architecture, YOLOv12 integrates several critical optimization strategies that significantly enhance its robustness and computational performance in cluttered backgrounds (Tian et al., 2025; Hussain, 2024). Key enhancements of YOLOv12 include the Area Attention mechanism, the Residual Efficient Layer Aggregation Network (R-ELAN), large-kernel separable convolution, position-aware modules, and the latest FlashAttention acceleration algorithm. Specifically, the Area Attention mechanism enables a broad receptive field while minimizing redundant computation; the R-ELAN architecture improves the depth and stability of feature extraction; and FlashAttention effectively reduces memory access bottlenecks, thereby accelerating inference on edge devices. The YOLOv12 architecture supports five model scales (N, S, M, L, X), allowing flexible deployment tailored to platform performance constraints. This makes it particularly suitable for computationally constrained platforms such as unmanned aerial vehicles (UAVs) and edge-sensing terminals (Tian et al., 2025; Ma et al., 2025).
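For intuition, the following is a minimal PyTorch sketch of the area attention idea: the feature map is split into a few horizontal strips and attention runs only within each strip, cutting cost relative to global attention. The module name, dimensions, and strip-wise partition are illustrative assumptions, not YOLOv12's exact implementation.

```python
# Illustrative area attention: strip-wise self-attention over a feature map.
import torch
import torch.nn.functional as F
from torch import nn

class AreaAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, num_areas: int = 4):
        super().__init__()
        self.num_heads, self.num_areas = num_heads, num_areas
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, H, W, C)
        B, H, W, C = x.shape
        a = self.num_areas                                 # H must be divisible by a
        x = x.view(B * a, (H // a) * W, C)                 # split H into strips
        qkv = self.qkv(x).view(B * a, -1, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)     # each: (B*a, heads, N, d)
        out = F.scaled_dot_product_attention(q, k, v)      # fused/Flash kernels when available
        out = out.transpose(1, 2).reshape(B * a, -1, C)
        return self.proj(out).view(B, H, W, C)

feat = torch.randn(1, 32, 32, 64)                          # toy feature map
print(AreaAttention(64)(feat).shape)                       # torch.Size([1, 32, 32, 64])
```

Because each token attends only within its strip, attention cost drops by roughly a factor of the number of areas, while each strip still spans the full image width; this is one way a broad receptive field can be retained at reduced cost.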
    From a performance perspective, YOLOv12 demonstrates a leading advantage on the MS COCO dataset, a large-scale and widely adopted benchmark for object detection. It contains over 200,000 labeled images covering 80 object categories and is designed to support standardized evaluation of models in terms of both detection accuracy and computational efficiency; more information is available at https://cocodataset.org. As shown in Fig. 2, a comparative analysis of detection performance under varying latency and computational complexity is conducted among YOLOv12, YOLOv11, RT-DETR, and other model series. The results indicate that YOLOv12 consistently achieves higher mean Average Precision (mAP) within low-latency ranges (e.g., 1.6–6.7 ms) and with comparable FLOPs, showing particularly notable gains in lightweight configurations. For instance, YOLOv12-N achieves a 40.6 % mAP on the T4 platform, outperforming YOLOv10-N and YOLOv11-N by 2.1 % and 1.2 %, respectively. In addition, YOLOv12 natively supports joint modeling of object detection and instance segmentation tasks, providing a complete perceptual information stream that enhances both object recognition and spatial localization in marine oil spill imagery.
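Per-image latency figures of this kind can be approximated with a simple timing loop around the detector, as sketched below; the checkpoint and test image names are placeholders, and absolute numbers depend on the hardware (the figures above were reported on a T4 GPU).

```python
# Rough latency measurement sketch with the ultralytics API.
import time
from ultralytics import YOLO

model = YOLO("yolo12n.pt")                 # placeholder checkpoint
model("test.jpg")                          # warm-up run (kernel/compile overheads)

t0 = time.perf_counter()
for _ in range(100):
    model("test.jpg", verbose=False)
elapsed = time.perf_counter() - t0
print(f"mean latency: {elapsed / 100 * 1e3:.2f} ms")
```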
    Overall, YOLOv12 achieves significant advancements in detection accuracy, model compactness, and multi-scale adaptability, offering both theoretical and practical support for intelligent detection tasks involving highly dynamic, small-object, and complex-background scenarios in marine environments. It is particularly well suited for deployment on the low-power, embedded unmanned platforms required for offshore oil spill monitoring.