Meduseo's methodology is designed to identify the micro-climatic environmental parameters associated with strictly local jellyfish occurrences. This documentation outlines the data extraction and statistical pipelines behind the Meduseo web dashboard.
Phase 1 Highly Localized Data Retrieval
NetCDF Spatial Slicing
Unlike traditional models that average environmental data across massive regional polygons, Meduseo utilizes dynamically sliced raw NetCDF files accessed natively through xarray.
For every valid location, the pipeline takes the GPS coordinates and builds a bounding box that extends 0.15° in each direction. This limits the environmental assessment to roughly 15 km around the specific coastline (0.15° of latitude is about 16.7 km, and the east-west extent shrinks with latitude), which captures bay-specific micro-climates.
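As a rough illustration of the slicing step, the sketch below builds the 0.15° box and estimates its size in kilometres. The function names, the example coordinates, and the coordinate names in the commented xarray call are assumptions made for illustration, not the pipeline's actual code.

```python
import math

def bounding_box(lat, lon, radius_deg=0.15):
    """Return (lat_min, lat_max, lon_min, lon_max) for a square box around a point."""
    return (lat - radius_deg, lat + radius_deg, lon - radius_deg, lon + radius_deg)

def box_half_width_km(lat, radius_deg=0.15):
    """Approximate north-south and east-west half-widths of the box in km."""
    km_per_deg_lat = 111.32  # approximate length of one degree of latitude
    north_south = radius_deg * km_per_deg_lat
    # One degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west

# Hypothetical coastal point, used only as an example.
lat_min, lat_max, lon_min, lon_max = bounding_box(43.30, 5.37)

# With xarray, the subset could then be taken along the lines of:
#   ds.sel(latitude=slice(lat_min, lat_max), longitude=slice(lon_min, lon_max))
# assuming those coordinate names and ascending latitude. For a dataset stored
# with descending latitude, the latitude slice bounds would be swapped.
```

At mid-latitudes the box is noticeably narrower east-west than north-south, so the 15 km figure is best read as an order of magnitude.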
Data Sources
We harness two main pillars of the Copernicus scientific suite:
- ERA5 Atmospheric Reanalysis: Sourced via the CDS API at 0.25° grid resolution, covering Wind, Atmospheric Pressure, Temperature, and Precipitation.
- Copernicus Marine Environment (CMEMS): Sourced natively at 0.083° resolution, extracting physical indicators like SST, Salinity, Current Surface Vector Velocities, and Oceanic Waves.
Phase 2 Statistical Classifications
Target Formulation: What makes a "Jellyfish Day"?
For any localized city, Meduseo users submit sighting reports scaled from 0 (Clear) to 4 (Severe Outbreak).
To safely attribute an environmental profile, the script calculates the daily arithmetic average of all reports tied to that specific GPS location.
- Jellyfish Present: If mean > 1.
- Clear Day: If mean ≤ 1.
- Threshold Requirement: A city is only considered for statistical analysis if it contains a strict minimum of 10 fully verified reporting days during the summer seasons.
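The labelling rule can be sketched as follows. This is a minimal illustration, assuming reports arrive as (date, score) pairs for a single location. The function names are hypothetical, and the summer-season filter is left out because the source does not define its dates.

```python
from collections import defaultdict
from statistics import mean

def classify_days(reports, threshold=1.0):
    """Label each day from (date_str, score 0-4) reports for one location."""
    by_day = defaultdict(list)
    for day, score in reports:
        by_day[day].append(score)
    # A day counts as a jellyfish day only if the daily mean exceeds the threshold.
    return {
        day: "jellyfish" if mean(scores) > threshold else "clear"
        for day, scores in by_day.items()
    }

def is_eligible(labels, min_days=10):
    """A location qualifies only with at least `min_days` labelled reporting days."""
    return len(labels) >= min_days

reports = [
    ("2024-07-01", 0), ("2024-07-01", 2),  # daily mean 1.0 -> clear
    ("2024-07-02", 3), ("2024-07-02", 1),  # daily mean 2.0 -> jellyfish
]
labels = classify_days(reports)
```

Note that a daily mean of exactly 1 falls on the clear side of the rule.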
Phase 3 Rigorous Statistical Testing
Mann-Whitney U
Because weather variables often violate normal distribution assumptions, we use the non-parametric Mann-Whitney U test. The test ranks all values of a parameter together and compares the rank sum of the values observed on clear days against that of the values observed on jellyfish days.
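To make the rank-sum idea concrete, here is a small pure-Python sketch that computes the U statistics for two samples, with tied values sharing an average rank. It is illustrative only. In practice a library routine such as `scipy.stats.mannwhitneyu` would also return the p-value, and whether the pipeline uses that routine is not stated in the source.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(clear, jelly):
    """Return (U_clear, U_jelly) from the pooled ranks of both samples."""
    pooled = list(clear) + list(jelly)
    ranks = rank_with_ties(pooled)
    n1, n2 = len(clear), len(jelly)
    rank_sum_clear = sum(ranks[:n1])
    u_clear = rank_sum_clear - n1 * (n1 + 1) / 2
    u_jelly = n1 * n2 - u_clear
    return u_clear, u_jelly
```

A U value of 0 for one group means every one of its observations ranks below every observation of the other group.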
Cohen's d (Effect Size)
While a p-value indicates whether an observed difference is statistically distinguishable from chance, Cohen's d quantifies the magnitude of that difference. It is calculated as the difference of the group means divided by the pooled standard deviation, which allows us to rank the drivers of jellyfish occurrences.
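A minimal sketch of the calculation is shown below, using the standard pooled-variance form of Cohen's d. The sign convention (jellyfish days minus clear days) and the function name are assumptions.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(jelly, clear):
    """Difference of means divided by the pooled standard deviation."""
    n1, n2 = len(jelly), len(clear)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(jelly) + (n2 - 1) * variance(clear)) / (n1 + n2 - 2)
    return (mean(jelly) - mean(clear)) / sqrt(pooled_var)

# Means 4 and 3, pooled standard deviation 2.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

With this convention a positive d means the variable tends to be higher on jellyfish days.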
FDR Correction
Running many tests without correction inflates the risk of false positives. We handle this by passing all raw p-values through a Benjamini-Hochberg FDR correction. Only metrics satisfying q < 0.05 are tagged as significant drivers.
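The Benjamini-Hochberg adjustment can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard step-up procedure, not the pipeline's own code, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
significant = [qi < 0.05 for qi in q]
```

In this example three of the raw p-values are below 0.05, but only the smallest one survives the correction.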
Phase 4 Computed Variable Lexicon
Every analysis assesses the native daily state alongside computed chronological delays (lags). Because marine physics requires momentum buildup, analyzing 1-day and 2-day temporal lags allows us to chart delayed environmental transport mechanisms.
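One way to compute such lags without reaching across season boundaries is sketched below. The season months (June to September) and the function name are assumptions chosen for illustration, since the source does not state the exact season definition.

```python
from datetime import date, timedelta

def add_lag(daily, lag_days, season_months=(6, 7, 8, 9)):
    """Map each date to the value observed `lag_days` earlier, or None."""
    lagged = {}
    for day in daily:
        source = day - timedelta(days=lag_days)
        # Only accept a lagged value that comes from the same season and year.
        in_season = source.month in season_months and source.year == day.year
        lagged[day] = daily.get(source) if in_season else None
    return lagged

sst = {
    date(2024, 6, 1): 20.0,
    date(2024, 6, 2): 21.0,
    date(2024, 6, 3): 22.0,
}
sst_lag_1d = add_lag(sst, 1)
sst_lag_2d = add_lag(sst, 2)
```

The first days of a season have no lagged value, so they would drop out of any lagged-variable test rather than borrow data from outside the season.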
| Variable Code | Display Name | Unit | Source & Definition |
|---|---|---|---|
| `sst` | Sea Surface Temp | °C | Copernicus Marine daily mean extraction. |
| `msl_mean` | Atmospheric Pressure | hPa | ERA5 reduced to sea level pressure. Indicates general cyclonic activities. |
| `salinity` | Salinity | PSU | Marine surface salt concentration (~5m proxy depth). |
| `current_speed` | Surface Current | m/s | Magnitude derived through vector coordinates. |
| `wave_height` | Wave Height | m | Mean height of the highest 33% of waves (VHM0). |
| `*_lag_1d` / `*_lag_2d` | Temporal Lags | - | Tracking 24-hr and 48-hr historical states safely within season boundaries. |
| `*_direction` | Circular Trajectories | ° | Statistics evaluate Circular Means avoiding standard arithmetic breakage. |
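Two of the derived quantities above lend themselves to a short sketch: the current speed as the magnitude of its eastward and northward components, and the circular mean used for direction variables. The function names and the component names `u` and `v` are assumptions for illustration.

```python
import math

def current_speed(u, v):
    """Magnitude of a current vector from its eastward (u) and northward (v) parts."""
    return math.hypot(u, v)

def circular_mean_deg(angles_deg):
    """Mean direction in degrees, computed on the unit circle."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 returns an angle in (-180, 180]; wrap it into [0, 360).
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0

speed = current_speed(3.0, 4.0)
north_mean = circular_mean_deg([350.0, 10.0])
east_mean = circular_mean_deg([80.0, 100.0])
```

An arithmetic average of 350° and 10° would give 180°, the opposite direction, which is the breakage the circular mean avoids.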