Environmental Analysis Methodology

24/04/2026

A fine-grained methodology for identifying the micro-climatic environmental parameters associated with strictly local jellyfish occurrences. This documentation describes the data extraction and statistical pipelines behind the Meduseo web dashboard.

Phase 1 Highly Localized Data Retrieval

NetCDF Spatial Slicing

Unlike traditional models that average environmental data across large regional polygons, Meduseo slices raw NetCDF files on the fly, opening them directly with xarray.

For every valid location, the pipeline takes the exact GPS coordinates and builds a bounding box extending 0.15° in each direction. This limits the environmental assessment to roughly 15 km around the specific coastline (0.15° of latitude is about 16.7 km; the east-west extent shrinks with the cosine of the latitude), which is small enough to capture bay-specific micro-climates.
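The bounding-box step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the 111.32 km-per-degree constant, and the example coordinates are assumptions made for the sketch.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude (assumed constant)

def bounding_box(lat, lon, radius_deg=0.15):
    """Return (lat_min, lat_max, lon_min, lon_max) for a square box around a point."""
    return (lat - radius_deg, lat + radius_deg, lon - radius_deg, lon + radius_deg)

def box_half_extent_km(lat, radius_deg=0.15):
    """Approximate north-south and east-west half-extent of the box, in km."""
    north_south = radius_deg * KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west

# Hypothetical example location (coordinates chosen for illustration only).
lat_min, lat_max, lon_min, lon_max = bounding_box(43.70, 7.27)
ns_km, ew_km = box_half_extent_km(43.70)

# With xarray, the same box could then be applied to an opened dataset, e.g.
#   ds.sel(latitude=slice(lat_max, lat_min), longitude=slice(lon_min, lon_max))
# The coordinate names and the descending latitude order are assumptions
# about the file layout and should be checked against the actual NetCDF files.
```

At mid-latitudes the east-west extent comes out noticeably smaller than the north-south one, so "15 km" is best read as an order of magnitude rather than an exact radius.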

Data Sources

We draw on two main Copernicus data sources:

  • ERA5 Atmospheric Reanalysis: Retrieved via the CDS API at 0.25° grid resolution, covering wind, atmospheric pressure, temperature, and precipitation.
  • Copernicus Marine Environment (CMEMS): Retrieved at its native 0.083° resolution, providing physical indicators such as SST, salinity, surface current velocity vectors, and ocean waves.

Phase 2 Statistical Classifications

Target Formulation: What makes a "Jellyfish Day"?

For each city, Meduseo users submit sighting reports on a scale from 0 (Clear) to 4 (Severe Outbreak). To assign each day a reliable label, the script computes the arithmetic mean of all reports submitted that day for that specific GPS location.

  • Jellyfish Present: If mean > 1.
  • Clear Day: If mean ≤ 1.
  • Threshold Requirement: A city is only included in the statistical analysis if it has at least 10 fully verified reporting days during the summer seasons.
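The labeling rules above can be sketched in a few lines. This is an illustration under stated assumptions: the report tuple layout, the function names, and the constants' names are hypothetical, and the sketch does not model report verification or summer-season filtering.

```python
from collections import defaultdict
from statistics import mean

PRESENCE_THRESHOLD = 1.0  # daily mean > 1 means "Jellyfish Present"
MIN_REPORT_DAYS = 10      # minimum number of reporting days per city

def label_days(reports):
    """reports: iterable of (city, day, score) with score in 0..4.

    Returns {city: {day: True if jellyfish present, else False}}.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for city, day, score in reports:
        scores[city][day].append(score)
    return {
        city: {day: mean(vals) > PRESENCE_THRESHOLD for day, vals in days.items()}
        for city, days in scores.items()
    }

def eligible_cities(labels, min_days=MIN_REPORT_DAYS):
    """Keep only cities with enough labeled days for statistical analysis."""
    return {city: days for city, days in labels.items() if len(days) >= min_days}

# Hypothetical reports: two on the first day (mean 1.5), two on the second (mean 1.0).
reports = [
    ("A", "2025-07-01", 0), ("A", "2025-07-01", 3),
    ("A", "2025-07-02", 1), ("A", "2025-07-02", 1),
]
labels = label_days(reports)
```

Note that a daily mean of exactly 1 counts as a clear day, because the comparison is strict.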

Mean Load Threshold: > 1


Phase 3 Rigorous Statistical Testing

Mann-Whitney U

Because weather variables often violate the assumption of normality, we use the non-parametric Mann-Whitney U test. The test pools the values of a parameter observed on clear days and on jellyfish days, ranks them, and compares the rank sums of the two groups.
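To make the rank-sum idea concrete, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U test. It uses the normal approximation with a tie correction and no continuity correction, which is an assumption of this sketch and not necessarily what the Meduseo pipeline does; in practice a library routine such as `scipy.stats.mannwhitneyu` would be the more likely choice.

```python
import math

def midranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of t^3 - t over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical values of one variable on clear days vs. jellyfish days.
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The normal approximation is only reliable for reasonably large groups; for very small samples an exact method is preferable.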

Cohen's d (Effect Size)

A p-value indicates whether a difference is statistically detectable, but not how large it is. Cohen's d measures the magnitude of the trigger: it is the difference between the two group means divided by the pooled standard deviation. This lets us rank the drivers of occurrences by strength.
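As a minimal sketch, Cohen's d with a pooled standard deviation can be computed with the standard library alone. The function name and the sign convention (first group minus second group) are assumptions of this example.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference of means divided by the pooled (sample) standard deviation."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = (
        (n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)
    ) / (n1 + n2 - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical values: means 4 and 3, both sample variances equal to 4.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

Here the means differ by 1 and the pooled standard deviation is 2, so d is 0.5.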

FDR Correction

Running many tests without correction inflates the risk of false positives. We handle this by passing all raw p-values through a Benjamini-Hochberg false discovery rate (FDR) correction. Only metrics satisfying q < 0.05 are tagged as significant drivers.
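The Benjamini-Hochberg adjustment can be sketched as below. The function name and the example p-values are hypothetical; a production pipeline might instead rely on a library routine.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical raw p-values for four variables.
raw_p = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in q_values]
```

In this example three raw p-values are below 0.05, but only the smallest survives the correction, which is exactly the kind of false positive the procedure is meant to filter out.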


Phase 4 Computed Variable Lexicon

Every analysis assesses the same-day value of each variable alongside computed time lags. Because ocean conditions respond to forcing with a delay, analyzing 1-day and 2-day lags lets us capture delayed environmental transport mechanisms.
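A lag computation that stays inside a season could look like the sketch below. The season months (June to September), the function name, and the rule that a lagged day must fall in the same year's season are all assumptions; the source only says that lags are kept within season boundaries.

```python
from datetime import date, timedelta

def lagged_values(daily, lags=(1, 2), season_months=(6, 7, 8, 9)):
    """daily: {date: value}. Returns {date: {"lag_1d": value or None, ...}}.

    A lag is None when the earlier day is missing or falls outside the
    (assumed) season, so values never leak across season boundaries.
    """
    out = {}
    for day in sorted(daily):
        row = {}
        for lag in lags:
            prev = day - timedelta(days=lag)
            in_season = prev.month in season_months and prev.year == day.year
            row[f"lag_{lag}d"] = daily.get(prev) if in_season else None
        out[day] = row
    return out

# Hypothetical daily SST values around the start of an assumed season.
sst = {
    date(2025, 5, 31): 19.0,
    date(2025, 6, 1): 20.0,
    date(2025, 6, 2): 21.0,
    date(2025, 6, 3): 22.0,
}
lags = lagged_values(sst)
```

On the first day of the season both lags are empty even though a value exists for the day before, because that day lies outside the season.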

| Variable Code | Display Name | Unit | Source & Definition |
|---|---|---|---|
| sst | Sea Surface Temp | °C | Copernicus Marine daily mean extraction. |
| msl_mean | Atmospheric Pressure | hPa | ERA5 mean sea level pressure. Indicates general cyclonic activity. |
| salinity | Salinity | PSU | Marine surface salt concentration (~5 m proxy depth). |
| current_speed | Surface Current | m/s | Magnitude derived from the current's vector components. |
| wave_height | Wave Height | m | Mean height of the highest 33% of waves (VHM0). |
| *_lag_1d / 2d | Temporal Lags | - | Values from 24 and 48 hours earlier, kept within season boundaries. |
| *_direction | Circular Trajectories | ° | Summarized with circular means, since arithmetic means break at the 0°/360° wrap. |
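Two of the derived quantities in the table are easy to illustrate: the current speed as a vector magnitude, and the circular mean used for directions. This is a minimal sketch; the function names and the component arguments `u` and `v` are assumptions.

```python
import math

def current_speed(u, v):
    """Magnitude of a horizontal velocity vector with components u and v (m/s)."""
    return math.hypot(u, v)

def circular_mean_deg(angles_deg):
    """Mean direction in degrees [0, 360), computed on the unit circle."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

speed = current_speed(3.0, 4.0)

# Directions of 350° and 10° both point roughly north. Their arithmetic
# mean is 180° (due south), while the circular mean lands at north
# (numerically very close to 0°, or equivalently 360°).
mean_dir = circular_mean_deg([350.0, 10.0])
```

The circular mean is undefined when the directions cancel out exactly (for example 0° and 180°), so a real pipeline would need to handle that case.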