Metran practical example

This notebook shows a practical application of Metran on calculated residuals from univariate time series models as published in the article van Geer and Berendrecht in Stromingen (2015).

[1]:
import os

import pandas as pd

import metran

metran.show_versions()
Python version: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
numpy version: 1.26.4
scipy version: 1.12.0
pandas version: 2.2.0
matplotlib version: 3.7.2
pastas version: 1.4.0
numba version: 0.59.0
lmfit version: 1.2.2

Read example data

Read residuals from time series analysis models for 5 piezometers at different depths at location B21B0214. (The time series models are not shown here, only the resulting residuals.)

[2]:
residuals = {}
rfiles = [
    os.path.join("./data", f) for f in os.listdir("./data") if f.endswith("_res.csv")
]

for fi in rfiles:
    name = fi.split(os.sep)[-1].split(".")[0].split("_")[0]
    ts = pd.read_csv(
        fi, header=0, index_col=0, parse_dates=True, date_format="%Y-%m-%d"
    )
    residuals[name] = ts
[3]:
# sort names (not necessary, but ensures the order of things)
sorted_names = list(residuals.keys())
sorted_names.sort()
sorted_names
[3]:
['B21B0214001', 'B21B0214002', 'B21B0214003', 'B21B0214004', 'B21B0214005']

Create Metran model

First collect series in a list with their unique IDs.

[4]:
series = []

for name in sorted_names:
    ts = residuals[name]
    ts.columns = [name]
    series.append(ts)

Create the Metran model and solve.

[5]:
mt = metran.Metran(series, name="B21B0214")
mt.solve()
INFO: Number of factors according to Velicer's MAP test: 1

Fit report B21B0214                  Fit Statistics
=======================================================
tmin     None                        obj        2332.33
tmax     None                        nfev            77
freq     D                           AIC        2344.33
solver   ScipySolve

Parameters (6 were optimized)
=======================================================
                         optimal   stderr initial  vary
B21B0214001_sdf_alpha   5.501017  ±18.98%    10.0  True
B21B0214002_sdf_alpha  13.560042  ±10.04%    10.0  True
B21B0214003_sdf_alpha   4.682870  ±28.86%    10.0  True
B21B0214004_sdf_alpha  11.381674  ±18.22%    10.0  True
B21B0214005_sdf_alpha  13.140605   ±8.48%    10.0  True
cdf1_alpha             22.980925   ±7.43%    10.0  True

Parameter correlations |rho| > 0.5
=======================================================
None

Metran report B21B0214        Factor Analysis
=============================================
tmin     None                nfct     1
tmax     None                fep     88.32%
freq     D

Communality
=============================================

B21B0214001      73.61%
B21B0214002      87.59%
B21B0214003      93.35%
B21B0214004      91.74%
B21B0214005      81.15%

State parameters
=============================================
                      phi         q
B21B0214001_sdf  0.833781  0.080429
B21B0214002_sdf  0.928908  0.017023
B21B0214003_sdf  0.807716  0.023102
B21B0214004_sdf  0.915889  0.013316
B21B0214005_sdf  0.926724  0.026607
cdf1             0.957419  0.083349

Observation parameters
=============================================
                   gamma1     scale      mean
B21B0214001      0.857982  5.920523 -0.001924
B21B0214002      0.935874  5.565866 -0.055813
B21B0214003      0.966197  5.702295 -0.001265
B21B0214004      0.957794  5.833851 -0.033373
B21B0214005      0.900857  6.234234 -0.022840

State correlations |rho| > 0.5
=============================================
None

Visualizing and accessing Metran results

The results of the Metran can be visualized using the Metran.plots class.

Scree plot

We can draw a scree plot to visualize the eigenvalues (used in determining the number of factors).

[6]:
# Plot eigenvalues in scree plot, see e.g. Fig 2 in JoH paper
ax = mt.plots.scree_plot()
../_images/examples_ex01_metran_practical_example_10_0.png

State means

Plot the calculated state means for each of the specific and common dynamic components:

[7]:
mt.get_state_means()
[7]:
B21B0214001_sdf B21B0214002_sdf B21B0214003_sdf B21B0214004_sdf B21B0214005_sdf cdf1
date
1988-10-14 0.226549 0.021665 0.028548 0.026005 0.153683 0.809228
1988-10-15 0.182900 0.013039 0.026154 0.022935 0.149910 0.790042
1988-10-16 0.145313 0.004483 0.024958 0.020043 0.147006 0.772353
1988-10-17 0.112540 -0.004048 0.024904 0.017306 0.144954 0.756126
1988-10-18 0.083497 -0.012601 0.025990 0.014702 0.143742 0.741331
... ... ... ... ... ... ...
2005-11-24 0.909463 -0.389068 -0.022535 -0.002155 0.133346 -0.152090
2005-11-25 0.940340 -0.416086 -0.021679 -0.017868 0.134755 -0.288478
2005-11-26 1.001189 -0.445368 -0.021816 -0.033719 0.136945 -0.426332
2005-11-27 0.951887 -0.477072 -0.022951 -0.049831 0.139928 -0.675974
2005-11-28 1.068061 -0.511373 -0.025137 -0.066328 0.143722 -0.823190

6255 rows × 6 columns

[8]:
axes = mt.plots.state_means(adjust_height=True)
../_images/examples_ex01_metran_practical_example_13_0.png

Simulations

The simulated mean values for each time series in our Metran model can be obtained with:

[9]:
# Get all (smoothed) simulated state means
means = mt.get_simulated_means()
means.head(10)
[9]:
B21B0214001 B21B0214002 B21B0214003 B21B0214004 B21B0214005
date
1988-10-14 5.450000 4.280000 4.620000 4.640000 5.480000
1988-10-15 5.094121 4.132049 4.500647 4.514891 5.348733
1988-10-16 4.781724 3.992287 4.396365 4.399175 5.231284
1988-10-17 4.505266 3.860280 4.306655 4.292536 5.127359
1988-10-18 4.258161 3.735609 4.231336 4.194678 5.036712
1988-10-19 4.034568 3.617869 4.170535 4.105327 4.959142
1988-10-20 3.829204 3.506668 4.124708 4.024229 4.894493
1988-10-21 3.637170 3.401621 4.094661 3.951149 4.842653
1988-10-22 3.453793 3.302355 4.081594 3.885870 4.803556
1988-10-23 3.274477 3.208502 4.087163 3.828191 4.777178

For obtaining the data for a simulation with the observations and an (optional) confidence interval, use mt.get_simulation().

[10]:
# Get simulated mean for specific series with/without confidence interval
name = "B21B0214005"
sim = mt.get_simulation(name, alpha=0.05)
sim.head(10)
[10]:
mean lower upper
date
1988-10-14 5.480000 5.480000 5.480000
1988-10-15 5.348733 1.669956 9.027510
1988-10-16 5.231284 0.273555 10.189012
1988-10-17 5.127359 -0.640237 10.894955
1988-10-18 5.036712 -1.263478 11.336903
1988-10-19 4.959142 -1.669234 11.587519
1988-10-20 4.894493 -1.891043 11.680029
1988-10-21 4.842653 -1.942746 11.628053
1988-10-22 4.803556 -1.824438 11.431550
1988-10-23 4.777178 -1.522468 11.076824

There is also a method to visualize these results for a single time series including optional confidence interval:

[11]:
ax = mt.plots.simulation("B21B0214005", alpha=0.05)
../_images/examples_ex01_metran_practical_example_19_0.png

Or all for all time series:

[12]:
axes = mt.plots.simulations(alpha=0.05)
../_images/examples_ex01_metran_practical_example_21_0.png

Decompositions

The decomposition of a simulation into specific and common dynamic components can be obtained with mt.decompose_simulation().

[13]:
# Decomposed simulated mean for specific series
decomposition = mt.decompose_simulation("B21B0214001")
decomposition.head(10)
[13]:
sdf cdf1
date
1988-10-14 1.339364 4.110636
1988-10-15 1.080942 4.013179
1988-10-16 0.858403 3.923322
1988-10-17 0.664372 3.840894
1988-10-18 0.492420 3.765741
1988-10-19 0.336849 3.697720
1988-10-20 0.192504 3.636701
1988-10-21 0.054601 3.582569
1988-10-22 -0.081428 3.535222
1988-10-23 -0.220092 3.494570

This can also be visualized:

[14]:
axes = mt.plots.decomposition("B21B0214001", split=True, adjust_height=True)
../_images/examples_ex01_metran_practical_example_25_0.png

Or for all time series

[15]:
axes = mt.plots.decompositions()
../_images/examples_ex01_metran_practical_example_27_0.png

Example application: removing outliers

The Kalman smoother can be re-run after removing (masking) outliers from the observations. This is illustrated below.

First plot the simulation for the original data:

[16]:
name = "B21B0214005"
alpha = 0.05

ax1 = mt.plots.simulation(name, alpha=alpha, tmin="1997-01-01", tmax="1997-12-31")
../_images/examples_ex01_metran_practical_example_29_0.png

Mask (remove) the outlier on 28 august 1997.

[17]:
oseries = mt.get_observations()
mask = (0 * oseries).astype(bool)
mask.loc["1997-8-28", name] = True
mt.mask_observations(mask)

Now plot the simulation again. Note the estimated value and its 95%-confidence interval for the observation on 28 August 1997 based on the common dynamic factor.

[18]:
# remove outlier from series B21B0214005 at 1997-8-28
# and re-run smoother to get estimate of observation
# (Fig 3 in Stromingen without deterministic component)
ax2 = mt.plots.simulation(name, alpha=alpha, tmin="1997-01-01", tmax="1997-12-31")
sim = mt.get_simulation(name, alpha=None).loc[["1997-8-28"]]

# plot outlier and corrected value
outlier = oseries.loc[["1997-8-28"], name]
ax2.plot(outlier.index, outlier, "C3x", label="outlier")
ax2.plot(sim.index, sim, "C2s", label="corrected value", mec="k", mew=0.5)
ax2.legend(loc=(0, 1), numpoints=1, frameon=False, ncol=3)
INFO: Running Kalman filter with masked observations.
../_images/examples_ex01_metran_practical_example_33_1.png

To reset the observations (remove all masks):

[19]:
# unmask observations to get original observations
mt.unmask_observations()

References

  • Van Geer, F.C. en W.L. Berendrecht (2015) Meervoudige tijdreeksmodellen en de samenhang in stijghoogtereeksen. Stromingen 23 nummer 3, pp. 25-36.