{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Data transformations\n\nThe statistics of intermittent precipitation rates are particularly non-Gaussian\nand display an asymmetric distribution bounded at zero.\nSuch properties restrict the usage of well-established statistical methods that\nassume symmetric or Gaussian data.\n\nA common workaround is to introduce a suitable data transformation to approximate\na normal distribution.\n\nIn this example, we test the data transformation methods available in pysteps\nin order to obtain a more symmetric distribution of the precipitation data\n(excluding the zeros).\nThe currently available transformations include the Box-Cox, dB, square-root and\nnormal quantile transforms.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from datetime import datetime\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom pysteps import io, rcparams\nfrom pysteps.utils import conversion, transformation\nfrom scipy.stats import skew"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Read the radar input images\n\nFirst, we will import the sequence of radar composites.\nYou need the pysteps-data archive downloaded and the pystepsrc file\nconfigured with the data_source paths pointing to data folders.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Selected case\ndate = datetime.strptime(\"201609281600\", \"%Y%m%d%H%M\")\ndata_source = rcparams.data_sources[\"fmi\"]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Load the data from the archive\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "root_path = data_source[\"root_path\"]\npath_fmt = data_source[\"path_fmt\"]\nfn_pattern = data_source[\"fn_pattern\"]\nfn_ext = data_source[\"fn_ext\"]\nimporter_name = data_source[\"importer\"]\nimporter_kwargs = data_source[\"importer_kwargs\"]\ntimestep = data_source[\"timestep\"]\n\n# Get 1 hour of observations in the data archive\nfns = io.archive.find_by_date(\n    date, root_path, path_fmt, fn_pattern, fn_ext, timestep, num_next_files=11\n)\n\n# Read the radar composites\nimporter = io.get_method(importer_name, \"importer\")\nZ, _, metadata = io.read_timeseries(fns, importer, **importer_kwargs)\n\n# Keep only positive rainfall values\nZ = Z[Z > metadata[\"zerovalue\"]].flatten()\n\n# Convert to rain rate\nR, metadata = conversion.to_rainrate(Z, metadata)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Test data transformations\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Define method to visualize the data distribution with boxplots and plot the\n# corresponding skewness\ndef plot_distribution(data, labels, skw):\n\n    N = len(data)\n    fig, ax1 = plt.subplots()\n    ax2 = ax1.twinx()\n\n    ax2.plot(np.arange(N + 2), np.zeros(N + 2), \":r\")\n    ax1.boxplot(data, labels=labels, sym=\"\", medianprops={\"color\": \"k\"})\n\n    ymax = []\n    for i in range(N):\n        y = skw[i]\n        x = i + 1\n        ax2.plot(x, y, \"*r\", ms=10, markeredgecolor=\"k\")\n        ymax.append(np.max(data[i]))\n\n    # ylims\n    ylims = np.percentile(ymax, 50)\n    ax1.set_ylim((-1 * ylims, ylims))\n    ylims = np.max(np.abs(skw))\n    ax2.set_ylim((-1.1 * ylims, 1.1 * ylims))\n\n    # labels\n    ax1.set_ylabel(r\"Standardized values [$\\sigma$]\")\n    ax2.set_ylabel(r\"Skewness []\", color=\"r\")\n    ax2.tick_params(axis=\"y\", labelcolor=\"r\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Box-Cox transform\nThe Box-Cox transform is a well-known power transformation introduced by\n`Box and Cox (1964)`_. In its one-parameter version, the Box-Cox transform\ntakes the form T(x) = ln(x) for lambda = 0, or T(x) = (x**lambda - 1)/lambda\notherwise.\n\nTo find a suitable lambda, we will experiment with a range of values\nand select the one that produces the most symmetric distribution, i.e., the\nlambda associated with a value of skewness closest to zero.\nTo visually compare the results, the transformed data are standardized.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "data = []\nlabels = []\nskw = []\n\n# Test a range of values for the transformation parameter Lambda\nLambdas = np.linspace(-0.4, 0.4, 11)\nfor i, Lambda in enumerate(Lambdas):\n    R_, _ = transformation.boxcox_transform(R, metadata, Lambda)\n    R_ = (R_ - np.mean(R_)) / np.std(R_)\n    data.append(R_)\n    labels.append(\"{0:.2f}\".format(Lambda))\n    skw.append(skew(R_))  # skewness\n\n# Plot the transformed data distribution as a function of lambda\nplot_distribution(data, labels, skw)\nplt.title(\"Box-Cox transform\")\nplt.tight_layout()\nplt.show()\n\n# Best lambda\nidx_best = np.argmin(np.abs(skw))\nLambda = Lambdas[idx_best]\n\nprint(\"Best parameter lambda: %.2f\\n(skewness = %.2f)\" % (Lambda, skw[idx_best]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Compare data transformations\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "data = []\nlabels = []\nskw = []"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Rain rates\nFirst, let's have a look at the original rain rate values.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "data.append((R - np.mean(R)) / np.std(R))\nlabels.append(\"R\")\nskw.append(skew(R))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### dB transform\nWe transform the rainfall data into dB units: 10*log(R)\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "R_, _ = transformation.dB_transform(R, metadata)\ndata.append((R_ - np.mean(R_)) / np.std(R_))\nlabels.append(\"dB\")\nskw.append(skew(R_))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Square-root transform\nTransform the data using the square-root: sqrt(R)\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "R_, _ = transformation.sqrt_transform(R, metadata)\ndata.append((R_ - np.mean(R_)) / np.std(R_))\nlabels.append(\"sqrt\")\nskw.append(skew(R_))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Box-Cox transform\nWe now apply the Box-Cox transform using the best parameter lambda found above.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "R_, _ = transformation.boxcox_transform(R, metadata, Lambda)\ndata.append((R_ - np.mean(R_)) / np.std(R_))\nlabels.append(\"Box-Cox\\n($\\lambda=$%.2f)\" % Lambda)\nskw.append(skew(R_))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Normal quantile transform\nAt last, we apply the empirical normal quantile (NQ) transform as described in\n`Bogner et al (2012)`_.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "R_, _ = transformation.NQ_transform(R, metadata)\ndata.append((R_ - np.mean(R_)) / np.std(R_))\nlabels.append(\"NQ\")\nskw.append(skew(R_))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "By plotting all the results, we can notice first of all the strongly asymmetric\ndistribution of the original data (R) and that all transformations manage to\nreduce its skewness. Among these, the Box-Cox transform (using the best parameter\nlambda) and the normal quantile (NQ) transform provide the best correction.\nDespite not producing a perfectly symmetric distribution, the square-root (sqrt)\ntransform has the strong advantage of being defined for zeros, too, while all\nother transformations need an arbitrary rule for non-positive values.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plot_distribution(data, labels, skw)\nplt.title(\"Data transforms\")\nplt.tight_layout()\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}