
Robust Linear Models

In [1]: import numpy as np

In [1]: import statsmodels.api as sm

In [1]: import matplotlib.pyplot as plt

In [1]: from statsmodels.sandbox.regression.predstd import wls_prediction_std

Estimating RLM

Load data

In [1]: data = sm.datasets.stackloss.load()

In [1]: data.exog = sm.add_constant(data.exog)

Huber’s T norm with the (default) median absolute deviation scaling

In [1]: huber_t = sm.RLM(data.endog, data.exog, M=sm.robust.norms.HuberT())

In [1]: hub_results = huber_t.fit()

In [1]: print(hub_results.params)

In [1]: print(hub_results.bse)

In [1]: varnames = ['var_%d' % i for i in range(len(hub_results.params))]

In [1]: print(hub_results.summary(yname='y', xname=varnames))
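
The fitted results also expose the final IRLS weights, so it is easy to see which observations the Huber norm downweighted (a quick check on the hub_results fit from above; this sketch assumes the weights attribute holds the last iteration's weights, with 1.0 meaning an observation left untouched):

In [1]: w = hub_results.weights   # final IRLS weights from the fit above

In [1]: print(np.where(w < 1)[0])   # indices of the downweighted observations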

Huber’s T norm with ‘H2’ covariance matrix

In [1]: hub_results2 = huber_t.fit(cov="H2")

In [1]: print(hub_results2.params)

In [1]: print(hub_results2.bse)

Andrew’s Wave norm with Huber’s Proposal 2 scaling and ‘H3’ covariance matrix

In [1]: andrew_mod = sm.RLM(data.endog, data.exog, M=sm.robust.norms.AndrewWave())

In [1]: andrew_results = andrew_mod.fit(scale_est=sm.robust.scale.HuberScale(),
   ...:                                 cov='H3')
   ...: 

In [1]: print(andrew_results.params)
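
The robust scale produced by Huber's Proposal 2 is also stored on the results; a small check on the same fit (assuming the scale attribute carries the final estimate):

In [1]: print(andrew_results.scale)   # robust scale estimate from Huber's Proposal 2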

See help(sm.RLM.fit) for more fitting options, and the sm.robust.scale module for the available scale estimators.
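
For example, any other norm can be swapped in the same way; a minimal sketch (reusing the stackloss data loaded above) with Tukey's biweight norm and the default MAD scaling:

In [1]: tukey_mod = sm.RLM(data.endog, data.exog, M=sm.robust.norms.TukeyBiweight())

In [1]: print(tukey_mod.fit().params)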

Comparing OLS and RLM

Artificial data

In [1]: nsample = 50

In [1]: x1 = np.linspace(0, 20, nsample)

In [1]: X = np.c_[x1, (x1 - 5)**2, np.ones(nsample)]

In [1]: sig = 0.3   # smaller error variance makes OLS<->RLM contrast bigger

In [1]: beta = [0.5, -0.0, 5.]

In [1]: y_true2 = np.dot(X, beta)

In [1]: y2 = y_true2 + sig * np.random.normal(size=nsample)

In [1]: y2[[39, 41, 43, 45, 48]] -= 5   # add some outliers (10% of nsample)

Example: quadratic function with linear truth

Note that the quadratic term in OLS regression will capture outlier effects.

In [1]: res = sm.OLS(y2, X).fit()

In [1]: print(res.params)

In [1]: print(res.bse)

In [1]: print(res.predict())   # in-sample fitted values

Estimate RLM

In [1]: resrlm = sm.RLM(y2, X).fit()

In [1]: print(resrlm.params)

In [1]: print(resrlm.bse)
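
As noted above, the OLS quadratic term absorbs part of the outlier effect even though the true quadratic coefficient is zero (beta[1] in the artificial data); a quick side-by-side check:

In [1]: print(res.params[1], resrlm.params[1])   # OLS vs. RLM estimate of the (truly zero) quadratic coefficient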

Draw a plot to compare OLS estimates to the robust estimates

In [1]: plt.figure();

In [1]: plt.plot(x1, y2, 'o', x1, y_true2, 'b-');

In [1]: prstd, iv_l, iv_u = wls_prediction_std(res);

In [1]: plt.plot(x1, res.fittedvalues, 'r-');

In [1]: plt.plot(x1, iv_u, 'r--');

In [1]: plt.plot(x1, iv_l, 'r--');

In [1]: plt.plot(x1, resrlm.fittedvalues, 'g.-');

In [1]: plt.title('blue: true,   red: OLS,   green: RLM');

(Figure: rlm_ols_0.png)

Example: linear function with linear truth

Fit a new OLS model using only the linear term and the constant

In [1]: X2 = X[:, [0, 2]]

In [1]: res2 = sm.OLS(y2, X2).fit()

In [1]: print(res2.params)

In [1]: print(res2.bse)

Estimate RLM

In [1]: resrlm2 = sm.RLM(y2, X2).fit()

In [1]: print(resrlm2.params)

In [1]: print(resrlm2.bse)
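
Since the reduced design matches the linear truth, both estimates can be checked directly against the generating coefficients (slope 0.5 and intercept 5 from beta above); a small sketch of the deviations:

In [1]: print(res2.params - [0.5, 5.])      # OLS deviation from the true slope and intercept

In [1]: print(resrlm2.params - [0.5, 5.])   # RLM deviation from the true slope and intercept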

Draw a plot to compare OLS estimates to the robust estimates

In [1]: prstd, iv_l, iv_u = wls_prediction_std(res2)

In [1]: plt.figure();

In [1]: plt.plot(x1, y2, 'o', x1, y_true2, 'b-');

In [1]: plt.plot(x1, res2.fittedvalues, 'r-');

In [1]: plt.plot(x1, iv_u, 'r--');

In [1]: plt.plot(x1, iv_l, 'r--');

In [1]: plt.plot(x1, resrlm2.fittedvalues, 'g.-');

In [1]: plt.title('blue: true,   red: OLS,   green: RLM');

(Figure: rlm_ols_1.png)