Formulas: Fitting models using R-style formulas
===============================================


.. _formulas_notebook:

`Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/formulas.ipynb>`_

.. raw:: html

   
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Since version 0.5.0, <code>statsmodels</code> allows users to fit statistical models using R-style formulas. Internally, <code>statsmodels</code> uses the <a href="http://patsy.readthedocs.org/">patsy</a> package to convert formulas and data to the matrices that are used in model fitting. The formula framework is quite powerful; this tutorial only scratches the surface. A full description of the formula language can be found in the <code>patsy</code> docs: </p>
   <ul>
   <li><a href="http://patsy.readthedocs.org/">Patsy formula language description</a></li>
   </ul>
   <h2 id="loading-modules-and-functions">Loading modules and functions</h2>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[1]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
   <span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
   <span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="kn">as</span> <span class="nn">sm</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h4 id="Import-convention">Import convention<a class="anchor-link" href="#Import-convention">&#182;</a></h4>
   </div>
   </div>
   </div>
   
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>You can import explicitly from statsmodels.formula.api</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[2]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="kn">from</span> <span class="nn">statsmodels.formula.api</span> <span class="kn">import</span> <span class="n">ols</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Alternatively, you can just use the <code>formula</code> namespace of the main <code>statsmodels.api</code>.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[3]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">sm</span><span class="o">.</span><span class="n">formula</span><span class="o">.</span><span class="n">ols</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt output_prompt">
       Out[3]:</div>
   
   
   <div class="output_text output_subarea output_pyout">
   <pre>
   &lt;bound method type.from_formula of &lt;class &apos;statsmodels.regression.linear_model.OLS&apos;&gt;&gt;
   </pre>
   </div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Or you can use the following conventioin</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[4]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="kn">import</span> <span class="nn">statsmodels.formula.api</span> <span class="kn">as</span> <span class="nn">smf</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>These names are just a convenient way to get access to each model&#39;s <code>from_formula</code> classmethod. See, for instance</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[5]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="o">.</span><span class="n">from_formula</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt output_prompt">
       Out[5]:</div>
   
   
   <div class="output_text output_subarea output_pyout">
   <pre>
   &lt;bound method type.from_formula of &lt;class &apos;statsmodels.regression.linear_model.OLS&apos;&gt;&gt;
   </pre>
   </div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>All of the lower case models accept <code>formula</code> and <code>data</code> arguments, whereas upper case ones take <code>endog</code> and <code>exog</code> design matrices. <code>formula</code> accepts a string which describes the model in terms of a <code>patsy</code> formula. <code>data</code> takes a <a href="http://pandas.pydata.org/">pandas</a> data frame or any other data structure that defines a <code>__getitem__</code> for variable names like a structured array or a dictionary of variables. </p>
   <p><code>dir(sm.formula)</code> will print a list of available models. </p>
   <p>Formula-compatible models have the following generic call signature: <code>(formula, data, subset=None, *args, **kwargs)</code></p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="ols-regression-using-formulas">OLS regression using formulas</h2>
   <p>To begin, we fit the linear model described on the <a href="gettingstarted.html">Getting Started</a> page. Download the data, subset columns, and list-wise delete to remove missing observations:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[6]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">dta</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">get_rdataset</span><span class="p">(</span><span class="s">&quot;Guerry&quot;</span><span class="p">,</span> <span class="s">&quot;HistData&quot;</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[7]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">df</span> <span class="o">=</span> <span class="n">dta</span><span class="o">.</span><span class="n">data</span><span class="p">[[</span><span class="s">&#39;Lottery&#39;</span><span class="p">,</span> <span class="s">&#39;Literacy&#39;</span><span class="p">,</span> <span class="s">&#39;Wealth&#39;</span><span class="p">,</span> <span class="s">&#39;Region&#39;</span><span class="p">]]</span><span class="o">.</span><span class="n">dropna</span><span class="p">()</span>
   <span class="n">df</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt output_prompt">
       Out[7]:</div>
   
   <div class="output_html rendered_html output_subarea output_pyout">
   <div style="max-height:1000px;max-width:1500px;overflow:auto;">
   <table border="1" class="dataframe">
     <thead>
       <tr style="text-align: right;">
         <th></th>
         <th>Lottery</th>
         <th>Literacy</th>
         <th>Wealth</th>
         <th>Region</th>
       </tr>
     </thead>
     <tbody>
       <tr>
         <th>0</th>
         <td> 41</td>
         <td> 37</td>
         <td> 73</td>
         <td> E</td>
       </tr>
       <tr>
         <th>1</th>
         <td> 38</td>
         <td> 51</td>
         <td> 22</td>
         <td> N</td>
       </tr>
       <tr>
         <th>2</th>
         <td> 66</td>
         <td> 13</td>
         <td> 61</td>
         <td> C</td>
       </tr>
       <tr>
         <th>3</th>
         <td> 80</td>
         <td> 46</td>
         <td> 76</td>
         <td> E</td>
       </tr>
       <tr>
         <th>4</th>
         <td> 79</td>
         <td> 69</td>
         <td> 83</td>
         <td> E</td>
       </tr>
     </tbody>
   </table>
   </div>
   </div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Fit the model:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[8]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">mod</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy + Wealth + Region&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span>
   <span class="n">res</span> <span class="o">=</span> <span class="n">mod</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
                               OLS Regression Results                            
   ==============================================================================
   Dep. Variable:                Lottery   R-squared:                       0.338
   Model:                            OLS   Adj. R-squared:                  0.287
   Method:                 Least Squares   F-statistic:                     6.636
   Date:                Tue, 25 Aug 2015   Prob (F-statistic):           1.07e-05
   Time:                        00:45:47   Log-Likelihood:                -375.30
   No. Observations:                  85   AIC:                             764.6
   Df Residuals:                      78   BIC:                             781.7
   Df Model:                           6                                         
   Covariance Type:            nonrobust                                         
   ===============================================================================
                     coef    std err          t      P&gt;|t|      [95.0% Conf. Int.]
   -------------------------------------------------------------------------------
   Intercept      38.6517      9.456      4.087      0.000        19.826    57.478
   Region[T.E]   -15.4278      9.727     -1.586      0.117       -34.793     3.938
   Region[T.N]   -10.0170      9.260     -1.082      0.283       -28.453     8.419
   Region[T.S]    -4.5483      7.279     -0.625      0.534       -19.039     9.943
   Region[T.W]   -10.0913      7.196     -1.402      0.165       -24.418     4.235
   Literacy       -0.1858      0.210     -0.886      0.378        -0.603     0.232
   Wealth          0.4515      0.103      4.390      0.000         0.247     0.656
   ==============================================================================
   Omnibus:                        3.049   Durbin-Watson:                   1.785
   Prob(Omnibus):                  0.218   Jarque-Bera (JB):                2.694
   Skew:                          -0.340   Prob(JB):                        0.260
   Kurtosis:                       2.454   Cond. No.                         371.
   ==============================================================================
   
   Warnings:
   [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="categorical-variables">Categorical variables</h2>
   <p>Looking at the summary printed above, notice that <code>patsy</code> determined that elements of <em>Region</em> were text strings, so it treated <em>Region</em> as a categorical variable. <code>patsy</code>&#39;s default is also to include an intercept, so we automatically dropped one of the <em>Region</em> categories.</p>
   <p>If <em>Region</em> had been an integer variable that we wanted to treat explicitly as categorical, we could have done so by using the <code>C()</code> operator: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[9]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">res</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy + Wealth + C(Region)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
   Intercept         38.651655
   C(Region)[T.E]   -15.427785
   C(Region)[T.N]   -10.016961
   C(Region)[T.S]    -4.548257
   C(Region)[T.W]   -10.091276
   Literacy          -0.185819
   Wealth             0.451475
   dtype: float64
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Patsy&#39;s mode advanced features for categorical variables are discussed in: <a href="contrasts.html">Patsy: Contrast Coding Systems for categorical variables</a></p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="operators">Operators</h2>
   <p>We have already seen that &quot;~&quot; separates the left-hand side of the model from the right-hand side, and that &quot;+&quot; adds new columns to the design matrix. </p>
   <h3 id="removing-variables">Removing variables</h3>
   <p>The &quot;-&quot; sign can be used to remove columns/variables. For instance, we can remove the intercept from a model by: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[10]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">res</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy + Wealth + C(Region) -1 &#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
   C(Region)[C]    38.651655
   C(Region)[E]    23.223870
   C(Region)[N]    28.634694
   C(Region)[S]    34.103399
   C(Region)[W]    28.560379
   Literacy        -0.185819
   Wealth           0.451475
   dtype: float64
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h3 id="multiplicative-interactions">Multiplicative interactions</h3>
   <p>&quot;:&quot; adds a new column to the design matrix with the interaction of the other two columns. &quot;*&quot; will also include the individual columns that were multiplied together:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[11]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">res1</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy : Wealth - 1&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="n">res2</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy * Wealth - 1&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res1</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="s">&#39;</span><span class="se">\n</span><span class="s">&#39;</span><span class="p">)</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res2</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
   Literacy:Wealth    0.018176
   dtype: float64 
   
   Literacy           0.427386
   Wealth             1.080987
   Literacy:Wealth   -0.013609
   dtype: float64
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Many other things are possible with operators. Please consult the <a href="https://patsy.readthedocs.org/en/latest/formulas.html">patsy docs</a> to learn more.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="functions">Functions</h2>
   <p>You can apply vectorized functions to the variables in your model: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[12]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">res</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ np.log(Literacy)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
   Intercept           115.609119
   np.log(Literacy)    -20.393959
   dtype: float64
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Define a custom function:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[13]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="k">def</span> <span class="nf">log_plus_1</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
       <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1.</span>
   <span class="n">res</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ log_plus_1(Literacy)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
   Intercept               136.003079
   log_plus_1(Literacy)    -20.393959
   dtype: float64
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Any function that is in the calling namespace is available to the formula.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="using-formulas-with-models-that-do-not-yet-support-them">Using formulas with models that do not (yet) support them</h2>
   <p>Even if a given <code>statsmodels</code> function does not support formulas, you can still use <code>patsy</code>&#39;s formula language to produce design matrices. Those matrices 
   can then be fed to the fitting function as <code>endog</code> and <code>exog</code> arguments. </p>
   <p>To generate <code>numpy</code> arrays: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[14]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="kn">import</span> <span class="nn">patsy</span>
   <span class="n">f</span> <span class="o">=</span> <span class="s">&#39;Lottery ~ Literacy * Wealth&#39;</span>
   <span class="n">y</span><span class="p">,</span><span class="n">X</span> <span class="o">=</span> <span class="n">patsy</span><span class="o">.</span><span class="n">dmatrices</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">return_type</span><span class="o">=</span><span class="s">&#39;dataframe&#39;</span><span class="p">)</span>
   <span class="k">print</span><span class="p">(</span><span class="n">y</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   <span class="k">print</span><span class="p">(</span><span class="n">X</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
      Lottery
   0       41
   1       38
   2       66
   3       80
   4       79
      Intercept  Literacy  Wealth  Literacy:Wealth
   0          1        37      73             2701
   1          1        51      22             1122
   2          1        13      61              793
   3          1        46      76             3496
   4          1        69      83             5727
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>To generate pandas data frames: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[15]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">f</span> <span class="o">=</span> <span class="s">&#39;Lottery ~ Literacy * Wealth&#39;</span>
   <span class="n">y</span><span class="p">,</span><span class="n">X</span> <span class="o">=</span> <span class="n">patsy</span><span class="o">.</span><span class="n">dmatrices</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">return_type</span><span class="o">=</span><span class="s">&#39;dataframe&#39;</span><span class="p">)</span>
   <span class="k">print</span><span class="p">(</span><span class="n">y</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   <span class="k">print</span><span class="p">(</span><span class="n">X</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
      Lottery
   0       41
   1       38
   2       66
   3       80
   4       79
      Intercept  Literacy  Wealth  Literacy:Wealth
   0          1        37      73             2701
   1          1        51      22             1122
   2          1        13      61              793
   3          1        46      76             3496
   4          1        69      83             5727
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[16]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="k">print</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
                               OLS Regression Results                            
   ==============================================================================
   Dep. Variable:                Lottery   R-squared:                       0.309
   Model:                            OLS   Adj. R-squared:                  0.283
   Method:                 Least Squares   F-statistic:                     12.06
   Date:                Tue, 25 Aug 2015   Prob (F-statistic):           1.32e-06
   Time:                        00:45:48   Log-Likelihood:                -377.13
   No. Observations:                  85   AIC:                             762.3
   Df Residuals:                      81   BIC:                             772.0
   Df Model:                           3                                         
   Covariance Type:            nonrobust                                         
   ===================================================================================
                         coef    std err          t      P&gt;|t|      [95.0% Conf. Int.]
   -----------------------------------------------------------------------------------
   Intercept          38.6348     15.825      2.441      0.017         7.149    70.121
   Literacy           -0.3522      0.334     -1.056      0.294        -1.016     0.312
   Wealth              0.4364      0.283      1.544      0.126        -0.126     0.999
   Literacy:Wealth    -0.0005      0.006     -0.085      0.933        -0.013     0.012
   ==============================================================================
   Omnibus:                        4.447   Durbin-Watson:                   1.953
   Prob(Omnibus):                  0.108   Jarque-Bera (JB):                3.228
   Skew:                          -0.332   Prob(JB):                        0.199
   Kurtosis:                       2.314   Cond. No.                     1.40e+04
   ==============================================================================
   
   Warnings:
   [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
   [2] The condition number is large, 1.4e+04. This might indicate that there are
   strong multicollinearity or other numerical problems.
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>

   <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"type="text/javascript"></script>
   <script type="text/javascript">
   init_mathjax = function() {
       if (window.MathJax) {
           // MathJax loaded
           MathJax.Hub.Config({
               tex2jax: {
               // I'm not sure about the \( and \[ below. It messes with the
               // prompt, and I think it's an issue with the template. -SS
                   inlineMath: [ ['$','$'], ["\\(","\\)"] ],
                   displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
               },
               displayAlign: 'left', // Change this to 'center' to center equations.
               "HTML-CSS": {
                   styles: {'.MathJax_Display': {"margin": 0}}
               }
           });
           MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
       }
   }
   init_mathjax();

   // since we have to load this in a ..raw:: directive we will add the css
   // after the fact
   function loadcssfile(filename){
       var fileref=document.createElement("link")
       fileref.setAttribute("rel", "stylesheet")
       fileref.setAttribute("type", "text/css")
       fileref.setAttribute("href", filename)

       document.getElementsByTagName("head")[0].appendChild(fileref)
   }
   // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }})
   // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }})
   loadcssfile("../../../_static/nbviewer.pygments.css")
   loadcssfile("../../../_static/ipython.min.css")
   </script>