Sunday, February 7, 2016

Countries with high Internet use have a high Alcohol Consumption


This study uses the Gapminder dataset and finds a correlation between alcohol consumption and internet use. After controlling for urbanization rate, each additional percentage point of internet use is associated with 0.115 litres more alcohol consumption per adult (beta = 0.115, p < 0.001). The second-order regression alcconsumption ~ urbanrate_c + I(urbanrate_c**2) + internetuserate_c explains 34% of the variability (R^2 = 0.343, F-statistic = 21.59, Prob (F-statistic) = 2.56e-11).

The intercept says that a country with average internet use and average urbanization rate (the variables are centered) has an estimated alcohol consumption of 8.63 litres (per capita consumption of pure alcohol among adults aged 15+).

urbanrate_c (centered) has a small negative effect of about -0.06 litres per percentage point on alcohol consumption, but increases the explanatory power by about 10% (compare the R^2 of the models with and without control for urbanization).

I(urbanrate_c ** 2) is significant but has little effect (coefficient -0.0031).
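
To make the interpretation of the coefficients concrete, here is a minimal sketch that plugs the estimates from the regression table into the fitted equation. The helper name `predict_alcohol` and the example inputs are assumptions for illustration, not part of the original analysis. Inputs are centered values, i.e. percentage points above or below the sample mean.

```python
# Coefficients copied from the OLS table in this post.
INTERCEPT = 8.6291
B_URBAN = -0.0596
B_URBAN_SQ = -0.0031
B_INTERNET = 0.1149


def predict_alcohol(urbanrate_c, internetuserate_c):
    """Predicted litres of pure alcohol per adult (hypothetical helper).

    Both arguments are centered: 0 means the sample average.
    """
    return (INTERCEPT
            + B_URBAN * urbanrate_c
            + B_URBAN_SQ * urbanrate_c ** 2
            + B_INTERNET * internetuserate_c)


# Average country versus one with internet use 10 points above average.
baseline = predict_alcohol(0, 0)
more_internet = predict_alcohol(0, 10)
print(f"baseline: {baseline:.2f} litres")
print(f"+10 points internet use: {more_internet:.2f} litres")
print(f"difference: {more_internet - baseline:.2f} litres")
```

With both inputs at zero the prediction equals the intercept, which is why centering makes the intercept directly interpretable.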

                            OLS Regression Results                            
==============================================================================
Dep. Variable:         alcconsumption   R-squared:                       0.343
Model:                            OLS   Adj. R-squared:                  0.327
Method:                 Least Squares   F-statistic:                     21.59
Date:                Sun, 07 Feb 2016   Prob (F-statistic):           2.56e-11
Time:                        12:40:28   Log-Likelihood:                -365.20
No. Observations:                 128   AIC:                             738.4
Df Residuals:                     124   BIC:                             749.8
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
=======================================================================================
                          coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------
Intercept               8.6291      0.501     17.230      0.000         7.638     9.620
urbanrate_c            -0.0596      0.024     -2.466      0.015        -0.107    -0.012
I(urbanrate_c ** 2)    -0.0031      0.001     -3.980      0.000        -0.005    -0.002
internetuserate_c       0.1149      0.018      6.558      0.000         0.080     0.150
==============================================================================
Omnibus:                        5.219   Durbin-Watson:                   1.972
Prob(Omnibus):                  0.074   Jarque-Bera (JB):                5.213
Skew:                           0.316   Prob(JB):                       0.0738
Kurtosis:                       3.760   Cond. No.                         891.
==============================================================================
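
The summary statistics in the table are related to each other by standard OLS formulas. As a rough cross-check, the sketch below recomputes the adjusted R^2, the F-statistic, AIC and BIC from the R^2, the number of observations and the log-likelihood printed above. The variable names are my own. Because the inputs are rounded, the results should agree with the table only up to rounding.

```python
import math

# Values copied from the regression table above.
n_obs = 128
k = 3                      # predictors, excluding the intercept
r_squared = 0.343
log_likelihood = -365.20

df_resid = n_obs - k - 1   # 124, as in "Df Residuals"

# Adjusted R^2 penalizes R^2 for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# F-statistic: explained variance per predictor over residual variance.
f_statistic = (r_squared / k) / ((1 - r_squared) / df_resid)

# Information criteria; the intercept counts as a parameter here.
n_params = k + 1
aic = 2 * n_params - 2 * log_likelihood
bic = math.log(n_obs) * n_params - 2 * log_likelihood

print(f"adjusted R^2 = {adj_r_squared:.3f}")
print(f"F = {f_statistic:.2f}")
print(f"AIC = {aic:.1f}, BIC = {bic:.1f}")
```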

The result strongly suggests that there is a correlation between alcohol consumption and internet use. It is both highly statistically significant (p < 0.0005) and has an effect of beta = 0.115 litres per percentage point of internet use.

There is no evidence of confounding since the internet use rate did not lose statistical significance after the addition of urbanization rate as an explanatory variable.
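
The confounding check described here can be sketched as fitting the model with and without the urbanization terms and comparing the internet use coefficient, its p-value and R^2. The example below assumes pandas and statsmodels are installed and uses synthetic stand-in data, since the Gapminder file is not part of this post. The numbers it prints are therefore not the results reported above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: NOT the Gapminder values used in this post.
rng = np.random.default_rng(0)
n = 128
urbanrate = rng.uniform(10, 100, n)
internetuserate = np.clip(0.6 * urbanrate + rng.normal(0, 15, n), 0, 100)
alcconsumption = 4 + 0.1 * internetuserate + rng.normal(0, 4, n)
data = pd.DataFrame({
    "urbanrate": urbanrate,
    "internetuserate": internetuserate,
    "alcconsumption": alcconsumption,
})

# Center the explanatory variables by subtracting their means.
data["urbanrate_c"] = data["urbanrate"] - data["urbanrate"].mean()
data["internetuserate_c"] = (data["internetuserate"]
                             - data["internetuserate"].mean())

# Model without and with control for urbanization.
simple = smf.ols("alcconsumption ~ internetuserate_c", data=data).fit()
controlled = smf.ols(
    "alcconsumption ~ urbanrate_c + I(urbanrate_c**2) + internetuserate_c",
    data=data,
).fit()

for name, model in [("without control", simple), ("with control", controlled)]:
    print(name,
          "| beta =", round(model.params["internetuserate_c"], 4),
          "| p =", round(model.pvalues["internetuserate_c"], 4),
          "| R^2 =", round(model.rsquared, 3))
```

If the internet use coefficient stays significant and of similar size in the second model, urbanization is unlikely to be confounding the association.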


Q-Q plot: The residuals are fairly normally distributed, with two outliers in the upper quantiles. This indicates that there is very little systematic error, so the model probably cannot be improved much by adding other variables.


Standardized residuals for all observations: Most of the residuals lie within two standard deviations of zero. There are few major outliers, so the model captures the variability in the dataset well.
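
The "within two standard deviations" check can also be done numerically. The sketch below uses a short list of hypothetical residuals (not the values from this analysis) and a simple standardization, dividing each residual by the sample standard deviation. The standardized residuals reported by a statistics package may be scaled slightly differently.

```python
import statistics

# Hypothetical residuals (observed minus fitted, in litres).
# They sum to zero, as OLS residuals with an intercept do.
residuals = [-3.1, 0.4, 1.2, -0.8, 2.5, -1.9, 0.1,
             9.6, -2.2, 0.7, -4.0, 1.5, -0.6, -3.4]

sd = statistics.stdev(residuals)
standardized = [r / sd for r in residuals]

# Flag observations more than two standard deviations from zero.
outliers = [(i, round(z, 2)) for i, z in enumerate(standardized) if abs(z) > 2]
share_within_2sd = 1 - len(outliers) / len(residuals)

print(f"share within 2 SD: {share_within_2sd:.0%}")
print("observations beyond 2 SD (index, z):", outliers)
```

For normally distributed residuals, roughly 95% of the observations are expected to fall within two standard deviations.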




These plots also show that the data are fairly normal and that the explanatory variable explains the response well. "Urbanrate" has normally distributed residuals and a well-fitting regression line with a small negative effect on the response.

Leverage plot: The observations with the most influence on the regression outcome have small residuals, which supports our claim that the model fits the data well and has predictive power.
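
Leverage itself depends only on the explanatory variables. As a minimal sketch, assuming NumPy is available, the code below computes the leverage of each observation as the diagonal of the hat matrix for a design matrix with the same terms as the model in this post. The predictor values are synthetic stand-ins, and the 2p/n cutoff is a common rule of thumb rather than something taken from the original analysis.

```python
import numpy as np

# Synthetic centered predictors; stand-ins for urbanrate_c and internetuserate_c.
rng = np.random.default_rng(1)
n = 128
urban_c = rng.normal(0, 20, n)
internet_c = rng.normal(0, 25, n)

# Design matrix with the model's terms: intercept, urban, urban^2, internet.
X = np.column_stack([np.ones(n), urban_c, urban_c ** 2, internet_c])

# Leverage h_ii is the diagonal of the hat matrix H = X (X'X)^-1 X'.
xtx_inv = np.linalg.inv(X.T @ X)
hat_diag = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

p = X.shape[1]
threshold = 2 * p / n      # rule-of-thumb cutoff, not a strict test
high_leverage = np.flatnonzero(hat_diag > threshold)

print("sum of leverages (equals number of parameters):", hat_diag.sum())
print("observations above 2p/n:", high_leverage)
```

An observation is influential when it combines high leverage with a large residual, which is what the leverage plot shows by placing the two quantities on its axes.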
