This study uses the Gapminder dataset and finds an association between alcohol consumption and internet use. After controlling for urbanization rate, a one-point increase in the internet use rate is associated with about 0.115 litres more alcohol consumption per capita (beta = 0.1149, p < 0.001). A second-order regression, alcconsumption ~ urbanrate_c + I(urbanrate_c**2) + internetuserate_c, shows that 34% of the variability can be explained by the model: R^2 = 0.343, F-statistic = 21.59, Prob (F-statistic) = 2.56e-11.
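As a quick consistency check, the adjusted R^2 and the F-statistic can be recomputed from R^2, the number of observations and the number of model terms. The sketch below only uses the rounded values printed in the regression output, so small rounding differences are expected.

```python
# Values copied from the regression output in this post (rounded as printed).
r_squared = 0.343
n_obs = 128   # "No. Observations"
k = 3         # "Df Model": urbanrate_c, its square, internetuserate_c

# Residual degrees of freedom: observations minus terms minus the intercept.
df_resid = n_obs - k - 1

# Adjusted R^2 penalizes R^2 for the number of explanatory terms.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# Overall F-statistic: explained variance per term over residual variance per df.
f_statistic = (r_squared / k) / ((1 - r_squared) / df_resid)

print(f"Df Residuals  = {df_resid}")
print(f"adjusted R^2  = {adj_r_squared:.3f}")
print(f"F-statistic   = {f_statistic:.2f}")
```

Both numbers should land close to the reported 0.327 and 21.59; the small gap in F comes from using R^2 rounded to three decimals.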
Intercept: at average internet use and average urbanization rate (both variables are centered), estimated alcohol consumption is 8.63 litres (adult (15+) per capita consumption in litres of pure alcohol).
urbanrate_c: the centered urban rate has a small negative effect of about -0.06 litres per one-point increase, but including it increases the explanatory power by about 10% (compare the R^2 of the models with and without control for urbanization).
I(urbanrate_c ** 2): the quadratic term is significant (p < 0.001) but has little effect (coefficient -0.0031).
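To make the coefficient interpretation concrete, here is a small sketch that plugs the reported coefficients into the fitted equation. The function name `predict_alcohol` and the example inputs are only illustrative; the inputs are centered values, so 0 means "at the sample mean".

```python
# Coefficients copied from the regression output (litres of pure alcohol).
INTERCEPT = 8.6291
B_URBAN = -0.0596
B_URBAN_SQ = -0.0031
B_INTERNET = 0.1149


def predict_alcohol(urbanrate_c: float, internetuserate_c: float) -> float:
    """Predicted consumption for centered predictors (0 = sample mean)."""
    return (
        INTERCEPT
        + B_URBAN * urbanrate_c
        + B_URBAN_SQ * urbanrate_c ** 2
        + B_INTERNET * internetuserate_c
    )


# At the mean of both predictors the prediction equals the intercept.
at_mean = predict_alcohol(0.0, 0.0)

# Ten points above the mean internet use rate, average urban rate.
plus_10_internet = predict_alcohol(0.0, 10.0)

# Ten points above the mean urban rate, average internet use rate.
plus_10_urban = predict_alcohol(10.0, 0.0)

print(round(at_mean, 4), round(plus_10_internet, 4), round(plus_10_urban, 4))
```

Ten extra points of internet use add about 1.15 litres, while ten extra points of urban rate remove about 0.91 litres (0.60 from the linear term and 0.31 from the quadratic term).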
                            OLS Regression Results
==============================================================================
Dep. Variable:         alcconsumption   R-squared:                       0.343
Model:                            OLS   Adj. R-squared:                  0.327
Method:                 Least Squares   F-statistic:                     21.59
Date:                Sun, 07 Feb 2016   Prob (F-statistic):           2.56e-11
Time:                        12:40:28   Log-Likelihood:                -365.20
No. Observations:                 128   AIC:                             738.4
Df Residuals:                     124   BIC:                             749.8
Df Model:                           3
Covariance Type:            nonrobust
=======================================================================================
                         coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------
Intercept              8.6291      0.501     17.230      0.000         7.638     9.620
urbanrate_c           -0.0596      0.024     -2.466      0.015        -0.107    -0.012
I(urbanrate_c ** 2)   -0.0031      0.001     -3.980      0.000        -0.005    -0.002
internetuserate_c      0.1149      0.018      6.558      0.000         0.080     0.150
==============================================================================
Omnibus:                        5.219   Durbin-Watson:                   1.972
Prob(Omnibus):                  0.074   Jarque-Bera (JB):                5.213
Skew:                           0.316   Prob(JB):                       0.0738
Kurtosis:                       3.760   Cond. No.                         891.
==============================================================================
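A table like the one above can be produced with the statsmodels formula interface. The following is a minimal sketch, not the exact script used for this post: the data frame is filled with synthetic stand-in values (not the Gapminder data), so the printed numbers will differ from the table. Only the column names and the formula are taken from the post.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (NOT the Gapminder values); replace with the real
# data set. Column names follow the variables used in this post.
rng = np.random.default_rng(0)
n = 128
urbanrate = rng.uniform(10, 100, n)
internetuserate = np.clip(0.6 * urbanrate + rng.normal(0, 15, n), 0, 100)
alcconsumption = 3 + 0.1 * internetuserate + rng.normal(0, 4, n)
data = pd.DataFrame(
    {
        "urbanrate": urbanrate,
        "internetuserate": internetuserate,
        "alcconsumption": alcconsumption,
    }
)

# Center the explanatory variables so the intercept is the prediction
# at average urban rate and average internet use rate.
data["urbanrate_c"] = data["urbanrate"] - data["urbanrate"].mean()
data["internetuserate_c"] = data["internetuserate"] - data["internetuserate"].mean()

# Second-order model: linear and quadratic urban rate plus internet use rate.
formula = "alcconsumption ~ urbanrate_c + I(urbanrate_c**2) + internetuserate_c"
model = smf.ols(formula, data=data).fit()

print(model.summary())
```

Centering before squaring also keeps the linear and quadratic urban rate terms less correlated with each other than they would be on the raw scale.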
There is no evidence that urbanization rate confounds the association: the internet use rate did not lose statistical significance (p < 0.001) after urbanization rate was added as an explanatory variable.
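The confounding check described above amounts to fitting the model with and without the control variable and comparing the coefficient and p-value of the internet use rate. A minimal sketch follows, again on synthetic stand-in data (not the Gapminder values), so the printed numbers are only illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (NOT the Gapminder values).
rng = np.random.default_rng(1)
n = 128
urbanrate = rng.uniform(10, 100, n)
internetuserate = np.clip(0.6 * urbanrate + rng.normal(0, 15, n), 0, 100)
alcconsumption = 3 + 0.1 * internetuserate + rng.normal(0, 4, n)
data = pd.DataFrame(
    {
        "urbanrate_c": urbanrate - urbanrate.mean(),
        "internetuserate_c": internetuserate - internetuserate.mean(),
        "alcconsumption": alcconsumption,
    }
)

# Model without the potential confounder.
simple = smf.ols("alcconsumption ~ internetuserate_c", data=data).fit()

# Model with urban rate added as a control variable.
adjusted = smf.ols(
    "alcconsumption ~ internetuserate_c + urbanrate_c", data=data
).fit()

# If the coefficient stays significant and keeps its sign after adding the
# control, there is no evidence that the control confounds the association.
for name, fit in [("without urban rate", simple), ("with urban rate", adjusted)]:
    print(
        f"{name}: beta={fit.params['internetuserate_c']:.4f}, "
        f"p={fit.pvalues['internetuserate_c']:.4g}, R^2={fit.rsquared:.3f}"
    )
```

Comparing the two R^2 values in the same loop shows how much explanatory power the control variable adds.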
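Q-Q plot: the residuals are approximately normally distributed, with two outliers in the upper quantiles. This suggests that there is little systematic error in the fitted model. With R^2 = 0.343, however, most of the variability remains unexplained, so normal residuals alone do not show that the model cannot be improved by adding other variables.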
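The visual impression from the Q-Q plot can be cross-checked against the Jarque-Bera test in the regression output. The sketch below recomputes the statistic from the reported skew and kurtosis; because those inputs are rounded, the result only approximately matches the printed 5.213 and 0.0738.

```python
import math

# Values copied from the regression output (rounded as printed).
n_obs = 128
skew = 0.316
kurtosis = 3.760  # a normal distribution has kurtosis 3

# Jarque-Bera statistic: combines squared skewness and squared excess kurtosis.
jb = n_obs / 6 * (skew ** 2 + (kurtosis - 3) ** 2 / 4)

# Under normality JB is asymptotically chi-square with 2 degrees of freedom,
# whose upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-jb / 2)

print(f"JB = {jb:.3f}, p = {p_value:.4f}")
```

A p-value above 0.05 means normality of the residuals is not rejected at the 5% level, which agrees with the Q-Q plot reading.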
Standardized residuals for all observations: most of the residuals lie within two standard deviations of zero. There are few major outliers, so the model captures the variability in the dataset reasonably well.
The regression plots also show that the residuals are approximately normal and that the model explains the response variable reasonably well. "Urbanrate" has normally distributed residuals and a well-fitting regression line with a small negative effect on the response.
Leverage plot: the observations with the most influence on the regression outcome have small residuals, which supports our claim that the model fits the data well and has predictive power.
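The numbers behind the standardized residual and leverage plots can also be inspected directly. This is a minimal sketch on synthetic stand-in data (not the Gapminder values); the cutoff of 2p/n for "high leverage" is a common rule of thumb and an assumption here, not something taken from the original analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (NOT the Gapminder values).
rng = np.random.default_rng(2)
n = 128
urbanrate = rng.uniform(10, 100, n)
internetuserate = np.clip(0.6 * urbanrate + rng.normal(0, 15, n), 0, 100)
alcconsumption = 3 + 0.1 * internetuserate + rng.normal(0, 4, n)
data = pd.DataFrame(
    {
        "urbanrate_c": urbanrate - urbanrate.mean(),
        "internetuserate_c": internetuserate - internetuserate.mean(),
        "alcconsumption": alcconsumption,
    }
)

formula = "alcconsumption ~ urbanrate_c + I(urbanrate_c**2) + internetuserate_c"
model = smf.ols(formula, data=data).fit()

# Influence measures for every observation.
influence = model.get_influence()
std_resid = influence.resid_studentized_internal  # standardized residuals
leverage = influence.hat_matrix_diag              # diagonal of the hat matrix

# Share of observations whose standardized residual lies within +/- 2.
share_within_2sd = np.mean(np.abs(std_resid) <= 2)

# Rule-of-thumb cutoff (assumption): leverage above 2 * p / n is "high".
p = len(model.params)
high_leverage = leverage > 2 * p / n

# Observations that are both high-leverage and poorly fitted deserve a look.
flagged = data[high_leverage & (np.abs(std_resid) > 2)]

print(f"within 2 SD: {share_within_2sd:.1%}")
print(f"high leverage: {int(high_leverage.sum())}, flagged: {len(flagged)}")
```

An empty `flagged` frame corresponds to the situation described above, where the influential observations have small residuals. The plot itself can be drawn with `sm.graphics.influence_plot(model)` after `import statsmodels.api as sm`, provided matplotlib is installed.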