Wednesday, March 30, 2011

Fully Automated Prediction with Random Forests

Put the input vector down each of the trees in the forest. Each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the classification having the most votes (over all the trees in the forest).
Basically, Random Forests automatically grow many decision trees, each with fairly weak predictive power on its own, and gain high predictive power by combining them. The algo can be sketched like this:
  1. draw a bootstrap sample of the observations and grow a tree on it:
    1. randomly sample a few candidate variables
    2. pick the variable (and cut point) that splits the data best
    3. make a binary decision out of it
    4. repeat down the tree
  2. repeat step 1 for many trees
  3. combine all predictions (majority vote for classification, average for regression), voila!

Result: high predictive power with hardly any parameter tuning!
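
To make the voting idea concrete, here is a minimal sketch in base R. It is not Breiman's implementation: one-split "stumps" stand in for full trees, and the names majority, grow_stump and predict_stump are made up for this illustration. It uses the built-in iris data and reports accuracy on the training data only.

```r
set.seed(1)

# Most frequent class in v (v should be a factor so empty groups still work)
majority <- function(v) names(which.max(table(v)))

# Grow one stump: bootstrap the rows, pick one random variable, find the best cut
grow_stump <- function(x, y) {
  rows <- sample(nrow(x), replace = TRUE)   # bootstrap sample of observations
  var  <- sample(ncol(x), 1)                # one randomly chosen variable
  xs <- x[rows, var]
  ys <- y[rows]
  cuts <- sort(unique(xs))
  # misclassifications for every candidate cut point
  errors <- sapply(cuts, function(ct) {
    left  <- ys[xs <= ct]
    right <- ys[xs >  ct]
    sum(left != majority(left)) + sum(right != majority(right))
  })
  best <- cuts[which.min(errors)]
  list(var = var, cut = best,
       left  = majority(ys[xs <= best]),
       right = majority(ys[xs >  best]))
}

# A stump votes for its left or right class, depending on the cut
predict_stump <- function(stump, x) {
  ifelse(x[, stump$var] <= stump$cut, stump$left, stump$right)
}

x <- iris[, 1:4]
y <- iris$Species

forest <- replicate(200, grow_stump(x, y), simplify = FALSE)
votes  <- sapply(forest, predict_stump, x = x)  # one column of votes per stump
pred   <- apply(votes, 1, majority)             # class with the most votes wins

accuracy <- mean(pred == y)                     # training accuracy, optimistic
print(accuracy)
```

For real work the randomForest package on CRAN grows full trees and reports an out-of-bag error estimate; the sketch above only shows why many weak voters can add up to a decent classifier.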

http://stat-www.berkeley.edu/users/breiman/RandomForests/

Tuesday, March 15, 2011

Switching off the Nuclear Power Plants in Germany now?

Today Chancellor Merkel announced that 7 nuclear power plants will be switched off as a consequence of the Fukushima catastrophe.
The decision means cutting roughly 10% to 15% of Germany's power consumption. The lower figure uses the median net output of the weakest nuclear power plants (890 MW), the upper one the median of all the plants (1288 MW).

Update: they named the plants in question


de.wikipedia.org/wiki/Liste_der_Kernreaktoren_in_Deutschland

Plant             Code    State  Operator    Gross MW  Net MW
Unterweser        KKU     NI     E.ON        1410      1345
Biblis B          KWB B   HE     RWE         1300      1240
Biblis A          KWB A   HE     RWE         1225      1167
Philippsburg 1    KKP 1   BW     EnBW         926       890
Isar/Ohu 1        KKI 1   BY     E.ON         912       878
Neckarwestheim 1  GKN 1   BW     EnBW         840       785
Brunsbüttel       KKB     SH     Vattenfall   806       771

Together these have a net power of 771+785+878+890+1167+1240+1345 = 7076 MW, or 11.3% of annual consumption.
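
The arithmetic can be checked in a few lines of R. The plant names in the vector are just labels for this sketch, and implied_base_mw is only a back-calculation from the 11.3% above, not a figure taken from any source.

```r
# Net output in MW of the seven plants named above
net_mw <- c(Brunsbuettel = 771, Neckarwestheim1 = 785, Isar1 = 878,
            Philippsburg1 = 890, BiblisA = 1167, BiblisB = 1240,
            Unterweser = 1345)

total_mw  <- sum(net_mw)      # 7076
median_mw <- median(net_mw)   # 890, the median used for the lower estimate

# The two rough estimates made before the plants were named
lower_mw <- 7 * 890           # 6230
upper_mw <- 7 * 1288          # 9016

# Back-calculated base that makes 7076 MW equal to 11.3% (an assumption)
implied_base_mw <- total_mw / 0.113

print(c(total = total_mw, median = median_mw,
        lower = lower_mw, upper = upper_mw))
print(round(100 * c(lower_mw, upper_mw) / implied_base_mw, 1))
```

Against that implied base the two rough estimates land at roughly 10% and 14%, which brackets the 11.3% of the named plants.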

Update: the latest data on the power trade balance show a surplus of 22 TWh

Monday, March 14, 2011

Hot Bozen



Since I study time series analysis, here is a little exercise with data from the Hydrographisches Amt South Tyrol:
It's getting hotter:
decomposing the mean temperature in Bozen 1981-2010,
the observed data is split into a seasonal component, a trend (level) and a random remainder

The code for R:
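# read the mean temperatures (single column, no header)
bz <- read.csv("data/bozen.csv.txt", header = FALSE)
# monthly time series starting in January 1981
bz <- ts(bz[, 1], start = c(1981, 1), frequency = 12)
# split into trend, seasonal and random components and plot them
plot(decompose(bz))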
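
Since the Bozen file is not included here, the following sketch runs the same decomposition on made-up data, to show how a warming figure could be read off the trend component. The series temp is purely synthetic (a hypothetical 12°C level, a yearly cycle, 2°C of linear warming over 30 years, plus noise), so the numbers say nothing about Bozen.

```r
set.seed(1)

n      <- 30 * 12                      # 30 years of monthly values
months <- seq_len(n)

# Synthetic stand-in for the real data (all numbers are assumptions)
temp <- 12 +                           # base level
  2 * (months - 1) / (n - 1) -         # +2 degrees spread over the whole period
  10 * cos(2 * pi * months / 12) +     # yearly cycle, coldest in winter
  rnorm(n, sd = 0.5)                   # random noise

x     <- ts(temp, start = c(1981, 1), frequency = 12)
parts <- decompose(x)                  # additive: observed = trend + seasonal + random

# The trend is a centred moving average, so it is NA at both ends
trend <- as.numeric(parts$trend)
trend <- trend[!is.na(trend)]

# Warming estimate: last year of the trend minus the first year
rise <- mean(tail(trend, 12)) - mean(head(trend, 12))
print(rise)
```

The estimate comes out a little below the 2 degrees built into the series, because the moving average drops the first and last six months.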





+2°C is the increase in mean temperature since 1971



Wednesday, March 9, 2011

R-Quote of the Day

"It is not meant to be the big data distributed crunching engine; it is more like the intellectual's answer to Excel."  - such a post, and the day begins with a smile.