Testing adaptive classifiers on autocorrelated data, such as the Electricity dataset (Harries, 1999), is problematic. When consecutive labels are temporally dependent, even random change alarms may boost the accuracy figures, so high accuracy alone does not tell us whether the adaptation is working well.
I wrote a note on the subject:
I. Žliobaitė (2013). How good is the Electricity benchmark for evaluating concept drift adaptation. arXiv:1301.3524
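
To make the concern concrete, here is a minimal sketch on synthetic data (not the Electricity dataset itself). It assumes a toy two-state Markov chain as the label stream; the function names and the `stay_prob` parameter are hypothetical and chosen only for illustration. It compares a "no-change" baseline, which predicts that the next label equals the previous one, against a majority-class baseline.

```python
import random


def make_autocorrelated_labels(n, stay_prob, seed=0):
    """Toy binary label stream: each label repeats the previous one
    with probability stay_prob (an assumed two-state Markov chain)."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1)]
    for _ in range(n - 1):
        prev = labels[-1]
        labels.append(prev if rng.random() < stay_prob else 1 - prev)
    return labels


def no_change_accuracy(labels):
    """Accuracy of predicting that each label equals the previous label."""
    hits = sum(1 for prev, cur in zip(labels, labels[1:]) if prev == cur)
    return hits / (len(labels) - 1)


def majority_accuracy(labels):
    """Accuracy of always predicting the most frequent label.
    The majority is computed on the evaluated labels themselves,
    so this is an optimistic version of the baseline."""
    evaluated = labels[1:]
    ones = sum(evaluated)
    return max(ones, len(evaluated) - ones) / len(evaluated)


labels = make_autocorrelated_labels(n=10_000, stay_prob=0.9, seed=0)
acc_no_change = no_change_accuracy(labels)
acc_majority = majority_accuracy(labels)

print(f"no-change baseline: {acc_no_change:.3f}")
print(f"majority baseline:  {acc_majority:.3f}")
```

On such a stream the no-change baseline uses no input features at all, yet it should score far above the majority baseline. Reporting it next to an adaptive classifier is one way to check whether the reported accuracy reflects learning and adaptation or merely the temporal dependence of the labels.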