Seriation Article Leads Off Volume 3
Joseph S. Verducci, Editor, Statistical Analysis and Data Mining
Table of Contents
- Seriation and Matrix Reordering Methods: A Historical Overview
Innar Liiv
- Bayesian Adaptive Nearest Neighbor
Ruixin Guo and Sounak Chakraborty
- Mining and Tracking Evolving Web User Trends from Large Web Server Logs
Basheer Hawwash and Olfa Nasraoui
- Modeling User Reputation in Wikis
Sara Javanmardi, Cristina Videira Lopes, and Pierre Baldi
The four papers in Volume 3, Number 2 of the journal Statistical Analysis and Data Mining span a wide range of topics, from a very general method for discovering patterns in data to very specific models for the reputation of those who update wikis.
In the first paper, Innar Liiv reviews how seriation, the reordering of observations, has revealed the hidden structure of data in many disciplines. Across these many applications, seriation is typically achieved by permuting the rows and/or columns of a matrix to optimize an objective function that captures the structure of interest.
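To make the idea concrete, here is a minimal Python sketch of row seriation under one assumed objective: the total Euclidean distance between adjacent rows. This objective is an illustrative assumption, not one singled out by the review, and the helper names `path_length` and `seriate_rows` are hypothetical. The exhaustive search is only practical for a handful of rows, since n rows have n! possible orderings.

```python
import itertools
import math


def path_length(rows, order):
    # Assumed objective: total Euclidean distance between consecutive rows
    # when the rows are laid out in the given order.
    return sum(math.dist(rows[a], rows[b]) for a, b in zip(order, order[1:]))


def seriate_rows(rows):
    # Hypothetical helper: try every row permutation and keep the one with
    # the smallest path length (exhaustive, so feasible only for small n).
    best = min(itertools.permutations(range(len(rows))),
               key=lambda order: path_length(rows, order))
    return list(best)


# Toy data: rows 0 and 2 are close together, as are rows 1 and 3,
# but the original order interleaves them.
data = [
    [0.0, 1.0],
    [5.0, 5.0],
    [0.1, 1.1],
    [4.9, 5.2],
]

order = seriate_rows(data)
reordered = [data[i] for i in order]
print(order)
print(reordered)
```

In the reordered matrix each pair of similar rows sits side by side, so two blocks that were hidden by the interleaved original order become visible. Practical seriation methods replace the brute-force search with heuristics that scale to large matrices.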
In the second paper, Ruixin Guo and Sounak Chakraborty present Bayesian adaptive nearest neighbor (BANN), a general method for classification in high dimensions. BANN uses a Bayesian framework to combine ideas for adapting both the shape of neighborhoods (discriminant adaptive nearest neighbor [DANN]) and their size (probabilistic nearest neighbor [PNN]), based on extended local patterns. On nine benchmark data sets, BANN outperforms both DANN and PNN.
In the third paper, Basheer Hawwash and Olfa Nasraoui demonstrate an efficient method for mining evolving Web user profiles that is particularly sensitive to changes in profile patterns. They apply their method to track how the profiles of visitors to a library website change over time.
Finally, Sara Javanmardi, Cristina Videira Lopes, and Pierre Baldi propose three nested models for dynamically estimating the reputation of contributors to a wiki, where “reputation” ranges from 0 (vandals) to 1 (administrators). The first model simply updates the fraction of “good” contributions, the second adjusts each contribution by how long it has endured, and the third also takes into account the reputation of the deleter.
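The first two models can be illustrated with a minimal Python sketch. It assumes each contribution has already been judged “good” or not (for the first model), or has been assigned a survival fraction in [0, 1] standing in for how long it endured (a hypothetical stand-in for the second model's time adjustment). These inputs, the class `FractionReputation`, and the function `survival_weighted_reputation` are illustrative assumptions, not the authors' formulas, and the third model, which depends on the deleter's reputation, is not sketched.

```python
from dataclasses import dataclass


@dataclass
class FractionReputation:
    """Hypothetical sketch of the 'fraction of good contributions' idea.

    The score stays in [0, 1]: 0 if every contribution was judged bad,
    1 if every contribution was judged good.
    """
    good: int = 0
    total: int = 0

    def update(self, is_good: bool) -> float:
        # Record one more contribution and return the updated reputation.
        self.total += 1
        if is_good:
            self.good += 1
        return self.score()

    def score(self) -> float:
        # Assumption: a contributor with no history starts at 0.
        return self.good / self.total if self.total else 0.0


def survival_weighted_reputation(survival_fractions):
    """Hypothetical variant: each contribution counts in proportion to how long it endured.

    survival_fractions: values in [0, 1], e.g. 1.0 for text never removed,
    0.0 for text removed immediately.
    """
    if not survival_fractions:
        return 0.0  # same assumed starting point as above
    return sum(survival_fractions) / len(survival_fractions)


rep = FractionReputation()
for outcome in [True, True, False, True]:
    rep.update(outcome)

print(rep.score())                                          # 0.75
print(survival_weighted_reputation([1.0, 0.5, 0.0, 1.0]))  # 0.625
```

Keeping running counts lets the first score be updated in constant time as each new edit arrives, which matches the dynamic, per-contribution estimation the paper describes.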
All in all, the papers span the range from thought-provoking to immediately useful.