IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Volume 39, Issue 4, pp 898-1001, March 2009
Abstract
Support vector machines can be trained to be very accurate classifiers and have been used in many
applications. However, the training time and, to a lesser extent, the prediction time of support vector machines on very
large data sets can be very long. This paper presents a fast compression method to scale up support vector
machines to large data sets. A simple bit reduction method is applied to reduce the cardinality of the data
by weighting representative examples. We then develop support vector machines trained on the weighted data.
Experiments indicate that the bit reduction support vector machine produces a significant reduction in the
time required for both training and prediction with minimal loss in accuracy. It is also shown to be
typically more accurate than random sampling when the data are not over-compressed.
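The compression step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption: the function name `bit_reduce`, the `scale` and `bits` parameters, the conversion of attributes to integers, the right shift used to drop low-order bits, and the choice of the per-class bin mean as the representative example.

```python
from collections import defaultdict


def bit_reduce(examples, labels, scale=1000, bits=4):
    """Compress (examples, labels) into weighted representatives.

    Assumptions (not stated in the abstract): attributes are floats that
    are scaled to integers with `scale`, the low `bits` bits are dropped
    with a right shift, and examples of the same class that land in the
    same bin are replaced by their mean, weighted by the bin count.
    """
    bins = defaultdict(list)
    for x, y in zip(examples, labels):
        # Integer bin key: scale, truncate, then drop the low-order bits.
        key = tuple(int(v * scale) >> bits for v in x)
        # Bins are kept separate per class so labels are never mixed.
        bins[(key, y)].append(x)

    reps, rep_labels, weights = [], [], []
    for (_, y), members in bins.items():
        n = len(members)
        # Representative example: per-attribute mean of the bin members.
        reps.append([sum(col) / n for col in zip(*members)])
        rep_labels.append(y)
        # Weight: number of original examples the representative stands for.
        weights.append(n)
    return reps, rep_labels, weights


examples = [[0.10, 0.20], [0.101, 0.201], [0.90, 0.90], [0.10, 0.20]]
labels = [0, 0, 1, 1]
reps, rep_labels, weights = bit_reduce(examples, labels)
```

Larger values of `bits` merge more examples into each bin, which gives more compression and a greater risk of over-compressing the data. The returned weights would then be handed to a support vector machine trainer that accepts per-example weights; for instance, scikit-learn's `SVC.fit` takes a `sample_weight` argument, although that is a substitute for illustration and not the weighted support vector machine developed in the paper.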
Data Sets
Experiments were run on several data sets: banana [ratsch01], phoneme [elena], shuttle [statlog],
page [merz], pendigit [merz], letter [merz], SIPPER II plankton images [luoicpr],
waveform [merz], and satimage [merz]. The data sets come from several sources and range in size from 5,000
to 58,000 examples and from 2 to 36 attributes. We also ran experiments on the Adult, Forest, and Web
data sets to compare with previous work, as detailed in the comparison subsection.
[statlog] D. Michie, D. J. Spiegelhalter, and C. C. Taylor, Machine Learning, Neural and Statistical Classification, ftp://ftp.ncc.up.pt/pub/statlog/, 1994
[merz] C. J. Merz and P. M. Murphy, UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1999
[luoicpr] T. Luo, K. Kramer, D. Goldgof, L. O. Hall, S. Samson, A. Remsen, and T. Hopkins, Active learning to recognize multiple types of plankton, 17th conference of the International Association for Pattern Recognition, vol. 3, pages 478-481, 2004