Marginally Interesting: Some Benchmark Data Sets

Thursday, November 12, 2009

Some Benchmark Data Sets

Sören Sonnenburg recently brought a few possibly lesser-known benchmark data sets to my attention. Of course, benchmark data sets are always a double-edged sword: on the one hand, they are a great way to test and compare your learning algorithms, but on the other hand, you’re usually not solving any real problems anymore.

You probably already know the UCI repository and the DELVE repository. Here are a few links to benchmark data sets that are likely less well known:

Generic Benchmarking

IDA dataset repository (a.k.a. “the Gunnar Benchmark Data Set”)
libsvm datasets
Datasets from cervisia.org
KDD datasets
Pascal Large Scale Challenge

Multiple Kernel Learning

MKL interpretable datasets
Multiple Kernel Learning repository

Bioinformatics

Human Promoter recognition
Splice site prediction (several organisms)
Alternative splicing

Image Processing

Caltech 101
Caltech 256
Other Caltech image datasets
