Useful data sets

If you are doing a performance test, it is always a good idea to run it against a real dataset. The following are several useful datasets.

Often you can find data in the CSV format, and then parsing and using it is pretty easy.
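
As a rough illustration of how little code that takes, here is a minimal sketch using Python's standard csv module. The sample text, its column names (title, year) and the load_rows helper are made up for the example and do not come from any of the datasets listed here.

```python
import csv
import io


def load_rows(source):
    """Read CSV text from a file-like object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(source))


# Hypothetical sample standing in for a downloaded dataset file.
SAMPLE_CSV = "title,year\nPaper A,2001\nPaper B,2003\n"

rows = load_rows(io.StringIO(SAMPLE_CSV))

# CSV values arrive as strings, so convert numeric columns explicitly.
years = [int(row["year"]) for row in rows]
print(len(rows), min(years), max(years))
```

For a real file, pass open(path, newline="") instead of the in-memory sample. For very large files it is probably better to iterate over the reader row by row than to build the whole list in memory.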

  1. DBLP catalog – http://kdl.cs.umass.edu/data/dblp/dblp-info.html – this is data about publications. About 900MB raw size.
  2. Google Fusion Tables – http://www.google.com/fusiontables/Home – this has several useful datasets as CSV.
  3. Federal Reserve Economic Data (FRED) – http://research.stlouisfed.org/fred2/
  4. Amazon public datasets – aws.amazon.com/publicdatasets

There are a lot more; these are just some of them. If anyone knows of a list that gives the sizes and nature of such datasets, please let me know.