If you are running a performance test, it is always a good idea to use a real dataset. Following are several useful datasets.
Often you can find data in the CSV format, and then parsing and using it is pretty easy.
- DBLP catalog – http://kdl.cs.umass.edu/data/dblp/dblp-info.html – this is data about publications. About 900MB raw size.
- Google Fusion Tables – http://www.google.com/fusiontables/Home – this has several useful datasets as CSV.
- Federal Reserve Economic Data – http://research.stlouisfed.org/fred2/
- Amazon public datasets – aws.amazon.com/publicdatasets/
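Once one of these datasets is available as CSV, loading it for a test run can look roughly like the sketch below. It uses only Python's standard library. The helper names `load_rows` and `timed` and the inline `SAMPLE` data are made up for illustration and do not come from any of the datasets above; in practice you would pass an opened file instead of the in-memory sample.

```python
import csv
import io
import time


def load_rows(handle):
    """Parse CSV text from an open file-like object into a list of dicts."""
    return list(csv.DictReader(handle))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical stand-in for a downloaded file. For a real dataset, use
# open("dataset.csv", newline="") in place of io.StringIO(SAMPLE).
SAMPLE = "title,year\nPaper A,2001\nPaper B,2005\n"

rows, seconds = timed(load_rows, io.StringIO(SAMPLE))
print(f"parsed {len(rows)} rows in {seconds:.6f}s")
```

Note that `load_rows` reads everything into memory, which is fine for small files. For something the size of the DBLP data, iterating over `csv.DictReader` row by row is probably the safer choice.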
There are a lot more. Following are some of them. If anyone knows of a list that gives the sizes and nature of such datasets, please let me know.