Datasets available for Projects

This page lists the available datasets for the project.

IEEE VGTC VPG International Data-Visualization Contest: Perceived vs. Actual Student Interest

The dataset for the IEEE VGTC VPG International Data-Visualization Contest is available at http://vacommunity.org/ieeevpg/2015/. Both the CSV file and a summary of the dataset are available on the website.

Yelp Dataset Challenge

The dataset for the Yelp Dataset Challenge is available at http://www.yelp.com/dataset_challenge.

Co-authorship network of the INRIA AVIZ Team

Dataset: co-authorship network of the INRIA AVIZ team, extracted from HAL. It contains all publications of the AVIZ team in an XML file. For each publication, the list of authors is available along with other information.
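A co-authorship network can be derived from the publication list by linking every pair of authors who appear on the same publication and weighting each link by the number of shared papers. The sketch below assumes a simplified, hypothetical XML layout (`<publication>` elements containing `<author>` elements). The actual HAL export uses its own schema, so the tag names would need adjusting.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical XML layout; the real HAL export uses a different schema,
# so the tag names passed to coauthor_edges() must be adapted.
SAMPLE = """
<publications>
  <publication><title>A</title><author>Alice</author><author>Bob</author></publication>
  <publication><title>B</title><author>Alice</author><author>Bob</author><author>Carol</author></publication>
</publications>
"""

def coauthor_edges(xml_text, pub_tag="publication", author_tag="author"):
    """Map each unordered author pair to the number of publications they share."""
    root = ET.fromstring(xml_text.strip())
    edges = Counter()
    for pub in root.iter(pub_tag):
        # A sorted set makes (A, B) and (B, A) the same edge and ignores duplicate names
        authors = sorted({a.text.strip() for a in pub.iter(author_tag) if a.text})
        for pair in combinations(authors, 2):
            edges[pair] += 1
    return edges

edges = coauthor_edges(SAMPLE)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

The resulting weighted edge list can be fed directly to a node-link or matrix visualization.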

Infovis conference citations

Dataset: Infovis conference citations, from http://www.cc.gatech.edu/gvu/ii/citevis/. For each article published at the Infovis conference, this dataset describes which articles within the Infovis conference cited it, and when. Additional information, such as keywords and authors, is also available.
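A natural first step with this dataset is to count how many times each article was cited in each year. The sketch below assumes the citation links have already been read into hypothetical `(cited, citing, year)` tuples. The article identifiers and years are placeholders, not values from the dataset.

```python
from collections import Counter, defaultdict

# Placeholder citation links: (cited_article, citing_article, year_of_citation).
citations = [
    ("P1", "P2", 2005),
    ("P1", "P3", 2006),
    ("P2", "P3", 2006),
]

def citations_per_year(records):
    """Map each cited article to a Counter of how often it was cited in each year."""
    per_article = defaultdict(Counter)
    for cited, _citing, year in records:
        per_article[cited][year] += 1
    return per_article

counts = citations_per_year(citations)
for article, by_year in sorted(counts.items()):
    print(article, dict(sorted(by_year.items())))
```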

VIS 2014: where to eat, drink and sleep?

Dataset: restaurants, bars, hotels and food near the Marriott Rive Gauche. In November 2014, 1,000 or more visualization researchers will come to the Marriott Rive Gauche for the biggest international visualization conference. They will need to find places to eat, drink and perhaps sleep. Here are four datasets generated from Yelp through searches for Hotel, Food, Bar, and Restaurant. Build a visualization that will help them choose, in particular, places to eat. Feel free to draw on any other data you may find online.

The categories mentioned in the dataset are listed on this website: http://www.yelp.com/developers/documentation/category_list

Information about how to use JSON data in Processing can be found here: http://processing.org/reference/loadJSONObject_.html
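The Processing function linked above is one option. For those working in Python instead, the standard `json` module plays the same role. The sketch below assumes a hypothetical layout: a top-level `businesses` array whose entries have `name`, `rating` and a list of category aliases. The real field names should be checked in the provided files.

```python
import json

# Hypothetical layout of one Yelp search result file; check the real field names.
SAMPLE = """{"businesses": [
  {"name": "Cafe A", "rating": 4.5, "categories": ["cafes", "french"]},
  {"name": "Bar B", "rating": 3.5, "categories": ["bars"]},
  {"name": "Bistro C", "rating": 4.0, "categories": ["french"]}
]}"""

def top_rated(data, category, n=5):
    """Return the names of the n best-rated businesses tagged with the given category."""
    matches = [b for b in data["businesses"] if category in b.get("categories", [])]
    matches.sort(key=lambda b: b.get("rating", 0), reverse=True)
    return [b["name"] for b in matches[:n]]

# For a file on disk: with open(path, encoding="utf-8") as f: data = json.load(f)
data = json.loads(SAMPLE)
print(top_rated(data, "french"))
```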

OECD Better Life Index

Dataset: Better Life Index, from http://www.oecdbetterlifeindex.org/. In this dataset, countries are given a score on several attributes, such as job security, years in education, and homicide rate.
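The per-attribute scores can be combined into a single composite score with user-chosen weights. The sketch below normalizes each indicator with min-max scaling and inverts indicators where a lower value is better (such as the homicide rate). It then takes a weighted average. This is one common approach and an assumption here, not necessarily the OECD's exact method. The indicator keys and values are hypothetical toy numbers, not real OECD figures.

```python
# Toy values for three hypothetical countries; not real OECD figures.
data = {
    "A": {"job_security": 5.0, "education_years": 18.0, "homicide_rate": 1.0},
    "B": {"job_security": 3.0, "education_years": 16.0, "homicide_rate": 3.0},
    "C": {"job_security": 4.0, "education_years": 17.0, "homicide_rate": 2.0},
}
# Indicators where a lower value is better are inverted after normalization.
LOWER_IS_BETTER = {"homicide_rate"}

def composite_scores(data, weights):
    """Weighted average of min-max normalized indicators, in [0, 1] per country."""
    scores = {country: 0.0 for country in data}
    total_weight = sum(weights.values())
    for indicator, weight in weights.items():
        values = [row[indicator] for row in data.values()]
        lo, hi = min(values), max(values)
        for country, row in data.items():
            # Constant indicators carry no information, so map them to the midpoint
            norm = (row[indicator] - lo) / (hi - lo) if hi > lo else 0.5
            if indicator in LOWER_IS_BETTER:
                norm = 1.0 - norm
            scores[country] += weight * norm
    return {country: s / total_weight for country, s in scores.items()}

scores = composite_scores(data, {"job_security": 1, "education_years": 1, "homicide_rate": 2})
print(sorted(scores, key=scores.get, reverse=True))
```

Letting users change the weights interactively, and watching the ranking update, is one way to turn this into a visualization.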

Books

This dataset contains a subset of books collected from http://www.freebase.com. The data comes as a tab-separated file with a set of metadata about each book, including:

  • awards won
  • authors
  • date first published
  • language
  • subjects
  • cover image
  • binding
  • cover price
  • number of pages

We provide a subset of the data here to get you started: download the tab-separated file and a collection of cover photos.

Feel free to download more data or other metadata from freebase. See datahub.io/dataset/freebase for information on how to get more data.
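Python's `csv` module reads tab-separated files when given `delimiter="\t"`. The sketch below counts books per language and averages the page counts, skipping missing values. The column names (`name`, `author`, `language`, `number_of_pages`) are hypothetical and should be replaced by the headers of the provided file.

```python
import csv
import io
from collections import Counter

# Hypothetical header and rows; the real file's column names may differ.
SAMPLE = (
    "name\tauthor\tlanguage\tnumber_of_pages\n"
    "Book A\tAuthor 1\tEnglish\t320\n"
    "Book B\tAuthor 2\tFrench\t\n"
    "Book C\tAuthor 1\tEnglish\t180\n"
)

def summarize(lines):
    """Return (books per language, mean page count or None)."""
    # QUOTE_NONE keeps raw quote characters in titles from being treated as delimiters
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    per_language = Counter()
    pages = []
    for row in reader:
        per_language[row["language"]] += 1
        # Page counts may be missing, so skip empty cells instead of failing
        if row["number_of_pages"].strip():
            pages.append(int(row["number_of_pages"]))
    mean_pages = sum(pages) / len(pages) if pages else None
    return per_language, mean_pages

# For the real file: open(path, newline="", encoding="utf-8") instead of StringIO
langs, mean_pages = summarize(io.StringIO(SAMPLE))
print(langs, mean_pages)
```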

Paris RATP network

Dataset, from http://data.ratp.fr/.

The dataset contains several files, with documentation of their content (in French). We provide three files: one giving the annual traffic of each metro/RER station, one describing each station with its geographic coordinates, and one describing the routes of the network (such as RER line A).
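To place station traffic on a map, the traffic file must be joined with the station-coordinates file. Station names may be written differently in the two files (for example in capitals or without accents, which is an assumption here). The sketch below therefore joins on a normalized name and reports stations that could not be matched. The rows are hypothetical placeholders, and the real files use French column names.

```python
import unicodedata

def normalize_name(name):
    """Lower-case, strip accents and treat hyphens as spaces so both files share one key."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().replace("-", " ").split())

# Placeholder rows: (station, annual_traffic) and (station, latitude, longitude).
traffic = [("GARE DU NORD", 1000), ("CHÂTELET", 2000), ("OPERA", 3000)]
positions = [("Gare du Nord", 48.88, 2.35), ("Châtelet", 48.86, 2.35)]

def join_traffic_positions(traffic, positions):
    """Attach coordinates to each traffic row; return (joined rows, unmatched names)."""
    coords = {normalize_name(name): (lat, lon) for name, lat, lon in positions}
    joined, unmatched = [], []
    for name, count in traffic:
        key = normalize_name(name)
        if key in coords:
            joined.append((name, count, *coords[key]))
        else:
            unmatched.append(name)
    return joined, unmatched

joined, unmatched = join_traffic_positions(traffic, positions)
print(joined)
print("unmatched:", unmatched)
```

Listing unmatched names, rather than silently dropping them, makes it easy to spot stations that need a manual correction.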

French Football Ligue 1 championship

Dataset: we provide the results of the French Ligue 1 football championship for four complete seasons (2009 to 2012) and one partial season (2013, in progress). Each season file contains the results of every matchday of the championship, from which it is easy to compute the number of points and the rank of each team after each matchday.
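The points and rank computation can be sketched as follows. It assumes the standard 3 points for a win and 1 for a draw, and a hypothetical row layout `(matchday, home, away, home_goals, away_goals)`. Teams are ranked by points, then goal difference, then goals scored. This is a simplification of the official tie-break rules. The results are toy values, not real scores.

```python
# Toy results, not real scores: (matchday, home, away, home_goals, away_goals).
results = [
    (1, "Lyon", "Nice", 2, 0),
    (1, "Lille", "Paris", 1, 1),
    (2, "Nice", "Lille", 0, 1),
    (2, "Paris", "Lyon", 3, 1),
]

def standings_by_matchday(results):
    """Return {matchday: [(team, points), ...]} ordered by rank after that matchday."""
    teams = {r[1] for r in results} | {r[2] for r in results}
    points = dict.fromkeys(teams, 0)
    goal_diff = dict.fromkeys(teams, 0)
    goals_for = dict.fromkeys(teams, 0)
    tables = {}
    for day in sorted({r[0] for r in results}):
        for _, home, away, hg, ag in (r for r in results if r[0] == day):
            goals_for[home] += hg
            goals_for[away] += ag
            goal_diff[home] += hg - ag
            goal_diff[away] += ag - hg
            if hg > ag:
                points[home] += 3
            elif hg < ag:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
        # Points, then goal difference, then goals scored; the name breaks any remaining tie
        ranking = sorted(teams, key=lambda t: (-points[t], -goal_diff[t], -goals_for[t], t))
        tables[day] = [(t, points[t]) for t in ranking]
    return tables

tables = standings_by_matchday(results)
for day, table in tables.items():
    print(day, table)
```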

Movies from 1990

Dataset, from http://www.imdb.com: we provide two files, computed from the IMDB database, describing all movies from 1990. The first contains all movies, each identified by a unique mid, with several attributes and a list of crew IDs. The second contains all people involved in at least one of these movies, each identified by a unique pid, with several attributes and, for each role (such as Actor, Director, Composer), a list of movie IDs.
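If, as the description suggests, the movie file lists crew IDs without their roles, then finding the directors of a movie requires inverting the role lists from the people file. The sketch below builds an index from `(mid, role)` pairs to person IDs. It assumes both files have already been loaded into dictionaries of a hypothetical shape. The IDs, names and titles are placeholders.

```python
from collections import defaultdict

# Hypothetical in-memory form of the two files; IDs and names are placeholders.
movies = {
    "m1": {"title": "Movie One", "crew": ["p1", "p2", "p3"]},
    "m2": {"title": "Movie Two", "crew": ["p1", "p3"]},
}
people = {
    "p1": {"name": "Person A", "roles": {"Actor": ["m1", "m2"]}},
    "p2": {"name": "Person B", "roles": {"Director": ["m1"]}},
    "p3": {"name": "Person C", "roles": {"Director": ["m2"], "Composer": ["m1"]}},
}

def index_roles(people):
    """Invert the people file into {(mid, role): [pid, ...]}."""
    index = defaultdict(list)
    for pid, person in people.items():
        for role, mids in person["roles"].items():
            for mid in mids:
                index[(mid, role)].append(pid)
    return index

def names_with_role(mid, role, index, people):
    """Sorted names of everyone who held the given role on the given movie."""
    return sorted(people[pid]["name"] for pid in index.get((mid, role), []))

idx = index_roles(people)
for mid, movie in movies.items():
    print(movie["title"], "directed by", names_with_role(mid, "Director", idx, people))
```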

CHI papers citations 1982-2013

Dataset, from http://www.tabard.fr/blog/2013/12/10/chi-paper-data-from-1982-to-2013/: there is one CSV file per year, in which each CHI paper is described by the following properties: "conference", "year", "doi", "title", "citationCount", "download6weeks", "download12months", "downloadAll", "keywords", "pageNumber", "authors (name, affiliation)". Citation and download counts come from the ACM DL (fetched over the past 3 days).
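Because the data is split into one CSV file per year, a first step is usually to read all years into a single list of rows. The sketch below does this with `glob`, assuming a hypothetical file-name pattern and UTF-8 encoding. It then finds the most cited paper of each year using the `year`, `title` and `citationCount` properties listed above, which are assumed to be the CSV column headers.

```python
import csv
import glob
import io

def load_rows(pattern="chi_*.csv"):
    """Read every per-year CSV file matching the (hypothetical) pattern into one list."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows

def most_cited_per_year(rows):
    """Return {year: (title, citationCount)} for the most cited paper of each year."""
    best = {}
    for row in rows:
        year = int(row["year"])
        count = int(row["citationCount"] or 0)  # treat an empty cell as zero citations
        if year not in best or count > best[year][1]:
            best[year] = (row["title"], count)
    return best

# Inline sample instead of real files; with the files on disk use load_rows().
SAMPLE = "year,title,citationCount\n2012,Paper X,40\n2012,Paper Y,55\n2013,Paper Z,7\n"
best = most_cited_per_year(csv.DictReader(io.StringIO(SAMPLE)))
print(best)
```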

Dataset Databases

We provide a set of URLs where you can find various datasets. This does not mean that you can choose one of them without asking us, nor that you are restricted to these external links. Feel free to browse the web and find other datasets. If you pick one of these datasets (or one of your own), you need to send an email to lonni.besancon@gmail.com for validation before the 10th of December, 2013.