Data Briefs Submitted to the InfoVis 2019/2020 Class


1) Carnivorous Plant Habitats Citizen Science

Information on the Data

Data Description: reports of where people found carnivorous plants growing in the wild, including location, time, picture, and social media ID

Data Collection: my personal data plus a data export from iNaturalist.org

Domain: biology, botany

Intended Audience

Carnivorous plant enthusiasts, botanists

Required Skills:

  • programming, preferably in Python plus HTML5 (etc.)
  • general InfoVis skills

Interesting Challenges and Questions About the Data

Challenges and Questions about the data:

  • goal: analysis of data bias/errors between the two datasets as well as within each dataset
  • goal: visualizations that show the data bias/errors
  • goal: a visualization that includes the different types of bias/error/uncertainty in a geographic view
  • challenge: people introduce bias both unintentionally and on purpose
  • requirement: my personal dataset may not be shared; an NDA will need to be signed
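One simple way to start on the first two goals is to compare where each dataset's observations fall on a coarse latitude/longitude grid: cells where one dataset holds a much larger share of the reports than the other point to possible sampling bias. This is only a sketch under stated assumptions: each report is reduced to a hypothetical `(lat, lon)` pair in decimal degrees, the 1-degree cell size is arbitrary, and the coordinates in the example are invented.

```python
from collections import Counter


def grid_shares(points, cell_deg=1.0):
    """Return the fraction of observations in each (lat, lon) grid cell."""
    # Floor division assigns each point to a cell; it also handles negative coordinates.
    counts = Counter((int(lat // cell_deg), int(lon // cell_deg)) for lat, lon in points)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def coverage_difference(points_a, points_b, cell_deg=1.0):
    """Per-cell difference in observation share, dataset A minus dataset B."""
    a = grid_shares(points_a, cell_deg)
    b = grid_shares(points_b, cell_deg)
    # A cell present in only one dataset gets a share of 0 in the other.
    return {cell: a.get(cell, 0.0) - b.get(cell, 0.0) for cell in a.keys() | b.keys()}


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    personal = [(45.2, 5.1), (45.7, 5.9)]
    inat = [(45.5, 5.5), (46.1, 5.2)]
    print(coverage_difference(personal, inat))
```

The per-cell differences could then be drawn as a diverging colour map in the geographic view.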

Additional Material

Additional Material: https://www.inaturalist.org/

Additional Comments: an NDA is needed for my own dataset.

2) Bitcoin news

Information on the Data

Data Description: This dataset contains a list of Bitcoin news items crawled from the Bitcointalk forum. The data includes the date, the news headline, the content of the post, and a URL link to the article.

Data Collection: https://bitcointalk.org/index.php?board=77.0

Domain: Bitcoin

Intended Audience

Economic analysts

Required Skills: cleaning and preprocessing text data

Interesting Challenges and Questions About the Data

Challenges and Questions about the data: How can we tell which news items have an impact on the Bitcoin market? Which factors cause Bitcoin price fluctuations over a given period of time?
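A first, deliberately crude, way to relate news to the market is to line up each news date with the price change that follows it. The dataset itself contains no prices, so this sketch assumes a separate daily closing-price series (the `close_prices` mapping is hypothetical, and the example values are invented); a large move after a news day is only an association, not evidence that the news caused it.

```python
from datetime import date, timedelta


def next_day_returns(news_dates, close_prices):
    """Relative price change from each news day to the following day.

    close_prices: hypothetical mapping of date -> daily closing price, taken from
    a separate price source (the news dataset contains no prices).
    """
    returns = {}
    for day in sorted(set(news_dates)):
        following = day + timedelta(days=1)
        # Skip days for which either price is missing.
        if day in close_prices and following in close_prices:
            returns[day] = (close_prices[following] - close_prices[day]) / close_prices[day]
    return returns


if __name__ == "__main__":
    # Invented prices and dates, for illustration only.
    prices = {date(2019, 1, 1): 100.0, date(2019, 1, 2): 110.0, date(2019, 1, 3): 99.0}
    news = [date(2019, 1, 1), date(2019, 1, 2), date(2019, 1, 3)]
    # Rank news days by the size of the move that follows them.
    ranked = sorted(next_day_returns(news, prices).items(), key=lambda kv: abs(kv[1]), reverse=True)
    print(ranked)
```

Several news items on the same day are counted once; the headlines behind a large move can be recovered by joining the result back to the crawled posts by date.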

Additional Material

Additional Material:

Additional Comments:

3) Infographic on RER B incidents

Information on the Data

Data Description: A dataset of RER B incidents over the course of 5 years, containing the cause, place, date, and time of each incident. We are working on extending the dataset to 10 years' worth of RER B incidents.

Data Collection: The dataset results from parsing a large number of RER B alerts sent by email.

Domain: transportation

Intended Audience

Our ideal audience is the general public, with a focus on RER B users. We would ideally like an interactive web infographic that anyone could understand. This would be especially good for the students involved in this project, because they'll be able to advertise their work easily and may get a lot of visibility.

Required Skills: web programming, information design

Interesting Challenges and Questions About the Data

Challenges and Questions about the data: We don't have any hypotheses about the data, so the goal is exploration. It would be nice to get insights into how RER B incidents (type, location, duration) have evolved across the years.
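Exploring how incidents evolve across years mostly comes down to aggregating incidents by year and by one attribute at a time. A minimal sketch, assuming each parsed email alert has been turned into a hypothetical `Incident` record with a timestamp, a cause, and a place (the field names and the sample values are invented for illustration):

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    # Hypothetical record built from one parsed email alert.
    when: datetime
    cause: str
    place: str


def counts_by_year(incidents, attribute="cause"):
    """Count incidents per year for each value of `attribute` ("cause" or "place")."""
    table = defaultdict(Counter)
    for inc in incidents:
        table[inc.when.year][getattr(inc, attribute)] += 1
    # Plain dicts with years in ascending order, ready for a line or stacked-area chart.
    return {year: dict(counter) for year, counter in sorted(table.items())}


if __name__ == "__main__":
    sample = [  # invented values
        Incident(datetime(2015, 3, 2, 8, 15), "signal failure", "station A"),
        Incident(datetime(2015, 7, 9, 18, 40), "passenger illness", "station B"),
        Incident(datetime(2016, 1, 5, 7, 55), "signal failure", "station A"),
    ]
    print(counts_by_year(sample))
    print(counts_by_year(sample, "place"))
```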

Additional Material

Additional Material: n/a

Additional Comments:

4) Visualizing Emergent Narrative Structures in Persistent Game Worlds

Information on the Data

Data Description: The data represents information from a highly persistent virtual game world. This gives access to spatial and temporal information, player statistics, inventories, and player-created content. We also persist "story events" that give a snapshot of specific narrative occurrences that have emerged from players' actions. The data is updated continuously, but we can provide snapshots from various points in time, and it is easy to work with only sections of the dataset. Example table: story_events(id, eventid, item, type, primary_char, secondary_char, location, date, story_points, special, compantions)

Data Collection: The data is generated by players in our own live and running Multiplayer Online Role Playing Game. www.weridegame.com

Domain: Emergent narratives, game design, role playing, MMORPGs, virtual worlds

Intended Audience

Game designers, game masters, players (potentially)

Required Skills: none, but experience playing MMORPGs (especially older ones) is very helpful.

Interesting Challenges and Questions About the Data

Challenges and Questions about the data: We argue that persistence and narrative emergence can let us build upon players’ influence rather than restrict it. What if games could persist traces of players’ stories and reify them into interactive gameplay elements? What if players’ past actions could generate ‘legendary’ artifacts? What if locations could adapt themselves according to past events? Players would become true legends, remembered and influential, and their actions would have a meaningful impact.

To achieve this, we have built a game environment that persists "story representative data" and are now looking at how designers and game masters can detect relationships within the global narrative (the sum of all players' logged data) that can generate new meaningful content. Broadly:

  • How can we visualize (and ideally interact with) the history of a game world to detect and reify new relationships between characters and items?
  • How can we visualize players' different relationships with items, locations and characters within a specified timeframe?
  • How can we visualize the history of a specified location?
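The last two questions can be sketched as simple queries over the `story_events` table. The sketch assumes the rows have been loaded into memory as a hypothetical `StoryEvent` record holding a subset of the columns listed above; the column types (for instance `date` as a calendar date) and the character, item, and place names in the example are assumptions, not the game's actual schema or data.

```python
import datetime as dt
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoryEvent:
    # Hypothetical in-memory row with a subset of the story_events columns.
    id: int
    type: str
    primary_char: str
    secondary_char: Optional[str]
    location: str
    date: dt.date
    item: Optional[str] = None


def location_history(events, location):
    """All events that happened at `location`, oldest first."""
    return sorted((e for e in events if e.location == location), key=lambda e: e.date)


def character_links(events, character, start, end):
    """Count links between `character` and characters, items, locations in [start, end]."""
    links = Counter()
    for e in events:
        if not (start <= e.date <= end):
            continue
        if character not in (e.primary_char, e.secondary_char):
            continue
        # The "other" character is whichever side of the event is not `character`.
        other = e.secondary_char if e.primary_char == character else e.primary_char
        if other and other != character:
            links[("character", other)] += 1
        if e.item:
            links[("item", e.item)] += 1
        links[("location", e.location)] += 1
    return links


if __name__ == "__main__":
    events = [  # invented rows
        StoryEvent(1, "duel", "Alva", "Borin", "Old Bridge", dt.date(2019, 5, 1), "Rusty Sword"),
        StoryEvent(2, "trade", "Borin", None, "Market", dt.date(2019, 5, 3)),
        StoryEvent(3, "duel", "Borin", "Alva", "Old Bridge", dt.date(2019, 4, 1)),
    ]
    print([e.id for e in location_history(events, "Old Bridge")])
    print(character_links(events, "Alva", dt.date(2019, 4, 15), dt.date(2019, 6, 1)))
```

A location timeline could then be drawn directly from `location_history`, and the counters from `character_links` could weight the edges of an ego network for one character.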

Additional Material

Additional Material: www.weridegame.com

Additional Comments:

5) Readable URIs

Information on the Data

Data Description: The data consists of lists of URIs. For instance, we could have a list of laureates from the Nobel dataset, a list of music bands from DBpedia, or a list of poets from the French National Library.

Data Collection: The data are Linked Open Data from DBpedia, the French National Library, the Nobel Foundation, or other sources, depending on students' interests.

Domain: semantic web, URI, pattern analysis, typography

Intended Audience

The main audience is readers of Semantic Web content, which often contains plain lists of URIs (e.g., http://dbpedia.org/page/The_Beatles).

Required Skills: JavaScript / CSS / SVG

Interesting Challenges and Questions About the Data

Challenges and Questions about the data: The main challenge is to make a list readable by analysing the patterns in URIs and playing on typography. For instance:

Additional Material

Additional Material: https://www.w3.org/TR/cooluris/; an API to retrieve the prefix of a URI: https://prefix.cc/; https://www.worldcat.org/title/typographie/oclc/495488899&referer=brief_results
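A minimal sketch of a prefix-based approach to readability: replace a known namespace with a short prefix, make the remaining local name human-readable (percent-decoding, underscores to spaces), and group the list by prefix. The two-entry `PREFIXES` table is hard-coded for illustration (the `dbpage` name is hypothetical); a real tool would look prefixes up on prefix.cc instead.

```python
from urllib.parse import unquote

# Illustrative prefix table; "dbpage" is a hypothetical name chosen for this sketch.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbpage": "http://dbpedia.org/page/",
}


def shorten(uri):
    """Split a URI into (prefix or None, human-readable local name)."""
    # Try longer namespaces first so that the most specific one wins.
    for prefix, ns in sorted(PREFIXES.items(), key=lambda kv: len(kv[1]), reverse=True):
        if uri.startswith(ns):
            local = uri[len(ns):]
            break
    else:
        prefix, local = None, uri
    # Percent-decode (e.g. %C3%A9 -> é) and show underscores as spaces.
    return prefix, unquote(local).replace("_", " ")


def readable_list(uris):
    """Render URIs as lines, printing each distinct prefix once as a group header."""
    lines = []
    current = object()  # sentinel that never equals a real prefix
    for prefix, name in sorted((shorten(u) for u in uris), key=lambda p: (p[0] or "", p[1])):
        if prefix != current:
            lines.append(f"{prefix or '(no known prefix)'}:")
            current = prefix
        lines.append(f"  {name}")
    return lines


if __name__ == "__main__":
    uris = ["http://dbpedia.org/page/The_Beatles", "http://dbpedia.org/page/Queen_(band)"]
    print("\n".join(readable_list(uris)))
```

Typographic treatment (for instance dimming the shared prefix and emphasising the local name) would then operate on the `(prefix, name)` pairs rather than on the raw strings.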

Additional Comments:

6) Exploration of Tephrochronology Database of the Southern and Austral Volcanic Zones of the Andes

Information on the Data

Data Description: The dataset contains information on past volcanic eruptions in the Southern and Austral Volcanic Zones of the Andes. So far the database includes ~14,500 samples from 56 different volcanoes and 119 different eruptions. The database includes information on 50 different features, including geochemical characteristics (36 features), age, stratigraphy, measured material and measuring technique, and geographical position, among others.

Data Collection: I collected the data from published papers and from direct contributions by the authors of the data.

Domain: Earth sciences, tephrochronology

Intended Audience

Researchers in disciplines associated with tephrochronology, such as paleoceanography, paleoclimatology, volcanology, archaeology, and volcanic risk assessment.

Required Skills: basic programming skills for data processing, analysis, and visualization. No specific language or knowledge is required.

Interesting Challenges and Questions About the Data

Challenges and Questions about the data: General goal: create a visualization tool that allows the exploration of the tephra deposit samples in the database by their geographical, temporal, geochemical, and stratigraphic features.

Specific goals:

  • Visualize the geographical distribution of volcanoes and the geo-temporal distribution of their eruptions
  • Visualize the distribution of eruptions in depth (stratigraphy), including age as a function of depth and the thickness of the eruption deposits.
  • Visualize the geochemistry associated with the different volcanoes and eruptions and its variation over time.
  • Include interaction features that allow users to:
    • select a particular sample to see more details about it
    • select some of a sample's multidimensional characteristics to see which other samples, volcanoes, or eruptions share them
    • compare the user's data with the database
    • select a particular set of volcanoes and/or eruptions to visualize
  • Display the uncertainty values of the numerical features of the samples. These are divided into:
    • Age, whose uncertainty is associated both with what is measured to obtain the age of the eruption and with the laboratory techniques
    • Geochemical features, whose uncertainty is associated with the laboratory techniques.
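As an illustration of how age uncertainty could feed into the interactive selection, the sketch below keeps the samples whose age interval, widened by its uncertainty, overlaps a user-chosen time window and whose SiO2 content lies in a given range. The `Sample` fields, units, the choice of SiO2 as the example geochemical feature, the 2-sigma default, and the example values are all assumptions, not the database's actual structure or data.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical subset of the database features; units are assumptions.
    sample_id: str
    volcano: str
    eruption: str
    age: float        # e.g. calibrated years BP
    age_error: float  # 1-sigma age uncertainty, same unit as age
    sio2: float       # SiO2 content in wt%, one example geochemical feature


def overlaps_age_window(sample, lo, hi, n_sigma=2.0):
    """True if [age - n*sigma, age + n*sigma] intersects the window [lo, hi]."""
    low = sample.age - n_sigma * sample.age_error
    high = sample.age + n_sigma * sample.age_error
    return low <= hi and high >= lo


def select(samples, age_window, sio2_range, n_sigma=2.0):
    """Samples compatible with an age window (given their uncertainty) and an SiO2 range."""
    lo, hi = age_window
    sio2_min, sio2_max = sio2_range
    return [
        s for s in samples
        if overlaps_age_window(s, lo, hi, n_sigma) and sio2_min <= s.sio2 <= sio2_max
    ]


if __name__ == "__main__":
    samples = [  # invented values
        Sample("S1", "V1", "E1", 1000.0, 50.0, 55.0),
        Sample("S2", "V1", "E2", 3000.0, 100.0, 60.0),
        Sample("S3", "V2", "E3", 1150.0, 30.0, 70.0),
    ]
    print([s.sample_id for s in select(samples, (1080.0, 1200.0), (50.0, 65.0))])
```

Only the age uncertainty is used here; the laboratory uncertainty of the geochemical values could be handled the same way by widening each sample's SiO2 value into an interval.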

Additional Material

Additional Material:

Additional Comments: The Tephrochronology database of the Southern and Austral Volcanic Zones of the Andes aims to improve access to tephrochronology knowledge by homogenizing the information, and also by developing the tools necessary for a thorough exploration of the data. The project will continue in the medium term, since the data will keep growing in 2020, both in the number of samples and in the number of features included in the database. This growing amount of data also allows for novel statistical analyses in the field, and a collaboration with a statistics team at INRIA is in development to improve the interpretability of the data. Finally, the durability of the project is supported by the ESPRI-IPSL server, which hosts the data, and by the Chilean geological service SERNAGEOMIN, which will host the data and provide maintenance of the database.

7) Visualization of characters’ interaction in a TV series

Information on the Data

The objective of this project is to provide a way to visualize the relationships between characters in the first five seasons of the TV series Game of Thrones. We consider two characters to be in a relationship (positive or negative) if they both speak within the same scene. As characters travel a lot in Westeros and Essos, their relationships evolve over time, and some characters who are close at the beginning (e.g., Jon Snow and Arya Stark) are not in a relationship at all later in the show.

Data Description: The data consists of the list of every scene in the first five seasons of the Game of Thrones TV series. Each scene is associated with its start and end times and the (normalized) names of the characters who speak in the scene.

Data Collection: The data was manually annotated from the first five seasons of the Game of Thrones TV series.

Domain: Narrative Structure and Scene Analysis, and more generally TV series
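The co-occurrence rule above (two characters are related if they both speak in the same scene) translates directly into a weighted graph. A minimal sketch, assuming scene times lie on a single timeline across the show so that a time window selects a period of the story; the `Scene` record and the example scenes are illustrative, not taken from the annotation.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Scene:
    start: float  # start time on a single show-wide timeline (assumed unit: seconds)
    end: float
    speakers: list[str]  # normalized names of the characters who speak in the scene


def interaction_graph(scenes, t0=float("-inf"), t1=float("inf")):
    """Weighted edges: number of scenes starting in [t0, t1) where both characters speak."""
    edges = Counter()
    for scene in scenes:
        if not (t0 <= scene.start < t1):
            continue
        # set() so that a name listed twice in one scene is counted once;
        # sorted() so that (a, b) and (b, a) map to the same edge.
        for a, b in combinations(sorted(set(scene.speakers)), 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    scenes = [  # illustrative scenes
        Scene(0, 100, ["Jon Snow", "Arya Stark", "Eddard Stark"]),
        Scene(100, 200, ["Arya Stark", "Jon Snow"]),
        Scene(5000, 5100, ["Jon Snow", "Samwell Tarly"]),
    ]
    print(interaction_graph(scenes, 0, 1000))  # early period
    print(interaction_graph(scenes, 1000))     # later period
```

Computing one graph per sliding window (or per season) and comparing edge weights between windows is one way to expose how relationships evolve; note that the graph captures co-presence only, not whether a relationship is positive or negative.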

Intended Audience

People interested in TV series and scene analysis. Binge watchers.

Required Skills: manipulating CSV files; interest in TV series

Interesting Challenges and Questions About the Data

Challenges and Questions about the data: The main challenge of the project is to provide a simple and interactive way to present the evolution of the interactions between a very large number of characters, given that the interactions between characters in this series evolve a lot.

How do the relationships between characters evolve over time?

Additional Material

Additional Material:

Additional Comments: The project is the starting point of a larger work on narrative structure visualization that will be continued during a 6-month internship next spring (applications are very welcome).

8) Stories intertwinedness inside TV series visualization

Information on the Data

Data Description: The data contains linking information for the scenes in the episodes of the first 2 seasons of Game of Thrones. Each scene is linked to the previous scenes that are part of the same story, and a title has been assigned to each scene. In addition, the main characters involved are listed, and each scene is marked as being a most reportable scene (MRS) or not. Shots of each scene can also be provided.

Data Collection: The data was manually annotated from the first two seasons of the Game of Thrones TV series.

Domain: Narrative Structure and Scene Analysis, and more generally TV series

Intended Audience

People interested in TV series and narrative structure extraction. Binge watchers. People who work on analysis of TV series.

Required Skills: manipulating CSV files; interest in TV series

Interesting Challenges and Questions About the Data

Challenges and Questions about the data: Most TV series are composed of different stories that are intertwined within episodes. To analyze the narrative structure of these TV series, the first step is to extract these different stories by creating links between scenes, since stories progress through different scenes. The most reportable scenes of a story, i.e., the scenes that contain its most important events, are provided. The main challenge of the project is to provide a simple way to visualize the narrative structure of a complex TV series at different levels of granularity (episode level and season level). The visualization should provide users with a visual summary of each story (with textual or visual information) and emphasize the most reportable scenes. How do the relationships between characters evolve over time?
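Because each scene is linked to earlier scenes of the same story, extracting the stories amounts to finding the connected components of the scene-link graph. The sketch below does this with a union-find structure; the `links` dictionary (scene id mapped to the ids of its linked earlier scenes) and the scene id format are a hypothetical in-memory form of the annotations.

```python
def extract_stories(links):
    """Group scene ids into stories (connected components of the link graph).

    links: hypothetical mapping of scene id -> list of ids of earlier scenes
    annotated as belonging to the same story.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # an unseen scene starts as its own root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for scene, previous in links.items():
        find(scene)  # scenes without links still form a (one-scene) story
        for earlier in previous:
            union(scene, earlier)

    stories = {}
    for scene in list(parent):
        stories.setdefault(find(scene), []).append(scene)
    # Deterministic output: scenes sorted within a story, stories sorted by first scene.
    return sorted(sorted(story) for story in stories.values())


if __name__ == "__main__":
    links = {  # illustrative ids in the form "S<season>E<episode>-<scene>"
        "S01E01-01": [],
        "S01E01-02": [],
        "S01E01-05": ["S01E01-01"],
        "S01E02-03": ["S01E01-05"],
        "S01E02-04": ["S01E01-02"],
    }
    for story in extract_stories(links):
        print(story)
```

Each resulting story could then be drawn as one lane across episodes, with its most reportable scenes highlighted.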

Additional Material

Additional Material:

Additional Comments: The project is the starting point of a larger work on narrative structure visualization that will be continued during a 6-month internship next spring (applications are very welcome).

9) Visualizing Provenance to Support Exploratory Trade-off Analysis

Information on the Data

Data Description: The dataset describes four interactive model exploration sessions and contains:

  • Video recordings: ≈500 minutes
  • Log files of user interactions with a visualization tool (124k lines of XML, 17 fields)
  • Manually labelled dataset: ≈4K individual events, 15 dimensions. This dataset provides high-level information such as changes of hypotheses and research questions, expertise, and insights found.

The collected datasets are described in detail in a published paper (see additional material, Boukhelifa et al., 2019).

Data Collection: Part of the data was logged automatically during four interactive exploration sessions of agronomy models with domain experts. The experts wanted to analyse various trade-off scenarios (e.g., for a wheat-crop model, they wanted to explore fertilisation strategies where wheat yield is maximised but the amount of fertiliser is minimised). The other part of the dataset was labelled manually from videos of those exploration sessions.

The interactive exploration sessions and collected datasets are described in detail in a published paper (see additional material, Boukhelifa et al., 2019).

Domain: Agronomy, trade-off analysis, interaction log data

Intended Audience

1. Domain experts wanting to understand their own analysis or exploration processes (reflection), or domain experts who want to share their exploration process and findings with colleagues (reproducibility).
2. Tool builders who want to understand how their systems are used by domain experts.

Required Skills: no domain knowledge is required; programming skills are required.

Interesting Challenges and Questions About the Data

Challenges and Questions about the data: Analysis and visualisation of those interaction logs (and associated data) can:

  • Help experts keep track of what they are doing;
  • Help experts reproduce, validate, and share their approach;
  • Help us understand how experts use our tool to conduct trade-off analysis.

It would be interesting to:

  • provide an overview of the different exploration sessions
  • detect and visualise different or similar analysis scenarios across those exploration sessions
  • find typical trade-off exploration strategies or stages of exploration (e.g., do experts start from model parameter exploration and then move to objectives, or vice versa?)
  • find which parts of the search space experts explored the most, and which they did not explore
  • find whether experts prioritise model objectives differently, and whether this changes at different stages of the exploration
  • other?
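For the question about typical exploration strategies, one simple starting point is to collapse each session's time-ordered event labels into a sequence of stages and count the transitions between stages across sessions. The stage labels (`parameters`, `objectives`) and the `sessions` mapping are hypothetical; the real categories would come from the manually labelled dataset.

```python
from collections import Counter
from itertools import groupby


def stages(labels):
    """Collapse runs of identical consecutive labels into one stage each."""
    return [label for label, _ in groupby(labels)]


def transition_counts(sessions):
    """Count stage-to-stage transitions over all sessions.

    sessions: hypothetical mapping of session id -> time-ordered event labels.
    """
    counts = Counter()
    for labels in sessions.values():
        seq = stages(labels)
        counts.update(zip(seq, seq[1:]))  # consecutive stage pairs
    return counts


if __name__ == "__main__":
    sessions = {  # invented labels, for illustration only
        "session-1": ["parameters", "parameters", "objectives", "objectives", "parameters"],
        "session-2": ["objectives", "parameters", "parameters"],
    }
    print(transition_counts(sessions))
```

The transition counts could be drawn as a small node-link or Sankey diagram, and the per-session stage sequences as aligned timelines to compare sessions.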

Additional Material

Additional Material: N. Boukhelifa, A. Bezerianos, I. C. Trelea, N. Perrot, E. Lutton. An Exploratory Study on Visual Exploration of Model Simulations by Multiple Types of Experts. http://pfl.grignon.inra.fr/nb/papers/boukhelifa_chi_2019.pdf

Additional Comments: