Choosing a Research Question & First Analysis

Choose a Research Question

In the remainder of the course we are going to continue using the dataset we have already begun to explore in class. The dataset of IEEE VIS publications is part of ongoing research and has just been made available to the world. You will be one of the first to dig into and explore this dataset, so there are still lots of interesting observations to make and tools to build.

The full dataset, an explanation of its variables, and some first simple visualizations can be found at the following URL: http://www.vispubdata.org

In class we have already begun to discuss some interesting questions we may want to ask about this dataset - or tasks we may want to use the dataset for.

Your task is now to choose one of the tasks or higher-level questions that you are interested in and to prepare your first exploratory analysis of the data, keeping this question/task in mind. Below I have prepared a few sample tasks and questions that you can choose from. If you would like to choose a different question, please contact your instructor to verify that your question is at an adequate level for this and the following assignments.

NOTE: in the following descriptions I talk only about papers of type "J" or "C" in the dataset. It is best to also restrict your analysis to these two types of paper.
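The restriction to types "J" and "C" can be sketched in Python as follows. This is only an illustration: it assumes the dataset has been exported to CSV and that the paper type is stored in a column called `PaperType`. The column name, the sample rows, and the helper `load_full_papers` are assumptions, so check them against the variable explanation on the dataset site.

```python
import csv
import io

# Hypothetical excerpt; the real file has many more columns and rows.
SAMPLE_CSV = """Year,Title,PaperType
2014,Paper A,J
2014,Paper B,M
2015,Paper C,C
"""


def load_full_papers(csv_file, type_column="PaperType", keep=("J", "C")):
    """Return only the rows whose paper type is one of the codes in `keep`."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row[type_column].strip() in keep]


# io.StringIO stands in for an opened file such as open("vispubdata.csv").
papers = load_full_papers(io.StringIO(SAMPLE_CSV))
print([p["Title"] for p in papers])
```

In Tableau itself the same effect can be achieved with a filter on the paper type field.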

Sample Research Questions

Supporting Peer-Review

Every paper in the dataset (of type "J" and "C") has undergone a process called peer review (read the following Wikipedia article for more details). This means that in order to get accepted at the IEEE VIS conference, the paper was assigned to four reviewers, all of whom read the paper, gave comments, and gave it an accept or reject recommendation. The process of peer review, however, is not without problems (see this Nature article or this JRSM paper). One of the challenges with peer review is finding the right experts for a paper on a given topic. These experts must have two main properties:

  • they need to be (very) knowledgeable about the topic of the paper. Knowledge about a topic can, for example, be established by looking at the past work of a reviewer -- their past papers and the topics of these papers. A knowledgeable reviewer is also typically one who has been active in the community by publishing papers for a few years (although it does sometimes happen that a PhD student with only 1-2 published papers can become a reviewer if they are sufficiently specialized).
  • they are not allowed to be in conflict with any of the authors of the paper. A conflict arises if a reviewer has co-authored a publication with any of the authors of the paper under review within the last five years (there are also other sources of conflict, such as joint grants, friendships, or supervisor-student relationships, that we will not consider here).
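The co-authorship conflict rule from the list above can be sketched as a small Python check. The record layout (`year`, `authors`), the function name `has_conflict`, and the sample names are hypothetical. The sketch also makes two assumptions that the rule itself does not state: "within the last five years" is read as a publication year no more than five years before the review year, and an author is treated as being in conflict with their own paper.

```python
def has_conflict(reviewer, paper_authors, publications, review_year, window=5):
    """Return True if `reviewer` co-authored a publication with any of
    `paper_authors` during the `window` years up to `review_year`."""
    authors = set(paper_authors)
    if reviewer in authors:
        return True  # assumption: authors cannot review their own paper
    for pub in publications:
        recent = review_year - window <= pub["year"] <= review_year
        names = set(pub["authors"])
        # Conflict: a recent publication lists the reviewer and a paper author.
        if recent and reviewer in names and names & authors:
            return True
    return False


# Made-up publication records for illustration only.
publications = [
    {"year": 2010, "authors": ["Ana", "Ben"]},
    {"year": 2015, "authors": ["Ben", "Cleo"]},
    {"year": 2016, "authors": ["Dev", "Ana"]},
]

for candidate in ["Ana", "Cleo", "Dev"]:
    print(candidate, has_conflict(candidate, ["Ben"], publications, review_year=2016))
```

Here Ana's joint paper with Ben is older than five years, so only Cleo is flagged. Knowledge about the topic would need a separate measure, for example keyword overlap between the paper and a candidate's past papers.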

Here is a picture of what a typical reviewer assignment page looks like. Can we do something graphical that is better?

One potential research question on the peer-review topic is, thus:

  • Who would be a good reviewer for a given paper?

Understanding Individual Researchers

The visualization community is extremely diverse, with people working on many different topics over a number of years. Given the publications of a specific researcher, there are a number of interesting things we can learn about them: their history, their (changing) interests, their social network, their expertise, etc.

This is what Google Scholar displays for a researcher's profile. Can we provide better overviews?

One potential research question on this topic is, thus:

  • What does a researcher's research track and environment look like?

Understanding the Field of Visualization

Visualization is an extremely diverse area of research with various influences. A recent paper has attempted to understand the field based on the keywords provided with this dataset. However, there are many other ways in which the dataset can help us understand what the field looks like. For example:

  • a similar analysis but looking at paper titles and/or abstracts,
  • a more in-depth analysis on temporal evolution of the field (e.g. the researchers joining and leaving, topics trending and declining, ...)
  • the evolution of the individual sub-conferences Vis/SciVis, InfoVis, and VAST

Here is an example of a keyword co-occurrence map. What other graphics can we provide to help understand the visualization community?

Potential research questions on this topic are, thus:

  • What are current and/or historic trends in the Visualization community?
  • How can we best describe the work conducted in the community?

Supporting Paper Writing

When writing a scientific paper, it is important to have an up-to-date background section in which the most important related articles can be found. Our dataset could be a good source for finding relevant related work.

Here is an example extract of a related work section. The numbers in brackets refer to papers listed at the end of the paper in the bibliography section:

One potential research question on this topic is, thus:

  • What are the most important related papers on a given topic? Can we suggest relevant literature by going beyond a simple text search on keywords, titles, and texts?

Supporting Conference Organization

The IEEE VIS conference is organized by a huge number of people who serve in various committees. One of these committees is called the program committee. The program committee includes a large number of people who serve as the primary reviewer for a number of papers (usually about 7 per person) submitted to the conference. The primary reviewer is a particularly important reviewer as he/she makes the final recommendation to the papers chairs about accepting or rejecting a submitted paper. As such, the program committee is made up of people who have published a number of papers at the conference already and, thus, have experience in the domain. However, building a program committee is not an easy task as a number of factors need to be balanced. Ideally you would like to have people:

  • with at least three years of expertise in the field (based on publications)
  • who are from a variety of affiliations
  • who are a mix of male and female
  • and who represent a variety of research topics
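The first criterion in the list above is the easiest to derive from publication data. The following is a minimal sketch that assumes "three years of expertise" means that a person's first publication in the dataset appeared at least three years before the committee year; other readings (for example, papers in three distinct years) are equally plausible. The record layout and the function name `experienced_authors` are hypothetical.

```python
def experienced_authors(papers, committee_year, min_years=3):
    """Return authors whose earliest paper is at least `min_years` old."""
    first_year = {}
    for paper in papers:
        for author in paper["authors"]:
            # Keep the earliest publication year seen for each author.
            earliest = first_year.get(author, paper["year"])
            first_year[author] = min(earliest, paper["year"])
    return sorted(
        author
        for author, year in first_year.items()
        if committee_year - year >= min_years
    )


# Made-up records for illustration only.
papers = [
    {"year": 2010, "authors": ["Ana", "Ben"]},
    {"year": 2014, "authors": ["Ben", "Cleo"]},
    {"year": 2015, "authors": ["Dev"]},
]

print(experienced_authors(papers, committee_year=2016))
```

Balancing affiliations and topics would build on the resulting candidate list, for example by counting candidates per affiliation or per keyword.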

Here a picture showing a subset of the people on the IEEE InfoVis 2016 program committee:

One potential research question on this topic is, thus:

  • What would be a recommended program committee based on the publication data? Note that InfoVis, SciVis/Vis, and VAST each have separate program committees.

Understanding Citation Behavior

Each scientific paper includes a list of references at the end. In our dataset we have listed, for each paper, its references to other IEEE VIS papers in the dataset. It would be interesting to understand general citation behavior, for example whether time, affiliation, or authors and co-authors influence which papers are cited.
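As a starting point for such an analysis, here is a minimal Python sketch that counts how often each paper is cited within the dataset and how old a paper is when it gets cited. The identifiers, the `refs` field, and both function names are hypothetical. In the real dataset the references would first have to be parsed from the corresponding column.

```python
from collections import Counter

# Made-up records: paper id -> publication year and cited paper ids.
papers = {
    "p1": {"year": 2010, "refs": []},
    "p2": {"year": 2012, "refs": ["p1"]},
    "p3": {"year": 2015, "refs": ["p1", "p2"]},
}


def citation_counts(papers):
    """Count citations received by each paper from papers in the dataset."""
    counts = Counter()
    for paper in papers.values():
        # Ignore references to papers that are not part of the dataset.
        counts.update(ref for ref in paper["refs"] if ref in papers)
    return counts


def citation_ages(papers):
    """Return the year gap between each citing paper and the cited paper."""
    return [
        paper["year"] - papers[ref]["year"]
        for paper in papers.values()
        for ref in paper["refs"]
        if ref in papers
    ]


print(citation_counts(papers).most_common())
print(sorted(citation_ages(papers)))
```

A histogram of the citation ages is one simple way to look at the influence of time on citations.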

For an example of a related project check out CitNetExplorer:

One potential research question on this topic is, thus:

  • What are typical citation patterns for IEEE VIS papers?

Tableau Assignment

In this assignment you will be doing exploratory data analysis on the dataset you have been assigned. In particular you will be focusing on graphical summaries of the dataset. Using graphical summaries your goal is to describe characteristics of the dataset.

  1. Your first task is to choose a higher-level research question. This can be from the list above or you can choose another question. However, if you choose another higher-level question, check with your instructor to make sure that it is sufficiently broad to be interesting.
  2. From this higher-level question generate 5 lower-level questions that you can answer by doing exploratory analysis. For example, if your higher-level research question was "What are general writing patterns for abstracts in the IEEE VIS literature?", then lower-level questions might be: how many papers have abstracts in each year, how long are the abstracts in the dataset, how many questions are included on average in an abstract, or what kinds of special characters are used.
  3. Next, for each of your lower-level questions you need to provide two alternative visual representations with Tableau. People should be able to get an answer to the question by exploring or looking at your visualization. Add descriptive titles, legends, and axis labels.
  4. Build a Tableau Story that includes information (slide/point 1) on the higher-level question you chose as well as five subsequent points with the two visualizations for each of the five questions you have generated. The story points / dashboards that you create should each include interactive elements.
  5. Export your Tableau workbook as a packaged workbook (.twbx) and submit it as your assignment.

In preparation for the assignment I recommend going to https://www.tableau.com/learn/tutorials/ and watching the videos explaining dashboards and stories.

When building your visualization you may use any inspiration from the web, including images of visualizations, Tableau tutorials, or existing visualizations that have been built in Tableau Desktop. However, you must reference all external resources that you have used for this practical as part of your report. You can also generate new datasets (you may even have to) to answer your questions -- following the types of data wrangling we have done in previous assignments.
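As one illustration of generating such a derived dataset, the sketch below computes, per year, the number of papers with an abstract and the mean abstract length in words, and writes the result as CSV that could then be loaded into Tableau. The field names `Year` and `Abstract`, the output column names, and the sample rows are assumptions. Counting words by splitting on whitespace is only one possible definition of length.

```python
import csv
import io
from collections import defaultdict

# Made-up records for illustration only.
papers = [
    {"Year": 2014, "Abstract": "We present a new technique."},
    {"Year": 2014, "Abstract": ""},
    {"Year": 2015, "Abstract": "How do people read charts? We ran a study."},
]


def abstract_summary(papers):
    """Summarize abstract counts and mean word length per year."""
    lengths_by_year = defaultdict(list)
    for paper in papers:
        text = paper["Abstract"].strip()
        if text:  # skip papers without an abstract
            lengths_by_year[paper["Year"]].append(len(text.split()))
    return [
        {
            "Year": year,
            "PapersWithAbstract": len(lengths),
            "MeanWords": sum(lengths) / len(lengths),
        }
        for year, lengths in sorted(lengths_by_year.items())
    ]


rows = abstract_summary(papers)

# Write to an in-memory buffer; use open("summary.csv", "w", newline="")
# instead to create a file for Tableau.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Year", "PapersWithAbstract", "MeanWords"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```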

Submitting the Assignment


WHAT - You should submit a single ZIP file called "YOUR_LASTNAME-Assignment4.zip" via email. It should contain:

  1. Your packaged workbook "YOUR_LASTNAME-Assignment4.twbx" containing the story as described above

WHERE - You should email the file to petra.isenberg@inria.fr with the subject VA-Assignment4.

WHEN - Assignment 4 is due before 23:00 on Wednesday, November 2nd.