Assignment: Exploratory Analysis

In this assignment we'll perform an exploratory analysis to better understand the shape & structure of the data, investigate initial questions, and develop preliminary insights & hypotheses. Your final submission will take the form of a Google Doc consisting of visualizations that convey key insights gained during your analysis.

Step 1: Data Selection

For the last assignment you (hopefully) selected a dataset that you found interesting. In this assignment you can keep working with that dataset, extend it, or replace it, but be aware of the extra work that extending or changing it creates. You are now also free to download datasets rather than scrape them or query an API. Stick to the topic requirements of the last assignment.

After selecting a topic and dataset, but before starting your analysis, write down an initial set of at least three questions you'd like to investigate.

Step 2: Exploratory Analysis

Next, you will perform an exploratory analysis of your dataset using a visualization tool of your choice, such as Excel, Tableau, or Altair. You should consider two different phases of exploration.

In the first phase, you should seek to gain an overview of the shape & structure of your dataset. (If you kept the dataset from the last assignment, you will have already completed this phase.) What variables does the dataset contain? How are they distributed? Are there any notable data quality issues? Are there any surprising relationships among the variables? Be sure to also perform "sanity checks" for patterns you expect to see!
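If you work in Python (for example in a Jupyter notebook), the first-phase questions can be sketched with pandas before you start charting. This is a minimal sketch under stated assumptions, not a required method: pandas is an assumption (the assignment does not prescribe it), and the CSV contents, the column names (`title`, `year`, `budget`, `gross`), and the plausible-year range are hypothetical placeholders for your own dataset and expectations.

```python
import io

import pandas as pd

# Hypothetical sample standing in for your own dataset; in practice you would
# load your real file, e.g. with pd.read_csv("your_file.csv").
CSV = """title,year,budget,gross
A,1999,10,25
B,2005,,40
C,2005,30,12
C,2005,30,12
"""


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: data type, missing values, and distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "distinct": df.nunique(),
    })


df = pd.read_csv(io.StringIO(CSV))

# What variables does the dataset contain, and how complete are they?
summary = profile(df)
print(summary)

# How are the numeric variables distributed? (count, mean, std, quartiles)
print(df.describe())

# Sanity checks for patterns you expect to hold.
# The 1880-2030 year range is an assumed expectation, not a rule.
checks = {
    "no duplicate rows": not df.duplicated().any(),
    "years in plausible range": bool(df["year"].between(1880, 2030).all()),
    "budgets non-negative": bool((df["budget"].dropna() >= 0).all()),
}
for name, passed in checks.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

On this toy data the duplicate-row check fails (the last two rows are identical) and `budget` has one missing value: exactly the kind of data quality issue worth noting in your report before you move on to visualizing distributions in your chosen tool.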

In the second phase, you should investigate your initial questions, as well as any new questions that arise during your exploration. For each question, start by creating a visualization that might provide a useful answer. Then refine the visualization (by adding additional variables, changing sorting or axis scales, filtering or subsetting data, etc.) to develop better perspectives, explore unexpected observations, or sanity check your assumptions. You should repeat this process for each of your questions, but feel free to revise your questions or branch off to explore new questions if the data warrants.

Step 3: Final Deliverable

Your final submission should take the form of a Google Docs report, similar to a slide show or comic book, that consists of 10 or more captioned visualizations detailing your most important insights. Your "insights" can include important surprises or issues (such as data quality problems affecting your analysis) as well as responses to your analysis questions. To help you gauge the scope of this assignment, see the example of a similar report analyzing data about motion pictures. This example has been annotated and graded to help you calibrate the breadth and depth of exploration we're looking for.

Each visualization image should be a screenshot exported from a visualization tool, accompanied by a title and a descriptive caption (1-4 sentences long) describing the insight(s) learned from that view. Give each caption enough detail that anyone could read through your report and understand what you've learned. You are free, but not required, to annotate your images to draw attention to specific features of the data. You may highlight within the visualization tool itself or draw annotations on the exported image. To export images from Tableau, use the Worksheet > Export > Image... menu item.

The end of your report should include a brief summary of main lessons learned.

Visualization Tools

You are free to use one or more visualization tools in this assignment. However, in the interest of time and for a friendlier learning curve, we strongly encourage you to use Tableau. Tableau provides a graphical interface focused on the task of visual data exploration. You will (with rare exceptions) be able to complete an initial data exploration more quickly and comprehensively than with a programming-based tool.

  • Tableau - Desktop visual analysis software. Available for both Windows and MacOS; register for a free student license.
  • Data Transforms in Vega-Lite. A tutorial on the various built-in data transformation operators available in Vega-Lite.
  • Data Voyager, a research prototype from the UW Interactive Data Lab, combines a Tableau-style interface with visualization recommendations. Use at your own risk!
  • R, using the ggplot2 library or with R's built-in plotting functions.
  • Jupyter Notebooks (http://jupyter.org/), using Python libraries such as Altair or Matplotlib.
  • Streamlit + Altair, as introduced in class.

Submitting the Assignment


This is an individual assignment. You may not work in groups.

WHAT - To complete the assignment, create the Google Doc report as outlined above.

WHERE - Submit your assignment via this Google form.

WHEN - Assignment 3 is due before 23:00 on Oct 14th.

Acknowledgements

This assignment borrows from a similar one run by my colleague Arvind Satyanarayan for his 6.894: Interactive Data Visualization class at MIT.