Final Assignment

The goal of the final assignment is to bring together the tools and techniques you learned about in class and use them to explain patterns in the data.

Remember the general challenge:

In the roughly twenty years that Tethys-based GAStech has been operating a natural gas production site in the island country of Kronos, it has produced remarkable profits and developed strong relationships with the government of Kronos. However, GAStech has not been as successful in demonstrating environmental stewardship.
In January 2014, the leaders of GAStech are celebrating their newfound fortune as a result of the initial public offering of their very successful company. In the midst of this celebration, several employees of GAStech go missing. An organization known as the Protectors of Kronos (POK) is suspected in the disappearance, but things may not be what they seem.

Your task:

  • Take the following datasets from some of the previous tutorials: credit cards, loyalty cards, GPS tracks, and driver to car assignments (the text files from the Jigsaw tutorial are not part of this final assignment).
  • Create a report in which you provide answers to the questions below. For the report and for each question:
    • List the assumptions you initially made in order to arrive at your answer. Tell us whether your assumptions changed during your analysis.
    • Provide a list of the tools you used to arrive at your answers and explain why you chose them. Briefly state whether each tool proved useful for arriving at your answer, and why.
    • Provide your final answer for each question. Illustrate the answer with screenshots of the externalizations (graphs, tables, output from R, etc.) that you created with the tool(s) and that show the evidence on which you based your answer.
    • Describe whether you had to make any further changes to the data. Did you have to do more cleaning? Did you have to create or calculate additional metadata? Were there uncertainties in the data you had to deal with?

We will judge your solution to each question based on how well we can follow your reasoning and analysis process given your description and the images you include.


Part 1: Questions

Answer and report on how you answered the following questions.

  1. What are the five most popular restaurants for GAStech employees? (don't forget to define what "popular" could mean in this context)
  2. Two GAStech employees are having an affair. Which ones are they?
  3. Identify where GAStech employees live. Describe your solution and show a visual solution for at least five employees.
  4. Do the executives socialize outside of work? If so, where and when?
  5. Are there any anomalous driving patterns for the security guards? If so, what are they?

One final difficult question:
6. One employee's credit card was stolen. Whose was it and who stole it?
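
As a reminder of why question 1 asks you to define "popular", the sketch below ranks locations under three possible definitions: number of visits, number of distinct visitors, and total spend. It is only an illustration. The records, employee labels, location names, and the helper functions `popularity` and `top` are all hypothetical and are not taken from the assignment datasets.

```python
from collections import defaultdict

# Hypothetical toy records: (employee, location, amount).
# These are NOT values from the assignment data.
transactions = [
    ("emp_a", "Cafe X", 12.50),
    ("emp_a", "Cafe X", 9.00),
    ("emp_a", "Cafe X", 11.00),
    ("emp_b", "Diner Y", 30.00),
    ("emp_c", "Diner Y", 28.00),
]


def popularity(records):
    """Summarize each location by visits, distinct visitors, and total spend."""
    visits = defaultdict(int)
    visitors = defaultdict(set)
    spend = defaultdict(float)
    for employee, location, amount in records:
        visits[location] += 1
        visitors[location].add(employee)
        spend[location] += amount
    return {
        loc: {
            "visits": visits[loc],
            "visitors": len(visitors[loc]),
            "spend": spend[loc],
        }
        for loc in visits
    }


def top(records, metric, n=5):
    """Return the n locations ranked highest on the chosen metric."""
    stats = popularity(records)
    return sorted(stats, key=lambda loc: stats[loc][metric], reverse=True)[:n]


print(top(transactions, "visits"))    # ranking by number of transactions
print(top(transactions, "visitors"))  # ranking by distinct employees
```

In this toy example the two rankings disagree: one location has more transactions, the other has more distinct visitors. Whichever definition you choose, state it as one of your assumptions.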

Part 2: Hypotheses

Given the analyses you have done in class and for the questions above, what hypotheses do you have about which employees have gone missing, what might have happened, and who could be responsible?

  • List at least three hypotheses based on what you have seen in your data so far.
  • Find at least five pieces of evidence for or against your three hypotheses. Describe the diagnosticity of your evidence by creating a matrix as discussed in Lecture 5.
  • For each piece of evidence give a screenshot of an externalization (graph, table, output from R, etc.) showing where the evidence comes from in the data.
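
The sketch below shows one way an evidence-by-hypothesis matrix could be represented. It assumes the matrix from Lecture 5 resembles a consistency matrix in which each piece of evidence is rated against each hypothesis. That rating scheme, the labels `E1` to `E3` and `H1` to `H3`, and the helper functions are assumptions made for illustration and may differ from the lecture. Under this assumption, evidence rated the same for every hypothesis does not help to distinguish between them, so it has little diagnostic value.

```python
# Ratings: "C" = consistent, "I" = inconsistent, "N" = neutral / not applicable.
# Evidence and hypothesis labels are placeholders, not findings from the data.
matrix = {
    "E1": {"H1": "C", "H2": "C", "H3": "C"},
    "E2": {"H1": "C", "H2": "I", "H3": "N"},
    "E3": {"H1": "I", "H2": "I", "H3": "C"},
}


def is_diagnostic(ratings):
    """Evidence is diagnostic if its rating differs across hypotheses."""
    return len(set(ratings.values())) > 1


def inconsistency_counts(matrix):
    """Count, per hypothesis, how many pieces of evidence contradict it."""
    counts = {}
    for ratings in matrix.values():
        for hypothesis, rating in ratings.items():
            counts[hypothesis] = counts.get(hypothesis, 0) + (rating == "I")
    return counts


for evidence, ratings in matrix.items():
    label = "diagnostic" if is_diagnostic(ratings) else "not diagnostic"
    print(evidence, ratings, label)

print(inconsistency_counts(matrix))
```

Here `E1` is rated "C" for all three hypotheses and is therefore flagged as not diagnostic, while `E2` and `E3` separate the hypotheses from one another.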

Part 3: What is missing?

Describe the types of analyses you would have liked to do to answer the questions or confirm your hypotheses but could not do with the tools discussed in class.

  • List at least five requirements for a tool that would have made your analyses easier. Provide a clear description of where these requirements come from. What could you not do with the tools you tried?
  • Make at least five sketches that show a possible externalization that you would have liked to have during your analysis but that wasn't supported in the tools we covered in the previous tutorials. Provide some captioning or labels on your sketch to explain it.

Submitting the Assignment

WHAT - You should submit a single PDF file called "" via email.

WHERE - You should email the file to AND with the subject VA-Assignment6.

WHEN - Remember that Assignment 6 is due before 23:00 on Monday, November 3rd.