Tutorial 5 - Visual Text Analysis in Jigsaw

In this tutorial you will use Jigsaw as a visual analytics tool to explore a collection of text documents and answer some questions. We will perform some initial analysis together in class, and you will then perform some more on your own.

You should submit the completed assignment to us before 23:00 on Monday, October 13th (details below).

Getting Started


You should have installed Jigsaw already. If not, you can download it here: http://www.jigsaw-analytics.net/.

Kronos Scenario


Note: This scenario and all the people, places, groups, and technologies contained therein are fictitious. Any resemblance to real people, places, groups, or technologies is purely coincidental.

In the roughly twenty years that Tethys-based GAStech has been operating a natural gas production site in the island country of Kronos, it has produced remarkable profits and developed strong relationships with the government of Kronos. However, GAStech has not been as successful in demonstrating environmental stewardship.

In January 2014, the leaders of GAStech are celebrating their newfound fortune as a result of the initial public offering of their very successful company. In the midst of this celebration, several employees of GAStech go missing. An organization known as the Protectors of Kronos (POK) is suspected in the disappearance, but things may not be what they seem.

It is January 21, 2014, and as an expert in visual analytics, you are called in to help law enforcement from Kronos and Tethys to assess the situation and figure out where the missing employees are and how to get them home again. Time is of the essence.

Files


Kronos-Tutorial5.zip - A ZIP file containing:

  • A map of Kronos.
  • A chart describing the local GAStech organization, in PDF format.
  • A spreadsheet of GAStech employee records, in Microsoft Excel format. The primary worksheet contains the data; the index worksheet contains the data dictionary.
  • Email headers from two weeks of internal GAStech company email, in comma-separated values (CSV) format.
  • Resumes and short biographies of many, but not all, of the GAStech employees, in Microsoft Word format.
  • Historical reports and descriptions of the countries involved, in Microsoft Word format.
  • Relevant current and historical news reports from multiple domestic and translated foreign sources, in text file format. Because these articles come from multiple sources and original formats, some of them may contain corrupted characters, which is typical for this type of data. These corrupted characters should not interfere with your ability to analyze the data.
  • A pre-processed Jigsaw file (articles_enriched.jig) ready for import into Jigsaw. This file contains additional pre-identified entities that are visible upon initial import.

Assignment


Explore the dataset provided in the ZIP file above using Jigsaw and answer each of the following questions as accurately as possible:

Q1. Make an alias for the different spellings of the organisation entity “GAStech”, with “GAStech” as the representative entity name.
Q2. Show the expanded circular graph of the least connected person entity, including all of their related entities. Here, “least connected” means the person who appears the fewest times across all documents.
Q3.a Cluster all documents by text and select the largest cluster of documents. What is the main topic of this cluster?
Q3.b Explore the selected cluster in a new Document View.
Q4.a Find the three documents most similar to article id=24, where similarity is based on entities (rather than text).
Q4.b Rank all documents by sentiment (red for negative, blue for positive). Which cluster (by entity) has the most negative sentiment associated with it, and which has the most positive?
Questions about the disappearance of GAStech employees
Q5.a Identify two leaders of POK.
Q5.b Describe potential connections between POK and GAStech.
Q5.c Identify two possible explanations why the GAStech employees may be missing. What evidence do you have to support each of these explanations?
Q6. [Bonus Question] Any additional insight into the Kronos scenario that is supported by evidence generated with Jigsaw will earn bonus points.
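Q2 and Q4.a rest on two definitions: how "connected" an entity is (here, how often it is mentioned across all documents) and document similarity computed from shared entities rather than shared text. The Python sketch below illustrates both definitions on a tiny, made-up document collection; the document ids, entity names and the helpers `least_connected_person`, `entity_similarity` and `most_similar` are all hypothetical. Jaccard overlap of entity sets is used purely as an assumption, and Jigsaw's own entity-based similarity measure may be computed differently, so your actual answers should come from Jigsaw's views.

```python
from collections import Counter

# Hypothetical toy collection: document id -> list of (entity type, entity name)
# mentions. These are made-up examples, NOT data from the Kronos dataset.
DOCS = {
    1: [("person", "Alice"), ("person", "Bob"), ("organization", "GAStech")],
    2: [("person", "Alice"), ("organization", "POK")],
    3: [("person", "Alice"), ("person", "Carol"), ("organization", "GAStech")],
    4: [("person", "Bob"), ("organization", "GAStech")],
}


def least_connected_person(docs):
    """Return the person entity with the fewest mentions across all documents."""
    counts = Counter(
        name
        for mentions in docs.values()
        for etype, name in mentions
        if etype == "person"
    )
    # min() over (count, name) pairs breaks ties alphabetically, so the result is deterministic.
    return min((count, name) for name, count in counts.items())[1]


def entity_similarity(doc_a, doc_b, docs):
    """Jaccard overlap of two documents' entity sets (an assumed similarity measure)."""
    a, b = set(docs[doc_a]), set(docs[doc_b])
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(doc_id, docs, k=3):
    """Rank the other documents by entity-based similarity to doc_id and keep the top k."""
    others = [d for d in docs if d != doc_id]
    return sorted(others, key=lambda d: entity_similarity(doc_id, d, docs), reverse=True)[:k]


print(least_connected_person(DOCS))  # Carol
print(most_similar(1, DOCS))         # [4, 3, 2]
```

With this toy data, Carol is the least connected person (a single mention), and document 4 ranks as most similar to document 1 because it shares two of the three entities in their combined set. Counting mentions rather than distinct documents is itself an assumption; check which count Jigsaw reports when you answer Q2.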

For each of the 6 questions, write a short paragraph that briefly describes: (a) the solution, (b) the exploration process that allowed you to reach the solution, and (c) evidence for the process and the solution in the form of screenshot images from Jigsaw. Each image should have a descriptive caption and should both document the discovery process and serve as evidence for the insight. You may use as many images as you need, but keep the report clear and communicative.

Tips and WARNINGS


  • Note that the Time Line and Calendar views are not available for this dataset. Use other views (e.g., the List View) to explore time-related events.
  • Start the analysis by importing articles_enriched.jig and running Jigsaw's entity recognition (using Illinois-NER). Additional insight can be gained by exploring the other data files, such as the email headers and resumes.

Submitting the Assignment


WHAT - You should submit a single PDF file called "YOUR_NAME-Assignment5.pdf" via email. The PDF file should contain:

  • (a) your name and (b) answers to each of the 6 questions, clearly labeled (Q1 to Q6). Screenshot images required for questions Q1 to Q6 should be embedded in the single PDF file under the corresponding question.

WHERE - You should email the file to nadia.boukhelifa@inria.fr with the subject VA-Assignment5.

WHEN - Remember that Assignment 5 is due before 23:00 on Monday, October 13th.