Forensic Email Visualization
Today the volume of E-mail communication is immense and will keep growing over
the next few years. As a result, it is becoming more and more difficult for
forensic examiners to get an impression of a suspect E-mail communication, to
identify the main communication partners, and to recognize patterns in the
communication.
As part of a bachelor thesis, an E-mail analysis tool was developed that helps
investigators get a quick and descriptive overview of suspect E-mail accounts.
It presents its results as a responsive and interactive graph visualization,
supported by several statistics about the mail account.
2. The tool and how to use it
The developed tool can be divided into several parts and modules:
Fig 1: Structure of the tool.
The tool currently supports three different mailbox formats: MBOX, PST and OST.
The user can pass multiple files of each format via the command line:
$ python main.py [-mbox <file1, file2, …>] [-pst <file1, file2, …>] [-ost <file1, file2, …>]
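The following is a minimal sketch of how such a command line could be parsed
with Python's standard argparse module. It is not taken from the tool itself:
the function name build_parser is hypothetical, and the sketch assumes that
file names are separated by spaces rather than commas.

```python
import argparse


def build_parser():
    """Build a parser with one optional, multi-valued flag per mailbox format."""
    parser = argparse.ArgumentParser(prog="main.py")
    for fmt in ("mbox", "pst", "ost"):
        # Single-dash long options such as -mbox are accepted by argparse.
        # nargs="+" collects one or more file names into a list.
        parser.add_argument(
            f"-{fmt}",
            nargs="+",
            default=[],
            metavar="FILE",
            help=f"one or more {fmt.upper()} files to analyze",
        )
    return parser


# Example invocation with two MBOX files and one PST file (made-up names).
args = build_parser().parse_args(["-mbox", "a.mbox", "b.mbox", "-pst", "c.pst"])
print(args.mbox, args.pst, args.ost)
# prints ['a.mbox', 'b.mbox'] ['c.pst'] []
```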
The next step is to parse these files to get access to their content. For this
purpose, a parsing module called “unboxer” is implemented. The data
resulting from this step is saved into an SQLite database and handed over to
the “processing” functions, where the data set is filtered, cleaned and
transformed into JSON data structures.
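The parse, store and export steps can be illustrated for the MBOX case with
Python's standard library alone. This is only a sketch under assumptions, not
the code of the “unboxer” module: the table layout, the function names
store_mbox and export_json, and the choice of stored headers are all
hypothetical. PST and OST files need a third-party parser and are not covered.

```python
import json
import mailbox
import sqlite3
from email.utils import getaddresses

COLUMNS = ("sender", "recipient", "date", "subject")


def header_values(msg, name):
    """Return all values of one header as plain strings."""
    return [str(value) for value in msg.get_all(name, [])]


def store_mbox(mbox_path, conn):
    """Parse one MBOX file and store one row per sender/recipient pair."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mails "
        "(sender TEXT, recipient TEXT, date TEXT, subject TEXT)"
    )
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            senders = getaddresses(header_values(msg, "From"))
            recipients = getaddresses(
                header_values(msg, "To") + header_values(msg, "Cc")
            )
            for _, sender in senders:
                for _, recipient in recipients:
                    if not sender or not recipient:
                        continue  # skip headers without a usable address
                    conn.execute(
                        "INSERT INTO mails VALUES (?, ?, ?, ?)",
                        (
                            sender.lower(),
                            recipient.lower(),
                            str(msg.get("Date", "")),
                            str(msg.get("Subject", "")),
                        ),
                    )
    finally:
        box.close()
    conn.commit()


def export_json(conn):
    """Serialize all stored rows as a JSON list of objects."""
    rows = conn.execute(
        "SELECT sender, recipient, date, subject FROM mails ORDER BY rowid"
    )
    return json.dumps([dict(zip(COLUMNS, row)) for row in rows])
```

A caller would open a database with sqlite3.connect(...), call store_mbox once
per input file, and pass the result of export_json to the web front end.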
After that, the application needs an interface through which the examiners can
interact with the data. Because of the many new possibilities HTML offers, we
decided to move the whole interaction into a web-based framework.
2.1 The User Interface
The user interface consists of two pages. The first one is a small front page
with general information, and the second offers the actual interaction by
showing graphs and charts. That page is divided into two panels: on the left, we
offer several responsive diagrams and charts for investigating general meta
information about the mail account. The following charts are included:
* By datetime
* By weekday
* Top-15 sender addresses
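The data behind these three charts can be derived with simple counting. The
sketch below is an illustration only: the function name meta_statistics and the
input format, a list of (sender, datetime) pairs, are assumptions and do not
describe the tool's actual data model.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def meta_statistics(mails, top_n=15):
    """Count messages per day, per weekday and per sender address."""
    by_date = Counter(sent.date().isoformat() for _, sent in mails)
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    by_weekday = Counter(WEEKDAYS[sent.weekday()] for _, sent in mails)
    top_senders = Counter(sender for sender, _ in mails).most_common(top_n)
    return {
        "by_date": dict(by_date),
        "by_weekday": dict(by_weekday),
        "top_senders": top_senders,
    }


# Made-up sample data: three messages from two senders.
sample = [
    ("a@example.org", datetime(2024, 1, 1, 9, 0)),
    ("a@example.org", datetime(2024, 1, 2, 9, 0)),
    ("b@example.org", datetime(2024, 1, 8, 9, 0)),
]
stats = meta_statistics(sample)
```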
The right panel is the centerpiece of the application: an undirected graph that
spans the whole communication of the mail accounts. Addresses are represented
as nodes, and an edge indicates communication between two parties. The graph
additionally offers the following features:
* Quick overview of the main communication parties: the thicker and shorter an edge, the more was communicated
* Limit the time interval
* Get more information about an address by hovering and clicking
* Access the full content of single messages
* Merge two nodes that are owned by the same person
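Three of these features (edge weight, time interval and node merging) can be
expressed as one small aggregation step. The sketch below is an assumption
about how this could be done, not the tool's implementation: build_edges, its
parameters and the alias mapping are hypothetical names.

```python
from collections import Counter
from datetime import datetime


def build_edges(mails, start=None, end=None, aliases=None):
    """Aggregate (sender, recipient, datetime) triples into undirected edges.

    start, end: optional bounds of the time interval (inclusive).
    aliases:    maps an address to the address it should be merged into.
    Returns a Counter that maps a sorted address pair to its message count.
    """
    aliases = aliases or {}
    weights = Counter()
    for sender, recipient, sent in mails:
        if start is not None and sent < start:
            continue
        if end is not None and sent > end:
            continue
        # Merging: replace each address by its canonical node, if one is set.
        a = aliases.get(sender, sender)
        b = aliases.get(recipient, recipient)
        if a == b:
            continue  # mail between two addresses of the same person
        # Sorting the pair makes the edge undirected.
        weights[tuple(sorted((a, b)))] += 1
    return weights


# Made-up sample data: Alice uses two addresses that are merged into one node.
sample = [
    ("alice@work.example", "bob@example.org", datetime(2024, 1, 1)),
    ("bob@example.org", "alice@home.example", datetime(2024, 1, 2)),
    ("carol@example.org", "bob@example.org", datetime(2024, 3, 1)),
]
edges = build_edges(
    sample,
    end=datetime(2024, 1, 31),
    aliases={"alice@home.example": "alice@work.example"},
)
```

A front end could then map a higher count to a thicker stroke and a shorter
link distance, which matches the visual encoding described above.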
Fig 2: Investigator Panel: The Meta Panel is located on the left and includes
several responsive charts and diagrams that visualize important meta
information. On the right is the centerpiece of the application: an
undirected graph that displays the whole communication.
Fig 3: Meta Panel: Several responsive charts displaying meta information. When the
user filters on one dimension (e.g. by weekday), all other charts are updated to
that specific subset.
Fig 4: Exploration Panel: Undirected graph that represents the whole
communication. Addresses are displayed as nodes, and an edge indicates
communication between two parties. The thicker and shorter an edge, the more
was communicated. This arrangement allows the user to identify the main
communication partners at first glance.
This tool was developed by Johannes Stadlinger in his bachelor thesis, supervised by Andreas Dewald. Please do not hesitate to contact us.