Objective of the Challenge

As part of this data analysis module, you will explore a new approach to data processing, “Data Visualization”.

Your goal is to analyze an original dataset as a team and tell a story with charts, as you would in a “Data Visualization” (or “DataViz”) competition.

The goal is not to perform complex “mathematical” demonstrations but to tell a story that everyone can understand and find interesting. Pay particular attention, therefore, to the story you tell and to the design of your charts and presentation materials.

Description of the Challenge

Tourism in the Occitanie region in 2018

Throughout the year, thousands of tourists stay overnight in our beautiful region.

Here you will find a unique dataset that locates these tourists and counts their overnight stays. It gives you:

  • The accommodation capacities (hotel, camping, etc.) of each department
  • The origin of the tourists, whether from a French department or abroad
  • The weather and the main cultural events for each day

Some rules of the game

  • You can use any tools you wish to explore the data and build a chart-based visual representation, such as Excel, SPSS, PSPP, Tableau (https://www.tableau.com/), Observable HQ (https://observablehq.com/), or any Python library, as well as any presentation medium for your results, such as PowerPoint, Canva, or Adobe PDF.

  • You must provide a list of the tools used to create the charts. You can use any type of data analysis tool.
  • You can perform all types of calculations based on this dataset
  • The format of the visual analysis will be in PDF and should not exceed the equivalent of 2 A4 pages or 3 screenshots if the creation is on the web.
  • You will add all the necessary contextual elements to comment on the chart(s).
  • You are not required to use all the data.
  • Apart from base maps, you are not allowed to use data other than those provided.

Data for the challenge

  • The overnight stay volumes were constructed by a mobile phone operator from phone call data. These data were provided by the Regional Tourism Committee (CRT).
  • The data regarding accommodation capacities were constructed by TDV from data provided by the Regional Tourism Committee (CRT).
  • The data regarding events were constructed by TDV from data provided by the Regional Tourism Committee (CRT).
  • The weather data come from a website providing historical weather data for many cities in France and around the world.
  • The geometry data of the departments are included only in the geojson file. This format is suitable for those who wish to use mapping tools such as the free software QGIS or JavaScript libraries such as d3.js.
  • The cell phone location data are not raw data but the result of an innovative processing work (adjustment, segmentation, anonymization) carried out by the telephone operator with the participation of tourism stakeholders. The “volume of overnight stays” data are therefore statistical estimates.
  • The datasets are usable in this framework following the agreement of Mr. Alain Otteinheimer, President of the Toulouse Dataviz association, director of DataSens.
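
For those who would rather inspect the geojson file mentioned above in plain Python before opening QGIS or d3.js, here is a minimal sketch using only the standard library. The sample text is a made-up feature, not an extract of the real file, and the property names ("dep", "nom_dep") are hypothetical; check the repository README for the actual ones.

```python
import json

# Tiny stand-in for the geojson file: one made-up feature, not real data.
# The property names ("dep", "nom_dep") are hypothetical.
sample_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"dep": "31", "nom_dep": "Haute-Garonne"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[1.0, 43.0], [2.0, 43.0], [2.0, 44.0], [1.0, 43.0]]]
      }
    }
  ]
}
"""


def summarize_features(geojson_text):
    """Return (properties, geometry type, points in outer ring) per feature."""
    collection = json.loads(geojson_text)
    summary = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        # For a Polygon, coordinates[0] is the outer ring.
        if geometry["type"] == "Polygon":
            outer_ring = geometry["coordinates"][0]
        else:
            outer_ring = []
        summary.append((feature["properties"], geometry["type"], len(outer_ring)))
    return summary


features = summarize_features(sample_geojson)
for properties, geometry_type, n_points in features:
    print(properties, geometry_type, n_points)
```

The real file probably contains other geometry types as well (for example MultiPolygon), which this sketch only reports by name.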

The exhaustive description of the data can be found in the following GitHub repository: https://github.com/ToulouseDataViz/Hackaviz2020/blob/master/README.md

The data include several files: Download the data

Synthetic and easy-to-access data: Nuitées.xls and .CSV

  • 365 lines and 15 columns
  • Overnight stays per day summarized by department

The most detailed data, but not the easiest to use: par_origines.xlsx and .csv

  • 493,235 lines and 8 columns
  • Per day with all the details

Capacities crossed with overnight stays, an optional complement to the other files:

  • capacites.xlsx, .csv, and .geojson
  • 13 lines and 61 columns
  • Per week in categories of overnight stays by department

It is possible to create beautiful visualizations from just one of these three data files, the simplest being nuitées which is an aggregate of par_origines.

More experienced teams may manage to combine all three, but the most beautiful story does not necessarily need all of this data.

The important thing is to tell a beautiful story with quality charts.

Details of the Files and Download

Nuitées

Aggregation of data from the par_origines file. For each day of the year (365 lines / 15 columns):

  • Date
  • Number of overnight stays in department 09
  • Number of overnight stays in department 11
  • Number of overnight stays in department 12
  • Number of overnight stays in department 30
  • Number of overnight stays in department 31
  • Number of overnight stays in department 32
  • Number of overnight stays in department 34
  • Number of overnight stays in department 46
  • Number of overnight stays in department 48
  • Number of overnight stays in department 65
  • Number of overnight stays in department 66
  • Number of overnight stays in department 81
  • Number of overnight stays in department 82
  • Number of overnight stays in the Occitanie region
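
As a first sanity check on a file with this layout, one can verify that the regional column equals the sum of the department columns and find the busiest day. The sketch below uses only the standard library and a made-up extract with 3 of the 13 departments; the column names (`date`, `dep_09`, `occitanie`, and so on) and all figures are hypothetical, and whether the equality actually holds in the real file is something to check, not a given.

```python
import csv
import io

# Made-up extract with 3 of the 13 departments; column names and
# values are hypothetical, not taken from the real file.
sample_csv = """date,dep_09,dep_31,dep_34,occitanie
2018-07-14,100,300,600,1000
2018-07-15,120,310,650,1080
2018-07-16,90,280,500,870
"""


def load_rows(text):
    """Parse the CSV text, converting every column except the date to int."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"date": record["date"]}
        for name, value in record.items():
            if name != "date":
                row[name] = int(value)
        rows.append(row)
    return rows


def total_matches_departments(row, total_column="occitanie"):
    """Check that the regional column equals the sum of the department columns."""
    departments = sum(
        value for name, value in row.items() if name not in ("date", total_column)
    )
    return departments == row[total_column]


rows = load_rows(sample_csv)
consistent = all(total_matches_departments(row) for row in rows)
peak_day = max(rows, key=lambda row: row["occitanie"])["date"]
print(consistent, peak_day)
```

If the real values are not whole numbers, `int` would need to be replaced by `float` and the equality by a tolerance check.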

par_origines

For each day of the year 2018 (532,399 lines / 8 columns):

  • Date
  • Department or country of origin of the tourists
  • Destination department in Occitanie
  • Volume of overnight stays in the destination department
  • Status of the holidays of the department of origin:
    • 0: not on vacation,
    • 1: on vacation,
    • 2: not specified
  • Noon temperature (solar) of the destination department
  • Qualitative status of the weather in the destination department:
    • 0: very unfavorable weather,
    • 1: unfavorable weather,
    • 2: correct weather,
    • 3: favorable weather,
    • 4: ideal weather
  • Number of major events in the destination department
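
Since nuitées is described as an aggregate of par_origines, the daily totals per destination department can be rebuilt by summing the volumes over all origins. Here is a minimal standard-library sketch of that aggregation; the column names (`date`, `origine`, `dep_destination`, `volume`) and the figures are hypothetical, and only four of the eight columns are shown.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the spirit of par_origines; column names and values
# are hypothetical, and only 4 of the 8 columns are included.
sample_csv = """date,origine,dep_destination,volume
2018-08-01,75,31,120
2018-08-01,GB,31,30
2018-08-01,75,34,200
2018-08-02,69,31,80
2018-08-02,ES,34,50
"""


def aggregate_by_day_and_destination(text):
    """Sum the overnight-stay volume over all origins for each (date, destination)."""
    totals = defaultdict(int)
    for record in csv.DictReader(io.StringIO(text)):
        key = (record["date"], record["dep_destination"])
        totals[key] += int(record["volume"])
    return dict(totals)


totals = aggregate_by_day_and_destination(sample_csv)
for (date, destination), volume in sorted(totals.items()):
    print(date, destination, volume)
```

Grouping on the origin column instead of the destination would give the breakdown of visitors by where they come from, which is the detail that only this file provides.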

capacités

For each department (13 lines / 61 columns):

  • Department
  • Name of the department
  • Population of the department
  • Number of places (people) in collective accommodation
  • Number of places (people) in rental accommodation
  • Number of places (people) in outdoor accommodation
  • Number of places (people) in hotel accommodation
  • Total number of places (people)
  • Number of overnight stays for each week, from week 1 to week 53 (one column per week)
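
One possible way to cross capacities with overnight stays is to divide a week's overnight stays by the number of person-nights available that week (total places × 7). The sketch below illustrates this with invented figures; the function name, the dictionary layout, and the numbers are all hypothetical. The ratio is only indicative: the estimated overnight stays may include stays outside the listed accommodation, so it is not guaranteed to stay below 1.

```python
def weekly_occupancy(overnight_stays, total_places, nights_per_week=7):
    """Share of the available person-nights that were used during one week."""
    available = total_places * nights_per_week
    if available <= 0:
        raise ValueError("total_places and nights_per_week must be positive")
    return overnight_stays / available


# Hypothetical department: invented capacity and weekly overnight stays.
department = {
    "name": "Example department",
    "total_places": 10_000,
    "weekly_stays": {1: 14_000, 30: 56_000},
}

occupancy = {
    week: weekly_occupancy(stays, department["total_places"])
    for week, stays in department["weekly_stays"].items()
}

for week, ratio in sorted(occupancy.items()):
    print(f"week {week}: {ratio:.0%}")
```

Comparing this ratio across departments or across weeks is one way to turn the capacités file into a story about seasonal pressure on accommodation.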

Additional data:

Coding of departments, coding of countries, and list of events.

Examination Modalities

Your work will be evaluated in one of two ways, at your choice:

  • An oral presentation by group of a maximum duration of 10 minutes
  • OR
  • A video presentation of your DataViz, including your comments, of a maximum duration of 10 minutes, to be submitted in the submission area of this page no later than the evening before the exam date. The video submission area will be opened later.

In both cases, the oral presentation or the viewing of the video will be followed by questions for a maximum duration of 5 minutes.

Evaluation Criteria

The works will be evaluated according to different criteria, including the following:

Evaluation of Visualizations

Criteria (points):

  • Ability of the visualization to clarify the data: 5
  • Ability of the visualization to be easily understood: 5
  • Choice of colors appropriate to the message to be conveyed: 5
  • Ability of the visualization to faithfully transcribe the data (choice of scales or addition of effects that could mislead the audience): 5

Evaluation of the Oral Presentation

Criteria (points):

  • Ability to enhance the subject (dynamism of the presentation, ability to arouse interest): 5
  • Quality of the presentation materials (care taken in producing them): 5
  • Quality of the responses to questions: 5
  • Ability to explain the work carried out: 5

Please note, this is not a “statistical performance” but rather a test of creativity, originality, and the search for the best way to “illuminate” the data. If you attempted a complex analysis without succeeding, still present, at the end of your presentation, what you wanted to do and how you went about it.

Oral Presentation Times

The times for oral presentations will be defined later.

Sources of Inspiration

https://www.dataviz-inspiration.com/

https://www.data-to-viz.com/

https://datavizproject.com/

https://www.awwwards.com/websites/data-visualization/

https://viz.wtf/

Some Tools

https://www.tableau.com/fr-fr/academic/teaching

https://observablehq.com/pricing

Python and Some Libraries

https://www.python.org/

https://geopandas.org/en/stable/

https://python-visualization.github.io/folium/

https://pandas.pydata.org/

https://matplotlib.org/

https://seaborn.pydata.org/

Some Tutorials 😉

Google Colaboratory & Pandas

Drawing a map in Python

Drawing a Sankey in Python

Have fun!

This challenge is published with the permission of the Toulouse Dataviz association (https://toulouse-dataviz.fr/)