by Bertrand de Véricourt
Here’s what happened last year: in one of the poorly heated practical-work rooms at Télécom-ParisTech, four guys and I decided to work on predicting our future president.
OK, it looks crazy. But we realized that somebody in the USA had achieved great results relying mainly on polls: Nate Silver had predicted Obama’s victories in 2008 (he was right for 49 states out of 50) and in 2012 (he was right for… all states).
But then, badaboum. A few weeks after our decision, the guy who was supposed to lose, with a 71.4% probability according to N. Silver, well, he actually won! And that was no fake news: Trump seemed to have indirectly jeopardized our project.
But of course we knew there is no magic here: the best we can do is gather results, polls and socioeconomic data from previous elections… and also collect data from Google, Twitter and Facebook to improve on what the first data say!
These data cover many more people than polls do. They have their disadvantages (e.g. the dominant social class on French Twitter is “CSP+”, the “privileged socioeconomic category”, and people are not answering forms designed especially for us), but they also have their strengths (data arrive at any time, from loads of sources, and carry other kinds of information).
So now you understand why we went on with the project, which we call Predict The President.
Below is another quick, more formal presentation. Don’t hesitate to tell us whether it is clear or not, as we are using it to compete in the Datajournalism Awards 2017. Thanks!
Our project aims to bring data-driven insights into the 2017 presidential campaign in France.
It is divided into three axes:
- get insights into the campaign from the internet, especially social networks
- gather socioeconomic and polling data to analyze the campaign
- make predictions with data science
This is a combo project that brings together server architecture, scraping, industrialization, dataviz, data science and storytelling skills. We did nearly everything from scratch.
What makes this project innovative? What was its impact?
This project is innovative because it brings data science, data visualization and journalism together in one place.
But above all, I see it as innovative because it mixes social network data, polls and socioeconomic data to make a prediction.
We try not to repeat the experience of Nate Silver, who worked mostly with polls, and instead widen the range of our voter “sensors”.
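To give a rough idea of what mixing several “sensors” can look like, here is a minimal Python sketch. It is an illustration only, not the project’s actual model: the function names, the candidates “A”, “B” and “C”, the weights and every number are made up for the example, and a plain weighted average is simply the easiest blending rule to show.

```python
def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {name: value / total for name, value in scores.items()}


def blend_signals(signals, weights):
    """Weighted average of several signals, per candidate.

    signals: {signal_name: {candidate: raw_score}}
    weights: {signal_name: weight}, one entry per signal;
             weights are rescaled so that they sum to 1.
    """
    norm_weights = normalize(weights)
    candidates = set()
    for scores in signals.values():
        candidates.update(scores)
    blended = {name: 0.0 for name in candidates}
    for signal_name, scores in signals.items():
        shares = normalize(scores)  # turn raw counts into shares
        for name in candidates:
            blended[name] += norm_weights[signal_name] * shares.get(name, 0.0)
    return blended


# Made-up numbers, for illustration only.
signals = {
    "polls": {"A": 40, "B": 35, "C": 25},                  # voting intentions (%)
    "twitter_mentions": {"A": 1000, "B": 3000, "C": 1000},  # raw mention counts
}
weights = {"polls": 0.75, "twitter_mentions": 0.25}  # hypothetical trust levels

blended = blend_signals(signals, weights)
for name in sorted(blended):
    print(f"{name}: {blended[name]:.1%}")
```

With these toy inputs, candidate B moves ahead of A once the Twitter signal is blended in, which shows how much the choice of weights matters: they would have to be calibrated on past elections rather than picked by hand.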
For the left-wing primary in January 2017, Twitter and Google data curves were an interesting way to spot trends, which were indeed confirmed.
This was reported in two articles in the magazine Le Point, with which we have been in contact.
More articles are to come in April and May, with more depth on both the journalistic and the data science sides.
In the meantime, we are working on various data visualizations, fed with daily refreshed data, for the campaign dashboard. Some are already available here: lepoint.fr/presidentielle.