Ten things I learnt at the 2018 Melbourne Datathon

The Melbourne Datathon ran over two months and involved two hack days. It concluded during Melbourne Data Science Week, which included the Hackers Helping Melbourne Symposium, where the finalists for the Insights competitions (student teams and mixed teams) were asked to present their findings.

From the Data Rules submitted slideset

The winners of the Insights and the Data2App competitions were then announced. As a semi-retired health professional and app developer, I am not the usual run-of-the-mill person who takes part in the Melbourne Datathon. When I asked my friends and family what issues they had with the Melbourne public transport system, they wanted to know why I was doing this. But to me it was a chance to learn something new and to do something that might benefit Melbourne.

1. Technically speaking

I got to work with a huge dataset. The initial dataset was about 400 million records of Myki touch-ons and touch-offs, but in the end we had access to 1.8 billion. We were also given a dataset from Intelematics with average speeds for each hour of the working week at 7000 locations around Melbourne.

Created from the Intelematics data

On the hack days we had sessions teaching us how to use mapping and Amazon Web Services (AWS), how to create a web app and much more. Help was provided, but we needed to figure things out for ourselves. After all, Data Scientists need to be self-driven.

Attending the first Hackday *

2. Start your slides earlier

Our team decided to look at congestion, but we struggled to work out how to do this with the dataset and to find an angle that PTV could not already see by other methods. We eventually found our angle, but it was not about congestion. We also looked at the usage of trams, trains and buses over the three financial years. If we had started our slides earlier, we would have realised the benefits of investigating this further and in different ways.

3. Use another dataset

Several of the finalists used publicly available datasets to provide a different angle. For example, one team developed a safety rating that was related to individual trips.

Another team highlighted areas of Melbourne that were not well served by public transport, showing that people travelling from these areas had slow journeys because they had to transfer between different modes of transport. Very clever.

Created from the PTV data

4. Relate it to specific people

We were told to ‘tell a story’, and several finalists did this by moving from an overall analysis to speaking about specific people or groups of people using the public transport system.

As the Symposium was held the week before the grand final, it was interesting to see three finalists use the data to analyse footy crowds in different ways. One team gave us an in-depth analysis of which train lines the supporters of each team lived on. Hopefully PTV can use this to schedule trains based on the teams playing next year.

Grand Final 2018. No I am not over it

5. More pictures

Looking at what the finalists produced, we needed to show our information in more interesting ways. More eye-catching. Sparkle as well as substance. This comes back to starting our presentation slides earlier.

From the Data Rules submitted slideset

6. It was a lot of hard work

Of the 212 teams (more than 1000 people) who registered, only 40 teams (161 individuals) submitted. One reason was that even the initial 400 million rows were too much for laptops and home computers to handle all at once. Some teams were from large companies that may have had more resources to throw at it, and many teams received credit from AWS. Our team did not get the $500 credit through the datathon, but a big thank you to AWS from me for coming through with $100 credit when my usage exceeded the free tier. Even so, there was still a large amount of work in loading, cleaning, understanding and manipulating the data to find an interesting perspective that could be developed into an insight and/or app.
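
For anyone who hits the same wall, one common way around a file that will not fit in memory is to read it a row at a time and keep only a running summary. The snippet below is a minimal sketch of that idea in Python, not the approach any team actually used. The column names (card_id, stop_id) and the function count_touch_ons are made up for illustration and are not the real Myki schema.

```python
import csv
import io
from collections import Counter


def count_touch_ons(lines, stop_column="stop_id"):
    """Count rows per stop while holding only one row in memory at a time."""
    counts = Counter()
    # csv.DictReader pulls rows lazily from any iterable of text lines,
    # so this works the same way on an open file of any size.
    for row in csv.DictReader(lines):
        counts[row[stop_column]] += 1
    return counts


# Tiny in-memory stand-in for a very large file; the columns are hypothetical.
sample = io.StringIO(
    "card_id,stop_id\n"
    "1,Flinders Street\n"
    "2,Richmond\n"
    "3,Flinders Street\n"
)
touch_on_counts = count_touch_ons(sample)
print(touch_on_counts.most_common(1))
```

With a real file, the same function could be handed an open file object instead of the sample, since the reader only ever asks for the next line.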

Taken at the Symposium

7. Not the only imposter

One of the talks at the Hackers Helping Melbourne Symposium was called Data Science Imposter Syndrome. Having worked in medical research for the last 10 years, I have only recently been introduced to Data Science as a profession. This talk resonated with me, and looking back at my history with clinical trials and data registries, I realise that I have really been dipping my toe into Data Science for quite a while.

8. Planning, planning, planning

We so ran out of time. For the submissions we had to speak over our presentation slides to make a 3 minute video. Our team should have practised this in advance. Technically I had done this before, and when the first recording was too long I could have put it into a movie making program and sped it up (removing the umms) rather than re-recording! Something to remember next time!

Created from the PTV data

It is hard with volunteer projects such as a datathon, because the work has to be secondary to paid employment or study. Time management is so much more important. Our team ran out of time analysing the data and so did not produce an app for the Data2App competition, something I would have loved to do. Maybe that competition will be run next year. I hope so.

9. Stop worrying about others being more experienced 

Looking at the presentations, all teams produced great insights. However, the student teams stood out on the day for the way they presented their work. One of the benefits for people submitting to the competitions was the possibility that they could submit CVs and be considered for internships or jobs by some of the large companies sponsoring the event. In fact, after the winners were declared, one company announced extra internships, obviously snapping up a couple of very promising interns.

The Finalists. *

10. You don’t have to win to be a winner

Congratulations to the winners, your hard work really paid off. Congratulations also to all the finalists, you did a fantastic job. But congratulations, too, to everyone who submitted! Not only did you overcome the obstacles of this huge dataset, but you managed to put in the time and effort to produce something worthwhile. AND I bet you learnt a huge amount. BUT most importantly you showed you can overcome obstacles and deliver. That is something that will stand you in good stead now and in the future, whether you are a student, working as a Data Scientist, or like me just doing it for fun.

A big thank you to the sponsors, Phil Brierley and the team for putting on this datathon. Also thank you to PTV and Intelematics for providing the datasets. 

by Maree Williams (Team Data Rules)

*Photos courtesy of Melbourne Datathon

There are videos of the presentations by the finalists for the Insights competition posted by Woratana Perth

Insight Pitches 0/2

Insight Pitches 1/2

Insight Pitches 2/2