
Data Visualization in Tableau

January 20, 2020
Click to view live version
Due to the limitations on the number of rows that can be uploaded to Tableau Public, this only shows data from one station (MIT at Mass Ave. / Amherst St. Station).

In this data visualization project, we used Tableau to create an interactive dashboard showing various aspects of bike trips in Boston, using trip data from BLUEbikes.

Brainstorming

At the beginning, we had many different ideas, ranging from an interactive flow map of all rides for all stations across all available years to very advanced 3D flow maps. In the end, we had to restrict ourselves to a realistic amount of work and a dashboard concept that was detailed but still visually clean. Because of the volume of the BLUEbikes trip data, we used only the trips from the top 10 stations for 2017 and 2018. We decided to focus on the time aspect of the data and explore how trips are distributed across seasons, months, and days of the week.

We also decided to present our visualization as two Tableau dashboards. The first has a flow map as the main display, with additional elements on the side that change dynamically as the viewer interacts with the dashboard. The second is a dynamic time series dashboard that shows the trip information for each day of the two years. The focus of both dashboards is an intuitive, in-depth exploration of the data through a variety of demographic and user-centric views.

Data Preprocessing

We downloaded all monthly datasets for 2017 and 2018, which gave us 24 CSV files to combine. We used the Windows command prompt to automate this process. We grouped the CSV files by season (spring, summer, autumn, and winter) because we did further preprocessing and data cleaning in Excel, and an Excel worksheet can hold at most 1,048,576 rows, a limit we could not exceed.
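The combining step can also be sketched in Python rather than the Windows command prompt we actually used. This is a minimal sketch under several assumptions: each monthly file name starts with `YYYYMM`, all files share the same column order, and seasons follow the meteorological convention (December to February as winter, and so on). The function name `combine_by_season` is hypothetical.

```python
import csv
from pathlib import Path

# Assumed month-to-season mapping (meteorological seasons).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def combine_by_season(input_dir, output_dir):
    """Merge monthly CSV files into one file per season.

    Returns the number of data rows written per season, so the totals
    can be checked against the Excel worksheet limit.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    row_counts = {}
    for path in sorted(Path(input_dir).glob("*.csv")):
        month = int(path.name[4:6])  # assumes the name starts with YYYYMM
        season = SEASON_BY_MONTH[month]
        target = output_dir / f"{season}.csv"
        with open(path, newline="", encoding="utf-8") as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:
                continue  # skip empty files
            first_file = season not in row_counts
            mode = "w" if first_file else "a"
            with open(target, mode, newline="", encoding="utf-8") as dst:
                writer = csv.writer(dst)
                if first_file:
                    # Write the header once per season file.
                    writer.writerow(header)
                    row_counts[season] = 0
                for row in reader:
                    writer.writerow(row)
                    row_counts[season] += 1
    return row_counts
```

The input and output directories should be different folders, otherwise a second run would pick up the season files as input.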

We also standardized the data formats, since the trip data changed format after March 2018, when Hubway became BLUEbikes. For instance, the date format used by Hubway differs from the one used by BLUEbikes. In addition, the longitude and latitude fields were initially stored as strings, so we had to convert them to decimals and assign them geographic roles in Tableau.
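As an illustration of this kind of standardization, here is a minimal Python sketch. We did this step in Excel and Tableau, so the code is only an equivalent outline. The timestamp patterns in `CANDIDATE_FORMATS` are assumptions chosen for the example, not the exact formats found in the Hubway and BLUEbikes files, and both function names are hypothetical.

```python
from datetime import datetime

# Assumed timestamp patterns; adjust them to the formats actually present.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S.%f",
    "%m/%d/%Y %H:%M",
)


def parse_timestamp(text):
    """Try each known pattern in turn and return a datetime."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {text!r}")


def parse_coordinate(text):
    """Convert a latitude or longitude stored as a string to a float."""
    return float(text.strip())
```

Raising an error on an unknown pattern, instead of silently returning an empty value, makes it easier to notice a third format that was not anticipated.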

We also created several calculated fields in Tableau for the additional data that we wanted to display in our dashboard. First, we converted the trip duration, which is originally given in seconds, to hours, which is more intuitive for users. We also calculated a field that indicates the generation of each user based on their birth year. We initially had five generation categories: Silent Generation, Baby Boomers, Gen X, Millennials, and Gen Z. We ended up removing the Silent Generation, since riders of that age are unlikely to use the bikes and these records are probably anomalous. Next, we calculated a field for the season so we could group the trips into winter, spring, summer, and autumn. We calculated another field for the day of the week on which a trip took place so we could visualize and compare the trip distribution on workdays and weekends. Lastly, we added a Trip ID field, which is crucial for creating a flow map in Tableau.
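The logic behind these calculated fields can be sketched in Python as follows. The real fields were written as Tableau calculations, so this is only an equivalent outline. The birth-year cutoffs for the generations and the month-to-season mapping are assumptions (commonly used boundaries), and the function names are hypothetical.

```python
from datetime import date


def duration_hours(duration_seconds):
    """Convert a trip duration from seconds to hours."""
    return duration_seconds / 3600


def generation(birth_year):
    """Map a birth year to a generation label (assumed cutoffs)."""
    if birth_year <= 1945:
        return "Silent Generation"
    if birth_year <= 1964:
        return "Baby Boomers"
    if birth_year <= 1980:
        return "Gen X"
    if birth_year <= 1996:
        return "Millennials"
    return "Gen Z"


def season(month):
    """Map a month number (1-12) to a season (assumed meteorological seasons)."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Autumn"


def day_type(day):
    """Classify a date as a workday or a weekend day."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return "Weekend" if day.weekday() >= 5 else "Workday"
```

For example, `day_type(date(2017, 4, 23))` classifies 23 April 2017, a Sunday, as a weekend day.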

Data Visualization

The first dashboard features a flow map together with other time-related elements that we derived from the data, so we needed to prepare our worksheets accordingly. One important part of the dashboard is a display in which each of our top 10 stations is sized in proportion to its total number of trips. The stations are clickable and serve as an interactive filter for the rest of the dashboard.
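The ranking behind a top 10 stations display can be sketched as a simple count. This sketch assumes that a station's trips are counted by start station and that each trip is a dictionary with a hypothetical `start_station_id` key. In the project itself, the counting was done inside Tableau.

```python
from collections import Counter


def top_stations(trips, n=10):
    """Return the n busiest start stations as (station_id, trip_count) pairs."""
    counts = Counter(trip["start_station_id"] for trip in trips)
    return counts.most_common(n)
```

The resulting counts are the values that would drive the proportional sizing of the stations.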

Another important element is our overview of trip duration (color) and total number of trips (block size) by month and weekday. Because we included two years of data, some clear differences and tendencies emerge. For example, weekdays and weekends differ clearly, and more so in total trips than in trip duration.
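The aggregation behind such a month-by-weekday overview can be sketched as follows. It assumes that the color encodes the mean trip duration in hours (the aggregation could equally be a total) and that each trip is a `(start_time, duration_seconds)` pair. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def summarize_by_month_and_weekday(trips):
    """Count trips and average their duration per (month, weekday) cell.

    Months are numbered 1-12; weekdays run from 0 (Monday) to 6 (Sunday).
    """
    totals = defaultdict(lambda: [0, 0.0])  # cell -> [trip count, total hours]
    for started_at, duration_seconds in trips:
        cell = (started_at.month, started_at.weekday())
        totals[cell][0] += 1
        totals[cell][1] += duration_seconds / 3600
    return {
        cell: {"trips": count, "mean_hours": hours / count}
        for cell, (count, hours) in totals.items()
    }
```

Each cell of the result corresponds to one block in the overview: `trips` would set the block size and `mean_hours` the color.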

We grouped the ages of users into typical generation classifications, as described in the Data Preprocessing section. The size of each circle is proportional to the total number of trips by that generation. Within each circle, we can see the proportions of male users, female users, and users who did not provide a gender.

Flow Map

To build the flow map, we based our approach on the Connecting the Dots approach by Konstantin Greger, which uses both the trip data and the station data.

First, we did a self-union in Tableau, which essentially duplicates the trip records so that each trip has one record for its origin and one for its destination, linked by Trip ID. This step automatically adds a generated field, [Table Name], which we then used to create a join calculation that joins the trips to the stations using the Trip ID. We also generated another field, Path ID, which has a value of 1 for the start station and 2 for the end station.

After joining, the trips were drawn as lines using the Path ID, with relatively low opacity and a thin line weight so that the map is less cluttered when all trips are shown.
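The data shape produced by the self-union and join can be sketched outside Tableau as follows. This is not the Tableau implementation, only an illustration of the resulting table: every trip becomes two rows, one per endpoint, each carrying the coordinates of its station. The field names (`trip_id`, `start_station_id`, `end_station_id`) and the function name are hypothetical.

```python
def build_path_rows(trips, stations):
    """Expand each trip into two path rows for a flow map.

    trips: list of dicts with trip_id, start_station_id, end_station_id.
    stations: dict mapping station_id to a (latitude, longitude) pair.
    """
    rows = []
    for trip in trips:
        # Path ID 1 marks the start station, Path ID 2 the end station.
        endpoints = ((1, trip["start_station_id"]), (2, trip["end_station_id"]))
        for path_id, station_id in endpoints:
            latitude, longitude = stations[station_id]
            rows.append({
                "trip_id": trip["trip_id"],
                "path_id": path_id,
                "station_id": station_id,
                "latitude": latitude,
                "longitude": longitude,
            })
    return rows
```

A line for one trip is then drawn by connecting its two rows in Path ID order.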

Time-Series Dashboard

For the second dashboard, we created a dynamic time series dashboard. The picture below shows the trips on 1st January 2017. We can observe a low number of trips and see that only 6 of the top 10 stations were in use. The 4 missing stations were either under maintenance or simply not used that day. Stations are usually taken out for maintenance during the winter because of the lower ridership. Here again, the top 10 stations are sized in proportion to their number of trips for the selected day in the timeline.

By the beginning of April, all of the top 10 stations are in use again. Here you can see a snapshot from 23rd April 2017. Compared with January 2017, there is much more activity across the map and a larger number of trips, both in total and per station. Interestingly, 497 trips were made with only 331 bikes, which means that some bikes were used multiple times that day. Dividing the total trip duration of 752 hours by the total number of trips gives an average trip duration of about 1.5 hours. However, this average does not reveal the true distribution of trip durations, because it can be skewed by outliers. Furthermore, the size of the top 10 stations visualization shows that this day had only approximately 30% of the traffic of the peak day for 2017/2018, that is, the day with the most trips in total. For more information about the paths used on a particular day, the user can interact with the map by zooming in, clicking, and setting filters.
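The arithmetic behind these observations can be checked in a few lines, using only the three figures quoted for that day. The variable names are our own.

```python
# Figures read from the dashboard for the April 2017 snapshot.
total_trips = 497
bikes_used = 331
total_duration_hours = 752

# Mean duration per trip and mean number of trips per bike.
mean_duration_hours = total_duration_hours / total_trips
trips_per_bike = total_trips / bikes_used

print(f"Mean trip duration: {mean_duration_hours:.2f} hours")
print(f"Trips per bike: {trips_per_bike:.2f}")
```

Both values come out at roughly 1.5. They are plain averages, so a handful of very long trips could raise the mean duration noticeably.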

Here is the time series dashboard on one of the busiest days (1,524 total trips) of our 2017-2018 period, 11th July 2018.
Here we can see how the top 10 stations almost fill the display area (bottom right). The time series is a very powerful display, especially when we want to analyze spatio-temporal patterns.

*This project was accomplished with the help of my partner, Stefan Mirkovic.
