Back in 2011, I started rating all the movies I was watching. First, I rated all the movies I knew I had already watched, and then I kept up the practice. I started this because I would get confused about whether I had already watched a particular movie or not. Now, I just check IMDb, and if I have already rated it, then I have watched it. With almost 10 years of ratings data available to me now, I think it would be really cool to visualize it and see my movie-watching stats!
All the data visualization is done using Google Data Studio and the report is embedded below. If you have trouble viewing it, you can view it in a new window. It works best on a large screen (not really mobile-friendly). Under the report, I have also written a bit about the process.
Visualizing movie ratings data sounds very simple, but there were a few hoops I had to jump through. The data I had was an export of my "My Ratings" list from IMDb. It gives me a unique ID for each movie, the movie title, my rating, IMDb rating, number of votes, year of release, genres, release date, and director(s). One important piece of information it didn’t give me was the actors.
To get this additional data, I used the OMDb API. It is very similar to IMDb, but has a free API with a daily limit of 1000 calls. Luckily, my list only had 987 movies. To combine my IMDb data and OMDb data, I used Python with Jupyter Notebooks and Pandas to run the API calls, split my data and then join it all into one large dataset to get all my movie-watching stats in one place.
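To give a rough idea of what that looks like, here is a minimal sketch of the lookup-and-join step. It is not my exact notebook code. It assumes the IMDb export keeps the movie ID in a column called `Const`, that OMDb is queried by IMDb ID through its `i` and `apikey` parameters, and that the cast comes back as a comma-separated `Actors` string. The function names (`fetch_omdb`, `add_actors`, `split_actors`) are made up for this illustration.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

OMDB_URL = "https://www.omdbapi.com/"


def fetch_omdb(imdb_id, api_key):
    """Fetch one title from OMDb by IMDb ID and return the parsed JSON."""
    query = urllib.parse.urlencode({"i": imdb_id, "apikey": api_key})
    with urllib.request.urlopen(f"{OMDB_URL}?{query}", timeout=10) as resp:
        return json.load(resp)


def add_actors(ratings, fetch, id_col="Const"):
    """Look up each movie's actors and join them onto the ratings table.

    `fetch` is any callable that takes an IMDb ID and returns an
    OMDb-style dict. `id_col` is an assumed column name.
    """
    rows = []
    for imdb_id in ratings[id_col].unique():
        data = fetch(imdb_id)
        # OMDb signals a failed lookup with "Response": "False".
        if data.get("Response") == "True":
            rows.append({id_col: imdb_id, "Actors": data.get("Actors")})
    actors = pd.DataFrame(rows, columns=[id_col, "Actors"])
    # A left join keeps every rated movie, even when OMDb had no match.
    return ratings.merge(actors, on=id_col, how="left")


def split_actors(df, id_col="Const"):
    """Split the comma-separated cast into one row per (movie, actor)."""
    out = df.dropna(subset=["Actors"]).copy()
    out["Actor"] = out["Actors"].str.split(", ")
    return out.explode("Actor")[[id_col, "Actor"]].reset_index(drop=True)
```

With a real key, this could be called as `add_actors(pd.read_csv("ratings.csv"), lambda i: fetch_omdb(i, "YOUR_KEY"))`, where the file name and key are placeholders. Passing the fetch function in as an argument makes it easy to cache responses or swap in a fake while testing, which helps when there is a daily call limit.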
I was hoping to use Google Data Studio’s custom fields for a few elements, but for some reason it wouldn’t let me (even though the same fields work in the Data Explorer). I suspect this is a bug on Google’s part, but I got around it by adding some extra calculations to the dataset.