Scrape your cinema's listings to get a daily email of films with a high IMDb rating (Part 3)
Never miss a great film with this Scrapy tutorial
So we have our 2 scripts! But they are still 2 separate scripts, and running each one individually every time will quickly become a complete pain. So let’s fix that with, you guessed it, another little script.
This script is slightly different as it is a shell (bash) script; in their simplest form, bash scripts can simply be line-by-line commands that you would type into the terminal anyway. We’re going to flesh ours out a bit so it provides some feedback on what is going on.
Hi! I'm Darian, a software developer based in London, and I teach my readers here at Hexfox to automate the repetitive tasks that would otherwise suck hours out of their life, with the aim of preparing them for a future world where automation is king. Sound interesting? Stick around or sign up!
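#!/bin/bash
echo "Starting scraper"
# Send the scraped films to stdout as JSON (-o -) and redirect them into movies.json
scrapy runspider cinema_scraper.py -t json --nolog -o - > "movies.json"
echo "Scrape complete, checking movies with imdb"
# Look up every film listed in movies.json
python check_imdb.py movies.json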
The first line tells the shell it’s a bash script, and the echo lines simply print a message to the screen, giving a bit of feedback to the person running it. You will recognise the other lines from our previous testing: the scrapy command saves its results to movies.json, and the check_imdb.py script reads its input from that same file.
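If you would rather not use bash (on Windows, say), the same two steps can be sketched in Python. This is only a rough equivalent, not part of the original code: the function names `build_commands` and `run_pipeline` are made up for illustration, and it assumes `scrapy` is on your PATH and both scripts sit in the current folder.

```python
import subprocess
import sys


def build_commands(output_path="movies.json"):
    """Return the two commands from the bash script as argument lists."""
    # "-o -" asks scrapy to write the JSON to stdout, as in the bash version.
    scrape = ["scrapy", "runspider", "cinema_scraper.py",
              "-t", "json", "--nolog", "-o", "-"]
    # sys.executable is the Python interpreter currently running this script.
    check = [sys.executable, "check_imdb.py", output_path]
    return scrape, check


def run_pipeline(output_path="movies.json"):
    """Run the scraper, save its output, then run the IMDb lookup."""
    scrape, check = build_commands(output_path)

    print("Starting scraper", flush=True)
    with open(output_path, "w") as out:
        # Redirect scrapy's stdout into the file (the "> movies.json" part).
        # check=True raises CalledProcessError if scrapy exits with an error.
        subprocess.run(scrape, stdout=out, check=True)

    print("Scrape complete, checking movies with imdb", flush=True)
    subprocess.run(check, check=True)
```

Calling `run_pipeline()` runs both steps in order. Unlike the plain bash version, it stops before the lookup if the scrape fails, which may or may not be what you want.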
You can run this script from the code folder with:
My challenge to you, right now
We’re nearly done, feels a lot easier now, right? The challenge I set you now is to convert this script into one that works with your local cinema, as I very much doubt you care what my cinema is showing. If you remember my explanation of why we separated the scripts, this is yet another reason for doing it: you now only have to change the web scraping part to look at your local cinema’s website, and as long as you return the data in the same way, you don’t even have to touch the lookup script. If you follow these steps you can get a working script today:
- visit your local cinema’s website
- find the page which details the films on today
- work out what CSS selector you need to use to extract the film names
- knowing the results of the previous 2 points, update the web scraping script
You want more of a challenge? Good on ya. Can you update the scripts to:
- include the show times of the films in the email?
- remember which films have been seen so you don’t get an email every day?
- limit the films you’re told about to a certain genre?
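As a nudge for the last two, here is one possible sketch rather than the answer. It assumes each film is a dict with a `title` and a list of `genres`, which may not match what your lookup script produces, and the function names and the idea of a `seen.json` file are made up for illustration.

```python
import json
from pathlib import Path


def load_seen(path):
    """Return the set of film titles we have already emailed about."""
    path = Path(path)
    if not path.exists():
        return set()  # first run: nothing seen yet
    return set(json.loads(path.read_text()))


def save_seen(path, seen):
    """Write the seen titles back to disk as a JSON list."""
    Path(path).write_text(json.dumps(sorted(seen)))


def pick_new_films(films, seen, wanted_genres=None):
    """Keep films that are unseen and, if genres are given, match one of them."""
    picked = []
    for film in films:
        if film["title"] in seen:
            continue
        # Assumed shape: film["genres"] is a list of strings such as ["Crime"].
        if wanted_genres and not set(film.get("genres", [])) & set(wanted_genres):
            continue
        picked.append(film)
    return picked
```

In the lookup script you would call `load_seen` before building the email, send only what `pick_new_films` returns, then add those titles to the set and call `save_seen`.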
As ever, email me for some hints or just to show me how you got along - it makes all this worthwhile. Same goes for if you have anything to add, I’m all ears. Simply reply to any email from my list.
Reminder: want the code?
As I said earlier, the code for this is available for free and the license allows you to do what the hell you want with it, with the one caveat that you’re not allowed to take me to court if your computer melts through the floor (I jest). To get it, simply plonk your email below and my army of robot minions will make sure you get it swiftly.