Scrape your cinema's listings to get a daily email of films with a high IMDb rating (Part 3)

Never miss a great film with this Scrapy tutorial

So we have our two scripts! But they’re still separate, and running each one by hand every time will quickly become a complete pain. So let’s fix that with, you guessed it, another little script.

This script is slightly different because it’s a shell (bash) script. In their simplest form, bash scripts are just the commands you would type into the terminal anyway, one per line. We’re going to flesh ours out a bit so it gives some feedback about what is going on.

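#!/bin/bash
echo "Starting scraper"
# Run the spider, printing its results as JSON, and redirect them into movies.json
scrapy runspider cinema_scraper.py -t json --nolog -o - > "movies.json"
echo "Scrape complete, checking movies with imdb"
# Look up each scraped film on IMDb
python check_imdb.py movies.json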

The first line (the "shebang") tells the system to run the file with bash, and the echo lines simply print a message to the screen so the person running it gets a bit of feedback. You’ll recognise the other lines from our previous testing: the scrapy command prints the scraped films as JSON (that’s what -o - does) and the > redirects that output into movies.json, then the check_imdb.py script reads its input from that same file.

You can run this script from the code folder with:


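If you’d rather keep everything in Python, here is a rough equivalent of the bash script using the standard library’s subprocess module. It’s a minimal sketch rather than part of the tutorial’s code: the file names and Scrapy flags are copied from the bash version, and the one deliberate difference is check=True, which stops the run if the scrape fails instead of carrying on to the IMDb lookup.

```python
import subprocess
import sys

# Same two steps as the bash script; flags copied unchanged from it.
SCRAPE_CMD = ["scrapy", "runspider", "cinema_scraper.py",
              "-t", "json", "--nolog", "-o", "-"]
CHECK_CMD = [sys.executable, "check_imdb.py", "movies.json"]


def run_step(message, cmd, stdout_path=None):
    """Print a progress message, run cmd, and raise if it exits non-zero."""
    print(message, flush=True)  # flush so the message appears before cmd's output
    if stdout_path is None:
        subprocess.run(cmd, check=True)
    else:
        # Equivalent of `> movies.json` in the shell version.
        with open(stdout_path, "w") as out:
            subprocess.run(cmd, stdout=out, check=True)


def main():
    """Run the scrape, then the IMDb check; return a shell-style exit code."""
    try:
        run_step("Starting scraper", SCRAPE_CMD, stdout_path="movies.json")
        run_step("Scrape complete, checking movies with imdb", CHECK_CMD)
    except subprocess.CalledProcessError as err:
        print(f"Step failed (exit code {err.returncode}): {' '.join(err.cmd)}")
        return err.returncode
    return 0
```

To use it, save it next to the other two scripts (run_all.py is just a suggested name), add sys.exit(main()) at the bottom, and run python run_all.py from the code folder.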
My challenge to you, right now

We’re nearly done; it feels a lot easier now, right? The challenge I set you now is to adapt this script to work with your local cinema, as I very much doubt you care what my cinema is showing. Remember my explanation of why we separated the scripts? This is yet another reason: you only have to change the web scraping part to look at your local cinema’s website, and as long as it returns the data in the same format, you don’t have to touch the lookup script at all. Follow these steps and you can have a working script today:

  • visit your local cinema’s website
  • find the page that lists the films showing today
  • work out which CSS selector you need to extract the film names
  • using the page URL and selector from the previous two steps, update the web scraping script
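"Return the data in the same way" is the part that’s easy to get subtly wrong. One way to check it, sketched below as an assumption rather than anything from the earlier parts, is to keep a copy of the original scraper’s movies.json (say movies_original.json, a hypothetical name) and compare which fields each version produces. It relies only on the fact that Scrapy’s JSON export is a JSON array with one object per item; it doesn’t assume any particular field names.

```python
import json


def load_items(path):
    """Load a Scrapy JSON feed: a JSON array with one object per item."""
    with open(path) as f:
        return json.load(f)


def key_differences(old_path, new_path):
    """Compare the fields used by the old and new scrapers' output.

    Returns (missing, extra): fields the original output had that the new
    one never produces, and fields that only appear in the new output.
    """
    old_keys = set().union(*(item.keys() for item in load_items(old_path)))
    new_keys = set().union(*(item.keys() for item in load_items(new_path)))
    return old_keys - new_keys, new_keys - old_keys
```

After running your updated spider, key_differences("movies_original.json", "movies.json") returning an empty missing set is a good sign that check_imdb.py will accept the new file, assuming it only reads fields the original scraper produced.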

You want more of a challenge? Good on ya. Can you update the scripts to:

  • include the show times of the films in the email?
  • remember which films you’ve already been emailed about, so you don’t get the same ones every day?
  • limit the films you’re told about to a certain genre?
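If you want a nudge on the second and third challenges, here is one possible shape for them, as a sketch only. Everything in it is hypothetical: seen_films.json is a made-up file for remembering titles, and each film is assumed to be a dict with a "title" key and, for genre filtering, a "genres" list, which the IMDb lookup from the earlier parts may not provide without changes.

```python
import json
import os

SEEN_FILE = "seen_films.json"  # hypothetical file for remembering titles


def load_seen(path=SEEN_FILE):
    """Return the set of titles already emailed about (empty on first run)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_seen(titles, path=SEEN_FILE):
    """Store the titles as a sorted JSON list."""
    with open(path, "w") as f:
        json.dump(sorted(titles), f, indent=2)


def films_to_email(films, seen, wanted_genres=None):
    """Keep films not yet emailed about and, optionally, in a wanted genre.

    Assumes each film is a dict with a "title" key and, for genre
    filtering, a "genres" list (hypothetical; not from the earlier parts).
    Genre matching ignores case.
    """
    wanted = None
    if wanted_genres is not None:
        wanted = {g.lower() for g in wanted_genres}
    result = []
    for film in films:
        if film["title"] in seen:
            continue
        if wanted is not None:
            genres = {g.lower() for g in film.get("genres", [])}
            if not genres & wanted:
                continue
        result.append(film)
    return result
```

In the lookup script you would call load_seen(), pass the highly rated films through films_to_email(), send the email, then call save_seen() with the old titles plus the ones you just emailed about.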

As ever, email me for hints or just to show me how you got along; it makes all this worthwhile. The same goes if you have anything to add: I’m all ears. Simply reply to any email from my list.

