Download Posters with 'The Movie Database' API in Python

The Movie Database is a community-maintained movie, TV and actor database. One of its most useful features is artwork, something that IMDb does not provide. XBMC uses the database as the default resource for posters and backdrops.

All content on the website can be accessed with a public API, currently in version 3. This blog post illustrates how to use the API to download posters for a movie.

Step 1: account creation and API key

To use the API you need an account and an API key. You can request the key in your account settings, in the API section.

Step 2: system wide configuration information

Many of the API requests, including the image download, rely on a system-wide configuration. All artwork, for instance, is stored on CloudFront, and the API gives the image locations relative to this base. The request to get the configuration has the following simple format:

http://api.themoviedb.org/3/configuration?api_key=<your_api_key>

Here’s a small Python snippet to request the configuration and store it as config.

import requests
CONFIG_PATTERN = 'http://api.themoviedb.org/3/configuration?api_key={key}'
KEY = '<your_api_key>'

url = CONFIG_PATTERN.format(key=KEY)
r = requests.get(url)
config = r.json()

The request will return the data in JSON by default with the following content:

{'change_keys': ['adult',
                  'also_known_as',
                  ...,
                  'translations'],
 'images': {'backdrop_sizes': ['w300', 'w780', 'w1280', 'original'],
             'base_url': 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/',
             'logo_sizes': ['w45', 'w92', 'w154', 'w185', 'w300', 'w500', 'original'],
             'poster_sizes': ['w92', 'w154', 'w185', 'w342', 'w500', 'original'],
             'profile_sizes': ['w45', 'w185', 'h632', 'original'],
             'secure_base_url': 'https://d3gtl9l2a4fn1j.cloudfront.net/t/p/'}}

We need two values from the images section:

  • base_url: this is where the images are stored.
  • poster_sizes: those are the available sizes.
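Before using these two values it can help to check that they are actually present, for example when the key was mistyped and the API answered with an error. The following is a minimal sketch, assuming that such an answer is still JSON but has no images section. The helper name image_settings is hypothetical, and sample_config only repeats the values shown above.

```python
def image_settings(config):
    """Return (base_url, poster_sizes) or raise if either value is missing."""
    images = config.get('images') or {}
    missing = [k for k in ('base_url', 'poster_sizes') if k not in images]
    if missing:
        raise ValueError('configuration lacks: ' + ', '.join(missing))
    return images['base_url'], images['poster_sizes']


# Trimmed copy of the configuration shown above
sample_config = {
    'images': {
        'base_url': 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/',
        'poster_sizes': ['w92', 'w154', 'w185', 'w342', 'w500', 'original'],
    }
}
sample_base_url, sample_sizes = image_settings(sample_config)
```

Failing early with a clear message is easier to debug than a KeyError a few lines later.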

Let's assume we want to download the maximum size, which seems to always be the last element of the sizes list. For example, original in ['w92', 'w154', 'w185', 'w342', 'w500', 'original'] leads to the maximum resolution. I'm not certain, though, that the API will always return the sizes in ascending order, so I pass a custom key function to max to get the largest size:

base_url = config['images']['base_url']
sizes = config['images']['poster_sizes']

# 'sizes' should be sorted in ascending order, so
#     max_size = sizes[-1]
# should get the largest size as well.

def size_str_to_int(x):
    # 'original' ranks above every numeric size; otherwise drop the
    # leading letter ('w' or 'h') and compare the number
    return float("inf") if x == 'original' else int(x[1:])

max_size = max(sizes, key=size_str_to_int)
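To illustrate why the key function is safer than taking the last element, here is a small check on a deliberately shuffled list. The shuffled order is invented for the example and is not an actual API response.

```python
def size_str_to_int(x):
    # 'original' ranks above every numeric size
    return float("inf") if x == 'original' else int(x[1:])


# Deliberately shuffled; NOT an actual API response
shuffled = ['w500', 'original', 'w92', 'w342', 'w154', 'w185']

largest = max(shuffled, key=size_str_to_int)
# Largest fixed-width size, i.e. ignoring 'original'
largest_fixed = max((s for s in shuffled if s != 'original'),
                    key=size_str_to_int)

print(largest, largest_fixed)  # prints: original w500
```

Here shuffled[-1] would have picked w185. Note that the function only drops the first letter, so a height-based entry such as h632 is compared by its number alone.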

Step 3: get available poster URLs

Now that we have the necessary information from the configuration, we can proceed to request the posters for the desired movie. The request to get the images has the following format:

http://api.themoviedb.org/3/movie/<imdbid>/images?api_key=<key>

where <imdbid> is the IMDb movie id, e.g., tt0095016 (you can find out the ID for a movie title with the script at the end of this blog post). The following Python script retrieves the image information from the movie database:

IMG_PATTERN = 'http://api.themoviedb.org/3/movie/{imdbid}/images?api_key={key}' 
r = requests.get(IMG_PATTERN.format(key=KEY,imdbid='tt0095016'))
api_response = r.json()

The API response has the following format:

{'backdrops': [{'aspect_ratio': 1.78,
                 'file_path': '/sEkWkPFIcoSyP3qRiZunyOfdMpv.jpg',
                 'height': 1080,
                 'iso_639_1': None,
                 'vote_average': 5.36734693877551,
                 'vote_count': 7,
                 'width': 1920},
                ... ],
 'id': 562,
 'posters': [{'aspect_ratio': 0.67,
               'file_path': '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg',
               'height': 1500,
               'iso_639_1': 'en',
               'vote_average': 5.45518207282913,
               'vote_count': 5,
               'width': 1000},
              ...]}
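The other fields are useful if you do not want every poster. The sketch below filters by language and minimum width and puts the best-rated poster first. pick_posters is a hypothetical helper, and the second entry of sample_response is invented purely for the example.

```python
def pick_posters(api_response, language=None, min_width=0):
    """Return posters matching language and width, best rated first."""
    posters = [
        p for p in api_response.get('posters', [])
        if (language is None or p.get('iso_639_1') == language)
        and p.get('width', 0) >= min_width
    ]
    return sorted(posters, key=lambda p: p.get('vote_average') or 0,
                  reverse=True)


sample_response = {
    'id': 562,
    'posters': [
        # taken from the response shown above
        {'file_path': '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg', 'iso_639_1': 'en',
         'vote_average': 5.45518207282913, 'width': 1000, 'height': 1500},
        # invented entry, only here to give the filter something to reject
        {'file_path': '/hypothetical.jpg', 'iso_639_1': 'de',
         'vote_average': 5.9, 'width': 500, 'height': 750},
    ],
}

english = pick_posters(sample_response, language='en')
wide = pick_posters(sample_response, min_width=800)
```

Taking the first element of the result then gives a single poster instead of the whole list.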

To download the poster later, we only need the file_path value. Together with the system-wide configuration (step 2), we have everything needed to build the full URL to the image as follows:

url = <base_url> + <max_size> + <rel_path>

for example

base_url = 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/'
max_size = 'original'
rel_path = '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg'
url = 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/original/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg'

The following Python snippet assembles the image urls and adds them to a list:

posters = api_response['posters']
poster_urls = []
for poster in posters:
    rel_path = poster['file_path']
    url = "{0}{1}{2}".format(base_url, max_size, rel_path)
    poster_urls.append(url)
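Plain concatenation works because base_url ends with a slash and file_path starts with one. If you would rather not rely on that, a small helper can normalise the slashes. This is a sketch; poster_url is a hypothetical name, not something the API or the requests library provides.

```python
def poster_url(base_url, size, file_path):
    """Join the three URL parts with exactly one slash between them."""
    return '/'.join([base_url.rstrip('/'),
                     size.strip('/'),
                     file_path.lstrip('/')])


example_url = poster_url('http://d3gtl9l2a4fn1j.cloudfront.net/t/p/',
                         'original',
                         '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg')
```

With the values from the example above, this yields the same URL as the simple format string.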

 

Step 4: download posters

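Finally, we store all posters in the current directory as poster_1, poster_2, etc., with a file extension taken from the Content-Type header of each response (for example jpeg for image/jpeg):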

for nr, url in enumerate(poster_urls):
    r = requests.get(url)
    filetype = r.headers['content-type'].split('/')[-1]
    filename = 'poster_{0}.{1}'.format(nr+1,filetype) 
    with open(filename,'wb') as w:
        w.write(r.content)
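The loop uses the Content-Type subtype verbatim, so a JPEG ends up with the extension jpeg, and a header carrying extra parameters would leak them into the filename. A possible refinement is sketched below. poster_filename is a hypothetical helper, and the header with a charset parameter is an assumed input, not something observed from the image server.

```python
def poster_filename(nr, content_type):
    """Build e.g. 'poster_1.jpg' from a running number and a Content-Type."""
    # Drop optional parameters such as '; charset=binary'
    mime = content_type.split(';')[0].strip().lower()
    subtype = mime.split('/')[-1]
    # Prefer the more common 'jpg' extension for JPEG images
    ext = {'jpeg': 'jpg'}.get(subtype, subtype)
    return 'poster_{0}.{1}'.format(nr, ext)


first = poster_filename(1, 'image/jpeg')
second = poster_filename(2, 'image/png; charset=binary')
```

Inside the download loop this would replace the two lines that compute filetype and filename.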

The final code can be seen on GitHub Gist.

Bonus: get IMDb id for movie title

The following function uses the undocumented IMDb API to get the IMDb ID for a movie title.

import requests
from urllib.parse import quote  # Python 3; on Python 2 use urllib.quote

def imdb_id_from_title(title):
    """ return IMDb id for search string

        Args:
            title (str): the movie title search string

        Returns:
            str. IMDb id, e.g., 'tt0095016'
            None. If no match was found

    """
    pattern = 'http://www.imdb.com/xml/find?json=1&nr=1&tt=on&q={movie_title}'
    url = pattern.format(movie_title=quote(title))
    r = requests.get(url)
    res = r.json()
    # sections in descending order of preference
    for section in ['popular','exact','substring']:
        key = 'title_' + section
        if key in res:
            return res[key][0]['id']
    # no section matched
    return None
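Because the endpoint is undocumented, it can be handy to keep the part that picks the ID out of the response separate from the network call, so it can be tried without a request. The sketch below does that and also skips sections that are present but empty. pick_imdb_id is a hypothetical helper, and fake_res is a made-up dictionary that only mirrors the keys the function above reads.

```python
def pick_imdb_id(res):
    """Return the first id from the preferred sections, or None."""
    # sections in descending order of preference
    for section in ['popular', 'exact', 'substring']:
        matches = res.get('title_' + section)
        if matches:  # skips missing keys and empty lists
            return matches[0]['id']
    return None


# Made-up data for illustration, NOT a recorded IMDb response
fake_res = {'title_exact': [], 'title_substring': [{'id': 'tt0095016'}]}
found = pick_imdb_id(fake_res)
```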

Link to Gist

Archived Comments

Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.

Hunter Jun 09, 2016 03:57:47 UTC

it downloads a bunch of cover photos. i only want one. how do i change this? I tried putting in a size manually for the url variable, but to no avail.

Johannes Bader Jun 09, 2016 07:17:28 UTC

If you only want the best rated poster, do this:

api_response['posters'][0]

If you want a specific size, then iterate over all posters and check the "height" and "width" fields:

for poster in api_response['posters']:
    if poster['height'] > ... and poster['width'] > ....:

Hunter Jun 09, 2016 18:53:49 UTC

where would the api_response go? sorry, i'm new to python. also, so i want to do an if statement, to check if it is larger or smaller than certain parameters then acquire the image? where does this followup with the api_response?

J Jun 10, 2016 00:01:54 UTC

I suggest you read up on JSON. The format is pretty easy and there are functions for it in almost all programming languages. Since you don't know Python, just use whatever you are comfortable with. The API can be used from any language, so you do not need to use Python.

Hunter Jun 09, 2016 20:09:37 UTC

actually, how could i change it save the url for the image instead of the image to a folder. ultimately, i would like to put this url into a database for a website, but i could figure that out, even if you just guide me into getting it into a text file or similar. thanks!

Chris Wilson Oct 15, 2017 13:26:51 UTC

Why IMDB? I find it inferior in most ways to Wikipedia.

林玗靜 Apr 04, 2018 17:38:26 UTC

Thanks for your tutorial !
It was really helpful for me :)