Download Posters with 'The Movie Database' API in Python
The Movie Database is a community-maintained database of movies, TV shows and actors. One of its most useful features is artwork, something that IMDb does not provide. XBMC uses the database as the default source for posters and backdrops.
All content on the website can be accessed with a public API, currently in version 3. This blog post illustrates how to use the API to download posters for a movie.
Step 1: account creation and API key
To use the API, you need an account and an API key. You can request the key in your account settings, under the API section.
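Rather than pasting the key into every script, one option is to read it from an environment variable. The sketch below assumes a variable named TMDB_API_KEY; that name is our own choice for illustration, not something TMDb defines.

```python
import os


def load_api_key(var_name='TMDB_API_KEY'):
    """Return the TMDb API key stored in an environment variable.

    TMDB_API_KEY is a hypothetical variable name chosen for this example.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError('Set the {0} environment variable to your TMDb API key'.format(var_name))
    return key


# Usage, after e.g. `export TMDB_API_KEY=...` in the shell:
# KEY = load_api_key()
```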
Step 2: system wide configuration information
Many API requests, including image downloads, rely on a system-wide configuration. For instance, all artwork is stored on CloudFront, and the API gives image locations relative to this base URL. The request for the configuration has the following simple format:
http://api.themoviedb.org/3/configuration?api_key=<your_api_key>
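This request and the movie images request in step 3 share the same base URL and api_key parameter, so a small helper can build both. A sketch only: the name api_url is hypothetical, and urlencode is used so that any extra query parameters are escaped correctly.

```python
from urllib.parse import urlencode

API_BASE = 'http://api.themoviedb.org/3/'


def api_url(path, key, **params):
    """Build a TMDb v3 request URL, e.g. api_url('configuration', KEY).

    Extra keyword arguments become additional query parameters.
    """
    query = urlencode(dict(api_key=key, **params))
    return '{0}{1}?{2}'.format(API_BASE, path.lstrip('/'), query)


print(api_url('configuration', 'YOUR_KEY'))
# http://api.themoviedb.org/3/configuration?api_key=YOUR_KEY
```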
Here’s a small Python snippet to request the configuration and store it as config.
```python
import requests

CONFIG_PATTERN = 'http://api.themoviedb.org/3/configuration?api_key={key}'
KEY = '<your_api_key>'

url = CONFIG_PATTERN.format(key=KEY)
r = requests.get(url)
config = r.json()
```
The request will return the data in JSON by default with the following content:
```python
{'change_keys': ['adult', 'also_known_as', ..., 'translations'],
 'images': {'backdrop_sizes': ['w300', 'w780', 'w1280', 'original'],
            'base_url': 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/',
            'logo_sizes': ['w45', 'w92', 'w154', 'w185', 'w300', 'w500', 'original'],
            'poster_sizes': ['w92', 'w154', 'w185', 'w342', 'w500', 'original'],
            'profile_sizes': ['w45', 'w185', 'h632', 'original'],
            'secure_base_url': 'https://d3gtl9l2a4fn1j.cloudfront.net/t/p/'}}
```
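Since the configuration is system wide and rarely changes, it does not have to be requested on every run. A minimal caching sketch, assuming a local file name (tmdb_config.json) and a maximum age of one day, both arbitrary choices; fetch stands for any function that returns the configuration dict, such as one wrapping the requests call above.

```python
import json
import os
import time


def load_config(fetch, cache_path='tmdb_config.json', max_age=24 * 3600):
    """Return the TMDb configuration, re-fetching only if the cache is stale.

    `fetch` is any zero-argument callable returning the configuration dict.
    The cache file name and max_age (in seconds) are illustrative defaults.
    """
    if os.path.exists(cache_path):
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age:
            with open(cache_path) as f:
                return json.load(f)
    config = fetch()
    with open(cache_path, 'w') as f:
        json.dump(config, f)
    return config


# Usage with the snippet above:
# config = load_config(lambda: requests.get(url).json())
```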
We need two values from the images section:
- base_url: this is where the images are stored.
- poster_sizes: the available poster sizes.
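The configuration above also contains secure_base_url, the HTTPS counterpart of base_url. A small sketch that extracts both values needed here and optionally prefers the HTTPS base; the function name image_settings is hypothetical, and the sample dict simply repeats values from the configuration shown above.

```python
def image_settings(config, secure=True):
    """Return (base_url, poster_sizes) from a TMDb configuration dict.

    If secure is True and the configuration has a secure_base_url,
    that HTTPS base is used instead of the plain HTTP base_url.
    """
    images = config['images']
    base_url = images['base_url']
    if secure and images.get('secure_base_url'):
        base_url = images['secure_base_url']
    return base_url, images['poster_sizes']


config = {'images': {
    'base_url': 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/',
    'secure_base_url': 'https://d3gtl9l2a4fn1j.cloudfront.net/t/p/',
    'poster_sizes': ['w92', 'w154', 'w185', 'w342', 'w500', 'original'],
}}
base_url, sizes = image_settings(config)
print(base_url)  # https://d3gtl9l2a4fn1j.cloudfront.net/t/p/
```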
Let’s assume we want to download the largest size, which seems to always be the last element of the sizes list. For example, 'original' in ['w92', 'w154', 'w185', 'w342', 'w500', 'original'] gives the maximum resolution. However, I’m not certain that the API always returns the sizes in ascending order, so I use a custom key function with max() to find the largest size:
```python
base_url = config['images']['base_url']
sizes = config['images']['poster_sizes']

# 'sizes' should be sorted in ascending order, so max_size = sizes[-1]
# should get the largest size as well.
def size_str_to_int(x):
    return float("inf") if x == 'original' else int(x[1:])

max_size = max(sizes, key=size_str_to_int)
```
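If the original files are larger than you need, the same key-function idea can pick the largest width-based size that does not exceed a limit. A sketch; pick_size and its max_width parameter are hypothetical, and it assumes width sizes are named like 'w500', as in poster_sizes above.

```python
def pick_size(sizes, max_width=None):
    """Pick the largest poster size, optionally capped at max_width pixels.

    Assumes width-based names such as 'w342' plus the special 'original'.
    With max_width=None, 'original' (if present) wins.
    """
    if max_width is None and 'original' in sizes:
        return 'original'
    widths = [s for s in sizes if s.startswith('w') and s[1:].isdigit()]
    if max_width is not None:
        widths = [s for s in widths if int(s[1:]) <= max_width]
    if not widths:
        raise ValueError('no suitable size in {0}'.format(sizes))
    return max(widths, key=lambda s: int(s[1:]))


sizes = ['w92', 'w154', 'w185', 'w342', 'w500', 'original']
print(pick_size(sizes))       # original
print(pick_size(sizes, 400))  # w342
```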
Step 3: get available poster URLs
Now that we have the necessary information from the configuration, we can proceed to request the posters for the desired movie. The request to get the images has the following format:
http://api.themoviedb.org/3/movie/<imdbid>/images?api_key=<key>
where <imdbid> is the IMDb ID of the movie. For example, to request the images for tt0095016:
```python
IMG_PATTERN = 'http://api.themoviedb.org/3/movie/{imdbid}/images?api_key={key}'
r = requests.get(IMG_PATTERN.format(key=KEY, imdbid='tt0095016'))
api_response = r.json()
```
The API response has the following format:
```python
{'backdrops': [{'aspect_ratio': 1.78,
                'file_path': '/sEkWkPFIcoSyP3qRiZunyOfdMpv.jpg',
                'height': 1080,
                'iso_639_1': None,
                'vote_average': 5.36734693877551,
                'vote_count': 7,
                'width': 1920},
               ...],
 'id': 562,
 'posters': [{'aspect_ratio': 0.67,
              'file_path': '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg',
              'height': 1500,
              'iso_639_1': 'en',
              'vote_average': 5.45518207282913,
              'vote_count': 5,
              'width': 1000},
             ...]}
```
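Each poster entry also carries iso_639_1 (its language, or None) and vote_average. If you only want the best-rated poster in one language rather than all of them, the list can be filtered and sorted first. A sketch: rank_posters is a hypothetical helper, the preference order (language, then vote_average, then vote_count) is our own choice, and the three sample entries are made up.

```python
def rank_posters(posters, language='en'):
    """Return posters in the given language, best rated first.

    Falls back to all posters if none match the language.
    Ties in vote_average are broken by vote_count.
    """
    matching = [p for p in posters if p.get('iso_639_1') == language]
    candidates = matching or posters
    return sorted(candidates,
                  key=lambda p: (p.get('vote_average') or 0, p.get('vote_count') or 0),
                  reverse=True)


# Hypothetical sample entries in the shape of api_response['posters']
posters = [
    {'file_path': '/a.jpg', 'iso_639_1': 'de', 'vote_average': 5.6, 'vote_count': 3},
    {'file_path': '/b.jpg', 'iso_639_1': 'en', 'vote_average': 5.1, 'vote_count': 2},
    {'file_path': '/c.jpg', 'iso_639_1': 'en', 'vote_average': 5.4, 'vote_count': 5},
]
print([p['file_path'] for p in rank_posters(posters)])  # ['/c.jpg', '/b.jpg']
```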
To download a poster later, we only need its file_path. Combined with the system-wide configuration from step 2, this is enough to build the full image URL:
url = <base_url> + <max_size> + <rel_path>
For example:
```python
base_url = 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/'
max_size = 'original'
rel_path = '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg'  # file_path from the API response
url = base_url + max_size + rel_path
# url == 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/original/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg'
```
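Plain concatenation works only because base_url ends with a slash and file_path starts with one. A slightly more defensive sketch normalizes the slashes between the three parts; image_url is a hypothetical helper name.

```python
def image_url(base_url, size, file_path):
    """Join base_url, size and file_path with exactly one '/' between parts."""
    return '{0}/{1}/{2}'.format(base_url.rstrip('/'), size.strip('/'), file_path.lstrip('/'))


url = image_url('http://d3gtl9l2a4fn1j.cloudfront.net/t/p/', 'original',
                '/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg')
print(url)
# http://d3gtl9l2a4fn1j.cloudfront.net/t/p/original/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg
```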
The following Python snippet assembles the image URLs into a list:
```python
posters = api_response['posters']
poster_urls = []
for poster in posters:
    rel_path = poster['file_path']
    url = "{0}{1}{2}".format(base_url, max_size, rel_path)
    poster_urls.append(url)
```
Step 4: download posters
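Finally, we store all posters in the current directory as poster_1.jpeg, poster_2.jpeg, etc. The file extension is taken from the Content-Type header of each response, so image/jpeg becomes jpeg: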
```python
for nr, url in enumerate(poster_urls):
    r = requests.get(url)
    filetype = r.headers['content-type'].split('/')[-1]
    filename = 'poster_{0}.{1}'.format(nr + 1, filetype)
    with open(filename, 'wb') as w:
        w.write(r.content)
```
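Deriving the extension from the Content-Type header turns image/jpeg into jpeg. If you would rather keep the extension used in the image's file_path (such as .jpg), a sketch that reads it from the URL instead; poster_filename is a hypothetical helper, and the .jpg fallback is an arbitrary default.

```python
import posixpath
from urllib.parse import urlparse


def poster_filename(nr, url, default_ext='.jpg'):
    """Build 'poster_<nr><ext>' using the extension found in the URL path.

    Falls back to default_ext if the path has no extension.
    """
    ext = posixpath.splitext(urlparse(url).path)[1] or default_ext
    return 'poster_{0}{1}'.format(nr, ext)


url = 'http://d3gtl9l2a4fn1j.cloudfront.net/t/p/original/mc7MubOLcIw3MDvnuQFrO9psfCa.jpg'
print(poster_filename(1, url))  # poster_1.jpg
```

Inside the download loop, this would replace the two filetype/filename lines with filename = poster_filename(nr + 1, url).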
The final code can be seen on GitHub Gist.
Bonus: get IMDb id for movie title
The following function uses the undocumented IMDb API to get the IMDb ID for a movie title.
```python
import requests
from urllib.parse import quote


def imdb_id_from_title(title):
    """Return the IMDb id for a movie title search string.

    Args:
        title (str): the movie title search string

    Returns:
        str: IMDb id, e.g. 'tt0095016'
        None: if no match was found
    """
    pattern = 'http://www.imdb.com/xml/find?json=1&nr=1&tt=on&q={movie_title}'
    url = pattern.format(movie_title=quote(title))
    r = requests.get(url)
    res = r.json()
    # sections in descending order of preference
    for section in ['popular', 'exact', 'substring']:
        key = 'title_' + section
        if key in res:
            return res[key][0]['id']
    return None
```
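Because the endpoint is undocumented, its response format may change without notice. Separating the parsing from the HTTP request makes the section-preference logic easy to test on its own. A sketch, assuming the response shape implied by the function above (keys such as title_popular mapping to lists of dicts with an id); pick_imdb_id is a hypothetical name, and unlike the original loop it also skips sections that are present but empty.

```python
def pick_imdb_id(res, sections=('popular', 'exact', 'substring')):
    """Return the first IMDb id found in the preferred result sections.

    `res` is the decoded JSON response; returns None if no section matches.
    """
    for section in sections:
        matches = res.get('title_' + section)
        if matches:
            return matches[0]['id']
    return None


# Made-up response in the assumed shape
sample = {'title_exact': [{'id': 'tt0095016'}], 'title_substring': [{'id': 'tt9999999'}]}
print(pick_imdb_id(sample))  # tt0095016
```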
Archived Comments
Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.