XBMC Library Cleanup Part I - Missing and Duplicate Movies
Detect missing or duplicate movies in the XBMC database with SQL queries and some PythonXBMC is an outstanding media player with an attractive and fast user interface. The software manages your media collection almost perfectly, nevertheless, from time to time movies are not properly added one-to-one. The two types of misrelation are:
- XBMC does not add the movie to the library at all.
- XBMC adds multiple instances of the same movie.
The second error is quite apparent when your browse through the movie collection (see for instance the featured image, where the movie The World’s End appears twice). The first type is harder to notice, especially if you add movies in bulk. There is an excellent add-on, however, called “Missing Movie Scanner” that lists missing media within XBMC.
But even though both types of errors can be detected from within XBMC, you might want to directly query the XBMC video database for the following reasons:
- you get a report of errors almost instantly, rather than having to browse the library and record errors separately
- you can detect errors without launching XBMC
- you can easily automate and extend the procedure, e.g., to automatically delete movies with duplicate entries
The tutorial focuses on how to query the XBMC video database to find out how many times a particular movie will appear in XBMC. Given those SQL queries it’s easy to build a script that generates a report for an entire movie collection; a small proof of concept Python code to do so is shown at the end.
SQL Walkthrough
Step 1: Connect to the Video Library
The database is named MyVideos**.db
, where **
denotes the version number. On Windows you usually find the file under C:\Users\USERNAME\AppData\Roaming\XBMC\userdata\Database</code> (see the entry on the XBMC wiki for more information).
To query the database, you’ll need a SQLite interface. Some popular choices on Windows are SQLite3Explorer{.highlight}, the SQLite Manager Extension for Firefox{.highlight} and the plain command-line shell{.highlight} – which I’m going to use in the following.
For the command-line program simply provide the database as the first argument to connect:
>sqlite3.exe <path_to_db>\MyVideos75.db
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>
Step 2: Find the idPath – Reveals Missing Movies
Let’s assume your movies are in separate folders that match the movie title, e.g.,
X:\Movies\Con Air (1997)\
Con Air 1997.mk
extrathumbs\
thumb1.jpg
thumb2.jpg
extrafanart\
fanart1.jpg
The starting point is the folder that contains the video (X:\Movies\Con Air (1997)
). The table path
maps the path to a unique idPath
:
SELECT idPath FROM path WHERE strPath="X:\Movies\Con Air (1997)\";
/* example result: 483 */
If there isn’t an entry for your movie path, then the movie won’t show up in XBMC. To fix this, make sure the folder name corresponds to the movie title. If that doesn’t help, add a text file movie.nfo
to the folder with the following content:
<movie>
<title>Con Air (1997)</title>
<sorttitle>Con Air (1997)</sorttitle>
</movie>
http://akas.imdb.com/title/tt0118880/
Remark: If you have DVDs that you store as is, and if keep the VIDEO_TS
folder, then add the following to your WHERE
-clause to account for both cases:
... OR strPath="X:\Movies\Con Air (1997)\VIDEO_TS\";
Step 3: Find the idFiles – Reveals Duplicate Entries
Given the idPath
(for example 483
) one can now query the table files
to get a list of all movie files:
SELECT strFilename FROM files WHERE idPath=483;
/* example result: Con Air 1997 (dvdrip).avi */
Even if the movie consists of multiple parts (file stacking{.highlight}) or if your storing raw DVD, you should only get one entry for each movie. Let’s say your movie is split into two parts Gomorra (2008) disc 1.avi
and Gomorra (2008) disc 2.avi
respectively. XBMC won’t add two entries to the files database like might assume, but rather combines both files into a comma separated list prefixed by ‘stack://’
. Here’s the example as it should be stored in the database:
SELECT strFilename FROM files
WHERE idPath=(
SELECT idPath FROM path WHERE strPath="X:\Movies\Gomorra (2008)\"
);
/* example result:
stack://X:\Movies\Gomorra (2008)\Gomorra (2008) disc 1.avi , X:\Movies\Gomorra (2008)\Gomorra (2008) disc 2.avi */
If you get more than one entry for a given path then all those entries will show up as individual movies in XBMC. For instance:
SELECT strFilename FROM files
WHERE idPath=(
SELECT idPath FROM path WHERE strPath="X:\Movies\The World's End (2013)"
);
X:\Movies\The World's End (2013)\first part.avi
X:\Movies\The World's End (2013)\second part.avi
indicates a failed file stacking resulting in two database entries and twice the same movie in XBMC:
Automate the Procedure with Python
The following code iterates over all movies in a movie folder and executes the above SQL queries for each one of them. It then reports missing movies or movies added more than once.
Code
import argparse
import sqlite3 as lite
import os
def _open_db(db):
try:
con = lite.connect(db)
cur = con.cursor()
return (cur,con)
except lite.Error as e:
print("Could not open database %s: %s" % (db,e))
quit(1)
def find_flaws(db, movie_folder):
""" find missing or duplicate movies in video library
lookup all movies from a folder in video db
and find the following flaws:
(1) the movie is not in the libary (won't show up in XBMC)
(2) the movie has multiple entries (will show up multiple times)
Args:
db: path to 'MyVideos**.db' (XBMC video library)
movie_folder: path to directory with movies in separate folders
Returns:
flaws: dictionary of flaws.
key: movie name
value: flaw name
"""
(cur,con) = _open_db(db)
flaws = dict()
movies = os.listdir(movie_folder)
# iterate over all subdirectories of movies, i.e., over all movies on disk
for i, movie in enumerate(movies):
# show progress
print "\r{0:>3}/{1:<3} {2:<100}:".format(i, len(movies), movie),
path = os.path.join(movie_folder, movie)
# 1) check table 'path' for movies without 'idPath' (movie is missing)
# also checks 'VIDEO_TS' subfolder for DVDs
cur.execute('SELECT idPath FROM path WHERE strPath=? or ' +
'strPath like ?', (path + os.sep,
os.path.join(path,"VIDEO_TS") + os.sep))
res = cur.fetchone()
if not res:
flaws[movie] = "missing"
continue
else:
idPath = res[0]
# 2) check table 'files' for multiple entry (movie duplicates)
cur.execute('SELECT idFile FROM files WHERE idPath=?', (idPath,))
idsFile = [tmp[0] for tmp in cur.fetchall()]
if len(idsFile) > 1:
flaws[movie] = "{0} duplicate(s)".format(len(idsFile)-1)
elif len(idsFile) == 0:
flaws[movie] = "missing"
# clean progress line
print "\r" + 100*" " + "\r",
con.close()
return flaws
if __name__=="__main__":
parser = argparse.ArgumentParser(description='find flaws in XBMC video database')
parser.add_argument('db', help='path to myvideos**.db library')
parser.add_argument('movie_folder', help='path to movie folder')
args = parser.parse_args()
flaws = find_flaws(args.db, args.movie_folder)
for movie, flaw in sorted(flaws.items()):
print("{0:<15}: {1} ".format(flaw, movie))
Example Output
>missing_or_duplicate_movies.py ~/.xbmc/userdata/Database/MyVideos75.db ~/Movies
1 duplicate(s) : A Very Harold & Kumar 3D Christmas (2011) :
1 duplicate(s) : Devil Seed (2012)
1 duplicate(s) : Hera Pheri (2000)
3 duplicate(s) : Iron Man 3 (2013)
1 duplicate(s) : The Devil's Double (2011)
1 duplicate(s) : The French Connection (1971)
missing : The Vanishing (1988)
1 duplicate(s) : The World's End (2013)
Archived Comments
Note: I removed the Disqus integration in an effort to cut down on bloat. The following comments were retrieved with the export functionality of Disqus. If you have comments, please reach out to me by Twitter or email.