Author(s): M. Saraee, S. White & J. Eccleston
This paper details our analysis of the Internet Movie Database (IMDb), a free,
user-maintained, online resource of production details for over 390,000 movies,
television series and video games, which contains information such as title,
genre, box-office takings, cast credits and user ratings.
We gather a series of interesting facts and relationships using a variety of data mining techniques.
In particular, we concentrate on attributes relevant to the user
ratings of movies: whether big-budget films are more popular than
their low-budget counterparts, whether any relationship between movies produced
during the "golden age" (e.g. Citizen Kane, It's A Wonderful Life) can be
demonstrated, and whether any particular actors or actresses are likely to help a movie
succeed.
The paper also reports on the techniques used, giving their
implementation and usefulness.
We found that the IMDb is difficult to mine, owing to
the format of the source data.
We also found some interesting facts: the
budget of a film is no indication of how well rated it will be, the quality of
films shows a downward trend over time, and the director and actors/actresses
involved in a film are the most important factors in its success or lack thereof.
The data used in this paper is not freely distributable, but remains the copyright of
the Internet Movie Database Inc.
It is used here within the terms of their copying policy.
Further distribution of the source data used in this paper may be prohibited.
Keywords: IMDb, Internet Movie Database, data mining, classification, movies
Paper DOI: 10.2495/DATA040331