IMDb data key concepts

IMDb IDs

IMDb uses unique identifiers for each of the entities referenced in IMDb data. For example "Name IDs" identify name entities (people) and "Title IDs" identify title entities (movies, series, episodes and video games). IMDb identifiers always take the form of two letters, which signify the type of entity being identified, followed by a sequence of at least seven numbers that uniquely identify a specific entity of that type. For example:

tt0050083 is the unique identifier for the movie "12 Angry Men (1957)", where tt signifies that it's a title entity and 0050083 uniquely indicates "12 Angry Men (1957)".
nm0000020 is the unique identifier for the actor "Henry Fonda", where nm signifies that it's a name entity and 0000020 uniquely indicates "Henry Fonda".

Within the data set, each entry relates to a single IMDb identifier.

These IDs can be seen in some of the IMDb URLs, for example the title page https://www.imdb.com/title/tt0050083/ has the Title ID tt0050083 to reference the movie "12 Angry Men (1957)".

Changes to Entities and Resolving IDs

Duplicate IDs

IMDb data is constantly being updated, both with the addition of new data and enhancement of the quality of existing data. While there is only ever one unique IMDb identifier, there are, on occasion, instances where there might be duplicate entries for the same entity. This could happen, for instance, if multiple users have contributed data for the same entity (e.g. the same person) under different identifiers (e.g. different name ids). In this case IMDb maintains both identifiers in the data set, effectively duplicating the data. This allows you to continue using any matching you have between IMDb identifiers and other identifiers. To identify when this is the case a remappedTo field is included in the bulk data sets and the title.meta.canonicalId and name.meta.canonicalId field is included in the API. From these fields, you get the new preferred identifier for that entity.

The Big Bang Theory pilot episode has multiple Title ID entries referring to the same episode: tt1044014 (the Title ID that has been remapped) and tt0775431 (the preferred Title ID). When you retrieve either the remappedTo value from the Bulk Data or the title.meta.canonicalId for Title ID tt1044014, you will receive the preferred Title ID tt0775431.

Deleted IDs

Sometimes entities are deleted from the data set. The most prominent example of this is the deletion of titles that have been canceled during development and will therefore never be released. When an entity is deleted, it is no longer included in the data set. The identifier associated with it is never reused for a different entity.

Principal Credits

Principal credits are a set of the most important cast/crew credits for a title, with the selection and order determined by IMDb. Principal credits are often similar to top-billed cast, but they can be different, for example if the title credits are in order of appearance or alphabetical. For more details see IMDb help site.

API Product Overview

Learn about the benefits of the GraphQL IMDb API on AWS Data Exchange