- Documentation
- Key Concepts
- Release Notes
IMDb data key concepts
IMDb IDs
IMDb uses unique identifiers for each of the entities referenced in IMDb data. For example "Name IDs" identify name entities (people) and "Title IDs" identify title entities (movies, series, episodes and video games). IMDb identifiers always take the form of two letters, which signify the type of entity being identified, followed by a sequence of at least seven numbers that uniquely identify a specific entity of that type. For example:
tt0050083
is the unique identifier for the movie "12 Angry Men (1957)", wherett
signifies that it's a title entity and0050083
uniquely indicates "12 Angry Men (1957)".nm0000020
is the unique identifier for the actor "Henry Fonda", wherenm
signifies that it's a name entity and0000020
uniquely indicates "Henry Fonda".
Within the data set, each entry relates to a single IMDb identifier.
These IDs can be seen in some of the IMDb URLs, for example the title page https://www.imdb.com/title/tt0050083/ has the Title ID tt0050083
to reference the movie "12 Angry Men (1957)".
Changes to Entities and Resolving IDs
Duplicate IDs
IMDb data is constantly being updated, both with the addition of new data and enhancement of the quality of existing data. While there is only ever one unique IMDb identifier, there are, on occasion, instances where there might be duplicate entries for the same entity. This could happen, for instance, if multiple users have contributed data for the same entity (e.g. the same person) under different identifiers (e.g. different name ids). In this case IMDb maintains both identifiers in the data set, effectively duplicating the data. This allows you to continue using any matching you have between IMDb identifiers and other identifiers. To identify when this is the case a remappedTo
field is included in the bulk data sets and the title.meta.canonicalId
and name.meta.canonicalId
field is included in the API. From these fields, you get the new preferred identifier for that entity.
The Big Bang Theory pilot episode has multiple Title ID entries referring to the same episode: tt1044014
(the Title ID that has been remapped) and tt0775431
(the preferred Title ID). When you retrieve either the remappedTo
value from the Bulk Data or the title.meta.canonicalId
for Title ID tt1044014
, you will receive the preferred Title ID tt0775431
.
Deleted IDs
Sometimes entities are deleted from the data set. The most prominent example of this is the deletion of titles that have been canceled during development and will therefore never be released. When an entity is deleted, it is no longer included in the data set. The identifier associated with it is never reused for a different entity.
Principal Credits
Principal credits are a set of the most important cast/crew credits for a title, with the selection and order determined by IMDb. Principal credits are often similar to top-billed cast, but they can be different, for example if the title credits are in order of appearance or alphabetical. For more details see IMDb help site.
Learn about the benefits of the GraphQL IMDb API on AWS Data Exchange