Developer
DocumentationContact Us
  • Documentation
  • Key Concepts
  • Overview
  • Getting Access to the API
  • API Key Concepts
  • Calling the API
  • Sample Queries
  • Title/Name
  • Box Office
  • Search
  • Overview
  • Bulk Data Key Concepts
  • Data Dictionary
  • Names
  • Titles
  • Box Office
  • Meters
  • Parents Guide
  • Querying In Athena
  • Creating Tables DDL
  • Release Notes

Querying Data in Amazon Athena Tables

What is Amazon Athena?

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. For more information, and for getting started with Athena, read the user guide.

Getting Started

You first need to create a database in Athena. This process is documented in the user guide

When you have a database, you're ready to create a table that's based on the dataset. You need to upload your dataset to Amazon S3. When you specify a location for your table you should use a trailing slash for your folder or bucket. Do not use file names or glob characters.

Use: s3://S3-BUCKET/S3-KEY/

Don't use: s3://S3-BUCKET s3://S3-BUCKET/S3-KEY/* s3://S3-BUCKET/S3-KEY/DATASET.JSONL.GZ

Athena will query all objects in the specified location so it is important that only one dataset is found at that path.

To create a table use the create table DDL statement found at the end of this document. Remember to set the location to the location of your dataset. If you do not need to query all of the columns in the table you can remove them from the create table DDL statement.

Now that you have a table created in Athena based on the data in Amazon S3, you can run queries on the table and see the results in Athena.

We have provided some example queries for common use cases.

What Are the Highest-Rated Movies On IMDb?

IMDb user ratings can be found in the title essential dataset as part of the imdbRating structure.

select
    tc.titleId,
    tc.originalTitle,
    tc.imdbRating.rating,
    tc.imdbRating.numberOfVotes
from
    title_essential_v1 as tc
where
    tc.remappedTo is null and tc.titleType = 'movie'
order by
    tc.imdbRating.rating desc,
    tc.imdbRating.numberOfVotes desc

Running this query might return the following results:

titleidoriginaltitleratingnumberofvotes
tt005008312 Angry Men8.9616319
tt0110413Léon: The Professional8.5953632
tt0064116Once Upon a Time in the West8.5278746

What Are the Title Texts for the Titles That a Person Is Known For?

The title IDs that a person is known for can be found in the name essential dataset as part of the knownFor array. To query this array it is necessary to flatten it into multiple rows using CROSS JOIN in conjunction with the UNNEST operator. To include the original title text it is necessary to JOIN the title essential dataset.

select
    nc.nameId,
    nc.name,
    u_knownFor.titleId,
    tc.originalTitle
from
    name_essential_v1 as nc
cross join
    unnest(nc.knownFor) with ordinality as t(u_knownFor, ordinal)
join
    title_essential_v1 as tc
        on u_knownFor.titleId = tc.titleId
where
    nc.remappedTo is null
order by
    nc.nameId, ordinal

Running this query might return the following results:

nameidnametitleidoriginaltitle
nm0000020Henry Fondatt005008312 Angry Men
nm0000020Henry Fondatt0082846On Golden Pond
nm0000020Henry Fondatt0032551The Grapes of Wrath

What Are the Names of the Principal Cast for a Title?

The name IDs for the principal cast for a title can be found in the title essential dataset as part of the principalCastMembers array. To query this array it is necessary to flatten it into multiple rows using CROSS JOIN in conjunction with the UNNEST operator. To include the name it is necessary to JOIN the name essential dataset.

select
    tc.titleId,
    tc.originalTitle,
    nc.name
from
    title_essential_v1 as tc
cross join
    unnest(tc.principalCastMembers) with ordinality as t(u_pcm, ordinal)
join
    name_essential_v1 as nc
        on u_pcm.nameId = nc.nameId
where
    tc.remappedTo is null
order by
    tc.titleId, ordinal

Running this query on the sample dataset returns the following results:

titleidoriginaltitlenameidname
tt005008312 Angry Mennm0000020Henry Fonda
tt005008312 Angry Mennm0002011Lee J. Cobb
tt005008312 Angry Mennm0000842Martin Balsam

What Awards was a Title Nominated for, and who were the Award Nominees?


select
    tc.titleId,
    tc.originalTitle,
    tc_awards.awardNominationId,
    tc_awards.awardName,
    nc.nameId,
    nc.name
from
    name_essential_v1 as nc
cross join
    unnest(nc.knownFor) with ordinality as t(u_knownFor, ordinal)
join
    title_essential_v1 as tc
        on u_knownFor.titleId = tc.titleId
cross join
    unnest(tc.awards) as t(tc_awards)
cross join
    unnest(nc.awards) as t(nc_awards)
where
    tc_awards.awardNominationid = nc_awards.awardNominationId

Running this query might return the following results:

titleIdoriginalTitleawardNominationIdawardNamenameIdname
tt0032551The Grapes of Wrathan0052939Oscarnm0002034Jane Darwell
tt0032551The Grapes of Wrathan0829169NBR awardnm0002034Jane Darwell
tt0032551The Grapes of Wrathan0052926Oscarnm0000020Henry Fonda

What Are the Title Texts for Episodes of a Series?

The title IDs for episodes that are part of a series can be found in the title essential dataset as part of the episodeTitleIds array. To query this array it is necessary to flatten it into multiple rows using CROSS JOIN in conjunction with the UNNEST operator. To include the title text it is necessary to JOIN the title essential dataset.

select
    tc_series.titleId,
    tc_series.originalTitle,
    tc_episode.titleId,
    tc_episode.originalTitle,
    tc_episode.episodeInfo.seasonNumber,
    tc_episode.episodeInfo.episodeNumber
from
    title_essential_v1 as tc_series
cross join
    unnest(tc_series.seriesInfo.episodeTitleIds) as t(u_eti)
join
    title_essential_v1 as tc_episode
        on u_eti = tc_episode.titleId
where
    tc_series.remappedTo is null
order by
    tc_series.titleId,
    tc_episode.episodeInfo.seasonNumber,
    tc_episode.episodeInfo.episodeNumber

Running this query might return the following results:

titleidoriginaltitletitleidoriginaltitleseasonnumberepisodenumber
tt5491994Planet Earth IItt6142646Islands11
tt5491994Planet Earth IItt6209126Mountains12
tt5491994Planet Earth IItt6209130Jungles13

What Is the Title Text of the Series for an Episode Title?

The title ID for a series that an episode is part of can be found in the title essential dataset as part of the episodeInfo structure. To include the title text it is necessary to JOIN the title essential dataset.

select
    tc_episode.titleId,
    tc_episode.originalTitle,
    tc_series.titleId,
    tc_series.originalTitle,
    tc_episode.episodeInfo.seasonNumber,
    tc_episode.episodeInfo.episodeNumber
from
    title_essential_v1 as tc_episode
join
    title_essential_v1 as tc_series
        on tc_episode.episodeInfo.seriesTitleId = tc_series.titleId
where
    tc_episode.remappedTo is null
order by
    tc_episode.titleId

Running this query might return the following results:

titleidoriginaltitletitleidoriginaltitleseasonnumberepisodenumber
tt6142646Islandstt5491994Planet Earth II11
tt6209126Mountainstt5491994Planet Earth II12
tt6209130Junglestt5491994Planet Earth II13

What are the most popular titles on IMDb?

The popularity ranking of a title can be found in the TITLEmeter dataset as part of the IMDb Meters product. The most popular IMDb titles (movies and TV shows) can be obtained by running the below query.

select
    titleId,
    date,
    rank
from
    title_meter_v1
order by
    rank

Running this query might return the following results:

titleiddaterank
tt00011677/14/20241
tt84586647/14/20242
tt14288217/14/20243

Who are the most popular names on IMDb?

The popularity ranking of a name can be found in the STARmeter dataset as part of the IMDb Meters product. The most popular IMDb names can be obtained by running the below query.

select
    nameId,
    date,
    rank
from
    star_meter_v1
order by
    rank

Running this query might return the following results:

nameIddaterank
nm00011677/14/20241
nm84586647/14/20242
nm14288217/14/20243

Creating Tables DDL

Explore sample DDL commands to create tables for IMDb name and essential data.

Get started

Contact us to see how IMDb data can solve your customers needs.

Contact Us
  • Conditions of Use
  • Privacy Policy

© 1990-2025 by IMDb.com, Inc.