Data on roughly the 5,000 biggest-budget movies (as found on the-numbers.com).
Give me a shout if you make something cool with this!
Data (pulled on 12/5/2019)
movie-budgets.csv
contains dense data on ~5,000 movies, with the following columns:
title: Movie Title
release_date: Release Date
production_budget: Production Budget (USD)
domestic_gross: Domestic Gross Profit (USD)
worldwide_gross: Worldwide Gross Profit (USD)
genre: Movie Genre
story_source: Story Source (Ex. Novel, Comic Book)
creative_type: Creative Type (Ex. Super Hero)
production_method: Production Method (Ex. Live Action, Stop Motion)
production_company: Production Company
MPAA_Rating: MPAA Rating (Ex. PG13)
runtime: Runtime (minutes)
opening_theater_count: Number of theaters the movie opened in
max_theater_count: Max number of theaters showing the movie
avg_run_per_theater: Average time in theaters (weeks)
actors: Primary Actors/Actresses in movie (pipe-delimited (|) string of names)
directors: Director(s) (pipe-delimited (|) string of names)
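
Since actors and directors are stored as pipe-delimited strings, they usually need splitting before use. Here is a minimal sketch of one way to do that with the standard library. The helper name `load_movies` is made up for illustration, and it assumes the CSV has a header row with the column names listed above and is UTF-8 encoded.

```python
import csv


def load_movies(csv_file):
    """Read movie rows and split the pipe-delimited name columns into lists.

    `csv_file` is any open text file (or file-like object) holding the CSV.
    Assumes a header row with `actors` and `directors` columns.
    """
    movies = []
    for row in csv.DictReader(csv_file):
        for column in ("actors", "directors"):
            raw = row.get(column) or ""
            # "A|B|C" -> ["A", "B", "C"]; an empty cell becomes an empty list.
            row[column] = [name.strip() for name in raw.split("|") if name.strip()]
        movies.append(row)
    return movies


# Usage (assumes movie-budgets.csv is in the working directory):
# with open("movie-budgets.csv", newline="", encoding="utf-8") as f:
#     movies = load_movies(f)
```

All other columns come back as strings, so numeric fields such as production_budget still need converting before doing math on them.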
data/
contains gzipped JSON files (each holding ~100 movies, named by budget rank)
Working files
scrape.py
contains all scraping logic
main.py
executes the scrape & gzips the resulting JSON file into the json_data folder
create_csv.py
ungzips the JSON files & creates a CSV from all of them
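
For anyone who wants to rebuild the CSV from the gzipped JSON themselves, the ungzip-and-combine step could look roughly like the sketch below. This is not the code in create_csv.py. The function name `json_gz_to_csv` is hypothetical, and it assumes each gzipped file holds a JSON list of flat movie objects, which may not match the actual file layout.

```python
import csv
import gzip
import json


def json_gz_to_csv(gz_paths, csv_path):
    """Combine gzipped JSON files into one CSV file.

    Assumption: each file holds a JSON list of flat movie dicts.
    Returns the number of movie rows written.
    """
    movies = []
    for path in gz_paths:
        # Mode "rt" decompresses and decodes to text in one step.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            movies.extend(json.load(f))

    # Collect every key seen, in first-seen order, for the header row.
    fieldnames = []
    for movie in movies:
        for key in movie:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        # Keys missing from a movie are written as empty cells.
        writer.writerows(movies)
    return len(movies)
```

Pass it a list of file paths (for example from `glob.glob`) and an output path. Building the header from every key seen keeps the sketch from failing if some movies lack a field.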