I was inspired by the research done by bold. web design on their Fortune 500 palettes site to compile a dataset around company logo/brand/website colors.
This repo contains brand palettes for those Fortune 500 companies, palettes extracted from the companies' homepages, and each company’s logo plus a screenshot of its homepage. Do with it what you may (I have some links at the bottom for possible analysis tools).
- logo&palette_scraper.py: hits the bold. site, loops over the various industries, and scrapes each company's name, industry, brand palette, and logo location
- download_logos.py: downloads the logos from the last bullet into logos/
- get_urls.py: takes the company names and does a quick Google search for their homepage URLs
- take_screenshots.py: pops open a headless Chrome browser and screenshots the URLs from above. Saves them to a hidden screenshots/ folder. Hidden because ~see next bullet~
- bulk_resize_images.py: resizes the screenshots to 512x512 images
- extract_screenshot_colors.py: takes said screenshots and uses the colorgram package to extract the top 6 colors in each screenshot

logo_colors.csv (sourced from the bold. site)
- company: company name, hyphen separated
- category: industry
- color_{1-8}: between 1 and 8 hex codes of brand colors (as determined by bold.)

screenshot_colors.csv (extracted from website screenshots using colorgram)
- company: company name, hyphen separated
- color_{1-6}: between 1 and 6 hex codes of colors in the screenshot
- color_{1-6}_proportion: proportion of the screenshot that contains said color
- screenshot_location: where the screenshot is saved

company_urls.csv
- company: company name, hyphen separated
- url: the company’s homepage to be screenshotted

logo_locations.csv
- company: company name, hyphen separated
- file_name: where the logo is saved
- url: the URL of the logo to be downloaded
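
If you want to poke at the color columns, here is a minimal sketch of reading one row in the screenshot_colors.csv layout described above and turning it into (RGB, proportion) pairs. It is not part of the repo's scripts. The helper names `hex_to_rgb` and `row_palette` are made up for illustration, the sample row is invented, and I'm assuming the hex codes may or may not carry a leading `#` and that unused color cells are left empty.

```python
import csv
import io


def hex_to_rgb(hex_code):
    """Convert a hex code such as '#1A2B3C' or '1a2b3c' to an (r, g, b) tuple."""
    value = hex_code.strip().lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex code, got {hex_code!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def row_palette(row, max_colors=6):
    """Return [(rgb, proportion), ...] for one row of the CSV.

    Empty color cells are skipped, since a row may hold fewer than
    max_colors colors (assumption: unused cells are blank).
    """
    palette = []
    for i in range(1, max_colors + 1):
        hex_code = (row.get(f"color_{i}") or "").strip()
        if not hex_code:
            continue
        proportion = float(row.get(f"color_{i}_proportion") or 0.0)
        palette.append((hex_to_rgb(hex_code), proportion))
    return palette


# Invented sample data in the documented column layout (not taken from the repo).
SAMPLE = io.StringIO(
    "company,color_1,color_1_proportion,color_2,color_2_proportion,screenshot_location\n"
    "example-co,#ffffff,0.75,#003366,0.25,screenshots/example-co.png\n"
)

for row in csv.DictReader(SAMPLE):
    print(row["company"], row_palette(row))
```

To run it against the real file, swap `SAMPLE` for `open("screenshot_colors.csv", newline="")`. The same helper should work for logo_colors.csv with `max_colors=8`, though that file has no proportion columns, so every proportion will come back as 0.0.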