art data


Housed within the white walls of galleries, the contemporary art market is opaque. Public information is largely limited to the winning bid prices published by auction houses. Transactions in the primary market, i.e. sales of art directly from artists' studios or galleries, are impenetrable. Challenged by this lack of transparency, I sought my own answers. Artsy is an online platform that claims to host over 300,000 artworks. By scraping Artsy's website, I was able to obtain information on 66,043 artworks, three times the number available through Artsy's public API at the time. Scraping also gave me access to information unavailable through the API, such as artwork prices and whether or not a work had been sold.

GitHub repo

Why does artwork sell?
Does color have an effect?

“Attractive subjects are more highly valued than unattractive subjects ... certain colors are more desirable than others – for example, red and blue will generally dominate yellow and green.”

– Stephen Satchell

Artwork prices were scarce

Only 28% of the records had prices associated with them, and sale status was available for only 7% of the entries. My regression model scored 90% simply by predicting that no artwork would sell, which shows how heavily unsold works dominated the labeled data.
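A 90% score from a model that never predicts a sale is what a majority-class baseline produces when about nine in ten labeled works are unsold. The minimal sketch below assumes the score is plain accuracy (which is what scikit-learn's classifier `score` method reports) and uses made-up labels, not the scraped data, where 1 means sold and 0 means unsold. The helper names `accuracy`, `recall` and `majority_baseline` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (sold works) that the model found."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return hits / actual if actual else 0.0


def majority_baseline(y_true):
    """Predict the most frequent label for every record."""
    majority = max(set(y_true), key=y_true.count)
    return [majority] * len(y_true)


# Hypothetical labels: 10 sold (1) and 90 unsold (0).
labels = [1] * 10 + [0] * 90
preds = majority_baseline(labels)
print(accuracy(labels, preds))  # 0.9
print(recall(labels, preds))    # 0.0
```

The baseline looks strong on accuracy yet finds none of the sold works, which is why a threshold-free metric such as ROC AUC is a fairer yardstick for this data.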

A model built on my top 11 features reached a ROC AUC of 0.67, only somewhat better than the 0.5 expected from random guessing.
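ROC AUC is the probability that a randomly chosen sold work receives a higher score than a randomly chosen unsold one, so 0.5 corresponds to guessing and 1.0 to a perfect ranking. The minimal pure-Python sketch below computes it from that pairwise definition (equivalent to the Mann-Whitney U statistic). The function name `roc_auc` and the scores are hypothetical, not the project's model output.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg),
    which is fine for a sketch but slow for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 0, 0]
constant = [0.0] * 6  # "never sells": every work gets the same score
ranked = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]

print(roc_auc(labels, constant))  # 0.5
print(roc_auc(labels, ranked))    # 0.875
```

The constant predictor that earned 90% accuracy in the imbalanced setting lands exactly at 0.5 here, which puts the 0.67 figure in context.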

When evaluating the effect of color, my regression model was consistent with Satchell's observation: red artworks were positively associated with selling, while green ones were negatively correlated with it.
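The source does not say how color was encoded or which regression was fit. The sketch below therefore assumes a logistic regression on one-hot "dominant color" indicators, fit by plain batch gradient descent on made-up records, only to show how a positive coefficient for red and a negative one for green are read. All names (`one_hot_color`, `fit_logistic`) and the sale rates are hypothetical. Colors other than red and green form the baseline category.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def one_hot_color(color, colors=("red", "green")):
    """Indicator features; any other color is the baseline category."""
    return [1.0 if color == c else 0.0 for c in colors]


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean log-loss. Returns (intercept, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # Prediction error for one record: predicted probability minus label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


# Hypothetical records (dominant color, sold?) with made-up sale rates:
# red 6/10, green 1/10, other 3/10.
records = (
    [("red", 1)] * 6 + [("red", 0)] * 4
    + [("green", 1)] * 1 + [("green", 0)] * 9
    + [("other", 1)] * 3 + [("other", 0)] * 7
)
X = [one_hot_color(color) for color, _ in records]
y = [sold for _, sold in records]

intercept, (w_red, w_green) = fit_logistic(X, y)
print(f"red: {w_red:+.2f}, green: {w_green:+.2f}")
```

Each weight is a change in log-odds of selling relative to the baseline colors, so here red comes out positive and green negative. A sign alone says nothing about causation, and with an overall ROC AUC of 0.67 such coefficients are best read as weak associations.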