art data


Housed within the white walls of galleries, the contemporary art market is opaque. Public information is limited to the winning bid prices published by auction houses, while transactions in the primary market, i.e. sales of art directly from artists' studios, are impenetrable. Challenged by this lack of transparency, I sought my own answers. Artsy is an online platform that claimed to host more than 300,000 artworks in 2015. By scraping Artsy’s website, I obtained information on over 66,000 artworks, three times the number available through Artsy's public API at the time. Scraping also gave me access to artwork prices and to whether or not a work had been sold.
The questions I asked were ‘What factors contribute to an artwork's sale?’ and ‘What about color?’

GitHub repo

data feature set for two artworks

Artwork prices were scarce: only 28% of the records had a price associated with them, and sale status was available for only 7% of the entries. When I ran regression models, a model reached a score of 90% just by predicting that an artwork will not sell, while its ROC AUC was only 0.54.
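To see why a high score can coexist with a near-chance ROC AUC, here is a minimal sketch in plain Python. It assumes the 90% score is accuracy, and it uses made-up labels (10% sold, 90% unsold) rather than the Artsy data; the helper names `accuracy` and `roc_auc` are my own for illustration. A "model" that always predicts unsold gets 90% accuracy, but because it gives every artwork the same score it cannot rank sold works above unsold ones, so its ROC AUC is exactly 0.5.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical, illustrative labels: 1 = sold, 0 = unsold.
y_true = [1] * 100 + [0] * 900

# A baseline that always answers "unsold", with a constant score.
always_unsold_labels = [0] * len(y_true)
always_unsold_scores = [0.0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_unsold_labels)
baseline_auc = roc_auc(y_true, always_unsold_scores)

print(f"accuracy: {baseline_accuracy:.2f}")  # accuracy: 0.90
print(f"ROC AUC:  {baseline_auc:.2f}")       # ROC AUC:  0.50
```

On an imbalanced dataset like this one, ROC AUC is therefore the more honest number to track: it rewards a model only for separating sold from unsold works, not for echoing the majority class.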

By narrowing and expanding my feature set, I aimed to balance the dataset. I finally settled on 11 features, and my ROC AUC rose to 0.67, which is only slightly better than the 0.5 of a random guess.

In 'Collectible Investments for the High Net Worth Investor', Stephen Satchell states: “Attractive subjects are more highly valued than unattractive subjects… Certain colors are more desirable than others – for example, red and blue will generally dominate yellow and green [in terms of artwork sales].”