Music discovery

Finding new music one would enjoy is a common wish, and it is a problem that computers and the Internet are well suited for. The major obstacles are copyright laws, which restrict sharing of the music itself, and centralized music recommender systems, which don't share their data and don't do a great job themselves. For a long time I imagined that distributed systems and increased user collaboration would help to replace the latter, making the data available for anyone to analyze, but so far that hasn't happened. Here I will focus on the analysis itself, even though there isn't much freely and legally available data to analyze.

Metadata analysis

Metadata, including musical properties (possibly tagged by listeners) and presence in users' playlists, is relatively easy to acquire and handle. Human listeners do the hard work themselves, they are likely to analyze, classify, and cluster the music better than neural networks or other statistical methods would, and copyright restrictions on the music itself don't apply to it. Besides genres and other common tags, metadata can include elements of music, types of ornaments, voice type, and a number of other parameters.
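Playlist presence lends itself to simple co-occurrence analysis: tracks that often appear in the same playlists are probably similar in some way. Below is a minimal sketch, assuming each playlist is just a list of track identifiers. The function names and the sample playlists are hypothetical. Similarity is cosine similarity over playlist membership, that is, the number of shared playlists divided by the square root of the product of each track's playlist count.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_similarity(playlists):
    """Cosine similarity between tracks, based on shared playlist membership.

    Each playlist is treated as a set, so a track repeated within one
    playlist is counted once.
    """
    track_counts = Counter()  # number of playlists containing each track
    pair_counts = Counter()   # number of playlists containing both tracks
    for playlist in playlists:
        tracks = sorted(set(playlist))
        track_counts.update(tracks)
        pair_counts.update(combinations(tracks, 2))

    similarity = defaultdict(dict)
    for (a, b), both in pair_counts.items():
        score = both / sqrt(track_counts[a] * track_counts[b])
        similarity[a][b] = score
        similarity[b][a] = score
    return similarity


def recommend(similarity, liked, limit=5):
    """Rank tracks not in `liked` by summed similarity to the liked ones."""
    scores = Counter()
    for track in liked:
        for other, score in similarity.get(track, {}).items():
            if other not in liked:
                scores[other] += score
    return [track for track, _ in scores.most_common(limit)]


# Hypothetical playlists; the track names are placeholders.
playlists = [
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "c", "d"],
    ["d", "e"],
]
sim = cooccurrence_similarity(playlists)
print(recommend(sim, {"a"}))  # prints ['b', 'c']
```

This ignores tags entirely. Tags could be folded in the same way by treating each tag as a "playlist" of the tracks carrying it.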

The hard problem here is perhaps making users publish their playlists and tags freely, instead of giving them to centralized proprietary systems that won't even give the raw data back. While it's not a general solution, I've at least uploaded my playlist as a small step towards that.

For now, there is the Spotify Million Playlist Dataset Challenge, which requires registration and acceptance of a few lengthy user agreements and is then apparently free for research and non-commercial use.
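Once downloaded, such a dataset can be reduced to plain lists of track identifiers for analysis like the co-occurrence approach above. Below is a sketch, assuming the slice layout described in the dataset's README: JSON files with a top-level "playlists" array, where each playlist has a "tracks" array whose entries carry a "track_uri". The `mpd.slice.*.json` file pattern and the function names are assumptions to check against the actual download.

```python
import json
from pathlib import Path


def playlists_from_slice(data):
    """Turn one parsed slice into lists of track URIs, in playlist order."""
    return [
        [track["track_uri"] for track in playlist["tracks"]]
        for playlist in data["playlists"]
    ]


def load_playlists(data_dir, pattern="mpd.slice.*.json"):
    """Read every slice file in `data_dir` (assumed naming) and yield playlists."""
    for path in sorted(Path(data_dir).glob(pattern)):
        with path.open(encoding="utf-8") as f:
            yield from playlists_from_slice(json.load(f))


# A tiny inline stand-in for a slice file, with only the fields used here;
# the URIs are placeholders, not real tracks.
sample = json.loads(
    '{"playlists": [{"pid": 0, "tracks": ['
    '{"pos": 0, "track_uri": "spotify:track:x"},'
    ' {"pos": 1, "track_uri": "spotify:track:y"}]}]}'
)
print(playlists_from_slice(sample))  # prints [['spotify:track:x', 'spotify:track:y']]
```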

Audio analysis

As with other statistical analysis, there is usually a preprocessing step followed by the actual statistical model. For preprocessing, the mel-frequency cepstrum is often employed, essentially to simplify the sound and split it into timbre and timbre-less features. The model itself is often a neural network for larger inputs, and one may also try to feed it raw input instead.
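The preprocessing step can be made concrete with a minimal and deliberately slow pure-Python sketch of MFCC extraction. It takes Hann-windowed frames, computes a naive DFT power spectrum, applies a triangular mel filterbank, takes log energies, and then a DCT-II. The frame size, hop, and filter and coefficient counts are illustrative assumptions rather than standard values. Real code would use an FFT and an audio library. The first few coefficients summarize the coarse spectral envelope.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    filters = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, center):   # rising slope
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope
            weights[k] = (right - k) / (right - center)
        filters.append(weights)
    return filters


def power_spectrum(frame):
    """|DFT|^2 / N for the non-negative frequency bins (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * i * (t + 0.5) / n) for t, v in enumerate(values))
        for i in range(n_coeffs)
    ]


def mfcc(signal, sample_rate, n_fft=256, hop=128, n_filters=20, n_coeffs=13):
    """One list of n_coeffs cepstral coefficients per full frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (n_fft - 1)) for t in range(n_fft)]
    filters = mel_filterbank(n_filters, n_fft, sample_rate)
    result = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [x * w for x, w in zip(signal[start:start + n_fft], window)]
        power = power_spectrum(frame)
        energies = [sum(w * p for w, p in zip(f, power)) for f in filters]
        log_energies = [math.log(e + 1e-10) for e in energies]  # avoid log(0)
        result.append(dct_ii(log_energies, n_coeffs))
    return result


# Usage: a synthetic 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
coeffs = mfcc(tone, sample_rate)
print(len(coeffs), len(coeffs[0]))  # prints 7 13
```

The resulting per-frame vectors (or statistics over them, such as means and variances) are what a downstream model would consume instead of raw samples.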

Music sources

Most of the music in the public domain is classical, but some independent music under Creative Commons licenses is available as well. openverse.engineering used to work as a starting point for search (it linked a few websites with indie music, which it used as sources), but it seems to be defunct as of 2024. There is also "Legal Music For Videos" on the CC website. Some of the websites it links to are already blocked in Russia, such as jamendo.com and soundcloud.com. SoundFarm redirects to SoundCloud, which is blocked for hosting Radio Svoboda podcasts, while podcastaddict.com is blocked for linking to some SoundCloud-hosted podcasts. Others are defunct. The "Pixabay has royalty free music you can use for free" HN discussion thread mentions a few more, though the ones I checked had issues (unclear license terms, required registration, plain sales, and so on).

Manual discovery

While automation is potentially useful, the data isn't really available. One can still rely on discovery methods that would be archaic if it weren't for copyright: Internet "radios", for instance, of which the Icecast directory lists a bunch. There's also the Audio Archive, which includes at least record samples, as well as various compilations intended for discovery. Commercial services (YouTube, Jamendo, Bandcamp, and others) can also be helpful, providing some sort of "radio", generated playlists, suggestions, or other music discovery mechanisms. gnod.com ("The Global Network Of Discovery") tries to help with discovery as well, and includes gnoosic.com for music.

Some record labels (e.g., Projekt Records, Napalm Records) specialize in certain genres, and their rosters may help with discovering new bands.