[Logo] Jaikoz and SongKong Forums
Request: Remove duplicates using source count as factor
Author Message
greengeek

Pro

Joined: 18/09/2007 02:50:48
Messages: 435

I have been experimenting with splitting up my collection and making a second copy of it that focuses on individual tracks instead of full albums (part of a jukebox project).

I have used SongKong to sort my collection by artist and track name, and to clean up duplicates by having SK check the Acoustid. This removed quite a few of the duplicates. However, I still have a lot of duplicate tracks that share the same artist and song title. I don't want to run these through a plain Artist and Title check, as some of these tracks might be one-offs or live performances, and I wouldn't want SK to choose them over an official or more obscure release just because they are higher quality or longer in duration.

It would be awesome if there were an option that treats tracks with the same artist name and song title as duplicates, and then keeps the one with the highest number of sources in Acoustid.

For example, I have 3 tracks, all with different Acoustids, but all with Artist: Yanni and Title: Aria. They have the following Acoustid IDs:

43210c63-bcaf-4297-ad49-7ea684bf4686
10eabbeb-d680-4f3c-9230-1193ad9ceabc
21a8a0a1-3f1b-4454-82fe-b876f5be461c

Going to each page, we can see that they are of varying popularity.

http://acoustid.org/track/43210c63-bcaf-4297-ad49-7ea684bf4686
The total sources for all IDs on this page is 12

http://acoustid.org/track/10eabbeb-d680-4f3c-9230-1193ad9ceabc
The total sources for all IDs on this page is 221

http://acoustid.org/track/21a8a0a1-3f1b-4454-82fe-b876f5be461c
The total sources for all IDs on this page is 8

The middle recording is by far the most common and expected recording, hence the one I would like to keep.

Could there be a drop-down option, something like this?
Same Artist, Title, and Most Popular Acoustid
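
As a rough illustration of the rule being requested (this is not SongKong code), the sketch below groups tracks by artist and title and keeps the one with the highest source count. The record layout and the `pick_most_popular` function are assumptions made up for the example; the source counts are the page totals quoted above.

```python
from collections import defaultdict

# Hypothetical track records; "sources" holds the page totals quoted above.
tracks = [
    {"artist": "Yanni", "title": "Aria",
     "acoustid": "43210c63-bcaf-4297-ad49-7ea684bf4686", "sources": 12},
    {"artist": "Yanni", "title": "Aria",
     "acoustid": "10eabbeb-d680-4f3c-9230-1193ad9ceabc", "sources": 221},
    {"artist": "Yanni", "title": "Aria",
     "acoustid": "21a8a0a1-3f1b-4454-82fe-b876f5be461c", "sources": 8},
]


def pick_most_popular(tracks):
    """Group by (artist, title) and split each group into keep/delete."""
    groups = defaultdict(list)
    for t in tracks:
        # Case-insensitive key so "Aria" and "aria" land in the same group.
        key = (t["artist"].strip().casefold(), t["title"].strip().casefold())
        groups[key].append(t)

    keep, delete = [], []
    for group in groups.values():
        best = max(group, key=lambda t: t["sources"])
        keep.append(best)
        delete.extend(t for t in group if t is not best)
    return keep, delete


keep, delete = pick_most_popular(tracks)
for t in keep:
    print("keep  ", t["acoustid"], t["sources"])
for t in delete:
    print("delete", t["acoustid"], t["sources"])
```

With the three Aria tracks, the one with 221 sources would be kept and the other two flagged for deletion.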

Would save me a ton of time having to manually clean out all these duplicates. Thanks!



KevinBluemel.com
Contemporary Instrumental Musician and Composer. Stop by to listen, watch videos, and get free downloads of songs and sheet music.
paultaylor

Pro

Joined: 21/08/2006 09:21:27
Messages: 7485

We do get the number of sources from Acoustid, but we only use it when matching to a recording (during Fix Songs), to help pick the best recording WITHIN an acoustid when we don't have enough metadata. Remember that one acoustid can be linked to multiple recordings, and these can be completely different recordings by different artists rather than just versions of the same recording.

But we don't store the number of sources, so the Delete Duplicates process doesn't have this information, and it would be too slow if it had to requery all the acoustids. More importantly, I don't think it makes much sense, as the number of sources is only really meaningful when comparing multiple recordings linked to the same acoustid.

But I do see a problem here: if the songs you consider duplicates have different recording ids and different acoustids, then by any of the Musicbrainz metrics they are not duplicates, and using metadata only can be dangerous.

Would an artist name, track name, ReleaseGroup Id option help?

The idea is that it would only find duplicates if we find two songs with the same name AND the same release group, i.e. both songs are found on different versions of the same album. It allows for different recording ids because, after all, the Musicbrainz database is imperfect and there are actually very many recordings that could be merged.
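
A minimal sketch of how such a key might behave, assuming songs are plain records with hypothetical `artist`, `title`, `recording_id` and `release_group_id` fields (the ids below are placeholders, not real Musicbrainz ids). Skipping songs that have no release group is an assumption of this sketch, not something decided in the thread.

```python
from collections import defaultdict

# Placeholder data: same artist and title, varying recording/release group ids.
songs = [
    {"artist": "Yanni", "title": "Aria", "recording_id": "rec-a", "release_group_id": "rg-1"},
    {"artist": "Yanni", "title": "Aria", "recording_id": "rec-b", "release_group_id": "rg-1"},
    {"artist": "Yanni", "title": "Aria", "recording_id": "rec-c", "release_group_id": "rg-2"},
    {"artist": "Yanni", "title": "Aria", "recording_id": "rec-d", "release_group_id": None},
]


def find_duplicates(songs):
    """Return groups of songs sharing artist, title AND release group."""
    groups = defaultdict(list)
    for s in songs:
        if not s.get("release_group_id"):
            # Assumption: without a release group we cannot safely compare.
            continue
        key = (s["artist"].casefold(), s["title"].casefold(), s["release_group_id"])
        groups[key].append(s)
    # Only groups with more than one member contain duplicates.
    return [g for g in groups.values() if len(g) > 1]


duplicate_groups = find_duplicates(songs)
for group in duplicate_groups:
    print([s["recording_id"] for s in group])
```

Here `rec-a` and `rec-b` are grouped even though their recording ids differ, while `rec-c` (a different release group) is left alone.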

thanks Paul (Administrator)
greengeek

Pro

Joined: 18/09/2007 02:50:48
Messages: 435

What happens a lot of the time is that the same song gets released in multiple takes or versions by the artist. Sometimes it's a one-off take released on a B-side or box set, sometimes an alternate take, sometimes just a live version.

What I have been doing is looking at the source count to find out which one is the most common release. If all the IDs on a page have only 4 or so sources apiece, that release has not been commonly submitted by end users. However, if another page has IDs with far greater numbers, say over 50 or over 100, then that release is far more common, as it has been submitted by a lot more people. Given that, I would want to keep the release that more people would likely recognize.

Currently SK allows for the removal of TRUE duplicates, where you have multiple copies of the exact same recording. I would like to be able to remove duplicates of the same song that may be different recordings, i.e. different lengths, takes, versions, etc., keeping only the one that is most likely the main or official release.

Take the artist in my example, Yanni. For a lot of his songs, the Live at the Acropolis versions are the ones most people know him for, and they have more plays than some of his studio album releases. That basically comes from his popularity taking off after his PBS special, which was played nonstop in the 80s. He is something of an exception, as most artists' studio albums far outshine their live releases. However, there are still many artists who release many different takes and versions of a given song.

SK does a great job of removing duplicates. I guess what I am trying to do is take it to the next level and remove similar songs: same artist, same song, but lesser-known recordings when a better-known recording exists in the collection. Does that make sense?

paultaylor

Pro

Joined: 21/08/2006 09:21:27
Messages: 7485

Yes, the idea of fuzzier matching based on similar recordings makes sense; that's why I suggested this new option:

artist name, trackname, ReleaseGroup Id

The idea is that the recording id doesn't have to be the same, but the release group id constraint protects genuinely different versions of the same song from being treated as duplicates.

My problem with your idea is that you are using the number of sources across multiple acoustids, when it is only meant to be used to compare recordings within an acoustid, and as such I'm not sure it's a very good measure of popularity, though I can see it sort of works. Secondly, there is the practical issue that we would have to look up the number of sources, slowing things down somewhat.

What if this option (or a purely Artist, Title option) were combined with a preview dialog letting you review changes before actually deleting?

Perhaps the concept of Popularity should be added to Advanced/Preferred Deletion Criteria. There might be some measure we could use for a recording instead of restricting it to the acoustid. Then, having found duplicates (e.g. by artist, song title), if the user makes Popularity the top criterion, the most popular one is kept.
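
To make the ordering idea concrete, here is a small sketch of an ordered list of deletion criteria where the first criterion dominates and later ones only break ties. The criterion names, the `popularity` field and the file records are all hypothetical; this is not how Preferred Deletion Criteria is actually implemented.

```python
# Hypothetical duplicate files already grouped by artist and song title.
files = [
    {"path": "aria_live.mp3", "popularity": 221, "bitrate": 192},
    {"path": "aria_alt_take.flac", "popularity": 12, "bitrate": 900},
]


def popularity(f):
    return f["popularity"]


def bitrate(f):
    return f["bitrate"]


def choose_keeper(duplicates, criteria):
    """Keep the file that ranks highest, comparing criteria in the given order."""
    # Tuples compare element by element, so the first criterion wins
    # and later criteria are only consulted on ties.
    return max(duplicates, key=lambda f: tuple(c(f) for c in criteria))


by_popularity = choose_keeper(files, [popularity, bitrate])
by_quality = choose_keeper(files, [bitrate, popularity])
print(by_popularity["path"], by_quality["path"])
```

With Popularity at the top, the widely known recording is kept even though the other file has the higher bitrate; swapping the order reverses the choice.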





thanks Paul (Administrator)
 