Messages posted by: mjw
Hi Paul.

Sorry for the belated reply. I tried the new version and here were my initial impressions. The new algorithm works better than the old one but there are still a number of problems:
1) You have to manually select all the songs in each album, which is not feasible if you have thousands of albums. The algorithm still doesn't seem to have an option to treat each collection of songs found in a leaf-node sub-directory as a potential album, so selecting the tracks by hand is way too slow to be practical.
2) The release matching algorithm still picks the wrong release in many instances. Looking at a few examples, I believe that the accuracy could be greatly improved if you incorporated the following:
a) Treat song durations that differ by 3 seconds or less as being a 100% match for the duration field. It would also be useful to be able to add the duration field as a column on the first tab, since I find this field (together with track number and title) very useful when trying to choose which MusicBrainz release is the best match.
b) Prefer releases from a particular country (for example, many of my songs were taken from CDs bought in the UK, but there is no way of giving Jaikoz this hint, so instead it seems to pick releases from Germany most of the time!)
3) The new version of Jaikoz seems to have lost the feature for customizing the Auto-correcting workflow (or maybe I just can't find that feature any more).
4) Jaikoz now highlights the metadata fields that it has changed. I may be wrong, but I don't think there is any indication for the artwork that has been changed.
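
To make 2a and 2b concrete, here is a rough Python sketch. I don't know how Jaikoz scores things internally, so the `Candidate` fields, the function names and the idea of using country only as a tie-breaker are all my own assumptions, not the real implementation.

```python
from dataclasses import dataclass

DURATION_TOLERANCE_SECS = 3  # tolerance proposed in point 2a


def duration_matches(local_secs, remote_secs, tolerance=DURATION_TOLERANCE_SECS):
    """Treat durations that differ by `tolerance` seconds or less as a full match."""
    return abs(local_secs - remote_secs) <= tolerance


@dataclass
class Candidate:
    release_id: str  # hypothetical identifier for a candidate release
    country: str     # release country code, e.g. "GB" or "DE"
    score: float     # hypothetical score already computed by the matcher


def pick_release(candidates, preferred_country=None):
    """Pick the highest score; on equal scores prefer the user's country (assumed policy)."""
    return max(candidates, key=lambda c: (c.score, c.country == preferred_country))
```

With this policy a UK release only wins when its score ties with the best alternative; a stronger preference would need a weighting instead of a tie-break.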

paultaylor wrote:
Perhaps in this particular case it could be recalculated by just discarding the release metadata from the score for the compilation matches only.

Another idea - It might make sense in the Manual Correct popup window to leave the scores as-is, but perhaps instead have a checkbox to hide/display tracks from compilation releases. If "Prefer Original" is enabled, then by default the compilation matches would be hidden, but a user could bring them back by disabling the "Hide Compilation Releases" checkbox.

Alfg wrote:

I collect only songs and I'm not so interested in Albums but I'm interested in the first release date of that specific song

paultaylor wrote:

What we might be able to do in the next NGS release though is find all the releases that a particular track is on, and sort by date.

That sounds like it would be exactly what Alfg is looking for.

paultaylor wrote:

I'm talking about the recording entity in NGS.

Yes, makes sense. I was just thinking that you might have trouble finding the earliest Recording if the Duration of Alfg's MP3 file doesn't match that of the earliest Recording. Probably requires some experimentation when NGS comes out.

A question about MB querying - Do you know if it's possible in NGS to do queries like:
"Track='Love Me Tender' AND Duration > 2:43 AND Duration < 2:50"

Alfg wrote:

We have two different approaches. You collect Albums and they have to be consistent. I collect only songs and I'm not so interested in Albums but I'm interested in the first release date of that specific song, because I like to listen to music of a specific time-interval (or genre or mood etc.).

Yes, I think I understand now and agree that two approaches are required.
It certainly does seem difficult. I found this page that describes the current MB search syntax

but ideally you would want a much richer syntax for finding the earlier release. In SQL you would need to use expressions similar to this

-- Hypothetical schema; earliest release first
SELECT Track.*, Release.Date
FROM Track
JOIN Release ON Release.ReleaseId = Track.ReleaseId
WHERE Track.Name = 'Love Me Tender'
  AND Track.Artist = 'Elvis Presley'
  AND Track.Type != 'Compilation'
ORDER BY Release.Date

I suppose that complex queries like this are probably not possible via the current MB API. Presumably you have to make multiple queries and then refine the results on the client side.
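
Just to illustrate what I mean by refining on the client side, here is a small Python sketch. It assumes the results of several simpler queries have already been merged into a list of dicts; the field names and the sample rows (including the dates) are invented placeholders, not real MB data or a real MB API.

```python
from datetime import date


def earliest_original(rows, title, artist, min_secs, max_secs):
    """Filter merged query results locally and return the earliest non-compilation match."""
    matches = [
        r for r in rows
        if r["title"] == title
        and r["artist"] == artist
        and r["release_type"] != "Compilation"
        and min_secs < r["duration"] < max_secs
    ]
    # default=None avoids an exception when nothing survives the filter
    return min(matches, key=lambda r: r["date"], default=None)


# Invented sample rows, purely for illustration
rows = [
    {"title": "Love Me Tender", "artist": "Elvis Presley", "duration": 165,
     "release": "Release X", "release_type": "Compilation", "date": date(1990, 1, 1)},
    {"title": "Love Me Tender", "artist": "Elvis Presley", "duration": 166,
     "release": "Release Y", "release_type": "Album", "date": date(1975, 1, 1)},
    {"title": "Love Me Tender", "artist": "Elvis Presley", "duration": 165,
     "release": "Release Z", "release_type": "Single", "date": date(1960, 1, 1)},
]

# Duration window 2:43 to 2:50 expressed in seconds
best = earliest_original(rows, "Love Me Tender", "Elvis Presley", 163, 170)
```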

Apparently MB are in the midst of implementing a richer relational data model and it sounds like the "Work" entity might be helpful in achieving Alfg's objective since it provides a link between all tracks that are variants of the same song.

Sounds like the query language post NGS will still be a bit too primitive (uses Lucene) though:
By the way, here's a query in MB to find all the Tracks called "Love Me Tender" where the Artist is "Elvis Presley"

It looks to me like MB treats each "Love Me Tender" track as a separate Track object because they can each have a different TrackNo and Duration. Perhaps they can even be in a different language?!

How do you tell which one is the Original Release? Is there some way of finding the earliest release that contains a Track with a matching TrackName from the same Artist?

paultaylor wrote:
He wants to associate all tracks with the original release, so some/most could have metadata that is for the correct release, and some won't; we don't know ahead of time.

I mean even if you don't want to match earliest release the existing metadata in any file could help or hinder the process  

No, I actually think some of the meta-data is less helpful in his case than my case. In my case I am taking tracks that have been ripped from an Album to contain a subset of meta data tags and then am trying to match those tags against a corresponding release in MB. If someone has submitted that Album/Release to MB in the past then there's a good chance that my meta data will match best with that specific Release in MB rather than any others that might exist.
In his case, he is taking a set of tracks that have been ripped from various Albums. The meta data inside is going to match best with a whole plethora of different MB Releases. He doesn't want that to happen, so he needs a different algorithm than I do.

Since he doesn't want the tracks to be matched to the MB releases that they actually came from, the TrackNo, AlbumName and AlbumArtist meta data will almost certainly hinder (unless they happen to match by sheer chance). But luckily it sounds like it might not be necessary to even use those meta data fields in order to find the Original Release. I would need to clarify something first - for a given Track in MB is there a lookup function that would allow you to traverse to one and only one Original Release?

If this constraint generally holds true then I would propose that his algorithm does something like this:
1. For each Track in the collection, match the local track to the MB track that gives the highest track-level score, with TrackNo, AlbumName and AlbumArtist excluded from the scoring (i.e. 0 points). Points awarded for matching other release-independent fields (e.g. MusicIP ID) would be beneficial and help ensure that we match to the correct track.
2. Use the MusicBrainz API to look up the Original MB Release for each MB Track and map the local track to the Original MB Release.
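
In rough Python the two steps might look like the sketch below. The weights, the field names and the `lookup_original_release` callable are all hypothetical; in particular I am assuming that some lookup from an MB Track to exactly one Original Release exists, which is the very thing I asked about above.

```python
# Fields that depend on the release the file was ripped from (scored as 0 points)
RELEASE_DEPENDENT = {"track_no", "album", "album_artist"}

# Hypothetical weights, not Jaikoz's real ones
WEIGHTS = {"title": 10, "artist": 10, "musicip_id": 20,
           "track_no": 5, "album": 5, "album_artist": 5}


def track_score(local, remote, weights=WEIGHTS, excluded=RELEASE_DEPENDENT):
    """Additive track-level score that ignores release-dependent fields."""
    score = 0
    for field, points in weights.items():
        if field in excluded:
            continue
        value = local.get(field)
        if value is not None and value == remote.get(field):
            score += points
    return score


def map_to_original(local_tracks, mb_tracks, lookup_original_release):
    """Step 1: best MB track per local track. Step 2: map to its original release."""
    mapping = {}
    for local in local_tracks:
        best = max(mb_tracks, key=lambda mb: track_score(local, mb))
        mapping[local["path"]] = lookup_original_release(best["id"])
    return mapping
```

Note that a local track whose TrackNo and AlbumName match a compilation gets no credit for that here, which is the whole point.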

paultaylor wrote:

No these options are NOT acceptable , what customer wants is to select 'Prefer Original releases ....' and for Jaikoz to do just that.  

OK, but if I am understanding the requirements correctly, the tracks that he wants to associate to the Original Release could initially contain meta data that are from a completely different release. Is that right?

In which case, the existing meta data fields (TrackNo, AlbumName, etc) are going to mislead the matching algorithm rather than assist it with matching to the desired "original" release. So I would argue that this usecase requires a different scoring algorithm than my usecase discussed previously. In this usecase I believe that TrackNo, AlbumName, etc should be completely excluded from the matching score since they are irrelevant to the choice of the correct release. Do you see my point?

paultaylor wrote:

Yes, that is the issue. Also it may appear to the user that hardly anything is happening at all if most of the tracks being fixed are not visible.

I don't think it's a problem. You already popup a dialog box that shows that progress is being made i.e.

Checking 7.722 songs
142 Songs processed so far

Alfg wrote:

That's right, it's not always the case. I organize my collection in folders A, B, C etc. organized by Artist - Title. Because I don't like to have the same song 10 times or more in different albums. Take a look at Elvis Presley etc. His songs come on various albums again and again. I like to tag one song for the earliest release (maybe from a single not the album or there exist no album). And for my nearly 200k songs I use MusicIP Mixer to do the playlists etc.

That raises some interesting issues. So a particular Elvis Presley song like "Love Me Tender" appears on a lot of albums, but some users would like to only store 1 file on disk for the "Love Me Tender.mp3" even if they own many Albums containing that song. From a disk-space perspective I can see why people would want that, but this does seem to create problems for where to store the meta data. For example, after deleting the other 4 copies of "Love Me Tender.mp3" from disk, how would you be able to ask questions like "In my collection, which Albums contain the song Love Me Tender?". I think the only answer you could expect is - 1 album, not 5.

[A bit of a digression - The problem seems to come down to a limitation of ID3 tagging. I may be wrong, but I think that ID3 Tags only allow you to store one TrackNo, AlbumName, ReleaseDate, etc inside one MP3 file. So this limitation seems to force you to choose one release to link each file to. I suppose that in an ideal world none of the meta data would be stored inside the music files, rather it would be stored in a central database much like MusicBrainz. If you want to play a particular album, your music player would be able to look up the List of MB Unique IDs associated with that Album and then see which of these you have available in your music library, then start playing in track order. All the meta data displayed by the music player would come from the MB database, not from the track. If instead you want to view songs by a particular Artist that could also be achieved in a similar way, again using the MB Unique IDs as the keys to find the correct tracks to play.]

But given that the current tagging technology requires you to associate a Track with at most 1 Album (release), this raises some problems. Say you start with a bunch of albums (initially ripped from CD) that are all tagged with the specific release that they were ripped from:

Option 1 - Clean first, then delete duplicates
I guess you could first use Jaikoz to enrich the meta data (with Album Art etc) and once that has been done you could delete the duplicate tracks. But then you are going to lose the Album Art that was specific to that track residing in that specific Album. In fact, when you browse specific Albums with a music player it's going to look like your albums are missing tracks all over the place. I suppose that might be acceptable to some people if they never browse their collection by Album. Is that true in your case?

Option 2 - Start without duplicates, then try to map to an arbitrary album chosen by the user
If on the other hand you start with a hodge-podge of tracks that initially came from various incomplete Albums it's a slightly different problem. If you really only care about browsing your collection by Artist or Genre and not Album then it wouldn't be a problem though. You could delete all the meta data that has any connection with that track appearing on a specific release (e.g. TrackNo, AlbumName, ReleaseDate, etc).
But it sounds like you would instead like to switch all of those tracks so that they appear as if they have come from some other Album of your choice. But the TrackNo, AlbumName, ReleaseDate, etc are likely to be different in the Album that you would like to link the track to than in the Album that the track was initially ripped from. So perhaps what would be needed is a Manipulator Task in Jaikoz to empty out those fields (TrackNo, AlbumName, ReleaseDate, etc). Then you could do a "Manual Correct" to pick the release that you want to associate those tracks to. Would that work for you?

Perhaps if you are using MusicIP Mixer you don't need any tags to be written into your music files apart from MusicIP ID. Is that the case?

paultaylor wrote:
There's a lot of good ideas here, and I'm going to make this the focus of the next release but three points I need to reiterate.

This is great news.

paultaylor wrote:

1. Jaikoz used to allow you to set the score weightings and other additional preferences but this didn't really help anyone. I want as much as possible to just let Jaikoz fix everything automatically (whilst accepting it won't be 100% accurate), and any options provided to the user should be non-technical such as 'Prefer Original Release..' rather than technical such as 'Grouping options'.

Point taken. I can appreciate the trade-offs between flexibility and complexity and why you might be concerned with exposing some of this. The only reason I suggested exposing the score results and score-weightings is that I believe that we might have been able to do better than simply "guessing" the optimal weightings to use in the scoring algorithm. I think it should be possible to find the optimal weightings for most users by optimizing on a sufficiently large initial collection of ripped music (one that has not been touched by taggers yet). Perhaps a compromise would be to put them in a disk-based preferences file?

paultaylor wrote:

2. You seem to be trying Jaikoz out on one album at a time, but this isn't how it is normally used, it is normally used on 100's or 1000's of songs at a time. Hence any option which would need setting/unsetting for different set of tracks would not be workable.

No, I'm running on about 8,000 tracks at a time. However, the majority of those tracks are complete (or mostly complete) albums with good directory structure and reasonable existing meta data. The example Albums that I have raised as issues are just small subsets of the 8,000 tracks that illustrate certain problems.

After I run AutoCorrect across the 8,000 tracks, I scroll down the artwork column to try to spot Albums where the artwork is either missing or inconsistent. I would like to be able to sort by Album AND by Artwork simultaneously to find errors more quickly, but this is not currently possible (I guess a fancier Table widget would be required).

I can see that it would probably be very convenient to be able to have an option in the Context Menu called "AutoCorrect treating selected tracks as one release". This would group (into one release) all the selected tracks regardless of AlbumName or SubFolder and then run the release-orientated correction algorithm. This would reduce the necessity to go into the Preferences and change settings before re-running to fix the odd AutoCorrect mistake.

Then I could use the following process to fix my collection:
1) Run AutoCorrect across all 8000 tracks
2) Look for mistakes (hopefully 0-10% of the tracks where there were issues with AlbumName)
3) Fix each mistake one by one by manually selecting the tracks that should have been identified as being in the same album, and choose "AutoCorrect treating selected tracks as one release"

paultaylor wrote:

3. I mentioned that tracks might not be in the release/track no order and you suggested multi column sorting. There is a technical reason why I haven't done this yet, which I might be able to resolve but this is not the problem. Even if the user could sort by multiple columns my point was that the user might choose to not sort the tracks by release, and therefore it could be confusing if tracks were fixed in release order regardless.

I see (I think), I didn't really fully understand your point the first time around. I'm not sure that much can be done to change an algorithm that is essentially working on a per-release basis so that it fixes in any order other than release order.

I suppose that I had always thought of the Table as just being a convenient way for browsing the tracks, making manual edits, and kicking off correction tasks on either individual tracks or on one or more releases. Certainly I didn't expect that sort order in the table could/should affect the AutoCorrect algorithm. I did however expect that multi-selecting rows would affect the subset of tracks that were considered by the AutoCorrect algorithm.

Are you perhaps worrying that when AutoCorrect is running the user will see rows being corrected in a seemingly random order?

Perhaps I'm still not understanding the usecase here. Are you suggesting that users might want to organize their collection around some MusicBrainz entity other than Release (say TrackArtist)? Is this even possible using the MusicBrainz API? At the heart of any algorithm to clean up using MB, surely you would need to associate a track to a concrete release before you could confidently do useful data fixing.

I could perhaps imagine for some reason a certain type of user doesn't care about which album a song came from, but just wants to collect all the songs ever made by a particular artist. But if you don't even associate the tracks into their respective MB releases, then I think you would also have to forego making changes to TrackNo, AlbumName, Year, Artwork, AlbumArtist, etc. Without some kind of release orientation you might not even be able to know the TrackTitle for sure since even the same song has a different title on different releases.

So my initial thoughts are (unless you have some specific example use cases) that the table sort order should not have any effect on the AutoCorrect algorithm. i.e. Regardless of whether the tracks are currently displayed in release/track no order, artist order, or whatever, just run the same AutoCorrect algorithm. The only UI thing that should have an effect is the List of currently selected Tracks.

Thanks for listening to all these long-winded suggestions and even better for considering implementing them! This certainly sounds like a lot of work. Perhaps it's even worthy of a major revision number - Jaikoz 4.0?

paultaylor wrote:

Most users are just going to think, what is this, information overload etcetera, and as you say if the algorithm gets more complex it wouldn't make much sense to display this anyway.

True. Perhaps you could hide it as an advanced option or something. I'm thinking that once you get this new algorithm implemented, the next stage is going to be to find the optimum scoring weightings. This is going to be much easier for myself and other users to assist you with if there is a clear way to see how each score was computed. It will also be useful to answer future user questions along the lines of - "Why did Jaikoz pick release X instead of Y?" without requiring your support time to perform the analysis.

paultaylor wrote:

If we were just to group tracks by subDirectory it would be okay as long as the user is organizing their files as one album per folder.

Even if they put multiple albums in one subDirectory it will also work as long as the string "AlbumName" is unique and consistent for all the tracks that belong to the same Album within that subDirectory.

If users have some other file organization, there is a workaround - move all the tracks that you want to consider as a Group of Albums into the same leaf directory before running Jaikoz.

A problem arises when they want to combine tracks that have inconsistent AlbumName as a Group. Maybe your concept of Manipulator Tasks (and their associated Preferences) can be used to provide the flexibility to handle some of these user-specific cases. Maybe the "Grouping" algorithm could be an optional task that you run prior to any matching and scoring. You could potentially allow users to configure the "Group" task by adding some preference check-boxes to allow the user to specify which fields the Grouping algorithm would use (a bit like defining the "WHERE" clause in a SELECT statement operating on all the tracks) when performing the Grouping. It sounds like the cases that you have mentioned could be handled if the following were optionally allowed in the WHERE clause for the SELECT:
- Subfolder [true by default]
- AlbumName [true by default]
- AlbumArtist [false by default]
- TrackArtist [false by default]

You could even optionally display a GroupId (populated once the Grouping task has run across the tracks) as a column in the table so that we can easily see whether the grouping worked as expected.
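
Here is a minimal Python sketch of what such a "Group" task could do, assuming tracks are plain dicts with a `path` plus tag fields. The preference names mirror the check-boxes listed above; everything else (function name, `group_id` key) is hypothetical.

```python
import os
from collections import defaultdict

# Defaults mirror the proposed check-boxes: Subfolder and AlbumName on, the rest off
DEFAULT_GROUP_FIELDS = {
    "subfolder": True,
    "album": True,
    "album_artist": False,
    "track_artist": False,
}


def group_tracks(tracks, fields=None):
    """Group tracks by the enabled fields and stamp each track with a group_id."""
    fields = DEFAULT_GROUP_FIELDS if fields is None else fields
    enabled = [name for name, on in fields.items() if on]
    groups = defaultdict(list)
    for track in tracks:
        # The key plays the role of the WHERE clause: one tuple entry per enabled field
        key = tuple(
            os.path.dirname(track["path"]) if name == "subfolder" else track.get(name)
            for name in enabled
        )
        groups[key].append(track)
    for group_id, members in enumerate(groups.values(), start=1):
        for track in members:
            track["group_id"] = group_id
    return dict(groups)
```

The `group_id` written onto each track is what could be shown as the optional GroupId column.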

paultaylor wrote:

But this is not always the case either by design or because of lack of organization. Your rule of subDirectory AND album is safe enough, but why not just use artist AND Album, then this would group tracks from the same album that don't happen to be in the same folder

I'm presuming you are suggesting AlbumArtist rather than TrackArtist (since a compilation release can have many TrackArtists so this would prevent the correct matching of most compilation albums). In my collection AlbumArtist is empty for every single local track so it wouldn't be helpful in the initial grouping process. I think that this might be the most common case with ripped CD collections.

TrackArtist wouldn't help my usecase either since personally, I don't want Jaikoz to prioritize Original Releases over Compilations. In my personal view it is preferable that songs are matched accurately to the actual albums that they were ripped from. Then I would like MB metadata to improve fields like Artist, TrackName, AlbumArtist, TrackArtist, Genre etc to make it easy to find songs on an iPod/Sonos etc. But there would be no harm (and possible benefit to some types of users) if you could add both AlbumArtist and TrackArtist as an option in the specification of a new "Group" task (as described above).

I would think that users with small music collections don't need Jaikoz as much as those with large ones. In the large music collection case I would say that not using subDirectory in the case where a user does have reasonable organization is a missed opportunity for increasing accuracy and reducing the percentage of manual corrections that are needed (perhaps by as much as 50%). I wouldn't sacrifice this potential improved accuracy just for some potential fringe cases that can probably be accommodated via preferences like "Prefer Original Release"

paultaylor wrote:

I assume it's because you are trying to deal with the case of artist being set to 'Various' for some tracks.

This is certainly one example of where a Grouping based on AlbumArtist or even TrackArtist would break down.

paultaylor wrote:

Trouble is people aren't always burning from a CD, so all the tracks might not have the same album set even if the user intends this.

Realistically in any large collection there are going to be some cases that require some manual intervention. Hopefully these can be minimized. But if AlbumName really cannot be trusted for some subset of directories in a collection, then it might make sense to disable "AlbumName" from the Grouping task preferences as described above.

paultaylor wrote:

Another case is users trying to amass a complete original album but because of the rarity of the album having to make up some tracks from compilations, so we have the case where tracks are not all from one album but they would like them to be.

If they are indeed trying to amass a specific original album from tracks found from various sources, then this sounds like a case where only the user himself knows which tracks he wants to belong to which album. There's no way for Jaikoz to know for sure. You already have the Manual Correct functionality that can handle this case. In addition, it doesn't seem unreasonable to expect the user to do some manual edits to the AlbumName and Subdirectory himself before kicking off ManualCorrect/AutoCorrect to encourage Jaikoz to match tracks to the albums that the user wants.

This case could also be handled by disabling "AlbumName" in the Grouping task preferences and either
a) Manually fixing meta data and moving all the tracks into a common subDirectory and enabling "SubDirectory" in the Grouping task prefs, and/or
b) Enabling AlbumArtist and/or TrackArtist in the Grouping task prefs

paultaylor wrote:

Conversely we have the case where the user does want the tracks to be maintained on the compilation album.

If the "Prefer original release over compilation" option is set then modify the sorting algorithm used for sorting the best release match so that original releases come first even if they have a lower score than compilation albums. If the option is not set, then simply numerically sort by release-score.
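
As a sketch only (the dict keys `is_compilation` and `score` are my own placeholder names), the sort could be as simple as this in Python:

```python
def sort_releases(candidates, prefer_original=False):
    """Order candidate releases, best first.

    With prefer_original set, every original release sorts ahead of every
    compilation regardless of score; within each class, higher scores come first.
    """
    if prefer_original:
        return sorted(candidates, key=lambda c: (c["is_compilation"], -c["score"]))
    return sorted(candidates, key=lambda c: -c["score"])
```

The only difference between the two modes is the sort key, so no extra if statements are needed in the scoring itself.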

paultaylor wrote:

I think section 3.2 would be more a case of just matching tracks to mb release by comparing name,length, track nums and I already do this.

That sounds reasonable. The weightings can always be experimented with and optimized later. My sense is that, at present, Length and TrackNo are usually more reliable indicators than TrackName where slight differences can cause a disproportionate number of points to be deducted.

paultaylor wrote:

It doesn't really need to consider all combinations because normally most tracks will clearly match one location on the release. I've been reluctant to do this in the past because users may not elect to match all tracks

I suppose it might become tricky when there are cases like:
a) User doesn't have all the tracks in the release in his music collection
b) A similar TrackName occurs twice on the same release (e.g. one of them is an instrumental/extended/remix version of another song)
c) TrackNo is missing or unreliable
One thing that is useful in Picard is that it colour codes each track to show how good the match was (green, amber, red) and it also has a place to put tracks (in the same Group/Cluster) that didn't match properly. I think that you could achieve similar if not better results by displaying your track and release level matching scores in your table (perhaps also with colour coding to quickly spot problems).

paultaylor wrote:

and the list of songs in the Edit Pane may not be displayed in release order.

It would definitely be helpful to allow users to sort by multiple columns simultaneously. What I have been doing is using the Filter to select 1 album at a time, then sorting by TrackNo to get the songs into TrackNo order. Then I can more easily compare to a release in the MB website.

paultaylor wrote:

I think this step can just be applied as part of 3.1, so that 3.4 returns the best score.

It could be combined (i.e. after scores have been calculated for all tracks in a potential matching release, but before sorting based on final release score). It might be more flexible though to keep them as separate methods (at least from a code perspective) so that in the future it might be easier to make these scoring algorithms separate plugable tasks with preferences that you can sequence together via your Manipulator concept. That way you could expose all the track-level scoring parameter weightings as one set of "Track-scoring task preferences" and all the release-level scoring parameter weightings as another "Release-scoring task preferences", then just run them in sequence one after the other. In the future you might even define alternative plugable scoring algorithms that suit other usecases that come up.

paultaylor wrote:

mjw wrote:

At this point there could be some local tracks that have no assigned MBTrack. The user could try re-running to see if these remaining tracks could also be matched to separate Releases, but most likely it's an indication that the precise Release has not yet been submitted to MB for that AlbumName. E.g. A common example would be a release with an extra bonus track as the last track.

If there is a reasonable match for the track during the initial steps it should be assigned to the release even if it's a different release to the other tracks.

Makes sense. I guess there would probably need to be some kind of minimum threshold to define what constitutes a "reasonable match" though. If a track falls below that threshold (e.g. the calculated fingerprint doesn't even slightly match with any songs on the release), then perhaps there needs to be some way in the UI to indicate that it was not matched to an MBRelease, in a similar way to how Picard does. Perhaps you could put some colour coding on your "Status" column to allow users to easily spot where the problems lie.
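
For example, something along these lines in Python. The two cut-off values are numbers I have made up for illustration; the real thresholds would need to be found by experiment.

```python
def match_status(score, max_score, reasonable=0.5, good=0.8):
    """Map a track score to a colour for the Status column.

    `reasonable` and `good` are hypothetical cut-offs expressed as a
    fraction of the maximum score; "red" means not matched to a release.
    """
    ratio = score / max_score if max_score else 0.0
    if ratio >= good:
        return "green"
    if ratio >= reasonable:
        return "amber"
    return "red"
```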

paultaylor wrote:

mjw wrote:

Note 4 This approach also solves the problem you raised about 'Prefer original release even if better meta Match to later compilation release'. It will always pick the release with the highest final score so it doesn't matter if there are no original releases in MB.

This is the one problem, it doesn't solve this issue because your algorithm would always favour the compilation. A user may have 20 tracks in a compilation but they actually want them tagged to their original release, even though this might mean they end up with 20 different releases. If both the original releases and compilations are in the database I want the tracks to match the original releases when 'Prefer original release even if better meta Match to later compilation release' is selected and the compilation release when it isn't.

It's further complicated by the fact that Musicbrainz would (correctly) call a release a compilation if it was a Greatest Hits album by one artist, but iTunes expects it to have multiple artists on the release.

So need to finesse the algorithm a little perhaps as follows.

IF 'Prefer original release even if better meta Match to later compilation release' selected
IF the tracks appear to be from a compilation because either
1. the IsCompilation field is checked
2. The artist or album artist is Various.
3. The release has different artists on each track.
4. The release name is 'Greatest Hits'

Use simplified original algorithm to find the best release for track without considering other tracks.
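
The four compilation tests quoted above could be sketched in Python roughly as follows. The dict keys are placeholder names of mine, and reading test 3 as "every track has a distinct artist" is my interpretation of the quoted wording.

```python
def looks_like_compilation(tracks, release_name=""):
    """Return True if any of the four quoted compilation tests applies."""
    artists = {t.get("artist") for t in tracks}
    # 1. the IsCompilation field is checked on any track
    flagged = any(t.get("is_compilation") for t in tracks)
    # 2. the artist or album artist is Various
    various = any(
        (t.get("artist") or "").lower() == "various"
        or (t.get("album_artist") or "").lower() == "various"
        for t in tracks
    )
    # 3. different artists on each track (needs at least two tracks to mean anything)
    all_different = len(tracks) > 1 and len(artists) == len(tracks)
    # 4. the release name is 'Greatest Hits'
    greatest_hits = release_name.strip().lower() == "greatest hits"
    return flagged or various or all_different or greatest_hits
```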

Perhaps both the original algorithm and the new algorithm can be made into a set of Manipulator task steps so that users could use whatever approach suits their collection. Perhaps call the old one "Track orientated match with MusicBrainz" and the new one "Release orientated match with MusicBrainz". Prior to "Release orientated match with MusicBrainz" you would need to have the task "Group (before release-orientated match)". After "Track orientated match with MusicBrainz" you would need the current "Cluster" task, but perhaps to reduce confusion "Cluster" could be renamed "Group (after track-orientated match)" so that users don't try to incorrectly mix release and track orientated tasks.

In the case of the new release-scoring algorithm I'm not sure it would be necessary to even know a-priori whether the local tracks are part of a compilation album. If the "Prefer original release over compilation" option is set then modify the sorting algorithm (step 5) so that original releases come first, even if they have a lower score than compilation albums. Then the algorithm will pick the original release with the highest release-level score. This might have the consequence that there are some tracks that do not match that release. The user will need to manually decide what to do about that,
a) they might want to move the orphan songs to another directory
b) delete them
c) try to match them up with other albums as a second pass of the algorithm

Thanks for this thorough analysis. It has really helped me to understand better how things work.

It seems like there are a few quick fixes that would help here:
1) Make the Title match agnostic of the type of bracket that is used
2) Remove ReleaseType from the scoring and add the 3 points to the TrackNumber max score instead.
3) Making isCompilation a column in the Manual Correct popup
4) Consider displaying each component of the total score at the beginning of each cell in the Manual Correct popup. So for example, instead of 'Hunting High and Low [Remix]' it would instead appear as '{17/28} Hunting High and Low [Remix]'. I guess you could even make this part of the string a different colour to make it stand out. By seeing each component of the score like this it would be very easy to work out why particular local tracks were matching with particular remote MB Tracks
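
To make items 1 and 4 concrete, here is a small sketch. The function names are made up, and lower-casing the title is my own assumption about what the comparison should ignore, not something Jaikoz is known to do.

```python
import re

# Map square and curly brackets onto round ones so all three compare equal.
_BRACKETS = str.maketrans("[]{}", "()()")

def normalise_title(title):
    """Unify bracket types, collapse whitespace and lower-case before comparing."""
    return re.sub(r"\s+", " ", title.translate(_BRACKETS)).strip().lower()

def with_score(value, score, max_score):
    """Prefix a cell value with its component score, e.g. '{17/28} ...'."""
    return f"{{{score}/{max_score}}} {value}"
```

So `normalise_title("Hunting High and Low [Remix]")` and `normalise_title("Hunting High and Low (remix)")` would compare equal, and `with_score("Hunting High and Low [Remix]", 17, 28)` gives the cell text suggested in item 4.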

If you were to implement the release-level matching algorithm that I describe below, then item 4 would need to be revisited since there would now be a release-level score as well as track-level scores so the UI might need to look different.

paultaylor wrote:

if you were to disable 'Prefer original release even if better meta Match to later compilation release' you would get

Thanks for explaining this. I will try experimenting with this setting.

paultaylor wrote:

But the scoring is additive, not subtractive; you are saying give credit for the artist field matching even when it doesn't.

Sorry, yes I was suggesting that you give credit for the artist field matching only in the case where it contains the string "Various". But let me think on this one a bit more. Now that I understand the principles a bit better I've got the beginnings of a few ideas that might be able to greatly increase the accuracy of the matching without having to use this approach.

paultaylor wrote:

Your algorithm is a series of big if statements.

You've given me a few ideas and after a lot of thought I think there's a way of doing this with very few if statements. At a high level it could work like this:
1. Get a group of local tracks that are in the same subDirectory and have the same AlbumName
2. Use your existing "ManualCorrect" algorithm to find all the candidate MB Releases for all of those tracks and add them to a Set to eliminate duplicate releases
3. Calculate the maximum total track-level score for each of the candidate releases as follows
...3.1 Get the next candidate release from the Set returned by step 2
...3.2 Assign all the local tracks from step 1 to the optimal track number in the MB Release that will cause the SUM of the individual track-level scores to be the highest possible number. This number is the initial release-level score. Clearly if you have 20 tracks you don't want to try 20 factorial combinations so we need some smarter (parallel?) algorithm to reduce the number of combinations to test.
...3.3 Goto 3.1 and process the next candidate release
...3.4 Return a List of ScoreResults containing the release-level score of each candidate release (from step 2) paired with the optimal MBRelease=>LocalTrack association (step 3.2) that achieved that score
4. Apply any release-level scoring adjustments that might be required using the release attributes AlbumName, Release is Original Album, Release is an Official Album, isCompilation, ReleaseCountry. TotalTracksOnReleaseGreaterOrEqualToNumberOfLocalTracks might not be needed since the algorithm will naturally bump up the score for a release where more tracks match.
5. Sort the candidate MB Releases by the final total release-level score
6. Map all the local tracks to the MBTracks in the MBRelease with the highest final score. At this point there could be some local tracks that have no assigned MBTrack. The user could try re-running to see if these remaining tracks could also be matched to separate Releases, but most likely it's an indication that the precise Release has not yet been submitted to MB for that AlbumName. A common example would be a release with an extra bonus track as the last track.

Note 1 - The algorithm for calculating individual track match scores would be almost the same as your current algorithm, except it would exclude any attribute that belongs to the Release rather than the Track. So it would include only Artist, Title, TrackNo, TrackDuration. "Release used by another Song" would no longer be needed since the algorithm now only checks songs that are together in the same release.
Note 2 - The score for a release containing 14 tracks is going to be in the range of 0-1400. This works nicely because it means that the more tracks in the local grouping that match with a MB Release, the higher the score will be. It will also guarantee that only one MB Release is chosen for all the tracks. If a set of tracks matches somewhat with one MBRelease containing 9 tracks with good track match scores and another containing 14 tracks with good track match scores, it will tend to pick the MBRelease containing 14 tracks.
Note 3 - If you provide both the Release-Level score (same for each track in the same release) and the Track-Level score as columns in the table, then it would be very easy for users to sort and review the fixes made by the AutoCorrector so that you can spot any major mistakes in assigning tracks to the wrong release. It would be handy though if you could sort by more than one column simultaneously, like the way that it works in Excel for example. If you have this functionality then you might not need to even have minimum score thresholds any more since you can just show all the best matches and let the user review them.
Note 4 - This approach also solves the problem you raised about 'Prefer original release even if better meta Match to later compilation release'. It will always pick the release with the highest final score so it doesn't matter if there are no original releases in MB.
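
On step 3.2: finding the track-to-track pairing with the highest total score is the classic assignment problem, which can be solved exactly in polynomial time (for example with the Hungarian algorithm), so there is no need to try 20 factorial combinations. Below is a minimal sketch of steps 3 to 5. It assumes the track-level scores have already been computed into a matrix; `best_assignment` and `pick_release` are hypothetical names, and the matrix is padded with zero scores so that local tracks with no counterpart simply stay unassigned.

```python
def best_assignment(scores):
    """scores[i][j] = track-level score of local track i against release track j.

    Returns (total, assignment); assignment[i] is the release track index chosen
    for local track i, or None if that local track is left unmatched.
    """
    n = len(scores)
    m = len(scores[0]) if n else 0
    size = max(n, m)
    # Pad to a square matrix with zero scores and negate, because the
    # Hungarian algorithm below minimises total cost.
    cost = [[-scores[i][j] if i < n and j < m else 0 for j in range(size)]
            for i in range(size)]
    INF = float("inf")
    u = [0] * (size + 1)      # row potentials
    v = [0] * (size + 1)      # column potentials
    p = [0] * (size + 1)      # p[j] = row matched to column j (1-based, 0 = none)
    way = [0] * (size + 1)    # previous column on the augmenting path
    for i in range(1, size + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (size + 1)
        used = [False] * (size + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, size + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(size + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:             # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [None] * n
    for j in range(1, size + 1):
        row = p[j] - 1
        if row < n and j - 1 < m:   # ignore padded rows and columns
            assignment[row] = j - 1
    total = sum(scores[i][j] for i, j in enumerate(assignment) if j is not None)
    return total, assignment

def pick_release(candidates):
    """candidates maps a release name to its score matrix; return the best name."""
    return max(candidates, key=lambda name: best_assignment(candidates[name])[0])
```

The release-level adjustments from step 4 would be added to `total` before the final comparison. If SciPy is available, `scipy.optimize.linear_sum_assignment` solves the same problem and could replace the hand-written loop.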

paultaylor wrote:

Yes, used to do this but looked very clunky 

Perhaps just removing the radio button from the first row would do the trick.
Alfg was indeed correct. Now I just have 4 tasks remaining in my AutoCorrector Tasks:
1. Correct Tags from MB
2. Update Tags from existing Discogs Ids
3. Correct Lyrics
4. Cluster Albums

This is getting closer to what I want since it allows me to enrich my meta data using MB and Discogs without changing any of my existing meta data (which might be correct in some cases).

Now if Paul can implement the F12 functionality, I'll be able to re-enable the settings to allow MB and Discogs to change existing fields and I'll be able to review those changes easily before Saving. This will allow me to safely improve the meta data for a good portion of my collection.

paultaylor wrote:

The problem is not that the case of Remix is different; it is that your song uses square brackets and Musicbrainz uses curly brackets, and this is affecting the score. This is probably a bug; it should treat the brackets the same.

Is this a bug in MB or Jaikoz? If it is the former, would you be able to submit the bug?

paultaylor wrote:

So two workarounds are to lower the threshold score, or run Autocorrect twice

Thanks. I'll try both.

paultaylor wrote:

1) Track length doesn't contribute 16 pts, so there must be other items that don't match. I don't have the full scoring algorithm handy at the moment but I'll publish it later.

That would be useful. My suggestion would be to ensure that all of the fields that are used in the algorithm are made visible in the Manual Match popup. At the moment, all I can see is that we've somehow lost 16 pts even though the only difference in the popup is Track Length. And I can't verify this because there is no way I can edit Track Length in Jaikoz.

paultaylor wrote:

2) This doesn't make much sense in the general case because your artist name is wrong, so it shouldn't get as good a score as when you had the artist field matching. It also makes it difficult to get a good match, i.e. we could find the same song covered by two different artists, and without the artist name to match on it is not clear which is the correct one

I'm not sure that I follow. Let's take the Trainspotting album example. If you take this CD and rip it with Windows Media Player then each track will have Artist="Various". This is not wrong per se, it's just ambiguous. Ripped compilation albums often have Artist="Various" from what I have seen. What "Various" means in 99% of the cases is - "Don't know, but each track in the album might be a different artist". If you look at MB, there are also many cases there where the Artist="Various Artists". All I am suggesting is that you don't deduct any matching points for a mismatch where either the local or remote data contains the string "Various". I'm not sure how you would implement this, but perhaps it is as simple as replacing "*Various*" with "".
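
One possible way to "not deduct points" in an additive scheme is to treat a "Various" artist as unknown, leave that field out of the total, and rescale the remaining fields. The sketch below is only an illustration of that idea: the function names, the 20-point weight and the use of `difflib` for string similarity are all my own assumptions, not how Jaikoz actually scores.

```python
from difflib import SequenceMatcher

def artist_score(local, remote, max_points=20):
    """Score the artist field; a 'Various' value counts as unknown, not wrong."""
    local, remote = local.strip().lower(), remote.strip().lower()
    if "various" in local or "various" in remote:
        return None   # unknown: the caller leaves this field out of the total
    return round(max_points * SequenceMatcher(None, local, remote).ratio())

def track_score(parts):
    """parts is a list of (points, max_points); skipped fields have points=None.

    The result is rescaled to 0-100 over the fields that were actually scored,
    so a skipped field neither earns nor loses credit.
    """
    kept = [(p, m) for p, m in parts if p is not None]
    if not kept:
        return 0
    return round(100 * sum(p for p, _ in kept) / sum(m for _, m in kept))
```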

Regarding your example "same song covered by two different artists without the artist name to match on not clear which is the correct one". I can see that this might be tricky, but couldn't you use the following other meta data if available to choose the right one:
- Fingerprint
- Consider the other tracks in the user's same Album/Directory. If MB has 2 tracks to choose from with identical Title but different Artist, then pick the track that is from the same Album as the other track(s) that the user has in that Album/Directory
- Track Length
- Release Date
- etc

If at least some of this is available, the chances are that you should be able to pick the correct track in preference to the wrong tracks even while ignoring the Artist field.

paultaylor wrote:

3) This is what 'Cluster Albums' does, isn't it?

I'm not sure, could you perhaps outline the algorithm that you use here? I guess I might just be confused because "Cluster" does something slightly different in Picard where it is used as a precursor to a manual match whereas in Jaikoz cluster seems to be used as a post-processing step.

paultaylor wrote:

4) Don't see why this would be redundant if I did (3); as you describe it, wouldn't it be just the same as running Tag from Musicbrainz but with a low threshold score? I did consider changing Manual Tag from Musicbrainz so it defaulted to selecting the best score rather than selecting no score; would that be useful?

In the popup window, right? I think it would be useful. It would also be clearer if the first row were moved completely out of the table to just above the title row of the table so that it doesn't have a radio button beside it. I didn't realise for a long time that this first row was local data whereas the other rows are remote data because initially the first row looked the same as the other rows to me.

paultaylor wrote:
3a. The F12 approach is a good idea that I might add (whilst keeping the View Pane)

That would be tremendous. Thanks.

paultaylor wrote:

3b. Ah, this is because there are also separate Preferences/Remote Correct/Discogs settings. They are kept separate because by default changes from Musicbrainz overwrite fields, whereas changes from Discogs just fill in empty fields. (Updates from Discogs can occur when the Url Discogs Release field has a value)

I looked at my Discogs preferences and they also are set to only fill when empty, so I still don't understand why Jaikoz is changing those fields (or at least marking them purple until I save)?

paultaylor wrote:

You can revert individual cells, select the cells, right click and select Edit/Undo

Thanks for pointing that out. "Undo" indeed does 50% of what I would like the F12 key to do. The other 50% would be to perform a "Redo" on the selected cell(s). That way "Toggle" (F12) would just quickly flip the selected cells back and forth so you could see the changes. While you're at it, key mappings for undo (Ctrl-Z) and redo (Ctrl-Y) would be useful too.

paultaylor wrote:

4. Not in the short term.

OK, I will try to use Windows. Perhaps you could update the issues list (http://www.jthink.net/jaikoz/jsp/issues/start.jsp) to note this problem on Linux?

paultaylor wrote:

6. If you have a musicbrainz id then it should be able to get the artwork UNLESS the amazon artwork is stored in a non-standard url. I'll check this one out (the MusicIP id is irrelevant for getting artwork)


paultaylor wrote:

9. But playing time is not a field that Jaikoz can change, in fact it can't change any of the fields that are displayed in the View Audio tab.

OK, so if I'm understanding you correctly - Say for example I have a track in my collection with an incorrect PlayingTime of 2:05, and Jaikoz matches it to a track in MB with a PlayingTime of 4:40. Even if my local data is wrong, Jaikoz has no way to make and save an update to that field?

Also, regarding setting of preferences to Only Update Empty Fields, I am still confused and need something clarified. If I have set Jaikoz to Only Update Empty Fields for Title and Album and Jaikoz matches a track to a MB track with a different Title and Album field I notice that the GUI DOES change the colour of the cells to Purple (which indicates change) regardless of my preference settings. Are you saying that this highlighting is misleading and that in actual fact the changes are never written to disk when I save?

My main concern is to have complete transparency around what Jaikoz might be changing/adding especially for tags that I cannot see via the Jaikoz UI or where the UI might be misleading.

paultaylor wrote:

13. I can see your point but I'm not that keen on doing this, will ponder.

I guess my need for this really depends on your answers to the previous question. If indeed Jaikoz can never change these fields, then I guess they could be excluded from the main table after all. When doing a Manual Lookup I have since realised that the first row in the popup window shows the current tag values in my collection so at least the info is displayed there. It just wasn't clear to me initially that the first row was local data and all the other rows were remote data. I'm not sure if this is doable, but it would have been clearer to me that the first row is local data if the first line in the popup did not have a radio button next to it or if perhaps the first row was placed separately just above the title row in the table.
Thanks for the detailed answers.

paultaylor wrote:

AMplified Music Services are currently updating their sw so these restrictions should disappear.

Do you have any guess as to how long it would take to get this into Jaikoz? I'm wondering whether to put the tagging of my WMA files on hold for a month, or alternatively try tagging that part of the collection under WinXP within VMWare (although I would only have 1 thread and 1GB RAM instead of 8 threads and 6GB RAM).

paultaylor wrote:

2. Looking up songs from Musicbrainz works either by acoustic ids or by matching metadata. Although your metadata is quite good, the artist is completely incorrect in the examples, and that's having a big impact on the score for potential matches. From your screenshot of Manual Matches you can see that the scores are about 60 but the default required to get an automatch is 70; you can reduce this value in Preferences/Musicbrainz/Automatch/Minimum Rating required if Meta Match Only.

It looks like the score is always dropping below 70 when the artist is "Various". Reducing the threshold to 55 seems to be the level that is required to get good matching but I worry that may also be a little too low and start matching things that it shouldn't.

paultaylor wrote:

I will be adding full Discogs support.

Any thoughts on a timeframe? I could delay this cleanup for a month if it's coming soonish.

paultaylor wrote:

Jaikoz is unable to find album art for track 8 "Hunting High and Low (remix)" and all other tracks containing the string "remix". Is this because the 'r' of 'remix' is not a capital 'R' in my existing meta data?

I would suggest that is the case, but best to experiment by running Manual tag from Musicbrainz to see what results come back.

I did some experimentation and the results are interesting. Capitalization had no effect on score, but the type of bracket I use matters. Changing from [remix] to (remix) increases the score from 73 to 84. Even when I change every single other editable field on the popup manual match window so that it matches exactly, I still cannot get the score above 84. The only other field that I can see on the page but am unable to edit is the Track Length which is off by 1 second. So I'm assuming that we must be having 16 points deducted even though a 1 second difference is a pretty negligible difference IMO.

I can think of 4 improvements that would largely eliminate the AutoCorrect problems:
1) Can you perhaps adjust the scoring algorithm so that the type of brackets that are used does not have any impact on scoring? Also make it so that a track length that is off by 1 or 2 seconds only causes one or two points to be deducted rather than 16 points.
2) Do you think you could detect the string "Various" and treat it like a wildcard, as if the Artist field was a 100% match no matter what? Perhaps you could even ideally find matches regardless of whether the "Various" is in the local data or the remote data or both.
3) Add a Manipulator that goes through all the clusters (leaf directories?) at the last step and attempts to convert all the tracks in the same cluster so that they use the same Release Id. If different tracks are mapped to different Release Id's it should pick the Release Id to use for each track in the Cluster according to the following criteria:
a) Number of tracks in Release >= Number of tracks in Cluster
b) Track Number is 100% match
c) Track Length is 95% match
While ensuring that the above criteria are satisfied in all cases it should try to minimize the number of separate releases.
4) [This suggestion is probably redundant if you develop 3 above] Provide an "Apply Top Manual Match from MusicBrainz" option which just picks the best match from MB without bothering with the dialog box.
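
For the track length part of suggestion 1, a graded penalty could be as simple as the sketch below. The 10-point weight is a placeholder because I don't know what the field is really worth in the Jaikoz score, and `free_secs` is a hypothetical knob for ignoring very small differences.

```python
def duration_score(local_secs, remote_secs, max_points=10, free_secs=0, per_sec=1):
    """Deduct per_sec points for each second of difference beyond free_secs."""
    diff = abs(local_secs - remote_secs)
    penalty = max(0, diff - free_secs) * per_sec
    # Never go below zero, however far apart the two lengths are.
    return max(0, max_points - penalty)
```

With these defaults a 1 second difference costs 1 point and a 2 second difference costs 2 points, while a track that is minutes out scores zero for this field.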

paultaylor wrote:
Actually I think I'm getting confused, Linux doesn't do M4a.

Look at this post http://www.jthink.net/jaikozforum/posts/list/926.page, you might need to do:

sudo aptitude install libstdc++5

This does not seem to be the source of my problem with fingerprinting. On my linux box I already have this library installed and genpuid works fine. I've actually now found that this problem is not limited to WMA, it also affects M4A. I can't seem to check if it also is a problem with MP3 or not since I get the console message "Ignored 11 Songs because they already have an Acoustic Id" and can't find a way to remove the Acoustic Ids or force them to be recalculated.

Is there any way you can load up my files in an Ubuntu virtual machine and try this out at your end?
1. True. It's not a major point, but a low priority improvement.
3a. I personally find that the View Pane is not a very effective way of quickly reviewing changes and would prefer an in-place F12 approach.

paultaylor wrote:

3b. If you don't want values to be overwritten you can set this on a column by column basis in Preferences/Musicbrainz/Format and Format2 when updating from Musicbrainz and Preferences/Remote Correct/Discogs

I do have the Preferences/Musicbrainz/Format options set to "Fill this field only if it is empty" for all fields but still after running AutoCorrect, a number of fields are still changing content and colour to purple. Fields that change include Album, TrackNo, Genre and AlbumArtist and possibly others that I haven't been able to detect yet. Can this be fixed?

I guess that what I would also really like is the ability to Revert individually selected cells without having to revert every cell on the row.
4. Do you think you can fix this for Linux? I have found out today that this affects WMA and M4A tracks which is most of my collection.
6. When I tried this a couple of days ago it matched the MB release but could not get the associated artwork. It seems not to have a MusicIP Id. I think this may be caused by the fingerprinting not working on Ubuntu 8.10 64-bit.
8. See 3a.
9. Jaikoz is certainly doing much better than most taggers when it comes to reviewing changes before saving so credit where credit is due! But even if I display all available columns, I still won't see PlayingTime and some of the other fields in the "View Audio" tab in the View Pane. Also, if I add all fields then they will not fit on the width of 1 screen and a scrollbar will become necessary. This would really impact the efficiency of the reviewing of changes. What I am looking for is some means of being able to quickly identify the complete list of meta data field names that Jaikoz will change if I elect to save. That way I can get some comfort that no good data is being trashed behind the scenes. This happened to me when I tried Picard - it silently trashed all my embedded album art within my WMA files and I didn't find out until my Sonos player didn't display any art any more.
13. As mentioned in 9, this is an essential field to look at to detect the most common cases where the wrong MB Id has been assigned. So if you can add it read-only then that would be very much appreciated.
16. I think the concern of the MB people is that users will submit poor quality data and it will overwhelm their review process. Perhaps these concerns could be addressed by also adding a second API which allows Taggers to tell MusicBrainz each time a user overwrites their local meta data using the remote meta data from a particular MB Id. This could be considered as a "vote" by the user that the quality of the meta data is good. This would then provide a feedback loop that would give the MB guys some confidence in meta data that has been reviewed by many people. It also wouldn't increase the burden on the current review process. At some point you could even include the ability for users to modify/enhance existing remote meta data in MB.

Here's a slightly different example. Jaikoz fails to assign any songs from the full Trainspotting album to any release.

When you try Manual Correct, it seems that Jaikoz thinks the potential releases all have only 10 tracks (and perhaps for this reason fails to find a match)

In fact the MusicBrainz database shows that this album actually has 14 tracks contrary to what Jaikoz shows. And all 14 tracks have meta data that match well with the existing meta data on the tracks in my directory:
Here's another example for "Headlines and Deadlines: the Hits of a-Ha"

Jaikoz is unable to find album art for track 8 "Hunting High and Low (remix)" and all other tracks containing the string "remix". Is this because the 'r' of 'remix' is not a capital 'R' in my existing meta data?
Hi Paul,

Thanks in advance for your help.

I kicked off Jaikoz Auto Correct on a larger collection of around 8000 tracks last night to test Jaikoz on a more realistic subset of my collection before I commit to making the license purchase. I should mention that I am running on a 4 core (8 threads) 64 bit Linux machine.

The next morning I could see that Album Art exists for under 15% of the tracks. I believe that many of those tracks already had Album Art, so actually the percentage successfully added by Jaikoz is probably much lower than that. Assuming that some of the automatic changes were probably to the wrong album and will probably need manual adjustment I'm thinking that the success rate will be even lower still. It would be useful if Jaikoz could output some statistics after each operation regarding how many tags of which type were changed, added, or deleted.
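
The kind of per-operation statistics I have in mind could be produced by comparing the tag values before and after a task. A minimal sketch, assuming each track's tags are available as a simple field-to-value mapping (the `tag_change_stats` name and the data shape are hypothetical):

```python
from collections import Counter

def tag_change_stats(before, after):
    """Count added, changed and deleted tags per field across a batch of tracks.

    before and after are parallel lists of dicts mapping field name to value.
    """
    stats = Counter()
    for old, new in zip(before, after):
        for field in old.keys() | new.keys():
            # Treat a missing key, None and an empty string as "no value".
            o = old.get(field) or None
            n = new.get(field) or None
            if o == n:
                continue
            kind = "added" if o is None else "deleted" if n is None else "changed"
            stats[(field, kind)] += 1
    return stats
```

The resulting counter could then be printed to the console at the end of each task, for example as "Artwork: 4 added, Title: 1 changed".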

Can anyone help? Unless there is a way of increasing the success rate I don't think Jaikoz is going to be a feasible way of adding the correct Album Art to my library.

All of the tracks have existing tag data for at least Title, Artist, Album and Track Number. In some cases they even have Year. The tracks have all been ripped from CDs using Windows Media Player and are all organized neatly in directories according to Artist/Album/Tracks.wma.

All I want to do for now is to add Album art where it is missing (and ensure that the tracks that live in the same directory get assigned the same correct album art). I don't care so much about MusicBrainz Id's at this point since I believe that my meta data is in reasonable shape for these directories and I don't want the MusicBrainz metadata to overwrite my metadata unless I'm absolutely sure that it's improving data quality.

In case it helps, here is a concrete example which illustrates some of the problems. I have taken one Album in a single directory called Eminem - 8 Mile. As you can see from the attached screenshot Jaikoz assigns artwork to only 4 out of 16 tracks but Lyrics to 8 out of 16. No Acoustic Id can be generated for any of the songs. I believe this is because Jaikoz fails to generate an Acoustic Id for all WMA files. Is this a bug? Jaikoz claims to support WMA:

Regardless, I would expect that Jaikoz should be able to find album art without MB Id's given a reasonable amount of meta data. [Aside - Logically I'm reasoning that if this were not the case then Jaikoz would be completely reliant on successful Acoustic Id generation to be able to function at all, and this would mean that Jaikoz would therefore be completely dependent on the quality and coverage of data in the MusicBrainz database rather than the meta data already in the libraries of users. This would also mean that Jaikoz would be unable to take reasonable meta-data that users already have in their ripped collections and contribute to MB by adding the albums into MB.]

The correct MB release page for this album is here:

As you can see from my screenshot Jaikoz has a wealth of information that it could potentially use for matching:
1) Title matches exactly for every track (even case matches)
2) Album matches exactly
3) Track number matches exactly
4) Playing time matches to within a second or two for every track
5) Year matches
6) Other tracks with the same Album name and in the same directory are able to be successfully matched to the same correct MB release
7) Total number of tracks in directory matches total number of tracks in release. All track numbers are unique within the directory and the highest track number matches with the number of tracks in the correct release

I realize that fuzzy matching algorithms can be tricky, but one would think that all of the above would be sufficient for Jaikoz to have a high degree of certainty regarding which release to choose, but in 12 out of 16 cases it seems unable to select a release automatically.

The only deficiency in the prior existing meta data in this example is that the Artist does not match and Album Artist is empty. In all cases, my tracks have the artist as "Eminem" whereas in MB the tracks are attributed to different artists. In MB Album Artist is "Various Artists".

Here's the console output:

Oct 26, 2009 4:24:21 PM: INFO: Started Autocorrecter on 16 Songs
Oct 26, 2009 4:24:21 PM: INFO: Task 1:Started Correct Artists on 16 Songs
Oct 26, 2009 4:24:21 PM: INFO: Task 1:Completed Correct Artists on 16 Songs
Oct 26, 2009 4:24:21 PM: INFO: Task 2:Started Correct Albums on 16 Songs
Oct 26, 2009 4:24:21 PM: INFO: Task 2:Completed Correct Albums on 16 Songs
Oct 26, 2009 4:24:21 PM: INFO: Task 3:Started Correct Titles on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 3:Completed Correct Titles on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 4:Started Correct Genres on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 4:Completed Correct Genres on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 5:Started Correct Comments on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 5:Completed Correct Comments on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 6:Started Correct Track Numbers on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 6:Completed Correct Track Numbers on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 7:Started Correct Recording Times on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 7:Completed Correct Recording Times on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 8:Started Correct Tags from Filename on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 8:Completed Correct Tags from Filename on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 9:Started Correct Artists on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 9:Completed Correct Artists on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 10:Retrieving Acoustic Ids for 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 11:Started Correct Tags from MusicBrainz on 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 12:Started Updating tag data for 16 Songs
Oct 26, 2009 4:24:22 PM: INFO: Task 13:Started Correct Lyrics on 16 Songs
Oct 26, 2009 4:24:23 PM: WARNING: Unable to retrieve an acoustic id for song 6
Oct 26, 2009 4:24:23 PM: WARNING: Unable to retrieve an acoustic id for song 0
Oct 26, 2009 4:24:23 PM: WARNING: Unable to retrieve an acoustic id for song 4
Oct 26, 2009 4:24:23 PM: WARNING: Unable to retrieve an acoustic id for song 5
Oct 26, 2009 4:24:23 PM: WARNING: Unable to retrieve an acoustic id for song 13
Oct 26, 2009 4:24:23 PM: WARNING: Unable to retrieve an acoustic id for song 1
Oct 26, 2009 4:24:23 PM: WARNING: Unable to retrieve an acoustic id for song 8
Oct 26, 2009 4:24:23 PM: WARNING: Unable to retrieve an acoustic id for song 14
Oct 26, 2009 4:24:24 PM: WARNING: Unable to retrieve an acoustic id for song 10
Oct 26, 2009 4:24:25 PM: WARNING: Unable to retrieve an acoustic id for song 11
Oct 26, 2009 4:24:26 PM: WARNING: Unable to retrieve an acoustic id for song 2
Oct 26, 2009 4:24:27 PM: WARNING: Unable to retrieve an acoustic id for song 3
Oct 26, 2009 4:24:28 PM: WARNING: Unable to retrieve an acoustic id for song 12
Oct 26, 2009 4:24:29 PM: WARNING: Unable to retrieve an acoustic id for song 15
Oct 26, 2009 4:24:30 PM: WARNING: Unable to retrieve an acoustic id for song 9
Oct 26, 2009 4:24:31 PM: WARNING: Unable to retrieve an acoustic id for song 7
Oct 26, 2009 4:24:41 PM: INFO: Task 10:Retrieve Acoustic Ids was unable to find a match for 16 Songs
Oct 26, 2009 4:24:41 PM: INFO: Task 10:Completed retrieval of Acoustic Ids for 16 Songs
Oct 26, 2009 4:24:41 PM: INFO: Task 11:Correct Tags From MusicBrainz was unable to find a match for 12 Songs
Oct 26, 2009 4:24:41 PM: INFO: Task 11:Corrected 4 Songs from MusicBrainz successfully
Oct 26, 2009 4:24:41 PM: INFO: Task 11:Completed Correcting Tags from MusicBrainz on 16 Songs
Oct 26, 2009 4:24:41 PM: INFO: Task 12:Updated 4 tags from existing Discogs Id successfully
Oct 26, 2009 4:24:41 PM: INFO: Task 12:Update Tags from Existing Discogs Ids was unable to find a match for 12 Songs
Oct 26, 2009 4:24:41 PM: INFO: Task 12:Completed Updating Tags from Discogs for 16 files
Oct 26, 2009 4:24:41 PM: INFO: Task 13:Corrected 8 Lyrics
Oct 26, 2009 4:24:41 PM: INFO: Task 13:Unable to find a match for 8 Songs
Oct 26, 2009 4:24:41 PM: INFO: Task 13:Completed Correct Lyrics on 16 Songs
Oct 26, 2009 4:24:41 PM: INFO: Task 14:Started Clustering Albums for 16 Songs
Oct 26, 2009 4:24:41 PM: INFO: Task 14:Before clustering there were 1 albums spread over 1 MusicBrainz Release Ids
Oct 26, 2009 4:24:41 PM: INFO: 8 Songs were modified with the Autocorrector
Oct 26, 2009 4:24:41 PM: INFO: Completed Autocorrecter on 16 Songs

Here's the same Eminem 8Mile directory of files loaded into Picard:

On the plus side, at least Picard manages to assign 15 out of the 16 tracks to the correct album which beats Jaikoz's 4/16. On the minus side, Picard still leaves manual work to the user by mis-assigning track 2 "Love Me" to the wrong Album. It looks like Picard is able to generate fingerprints for all tracks though, unlike Jaikoz which fails to generate any fingerprints.

paultaylor wrote:
how would you like it to work ? 

Maybe you could add another colour code to the table to indicate the following situation:
- Field Colour when cell would have changed (but there was already a prior value in the cell and preferences were set to not overwrite non-empty cells)

Then allow the user to select any cells that they want to inspect more closely, then press F5 to toggle the selected cells back and forth between:
a) The original value in the cell
b) The alternative value that Jaikoz found through the Autocorrect process

Once the user is happy with the way the meta data looks for the selected cells, then they can Save. [As an aside - is there any way to get these improved meta data values back into the MB database so that other users can benefit from the cleaning that other users are manually doing like this? ]

This would be very convenient because users can then set up preferences to handle the most common case (we usually don't want our existing tags to be overwritten) but override this policy for a selected group of tracks as required.

It would also have the added benefit that we could then use the F5 key as a way of seeing what values Jaikoz is going to change in each selected cell if we Save.
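
To make the F5 idea concrete, here is a minimal Python sketch of a cell that remembers both values. The `Cell` class and its field names are invented for illustration and are not taken from Jaikoz; this is only one way the toggle could be modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """One tag field of one track (hypothetical model, not Jaikoz's own)."""
    original: str                    # value already stored in the file
    proposed: Optional[str] = None   # value found by Autocorrect, if any
    showing_proposed: bool = False   # which of the two is displayed right now

    @property
    def value(self) -> str:
        """The value currently shown, i.e. what Save would write."""
        if self.showing_proposed and self.proposed is not None:
            return self.proposed
        return self.original

    @property
    def has_alternative(self) -> bool:
        """True when Autocorrect found a different value (the extra colour code)."""
        return self.proposed is not None and self.proposed != self.original

    def toggle(self) -> None:
        """What F5 would do for one selected cell."""
        if self.has_alternative:
            self.showing_proposed = not self.showing_proposed


def toggle_selection(cells):
    """Apply the F5 toggle to every selected cell."""
    for cell in cells:
        cell.toggle()
```

With "do not overwrite non-empty cells" set, every cell would start with `showing_proposed=False`, so nothing changes on Save unless the user has toggled it.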

paultaylor wrote:
There is a reason it works this way:
Perhaps if I deal with the tracks as a group rather than individually as I do now I can relax this criteria in the future.

I was surprised to read that you don't handle tracks as a group when attempting to match. It seems like the obvious thing to do. If you don't, then you throw away a large amount of the matchable data. You will also end up splitting tracks that were previously together in the same Album into a multitude of separate Albums. After saving, this will then irrevocably (unless you have a backup) mess up all the tag data and album art.

There is a very common use case that is being ignored here, namely a user who has ripped a large collection of CDs to disk using a common program like Windows Media Player. The tracks will generally be in a uniform directory structure (Artist/Album/Track) with a reasonable set of tags already set. Sure, these tags might not match exactly with those in the MB database, but the data quality will already be quite high since the data came from the CD itself rather than MB users who typed in their best guesses. Album art might be missing, the file naming convention may be a bit off, and the track numbers will probably not be zero padded. At the moment Jaikoz (and Picard) sometimes seem to ignore some of this useful metadata when they could be making use of it.

Surely the algorithm should be something like this:
1. Get all the files in the current directory
2. Group them in a HashMap where the key is the Album name
3. For each group of tracks, attempt to find the best-match release in Music Brainz where additional points are added to the MB score if:
a) The number of tracks in MB is >= Number of tracks in this key of the HashMap (don't allow a match with a MB release where num tracks in MB < number of tracks in this group)
b) Track Numbers (if this tag is present) match exactly with those in MB
c) Duration (if this tag is present) is within 1%
d) Track name (if this tag is present) is a close match (after converting to lower case, converting '&' to 'and', etc., and removing whitespace and non-alphabetical characters)
4. Assign all the tracks in this group to the best-match release found in step 3.
5. Goto 3 and process next group of tracks

Can you see any issues with this approach?
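
A rough Python sketch of steps 2 to 4, to show what I mean. Everything here is an assumption for illustration: tracks and candidate releases are plain dicts, the MusicBrainz lookup itself is left out, each criterion is worth one point, and "close match" is simplified to an exact match after normalisation.

```python
import re
from collections import defaultdict


def normalise(title):
    """Step 3d: lower-case, '&' -> 'and', keep ASCII letters only."""
    title = title.lower().replace("&", "and")
    return re.sub(r"[^a-z]", "", title)


def group_by_album(tracks):
    """Step 2: map album name -> list of tracks (the 'HashMap')."""
    groups = defaultdict(list)
    for track in tracks:
        groups[track["album"]].append(track)
    return groups


def score_release(group, release):
    """Step 3: score one candidate release against one group of tracks.

    Returns None when the release has fewer tracks than the group (3a).
    The one-point-per-criterion weighting is an arbitrary assumption.
    """
    if len(release["tracks"]) < len(group):
        return None
    by_number = {t["number"]: t for t in release["tracks"]}
    score = 0
    for track in group:
        candidate = by_number.get(track.get("number"))
        if candidate is None:
            continue
        score += 1  # 3b: the track number exists in the release
        length = track.get("length")
        if length and abs(length - candidate["length"]) <= 0.01 * candidate["length"]:
            score += 1  # 3c: duration within 1%
        if track.get("title") and normalise(track["title"]) == normalise(candidate["title"]):
            score += 1  # 3d: titles agree after normalisation
    return score


def best_release(group, candidates):
    """Steps 3-4: choose the highest-scoring release for the whole group."""
    scored = [(score_release(group, r), r) for r in candidates]
    scored = [(s, r) for s, r in scored if s is not None]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

Because the whole group is assigned to the single release returned by `best_release`, tracks that started in one album cannot end up spread over several releases.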
First of all, congratulations on producing perhaps the only music library cleaning program that can almost handle >10,000 track music collections. Believe me, I have tried a lot of others and this has the most promise so far apart from the issues I've found below (which I'm hoping are either due to my lack of understanding or down to issues that you can fix).

1) In the Console output it is not clear which task is generating which output messages. Sure, the Task number is output, but it would be better if the name of the task were output as well. Rather than "Task 1:Started Correct Artists on 12 Songs" have "1) Correct Artists - Started on 12 songs" etc.
2) It would be better if the tasks could provide more detailed information (perhaps an option to increase the logging detail level) so that one can deduce exactly why Jaikoz was unable to match a set of tracks to an album (was the fingerprint below threshold? was the meta data below threshold? were there multiple possible matches? what can the user do to fix the problem?). Let us know if one of the configuration settings in the Automatch tab causes the match to fail. For example, a message like this might be useful - "Unable to retrieve acoustic ID for song 14. Did not attempt tag data matching because option 'Do not match if unable to find an Acoustic Id Match' was enabled (try disabling)".
3a) Highlighting changed fields in Blue is useful, but it is very difficult to decide whether the change Jaikoz will make is for the better or for the worse if we can't see the before and after values. I suggest some way of seeing the before/after values of all the fields in the table. Perhaps a convenient way would be to have a function key (e.g. F12) to toggle all the values back and forth quickly on the selected tracks so that you can see precisely what is going to be changed if you Save. This should also work for Album art and filenames. Alternatively you could have 2 rows for each track (one showing BEFORE and the other AFTER). In Picard you can at least see the before/after for Title, Artist, Album, Track, Length, Date (but not album art).
3b) Related to the prior issue, it is highly limiting to assume that the online databases such as MB contain 100% correct tag data fields. In collections that have been ripped from CDs using Windows Media Player, it is often the case that certain tag data is already very complete and accurate (e.g. Title, Album, Track Number and in some cases even Artist, Genre and Year) whereas some fields are missing (e.g. Artwork, Album Artist). In order to be able to fix >10,000 track collections you need to be able to trust that AutoCorrect will at least not adversely affect any existing tag data. At the moment there does not seem to be any way of ensuring this automatically, nor does there seem to even be a way to check manually for this (see 3a). For example, I can see that Jaikoz has changed some of my TrackNo fields. I have "Pad numbers with zero to aid sorting" enabled so perhaps these changes are purely cosmetic formatting, but perhaps the underlying values have changed - there's no efficient way to tell. I can see error messages in the console like "Album Come on Over (International Version) by Shania Twain is spread over 2 MusicBrainz Release Ids" and (bad) experience with Picard tells me that this usually means that some tracks have been assigned to the wrong Album (release) and therefore the tag data from MB is going to be wrong.
4) Get Acoustic ID working for WMA music files. At the moment it seems that Jaikoz fails to get the acoustic ID for all WMA files. If you can't handle WMA files, then perhaps it's worth adding a facility for converting them on-the-fly to MP3 (but not saving the converted file) to obtain the Acoustic Id. Currently this is manifested by thousands of errors in the console like "WARNING: Unable to retrieve an acoustic id for song 1,423" and "Retrieve Acoustic Ids was unable to find a match for 7,788 Songs". This limitation is a bit of a show-stopper at the moment for large WMA collections.
5) Create a key binding for quickly Viewing Artwork Fullsize for the selected track
6) Improve success rate for downloading album art. For example, Picard has no problem grabbing "Dave Matthews Band-Under the Table and Dreaming" (099148ab-9a81-4672-a1af-fa60261a7f15) but for some reason Jaikoz cannot even though the artwork is in the MB database.
7a) It seems that when "Acoustic Id Match must also have minimum meta rating" is true, then Jaikoz fails to find any Album art for songs without an Acoustic Id (thus all WMA files fail to find Album art).
7b) When I set this option to false then Jaikoz finds album art for "K.T. Tunstall-Eye to the Telescope-Black Horse and Cherry Tree" but fails to find album art for all the other tracks in the same album, even though I can see that Jaikoz knows that the MB Release ID (7c62fb9c-26ad-4c9c-b08b-8361a9a1e6c7) is identical for all of the tracks in the album and therefore they share the same album art. I know you can copy and paste the missing artwork onto the other tracks but this makes the whole fixing process much too manual for a 50,000 track collection like mine.
8) What is the purpose of having both the "View Pane" as well as the "Editing Pane"? It looks like the Editing pane is almost identical in content to the View Pane except it allows editing of fields. It would be more intuitive (and be more efficient in terms of screen real-estate) if both were combined into something like the current View Pane, but make every field editable.
9) It is currently not possible to know if Jaikoz has changed tag data in a column that is not currently displayed. The user thus has no way of reviewing these changes before Saving. Perhaps one way to handle this would be as follows - when the user selects track(s), change the highlighting colour of each of the Tabs at the top of the View Pane to indicate which Tabs contain which sorts of changes for the current selection
10) Most of my music already has reasonable tag data, it just needs to have some missing fields added. Is there any way to prevent Jaikoz from changing tags for a track and assigning a new MB Id if it is trying to split tracks with the same Album name across multiple MB Release Ids (e.g. "Album Come on Over (International Version) by Shania Twain is spread over 2 MusicBrainz Release Ids")? This is a sure sign that tags are going to get messed up, especially if all those tracks reside in the same directory together (i.e. it's even less likely that they should be from different album release ids).
11) Ability to save album art as "folder.jpg". Some music players (e.g. Sonos) sometimes have trouble displaying certain embedded album art but have no problem with folder.jpg files.
12) Display the total number of files currently selected in the status bar at the bottom of the window. This would be useful for counting statistics like how many files are missing Album Art etc.
13) Add TrackLength to the possible list of columns that can be displayed. This is very useful when trying to compare for matches with the MusicBrainz website. Otherwise you have to open the View Pane just so you can see the Track Length tag. Bit Rate and other commonly populated tags that are currently missing would be useful as well.
14) Output in the console to indicate whether the "Cluster" task succeeded in getting all the tracks in the Album into the same MusicBrainz releaseId.
15) It would be tremendously useful if Jaikoz could generate some kind of "Likelihood of error" score for each of the changes that it makes to each track. Then after running AutoCorrect, the user could sort the tracks by "Likelihood of error" and more quickly review the riskiest changes that Jaikoz is suggesting. At the moment the user can only review the console to try to spot potential errors in the AutoCorrect procedure. You could perhaps base part of the score formula on the Acoustic Id match score for the track. Examples of things that should raise a red flag and require user review include:
- Failure to cluster all of an album into a single MB Release Id (especially if all the tracks reside in the same directory)
- Changing the numeric value of a track
- Drastically changing the name of a track
- Drastically changing the name of an album
- Changing PlayingTime by more than 5%
16) Add a context menu item "Submit Album to MusicBrainz". This would only be enabled if sufficient meta-data is present for each of the Tracks. Jaikoz would then check that the submission is not a duplicate, then automatically submit all the Release and Track information without the user needing to visit the website at all. This would greatly increase the number of users that contribute to the website and with the appropriate checks in place would also improve data quality.
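
As an illustration of suggestion 15, the red flags listed there could be checked per track with something like the Python sketch below. The tag names, the dict layout and the 0.5 similarity cutoff for "drastically" are all assumptions, and the track number is assumed to be a plain number such as "3" or "03".

```python
from difflib import SequenceMatcher


def changed_drastically(before, after, threshold=0.5):
    """True when two strings are less than `threshold` similar (assumed cutoff)."""
    return SequenceMatcher(None, before.lower(), after.lower()).ratio() < threshold


def risk_flags(before, after, album_release_count=1):
    """Return the red flags raised by one track's proposed changes.

    `before` and `after` are dicts of tag values; `album_release_count` is how
    many MB Release Ids the track's album ended up spread over.
    """
    flags = []
    if album_release_count > 1:
        flags.append("album split across releases")
    # Compare numerically so that zero padding ("3" -> "03") is not flagged.
    if int(before["track"]) != int(after["track"]):
        flags.append("track number changed")
    if changed_drastically(before["title"], after["title"]):
        flags.append("title changed drastically")
    if changed_drastically(before["album"], after["album"]):
        flags.append("album changed drastically")
    if abs(after["length"] - before["length"]) > 0.05 * before["length"]:
        flags.append("playing time changed by more than 5%")
    return flags
```

Sorting the table by `len(risk_flags(...))`, highest first, would put the riskiest changes at the top for review. The Acoustic Id match score could be folded in as a further term.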

I would rate the priority of the issues as
High - 3, 4, 6, 7, 10
Medium - 9, 15, 16
Low - 1, 2, 5, 11, 12, 13, 14
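
For issue 7b, the behaviour I would expect is roughly the following Python sketch: once any track of a release has artwork, reuse it for the other tracks with the same MB Release Id. The dict keys `release_id` and `artwork` are made up for the example.

```python
def propagate_artwork(tracks):
    """Copy artwork to tracks that share an MB Release Id with a track that has it.

    Each track is a dict with 'release_id' (str or None) and 'artwork'
    (bytes or None); this shape is hypothetical. Returns how many tracks
    were filled in.
    """
    # First pass: remember the first artwork seen for each release.
    art_by_release = {}
    for track in tracks:
        if track["artwork"] is not None and track["release_id"]:
            art_by_release.setdefault(track["release_id"], track["artwork"])

    # Second pass: fill the gaps for tracks on a release with known artwork.
    filled = 0
    for track in tracks:
        if track["artwork"] is None and track["release_id"] in art_by_release:
            track["artwork"] = art_by_release[track["release_id"]]
            filled += 1
    return filled
```

Tracks without a Release Id are left alone, so nothing is guessed for unmatched files.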
Powered by JForum 2.1.6 © JForum Team