Jaikoz and SongKong Forums
Messages posted by: Nocturnal
Profile for Nocturnal -> Messages posted by Nocturnal [20]

paultaylor wrote:

Nocturnal wrote:

Maybe if it was called "Remote Correct Lyrics" I would have understood quicker  

Maybe, but it is in the "Remote Correct" sub menu 

So it is, but I just added it to the list of autocorrector tasks in the preferences. I never looked for it in the menus.

It was my mistake. I thought "Correct Lyrics" would do the same as "Correct Artist" etc.: just remove whitespace, trim extra spaces, and so on.

I didn't realize it would actually look up the lyrics.

Maybe if it was called "Remote Correct Lyrics" I would have understood quicker.

The "Correct Lyrics" action is running very slowly for me, doing only about 1 song per second. CPU usage is mostly below 20%. Most of the songs don't even have lyrics stored (the ID3 field should be either empty or, more likely, nonexistent).

I was wondering what could be making this so time-consuming?

A reply in less than an hour, that's fast support!

I was wondering what the warning about a low memory limit meant in the debug log (even before loading files):

debug log wrote:
19/10/2008 19.24.10:com.jthink.jaikoz.monitoring.MemoryManager:addMemoryNotification:WARNING: Low memory limit is set for:Tenured Gen:1326238924 


I use the following line to start Jaikoz:

a shortcut wrote:
C:\WINDOWS\system32\java.exe -Xms150m -Xmx1612m -jar lib\jaikoz.jar -l2 -m2 -f 
(with the working path set to the Jaikoz directory, of course).

Jaikoz seems to be using all the memory I give it:

user log wrote:
Oct 19, 2008 7:23:08 PM: INFO: Jaikoz has been configured with minimum heap memory of 150 Mb, maximum heap memory of 1,599 Mb and maximum permanent memory of 64 Mb 


So I was wondering what the warning about a low memory limit meant?

paultaylor wrote:
The error will only occur for tracks that have no artist, no album and a track no of the form x/y. What happens is Jaikoz constructs a query based partly on the track no, but because the track no isn't a simple number (i.e. 1 versus 1/10) it fails to extract the number properly and the query sent to MusicBrainz is invalid. The workaround is to manually change the trackno field or run Action/Local Correct/Track No Correct to change fields from x/y to x. Of course this means you lose the TotalTrack information. This will definitely be fixed for the next release.


I was about to post about a "503 Busy" problem I had, when I decided to search the forums first and found this post. This is probably the same problem I am having. The console (which is always open because I start Jaikoz with jaikoz.bat) even shows the partial queries used:

debug output wrote:
19/10/2008 19.04.01:com.jthink.jaikoz.manipulate.musicbrainzhelper.MusicBrainzServerQuery:performQuery:WARNING: URL:http://musicbrainz.org/ws/1/track/?type=xml&limit=10&query=track:"Psycho Theme" AND () AND qdur:(85 86 87 88 89 ) failed with error code:503 


The track in question only has a title ("Psycho Theme") and a track number specified. The track number is not in the format TrackNumber/TotalTracks; it's actually just "00". It also has a MusicIP ID, "f40d920b-dea0-5e9d-a8d9-c0d9bd3bcb36", should that matter. :)
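
To make the x/y case from the quote concrete, here is a minimal sketch (in Python, not Jaikoz's actual code) of how a track field could be split into a track number and an optional total before it is used in a query. The function name `split_track_no` and the regular expression are my own assumptions, purely for illustration.

```python
import re
from typing import Optional, Tuple

# Matches "7", "07" or "7/12": a track number with an optional "/total" part.
_TRACK_RE = re.compile(r"^\s*(\d+)\s*(?:/\s*(\d+)\s*)?$")

def split_track_no(field: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (track, total); both are None if the field cannot be parsed."""
    match = _TRACK_RE.match(field)
    if match is None:
        return None, None
    track = int(match.group(1))
    total = int(match.group(2)) if match.group(2) else None
    return track, total

print(split_track_no("1/10"))  # (1, 10)
print(split_track_no("00"))    # (0, None)
```

With a split like this, only the first number would go into the query, and the total would not have to be thrown away.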

Do you already have an idea when a version with this fixed will be released?

In my experience, Jaikoz usually runs out of memory not because too many files are loaded. A large file list mostly just makes Jaikoz really slow, because it constantly has to search through and sort the list of loaded files. The memory shortage comes from the mipcore.exe program, which decodes the compressed audio to WAV for analysis. For an audio file of one hour (a long DJ set, for example), that means 44100 samples/s x 16 bit x 2 channels x 3600 seconds = 635,040,000 bytes, or about 606MB.

This comes on top of all your running programs, the operating system, and Jaikoz itself (which is also using a lot of memory for the file list and all the ID3 information, even more so when you have a lot of cover art).
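
The size estimate above can be reproduced with a few lines of Python. This is only the arithmetic for uncompressed 16-bit stereo PCM; that mipcore.exe really holds the whole decoded file in memory is my assumption from watching its memory usage, not something I have confirmed.

```python
def pcm_bytes(seconds: int, sample_rate: int = 44100,
              bits_per_sample: int = 16, channels: int = 2) -> int:
    """Size in bytes of uncompressed PCM audio of the given duration."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels

one_hour = pcm_bytes(3600)
print(one_hour)                 # 635040000
print(round(one_hour / 2**20))  # 606 (megabytes of 1024 x 1024 bytes)
```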

One thing you can do to reduce the memory usage of Jaikoz is to disable the column that displays the cover art.

I'm currently doing a batch of 60 000 files. The loading of the files went fine. 30 000 of these already have an acoustic ID saved, so for the first step (analysis with genpuid, acoustic analysis) this leaves 30 000 files.

This is on a computer with 3GB of RAM. Windows XP and the JRE are both 32-bit. Because everything is 32-bit, the maximum heap I can assign to Java is around 1600MB (currently using -Xmx1612m). This is not a bug in Jaikoz, but a limitation of 32-bit memory addressing. For more information on this limitation on Windows, see http://support.microsoft.com/kb/924054.

paultaylor wrote:
Appears to be a problem processing records with ? in them, I'll look into it further.


I can provide the files if you can't find the error from the output alone. Just ask.

paultaylor wrote:
It is unfortunate that Jaikoz crashed. I'm surprised, I would have thought it would just return an error. But Focher is correct, asking Jaikoz to hold 25,000 records in memory is asking a lot,


Memory really isn't an issue when looking up MB IDs. Jaikoz uses about 400 MBytes of RAM under Windows when 28 000 files are loaded, if I remember correctly. Generating PUIDs is different, because in addition to the memory Jaikoz uses, the "mipcore" process that decodes the songs for analysis uses at least 10 MBytes of RAM per minute of MP3. So if you have a mix of 60 minutes, it can get pretty memory-consuming.

paultaylor wrote:
I think when I introduce a File/Open using a tree structure, customers will find it easier to process subsets of data at a time.


I realize processing them in parts is smarter, but I was away from my computer over the weekend, so I hoped it would have finished when I got back. Bad gamble.

paultaylor wrote:
There would be some benefit in having Jaikoz save Acoustic Ids to file as they are processed, on the basis that the acoustic id is ALWAYS correct, takes longer to generate than any other task, and I can't see any reason why somebody would not want to save it. However, this would break Jaikoz's rule of only persisting changes as and when the customer demands, and could cause confusion. I could add a setting 'Save Acoustic Id to File as Created' which would save this attribute (but no others) automatically, but this would mean that the status flag would revert to unchanged as soon as it had been saved. What do people think?


I would certainly use that option. Perhaps it would be easiest for the user if you didn't make a setting out of it. Instead, when you press the "Retrieve acoustic id's" button, it could pop up a dialog with radio buttons for either saving the ID as soon as a file finishes (new behaviour) or keeping the old behaviour.

Code:
org.apache.lucene.queryParser.ParseException: Cannot parse '\??': '*' or '?' not allowed as first character in WildcardQuery
         at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:149)
         at com.jthink.jaikoz.indexed.DataIndexer.recNoColumnMatchesSearch(DataIndexer.java:684)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.compareValues(MusicBrainzRESTQuery.java:923)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.calculateUnnormalisedAlbum(MusicBrainzRESTQuery.java:1031)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.calculateUnnormalisedScore(MusicBrainzRESTQuery.java:1148)
         at com.jthink.jaikoz.manipulate.TrackWithUnnormalizationScore.<init>(TrackWithUnnormalizationScore.java:23)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.findMatch(MusicBrainzRESTQuery.java:380)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.updateFromOnlineDatabase(MusicBrainzRESTQuery.java:217)
         at com.jthink.jaikoz.manipulate.TagFromMusicBrainzRowAnalyser$WorkerThread.analyse(TagFromMusicBrainzRowAnalyser.java:336)
         at com.jthink.jaikoz.manipulate.TagFromMusicBrainzRowAnalyser$WorkerThread.run(TagFromMusicBrainzRowAnalyser.java:289)
 30/06/2007 19.15.25:SEVERE: There was a problem submitting a query to MusicBrainz for Record Number 25,672 with filename Metallica - Stone cold crazy.mp3
 java.lang.RuntimeException: Unable to do perform reno match search:\??:Cannot parse '\??': '*' or '?' not allowed as first character in WildcardQuery
         at com.jthink.jaikoz.indexed.DataIndexer.recNoColumnMatchesSearch(DataIndexer.java:701)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.compareValues(MusicBrainzRESTQuery.java:923)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.calculateUnnormalisedAlbum(MusicBrainzRESTQuery.java:1031)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.calculateUnnormalisedScore(MusicBrainzRESTQuery.java:1148)
         at com.jthink.jaikoz.manipulate.TrackWithUnnormalizationScore.<init>(TrackWithUnnormalizationScore.java:23)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.findMatch(MusicBrainzRESTQuery.java:380)
         at com.jthink.jaikoz.manipulate.MusicBrainzRESTQuery.updateFromOnlineDatabase(MusicBrainzRESTQuery.java:217)
         at com.jthink.jaikoz.manipulate.TagFromMusicBrainzRowAnalyser$WorkerThread.analyse(TagFromMusicBrainzRowAnalyser.java:336)
         at com.jthink.jaikoz.manipulate.TagFromMusicBrainzRowAnalyser$WorkerThread.run(TagFromMusicBrainzRowAnalyser.java:289)

Seems like I have to redo 25 000 lookups, since it crashed.
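
The parse error in the trace complains about an unescaped '?' in the query text. As far as I know, Lucene's Java QueryParser has a static escape() method for exactly this, but I don't know whether or how Jaikoz uses it. The sketch below is only a Python illustration of the idea: backslash-escape every character the Lucene query syntax treats as special. The name `escape_lucene` is made up.

```python
# Characters that the Lucene query syntax treats as special.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')

def escape_lucene(text: str) -> str:
    """Backslash-escape every Lucene special character in text."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)

print(escape_lucene("??"))                # \?\?
print(escape_lucene("Stone cold crazy"))  # Stone cold crazy
```

In the trace the query text is '\??', so it looks like only the first question mark got escaped and the second one was still read as a wildcard.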

So what I think is a good general approach for sorting (the largest part of) a big music collection is this:

  • Get Acoustic IDs for my entire collection (done)
  • Get MusicBrainz IDs for the items that have an Acoustic ID (to increase the chance the matches are accurate) (in progress)
  • Remove duplicate MB IDs (maybe using the function I posted somewhere else, if that gets implemented)
  • Rename and move the items that have an MB ID (to increase the chance the tags are accurate) according to their tags (folder name: %artist%\%artist% - %albumname%; file name: %artist% - %albumname% - %tracknr% - %title%)

    The problem with the last step is that if I have an album with various artists (like a soundtrack), the songs will be spread over different folders (as they all have different artists). Any suggestions on how to avoid that?
  • Am I correct in thinking that the MusicBrainz ID is only the same when it is the exact same release of a song? In other words, the MB ID of "Michael Jackson - Billie Jean" on the album "Thriller" should always be DIFFERENT from the MB ID of "Michael Jackson - Billie Jean" on the greatest hits album "Number Ones"?
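
To show the various-artists problem from the rename step, here is a small sketch of the folder and file name mask from the list, written in Python. The tag values are made up, and the ".mp3" extension and the `target_path` name are assumptions for the example. This is not how Jaikoz builds its paths, just the mask applied literally.

```python
from pathlib import PureWindowsPath

def target_path(tags: dict) -> PureWindowsPath:
    """Apply the mask: %artist%\\%artist% - %albumname%\\<file name>."""
    folder = f"{tags['artist']} - {tags['album']}"
    filename = (f"{tags['artist']} - {tags['album']} - "
                f"{tags['track']} - {tags['title']}.mp3")
    return PureWindowsPath(tags["artist"], folder, filename)

# Two made-up tracks from one hypothetical soundtrack album.
a = target_path({"artist": "Artist A", "album": "Some Soundtrack",
                 "track": "01", "title": "Opening"})
b = target_path({"artist": "Artist B", "album": "Some Soundtrack",
                 "track": "02", "title": "Closing"})
print(a)
print(b)
print(a.parent == b.parent)  # False
```

Both tracks belong to the same album, but because the mask starts with the track artist they end up in different folders.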

    paultaylor wrote:

    Nocturnal wrote:
    maybe it doesn't get reset properly after every run... 


    That isn't the issue, but I have found something else that may be causing the problem. A debug log showing the problem occurring would be useful if you have a chance.

    That's good news!

    I don't have a debug log yet. I'm running a session of 22 000+ files now, already going for 20+ hours, and I really didn't want to slow that down or risk a hang with that many files. The error only occurred 25 times in 10 000 files, so it really is a rare bug. I don't actually have multiple CPUs, but I do have Hyperthreading activated, which the OS (both Windows and Linux) treats as 2 CPUs. I do have some more output (different error messages, but I'm pretty sure they're caused by the same problem):
    Code:
    [Fatal Error] :1:1: Content is not allowed in prolog.
     24/06/2007 19.46.35:WARNING: Unable to create Puid for:2556:null
     [Fatal Error] :1:20: XML document structures must start and end within the same entity.
     24/06/2007 19.55.24:WARNING: Unable to create Puid for:2997:null
     [Fatal Error] :-1:-1: Premature end of file.
     24/06/2007 19.57.38:WARNING: Unable to create Puid for:3132:null
     [Fatal Error] :-1:-1: Premature end of file.
     24/06/2007 19.57.49:WARNING: Unable to create Puid for:3135:null
     [Fatal Error] :1:20: XML document structures must start and end within the same entity.
     24/06/2007 19.57.56:WARNING: Unable to create Puid for:3136:null
     [Fatal Error] :-1:-1: Premature end of file.
     24/06/2007 20.04.30:WARNING: Unable to create Puid for:3505:null
     [Fatal Error] :1:20: XML document structures must start and end within the same entity.
     24/06/2007 20.08.34:WARNING: Unable to create Puid for:3741:null
     [Fatal Error] :-1:-1: Premature end of file.


    This shows that it's really a problem with capturing the output of genpuid. Since I doubt the problem is in the output itself, it might be caused by processing the output before it is completely written (i.e. genpuid still has to write "npuid>" or something)?

    If you need me to beta-test a possible bugfix, let me know.
    BTW, I see this issue on Windows, as well, just not as frequently.

    paultaylor wrote:
    I don't know where
    [Fatal Error] :-1:-1: Premature end of file.
    is coming from, but I guess it is coming from genpuid in some way. I've just reread your message, how do you know it's from the XML parser? That's interesting.

    When you look it up on Google, the first result shows the exact same error message, coming from the Java SAX XML parser. No doubt you also use some kind of Java XML parser on the output of genpuid...

    "Premature end of file" would probably refer to the XML data you feed (from genpuid) into the parser, so the problem is probably either the XML returned by genpuid, or the variable you use to store it in (maybe it doesn't get reset properly after every run?)...
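
To illustrate the point about incomplete data: any XML parser fails when it is handed empty or cut-off text, whatever the exact wording of the message. The sketch below uses Python's standard parser instead of the Java one, and the XML string is a made-up stand-in, not genpuid's real output format.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def parse_or_none(xml_text: str) -> Optional[ET.Element]:
    """Parse xml_text; return None when it is empty or cut off mid-document."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None

# Hypothetical stand-in for tool output; not genpuid's real format.
complete = "<result><track puid='example'/></result>"

print(parse_or_none(complete) is not None)   # True
print(parse_or_none(complete[:20]) is None)  # True: cut off mid-tag
print(parse_or_none("") is None)             # True: nothing received at all
```

So if the output is sometimes read before the child process has finished writing it, errors that show up at random on different files are what I would expect.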

    paultaylor wrote:
    [...]
    Can you clarify for me, is this happening consistently for the same files? For example, in the debug log where it says Unable to create Puid for:132, the 132 refers to the 133rd file (because records are labelled from zero) in the list when sorted by record number. So if this problem occurs again, after identifying the record you could right-click on its record header and just select 'Remote Correct/Correct Acoustic Ids' to retry this one file.
    [...] 

    (The list in the GUI of Jaikoz also starts from 0.)
    What I did was first analyse a directory of 517 files with genpuid (all files at once). I saved the output in a file. Then I opened the same directory in Jaikoz, started the acoustic analysis, and waited until 2 songs had shown the error. Then I cancelled the operation, and retried acoustic analysis on them separately. This did work. The PUID for these songs was also present in the saved output of genpuid itself.

    Then I closed Jaikoz (without saving anything), reopened it, and did everything exactly the same. There were errors again, but this time on different files! The files that gave errors the first time, had no problem this time. Again right-clicking and analyzing only that file gave no problems. The files that had errors this time also didn't give any error with genpuid.

    It is possible that these files are corrupt, but even that can't explain why it doesn't always hang on the same files. What I am wondering about is the error format:
    Code:
     [Fatal Error] :-1:-1: Premature end of file.
     23/06/2007 01.06.15:WARNING: Unable to create Puid for:11:null
     

    What programs generate these? The first one doesn't seem like your log format. (Edit: I found out it's an error from your XML parser. So probably the output received from genpuid isn't always up to par.) The second one states the list number (so it is probably from your program) and then 'null'. Where do you print this message? What should follow the list number? It seems like that parameter isn't initialized...

    I'll retry this with more debug output later...
    I'm also going to try redoing the entire directory with genpuid, seeing if it also sometimes fails on different files.

    I see this issue, too.
    This is under Gentoo Linux with all the latest updates and Sun's VM 1.5.0_11.

    Examples:
    Code:
     [Fatal Error] :-1:-1: Premature end of file.
     21/06/2007 19.41.12:WARNING: Unable to create Puid for:29:null
     [Fatal Error] :-1:-1: Premature end of file.
     21/06/2007 19.45.07:WARNING: Unable to create Puid for:95:null
     [Fatal Error] :-1:-1: Premature end of file.
     21/06/2007 19.45.56:WARNING: Unable to create Puid for:112:null
     [Fatal Error] :-1:-1: Premature end of file.
     21/06/2007 19.46.32:WARNING: Unable to create Puid for:127:null
     [Fatal Error] :-1:-1: Premature end of file.
     21/06/2007 19.46.45:WARNING: Unable to create Puid for:132:null
     [Fatal Error] :-1:-1: Premature end of file.

    I have uploaded all log files (userlog, debuglog and part of the console output) here.

    I was wondering why so many characters are removed during the local auto correct. It says somewhere that this improves the matching, but I don't understand that.

    Take for instance the Radiohead album "OK Computer". It has a song titled "Exit Music (For a Film)". If you remove the parentheses from the title, but the parentheses ARE present in the MusicBrainz DB, wouldn't the matching be made more difficult?

    Same goes for characters like single quotes, double quotes, slashes, ...

    I know this can be completely customized or turned off, but I was wondering why this was done in the first place?

    I thought you would want to preserve as much as possible of the ID3 tag present, only removing illegal/undisplayable characters, sending that off to match with Musicbrainz, save the found data in the ID3 tag, and form a filename from it, only replacing illegal filesystem characters like '?' with a space, underscore, or nothing...
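
What I have in mind for the file name step is something like the sketch below: leave the tag text alone and replace only the characters that Windows does not allow in a file name. This is my own Python illustration, not Jaikoz's behaviour, and the `safe_filename` name and the second song title are made up.

```python
# Characters Windows does not allow in file names.
ILLEGAL = '\\/:*?"<>|'

def safe_filename(name: str, replacement: str = "_") -> str:
    """Replace only the characters that are illegal in a Windows file name."""
    return "".join(replacement if ch in ILLEGAL else ch for ch in name)

print(safe_filename("Exit Music (For a Film)"))  # Exit Music (For a Film)
print(safe_filename("Who Are You?"))             # Who Are You_
```

The parentheses survive, so the title sent to MusicBrainz and the title in the file name stay as close to the original tag as possible.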

    I think I have found the problem (or at least part of it).

    Using Wireshark I found out that genpuid always tries to connect to the MusicIP server through port 10001 first. When that fails, it uses port 80 as a fallback. Port 10001 is blocked here, port 80 isn't. So it has to wait for a timeout on the connect to port 10001 before it falls back to port 80.

    And since Jaikoz starts genpuid over for each song, it needs to wait for the timeout for each song. When using genpuid in recursive mode (-r), the timeout only occurs once.
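
A rough model of what this costs, as a Python sketch. The timeout value is an assumption (I did not measure how long genpuid waits on port 10001), and `wasted_seconds` is just a made-up helper for the arithmetic.

```python
def wasted_seconds(songs: int, connect_timeout: float, per_song: bool) -> float:
    """Time spent waiting on the blocked port before the fallback is tried."""
    attempts = songs if per_song else 1
    return attempts * connect_timeout

TIMEOUT = 20.0  # assumed value in seconds; the real timeout was not measured

print(wasted_seconds(1000, TIMEOUT, per_song=True))   # 20000.0 (one genpuid run per song)
print(wasted_seconds(1000, TIMEOUT, per_song=False))  # 20.0 (one recursive run)
```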

    I don't have the router password to unblock this, however, and the guy who does is on holiday :s

    At least now we know what causes this.

    When I've loaded a couple of thousand songs in Jaikoz, and I start the acoustic analysis on them, I expect it to use most of my CPU cycles. After all, decoding an MP3, the mathematical analysis of it, etc. are all very CPU-intensive tasks. However, when I monitor the CPU usage, it is only around 20% most of the time, with 1 spike of 80+% per song analysed.

    This is odd, isn't it? Shouldn't it just use the maximum CPU power available and make analysis faster? I know that it also sends and receives data, but that should not prevent it from using, on average, more CPU time, right?

    Acoustic analysis takes about 25 seconds per song on my PC.

    Could this be a problem with settings, and/or the genpuid program, and/or is there anything I can do to speed things up? Is the time (~25s/song) normal?

    Like the topic title says, I'd like it if the bit rate, length (of the audio, like mm:ss) and file size were available as columns in Jaikoz's table. This would make it a lot easier to decide which one of a set of duplicate files you'd like to keep.

    It could be even easier if there was another action that handles duplicates automatically. Its settings could look like this:
    "If the MusicIP ID/MusicBrainz ID is the same, keep the one with the largest/smallest bitrate/filesize/duration.
    If the criterion above is equal, keep the one with the largest/smallest bitrate/filesize/duration.
    If the criterion above is equal, keep the one with the largest/smallest bitrate/filesize/duration.
    If the criterion above is equal, ask me which one to keep/ignore this duplicate/call the Java police."
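
The rule chain above could work roughly like this sketch (Python, with made-up file data and field names, just to show the idea). Files are grouped by ID, each criterion only breaks ties left by the previous one, and groups that are still tied at the end are handed back so the user can be asked.

```python
from collections import defaultdict

def pick_keepers(files, criteria):
    """Group files by ID and pick the file to keep in each group.

    criteria is an ordered list of (field, prefer_largest) pairs. Groups
    whose best files are still tied are returned separately as undecided.
    """
    def rank(f):
        return tuple(f[field] if largest else -f[field]
                     for field, largest in criteria)

    groups = defaultdict(list)
    for f in files:
        groups[f["id"]].append(f)

    keep, undecided = {}, {}
    for file_id, group in groups.items():
        best = max(rank(f) for f in group)
        winners = [f for f in group if rank(f) == best]
        if len(winners) == 1:
            keep[file_id] = winners[0]
        else:
            undecided[file_id] = winners
    return keep, undecided

# Made-up example data: "a" has a clear winner, "b" is a full tie.
files = [
    {"id": "a", "path": "1.mp3", "bitrate": 128, "size": 4},
    {"id": "a", "path": "2.mp3", "bitrate": 320, "size": 9},
    {"id": "b", "path": "3.mp3", "bitrate": 192, "size": 5},
    {"id": "b", "path": "4.mp3", "bitrate": 192, "size": 5},
]
keep, undecided = pick_keepers(files, [("bitrate", True), ("size", False)])
print(keep["a"]["path"])  # 2.mp3
print(sorted(undecided))  # ['b']
```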
     