Solved: mpd error malformed utf-8 characters, possibly incorrectly encoded
#11
@chaos333

Got it.

The track is a WAV file. The track file name contains Chinese characters. The file name displays properly in moOde's Library Folder view and in audio info modals.

Fortunately, the mediainfo utility provided with moOde knows the WAV format (a derivative of RIFF) and can reach into the INFO chunk to get at the metadata.

Here's what I see [1]:

Code:
pi@m8pi3a:/mnt/SDCARD/chaos333 $ mediainfo 06*
General
Complete name                            : 06 但愿人长久.wav
Format                                   : Wave
File size                                : 43.0 MiB
Duration                                 : 4 min 15 s
Overall bit rate mode                    : Constant
Overall bit rate                         : 1 411 kb/s
Album                                    : 《和谐之声 为祖国祝福 长城独唱音乐会》
Performer                                : 谭晶
Director                                 : ??
Original source form/Name                : ????? ????? ????????
...

Not a very useful set of entries. There is no track name entry at all. It would appear that the 'Original source form/Name' entry is overriding the Album entry when the MPD decoder processes the file. (I seem to recall this is how such an entry is intended to be used.) It looks like the Director entry is overriding the Performer entry. Candidly, I don't use the WAV format enough to know what metadata can be recorded and how MPD processes it. Time to do some homework.

If I get time later today, I'm going to try to replace this INFO chunk with a more reasonable set of values and see what happens.

I'll also see what happens when I transcode this track to FLAC and populate the metadata with reasonable values.

I have not been able to replicate the 'malformed UTF-8 character...' issue but maybe I overlooked something.

Regards,
Kent

[1] Caution: mediainfo doesn't necessarily print the actual names of metadata items as they are entered in the file so one has to be cautious when making changes to them using some other tool.
Reply
#12
To be clear, in @chaos333's case, the ??? strings are encoded as such in the file's metadata. It is not a UTF-8 issue.

@OldNick

What you are reporting appears to be unrelated to this thread. You should start a new thread.

Regards,
Kent
Reply
#13
@chaos333 

I couldn't resist. Rather than spending time studying the WAV metadata mechanism, I turned to my Linux laptop and transcoded the WAV file to FLAC using ffmpeg:

Code:
ffmpeg -i '06 但愿人长久.wav' -c:a flac '06 但愿人长久.flac'

Almost instantaneously, I had a FLAC file. Interestingly, just like MPD, ffmpeg chose the ?? entries to populate the FLAC metadata. 

I used metaflac to remove those entries and create new entries from the command line. Here's what mediainfo reports for my edited file:

Code:
pi@m8pi3a:/mnt/SDCARD/newchaos $ mediainfo '06 但愿人长久.flac'
General
Complete name                            : 06 但愿人长久.flac
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
File size                                : 23.9 MiB
Duration                                 : 4 min 15 s
Overall bit rate mode                    : Variable
Overall bit rate                         : 785 kb/s
Album                                    : 和谐之声 为祖国祝福 长城独唱音乐会
Track name                               : 但愿人长久
Track name/Position                      : 6
Performer                                : 谭晶
Writing application                      : Lavf58.29.100

The artist, track name, and album name for this track all appear as they should in Chinese characters in moOde's various panels (see attached screenshot of the Playback panel).
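For anyone wanting to repeat the metaflac step, here is a minimal Python sketch of how the retagging could be driven. The function names and the Vorbis comment field names (ALBUM, TITLE, TRACKNUMBER, ARTIST) are my assumptions; the post doesn't say which fields were actually removed or written. It issues two separate metaflac calls so it doesn't depend on how metaflac orders operations within one invocation.

```python
import subprocess


def metaflac_retag_cmds(flac_path, tags):
    """Build two metaflac commands: wipe existing tags, then write new ones.

    `tags` maps Vorbis comment field names (an assumption, e.g. "TITLE")
    to UTF-8 text values.
    """
    wipe = ["metaflac", "--remove-all-tags", flac_path]
    write = ["metaflac"]
    write += [f"--set-tag={field}={value}" for field, value in tags.items()]
    write.append(flac_path)
    return [wipe, write]


def retag(flac_path, tags):
    """Run the commands; raises CalledProcessError if metaflac fails."""
    for cmd in metaflac_retag_cmds(flac_path, tags):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Values taken from the mediainfo listing above.
    example = {
        "ALBUM": "和谐之声 为祖国祝福 长城独唱音乐会",
        "TITLE": "但愿人长久",
        "TRACKNUMBER": "6",
        "ARTIST": "谭晶",
    }
    for cmd in metaflac_retag_cmds("06 但愿人长久.flac", example):
        print(cmd)
```

Passing the arguments as a list (no shell) means the Chinese file name and tag values need no quoting.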

Bottom line: I don't believe there's a problem related to Chinese characters encoded in UTF-8.

Regards,
Kent


[Attachment: screenshot of the Playback panel]
Reply
#14
(05-13-2022, 04:41 PM)TheOldPresbyope Wrote: @chaos333 

Hi Kent,
Perfect.
That shows all the characters instead of ???.

So can you advise what the cause is, or how to avoid it permanently? I might have more than 1000 tracks, or 200+ albums, with the same problem...

The malformed UTF-8 character error appeared while I had the ??? problem. It is back to normal now after I tried Tim's suggestion to clear the cache. Clearing it failed the first two times, but when I loaded another music source over SMB and turned the previous source off, the problem disappeared, and it has not come back since I switched back to the previous source. So maybe Tim is correct, but I don't know why the cache was not fully cleared.
Reply
#15
@chaos333

I have a number of audio-related software tools which can read/edit metadata from a variety of file formats. They give conflicting views of the metadata in your WAV file.

In desperation, I examined your WAV file with a hex editor. It turns out the LIST INFO chunk contains metadata using both RIFF-defined tags and ID3v2.3 tags. (RIFF is the parent file type of a WAV file; ID3v2.3 is a metadata tagging scheme developed originally for MP3 files.)

So, for example, the RIFF tag IPRD contains "????? ????? ????????" while the ID3v2.3 tag TALB contains "《和谐之声 为祖国祝福 长城独唱音乐会》". Depending on the program reading the metadata, either of these may be interpreted as the album title. This explains the conflicting answers I get with my tools. It also suggests why you saw what you reported in your first post in this thread.

I looked at several WAV test files I found online. They also contained this mix of metadata tags but in their cases there were no conflicts; e.g., IPRD and TALB contained the same album title values; IART and TPE1 contained the same artist name values; etc.
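For those without a hex editor handy, the same inspection can be roughly approximated with a few lines of standard-library Python. This is only a sketch under assumptions: it walks the top-level RIFF chunks, dumps the sub-chunks of any LIST chunk of type INFO as raw bytes, and makes no attempt to interpret ID3 frames. The function name is mine, and it reads the whole file into memory.

```python
import struct


def read_info_tags(data: bytes) -> dict:
    """Return {tag_id: raw_bytes} for each sub-chunk of a LIST/INFO chunk."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    tags = {}
    pos = 12  # skip 'RIFF', the 4-byte size, and 'WAVE'
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"LIST" and body[:4] == b"INFO":
            sub = 4  # skip the 'INFO' list type
            while sub + 8 <= len(body):
                tag = body[sub:sub + 4].decode("ascii", "replace")
                (n,) = struct.unpack("<I", body[sub + 4:sub + 8])
                # Values are NUL-terminated; keep the bytes undecoded.
                tags[tag] = body[sub + 8:sub + 8 + n].rstrip(b"\x00")
                sub += 8 + n + (n & 1)  # sub-chunks are padded to even length
        pos += 8 + size + (size & 1)    # so are top-level chunks
    return tags


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        for tag, raw in read_info_tags(fh.read()).items():
            print(tag, raw)
```

Printing the raw bytes rather than decoded text makes it easy to tell literal question marks (byte 0x3F) stored in the file from text that is merely being displayed badly.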

Regards,
Kent
Reply
#16
@chaos333

Quote: So can you advise what the cause is, or how to avoid it permanently? I might have more than 1000 tracks, or 200+ albums, with the same problem...

Ouch. Obviously, editing tracks by hand is not an option!

If every track were like this one then I can imagine a script which extracts the track title from the file name and writes it into an ID3v2.3 TIT2 tag. For example, one could use the mutagen module available in Python.
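A minimal sketch of that idea, assuming file names of the form "06 但愿人长久.wav" (leading track number, separator, title) and a mutagen release recent enough to provide mutagen.wave.WAVE (1.45 or later, as far as I know). The function names are hypothetical, and this has not been run against the file in this thread. Note that mutagen's default when saving ID3 tags is, to my knowledge, ID3v2.4 rather than v2.3.

```python
import re
from pathlib import Path


def title_from_filename(path):
    """Split '06 但愿人长久.wav' into (6, '但愿人长久').

    The track number is None when the name has no leading digits.
    """
    stem = Path(path).stem
    m = re.match(r"^(\d+)[\s._-]+(.+)$", stem)
    if m:
        return int(m.group(1)), m.group(2).strip()
    return None, stem.strip()


def write_title(path):
    """Write the title derived from the file name into a TIT2 frame."""
    # Imported here so the parsing helper works without mutagen installed.
    from mutagen.wave import WAVE
    from mutagen.id3 import TIT2

    _, title = title_from_filename(path)
    audio = WAVE(path)
    if audio.tags is None:
        audio.add_tags()
    # encoding=3 selects UTF-8 text in the ID3 frame.
    audio.tags.add(TIT2(encoding=3, text=[title]))
    audio.save()


if __name__ == "__main__":
    print(title_from_filename("06 但愿人长久.wav"))
```

This only adds or replaces the ID3 title; it leaves any RIFF INFO values in the file untouched.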

What I don't know how to do is reconcile the conflicting RIFF and ID3v2.3 tag values in a WAV file.

It would seem like there would be some open-source packages already available in a library or on GitHub which would help, but I haven't found any yet. Maybe someone else knows of software which can help.

I avoided this dilemma by transcoding the WAV file into a FLAC file and then hand-editing the FLAC metadata. To automate the process one would need a script which did something like:

1. read metadata and file name from WAV file
2. transcode WAV to FLAC
3. use the values read in step 1 to write appropriate metadata to the FLAC file
4. save the file

On a Linux system like my laptop, this could be scripted using ffmpeg for the transcoding and, for the metadata work, Python with its mutagen module or bash with, say, the mediainfo and metaflac programs. There are other possibilities as well; I've just cited programs I've been comfortable using for similar purposes.
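As a rough illustration of steps 2 to 4, here is one way the transcode could be scripted in Python, letting ffmpeg itself write the tags. It is a sketch, not a tested tool: the function names are mine, and the caller is assumed to supply the trusted values from step 1 (from the file name, mediainfo, or elsewhere). `-map_metadata -1` tells ffmpeg not to copy the source file's tags, which avoids carrying the ?? entries into the FLAC file, and `-n` refuses to overwrite an existing output.

```python
import subprocess
from pathlib import Path


def ffmpeg_transcode_cmd(wav_path, tags):
    """Build an ffmpeg command: WAV in, FLAC out, only the given tags written."""
    flac_path = Path(wav_path).with_suffix(".flac")
    cmd = [
        "ffmpeg", "-n",            # never overwrite an existing output file
        "-i", str(wav_path),
        "-map_metadata", "-1",     # drop all metadata from the source
        "-c:a", "flac",
    ]
    for key, value in tags.items():
        cmd += ["-metadata", f"{key}={value}"]
    cmd.append(str(flac_path))
    return cmd


def convert_tree(root, tags_for):
    """Transcode every .wav under `root`.

    `tags_for` is a caller-supplied function (hypothetical) that maps a
    Path to a dict of tag names and values.
    """
    for wav in sorted(Path(root).rglob("*.wav")):
        subprocess.run(ffmpeg_transcode_cmd(wav, tags_for(wav)), check=True)


if __name__ == "__main__":
    print(ffmpeg_transcode_cmd(
        "06 但愿人长久.wav",
        {"title": "但愿人长久", "artist": "谭晶", "track": "6"},
    ))
```

With 1000+ tracks it would be prudent to run this on a copy of one album first and check the result with mediainfo before converting everything.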

Regards,
Kent
Reply
#17
(05-13-2022, 03:06 PM)OldNick Wrote:
(05-13-2022, 01:01 PM)chaos333 Wrote: Hi, dear all,
Here are the music files that show the problem on both my HDMI monitor and in web browsers.

https://drive.google.com/drive/folders/1...sp=sharing

In the web browser (Google Chrome), only the track name is shown; the album information is just "???".

The HDMI monitor is even worse: even the track name is shown as ???.
What language should the ??? be showing?
English, Chinese characters, or something else?

Thank you.

I really don't know what happened, so I would be very grateful for any help.

Interestingly enough, I had a DSD file last night that displayed ? for the sample rate. I will investigate later today (time willing) and report back here if I can find anything useful...
Reply
#18
(05-14-2022, 09:15 PM)TheOldPresbyope Wrote: @chaos333

Hi Kent,
You have been very helpful in locating this problem.
I am now using Kid3 to edit an album's tags.
As your findings suggest, when I change the tags to the correct album name and album artist, it works in Chrome on my cellphone.

But the HDMI display still does not show the correct characters when playing the same song; it still shows ???.
So I suspect I have hit another, similar issue, not with the tags this time but with moOde's own local browser. Maybe this time it is a real Chinese character issue in the browser?

See the pictures below.
There are no ??? in my cellphone browser:

[Attached image]

And these are the ??? characters:

[Attached image]

So, what now?
Thanks.
Reply
#19
@chaos333

Interesting. I wouldn't have thought manipulating the metadata with a tag editor such as Kid3 would be sufficient. Nice.

As for the local display, I'm able to reproduce your finding.

This display is being generated by a local instance of chromium browser running as an X11-client on the moOde host. I suppose the discrepancy we see is due to the X11-server on the same host not having access to a compatible set of xfonts or running with the wrong locale or font setting. 

I've never had an occasion to run an X11-server with locale/font settings different from the system settings so I don't know what to suggest here. Sorry. If I get a chance I'll try tweaking some settings to see what happens but perhaps someone else who already knows the answer will respond first.
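One quick check that might narrow this down, assuming fontconfig's fc-list is installed on the moOde host and that the local Chromium picks its fonts through fontconfig (neither is confirmed in this thread): ask which installed font families claim Chinese coverage. The function names below are mine.

```python
import subprocess


def parse_families(fc_list_output):
    """Turn `fc-list ... family` output into a sorted list of family names.

    Each output line may hold several comma-separated names for one font.
    """
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            if name.strip():
                families.add(name.strip())
    return sorted(families)


def chinese_font_families():
    """List font families that fontconfig reports as covering Chinese."""
    result = subprocess.run(
        ["fc-list", ":lang=zh", "family"],
        capture_output=True, text=True, check=True,
    )
    return parse_families(result.stdout)


if __name__ == "__main__":
    found = chinese_font_families()
    print(found if found else "no fonts with Chinese coverage found")
```

An empty result would be consistent with the missing-font guess above; in that case installing a CJK font package (on Debian-based systems, fonts-noto-cjk is one candidate) might be worth trying, though I haven't tested that on moOde.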

Regards,
Kent
Reply
#20
(05-16-2022, 07:15 PM)TheOldPresbyope Wrote: @chaos333

Yes, Kid3 is a nice open-source tag editor. Thanks for your hint.

As for the monitor not showing the characters, your finding is really impressive; it's a pity you haven't had the chance to dig deeper yet.

Please let me know what you find, no matter how many years later. LOL.

I wonder whether the local Chromium browser supports Chinese characters at all. Maybe moOde just does not have the fonts. I don't know.

Could someone else explain more, even if not solve it?
Thanks
Reply

