So I'll demonstrate how to correct these errors using some Linux commands.
Identifying the error
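To identify the offending character, we can ask iconv to convert the file from UTF-8 to UTF-8 and throw the output away. It stops at the first invalid byte sequence and reports where it is:
[root@webserver failed]# iconv -f utf-8 -t utf-8 bad_file.xml -o /dev/null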
iconv: illegal input sequence at position 25691275
Here we can see that the offending character is at byte offset 25691275 from the start of the file.
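If you need that position in a script, you can pull it out of the error message. The following is a minimal sketch that assumes GNU (glibc) iconv, whose message has the "illegal input sequence at position N" form shown above. The function name bad_utf8_offset is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the byte offset of the first invalid UTF-8
# sequence in FILE, or nothing if the file converts cleanly.
# Assumes glibc iconv's "illegal input sequence at position N" message.
bad_utf8_offset() {
    local file="$1" msg
    # LC_ALL=C keeps the error message in English so sed can match it.
    # stderr is captured and stdout (the converted text) is discarded.
    if msg=$(LC_ALL=C iconv -f utf-8 -t utf-8 "$file" 2>&1 >/dev/null); then
        return 0  # conversion succeeded: no invalid sequence
    fi
    # Print only the number after "at position". A different message
    # (e.g. a truncated sequence at end of file) prints nothing.
    printf '%s\n' "$msg" | sed -n 's/.*at position \([0-9][0-9]*\).*/\1/p'
}
```

Running `bad_utf8_offset bad_file.xml` should print the same position that iconv reported above.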
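To view the character at this position, print the first 25691310 bytes of the file (a little past the reported position) and keep only the last 100 of them, so the bad character appears near the end of the output instead of 25 MB of text scrolling past:
head -c 25691310 bad_file.xml | tail -c 100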
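Since the offending byte is often unprintable, a hex dump can be easier to read. This is a sketch with a hypothetical function name, show_bytes_at, that starts exactly at the offset iconv reported (counted from 0) and prints the raw bytes with od.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump LEN bytes (default 16) of FILE as hex,
# starting at the 0-based byte OFFSET reported by iconv.
show_bytes_at() {
    local file="$1" offset="$2" len="${3:-16}"
    # tail -c +N starts at the N-th byte counting from 1, hence offset + 1.
    tail -c +"$((offset + 1))" "$file" | head -c "$len" | od -An -tx1
}
```

For example, `show_bytes_at bad_file.xml 25691275` prints 16 bytes, the first of which is the offending one.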
Removing the bad characters
If you can get away with removing the bad characters without having to replace them, then you can use this command. The -c option tells iconv to omit characters that cannot be converted from the output:
iconv -f utf8 -t utf8 -c bad_file.xml > fixed.xml
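It is worth confirming that the cleaned file really is valid UTF-8. Below is a minimal sketch that wraps the command above and then re-checks the result. The function name strip_invalid_utf8 is hypothetical, and the sketch assumes an iconv that, like glibc's, may still exit non-zero after -c has dropped bytes, so that status is ignored and the second conversion is used as the real check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write IN to OUT with invalid UTF-8 removed, then
# succeed only if OUT converts cleanly.
strip_invalid_utf8() {
    local in="$1" out="$2"
    # Bail out on a missing or unreadable input; otherwise the check below
    # would pass on the empty output file.
    [ -r "$in" ] || return 1
    # -c omits invalid sequences. The exit status is ignored because iconv
    # may report failure even though it dropped the bytes as asked.
    iconv -f utf-8 -t utf-8 -c "$in" > "$out" || true
    # Verify: a plain UTF-8 to UTF-8 conversion must now succeed.
    iconv -f utf-8 -t utf-8 "$out" > /dev/null 2>&1
}
```

Usage: `strip_invalid_utf8 bad_file.xml fixed.xml && echo "fixed.xml is valid UTF-8"`.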
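If you want to fix a batch of files, run the following from the directory that contains them. It writes the cleaned copies to ../fixed, creating that directory and any subdirectories first, and passes each filename to bash as an argument ("$1") rather than pasting it into the script text, so names containing spaces or quotes are handled safely:
mkdir -p ../fixed
find . -type f -exec bash -c 'mkdir -p "../fixed/$(dirname "$1")" && iconv -f utf8 -t utf8 -c "$1" > "../fixed/$1"' _ {} \;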
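The same batch step can be written as a reusable bash function. This is a sketch with a hypothetical name, fix_tree, that reads filenames NUL-separated (find -print0 with read -d ''), so even names containing newlines survive, and that recreates the source directory layout under the destination. It assumes DEST_DIR is not inside SRC_DIR, because find would otherwise also pick up the files being written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every regular file under SRC_DIR into DEST_DIR,
# keeping the relative layout and dropping invalid UTF-8 sequences.
fix_tree() {
    local src="$1" dest="$2" rel
    # List files relative to SRC_DIR, NUL-separated so any name is safe.
    ( cd "$src" && find . -type f -print0 ) |
    while IFS= read -r -d '' rel; do
        # Recreate the file's directory under DEST_DIR.
        mkdir -p "$dest/$(dirname "$rel")"
        # -c omits invalid sequences; iconv may still exit non-zero when it
        # drops something, so that status is ignored here.
        iconv -f utf-8 -t utf-8 -c "$src/$rel" > "$dest/$rel" || true
    done
}
```

Running `fix_tree . ../fixed` from the directory holding the files produces the same ../fixed layout as the find command above.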