'Embedded nul in string' error with fread and sed.exe - fread

I have several huge text files (over 7 GB each), which were preprocessed with
sed.exe 's/\\0//g'
to strip all nul characters from the text.
However, fread() still fails with an 'Embedded nul in string' error. How can I make sure that every nul is removed from the files?
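One possible cause, which is an assumption since the thread does not confirm it: in a sed pattern, `\\0` reads as an escaped backslash followed by the digit 0, so the command may be removing the two-character text "\0" rather than the 0x00 byte. A minimal sketch of a byte-level alternative follows. It streams the file in chunks so a 7 GB input never has to fit in memory; the function names, paths, and chunk size are hypothetical.

```python
def strip_nul_bytes(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path without 0x00 bytes; return how many were dropped."""
    removed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            removed += chunk.count(b"\x00")
            dst.write(chunk.replace(b"\x00", b""))
    return removed


def count_nul_bytes(path, chunk_size=1 << 20):
    """Count 0x00 bytes in a file, to verify that a cleaned file is really clean."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\x00")
    return total
```

Running count_nul_bytes on the output before handing it to fread() shows whether any nul survived. If the count on the original file is roughly half the file size, the file may be UTF-16 rather than a single-byte encoding, in which case stripping nuls is the wrong fix and the file should be re-encoded instead.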

Related

Encoding issue in Postgres ERROR "UTF8" is it best to set encoding to UTF8 or to make the data WIN1252 compatible?

I created a table by importing a CSV file exported from an Excel spreadsheet. When I try to run the select statement below, I get this error.
test=# SELECT * FROM dt_master;
ERROR: character with byte sequence 0xc2 0x9d in encoding "UTF8" has no equivalent in encoding "WIN1252"
I have read the solution posted in this stack overflow post and was able to overcome the issue by setting the encoding to UTF8, so up to that point I am still able to keep working with the data. My question, however, is whether setting the encoding to UTF8 actually solves the problem, or whether it is just a workaround that will create other problems down the road, in which case I would be better off removing the conflicting characters and making the data WIN1252 compliant.
Thank you
You have a weird character in your database (Unicode code point 9D, a control character) that probably got there by mistake.
You have to set the client encoding to the encoding that your application expects; no other value will produce correct results, even if you get rid of the error. The error has a reason.
You have two choices:
Fix the data in the database. The character is very likely not what was intended.
Change the application to use LATIN1 or (better) UTF-8 internally and set the client encoding appropriately.
Using UTF-8 everywhere would have the advantage that you are safe from this kind of problem.
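The byte sequence in the error message can be inspected outside the database. The sketch below uses Python's own codecs, not PostgreSQL's conversion tables, so it is an illustration under that assumption; the helper name fits_in is hypothetical.

```python
# The two bytes from the error message, decoded as UTF-8.
raw = b"\xc2\x9d"
text = raw.decode("utf-8")
code_point = f"U+{ord(text):04X}"  # the control character the answer mentions


def fits_in(value, encoding):
    """Return True if every character of value can be encoded in the given encoding."""
    try:
        value.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(code_point)
print("cp1252:", fits_in(text, "cp1252"))
print("latin-1:", fits_in(text, "latin-1"))
print("utf-8:", fits_in(text, "utf-8"))
```

In Python's cp1252 codec the position 0x9D is unassigned, which mirrors the "no equivalent in encoding WIN1252" complaint. The same helper could be run over exported column values to locate the rows that hold the stray character before fixing the data.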

Why is a blank space missing when I get a string from Redis?

For example:
When I get the string from the command line, there is a blank space, \x00, at the end:
127.0.0.1:6379> get "87102213_87102208"
"173275,3915125,10,\x00"
But when my code prints it to the log, the blank space is missing. Do you know why?
log_error("reply->str:%s,reply->len:%d",reply->str,reply->len);
reply->str:173275,3915125,10,,reply->len:19
Well, \x00 is not a blank space; it is a nul char, which in C also happens to be the string terminator character. Most C APIs that take strings as parameters treat the string as ending at the first \x00 character. This probably includes this log_error function.
However, Redis is binary safe, and all characters are meaningful, including the nul char. When this value was inserted, the size passed was probably wrong, so the nul char terminating the string was stored as well.
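The difference between the stored length and what a C-string view shows can be reproduced with the value from the question. This is a small illustration using ctypes, not the hiredis API; the variable names are made up for the example.

```python
import ctypes

# The value exactly as redis-cli displayed it, trailing nul included.
value = b"173275,3915125,10,\x00"

# Binary-safe length: every byte counts, which matches reply->len in the log.
full_length = len(value)

# A C-string view stops at the first nul, which is what a %s format would print.
c_view = ctypes.c_char_p(value).value

print(full_length)
print(c_view, len(c_view))
```

The length reported for the full value is one more than the length of the C-string view, and the extra byte is the nul that %s never prints.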

Illegal XML character when importing into SQL (Mac Roman)

I have an XML file that says its encoding is UTF-8. When I use OPENXML to import the data into SQL, I always get "XML parsing: line xxxxxx, character xx, illegal xml character".
Right now I go to each line and replace the offending character with a legal one, and then the import goes well. Sometimes there may be more than five Mac Roman characters, and replacing them becomes tedious. I am currently using Notepad++, and there is probably a way to do this there.
Can anyone suggest whether anything can be done at the SQL level, or does the file have to be checked before it is run in SQL?
So far, most of the characters found are x95, x92, x96, xbc, xbd, xbo.
Thanks.
In your question, you did not specify whether the illegal characters you had to remove were Unicode or not, or whether the file was really expected to contain UTF-8 characters. Unlike ASCII, UTF-8 has byte combinations that are illegal, so if you declare the text file to be encoded in UTF-8, you might not be able to read it successfully to the end (such a thing could never happen with ASCII).
So it is possible that by removing <?xml version="1.0" encoding="UTF-8"?> you effectively declared some non-Unicode encoding for your file (instead of the previously declared UTF-8), so reading the data succeeded. You did not have many foreign characters like ľťčý in the file, did you? Normally, you must check what happened to those after the import. It can happen that your import passes without error, but the city name Čadca becomes äadca and somebody will thank your company for rendering his address unreadable.
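A check before the import can make the point about illegal UTF-8 byte combinations concrete. The sketch below tests whether the bytes are valid UTF-8 and, if not, re-encodes them from a single-byte encoding. The source encoding is an assumption: "cp1252" is used because bytes such as 0x92, 0x95 and 0x96 are typographic punctuation there, but "mac_roman" can be passed instead if the file really is Mac Roman. The function names and the sample bytes are hypothetical.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def recode_to_utf8(data, source_encoding="cp1252"):
    """Reinterpret bytes from an assumed single-byte encoding and re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Made-up sample containing three of the bytes listed in the question.
sample = b"It\x92s \x96 ok \x95"

print(is_valid_utf8(sample))
fixed = recode_to_utf8(sample)
print(is_valid_utf8(fixed))
print(fixed.decode("utf-8"))
```

Whichever source encoding is chosen, the accented characters should be inspected after conversion, since a wrong guess converts without error but produces the wrong letters.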

Convert a file to Binary or Hexadecimal

So I have a file that I need to have in either binary or hex format. Everything that I've been able to find basically says to store the text in a string and convert it to binary or hex from there, but I can't do it this way. The file was written using its own private character set that uses null and system hex codes, so Notepad doesn't know what to do with these characters and replaces them with wrong characters and spaces. This distorts the information, so it won't be correct if I try to convert it to binary/hex.
I really just need to have the binary/hex information stored in a string or text box so I can work with it. I don't really need it to be saved as a file.
Never mind, I finally figured it out. I used a file stream to read the data byte by byte. At first I didn't understand how to convert it, because the first byte in the array was showing as 80 when I knew the binary data should have been "1010000" (I didn't realize at the time that 80 was the decimal format).
Anyway, I used BitConverter.ToString, which put everything together and converted it to hexadecimal format. So I'm all good now.
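The same idea, sketched in Python rather than the .NET code the poster used: read the file as raw bytes, never as text, and format each byte. The function name file_to_hex and the hyphen separator are choices made for this example, the separator mimicking the hyphenated style of BitConverter.ToString.

```python
def file_to_hex(path, sep="-"):
    """Read a file as raw bytes and return them as uppercase hex pairs."""
    with open(path, "rb") as f:  # binary mode: no character-set decoding happens
        data = f.read()
    return sep.join(f"{byte:02X}" for byte in data)


# The byte value that caused the confusion: one number, three notations.
first_byte = 80
as_binary = format(first_byte, "b")
as_hex = format(first_byte, "02X")
print(first_byte, as_binary, as_hex)
```

Because the file is opened in binary mode, nul bytes and other control codes pass through untouched instead of being replaced the way a text editor would replace them.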

Specific symbol and the Strings file

So I have a symbol, π, in the strings file, and it turns out that because of it the build fails with a fatal error:
Copy EN.strings
Command /Developer/Library/Xcode/Plug-ins/CoreBuildTasks.xcplugin/Contents/Resources/copystrings failed with exit code 1
If I remove π it's fine. The strange thing is that even if I put π in the comment it still won't compile.
What should I do?
Thanks
If you can find the Unicode value of the character, you could escape it in the following manner:
NSString *str = @"\u00F6";
And Java (just for comparison):
String str = "\u00F6";
Although I'd imagine that the compile issue relates to the character being from a different encoding to the specified encoding of your source file. I believe the compiler will interpret your source as UTF-8 by default.
Make sure your strings file is using a Unicode encoding, and make sure the string is quoted; this has solved the issue for me in the past.
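Finding the Unicode value for the escape can be automated. The helper below is a hypothetical sketch that rewrites non-ASCII characters in the \uXXXX form shown in the answer for source-code string literals; whether the strings file itself accepts that exact escape syntax is not confirmed by the thread.

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with a C-style universal character escape."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)                # plain ASCII stays as-is
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")    # four hex digits for the Basic Multilingual Plane
        else:
            parts.append(f"\\U{cp:08X}")    # eight hex digits, C/Objective-C form only
    return "".join(parts)


print(escape_non_ascii("π"))
print(escape_non_ascii("area = π r"))
```

The eight-digit \U form is valid in C and Objective-C literals but not in Java, so for Java sources only the four-digit branch applies.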