I set the encoding to UTF-8, but it didn't work.
Then I tried changing it to windows-1251 and back to UTF-8; it works for only one compilation, then the same error comes back.
In Settings -> File Encodings I set the default encoding to UTF-8, but it didn't help.
Any thoughts?
Thanks
I am comparing two CSV files to each other to produce a final file with the gathered differences, and it's giving me an error message. I have re-saved all files as CSV encoded with UTF-8 and tried running again; it does not work. Can someone help me?
The problem is that your file is not in UTF-8 format. Many tools will refuse to handle data that is claimed to be UTF-8, but isn’t. I’d check first if that file is actually UTF-8 or is stored in some different encoding.
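A quick way to check is to run the file's bytes through a strict UTF-8 decoder and see whether it rejects anything. A minimal Java sketch (the class name is made up; the path comes from the command line):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        try {
            // REPORT makes the decoder throw on malformed input instead of
            // silently substituting the replacement character.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            System.out.println("File decodes cleanly as UTF-8.");
        } catch (CharacterCodingException e) {
            System.out.println("File is NOT valid UTF-8: " + e);
        }
    }
}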
In a pre-build event, a batch file is executed to combine multiple SQL files into a single one.
It is done using this command:
COPY %#ProjectDir%\Migrations\*.sql %#ProjectDir%ContinuousDeployment\AllFilesMergedTogether.sql
Everything appears to work fine, but the resulting file somehow produces an 'incorrect syntax' error.
After two hours of investigation, it turned out the issue was caused by an invisible character that remains invisible even in Notepad++.
Using an online website, the character was spotted: it is U+FEFF (the byte order mark), as shown in the following image.
Here are the two input scripts.
PRINT 'Script1'
PRINT 'Script2'
Here is the output given by the copy command (the second line begins with an invisible U+FEFF):
PRINT 'Script1'
PRINT 'Script2'
Additional info:
Batch file is encoded with UTF-8
Input files are encoded with UTF-8-BOM
Output file is encoded with UTF-8-BOM.
I'm not sure it is possible to change the output encoding of the copy command; I've tried and failed.
What should be done to eradicate this extremely frustrating parasitic character?
It has turned out that changing the encoding of the input files to ANSI fixes the issue.
No more pesky character(s).
Also, doing so changes the encoding of the result file to UTF-8 instead of UTF-8-BOM, which I believe is great.
Encoding can be changed using Notepad++, as shown in the following picture.
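If ANSI is not an option (for example, if the scripts contain characters outside your ANSI code page), another approach is to do the merge yourself and strip the BOM from the start of each input file. A minimal Java sketch of the idea (folder and file names are assumptions):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MergeSqlFiles {
    // The UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF.
    private static final byte[] BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("AllFilesMergedTogether.sql");
        try (OutputStream os = Files.newOutputStream(out);
             DirectoryStream<Path> files =
                     Files.newDirectoryStream(Paths.get("Migrations"), "*.sql")) {
            // Note: directory order is not guaranteed; sort if script order matters.
            for (Path p : files) {
                byte[] bytes = Files.readAllBytes(p);
                // Skip the BOM so it cannot end up in the middle of the merged file.
                int start = hasBom(bytes) ? BOM.length : 0;
                os.write(bytes, start, bytes.length - start);
                os.write('\n'); // keep scripts on separate lines
            }
        }
    }

    private static boolean hasBom(byte[] b) {
        return b.length >= 3 && b[0] == BOM[0] && b[1] == BOM[1] && b[2] == BOM[2];
    }
}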
I am doing a simple text file input into Kettle (Pentaho PDI 8.1.0). The file has several accented characters like "á" and it is a .csv file.
In the settings of the Text file input step I set the encoding to ISO-8859-1, and when I use the "Show file content" button everything is correct.
But when I press "Preview rows" to see the data separated into columns, I get errors on all accented characters: they are replaced with ?, so Mária becomes M�ria.
By "error" I do not mean that Kettle fails to run the transformation, but that the data are not correct.
Any idea?
Your file is obviously not encoded in ISO-8859-1.
The Encoding field in the Content tab of 'Text file input' is used by the "Preview rows" button but not by the "Show file content" button.
Try another encoding.
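If you are not sure which encoding the file really uses, one low-tech check is to decode a sample of its bytes with several candidate charsets and see which one renders names like Mária correctly. A small Java sketch (the file name and the candidate list are assumptions):

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class GuessEncoding {
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("input.csv"));
        // The first couple of hundred bytes are usually enough to eyeball.
        byte[] sample = Arrays.copyOf(bytes, Math.min(bytes.length, 200));
        for (String name : new String[] {
                "UTF-8", "ISO-8859-1", "ISO-8859-2", "windows-1250", "windows-1252" }) {
            // Decode the same bytes under each candidate and compare the output.
            System.out.println(name + " -> " + new String(sample, Charset.forName(name)));
        }
    }
}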
Try the cp866 encoding; hope it helps. You could also try latin-1.
I'm generating some RDF files with Jena. The whole application works with utf-8 text. The source code as well is stored in utf-8.
When I print a string containing non-English characters on the console, I get the right format, e.g. Est un lieu généralement officielle assis....
Then, I use the RDF writer to output the file:
Model m = loadMyModelWithMultipleLanguages();
log.info( getSomeStringFromModel(m) );          // log4j, correct output
RDFWriter w = m.getWriter( "RDF/XML" );         // default enc: utf-8
w.setProperty( "showXmlDeclaration", "true" );  // optional
OutputStream out = new FileOutputStream( pathToFile );
w.write( m, out, "http://someurl.org/base/" );
// file contains garbled text
The RDF file starts with: <?xml version="1.0"?>. If I add encoding="utf-8" to the declaration, nothing changes.
By default the text should be encoded to utf-8.
The resulting RDF file validates OK, but when I open it with any editor/visualiser (vim, Firefox, etc.), non-English text is all messed up: Est un lieu g√©n√©ralement officielle assis ... or Est un lieu g\u221A\u00A9n\u221A\u00A9ralement officielle assis....
(Either way, this is obviously not acceptable from the user's viewpoint).
The same issue happens with any output format supported by Jena (RDF, NT, etc.).
I can't really find a logical explanation to this.
The official documentation doesn't seem to address this issue.
Any hint or tests I can run to figure it out?
My guess would be that your strings are messed up, and your printStringFromModel() method just happens to output them in a way that accidentally makes them display correctly, but it's rather hard to say without more information.
You're instructing Jena to include an XML declaration in the RDF/XML file, but you don't say what encoding (if any) Jena declares there. This would be helpful to know.
You're also not showing how you're printing the strings in the printStringFromModel() method.
Also, in Firefox, go to the View menu and then to Character Encoding. What encoding is selected? If it's not UTF-8, then what happens when you select UTF-8? Do you get it to show things correctly when selecting some other encoding?
Edit: The snippet you show in your post looks fine and should work. My best guess is that the code that reads your source strings into a Jena model is broken, and reads the UTF-8 source as ISO-8859-1 or something similar. You should be able to confirm or disconfirm that by checking the length() of one of the offending strings: If each of the troublesome characters like é are counted as two, then the error is on reading; if it's correctly counted as one, then it's on writing.
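A minimal illustration of that length check (the string is just an example):

// If UTF-8 bytes are mis-read as ISO-8859-1, each é becomes two characters ("Ã©").
String good = "généralement";
String bad  = new String(good.getBytes(java.nio.charset.StandardCharsets.UTF_8),
                         java.nio.charset.StandardCharsets.ISO_8859_1);
System.out.println(good.length()); // 12 -- read correctly
System.out.println(bad.length());  // 14 -- read with the wrong charset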
My hint/answer would be to inspect the byte sequence in 3 places:
The data source. Using a hex editor, confirm that the é character in your source data is represented by the expected UTF-8 byte sequence 0xC3 0xA9.
In memory. Right after your call to printStringFromModel, put a breakpoint and inspect the bytes in the string (or convert them to hex and print them out).
The output file. Again, use a hex editor to confirm that the byte sequence is 0xC3 0xA9.
This will tell you exactly what is happening to the bytes as they travel along the path of your program, and also where they deviate from the expected 0xC3 0xA9.
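For the in-memory and output checks, a quick way to get the hex without a debugger is to dump the UTF-8 bytes of the suspect string (a sketch; the string is just an example):

import java.nio.charset.StandardCharsets;

public class HexDump {
    public static void main(String[] args) {
        String s = "généralement"; // replace with one of the offending strings
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02x ", b)); // two hex digits per byte
        }
        System.out.println(hex); // a correctly stored é prints as "c3 a9"
    }
}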
The best way to address this would be to package up the smallest unit of your code that you can that demonstrates the issue, and submit a complete, runnable test case as a ticket on the Jena Jira.
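Something along these lines could serve as that minimal test case (a sketch assuming a current Apache Jena; the resource URI and label are made up):

import java.io.FileOutputStream;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDFS;

public class JenaUtf8Test {
    public static void main(String[] args) throws Exception {
        Model m = ModelFactory.createDefaultModel();
        Resource r = m.createResource("http://someurl.org/base/place1");
        // A literal with accented characters; if it survives this round trip,
        // the writer is fine and the problem is on the reading side.
        r.addProperty(RDFS.label, "Est un lieu généralement officielle assis", "fr");
        try (FileOutputStream out = new FileOutputStream("test.rdf")) {
            m.write(out, "RDF/XML"); // writing to an OutputStream defaults to UTF-8
        }
    }
}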
We are using NSXMLParser in Objective-C to parse our XML documents, which are all UTF-8 encoded. One document has the string "Nestlé" in it (as in ...<title>Nestlé Novelties</title>...). The parser just quits, reporting an error with error code = 9, due to the accented letter "é" at the end of the word "Nestlé". Furthermore, we tried using IE, Chrome, and Safari to show the same document directly; they reported a similar encoding error.
We use UTF-8 for all incoming XML documents, which means that all of them have "<?xml version="1.0" encoding="UTF-8" ?>" at the top of the document.
Is this an encoding problem? If so, how do we solve this? What encoding should we use for all of our XML documents? Thanks in advance!
Barclay
Have you checked the file with a hex editor to verify that the "é" is indeed UTF-8, i.e. 0xC3 0xA9?
In HTML, I would use Nestl&eacute;. Does that work for your application?
Something I saw just now in an example XML file: a string containing user-defined input (which happened to include é characters) was wrapped in a CDATA section inside the containing tag, e.g. <title><![CDATA[Nestlé Novelties]]></title>. This has the effect of making the parser treat the contents as literal character data rather than markup.