Hi there,
I noticed that when I generate a Pex test solution, the default encoding of the generated files is UCS-2 Little Endian. This is a problem, because all the rest of the files are normally encoded as Windows ANSI.
(I'm getting this info from Notepad++, and it's confirmed by my CI breaking.)
Does anyone know:
1) Why is it using this encoding?
2) How can I change it so that it uses Windows ANSI by default, like the rest of the files?
NOTE: I know this is the issue because I saved the file with Windows ANSI encoding and it all works.
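That manual re-save can also be scripted; here is a minimal sketch of the idea in Perl, assuming the generated file really is UTF-16LE with a BOM (the file name is just for illustration):
use strict;
use warnings;
use Encode qw(decode encode);

my $file = 'GeneratedTest.cs';   # hypothetical file name

# Read the raw bytes and decode from UTF-16LE (what Notepad++ shows as UCS-2 LE).
open my $in, '<:raw', $file or die "open: $!";
my $text = decode('UTF-16LE', do { local $/; <$in> });
close $in;
$text =~ s/^\x{FEFF}//;          # drop the byte-order mark if present

# Write the file back out as Windows-1252 (ANSI).
open my $out, '>:raw', $file or die "open: $!";
print $out encode('cp1252', $text);
close $out;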
I know I probably shouldn't have, but I went and posted this same question on the Pex forum:
link to the question
and this was an answer from Peli (he is heavily involved in the Pex project, AFAIK).
Copy of the answer:
1) why is it using this encoding?
There is no particular reason for this, besides that we decided to use this particular encoding. We will switch to Windows-1252 (ANSI) encoding for source files in the future. XML files will still be encoded as UTF-8.
2) how to change it so by default it uses Windows ANSI like the rest of the files
Unfortunately, this is hard-coded in Pex and you cannot change this. The next release of Pex (0.93) will use ANSI.
Related
I just read
How to parse a OFX (Version 1.0.2) file in PHP?
I am not a developer. What easy tool can I use to make this code run without coding skills or inclination? The web browser is pretty hard to use for non-dev folks.
I need this in order to use the file in Power BI, which accepts M code, a JSON source, or XML, but not SGML, OFX, or PHP.
Thanks in advance
Welcome to Stack Overflow, Didier!
I'm going to try to give you an idea of how I'd approach the problem here. But keep in mind that your question really lacks the details we would need to help you, so I'd ask you to update your question with example data that you want to integrate into Power BI. Also, I'm not too familiar with Power BI or PHP, and won't go into making the PHP code you linked run for you.
Rather, I'd suggest converting your OFX file into XML, and then using Power BI's XML import on that converted file.
From your linked question, I gather that your OFX file is in SGML format. There's a program specifically designed to convert SGML into XML (which is just a restricted form of SGML) called osx. I've detailed how to install it on Linux and Mac OS in another question related to SGML-to-XML down-conversion; if you're on Windows, you may have luck just downloading a really ancient (32-bit) version of it from ftp://ftp.jclark.com/pub/sp/win32/sp1_3_4.zip. Alternatively, you can use my sgmljs.net software as explained in Converting HTML to XML, though that tutorial is really about the much more complex task of converting HTML to XML/XHTML and will probably confuse you.
Anyway, if you manage to install osx, running it on your OFX file (which I'll assume is named yourfile.ofx just for illustration) is just a matter of invoking, on the Windows or Linux/Mac OS command line:
osx yourfile.ofx > yourfile.xml
This results in yourfile.xml, which you can attempt to load with Power BI.
Chances are your OFX file has additional text at the beginning (lines like XYZ:0001 that come before <ofx>). In that case, you can just remove those lines using a text editor before invoking osx on it. Maybe you also need a .dtd file or additional instructions at the top of the OFX file telling the SGML processor about the grammar of your file; it's really difficult to say without seeing actual test data.
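For reference, a typical OFX 1.x file starts with a header block roughly like this (the exact values vary by bank); everything before <OFX> is the part you'd strip:
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE

<OFX>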
Before bothering with SGML and all that, however, I suggest removing those first few lines of your OFX file (everything up to the first < character) and checking whether Power BI can already recognize the changed input file as XML (which, judging from other OFX example files, has a good chance of succeeding). Be sure to work on a copy of your original file rather than overwriting it. Then come back and update your question with your results and example data.
I have a set of SQL files which contain French, Spanish, and other language characters. On Windows, we are able to see the language-specific characters, but when the files are transferred to Linux, I see garbled characters.
I understand that Windows uses character sets like Windows-1252, Windows-1258, and ISO-8859-1.
How can we set up a charset on Linux that matches the one used on Windows, so that we don't insert garbled characters into the DB when running the queries from Linux?
Thanks in advance.
If I'm understanding the problem correctly, you have SQL scripts, produced in a variety of Windows encodings, that include non-ASCII characters, and you want to execute these scripts on Linux.
I would think you'd want to losslessly convert the files to something that your Linux SQL parser can handle, probably Unicode UTF-8. This sort of conversion can be done with iconv (a command-line utility; I believe there are libraries as well).
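For example, to convert a script from Windows-1252 to UTF-8 (the encoding and file names here are just for illustration; run iconv -l to see the names your iconv supports):
iconv -f WINDOWS-1252 -t UTF-8 script.sql > script.utf8.sql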
A challenge, though, is whether or not you know each file's original encoding, as this cannot necessarily be detected automatically; it might be better if you can get the script files' authors to provide the scripts with a specified encoding.
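On Linux, the file utility can make a rough guess, though it is only a heuristic and cannot reliably tell single-byte code pages apart (the file name is just for illustration):
file -i script.sql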
In windows, we are able to see the specific language characters
You can open it in Notepad++ to see what encoding the file is using, and you can also convert it to UTF-8 there.
You will want to use the Encode or utf8 Perl modules.
Normally for SQL or MySQL you will set the DB encoding to whatever you prefer to work with. These days most people set it to UTF-8 to support a large range of character sets.
But in this case you can play around with the encoding until you find the one that matches. Something like this could work:
use Encode qw(decode encode);

# Reinterpret the raw bytes as ISO-8859-1, then re-encode them as UTF-8.
$data = encode("utf8", decode("iso-8859-1", $data));
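The same idea applied to a whole file rather than a single variable, as a one-liner (assuming the input really is ISO-8859-1; the file names are just for illustration):
perl -MEncode -pe '$_ = encode("utf8", decode("iso-8859-1", $_))' input.sql > output.sql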
So I have some Spanish content saved in Excel that I am exporting to .csv format so I can import it, via the Firefox SQL manager add-on, into a .sql db. The problem is that when I import it, whenever there is an accent mark (or whatever the technical name for those things is), Firefox doesn't recognize it and accordingly produces a big black diamond with a white ?. Is there a better way to do this? Is there something I can do to have my Spanish content readable in a SQL db? Maybe a program preferable to the Firefox extension? Please let me know if you have any thoughts or ideas. Thanks!
You need to follow the chain and make sure nothing gets lost "in translation".
Specifically:
assert which encoding is used in the CSV file; ensure that the special characters are effectively in there, and see how they are encoded (UTF-8, a particular code page, ...)
ensure that the SQL server can
a) read these characters, and
b) store them in an encoding which will preserve their integrity (BTW, the encoding used in the CSV can of course be remapped to some other encoding of your choosing, i.e. one that you know will be suitable for consumption by your target application; see the sketch after this list)
ensure that the database has effectively stored these characters correctly.
see whether Firefox (or whatever else consumes this text) properly handles characters in this particular encoding.
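As a minimal sketch of that remapping step in Perl, assuming the CSV came out of Excel as Windows-1252 (the file names are just for illustration):
use strict;
use warnings;
use Encode qw(decode encode);

open my $in,  '<:raw', 'content.csv'      or die "open: $!";
open my $out, '>:raw', 'content-utf8.csv' or die "open: $!";
while (my $line = <$in>) {
    # Reinterpret each line as Windows-1252 and write it back out as UTF-8.
    print $out encode('UTF-8', decode('cp1252', $line));
}
close $in;
close $out;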
It is commonplace, but useful, for this type of inquiry to recommend the following reading assignment:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
What I want to do is mount some files from the mainframe, via USS and sshfs, on my local PC. I can do that, but sshfs doesn't do the conversion from EBCDIC to ASCII/Unicode out of the box. Are there any flags I can set?
Alternatively, does anybody know of a library that does EBCDIC-to-ASCII conversion, so I can add it to sshfs?
Cheers
Mark
Be aware though that transparent charset conversion is a very dangerous game. Are you absolutely sure that you will never read anything but EBCDIC files via SSHFS? What if there is binary data?
Some systems used transparent conversions in the past:
the infamous "ASCII mode" of FTP, which messed up many binary downloads
the vfat filesystem in Linux, which notes: "Programs that do computed lseeks won't like in-kernel text conversion. Several people have had their data ruined by this translation. Beware!"
So I'd strongly advise you to be aware of the consequences.
Why not use an editor that can handle EBCDIC? Vim, for example, can do it (if the support is compiled in).
There are several libraries for character set conversion: iconv (normally part of your C library; see for example iconv_open) and GNU recode come to mind.
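For a quick command-line test (the exact encoding names vary between iconv implementations, so check iconv -l; the file names are just for illustration):
iconv -f EBCDIC-US -t ASCII mainframe.txt > mainframe.ascii.txt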
I know a lot of time has passed since the original question, but I'll leave the info here:
I've written a patch for sshfs which adds automatic conversion between ASCII and EBCDIC. It can be found here: https://github.com/vadimshchukin/sshfs-ebcdic
The patch adds a "-t" command-line option which takes a regular expression matching the files that should be converted. For example, sshfs -t".*" enables conversion for all files.
I had to hard-code the conversion table, since there are various "flavours" of EBCDIC and iconv didn't translate the text between ASCII and EBCDIC on my system as needed. The advantage here is that anyone can easily change that translation table as needed.
By the way, I wrote the same patch for win-sshfs.
In one sentence I have managed to create 16 possible variations on how I present information. Does it matter, as long as the context is clear? Do any common mistakes irritate you?
Regarding Perl: How should I capitalize Perl?
TIFF stands for Tagged Image File Format, whereas the extension of files using that format is often ".tif".
That is for compatibility with 8.3 filenames, I believe.
I generally like the Perl way of capitalizing when used as a proper noun, but lowercasing when referring to the command itself (assuming the command is lowercase to begin with).
Well, Perl and TIFF have already been answered, so I'll add the last two:
the Apache Foundation writes "Apache Ant".
Rational ClearCase (or sometimes "IBM Rational ClearCase") is written as such on its web site.
Even though Perl was originally an acronym for Practical Extraction and Report Language, it is written "Perl".
These things don't "bother" me so much as they provide insight into the speaker's or author's level of knowledge. You see, we work in an industry that requires precision, so precision in language does matter, as it affects the consumer's understanding.
The one that really bothers me is when people write JAVA in all caps, as though it were an acronym.