Create file with UTF-16LE encoding - kotlin

I am trying to create a tab separate CSV via kotlin.
For the requirement we need to have UTF-16LE for the produced file encoding.
My stripped down code is something like this:
import java.io.File
import java.io.FileOutputStream
import java.io.OutputStreamWriter
fun main(args: Array<String>) {
val fileOutputStream = FileOutputStream(File("bla.csv"))
val writer = OutputStreamWriter(fileOutputStream, Charsets.UTF_16LE)
writer.write("bla\tbla\tbla")
writer.write("\n")
writer.write("lab\tlab\tlab")
writer.flush()
writer.close()
}
So after executing this program the file generated has this information:
(I am running file on the actual file)
file -I bla.csv
bla.csv: application/octet-stream; charset=binary
This is what I get when I go for
Charsets.UTF_16LE
I have tried using other UTF-16 variation which made me even more confuse!
So if I use Charsets.UTF_16 it will result in:
file -I bla.csv
bla.csv: text/plain; charset=utf-16be
And if I use Charsets.UTF_16BE it will result in:
file -I bla.csv
bla.csv: application/octet-stream; charset=binary
So after a lot of self doubt and being sure that I am doing something wrong I have give up and come here.
Any guidance will be appreciated. Thanks in advance

I suspect this is a limitation of the file command, and not a problem with your code (which is fine*).
If you write a Byte Order Mark (\uFEFF) as the first character of the file, then file recognises it fine:
> file bla.csv
bla.csv: Little-endian UTF-16 Unicode text
> file -I bla.csv
bla.csv: text/plain; charset=utf-16le
The file should be perfectly valid without a BOM, though.  So I'm not sure why file isn't recognising it.  It may be that it's not always possible to safely identify UTF16-LE without a BOM, though you'd think a case like this (where every other byte is 0) would be a safe bet!
(* Well, there are always potential improvements…  For example, it'd be safer to wrap the output in a call to writer.use() instead of closing the file manually.  You could wrap the OutputStreamWriter in a BufferedWriter for efficiency.  And in production code, you'd want some error handling, of course.  But none of that's related to the question!)

Related

Kotlin prints non-English characters as question marks

I am trying to print Hebrew characters from a Kotlin program (running on the console).
All the Hebrew characters are being output as question marks.
I created the following simple test.kts script file for testing:
println("שלום מקוטלין")
// Try to print a simple non-Hebrew character too
println("\u0394") // Greek Delta
The file is properly saved in UTF-8 format.
It prints:
???? ???????
?
I tried running it in Command Prompt, PowerShell (both in its native window and in Windows Terminal), and Git Bash, all of which give the same result. I also tried redirecting the output to a file to rule out display issues in the shells.
To make sure the problem isn't the console itself, I also made simple test.bat, test.ps1, and test.sh files with the following content:
echo "שלום מקוטלין"
All three shells correctly displayed the Hebrew text here, indicating that the problem is in Kotlin's output, not in the shell display. (Though PowerShell requires the file to be saved "UTF-8 with BOM" to display properly, this can't be the issue with Kotlin since Kotlin won't even run a script that is saved with a BOM.)
As far as I can tell, Kotlin should support UTF-8 output by default with no configuration needed.
How can I get the proper output?
Updates:
If I write the output to a file using java.io.File("out.txt").writeText("שלום מקוטלין"), it works properly.
Also, if I open a new PrintStream using val out = java.io.PrintStream(System.out, true, "UTF-8") and then write to it using out.println("שלום מקוטלין"), that works properly too.
Only writing to the console with println is broken.
System info:
Windows 10 2004 (Build 19041.450)
Kotlin 1.4.0 (downloaded from GitHub Releases)
Tested with JAVA_HOME pointing to both JRE 1.8.0_261 (Oracle) and 11.0.2 (Oracle OpenJDK).
(Update at bottom)
Partial answer, but was able to get some Hebrew characters in the console in both Kotlin and Java. Was verry painful. Included some commented out stuff to show you some other things I may have tried if you run into any other hurdles.
Saved Tester.kt as UTF-8 with Notepad.
fun main(args : Array<String>) {
System.setProperty("file.encoding", "UTF8")
//val charset = Charsets.UTF_8
//val byteArray = "שלום מקוטלין".toByteArray(charset)
//System.out.printf("%c",byteArray.toString(charset))
//System.out.println(Charset.defaultCharset())
System.out.println("ל")
}
kotlinc.bat .\Tester.kt -include-runtime -d Tester.jar
Now, this leads to another mess, which I discovered by trying to copy and paste Hebrew characters to Powershell/Cmd. When copying, the ? marks showed right off the bat. Dug around a little bit, seems Powershell ISE is better suited for this (reference below). Without any plugins, copy and pasted successfully. Then had to run this:
PS> [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
Because on my system, running the following showed:
PS> [Console]::OutputEncoding
IsSingleByte : True
BodyName : iso-8859-1
EncodingName : Western European (Windows)
HeaderName : Windows-1252
WebName : Windows-1252
WindowsCodePage : 1252
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : True
CodePage : 1252
Then,
java -jar -D"file.encoding=UTF-8" tester.jar
and voila, a single Lamedh
ל
Also, the Java route, which may or may not bring more insights:
Tester.java saved as UTF-8 with Notepad, imports redundant, yes, but shows some standout imports
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import static java.nio.charset.StandardCharsets.*;
import java.nio.*;
public class Tester{
public static void main(String[] args){
String str1 = "שלום מקוטלין";
byte[] ptext = str1.getBytes(UTF_8);
String value = new String(ptext, UTF_8);
ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode("ש");
System.out.println(Charset.defaultCharset());
System.out.println("שלום מקוטלין");
System.out.println(value);
System.out.print(byteBuffer.getChar());
System.out.printf("Value: %s",value);
}
}
javac would give:
javac .\Tester.java
.\Tester.java:8: error: unmappable character (0x9D) for encoding windows-1252
System.out.println("╫⌐╫£╫ò╫? ╫₧╫º╫ò╫ÿ╫£╫Ö╫ƒ");
So
javac -encoding UTF-8 .\Tester.java
and voila again, PS ISE only:
PS> java -D"file.encoding=UFT-8" Tester
UTF-8
שלום מקוטלין
שלום מקוטלין
힩Value: שלום מקוטלין
I think this shows there are several hurdles, but it can work with Kotlin, and with println after making sure the file is correct, running the file the right way, and the output is correct. Hebrew may be particularly difficult due to the right-to-left nature, other characters like Greek were easier I think.
No matter what, I feel your pain, good luck. From what I read, there may be other bottlenecks like sending Hebrew over a network. This opened my eyes to several things, will continue to learn about this myself.
(Update)
Using the second link in the reference actually provided before, you can make two small changes and get Hebrew in Powershell (not just ISE)!!
PS> $OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
Then,
Font: Courier New
References:
https://markw.dev/unicode_powershell/
Displaying Unicode in Powershell
https://community.idera.com/database-tools/powershell/ask_the_experts/f/learn_powershell_from_don_jones-24/11793/add-hebrew-to-powershell
https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
I want to display Greek unicode characters but i get "?" instead on ouput
Encode String to UTF-8

How to open and read a .gz file in Nim (preferably line by line)

I just sat down to write my first Nim script to parse a .vcf (Variant Call Format) file. This file format stores genetic mutations from sequencing data.
For scripting languages, I 'grew up' on Perl and later migrated to Python, but I would love to use a language with the speed that Nim offers. I realize Nim is still young, but I couldn't even find a clear example for how to open and read a .gz (gzip) file (preferably line by line).
Can anyone provide a simple example to open and read a gzip file using Nim, line by line?
In Python, I'm accustomed to the following (uber-simple) code:
import gzip
my_file = gzip.open('my_file.vcf.gz', 'w')
for line in my_file:
# do something
my_file.close()
I have seen related questions, but they're not clear. The posts are also relatively old and I hope/suspect something better has come about. Here's what I've found:
Read gzip-compressed file line by line
File, FileStream, and GZFileStream
Reading files from tar.gz archive in Nim
Really appreciate it.
P.S. I also think it would be useful if someone created a Nim tag in StackOverflow. I do not have the reputation to create tags.
Just in case you need to handle VCF rather than .gz, there's a nice wrapper for htslib written by Brent Pedersen:
https://github.com/brentp/hts-nim
You need to install the htslib in your system, and then require the library in your .nimble file with requires "hts", or install the library with nimble install hts. If you are going to do NGS analysis in Nim you'll need it.
The code you need:
import hts
var v:VCF
doAssert open(v, "myfile.vcf.gz")
# Here you have the VCF file loaded in v, and can access the headers through
# v.header property
for record in v:
# Here you get a Record object per line, e.g. extract the Ref and Alts:
echo v.REF, " ", v.ALT
v.close()
Be sure to follow the docs, because some things differ from python, specially when getting the INFO and FORMAT fields.
Checkout the whole Brent repo. It has plenty of wrappers, code samples and utilities to handle NGS problems (e.g. an ultrafast coverage tool utility called Mosdepth).
Per suggestion from Maurice Meyer, I looked at the tests for the Nim zip package. It turned out to be quite simple. This is my first Nim script, so my apologies if I didn't follow convention, etc.
import zip/gzipfiles # Import zip package
block:
let vcf = newGzFileStream("my_file.vcf.gz") # Open gzip file
defer: outFile.close() # Close file (like a 'final' statement in 'try' block)
var line: string # Declare line variable
# Loop over each line in the file
while not vcf.atEnd():
line = vcf.readLine()
# Cure disease with my VCF file
To install the zip package, I simply ran because it is already in the Nim package library:
> nimble refresh
> nimble install zip
I tried to use Nim some time ago to parse a fastq or fastq.gz file.
The code should be available here:
https://gitlab.pasteur.fr/bli/qaf_demux/blob/master/Nim/src/qaf_demux.nim
I don't remember exactly how this works, but apparently, I did an import zip/gzipfiles and used newGZFileStream on the input file name to obtain a Stream from which lines can be read using .readLine() in this piece of code:
proc fastqParser(stream: Stream): iterator(): Fastq =
result = iterator(): Fastq =
var
nameLine: string
nucLine: string
quaLine: string
while not stream.atEnd():
nameLine = stream.readLine()
nucLine = stream.readLine()
discard stream.readLine()
quaLine = stream.readLine()
yield [nameLine, nucLine, quaLine]
It is used in something that amounts to this piece of code:
let inputFqs = fastqParser(newGZFileStream($inFastqFilename))
Hopefully you can adapt this to your case.
My .nimble file has a requires "zip#head". I suppose this triggers the installation of zip/gzipfiles.

GIMP Script.Fu script to batch convert JPEG to PNG

Can someone give me the script I would need to run to batch convert many *.jpeg files to *.png in Script.Fu in GIMP?
Currently I am spending way too much time manually exporting every image and it's a waste of time.
I can't install anything right now so can't use alternative applications.
Alright, after a lot of trials and errors I finally figured out how to convert one file format to another using only GIMP.
This is the Script-Fu script for conversion to PNG:
(
let* ((filename "{{filename}}")
(output "{{output}}")
(image (car (gimp-file-load 1 filename filename)))
(drawable (car (gimp-image-get-active-layer image))))
(file-png-save-defaults 1 image drawable output output)
)
Where {{filename}} is input file that needs to be converted (a jpeg file, for example), {{output}} is the output file that you need (it can be simply the same file name but with PNG extension)
How to run it: it can probably be improved
gimp -i -n -f -d --batch "{{one-line script-fu}}"
More about command line options you can find in GIMP online documentation.
The place that needs to be changed is {{one-line script-fu}} and it has to be... one-line! You can probably do all of this in one file using cmd (in case if you use Windows), but for me it was easier to use Python, so here's the script for it:
import subprocess, os
def convert_to_png(file_dds):
#Loads the command to run gimp cli (second code block)
#Note: remove "{{one-line script-fu}}" and leave one space after the --batch
with open("gimp-convert.bat", "r") as f:
main_script = f.read()
#Prepares the Script-Fu script to be run, replacing necessary file names and makes it one-line (the firs code block)
with open("gimp-convert-png.fu", "r") as f:
script = f.read().replace("\n", " ").replace("{{filename}}", file_dds) \
.replace("{{output}}", file_dds[:-3]+"PNG").replace("\\", "\\\\").replace("\"", "\\\"")
subprocess.run(main_script + " \"" + script + "\" --batch \"(gimp-quit 1)\"",
cwd = os.getcwd(),
shell = True)
And you should get your file converted to PNG!
I needed this for my texture upscale project, all of the code below you can find here.
Tested with GIMP 2.10
The real solution is to use ImageMagicks convert, as simple as magick convert some.jpeg some.png. There must be a "portable" version somewhere that you can use off a USB key.
Otherwise with Gimp, a much less manual way that doesn't need for a new script, since it uses an existing script:
get/install ofn-export-layers
File>Open the first JPEG
File>Open as layers more Jpegs. You can select several/all jpegs in one call (actual number limited by available RAM mostly). Once this is done you have many Jpegs stacked in the same image
File>Export all layers, making sure the name pattern you use ends in .png (the doc that comes with the script explains how that works).

How to write to gzip file from elixir code

I would like to write gzip file from elixir code.
I tried to following code, but it doesn't work well.
io_device = File.open!("/path/to/file.gzip", [:write, :compressed])
IO.write(io_device, "test")
IO.write returns :ok, but, /path/to/file.gzip is empty.
How can I write to gzip file?
You can also do whole thing in one step:
File.write "/path/to/file.gzip", "test", [:compressed]
You need one more step: close the file so that any buffered data gets written:
File.close io_device

gzip several files and pipe them into one input

I have this program that takes one argument for the source file and then it parse it. I have several files gzipped that I would like to parse, but since it only takes one input, I'm wondering if there is a way to create one huge file using gzip and then pipe it into the only one input.
Use zcat - you can provide it with multiple input files, and it will de-gzip them and then concatenate them just like cat would. If your parser supports piped input into stdin, you can just pipe it directly; otherwise, you can just redirect the output to a file and then invoke your parser program on that file.
If the program actually expects a gzip'd file, then just pipe the output from zcat to gzip to recompress the combined file into a single gzip'd archive.
http://www.mkssoftware.com/docs/man1/zcat.1.asp