Writing lines to a binary file - raku

I'm further playing with Raku's CommaIDE and I wanna print a binary file line by line.
I've tried this, but it doesn't work:
for "G.txt".IO.lines -> $line {
say $_;
}
How shall I fix it ? It's obviously incorrect.
EDIT
this doesn't work either, see the snippet bellow
for "G.txt".IO.lines -> $line {
say $line;
}

You're showing us h.raku but Comma is giving you an error regarding c.raku, which is some other file in your Comma project.
It looks like you're working with a text file, not binary. Raku makes a clear distinction here: a text file is treated as text, regardless of encoding. If it's UTF-8, using .lines as you are now should work just fine because that's the default. If it's some other encoding, you can call .lines(:enc<some-other-encoding>). If it's truly binary, then the concept of "lines" really has no meaning, and you want something more like .slurp(:bin), which will give you a Buf[uint8] for working on the byte level.

The question specifically refers to reading a binary file, for which reading line-wise may (or may not) make sense--depending on the file.
Here's code to read a binary file straight from the docs (using class IO::CatHandle):
~$ raku -e '(my $f1 = "foo".IO).spurt: "A\nB\nC\n"; (my $f2 = "foo"); with IO::CatHandle.new: $f2 {.encoding: Nil; .slurp.say;};'
Buf[uint8]:0x<41 0A 42 0A 43 0A>
Compare to reading the file with default encoding (utf8):
~$ raku -e '(my $f1 = "foo".IO).spurt: "A\nB\nC\n"; (my $f2 = "foo"); with IO::CatHandle.new: $f2 {.slurp.say;};'
A
B
C
See:
https://docs.raku.org/routine/encoding
Note: the read method uses class IO::Handle which reads binary by default. So the code is simply:
~$ raku -e '(my $file1 = "foo".IO).spurt: "A\nB\nC\n"; my $file2 = "foo".IO; given $file2.open { .read.say; .close;};'
Buf[uint8]:0x<41 0A 42 0A 43 0A>
See:
https://docs.raku.org/type/IO::Handle#method_read
For further reading, see discussion of Perl5's <> diamond-operator-equivalent in Raku:
https://docs.raku.org/language/5to6-nutshell#while_until
...and some (older) mailing-list discussion of the same:
https://www.nntp.perl.org/group/perl.perl6.users/2018/11/msg6295.html
Finally, the docs refer to writing a mixed utf8/binary file here (useful for further testing):
https://docs.raku.org/routine/encoding#Examples

Related

Issues converting a small Hex value to a Binary value

I am trying to take the contents of a file that has a Hex number and convert that number to Binary and output to a file.
This is what I am trying but not getting the binary value:
xxd -r -p Hex.txt > Binary.txt
The contents of Hex.txt is: ff
I have also tried FF and 0xFF, but would like to just use ff since the device I am pulling the info from has it in that format.
Instead of 11111111 which it should be, I get a y with 2 dots above it.
If I change it to ee, I get an i with 2 dots. It seems to be reading it just fine but according to what I have read on the xxd -r -p command, it is not outputing it in the correct format.
The other ways I have found to convert Hex to Binary have either also not worked or is a pretty big Bash script that seems unnecessary to do what I thought would be a simple task.
This also gives me the y with 2 dots.
$ for i in $(cat Hex.txt) ; do printf "\x$i" ; done > Binary.txt
For some reason almost every solution I find gives me this format instead of a human readable Binary value with 1s and 0s.
Any help is appreciated. I am planning on using this in a script to pull the Relay values from Digital Loggers devices using curl and giving Home Assistant a readable file to record the Relay State. Digital Loggers curl cmd gives the state of all 8 relays at once using Hex instead of being able to pull the status of a specific relay.
If "file.txt" contains:
fe
0a
and you run this:
perl -ane 'printf("%08b\n",hex($_))' file.txt
You'll get this:
11111110
00001010
If you use it a lot, you might want to make a bash function of it in your login profile along these lines - being extremely respectful of spaces and semi-colons that might look unnecessary:
bin(){ perl -ane 'printf("%08b\n",hex($_))' $1 ; }
Then you'll be able to do:
bin file.txt
If you dislike Perl for some reason, you can achieve something similar without it as follows:
tr '[:lower:]' '[:upper:]' < file.txt |
while read h ; do
echo "obase=2; ibase=16; $h" | bc
done

Perl 6 $*ARGFILES.handles in binary mode?

I'm trying out $*ARGFILES.handles and it seems that it opens the files in binary mode.
I'm writing a zip-merge program, that prints one line from each file until there are no more lines to read.
#! /usr/bin/env perl6
my #handles = $*ARGFILES.handles;
# say $_.encoding for #handles;
while #handles
{
my $handle = #handles.shift;
say $handle.get;
#handles.push($handle) unless $handle.eof;
}
I invoke it like this: zip-merge person-say3 repeat repeat2
It fails with: Cannot do 'get' on a handle in binary mode in block at ./zip-merge line 7
The specified files are text files (encoded in utf8), and I get the error message for non-executable files as well as executable ones (with perl6 code).
The commented out line says utf8 for every file I give it, so they should not be binary,
perl6 -v: This is Rakudo version 2018.10 built on MoarVM version 2018.10
Have I done something wrong, or have I uncovered an error?
The IO::Handle objects that .handles returns are closed.
my #*ARGS = 'test.p6';
my #handles = $*ARGFILES.handles;
for #handles { say $_ }
# IO::Handle<"test.p6".IO>(closed)
If you just want get your code to work, add the following line after assigning to #handles.
.open for #handles;
The reason for this is the iterator for .handles is written in terms of IO::CatHandle.next-handle which opens the current handle, and closes the previous handle.
The problem is that all of them get a chance to be both the current handle, and the previous handle before you get a chance to do any work on them.
(Perhaps .next-handle and/or .handles needs a :!close parameter.)
Assuming you want it to work like roundrobin I would actually write it more like this:
# /usr/bin/env perl6
use v6.d;
my #handles = $*ARGFILES.handles;
# a sequence of line sequences
my $line-seqs = #handles.map(*.open.lines);
# Seq.new(
# Seq.new( '# /usr/bin/env perl6', 'use v6.d' ), # first file
# Seq.new( 'foo', 'bar', 'baz' ), # second file
# )
for flat roundrobin $line-seqs {
.say
}
# `roundrobin` without `flat` would give the following result
# ('# /usr/bin/env perl6', 'foo'),
# ('use v6.d', 'bar'),
# ('baz')
If you used an array for $line-seqs, you will need to de-itemize (.<>) the values before passing them to roundrobin.
for flat roundrobin #line-seqs.map(*.<>) {
.say
}
Actually I personally would be more likely to write something similar to this (long) one-liner.
$*ARGFILES.handles.eager».open».lines.&roundrobin.flat.map: *.put
:bin is always set in this type of objects. Since you are working on the handles, you should either read line by line as instructed on the example, or reset the handle so that it's not in binary mode.

sqlQuery in R fails when called via source() [duplicate]

The following, when copied and pasted directly into R works fine:
> character_test <- function() print("R同时也被称为GNU S是一个强烈的功能性语言和环境,探索统计数据集,使许多从自定义数据图形显示...")
> character_test()
[1] "R同时也被称为GNU S是一个强烈的功能性语言和环境,探索统计数据集,使许多从自定义数据图形显示..."
However, if I make a file called character_test.R containing the EXACT SAME code, save it in UTF-8 encoding (so as to retain the special Chinese characters), then when I source() it in R, I get the following error:
> source(file="C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "UTF-8")
Error in source(file = "C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "utf-8") :
C:\Users\Tony\Desktop\character_test.R:3:0: unexpected end of input
1: character.test <- function() print("R
2:
^
In addition: Warning message:
In source(file = "C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "UTF-8") :
invalid input found on input connection 'C:\Users\Tony\Desktop\character_test.R'
Any help you can offer in solving and helping me to understand what is going on here would be much appreciated.
> sessionInfo() # Windows 7 Pro x64
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] tools_2.12.1
and
> l10n_info()
$MBCS
[1] FALSE
$`UTF-8`
[1] FALSE
$`Latin-1`
[1] TRUE
$codepage
[1] 1252
On R/Windows, source runs into problems with any UTF-8 characters that can't be represented in the current locale (or ANSI Code Page in Windows-speak). And unfortunately Windows doesn't have UTF-8 available as an ANSI code page--Windows has a technical limitation that ANSI code pages can only be one- or two-byte-per-character encodings, not variable-byte encodings like UTF-8.
This doesn't seem to be a fundamental, unsolvable problem--there's just something wrong with the source function. You can get 90% of the way there by doing this instead:
eval(parse(filename, encoding="UTF-8"))
This'll work almost exactly like source() with default arguments, but won't let you do echo=T, eval.print=T, etc.
We talked about this a lot in the comments to my previous post but I don't want this to get lost on page 3 of comments: You have to set the locale, it works with both input from the R-console (see screenshot in comments) as well as with input from file see this screenshot:
The file "myfile.r" contains:
russian <- function() print ("Американские с...");
The console contains:
source("myfile.r", encoding="utf-8")
> Error in source(".....
Sys.setlocale("LC_CTYPE","ru")
> [1] "Russian_Russia.1251"
russian()
[1] "Американские с..."
Note that the file-in fails and it points to the same character as the original poster's error (the one after "R). I can not do this with Chinese because i would have to install "Microsoft Pinyin IME 3.0", but the process is the same, you just replace the locale with "chinese" (the naming is a bit inconsistent, consult the documentation).
I think the problem lies with R. I can happily source UTF-8 files, or UCS-2LE files with many non-ASCII characters in. But some characters cause it to fail. For example the following
danish <- function() print("Skønt H. C. Andersens barndomsomgivelser var meget fattige, blev de i hans rige fantasi solbeskinnede.")
croatian <- function() print("Dodigović. Kako se Vi zovete?")
new_testament <- function() print("Ne provizu al vi trezorojn sur la tero, kie tineo kaj rusto konsumas, kaj jie ŝtelistoj trafosas kaj ŝtelas; sed provizu al vi trezoron en la ĉielo")
russian <- function() print ("Американские суда находятся в международных водах. Япония выразила серьезное беспокойство советскими действиями.")
is fine in both UTF-8 and UCS-2LE without the Russian line. But if that is included then it fails. I'm pointing the finger at R. Your Chinese text also appears to be too hard for R on Windows.
Locale seems irrelevant here. It's just a file, you tell it what encoding the file is, why should your locale matter?
For me (on windows) I do:
source.utf8 <- function(f) {
l <- readLines(f, encoding="UTF-8")
eval(parse(text=l),envir=.GlobalEnv)
}
It works fine.
Building on crow's answer, this solution makes RStudio's Source button work.
When hitting that Source button, RStudio executes source('myfile.r', encoding = 'UTF-8')), so overriding source makes the errors disappear and runs the code as expected:
source <- function(f, encoding = 'UTF-8') {
l <- readLines(f, encoding=encoding)
eval(parse(text=l),envir=.GlobalEnv)
}
You can then add that script to an .Rprofile file, so it will execute on startup.
I encounter this problem when a try to source a .R file containing some Chinese characters. In my case, I found that merely set "LC_CTYPE" to "chinese" is not enough. But setting "LC_ALL" to "chinese" works well.
Note that it's not enough to get encoding right when you read or write plain text file in Rstudio (or R?) with non-ASCII. The locale setting counts too.
PS. the command is Sys.setlocale(category = "LC_CTYPE",locale = "chinese"). Please replace locale value correspondingly.
On windows, when you copy-paste a unicode or utf-8 encoded string into a text-control that is set to single-byte-input (ascii... depending on locale), the unknown bytes will be replaced by questionmarks. If i take the first 4 characters of your string and copy-paste it into e.g. Notepad and then save it, the file becomes in hex:
52 3F 3F 3F 3F
what you have to do is find an editor which you can set to utf-8 before copy-pasting the text into it, then the saved file (of your first 4 characters) becomes:
52 E5 90 8C E6 97 B6 E4 B9 9F E8 A2 AB
This will then be recognized as valid utf-8 by [R].
I used "Notepad2" for trying this, but i am sure there are many more.

awk getline skipping to last line -- possible newline character issue

I'm using
while( (getline line < "filename") > 0 )
within my BEGIN statement, but this while loop only seems to read the last line of the file instead of each line. I think it may be a newline character problem, but really I don't know. Any ideas?
I'm trying to read the data in from a file other than the main input file.
The same syntax actually works for one file, but not another, and the only difference I see is that the one for which it DOES work has "^M" at the end of each line when I look at it in Vim, and the one for which it DOESN'T work doesn't have ^M. But this seems like an odd problem to be having on my (UNIX based) Mac.
I wish I understood what was going with getline a lot better than I do.
You would have to specify RS to something more vague.
Here is a ugly hack to get things working
RS="[\x0d\x0a\x0d]"
Now, this may require some explanation.
Diffrent systems use difrent ways to handle change of line.
Read http://en.wikipedia.org/wiki/Carriage_return and http://en.wikipedia.org/wiki/Newline if you are
interested in it.
Normally awk hadles this gracefully, but it appears that in your enviroment, some files are being naughty.
0x0d or 0x0a or 0x0d 0x0a (CR+LF) should be there, but not mixed.
So lets try a example of a mixed data stream
$ echo -e "foo\x0d\x0abar\x0d\x0adoe\x0arar\x0azoe\x0dqwe\x0dtry" |awk 'BEGIN{while((getline r )>0){print "r=["r"]";}}'
Result:
r=[foo]
r=[bar]
r=[doe]
r=[rar]
try]oe
We can see that the last lines are lost.
Now using the ugly hack to RS
$ echo -e "foo\x0d\x0abar\x0d\x0adoe\x0arar\x0azoe\x0dqwe\x0dtry" |awk 'BEGIN{RS="[\x0d\x0a\x0d]";while((getline r )>0){print "r=["r"]";}}'
Result:
r=[foo]
r=[bar]
r=[doe]
r=[rar]
r=[zoe]
r=[qwe]
r=[try]
We can see every line is obtained, reguardless of the 0x0d 0x0a junk :-)
Maybe you should preprocess your input file with for example dos2unix (http://sourceforge.net/projects/dos2unix/) utility?

How to read text files transfered as binary

My code copies files from ftp (using text transfer mode) to local disk and then trys to process them.
All files contain only text and values are seperated using new line. Sometimes files are moved to this ftp using binary transfer mode and looks like this will mess up line-ends.
Using hex editor, I compared line ends depending the transfer mode used to send files to ftp:
using text mode: file endings are 0D 0A
using binary mode: file endings are 0D 0D 0A
Is it possible to modify my code so it could read files in both cases?
Code from job that illustrates my problem and shows how i'm reading file:
(here i use same file, that contains 14 rows of data)
int i;
container con;
container files = ["c:\\temp\\axa_keio\\ascii.txt", "c:\\temp\\axa_keio\\binary.txt"];
boolean purchLineFirstRow;
IO inFile;
;
for(i=1; i<=conlen(files); i++)
{
inFile = new AsciiIO(conpeek(files,i), "R");
inFile.inFieldDelimiter('\n');
con = inFile.read();
info(int2str(conlen(con)));
}
Files come from Unix system to Windows sytem.
Not sure but maybe the question could be: "Which inFieldDelimiter values should i use to read both Unix and Windows line ends?"
Use inRecordDelimiter:
inFile.inRecordDelimiter('\n');
instead of:
inFile.inFieldDelimiter('\n');
There may still be a dangling CR on the last field, you may wish remove this:
strRem(conpeek(con, conlen(con)), '\r')
See also: http://en.wikipedia.org/wiki/Line_endings