Failure to read full line including embedded zero bytes - file-io

Lua script:
i=io.read()
print(i)
Command line:
echo -e "sala\x00m" | lua ll.lua
Output:
sala
I want it to print all character from input, similar to this:
salam
in HEX editor:
0000000: 7361 6c61 006d 0a sala.m.
How can I print all character from input?

You tripped over one of the few places where the Lua standard library is still not 8-bit-clean.
Specifically, file reading line-by-line is not embedded-0 proof.
The reason it isn't yet is an unfortunate combination of:
Only standard C90 or equally portable constructs are allowed for the core, which does not provide for efficient 0-clean text parsing.
Every solution discussed to date on the mailinglist under that constraint has considerable overhead.
Embedded 0-bytes in text files are quite rare.
Workarounds:
Use a modified library, fixing these formats: "*l" "*L" for file:read(...)
parse your raw data yourself. (read a block using a number or as much as possible using "*a")
Badger the Lua developers/maintainers for a bugfix until they give in.

Related

End-of-line conversion during Input/Output for text files

How to write strings (&str and String) containing newlines to text files?
In C you can switch between writing text as is or converting '\n' to proper end of line symbol for the OS via fopen flags, "w" or "wb". For example in Windows '\n' is converted to "\r\n" during I/O.
How can I achieve this with Rust? I cannot find corresponding API in std::fs::File.
There is no such API in the standard library (there might be a crate for this, though). The simplest way to write lines to a file is with the writeln! macro and it only uses \n for newlines.
It was probably considered (by the Rust developers) not useful enough because I'm pretty sure that nowadays \r\n is used only for Microsoft Notepad compatibility.
There once was an issue related to write not using CRLF on Windows, but it was concluded that:
the raw io::File will likely not handle it by default but would instead require a wrapper
(note: since Rust 1.0 it is no longer io::File, but fs::File)

How to set lang env to "en_US.UTF-8" in awk script?

I am using awkc to generate an executable file from an awk script. I have the following line in an awk script abc.awk:
BEGIN{printf "Value=%s\n",(3.13+3.26)}
I have generated an executable file (abc.exe), which I have executed on different systems. It gives different outputs in floating point operations.
On one local system it gives the output 6.39 but it gives the output 6 on another system located in a different time zone.
When I searched in various sites I am able to see to set the LANG environmental variable, but how?
I'm not sure that this to do with the locale settings on your different systems but if you are looking for a floating-point number, you should use the "%f" format specifier. To get an answer to 2 decimal places, use "%.2f":
BEGIN{printf "Value=%.2f\n",(3.13+3.26)}
This should give the same result, regardless of the system it is run on.
Edit: based on the linked question, perhaps you should try using LC_NUMERIC=C to explicitly set the locale:
LC_NUMERIC=C awk 'BEGIN{printf "Value=%.2f\n",(3.13+3.26)}'
should work regardless of the system it is being run on.
Environment variables are typically something that you do not control if you hand a program over to someone else. I would look into being more specific in your printf-calls. You can format the numbers into strings in to an astonishing detail. You are probably looking for the float (%f) (you are now using a string (%s).
If you notice that you are making always the same trick with your printf-calls, you can control the number-to-string conversion settings with a variable CONVFMT in a BEGIN-block. I got introduced to this one through gawk, apparently it is however in the POSIX standard. Whether awkc supports this, I don't know.
From the gawk manual:
This string controls conversion of numbers to strings (see
Conversion). It works by being passed, in effect, as the first
argument to the sprintf() function (see String Functions). Its default
value is "%.6g". CONVFMT was introduced by the POSIX standard.

How to write a custom assembly compiler (sort of) in VB.NET

I've been trying to write a simple script compiler for a custom language used by the Game Boy Advance's Z80 processor.
All I want it to do is look at a human-readable command, take it and its arguments and convert it into a hexadecimal value into a ROM file. That's it. Each command is a byte, and each may take a different number of arguments - arguments can be either 8, 16, or 32 bits and each command has a specific number of arguments that it takes.
All of this sort of code is handled by the game and converted into workable machine code within the game's memory, so I'm not writing a full-on assembly compiler if you will. The game automatically knows how many args a command has, what each command does, exactly how to execute it as it is, etc.
For instance, you have command 0x4E, which takes in one 8-bit argument and another 32-bit argument. In hex that would obviously be 4E XX YY YY YY YY. I want my compiler to read it from text as foo 0xXX 0xYYYYYYYY and directly write it into a file as the former.
My question is, how would I do that in VB.NET? I know it's probably a very simple answer, but I see a lot of different options to write it to a file--some work and most don't for me. Could you give me some sample code as to how I would do this?
Writing an assembly compiler as I understand it is not so simple. I recomed you to use one already written see: Software Development Tools for Z80 Family
If you are still interested in writing it here are instructions:
Write the text you want to translate to some file (or memory stream)
Read it line by line
Parse the line either splitting it to an array or with regular
expressions
Identify command and arguments (as far as I remember it some commands
does not have arguments)
Translate the command to Hex (with a collection or dictionary of
commands)
Write results to an array remembering the references for jump
addresses
When everything is translated resolve addresses and write them to
right places.
I think that the most tricky part is to deal with symbolic addressees.
If you are still interested write a first piece of code (or ask how to do it) and continue with next ones.
This sounds like an assembler, even if it for a 'custom language'.
Start by parsing the command lines. use string.split method to convert the string to an array of strings. the first element in the array is your foo, you can then look that up and output 4E, then convert the subsequent elements to bytes.

Reading a large-single XML line to a variable using Batch Script

I have a xml file which only contains a single line, but the problem is the line is very large, so it seems that I can't store in a variable.
What i want is this,
given tag1, tag2.....tag900, I want to break each tag into a line as follow:
tag1
tag2
tag3
......
tag900
Do not attempt to do this using native batch. It will be extremely difficult, and any solution will be very slow.
The problem is native batch cannot read lines > 8k, and batch does not have a good way to read partial lines.
There is a method that creates a test file that has size >= your file that consists of a single repeated character. A binary file compare ( FC /B ) is then done and the results are parsed character by character expressed as hex codes. It's a bit more complex than that, but I don't think you want to go there.
The only other option is to use SET /P to read in 1021 chars at a time, and then parse and piece things together. But this is unproven, and again, I don't think worth the effort.
If you want to use a native scripting language than I suggest VBScript or JScript. (Perhaps PowerShell, but I don't really know much about its capabilities).
You could download a Unix text processing tool like sed that has been ported to Windows.
I don't do much with XML, but I've got to believe there is a free tool geared specifically for XML that would make your job fairly easy.
Basically, use anything except batch! (this is coming from someone whose hobby is solving problems with batch)

Is there a tool to clean the output of the script(1) tool?

script(1) is a tool for keeping a record of an interactive terminal session; by default it writes to the file transcript. My problem is that I use ksh93, which has readline features, and so the transcript is mucked up with all sorts of terminal escape sequences and it can be very difficult to reconstruct the command that was actually executed. Not to mention the stray ^M's and the like.
I'm looking for a tool that will read a transcript file written by script, remove all the junk, and reconstruct what the shell thought it was executing, so I have something that shows $PS1 and the commands actually executed. Failing that, I'm looking for suggestions on how to write such a tool, ideally using knowledge from the terminfo database, or failing that, just using ANSI escape sequences.
A cheat that looks in shell history, as long as it really really works, would also be acceptable.
Doesn't cat/more work by default for browsing the transcript? Do you intend to create a script out of the commands actually executed (which in my experience can be dangerous)?
Anyway, 3 years without an answer, so I will give it a shot with an incomplete solution. If your are only interested in the commands actually typed, remove the non-printable characters, then replace PS1' with something readable and unique, and grep for that unique string. Like this:
$ sed -i 's/[^[:print:]]//g' transcript
$ sed 's/]0;cartman#southpark: ~cartman#southpark:~/CARTMAN/g' transcript | grep CARTMAN
Explanation: After first sed, PS1' can be taken from one of the first few lines of the transcript file, as is -- PS1' is different from PS1 -- and can be modified with a unique readable string ("CARTMAN" here). Note that the dollar sign at the end of the prompt was left out intentionally.
In the few examples that I tried, this didn't solve everything but took care of most issues.
This is essentially the same question asked recently in Can I programmatically “burn in” ANSI control codes to a file using unix utils? -- removing all nonprinting characters will not fix
embedded escape sequences
backspace/overstriking for underlining
use of carriage-returns for overstriking