Is there a tool to clean the output of the script(1) tool? - scripting

script(1) is a tool for keeping a record of an interactive terminal session; by default it writes to the file typescript. My problem is that I use ksh93, which has readline-style line editing, so the transcript is mucked up with all sorts of terminal escape sequences, and it can be very difficult to reconstruct the command that was actually executed. Not to mention the stray ^M's and the like.
I'm looking for a tool that will read a transcript file written by script, remove all the junk, and reconstruct what the shell thought it was executing, so I have something that shows $PS1 and the commands actually executed. Failing that, I'm looking for suggestions on how to write such a tool, ideally using knowledge from the terminfo database, or failing that, just using ANSI escape sequences.
A cheat that looks in shell history, as long as it really really works, would also be acceptable.

Doesn't cat/more work by default for browsing the transcript? Do you intend to create a script out of the commands actually executed (which in my experience can be dangerous)?
Anyway, 3 years without an answer, so I will give it a shot with an incomplete solution. If you are only interested in the commands actually typed, remove the non-printable characters, then replace PS1' with something readable and unique, and grep for that unique string. Like this:
$ sed -i 's/[^[:print:]]//g' transcript
$ sed 's/]0;cartman@southpark: ~cartman@southpark:~/CARTMAN/g' transcript | grep CARTMAN
Explanation: After first sed, PS1' can be taken from one of the first few lines of the transcript file, as is -- PS1' is different from PS1 -- and can be modified with a unique readable string ("CARTMAN" here). Note that the dollar sign at the end of the prompt was left out intentionally.
In the few examples that I tried, this didn't solve everything but took care of most issues.

This is essentially the same question asked recently in Can I programmatically “burn in” ANSI control codes to a file using unix utils? -- removing all nonprinting characters will not fix
embedded escape sequences
backspace/overstriking for underlining
use of carriage-returns for overstriking
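A minimal sketch of handling those three cases, in Python rather than sed. It assumes the transcript only uses ANSI-style CSI and OSC sequences (it does not consult terminfo), and it replays backspaces and carriage returns as overwrites within a single line. The name clean_line is made up for this example.

```python
import re

# CSI sequences (ESC [ ... final byte) and OSC sequences (ESC ] ... BEL or ESC \).
# Assumption: nothing more exotic than these appears in the transcript.
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")

def clean_line(line):
    """Strip escape sequences, then replay backspaces and carriage returns."""
    line = ANSI_RE.sub("", line)
    out = []   # characters as they would end up on the screen line
    pos = 0    # cursor column
    for ch in line:
        if ch == "\b":
            pos = max(pos - 1, 0)      # backspace moves the cursor left
        elif ch == "\r":
            pos = 0                    # carriage return goes to column 0
        elif ch.isprintable():
            if pos < len(out):
                out[pos] = ch          # overstrike an existing character
            else:
                out.append(ch)
            pos += 1
        # any other control character is dropped
    return "".join(out)
```

Applied line by line to the transcript, this keeps what was left on screen instead of every keystroke. Cursor-movement sequences (arrow keys while editing) are simply deleted, not replayed, so heavily edited command lines can still come out wrong.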

Related

AIX: remove the last symbols (CRLF) from a file

There is a large file where the last symbols are \r\n. I need to remove them. It seems to be equivalent to removing the last line(?).
UPD: no, it's not: the file has only one line, which ends with \r\n.
I know two ways, but both don't work for AIX:
sed 's/\r\n$//' file # I don't know why it doesn't work
head -c-2 # head doesn't work with negative numbers
Is there any solution for AIX? A lot of large files must be processed, so performance is important.
Usually, when I need to edit a file in place from a script, I use ed for historical reasons. For example:
ed - /tmp/foo.txt <<EOF
g/^$/d
w
q
EOF
ed is more than a bit cantankerous. Note also that you did not really remove the empty lines at the bottom of the file but rather all of the empty lines. With ed and some practice you can probably achieve deleting only the empty lines at the bottom of the file. e.g. go to the bottom of the file, search up for a non-empty line, then move down a line and delete from that point to the end of the file. ed command scripts act (pretty much) as you would expect.
Also, if they really do have \r\n, then those are not going to be considered empty lines but rather lines with a control-M (\r) in them. You may need to adjust your pattern if that is the case.
My answer https://stackoverflow.com/a/46083912/3220113 to the duplicate question should work here too. Another solution is using
awk ' (NR>1) { print s }
{s=$0}
END { printf("%s",substr(s, 1, length(s)-1) ) }
' inputfile
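Since performance matters for large files, another option is to avoid rewriting the file at all and just truncate the last two bytes. A minimal sketch, assuming Python is available on the AIX machine (the question does not say so); strip_trailing_crlf is a name invented for this example.

```python
import os

def strip_trailing_crlf(path):
    """Remove a final \\r\\n from the file in place; return True if removed."""
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)   # seek() returns the new offset
        if size < 2:
            return False
        f.seek(size - 2)
        if f.read(2) != b"\r\n":
            return False                # file does not end with CRLF
        f.truncate(size - 2)            # drop only the last two bytes
        return True
```

Only the last two bytes are read, so the cost does not depend on the file size.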

Need clean syntax in batch

Context
I am thinking I can solve a problem with the proper creation of a *.bat file.
I am automating a process in a backup program called Acronis Backup and Recovery.
I am able to make a script (jScript) that creates all the syntax except for one part correctly.
In a normal command prompt the command I would run looks like this
acrocmd backup file --include="C:\documents\Gale_thesis.doc" "D:\Sandbox\!oDC!-IMG_0222.MOV" "C:\temp\magnifyReader" --loc="D:\backups" --arc="Backup1a"
The jScript I am creating can generate this with no problem and save it as a *.bat file. This works perfectly if my file names are clean. By clean I mean no characters that batch files treat as keywords or commands.
Anytime I have a word like “copy” or a character like “!” in a file name it fails.
Question
So I am now wondering if loading variables from a text file would do the trick?
I am sure a lot of readers know that when you pass multiple file/folder paths at the command line you need to surround them with double quotes.
So I need this variable to have the correct syntax to be parsed by the batch file and work like the example when I type it directly at a command prompt.
I had tried to follow info about using for /f etc.
But the examples are not broad enough for me to understand, nobody seems to explain how to use these variables mixed in with other syntax.
I know a little about working with variables in a *.bat file. My jScript application can produce the text in any format: a list, escaped, whatever is needed.
Thanks
I suggest you take a look at escaping characters:
http://www.robvanderwoude.com/escapechars.php
In for loops, !var! is used when delayed expansion is enabled, so you might need to escape the ! character.
I used the following code provided by Aacini to test the arguments that are being passed
@echo off
setlocal enabledelayedexpansion
set argCount=0
for %%x in (%*) do (
set /A argCount+=1
set "argVec[!argCount!]=%%~x"
)
echo Number of processed arguments: %argCount%
Since delayed expansion is enabled, I had to escape the ! character:
arg.bat --include="C:\documents\Gale_thesis.doc" "D:\Sandbox\^^^!oDC^^^!-IMG_0222.MOV" "C:\temp\magnifyReader" --loc="D:\backups" --arc="Backup1a"
About the triple caret ^^^:
the problem here is that we need to pass two special characters,
the first is the caret ^ and the second is the exclamation mark !,
so that the second batch file (the one that reads our arguments) gets ^!
To escape ^ we use ^^ and to escape ! we use ^!, which together give ^^^!
Thanks to Aacini for his code.
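Since the *.bat file is generated by a program anyway, the escaping can be applied when the arguments are written out. A minimal sketch in Python (the question uses jScript, so this is only an illustration) that applies exactly the rule described above to the ! character. Both function names are made up, and other special characters such as % are not handled.

```python
def escape_once(text):
    """One escaping pass: ^ becomes ^^ and ! becomes ^!."""
    return text.replace("^", "^^").replace("!", "^!")

def escape_for_delayed_expansion(path):
    """Produce the ^^^! form for every ! in path.

    The receiving batch file should see ^! for each !, and that text
    itself has to be escaped once more to survive the first pass.
    Assumption: only ! needs this treatment in the path.
    """
    wanted = path.replace("!", "^!")
    return escape_once(wanted)
```

For the path from the question, escape_for_delayed_expansion(r"D:\Sandbox\!oDC!-IMG_0222.MOV") gives D:\Sandbox\^^^!oDC^^^!-IMG_0222.MOV, which matches the command line shown above.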

Failure to read full line including embedded zero bytes

Lua script:
i=io.read()
print(i)
Command line:
echo -e "sala\x00m" | lua ll.lua
Output:
sala
I want it to print all characters from the input, similar to this:
salam
in HEX editor:
0000000: 7361 6c61 006d 0a sala.m.
How can I print all characters from the input?
You tripped over one of the few places where the Lua standard library is still not 8-bit-clean.
Specifically, file reading line-by-line is not embedded-0 proof.
The reason it isn't yet is an unfortunate combination of:
Only standard C90 or equally portable constructs are allowed for the core, which does not provide for efficient 0-clean text parsing.
Every solution discussed to date on the mailing list under that constraint has considerable overhead.
Embedded 0-bytes in text files are quite rare.
Workarounds:
Use a modified library, fixing these formats: "*l" "*L" for file:read(...)
Parse your raw data yourself (read a block using a number, or as much as possible using "*a").
Badger the Lua developers/maintainers for a bugfix until they give in.
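The second workaround (read the raw bytes, then split them into lines yourself) is not specific to Lua. A minimal sketch of the idea in Python, shown only to illustrate the splitting step; split_lines is a name invented here, and in Lua the raw data would come from a block read as described above.

```python
from typing import List

def split_lines(data: bytes) -> List[bytes]:
    """Split raw input on newlines; embedded zero bytes are ordinary data."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()   # a trailing newline does not start another line
    return lines
```

With the input from the question, split_lines(b"sala\x00m\n") yields one line of six bytes with the zero byte intact.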

DCL sort - different start positions

I have a DCL script that creates a .txt file that looks something like this
something,somethingelse,00000004
somethingdifferent,somethingelse1,00000002
anotherline,line,00000015
I need to sort the file by the 3rd column highest to lowest
ex:
anotherline,line,00000015
something,somethingelse,00000004
somethingdifferent,somethingelse1,00000002
Is it best to use the sort command? If so, everything I've seen requires a position number; how can this be done if the field starts at a different position on each line?
If sort is a bad way to handle this, is there something else, or can I somehow handle this while writing the lines to the file?
I've only been working with VMS/DCL for a few weeks now, so I'm not familiar with all of the commands yet.
Thanks!
As you already noticed, the VMS sort expects fields with a fixed start position within a record. You cannot specify a field by a separator. If you want to use the VMS sort, you have to make sure your third field starts at the same column for all records. In other words, you have to pad the preceding fields. If you have control over how the file is created, this may work for you. If you don't, or you don't know how long the strings in front of the sort field will be, this may not be a workaround. Maybe changing the order of the fields is an option.
On the other hand, you may find GNV installed on your system. Then you can try to use its sort, which is a GNU style sort. That is, $ mcr gnv$gnu:[bin]sort -t, -k3 -r x.txt may get you the wanted results.
VMS Sort is indeed not really equipped for this.
Reformatting as you did is about the only way.
If you do not have access to GNV sort on the OpenVMS system, then perhaps you have, or can install, Perl? It is somewhat easier to install.
In perl there are of course many ways.
For example using an anonymous sort function ( $a is first arg, $b second; <> reads all input )
$ perl -e "print sort { 0+(split /,/,$b)[2] <=> 0+(split /,/,$a)[2]} <>" x.x
where the 0 + forces numeric evaluation. For (fixed length?) string compare use:
$ perl -e "print sort { (split /,/,$b)[2] cmp (split /,/,$a)[2]} <>" x.x
hth,
Hein.
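The same idea (split each record on the comma and compare the third field numerically, highest first) can be sketched in Python, assuming Python is installed on the OpenVMS system, which the question does not confirm. The function name is invented for the example.

```python
def sort_by_third_field(lines):
    """Return the records sorted by the numeric third comma-separated field, descending."""
    return sorted(
        lines,
        key=lambda line: int(line.rstrip("\n").split(",")[2]),
        reverse=True,
    )

records = [
    "something,somethingelse,00000004\n",
    "somethingdifferent,somethingelse1,00000002\n",
    "anotherline,line,00000015\n",
]
print("".join(sort_by_third_field(records)), end="")
```

For the sample file this prints the anotherline record first, then something, then somethingdifferent. Converting with int() plays the role of the 0+ in the Perl version.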

Reading a large-single XML line to a variable using Batch Script

I have an XML file which contains only a single line, but the problem is that the line is very large, so it seems that I can't store it in a variable.
What I want is this:
given tag1, tag2.....tag900, I want to put each tag on its own line, as follows:
tag1
tag2
tag3
......
tag900
Do not attempt to do this using native batch. It will be extremely difficult, and any solution will be very slow.
The problem is native batch cannot read lines > 8k, and batch does not have a good way to read partial lines.
There is a method that creates a test file that has size >= your file that consists of a single repeated character. A binary file compare ( FC /B ) is then done and the results are parsed character by character expressed as hex codes. It's a bit more complex than that, but I don't think you want to go there.
The only other option is to use SET /P to read in 1021 chars at a time, and then parse and piece things together. But this is unproven, and again, I don't think worth the effort.
If you want to use a native scripting language, then I suggest VBScript or JScript. (Perhaps PowerShell, but I don't really know much about its capabilities.)
You could download a Unix text processing tool like sed that has been ported to Windows.
I don't do much with XML, but I've got to believe there is a free tool geared specifically for XML that would make your job fairly easy.
Basically, use anything except batch! (this is coming from someone whose hobby is solving problems with batch)
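As an illustration of how little work this is outside batch, here is a minimal sketch in Python. Python is not one of the tools named above, so treat it as one more "anything except batch" option. It assumes the tags sit directly next to each other (a > immediately followed by a <) and reads the file in chunks so the long line never has to fit in one variable. The function name and the file name input.xml are made up.

```python
def split_tags(chunks):
    """Yield the text of chunks with a newline inserted at every '><' boundary."""
    prev_ended_with_gt = False
    for chunk in chunks:
        if not chunk:
            continue
        # A '><' pair may be split across two chunks.
        if prev_ended_with_gt and chunk.startswith("<"):
            yield "\n"
        yield chunk.replace("><", ">\n<")
        prev_ended_with_gt = chunk.endswith(">")

def read_chunks(path, size=65536):
    """Read a text file piece by piece instead of line by line."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(size)
            if not chunk:
                break
            yield chunk

# Usage (hypothetical file name):
# for piece in split_tags(read_chunks("input.xml")):
#     print(piece, end="")
```

This is plain text processing, not XML parsing, so a >< pair inside a comment or CDATA section would also be split.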