No space between columns in awk output - awk

I am trying to print two columns of output using awk and need to separate them with a space. In the example below the first column value is '1' and the second column is '1['. As seen in the output, the two values are merged together and I am not able to print a space in between; the -vOFS flag does not seem to help. I am also printing just the last line of a command's output in this awk statement.
In addition, I would like to get rid of the '[' in the second column ('1[') so that only the '1' is left. How exactly do I do that?
awk command:
sudo iblinkinfo | awk -vOFS=' ' 'NR==1; END{print $11 $12}'
awk'd Output I get:
CA: MT25408 ConnectX Mellanox Technologies:
11[
awk'd Output I want:
1 1
Original command output (the last line starts with "CA: MT..."). The first column ($1) is actually the hex value 0xe41d2d0300e29e01; I would like to print the 11th and 12th columns, which are 1 and 1[ (towards the end).
1 34[ ] ==( Down/ Polling)==> [ ] "" ( )
1 35[ ] ==( Down/ Polling)==> [ ] "" ( )
1 36[ ] ==( Down/ Polling)==> [ ] "" ( )
CA: MT25408 ConnectX Mellanox Technologies:
0xe41d2d0300e29e01 2 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 1 1[ ] "Infiniscale-IV Mellanox Technologies" ( )

Is this what you're trying to do?
$ cat file
1 34[ ] ==( Down/ Polling)==> [ ] "" ( )
1 35[ ] ==( Down/ Polling)==> [ ] "" ( )
1 36[ ] ==( Down/ Polling)==> [ ] "" ( )
CA: MT25408 ConnectX Mellanox Technologies:
0xe41d2d0300e29e01 2 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 1 1[ ] "Infiniscale-IV Mellanox Technologies" ( )
$ awk 'END{print $11, $12+0}' file
1 1
The above relies on undefined behavior, since the values of $0, $1, etc. in the END section are undefined by the POSIX standard, but it will work in GNU awk, which is what you're using.
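If you want to avoid relying on that, a POSIX-portable variant is to capture the fields on every line and only print them in END (a minimal sketch, assuming the line you care about is the last line of the command's output):
$ sudo iblinkinfo | awk '{a=$11; b=$12+0} END{print a, b}'
The $12+0 forces the string "1[" to be treated as the number 1, which also takes care of stripping the trailing [.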


jq reducing stream to an array of all leaf values using input

I want to receive streamed JSON inputs and reduce them to an array containing the leaf values.
Demo: https://jqplay.org/s/cZxLguJFxv
Please consider
filter:
try reduce range(30) as $i ( []; (.+[until(length==2;input)[1]] // error(.)) )
catch empty
input:
[
[
0,
0,
"a"
],
null
]
[
[
0,
0,
"a"
]
]
[
[
0,
1,
"b"
],
null
]
[
[
0,
1,
"b"
]
]
[
[
0,
1
]
]
[
[
1
],
0
]
...
output:
empty
I expect the output: [null, null, 0, ...] but I get empty instead.
I told reduce to iterate 30 times, but there are fewer inputs than that. I expect it to skip the inputs whose length is not 2 and produce an array containing all the leaf values.
I don't know how this behaves when there are no more inputs of length 2 left but reduce still has iterations to go.
I want to know why my filter returns empty. What am I doing wrong? Thanks!
Your filter returns empty because, once the inputs are exhausted, input raises an error inside the reduce; that error aborts the whole reduction, and the surrounding try ... catch empty then discards it, so nothing at all is produced. These filters should do what you want:
jq -n 'reduce inputs as $in ([]; if $in | has(1) then . + [$in[1]] else . end)'
Demo
jq -n '[inputs | select(has(1))[1]]'
Demo
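If you are producing those events yourself with jq's --stream option, you can feed them to the second filter in one step (a sketch, assuming the original JSON document is in a file called doc.json; the filename is only for illustration):
jq -n --stream '[inputs | select(has(1))[1]]' doc.json
With --stream and -n, inputs yields the [path, value] event pairs directly, so select(has(1)) keeps only the pairs that carry a leaf value.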

How to use sed or awk selectively regarding the line length (file-by-file)

I've got around 100 formatted files in the following format
[[ 1.102 -0.26499999 0. ]
[ 2.25999999 -0.88700002 0. ]
[-0.152 2.78900003 0. ]
[-2.23300004 -1.19700003 0. ]
[-2.30699992 1.43700004 0. ]]
where some files are in the form
[[ -1.22399998e+00 -4.05999988e-01 -0.00000000e+00]
[ -2.00000009e-03 1.70599997e+00 0.00000000e+00]
[ 1.29299998e+00 -3.49999994e-01 -0.00000000e+00]
[ 1.20299995e+00 1.10699999e+00 0.00000000e+00]
[ 2.12299991e+00 1.67100000e+00 0.00000000e+00]]
and I cannot predict which of the two formats I will get.
I'd like to have these numbers rounded to three decimals in the upper form. I've tried things like sed 's/^\(.\{8\}\).\{4\}/\1/' file, but this isn't specific regarding the length of a line (and it obviously doesn't round the numbers either).
I'm sure that NumPy could do this, but I think sed or awk would do the job more efficiently.
Additional information: if this is of interest, the output represents coordinates and comes from PyMOL, which uses NumPy for this.
Edit:
It doesn't matter whether the number of characters between two decimal points in a line differs from the example; what matters is that all files are formatted in the same way, which means in detail that
the decimal points are placed in the same three columns (character positions).
every file uses the same notation for the numbers (e.g. decimal or scientific).
the brackets appear at the very same position(s) in every output/file, or in none of them.
the number of decimals differs neither within a file nor between files.
In short: the only difference between the files is the numeric characters representing the numbers, not how, how precisely, or where they are written.
Desired output of the above examples:
[[ 1.102 -0.264 0.000 ]
[ 2.256 -0.887 0.000 ]
[-0.152 2.789 0.000 ]
[-2.233 -1.197 0.000 ]
[-2.307 1.437 0.000 ]]
[[-1.224 -4.056 -0.000 ]
[-2.000 1.706 0.000 ]
[ 1.293 -3.500 -0.000 ]
[ 1.203 1.107 0.000 ]
[ 2.123 1.671 0.000 ]]
Massage the output spacing in the printf to suit whatever criteria work for you if this doesn't as-is:
$ cat tst.awk
{
    # surround runs of "[" or "]" with spaces so they become separate fields
    gsub(/[][]+/," & ")
    # reformat every numeric field to three decimal places
    for (i=2; i<NF; i++) {
        $i = sprintf("%.3f",$i)
    }
    # print fixed-width columns; adjust the widths to taste
    printf "%2s%6s%12s%12s %-2s\n", $1, $2, $3, $4, $5
}
$ awk -f tst.awk file
[[ 1.102 -0.265 0.000 ]
[ 2.260 -0.887 0.000 ]
[-0.152 2.789 0.000 ]
[-2.233 -1.197 0.000 ]
[-2.307 1.437 0.000 ]]
[[-1.224 -0.406 -0.000 ]
[-0.002 1.706 0.000 ]
[ 1.293 -0.350 -0.000 ]
[ 1.203 1.107 0.000 ]
[ 2.123 1.671 0.000 ]]
The above was run against this input file:
$ cat file
[[ 1.102 -0.26499999 0. ]
[ 2.25999999 -0.88700002 0. ]
[-0.152 2.78900003 0. ]
[-2.23300004 -1.19700003 0. ]
[-2.30699992 1.43700004 0. ]]
[[ -1.22399998e+00 -4.05999988e-01 -0.00000000e+00]
[ -2.00000009e-03 1.70599997e+00 0.00000000e+00]
[ 1.29299998e+00 -3.49999994e-01 -0.00000000e+00]
[ 1.20299995e+00 1.10699999e+00 0.00000000e+00]
[ 2.12299991e+00 1.67100000e+00 0.00000000e+00]]
Perl to the rescue!
perl -lpe 's/([-0-9.e+]+) */sprintf "%.3f ", $1/ge' -- file
-l removes newlines from input and adds them to output
-p processes the input line by line and prints each line after processing
s/// is substitution, similar to the same command in sed
/e interprets the replacement as code and runs it, which in this case means every number is formatted using sprintf.
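Since there are around 100 files, you can apply either solution to each file in turn; for the Perl one-liner an in-place loop is enough (a sketch, assuming the files match *.txt in the current directory; adjust the glob to your actual filenames):
for f in *.txt; do
  perl -i -lpe 's/([-0-9.e+]+) */sprintf "%.3f ", $1/ge' -- "$f"
done
Perl's -i switch edits each file in place instead of printing to stdout.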

How to compare number of lines of two files using Awk

I am new to awk and need to compare the number of lines of two files.
The script shall return true if lines(f1) == (lines(f2)+1), otherwise false. How can I do that?
Best regards
If it has to be awk:
awk 'NR==FNR{x++} END{ if(x!=FNR){exit 1} }' file1 file2
The variable x is incremented and contains the number of lines of file1, and FNR contains the number of lines of file2. At the end, both are compared and the script exits with 0 or 1.
See an example:
user#host:~$ awk 'NR==FNR{x++} END{ if(x!=FNR){exit 1} }' shortfile longfile
user#host:~$ echo $?
1
user#host:~$ awk 'NR==FNR{x++} END{ if(x!=FNR){exit 1} }' samefile samefile
user#host:~$ echo $?
0
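Note that the question asks for lines(f1) == lines(f2)+1 rather than plain equality, so the test in the END block needs a small tweak (a sketch along the same lines, not hardened against edge cases such as an empty file1):
awk 'NR==FNR{x++} END{if (x == FNR+1) exit 0; exit 1}' file1 file2
This exits 0 (true in the shell) exactly when file1 has one line more than file2.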
Something like this should suit your purposes:
[ oele3110 $] cat line_compare.awk
#!/usr/bin/gawk -f
NR==FNR{        # first file on the command line: count its lines
    n_file1++;
}
NR!=FNR{        # second file: count its lines
    n_file2++;
}
END{
    n_file2++;  # allow for the expected one-line difference
    if(n_file1==n_file2){exit(1);}
}
[ oele3110 $] cat f1
1
1
1
1
1
1
[ oele3110 $] cat f2
1
1
1
1
1
[ oele3110 $] cat f3
1
1
1
1
1
[ oele3110 $]
[ oele3110 $] wc -l f*
6 f1
5 f2
5 f3
16 total
[ oele3110 $] ./line_compare.awk f1 f2
[ oele3110 $] echo $?
1
[ oele3110 $] ./line_compare.awk f2 f3
[ oele3110 $] echo $?
0
[ oele3110 $]
Actually, I think I should have asked you to invest a bit more effort before giving you the answer. I'll leave it for now, but next time I won't make the same mistake.

How do I refer to a variable in a func argument when the same name is used in foreach

How can I refer to the date argument of f within the foreach loop if date is also used as a block element variable? Am I obliged to rename my date variable?
f: func[data [block!] date [date!]][
foreach [date o h l c v] data [
]
]
A: Simple; compose is your best friend.
f: func[data [block!] date [date!]][
foreach [date str] data compose [
print (date)
print date
]
]
>> f [2010-09-01 "first of sept" 2010-10-01 "first of october"] now
7-Sep-2010/21:19:05-4:00
1-Sep-2010
7-Sep-2010/21:19:05-4:00
1-Oct-2010
You need to either change the parameter name from date or assign it to a local variable.
You can access the date argument inside the foreach loop by binding the 'date word from the function specification to the data argument:
>> f: func[data [block!] date [date!]][
[ foreach [date o h l c v] data [
[ print last reduce bind find first :f 'date 'data
[ print date
[ ]
[ ]
>> f [1-1-10 1 2 3 4 5 2-1-10 1 2 3 4 5] 8-9-10
8-Sep-2010
1-Jan-2010
8-Sep-2010
2-Jan-2010
It makes the code very difficult to read though. I think it would be better to assign the date argument to a local variable inside the function as Graham suggested.
>> f: func [data [block!] date [date!] /local the-date ][
[ the-date: :date
[ foreach [date o h l c v] data [
[ print the-date
[ print date
[ ]
[ ]
>> f [1-1-10 1 2 3 4 5 2-1-10 1 2 3 4 5] 8-9-10
8-Sep-2010
1-Jan-2010
8-Sep-2010
2-Jan-2010

I need to generate a 50 million row CSV file with random data: how can I optimize this program?

The program below can generate random data according to some specs (the example here is for 2 columns).
It works with a few hundred thousand lines on my PC (this should depend on RAM). I need to scale to tens of millions of rows.
How can I optimize the program to write directly to disk? As a secondary question, how can I "cache" the parse rule execution, since it is always the same pattern repeated 50 million times?
Note: to use the program below, just type generate-blocks and save-blocks; the output will be db.txt
Rebol[]
specs: [
[3 digits 4 digits 4 letters]
[2 letters 2 digits]
]
;====================================================================================================================
digits: charset "0123456789"
letters: charset "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
separator: charset ";"
block-letters: [A B C D E F G H I J K L M N O P Q R S T U V W X Y Z]
blocks: copy []
generate-row: func[][
Foreach spec specs [
rule: [
any [
[
set times integer! [['digits (
repeat n times [
block: rejoin [block random 9]
]
)
|
'letters (repeat n times [
block: rejoin [ block to-string pick block-letters random 24]
]
)
]
|
[
'letters (repeat n times [block: rejoin [ block to-string pick block-letters random 24]
]
)
|
'digits (repeat n times [block: rejoin [block random 9]]
)
]
]
|
{"} any separator {"}
]
]
to end
]
block: copy ""
parse spec rule
append blocks block
]
]
generate-blocks: func[m][
repeat num m [
generate-row
]
]
quote: func[string][
rejoin [{"} string {"}]
]
save-blocks: func[file][
if exists? to-rebol-file file [
answer: ask rejoin ["delete " file "? (Y/N): "]
if (answer = "Y") [
delete %db.txt
]
]
foreach [field1 field2] blocks [
write/lines/append %db.txt rejoin [quote field1 ";" quote field2]
]
]
Use open with the /direct and /lines refinements to write directly to a file without buffering the content:
file: open/direct/lines/write %myfile.txt
loop 1000 [
    t: random "abcdefghi"
    append file t
]
close file
This will write 1000 random lines without buffering.
You can also prepare a block of lines (let's say 10,000 rows) and then write it to the file in one go; this will be faster than writing line by line.
file: open/direct/lines/write %myfile.txt
loop 100 [
    b: copy []
    loop 1000 [append b random "abcdef"]
    append file b
]
close file
This will be much faster: 100,000 rows in less than a second.
Hope this helps.
Note that you can change the numbers 100 and 1000 according to your needs and the memory of your PC, and use b: make block! 1000 instead of b: copy []; it will be faster.