How do I check if input is one word or 2 separated by delimiter "-" - awk

I need help with the following ksh script:
ExpResult=`echo "$LoadString" | awk -F"-" '{print NF}'=2`
MinExp=`echo "$ExpResult" | tr -s " " | sed 's/^[ ]//g'| cut -d"-" -f1`
MaxExp=`echo "$ExpResult" | tr -s " " | sed 's/^[ ]//g'| cut -d"-" -f2`
I can get an input as two options : "50-100" or "50" (for example)
I have two questions:
How do I check if the input is "one word" or two words separated by delimiter "-"?
If the input is two words, how can I separate them?

Rather than call an external program to parse your input, you can use the internal case statement to validate input and parameter expansion features to convert your input, i.e.
# set a copy/paste value for $1
set -- 50-10
case "$1" in
*-* )
range="$1"
min="${range%-*}"
max="${range#*-}"
;;
* )
singleNum="$1"
;;
esac
echo min=$min ... max=$max
output
min=50 ... max=100
Try for non-pair
unset min max
set -- other values
case ...
echo min= ... max= ... singleNum=$singleNum
output
min= ... max= ... singleNum=other
Hopefully the case processing is self-explanatory, but the parameter expansion may require a little explanation.
The statement
min=${range%-*}
says remove from the right side of the expanded value (50-100) anything starting at the last - until the end of the string. This leaves the value 50 remaining.
The reverse happens with
max=${range#*-}
Says remove from the left side of the expanded value anything up to the first - char. This leaves the 100.
As there is only one - char in this string, you don't need to worry about the other versions of ${var##*-} which says remove all from the left until the last match of -, and the reverse ${var%%-*} , remove all from right (backwards) until the very first - char.
The fanatical minimalists will remind us that this can be done without a temporary variable, i.e.
min=${1%-*} ; max=${1#*-}
And the one-line fantasists can be satisfied with
case "$1" in *-* ) range="$1";min="${range%-*}";max="${range#*-}";;* ) singleNum="$1";;esac; echo min=$min ... max=$max .,, singleNum=$singleNum
:-)
IHTH

you can try this;
LoadString=$1
MinExp=`echo "$LoadString" | awk -F"-" '{if (NF==2) print $1}`
MaxExp=`echo "$LoadString" | awk -F"-" '{if (NF==2) print $2}`
echo $MinExp
echo $MaxExp
eg:
user#host:/tmp/test$ ksh test.ksh 50-100
50
100

Related

Bash command/script to split line on a certain character

I would like to split the below data to the expected output:
Raw Data:
931096|376601|1|ART|AT-2151780724|2151780724|2|102809198|I|CGM44I|MIL3VF03|52576377.3600|PENDING|MO|PEND-INFO|Pend ACS4R|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|52576377.3600|1317720|system|2020-02-13 02:00:42|0
931097|375789|1|AYT|AT-2151509210|2151509210|7|102614605|A|CTHGMI|OZF19|444006.6400|APPROVED|NULL|APPROVED|Approved|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|kg17718|NULL|NULL|0.0000|1317722|system|2020-02-13 02:00:43|0931098|375979|1|AHT|AT-2151780726|2151780726|2|102809199|I|CGMI|MILaesLF11|26312.0000|PENDING|MO|PEND-INFO|Pend ACRES|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|26312.0000|1317721|system|2020-02-13 02:00:43|0
931099|376572|1|AT|AT-2151399812|2151399812|5|102673999|I|CG2rMI|WEL44LF15|60991.6956|PENDING|MO|PEND-INFO|Pend ACERS|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|0.0000|1317723|system|2020-02-13 02:00:45|0
Expected Output:
931096|376601|1|ART|AT-2151780724|2151780724|2|102809198|I|CGM44I|MIL3VF03|52576377.3600|PENDING|MO|PEND-INFO|Pend ACS4R|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|52576377.3600|1317720|system|2020-02-13 02:00:42|0
931097|375789|1|AYT|AT-2151509210|2151509210|7|102614605|A|CTHGMI|OZF19|444006.6400|APPROVED|NULL|APPROVED|Approved|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|kg17718|NULL|NULL|0.0000|1317722|system|2020-02-13 02:00:43|0
931098|375979|1|AHT|AT-2151780726|2151780726|2|102809199|I|CGMI|MILaesLF11|26312.0000|PENDING|MO|PEND-INFO|Pend ACRES|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|26312.0000|1317721|system|2020-02-13 02:00:43|0
931099|376572|1|AT|AT-2151399812|2151399812|5|102673999|I|CG2rMI|WEL44LF15|60991.6956|PENDING|MO|PEND-INFO|Pend ACERS|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|0.0000|1317723|system|2020-02-13 02:00:45|0
Basically the \n character is getting lost sometimes in the data and the lines are getting merged. Sometimes more than 1 line gets merged as well (even the opposite happens but we can get to that later).
The data always has 43 columns | separated. The last but one column(42nd) always is a timestamp and the last column is usually 0 or 1.
Trying for the below approach:
If cols > 43
Split 44th column to add \n and print the remaining.
Repeat process until cols=43
echo "${curr}" | awk -F\| ' { if(NF > 43) {for(i=43;i<NF;i++) "sed '${NR}s/\(^0\)/\1\n/p' $i" }}' filename
less complex
awk 'BEGIN {FS=OFS="|"}
NF>43 {for(i=43;i<=NF;i+=42) {t=$i; $i=substr(t,1,1) ORS substr(t,2)}}1' file
931096|376601|1|ART|AT-2151780724|2151780724|2|102809198|I|CGM44I|MIL3VF03|52576377.3600|PENDING|MO|PEND-INFO|Pend ACS4R|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|52576377.3600|1317720|system|2020-02-13 02:00:42|0
931097|375789|1|AYT|AT-2151509210|2151509210|7|102614605|A|CTHGMI|OZF19|444006.6400|APPROVED|NULL|APPROVED|Approved|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|kg17718|NULL|NULL|0.0000|1317722|system|2020-02-13 02:00:43|0
931098|375979|1|AHT|AT-2151780726|2151780726|2|102809199|I|CGMI|MILaesLF11|26312.0000|PENDING|MO|PEND-INFO|Pend ACRES|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|26312.0000|1317721|system|2020-02-13 02:00:43|0
931099|376572|1|AT|AT-2151399812|2151399812|5|102673999|I|CG2rMI|WEL44LF15|60991.6956|PENDING|MO|PEND-INFO|Pend ACERS|N|N|N|N|N|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|NULL|N|NULL|NULL|N|system|NULL|NULL|0.0000|1317723|system|2020-02-13 02:00:45|0
following your spec
If cols > 43 Split 44th 43th column to add
\n and print the remaining. Repeat process until cols=43 the end.
The usual way with sed: write a regex that matches 43 | characters with anything in between and a digit. Then insert a newline after the matched string.
sed 's/[0-9]\{6\}\(|[^|]*\)\{41\}|[0-9]/&\n/g ; s/\n$//'
# ^^^^^^^ - remove the leftover newline
# ^ - the matched string
# ^^^^^ - trailing digit
# ^ - 42th pipe character
# ^^^^^^^^^^^^^^^^ - 41 fields with anything in between
# ^^^^^^^^^^ - leading 6 digits
tested on repl
Or maybe match 42 pipes with anything in front and a digit::
sed 's/\([^|]*|\)\{42\}[0-9]/&\n/g ; s/\n$//'
Or match a character after 42 pipes and a digit and insert a newline in between:
sed 's/\(\([^|]*|\)\{42\}[0-9]\)\(.\)/\1\n\3/g'
Could you please try following, written and tested with shown samples. This solution will take care of inserting new lines even if you have more than 1 occurrences present in your single line too.
awk '
match($0,/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\|0/){
val=substr($0,RSTART+RLENGTH)
if(val){
num=gsub(/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\|0/,"&")
while(++count<num){
sub(/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\|0/,"&\n")
}
}
val=count=num=""
}
1
' Input_file
You don't trust the source of the data. Maybe it will add another | and the number of columns is wrong.
Another approach is guessing that you can trust the timestamp field.
So try to split the line when the field after the timestamp has more dan one character (and split after the first).
sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\|.)(.)/\1\n\2/g' file
This might work for you (GNU sed):
sed 's/[^|]*/\n&/44;s/\(|.\)\([^|]*|\)\n/\1\n\2/;P;D' file
If there is a 44th field, insert a newline before it. Then remove that newline and insert it following the first character of the 43rd field. Print the first line, delete the first line and repeat.

Sed/Awk: how to find and remove two lines if a pattern in the first line is being repeated; bash

I am processing text file(s) with thousands of records per file. Each record is made up of two lines: a header that starts with ">" and followed by a line with a long string of characters "-AGTCNR". The header has 10 fields separated by "|" whose first field is a unique identifier to each record e.g ">KEN096-15" and a record is termed duplicate if it has same identifier. Here is how a simple record look like:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTAGGAAATTGATTAGTACCTTTAATATT----CCGAAT---
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTTTCCT----TAAATAAT-----
Now I am triying to delete repeats, like duplicate records of "ACRJP458-10" and "PMANL2431-12".
Using a bash script I have extracted the unique identifiers and stored repeated ones in a variable "$duplicate_headers". Currently, I am trying to find any repeated instances of their two-line records and deleting them as follows:
for i in "$#"
do
unset duplicate_headers
duplicate_headers=`grep ">" $1 | awk 'BEGIN { FS="|"}; {print $1 "\n"; }' | sort | uniq -d`
for header in `echo -e "${duplicate_headers}"`
do
sed -i "/^.*\b${header}\b.*$/,+1 2d" $i
#sed -i "s/^.*\b${header}\b.*$//,+1 2g" $i
#sed -i "/^.*\b${header}\b.*$/{$!N; s/.*//2g; }" $i
done
done
The final result (with thousands of records in mind) will look like:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
$ awk -F'[|]' 'NR%2{f=seen[$1]++} !f' file
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
To run it on multiple files at once would be this to remove duplicates across all files:
awk -F'[|]' 'FNR%2{f=seen[$1]++} !f' *
or this to only remove duplicates within each file:
awk -F'[|]' 'FNR==1{delete seen} FNR%2{f=seen[$1]++} !f' *

How to Field Separate and append text using awk

Experts,
I have the following text in an xml files ( there will 20,000 rows in file).
<record record_no = "1" error_code="101">"21006041";"28006041";"34006211";"43";"101210-0001"
Here is how I need the result for each row to be and append to new file.
"21006041";"28006041";"34006211";"43";"101210-0001";101
Here is what I need to do to get the above result.
I replaced " with "
remove <record record_no = "1" error_code="
Get the text 101 ( it can have any value in this position)
append to the last.
Here is what I have been trying.
BEGIN { FS=OFS=";" }
/<record/ {
gsub(/"/,"\"")
gsub(/&apos;/,"")
gsub(/.*="|">.*/,"",$1)
$(NF+1)=$1;
$1="";
print $0;
}
This should do the trick.
awk -F'">' -v OFS=';' '{gsub(/<record record_no = \"[0-9]+\" error_code="/,""); gsub(/"/,"\""); print $2,$1}'
The strategy is to:
split the string at closing chars of the xml element ">
remove the first bit of the xml element including the attribute names leaving only the error code.
replace all " xml entities with ".
print the two FS sections in reverse order.
Test it out with the following data generation script. The script will generate 500x20000 line files with records of random length, some with dashes in the values.
#!/bin/bash
recCount=0
for h in {1..500};
do
for i in {1..20000};
do
((recCount++))
error=$(( RANDOM % 998 + 1 ))
record="<record record_no = "'"'"${recCount}"'"'" error_code="'"'"${error}"'"'">"
upperBound=$(( RANDOM % 4 + 5 ))
for (( k=0; k<${upperBound}; k++ ));
do
randomVal=$(( RANDOM % 99999999 + 1))
record+=""${randomVal}"
if [[ $((RANDOM % 4)) == 0 ]];
then
randomVal=$(( RANDOM % 99999999 + 1))
record+="-${randomVal}"
fi
record+="""
if [[ $k != $(( ${upperBound} - 1 )) ]];
then
record+=";"
fi
done;
echo "${record}" >> "file-${h}.txt"
done;
done;
On my laptop I get the following performance.
$ time cat file-*.txt | awk -F'">' -v OFS=';' '{gsub(/<record record_no = \"[0-9]+\" error_code="/,""); gsub(/"/,"\""); print $2,$1}' > result
real 0m18.985s
user 0m17.673s
sys 0m2.697s
As an added bonus, here is the "equivalent" command in sed:
sed -e 's|\("\)|"|g' -e 's|^.*error_code="\([^>]\+\)">\(.\+\).*$|\2;\1|g'
Much slower although the strategy is the same. Two expressions are used. First replace all " xml entities with ". Lastly group all characters (.+) after >. Display the remembered patterns in reverse order \2;\1
Timing statistics:
$ time cat file-* | sed -e 's|\("\)|"|g' -e 's|^.*error_code="\([^>]\+\)">\(.\+\).*$|\2;\1|g' > result.sed
real 5m59.576s
user 5m56.136s
sys 0m9.850s
Is this too thick:
$ awk -F""+" -v OFS='";"' -v dq='"' '{gsub(/^.*="|">$/,"",$1);print dq""$2,$4,$6,$8,$10dq";"$1}' test.in
"21006041";"28006041";"34006211";"43";"101210-0001";101

awk: changing OFS without looping though variables

I'm working on an awk one-liner to substitute commas to tabs in a file ( and swap \\N for missing values in preparation for MySQL select into).
The following link http://www.unix.com/unix-for-dummies-questions-and-answers/211941-awk-output-field-separator.html (at the bottom) suggest the following approach to avoid looping through the variables:
echo a b c d | awk '{gsub(OFS,";")}1'
head -n1 flatfile.tab | awk -F $'\t' '{for(j=1;j<=NF;j++){gsub(" +","\\N",$j)}gsub(OFS,",")}1'
Clearly, the trailing 1 (can be a number, char) triggers the printing of the entire record. Could you please explain why this is working?
SO also has Print all Fields with AWK separated by OFS , but in that post it seems unclear why this is working.
Thanks.
Awk evaluates 1 or any number other than 0 as a true-statement. Since, true statements without the action statements part are equal to { print $0 }. It prints the line.
For example:
$ echo "hello" | awk '1'
hello
$ echo "hello" | awk '0'
$

How to strictly compare two long numeric string in awk

I wrote some awk script to process some data, and found the result unexpected.
I found the root cause is that the following string comparison is not correct
echo "59558711052462309110012 59558711052462313120012"|awk '{print $1;print $2;print ($1==$2)?"eq":"ne"}'
The result is
59558711052462309110012
59558711052462313120012
eq
I guess the reason is that awk treats the two numeric strings as numbers, and cuts off them to compare.
My question is that how can I strictly compare the two strings in awk.
Force a string comparison by telling awk that at least one of the operands IS a string by concatenating that operand with the null string:
echo "59558711052462309110012 59558711052462313120012"|
awk '{print $1;print $2;print ($1""==$2)?"eq":"ne"}'
59558711052462309110012
59558711052462313120012
ne
#EdMorton's solution already fails when a positive sign exists in front of it, all else being equal :
echo '59558711052462309110012' | mawk '1; 1' |
mawk '($++NF = (!_?_="+":_=__) $!__)^__' | ...
... | awk ' { print $0, "\f\r\t" ( ($1""==$2) ? "eq" : "ne" ) }'
1 59558711052462309110012 +59558711052462309110012
ne
2 59558711052462309110012 59558711052462309110012
eq