Unix Shell Script - AWK delimiter issue

Unix Shell Script - AWK delimiter issue - awk

I have following lines in a file. Please note I have intentionally kept the extra hash between 2 and 0 in the 2nd line.
File name : test.txt
Name#|#Age#|#Dept
AC#|#2#0#|#Science
BC#|#22#|#Commerce
I am using awk to get the data in Dept column
awk -F "#|#" -v c="Dept" 'NR==1{for (i=1; i<=NF; i++) if ($i==c){p=i; break}; next} {print $p}' "test.txt" >> result.txt
The result.txt shows me the following
|
Commerce
The first line is coming as pipe because if the extra # in the first line.
Can anyone help on this

Currently the meaning of the delimiter set is: match # or #. The pipe | character in this case acts as an OR statement; instead try using:
awk -F '#[|]#' ...
Putting | into a character class [ ... ] awk will match it literally.

If you desire to extract Dept in your content, here's a good choice you can choose from,
awk -F'#' 'NR>1{print $NF}' test.txt
output:
Science
Commerce

Related

How to extract word from a string that may/may not start with a single quote

Sample string:
'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'
cut -d' ' -f 1 string.txt gives me
'kernel-rt|kernel-alt|/kernel-'
But how do we proceed further to get just the 'kernel' from it?

Assuming you want only the 3rd kernel (in bold) and not the others
'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'
Here is how you extract it using single command awk (standard Linux gawk).
input="kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils"
echo $input|awk -F"|" '{split($3,a,"-");match(a[1],"[[:alnum:]]+",b);print b[0]}'
explanation
-F"|" specify field separator is | so that only is 3rd field required
split($3,a,"-") split 3rd field by -, left part assigned to a[1]
match(a[1],"[[:alnum:]]+",b) from a[1] extract sequence of alphanumeric string into b[0]
print b[0] output the matched string.
If you want to extract kernel from 2nd or 1st fields. Change $3 to $2 or $1.

$ cat file
'kernel-rt|kernel-alt|/kernel-' 'headers|xen|firmware|tools|python|utils'
$
$ awk '{print $1}' file
'kernel-rt|kernel-alt|/kernel-'
$
$ awk '{gsub(/\047/,"",$1); print $1}' file
kernel-rt|kernel-alt|/kernel-
$
$ awk '{gsub(/\047/,""); split($1,f,/[|]/); print f[1]}' file
kernel-rt
and just to make you think...
$ awk '{gsub(/\047|\.*/,"")}1' file
kernel-rt

awk printing the second to last record of a file

I have a file set up like
Words on
many line
%
More Words
on many lines
%
Even More Words
on many lines
%
and I would like to output the second to last record of this file where the record is delimited by % after each block of text.
I have used:
awk -v RS=\% ' END{ print NR }' $f
to find the number of records (1136). Then I did
awk -v RS=\% ' { print $(NR-1) }' $f
and
awk -v RS=\% ' { print $(NR=1135) }' $f
.
Neither of these worked, and, instead, displayed a record towards the beginning of the file and a many blank lines.
OUTPUT:
"You know, of course, that the Tasmanians, who never committed adultery, are
now extinct."
-- M. Somerset Maugham
"The
is
what
that
This output had many, many more blank lines and contains a record near the middle of the file.
awk -v RS=\% 'END{ print $(NR-1) }' $f
returns a blank line. The same command with different $(NR-x) values also returns a blank line.
Can someone help me to print the second to last record in this case?
Thanks

You can do:
awk '{this=last;last=$0} END{print this}' file
Or, if you don't mind having the entire file in memory:
awk '{a[NR]=$0} END{print a[NR-1]}' file
Or, if it is just line count (or record count) based, you can keep a rolling deletion going so you are not too piggish on memory:
$ seq 999999 | tail -2
999998
999999
$ seq 999999 | awk '{a[NR]=$0; delete a[NR-3]} END{print a[NR-1]}'
999998
If they are blocks of text the same method works if you can separate the blocks into delimited records.
Given:
$ echo "$txt"
Words on
many line
%
More Words
on many lines
%
Even More Words
on many lines
%
You can do:
$ echo "$txt" | awk -v RS=\% '{a[NR]=$0} END{print a[NR-1]}'
Even More Words
on many lines
$ echo "$txt" | awk -v RS=\% '{a[NR]=$0} END{print a[NR-2]}'
More Words
on many lines
If you want to not print the leading and trailing \n you can do:
$ echo "$txt" | awk 'BEGIN{RS="%\n"} {a[NR]=$0} END{printf a[NR-2]}'
Words on
many line
Finally, if you know the specific record you want to print, do it this way in awk:
$ seq 999999 | awk -v mrk=1135 'NR==mrk{print; exit}'
1135
If you want a random record, you can do:
$ awk -v min=1 -v max=1135 'BEGIN{srand()
RS="%\n"
tgt=int(min+rand()*(max-min+1))
}
NR==tgt{print; exit}' file

Does the solution have to be with awk? Just using head and tail would be simpler.
tail -2 file.txt | head 1 > justthatline.txt

The best way for this would be to use the BEGIN construct.
awk 'BEGIN{RS="%\n"; ORS="%\n"}(NR>=2){print}' file
RS and ORS set the input file and output record separators respectively.

match two pattern in log file and output as table

I need to match two patterns in a log file and need to get the output as a table if possible. The log file has several lines with the words I want to match, here is an example of the log file:
Seed for random set to: uuzTjCqMVRk=
--out /home/ALL/ADRL.GLND.FET-EnhA
--max-shift False
--min-shift False
p-value = 0.542
Seed for random set to: P2+shGCxj70=
--out /home/ALL/BLD.CD14.MONO-EnhA
--max-shift False
--min-shift False
p-value = 0.737
I would like to get an output like this (tab delimited to export as text file):
Group Pvalue
ADRL.GLND.FET-EnhA 0.542
BLD.CD14.MONO-EnhA 0.737
I would like to do it in bash if it is possible
EDIT:
This is what I have tried:
grep 'out' file.log | awk '{print $0}' > file1.txt
grep 'p-value' file.log | awk '{print $0}' > file2.txt
paste -d"\t" file1.txt file2.txt > pval.txt

$ awk -F'[/ ]' -v OFS='\t' 'BEGIN{print "Group","Pvalue"} (NR%7)==2{g=$NF} (NR%7)==6{print g, $NF}' file
Group Pvalue
ADRL.GLND.FET-EnhA 0.542
BLD.CD14.MONO-EnhA 0.737
or if you prefer:
$ awk -F'[/ ]+' -v OFS='\t' 'BEGIN{print "Group","Pvalue"} $2=="--out"{g=$NF} $1=="p-value"{print g, $NF}' file
Group Pvalue
ADRL.GLND.FET-EnhA 0.542
BLD.CD14.MONO-EnhA 0.737

Zero error-checking:
awk '/--out/ { sub(".*/","",$2);printf "%s\t",$2; } /p-value = / { print $3; }' < file.log
If a line has --out, prints the base name of the path followed by a tab. If a line has p-value =, prints the number and a newline.
awk is nice, in this case, because you can modify the lines you match. Thinking in terms of grep, you'd have to deploy additional tools (like sed) to get the parts you wanted, then reassemble them into a useful form. Your use of grep and paste is valiant, and with tweaking would work, at the cost of many more processes and deployed tools.
You could do this in one bigger block of awk pattern matching, which would be more bullet-proof. I'll leave as exercise for the reader.

AWK get specificic pattern

I have lines like this:
Volume.Free_IBM_LUN59_28D: 2072083693568
I would like to get only IBM_LUN59_28D from this line using awk.
Thanks

You can use sub to do substitutions on each input line, as per the following transcript:
pax> echo 'Volume.Free_IBM_LUN59_28D: 2072083693568' | awk '
...> {
...> sub (".*Free_", "");
...> sub (":.*", "");
...> print
...> }'
IBM_LUN59_28D
That command crosses multiple lines for readability but, if you're operating on a file and not too concerned about readability, you can just use the compressed version:
awk '{sub(".*Free_","");sub(":.*","");print}' inputFile
If you're amenable to non-awk solutions, you could also use sed:
sed -e 's/.*Free_//' -e 's/:.*//' inputFile
Note that both those solutions rely on your (somewhat sparse) test data. If your definition of "like" includes preceding textual segments other than Free_ or subsequent characters other than :, some more work may be needed.
For example, if you wanted the string between the first _ and the first :, you could use:
awk '{sub("[^_]*_","");sub(":.*","");print}'

With sed:
sed 's/[^_]*_\(.*\):.*/\1/'
Search for sequence of non _ characters followed by _ (this will match Volume.Free_), then another sequence of characters (this will match IBM_LUN59_28D, we group this for future use), followed by : and any char sequence. Substitute with the saved pattern (\1). That's it.
Sample:
$ echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | sed 's/[^_]*_\(.*\):.*/\1/'
IBM_LUN59_28D

Here is one awk
awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
Eks:
echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
IBM_LUN59_28D
It divides the line by Free_.
If line then have more than one field NF>1 then:
Split second field bye : and print first part a[1]

With awk:
echo "$val" | awk -F: '{print $1}' | awk -F. '{print $2}' | awk '{print substr($0,6)}'
where the given string is in $val.

How to preserve spaces in input fields with awk

I'm trying to do something pretty simple but its appears more complicated than expected...
I've lines in a text file, separated by the comma and that I want to output to another file, without the first field.
Input:
echo file1,item, 12345678 | awk -F',' '{OFS = ";";$1=""; print $0}'
Output:
;item; 12345678
As you can see the spaces before 12345678 are kind of merged into one space only.
I also tried with the cut command:
echo file1,item, 12345678 | cut -d, -f2-
and I ended up with the same result.
Is there any workaround to handle this?
Actually my entire script is as follows:
cat myfile | while read l_line
do
l_line="'$l_line'"
v_OutputFile=$(echo $l_line | awk -F',' '{print $1}')
echo $(echo $l_line | cut -d, -f2-) >> ${v_OutputFile}
done
But stills in l_line all spaces but one are removed. I also created the quotes inside the file but same result.

it has nothing to do with awk. quote the string in your echo:
#with quotes
kent$ echo 'a,b, c'|awk -F, -v OFS=";" '{$1="";print $0}'
;b; c
#without quotes
kent$ echo a,b, c|awk -F, -v OFS=";" '{$1="";print $0}'
;b; c

The problem is with your invocation of the echo command you're using to feed awk the test data above. The shell is looking at this command:
echo file1,item, 12345678
and treating file1,item, and 12345678 as two separate parameters to echo. echo just prints all its parameters, separated by one space.
If you were to quote the whitespace, as follows:
echo 'file1,item, 12345678'
the shell would interpret this as a single parameter to feed to echo, so you'd get the expected result.
Update after edit to OP - having seen your full script, you could do this entirely in awk:
awk -F, '{ OFS = "," ; f = $1 ; sub("^[^,]*,","") ; print $0 >> f }' myfile

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Unix Shell Script - AWK delimiter issue - awk

Currently the meaning of the delimiter set is: match # or #. The pipe | character in this case acts as an OR statement; instead try using: awk -F '#[|]#' ... Putting | into a character class [ ... ] awk will match it literally.

If you desire to extract Dept in your content, here's a good choice you can choose from, awk -F'#' 'NR>1{print $NF}' test.txt output: Science Commerce

Related

How to extract word from a string that may/may not start with a single quote

awk printing the second to last record of a file

match two pattern in log file and output as table

AWK get specificic pattern

How to preserve spaces in input fields with awk

Categories

Resources