String concatenation doesn't work in gawk print instruction - awk

I have the following grep and gawk line running in windows
grep ItemDischarged D:\systems\CmcComRouting.log | gawk -v OFS=, "{print $8}" | cut -d ">" -f 1 | uniq -c | gawk -v OFS=, "{print $1,$2}" > d:\03TotalItems.log
the output is as follows
59523,ItemDischargedTlg
What I want to do is add "Lower" to the end of "ItemDischargedTlg", but I cannot figure out how to do it. I have tried
{print $1,$2"Lower"}
but it prints nothing.
Thanks

This might do the trick:
gawk -v OFS=, '{$2=$2"Lower";print $1,$2}'
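When mixing commas and concatenation in a print statement, note that concatenation binds more tightly than the comma: $2"Lower" is joined first and then printed as the second comma-separated item. Assigning to $2 beforehand, as above, makes that explicit.
On Windows, be careful with " and ': cmd.exe does not treat ' as a quote character, so the program has to stay inside double quotes and the inner double quotes have to be escaped, typically as \"Lower\".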
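To make the concatenation concrete, here is a minimal sketch run from a POSIX shell, where single quotes protect the awk program. The sample line stands in for the output of uniq -c in the question; the Windows form in the comment is an assumption and has not been tested here.

```shell
# Sample input standing in for the "uniq -c" output from the question.
line='59523 ItemDischargedTlg'

# $2 "Lower" is concatenated first; the comma then inserts OFS between the two items.
out=$(printf '%s\n' "$line" | awk -v OFS=, '{ print $1, $2 "Lower" }')
printf '%s\n' "$out"

# Assumed cmd.exe equivalent (untested here): inner quotes escaped with a backslash.
#   gawk -v OFS=, "{print $1,$2\"Lower\"}"
```

Assigning $2=$2"Lower" first, as in the answer, should give the same line.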

Related

Is using awk at least 'awk -F' always will be fine?

What is the difference on Ubuntu between awk and awk -F? For example to display the frequency of the cpu core 0 we use the command
cat /proc/cpuinfo | grep -i "^cpu MHz" | awk -F ":" '{print $2}' | head -1
But why does it use awk -F? We could run awk without -F and it would of course still work (already tested).
Because without -F, awk would not know which separator to split on, and so could not print the right value. -F specifies the field separator for this awk invocation. Without it, awk falls back to the default separator, which is whitespace. For example, in ps | grep xeyes | awk '{print $1}' awk splits on the spaces, so the first field is the PID of the xeyes process. I found this at https://www.shellunix.com/awk.html. Thanks to all.
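The difference is easy to see on a single line. The sketch below uses a made-up cpuinfo line (the 2400.000 value is hypothetical) and prints the second field once with the default separator and once with -F ":".

```shell
# Hypothetical line in the style of /proc/cpuinfo (tabs before the colon).
line=$(printf 'cpu MHz\t\t: 2400.000')

# Default separator: runs of spaces and tabs, so the fields are cpu, MHz, :, 2400.000
default_f2=$(printf '%s\n' "$line" | awk '{ print $2 }')

# With -F ":" there are only two fields; the second keeps its leading space.
colon_f2=$(printf '%s\n' "$line" | awk -F ":" '{ print $2 }')

printf 'default: [%s]\n' "$default_f2"
printf 'colon:   [%s]\n' "$colon_f2"
```

Both commands run without error, but only the -F ":" version returns the frequency.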

Using awk to print without double quotes

I would like to get the right value of the following command as a string without double quotes.
$ grep '^VERSION=' /etc/os-release
VERSION="20.04.3 LTS (Focal Fossa)"
When I pipe it with the following awk, I don't get the desired output.
$ grep '^VERSION=' /etc/os-release | awk '{print $0}'
VERSION="20.04.3 LTS (Focal Fossa)"
$ grep '^VERSION=' /etc/os-release | awk '{print $1}'
VERSION="20.04.3
$ grep '^VERSION=' /etc/os-release | awk '{print $2}'
LTS
How can I fix that?
You may use this single awk command:
awk -F= '$1=="VERSION" {gsub(/"/, "", $2); print $2}' /etc/os-release
20.04.3 LTS (Focal Fossa)
1st solution: With your shown samples, please try following awk code.
awk 'match($0,/^VERSION="[^"]*/){print substr($0,RSTART+9,RLENGTH-9)}' Input_file
Explanation: match() finds VERSION=" at the start of the line plus everything up to, but not including, the next ". substr() then skips the 9 characters of VERSION=" and prints the rest of the match, which is the desired output for the OP's samples.
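As a quick check of the RSTART/RLENGTH arithmetic, the same idea can be run against the sample line from the question instead of a file. This is only a sketch; the variable names are illustrative.

```shell
# Sample line copied from the question.
s='VERSION="20.04.3 LTS (Focal Fossa)"'

# match() sets RSTART (where the match begins) and RLENGTH (how long it is).
# The prefix VERSION=" is 9 characters long, so skip 9 and shorten the length by 9.
version=$(printf '%s\n' "$s" |
  awk 'match($0, /^VERSION="[^"]*/) { print substr($0, RSTART + 9, RLENGTH - 9) }')

printf '%s\n' "$version"
```

This form uses only the two-argument match(), so it does not depend on GNU awk.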
2nd solution: Using GNU grep with PCRE regex enabled option try following.
grep -oP '^VERSION="\K[^"]*' Input_file
3rd solution: Using awk's capability to set different field separators and then check conditions accordingly and print values.
awk -F'"' '$1=="VERSION="{print $2}' Input_file
Assuming that "the right value" you want output is 20.04.3:
$ awk -F'[" ]' '/^VERSION=/{print $2}' file
20.04.3
or if it's the whole quoted string:
$ awk -F'"' '/^VERSION=/{print $2}' file
20.04.3 LTS (Focal Fossa)
With GNU awk (the third argument to match() is a gawk extension), you can use a command like
awk 'match($0, /^VERSION="([^"]*)"/, m) {print m[1]}' /etc/os-release
Here, ^VERSION="([^"]*)" matches VERSION=" at the start of the string (^), then captures into Group 1 any zero or more chars other than " (with ([^"]*)) and then matches ". The match is saved in m where m[1] holds the Group 1 value.
Or, sed like
sed -n '/^VERSION="\([^"]*\)".*/s//\1/p' /etc/os-release
See an online test:
s='VERSION="20.04.3 LTS (Focal Fossa)"'
awk 'match($0, /^VERSION="([^"]*)"/, m) {print m[1]}' <<< "$s"
sed -n '/^VERSION="\([^"]*\)".*/s//\1/p' <<< "$s"
Here, the -n option suppresses the default line output, /^VERSION="\([^"]*\)".*/ matches a string starting with VERSION=", then captures into Group 1 any zero or more chars other than ", and then matches " and the rest of the string; the whole match is replaced with the Group 1 value. // means the previous regex pattern is reused. p prints only the result of the substitution.
Both output 20.04.3 LTS (Focal Fossa).
Since the file /etc/os-release conforms to a variable assignment in bash or the shell in general (POSIX), sourcing it should do the job.
source /etc/os-release; echo "$VERSION"
Using a subshell, in case one does not want to pollute the current environment variables.
( source /etc/os-release; echo "$VERSION" )
Assigning it to a variable.
version=$( source /etc/os-release; echo "$VERSION" )
If the shell you're using does not conform to POSIX.
sh -c '. /etc/os-release; echo "$VERSION"'
See your local man page if available.
man 5 os-release
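The sourcing approach can be tried without touching the real /etc/os-release. The sketch below writes a small stand-in file (its contents are an assumption modelled on the output shown in the question) and reads VERSION from it inside a subshell.

```shell
# Stand-in for /etc/os-release; contents are assumed for illustration only.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
EOF

# The command substitution runs in a subshell, so sourcing the file
# does not set NAME or VERSION in the calling shell.
version=$( . "$tmpfile"; printf '%s\n' "$VERSION" )

printf '%s\n' "$version"
rm -f "$tmpfile"
```

The shell strips the double quotes while parsing the assignment, so no awk or sed step is needed.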

Regexp in gawk matches multiple ways

I have some text I need to split up to extract the relevant argument, and my [g]awk match command does not behave - I just want to understand why?! (I have written a less elegant way around it now...).
So the string is blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header
I want to output just the contents of msgcontent1=, so did
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | gawk '{ if (match($0,/msgcontent1=(.*)[|]/,a)) { print a[1]; } }'
The trouble is that instead of getting
HeaderUUIiewConsenFlagPSMessage
I get a match that runs from there to the last pipe of the string: HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002
Now I accept this is because the regexp in /msgcontent1=(.*)[|]/ can match multiple ways, but HOW do I make it match the way I want it to??
With your shown samples please try following. Written and tested in GNU awk this will print only contents from msgcontent1= till | first occurrence.
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
OR with echo + awk try:
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" |
awk 'match($0,/msgcontent1=[^|]*/){print substr($0,RSTART+12,RLENGTH-12)}'
With FPAT option in GNU awk:
awk -v FPAT='msgcontent1=[^|]*' '{sub(/.*=/,"",$1);print $1}' Input_file
This is your input:
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'
You may use gnu awk like this to extract value after msgcontent1=:
awk -F= -v RS='|' '$1 == "msgcontent1" {print $2}' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
or using this sed:
sed -E 's/^(.*\|)?msgcontent1=([^|]+).*/\2/' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
Or using this gnu grep:
grep -oP '(^|\|)msgcontent1=\K[^|]+' <<< "$s"
HeaderUUIiewConsenFlagPSMessage
echo "blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header" | awk '{ if (match($0,/msgcontent1=([^\|]*)/,a)) print a[1] }'
this prints HeaderUUIiewConsenFlagPSMessage
The reason your regex matches msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002 is that matching is greedy ('hungry'): .* always takes the longest possible match, so it runs up to the last | rather than the first.
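The greedy behaviour can be seen side by side with the bounded pattern. This sketch uses the two-argument match() so it should run in any POSIX awk; the variable names are illustrative.

```shell
s='blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header'

# Greedy: .* runs to the LAST pipe. Skip the 12 characters of "msgcontent1="
# and drop the trailing pipe (hence RLENGTH - 13).
greedy=$(printf '%s\n' "$s" |
  awk 'match($0, /msgcontent1=.*[|]/) { print substr($0, RSTART + 12, RLENGTH - 13) }')

# Bounded: [^|]* cannot cross a pipe, so the match stops before the first one.
bounded=$(printf '%s\n' "$s" |
  awk 'match($0, /msgcontent1=[^|]*/) { print substr($0, RSTART + 12, RLENGTH - 12) }')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

POSIX regular expressions have no lazy quantifier, so excluding the delimiter with a negated bracket expression is the usual way to stop at the first occurrence.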
Also with awk:
echo 'blahblah|msgcontent1=HeaderUUIiewConsenFlagPSMessage|msgtype2=Blah002|msgcontent2=header' | awk -v FS='[=|]' '$2 == "msgcontent1" {print $3}'
HeaderUUIiewConsenFlagPSMessage

Using pipe character as a field separator

I'm trying different commands to process csv file where the separator is the pipe | character.
While those commands do work when the comma is a separator, it throws an error when I replace it with the pipe:
awk -F[|] "NR==FNR{a[$2]=$0;next}$2 in a{ print a[$2] [|] $4 [|] $5 }" OFS=[|] file1.csv file2.csv
awk "{print NR "|" $0}" file1.csv
I tried, "|", [|], /| to no avail.
I'm using Gawk on windows. What I'm I missing?
You tried "|", [|] and /|. /| does not work because the escape character is \, not /. [] defines a bracket expression, i.e. a set of characters, for example [,-] if you want FS to be either , or -.
"|" is fine to make it work; are you sure you used it this way? Alternatively, escape it as \|:
$ echo "he|llo|how are|you" | awk -F"|" '{print $1}'
he
$ echo "he|llo|how are|you" | awk -F\| '{print $1}'
he
$ echo "he|llo|how are|you" | awk 'BEGIN{FS="|"} {print $1}'
he
But then note that when you write:
print a[$2] [|] $4 [|] $5
you are not using any delimiter at all. As you have already defined OFS, do:
print a[$2], $4, $5
Example:
$ cat a
he|llo|how are|you
$ awk 'BEGIN {FS=OFS="|"} {print $1, $3}' a
he|how are
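Putting the two pieces together, the two-file lookup from the question might look like the sketch below. The file contents are invented for illustration, since the question does not show them; only the awk program structure comes from the original command.

```shell
# Invented sample data; the real file1.csv and file2.csv are not shown in the question.
tmp=$(mktemp -d)
printf '%s\n' 'r1|k1|alpha' 'r2|k2|beta'         > "$tmp/file1.csv"
printf '%s\n' 'x|k1|y|100|200' 'x|k3|y|300|400'  > "$tmp/file2.csv"

# FS and OFS are both "|"; the commas in print insert OFS between the items.
joined=$(awk 'BEGIN { FS = OFS = "|" }
  NR == FNR { a[$2] = $0; next }      # first file: remember each line by its 2nd field
  $2 in a   { print a[$2], $4, $5 }   # second file: print matches with fields 4 and 5
' "$tmp/file1.csv" "$tmp/file2.csv")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

Setting the separators in a BEGIN block keeps the | out of the shell command line entirely, which sidesteps the quoting question.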
For anyone finding this years later: ALWAYS QUOTE SHELL METACHARACTERS!
I think gawk (GNU awk) treats | specially, so it should be quoted (for awk). OP had this right with [|]. However, [|] is also a shell pattern, which in bash at least will only expand if it matches a file in the current working directory:
$ cd /tmp
$ echo -F[|] # Same command
-F[|]
$ touch -- '-F|'
$ echo -F[|] # Different output
-F|
$ echo '-F[|]' # Good quoting
-F[|] # Consistent output
So it should be:
awk '-F[|]'
# or
awk -F '[|]'
awk -F "[|]" would also work, but IMO, only use soft quotes (") when you have something to actually expand (or the string itself contains hard quotes ('), which can't be nested in any way).
Note that the same thing happens if these characters are inside unquoted variables.
If text or a variable contains, or may contain: []?*, quote it, or set -f to turn off pathname expansion (a single, unmatched square bracket is technically OK, I think).
If a variable contains, or may contain an IFS character (space, tab, new line, by default), quote it (unless you want it to be split). Or export IFS= first (bearing the consequences), if quoting is impossible (eg. a crazy eval).
Note: raw text is always split by white space, regardless of IFS.
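The pathname-expansion effect can be reproduced in a scratch directory. The sketch below uses a comma inside the brackets rather than |, purely to keep it simple; the directory and variable names are illustrative.

```shell
dir=$(mktemp -d)

# No file matches the pattern yet, so the unquoted word is passed through unchanged.
before=$(cd "$dir" && printf '%s\n' -F[,])

# Create a file whose name matches the pattern -F[,]
touch -- "$dir/-F,"

# Same unquoted word, but now the shell replaces it with the matching file name.
after=$(cd "$dir" && printf '%s\n' -F[,])

# Quoted: never subject to pathname expansion.
quoted=$(cd "$dir" && printf '%s\n' '-F[,]')

printf 'before: %s\nafter:  %s\nquoted: %s\n' "$before" "$after" "$quoted"
rm -rf "$dir"
```

Only the quoted form gives the same result regardless of what happens to be in the current directory.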
Try to escape the |
echo "more|data" | awk -F\| '{print $1}'
more
You can escape the | as \|
$ cat test
hello|world
$ awk -F\| '{print $1, $2}' test
hello world

How to preserve spaces in input fields with awk

I'm trying to do something pretty simple, but it appears more complicated than expected...
I have lines in a text file, separated by commas, that I want to output to another file without the first field.
Input:
echo file1,item, 12345678 | awk -F',' '{OFS = ";";$1=""; print $0}'
Output:
;item; 12345678
As you can see the spaces before 12345678 are kind of merged into one space only.
I also tried with the cut command:
echo file1,item, 12345678 | cut -d, -f2-
and I ended up with the same result.
Is there any workaround to handle this?
Actually my entire script is as follows:
cat myfile | while read l_line
do
l_line="'$l_line'"
v_OutputFile=$(echo $l_line | awk -F',' '{print $1}')
echo $(echo $l_line | cut -d, -f2-) >> ${v_OutputFile}
done
But in l_line all spaces but one are still removed. I also put the quotes inside the file, with the same result.
It has nothing to do with awk. Quote the string in your echo:
#with quotes
kent$ echo 'a,b,    c'|awk -F, -v OFS=";" '{$1="";print $0}'
;b;    c
#without quotes
kent$ echo a,b,    c|awk -F, -v OFS=";" '{$1="";print $0}'
;b; c
The problem is with your invocation of the echo command you're using to feed awk the test data above. The shell is looking at this command:
echo file1,item, 12345678
and treating file1,item, and 12345678 as two separate parameters to echo. echo just prints all its parameters, separated by one space.
If you were to quote the whitespace, as follows:
echo 'file1,item, 12345678'
the shell would interpret this as a single parameter to feed to echo, so you'd get the expected result.
Update after edit to OP - having seen your full script, you could do this entirely in awk:
awk -F, '{ OFS = "," ; f = $1 ; sub("^[^,]*,","") ; print $0 >> f }' myfile
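If the loop has to stay in the shell, the same quoting rule applies inside it. The following is a sketch of one possible rewrite, not the original poster's script: it swaps the awk and cut calls for parameter expansion, and the sample line and scratch directory are made up.

```shell
workdir=$(mktemp -d)
# Made-up input line with several spaces after the second comma.
printf '%s\n' 'file1,item,     12345678' > "$workdir/myfile"

(
  cd "$workdir" || exit 1
  # IFS= and -r stop read from trimming whitespace or interpreting backslashes.
  while IFS= read -r l_line; do
    v_OutputFile=${l_line%%,*}                        # text before the first comma
    printf '%s\n' "${l_line#*,}" >> "$v_OutputFile"   # rest of the line, quoted
  done < myfile
)

result=$(cat "$workdir/file1")
printf '[%s]\n' "$result"
rm -rf "$workdir"
```

Every expansion of l_line is double-quoted, so the shell never gets the chance to split it on whitespace and rejoin it with single spaces.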