Removing a space from a string with awk

How can I remove the space between the first name and last name in a string using awk?
Most of the examples are about piping data from the command line into awk, but I need to manipulate a string inside an awk script.
Convert this:
"steve john"
to:
"stevejohn"
I have a string variable that I ask the user to enter inside an awk script. I need to remove the spaces from it.

gsub is your friend. The following command basically does a global substitution of a regular expression (a single space in this case), replacing it with an empty string, on the target $0 (the whole line).
pax> echo "steve john" | awk '{ gsub (" ", "", $0); print}'
stevejohn
You can use any target, including one input by a user:
pax> awk 'BEGIN {getline xyzzy ; gsub(" ","", xyzzy) ; print xyzzy}'
hello there my name is pax
hellotheremynameispax
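If the string may contain tabs or runs of several spaces, the same gsub call can take a bracket expression instead of a single space. A minimal sketch, assuming the value is passed in through a hypothetical awk variable called name (the sample input is made up for illustration):

```shell
# Hypothetical input containing several spaces and a tab.
input=$(printf 'steve   john\tsmith')

# [[:space:]]+ matches any run of whitespace; gsub edits the variable in place.
result=$(awk -v name="$input" 'BEGIN { gsub(/[[:space:]]+/, "", name); print name }')

echo "$result"   # stevejohnsmith
```

Note that -v interprets backslash escapes in the value, so this form is best kept for input without backslashes.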

Use sed:
$ echo "steve john" | sed 's/ //g'
stevejohn
If you must use awk, do this:
$ echo "steve john" | gawk '{print $1 $2}'
stevejohn
Edit:
Inside a bash script, you can do this:
s="steve john" # user input
t=$(echo "$s" | gawk '{print $1 $2}')
echo "$t"

echo "john smith aaa " |\
awk 'BEGIN {FS=" "; OFS=""} {for(i=1;i<=NF;++i) {out = out OFS $i}} END {print out;}'
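The loop above concatenates every field of every input line into one string and prints it once at the end. If one output line per input line is wanted instead, a shorter sketch (an alternative, not the method used in the answer) sets OFS to the empty string and makes awk rebuild the record. The second input line is a made-up sample:

```shell
# Assigning to a field ($1 = $1) makes awk rebuild $0 using OFS.
# OFS is empty here, so the fields are joined with nothing between them.
result=$(printf '%s\n' "john smith aaa " "steve  john" |
    awk -v OFS= '{ $1 = $1; print }')

echo "$result"
# johnsmithaaa
# stevejohn
```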

Related

How to extract string from a file in bash

I have a file called DB_create.sql which has this line
CREATE DATABASE testrepo;
I want to extract only testrepo from this. So I've tried
cat DB_create.sql | awk '{print $3}'
This gives me testrepo;
I need only testrepo. How do I get this ?
With your shown samples, please try following.
awk -F'[ ;]' '{print $(NF-1)}' DB_create.sql
OR
awk -F'[ ;]' '{print $3}' DB_create.sql
OR without setting any field separators try:
awk '{sub(/;$/,"");print $3}' DB_create.sql
A simple explanation: set the field separator to a space OR a semicolon, then print the second-to-last field, $(NF-1), which is what the OP needs here. Also, you don't need cat with awk, because awk can read Input_file by itself.
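The parentheses in $(NF-1) matter. A small sketch of the difference, using the line from the question:

```shell
line='CREATE DATABASE testrepo;'

# With -F'[ ;]' the line splits into CREATE, DATABASE, testrepo and an
# empty fourth field after the semicolon, so NF is 4.
second_last=$(printf '%s\n' "$line" | awk -F'[ ;]' '{ print $(NF-1) }')

# $NF-1 parses as ($NF)-1: the empty last field counts as 0, giving -1.
minus_one=$(printf '%s\n' "$line" | awk -F'[ ;]' '{ print $NF-1 }')

echo "$second_last"   # testrepo
echo "$minus_one"     # -1
```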
Using gnu awk, you can set record separator as ; + line break:
awk -v RS=';\r?\n' '{print $3}' file.sql
testrepo
Or using any POSIX awk, just do a call to sub to strip trailing ;:
awk '{sub(/;$/, "", $3); print $3}' file.sql
testrepo
You can use
awk -F'[;[:space:]]+' '{print $3}' DB_create.sql
where the field separator is set to a [;[:space:]]+ regex that matches one or more occurrences of ; or/and whitespace chars. Then, Field 3 will contain the string you need without the semi-colon.
More pattern details:
[ - start of a bracket expression
; - a ; char
[:space:] - any whitespace char
] - end of the bracket expression
+ - a POSIX ERE one or more occurrences quantifier.
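To see why the + quantifier is useful, here is a small sketch with a made-up input line that has several spaces and a tab between the words:

```shell
# With +, a whole run of separators counts as one field boundary.
result=$(printf 'CREATE   DATABASE\ttestrepo;\n' |
    awk -F'[;[:space:]]+' '{ print $3 }')
echo "$result"            # testrepo

# Without +, every single separator character ends a field, so the run
# of spaces produces empty fields and $3 is empty.
without_plus=$(printf 'CREATE   DATABASE\ttestrepo;\n' |
    awk -F'[;[:space:]]' '{ print $3 }')
echo "[$without_plus]"    # []
```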
Use your own code but adding the function sub():
cat DB_create.sql | awk '{sub(/;$/, "",$3);print $3}'
Although it's better not to use cat. Here you can see why: Comparison of cat pipe awk operation to awk command on a file
So better this way:
awk '{sub(/;$/, "",$3);print $3}' file

Using pipe character as a field separator

I'm trying different commands to process csv file where the separator is the pipe | character.
While those commands do work when the comma is a separator, it throws an error when I replace it with the pipe:
awk -F[|] "NR==FNR{a[$2]=$0;next}$2 in a{ print a[$2] [|] $4 [|] $5 }" OFS=[|] file1.csv file2.csv
awk "{print NR "|" $0}" file1.csv
I tried, "|", [|], /| to no avail.
I'm using Gawk on Windows. What am I missing?
You tried "|", [|] and /|. /| does not work because the escape character is \, not /. [] defines a bracket expression, a set of characters any one of which matches; for example, use [,-] if you want FS to be either , or -.
To make it work, "|" is fine; are you sure you used it this way? Alternatively, escape it as \|:
$ echo "he|llo|how are|you" | awk -F"|" '{print $1}'
he
$ echo "he|llo|how are|you" | awk -F\| '{print $1}'
he
$ echo "he|llo|how are|you" | awk 'BEGIN{FS="|"} {print $1}'
he
But then note that when you write:
print a[$2] [|] $4 [|] $5
[|] is not valid awk syntax, so you are not inserting any delimiter. As you already defined OFS, do:
print a[$2], $4, $5
Example:
$ cat a
he|llo|how are|you
$ awk 'BEGIN {FS=OFS="|"} {print $1, $3}' a
he|how are
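Putting these pieces together, the join from the question might look like the sketch below. The contents of file1.csv and file2.csv are made up for illustration, since the question does not show them:

```shell
tmpdir=$(mktemp -d)
# Hypothetical sample data; the second field is the join key.
printf '%s\n' '1|k1|x' '2|k2|y'         > "$tmpdir/file1.csv"
printf '%s\n' 'a|k1|b|c|d' 'a|k9|b|c|d' > "$tmpdir/file2.csv"

# FS and OFS are both "|"; the commas in print insert OFS between the values.
joined=$(awk 'BEGIN { FS = OFS = "|" }
    NR == FNR { a[$2] = $0; next }          # first file: remember each line by key
    $2 in a   { print a[$2], $4, $5 }       # second file: print matches only
' "$tmpdir/file1.csv" "$tmpdir/file2.csv")

echo "$joined"   # 1|k1|x|c|d
rm -rf "$tmpdir"
```

The program is single-quoted here, as a POSIX shell expects; the double-quoted form in the question lets the shell expand $2 and $0 before awk sees them.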
For anyone finding this years later: ALWAYS QUOTE SHELL METACHARACTERS!
I think gawk (GNU awk) treats | specially, so it should be quoted (for awk). OP had this right with [|]. However [|] is also a shell pattern. Which in bash at least, will only expand if it matches a file in the current working directory:
$ cd /tmp
$ echo -F[|] # Same command
-F[|]
$ touch -- '-F|'
$ echo -F[|] # Different output
-F|
$ echo '-F[|]' # Good quoting
-F[|] # Consistent output
So it should be:
awk '-F[|]'
# or
awk -F '[|]'
awk -F "[|]" would also work, but IMO, only use soft quotes (") when you have something to actually expand (or the string itself contains hard quotes ('), which can't be nested in any way).
Note that the same thing happens if these characters are inside unquoted variables.
If text or a variable contains, or may contain: []?*, quote it, or set -f to turn off pathname expansion (a single, unmatched square bracket is technically OK, I think).
If a variable contains, or may contain an IFS character (space, tab, new line, by default), quote it (unless you want it to be split). Or export IFS= first (bearing the consequences), if quoting is impossible (eg. a crazy eval).
Note: raw text is always split by white space, regardless of IFS.
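A short sketch of the variable case, run in a scratch directory with two made-up file names, shows the three behaviours side by side:

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a.txt" "$tmpdir/b.txt"

pat='*.txt'

# Unquoted: the shell expands the pattern against the files in the directory.
unquoted=$(cd "$tmpdir" && echo $pat)
# Quoted: the text is passed through unchanged.
quoted=$(cd "$tmpdir" && echo "$pat")
# set -f turns pathname expansion off, so even the unquoted form is left alone.
noglob=$(cd "$tmpdir" && set -f && echo $pat)

echo "unquoted: $unquoted"   # unquoted: a.txt b.txt
echo "quoted:   $quoted"     # quoted:   *.txt
echo "set -f:   $noglob"     # set -f:   *.txt
rm -rf "$tmpdir"
```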
Try to escape the |
echo "more|data" | awk -F\| '{print $1}'
more
You can escape the | as \|
$ cat test
hello|world
$ awk -F\| '{print $1, $2}' test
hello world

awk command with BEGIN does not work for me

This is the simple awk command I am trying to write:
grep "Inputs - " access.log | awk 'BEGIN { FS = "Inputs -" } ; { print $2 }'
I am trying to grep the file access.log for all the lines containing "Inputs -" and then use awk to print the part after "Inputs -". This gives the following error:
awk: syntax error near line 1
awk: bailing out near line 1
I am confused about what the issue is; this should work!
I have also tried the following, and it does not work either:
grep "Inputs - " L1Access.log | awk -F='Inputs' '{print $1}'
Here is a sample input text file
This is line number 1. I dont want this line to be part of grep output
This is line number 2. I want this line to be part of grep output. This has "Input -", I want to display only the part after "Input -" from this line using awk
Your problem cannot be reproduced here:
kent$ cat f
foo - xxx
foo - yyy
foo - zzz
fooba
kent$ grep 'foo - ' f| awk 'BEGIN { FS = "foo -"};{print $2}'
xxx
yyy
zzz
There must be something wrong in your awk code. Besides, if you want grep and awk only to extract the part after your Inputs -, grep can do it in a single shot:
kent$ grep -Po 'foo - \K.*' f
xxx
yyy
zzz
Since you stated you want everything after the first instance "Inputs -", and since your grep is unnecessary:
nawk -F"Inputs -" 'BEGIN {OFS="Inputs -"} {line=""}; { for(i=2;i<=NF;i++) line=line OFS $i} {print line}' test
Your own answer prints only the second field. If a line contains more than one "Inputs -", you will lose the rest of the line. If you don't want the second (or third, ...) "Inputs -" in the output, you could use:
nawk -F"Inputs -" '{ for(i=2;i<=NF;i++) print $i}' test
OK folks, I see what my issue is. I am using Solaris, where the default awk does not support regular expressions in the field separator, meaning the separator cannot be more than one character. So I used nawk.
Please refer to this Stack Overflow post.
grep "Inputs - " L1Access.log | nawk 'BEGIN { FS = "Inputs -" } { print $2 }'
this worked.
It is not clear what you want to get. Here is a sample file:
cat file
test Inputs - more data
Here is nothing to get
yes Inputs - This is what we need Inputs - but what about this?
You can then use awk to get data:
awk -F"Inputs - " 'NF>1 {print $2}' file
more data
This is what we need
or like this?
awk -F"Inputs - " 'NF>1 {print $NF}' file
more data
but what about this?
Setting the separator to "Inputs - " and testing for NF>1 prints only the lines that contain Inputs -.
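If the goal is everything after the first "Inputs - ", later occurrences included, one possible sketch avoids field splitting altogether and uses index and substr. The variable name marker is an assumption made for this example:

```shell
line='yes Inputs - This is what we need Inputs - but what about this?'

rest=$(printf '%s\n' "$line" | awk -v marker='Inputs - ' '{
    pos = index($0, marker)                      # 0 when the marker is absent
    if (pos) print substr($0, pos + length(marker))
}')

echo "$rest"   # This is what we need Inputs - but what about this?
```

Because index does a plain string search, this does not depend on regex support in the field separator.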

How to preserve spaces in input fields with awk

I'm trying to do something pretty simple, but it appears more complicated than expected...
I've lines in a text file, separated by the comma and that I want to output to another file, without the first field.
Input:
echo file1,item,     12345678 | awk -F',' '{OFS = ";";$1=""; print $0}'
Output:
;item; 12345678
As you can see the spaces before 12345678 are kind of merged into one space only.
I also tried with the cut command:
echo file1,item,     12345678 | cut -d, -f2-
and I ended up with the same result.
Is there any workaround to handle this?
Actually my entire script is as follows:
cat myfile | while read l_line
do
l_line="'$l_line'"
v_OutputFile=$(echo $l_line | awk -F',' '{print $1}')
echo $(echo $l_line | cut -d, -f2-) >> ${v_OutputFile}
done
But all spaces but one in l_line are still removed. I also tried putting the quotes inside the file, with the same result.
It has nothing to do with awk. Quote the string in your echo:
#with quotes
kent$ echo 'a,b,     c'|awk -F, -v OFS=";" '{$1="";print $0}'
;b;     c
#without quotes
kent$ echo a,b,     c|awk -F, -v OFS=";" '{$1="";print $0}'
;b; c
The problem is with your invocation of the echo command you're using to feed awk the test data above. The shell is looking at this command:
echo file1,item,     12345678
and treating file1,item, and 12345678 as two separate parameters to echo. echo just prints all its parameters, separated by one space.
If you were to quote the whitespace, as follows:
echo 'file1,item,     12345678'
the shell would interpret this as a single parameter to feed to echo, so you'd get the expected result.
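The word splitting described here can be checked directly by counting the arguments a command receives. A small sketch, using a hypothetical helper function count_args:

```shell
# Hypothetical helper: prints how many arguments it was called with.
count_args() { echo "$#"; }

# Unquoted: the run of spaces separates two arguments.
unquoted=$(count_args file1,item,     12345678)
# Quoted: the whole string, spaces included, is one argument.
quoted=$(count_args 'file1,item,     12345678')

echo "$unquoted"   # 2
echo "$quoted"     # 1
```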
Update after edit to OP - having seen your full script, you could do this entirely in awk:
awk -F, '{ OFS = "," ; f = $1 ; sub("^[^,]*,","") ; print $0 >> f }' myfile
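If the loop has to stay in the shell, a sketch along the following lines keeps the spaces intact. It assumes the first field is a usable file name, and it runs in a scratch directory with one made-up input line:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'file1,item,     12345678' > "$tmpdir/myfile"

(
    cd "$tmpdir" || exit 1
    # IFS= keeps leading and trailing blanks; -r keeps backslashes literal.
    while IFS= read -r line; do
        outfile=${line%%,*}     # text before the first comma
        rest=${line#*,}         # text after the first comma
        printf '%s\n' "$rest" >> "$outfile"
    done < myfile
)

written=$(cat "$tmpdir/file1")
echo "$written"   # item,     12345678
rm -rf "$tmpdir"
```

The key points are the double quotes around every expansion and printf instead of an unquoted echo.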

printing variable inside awk

In this script , I want awk to print the variables $file, $f, $order and sum/NR (all in a single row)
#!/bin/bash
for file in pmb_mpi tau xhpl mpi_tile_io fftw ; do
for f in 2.54 1.60 800 ;do
if [ ${f} = 2.54 ]
then
for order in even odd ; do
# echo ${file}_${f}_${order}_v1.xls >> P-state-summary.xls
awk '{sum+=$2} END {print ${file}_${f}_${order}_v1.xls, sum/NR}' ${file}_${f}_${order}_v1.xls >> P-state-summary.xls
done
else
# echo ${file}_${f}_v1.xls >> P-state-summary.xls
awk '{sum+=$2} END {print ${file}_${f}_v1.xls , sum/NR}' ${file}_${f}_v1.xls >> P-state-summary.xls
fi
done
done
Could anyone kindly help me with this?
awk doesn't go out and get shell variables for you, you have to pass them in as awk variables:
pax> export x=XX
pax> export y=YY
pax> awk 'BEGIN{print x "_" y}'
_
pax> awk -vx=$x -v y=$y 'BEGIN{print x "_" y}'
XX_YY
There is another way of doing it by using double quotes instead of single quotes (so that bash substitutes the values before awk sees them), but then you have to start escaping $ symbols and all sorts of other things in your awk command:
pax> awk "BEGIN {print \"${x}_${y}\"}"
XX_YY
I prefer to use explicit variable creation.
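Applied to the averaging task in the question, explicit variable creation might look like the sketch below. The file name and its contents are made up for illustration. ENVIRON is shown as a second option that reads an environment variable without -v:

```shell
tmpdir=$(mktemp -d)
base="xhpl_800_v1.xls"                      # hypothetical file name
printf '%s\n' 'a 2' 'b 4' 'c 6' > "$tmpdir/$base"

# -v copies the shell value into the awk variable "name" before the program runs.
via_v=$(awk -v name="$base" '{ sum += $2 } END { print name, sum / NR }' "$tmpdir/$base")

# ENVIRON gives awk access to variables exported into its environment.
via_env=$(label="$base" awk '{ sum += $2 } END { print ENVIRON["label"], sum / NR }' "$tmpdir/$base")

echo "$via_v"     # xhpl_800_v1.xls 4
echo "$via_env"   # xhpl_800_v1.xls 4
rm -rf "$tmpdir"
```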
By the way, there's another solution to your previous related question here which should work.
You can do this:
echo -n "${file}_${f}_${order}_v1.xls " >> P-state-summary.xls
# or printf '%s ' "${file}_${f}_${order}_v1.xls" >> P-state-summary.xls
awk '{sum+=$2} END {print sum/NR}' "${file}_${f}_${order}_v1.xls" |
tee "${file}_${f}_avrg.xls" >> P-state-summary.xls
Using echo -n or printf without a "\n" will output the text without a newline so the output of the awk command will follow it on the same line. I added a space as a separator, but you could use anything.
Using tee will allow you to write your output to the individual files and the summary file using only one awk invocation per input (order) file.