what does the field separator in awk do here?

In this context, how does the specified field separator work?
awk -F\' '{print "conn kill "$2"\nrepair mailbox "$2" repair=1"}'

-F\' is using the single quote ' as the field separator.
The ' is escaped with a preceding \ so that the shell does not treat it as the start of a quoted string; awk then receives -F' as its argument.
Alternatively you can enclose the ' in double quotes:
$ echo "foo'bar'baz" | awk -F\' '{print $1}'
foo
$ echo "foo'bar'baz" | awk -F"'" '{print $1}'
foo

AWK processes a file line by line, and each line is split into fields that you can access with the dollar variables $1, $2, ... up to $NF ($0 is the whole line). By default, the line is split on whitespace, but you can choose the separator with the -F command line option or the FS variable.
So in your case, the field separator is set to a single quote ('). An input line like foo'bar'baz will thus set $1 == "foo", $2 == "bar" and $3 == "baz".
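To see the split in action with the command from the question, here is a minimal sketch. The input line is made up for illustration (the question does not show the real input); only the awk program is taken from the question.

```shell
# Hypothetical input line; the real input format is not shown in the question.
line="mailbox 'alice' is locked"

# With ' as the separator: $1 = "mailbox ", $2 = "alice", $3 = " is locked"
out=$(printf '%s\n' "$line" |
  awk -F\' '{print "conn kill "$2"\nrepair mailbox "$2" repair=1"}')

printf '%s\n' "$out"
```

Under that assumption the two printed lines are "conn kill alice" and "repair mailbox alice repair=1", i.e. the text between the first pair of single quotes is dropped into both commands.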

Related

how to use "," as field delimiter [duplicate]

I have a file like this:
"1","ab,c","def"
Using a plain comma as the field delimiter gives the wrong result, so I want to use "," (quote, comma, quote) as the field delimiter. I tried this:
awk -F "," '{print $0}' file
or like this:
awk -F "","" '{print $0}' file
or like this:
awk -F '","' '{print $0}' file
but the result is incorrect. I don't know how to include the double quotes as part of the field delimiter itself.

If you can handle GNU awk, you could use FPAT:
$ echo '"1","ab,c","def"' | # echo outputs with double quotes
gawk ' # use GNU awk
BEGIN {
FPAT="([^,]*)|(\"[^\"]+\")" # a field is either a run of non-commas or a double-quoted string
}
{
for(i=1;i<=NF;i++) # loop all fields
gsub(/^"|"$/,"",$i) # remove leading and trailing double quotes
print $2 # output for example the second field
}'
Output:
ab,c
Note that FPAT cannot handle a record separator (a newline, by default) embedded inside the quotes.
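If GNU awk is not available, the separator the question asks for can also be used directly. The third attempt in the question, -F '","', does split on the right sequence; the result only looked wrong because print $0 echoes the unsplit line. The sketch below assumes that every field is quoted and that no field contains an escaped quote or the sequence "," itself. It is an illustration, not a CSV parser.

```shell
# Split on the three-character sequence "," and then trim the outer quotes.
second=$(printf '%s\n' '"1","ab,c","def"' |
  awk -F '","' '{
    sub(/^"/, "", $1)    # drop the quote that opens the record
    sub(/"$/, "", $NF)   # drop the quote that closes the record
    print $2             # the embedded comma in ab,c is left alone
  }')

printf '%s\n' "$second"
```

With the sample line this prints ab,c.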

What you are attempting seems misdirected anyway. How about this instead?
awk '/^".*"$/{ sub(/^\"/, ""); sub(/\"$/, ""); gsub(/\",\"/, ",") }1'
The proper solution to handling CSV files with quoting in them is to use a language which has an actual CSV parser. My thoughts go to Python, which includes a csv module in its standard library.

In GNU AWK
{print $0}
prints the whole line. If no change was made to the record, the original line is printed, so no matter what field separator you set you will get the original lines back if the only action is print $0. Use $1=$1 to trigger a rebuild of the record.
If you must do it via FS AT ANY PRICE, you might do it as follows. Let the content of file.txt be
"1","ab,c","def"
then
BEGIN{FS="\x22,?\x22?"}{$1=$1;print $0}
output
 1 ab,c def
Note the leading space: the opening " is itself matched as a separator, so $1 is empty and ab,c is $3. Explanation: I tell GNU AWK that the field separator is a literal " (\x22, since " is 22 hex in ASCII), followed by zero or one (?) comma, followed by zero or one (?) literal " (\x22). $1=$1 triggers the record rebuild mentioned earlier. Disclaimer: this solution assumes that you never have an escaped " inside your strings.
(tested in gawk 4.2.1)
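The effect of $1=$1 is easier to see on a simpler line. This is a small sketch with a made-up input (a,b,c) rather than the file from the question:

```shell
# Printing $0 untouched ignores the separator: the line comes out as it went in.
unchanged=$(printf 'a,b,c\n' | awk -F, '{ print $0 }')

# Assigning to any field makes awk rebuild $0, joining the fields with OFS
# (a single space by default).
rebuilt=$(printf 'a,b,c\n' | awk -F, '{ $1 = $1; print $0 }')

# Setting OFS chooses the joiner used for the rebuild.
piped=$(printf 'a,b,c\n' | awk -F, -v OFS='|' '{ $1 = $1; print $0 }')

printf '%s\n' "$unchanged" "$rebuilt" "$piped"
```

The three results are a,b,c, then a b c, then a|b|c.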

How to extract string from a file in bash

I have a file called DB_create.sql which has this line
CREATE DATABASE testrepo;
I want to extract only testrepo from this. So I've tried
cat DB_create.sql | awk '{print $3}'
This gives me testrepo;
I need only testrepo. How do I get this ?

With your shown samples, please try following.
awk -F'[ ;]' '{print $(NF-1)}' DB_create.sql
OR
awk -F'[ ;]' '{print $3}' DB_create.sql
OR without setting any field separators try:
awk '{sub(/;$/,"");print $3}' DB_create.sql
A simple explanation: the field separator is set to a space OR a semicolon, and then the second-to-last field, $(NF-1), is printed, which is what the OP needs here. You also don't need the cat command with awk, because awk can read the input file by itself.
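The parentheses matter here: $(NF-1) and $NF-1 are different expressions. A short sketch, using a temporary file that stands in for DB_create.sql:

```shell
tmp=$(mktemp)
printf 'CREATE DATABASE testrepo;\n' > "$tmp"

# NF is 4 here: the trailing ; leaves an empty last field.
nf=$(awk -F'[ ;]' '{ print NF }' "$tmp")

# $(NF-1) is the field before the last one.
name=$(awk -F'[ ;]' '{ print $(NF-1) }' "$tmp")

# $NF-1 parses as ($NF)-1: the empty last field counts as 0, giving -1.
wrong=$(awk -F'[ ;]' '{ print $NF-1 }' "$tmp")

rm -f "$tmp"
printf '%s\n' "$nf" "$name" "$wrong"
```

This prints 4, testrepo and -1, so only the parenthesised form selects a field by position.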

Using GNU awk, you can set the record separator to ; followed by a line break:
awk -v RS=';\r?\n' '{print $3}' file.sql
testrepo
Or using any POSIX awk, just do a call to sub to strip trailing ;:
awk '{sub(/;$/, "", $3); print $3}' file.sql
testrepo

You can use
awk -F'[;[:space:]]+' '{print $3}' DB_create.sql
where the field separator is set to a [;[:space:]]+ regex that matches one or more occurrences of ; or/and whitespace chars. Then, Field 3 will contain the string you need without the semi-colon.
More pattern details:
[ - start of a bracket expression
; - a ; char
[:space:] - any whitespace char
] - end of the bracket expression
+ - a POSIX ERE one or more occurrences quantifier.

Use your own code, but add a call to the function sub():
cat DB_create.sql | awk '{sub(/;$/, "",$3);print $3}'
It's better not to use cat, though. Here you can see why: Comparison of cat pipe awk operation to awk command on a file
So this way is better:
awk '{sub(/;$/, "",$3);print $3}' file

Recognising backslash in awk field separator

Input is
AZE D11/879\x0Dabc\x0D\x0A\x1E!DEF F11/999
awk script sets field separator to "\x0D" (I have tried with and without escaping the backslash).
awk script is
BEGIN {FS="\\x0D"}
{print NF}
It should output 3 because there are 2 occurrences of the field separator but it outputs 1 which indicates it is not being recognized.

There are 2 ways to provide a regexp in awk - a static regexp (aka regexp literal) written as /regexp/ and a dynamic regexp (aka computed regexp) written as "regexp" and used in a regexp context. A field separator is just a regexp with some additional behavior so let's just consider regexps in general to explain what's going on in your example.
The split() function takes a field separator (a regexp for our purposes) as its third argument so it provides a good test bed:
Using a static regexp:
$ awk '{print split($0,a,/\x0D/)}' file
1
The \ above begins the escape sequence \x0D (a carriage return in GNU awk), it's not a literal \. For that you need to escape the \ itself:
$ awk '{print split($0,a,/\\x0D/)}' file
3
What if we used a dynamic regexp instead of the above static regexp?
$ awk '{print split($0,a,"\x0D")}' file
1
$ awk '{print split($0,a,"\\x0D")}' file
1
$ awk '{print split($0,a,"\\\x0D")}' file
' is not a known regexp operator FNR=1) warning: regexp escape sequence `\
1
$ awk '{print split($0,a,"\\\\x0D")}' file
3
The behavior above is because awk first parses the string to convert it into a regexp (using up one layer of escape chars) and then parses it a second time when using it as a regexp (using up a second layer of escape chars).
Unfortunately when you specify a FS there is no option to specify it as a literal regexp, it's always specified using a string and thus is a dynamic regexp and so needs an extra layer of escaping:
$ awk -v FS='\x0D' '{print NF}' file
1
$ awk -v FS='\\x0D' '{print NF}' file
1
$ awk -v FS='\\\x0D' '{print NF}' file
' is not a known regexp operatorence `\
1
$ awk -v FS='\\\\x0D' '{print NF}' file
3
Now - what if you were using the wrong type of quotes in the shell part of the script, i.e. " instead of '? Then you introduce even more pain because now you're inviting the shell to also parse the string even before awk gets to see and parse it twice:
$ awk -v FS="\\\\x0D" '{print NF}' file
1
$ awk -v FS="\\\\\x0D" '{print NF}' file
' is not a known regexp operatorence `\
1
$ awk -v FS="\\\\\\x0D" '{print NF}' file
' is not a known regexp operatorence `\
1
$ awk -v FS="\\\\\\\x0D" '{print NF}' file
3
That's different from the case where the double quotes are used inside awk because that's all wrapped inside single quotes and so protected from the shell already:
$ awk 'BEGIN{FS="\\\\x0D"} {print NF}' file
3
So - in the shell always use the most restrictive quotes (' over " over none) unless you have a very specific reason not to, and when using regexps or field separators always use literal /.../ rather than dynamic "...", again unless you have a very specific reason not to.
The odd, truncated-looking error messages above are caused by the \r characters the tool is trying to print due to the escape sequence we're providing. They are really all: warning: regexp escape sequence '\^M' is not a known regexp operator
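As a compact recap, here are the two forms reported above as working, run against the sample input from the question. Only these two are shown because the results of the under-escaped variants may differ between awk implementations.

```shell
# The input contains the four literal characters \x0D, not a carriage return.
input='AZE D11/879\x0Dabc\x0D\x0A\x1E!DEF F11/999'

# Regexp literal: one layer of escaping, so \\ matches one literal backslash.
static=$(printf '%s\n' "$input" | awk '{ print split($0, a, /\\x0D/) }')

# String used as a regexp (FS): two layers, so four backslashes in the source.
dynamic=$(printf '%s\n' "$input" | awk 'BEGIN { FS = "\\\\x0D" } { print NF }')

printf '%s\n' "$static" "$dynamic"
```

Both commands print 3. printf '%s\n' is used instead of echo so that the shell does not interpret the backslashes in the input.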

You need two backslashes for a literal backslash since \ is an escape character:
$ echo 'AZE D11/879\x0Dabc\x0D\x0A\x1E!DEF F11/999' |
awk 'BEGIN{ FS="\\\\x0D" } { print NF }'
3

Why I can't use as delimiter in awk the string "?B?"

By running the following I get the string "utf-8" as a result.
I thought that this command would return the string "tralala":
echo "=?utf-8?B?tralala" | awk -F "?B?" '{print $2 }'
Why is that?
What delimiter should I use in order to get the string "tralala" ?

? is a regex metacharacter that means zero or one matches of the preceding atom. (I'm surprised awk didn't complain about the one at the start, though.)
Try echo "=?utf-8?B?tralala" | awk -F '\\?B\\?' '{print $2 }' instead.

Awk delimiters are NOT strings, they are "Field Separators" (hence the variable named FS) which are a type of Extended Regular Expression with some additional features (e.g. a single blank char as the field separator when not inside square brackets means separate by all chains of contiguous white space and ignore leading and trailing white space on each record).
The differences between a string, a regular expression, and a field separator are very important to be aware of. You sometimes also see the word "pattern" used - do not use that term, it has no (or too many possible) meaning.
A ? is an RE metacharacter so you need to tell awk not to treat it as such in your case by either of these methods:
$ echo "=?utf-8?B?tralala" | awk -F '[?]B[?]' '{print $2}'
tralala
$ echo "=?utf-8?B?tralala" | awk -F '\\?B\\?' '{print $2}'
tralala
You don't strictly need to do that for the first ? as its metacharacter functionality is not applicable when it's the first char in an RE:
$ echo "=?utf-8?B?tralala" | awk -F '?B[?]' '{print $2}'
tralala
$ echo "=?utf-8?B?tralala" | awk -F '?B\\?' '{print $2}'
tralala
but IMHO it's best to do it anyway for clarity and future-proofing.
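This also explains why the original command printed utf-8. Assuming awk takes the leading ? literally (as the examples above suggest), the separator ?B? means "a literal ?, optionally followed by B", which is written portably below as [?]B?. The sketch prints the field count and $2 for that reading and for the fully escaped one:

```shell
input='=?utf-8?B?tralala'

# A literal ? followed by an optional B: every ? in the input is a separator.
loose=$(printf '%s\n' "$input" | awk -F '[?]B?' '{ print NF ":" $2 }')

# Both ? characters literal: only the whole sequence ?B? is a separator.
strict=$(printf '%s\n' "$input" | awk -F '[?]B[?]' '{ print NF ":" $2 }')

printf '%s\n' "$loose" "$strict"
```

The first result is 4:utf-8 (fields "=", "utf-8", an empty field, and "tralala"), the second is 2:tralala.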

Using pipe character as a field separator

I'm trying different commands to process csv file where the separator is the pipe | character.
While those commands do work when the comma is a separator, it throws an error when I replace it with the pipe:
awk -F[|] "NR==FNR{a[$2]=$0;next}$2 in a{ print a[$2] [|] $4 [|] $5 }" OFS=[|] file1.csv file2.csv
awk "{print NR "|" $0}" file1.csv
I tried, "|", [|], /| to no avail.
I'm using Gawk on Windows. What am I missing?

You tried "|", [|] and /|. /| does not work because the escape character is \, whereas [] defines a bracket expression, that is, a set of characters: for example [,-] if you want FS to be either , or -.
To make it work "|" is fine, are you sure you used it this way? Alternatively, escape it --> \|:
$ echo "he|llo|how are|you" | awk -F"|" '{print $1}'
he
$ echo "he|llo|how are|you" | awk -F\| '{print $1}'
he
$ echo "he|llo|how are|you" | awk 'BEGIN{FS="|"} {print $1}'
he
But then note that when you write:
print a[$2] [|] $4 [|] $5
you are not using any delimiter at all. As you already defined OFS, do:
print a[$2], $4, $5
Example:
$ cat a
he|llo|how are|you
$ awk 'BEGIN {FS=OFS="|"} {print $1, $3}' a
he|how are
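Putting this together for the two-file command from the question, a minimal sketch could look as follows. The file contents are hypothetical, since the question does not show the real data.

```shell
dir=$(mktemp -d)
# Hypothetical sample data standing in for file1.csv and file2.csv.
printf '1|k1|x\n2|k2|y\n' > "$dir/file1.csv"
printf '9|k1|c|d|e\n8|k3|c|d|e\n' > "$dir/file2.csv"

# First file: remember each line, keyed by its second field.
# Second file: for matching keys, print the stored line plus fields 4 and 5,
# joined with OFS because the print arguments are separated by commas.
joined=$(awk 'BEGIN { FS = OFS = "|" }
  NR == FNR { a[$2] = $0; next }
  $2 in a   { print a[$2], $4, $5 }' "$dir/file1.csv" "$dir/file2.csv")

rm -rf "$dir"
printf '%s\n' "$joined"
```

With this data only the key k1 appears in both files, so the single output line is 1|k1|x|d|e. The whole program is in single quotes, which on a POSIX-like shell keeps $2, $4 and $5 away from the shell.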

For anyone finding this years later: ALWAYS QUOTE SHELL METACHARACTERS!
I think gawk (GNU awk) treats | specially, so it should be quoted (for awk). OP had this right with [|]. However [|] is also a shell pattern. Which in bash at least, will only expand if it matches a file in the current working directory:
$ cd /tmp
$ echo -F[|] # Same command
-F[|]
$ touch -- '-F|'
$ echo -F[|] # Different output
-F|
$ echo '-F[|]' # Good quoting
-F[|] # Consistent output
So it should be:
awk '-F[|]'
# or
awk -F '[|]'
awk -F "[|]" would also work, but IMO, only use soft quotes (") when you have something to actually expand (or the string itself contains hard quotes ('), which can't be nested in any way).
Note that the same thing happens if these characters are inside unquoted variables.
If text or a variable contains, or may contain: []?*, quote it, or set -f to turn off pathname expansion (a single, unmatched square bracket is technically OK, I think).
If a variable contains, or may contain an IFS character (space, tab, new line, by default), quote it (unless you want it to be split). Or export IFS= first (bearing the consequences), if quoting is impossible (eg. a crazy eval).
Note: raw text is always split by white space, regardless of IFS.
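The point about unquoted variables can be checked with a small sketch. It assumes a POSIX-like shell such as bash or dash (zsh, for instance, does not expand patterns in unquoted variables by default), and it works in a throwaway directory:

```shell
dir=$(mktemp -d)
touch -- "$dir/-F|"        # a file whose name matches the pattern -F[|]
opt='-F[|]'

# Unquoted: the shell treats the value as a pattern and substitutes the file name.
unquoted=$(cd "$dir" && printf '%s\n' $opt)

# Quoted: the value is passed through untouched.
quoted=$(cd "$dir" && printf '%s\n' "$opt")

rm -rf "$dir"
printf '%s\n' "$unquoted" "$quoted"
```

Here the unquoted expansion yields -F| while the quoted one yields -F[|], which is the same difference the echo demonstration shows for literal text.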

Try to escape the |
echo "more|data" | awk -F\| '{print $1}'
more

You can escape the | as \|
$ cat test
hello|world
$ awk -F\| '{print $1, $2}' test
hello world
