how to call awk on excel in perl

I am trying to use a variable or the system command to call awk (dealing with a CSV file). The awk command is
awk -F "\"*,\"*" '{if (\$6 == " ADMCHG") print \$0}' $output_dir/$userfile > $output_dir/$userfile.ADMCHG
It works.
But if I use a variable or the system command to call this awk command, it fails:
$result = `awk -F "\"*,\"*" '{if ($6 == " ADMCHG") print $0}' "$output_dir/$userfile" > "$output_dir/$userfile.ADMCHG"`;
or
system ("awk -F "\"*,\"*" '{if (\$6 == " ADMCHG") print \$0}' $output_dir/$userfile > $output_dir/$userfile.ADMCHG");
I guess the problem is awk -F "\"*,\"*". How can I fix it?
Using AWK on CSV Files

Once I ignored your confusing title, I think I understood your problem -- you are failing to escape your quote marks, so your string is getting split up. A few things to consider:
You can switch between " and '. So, for example:
"this is a ' string with some ' single quotes in it" but
"this is two " strings "with a word in between"
BUT if you use double quotes, then variables with a $ in front will be evaluated.
You can use \ to escape things, so " \" " is one string.
Most likely, you want single quotes:
system('awk -F "\"*,\"*" \'{if ($6 == " ADMCHG") print $0}\' ' . "\"$output_dir/$userfile\" > \"$output_dir/$userfile.ADMCHG\"");
Note that I have switched the quotes enclosing the awk part to single quotes and escaped the embedded 's as \' (in a Perl single-quoted string, \' yields a literal '). The \" sequences are left alone, because a backslash before any other character is kept as-is inside Perl single quotes. One catch: Perl does not interpolate $variables inside single quotes -- which is also why $6 and $0 no longer need escaping -- so the file paths are appended in a separate double-quoted string.

Related

How to use filenames having special characters with awk '{system("stat " $0)}'

For example, list.txt is like this:
abc.txt
-abc.txt
I couldn't get the correct answer with either
awk '{system("stat " $0)}' list.txt or awk '{system("stat \"" $0 "\"")}' list.txt.
How could I tell the awk-system to add quotes around the filename?
awk '{system("stat " $0)}' list.txt certainly would not work.
But why wouldn't awk '{system("stat \"" $0 "\"")}' list.txt work either? It behaves just like the former.
But with awk '{system("stat \\\"" $0 "\\\"")}' list.txt, I got this:
stat: cannot stat '"abc.txt"': No such file or directory
First of all, if you want to get the output of the stat command, system() is not the right way to go: it merely returns the exit status, not the command's output.
You may try cmd | getline myOutput in awk; each call reads one line of the command's output into myOutput. Or you can write to a pipe, print ... | cmd, to feed output into a command.
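For example, a minimal sketch of the cmd | getline idiom (echo here is just a stand-in for the real command):
$ awk 'BEGIN{ cmd = "echo hello"; cmd | getline myOutput; close(cmd); print "captured: " myOutput }'
captured: hello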
Regarding your file -abc.txt: quoting it isn't enough. Try executing stat "-abc.txt" in a terminal; it won't work, because the filename starts with - and stat parses it as an option. You need to add --: stat -- "-abc.txt". So you probably want to add the -- in your awk code as well.
Finally, about the quotes: you can declare an awk variable, like awk -v q='"' '{..., and then use q wherever you need a literal "; this way your code is easier to read. E.g., print "stat " q "myName" q
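Putting those three points together, a sketch along these lines should work (it would still break on filenames containing " or other shell metacharacters):
awk -v q='"' '{
  cmd = "stat -- " q $0 q            # e.g. stat -- "-abc.txt"
  while ((cmd | getline line) > 0)   # capture every line of the output
    print line
  close(cmd)                         # reset the pipe before the next file
}' list.txt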

how to use "," as field delimiter [duplicate]

This question already has answers here:
Escaping separator within double quotes, in awk
(3 answers)
Closed 1 year ago.
I have a file like this:
"1","ab,c","def"
so using only a comma as the field delimiter gives the wrong result. I want to use "," as the field delimiter instead. I tried this:
awk -F "," '{print $0}' file
or like this:
awk -F "","" '{print $0}' file
or like this:
awk -F '","' '{print $0}' file
but the result is incorrect; I don't know how to include the " characters as part of the field delimiter itself.
If you can handle GNU awk, you could use FPAT:
$ echo '"1","ab,c","def"' | # echo outputs with double quotes
gawk ' # use GNU awk
BEGIN {
FPAT="([^,]*)|(\"[^\"]+\")" # because FPAT
}
{
for(i=1;i<=NF;i++) # loop all fields
gsub(/^"|"$/,"",$i) # remove leading and trailing double quotes
print $2 # output for example the second field
}'
Output:
ab,c
FPAT cannot handle RS inside the quotes.
What you are attempting seems misdirected anyway. How about this instead?
awk '/^".*"$/{ sub(/^\"/, ""); sub(/\"$/, ""); gsub(/\",\", ",") }1'
The proper solution to handling CSV files with quoting in them is to use a language which has an actual CSV parser. My thoughts go to Python, which includes a csv module in its standard library.
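For instance, a minimal sketch of that route, run from the shell (assuming python3 is installed and the sample line is in file.csv):
$ python3 -c '
import csv, sys
for row in csv.reader(open(sys.argv[1], newline="")):
    print(row[1])
' file.csv
ab,c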
In GNU AWK,
{print $0}
does print the whole line, but if no field was changed, the original line is printed untouched: no matter what field separator you set, you will get the original lines as long as the only action is print $0. Use $1=$1 to trigger the record rebuild.
If you must do it via FS AT ANY PRICE, then you might do it as follows: let file.txt content be
"1","ab,c","def"
then
BEGIN{FS="\x22,?\x22?"}{$1=$1;print $0}
output
 1 ab,c def
Note the leading space (ab,c is $3). Explanation: I inform GNU AWK that the field separator is a literal " (\x22; " is 0x22 in ASCII) followed by zero or one (?) , followed by zero or one (?) literal " (\x22). $1=$1 triggers the line rebuild, as mentioned earlier. Disclaimer: this solution assumes you never have an escaped " inside your strings.
(tested in gawk 4.2.1)

Using AWK or SED to prepend a single_quote to each line in a file [duplicate]

This question already has answers here:
How can I prepend a string to the beginning of each line in a file?
(9 answers)
Closed 3 years ago.
I am working with ffmpeg and want to automate making one big video from many smaller ones. I can use a list file, but each line Must Be file 'name.ext'. I can't figure out how to get sed or awk to NOT see the ' as a control character. Any way to do that?
I have tried using a variable instead of the string file ', and tried a two-statement script where I set file # and then use another command to change the # to ', but it fails every time:
awk '{print "line #" $0}' uselessInfo.txt >goofy2.txt
sed '/#/\'/g' goofy2.txt >goofy3.txt
I tried the sed line with " around the ' as well.
Neither sed nor awk is seeing ' as a control character. In fact they aren't seeing the ' at all in the code you posted - the shell doesn't allow you to use single quotes inside a single-quote-delimited script.
Your question isn't clear. Is this what you're trying to do?
$ echo 'foo' | awk '{print "file \047" $0 "\047"}'
file 'foo'
$ echo 'foo' | sed 's/.*/file '\''&'\''/'
file 'foo'
$ echo 'foo' | sed "s/.*/file '&'/"
file 'foo'
If not then edit your question to clarify and provide a concrete example we can test against.
If you just want to add a single quote at the front of each line:
> cat test.txt
a
b
c
> sed "s/^/'/" test.txt
'a
'b
'c
You can then output this to whatever file you wish as in your example.
The solution to your problem, I believe, lies in the fact that characters within single quotes on the command line are not interpreted. This means that when you try to escape a quote inside a single quoted string, this will not work and you just get a backslash in your string. Comparing this to a string bound by double quotes, we see that the backslash is interpreted before being passed to the command echo as an argument.
> echo '\'
\
> echo "\""
"
Another useful trick is defining the single quote as a variable:
$ echo "part 1" | awk -v q="'" '{print "line " q $0 q}'
line 'part 1'
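Applied to the ffmpeg use case, a sketch (assuming the clip names sit one per line in uselessInfo.txt and contain no single quotes themselves):
$ awk -v q="'" '{print "file " q $0 q}' uselessInfo.txt > list.txt
$ ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4
The second line is the usual concat-demuxer invocation; -safe 0 lets ffmpeg accept arbitrary paths in the list.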

Using awk to filter a CSV file with quotes in it

I have a text file with comma separated values.
A sample line can be something like
"Joga","Bonito",7,"Machine1","Admin"
The " seen are part of the text and are needed when this csv gets converted back to a java object.
I want to filter out some lines from this file based on some field in the csv.
The following statement doesn't work:
awk -F "," '($2== "Bonito") {print}' filename.csv
I am guessing that this has something to do with the " appearing in the text.
I saw an example like:
awk -F "\"*,\"*"
I am not sure how this works. It looks like a regex, but the use of the last * flummoxed me.
Is there a better option than the last awk statement I wrote?
How does it work?
Since some parameters have double quotes and others do not, you can filter with a quoted parameter:
awk -F, '$2 == "\"Bonito\""' filename.csv
To filter on a parameter that has no double quotes, just do:
awk -F, '$3 == 7' filename.csv
Another way is to use the double quote in the regex (the ? quantifier makes the preceding double quote optional):
awk -F '"?,"?' '$2 == "Bonito"' filename.csv
But this has a drawback of also matching the following line:
"Joga",Bonito",7,"Machine1","Admin"
First, a bit more thorough test file:
$ cat file
"Joga","Bonito",7,"Machine1","Admin"
"Joga",Bonito,7,"Machine1","Admin"
Using the regex ^\"?, i.e. matching with or without a leading double quote:
$ awk -F, '$2~/^\"?Bonito\"?$/' file
"Joga","Bonito",7,"Machine1","Admin"
"Joga",Bonito,7,"Machine1","Admin"

How to use variable including special symbol in awk?

In my case, if a certain pattern is found as the second field of a line in a file, I need to print the first two fields. It should also handle cases where the pattern contains a special symbol like a backslash.
My solution first uses sed to replace \ with \\, then passes the new variable to awk; awk parses \\ back to \ and then matches field 2.
escaped_str=$( echo "$pattern" | sed 's/\\/\\\\/g')
input | awk -v awk_escaped_str="$escaped_str" '$2==awk_escaped_str { $0=$1 " " $2 " "}; { print } '
This seems too complicated, though, and does not handle every case.
Is there a simpler way that covers all the other special symbols?
The way to pass a shell variable to awk without backslashes being interpreted is to pass it in the arg list instead of populating an awk variable outside of the script:
$ shellvar='a\tb'
$ awk -v awkvar="$shellvar" 'BEGIN{ printf "<%s>\n",awkvar }'
<a b>
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]=""; printf "<%s>\n",awkvar }' "$shellvar"
<a\tb>
and then you can search a file for it as a string using index() or ==:
$ cat file
a b
a\tb
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]="" } index($0,awkvar)' "$shellvar" file
a\tb
$ awk 'BEGIN{ awkvar=ARGV[1]; ARGV[1]="" } $0 == awkvar' "$shellvar" file
a\tb
You need to set ARGV[1]="" after populating the awk variable to avoid the shell variable value also being treated as a file name. Unlike any other way of passing in a variable, ALL characters used in a variable this way are treated literally with no "special" meaning.
There are three variations you can try without needing to escape your pattern:
This one tests literal strings. No regex instance is interpreted:
$2 == expr
This one tests if a literal string is a subset:
index($2, expr)
This one tests regex pattern:
$2 ~ pattern
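A quick demonstration of all three, reusing the ARGV trick from above (file2 is a hypothetical test file whose second field contains a literal backslash; note the regex call needs the backslash doubled, because a dynamic regex interprets escape sequences a second time):
$ cat file2
x a\tb y
$ awk 'BEGIN{ expr=ARGV[1]; ARGV[1]="" } $2 == expr' 'a\tb' file2
x a\tb y
$ awk 'BEGIN{ expr=ARGV[1]; ARGV[1]="" } index($2, expr)' 'a\t' file2
x a\tb y
$ awk 'BEGIN{ pat=ARGV[1]; ARGV[1]="" } $2 ~ pat' 'a\\tb' file2
x a\tb y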