Running awk command in awk script - awk

I am just looking to run a simple script that runs an awk command inside of the awk script.
sample_enrollment.csv file: "EffectiveDate","Status","EmployeeID","ClientID"
Below is the Lab4_1.awk
#!/bin/bash
BEGIN{FS=","}
{
awk 'gsub(/EfectiveDate/, "Effective Date")'
}
I am running the command from the command line like this
awk -f lab4_1.awk sample_enrollment.csv
The error I am getting seems to indicate that the single quotes (' ') in the awk gsub command are wrong. I have tried many variations of this awk command without any luck. I am only asking about this portion, as I will need to add more to the awk script once this works.
Any help would be appreciated. Thank you

I don't think two awk commands are needed here. Going by your attempt, it can be done in a single awk like this:
awk -F, '{gsub(/EffectiveDate/, "Effective Date")} 1' Input_file
As I mentioned in the comments, if you have more requirements, add samples in code tags to your post and we can help from there.
EDIT: Since the OP mentioned that a script is needed, here is the same code in bash script form.
cat script
#!/bin/bash
awk '{gsub("EffectiveDate","Effective Date")} 1' Input_file
# ... other bash or awk commands go here ...
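If the goal is to keep everything in an awk program file run with awk -f, as in the question, the file should contain awk code only: no shell shebang and no nested awk '...' call. A minimal sketch, assuming the header from the question and a made-up data row; the scratch directory is only there to keep the example self-contained:

```shell
# Work in a scratch directory so the example leaves nothing behind.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample input: the header comes from the question, the data row is made up.
cat > sample_enrollment.csv <<'EOF'
"EffectiveDate","Status","EmployeeID","ClientID"
"2020-01-01","Active","1001","42"
EOF

# The awk program file holds awk code only.
cat > lab4_1.awk <<'EOF'
BEGIN { FS = "," }
{ gsub(/EffectiveDate/, "Effective Date") }
1   # a true pattern with no action prints the (possibly modified) record
EOF

result=$(awk -f lab4_1.awk sample_enrollment.csv)
printf '%s\n' "$result"
```

Further rules can be added to lab4_1.awk as ordinary pattern { action } lines.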

Related

qsub doesn't recognize awk field variables

Consider this awk script to print column #2 of every line:
awk '{print $2}' a.txt
$2 is not a shell variable, yet when I attempt to submit this code to qsub, $2 is interpreted as such. I.e.
qsub awk '{print $2}' a.txt
results in qsub executing the command
awk '{print }' a.txt
To be clear, I'm not trying to use a shell variable in an awk script; therefore How do I use shell variables in an awk script? is not applicable.
I tried suggestions in Using awk with qsub and issues with quotations, including \$2 and
qsub -- awk '{print $2}' a.txt.
Neither works.
I can certainly put awk in a script and call qsub that way, i.e., qsub awkscript.sh. However, if there's a way to use qsub+awk from the command line, I'd like to learn how.
Does double-layer quoting work, like
qsub 'awk '\''{ print $2 }'\'' a.txt '
The suggestion from RARE Kpop Manifesto below was 99% correct. With a backslash before $2, the whole expression worked like magic :)
qsub 'awk '\''{ print \$2 }'\'' a.txt '
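One way to see what each quoting style actually hands to the next program is to put printf in place of qsub, since printf prints its argument verbatim. This is only a sketch of the quoting layers; it does not reproduce what a particular qsub does with the string afterwards.

```shell
# Inner awk program kept in single quotes: '\'' closes the outer quote,
# adds a literal single quote, and reopens the outer quote.
plain=$(printf '%s\n' 'awk '\''{ print $2 }'\'' a.txt')

# Same, with the dollar sign escaped so it survives one more expansion.
escaped=$(printf '%s\n' 'awk '\''{ print \$2 }'\'' a.txt')

printf '%s\n' "$plain"     # awk '{ print $2 }' a.txt
printf '%s\n' "$escaped"   # awk '{ print \$2 }' a.txt
```

The first string is what a plain shell would run correctly. That the second one is the form that worked suggests the qsub in question expands the command once more before awk sees it.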

awk set command line options in script

I'm curious how to set command-line options, such as -F for the field separator, inside an awk script. I tried writing the shebang line as
#!/usr/bin/awk -F ":" -f
and get the following error:
awk: 1: unexpected character '.'
For this example, I can make do with
BEGIN {FS=":"}
but I still want to know a way to set all those options. Thanks in advance.
EDIT:
Let's use another example that should be easy to test.
inputfile:
1
2
3
4
test.awk:
#!/usr/bin/awk -d -f
{num += $1}
END { print num}
run
/usr/bin/awk -d -f test.awk inputfile
prints 10 and generates a file called awkvars.out with some awk global variables in it.
but
./test.awk inputfile
will get
awk: cmd. line:1: ./test.awk
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1: ./test.awk
awk: cmd. line:1: ^ unterminated regexp
if I remove '-d' from shebang line,
./test.awk inputfile
will normally output 10.
My question is whether there is a way to write "-d" in the test.awk file so that it generates the awkvars.out file.
Answering the OP's question beyond the setting of FS.
Short answer: you cannot reliably use multiple options with '#!', and since awk needs -f there so that it reads the program from the script file, you are out of luck.
Long answer:
With a shebang (#!), at most a single argument can be relied on. On Linux, everything after the interpreter path is passed to the named program as one argument, followed by the path of the script itself. So in general:
#! /path/to/prog arg1
input-1
input-2
will execute /path/to/prog arg1 /path/to/script, and prog then opens and reads the script file (including the leading shebang line). This is an oversimplification; the actual rules are more complex and vary between systems, see https://unix.stackexchange.com/questions/87560/does-the-shebang-determine-the-shell-which-runs-the-script
Given this limit of one argument, the only required option when executing awk is '-f', which tells awk to read its program from the file named next, that is, from the script. You can bundle a few other options that do NOT take an argument in front of it (for example, with gawk, '-Pf' forces POSIX behavior).
As far as I can tell, all the 'interesting' options (setting FS, RS, ORS, ...) need to be separated from the '-f' by a space, which makes it impossible to embed them in the shebang line; the alternative is a 'BEGIN { ... }' block or similar in the script.
Bottom line: #! /usr/bin/awk -f -F, is the same as running awk '-f -F,' script, so awk looks for a program file named ' -F,'. That is not very useful, and it does not set FS.
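The single-argument behaviour can be imitated from the command line by quoting the options the way a Linux kernel would hand them over from a shebang line. A sketch, assuming the sum script from the question with FS set in BEGIN; sum.awk is a placeholder name:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf '1\n2\n3\n4\n' > inputfile

cat > sum.awk <<'EOF'
BEGIN { FS = ":" }
{ num += $1 }
END { print num }
EOF

# Options passed as separate arguments: what one types on the command line.
separate=$(awk -F: -f sum.awk inputfile)
printf 'separate arguments: %s\n' "$separate"

# Options merged into one argument: roughly what "#!/usr/bin/awk -F: -f"
# yields on Linux. awk takes ": -f" as the value of -F and then treats
# ./sum.awk as the program text instead of a program file.
if awk '-F: -f' ./sum.awk inputfile >/dev/null 2>&1; then
    merged="ran"
else
    merged="failed"
fi
printf 'merged argument: %s\n' "$merged"
```

The second call corresponds to the syntax errors quoted in the question, where awk complains about ./test.awk on "cmd. line:1".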
Let's say the following is our Input_file, which we will use for all the solutions mentioned here.
cat Input_file
a,b,c,d
ab,c
1st way of setting the field separator: the simplest way is to set the FS value in the BEGIN section of the awk program file. The following is our .awk file.
cat file1.awk
BEGIN{
FS=","
}
{
print $1"..."$2
}
Now when we run the code, we get the following output:
/usr/local/bin/awk -f file1.awk Input_file
a...b
ab...c
2nd way of setting the field separator: pass the FS value as an argument before Input_file, as follows.
/usr/local/bin/awk -f file.awk FS="," Input_file
Example: the following is file.awk, which contains the awk code.
cat file.awk
{
print $1".."$2
}
Running the file with the awk -f command as follows gives this result.
/usr/local/bin/awk -f file.awk FS="," Input_file
a..b
ab..c
This means the program picks up , as the field separator.
3rd way of setting the field separator: we can set the field separator for awk -f programs the same way as for ordinary awk programs, using the -F',' option, as follows.
/usr/local/bin/awk -F',' -f file.awk Input_file
a..b
ab..c
4th way of setting the field separator: we can set the field separator as a variable by using the -v option on the command line when running the file.awk script, as follows.
/usr/local/bin/awk -v FS=',' -f file.awk Input_file
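The four variants can be checked side by side. A small sketch, assuming awk is on the PATH (the answer calls /usr/local/bin/awk) and reusing the Input_file and script contents from above, except that both scripts print the same ".." separator here so that the outputs are comparable:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'a,b,c,d\nab,c\n' > Input_file

# Script with the separator set inside the program (1st way).
cat > file1.awk <<'EOF'
BEGIN { FS = "," }
{ print $1 ".." $2 }
EOF

# Script without any separator; it is supplied on the command line.
cat > file.awk <<'EOF'
{ print $1 ".." $2 }
EOF

way1=$(awk -f file1.awk Input_file)
way2=$(awk -f file.awk FS="," Input_file)     # assignment operand before the file
way3=$(awk -F',' -f file.awk Input_file)      # -F option
way4=$(awk -v FS=',' -f file.awk Input_file)  # -v assignment

printf '%s\n' "$way1"
for w in "$way2" "$way3" "$way4"; do
    [ "$w" = "$way1" ] && echo "same output"
done
```

One difference worth knowing: an FS="," operand (2nd way) only takes effect when awk reaches it in the argument list, so it is not yet set inside a BEGIN block, whereas -F and -v are.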
Never use a shebang to call awk as it robs you of the ability to separate shell arguments into awk arguments and awk variables and do anything else that's better done in shell (e.g. arg parsing with getopts) before calling awk. Just call awk from inside your shell script.
Also, don't name your shell script test.awk, as it's a shell script; the fact that it's implemented in awk is irrelevant. There's no reason to create a file that you sometimes run as awk -f file to have awk interpret it and other times as just file to have the shell interpret it.
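A sketch of that advice: a small shell script that owns the option handling and then calls awk. The script name colsum and its -s option are made up for the illustration.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical wrapper: a shell script, so it carries no .awk extension.
cat > colsum <<'EOF'
#!/bin/sh
# Usage: colsum [-s separator] file...
sep=","
while getopts s: opt; do
    case $opt in
        s) sep=$OPTARG ;;
        *) echo "usage: colsum [-s separator] file..." >&2; exit 2 ;;
    esac
done
shift $((OPTIND - 1))

# All awk options are spelled out here; no shebang limit applies.
awk -F"$sep" '{ num += $1 } END { print num }' "$@"
EOF
chmod +x colsum

printf '1:x\n2:y\n3:z\n4:w\n' > inputfile

# Run through sh here; ./colsum works as well once the file is executable.
total=$(sh ./colsum -s : inputfile)
printf '%s\n' "$total"
```

Because the shell parses the arguments first, options such as -d or -v could be added to the awk call in the same way.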

How to find the total of second column using awk commands?

Input file (filename: cat)
item1,200
item2,499
item3,699
item4,800
Awk command that I tried
awk -F"," '{x+=$2}END{print x}'cat
Error
The above command displays empty output. Is there any way to fix this?
Edited and final command (with a space between the awk program and the file name)
awk -F"," '{x+=$2}END{print x}' cat
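The only difference between the two commands is the space before cat. Without it, the shell joins the quoted program and the file name into a single argument, so awk receives no input file at all. A minimal sketch with the data from the question:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# The input file is literally named "cat", as in the question.
printf 'item1,200\nitem2,499\nitem3,699\nitem4,800\n' > cat

# Program and file name are separate arguments thanks to the space.
total=$(awk -F"," '{ x += $2 } END { print x }' cat)
printf '%s\n' "$total"   # 2198
```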

Convert sequence list to fasta for multiple files

I have thousands of files, which are a list of sequence names followed by their sequence, one individual per line, something like this:
L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT
L.carlosgarini.LJAMM.14070.LagunadelMaule CCCTAAGAAT-ATTTGTT
L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT
And I want to change them to fasta format, so looking something like:
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
I work on a Mac.
Thanks!
Using Perl
perl -pe 's/^/>/;s/(\S+)\s+(\S+)/$1\n$2CAGAAAAGATATTTAATTATAT/g ' file
with your inputs
$ cat damien.txt
L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT
L.carlosgarini.LJAMM.14070.LagunadelMaule CCCTAAGAAT-ATTTGTT
L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT
$ perl -pe 's/^/>/;s/(\S+)\s+(\S+)/$1\n$2CAGAAAAGATATTTAATTATAT/g ' damien.txt
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
$
I believe you simplified your sample input, which is why it differs from your expected output.
If that is not the case and my solutions do not work, please comment under my answer to let me know.
So with awk, you can do it like this:
awk -v OFS="\n" '$1=">" $1' file
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTT
If you want to change the file in place, install GNU awk and use gawk -i inplace ....
And if you want carriage returns as line endings, add or change to -v ORS="\r" -v OFS="\r"
However, you can also do it with sed, which may be the better choice:
sed -e 's/\([^[:space:]]*\)[[:space:]]*\([^[:space:]]*\)/>\1\n\2/' file
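Add -i to change the file in place. GNU sed accepts sed -i -e ..., while the BSD sed on macOS needs an explicit, possibly empty, backup suffix as a separate argument: sed -i '' -e .... Note also that the \n in the replacement relies on GNU sed; BSD sed may not treat it as a newline.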
Could you please try the following (written and tested against your samples; I don't have a Mac, so I could not test it there).
awk '/^L\./{print ">"$1 ORS $2 "CAGAAAAGATATTTAATTATAT"}' Input_file
The output will be as follows. If needed, you can save it to a file by appending > output_file to the above command.
>L.abdalai.LJAMM.14363.SanMartindeLosAndes
CCCTAAGAATAATTTGTTCAGAAAAGATATTTAATTATAT
>L.carlosgarini.LJAMM.14070.LagunadelMaule
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
>L.cf.silvai.DD.038.Sarco
CCCTAAGAAT-ATTTGTTCAGAAAAGATATTTAATTATAT
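Since the question mentions thousands of files, the per-file commands above can be folded into a single awk run that writes one FASTA file per input file. A sketch under assumptions: the inputs are taken to match *.txt and each output is named after its input with .fasta appended; both naming choices are placeholders.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Two small stand-in inputs; the real data would be the existing files.
printf 'L.abdalai.LJAMM.14363.SanMartindeLosAndes CCCTAAGAATAATTTGTT\n' > one.txt
printf 'L.cf.silvai.DD.038.Sarco CCCTAAGAAT-ATTTGTT\nL.carlosgarini.LJAMM.14070.LagunadelMaule CCCTAAGAAT-ATTTGTT\n' > two.txt

awk '
FNR == 1 {                       # first line of each input file
    if (out != "") close(out)    # keep only one output file open at a time
    out = FILENAME ".fasta"
}
NF >= 2 {                        # name in $1, sequence in $2
    printf ">%s\n%s\n", $1, $2 > out
}
' *.txt

first=$(cat one.txt.fasta)
second_count=$(grep -c '^>' two.txt.fasta)
printf '%s\n' "$first"
```

The close() call matters with thousands of inputs, because awk otherwise keeps every output file open until it exits.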

Awk script not reading an input file to execute

I'm having trouble finding out how to read in my file into my awk script.
This is what I have so far. Basically, I want to print out the header, and then read in the roster file which then I will edit to the necessary format. However, my problem is just figuring out how to read in the file.
#!/bin/awk -f
BEGIN {print "Last Name:First Name:Student ID:School – Major:Academic Level:ASURITE:Email" "\n" } {print $1,$2} roster
On running this
awk -f script.awk
Last Name:First Name:Student ID:School – Major:Academic Level:ASURITE:Email
^C
This is what I end up with - the file doesn't read in and I have to CTRL-C my way out since it doesn't close.
The idea is right, but the input file roster is in the wrong place. Move it out of the script. The general awk syntax is
awk <action> <file>
The <action> part can be given directly on the command line or read from a script with the -f flag, but the <file> argument still has to be given on the command line either way. Inside the script, roster is parsed as part of the awk program rather than as a file name, so awk has no input file, falls back to reading standard input, and waits there. Run it as
awk -f script.awk roster
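You could also make script.awk executable and run it directly as ./script.awk roster. The shebang line must keep the -f option so that awk reads the program from the script file; a line such as #!/usr/bin/env awk without -f would make awk treat the script's path as the program text. Put only awk code after it:
#!/bin/awk -f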
BEGIN {
print "Last Name:First Name:Student ID:School – Major:Academic Level:ASURITE:Email" "\n"
}
{
print $1,$2
}
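Putting the pieces together, a sketch with a made-up two-line roster (the names are placeholders; the header text is the one from the question):

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical roster: whitespace-separated last and first names.
printf 'Doe Jane\nRoe Richard\n' > roster

# The program file contains awk code only; no input file name inside it.
cat > script.awk <<'EOF'
BEGIN {
    print "Last Name:First Name:Student ID:School – Major:Academic Level:ASURITE:Email" "\n"
}
{
    print $1, $2
}
EOF

# The input file is an operand of the awk command.
out=$(awk -f script.awk roster)
printf '%s\n' "$out"
```

With the file name on the command line, awk prints the header, processes both roster lines, and exits without waiting on standard input.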