Generating 10 random numbers in a range in an awk script

So I'm trying to write an awk script that generates passwords given random names inputted from a .csv file. I'm aiming to do first 3 letters of last name, number of characters in the fourth field, then a random number between 1-200 after a space. So far I've got the letters and num of characters fine, but am having a hard time getting the syntax in my for loop to work for the random numbers. Here is an example of the input:
Danette,Suche,Female,"Kingfisher, malachite"
Corny,Chitty,Male,"Seal, southern elephant"
And desired output:
Suc21 80
Chi23 101
For 10 rows total. My code looks like this:
BEGIN{
FS=",";OFS=","
}
{print substr($2,0,3)length($4)
for(i=0;i<10;i++){
echo $(( $RANDOM % 200 ))
}}
Then I've been running it like
awk -F"," -f script.awk file.csv
But it only shows the 3 characters and the length of the fourth field, no random numbers. If anyone's able to point out where I'm screwing up it would be much appreciated, thanks guys.

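You can use rand(), which returns a random number from 0 (inclusive) to 1 (exclusive), and scale it to the range 1-200. Call srand() once so the sequence differs between runs, and start substr at 1, since a start of 0 returns only two characters in some awk implementations:
awk -F, 'BEGIN{srand()} {print substr($2,1,3) length($4), int(rand()*200)+1}' file.csv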
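The one-liner above splits on every comma, so the quoted fourth field ("Kingfisher, malachite") is cut at its inner comma and length($4) does not give the 21 shown in the desired output. A minimal sketch of one way around that, assuming every line has exactly one double-quoted field at the end, as in the sample; the array name is only illustrative:

```shell
# Split on the double quote so the quoted text arrives whole in $2;
# the unquoted part in $1 is then split on commas by hand.
out=$(printf '%s\n' \
  'Danette,Suche,Female,"Kingfisher, malachite"' \
  'Corny,Chitty,Male,"Seal, southern elephant"' |
awk -F'"' '
BEGIN { srand() }                 # seed from the clock; without it rand() repeats across runs
{
    split($1, name, ",")          # name[2] is the last name
    printf "%s%d %d\n", substr(name[2], 1, 3), length($2), int(rand() * 200) + 1
}')
printf '%s\n' "$out"
```

This prints lines such as Suc21 followed by a number from 1 to 200. CSV files with escaped quotes or several quoted fields would need a real CSV parser.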

BEGIN{
FS=",";OFS=","
}
{print substr($2,0,3)length($4)
for(i=0;i<10;i++){
echo $(( $RANDOM % 200 ))
}}
There is no echo function in GNU AWK. If you want to run a shell command you can use the system function, but keep in mind that it returns the command's exit status and prints whatever the command outputs, with no way to alter that output, so you need to design the command so that it prints exactly what you want.
Let file.txt content be
A
B
C
then
awk '{printf "%s ",$0;system("echo ${RANDOM}%200 | bc")}' file.txt
might give output
A 95
B 139
C 1
Explanation: first I use printf so that no newline is appended automatically and output the whole line followed by a space; then I execute a command that outputs a random value in the range:
echo ${RANDOM}%200 | bc
It simply feeds the value of RANDOM followed by %200 into the bc calculator, which prints the result, a value from 0 to 199. Note that RANDOM is not a POSIX sh feature (bash and ksh provide it), so this relies on the sh that system starts supporting it.
If you are not dead set on using the RANDOM variable, the rand function can be used without this hassle.
(tested with gawk 4.2.1 and bc 1.07.1)
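When the command's output has to be modified before it is printed, one alternative to system is to read the output into a variable with cmd | getline. The sketch below uses a deterministic stand-in command (shell arithmetic on the line number) instead of $RANDOM so the result is reproducible; the command and the variable names are purely illustrative:

```shell
out=$(printf 'A\nB\nC\n' | awk '{
    cmd = "echo $(( " NR " * 7 % 200 ))"   # stand-in for a command that prints one number
    cmd | getline n                        # capture the output instead of letting it print
    close(cmd)                             # release the pipe before the next record
    print $0, n + 1                        # the captured value can now be altered, here shifted by one
}')
printf '%s\n' "$out"
```

Because the value lands in n, it can be formatted or combined with other fields before printing, which system does not allow.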

Related

How to compare digits after find with awk, egrep

I have a file.txt that contains a lot of information. The input in the file looks like:
<ss>283838<ss>
.
.
<ss>111 from 4444<ss>
.
<ss>255<ss>
The numbers can have any number of digits.
I need to find and compare these 2 numbers.
If they are equal, print the name of the file and say that they are equal; if not, say the opposite. Only one line in the file has digits with the word "from" between them.
I tried something like
awk '/[0-9]+ from./ {print $0}' file.txt | egrep -o '[0-9]+'
With this command I get those two numbers, but I am stuck now and do not know how to compare them.
With your shown samples, could you please try the following. Simple explanation: get the respective numbers with a regex, then compare them to check 3 cases (greater, lesser, or equal). I will add a detailed explanation in some time.
awk '
match($0,/<[a-zA-Z]+[0-9]+/){
  val1=substr($0,RSTART,RLENGTH)
  gsub(/[^0-9]+/,"",val1)          # keep only the digits
  val1+=0                          # make it a number so the comparison is numeric, not string
  match($0,/[0-9]+[a-zA-Z]+>/)
  val2=substr($0,RSTART,RLENGTH)
  gsub(/[^0-9]+/,"",val2)
  val2+=0
  if(val1>val2){
    print "val1("val1 ")is Greater than val2("val2")"
  }
  if(val2>val1){
    print "val2("val2 ")is Greater than val1("val1")"
  }
  if(val1==val2){
    print "val1("val1 ")is equals to val2("val2")"
  }
}' Input_file
For your current shown sample output will be as follows:
val2(333)is Greater than val1(222)
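For the "111 from 4444" layout shown in the question, a minimal sketch along the same lines could look like this. It assumes the two numbers are separated by exactly " from " and that only one such line exists; the temporary file stands in for file.txt:

```shell
tmp=$(mktemp)
printf '%s\n' '<ss>283838<ss>' '<ss>111 from 4444<ss>' '<ss>255<ss>' > "$tmp"

result=$(awk '
match($0, /[0-9]+ from [0-9]+/) {
    split(substr($0, RSTART, RLENGTH), part, " ")   # part[1]="111", part[3]="4444"
    if (part[1] + 0 == part[3] + 0)                 # +0 forces a numeric comparison
        print FILENAME ": equal"
    else
        print FILENAME ": not equal"
    exit                                            # only one such line is expected
}' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

FILENAME is the built-in awk variable holding the name of the file being read, which covers the "print name of file" requirement.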

Bash script to process a csv file line by line while updating $6 with a different value but keeping other values unchanged

I am a beginner at bash scripting and I have been trying to fix this for more than 8 hours.
I have searched on Stack Overflow and tried to adapt the answers to my needs, but without success.
I want to use bash script to change csv file's date value to current date.
I am using a dummy .csv file ( http://eforexcel.com/wp/wp-content/uploads/2017/07/100-Sales-Records.zip ) and I want to change the 6th value (date) to the current date.
What I have been doing so far:
I have created one line csv to test the script
cat oneline.csv:
Australia and Oceania,Tuvalu,Baby Food,Offline,H,5/28/2010,669165933,6/27/2010,9925,255.28,159.42,2533654.00,1582243.50,951410.50
then I have tested the one line script:
echo `cat oneline.csv | awk -F, '{ print $1"," $2"," $3"," $4"," $5","}'` `date` `cat oneline.csv |awk -F, '{print $7"," $8"," $9"," $10"," $11"," $12"," $13"," $14"\n"}'`
then I have this code for the whole 100 line files in source.sh:
#I want to change 6th value for every line of source.csv to current date and keep the rest and export it to output.csv
while read
do
part1=$(`cat source.csv | awk -F, '{ print $1"," $2"," $3"," $4"," $5","}'`)
datum=$(`date`)
part2=$(`cat source.csv |awk -F, '{print $7"," $8"," $9"," $10"," $11"," $12"," $13"," $14"\n"}'`)
echo `$part1 $datum $part2`
done
and I expect to run the command like ./source.sh > output.csv
What I want for the full 100 lines file is to have result like:
Food,Offline,H,Thu Jan 17 06:34:03 EST 2019,669165933,6/27/2010,9925,255.28,159.42,2533654.00,1582243.50,951410.50
Could you guide me how to change the code to get the result?
Refactor everything to a single Awk script; that also avoids the echo in backticks.
awk -v datum="$(date)" -F , 'BEGIN { OFS=FS }
{ $6 = datum } 1' source.csv >output.csv
Briefly, we split on comma (-F ,) and replace the value of the sixth field with the value of the variable we passed in with -v. OFS=FS sets the output field separator to the input field separator (comma). Then the 1 means "print unconditionally".
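Generally speaking, you should probably avoid while read for this kind of line-by-line text processing; a single Awk is simpler and faster. In your script the loop also never reads source.csv (read takes its input from standard input), and every cat | awk inside it processes the whole file rather than one line.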
Tangentially, your quoting looks wacky; you don't want backticks around $part1 unless it is a command you want the shell to run (which in turn is probably a bad idea in itself). Also, backticks have long been deprecated in favor of $(command) syntax which is more legible and offers some syntactic advantages.
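If the new value should look like the other dates in the file (for example 5/28/2010) rather than the default output of date, a format string can be passed to date. A small sketch, assuming a month/day/year layout is wanted; the format string is an assumption, not something the question specifies:

```shell
csv='Australia and Oceania,Tuvalu,Baby Food,Offline,H,5/28/2010,669165933,6/27/2010,9925,255.28,159.42,2533654.00,1582243.50,951410.50'
today=$(date +%m/%d/%Y)     # e.g. 01/17/2019; zero-padded, unlike the sample data
out=$(printf '%s\n' "$csv" |
      awk -v datum="$today" -F, 'BEGIN { OFS = FS } { $6 = datum } 1')
printf '%s\n' "$out"
```

Only the sixth field changes; the other thirteen fields pass through untouched because OFS is set to the same comma as FS.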

awk to split and run a calculation in command

This is for my own learning, but let's say I have the input file below, where $5 needs to be split before the - before I run an awk command. Basically, I am summing $3-$2 over all lines with matching $5 strings and outputting the names, line counts and totals, but without a split the $5 values are all different. I can split the file beforehand, but I am curious whether I can do everything in one awk. The command works on a file if it is split before the awk is run. Thank you :).
input
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75
chr1 957571 957852 chr1:957571-957852 AGRN-7|gc=61.2
awk
awk '{split($5,a,"-"); a[1]} {c1[$a1]++; c2[$a1]+=($3-$2)}
END{for (e in c1) print e, c1[e], c2[e]}' input > out
current output (without the split)
AGRN-6 220
AGRN-7 281
desired output
AGRN 2 501
The main problem I see with your script is the references to c1[$a1] and c2[$a1]. Remember that the dollar sign is NOT an indicator of a variable; think of it as a selector, or as an array whose indexes are the positions of the fields on the line.
So $a1 is not the value of the variable a1, but rather the value of the field whose number is stored in a1. To demonstrate:
$ echo "one two three" | awk '{ n=2; print $n }'
two
Simply remove the extra dollar signs, and index the counters with a[1], the array element that split fills in (a1 is a different, unset variable), and you should be good to go.
Incidentally, I don't get the same output as you when I run the incorrect script. Instead, I get an error:
awk: illegal field $(), name "a1"
input record number 1, file inp1
source line number 1
I'm using BSD awk. I don't get the error when I run your script with GNU awk (gawk). If you'll be doing a lot of awk programming, I suggest you pick up another awk or two just to see how different implementations parse your code, when things don't run as expected.
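A sketch of the corrected script, under the assumption that the name to group by is everything in $5 before the first "-". Note that it indexes the counters with a[1], the element filled in by split:

```shell
out=$(printf '%s\n' \
  'chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75' \
  'chr1 957571 957852 chr1:957571-957852 AGRN-7|gc=61.2' |
awk '{
    split($5, a, "-")          # a[1] is the part of $5 before the first "-"
    c1[a[1]]++                 # number of lines per name
    c2[a[1]] += $3 - $2        # summed length per name
}
END { for (e in c1) print e, c1[e], c2[e] }')
printf '%s\n' "$out"
```

With the two sample lines this prints AGRN 2 501, matching the desired output.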

Can I speed up AWK program using NR function

I am using awk to pull data out of a file that has 30M+ records. I know to within a few thousand records where the records I want are. I am curious whether I can cut down the time it takes awk to find them by giving it a starting point via NR. For example, if my record is more than 25 million lines in, could I use the following:
awk 'BEGIN{NR=25000000}{rest of my script}' in
Would this make awk skip straight to the 25 millionth record and save me the time of it scanning each record before that?
For a better example, I am using this awk in a loop in sh. I need the normal output of the awk script, but I would also like it to pass along the NR at which it finished to the next iteration, when the loop comes back to this script again.
awk -v n=$line -v r=$record 'BEGIN{a=1}$4==n{print $10;a=2}($4!=n&&a==2){(pass NR out to $record);exit}' in
Nope. Let's try it:
$ cat -n file
1 one
2 two
3 three
4 four
$ awk 'BEGIN {NR=2} {print NR, $0}' file
3 one
4 two
5 three
6 four
Are your records fixed length, or do you know the average line length? If yes, then you can use a language that allows you to open a file and seek to a position. Otherwise you have to read all those lines:
awk -v start=25000000 'NR < start {next} {your program here}' file
To maintain your position between runs of the script, I'd use a language like perl: at the end of the run use tell() to output the current position, say to a file; then at the start of the next run, use seek() to pick up where you left off. Add a check that the starting position is less than the current file size, in case the file was truncated.
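The same seek/tell idea can be approximated without perl by saving a byte offset and resuming with tail -c. The sketch below is only an illustration: the file contents, the state file and the "two records per run" limit are made up, and counting length($0)+1 bytes per record assumes single-byte characters (hence LC_ALL=C) and newline-terminated lines:

```shell
f=$(mktemp); state=$(mktemp)
printf 'one\ntwo\nthree\nfour\n' > "$f"
echo 0 > "$state"

run() {
    # Resume after the saved byte offset, handle at most two records, save the new offset.
    offset=$(cat "$state")
    tail -c +"$((offset + 1))" "$f" |
    LC_ALL=C awk -v off="$offset" -v state="$state" '
        { print "processing:", $0; off += length($0) + 1 }   # +1 for the newline
        NR == 2 { exit }
        END { print off > state }'
}

first=$(run)
second=$(run)
printf '%s\n%s\n' "$first" "$second"
rm -f "$f" "$state"
```

The second call starts at "three" without awk ever seeing the first two lines again, because tail skips them by position instead of reading them record by record.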
One way (Using sed), if you know the line numbers
for n in 3 5 8 9 ....
do
sed -n "${n}p" file |awk command
done
or
sed -n "25000,30000p" file |awk command
Records generally have no fixed size, so awk has no choice but to scan the first part of the file, even just to skip it.
If you want to skip the first part of the input file and you (roughly) know the size to ignore, you can use dd to skip it, e.g. here assuming a record is 80 bytes wide (with GNU dd, 80 blocks of 25MB cover 25 million such records):
dd if=inputfile bs=25MB skip=80 | awk ...
Finally, you can keep awk from scanning the remaining records by exiting from the awk script once you have passed the end of the interesting zone.
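Both ideas, skipping cheaply at the front and exiting early at the back, can be combined in one script. A small sketch with made-up bounds (start and stop are hypothetical variable names) and seq standing in for the large input file:

```shell
start=5; stop=8     # hypothetical bounds of the interesting zone
out=$(seq 1 1000000 | awk -v start="$start" -v stop="$stop" '
    NR < start { next }      # cheap skip: no other rule runs for early records
    NR > stop  { exit }      # stop reading once past the zone
    { print NR, $0 }')
printf '%s\n' "$out"
```

awk still has to read the records before start, but it does no work on them, and it never touches the records after stop.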

Multiple passes with awk and execution order

Two part question:
Part One:
First I have a sequence AATTCCGG which I want to change to TTAAGGCC. I used gsub to change A to T, C to G, G to C and T to A. Unfortunately awk executes these substitutions sequentially, so I ended up with AAAACCCC. I got around this by using upper and lower case, then converting back to upper case values, but I would like to do this in a single step if possible.
example:
echo AATTCCGG | awk '{gsub("A","T",$1);gsub("T","A",$1);gsub("C","G",$1);gsub("G","C",$1);print $0}'
OUTPUT:
AAAACCCC
Part Two:
Is there a way to get awk to run to the end of a file for one set of instructions before starting a second set? I tried some of the following, but with no success
for the data set
1 A
2 B
3 C
4 D
5 E
I am using the following pipe to get the data I want (Just an example)
awk '{if ($1%2==0)print $1,"E";else print $0}' test | awk '{if ($1%2==0 && $2=="E") print $0}'
I am using a pipe to rerun the program, however I have found that it is quicker if I don't have to rerun the program.
This can be efficiently solved with tr:
$ echo AATTCCGG | tr ATCG TAGC
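Regarding part two (this should be a different question, really): awk has no built-in way to restart its input, so a pipe is the simplest way to go. The alternative is to name the file twice on the command line and tell the passes apart with NR==FNR.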
for part two, try this command:
awk '{if ($1%2==0)print $1,"E"}' test
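For part two, one common idiom for finishing one set of instructions before a second set starts, without a pipe, is to pass the same file twice and use NR==FNR to recognise the first pass. The sketch below uses the sample data; what each pass does (remember the last letter, then print it next to the even numbers) is just an illustration:

```shell
tmp=$(mktemp)
printf '%s\n' '1 A' '2 B' '3 C' '4 D' '5 E' > "$tmp"

out=$(awk '
    NR == FNR { last = $2; next }      # first pass: runs to the end of the file, remembers the final letter
    $1 % 2 == 0 { print $1, last }     # second pass: starts only after the first pass has finished
' "$tmp" "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

NR counts records across all files while FNR restarts for each file, so the two are equal only while the first copy is being read. This needs a regular file that can be read twice, so it does not work on piped input.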
Here is a method I have found for the first part of the question using awk. It uses an array and a for loop.
cat sub.awk
awk '
BEGIN{d["G"]="C";d["C"]="G";d["T"]="A";d["A"]="T";FS="";OFS=""}
{for(i=1;i<(NF+1);i++)
{if($i in d)
$i=d[$i]}
}
{print}'
Input/Output:
ATCG
TAGC
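Splitting a record into single characters with FS="" is an extension that not every awk supports. A sketch of the same lookup-table idea that walks the string with substr instead, and so should run in any POSIX awk; the variable names are illustrative:

```shell
out=$(echo AATTCCGG | awk '
BEGIN { d["A"] = "T"; d["T"] = "A"; d["C"] = "G"; d["G"] = "C" }
{
    res = ""
    for (i = 1; i <= length($0); i++) {
        ch = substr($0, i, 1)
        res = res ((ch in d) ? d[ch] : ch)   # each character is looked up once, so no replacement is undone
    }
    print res
}')
printf '%s\n' "$out"
```

Because every character is translated exactly once, the result is TTAAGGCC rather than the AAAACCCC produced by the chained gsub calls.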