How to insert a new first column with single quotes with awk - awk

I have very limited knowledge of awk.
I have big CSV files (500,000 lines) with the following line format:
'0000011197118123','136',,'35993706', '33745', '22052', 'appsflyer.com'
'0000011194967123','136',,'35282806', '74518', '30317', 'crashlytics.com'
'0000011199022123','139',,'01363100', '8776250', '373671', 'whatsapp.com'
............
I need to cut the first 8 digits from the first column and add a date field as a new first column (the date should be yesterday's date), like the following:
'2016/03/12','97118123','136',,'35993706','33745','22052','appsflyer.com'
'2016/03/12','94967123','136',,'35282806','74518','30317','crashlytics.com'
'2016/03/12','99022123','139',,'01363100','8776250','373671','whatsapp.com'
Thanks a lot for your time.
M.Tave

You can do something similar to:
awk -F, -v date="2016/03/12" 'BEGIN{OFS=FS}
{
  sub(/^.{8}/, "'\''", $1)  # cut the first 8 characters and restore the opening quote
  s = "'\''" date "'\''"    # wrap the date in single quotes
  $1 = s OFS $1             # prepend the new date field
  print
}' csv_file
I did not understand how you are determining your date, so I just used a string.
Based on comments, you can do:
awk -v d="2016/03/12" 'sub(/^.{8}/,"'\''"d"'\'','\''")' csv_file
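Since the requirement is the day-1 date, you could compute yesterday's date in the shell and pass it in. A minimal sketch, assuming GNU date (on BSD/macOS the equivalent flag is date -v-1d rather than -d yesterday):
d=$(date -d yesterday +%Y/%m/%d)
awk -v d="$d" 'sub(/^.{8}/,"'\''"d"'\'','\''")' csv_file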

$ awk -v d='2016/03/12' '{print "\047" d "\047,\047" substr($0,10)}' file
'2016/03/12','97118123','136',,'35993706', '33745', '22052', 'appsflyer.com'
'2016/03/12','94967123','136',,'35282806', '74518', '30317', 'crashlytics.com'
'2016/03/12','99022123','139',,'01363100', '8776250', '373671', 'whatsapp.com'

Related

Countif-like function in AWK with field headers

I am looking for a way of counting the number of times a value in a field appears in a range of fields in a CSV file, much the same as COUNTIF in Excel, although I would like to use an awk command if possible.
Column 6 holds the range of values, and column 7 should hold the number of times each value appears in column 6, as per below:
>awk -F, '{print $0}' file3
f1,f2,f3,f4,f5,test
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD
row2_1,row2_2,row2_3,AWERF,row2_5,AWER
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ
>awk -F, '{print $6}' file3
test
SBCD
AWER
ASDF
ASDQ
ASDQ
What I want is:
f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2
# adds the field name "count" that I want:
awk -F, -v OFS=, 'NR==1{ print $0, "count"}
NR>1{ print $0}' file3
How do I get the output I want?
I have tried this from a previous/similar question but no joy:
>awk -F, 'NR>1{c[$6]++;l[NR>1]=$0}END{for(i=0;i++<NR;){split(l[i],s,",");print l[i]","c[s[1]]}}' file3
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,
,
,
,
,
,
very similar question to this one
similar python related Q, for my ref
I would harness GNU AWK for this task in the following way. Let file.txt content be
f1,f2,f3,f4,f5,test
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD
row2_1,row2_2,row2_3,AWERF,row2_5,AWER
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ
then
awk 'BEGIN{FS=OFS=","}NR==1{print $0,"count";next}FNR==NR{arr[$6]+=1;next}FNR>1{print $0,arr[$6]}' file.txt file.txt
gives output
f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2
Explanation: this is a two-pass approach, hence file.txt appears twice. I inform GNU AWK that , is both the field separator (FS) and the output field separator (OFS). For the first line (the header) I print it followed by count and instruct GNU AWK to go to the next line, so nothing else is done with the 1st line. During the first pass, i.e. where the global line number (NR) is equal to the line number within the current file (FNR), I count the number of occurrences of the values in the 6th field, storing them as values in array arr, then instruct GNU AWK to go to the next line, so nothing else is done in this pass. During the second pass, for all lines after the 1st (FNR>1), I print the whole line ($0) followed by the corresponding value from array arr.
(tested in GNU Awk 5.0.1)
You did not copy the code from the linked question properly. Why change l[NR] to l[NR>1] at all? On the other hand, you should change s[1] to s[6] since it's the sixth field that has the key you're counting:
awk -F, 'NR>1{c[$6]++;l[NR]=$0}END{for(i=0;i++<NR;){split(l[i],s,",");print l[i]","c[s[6]]}}'
You can also output the header with the new field name:
awk -F, -vOFS=, 'NR==1{print $0,"count"}NR>1{c[$6]++;l[NR]=$0}END{for(i=0;i++<NR;){split(l[i],s,",");print l[i],c[s[6]]}}'
One awk idea:
awk '
BEGIN { FS=OFS="," }    # define input/output field delimiters as comma
      { lines[NR]=$0
        if (NR==1) next
        col6[NR]=$6     # copy field 6 so we do not have to parse the contents of lines[] in the END block
        cnt[$6]++
      }
END   { for (i=1;i<=NR;i++)
            print lines[i], (i==1 ? "count" : cnt[col6[i]])
      }
' file3
This generates:
f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2

Splitting awked string into date format

I need to parse an awked string:
dd.mm.yyyy %H:%M:%S into yyyy-mm-dd %H:%M:%S
example:
10.04.2017 10:15:05 into 2017-04-10 10:15:05
I need awk because the file is big, and only one column is a date, with "|" as the delimiter. The date column is $3.
Tried splitting, but I'm stuck on:
awk -F"|" '{split($3,data," "}' | awk '(split(data[3],data2,"."}
I cannot get the values out of data to print them in the necessary order.
Simple awk
echo "10.04.2017 10:15:05" | awk -F"[. ]" '{print $3"-"$2"."$1,$4}
2017-04.10 10:15:05
Setting space and dot as field separators and printing the pieces in the new order. If the date is in the middle of a line, you may need to tweak this further; see the field-based answers below.
If you have a variable string in awk which reads '10.04.2017 10:14:05', you can split it on field separators and rebuild it:
split(string, a, "[. ]")
string_new = a[3] "-" a[2] "-" a[1] " " a[4]
Here string would be $3 if your date is in the 3rd field.
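A self-contained sketch of the same split-and-rebuild idea, using a hardcoded string for illustration:
awk 'BEGIN {
  string = "10.04.2017 10:14:05"
  split(string, a, "[. ]")               # a[1]=dd, a[2]=mm, a[3]=yyyy, a[4]=time
  print a[3] "-" a[2] "-" a[1] " " a[4]  # prints 2017-04-10 10:14:05
}'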
Maybe all you need is to replace . with - in the 3rd field. Something like this (note the regex form /\./ and setting OFS so the | delimiters are kept):
awk -F "|" -v OFS="|" '{gsub(/\./,"-",$3)}1'
Please provide some sample input and expected output.
The following command changes the third field ($3) as per your requirement.
awk -F"|" -v OFS="|" '{
split($3, o, "[. ]" );
$3 = o[3] "-" o[2] "-" o[1] " " o[4];
print
}'
For the input dfdsfsdg| kljgslfdjgl|10.04.2017 10:15:05 the output is dfdsfsdg| kljgslfdjgl|2017-04-10 10:15:05.

Using an awk pattern to filter file data

I have the following file (named /tmp/test99) which contains the rows:
"0","15","wall15"
123132,09808098,"0","15"
I am trying to filter the rows that contain "0" in the 3rd field and "15" in the 4th field (like the second row).
I tried running:
cat /tmp/test99 | awk '/"0","15"/{print>"/tmp/0_15_file.out"} '
but instead of getting only the second row, I also get the first row, which starts with "0","15".
Could you please help with the pattern ?
Thanks:)
You may check if Fields 3 and 4 are equal to some hardcoded value using
awk -F, '$3=="\"0\"" && $4=="\"15\""'
Set the field separator to a comma and then, if Field 3 is "0" and Field 4 is "15" print the line, else discard.
See the online demo:
s='"0","15","wall15"
123132,09808098,"0","15"'
awk -F, '$3=="\"0\"" && $4=="\"15\""' <<< "$s"
# => 123132,09808098,"0","15"
Could you please try the following. (A comment on your effort: you need NOT use cat with awk; it can read Input_file by itself.)
awk -F, '$3~/^"0"$/ && $4~/^"15"$/' Input_file

How to remove 0's from the second column

I have a file that looks like this:
k141_173024,001
k141_173071,002
k141_173527,021
k141_173652,034
k141_173724,041
...
How do I remove 0's from each line of the second field?
The desired result is :
k141_173024,1
k141_173071,2
k141_173527,21
k141_173652,34
k141_173724,41
...
What I've tried was
cut -f 2 -d ',' file | awk '{print $1 + 0}' > file2
cut -f 1 -d ',' file > file1
paste file1 file2 > final_file
This was an inefficient way to edit it.
Thank you.
awk 'BEGIN{FS=OFS=","} {print $1 OFS $2+0}' Input.txt
Adding 0 forces awk to evaluate the field as a number, which drops the leading zeros.
If it's only the zeros directly following the comma (,001 to ,1 but ,010 to ,10; strictly that is not "remove 0's from the second column", but the example doesn't clearly show the requirement), you could replace the comma and zeros with another comma:
$ awk '{gsub(/,0+/,",")}1' file
k141_173024,1
k141_173071,2
k141_173527,21
k141_173652,34
k141_173724,41
Could you please try the following.
awk 'BEGIN{FS=OFS=","} {gsub(/0/,"",$2)}1' Input_file
EDIT: To remove only leading zeros, try the following.
awk 'BEGIN{FS=OFS=","} {sub(/^0+/,"",$2)}1' Input_file
If the second field is a number, you can do this to remove the leading zeroes:
awk 'BEGIN{FS=OFS=","} {print $1 OFS int($2)}' file
As per @Inian's suggestion, this can be further simplified to:
awk -F, -v OFS=, '{$2=int($2)}1' file
This might work for you (GNU sed):
sed 's/,0\+/,/' file
This removes leading zeroes from the second column by replacing a comma followed by one or more zeroes by a comma.
P.S. I guess the OP did not mean to remove zeroes that are part of the number.

awk / split to return lines with a certain value in a certain column - create blocks of 100,000

I have a csv file where the third column is a number. Some of the entries don't have a value in this column.
I want to pull 100k blocks from the file, but only entries with a valid value for that column.
I could use split, but how do I make it check that column for a value?
$ cat test.txt
1,2,3,get me
4,5,,skip me
6,7,8,get me
9,10,11,stop before me
$ awk -F, '$3!="" && ++i<=2' test.txt
1,2,3,get me
6,7,8,get me
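Here ++i<=2 just demonstrates the cutoff on the 4-line sample; on the real file you would use ++i<=100000. To actually carve the valid rows into files of 100,000 lines each, one sketch is to redirect to a computed file name (the block_N.csv names are made up for illustration):
awk -F, '$3 != "" { print > ("block_" int(n++ / 100000) ".csv") }' test.txt
If you hit a too-many-open-files limit, close() each block file once you have moved past it.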
If you're trying to verify whether the third field within a record has a value, and output its contents if it does, you could try the following:
awk -F , '{ if($3 != ""){print $3} }'
This could also be written as:
awk -F , '$3 != ""{print $3}'
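Note that both of these print only the third field; if you want the whole matching line, drop the action and let awk's default print of $0 apply:
awk -F, '$3 != ""' test.txt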