Splitting an awk'd string into date format - awk

I need to parse an awk'd string:
dd.mm.yyyy %H:%M:%S into yyyy-mm-dd %H:%M:%S
example:
10.04.2017 10:15:05 into 2017-04-10 10:15:05
I need awk because the file is big, and only one column holds the date; the delimiter is "|" and the date is in column $3.
Tried splitting, but I'm stuck on:
awk -F"|" '{split($3,data," "}' | awk '(split(data[3],data2,"."}
I cannot get the values out of the data array to print them in the necessary order.

Simple awk
echo "10.04.2017 10:15:05" | awk -F"[. ]" '{print $3"-"$2"."$1,$4}
2017-04.10 10:15:05
Setting space and dot as field separator and print it out.
If its middle of something, you may need to find and tweak more.

If you have a string variable in awk which reads '10.04.2017 10:15:05', you can split it on dot and space and rebuild it:
split(string, a, "[. ]")
string_new = a[3]"-"a[2]"-"a[1]" "a[4]
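For example, wrapped into a one-liner (a minimal sketch; here the "string" is simply the whole input record $0):
echo "10.04.2017 10:15:05" | awk '{split($0, a, "[. ]"); print a[3] "-" a[2] "-" a[1] " " a[4]}'
2017-04-10 10:15:05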

If your date is in the 3rd field, maybe all you need is to replace . with - in that field. Something like this:
awk -F "|" '{gsub("\.","-",$3)}1'
Please provide some sample input and expected output.

The following command changes the third field ($3) as per your requirement.
awk -F"|" -v OFS="|" '{
split($3, o, "[. ]" );
$3 = o[3] "-" o[2] "-" o[1] " " o[4];
print
}'
For the input dfdsfsdg| kljgslfdjgl|10.04.2017 10:15:05 the output is dfdsfsdg| kljgslfdjgl|2017-04-10 10:15:05.

Related

awk: counting fields in a variable

Given a string like
{running_db_nodes,[ejabberd#host002,ejabberd#host001]},
how could the number of comma-delimited strings inside the square brackets be counted?
The useful substring can be extracted with gensub:
awk '/running_db_nodes/ {print gensub(/ {running_db_nodes,\[(.*)\]},/, "\\1", 1)}'
A naive approach with NF gets fields from the original input string:
awk -F, '/running_db_nodes/ {nodes=gensub(/ {running_db_nodes,\[(.*)\]},/, "\\1", 1); print NF}'
How could the number of fields in a variable like nodes in the last example be extracted?
You can set your FS to characters [ and ], then split your $2 to an array and capture the count of elements returned from split():
echo "{running_db_nodes,[ejabberd#host002,ejabberd#host001]}," |
awk -F"[][]" '{print split($2,a,",")}'
2
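Since split() returns the number of elements it produced, the same trick works directly on a variable such as nodes from the question (a sketch, assuming GNU awk for gensub()):
echo "{running_db_nodes,[ejabberd#host002,ejabberd#host001]}," |
awk '{nodes=gensub(/.*\[(.*)\].*/, "\\1", 1); print split(nodes, a, ",")}'
2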
With your shown samples and attempts, please try the following awk code.
echo "{running_db_nodes,[ejabberd#host002,ejabberd#host001]}," |
awk '
{
gsub(/.*\[|\].*$/,"")
print gsub(/,/,"&")+1
}
'
Explanation:
gsub(/.*\[|\].*$/,""): globally substitutes everything from the start of the line up to and including [, and everything from ] to the end of the line, with the empty string.
print gsub(/,/,"&")+1: globally substitutes , with itself (just to count the occurrences), adds 1 to the count returned by gsub, and prints it, as required.
Regarding "A naive approach with NF gets fields from the original input string": gensub does not change the string it works on. You might use sub (or gsub) instead, which alters the string in place and therefore updates the relevant built-in variables, that is:
echo "{running_db_nodes,[ejabberd#host002,ejabberd#host001]}" | awk 'BEGIN{FS=","}{sub(/^.*\[/,"");sub(/].*$/,"");print NF}'
gives output
2
Explanation: use sub to delete everything up to and including [, then ] and everything after it, and print the number of fields.
(tested in GNU Awk 5.0.1)

Awk Field number of matched pattern

I was wondering if there's a built in command in awk to get the field number of the phrase that you just matched.
Banana is yellow.
awk {
/yellow/{ for (i=1;i<=NF;i++) if($i ~/yellow/) print $i}'
Is there a way to avoid writing the loop?
Your command doesn't work when I test it. Here's my version:
echo "banana is yellow" | awk '{for (i=1;i<=NF;i++) if($i ~/yellow/) print i}'
The output is:
3
As far as I know, there's no such built-in feature. To improve your command: the pattern match /yellow/ at the beginning is not necessary, and $i prints the matching field rather than the field number you need.
Alternatively, you can use an array to store each field with its index number and then look the position up with arr["yellow"], as sketched below.
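A minimal sketch of that array idea (the array name arr is arbitrary; note it needs an exact field match, not a regexp match):
echo "banana is yellow" | awk '{for (i=1; i<=NF; i++) arr[$i]=i; print arr["yellow"]}'
3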
If the input is a one-line string, you can set the record separator to the field separator. That way you can use NR to print the position:
awk 'BEGIN{RS=FS}/yellow/{print NR}' <<< 'banana is yellow'
3

how to insert a new column in 1st position with single quotes with awk

I have very limited knowledge of awk.
I have big CSV files (500,000 lines) with the following line format:
'0000011197118123','136',,'35993706', '33745', '22052', 'appsflyer.com'
'0000011194967123','136',,'35282806', '74518', '30317', 'crashlytics.com'
'0000011199022123’,’139',,'01363100', '8776250', '373671', 'whatsapp.com'
............
I need to cut the first 8 digits from the first column and add a date field as the new first column (the date should be the previous day's date), like the following:
'2016/03/12','97118123','136',,'35993706','33745','22052','appsflyer.com'
'2016/03/12','94967123','136',,'35282806','74518','30317','crashlytics.com'
'2016/03/12','99022123’,’139',,'01363100','8776250','373671','whatsapp.com'
Thanks a lot for your time.
M.Tave
You can do something similar to:
awk -F, -v date="2016/03/12" 'BEGIN{OFS=FS}
{sub(/^.{9}/, "'\''", $1)
s="'\''"date"'\''"
$1=s OFS $1
print }' csv_file
I did not understand how you are determining your date, so I just used a string.
Based on comments, you can do:
awk -v d="2016/03/12" 'sub(/^.{8}/,"'\''"d"'\'','\''")' csv_file
$ awk -v d='2016/03/12' '{print "\047" d "\047,\047" substr($0,10)}' file
'2016/03/12','97118123','136',,'35993706', '33745', '22052', 'appsflyer.com'
'2016/03/12','94967123','136',,'35282806', '74518', '30317', 'crashlytics.com'
'2016/03/12','99022123’,’139',,'01363100', '8776250', '373671', 'whatsapp.com'
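If the day-1 date should be computed rather than hard-coded, one way is to build it in the shell and pass it in with -v (a sketch, assuming GNU date for the -d option):
d=$(date -d "1 day ago" +%Y/%m/%d)   # yesterday's date, e.g. 2016/03/12
awk -v d="$d" '{print "\047" d "\047,\047" substr($0,10)}' file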

Awk editing with field delimiter

Imagine if you have a string like this
Amazon.com Inc.:181,37:184,22
and you run awk -F':' '{print $1 ":" $2 ":" $3}', it will output the same thing.
But can you handle $2 in this example so it outputs only 181, without the ,37?
Thanks in advance!
You can change the field separator so that it contains either : or ,, using a bracket expression:
awk -F'[:,]' '{ print $2 }' file
If you are worried that , may appear in the first field (which will break this approach), you could use split:
awk -F: '{ split($2, a, /,/); print a[1] }' file
This splits the second field on the comma and then prints the first part. Any other fields containing a comma are unaffected.
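For example, with the sample line from the question:
echo 'Amazon.com Inc.:181,37:184,22' | awk -F'[:,]' '{print $2}'
181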

replacing the `'` char using awk

I have lines with : and ' characters in them that I want to get rid of. I want to use awk for this. I've tried using:
awk '{gsub ( "[:\\']","" ) ; print $0 }'
and
awk '{gsub ( "[:\']","" ) ; print $0 }'
and
awk '{gsub ( "[:']","" ) ; print $0 }'
None of them worked; they all return the error Unmatched ". But when I put
awk '{gsub ( "[:_]","" ) ; print $0 }'
it works and removes all : and _ chars. How can I get rid of the ' char?
tr is made for this purpose
echo test\'\'\'\':::string | tr -d \':
teststring
$ echo test\'\'\'\':::string | awk '{gsub(/[:\47]*/,"");print $0}'
teststring
This works:
awk '{gsub( "[:'\'']","" ); print}'
You could use:
Octal code for the single quote:
[:\47]
The single quote inside double quotes, but in that case special
characters will be expanded by the shell:
% print a\': | awk "sub(/[:']/, x)"
a
Use a dynamic regexp, but there are performance implications related
to this approach:
% print a\': | awk -vrx="[:\\\']" 'sub(rx, x)'
a
With bash you cannot insert a single quote inside a literal surrounded with single quotes. Use '"'"' for example.
The first ' closes the current literal, then "'" concatenates it with a literal containing only a single quote, and the final ' reopens a string literal, which will also be concatenated.
What you want is:
awk '{gsub ( "[:'"'"']","" ) ; print $0; }'
ssapkota's alternative is also good ('\'').
I don't know why you are restricting yourself to awk; anyway, you've got many answers from other users. You can also use sed to get rid of the : and ' characters:
sed "s/[':]//g"
This will also serve your purpose and keeps things simple.
This also works:
awk '{gsub("\x27",""); print}'
Simplest:
awk '{gsub(/\047|:/,"")};1'