How to display the date of each file as the first element of each line with bash/awk?

I have 7 txt files which are the output of the df -m command on AIX 7.2.
I need to keep only the first column and the second column for one filesystem. So I do this:
cat *.txt | grep hd4 | awk '{print $1","$2}' > test1.txt
And the output is:
/dev/hd4,384.00
/dev/hd4,394.00
/dev/hd4,354.00
/dev/hd4,384.00
/dev/hd4,484.00
/dev/hd4,324.00
/dev/hd4,384.00
Each file is created from the crontab, and their filenames are:
df_command-2019-09-03-12:50:00.txt
df_command-2019-08-28-12:59:00.txt
df_command-2019-08-29-12:51:00.txt
df_command-2019-08-30-12:52:00.txt
df_command-2019-08-31-12:53:00.txt
df_command-2019-09-01-12:54:00.txt
df_command-2019-09-02-12:55:00.txt
I would like to keep only the date from the filename, and I'm able to do that:
test=df_command-2019-09-03-12:50:00.txt
echo $test | cut -d'-' -f2,3,4
output:
2019-09-03
But I would like to put each date as the first element of the corresponding line of my test1.txt:
2019-08-28,/dev/hd4,384.00
2019-08-29,/dev/hd4,394.00
2019-08-30,/dev/hd4,354.00
2019-08-31,/dev/hd4,384.00
2019-09-01,/dev/hd4,484.00
2019-09-02,/dev/hd4,324.00
2019-09-03,/dev/hd4,384.00
Do you have any idea how to do that?

This awk may do:
awk '/hd4/ {split(FILENAME,a,"-");print a[2]"-"a[3]"-"a[4]","$1","$2}' *.txt > test1.txt
/hd4/ finds the lines containing hd4
split(FILENAME,a,"-") splits the filename into array a on -
print a[2]"-"a[3]"-"a[4]","$1","$2 prints year-month-day, field 1 and field 2
> test1.txt writes the result to file test1.txt
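If you prefer to stay closer to your original cut-based approach, a plain shell loop does the same job (a sketch built only from the commands already shown in the question):
for f in df_command-*.txt; do
    d=$(echo "$f" | cut -d'-' -f2,3,4)                    # e.g. 2019-09-03 from the filename
    grep hd4 "$f" | awk -v d="$d" '{print d","$1","$2}'   # prepend the date to the two fields
done > test1.txt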

Date output file: dates.txt
2019-08-20
2019-08-08
2019-08-01
File system data: fsys.txt
/dev/hd4,384.00
/dev/hd4,394.00
/dev/hd4,354.00
paste can be used to combine the files line by line as columns. Use -d to specify comma as the separator.
paste -d ',' dates.txt fsys.txt
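With the sample files above, this produces:
2019-08-20,/dev/hd4,384.00
2019-08-08,/dev/hd4,394.00
2019-08-01,/dev/hd4,354.00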

Related

Match 2 columns in 2 files and get another value from the first file

I have 2 csv files which have the following structure:
File 1:
date,keyword,location,page
2019-04-11,ABC,mumbai,http://www.insurers.com
and so on.
File 2:
date,site,market,location,url
2019-05-12,denmark,de ,Frankfurt,http://lufthansa.com
2019-04-11,Netherlands,nl,amsterdam,http://www.insurers.com
The problem is that I need to match the dates in both files as well as the url. Example:
2019-04-11 and http://www.insurers.com (File 1)
with
2019-04-11 and http://www.insurers.com (File 2)
Edit:
If this condition is satisfied, the keyword (ABC) in File 1 should be inserted into File 2 as the third column (a new column).
Expected Output:
date,site,keyword,market,location,url
2019-04-11,Netherlands,ABC,nl,amsterdam,http://www.insurers.com
I have tried putting the dates and URLs in a map in Java, but too many URLs are duplicated.
So I am seeking a bash, awk, grep or sed solution.
Thanks.
$ awk '
BEGIN { FS=OFS="," }                  # comma-separated input and output
NR==FNR {                             # while reading file1:
    m[$1,(NR>1?$4:"url")]=$2          #   map date+page (the header row keys on "url") to the keyword
    next
}
($1,$5) in m {                        # while reading file2: is this date+url in the map?
    $2=$2 OFS m[$1,$5]; print         #   append the keyword after the site column and print
}
' file1 file2
date,site,keyword,market,location,url
2019-04-11,Netherlands,ABC,nl,amsterdam,http://www.insurers.com
Try GNU sed:
sed -En 's!^([0-9]{4}-[0-9]+-[0-9]+,).+(http://\w.+)!s#^\1([^,]+),[^,]+,\\s*\2#\\1#p!p' File2| sed -Enf - File1 >Result

Awk: merge each 12 lines into one line for an insert query

I have the following script:
curl -s 'https://someonepage=5m' | jq '.[]|.[0],.[1],.[2],.[3],.[4],.[5],.[6],.[7],.[8],.[9],.[10],.[11],.[12]' | perl -p -e 's/\"//g' |awk '/^[0-9]/{print; if (++onr%12 == 0) print ""; }'
This is part of the result:
1517773500000
0.10250100
0.10275700
0.10243500
0.10256600
257.26700000
1517773799999
26.38912220
1229
104.32200000
10.70579910
0
1517773800000
0.10256600
0.10268000
0.10231600
0.10243400
310.64600000
1517774099999
31.83806883
1452
129.70500000
13.29758266
0
1517774100000
0.10243400
0.10257500
0.10211800
0.10230000
359.06300000
1517774399999
36.73708621
1296
154.78500000
15.84041910
0
I want to insert this data into a MySQL database. For each group of 12 lines I want this result:
(1517773800000,0.10256600,0.10268000,0.10231600,0.10243400,310.64600000,1517774099999,31.83806883,1452,129.70500000,13.29758266,0)
(1517774100000,0.10243400,0.10257500,0.10211800,0.10230000,359.06300000,1517774399999,36.73708621,1296,154.78500000,15.84041910,0)
I need to merge each group of 12 lines into one line; can anyone help me get this result?
Here's an all-jq solution:
.[] | .[0:12] | @tsv | gsub("\t";",") | "(\(.))"
In the sample, all the subarrays have length 12, so you might be able to drop the .[0:12] part of the pipeline. If using jq 1.5 or later, you could use join(",") instead of the @tsv|gsub portion of the pipeline. You might, for example, want to consider:
.[] | join(",") | "(\(.))"   # jq 1.5 or later
Invocation: use the -r command-line option
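Combined with the curl command from the question (the URL is the question's placeholder), the full invocation would look something like:
curl -s 'https://someonepage=5m' | jq -r '.[] | .[0:12] | @tsv | gsub("\t";",") | "(\(.))"'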
Sample output:
(1517627400000,0.10452300,0.10499000,0.10418200,0.10449400,819.50400000,1517627699999,85.57150693,2340,452.63400000,47.27213035,0)
(1517627700000,0.10435700,0.10449200,0.10366000,0.10370000,717.37000000,1517627999999,74.60582079,1996,321.25500000,33.42273846,0)
(1517628000000,0.10376600,0.10390000,0.10366000,0.10370400,519.59400000,1517628299999,53.88836170,1258,239.89300000,24.88613854,0)
$ awk 'BEGIN {RS=""; OFS=","} {$1=$1; $0="("$0")"}1' file
(1517773500000,0.10250100,0.10275700,0.10243500,0.10256600,257.26700000,1517773799999,26.38912220,1229,104.32200000,10.70579910,0)
(1517773800000,0.10256600,0.10268000,0.10231600,0.10243400,310.64600000,1517774099999,31.83806883,1452,129.70500000,13.29758266,0)
(1517774100000,0.10243400,0.10257500,0.10211800,0.10230000,359.06300000,1517774399999,36.73708621,1296,154.78500000,15.84041910,0)
RS="":
Treat groups of lines separated one or more blank lines as a record
OFS=","
Set the output separator to be a ","
$1=$1
Reconstitute the line, replacing the input separators with the output separator
$0="("$0")"
Surround the record with parens
1
Print the record
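Since the pipeline in the question already prints a blank line after every 12 values, you could feed its output straight into this stage, e.g.:
... | awk '/^[0-9]/{print; if (++onr%12 == 0) print ""; }' | awk 'BEGIN {RS=""; OFS=","} {$1=$1; $0="("$0")"}1'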

exclude sequences depending on description ID in AWK

I have fasta files which have some description IDs (isoform 2, ..., isoform 9), and I want to exclude those sequences from the fasta files.
I used this command line to see which files contain the isoform 2 to 9 IDs:
for i in `ls *.fasta`; do l=`grep 'isoform X[2-9]' $i | head -1`; echo $i $l; done | awk '(NF==1){print}' | head
Is there a way to include something in my command line to remove them all?
Thanks.
sed 's/isoform [2-9]//g' *.fasta
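Note that this only deletes the ID text from each line and prints to stdout; with GNU sed you can add -i to edit the files in place. If the goal is instead to drop the whole sequence record whose header carries the ID, here is a sketch in awk (assuming headers start with > and reusing the isoform X[2-9] pattern from your grep):
awk '/^>/ {skip = /isoform X[2-9]/} !skip' file.fasta > filtered.fasta
On each header line it decides whether the following record should be skipped, and !skip prints only the lines of records that were kept.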

base64 decoding from file column

I have a file with 6 columns per line, separated by ",". The last column is gzipped and base64-encoded. The output file should contain column 3 and column 6 (decoded/unzipped).
I tried to do this with:
awk -F',' '{"echo "$6" | base64 -di | gunzip" | getline x;print $3,x }' OFS=',' inputfile.csv >outptfile_decoded.csv
The result for the first lines is OK, but after some lines the decoded output is the same as on the line before. It seems that the decoding and unzipping hangs, but I don't get an error message.
A single decode/unzip works fine, i.e.
echo "H4sIAAAAAAAAA7NJTkuxs0lMLrEztNEHUTZAgcy8tHw7m7zSXLuS1BwrbRNjMzMTc3MDAzMDG32QqE1uSWVBqh2QB2HYlCYX2xnb6IMoG324ASCWHQAaafi1YQAAAA==" | base64 -di | gunzip
What can be the reason for this effect? (There are no error messages.)
Is there another way that works reliably?
Without a test case it's difficult to recommend anything, but here is a working script with input data.
First, create a test data file:
$ while read f; do echo $f,$(echo $f | gzip -f | base64); done < <(seq 5) | tee file.g
1,H4sIAJhBuVkAAzPkAgBT/FFnAgAAAA==
2,H4sIAJhBuVkAAzPiAgCQr3xMAgAAAA==
3,H4sIAJhBuVkAAzPmAgDRnmdVAgAAAA==
4,H4sIAJhBuVkAAzPhAgAWCCYaAgAAAA==
5,H4sIAJhBuVkAAzPlAgBXOT0DAgAAAA==
and decode:
$ awk 'BEGIN {FS=OFS=","}
{cmd="echo "$2" | base64 -di | gunzip"; cmd | getline v; print $1,v}' file.g
1,1
2,2
3,3
4,4
5,5
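As to the original symptom: the command is never close()d, so awk eventually runs out of open pipes; when that happens getline fails silently and x keeps the value from the previous line. A hedged fix (the DECODE_ERROR marker is just an illustration) is to reset the variable, check getline's return value, and close the command after every line:
awk 'BEGIN {FS=OFS=","}
{
  cmd = "echo " $6 " | base64 -di | gunzip"
  x = ""                                          # reset so a failed decode cannot reuse the old value
  if ((cmd | getline x) <= 0) x = "DECODE_ERROR"  # hypothetical error marker
  close(cmd)                                      # release the pipe before the next line
  print $3, x
}' inputfile.csv > outptfile_decoded.csv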

print whole variable contents if the number of lines is greater than N

How can I print all lines if a certain condition matches?
Example:
echo "$ip"
this is a sample line
another line
one more
last one
If this file has more than 3 lines then print the whole variable.
I have tried:
echo $ip| awk 'NR==4'
last one
echo $ip|awk 'NR>3{print}'
last one
echo $ip|awk 'NR==12{} {print}'
this is a sample line
another line
one more
last one
echo $ip| awk 'END{x=NR} x>4{print}'
Need to achieve this:
If this file has more than 3 lines then print the whole file. I can do this using wc and bash, but I need a one-liner.
The right way to do this (no echo, no pipe, no loops, etc.):
$ awk -v ip="$ip" 'BEGIN{if (gsub(RS,"&",ip)>2) print ip}'
this is a sample line
another line
one more
last one
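The trick is that gsub() returns the number of substitutions it made, so gsub(RS,"&",ip) replaces every newline with itself and returns the newline count; more than 2 newlines means more than 3 lines. A quick check:
$ awk -v ip="$(printf 'a\nb\nc\nd')" 'BEGIN{print gsub(RS,"&",ip)}'
3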
You can use awk as follows:
echo "$ip" | awk '{a[$0]; next}END{ if (NR>3) { for(i in a) print i }}'
one more
another line
this is a sample line
last one
You can also make the value 3 configurable via an awk variable:
echo "$ip" | awk -v count=3 '{a[$0]; next}END{ if (NR>count) { for(i in a) print i }}'
The idea is to store the contents of each line with {a[$0]; next} as it is processed; by the time the END clause is reached, the NR variable holds the line count of the string/file. The lines are printed if the condition matches, i.e. the number of lines is greater than 3 (or whatever value you configure). Note that keying the array by $0 collapses duplicate lines and that for (i in a) does not guarantee output order.
And always remember to double-quote the variables in bash to avoid undergoing word-splitting done by the shell.
Using James Brown's useful comment below to preserve the order of lines, do:
echo "$ip" | awk -v count=3 '{a[NR]=$0; next}END{if(NR>count)for(i=1;i<=NR;i++)print a[i]}'
this is a sample line
another line
one more
last one
Another approach in awk. First, the test files:
$ cat 3
1
2
3
$ cat 4
1
2
3
4
Code:
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 3 # look ma, no lines
[this line left intentionally blank. no wait!]
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 4
1
2
3
4
Explained:
NR<4 { # for the first 3 records
b=b (NR==1?"":ORS)$0 # buffer them to b with ORS delimiter
next # proceed to next record
}
b { # if the buffer has records, i.e. NR>=4
print b # output buffer
b="" # and reset it
}1 # print all records after that