Awk to remove characters from start of string in specific column - awk

Text as follows:
1.6M 2014-08-21 20:56 ./file1
1.6M 2014-08-22 10:59 ./file2
24K 2014-08-26 10:39 ../file3
0 2014-08-21 14:12 ./file4
0 2014-08-22 18:05 ../file5
1.5M 2014-08-22 04:15 ./file6
8.0K 2014-08-20 20:31 ../file7
I want the output to be time ordered:
8.0K 2014-08-20 20:31 ../file7
0 2014-08-21 14:12 ./file4
1.6M 2014-08-21 20:56 ./file1
1.5M 2014-08-22 04:15 ./file6
1.6M 2014-08-22 10:59 ./file2
0 2014-08-22 18:05 ../file5
24K 2014-08-26 10:39 ../file3
Then I want the leading ./ to be removed but NOT ../ and only size and filename to be printed...
So far I've got:
sort -k 2 | awk 'BEGIN{FS="\t"; OFS="\t"} {gsub(/.\//, ""); print}'
Which gives:
8.0K .file7
0 file4
1.6M file1
1.5M file6
1.6M file2
0 .file5
24K .file3
How can I make gsub apply only to the start (the first 2 characters) of column 3, so that ../fileX doesn't become .fileX?

I figured it out. I was very close :) and just needed to alter the gsub command slightly: anchor the pattern to the start, escape the dot, and restrict it to column 3.
Edited:
gsub(/^\.\//, "", $3)
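For reference, here is a minimal sketch of how the whole pipeline could look with that anchored pattern. It assumes the fields are separated by whitespace, as the sample is rendered above, so the path is taken as the last field ($NF) rather than $3. The function name strip_dot_slash is made up for illustration, and paths containing spaces would need a different field separator.

```shell
# Hypothetical helper: sort by date/time, strip a leading "./" from the path,
# and print only the size and the file name.
strip_dot_slash() {
  sort -k 2 |
    awk '{
      sub(/^\.\//, "", $NF)   # anchored, so "../file" is left untouched
      print $1, $NF           # size and path only
    }'
}
```

sub is enough here instead of gsub, because a pattern anchored with ^ can match at most once per field.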

Nice you figured a way out. Here's my solution with sed.
sort -k 2 tt | sed -r 's| \.{1}/| |'
Example:
sdlcb#Goofy-Gen:~/AMD$ sort -k 2 tt | sed -r 's| \.{1}/| |'
8.0K 2014-08-20 20:31 ../file7
0 2014-08-21 14:12 file4
1.6M 2014-08-21 20:56 file1
1.5M 2014-08-22 04:15 file6
1.6M 2014-08-22 10:59 file2
0 2014-08-22 18:05 ../file5
24K 2014-08-26 10:39 ../file3
The idea is simply to remove a "./" that is preceded by a space, i.e. to substitute " ./" with " " (a single space). Substituting a space rather than nothing keeps the columns separated. The -r option makes sed use extended regular expressions. sed 's|a|b|' substitutes the first occurrence of "a" on each line with "b". In our case the pattern in 's| \.{1}/| |' is " \.{1}/", which means "a single space, followed by exactly one '.' character, followed by '/'". The dot needs to be escaped, otherwise sed interprets it as "any character".

Related

Simple Pattern match with a field and a variable does not seem to work in GAWK/AWK

I am trying to extract all lines where a field matches a pattern which is defined as a variable.
I tried the following
head input.dat |
awk -F '|' -v CODE="39905|19043" '{print $13; if($13~CODE){print "Matched"} else {print "Nomatch"} }'
I am printing the value of the field before attempting the pattern match. (This way I don't have to show the entire line, which contains many fields.)
This is the output I got.
PLAN_ID
Nomatch
39905
Nomatch
39905
Nomatch
39883
Nomatch
19043
Nomatch
2215
Nomatch
19043
Nomatch
9149
Nomatch
42718
Nomatch
24
Nomatch
I expected to see at least 3 instances of Matched in the output. What am I doing wrong?
edit by #Fravadona
xxd input.dat | head -n 6
00000000: fffe 4d00 4f00 4e00 5400 4800 5f00 4900 ..M.O.N.T.H._.I.
00000010: 4400 7c00 5300 5600 4300 5f00 4400 5400 D.|.S.V.C._.D.T.
00000020: 7c00 5000 4100 5400 4900 4500 4e00 5400 |.P.A.T.I.E.N.T.
00000030: 5f00 4900 4400 7c00 5000 4100 5400 5f00 _.I.D.|.P.A.T._.
00000040: 5a00 4900 5000 3300 7c00 4300 4c00 4100 Z.I.P.3.|.C.L.A.
00000050: 4900 4d00 5f00 4900 4400 7c00 5300 5600 I.M._.I.D.|.S.V.
Turns out that the input file uses the UTF-16 LE Encoding (as shown by the hexdump of the content). Thus, the solution seems to be to convert the input file from UTF-16LE to UTF-8 before running AWK. Thanks
I found out (thanks to all who suggested looking at the hexdump of the input file) that the file used UTF-16LE encoding. Once I converted the input file using iconv, the AWK script worked as expected.
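A rough sketch of that conversion step, for anyone hitting the same thing. In UTF-16LE every ASCII character is followed by a NUL byte, so field 13 never contains the plain byte sequence "39905" and the match fails. The function name match_plan_id is made up for illustration, and using -f UTF-16 (so that the byte-order mark ff fe seen in the hexdump selects the byte order) is an assumption about the local iconv; -f UTF-16LE names the encoding explicitly instead.

```shell
# Hypothetical helper: convert a UTF-16 file to UTF-8, then test field 13
# against the pattern passed as the second argument.
match_plan_id() {
  iconv -f UTF-16 -t UTF-8 "$1" |
    awk -F '|' -v CODE="$2" '
      $13 ~ CODE { print $13, "Matched"; next }
                 { print $13, "Nomatch" }
    '
}
```

It would be called as: match_plan_id input.dat '39905|19043'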

How can I solve a problem with a date filter with awk [duplicate]

I want to filter some files by date (I can't use find, because the files are in HDFS). The solution I found is to use awk.
This is an example of data that I want to process
drwxrwx--x+ - hive hive 0 2019-01-01 20:02 /dat1
drwxrwx--x+ - hive hive 0 2019-01-02 16:38 /dat2
drwxrwx--x+ - hive hive 0 2019-01-03 16:59 /dat3
If I use this command:
$ ls -l |awk '$6 > "2019-01-02"'
drwxrwx--x+ - hive hive 0 2019-01-03 16:59 /dat3
That works without any problems, but I want a script that filters relative to 2 days ago, so I tried to put this date expression into the awk command:
$ date +%Y-%m-%d --date='-2 day'
2019-01-02
It is something like this, but isn't working:
ls -l |awk '$6 >" date +%Y-%m-%d --date=\'-2 day\'"'
>
It's like something is missing, but I don't know what it is.
First of all, never try to parse the output of ls.
If you want to get your hands on the files or directories in /path/to/dir/ that are at most n days old (here n = 2), use find:
$ find /path/to/dir -type f -mtime -2 -print
$ find /path/to/dir -type d -mtime -2 -print
The first one is for files, the second for directories.
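If you still want to parse ls with awk, you might try something like this:
$ ls -l | awk -v d="$(date -d "2 days ago" "+%F")" '$6 > d'
The problem you are having is that the date command sits inside the single-quoted awk program, so the shell never runs it and awk only sees a literal string. In addition, \' does not escape a single quote inside single quotes, which is why the shell shows the > continuation prompt.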
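Since the listing in the question comes from HDFS rather than a local directory, the same idea can be wrapped in a small filter that reads the listing on stdin. This is only a sketch: the function name newer_than is made up, and it assumes the date is in field 6 in YYYY-MM-DD form, as in the sample data.

```shell
# Hypothetical helper: keep only the lines whose date field ($6) is later
# than the cutoff given as the first argument (string comparison works
# because YYYY-MM-DD sorts chronologically).
newer_than() {
  awk -v cutoff="$1" '$6 > cutoff'
}
```

With GNU date, the cutoff could be computed at call time, for example: hdfs dfs -ls /some/path | newer_than "$(date -d '2 days ago' +%F)" (the path is a placeholder).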
Parsing the output of ls and manipulating the mod-time of the files is generally not recommended. But if you stick to the yyyymmdd format, the workaround below will help. I use this hack for my daily chores, as it relies on numeric comparisons:
$ ls -l --time-style '+%Y%m%d' delete_5lines.txt jobinfo.txt stan.in sample.dat report.txt
-rw-r--r-- 1 user1234 unixgrp 34 20181231 delete_5lines.txt
-rw-r--r-- 1 user1234 unixgrp 226 20190101 jobinfo.txt
-rw-r--r-- 1 user1234 unixgrp 7120 20190104 report.txt
-rw-r--r-- 1 user1234 unixgrp 70555 20190104 sample.dat
-rw-r--r-- 1 user1234 unixgrp 58 20190103 stan.in
Get files after Jan-3rd
$ ls -l --time-style '+%Y%m%d' delete_5lines.txt jobinfo.txt stan.in sample.dat report.txt | awk ' $6>20190103'
-rw-r--r-- 1 user1234 unixgrp 7120 20190104 report.txt
-rw-r--r-- 1 user1234 unixgrp 70555 20190104 sample.dat
Get files on/after Jan-3rd..
$ ls -l --time-style '+%Y%m%d' delete_5lines.txt jobinfo.txt stan.in sample.dat report.txt | awk ' $6>=20190103'
-rw-r--r-- 1 user1234 unixgrp 7120 20190104 report.txt
-rw-r--r-- 1 user1234 unixgrp 70555 20190104 sample.dat
-rw-r--r-- 1 user1234 unixgrp 58 20190103 stan.in
Exactly Jan-3rd
$ ls -l --time-style '+%Y%m%d' delete_5lines.txt jobinfo.txt stan.in sample.dat report.txt | awk ' $6==20190103'
-rw-r--r-- 1 user1234 unixgrp 58 20190103 stan.in
You can alias it like
$ alias lsdt=" ls -l --time-style '+%Y%m%d' "
and use it like
$ lsdt jobinfo.txt stan.in sample.dat report.txt
Note: Again, you should avoid it if you are going to use it for scripts... just use it for day-to-day chores

Why awk command only processes one time after I use sed command

The first time, I used this command:
svn log -l1000 | grep '#xxxx' -B3 | awk 'BEGIN {FS="\n"; RS=""; OFS=";"} {print $1, $2}'
The output is many lines, but it's not quite what I want,
because there are some blank lines and lines of the form '----'. So I used sed to remove them:
svn log -l1000 | grep '#xxxx' -B3 | sed '/^$/d' | sed '/^--/d' | awk 'BEGIN {FS="\n"; RS=""; OFS=";"} {print $1, $2}'
I checked output of command:
svn log -l1000 | grep '#xxxx' -B3 | sed '/^$/d' | sed '/^--/d'
It looks good. But when awk processes it as input, I only see one line of output.
My input looks like this:
------------------------------------------------------------------------
rxxxx | abc.xyz | 2016-02-01 13:42:21 +0700 (Mon, 01 Feb 2016) | 1 line
refs #kkkk [GolFeature] Fix UI 69
--
------------------------------------------------------------------------
rxxxjy | mnt.abc| 2016-02-01 11:33:45 +0700 (Mon, 01 Feb 2016) | 1 line
refs #kkkk [GoFeature] remove redundant function
--
------------------------------------------------------------------------
rxxyyxx | asdfadf.xy | 2016-02-01 11:02:06 +0700 (Mon, 01 Feb 2016) | 1 line
refs #kkkk Updated ini file
My expected output is:
2016-02-01 11:02:06 +0700 (Mon, 01 Feb 2016), rxxxx, mnt.abc, refs #kkkk Updated ini file ...
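A possible explanation, offered as a sketch rather than a confirmed answer: with RS="" awk treats blank lines as record separators, so once sed has deleted every blank line the whole input becomes a single record and only one line is printed. One way around it is to drop the paragraph mode and pair each header line with the message line that follows it. The function name join_log_entries is made up, the output order (date, revision, author, message) is a guess at what is wanted, and multi-line commit messages are not handled.

```shell
# Hypothetical helper: turn "header line + message line" pairs from
# `svn log | grep -B3` output into one comma-separated line each.
join_log_entries() {
  awk -F ' *[|] *' '
    /^-+$/ || /^$/ { next }              # skip separator and blank lines
    !have {                              # header: revision | author | date | N line
      rev = $1; author = $2; date = $3
      have = 1
      next
    }
    {                                    # first line after the header: the message
      print date ", " rev ", " author ", " $0
      have = 0
    }
  '
}
```

It would replace both sed calls and the original awk: svn log -l1000 | grep '#xxxx' -B3 | join_log_entries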

Using awk to format an output for nstats

I would like to get a complete hostname with their server up-time using "nstats" command. The script appears to be working ok. I need help with the 1st column with a "complete" hostname and the 7th column (server up-time) printed.
This following command only give me their partial hostnames:
for host in $(cat servers.txt); do nstats $host -H | awk 'BEGIN {IFS="\t"} {$2=$3=$4=$5=$6=$9="" ; print}' ; done
BAD Output: (hostnames got cut off after the 12th character)
linux_server 223 days
linux_server 123 days
windows_serv 23 days
windows_serv 23 days
EXPECTED Output:
linux_server1 223 days
linux_server2 123 days
windows_server1 23 days
windows_server2 123 days
The contents of servers.txt file are as follows:
linux_server1
linux_server2
windows_server1
windows_server2
Output without awk
LINXSERVE10% for host in $(cat servers.txt); do nstats $host -H ; done
linux_server 0.01 47% 22% 56 05:08 20 days 17:21:00
linux_server 0.00 23% 8% 45 05:08 24 days 04:16:46
windows_serv 0.04 72% 30% 58 05:09 318 days 23:32:17
windows_serv 0.00 20% 8% 40 05:09 864 days 12:23:10
windows_serv 0.00 51% 17% 41 05:09 442 days 05:30:14
Note: for host in $(cat servers.txt); do nstats $host -H | awk -v server=$host 'BEGIN {IFS="\t"} {$2=$3=$4=$5=$6=$9="" ; print server }' ; done *** this works ok but it will list only a complete hostname with no server uptime.
Any help you can give would be greatly appreciated.
Do you know you may choose the fields to print in awk?
for host in $(cat servers.txt); do
nstats $host -H |
awk 'BEGIN {IFS="\t"} {print $1,$7,$8}';
done
This will print only the three fields you are interested in.
The awk code labeled as a "note" is totally useless -- It is equivalent to
for host in $(cat servers.txt); do
echo "$host"
done
UPDATE: after realizing the problem was the nstats command, the awk line command would be
awk -v server="$host" 'BEGIN {IFS="\t"} {print server,$7,$8}';
Then the output looked like this (the server uptime overwrote the hostnames):
20 daysrver
24 daysrver
318 daysver
864 dayserv
442 dayserv
So I put the server variable at the end. It looked much better, and I can extract that and play with it in Excel. Thanks so much, Jdamian!
for host in $(cat servers.txt); do nstats $host -H | awk -v server="$host" 'BEGIN {IFS="\t"} {print $7,$8,server}'; done
20 days linux_server1
24 days linux_server2
318 days windows_server1
864 days windows_server2
442 days windows_server3
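The overwritten output ("20 daysrver") suggests, though nothing in the thread confirms it, that the hostnames read from servers.txt end with a carriage return (for example because the file has Windows line endings). If so, stripping it lets the hostname stay in the first column. A minimal sketch, with format_uptime as a made-up name; note also that IFS is a shell variable, so setting it inside awk has no effect (awk's own separator variable is FS, and the default whitespace splitting is what is used here).

```shell
# Hypothetical helper: print "hostname uptime" from nstats-style lines on stdin.
# $1 is the full hostname; a trailing carriage return, if present, is removed.
format_uptime() {
  host=$(printf '%s' "$1" | tr -d '\r')
  awk -v server="$host" '{ print server, $7, $8 }'
}
```

The loop could then read: while IFS= read -r host; do nstats "$host" -H | format_uptime "$host"; done < servers.txt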

List pid of process that is under 24 hrs old

Using QNX, I am trying to list the processes that have been running for under 24 hours. I have the following code that lists every process's PID and elapsed running time. I have tried multiple loops to list only the PIDs whose 'etime' is less than 2400, with no success.
ps -eo pid,etime,cmd | sed s/://g
returns
PID ELAPSED CMD
1 4618
2 4618 slogger
4099 4618 pci-bios
4100 4618 io-usb
4101 4618 io-hid
4102 4618 devc-con-hid
4103 4618 devb-eide
204808 4612 inetd
229385 4612 /pclogd
81930 4614 pipe
81931 4614 mqueue
94220 4614 dumper
81933 4614 tinit
94222 4614 io-net
Basically, I need: if [elapsed -lt 2400]; then list pid
ps -eo pid,etime,cmd | sed s/://g |
awk '$2 < 2400 {printf "%-10s %-10s %-20s\n", $2, $1, $3 }'
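One caveat about the threshold: with the colons stripped, 46:18 (minutes:seconds) becomes 4618, so comparing against 2400 does not correspond to 24 hours. A sketch that converts the elapsed time to seconds instead is below. It assumes etime is printed in the usual [[dd-]hh:]mm:ss form, which should be checked against the QNX ps in use; pids_under_24h is a made-up name.

```shell
# Hypothetical helper: read `ps -eo pid,etime,cmd` output on stdin and print
# the PIDs of processes that have been running for less than 24 hours.
pids_under_24h() {
  awk '
    NR == 1 { next }                       # skip the header line
    {
      n = split($2, part, "[-:]")          # pieces of [[dd-]hh:]mm:ss
      secs = part[n] + 60 * part[n-1]
      if (n >= 3) secs += 3600 * part[n-2]
      if (n >= 4) secs += 86400 * part[n-3]
      if (secs < 86400) print $1
    }
  '
}
```

Usage would be: ps -eo pid,etime,cmd | pids_under_24h (without the sed step, since the colons are needed for the conversion).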