Why does awk only process one record after I use sed? - awk

The first time, I used this command:
svn log -l1000 | grep '#xxxx' -B3 | awk 'BEGIN {FS="\n"; RS=""; OFS=";"} {print $1, $2}'
The output has many lines, but it's not quite what I want, because there are some blank lines and lines of the form '----'. So I used sed to remove them:
svn log -l1000 | grep '#xxxx' -B3 | sed '/^$/d' | sed '/^--/d' | awk 'BEGIN {FS="\n"; RS=""; OFS=";"} {print $1, $2}'
I checked the output of:
svn log -l1000 | grep '#xxxx' -B3 | sed '/^$/d' | sed '/^--/d'
it looks good. But when awk processes it as input, I only see one line of output.
My input looks like this:
------------------------------------------------------------------------
rxxxx | abc.xyz | 2016-02-01 13:42:21 +0700 (Mon, 01 Feb 2016) | 1 line
refs #kkkk [GolFeature] Fix UI 69
--
------------------------------------------------------------------------
rxxxjy | mnt.abc| 2016-02-01 11:33:45 +0700 (Mon, 01 Feb 2016) | 1 line
refs #kkkk [GoFeature] remove redundant function
--
------------------------------------------------------------------------
rxxyyxx | asdfadf.xy | 2016-02-01 11:02:06 +0700 (Mon, 01 Feb 2016) | 1 line
refs #kkkk Updated ini file
My expected output is:
2016-02-01 11:02:06 +0700 (Mon, 01 Feb 2016), rxxxx, mnt.abc, refs #kkkk Updated ini file ...
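A minimal reproduction of the problem, with made-up log lines: awk's RS="" is paragraph mode, which splits records on blank lines, so sed '/^$/d' merges the whole input into a single record and awk runs only once. Deleting only the dashed lines, and keeping the blanks, preserves one record per commit:

```shell
# RS="" (paragraph mode) needs the blank lines as record separators.
# Drop only the dashed separator lines; keep the blank lines.
printf '%s\n' '----' 'r1 | alice' 'refs #1 fix UI' '' '----' 'r2 | bob' 'refs #2 cleanup' |
sed '/^--/d' |
awk 'BEGIN {FS="\n"; RS=""; OFS=";"} {print $1, $2}'
# → r1 | alice;refs #1 fix UI
# → r2 | bob;refs #2 cleanup
```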

How can I solve a problem with a date filter in awk

I want to filter some files by date (I can't use find, because the files are in HDFS). The solution I found is to use awk.
This is an example of data that I want to process
drwxrwx--x+ - hive hive 0 2019-01-01 20:02 /dat1
drwxrwx--x+ - hive hive 0 2019-01-02 16:38 /dat2
drwxrwx--x+ - hive hive 0 2019-01-03 16:59 /dat3
If I use this command:
$ ls -l |awk '$6 > "2019-01-02"'
drwxrwx--x+ - hive hive 0 2019-01-03 16:59 /dat3
I don't have any problems. But to make a script that filters relative to two days ago, I need to get this expression into the awk program:
$ date +%Y-%m-%d --date='-2 day'
2019-01-02
I tried something like this, but it isn't working:
ls -l |awk '$6 >" date +%Y-%m-%d --date=\'-2 day\'"'
>
It's like something is missing, but I don't know what it is.
First of all, never try to parse the output of ls.
If you want to get your hands on the files/directories under /path/to/dir that are at most n days old:
$ find /path/to/dir -type f -mtime -2 -print
$ find /path/to/dir -type d -mtime -2 -print
The first one is for files, the second for directories.
If you still want to parse ls with awk, you might try something like this:
$ ls -l | awk -v d=$(date -d "2 days ago" "+%F") '$6 > d'
The problem you are having is the quoting: you cannot escape a single quote inside a single-quoted string, so the shell's quoting becomes unbalanced (hence the > continuation prompt), and even with correct quoting, awk would treat the quoted string as a literal rather than run your date command. Pass the value in from the shell instead.
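Spelled out as a runnable sketch (the hardcoded listing lines stand in for real ls -l or hdfs dfs -ls output; GNU date assumed): compute the cutoff in the shell, hand it to awk with -v, and compare; ISO yyyy-mm-dd dates sort correctly as plain strings:

```shell
# Compute the cutoff outside awk; -v passes it in without any quote nesting.
cutoff=$(date -d '2019-01-04 -2 day' +%F)    # 2019-01-02 (fixed base date for the demo)
printf '%s\n' \
  'drwxrwx--x+ - hive hive 0 2019-01-01 20:02 /dat1' \
  'drwxrwx--x+ - hive hive 0 2019-01-03 16:59 /dat3' |
awk -v d="$cutoff" '$6 > d'
# → drwxrwx--x+ - hive hive 0 2019-01-03 16:59 /dat3
```

In the real script you would drop the fixed base date and use date --date='-2 day' +%F.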
Parsing the output of ls and manipulating file modification times is generally not recommended. But if you stick to the yyyymmdd format, the workaround below will help. I use this hack for my daily chores because it allows plain numeric comparisons.
$ ls -l --time-style '+%Y%m%d' delete_5lines.txt jobinfo.txt stan.in sample.dat report.txt
-rw-r--r-- 1 user1234 unixgrp 34 20181231 delete_5lines.txt
-rw-r--r-- 1 user1234 unixgrp 226 20190101 jobinfo.txt
-rw-r--r-- 1 user1234 unixgrp 7120 20190104 report.txt
-rw-r--r-- 1 user1234 unixgrp 70555 20190104 sample.dat
-rw-r--r-- 1 user1234 unixgrp 58 20190103 stan.in
Get files after Jan 3rd:
$ ls -l --time-style '+%Y%m%d' delete_5lines.txt jobinfo.txt stan.in sample.dat report.txt | awk ' $6>20190103'
-rw-r--r-- 1 user1234 unixgrp 7120 20190104 report.txt
-rw-r--r-- 1 user1234 unixgrp 70555 20190104 sample.dat
Get files on/after Jan 3rd:
$ ls -l --time-style '+%Y%m%d' delete_5lines.txt jobinfo.txt stan.in sample.dat report.txt | awk ' $6>=20190103'
-rw-r--r-- 1 user1234 unixgrp 7120 20190104 report.txt
-rw-r--r-- 1 user1234 unixgrp 70555 20190104 sample.dat
-rw-r--r-- 1 user1234 unixgrp 58 20190103 stan.in
Exactly Jan 3rd:
$ ls -l --time-style '+%Y%m%d' delete_5lines.txt jobinfo.txt stan.in sample.dat report.txt | awk ' $6==20190103'
-rw-r--r-- 1 user1234 unixgrp 58 20190103 stan.in
You can alias it like
$ alias lsdt=" ls -l --time-style '+%Y%m%d' "
and use it like
$ lsdt jobinfo.txt stan.in sample.dat report.txt
Note: Again, you should avoid it if you are going to use it for scripts... just use it for day-to-day chores

PowerShell - Parse through comma delimited text file and insert values into SQL table

I have a text file that contains file names, file sizes, and created dates for before (.txt) and after (.txt.Z) compression. The data is separated by commas and looks like this:
Note: The File names below are not the actual file names. I will be receiving this type of file weekly, so each week the files would be different names.
File1.txt,1449124525,Jul 09 01:13
File2.txt,2601249364,Jul 09 01:30
File3.txt,18105630,Jul 09 01:01
File4.txt,732235442,Jul 09 01:17
File1.txt.Z,130652147,Jul 09 01:13
File2.txt.Z,217984273,Jul 09 01:30
File3.txt.Z,2320129,Jul 09 01:01
File4.txt.Z,61196011,Jul 09 01:17
etc...
Currently, the code that I have inserts the first row into SQL 44 times (there are 22 total file names, so 44 total with before and after compression).
$file = Get-Content "MY_FILE.txt"
$line = $null
foreach ($line in $file)
{
#Split fields into values
$line = $file -split (",")
$FileName = $line[0]
$FileSize = $line[1]
$FileDate = $line[2]
#Format Date Field
$DateString = $FileDate
$DateFormat = "MMM dd HH:mm"
$Culture = $(New-Object System.Globalization.CultureInfo -ArgumentList "en-US")
$DateString = $DateString -replace "\s+"," "
$NewDate = [Datetime]::ParseExact($DateString, $DateFormat, $Culture)
$FileDate = Get-Date $NewDate -Format "yyyy-MM-dd HH:mm:ss"
#SQL Connection Info
$Connection = New-Object System.Data.SQLClient.SQLConnection
$Connection.ConnectionString = "server='MY_SERVER';database='MY_DATABASE';trusted_connection=true;"
$Connection.Open()
$Command = New-Object System.Data.SQLClient.SQLCommand
$Command.Connection = $Connection
#Insert into SQL
$sql = "INSERT INTO [MY_DATABASE].[dbo].[MY_TABLE] ([FileName],[FileSize],[FileDate]) VALUES ('" + $FileName + "'," + $FileSize + ",'" + $FileDate + "')"
$Command.CommandText = $sql
$Command.ExecuteReader()
}
$Connection.Close()
Another tricky thing I would love to do is load each file, with its corresponding size and date, into the same row for before and after compression, but I cannot see how to get there. The part above is more important, though. Anyway, I would want it to look like this in SQL:
| InFileName | InFileSize | InFileDate | OutFileName | OutFileSize | OutFileDate |
-------------------------------------------------------------------------------------
| File1.txt | 1449124525 | Jul 09 01:13 | File1.txt.Z | 130652147 | Jul 09 01:13 |
| File2.txt | 2601249364 | Jul 09 01:30 | File2.txt.Z | 217984273 | Jul 09 01:30 |
| File3.txt | 18105630 | Jul 09 01:01 | File3.txt.Z | 2320129 | Jul 09 01:01 |
| File4.txt | 732235442 | Jul 09 01:17 | File4.txt.Z | 61196011 | Jul 09 01:17 |
Thanks!
I suggest you process the data first, before importing it into SQL Server. Since your data is comma-separated values, use Import-Csv.
As an addition to that, you can specify column names when importing, so I've added empty columns for the compressed files.
Then loop over the rows and merge in the compressed files - note that their names come from the "InFileName" column and get moved to the "OutFileName" column. It's not particularly efficient: for every file without a .Z ending, it loops through all the rows to find the corresponding .Z file.
$fileHeaders = 'InFileName','InFileSize','InFileDate','OutFileName','OutFileSize','OutFileDate'
$inData = Import-Csv D:\f.txt -Header $fileHeaders
$outData = foreach ($row in $inData) {
if ($row.InFileName -notmatch '\.z$') {
$outFile = $inData | Where {$_.InFileName -match "$($row.InFileName).."}
$row.OutFileName = $outFile.InFileName
$row.OutFileSize = $outFile.InFileSize
$row.OutFileDate = $outFile.InFileDate
$row
}
}
e.g. after that:
$outData | ft -AutoSize
InFileName InFileSize InFileDate OutFileName OutFileSize OutFileDate
---------- ---------- ---------- ----------- ----------- -----------
File1.txt 1449124525 Jul 09 01:13 File1.txt.Z 130652147 Jul 09 01:13
File2.txt 2601249364 Jul 09 01:30 File2.txt.Z 217984273 Jul 09 01:30
File3.txt 18105630 Jul 09 01:01 File3.txt.Z 2320129 Jul 09 01:01
File4.txt 732235442 Jul 09 01:17 File4.txt.Z 61196011 Jul 09 01:17
Then loop over $outData and you'll have to change your SQL inserts and so on to handle the 6 fields. You'll still need all the date parsing / field processing code as well, which I left out completely.
It looks like in the first line of your foreach loop you are calling -split on the entire $file array rather than on the $line you are working with. It should work if you swap $line = $file -split "," for $line = $line -split ",". You may also want to use a different variable name rather than reassigning the loop variable.

grep and tail -f for a UTF-16 binary file - trying to use simple awk

How can I achieve the equivalent of:
tail -f file.txt | grep 'regexp'
so that only the lines matching a regular expression such as 'Result' are output, for a file of this type:
$ file file.txt
file.txt:Little-endian UTF-16 Unicode text, with CRLF line terminators
Example of the tail -f stream content, converted to UTF-8:
Package end.
Total warnings: 40
Total errors: 0
Elapsed time: 24.4267192 secs.
...Package Executed.
Result: Success
Awk?
The problems with piping to grep led me to awk as a one-stop-shop solution for stripping the offending characters and printing the lines matched by the regex.
awk gives the most promising results; however, I find that it returns the whole stream rather than just the matching lines:
tail -f file.txt | awk '{sub("/[^\x20-\x7F]/", "");/Result/;print}'
Package end.
Total warnings: 40
Total errors: 0
Elapsed time: 24.4267192 secs.
...Package Executed.
Result: Success
What I have tried
converting the stream and piping to grep
tail -f file.txt | iconv -t UTF-8 | grep 'regexp'
using luit to change terminal encoding as per this post
luit -encoding UTF-8 -- tail -f file.txt | grep 'regexp'
deleting non-ASCII characters, as described here, then piping to grep
tail -f file.txt | tr -d '[^\x20-\x7F]' | grep 'regexp'
tail -f file.txt | sed 's/[^\x00-\x7F]//' | grep 'regexp'
various combinations of the above using grep flags --line-buffered, -a as well as sed -u
using luit -encoding UTF-8 -- pre-pended to the above
using a file with the same encoding containing the regular expression for grep -f
Why they failed
In most attempts, nothing at all is printed, because grep searches for 'regexp' when the text is actually something like '\x00r\x00e\x00g\x00e\x00x\x00p' - for example, 'R' will return the line 'Result: Success' but 'Result' won't
If a full regular expression does match, as when using grep -f, the whole stream is returned rather than just the matching lines
Piping through sed, tr or iconv seems to break the pipe to grep, which still only matches individual characters
Edit
I looked at the raw file in its UTF-16 form using xxd, with the aim of writing a regex to match the encoding, which gave the following output:
$ tail file.txt | xxd
00000000: 0050 0061 0063 006b 0061 0067 0065 0020 .P.a.c.k.a.g.e.
00000010: 0065 006e 0064 002e 000d 000a 000d 000a .e.n.d..........
00000020: 0054 006f 0074 0061 006c 0020 0077 0061 .T.o.t.a.l. .w.a
00000030: 0072 006e 0069 006e 0067 0073 003a 0020 .r.n.i.n.g.s.:.
00000040: 0034 0030 000d 000a 0054 006f 0074 0061 .4.0.....T.o.t.a
00000050: 006c 0020 0065 0072 0072 006f 0072 0073 .l. .e.r.r.o.r.s
00000060: 003a 0020 0030 000d 000a 0045 006c 0061 .:. .0.....E.l.a
00000070: 0070 0073 0065 0064 0020 0074 0069 006d .p.s.e.d. .t.i.m
00000080: 0065 003a 0020 0032 0034 002e 0034 0032 .e.:. .2.4...4.2
00000090: 0036 0037 0031 0039 0032 0020 0073 0065 .6.7.1.9.2. .s.e
000000a0: 0063 0073 002e 000d 000a 002e 002e 002e .c.s............
000000b0: 0050 0061 0063 006b 0061 0067 0065 0020 .P.a.c.k.a.g.e.
000000c0: 0045 0078 0065 0063 0075 0074 0065 0064 .E.x.e.c.u.t.e.d
000000d0: 002e 000d 000a 000d 000a 0052 0065 0073 ...........R.e.s
000000e0: 0075 006c 0074 003a 0020 0053 0075 0063 .u.l.t.:. .S.u.c
000000f0: 0063 0065 0073 0073 000d 000a 000d 000a .c.e.s.s........
00000100: 00
The sloppiest solution that should work on Cygwin is fixing your awk statement:
tail -f file.txt | \
LC_CTYPE=C awk '{ gsub("[^[:print:]]", ""); if($0 ~ /Result/) print; }'
This has a few bugs that cancel each other out, like tail cutting a UTF-16LE file in awkward places but awk stripping what we hope is garbage.
A robust solution might be:
tail -c +1 -f file.txt | \
script -qc 'iconv -f UTF-16LE -t UTF-8' /dev/null | grep Result
but it reads the entire file and I don't know how well Cygwin works with using script to convince iconv not to buffer (it would work on GNU/Linux).
I realised that a simple regex allowing an arbitrary character between the letters of the search string might work.
This matches 'Result' whilst permitting any one character between each letter:
$ tail -f file.txt | grep -a 'R.e.s.u.l.t'
Result: Success
$ tail -f file.txt | awk '/R.e.s.u.l.t./'
Result: Success
Or, as per this answer, to avoid typing all the tedious dots:
search="Result"
tail -f file.txt | grep -a -e "$(echo "$search" | sed 's/./&./g')"
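The trick can be reproduced without the original file by generating a UTF-16LE stream with iconv (GNU grep assumed; -a forces text mode, and each . in the pattern soaks up one of the interleaved NUL bytes):

```shell
# Encode sample text as UTF-16LE, then count the lines that match
# through the NUL bytes. A plain 'Result' pattern would find nothing,
# because NULs sit between the letters.
printf 'Result: Success\nOther line\n' |
iconv -f UTF-8 -t UTF-16LE |
grep -ac 'R.e.s.u.l.t'
# → 1
```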
You can use ripgrep instead, which handles UTF-16 nicely without you having to convert the input:
tail -f file.txt | rg regexp

Using awk to format an output for nstats

I would like to get each complete hostname together with its server uptime using the "nstats" command. The script mostly works; I need help getting the 1st column printed as the complete hostname along with the 7th column (server uptime).
The following command only gives me partial hostnames:
for host in $(cat servers.txt); do nstats $host -H | awk 'BEGIN {IFS="\t"} {$2=$3=$4=$5=$6=$9="" ; print}' ; done
BAD output (hostnames get cut off after the 12th character):
linux_server 223 days
linux_server 123 days
windows_serv 23 days
windows_serv 23 days
EXPECTED Output:
linux_server1 223 days
linux_server2 123 days
windows_server1 23 days
windows_server2 123 days
The contents of servers.txt file are as follows:
linux_server1
linux_server2
windows_server1
windows_server2
Output without awk
LINXSERVE10% for host in $(cat servers.txt); do nstats $host -H ; done
linux_server 0.01 47% 22% 56 05:08 20 days 17:21:00
linux_server 0.00 23% 8% 45 05:08 24 days 04:16:46
windows_serv 0.04 72% 30% 58 05:09 318 days 23:32:17
windows_serv 0.00 20% 8% 40 05:09 864 days 12:23:10
windows_serv 0.00 51% 17% 41 05:09 442 days 05:30:14
Note: for host in $(cat servers.txt); do nstats $host -H | awk -v server=$host 'BEGIN {IFS="\t"} {$2=$3=$4=$5=$6=$9="" ; print server }' ; done *** this works, but it lists only the complete hostname with no server uptime.
Any help you can give would be greatly appreciated.
Did you know you can choose which fields to print in awk? (Note that IFS means nothing to awk; its field separator variable is FS, and the default splitting on runs of whitespace is what you want here.)
for host in $(cat servers.txt); do
nstats $host -H |
awk '{print $1,$7,$8}';
done
This prints only the three fields you are interested in.
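A self-contained illustration with a made-up nstats-style line (the uptime spans $7 and $8):

```shell
# Print hostname plus uptime; note that if nstats itself truncates $1,
# awk cannot restore it, hence the -v server="$host" workaround.
echo 'linux_server1 0.01 47% 22% 56 05:08 20 days 17:21:00' |
awk '{print $1, $7, $8}'
# → linux_server1 20 days
```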
The awk code labeled as a "note" is useless: it is equivalent to
for host in $(cat servers.txt); do
echo "$host"
done
UPDATE: after realizing the problem was the nstats command itself truncating hostnames, the awk command becomes
awk -v server="$host" 'BEGIN {IFS="\t"} {print server,$7,$8}';
Then the output looked like this (the server uptime overwrote the hostnames, likely due to carriage returns in the nstats output):
20 daysrver
24 daysrver
318 daysver
864 dayserv
442 dayserv
So I put the server variable at the end; it looked much better, and I can extract that and play with it in Excel. Thanks so much, Jdamian!
for host in $(cat servers.txt); do nstats $host -H | awk -v server="$host" 'BEGIN {IFS="\t"} {print $7,$8,server}'; done
20 days linux_server1
24 days linux_server2
318 days windows_server1
864 days windows_server2
442 days windows_server3

Read serial input with awk, insert date

I'm trying to reformat serial input, which consists of two integers separated by a comma (sent from an Arduino):
1,2
3,4
0,0
0,1
etc. I would like to append the date after each line, separating everything with a tab character. Here's my code so far:
cat /dev/cu.usbmodem3d11 | awk 'BEGIN {RS=","};{printf "%i\t%i\t%s",$1,$2,system("date")}'
Here's the result I get (with date in my locale):
1 2 0Mer 26 fév 2014 22:09:20 EST
3 4 0Mer 26 fév 2014 22:09:20 EST
0 0 0Mer 26 fév 2014 22:09:20 EST
0 1 0Mer 26 fév 2014 22:09:20 EST
Why is there an extra '0' in front of my date field? Sorry for the newbie question :(
EDIT This code solved my problem. Thanks to all who helped.
awk 'BEGIN {FS=","};{system("date")|getline myDate;printf "%i\t%i\t%s",$1, $2, myDate}' /dev/cu.usbmodem3d11
I'm not clear why, but in order for the date to keep updating to record when the data was received, I have to use system("date") instead of just "date" in the code above.
Two things.
It will be easier to see your problem if you add a \n at the end of your printf string.
Then the output is
>echo '1,2' | awk 'BEGIN {RS=","};{printf "%i\t%i\t%s\n",$1,$2,system("date")}'
Wed Feb 26 21:30:17 CST 2014
1 0 0
Wed Feb 26 21:30:17 CST 2014
2 0 0
I'm guessing that system("date") sends its output directly to the terminal, outside awk's control, while what printf actually receives is the command's exit status, 0. Others may be able to offer a better explanation.
To get the output you want, I'm using the getline function to capture the output of the date command to a variable (myDt). Now the output is
> echo '1,2' | awk 'BEGIN {RS=","};{"date" | getline myDt ; printf "%i\t%i\t%s\n",$1,$2,myDt}'
1 0 Wed Feb 26 21:31:15 CST 2014
2 0 Wed Feb 26 21:31:15 CST 2014
Finally, we remove the "debugging" \n char, and get the output you specify:
> echo '1,2' | awk 'BEGIN {RS=","};{"date" | getline myDt ; printf "%i\t%i\t%s",$1,$2,myDt}'
1 0 Wed Feb 26 21:34:56 CST 2014
2 0 Wed Feb 26 21:34:56 CST 2014
And, per Jaypal's post, I see now that using RS instead of FS was the other issue, so when we make that change AND restore the \n char, we have
echo '1,2' | awk 'BEGIN {FS=","};{"date" | getline myDt ; printf "%i\t%i\t%s\n",$1,$2,myDt}'
1 2 Wed Feb 26 21:44:42 CST 2014
Two issues:
First - RS is record separator. You need FS which is Field Separator to separate two columns, where $1 will be 1 and $2 will be 2 (as per your first row)
Second - The extra 0 you see in the output is the return value of the system() command; it means the command ran successfully. Instead, you can run the shell command in quotes and pipe it to getline, supplying a variable to capture the command's output.
Try this:
awk 'BEGIN {FS=","};{"date"|getline var;printf "%i\t%i\t%s\n",$1,$2,var}'
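The contrast between the two, as a sketch using a fixed date format so the output is predictable: system("cmd") lets the command print straight to the terminal and only returns its exit status, while "cmd" | getline var captures the first line of the command's output:

```shell
# getline reads the command's stdout into y; nothing leaks past printf.
echo '1,2' |
awk 'BEGIN {FS=","} {"date +%Y" | getline y; printf "%i\t%i\t%s\n", $1, $2, y}'
```

One caveat: awk keeps the pipe open, so a second "date +%Y" | getline y in the same run reads nothing new; call close("date +%Y") between reads if you need a fresh timestamp per record, which is likely why the date appeared frozen without system().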
This is a simpler solution:
awk -F, '{print $1,$2,dt}' dt="$(date)" OFS="\t" /dev/cu.usbmodem3d11
1 2 Thu Feb 27 06:23:41 CET 2014
3 4 Thu Feb 27 06:23:41 CET 2014
0 0 Thu Feb 27 06:23:41 CET 2014
0 1 Thu Feb 27 06:23:41 CET 2014
If you'd like the date in another format, just read the manual for date.
E.g. dt="$(date +%D)" gives 02/27/14