I'm trying to reformat serial input, which consists of two integers separated by a comma (sent from an Arduino):
1,2
3,4
0,0
0,1
etc. I would like to append the date after each line, separating everything with a tab character. Here's my code so far:
cat /dev/cu.usbmodem3d11 | awk 'BEGIN {RS=","};{printf "%i\t%i\t%s",$1,$2,system("date")}'
Here's the result I get (with date in my locale):
1 2 0Mer 26 fév 2014 22:09:20 EST
3 4 0Mer 26 fév 2014 22:09:20 EST
0 0 0Mer 26 fév 2014 22:09:20 EST
0 1 0Mer 26 fév 2014 22:09:20 EST
Why is there an extra '0' in front of my date field? Sorry for the newbie question :(
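EDIT: This code solved my problem. Thanks to all who helped.
awk 'BEGIN {FS=","} {"date" | getline myDate; close("date"); printf "%i\t%i\t%s\n", $1, $2, myDate}' /dev/cu.usbmodem3d11
The close("date") call is what keeps the date updating, so each line records the time it was received. Without it, the "date" pipe stays open after the first read. Every later getline then hits end-of-file, and myDate keeps its first value. Using system("date") | getline myDate instead only appears to fix this. system() prints the date straight to standard output and returns 0, so getline then tries to run a command named "0".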
Two things.
First, it is easier to see your problem if you add a \n at the end of your printf string.
Then the output is:
>echo '1,2' | awk 'BEGIN {RS=","};{printf "%i\t%i\t%s\n",$1,$2,system("date")}'
Wed Feb 26 21:30:17 CST 2014
1 0 0
Wed Feb 26 21:30:17 CST 2014
2 0 0
The date lands on its own line because system("date") does not capture the command's output. date writes straight to standard output as soon as system() runs, before printf prints anything. system() itself returns date's exit status, 0, which is what the trailing %s prints.
To get the output you want, I'm using the getline function to capture the output of the date command in a variable (myDt). Now the output is:
> echo '1,2' | awk 'BEGIN {RS=","};{"date" | getline myDt ; printf "%i\t%i\t%s\n",$1,$2,myDt}'
1 0 Wed Feb 26 21:31:15 CST 2014
2 0 Wed Feb 26 21:31:15 CST 2014
Finally, we remove the "debugging" \n char, and get the output you specify:
> echo '1,2' | awk 'BEGIN {RS=","};{"date" | getline myDt ; printf "%i\t%i\t%s",$1,$2,myDt}'
1 0 Wed Feb 26 21:34:56 CST 2014
2 0 Wed Feb 26 21:34:56 CST 2014
And, per Jaypal's post, I see now that RS="," is another issue (it should be FS=","). When we make that change AND restore the \n char, we get:
echo '1,2' | awk 'BEGIN {FS=","};{"date" | getline myDt ; printf "%i\t%i\t%s\n",$1,$2,myDt}'
1 2 Wed Feb 26 21:44:42 CST 2014
Two issues:
First: RS is the record separator. You need FS, the field separator, to split the two columns so that $1 is 1 and $2 is 2 (as in your first row).
Second: the extra 0 you see in the output is the return value of the system() call, which is date's exit status; 0 means it ran successfully. Instead, put the shell command in quotes and pipe it to getline. Putting a variable after getline stores the line the command prints in that variable.
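Try this. The close("date") call makes awk run date again for every line. Without it, every getline after the first hits end-of-file and the date never changes:
awk 'BEGIN {FS=","} {"date" | getline var; close("date"); printf "%i\t%i\t%s\n", $1, $2, var}'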
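Below is a minimal sketch of the per-line timestamp as a reusable shell function. The name stamp_lines is hypothetical and does not come from the question. The sketch assumes comma-separated input on standard input and relies on close(). A "cmd" | getline pipe stays open after the first read, so without closing it every later getline hits end-of-file and the variable keeps its old value.

```shell
#!/usr/bin/env bash
# stamp_lines (hypothetical name): read "a,b" lines on stdin and print
# "a<TAB>b<TAB><output of date>" for each one.
stamp_lines() {
  awk 'BEGIN { FS = ","; cmd = "date" }
  {
    cmd | getline now   # read one line of output from date
    close(cmd)          # close the pipe so the next record runs date again
    printf "%d\t%d\t%s\n", $1, $2, now
  }'
}
# Usage with the device from the question:
#   stamp_lines < /dev/cu.usbmodem3d11
```

Running date once per record costs one process per line. That is the price of getting a fresh timestamp for each line rather than one timestamp for the whole run.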
This is a simpler solution, if a single timestamp (taken once, when the command starts) is good enough for every line:
awk -F, '{print $1,$2,dt}' dt="$(date)" OFS="\t" /dev/cu.usbmodem3d11
1 2 Thu Feb 27 06:23:41 CET 2014
3 4 Thu Feb 27 06:23:41 CET 2014
0 0 Thu Feb 27 06:23:41 CET 2014
0 1 Thu Feb 27 06:23:41 CET 2014
If you want the date in another format, see the manual for date.
For example, dt="$(date +%D)" gives 02/27/14.
Here's what my data looks like:
date           sku  inventory_added  demand
22nd Nov 2021  XYZ  70               18
23rd Nov 2021  XYZ  0                18
24th Nov 2021  XYZ  0                50
25th Nov 2021  XYZ  0                15
26th Nov 2021  XYZ  80               30
27th Nov 2021  XYZ  0                20
28th Nov 2021  XYZ  0                15
29th Nov 2021  XYZ  0                20
30th Nov 2021  XYZ  0                10
1st Dec 2021   XYZ  100              40
2nd Dec 2021   XYZ  0                10
I want to create a new column named solution using BigQuery SQL. For the 1st row (22nd Nov 2021), the formula is inventory_added - demand.
That gives 70 - 18 = 52 as the 1st row's value for solution.
What I am not able to do is the calculation from the 2nd row onward:
The next value will be 52 (remaining inventory from the previous day) + 0 (inventory_added on 23rd Nov 2021) - 18 (demand on 23rd Nov 2021), which equals 34.
Similarly, for the next row (24th Nov 2021):
the value in solution will be 34 + 0 - 50 = -16. Since it is negative, it should be set to 0.
I tried MAX(solutions, 0).
The result should look like this:
date           sku  inventory_added  demand  solution
22nd Nov 2021  XYZ  70               18      52
23rd Nov 2021  XYZ  0                18      34
24th Nov 2021  XYZ  0                50      0
25th Nov 2021  XYZ  0                15      0
26th Nov 2021  XYZ  80               30      50
27th Nov 2021  XYZ  0                20      30
28th Nov 2021  XYZ  0                15      15
29th Nov 2021  XYZ  0                20      0
30th Nov 2021  XYZ  0                10      0
1st Dec 2021   XYZ  100              40      60
2nd Dec 2021   XYZ  0                10      50
I am not sure if this can be accomplished by BigQuery, but all suggestions are welcome.
Thanks!
Without the condition "if it is negative, it should be set to 0", you could use the window (in BigQuery terms, analytic) variant of the SUM() function:
SELECT *,
SUM(inventory_added - demand) OVER (PARTITION BY sku ORDER BY date) AS solution
FROM source_table
With this condition the calculation becomes iterative, and you must use a recursive CTE (if available in BigQuery) or an iterative stored procedure.
I see that recursive CTEs are not available in BigQuery... Can you provide some pseudocode as a starting point for a stored procedure? – Shantanu Jain
CREATE PROCEDURE procname()
BEGIN
CREATE temptable;
OPEN CURSOR FOR SELECT * FROM datatable ORDER BY date;
SET #solution = 0;
FETCH CURSOR INTO #date, #sku, #inventory_added, #demand;
LOOP
SET #solution = GREATEST(#solution + #inventory_added - #demand, 0);
INSERT INTO temptable VALUES (#date, #sku, #inventory_added, #demand, #solution);
FETCH CURSOR INTO #date, #sku, #inventory_added, #demand;
UNTIL NO_ROWS_IN_CURSOR END LOOP;
SELECT * FROM temptable;
DROP temptable;
END
As an option, consider using the recently introduced FOR...IN loop:
declare result int64;
declare prev_sku string;
create temp table results as (select *, 0 as solution from your_table where false);
set (result, prev_sku) = (0, '');
for record in (select *, parse_date('%d %B %Y', regexp_replace(date, r'(\d*)(\w*)( \w{3} \d{4})', r'\1 \3')) dt from your_table order by sku, dt) do
if record.sku != prev_sku then set result = 0; end if;
set result = result + record.inventory_added - record.demand;
if result < 0 then set result = 0; end if;
insert into results values(record.date, record.sku, record.inventory_added, record.demand, result);
set prev_sku = record.sku;
end for;
select * from results
order by sku, parse_date('%d %B %Y', regexp_replace(date, r'(\d*)(\w*)( \w{3} \d{4})', r'\1 \3'));
Applied to the sample data in your question, this produces the expected output.
Note: while it delivers the expected result, this is obviously going to be extremely slow (as is any cursor-based solution). So it is fine for learning, but I don't think it is appropriate for real production use.
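This is not BigQuery. It is a quick way to check the arithmetic of the expected solution column outside the database. It computes the same clamped running total as the GREATEST(..., 0) step in the pseudocode, reset for each sku and floored at 0 after every row, written as an awk filter. The function name clamped_total is hypothetical. It assumes input lines of sku, inventory_added and demand separated by whitespace, already sorted by sku and date.

```shell
#!/usr/bin/env bash
# clamped_total (hypothetical name): for each "sku added demand" line, append
# the running remainder, restarting at 0 for a new sku and never going below 0.
clamped_total() {
  awk '{
    if ($1 != prev_sku) total = 0   # new sku: start again from 0
    total += $2 - $3                # previous remainder + added - demand
    if (total < 0) total = 0        # negative stock is clamped to 0
    print $0, total
    prev_sku = $1
  }'
}
```

Feeding it the eleven XYZ rows from the question should reproduce the solution column 52, 34, 0, 0, 50, 30, 15, 0, 0, 60, 50.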
I am trying to append some text to the /var/log/messages output whenever two consecutive log lines fall into different 10-second groups, such as:
previous log: 00:01:59 and current log: 00:02:00
or
previous log: 00:01:49 and current log: 00:01:50
If the timestamp up to the tens digit of the seconds (HH:MM:S) differs between consecutive log lines, append a message to $0.
You can run the command below. It works for 1-minute groups, but I need it for 10-second groups.
tail -f /var/log/messages |awk '{split($3,a,":");split($3,b,"");new_time=a[1]":"a[2]":"b[1]; if(prev_time==new_time) print $0; else print "10 Second group is over, starting new: "$0" "prev_time " "new_time } {split($3,a,":");split($3,b,"");prev_time=a[1]":"a[2]":"b[1]}'
What I need is a change to the command above so that it prints the same message for 10-second groups of log lines; currently it does so for 1-minute groups. I used split() to capture "HH:MM:S" rather than "HH:MM:SS". Whenever the previous "HH:MM:S" and the current "HH:MM:S" differ, it should print the message "10 Second group is over, starting new: $0". I'm not sure what the mistake is.
In short, it currently triggers when the minute changes. I need it to trigger when the seconds go from 39 to 40 or from 09 to 10, NOT from 11 to 12. In HH:MM:SS, the first S of the seconds (the tens digit) is the one that has to change.
Sample lines:
Jan 23 15:09:54 foo bar
Jan 23 15:10:04 bla bla
This is the general idea:
$ for((i=35;i<45;i++)); do echo Jan 23 00:01:$i; done |
awk '{split($3,a,":"); print $0, (p!=(c=int(a[3]/10))?"<<<":""); p=c}'
Jan 23 00:01:35 <<<
Jan 23 00:01:36
Jan 23 00:01:37
Jan 23 00:01:38
Jan 23 00:01:39
Jan 23 00:01:40 <<<
Jan 23 00:01:41
Jan 23 00:01:42
Jan 23 00:01:43
Jan 23 00:01:44
The first part generates test data for the script, since you didn't provide enough. There is a spurious match on the first line, which can be eliminated with an NR>1 condition, but I don't think that's critical.
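Below is a sketch that applies the idea to the line format and message in the question. The function name mark_10s_groups is hypothetical. It compares HH:MM plus the tens digit of the seconds, so two lines exactly one minute apart are not treated as the same group. It also adds the NR>1 guard. The question's version changes only once a minute because split($3,b,"") with an empty separator splits the whole HH:MM:SS string into single characters (as gawk does). That makes b[1] the first digit of the hour, not of the seconds.

```shell
#!/usr/bin/env bash
# mark_10s_groups (hypothetical name): read syslog-style lines
# ("Mon DD HH:MM:SS ...") on stdin and prefix a message to each line that
# starts a new 10-second group.
mark_10s_groups() {
  awk '{
    split($3, t, ":")                        # t[1]=HH, t[2]=MM, t[3]=SS
    key = t[1] ":" t[2] ":" int(t[3] / 10)   # HH:MM plus tens digit of SS
    if (NR > 1 && key != prev)
      print "10 Second group is over, starting new: " $0
    else
      print
    prev = key
  }'
}
# Usage: tail -f /var/log/messages | mark_10s_groups
```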
I have a file in the following format:
02 Jul 2016 00:00:00 2736359
02 Jul 2016 00:02:00 2736594
02 Jul 2016 00:04:00 2736828
02 Jul 2016 00:06:00 2737068
02 Jul 2016 00:08:00 2737303
02 Jul 2016 00:10:00 2737542
02 Jul 2016 00:12:00 2737775
02 Jul 2016 00:14:00 2738011
02 Jul 2016 00:16:00 2738251
02 Jul 2016 00:18:00 2738483
The first column is the timestamp and the second is a number. Given the input 2737778, I want the output to be "02 Jul 2016 00:12:00 and 02 Jul 2016 00:14:00", since 2737778 falls between 2737775 and 2738011. Can I do this in awk? Is it possible to compare a number in the current line with the next line?
Another similar awk approach:
awk -v n=2737778 'n<=$NF{if(p) print p; print; exit} {p=$0}' file
02 Jul 2016 00:12:00 2737775
02 Jul 2016 00:14:00 2738011
Yes, it is possible to read ahead in awk (see "Peek at next line, but don't consume it"). Here is another way to do it:
awk '{ if (NR == 1) { save = $0; prtsw = 1 }
else if (prtsw == 1 && 2737778 < $5) {
print save
print $0
prtsw = 0
}
else { save = $0 }
}' abetween.txt
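If you want the exact output format from the question ("<timestamp> and <timestamp>"), a variation of the -v approach could build each timestamp from the first four fields. The function name between and its argument order are hypothetical. It assumes the number is always the last field and that the file is sorted by that number.

```shell
#!/usr/bin/env bash
# between N [FILE] (hypothetical name): print "<timestamp> and <timestamp>" for
# the two consecutive lines whose last field brackets N. Reads stdin if no FILE.
between() {
  awk -v n="$1" '
    n <= $NF {                              # first line whose number reaches n
      ts = $1 " " $2 " " $3 " " $4          # timestamp = first four fields
      if (prev != "") print prev " and " ts
      exit
    }
    { prev = $1 " " $2 " " $3 " " $4 }      # remember the previous timestamp
  ' "${2:--}"
}
# Usage: between 2737778 file
```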
At first, I used this command:
svn log -l1000 | grep '#xxxx' -B3 | awk 'BEGIN {FS="\n"; RS=""; OFS=";"} {print $1, $2}'
The output was many lines, but not quite what I want,
because there are some blank lines and lines of the form '----'. So I used sed to remove them:
svn log -l1000 | grep '#xxxx' -B3 | sed '/^$/d' | sed '/^--/d' | awk 'BEGIN {FS="\n"; RS=""; OFS=";"} {print $1, $2}'
I checked the output of this command:
svn log -l1000 | grep '#xxxx' -B3 | sed '/^$/d' | sed '/^--/d'
and it looks good. But when awk processes it, I get only one line of output.
My input looks like this:
------------------------------------------------------------------------
rxxxx | abc.xyz | 2016-02-01 13:42:21 +0700 (Mon, 01 Feb 2016) | 1 line
refs #kkkk [GolFeature] Fix UI 69
--
------------------------------------------------------------------------
rxxxjy | mnt.abc| 2016-02-01 11:33:45 +0700 (Mon, 01 Feb 2016) | 1 line
refs #kkkk [GoFeature] remove redundant function
--
------------------------------------------------------------------------
rxxyyxx | asdfadf.xy | 2016-02-01 11:02:06 +0700 (Mon, 01 Feb 2016) | 1 line
refs #kkkk Updated ini file
My expected output is:
2016-02-01 11:02:06 +0700 (Mon, 01 Feb 2016), rxxxx, mnt.abc, refs #kkkk Updated ini file ...
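The single line of output is most likely caused by RS="". That puts awk in paragraph mode, where records are separated by blank lines. Once sed has deleted every blank line, the whole input becomes one record, and $1 and $2 are just its first two lines. The sketch below reads line by line instead and prints one line per commit. The function name svn_summary is hypothetical. It assumes each header line looks like "rNNN | author | date | N line" and that the first non-blank line after a header is the commit message.

```shell
#!/usr/bin/env bash
# svn_summary (hypothetical name): turn "svn log" excerpts into
# "date, revision, author, message" lines, one per commit.
svn_summary() {
  awk -F' *[|] *' '
    /^$/ || /^--/ { next }          # skip blank lines and ---- separators
    /^r[^ ]* *[|]/ {                # header: rNNN | author | date | N line(s)
      rev = $1; author = $2; date = $3
      next
    }
    rev != "" {                     # first message line after a header
      print date ", " rev ", " author ", " $0
      rev = ""
    }'
}
# Usage: svn log -l1000 | grep '#xxxx' -B3 | svn_summary
```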
When processing input with awk, sometimes I want to edit one of the fields, without touching anything else. Consider this:
$ ls -l | awk 1
total 88
-rw-r--r-- 1 jack jack     8 Jun 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 Jun 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 Dec 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 Dec  7 08:09 test1.js
If I don't edit any of the fields ($1, $2, ...), everything is preserved as it was. But if, say, I want to keep only the first 3 characters of the first field:
$ ls -l | awk '{$1 = substr($1, 1, 3) } 1'
tot 88
-rw 1 jack jack 8 Jun 19 2013 qunit-1.11.0.css
-rw 1 jack jack 56908 Jun 19 2013 qunit-1.11.0.js
-rw 1 jack jack 4306 Dec 29 09:16 test1.html
-rw 1 jack jack 5476 Dec 7 08:09 test1.js
The original whitespace between the fields is replaced with a single space.
Is there a way to preserve the original whitespace between the fields?
UPDATE
In this sample, it's relatively easy to edit the first 4 fields. But what if I want to keep only the 1st letter of $6 (the month), to get this output:
-rw-r--r-- 1 jack jack     8 J 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 J 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 D 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 D  7 08:09 test1.js
If you want to preserve the whitespace, you could also try the split function.
In GNU Awk version 4, split() accepts an optional fourth argument: an array that receives the separators between the fields. With the default " " separator, b[0] holds any leading whitespace. For instance,
echo "a 2 4 6" | gawk ' {
n=split($0,a," ",b)
a[3]=7
line=b[0]
for (i=1;i<=n; i++)
line=(line a[i] b[i])
print line
}'
gives output
a 2 7 6
I know this is an old question, but I thought there had to be something better. This answer is for those who stumble onto this question while searching. Having looked around on the web, I have to say Håkon Hægland has the best answer, and that is what I used at first.
But here is my solution: use FPAT (available in GNU Awk 4 and later). FPAT sets a regular expression that describes what a field is, rather than what separates fields: FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)"; In this case, a field is zero or more whitespace characters followed by one or more letters, digits or punctuation characters, i.e. any non-blank characters. See the gawk manual's section on bracket expressions if you have trouble understanding POSIX character classes.
Also set the output field separator to the empty string (OFS = "";). Once a field has been modified, awk rebuilds the line with OFS between the fields. Since each field already carries its leading whitespace, the default single-space OFS would add an extra space.
I used the same example to test.
$ cat example-output.txt
-rw-r--r-- 1 jack jack     8 Jun 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 Jun 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 Dec 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 Dec  7 08:09 test1.js
$ awk 'BEGIN { FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)"; OFS = ""; } { $6 = substr( $6, 1, 2); print $0; }' example-output.txt
-rw-r--r-- 1 jack jack     8 J 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 J 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 D 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 D  7 08:09 test1.js
Keep in mind that the fields now have leading spaces. So if a field needs to be replaced by something else, you can pad the replacement to the original field width:
len = length($1);
$1 = sprintf("%"(len)"s", "-42-");
$ awk 'BEGIN { FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)"; OFS = ""; } { if(NR==1){ len = length($1); $1 = sprintf("%"(len)"s", "-42-"); } print $0; }' example-output.txt
      -42- 1 jack jack     8 Jun 19  2013 qunit-1.11.0.css
-rw-r--r-- 1 jack jack 56908 Jun 19  2013 qunit-1.11.0.js
-rw-r--r-- 1 jack jack  4306 Dec 29 09:16 test1.html
-rw-r--r-- 1 jack jack  5476 Dec  7 08:09 test1.js
It's possible to preserve the original whitespaces by editing $0 instead of individual fields ($1, $2, ...), for example:
$ ls -l | awk '{$0 = substr($1, 1, 3) substr($0, length($1) + 1)} 1'
tot 88
-rw 1 jack jack     8 Jun 19  2013 qunit-1.11.0.css
-rw 1 jack jack 56908 Jun 19  2013 qunit-1.11.0.js
-rw 1 jack jack  4306 Dec 29 09:16 test1.html
-rw 1 jack jack  5476 Dec  7 08:09 test1.js
This is relatively easy to do when editing the first column, but gets troublesome when editing others ($2, ..., $4), and breaks down after fields where the width of the whitespace in between is not fixed ($5 and beyond in this example).
UPDATE
Based on Håkon Hægland's answer, here's a way to keep the first 2 characters of the 6th field (the month):
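ls -l | gawk '{
    n = split($0, f, " ", sep)   # gawk 4+: sep[i] is the whitespace after f[i]
    f[6] = substr(f[6], 1, 2)    # keep the first 2 characters of field 6
    line = sep[0]                # leading whitespace, if any
    for (i = 1; i <= n; ++i) line = line f[i] sep[i]
    print line
}'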
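If your awk lacks gawk's four-argument split() and FPAT, a portable sketch can walk the line with the POSIX match() function and rebuild only the field being edited. This technique is not taken from any of the answers here. The helper names set_field and keep_prefix are hypothetical. Fields are taken to be runs of non-blank characters, as with the default FS.

```shell
#!/usr/bin/env bash
# keep_prefix N LEN (hypothetical name): keep only the first LEN characters of
# field N on each input line, leaving every run of whitespace as it was.
keep_prefix() {
  awk -v n="$1" -v len="$2" '
  # Return $0 with field k replaced by val; all whitespace is preserved.
  function set_field(k, val,    rest, out, i) {
    rest = $0
    out = ""
    for (i = 1; i <= k; i++) {
      if (!match(rest, /[^ \t]+/))
        return $0                       # fewer than k fields: line unchanged
      if (i < k) {                      # copy field i and the blanks before it
        out = out substr(rest, 1, RSTART + RLENGTH - 1)
        rest = substr(rest, RSTART + RLENGTH)
      }
    }
    # RSTART and RLENGTH now locate field k inside rest
    return out substr(rest, 1, RSTART - 1) val substr(rest, RSTART + RLENGTH)
  }
  { print set_field(n, substr($n, 1, len)) }'
}
# Usage: ls -l | keep_prefix 6 1   (keeps only the first letter of the month)
```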
The simplest solution is to make sure that field splitting is done on every single space. You do that by setting the field separator to [ ]:
$ awk -F '[ ]' '{$1=substr($1,1,3)}1' infile
-rw 1 jack jack     8 Jun 19  2013 qunit-1.11.0.css
-rw 1 jack jack 56908 Jun 19  2013 qunit-1.11.0.js
-rw 1 jack jack  4306 Dec 29 09:16 test1.html
-rw 1 jack jack  5476 Dec  7 08:09 test1.js
By default, awk splits on any run of whitespace (spaces and tabs, something similar to [ \t]+). The manual states:
In the special case that FS is a single space, fields are separated by runs of spaces and/or tabs and/or newlines.
When a field is modified, awk rebuilds the record and replaces each run of spaces, tabs and newlines with a single OFS. If OFS is also a space (the default), only one space is printed for each run of whitespace.
But awk can be told to use only one space as the field delimiter, with a regular expression that matches exactly one character: [ ].
Note that this changes the field numbering. Each space starts a new field, so consecutive spaces produce empty fields. Note this result from the data you presented:
$ awk -F '[ ]' '{print($4,$5,$6)}' infile
jack
jack 56908 Jun
jack  4306
jack  5476
In this specific case, there are no spaces before the first field, and only one space after, that's why it works correctly.