SQL aggregation on the latest output per machine for each time

I have the following table:
ID machine app output time
1 1 A 12 1
2 1 B 15 1
3 1 B 8 3
4 1 A 11 4
5 2 C 14 4
6 2 D 17 4
For each app, I want the latest output recorded up to each point in time, and then I want to aggregate these values per machine using AVG.
For the table above, the data before aggregation should be:
time machine app latest
1 1 A 12
1 1 B 15
3 1 A 12
3 1 B 8
4 1 A 11
4 1 B 8
4 2 C 14
4 2 D 17
And the aggregated result should be:
time machine avg
1 1 =(12+15)/2
3 1 =(12+8)/2
4 1 =(11+8)/2
4 2 =(14+17)/2
What is the correct way to approach this problem?

This turned out to be less simple than I expected, but I think it does what you want. I renamed the time column to ts:
CREATE TABLE Table1
(ID int, machine int, app char(1), output int, ts int)
;
INSERT INTO Table1
(ID,machine,app,output, ts)
VALUES
(1, 1, 'A', 12, 1),
(2, 1, 'B', 15, 1),
(3, 1, 'B', 8, 3),
(4, 1, 'A', 11, 4),
(5, 2, 'C', 14, 4),
(6, 2, 'D', 17, 4)
;
And here is the query:
WITH
times as
(
SELECT distinct ts FROM Table1
),
machine_apps as
(
SELECT DISTINCT machine,app FROM Table1
),
grid as
(
SELECT
ts,machine,app
FROM
times
CROSS JOIN machine_apps
),
last_outputs as
(
SELECT
g.ts,
g.app,
g.machine,
max(t.ts) as last_time
FROM
grid g
JOIN Table1 t ON (t.app = g.app AND t.machine = g.machine AND t.ts <= g.ts)
GROUP BY
g.ts,
g.app,
g.machine
)
SELECT
l.ts,
l.machine,
AVG(t.output) as avg
FROM
last_outputs l
LEFT JOIN Table1 t ON (t.app = l.app AND t.machine = l.machine AND t.ts = l.last_time)
GROUP BY
l.ts,
l.machine
ORDER BY
l.ts,
l.machine
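As a way to check the expected averages outside the original database, here is a minimal sketch that loads the sample rows into an in-memory SQLite database from Python. It is not the query above: it picks the latest output with a correlated subquery instead of the MAX/join pair, and the CTE names grid and latest_per_app are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID int, machine int, app char(1), output int, ts int);
INSERT INTO Table1 VALUES
  (1, 1, 'A', 12, 1), (2, 1, 'B', 15, 1), (3, 1, 'B', 8, 3),
  (4, 1, 'A', 11, 4), (5, 2, 'C', 14, 4), (6, 2, 'D', 17, 4);
""")

query = """
WITH grid AS (
  -- every observed time paired with every (machine, app)
  SELECT DISTINCT a.ts, b.machine, b.app
  FROM Table1 a CROSS JOIN Table1 b
),
latest_per_app AS (
  -- latest output of each app at or before each time (NULL if none yet)
  SELECT g.ts, g.machine, g.app,
         (SELECT t.output FROM Table1 t
           WHERE t.machine = g.machine AND t.app = g.app AND t.ts <= g.ts
           ORDER BY t.ts DESC LIMIT 1) AS latest_output
  FROM grid g
)
SELECT ts, machine, AVG(latest_output) AS avg_output
FROM latest_per_app
WHERE latest_output IS NOT NULL
GROUP BY ts, machine
ORDER BY ts, machine
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The correlated subquery is easier to read but is evaluated once per grid row, so on large tables the join-based form may behave differently in terms of performance.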

Related

Match rows that include one of each at least once in SQL

I have a users table:
ID Name OID TypeID
1 a 1 1
2 b 1 2
3 c 1 3
4 d 2 1
5 e 2 1
6 f 2 2
7 g 3 2
8 h 3 2
9 i 3 2
For this table, I want to filter by OID and TypeID so that I only get the rows whose OID group includes every one of the TypeID values 1, 2, and 3.
For example, OID=1 has TypeID values 1, 2, and 3, so its rows should be returned. The rows with IDs 4-6 should not be returned: they share the same OID, but their TypeID values do not cover all of 1, 2, and 3.
You can do:
select oid
from table t
where typeid in (1,2,3)
group by oid
having count(*) = 3;
If an oid can contain duplicate typeid values, use count(distinct typeid) instead.
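A minimal sketch of the GROUP BY / HAVING idea, run against the sample data in an in-memory SQLite database from Python. The table name users is taken from the question; wrapping the grouped query in an IN (...) filter to get the full rows back is an assumption about what the asker wants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (ID int, Name text, OID int, TypeID int);
INSERT INTO users VALUES
  (1, 'a', 1, 1), (2, 'b', 1, 2), (3, 'c', 1, 3),
  (4, 'd', 2, 1), (5, 'e', 2, 1), (6, 'f', 2, 2),
  (7, 'g', 3, 2), (8, 'h', 3, 2), (9, 'i', 3, 2);
""")

rows = conn.execute("""
SELECT * FROM users
WHERE OID IN (
  -- OIDs whose rows cover all three required TypeID values
  SELECT OID FROM users
  WHERE TypeID IN (1, 2, 3)
  GROUP BY OID
  HAVING COUNT(DISTINCT TypeID) = 3
)
ORDER BY ID
""").fetchall()
print(rows)
```

COUNT(DISTINCT TypeID) is used here because OID 2 and OID 3 contain repeated TypeID values; a plain COUNT(*) = 3 would wrongly accept both of them.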
You could use exists:
select oid from table t1
where exists ( select 1 from table t2 where t2.oid = t1.oid
group by t2.oid
having count(distinct t2.TypeID) = 3
)
This assumes TypeID only takes the values 1, 2, and 3.
If you are using SQL Server, you can try this:
DECLARE @SampleData TABLE(ID INT, Name VARCHAR(5), OID INT, TypeID INT)
INSERT INTO @SampleData VALUES
(1 , 'a', 1, 1),
(2 , 'b', 1, 2),
(3 , 'c', 1, 3),
(4 , 'd', 2, 1),
(5 , 'e', 2, 1),
(6 , 'f', 2, 2),
(7 , 'g', 3, 2),
(8 , 'h', 3, 2),
(9 , 'i', 3, 2)
-- Keep a row only if none of the required TypeID values is missing for its OID
SELECT * FROM @SampleData D
WHERE NOT EXISTS (
SELECT * FROM @SampleData D1
RIGHT JOIN (VALUES (1),(2),(3)) T(TypeID) ON D1.TypeID = T.TypeID
AND D.OID = D1.OID
WHERE D1.TypeID IS NULL
)
Result:
ID Name OID TypeID
----------- ----- ----------- -----------
1 a 1 1
2 b 1 2
3 c 1 3

BigQuery: select the nth smallest value in window, ordered by another value

My table has two integer columns: a and b. For each row, I want to select the nth smallest value of b among the rows with smaller a values. Here's a sample input/output, with n=2.
Input:
a | b
-------
1 | 4
2 | 2
3 | 5
4 | 3
5 | 9
6 | 1
7 | 7
8 | 6
9 | 0
Output:
a | 2nd min b
-------------
1 | null ← only 1 element in [4], no 2nd min
2 | 4 ← 2nd min between [4,2]
3 | 4 ← 2nd min between [4,2,5]
4 | 3 ← 2nd min between [4,2,5,3]
5 | 3 ← etc.
6 | 2
7 | 2
8 | 2
9 | 1
I used n=2 here to keep it simple, but in practice, I want the 2000th smallest value (or some other large-ish constant). The column a can be assumed to contain distinct integers (and even 1, 2, 3, … if that's easier).
The problem is that if I use ORDER BY b in my window clause and NTH_VALUE, it just computes the answer on the wrong set of values:
WITH data AS (
SELECT 1 AS a, 4 AS b
UNION ALL SELECT 2 AS a, 2 AS b
UNION ALL SELECT 3 AS a, 5 AS b
UNION ALL SELECT 4 AS a, 3 AS b
UNION ALL SELECT 5 AS a, 9 AS b
UNION ALL SELECT 6 AS a, 1 AS b
)
SELECT nth_value(b, 2) over (order by a)
from data
returns [null, 2, 2, 2, 2, 2]: the window is ordered by a (that is, in the same order as the rows appear), so the value b=2 is always the one in second place. I want to order by a and then take the nth smallest value of b. Any idea how to write this in BigQuery (preferably Standard SQL)?
The query below is for BigQuery Standard SQL and produces the correct result for the given example.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 a, 4 b UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5 UNION ALL
SELECT 4, 3 UNION ALL
SELECT 5, 9 UNION ALL
SELECT 6, 1 UNION ALL
SELECT 7, 7 UNION ALL
SELECT 8, 6 UNION ALL
SELECT 9, 0
)
SELECT
a,
(SELECT b FROM
(SELECT b FROM UNNEST(c) b ORDER BY b LIMIT 2)
ORDER BY b DESC LIMIT 1
) b2
FROM (
SELECT a, IF(ARRAY_LENGTH(c) > 1, c, [NULL]) c
FROM (
SELECT a, ARRAY_AGG(b) OVER (ORDER BY a) c
FROM `project.dataset.table`
)
)
-- ORDER BY a
with the result shown below:
Row a b2
1 1 null
2 2 4
3 3 4
4 4 3
5 5 3
6 6 2
7 7 2
8 8 2
9 9 1
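Note: to make this work for the 2000th element, change 2 to 2000 in LIMIT 2 and change the ARRAY_LENGTH(c) > 1 check to ARRAY_LENGTH(c) >= 2000; otherwise rows with fewer than 2000 preceding values would return their largest value instead of null.
I admit it looks a little messy to me, and I am not sure about its scalability, but you can give it a shot.
Quick update
Below is a slightly less ugly version (same output, of course):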
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 a, 4 b UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5 UNION ALL
SELECT 4, 3 UNION ALL
SELECT 5, 9 UNION ALL
SELECT 6, 1 UNION ALL
SELECT 7, 7 UNION ALL
SELECT 8, 6 UNION ALL
SELECT 9, 0
)
SELECT a, c[SAFE_ORDINAL(2)] b2 FROM (
SELECT x.a, ARRAY_AGG(y.b ORDER BY y.b LIMIT 2) c
FROM `project.dataset.table` x
CROSS JOIN `project.dataset.table` y
WHERE y.a <= x.a
GROUP BY x.a
)
-- ORDER BY a
For the 2000th element, replace 2 with 2000 in both LIMIT 2 and SAFE_ORDINAL(2).
This potentially has the same scalability issue, because of the (now explicit) CROSS JOIN.
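To cross-check the expected output outside BigQuery, the same "nth smallest b among rows with a <= current a" logic can be sketched in SQLite from Python. This is not BigQuery syntax: it swaps ARRAY_AGG for a correlated subquery with LIMIT 1 OFFSET n-1, and the table name data and column alias nth_min are only illustrative.

```python
import sqlite3

N = 2  # which smallest value to pick (the question mentions 2000 in practice)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a int, b int)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, 4), (2, 2), (3, 5), (4, 3), (5, 9), (6, 1), (7, 7), (8, 6), (9, 0)],
)

rows = conn.execute(
    """
    SELECT x.a,
           -- sort the b values seen so far and skip the N-1 smallest;
           -- the subquery yields NULL when fewer than N values exist
           (SELECT y.b FROM data y
             WHERE y.a <= x.a
             ORDER BY y.b
             LIMIT 1 OFFSET ?) AS nth_min
    FROM data x
    ORDER BY x.a
    """,
    (N - 1,),
).fetchall()
print(rows)
```

Like the CROSS JOIN version, this re-scans the earlier rows for every row, so it shares the same quadratic scaling concern.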

To create a column Summing the values from another column in the same view

View:
A | B
10 1
15 2
12 3
5 2
2 1
2 1
Output View:
A | B | C
10 1 14
15 2 20
12 3 12
5 2 20
2 1 14
2 1 14
I need to sum the values of column A grouped by column B. For example, for all rows where column B is 1, take the values from column A, sum them, and put the result in column C.
I don't see the point, but:
SELECT t.a
,t.b
,sumtab.c
FROM [yourtable] t
INNER JOIN (
SELECT t.b
,sum(t.a) AS C
FROM [yourtable] t
GROUP BY t.b
) AS sumtab
ON t.b = sumtab.b
You could use SUM() OVER like this:
DECLARE @SampleData AS TABLE
(
A int,
B int
)
INSERT INTO @SampleData
(
A,
B
)
VALUES
( 10, 1),
( 15, 2),
( 12, 3),
( 5 , 2),
( 2 , 1),
( 2 , 1)
-- C is the total of A over all rows sharing the same B
SELECT *,
sum(sd.A) OVER(PARTITION BY sd.B) AS C
FROM @SampleData sd
Returns
A B C
-----------
10 1 14
2 1 14
2 1 14
15 2 20
5 2 20
12 3 12
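The two answers (join against a grouped subquery, and SUM() OVER) should give the same rows. A minimal sketch that runs both against the sample data in SQLite from Python, assuming an SQLite build with window-function support (3.25 or later); the table name sample is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (A int, B int)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(10, 1), (15, 2), (12, 3), (5, 2), (2, 1), (2, 1)],
)

# Window-function form: total of A within each B partition
windowed = conn.execute("""
    SELECT A, B, SUM(A) OVER (PARTITION BY B) AS C
    FROM sample
    ORDER BY B, A DESC
""").fetchall()

# Join form: pre-aggregate per B, then join the totals back
joined = conn.execute("""
    SELECT t.A, t.B, s.C
    FROM sample t
    JOIN (SELECT B, SUM(A) AS C FROM sample GROUP BY B) s ON s.B = t.B
    ORDER BY t.B, t.A DESC
""").fetchall()

print(windowed)
print(windowed == joined)
```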

How would I select the max for each row of data based on timestamp and unique id using SQL? [duplicate]

I have a table in my database that I am using a SQL query to retrieve data from. In my query, I am replacing some text and using integers. The query returns the data below:
user_id | event_code | total_bookmarks | total_folders | folder_depth | ts
0 8 34 6 1 128926
0 8 35 6 1 129001
4 8 18 2 1 123870
6 8 30 2 1 130099
6 8 30 2 1 132000
6 8 30 2 1 147778
The query I am using is:
SELECT
user_id,
event_code,
CAST(REPLACE(data1, 'total bookmarks', '') AS INTEGER) as total_bookmarks,
CAST(REPLACE(data2, 'folders', '') AS INTEGER) as total_folders,
CAST(REPLACE(data3, 'folder depth ', '') AS INTEGER) as folder_depth,
timestamp AS ts
FROM events
WHERE event_code = 8
What do I need to add to my query to select, for each unique user_id, only the row with the max ts (timestamp)? I tried MAX(timestamp), but I get two rows for the same ID when total_bookmarks differs (for example, user_id 0 has 34 in one row and 35 in another). I want the table to look like this:
user_id | event_code | total_bookmarks | total_folders | folder_depth | ts
0 8 34 6 1 129001
4 8 18 2 1 123870
6 8 30 2 1 147778
Declare @table table (user_id int, event_code int, total_bookmarks int, total_folders int, folder_depth int, ts decimal(18,0))
Insert into @table (user_id , event_code , total_bookmarks , total_folders , folder_depth , ts)
Values (0,8,34,6,1,128926),
(0,8,34,6,1,129001),
(4, 8, 18 , 2, 1, 123870),
(6, 8, 30, 2, 1, 130099),
(6, 8, 30, 2, 1, 132000),
(6, 8, 30, 2, 1, 147778)
Select * from @table
-- Rank rows per user by ts, newest first, and keep only the newest
Select user_id,event_code,total_bookmarks,total_folders,folder_depth,ts
From (
Select RANK() over (Partition by user_id
Order by ts desc
) as Rank,
user_id,event_code,total_bookmarks,total_folders,folder_depth,ts
From @table
) D1
Where D1.Rank = 1
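A minimal sketch of the same "newest row per user" idea in SQLite from Python, using the rows from the question (including the 34/35 difference for user_id 0). It assumes an SQLite build with window functions (3.25 or later), keeps only three of the columns for brevity, and uses ROW_NUMBER() rather than RANK(); the table name parsed_events is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parsed_events (user_id int, total_bookmarks int, ts int)")
conn.executemany(
    "INSERT INTO parsed_events VALUES (?, ?, ?)",
    [(0, 34, 128926), (0, 35, 129001), (4, 18, 123870),
     (6, 30, 130099), (6, 30, 132000), (6, 30, 147778)],
)

rows = conn.execute("""
    SELECT user_id, total_bookmarks, ts
    FROM (
        SELECT user_id, total_bookmarks, ts,
               -- number each user's rows from newest to oldest
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn
        FROM parsed_events
    ) AS ranked
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()
print(rows)
```

The difference matters only when two rows of one user share the same maximum ts: RANK() would return both, while ROW_NUMBER() returns exactly one of them (which one is not determined without an extra ORDER BY column).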

Oracle: enumerate groups of similar rows

I have the following table:
ID | X
1 | 1
2 | 2
3 | 5
4 | 6
5 | 7
6 | 9
I need to number groups of rows so that if rows i and i-1 differ in column X by less than 2, they get the same group number N. See the example below.
ID | X | N
1 | 1 | 1
2 | 2 | 1
3 | 5 | 2
4 | 6 | 2
5 | 7 | 2
6 | 9 | 3
Note that X(2)-X(1)=1, so rows 1 and 2 fall in the first group. Then X(3)-X(2)=3, so the 3rd row starts the 2nd group, together with the 4th and 5th rows. X(6)-X(5)=2, so the 6th row is in the 3rd group.
Can anybody help me write an SQL query that returns the second table?
This should do it:
select id, x, sum(new_group) over (order by id) as group_no
from
( select id, x, case when x-prev_x = 1 then 0 else 1 end new_group
from
( select id, x, lag(x) over (order by id) prev_x
from mytable
)
);
I get the correct answer for your data with that query.
SQL> create table mytable (id,x)
2 as
3 select 1, 1 from dual union all
4 select 2, 2 from dual union all
5 select 3, 5 from dual union all
6 select 4, 6 from dual union all
7 select 5, 7 from dual union all
8 select 6, 9 from dual
9 /
Table created.
SQL> select id
2 , x
3 , sum(y) over (order by id) n
4 from ( select id
5 , x
6 , case x - lag(x) over (order by id)
7 when 1 then 0
8 else 1
9 end y
10 from mytable
11 )
12 order by id
13 /
ID X N
---------- ---------- ----------
1 1 1
2 2 1
3 5 2
4 6 2
5 7 2
6 9 3
6 rows selected.
This is essentially the same as Tony's answer, with one inline view fewer.
Regards,
Rob.
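Both queries above follow the same two steps: flag each row that starts a new group by comparing it with the previous row via LAG, then take a running SUM of the flags. A minimal sketch of that pattern in SQLite from Python, assuming window-function support (SQLite 3.25 or later). Unlike the Oracle queries, it tests "difference less than 2" as worded in the question rather than "difference equals 1"; the alias flagged is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id int, x int)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 5), (4, 6), (5, 7), (6, 9)],
)

rows = conn.execute("""
    SELECT id, x,
           -- running count of group starts = group number
           SUM(new_group) OVER (ORDER BY id) AS n
    FROM (
        SELECT id, x,
               -- 1 when the row starts a new group; the first row has no
               -- predecessor, so the comparison is NULL and ELSE applies
               CASE WHEN x - LAG(x) OVER (ORDER BY id) < 2 THEN 0 ELSE 1 END
                   AS new_group
        FROM mytable
    ) AS flagged
    ORDER BY id
""").fetchall()
print(rows)
```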
Using basic operations only:
create table test(id int, x int);
insert into test values(1, 1), (2, 2), (3, 5), (4, 6), (5, 7), (6, 9);
create table temp as
select rownum() r, 0 min, x max
from test t
where not exists(select * from test t2 where t2.x = t.x + 1);
update temp t set min = (select max + 1 from temp t2 where t2.r = t.r - 1);
update temp t set min = 0 where min is null;
select * from temp order by r;
select t.id, t.x, x.r from test t, temp x where t.x between x.min and x.max;
drop table test;
drop table temp;