GBQ SQL: Find instance of X and pull corresponding row data - sql

I have a table that records the history of each ID per LOCATION. This table is updated each day to keep track of the history of any change in a certain row(ID). Note: The date field is not in chronological order.
ID Location Count Date (datetime type)
1 A 20 2020-01-15T12:00:00.000
1 A 10 2020-04-15T12:00:00.000
1 A 15 2020-03-15T12:00:00.000
1 B 10 2020-05-15T12:00:00.000
1 B 5 2020-06-15T12:00:00.000
1 B 0 2020-07-15T12:00:00.000
2 A 18 2020-01-15T12:00:00.000
2 A 0 2020-04-15T12:00:00.000
2 A 14 2020-03-15T12:00:00.000
2 B 10 2020-05-15T12:00:00.000
2 B 5 2020-06-15T12:00:00.000
2 B 1 2020-07-15T12:00:00.000
For each unique ID, I need to pull the first instance (oldest date) when the Count value is zero. If a unique ID does not have an instance where its Count value is zero, I need to pull the most current Count value.
Here's what my results should look like below:
ID Location Count Date (datetime type)
1 A 10 2020-04-15T12:00:00.000
1 B 0 2020-07-15T12:00:00.000
2 A 0 2020-04-15T12:00:00.000
2 B 1 2020-07-15T12:00:00.000
I can't seem to wrap my head around how to code this in Google BigQuery.

Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE
CASE COUNTIF(count = 0)
WHEN 0 THEN ARRAY_AGG(t ORDER BY date DESC LIMIT 1)
ELSE ARRAY_AGG(t ORDER BY count, date LIMIT 1)
END [OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY id, location
Applied to the sample data from your question, the output is:
Row id location count date
1 1 A 10 2020-04-15 12:00:00 UTC
2 1 B 0 2020-07-15 12:00:00 UTC
3 2 A 0 2020-04-15 12:00:00 UTC
4 2 B 1 2020-07-15 12:00:00 UTC
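For readers without a BigQuery project at hand, the same "first zero, otherwise latest" rule can be checked locally. The following is only a sketch under stated assumptions: it uses Python's sqlite3 module as a stand-in for BigQuery, it swaps ROW_NUMBER() in for ARRAY_AGG(... ORDER BY ... LIMIT 1) because SQLite lacks the latter, and the table name history and the function name first_zero_or_latest are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, location, count, date).
ROWS = [
    (1, "A", 20, "2020-01-15T12:00:00.000"),
    (1, "A", 10, "2020-04-15T12:00:00.000"),
    (1, "A", 15, "2020-03-15T12:00:00.000"),
    (1, "B", 10, "2020-05-15T12:00:00.000"),
    (1, "B", 5, "2020-06-15T12:00:00.000"),
    (1, "B", 0, "2020-07-15T12:00:00.000"),
    (2, "A", 18, "2020-01-15T12:00:00.000"),
    (2, "A", 0, "2020-04-15T12:00:00.000"),
    (2, "A", 14, "2020-03-15T12:00:00.000"),
    (2, "B", 10, "2020-05-15T12:00:00.000"),
    (2, "B", 5, "2020-06-15T12:00:00.000"),
    (2, "B", 1, "2020-07-15T12:00:00.000"),
]

# Rank rows inside each (id, location) group and keep the top-ranked one.
QUERY = """
SELECT id, location, count, date
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY id, location
               ORDER BY
                   CASE WHEN count = 0 THEN 0 ELSE 1 END,  -- zero rows first
                   CASE WHEN count = 0 THEN date END,      -- oldest zero first
                   date DESC                               -- otherwise newest first
           ) AS seqnum
    FROM history AS t
)
WHERE seqnum = 1
ORDER BY id, location
"""


def first_zero_or_latest(rows):
    """Return one row per (id, location): first zero count, else latest row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (id INTEGER, location TEXT, count INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_zero_or_latest(ROWS):
        print(row)
```

The ISO-formatted date strings sort chronologically as plain text, which is why no date parsing is needed in this sketch. Unlike ordering by count, the explicit CASE keys do not rely on counts being non-negative.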

Related

From Quarter-to-date data to daily values on bigquery

I have a bq table that receives data quarter to date per id
id value date
1 200 02/11/2022
2 70 02/11/2022
3 120 02/11/2022
1 150 01/11/2022
2 50 01/11/2022
3 100 01/11/2022
So each id holds the cumulative value for the quarter to date.
I need to create a view that, for each id, takes each day's value minus the previous day's value
(for id 1: the 02/11 value minus the 01/11 value, and so on),
so the output should look like this:
id value date
1 50 02/11/2022
2 20 02/11/2022
3 20 02/11/2022
1 150 01/11/2022
2 50 01/11/2022
3 100 01/11/2022
any help is really appreciated
You might consider the query below.
SELECT ID, date,
value - LEAD(value, 1, 0) OVER (
PARTITION BY id ORDER BY UNIX_DATE(PARSE_DATE('%d/%m/%Y', date)) DESC
) AS new_value
FROM sample_data;
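To see the subtraction at work on the question's six rows, here is a small local sketch. It makes several assumptions: Python's sqlite3 stands in for BigQuery, the dd/mm/yyyy strings are rearranged with substr because SQLite has no PARSE_DATE, LAG over ascending dates replaces LEAD over descending dates (the two are equivalent here), and the function name daily_values is hypothetical.

```python
import sqlite3

# Quarter-to-date rows from the question: (id, value, date as dd/mm/yyyy).
ROWS = [
    (1, 200, "02/11/2022"), (2, 70, "02/11/2022"), (3, 120, "02/11/2022"),
    (1, 150, "01/11/2022"), (2, 50, "01/11/2022"), (3, 100, "01/11/2022"),
]

QUERY = """
WITH dated AS (
    SELECT id, value, date,
           -- dd/mm/yyyy -> yyyy-mm-dd so that text order is date order
           substr(date, 7, 4) || '-' || substr(date, 4, 2) || '-' || substr(date, 1, 2)
               AS iso_date
    FROM sample_data
)
SELECT id,
       -- previous day's cumulative value, or 0 on the first day seen
       value - LAG(value, 1, 0) OVER (PARTITION BY id ORDER BY iso_date) AS daily_value,
       date
FROM dated
ORDER BY iso_date DESC, id
"""


def daily_values(rows):
    """Turn cumulative per-id values into per-day values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_data (id INTEGER, value INTEGER, date TEXT)")
    conn.executemany("INSERT INTO sample_data VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in daily_values(ROWS):
        print(row)
```

The default of 0 means the first day of the data keeps its cumulative value, as in the desired output. If the table ever spans more than one quarter, the partition would probably also need the quarter, since otherwise the first day of a new quarter would be compared with the last day of the old one.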

How can I calculate user session time from heart beat data in Presto SQL?

I'm currently recording when users are active via a heart beat. It's stored in a table like so:
User ID   Minute of Day
1         3
1         4
1         5
1         8
1         9
2         2
2         3
2         4
User ID 1 is active from 3 to 5 but then is inactive from 6 to 7 and then becomes active again from 8 to 9.
User ID 1 was active for 3 minutes: (5-3 + 9-8) = 3
User ID 2 was active for 2 minutes: 4-2 = 2
How can I calculate this using a SQL (Presto) query?
Output should be like so:
User ID   Total Minutes
1         3
2         2
You may try the following, which uses the LAG function to find the minutes that continue an active period (diff = 1) before summing them:
SELECT
USERID,
SUM(diff) as TotalMinutes
FROM (
SELECT
UserId,
(MinuteofDay - LAG(MinuteofDay,1,MinuteofDay) OVER (PARTITION BY UserId ORDER BY MinuteofDay)) as diff
FROM
my_table
) t
WHERE
diff = 1
GROUP BY
UserID;
userid   TotalMinutes
1        3
2        2
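The question describes the data as sessions (3 to 5, then 8 to 9), and that view can be made explicit with a different technique from the LAG answer: the "gaps and islands" pattern, where consecutive minutes share the same value of minute minus row number. This is a sketch only, run against Python's sqlite3 rather than Presto, with a hypothetical function name total_minutes, and it assumes at most one heartbeat per user per minute.

```python
import sqlite3

# Heartbeats from the question: (UserId, MinuteofDay).
HEARTBEATS = [(1, 3), (1, 4), (1, 5), (1, 8), (1, 9), (2, 2), (2, 3), (2, 4)]

QUERY = """
WITH numbered AS (
    SELECT UserId, MinuteofDay,
           -- constant within a run of consecutive minutes
           MinuteofDay - ROW_NUMBER() OVER (
               PARTITION BY UserId ORDER BY MinuteofDay
           ) AS island
    FROM my_table
),
sessions AS (
    SELECT UserId, MIN(MinuteofDay) AS started, MAX(MinuteofDay) AS ended
    FROM numbered
    GROUP BY UserId, island
)
SELECT UserId, SUM(ended - started) AS TotalMinutes
FROM sessions
GROUP BY UserId
ORDER BY UserId
"""


def total_minutes(heartbeats):
    """Sum (end - start) over each user's runs of consecutive minutes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (UserId INTEGER, MinuteofDay INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", heartbeats)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in total_minutes(HEARTBEATS):
        print(row)
```

For user 1 the islands are minutes 3 to 5 and 8 to 9, giving 2 + 1 = 3, which agrees with the LAG-based query. The intermediate sessions step is also handy if the start and end of each session are needed, not just the total.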

GBQ SQL: How to find first instance of X value and pull a corresponding row

I have a table that records the history of each ID per LOCATION. This table is updated each day to keep track of the history of any change in a certain row(ID). Note: The date field is not in chronological order.
ID Count Date (datetime type)
1 20 2020-01-15T12:00:00.000
1 16 2020-03-15T12:00:00.000
1 13 2020-04-15T12:00:00.000
1 4 2020-05-15T12:00:00.000
1 0 2020-06-15T12:00:00.000
2 20 2020-01-15T12:00:00.000
2 10 2020-02-15T12:00:00.000
3 12 2020-01-15T12:00:00.000
3 10 2020-02-15T12:00:00.000
3 0 2020-03-15T12:00:00.000
For each unique ID, I need to pull the first instance (oldest date) when the Count value is zero. If a unique ID does not have an instance where its Count value is zero, I need to pull the most current Count value.
Here's what my results should look like below:
ID Count Date (datetime type)
1 0 2020-06-15T12:00:00.000
2 10 2020-02-15T12:00:00.000
3 0 2020-03-15T12:00:00.000
I can't seem to wrap my head around how to code this in Google BigQuery.
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE
CASE COUNTIF(count = 0)
WHEN 0 THEN ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)]
ELSE ARRAY_AGG(t ORDER BY count, date LIMIT 1)[OFFSET(0)]
END
FROM `project.dataset.table` t
GROUP BY id
Applied to the sample data in your question, the output is:
Row id count date
1 1 0 2020-06-15 12:00:00 UTC
2 2 10 2020-02-15 12:00:00 UTC
3 3 0 2020-03-15 12:00:00 UTC
Do you just want the last row for each id?
One method is row_number():
select t.*
from (select t.*,
row_number() over (partition by id
order by case when count = 0 then date end nulls last,
date desc
) as seqnum
from t
) t
where seqnum = 1;
But I also like using aggregation in BigQuery:
select (array_agg(t order by date desc limit 1))[ordinal(1)]
from t
group by id;

Select Query to Get Unique Cells in Two Columns

I have an SQL Server database that logs weather device sensor data.
The table looks like this:
Id DeviceId SensorId Value
1 1 1 42
2 1 1 3
3 1 2 30
4 2 2 0
5 2 1 1
6 3 1 26
7 3 1 23
8 3 2 1
In return the query should return the following:
Id DeviceId SensorId Value
2 1 1 3
3 1 2 30
4 2 2 0
5 2 1 1
7 3 1 23
8 3 2 1
For each device the sensor should be unique. i.e. Values in Columns DeviceId and SensorId should be unique (row-wise).
Apologies if I'm not clear enough.
If you don't want to sum Value, as your desired result suggests, and just want to take an "arbitrary" row of each "DeviceId + SensorId" group:
WITH CTE AS
(
SELECT Id, DeviceId, SensorId, Value,
RN = ROW_NUMBER() OVER (PARTITION BY DeviceId, SensorId ORDER BY ID DESC)
FROM dbo.TableName
)
SELECT Id, DeviceId, SensorId, Value
FROM CTE
WHERE RN = 1
ORDER BY ID
This returns the row with the highest ID per group. You need to change ORDER BY ID DESC if you want a different result. Demo: http://sqlfiddle.com/#!6/8e31b/2/0 (your result)
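Because the row kept per group is the one with the highest Id, the same result can also be reached without a window function, by filtering on MAX(Id) per group. The sketch below is an alternative formulation, not the answerer's query; it runs on Python's sqlite3 instead of SQL Server, and the function name latest_per_sensor is hypothetical.

```python
import sqlite3

# Rows from the question: (Id, DeviceId, SensorId, Value).
ROWS = [
    (1, 1, 1, 42), (2, 1, 1, 3), (3, 1, 2, 30), (4, 2, 2, 0),
    (5, 2, 1, 1), (6, 3, 1, 26), (7, 3, 1, 23), (8, 3, 2, 1),
]

QUERY = """
SELECT Id, DeviceId, SensorId, Value
FROM TableName
WHERE Id IN (
    -- highest Id within each DeviceId + SensorId group
    SELECT MAX(Id) FROM TableName GROUP BY DeviceId, SensorId
)
ORDER BY Id
"""


def latest_per_sensor(rows):
    """Keep the row with the highest Id for each (DeviceId, SensorId)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TableName "
        "(Id INTEGER PRIMARY KEY, DeviceId INTEGER, SensorId INTEGER, Value INTEGER)"
    )
    conn.executemany("INSERT INTO TableName VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_sensor(ROWS):
        print(row)
```

This form only works when the choice is "highest (or lowest) value of a unique column"; the ROW_NUMBER version is more flexible when the ordering involves several columns.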

Single SQL query to display aggregate data while grouping by 3 fields

I have a table that contains basic info:
CREATE TABLE testing.testtable
(
recordId serial NOT NULL,
nameId integer,
teamId integer,
countryId integer,
goals integer,
outs integer,
assists integer,
win integer,
sys_time timestamp with time zone NOT NULL DEFAULT now(),
CONSTRAINT testtable_pkey PRIMARY KEY (recordid)
)
I want one single SQL query, (with one record per person-team-country) to display the following data. Note that I want it to group by nameId, teamId, and countryId
Name, Team, and Country
Goal/out ratio (G/O)
Goal + Assist / out ratio (GA/O)
Win percentage (Win%)
The difference between the current goal/out ratio and what it was one month ago (rDif)
The difference between the current goal+assist/out ratio and what it was one month ago (fDif)
The difference between the current win % and what it was one month ago (winDif)
Example Table with all records:
Id nameId teamId countryId goals outs assists win sys_time
1 1 3 5 2 4 11 1 2013-01-01
2 1 3 5 9 4 19 1 2013-01-01
3 1 3 4 10 2 1 0 2013-01-01
4 1 3 4 11 50 14 1 2013-01-01
5 2 2 2 10 5 4 1 2013-01-01
6 2 3 5 4 7 15 0 2013-01-01
7 1 3 5 4 8 22 0 2014-07-01
8 1 3 4 11 3 5 1 2014-07-01
9 3 1 4 44 1 4 1 2014-07-01
Example desired output record (1-3-5):
nameId teamId countryId G/O GA/O Win% rDif fDif winDif
1 3 5 0.938 4.19 66 0.44 0.94 -0.34
The ratios are easy enough to retrieve.. for the differences, I've done the following:
select tt.nameid,
avg(tt.goals) - avg(case when tt.sys_time < date_trunc('day', NOW() - interval '1 month') then tt.goals end) as change
from testing.testtable tt
group by tt.nameid
order by change desc
This works if I want the differences for only the nameIds. But I want it to pull one record for each combination of name-team-country. I can't seem to get that working.
You can group by multiple fields:
select tt.nameid, tt.teamID, tt.countryID,
avg(tt.goals) - avg(case when tt.sys_time < date_trunc('day', NOW() - interval '1 month') then tt.goals end) as change
from testing.testtable tt
group by tt.nameid, tt.teamID, tt.countryID
order by change desc
Just off the top of my head, I think it would work for you to use:
group by tt.nameid, tt.teamId, tt.countryId
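The grouped query from the answers above can be tried on the question's example rows with a small local sketch. Assumptions to note: Python's sqlite3 stands in for PostgreSQL, only the goals and sys_time columns are loaded, the now() - interval '1 month' expression is replaced by an explicit cutoff parameter so the result does not depend on today's date, and the function name goal_change is hypothetical.

```python
import sqlite3

# Example rows from the question, reduced to the columns the query uses:
# (nameId, teamId, countryId, goals, sys_time).
ROWS = [
    (1, 3, 5, 2, "2013-01-01"),
    (1, 3, 5, 9, "2013-01-01"),
    (1, 3, 4, 10, "2013-01-01"),
    (1, 3, 4, 11, "2013-01-01"),
    (2, 2, 2, 10, "2013-01-01"),
    (2, 3, 5, 4, "2013-01-01"),
    (1, 3, 5, 4, "2014-07-01"),
    (1, 3, 4, 11, "2014-07-01"),
    (3, 1, 4, 44, "2014-07-01"),
]

QUERY = """
SELECT nameId, teamId, countryId,
       -- average over all rows minus average over rows older than the cutoff
       AVG(goals) - AVG(CASE WHEN sys_time < :cutoff THEN goals END) AS change
FROM testtable
GROUP BY nameId, teamId, countryId
ORDER BY nameId, teamId, countryId
"""


def goal_change(rows, cutoff):
    """Per name-team-country change in average goals relative to the cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testtable "
        "(nameId INTEGER, teamId INTEGER, countryId INTEGER, goals INTEGER, sys_time TEXT)"
    )
    conn.executemany("INSERT INTO testtable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, {"cutoff": cutoff}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Cutoff chosen as one month before the latest sample date.
    for row in goal_change(ROWS, "2014-06-01"):
        print(row)
```

A combination with no rows before the cutoff (here 3-1-4) gets NULL for the difference, because the conditional average has nothing to average; wrapping that term in COALESCE is one way to handle it if a number is required.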