GBQ SQL: How to find first instance of X value and pull a corresponding row - sql

I have a table that records the history of each ID per LOCATION. This table is updated each day to keep track of the history of any change in a certain row (ID). Note: The date field is not in chronological order.
ID Count Date (datetime type)
1 20 2020-01-15T12:00:00.000
1 16 2020-03-15T12:00:00.000
1 13 2020-04-15T12:00:00.000
1 4 2020-05-15T12:00:00.000
1 0 2020-06-15T12:00:00.000
2 20 2020-01-15T12:00:00.000
2 10 2020-02-15T12:00:00.000
3 12 2020-01-15T12:00:00.000
3 10 2020-02-15T12:00:00.000
3 0 2020-03-15T12:00:00.000
For each unique ID, I need to pull the first instance (oldest date) when the Count value is zero. If a unique ID does not have an instance where its Count value is zero, I need to pull the most current Count value.
Here's what my results should look like below:
ID Count Date (datetime type)
1 0 2020-06-15T12:00:00.000
2 10 2020-02-15T12:00:00.000
3 0 2020-03-15T12:00:00.000
I can't seem to wrap my head around how to code this in Google BigQuery.

Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE
  CASE COUNTIF(count = 0)
    -- no zero Count for this id: keep the most recent row
    WHEN 0 THEN ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)]
    -- otherwise: keep the earliest row whose Count is zero
    ELSE ARRAY_AGG(t ORDER BY count, date LIMIT 1)[OFFSET(0)]
  END
FROM `project.dataset.table` t
GROUP BY id
If applied to the sample data in your question, the output is
Row id count date
1 1 0 2020-06-15 12:00:00 UTC
2 2 10 2020-02-15 12:00:00 UTC
3 3 0 2020-03-15 12:00:00 UTC
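To test this without creating a table, the sample rows from the question can be inlined in a WITH clause; a minimal sketch that reuses the placeholder name project.dataset.table:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 AS id, 20 AS count, DATETIME '2020-01-15 12:00:00' AS date UNION ALL
  SELECT 1, 16, DATETIME '2020-03-15 12:00:00' UNION ALL
  SELECT 1, 13, DATETIME '2020-04-15 12:00:00' UNION ALL
  SELECT 1, 4, DATETIME '2020-05-15 12:00:00' UNION ALL
  SELECT 1, 0, DATETIME '2020-06-15 12:00:00' UNION ALL
  SELECT 2, 20, DATETIME '2020-01-15 12:00:00' UNION ALL
  SELECT 2, 10, DATETIME '2020-02-15 12:00:00' UNION ALL
  SELECT 3, 12, DATETIME '2020-01-15 12:00:00' UNION ALL
  SELECT 3, 10, DATETIME '2020-02-15 12:00:00' UNION ALL
  SELECT 3, 0, DATETIME '2020-03-15 12:00:00'
)
SELECT AS VALUE
  CASE COUNTIF(count = 0)
    WHEN 0 THEN ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)]
    ELSE ARRAY_AGG(t ORDER BY count, date LIMIT 1)[OFFSET(0)]
  END
FROM `project.dataset.table` t
GROUP BY id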

Do you just want the last row for each id?
One method is row_number():
select t.*
from (select t.*,
             -- zero-count rows sort first (by their date, earliest first); if an id
             -- has no zero, every key is null and the date desc tiebreaker picks the latest row
             row_number() over (partition by id
                                order by case when count = 0 then date end nulls last,
                                         date desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;
But I also like using aggregation in BigQuery:
select (array_agg(t order by date desc limit 1))[ordinal(1)]
from t
group by id;

Related

Getting count of last records of 2 columns SQL

I was looking for a solution for the scenario mentioned below.
My table structure is like this; table name: energy_readings
equipment_id meter_id readings reading_date
1 1 100 01/01/2022
1 1 200 02/01/2022
1 1 null 03/01/2022
1 2 100 01/01/2022
1 2 null 04/01/2022
2 1 null 04/01/2022
2 1 399 05/01/2022
2 2 null 02/01/2022
From this, I want to get the number of nulls for the last record of each equipment_id and meter_id combination. (Only the nulls of the last record of each equipment_id and meter_id should be considered.)
Example: here, the last reading for equipment 1 and meter 1 is null, so it should be counted. The last reading (latest date) for equipment 1 and meter 2 is also null, so it should be counted. But even though equipment 2 and meter 1 has a null, it is not the last record (latest date), so it should not be counted.
Thus, this should be the result:
equipment_id Count
1 2
2 1
Hope I was clear with the question.
Thank you!
You can use a CTE like the one below. The CTE LatestRecord gets the latest record for each equipment_id and meter_id. You can then join it back to your table and use WHERE to keep only the records with null readings.
;WITH LatestRecord AS (
    SELECT equipment_id, meter_id, MAX(reading_date) AS reading_date
    FROM energy_readings
    GROUP BY equipment_id, meter_id
)
SELECT er.equipment_id, COUNT(1) AS [Count]
FROM energy_readings er
JOIN LatestRecord lr
    ON lr.equipment_id = er.equipment_id
    AND lr.meter_id = er.meter_id
    AND lr.reading_date = er.reading_date
WHERE er.readings IS NULL
GROUP BY er.equipment_id
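If you want to test this without creating a table, the sample rows can be inlined with a VALUES constructor; a minimal sketch assuming SQL Server (which the [Count] bracket syntax suggests), with the dates rewritten as ISO strings purely so they sort in the same order as in the question:
;WITH energy_readings AS (
    SELECT *
    FROM (VALUES
        (1, 1, 100,  '2022-01-01'),
        (1, 1, 200,  '2022-01-02'),
        (1, 1, NULL, '2022-01-03'),
        (1, 2, 100,  '2022-01-01'),
        (1, 2, NULL, '2022-01-04'),
        (2, 1, NULL, '2022-01-04'),
        (2, 1, 399,  '2022-01-05'),
        (2, 2, NULL, '2022-01-02')
    ) AS v(equipment_id, meter_id, readings, reading_date)
), LatestRecord AS (
    SELECT equipment_id, meter_id, MAX(reading_date) AS reading_date
    FROM energy_readings
    GROUP BY equipment_id, meter_id
)
SELECT er.equipment_id, COUNT(1) AS [Count]
FROM energy_readings er
JOIN LatestRecord lr
    ON lr.equipment_id = er.equipment_id
    AND lr.meter_id = er.meter_id
    AND lr.reading_date = er.reading_date
WHERE er.readings IS NULL
GROUP BY er.equipment_id
-- expected result: equipment_id 1 -> 2, equipment_id 2 -> 1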
with records as (
    select equ_id, meter_id, reading_date, readings,
           rank() over (partition by equ_id, meter_id
                        order by reading_date desc) as rnk
    from equipment
)
select equ_id, count(*) as counter
from (
    select equ_id, meter_id, reading_date, readings
    from records
    where rnk = 1
) latest
where readings is null
group by equ_id
Explanation:
records ranks the rows of each equ_id/meter_id pair by reading_date, giving the latest record rank 1.
The inner select keeps only those rank-1 rows, i.e. the latest record of each pair.
The outer select counts the latest records whose readings is null, grouped by equ_id.

SQL Sorted Count

I have the following table sorted by date:
date id
9/1/20 1
9/1/20 2
9/3/20 1
9/4/20 3
9/4/20 2
9/6/20 1
I'd like to add a count column for each id, so that the earliest date for an id gets a count of 1 and its latest date gets the highest count:
date id count
9/1/20 1 1
9/1/20 2 1
9/3/20 1 2
9/4/20 3 1
9/4/20 2 2
9/6/20 1 3
How can I structure my Postgresql query to assemble this count column?
This looks like row_number():
select t.*,
       row_number() over (partition by id order by date) as seqnum
from t
order by date, id;
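If no id ever repeats a date (true for the sample data), a running window count gives the same numbers and produces the column name the question asks for; a minimal sketch against the same table t:
select t.*,
       count(*) over (partition by id order by date) as count
from t
order by date, id;
Note that if an id does repeat a date, the default window frame counts the tied rows together, so row_number() is the safer choice there.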

GBQ SQL: Find instance of X and pull corresponding row data

I have a table that records the history of each ID per LOCATION. This table is updated each day to keep track of the history of any change in a certain row (ID). Note: The date field is not in chronological order.
ID Location Count Date (datetime type)
1 A 20 2020-01-15T12:00:00.000
1 A 10 2020-04-15T12:00:00.000
1 A 15 2020-03-15T12:00:00.000
1 B 10 2020-05-15T12:00:00.000
1 B 5 2020-06-15T12:00:00.000
1 B 0 2020-07-15T12:00:00.000
2 A 18 2020-01-15T12:00:00.000
2 A 0 2020-04-15T12:00:00.000
2 A 14 2020-03-15T12:00:00.000
2 B 10 2020-05-15T12:00:00.000
2 B 5 2020-06-15T12:00:00.000
2 B 1 2020-07-15T12:00:00.000
For each unique ID, I need to pull the first instance (oldest date) when the Count value is zero. If a unique ID does not have an instance where its Count value is zero, I need to pull the most current Count value.
Here's what my results should look like below:
ID Location Count Date (datetime type)
1 A 10 2020-04-15T12:00:00.000
1 B 0 2020-07-15T12:00:00.000
2 A 0 2020-04-15T12:00:00.000
2 B 1 2020-07-15T12:00:00.000
I can't seem to wrap my head around how to code this in Google BigQuery.
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE
  CASE COUNTIF(count = 0)
    WHEN 0 THEN ARRAY_AGG(t ORDER BY date DESC LIMIT 1)
    ELSE ARRAY_AGG(t ORDER BY count, date LIMIT 1)
  END[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY id, location
If applied to the sample data from your question, the output is
Row id location count date
1 1 A 10 2020-04-15 12:00:00 UTC
2 1 B 0 2020-07-15 12:00:00 UTC
3 2 A 0 2020-04-15 12:00:00 UTC
4 2 B 1 2020-07-15 12:00:00 UTC

Count rows within each group when condition is satisfied Sql Server

I have a table which looks like below:
ID Date IsFull
1 2020-01-05 0
1 2020-02-05 0
1 2020-02-25 1
1 2020-03-01 1
1 2020-03-20 1
I want to display how many months for ID = 1 have sum(isfull)/count(*) > .6 in a given month (i.e., isfull = 1 more than 60% of the time in that month).
So the final output should be:
ID HowManyMonths
1 1 --------(only month 3: 2 out of 2 cases)
If the question changes to sum(isfull)/count(*) > .4, then the final output should be:
ID HowManyMonths
1 2 --------(month 2 and month 3)
Thanks!!
You can do this with two levels of aggregation:
select id, count(*) howManyMonths
from (
    select id
    from mytable
    group by id, year(date), month(date)
    having avg(1.0 * isFull) > 0.6
) t
group by id
The subquery aggregates by id, year and month, and uses a having clause to filter on groups that meet the success rate (avg() comes in handy for this). The outer query counts how many months passed the target rate for each id.
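To see which months the having clause keeps, it can help to run the subquery on its own; the rates in the comments below are worked out by hand from the sample rows:
select id, year(date) as yr, month(date) as mo,
       avg(1.0 * isFull) as fullRate
from mytable
group by id, year(date), month(date)
-- id 1, month 2020-01: 0.0 (single row with IsFull = 0)
-- id 1, month 2020-02: 0.5 (IsFull values 0 and 1)
-- id 1, month 2020-03: 1.0 (IsFull values 1 and 1)
-- only 2020-03 passes > 0.6, while both 2020-02 and 2020-03 pass > 0.4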

SQL Get highest repeating value for a group clause

I want a SQL query which tells me, for each ID, which value was repeated the most times.
For example lets take the following table:
Id Value
1 10
1 20
1 10
1 10
2 1
1 3
Desired Output
Id Value Count
1 10 3
2 1 1
The example above shows that for Id 1, the value 10 was repeated the most times, and for Id 2, the value 1 was repeated the most times.
Any suggestion would be really appreciated.
Use rank to number each id's values by their counts in descending order and pick the 1st-ranked rows.
select id, value, cnt
from (select id, value, count(*) as cnt,
             rank() over (partition by id order by count(*) desc) as rnk
      from t
      group by id, value) x
where rnk = 1
Based on Gordon's comment, if you need only one value per id in case of ties, use row_number instead of rank, as rank returns all the ties in value counts.
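A minimal sketch of that variant, keeping the same table and column names (which of the tied values wins is arbitrary unless you add a tiebreaker to the order by):
select id, value, cnt
from (select id, value, count(*) as cnt,
             row_number() over (partition by id order by count(*) desc) as rn
      from t
      group by id, value) x
where rn = 1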