Get the last time a value has changed in Google BigQuery - sql

I have an employee database which contains records about employees. The fields are :
employee_identifier
employee_salary
date_of_the_record
I would like to get, for each record, the date of the last change in employee_salary. Which SQL query could work ?
I have tried with multiple sub-queries, but it does not work.

Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(arr),
(SELECT MAX(date_of_the_record) FROM UNNEST(arr)
WHERE employee_salary != t.employee_salary
) AS last_change_in_employee_salary
FROM (
SELECT *, ARRAY_AGG(STRUCT(employee_salary, date_of_the_record)) OVER(win) arr
FROM `project.dataset.employee_database`
WINDOW win AS (PARTITION BY employee_identifier ORDER BY date_of_the_record)
) t

use row_number()
with cte as
(
select *,
row_number()over(partition by employee_identifier order by date_of_the_record desc) rn from table_name
) select * from cte where rn=1

You can also do this without a subquery. If you want all the columns:
SELECT as value ARRAY_AGG(t ORDER BY date_of_the_record DESC LIMIT 1)[ordinal(1)]
FROM t t
GROUP BY employee_identifier;
If you just want the date, use GROUP BY:
SELECT employee_identifier, MAX(date_of_the_record)
FROM t t
GROUP BY employee_identifier;

Related

Filter out null values resulting from window function lag() in SQL query

Example query:
SELECT *,
lag(sum(sales), 1) OVER(PARTITION BY department
ORDER BY date ASC) AS end_date_sales
FROM revenue
GROUP BY department, date;
I want to show only the rows where end_date is not NULL.
Is there a clause used specifically for these cases? WHERE or HAVING does not allow aggregate or window function cases.
One method uses a subquery:
SELECT r.*
FROM (SELECT r. *,
LAG(sum(sales), 1) OVER (ORDER BY date ASC) AS end_date
FROM revenue r
) r
WHERE end_date IS NOT NULL;
That said, I don't think the query is correct as you have written it. I would assume that you want something like this:
SELECT r.*
FROM (SELECT r. *,
LEAD(end_date, 1) OVER (PARTITION BY ? ORDER BY date ASC) AS end_date
FROM revenue r
) r
WHERE end_date IS NOT NULL;
Where ? is a column such as the customer id.
Try this
select * from (select distinct *,SUM(sales) OVER (PARTITION BY dept) from test)t
where t.date in(select max(date) from test group by dept)
order by date,dept;
And one more simpler way without sub query
SELECT distinct dept,MAX(date) OVER (PARTITION BY dept),
SUM(sales) OVER (PARTITION BY dept)
FROM test;

get the max record out of a google big query

I have a table with 198Mil records.
I am using this query to get the LATEST records for each ID:
with cte as (
select *, row_number() OVER (Partition by ATTOM_ID ORDER BY LastLoadDate DESC) rnum from `mother-216719.PROPERTY.ATTOM_DETAIL`
) Select * from cte where rnum = 1
Of note, there's 240 columns in this table.
This has been running over an hour, with no avail.
Is there a way to make this work?
Thanks!
Try below approach - usually it helps
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY LastLoadDate DESC LIMIT 1)[OFFSET(0)]
FROM `mother-216719.PROPERTY.ATTOM_DETAIL` t
GROUP BY ATTOM_ID

Oracle optimise SQL query - Multiple Max()

I have a table where first I need to select data by max(event_date) then need to
filter the data by max(event_sequence) then filter again by max(event_number)
I wrote following query which works but takes time.
Here the the query
SELECT DISTINCT a.stuid,
a.prog,
a.stu_prog_id,
a.event_number,
a.event_date,
a.event_sequence,
a.prog_status
FROM table1 a
WHERE a.event_date=
(SELECT max(b.event_date)
FROM table1 b
WHERE a.stuid=b.stuid
AND a.prog=b.prog
AND a.stu_prog_id=b.stu_prog_id)
AND a.event_seq=
(SELECT max(b.event_sequence)
FROM table1 b
WHERE a.stuid=b.stuid
AND a.prog=b.prog
AND a.stu_prog_id=b.stu_prog_id
AND a.event_date=b.event_date)
AND a.event_number=
(SELECT max(b.event_number)
FROM table1 b
WHERE a.stuid=b.stuid
AND a.prog=b.prog
AND a.stu_prog_id=b.stu_prog_id
AND a.event_date=b.event_date
AND a.event_sequence=b.event_sequence
I was wondering is there there a faster way to get the data?
I am using Oracle 12c.
You could try rephrasing your query using analytic functions:
SELECT
stuid,
prog,
stu_prog_id,
event_number,
event_date,
event_sequence,
prog_status
FROM
(
SELECT t.*,
RANK() OVER (PARTITION BY studio, prog, stu_prog_id
ORDER BY event_date DESC) rnk1,
RANK() OVER (PARTITION BY studio, prog, stu_prog_id, event_date
ORDER BY event_sequence DESC) rnk2,
RANK() OVER (PARTITION BY studio, prog, stu_prog_id, event_date, event_sequence
ORDER BY event_number DESC) rnk3
FROM table1 t
) t
WHERE rnk1 = 1 AND rnk2 = 1 AND rnk3 = 1;
Note: I don't actually know if you really need all three subqueries there. Adding sample data to your question might help someone else improve upon the solution I have given above.
I think you want a simple row_number() or rank():
select t1.*
from (select t1.*,
rank() over (partition by stuid, prog, stu_prog_id
order by event_date desc, event_sequence desc, event_number desc
) as seqnum
from table1 t1
) t1
where seqnum = 1;
If you have multiple records with EVENT_DATE, EVENT_SEQUENCE, EVENT_NUMBER as max respectively then in Tim's solution, Use DENSE_RANK or use the following to fetch the exact max and compare with original column data.
SELECT DISTINCT
A.STUID,
A.PROG,
A.STU_PROG_ID,
A.EVENT_NUMBER,
A.EVENT_DATE,
A.EVENT_SEQUENCE,
A.PROG_STATUS
FROM
(
SELECT
A.STUID,
A.PROG,
A.STU_PROG_ID,
A.EVENT_NUMBER,
A.EVENT_DATE,
A.EVENT_SEQUENCE,
A.PROG_STATUS,
MAX(A.EVENT_DATE) OVER(
PARTITION BY A.STUID, A.PROG, A.STU_PROG_ID
) AS MAX_EVENT_DATE,
MAX(A.EVENT_SEQUENCE) OVER(
PARTITION BY A.STUID, A.PROG, A.STU_PROG_ID, A.EVENT_DATE
) AS MAX_EVENT_SEQUENCE,
MAX(A.EVENT_NUMBER) OVER(
PARTITION BY A.STUID, A.PROG, A.STU_PROG_ID, A.EVENT_DATE, A.EVENT_SEQUENCE
) AS MAX_EVENT_NUMBER
FROM
TABLE1 A
) A
WHERE
A.MAX_EVENT_DATE = A.EVENT_DATE
AND A.MAX_EVENT_SEQUENCE = A.EVENT_SEQUENCE
AND A.MAX_EVENT_NUMBER = A.EVENT_NUMBER;
Cheers!!
As being an Oracle 12c user, you can use
[ OFFSET offset { ROW | ROWS } ]
[ FETCH { FIRST | NEXT } [ { rowcount | percent PERCENT } ]
{ ROW | ROWS } { ONLY | WITH TIES } ]
syntax as :
SELECT DISTINCT a.stuid,
a.prog,
a.stu_prog_id,
a.event_number,
a.event_date,
a.event_sequence,
a.prog_status
FROM table1 a
ORDER BY event_date DESC, event_sequence DESC, event_number DESC
FETCH FIRST 1 ROW ONLY;
where WITH TIES clause is not needed for your case, since you're looking for DISTINCT rows, and OFFSET is not needed either, since starting point is just the beginning of a descendingly ordered columns. Even, using the keyword ROW as ROWS is optional, even for the case of plural rows such as FETCH FIRST 5 ROW ONLY;
^^ --> ROWS without S
Demo

Get latest record from a table based on 2 columns in hive

I want to get the latest record from my source table based on num and id columns and insert in my target table.
Scenario is explained in the attached screen shot. For latest record date column can be used.
Screenshot
Thanks.
Select num,id, date
FROM
(
Select *, ROW_NUMBER() OVER(partition by num,id Order by date desc) as rnk
FROM source_table
)a
WHERE rnk = 1;
by using corelated Subquery
select * from your_table t
where t.date= (
select max(date) from your_table t1
where t1.num=t.num and t1.id=t.id
)
You can do it using max() function
select num,id,max(date) from your_table t
group by num,id
SELECT NUM,ID,DATE FROM TABLE_TEMP
QUALIFY RANK OVER(PARTITION BY NUM,ID ORDER BY DATE DESC)=1;
You can do this using single line query
SELECT NUM,ID,DATE FROM TABLE_TEMP
QUALIFY RANK OVER(PARTITION BY NUM,ID ORDER BY DATE DESC)=1;

How to select distinct records based on condition

I have table of duplicate records like
Now I want only one record from duplicate records which has latest created date as How can I do it ?
use row_number():
select EnquiryId, Name, . . .
from (select t.*,
row_number() over (partition by enquiryID order by CreatedDate desc) as seqnum
from table t
) t
where seqnum = 1;
Use ROW_NUMBER function to tag the duplicate records ordered by CreatedDate, like this:
;with CTE AS (
select *, row_NUMBER() over(
partition by EnquiryID -- add columns on which you want to identify duplicates
ORDER BY CreatedDate DESC) as rn
FROM TABLE
)
select * from CTE
where rn = 1