Differences between row in google big query - sql

I'm currently attempting to calculate differences between rows in google big query. I actually have a working query.
SELECT
id, record_time, level, lag,
(level - lag) as diff
FROM (
SELECT
id, record_time, level,
LAG(level) OVER (ORDER BY id, record_time) as lag
FROM (
SELECT
*
FROM
TABLE_QUERY(MY_TABLES))
ORDER BY
1, 2 ASC
)
GROUP BY 1, 2, 3, 4
ORDER BY 1, 2 ASC
But I'm working with big data and sometimes I have memory limit warning that does not let me execute the query. So, I would like to understand why I cant do an optimized query like bellow. I think it will allow work with more records without memory limit warning.
SELECT
id, record_time, level,
level - LAG(level, 1) OVER (ORDER BY id, record_time) as diff
FROM (
SELECT
*
FROM
TABLE_QUERY(MY_TABLES))
ORDER BY
1, 2 ASC
This kind of function level - LAG(level, 1) OVER (ORDER BY id, record_time) as diff, when the query is executed returns the error
Missing function in Analytic Expression
on Big Query.
I also tried to put ( ) into this function but it does not work as well.
Thanks for helping me!

It works fine for me. Maybe you forgot to enable standard SQL? Here is an example:
WITH Input AS (
SELECT 1 AS id, TIMESTAMP '2017-10-17 00:00:00' AS record_time, 2 AS level UNION ALL
SELECT 2, TIMESTAMP '2017-10-16 00:00:00', 3 UNION ALL
SELECT 1, TIMESTAMP '2017-10-16 00:00:00', 4
)
SELECT
id, record_time, level, lag,
(level - lag) as diff
FROM (
SELECT
id, record_time, level,
LAG(level) OVER (ORDER BY id, record_time) as lag
FROM Input
)
GROUP BY 1, 2, 3, 4
ORDER BY 1, 2 ASC;

Related

Difference in the output of query, using rank() and CTE

My first query looks like:
select trans.* from
( select
acc_num,
acc_type,
trans_amount,
load_date,
rank() over(partition by acc_num order by load_date) as rk
from monetary
where rat_code = 123
) trans
where trans.rk =1;
second query looks like
with a as (
select *,
row_number() over(partition by acc_num order by load_date) as rn
from monetary
where rat_code = 123 )
select
acc_num,
acc_type,
trans_amount,
load_date
from a
where rn =1;
Can any one please help me I am getting different number of records for both the cases.
though the query is same.
Its because there is difference between rank and row_number.
Below example will show
Accno, dt, rank_col, rownum_col
100, 2-jun-2022, 1, 1
100, 3-jun-2022, 1, 2
100, 1-jul-2022, 1, 3
54, 2-jun-2022, 4, 1
54, 1-jul-2022, 4, 2
In above example, you can see row number will calculate unique row id. Whereas rank gives unique id but in a continuous manner. You can see from above example, rank=1 gives you 3 rows but rownum=1 gives only two.

Lead & Analytical Functions in BigQuery

Assume my table is this
I am trying to modify my table with this information
I have added two columns where column WhenWasLastBasicSubjectDone will let you know when in which semester the student completed his latest Basic Course (sorted by Semester). The other column TotalBasicSubjectsDoneTillNow explains how many times had the student completed Basic Course(Subject) till now (sorted by Semester) ?
I think this is easy to solve with Joins as well as with UDFs but I want to use the power of existing analytical functions in BigQuery and solve it without joins.
You can use window functions for this -- assuming you have a column that specifies ordering. Let me assume that column is semester:
select t.*,
max( case when subject = 'Basic' then semester end ) over (partition by student order by semester end) as lastbasic,
sum( case when subject = 'Basic' then 1 else 0 end ) over (partition by student order by semester end) as numbasictillnow
from t
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
LAST_VALUE(IF(subject='Basic',semester,NULL) IGNORE NULLS) OVER(win) AS WhenWasLastBasicSubjectDone ,
COUNTIF(subject='Basic') OVER(win) AS TotalBasicSubjectsDoneTillNow
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY student ORDER BY semester)
You can test, play with above using dummy data from your question as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 Student, 'Sub1' Subject, 'Sem1' Semester UNION ALL
SELECT 1, 'Sub2', 'Sem2' UNION ALL
SELECT 1, 'Basic', 'Sem3' UNION ALL
SELECT 1, 'Basic', 'Sem4' UNION ALL
SELECT 1, 'Sub3', 'Sem5' UNION ALL
SELECT 1, 'Sub2', 'Sem6' UNION ALL
SELECT 1, 'Sub3', 'Sem7' UNION ALL
SELECT 1, 'Sub4', 'Sem8'
)
SELECT *,
LAST_VALUE(IF(subject='Basic',semester,NULL) IGNORE NULLS) OVER(win) AS WhenWasLastBasicSubjectDone ,
COUNTIF(subject='Basic') OVER(win) AS TotalBasicSubjectsDoneTillNow
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY student ORDER BY semester)
-- ORDER BY Semester

SQL group rows into pairs

I'm trying to add some sort of unique identifier (uid) to partitions made of pairs of rows, i.e. generate some uid/tag for each two rows of (identifier1,identifier2) in a window partition with size = 2 rows.
So, for example, the first 2 rows for ID X would get uid A, the next two rows for the same ID would get uid B and, if there is only one single row left in the partition for ID X, it would get id C.
Here's what I'm trying to accomplish, the picture illustrates the table's structure, I manually added the expectedIdentifier to illustrate the goal:
This is my current SQL, ntile doesn't solve it because the partition size varies:
select
rowId
, ntile(2) over (partition by firstIdentifier, secondIdentifier order by timestamp asc) as ntile
, *
from log;
Already tried ntile( (count(*) over partition...) / 2), but that doesn't work.
Generating the UID can be done with md5() or similar, but I'm having trouble tagging the rows as illustrated above (so I can md5 the generated tag/uid)
While count(*) is not supported within a Snowflake window function, count(1) is supported and can be used to create the unique identifier. Below is an example of an integer unique ID matching pairs of rows and handling "odd" row groups:
select
ntile(2) over (partition by firstIdentifier, secondIdentifier order by timestamp asc) as ntile
,ceil(count(1) over( partition by firstIdentifier, secondIdentifier order by timestamp asc) / 2) as id
, *
from log;
select *, char(65 + (row_number() over(partition by
firstidentifier,secondidentifier order by timestamp)-1)/2)
expectedidentifier from log
order by firstidentifier, timestamp
Here is the Sql Server Version
with log (firstidentifier,secondidentifier, timestamp)
as (
select 15396, 14460, 1 union all
select 15396, 14460, 1 union all
select 19744, 14451, 1 union all
select 19744, 14451, 1 union all
select 19744, 14451, 1 union all
select 15590, 12404, 1 union all
select 15590, 12404, 1 union all
select 15590, 12404, 1 union all
select 15590, 12404, 1 union all
select 15590, 12404, 1
)
select *, char(65 + (row_number() over(partition by
firstidentifier,secondidentifier order by timestamp)-1)/2)
expectedidentifier from log
order by firstidentifier,secondidentifier,timestamp

"Group" some rows together before sorting (Oracle)

I'm using Oracle Database 11g.
I have a query that selects, among other things, an ID and a date from a table. Basically, what I want to do is keep the rows that have the same ID together, and then sort those "groups" of rows by the most recent date in the "group".
So if my original result was this:
ID Date
3 11/26/11
1 1/5/12
2 6/3/13
2 10/15/13
1 7/5/13
The output I'm hoping for is:
ID Date
3 11/26/11 <-- (Using this date for "group" ID = 3)
1 1/5/12
1 7/5/13 <-- (Using this date for "group" ID = 1)
2 6/3/13
2 10/15/13 <-- (Using this date for "group" ID = 2)
Is there any way to do this?
One way to get this is by using analytic functions; I don't have an example of that handy.
This is another way to get the specified result, without using an analytic function (this is ordering first by the most_recent_date for each ID, then by ID, then by Date):
SELECT t.ID
, t.Date
FROM mytable t
JOIN ( SELECT s.ID
, MAX(s.Date) AS most_recent_date
FROM mytable s
WHERE s.Date IS NOT NULL
GROUP BY s.ID
) r
ON r.ID = t.ID
ORDER
BY r.most_recent_date
, t.ID
, t.Date
The "trick" here is to return "most_recent_date" for each ID, and then join that to each row. The result can be ordered by that first, then by whatever else.
(I also think there's a way to get this same ordering using Analytic functions, but I don't have an example of that handy.)
You can use the MAX ... KEEP function with your aggregate to create your sort key:
with
sample_data as
(select 3 id, to_date('11/26/11','MM/DD/RR') date_col from dual union all
select 1, to_date('1/5/12','MM/DD/RR') date_col from dual union all
select 2, to_date('6/3/13','MM/DD/RR') date_col from dual union all
select 2, to_date('10/15/13','MM/DD/RR') date_col from dual union all
select 1, to_date('7/5/13','MM/DD/RR') date_col from dual)
select
id,
date_col,
-- For illustration purposes, does not need to be selected:
max(date_col) keep (dense_rank last order by date_col) over (partition by id) sort_key
from sample_data
order by max(date_col) keep (dense_rank last order by date_col) over (partition by id);
Here is the query using analytic functions:
select
id
, date_
, max(date_) over (partition by id) as max_date
from table_name
order by max_date, id
;

Combine two sql select queries (in postgres) with LIMIT statement

I've got a table and I want a query that returns the last 10 records created plus the record who's id is x.
I'm trying to do -
SELECT * FROM catalog_productimage
ORDER BY date_modified
LIMIT 10
UNION
SELECT * FROM catalog_productimage
WHERE id=5;
But it doesn't look like I can put LIMIT in there before UNION. I've tried adding another column and using it for sorting -
SELECT id, date_modified, IF(false, 1, 0) as priority FROM catalog_productimage
UNION
SELECT, id, date_modified, IF(true, 1, 0) as priority FROM catalog_productimage
WHERE id=5
ORDER BY priority, date_modified
LIMIT 10;
but I'm not making much progress..
Just checked that this will work:
(SELECT * FROM catalog_productimage
ORDER BY date_modified
LIMIT 10)
UNION
SELECT * FROM catalog_productimage
WHERE id=5;
This will give you records from 10th to 20th and should get you started.i will reply back with SQLfiddle
SELECT *
FROM (SELECT ROW_NUMBER () OVER (ORDER BY cat_id) cat_row_no, a.* FROM catalog_productimage a where x=5)
WHERE cat_row_no > 10 and cat_row_no <20