Subquery in BigQuery (JOIN on same Table) - sql

I have a BigQuery table with this data
client  spent  balance  date
A       20     500      2022-01-01
A       10     490      2022-01-02
A       50     440      2022-01-03
B       200    1000     1995-07-09
B       300    700      1998-08-11
B       100    600      2002-04-17
C       2      100      2021-01-04
C       10     90       2021-06-06
C       70     20       2021-10-07
I need the latest balance of each client based on the date:
client  spent  balance  date
A       50     440      2022-01-03
B       100    600      2002-04-17
C       70     20       2021-10-07
DISTINCT does not work here the way it would in standard SQL, and grouping on client does not work either, because GROUP BY forces me to aggregate (COUNT, SUM, etc.) the other columns that I still need as plain values.
For just one client I use:
SELECT balance FROM `table` WHERE client = "A" ORDER BY date DESC LIMIT 1
But how can I get this data for every client in just one statement?
I tried a subselect:
SELECT client,
    (SELECT balance FROM `table` WHERE client = tb.client ORDER BY date DESC LIMIT 1) AS bal
FROM `table` AS tb;
and got the error:
Correlated subqueries that reference other tables are not supported
unless they can be de-correlated, such as by transforming them into an
efficient JOIN.
I don’t know how to make a JOIN out of this subquery to make it work.
Hope you have an idea.

Use below
select * from your_table
qualify 1 = row_number() over(partition by client order by date desc)
If applied to the sample data in your question, the output is:

client  spent  balance  date
A       50     440      2022-01-03
B       100    600      2002-04-17
C       70     20       2021-10-07

Have you tried using the ROW_NUMBER window function?
select client, spent, balance, date
from (
    select client, spent, balance, date,
        ROW_NUMBER() OVER (PARTITION BY client ORDER BY date DESC) AS row_num -- adding row number, starting from the latest date
    from `table`
)
where row_num = 1 -- keep only the latest date per client
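
For completeness, here is one way the correlated subquery from the question can be de-correlated into the JOIN the error message asks for. This is a sketch, assuming each client has at most one row per date:

select t.client, t.spent, t.balance, t.date
from `table` as t
join (
    -- one row per client, carrying its latest date
    select client, max(date) as max_date
    from `table`
    group by client
) as latest
    on t.client = latest.client and t.date = latest.max_date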

Related

SQL update statement to sum column in one table, then add the total to a different column/table

Evening all, hoping for some pointers with an SQL Server query if possible.
I have two tables in a database, example as follows:
PostedTran

PostedTranID  AccountID  PeriodID  Value  TransactionDate
1             100        120       100    2019-01-01
2             100        120       200    2020-01-01
3             100        130       300    2021-01-01
4             101        120       400    2020-01-01
5             101        130       500    2021-01-01

PeriodValue

PeriodValueID  AccountID  PeriodID  ActualValue
10             100        120       500
11             101        120       600
I have a mismatch in the two tables, and I'm failing miserably in my attempts. From the PostedTran table, I'm trying to select all transaction lines dated before 2021-01-01, then sum the Value for each AccountID from the results. I then need to add that value to the existing ActualValue in the PeriodValue table.
So, in the above example, the ActualValue on PeriodValueID 10 will update to 800, and 11 to 1000. The PeriodID in this example is constant and will always be 120.
Thanks in advance for any help.
Since the RDBMS is not mentioned, the pseudo-SQL looks like:
with DataSum as
(
    select AccountID, PeriodID, sum(Value) as TotalValue
    from PostedTran
    where TransactionDate < '2021-01-01'
    group by AccountID, PeriodID
)
update pv set ActualValue = ActualValue + ds.TotalValue
from PeriodValue pv inner join DataSum ds
    on pv.AccountID = ds.AccountID and pv.PeriodID = ds.PeriodID
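
The UPDATE ... FROM form above is SQL Server/PostgreSQL syntax. If the target happened to be MySQL (an assumption, since the RDBMS is unspecified), the same idea would be written with the join directly in the UPDATE:

update PeriodValue pv
join (
    -- total posted value per account/period before 2021
    select AccountID, PeriodID, sum(Value) as TotalValue
    from PostedTran
    where TransactionDate < '2021-01-01'
    group by AccountID, PeriodID
) ds on pv.AccountID = ds.AccountID and pv.PeriodID = ds.PeriodID
set pv.ActualValue = pv.ActualValue + ds.TotalValue;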
The following should do what you ask. I haven't included PeriodId in the correlation as you did not specify it in your description; you can simply include it if it's required.
update pv set pv.ActualValue = pv.ActualValue + t.Value
from PeriodValue pv
cross apply (
    select Sum(value) value
    from PostedTran pt
    where pt.AccountId = pv.AccountId and pt.TransactionDate < '20210101'
) t
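
One caveat worth noting (my addition, not part of the original answer): if an account has no qualifying transactions, the aggregate subquery still returns a single row whose sum is NULL, and adding NULL would wipe out ActualValue. A guard with ISNULL avoids that:

update pv set pv.ActualValue = pv.ActualValue + isnull(t.Value, 0)
from PeriodValue pv
cross apply (
    -- NULL when no rows match, hence the ISNULL above
    select Sum(value) value
    from PostedTran pt
    where pt.AccountId = pv.AccountId and pt.TransactionDate < '20210101'
) t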

Include only transition states in SQL query

I have a table with customers and their purchase behaviour that looks as follows:
customer  shop  time
--------------------
1         5     13.30
1         5     14.33
1         10    22.17
2         3     12.15
2         1     13.30
2         1     15.55
2         3     17.29
Since I want the shifts in shop, I need the following output:
customer  shop  time
--------------------
1         5     13.30
1         10    22.17
2         3     12.15
2         1     13.30
2         3     17.29
I have tried using
ROW_NUMBER() OVER (PARTITION BY customer, shop ORDER BY time ASC) AS counter
and then keeping only the rows where counter = 1. However, this breaks down when the customer visits the same shop again later on, as with customer=2 and shop=3 in my example.
I came up with this:
WITH a AS
(
    SELECT
        customer, shop, time,
        ROW_NUMBER() OVER (PARTITION BY customer ORDER BY time ASC) AS counter
    FROM
        db
)
SELECT a1.*
FROM a a1
JOIN a AS a2
    ON (a1.customer = a2.customer AND a2.counter + 1 = a1.counter AND a2.shop <> a1.shop)
UNION
SELECT a.*
FROM a
WHERE counter = 1
However, this is very inefficient, and running it in AWS where my data is located results in an error telling me that
Query exhausted resources at this scale factor
Is there any way to make this query more efficient?
This is a gaps-and-islands problem. But the simplest solution uses lag():
select customer, shop, time
from (select t.*, lag(shop) over (partition by customer order by time) as prev_shop
from t
) t
where prev_shop is null or prev_shop <> shop;
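
To see why this works, here is what the derived prev_shop column would look like for the sample data; the only rows dropped are the ones where prev_shop equals shop, which are exactly the repeat visits:

customer  shop  time   prev_shop  kept
1         5     13.30  NULL       yes
1         5     14.33  5          no
1         10    22.17  5          yes
2         3     12.15  NULL       yes
2         1     13.30  3          yes
2         1     15.55  1          no
2         3     17.29  1          yes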

SQL select specific group from table

I have a table named trades like this:
id  trade_date  trade_price  trade_status  seller_name
1   2015-01-02  150          open          Alex
2   2015-03-04  500          close         John
3   2015-04-02  850          close         Otabek
4   2015-05-02  150          close         Alex
5   2015-06-02  100          open          Otabek
6   2015-07-02  200          open          John
I want to sum up trade_price grouped by seller_name where the last (by trade_date) trade_status was 'open'. That is:
sum_trade_price  seller_name
700              John
950              Otabek
The rows where seller_name is Alex are skipped because the last trade_status was 'close'.
Although I can get the desired output with the help of a nested select:
SELECT SUM(t1.trade_price), t1.seller_name
FROM trades t1
WHERE t1.seller_name NOT IN
    (SELECT t2.seller_name FROM trades t2
     WHERE t2.seller_name = t1.seller_name AND t2.trade_status = 'close'
     ORDER BY t2.trade_date DESC LIMIT 1)
GROUP BY t1.seller_name
But it takes more than 1 minute to execute the above query (I have approximately 100K rows).
Is there another way to handle it?
I am using PostgreSQL.
I would approach this with window functions:
SELECT SUM(t.trade_price), t.seller_name
FROM (SELECT t.*,
             FIRST_VALUE(trade_status) OVER (PARTITION BY seller_name ORDER BY trade_date DESC) as last_trade_status
      FROM trades t
     ) t
WHERE last_trade_status <> 'close'
GROUP BY t.seller_name;
This should perform reasonably well with an index on seller_name.
select
    sum(trade_price) as sum_trade_price,
    seller_name
from
    trades
    inner join
    (
        select distinct on (seller_name) seller_name, trade_status
        from trades
        order by seller_name, trade_date desc
    ) s using (seller_name)
where s.trade_status = 'open'
group by seller_name
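
Given the roughly 100K rows mentioned in the question, an index matching the per-seller ordering may help either answer. This is a sketch of one plausible PostgreSQL index, not something from the original answers:

-- supports both DISTINCT ON (seller_name) ... order by trade_date desc
-- and the FIRST_VALUE window partitioned by seller_name
create index trades_seller_date_idx on trades (seller_name, trade_date desc);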

Row number in query result

I have a query to get firms by their sales last year.
select
Name,
Sale
from Sales
order by
Sale DESC
and I get
Firm 2 | 200 000
Firm 1 | 190 000
Firm 3 | 100 000
And I would like to get the index of each row in the result. For Firm 2 I would like to get 0 (or 1), for Firm 3 2 (or 3), and so on. Is this possible? Or could I at least create some sort of autoincrement column? I can use a stored procedure if needed.
Firebird 3.0 supports row_number(), which is the better way to do this.
However for Firebird 2.5, you can get what you want with a correlated subquery:
select s.Name, s.Sale,
(select count(*) from Sales s2 where s2.sale >= s.sale) as seqnum
from Sales s
order by s.Sale DESC;
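
For reference, on Firebird 3.0 the row_number() version mentioned above would look roughly like this (a sketch; drop the "- 1" if you want a 1-based index):

select
    row_number() over (order by Sale desc) - 1 as row_index,
    Name,
    Sale
from Sales
order by Sale desc;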

Select info from table where row has max date

My table looks something like this:
group  date      cash  checks
1      1/1/2013  0     0
2      1/1/2013  0     800
1      1/3/2013  0     700
3      1/1/2013  0     600
1      1/2/2013  0     400
3      1/5/2013  0     200
-- Do not need cash; just demonstrating that the table has more information in it
I want to get each unique group where the date is max and checks is greater than 0. So the return would look something like:
group  date      checks
2      1/1/2013  800
1      1/3/2013  700
3      1/5/2013  200
Attempted code:
SELECT group,MAX(date),checks
FROM table
WHERE checks>0
GROUP BY group
ORDER BY group DESC
The problem with that, though, is it gives me all the dates and checks rather than just the row with the max date.
Using MS SQL Server 2005.
SELECT group,MAX(date) as max_date
FROM table
WHERE checks>0
GROUP BY group
That works to get the max date. Join it back to your data to get the other columns:
Select t.group, a.max_date, t.checks
from table t
inner join
    (SELECT group, MAX(date) as max_date
     FROM table
     WHERE checks > 0
     GROUP BY group) a
    on a.group = t.group and a.max_date = t.date
Inner join functions as the filter to get the max record only.
FYI, your column names are horrid; don't use reserved words for columns (group, date, table).
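
As an aside (my addition, not part of the original answer): SQL Server 2005 also supports ROW_NUMBER(), which avoids the self-join entirely. A sketch, with the reserved-word columns bracketed:

SELECT [group], [date], checks
FROM (
    SELECT [group], [date], checks,
           -- rank each group's rows, latest date first
           ROW_NUMBER() OVER (PARTITION BY [group] ORDER BY [date] DESC) AS rn
    FROM [table]
    WHERE checks > 0
) t
WHERE rn = 1;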
You can use a window MAX() like this:
SELECT
*,
max_date = MAX(date) OVER (PARTITION BY group)
FROM table
to get max dates per group alongside other data:
group  date      cash  checks  max_date
-----  --------  ----  ------  --------
1      1/1/2013  0     0       1/3/2013
2      1/1/2013  0     800     1/1/2013
1      1/3/2013  0     700     1/3/2013
3      1/1/2013  0     600     1/5/2013
1      1/2/2013  0     400     1/3/2013
3      1/5/2013  0     200     1/5/2013
Using the above output as a derived table, you can then get only rows where date matches max_date:
SELECT
group,
date,
checks
FROM (
SELECT
*,
max_date = MAX(date) OVER (PARTITION BY group)
FROM table
) AS s
WHERE date = max_date
;
to get the desired result.
Basically, this is similar to #Twelfth's suggestion but avoids a join and may thus be more efficient.
You can try the method at SQL Fiddle.
Using an IN can have a performance impact. Joining two subqueries will not have the same performance impact and can be accomplished like this:
SELECT *
FROM (SELECT msisdn
           , callid
           , Change_color
           , play_file_name
           , date_played
      FROM insert_log
      WHERE play_file_name NOT IN ('Prompt1','Conclusion_Prompt_1','silent')
     ) t1
JOIN (SELECT callid
           , MAX(date_played) AS date_played
      FROM insert_log
      GROUP BY callid
     ) t2
    ON t1.callid = t2.callid AND t1.date_played = t2.date_played
SELECT DISTINCT
    group,
    max_date = MAX(date) OVER (PARTITION BY group),
    checks
FROM table
Should work.