Trying to retrieve first rows partition by offer_id - sql

Suppose I have a table with columns
body | offer_id | created_at
I need to group the rows by offer_id, order each group by created_at, and then retrieve the first row of each group. I am currently using the following query:
select first_value(M.body) over (partition by M.offer_id order by M.created_at ASC ROWS UNBOUNDED PRECEDING),
(select created_at
from quotes_site.offers O
where O.id = M.offer_id) as crt
from quotes_site.messages M
where M.created_at between '2016-01-01' and '2016-02-01'
Although there is no error, the query runs indefinitely, so I assume something must be wrong. I am also not very familiar with the frame clause, so a more detailed explanation would be greatly appreciated.
P.S. The server runs on AWS Redshift.

I'd suggest row_number(). The derived table needs an alias, and the two created_at columns need distinct names:
SELECT * FROM (
SELECT t.offer_id, t.body, t.created_at, s.created_at AS offer_created_at,
ROW_NUMBER() OVER(PARTITION BY t.offer_id ORDER BY t.created_at) AS rnk
FROM quotes_site.messages t
INNER JOIN quotes_site.offers s
ON(s.id = t.offer_id)) ranked
WHERE rnk = 1
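To see the row_number() pattern in isolation, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for Redshift (window functions need SQLite 3.25 or newer), and the table contents are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of the messages table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (offer_id INTEGER, body TEXT, created_at TEXT);
INSERT INTO messages VALUES
  (1, 'first for offer 1',  '2016-01-02'),
  (1, 'second for offer 1', '2016-01-05'),
  (2, 'first for offer 2',  '2016-01-03'),
  (2, 'second for offer 2', '2016-01-04');
""")

# Number the rows inside each offer_id partition by created_at,
# then keep only the row numbered 1 in each partition.
first_messages = conn.execute("""
SELECT offer_id, body, created_at
FROM (
    SELECT m.*,
           ROW_NUMBER() OVER (PARTITION BY m.offer_id
                              ORDER BY m.created_at) AS rnk
    FROM messages m
) ranked
WHERE rnk = 1
ORDER BY offer_id
""").fetchall()

for row in first_messages:
    print(row)
```

Each offer_id comes back exactly once, which is the difference from first_value(): that function returns a value on every input row rather than reducing the partition to one row.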

How about using distinct on instead? Note that DISTINCT ON is a PostgreSQL extension, so it may not be available on Redshift:
select distinct on (M.offer_id) M.*, o.created_at as o_created_at
from quotes_site.messages M left join
quotes_site.offers O
ON O.id = M.offer_id
where M.created_at between '2016-01-01' and '2016-02-01'
order by M.offer_id, M.created_at;
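On the frame clause the question asks about: ROWS UNBOUNDED PRECEDING is shorthand for "from the first row of the partition up to the current row". The sketch below, again using sqlite3 with invented data rather than Redshift, shows how the frame changes what last_value() sees while first_value() stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (offer_id INTEGER, body TEXT, created_at TEXT);
INSERT INTO messages VALUES
  (1, 'a', '2016-01-01'),
  (1, 'b', '2016-01-02'),
  (1, 'c', '2016-01-03');
""")

frame_rows = conn.execute("""
SELECT body,
       -- frame = partition start .. current row
       FIRST_VALUE(body) OVER (PARTITION BY offer_id ORDER BY created_at
                               ROWS UNBOUNDED PRECEDING) AS first_body,
       -- same frame: the "last" row of the frame is the current row
       LAST_VALUE(body)  OVER (PARTITION BY offer_id ORDER BY created_at
                               ROWS UNBOUNDED PRECEDING) AS last_in_frame,
       -- frame widened to the whole partition
       LAST_VALUE(body)  OVER (PARTITION BY offer_id ORDER BY created_at
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND UNBOUNDED FOLLOWING) AS last_in_partition
FROM messages
ORDER BY created_at
""").fetchall()

for row in frame_rows:
    print(row)
```

Note that the query still returns one row per message. A window function never collapses rows, so the original first_value() query repeats the first body for every message of an offer.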

Related

SQL Server-How to avoid repetition of a column in the output

Output of my SQL Server Query is as below:
Following is my query:
SELECT
si.SupplyInvoiceID,
si.CompanyID,
si.TotalBill,
siph.BillPaidAmount,
si.TotalBill - SUM(siph.BillPaidAmount)
over( partition by si.SupplyInvoiceID order by siph.SupplyPaymentID asc) as RemainingBillAmount
from
SupplyInvoicePaymentHistory siph
left join
SupplyInvoice si
on siph.SupplyInvoiceID = si.SupplyInvoiceID
In the output column TotalBill, I want the bill amount to be shown only once for each SupplyInvoiceID, i.e.
Required Output
Your problem requires an ordering for the table. It appears to be by SupplyPaymentId (although any column can be used). To do what you want, you can use row_number() and an explicit order by in the query:
select si.SupplyInvoiceID, si.CompanyID,
(CASE WHEN ROW_NUMBER() OVER (PARTITION BY si.SupplyInvoiceID order by siph.SupplyPaymentID) = 1
THEN si.TotalBill
END) as TotalBill,
siph.BillPaidAmount,
(si.TotalBill -
SUM(siph.BillPaidAmount) over (partition by si.SupplyInvoiceID order by siph.SupplyPaymentID asc)
) as RemainingBillAmount
from SupplyInvoicePaymentHistory siph left join
SupplyInvoice si
on siph.SupplyInvoiceID = si.SupplyInvoiceID
order by si.SupplyInvoiceID, siph.SupplyPaymentID
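A small runnable sketch of the same idea, using sqlite3 in place of SQL Server and made-up invoice data (the column list is trimmed to what the example needs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SupplyInvoice (SupplyInvoiceID INTEGER, TotalBill INTEGER);
CREATE TABLE SupplyInvoicePaymentHistory (
    SupplyPaymentID INTEGER, SupplyInvoiceID INTEGER, BillPaidAmount INTEGER);
INSERT INTO SupplyInvoice VALUES (1, 100), (2, 50);
INSERT INTO SupplyInvoicePaymentHistory VALUES
  (1, 1, 30), (2, 1, 20), (3, 2, 50);
""")

invoice_rows = conn.execute("""
SELECT si.SupplyInvoiceID,
       -- show TotalBill only on the first payment row of each invoice
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY si.SupplyInvoiceID
                                    ORDER BY siph.SupplyPaymentID) = 1
            THEN si.TotalBill END AS TotalBill,
       siph.BillPaidAmount,
       -- running total of payments subtracted from the bill
       si.TotalBill - SUM(siph.BillPaidAmount)
                          OVER (PARTITION BY si.SupplyInvoiceID
                                ORDER BY siph.SupplyPaymentID) AS RemainingBillAmount
FROM SupplyInvoicePaymentHistory siph
LEFT JOIN SupplyInvoice si ON siph.SupplyInvoiceID = si.SupplyInvoiceID
ORDER BY si.SupplyInvoiceID, siph.SupplyPaymentID
""").fetchall()

for row in invoice_rows:
    print(row)
```

The second payment row of invoice 1 comes back with NULL (None in Python) in TotalBill, while RemainingBillAmount keeps decreasing.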

how to use rank/join and where together

I have used multiple inner joins in my code and added a rank, but now I want to select a particular rank. How do I use rank in a WHERE clause?
Here is my code; please help me proceed further:
select [YEAR],
[IDManufacturer],
sum([TotalPrice]),
rank() over (order by sum(totalprice) desc) as sales_rank
from [dbo].[DIM_DATE]
join [dbo].[FACT_TRANSACTIONS]
on [dbo].[FACT_TRANSACTIONS].Date = [dbo].[DIM_DATE].DATE
join [dbo].[DIM_MODEL]
on [dbo].[DIM_MODEL].IDModel=[dbo].[FACT_TRANSACTIONS].IDModel
where [YEAR] in (2009,2010)
group by IDManufacturer,[year]
order by sum([TotalPrice]) desc
Now I want to select only ranks 3 and 4. How do I do that?
You could use either a subquery or a CTE. I would suggest trying both methods and comparing the execution plans to pick whichever performs better:
Sub Query
SELECT * FROM
(select [YEAR],
[IDManufacturer],
sum([TotalPrice]) TotalPrice,
rank() over (order by sum(totalprice) desc) as sales_rank
from [dbo].[DIM_DATE]
join [dbo].[FACT_TRANSACTIONS]
on [dbo].[FACT_TRANSACTIONS].Date = [dbo].[DIM_DATE].DATE
join [dbo].[DIM_MODEL]
on [dbo].[DIM_MODEL].IDModel=[dbo].[FACT_TRANSACTIONS].IDModel
where [YEAR] in (2009,2010)
group by IDManufacturer,[year]
) as SQ
Where sales_rank = 3 or sales_rank = 4
go
Common Table Expression
; with CTE as
(select [YEAR],
[IDManufacturer],
sum([TotalPrice]) TotalPrice,
rank() over (order by sum(totalprice) desc) as sales_rank
from [dbo].[DIM_DATE]
join [dbo].[FACT_TRANSACTIONS]
on [dbo].[FACT_TRANSACTIONS].Date = [dbo].[DIM_DATE].DATE
join [dbo].[DIM_MODEL]
on [dbo].[DIM_MODEL].IDModel=[dbo].[FACT_TRANSACTIONS].IDModel
where [YEAR] in (2009,2010)
group by IDManufacturer,[year]
)
SELECT * FROM CTE WHERE sales_rank = 3 or sales_rank = 4
If you want only ranks 3 and 4, then try this. Every column of the derived table needs a name, and SQL Server does not allow ORDER BY inside a derived table without TOP or OFFSET, so the ordering moves to the outer query:
select * from (
select [YEAR],
[IDManufacturer],
sum([TotalPrice]) as TotalPrice,
rank() over (order by sum(totalprice) desc) as sales_rank
from [dbo].[DIM_DATE]
join [dbo].[FACT_TRANSACTIONS]
on [dbo].[FACT_TRANSACTIONS].Date = [dbo].[DIM_DATE].DATE
join [dbo].[DIM_MODEL]
on [dbo].[DIM_MODEL].IDModel=[dbo].[FACT_TRANSACTIONS].IDModel
where [YEAR] in (2009,2010)
group by IDManufacturer,[year]
) t where sales_rank in (3,4)
order by sales_rank
If you only want the 3rd and 4th values, and assuming no ties, then append offset/fetch to the ORDER BY of your original query:
offset 2 rows fetch first 2 rows only
The offset is 2 because it counts the rows to skip: skipping 2 rows makes the 3rd row the first one returned.
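Both approaches can be compared side by side in a quick sketch. It assumes a single hypothetical sales table instead of the three joined tables, and it runs on sqlite3, where offset/fetch is spelled LIMIT ... OFFSET. To stay portable, the aggregation is done in one CTE and the ranking in a second one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (year INTEGER, manufacturer TEXT, total_price INTEGER);
INSERT INTO sales VALUES
  (2009, 'A', 300), (2009, 'A', 200),
  (2009, 'B', 400),
  (2010, 'A', 300),
  (2010, 'B', 200),
  (2010, 'C', 100);
""")

# A window function cannot appear in WHERE, so compute the rank first
# and filter on it one level further out.
by_rank = conn.execute("""
WITH totals AS (
    SELECT year, manufacturer, SUM(total_price) AS total
    FROM sales
    GROUP BY year, manufacturer
),
ranked AS (
    SELECT year, manufacturer, total,
           RANK() OVER (ORDER BY total DESC) AS sales_rank
    FROM totals
)
SELECT year, manufacturer, total, sales_rank
FROM ranked
WHERE sales_rank IN (3, 4)
ORDER BY sales_rank
""").fetchall()

# Without ties, skipping 2 rows and taking the next 2 gives the same groups.
by_offset = conn.execute("""
SELECT year, manufacturer, SUM(total_price) AS total
FROM sales
GROUP BY year, manufacturer
ORDER BY total DESC
LIMIT 2 OFFSET 2
""").fetchall()

print(by_rank)
print(by_offset)
```

If two groups tied on the total, rank() would give them the same number and the two results could differ, which is why the offset version carries the "no ties" caveat.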

How to join multiple sub queries as a single one without 'with'?

I have a query consisting of multiple subqueries. I used 'join' because I'm not allowed to use 'with'. The subqueries have their own 'from' clauses, which is causing a problem.
I have to display two columns, each computed with its own logic. To produce the two columns I need subqueries, and each requires a 'from' clause. I'm not sure how to write the 'from' clauses so that the whole query fits together and runs. I have checked the individual queries and they all work fine.
select lead(dt) over
(partition by t1.id_user order by f.topup_date desc rows between 0
preceding and unbounded following )
from
(select *,
(max(case when f.topup_value >= 20 then f.topup_date end) over (partition
by f.id_user order by f.topup_date desc rows between 0 preceding and
unbounded following )) as dt
from topups f) as f, //(<-I think this is incorrect)
CAST(f.topup_value as float)/CAST(t1.topup_value as float) from
(SELECT t1.seq,t1.id_user,t1.topup_value,row_number()
over (partition by t1.id_user order by t1.topup_date )
as rowrank from topups t1) as t1
inner join topups f on f.id_user=t1.id_user
inner join topups t2 on t1.seq=t2.seq
You're getting a syntax error because a query can only have a single FROM clause. It's difficult to tell the outcome you're trying to achieve, but moving the first query into a subquery in the select list and using it for column f might be what you're looking for:
select
(select lead(dt) over (partition by t1.id_user order by f.topup_date desc rows between 0 preceding and unbounded following )
from (
select *,
(max(case when f.topup_value >= 20 then f.topup_date end) over (partition by f.id_user order by f.topup_date desc rows between 0 preceding and unbounded following )) as dt
from topups f
) x) as f,
CAST(f.topup_value as float)/CAST(t1.topup_value as float)
from (
SELECT t1.seq, t1.id_user, t1.topup_value, row_number() over (partition by t1.id_user order by t1.topup_date ) as rowrank
from topups t1
) as t1
inner join topups f on f.id_user=t1.id_user
inner join topups t2 on t1.seq=t2.seq
That query is really hard to read. The line you marked is indeed incorrect: you're trying to add what looks like another SELECT list after your original FROM clause, which is invalid syntax. Think of your FROM subquery as a temporary table. You couldn't say something like:
SELECT some_column
FROM a_table, some_other_column
The comma in a FROM clause is cross-join syntax, so some_other_column would need to be a table for that to even be valid.
Consider adding a CREATE TABLE statement and sample data so we can test.
You might be looking for something along the lines of this:
SELECT LEAD(temp.dt) OVER(PARTITION BY temp.id_user ORDER BY temp.topup_date DESC ROWS BETWEEN 0 PRECEDING AND UNBOUNDED FOLLOWING)
, temp.division
FROM
(
SELECT (max(CASE WHEN f.topup_value >= 20 THEN f.topup_date END) OVER(PARTITION BY f.id_user ORDER BY f.topup_date DESC ROWS BETWEEN 0 PRECEDING AND UNBOUNDED FOLLOWING )) AS dt
, f.topup_value::float / t1.topup_value::float AS division
, t1.id_user
, f.topup_date
FROM topups t1
JOIN topups f USING (id_user)
) temp
;
Just an opinion, but it's less noisy to use the :: operator for casts (this is PostgreSQL syntax). Instead of CAST(f.topup_value as float), just write f.topup_value::float.

Select most recent status for each ID and department code

I have the following table:
I want to get the most recent status for each dept_code that a CL_ID has. So the desired output would be this:
I have tried the following, but this gives me just the most recent status for each client and not for each of their dept_codes.
SELECT *
FROM [CIMSHR6_MERGED].[dbo].[C3CLSTAT] C
INNER JOIN
(SELECT CLIENT_NUMBER, MAX(STATUS_DATE) AS SDATE
FROM [CIMSHR6_MERGED].[dbo].[C3CLSTAT]
GROUP BY CLIENT_NUMBER) X
ON X.CLIENT_NUMBER = C.CLIENT_NUMBER
AND X.SDATE = C.STATUS_DATE
ORDER BY C.CLIENT_NUMBER
Any help would be much appreciated. Thanks.
A convenient method that works in SQL Server is:
select top (1) with ties cl.*
from [CIMSHR6_MERGED].[dbo].[C3CLSTAT] cl
order by row_number() over (partition by cl_id, dept_code order by status_date desc);
A method that is efficient with the right indexes in almost any database is:
select cl.*
from [CIMSHR6_MERGED].[dbo].[C3CLSTAT] cl
where cl.status_date = (select max(cl2.status_date)
from [CIMSHR6_MERGED].[dbo].[C3CLSTAT] cl2
where cl2.cl_id = cl.cl_id and cl2.dept_code = cl.dept_code
);
The right index is on (cl_id, dept_code, status_date).
I would also use ROW_NUMBER, but with a subquery:
SELECT CL_ID, Status_date, Status, Dept_code
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY CL_ID, Dept_code ORDER BY Status_date DESC) rn
FROM [CIMSHR6_MERGED].[dbo].[C3CLSTAT]
) t
WHERE rn = 1;
1) First, partition the rows by Dept_Code and CL_ID and rank the rows within each partition by Status_Date in descending order.
2) Then select the rows with rnk = 1, which gives your desired result.
SELECT Z.CL_ID,
Z.Status_Date,
Z.Status,
Z.Dept_Code
FROM
(
SELECT *,
RANK() OVER( PARTITION BY Dept_Code, CL_ID ORDER BY Status_Date DESC ) AS rnk
FROM [CIMSHR6_MERGED].[dbo].[C3CLSTAT]
) Z
WHERE Z.rnk = 1;
This would work for almost all databases
select * from c3clstat c
where exists
(select 1 from c3clstat c1
where c1.cl_id=c.cl_id
and c1.dept_code=c.dept_code
group by cl_id,dept_code
having c.status_date=max(c1.status_date)
)
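For anyone who wants to try these answers without the original table, here is a sketch that runs the correlated-subquery and ROW_NUMBER variants against the same made-up data. It uses sqlite3 rather than SQL Server, and the rows and the index name are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c3clstat (cl_id INTEGER, dept_code TEXT, status TEXT, status_date TEXT);
INSERT INTO c3clstat VALUES
  (1, 'A', 'open',    '2020-01-01'),
  (1, 'A', 'closed',  '2020-02-01'),
  (1, 'B', 'open',    '2020-01-15'),
  (2, 'A', 'pending', '2020-03-01');
-- the index suggested above for the correlated-subquery form
CREATE INDEX idx_c3clstat_latest ON c3clstat (cl_id, dept_code, status_date);
""")

# Variant 1: keep a row when its date equals the max date of its group.
latest_by_subquery = conn.execute("""
SELECT c.cl_id, c.dept_code, c.status, c.status_date
FROM c3clstat c
WHERE c.status_date = (SELECT MAX(c2.status_date)
                       FROM c3clstat c2
                       WHERE c2.cl_id = c.cl_id
                         AND c2.dept_code = c.dept_code)
ORDER BY c.cl_id, c.dept_code
""").fetchall()

# Variant 2: number rows per (cl_id, dept_code), newest first, keep number 1.
latest_by_row_number = conn.execute("""
SELECT cl_id, dept_code, status, status_date
FROM (
    SELECT c.*,
           ROW_NUMBER() OVER (PARTITION BY c.cl_id, c.dept_code
                              ORDER BY c.status_date DESC) AS rn
    FROM c3clstat c
) t
WHERE rn = 1
ORDER BY cl_id, dept_code
""").fetchall()

print(latest_by_subquery)
print(latest_by_row_number)
```

The two variants agree as long as no group has two rows sharing the latest date. With such a tie, the MAX form returns both rows while ROW_NUMBER picks one arbitrarily.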

Summing the most recent rows, grouped by the id

SELECT distinct on (prices.item_id) *
FROM prices
ORDER BY prices.item_id, prices.updated_at DESC
The above query retrieves the most recent prices. How would I get the total sum of all the current prices?
Is it possible without using a subselect?
This is trivial using a subquery:
select sum(p.price)
from (select distinct on (p.item_id) p.*
from prices p
order by p.item_id, p.updated_at desc
) p
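If you don't mind repeated rows, I think the following might work:
select distinct on (p.item_id) sum(p.price) over ()
from prices p
order by p.item_id, p.updated_at desc
Be aware, though, that window functions are evaluated before DISTINCT ON is applied, so this sum may cover every row in prices rather than only the latest ones; check the result against the subquery version. You might be able to add a limit clause to this to get what you want. By the way, I would write this as: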
select sum(p.price)
from (select p.*,
row_number() over (partition by p.item_id order by updated_at desc) as seqnum
from prices p
) p
where seqnum = 1
ROW_NUMBER() is standard SQL. The DISTINCT ON clause is specific to Postgres.
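A quick way to check the ROW_NUMBER() version is the sketch below. It runs on sqlite3, which has no DISTINCT ON, and uses a few invented price rows. It also computes the sum over all rows, to show what the answer must not return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (item_id INTEGER, price INTEGER, updated_at TEXT);
INSERT INTO prices VALUES
  (1, 10, '2020-01-01'),
  (1, 12, '2020-02-01'),
  (2,  5, '2020-01-01'),
  (2,  7, '2020-03-01');
""")

# Sum only the newest price of each item.
(latest_total,) = conn.execute("""
SELECT SUM(price)
FROM (
    SELECT p.price,
           ROW_NUMBER() OVER (PARTITION BY p.item_id
                              ORDER BY p.updated_at DESC) AS seqnum
    FROM prices p
) latest
WHERE seqnum = 1
""").fetchone()

# For contrast: the sum over every row, including outdated prices.
(all_rows_total,) = conn.execute("SELECT SUM(price) FROM prices").fetchone()

print(latest_total, all_rows_total)
```

With this data the latest prices are 12 and 7, so the first total is 19, while the unfiltered total is 34.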