SQL: Select Distinct on 1 column


I need to select distinct values but only for 1 column and the other columns need to show the latest record, i.e.:
customerID  Order Number  Order Date
00001       1000011       2017-01-01
00001       1000022       2017-01-10
00001       1000033       2017-02-01
00002       2000011       2016-12-01
00002       2000022       2017-01-01
00003       3000011       2017-03-01
I would need this to show as:
customerID  Order Number  Order Date
00001       1000033       2017-02-01
00002       2000022       2017-01-01
00003       3000011       2017-03-01
In PostgreSQL I would have used SELECT DISTINCT ON (customerID) and then ordered by Order Date desc, but this isn't possible in SQL Server.
I have tried using the Max function on Order Date, but this still returns duplicates in Customer ID when applied as below:
SELECT DISTINCT [CustomerID], [Order No], Max([Order Date])
FROM [T.ORDERS]
GROUP BY [CustomerID], [JOBNO]

You can use a JOIN too:
SELECT
A.[CustomerID], A.[Order No], A.[Order Date]
FROM [T.ORDERS] A INNER JOIN
(
-- latest order date per customer; group by CustomerID only
SELECT
[CustomerID], Max([Order Date]) AS [Order Date]
FROM [T.ORDERS]
GROUP BY [CustomerID]
) B
ON A.[CustomerID]=B.[CustomerID] AND A.[Order Date]=B.[Order Date]

You can use row_number as below:
select * from
( Select *, RowN = Row_Number() over (partition by CustomerID order by [Order date] desc) from #yourtable ) a
where a.RowN = 1
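
To check the ROW_NUMBER approach against the sample data from the question, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for SQL Server here, and the table name orders and the column names OrderNo and OrderDate are assumptions made for the example. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (CustomerID TEXT, OrderNo INTEGER, OrderDate TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("00001", 1000011, "2017-01-01"),
        ("00001", 1000022, "2017-01-10"),
        ("00001", 1000033, "2017-02-01"),
        ("00002", 2000011, "2016-12-01"),
        ("00002", 2000022, "2017-01-01"),
        ("00003", 3000011, "2017-03-01"),
    ],
)

# Number each customer's orders from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    SELECT CustomerID, OrderNo, OrderDate
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY CustomerID ORDER BY OrderDate DESC
               ) AS RowN
        FROM orders
    ) AS a
    WHERE a.RowN = 1
    ORDER BY CustomerID
    """
).fetchall()

for row in latest:
    print(row)
```

If two orders of one customer share the same latest date, ROW_NUMBER picks one of them arbitrarily unless a tie-breaker column is added to the ORDER BY.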

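You could also aggregate each column separately:
SELECT customerID, MAX(OrderNumber), MAX(OrderDate) FROM your_table GROUP BY customerID;
The GROUP BY is required here; DISTINCT alone does not collapse rows per customer. Note that this returns the latest order only if order numbers increase with the order date, because the two maxima are computed independently and can otherwise come from different rows.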

You may try TOP 1 WITH TIES together with ROW_NUMBER:
with Data as (
select '00001' customerID, 1000011 orderNumber, cast('20170101' as date) orderdate union all
select '00001' customerID, 1000022 orderNumber, '20170110' orderdate union all
select '00001' customerID, 1000033 orderNumber, '20170201' orderdate union all
select '00002' customerID, 2000011 orderNumber, '20161201' orderdate union all
select '00002' customerID, 2000022 orderNumber, '20170101' orderdate union all
select '00003' customerID, 3000011 orderNumber, '20170301' orderdate)
select top 1 with ties
CustomerID,
OrderNumber,
OrderDate
from Data
order by
ROW_NUMBER() OVER (partition by CustomerID order by orderdate desc)
result:
CustomerID orderNumber orderdate
00001 1000033 2017-02-01
00002 2000022 2017-01-01
00003 3000011 2017-03-01

If your table is not huge, you could try something like this:
SELECT [CustomerID],
SUBSTRING(Dummy, 1, CHARINDEX('*', Dummy) - 1) AS [Order Date],
SUBSTRING(Dummy, CHARINDEX('*', Dummy) + 1, LEN(Dummy) - CHARINDEX('*', Dummy)) AS [Order No]
FROM (
SELECT [CustomerID],
Max(CONVERT(varchar(8), [Order Date], 112) + '*' + CAST([Order No] AS varchar(20))) AS Dummy
FROM [T.ORDERS] GROUP BY [CustomerID]
) t
This joins the Order Date and Order No fields with a * character (which hopefully doesn't occur in either column) and picks the max value within each group. The date is formatted as yyyymmdd (style 112) so that string order matches date order. In the outer SELECT, we then split the max value on the * character to get back the two values.
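
The packing trick can be tried out with a small sketch, again using Python's sqlite3 with an in-memory database as a stand-in for SQL Server. The assumptions: dates are stored as ISO yyyy-mm-dd text (so the string maximum is also the latest date), the table is called orders, and SQLite's substr and instr replace SUBSTRING and CHARINDEX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (CustomerID TEXT, OrderNo INTEGER, OrderDate TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("00001", 1000011, "2017-01-01"),
        ("00001", 1000022, "2017-01-10"),
        ("00001", 1000033, "2017-02-01"),
        ("00002", 2000011, "2016-12-01"),
        ("00002", 2000022, "2017-01-01"),
        ("00003", 3000011, "2017-03-01"),
    ],
)

# Inner query: pack "date*orderno" into one string and take the max per customer.
# Outer query: split the winning string back into its two parts at the '*'.
rows = conn.execute(
    """
    SELECT CustomerID,
           substr(Dummy, 1, instr(Dummy, '*') - 1) AS OrderDate,
           substr(Dummy, instr(Dummy, '*') + 1)    AS OrderNo
    FROM (
        SELECT CustomerID,
               MAX(OrderDate || '*' || OrderNo) AS Dummy
        FROM orders
        GROUP BY CustomerID
    ) AS t
    ORDER BY CustomerID
    """
).fetchall()

for row in rows:
    print(row)
```

Both values come back as text, so the order number would need a cast if a numeric type is required.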

Related

SQL - How to count number of distinct values (payments), after sum of rows where they have another column value (Due Date) in common

My 'deals_payments' table is:
Due Date   Payment   ID
1-Mar-19   1,000.00  123
1-Apr-19   1,000.00  123
1-May-19   1,000.00  123
1-Jun-19   1,000.00  123
1-Jul-19   1,000.00  123
1-Aug-19   1,000.00  123
1-Jun-19   500.00    456
1-Jul-19   500.00    456
1-Aug-19   500.00    456
I have the SQL code:
select
count(*), payment
from (select deals_payments.*,
(row_number() over (order by due_date) -
row_number() over (partition by payment order by due_date)
) as grp
from deals_payments
where id = 123
) deals_payments
group by grp, payment
order by grp
which gives me what I want - the number of payments on each distinct amount - (here I only asked for ID 123):
COUNT(*) PAYMENT
6 1000.00
But now I need the sum of payments of the two ID's (123 and 456), where the due dates are the same, and count the number of payments on each distinct amount, as:
COUNT(*) PAYMENT
3 1000.00
3 1500.00
I tried the below but it gives me the 'missing right parenthesis' error. What is wrong??
select
count(*),
(select
sum(total) total
from (select distinct
due_date,
(select
sum(payment)
from deals_payments
where (due_date = a.due_date)) as total
from deals_payments a
where a.id in (123, 456)
and payment > 0)
group by due_date
order by due_date) b
from (select deals_payments.*,
(row_number() over (order by due_date) -
row_number() over (partition by payment order by due_date)
) as grp
from deals_payments
where id = 123
) deals_payments
group by grp, payment
order by grp

Taking your earlier comments into consideration, I agree that the SQL can be simplified to get the intended result. My understanding is that the expected output is the frequency of the total payment of a subset of IDs on any given date.
select count(*) as PaymentFrequency, TotalPaidOnDueDate from
(
select due_date, sum(payment) as TotalPaidOnDueDate from #deals_payments
where ID in (123, 456)
group by due_date
) a
group by a.TotalPaidOnDueDate
Here is a sql fiddle I used to verify: http://sqlfiddle.com/#!18/6b04f/1
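
The two-level aggregation can also be reproduced locally. The following is a sketch only: it loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 (a stand-in for the asker's database), with the due dates rewritten as ISO text, which is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals_payments (due_date TEXT, payment REAL, id INTEGER)")
conn.executemany(
    "INSERT INTO deals_payments VALUES (?, ?, ?)",
    [
        ("2019-03-01", 1000.0, 123),
        ("2019-04-01", 1000.0, 123),
        ("2019-05-01", 1000.0, 123),
        ("2019-06-01", 1000.0, 123),
        ("2019-07-01", 1000.0, 123),
        ("2019-08-01", 1000.0, 123),
        ("2019-06-01", 500.0, 456),
        ("2019-07-01", 500.0, 456),
        ("2019-08-01", 500.0, 456),
    ],
)

# Step 1 (inner query): total paid per due date across both IDs.
# Step 2 (outer query): how many due dates share each total.
freq = conn.execute(
    """
    SELECT COUNT(*) AS PaymentFrequency, TotalPaidOnDueDate
    FROM (
        SELECT due_date, SUM(payment) AS TotalPaidOnDueDate
        FROM deals_payments
        WHERE id IN (123, 456)
        GROUP BY due_date
    ) AS a
    GROUP BY a.TotalPaidOnDueDate
    ORDER BY a.TotalPaidOnDueDate
    """
).fetchall()

for row in freq:
    print(row)
```

March to May carry only the 1000.00 payment, while June to August carry 1000.00 plus 500.00, which gives the two groups of three asked for in the question.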

This seems really strange. I don't understand why your logic is so complicated.
How about this?
select id, count(*), max(payment)
from (select dp.*,
count(*) over (partition by due_date) as cnt
from deals_payments dp
where dp.id in (123, 456)
) dp
where cnt = 2
group by id;

An interesting question. Could this do the trick???
select payment, count(*)
from deals_payments
where due_date in
(select due_date
from deals_payments
group by due_date
having count(*) > 1)
group by payment;
You can add a filter by id if you want, of course.

How to create a rolling total field?

I'm trying to create a rolling total of the number of orders placed by a customer within a specific time period, ordered by date.
I have tried to use the partition function but the below query doesn't yield the correct results. Any help would be appreciated
select
CustomerID
, OrderID
, COUNT(OrderID) OVER (PARTITION BY CustomerID ORDER BY OrderDate) RunningOrderCount
from #existingtable
I want the results to be a table of all the customer ID's, all their corresponding order ID's and then a field with the order count eg...
CustomerID OrderID OrderCount
1234 5675 1
1234 5676 2
1234 5677 3
1234 5678 4
1234 5679 5

I think what you are looking for is ROW_NUMBER():
SELECT
CustomerID
, OrderID
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate) RunningOrderCount
from #existingtable
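
One plausible reason the COUNT version misbehaves is that, with an ORDER BY and no explicit frame, the window covers all rows that tie on OrderDate, so orders placed on the same date all receive the same count. The sketch below illustrates this in an in-memory SQLite database through Python's sqlite3 (SQLite uses the same default frame); the table name and the order dates are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE existingtable (CustomerID INTEGER, OrderID INTEGER, OrderDate TEXT)"
)
# Hypothetical data: the first two orders share a date.
conn.executemany(
    "INSERT INTO existingtable VALUES (?, ?, ?)",
    [
        (1234, 5675, "2019-01-01"),
        (1234, 5676, "2019-01-01"),
        (1234, 5677, "2019-01-02"),
    ],
)

counts = conn.execute(
    """
    SELECT OrderID,
           -- default frame includes every row tied on OrderDate
           COUNT(OrderID) OVER (
               PARTITION BY CustomerID ORDER BY OrderDate
           ) AS CountOver,
           -- ROW_NUMBER always increments; OrderID breaks ties deterministically
           ROW_NUMBER() OVER (
               PARTITION BY CustomerID ORDER BY OrderDate, OrderID
           ) AS RunningOrderCount
    FROM existingtable
    ORDER BY OrderID
    """
).fetchall()

for row in counts:
    print(row)
```

The COUNT column shows 2, 2, 3 while ROW_NUMBER shows 1, 2, 3, which matches the output requested in the question.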

This returns each customer's order count per calendar year:
select
CustomerID
, OrderID
, count(*) OVER
(PARTITION BY CustomerID, DATEPART(yy, orderDate)) as year_total
from #customtable
Adding ORDER BY orderDate to the OVER clause, or using row_number() with that ordering, gives a running count within each year instead.

SQL - "not contained in either an aggregate function or the GROUP BY clause."

Using SQL Server 2016. I have a table:
Product Qty OrderDate
--------------------------
Toys 100 2018-10-01
Toys 100 2018-10-01
Books 30 2018-10-01
Toys 150 2018-10-02
Toys 50 2018-10-02
Toys 20 2018-10-02
Toys 110 2018-10-03
Toys 90 2018-10-04
Toys 200 2018-10-05
Toys 100 2018-10-05
Toys 30 2018-10-08
Toys 50 2018-10-09
and I want to calculate the average quantity per product, for the last 5 days. I am close to this with this query:
SELECT
Product,
RowNumber,
OrderDate,
AVG(TotalQty) OVER (ORDER BY RowNumber DESC ROWS 5 PRECEDING) as RollingAvg
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY Product ORDER BY orderDate) AS RowNumber, Product, OrderDate, sum(Qty) as TotalQty
FROM Tbl
GROUP BY Product, OrderDate
) x
GROUP BY Product, RowNumber, OrderDate
The inner query works correctly, giving me the total per product/date pair. However my outer query reports a problem:
Column 'x.TotalQty' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
There's obviously something I'm doing wrong with my OVER clause, because when I remove that I get a valid result.
Syntactically valid query (that does the wrong thing):
SELECT
Product,
RowNumber,
OrderDate,
AVG(TotalQty) as RollingAvg
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY Product ORDER BY orderDate) AS RowNumber, Product, OrderDate, sum(Qty) as TotalQty
FROM Tbl
GROUP BY Product, OrderDate
) x
GROUP BY Product, RowNumber, OrderDate
Any help/pointers would be much appreciated please - I'm close but can't cross this final hurdle!

I think you want:
SELECT Product, RowNumber, OrderDate,
AVG(TotalQty) OVER (ORDER BY RowNumber DESC ROWS 5 PRECEDING) as RollingAvg
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY Product ORDER BY orderDate) AS RowNumber,
Product, OrderDate, sum(Qty) as TotalQty
FROM Tbl
GROUP BY Product, OrderDate
) x;
That is, the outer aggregation is unnecessary because AVG() is being used as a window function, not an aggregation function.
You should be able to do this without a subquery:
SELECT ROW_NUMBER() OVER (PARTITION BY Product ORDER BY orderDate) AS RowNumber,
Product, OrderDate, sum(Qty) as TotalQty,
AVG(SUM(Qty)) OVER (PARTITION BY Product ORDER BY orderDate ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) as avg_5
FROM Tbl
GROUP BY Product, OrderDate;
Note that this interprets "last five days" as the current day plus the preceding four days. Your version has six days for the average.
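
As a rough check of the five-row window, this sketch loads the sample table into an in-memory SQLite database with Python's sqlite3 and computes the rolling average over daily totals. It uses a CTE for the daily totals rather than the nested AVG(SUM(...)) form, purely as a choice for the example, and the table name tbl is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Product TEXT, Qty INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("Toys", 100, "2018-10-01"), ("Toys", 100, "2018-10-01"),
        ("Books", 30, "2018-10-01"),
        ("Toys", 150, "2018-10-02"), ("Toys", 50, "2018-10-02"),
        ("Toys", 20, "2018-10-02"),
        ("Toys", 110, "2018-10-03"), ("Toys", 90, "2018-10-04"),
        ("Toys", 200, "2018-10-05"), ("Toys", 100, "2018-10-05"),
        ("Toys", 30, "2018-10-08"), ("Toys", 50, "2018-10-09"),
    ],
)

result = conn.execute(
    """
    WITH daily AS (
        -- one row per product and date
        SELECT Product, OrderDate, SUM(Qty) AS TotalQty
        FROM tbl
        GROUP BY Product, OrderDate
    )
    SELECT Product, OrderDate, TotalQty,
           -- current row plus the four preceding rows of the same product
           AVG(TotalQty) OVER (
               PARTITION BY Product ORDER BY OrderDate
               ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
           ) AS avg_5
    FROM daily
    ORDER BY Product, OrderDate
    """
).fetchall()

# Map (product, date) to the rolling average for easy lookup.
avg_5 = {(product, day): avg for product, day, _total, avg in result}
for row in result:
    print(row)
```

The window counts rows, not calendar days: the value for 2018-10-09 averages the five most recent dates that have orders, reaching back to 2018-10-03 because no rows exist for 10-06 and 10-07.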

Fetch last record for each Row number

I have written a query like this:
select Customerid,orderDate, OrderNumber,
DENSE_RANK() OVER (PARTITION BY Customerid ORDER BY orderDate) "rank"
from [order]
and this produces a result.
From it I want to retrieve only the latest purchase of each customer, like this:
1 2014-04-09 00:00:00.000 543141 6
2 2014-03-04 00:00:00.000 543056 4
3 2014-01-28 00:00:00.000 542986 7
How can I achieve this with a SQL query?

Use a subquery:
select o.*
from (select Customerid,orderDate, OrderNumber,
DENSE_RANK() OVER (PARTITION BY Customerid ORDER BY orderDate DESC) as seqnum
from [order] o
) o
where seqnum = 1;

Finding the interval between dates in SQL Server

I have a table with more than 5 million rows of sales transactions. I would like to find the sum of the date intervals between each customer's three most recent purchases.
Suppose my table looks like this:
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result that I'm looking for might look like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve the 3 most recent transactions of each customer (based on ServiceStartDate) and then calculate the interval between the startDate and ExpiryDate of his/her transactions.

You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, t1.ServiceExpiryDate, t2.ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But as only two differences are summed, that does not seem correct to me either...
Additional logic must be applied based on your rules regarding:
customer has less than 3 purchases
overlapping intervals
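
To see what the LEAD variant produces on the sample rows, here is a sketch that runs it in an in-memory SQLite database via Python's sqlite3. SQLite has no DATEDIFF, so the day difference is computed with julianday, which is a substitution made for this example; the table name tab follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (CustomerID TEXT, ProductID TEXT, "
    "ServiceStartDate TEXT, ServiceExpiryDate TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("A", "X1", "2010-01-01", "2010-06-01"),
        ("A", "X2", "2010-08-12", "2010-12-30"),
        ("B", "X4", "2011-10-01", "2012-01-15"),
        ("B", "X3", "2012-04-01", "2012-06-01"),
        ("B", "X7", "2012-08-01", "2013-10-01"),
        ("A", "X5", "2013-01-01", "2015-06-01"),
    ],
)

gaps = conn.execute(
    """
    WITH cte AS (
        SELECT CustomerID, ServiceStartDate,
               ROW_NUMBER() OVER (
                   PARTITION BY CustomerID ORDER BY ServiceStartDate DESC
               ) AS rn,
               -- in descending order, the "next" row is the earlier purchase
               LEAD(ServiceExpiryDate) OVER (
                   PARTITION BY CustomerID ORDER BY ServiceStartDate DESC
               ) AS prevEnd
        FROM tab
    )
    SELECT CustomerID,
           -- NULL gaps (no earlier purchase) are ignored by SUM
           SUM(CAST(julianday(ServiceStartDate) - julianday(prevEnd) AS INTEGER))
               AS IntervalDays
    FROM cte
    WHERE rn <= 3
    GROUP BY CustomerID
    ORDER BY CustomerID
    """
).fetchall()

for row in gaps:
    print(row)
```

On this data the sums come out as 805 for A and 138 for B, each 3 more than the 802 and 135 in the question, which is consistent with the remark about subtracting the number of purchases.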

Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from table t
) t
where seqnum <= 3
group by customerId;

Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t )dt
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138
