Grouping records on consecutive dates - sql

I have the following table in Postgres:
order_dtls
Order_id Order_date Customer_name
-------------------------------------
1 11/09/17 Xyz
2 15/09/17 Lmn
3 12/09/17 Xyz
4 18/09/17 Abc
5 15/09/17 Xyz
6 25/09/17 Lmn
7 19/09/17 Abc
I want to retrieve the customers who have placed orders on 2 consecutive days.
In the above case, customers Xyz and Abc should be returned by the query.

There are many ways to do this. An EXISTS semi-join followed by DISTINCT or GROUP BY should be among the fastest.
Postgres syntax:
SELECT DISTINCT customer_name
FROM order_dtls o
WHERE EXISTS (
SELECT 1 FROM order_dtls
WHERE customer_name = o.customer_name
AND order_date = o.order_date + 1 -- simple syntax for data type "date" in Postgres!
);
If the table is big, be sure to have an index on (customer_name, order_date) to make it fast - index items in this order.
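To try the EXISTS semi-join without a Postgres server, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. Two things are assumptions of the sketch, not part of the answer: the dates are stored as ISO text (yyyy-mm-dd), and SQLite's date(x, '+1 day') stands in for the Postgres expression order_date + 1. The index name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_dtls ("
    "order_id INTEGER PRIMARY KEY, order_date TEXT, customer_name TEXT)"
)
# Sample rows from the question, with dates rewritten as ISO text.
conn.executemany(
    "INSERT INTO order_dtls VALUES (?, ?, ?)",
    [
        (1, "2017-09-11", "Xyz"),
        (2, "2017-09-15", "Lmn"),
        (3, "2017-09-12", "Xyz"),
        (4, "2017-09-18", "Abc"),
        (5, "2017-09-15", "Xyz"),
        (6, "2017-09-25", "Lmn"),
        (7, "2017-09-19", "Abc"),
    ],
)
# Index columns in the order recommended above (index name is hypothetical).
conn.execute(
    "CREATE INDEX order_dtls_cust_date ON order_dtls (customer_name, order_date)"
)

query = """
SELECT DISTINCT customer_name
FROM order_dtls AS o
WHERE EXISTS (
    SELECT 1
    FROM order_dtls AS n
    WHERE n.customer_name = o.customer_name
      AND n.order_date = date(o.order_date, '+1 day')  -- SQLite spelling of "+ 1 day"
)
ORDER BY customer_name
"""
consecutive_customers = [row[0] for row in conn.execute(query)]
print(consecutive_customers)  # ['Abc', 'Xyz']
```

Lmn is not returned because its two orders are ten days apart.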
To clarify, since Oto happened to post almost the same solution a bit faster:
DISTINCT is an SQL construct, a syntax element, not a function. Do not use parentheses like DISTINCT (customer_name). That would be short for DISTINCT ROW(customer_name) - a row constructor unrelated to DISTINCT - and is just noise for the simple case with a single expression, because Postgres removes the pointless row wrapper for a single element automatically. But if you wrap more than one expression like that, you get an actual row type - an anonymous record, since no row type is given. Most certainly not what you want.
What is a row constructor used for?
Also, don't confuse DISTINCT with DISTINCT ON (expr, ...). See:
Select first row in each GROUP BY group?

Try something like...
SELECT `order_dtls`.*
FROM `order_dtls`
INNER JOIN `order_dtls` AS mirror
ON `order_dtls`.`Order_id` <> `mirror`.`Order_id`
AND `order_dtls`.`Customer_name` = `mirror`.`Customer_name`
AND DATEDIFF(`order_dtls`.`Order_date`, `mirror`.`Order_date`) = 1

The way I would do it is to join the table with itself on the next date and on Customer_name.
This way you can ensure that the same customer placed an order on 2 consecutive days.
For MySQL:
SELECT distinct *
FROM order_dtls t1
INNER JOIN order_dtls t2 on
t1.Order_date = DATE_ADD(t2.Order_date, INTERVAL 1 DAY) and
t1.Customer_name = t2.Customer_name
You should also select the result with the DISTINCT keyword to ensure the same customer is not displayed more than once.

For postgresql:
select distinct(Customer_name) from your_table
where exists
(select 1 from your_table t1
where
Customer_name = your_table.Customer_name and Order_date = your_table.Order_date+1 )
Same for MySQL, just instead of your_table.Order_date+1 use: DATE_ADD(your_table.Order_date , INTERVAL 1 DAY)

This should work:
SELECT A.customer_name
FROM order_dtls A
INNER JOIN (SELECT customer_name, order_date FROM order_dtls) as B
ON(A.customer_name = B.customer_name and Datediff(B.Order_date, A.Order_date) =1)
group by A.customer_name

Related

Find Last Purchase Order For Each Part

I need to find the last P.O. for parts purchased from Vendors.
I was trying to come up with a way to do this using a query I found that allowed me to find
the max Creation date for a group of Quotes linked to an Opportunity:
SELECT
t1.[quoteid]
,t1.[OpportunityId]
,t1.[Name]
FROM
[Quote] t1
WHERE
t1.[CreatedOn] = (SELECT MAX(t2.[CreatedOn])
FROM [Quote] t2
WHERE t2.[OpportunityId] = t1.[OpportunityId])
In the case of Purchase Orders, though, I have a header table and a line item table.
So, I need to include info from both:
SELECT
PURCHASE_ORDER.ORDER_DATE
,PURC_ORDER_LINE.PURC_ORDER_ID
,PURC_ORDER_LINE.PART_ID
,PURC_ORDER_LINE.UNIT_PRICE
,PURC_ORDER_LINE.USER_ORDER_QTY
FROM
PURCHASE_ORDER,
PURC_ORDER_LINE
WHERE
PURCHASE_ORDER.ID=
PURC_ORDER_LINE.PURC_ORDER_ID
If the ORDER_DATE from the header were available in the PURC_ORDER_LINE table I thought
this could be done like so:
SELECT
PURC_ORDER_LINE.ORDER_DATE
,PURC_ORDER_LINE.PURC_ORDER_ID
,PURC_ORDER_LINE.PART_ID
,PURC_ORDER_LINE.UNIT_PRICE
,PURC_ORDER_LINE.USER_ORDER_QTY
FROM
PURC_ORDER_LINE T1
WHERE T1.ORDER_DATE=(SELECT MAX(T2.ORDER_DATE)
FROM PURC_ORDER_LINE T2
WHERE T2.PURC_ORDER_ID=T1.PURC_ORDER_ID)
But I'm not sure that's correct and, in any case, there are 2 things:
The ORDER_DATE is in the Header table, not in the line table
I need the last P.O. created for each of the Parts (PART_ID)
So:
PART_A and PART_B, as an example, may appear on several P.O.s
Part     Order Date   P.O. #
-----------------------------
PART_A   2020-08-17   PO12345
PART_A   2020-11-21   PO23456
PART_A   2021-07-08   PO29986
PART_B   2019-11-30   PO00861
PART_B   2021-08-30   PO30001
The result set would be (including the other fields from above):
ORDER_DATE   PURC_ORDER_ID   PART_ID   UNIT_PRICE   ORDER_QTY
--------------------------------------------------------------
2021-07-08   PO29986         PART_A    321.00       12
2021-08-30   PO30001         PART_B    426.30       8
I need a query that will give me such a result set.
You can use row-numbering for this. Just place the whole join inside a subquery (derived table), add a row-number, then filter on the outside.
SELECT *
FROM (
SELECT
pol.PART_ID,
po.ORDER_DATE,
pol.PURC_ORDER_ID,
pol.UNIT_PRICE,
pol.USER_ORDER_QTY,
rn = ROW_NUMBER() OVER (PARTITION BY pol.PART_ID ORDER BY po.ORDER_DATE DESC)
FROM PURCHASE_ORDER po
JOIN PURC_ORDER_LINE pol ON po.ID = pol.PURC_ORDER_ID
) po
WHERE po.rn = 1;
Note the use of proper join syntax, as well as table aliases.
You can use a window function:
select * from (
select * , row_number() over (partition by PART_ID order by ORDER_DATE desc) rn
from tablename
) t where t.rn = 1
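As a runnable illustration of the row-numbering approach, here is a minimal sketch using Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first version with window functions). The table layout is reduced to the columns named in the question, and the prices and quantities of the older orders are made up for the example; only the two latest rows come from the expected result above. It uses the portable AS rn alias instead of the T-SQL rn = ... form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PURCHASE_ORDER (ID TEXT PRIMARY KEY, ORDER_DATE TEXT);
CREATE TABLE PURC_ORDER_LINE (
    PURC_ORDER_ID TEXT, PART_ID TEXT, UNIT_PRICE REAL, USER_ORDER_QTY INTEGER
);
INSERT INTO PURCHASE_ORDER VALUES
    ('PO12345', '2020-08-17'), ('PO23456', '2020-11-21'),
    ('PO29986', '2021-07-08'), ('PO00861', '2019-11-30'),
    ('PO30001', '2021-08-30');
-- Values for the three older orders are invented for illustration.
INSERT INTO PURC_ORDER_LINE VALUES
    ('PO12345', 'PART_A', 300.00, 5),
    ('PO23456', 'PART_A', 310.00, 7),
    ('PO29986', 'PART_A', 321.00, 12),
    ('PO00861', 'PART_B', 400.00, 3),
    ('PO30001', 'PART_B', 426.30, 8);
""")

query = """
SELECT PART_ID, ORDER_DATE, PURC_ORDER_ID, UNIT_PRICE, USER_ORDER_QTY
FROM (
    SELECT pol.PART_ID        AS PART_ID,
           po.ORDER_DATE      AS ORDER_DATE,
           pol.PURC_ORDER_ID  AS PURC_ORDER_ID,
           pol.UNIT_PRICE     AS UNIT_PRICE,
           pol.USER_ORDER_QTY AS USER_ORDER_QTY,
           -- Number each part's lines from newest to oldest order date.
           ROW_NUMBER() OVER (
               PARTITION BY pol.PART_ID ORDER BY po.ORDER_DATE DESC
           ) AS rn
    FROM PURCHASE_ORDER AS po
    JOIN PURC_ORDER_LINE AS pol ON po.ID = pol.PURC_ORDER_ID
) AS ranked
WHERE rn = 1
ORDER BY PART_ID
"""
latest_orders = conn.execute(query).fetchall()
for row in latest_orders:
    print(row)
```

If two orders for the same part can share an ORDER_DATE, ROW_NUMBER picks one of them arbitrarily unless a tie-breaker column is added to the ORDER BY.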

Return a NULL value if Date not in CTE

I have a query that counts the number of records imported for every day up to the current date. The only problem is that a count is only returned for days on which records were imported; days without imports are missing from the result.
I have created a CTE with one column in MSSQL that lists dates in a certain range, e.g. 2019-01-01 - today.
The query that I've currently got is like this:
SELECT TableName, DateRecordImported, COUNT(*) AS ImportedRecords
FROM Table
WHERE DateRecordImported IN (SELECT * FROM DateRange_CTE)
GROUP BY DateRecordImported
I get the results fine for the dates that exist in the table for example:
TableName DateRecordImported ImportedRecords
______________________________________________
Example 2019-01-01 165
Example 2019-01-02 981
Example 2019-01-04 34
Example 2019-01-07 385
....
But I need a '0' count returned if the date from the CTE is not in the Table. Is there a better alternative for returning a 0 count, or does my method need altering slightly?
You can use a LEFT JOIN:
SELECT C.Date, COUNT(t.DateRecordImported) AS ImportedRecords
FROM DateRange_CTE C LEFT JOIN
table t
ON t.DateRecordImported = C.Date -- This may differ; use the actual column name instead
GROUP BY C.Date; -- This may differ; use the actual column name instead
Move the CTE from the subquery to the FROM clause:
SELECT T.TableName,
DT.{CTEDateColumn} AS DateRecordImported,
COUNT(T.{TableIDColumn}) AS ImportedRecords
FROM DateRange_CTE DT
LEFT JOIN [Table] T ON DT.{CTEDateColumn} = T.DateRecordImported
GROUP BY T.TableName, DT.{CTEDateColumn}; -- every non-aggregated column must be grouped
You'll need to replace the values in braces ({}).
You can try this:
SELECT TableName, DateRecordImported,
case when DateRecordImported is null
then 0
else count(*) end AS ImportedRecords
FROM Table full join DateRange_CTE
on Table.DateRecordImported = DateRange_CTE.ImportedDate
group by DateRecordImported,ImportedDate
(ImportedDate is the name of the CTE's column.)
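The LEFT JOIN idea from the answers above can be tried with a minimal sketch in Python's sqlite3 module. Everything in it is an assumption made for illustration: the table name imports, the CTE column name d, the five-day range, and the sample rows. The recursive CTE is written in SQLite syntax, not T-SQL. The key point it shows is counting a column from the right-hand table, so that dates without a match yield 0 and not 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imports (id INTEGER PRIMARY KEY, DateRecordImported TEXT)")
# Hypothetical sample data: nothing was imported on 2019-01-03 or 2019-01-05.
conn.executemany(
    "INSERT INTO imports (DateRecordImported) VALUES (?)",
    [("2019-01-01",), ("2019-01-01",), ("2019-01-02",),
     ("2019-01-04",), ("2019-01-04",), ("2019-01-04",)],
)

query = """
WITH RECURSIVE DateRange_CTE(d) AS (
    SELECT '2019-01-01'
    UNION ALL
    SELECT date(d, '+1 day') FROM DateRange_CTE WHERE d < '2019-01-05'
)
SELECT c.d AS DateRecordImported,
       COUNT(t.DateRecordImported) AS ImportedRecords  -- NULLs are not counted
FROM DateRange_CTE AS c
LEFT JOIN imports AS t ON t.DateRecordImported = c.d
GROUP BY c.d
ORDER BY c.d
"""
counts_per_day = conn.execute(query).fetchall()
for day, imported in counts_per_day:
    print(day, imported)
```

Using COUNT(*) here would report 1 for the empty days, because the LEFT JOIN still produces one NULL-extended row per unmatched date.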

Group table by custom column

I have a table transaction_transaction with columns:
id, status, total_amount, date_made, transaction_type
The status can be: Active, Paid, Trashed, Renewed, Void
So what I want to do is filter by date and status. But since there are sometimes no records with status Renewed or Trashed, I get inconsistent data: grouping by status returns only Active and Paid (notice that Renewed and Trashed are missing). I want it to always return something like:
-----------------------------------
Active | 121 | 2017-08-09
Paid | 122 | 2017-08-19
Trashed | 123 | 2017-08-20
Renewed | 123 | 2017-08-20
The sql query i use:
SELECT
ST.type,
COALESCE(SUM(TR.total_amount), 0) AS amount
FROM sms_admin_status ST
LEFT JOIN transaction_transaction TR ON TR.status = ST.type
WHERE TR.store_id = 21 AND TR.transaction_type = 'Layaway' AND TR.status != 'Void'
AND TR.date_made >= '2018-02-01' AND TR.date_made <= '2018-02-26'
GROUP BY ST.type
Edit: I created a table sms_admin_status since you said it's bad not having a table, and in the future I might have new statuses. I also changed the query to fit my needs.
Use a VALUES list in a subquery and LEFT JOIN your transaction table to it. You may need to wrap your sums in COALESCE to have them return 0.
https://www.postgresql.org/docs/10/static/queries-values.html
One possible solution (not a very nice one) is the following:
select statuses.s, date_made, coalesce(SUM(amount), 0)
from (values('active'),('inactive'),('deleted')) statuses(s)
left join transactions t on statuses.s = t.status and
date_made >= '2017-08-08'
group by statuses.s, date_made
I assume that you forgot to add date_made to the GROUP BY, so I added it there. As you can see, the possible values are hardcoded in the SQL. Another, much cleaner solution is to create a table with the possible status values and use it in place of my statuses list.
Use SELECT ... FROM (VALUES) with restriction from the transaction table:
select * from (values('active', 0),('inactive', 0),('deleted', 0)) as statuses
where column1 not in (select status from transactions)
union select status, sum(amount) from transactions group by status
Add the date column as need be, I assume it's a static value
The multiple where statements will limit the rows selected unless they are in a sub-query. May I suggest something like the following?
SELECT ST.type, COALESCE((SELECT SUM(TR.total_amount)
FROM transaction_transaction TR
WHERE TR.status = ST.type AND TR.store_id = 21 AND TR.transaction_type = 'Layaway' AND TR.status != 'Void'
AND TR.date_made >= '2018-02-01' AND TR.date_made <= '2018-02-26'), 0) AS amount
FROM sms_admin_status ST
GROUP BY ST.type
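The reason the question's query loses statuses is that its WHERE clause tests TR columns, which are NULL for statuses without transactions, so the LEFT JOIN effectively becomes an inner join. Besides the subquery shown above, another option is to move those conditions into the ON clause. The following is a minimal sketch of that variant using Python's sqlite3 module; the sample rows and integer amounts are invented for illustration, and the column set is reduced to what the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sms_admin_status (type TEXT PRIMARY KEY);
CREATE TABLE transaction_transaction (
    id INTEGER PRIMARY KEY, store_id INTEGER, status TEXT,
    total_amount INTEGER, date_made TEXT, transaction_type TEXT
);
INSERT INTO sms_admin_status VALUES ('Active'), ('Paid'), ('Trashed'), ('Renewed');
-- Hypothetical transactions; only the first three match every filter.
INSERT INTO transaction_transaction
    (store_id, status, total_amount, date_made, transaction_type) VALUES
    (21, 'Active', 100, '2018-02-05', 'Layaway'),
    (21, 'Active',  21, '2018-02-10', 'Layaway'),
    (21, 'Paid',   122, '2018-02-19', 'Layaway'),
    (21, 'Active', 999, '2018-03-01', 'Layaway'),  -- outside the date range
    (22, 'Paid',    50, '2018-02-12', 'Layaway'),  -- other store
    (21, 'Void',    77, '2018-02-03', 'Layaway');  -- status not listed
""")

filters = """
    TR.store_id = 21 AND TR.transaction_type = 'Layaway'
    AND TR.date_made >= '2018-02-01' AND TR.date_made <= '2018-02-26'
"""

# Filters in WHERE: statuses without matching rows disappear.
where_version = conn.execute(f"""
    SELECT ST.type, COALESCE(SUM(TR.total_amount), 0) AS amount
    FROM sms_admin_status ST
    LEFT JOIN transaction_transaction TR ON TR.status = ST.type
    WHERE {filters}
    GROUP BY ST.type ORDER BY ST.type
""").fetchall()

# Filters in ON: every status is kept, with 0 where nothing matched.
on_version = conn.execute(f"""
    SELECT ST.type, COALESCE(SUM(TR.total_amount), 0) AS amount
    FROM sms_admin_status ST
    LEFT JOIN transaction_transaction TR ON TR.status = ST.type AND {filters}
    GROUP BY ST.type ORDER BY ST.type
""").fetchall()

print(where_version)
print(on_version)
```

Since Void is not in the status table in this sketch, the explicit TR.status != 'Void' test is not needed; keep it if Void is one of the listed statuses.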

Get smallest date for each element in access query

So I have a table containing different elements and dates.
It basically looks like this:
actieElement beginDatum
1 1/01/2010
1 1/01/2010
1 10/01/2010
2 1/02/2010
2 3/02/2010
What I now need is the smallest date for every actieElement.
I've found a solution using a simple GROUP BY statement, but that way the query loses its scope and you can't change anything anymore.
Without the GROUP BY statement I get multiple dates for every actieElement because certain dates are the same.
I thought of something like this, but it does not work either, as it would give the subquery more than 1 record:
SELECT s1.actieElement, s1.begindatum
FROM tblActieElementLink AS s1
WHERE (((s1.actieElement)=(SELECT TOP 1 (s2.actieElement)
FROM tblActieElementLink s2
WHERE s1.actieElement = s2.actieElement
ORDER BY s2.begindatum ASC)));
Try this
SELECT s1.actieElement, s1.begindatum
FROM tblActieElementLink AS s1
WHERE s1.begindatum =(SELECT MIN(s2.begindatum)
FROM tblActieElementLink s2
WHERE s1.actieElement = s2.actieElement
);
SELECT DISTINCT T1.actieElement, T1.beginDatum
FROM tblActieElementLink AS T1
INNER JOIN (
SELECT T2.actieElement,
MIN(T2.beginDatum) AS smallest_beginDatum
FROM tblActieElementLink AS T2
GROUP
BY T2.actieElement
) AS DT1
ON T1.actieElement = DT1.actieElement
AND T1.beginDatum = DT1.smallest_beginDatum;
Add a DISTINCT clause to your SELECT.
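To see why DISTINCT is needed, here is a minimal sketch that runs the MIN subquery with and without it, using Python's sqlite3 module in place of Access. The sample rows are those from the question, with the dates rewritten as ISO text (an assumption of the sketch so that text comparison orders them correctly).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblActieElementLink (actieElement INTEGER, beginDatum TEXT)")
conn.executemany(
    "INSERT INTO tblActieElementLink VALUES (?, ?)",
    [(1, "2010-01-01"), (1, "2010-01-01"), (1, "2010-01-10"),
     (2, "2010-02-01"), (2, "2010-02-03")],
)

# {distinct} is filled in below so both variants share one query text.
template = """
SELECT {distinct} s1.actieElement, s1.beginDatum
FROM tblActieElementLink AS s1
WHERE s1.beginDatum = (
    SELECT MIN(s2.beginDatum)
    FROM tblActieElementLink AS s2
    WHERE s2.actieElement = s1.actieElement
)
ORDER BY s1.actieElement
"""
with_duplicates = conn.execute(template.format(distinct="")).fetchall()
smallest_dates = conn.execute(template.format(distinct="DISTINCT")).fetchall()

print(with_duplicates)  # element 1 appears twice: its smallest date occurs in two rows
print(smallest_dates)
```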

Variant use of the GROUP BY clause in TSQL

Imagine the following schema and sample data (SQL Server 2008):
OriginatingObject
----------------------------------------------
ID
1
2
3
ValueSet
----------------------------------------------
ID OriginatingObjectID DateStamp
1 1 2009-05-21 10:41:43
2 1 2009-05-22 12:11:51
3 1 2009-05-22 12:13:25
4 2 2009-05-21 10:42:40
5 2 2009-05-20 02:21:34
6 1 2009-05-21 23:41:43
7 3 2009-05-26 14:56:01
Value
----------------------------------------------
ID ValueSetID Value
1 1 28
etc (a set of rows for each related ValueSet)
I need to obtain the ID of the most recent ValueSet record for each OriginatingObject. Do not assume that the higher the ID of a record, the more recent it is.
I am not sure how to use GROUP BY properly in order to make sure the set of results grouped together to form each aggregate row includes the ID of the row with the highest DateStamp value for that grouping. Do I need to use a subquery or is there a better way?
You can do it with a correlated subquery or using IN with multiple columns and a GROUP-BY.
Please note that a simple GROUP BY can only bring you to the list of OriginatingObjectIDs and DateStamps. In order to pull the relevant ValueSet IDs, the cleanest solution is to use a subquery.
Multiple-column IN with GROUP-BY (probably faster):
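-- Note: SQL Server does not support row-value IN; this form works in e.g. PostgreSQL, MySQL, Oracle.
SELECT O.ID, V.ID
FROM OriginatingObject AS O, ValueSet AS V
WHERE O.ID = V.OriginatingObjectID
AND
(V.OriginatingObjectID, V.DateStamp) IN
(
SELECT OriginatingObjectID, Max(DateStamp)
FROM ValueSet
GROUP BY OriginatingObjectID
)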
Correlated Subquery:
SELECT O.ID, V.ID
FROM OriginatingObject AS O, ValueSet AS V
WHERE O.ID = V.OriginatingObjectID
AND
V.DateStamp =
(
SELECT Max(DateStamp)
FROM ValueSet V2
WHERE V2.OriginatingObjectID = O.ID
)
SELECT OriginatingObjectID, id
FROM (
SELECT id, OriginatingObjectID, RANK() OVER(PARTITION BY OriginatingObjectID
ORDER BY DateStamp DESC) as ranking
FROM ValueSet) AS ranked -- SQL Server requires an alias for a derived table
WHERE ranking = 1;
This can be done with a correlated sub-query. No GROUP-BY necessary.
SELECT
vs.ID,
vs.OriginatingObjectID,
vs.DateStamp,
v.Value
FROM
ValueSet vs
INNER JOIN Value v ON v.ValueSetID = vs.ID
WHERE
NOT EXISTS (
SELECT 1
FROM ValueSet
WHERE OriginatingObjectID = vs.OriginatingObjectID
AND DateStamp > vs.DateStamp
)
This works only if there cannot be two equal DateStamps for an OriginatingObjectID in the ValueSet table.
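If ties on DateStamp are possible, one way to keep exactly one row per OriginatingObjectID is to extend the NOT EXISTS test with a tie-breaker. The sketch below is not from the answers; it uses the higher ID as an arbitrary tie-breaker (purely to make the result deterministic, not as a claim that higher IDs are newer) and runs in Python's sqlite3 module. Rows 1 to 7 are the question's sample data; row 8 is a hypothetical duplicate timestamp added to exercise the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ValueSet ("
    "ID INTEGER PRIMARY KEY, OriginatingObjectID INTEGER, DateStamp TEXT)"
)
conn.executemany(
    "INSERT INTO ValueSet VALUES (?, ?, ?)",
    [
        (1, 1, "2009-05-21 10:41:43"),
        (2, 1, "2009-05-22 12:11:51"),
        (3, 1, "2009-05-22 12:13:25"),
        (4, 2, "2009-05-21 10:42:40"),
        (5, 2, "2009-05-20 02:21:34"),
        (6, 1, "2009-05-21 23:41:43"),
        (7, 3, "2009-05-26 14:56:01"),
        (8, 3, "2009-05-26 14:56:01"),  # hypothetical row: same DateStamp as ID 7
    ],
)

query = """
SELECT vs.OriginatingObjectID, vs.ID
FROM ValueSet AS vs
WHERE NOT EXISTS (
    SELECT 1
    FROM ValueSet AS newer
    WHERE newer.OriginatingObjectID = vs.OriginatingObjectID
      AND (newer.DateStamp > vs.DateStamp
           -- tie-breaker: among equal DateStamps, keep the row with the highest ID
           OR (newer.DateStamp = vs.DateStamp AND newer.ID > vs.ID))
)
ORDER BY vs.OriginatingObjectID
"""
most_recent = conn.execute(query).fetchall()
print(most_recent)  # [(1, 3), (2, 4), (3, 8)]
```

Without the tie-breaker line, both ID 7 and ID 8 would be returned for OriginatingObjectID 3.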