How do I pull in duplicate charges based upon certain criteria - sql

I am trying to find duplicate orders within the dataset that I can search from.
Table: Transaction_Data
Columns: Guest_ID, Order_ID, Name, Quote_Date, Arrival_Date, Sale_Location, Product_Code and Deposit_Amount
Guest_ID | Order_ID | Name   | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount
---------------------------------------------------------------------------------------------------------
1        | 123455   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
1        | 123456   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
2        | 123457   | Guest2 | 12/2/2022  | 12/21/2022   | Location2     | Product2     | 105
3        | 123458   | Guest3 | 12/3/2022  | 12/22/2022   | Location3     | Product3     | 110
1        | 123459   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
I have tried the following but have not had good luck:
SELECT Guest_ID,Order_ID,COUNT(Order_ID) AS Quantity,Name,Quote_Date,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
GROUP BY Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
HAVING COUNT (Order_ID) >1
ORDER BY Order_ID,Guest_ID
SELECT Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
INTO #TempOrder
FROM Transaction_Data WHERE Quote_Date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
SELECT Guest_ID,Order_ID,
(SELECT MAX (Order_ID)
FROM Transaction_Data TD
WHERE TD.Order_ID < TO.Order_ID
) AS Prev_Order,
(SELECT MIN(Order_ID)
FROM Transaction_Data TD
WHERE TD.Order_ID > TO.Order_ID
) As Nxt_Order, TO.Name,TO.Quote_Date,TO.Arrival_Date,TO.Sale_Location,TO.Product_Code,TO.Deposit_Amount
FROM #TempOrder
This is giving me the Order_ID that is before and after, but is not giving me duplicates based upon the Guest.
If you look at the above table I am attempting to pull in only the following records:
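Guest_ID | Order_ID | Name   | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount
---------------------------------------------------------------------------------------------------------
1        | 123455   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
1        | 123456   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
1        | 123459   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100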
EDIT: I will do better next time I have an answer, but this has been answered.
I ended up putting these into a temp table.
DROP TABLE IF EXISTS #TempRes;
SELECT rth.ip_number,
ror.reservation_id,
COUNT(ror.reservation_id) OVER (PARTITION BY rth.ip_number,
ror.quote_date,
rdd.deposit_amount
ORDER BY ror.reservation_id
) AS quantity,
MIN(ror.reservation_id) OVER (PARTITION BY rth.ip_number, ror.quote_date, rdd.deposit_amount) AS min_order,
ror.name,
ror.quote_date,
ror.arrival_date,
rth.sale_location_code,
rdd.deposit_amount
INTO #TempRes
FROM dbo.r_order_reservation ror
JOIN dbo.r_transaction_header rth
ON rth.reservation_id = ror.reservation_id
JOIN dbo.r_transaction_detail rtd
ON rtd.reservation_id = ror.reservation_id
AND rtd.product_header_code != '7777777'
JOIN dbo.r_deposit_detail rdd
ON rdd.reservation_id = ror.reservation_id
WHERE ror.quote_date > '2022-12-01'
AND ror.operator_id = 'freeride'
AND rdd.deposit_amount NOT LIKE '-%';
SELECT *
FROM #TempRes tr
WHERE tr.quantity > 1
ORDER BY tr.reservation_id,
tr.ip_number;
DROP TABLE #TempRes;
This gave me the quantity of each but also grouped them together correctly.
Thank you!

There is a problem in your GROUP BY expression: you have included Order_ID, but that is unique and so never duplicated. To find duplicates you need to remove that column from the comparison.
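As a rough illustration of that point, the sketch below loads the question's five sample rows into an in-memory SQLite database through Python's sqlite3 module and runs a GROUP BY with and without Order_ID. SQLite is only a stand-in for SQL Server here, and the ISO date strings and the shortened GROUP BY column list are assumptions made for the demo, not part of the original query.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Transaction_Data (
        Guest_ID INTEGER, Order_ID INTEGER, Name TEXT,
        Quote_Date TEXT, Arrival_Date TEXT, Sale_Location TEXT,
        Product_Code TEXT, Deposit_Amount INTEGER
    )
""")
# The five sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO Transaction_Data VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 123455, "Guest1", "2022-12-01", "2022-12-20", "Location1", "Product1", 100),
        (1, 123456, "Guest1", "2022-12-01", "2022-12-20", "Location1", "Product1", 100),
        (2, 123457, "Guest2", "2022-12-02", "2022-12-21", "Location2", "Product2", 105),
        (3, 123458, "Guest3", "2022-12-03", "2022-12-22", "Location3", "Product3", 110),
        (1, 123459, "Guest1", "2022-12-01", "2022-12-20", "Location1", "Product1", 100),
    ],
)

# Grouping that includes the unique Order_ID: every group holds exactly one row,
# so HAVING COUNT(*) > 1 can never match.
with_order_id = conn.execute("""
    SELECT Guest_ID, Order_ID, COUNT(*) AS Quantity
    FROM Transaction_Data
    GROUP BY Guest_ID, Order_ID, Quote_Date, Product_Code, Deposit_Amount
    HAVING COUNT(*) > 1
""").fetchall()

# Same grouping without Order_ID: the repeated charge forms one group of three.
without_order_id = conn.execute("""
    SELECT Guest_ID, Quote_Date, Product_Code, Deposit_Amount, COUNT(*) AS Quantity
    FROM Transaction_Data
    GROUP BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
    HAVING COUNT(*) > 1
""").fetchall()

print(with_order_id)     # []
print(without_order_id)  # [(1, '2022-12-01', 'Product1', 100, 3)]
```

The second query finds the duplicated group but, being a plain GROUP BY, it can no longer show the individual Order_ID values.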
This response uses a standard approach for selecting just the duplicated records: the ROW_NUMBER() window function numbers the rows within each partition of your grouping set, and we return only the rows that were not the first in their partition, so only the duplicates and not the original.
Window functions let us evaluate aggregate expressions across a dataset inline with every row, without actually grouping the results.
There is one caveat: to filter or group on the result of a window function, we need to calculate it in a CTE or subquery first.
;WITH AggregatedData as
(
SELECT Guest_ID,Order_ID
,ROW_NUMBER() OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code,Deposit_Amount
ORDER BY Order_ID
) AS Order_Duplicate
,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
)
SELECT Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM AggregatedData
WHERE Order_Duplicate > 1
ORDER BY Order_ID,Guest_ID
This should return just the duplicated records, which is often more useful because these represent the transactions that most likely need to be reversed or refunded.
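Guest_ID | Order_ID | Name   | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount
---------------------------------------------------------------------------------------------------------
1        | 123456   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
1        | 123459   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100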
If you want to return ALL of the records where there is a duplicate, then we can use the window query version of COUNT() with the OVER() clause which is very similar:
;WITH AggregatedData as
(
SELECT Guest_ID,Order_ID
,COUNT(*) OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
-- no ORDER BY here: it would turn the count into a running count per row
) AS Quantity
,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
)
SELECT Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM AggregatedData
WHERE Quantity > 1
ORDER BY Order_ID, Guest_ID
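Guest_ID | Order_ID | Name   | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount
---------------------------------------------------------------------------------------------------------
1        | 123455   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
1        | 123456   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
1        | 123459   | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100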
We can extend this further by including both of the COUNT and ROW_NUMBER results in the output or we can use other aggregates like MIN:
;WITH AggregatedData as
(
SELECT Guest_ID,Order_ID
,COUNT(*) OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
) AS Quantity
,MIN(Order_ID) OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
) AS Min_Order_ID
,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
)
SELECT Guest_ID,Order_ID,Quantity,Min_Order_ID,Name,Quote_Date
,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
FROM AggregatedData
WHERE Quantity > 1
ORDER BY Order_ID, Guest_ID
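Guest_ID | Order_ID | Quantity | Min_Order_ID | Name   | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount
-----------------------------------------------------------------------------------------------------------------------------------
1        | 123455   | 3        | 123455       | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
1        | 123456   | 3        | 123455       | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100
1        | 123459   | 3        | 123455       | Guest1 | 12/1/2022  | 12/20/2022   | Location1     | Product1     | 100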
Technically we could have used MIN as the first example, and filtered where the result of MIN did not equal the Order_ID but that syntax is harder to interpret if you are not familiar with these concepts.
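For readers who want to see that MIN variant, here is a minimal sketch. It assumes an in-memory SQLite database (version 3.25 or later, for window functions) driven from Python's sqlite3 module as a stand-in for SQL Server, a trimmed-down copy of the sample table, and ISO date strings; none of that setup comes from the original post.

```python
import sqlite3

# In-memory SQLite stand-in for the real table (assumption); only the
# columns needed for the partition are kept.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Transaction_Data (
        Guest_ID INTEGER, Order_ID INTEGER, Quote_Date TEXT,
        Product_Code TEXT, Deposit_Amount INTEGER
    )
""")
conn.executemany(
    "INSERT INTO Transaction_Data VALUES (?, ?, ?, ?, ?)",
    [
        (1, 123455, "2022-12-01", "Product1", 100),
        (1, 123456, "2022-12-01", "Product1", 100),
        (2, 123457, "2022-12-02", "Product2", 105),
        (3, 123458, "2022-12-03", "Product3", 110),
        (1, 123459, "2022-12-01", "Product1", 100),
    ],
)

# Keep only rows whose Order_ID is not the smallest one in their partition,
# i.e. everything except the original charge.
duplicates = conn.execute("""
    SELECT Guest_ID, Order_ID, Min_Order_ID
    FROM (
        SELECT Guest_ID, Order_ID,
               MIN(Order_ID) OVER (
                   PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
               ) AS Min_Order_ID
        FROM Transaction_Data
    ) AS AggregatedData
    WHERE Order_ID <> Min_Order_ID
    ORDER BY Order_ID
""").fetchall()

print(duplicates)  # [(1, 123456, 123455), (1, 123459, 123455)]
```

On this sample it returns the same two orders as the ROW_NUMBER() version, with the added benefit that each duplicate carries the Order_ID of the charge it repeats.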
Update: Without CTE
Common Table Expressions (CTE/WITH) and window functions (OVER()) were both introduced in SQL Server 2005, so if your server supports OVER() then it will also support WITH. A caveat to using WITH is that the statement before it in the batch must be terminated with a semicolon; the simple way around this is to add a ; prefix any time you use WITH, i.e. ;WITH. I have updated this post to reflect this convention.
If you do not want to, or cannot use a CTE, then a simple nested query will do the same job:
SELECT Guest_ID,Order_ID,Quantity,Min_Order_ID,Name,Quote_Date
,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
FROM (
SELECT Guest_ID,Order_ID
,COUNT(*) OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
) AS Quantity
,MIN(Order_ID) OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
) AS Min_Order_ID
,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM Transaction_Data
WHERE Quote_date >='2022-11-01'
and Deposit_Amount NOT LIKE '-%'
) AggregatedData
WHERE Quantity > 1
ORDER BY Order_ID, Guest_ID

Related

BigQuery: JOIN on single matching row

I have two tables, one containing orders with a nested line_items structure and another with a pricing history for each product sku code.
Orders Table
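order_id | order_date | item_sku | item_quantity | item_subtotal
----------------------------------------------------------------
1        | 2022-23-07 | SKU1     | 7             | 12.34
         |            | SKU2     | 1             | 9.99
2        | 2022-12-07 | SKU1     | 1             | 1.12
         |            | SKU3     | 5             | 32.54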
Price History Table
item_sku | effective_date | cost
--------------------------------
SKU1     | 2022-20-07     | 0.78
SKU2     | 2022-02-03     | 4.50
SKU1     | 2022-02-03     | 0.56
SKU3     | 2022-02-03     | 4.32
Desired Output
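order_id | order_date | item_sku | item_quantity | item_subtotal | cost
-----------------------------------------------------------------------
1        | 2022-23-07 | SKU1     | 7             | 12.34         | 0.78
         |            | SKU2     | 1             | 9.99          | 4.50
2        | 2022-12-07 | SKU1     | 1             | 1.12          | 0.56
         |            | SKU3     | 5             | 32.54         | 4.32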
I'm trying to get the product cost by finding the cost at the time of the order being placed.
SELECT order_id, order_date,
ARRAY(
SELECT AS STRUCT
item_sku,
item_quantity,
item_subtotal,
cost.product_cost
FROM UNNEST(line_items) as items
JOIN `price_history_table` as cost
ON items.item_sku = cost.sku AND effective_date < order_date
) AS line_items,
FROM
`order_data_table`
The above query works but creates a separate line_item array row for each record in the price history table.
How can I match on just the most recent price for that SKU? I want to add something like this:
ORDER BY effective_date DESC LIMIT 1
but I can't work out where to add it.
How can I match on just the most recent price for that sku
You need to add the line below to the subquery and move the join out of the SELECT list to avoid the correlated subquery issue:
QUALIFY 1 = ROW_NUMBER() OVER(PARTITION BY item_sku ORDER BY effective_date DESC)
The final query then looks like this:
SELECT order_id, order_date,
ARRAY_AGG(line_item) AS line_items
FROM (
SELECT order_id, order_date,
STRUCT(item_sku,
item_quantity,
item_subtotal,
cost.product_cost) AS line_item
FROM `order_data_table`, UNNEST(line_items) AS items
JOIN `price_history_table` AS cost
ON items.item_sku = cost.sku AND effective_date < order_date
QUALIFY 1 = ROW_NUMBER() OVER(PARTITION BY order_id, order_date, item_sku ORDER BY effective_date DESC)
)
GROUP BY order_id, order_date

how to avoid sum(sum()) when writing this postgres query with window functions?

Runnable query example at https://www.db-fiddle.com/f/ssrpQyyajYdZkkkAJBaYUp/0
I have a postgres table of sales; each row has a sale_id, product_id, salesperson, and price.
I want to write a query that returns, for each (salesperson, product_id) tuple with at least one sale:
The total of price for all of the sales made by that salesperson for that product (call this product_sales).
The total of price over all of that salesperson's sales (call this total_sales).
My current query is as follows, but I feel silly writing sum(sum(price)). Is there a more standard/idiomatic approach?
select
salesperson,
product_id,
sum(price) as product_sales,
sum(sum(price)) over (partition by salesperson) as total_sales
from sales
group by 1, 2
order by 1, 2
Writing sum(price) instead of sum(sum(price)) yields the following error:
column "sales.price" must appear in the GROUP BY clause or be used in an aggregate function
UPDATES
See this response for a nice approach using a WITH clause. I feel like I ought to be able to do this without a subquery or WITH.
Just stumbled on this response to a different question which proposes both sum(sum(...)) and a subquery approach. Perhaps these are the best options?
You can use a Common Table Expression to simplify the query and do it in two steps.
For example:
with
s as (
select
salesperson,
product_id,
sum(price) as product_sales
from sales
group by salesperson, product_id
)
select
salesperson,
product_id,
product_sales,
sum(product_sales) over (partition by salesperson) as total_sales
from s
order by salesperson, product_id
Result:
salesperson product_id product_sales total_sales
------------ ----------- -------------- -----------
Alice 1 2000 5400
Alice 2 2200 5400
Alice 3 1200 5400
Bobby 1 2000 4300
Bobby 2 1100 4300
Bobby 3 1200 4300
Chuck 1 2000 4300
Chuck 2 1100 4300
Chuck 3 1200 4300
See running example at DB Fiddle.
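To check that this two-step form and the question's sum(sum(price)) form agree, here is a small sketch. It runs against an in-memory SQLite database (3.25 or later) from Python rather than Postgres, and the five sales rows are made up for the demo; they are not the data behind the result above. In the nested form the inner sum is the ordinary GROUP BY aggregate and the outer sum is the window function applied to the grouped rows.

```python
import sqlite3

# In-memory SQLite stand-in for the Postgres table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        sale_id INTEGER, product_id INTEGER, salesperson TEXT, price INTEGER
    )
""")
# Hypothetical rows, invented for this illustration.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Alice", 1000),
        (2, 1, "Alice", 1000),
        (3, 2, "Alice", 2200),
        (4, 1, "Bobby", 2000),
        (5, 2, "Bobby", 1100),
    ],
)

# Nested form: inner SUM aggregates per (salesperson, product_id) group,
# outer SUM ... OVER adds those group totals up per salesperson.
nested = conn.execute("""
    SELECT salesperson, product_id,
           SUM(price) AS product_sales,
           SUM(SUM(price)) OVER (PARTITION BY salesperson) AS total_sales
    FROM sales
    GROUP BY salesperson, product_id
    ORDER BY salesperson, product_id
""").fetchall()

# Two-step form: aggregate in a CTE, then apply the window function.
two_step = conn.execute("""
    WITH s AS (
        SELECT salesperson, product_id, SUM(price) AS product_sales
        FROM sales
        GROUP BY salesperson, product_id
    )
    SELECT salesperson, product_id, product_sales,
           SUM(product_sales) OVER (PARTITION BY salesperson) AS total_sales
    FROM s
    ORDER BY salesperson, product_id
""").fetchall()

for row in nested:
    print(row)
```

Both queries describe the same computation, so choosing between them is mostly a matter of readability.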
You can try the below -
select * from
(
select
salesperson,
product_id,
sum(price) over(partition by salesperson,product_id) as product_sales,
sum(price) over(partition by salesperson) as total_sales,
row_number() over(partition by salesperson,product_id order by sale_id) as rn
from sales s
)A where rn=1

How to find lowest value from one columns that has been Grouped by SQL Server

I'm looking for some assistance: I am looking to get this into a report but not sure how to achieve this.
Here is the data stored in the table:
Product | Quantity | Status | Line
Product1 1 Active 1000
Product2 2 Active 2000
Product1 2 Active 3000
Product1 1 InDev 4000
Product2 2 Active 5000
I am grouping by Product and Status and summing up Quantity.
But I am also looking to retrieve the lowest Line number among the rows in each group.
My expected result would be like below:
Product | Quantity | Status | Line
Product1 3 Active 1000
Product2 4 Active 2000
Product1 1 InDev 4000
Any help would be greatly appreciated
This can be done if you group by Product, Status and aggregate:
select Product, sum(Quantity) Quantity, Status, min(Line) Line
from tablename
group by Product, Status
You can use a window function:
select *
from (select t.*, row_number() over (partition by Product, Status order by Line) as seq,
sum(Quantity) over (partition by Product, Status) as sum_qty
from tablename t
) t
where seq = 1;
If the table only has the columns shown in the question, plain aggregation is enough:
select Product, sum(Quantity) as Quantity, Status, min(Line) as Line
from tablename t
group by Product, Status
order by Line;

How to remove duplicate accounts in SQL?

I am using SQL Server 2008 and I was wondering how to remove duplicate customers either from the table or exclude it in my query. An Account_ID can only have 1 product associated with it. And the account with the most recent purchase date is what should be showing. An example is below:
Account_ID, Account_Purchase, Purchase_Date
1 Product 1 1/1/2016
2 Product 1 1/2/2016
3 Product 2 1/5/2016
1 Product 3 3/12/2016
4 Product 3 1/5/2016
Ideally I would only see:
Account_ID, Account_Purchase, Purchase_Date
2 Product 1 1/2/2016
3 Product 2 1/5/2016
1 Product 3 3/12/2016
4 Product 3 1/5/2016
This should not show up because it is not the most recent purchase from account 1
Account_ID, Account_Purchase, Purchase_Date
1 Product 1 1/1/2016
Thank you all for help, folks!
Simply acquire the latest purchase_date using max and group by account_id. Then use inner join to get the other details from the acquired details.
SELECT TABLE_NAME.* FROM TABLE_NAME
INNER JOIN(
SELECT Account_ID, MAX(Purchase_Date) AS Purchase_Date
FROM TABLE_NAME
GROUP BY Account_ID
) LatestPurchases
ON TABLE_NAME.Account_ID = LatestPurchases.Account_ID
AND TABLE_NAME.Purchase_Date = LatestPurchases.Purchase_Date
Try below query, please replace TABLENAME with your table
WITH CTE
AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Purchase_Date DESC) AS RN
FROM TABLENAME
)
SELECT
*
FROM CTE
WHERE RN = 1
Here is another query
SELECT
t.Account_id,
t.Account_Purchase,
t.Purchase_Date
FROM
tablename t
WHERE
t.Purchase_Date = (SELECT MAX(Purchase_date) FROM Tablename WHERE Account_ID = t.Account_ID)
ORDER BY
t.Purchase_Date DESC

How to Retrieve Maximum Value of Each Group? - SQL

There is a table tbl_products that contains data as shown below:
Id Name
----------
1 P1
2 P2
3 P3
4 P4
5 P5
6 P6
And another table tbl_inputs that contains data as shown below:
Id Product_Id Price Register_Date
----------------------------------------
1 1 10 2010-01-01
2 1 20 2010-10-11
3 1 30 2011-01-01
4 2 100 2010-01-01
5 2 200 2009-01-01
6 3 500 2011-01-01
7 3 270 2010-10-15
8 4 80 2010-01-01
9 4 50 2010-02-02
10 4 92 2011-01-01
I want to select all products(id, name, price, register_date) with maximum date in each group.
For Example:
Id Name Price Register_Date
----------------------------------------
3 P1 30 2011-01-01
4 P2 100 2010-01-01
6 P3 500 2011-01-01
10 P4 92 2011-01-01
select
id
,name
,code
,price
from tbl_products tp
cross apply (
select top 1 price
from tbl_inputs ti
where ti.product_id = tp.id
order by register_date desc
) tii
Although it is not the optimal way, you can also do it like this:
;with gb as (
select
distinct
product_id
,max(register_date) As max_register_date
from tbl_inputs
group by product_id
)
select
ti.id
,ti.product_id
,ti.price
,ti.register_date
from tbl_inputs ti
join gb
on ti.product_id=gb.product_id
and ti.register_date = gb.max_register_date
But as I said earlier .. this is not the way to go in this case.
;with cte as
(
select t1.id, t1.name, t1.code, t2.price, t2.register_date,
row_number() over (partition by product_id order by register_date desc) rn
from tbl_products t1
join tbl_inputs t2
on t1.id = t2.product_id
)
select id, name, code, price, register_date
from cte
where rn = 1
Something like this..
select id, product_id, price, max(register_date)
from tbl_inputs
group by id, product_id, price
You can use the MAX function and the GROUP BY clause. If you only need results from the table tbl_inputs, you don't even need a join:
select product_id, max(register_date), price
from tbl_inputs
group by product_id, price
If you need fields from tbl_products, you have to use a join:
select p.name, p.code, i.id, i.price, max(i.register_date)
from tbl_products p join tbl_inputs i on p.id=i.product_id
group by p.name, p.code, i.id, i.price
Try this:
SELECT T1.id, T1.product_id, T1.price, T1.register_date
FROM tbl_inputs T1 INNER JOIN
(
SELECT product_id, MAX(register_date) As Max_register_date
FROM tbl_inputs
GROUP BY product_id
) T2 ON(T1.product_id= T2.product_id AND T1.register_date= T2.Max_register_date)
This is, of course, assuming your dates are unique. If they are not, you need to add the DISTINCT keyword to the outer SELECT statement.
Edit:
Sorry, I didn't explain it very well. Your dates can be duplicated; that is not a problem as long as they are unique per product id. If you can have duplicated dates per product id, then you will get more than one row per product from the select statement I suggested, and you will have to find a way to reduce it to one row per product.
For example:
Suppose you have records like these (the latest date for a product appears more than once in your table, with different prices):
id | product_Id | price | register_date
--------------------------------------------
1 | 1 | 10.00 | 01/01/2000
2 | 1 | 20.00 | 01/01/2000
The query will then return both of these records.
However, if the register_date is unique per product id, then you will get only one result for each product id.
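The tie case can be reproduced with the two rows above. The sketch below is an illustration only: it uses an in-memory SQLite database (3.25 or later) from Python instead of SQL Server, stores the date as an ISO string, and picks the highest id as the tie-breaker, which is an arbitrary assumption and not something the question specifies.

```python
import sqlite3

# In-memory SQLite stand-in for tbl_inputs (assumption), holding only the
# two tied rows from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tbl_inputs (
        id INTEGER, product_id INTEGER, price REAL, register_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO tbl_inputs VALUES (?, ?, ?, ?)",
    [(1, 1, 10.0, "2000-01-01"), (2, 1, 20.0, "2000-01-01")],
)

# Join against MAX(register_date): both tied rows match the latest date.
joined = conn.execute("""
    SELECT T1.id, T1.product_id, T1.price, T1.register_date
    FROM tbl_inputs T1
    INNER JOIN (
        SELECT product_id, MAX(register_date) AS max_register_date
        FROM tbl_inputs
        GROUP BY product_id
    ) T2 ON T1.product_id = T2.product_id
        AND T1.register_date = T2.max_register_date
    ORDER BY T1.id
""").fetchall()

# ROW_NUMBER with a tie-breaker (highest id wins, a hypothetical rule)
# always keeps exactly one row per product.
ranked = conn.execute("""
    SELECT id, product_id, price, register_date
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY product_id
                   ORDER BY register_date DESC, id DESC
               ) AS rn
        FROM tbl_inputs
    ) AS r
    WHERE rn = 1
""").fetchall()

print(joined)  # [(1, 1, 10.0, '2000-01-01'), (2, 1, 20.0, '2000-01-01')]
print(ranked)  # [(2, 1, 20.0, '2000-01-01')]
```

Which row should win a tie is a business decision; the second column in the ORDER BY is where that rule would go.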