SQL - Manage duplicates using a hierarchy

Consider the following table.
Customer | Category | Key
---------+----------+----
Ajax     | VIP      | 1A
Zeus     | Retail   | 2B
Hera     | Retail   | 3C
Ajax     | Retail   | 1A
Notice the duplicate 1A value, which is both VIP and Retail.
How can I resolve these duplicates using a hierarchy, so that when a key is VIP, the VIP row is kept and the Retail row is removed?
The end result should be:
Customer | Category | Key
---------+----------+----
Ajax     | VIP      | 1A
Zeus     | Retail   | 2B
Hera     | Retail   | 3C
I've tried assigning a weight to each category:
VIP = 100
Retail = 1
Then I grouped by Key and summed the new column. If the sum is 100 or more, the customer gets VIP, otherwise Retail. Finally, the original Category column is removed and replaced by Computed_Category.
I'm looking for a more elegant method.
Edit:
There are 17 categories, with each superseding those below in rank.
Example:
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY CUSTNUMBER ORDER BY
CASE
WHEN CLASS = 'X' THEN 1
WHEN CLASS = 'Y' THEN 2
ELSE 3 END, UNIQUEID) RN
FROM (VALUES (100,'AA','Z')
, (200,'BB','X')
, (300,'CC','X')
, (400,'DD','Y')
, (100,'AA','Y')
, (100,'AA','X')
) t1 (CUSTNUMBER, UNIQUEID, CLASS)
ORDER BY UNIQUEID, CLASS
The end result should remove 100-AA-Y and 100-AA-Z.
If a category is not found, use the next one in the hierarchy.

You could use the ROW_NUMBER function with a CASE expression in its ORDER BY, as follows:
Select Customer, Category, Key_
From
(
Select *,
ROW_NUMBER() Over (Partition By Customer Order By
Case When Category = 'VIP' Then 1
When Category = 'Corporate' Then 2
When Category = 'Retail' Then 3
-- continue for all categories
Else 17
End, key_) rn
From table_name
) T
Where rn = 1
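To try the CASE-ordered ROW_NUMBER() approach locally, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. It assumes SQLite 3.25 or later (the release that added window functions). The table name customers and the 'Corporate' category are placeholders; only VIP and Retail come from the question.

```python
import sqlite3

# In-memory table holding the sample rows from the question.
# "customers" is a hypothetical stand-in for table_name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Customer TEXT, Category TEXT, Key_ TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ajax", "VIP", "1A"),
        ("Zeus", "Retail", "2B"),
        ("Hera", "Retail", "3C"),
        ("Ajax", "Retail", "1A"),
    ],
)

# Rank each customer's rows by category priority; rn = 1 is the winner.
QUERY = """
SELECT Customer, Category, Key_
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY Customer
               ORDER BY CASE WHEN Category = 'VIP' THEN 1
                             WHEN Category = 'Corporate' THEN 2
                             WHEN Category = 'Retail' THEN 3
                             ELSE 17
                        END,
                        Key_
           ) AS rn
    FROM customers
) AS t
WHERE rn = 1
ORDER BY Key_
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each customer keeps exactly one row: the one whose category has the lowest rank number.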
If you keep a separate table that holds each category and its priority, you can simplify the query as follows:
Create table Categories(Category VARCHAR(50), priority INT);
Insert Into Categories Values
('VIP', 1),
('Corporate', 2),
('Retail', 3); -- list all categories
And the query:
Select Customer, Category, Key_
From
(
Select T.Customer, T.Category, T.Key_,
ROW_NUMBER() Over (Partition By T.Customer Order By COALESCE(C.priority, 999), T.key_) rn -- rank categories missing from Categories last
From table_name T LEFT JOIN Categories C
ON T.Category = C.Category
) T
Where rn = 1
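A similar SQLite sketch of the lookup-table version, run from Python. One detail is worth flagging: with a LEFT JOIN, a category missing from Categories gets a NULL priority, and both SQLite and SQL Server sort NULLs first in ascending order, so such a row would win. This sketch therefore wraps the priority in COALESCE(..., 999) so unlisted categories rank last. The 'Prospect' row is hypothetical, added only to exercise that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (Customer TEXT, Category TEXT, Key_ TEXT);
    CREATE TABLE Categories (Category TEXT, priority INTEGER);

    INSERT INTO Categories VALUES ('VIP', 1), ('Corporate', 2), ('Retail', 3);
    INSERT INTO customers VALUES
        ('Ajax', 'VIP', '1A'),
        ('Zeus', 'Retail', '2B'),
        ('Hera', 'Retail', '3C'),
        ('Ajax', 'Retail', '1A'),
        ('Hera', 'Prospect', '3C');  -- hypothetical category not in Categories
    """
)

QUERY = """
SELECT Customer, Category, Key_
FROM (
    SELECT T.Customer, T.Category, T.Key_,
           ROW_NUMBER() OVER (
               PARTITION BY T.Customer
               -- Unlisted categories get 999 instead of NULL, so they rank last.
               ORDER BY COALESCE(C.priority, 999), T.Key_
           ) AS rn
    FROM customers AS T
    LEFT JOIN Categories AS C ON T.Category = C.Category
) AS T
WHERE rn = 1
ORDER BY Key_
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Removing the COALESCE makes Hera's 'Prospect' row win, because its NULL priority sorts ahead of Retail's 3.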

Related

Last record per transaction

I am trying to select the last record per sales order.
My query in SQL Server Management Studio is simple:
SELECT *
FROM DOCSTATUS
The problem is that this table has tens of thousands of records, because it tracks every step of each sales order (SO).
ID | SO     | SL | Status          | Reason   | Attach | Name | Name | Systemdate
---+--------+----+-----------------+----------+--------+------+------+------------------------
22 | 951581 | 3  | Processed       | Customer | NULL   | NULL | BW   | 2016-12-05 13:33:27.857
23 | 951581 | 3  | Submitted       | Customer | NULL   | NULL | BW   | 2016-17-05 13:33:27.997
24 | 947318 | 1  | Hold            | Customer | NULL   | NULL | bw   | 2016-12-05 13:54:27.173
25 | 947318 | 1  | Invoices Submit | Customer | NULL   | NULL | bw   | 2016-13-05 13:54:27.300
26 | 947318 | 1  | Ship            | Customer | NULL   | NULL | bw   | 2016-14-05 13:54:27.440
I would like to see the most recent record per SO:
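ID | SO     | SL | Status    | Reason   | Attach | Name | Name | Systemdate
---+--------+----+-----------+----------+--------+------+------+------------------------
23 | 951581 | 4  | Submitted | Customer | NULL   | NULL | BW   | 2016-17-05 13:33:27.997
26 | 947318 | 1  | Ship      | Customer | NULL   | NULL | bw   | 2016-14-05 13:54:27.440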
Well, I'm not sure how that table has two Name columns, but one easy way to do this is with ROW_NUMBER():
;WITH cte AS
(
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY SO ORDER BY Systemdate DESC)
FROM dbo.DOCSTATUS
)
SELECT ID, SO, SL, Status, Reason, ..., Systemdate
FROM cte WHERE rn = 1;
Also, please always reference the schema (dbo.DOCSTATUS rather than DOCSTATUS), even if everything is under dbo today.
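A runnable sketch of the ROW_NUMBER() approach using Python's sqlite3 module (SQLite 3.25+). Two assumptions: only a few of the question's columns are kept, and the dates are rewritten in ISO yyyy-mm-dd form, reading the originals as yyyy-dd-mm, so that plain text comparison orders them correctly. SQLite does not accept T-SQL's rn = ROW_NUMBER() alias syntax, so the sketch writes AS rn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DOCSTATUS (ID INTEGER, SO INTEGER, Status TEXT, Systemdate TEXT)")
conn.executemany(
    "INSERT INTO DOCSTATUS VALUES (?, ?, ?, ?)",
    [
        (22, 951581, "Processed", "2016-05-12 13:33:27.857"),
        (23, 951581, "Submitted", "2016-05-17 13:33:27.997"),
        (24, 947318, "Hold", "2016-05-12 13:54:27.173"),
        (25, 947318, "Invoices Submit", "2016-05-13 13:54:27.300"),
        (26, 947318, "Ship", "2016-05-14 13:54:27.440"),
    ],
)

# Number each SO's rows from newest to oldest and keep the newest one.
QUERY = """
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SO ORDER BY Systemdate DESC) AS rn
    FROM DOCSTATUS
)
SELECT ID, SO, Status, Systemdate
FROM cte
WHERE rn = 1
ORDER BY ID
"""

latest = conn.execute(QUERY).fetchall()
for row in latest:
    print(row)
```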
I think you can keep it this simple:
SELECT *
FROM DOCSTATUS
WHERE ID IN (SELECT MAX(ID)
FROM DOCSTATUS
GROUP BY SO)
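You want only the maximum ID for each SO. This assumes IDs are assigned in increasing time order, so the highest ID per SO is also its most recent record.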
An efficient method with the right index is a correlated subquery:
select t.*
from t
where t.systemdate = (select max(t2.systemdate) from t t2 where t2.so = t.so);
The index is on (so, systemdate).
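The correlated-subquery version, together with the suggested (so, systemdate) index, can be sketched in SQLite the same way (dates again assumed to be ISO formatted). The row with ID 27 is hypothetical: it shares the latest timestamp of SO 947318 to show that ties return more than one row per SO, whereas ROW_NUMBER() picks exactly one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (id INTEGER, so INTEGER, systemdate TEXT);
    -- Index from the answer: seek by so, then read the max systemdate.
    CREATE INDEX ix_t_so_systemdate ON t (so, systemdate);
    INSERT INTO t VALUES
        (22, 951581, '2016-05-12 13:33:27.857'),
        (23, 951581, '2016-05-17 13:33:27.997'),
        (24, 947318, '2016-05-12 13:54:27.173'),
        (25, 947318, '2016-05-13 13:54:27.300'),
        (26, 947318, '2016-05-14 13:54:27.440'),
        (27, 947318, '2016-05-14 13:54:27.440');  -- hypothetical tie
    """
)

# Keep a row when its date equals the latest date for the same SO.
QUERY = """
SELECT t.*
FROM t
WHERE t.systemdate = (SELECT MAX(t2.systemdate) FROM t AS t2 WHERE t2.so = t.so)
ORDER BY t.id
"""

ids = [row[0] for row in conn.execute(QUERY)]
print(ids)
```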

Finding updates in a table using Self-Join

I have a table as shown below
Table name: property
runId | listingId | listingName
------+-----------+------------
1     | 123       | abc
1     | 234       | def
2     | 123       | abcd
2     | 567       | ghi
2     | 234       | defg
As you can see, each row has a runId and a listingId. For a particular runId, I am trying to fetch the new listings that were added (for runId 2, that is the 4th row, with listingId 567) and the listings that were updated (rows 3 and 5, with listingIds 123 and 234 respectively).
I am trying a self-join; it works fairly well for updates, but new additions are giving me trouble:
SELECT p1.* FROM property p1
INNER JOIN property p2
ON p1.listingid = p2.listingid
WHERE p1.runid=456 AND p2.runid!=456
The above query returns the correct updated records, but I cannot find the new listings. I tried p1.listingid != p2.listingid and a LEFT OUTER JOIN, but neither works.
I would use the ROW_NUMBER() analytical function for it.
SELECT
T.*
FROM
(
SELECT
P.*,
CASE
WHEN ROW_NUMBER() OVER(
PARTITION BY LISTINGID
ORDER BY
RUNID
) = 1 THEN 'INSERTED'
ELSE 'UPDATED'
END AS OPERATION_
FROM
PROPERTY P
) T
WHERE
RUNID = 2
-- AND OPERATION_ = 'INSERTED'
-- AND OPERATION_ = 'UPDATED'
A row is labeled UPDATED if its listingId already appeared in an earlier runId, and INSERTED otherwise. Uncomment one of the OPERATION_ filters to return only new or only updated listings.
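To check the labeling on the sample data, here is a sketch that runs the same idea in SQLite from Python (3.25+ for window functions). It aliases both the base table (P) and the derived table (T), since most databases require an alias on a subquery in FROM, and passes the runId as a parameter instead of hard-coding 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE property (runId INTEGER, listingId INTEGER, listingName TEXT)")
conn.executemany(
    "INSERT INTO property VALUES (?, ?, ?)",
    [(1, 123, "abc"), (1, 234, "def"), (2, 123, "abcd"), (2, 567, "ghi"), (2, 234, "defg")],
)

# The first appearance of a listingId (lowest runId) is an insert;
# every later appearance is an update.
QUERY = """
SELECT runId, listingId, listingName, OPERATION_
FROM (
    SELECT P.*,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY listingId ORDER BY runId) = 1
                THEN 'INSERTED' ELSE 'UPDATED'
           END AS OPERATION_
    FROM property AS P
) AS T
WHERE runId = ?
ORDER BY listingId
"""

run2 = conn.execute(QUERY, (2,)).fetchall()
for row in run2:
    print(row)
```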
You may try this.
with cte as (
select row_number() over (partition by listingId order by runId) as Slno, * from property
)
select * from property where listingId not in (
select listingId from cte as c where slno>1
) --- for new listing added
with cte as (
select row_number() over (partition by listingId order by runId) as Slno, * from property
)
select * from property where listingId in (
select listingId from cte as c where slno>1
) --- for modified listing
For this, I would recommend exists and not exists. For updates:
select p.*
from property p
where exists (select 1
from property p2
where p2.listingid = p.listingid and
p2.runid < p.runid
);
If you want the result for a particular runid, add and runid = ? to the outer query.
And for new listings:
select p.*
from property p
where not exists (select 1
from property p2
where p2.listingid = p.listingid and
p2.runid < p.runid
);
With an index on property(listingid, runid), I would expect this to have somewhat better performance than a solution using window functions.
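The EXISTS / NOT EXISTS pair can be wrapped in a small helper that takes the runId as a parameter, following the answer's suggestion to add and runid = ? to the outer query. This is a sketch on SQLite; the helper name changes_for_run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE property (runId INTEGER, listingId INTEGER, listingName TEXT);
    CREATE INDEX ix_property ON property (listingId, runId);
    INSERT INTO property VALUES
        (1, 123, 'abc'), (1, 234, 'def'),
        (2, 123, 'abcd'), (2, 567, 'ghi'), (2, 234, 'defg');
    """
)

# Shared correlated subquery: does this listing appear in an earlier run?
EARLIER = """
(SELECT 1 FROM property AS p2
 WHERE p2.listingId = p.listingId AND p2.runId < p.runId)
"""


def changes_for_run(run_id):
    """Return (new listingIds, updated listingIds) for one run."""
    new = conn.execute(
        f"SELECT p.listingId FROM property AS p "
        f"WHERE p.runId = ? AND NOT EXISTS {EARLIER} ORDER BY p.listingId",
        (run_id,),
    ).fetchall()
    updated = conn.execute(
        f"SELECT p.listingId FROM property AS p "
        f"WHERE p.runId = ? AND EXISTS {EARLIER} ORDER BY p.listingId",
        (run_id,),
    ).fetchall()
    return [r[0] for r in new], [r[0] for r in updated]


print(changes_for_run(2))
print(changes_for_run(1))
```

Only constant SQL fragments are spliced in with the f-string; the runId value itself is always passed as a bound parameter.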

SQL - create flag in query to highlight orders which contain quantity = 1

I have tried writing a CASE statement, but it doesn't give me what I want. The table is at product level; I'd like to classify each order, at order level, by whether it consists of a single item with quantity 1.
Any ideas on how I would do this?
order id | Product | Quantity
---------+---------+--------------
11111 | sdsd4 | 1 (single item )
22222 | sasas | 1 (multiple items)
22222 | wertt | 1 (multiple items)
I'd like a CASE expression that adds another column separating single-item orders (one product, quantity = 1) from all other orders.
The desired output is the value shown in parentheses in the table above.
I could then count the orders and use the newly created column as the dimension.
More detail here:
Attached is an image of table structure.
Logic:
if the order has one product and quantity = 1, it is a single item order;
if the order has one product but several units of it (quantity > 1), it is a non-single item order;
if the order has more than one product, it is a non-single item order.
If your database supports analytic functions, then you can use a query like this one:
SELECT *,
CASE WHEN count("Product") OVER (partition by "order id") > 1
THEN 'multiple items' ELSE 'single item'
END As "How many items"
FROM Table1
Demo: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=b659279fc16d2084cb1cf4a3bea361a1
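The query above counts products only, so an order holding one product with quantity 2 would still be labeled a single item, while the question's logic also depends on quantity. Assuming quantities are positive integers, an order is a single item order exactly when its total quantity is 1, which a single window SUM captures. This is an adaptation, sketched in SQLite from Python; order 33333 is a hypothetical row added to exercise the one-product, several-units case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (11111, "sdsd4", 1),
        (22222, "sasas", 1),
        (22222, "wertt", 1),
        (33333, "zzzz1", 3),  # hypothetical: one product, several units
    ],
)

# Total quantity across the whole order decides the flag for every row.
QUERY = """
SELECT order_id, product, quantity,
       CASE WHEN SUM(quantity) OVER (PARTITION BY order_id) = 1
            THEN 'single item' ELSE 'multiple items'
       END AS how_many_items
FROM orders
ORDER BY order_id, product
"""

flags = {(r[0], r[1]): r[3] for r in conn.execute(QUERY)}
for key, flag in flags.items():
    print(key, flag)
```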
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
CASE COUNT(DISTINCT Product) OVER(PARTITION BY order_id)
WHEN 1 THEN 'Single Item Order'
ELSE 'Multiple Items Order'
END Single_or_Multiple
FROM `project.dataset.table`
You can test and play with the query above using dummy data, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 11111 order_id, 'sdsd4' Product, 1 Quantity UNION ALL
SELECT 22222, 'sasas', 2 UNION ALL
SELECT 22222, 'wertt', 1
)
SELECT *,
CASE COUNT(DISTINCT Product) OVER(PARTITION BY order_id)
WHEN 1 THEN 'Single Item Order'
ELSE 'Multiple Items Order'
END Single_or_Multiple
FROM `project.dataset.table`
with result
Row order_id Product Quantity Single_or_Multiple
1 11111 sdsd4 1 Single Item Order
2 22222 sasas 2 Multiple Items Order
3 22222 wertt 1 Multiple Items Order
If I understand this right, you could use a subquery to get the count of records for an order and flag a record if this count is larger than 1 and the quantity is equal to 1.
SELECT t1.order_id,
t1.product,
t1.quantity,
CASE
WHEN t1.quantity = 1
AND (SELECT count(*)
FROM elbat t2
WHERE t2.order_id = t1.order_id) > 1 THEN
'flag'
ELSE
'no flag'
END flag
FROM elbat t1;

Select if then case with first record

Can you do something like this in SQL Server?
I want to select from a table in which several records share the same product_id and have a Y or N in another column (in_stock). Where the product_id is the same, I want to take the first record with a Y, while matching the product_set from another table.
... ,
SELECT
(SELECT TOP 1
(product_name),
CASE
WHEN in_stock = 'Y' THEN product_name
ELSE product_name
END
FROM
Products
WHERE
Products.product_set = Parent_Table.product_set) AS 'Product Name',
...
Sample data would be
product_set in_stock product_id product_name
---------------------------------------------------
1 N 12 Orange
1 Y 12 Pear
2 N 12 Apple
2 N 12 Lemon
For example, the output for product_set = 1 would be 'Pear'.
There are two possible solutions, depending on the answers to two questions. First, if no row for a product_set has an in_stock value of 'Y', should anything be returned? Second, if several rows have in_stock 'Y', do you care which one is picked?
The first solution assumes you want the first row whether or not there is ANY 'Y' value.
select *
from (select p.*,
RID = row_number() over (partition by product_set order by in_stock desc) -- i.e. sort Y before N
from Products p) a
where a.RID = 1
The second only returns a value if there is at least one row with a 'Y' for in_stock. Note that order by (select null) essentially says you don't care which row is picked when several items are in stock. If you DO care about the order, replace it with the appropriate sort condition.
select *
from (select p.*,
RID = row_number() over (partition by product_set order by (select null)) -- arbitrary pick among in-stock rows
from Products p
where in_stock = 'Y') a
where a.RID = 1
I don't know what the structure of the "parent table" in your query is, so I've simplified it to assume you have what you need in Products alone.
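Both row_number() variants can be checked against the sample data in SQLite from Python. SQLite does not accept the RID = ... alias form, so the sketch writes AS RID. It also adds product_name as a tiebreaker (an assumption, replacing (select null) in the second variant) so that the picks are deterministic, including the first variant's pick for product_set 2, where nothing is in stock.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (product_set INTEGER, in_stock TEXT, product_id INTEGER, product_name TEXT)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [(1, "N", 12, "Orange"), (1, "Y", 12, "Pear"), (2, "N", 12, "Apple"), (2, "N", 12, "Lemon")],
)

# Variant 1: one row per set, preferring 'Y' ('Y' sorts after 'N', so DESC puts it first).
ANY_ROW = """
SELECT product_set, product_name
FROM (SELECT p.*,
             ROW_NUMBER() OVER (PARTITION BY product_set
                                ORDER BY in_stock DESC, product_name) AS RID
      FROM Products AS p) AS a
WHERE a.RID = 1
ORDER BY product_set
"""

# Variant 2: only sets that have at least one in-stock row.
IN_STOCK_ONLY = """
SELECT product_set, product_name
FROM (SELECT p.*,
             ROW_NUMBER() OVER (PARTITION BY product_set ORDER BY product_name) AS RID
      FROM Products AS p
      WHERE in_stock = 'Y') AS a
WHERE a.RID = 1
ORDER BY product_set
"""

any_row = conn.execute(ANY_ROW).fetchall()
in_stock_only = conn.execute(IN_STOCK_ONLY).fetchall()
print(any_row)
print(in_stock_only)
```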
SELECT ISNULL(
(
SELECT TOP 1 product_name
FROM Products
WHERE Products.product_set = Parent_Table.product_set
AND Products.in_stock = 'Y'
), 'Not in the stock') AS 'Product Name'

SQL get value from next row

I'm looking for an SQL way to get the value from the next row.
The data I have looks like:
CUST PROD From_Qty Disc_Pct
23 Brush 1 0
23 Brush 13 1
23 Brush 52 4
77 Paint 1 0
77 Paint 22 7
What I need to end up with is this (I want to create the To_Qty column):
CUST PROD From_Qty To_Qty Disc_Pct
23 Brush 1 12 0
23 Brush 13 51 1 #13 is 12+1
23 Brush 52 99999 4 #52 is 51+1
77 Paint 1 21 0 #1 is 99999+1
77 Paint 22 99999 7 #22 is 21+1
I've got 100K+ rows to do this to, and it has to be SQL because my ETL application allows SQL but not stored procedures etc.
How can I get the value from the next row so I can create To_Qty?
SELECT *,
LEAD([From_Qty], 1, 100000) OVER (PARTITION BY [CUST] ORDER BY [From_Qty]) - 1 AS To_Qty
FROM myTable
LEAD() gets the next value in [From_Qty] order, and PARTITION BY [CUST] makes it restart whenever [CUST] changes.
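A quick check of the LEAD() query in SQLite from Python, using the sample rows; myTable is the answer's placeholder name. The third argument to LEAD is the default returned when there is no next row, which is how the last row of each customer ends up at 100000 - 1 = 99999.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (CUST INTEGER, PROD TEXT, From_Qty INTEGER, Disc_Pct INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [(23, "Brush", 1, 0), (23, "Brush", 13, 1), (23, "Brush", 52, 4),
     (77, "Paint", 1, 0), (77, "Paint", 22, 7)],
)

QUERY = """
SELECT CUST, PROD, From_Qty,
       -- Next From_Qty for the same customer, minus 1; 100000 when there is none.
       LEAD(From_Qty, 1, 100000) OVER (PARTITION BY CUST ORDER BY From_Qty) - 1 AS To_Qty,
       Disc_Pct
FROM myTable
ORDER BY CUST, From_Qty
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```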
Alternatively, you can use a CTE with ROW_NUMBER():
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [CUST] ORDER BY [From_Qty]) Rn
FROM myTable
)
SELECT t1.*,
ISNULL(t2.From_Qty - 1, 99999) To_Qty
FROM cte t1
LEFT JOIN cte t2 ON t1.Cust = t2.Cust AND t1.Rn + 1 = t2.Rn
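The ROW_NUMBER() self-join can be exercised the same way. This sketch uses COALESCE, which both SQL Server and SQLite support, in place of ISNULL, and it yields the same To_Qty values as LEAD() on the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (CUST INTEGER, PROD TEXT, From_Qty INTEGER, Disc_Pct INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [(23, "Brush", 1, 0), (23, "Brush", 13, 1), (23, "Brush", 52, 4),
     (77, "Paint", 1, 0), (77, "Paint", 22, 7)],
)

# Number rows per customer, then join each row to the one numbered Rn + 1.
QUERY = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY CUST ORDER BY From_Qty) AS Rn
    FROM myTable
)
SELECT t1.CUST, t1.PROD, t1.From_Qty,
       COALESCE(t2.From_Qty - 1, 99999) AS To_Qty,
       t1.Disc_Pct
FROM cte AS t1
LEFT JOIN cte AS t2 ON t1.CUST = t2.CUST AND t1.Rn + 1 = t2.Rn
ORDER BY t1.CUST, t1.From_Qty
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```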
SELECT
CUST,
PROD,
FROM_QTY,
COALESCE(MIN(FROM_QTY) OVER (PARTITION BY CUST, PROD ORDER BY FROM_QTY DESC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 100000) - 1 AS TO_QTY,
DISC_PCT
FROM <tablename>
ORDER BY CUST, PROD, FROM_QTY
If you are running SQL Server 2012 or later versions, you can use the LAG and LEAD functions for accessing prior or subsequent rows along with the current row.
You can use the LEAD and FIRST_VALUE analytic functions to generate the result you described. LEAD() retrieves the next value within the customer group, and FIRST_VALUE() returns the first value within the customer group.
For example, for the first row of CUST = 23, LEAD returns 13 and FIRST_VALUE returns 1, so TO_QTY = LEAD - FIRST_VALUE = 13 - 1 = 12. The same formula is computed for every row in your table. Note that subtracting FIRST_VALUE gives the same result as subtracting 1 only because every customer's first From_Qty is 1 in this data; if a group can start at another value, use LEAD(FROM_QTY, 1) - 1 instead.
SELECT CUST,
PROD,
FROM_QTY,
CASE WHEN LEAD( FROM_QTY,1 ) OVER ( PARTITION BY CUST ORDER BY FROM_QTY ) IS NOT NULL
THEN
LEAD( FROM_QTY,1 ) OVER ( PARTITION BY CUST ORDER BY FROM_QTY ) -
FIRST_VALUE( FROM_QTY ) OVER ( PARTITION BY CUST ORDER BY FROM_QTY )
ELSE 99999
END AS TO_QTY,
DISC_PCT
FROM Yourtable;
Insert the data into a temp table with the same columns, plus a To_Qty column and an auto-increment id column (indexfield below). Insert the rows in order; I'm assuming by cust, prod, then from_qty.
Now you can run an UPDATE statement on the temp table:
UPDATE #mytable
SET To_Qty = (SELECT nxt.From_Qty - 1 FROM #mytable AS nxt WHERE nxt.indexfield = #mytable.indexfield + 1 AND nxt.cust = #mytable.cust AND nxt.prod = #mytable.prod)
Then run another UPDATE that sets To_Qty to 99999, using a NOT EXISTS clause, for the rows that have no next row.
Then insert the data back to your new or modified table.
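The temp-table approach, which avoids window functions entirely, sketched in SQLite from Python. Assumptions: the table name mytable_tmp stands in for #mytable, an INTEGER PRIMARY KEY column plays the role of the auto-increment indexfield, and SQLite assigns those keys in the order rows arrive from INSERT ... SELECT ... ORDER BY. The source rows are deliberately shuffled to show that the ordered insert is what lines them up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (CUST INTEGER, PROD TEXT, From_Qty INTEGER, Disc_Pct INTEGER);
    INSERT INTO source VALUES
        (77, 'Paint', 22, 7), (23, 'Brush', 13, 1), (23, 'Brush', 1, 0),
        (77, 'Paint', 1, 0), (23, 'Brush', 52, 4);

    -- indexfield is assigned 1, 2, 3, ... as the ordered rows are inserted.
    CREATE TABLE mytable_tmp (
        indexfield INTEGER PRIMARY KEY,
        CUST INTEGER, PROD TEXT, From_Qty INTEGER, Disc_Pct INTEGER, To_Qty INTEGER
    );
    INSERT INTO mytable_tmp (CUST, PROD, From_Qty, Disc_Pct)
    SELECT CUST, PROD, From_Qty, Disc_Pct FROM source ORDER BY CUST, PROD, From_Qty;

    -- Step 1: next row's From_Qty - 1 (NULL when the next row is another group).
    UPDATE mytable_tmp
    SET To_Qty = (SELECT nxt.From_Qty - 1 FROM mytable_tmp AS nxt
                  WHERE nxt.indexfield = mytable_tmp.indexfield + 1
                    AND nxt.CUST = mytable_tmp.CUST AND nxt.PROD = mytable_tmp.PROD);

    -- Step 2: the last row of each group gets 99999.
    UPDATE mytable_tmp
    SET To_Qty = 99999
    WHERE NOT EXISTS (SELECT 1 FROM mytable_tmp AS nxt
                      WHERE nxt.indexfield = mytable_tmp.indexfield + 1
                        AND nxt.CUST = mytable_tmp.CUST AND nxt.PROD = mytable_tmp.PROD);
    """
)

rows = conn.execute(
    "SELECT CUST, PROD, From_Qty, To_Qty, Disc_Pct FROM mytable_tmp ORDER BY indexfield"
).fetchall()
for row in rows:
    print(row)
```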
declare @Table table(CUST int, PROD varchar(50), From_Qty int, Disc_Pct int) -- table variables use @; # names a temp table
insert into @Table values
(23, 'Brush', 1, 0)
,(23, 'Brush', 13, 1)
,(23, 'Brush', 52, 4)
,(77, 'Paint', 1, 0)
,(77, 'Paint', 22, 7)
SELECT CUST, Prod, From_qty,
LEAD(From_Qty,1,100000) OVER(PARTITION BY cust ORDER BY from_qty)-1 AS To_Qty,
Disc_Pct
FROM @Table