How to build SQL to capture most unique value?

I am trying to build a query result with SQL. Here is my table:
CUST_ID ORDER_ID STORE_FREQUENCY
---------- ----------- ---------------
100 20122 500
100 20100 500
100 20100 737
200 20119 287
300 20130 434
300 20150 434
300 20130 434
300 20120 120
The expected output is:
CUST_ID UNIQUE_ORDERS TOP_STORE
--------- ----------------- ---------
100 2 737
200 1 287
300 3 434
The requirement for the output is:
TOP_STORE = Per CUST_ID, sort the STORE_FREQUENCY column by DESC and get the greatest store frequency
UNIQUE_ORDERS = Per CUST_ID, the number of unique ORDER_IDs in the column
I have started this SELECT statement, but I am having difficulty completing it so that it includes both columns correctly:
Select cust_id, Count(order_id) as unique_orders
From ORDERS_TABLE
Group By Order_ID
Can you help me complete the 2 columns?

Use aggregate functions such as COUNT(DISTINCT ...) and MAX():
SELECT CUST_ID, COUNT(DISTINCT ORDER_ID) AS UNIQUE_ORDERS, MAX(STORE_FREQUENCY) AS TOP_STORE
FROM TableName
GROUP BY CUST_ID
Here's a DEMO.
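As a quick check, the aggregate query can be run against the sample rows from the question. This is only a sketch: it uses Python's bundled sqlite3 module as a stand-in database (an assumption, since the question does not name a DBMS) and reuses the table name ORDERS_TABLE from the question.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_table (cust_id INT, order_id INT, store_frequency INT)"
)
conn.executemany(
    "INSERT INTO orders_table VALUES (?, ?, ?)",
    [
        (100, 20122, 500), (100, 20100, 500), (100, 20100, 737),
        (200, 20119, 287),
        (300, 20130, 434), (300, 20150, 434), (300, 20130, 434), (300, 20120, 120),
    ],
)

# One row per customer: distinct order count and the largest STORE_FREQUENCY.
rows = conn.execute("""
    SELECT cust_id,
           COUNT(DISTINCT order_id) AS unique_orders,
           MAX(store_frequency)     AS top_store
    FROM orders_table
    GROUP BY cust_id
    ORDER BY cust_id
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this yields the three rows from the expected output in the question: 2 orders and 737 for customer 100, 1 and 287 for customer 200, 3 and 434 for customer 300.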

It seems to me that the top store should be the store with the greatest number of orders. If so, then CUST_ID 100 should have store 500 as the top store, not 737. In other words, I would expect TOP_STORE to be 500 for CUST_ID 100, with the other two rows unchanged.
This requirement changes the query strategy, because we can no longer just do a single simple aggregation over the entire table. One approach is to do a separate calculation to find the top store for each customer, then join that result to a query similar to the other answers.
WITH cte AS (
SELECT CUST_ID, STORE_FREQUENCY, cnt,
ROW_NUMBER() OVER (PARTITION BY CUST_ID ORDER BY cnt DESC) rn
FROM
(
SELECT CUST_ID, STORE_FREQUENCY,
COUNT(*) OVER (PARTITION BY CUST_ID, STORE_FREQUENCY) cnt
FROM yourTable
) t
)
SELECT
t1.CUST_ID,
t1.UNIQUE_ORDERS,
t2.TOP_STORE
FROM
(
SELECT CUST_ID, COUNT(DISTINCT ORDER_ID) AS UNIQUE_ORDERS
FROM yourTable
GROUP BY CUST_ID
) t1
INNER JOIN
(
SELECT CUST_ID, STORE_FREQUENCY AS TOP_STORE
FROM cte
WHERE rn = 1
) t2
ON t1.CUST_ID = t2.CUST_ID;
Demo
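The "most frequent store" interpretation can be tried on the question's sample rows with the sketch below. It is a condensed variant of the query above, not the same statement: it counts rows per store with GROUP BY and then ranks them. The sqlite3 harness, the table name your_table, and the tie-breaker on STORE_FREQUENCY are assumptions added for illustration (without a tie-breaker, ROW_NUMBER picks arbitrarily among stores with equal counts). Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (cust_id INT, order_id INT, store_frequency INT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (100, 20122, 500), (100, 20100, 500), (100, 20100, 737),
        (200, 20119, 287),
        (300, 20130, 434), (300, 20150, 434), (300, 20130, 434), (300, 20120, 120),
    ],
)

rows = conn.execute("""
    WITH store_counts AS (
        -- How many rows each store has per customer.
        SELECT cust_id, store_frequency, COUNT(*) AS cnt
        FROM your_table
        GROUP BY cust_id, store_frequency
    ),
    ranked AS (
        -- rn = 1 marks the most frequent store; the second sort key is an
        -- assumed tie-breaker so the result is deterministic.
        SELECT cust_id, store_frequency,
               ROW_NUMBER() OVER (
                   PARTITION BY cust_id
                   ORDER BY cnt DESC, store_frequency DESC
               ) AS rn
        FROM store_counts
    )
    SELECT o.cust_id,
           COUNT(DISTINCT o.order_id) AS unique_orders,
           r.store_frequency          AS top_store
    FROM your_table o
    JOIN ranked r ON r.cust_id = o.cust_id AND r.rn = 1
    GROUP BY o.cust_id, r.store_frequency
    ORDER BY o.cust_id
""").fetchall()

for row in rows:
    print(row)
```

Customer 100 now gets store 500 (two rows) rather than 737 (one row), while customers 200 and 300 keep 287 and 434.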

Related

Partition Over issue in SQL

I have an order shipment table like the one below -
Order_ID shipment_id pkg_weight
-------- ----------- ----------
1        101         5
1        101         5
1        101         5
1        102         3
1        102         3
I want the output table to look like below -
Order_ID Distinct_shipment_id total_pkg_weight
-------- -------------------- ----------------
1        2                    8
select
order_id
, count(distinct(shipment_id)
, avg(pkg_weight) over (partition by shipment_id)
from table1
group by order_id
but I am getting the error below -
column "pkg_weight" must appear in the GROUP BY clause or be used in an aggregate function
Please help
Use a distinct select first, then aggregate:
SELECT Order_ID,
COUNT(DISTINCT shipment_id) AS Distinct_shipment_id,
SUM(pkg_weight) AS total_pkg_weight
FROM
(
SELECT DISTINCT Order_ID, shipment_id, pkg_weight
FROM table1
) t
GROUP BY Order_ID;
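To see why the DISTINCT step matters, the sketch below runs the query on the five sample rows. The sqlite3 harness is an assumption used only to make the example executable; the table and column names come from the question. The approach relies on each shipment_id carrying a single pkg_weight, as in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (order_id INT, shipment_id INT, pkg_weight INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 101, 5), (1, 101, 5), (1, 101, 5), (1, 102, 3), (1, 102, 3)],
)

# The inner DISTINCT collapses the repeated rows to one row per shipment,
# so SUM adds each shipment's weight once (5 + 3) instead of 5+5+5+3+3.
rows = conn.execute("""
    SELECT order_id,
           COUNT(DISTINCT shipment_id) AS distinct_shipment_id,
           SUM(pkg_weight)             AS total_pkg_weight
    FROM (
        SELECT DISTINCT order_id, shipment_id, pkg_weight
        FROM table1
    ) AS t
    GROUP BY order_id
""").fetchall()

print(rows)
```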

Compare column value with previous record in Oracle

My oracle table data is as below.
Org_ID Product_ID Order_Month Amount
------ ---------- ----------- ------
101    201        JAN-2021    2000
101    201        FEB-2021    2000
101    201        MAR-2021    2000
101    201        APR-2021    1500
101    201        MAY-2021    2000
101    202        JUN-2021    2000
101    202        JUL-2021    2000
We need to compare each amount with the previous value, per Product_ID, and find the records where the amount changes.
My output should look like the one below. I tried using LAG but couldn't find a solution. Can someone please suggest how to approach this?
Org_ID Product_ID Order_Month          Amount
------ ---------- -------------------- ------
101    201        JAN-2021 to MAR-2021 2000
101    201        APR-2021 to APR-2021 1500
101    201        MAY-2021 to MAY-2021 1500
101    202        JUN-2021 to JUL-2021 1500
You may try the following
SELECT
"Org_ID",
"Product_ID",
CONCAT(
CONCAT(Order_Month_Group,' to '),
TO_CHAR(MAX(actual_date),'MON-YYYY')
) as Order_Month,
"Amount"
FROM (
SELECT
t1.*,
LAG(
"Order_Month",
CASE WHEN continued=0 THEN 0 ELSE seq_num-1 END
,"Order_Month") OVER (
PARTITION BY "Org_ID","Product_ID","Amount"
ORDER BY actual_date
) as Order_Month_Group
FROM (
SELECT
t.*,
TO_DATE(t."Order_Month",'MON-YYYY') as actual_date,
ROW_NUMBER() OVER (
PARTITION BY t."Org_ID",t."Product_ID",t."Amount"
ORDER BY TO_DATE("Order_Month",'MON-YYYY')
) as seq_num,
CASE
WHEN t."Amount" = LAG(t."Amount",1,t."Amount") OVER (
PARTITION BY t."Org_ID",t."Product_ID"
ORDER BY TO_DATE("Order_Month",'MON-YYYY')
) THEN 1
ELSE 0
END as continued
FROM
my_oracle_table t
) t1
) t2
GROUP BY "Org_ID", "Product_ID", Order_Month_Group, "Amount"
ORDER BY MIN(actual_date)
or
SELECT
"Org_ID",
"Product_ID",
CONCAT(
CONCAT(TO_CHAR(MIN(actual_date),'MON-YYYY'),' to '),
TO_CHAR(MAX(actual_date),'MON-YYYY')
) as Order_Month,
"Amount"
FROM (
SELECT
t1.*,
SUM(continued) OVER ( ORDER BY actual_date) as grp
FROM (
SELECT
t.*,
TO_DATE("Order_Month",'MON-YYYY') as actual_date,
CASE
WHEN t."Amount" = LAG(t."Amount",1,t."Amount") OVER (
PARTITION BY t."Org_ID",t."Product_ID"
ORDER BY TO_DATE("Order_Month",'MON-YYYY')
) THEN 0
ELSE 1
END as continued
FROM
my_oracle_table t
) t1
) t2
GROUP BY "Org_ID", "Product_ID", grp, "Amount"
ORDER BY MIN(actual_date)
View Demo on DB Fiddle
Let me know if this works for you.
This is a type of gaps-and-islands problem. In this case, the simplest solution is probably the difference of row numbers. The following assumes that order_month is actually a string (rather than a date):
select org_id, product_id, amount,
       -- compare as dates: min/max on the 'MON-YYYY' strings would sort alphabetically
       min(to_date(order_month, 'MON-YYYY')) as first_month,
       max(to_date(order_month, 'MON-YYYY')) as last_month
from (select t.*,
             row_number() over (partition by org_id, product_id order by to_date(order_month, 'MON-YYYY')) as seqnum,
             row_number() over (partition by org_id, product_id, amount order by to_date(order_month, 'MON-YYYY')) as seqnum_2
      from t
     ) t
group by org_id, product_id, amount, (seqnum - seqnum_2);
Why this works is a little tricky to explain. However, if you look at the results of the subquery, you will see how the difference between these two values is constant on adjacent months.
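The sketch below makes the subquery results visible for the sample data. It is an illustration under assumptions: it runs on SQLite via Python's sqlite3 module (3.25 or later for window functions), and because SQLite has no TO_DATE it stores each month in a hypothetical month_start column as an ISO date string, which sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# month_start is a hypothetical ISO-date stand-in for the 'MON-YYYY' strings.
conn.execute(
    "CREATE TABLE t (org_id INT, product_id INT, month_start TEXT, amount INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (101, 201, "2021-01-01", 2000), (101, 201, "2021-02-01", 2000),
        (101, 201, "2021-03-01", 2000), (101, 201, "2021-04-01", 1500),
        (101, 201, "2021-05-01", 2000), (101, 202, "2021-06-01", 2000),
        (101, 202, "2021-07-01", 2000),
    ],
)

# The two row numbers: one per product, one per product and amount.
NUMBERED = """
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY org_id, product_id
                              ORDER BY month_start) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY org_id, product_id, amount
                              ORDER BY month_start) AS seqnum_2
    FROM t
"""

# The difference stays constant while the amount repeats on adjacent months.
diffs = conn.execute(f"""
    SELECT month_start, amount, seqnum - seqnum_2 AS diff
    FROM ({NUMBERED})
    ORDER BY month_start
""").fetchall()

# Grouping by that difference turns each run of equal amounts into one row.
islands = conn.execute(f"""
    SELECT org_id, product_id, amount,
           MIN(month_start) AS first_month,
           MAX(month_start) AS last_month
    FROM ({NUMBERED})
    GROUP BY org_id, product_id, amount, seqnum - seqnum_2
    ORDER BY first_month
""").fetchall()

for row in diffs:
    print(row)
for row in islands:
    print(row)
```

For product 201 the differences are 0, 0, 0 for January to March, 3 for April and 1 for May. The May row has the same amount as January to March but a different difference, so it forms its own group.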

SQL sum grouped by field with all rows

I have this table:
id sale_id price
-------------------
1 1 100
2 1 200
3 2 50
4 3 50
I want this result:
id sale_id price sum(price by sale_id)
------------------------------------------
1 1 100 300
2 1 200 300
3 2 50 50
4 3 50 50
I tried this:
SELECT id, sale_id, price,
(SELECT sum(price) FROM sale_lines GROUP BY sale_id)
FROM sale_lines
But get the error that subquery returns different number of rows.
How can I do it?
I want all the rows of sale_lines table selecting all fields and adding the sum(price) grouped by sale_id.
You can use a window function:
SELECT sl.id, sl.sale_id, sl.price,
       sum(sl.price) over (partition by sl.sale_id) as sum
FROM sale_lines sl;
If you want a subquery, then you need to correlate it:
SELECT sl.id, sl.sale_id, sl.price,
(SELECT sum(sll.price)
FROM sale_lines sll
WHERE sl.sale_id = sll.sale_id
)
FROM sale_lines sl;
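A small check that the window function and the correlated subquery return the same rows on the question's data. The sqlite3 harness and the alias price_sum are assumptions for illustration; window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_lines (id INT, sale_id INT, price INT)")
conn.executemany(
    "INSERT INTO sale_lines VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 200), (3, 2, 50), (4, 3, 50)],
)

# Window function: every row keeps its own columns plus the per-sale total.
windowed = conn.execute("""
    SELECT id, sale_id, price,
           SUM(price) OVER (PARTITION BY sale_id) AS price_sum
    FROM sale_lines
    ORDER BY id
""").fetchall()

# Correlated subquery: the inner query is re-evaluated for each outer row.
correlated = conn.execute("""
    SELECT sl.id, sl.sale_id, sl.price,
           (SELECT SUM(sll.price)
            FROM sale_lines sll
            WHERE sll.sale_id = sl.sale_id) AS price_sum
    FROM sale_lines sl
    ORDER BY sl.id
""").fetchall()

print(windowed)
print(windowed == correlated)
```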
Don't use GROUP BY in the subquery; make it a correlated subquery:
SELECT sl1.id, sl1.sale_id, sl1.price,
(SELECT sum(sl2.price) FROM sale_lines sl2 where sl2.sale_id = sl1.sale_id) as total
FROM sale_lines sl1
In addition to the other approaches, you can use CROSS APPLY to get the sum. The applied subquery needs an alias:
SELECT ot.id, ot.sale_id, ot.price, s.Price_Sum
FROM YourTable AS ot
CROSS APPLY
(SELECT SUM(it.price) AS Price_Sum
FROM YourTable AS it
WHERE it.sale_id = ot.sale_id) AS s;
SELECT t1.*,
total_price
FROM `sale_lines` AS t1
JOIN(SELECT Sum(price) AS total_price,
sale_id
FROM sale_lines
GROUP BY sale_id) AS t2
ON t1.sale_id = t2.sale_id

Fetch row with max occurrence in Oracle

I have a table like:
SALES
PROD_CODE SALE_ID
321 30
123 67
321 46
321 82
123 48
321 91
For the code:
SELECT PROD_CODE, COUNT(SALE_ID) AS TOTAL_SALES
FROM SALES
GROUP BY PROD_CODE
ORDER BY COUNT(SALE_ID) DESC;
The output is:
PROD_CODE TOTAL_SALES
321 4
123 2
But I am expecting only the prod_code with the maximum number of sales as the output,
like:
PROD_CODE
321
For the code:
SELECT PROD_CODE
FROM (SELECT MAX(COUNT(SALE_ID)) FROM SALES
GROUP BY SALE_ID);
The code isn't working!
In Oracle 12c+, you can do:
select s.prod_code
from sales s
group by s.prod_code
order by count(*) desc
fetch first 1 row only;
In earlier versions, either
select s.*
from (select s.prod_code
      from sales s
      group by s.prod_code
      order by count(*) desc
     ) s
where rownum = 1;
Or:
select max(prod_code) keep (dense_rank first order by cnt desc)
from (select s.prod_code, count(*) as cnt
      from sales s
      group by s.prod_code
     ) s
The first two versions fetch the entire row. You can limit it to one or more columns if that is all you want.
You could use the STATS_MODE function to fetch the value with the maximum occurrence.
Here is the detailed documentation for this function: https://docs.oracle.com/database/121/SQLRF/functions188.htm#SQLRF06320

how to use same column twice with different criteria with one common column in sql

I have a table
ID P_ID Cost
1 101 1000
2 101 1050
3 101 1100
4 102 5000
5 102 2000
6 102 6000
7 103 3000
8 103 5000
9 103 4000
I want to use the 'Cost' column twice to fetch the first and last inserted cost for each P_ID.
I want output as:
P_ID First_Cost Last_Cost
101 1000 1100
102 5000 6000
103 3000 4000
;WITH t AS
(
SELECT P_ID, Cost,
f = ROW_NUMBER() OVER (PARTITION BY P_ID ORDER BY ID),
l = ROW_NUMBER() OVER (PARTITION BY P_ID ORDER BY ID DESC)
FROM dbo.tablename
)
SELECT t.P_ID, t.Cost, t2.Cost
FROM t INNER JOIN t AS t2
ON t.P_ID = t2.P_ID
WHERE t.f = 1 AND t2.l = 1;
In SQL Server 2012 and later, you can use FIRST_VALUE():
SELECT DISTINCT
P_ID,
FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID),
FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID DESC)
FROM dbo.tablename;
You get a slightly more favorable plan if you remove the DISTINCT and instead use ROW_NUMBER() with the same partitioning to eliminate multiple rows with the same P_ID:
;WITH t AS
(
SELECT
P_ID,
f = FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID),
l = FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID DESC),
r = ROW_NUMBER() OVER (PARTITION BY P_ID ORDER BY ID)
FROM dbo.tablename
)
SELECT P_ID, f, l FROM t WHERE r = 1;
Why not LAST_VALUE(), you ask? Well, it doesn't work like you might expect. For more details, see the comments under the documentation.
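One commonly cited reason for the surprise is the default window frame. This is an assumption about what the answer has in mind, since it does not say. When a window has an ORDER BY and no explicit frame, the frame ends at the current row, so LAST_VALUE() returns the current row's value. The sketch below shows this on the question's data, using SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for SQL Server, which applies the same default frame.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INT, p_id INT, cost INT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        (1, 101, 1000), (2, 101, 1050), (3, 101, 1100),
        (4, 102, 5000), (5, 102, 2000), (6, 102, 6000),
        (7, 103, 3000), (8, 103, 5000), (9, 103, 4000),
    ],
)

rows = conn.execute("""
    SELECT id, cost,
           -- Default frame: from the partition start up to the current row.
           LAST_VALUE(cost) OVER (PARTITION BY p_id ORDER BY id) AS default_frame,
           -- Explicit frame covering the whole partition.
           LAST_VALUE(cost) OVER (PARTITION BY p_id ORDER BY id
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS full_frame
    FROM tablename
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

With the default frame, the third column simply repeats each row's own cost. Only the explicit frame returns the last cost per P_ID (1100, 6000 and 4000).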
SELECT t.P_ID,
SUM(CASE WHEN ID = t.minID THEN Cost ELSE 0 END) as FirstCost,
SUM(CASE WHEN ID = t.maxID THEN Cost ELSE 0 END) as LastCost
FROM myTable
JOIN (
SELECT P_ID, MIN(ID) as minID, MAX(ID) as maxID
FROM myTable
GROUP BY P_ID) t ON myTable.ID IN (t.minID, t.maxID)
GROUP BY t.P_ID
Admittedly, @AaronBertrand's approach is cleaner here. However, this solution will work on older versions of SQL Server (that don't support CTEs or window functions), or on pretty much any other DBMS.
Do you want first and last in terms of Min and Max, or do you want which one was entered first and which one was entered last? If you want Min and max you can group by.
SELECT P_ID, MIN(Cost), MAX(Cost) FROM table_name GROUP BY P_ID
I believe this does your thing also, just without self joins or subqueries:
SELECT DISTINCT
P_ID
,MIN(Cost) OVER (PARTITION BY P_ID) as FirstCost
,MAX(Cost) OVER (PARTITION BY P_ID) as LastCost
FROM Table