How do I stop duplication on a SQL join when an order_id has more than one item (multiple product_ids), so the order discount is counted only once? - sql

My discount total is inflated because each order has a single discount for the entire order, but I am building a dataset with one line per product in the order. Instead of the discount applying once per order, it is repeated on every line.
What is happening:
order_id  product_id  quantity  amount  discount
1         a           1         5       0
2         a           1         5       7
2         b           1         10      7
3         a           1         5       5
3         b           1         10      5
3         c           1         15      5
What I want:
order_id  product_id  quantity  amount  discount
1         a           1         5       0
2         a           1         5       7
2         b           1         10      0
3         a           1         5       5
3         b           1         10      0
3         c           1         15      0
I just want the discount to be applied once per order. My join uses order_id, which is why the discount is applied multiple times. I would attach my code, but it's a decent-sized CTE.

Figured it out. I did need to use ROW_NUMBER() OVER (PARTITION BY order_id), but I was also losing records if the order had more than one item. The solution was to use a CASE WHEN expression:
CASE WHEN ORDER_ROW_COUNT = 1 THEN DISCOUNT ELSE 0 END
This allowed me to keep the records without duplicating the discounts.
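A minimal sketch of that fix, run against an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The orders and order_items tables and their columns are hypothetical stand-ins for the CTE that was not posted; only the ROW_NUMBER plus CASE WHEN pattern comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one discount per order, one line per product.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, discount INTEGER);
    CREATE TABLE order_items (order_id INTEGER, product_id TEXT,
                              quantity INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 0), (2, 7), (3, 5);
    INSERT INTO order_items VALUES
        (1, 'a', 1, 5),
        (2, 'a', 1, 5), (2, 'b', 1, 10),
        (3, 'a', 1, 5), (3, 'b', 1, 10), (3, 'c', 1, 15);
""")

query = """
    WITH numbered AS (
        -- Number the lines inside each order; the join still repeats the discount here.
        SELECT i.order_id, i.product_id, i.quantity, i.amount, o.discount,
               ROW_NUMBER() OVER (PARTITION BY i.order_id
                                  ORDER BY i.product_id) AS order_row_count
        FROM order_items AS i
        JOIN orders AS o ON o.order_id = i.order_id
    )
    -- Keep every line, but only the first line of each order carries the discount.
    SELECT order_id, product_id, quantity, amount,
           CASE WHEN order_row_count = 1 THEN discount ELSE 0 END AS discount
    FROM numbered
    ORDER BY order_id, product_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# Summing the discount column now counts each order's discount once.
total_discount = sum(row[4] for row in rows)
```

Filtering on order_row_count = 1 in a WHERE clause would drop the other product lines, which is the record loss described above; the CASE expression keeps all six rows.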

You're joining on a field that isn't unique, so the join returns every record for that order ID and the discount is applied to all of them. You need some sort of differentiator field: something that is unique within each order's data set.
Example:
Select *, row_number () over(partition by order_id order by order_id) as rownumber into #temp from table
This should give every row within an order its own number, starting at 1.
Then join on order_id = order_id and rownumber = 1, and this would only update the first record for each order.

Related

Merge row values based on other column value

I'm trying to merge the values of two rows based on the value of another column. Below is my base table:
Customer ID  Property ID  Bookings per customer  Cancellations per customer
A            1            0                      1
B            2            10                     1
C            3            100                    1
C            4            100                    1
D            5            20                     1
Here is the SQL query I used
select customer_id, property_id, bookings_per_customer, cancellations_per_customer
from table
And this is what I want to see. Any ideas what the query to get this would be? We use Presto SQL.
Thanks!
Customer ID  Property ID  Bookings per customer  Cancellations per customer
A            1            0                      1
B            2            10                     1
C            3, 4         100                    1
D            5            20                     1
We can try:
SELECT
customer_id,
ARRAY_JOIN(ARRAY_AGG(property_id), ',') AS properties,
bookings_per_customer,
cancellations_per_customer
FROM yourTable
GROUP BY
customer_id,
bookings_per_customer,
cancellations_per_customer;
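
To try the grouping idea without a Presto cluster, here is a small sketch on SQLite through Python's sqlite3 module. SQLite has no ARRAY_AGG or ARRAY_JOIN, so GROUP_CONCAT is swapped in as a stand-in; the GROUP BY shape is the same as in the query above, and the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (customer_id TEXT, property_id INTEGER,
                            bookings_per_customer INTEGER,
                            cancellations_per_customer INTEGER);
    INSERT INTO yourTable VALUES
        ('A', 1, 0, 1), ('B', 2, 10, 1), ('C', 3, 100, 1),
        ('C', 4, 100, 1), ('D', 5, 20, 1);
""")

rows = conn.execute("""
    SELECT customer_id,
           GROUP_CONCAT(property_id, ',') AS properties,  -- SQLite stand-in for ARRAY_JOIN(ARRAY_AGG(...), ',')
           bookings_per_customer,
           cancellations_per_customer
    FROM yourTable
    GROUP BY customer_id, bookings_per_customer, cancellations_per_customer
    ORDER BY customer_id
""").fetchall()

for row in rows:
    print(row)
```

Customer C collapses into a single row holding both property IDs. The order of the IDs inside the joined string is not guaranteed by this sketch.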

Select column's occurrence order without group by

I currently have two tables, users and coupons:
users
id  first_name
1   Roberta
2   Oliver
3   Shayna
4   Fechin
coupons
id  discount  user_id
1   20%       1
2   40%       2
3   15%       3
4   30%       1
5   10%       1
6   70%       4
What I want to do is select from the coupons table until I've selected X users.
So if I chose X = 2, the resulting table would be:
id  discount  user_id
1   20%       1
2   40%       2
4   30%       1
5   10%       1
I've tried using both dense_rank and row_number, but they return the count of occurrences of each user_id, not its order.
SELECT id,
discount,
user_id,
dense_rank() OVER (PARTITION BY user_id)
FROM coupons
I'm guessing I need to do it in multiple subqueries (which is fine) where the first subquery would return something like
id  discount  user_id  order_of_occurence
1   20%       1        1
2   40%       2        2
3   15%       3        3
4   30%       1        1
5   10%       1        1
6   70%       4        4
which I can then use to filter by what I need.
PS: I'm using postgresql.
You've stated that you want to parameterize the query so that you can retrieve X users. I'm reading that as all coupons for the first X distinct user_ids in coupon id column order.
It appears your attempt was close. dense_rank() is the right idea. Since the ranking has to look over the entire table, you can't use partition by in it, and a sorting column is also required to determine the ranking. Because id is unique per coupon, ranking by id alone would give every coupon its own rank, so rank by each user's first coupon id instead:
with firsts as (
    select *,
        min(id) over (partition by user_id) as first_id
    from coupons
),
data as (
    select *,
        dense_rank() over (order by first_id) as dr
    from firsts
)
select * from data where dr <= <X>;
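
Here is a runnable sketch of the dense_rank() idea using the sample coupons, executed on SQLite through Python's sqlite3 module rather than PostgreSQL (the window functions used behave the same way for this case, as far as I know). It assumes "order of occurrence" means the id of each user's first coupon; the helper name coupons_for_first_users is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coupons (id INTEGER PRIMARY KEY, discount TEXT, user_id INTEGER);
    INSERT INTO coupons VALUES
        (1, '20%', 1), (2, '40%', 2), (3, '15%', 3),
        (4, '30%', 1), (5, '10%', 1), (6, '70%', 4);
""")

def coupons_for_first_users(x):
    """Return every coupon of the first x users, ordered by their first coupon id."""
    return conn.execute("""
        WITH firsts AS (
            -- Tag each coupon with the id of its user's earliest coupon.
            SELECT id, discount, user_id,
                   MIN(id) OVER (PARTITION BY user_id) AS first_id
            FROM coupons
        ),
        ranked AS (
            -- Users sharing a first_id share a rank; ranks have no gaps.
            SELECT id, discount, user_id,
                   DENSE_RANK() OVER (ORDER BY first_id) AS dr
            FROM firsts
        )
        SELECT id, discount, user_id FROM ranked WHERE dr <= ? ORDER BY id
    """, (x,)).fetchall()

result = coupons_for_first_users(2)
for row in result:
    print(row)
```

With X = 2 this keeps coupons 1, 2, 4 and 5, which is the table asked for in the question.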

SQL - Counting over several groups

I have a list of transactions where the IDs are repeated and I have the quantity of items being bought. I need to count the number of times that a particular number of items was purchased at once.
Row  ItmNBR  TQTY
1    123     5
2    123     5
3    123     5
3    456     25
4    456     19
I need to produce an output like this:
ItmNBR  QTY  Occurrence
123     5    3
123     19   1
123     25   1
I can get the first two columns of my result, but when I attempt to count over a partition I end up with repeating numbers. Since I'm only looking up 9 items, for now I just count the number of rows in which the CNT is the same.
TOT_IVO_ITM_QTY
Count(*) OVER (PARTITION BY QTY) AS CNT
FROM dataset
WHERE YEAR(bus_dt) = 2021
AND ITM_NBR IN (12639,12940,12949,12955,13485,13666,43950,631343,1103731)
AND QTY BETWEEN 5 AND 25
ORDER BY ITM_NBR
,QTY
GROUP BY ITM_NBR, TOT_IVO_ITM_QTY
I think you just want group by:
select ItmNBR, QTY, count(*)
from t
group by ItmNBR, QTY
order by count(*) desc;
This assumes that you want the count by item and quantity, which seems to be the gist of the question.
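
A quick way to check the GROUP BY suggestion is to run it on the sample rows. The sketch below does that on SQLite through Python's sqlite3 module; the table name t and the column names follow the answer's query, and the extra ORDER BY keys are only there to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ItmNBR INTEGER, QTY INTEGER);
    INSERT INTO t VALUES (123, 5), (123, 5), (123, 5), (456, 25), (456, 19);
""")

# One output row per (item, quantity) pair, with how often that pair occurs.
counts = conn.execute("""
    SELECT ItmNBR, QTY, COUNT(*) AS occurrences
    FROM t
    GROUP BY ItmNBR, QTY
    ORDER BY COUNT(*) DESC, ItmNBR, QTY
""").fetchall()

for row in counts:
    print(row)
```

Unlike COUNT(*) OVER (PARTITION BY ...), which repeats the count on every input row, GROUP BY returns each pair once.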

Derby DB last x row average

I have the following table structure.
ITEM            TOTAL
-----------     ---------------------
ID | TITLE      ID | ITEMID | VALUE
1    A          1    2        6
2    B          2    1        4
3    C          3    3        3
4    D          4    3        8
5    E          5    1        2
6    F          6    5        4
                7    4        5
                8    2        8
                9    2        7
                10   1        3
                11   2        2
                12   3        6
I am using Apache Derby DB. I need to perform the average calculation in SQL: I need to show the list of item IDs and, for each, the average of its last 3 TOTAL records.
That is, for ITEM.ID 1, I go to the TOTAL table, select the last 3 records associated with ITEMID 1, and take their average. In Derby, I am able to do this for a given item ID, but I cannot do it without specifying an ID. Here is what I have done so far.
SELECT ITEM.ID, AVG(VALUE) FROM ITEM, TOTAL WHERE TOTAL.ITEMID = ITEM.ID GROUP BY ITEM.ID
This SQL gives the average of all items in a list. But this calculates for all values of the total tables. I need last 3 records only. So I changed the SQL to this:
SELECT AVG(VALUE) FROM (SELECT ROW_NUMBER() OVER() AS ROWNUM, TOTAL.* FROM TOTAL WHERE ITEMID = 1) AS TR WHERE ROWNUM > (SELECT COUNT(ID) FROM TOTAL WHERE ITEMID = 1) - 3
This works if I supply the item ID 1 or 2 etc. But I cannot do this for all items without giving an item ID.
I tried to do the same thing in Oracle using partitioning and it worked. But Derby does not support partitioning. There is WINDOW, but I could not make use of it.
The Oracle version:
SELECT ITEMID, AVG(VALUE) FROM(SELECT ITEMID, VALUE, COUNT(*) OVER (PARTITION BY ITEMID) QTY, ROW_NUMBER() OVER (PARTITION BY ITEMID ORDER BY ID) IDX FROM TOTAL ORDER BY ITEMID, ID) WHERE IDX > QTY -3 GROUP BY ITEMID ORDER BY ITEMID
I need to use Derby for its portability.
The desired output is this:
RESULT
-----------------
ITEMID | AVERAGE
1        (9/3)
2        (17/3)
3        (17/3)
4        (5/1)
5        (4/1)
6        NULL
As you have noticed, Derby's support for the SQL 2003 "OLAP Operations" is incomplete.
There was some initial work (see https://wiki.apache.org/db-derby/OLAPOperations), but that work was only partially completed.
I don't believe anyone is currently working on adding more functionality to Derby in this area.
So yes, Derby has a row_number function, but no, Derby does not (currently) have partition by.
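
Since PARTITION BY is not available, one possible workaround (my own assumption, not something taken from the Derby documentation) is a correlated subquery that keeps a TOTAL row only when fewer than 3 rows with a higher ID exist for the same item. The sketch below runs that query on SQLite through Python's sqlite3 module purely to show the logic on the question's data; it has not been checked against Derby, which may restrict where subqueries can appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEM (ID INTEGER PRIMARY KEY, TITLE TEXT);
    CREATE TABLE TOTAL (ID INTEGER PRIMARY KEY, ITEMID INTEGER, VALUE INTEGER);
    INSERT INTO ITEM VALUES (1,'A'), (2,'B'), (3,'C'), (4,'D'), (5,'E'), (6,'F');
    INSERT INTO TOTAL VALUES
        (1,2,6), (2,1,4), (3,3,3), (4,3,8), (5,1,2), (6,5,4),
        (7,4,5), (8,2,8), (9,2,7), (10,1,3), (11,2,2), (12,3,6);
""")

averages = conn.execute("""
    SELECT i.ID, AVG(r.VALUE) AS average
    FROM ITEM i
    LEFT JOIN (
        -- Keep a row only if fewer than 3 newer rows exist for the same item,
        -- i.e. it is among that item's last 3 records by ID.
        SELECT t.ITEMID, t.VALUE
        FROM TOTAL t
        WHERE (SELECT COUNT(*) FROM TOTAL t2
               WHERE t2.ITEMID = t.ITEMID AND t2.ID > t.ID) < 3
    ) r ON r.ITEMID = i.ID
    GROUP BY i.ID
    ORDER BY i.ID
""").fetchall()

for row in averages:
    print(row)
```

The LEFT JOIN keeps item 6, which has no TOTAL rows, so its average comes back as NULL as in the desired output.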

Oracle SQL: Insert based on found value in a column given condition from another table

I want to merge my order data into one table which is now in two separate tables:
Order ID and customer code in table Orders:
Order_ID  Customer
1         C11
2         C76
4         C32
and order details in table Details (with columns Order_ID, Hour, Quantity), which gives the ordered quantity for each hour in which the order is valid:
Order_ID  Hour  Quantity
1         2     10
1         3     20
2         2     5
2         3     5
2         4     5
4         6     20
4         7     25
I want to merge the data of these two tables into one table with only one row per order, inserting the quantity for each hour in which the order is valid into the corresponding column, and zero otherwise.
Order_ID  Customer  Hour1  Hour2  Hour3  Hour4  Hour5  Hour6  Hour7  ...
1         C11       0      10     20     0      0      0      0
2         C76       0      5      5      5      0      0      0
4         C32       0      0      0      0      0      20     25
I tried (only for quantity of hour 1):
insert into Merged_Order_Table
    (Order_ID, Customer, Hour1)
select
    Orders.Order_id, Orders.Customer,
    case
        when 1 in (select Details.Hour from Details, Orders
                   where Details.Order_ID = Orders.Order_ID)
        then Details.Quantity
        else 0
    end
from
    Orders
inner join
    Details
on
    Details.Order_ID = Orders.Order_ID;
But got quantity in Hour1 even for orders with no quantity in this hour.
I question why you would want to take nicely normalized data and put it into a table with that structure. I can understand a query returning the data like that, but another table?
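In any case, this is a common problem with correlated subqueries: the table being correlated (Orders) is also listed inside the subquery, so the subquery is no longer tied to the outer row. Oops. Here is a fix for that:
insert into Merged_Order_Table(Order_ID, Customer, Hour1)
    select o.Order_id, o.Customer,
           (case when 1 in (select d.Hour from Details d where d.Order_ID = o.Order_ID)
                 -- d is not visible outside its subquery, so fetch the quantity with a second one;
                 -- this assumes at most one Details row per order and hour
                 then (select d.Quantity from Details d
                       where d.Order_ID = o.Order_ID and d.Hour = 1)
                 else 0
            end)
    from Orders o;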
That said, what you really want is conditional aggregation:
insert into Merged_Order_Table(Order_ID, Customer, Hour1)
select o.Order_id, o.Customer,
sum(case when d.Hour = 1 then d.Quantity else 0 end)
from Orders o left join
Details d
on o.Order_ID = d.Order_ID
group by o.Order_id, o.Customer;
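
To see conditional aggregation fill all seven hour columns at once, here is a sketch on the question's data, run on SQLite through Python's sqlite3 module instead of Oracle. It only selects the merged rows rather than inserting them, and the Hour1..Hour7 expressions are generated in a loop purely to keep the example short; the SUM(CASE ...) pattern itself is the one from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Customer TEXT);
    CREATE TABLE Details (Order_ID INTEGER, Hour INTEGER, Quantity INTEGER);
    INSERT INTO Orders VALUES (1, 'C11'), (2, 'C76'), (4, 'C32');
    INSERT INTO Details VALUES
        (1, 2, 10), (1, 3, 20),
        (2, 2, 5), (2, 3, 5), (2, 4, 5),
        (4, 6, 20), (4, 7, 25);
""")

# One SUM(CASE ...) expression per hour column; hours are fixed integers 1..7.
hour_columns = ",\n".join(
    f"SUM(CASE WHEN d.Hour = {h} THEN d.Quantity ELSE 0 END) AS Hour{h}"
    for h in range(1, 8)
)

query = f"""
    SELECT o.Order_ID, o.Customer,
    {hour_columns}
    FROM Orders AS o
    LEFT JOIN Details AS d ON o.Order_ID = d.Order_ID
    GROUP BY o.Order_ID, o.Customer
    ORDER BY o.Order_ID
"""
merged = conn.execute(query).fetchall()

for row in merged:
    print(row)
```

Each order comes back as a single row, with zeros in the hours that have no Details record.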
You are on the right track!
You just need one more layer:
select order_id, customer, max(quantity) hour1 from
(your query)
group by order_id, customer
Or you can look into how to do PIVOT tables.