Query to list every line item but only list total once - sql

We have a sales report that pulls data from multiple tables. My query returns correct data except for orders that have multiple line items: the Total from the Orders table is repeated on every line item row.
How can we list the Order Total only once on the row that has the smallest line item ID (for that order) but still list every line item row? Thanks!
Data Structure:
Orders Table:
Order_ID
Total
Line Items Table:
ID
Order_ID
Line_Item_Price
Line_Item_Qty
Result should be:
Order_ID   Total   Line_Item_Price   Line_Item_Qty   Line_Item_ID
--------   -----   ---------------   -------------   ------------
10001      200     100               2               32001
10002      150     150               1               32002
10003      210     55                1               32003
10003              30                2               32004
10003              95                1               32005
10004      125     125               1               32006

This should be done in the application, not in SQL.
But you can do it with window functions:
select o.order_id,
case row_number() over (partition by o.order_id order by line_item_id)
when 1 then o.total
end as total,
li.line_item_price,
li.line_item_qty,
li.line_item_id
from orders o
join line_item li on o.order_id = li.order_id
order by o.order_id, li.line_item_id;
row_number() numbers the line items within each order, starting at 1. When the number is 1 the total is displayed, otherwise it's not (the CASE yields NULL).
In a relational database there is no such thing as "the first row" unless you specify an order by. In this case the "first row" is the line item with the smallest line_item_id.
Online example: http://rextester.com/TQOIX50171
Unrelated, but: storing the total in the orders table is not a terribly good idea. In a normalized design you shouldn't store information that can easily be derived from existing data.
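For anyone who wants to try this locally, here is a minimal sketch that runs the query above against the sample data from the question, using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The column types are my assumption; the table and column names follow the query above. It also checks the normalization remark: in this sample the stored total can be derived from price times quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- column types are assumed; names follow the answer's query
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total INTEGER);
    CREATE TABLE line_item (
        line_item_id    INTEGER PRIMARY KEY,
        order_id        INTEGER REFERENCES orders(order_id),
        line_item_price INTEGER,
        line_item_qty   INTEGER
    );
    INSERT INTO orders VALUES (10001, 200), (10002, 150), (10003, 210), (10004, 125);
    INSERT INTO line_item VALUES
        (32001, 10001, 100, 2),
        (32002, 10002, 150, 1),
        (32003, 10003,  55, 1),
        (32004, 10003,  30, 2),
        (32005, 10003,  95, 1),
        (32006, 10004, 125, 1);
""")

# Show the order total only on the line item with the smallest id per order.
rows = conn.execute("""
    SELECT o.order_id,
           CASE ROW_NUMBER() OVER (PARTITION BY o.order_id ORDER BY li.line_item_id)
                WHEN 1 THEN o.total
           END AS total,
           li.line_item_price,
           li.line_item_qty,
           li.line_item_id
    FROM orders o
    JOIN line_item li ON o.order_id = li.order_id
    ORDER BY o.order_id, li.line_item_id
""").fetchall()

# The stored total can also be derived from the line items.
derived = conn.execute("""
    SELECT order_id, SUM(line_item_price * line_item_qty) AS total
    FROM line_item
    GROUP BY order_id
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
```

Rows that are not the first line item of their order come back with None (NULL) in the total column.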

This gets the orders with their items and numbers the items within each order, so that the smallest Line_Item_ID gets number 1 for that order and the largest gets the last number.
ROW_NUMBER() is the function that gives the index of the rows in the output of the query (row 1 = 1, row 2 = 2). We combine it with OVER (PARTITION BY ...), which means the rows are numbered within a certain window, a partition. In this case we want to number the rows within each Order_ID. We use ORDER BY alongside it to say how the rows are ordered within the window.
Once we have this table, we can query it to show the total only on the rows where order_the_items_IDs = 1.
WITH rank_items_for_orders AS (
    SELECT
        o.Order_ID,
        li.Line_Item_Price,
        li.Line_Item_Qty,
        li.Line_Item_ID,
        o.Total,
        -- number each order's line items, smallest Line_Item_ID first
        ROW_NUMBER()
            OVER (PARTITION BY o.Order_ID ORDER BY li.Line_Item_ID ASC)
            AS order_the_items_IDs
    FROM orders o
    LEFT JOIN line_items li ON o.order_id = li.order_id)
SELECT
    Order_ID,
    Line_Item_Price,
    Line_Item_Qty,
    Line_Item_ID,
    -- show Total only on the first line item of each order
    CASE WHEN order_the_items_IDs = 1
        THEN Total ELSE NULL END
        AS Total
FROM
    rank_items_for_orders
ORDER BY Order_ID, Line_Item_ID

Related

SQL query which will extract conditionally the values from top categories the first and the 2nd where CATEGORY is OTHER

I have this table. It is just a small example; the real one has more observations.
id | CATEGORY | AMOUNT
---+----------+-------
1  | TECH     | 120
1  | FUN      | 220
2  | OTHER    | 340
2  | PARENTS  | 220
It holds, per id and category, the amount spent in that category. I want to select the ID and the category in which that ID spends the most, but if that category is OTHER I want the second-highest spending category instead.
I have a constraint: I CANNOT use a subquery with the filter WHERE CATEGORY <> 'OTHER'. It makes my machine run out of memory (for reasons I don't know).
This is what I have tried.
I created row_number() over (partition by id order by amount desc) as rn
and then
select id, category from table where rn = 1 group by 1,2
**But I don't know how to tell the query: if CATEGORY is OTHER, take rn = 2.**
id | CATEGORY | AMOUNT | ROW NUM
---+----------+--------+--------
1  | TECH     | 120    | 2
1  | FUN      | 220    | 1
2  | OTHER    | 340    | 1
2  | PARENTS  | 220    | 2
Another thing I was thinking of is to use a QUALIFY clause:
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AMOUNT DESC) = 1
But this also returns only the first record per ID, which can be OTHER. I would like to filter within QUALIFY so that rows where CATEGORY is 'OTHER' are not considered.
I am using Databricks.
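One possible way to express "skip OTHER unless nothing else exists" is to sort OTHER to the end of each partition before numbering the rows. The sketch below is only an illustration of that ordering trick, run on SQLite through Python's sqlite3 module (SQLite 3.25+), not on Databricks. The table name `spend` is made up. SQLite has no QUALIFY, so a derived table filters on rn here; on an engine with QUALIFY, the same ORDER BY expression could presumably sit inside the QUALIFY window instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'spend' is a hypothetical table name; the data is the question's sample
    CREATE TABLE spend (id INTEGER, category TEXT, amount INTEGER);
    INSERT INTO spend VALUES
        (1, 'TECH',    120),
        (1, 'FUN',     220),
        (2, 'OTHER',   340),
        (2, 'PARENTS', 220);
""")

# Rank OTHER after every real category, then by amount descending.
# An id that only has OTHER rows still gets OTHER as its rn = 1.
top = conn.execute("""
    SELECT id, category
    FROM (
        SELECT id,
               category,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY CASE WHEN category = 'OTHER' THEN 1 ELSE 0 END,
                            amount DESC
               ) AS rn
        FROM spend
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(top)
```

With the sample data, id 1 keeps FUN and id 2 falls through from OTHER to PARENTS.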

SELECT random 10% of rows for each category on SQL Server

There is a table of products sold.
row_id | customer    | product        | date_sold
-------+-------------+----------------+-----------
1      | customer_1  | thingamajig    | 01.01.2023
2      | customer_12 | whosi-whatsi   | 03.01.2023
3      | customer_1  | watchamacallit | 04.01.2023
4      | customer_4  | whosi-whatsi   | 06.01.2023
...    | ...         | ...            | ...
There is always one row per one item.
Let's say customer_1 ordered 100 items total. customer_2 ordered 50 items total. customer_3 ordered 17 items total. How do you select random 10% of rows for each customer? The fraction of rows selected should be rounded up (for example 12 rows total results in 2 selected). That means every customer that bought at least one item should appear in the resulting table. In this case the resulting table for customer_1, customer_2 and customer_3 would have 10 + 5 + 2 = 17 rows.
My initial approach would be to create a temp table, calculate desired row counts for each customer and then loop through the temp table and select rows for each customer. Then insert them to another table and select from that one:
drop table if exists #row_counts
select
    customer,
    ceiling(convert(decimal(10, 2), count(product)) / 10) as row_count
into #row_counts
from products_sold
group by customer
-- then use cursor to loop over #row_counts and insert into the final table
-- for randomness an 'order by newid()' will be used
But this just doesn't feel like the right solution...
You need to know total count and a row count of what you want.
Something like this can perhaps be of service:
EDITED due to it not being randomized properly:
select *
from (
    -- shuffle each customer's rows and count them in the same pass
    select row_number() over(partition by customer order by newid()) as sortOrder
    , COUNT(*) OVER(PARTITION BY customer) AS cnt
    , *
    FROM products_sold
) p
-- Now, we want 10% of total count rounded upwards
WHERE sortOrder <= CEILING(cnt * 0.1)
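To sanity-check the row counts, here is a rough equivalent on SQLite via Python's sqlite3 module (SQLite 3.25+). This is a sketch, not the SQL Server query itself: `random()` stands in for `newid()`, and the ceiling is done with integer arithmetic, `(cnt + 9) / 10`, because SQLite's `0.1` is a float and its `ceil` function is not always available. The generated product names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products_sold ("
    "row_id INTEGER PRIMARY KEY, customer TEXT, product TEXT)"
)

# Row counts per customer taken from the question; product names are dummies.
counts = {"customer_1": 100, "customer_2": 50, "customer_3": 17}
for customer, n in counts.items():
    conn.executemany(
        "INSERT INTO products_sold (customer, product) VALUES (?, ?)",
        [(customer, f"item_{i}") for i in range(n)],
    )

# Shuffle rows per customer, keep the first ceil(cnt / 10) of each shuffle,
# then count how many rows were kept per customer.
sample = conn.execute("""
    SELECT customer, COUNT(*) AS picked
    FROM (
        SELECT customer,
               ROW_NUMBER() OVER (PARTITION BY customer ORDER BY random()) AS sort_order,
               COUNT(*)     OVER (PARTITION BY customer)                   AS cnt
        FROM products_sold
    ) AS p
    WHERE sort_order <= (cnt + 9) / 10   -- integer ceiling of cnt / 10
    GROUP BY customer
    ORDER BY customer
""").fetchall()

print(sample)
```

Which rows are picked changes on every run, but the number of rows per customer is fixed by the ceiling rule.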

Sum all rows where column is different in SQL?

I have a simple table.
The relevant fields are Return Value and Return Number.
So this table shows me all items that were returned, what return number this return is, and what was the value of all items in this return.
So an example table can look something like this
Line # | Item Number | Quantity Returned | Return Value | Return Number | Cust Order #
1      | 789         | 1                 | $40          | 123           | 456
1      | 780         | 1                 | $40          | 123           | 456
1      | 780         | 1                 | $20          | 124           | 456
I just want it to sum up the return values across distinct return numbers. For example, there are two rows with return number 123 and one row with return number 124, so it should take one of the 123 rows and add it to the 124 row, giving me $60.
I've tried
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY rh.Customer_Purchase_Order_Number) as Total_Returned_Value
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY rh.Return_Number) as Total_Returned_Value
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY rh.Return_Number Order by rh.Customer_Purchase_Order_Number) as Total_Returned_Value
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY rh.Customer_Purchase_Order_Number Order by rh.Return_Number) as Total_Returned_Value
None of these seem to work and I feel that I don't have a great grasp on order by and partition by
This is my full code
select rh.Return_Number,
rd.Odet_Line_Number, rd.Item_Number, rd.Color_Code, rd.Quantity_Returned,
(rh.Total_Value-rh.Freight_Charges)as Returned_Value, rh.Remarks,
SUM((rh.Total_Value-rh.Freight_Charges)) OVER (PARTITION BY /*rh.Return_Number Order by*/ rh.Customer_Purchase_Order_Number) as Total_Returned_Value
from
[JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Header rh (nolock)
LEFT JOIN
[JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Detail rd (nolock) on rd.Return_Number = Rh.Return_number
WHERE rh.Customer_Purchase_Order_Number = #Shopify
You probably got multiple detail rows per header, resulting in duplicate header data. If you want to sum by unique return number, do the calculation on the header first in a CTE and join the result to the detail, like this:
with rh as
( select -- assuming the rh.Return_Number is unique
    rh.Return_Number,
    rh.Customer_Purchase_Order_Number, -- needed by the outer WHERE
    (rh.Total_Value-rh.Freight_Charges) as Returned_Value,
    rh.Remarks,
    SUM((rh.Total_Value-rh.Freight_Charges))
        OVER (PARTITION BY rh.Customer_Purchase_Order_Number) as Total_Returned_Value
    -- don't know if this is the PARTITION you want, maybe none
  from
    [JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Header rh (nolock)
)
select rh.Return_Number,
    rd.Odet_Line_Number, rd.Item_Number, rd.Color_Code, rd.Quantity_Returned,
    rh.Returned_Value, rh.Remarks,
    rh.Total_Returned_Value
from
    rh
LEFT JOIN
    [JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Detail rd (nolock) on rd.Return_Number = rh.Return_Number
WHERE rh.Customer_Purchase_Order_Number = #Shopify
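The effect of summing before the join can be reproduced on a toy dataset. The sketch below uses Python's sqlite3 module (SQLite 3.25+) with simplified, assumed table and column names (`returns_header`, `returns_detail`, `customer_po`) and the three rows from the question, with freight set to 0. It compares a sum taken after the join with the header-first CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- simplified stand-ins for Returns_Header / Returns_Detail (names assumed)
    CREATE TABLE returns_header (
        return_number   INTEGER PRIMARY KEY,
        customer_po     INTEGER,
        total_value     INTEGER,
        freight_charges INTEGER
    );
    CREATE TABLE returns_detail (
        return_number     INTEGER,
        item_number       INTEGER,
        quantity_returned INTEGER
    );
    INSERT INTO returns_header VALUES (123, 456, 40, 0), (124, 456, 20, 0);
    INSERT INTO returns_detail VALUES (123, 789, 1), (123, 780, 1), (124, 780, 1);
""")

# Summing after the join counts return 123 once per detail row.
naive = conn.execute("""
    SELECT SUM(rh.total_value - rh.freight_charges)
    FROM returns_header rh
    LEFT JOIN returns_detail rd ON rd.return_number = rh.return_number
    WHERE rh.customer_po = 456
""").fetchone()[0]

# Summing on the header first, then joining, counts each return once.
rows = conn.execute("""
    WITH rh AS (
        SELECT return_number,
               customer_po,
               total_value - freight_charges AS returned_value,
               SUM(total_value - freight_charges)
                   OVER (PARTITION BY customer_po) AS total_returned_value
        FROM returns_header
    )
    SELECT rh.return_number, rd.item_number,
           rh.returned_value, rh.total_returned_value
    FROM rh
    LEFT JOIN returns_detail rd ON rd.return_number = rh.return_number
    WHERE rh.customer_po = 456
    ORDER BY rh.return_number, rd.item_number
""").fetchall()

print(naive)
for row in rows:
    print(row)
```

The joined sum double-counts return 123, while the header-first version carries the same per-order total on every detail row.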

SQL - create flag in query to highlight order which contain quantity = 1

I have tried creating a case statement, but it doesn't seem to give me what I want. I'd like to take the table (which is at product level) and aggregate it at order level, splitting out the orders that contain a quantity of 1.
Any ideas on how I would do this?
order id | Product | Quantity
---------+---------+--------------
11111 | sdsd4 | 1 (single item )
22222 | sasas | 1 (multiple items)
22222 | wertt | 1 (multiple items)
I'd like to get a case statement that adds another column to split out orders with quantity = 1 from orders with quantity greater than 1.
Any idea on how I would do this?
The desired outcome would be the column in (brackets)
I could then count the orders and bring in the newly created column as the dimension
Attached is an image of table structure.
Logic:
if the order has one product with quantity = 1, it is a single-item order;
if the order has one product but multiples of that product, it is a non-single-item order;
if the order has more than one product, it is a non-single-item order.
If your database supports analytic functions, then you can use a query like this one:
SELECT *,
CASE WHEN count("Product") OVER (partition by "order id") > 1
THEN 'multiple items' ELSE 'single item'
END As "How many items"
FROM Table1
Demo: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=b659279fc16d2084cb1cf4a3bea361a1
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
CASE COUNT(DISTINCT Product) OVER(PARTITION BY order_id)
WHEN 1 THEN 'Single Item Order'
ELSE 'Multiple Items Order'
END Single_or_Multiple
FROM `project.dataset.table`
You can test, play with above using dummy data as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 11111 order_id, 'sdsd4' Product, 1 Quantity UNION ALL
SELECT 22222, 'sasas', 2 UNION ALL
SELECT 22222, 'wertt', 1
)
SELECT *,
CASE COUNT(DISTINCT Product) OVER(PARTITION BY order_id)
WHEN 1 THEN 'Single Item Order'
ELSE 'Multiple Items Order'
END Single_or_Multiple
FROM `project.dataset.table`
with result
Row order_id Product Quantity Single_or_Multiple
1 11111 sdsd4 1 Single Item Order
2 22222 sasas 2 Multiple Items Order
3 22222 wertt 1 Multiple Items Order
If I understand this right, you could use a subquery to get the count of records for an order and flag a record if this count is larger than 1 and the quantity is equal to 1.
SELECT t1.order_id,
t1.product,
t1.quantity,
CASE
WHEN t1.quantity = 1
AND (SELECT count(*)
FROM elbat t2
WHERE t2.order_id = t1.order_id) > 1 THEN
'flag'
ELSE
'no flag'
END flag
FROM elbat t1;
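The three rules listed in the question (one product with quantity 1 is single; one product in multiples is not; several products is not) can also be read as "the order's total quantity is exactly 1". Below is a small sketch of that reading, assuming quantities are positive integers, run on SQLite via Python's sqlite3 module (SQLite 3.25+). The table name `order_lines` and the order 33333 row are made up to exercise the second rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'order_lines' is a hypothetical name; 33333 is an invented test row
    CREATE TABLE order_lines (order_id INTEGER, product TEXT, quantity INTEGER);
    INSERT INTO order_lines VALUES
        (11111, 'sdsd4', 1),
        (22222, 'sasas', 1),
        (22222, 'wertt', 1),
        (33333, 'abcde', 3);
""")

# An order is a single-item order only when its quantities add up to 1.
rows = conn.execute("""
    SELECT order_id,
           product,
           quantity,
           CASE WHEN SUM(quantity) OVER (PARTITION BY order_id) = 1
                THEN 'single item'
                ELSE 'multiple items'
           END AS how_many_items
    FROM order_lines
    ORDER BY order_id, product
""").fetchall()

for row in rows:
    print(row)
```

Counting products alone would label order 33333 as a single-item order; summing the quantity catches it.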

How can I SELECT the max row in a table SQL?

I have a little problem.
My table is:
Bill | Product ID | Units Sold
-----|------------|-----------
1    | 10         | 25
1    | 20         | 30
2    | 30         | 11
3    | 40         | 40
3    | 20         | 20
I want to SELECT the product which has sold the most units; in this sample case, it should be the product with ID 20, showing 50 units.
I have tried this:
SELECT
SUM(pv."Units sold")
FROM
"Products" pv
GROUP BY
pv."Product ID";
But this shows all the products, how can I select only the product with the most units sold?
Leaving aside for the moment the possibility of having multiple products with the same number of units sold, you can always sort your results by the sum, highest first, and take the first row:
SELECT pv."Product ID", SUM(pv."Units sold")
FROM "Products" pv
GROUP BY pv."Product ID"
ORDER BY SUM(pv."Units sold") DESC
LIMIT 1
I'm not quite sure whether the double-quote syntax for column and table names will work - exact syntax will depend on your specific RDBMS.
Now, if you do want to get multiple rows when more than one product has the same sum, then the SQL will become a bit more complicated:
SELECT pv.`Product ID`, SUM(pv.`Units sold`)
FROM `Products` pv
GROUP BY pv.`Product ID`
HAVING SUM(pv.`Units sold`) = (
select max(sums)
from (
SELECT SUM(pv2.`Units sold`) as "sums"
FROM `Products` pv2
GROUP BY pv2.`Product ID`
) as subq
)
Here's the sqlfiddle
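Both variants can be checked against the sample table from the question. This sketch uses Python's sqlite3 module; the snake_case table and column names are my simplification to avoid quoting issues, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- simplified names (assumed): bill, product_id, units_sold
    CREATE TABLE products (bill INTEGER, product_id INTEGER, units_sold INTEGER);
    INSERT INTO products VALUES
        (1, 10, 25),
        (1, 20, 30),
        (2, 30, 11),
        (3, 40, 40),
        (3, 20, 20);
""")

# Variant 1: sort by the per-product sum and keep the first row only.
top_one = conn.execute("""
    SELECT product_id, SUM(units_sold) AS total_units
    FROM products
    GROUP BY product_id
    ORDER BY total_units DESC
    LIMIT 1
""").fetchall()

# Variant 2: keep every product whose sum equals the maximum sum (handles ties).
top_with_ties = conn.execute("""
    SELECT product_id, SUM(units_sold) AS total_units
    FROM products
    GROUP BY product_id
    HAVING SUM(units_sold) = (
        SELECT MAX(total_units)
        FROM (
            SELECT SUM(units_sold) AS total_units
            FROM products
            GROUP BY product_id
        ) AS per_product
    )
""").fetchall()

print(top_one, top_with_ties)
```

The two variants agree here because only one product reaches the maximum; with tied products the second one returns all of them.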
SELECT pv."Product ID", SUM(pv."Units sold") AS total_units
FROM "Products" pv
GROUP BY pv."Product ID"
ORDER BY total_units DESC
LIMIT 1
Use ORDER BY together with LIMIT 1.
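The MAX function returns the largest value in a column.
Here's the general syntax of the MAX function:
SELECT MAX(ID) AS id
FROM Products;
and in your case:
SELECT MAX("Units Sold") FROM "Products";
Note that this returns the largest single row value (40 in the sample), not the per-product total, so it has to be combined with SUM and GROUP BY to answer the question.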
Here is the Complete Reference to MIN and MAX functions in Query
Click Here