Merge duplicate values in column - sql

I have a table cart_items with the following data
+-----+---------+---------+------------+----------+
| id | user_id | cart_id | product_id | quantity |
+-----+---------+---------+------------+----------+
| 303 | 9 | 44 | 1 | 2 |
| 305 | 9 | 44 | 3 | 1 |
| 307 | 9 | 44 | 3 | 1 |
| 308 | 9 | 44 | 2 | 1 |
| 309 | 9 | 44 | 6 | 1 |
| 310 | 9 | 44 | 2 | 1 |
+-----+---------+---------+------------+----------+
My problem is that there are duplicate products. My desired table would be this
+-----+---------+---------+------------+----------+
| id | user_id | cart_id | product_id | quantity |
+-----+---------+---------+------------+----------+
| 303 | 9 | 44 | 1 | 2 |
| 305 | 9 | 44 | 3 | 2 |
| 308 | 9 | 44 | 2 | 2 |
| 309 | 9 | 44 | 6 | 1 |
+-----+---------+---------+------------+----------+
So the difference is that the rows with a duplicate product_id are merged into one row and their quantities are added together.
Is there an easy way to do this with an SQL query?

You need to group by user_id, cart_id, product_id and aggregate:
select
min(id) id, user_id, cart_id, product_id, sum(quantity) quantity
from cart_items
group by user_id, cart_id, product_id
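As a quick check of the query above, here is a minimal sketch that loads the sample rows into an in-memory SQLite database (an assumption made only so the example is self-contained; the question does not name the database engine) and runs the same GROUP BY. The helper name `merged_cart_items` is hypothetical.

```python
import sqlite3

def merged_cart_items(conn):
    """Return one row per (user_id, cart_id, product_id) with the summed quantity."""
    return conn.execute(
        """
        SELECT MIN(id) AS id, user_id, cart_id, product_id,
               SUM(quantity) AS quantity
        FROM cart_items
        GROUP BY user_id, cart_id, product_id
        ORDER BY MIN(id)
        """
    ).fetchall()

# Sample data copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cart_items (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "cart_id INTEGER, product_id INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO cart_items VALUES (?, ?, ?, ?, ?)",
    [
        (303, 9, 44, 1, 2), (305, 9, 44, 3, 1), (307, 9, 44, 3, 1),
        (308, 9, 44, 2, 1), (309, 9, 44, 6, 1), (310, 9, 44, 2, 1),
    ],
)

rows = merged_cart_items(conn)
for row in rows:
    print(row)
```

Keeping min(id) means the surviving row is the earliest one in each group, which matches the ids shown in the desired table.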

Related

PostgreSQL: Comparing two tables, obtaining the result, and comparing it with a third table

TABLE 2 : trip_delivery_sales_lines
+-------+---------------------+------------+----------+------------+-------------+--------+
| Sl no | Order_date          | Partner_id | Route_id | Product_id | Product qty | amount |
+-------+---------------------+------------+----------+------------+-------------+--------+
| 1     | 2020-08-01 04:25:35 | 34567      | 152      | 432        | 2           | 100    |
| 2     | 2021-09-11 02:25:35 | 34572      | 130      | 312        | 4           | 150    |
| 3     | 2020-05-10 04:25:35 | 34567      | 152      | 432        | 3           | 123    |
| 4     | 2021-02-16 01:10:35 | 34572      | 130      | 432        | 5           | 123    |
| 5     | 2020-02-19 01:10:35 | 34567      | 152      | 432        | 2           | 600    |
| 6     | 2021-03-20 01:10:35 | 34569      | 152      | 123        | 1           | 123    |
| 7     | 2021-04-23 01:10:35 | 34570      | 152      | 432        | 4           | 200    |
| 8     | 2021-07-08 01:10:35 | 34567      | 152      | 432        | 3           | 32     |
| 9     | 2019-06-28 01:10:35 | 34570      | 152      | 432        | 2           | 100    |
| 10    | 2018-11-14 01:10:35 | 34570      | 152      | 432        | 5           | 20     |
+-------+---------------------+------------+----------+------------+-------------+--------+
From Table 2, we have to find the partners on route 152 and, for each of them, the sum of product_qty over the last 2 sales (selected by order_date in descending order). The result is shown in Table 3.
34567 – Serial number [ 1,8]
34570 – Serial number [ 7,9]
34569 – Serial number [6]
TABLE 3 : RESULT OBTAINED FROM TABLE 1,2
+------------+-------+
| Partner_id | count |
+------------+-------+
| 34567 | 5 |
| 34569 | 1 |
| 34570 | 6 |
+------------+-------+
From Table 4, we want to find the leaf count for each of the above partner_ids.
TABLE 4 :coupon_leaf
+------------+-------+
| Partner_id | Leaf |
+------------+-------+
| 34567 | XYZ1 |
| 34569 | XYZ2 |
| 34569 | DDHC |
| 34567 | DVDV |
| 34570 | DVFDV |
| 34576 | FVFV |
| 34567 | FVV |
+------------+-------+
From that we can find result as:
34567 – 3
34569-2
34570 -1
TABLE 5: result obtained from TABLE 4
+------------+-------+
| Partner_id | count |
+------------+-------+
| 34567 | 3 |
| 34569 | 2 |
| 34570 | 1 |
+------------+-------+
Now we want to compare Tables 3 and 5:
If partner_id count [Table 3] > partner_id count [Table 5]
Print partner_id
I want a single query to do all of these operations.
The distinct partner_ids can be found from Table 1 with:
SELECT DISTINCT partner_id
FROM trip_delivery_sales ts
WHERE ts.route_id='152'
GROUP BY ts.partner_id
This answers the original version of the problem.
You seem to want to compare totals after aggregating tables 2 and 3. I don't know what table1 is for. It doesn't seem to do anything.
So:
select *
from (select partner_id, sum(product_qty) as sum_quantity
      from (select tdsl.*,
                   -- newest sale first, so seqnum 1 and 2 are the last two sales
                   row_number() over (partition by tdsl.partner_id order by tdsl.order_date desc) as seqnum
            from trip_delivery_sales_lines tdsl
            where tdsl.route_id = 152
           ) tdsl
      where seqnum <= 2
      group by partner_id
     ) tdsl left join
     (select cl.partner_id, count(*) as leaf_cnt
      from coupon_leaf cl
      group by cl.partner_id
     ) cl
     on cl.partner_id = tdsl.partner_id
where cl.leaf_cnt is null or tdsl.sum_quantity > cl.leaf_cnt
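To sanity-check the approach against the sample data, here is a minimal sketch that runs the same kind of query in an in-memory SQLite database rather than PostgreSQL (an assumption made only to keep the example self-contained; window functions need SQLite 3.25 or later). The column names `product_qty` and `route_id` are taken from the table headers in the question, and `QUERY` is a hypothetical name.

```python
import sqlite3

QUERY = """
SELECT s.partner_id, s.sum_quantity, cl.leaf_cnt
FROM (SELECT partner_id, SUM(product_qty) AS sum_quantity
      FROM (SELECT partner_id, product_qty,
                   ROW_NUMBER() OVER (PARTITION BY partner_id
                                      ORDER BY order_date DESC) AS seqnum
            FROM trip_delivery_sales_lines
            WHERE route_id = 152) ranked
      WHERE seqnum <= 2          -- keep only the two most recent sales
      GROUP BY partner_id) s
LEFT JOIN (SELECT partner_id, COUNT(*) AS leaf_cnt
           FROM coupon_leaf
           GROUP BY partner_id) cl
       ON cl.partner_id = s.partner_id
WHERE cl.leaf_cnt IS NULL OR s.sum_quantity > cl.leaf_cnt
ORDER BY s.partner_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trip_delivery_sales_lines (sl_no INTEGER, order_date TEXT, "
    "partner_id INTEGER, route_id INTEGER, product_id INTEGER, "
    "product_qty INTEGER, amount INTEGER)"
)
conn.execute("CREATE TABLE coupon_leaf (partner_id INTEGER, leaf TEXT)")
conn.executemany(
    "INSERT INTO trip_delivery_sales_lines VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2020-08-01 04:25:35", 34567, 152, 432, 2, 100),
        (2, "2021-09-11 02:25:35", 34572, 130, 312, 4, 150),
        (3, "2020-05-10 04:25:35", 34567, 152, 432, 3, 123),
        (4, "2021-02-16 01:10:35", 34572, 130, 432, 5, 123),
        (5, "2020-02-19 01:10:35", 34567, 152, 432, 2, 600),
        (6, "2021-03-20 01:10:35", 34569, 152, 123, 1, 123),
        (7, "2021-04-23 01:10:35", 34570, 152, 432, 4, 200),
        (8, "2021-07-08 01:10:35", 34567, 152, 432, 3, 32),
        (9, "2019-06-28 01:10:35", 34570, 152, 432, 2, 100),
        (10, "2018-11-14 01:10:35", 34570, 152, 432, 5, 20),
    ],
)
conn.executemany(
    "INSERT INTO coupon_leaf VALUES (?, ?)",
    [
        (34567, "XYZ1"), (34569, "XYZ2"), (34569, "DDHC"), (34567, "DVDV"),
        (34570, "DVFDV"), (34576, "FVFV"), (34567, "FVV"),
    ],
)

result = conn.execute(QUERY).fetchall()
for partner_id, sum_quantity, leaf_cnt in result:
    print(partner_id, sum_quantity, leaf_cnt)
```

With the sample rows, partners 34567 (5 > 3) and 34570 (6 > 1) are returned, while 34569 (1 is not greater than 2) is filtered out.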

Deleting a row from a table. If an associated ID has more than 1 entry in another table, insert a value. Else insert null

I have the following 3 tables:
Item Table
+---------+-------------+
| Item_ID | Location_ID |
+---------+-------------+
| 1 | 23 |
| 2 | 44 |
| 3 | 44 |
| 4 | 25 |
| 5 | 13 |
| 6 | 13 |
+---------+-------------+
Shipments Table
+-------------+------------------+----------------+
| Shipment_ID | Location_From_ID | Location_To_ID |
+-------------+------------------+----------------+
| 1 | 15 | 23 |
| 2 | 12 | 44 |
| 3 | 7 | 16 |
| 4 | 15 | 21 |
| 5 | 16 | 25 |
| 6 | 21 | 11 |
| 7 | 11 | 13 |
+-------------+------------------+----------------+
Item_Shipment_Link Table
+-------------+---------+
| Shipment_ID | Item_ID |
+-------------+---------+
| 1 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 4 | 6 |
| 5 | 4 |
| 6 | 5 |
| 7 | 5 |
| 7 | 6 |
+-------------+---------+
I am working on a procedure to delete a row from the Shipments Table.
When the shipment is deleted, the Location_ID(s) for the associated Item_ID(s) on the Item Table must be updated with the Location_From_ID value from the shipment. This is equivalent to placing the item back where it came from.
Any rows associated with the Shipment_ID being deleted must be removed from the Item_Shipment_Link Table as well.
Example: If Shipment_ID:5 is deleted, update the Location_ID for Item_ID:4 on the Item Table to 16. Also delete the entry for Shipment_ID:5 from the Item_Shipment_Link Table.
Item Table
+---------+-------------+
| Item_ID | Location_ID |
+---------+-------------+
| 1 | 23 |
| 2 | 44 |
| 3 | 44 |
| 4 | 16 |
| 5 | 13 |
| 6 | 13 |
+---------+-------------+
Shipments Table
+-------------+------------------+----------------+
| Shipment_ID | Location_From_ID | Location_To_ID |
+-------------+------------------+----------------+
| 1 | 15 | 23 |
| 2 | 12 | 44 |
| 3 | 7 | 16 |
| 4 | 15 | 21 |
| 6 | 21 | 11 |
| 7 | 11 | 13 |
+-------------+------------------+----------------+
Item_Shipment_Link Table
+-------------+---------+
| Shipment_ID | Item_ID |
+-------------+---------+
| 1 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 4 | 6 |
| 6 | 5 |
| 7 | 5 |
| 7 | 6 |
+-------------+---------+
If a shipment is deleted and the associated Item_ID(s) are not included in any other shipments, the Location_ID(s) for the associated Item_ID(s) on the Item Table must be set to NULL.
Example: If Shipment_ID:2 is deleted, update the Location_ID for Item_ID:2 and Item_ID:3 on the Item Table to NULL. Also delete the entries for Shipment_ID:2 from the Item_Shipment_Link Table.
Item Table
+---------+-------------+
| Item_ID | Location_ID |
+---------+-------------+
| 1 | 23 |
| 2 | NULL |
| 3 | NULL |
| 4 | 25 |
| 5 | 13 |
| 6 | 13 |
+---------+-------------+
Shipments Table
+-------------+------------------+----------------+
| Shipment_ID | Location_From_ID | Location_To_ID |
+-------------+------------------+----------------+
| 1 | 15 | 23 |
| 3 | 7 | 16 |
| 4 | 15 | 21 |
| 5 | 16 | 25 |
| 6 | 21 | 11 |
| 7 | 11 | 13 |
+-------------+------------------+----------------+
Item_Shipment_Link Table
+-------------+---------+
| Shipment_ID | Item_ID |
+-------------+---------+
| 1 | 1 |
| 3 | 4 |
| 4 | 5 |
| 4 | 6 |
| 5 | 4 |
| 6 | 5 |
| 7 | 5 |
| 7 | 6 |
+-------------+---------+
It is possible for a shipment to contain some items that have been in previous shipments, and some items that have not been in any previous shipments.
Example: If Shipment_ID:7 is deleted, update the Location_ID for Item_ID:5 on the Item Table to 11. Update the Location_ID for Item_ID:6 on the Item Table to NULL. Also delete the entries for Shipment_ID:7 from the Item_Shipment_Link Table.
Item Table
+---------+-------------+
| Item_ID | Location_ID |
+---------+-------------+
| 1 | 23 |
| 2 | 44 |
| 3 | 44 |
| 4 | 25 |
| 5 | 11 |
| 6 | NULL |
+---------+-------------+
Shipments Table
+-------------+------------------+----------------+
| Shipment_ID | Location_From_ID | Location_To_ID |
+-------------+------------------+----------------+
| 1 | 15 | 23 |
| 2 | 12 | 44 |
| 3 | 7 | 16 |
| 4 | 15 | 21 |
| 5 | 16 | 25 |
| 6 | 21 | 11 |
+-------------+------------------+----------------+
Item_Shipment_Link Table
+-------------+---------+
| Shipment_ID | Item_ID |
+-------------+---------+
| 1 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 4 | 6 |
| 5 | 4 |
| 6 | 5 |
+-------------+---------+
Note: It does not matter if an Item_ID is associated with shipments that occurred before or after the Shipment_ID being deleted. The application will only allow you to delete the most recent shipment associated with an Item_ID, so if an Item_ID has entries in the Item_Shipment_Link Table they are guaranteed to have occurred prior to the shipment being deleted.
Essentially, I need to get each Item_ID associated with the Shipment_ID that is being deleted and then check if each Item_ID has more than 1 entry in the Item_Shipment_Link Table. If it has more than 1 entry, set its Location_ID on the Item Table to the Location_From_ID of the shipment that is being deleted. If it has only 1 entry, set its Location_ID on the Item Table to NULL.
This is the pseudo code for what I am trying to accomplish:
CREATE PROCEDURE ExampleProcedure (@Shipment_ID INT)
DECLARE @OLD_Location_From_ID INT
SET @OLD_Location_From_ID =
    (SELECT Location_From_ID FROM Shipments WHERE Shipment_ID = @Shipment_ID)
FOR (each Item_ID associated with @Shipment_ID on the Item_Shipment_Link Table)
    IF (Item_ID has more than 1 entry in the Item_Shipment_Link Table)
    BEGIN
        UPDATE i
        SET i.Location_ID = @OLD_Location_From_ID
        FROM Item i
        JOIN Item_Shipment_Link l ON i.Item_ID = l.Item_ID
        WHERE l.Shipment_ID = @Shipment_ID
    END
    ELSE (Item_ID has only 1 entry in the Item_Shipment_Link Table)
    BEGIN
        UPDATE i
        SET i.Location_ID = NULL
        FROM Item i
        JOIN Item_Shipment_Link l ON i.Item_ID = l.Item_ID
        WHERE l.Shipment_ID = @Shipment_ID
    END
DELETE Item_Shipment_Link
WHERE Shipment_ID = @Shipment_ID
DELETE Shipments
WHERE Shipment_ID = @Shipment_ID
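One possible way to avoid the per-item loop is a single set-based UPDATE with a CASE expression that checks whether the item appears in any other shipment. The following is a minimal sketch under stated assumptions: it uses an in-memory SQLite database instead of SQL Server so that it is self-contained, and `make_db` and `delete_shipment` are hypothetical helper names. The same CASE WHEN EXISTS idea should carry over to a T-SQL procedure, but that translation is not shown here.

```python
import sqlite3

def make_db():
    """Build an in-memory copy of the three sample tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Item (Item_ID INTEGER PRIMARY KEY, Location_ID INTEGER);
        CREATE TABLE Shipments (Shipment_ID INTEGER PRIMARY KEY,
                                Location_From_ID INTEGER, Location_To_ID INTEGER);
        CREATE TABLE Item_Shipment_Link (Shipment_ID INTEGER, Item_ID INTEGER);
        INSERT INTO Item VALUES (1,23),(2,44),(3,44),(4,25),(5,13),(6,13);
        INSERT INTO Shipments VALUES (1,15,23),(2,12,44),(3,7,16),(4,15,21),
                                     (5,16,25),(6,21,11),(7,11,13);
        INSERT INTO Item_Shipment_Link VALUES (1,1),(2,2),(2,3),(3,4),(4,5),
                                              (4,6),(5,4),(6,5),(7,5),(7,6);
    """)
    return conn

def delete_shipment(conn, shipment_id):
    """Delete a shipment and move its items back (or to NULL) in one transaction."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT Location_From_ID FROM Shipments WHERE Shipment_ID = ?",
            (shipment_id,),
        ).fetchone()
        if row is None:
            return  # nothing to delete
        # Items that also appear in another shipment go back to Location_From_ID;
        # items that only appear in this shipment get NULL.
        conn.execute(
            """
            UPDATE Item
            SET Location_ID = CASE
                    WHEN EXISTS (SELECT 1 FROM Item_Shipment_Link o
                                 WHERE o.Item_ID = Item.Item_ID
                                   AND o.Shipment_ID <> :sid)
                    THEN :from_id
                    ELSE NULL
                END
            WHERE Item_ID IN (SELECT Item_ID FROM Item_Shipment_Link
                              WHERE Shipment_ID = :sid)
            """,
            {"sid": shipment_id, "from_id": row[0]},
        )
        conn.execute("DELETE FROM Item_Shipment_Link WHERE Shipment_ID = ?",
                     (shipment_id,))
        conn.execute("DELETE FROM Shipments WHERE Shipment_ID = ?",
                     (shipment_id,))

def locations(conn):
    """Return {Item_ID: Location_ID} for easy inspection."""
    return dict(conn.execute("SELECT Item_ID, Location_ID FROM Item"))

db = make_db()
delete_shipment(db, 5)
print(locations(db)[4])  # item 4 is also in shipment 3, so it moves back to 16
```

The UPDATE must run before the rows are removed from Item_Shipment_Link, because the link rows are what identify the affected items.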

Incremental/Update in hive

I have a Hive external table (Hive version earlier than 0.14) with data such as:
+--------+------+------+------+
| id | A | B | C |
+--------+------+------+------+
| 10011 | 10 | 3 | 0 |
| 10012 | 9 | 0 | 40 |
| 10015 | 10 | 3 | 0 |
| 10017 | 9 | 0 | 40 |
+--------+------+------+------+
And I have a delta file having data given below.
+--------+------+------+------+
| id | A | B | C |
+--------+------+------+------+
| 10012 | 50 | 3 | 10 | --> update
| 10013 | 29 | 0 | 40 | --> insert
| 10014 | 10 | 3 | 0 | --> insert
| 10013 | 19 | 0 | 40 | --> update
| 10015 | 70 | 3 | 0 | --> update
| 10016 | 17 | 0 | 40 | --> insert
+--------+------+------+------+
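How can I update my Hive table with the delta file without using Sqoop? Any help on how to proceed would be great! Thanks.
This is because there are duplicates in the file. How do you know which one you should keep? The last one?
In that case you can use, for example, row_number() and keep the row with the maximum value for each id. Note that without an ORDER BY in the window, the order of rows within an id is not guaranteed, so "last" is only reliable if you order by a column that records the load order. Something like this: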
SELECT coalesce(tmp.id,initial.id) as id,
coalesce(tmp.A, initial.A) as A,
coalesce(tmp.B,initial.B) as B,
coalesce(tmp.C, initial.C) as C
FROM
table_a initial
FULL OUTER JOIN
( SELECT *, row_number() over( partition by id ) as row_num
,COUNT(*) OVER (PARTITION BY id) AS cnt
FROM temp_table
) tmp
ON initial.id=tmp.id
WHERE row_num=cnt
OR row_num IS NULL;
Output:
+--------+-----+----+-----+--+
| id | a | b | c |
+--------+-----+----+-----+--+
| 10011 | 10 | 3 | 0 |
| 10012 | 50 | 3 | 10 |
| 10013 | 19 | 0 | 40 |
| 10014 | 10 | 3 | 0 |
| 10015 | 70 | 3 | 0 |
| 10016 | 17 | 0 | 40 |
| 10017 | 9 | 0 | 40 |
+--------+-----+----+-----+--+
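To make "keep the last delta row per id" deterministic, the delta rows need something to order by. The following is a minimal sketch under stated assumptions: it adds a hypothetical `load_seq` column recording the file order, it runs in an in-memory SQLite database instead of Hive, and it swaps the FULL OUTER JOIN for an equivalent UNION ALL (latest delta row per id, plus base rows that have no delta row). `MERGE_QUERY` is a hypothetical name.

```python
import sqlite3

MERGE_QUERY = """
SELECT id, A, B, C
FROM (SELECT id, A, B, C,
             ROW_NUMBER() OVER (PARTITION BY id
                                ORDER BY load_seq DESC) AS rn
      FROM temp_table) latest
WHERE rn = 1                       -- newest delta row for each id
UNION ALL
SELECT i.id, i.A, i.B, i.C         -- base rows untouched by the delta
FROM table_a i
WHERE NOT EXISTS (SELECT 1 FROM temp_table t WHERE t.id = i.id)
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, A INTEGER, B INTEGER, C INTEGER)")
# load_seq is an assumed column: the position of each row in the delta file.
conn.execute(
    "CREATE TABLE temp_table (load_seq INTEGER, id INTEGER, "
    "A INTEGER, B INTEGER, C INTEGER)"
)
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?)",
    [(10011, 10, 3, 0), (10012, 9, 0, 40), (10015, 10, 3, 0), (10017, 9, 0, 40)],
)
conn.executemany(
    "INSERT INTO temp_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10012, 50, 3, 10), (2, 10013, 29, 0, 40), (3, 10014, 10, 3, 0),
        (4, 10013, 19, 0, 40), (5, 10015, 70, 3, 0), (6, 10016, 17, 0, 40),
    ],
)

merged = conn.execute(MERGE_QUERY).fetchall()
for row in merged:
    print(row)
```

On the sample data this yields the same seven rows as the output table above, with id 10013 resolved to its later delta row (19, 0, 40).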
You can load the file into a temporary table in Hive and then execute a FULL OUTER JOIN between the two tables.
Query Example:
SELECT coalesce(tmp.id,initial.id) as id,
coalesce(tmp.A, initial.A) as A,
coalesce(tmp.B,initial.B) as B,
coalesce(tmp.C, initial.C) as C
FROM
table_a initial
FULL OUTER JOIN
temp_table tmp on initial.id=tmp.id;
Output
+--------+-----+----+-----+--+
| id | a | b | c |
+--------+-----+----+-----+--+
| 10011 | 10 | 3 | 0 |
| 10012 | 50 | 3 | 10 |
| 10013 | 29 | 0 | 40 |
| 10013 | 19 | 0 | 40 |
| 10014 | 10 | 3 | 0 |
| 10015 | 70 | 3 | 0 |
| 10016 | 17 | 0 | 40 |
| 10017 | 9 | 0 | 40 |
+--------+-----+----+-----+--+

I need a view containing the columns CatID and Flag, where Flag is 1 if any subcategory exists for the CatID

There are two tables: Category, with the columns CatID | CategoryName:
| | CatID | CategoryName |
|---|-------|--------------------|
| 1 | 1021 | Home |
| 2 | 1022 | Corporate |
| 3 | 1023 | Products |
| 4 | 1024 | Gardens |
| 5 | 1025 | Investor Relations |
| 6 | 1026 | News & Events |
| 7 | 1027 | Contact Us |
and SubCategory, with the columns SubID | CatID:
| | SubID | CatID |
|----|-------|-------|
| 1 | 9 | 1025 |
| 2 | 5 | 1022 |
| 3 | 6 | 1022 |
| 4 | 10 | 1025 |
| 5 | 11 | 1025 |
| 6 | 12 | 1025 |
| 7 | 13 | 1025 |
| 8 | 14 | 1025 |
| 9 | 15 | 1025 |
| 10 | 16 | 1026 |
| 11 | 17 | 1026 |
| 12 | 7 | 1022 |
| 13 | 8 | 1022 |
| 14 | 18 | 1023 |
I want a view with two columns, CatID | Flag, where Flag is 0 if there is no subcategory for that CatID and 1 otherwise.
I'd count the subcategories and left-join on that:
CREATE VIEW my_view AS
SELECT c.CatId, CASE WHEN cnt IS NOT NULL THEN 1 ELSE 0 END AS Flag
FROM Category c
LEFT JOIN (SELECT CatId, COUNT(*) AS cnt
FROM SubCategory
GROUP BY CatId) s ON c.CatId = s.CatId
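An alternative that skips the count entirely is an EXISTS test per category. The following is a minimal sketch, assuming an in-memory SQLite database so the example is self-contained (the question does not name the engine); the view name `category_flags` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Category (CatID INTEGER PRIMARY KEY, CategoryName TEXT)")
conn.execute("CREATE TABLE SubCategory (SubID INTEGER PRIMARY KEY, CatID INTEGER)")
conn.executemany(
    "INSERT INTO Category VALUES (?, ?)",
    [
        (1021, "Home"), (1022, "Corporate"), (1023, "Products"),
        (1024, "Gardens"), (1025, "Investor Relations"),
        (1026, "News & Events"), (1027, "Contact Us"),
    ],
)
conn.executemany(
    "INSERT INTO SubCategory VALUES (?, ?)",
    [
        (9, 1025), (5, 1022), (6, 1022), (10, 1025), (11, 1025), (12, 1025),
        (13, 1025), (14, 1025), (15, 1025), (16, 1026), (17, 1026),
        (7, 1022), (8, 1022), (18, 1023),
    ],
)

# Flag is 1 as soon as at least one subcategory row exists for the category.
conn.execute(
    """
    CREATE VIEW category_flags AS
    SELECT c.CatID,
           CASE WHEN EXISTS (SELECT 1 FROM SubCategory s
                             WHERE s.CatID = c.CatID)
                THEN 1 ELSE 0 END AS Flag
    FROM Category c
    """
)

flags = conn.execute(
    "SELECT CatID, Flag FROM category_flags ORDER BY CatID"
).fetchall()
for cat_id, flag in flags:
    print(cat_id, flag)
```

Both forms give the same flags on the sample data; EXISTS can stop at the first matching subcategory, whereas the LEFT JOIN version also makes the count available if it is needed later.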

How to generate merit list from exam results in SQL Server

I'm using SQL Server 2008 R2. I have a table called tstResult in my database.
AI SubID StudID StudName TotalMarks ObtainedMarks
--------------------------------------------------------
1 | 1 | 1 | Jakir | 100 | 90
2 | 1 | 2 | Rubel | 100 | 75
3 | 1 | 3 | Ruhul | 100 | 82
4 | 1 | 4 | Beauty | 100 | 82
5 | 1 | 5 | Bulbul | 100 | 96
6 | 1 | 6 | Ripon | 100 | 82
7 | 1 | 7 | Aador | 100 | 76
8 | 1 | 8 | Jibon | 100 | 80
9 | 1 | 9 | Rahaat | 100 | 82
Now I want a SELECT query that generates a merit list according to the obtained marks. In this query, the obtained marks "96" should be at the top of the merit list, and all the "82" marks should share the same position, placed one after another. Something like this:
StudID StudName TotalMarks ObtainedMarks Merit List
----------------------------------------------------------
| 5 | Bulbul | 100 | 96 | 1
| 1 | Jakir | 100 | 90 | 2
| 9 | Rahaat | 100 | 82 | 3
| 3 | Ruhul | 100 | 82 | 3
| 4 | Beauty | 100 | 82 | 3
| 6 | Ripon | 100 | 82 | 3
| 8 | Jibon | 100 | 80 | 4
| 7 | Aador | 100 | 76 | 5
| 2 | Rubel | 100 | 75 | 6
;with cte as
(
select *, dense_rank() over (order by ObtainedMarks desc) as Merit_List
from tstResult
)
select * from cte order by Merit_List
You need to use DENSE_RANK().
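To see why DENSE_RANK() fits here, the following minimal sketch runs the ranking on the sample rows in an in-memory SQLite database (an assumption made to keep the example self-contained; the question uses SQL Server 2008 R2, which also supports DENSE_RANK). Tied marks share a rank and the next distinct mark gets the next consecutive number, so no positions are skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tstResult (AI INTEGER PRIMARY KEY, SubID INTEGER, "
    "StudID INTEGER, StudName TEXT, TotalMarks INTEGER, ObtainedMarks INTEGER)"
)
conn.executemany(
    "INSERT INTO tstResult VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "Jakir", 100, 90), (2, 1, 2, "Rubel", 100, 75),
        (3, 1, 3, "Ruhul", 100, 82), (4, 1, 4, "Beauty", 100, 82),
        (5, 1, 5, "Bulbul", 100, 96), (6, 1, 6, "Ripon", 100, 82),
        (7, 1, 7, "Aador", 100, 76), (8, 1, 8, "Jibon", 100, 80),
        (9, 1, 9, "Rahaat", 100, 82),
    ],
)

# Rank 1 is the highest mark; ties share a rank; StudID breaks display order.
merit = conn.execute(
    """
    SELECT StudID, StudName, TotalMarks, ObtainedMarks,
           DENSE_RANK() OVER (ORDER BY ObtainedMarks DESC) AS Merit_List
    FROM tstResult
    ORDER BY Merit_List, StudID
    """
).fetchall()
for row in merit:
    print(row)
```

With RANK() instead, the four students on 82 would still share position 3, but the next student would be placed 7th rather than 4th.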
select columns from tstResult order by ObtainedMarks desc