SQL Update based on aggregate record set

I have a table with purchase orders:
po_line table
+--------+---------+-----------+
| po_num | po_line | date |
+--------+---------+-----------+
| 1 | 1 | 9/22/2013 |
| 1 | 2 | 9/22/2013 |
| 1 | 3 | 9/22/2013 |
| 2 | 1 | 9/21/2013 |
| 2 | 2 | NULL |
+--------+---------+-----------+
po table
+--------+-----------+
| po_num | confirmed |
+--------+-----------+
| 1 | NULL |
| 2 | NULL |
+--------+-----------+
For a given PO (for example, po_num 1), I want to set confirmed in the po table to 'confirmed' if every line for that PO in po_line has a date. In the example, PO 1 would be marked confirmed; PO 2 would fail the criteria because line 2 has no date.
Do I need to use a cursor to do this? I am running SQL Server 2008 R2.

UPDATE T SET confirmed = 'confirmed'
FROM po T
WHERE
T.po_num NOT IN
(
SELECT po_num FROM po_line
WHERE po_date IS NULL
)

Alternatively, if you want to make sure that there are entries for each po in the po_line table before confirming, you can use:
update po set confirmed = 'confirmed'
where po.po_num in (select po_num from
(select po_num, count(po_date) dated, count(*) total from po_line group by po_num) q
where dated=total)
as shown in http://sqlfiddle.com/#!6/b16988/8/0
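
Both checks from the answers above (no undated line, and at least one line present) can also be expressed with EXISTS / NOT EXISTS. The following is only a sketch that runs the idea against an in-memory SQLite database rather than SQL Server 2008 R2; the po_date column name follows the answers, and PO 3 (a PO with no lines at all) is a hypothetical row added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE po (po_num INTEGER PRIMARY KEY, confirmed TEXT);
    CREATE TABLE po_line (po_num INTEGER, po_line INTEGER, po_date TEXT);
    -- PO 3 is hypothetical: it has no lines in po_line.
    INSERT INTO po VALUES (1, NULL), (2, NULL), (3, NULL);
    INSERT INTO po_line VALUES
        (1, 1, '2013-09-22'), (1, 2, '2013-09-22'), (1, 3, '2013-09-22'),
        (2, 1, '2013-09-21'), (2, 2, NULL);
""")

# Confirm a PO only if it has at least one line and no line without a date.
conn.execute("""
    UPDATE po
    SET confirmed = 'confirmed'
    WHERE EXISTS (SELECT 1 FROM po_line l WHERE l.po_num = po.po_num)
      AND NOT EXISTS (SELECT 1 FROM po_line l
                      WHERE l.po_num = po.po_num AND l.po_date IS NULL)
""")

po_status = dict(conn.execute("SELECT po_num, confirmed FROM po"))
print(po_status)
```

Unlike NOT IN, the NOT EXISTS form is not affected by a NULL po_num in po_line, which is one reason it may be the safer choice.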

Related

How to join two tables with sum of one column and with condition

I have two tables:
table 1
+-------------+--------------+-----------------+
| id_product | id_customer |start_date |
+-------------+--------------+-----------------+
| 1 | 1 | 2021-08-28T10:37|
| 1 | 2 | 2021-08-28T11:17|
| 1 | 3 | 2021-08-28T12:27|
| 2 | 1 | 2021-08-28T17:00|
table 2
+-------------+------------------+----------+-------------------------------+
| id_customer | stop_date | duration | 20 other columns like duration|
+-------------+------------------+----------+-------------------------------+
| 1 | 2021-08-27T17:00| 20 | ...
| 1 | 2021-08-26T17:00| 40 | ...
| 2 | 2021-08-29T17:00| 120 | ...
| 1 | 2021-08-30T17:00| 40 | ...
| ..........................................|
start_date in table 1 is the date the customer started the product.
stop_date in table 2 is the date the customer stopped the product.
I want to join these two tables to get one row with:
id_product
id_customer
start_date
the sum of duration over all rows whose stop_date is BEFORE start_date.
the same sum for each of the 20 remaining columns.
Example for id_product = 1, id_customer = 1:
+-------------+--------------+-----------------+---------------+-----------------------------------+
| id_product | id_customer |start_date | sum(duration) | sum(all other columns from table 2)
+-------------+--------------+-----------------+---------------+-----------------------------------+
| 1 | 1 | 2021-08-28T10:37| 60
My tables are really big, and I am using PySpark with SQL. Do you know an optimised way to do this?
Thank you
EDIT :
There is also an id_product in table2

SELECT
Table_1.id_product,
Table_1.id_customer,
Table_1.start_date,
SUM(Table_2.duration) AS sum_duration
---,SUM(duration2)
---,SUM(duration3)
FROM Table_1
LEFT JOIN Table_2 ON
Table_2.id_customer = Table_1.id_customer
AND Table_2.id_product = Table_1.id_product
AND Table_2.stop_date < Table_1.start_date
GROUP BY Table_1.id_product,Table_1.id_customer, Table_1.start_date
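
To check the join condition on the question's sample rows, here is a small sketch using an in-memory SQLite database instead of PySpark. It assumes every sample row of table 2 belongs to id_product 1 (the question only says that the column exists), and it wraps the sum in COALESCE so that customers without an earlier stop_date get 0 instead of NULL; both are choices made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id_product INTEGER, id_customer INTEGER, start_date TEXT);
    CREATE TABLE table_2 (id_product INTEGER, id_customer INTEGER,
                          stop_date TEXT, duration INTEGER);
    INSERT INTO table_1 VALUES
        (1, 1, '2021-08-28T10:37'), (1, 2, '2021-08-28T11:17'),
        (1, 3, '2021-08-28T12:27'), (2, 1, '2021-08-28T17:00');
    -- Assumption: all sample rows of table_2 are for id_product = 1.
    INSERT INTO table_2 VALUES
        (1, 1, '2021-08-27T17:00', 20), (1, 1, '2021-08-26T17:00', 40),
        (1, 2, '2021-08-29T17:00', 120), (1, 1, '2021-08-30T17:00', 40);
""")

# ISO-8601 strings sort chronologically, so a plain < comparison works here.
rows = conn.execute("""
    SELECT t1.id_product, t1.id_customer, t1.start_date,
           COALESCE(SUM(t2.duration), 0) AS sum_duration
    FROM table_1 t1
    LEFT JOIN table_2 t2
           ON t2.id_customer = t1.id_customer
          AND t2.id_product = t1.id_product
          AND t2.stop_date < t1.start_date
    GROUP BY t1.id_product, t1.id_customer, t1.start_date
    ORDER BY t1.id_product, t1.id_customer
""").fetchall()
print(rows)
```

Keeping the date filter in the ON clause rather than in WHERE is what preserves rows of table 1 that have no matching earlier stop.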

How to select the latest date for each group by number?

I've been stuck on this question for a while, and I was wondering if the community would be able to direct me in the right direction?
I have some tag IDs that need to be grouped, with exceptions (column: deleted) that need to be retained in the results. Then, for each grouped tag ID, I need to select the row with the latest date. How can I do this? An example is below:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
4 | 400 | 05/01/20 | null
5 | 400 | 04/01/20 | null
6 | 500 | 03/01/20 | null
7 | 500 | 02/01/20 | null
I am trying to reach this outcome:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
6 | 500 | 03/01/20 | null
So, firstly if there is a date in the "DELETED" column, I would like the row to be present. Secondly, for each unique tag ID, I would like the row with the latest "DATE" to be present.
Hopefully this question is clear. Would appreciate your feedback and help! A big thanks in advance.

Your results seem to be something like this:
select t.*
from (select t.*,
row_number() over (partition by tag_id, deleted order by date desc) as seqnum
from t
) t
where seqnum = 1 or deleted is not null;
This takes one row where deleted is null -- the most recent row. It also keeps each row where deleted is not null.

You need 2 conditions combined with OR in the WHERE clause:
the 1st is deleted is not null, or
the 2nd that there isn't any other row with the same tag_id and date later than the current row's date, meaning that the current row's date is the latest:
select t.* from tablename t
where t.deleted is not null
or not exists (
select 1 from tablename
where tag_id = t.tag_id and date > t.date
)
See the demo.
Results:
| id | tag_id | date | deleted |
| --- | ------ | ---------- | -------- |
| 1 | 300 | 2020-05-01 | |
| 2 | 300 | 2020-03-01 | 04/01/20 |
| 3 | 400 | 2020-06-01 | |
| 6 | 500 | 2020-03-01 | |
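
For anyone who wants to try the NOT EXISTS variant locally, the sketch below loads the question's rows into an in-memory SQLite database. The table name tags and the ISO date strings are assumptions made for this example (the dates follow the way the results above render them).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag_id INTEGER,
                       date TEXT, deleted TEXT);
    INSERT INTO tags VALUES
        (1, 300, '2020-05-01', NULL), (2, 300, '2020-03-01', '2020-04-01'),
        (3, 400, '2020-06-01', NULL), (4, 400, '2020-05-01', NULL),
        (5, 400, '2020-04-01', NULL), (6, 500, '2020-03-01', NULL),
        (7, 500, '2020-02-01', NULL);
""")

# Keep every deleted row, plus any row that has no later row for its tag_id.
kept_ids = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM tags t
    WHERE t.deleted IS NOT NULL
       OR NOT EXISTS (SELECT 1 FROM tags x
                      WHERE x.tag_id = t.tag_id AND x.date > t.date)
    ORDER BY t.id
""")]
print(kept_ids)
```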

Duplicate records upon joining table

I am still very new to SQL and Tableau; however, I am trying to work towards completing a personal project of mine.
Table A shows the defect quantity per product category and when it was raised
+--------+-------------+--------------+-----------------+
| Issue# | Date_Raised | Category_ID# | Defect_Quantity |
+--------+-------------+--------------+-----------------+
| PCR12 | 11-Jan-2019 | Product#1 | 14 |
| PCR13 | 12-Jan-2019 | Product#1 | 54 |
| PCR14 | 5-Feb-2019 | Product#1 | 5 |
| PCR15 | 5-Feb-2019 | Product#2 | 7 |
| PCR16 | 20-Mar-2019 | Product#1 | 76 |
| PCR17 | 22-Mar-2019 | Product#2 | 5 |
| PCR18 | 25-Mar-2019 | Product#1 | 89 |
+--------+-------------+--------------+-----------------+
Table B shows the consumption quantity of each product by month
+-------------+--------------+-------------------+
| Date_Raised | Category_ID# | Consumed_Quantity |
+-------------+--------------+-------------------+
| 5-Jan-2019 | Product#1 | 100 |
| 17-Jan-2019 | Product#1 | 200 |
| 5-Feb-2019 | Product#1 | 100 |
| 8-Feb-2019 | Product#2 | 50 |
| 10-Mar-2019 | Product#1 | 100 |
| 12-Mar-2019 | Product#2 | 50 |
+-------------+--------------+-------------------+
END RESULT
I would like to create a table/bar chart in Tableau that shows Defect_Quantity/Consumed_Quantity per month, per Category_ID#, something like this:
+----------+-----------+-----------+
| Month | Product#1 | Product#2 |
+----------+-----------+-----------+
| Jan-2019 | 23% | |
| Feb-2019 | 5% | 14% |
| Mar-2019 | 89% | 10% |
+----------+-----------+-----------+
WHAT I HAVE TRIED SO FAR
Unfortunately I have not really done anything yet; I am struggling to understand how to get rid of the duplicates when joining the tables on Category_ID#.
Appreciate all the help I can receive here.

I can think of doing left joins on both product1 and 2.
select to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy') as month_raised
, (p1.product1 - sum(case when t1.category_id='Product#1' then t1.Defect_Quantity else 0 end))/p1.product1 * 100
, (p2.product2 - sum(case when t1.category_id='Product#2' then t1.Defect_Quantity else 0 end))/p2.product2 * 100
from tableA t1
left join
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') Date_Raised
, sum(Consumed_Quantity) as product1
from tableB
where category_id = 'Product#1'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p1
on p1.Date_Raised = to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy')
left join
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') Date_Raised
, sum(Consumed_Quantity) as product2
from tableB
where category_id = 'Product#2'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p2
on p2.Date_Raised = to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy')
group by to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy'), p1.product1, p2.product2

By using ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) AS RN, you can remove duplicate rows. As for your end result, you should extract the month from the date and use a pivot to achieve it.

I would do this as:
select to_char(date_raised, 'YYYY-MM'),
(sum(case when product = 'Product#1' then defect_quantity end) /
sum(case when product = 'Product#1' then consumed_quantity end)
) as product1,
(sum(case when product = 'Product#2' then defect_quantity end) /
sum(case when product = 'Product#2' then consumed_quantity end)
) as product2
from ((select date_raised, product, defect_quantity, 0 as consumed_quantity
from a
) union all
(select date_raised, product, 0 as defect_quantity, consumed_quantity
from b
)
) ab
group by to_char(date_raised, 'YYYY-MM')
order by min(date_raised);
(I changed the date format because I much prefer YYYY-MM, but that is irrelevant to the logic.)
Why do I prefer this method? It includes every month that has a row in either table. I don't have to worry that some months are inadvertently filtered out because consumption or defects are missing for that month.
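
The UNION ALL approach can be tried on the question's data with the sketch below, which uses an in-memory SQLite database. The table names a and b and the product column follow the answer; the ISO date strings, the percentage scaling, the rounding, and the NULLIF guard against a zero denominator are additions for this illustration, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (date_raised TEXT, product TEXT, defect_quantity INTEGER);
    CREATE TABLE b (date_raised TEXT, product TEXT, consumed_quantity INTEGER);
    INSERT INTO a VALUES
        ('2019-01-11', 'Product#1', 14), ('2019-01-12', 'Product#1', 54),
        ('2019-02-05', 'Product#1', 5),  ('2019-02-05', 'Product#2', 7),
        ('2019-03-20', 'Product#1', 76), ('2019-03-22', 'Product#2', 5),
        ('2019-03-25', 'Product#1', 89);
    INSERT INTO b VALUES
        ('2019-01-05', 'Product#1', 100), ('2019-01-17', 'Product#1', 200),
        ('2019-02-05', 'Product#1', 100), ('2019-02-08', 'Product#2', 50),
        ('2019-03-10', 'Product#1', 100), ('2019-03-12', 'Product#2', 50);
""")

# Stack both tables, then aggregate once per month; no join, so no duplicates.
monthly = conn.execute("""
    SELECT strftime('%Y-%m', date_raised) AS month,
           ROUND(100.0 * SUM(CASE WHEN product = 'Product#1' THEN defect_quantity END)
                 / NULLIF(SUM(CASE WHEN product = 'Product#1' THEN consumed_quantity END), 0),
                 1) AS product1_pct,
           ROUND(100.0 * SUM(CASE WHEN product = 'Product#2' THEN defect_quantity END)
                 / NULLIF(SUM(CASE WHEN product = 'Product#2' THEN consumed_quantity END), 0),
                 1) AS product2_pct
    FROM (SELECT date_raised, product, defect_quantity, 0 AS consumed_quantity FROM a
          UNION ALL
          SELECT date_raised, product, 0 AS defect_quantity, consumed_quantity FROM b) ab
    GROUP BY strftime('%Y-%m', date_raised)
    ORDER BY strftime('%Y-%m', date_raised)
""").fetchall()

for month, p1, p2 in monthly:
    print(month, p1, p2)
```

Because the two tables are stacked rather than joined, each defect row and each consumption row is counted exactly once per month.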

PL/SQL - Update Rows with Max Date Using Value from another table

I have an Oracle database where I'm working with two tables, as shown below.
ITEMS
ITEM_ID | ITEM_DESC | ITEM_STATUS
============================================
1 | ITEM 1 | A
2 | ITEM 2 | A
3 | ITEM 3 | I
4 | ITEM 4 | A
ITEM_UPDATES
ITEM_ID | LAST_CHANGE | ITEM_STATUS
=============================================
1 | 1/21/2010 |
1 | 4/1/2015 |
2 | 1/21/2010 |
2 | 7/14/2016 |
3 | 1/21/2010 |
3 | 10/21/2011 |
3 | 11/15/2017 |
4 | 11/30/2010 |
We are wanting to change the way that ITEM_STATUS is tracked in this system, and I'm trying to move the ITEM_STATUS column to the ITEM_UPDATES table. Things that occur in the past don't matter and will likely have unique status, however I want to set ITEM_STATUS for each record with a MAX(LAST_CHANGE) for a given ID to the value of the ITEM_STATUS column currently in ITEMS. So basically, the finished table would look like this.
ITEM_UPDATES
ITEM_ID | LAST_CHANGE | ITEM_STATUS
=============================================
1 | 1/21/2010 |
1 | 4/1/2015 | A
2 | 1/21/2010 |
2 | 7/14/2016 | A
3 | 1/21/2010 |
3 | 10/21/2011 |
3 | 11/15/2017 | I
4 | 11/30/2010 | A
I have the query to select the proper data below, but I don't know how to translate this into an update statement given that I'm having to compare item_ids AND whether or not something is the max date record for that item. Is this doable?
SELECT ITEM_UPDATES.ITEM_ID, ITEMS.ITEM_STATUS, MAX(EFFECTIVE_DATE) AS MAX_DATE
FROM ITEM_UPDATES, ITEMS
WHERE ITEM_UPDATES.ITEM_ID = ITEMS.ITEM_ID
GROUP BY ITEM_UPDATES.ITEM_ID, ITEMS.ITEM_STATUS

So you want the status updated on the most recent item_updates record. You can do:
update item_updates iu
set item_status = (select i.item_status from items i where i.item_id = iu.item_id)
where iu.effective_date = (select max(iu2.effective_date)
from item_updates iu2
where iu2.item_id = iu.item_id
);

Maybe:
update item_updates iup
set iup.item_status = (select ist.item_status
from items ist
where ist.item_id = iup.item_id)
where (iup.item_id, iup.last_change) = (select iup2.item_id, max(iup2.last_change)
from item_updates iup2
where iup2.item_id = iup.item_id
group by iup2.item_id)
Now that I see Gordon Linoff's answer, I ask myself why I added the (already correlated) item_id...
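
The correlated-update idea from these answers can be checked against the question's sample data with the sketch below. It runs on an in-memory SQLite database rather than Oracle, uses the last_change column name from the sample tables, and stores the dates as ISO strings; those are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_desc TEXT, item_status TEXT);
    CREATE TABLE item_updates (item_id INTEGER, last_change TEXT, item_status TEXT);
    INSERT INTO items VALUES
        (1, 'ITEM 1', 'A'), (2, 'ITEM 2', 'A'), (3, 'ITEM 3', 'I'), (4, 'ITEM 4', 'A');
    INSERT INTO item_updates (item_id, last_change) VALUES
        (1, '2010-01-21'), (1, '2015-04-01'), (2, '2010-01-21'), (2, '2016-07-14'),
        (3, '2010-01-21'), (3, '2011-10-21'), (3, '2017-11-15'), (4, '2010-11-30');
""")

# Copy the status only onto the most recent update row of each item.
conn.execute("""
    UPDATE item_updates
    SET item_status = (SELECT i.item_status FROM items i
                       WHERE i.item_id = item_updates.item_id)
    WHERE last_change = (SELECT MAX(u2.last_change) FROM item_updates u2
                         WHERE u2.item_id = item_updates.item_id)
""")

updated = conn.execute(
    "SELECT item_id, last_change, item_status FROM item_updates "
    "ORDER BY item_id, last_change"
).fetchall()
for row in updated:
    print(row)
```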

How can I do SQL query count based on certain criteria including row order

I've come across certain logic that I need for my SQL query. Given that I have a table as such:
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 1 | null | 2016-05-10 |
| 1 | null | 2016-05-09 |
| 1 | yes | 2016-05-08 |
+----------+-------+------------+
This table is produced by a simple query:
SELECT * FROM products WHERE product = 1 ORDER BY date desc
Now what I need to do is create a query to count the number of nulls for certain products, in order of date, until there is a yes value. So in the above example the count would be 2, as there are 2 nulls until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 2 | null | 2016-05-10 |
| 2 | yes | 2016-05-09 |
| 2 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 1 as there is 1 null until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 3 | yes | 2016-05-10 |
| 3 | yes | 2016-05-09 |
| 3 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 0.

You need a Correlated Subquery like this:
SELECT COUNT(*)
FROM products AS p1
WHERE product = 1
AND Date >
( -- maximum date with 'yes'
SELECT MAX(Date)
FROM products AS p2
WHERE p1.product = p2.product
AND Valid = 'yes'
)

This should do it:
select count(1) from products where product = 1 and valid is null and date > (select max(date) from products where product = 1 and valid = 'yes')

Not sure if your logic provided covers all the possible weird and wonderful extreme scenarios but the following piece of code would do what you are after:
select a.product,
count(IIF(a.valid is null and a.date >maxdate,a.date,null)) as total
from sometable a
inner join (
select product, max(date) as Maxdate
from sometable where valid='yes' group by product
) b
on a.product=b.product group by a.product
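
To see how the per-product version behaves on the three examples from the question, here is a sketch against an in-memory SQLite database. SQLite has no guarantee of IIF in older versions, so the conditional count is written as SUM(CASE ...) instead; that substitution and the column alias names are choices made for this example. Products with no 'yes' row at all are dropped by the inner join, since the question does not say what they should return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product INTEGER, valid TEXT, date TEXT);
    INSERT INTO products VALUES
        (1, NULL, '2016-05-10'), (1, NULL, '2016-05-09'), (1, 'yes', '2016-05-08'),
        (2, NULL, '2016-05-10'), (2, 'yes', '2016-05-09'), (2, NULL, '2016-05-08'),
        (3, 'yes', '2016-05-10'), (3, 'yes', '2016-05-09'), (3, NULL, '2016-05-08');
""")

# Count NULL rows that are newer than the most recent 'yes' row of each product.
null_counts = dict(conn.execute("""
    SELECT a.product,
           SUM(CASE WHEN a.valid IS NULL AND a.date > b.max_yes THEN 1 ELSE 0 END)
               AS nulls_before_yes
    FROM products a
    JOIN (SELECT product, MAX(date) AS max_yes
          FROM products
          WHERE valid = 'yes'
          GROUP BY product) b ON b.product = a.product
    GROUP BY a.product
"""))
print(null_counts)
```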
