SQL Update based on aggregate record set

I have a table with purchase orders:
po_line table
+--------+---------+-----------+
| po_num | po_line | date |
+--------+---------+-----------+
| 1 | 1 | 9/22/2013 |
| 1 | 2 | 9/22/2013 |
| 1 | 3 | 9/22/2013 |
| 2 | 1 | 9/21/2013 |
| 2 | 2 | NULL |
+--------+---------+-----------+
po table
+--------+-----------+
| po_num | confirmed |
+--------+-----------+
| 1 | NULL |
| 2 | NULL |
+--------+-----------+
For a given PO (for example, po_num 1), I want to set the confirmed column in the po table to 'confirmed' if every line of that PO has a date. PO 1 would be confirmed; PO 2 would fail the criteria because line 2 has no date.
Do I need to use a cursor to do this? I'm running SQL Server 2008 R2.

UPDATE T SET confirmed = 'confirmed'
FROM po T
WHERE
T.po_num NOT IN
(
SELECT po_num FROM po_line
WHERE [date] IS NULL
)
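To see the NOT IN update at work, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. This is an assumption for illustration: the question targets SQL Server 2008 R2, so the T-SQL UPDATE ... FROM form is simplified to a plain UPDATE and the dates are stored as ISO strings. PO 3 is a hypothetical PO with no po_line rows, added to show how NOT IN treats it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE po_line (po_num INTEGER, po_line INTEGER, "date" TEXT);
    CREATE TABLE po (po_num INTEGER PRIMARY KEY, confirmed TEXT);
    INSERT INTO po_line VALUES
        (1, 1, '2013-09-22'), (1, 2, '2013-09-22'), (1, 3, '2013-09-22'),
        (2, 1, '2013-09-21'), (2, 2, NULL);
    -- PO 3 is hypothetical: it has no po_line rows at all
    INSERT INTO po VALUES (1, NULL), (2, NULL), (3, NULL);
""")

# Confirm every PO that does not appear among the lines with a missing date.
conn.execute("""
    UPDATE po SET confirmed = 'confirmed'
    WHERE po_num NOT IN (SELECT po_num FROM po_line WHERE "date" IS NULL)
""")
result = dict(conn.execute("SELECT po_num, confirmed FROM po ORDER BY po_num"))
print(result)  # {1: 'confirmed', 2: None, 3: 'confirmed'}
```

Because PO 3 has no lines with a missing date, NOT IN confirms it as well, which is the case the count-based alternative avoids.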

Alternatively, if you want to make sure that there are entries for each PO in the po_line table before confirming it, you can use:
update po set confirmed = 'confirmed'
where po.po_num in (select po_num from
(select po_num, count([date]) dated, count(*) total from po_line group by po_num) q
where dated=total)
as shown in http://sqlfiddle.com/#!6/b16988/8/0
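The same kind of sketch (Python's sqlite3 with an in-memory database, ISO date strings, and the hypothetical line-less PO 3) shows the count-based variant leaving PO 3 unconfirmed, because COUNT("date") ignores NULLs while COUNT(*) counts every line:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE po_line (po_num INTEGER, po_line INTEGER, "date" TEXT);
    CREATE TABLE po (po_num INTEGER PRIMARY KEY, confirmed TEXT);
    INSERT INTO po_line VALUES
        (1, 1, '2013-09-22'), (1, 2, '2013-09-22'), (1, 3, '2013-09-22'),
        (2, 1, '2013-09-21'), (2, 2, NULL);
    -- PO 3 is hypothetical: it has no po_line rows at all
    INSERT INTO po VALUES (1, NULL), (2, NULL), (3, NULL);
""")

# dated = total only for POs that have lines and whose lines are all dated;
# a PO without lines never appears in the grouped subquery.
conn.execute("""
    UPDATE po SET confirmed = 'confirmed'
    WHERE po_num IN (
        SELECT po_num FROM (
            SELECT po_num, COUNT("date") AS dated, COUNT(*) AS total
            FROM po_line GROUP BY po_num
        ) q
        WHERE dated = total
    )
""")
result = dict(conn.execute("SELECT po_num, confirmed FROM po ORDER BY po_num"))
print(result)  # {1: 'confirmed', 2: None, 3: None}
```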

Related

How to join two tables with sum of one column and with condition

I have two tables:
table 1
+------------+-------------+------------------+
| id_product | id_customer | start_date       |
+------------+-------------+------------------+
| 1          | 1           | 2021-08-28T10:37 |
| 1          | 2           | 2021-08-28T11:17 |
| 1          | 3           | 2021-08-28T12:27 |
| 2          | 1           | 2021-08-28T17:00 |
+------------+-------------+------------------+
table 2
+-------------+------------------+----------+--------------------------------+
| id_customer | stop_date        | duration | 20 other columns like duration |
+-------------+------------------+----------+--------------------------------+
| 1           | 2021-08-27T17:00 | 20       | ...                            |
| 1           | 2021-08-26T17:00 | 40       | ...                            |
| 2           | 2021-08-29T17:00 | 120      | ...                            |
| 1           | 2021-08-30T17:00 | 40       | ...                            |
| ...         | ...              | ...      | ...                            |
+-------------+------------------+----------+--------------------------------+
start_date in table 1 is the date the customer started the product.
stop_date in table 2 is the date the customer stopped the product.
I want to join these two tables to get one row with:
id_product
id_customer
start_date
the sum of duration over all rows whose stop_date is BEFORE start_date
the same kind of sum for each of the 20 remaining columns like duration.
Example for id_product = 1, id_customer = 1:
+------------+-------------+------------------+---------------+-------------------------------------+
| id_product | id_customer | start_date       | sum(duration) | sum(all other columns from table 2) |
+------------+-------------+------------------+---------------+-------------------------------------+
| 1          | 1           | 2021-08-28T10:37 | 60            | ...                                 |
+------------+-------------+------------------+---------------+-------------------------------------+
My tables are really big, and I am using PySpark with SQL. Is there an optimized way to do this?
Thank you
EDIT:
There is also an id_product column in table 2.
SELECT
Table_1.id_product,
Table_1.id_customer,
Table_1.start_date,
SUM(Table_2.duration) AS sum_duration
-- , SUM(Table_2.duration2)   (repeat for each of the other columns)
-- , SUM(Table_2.duration3)
FROM Table_1
LEFT JOIN Table_2 ON
Table_2.id_customer = Table_1.id_customer
AND Table_2.id_product = Table_1.id_product
AND Table_2.stop_date < Table_1.start_date
GROUP BY Table_1.id_product, Table_1.id_customer, Table_1.start_date
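A minimal sketch of this LEFT JOIN with an inequality condition, run through Python's sqlite3 module rather than PySpark (an assumption made so the example is self-contained). It also assumes that every sample row in table 2 has id_product = 1, and that dates are ISO strings, so < compares them chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id_product INTEGER, id_customer INTEGER, start_date TEXT);
    CREATE TABLE table_2 (id_product INTEGER, id_customer INTEGER,
                          stop_date TEXT, duration INTEGER);
    INSERT INTO table_1 VALUES
        (1, 1, '2021-08-28T10:37'), (1, 2, '2021-08-28T11:17'),
        (1, 3, '2021-08-28T12:27'), (2, 1, '2021-08-28T17:00');
    -- Assumption: every sample stop row belongs to product 1
    INSERT INTO table_2 VALUES
        (1, 1, '2021-08-27T17:00', 20), (1, 1, '2021-08-26T17:00', 40),
        (1, 2, '2021-08-29T17:00', 120), (1, 1, '2021-08-30T17:00', 40);
""")

rows = conn.execute("""
    SELECT t1.id_product, t1.id_customer, t1.start_date,
           SUM(t2.duration) AS sum_duration
    FROM table_1 t1
    LEFT JOIN table_2 t2
      ON t2.id_customer = t1.id_customer
     AND t2.id_product = t1.id_product
     AND t2.stop_date < t1.start_date   -- only stops before the start
    GROUP BY t1.id_product, t1.id_customer, t1.start_date
    ORDER BY t1.id_product, t1.id_customer
""").fetchall()
for row in rows:
    print(row)
```

Customers with no earlier stop rows keep a NULL sum because of the LEFT JOIN; wrapping the SUM in COALESCE(..., 0) turns that into zero if you prefer.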

How to select the latest date for each group by number?

I've been stuck on this question for a while and was hoping the community could point me in the right direction.
I have some tag IDs that need to be grouped, with exceptions (rows with a value in the DELETED column) that need to be kept in the results. Then, for each grouped tag ID, I need to select the row with the latest date. How can I do this? An example:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
4 | 400 | 05/01/20 | null
5 | 400 | 04/01/20 | null
6 | 500 | 03/01/20 | null
7 | 500 | 02/01/20 | null
I am trying to reach this outcome:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
6 | 500 | 03/01/20 | null
So, first, if there is a date in the DELETED column, I want the row to be present. Second, for each unique tag ID, I want the row with the latest DATE to be present.
Hopefully this question is clear. I would appreciate your feedback and help! A big thanks in advance.
Your results seem to be something like this:
select t.*
from (select t.*,
row_number() over (partition by tag_id, deleted order by date desc) as seqnum
from t
) t
where seqnum = 1 or deleted is not null;
Among the rows where deleted is null, this keeps only the most recent row for each tag_id. It also keeps every row where deleted is not null.
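A quick check of the ROW_NUMBER() query using Python's sqlite3 module (window functions need SQLite 3.25 or later). It assumes the sample dates are MM/DD/YY and stores them as ISO strings so that ORDER BY "date" DESC sorts chronologically; the table name t matches the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, tag_id INTEGER, "date" TEXT, deleted TEXT);
    INSERT INTO t VALUES
        (1, 300, '2020-05-01', NULL), (2, 300, '2020-03-01', '2020-04-01'),
        (3, 400, '2020-06-01', NULL), (4, 400, '2020-05-01', NULL),
        (5, 400, '2020-04-01', NULL), (6, 500, '2020-03-01', NULL),
        (7, 500, '2020-02-01', NULL);
""")

# Number rows within each (tag_id, deleted) partition, newest first;
# NULL deleted values fall into one partition per tag_id.
ids = [row[0] for row in conn.execute("""
    SELECT id FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY tag_id, deleted
                                       ORDER BY "date" DESC) AS seqnum
        FROM t
    )
    WHERE seqnum = 1 OR deleted IS NOT NULL
    ORDER BY id
""")]
print(ids)  # [1, 2, 3, 6]
```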
You need two conditions combined with OR in the WHERE clause:
the first is that deleted is not null, or
the second is that no other row has the same tag_id and a later date than the current row, meaning that the current row's date is the latest:
select t.* from tablename t
where t.deleted is not null
or not exists (
select 1 from tablename
where tag_id = t.tag_id and date > t.date
)
See the demo.
Results:
| id | tag_id | date | deleted |
| --- | ------ | ---------- | -------- |
| 1 | 300 | 2020-05-01 | |
| 2 | 300 | 2020-03-01 | 04/01/20 |
| 3 | 400 | 2020-06-01 | |
| 6 | 500 | 2020-03-01 | |
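The NOT EXISTS version can be checked the same way with Python's sqlite3, assuming the same ISO-formatted sample data in a table named tablename. Unlike the ROW_NUMBER() query, it compares each row against every other row with the same tag_id, including deleted ones, so the two queries can differ when a deleted row is the newest one for its tag:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (id INTEGER, tag_id INTEGER, "date" TEXT, deleted TEXT);
    INSERT INTO tablename VALUES
        (1, 300, '2020-05-01', NULL), (2, 300, '2020-03-01', '2020-04-01'),
        (3, 400, '2020-06-01', NULL), (4, 400, '2020-05-01', NULL),
        (5, 400, '2020-04-01', NULL), (6, 500, '2020-03-01', NULL),
        (7, 500, '2020-02-01', NULL);
""")

# Keep deleted rows, plus any row that no later row with the same tag_id beats.
ids = [row[0] for row in conn.execute("""
    SELECT t.id FROM tablename t
    WHERE t.deleted IS NOT NULL
       OR NOT EXISTS (
            SELECT 1 FROM tablename t2
            WHERE t2.tag_id = t.tag_id AND t2."date" > t."date")
    ORDER BY t.id
""")]
print(ids)  # [1, 2, 3, 6]
```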

Duplicate records upon joining table

I am still very new to SQL and Tableau, but I am working toward a personal project of mine.
Table A contains the defect quantity per product category and the date it was raised:
+--------+-------------+--------------+-----------------+
| Issue# | Date_Raised | Category_ID# | Defect_Quantity |
+--------+-------------+--------------+-----------------+
| PCR12 | 11-Jan-2019 | Product#1 | 14 |
| PCR13 | 12-Jan-2019 | Product#1 | 54 |
| PCR14 | 5-Feb-2019 | Product#1 | 5 |
| PCR15 | 5-Feb-2019 | Product#2 | 7 |
| PCR16 | 20-Mar-2019 | Product#1 | 76 |
| PCR17 | 22-Mar-2019 | Product#2 | 5 |
| PCR18 | 25-Mar-2019 | Product#1 | 89 |
+--------+-------------+--------------+-----------------+
Table B contains the consumed quantity of each product by month:
+-------------+--------------+-------------------+
| Date_Raised | Category_ID# | Consumed_Quantity |
+-------------+--------------+-------------------+
| 5-Jan-2019 | Product#1 | 100 |
| 17-Jan-2019 | Product#1 | 200 |
| 5-Feb-2019 | Product#1 | 100 |
| 8-Feb-2019 | Product#2 | 50 |
| 10-Mar-2019 | Product#1 | 100 |
| 12-Mar-2019 | Product#2 | 50 |
+-------------+--------------+-------------------+
END RESULT
I would like to create a table/bar chart in Tableau that shows Defect_Quantity/Consumed_Quantity per month, per Category_ID#, something like this:
+----------+-----------+-----------+
| Month | Product#1 | Product#2 |
+----------+-----------+-----------+
| Jan-2019 | 23% | |
| Feb-2019 | 5% | 14% |
| Mar-2019 | 89% | 10% |
+----------+-----------+-----------+
WHAT I HAVE TRIED SO FAR
Unfortunately I have not really done anything yet; I am struggling to understand how to get rid of the duplicates when joining the tables on Category_ID#.
Appreciate all the help I can receive here.
I can think of doing left joins for both Product#1 and Product#2, aggregating table B per month first so that each defect row matches at most one consumption row:
select to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy') as raised_month
, sum(case when t1.Category_ID# = 'Product#1' then t1.Defect_Quantity else 0 end) / max(p1.product1) * 100 as product1_pct
, sum(case when t1.Category_ID# = 'Product#2' then t1.Defect_Quantity else 0 end) / max(p2.product2) * 100 as product2_pct
from tableA t1
left join
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') as raised_month
, sum(Consumed_Quantity) as product1
from tableB
where Category_ID# = 'Product#1'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p1
on p1.raised_month = to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy')
left join
(select to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy') as raised_month
, sum(Consumed_Quantity) as product2
from tableB
where Category_ID# = 'Product#2'
group by to_char(to_date(Date_Raised,'dd-mon-yyyy'),'mon-yyyy')) p2
on p2.raised_month = to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy')
group by to_char(to_date(t1.Date_Raised,'dd-mon-yyyy'),'mon-yyyy')
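The duplicates come from joining two row-level tables: each defect row matches every consumption row for the same product and month. A minimal sketch with Python's sqlite3 and only the January Product#1 rows shows the inflation and the fix of aggregating table B per month first. It assumes ISO date strings (so substr(date, 1, 7) gives the month) and renames Category_ID# to category_id to keep the example free of quoted identifiers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (issue TEXT, date_raised TEXT, category_id TEXT,
                          defect_quantity INTEGER);
    CREATE TABLE table_b (date_raised TEXT, category_id TEXT,
                          consumed_quantity INTEGER);
    INSERT INTO table_a VALUES
        ('PCR12', '2019-01-11', 'Product#1', 14),
        ('PCR13', '2019-01-12', 'Product#1', 54);
    INSERT INTO table_b VALUES
        ('2019-01-05', 'Product#1', 100), ('2019-01-17', 'Product#1', 200);
""")

# Naive join: 2 defect rows x 2 consumption rows = 4 joined rows,
# so both sums are counted twice.
naive = conn.execute("""
    SELECT SUM(a.defect_quantity), SUM(b.consumed_quantity)
    FROM table_a a
    JOIN table_b b
      ON b.category_id = a.category_id
     AND substr(b.date_raised, 1, 7) = substr(a.date_raised, 1, 7)
""").fetchone()

# Pre-aggregated join: table_b is reduced to one row per month and category,
# so each defect row matches exactly one consumption total.
fixed = conn.execute("""
    SELECT SUM(a.defect_quantity), MAX(b.consumed)
    FROM table_a a
    JOIN (SELECT substr(date_raised, 1, 7) AS month, category_id,
                 SUM(consumed_quantity) AS consumed
          FROM table_b GROUP BY 1, 2) b
      ON b.category_id = a.category_id
     AND b.month = substr(a.date_raised, 1, 7)
""").fetchone()
print(naive, fixed)  # (136, 600) (68, 300)
```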
By using ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) AS RN, you can remove duplicate rows. For your end result, you should extract the month from the date and use PIVOT to get one column per product.
I would do this as:
select to_char(date_raised, 'YYYY-MM'),
(sum(case when product = 'Product#1' then defect_quantity end) /
sum(case when product = 'Product#1' then consumed_quantity end)
) as product1,
(sum(case when product = 'Product#2' then defect_quantity end) /
sum(case when product = 'Product#2' then consumed_quantity end)
) as product2
from ((select date_raised, product, defect_quantity, 0 as consumed_quantity
from a
) union all
(select date_raised, product, 0 as defect_quantity, consumed_quantity
from b
)
) ab
group by to_char(date_raised, 'YYYY-MM')
order by min(date_raised);
(I changed the date format because I much prefer YYYY-MM, but that is irrelevant to the logic.)
Why do I prefer this method? It includes every month that has a row in either table, so I don't have to worry about months being inadvertently filtered out because production or defects are missing in that month.
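Here is a sketch of the UNION ALL plus conditional aggregation approach in Python's sqlite3. It assumes ISO date strings, a product column as in the answer, and substr(date_raised, 1, 7) in place of to_char(date_raised, 'YYYY-MM'). The parentheses around the UNION ALL branches are dropped because SQLite does not accept them, and * 1.0 avoids SQLite's integer division. On the sample data, March Product#1 comes out as 1.65 (76 + 89 defects against 100 consumed), not the 89% shown in the desired output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (date_raised TEXT, product TEXT, defect_quantity INTEGER);
    CREATE TABLE b (date_raised TEXT, product TEXT, consumed_quantity INTEGER);
    INSERT INTO a VALUES
        ('2019-01-11', 'Product#1', 14), ('2019-01-12', 'Product#1', 54),
        ('2019-02-05', 'Product#1', 5),  ('2019-02-05', 'Product#2', 7),
        ('2019-03-20', 'Product#1', 76), ('2019-03-22', 'Product#2', 5),
        ('2019-03-25', 'Product#1', 89);
    INSERT INTO b VALUES
        ('2019-01-05', 'Product#1', 100), ('2019-01-17', 'Product#1', 200),
        ('2019-02-05', 'Product#1', 100), ('2019-02-08', 'Product#2', 50),
        ('2019-03-10', 'Product#1', 100), ('2019-03-12', 'Product#2', 50);
""")

# Stack both tables (zero-filling the missing measure), then aggregate per
# month; a product with no rows in a month yields NULL rather than 0.
rows = conn.execute("""
    SELECT substr(date_raised, 1, 7) AS month,
           SUM(CASE WHEN product = 'Product#1' THEN defect_quantity END) * 1.0 /
           SUM(CASE WHEN product = 'Product#1' THEN consumed_quantity END) AS product1,
           SUM(CASE WHEN product = 'Product#2' THEN defect_quantity END) * 1.0 /
           SUM(CASE WHEN product = 'Product#2' THEN consumed_quantity END) AS product2
    FROM (SELECT date_raised, product, defect_quantity, 0 AS consumed_quantity FROM a
          UNION ALL
          SELECT date_raised, product, 0 AS defect_quantity, consumed_quantity FROM b) ab
    GROUP BY substr(date_raised, 1, 7)
    ORDER BY MIN(date_raised)
""").fetchall()
for month, product1, product2 in rows:
    print(month, product1, product2)
```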

PL/SQL - Update Rows with Max Date Using Value from another table

I have an Oracle database where I'm working with two tables, as shown below.
ITEMS
ITEM_ID | ITEM_DESC | ITEM_STATUS
============================================
1 | ITEM 1 | A
2 | ITEM 2 | A
3 | ITEM 3 | I
4 | ITEM 4 | A
ITEM_UPDATES
ITEM_ID | LAST_CHANGE | ITEM_STATUS
=============================================
1 | 1/21/2010 |
1 | 4/1/2015 |
2 | 1/21/2010 |
2 | 7/14/2016 |
3 | 1/21/2010 |
3 | 10/21/2011 |
3 | 11/15/2017 |
4 | 11/30/2010 |
We want to change the way ITEM_STATUS is tracked in this system, and I'm trying to move the ITEM_STATUS column to the ITEM_UPDATES table. Past records don't matter and will likely have their own status; however, for each ITEM_ID I want to set ITEM_STATUS on the record with MAX(LAST_CHANGE) to the value of the ITEM_STATUS column currently in ITEMS. The finished table would look like this:
ITEM_UPDATES
ITEM_ID | LAST_CHANGE | ITEM_STATUS
=============================================
1 | 1/21/2010 |
1 | 4/1/2015 | A
2 | 1/21/2010 |
2 | 7/14/2016 | A
3 | 1/21/2010 |
3 | 10/21/2011 |
3 | 11/15/2017 | I
4 | 11/30/2010 | A
I have the query to select the proper data below, but I don't know how to translate this into an update statement given that I'm having to compare item_ids AND whether or not something is the max date record for that item. Is this doable?
SELECT ITEM_UPDATES.ITEM_ID, ITEMS.ITEM_STATUS, MAX(ITEM_UPDATES.LAST_CHANGE) AS MAX_DATE
FROM ITEM_UPDATES, ITEMS
WHERE ITEM_UPDATES.ITEM_ID = ITEMS.ITEM_ID
GROUP BY ITEM_UPDATES.ITEM_ID, ITEMS.ITEM_STATUS
So you want the status updated on the most recent item_updates record for each item. You can do:
update item_updates iu
set item_status = (select i.item_status from items i where i.item_id = iu.item_id)
where iu.last_change = (select max(iu2.last_change)
from item_updates iu2
where iu2.item_id = iu.item_id
);
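A minimal sketch of this correlated UPDATE in Python's sqlite3, with LAST_CHANGE stored as ISO date strings so that MAX() picks the latest date. Inside the subqueries the Oracle alias iu is replaced by the full table name item_updates, a form that works in SQLite as well:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_desc TEXT, item_status TEXT);
    CREATE TABLE item_updates (item_id INTEGER, last_change TEXT, item_status TEXT);
    INSERT INTO items VALUES
        (1, 'ITEM 1', 'A'), (2, 'ITEM 2', 'A'), (3, 'ITEM 3', 'I'), (4, 'ITEM 4', 'A');
    INSERT INTO item_updates (item_id, last_change) VALUES
        (1, '2010-01-21'), (1, '2015-04-01'), (2, '2010-01-21'), (2, '2016-07-14'),
        (3, '2010-01-21'), (3, '2011-10-21'), (3, '2017-11-15'), (4, '2010-11-30');
""")

# Copy the current status from items onto each item's latest update row only.
conn.execute("""
    UPDATE item_updates
    SET item_status = (SELECT i.item_status FROM items i
                       WHERE i.item_id = item_updates.item_id)
    WHERE last_change = (SELECT MAX(iu2.last_change) FROM item_updates iu2
                         WHERE iu2.item_id = item_updates.item_id)
""")
rows = conn.execute(
    "SELECT item_id, last_change, item_status FROM item_updates "
    "ORDER BY item_id, last_change"
).fetchall()
for row in rows:
    print(row)
```

If two rows for one item share the same maximum LAST_CHANGE, both of them are updated.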
Maybe:
update item_updates iup
set iup.item_status = (select ist.item_status
from items ist
where ist.item_id = iup.item_id)
where (iup.item_id, iup.last_change) = (select iup2.item_id, max(iup2.last_change)
from item_updates iup2
where iup2.item_id = iup.item_id
group by iup2.item_id)
Now that I see Gordon Linoff's answer, I ask myself why I added the (already correlated) item_id...

How can I do SQL query count based on certain criteria including row order

I've come across some logic that I need for my SQL query. Given a table like this:
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 1 | null | 2016-05-10 |
| 1 | null | 2016-05-09 |
| 1 | yes | 2016-05-08 |
+----------+-------+------------+
This table is produced by a simple query:
SELECT * FROM products WHERE product = 1 ORDER BY date desc
Now I need a query that counts the number of nulls for a given product, in date order (newest first), until there is a yes value. In the above example, the count would be 2, as there are 2 nulls before a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 2 | null | 2016-05-10 |
| 2 | yes | 2016-05-09 |
| 2 | null | 2016-05-08 |
+----------+-------+------------+
The above would return 1, as there is 1 null before a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 3 | yes | 2016-05-10 |
| 3 | yes | 2016-05-09 |
| 3 | null | 2016-05-08 |
+----------+-------+------------+
The above would return 0.
You need a Correlated Subquery like this:
SELECT COUNT(*)
FROM products AS p1
WHERE product = 1
AND Date >
( -- maximum date with 'yes'
SELECT MAX(Date)
FROM products AS p2
WHERE p1.product = p2.product
AND Valid = 'yes'
)
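The correlated subquery can be checked against all three sample products with Python's sqlite3, assuming ISO date strings and a hypothetical helper function nulls_before_latest_yes that parameterizes the product. A product with no 'yes' row at all gets 0, because MAX() returns NULL and the comparison is never true:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product INTEGER, valid TEXT, "date" TEXT);
    INSERT INTO products VALUES
        (1, NULL, '2016-05-10'), (1, NULL, '2016-05-09'), (1, 'yes', '2016-05-08'),
        (2, NULL, '2016-05-10'), (2, 'yes', '2016-05-09'), (2, NULL, '2016-05-08'),
        (3, 'yes', '2016-05-10'), (3, 'yes', '2016-05-09'), (3, NULL, '2016-05-08');
""")

def nulls_before_latest_yes(conn, product):
    # Hypothetical helper: counts rows dated after the product's latest 'yes'.
    return conn.execute("""
        SELECT COUNT(*) FROM products AS p1
        WHERE p1.product = ?
          AND p1."date" > (SELECT MAX(p2."date") FROM products AS p2
                           WHERE p2.product = p1.product AND p2.valid = 'yes')
    """, (product,)).fetchone()[0]

counts = [nulls_before_latest_yes(conn, p) for p in (1, 2, 3)]
print(counts)  # [2, 1, 0]
```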
This should do it:
select count(1) from products p where p.product = 1 and p.valid is null and p.date > (select max(date) from products where product = p.product and valid = 'yes')
I'm not sure whether the logic you provided covers all the possible weird and wonderful edge cases, but the following code does what you are after:
select a.product,
count(IIF(a.valid is null and a.date >maxdate,a.date,null)) as total
from sometable a
inner join (
select product, max(date) as Maxdate
from sometable where valid='yes' group by product
) b
on a.product=b.product group by a.product