Compare table to itself and update one value based on another - bulk

The following select returns a list of 8524 values. Half are duplicates of the other half, with different dates. I need to terminate the older rows based on the DateEffective of the newer ones.
SELECT PRID, COUNT(SiteID) AS SiteID_Count FROM PRL
WHERE GETDATE() BETWEEN DateEffective AND DateTerminated
and SiteGID in (190,191,192,193,30,31,32,33)
GROUP BY PRID
HAVING COUNT(SiteID)=2
ORDER BY PRID
The query and tables below show the current and expected result for a single PRID:
select * from PRL where SiteGID in (30,31,32,33) and PRID = 1339
UNION
select * from PRL where SiteGID in (190,191,192,193) and PRID = 1339
Current:
| PRLID | PRID | SiteGID | SiteID | DateEffective | DateTerminated |
| 895 | 1339 | 30 | 4353 | 2010-04-10 | 9999-12-31 |
| 966598 | 1339 | 191 | 4353 | 2021-02-19 | 9999-12-31 |
Expected:
| PRLID | PRID | SiteGID | SiteID | DateEffective | DateTerminated |
| 895 | 1339 | 30 | 4353 | 2010-04-10 | **2021-02-18** |
| 966598 | 1339 | 191 | 4353 | 2021-02-19 | 9999-12-31 |
I was thinking of joining two temp tables together, possibly using ROW_NUMBER with partitions, but I'm really not sure. Any advice is greatly appreciated.

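Based on your description, I assume that:
PRLID is the primary key of table PRL.
Rows are grouped by (PRID, SiteID).
DateTerminated should be set to the following row's DateEffective minus one day, where a following row exists.
The statement below uses MySQL-style syntax (date_sub, update ... join) and snake_case column names, so adapt both to your database.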
with cte as (
select prlid,
date_sub(lead(date_effective,1) over (partition by prid, site_id order by date_effective), interval 1 day) as new_date_terminated
from prl)
update prl as p
inner join cte c
using (prlid)
set p.date_terminated = c.new_date_terminated
where c.new_date_terminated is not null
and p.date_terminated <> c.new_date_terminated;
Outcome:
prlid |prid|site_gid|site_id|date_effective|date_terminated|
------+----+--------+-------+--------------+---------------+
895|1339| 30| 4353| 2010-04-10| 2021-02-18|
966598|1339| 191| 4353| 2021-02-19| 9999-12-31|
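The same idea can be tried out without a MySQL server. The following is only a sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions) and the snake_case column names used in the answer. SQLite has no date_sub, so date(..., '-1 day') stands in for it, and the update is applied in a second step from Python rather than with an update join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prl (
    prlid INTEGER PRIMARY KEY,
    prid INTEGER,
    site_gid INTEGER,
    site_id INTEGER,
    date_effective TEXT,
    date_terminated TEXT
);
INSERT INTO prl VALUES
    (895,    1339, 30,  4353, '2010-04-10', '9999-12-31'),
    (966598, 1339, 191, 4353, '2021-02-19', '9999-12-31');
""")

# Step 1: for every row, take the next date_effective within the same
# (prid, site_id) group and subtract one day. The newest row gets NULL.
candidates = conn.execute("""
    SELECT prlid,
           date(LEAD(date_effective) OVER (
                    PARTITION BY prid, site_id
                    ORDER BY date_effective), '-1 day') AS new_date_terminated
    FROM prl
""").fetchall()

# Step 2: update only rows that have a successor and whose value changes.
updates = [(new_date, prlid, new_date)
           for prlid, new_date in candidates
           if new_date is not None]
conn.executemany(
    "UPDATE prl SET date_terminated = ? WHERE prlid = ? AND date_terminated <> ?",
    updates,
)

result = conn.execute(
    "SELECT prlid, date_effective, date_terminated FROM prl ORDER BY prlid"
).fetchall()
print(result)
```

Splitting the work into a select and an update keeps the sketch independent of dialect-specific update-join syntax; on a real server a single statement like the one in the answer avoids the round trip.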

Related

Find the first order of a supplier in a day using SQL

I am trying to write a query to return supplier ID (sup_id), order date and the order ID of the first order (based on earliest time).
+--------+--------+------------+--------+-----------------+
|orderid | sup_id | items | sales | order_ts |
+--------+--------+------------+--------+-----------------+
|1111132 | 3 | 1 | 27,0 | 24/04/17 13:00 |
|1111137 | 3 | 2 | 69,0 | 02/02/17 16:30 |
|1111147 | 1 | 1 | 87,0 | 25/04/17 08:25 |
|1111153 | 1 | 3 | 82,0 | 05/11/17 10:30 |
|1111155 | 2 | 1 | 29,0 | 03/07/17 02:30 |
|1111160 | 2 | 2 | 44,0 | 30/01/17 20:45 |
|....... | ... | ... | ... | ... ... |
+--------+--------+------------+--------+-----------------+
Output I am looking for:
+--------+--------+------------+
| sup_id | date | order_id |
+--------+--------+------------+
|....... | ... | ... |
+--------+--------+------------+
I tried using a subquery in the join clause as below but didn't know how to join it without having selected order_id.
SELECT sup_id, date(order_ts), order_id
FROM sales s
JOIN
(
SELECT sup_id, date(order_ts) as date, min(time(order_date))
FROM sales
GROUP BY merchant_id, date
) m
on ...
Kindly assist.

You can use not exists:
select *
from sales
where not exists (
-- find sales for same supplier, earlier time, same day
select *
from sales as older
where older.sup_id = sales.sup_id
and older.order_ts < sales.order_ts
and older.order_ts >= cast(sales.order_ts as date)
)
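To see the not exists pattern in action, here is a small sketch that assumes Python's built-in sqlite3 module. The rows are hypothetical (the question's sample has only one order per supplier per day), the timestamps are stored as ISO strings so they compare correctly as text, and the day match uses date() instead of a cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (orderid INTEGER PRIMARY KEY, sup_id INTEGER, order_ts TEXT);
-- Hypothetical rows: supplier 3 and supplier 1 each have two orders on one day.
INSERT INTO sales VALUES
    (1, 3, '2017-04-24 13:00'),
    (2, 3, '2017-04-24 09:15'),
    (3, 3, '2017-02-02 16:30'),
    (4, 1, '2017-04-25 08:25'),
    (5, 1, '2017-04-25 07:00');
""")

# Keep a row only if no earlier order exists for the same supplier on the same day.
first_orders = conn.execute("""
    SELECT s.sup_id, date(s.order_ts) AS order_date, s.orderid
    FROM sales AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM sales AS older
        WHERE older.sup_id = s.sup_id
          AND date(older.order_ts) = date(s.order_ts)
          AND older.order_ts < s.order_ts
    )
    ORDER BY s.sup_id, order_date
""").fetchall()
print(first_orders)
```

If two orders for the same supplier share the exact same timestamp, both are returned; a tie-breaker on orderid would be needed to force a single row.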

The query below might not be the fastest in the world, but it returns every column of the first order for each supplier on each day.
select order_id, sup_id, items, sales, order_ts
from sales s
where order_ts = (
select min(order_ts)
from sales m
where m.sup_id = s.sup_id
and date(m.order_ts) = date(s.order_ts)
)

select sup_id, min(order_ts), min(order_id) from sales
where date(order_ts) = '2022-03-15'
group by sup_id
This assumes orderid is an identity / auto-increment column, so the lowest ID on a given day belongs to the earliest order.

Subtracting previous row value from current row

I'm doing an aggregation like this:
select
date,
product,
count(*) as cnt
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
order by
product asc, date asc
This produces data which looks like this:
| date | product | cnt | difference |
|------------|---------|------|------------|
| 2020-03-31 | p1 | 100 | null |
| 2020-07-31 | p1 | 1000 | 900 |
| 2020-09-30 | p1 | 900 | -100 |
| 2020-12-31 | p1 | 1100 | 200 |
| 2020-03-31 | p2 | 200 | null |
| 2020-07-31 | p2 | 210 | 10 |
| ... | ... | ... | x |
But without the difference column. How could I make such a calculation? I could pivot the date column and subtract that way but maybe there's a better way

I was able to get this working by using lag with partition by and order by:
select
date,
product,
count,
count - lag(count) over (partition by product order by date, product) as difference
from(
select
date,
product,
count(*) as count
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
) t
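The lag step can be checked against the numbers from the question. This is a minimal sketch assuming Python's built-in sqlite3 module (SQLite 3.25 or later); the counts table is hypothetical and holds the already aggregated values in place of the group by subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table holding the pre-aggregated counts from the question.
CREATE TABLE counts (date TEXT, product TEXT, cnt INTEGER);
INSERT INTO counts VALUES
    ('2020-03-31', 'p1', 100), ('2020-07-31', 'p1', 1000),
    ('2020-09-30', 'p1', 900), ('2020-12-31', 'p1', 1100),
    ('2020-03-31', 'p2', 200), ('2020-07-31', 'p2', 210);
""")

# LAG looks one row back within each product, ordered by date.
# The first row of every product has no predecessor, so the difference is NULL.
rows = conn.execute("""
    SELECT date, product, cnt,
           cnt - LAG(cnt) OVER (PARTITION BY product ORDER BY date) AS difference
    FROM counts
    ORDER BY product, date
""").fetchall()

differences = [row[3] for row in rows]
print(differences)
```

Because the window is already partitioned by product, ordering by date alone is enough inside the over clause.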

How to de-duplicate SQL table rows by multiple columns with hierarchy?

I have a table with multiple records for each patient.
My end goal is a table that is 1-to-1 between Patient_id and Value.
I would like to de-duplicate (in respect to patient_id) my rows based on "a hierarchical series of aggregate functions" (if someone has a better way to phrase this, I'd appreciate that as well.)
+----+------------+------------+------------+----------+-----------------+-------+
| ID | patient_id | Date | Date2 | Priority | Source | Value |
+----+------------+------------+------------+----------+-----------------+-------+
| 1 | 1 | 2017-09-09 | 2018-09-09 | 1 | 'verified' | 55 |
| 2 | 1 | 2017-09-09 | 2018-11-11 | 2 | 'verified' | 78 |
| 3 | 1 | 2017-11-11 | 2018-09-09 | 3 | 'verified' | 23 |
| 4 | 1 | 2017-11-11 | 2018-11-11 | 1 | 'self_reported' | 11 |
| 5 | 1 | 2017-09-09 | 2018-09-09 | 2 | 'self_reported' | 90 |
| 5 | 1 | 2017-09-09 | 2018-09-09 | 3 | 'self_reported' | 34 |
| 6 | 2 | 2017-11-11 | 2018-09-09 | 2 | 'self_reported' | 21 |
+----+------------+------------+------------+----------+-----------------+-------+
For each patient_id, I would like to get the row(s) that has/have the MAX(Date). In the case that there are still duplicated patient_id, I would like to get the row(s) with the MIN(Priority). In the case that there are still duplicated rows I would like to get the row(s) with the MIN(Date2).
The way I've approached this problem is using a series of queries like this to de-duplicate on the columns one at a time.
SELECT *
FROM #table t1
LEFT JOIN
(SELECT
patient_id,
MIN(priority) AS min_priority
FROM #table
GROUP BY patient_id) t2 ON t2.patient_id = t1.patient_id
WHERE t2.min_priority = t1.priority
Is there a way to do this that allows me to de-dup on multiple columns at once? Is there a more elegant way to do this?
I'm able to get my results, but my solution feels very inefficient, and I keep running into this. Thank you for any input.

You could use row_number(), if your RDBMS supports it:
select ID, patient_id, Date, Date2, Priority, Source, Value
from (
select
t.*,
row_number() over(partition by patient_id order by Date desc, Priority, Date2) rn
from mytable t
) t where rn = 1
Another option is to filter with a correlated subquery that sorts the records according to your criteria, like so:
select t.*
from mytable t
where id = (
select id
from mytable t1
where t1.patient_id = t.patient_id
order by t1.Date desc, t1.Priority, t1.Date2
limit 1
)
The actual syntax for limit varies across RDBMSs.
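As a quick check of the row_number() approach, the sketch below loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or later is assumed). The table name mytable follows the answer; the lowercase column names are a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
    id INTEGER, patient_id INTEGER, date TEXT, date2 TEXT,
    priority INTEGER, source TEXT, value INTEGER
);
INSERT INTO mytable VALUES
    (1, 1, '2017-09-09', '2018-09-09', 1, 'verified',      55),
    (2, 1, '2017-09-09', '2018-11-11', 2, 'verified',      78),
    (3, 1, '2017-11-11', '2018-09-09', 3, 'verified',      23),
    (4, 1, '2017-11-11', '2018-11-11', 1, 'self_reported', 11),
    (5, 1, '2017-09-09', '2018-09-09', 2, 'self_reported', 90),
    (5, 1, '2017-09-09', '2018-09-09', 3, 'self_reported', 34),
    (6, 2, '2017-11-11', '2018-09-09', 2, 'self_reported', 21);
""")

# Rank rows per patient: latest date first, then lowest priority, then earliest date2.
rows = conn.execute("""
    SELECT patient_id, id, value
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY patient_id
                   ORDER BY date DESC, priority, date2) AS rn
        FROM mytable AS t
    ) AS ranked
    WHERE rn = 1
    ORDER BY patient_id
""").fetchall()
print(rows)
```

ROW_NUMBER always keeps exactly one row per patient, so rows that tie on all three columns are broken arbitrarily. Swapping in RANK() would keep every tied row instead.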

eSQL multiple join but with conditions

I have three tables, shown below:
MERCHANDISE
+-----------+-----------+---------------+
| MERCH_NUM | MERCH_DIV | MERCH_SUB_DIV |
+-----------+-----------+---------------+
| 1 | car | awd |
| 1 | car | awd |
| 2 | bike | 1kcc |
| 3 | cycle | hybrid |
| 3 | cycle | city |
| 4 | moped | fixie |
+-----------+-----------+---------------+
PRIORITY
+----------+-----------+---------+---------+------------+------------+---------------+
| CUST_NUM | SALES_NUM | DOC_NUM | BALANCE | PRIORITY_1 | PRIORITY_2 | PRIORITY_CODE |
+----------+-----------+---------+---------+------------+------------+---------------+
| 90 | 1000 | 10 | 23 | 1 | 6 | NO |
| 91 | 1001 | 20 | 32 | 3 | 7 | PRI |
| 92 | 1002 | 30 | 11 | 2 | 8 | LATE |
| 93 | 1003 | 40 | 22 | 5 | 9 | 1MON |
+----------+-----------+---------+---------+------------+------------+---------------+
ORDER
+----------+-----------+---------+---------+-----------+-----------+
| CUST_NUM | SALES_NUM | DOC_NUM | COUNTRY | MERCH_NUM | MERCH_DIV |
+----------+-----------+---------+---------+-----------+-----------+
| 90 | 1000 | 10 | INDIA | 1 | car |
| 91 | 1001 | 20 | CHINA | 2 | bike |
| 92 | 1002 | 30 | USA | 3 | cycle |
| 93 | 1003 | 40 | UK | 4 | moped |
+----------+-----------+---------+---------+-----------+-----------+
I want to join the result of left joining the last two tables with the first one, such that the MERCH_SUB_DIV 'awd' appears only once for each unique combination of merch_num and merch_div.
The code I came up with is below, but I'm not sure how to eliminate the duplicate row just for awd:
select
ROW#, MERCH.MERCH_NUMBER, ORDPRI.MERCH_NUMBER, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, ITEM_NUM, RANK, PRIORITY_1
from (
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM, ORD.ITEM_NUM ASC
) AS Row#,
ORD.CUST_NUM, PRI.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from ORDER as ORD
left join PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM

You have to use the DISTINCT keyword to get unique rows. Note that if your priority and order tables contain different values for the same MERCH_NUM, the final result will still repeat that MERCH_NUM.
SELECT DISTINCT M.MERCH_NUMBER, O.MERCH_NUMBER, O.CUST_NUM, BALANCE, SALES_NUM,ITEM_NUM,RANK,PRIORITY_1
FROM priority_table P
LEFT JOIN order_table O ON P.CUST_NUM = O.CUST_NUM AND P.SALES_NUM=O.SALES_NUM AND P.DOC_NUM = O.DOC_NUM
LEFT JOIN merchandise_table M ON M.MERCH_NUM = O.MERCH_NUM

A workaround is to add a second ROW_NUMBER() in the outer query, partitioned by MERCH_SUB_DIV plus all the columns in the final select list, and then filter the final results on that new row number. The following pseudocode might help:
select
-- All expected columns in final result except the newRow#
ROW#, MERCH_NUM, CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
select
ROW#,
-- the new row number includes all column you want to show in final result
row_number() over ( PARTITION BY MERCH.MERCH_SUB_DIV ,
MERCH.MERCH_NUM, ORDPRI.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
order by (select 1 )) as newRow# ,
MERCH.MERCH_NUM, ORDPRI.CUST_NUM,
BALANCE, SALES_NUM, PRIORITY_1
from (
-- main query goes here
select
ROW_NUMBER() OVER(
PARTITION BY ORD.DOC_NUM --, ORD.ITEM_NUM
ORDER BY ORD.DOC_NUM ASC --, ORD.ITEM_NUM
) AS Row#,
ORD.CUST_NUM, ORD.MERCH_NUM, ORD.MERCH_DIV as DIV, PRI.BALANCE,
pri.DOC_NUM, pri.SALES_NUM, pri.PRIORITY_1, pri.PRIORITY_2
from #ORDER as ORD
left join #PRIORITY as PRI on ORD.DOC_NUM = PRI.DOC_NUM
and ORD.SALES_NUMBER = PRI.SALES_NUM
where country_name in ('USA', 'INDIA')
) as ORDPRI
left join #MERCHANDISE as MERCH on ORDPRI.DIV = MERCH.DIV
and ORDPRI.MERCH_NUM = MERCH.MERCH_NUM
) as T
-- final filter to get distinct values
where newRow# = 1
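The row-number filter from this answer can be reduced to a small runnable sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later), uses only the MERCHANDISE and ORDER sample rows from the question, and renames the order table to orders because ORDER is a reserved word. The PRIORITY join is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE merchandise (merch_num INTEGER, merch_div TEXT, merch_sub_div TEXT);
INSERT INTO merchandise VALUES
    (1, 'car', 'awd'), (1, 'car', 'awd'), (2, 'bike', '1kcc'),
    (3, 'cycle', 'hybrid'), (3, 'cycle', 'city'), (4, 'moped', 'fixie');

CREATE TABLE orders (
    cust_num INTEGER, sales_num INTEGER, doc_num INTEGER,
    country TEXT, merch_num INTEGER, merch_div TEXT
);
INSERT INTO orders VALUES
    (90, 1000, 10, 'INDIA', 1, 'car'),
    (91, 1001, 20, 'CHINA', 2, 'bike'),
    (92, 1002, 30, 'USA',   3, 'cycle'),
    (93, 1003, 40, 'UK',    4, 'moped');
""")

# Number the joined rows within each combination of output columns,
# then keep only the first row of each combination.
rows = conn.execute("""
    SELECT cust_num, merch_num, merch_sub_div
    FROM (
        SELECT o.cust_num      AS cust_num,
               o.merch_num     AS merch_num,
               m.merch_sub_div AS merch_sub_div,
               ROW_NUMBER() OVER (
                   PARTITION BY o.cust_num, o.merch_num, m.merch_sub_div
                   ORDER BY o.cust_num) AS rn
        FROM orders AS o
        LEFT JOIN merchandise AS m
               ON m.merch_num = o.merch_num
              AND m.merch_div = o.merch_div
        WHERE o.country IN ('USA', 'INDIA')
    ) AS t
    WHERE rn = 1
    ORDER BY cust_num, merch_sub_div
""").fetchall()
print(rows)
```

The duplicated (1, car, awd) row collapses to one result row, while the two distinct sub-divisions of merch_num 3 are both kept.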

SQL fetch records using group by with 3 conditions

I'm trying to write a query which gives me the number of patient visits by age, gender and condition (Diabetes, Hypertension etc.). For example: get the visit count for patients who have diabetes and are aged 45-54, grouped by gender. I used an inner join to get only the rows which are present in both tables. I get the error:
age.Age is invalid in the select list because it is not contained in
either an aggregate function or the GROUP BY clause.
Do you think I should use partition by age.age?
TABLE_A
+------------+------------+------------+
| Member_Key | VisitCount | date |
+------------+------------+------------+
| 4000 | 1 | 2014-05-07 |
| 4000 | 1 | 2014-05-09 |
| 4001 | 2 | 2014-05-08 |
+------------+------------+------------+
TABLE_B
+------------+--------------+
| Member_Key | Condition |
+------------+--------------+
| 4000 | Diabetes |
| 4000 | Diabetes |
| 4001 | Hypertension |
+------------+--------------+
TABLE_C
+------------+---------------+------------+
| Member_Key | Member_Gender | Member_DOB |
+------------+---------------+------------+
| 4000 | M | 1970-05-21 |
| 4001 | F | 1968-02-19 |
+------------+---------------+------------+
Query
SELECT c.conditions,
age.gender,
CASE
WHEN age.age BETWEEN 45 AND 54
THEN SUM(act.visitcount)
END AS age_45_54_years
FROM table_a act
INNER JOIN
(
SELECT DISTINCT
member_key,
conditions
FROM table_b
) c ON c.member_key = act.member_key
INNER JOIN
(
SELECT DISTINCT
member_key,
member_gender,
DATEPART(year, '2017-10-16')-DATEPART(year, member_dob) AS Age
FROM [table_c]
) AS age ON age.member_key = c.member_key
GROUP BY c.conditions,
age.member_gender;
Expected Output
+--------------+--------+-------------+
| Condition | Gender | TotalVisits |
+--------------+--------+-------------+
| Diabetes | M | 2 |
| Hypertension | F | 2 |
+--------------+--------+-------------+

You can simplify your query by filtering on age in the WHERE clause.
And, as Sean Lange said, use DATEADD and GETDATE() to calculate the age more accurately.
SELECT [Condition],
[Member_Gender] as [Gender],
SUM([VisitCount]) as [VisitCount]
FROM TableA A
JOIN (SELECT DISTINCT [Member_Key], [Condition]
FROM TableB) B
ON A.[Member_Key] = B.[Member_Key]
JOIN TableC C
ON A.[Member_Key] = C.[Member_Key]
WHERE [Member_DOB] > DATEADD(year, -55 , GETDATE())
AND [Member_DOB] <= DATEADD(year, -45 , GETDATE())
GROUP BY [Condition], [Member_Gender]
Edit: I changed the WHERE condition to compare Member_DOB directly, which solves the age precision problem and allows index use.
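The filter-then-group idea can be reproduced with the sample tables from the question. The sketch below assumes Python's built-in sqlite3 module; since SQLite has no DATEADD or GETDATE, date(:as_of, '-55 years') is used instead, with the question's fixed reference date 2017-10-16 passed in as a parameter. The column name visit_date is a renaming of the question's date column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (member_key INTEGER, visitcount INTEGER, visit_date TEXT);
INSERT INTO table_a VALUES
    (4000, 1, '2014-05-07'), (4000, 1, '2014-05-09'), (4001, 2, '2014-05-08');

CREATE TABLE table_b (member_key INTEGER, condition TEXT);
INSERT INTO table_b VALUES
    (4000, 'Diabetes'), (4000, 'Diabetes'), (4001, 'Hypertension');

CREATE TABLE table_c (member_key INTEGER, member_gender TEXT, member_dob TEXT);
INSERT INTO table_c VALUES
    (4000, 'M', '1970-05-21'), (4001, 'F', '1968-02-19');
""")

# Members aged 45-54 on the reference date were born after (as_of - 55 years)
# and on or before (as_of - 45 years). Filtering on the date of birth itself
# avoids computing an age per row.
rows = conn.execute("""
    SELECT b.condition, c.member_gender, SUM(a.visitcount) AS total_visits
    FROM table_a AS a
    JOIN (SELECT DISTINCT member_key, condition FROM table_b) AS b
      ON b.member_key = a.member_key
    JOIN table_c AS c
      ON c.member_key = a.member_key
    WHERE c.member_dob >  date(:as_of, '-55 years')
      AND c.member_dob <= date(:as_of, '-45 years')
    GROUP BY b.condition, c.member_gender
    ORDER BY b.condition
""", {"as_of": "2017-10-16"}).fetchall()
print(rows)
```

The DISTINCT subquery on table_b matters here: without it, the duplicated Diabetes row would double the visit count for member 4000.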
