How to avoid duplicates when finding row changes?

I am trying to extract data when a column changes between user IDs in a single table. I can pull the changed row as well as the previous row (ID) using a SELECT + UNION query. For the previous row, however, I get more than one row because of the filter conditions. I'm looking for suggestions on how to retrieve only a single previous row (ID). The query below is meant to return that single row.
| ID | Year | Event | ActivityDate | UserID
| 1 | 2020 | A | 2020-12-01 | xxx
| 1 | 2021 | A | 2021-03-01 | xyz
| 2 | 2020 | B | 2021-01-01 | xxx
| 1 | 2022 | C | 2021-10-01 | yyy
| 3 | 2021 | D | 2021-12-01 | xxx
Select d.ID, d.Year, d.Event, d.ActivityDate, d.UserID
from tableA d
where
d.year in ('2020','2021','2022')
and d.event <>
(select f.event
from tableA f
where
f.year in ('2020','2021','2022')
and d.id = f.id
and d.activityDate < f.activityDate
order by f.activityDate desc
fetch first 1 row only
)
;
I was hoping to retrieve the following
1, 2021, A, 2021-03-01, xyz
But I got
1, 2020, a, 2020-12-01, xxx
1, 2021, a, 2021-03-01, xyz

I think analytic functions will get you to your answer.
ROW_NUMBER(), ordered by year descending, marks the last row in each series of duplicates with rn = 1.
COUNT(id) lets you restrict the result to id/event combinations that have more than one row.
WITH
aset
AS
(SELECT d.id
, d.year
, d.event
, d.activitydate
, d.userid
, ROW_NUMBER ()
OVER (PARTITION BY id, event ORDER BY year DESC) AS rn
, COUNT (id) OVER (PARTITION BY id, event) AS n
FROM tablea d)
SELECT *
FROM aset
WHERE rn = 1 AND n > 1;
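As an alternative sketch, the "last row before the event changes" can also be found with LEAD(). The original query appears to return both 'A' rows because `order by f.activityDate desc` picks the latest later row for the ID rather than the next one, so every earlier row is compared against event 'C'. The version below compares each row with the row that immediately follows it for the same ID. The column types in the CREATE TABLE are assumptions; only the column names and sample rows come from the question.

```sql
-- Sample data copied from the question; column types are assumed.
CREATE TABLE tableA (
    ID           INTEGER,
    Year         VARCHAR(4),
    Event        VARCHAR(10),
    ActivityDate DATE,
    UserID       VARCHAR(10)
);
INSERT INTO tableA VALUES (1, '2020', 'A', DATE '2020-12-01', 'xxx');
INSERT INTO tableA VALUES (1, '2021', 'A', DATE '2021-03-01', 'xyz');
INSERT INTO tableA VALUES (2, '2020', 'B', DATE '2021-01-01', 'xxx');
INSERT INTO tableA VALUES (1, '2022', 'C', DATE '2021-10-01', 'yyy');
INSERT INTO tableA VALUES (3, '2021', 'D', DATE '2021-12-01', 'xxx');

-- LEAD() reads the Event of the next row (by ActivityDate) within the same ID.
-- A row is kept only when the next row exists and carries a different Event.
SELECT ID, Year, Event, ActivityDate, UserID
FROM (
    SELECT d.*,
           LEAD(Event) OVER (PARTITION BY ID ORDER BY ActivityDate) AS next_event
    FROM tableA d
    WHERE Year IN ('2020', '2021', '2022')
) t
WHERE next_event <> Event;
```

With the sample rows this should return only `1, 2021, A, 2021-03-01, xyz`. Rows with no later row for the same ID get a NULL next_event and are filtered out by the comparison.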

SQL select row containing all of the values in interval

I know the question is poorly worded, I'm sorry, I can't really put this problem into words. Here is a representation:
I have two tables: product and availability. A product can have multiple dates when it's available. Example:
Table 1 (products):
id | name | ....
----------------------------------
1 | My product 1 | ....
2 | My product 2 | ....
Table 2 (availability):
id | productId | date
-----------------------------------------
1 | 1 | 2021-01-15
2 | 1 | 2021-01-16
3 | 1 | 2021-01-17
4 | 2 | 2021-01-15
5 | 2 | 2021-01-16
Is there an SQL statement that, given an interval, allows us to fetch a list of products having a row in the availability table for each element of the interval?
For example, given the interval [2021-01-15 -> 2021-01-17], the request should return product 1 because it's available during the entire period (it has a row for each element: the 15th, 16th and 17th). Product2 isn't returned because it's not available on 2021-01-17.
Is there a way to do this in SQL or do I have to use PL/SQL?
Any help is appreciated,
Thanks
You can use an analytic function as follows:
select distinct t.* from
(select p.*, count(distinct a.date) over (partition by a.productid) as cnt
from products p
join availability a on a.productid = p.id
where a.date >= date '2021-01-15'
and a.date < date '2021-01-17' + 1 ) t
where t.cnt = date '2021-01-17' - date '2021-01-15' + 1
Finally came up with this, thanks to @Popeye for the inspiration.
select occurence.pid from
(
select a.product_id as pid, count(distinct a.date::date) as cnt
from availability a
where a.date >= '2021-01-15'
and a.date < '2021-01-17'::date + 1
group by a.product_id
) as occurence
where cnt = '2021-01-17'::date - '2021-01-15'::date + 1;

Counting current items by month

I'm trying to build a monthly tally of active equipment, grouped by service area from a database log table. I think I'm 90% of the way there; I have a list of months, along with the total number of items that existed, and grouped by region.
However, I also need to know the state of each item as they were on the first of each month, and this is the part I'm stuck on. For instance, Item 1 is in region A in January, but moves to Region B in February. Item 2 is marked as 'inactive' in February, so shouldn't be counted. My existing query will always count item 1 in region A, and item 2 as 'active'.
I can correctly show that Item 3 is deleted in March, and Item 4 doesn't show up until the April count. I realize that I'm getting the first values because my query is specifying the min date, I'm just not sure how I need to change it to get what I want.
I think I'm looking for a way to group by Max(OperationDate) for each Month.
The Table looks like this:
| EQUIPID | EQUIPNAME | EQUIPACTIVE | DISTRICT | REGION | OPERATIONDATE | OPERATION |
|---------|-----------|-------------|----------|--------|----------------------|-----------|
| 1 | Item 1 | 1 | 1 | A | 2015-01-01T00:00:00Z | INS |
| 2 | Item 2 | 1 | 1 | A | 2015-01-01T00:00:00Z | INS |
| 3 | Item 3 | 1 | 1 | A | 2015-01-01T00:00:00Z | INS |
| 2 | Item 2 | 0 | 1 | A | 2015-02-10T00:00:00Z | UPD |
| 1 | Item 1 | 1 | 1 | B | 2015-02-15T00:00:00Z | UPD |
| 3 | (null) | (null) | (null) | (null) | 2015-02-21T00:00:00Z | DEL |
| 1 | Item 1 | 1 | 1 | A | 2015-03-01T00:00:00Z | UPD |
| 4 | Item 4 | 1 | 1 | B | 2015-03-10T00:00:00Z | INS |
There is also a subtable that holds attributes that I care about. Its structure is similar. Unfortunately, due to previous design decisions, there is no correlation between the operations in the two tables. Any joins will need to be done using the EquipmentID, and have the overlapping states matched up for each date.
Current query:
--cte to build date list
WITH calendar (dt) AS
(SELECT &fromdate from dual
UNION ALL
SELECT Add_Months(dt,1)
FROM calendar
WHERE dt < &todate)
SELECT dt, a.district, a.region, count(*)
FROM
(SELECT EQUIPID, DISTRICT, REGION, OPERATION, MIN(OPERATIONDATE ) AS FirstOp, deleted.deldate
FROM Equipment_Log
LEFT JOIN
(SELECT EQUIPID,MAX(OPERATIONDATE) as DelDate
FROM Equipment_Log
WHERE OPERATION = 'DEL'
GROUP BY EQUIPID
) Deleted
ON Equipment_Log.EQUIPID = Deleted.EQUIPID
WHERE OPERATION <> 'DEL' --AND additional unimportant filters
GROUP BY EQUIPID,DISTRICT, REGION , OPERATION, deldate
) a
INNER JOIN calendar
ON (calendar.dt >= FirstOp AND calendar.dt < deldate)
OR (calendar.dt >= FirstOp AND deldate is null)
LEFT JOIN
( SELECT EQUIPID, MAX(OPERATIONDATE) as latestop
FROM SpecialEquip_Table_Log
--where SpecialEquip filters
group by EQUIPID
) SpecialEquip
ON a.EQUIPID = SpecialEquip.EQUIPID and calendar.dt >= SpecialEquip.latestop
GROUP BY dt, district, region
ORDER BY dt, district, region
Take only the last operation for each ID in each month. This is what row_number() and where rn = 1 do.
We have a calendar and the data, so make a partitioned join.
I assumed that you need to fill in values for months where an ID has no entries. That is why nvl(lag() ignore nulls) is needed: if something appeared in January it still exists in February and March, and we need the district and region values from the last non-empty row.
Now you have everything needed for the count. The part involving SpecialEquip_Table_Log is up to you: you left-joined that table but never used it afterwards, so what is it for? Join it if you need it; you have the ID.
with
calendar(mth) as (
select date '2015-01-01' from dual union all
select add_months(mth, 1) from calendar where mth < date '2015-05-01'),
data as (
select id, dis, reg, dt, op, act
from (
select equipid id, district dis, region reg,
to_char(operationdate, 'yyyy-mm') dt,
row_number()
over (partition by equipid, trunc(operationdate, 'month')
order by operationdate desc) rn,
operation op, nvl(equipactive, 0) act
from t)
where rn = 1 )
select mth, dis, reg, sum(act) cnt
from (
select id, mth,
nvl(dis, lag(dis) ignore nulls over (partition by id order by mth)) dis,
nvl(reg, lag(reg) ignore nulls over (partition by id order by mth)) reg,
nvl(act, lag(act) ignore nulls over (partition by id order by mth)) act
from calendar
left join data partition by (id) on dt = to_char(mth, 'yyyy-mm') )
group by mth, dis, reg
having sum(act) > 0
order by mth, dis, reg
It may seem complicated, so please run subqueries separately at first to see what is going on. And test :) Hope this helps.

SQL Match group of records to another group of records

Is there an SQL statement to match up multiple records to an exact match of multiple records in another table?
Let's say I have table A
ID | List# | Item
1 | 5 | A
2 | 5 | C
3 | 5 | B
4 | 6 | A
5 | 6 | D
*I purposely made Items 'ABC' out of order as the order of the records I receive may be out of order.
Table B
ID | Group | Item
1 | AAA | A
2 | AAA | B
3 | AAA | C
4 | AAA | D
5 | BBB | A
6 | BBB | B
7 | BBB | C
8 | DDD | A
If looking at the first table, I would want List# 5 to return a match only for group 'BBB', as all (and only) three records match.
The simplest way is to aggregate into a string or array and join. Standard SQL supports listagg(), so you can do:
select a.list, b.grp, a.items
from (select a.list, listagg(a.item, ',') within group (order by a.item) as items
      from a
      group by a.list
     ) a join
     (select b.grp, listagg(b.item, ',') within group (order by b.item) as items
      from b
      group by b.grp
     ) b
     on a.items = b.items;
Not all databases support listagg(). Many -- but not all -- have similar functionality. This is simpler than the "standard" SQL approach.
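You can simulate relational division. It's a little bit cumbersome but here it is:
with
x as (
  -- items of the list we want to match
  select item
  from a
  where a.list = 5
),
y as (
  -- per group: total number of items and how many of them appear in the list
  select b.grp, count(*) as cnt, count(x.item) as matched
  from b
  left join x on x.item = b.item
  group by b.grp
)
select grp
from y
where cnt = (select count(*) from x)
  and matched = cnt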

Join table A on table B and select only the first occurrence from B after specific date from table A

I'm trying to determine the best way to do the following. Table a has a specific start_date. Table b has a bunch of dollar amounts with various dates, based on which payments were received and when. I only want to show the row from table b with the first date occurrence >= the start_date from table a. I also do not want to retrieve duplicate ID numbers, which is what I am encountering now.
I have something like this so far...
Select a.ID, a.Start_Date
From a
Left Join (Select ID, Min(Recd_Dt) as Mindate, Total_Recd
From b
Group by ID, Total_Recd) b on a.ID = b.ID and a.Start_Date <= b.Mindate
table a looks like this...
ID | Start_Dt
1 | 11/2/2017
2 | 11/3/2017
table b looks like this...
ID | Recd_Dt | Total_Recd
1 | 11/1/2017 | $600
1 | 11/10/2017 | $800
1 | 11/19/2017 | $100
2 | 11/2/2017 | $200
2 | 11/5/2017 | $600
2 | 11/6/2017 | $100
I'd like to see something like this...
ID | Recd_Dt | Total_Recd | Sum_of_Total_Recd_After_Start
1 | 11/10/2017 | $800 | $900
2 | 11/5/2017 | $600 | $700
Furthermore, I'd also like to have a second join on the same table b that will give me a sum of any amounts that occurred after the Start_Date.
Give this a try:
SELECT
    a.ID,
    b.Recd_Dt,
    b.Total_Recd,
    SUM(b.Total_Recd) OVER(PARTITION BY a.ID) AS Sum_of_Total_Recd_After_Start
FROM a
INNER JOIN b ON a.ID = b.ID AND b.Recd_Dt >= a.Start_Dt
QUALIFY ROW_NUMBER() OVER(PARTITION BY a.ID ORDER BY b.Recd_Dt) = 1
1) Get all rows from table "a"
2) Get related rows from table "b" with Recd_Dt >= Start_Dt
3) ROW_NUMBER numbers the rows for each ID, starting from the earliest Recd_Dt
4) QUALIFY ... = 1 keeps only the first row per ID grouping
5) SUM(Total_Recd) adds up the Total_Recd column per each ID grouping; it is computed before QUALIFY filters, so it covers every row from step 2
I haven't tested it, but let me know if it works.
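QUALIFY is not available in every database, so here is a sketch of the same idea using a derived table and a plain WHERE filter instead. The table definitions and column types are assumptions made to keep the example self-contained; the sample rows come from the question. It orders by b.Recd_Dt, since that is the only date column table b has in the question, and uses >= to match the "first date occurrence >= the start_date" requirement.

```sql
-- Sample data from the question; column types are assumed.
CREATE TABLE a (ID INTEGER, Start_Dt DATE);
CREATE TABLE b (ID INTEGER, Recd_Dt DATE, Total_Recd DECIMAL(10, 2));

INSERT INTO a VALUES (1, DATE '2017-11-02');
INSERT INTO a VALUES (2, DATE '2017-11-03');

INSERT INTO b VALUES (1, DATE '2017-11-01', 600);
INSERT INTO b VALUES (1, DATE '2017-11-10', 800);
INSERT INTO b VALUES (1, DATE '2017-11-19', 100);
INSERT INTO b VALUES (2, DATE '2017-11-02', 200);
INSERT INTO b VALUES (2, DATE '2017-11-05', 600);
INSERT INTO b VALUES (2, DATE '2017-11-06', 100);

-- Both window functions are evaluated over all payments on or after Start_Dt;
-- the outer WHERE then keeps only the earliest such payment per ID.
SELECT ID, Recd_Dt, Total_Recd, Sum_of_Total_Recd_After_Start
FROM (
    SELECT a.ID,
           b.Recd_Dt,
           b.Total_Recd,
           SUM(b.Total_Recd) OVER (PARTITION BY a.ID) AS Sum_of_Total_Recd_After_Start,
           ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY b.Recd_Dt) AS rn
    FROM a
    INNER JOIN b ON a.ID = b.ID AND b.Recd_Dt >= a.Start_Dt
) ranked
WHERE rn = 1
ORDER BY ID;
```

For the sample rows this should give 800 with a total of 900 for ID 1 (2017-11-10) and 600 with a total of 700 for ID 2 (2017-11-05), which matches the output the question asks for.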

Postgres: Deleting rows that are duplicated in one column based on the conditions of another column

I have a PostgreSQL table that stores user details called users as shown below
ID | user name | item | dos | Charge|
1 | Ed | 32 |01-02-1987| 1 |
2 | Taya | 01 |05-07-1981|-1 |
3 | Damian | 32 |22-19-1990| 1 |
2 | Taya | 01 |05-07-1981| 1 |
2 | Taya | 01 |05-07-1981| 1 |
1 | Ed | 32 |01-02-1987|-1 |
I want to delete rows that are the same across id, user name, item and dos where the sum of charges is 0. This means both row 1 and row 6 for Ed get deleted.
With more than 2 occurrences, if the sum of charges is 1, I want one row with charge -1 and one row with charge 1 deleted, which means one row with charge 1 will be retained. For example, Row 2 and one of the matching rows with charge 1 for Taya will be deleted.
The output table that i am after is:
ID | user name | item | dos | Charge|
3 | Damian | 32 |22-19-1990| 1 |
2 | Taya | 01 |05-07-1981| 1 |
Any ideas?
You want the HAVING clause.
This will get you the output you want:
select
  id, user_name, item, dos, sum (charge)
from users
group by
  id, user_name, item, dos
having
  sum (charge) != 0
If you're really trying to delete the records that make it zero:
delete from users
where (id, user_name, item, dos) in (
  select id, user_name, item, dos
  from users
  group by id, user_name, item, dos
  having sum (charge) = 0
)
This does the same thing, and is quite a bit more code, but because it's using a semi-join it might be better for really large datasets:
with delete_me as (
  select id, user_name, item, dos
  from users
  group by id, user_name, item, dos
  having sum (charge) = 0
)
delete from users t
where exists (
  select null
  from delete_me d
  where
    t.id = d.id and
    t.user_name = d.user_name and
    t.item = d.item and
    t.dos = d.dos
)
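The DELETE statements above only remove groups whose charges sum to zero, so they would leave all three Taya rows in place. One possible way to also cancel individual +1/-1 pairs, sketched here for PostgreSQL only, is to number the rows within each (id, user_name, item, dos, charge) group and delete every row that has a partner with the opposite charge and the same number. The use of ctid as a row identifier, the column types, and storing dos as text (one sample value is not a valid date) are assumptions for illustration.

```sql
-- Sample data from the question; column types are assumed.
CREATE TABLE users (id INTEGER, user_name TEXT, item TEXT, dos TEXT, charge INTEGER);
INSERT INTO users VALUES
    (1, 'Ed',     '32', '01-02-1987',  1),
    (2, 'Taya',   '01', '05-07-1981', -1),
    (3, 'Damian', '32', '22-19-1990',  1),
    (2, 'Taya',   '01', '05-07-1981',  1),
    (2, 'Taya',   '01', '05-07-1981',  1),
    (1, 'Ed',     '32', '01-02-1987', -1);

-- Number the rows inside each group of identical rows, then delete a row
-- whenever a row with the opposite charge and the same number exists.
-- The n-th +1 row cancels the n-th -1 row; unmatched rows are kept.
WITH ranked AS (
    SELECT ctid AS row_ref, id, user_name, item, dos, charge,
           ROW_NUMBER() OVER (PARTITION BY id, user_name, item, dos, charge
                              ORDER BY ctid) AS rn
    FROM users
    WHERE charge <> 0          -- a zero charge would otherwise pair with itself
)
DELETE FROM users u
USING ranked r
JOIN ranked m
  ON  m.id = r.id
  AND m.user_name = r.user_name
  AND m.item = r.item
  AND m.dos = r.dos
  AND m.charge = -r.charge
  AND m.rn = r.rn
WHERE u.ctid = r.row_ref;

SELECT * FROM users;
```

On the sample data this should leave one Damian row and one Taya row with charge 1. Rows with NULL in any of the matched columns are never paired by this sketch, because the equality comparisons do not match NULLs.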