Filtering Join in Oracle DB - SQL

Problem:
Each KEY in Table A should have one RF record and one SJ record; however, I have some duplicate SJ records.
Objective:
I wish to use the SJ records in Table B to identify which SJ record in Table A to keep.
Info:
Table A and Table B share a KEY and SEQ_NBR field.
Inputs:
Table A looks as follows
KEY   ID_TYPE  SEQ_NBR  BUS_NAME
1234  RF       1        COMP_A
1234  SJ       2        COMP_B
1234  SJ       4        COMP_C
5678  RF       1        COMP_L
5678  SJ       2        COMP_M
5678  SJ       3        COMP_N
Table B looks as follows
KEY   SEQ_NBR  BUS_NAME
1234  2        COMP_B
5678  3        COMP_N
Desired Outcome:
My output would look as follows
KEY   ID_TYPE  SEQ_NBR  BUS_NAME
1234  RF       1        COMP_A
1234  SJ       2        COMP_B
5678  RF       1        COMP_L
5678  SJ       3        COMP_N

Here is one way:
select key, id_type, seq_nbr, bus_name
from (
       select a.*,
              row_number() over (partition by a.key, a.id_type
                                 order by b.key) as rn
       from   a left outer join b
                on a.key = b.key and a.seq_nbr = b.seq_nbr
     )
where rn = 1
;
The left outer join adds the columns of table B to those of table A. We need that for a single purpose: as we partition by key and id_type, we get partitions of either a single row or two or more rows. In the latter case, only one row has a non-null value in b.key. Since Oracle sorts nulls last by default in ascending order, ordering by b.key gives row number 1 to the row with the non-null b.key (and we don't care about the rest).
Then the outer query simply keeps all the rows with row number = 1 and ignores the rest.
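For the sample data, the inner query produces something like the rows below (b.key is shown only to illustrate the ordering; it is not in the select list), and the outer query then discards everything with rn > 1:
KEY   ID_TYPE  SEQ_NBR  BUS_NAME  b.key  RN
1234  RF       1        COMP_A    null   1
1234  SJ       2        COMP_B    1234   1
1234  SJ       4        COMP_C    null   2
5678  RF       1        COMP_L    null   1
5678  SJ       3        COMP_N    5678   1
5678  SJ       2        COMP_M    null   2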
An alternative solution uses a union all of the two tables (slightly modified as needed) and basic aggregation with the LAST aggregate function:
select key, id_type,
       min(seq_nbr) keep (dense_rank last order by source) as seq_nbr,
       min(bus_name) keep (dense_rank last order by source) as bus_name
from (
       select 'A' as source, a.* from a
       union all
       select 'B', key, 'SJ', seq_nbr, bus_name from b
     )
group by key, id_type
;
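To see why this works, take KEY 1234 and ID_TYPE SJ: the union all contributes ('A', 1234, SJ, 2, COMP_B), ('A', 1234, SJ, 4, COMP_C) and ('B', 1234, SJ, 2, COMP_B). Ordered by source, the last rows are the ones with source 'B', so keep (dense_rank last order by source) aggregates only over the row that came from table B and returns seq_nbr 2 and bus_name COMP_B. The RF groups contain only 'A' rows, so they are returned unchanged.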
You can test both to see which is more efficient on your data (if performance is important).

Here is another option:
select *
from tablea a
where exists (select 1
              from tableb b
              where b.key = a.key
                and b.seq_nbr = a.seq_nbr)
   or not exists (select 1
                  from tablea t
                  inner join tableb b
                          on t.key = b.key
                         and t.seq_nbr = b.seq_nbr
                  where t.key = a.key
                    and t.id_type = a.id_type);
The first EXISTS keeps the rows that are confirmed by Table B; the NOT EXISTS keeps the rows (such as the RF records) whose key/id_type combination has no B-confirmed row at all, so nothing is lost when Table B has no entry for a group.

If I understand correctly, you can count the number of duplicates, then use a left join and filter based on both the count and the match:
select a.*
from (select a.*,
             count(*) over (partition by key, id_type) as cnt
      from a
     ) a left join
     b
     on b.key = a.key and
        b.seq_nbr = a.seq_nbr and
        b.bus_name = a.bus_name
where cnt = 1 or b.key is not null;
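Applied to the sample data: the two RF rows have cnt = 1 and are kept unconditionally; among the SJ rows (cnt = 2), only 1234/2/COMP_B and 5678/3/COMP_N find a match in B (so b.key is not null) and survive the filter, while 1234/4/COMP_C and 5678/2/COMP_M are dropped.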

Related

Optimize a complex PostgreSQL Query

I am attempting to make a complex SQL join on several tables, as shown below. I have also included an image of the DB schema.
Consider table_1 -
e_id name
1 a
2 b
3 c
4 d
and table_2 -
e_id date
1 1/1/2019
1 1/1/2020
2 2/1/2019
4 2/1/2019
The issue here is performance. From tables 2-4 we only want the most recent entry for a given e_id, but because these tables contain historical data (~3.5M+ rows) it's quite slow. I've attached an example of how we're currently trying to achieve this, but it only includes one join of 'table_1' with 'table_x': we group by e_id and get the max date for it. The other approach we've considered is creating a materialized view, pulling data from that, and refreshing it after some period of time. Any improvements welcome.
from fds.region as rg
inner join (
select e_id, name, p_id
from fds.table_1
where sec_type = 'S' AND active_flag = 1
) as table_1 on table_1.e_id = rg.e_id
inner join fds.table_2 table_2 on table_2.e_id = rg.e_id
inner join fds.sec sec on sec.p_id = table_1.p_id
inner join fds.entity ent on ent.int_entity_id = sec.int_entity_id
inner join (
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.e_id AND int_1.date = int_2.date
) as table_4 on table_4.e_id = rg.e_id
where rg.region_str like '%US' and ent.sec_type = 'P'
order by table_2.int_price
limit 500;
You can simplify this logic:
(
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.e_id AND int_1.date = int_2.date
) as table_4
To:
(SELECT DISTINCT ON (int_1.e_id) int_1.*
FROM fds.table_4 int_1
ORDER BY int_1.e_id, int_1.date DESC
) table_4
This can take advantage of an index on fds.table_4(e_id, date desc) -- and might be wicked fast with such an index.
You also want appropriate indexes for the joins and filtering. However, it is hard to be more specific without an execution plan.
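For reference, a sketch of the suggested index (Postgres syntax; the index name is arbitrary):
create index table_4_eid_date_idx on fds.table_4 (e_id, date desc);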

Finding max of a column while doing inner join of two tables

I have two tables as follows:
Table A
=====================
student_id test_week
-------- ---------
s1 2018-12-01
s1 2018-12-08
Table B
======================
student_id last_updated remarks
-------- ------------ --------
s1 2018-12-06 Fail
s1 2018-12-10 Pass
From the above two tables, I want to fetch the following columns:
student_id, last(test_week) and remarks, such that
last_updated >= test_week + 1 day and last_updated <= test_week + 15 days,
i.e. last_updated should fall within roughly two weeks after last(test_week). So the following will be the result for the above entries:
s1 2018-12-08 Pass
I have written the following:
select a.student_id, test_week, remarks
from A inner join B
on A.student_id = B.student_id
and DATEDIFF(last_updated, test_week)>=1
and DATEDIFF(last_updated, test_week)<=15;
But I am not able to figure out how to handle the last(test_week) part.
If you need only the record related to the last test_week then, if I understood this right, you can do the following:
select top 1 a.student_id, test_week, remarks
from A inner join B
  on A.student_id = B.student_id
 and DATEDIFF(last_updated, test_week) >= 1
 and DATEDIFF(last_updated, test_week) <= 15
order by test_week desc;
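The two-argument DATEDIFF(last_updated, test_week) suggests MySQL, where TOP is not supported; an equivalent sketch there would use LIMIT instead:
select a.student_id, test_week, remarks
from A inner join B
  on A.student_id = B.student_id
 and DATEDIFF(last_updated, test_week) >= 1
 and DATEDIFF(last_updated, test_week) <= 15
order by test_week desc
limit 1;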
You can try using the window function row_number(). The following query will give the max(test_week) for every student_id.
select * from (
    select student_id, test_week, remarks,
           row_number() over (partition by student_id order by test_week desc) as rn
    from (
        select a.student_id, test_week, remarks
        from A join B
          on A.student_id = B.student_id
         and last_updated - test_week >= 1
         and last_updated - test_week <= 15
    ) tb1
) tb2
where rn = 1;
Note: the above query is supported in PostgreSQL; you might want to convert it into an equivalent MySQL query.
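For MySQL versions without window functions, one way to express "the latest test_week per student" is a correlated subquery; a sketch, assuming the same column names as in the question:
select a.student_id, a.test_week, b.remarks
from A a
join B b
  on a.student_id = b.student_id
 and datediff(b.last_updated, a.test_week) >= 1
 and datediff(b.last_updated, a.test_week) <= 15
where a.test_week = (select max(a2.test_week)
                     from A a2
                     where a2.student_id = a.student_id);
(MySQL 8.0 and later do support row_number(), so the window-function version above would also work there, with DATEDIFF in place of the date subtraction.)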

Grouping the data and showing 1 row per group in postgres

I have two tables, a Component table and a Revision table (screenshots omitted).
I want to get the name, model_id, and rev_id from these tables such that the result set has data like that shown below:
name  model_id  rev_id  created_at
ABC   1234      2       23456
ABC   5678      2       10001
XYZ   4567
Here the data is grouped by name and model_id, and only one row is shown for each group: the one with the highest value of created_at.
I am using the below query but it is giving me an incorrect result.
SELECT cm.name,cm.model_id,r.created_at from dummy.component cm
left join dummy.revision r on cm.model_id=r.model_id
group by cm.name,cm.model_id,r.created_at
ORDER BY cm.name asc,
r.created_at DESC;
Result: (screenshot omitted)
Anyone's help will be highly appreciated.
Use MAX and a sub-query:
select T1.name, T1.model_id, r.rev_id, T1.created_at
from
(
    select cm.name,
           cm.model_id,
           MAX(r.created_at) as created_at
    from dummy.component cm
    left join dummy.revision r on cm.model_id = r.model_id
    group by cm.name, cm.model_id
) T1
left join dummy.revision r
       on T1.model_id = r.model_id
      and T1.created_at = r.created_at
http://www.sqlfiddle.com/#!17/68cb5/4
name  model_id  rev_id  created_at
ABC   1234      2       23456
ABC   5678      2       10001
xyz   4567
In your SELECT you're missing rev_id. Try this:
SELECT
cm.name,
cm.model_id,
MAX(r.rev_id) AS rev_id,
MAX(r.created_at) As created_at
from dummy.component cm
left join dummy.revision r on cm.model_id=r.model_id
group by 1,2
ORDER BY cm.name asc,
         created_at DESC;
What you were missing is a way to say you only want the max record from the joined table. You need to join the records, but the join will bring in all rows from table r. If you group by the two columns from component and then select the max from r on the rev_id and created date, it will only pick the top row out of those available to join.
I would use distinct on:
select distinct on (cm.name, cm.model_id)
       cm.name, cm.model_id, r.rev_id, r.created_at
from dummy.component cm left join
     dummy.revision r
     on cm.model_id = r.model_id
order by cm.name, cm.model_id, r.created_at desc;
distinct on keeps the first row of each (name, model_id) group according to the order by, so sorting by created_at desc returns the latest revision per component (or a row of nulls when there is no revision, thanks to the left join).
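As in the earlier DISTINCT ON example, an index that matches the per-group ordering can help; a sketch, assuming the revision table described in the question:
create index revision_model_created_idx on dummy.revision (model_id, created_at desc);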

Combine rows from Multiple tables into a single table

I have one parent table, Products, with multiple child tables: Hoses, SteelTubes, ElectricCables, FiberOptics.
ProductId - primary key field in the Products table
ProductId - foreign key field in Hoses, SteelTubes, ElectricCables, FiberOptics
The Products table has a 1-to-many relationship with the child tables.
I want to combine the results of all tables.
For example, product P1 has the PK field ProductId, which is used in all child tables as an FK.
If the Hoses table has 4 records with ProductId 50 and the SteelTubes table has 2 records with ProductId 50, then when I perform a left join it produces a cartesian product and shows 8 records as the result, but it should be 4 records.
;with HOSESTEELCTE
as
(
    select '' as ModeType, '' as FiberOpticQty, '' as NumberFibers, '' as FiberLength,
           '' as CableType, '' as Conductorsize, '' as Voltage, '' as ElecticCableLength,
           s.TubeMaterial, s.TubeQty, s.TubeID, s.WallThickness, s.DWP,
           s.Length as SteelLength, h.HoseSeries, h.HoseLength, h.ProductId
    from Hoses h
    left join
    (
        --'' as HoseSeries,'' as HoseLength ,
        select TubeMaterial, TubeQty, TubeID, WallThickness, DWP, Length, ProductId
        from SteelTubes
    ) s on (s.ProductId = h.ProductId)
)
select * from HOSESTEELCTE
Assuming there are no relationships between the child tables and you simply want a list of all child entities which make up a product, you could generate a CTE that has a number of rows equal to the largest number of entries across all the child tables for a product. In the example below I have used a dates table (dimdate) to simplify the example.
So, for this data:
create table products(pid int);
insert into products values
(1),(2);
create table hoses (pid int,descr varchar(2));
insert into hoses values (1,'h1'),(1,'h2'),(1,'h3'),(1,'h4');
create table steeltubes (pid int,descr varchar(2));
insert into steeltubes values (1,'t1'),(1,'t2');
create table electriccables(pid int,descr varchar(2));
insert into electriccables values (1,'e1'),(1,'e2'),(1,'e3'),(2,'e1');
This CTE
;with cte as
(select row_number() over(partition by p.pid order by datekey) rn, p.pid
from dimdate, products p
where datekey < 20050105)
select * from cte
creates a cartesian join (one of the rare occasions where an implicit join helps), pairing each pid with rn values 1 through 4.
Result:
rn pid
-------------------- -----------
1 1
2 1
3 1
4 1
1 2
2 2
3 2
4 2
And if we add the child tables
;with cte as
(select row_number() over(partition by p.pid order by datekey) rn, p.pid
from dimdate, products p
where datekey < 20050106)
select c.pid,h.descr hoses,s.descr steeltubes,e.descr electriccables from cte c
left join (select h.*, row_number() over(order by h.pid) rn from hoses h) h on h.rn = c.rn and h.pid = c.pid
left join (select s.*, row_number() over(order by s.pid) rn from steeltubes s) s on s.rn = c.rn and s.pid = c.pid
left join (select e.*, row_number() over(order by e.pid) rn from electriccables e) e on e.rn = c.rn and e.pid = c.pid
where h.rn is not null or s.rn is not null or e.rn is not null
order by c.pid,c.rn
we get this
pid hoses steeltubes electriccables
----------- ----- ---------- --------------
1 h1 t1 e1
1 h2 t2 e2
1 h3 NULL e3
1 h4 NULL NULL
2 NULL NULL e1
In fact, a result of 8 rows is to be expected, since your four records are joined with the first record in the other table and then your four records are joined with the second record of the other table, making it 4 + 4 = 8.
The very fact that you expect 4 records in the result instead of 8 shows that you want some kind of grouping. You can group your inner query for SteelTubes by ProductId, but then you will need aggregate functions for the other columns. Since you have only explained the structure of the desired output, but not its semantics, I cannot tell from the information given which aggregations you need.
Once you work that out for the first table, you will easily be able to add the other tables into the selection as well, but with large data you might run into scaling problems, so you might want a table where you store these grouped results, maintain it when something changes, and use it for these selections.
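For illustration only, a sketch of such a grouped subquery for SteelTubes, using MAX() purely as a placeholder aggregate (whether MAX is the right choice depends on your semantics):
select ProductId,
       max(TubeMaterial)  as TubeMaterial,
       max(TubeQty)       as TubeQty,
       max(TubeID)        as TubeID,
       max(WallThickness) as WallThickness,
       max(DWP)           as DWP,
       max(Length)        as Length
from SteelTubes
group by ProductId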

Select count from different tables with a field in common

So here is my query:
SELECT COUNT( tab1.id_z ) AS Count, tab_tot.name
FROM tab1
INNER JOIN tab_id ON (tab1.id_key = tab_id.id_key)
INNER JOIN tab_tot ON (tab_id.id_z = tab_tot.id_z)
WHERE tab1.id_c = 10888 GROUP BY tab_id.id_z
In table tab_id there are 6 records with the same id_z, but I receive 1 in Count.
How can I fix this?
Edit - this is my schema:
tab1
id_key | id_c
tab_id
id_key |id_z
tab_tot
id_z | description
Though I have more records in tab_id, the COUNT is always 1.