Bad performance when joining two sets based on a


To better illustrate my problem picture the following data set that has Rooms that contain a "range" of animals. To represent the range, each animal is assigned a sequence number in a separate table. There are different animal types and the sequence is "reset" for each of them.
Table A
RoomId StartAnimal EndAnimal GroupType
------ ----------- --------- ---------
1      Monkey      Bee       A
1      Lion        Buffalo   A
2      Ant         Frog      B
Table B
Animal  Sequence Type
------  -------- ----
Monkey  1        A
Zebra   2        A
Bee     3        A
Turtle  4        A
Lion    5        A
Buffalo 6        A
Ant     1        B
Frog    2        B
Desired Output
Getting all the animals for each Room based on their Start-End entries, e.g.
RoomId Animal
------ ------
1      Monkey
1      Zebra
1      Bee
1      Lion
1      Buffalo
2      Ant
2      Frog
I have been able to get the desired output by first creating a view in which each room has its start and end sequence numbers, and then joining that view to the animal list by comparing the ranges.
The problem is that this is performing poorly in my real data set where there are around 10k rooms and around 340k animals. Is there a different (better) way to go about this that I'm not seeing?
Example fiddle I'm working with: https://dbfiddle.uk/RnagCTf0
The query I tried is
WITH fullAnimals AS (
    SELECT DISTINCT(RoomId), a.[Animal], ta.[GroupType], a.[sequence] s1, ae.[sequence] s2
    FROM [TableA] ta
    LEFT JOIN [TableB] a ON a.[Animal] = ta.[StartAnimal] AND a.[Type] = ta.[GroupType]
    LEFT JOIN [TableB] ae ON ae.[Animal] = ta.[EndAnimal] AND ae.[Type] = a.[Type]
)
SELECT DISTINCT(r.Id), Name, b.[Animal], b.[Type]
FROM [TableB] b
LEFT JOIN fullAnimals ON (b.[Sequence] >= s1 AND b.[Sequence] <= s2)
INNER JOIN [Rooms] r ON (r.[Id] = fullAnimals.[RoomId]) --this is a third table that has more data from the rooms
WHERE b.[Type] = fullAnimals.[GroupType]
Thanks!

One option, to remove the aggregations, is to use the following joins:
between TableA and TableB, to gather the sequence of "a.StartAnimal"
between TableA and TableB, to gather the sequence of "a.EndAnimal"
between TableB and the previous two TableB joins, to keep only the rows whose b.Sequence lies between the sequence of "a.StartAnimal" and the sequence of "a.EndAnimal", on the matching "Type"
between TableA and Rooms, to gather the room information
SELECT r.*, b.Animal, b.Type
FROM TableA a
-- match the type as well, because the sequence restarts for each type
INNER JOIN TableB b1 ON a.StartAnimal = b1.Animal AND a.GroupType = b1.Type
INNER JOIN TableB b2 ON a.EndAnimal = b2.Animal AND a.GroupType = b2.Type
INNER JOIN TableB b ON b.Sequence BETWEEN b1.Sequence AND b2.Sequence
                   AND a.GroupType = b.Type
INNER JOIN Rooms r ON r.Id = a.RoomId
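To try the join chain described above on the sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for SQL Server purely for convenience, the room names are invented, and the sketch says nothing about how the query performs on 340k rows; it only checks that the result matches the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (RoomId INTEGER, StartAnimal TEXT, EndAnimal TEXT, GroupType TEXT);
    CREATE TABLE TableB (Animal TEXT, Sequence INTEGER, Type TEXT);
    CREATE TABLE Rooms  (Id INTEGER PRIMARY KEY, Name TEXT);

    INSERT INTO TableA VALUES
        (1, 'Monkey', 'Bee', 'A'), (1, 'Lion', 'Buffalo', 'A'), (2, 'Ant', 'Frog', 'B');
    INSERT INTO TableB VALUES
        ('Monkey', 1, 'A'), ('Zebra', 2, 'A'), ('Bee', 3, 'A'), ('Turtle', 4, 'A'),
        ('Lion', 5, 'A'), ('Buffalo', 6, 'A'), ('Ant', 1, 'B'), ('Frog', 2, 'B');
    -- Room names are invented for this sketch.
    INSERT INTO Rooms VALUES (1, 'Room 1'), (2, 'Room 2');
""")

# b1 and b2 resolve the start and end animals to their sequence numbers;
# b then picks every animal of the same type whose sequence is in that range.
rows = conn.execute("""
    SELECT r.Id, b.Animal
    FROM TableA a
    INNER JOIN TableB b1 ON b1.Animal = a.StartAnimal AND b1.Type = a.GroupType
    INNER JOIN TableB b2 ON b2.Animal = a.EndAnimal   AND b2.Type = a.GroupType
    INNER JOIN TableB b  ON b.Type = a.GroupType
                        AND b.Sequence BETWEEN b1.Sequence AND b2.Sequence
    INNER JOIN Rooms r   ON r.Id = a.RoomId
    ORDER BY r.Id, b.Sequence
""").fetchall()

for room_id, animal in rows:
    print(room_id, animal)
```

Turtle (sequence 4) falls outside both ranges of room 1, so it does not appear in the result.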

Related

SQL/Presto: right join bigger than original table due to NULL

I need to right join 2 tables with 3 conditions but the resulting table is bigger than left or right table.
left_table a is like the following:
capacity value group_id level_id tags
100      3     a        ab       NULL
120      5     a        afb      lala
122      4     b        afg      hhh
122      6     c        adfg     NULL
The right table b (larger than the left table) looks like this:
user group_id level_id tags
adsf a        ab       NULL
af   a        abf      df
sf   a        afb      lala
dsf  b        afg      hhh
sdf  c        adfg     NULL
I want to append the capacity and value columns to the right table b. I have used the following query, but the resulting table is larger than the right table. I noticed that it is due to the NULL in tags in both the right and left tables, but I am wondering how to resolve this issue.
select a.capacity, a.value, b.*
from a
right join b
on a.group_id = b.group_id
and a.level_id = b.level_id
and a.tags = b.tags

I noticed that it is due to the NULL in tags in both the right and left tables
No, this is not the cause of duplicates. In fact NULL values fail the comparison, so you will not get a match at all if either value is NULL. That is, the row in b will be returned with NULL values for the columns from a.
If you want NULL values to match as being equal, then you need a NULL-safe comparison, and Presto supports the SQL-standard IS NOT DISTINCT FROM operator. I also strongly prefer left join over right join:
select a.capacity, a.value, b.*
from b left join
a
on a.group_id = b.group_id and
a.level_id = b.level_id and
a.tags is not distinct from b.tags;
If you are getting duplicates, it is because you have duplicates in a. You can check for this using:
select group_id, level_id, tags, count(*)
from a
group by group_id, level_id, tags
having count(*) >= 2;
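The difference between a plain and a NULL-safe comparison can be seen in a small sketch. It uses SQLite through Python's sqlite3 module rather than Presto, and SQLite's IS operator plays the role of IS NOT DISTINCT FROM here; that substitution, and the two-row subset of the sample data, are assumptions made only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (capacity INTEGER, value INTEGER, group_id TEXT, level_id TEXT, tags TEXT);
    CREATE TABLE b (user TEXT, group_id TEXT, level_id TEXT, tags TEXT);
    INSERT INTO a VALUES (100, 3, 'a', 'ab', NULL), (120, 5, 'a', 'afb', 'lala');
    INSERT INTO b VALUES ('adsf', 'a', 'ab', NULL), ('sf', 'a', 'afb', 'lala');
""")

def join_on_tags(op):
    # op is "=" (a NULL never matches) or "IS" (SQLite's NULL-safe equality).
    sql = f"""
        SELECT b.user, a.capacity
        FROM b LEFT JOIN a
          ON a.group_id = b.group_id
         AND a.level_id = b.level_id
         AND a.tags {op} b.tags
        ORDER BY b.user
    """
    return conn.execute(sql).fetchall()

plain = join_on_tags("=")       # the row with NULL tags gets no capacity
null_safe = join_on_tags("IS")  # the row with NULL tags is matched

# Duplicate check on a, as in the last query above.
dupes = conn.execute("""
    SELECT group_id, level_id, tags, COUNT(*)
    FROM a
    GROUP BY group_id, level_id, tags
    HAVING COUNT(*) >= 2
""").fetchall()

print(plain, null_safe, dupes)
```

In both variants the result has exactly as many rows as b, because a contains no duplicate keys; extra rows would only appear if the duplicate check returned something.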

SQL Group rows in left join into one

I am trying to write a query with a left join that combines multiple rows into one. I tried the GROUP_CONCAT function, but when I use it my database server goes down. I use MariaDB 10.3.17. I have tables like:
Games:
game_id game_name
1 Test
2 Stack
3 Other
data_developers:
dev_id dev_name
1 Electronic Arts
2 BioWare
3 2K Games
game_developers
developer_id game_id
1 1
2 1
2 3
Result I want:
game_id game_name devs
1 Test Electronic Arts, BioWare
2 Stack 2K Games
My two queries (neither worked):
SELECT games.*, GROUP_CONCAT(data_developers.dev_name)
FROM games
LEFT JOIN game_developers ON game_developers.game_id = games.game_id
LEFT JOIN data_developers ON data_developers.dev_id = game_developers.dev_id
LIMIT 500
and second query
SELECT games.*
FROM games
LEFT JOIN game_developers ON game_developers.game_id = games.game_id
LEFT JOIN
(SELECT GROUP_CONCAT(data_developers.developer_name) as developers,
data_developers.developer_id FROM data_developers) x
ON x.developer_id = game_developers.developer_id
But of course, that doesn't work either :(

Your query should look something like this:
SELECT A.game_id,B.Game_name,GROUP_CONCAT(C.dev_name)
FROM game_developers A
INNER JOIN Games B ON A.game_id = B.game_id
INNER JOIN data_developers C ON A.developer_id = C.dev_id
GROUP BY A.game_id,B.Game_name
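As a quick check of the GROUP BY plus GROUP_CONCAT pattern, the sketch below loads the sample tables into an in-memory SQLite database via Python's sqlite3 module (SQLite instead of MariaDB is an assumption made for runnability; both provide GROUP_CONCAT). It is a variation on the query above: it starts from games and uses LEFT JOIN, so games without any developer are kept with a NULL list instead of being dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_id INTEGER PRIMARY KEY, game_name TEXT);
    CREATE TABLE data_developers (dev_id INTEGER PRIMARY KEY, dev_name TEXT);
    CREATE TABLE game_developers (developer_id INTEGER, game_id INTEGER);
    INSERT INTO games VALUES (1, 'Test'), (2, 'Stack'), (3, 'Other');
    INSERT INTO data_developers VALUES (1, 'Electronic Arts'), (2, 'BioWare'), (3, '2K Games');
    INSERT INTO game_developers VALUES (1, 1), (2, 1), (2, 3);
""")

# One output row per game; the developer names of each group are concatenated.
rows = conn.execute("""
    SELECT g.game_id, g.game_name, GROUP_CONCAT(d.dev_name, ', ') AS devs
    FROM games g
    LEFT JOIN game_developers gd ON gd.game_id = g.game_id
    LEFT JOIN data_developers d  ON d.dev_id = gd.developer_id
    GROUP BY g.game_id, g.game_name
    ORDER BY g.game_id
""").fetchall()

for row in rows:
    print(row)
```

The order of names inside each concatenated list is not guaranteed here, so code that depends on it should sort explicitly.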

pad database out with NULL criteria

If I have the following sample table (order by ID)
ID Date Type
-- ---- ----
1 01/01/2000 A
2 22/04/1995 A
2 14/02/2001 B
Here you can immediately see that ID=1 does not have a Type=B, but ID=2 does. What I want to do is fill in a line to show this:
ID Date Type
-- ---- ----
1 01/01/2000 A
1 NULL B
2 22/04/1995 A
2 14/02/2001 B
where there could potentially be hundreds of different types (so I may end up inserting hundreds of rows per person if they lack hundreds of types!)
Is there a general solution to do this?
Could I possibly outer join the table on itself and do it that way?

You can do this with a cross join to generate all the rows and a left join to get the actual data values:
select i.id, s.date, t.type
from (select distinct id from sample) i cross join
(select distinct type from sample) t left join
sample s
on s.id = i.id and
s.type = t.type;
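The cross join plus left join approach can be run against the three sample rows. This is a minimal sketch using SQLite through Python's sqlite3 module; the engine choice and the TEXT storage of the dates are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (id INTEGER, date TEXT, type TEXT);
    INSERT INTO sample VALUES
        (1, '01/01/2000', 'A'), (2, '22/04/1995', 'A'), (2, '14/02/2001', 'B');
""")

# The cross join builds every (id, type) combination; the left join then
# attaches the real row where one exists and leaves date NULL otherwise.
rows = conn.execute("""
    SELECT i.id, s.date, t.type
    FROM (SELECT DISTINCT id FROM sample) i
    CROSS JOIN (SELECT DISTINCT type FROM sample) t
    LEFT JOIN sample s ON s.id = i.id AND s.type = t.type
    ORDER BY i.id, t.type
""").fetchall()

for row in rows:
    print(row)
```

The missing (1, B) combination shows up with a NULL date, which Python prints as None.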

SQL query construction: checking if query result is subset of another

Hi guys, I have a (legacy) table relationship that works like this:
A has many B and B has many C;
A has many C as well
I am having trouble coming up with a SQL query that returns all Bs (just the Id of B, to keep it simple) mapped to a certain A (by Id), plus any B whose collection of Cs is a subset of the Cs of that A.
I have failed to come up with decent SQL, especially for the second part, and was wondering if I can get any tips or suggestions on how to do that.
Thanks
EDIT:
Table A
Id |..
------------
1 |..
Table B
Id |..
--------------
2 |..
Table A_B_rel
A_id | B_id
-----------------
1 | 2
C is a strange table. The data of C (a single column) is actually just duplicated in two relation tables, one for A and one for B, so it looks like this:
Table B_C_Table
B_Id| C_Value
-----------------
2 | 'Somevalue'
Table A_C_Table
A_Id| C_Value
-------------
1 | 'SomeValue'
So I am looking for the Bs whose C_Values are a subset of a certain A's C_Values.

Yes, the second part of your problem is a bit tricky. We've got B_C_Table on the one hand, and a subset of A_C_Table where A_ID is a specific ID, on the other.
Now, if we use an outer join, we'll be able to see which rows in B_C_Table have no match in A_C_Table:
SELECT *
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
Note that it is important to put ac.A_ID = #A_ID into the ON clause rather than into WHERE: in WHERE, the condition would also discard the B_C_Table rows that found no match (their ac.A_ID is NULL), and those are exactly the rows we need to detect.
The next step (to achieving the final query) would be to group rows by B and count rows. Now, we will calculate both the total number of rows and the number of matching rows.
SELECT
bc.B_ID,
COUNT(*) AS TotalCount,
COUNT(ac.A_ID) AS MatchCount
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
GROUP BY bc.B_ID
As you can see, to count matches, we simply count ac.A_ID values: in case of no match the corresponding column will be NULL and thus not counted. And if indeed some rows in B_C_Table do not match any rows in the subset of A_C_Table, we will see different values of TotalCount and MatchCount.
And that logically leads us towards the final step: comparing those counts. (For, obviously, if we can obtain values, we can also compare them.) But not in the WHERE clause, of course, because aggregate functions aren't allowed in WHERE. It's the HAVING clause that is used to compare values of grouped rows, including aggregated values too. So...
SELECT
bc.B_ID,
COUNT(*) AS TotalCount,
COUNT(ac.A_ID) AS MatchCount
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
GROUP BY bc.B_ID
HAVING COUNT(*) = COUNT(ac.A_ID)
The count values aren't really needed, of course, and when you drop them you will be able to UNION the above query with the one selecting B_ID from A_B_rel:
SELECT B_ID
FROM A_B_rel
WHERE A_ID = #A_ID
UNION
SELECT bc.B_ID
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
GROUP BY bc.B_ID
HAVING COUNT(*) = COUNT(ac.A_ID)
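The counting steps above can be checked on a small data set. The sketch below uses SQLite via Python's sqlite3 module; the rows, the B ids 10 to 12 and the named parameter :a_id (standing in for #A_ID) are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A_B_rel   (A_Id INTEGER, B_Id INTEGER);
    CREATE TABLE A_C_Table (A_Id INTEGER, C_Value TEXT);
    CREATE TABLE B_C_Table (B_Id INTEGER, C_Value TEXT);
    -- Hypothetical data: A 1 owns x, y, z; B 10 is a subset, B 11 and B 12 are not.
    INSERT INTO A_B_rel   VALUES (1, 12);
    INSERT INTO A_C_Table VALUES (1, 'x'), (1, 'y'), (1, 'z'), (2, 'w');
    INSERT INTO B_C_Table VALUES (10, 'x'), (10, 'y'), (11, 'x'), (11, 'q'), (12, 'w');
""")
params = {"a_id": 1}

# Step 2: total rows versus matching rows per B.
counts = conn.execute("""
    SELECT bc.B_Id, COUNT(*) AS TotalCount, COUNT(ac.A_Id) AS MatchCount
    FROM B_C_Table bc
    LEFT JOIN A_C_Table ac ON ac.C_Value = bc.C_Value AND ac.A_Id = :a_id
    GROUP BY bc.B_Id
    ORDER BY bc.B_Id
""", params).fetchall()

# Final step: Bs mapped directly to the A, plus Bs whose counts are equal.
final = sorted(r[0] for r in conn.execute("""
    SELECT B_Id FROM A_B_rel WHERE A_Id = :a_id
    UNION
    SELECT bc.B_Id
    FROM B_C_Table bc
    LEFT JOIN A_C_Table ac ON ac.C_Value = bc.C_Value AND ac.A_Id = :a_id
    GROUP BY bc.B_Id
    HAVING COUNT(*) = COUNT(ac.A_Id)
""", params))

print(counts, final)
```

B 12 is not a subset of A 1, but it still appears in the final list because A_B_rel maps it to A 1 directly.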

Sounds like you need to think in terms of double negation, i.e. there should not exist any B_C that does not have a matching A_C (and I'm guessing there should be at least one B_C).
So, try something like
select B.B_id
from Table_B B
where exists (select 1 from B_C_Table BC
              where BC.B_id = B.B_id)
  and not exists (select 1 from B_C_Table BC
                  where BC.B_id = B.B_id
                    -- the inner lookup must read A's values, i.e. A_C_Table
                    and not exists (select 1 from A_C_Table AC
                                    join A_B_Rel ABR on AC.A_id = ABR.A_id
                                    where ABR.B_id = B.B_id
                                      and BC.C_Value = AC.C_Value))
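The double negation idea can be tried in isolation. The following sketch is a simplified variant, not the query above: it fixes the A by a parameter (:a_id, a hypothetical name) instead of going through A_B_Rel, and it selects from B_C_Table so that only Bs with at least one C are considered. The data is invented and SQLite via Python's sqlite3 module is used for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A_C_Table (A_Id INTEGER, C_Value TEXT);
    CREATE TABLE B_C_Table (B_Id INTEGER, C_Value TEXT);
    -- Hypothetical data: A 1 owns x, y, z.
    INSERT INTO A_C_Table VALUES (1, 'x'), (1, 'y'), (1, 'z'), (2, 'w');
    INSERT INTO B_C_Table VALUES (10, 'x'), (10, 'y'), (11, 'x'), (11, 'q'), (12, 'w');
""")

# "There is no C of this B that is missing from the chosen A."
subset_bs = [r[0] for r in conn.execute("""
    SELECT DISTINCT bc.B_Id
    FROM B_C_Table bc
    WHERE NOT EXISTS (
        SELECT 1
        FROM B_C_Table bc2
        WHERE bc2.B_Id = bc.B_Id
          AND NOT EXISTS (
              SELECT 1
              FROM A_C_Table ac
              WHERE ac.A_Id = :a_id
                AND ac.C_Value = bc2.C_Value))
    ORDER BY bc.B_Id
""", {"a_id": 1})]

print(subset_bs)
```

Only B 10 qualifies: B 11 has 'q' and B 12 has 'w', neither of which belongs to A 1.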

Perhaps this is what you're looking for:
SELECT B_id
FROM A_B_rel
WHERE A_id = <A ID>
UNION
SELECT a.B_Id
FROM B_C_Table a
LEFT JOIN A_C_Table b ON a.C_Value = b.C_Value AND b.A_Id = <A ID>
GROUP BY a.B_Id
HAVING COUNT(CASE WHEN b.A_Id IS NULL THEN 1 END) = 0
The first SELECT gets all B's which are mapped to a particular A (<A ID> being the input parameter for the A ID), then we tack onto that result set any additional B's whose entire set of C_Value's are within the subset of the C_Value's of the particular A (again, <A ID> being the input parameter).

Left Outer Join with one result per match and "priority" of match set by a different field in match SQL Server 2005

I am trying to get a single result (PHONE) per match (CONTACT_ID) in a Left Outer Join. I imagine that there is a way to accomplish this with the preference (or order) being set by another column/field- the phone type (TYPE), but I haven't been able to figure it out. Below is a list of facts to help better explain what I am trying to accomplish and then following is an example Table A and B with the desired result. I've looked at min() and group by, but I don't know how to make those work here. As a side note, after this is working, I will be joining it to more tables to the left of it in a simpler fashion.
The student can have an unlimited number of CONTACT_ID.
A contact does not always have all phone types.
The preferred order of phone types (TYPE) is C,H,W (which, fortunately, happens to be alphabetical)
If PHONE is null, ignore the match and go to the next type in priority order.
TableA:
STUDENT_ID CONTACT_ID
---------- ----------
X 1
X 2
Y 3
Y 4
TableB:
CONTACT_ID TYPE PHONE
---------- ---- -----
1 H 21
1 C
1 W 44
2 H 78
2 C 92
2 W 11
Desired Result:
STUDENT_ID CONTACT_ID TYPE PHONE
---------- ---------- ---- -----
X 1 H 21
X 2 C 92
Y 3
Y 4
Here is the query that I have that will make a join with all phone matches (minus all of my crazy attempts at getting what I want).
SELECT *
FROM TableA T1
LEFT OUTER JOIN TableB T2 ON T1.CONTACT_ID = T2.CONTACT_ID
All help greatly appreciated!
Edited code from Stefan Onofrei's solution:
(results in some duplicate entries)
SELECT
T1.STUDENT_ID,
T1.CONTACT_ID,
T2.PHONE_TYPE,
T3.PHONE
FROM REG_STU_CONTACT T1
INNER JOIN
(SELECT MIN(PHONE_TYPE) AS PHONE_TYPE, CONTACT_ID
FROM REG_CONTACT_PHONE
WHERE PHONE IS NOT NULL
GROUP BY CONTACT_ID) T2 ON T1.CONTACT_ID = T2.CONTACT_ID
INNER JOIN REG_CONTACT_PHONE T3 ON T2.CONTACT_ID = T3.CONTACT_ID AND T2.PHONE_TYPE = T3.PHONE_TYPE
ORDER BY T1.STUDENT_ID

SELECT A.STUDENT_ID, A.CONTACT_ID, B.TYPE, C.PHONE
FROM TableA A
INNER JOIN
    -- first type in alphabetical (= priority) order that has a phone
    (SELECT MIN(TYPE) AS TYPE, CONTACT_ID
     FROM TableB
     WHERE PHONE IS NOT NULL
     GROUP BY CONTACT_ID) B
    ON A.CONTACT_ID = B.CONTACT_ID
INNER JOIN TableB C
    ON B.CONTACT_ID = C.CONTACT_ID AND B.TYPE = C.TYPE
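Because the query above uses inner joins, contacts without any phone (3 and 4 in the sample) drop out of the result. A possible variation, sketched here with LEFT JOIN so that those contacts are kept, can be checked against the desired result from the question. SQLite via Python's sqlite3 module replaces SQL Server 2005 for runnability, phones are stored as text, and the alias best is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (STUDENT_ID TEXT, CONTACT_ID INTEGER);
    CREATE TABLE TableB (CONTACT_ID INTEGER, TYPE TEXT, PHONE TEXT);
    INSERT INTO TableA VALUES ('X', 1), ('X', 2), ('Y', 3), ('Y', 4);
    INSERT INTO TableB VALUES
        (1, 'H', '21'), (1, 'C', NULL), (1, 'W', '44'),
        (2, 'H', '78'), (2, 'C', '92'), (2, 'W', '11');
""")

# best: per contact, the alphabetically first type that has a phone.
# Both joins are LEFT joins so contacts with no phone at all are kept.
rows = conn.execute("""
    SELECT a.STUDENT_ID, a.CONTACT_ID, c.TYPE, c.PHONE
    FROM TableA a
    LEFT JOIN (SELECT CONTACT_ID, MIN(TYPE) AS TYPE
               FROM TableB
               WHERE PHONE IS NOT NULL
               GROUP BY CONTACT_ID) best
           ON best.CONTACT_ID = a.CONTACT_ID
    LEFT JOIN TableB c
           ON c.CONTACT_ID = best.CONTACT_ID AND c.TYPE = best.TYPE
    ORDER BY a.STUDENT_ID, a.CONTACT_ID
""").fetchall()

for row in rows:
    print(row)
```

Contact 1 falls back from C to H because its C phone is NULL. If TableB held several rows for the same contact and type, this join would still return duplicates, which may be what the edited query in the question ran into.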
