Problem with JOINs on selection of related data - sql

I have three entity types (let's call them A, B, C) persisted on three tables of my database.
Each entity type has a relation with the other two entity types. Relations are persisted in three tables of the DB as well (let's call them AB, AC, BC), where every record is a pair of IDs of the respective entities.
Relations A-B are one-to-many and mandatory: every A has at least one relation with a B, and every B has a relation with an A.
Relations A-C and B-C are many-to-many and optional: there can be As without relations with Cs, and there can be Bs without relations with Cs.
I cannot change this schema.
I need to build a table with all the As and their related data. Every row of the table must contain only related data or NULLs where there are no relations.
I thought I would be fine with something like:
SELECT * -- let's omit columns for simplicity
FROM AB
LEFT JOIN BC ON BC.IdB = AB.IdB
LEFT JOIN AC ON AC.IdA = AB.IdA AND
AC.IdC = BC.IdC
INNER JOIN A ON A.Id = AB.IdA
INNER JOIN B ON B.Id = AB.IdB
LEFT JOIN C ON C.Id = AC.IdC
and then filtering with a WHERE clause. My problem is I don't get how, which makes me think I am approaching the problem in the wrong way.
Any hint would be appreciated, thank you in advance.
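One possible reading of the requirement is "for each A-B pair, list the Cs related to both the A and the B, or a single NULL row if there are none". Under that assumption, a minimal sketch is to inner join AC and BC in a derived table first, and only then LEFT JOIN that result to AB. The sample data, the derived-table alias ABC, and the use of SQLite through Python are illustrative assumptions, not part of the original schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (Id INTEGER PRIMARY KEY);
CREATE TABLE B (Id INTEGER PRIMARY KEY);
CREATE TABLE C (Id INTEGER PRIMARY KEY);
CREATE TABLE AB (IdA INTEGER, IdB INTEGER);
CREATE TABLE AC (IdA INTEGER, IdC INTEGER);
CREATE TABLE BC (IdB INTEGER, IdC INTEGER);
INSERT INTO A VALUES (1), (2);
INSERT INTO B VALUES (10), (20);
INSERT INTO C VALUES (100), (200);
INSERT INTO AB VALUES (1, 10), (2, 20);
INSERT INTO AC VALUES (1, 100);             -- A1 is related to C100 only
INSERT INTO BC VALUES (10, 100), (10, 200); -- B10 is related to C100 and C200
""")

# Pair up AC and BC first, so only Cs shared by the A and the B survive,
# then attach that (possibly empty) set to every A-B pair.
rows = con.execute("""
SELECT A.Id, B.Id, C.Id
FROM AB
INNER JOIN A ON A.Id = AB.IdA
INNER JOIN B ON B.Id = AB.IdB
LEFT JOIN (
    SELECT AC.IdA, BC.IdB, AC.IdC
    FROM AC
    INNER JOIN BC ON BC.IdC = AC.IdC
) ABC ON ABC.IdA = AB.IdA AND ABC.IdB = AB.IdB
LEFT JOIN C ON C.Id = ABC.IdC
ORDER BY A.Id, B.Id, C.Id
""").fetchall()

print(rows)
```

With this data the chained LEFT JOINs from the question would also return a row for pair (1, 10) with a NULL C, because B10's relation to C200 has no matching AC row. Joining AC and BC before the outer join avoids that extra row without a WHERE filter.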

Related

IMPALA SQL (pivoting table)

I have an Impala SQL task; can someone help me solve it, please?
There is a table with three columns A B and C.
There are three types of values: child in column A; child/parent in column A and C; and owner in column B.
If a value appears in column C, it is the parent of the value in column A of the same row, and it also appears in column A (as a child).
There are multiple levels of children and parents (a child of one value can be the parent of another).
All child and parent values live together in column A.
Owners live only in column B.
A child-parent chain ends when an owner appears (in column B).
I need to pivot existing table in a way that reveals every chain in a separate row
Child, parent-child 1, parent-child 2, parent-child 3, Owner.
see pic_1
I tried to solve this problem with LEFT JOINS
SELECT a.A, a.B, a.C, b.A, b.B, b.C, c.A, c.B, c.C
FROM
table1 a
LEFT JOIN
table1 b
ON a.C = b.A
LEFT JOIN
table1 c
ON b.C = c.A
It works, but it is a bad solution because we don't know how many 'parent-child' levels there can be. Some chains consist of only three parent/child values, and some have 50 'chain rings' before the owner.
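Where the engine supports recursive common table expressions, a chain of unknown depth can be walked without one join per level. The sketch below runs on SQLite through Python purely for illustration; whether your Impala version accepts WITH RECURSIVE needs to be checked. The sample rows, the CTE name chain, and the choice to return each chain as a path string (rather than one column per level) are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (A TEXT, B TEXT, C TEXT);
INSERT INTO table1 VALUES
    ('x', NULL, 'y'),       -- x is a child of y
    ('y', NULL, 'z'),       -- y is a child of z
    ('z', 'Owner1', NULL),  -- z has no parent; the chain ends at its owner
    ('p', NULL, 'z');       -- p is another child of z
""")

chains = con.execute("""
WITH RECURSIVE chain(leaf, cur, path) AS (
    -- start from values that are never a parent (pure children)
    SELECT t.A, t.A, t.A
    FROM table1 t
    WHERE t.A NOT IN (SELECT C FROM table1 WHERE C IS NOT NULL)
    UNION ALL
    -- climb one level: the current value's parent becomes the new current value
    SELECT ch.leaf, t.C, ch.path || ' > ' || t.C
    FROM chain ch
    JOIN table1 t ON t.A = ch.cur
    WHERE t.C IS NOT NULL
)
-- keep only the step where the owner appears, i.e. the top of each chain
SELECT ch.leaf, ch.path, t.B
FROM chain ch
JOIN table1 t ON t.A = ch.cur
WHERE t.B IS NOT NULL
ORDER BY ch.leaf
""").fetchall()

print(chains)
```

Each result row holds the starting child, the full chain as text, and the owner. Splitting the path into separate columns would still need a fixed maximum number of columns, since SQL result sets cannot have a variable column count.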

How to Map rows of same table based on child table values?

I need to write SQL that generates a table or view that contains a mapping of one entity to another entity, where the entities belong to the same table, and the mapping conditions are based on their children's values. Say I have the following schema:
I need to be able to create a mapping between two different Entity IDs (call them A and B) based on the following conditions:
The Category.Name value of the Category linked to the Person linked to Entity A must have the same Category.Name value of the Category linked to the Person linked to Entity B ("The validation names of each entity must match")
The Verification linked to Entity A must have the same VerificationValue as the Verification linked to Entity B ("The Verification values of the two entities must match")
The VerificationType.Name linked to the verification that is linked to Entity A must be the same as the VerificationType.Name that is linked to the Verification that is linked to Entity B ("The verification types of each entity must match")
The end result would be something like this as a TABLE or VIEW:
entity_ID_A | entity_ID_B
--------------------------
1 2
3 4
11 10
Assume for simplicity's sake that we cannot have more than one value mapped to entity_ID_A
In code, this could be expressed simply as:
Entity a, b = //do some stuff to get the two different entities
return a.person.category.name == b.person.category.name &&
a.verification.verificationValue == b.verification.verificationValue &&
a.verification.verificationType.name == b.verification.verificationType.name;
I'm not even sure where to begin expressing this in SQL, much less generate a table of mappings of entity IDs that meet these criteria. Do I join all of the tables together before doing any comparisons? Any pointers in the right direction would be appreciated.
Yes. Join all the tables first, then self-join the result:
with ents as (
select e.id, c.name cname, v.verificationvalue vval, vt.name vname
from entity e
join person p on e.person_id = p.id
join category c on c.id = p.category_id
join verification v on v.id = e.verification_id
join verificationtype vt on vt.id = v.verificationtype_id)
select a.id id_a, b.id id_b
from ents a
join ents b on a.id < b.id
and a.cname = b.cname
and a.vval = b.vval
and a.vname = b.vname
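To see the query above run end to end, here is a minimal sketch on SQLite through Python. The question's schema diagram is not available, so the table layouts and foreign-key column names are assumptions inferred from the joins in the query, and the sample rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Assumed layout, inferred from the join conditions in the answer
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE verificationtype (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE verification (id INTEGER PRIMARY KEY, verificationvalue TEXT,
                           verificationtype_id INTEGER);
CREATE TABLE entity (id INTEGER PRIMARY KEY, person_id INTEGER,
                     verification_id INTEGER);

INSERT INTO category VALUES (1, 'gold'), (2, 'silver');
INSERT INTO person VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO verificationtype VALUES (1, 'passport'), (2, 'email');
INSERT INTO verification VALUES (1, 'X123', 1), (2, 'X123', 1), (3, 'X123', 2);
INSERT INTO entity VALUES (1, 1, 1), (2, 2, 2), (3, 1, 3), (4, 3, 1);
""")

pairs = con.execute("""
WITH ents AS (
    SELECT e.id, c.name AS cname, v.verificationvalue AS vval, vt.name AS vname
    FROM entity e
    JOIN person p ON e.person_id = p.id
    JOIN category c ON c.id = p.category_id
    JOIN verification v ON v.id = e.verification_id
    JOIN verificationtype vt ON vt.id = v.verificationtype_id
)
SELECT a.id AS id_a, b.id AS id_b
FROM ents a
JOIN ents b ON a.id < b.id       -- each unordered pair once, no self-pairs
           AND a.cname = b.cname
           AND a.vval = b.vval
           AND a.vname = b.vname
ORDER BY id_a, id_b
""").fetchall()

print(pairs)
```

Only entities 1 and 2 agree on all three attributes here: entity 3 differs in verification type and entity 4 in category. The `a.id < b.id` condition is what keeps the mapping from listing each pair twice.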

Joining, but not joining (hypothetical q.)

Let's suppose that I have a table A with a couple of columns. I work with tables that have no index on the entries, since they are 'historical' tables. I do use one specific column, though, to identify my things. Let's call this ID.
If you'd make a query like the one below, sometimes you'd get one line back, other cases a few.
SELECT * FROM A WHERE ID = '<something>'
Let's say I have two more tables, B and C. Both have ID columns, like A.
Also, some of the IDs in A are also in B OR C. IDs existing in B CANNOT exist in C, and ALL IDs in A EXIST in either B OR C.
B and C contain extra information, which I'd like to join to A in the same SELECT.
My problem is that they should only provide extra information. I do not want extra lines in my output.
To make it clearer: my selection from A returns a hundred lines of output.
When I left/right/inner join table B, I will probably have fewer lines of output. Same thing with joining C.
AND FINALLY, my question is:
Is there a way to join table B only on those IDs which exist in B, and vice versa? (And I would want it in the same SELECT statement.)
If you don't want extra lines in your output, you could do something like this:
select *
from A join
(select B.*
from B
group by B.id
) B
on A.id = B.id;
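This would choose arbitrary values from B for each id and join them back to A. Note that selecting non-grouped columns like this is only accepted by some engines (for example MySQL with ONLY_FULL_GROUP_BY disabled); elsewhere each column needs an aggregate such as MAX. Is this what you want?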
Well it seems like you should build some left join between A and two "Select MAX"s: one from table B, the other one from table C.
And if you do not want 'duplicate' IDs from table A, a 'group by' on table A should help.
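The "LEFT JOIN to two SELECT MAX subqueries" idea can be sketched as follows, run on SQLite through Python for illustration. The column names payload and info and the sample rows are assumptions. Each lookup table is collapsed to one row per ID before joining, so the join cannot multiply rows from A, and COALESCE picks whichever of B or C has the ID.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (id TEXT, payload TEXT);
CREATE TABLE B (id TEXT, info TEXT);
CREATE TABLE C (id TEXT, info TEXT);
INSERT INTO A VALUES ('x', 'a1'), ('x', 'a2'), ('y', 'a3');
INSERT INTO B VALUES ('x', 'b1'), ('x', 'b2');  -- duplicate ids in B
INSERT INTO C VALUES ('y', 'c1');
""")

rows = con.execute("""
SELECT a.id, a.payload, COALESCE(b.info, c.info) AS extra_info
FROM A a
LEFT JOIN (SELECT id, MAX(info) AS info FROM B GROUP BY id) b ON b.id = a.id
LEFT JOIN (SELECT id, MAX(info) AS info FROM C GROUP BY id) c ON c.id = a.id
ORDER BY a.id, a.payload
""").fetchall()

print(rows)
```

The output has exactly as many rows as A. MAX is an arbitrary but deterministic way to pick one value per ID; if a specific row from B or C is wanted, the subqueries would need a different rule.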

SQL query - Joining a many-to-many relationship, filtering/joining selectively

I find myself in a bit of an unworkable situation with a SQL query and I'm hoping that I'm missing something or might learn something new. The structure of the DB2 database I'm working with isn't exactly built for this sort of query, but I'm tasked with this...
Let's say we have Table People and Table Groups. Groups can contain multiple people, and one person can be part of multiple groups. Yeah, it's already messy. In any case, there are a couple of intermediary tables linking the two. The problem is that I need to start with a list of groups, get all of the people in those groups, and then get all of the groups with which the people are affiliated, which would be a superset of the initial group set. This would mean starting with groups, joining down to the people, and then going BACK and joining to the groups again. I need information from both tables in the result set, too, so that rules out a number of techniques.
I have to join this with a number of other tables for additional information and the query is getting enormous, cumbersome, and slow. I'm wondering if there's some way that I could start with People, join it to Groups, and then specify that if a person has one group that is in the supplied set of groups (which is done via a subquery), then ALL groups for that person should be returned. I don't know of a way to make this happen, but I'm thinking (hoping) that there's a relatively clean way to make this happen in SQL.
A quick and dirty example:
SELECT ...
FROM GROUPS g
JOIN LINKING_A a
ON g.GROUPID = a.GROUPID
AND g.GROUPID IN (subquery)
JOIN LINKING_B b
ON a.GROUPLIST = b.GROUPLIST
JOIN PEOPLE p
ON b.PERSONID = p.PERSONID
--This gets me all people affiliated with groups,
-- but now I need all groups affiliated with those people...
JOIN LINKING_B b2
ON p.PERSONID = b2.PERSONID
JOIN LINKING_A a2
ON b2.GROUPLIST = a2.GROUPLIST
JOIN GROUPS g2
ON a2.GROUPID = g2.GROUPID
And then I can return information from p and g2 in the result set. You can see where I'm having trouble. That's a lot of joining on some large tables, not to mention a number of other joins that are performed in this query as well. I need to be able to query by joining PEOPLE to GROUPS, then specify that if any person has an associated group that is in the subquery, it should return ALL groups affiliated with that entry in PEOPLE. I'm thinking that GROUP BY might be just the thing, but I haven't used that one enough to really know. So if Bill is part of group A, B, and C, and our subquery returns a set containing Group A, the result set should include Bill along with groups A, B, and C.
The following is a shorter way to get all the groups that people in the supplied group list are in. Does this help?
Select g.*
From Linking_B b
Join Linking_B b2
On b2.PersonId = b.PersonId
Join Groups g
On g.GroupId = b2.GroupId
Where b.Groupid in (SubQuery)
I'm not clear why you have both Linking_A and Linking_B. Generally all you should need to represent a many-to-many relationship between two master tables is a single association table with GroupID and PersonId.
I often recommend using "common table expressions" [CTE's] in order to help you break a problem up into chunks that can be easier to understand. CTE's are specified using a WITH clause, which can contain several CTE's before starting the main SELECT query.
I'm going to assume that the list of groups you want to start with is specified by your subquery, so that will be the 1st CTE. The next one selects people who belong to those groups. The final part of the query then selects groups those people belong to, and returns the columns from both master tables.
WITH g1 as
(subquery)
, p1 as
(SELECT p.*
from g1
join Linking a1 on g1.groupID=a1.groupID
join People p on p.personID=a1.personID )
SELECT p1.*, g2.*
from p1
join Linking a2 on p1.personID=a2.personID
join Groups g2 on g2.groupID=a2.groupID
I think I'd build the list of people you want to pull records for first, then use that to query out all the groups for those people. This will work across any number of link tables with the appropriate joins added:
with persons_wanted as
(
--figure out which people are in a group you want to include
select p.person_key
from person p
join link l1
on p.person_key = l1.person_key
join groups g
on l1.group_key = g.group_key
where g.name in ('GROUP_I_WANT_PEOPLE_FROM', 'THIS_ONE_TOO')
group by p.person_key --we only want each person_key once
)
--now pull all the groups for the list of people in at least one group we want
select p.name as person_name, g.name as group_name, ...
from person p
join link l1
on p.person_key = l1.person_key
join groups g
on l1.group_key = g.group_key
where p.person_key in (select person_key from persons_wanted);
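A runnable version of the persons_wanted approach, on SQLite through Python, may make the behaviour easier to check. It assumes a single link table with person_key and group_key columns, as in the answer above; the sample people and groups are invented, and the literal group list stands in for the question's subquery.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (person_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE groups (group_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE link (person_key INTEGER, group_key INTEGER);
INSERT INTO person VALUES (1, 'Bill'), (2, 'Ann'), (3, 'Cy');
INSERT INTO groups VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO link VALUES (1, 1), (1, 2), (1, 3),  -- Bill: A, B, C
                        (2, 2),                  -- Ann: B only
                        (3, 1);                  -- Cy: A only
""")

rows = con.execute("""
WITH persons_wanted AS (
    -- people who are in at least one of the groups we start from
    SELECT l.person_key
    FROM link l
    JOIN groups g ON g.group_key = l.group_key
    WHERE g.name IN ('A')
    GROUP BY l.person_key
)
-- all groups for those people, including groups outside the starting set
SELECT p.name AS person_name, g.name AS group_name
FROM person p
JOIN link l ON l.person_key = p.person_key
JOIN groups g ON g.group_key = l.group_key
WHERE p.person_key IN (SELECT person_key FROM persons_wanted)
ORDER BY p.name, g.name
""").fetchall()

print(rows)
```

Bill is returned with A, B, and C because he belongs to A; Ann is dropped because none of her groups is in the starting set.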

SQL Query, return all children in a one-to-many relationship when one child matches

I'm working on enhancing a query for a DB2 database, and I'm having trouble getting acceptable performance because of the number of joins across large tables needed to get all of the data. I'm hoping there's a SQL function or technique that can simplify and speed up the process.
To break it down, let's say there are two tables: People and Groups. Groups contain multiple people, and a person can be part of multiple groups. It's a many-to-many, but bear with me. Basically, there's a subquery that will return a set of groups. From that, I can join to People (which requires additional joins across other tables) to get all of the people from those groups. However, I also need to know all of the groups that those people are in, which means joining back to the Groups table again (several more joins) to get a superset of the original subquery. There are additional joins in the query as well to get other pieces of relevant data, and the cost is adding up in a very ugly way. I also need to return information from both tables, so that rules out a number of techniques.
What I'd like to do is be able to start with the People table, join it to Groups, and then compare Groups with the subquery. If the Groups attached to one person has one match in the subquery, it should return ALL Group items associated with that person.
Essentially, let's say that Bob is part of Group A, B, and C. Currently, I start with groups, and let's say that only Group A comes out of the subquery. Then I join A to Bob, but then I have to come back and join Bob to Group again to get B and C. SQL example:
SELECT p.*, g2.*
FROM GROUP g
JOIN LINKA link
ON link.GROUPID = g.GROUPID
JOIN LINKB link1
ON link1.LISTID = link.LISTID
JOIN PERSON p
ON link1.PERSONID = p.PERSONID
JOIN LINKB link2
ON link2.PERSONID = p.PERSONID
JOIN LINKA link3
ON link2.LISTID = link3.LISTID
JOIN GROUP g2
ON link3.GROUPID = g2.GROUPID
WHERE
g.GROUPID IN (subquery)
Yes, the linking tables aren't ideal, but they're basically normalized tables containing additional information that is not relevant to the query I'm running. We have to start with a filtered Group set, join to People, then come back to get all of the Groups that the People are associated to.
What I'd like to do is start with People, join to Group, and if ANY Group that Bob is in returns from the subquery, ALL should be returned, so if we have Bob joined to A, B, and C, and A is in the subquery, it will return three rows of Bob to A, B, and C as there was at least one match. In this way, it could be treated as a one-to-many relationship if we're only concerned with the Groups for each Person and not the other way around. SQL example:
SELECT p.*, g.*
FROM PEOPLE p
JOIN LINKB link
ON link.PERSONID = p.PERSONID
JOIN LINKA link1
ON link.LISTID = link1.LISTID
JOIN GROUP g
ON link1.GROUPID = g.GROUPID
WHERE
--SQL function, expression, or other method to return
--all groups for any person who is part of any group contained in the subquery
The number of joins in the first query make it largely unusable as these are some pretty big tables. The second would be far more ideal if this sort of thing is possible.
From the question, I think you are querying hierarchical data. DB2 provides facilities for dealing with such data: the START WITH and CONNECT BY clauses should be useful. They are explained here.
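Another way to express "return all groups for any person who is in at least one group from the subquery" is a correlated EXISTS on the person, which lets the outer query start from PEOPLE as the second example in the question does. This is a minimal sketch on SQLite through Python, not a tested DB2 query. The table is named GROUPS here because GROUP is a reserved word, the WANTED table stands in for the subquery, and the column layout of LINKA and LINKB is assumed from the joins in the question. Whether it performs better than the seven-join version depends on the optimizer and indexes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PEOPLE (PERSONID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE GROUPS (GROUPID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE LINKA (GROUPID INTEGER, LISTID INTEGER);
CREATE TABLE LINKB (LISTID INTEGER, PERSONID INTEGER);
CREATE TABLE WANTED (GROUPID INTEGER);  -- stand-in for the subquery result

INSERT INTO PEOPLE VALUES (1, 'Bob'), (2, 'Sue');
INSERT INTO GROUPS VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO LINKA VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO LINKB VALUES (1, 1), (2, 1), (3, 1),  -- Bob: A, B, C
                         (2, 2);                  -- Sue: B only
INSERT INTO WANTED VALUES (1);                    -- only group A
""")

rows = con.execute("""
SELECT p.NAME, g.NAME
FROM PEOPLE p
JOIN LINKB lb ON lb.PERSONID = p.PERSONID
JOIN LINKA la ON la.LISTID = lb.LISTID
JOIN GROUPS g ON g.GROUPID = la.GROUPID
WHERE EXISTS (
    -- does this person have any group that is in the wanted set?
    SELECT 1
    FROM LINKB lb2
    JOIN LINKA la2 ON la2.LISTID = lb2.LISTID
    WHERE lb2.PERSONID = p.PERSONID
      AND la2.GROUPID IN (SELECT GROUPID FROM WANTED)
)
ORDER BY p.NAME, g.NAME
""").fetchall()

print(rows)
```

Bob comes back with all three of his groups because one of them (A) is in the wanted set, while Sue is excluded. The EXISTS subquery touches only the link tables, so the GROUPS table is joined once rather than twice.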