Beginner SQL trouble with simple Join/Subquery - sql

I know this is a quite a basic query, but I can't find exactly the solution for my needs, so here goes...
I have a table of customers INT_AUX_LISTING, a table of folders in which they're contained INT_AUX_DIRECTORY and a joining table INT_AUX_DIR_LIST.
I need to return a list of all customers who are in folder 40017, and also not in folder 2. Any other folder in which they're contained is irrelevant.
So far I've come up with
SELECT *
FROM INT_AUX_LISTING l
LEFT JOIN INT_AUX_DIR_LIST dl
ON l.LISTING_ID=dl.LISTING_ID
WHERE dl.CONTAIN_DIR_ID=40017
AND dl.CONTAIN_DIR_ID <> 2
However this (obviously) isn't correct, and is returning far too many values. I think I need to introduce a subquery but things go awry when I try.
Apologies for the entry level nature of the question, but it's only my 3rd day working with SQL!

First, clarification on left-join and where. If you have a query that has a left-join to a table (per your example where "dl" was the right-side of the join), and then add the "dl" alias to a where clause, that in essence converts it to an INNER JOIN (unless you explicitly tested with an OR dl.listing_ID IS NULL).
Now, back to your primary objective. Without the table structures, and not exactly clear on your column intent, it APPEARS that the "listing_id" column is your client as found in the int_aux_listing table. the "contain_dir_id" is the directory (one of many that may be possible) as found in the int_aux_dir_list.
That said, you want a list of all people who are in 40017 AND NOT in 2. For this, you can do a left-join but using the int_aux_listing table twice but with different aliases. Once you get the list of clients, you can then join that to your client table. Since you are new, take the sample one step at a time.
Just get clients within 40017
select
dl.listing_id
from
int_aux_dir_list dl
where
dl.contain_dir_id = 40017
Now, exclude those if they are found in directory 2
select
dl.listing_id
from
int_aux_dir_list dl
left join int_aux_dir_list dl2
on dl.listing_id = dl2.listing_id
AND dl2.contain_dir_id = 2
where
dl.contain_dir_id = 40017
AND dl2.listing_id IS NULL
By doing a left-join to the directory listing table a second time (via alias "dl2"), but specifically for its directory id = 2 (via AND dl2.contain_dir_id = 2), you are getting all clients from dl that are within directory 40017 and looking for that same client ID in the second instance ("dl2").
Here's the kicker. Notice the where clause is stating "AND dl2.listing_id IS NULL".
This is basically saying WHEN Looking at the DL2 instance, I only want records that DO NOT have a corresponding listing ID matched.
Now, wrap this up to get the client information you need. In this case, getting all columns from the directory listing (in case other columns of importance), AND getting all columns from the "l"isting table.
select
dl.*,
l.*
from
int_aux_dir_list dl
left join int_aux_dir_list dl2
on dl.listing_id = dl2.listing_id
AND dl2.contain_dir_id = 2
join INT_AUX_LISTING l
on dl.listing_id = l.listing_id
where
dl.contain_dir_id = 40017
AND dl2.listing_id IS NULL
Hopefully this clears things up for you some.

Your WHERE clause is the problem; it brings back ALL rows except the one where that id is not 2, including the one where id is 40017 (since it's not equal to 2). I think you should revisit it.

If you use a LEFT JOIN[1] on INT_AUX_LISTING, you will get all customers even if they don't have a corresponding folder. In your case it's best to use INNER JOIN[2] instead.
In the where clause, you're first statement selects customers who are in folder 40017 (dl.CONTAIN_DIR_ID=40017), because of this your second statement (dl.CONTAIN_DIR_ID <> 2) will always be true (40017 will never be equal to 2). So basically your selecting all customers in folder 40017.
Your notion of a subquery was correct. This is one way of getting the desired result:
SELECT *
FROM INT_AUX_LISTING l
INNER JOIN INT_AUX_DIR_LIST dl1
ON l.LISTING_ID = dl1.LISTING_ID
WHERE dl1.CONTAIN_DIR_ID = 40017
AND l.LISTING_ID NOT IN (
SELECT dl2.LISTING_ID FROM INT_AUX_DIR_LIST dl2 WHERE dl2.CONTAIN_DIR_ID = 2
)

Related

Abap subquery Where Cond [duplicate]

I have a requirement to pull records, that do not have history in an archive table. 2 Fields of 1 record need to be checked for in the archive.
In technical sense my requirement is a left join where right side is 'null' (a.k.a. an excluding join), which in abap openSQL is commonly implemented like this (for my scenario anyways):
Select * from xxxx //xxxx is a result for a multiple table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use results of a sub-query to check more than than 1 field or use another form of excluding join logic on multiple fields using SQL (without involving application server).
I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance acceptable (processing time is about 5 seconds), although, it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = XXXX-foreign_key
)
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors, without financial records in 3
tables.
Additional requirements: also exclude based on alternative
payers (1-* relationship). This is what example above is based on.
More requirements: also exclude based on alternative payee (*-* relationship between payer and payee).
Many-to-many join exponentially increased the record count within the construct I labeled XXXX, which in turn produces a lot of unnecessary work. For instance: a single customer with 3 payers, and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
At this point, moving left-joined tables from main query into sub-queries and splitting them gave significantly better performance.
than any smarter looking alternatives.
select * from lfa1 inner join lfb1
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both not in/not exits work. One might be better than the other, depending on filters you use.
When exclusion is based on 2 or more fields and you don't have many-to-many join in main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
The moment your exclusion criteria implements a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with 1 good sub-query, that looks good).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select header~key
from headers left join items on headers~key = items~key
where items~key is null
if it is possible to use results of a sub-query to check more than
than 1 field or use another form of excluding join logic on multiple
fields
No, it is not possible to check two columns in subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar
subquery.
Scalar is keyword here, i.e. it should return exactly one column.
Your subquery can have multi-column key, and such syntax is completely legit:
SELECT planetype, seatsmax
FROM saplane AS plane
WHERE seatsmax < #wa-seatsmax AND
seatsmax >= ALL ( SELECT seatsocc
FROM sflight
WHERE carrid = #wa-carrid AND
connid = #wa-connid )
however you say that these two fields should be checked against different tables
Those 2 fields are also checked against two more tables
so it's not the case for you. Your only choice seems to be multi-join.
P.S. FOR ALL ENTRIES does not support negation logic, you cannot just use some sort of NOT IN FOR ALL ENTRIES, it won't be that easy.

How do I invert my join critera in TSQL?

I think this is a relatively basic operation in SQL, but I'm having a hard time figuring it out.
I have two tables, a source table I'm trying to SELECT my data from, and a reference table containing serial #'s and transaction #'s (the source table also has these columns, and more). I want to do two different SELECTs, one where the serial/trans number pair exist in the source table and one where the serial/trans number do not exist. (the combination of serial and trans number are the primary keys for both these tables)
Intially I'm doing this with a join like this:
SELECT * FROM source s
INNER JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
I would think this should give me everything from the source table that has a serial/trans pair matching with one in the reference table. I'm not positive this is correct, but it is returning a reasonable number of results and I think it looks good.
When I go to do the opposite, get everything from source where the serial/trans pair do not match up with one in reference, I encounter a problem. I tried the following query:
SELECT * FROM source s
INNER JOIN reference r ON s.serial <> r.serial AND s.trans <> r.trans
When I run this the query goes on forever, it starts returning way more results than it should, more than are actually in the entire source table. It eventually ends with an OOM exception, I let it run for 20 min+. For some persepctive the source table I'm dealing with has about 13 million records, and reference table has about 105,000.
So how do I get the results I'm looking for? If it is not already clear, the number of results from the first query + results from second query should equal the total number of records in my source table.
Thanks for any help!
I think you need something like NOT EXISTS:
SELECT *
FROM source s
WHERE NOT EXISTS (SELECT 1
FROM reference r
WHERE s.serial = r.serial AND s.trans = r.trans)
The above query will get everything from source where the serial/trans pair do not match up with one in reference.
As cited in comment from #Dan below NOT EXISTS generally has a performance advantage over LEFT JOIN in a situation like this (see this for example).
"LEFT JOIN" match records that match together, and return NULL for those that doesn't fit.
So, non-matching records all have NULL in column coming from reference table. This way, you can than add a where clause to return only records with null value in a non-nullable column of table "reference" (like the primarykey).
SELECT * FROM source s
LEFT JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
WHERE R.serial IS NULL

Contradiction Between Multiple Left Joins

I am trying to understand the following query which is automatically produced by some software library:
SELECT DISTINCT `t`.* FROM `teacher` AS `t`
LEFT JOIN `rel` AS `rel_profile`
ON `rel_profile`.`field_id` = 2319 AND `rel_profile`.`item_id` = `t`.`id`
LEFT JOIN `teacher_info` AS `profile`
ON `profile`.`id` = `rel_profile`.`related_item_id`
LEFT JOIN `rel` AS `rel_profile_city`
ON `rel_profile_city`.`field_id` = 2320 AND `rel_profile_city`.`item_id` = `profile`.`id` WHERE `rel_profile_city`.`item_id` = 1
There are three left joins. I understand the first and second one. What I don't understand is the third left join:
LEFT JOIN `rel` AS `rel_profile_city`
ON `rel_profile_city`.`field_id` = 2320 AND `rel_profile_city`.`item_id` = `profile`.`id` WHERE `rel_profile_city`.`item_id` = 1
The table rel has already been used in the first left join:
LEFT JOIN `rel` AS `rel_profile`
ON `rel_profile`.`field_id` = 2319
Now, the same table is left joined again but this time the value of the joined field is different:
LEFT JOIN `rel` AS `rel_profile_city`
ON `rel_profile_city`.`field_id` = 2320
How come that these two joins do not contradict?
The query is using aliases:
`rel` AS `rel_profile`
Says to pretend that the table rel is actually a table called rel_profile. That alias is then used throughout the rest of the query. I'm not sure of MySQL, but on some other database systems, it's an error to refer to the table as rel from then onwards(*) (unless there's another join that re-introduces the table and doesn't provide an alias).
And joining to the same table multiple times is allowed - provided that the names (or aliases) are unique. This is useful when you're trying to construct a result that relies on the content of multiple rows from the same table, where the result should occupy a single row.
(*) "Then onwards" being in the order in which the clauses are processed, not the text order. E.g. you should use the alias in the SELECT clause because, even though it occurs earlier textually, it's (conceptually) processed after the FROM clause.
This query will show teacher rows that have associated rows in rel with field_id = 2319 OR field_id = 2320
The are not "contradicting" each other. Imagine you have a table of users, wich have the demographic and personal data of your users. And another table with the "relation" between users. So, in this "relations" table, you have columns UserId1 and UserId2. If you want a query that returns the data of those two users, you'll need to do two JOINS with the table Users, once per each User column. This doesn't mean that they are contradicting each other.

In an EXISTS can my JOIN ON use a value from the original select

I have an order system. Users with can be attached to different orders as a type of different user. They can download documents associated with an order. Documents are only given to certain types of users on the order. I'm having trouble writing the query to check a user's permission to view a document and select the info about the document.
I have the following tables and (applicable) fields:
Docs: DocNo, FileNo
DocAccess: DocNo, UserTypeWithAccess
FileUsers: FileNo, UserType, UserNo
I have the following query:
SELECT Docs.*
FROM Docs
WHERE DocNo = 1000
AND EXISTS (
SELECT * FROM DocAccess
LEFT JOIN FileUsers
ON FileUsers.UserType = DocAccess.UserTypeWithAccess
AND FileUsers.FileNo = Docs.FileNo /* Errors here */
WHERE DocAccess.UserNo = 2000 )
The trouble is that in the Exists Select, it does not recognize Docs (at Docs.FileNo) as a valid table. If I move the second on argument to the where clause it works, but I would rather limit the initial join rather than filter them out after the fact.
I can get around this a couple ways, but this seems like it would be best. Anything I'm missing here? Or is it simply not allowed?
I think this is a limitation of your database engine. In most databases, docs would be in scope for the entire subquery -- including both the where and in clauses.
However, you do not need to worry about where you put the particular clause. SQL is a descriptive language, not a procedural language. The purpose of SQL is to describe the output. The SQL engine, parser, and compiler should be choosing the most optimal execution path. Not always true. But, move the condition to the where clause and don't worry about it.
I am not clear why do you need to join with FileUsers at all in your subquery?
What is the purpose and idea of the query (in plain English)?
In any case, if you do need to join with FileUsers then I suggest to use the inner join and move second filter to the WHERE condition. I don't think you can use it in JOIN condition in subquery - at least I've never seen it used this way before. I believe you can only correlate through WHERE clause.
You have to use aliases to get this working:
SELECT
doc.*
FROM
Docs doc
WHERE
doc.DocNo = 1000
AND EXISTS (
SELECT
*
FROM
DocAccess acc
LEFT OUTER JOIN
FileUsers usr
ON
usr.UserType = acc.UserTypeWithAccess
AND usr.FileNo = doc.FileNo
WHERE
acc.UserNo = 2000
)
This also makes it more clear which table each field belongs to (think about using the same table twice or more in the same query with different aliases).
If you would only like to limit the output to one row you can use TOP 1:
SELECT TOP 1
doc.*
FROM
Docs doc
INNER JOIN
FileUsers usr
ON
usr.FileNo = doc.FileNo
INNER JOIN
DocAccess acc
ON
acc.UserTypeWithAccess = usr.UserType
WHERE
doc.DocNo = 1000
AND acc.UserNo = 2000
Of course the second query works a bit different than the first one (both JOINS are INNER). Depeding on your data model you might even leave the TOP 1 out of that query.

Translating Oracle SQL to Access Jet SQL, Left Join

There must be something I'm missing here. I have this nice, pretty Oracle SQL statement in Toad that gives me back a list of all active personnel with the IDs that I want:
SELECT PERSONNEL.PERSON_ID,
PERSONNEL.NAME_LAST_KEY,
PERSONNEL.NAME_FIRST_KEY,
PA_EID.ALIAS EID,
PA_IDTWO.ALIAS IDTWO,
PA_LIC.ALIAS LICENSENO
FROM PERSONNEL
LEFT JOIN PERSONNEL_ALIAS PA_EID
ON PERSONNEL.PERSON_ID = PA_EID.PERSON_ID
AND PA_EID.PERSONNEL_ALIAS_TYPE_CD = 1086
AND PA_EID.ALIAS_POOL_CD = 3796547
AND PERSONNEL.ACTIVE_IND = 1
LEFT JOIN PERSONNEL_ALIAS PA_IDTWO
ON PERSONNEL.PERSON_ID = PA_IDTWO.PERSON_ID
AND PA_IDTWO.PERSONNEL_ALIAS_TYPE_CD = 3839085
AND PA_IDTWO.ACTIVE_IND = 1
LEFT JOIN PERSONNEL_ALIAS PA_LIC
ON PERSONNEL.PERSON_ID = PA_LIC.PERSON_ID
AND PA_LIC.PERSONNEL_ALIAS_TYPE_CD = 1087
AND PA_LIC.ALIAS_POOL_CD = 683988
AND PA_LIC.ACTIVE_IND = 1
WHERE PERSONNEL.ACTIVE_IND = 1 AND PERSONNEL.PHYSICIAN_IND = 1;
This works very nicely. Where I run into problems is when I put it into Access. I know, I know, Access Sucks. Sometimes one needs to use it, especially if one has multiple database types that they just want to store a few queries in, and especially if one's boss only knows Access. Anyway, I was having trouble with the ANDs inside the FROM, so I moved those to the WHERE, but for some odd reason, Access isn't doing the LEFT JOINs, returning only those personnel with EID, IDTWO, and LICENSENO's. Not everybody has all three of these.
Best shot in Access so far is:
SELECT PERSONNEL.PERSON_ID,
PERSONNEL.NAME_LAST_KEY,
PERSONNEL.NAME_FIRST_KEY,
PA_EID.ALIAS AS EID,
PA_IDTWO.ALIAS AS ID2,
PA_LIC.ALIAS AS LICENSENO
FROM ((PERSONNEL
LEFT JOIN PERSONNEL_ALIAS AS PA_EID ON PERSONNEL.PERSON_ID=PA_EID.PERSON_ID)
LEFT JOIN PERSONNEL_ALIAS AS PA_IDTWO ON PERSONNEL.PERSON_ID=PA_IDTWO.PERSON_ID)
LEFT JOIN PERSONNEL_ALIAS AS PA_LIC ON PERSONNEL.PERSON_ID=PA_LIC.PERSON_ID
WHERE (((PERSONNEL.ACTIVE_IND)=1)
AND ((PERSONNEL.PHYSICIAN_IND)=1)
AND ((PA_EID.PRSNL_ALIAS_TYPE_CD)=1086)
AND ((PA_EID.ALIAS_POOL_CD)=3796547)
AND ((PA_IDTWO.PRSNL_ALIAS_TYPE_CD)=3839085)
AND ((PA_IDTWO.ACTIVE_IND)=1)
AND ((PA_LIC.PRSNL_ALIAS_TYPE_CD)=1087)
AND ((PA_LIC.ALIAS_POOL_CD)=683988)
AND ((PA_LIC.ACTIVE_IND)=1));
I think that part of the problem could be that I'm using the same alias (lookup) table for all three joins. Maybe there's a more efficient way of doing this? Still new to SQL land, so any tips as far as that goes would be great. I feel like these should be equivalent, but the Toad query gives me back many many tens of thousands of imperfect rows, and Access gives me fewer than 500. I need to find everybody so that nobody is left out. It's almost as if the LEFT JOINs aren't working at all in Access.
To understand what you are doing, let's look at simplified version of your query:
SELECT PERSONNEL.PERSON_ID,
PA_EID.ALIAS AS EID
FROM PERSONNEL
LEFT JOIN PERSONNEL_ALIAS AS PA_EID ON PERSONNEL.PERSON_ID=PA_EID.PERSON_ID
WHERE PERSONNEL.ACTIVE_IND=1
AND PERSONNEL.PHYSICIAN_IND=1
AND PA_EID.PRSNL_ALIAS_TYPE_CD=1086
AND PA_EID.ALIAS_POOL_CD=3796547
If the LEFT JOIN finds match, your row might look like this:
Person_ID EID
12345 JDB
If it doesn't find a match, (disregard the WHERE clause for a second), it could look like:
Person_ID EID
12345 NULL
When you add the WHERE clauses above, you are telling it to only find records in the PERSONNEL_ALIAS table that meet the condition, but if no records are found, then the values are considered NULL, so they will never satisfy the WHERE condition and no records will come back...
As Joe Stefanelli said in his comment, adding a WHERE clause to a LEFT JOIN'ed table make it act as an INNER JOIN instead...
Further to #Sparky's answer, to get the equivalent of what you're doing in Oracle, you need to filter rows from the tables on the "outer" side of the joins before you join them. One way to do this might be:
For each table on the "outer" side of a join that you need to filter rows from (that is, the three instances of PERSONNEL_ALIAS), create a query that filters the rows you want. For example, the first query (say, named PA_EID) might look something like this:SELECT PERSONNEL_ALIAS.* FROM PERSONNEL_ALIAS WHERE PERSONNEL_ALIAS.PERSONNEL_ALIAS_TYPE_CD = 1086 AND PERSONNEL_ALIAS.ALIAS_POOL_CD = 3796547
In your "best shot in Access so far" query in the original post: a) replace each instance of PERSONNEL_ALIAS with the corresponding query created in Step 1, and, b) remove the corresponding conditions (on PA_EID, PA_IDTWO, and PA_LIC) from the WHERE clause.