What is the practical difference between these two SQL statements?

In an exam I was asked to retrieve the names of the transporters that have never transported a container based in Rotterdam. The correct answer was:
select Transporter.ID
from Transporter
where Transporter.ID not in (
select TransporterID
from Container
inner join Transportation on Container.ID = Transportation.ContainerID
where Container.City = 'Rotterdam')
Nevertheless, the following was marked as a wrong answer:
select Transporter.ID
from Transporter
where Transporter.ID in (
select TransporterID
from Container
inner join Transportation on Container.ID = Transportation.ContainerID
where Container.City <> 'Rotterdam')
Why don't both statements lead to the same result? What is the practical difference between in ( ... where A <> B ) and not in ( ... where A = B )?
[Note that Transportation is at the center of the relational schema, and all of its prime attributes are foreign keys.]

Let's build a simple table as an example:
Container
TransporterID | City
1 | 'Rotterdam'
1 | 'Paris'
2 | 'Rotterdam'
Then run this query:
SELECT TransporterID
FROM Container
WHERE Container.City <> 'Rotterdam'
This returns 1 (from the Paris row).
So the WHERE Transporter.ID IN ( ... ) version wrongly returns transporter 1, even though transporter 1 has transported a container in Rotterdam.
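The following minimal sketch makes the difference concrete. It runs both exam queries against an in-memory SQLite database from Python. That is an assumption: the exam presumably targeted another DBMS. The schema is also reduced to the columns the queries use, and the data is hypothetical. Transporter 1 carries containers in both Rotterdam and Paris, transporter 2 only in Rotterdam, and transporter 3 only in Paris.

```python
import sqlite3

# Hypothetical, reduced schema and data (only the columns the exam queries use).
SETUP = """
CREATE TABLE Transporter (ID INTEGER PRIMARY KEY);
CREATE TABLE Container (ID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Transportation (ContainerID INTEGER, TransporterID INTEGER);
INSERT INTO Transporter VALUES (1), (2), (3);
INSERT INTO Container VALUES (10, 'Rotterdam'), (11, 'Paris');
-- transporter 1: Rotterdam and Paris; 2: Rotterdam only; 3: Paris only
INSERT INTO Transportation VALUES (10, 1), (11, 1), (10, 2), (11, 3);
"""

SUBQUERY = """
SELECT TransporterID
FROM Container
INNER JOIN Transportation ON Container.ID = Transportation.ContainerID
WHERE Container.City {op} 'Rotterdam'
"""


def run_both_queries():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # Correct answer: exclude everyone who ever carried a Rotterdam container.
    never = conn.execute(
        "SELECT ID FROM Transporter WHERE ID NOT IN ("
        + SUBQUERY.format(op="=") + ") ORDER BY ID"
    ).fetchall()
    # Wrong answer: include everyone who ever carried a non-Rotterdam container.
    elsewhere = conn.execute(
        "SELECT ID FROM Transporter WHERE ID IN ("
        + SUBQUERY.format(op="<>") + ") ORDER BY ID"
    ).fetchall()
    conn.close()
    return [r[0] for r in never], [r[0] for r in elsewhere]


never, elsewhere = run_both_queries()
print("NOT IN (City = 'Rotterdam'):", never)    # [3]
print("IN (City <> 'Rotterdam'):", elsewhere)   # [1, 3]
```

The IN version wrongly includes transporter 1 because of its Paris trip. Likewise, a transporter with no transportations at all would be returned by NOT IN but never by IN.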

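Besides what the other answers point out, take NULLs into consideration:
If City is NULL, both City = 'Rotterdam' and City <> 'Rotterdam' evaluate to UNKNOWN, which the WHERE clause treats like FALSE, so that row is dropped from both subqueries. A transporter whose only containers have a NULL city is therefore returned by the NOT IN version but not by the IN version.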
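Here is a small sketch of the NULL case, again using Python's sqlite3 module with an in-memory database. The assumption is that the DBMS follows standard three-valued logic, as SQLite does. A single hypothetical transporter 4 carries one container whose city is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transporter (ID INTEGER PRIMARY KEY);
CREATE TABLE Container (ID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Transportation (ContainerID INTEGER, TransporterID INTEGER);
INSERT INTO Transporter VALUES (4);
INSERT INTO Container VALUES (12, NULL);   -- city unknown (hypothetical row)
INSERT INTO Transportation VALUES (12, 4);
""")

# Comparing with NULL yields NULL (UNKNOWN) for both = and <>, never TRUE.
eq, neq = conn.execute("SELECT NULL = 'Rotterdam', NULL <> 'Rotterdam'").fetchone()

subquery = """SELECT TransporterID FROM Container
INNER JOIN Transportation ON Container.ID = Transportation.ContainerID
WHERE Container.City {} 'Rotterdam'"""

# The NULL-city row is dropped from both subqueries, with opposite effects.
not_in = [r[0] for r in conn.execute(
    f"SELECT ID FROM Transporter WHERE ID NOT IN ({subquery.format('=')})")]
in_ = [r[0] for r in conn.execute(
    f"SELECT ID FROM Transporter WHERE ID IN ({subquery.format('<>')})")]
conn.close()

print(eq, neq)   # None None
print(not_in)    # [4]
print(in_)       # []
```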

Your version answers a slightly different question: "What are the IDs of transporters that have transported a container somewhere other than Rotterdam?"
As for the best answer, I would use NOT EXISTS (which matters for correctness) and table aliases (which is more stylistic):
select t.ID
from Transporter t
where not exists (select 1
from Container c join
Transportation tr
on c.ID = tr.ContainerID
where tr.TransporterID = t.id and
c.City = 'Rotterdam'
);
NOT IN does not behave the way most people expect when the subquery returns any NULL value: in that case no rows pass the filter at all. NOT EXISTS has the expected behavior.
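The following sketch shows NOT IN and NOT EXISTS diverging. It uses Python's sqlite3 module and hypothetical data in which one Rotterdam transportation has a NULL TransporterID. With the exam's schema, where TransporterID is part of Transportation's key, such a NULL cannot occur, so the two forms agree there. The difference only shows up with nullable columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transporter (ID INTEGER PRIMARY KEY);
CREATE TABLE Container (ID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Transportation (ContainerID INTEGER, TransporterID INTEGER);
INSERT INTO Transporter VALUES (1), (3);
INSERT INTO Container VALUES (10, 'Rotterdam'), (11, 'Paris');
-- The last row has an unknown (NULL) transporter for a Rotterdam container.
INSERT INTO Transportation VALUES (10, 1), (11, 3), (10, NULL);
""")

# NOT IN: the subquery yields {1, NULL}; "3 NOT IN (1, NULL)" is UNKNOWN, so 3 is lost.
not_in = [r[0] for r in conn.execute("""
    SELECT ID FROM Transporter
    WHERE ID NOT IN (SELECT tr.TransporterID
                     FROM Container c JOIN Transportation tr ON c.ID = tr.ContainerID
                     WHERE c.City = 'Rotterdam')
    ORDER BY ID""")]

# NOT EXISTS: correlated on the transporter, so the NULL row simply never matches.
not_exists = [r[0] for r in conn.execute("""
    SELECT t.ID FROM Transporter t
    WHERE NOT EXISTS (SELECT 1
                      FROM Container c JOIN Transportation tr ON c.ID = tr.ContainerID
                      WHERE tr.TransporterID = t.ID AND c.City = 'Rotterdam')
    ORDER BY t.ID""")]
conn.close()

print(not_in)      # []
print(not_exists)  # [3]
```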


Don't select rows where column A is duplicated AND any row of column B is a specific value

I'm working on generating a report that merges multiple tables. The report must only show projects that did not have any document marked 'Not Received'. These document markings are listed in a table with one row per document, so when it is merged into my other table it creates multiple rows for the same project. For example, the following table:
Project Number | ChecklistValue
565 | Received
565 | Not Received
465 | Received
465 | Not Applicable
As you can see, only two projects are listed in this table, but the desired output is:
Project Number | Other Info
465 | etc
I do not need the checklist value on the actual report, so I can use GROUP BY to combine all the good rows. My issue is that this would still include project 565 even if I add something like WHERE ChecklistValue <> 'Not Received'. Project 565 needs to be hidden from the report entirely, because one of its rows contains 'Not Received'.
So my actual question is: how do I exclude all rows for project numbers that have any row containing 'Not Received'?
I'm adding the entire query with generalized names below:
SELECT
Project Number
,Name
,Contractor
,ABS(DATEDIFF(day,(ActualDate),(EstDate))) AS DelayPeriod
,S.NoteDate
,S.FinalAppDate
,Status
,S.ONE
,S.TWO
,S.THREE
,S.FOUR
,CH.ChecklistValue
FROM [DB1] A
INNER JOIN [DB2] C ON A.Contractor = C.Contractor
INNER JOIN [DB3] S ON A.AppID = S.AppID
INNER JOIN [DB4] LS ON S.StatusID = LS.StatusID
LEFT OUTER JOIN [DB5] CH ON A.AppID = CH.AppID AND CH.OtherID = 1
WHERE C.TypeID = 4 AND A.YEAR = 2022, AND S.THING = 1 AND
(CH.CheckListValue IS NULL OR A.AppID NOT IN (SELECT * FROM [DB5] WHERE
CheckListValue = 'Not Reveived'))
GROUP BY Project Number,Name,Contractor,ABS(DATEDIFF(day,(ActualDate),(EstDate))) AS DelayPeriod,S.NoteDate,S.FinalAppDate,Status,S.ONE,S.TWO,S.THREE,S.FOUR
The last portion of the WHERE clause was added from a suggestion, but I'm clearly not implementing it correctly, because it throws an error.
You can use NOT IN like this:
create table test(
num int,
description varchar(20)
);
insert into test(num,description)
values(565,'Received'),
(565,'Not Received'),
(465,'Received'),
(465,'Not Applicable');
select *
from test
where num not in
(
select num -- Only select one column here
from test
where description = 'Not Received'
);
Results:
+-----+---------------+
| num | description |
+-----+---------------+
| 465 | Received |
| 465 | Not Applicable|
+-----+---------------+
db<>fiddle (this is on SQL Server, but it works on other DBMSs as well).
So, if I understand correctly, your query should have:
OR A.AppID NOT IN
(
SELECT AppID -- Not select *
FROM [DB5]
WHERE CheckListValue = 'Not Received'
)
Another way to do it is with a CTE, although it looks more complicated at first glance:
with x as(
select num
from test
where description = 'Not Received'
)
select t.num, t.description
from test t
left join x
on t.num = x.num
where x.num is null
First I create a CTE with the num values whose description is 'Not Received'. Then I select everything from the test table and left join it to the CTE. With WHERE x.num IS NULL, I keep only the rows whose num is not in the CTE, so this returns only 465.
Which one is better? I don't know: sometimes the join is faster and sometimes IN is. You can find more on this post.
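Both approaches can be checked side by side on the test table from this answer. The sketch below runs them in Python's sqlite3 module with an in-memory database instead of SQL Server, which is an assumption. ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (num INT, description VARCHAR(20));
INSERT INTO test (num, description) VALUES
    (565, 'Received'), (565, 'Not Received'),
    (465, 'Received'), (465, 'Not Applicable');
""")

# NOT IN: the subquery selects exactly one column (num), never *.
not_in_rows = conn.execute("""
    SELECT num, description FROM test
    WHERE num NOT IN (SELECT num FROM test WHERE description = 'Not Received')
    ORDER BY num, description""").fetchall()

# Anti-join: LEFT JOIN to the excluded nums, keep rows with no match.
anti_join_rows = conn.execute("""
    WITH x AS (SELECT num FROM test WHERE description = 'Not Received')
    SELECT t.num, t.description
    FROM test t LEFT JOIN x ON t.num = x.num
    WHERE x.num IS NULL
    ORDER BY t.num, t.description""").fetchall()
conn.close()

print(not_in_rows)     # [(465, 'Not Applicable'), (465, 'Received')]
print(anti_join_rows)  # [(465, 'Not Applicable'), (465, 'Received')]
```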

Join Tables to return 1 or 0 based on multiple conditions

I am working on a project management website and have been asked for a new feature in a review meeting section.
A meeting is held to determine whether to proceed to the next phase, and I need to maintain a list of who attended each phase review meeting. I need to write an SQL query that returns all people, with an additional column that states whether they have already been added.
There are two tables involved to get my desired result, with the relevant columns listed below:
Name: PersonList
ID | Name | Division
Name: reviewParticipants
ProjectID | PersonID | GateID
The query I am looking for returns all people in PersonList, with an additional "hasAttended" bit that is TRUE if reviewParticipants.ProjectID = 5 AND reviewParticipants.CurrentPhase = 'G0', and FALSE otherwise.
PersonName | PersonID | hasAttended
Mr Smith | 1 | 1
Mr Jones | 2 | 0
I am not sure how to structure such a query with multiple conditions in a (left?) join that returns the result as a different column name and data type, so I would appreciate it if anybody could point me in the right direction.
With the result of this query I am going to add a series of checkboxes, and use this additional bit to mark it checked, or not, for page refreshes.
You can use LEFT JOIN as well:
SELECT DISTINCT p.*
,CASE WHEN rp.personid IS NOT NULL THEN 1 ELSE 0 END AS hasAttended
FROM personlist p
LEFT JOIN reviewParticipants rp ON rp.personid = p.id
AND rp.projectid = 5
AND rp.currentphase = 'G0'
I agree with Gordon Linoff: I would prefer an int or tinyint over a bit value.
You can use exists to see if there is a matching row.
select p.*,
(case when exists (select 1
from reviewParticipants rp
where rp.personid = p.id and
rp.projectid = 5 and
rp.currentphase = 'G0'
)
then 1 else 0 end) as hasAttended
from personlist p;
I see no reason to prefer a bit over an integer, but you can return a bit if you really prefer.
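Here is a sketch comparing the EXISTS and LEFT JOIN forms on hypothetical data, run through Python's sqlite3 module. The CurrentPhase column is an assumption taken from the condition in the question, since the listed reviewParticipants columns show GateID instead. Mr Jones attended other reviews, but not project 5 in phase G0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PersonList (ID INTEGER PRIMARY KEY, Name TEXT, Division TEXT);
-- CurrentPhase is assumed here, following the condition in the question.
CREATE TABLE reviewParticipants (ProjectID INTEGER, PersonID INTEGER, CurrentPhase TEXT);
INSERT INTO PersonList VALUES (1, 'Mr Smith', 'A'), (2, 'Mr Jones', 'B');
INSERT INTO reviewParticipants VALUES (5, 1, 'G0'), (5, 2, 'G1'), (6, 2, 'G0');
""")

# EXISTS version: one row per person, flag computed by a correlated probe.
exists_rows = conn.execute("""
    SELECT p.Name AS PersonName, p.ID AS PersonID,
           CASE WHEN EXISTS (SELECT 1 FROM reviewParticipants rp
                             WHERE rp.PersonID = p.ID
                               AND rp.ProjectID = 5
                               AND rp.CurrentPhase = 'G0')
                THEN 1 ELSE 0 END AS hasAttended
    FROM PersonList p
    ORDER BY PersonID""").fetchall()

# LEFT JOIN version: the filters live in ON, so unmatched people keep a NULL rp row.
left_join_rows = conn.execute("""
    SELECT DISTINCT p.Name AS PersonName, p.ID AS PersonID,
           CASE WHEN rp.PersonID IS NOT NULL THEN 1 ELSE 0 END AS hasAttended
    FROM PersonList p
    LEFT JOIN reviewParticipants rp ON rp.PersonID = p.ID
                                   AND rp.ProjectID = 5
                                   AND rp.CurrentPhase = 'G0'
    ORDER BY PersonID""").fetchall()
conn.close()

print(exists_rows)     # [('Mr Smith', 1, 1), ('Mr Jones', 2, 0)]
print(left_join_rows)  # [('Mr Smith', 1, 1), ('Mr Jones', 2, 0)]
```

Moving the ProjectID and CurrentPhase filters from ON into a WHERE clause would turn the LEFT JOIN back into an inner join and drop Mr Jones from the list.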
This will do:
select a.* from PersonList a where
a.Id in (select b.PersonId from reviewParticipants b
where b.ProjectID = 5 and exists (
select 1 from reviewParticipants c where c.CurrentPhase = 'G0' and
c.ProjectID = b.ProjectID
)
)

How to get different data from two different tables in SQL query?

I have two tables named Soft and Web, each containing multiple rows, and I want the rows whose data differs between them. For example:
The Soft table contains 5 rows:
The Web table also contains 5 rows:
Now I want this output:
I wrote a query, but unfortunately it didn't work. Here is my query:
SELECT DISTINCT soft.GSTNo AS SoftGST
,web.GSTNo AS WebGST
,soft.InvoiceNumber AS SoftInvoice
,web.InvoiceNumber AS WebInvoice
,soft.Rate AS SoftRate
,web.Rate AS WebRate
FROM soft
LEFT OUTER JOIN web ON web.GstNo = soft.GSTNo
AND web.InvoiceNumber = soft.invoicenumber
AND web.rate = soft.rate
I also tried an inner join, but that didn't work either.
You can achieve this with EXCEPT:
;WITH cte_soft AS
(SELECT * FROM soft
EXCEPT
SELECT * FROM web)
,cte_web AS
(SELECT * FROM web
EXCEPT
SELECT * FROM soft)
SELECT *
FROM
(SELECT gst softgst, NULL webgst, invoice softinvoice, NULL webinvoice, rate softrate, NULL webrate
FROM cte_soft
UNION ALL
SELECT NULL, gst, NULL, invoice, NULL , rate
FROM cte_web) tbl
ORDER BY coalesce(softgst, webgst),coalesce(softinvoice,webinvoice)
Fiddle
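The sample data from the question did not survive, so the following sketch runs the EXCEPT approach on hypothetical rows through Python's sqlite3 module, as an assumption in place of SQL Server. G1 matches on both sides, G2 differs in rate, and G3 and G4 exist on one side only. The leading semicolon is dropped (it is a SQL Server habit before WITH). A third sort key is added only to order the two G2 rows deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE soft (gst TEXT, invoice TEXT, rate INTEGER);
CREATE TABLE web  (gst TEXT, invoice TEXT, rate INTEGER);
-- Hypothetical rows, not the question's data.
INSERT INTO soft VALUES ('G1', 'I1', 100), ('G2', 'I2', 200), ('G3', 'I3', 300);
INSERT INTO web  VALUES ('G1', 'I1', 100), ('G2', 'I2', 250), ('G4', 'I4', 400);
""")

rows = conn.execute("""
    WITH cte_soft AS (SELECT * FROM soft EXCEPT SELECT * FROM web),
         cte_web  AS (SELECT * FROM web  EXCEPT SELECT * FROM soft)
    SELECT *
    FROM (SELECT gst AS softgst, NULL AS webgst, invoice AS softinvoice,
                 NULL AS webinvoice, rate AS softrate, NULL AS webrate
          FROM cte_soft
          UNION ALL
          SELECT NULL, gst, NULL, invoice, NULL, rate
          FROM cte_web) tbl
    ORDER BY COALESCE(softgst, webgst), COALESCE(softinvoice, webinvoice),
             COALESCE(softrate, webrate)""").fetchall()
conn.close()

# Each row appears only on the side (soft or web) where it has no exact twin.
for row in rows:
    print(row)
```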
You can use full join:
SELECT s.gst as softgst, w.gst as webgst,
s.invoice as softinvoice, w.invoice as webinvoice,
s.rate as softrate, w.rate as webrate
FROM soft s FULL JOIN
web w
ON s.gst = w.gst AND s.invoice = w.invoice AND s.rate = w.rate
WHERE s.gst IS NULL OR w.gst IS NULL
ORDER BY COALESCE(s.gst, w.gst), COALESCE(s.invoice, w.invoice);
No subqueries or CTEs are needed. This is really just a slight variant of your query.

OracleSQL: How do I add a specific AND is not null OR is not null to my query

Backstory:
I have three tables I'm working with: a directory table (directory), a general attribute table (attribute1table) and a specific attribute table (attribute2table). The general attribute table holds attribute names (e.g. Last Name) under attribute IDs (attrid = 2). The specific attribute table holds the specific data for these attributes (e.g. Doe).
I needed to transpose rows to columns. I had tried using PIVOT and max(decode) before, but all options gave me the wrong string value, so I used a subselect within the SELECT statement. This worked well: it did transpose the rows into columns, but it gave me a bunch of null values. See the query at the bottom for the steps.
Then I added a general 'stringval IS NOT NULL' condition to eliminate the other attribute1table.attrid values (e.g. 4, 5, 6). This worked.
This is the output I was getting at this point. The ? are null values.
Name DataID LastName FirstName
File10 1290 ? Jane
File10 1290 Doe ?
Then I wanted to add a further condition: include the values where LastName is not null OR FirstName is not null. I found that someone had recommended doing this in a previous question, albeit in a different situation: Eliminating specific null values in sql select
I was able to include one condition or the other, but could not add both. Instead of getting an error, I just got a horrifically long run time with no result in sight (note that I am using software that lets you enter Oracle queries in its interface to query the database). It works if I run the query up to the ** (see code), but as soon as I add the OR condition, it doesn't work anymore. I think this is because I have multiple WHERE conditions. In all cases I want the directory ID and general stringval conditions to apply, but I want a third condition where either the last name is not null or the first name is not null. I'm not sure if I'm missing something obvious. Can anyone help?
Here is my current query:
SELECT directory.name, directory.dataid,
(SELECT max(stringval) FROM attribute2table WHERE attribute1table.attrid = 2) as LastName,
(SELECT max(stringval) FROM attribute2table WHERE attribute1table.attrid = 3) as FirstName
FROM attribute2table
JOIN directory ON directory.dataid = attribute2table.id
JOIN attribute1table ON attribute1table.id = directory.dataid
WHERE directory.dataid = 1290
AND stringval IS NOT NULL
AND (SELECT max(valstr) FROM attribute1table WHERE attribute1table.attrid = 2) IS NOT NULL
**OR (SELECT max(valstr) FROM attribute1table WHERE attribute1table.attrid = 3) IS NOT NULL**
Basically I just need to get rid of the null values, and I want my table to look like this:
Name DataID LastName FirstName
File10 1290 Doe Jane
This appears to be a parenthesization issue. If I understand the issue, you need to put the two IS NOT NULL conditions in parentheses:
SELECT directory.name,
directory.dataid,
m2.LastName,
m3.FirstName
FROM attribute2table
INNER JOIN directory
ON directory.dataid = attribute2table.id
INNER JOIN attribute1table
ON attribute1table.id = directory.dataid
LEFT OUTER JOIN (SELECT max(valstr) AS LASTNAME
FROM attribute1table
WHERE attribute1table.attrid = 2) m2
ON 1 = 1
LEFT OUTER JOIN (SELECT max(valstr) AS FIRSTNAME
FROM attribute1table
WHERE attribute1table.attrid = 3) m3
ON 1 = 1
WHERE directory.dataid = 1290 AND
stringval IS NOT NULL AND
(m2.LASTNAME IS NOT NULL OR
m3.FIRSTNAME IS NOT NULL)
I also rewrote the query using joins instead of subselects as I think it's a bit clearer.
Note also that in the M2 and M3 joins I used LEFT OUTER with a condition of 1 = 1 rather than using CROSS JOIN, because I've noticed that CROSS JOIN acts like an INNER JOIN if the query being cross-joined returns no rows - that is, it causes the entire SELECT to return no data. dbfiddle demonstrating this situation here
I'm pretty sure you just need conditional aggregation:
SELECT d.name, d.dataid,
MAX(CASE WHEN a1.attrid = 2 THEN a2.stringval END) as LastName,
MAX(CASE WHEN a1.attrid = 3 THEN a2.stringval END) as FirstName
FROM directory d JOIN
attribute2table a2
ON a2.id = d.dataid JOIN
attribute1table a1
ON a1.id = d.dataid
WHERE d.dataid = 1290
GROUP BY d.name, d.dataid
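To show what conditional aggregation does to the two half-empty rows from the question, here is a sketch in Python's sqlite3 module. It uses a hypothetical flattened table `attrvals` (one row per object and attribute) in place of the attribute1table/attribute2table join, because the real linking columns are not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE directory (dataid INTEGER, name TEXT);
-- Hypothetical flattened attribute table: one row per (object, attribute).
CREATE TABLE attrvals (dataid INTEGER, attrid INTEGER, stringval TEXT);
INSERT INTO directory VALUES (1290, 'File10');
INSERT INTO attrvals VALUES (1290, 2, 'Doe'), (1290, 3, 'Jane'), (1290, 4, NULL);
""")

# Each CASE keeps one attribute's value and yields NULL elsewhere; MAX ignores
# the NULLs, so GROUP BY folds the per-attribute rows into a single row.
rows = conn.execute("""
    SELECT d.name, d.dataid,
           MAX(CASE WHEN v.attrid = 2 THEN v.stringval END) AS LastName,
           MAX(CASE WHEN v.attrid = 3 THEN v.stringval END) AS FirstName
    FROM directory d
    JOIN attrvals v ON v.dataid = d.dataid
    WHERE d.dataid = 1290
    GROUP BY d.name, d.dataid""").fetchall()
conn.close()

print(rows)  # [('File10', 1290, 'Doe', 'Jane')]
```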

Performance Issue in Left outer join Sql server

In my project I need to find the tasks that differ between the old and new revisions in the same table.
id | task | latest_Rev
1 | A | N
1 | B | N
2 | C | Y
2 | A | Y
2 | B | Y
Expected Result:
id | task | latest_Rev
2 | C | Y
So I tried the following query:
Select nw.*
from Rev_tmp nw with (nolock)
left outer
join rev_tmp old with (nolock)
on nw.id -1 = old.id
and nw.task = old.task
and nw.latest_rev = 'y'
where old.task is null
When my table has more than 20k records, this query takes a long time.
How can I reduce the time?
My company doesn't allow subqueries.
Use the LAG function to remove the self-join:
SELECT *
FROM (SELECT *,
CASE WHEN latest_Rev = 'Y' THEN Lag(latest_Rev) OVER(partition BY task ORDER BY id) ELSE NULL END AS prev_rev
FROM Rev_tmp) a
WHERE latest_Rev = 'Y' AND prev_rev IS NULL
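Here is a sketch that checks the LAG approach against the sample data from the question. It uses Python's sqlite3 module, which assumes an SQLite build of version 3.25 or later for window functions. It shows why the outer filter must also require latest_Rev = 'Y': the ELSE branch forces prev_rev to NULL on old-revision rows, so without that filter they come back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
CREATE TABLE Rev_tmp (id INTEGER, task TEXT, latest_Rev TEXT);
INSERT INTO Rev_tmp VALUES
    (1, 'A', 'N'), (1, 'B', 'N'),
    (2, 'C', 'Y'), (2, 'A', 'Y'), (2, 'B', 'Y');
""")

LAG_QUERY = """
SELECT id, task, latest_Rev
FROM (SELECT *,
             CASE WHEN latest_Rev = 'Y'
                  THEN LAG(latest_Rev) OVER (PARTITION BY task ORDER BY id)
                  ELSE NULL END AS prev_rev
      FROM Rev_tmp) a
WHERE {extra} prev_rev IS NULL
ORDER BY id, task
"""

# Without the extra filter, old-revision rows also have prev_rev = NULL (ELSE branch).
unfiltered = conn.execute(LAG_QUERY.format(extra="")).fetchall()
# Requiring latest_Rev = 'Y' keeps only tasks that are new in the latest revision.
filtered = conn.execute(LAG_QUERY.format(extra="latest_Rev = 'Y' AND")).fetchall()
conn.close()

print(unfiltered)  # [(1, 'A', 'N'), (1, 'B', 'N'), (2, 'C', 'Y')]
print(filtered)    # [(2, 'C', 'Y')]
```

Note that the derived table is still a subquery in the FROM clause, which may or may not fall under the company rule mentioned in the question.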
My answer assumes
You can't change the indexes
You can't use subqueries
All fields are indexed separately
If you look at the query, the only value that really reduces the result set is latest_rev = 'Y'. If you were to eliminate that condition, you'd definitely get a table scan. So we want that condition to be evaluated using an index. Unfortunately, an index on a field that only holds 'Y' and 'N' is likely to be ignored because it has terrible selectivity. You might get better performance if you coax SQL Server into using it anyway. If the index on latest_rev is called idx_latest_rev, try this:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Select nw.*
from Rev_tmp nw with (index(idx_latest_rev))
left outer
join rev_tmp old
on nw.id -1 = old.id
and nw.task = old.task
where old.task is null
and nw.latest_rev = 'y'
latest_Rev should be a bit type (the boolean equivalent), which is better for performance (details here).
Maybe you can add an index on the id, task and latest_Rev columns.
You can also try this query, which replaces the LEFT OUTER JOIN with NOT EXISTS:
Select *
from Rev_tmp nw
where nw.latest_rev = 'y' and not exists
(
select * from rev_tmp old
where nw.id -1 = old.id and nw.task = old.task
)
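The NOT EXISTS version can be checked on the sample data with the following sketch in Python's sqlite3 module. Two details are assumptions. The composite index is a hypothetical name following the index suggestion above and does not affect the result. The literal is written as 'Y' because SQLite compares strings case-sensitively, unlike SQL Server's usual case-insensitive default collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rev_tmp (id INTEGER, task TEXT, latest_Rev TEXT);
-- Hypothetical composite index (optional; it only affects the query plan).
CREATE INDEX idx_rev_tmp ON Rev_tmp (id, task, latest_Rev);
INSERT INTO Rev_tmp VALUES
    (1, 'A', 'N'), (1, 'B', 'N'),
    (2, 'C', 'Y'), (2, 'A', 'Y'), (2, 'B', 'Y');
""")

# Anti-join: keep latest-revision rows with no same-task row in revision id - 1.
new_tasks = conn.execute("""
    SELECT *
    FROM Rev_tmp nw
    WHERE nw.latest_Rev = 'Y'
      AND NOT EXISTS (SELECT * FROM Rev_tmp old
                      WHERE nw.id - 1 = old.id AND nw.task = old.task)""").fetchall()
conn.close()

print(new_tasks)  # [(2, 'C', 'Y')]
```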