Nested Query Alternatives in AWS Athena - sql

I am running a query that gives a non-overlapping set of first_party_id's - ids that are associated with one third party but not another. This query does not run in Athena, however, giving the error: Correlated queries not yet supported.
Was looking at prestodb docs, https://prestodb.io/docs/current/sql/select.html (Athena is prestodb under the hood), for an alternative to nested queries. The with statement example given doesn't seem to translate well for this not in clause. Wondering what the alternative to a nested query would be - Query below.
SELECT
COUNT(DISTINCT i.third_party_id) AS uniques
FROM
db.ids i
WHERE
i.third_party_type = 'cookie_1'
AND i.first_party_id NOT IN (
SELECT
i.first_party_id
WHERE
i.third_party_id = 'cookie_2'
)

There may be a better way to do this - I would be curious to see it too! One way I can think of would be to use an outer join. (I'm not exactly sure about how your data is structured, so forgive the contrived example, but I hope it would translate ok.) How about this?
with
a as (select *
from (values
(1,'cookie_n',10,'cookie_2'),
(2,'cookie_n',11,'cookie_1'),
(3,'cookie_m',12,'cookie_1'),
(4,'cookie_m',12,'cookie_1'),
(5,'cookie_q',13,'cookie_1'),
(6,'cookie_n',13,'cookie_1'),
(7,'cookie_m',14,'cookie_3')
) as db_ids(first_party_id, first_party_type, third_party_id, third_party_type)
),
b as (select first_party_type
from a where third_party_type = 'cookie_2'),
c as (select a.third_party_id, b.first_party_type as exclude_first_party_type
from a left join b on a.first_party_type = b.first_party_type
where a.third_party_type = 'cookie_1')
select count(distinct third_party_id) from c
where exclude_first_party_type is null;
Hope this helps!

You can use an outer join:
SELECT
COUNT(DISTINCT i.third_party_id) AS uniques
FROM
db.ids a
LEFT JOIN
db.ids b
ON a.first_party_id = b.first_party_id
AND b.third_party_id = 'cookie_2'
WHERE
a.third_party_type = 'cookie_1'
AND b.third_party_id is null -- this line means we select only rows where there is no match
You should also use caution when using NOT IN for subqueries that may return NULL values since the condition will always be true. Your query is comparing a.first_party_id to NULL, which will always be false and so NOT IN will lead to the condition always being true. Nasty little gotcha.
One way to avoid this is to avoid using NOT IN or to add a condition to your subquery i.e. AND third_party_id IS NOT NULL.
See here for a longer explanation.

Related

Sql subquery issue suming subfields

I am trying to get a subquery to work but i cant get past subquery returned more than 1 value. this is not permitted. No matter how i try to re-write this query I cannot get this sum area to work. I am only fairly new to subqueries so any assistance you could provide would be great the query currently stands as
SELECT dbo.h.DateDelivery, dbo.h.DateInProduction, dbo.h.JobNumber, dbo.h.JobKeyID, dbo.h.QuantityFrames, dbo.h.QuantityGlass, dbo.h.QuantityPanels,
dbo.r.Description, dbo.r.DivisionID, dbo.r.rKeyID, (SELECT SUM(t.QtyPacks) AS packqty
FROM t INNER JOIN
h ON t.JobKeyID = h.JobKeyID
WHERE (t.StageID = 10) OR
(t.StageID = 20) OR
(t.StageID = 28)
GROUP BY t.JobKeyID)
FROM dbo.h INNER JOIN
dbo.r ON dbo.h.rKeyID = dbo.r.rKeyID
WHERE (NOT (dbo.r.rKeyID IN (1, 50, 81, 91)))
You haven't specified which DBMS you are using, however in general a scalar subquery expression must return exactly one column value from one row (see for instance the Oracle documentation).
In your code there is a GROUP BY t.JobKeyID which might cause the subquery to return more than one row.

The "where" condition worked not as expected ("or" issue)

I have a problem to join thoses 4 tables
Model of my database
I want to count the number of reservations with different sorts (user [mrbs_users.id], room [mrbs_room.room_id], area [mrbs_area.area_id]).
Howewer when I execute this query (for the user (id=1) )
SELECT count(*)
FROM mrbs_users JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id
WHERE mrbs_entry.start_time BETWEEN "145811700" and "1463985000"
or
mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and mrbs_users.id = 1
The result is the total number of reservations of every user, not just the user who has the id = 1.
So if anyone could help me.. Thanks in advance.
Use parentheses in the where clause whenever you have more than one condition. Your where is parsed as:
WHERE (mrbs_entry.start_time BETWEEN "145811700" and "1463985000" ) or
(mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and
mrbs_users.id = 1
)
Presumably, you intend:
WHERE (mrbs_entry.start_time BETWEEN 145811700 and 1463985000 or
mrbs_entry.end_time BETWEEN 1458120600 and 1463992200
) and
mrbs_users.id = 1
Also, I removed the quotes around the string constants. It is bad practice to mix data types, and in some databases, the conversion between types can make the query less efficient.
The problem you've faced caused by the incorrect condition WHERE.
So, should be:
WHERE (mrbs_entry.start_time BETWEEN 145811700 AND 1463985000 )
OR
(mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200 AND mrbs_users.id = 1)
Moreover, when you use only INNER JOIN (JOIN) then it be better to avoid WHERE clause, because the ON clause is executed before the WHERE clause, so criteria there would perform faster.
Your query in this case should be like this:
SELECT COUNT(*)
FROM mrbs_users
JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
AND
(mrbs_entry.start_time BETWEEN 145811700 AND 1463985000
OR ( mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200 AND mrbs_users.id = 1)
)
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id

SQL joins vs nested query

This two SQL statements return equal results but first one is much slower than the second:
SELECT leading.email, kstatus.name, contacts.status
FROM clients
JOIN clients_leading ON clients.id_client = clients_leading.id_client
JOIN leading ON clients_leading.id_leading = leading.id_leading
JOIN contacts ON contacts.id_k_p = clients_leading.id_group
JOIN kstatus on contacts.status = kstatus.id_kstatus
WHERE (clients.email = 'some_email' OR clients.email1 = 'some_email')
ORDER BY contacts.date DESC;
SELECT leading.email, kstatus.name, contacts.status
FROM (
SELECT *
FROM clients
WHERE (clients.email = 'some_email' OR clients.email1 = 'some_email')
)
AS clients
JOIN clients_leading ON clients.id_client = clients_leading.id_client
JOIN leading ON clients_leading.id_leading = leading.id_leading
JOIN contacts ON contacts.id_k_p = clients_leading.id_group
JOIN kstatus on contacts.status = kstatus.id_kstatus
ORDER BY contacts.date DESC;
But I'm wondering why is it so? It looks like in the firt statement joins are done first and then WHERE clause is applied and in second is just the opposite. But will it behave the same way on all DB engines (I tested it on MySQL)?
I was expecting DB engine can optimize queries like the fors one and firs apply WHERE clause and then make joins.
There are a lot of different reasons this could be (keying etc), but you can look at the explain mysql command to see how the statements are being executed. If you can run that and if it still is a mystery post it.
You always can replace join with nested query... It's always faster but lot messy...

What approach should I follow if I need to select a data 'EXCEPT' some other bunch of data?

What approach should I follow to construct my SQL query if I need to select a data exepct some other data?
For example, my
I want so select all the data from the data-base EXCEPT this result-set:
SELECT *
FROM table1
WHERE table1.MarketTYpe = 'EmergingMarkets'
AND IsBigOne = 1
AND MarketVolume = 'MIDDLE'
AND SomeClass = 'ThirdClass'
Should I use
NOT IN (the aboe result set)
Or shoudl I get INVERSE of the conditions like != inseat of = etc.
Or ?
Can you advice?
Use the EXCEPT construct?
SELECT *
FROM table1
EXCEPT
SELECT *
FROM table1
WHERE table1.MarketTYpe = 'EmergingMarkets'
AND IsBigOne = 1
AND MarketVolume = 'MIDDLE'
AND SomeClass = 'ThirdClass'
Note that EXCEPT and NOT EXISTS give the same query plan using "left anti semi joins".
NOT IN (subquery with above) may not give correct results if there are NULL values in the sub-query, hence I wouldn't use
I would avoid negation in the WHERE clause because it isn't readable straight awayAs the comments show on Michael's answer...
For more on "all rows except some rows", see these:
Combining datasets with EXCEPT versus checking on IS NULL in a LEFT JOIN
To take out those dept who has no employees assigned to it
SQL NOT IN possibly performance issues
(DBA.SE) https://dba.stackexchange.com/questions/4009/the-use-of-not-logic-in-relation-to-indexes/4010#4010
What database engine?
Minus operator in ORACLE
Except operator in SQL Server
Simplest and probably fastest here is to simply invert the conditions:
SELECT *
FROM table1
WHERE table1.MarketTYpe <> 'EmergingMarkets'
OR IsBigOne <> 1
OR MarketVolume <> 'MIDDLE'
OR SomeClass <> 'ThirdClass'
This is likely to use lots fewer resources than doing a NOT IN(). You may wish to benchmark them to be certain, but the above is likely to be faster.
Use NOT IN because that makes it clear that you want the set in the main select statement excluding the subset in the NOT IN select statement.
I like gbn's answer, but another way of doing it can be:
SELECT *
FROM table1
WHERE NOT (table1.MarketTYpe = 'EmergingMarkets'
AND IsBigOne = 1
AND MarketVolume = 'MIDDLE'
AND SomeClass = 'ThirdClass')

sql query question inner join

LEFT JOIN PatientClinics AB ON PPhy.PatientID = AB.PatientID
JOIN Clinics CL ON CL.ID = AB.ClinicID
AND COUNT(AB.ClinicID) = 1
I get error using Count(AB.ClinicID) = 1 (ClinicID has duplicate values in the table and
I want to use only 1 value of each duplicate value of ClinicId to produce result)
What mistake am I making?
I've never seen a COUNT() being used in a JOIN before. Maybe you should use:
HAVING COUNT(AB.ClinicID) = 1
instead.
Count() can't be used as a join/filter predicate. It can be used in the HAVING clause however. You should include the entire query in order to get a better example of how to rewrite it.
maybe investigate the HAVING clause instead of using COUNT where you put it.
hard to help without the full query.