Matching multiple columns in a nested query - sql

I'm trying to make a nested query, where I return values from multiple columns and match those with values from another column. Something similar to the code below (this is for work so I can't put the exact code here)
select *
from store1
where (orderNum, customer, total) not in (
select orderNum, customer, total from store2)
When I try to run the code, I get the error:
An expression of non-boolean type specified in a context where a condition is expected, near ','.
Is there a way to do this in SQL Server? I know using a join is an option, but I'd prefer to avoid that at this time. Thanks for any help!

SQL Server doesn't support row value constructors in an IN predicate like that, but there are lots of other ways to do it. My preferred approach is NOT EXISTS:
SELECT * FROM dbo.store1
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.store2
WHERE store1.orderNum = store2.orderNum
AND store1.customer = store2.customer
AND store1.total = store2.total
);
For other approaches, see:
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?

A correlated count(*) returns a single value, is usually fast, and is easy to understand:
select *
from store1
where
(select count(*)
from store2
where store2.orderNum = store1.orderNum
and store2.customer = store1.customer
and store2.total = store1.total) = 0
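To check that the NOT EXISTS and count(*) = 0 forms above agree, here is a minimal sketch that runs both against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store1 (orderNum INTEGER, customer TEXT, total REAL);
    CREATE TABLE store2 (orderNum INTEGER, customer TEXT, total REAL);
    -- Hypothetical sample data: only order 1 matches on all three columns.
    INSERT INTO store1 VALUES (1, 'ann', 10.0), (2, 'bob', 20.0), (3, 'cy', 30.0);
    INSERT INTO store2 VALUES (1, 'ann', 10.0), (2, 'bob', 99.0);
""")

# Anti-join written with NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT orderNum, customer, total FROM store1
    WHERE NOT EXISTS (
        SELECT 1 FROM store2
        WHERE store1.orderNum = store2.orderNum
          AND store1.customer = store2.customer
          AND store1.total = store2.total
    )
    ORDER BY orderNum
""").fetchall()

# Same anti-join written with a correlated count(*) = 0.
count_rows = conn.execute("""
    SELECT orderNum, customer, total FROM store1
    WHERE (SELECT count(*) FROM store2
           WHERE store2.orderNum = store1.orderNum
             AND store2.customer = store1.customer
             AND store2.total = store1.total) = 0
    ORDER BY orderNum
""").fetchall()

print(not_exists_rows)  # [(2, 'bob', 20.0), (3, 'cy', 30.0)]
print(count_rows)       # [(2, 'bob', 20.0), (3, 'cy', 30.0)]
```

Both forms return the rows of store1 that have no full three-column match in store2. Execution plans and performance on SQL Server may differ between the two, which this sketch does not measure.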

Related

Athena subquery not working - incorrect table name

I'm trying to write an Athena query that uses a subquery in the WHERE clause because I want to place a restriction on an array-type field. I don't want to do a CROSS JOIN UNNEST since I don't want to flatten each row.
Example query:
SELECT
foo.some_scalar_field,
foo.some_other_scalar_field
FROM "fake_db"."table_name"
WHERE EXISTS (SELECT NULL FROM foo.some_array_field as T(item) WHERE item = "THING")
When I do this, I get an error saying that "foo" isn't a valid table name. I'm a bit new to Athena; however, a SQL engine I used previously supported this kind of query. Is this not possible in Athena?
Edit:
I went ahead and added an unnest:
SELECT
foo.some_scalar_field,
foo.some_other_scalar_field
FROM "fake_db"."table_name"
WHERE EXISTS (SELECT NULL FROM UNNEST(foo.some_array_field) as T(item) WHERE item = "THING")
Now I get an error that the correlated subquery here is not supported. It seems like Athena doesn't support correlated subqueries?
There is no need for UNNEST or correlated subqueries; just use array functions. Something along these lines:
SELECT
foo.some_scalar_field,
foo.some_other_scalar_field
FROM "fake_db"."table_name" AS foo
WHERE any_match(foo.some_array_field, item -> item = 'THING')

Oracle - Inner query takes time

I have two queries:
select * from PRE_DETAIL_REPORT a where item = (select item from apple_skus);
select * from PRE_DETAIL_REPORT a where item IN ('100299122');
The table APPLE_SKUS has only one item: 100299122.
When I run the first query, it takes 2 minutes to execute.
When I run the second query, it takes 3 seconds.
What can be the reason?
You can rewrite it this way:
select a.* from PRE_DETAIL_REPORT a
join apple_skus t on t.item = a.item;
It's down to how a SQL query is processed.
Your second query selects on a hardcoded value, whereas the first specifies a subquery, which has its own FROM clause and then SELECT to evaluate. Querying a table takes more time than comparing against a hardcoded value, even when the table holds a single record.
You could try EXISTS, which uses a correlated subquery and may be faster:
select * from PRE_DETAIL_REPORT t1
where exists (select 1 from apple_skus s
where s.item = t1.item)
It's very likely that the difference is due to a different access path to PRE_DETAIL_REPORT, and as someone mentioned earlier, an explain plan (or SQL Monitor report) will tell you the answer.
But until you provide the diagnostic, this is just a guess…

Different results from using IN and EXISTS

I have a query that has been giving me fits. Basically I want a left outer join, but without using a join.
I started off using IN and got back about 13,000 rows. If I use EXISTS, I then get about 11,000 rows. Even if I use GROUP BY to make sure duplicates aren't counted, there's still a difference.
Here's some code.
This one uses EXISTS:
SELECT upper(EMAIL_ADDRESS)
FROM DATA.CRM_CONTACTS
WHERE EXISTS
(
SELECT upper(Email_address)
FROM DATA.MMBI
WHERE DATA.CRM_CONTACTS.Email_address = DATA.MMBI.Email_Address
)
group by 1
order by 1
And this is code that uses IN:
SELECT upper(EMAIL_ADDRESS)
FROM DATA.CRM_CONTACTS
WHERE upper(EMAIL_ADDRESS) IN
(
SELECT upper(Email_address)
FROM DATA.MMBI
)
group by 1
order by 1
Is there any reason that would explain why I'm getting different results?
Assuming that you're using SQL Server:
In your IN case, you're making a case-insensitive comparison by uppercasing both values being compared:
WHERE upper(EMAIL_ADDRESS) IN
(
SELECT upper(Email_address)
FROM DATA.MMBI
)
In your EXISTS case, the join criterion for the correlated subquery is this:
WHERE DATA.CRM_CONTACTS.Email_address = DATA.MMBI.Email_Address
That means it's going to use the collation in play to make the comparison, which might be case-sensitive.
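The effect can be reproduced with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in engine, because SQLite's default text comparison is case-sensitive; the table names are simplified and the sample addresses are invented for illustration.

```python
import sqlite3

# SQLite's default BINARY collation compares text case-sensitively,
# which stands in for a case-sensitive collation on the real server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_contacts (email_address TEXT);
    CREATE TABLE mmbi (email_address TEXT);
    -- Hypothetical data: the first address differs only by case.
    INSERT INTO crm_contacts VALUES ('Ann@Example.com'), ('bob@example.com');
    INSERT INTO mmbi VALUES ('ann@example.com'), ('bob@example.com');
""")

# IN version: both sides are uppercased, so case differences are ignored.
in_rows = conn.execute("""
    SELECT upper(email_address) FROM crm_contacts
    WHERE upper(email_address) IN (SELECT upper(email_address) FROM mmbi)
    GROUP BY 1 ORDER BY 1
""").fetchall()

# EXISTS version: the correlation compares the raw column values.
exists_rows = conn.execute("""
    SELECT upper(c.email_address) FROM crm_contacts AS c
    WHERE EXISTS (SELECT 1 FROM mmbi AS m
                  WHERE c.email_address = m.email_address)
    GROUP BY 1 ORDER BY 1
""").fetchall()

print(in_rows)      # [('ANN@EXAMPLE.COM',), ('BOB@EXAMPLE.COM',)]
print(exists_rows)  # [('BOB@EXAMPLE.COM',)]
```

Wrapping both sides of the EXISTS correlation in upper() should bring the two counts back in line, assuming case is the only difference in the data.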

Using the DISTINCT keyword causes this error: not a SELECTed expression

I have a query that looks something like this:
SELECT DISTINCT share.rooms
FROM Shares share
left join share.rooms.buildingAdditions.buildings.buildingInfoses as bi
... //where clause omitted
ORDER BY share.rooms.floors.floorOrder, share.rooms.roomNumber,
share.rooms.firstEffectiveAt, share.shareNumber, share.sharePercent
Which results in the following exception:
Caused by: org.hibernate.exception.SQLGrammarException: ORA-01791: not a SELECTed expression
If I remove the DISTINCT keyword, the query runs without issue. If I remove the order by clause, the query runs without issue. Unfortunately, I can't seem to get the ordered result set without duplicates.
You are trying to order your result by columns that are not being selected. This wouldn't be a problem without the DISTINCT, but since your query is effectively grouping by the share.rooms column alone, how can it order that result set by other columns that can have multiple values for the same share.rooms value?
This post is a little old, but one thing I did to get around this error was to wrap the query and apply the ORDER BY on the outside, like so:
SELECT COL
FROM (
SELECT DISTINCT COL, ORDER_BY_COL
FROM TABLE
-- ADD JOINS, WHERE CLAUSES, ETC.
)
ORDER BY ORDER_BY_COL;
Hope this helps :)
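One thing to watch with the wrapped query: DISTINCT is applied to the (COL, ORDER_BY_COL) pair, so a COL value that occurs with several ORDER_BY_COL values still comes back more than once. A minimal sketch, using SQLite through Python's sqlite3 module as a stand-in for Oracle, with a hypothetical table t and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col TEXT, order_by_col INTEGER);
    -- Hypothetical data: 'a' repeats with the same sort key,
    -- 'c' appears with two different sort keys.
    INSERT INTO t VALUES ('a', 2), ('a', 2), ('b', 1), ('c', 3), ('c', 0);
""")

# Inner query de-duplicates (col, order_by_col) pairs;
# outer query sorts by the column that is no longer selected.
wrapped = [row[0] for row in conn.execute("""
    SELECT col
    FROM (SELECT DISTINCT col, order_by_col FROM t)
    ORDER BY order_by_col
""")]

print(wrapped)  # ['c', 'b', 'a', 'c']
```

The exact duplicate ('a', 2) is collapsed, but 'c' is returned twice because it has two distinct sort keys. If each COL must appear once, the sort key has to be reduced to one value per COL first, for example with GROUP BY and MIN.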
When using DISTINCT in a query that has an ORDER BY, you must select all the columns you've used in the ORDER BY clause:
SELECT DISTINCT share.rooms.floors.floorOrder, share.rooms.roomNumber,
share.rooms.firstEffectiveAt, share.shareNumber, share.sharePercent
...
ORDER BY share.rooms.floors.floorOrder, share.rooms.roomNumber,
share.rooms.firstEffectiveAt, share.shareNumber, share.sharePercent

What do you put in a subquery's Select part when it's preceded by Exists?

What do you put in a subquery's Select part when it's preceded by Exists?
Select *
From some_table
Where Exists (Select 1
From some_other_table
Where some_condition )
I usually use 1, I used to put * but realized it could add some useless overhead.
What do you put? Is there a more efficient way than putting 1 or any other dummy value?
I think the efficiency depends on your platform.
In Oracle, SELECT * and SELECT 1 within an EXISTS clause generate identical explain plans, with identical memory costs. There is no difference. However, other platforms may vary.
As a matter of personal preference, I use
SELECT *
Because SELECTing a specific field could mislead a reader into thinking that I care about that specific field, and it also lets me copy / paste that subquery out and run it unmodified, to look at the output.
However, an EXISTS clause in a SQL statement is a bit of a code smell, IMO. There are times when they are the best and clearest way to get what you want, but they can almost always be expressed as a join, which will be a lot easier for the database engine to optimize.
SELECT *
FROM SOME_TABLE ST
WHERE EXISTS(
SELECT 1
FROM SOME_OTHER_TABLE SOT
WHERE SOT.KEY_VALUE1 = ST.KEY_VALUE1
AND SOT.KEY_VALUE2 = ST.KEY_VALUE2
)
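This is logically equivalent to the following join only when (KEY_VALUE1, KEY_VALUE2) is unique in SOME_OTHER_TABLE; otherwise the join returns one row per match, and with SELECT * it also returns the SOT columns: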
SELECT *
FROM
SOME_TABLE ST
INNER JOIN
SOME_OTHER_TABLE SOT
ON ST.KEY_VALUE1 = SOT.KEY_VALUE1
AND ST.KEY_VALUE2 = SOT.KEY_VALUE2
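One caveat with the join rewrite above: when SOME_OTHER_TABLE contains the same key pair more than once, EXISTS and INNER JOIN return different row counts. A minimal sketch, using SQLite through Python's sqlite3 module as a stand-in engine; the id column and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (id INTEGER, key_value1 INTEGER, key_value2 INTEGER);
    CREATE TABLE some_other_table (key_value1 INTEGER, key_value2 INTEGER);
    INSERT INTO some_table VALUES (1, 10, 100), (2, 20, 200);
    -- The same key pair appears twice in the other table.
    INSERT INTO some_other_table VALUES (10, 100), (10, 100);
""")

# EXISTS is a semi-join: each outer row is returned at most once.
exists_ids = [row[0] for row in conn.execute("""
    SELECT st.id FROM some_table st
    WHERE EXISTS (SELECT 1 FROM some_other_table sot
                  WHERE sot.key_value1 = st.key_value1
                    AND sot.key_value2 = st.key_value2)
""")]

# INNER JOIN returns one row per matching pair of rows.
join_ids = [row[0] for row in conn.execute("""
    SELECT st.id FROM some_table st
    INNER JOIN some_other_table sot
        ON st.key_value1 = sot.key_value1
       AND st.key_value2 = sot.key_value2
""")]

print(exists_ids)  # [1]
print(join_ids)    # [1, 1]
```

With a unique key on the joined columns the two forms return the same rows of SOME_TABLE; without one, the join needs a DISTINCT or similar to match the EXISTS result.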
I also use 1. I've seen some devs who use NULL. I think 1 is efficient compared to selecting an actual field, as the query won't have to fetch the value from its physical location when it executes the SELECT clause of the subquery.
Use:
WHERE EXISTS (SELECT NULL
FROM some_other_table
WHERE ... )
EXISTS returns true if the subquery returns one or more rows; it doesn't matter which columns, if any, are actually returned in the SELECT clause. NULL just makes it explicit that no value is being compared, while 1 etc. could be mistaken for a valid value previously used in an IN clause.