SQL - Running Select Query with Having Clause

Here is the query I want to run:
SELECT COUNT(tableA.ID)
FROM tableA
NATURAL JOIN tableB
NATURAL JOIN tableC
WHERE tableB.Time IS NULL
GROUP BY tableA.ID
HAVING COUNT(tableA.ID) < tableC.Quantity
This query runs perfectly fine without the HAVING clause; the HAVING clause, however, has an error which I can't pick out.
The purpose of the HAVING clause is to return only the IDs whose count is below the quantity threshold (which is defined as tableC.Quantity).
How can I fix my HAVING clause so that the query only returns IDs whose count is less than tableC.Quantity?
Note: if you need more clarification, I can provide more.

I am going to assume that the error is something to the effect that tableC.Quantity is not in the GROUP BY clause (and that you are not using MySQL). If so, you can fix this by using an aggregation function:
SELECT COUNT(tableA.ID)
FROM tableA
NATURAL JOIN tableB
NATURAL JOIN tableC
WHERE tableB.Time IS NULL
GROUP BY tableA.ID
HAVING COUNT(tableA.ID) < MAX(tableC.Quantity);
By the way, I think NATURAL JOIN is a dangerous operation: you could add a new column to a table and invalidate all your queries, with no error message to tell you what is going wrong.
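To make that danger concrete, here is a minimal sketch of the same query with explicit join conditions; the ON columns are an assumption on my part, since the NATURAL JOIN hides which columns it actually matches on:
-- Hypothetical explicit-join version; the join keys below are assumed.
SELECT COUNT(tableA.ID)
FROM tableA
JOIN tableB ON tableB.ID = tableA.ID   -- assumed join key
JOIN tableC ON tableC.ID = tableA.ID   -- assumed join key
WHERE tableB.Time IS NULL
GROUP BY tableA.ID
HAVING COUNT(tableA.ID) < MAX(tableC.Quantity);
With explicit ON clauses, adding a column to any of the tables cannot silently change what the join matches on.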

Related

SQL subselect statement very slow on certain machines

I've got an SQL statement where I get a list of all ids from a table (Machines).
Then I need the latest instance of another row in (Events) where the ids match, so I have been doing a subselect.
I need the latest instance of quite a few fields that match the id, so I have these subselects one after another within this single statement and end up with results similar to this...
This works and the results are spot on; it's just becoming very slow, as the Events table has millions of records. The Machines table would have on average 100 records.
Is there a better solution than subselects? Maybe inner joins or a stored procedure?
Help appreciated :)
You can use APPLY. You don't specify how "latest instance" is defined; let me assume it is based on the time column:
SELECT a.id, b.*
FROM TableA a OUTER APPLY
     (SELECT TOP (1) b.Name, b.time, b.weight
      FROM b
      WHERE b.id = a.id
      ORDER BY b.time DESC
     ) b;
Both APPLY and the correlated subquery need an ORDER BY to do what you intend.
APPLY is a lot like a correlated subquery in the FROM clause -- with two convenient enhancements. A lateral join -- technically what APPLY does -- can return multiple rows and multiple columns.
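If APPLY is not available, a window function is a common alternative; here is a minimal sketch under the same assumption that "latest" means the greatest time per id:
SELECT id, Name, time, weight
FROM (SELECT b.*,
             ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.time DESC) AS rn  -- rn = 1 is the latest row per id
      FROM b
     ) ranked
WHERE rn = 1;
An index on (id, time) should help either version.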

LEFT JOIN WHERE RIGHT IS NULL for same table in Teradata SQL

I have a table with 51 records. The table structure looks something like below:
ack_extract_id query_id cnst_giftran_key field1 value1
Now ack_extract_id can be 8 or 9.
I want to find the giftran keys that are present for extract_id 9 but not for extract_id 8.
What I tried was:
SELECT *
FROM ddcoe_tbls.ack_flextable ack_flextable1
INNER JOIN ddcoe_tbls.ack_main_config config
ON ack_flextable1.ack_extract_id = config.ack_extract_id
LEFT JOIN ddcoe_tbls.ack_flextable ack_flextable2
ON ack_flextable1.cnst_giftran_key = ack_flextable2.cnst_giftran_key
WHERE ack_flextable2.cnst_giftran_key IS NULL
AND config.ack_extract_file_nm LIKE '%Dtl%'
AND ack_flextable2.ack_extract_id = 8
AND ack_flextable1.ack_extract_id = 9
But it is returning 0 records. The LEFT JOIN with WHERE right IS NULL should have returned the records for which cnst_giftran_key is not present in the right-hand side table, right?
What am I missing here?
When you test columns from the left-joined table in the where clause (ack_flextable2.ack_extract_id in your case), you force that join to behave as if it were an inner join. Instead, move that test to be part of the join condition.
Then to find records where that value is missing, test for a NULL key in the where clause.
SELECT *
FROM ddcoe_tbls.ack_flextable ack_flextable1
INNER JOIN ddcoe_tbls.ack_main_config config
ON ack_flextable1.ack_extract_id = config.ack_extract_id
LEFT JOIN ddcoe_tbls.ack_flextable ack_flextable2
ON ack_flextable1.cnst_giftran_key = ack_flextable2.cnst_giftran_key
AND ack_flextable2.ack_extract_id = 8
WHERE ack_flextable2.cnst_giftran_key IS NULL
AND config.ack_extract_file_nm LIKE '%Dtl%'
AND ack_flextable1.ack_extract_id = 9
THIS IS NO ANSWER, JUST AN EXPLANATION
From your comment to Joe Stefanelli's answer I gather that you don't fully understand the issue with WHERE and ON in an outer join. So let's look at an example.
We are looking for all suppliers' last orders, i.e. the order records for which there is no newer order from the same supplier.
select *
from order
where not exists
(
  select *
  from order newer
  where newer.supplier = order.supplier
  and newer.orderdate > order.orderdate
);
This is straightforward; the query matches what we just put in words: find orders for which there does NOT EXIST a newer order for the same supplier.
The same query with the anti-join pattern:
select order.*
from order
left join order newer on newer.supplier = order.supplier
                     and newer.orderdate > order.orderdate
where newer.id is null;
Here we join every order with all of its newer orders, thus probably creating a huge intermediate result. With the left outer join we make sure we get a dummy record attached when there is no newer order for the supplier. Then at last we scan the intermediate result with the WHERE clause, keeping only records where the attached record has a NULL id. Well, the id is obviously the table's primary key and can never be NULL, so what we keep here are only the outer-joined results where the newer data is just a dummy record containing NULLs. Thus we get exactly the orders for which no newer order exists.
Talking about a huge intermediate result: How can this be faster than the first query? Well, it shouldn't. The first query should actually either run equally fast or faster. A good DBMS will see through this and make the same execution plan for both queries. A rather young DBMS however may really execute the anti join quicker. That is because the developers put so much effort into join techniques, as these are needed in about every query, and didn't yet care about IN and EXISTS that much. In such a case one may run into performance issues with NOT IN or NOT EXISTS and use the anti-join pattern instead.
Now as to the WHERE / ON problem:
select order.*
from order
left join order newer on newer.orderdate > order.orderdate
where newer.supplier = order.supplier
and newer.id is null;
This looks almost the same as before, but some of the criteria have moved from ON to WHERE. This means the outer join gets different criteria. Here is what happens: for every order, find all newer orders -- no matter which supplier! So it is the orders of the last order date that get an outer-join dummy record. But then in the WHERE clause we remove all pairs where the supplier doesn't match. Notice that the outer-joined records contain NULL for newer.supplier, so newer.supplier = order.supplier is never true for them; they get removed. But then, if we remove all outer-joined records, we get exactly the same result as with a vanilla inner join. When we put outer join criteria in the WHERE clause, we turn the outer join into an inner join. So the query can be re-written as
select order.*
from order
inner join order newer on newer.orderdate > order.orderdate
where newer.supplier = order.supplier
and newer.id is null;
And with tables in FROM and INNER JOIN it doesn't matter whether the criteria is in ON or WHERE; it's rather a matter of readability, because both criteria will equally get applied.
Now we see that newer.id is null can never be true. The final result will be empty -- which is exactly what happened with your query.
You can try this query (note that cnst_giftran_key and ack_extract_id live in ack_flextable, not ack_main_config):
select * from ddcoe_tbls.ack_flextable
where cnst_giftran_key not in
(
  select cnst_giftran_key from ddcoe_tbls.ack_flextable
  where ack_extract_id = 8
)
and ack_extract_id = 9;
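A caveat worth adding: NOT IN returns no rows at all if the subquery yields even a single NULL cnst_giftran_key. A NOT EXISTS version of the same check (the f1/f2 aliases are mine) is immune to that:
select f1.*
from ddcoe_tbls.ack_flextable f1
where f1.ack_extract_id = 9
and not exists
(
  select 1
  from ddcoe_tbls.ack_flextable f2
  where f2.ack_extract_id = 8
  and f2.cnst_giftran_key = f1.cnst_giftran_key
);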

Oracle subquery with MAX operator taking very long

I have a web application that was using a very complex database view to retrieve some data, and it appeared to be very slow, taking up to 3 minutes to complete. After a thorough investigation I found the cause of the problem.
In my code I was using the following condition in the WHERE clause to retrieve only the LAST element of a joined table:
SELECT ...
FROM MY_TABLE
JOIN TABLE_JOIN_XX tableA on ....
..lots of other joins ...
WHERE
tableA.id =
(SELECT MAX(id) FROM TABLE_JOIN_XX tableB WHERE tableA.id_parent = tableB.id_parent)
I have then changed the condition in the following way:
tableA.id >= ALL
(SELECT id FROM TABLE_JOIN_XX tableB WHERE tableA.id_parent = tableB.id_parent)
and now the query takes only a couple of seconds.
Now I'm wondering why there is this huge difference in execution time between the MAX operator and the ALL operator. I am quite surprised indeed. I am no DBA and not very expert in query optimization, but maybe there is something I don't know that I should take into consideration while developing my queries for database access.
Or maybe it is something related to a problem in that specific Oracle instance and not to the query? I've never noticed this problem in other instances of the same database.
Looking at the explain plan I've noticed that in the second case (and not in the first one) Oracle replaces the ALL operator with a NOT EXISTS:
not exists
(select 0 from TABLE_JOIN_XX tableA
 where tableA.id_parent = :b1 and LNNVL(id <= :b2))
Any suggestions?
Many thanks.
Your query seems malformed. This is your statement:
WHERE tableA.id = (SELECT MAX(id) FROM TABLE_JOIN_XX tableB WHERE tableA.id = tableB.id)
You are doing a correlated subquery on the column id. Then you are choosing the maximum value. The subquery can only return tableA.id or NULL, so this is equivalent to:
WHERE EXISTS (SELECT 1 FROM TABLE_JOIN_XX tableB WHERE tableA.id = tableB.id)
Perhaps Oracle is getting a bit confused. In any case, by using MAX(), you are saying that all the values need to be processed, so Oracle is probably doing that. In fact, it only needs to find one row with a value.
An index on TABLE_JOIN_XX(id) should help this query even more.
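For completeness, another common rewrite for "latest row per parent" in Oracle is an analytic function, which avoids the correlated subquery entirely; this is only a sketch, assuming TABLE_JOIN_XX really has the id and id_parent columns used above:
SELECT *
FROM (SELECT t.*,
             MAX(id) OVER (PARTITION BY id_parent) AS max_id  -- greatest id per parent, computed in one pass
      FROM TABLE_JOIN_XX t)
WHERE id = max_id;
Whether this beats the >= ALL form depends on the optimizer and the data, so it is worth checking the explain plan for both.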

How does the subselect query in the CASE clause get the parameter value of the main query

I have two tables that both have id columns, but TableA.id is char and TableB.id is int. Now I want to join the two tables, but the problem is that some strings in TableA.id can't be converted to int. Here is the query I wrote:
SELECT
  CASE
    WHEN Column1 IS NULL
    THEN (SELECT Surname
          FROM TableB
          WHERE TableA.id = TableB.id)
    ELSE Column1
  END
FROM TableA
GO
The subselect returns a bunch of records, so my question is: is it possible to run that subquery with the current row's TableA.id? I am not sure if I explained this clearly -- how does the subquery get the main query's current TableA.id value? Thanks
I'm not sure I'm following you, but it sounds like the crux of the problem is that you are trying to join on ID, but they are different field types. Perhaps something like:
SELECT
  COALESCE(TableA.Column1, TableB.Surname)
FROM TableA
LEFT OUTER JOIN TableB ON
  TableA.ID = CAST(TableB.ID AS CHAR(64))
I was just taking a guess at the CHAR size, but I assume that's ample. Also, I'm not sure what DB you are working with, so the syntax may need a bit of tweaking.
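The GO batch separator suggests SQL Server; if so (2012 or later), TRY_CAST can convert in the other direction and simply yields NULL for values that don't convert -- a sketch, with the reversed cast direction being my assumption:
SELECT
  COALESCE(TableA.Column1, TableB.Surname)
FROM TableA
LEFT OUTER JOIN TableB ON
  TRY_CAST(TableA.ID AS int) = TableB.ID  -- non-numeric ids become NULL and simply fail to match
GO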
There is a feature that can do that. It's called a correlated subquery, and it does work inside CASE expressions, provided the subquery returns at most one row -- which is exactly what your original query is doing.

SQL Method of checking that INNER / LEFT join doesn't duplicate rows

Is there a good or standard SQL method of asserting that a join does not duplicate any rows (i.e. produces 0 or 1 copies of each source table row)? Assert as in causing the query to fail or otherwise indicating that there are duplicate rows.
A common problem in a lot of queries is when a table is expected to be 1:1 with another table, but there might exist 2 rows that match the join criteria. This can cause errors that are hard to track down, especially for people not necessarily entirely familiar with the tables.
It seems like there should be something simple and elegant - this would be very easy for the SQL engine to detect (have I already joined this source row to a row in the other table? ok, error out) but I can't seem to find anything on this. I'm aware that there are long / intrusive solutions to this problem, but for many ad hoc queries those just aren't very fun to work out.
EDIT / CLARIFICATION: I'm looking for a one-step query-level fix. Not a verification step on the results of that query.
If you are only testing for linked rows rather than requiring output, then you'd use EXISTS.
More correctly, you need a "semi-join", but most RDBMSs don't support this directly except via EXISTS:
SELECT a.*
FROM TableA a
WHERE EXISTS (SELECT * FROM TableB b WHERE a.id = b.id)
Also see:
Using 'IN' with a sub-query in SQL Statements
EXISTS vs JOIN and use of EXISTS clause
SELECT JoinField
FROM MyJoinTable
GROUP BY JoinField
HAVING COUNT(*) > 1
LIMIT 1
Is that simple enough? Don't have Postgres but I think it's valid syntax.
Something along the lines of
SELECT a.id, COUNT(b.id)
FROM TableA a
JOIN TableB b ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.id) > 1
Should return rows in TableA that have more than one associated row in TableB.
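If the check really has to happen inside the query itself rather than as a separate verification step, one option is a window function that counts the copies of each source row inline; this is a sketch, with the copies alias being mine and a.id assumed to be TableA's key:
SELECT a.*, b.*,
       COUNT(*) OVER (PARTITION BY a.id) AS copies  -- 1 means this source row was not duplicated by the join
FROM TableA a
LEFT JOIN TableB b ON a.id = b.id;
To make duplicates fail hard rather than just be flagged, wrap this in an outer query and divide by zero (or raise an error) whenever copies > 1.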