How to debug in the middle of a SELECT query? - sql

I'm debugging a SELECT query with correlated subquery that isn't returning the right results. The form of my query is, schematically
SELECT id, cost, country
FROM table AS table1
WHERE table1.cost =
(
SELECT MAX(table2.cost)
FROM table as table2
WHERE table2.country = table1.country
)
I'd like to be able to pause mid-execution and see the values being compared in the WHERE statement. However a PRINT statement seems not to run if I put it as the first line of the subquery, for example. I am new to SQL and would be curious what a more experienced person considers to be the most time-efficient way to debug in such a situation.

SQL does not support what you want want to do.
It is very important to remember that SQL is a descriptive language not a procedural language. A SQL query describes the result set being produced. It does not specify the specific actions being taken.
If you want to know what the result of the subquery is, then run that as a separate query.

Related

Postgres all subqueries in coalesce executed

COALESCE in Postgres is a function that returns the first parameter not null.
So I used coalesce in subqueries like:
SELECT COALESCE (
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...)
);
I change the where in any query and they contain lots of params and CASE, also different ORDER BY clauses.
This is because I always want to return something but giving priorities.
What I noticed while issuing EXPLAIN ANALYZE is that any query is executed despite the first one actually returns NOT a null value.
I would expect the engine to run only the first one query and not the following ones if it returns not null.
This way I could have a bad performance.
So am I doing any bad practice and is it better to run the queries separately for performance reason?
EDIT:
Sorry you where right I don’t select * but I select only one column. I didn’t post my code because I am not interested in my query but it’s a generic question to understand how the engine is working. So I reproduce a very simple fiddle here http://sqlfiddle.com/#!17/a8aa7/4
I may be wrong but I think it behaves as I was telling: it runs all the subqueries despite the first one already returns a not null value
EDIT 2: ok I read only now it says never executed. So the other two queries aren’t getting executed. What confused me was the fact they were included in the query plan.
Anyways it’s still important for my question. Is it better to run all the queries separately for performance reasons? Because it seems like that even if the first one returns a not null value the other two subqueries can slow down the performance
For separate SELECT queries, I suggest to use UNION ALL / LIMIT 1 instead. Based on your fiddle:
(select user_id from users order by age limit 1) -- misleading example, see below
UNION ALL
(select user_id from users where user_id=1)
UNION ALL
(select user_id from users order by user_id DESC limit 1)
LIMIT 1;
db<>fiddle here
For three reasons:
Works for any SELECT list: single expressions (your fiddle), multiple or whole row (your example in the question).
You can distinguish actual NULL values from "no row". Since user_id is the PK in the example (and hence, NOT NULL), the problem cannot surface in the example. But with an expression that can be NULL, COALESCE cannot distinguish between both, "no row" is coerced to NULL for the purpose of the query. See:
Return a value if no record is found
Faster.
Aside, your first SELECT in the example makes this a wild-goose chase. It returns a row if there is at least one. The rest is noise in this case.
Related:
PostgreSQL combine multiple select statements
SQL - does order of OR conditions matter?
Way to try multiple SELECTs till a result is available?

SQL Server performance on concat method

I need to write a query to fetch some data from table and need to use concat method. I end up preparing two different queries and I'm not sure which one will have better performance when we have huge amounts of data.
Please help me understand which query will perform better, and why.
I need to concat twice, one for displaying and one for where condition:
select
id, concat(boolean_value, long_value, string_value, double_value) as Value
from
table
where
concat(boolean_value, long_value, string_value, double_value) = 'XX'
Write the above query as a subquery and add the where condition
Select *
from
(select
id, concat(boolean_value, long_value, string_value, double_value) as Value
from
table) as output
where
Value = 'XX'
Note: this is sample query and the actual query will have multiple joins and the need to concat from multiple columns of different tables
SQL databases represent the result set being produced. You seem to be asking if there is common expression elimination -- that is, will the concat() be performed exactly once.
You have a better chance of this with the subquery than with the version that repeat the expression. But SQL Server could be smart enough to only evaluate it once.
If you want to guarantee single evaluation, then I think cross apply does that:
select t.id, v.value
from table t cross apply
(values (concat( boolean_value, long_value, string_value, double_value))
) as Value
where v.value = 'XX';
I would, however, question why you are comparing the result of the concat() rather than the base columns. Comparing the base columns would allow the optimizer to take advantage of indexes, partitions, and statistics.
You may run EXPLAIN on both queries to see what exactly is on the mind of SQL Server. I would expect that the first version would be preferable, because, unlike the second version, it does not force SQL Server to materialize an intermediate table. Regarding whether or not SQL Server would have to actually evaluate CONCAT twice in the first version, it may not even matter if the cost of the subquery be very high.

Does this query include a correlated or non correlated subquery?

So I've written a simple query that gives me the ID #s for properties that show up only once in the property_usage table, along with the code for their associated usage type. Since I didn't want to include a column that shows the count of how many times each property ID shows up in the property_usage table, I wrote two subqueries to get a list of all the property IDs that only show up once. I then use the result of those subqueries (a single column of propertyIDs) to filter out those properties that show up more than once in the table.
here's the query:
select pu.property_id, pu.usage_type_id
from acres_final_40.property_usage pu
where pu.property_id not in
(select multiple_use_properties
from
(select pu.property_id multiple_use_properties, count(pu.property_id)
from acres_final_40.property_usage pu
group by pu.property_id having count(pu.property_id) > 1))
order by pu.property_id;
My question is: is that innermost subquery correlated or noncorrelated with the outermost query?
I have the following thoughts (see the paragraph below), but I'd like to know for sure whether I'm right about this. I'm learning all this stuff on my own and don't have anyone I can ask about this in person!
My feeling is that it's not, because it seems like the pu.propertyID column from the outermost query isn't a value that's passed into the innermost query. It seems like the innermost query may technically be a derived table, in which case my code is sloppy because I don't alias the table name in the FROM clause of that SELECT statement.
In a SQL database query, a correlated subquery (also known as a synchronized subquery) is a subquery (a query nested inside another query) that uses values from the outer query.
(Wikipedia, emph. mine.)
Yours does not, so it's not a correlated subquery. Basically, if you can cut away the subquery and run it as an independent query, without the outer context, it's definitely uncorrelated. It can be done in your case.
BTW you could probably rewrite it using a not exists clause to check if another record with the same PK but another property_id exists, and get a better query plan than using count(). This is my speculation, though; only an explain plan would show if there's a benefit.

Using Order By in IN condition

Let's say we have this silly query:
Select * From Emp Where Id In (Select Id From Emp)
Let's do a small modification inside IN condition by adding an Order By clause.
Select * From Mail Where Id In (Select Id From Mail Order By Id)
Then we are getting the following error:
ORA-00907: missing right parenthesis
Oracle assumes that IN condition will end after declaring the From table. As a result waits for right parenthesis, but we give an Order By instead.
Why can't we add an Order By inside IN condition?
FYI: I don't ask for a equivalent query. This is an example after all. I just can't understand why this error occurs.
Consider the condition x in (select something from somewhere). It returns true if x is equal to any of the somethings returned from the query - regardless of whether it's the first, second, last, or anything in the middle. The order that the somethings are returned is inconsequential. Adding an order by clause to a query often comes with a hefty performance hit, so I'm guessing this this limitation was introduced to prevent adding a clause that has no impact on the query's correctness on the one hand and may be quite expensive on the other.
It would not make sense to sort the values inside the IN clause. Think in this way:
IN (a, b, c)
is same as
IN (c, b, a)
IS SAME AS
IN (b, c, a)
Internally Oracle applies OR condition, so it is equivalent to:
WHERE id = a OR id = b OR id = c
Would it make any sense to order the conditions?
Ordering comes at it's own expense. So, when there is no need to sort, just don't do it. And in this case, Oracle applied the same rule.
When it comes to performance of the query, optimizer needs choose the best possible execution plan i.e. with the least cost to achieve the desired output. ORDER BY is useless in this case, and Oracle did a good job to prevent using it at all.
For the documentation,
Type of Condition Operation
----------------- ----------
IN Equal-to-any-member-of test. Equivalent to =ANY.
So, when you need to test ANY value for membership in a list of values, there is no need of ordered list, just a random matching does the job.
If you look at Oracle SQL reference (syntax diagrams) you will find a reason. ORDER BY is part of "select" statement, while IN clause uses lover level "subquery" statement. Your problem relates to nature of the Oracle's SQL grammar.
PS: there might be more gotchas like multiple UNION, MINUS on "subqueries" and then also you can use ONLY one ORDER BY clause, as this applies only onto result of UNION operation.
This will fail too:
select * from dual order by 1
union all
select * from dual order by 1;

Use SELECT inside an UPDATE query

How can I UPDATE a field of a table with the result of a SELECT query in Microsoft Access 2007.
Here's the Select Query:
SELECT Min(TAX.Tax_Code) AS MinOfTax_Code
FROM TAX, FUNCTIONS
WHERE (((FUNCTIONS.Func_Pure)<=[Tax_ToPrice]) AND ((FUNCTIONS.Func_Year)=[Tax_Year]))
GROUP BY FUNCTIONS.Func_ID;
And here's the Update Query:
UPDATE FUNCTIONS
SET FUNCTIONS.Func_TaxRef = [Result of Select query]
Well, it looks like Access can't do aggregates in UPDATE queries. But it can do aggregates in SELECT queries. So create a query with a definition like:
SELECT func_id, min(tax_code) as MinOfTax_Code
FROM Functions
INNER JOIN Tax
ON (Functions.Func_Year = Tax.Tax_Year)
AND (Functions.Func_Pure <= Tax.Tax_ToPrice)
GROUP BY Func_Id
And save it as YourQuery. Now we have to work around another Access restriction. UPDATE queries can't operate on queries, but they can operate on multiple tables. So let's turn the query into a table with a Make Table query:
SELECT YourQuery.*
INTO MinOfTax_Code
FROM YourQuery
This stores the content of the view in a table called MinOfTax_Code. Now you can do an UPDATE query:
UPDATE MinOfTax_Code
INNER JOIN Functions ON MinOfTax_Code.func_id = Functions.Func_ID
SET Functions.Func_TaxRef = [MinOfTax_Code].[MinOfTax_Code]
Doing SQL in Access is a bit of a stretch, I'd look into Sql Server Express Edition for your project!
I wrote about some of the limitations of correlated subqueries in Access/JET SQL a while back, and noted the syntax for joining multiple tables for SQL UPDATEs. Based on that info and some quick testing, I don't believe there's any way to do what you want with Access/JET in a single SQL UPDATE statement. If you could, the statement would read something like this:
UPDATE FUNCTIONS A
INNER JOIN (
SELECT AA.Func_ID, Min(BB.Tax_Code) AS MinOfTax_Code
FROM TAX BB, FUNCTIONS AA
WHERE AA.Func_Pure<=BB.Tax_ToPrice AND AA.Func_Year= BB.Tax_Year
GROUP BY AA.Func_ID
) B
ON B.Func_ID = A.Func_ID
SET A.Func_TaxRef = B.MinOfTax_Code
Alternatively, Access/JET will sometimes let you get away with saving a subquery as a separate query and then joining it in the UPDATE statement in a more traditional way. So, for instance, if we saved the SELECT subquery above as a separate query named FUNCTIONS_TAX, then the UPDATE statement would be:
UPDATE FUNCTIONS
INNER JOIN FUNCTIONS_TAX
ON FUNCTIONS.Func_ID = FUNCTIONS_TAX.Func_ID
SET FUNCTIONS.Func_TaxRef = FUNCTIONS_TAX.MinOfTax_Code
However, this still doesn't work.
I believe the only way you will make this work is to move the selection and aggregation of the minimum Tax_Code value out-of-band. You could do this with a VBA function, or more easily using the Access DLookup function. Save the GROUP BY subquery above to a separate query named FUNCTIONS_TAX and rewrite the UPDATE statement as:
UPDATE FUNCTIONS
SET Func_TaxRef = DLookup(
"MinOfTax_Code",
"FUNCTIONS_TAX",
"Func_ID = '" & Func_ID & "'"
)
Note that the DLookup function prevents this query from being used outside of Access, for instance via JET OLEDB. Also, the performance of this approach can be pretty terrible depending on how many rows you're targeting, as the subquery is being executed for each FUNCTIONS row (because, of course, it is no longer correlated, which is the whole point in order for it to work).
Good luck!
I had a similar problem. I wanted to find a string in one column and put that value in another column in the same table. The select statement below finds the text inside the parens.
When I created the query in Access I selected all fields. On the SQL view for that query, I replaced the mytable.myfield for the field I wanted to have the value from inside the parens with
SELECT Left(Right(OtherField,Len(OtherField)-InStr((OtherField),"(")),
Len(Right(OtherField,Len(OtherField)-InStr((OtherField),"(")))-1)
I ran a make table query. The make table query has all the fields with the above substitution and ends with INTO NameofNewTable FROM mytable
Does this work? Untested but should get the point across.
UPDATE FUNCTIONS
SET Func_TaxRef =
(
SELECT Min(TAX.Tax_Code) AS MinOfTax_Code
FROM TAX, FUNCTIONS F1
WHERE F1.Func_Pure <= [Tax_ToPrice]
AND F1.Func_Year=[Tax_Year]
AND F1.Func_ID = FUNCTIONS.Func_ID
GROUP BY F1.Func_ID;
)
Basically for each row in FUNCTIONS, the subquery determines the minimum current tax code and sets FUNCTIONS.Func_TaxRef to that value. This is assuming that FUNCTIONS.Func_ID is a Primary or Unique key.
I did want to add one more answer that utilizes a VBA function, but it does get the job done in one SQL statement. Though, it can be slow.
UPDATE FUNCTIONS
SET FUNCTIONS.Func_TaxRef = DLookUp("MinOfTax_Code", "SELECT
FUNCTIONS.Func_ID,Min(TAX.Tax_Code) AS MinOfTax_Code
FROM TAX, FUNCTIONS
WHERE (((FUNCTIONS.Func_Pure)<=[Tax_ToPrice]) AND ((FUNCTIONS.Func_Year)=[Tax_Year]))
GROUP BY FUNCTIONS.Func_ID;", "FUNCTIONS.Func_ID=" & Func_ID)
I know this topic is old, but I thought I could add something to it.
I could not make an Update with Select query work using SQL in MS Access 2010. I used Tomalak's suggestion to make this work. I had a screenshot, but am apparently too much of a newb on this site to be able to post it.
I was able to do this using the Query Design tool, but even as I was looking at a confirmed successful update query, Access was not able to show me the SQL that made it happen. So I could not make this work with SQL code alone.
I created and saved my select query as a separate query. In the Query Design tool, I added the table I'm trying to update the the select query I had saved (I put the unique key in the select query so it had a link between them). Just as Tomalak had suggested, I changed the Query Type to Update. I then just had to choose the fields (and designate the table) I was trying to update. In the "Update To" fields, I typed in the name of the fields from the select query I had brought in.
This format was successful and updated the original table.