Questions on BETWEEN syntax - sql

I'm working on a script which retrieves the latest balance for an account on or before a given date. For example, if the last movement on the account was on Jan. 5 and I input Jan. 8 into the script, it should retrieve the balance from Jan. 5. Of course, if there is any movement on the account on Jan. 8, then it should retrieve the balance as of Jan. 8. Each balance, though, is valid for a given date range, as dictated by the columns BAL_DATE and END_BAL_DATE.
This script works, but I find it very unusual since the positions of the value and the expressions are swapped:
SELECT
*
FROM
tbaadm.gstt
WHERE
gl_sub_head_code = '12403'
AND
crncy_code in ('USD','HKD')
AND
sol_id = 5001
AND
last_day('29-OCT-2015') BETWEEN BAL_DATE and END_BAL_DATE
Is it okay if I use this script? Or is it bad practice?

From the documentation,
BETWEEN Conditions
A BETWEEN condition determines whether the value of one expression is
in an interval defined by two other expressions.
Basically, you are comparing one expression against an interval defined by two other expressions.
last_day('29-OCT-2015') BETWEEN BAL_DATE and END_BAL_DATE
In your query, you are comparing expr1, i.e. last_day('29-OCT-2015'), to expr2 (BAL_DATE) and expr3 (END_BAL_DATE).
The entire BETWEEN condition is internally interpreted as:
last_day('29-OCT-2015') >= BAL_DATE AND last_day('29-OCT-2015') <= END_BAL_DATE

To answer your concern: note that in SQL you can combine multiple conditions in a WHERE clause using AND.
select ....
from ....
where <condition1>
and <condition2>
...
So the BETWEEN condition is used correctly. It can appear anywhere among the AND-ed conditions in the WHERE clause and you will get exactly the same result. Your query is equivalent to:
SELECT
*
FROM
tbaadm.gstt
WHERE
last_day('29-OCT-2015') BETWEEN BAL_DATE and END_BAL_DATE
and
gl_sub_head_code = '12403'
AND
crncy_code in ('USD','HKD')
AND
sol_id = 5001
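One caution unrelated to BETWEEN itself: last_day('29-OCT-2015') relies on an implicit string-to-date conversion, which only works when the literal happens to match the session's NLS_DATE_FORMAT. A more robust sketch (assuming Oracle, as LAST_DAY suggests) makes the conversion explicit:
SELECT *
FROM tbaadm.gstt
WHERE gl_sub_head_code = '12403'
  AND crncy_code IN ('USD', 'HKD')
  AND sol_id = 5001
  -- explicit format mask instead of relying on implicit conversion
  AND LAST_DAY(TO_DATE('29-OCT-2015', 'DD-MON-YYYY')) BETWEEN BAL_DATE AND END_BAL_DATE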

SQLite alias (AS) not working in the same query

I'm stuck on an (apparently) extremely trivial task that I can't make work, and I really have no choice but to ask for advice.
I used to deal with PHP/MySQL more than 10 years ago and I might be quite rusty now that I'm dealing with an SQLite DB using Qt5.
Basically I'm selecting some records while wanting to perform some math operations on the fetched columns. I recall (and re-read in some documentation and examples) that the keyword "AS" conveniently renames (aliases) a value.
So for example I have this query, where "X" is an integer number that I render into this big Qt string before executing it with a QSqlQuery. This query lets me select all the electronic components used in a Project and calculate how many of them to order (rounding to the nearest multiple of 5) and the total price per component.
SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price, UsedItems.qty_used as used_qty,
UsedItems.qty_used*X AS To_Order,
ROUND((UsedItems.qty_used*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5,
Inventory.price*Nearest5 AS TotPrice
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'
ORDER BY RefDes, value ASC
So, for example, I aliased UsedItems.qty_used as used_qty. At first I tried to use it in the next field, multiplying it by X, writing "used_qty*X AS To_Order"... the query failed. Well, no worries, I just put the original table.field name back and it worked.
Going further, I have a complex calculation and I want to use its result in the next field, but the same issue popped up: if I alias "ROUND(...)" AS Nearest5 and then try to use this value by multiplying it in the next field, the query fails.
Please note: the query WORKS, but ONLY if I don't use aliases in the following fields, namely if I don't use the alias Nearest5 in the TotPrice field. I just want to avoid re-writing the whole ROUND(...) thing for the TotPrice field.
What am I missing/doing wrong? Either SQLite does not support aliases within the same query, or I am using the wrong syntax and am just too stuck/confused to see the mistake (which I'm sure has to be something really stupid).
Column aliases defined in a SELECT cannot be used:
For other expressions in the same SELECT.
For filtering in the WHERE.
For conditions in the FROM clause.
Many databases also restrict their use in GROUP BY and HAVING.
All databases support them in ORDER BY.
This is how SQL works. The issue is two things:
The logical order of processing clauses in the query (i.e. how they are compiled). This affects the scoping of identifiers.
The order of processing expressions within the SELECT. This is indeterminate; there is no required evaluation order among them.
For a simple example, what should x refer to in this example?
select x as a, y as x
from t
where x = 2;
By not allowing such alias references, SQL engines do not have to make a choice. The value is always t.x.
You can try with nested queries.
A SELECT query can be nested in another SELECT query within the FROM clause;
multiple queries can be nested, for example by following the following pattern:
SELECT *,[your last Expression] AS LastExp From (SELECT *,[your Middle Expression] AS MidExp FROM (SELECT *,[your first Expression] AS FirstExp FROM yourTables));
Obviously, you must respect the order in which expressions become visible: an expression defined in the innermost SELECT can be used by all the outer queries, while intermediate expressions can only be used by the queries that wrap them.
For your case, your query may be:
SELECT *, PRC * Nearest5 AS TotPrice
FROM (
    SELECT *, ROUND((To_Order / 5) + 0.5) * 5 * CAST((X > 0) AS INT) AS Nearest5
    FROM (
        SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name,
               Inventory.category, Inventory.type, Inventory.package, Inventory.value,
               Inventory.manufacturer, Inventory.price AS PRC,
               UsedItems.qty_used * X AS To_Order
        FROM Inventory
        LEFT JOIN UsedItems ON Inventory.id = UsedItems.cid
        WHERE UsedItems.pid = '1'
        ORDER BY RefDes, value ASC
    )
)
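If you prefer to avoid the deep nesting, the same idea can be written with a CTE (supported by SQLite since 3.8.3). A minimal sketch, where X again stands for the integer interpolated into the query string:
WITH base AS (
    SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name,
           Inventory.value, Inventory.price,
           UsedItems.qty_used * X AS To_Order,
           ROUND((UsedItems.qty_used * X / 5) + 0.5) * 5 * CAST((X > 0) AS INT) AS Nearest5
    FROM Inventory
    LEFT JOIN UsedItems ON Inventory.id = UsedItems.cid
    WHERE UsedItems.pid = '1'
)
SELECT *, price * Nearest5 AS TotPrice  -- Nearest5 is now a real column of base
FROM base
ORDER BY RefDes, value ASC;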

SQL Query for Search Page

I am working on a small project for an online databases course and I was wondering if you could help me out with a problem I am having.
I have a web page that is searching a movie database and retrieving specific columns using a movie initial input field, a number input field, and a code field. These will all be converted to strings and used as user input for the query.
Below is what I tried before:
select A.CD, A.INIT, A.NBR, A.STN, A.ST, A.CRET_ID, A.CMNT, A.DT
from MOVIE_ONE A
where A.INIT = :init
AND A.CD = :cd
AND A.NBR = :num
The way the page must search is in three different cases:
(initial and number)
(code)
(initial and number and code)
The cases have to be independent, so if certain fields are empty but a certain case is fulfilled, the search still goes through. It also must be done in one query. I am stuck on how to implement the cases.
The parameters in the query are taken from the Java parameters in the method found in an SQLJ file.
If you could possibly provide some aid on how I can go about this problem, I'd greatly appreciate it!
Consider wrapping the equality expressions in NVL (similar to COALESCE) so that if a parameter input is NULL, the corresponding column is compared against itself. Also, be sure to kick the a-b-c table aliasing habit.
SELECT m.CD, m.INIT, m.NBR, m.STN, m.ST, m.CRET_ID, m.CMNT, m.DT
FROM MOVIE_ONE m
WHERE m.INIT = NVL(:init, m.INIT)
AND m.CD = NVL(:cd, m.CD)
AND m.NBR = COALESCE(:num, m.NBR)
To demonstrate, consider the DB2 fiddles below, where each case can be checked by adjusting the VALUES CTE parameters, all running on the exact same data.
Case 1
WITH
i(init) AS (VALUES('db2')),
c(cd) AS (VALUES(NULL)),
n(num) AS (VALUES(53)),
cte AS
...
Case 2
WITH
i(init) AS (VALUES(NULL)),
c(cd) AS (VALUES(2018)),
n(num) AS (VALUES(NULL)),
cte AS
...
Case 3
WITH
i(init) AS (VALUES('db2')),
c(cd) AS (VALUES(2018)),
n(num) AS (VALUES(53)),
cte AS
...
However, do be aware that the fiddle runs slightly different SQL due to the nature of its data (i.e., doubles and dates). But the query reflects the same concept, with NVL matching expressions on both sides.
SELECT *
FROM cte, i, c, n
WHERE cte.mytype = NVL(i.init, cte.mytype)
AND YEAR(CAST(cte.mydate AS date)) = NVL(c.cd, YEAR(CAST(cte.mydate AS date)))
AND ROUND(cte.mynum, 0) = NVL(n.num, ROUND(cte.mynum, 0));
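One caveat with this pattern: if a column itself can be NULL, m.INIT = NVL(:init, m.INIT) will never match those rows, because NULL = NULL evaluates to unknown. A sketch of an explicit-guard alternative that avoids this (same table and parameters as above):
SELECT m.CD, m.INIT, m.NBR, m.STN, m.ST, m.CRET_ID, m.CMNT, m.DT
FROM MOVIE_ONE m
WHERE (:init IS NULL OR m.INIT = :init)  -- a NULL parameter means "no filter"
  AND (:cd IS NULL OR m.CD = :cd)
  AND (:num IS NULL OR m.NBR = :num)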

SQL - HAVING (execution vs structure)

I'm a beginner, studying on my own... please help me to clarify something about a query. I am working with a soccer database and trying to answer this question: list all seasons with an average goals-per-match rate of over 1, counting only matches that didn't end in a draw.
The right query for it is:
select season,
       round((sum(home_team_goal + away_team_goal) * 1.0) / count(id), 3) as ratio
from match
where home_team_goal != away_team_goal
group by season
having ratio > 1
I don't understand 2 things about this query:
Why do I multiply by 1.0? Why is it necessary?
I know that the execution in SQL is by this order:
from
where
group
having
select
So how can this query include having ratio > 1 if "ratio" is only defined in the SELECT, which is executed AFTER the HAVING?
Am I confused?
Thanks in advance for the help!
The multiplication is added as a typecast to convert INT to FLOAT, because by default the sum of ints is an int, and dividing two ints loses the decimal places.
HAVING. You can think of HAVING as a WHERE applied to the query results. Imagine the query is executed first without the HAVING, and then the HAVING condition is applied to the result rows, leaving only the suitable ones.
In your case, you first select the grouped data and calculate the aggregated results, and then discard the unwanted aggregation results.
The *1.0 is used for its ".0" part: it tells the system to treat the expression as a decimal and thus not perform an integer division, which would cut off the decimal part (e.g. 1 instead of 1.33).
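A quick way to see the difference (SQLite shown; any engine with integer division behaves similarly):
SELECT 4 / 3;        -- 1 (integer division truncates)
SELECT 4 * 1.0 / 3;  -- 1.3333... (a decimal operand forces decimal division)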
About the second part: SELECT being at the end just means that the last thing to be done is showing the data. However, assigning an alias to a calculated field is done, you could say, at first priority. Still, I am a bit doubtful; I am almost certain field aliases cannot be used in the WHERE/GROUP BY/HAVING in, say, SQL Server.
There is no order of execution of a SQL query. SQL is a descriptive language not a procedural language. A SQL query describes the result set that the query is producing. The SQL engine can execute it however it likes. In fact, most SQL engines compile the query into a directed acyclic graph, which looks nothing like the original query.
What you are referring to might be better phrased as the "order of interpretation". This is more simply described by simple rules. Column aliases can be used in the ORDER BY clause in any database. They cannot be used in the FROM or WHERE clauses, and many databases also disallow them in GROUP BY. Some databases -- such as SQLite -- allow them to be referenced in the HAVING clause.
As for the * 1.0, it is because some databases -- such as SQLite -- do integer arithmetic. However, the logic that you want is probably more simply expressed as:
round(avg((home_team_goal + away_team_goal) * 1.0), 3)
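Putting that together, a sketch of the full query using AVG against the same match table (HAVING with an alias works in SQLite):
SELECT season,
       ROUND(AVG((home_team_goal + away_team_goal) * 1.0), 3) AS ratio
FROM match
WHERE home_team_goal != away_team_goal
GROUP BY season
HAVING ratio > 1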

POSTGRES - evaluate (sub)expression error as NULL

I have sensor data in a Postgres table measurements with columns id, timestamp, s0, s1, s2, ...
Besides, there is an index on columns (id, timestamp). I want to allow for dynamic math expressions (in the example below: sin(s3)*0.1000/s5) for calculation of derived values.
SELECT
timestamp,
trunc((sin(s3) * 0.1000/s5)::numeric, 3) AS "calculated"
FROM measurements
WHERE id = 42
ORDER BY timestamp DESC
LIMIT 10000;
Obviously, this is prone to a "division by zero" error which will make the query fail. Is there a way to catch this error and return e.g. NULL for the calculated value where the error would occur?
Inspired by
Postgres return null values on function error/failure when casting
Store a formula in a table and use the formula in a function
I already tried defining a postgres function eval_numeric(sensors int[], formula text) that parses the formula and returns NULL on exception. The third row of the SQL statement above now reads
trunc(eval_numeric(ARRAY[s3,s5],'sin(var1)*0.1/var2'), 3) AS "calculated"
This gives the desired behavior but execution time as reported by EXPLAIN ANALYZE increases by a factor of 20 (~20ms -> ~400ms). Any other ideas?
UPDATE
The dynamic expression to be evaluated stems from a web application user. So the formula above is only an example (might require checking for negative argument to square root). I'd rather have a generic error checking possibility and would prefer not having logic in the math expression. This would be easier for the end user and I could validate the allowed math e.g. with a math parser thereby preventing SQL injection.
Can you change the expression to this?
SELECT timestamp,
trunc((sin(s3) * 0.1000/nullif(s5, 0))::numeric, 3) AS "calculated"
FROM measurements
WHERE id = 42
ORDER BY timestamp DESC
LIMIT 10000;
This is the simplest way to accomplish what you want to do.
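NULLIF(s5, 0) returns NULL when s5 equals 0, and arithmetic with NULL yields NULL instead of raising an error, which you can verify directly:
SELECT 1.0 / NULLIF(0, 0);  -- NULL: the divisor matched 0, so NULLIF returned NULL
SELECT 1.0 / NULLIF(4, 0);  -- 0.25: no match, the divisor stayed 4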
You could also use a CASE expression:
SELECT
timestamp,
CASE WHEN s5 != 0
THEN trunc((sin(s3) * 0.1000/s5)::numeric, 3)
ELSE NULL
END AS "calculated"
FROM measurements
WHERE id = 42
ORDER BY timestamp DESC
LIMIT 10000;
This has the potential benefit that you may replace the value with anything you want, not just NULL.
Another option, if you don't care about rows which would have triggered a divide by zero, would be to just add the check on s5 to the WHERE clause and filter off those rows before the division happens.
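A sketch of that variant; it simply drops the offending rows instead of returning NULL for them:
SELECT timestamp,
       trunc((sin(s3) * 0.1000 / s5)::numeric, 3) AS "calculated"
FROM measurements
WHERE id = 42
  AND s5 <> 0  -- filter out rows that would divide by zero
ORDER BY timestamp DESC
LIMIT 10000;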

SQL and logical operators and null checks

I've got a vague, possibly cargo-cult memory from years of working with SQL Server that when you've got a possibly-null column, it's not safe to write "WHERE" clause predicates like:
... WHERE the_column IS NULL OR the_column < 10 ...
It had something to do with the fact that SQL rules don't stipulate short-circuiting (and in fact that's kind-of a bad idea possibly for query optimization reasons), and thus the "<" comparison (or whatever) could be evaluated even if the column value is null. Now, exactly why that'd be a terrible thing, I don't know, but I recall being sternly warned by some documentation to always code that as a "CASE" clause:
... WHERE 1 = CASE WHEN the_column IS NULL THEN 1 WHEN the_column < 10 THEN 1 ELSE 0 END ...
(the goofy "1 = " part is because SQL Server doesn't/didn't have first-class booleans, or at least I thought it didn't.)
So my questions here are:
Is that really true for SQL Server (or perhaps back-rev SQL Server 2000 or 2005) or am I just nuts?
If so, does the same caveat apply to PostgreSQL? (8.4 if it matters)
What exactly is the issue? Does it have to do with how indexes work or something?
My grounding in SQL is pretty weak.
I don't know SQL Server so I can't speak to that.
Given an expression a L b for some logical operator L, there is no guarantee that a will be evaluated before or after b or even that both a and b will be evaluated:
Expression Evaluation Rules
The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.
Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all.
[...]
Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.
As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan.
As far as an expression of the form:
the_column IS NULL OR the_column < 10
is concerned, there's nothing to worry about since NULL < n is NULL for all n, even NULL < NULL evaluates to NULL; furthermore, NULL isn't true so
null is null or null < 10
is just a complicated way of saying true or null and that's true regardless of which sub-expression is evaluated first.
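You can check this directly in psql:
SELECT NULL < 10;                  -- NULL
SELECT true OR NULL;               -- true
SELECT NULL IS NULL OR NULL < 10;  -- true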
The whole "use a CASE" sounds mostly like cargo-cult SQL to me. However, like most cargo-cultism, there is a kernel a truth buried under the cargo; just below my first excerpt from the PostgreSQL manual, you will find this:
When it is essential to force evaluation order, a CASE construct (see Section 9.16) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:
SELECT ... WHERE x > 0 AND y/x > 1.5;
But this is safe:
SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;
So, if you need to guard against a condition that will raise an exception or have other side effects, then you should use a CASE to control the order of evaluation as a CASE is evaluated in order:
Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner.
So given this:
case when A then Ra
when B then Rb
when C then Rc
...
A is guaranteed to be evaluated before B, B before C, etc. and evaluation stops as soon as one of the conditions evaluates to a true value.
In summary, a CASE short-circuits but neither AND nor OR short-circuits, so you only need to use a CASE when you need to protect against side effects.
Instead of
the_column IS NULL OR the_column < 10
I'd do
isnull(the_column,0) < 10
or for the first example
WHERE 1 = CASE WHEN isnull(the_column,0) < 10 THEN 1 ELSE 0 END ...
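Note that ISNULL is SQL Server-specific; the standard, portable spelling is COALESCE. A minimal sketch (the_table is a placeholder name):
SELECT *
FROM the_table  -- placeholder table name
WHERE COALESCE(the_column, 0) < 10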
I've never heard of such a problem, and this bit of SQL Server 2000 documentation uses WHERE advance < $5000 OR advance IS NULL in an example, so it must not have been a very stern rule. My only concern with OR is that it has lower precedence than AND, so you might accidentally write something like WHERE the_column IS NULL OR the_column < 10 AND the_other_column > 20 when that's not what you mean; but the usual solution is parentheses rather than a big CASE expression.
I think that in most RDBMSes, indices don't include null values, so an index on the_column wouldn't be terribly useful for this query; but even if that weren't the case, I don't see why a big CASE expression would be any more index-friendly.
(Of course, it's hard to prove a negative, and maybe someone else will know what you're referring to?)
Well, I've repeatedly written queries like the first example since about forever (heck, I've written query generators that generate queries like that), and I've never had a problem.
I think you may be remembering some admonishment somebody gave you sometime against writing funky join conditions that use OR. In your first example, the conditions joined by the OR restrict the same one column of the same table, which is OK. If your second condition was a join condition (i.e., it restricted columns from two different tables), then you could get into bad situations where the query planner just has no choice but to use a Cartesian join (bad, bad, bad!!!).
I don't think your CASE function is really doing anything there, except perhaps hamper your query planner's attempts at finding a good execution plan for the query.
But more generally, just write the straightforward query first and see how it performs for realistic data. No need to worry about a problem that might not even exist!
Nulls can be confusing. The "WHERE 1 = CASE ..." pattern is useful if you are trying to pass either a NULL or a value as a parameter, e.g. "WHERE the_column = #parameter". This post may be helpful: Passing Null using OLEDB.
Another example where CASE is useful is when using date functions on varchar columns. Adding an ISDATE check before, say, convert(colA, datetime) might not work on its own, and when colA has non-date data the query can error out.
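A sketch of that guard in T-SQL (the table t and the date bound are placeholders); the CASE is there so the conversion is only attempted on rows that pass the ISDATE check:
SELECT *
FROM t  -- placeholder table with a varchar column colA
WHERE 1 = CASE
            WHEN ISDATE(colA) = 1 THEN
              CASE WHEN CONVERT(datetime, colA) >= '2015-01-01' THEN 1 ELSE 0 END
            ELSE 0
          END;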