POSTGRES - evaluate (sub)expression error as NULL - sql

I have sensor data in a Postgres table measurements with columns id, timestamp, s0, s1, s2, ...
Besides, there is an index on columns (id, timestamp). I want to allow for dynamic math expressions (in the example below: sin(s3)*0.1000/s5) for calculation of derived values.
SELECT
timestamp,
trunc((sin(s3) * 0.1000/s5)::numeric, 3) AS "calculated"
FROM measurements
WHERE id = 42
ORDER BY timestamp DESC
LIMIT 10000;
Obviously, this is prone to a "division by zero" error which will make the query fail. Is there a way to catch this error and return e.g. NULL for the calculated value where the error would occur?
Inspired by
Postgres return null values on function error/failure when casting
Store a formula in a table and use the formula in a function
I already tried defining a postgres function eval_numeric(sensors int[], formula text) that parses the formula and returns NULL on exception. The third row of the SQL statement above now reads
trunc(eval_numeric(ARRAY[s3,s5],'sin(var1)*0.1/var2'), 3) AS "calculated"
This gives the desired behavior but execution time as reported by EXPLAIN ANALYZE increases by a factor of 20 (~20ms -> ~400ms). Any other ideas?
UPDATE
The dynamic expression to be evaluated stems from a web application user. So the formula above is only an example (might require checking for negative argument to square root). I'd rather have a generic error checking possibility and would prefer not having logic in the math expression. This would be easier for the end user and I could validate the allowed math e.g. with a math parser thereby preventing SQL injection.

Can you change the expression to this?
SELECT timestamp,
trunc((sin(s3) * 0.1000/nullif(s5, 0))::numeric, 3) AS "calculated"
FROM measurements
WHERE id = 42
ORDER BY timestamp DESC
LIMIT 10000;
nullif(s5, 0) returns NULL when s5 is 0, so the division yields NULL instead of raising an error. This is the simplest way to accomplish what you want to do.
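As a quick check of the NULLIF approach, here is a minimal sketch that runs the same expression against two inline rows instead of the real measurements table. The VALUES rows and the alias m are made up for illustration, and the sensor columns are assumed to be double precision.

```sql
-- Two made-up sensor rows; the second one has s5 = 0.
SELECT s3,
       s5,
       -- NULLIF(s5, 0) is NULL when s5 = 0, and dividing by NULL gives NULL
       trunc((sin(s3) * 0.1000 / NULLIF(s5, 0))::numeric, 3) AS "calculated"
FROM (VALUES (1.0::float8, 2.0::float8),
             (1.0::float8, 0.0::float8)) AS m(s3, s5);
```

The second row should come back with NULL in "calculated" rather than aborting the whole query.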

You could also use a CASE expression:
SELECT
timestamp,
CASE WHEN s5 != 0
THEN trunc((sin(s3) * 0.1000/s5)::numeric, 3)
ELSE NULL END AS "calculated"
FROM measurements
WHERE id = 42
ORDER BY timestamp DESC
LIMIT 10000;
This has the potential benefit that you may replace the value with anything you want, including NULL.
Another option, if you don't care about rows which would have triggered a divide by zero, would be to just add the check on s5 to the WHERE clause and filter off those rows before the division happens.
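A sketch of that filtering variant, reusing the table and columns from the question. Nothing here is new except the extra condition on s5.

```sql
SELECT
    timestamp,
    trunc((sin(s3) * 0.1000 / s5)::numeric, 3) AS "calculated"
FROM measurements
WHERE id = 42
  AND s5 <> 0          -- also drops rows where s5 IS NULL
ORDER BY timestamp DESC
LIMIT 10000;
```

Note that the LIMIT now counts only the rows that survive the filter, so the result can reach further back in time than the original query did.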


SPOTFIRE how do I shape by column in the map chart with a custom expression

I have a Spotfire question. I am attempting to create Shapes by Column Values in a Map Chart.
I would like to create these Shapes by a Custom Expression. The custom expression is below; I have simplified it so it is easier to read.
All I am saying is:
if((Current months oil rate - 12MonthAgo Oilrate)/12MonthAgo Oilrate)>0,"UP","Down")
When I run this calculation, though, it only gives me one value (there are both positives and negatives, so it should give two).
I am not sure what I have done wrong. Any help is appreciated.
<If(((((Sum(If(CurrentMonth),[OILRATECD],null))
-
Sum(If(12MonthsAgo),[OILRATECD],null)))))
/
Sum(If(12MonthsAgo),[OILRATECD],null)))>0,"UP","DOWN")>
ORIGINAL EQUATION:
<If(((((Sum(If((Month([DATE])=Month(DateAdd("mm",-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],null))
-
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null)))))
/
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null)))>0,"UP","DOWN")>
First, you have a lot of unnecessary parentheses but that shouldn't hurt anything.
If(
(
( --this opening parenthesis is unneeded
( --this opening parenthesis is unneeded
(
Sum(If((Month([DATE])=Month(DateAdd("mm",-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],null))
-
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null))
)
) --this closing parenthesis is unneeded
) --this closing parenthesis is unneeded
/
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null))
)
>0,"UP","DOWN")
The reason you are not getting UP and DOWN returned is because the condition isn't met. We need a sample data set with expected output to verify this.
However, here's a reason why you could be getting unexpected results, regarding NULL in your SUM(IF(...,[OILRATECD],NULL)) expression.
TL;DR
If the condition in your IF() statement never evaluates to true
then NULL is returned to SUM(), and SUM(NULL,NULL,NULL) = NULL
and NULL is not > nor is it < 0, thus your outer IF()
statement would return NULL instead of "UP" or "DOWN"
LONG VERSION
Spotfire ignores NULL when evaluating SUM(), which is what you want. For example, Sum(4,3,NULL) = 7, and this is the behavior we want. However...
Spotfire doesn't ignore NULL for addition, subtraction, division, or comparison operators like >. So, 4 - NULL = NULL and NULL / 18 = NULL and so on. This means that if either of your two SUM() calls returns NULL, then your entire expression will be NULL because...
NULL isn't > nor is it < and certainly not = 0. NULL is the absence of a value, and thus can't be compared or equated to anything. For example, If(NULL > 1,"YES","NO") doesn't return YES or NO... it returns NULL, the lack of a value. Likewise, If(NULL=NULL,"YES","NO") will also return NULL.
HOW TO GET AROUND THIS
Use IS NULL and IS NOT NULL in an IF() statement to set it to a default value, or use 0 in place of NULL in your current expression.
Sum(If((Month([DATE])=Month(DateAdd("mm",-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],0))
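Spotfire's expression language is not SQL, but the NULL rules described above mirror SQL's. Purely as an analogy, the following PostgreSQL sketch (made-up literal values, not Spotfire syntax) shows the same three behaviors plus the effect of substituting a default.

```sql
SELECT
    -- aggregates skip NULL inputs: 4 + 3
    (SELECT sum(x) FROM (VALUES (4), (3), (NULL)) AS t(x)) AS sum_ignores_null,
    -- arithmetic with NULL stays NULL
    4 - NULL::int                AS arithmetic_with_null,
    -- a comparison with NULL is NULL, neither true nor false
    (NULL::int > 0)              AS comparison_with_null,
    -- substituting 0 for NULL makes the comparison decidable (false here)
    (COALESCE(NULL::int, 0) > 0) AS comparison_after_default;
```

Keep in mind that replacing NULL with 0 in the denominator of the original expression would trade the NULL result for a division by zero, so a default is only safe for the numerator terms.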
BIG SIDE NOTE
You said your equation's pseudo code is:
If((Current months oil rate - 12MonthAgo Oilrate)/12MonthAgo Oilrate)>0,"UP","Down")
This doesn't seem to be what you are evaluating. Instead, I read:
x = (-var) + (-2) thus x < -2 for any positive var (i.e. -3....-6...-55)
if((sum([2 months ago oil rate]) - sum([x months ago oil rate])) > 0, "UP","DOWN")
So you aren't ever looking at the current month's oil rate, but instead are looking at current - 2. Also, you are looking at the SUM() over that entire month... this may be what you want, but you may actually be looking for Max() or Min().
OVER FUNCTION
You can most likely handle this a lot more easily with a few calculated columns that keep your data and expressions clean and legible:
DatePart("yy",[Date]) as [Year]
DatePart("mm",[Date]) as [Month]
Max([OILRATECD]) OVER (Intersect([Year],[Month])) as [YearMonthRate]
Use Sum() instead of Max() if that's what you really want.
Check to see the difference with Rank()

Postgresql Writing max() Window function with multiple partition expressions?

I am trying to get the max value of column A ("original_list_price") over windows defined by 2 columns (namely - a unique identifier, called "address_token", and a date field, called "list_date"). I.e. I would like to know the max "original_list_price" of rows with both the same address_token AND list_date.
E.g.:
SELECT
address_token, list_date, original_list_price,
max(original_list_price) OVER (PARTITION BY address_token, list_date) as max_list_price
FROM table1
The query already takes >10 minutes when I use just 1 expression in the PARTITION (e.g. using address_token only, nothing after that). Sometimes the query times out. (I use Mode Analytics and get this error: An I/O error occurred while sending to the backend) So my questions are:
1) Will the Window function with multiple PARTITION BY expressions work?
2) Any other way to achieve my desired result?
3) Any way to make window functions, especially the PARTITION BY part, run faster? E.g. use certain data types over others, or avoid long alphanumeric string identifiers?
Thank you!
The complexity of the window function's partitioning clause should not have a big impact on performance. Do realize that your query is returning all the rows in the table, so there might be a very large result set.
Window functions should be able to take advantage of indexes. For this query:
SELECT address_token, list_date, original_list_price,
max(original_list_price) OVER (PARTITION BY address_token, list_date) as max_list_price
FROM table1;
You want an index on table1(address_token, list_date, original_list_price).
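Such an index could be created as follows. The index name is made up for illustration; only the table and column names come from the question.

```sql
-- Partition columns first, then the aggregated column,
-- so each (address_token, list_date) group is contiguous in the index.
CREATE INDEX idx_table1_token_date_price
    ON table1 (address_token, list_date, original_list_price);
```

Whether the planner actually uses it for the window query depends on table size and statistics, so it is worth confirming with EXPLAIN.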
You could try writing the query as:
select t1.*,
(select max(t2.original_list_price)
from table1 t2
where t2.address_token = t1.address_token and t2.list_date = t1.list_date
) as max_list_price
from table1 t1;
This should return results more quickly, because it doesn't have to calculate the window function value first (for all rows) before returning values.
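To check that a two-column PARTITION BY works and that the correlated subquery gives the same answer, here is a small self-contained sketch. The CTE stands in for table1 and its four rows are invented sample data.

```sql
WITH table1(address_token, list_date, original_list_price) AS (
    VALUES ('a1', DATE '2020-01-01', 100),
           ('a1', DATE '2020-01-01', 120),
           ('a1', DATE '2020-02-01',  90),
           ('b2', DATE '2020-01-01', 300)
)
SELECT t1.*,
       -- window version: one maximum per (address_token, list_date) pair
       max(original_list_price)
           OVER (PARTITION BY address_token, list_date) AS max_window,
       -- correlated-subquery version of the same maximum
       (SELECT max(t2.original_list_price)
        FROM table1 t2
        WHERE t2.address_token = t1.address_token
          AND t2.list_date = t1.list_date) AS max_subquery
FROM table1 t1
ORDER BY address_token, list_date, original_list_price;
```

On this sample the two columns should match on every row. They can differ on real data if address_token or list_date contains NULLs, because PARTITION BY groups NULLs together while the equality join in the subquery never matches them.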

What is MAX(DISTINCT x) in SQL?

I just stumbled over jOOQ's maxDistinct SQL aggregation function.
What does MAX(DISTINCT x) do differently from just MAX(x)?
maxDistinct and minDistinct were defined in order to keep consistency with the other aggregate functions where having a distinct option actually makes a difference (e.g., countDistinct, sumDistinct).
Since the maximum (or minimum) calculated over the distinct values of a dataset is mathematically equivalent to the simple maximum (or minimum) of the same set, these functions are essentially redundant.
In short, there will be no difference. In the case of MySQL, this is even stated on the manual page:
Returns the maximum value of expr. MAX() may take a string argument;
in such cases, it returns the maximum string value. See Section 8.5.3,
“How MySQL Uses Indexes”. The DISTINCT keyword can be used to find the
maximum of the distinct values of expr, however, this produces the
same result as omitting DISTINCT.
The reason it's allowed is to keep compatibility with other platforms. Internally, there will be no difference: MySQL simply ignores the DISTINCT. It will not try to do anything with the set of rows (i.e. produce a distinct set first). For indexed columns the plan will be "Select tables optimized away" (thus reading one value from the index, not the table); for non-indexed columns, a full scan.
If I'm not wrong, there is no difference.
For Columns
ID
1
2
2
3
3
4
5
5
The output for both queries is the same: 5
MAX(DISTINCT x)
// ID = 1,2,2,3,3,4,5,5
// DISTINCT = 1,2,3,4,5
// MAX = 5
// 1 row
and for
MAX(x)
// ID = 1,2,2,3,3,4,5,5
// MAX = 5
// 1 row
Theoretically, DISTINCT x ensures that every element of the set is different. The max operator selects the highest value from a set. In plain SQL there should be no difference between the two.
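The ID column from the example above can be run through both forms directly. This is a minimal sketch in PostgreSQL syntax; the count columns are added only to contrast an aggregate where DISTINCT does matter.

```sql
SELECT max(id)            AS max_all,         -- 5
       max(DISTINCT id)   AS max_distinct,    -- 5, same as max_all
       count(id)          AS count_all,       -- 8
       count(DISTINCT id) AS count_distinct   -- 5, DISTINCT changes the result
FROM (VALUES (1), (2), (2), (3), (3), (4), (5), (5)) AS t(id);
```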

Boolean evaluation of page item is incorrect

I have a report in an APEX page and it has multiple columns with results (rows count) that go from anywhere between 10,000 and 1,000,000 (1M) records.
There is a conditional where clause that I have, and it uses a page item's value in order to determine or restrict the results that are shown... it looks something like this:
SELECT
...
FROM ...
WHERE ...
AND (:P2_STARTDATE IS NULL OR TO_DATE(:P2_STARTDATE, 'DD-MON-YYYY HH24:MI:SS') < creation_date)
I believe that any time I enter a value for the P2_STARTDATE page item the comparison takes place, but when I do not enter a value the page item is supposed to be NULL and the boolean operation should just return TRUE for the P2_STARTDATE IS NULL evaluation...
The query makes the execution time take as much as 45+ seconds when searching in 0.5M rows, which is not acceptable. I wrote the following change to test my theory:
SELECT
...
FROM ...
WHERE deleted_flag = 'N'
AND (:P2_STARTDATE IS NULL) -- comment the rest of the evaluation....
It evaluates immediately to NULL and returns the same resultset 0.5M+ in about 1 second... now, if I do set a value then the resultset is empty, obviously.
So the question is, how can I make Oracle APEX evaluate quickly to TRUE that expression? Thanks for any tips, workarounds, or solutions that you may offer.
I am not sure it is safe to assume that the SQL engine is using short-circuit evaluation on your OR.
Try this:
AND (:P2_STARTDATE IS NULL
OR
(:P2_STARTDATE IS NOT NULL
AND TO_DATE(:P2_STARTDATE, 'DD-MON-YYYY HH24:MI:SS') < creation_date)
)
I do not believe the second part of your query is sargable, i.e. it cannot use an index. Plus your example has nothing to do with searching for the actual result set.
One way is to convert (OUTSIDE of this query) the bound variable to the correct datatype, so that the query becomes able to use the index on creation_date (there is an index on creation_date, right?)
SELECT
...
FROM ...
WHERE ...
AND (:P2_STARTDATE IS NULL OR creation_date > :newdatevariable);
In any event get the function to_date out of there and pass in a constant.
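One way to drop the OR entirely, offered as an untested sketch rather than a verified fix: fold the "no start date" case into a very early default date so that only a single range predicate on creation_date remains. The table name my_table is hypothetical, since the question elides the real one.

```sql
SELECT *
FROM my_table                       -- hypothetical table name
WHERE deleted_flag = 'N'
  AND creation_date > COALESCE(
        -- TO_DATE of a NULL bind value is NULL, so the default applies
        TO_DATE(:P2_STARTDATE, 'DD-MON-YYYY HH24:MI:SS'),
        DATE '0001-01-01');
```

Two caveats: rows whose creation_date is NULL are excluded here even when no start date is entered, which differs from the original predicate, and whether the optimizer picks a better plan should be confirmed from the execution plan.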

SQL - Calculating if an alphanumeric value exists in an alphanumeric range

I have a varchar column in a database, and a requirement has come in for the user to be able to enter a from/to range, e.g. ABC001 to ABC100.
I have the following query but feel it might not be strict enough to work out if any values within that range exist.
SELECT count(*) FROM MyTable where MyColumn between 'ABC001' and 'ABC005'
I have a feeling an ORDER BY should be used, or is there a better way to calculate the existence of values within an alphanumeric range?
No ORDER BY is required. That should be perfect.
If you want to speed up that operation, you can create an index on the column.
ORDER BY is applied at the end of query execution, so the same rows will be retrieved either way.
OP said:
or is there a better way to calculate
the existence of values within an
alphanumeric range
The best way would be:
SELECT count(*) FROM MyTable where MyColumn>='ABC001' and MyColumn<='ABC005'
I find most people can't remember if BETWEEN includes or excludes the "end points". By just always using >= and/or > and/or <= and/or < you have more clarity and flexibility.
Any ORDER BY would be applied to the resulting set of rows that meet the WHERE condition, and has nothing to do with the WHERE filtering. You can use it if you want the final result set in a particular order, but it will have no effect on which rows are included in the results.
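A small sketch comparing the two forms on invented sample values (PostgreSQL syntax; the values and the alias t are made up). It also shows one thing to watch for when deciding whether the range check is "strict enough": the comparison is a string comparison, not a numeric one.

```sql
SELECT v,
       CASE WHEN v BETWEEN 'ABC001' AND 'ABC005'
            THEN 'yes' ELSE 'no' END AS with_between,
       CASE WHEN v >= 'ABC001' AND v <= 'ABC005'
            THEN 'yes' ELSE 'no' END AS with_operators
FROM (VALUES ('ABC001'), ('ABC003'), ('ABC0031'),
             ('ABC005'), ('ABC006')) AS t(v)
ORDER BY v;
```

The two columns always agree, because BETWEEN includes both end points. Under ordinary string ordering a longer value such as 'ABC0031' also sorts between 'ABC001' and 'ABC005', so if only fixed-length codes should match, an extra length or pattern check may be needed.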