Spotfire: how do I set Shape by column values in a map chart with a custom expression?

I have a spotfire question. I am attempting to create Shapes by Column Values in a Map Chart.
I would like to create these Shapes by a Custom Expression. The custom expression is below, I have simplified it so it is easier to read.
All I am saying is:
If(((Current month's oil rate - oil rate 12 months ago) / oil rate 12 months ago) > 0, "UP", "DOWN")
When I run this calculation, though, it only gives me one value (there are both positives and negatives, so it should give two).
I am not sure what I have done wrong. Any help is appreciated.
If((Sum(If(CurrentMonth,[OILRATECD],null))
-
Sum(If(12MonthsAgo,[OILRATECD],null)))
/
Sum(If(12MonthsAgo,[OILRATECD],null))>0,"UP","DOWN")
ORIGINAL EQUATION:
<If(((((Sum(If((Month([DATE])=Month(DateAdd("mm",-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],null))
-
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null)))))
/
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null)))>0,"UP","DOWN")>

First, you have a lot of unnecessary parentheses but that shouldn't hurt anything.
If(
(
( -- this opening parenthesis is unneeded
( -- this opening parenthesis is unneeded
(
Sum(If((Month([DATE])=Month(DateAdd("mm",-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],null))
-
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null))
)
) -- this closing parenthesis is unneeded
) -- this closing parenthesis is unneeded
/
Sum(If((Month([DATE])=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],null))
)
>0,"UP","DOWN")
The reason you are not getting UP and DOWN returned is because the condition isn't met. We need a sample data set with expected output to verify this.
However, here's a reason why you could be getting unexpected results, regarding NULL in your SUM(IF(...,[OILRATECD],NULL)) expression.
TL;DR
If the condition in your If() statement never evaluates to true,
then NULL is returned to Sum(), and Sum(NULL, NULL, NULL) = NULL,
and NULL is neither > 0 nor < 0, so your outer If()
statement returns NULL instead of "UP" or "DOWN".
LONG VERSION
Spotfire ignores NULL when evaluating Sum(), which is what you want. For example, Sum(4,3,NULL) = 7. However...
Spotfire doesn't ignore NULL for addition, subtraction, division, or comparison operators like >. So 4 - NULL = NULL, NULL / 18 = NULL, and so on. This means that if either of your two Sum() calls returns NULL, your entire expression will be NULL, because...
NULL isn't greater than, less than, or equal to 0. NULL is the absence of a value, and thus can't be compared or equated to anything. For example, If(NULL > 1,"YES","NO") doesn't return YES or NO... it returns NULL, the lack of a value. Likewise, If(NULL=NULL,"YES","NO") returns NULL.
HOW TO GET AROUND THIS
Use IS NULL and IS NOT NULL in an If() statement to set a default value, or use 0 in place of NULL in your current expression:
Sum(If((Month([DATE])=Month(DateAdd("mm",-2,DateTimeNow()))) and (Year([DATE])=Year(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],0))
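Alternatively, here is a sketch of the IS NULL guard, still using the CurrentMonth / 12MonthsAgo placeholders from the simplified expression above (so it is not paste-ready); a third category such as "NO DATA" makes the missing months visible instead of silently disappearing:
If(Sum(If(CurrentMonth,[OILRATECD],null)) is null or Sum(If(12MonthsAgo,[OILRATECD],null)) is null,"NO DATA",
If((Sum(If(CurrentMonth,[OILRATECD],null)) - Sum(If(12MonthsAgo,[OILRATECD],null))) / Sum(If(12MonthsAgo,[OILRATECD],null))>0,"UP","DOWN"))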
BIG SIDE NOTE
You said your equation's pseudo code is:
If(((Current month's oil rate - oil rate 12 months ago) / oil rate 12 months ago) > 0, "UP", "DOWN")
This doesn't seem to be what you are evaluating. Instead, I read:
x = (-MonthInterval) + (-2), thus x < -2 (i.e. -3 ... -6 ... -55)
if((sum([2 months ago oil rate]) - sum([x months ago oil rate])) > 0, "UP","DOWN")
So you aren't ever looking at the current month's oil rate; instead you are looking at current - 2. Also, you are looking at the Sum() over that entire month... this may be what you want, but you may actually be looking for Max() or Min()
OVER FUNCTION
You can most likely handle this a lot more easily with a few calculated columns, to keep your data and expressions clean and legible:
DatePart("yy",[Date]) as [Year]
DatePart("mm",[Date]) as [Month]
Max([OILRATECD]) OVER (Intersect([Year],[Month])) as [YearMonthRate] or use SUM() if that's what you really want.
Check to see the difference with Rank()
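Putting the pieces together, here is a hedged sketch of the original expression rewritten against those calculated columns, with 0 in place of NULL, and keeping the original -2 and -${MonthInterval}-2 offsets (note that if no rows fall in the older window the denominator is now 0 rather than NULL, so you may still want the IS NULL guard shown earlier):
If((Sum(If(([Year]=Year(DateAdd("mm",-2,DateTimeNow()))) and ([Month]=Month(DateAdd("mm",-2,DateTimeNow()))),[OILRATECD],0))
-
Sum(If(([Year]=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and ([Month]=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],0)))
/
Sum(If(([Year]=Year(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))) and ([Month]=Month(DateAdd("mm",-${MonthInterval}-2,DateTimeNow()))),[OILRATECD],0))>0,"UP","DOWN")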

Related

Nested NULLIF with ISDATE to eliminate records

I have a column (datatype nvarchar(max), not my choice, legacy) with various different uses to the end user dependent on other factors.
I was trying to narrow down to certain specific data within that column, given earlier clauses. Within some sample (but highly representative) data, I noticed that the remaining data was either a 1, 2, 3, 4, or a date.
I initially added some nested NULLIF along with an IS NOT NULL
AND NULLIF(NULLIF(NULLIF(NULLIF([ColumnName],'1'),'2'),'3'),'4') IS NOT NULL
The 1 is a string because the global data contains some strings, so removing the single quotes creates an implicit conversion to int which many records would fail. This got me down to 25 records in my sample: the records with dates and the target records.
I then thought I'd add an ISDATE to isolate those records
AND NULLIF(NULLIF(NULLIF(NULLIF(ISDATE([ColumnName]),'1'),'2'),'3'),'4') IS NOT NULL
This then returned 60k or so records, which was not the behaviour I expected.
I ran the following queries to see if there was any incompatibility between the two commands inline, but they returned as expected:
SELECT NULLIF(ISDATE('06/01/2022'),1)
returned NULL.
SELECT NULLIF(ISDATE('06/01/2022'),'1')
in case it didn't like a string, but returned NULL.
SELECT NULLIF(NULLIF(ISDATE('06/01/2022'),'1'),'2')
in case it didn't like the nest, but returned NULL.
So why does it not NULL the values that present as dates, and also why does it negate the other NULLIF commands in the outer parts of the nest?
Turns out I'm an idiot
I forgot to catch all the times that the ISDATE() returned a 0, so it was returning all the values that the NULLIFs were trying to catch
AND ISDATE(ISNULL(NULLIF(NULLIF(NULLIF(NULLIF([2nd Ref],'1'),'2'),'3'),'4'),'06/01/2022')) = 0
So changing the order here helped: create NULLs for the expected values, change those into a known date, then return the rows where the result isn't a date.
That should now work as expected. Oh well, someone might one day find it useful
edit: There are a couple of records that still have dates in them; I will also use TRY_CONVERT as Jeroen suggested
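For reference, a sketch of that TRY_CONVERT variant, assuming the goal is the same as above (rows that are neither the '1'-'4' flag values nor parseable as a date); TRY_CONVERT and ISDATE don't accept exactly the same set of formats, so verify it against your data:
AND TRY_CONVERT(date, [2nd Ref]) IS NULL        -- not parseable as a date
AND [2nd Ref] NOT IN ('1', '2', '3', '4')       -- and not one of the flag values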

Finding the last 4, 3, 2, 1 months consecutive order drops among clients based on drop variance

Here I have this query that finds out the drop percentage of a bunch of clients based on the orders they have received (i.e. it finds the percentage difference in orders by comparing the current month with the previous month). What I want to achieve here is to have a field where I can see the clients who had a 4-month continuous drop, a 3-month drop, a 2-month drop, and a 1-month drop.
I know it can only be achieved by comparing the last 4 months using the LAG function or subqueries. Can you please help me out on this one? I would appreciate it very much.
select
fd.customers2, fd.Month1, fd.year1, fd.variance, case when
(fd.variance < -0.00001 and fd.year1 = '2022.0' and fd.Month1 = '1')
then '1month drop' else fd.customers2 end as 1_most_host_drop
from 
(SELECT
c.*,
sa.customers as customers2,
sum(sa.order) as orders,
date_part(mon, sa.date) as Month1,
date_part(year, sa.date) as year1,
(cast(orders - LAG(orders) OVER(Partition by customers2 ORDER BY
 year1, Month1) as NUMERIC(10,2))/NULLIF(LAG(orders) 
OVER(partition by customers2 ORDER BY year1, Month1) * 1, 0)) AS variance
FROM stats sa join (select distinct
    d.id, d.customers 
     from configer d 
    ) c on sa.customers=c.customers
WHERE sa.date >= '2021-04-1' 
GROUP BY Month1, sa.customers, c.id,  year1, 
     c.customers)fd
In a spirit of friendliness: I think you are a little premature in posting this here as there are several issues with the syntax before even reaching the point where you can solve the problem:
You have at least two places with a comma immediately preceding the word FROM:
...AS variance, FROM stats_archive sa ...
...d.customers, FROM config d...
Recommend you don't use VARIANCE as an alias (it is a system function in PostgreSQL and so is likely also a system function name in Redshift)
Not super important, but there's no need for c.* - just select the columns you will use
DATE_PART requires a string as the first parameter: DATE_PART('mon', current_date)
I might be wrong about this, but I suspect you cannot use column aliases in the partition by or order by of a window function. Put the originating expressions there instead:
... OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
LAG has three parameters. (1) The column you want to retrieve the value from, (2) the row offset, where a positive integer indicates how many rows prior to the current row you should retrieve a value from according to the partition and order context and (3) the value the function should return as a default (in case of the first row in the partition). As such, you don't need NULLIF. So, to get the row immediately prior to the current row, or return 0 in case the current row is the first row in the partition:
LAG(orders,1,0) OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
If you use 0 as a default in the calculation of what is currently aliased variance, you will almost certainly run into a div/0 error either now, or worse, when you least expect it in the future. You should protect against that with some CASE logic or better, provide a more appropriate default value or even better, calculate the LAG with the default 0, then filter out the 0 rows before doing the calculation.
You can't use column aliases in the GROUP BY. You must reference each field that is not participating in an aggregate in the group by, whether through direct mention (sa.date) or indirectly in an expression (DATE_PART('mon',sa.date))
Your date should be '2021-04-01'
All in all, without sample data, without expected results based on that sample data, and without first removing the syntax errors, it is a tall order to offer advice on the problem that is any more specific than this:
Build the source of the calculation as a completely separate query first. Calculate the LAG in that source query. Only when you've run that source query and verified that the LAG is producing the correct result should you then wrap it as a sub-query or CTE (not sure if Redshift supports these, but presumably) at which point you can filter out the rows with a zero as the denominator (the first month of orders for each customer).
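As a hedged sketch only of that structure (reusing the table/column names from the post, which may not match the real schema; "order" is double-quoted because it is a reserved word, and COALESCE is used instead of LAG's third argument in case the three-argument form isn't available):
-- step 1: monthly totals per customer (verify this part on its own first)
WITH monthly AS (
    SELECT sa.customers,
           DATE_PART('year', sa.date) AS year1,
           DATE_PART('mon',  sa.date) AS month1,
           SUM(sa."order")            AS orders
    FROM stats sa
    JOIN (SELECT DISTINCT d.customers FROM configer d) c
      ON sa.customers = c.customers
    WHERE sa.date >= '2021-04-01'
    GROUP BY sa.customers, DATE_PART('year', sa.date), DATE_PART('mon', sa.date)
),
-- step 2: previous month's total, defaulting to 0 for each customer's first month
with_lag AS (
    SELECT m.*,
           COALESCE(LAG(orders, 1) OVER (PARTITION BY customers ORDER BY year1, month1), 0) AS prev_orders
    FROM monthly m
)
-- step 3: only now compute the percentage change, after filtering out the 0 rows
SELECT customers, year1, month1,
       (orders - prev_orders) / CAST(prev_orders AS NUMERIC(10,2)) AS pct_change
FROM with_lag
WHERE prev_orders <> 0
ORDER BY customers, year1, month1;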
Good luck!

IIF Function returning incorrect calculated values - SQL Server

I am writing a query to show returns from placing each-way bets on horse races.
There is an issue with the PlaceProfit result - this should show a return if the horse's finishing position is between 1 and 4, and a loss if the position is >= 5.
It does show the correct return if the horse's finishing position is below 9th, but 10th place and above is being counted as a win.
I include my code below along with the output.
ALTER VIEW EachWayBetting
AS
SELECT a.ID,
RaceDate,
runners,
track.NAME AS Track,
horse.NAME as HorseName,
IndustrySP,
Place AS 'FinishingPosition',
-- // calculates returns on the win & place parts of an each way bet with 1/5 place terms //
IIF(A.Place = '1', 1.0 * (A.IndustrySP-1), '-1') AS WinProfit,
IIF(A.Place <='4', 1.0 * (A.IndustrySP-1)/5, '-1') AS PlaceProfit
FROM dbo.NewRaceResult a
LEFT OUTER JOIN track ON track.ID = A.TrackID
LEFT OUTER JOIN horse ON horse.ID = A.HorseID
WHERE a.Runners > 22
This returns:
As I mention in the comments, the problem is your choice of data type for place, it's varchar. The ordering for a string data type is completely different to that of a numerical data type. Strings are sorted by character from left to right, in the order the characters are ordered in the collation you are using. Numerical data types, however, are ordered from the lowest to highest.
This means that, for a numerical data type, the value 2 has a lower value than 10, however, for a varchar the value '2' has a higher value than '10'. For the varchar that's because the ordering is completed on the first character first. '2' has a higher value than '1' and so '2' has a higher value than '10'.
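A quick throwaway check makes the difference obvious:
-- as varchar the comparison is character by character, so '2' > '10'; as int, 2 < 10
SELECT CASE WHEN '2' > '10' THEN 'varchar: ''2'' is greater' ELSE 'varchar: ''10'' is greater' END AS string_compare,
       CASE WHEN 2 > 10 THEN 'int: 2 is greater' ELSE 'int: 10 is greater' END AS int_compare;
-- returns: varchar: '2' is greater | int: 10 is greater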
The solution here is simple, fix your design; store numerical data in a numerical data type (int seems appropriate here). You're also breaking Normal Form rules, as you're storing other data in the column; mainly the reason a horse failed to be classified. Such data isn't a "Place" but information on why the horse didn't place, and so should be in a separate column.
You can therefore fix this by first adding a new column, then updating its value to hold the non-numerical values (so that Place only contains numerical data), and finally altering your Place column.
ALTER TABLE dbo.YourTable ADD UnClassifiedReason varchar(5) NULL; --Obviously use an appropriate length.
GO
UPDATE dbo.YourTable
SET Place = TRY_CONVERT(int,Place),
UnClassifiedReason = CASE WHEN TRY_CONVERT(int,Place) IS NULL THEN Place END;
GO
ALTER TABLE dbo.YourTable ALTER COLUMN Place int NULL;
GO
If Place does not allow NULL values, you will need to ALTER the column first to allow them.
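For example (the varchar length here is a guess; match it to the current definition of Place):
ALTER TABLE dbo.YourTable ALTER COLUMN Place varchar(50) NULL;
GO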
In addition to fixing the data as Larnu suggests, you should also fix the query:
SELECT nrr.ID, nrr.RaceDate, nrr.runners,
t.NAME AS Track, h.NAME AS HorseName, nrr.IndustrySP,
Place AS FinishingPosition,
-- // calculates returns on the win & place parts of an each way bet with 1/5 place terms //
(CASE WHEN nrr.Place = 1 THEN (nrr.IndustrySP - 1.0) ELSE -1 END) AS WinProfit,
(CASE WHEN nrr.Place <= 4 THEN (nrr.IndustrySP - 1.0) / 5 ELSE -1 END) AS PlaceProfit
FROM dbo.NewRaceResult nrr LEFT JOIN
track t
ON t.ID = nrr.TrackID LEFT JOIN
horse h
ON h.ID = nrr.HorseID
WHERE nrr.Runners > 22;
The important changes are removing single quotes from numbers and column names. It seems you need to understand the differences among strings, numbers, and identifiers.
Other changes are:
Meaningful table aliases, rather than meaningless letters such as a.
Qualifying all column references, so it is clear where columns are coming from.
Switching from IIF() to CASE. IIF() is specific to SQL Server; CASE is standard SQL for conditional expressions (both work fine).
Being sure that the types returned by all branches of the conditional expressions are consistent.
Note: This version will work even if you don't change the type of Place. The strings will be converted to numbers in the appropriate places. I don't advocate relying on such silent conversion, so I recommend fixing the data.
If place can have non-numeric values, then you need to convert them:
(CASE WHEN TRY_CONVERT(int, nrr.Place) = 1 THEN (nrr.IndustrySP - 1.0) ELSE -1 END) AS WinProfit,
(CASE WHEN TRY_CONVERT(int, nrr.Place) <= 4 THEN (nrr.IndustrySP - 1.0) / 5 ELSE -1 END) AS PlaceProfit
But the important point is to fix the data.

POSTGRES - evaluate (sub)expression error as NULL

I have sensor data in a Postgres table measurements with columns id, timestamp, s0, s1, s2, ...
Besides, there is an index on columns (id, timestamp). I want to allow for dynamic math expressions (in the example below: sin(s3)*0.1000/s5) for calculation of derived values.
SELECT
timestamp,
trunc((sin(s3) * 0.1000/s5)::numeric, 3) AS "calculated"
FROM measurements
WHERE id = 42
ORDER BY timestamp DESC
LIMIT 10000;
Obviously, this is prone to a "division by zero" error which will make the query fail. Is there a way to catch this error and return e.g. NULL for the calculated value where the error would occur?
Inspired by
Postgres return null values on function error/failure when casting
Store a formula in a table and use the formula in a function
I already tried defining a postgres function eval_numeric(sensors int[], formula text) that parses the formula and returns NULL on exception. The third row of the SQL statement above now reads
trunc(eval_numeric(ARRAY[s3,s5],'sin(var1)*0.1/var2'), 3) AS "calculated"
This gives the desired behavior but execution time as reported by EXPLAIN ANALYZE increases by a factor of 20 (~20ms -> ~400ms). Any other ideas?
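For reference, a minimal sketch of the kind of function meant here (hypothetical, since the actual implementation isn't shown; it assumes placeholders var1, var2, ... map to positions in the passed array). The EXCEPTION block sets up a subtransaction for every call, which, together with the per-row EXECUTE, accounts for much of the overhead:
CREATE OR REPLACE FUNCTION eval_numeric(sensors double precision[], formula text)
RETURNS numeric
LANGUAGE plpgsql
AS $$
DECLARE
    result numeric;
BEGIN
    -- rewrite var1, var2, ... into references to the array elements, then evaluate dynamically
    EXECUTE 'SELECT (' || regexp_replace(formula, 'var(\d+)', '($1[\1])', 'g') || ')::numeric'
        INTO result
        USING sensors;
    RETURN result;
EXCEPTION
    WHEN OTHERS THEN  -- division_by_zero, invalid input, etc.
        RETURN NULL;
END;
$$;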
UPDATE
The dynamic expression to be evaluated stems from a web application user. So the formula above is only an example (might require checking for negative argument to square root). I'd rather have a generic error checking possibility and would prefer not having logic in the math expression. This would be easier for the end user and I could validate the allowed math e.g. with a math parser thereby preventing SQL injection.
Can you change the expression to this?
SELECT timestamp,
trunc((sin(s3) * 0.1000/nullif(s5, 0))::numeric, 3) AS "calculated"
FROM measurements
WHERE id = 42
ORDER BY timestamp DESC
LIMIT 10000;
This is the simplest way to accomplish what you want to do.
You could also use a CASE expression:
SELECT
timestamp,
CASE WHEN s5 != 0
     THEN trunc((sin(s3) * 0.1000/s5)::numeric, 3)
     ELSE NULL
END AS "calculated"
FROM measurements
WHERE id = 42
ORDER BY timestamp DESC
LIMIT 10000;
This has the potential benefit that you may replace the value with anything you want, including NULL.
Another option, if you don't care about rows which would have triggered a divide by zero, would be to just add the check on s5 to the WHERE clause and filter off those rows before the division happens.
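For example (note this also drops rows where s5 is NULL):
SELECT timestamp,
       trunc((sin(s3) * 0.1000/s5)::numeric, 3) AS "calculated"
FROM measurements
WHERE id = 42
  AND s5 <> 0
ORDER BY timestamp DESC
LIMIT 10000;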

SQL and logical operators and null checks

I've got a vague, possibly cargo-cult memory from years of working with SQL Server that when you've got a possibly-null column, it's not safe to write "WHERE" clause predicates like:
... WHERE the_column IS NULL OR the_column < 10 ...
It had something to do with the fact that SQL rules don't stipulate short-circuiting (and in fact that's kind-of a bad idea possibly for query optimization reasons), and thus the "<" comparison (or whatever) could be evaluated even if the column value is null. Now, exactly why that'd be a terrible thing, I don't know, but I recall being sternly warned by some documentation to always code that as a "CASE" clause:
... WHERE 1 = CASE WHEN the_column IS NULL THEN 1 WHEN the_column < 10 THEN 1 ELSE 0 END ...
(the goofy "1 = " part is because SQL Server doesn't/didn't have first-class booleans, or at least I thought it didn't.)
So my questions here are:
Is that really true for SQL Server (or perhaps back-rev SQL Server 2000 or 2005) or am I just nuts?
If so, does the same caveat apply to PostgreSQL? (8.4 if it matters)
What exactly is the issue? Does it have to do with how indexes work or something?
My grounding in SQL is pretty weak.
I don't know SQL Server so I can't speak to that.
Given an expression a L b for some logical operator L, there is no guarantee that a will be evaluated before or after b or even that both a and b will be evaluated:
Expression Evaluation Rules
The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.
Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all.
[...]
Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.
As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan.
As far as an expression of the form:
the_column IS NULL OR the_column < 10
is concerned, there's nothing to worry about since NULL < n is NULL for all n, even NULL < NULL evaluates to NULL; furthermore, NULL isn't true so
null is null or null < 10
is just a complicated way of saying true or null and that's true regardless of which sub-expression is evaluated first.
The whole "use a CASE" sounds mostly like cargo-cult SQL to me. However, like most cargo-cultism, there is a kernel a truth buried under the cargo; just below my first excerpt from the PostgreSQL manual, you will find this:
When it is essential to force evaluation order, a CASE construct (see Section 9.16) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:
SELECT ... WHERE x > 0 AND y/x > 1.5;
But this is safe:
SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;
So, if you need to guard against a condition that will raise an exception or have other side effects, then you should use a CASE to control the order of evaluation as a CASE is evaluated in order:
Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner.
So given this:
case when A then Ra
when B then Rb
when C then Rc
...
A is guaranteed to be evaluated before B, B before C, etc. and evaluation stops as soon as one of the conditions evaluates to a true value.
In summary, a CASE short-circuits but neither AND nor OR short-circuits, so you only need to use a CASE when you need to protect against side effects.
Instead of
the_column IS NULL OR the_column < 10
I'd do
isnull(the_column,0) < 10
or for the first example
WHERE 1 = CASE WHEN isnull(the_column,0) < 10 THEN 1 ELSE 0 END ...
I've never heard of such a problem, and this bit of SQL Server 2000 documentation uses WHERE advance < $5000 OR advance IS NULL in an example, so it must not have been a very stern rule. My only concern with OR is that it has lower precedence than AND, so you might accidentally write something like WHERE the_column IS NULL OR the_column < 10 AND the_other_column > 20 when that's not what you mean; but the usual solution is parentheses rather than a big CASE expression.
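That is, the parenthesized form makes the intent explicit:
WHERE (the_column IS NULL OR the_column < 10)
  AND the_other_column > 20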
I think that in most RDBMSes, indices don't include null values, so an index on the_column wouldn't be terribly useful for this query; but even if that weren't the case, I don't see why a big CASE expression would be any more index-friendly.
(Of course, it's hard to prove a negative, and maybe someone else will know what you're referring to?)
Well, I've repeatedly written queries like the first example since about forever (heck, I've written query generators that generate queries like that), and I've never had a problem.
I think you may be remembering some admonishment somebody gave you sometime against writing funky join conditions that use OR. In your first example, the conditions joined by the OR restrict the same one column of the same table, which is OK. If your second condition was a join condition (i.e., it restricted columns from two different tables), then you could get into bad situations where the query planner just has no choice but to use a Cartesian join (bad, bad, bad!!!).
I don't think your CASE function is really doing anything there, except perhaps hamper your query planner's attempts at finding a good execution plan for the query.
But more generally, just write the straightforward query first and see how it performs for realistic data. No need to worry about a problem that might not even exist!
Nulls can be confusing. The "... WHERE 1 = CASE ..." construct is useful if you are trying to pass a NULL or a value as a parameter, e.g. WHERE the_column = @parameter. This post may be helpful: Passing Null using OLEDB.
Another example where CASE is useful is when using date functions on varchar columns: simply adding an ISDATE check before, say, CONVERT(datetime, colA) might not be enough on its own, and when colA has non-date data the query can error out.
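A sketch of that guard (colA and the table name are placeholders): the CASE ensures CONVERT only runs on values that ISDATE accepts.
SELECT CASE WHEN ISDATE(colA) = 1 THEN CONVERT(datetime, colA) END AS colA_as_date
FROM dbo.SomeTable;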