How to check if a float is between multiple ranges in Postgres? - sql

I'm trying to write a query like this:
SELECT * FROM table t
WHERE ((long_expression BETWEEN -5 AND -2) OR
(long_expression BETWEEN 0 AND 2) OR
(long_expression BETWEEN 4 and 6))
Where long_expression is approximately equal to this:
(((t.s <#> (SELECT s FROM user WHERE user.user_id = $1)) / (SELECT COUNT(DISTINCT cluster_id) FROM cluster) * -1) + 1)
t.s and s are the CUBE datatypes and <#> is the indexed distance operator.
I could just repeat this long expression multiple times in the body, but this would be extremely verbose. An alternative might be to save it in a variable somehow (with a CTE?), but I think this might remove the possibility of using an index in the WHERE clause?
I also found int4range and numrange, but I don't believe they would work here either, because the distance operator returns float8 values, not integers or numerics.

You can use a lateral join:
SELECT t.*
FROM table t CROSS JOIN LATERAL
     (VALUES (long_expression)) v(x)
WHERE (v.x BETWEEN -5 AND -2) OR
      (v.x BETWEEN 0 AND 2) OR
      (v.x BETWEEN 4 AND 6);
Of course, a CTE or subquery could be used as well; I like lateral joins because they make it easy to express multiple expressions that depend on previously computed values.
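A sketch of the subquery alternative mentioned above, with long_expression again standing in for the full distance calculation:

```sql
-- Compute the expression once in a derived table, then filter on the alias.
SELECT sub.*
FROM (
   SELECT t.*, long_expression AS x
   FROM table t
   ) sub
WHERE (sub.x BETWEEN -5 AND -2)
   OR (sub.x BETWEEN 0 AND 2)
   OR (sub.x BETWEEN 4 AND 6);
```

As with the LATERAL form, whether the planner can still use the <#> index depends on how it inlines the derived table; check with EXPLAIN.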

Related

Average interval between timestamps in an array

In a PostgreSQL 9.x database, I have a column which is an array of type timestamp. Each array has between 1..n timestamps.
I'm trying to extract the average interval between all elements in each array.
I understand using a window function on the source table might be the ideal way to tackle this but in this case I am trying to do it as an operation on the array.
I've looked at several other questions that are trying to calculate the moving average of another column etc or the avg (median date of a list of timestamps).
For example the average interval I'm looking for on an array with 3 elements like this:
'{"2012-10-09 17:04:05.710887"
,"2013-10-18 22:30:08.973749"
,"2014-10-22 22:18:18.885973"}'::timestamp[]
Would be:
-368d
Wondering if I need to unpack the array through a function?
One way of many possible: unnest, join, avg in a lateral subquery:
SELECT *
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT avg(a2.ts - a1.ts) AS avg_intv
   FROM   unnest(t.arr) WITH ORDINALITY a1(ts, ord)
   JOIN   unnest(t.arr) WITH ORDINALITY a2(ts, ord) ON (a2.ord = a1.ord + 1)
   ) avg ON true;
db<>fiddle here
The [INNER] JOIN in the subquery produces exactly the set of combinations relevant for intervals between elements.
I get 371 days 14:37:06.587543, not '-368d', btw.
Related, with more explanation:
PostgreSQL unnest() with element number
You can also only unnest once and use the window functions lead() or lag(), but you were trying to avoid window functions. And you need to make sure of the original order of elements in any case ...
(There is no array function you could use directly to get what you need - in case you were hoping for that.)
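The lead()/lag() variant might look like this (a sketch using the same tbl / arr names as above; the ORDER BY ord preserves the original element order):

```sql
-- Unnest once, pair each element with its predecessor via lag(),
-- then average the differences (the first row's NULL is ignored by avg).
SELECT t.*, i.avg_intv
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT avg(ts - prev_ts) AS avg_intv
   FROM  (
      SELECT ts, lag(ts) OVER (ORDER BY ord) AS prev_ts
      FROM   unnest(t.arr) WITH ORDINALITY a(ts, ord)
      ) d
   ) i ON true;
```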
Alternative with CTE
Might be appealing to still unnest only once (even while avoiding window functions):
SELECT *
FROM   tbl t
LEFT   JOIN LATERAL (
   WITH a AS (SELECT * FROM unnest(t.arr) WITH ORDINALITY a1(ts, ord))
   SELECT avg(a2.ts - a1.ts) AS avg_intv
   FROM   a a1
   JOIN   a a2 ON (a2.ord = a1.ord + 1)
   ) avg ON true;
But I expect the added CTE overhead to cost more than unnesting twice. Mostly just demonstrating a WITH clause in a subquery.

Simplifying a SQL Select Statement in Oracle SQL Developer

The problem I'm facing right now is that I'm working with a SQL query that has over 200 lines of code, and in multiple places I'm repeating the same sub-expression. In the code below I use two of the computed columns a lot, "avail_qty" and "pct_avail", both of which contain equations. Inside the "LOW_CNT_&%" CASE expression I use both of those expressions over and over (this is just one example from my code). I would like to be able to write each equation once and assign it to a variable. Is there any way of doing this? I have tried using the WITH clause, but for that you need to use a FROM clause; my FROM clause is massive and would look just as ugly if I were to use a WITH clause (plus, instead of repeating the SELECT statement, now I would just be repeating the FROM statement).
The reason I don't want to type out the whole equation multiple times is twofold. First, it makes the code easier to read. Second, multiple people edit this query, and if someone edits the equation in one spot but forgets to edit it in another, that could be bad. It also doesn't feel like good code etiquette to repeat code over and over.
SELECT
    all_nbr.total_qty,
    NVL (avail_nbr.avail_qty, 0) AS avail_qty,
    100 * TRUNC ( (NVL (avail_nbr.avail_qty, 0) / all_nbr.total_qty), 2) AS pct_avail,
    CASE
        WHEN (NVL (avail_nbr.avail_qty, 0)) < 35
        THEN CASE
                 WHEN (100 * TRUNC ( (NVL (avail_nbr.avail_qty, 0) / all_nbr.total_qty), 2)) < 35
                 THEN (35 - (NVL (avail_nbr.avail_qty, 0)))
                 ELSE 0
             END
        ELSE 0
    END AS "LOW_CNT_&%"
FROM
...
Any help would be awesome!!
If the subquery is exactly the same one, you can pre-compute it as a Common Table Expression (CTE). For example:
with
cte1 as (
select ... -- long, tedious, repetitive SELECT here
),
cte2 as (
select ... -- you can reference/use cte1 here
)
select ...
from cte1 -- you can use cte1 here, multiple times if you want
join cte2 -- you can also reference/use cte2 here, also multiple times
join ... -- all other joins
cte1 (you can use any name) is a precomputed table expression that can be used multiple times. You can also have multiple CTEs, each with a different name; each CTE can also reference the previous ones.
I have tried using the WITH clause, but for that you need to use a FROM clause; my FROM clause is massive and would look just as ugly if I were to use a WITH clause (plus, instead of repeating the SELECT statement, now I would just be repeating the FROM statement).
You shouldn't need to repeat the from clause. You move all of the query, including that clause, into the CTE; you just pull out the bits that rely on earlier calculations into the main query, which avoids the code repetition.
The structure would be something like:
WITH cte AS (
    SELECT
        all_nbr.total_qty,
        NVL (avail_nbr.avail_qty, 0) AS avail_qty,
        100 * TRUNC ( (NVL (avail_nbr.avail_qty, 0) / all_nbr.total_qty), 2) AS pct_avail
    FROM
        ...
)
SELECT
    cte.total_qty,
    cte.avail_qty,
    cte.pct_avail,
    CASE
        WHEN cte.avail_qty < 35
        THEN CASE
                 WHEN cte.pct_avail < 35
                 THEN 35 - cte.avail_qty
                 ELSE 0
             END
        ELSE 0
    END AS "LOW_CNT_&%"
FROM
    cte;
Your main query only needs to refer to the CTE (again, based on what you've shown), and can (only) refer to the projection of the CTE, including the calculated columns. It can't see the underlying tables, but shouldn't need to.
Or with an inline view instead; the principle is the same:
SELECT
    total_qty,
    avail_qty,
    pct_avail,
    CASE
        WHEN avail_qty < 35
        THEN CASE
                 WHEN pct_avail < 35
                 THEN 35 - avail_qty
                 ELSE 0
             END
        ELSE 0
    END AS "LOW_CNT_&%"
FROM
    (
    SELECT
        all_nbr.total_qty,
        NVL (avail_nbr.avail_qty, 0) AS avail_qty,
        100 * TRUNC ( (NVL (avail_nbr.avail_qty, 0) / all_nbr.total_qty), 2) AS pct_avail
    FROM
        ...
    );

Teradata column not sorting correctly

I'm trying to join two tables by a column, and sort a table by the same column.
Here is some example data from the two tables:
table.x
state
00039
01156
table.y
state
39
1156
How do I join and sort the tables in SQL assistant?
The simplest solution would be to cast both sides to integer, as @Andrew mentioned. You could use a simple CAST, or TRYCAST(...), which tries to cast the value and, if that fails, returns NULL instead of an error:
select *
from x
inner join y
  on trycast(x.state as integer) = trycast(y.state as integer)
order by y.state
Old answer (leaving this here for sake of future readers and what you can / can't do):
If you have a recent version of Teradata (you didn't specify it), you also have the LPAD function. Assuming that y.state is not text but a number, we also need to cast it, since LPAD takes a string argument. If it is already text, omit the CAST(...):
select *
from x
inner join y
  on x.state = lpad(cast(y.state as varchar(5)), 5, '0')
order by y.state
If you don't have the LPAD function, then some dirty code with SUBSTRING might come in handy:
select *
from x
inner join y
  on x.state = substring('00000' from char_length(cast(y.state as varchar(5))) + 1) || cast(y.state as varchar(5))
order by y.state
The above assumes that you store numbers with at most 5 digits. If a value can go beyond that (your sample data says 5), you need to adjust the code.
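Going the other way - stripping the zeros instead of adding them - is also an option; a sketch, assuming x.state holds digits only (casting y.state to text in case it is numeric):

```sql
-- Strip leading zeros from x.state rather than zero-padding y.state.
select *
from x
inner join y
  on trim(leading '0' from x.state) = trim(cast(y.state as varchar(5)))
order by y.state
```

As with LPAD, wrapping a column in a function generally prevents index use on that side of the join.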

SQL Server - Multiplying row values for a given column value [duplicate]

I'm looking for something like SELECT PRODUCT(table.price) FROM table GROUP BY table.sale, similar to how SUM works.
Have I missed something on the documentation, or is there really no PRODUCT function?
If so, why not?
Note: I looked for the function in postgres, mysql and mssql and found none so I assumed all sql does not support it.
For MSSQL you can use this. It can be adapted for other platforms: it's just maths and aggregates on logarithms.
SELECT
GrpID,
CASE
WHEN MinVal = 0 THEN 0
WHEN Neg % 2 = 1 THEN -1 * EXP(ABSMult)
ELSE EXP(ABSMult)
END
FROM
(
SELECT
GrpID,
--log of +ve row values
SUM(LOG(ABS(NULLIF(Value, 0)))) AS ABSMult,
--count of -ve values. Even = +ve result.
SUM(SIGN(CASE WHEN Value < 0 THEN 1 ELSE 0 END)) AS Neg,
--anything * zero = zero
MIN(ABS(Value)) AS MinVal
FROM
Mytable
GROUP BY
GrpID
) foo
Taken from my answer here: SQL Server Query - groupwise multiplication
I don't know why there isn't one, but (taking more care over negative numbers) you can use logs and exponents to do:
select exp (sum (ln (table.price))) from table ...
There is no PRODUCT set function in the SQL Standard. It would appear to be a worthy candidate, though (unlike, say, a CONCATENATE set function: it's not a good fit for SQL e.g. the resulting data type would involve multivalues and pose a problem as regards first normal form).
The SQL Standards aim to consolidate functionality across SQL products circa 1990 and to provide 'thought leadership' on future development. In short, they document what SQL does and what SQL should do. The absence of a PRODUCT set function suggests that in 1990 no vendor thought it worthy of inclusion, and that there has been no academic interest in introducing it into the Standard since.
Of course, vendors have always sought to add their own functionality, these days usually as extensions to the Standards rather than tangentially to them. I don't recall seeing a PRODUCT set function (or even demand for one) in any of the SQL products I've used.
In any case, the workaround is fairly simple using the LOG and EXP scalar functions (and logic to handle negatives) with the SUM set function; see @gbn's answer for some sample code. I've never needed to do this in a business application, though.
In conclusion, my best guess is that there is no demand from SQL end users for a PRODUCT set function; further, that anyone with an academic interest would probably find the workaround acceptable (i.e. would not value the syntactic sugar a PRODUCT set function would provide).
Out of interest, there is indeed demand in SQL Server Land for new set functions but for those of the window function variety (and Standard SQL, too). For more details, including how to get involved in further driving demand, see Itzik Ben-Gan's blog.
You can perform a product aggregate function, but you have to do the maths yourself, like this...
SELECT
Exp(Sum(IIf(Abs([Num])=0,0,Log(Abs([Num])))))*IIf(Min(Abs([Num]))=0,0,1)*(1-2*(Sum(IIf([Num]>=0,0,1)) Mod 2)) AS P
FROM
Table1
Source: http://productfunctionsql.codeplex.com/
There is a neat trick in T-SQL (not sure if it's ANSI) that allows concatenating string values from a set of rows into one variable. It looks like it works for multiplication as well:
declare @Floats as table (value float)

insert into @Floats values (0.9)
insert into @Floats values (0.9)
insert into @Floats values (0.9)

declare @multiplier float = null

select @multiplier = isnull(@multiplier, 1) * value
from @Floats

select @multiplier
This can potentially be more numerically stable than the log/exp solution.
I think that is because no numbering system is able to accommodate products of many values. As databases are designed for large numbers of records, a product of 1000 numbers would be super massive, and in the case of floating-point numbers, the propagated error would be huge.
Also note that using LOG can be a dangerous solution. Although mathematically log(a*b) = log(a) + log(b), that might not hold exactly in computers, as we are not dealing with real numbers. If you calculate exp(log(a) + log(b)) instead of a*b, you may get unexpected results. For example:
SELECT 9999999999*99999999974482, EXP(LOG(9999999999)+LOG(99999999974482))
in Sql Server returns
999999999644820000025518, 9.99999999644812E+23
So my point is: when you are trying to compute the product, do it carefully and test it heavily.
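When the inputs are known to be integers, one common mitigation is to round the floating-point result back; a sketch reusing the price / sale column names from the question (the table name t is a placeholder; zero and negative handling is omitted - see the answers above):

```sql
-- Round the EXP/LOG result to the nearest integer. Only safe when the true
-- product is integral and within double precision's ~15 significant digits.
SELECT sale, ROUND(EXP(SUM(LOG(price)))) AS product
FROM   t
GROUP BY sale;
```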
One way to deal with this problem (if you are working in a scripting language) is to use the group_concat function.
For example, SELECT group_concat(table.price) FROM table GROUP BY table.sale
This will return a string with all prices for the same sale value, separated by a comma.
Then with a parser you can get each price, and do a multiplication. (In php you can even use the array_reduce function, in fact in the php.net manual you get a suitable example).
Cheers
Another approach, based on the fact that the cardinality of a cartesian product is the product of the cardinalities of the participating sets ;-)
⚠ WARNING: This example is just for fun and is rather academic, don't use it in production! (apart from the fact it's just for positive and practically small integers)⚠
with recursive t(c) as (
select unnest(array[2,5,7,8])
), p(a) as (
select array_agg(c) from t
union all
select p.a[2:]
from p
cross join generate_series(1, p.a[1])
)
select count(*) from p where cardinality(a) = 0;
The problem can be solved using modern SQL features such as window functions and CTEs. Everything is standard SQL and - unlike logarithm-based solutions - does not require switching from the integer world to the floating-point world, nor handling nonpositive numbers. Just number the rows and evaluate the product in a recursive query until no rows remain:
with recursive t(c) as (
select unnest(array[2,5,7,8])
), r(c,n) as (
select t.c, row_number() over () from t
), p(c,n) as (
select c, n from r where n = 1
union all
select r.c * p.c, r.n from p join r on p.n + 1 = r.n
)
select c from p where n = (select max(n) from p);
As your question involves grouping by the sale column, things get a little more complicated, but it's still solvable:
with recursive t(sale,price) as (
select 'multiplication', 2 union
select 'multiplication', 5 union
select 'multiplication', 7 union
select 'multiplication', 8 union
select 'trivial', 1 union
select 'trivial', 8 union
select 'negatives work', -2 union
select 'negatives work', -3 union
select 'negatives work', -5 union
select 'look ma, zero works too!', 1 union
select 'look ma, zero works too!', 0 union
select 'look ma, zero works too!', 2
), r(sale,price,n,maxn) as (
select t.sale, t.price, row_number() over (partition by sale), count(1) over (partition by sale)
from t
), p(sale,price,n,maxn) as (
select sale, price, n, maxn
from r where n = 1
union all
select p.sale, r.price * p.price, r.n, r.maxn
from p
join r on p.sale = r.sale and p.n + 1 = r.n
)
select sale, price
from p
where n = maxn
order by sale;
Result:
sale,price
"look ma, zero works too!",0
multiplication,560
negatives work,-30
trivial,8
Tested on Postgres.
Here is an Oracle solution for anyone who needs it:
with data(id, val) as(
select 1,1.0 from dual union all
select 2,-2.0 from dual union all
select 3,1.0 from dual union all
select 4,2.0 from dual
),
neg(val , modifier) as(
select exp(sum(ln(abs(val)))), case when mod(count(*),2) = 0 then 1 Else -1 end
from data
where val <0
)
,
pos(val) as (
select exp(sum(ln(val)))
from data
where val >=0
)
select (select val*modifier from neg)*(select val from pos) product from dual

SQL return exactly one row or null in a select sub-query

In Oracle, is it possible to have a sub-query within a select statement that returns a column if exactly one row is returned by the sub-query and null if none or more than one row is returned by the sub-query?
Example:
SELECT X,
Y,
Z,
(SELECT W FROM TABLE2 WHERE X = TABLE1.X) /* but return null if 0 or more than 1 rows is returned */
FROM TABLE1;
Thanks!
How about going about it in a different way? A simple LEFT OUTER JOIN with a subquery should do what you want:
SELECT T1.X
      ,T1.Y
      ,T1.Z
      ,T2.W
FROM TABLE1 T1
LEFT OUTER JOIN (
    SELECT X
          ,MIN(W) AS W
    FROM TABLE2
    GROUP BY X
    HAVING COUNT(*) = 1
    ) T2 ON T2.X = T1.X;
This will only return items that have exactly 1 instance of X, and LEFT OUTER JOIN them back to the table when appropriate (leaving the non-matches NULL). Note that Oracle does not allow the AS keyword for table aliases, so it is omitted here.
This is also ANSI-standard SQL, so it is portable across databases.
Besides a CASE solution or rewriting the inline subquery as an outer join, this will work, if you can apply an aggregate function (MIN or MAX) on the W column:
SELECT X,
Y,
Z,
(SELECT MIN(W) FROM TABLE2 WHERE X = TABLE1.X HAVING COUNT(*) = 1) AS W
FROM TABLE1;
SELECT
X, Y, Z, (SELECT W FROM TABLE2 WHERE X = TABLE1.X HAVING COUNT(*) = 1)
FROM
TABLE1;
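For completeness, the CASE solution mentioned above might be sketched like this (two scalar subqueries per outer row, so likely slower than the aggregate version; table names follow the question):

```sql
-- Return W only when exactly one matching row exists; otherwise NULL.
SELECT T1.X,
       T1.Y,
       T1.Z,
       CASE WHEN (SELECT COUNT(*) FROM TABLE2 T2 WHERE T2.X = T1.X) = 1
            THEN (SELECT MAX(W) FROM TABLE2 T2 WHERE T2.X = T1.X)
            ELSE NULL
       END AS W
FROM TABLE1 T1;
```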
My answer is: don't use subselects (unless you are sure ...).
There is no need, and it is not a good idea, to use a subselect here, as PlantTheIdea mentioned, for two reasons.
Explanation:
A subselect means: one select for each row of the primary select's result set, i.e. if you get 1000 rows, you also get 1000 (small) select statements hitting your DB system (ignoring the optimizer here).
And(!)
With a subselect you have a good chance of hiding (or overriding) a serious database or query problem. That is: you expect either no row (NULL) or exactly one row (both easily resolvable with a [LEFT OUTER] JOIN). If your subselect returns more than one row, something is wrong, and the SQL error points that out.
The "HAVING COUNT(X) = 1", while of course correct, has the small (or not so small) problem: "why is there a count of more than one row in the first place?"
I have spent hours of my lifetime on workarounds like this, only to end up with "don't do it unless you are really sure ..."
This is in contrast to a "having" like this:
...
HAVING date = max(date) -- depends on the SQL dialect
or
WHERE date = (SELECT max(date) FROM same_table)
And with my last example I again want to point out: if you get more than one row here (both from today ;-), you have a DB problem - you should use a timestamp instead, for example.