I have an SQL table of (x,y) values.
x y
0.0 0.0
0.1 0.4
0.5 1.0
5.0 2.0
6.0 4.0
8.0 4.0
10.0 5.0
The x column is indexed. I am using SQLite.
My ultimate goal is to get y(x) for any x value, computed by linear interpolation between the table values, similar to what is shown in the plot below.
Is there a way to perform the linear interpolation directly using a select query?
Otherwise, getting the endpoints of the interval that x falls into would be enough.
Is there a query that will give me the last smaller and the first bigger (x, y) pair for a given x, so that I can compute the interpolated y(x) value?
For example if x=2.0 to get:
0.5 1.0
5.0 2.0
In case x is outside the table's range, getting the first/last two values would be enough to perform an extrapolation.
For example if x=20.0 to get:
8.0 4.0
10.0 5.0
It would be hard to do this in plain SQLite without analytical (window) functions. In more full-featured SQL engines, you could use the LAG and LEAD analytical functions to obtain the pair of rows you want easily enough.
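As a rough sketch of that approach, assuming the table is named points and :x is the bound lookup value (newer SQLite releases, 3.25+, also have window functions, so this runs there too):
SELECT *
FROM (
    SELECT x AS x1,
           y AS y1,
           LEAD(x) OVER (ORDER BY x) AS x2,
           LEAD(y) OVER (ORDER BY x) AS y2
    FROM points
) AS pairs
WHERE x1 <= :x AND (x2 > :x OR x2 IS NULL);
For an x beyond the last point this returns only the final row (x2 is NULL), so the extrapolation case still needs its own handling.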
In plain SQLite, though, I would create two cursors, like these:
Cursor C1:
SELECT
x,y
FROM
table
WHERE
x>=2
ORDER BY
x asc
;
Cursor C2:
SELECT
x,y
FROM
table
WHERE
x<=2
ORDER BY
x desc
;
Then perform the rest of the operations in the host language: fetch once from both cursors, or, if one cursor returns no row, fetch twice from the other. Some additional edge cases also need to be handled: what if your set has fewer than two values, or what if the given x is already in your set, so you do not need to interpolate at all... and so on.
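If a single round trip is preferred, the two cursors can also be merged into one query that returns at most the two bracketing rows. A sketch, again assuming a table named points and a bound parameter :x:
SELECT x, y FROM (
    SELECT x, y FROM points WHERE x <= :x ORDER BY x DESC LIMIT 1
)
UNION ALL
SELECT x, y FROM (
    SELECT x, y FROM points WHERE x > :x ORDER BY x ASC LIMIT 1
);
For :x = 2.0 this yields (0.5, 1.0) and (5.0, 2.0); for :x = 20.0 only (10.0, 5.0) comes back, so the extrapolation case still needs the extra fetch described above.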
I would go with a simple subtraction.
You are looking for the two nearest inputs, so:
SELECT x, y
FROM my_table
ORDER BY Abs(:val - x)
LIMIT 2
However, this will lead to a full table scan.
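To answer the original "directly using a select query" part: the interpolation itself can be done in one query with two scalar subqueries that pick the bracketing rows, which also avoids the caveat that the two nearest rows by absolute distance may both lie on the same side of the lookup value. A sketch, assuming a table named points and a bound parameter :x:
-- y(:x) = y1 + (y2 - y1) * (:x - x1) / (x2 - x1)
SELECT lo.x AS x1, lo.y AS y1,
       hi.x AS x2, hi.y AS y2,
       lo.y + (hi.y - lo.y) * (:x - lo.x) / (hi.x - lo.x) AS y_interp
FROM (SELECT x, y FROM points WHERE x <= :x ORDER BY x DESC LIMIT 1) AS lo,
     (SELECT x, y FROM points WHERE x >  :x ORDER BY x ASC  LIMIT 1) AS hi;
When :x lies outside the table, one of the subqueries is empty and no row is returned, so extrapolation still has to be handled separately (for example with the ABS/LIMIT 2 query above).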
I have a dataset of float values with 6 decimals. I need to round it to two decimals. The problem arises with some floats that are very close to 1: after applying round(2) I get 1.00 instead of 0.99. Maybe this is mathematically right, but I need to have values like 0.99. My customer needs two decimals as the result; I can't change it.
ones_list = [0.998344, 0.996176, 0.998344, 0.998082]
df = pd.DataFrame(ones_list, columns=['value_to_round'])
df['value_to_round'].round(2)
0    1.0
1    1.0
2    1.0
3    1.0
Name: value_to_round, dtype: float64
I see a few options:
use floor instead of round (but would you have the same issue with 0.005?)
use clip to set a maximum (and a min?) value in the column of 0.99:
df['value_to_round'].round(2).clip(upper=0.99)
Please refer to the basics of rounding in math: you are trying to round to 2 digits after the decimal point using .round(2).
If you round 0.999 using .round(2), of course you'll get 1.0, because the last '9' digit (0.009) becomes 0.01, which is added to 0.09 to give 0.1, which is added to 0.9 and finally becomes 1.0.
If you really want values like 0.99, just take the two decimals after the dot without rounding. You can try either of the following methods:
import math
# Scale up, truncate toward zero, then scale back down: keeps two decimals without rounding up.
df['value_to_round'] = df['value_to_round'] * 100
df['value_to_round'] = df['value_to_round'].apply(math.trunc)
df['value_to_round'] = df['value_to_round'] / 100
or
# Slice the first four characters of the string form ("0.99"); this relies on every value looking like 0.xxxxxx.
df['value_to_round'] = df['value_to_round'].astype(str)
df['value_to_round'] = df['value_to_round'].str[:4]
df['value_to_round'] = df['value_to_round'].astype(float)
I experienced the same thing when I was trying to show an R-squared value; what I did was just use .round(3), because three decimal digits wouldn't hurt.
I hope this helps :)
df['value_to_round'] = [x if x < 1 else 0.99 for x in df['value_to_round'].round(2)]
I would like to do a fuzzy sum with raster data in R to form a cumulative resistance layer for research. I have found packages and functions to do fuzzy sums with vector data and was wondering if anyone can share resources specifically for combining raster layers with fuzzy logic.
Thank you
You can use spatialEco::fuzzySum for both vector and raster data.
For example, for three terra rasters rast1, rast2, and rast3, it would work as follows:
rFuzzySum <- spatialEco::fuzzySum(c(rast1, rast2, rast3))
Written out explicitly, it would be:
rFuzzySum <- (1 - ( (1 - rast1) *
(1 - rast2) *
(1 - rast3) ) )
Here is an illustration of how you can do that, using the suggestions by MattKummu.
Example data
library(terra)
x <- rast(system.file("ex/logo.tif", package="terra"))
x <- x / max(minmax(x))
Two approaches
a <- 1 - prod(1 - x)
b <- spatialEco::fuzzySum(x)
Is there a way to generate a normally distributed series in BQ, ideally specifying the mean and sd of the distribution?
I found a way using the Marsaglia polar method, but it is not ideal, because I do not want the polar coordinates of the distribution; I want to generate an array that follows the specified parameters so that it is normally distributed.
Thank you in advance.
This query gives you (x, f(x)) points of the normal density centred at 0. You can adjust both the mean (the mean variable) and the variance (the variance variable), as well as the x-axis values (GENERATE_ARRAY(beginning, end, step)):
CREATE TEMPORARY FUNCTION normal(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
var mean = 0;
var variance = 1;
// normalisation factor: 1 / sqrt(2 * pi * variance)
var x0 = 1 / (Math.sqrt(2 * Math.PI * variance));
// exponent: -(x - mean)^2 / (2 * variance)
var x1 = -Math.pow(x - mean, 2) / (2 * variance);
return x0 * Math.pow(Math.E, x1);
""";
WITH numbers AS
(SELECT x FROM UNNEST(GENERATE_ARRAY(-10, 10,0.5)) AS x)
SELECT x, normal(x) as normal
FROM numbers;
To do that, I used "User Defined Functions" [1]. They are used when you want to embed another SQL expression or when you want to use JavaScript (as I did).
NOTE: I used the probability density function of the normal distribution; if you want a different one, you'd need to change the variables x0 and x1 and the return value (I wrote them separately so it's clearer).
Earlier answers give the probability density function of a normal rv. Here I modify the previous answers to give a random number generated with the desired distribution, in BQ standard SQL, using the 'polar coordinates' method. The question asks not to use polar coordinates, which is an odd request, since polar coordinates are not used in the generation of the normally distributed random number.
CREATE TEMPORARY FUNCTION rnorm(mu FLOAT64, sigma FLOAT64) AS
(
  -- Box-Muller transform: sqrt(-2 * ln(U1)) * cos(2 * pi * U2), scaled by sigma and shifted by mu
  (SELECT mu + sigma * SQRT(2 * ABS(LOG(RAND()))) * COS(2 * ACOS(-1) * RAND()))
);
select
num ,
rnorm(-1, 5.3) as RAND_NORM
FROM UNNEST(GENERATE_ARRAY(1, 17) ) AS num
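To sanity-check the parameters, the same function can be aggregated over a larger generated array; the sample mean and standard deviation should land near mu = -1 and sigma = 5.3 (a rough sketch):
SELECT AVG(rnorm(-1, 5.3))    AS approx_mean,
       STDDEV(rnorm(-1, 5.3)) AS approx_sd
FROM UNNEST(GENERATE_ARRAY(1, 100000)) AS num;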
The easiest way to do it in BQ is by creating a custom function:
CREATE OR REPLACE FUNCTION
`your_project.functions.normal_distribution_pdf`
(x ANY TYPE, mu ANY TYPE, sigma ANY TYPE) AS (
(
SELECT
safe_divide(1,sigma * power(2 * ACOS(-1),0.5)) * exp(-0.5 * power(safe_divide(x-mu,sigma),2))
)
);
Next you only need to apply the function:
with inputs as (
SELECT 1 as x, 0 as mu, 1 as sigma
union all
SELECT 1.5 as x, 1 as mu, 2 as sigma
union all
SELECT 2 as x , 2 as mu, 3 as sigma
)
SELECT x,
`your_project.functions.normal_distribution_pdf`(x, mu, sigma) as normal_pdf
from
inputs
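If the goal is a whole series rather than a few handpicked points, the same function can be applied over a generated x range (a sketch, reusing the your_project.functions path from above):
SELECT x,
       `your_project.functions.normal_distribution_pdf`(x, 0, 1) AS normal_pdf
FROM UNNEST(GENERATE_ARRAY(-4, 4, 0.1)) AS x
ORDER BY x;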
I have a problem when I'm trying to calculate, in a view, a formula whose result is smaller than 1.
e.g. I have the following formula: Arenda*TotalArea/10000 as TotalArenda
If I have Arenda=10 and TotalArea=10 I get TotalArenda=0.00 when it should normally be 0.01.
Thanks
Make Arenda = 10.0 and TotalArea = 10.0 instead of 10 and 10. This will force SQL not to use integer math, and you will get the accuracy you need.
In fact, the only way I can get 0.0 as the result is if Arenda is 10 (an integer) while at least one of TotalArea or 10000 contains a decimal point and a trailing 0, and only if I override the order of operations by grouping with parentheses, such as
select 10.0* (10/10000) as blah
If all are integers, you get 0. If all contain decimals, you get 0.01. If I remove the parentheses, I get 0.01 if ANY of them is a non-integer type.
If precision is highly important, I would recommend casting to decimals, not floats:
select CONVERT(decimal(10,2), Arenda) * CONVERT(decimal(10,2), TotalArea) / 10000.0
You are using columns, so changing the type may not be feasible. SQL Server does integer division on integers (other databases behave differently). Try one of these:
cast(Arenda as float)*cast(TotalArea as float)/10000
or:
Arenda*TotalArea/10000.0
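A minimal illustration of the difference, runnable on its own in SQL Server (the literals stand in for Arenda = 10 and TotalArea = 10):
SELECT 10 * 10 / 10000                               AS int_math,     -- 0: integer division
       CAST(10 AS float) * CAST(10 AS float) / 10000 AS float_math,   -- 0.01
       10 * 10 / 10000.0                             AS decimal_math; -- non-zero: decimal division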
I have a question about the SQL standard which I'm hoping a SQL language lawyer can help with.
Certain expressions just don't work: 62 / 0, for example. The SQL standard specifies quite a few ways in which expressions can go wrong in similar ways. Lots of languages deal with these expressions using special exceptional flow control, or bottom pseudo-values.
I have a table, t, with (only) two columns, x and y each of type int. I suspect it isn't relevant, but for definiteness let's say that (x,y) is the primary key of t. This table contains (only) the following values:
x y
7 2
3 0
4 1
26 5
31 0
9 3
What behavior is required by the SQL standard for SELECT expressions operating on this table which may involve division(s) by zero? Alternatively, if no one behavior is required, what behaviors are permitted?
For example, what behavior is required for the following select statements?
The easy one:
SELECT x, y, x / y AS quot
FROM t
A harder one:
SELECT x, y, x / y AS quot
FROM t
WHERE y != 0
An even harder one:
SELECT x, y, x / y AS quot
FROM t
WHERE x % 2 = 0
Would an implementation (say, one that failed to realize on a more complex version of this query that the restriction could be moved inside the extension) be permitted to produce a division-by-zero error in response to this query because, say, it attempted to divide 3 by 0 as part of the extension before performing the restriction and realizing that 3 % 2 = 1? This could become important if, for example, the extension was over a small table but the result, when joined with a large table and restricted on the basis of data in the large table, ended up restricting away all of the rows which would have required division by zero.
If t had millions of rows, and this last query were performed by a table scan, would an implementation be permitted to return the first several million results before discovering a division by zero near the end when encountering one even value of x with a zero value of y? Would it be required to buffer?
There are even worse cases. Ponder this one, which, depending on the semantics, can ruin boolean short-circuiting or require four-valued boolean logic in restrictions:
SELECT x, y
FROM t
WHERE ((x / y) >= 2) AND ((x % 2) = 0)
If the table is large, this short-circuiting problem can get really crazy. Imagine the table had a million rows, one of which had a 0 divisor. What would the standard say is the semantics of:
SELECT CASE
WHEN EXISTS
(
SELECT x, y, x / y AS quot
FROM t
)
THEN 1
ELSE 0
END AS what_is_my_value
It seems like this value should probably be an error, since it depends on the emptiness or non-emptiness of a result which is an error, but adopting those semantics would seem to prohibit the optimizer from short-circuiting the table scan here. Does this existence query require proving the existence of one non-bottoming row, or also the non-existence of a bottoming row?
I'd appreciate guidance here, because I can't seem to find the relevant part(s) of the specification.
All implementations of SQL that I've worked with treat a division by 0 as an immediate NaN or #INF. The division is supposed to be handled by the front end, not by the implementation itself. The query should not bottom out, but the result set needs to return NaN in this case, so it comes back with the rest of the result set and no special warning or message is brought up to the user.
At any rate, to properly deal with this, use the following query:
select
x, y,
case y
when 0 then null
else x / y
end as quot
from
t
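A more compact spelling of the same guard uses NULLIF, which is standard SQL: NULLIF(y, 0) is NULL when y is 0, and dividing by NULL yields NULL rather than an error:
select
    x, y,
    x / nullif(y, 0) as quot
from
    t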
To answer your last question, this statement:
SELECT x, y, x / y AS quot
FROM t
Would return this:
x y quot
7 2 3.5
3 0 NaN
4 1 4
26 5 5.2
31 0 NaN
9 3 3
So, your exists would find all the rows in t, regardless of what their quotient was.
Additionally, I was reading over your question again and realized I hadn't discussed where clauses (for shame!). The where clause, or predicate, should always be applied before the columns are calculated.
Think about this query:
select x, y, x/y as quot from t where x%2 = 0
If we had a record (3, 0), it applies the WHERE condition and checks whether 3 % 2 = 0. It does not, so that record is excluded before the column calculations ever run, and no division is attempted for it.
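For the question's harder predicate case, the same NULLIF guard can be folded into the WHERE clause so that no row ever divides by zero, regardless of evaluation order (a sketch against the question's table t):
select x, y
from t
where x / nullif(y, 0) >= 2
  and x % 2 = 0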