Is there a linear interpolation SQL query?

I have two tables. Table1 has data recorded at 10-second intervals, and the data in Table2 was recorded at 1- or 2-second intervals. I want to join the two tables so that every row from Table1 is returned, joined with the Table2 row whose recording time matches exactly or, failing that, is nearest to the recording time in Table1.
For example, one row in Table1 was recorded at 21:11:20. This row should be joined with the Table2 row recorded at 21:11:20 if it exists; otherwise, it should be joined with the nearest row, say one recorded at 21:11:19.
Thank you.
[Images: Table1, Table2]

You could try (PostgreSQL syntax):
select t1.* from table1 t1 inner join table2 t2 on date_trunc('second', t1.val) = t2.val;
Databases that support function-based (expression) indexes will handle this query somewhat better than those that don't.

It takes a little effort, but it is clearly doable (here in SQL Server):
-- set up a test environment with two tables:
create table tbl1 ( i1 int identity primary key, t1 time, v1 float);
create table tbl10 (i10 int identity primary key, t10 time, v10 float);
-- fill them with some test values:
insert into tbl1 (t1,v1) VALUES ('12:00:06',1),('12:00:07',2),('12:00:08',3),('12:00:09',3),('12:00:10',2),('12:00:11',1),('12:00:12',0);
insert into tbl10 (t10,v10) VALUES ('12:00:00',99),('12:00:10',100),('12:00:20',98),('12:00:30',110);
-- and "join" them with interpolation:
-- a = last tbl10 time (in seconds) at or before t1, b = first tbl10 time after t1,
-- f = weight given to the value at a
WITH t10s AS (
  SELECT i10, t10, v10, DATEDIFF(second, 0, t10) s10 FROM tbl10
)
SELECT t1, v1, v10a*f + v10b*(1.-f) v10int FROM (
  SELECT t1, v1, CAST(b - DATEDIFF(second, 0, t1) AS FLOAT)/(b-a) f, ta.v10 v10a, tb.v10 v10b
  FROM (
    SELECT t1, v1,
      (SELECT max(s10) FROM t10s WHERE t10 <= t1) a,
      (SELECT min(s10) FROM t10s WHERE t10 >  t1) b
    FROM tbl1
  ) tmp1
  INNER JOIN t10s ta ON ta.s10 = a
  INNER JOIN t10s tb ON tb.s10 = b
) tmp2
ORDER BY t1;
-- output:
t1        v1  v10int
12:00:06  1   99.6
12:00:07  2   99.7
12:00:08  3   99.8
12:00:09  3   99.9
12:00:10  2   100.0
12:00:11  1   99.8
12:00:12  0   99.6
See the little demo here: https://rextester.com/PQDH85753
In my rudimentary script, every t1 value happened to lie between a smaller and a larger t10 value. To protect against values falling outside that range, you could use a COALESCE() function with a default value that applies outside the range.
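Following that suggestion, here is a minimal sketch of the same interpolation with out-of-range protection. It is written for SQLite through Python's `sqlite3` module rather than SQL Server, so times are `'HH:MM:SS'` strings converted to seconds with `strftime('%s', ...)`, and LEFT JOINs keep rows that have no sample on one side. The formula `va + (vb - va) * (s - a) / (b - a)` is algebraically the same as `v10a*f + v10b*(1-f)` above. Choosing the nearest endpoint value as the COALESCE default (clamping) is an assumption; the rows at 11:59:58 and 12:00:35 are made up to exercise both ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl1 (t1 TEXT, v1 REAL);
CREATE TABLE tbl10 (t10 TEXT, v10 REAL);
-- 11:59:58 and 12:00:35 are made-up rows outside the tbl10 range
INSERT INTO tbl1 VALUES ('11:59:58', 5), ('12:00:06', 1), ('12:00:10', 2), ('12:00:35', 4);
INSERT INTO tbl10 VALUES ('12:00:00', 99), ('12:00:10', 100), ('12:00:20', 98), ('12:00:30', 110);
""")

INTERPOLATE = """
WITH s10 AS (                      -- tbl10 with times as seconds
  SELECT CAST(strftime('%s', t10) AS INTEGER) AS s, v10 FROM tbl10
),
s1 AS (                            -- tbl1 with times as seconds
  SELECT t1, v1, CAST(strftime('%s', t1) AS INTEGER) AS s FROM tbl1
),
bounds AS (                        -- a = last sample at or before, b = first sample after
  SELECT s1.t1, s1.v1, s1.s,
         (SELECT MAX(s10.s) FROM s10 WHERE s10.s <= s1.s) AS a,
         (SELECT MIN(s10.s) FROM s10 WHERE s10.s >  s1.s) AS b
  FROM s1
)
SELECT bounds.t1, bounds.v1,
       COALESCE(
         -- NULL whenever a or b is missing, so the fallbacks below apply
         ta.v10 + (tb.v10 - ta.v10) * (bounds.s - bounds.a) * 1.0 / (bounds.b - bounds.a),
         ta.v10,                   -- after the last sample: hold the last value
         tb.v10                    -- before the first sample: use the first value
       ) AS v10int
FROM bounds
LEFT JOIN s10 AS ta ON ta.s = bounds.a
LEFT JOIN s10 AS tb ON tb.s = bounds.b
ORDER BY bounds.t1
"""

rows = conn.execute(INTERPOLATE).fetchall()
for t1, v1, v10int in rows:
    print(t1, v1, round(v10int, 3))
```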

Related

Add column to table based on aggregations and ranges found in another table

Given an initial table t1:
i_t1 | tk
-----|----
400  | t1
702  | t2
and a second table t2:
tk  | i1_t2 | i2_t2 | v
----|-------|-------|----
t20 | 300   | 600   | 0.5
t19 | 350   | 550   | 0.6
t18 | 370   | 420   | 0.7
t17 | 500   | 800   | 0.2
t16 | 623   | 751   | 0.9
I would like to have the following result:
i_t1 | tk | tot
-----|----|----
400  | t1 | 1.8
702  | t2 | 1.1
In other words, I want to add a column tot to t1 that contains the sum of all values in column v of t2, counting only the t2 rows whose range [i1_t2, i2_t2] contains i_t1.
Alternative #1:
My first idea was to generate intermediate rows in t2, one for each value i in the range, then group by that i and cross-check it against i_t1. This has been discussed in my other post. However, I noticed that in my case this adds a lot of rows (often unnecessarily, because there is no match in t1) and creates a performance issue.
Alternative #2: (conceptual)
Go through t1 row by row, select the corresponding value of i_t1 (e.g. 400) and then pick up the value resulting from the query below:
select sum(v) from t2 where ((select i_t1 from t1 limit 1) >= i1_t2) and ((select i_t1 from t1 limit 1) <= i2_t2);
However, this has to be repeated for every row of t1, and I'm not sure how to bring the result back into t1.
Is there a more efficient way to achieve this?
Here are the queries that can be used to reproduce:
create table t1 (i_t1 int, tk varchar(5));
insert into t1 (i_t1, tk)
values (400,'t1'),(702,'t2');
create table t2 (tk varchar(5), i1_t2 int, i2_t2 int, v real);
insert into t2 (tk, i1_t2, i2_t2, v)
values ('t20',300,600,0.5),('t19',350,550,0.6),('t18',370,420,0.7),('t17',500,800,0.2),('t16',623,751,0.9);
SELECT
  t1.i_t1, t1.tk,
  SUM(t2.v) AS tot                                     -- 2
FROM t1
JOIN t2 ON (t1.i_t1 BETWEEN t2.i1_t2 AND t2.i2_t2)    -- 1
GROUP BY t1.i_t1, t1.tk
1. Join on your condition (i_t1 between i1_t2 and i2_t2).
2. Group and sum the join result.
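To check the query, here is a minimal sketch that runs it against the question's data in SQLite through Python's `sqlite3` module. It swaps the inner JOIN for a LEFT JOIN with COALESCE so that a t1 row whose i_t1 falls in no range still appears with tot = 0; the extra row (100, 't3') is hypothetical and exists only to show that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (i_t1 INT, tk VARCHAR(5));
-- (100, 't3') is a hypothetical extra row that matches no range in t2
INSERT INTO t1 (i_t1, tk) VALUES (400, 't1'), (702, 't2'), (100, 't3');
CREATE TABLE t2 (tk VARCHAR(5), i1_t2 INT, i2_t2 INT, v REAL);
INSERT INTO t2 (tk, i1_t2, i2_t2, v) VALUES
  ('t20', 300, 600, 0.5), ('t19', 350, 550, 0.6), ('t18', 370, 420, 0.7),
  ('t17', 500, 800, 0.2), ('t16', 623, 751, 0.9);
""")

rows = conn.execute("""
SELECT t1.i_t1, t1.tk, COALESCE(SUM(t2.v), 0) AS tot  -- SUM over no rows is NULL
FROM t1
LEFT JOIN t2 ON t1.i_t1 BETWEEN t2.i1_t2 AND t2.i2_t2
GROUP BY t1.i_t1, t1.tk
ORDER BY t1.i_t1
""").fetchall()

for i_t1, tk, tot in rows:
    print(i_t1, tk, round(tot, 6))
```

The join touches only the (t1, t2) pairs whose range actually contains i_t1, so no intermediate rows are generated for values that never occur in t1.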

Joining and grouping to equate on two tables

I've tried to minify this problem as much as possible. I've got two tables which share some Id's (among other columns)
tbl1   tbl2
id     id
----   ----
1      1
1      1
2      1
       2
       2
Firstly, I can get each table to resolve to a simple count of how many of each Id there is:
select id, count(*) from tbl1 group by id
select id, count(*) from tbl2 group by id
id | tbl1-count      id | tbl2-count
---------------      ---------------
1  | 2               1  | 3
2  | 1               2  | 2
but then I'm at a loss. I'm trying to get the following output, which shows the count from tbl2 for each id divided by the count from tbl1 for the same id:
id | count of id in tbl2 / count of id in tbl1
==========
1 | 1.5
2 | 2
So far I've got this:
select tbl1.Id, tbl2.Id, count(*)
from tbl1
join tbl2 on tbl1.Id = tbl2.Id
group by tbl1.Id, tbl2.Id
which just gives me... well... something nowhere near what I need, to be honest! I tried count(tbl1.Id) and count(tbl2.Id), but both give the same multiplied amount (because of the join, I guess?), so I can't get the separate counts into separate columns where I can do the division.
This takes your table naming into account: the query on tbl2 needs to come first so that the results include all records from tbl2. The LEFT JOIN keeps all results from the first query but joins only those that also exist in tbl1. (Alternatively, you could use a FULL OUTER JOIN, or UNION both results together in the first query.) I also added an IIF to give you an option for when there are no records in tbl1 (dividing by NULL would produce NULL anyway, but you can return whatever you want).
Counts are cast as decimal so that the ratio will be returned as a decimal. You can adjust precision as required.
SELECT tb2.id, tb2.table2Count, tb1.table1Count,
IIF(ISNULL(tb1.table1Count, 0) != 0, tb2.table2Count / tb1.table1Count, null) AS ratio
FROM (
SELECT id, CAST(COUNT(1) AS DECIMAL(18, 5)) AS table2Count
FROM tbl2
GROUP BY id
) AS tb2
LEFT JOIN (
SELECT id, CAST(COUNT(1) AS DECIMAL(18, 5)) AS table1Count
FROM tbl1
GROUP BY id
) AS tb1 ON tb1.id = tb2.id
(A subquery with a LEFT JOIN lets the query optimizer decide how to generate the results and will generally outperform a CROSS APPLY, which performs a calculation for every record.)
Assuming your expected results are wrong, then this is how I would do it:
CREATE TABLE T1 (ID int);
CREATE TABLE T2 (ID int);
GO
INSERT INTO T1 VALUES(1),(1),(2);
INSERT INTO T2 VALUES(1),(1),(1),(2),(2);
GO
SELECT T1.ID AS OutID,
(T2.T2Count * 1.) / COUNT(T1.ID) AS OutCount --Might want a CONVERT to a smaller scale and precision decimal here
FROM T1
CROSS APPLY (SELECT T2.ID, COUNT(T2.ID) AS T2Count
FROM T2
WHERE T2.ID = T1.ID
GROUP BY T2.ID) T2
GROUP BY T1.ID,
T2.T2Count;
GO
DROP TABLE T1;
DROP TABLE T2;
You can aggregate in subqueries and then join:
select t1.id, t2.cnt * 1.0 / t1.cnt
from (select id, count(*) as cnt
from tbl1
group by id
) t1 join
(select id, count(*) as cnt
from tbl2
group by id
) t2
on t1.id = t2.id
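The difference between joining first and aggregating first shows up in a small sketch using the question's sample data, run in SQLite through Python's `sqlite3` module (chosen only so the example is self-contained; the queries have the same shape as the answers above). Joining first pairs each tbl1 row with every matching tbl2 row, so the count for id 1 becomes 2 × 3 = 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl1 (id INT);
CREATE TABLE tbl2 (id INT);
INSERT INTO tbl1 VALUES (1), (1), (2);
INSERT INTO tbl2 VALUES (1), (1), (1), (2), (2);
""")

# Join first: every tbl1 row is paired with every matching tbl2 row.
naive = conn.execute("""
SELECT tbl1.id, COUNT(*)
FROM tbl1 JOIN tbl2 ON tbl1.id = tbl2.id
GROUP BY tbl1.id ORDER BY tbl1.id
""").fetchall()

# Aggregate each table first, then join the per-id counts.
ratio = conn.execute("""
SELECT t1.id, t2.cnt * 1.0 / t1.cnt AS ratio
FROM (SELECT id, COUNT(*) AS cnt FROM tbl1 GROUP BY id) AS t1
JOIN (SELECT id, COUNT(*) AS cnt FROM tbl2 GROUP BY id) AS t2
  ON t1.id = t2.id
ORDER BY t1.id
""").fetchall()

print(naive)  # [(1, 6), (2, 2)]
print(ratio)  # [(1, 1.5), (2, 2.0)]
```

The `* 1.0` forces non-integer division, the same role the DECIMAL cast and the `* 1.` play in the SQL Server answers.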

Equivalent Subquery for a Join

I am looking for an answer to this question:
Is it possible to rewrite every join as an equivalent subquery?
I know that columns from a subquery cannot be selected in the outer query.
I ran these queries in SQL Server:
select DISTINCT A.*, B.ParentProductCategoryID
from [SalesLT].[Product] as A
inner join [SalesLT].[ProductCategory] as B on A.ProductCategoryID = B.ProductCategoryID;
select A.*
from [SalesLT].[Product] as A
where EXISTS (select B.ParentProductCategoryID from [SalesLT].[ProductCategory] as B where A.ProductCategoryID = B.ProductCategoryID);
Both of these queries return 293 rows, as I expected.
Now the problem is: how do I select the [SalesLT].[ProductCategory] column in the second case?
Do I need a correlated subquery in the SELECT clause to get this column into the output?
Is it possible to rewrite every join as an equivalent subquery?
No, because joins can 1) remove rows or 2) multiply rows
ex 1)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3)
SELECT t1.num AS t1num, t2.num AS t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
The row containing value 1 from t1 was removed. This does not happen with a subquery in the SELECT list.
ex 2)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3), (3), (3), (3)
SELECT t1.num AS t1num, t2.num as t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
3 3
3 3
3 3
A subquery in the SELECT list would not change the number of rows returned from the table being queried.
In your example you use EXISTS, which is not going to return the value from the second table.
This is how I would write it with a subquery:
select A.*
,(SELECT B.ParentProductCategoryID
FROM [SalesLT].[ProductCategory] B
WHERE B.ProductCategoryID = A.ProductCategoryID) AS [2nd table ProductCategoryID]
from [SalesLT].[Product] as A
You might use
select A.*,
(
select B.ParentProductCategoryID
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID
) ParentProductCategoryID
from [SalesLT].[Product] as A
where EXISTS(select 1
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID)
However, I find the JOIN version much more intuitive.
There is no way for you to use any data from the EXISTS subquery in the outer query. The only purpose of the subquery is to evaluate whether the EXISTS is true or false for each product.
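The points made in these answers can be checked with a small sketch on the first example's data, run in SQLite through Python's `sqlite3` module. One SQLite-specific caveat: when a scalar subquery returns more than one row, SQLite silently uses the first row, whereas SQL Server raises an error, so the duplicate-row step at the end behaves differently there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (num INT);
CREATE TABLE t2 (num INT);
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES (2), (3);
""")

JOIN = "SELECT t1.num, t2.num FROM t1 JOIN t2 ON t1.num = t2.num ORDER BY t1.num"
# Scalar subquery in the SELECT list; EXISTS removes rows with no match,
# which the scalar subquery alone would keep (with NULL).
SUBQUERY = """
SELECT t1.num, (SELECT t2.num FROM t2 WHERE t2.num = t1.num) AS t2num
FROM t1
WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.num = t1.num)
ORDER BY t1.num
"""
SUBQUERY_NO_FILTER = """
SELECT t1.num, (SELECT t2.num FROM t2 WHERE t2.num = t1.num) AS t2num
FROM t1 ORDER BY t1.num
"""

join_rows = conn.execute(JOIN).fetchall()
sub_rows = conn.execute(SUBQUERY).fetchall()
unfiltered = conn.execute(SUBQUERY_NO_FILTER).fetchall()
print(join_rows)   # [(2, 2), (3, 3)]
print(sub_rows)    # [(2, 2), (3, 3)]
print(unfiltered)  # [(1, None), (2, 2), (3, 3)]

# Add the duplicates from example 2: the join multiplies rows, the subquery does not.
conn.execute("INSERT INTO t2 VALUES (3), (3), (3)")
print(len(conn.execute(JOIN).fetchall()))      # 5
print(len(conn.execute(SUBQUERY).fetchall()))  # 2
```

So the scalar-subquery-plus-EXISTS form matches the INNER JOIN only while the joined column is unique on the subquery side, which holds for ProductCategoryID in the question.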

Multiple table join

I have a scenario with three tables (Table1, Table2, Table3).
In Table1, each MEMBNO is unique.
I would like to JOIN to Table2 and Table3 to display results, but get only one row per MEMBNO.
I tried
SELECT A.MEMBNO,A.FIELD1,B.FIELD1,B.FIELD2,C.FIELD1
FROM Table1 A
INNER join Table2 B ON A.MEMBNO = B.MEMBNO
INNER join Table3 C ON A.MEMBNO = C.MEMBNO
but I get multiple results. If the MEMBNO is in Table2 twice and Table3 four times, I get 8 rows returned.
Is my JOIN correct, or is the only way to control this through the WHERE clause after the JOIN, restricting what is returned from Table2 and Table3? (i.e. does SQL "dumbly" join all the data and expect the WHERE clause to be the filter?)
Many thanks
What you are fighting with is the different relationships between the data. Table1 is the primary key table, which has your one row per MEMBNO. Table2 and Table3 have more than one row for each MEMBNO. What you therefore need to think about is what data you actually want to see before you attempt the joins. The difference in cardinality is causing your row duplication when the joins happen. If you want the data in Table2 and Table3 to be squished into a single row, have a think about how that might look, e.g. do you want to sum the numbers from the different rows into a total? Do you want to take the maximum date? etc.
Best thing to do is give some data examples from each table and give an example result. More than happy to have a go if you add that info.
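As a sketch of what squishing Table2 and Table3 into one row per MEMBNO can look like, the snippet below aggregates each child table before joining, so each join matches at most one row per member. It runs in SQLite through Python's `sqlite3` module; the column contents (a number in Table2.FIELD1 to be summed, an ISO date string in Table3.FIELD1 whose maximum is wanted) are hypothetical, since the question doesn't show the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (MEMBNO INT PRIMARY KEY, FIELD1 TEXT);
CREATE TABLE Table2 (MEMBNO INT, FIELD1 INT);
CREATE TABLE Table3 (MEMBNO INT, FIELD1 TEXT);
-- hypothetical data: member 1 appears twice in Table2 and four times in Table3
INSERT INTO Table1 VALUES (1, 'member one');
INSERT INTO Table2 VALUES (1, 10), (1, 20);
INSERT INTO Table3 VALUES (1, '2021-01-05'), (1, '2021-03-01'),
                          (1, '2021-02-11'), (1, '2021-01-20');
""")

# Joining the raw tables: 2 x 4 = 8 rows for the single member.
naive_count = conn.execute("""
SELECT COUNT(*)
FROM Table1 A
INNER JOIN Table2 B ON A.MEMBNO = B.MEMBNO
INNER JOIN Table3 C ON A.MEMBNO = C.MEMBNO
""").fetchone()[0]

# Aggregate each child table to one row per MEMBNO first, then join.
collapsed = conn.execute("""
SELECT A.MEMBNO, A.FIELD1, B.total, C.latest
FROM Table1 A
LEFT JOIN (SELECT MEMBNO, SUM(FIELD1) AS total FROM Table2 GROUP BY MEMBNO) B
  ON A.MEMBNO = B.MEMBNO
LEFT JOIN (SELECT MEMBNO, MAX(FIELD1) AS latest FROM Table3 GROUP BY MEMBNO) C
  ON A.MEMBNO = C.MEMBNO
""").fetchall()

print(naive_count)  # 8
print(collapsed)    # [(1, 'member one', 30, '2021-03-01')]
```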
As I am concerned only with MEMBNO: what if I use DISTINCT MEMBNO values from Table2 and Table3?
Check the below example:
create table #t1
(
F1 int,
F2 int
)
Insert into #t1 values(1, 111)
Create table #t2
(
F1 int,
F2 int
)
Insert into #t2 values(1, 111)
Insert into #t2 values(1, 222)
Create table #t3
(
F1 int,
F2 int
)
Insert into #t3 values(1, 333)
Insert into #t3 values(1, 444)
SELECT a.*
FroM #t1 a left join (Select distinct f1 from #t2) b on a.F1 = b.f1
left join (Select distinct f1 from #t3) c on a.F1 = c.f1
where #t1, #t2 and #t3 are Table1, Table2 and Table3 respectively,
and F1 is your MEMBNO in all the tables.
You get multiple results because of using inner join.
You should use left or right join.

SQLServer join two tables

I have a question for you: I'm having a hard time combining two tables and can't manage to find the correct query.
I have two tables:
T1: 1column, Has X records
T2: 1column, Has Y records
Note: Y can never be greater than X, but it is often less.
I want to join those tables in order to get a table with two columns:
T3: ColumnFromT1, ColumnFromT2.
When Y is less than X, the T2 values get repeated and spread over all my other values, but I want to get NULL once all the values from T2 have been used.
How could I achieve that?
Thanks
You could give each table a row number in a subquery. Then you can left join on that row number. To recycle rows from the second table, take the first table's row number modulo (%) the number of rows in the second table.
Example:
select Sub1.col1
, Sub2.col1
from (
select row_number() over (order by col1) as rn
, *
from #T1
) Sub1
left join
(
select row_number() over (order by col1) as rn
, *
from #T2
) Sub2
on (Sub1.rn - 1) % (select count(*) from #T2) + 1 = Sub2.rn
order by Sub1.rn
Test data:
create table #T1 (col1 int)
create table #T2 (col1 datetime)
insert #T1 values (1), (2), (3), (4), (5)
insert #T2 values ('2010-01-01'), ('2012-02-02')
This prints:
1 2010-01-01
2 2012-02-02
3 2010-01-01
4 2012-02-02
5 2010-01-01
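For the behaviour the question actually asks for (NULL once T2's rows are used up), the plain row-number LEFT JOIN from the first sentence of this answer is enough; the modulo is only needed to recycle. A minimal sketch in SQLite through Python's `sqlite3` module (window functions need SQLite 3.25 or later), using the same test values with the dates stored as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (col1 INT);
CREATE TABLE T2 (col1 TEXT);
INSERT INTO T1 VALUES (1), (2), (3), (4), (5);
INSERT INTO T2 VALUES ('2010-01-01'), ('2012-02-02');
""")

# Number the rows of each table, then LEFT JOIN on the number:
# once T2 runs out of rows, the T2 column is NULL.
rows = conn.execute("""
SELECT s1.col1, s2.col1
FROM (SELECT ROW_NUMBER() OVER (ORDER BY col1) AS rn, col1 FROM T1) AS s1
LEFT JOIN (SELECT ROW_NUMBER() OVER (ORDER BY col1) AS rn, col1 FROM T2) AS s2
  ON s1.rn = s2.rn
ORDER BY s1.rn
""").fetchall()

for row in rows:
    print(row)
```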
You are looking for a LEFT JOIN (http://www.w3schools.com/sql/sql_join_left.asp), e.g. T1 LEFT JOIN T2.
Say they both have a CustomerId column in common:
SELECT *
FROM T1
LEFT JOIN
T2 on t1.CustomerId = T2.CustomerId
This will return all records in T1 and those that match in T2 with nulls for the T2 values where they do not match.
Make sure you are joining the tables on a common column (or a common set of columns, if more than one column is needed to perform the join). If not, you are doing a Cartesian join (http://ezinearticles.com/?What-is-a-Cartesian-Join?&id=3560672).
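A small sketch of the difference, with hypothetical T1/T2 contents keyed by CustomerId and run in SQLite through Python's `sqlite3` module: the LEFT JOIN keeps every T1 row and fills in NULL where T2 has no match, while listing both tables with no join condition returns every combination (3 × 2 = 6 rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: CustomerId 2 has no row in T2.
conn.executescript("""
CREATE TABLE T1 (CustomerId INT, name TEXT);
CREATE TABLE T2 (CustomerId INT, city TEXT);
INSERT INTO T1 VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO T2 VALUES (1, 'Oslo'), (3, 'Rome');
""")

left_rows = conn.execute("""
SELECT T1.name, T2.city
FROM T1
LEFT JOIN T2 ON T1.CustomerId = T2.CustomerId
ORDER BY T1.CustomerId
""").fetchall()

# No join condition: every T1 row is paired with every T2 row.
cartesian_count = conn.execute("SELECT COUNT(*) FROM T1, T2").fetchone()[0]

print(left_rows)        # [('Ann', 'Oslo'), ('Bob', None), ('Cy', 'Rome')]
print(cartesian_count)  # 6
```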