How can I join these tables? - sql

I have a number of devices logging different data at different times and want to get all the data in a single query, ordered by time. An example of the kinds of tables I have:
CREATE TABLE location(
device_id INT, -- REFERENCES device(id)
timestamp DATETIME2 NOT NULL,
position GEOMETRY NOT NULL
)
CREATE TABLE temperature(
device_id INT, -- REFERENCES device(id)
timestamp DATETIME2 NOT NULL,
temp FLOAT NOT NULL
)
I want to have a single query that joins the tables on device_id and timestamp that contains nulls when the timestamps don't match. An example of the output format I am seeking is:
device_id, timestamp, location, temperature
1, 2011/12/1 10:00:00, (35.1, 139.2), NULL
1, 2011/12/1 10:00:01, NULL, 9.0
1, 2011/12/1 10:00:02, (35.1, 139.2), 9.1
I've tried doing FULL JOIN but cannot figure out how to do the timestamp column without a huge CASE statement (keep in mind although I've only shown 2 tables, this can have many more).
SELECT
location.device_id,
CASE WHEN location.timestamp IS NOT NULL THEN
location.timestamp
ELSE
temperature.timestamp
END as timestamp,
position AS location,
temp
FROM
location
FULL JOIN temperature ON location.device_id = temperature.device_id
AND location.timestamp = temperature.timestamp
ORDER BY
timestamp
Is there a simpler way to write this kind of query?

You can use the COALESCE expression.
SELECT
COALESCE(location.device_id, temperature.device_id) as device_id,
COALESCE(location.timestamp, temperature.timestamp) as timestamp,
position,
temp
FROM
location
FULL JOIN temperature ON location.device_id = temperature.device_id
AND location.timestamp = temperature.timestamp
ORDER BY
timestamp;

Yes, you can use an OUTER JOIN to the temperature table. A LEFT OUTER JOIN returns NULLs where there is no matching row in the temperature table; to also keep temperature rows that have no matching location row, use a FULL OUTER JOIN.

You need a COALESCE to get the device_id/timestamp, as follows:
SELECT
COALESCE(l.device_id, t.device_id) as device_id,
COALESCE(l.timestamp, t.timestamp) as timestamp,
l.position as location,
t.temp as temperature
FROM location l
FULL JOIN temperature t ON l.device_id = t.device_id
AND l.timestamp = t.timestamp
ORDER BY 2
Also note the increased readability from aliasing the tables with very short names (l and t).
You may want to review your ordering; perhaps you want ORDER BY 1, 2 instead.

SELECT device_id, timestamp, position, NULL AS temp
FROM location
UNION ALL
SELECT device_id, timestamp, NULL AS position, temp
FROM temperature
ORDER
BY timestamp;
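Note that the ALL keyword is required here: a plain UNION removes duplicate rows, which requires comparing them, and SQL Server cannot compare GEOMETRY values.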
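One caveat with the UNION ALL form: a device that logs both readings at the same timestamp comes back as two rows rather than one. Below is a minimal sketch of one way to collapse them, by wrapping the union in a GROUP BY and taking MAX() of each value column. It runs on Python's built-in sqlite3 module, and the TEXT timestamps and TEXT position column are stand-ins chosen only so the example is self-contained. As far as I know, SQL Server will not aggregate GEOMETRY with MAX(), so there the FULL JOIN form is likely the better fit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location(device_id INT, timestamp TEXT NOT NULL, position TEXT NOT NULL);
CREATE TABLE temperature(device_id INT, timestamp TEXT NOT NULL, temp REAL NOT NULL);
INSERT INTO location VALUES
  (1, '2011-12-01 10:00:00', '(35.1, 139.2)'),
  (1, '2011-12-01 10:00:02', '(35.1, 139.2)');
INSERT INTO temperature VALUES
  (1, '2011-12-01 10:00:01', 9.0),
  (1, '2011-12-01 10:00:02', 9.1);
""")

# Stack all readings, then fold rows that share (device_id, timestamp).
# MAX() skips NULLs, so each group keeps whichever values exist.
merged = conn.execute("""
SELECT device_id, timestamp, MAX(position) AS position, MAX(temp) AS temp
FROM (
    SELECT device_id, timestamp, position, NULL AS temp FROM location
    UNION ALL
    SELECT device_id, timestamp, NULL AS position, temp FROM temperature
) AS readings
GROUP BY device_id, timestamp
ORDER BY timestamp
""").fetchall()

for row in merged:
    print(row)
```

Adding another table means adding one more SELECT to the union and one more MAX() column, which scales better than chaining FULL JOINs.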

Related

How to efficiently compute in step 1 differences between columns and in step 2 aggregate those differences?

This is a follow-up to my earlier question, "Is there a way to use functions that can take multiple columns as input (e.g. `GREATEST`) without typing out every column name?", where I asked only about the second part of my problem. The feedback was that the data model is most likely not appropriate.
I have thought about the data model again but still have trouble figuring out a good way to do what I want.
The complete problem is as follows:
I have time series data for multiple technical devices, with columns like energy_consumption and voltage.
Furthermore, I have columns with sensitivities towards multiple external factors for each device, which I just added as additional columns (denoted with the cc_ prefix in the example).
There are queries where I want to operate on the raw sensitivities. However, there are also queries for which I first need to take some differences, such as cc_a - cc_b and cc_b - cc_c, and then compute the max of those differences. The combinations for which the differences are to be computed are a predefined subset (around 30) of all possible combinations. The set of combinations of interest might change in the future, so that different combinations apply to different time intervals (e.g. from 2022-01-01 to 2024-12-31 take combination set A and from 2025-01-01 to ... take combination set B). However, it is very unlikely that the combinations will change very often.
Here is an example of how I am doing it at the moment:
CREATE TEMP TABLE foo (device_id int, voltage int, energy_consumption int, cc_a int, cc_b int, cc_c int);
INSERT INTO foo VALUES (3, 12, 5, 1, 2, 3), (4, 6, 3, 15, 4, 100);
WITH diff_table AS (
SELECT
device_id,
(cc_a - cc_b) as diff_ab,
(cc_a - cc_c) as diff_ac,
(cc_b - cc_c) as diff_bc
FROM foo
)
SELECT
device_id,
GREATEST(diff_ab, diff_ac, diff_bc) as max_cc
FROM diff_table;
Since I have more than 100 sensitivities and differences, I am looking for a way to do this efficiently, both computationally and in terms of query length.
What would be a good data model to perform such operations?
The solution below assumes that all pairings are considered and that you don't need the points where the extremes are reached.
CREATE TABLE sources (
source_id int
,source_name varchar(10)
,PRIMARY KEY(source_id));
CREATE TABLE foo_values(
device_id int not null -- device_id for "foo"
,source_id int -- you may change that with a foreign key
,value int
,CONSTRAINT fk_source_id
FOREIGN KEY(source_id )
REFERENCES sources(source_id ) );
With the example set you gave:
INSERT INTO sources ( source_id, source_name ) VALUES
(1,'cc_a')
,(2,'cc_b')
,(3,'cc_c')
-- and so on
;
INSERT INTO foo_values (device_id,source_id, value ) VALUES
(3,1,1),(3,2,2),(3,3,3)
,(4,1,15),(4,2,4),(4,3,100);
With this schema, the query becomes:
SELECT device_id
, MAX(value)-MIN(value) as greatest_diff
FROM foo_values
group by device_id
Bonus : with such a schema, you can even tell where the maximum and minimum are reached
WITH ranked as (
SELECT
f.device_id
,f.value
,f.source_id
,RANK() OVER (PARTITION BY f.device_id ORDER BY f.value ) as low_first
,RANK() OVER (PARTITION BY f.device_id ORDER BY f.value DESC) as high_first
FROM foo_values as f)
SELECT h.device_id
, hs.source_name as source_high
, ls.source_name as source_low
, h.value as value_high
, l.value as value_low
, h.value - l.value as greatest_diff
FROM ranked l
INNER JOIN ranked h
on l.device_id = h.device_id
INNER JOIN sources ls
on ls.source_id = l.source_id
INNER JOIN sources hs
on hs.source_id = h.source_id
WHERE l.low_first =1 AND h.high_first = 1
Here is a fiddle for this solution.
EDIT: since you need to control the pairings, you must add a table that lists them:
CREATE TABLE high_low_source
(high_source_id int
,low_source_id int
, constraint fk_low
FOREIGN KEY(low_source_id )
REFERENCES sources(source_id )
,constraint fk_high
FOREIGN KEY(high_source_id )
REFERENCES sources(source_id )
);
INSERT INTO high_low_source VALUES
(1,2),(1,3),(2,3)
The query looking for the greatest difference becomes :
SELECT h.device_id
, hs.source_name as source_high
, ls.source_name as source_low
, h.value as value_high
, l.value as value_low
, h.value - l.value as my_diff
, RANK() OVER (PARTITION BY h.device_id ORDER BY (h.value - l.value) DESC) as greatest_first
FROM foo_values l
INNER JOIN foo_values h
on l.device_id = h.device_id
INNER JOIN high_low_source hl
on hl.low_source_id = l.source_id
AND hl.high_source_id = h.source_id
INNER JOIN sources ls
on ls.source_id = l.source_id
INNER JOIN sources hs
on hs.source_id = h.source_id
ORDER BY device_id, greatest_first
With the values you have listed, there is a tie for device 3.
Extended fiddle
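If only the per-device maximum is needed (what GREATEST computed in the question), the ranked query above can be reduced to a plain GROUP BY. The following is a rough sketch using Python's built-in sqlite3 module as a stand-in engine. The sources table and the foreign keys are left out for brevity, and the data assumes device 4 has cc_c = 100 as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo_values(device_id INT NOT NULL, source_id INT, value INT);
CREATE TABLE high_low_source(high_source_id INT, low_source_id INT);
-- source ids: 1 = cc_a, 2 = cc_b, 3 = cc_c
INSERT INTO foo_values VALUES
  (3,1,1),(3,2,2),(3,3,3),
  (4,1,15),(4,2,4),(4,3,100);
-- pairings to evaluate: cc_a - cc_b, cc_a - cc_c, cc_b - cc_c
INSERT INTO high_low_source VALUES (1,2),(1,3),(2,3);
""")

# Self-join the values per device, keep only the listed pairings,
# and take the largest difference for each device.
max_diffs = conn.execute("""
SELECT h.device_id, MAX(h.value - l.value) AS max_cc
FROM foo_values AS h
JOIN foo_values AS l
  ON l.device_id = h.device_id
JOIN high_low_source AS hl
  ON hl.high_source_id = h.source_id
 AND hl.low_source_id = l.source_id
GROUP BY h.device_id
ORDER BY h.device_id
""").fetchall()

print(max_diffs)
```

Changing the set of combinations then means editing rows in high_low_source rather than rewriting the query. Validity dates on that table might be one way to handle combination sets that change over time, but that is not shown here.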

Sum of column between any two dates?

Say I have a table with 3 columns, Name, Date, Test Score.
I want to find the sum of a person’s test scores between any two dates.
How would I do this in SQL?
Obviously I’d need the dates and the person (Name) as parameters, but I’m not sure where to go from here?
You would typically use the parameters in the where clause of an aggregate query:
select sum(test_score) total_score
from mytable
where
date >= :start_date
and date < :end_date
and name = :name
:start_date, :end_date and :name are the parameters to the query, which are used to filter the dataset; the query always returns one row, with a single column called total_score that contains the sum of test_score for rows that satisfy the filtering predicates. If no row matches, the returned value is null.
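How the parameters get bound depends on the client library. As an illustration only, here is a small sketch with Python's built-in sqlite3 module, which accepts the same :name placeholder style. The table contents and the total_score helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable(name TEXT, date TEXT, test_score INT);
INSERT INTO mytable VALUES
  ('alice', '2020-06-01', 70),
  ('alice', '2020-06-10', 80),
  ('alice', '2020-06-22', 90),
  ('bob',   '2020-06-10', 50);
""")

QUERY = """
SELECT SUM(test_score) AS total_score
FROM mytable
WHERE date >= :start_date
  AND date <  :end_date
  AND name = :name
"""

def total_score(name, start_date, end_date):
    """Sum of test_score for `name` with start_date <= date < end_date."""
    params = {"name": name, "start_date": start_date, "end_date": end_date}
    return conn.execute(QUERY, params).fetchone()[0]

print(total_score("alice", "2020-06-01", "2020-06-22"))  # 150: the end date itself is excluded
print(total_score("carol", "2020-06-01", "2020-06-22"))  # None: no matching rows
```

The half-open range (>= start, < end) avoids counting a boundary row twice when consecutive ranges are queried. Wrap the sum in COALESCE(..., 0) if a zero is preferred over null.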
With the following table:
CREATE TABLE `marks` (
`id` INT AUTO_INCREMENT PRIMARY KEY,
`name` VARCHAR(255),
`date` DATETIME,
`score` INT
);
The following select SQL will display a list of names and their sum:
SELECT
`name`,
SUM(`score`) AS `score_sum`
FROM `marks`
WHERE `date`
BETWEEN '2020-06-01 00:00:00'
AND '2020-06-22 13:14:25'
GROUP BY `name`;
But for your solution, you might want to use:
SELECT
SUM(`score`) AS `score_sum`
FROM `marks`
WHERE
`name` = 'xxx'
AND `date`
BETWEEN '2020-06-01 00:00:00'
AND '2020-06-22 13:14:25';
Just keep in mind that if there is no value to SUM, a null will be returned.

Firebird select from table distinct one field

The question I asked yesterday was simplified, but I realize that I have to tell the whole story.
I have to extract data from 4 different tables in a Firebird 2.5 database, and the following query works:
SELECT
PRODUZIONE_T.CODPRODUZIONE,
PRODUZIONE_T.NUMEROCOMMESSA as numeroco,
ANGCLIENTIFORNITORI.RAGIONESOCIALE1,
PRODUZIONE_T.DATACONSEGNA,
PRODUZIONE_T.REVISIONE,
ANGUTENTI.NOMINATIVO,
ORDINI_T.DATA
FROM PRODUZIONE_T
LEFT OUTER JOIN ORDINI_T ON PRODUZIONE_T.CODORDINE=ORDINI_T.CODORDINE
INNER JOIN ANGCLIENTIFORNITORI ON ANGCLIENTIFORNITORI.CODCLIFOR=ORDINI_T.CODCLIFOR
LEFT OUTER JOIN ANGUTENTI ON ANGUTENTI.IDUTENTE = PRODUZIONE_T.RESPONSABILEUC
ORDER BY right(numeroco,2) DESC, left(numeroco,3) desc
rows 1 to 500;
However, the query returns duplicate rows (two or more per NUMEROCOMMESSA) because of the REVISIONE column.
How do I select only the rows of a single NUMEROCOMMESSA with the maximum REVISIONE value?
This should work:
select TAB1.COD, TAB1.N_ORDER, TAB1.S_DATE, TAB1.REVISION
FROM TAB1
JOIN
(
select N_ORDER, MAX(REVISION) as REVISION
FROM TAB1
Group By N_ORDER
) m on m.N_ORDER = TAB1.N_ORDER and m.REVISION = TAB1.REVISION
Here you go - http://sqlfiddle.com/#!6/ce7cf/4
Sample data (as you set it in your original question):
create table TAB1 (
cod integer primary key,
n_order varchar(10) not null,
s_date date not null,
revision integer not null );
alter table tab1 add constraint UQ1 unique (n_order,revision);
insert into TAB1 values ( 1, '001/18', '2018-02-01', 0 );
insert into TAB1 values ( 2, '002/18', '2018-01-31', 0 );
insert into TAB1 values ( 3, '002/18', '2018-01-30', 1 );
The query:
select *
from tab1 d
join ( select n_ORDER, MAX(REVISION) as REVISION
FROM TAB1
Group By n_ORDER ) m
on m.n_ORDER = d.n_ORDER and m.REVISION = d.REVISION
Suggestions:
Google and read the classic book: "Understanding SQL" by Martin Gruber
Read Firebird SQL reference: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25.html
Here is one more solution, using window functions, which were introduced in Firebird 3 - http://sqlfiddle.com/#!6/ce7cf/13
I do not have Firebird 3 at hand, so I cannot check whether there is some unexpected incompatibility; try it at home :-D
SELECT * FROM
(
SELECT
TAB1.*,
ROW_NUMBER() OVER (
PARTITION BY n_order
ORDER BY revision DESC
) AS rank
FROM TAB1
) d
WHERE rank = 1
Read documentation
https://community.modeanalytics.com/sql/tutorial/sql-window-functions/
https://www.firebirdsql.org/file/documentation/release_notes/html/en/3_0/rnfb30-dml-windowfuncs.html
Which of the three solutions (including Gordon's) is faster depends on the specific database: the real data, the existing indexes, and the selectivity of those indexes.
While window functions allow a join-less query, I am not sure it would be faster on real data, as it may ignore an index on the (order, revision) pair and do a full scan before the rank = 1 condition is applied. The first solution, by contrast, would most probably use indexes to get the maximums without actually reading every row in the table.
The Firebird-support mailing list suggested a way to break out of the loop and use only a single query. The trick is to use both window functions and a CTE (common table expression): http://sqlfiddle.com/#!18/ce7cf/2
WITH TMP AS (
SELECT
TAB1.*,
MAX(revision) OVER (
PARTITION BY n_order
) as max_REV
FROM TAB1
)
SELECT * FROM TMP
WHERE revision = max_REV
If you want the max revision number in Firebird:
select t.*
from tab1 t
where t.revision = (select max(t2.revision) from tab1 t2 where t2.n_order = t.n_order);
For performance, you want an index on tab1(n_order, revision). With such an index, performance should be competitive with any other approach.
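To check the correlated-subquery form against the sample data above without a Firebird install, here is a small sketch on Python's built-in sqlite3 module. SQLite is only a stand-in here, so this confirms the logic but says nothing about Firebird's query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1(
  cod INTEGER PRIMARY KEY,
  n_order VARCHAR(10) NOT NULL,
  s_date DATE NOT NULL,
  revision INTEGER NOT NULL,
  UNIQUE (n_order, revision)
);
INSERT INTO tab1 VALUES (1, '001/18', '2018-02-01', 0);
INSERT INTO tab1 VALUES (2, '002/18', '2018-01-31', 0);
INSERT INTO tab1 VALUES (3, '002/18', '2018-01-30', 1);
""")

# Keep a row only if its revision is the highest one for its order number.
latest = conn.execute("""
SELECT t.cod, t.n_order, t.s_date, t.revision
FROM tab1 AS t
WHERE t.revision = (SELECT MAX(t2.revision)
                    FROM tab1 AS t2
                    WHERE t2.n_order = t.n_order)
ORDER BY t.cod
""").fetchall()

print(latest)
```

Order 002/18 keeps only revision 1, and order 001/18 keeps its single row.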

How to select values by date field (not as simple as it sounds)

I have a table called tblMK. The table contains a datetime field.
What I wish to do is create a query which will, each time, select the 2 latest entries (by the datetime column), get the date difference between them, and show only that.
How would I go about creating this expression? It doesn't necessarily need to be a query; it could be a view, function, procedure, or whatever works. I have created a function called getdatediff which receives two dates and returns a string that says (x days y hours z minutes); basically, that will be the calculated field. So how would I go about doing this?
Edit: I need to each time select 2 and 2 and so on until the oldest one. There will always be an even amount of rows.
Use only sql like this:
create table t1(c1 integer, dt datetime);
insert into t1 values
(1, getdate()),
(2, dateadd(day,1,getdate())),
(3, dateadd(day,2,getdate()));
with temp as (select top 2 dt
from t1
order by dt desc)
select datediff(day,min(dt),max(dt)) as diff_of_dates
from temp;
sql fiddle
On MySQL, use the LIMIT clause:
select max(a.updated_at)-min(a.updated_at)
From
( select * from mytable order by updated_at desc limit 2 ) a
Thanks guys, I found the solution. Please ignore the additional columns; they are specific to my DB:
; with numbered as (
Select parit, taarich, hulia, mesirakabala,
rowno = row_number() OVER (Partition by parit order by taarich)
From tblMK)
Select a.rowno-1, a.parit, a.hulia, b.taarich as taarich_kabala, a.taarich, a.mesirakabala, getdatediff(b.taarich,a.taarich) as due
From numbered a
Left join numbered b ON b.parit=a.parit
And b.rowno = a.rowno - 1
Where b.taarich is not null
Order by a.parit, a.taarich
Sorry about any mistakes I might have made, I'm on my smartphone.
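For the "2 and 2 until the oldest one" requirement from the edit, the row-numbering idea can be reduced to a sketch like the one below, written against Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The dt column and the sample rows are invented for illustration, and the difference is returned in minutes instead of calling the asker's getdatediff function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblMK(id INTEGER PRIMARY KEY, dt TEXT NOT NULL);
INSERT INTO tblMK(dt) VALUES
  ('2020-01-01 08:00:00'),
  ('2020-01-03 20:00:00'),
  ('2020-02-10 00:00:00'),
  ('2020-02-10 06:30:00');
""")

# Number rows newest first, then pair row 1 with 2, row 3 with 4, and so on.
pairs = conn.execute("""
WITH numbered AS (
    SELECT dt, ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn
    FROM tblMK
)
SELECT older.dt AS start_dt,
       newer.dt AS end_dt,
       CAST(ROUND((julianday(newer.dt) - julianday(older.dt)) * 24 * 60)
            AS INTEGER) AS diff_minutes
FROM numbered AS newer
JOIN numbered AS older
  ON older.rn = newer.rn + 1
WHERE newer.rn % 2 = 1   -- odd row numbers open a pair
ORDER BY newer.rn
""").fetchall()

for row in pairs:
    print(row)
```

On SQL Server the same shape should work with the getdatediff function in place of the julianday arithmetic.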

SQL JOIN table with a date range

Say I have a table with C columns and N rows. I would like to produce a select statement that represents the "join" of that table with a date range comprising M days. The result set should have C+1 columns (the last one being the date) and N×M rows.
Trivial example to clarify things:
Given the table A below:
select * from A;
avalue |
--------+
"a" |
And a date range from 10 to 12 of October 2012, I want the following result set:
avalue | date
--------+-------
"a" | 2012-10-10
"a" | 2012-10-11
"a" | 2012-10-12
(this is a stepping stone I need towards ultimately calculating inventory levels on any given day, given starting values and deltas)
The Postgres way for this is simple: CROSS JOIN to the function generate_series():
SELECT t.*, g.day::date
FROM tbl t
CROSS JOIN generate_series(timestamp '2012-10-10'
, timestamp '2012-10-12'
, interval '1 day') AS g(day);
Produces exactly the output requested.
generate_series() is a set-returning function (a.k.a. "table function") producing a derived table. There are a couple of overloaded variants, here's why I chose timestamp input:
Generating time series between two dates in PostgreSQL
For arbitrary dates, replace generate_series() with a VALUES expression. No need to persist a table:
SELECT *
FROM tbl t
CROSS JOIN (
VALUES
(date '2012-08-13') -- explicit type in 1st row
, ('2012-09-05')
, ('2012-10-10')
) g(day);
If the date table has more dates in it than you're interested in, then do:
select a.avalue, b.date from a, b where b.date between '2012-10-10' and '2012-10-12'
Otherwise, if the date table contains only the dates you are interested in, a cartesian join will accomplish this:
select * from a,b;
declare
@Date1 datetime = '20121010',
@Date2 datetime = '20121012';
with Dates
as
(
select @Date1 as [Date]
union all
select dateadd(dd, 1, D.[Date]) as [Date]
from Dates as D
where D.[Date] <= DATEADD(dd, -1, @Date2)
)
select
A.avalue, D.[Date]
from Dates as D
cross join A
For MySQL
schema/data:
CREATE TABLE someTable
(
someCol varchar(8) not null
);
INSERT INTO someTable VALUES ('a');
CREATE TABLE calendar
(
calDate datetime not null,
isBus bit
);
ALTER TABLE calendar
ADD CONSTRAINT PK_calendar
PRIMARY KEY (calDate);
INSERT INTO calendar VALUES ('2012-10-10', 1);
INSERT INTO calendar VALUES ('2012-10-11', 1);
INSERT INTO calendar VALUES ('2012-10-12', 1);
query:
select s.someCol, c.calDate from someTable s, calendar c;
You really have two options for what you are trying to do.
1. If your RDBMS supports it (I know SQL Server does, but I don't know about any others), you can create a table-valued function which takes in a date range and returns a result set of all the discrete dates within that range. You would then do a cartesian join between your table and the function.
2. You can create a static table of date values and then do a cartesian join between the two tables.
The second option will perform better, especially if you are dealing with large date ranges; however, it cannot handle arbitrary date ranges. But then, you should know your minimum date, and you can always add more dates to your table as time goes on.
I am not very clear about your M table. Provided that you have such a table (M) with dates, the following cross join will return the results:
SELECT C.*, M.date FROM C CROSS JOIN M
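Where neither generate_series() nor a calendar table is available, a recursive CTE can produce the dates on the fly, as in the T-SQL answer above. Here is a rough, engine-neutral sketch of that idea using Python's built-in sqlite3 module. The parameter names first_day and last_day are made up for the example, and date() is SQLite's own date function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a(avalue TEXT);
INSERT INTO a VALUES ('a');
""")

# Generate one row per day in [first_day, last_day], then pair every
# row of table a with every generated day.
rows = conn.execute("""
WITH RECURSIVE dates(day) AS (
    SELECT :first_day
    UNION ALL
    SELECT date(day, '+1 day') FROM dates WHERE day < :last_day
)
SELECT a.avalue, dates.day
FROM a
CROSS JOIN dates
ORDER BY a.avalue, dates.day
""", {"first_day": "2012-10-10", "last_day": "2012-10-12"}).fetchall()

for row in rows:
    print(row)
```

With N rows in a and M days in the range, the result has N×M rows, which is the shape the question asks for.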