Self-Joins, Cross-Joins and Grouping - sql

I've got a table of temperature samples over time from several sources and I want to find the minimum, maximum, and average temperatures across all sources at set time intervals. At first glance this is easily done like so:
SELECT MIN(temp), MAX(temp), AVG(temp) FROM samples GROUP BY time;
However, things become much more complicated (to the point where I'm stumped!) if sources drop in and out and, rather than ignoring the missing sources during the intervals in question, I want to use the sources' last known temperatures for the missing samples. Using datetimes and constructing intervals (say, every minute) across samples unevenly distributed over time further complicates things.
I think it should be possible to create the results I want by doing a self-join on the samples table where the time from the first table is greater than or equal to the time of the second table and then calculating aggregate values for rows grouped by source. However, I'm stumped about how to actually do this.
Here's my test table:
| time | source | temp |
| 1 | a | 20 |
| 1 | b | 18 |
| 1 | c | 23 |
| 2 | b | 21 |
| 2 | c | 20 |
| 2 | a | 18 |
| 3 | a | 16 |
| 3 | c | 13 |
| 4 | c | 15 |
| 4 | a | 4 |
| 4 | b | 31 |
| 5 | b | 10 |
| 5 | c | 16 |
| 5 | a | 22 |
| 6 | a | 18 |
| 6 | b | 17 |
| 7 | a | 20 |
| 7 | b | 19 |
INSERT INTO samples (time, source, temp) VALUES (1, 'a', 20), (1, 'b', 18), (1, 'c', 23), (2, 'b', 21), (2, 'c', 20), (2, 'a', 18), (3, 'a', 16), (3, 'c', 13), (4, 'c', 15), (4, 'a', 4), (4, 'b', 31), (5, 'b', 10), (5, 'c', 16), (5, 'a', 22), (6, 'a', 18), (6, 'b', 17), (7, 'a', 20), (7, 'b', 19);
To do my min, max and avg calculations, I want an intermediate table that looks like this:
| time | source | temp |
| 1 | a | 20 |
| 1 | b | 18 |
| 1 | c | 23 |
| 2 | b | 21 |
| 2 | c | 20 |
| 2 | a | 18 |
| 3 | a | 16 |
| 3 | b | 21 |
| 3 | c | 13 |
| 4 | c | 15 |
| 4 | a | 4 |
| 4 | b | 31 |
| 5 | b | 10 |
| 5 | c | 16 |
| 5 | a | 22 |
| 6 | a | 18 |
| 6 | b | 17 |
| 6 | c | 16 |
| 7 | a | 20 |
| 7 | b | 19 |
| 7 | c | 16 |
The following query is getting me close to what I want but it takes the temperature value of the source's first result, rather than the most recent one at the given time interval:
SELECT s.dt as sdt, s.mac, ss.temp, MAX(ss.dt) as maxdt FROM (SELECT DISTINCT dt FROM samples) AS s CROSS JOIN samples AS ss WHERE s.dt >= ss.dt GROUP BY sdt, mac HAVING maxdt <= s.dt ORDER BY sdt ASC, maxdt ASC;
| sdt | mac | temp | maxdt |
| 1 | a | 20 | 1 |
| 1 | c | 23 | 1 |
| 1 | b | 18 | 1 |
| 2 | a | 20 | 2 |
| 2 | c | 23 | 2 |
| 2 | b | 18 | 2 |
| 3 | b | 18 | 2 |
| 3 | a | 20 | 3 |
| 3 | c | 23 | 3 |
| 4 | a | 20 | 4 |
| 4 | c | 23 | 4 |
| 4 | b | 18 | 4 |
| 5 | a | 20 | 5 |
| 5 | c | 23 | 5 |
| 5 | b | 18 | 5 |
| 6 | c | 23 | 5 |
| 6 | a | 20 | 6 |
| 6 | b | 18 | 6 |
| 7 | c | 23 | 5 |
| 7 | b | 18 | 7 |
| 7 | a | 20 | 7 |
Update: chadhoc (great name, by the way!) gives a nice solution that unfortunately does not work in MySQL, since it does not support the FULL JOIN he uses. Luckily, I believe a simple UNION is an effective replacement:
-- Unify the original samples with the missing values that we've calculated
(
SELECT time, source, temp
FROM samples
)
UNION
( -- Pull all the time/source combinations that we are missing from the sample set, along with the temp
-- from the last sampled interval for the same time/source combination if we do not have one
SELECT a.time, a.source, (SELECT t2.temp FROM samples AS t2 WHERE t2.time < a.time AND t2.source = a.source ORDER BY t2.time DESC LIMIT 1) AS temp
FROM
( -- All values we want to get should be a cross of time/source
SELECT t1.time, s1.source
FROM
(SELECT DISTINCT time FROM samples) AS t1
CROSS JOIN
(SELECT DISTINCT source FROM samples) AS s1
) AS a
LEFT JOIN samples s
ON a.time = s.time
AND a.source = s.source
WHERE s.source IS NULL
)
ORDER BY time, source;
Update 2: MySQL gives the following EXPLAIN output for chadhoc's code:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | temp | ALL | NULL | NULL | NULL | NULL | 18 | |
| 2 | UNION | <derived4> | ALL | NULL | NULL | NULL | NULL | 21 | |
| 2 | UNION | s | ALL | NULL | NULL | NULL | NULL | 18 | Using where |
| 4 | DERIVED | <derived6> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 4 | DERIVED | <derived5> | ALL | NULL | NULL | NULL | NULL | 7 | |
| 6 | DERIVED | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using temporary |
| 5 | DERIVED | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using temporary |
| 3 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 18 | Using where; Using filesort |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using filesort |
I was able to get Charles' code working like so:
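SELECT T.time, S.source,
COALESCE(D.temp, (
SELECT temp FROM samples
WHERE source = S.source AND time = (
SELECT MAX(time)
FROM samples
WHERE source = S.source
AND time < T.time
)
)) AS temp
FROM (SELECT DISTINCT time FROM samples) AS T
CROSS JOIN (SELECT DISTINCT source FROM samples) AS S
LEFT JOIN samples AS D
ON D.source = S.source AND D.time = T.time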
Its explanation is:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | <derived5> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 7 | |
| 1 | PRIMARY | D | ALL | NULL | NULL | NULL | NULL | 18 | |
| 5 | DERIVED | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using temporary |
| 4 | DERIVED | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using temporary |
| 2 | DEPENDENT SUBQUERY | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using where |
| 3 | DEPENDENT SUBQUERY | temp | ALL | NULL | NULL | NULL | NULL | 18 | Using where |

I think you'll get better performance making use of the ranking/windowing functions in MySQL, but unfortunately I do not know those as well as the T-SQL implementation. Here is a mostly ANSI-compliant solution (written with T-SQL's TOP and bracket quoting) that will work, though:
-- Full join across the sample set and anything missing from the sample set, pulling the missing temp first if we do not have one
select coalesce(c1.[time], c2.[time]) as dt, coalesce(c1.source, c2.source) as source, coalesce(c2.temp, c1.temp) as temp
from samples c1
full join ( -- Pull all the time/source combinations that we are missing from the sample set, along with the temp
-- from the last sampled interval for the same time/source combination if we do not have one
select a.time, a.source,
(select top 1 t2.temp from samples t2 where t2.time < a.time and t2.source = a.source order by t2.time desc) as temp
from
( -- All values we want to get should be a cross of time/samples
select t1.[time], s1.source
from
(select distinct [time] from samples) as t1
cross join
(select distinct source from samples) as s1
) a
left join samples s
on a.[time] = s.time
and a.source = s.source
where s.source is null
) c2
on c1.time = c2.time
and c1.source = c2.source
order by dt, source

I know this looks complicated, but it's formatted to explain itself.
It should work, provided you only have three sources. If you have an arbitrary number of sources, then this won't work; in that case see the second query.
EDIT: Removed first attempt
EDIT: If you don't know the sources ahead of time, you'll have to create an intermediate result set that "fills in" the missing values, something like this:
2nd EDIT: Removed the need for Coalesce by moving the logic that retrieves the most recent temp reading for each source from the Select clause into the Join condition.
Select Z.Time, Max(Temp) MaxTemp,
Min(Temp) MinTemp, Avg(Temp) AvgTemp
From
(Select T.Time, S.Source, D.Temp
From (Select Distinct Time From Samples) T
Cross Join
(Select Distinct Source From Samples) S
Left Join Samples D
On D.Source = S.Source
And D.Time =
(Select Max(Time)
From Samples
Where Source = S.Source
And Time <= T.Time)) Z
Group By Z.Time
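As a rough check of the fill-in approach, the sketch below loads the question's test table into an in-memory SQLite database through Python's sqlite3 module and runs the same cross join / left join shape. SQLite and the Python wrapper are assumptions made only for illustration (the question targets MySQL), and the names ROWS, QUERY, summarize and the aliases z, t, s, d and m are hypothetical.

```python
import sqlite3

# Sample rows from the question: (time, source, temp).
ROWS = [
    (1, 'a', 20), (1, 'b', 18), (1, 'c', 23),
    (2, 'b', 21), (2, 'c', 20), (2, 'a', 18),
    (3, 'a', 16), (3, 'c', 13),
    (4, 'c', 15), (4, 'a', 4), (4, 'b', 31),
    (5, 'b', 10), (5, 'c', 16), (5, 'a', 22),
    (6, 'a', 18), (6, 'b', 17),
    (7, 'a', 20), (7, 'b', 19),
]

# Build every (time, source) pair, attach that source's latest sample at or
# before the time, then aggregate per time.
QUERY = """
SELECT z.time, MIN(z.temp), MAX(z.temp), AVG(z.temp)
FROM (
    SELECT t.time AS time, s.source AS source, d.temp AS temp
    FROM (SELECT DISTINCT time FROM samples) AS t
    CROSS JOIN (SELECT DISTINCT source FROM samples) AS s
    LEFT JOIN samples AS d
      ON d.source = s.source
     AND d.time = (SELECT MAX(m.time)
                   FROM samples AS m
                   WHERE m.source = s.source
                     AND m.time <= t.time)
) AS z
GROUP BY z.time
ORDER BY z.time
"""


def summarize():
    """Return (time, min, max, avg) rows with missing sources filled in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (time INTEGER, source TEXT, temp INTEGER)")
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for time, low, high, avg in summarize():
        print(time, low, high, round(avg, 2))
```

At time 3, for example, source b has no sample, so its time-2 reading of 21 is carried forward and becomes that interval's maximum.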


Count without using functions (like count) oracle

I have two tables:
name VARCHAR2(30) NOT NULL CHECK (upper(name)=name)
name VARCHAR2(30) NOT NULL CHECK (upper(name)=name),
ref_ostan NUMBER,
CONSTRAINT fk_ref_ostan FOREIGN KEY (ref_ostan) REFERENCES z_ostan(id)
How can I find the "id" values from table A that rank second and third among those least referenced in table B, without using predefined functions like "count()"?
This only processes existing references to Table A.
Updated for oracle (used 12c)
Without using any aggregate or window functions:
Sample data for Table: tblb
| id | name | tbla_id |
| 1 | TBLB_01 | 1 |
| 2 | TBLB_02 | 1 |
| 3 | TBLB_03 | 1 |
| 4 | TBLB_04 | 1 | 4 rows
| 5 | TBLB_05 | 2 |
| 6 | TBLB_06 | 2 |
| 7 | TBLB_07 | 2 | 3 rows
| 8 | TBLB_08 | 3 |
| 9 | TBLB_09 | 3 |
| 10 | TBLB_10 | 3 |
| 11 | TBLB_11 | 3 |
| 12 | TBLB_12 | 3 |
| 13 | TBLB_13 | 3 | 6 rows
| 14 | TBLB_14 | 4 |
| 15 | TBLB_15 | 4 |
| 16 | TBLB_16 | 4 | 3 rows
| 17 | TBLB_17 | 5 | 1 row
| 18 | TBLB_18 | 6 |
| 19 | TBLB_19 | 6 | 2 rows
| 20 | TBLB_20 | 7 | 1 row
There are many ways to express this logic.
Here it is step by step with CTE terms.
The intent is (for each set of tbla_id rows in tblb):
1. Generate a row_number (n) for the rows in each partition. We would normally use window functions for this, but I assume these are not allowed. This row_number (n) is used to determine the count of rows in each tbla_id partition.
2. To find that count per partition, find the last row in each partition (from step 1).
3. Order the results of step 2 by n of these last rows.
4. Choose the 2nd and 3rd rows of this result.
WITH first AS ( -- Find the first row per tbla_id
SELECT t1.*
FROM tblb t1
LEFT JOIN tblb t2
ON t1.id > t2.id
AND t1.tbla_id = t2.tbla_id
WHERE t2.id IS NULL
)
, rnum (id, name, tbla_id, n) AS ( -- Generate a row_number (n) for each tbla_id partition
SELECT f.*, 1 FROM first f UNION ALL
SELECT n.id, n.name, n.tbla_id, c.n+1
FROM rnum c
JOIN tblb n
ON c.tbla_id = n.tbla_id
AND c.id < n.id
LEFT JOIN tblb n2
ON n.tbla_id = n2.tbla_id
AND c.id < n2.id
AND n.id > n2.id
WHERE n2.id IS NULL
)
, last AS ( -- Find the last row in each partition to obtain the count of tbla_id references
SELECT t1.*
FROM rnum t1
LEFT JOIN rnum t2
ON t1.id < t2.id
AND t1.tbla_id = t2.tbla_id
WHERE t2.id IS NULL
)
SELECT * FROM last
ORDER BY n, id
OFFSET 1 ROWS FETCH NEXT 2 ROWS ONLY;
Final Result, where n is the count of references to tbla:
| id | name | tbla_id | n |
| 20 | TBLB_20 | 7 | 1 |
| 19 | TBLB_19 | 6 | 2 |
Some intermediate results...
last CTE term result. The 2nd and 3rd rows of this become the final result.
| id | name | tbla_id | n |
| 17 | TBLB_17 | 5 | 1 |
| 20 | TBLB_20 | 7 | 1 |
| 19 | TBLB_19 | 6 | 2 |
| 7 | TBLB_07 | 2 | 3 |
| 16 | TBLB_16 | 4 | 3 |
| 4 | TBLB_04 | 1 | 4 |
| 13 | TBLB_13 | 3 | 6 |
rnum CTE term result. This provides the row_number over tbla_id partitions ordered by id
| id | name | tbla_id | n |
| 1 | TBLB_01 | 1 | 1 |
| 2 | TBLB_02 | 1 | 2 |
| 3 | TBLB_03 | 1 | 3 |
| 4 | TBLB_04 | 1 | 4 |
| 5 | TBLB_05 | 2 | 1 |
| 6 | TBLB_06 | 2 | 2 |
| 7 | TBLB_07 | 2 | 3 |
| 8 | TBLB_08 | 3 | 1 |
| 9 | TBLB_09 | 3 | 2 |
| 10 | TBLB_10 | 3 | 3 |
| 11 | TBLB_11 | 3 | 4 |
| 12 | TBLB_12 | 3 | 5 |
| 13 | TBLB_13 | 3 | 6 |
| 14 | TBLB_14 | 4 | 1 |
| 15 | TBLB_15 | 4 | 2 |
| 16 | TBLB_16 | 4 | 3 |
| 17 | TBLB_17 | 5 | 1 |
| 18 | TBLB_18 | 6 | 1 |
| 19 | TBLB_19 | 6 | 2 |
| 20 | TBLB_20 | 7 | 1 |
There are a few other ways to tackle this problem in just SQL.
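The CTE steps described in this answer can be checked against the sample data with a small sketch. It is a port to SQLite run through Python's sqlite3 module, which is an assumption made for illustration (the answer targets Oracle 12c): the CTEs are renamed first_row and last_row, the aliases nx and gap are hypothetical, and LIMIT/OFFSET stands in for Oracle's row-limiting clause.

```python
import sqlite3

# (id, name, tbla_id) rows matching the sample tblb data.
TBLB = [
    (i, "TBLB_%02d" % i, a)
    for i, a in enumerate(
        [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 6, 6, 7], start=1
    )
]

QUERY = """
WITH RECURSIVE first_row AS (          -- first row per tbla_id
    SELECT t1.id, t1.name, t1.tbla_id
    FROM tblb AS t1
    LEFT JOIN tblb AS t2
      ON t1.id > t2.id AND t1.tbla_id = t2.tbla_id
    WHERE t2.id IS NULL
),
rnum (id, name, tbla_id, n) AS (       -- row number n within each tbla_id
    SELECT f.id, f.name, f.tbla_id, 1 FROM first_row AS f
    UNION ALL
    SELECT nx.id, nx.name, nx.tbla_id, c.n + 1
    FROM rnum AS c
    JOIN tblb AS nx
      ON c.tbla_id = nx.tbla_id AND c.id < nx.id
    LEFT JOIN tblb AS gap              -- no row may sit between c and nx
      ON nx.tbla_id = gap.tbla_id AND c.id < gap.id AND gap.id < nx.id
    WHERE gap.id IS NULL
),
last_row AS (                          -- last row per tbla_id; n is the count
    SELECT t1.id, t1.name, t1.tbla_id, t1.n
    FROM rnum AS t1
    LEFT JOIN rnum AS t2
      ON t1.id < t2.id AND t1.tbla_id = t2.tbla_id
    WHERE t2.id IS NULL
)
SELECT id, name, tbla_id, n
FROM last_row
ORDER BY n, id
LIMIT 2 OFFSET 1
"""


def second_and_third():
    """Return the 2nd and 3rd least-referenced tbla_id rows, without COUNT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblb (id INTEGER, name TEXT, tbla_id INTEGER)")
    conn.executemany("INSERT INTO tblb VALUES (?, ?, ?)", TBLB)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in second_and_third():
        print(row)
```

Ordering by n and then id is what breaks the tie between tbla_id 5 and tbla_id 7, which both have a single reference.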

Replace nulls of a column with column value from another table

I have data flowing from two tables, table A and table B. I'm doing an inner join on a common column from both the tables and creating two more new columns based on different conditions. Below is a sample dataset:
Table A
| Id | StartDate |
| 119 | 01-01-2018 |
| 120 | 01-02-2019 |
| 121 | 03-05-2018 |
| 123 | 05-08-2021 |
Table B
| Id | CodeId | Code | RedemptionDate |
| 119 | 1 | abc | null |
| 119 | 2 | abc | null |
| 119 | 3 | def | null |
| 119 | 4 | def | 2/3/2019 |
| 120 | 5 | ghi | 04/7/2018 |
| 120 | 6 | ghi | 4/5/2018 |
| 121 | 7 | jkl | null |
| 121 | 8 | jkl | 4/4/2019 |
| 121 | 9 | mno | 3/18/2020 |
| 123 | 10 | pqr | null |
What I'm basically doing is joining the tables on column 'Id' where the StartDate year is 2018 or later, and creating two new columns: 'Unlock', by counting CodeId when RedemptionDate is null, and 'Redeem', by counting CodeId when RedemptionDate is not null. Below is the SQL query:
WITH cte1 AS (
SELECT a.Id, COUNT(b.CodeId) AS 'Unlock'
FROM TableA AS a
JOIN TableB AS b ON a.Id=b.Id
WHERE YEAR(a.StartDate) >= 2018 AND b.RedemptionDate IS NULL
GROUP BY a.Id
), cte2 AS (
SELECT a.Id, COUNT(b.CodeId) AS 'Redeem'
FROM TableA AS a
JOIN TableB AS b ON a.Id=b.Id
WHERE YEAR(a.StartDate) >= 2018 AND b.RedemptionDate IS NOT NULL
GROUP BY a.Id
)
SELECT cte1.Id, cte1.Unlock, cte2.Redeem
FROM cte1
FULL OUTER JOIN cte2 ON cte1.Id = cte2.Id
If I break down the output of this query, result from cte1 will look like below:
| Id | Unlock |
| 119 | 3 |
| 121 | 1 |
| 123 | 1 |
And from cte2 will look like below:
| Id | Redeem |
| 119 | 1 |
| 120 | 2 |
| 121 | 2 |
The last select query will produce the following result:
| Id | Unlock | Redeem |
| 119 | 3 | 1 |
| null | null | 2 |
| 121 | 1 | 2 |
| 123 | 1 | null |
How can I replace the null value in Id with the value from 'b.Id'? If I try coalesce or a case statement, they create new columns. I don't want to create additional columns; I want to replace the null values with the column values coming from the other table.
My final output should look like:
| Id | Unlock | Redeem |
| 119 | 3 | 1 |
| 120 | null | 2 |
| 121 | 1 | 2 |
| 123 | 1 | null |
If I'm following correctly, you can use apply with aggregation:
select a.*, b.*
from a cross apply
(select count(RedemptionDate) as num_redeemed,
count(*) - count(RedemptionDate) as num_unlock
from b
where b.id = a.id
) b;
However, the answer to your question is to use coalesce(cte1.Id, cte2.Id) as id.
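One way to sidestep the null Id entirely is to count both cases in a single pass, so no FULL OUTER JOIN is needed. The sketch below uses conditional aggregation, a swapped-in technique rather than the exact query above, and runs it in SQLite through Python's sqlite3 module as an assumption for illustration. Because the sample dates are stored as text, the year is taken with substr instead of YEAR(); the names TABLE_A, TABLE_B, QUERY and unlock_and_redeem are hypothetical.

```python
import sqlite3

# Sample data from the question; dates are kept as the original text.
TABLE_A = [(119, "01-01-2018"), (120, "01-02-2019"),
           (121, "03-05-2018"), (123, "05-08-2021")]
TABLE_B = [
    (119, 1, "abc", None), (119, 2, "abc", None), (119, 3, "def", None),
    (119, 4, "def", "2/3/2019"), (120, 5, "ghi", "04/7/2018"),
    (120, 6, "ghi", "4/5/2018"), (121, 7, "jkl", None),
    (121, 8, "jkl", "4/4/2019"), (121, 9, "mno", "3/18/2020"),
    (123, 10, "pqr", None),
]

# SUM over a CASE with no ELSE yields NULL when no row matches, which
# reproduces the nulls in the desired output.
QUERY = """
SELECT a.Id,
       SUM(CASE WHEN b.RedemptionDate IS NULL THEN 1 END)     AS Unlock,
       SUM(CASE WHEN b.RedemptionDate IS NOT NULL THEN 1 END) AS Redeem
FROM TableA AS a
JOIN TableB AS b ON a.Id = b.Id
WHERE CAST(substr(a.StartDate, 7, 4) AS INTEGER) >= 2018
GROUP BY a.Id
ORDER BY a.Id
"""


def unlock_and_redeem():
    """Return (Id, Unlock, Redeem) rows, one per Id, with no null Ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA (Id INTEGER, StartDate TEXT)")
    conn.execute(
        "CREATE TABLE TableB (Id INTEGER, CodeId INTEGER, Code TEXT, RedemptionDate TEXT)"
    )
    conn.executemany("INSERT INTO TableA VALUES (?, ?)", TABLE_A)
    conn.executemany("INSERT INTO TableB VALUES (?, ?, ?, ?)", TABLE_B)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in unlock_and_redeem():
        print(row)
```

Grouping once by a.Id means every Id that has any row in Table B appears exactly once, which is why Id 120 keeps its value even though it has nothing to unlock.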

output difference of two values same column to another column

Can anyone help me out or point me in the right direction? What is the simplest way to get from the current table to the output table?
Current Table
ID | type | amount |
2 | A | 19 |
2 | B | 6 |
3 | A | 5 |
3 | B | 11 |
4 | A | 1 |
4 | B | 23 |
Desired output
ID | type | amount | change |
2 | A | 19 | 13 |
2 | B | 6 | -6 |
3 | A | 5 | -22 |
3 | B | 11 | |
4 | A | 1 | |
4 | B | 23 | |
I don't get how the values are put on rows. You can, for instance, subtract the "B" value from the "A" value for any given id. For instance:
select t.*,
(case when type = 'A'
then amount - max(amount) filter (where type = 'B') over (partition by id)
end) as diff_a_b
from t;
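The "A minus B per id" idea can be tried on the sample rows with the sketch below. It swaps the filtered window aggregate for a correlated subquery so that it also runs on older SQLite builds, and uses Python's sqlite3 module; both choices are assumptions for illustration, and the table name amounts and the aliases cur and b are hypothetical.

```python
import sqlite3

# Current table from the question: (id, type, amount).
AMOUNTS = [(2, 'A', 19), (2, 'B', 6), (3, 'A', 5),
           (3, 'B', 11), (4, 'A', 1), (4, 'B', 23)]

# For each 'A' row, subtract the 'B' amount of the same id; 'B' rows get NULL.
QUERY = """
SELECT cur.id, cur.type, cur.amount,
       CASE WHEN cur.type = 'A'
            THEN cur.amount - (SELECT MAX(b.amount)
                               FROM amounts AS b
                               WHERE b.id = cur.id AND b.type = 'B')
       END AS diff_a_b
FROM amounts AS cur
ORDER BY cur.id, cur.type
"""


def diff_a_b():
    """Return (id, type, amount, diff_a_b) rows for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE amounts (id INTEGER, type TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO amounts VALUES (?, ?, ?)", AMOUNTS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in diff_a_b():
        print(row)
```

The three differences this produces (13, -6, -22) are the same values listed in the question's change column, only attached to the 'A' row of each id.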

find other columns value based on maximum of one column using groupby particular column

I have data like below
| Count | Mindif | Device |
| 45 | 3 | A |
| 78 | 4 | A |
| 52 | 5 | A |
| 24 | 6 | A |
| 22 | 1 | B |
| 22 | 2 | B |
| 34 | 3 | B |
| 37 | 4 | B |
| 52 | 5 | B |
| 34 | 6 | B |
| 13 | 1 | C |
| 30 | 2 | C |
| 57 | 3 | C |
| 111 | 4 | C |
| 35 | 5 | C |
I want to find the Mindif and Device for the maximum Count of each device.
The output should look like:
| Count | Mindif | Device |
| 78 | 4 | A |
| 52 | 5 | B |
| 111 | 4 | C |
You can use a query like this:
SELECT t1.Count, t1.Mindif, t1.Device
FROM mytable AS t1
JOIN (
SELECT Device, MAX(Count) AS Count
FROM mytable
GROUP BY Device
) AS t2 ON t1.Device = t2.Device AND t1.Count = t2.Count
The query uses a derived table that returns the max Count value per Device. Joining back to the original table we can get the desired result.
Using a window function:
SELECT Count, Mindif, Device
FROM
(SELECT Count, Mindif, Device,
rank() over (partition by Device order by Count desc) as r
FROM mytable) S
WHERE S.r = 1;
A simple join with MAX:
SELECT a.* FROM mytable a
JOIN (SELECT Device, MAX(Count) AS Cnt
FROM mytable
GROUP BY Device) b ON (a.Count = b.Cnt AND a.Device = b.Device)
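To see the per-device maximum join working on the sample rows, here is a minimal sketch run in SQLite through Python's sqlite3 module (an assumption for illustration; the thread does not name a database). The alias MaxCount and the function name max_per_device are hypothetical.

```python
import sqlite3

# Sample data from the question: (Count, Mindif, Device).
DATA = [
    (45, 3, 'A'), (78, 4, 'A'), (52, 5, 'A'), (24, 6, 'A'),
    (22, 1, 'B'), (22, 2, 'B'), (34, 3, 'B'), (37, 4, 'B'),
    (52, 5, 'B'), (34, 6, 'B'),
    (13, 1, 'C'), (30, 2, 'C'), (57, 3, 'C'), (111, 4, 'C'), (35, 5, 'C'),
]

# The derived table holds one row per Device with its largest Count; joining
# back on both Device and Count recovers the matching Mindif.
QUERY = """
SELECT t1.Count, t1.Mindif, t1.Device
FROM mytable AS t1
JOIN (
    SELECT Device, MAX(Count) AS MaxCount
    FROM mytable
    GROUP BY Device
) AS t2 ON t1.Device = t2.Device AND t1.Count = t2.MaxCount
ORDER BY t1.Device
"""


def max_per_device():
    """Return the (Count, Mindif, Device) row with the largest Count per device."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (Count INTEGER, Mindif INTEGER, Device TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", DATA)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in max_per_device():
        print(row)
```

Joining on Device as well as Count matters here: device A also has a row with Count 52, which would otherwise match device B's maximum.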

Sum data from two tables with different number of rows

There are 3 Tables (SorMaster, SorDetail, and InvWarehouse):
SorMaster
| SalesOrder |
| 100 |
| 101 |
| 102 |
SorDetail
| SalesOrder | MStockCode | MBackOrderQty |
| 100 | PN-1 | 4 |
| 100 | PN-2 | 9 |
| 100 | PN-3 | 1 |
| 100 | PN-4 | 6 |
| 101 | PN-1 | 6 |
| 101 | PN-3 | 2 |
| 102 | PN-2 | 19 |
| 102 | PN-3 | 14 |
| 102 | PN-4 | 6 |
| 102 | PN-5 | 4 |
InvWarehouse
| MStockCode | Warehouse | QtyOnHand |
| PN-1 | A | 1 |
| PN-2 | B | 9 |
| PN-3 | A | 0 |
| PN-4 | B | 1 |
| PN-1 | A | 0 |
| PN-3 | B | 5 |
| PN-2 | A | 9 |
| PN-3 | B | 4 |
| PN-4 | A | 6 |
| PN-5 | B | 0 |
Desired Results:
| MStockCode | SumBackOrderQty | SumQtyOnHand |
| PN-1 | 10 | 10 |
| PN-2 | 28 | 1 |
| PN-3 | 17 | 5 |
| PN-4 | 12 | 13 |
| PN-5 | 11 | 6 |
I have been going around in circles with no end in sight. Seems like it should be simple but just can't wrap my head around it. The SumBackOrderQty obviously getting counted twice as the SumQtyOnHand is evaluated. To this point I have been doing the calculations in the PHP instead of the select statement but would like to clean things up a bit where possible.
Current query statement is:
SELECT SorDetail.MStockCode,
SUM(SorDetail.MBackOrderQty) AS 'SumMBackOrderQty',
SUM(InvWarehouse.QtyOnHand) AS 'SumQtyOnHand'
FROM SysproCompanyJ.dbo.SorMaster SorMaster,
SysproCompanyJ.dbo.SorDetail SorDetail LEFT OUTER JOIN SysproCompanyJ.dbo.InvWarehouse InvWarehouse
ON SorDetail.MStockCode = InvWarehouse.StockCode
WHERE SorMaster.SalesOrder = SorDetail.SalesOrder
AND SorMaster.ActiveFlag != 'N'
AND SorDetail.MBackOrderQty > '0'
AND SorDetail.MPrice > '0'
GROUP BY SorDetail.MStockCode
ORDER BY SorDetail.MStockCode ASC
Without providing the complete picture, in terms of your RDBMS, database schema, a description of the problem you're trying to solve and sample data that matches the aforementioned, the following is just an illustration of what a solution based on Barmar's comment could look like:
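SELECT SD.MStockCode, SD.SumBackOrderQty, IW.SumQtyOnHand
FROM (SELECT SorDetail.MStockCode,
SUM(MBackOrderQty) AS `SumBackOrderQty`
FROM SorDetail
JOIN SorMaster ON SorDetail.SalesOrder=SorMaster.SalesOrder
WHERE SorMaster.ActiveFlag != 'N'
AND SorDetail.MBackOrderQty > 0
AND SorDetail.MPrice > 0
GROUP BY SorDetail.MStockCode) AS SD
LEFT JOIN (SELECT MStockCode,
SUM(QtyOnHand) AS `SumQtyOnHand`
FROM InvWarehouse
GROUP BY MStockCode) AS IW ON SD.MStockCode=IW.MStockCode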
Here's one approach:
select MStockCode,
(select sum(MBackOrderQty) from sorDetail as T2
where T2.MStockCode = T1.MStockCode ) as SumBackOrderQty,
(select sum(QtyOnHand) from invWarehouse as T3
where T3.MStockCode = T1.MStockCode ) as SumQtyOnHand
from (
select mstockcode from sorDetail
union
select mstockcode from invWarehouse
) as T1
In a fiddle here:!9/fdaca/6
Though my SumQtyOnHand values don't match yours (as @Gordon pointed out).
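The double counting and the pre-aggregation fix can both be reproduced on the sample rows. The sketch below runs in SQLite through Python's sqlite3 module, an assumption for illustration; it leaves out SorMaster and the ActiveFlag and MPrice filters because the sample data has no such columns, and the names NAIVE, PRE_AGGREGATED and run are hypothetical.

```python
import sqlite3

# Sample data from the question.
SOR_DETAIL = [
    (100, 'PN-1', 4), (100, 'PN-2', 9), (100, 'PN-3', 1), (100, 'PN-4', 6),
    (101, 'PN-1', 6), (101, 'PN-3', 2),
    (102, 'PN-2', 19), (102, 'PN-3', 14), (102, 'PN-4', 6), (102, 'PN-5', 4),
]
INV_WAREHOUSE = [
    ('PN-1', 'A', 1), ('PN-2', 'B', 9), ('PN-3', 'A', 0), ('PN-4', 'B', 1),
    ('PN-1', 'A', 0), ('PN-3', 'B', 5), ('PN-2', 'A', 9), ('PN-3', 'B', 4),
    ('PN-4', 'A', 6), ('PN-5', 'B', 0),
]

# Joining row to row first multiplies each detail row by the number of
# warehouse rows for the same stock code, which inflates both sums.
NAIVE = """
SELECT d.MStockCode, SUM(d.MBackOrderQty), SUM(w.QtyOnHand)
FROM SorDetail AS d
LEFT JOIN InvWarehouse AS w ON d.MStockCode = w.MStockCode
GROUP BY d.MStockCode
ORDER BY d.MStockCode
"""

# Aggregating each table to one row per stock code before joining avoids that.
PRE_AGGREGATED = """
SELECT sd.MStockCode, sd.SumBackOrderQty, iw.SumQtyOnHand
FROM (SELECT MStockCode, SUM(MBackOrderQty) AS SumBackOrderQty
      FROM SorDetail GROUP BY MStockCode) AS sd
LEFT JOIN (SELECT MStockCode, SUM(QtyOnHand) AS SumQtyOnHand
           FROM InvWarehouse GROUP BY MStockCode) AS iw
  ON sd.MStockCode = iw.MStockCode
ORDER BY sd.MStockCode
"""


def run(query):
    """Load the sample tables into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SorDetail (SalesOrder INTEGER, MStockCode TEXT, MBackOrderQty INTEGER)"
    )
    conn.execute(
        "CREATE TABLE InvWarehouse (MStockCode TEXT, Warehouse TEXT, QtyOnHand INTEGER)"
    )
    conn.executemany("INSERT INTO SorDetail VALUES (?, ?, ?)", SOR_DETAIL)
    conn.executemany("INSERT INTO InvWarehouse VALUES (?, ?, ?)", INV_WAREHOUSE)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run(NAIVE))
    print(run(PRE_AGGREGATED))
```

With the naive join, PN-1 reports a back-order total of 20 instead of 10 because its two detail rows each pair with two warehouse rows. The pre-aggregated totals follow directly from the sample rows and do not all match the question's desired table.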