First-time users per day in Postgres - sql

I am new to writing queries in Postgres and am interested in understanding how to count the number of unique first-time users per day.
The table has only two columns: user_id and start_time, a timestamp that indicates the time of use. If a user was already active on a previous day, that user_id should not be counted.
Why does the following query not work? Shouldn't it be possible to select distinct on two variables at once?
SELECT COUNT (DISTINCT min(start_time::date), user_id),
start_time::date as date
FROM mytable
GROUP BY date
produces
ERROR: function count(date, integer) does not exist
The output would look like this:
date       | count
2017-11-22 | 56
2017-11-23 | 73
2017-11-24 | 13
2017-11-25 | 91
2017-11-26 | 107
2017-11-27 | 33
...
Any suggestions about how to count the distinct min date and user_id and then group by date in Postgres would be appreciated.

Try this:
select start_time, count(*) as count
from (
    select user_id, min(start_time::date) as start_time
    from mytable
    group by user_id
) distinctRecords
group by start_time;
This will count each user only once for min date.

You may try this logic:
First find the first login time of each user_id: MIN(start_time).
Joining the above results with the main table, increment the count of a user only if the user has not logged in before. COUNT does not add 1 for a record when its argument is NULL.
SQL Fiddle
PostgreSQL 9.6 Schema Setup:
CREATE TABLE yourtable
(user_id int, start_time varchar(19))
;
INSERT INTO yourtable
(user_id, start_time)
VALUES
(1, '2018-03-19 08:05:01'),
(2, '2018-03-19 08:05:01'),
(1, '2018-03-19 08:05:04'),
(3, '2018-03-19 08:05:01'),
(1, '2018-03-20 08:05:04'),
(2, '2018-03-20 08:05:04'),
(4, '2018-03-20 08:05:04'),
(3, '2018-03-20 08:05:06'),
(3, '2018-03-20 08:05:04'),
(3, '2018-03-20 08:05:05'),
(1, '2018-03-21 08:05:06'),
(3, '2018-03-21 08:05:05'),
(6, '2018-03-21 08:05:06'),
(3, '2018-03-22 08:05:05'),
(4, '2018-03-22 08:05:05'),
(5, '2018-03-23 08:05:05')
;
Query 1:
WITH f AS (
    SELECT user_id, MIN(start_time) AS first_start_time
    FROM yourtable
    GROUP BY user_id
)
SELECT t.start_time::DATE,
       COUNT(CASE WHEN t.start_time > f.first_start_time
                  THEN NULL ELSE 1 END)
FROM yourtable t
JOIN f ON t.user_id = f.user_id
GROUP BY start_time::DATE
ORDER BY 1
Results:
| start_time | count |
|------------|-------|
| 2018-03-19 | 3 |
| 2018-03-20 | 1 |
| 2018-03-21 | 1 |
| 2018-03-22 | 0 |
| 2018-03-23 | 1 |

You can use the following query (note the subquery needs its own GROUP BY user_id):
select count(user_id) as total_user, start_time
from (
    select min(date(start_time)) as start_time, user_id
    from mytable
    group by user_id
) tmp
group by start_time

Related

Is this possible in SQL? Min and Max Dates On a Total. Where it changes in between Dates

I am trying to figure out how to write a query that will give me the correct historical data between dates, using only SQL. I know it is possible by coding a loop, but I'm not sure whether this can be done in a single SQL query. Dates are DD/MM/YYYY.
An example of the data:
ID | Points | DATE
1  | 10     | 01/01/2018
1  | 20     | 02/01/2019
1  | 25     | 03/01/2020
1  | 10     | 04/01/2021
With a simple query
SELECT ID, Points, MIN(Date), MAX(Date)
FROM table
GROUP BY ID,POINTS
The Min date for 10 points would be 01/01/2018 and the Max Date would be 04/01/2021, which would be wrong in this instance. It should be:
ID | Points | Min DATE   | Max DATE
1  | 10     | 01/01/2018 | 01/01/2019
1  | 20     | 02/01/2019 | 02/01/2020
1  | 25     | 03/01/2020 | 03/01/2021
1  | 10     | 04/01/2021 | 04/01/2021
I was thinking of using LAG, but need some ideas here. What I haven't told you is that there is a record per day, so I would need to group until the points value changes. This is to create a view from data I already have.
It looks like, for your sample data set, the following LEAD should suffice:
select id, points, date as MinDate,
IsNull(DateAdd(day, -1, Lead(Date,1) over(partition by Id order by Date)), Date) as MaxDate
from t
Example Fiddle
I'm guessing you want the MAX date to be 1 day before the next MIN date.
And you can use the window function LEAD to get the next MIN date.
And if you group also by the year, then the date ranges match the expected result.
SELECT ID, Points
, MIN([Date]) AS [Min Date]
, COALESCE(DATEADD(day, -1, LEAD(MIN([Date])) OVER (PARTITION BY ID ORDER BY MIN([Date]))), MAX([Date])) AS [Max Date]
FROM your_table
GROUP BY ID, Points, YEAR([Date]);
ID | Points | Min Date   | Max Date
1  | 10     | 2018-01-01 | 2019-01-01
1  | 20     | 2019-01-02 | 2020-01-02
1  | 25     | 2020-01-03 | 2021-01-03
1  | 10     | 2021-01-04 | 2021-01-04
Test on db<>fiddle here
We can do this by creating two derived tables, one with the minimum and one with the maximum date for each grouping, and then combining them:
CREATE TABLE dataa(
id INT,
points INT,
ddate DATE);
INSERT INTO dataa values(1 , 10 ,'2018-10-01');
INSERT INTO dataa values(1 , 20 ,'2019-01-02');
INSERT INTO dataa values(1 , 25 ,'2020-01-03');
INSERT INTO dataa values(1 , 10 ,'2021-01-04');
SELECT mi.id, mi.points, mi.date minDate, ma.date maxDate
FROM (select id, points, min(ddate) date from dataa group by id, points) mi
JOIN (select id, points, max(ddate) date from dataa group by id, points) ma
  ON mi.id = ma.id
 AND mi.points = ma.points;
DROP TABLE dataa;
this gives the following output
+------+--------+------------+------------+
| id | points | minDate | maxDate |
+------+--------+------------+------------+
| 1 | 10 | 2018-10-01 | 2021-01-04 |
| 1 | 20 | 2019-01-02 | 2019-01-02 |
| 1 | 25 | 2020-01-03 | 2020-01-03 |
+------+--------+------------+------------+
I've used the default date formatting. This could be modified if you wish.
*** See my other answer, as I don't think this answer is correct after reexamining the OP's question. Leaving this answer in place in case it has any value.
As I understand the problem, consecutive daily values with the same value for a given ID may be ignored. This can be done by examining the prior value using the LAG() function and excluding records where the current value is unchanged from the prior one.
From the remaining records, the LEAD() function can be used to look ahead to the next included record to extract the date where this value is superseded. Max Date is then calculated as one day prior.
Below is an example that includes expanded test data to cover multiple IDs and repeated Points values.
DECLARE @Data TABLE (Id INT, Points INT, Date DATE)
INSERT @Data
VALUES
(1, 10, '2018-01-01'), -- Start
(1, 20, '2019-01-02'), -- Updated
(1, 25, '2020-01-03'), -- Updated
(1, 10, '2021-01-04'), -- Updated
(2, 10, '2022-01-01'), -- Start
(2, 20, '2022-02-01'), -- Updated
(2, 20, '2022-03-01'), -- No change
(2, 20, '2022-04-01'), -- No change
(2, 20, '2022-05-01'), -- No change
(2, 25, '2022-06-01'), -- Updated
(2, 25, '2022-07-01'), -- No change
(2, 20, '2022-08-01'), -- Updated
(2, 25, '2022-09-08'), -- Updated
(2, 10, '2022-10-09'), -- Updated
(3, 10, '2022-01-01'), -- Start
(3, 10, '2022-01-02'), -- No change
(3, 20, '2022-01-03'), -- Updated
(3, 20, '2022-01-04'), -- No change
(3, 20, '2022-01-05'), -- No change
(3, 10, '2022-01-06'), -- Updated
(3, 10, '2022-01-07'); -- No change
WITH CTE AS (
SELECT *, PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date)
FROM @Data
)
SELECT ID, Points, MinDate = Date,
MaxDate = DATEADD(day, -1, (LEAD(Date) OVER (PARTITION BY Id ORDER BY Date)))
FROM CTE
WHERE (PriorPoints <> Points OR PriorPoints IS NULL) -- Exclude unchanged
ORDER BY Id, Date
Results:
ID | Points | MinDate    | MaxDate
1  | 10     | 2018-01-01 | 2019-01-01
1  | 20     | 2019-01-02 | 2020-01-02
1  | 25     | 2020-01-03 | 2021-01-03
1  | 10     | 2021-01-04 | null
2  | 10     | 2022-01-01 | 2022-01-31
2  | 20     | 2022-02-01 | 2022-05-31
2  | 25     | 2022-06-01 | 2022-07-31
2  | 20     | 2022-08-01 | 2022-09-07
2  | 25     | 2022-09-08 | 2022-10-08
2  | 10     | 2022-10-09 | null
3  | 10     | 2022-01-01 | 2022-01-02
3  | 20     | 2022-01-03 | 2022-01-05
3  | 10     | 2022-01-06 | null
db<>fiddle
For the last value for a given ID, the calculated MaxDate is NULL indicating no upper bound to the date range. If you really want MaxDate = MinDate for this case, you can add ISNULL( ..., Date).
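A sketch of that ISNULL wrapping, applied to the final select of the query above (same CTE and column names assumed):

```sql
SELECT ID, Points, MinDate = Date,
       MaxDate = ISNULL(
           DATEADD(day, -1, LEAD(Date) OVER (PARTITION BY Id ORDER BY Date)),
           Date)
FROM CTE
WHERE (PriorPoints <> Points OR PriorPoints IS NULL)
ORDER BY Id, Date
```

With this, the last row per ID gets MaxDate = MinDate instead of NULL.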
(I am adding this as an alternative (and simpler) interpretation of the OP's question.)
Problem restatement: given a collection of IDs, Dates, and Points values, a group is defined as any consecutive sequence of the same Points value for a given ID across ascending dates. For each such group, calculate the min and max dates.
The start of such a group can be identified as a row where the Points value changes from the preceding value, or if there is no preceding value for a given ID. If we first tag such rows (NewGroup = 1), we can then assign group numbers based on a count of preceding tagged rows (including the current row). Once we have assigned group numbers, it is then a simple matter to apply a group and aggregate operation.
Below is a sample that includes some additional test data to show multiple IDs and repeating values.
DECLARE @Data TABLE (Id INT, Points INT, Date DATE)
INSERT @Data
VALUES
(1, 10, '2018-01-01'), -- Start
(1, 20, '2019-01-02'), -- Updated
(1, 25, '2020-01-03'), -- Updated
(1, 10, '2021-01-04'), -- Updated
(2, 10, '2022-01-01'), -- Start
(2, 20, '2022-02-01'), -- Updated
(2, 20, '2022-03-01'), -- No change
(2, 20, '2022-04-01'), -- No change
(2, 20, '2022-05-01'), -- No change
(2, 25, '2022-06-01'), -- Updated
(2, 25, '2022-07-01'), -- No change
(2, 20, '2022-08-01'), -- Updated
(2, 25, '2022-09-08'), -- Updated
(2, 10, '2022-10-09'), -- Updated
(3, 10, '2022-01-01'), -- Start
(3, 10, '2022-01-02'), -- No change
(3, 20, '2022-01-03'), -- Updated
(3, 20, '2022-01-04'), -- No change
(3, 20, '2022-01-05'), -- No change
(3, 10, '2022-01-06'), -- Updated
(3, 10, '2022-01-07'); -- No change
WITH CTE AS (
SELECT *,
PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date)
FROM @Data
)
, CTE2 AS (
SELECT *,
NewGroup = CASE WHEN (PriorPoints <> Points OR PriorPoints IS NULL)
THEN 1 ELSE 0 END
FROM CTE
)
, CTE3 AS (
SELECT *, GroupNo = SUM(NewGroup) OVER(
PARTITION BY ID
ORDER BY Date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM CTE2
)
SELECT Id, Points, MinDate = MIN(Date), MaxDate = MAX(Date)
FROM CTE3
GROUP BY Id, GroupNo, Points
ORDER BY Id, GroupNo
Results:
Id | Points | MinDate    | MaxDate
1  | 10     | 2018-01-01 | 2018-01-01
1  | 20     | 2019-01-02 | 2019-01-02
1  | 25     | 2020-01-03 | 2020-01-03
1  | 10     | 2021-01-04 | 2021-01-04
2  | 10     | 2022-01-01 | 2022-01-01
2  | 20     | 2022-02-01 | 2022-05-01
2  | 25     | 2022-06-01 | 2022-07-01
2  | 20     | 2022-08-01 | 2022-08-01
2  | 25     | 2022-09-08 | 2022-09-08
2  | 10     | 2022-10-09 | 2022-10-09
3  | 10     | 2022-01-01 | 2022-01-02
3  | 20     | 2022-01-03 | 2022-01-05
3  | 10     | 2022-01-06 | 2022-01-07
To see the intermediate results, replace the final select with SELECT * FROM CTE3 ORDER BY Id, Date.
If you wish to treat gaps in dates as group criteria, add a PriorDate calculation to CTE and add OR Date <> PriorDate to the NewGroup condition.
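That variation might look like the sketch below, assuming a "gap" means the current date is not exactly one day after the prior date for the same ID (PriorDate is a column name introduced here, not from the original answer):

```sql
WITH CTE AS (
    SELECT *,
        PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date),
        PriorDate   = LAG(Date)   OVER (PARTITION BY Id ORDER BY Date)
    FROM @Data
)
, CTE2 AS (
    SELECT *,
        NewGroup = CASE WHEN PriorPoints <> Points
                          OR PriorPoints IS NULL
                          OR Date <> DATEADD(day, 1, PriorDate)  -- a date gap also starts a new group
                        THEN 1 ELSE 0 END
    FROM CTE
)
SELECT * FROM CTE2  -- CTE3 and the final aggregate continue unchanged from here
```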
db<>fiddle
Caution: In your original post, you state that "this is to create a view". Beware that if the above logic is included in a view, the entire result may be recalculated every time the view is accessed, regardless of any ID or date criteria applied. It might make more sense to use the above to populate and periodically refresh a historic roll-up data table for efficient access. Another alternative is to make a stored procedure with appropriate parameters that could filter that data before feeding it into the above.

How could I write a query to find items which contain at least 3 consecutive years of data from the latest year?

We have many samples in the database; these samples were produced in different years with the same id. I want to get the ids which have 3 consecutive years of data counting back from the latest year. For example, if for id 100 the latest date is 2019-09-01 and data also exists in 2018 and 2017, this id qualifies. How could I get them? I am using Postgres.
id date
1 2020-01-09
1 2019-02-03
1 2018-06-02
2 2021-01-03
2 2019-02-05
2 2018-03-09
3 2020-01-02
3 2019-03-01
4 2019-02-01
4 2019-02-04
4 2018-03-05
5 2015-02-03
6 2019-02-06
6 2018-05-06
My desired result
id
1
In a sub-select, use the lag function to look back to the prior row, and again to look back 2 rows. Then extract the year from the date and from the two lagged dates, and compare the extracted values as needed.
with test(id, tdate) as
(values (1, date '2020-01-09')
, (1, date '2019-02-03')
, (1, date '2018-06-02')
, (2, date '2021-01-03')
, (2, date '2019-02-05')
, (2, date '2018-03-09')
, (3, date '2020-01-02')
, (3, date '2019-03-01')
, (4, date '2019-02-01')
, (4, date '2019-02-04')
, (4, date '2018-03-05')
, (5, date '2015-02-03')
, (6, date '2019-02-06')
, (6, date '2018-05-06')
, (1, date '2021-01-22') --- added example row
)
select distinct id
from ( select id
, tdate
, lag(tdate,1) over(partition by id order by tdate) back1
, lag(tdate,2) over(partition by id order by tdate) back2
from test
) s
where extract(year from tdate) - 1 = extract(year from back1)
and extract(year from back1) - 1 = extract(year from back2);
Note: The query uses distinct to eliminate duplicate id selections, which would result from more than 3 consecutive years. To see this, remove distinct and look at the results including the line marked "added example row".
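As a sketch of an equivalent approach (not from the answer above): deduplicate to one row per id and year first, then a single lag(yr, 2) suffices, because within distinct sorted years yr - back2 = 2 forces the middle year to be present as well. This assumes the same `test` data set as above:

```sql
select distinct id
from (
    select id, yr,
           lag(yr, 2) over (partition by id order by yr) as back2
    from (select distinct id, extract(year from tdate) as yr from test) y
) s
where yr - back2 = 2;
```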

SQL Query for fetching 2 results as single row

I have a table like this. I want to get employee records showing their current Designation (the one whose EffectiveTo is null) and the date they FIRST joined as Trainee (min(EffectiveFrom) where Designation = 'Trainee').
+----+-------------------+---------------+-------------+
| ID | Designation | EffectiveFrom | EffectiveTo |
+----+-------------------+---------------+-------------+
| 1 | Trainee | 01/01/2000 | 31/12/2000 |
| 1 | Assistant Manager | 01/01/2001 | 31/12/2004 |
| 1 | Suspended | 01/01/2005 | 01/02/2005 |
| 1 | Trainee | 02/03/2005 | 31/03/2005 |
| 1 | Manager | 01/04/2005 | NULL |
| 2 | Trainee | 01/01/2014 | 31/12/2014 |
| 2 | Developer | 01/01/2015 | 31/12/2016 |
| 2 | Architect | 01/01/2017 | NULL |
+----+-------------------+---------------+-------------+
How to get result like this
+----+---------------------+---------------------+
| ID | Current Designation | Date First Employed |
+----+---------------------+---------------------+
| 1 | Manager | 01/01/2000 |
| 2 | Architect | 01/01/2014 |
+----+---------------------+---------------------+
The date of first employment could be located using CROSS APPLY and SELECT TOP(1)
CREATE TABLE #table1(
ID int,
Designation varchar(17),
EffectiveFrom datetime,
EffectiveTo varchar(10));
INSERT INTO #table1
(ID, Designation, EffectiveFrom, EffectiveTo)
VALUES
(1, 'Trainee', '2000-01-01 01:00:00', '31/12/2000'),
(1, 'Assistant Manager', '2001-01-01 01:00:00', '31/12/2004'),
(1, 'Suspended', '2005-01-01 01:00:00', '01/02/2005'),
(1, 'Trainee', '2005-02-03 01:00:00', '31/03/2005'),
(1, 'Manager', '2005-01-04 01:00:00', NULL),
(2, 'Trainee', '2014-01-01 01:00:00', '31/12/2014'),
(2, 'Developer', '2015-01-01 01:00:00', '31/12/2016'),
(2, 'Architect', '2017-01-01 01:00:00', NULL);
select t.ID, t.Designation [Current Designation],
ef.EffectiveFrom [Date First Employed]
from #table1 t
cross apply (select top(1) cast(tt.EffectiveFrom as date) EffectiveFrom
from #table1 tt
where t.ID=tt.ID
and Designation='Trainee'
order by tt.EffectiveFrom) ef
where t.EffectiveTo is null;
ID Current Designation Date First Employed
1 Manager 2000-01-01
2 Architect 2014-01-01
One method is conditional aggregation. It is a bit unclear how you define "current", but assuming this is associated with EffectiveTo being NULL:
select id,
max(case when EffectiveTo is null then designation end) as current_designation,
min(effectivefrom) as start_date
from t
group by id;
You can try the query below:
select id,
       max(current_designation) current_designation,
       min(date_first_employee) date_first_employee
from (
    select id,
           max(case when EffectiveTo is null then designation end) over (partition by id) as current_designation,
           (case when Designation = 'Trainee' then EffectiveFrom end) Date_First_Employee
    from desig
) t
group by id
This is another possible solution.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE table1
(`ID` int, `Designation` varchar(17), `EffectiveFrom` datetime, `EffectiveTo` varchar(10))
;
INSERT INTO table1
(`ID`, `Designation`, `EffectiveFrom`, `EffectiveTo`)
VALUES
(1, 'Trainee', '2000-01-01 01:00:00', '31/12/2000'),
(1, 'Assistant Manager', '2001-01-01 01:00:00', '31/12/2004'),
(1, 'Suspended', '2005-01-01 01:00:00', '01/02/2005'),
(1, 'Trainee', '2005-02-03 01:00:00', '31/03/2005'),
(1, 'Manager', '2005-01-04 01:00:00', NULL),
(2, 'Trainee', '2014-01-01 01:00:00', '31/12/2014'),
(2, 'Developer', '2015-01-01 01:00:00', '31/12/2016'),
(2, 'Architect', '2017-01-01 01:00:00', NULL)
;
Query 1:
SELECT
    ID,
    (SELECT `Designation`
     FROM table1
     WHERE `EffectiveFrom` = (SELECT MAX(`EffectiveFrom`)
                              FROM table1
                              WHERE ID = t1.ID)) AS `Current Designation`,
    DATE(MIN(`EffectiveFrom`)) AS `Date First Employed`
FROM table1 t1
GROUP BY ID
Results:
| ID | Current Designation | Date First Employed |
|----|---------------------|---------------------|
| 1 | Trainee | 2000-01-01 |
| 2 | Architect | 2014-01-01 |
It's actually a rather simple self-join, assuming that the EffectiveFrom and EffectiveTo columns are always filled in appropriately (i.e. there's only ever one NULL value for EffectiveTo for a given ID). Since it's possible for someone to be a Trainee twice, you also need a window function like ROW_NUMBER() to keep only the earliest Trainee EffectiveFrom date:
WITH CTE_Designations AS
(
SELECT T1.ID, T1.Designation AS CurrentDesignation, ISNULL(T2.EffectiveFrom, T1.EffectiveFrom) AS DateFirstEmployed -- If the join fails below then that means the earliest Designation is in T1 (e.g. that is the 'Trainee' record)
FROM DesignationsTable AS T1
LEFT JOIN DesignationsTable AS T2
ON T1.ID = T2.ID
AND T1.Designation <> T2.Designation
AND T2.Designation = 'Trainee'
WHERE T1.EffectiveTo IS NULL
),
CTE_Designations_FirstEmployedOnly AS
(
SELECT ID, CurrentDesignation, DateFirstEmployed, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DateFirstEmployed) AS SortId -- Generates a unique ID per DateFirstEmployed row for each Designation.ID sorted by DateFirstEmployed
FROM CTE_Designations
)
SELECT ID, CurrentDesignation, DateFirstEmployed
FROM CTE_Designations_FirstEmployedOnly
WHERE SortId = 1
This returns the result shown for the given data. It solves for the current Designation (whose EffectiveTo is null) and the date the employee FIRST joined. It also handles terminated employees where EffectiveTo is not NULL, as well as new employees with a single row. Replace "#t" with your table name.
SELECT a.Id, d.Designation, a.EffectiveFrom
FROM
(SELECT *, ROW_NUMBER() OVER ( PARTITION BY Id ORDER BY EffectiveFrom ASC ) r FROM #t) a
INNER JOIN
(SELECT *, ROW_NUMBER() OVER ( PARTITION BY Id ORDER BY EffectiveFrom DESC ) r FROM #t) d
ON a.Id = d.Id
WHERE a.r = 1 AND d.r = 1
Result:
Id Designation EffectiveFrom
1 Manager 2000-01-01
2 Architect 2014-01-01

PostgreSQL: count rows where condition on start date and end date fits in generated time series

I have data organised like this
CREATE TABLE sandbox.tab_1 (id serial, started timestamp, ended timestamp);
INSERT INTO sandbox.tab_1 (id, started, ended) VALUES
(1, '2020-01-03'::timestamp, NULL),
(2, '2020-01-05'::timestamp, '2020-01-06'),
(3, '2020-01-07'::timestamp, NULL),
(4, '2020-01-08'::timestamp, NULL);
For each day of a generated time series running from min(started) to max(started), I need to count the rows where started is on or before that day and ended is after it (or NULL). This would give me, for each day, the stock of started and not-yet-ended ids at that time. The result would be something like this:
Thank you for your help
You can LEFT JOIN the table to the series of timestamps, on the start being less than or equal to the timestamp and the end being greater than the timestamp or NULL. Then GROUP BY the timestamps and take the count().
SELECT gs.ts,
count(t1.started)
FROM generate_series('2020-01-03'::timestamp, '2020-01-08'::timestamp, '1 day'::interval) gs (ts)
LEFT JOIN tab_1 t1
ON t1.started <= gs.ts
AND (t1.ended IS NULL
OR t1.ended > gs.ts)
GROUP BY gs.ts
ORDER BY gs.ts;
db<>fiddle
Here is one option with generate_series() and union all:
select ts, sum(sum(cnt)) over (order by ts) stock
from (
    select generate_series(min(started), max(started), interval '1' day) ts, 0 cnt from tab_1
    union all select started, 1 from tab_1
    union all select ended, -1 from tab_1 where ended is not null
) t
group by ts
order by ts
Demo on DB Fiddle:
ts | stock
:------------------ | ----:
2020-01-03 00:00:00 | 1
2020-01-04 00:00:00 | 1
2020-01-05 00:00:00 | 2
2020-01-06 00:00:00 | 1
2020-01-07 00:00:00 | 2
2020-01-08 00:00:00 | 3

Restore values in a time series table

I need to look up the most recent ("actual") value from a time series table for another property at a specific time.
Let's say that we have a table like this (I use SQL Server 2016; this is pseudo-code, I did not check whether it works):
use sample
go
-- create time series table
drop table if exists dbo.PropertyHistory
go
create table dbo.PropertyHistory (
Id int
, Timestamp datetime
, Value int
)
go
-- fill dbo.PropertyHistory
insert into
dbo.PropertyHistory(Id, Timestamp, Value)
values
(1, '2019-01-01 12:00:00', 10)
, (1, '2019-01-01 13:00:00', 20)
, (2, '2019-01-01 13:00:00', 15)
, (3, '2019-01-01 14:00:00', 1)
, (4, '2019-01-01 15:00:00', 10)
, (1, '2019-01-01 16:00:00', 6)
, (4, '2019-01-01 17:00:00', 5)
, (2, '2019-01-01 17:00:00', 50)
, (2, '2019-01-01 19:00:00', 7)
, (1, '2019-01-01 19:00:00', 44)
go
I need, for example, each row with property id = 1 to carry the last actual value (actual by datetime, of course) of property id = 2:
| Id | Timestamp | Value | Property2Value |
-------------------------------------------------------
| 1 | 2019-01-01 12:00:00 | 10 | NULL |
| 1 | 2019-01-01 13:00:00 | 20 | 15 |
| 1 | 2019-01-01 16:00:00 | 6 | 15 |
| 1 | 2019-01-01 19:00:00 | 44 | 7 |
-------------------------------------------------------
The ideas:
Create a function, something like create function A (@propertyId int, @toDateTime datetime), which finds the latest row for the specified property restricted by the datetime, and then cross apply each row with property id = 1 to this function. The performance is bad.
I think it is possible to somehow use a cumulative sum, something like sum(case when PropertyId = 2 then Value else 0 end) over (order by Timestamp), but it will accumulate more and more...
So, please help me to obtain the expected result.
If I understand correctly, this is a good use of apply:
select ph1.*, ph2.value as value2
from propertyhistory ph1 outer apply
(select top (1) ph2.*
from propertyhistory ph2
where ph2.id = 2 and ph2.timestamp <= ph1.timestamp
order by ph2.timestamp desc
) ph2
where ph1.id = 1;
Here is a db<>fiddle.
You can also do this with window functions, with the following logic:
For each row in the original data, get the most recent "2" timestamp.
Get the value for the "2" timestamp.
Filter down to just the "1"s
This looks like:
select ph.*
from (select ph.*,
max(case when ph.id = 2 then ph.value end) over (partition by timestamp_2) as value_2
from (select ph.*,
max(case when ph.id = 2 then ph.timestamp end) over (order by ph.timestamp) as timestamp_2
from propertyhistory ph
) ph
) ph
where id = 1;
We can handle this requirement by a judicious use of ROW_NUMBER, combined with some pivoting logic:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Timestamp DESC) rn
FROM dbo.PropertyHistory
)
SELECT
1 AS Id,
MAX(CASE WHEN Id = 1 THEN Timestamp END) AS Timestamp,
MAX(CASE WHEN Id = 1 THEN Value END) AS Value,
MAX(CASE WHEN Id = 2 THEN Value END) AS Property2Value
FROM cte
GROUP BY
rn
ORDER BY
MAX(CASE WHEN Id = 1 THEN Timestamp END);
Demo
The idea here is to compute a row number label for each record, numbered separately for each Id value. Then, we can aggregate by the row number, which brings the Id values from 1 and 2 into line, in a single record.