Why can't I have a rank index without duplicate values - indexing

I created a rank with this code:
RANK = RANKX(
    FILTER(
        'Table',
        'Table'[ID] = EARLIER('Table'[ID]) &&
        'Table'[Date Time] < EARLIER('Table'[Date Time])
    ),
    'Table'[Date Time], , ASC, Skip
)
This mostly worked and followed the proper order, but it gave me 1, 1, 1, 4 instead of 1, 2, 3, 4: the date and time are identical for the first three rows. Next I added a column with random values:
SupportingColumn = RANDBETWEEN(1,COUNTROWS('Table'))
I was hoping I could keep the same grouping but sort by this index, since it doesn't matter which of the tied rows comes first. However, it completely scrambled the positions:
RANK = RANKX(
    FILTER(
        'Table',
        'Table'[ID] = EARLIER('Table'[ID]) &&
        'Table'[Date Time] < EARLIER('Table'[Date Time])
    ),
    'Table'[SupportingColumn], , ASC, Skip
)
This is the same code as before, but now the rank values end up all over the place. I'm not sure how to do this.

Instead of sorting by 'Table'[Date Time] directly, sort by this calculated column:
Sort Column = DATEDIFF ( DATE ( 1970, 1, 1 ), 'Table'[Date Time], SECOND ) + RAND() / 10
Basically this converts 'Table'[Date Time] into Unix seconds and then adds a random number between 0 and 0.1. The jitter only breaks ties: rows with different timestamps differ by at least one whole second, so their relative order is unchanged.
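With that column in place, the original formula can rank it instead of [Date Time]; a minimal sketch, assuming the goal is simply a 1..n sequence per ID (the [Date Time] comparison in the filter is no longer needed, since [Sort Column] is effectively unique):
RANK = RANKX(
    FILTER(
        'Table',
        'Table'[ID] = EARLIER('Table'[ID])
    ),
    'Table'[Sort Column], , ASC, Skip
)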

Related

Adding minutes of runtime from on/off records during a time period

I have a SQL database that collects temperature and sensor data from the barn.
The table definition is:
CREATE TABLE [dbo].[DataPoints]
(
[timestamp] [datetime] NOT NULL,
[pointname] [nvarchar](50) NOT NULL,
[pointvalue] [float] NOT NULL
)
The sensors report outside temperature (degrees), inside temperature (degrees), and heating (as on/off).
A sensor creates a record whenever its reading changes, so temperature records are generated every few minutes, with one record for heat coming ON, one for heat going OFF, and so on.
I'm interested in how many minutes of heat has been used overnight, so a 24-hour period from 6 AM yesterday to 6 AM today would work fine.
This query:
SELECT *
FROM [home_network].[dbo].[DataPoints]
WHERE (pointname = 'Heaters')
AND (timestamp BETWEEN '2022-12-18 06:00:00' AND '2022-12-19 06:00:00')
ORDER BY timestamp
returns this data:
2022-12-19 02:00:20 | Heaters | 1
2022-12-19 02:22:22 | Heaters | 0
2022-12-19 03:43:28 | Heaters | 1
2022-12-19 04:25:31 | Heaters | 0
The end result should be 22 minutes + 42 minutes = 64 minutes of heat, but I can't see how to get this result from a single query. It also just happens that this result set has two complete heat on/off cycles, but that will not always be the case. If the first heat record is 0, the heat was already on at 6 AM, but the start time won't show in the query. The same idea applies if the last heat record is 1 at, say, 05:15, which means 45 minutes have to be added to the total.
Is it possible to get this minutes-of-heat-time result with a single query? Actually, I don't know the right approach, and it doesn't matter if I have to run several queries. If needed, I could use a small app that reads the raw data, and applies logic outside of SQL to arrive at the total. But I'd prefer to be able to do this within SQL.
This isn't a complete answer, but it should help you get started. From the SQL in the post, I'm assuming you're using SQL Server, and I've formatted the code to match. Replace @input with your query above if you want to test on your own data (SELECT * FROM [home_network].[dbo]...).
--generate dummy table with sample output from question
declare @input as table(
    [timestamp] [datetime] NOT NULL,
    [pointname] [nvarchar](50) NOT NULL,
    [pointvalue] [float] NOT NULL
)
insert into @input values
('2022-12-19 02:00:20','Heaters',1),
('2022-12-19 02:22:22','Heaters',0),
('2022-12-19 03:43:28','Heaters',1),
('2022-12-19 04:25:31','Heaters',0);
--Append a row number to the result, in chronological order
WITH A as (
    SELECT *,
        ROW_NUMBER() OVER(ORDER BY [timestamp]) as row_count
    from @input)
--Self-join the table using the row number as a guide
SELECT sum(datediff(MINUTE, startTimes.[timestamp], endTimes.[timestamp]))
from A as startTimes
LEFT JOIN A as endTimes on startTimes.row_count = endTimes.row_count - 1
--Only keep periods of time where the heater is turned on at the start
WHERE startTimes.row_count % 2 = 1
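To cover the boundary cases raised in the question (heat already on at the window start, or still on at the end), one option is to seed sentinel rows at the window edges before numbering. This is a sketch against the same @input table; the 06:00 timestamps are the question's window bounds:
-- If the first record in the window is an OFF (0), the heater was already
-- running at 06:00, so add a synthetic ON row at the window start.
insert into @input
select '2022-12-18 06:00:00', 'Heaters', 1
where (select top 1 pointvalue from @input order by [timestamp] asc) = 0;
-- If the last record is an ON (1), the heater ran until the window end,
-- so add a synthetic OFF row at 06:00 the next day.
insert into @input
select '2022-12-19 06:00:00', 'Heaters', 0
where (select top 1 pointvalue from @input order by [timestamp] desc) = 1;
With the edges normalized this way, the row-pairing query above applies unchanged.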
Your problem can be divided into two steps, plus an optional third:
Step 1: Filter by sensor type and date range, and get each record's time span by calculating the date difference between its timestamp and the next one in chronological order.
Step 2: Filter the records with ON status and sum up the durations.
Step 3 (optional): Convert the total to HH:MM:SS format for display.
Here's my take on the problem, with comments on what I do in each step, all combined into one single query.
-- Step 3: Convert output to HH:MM:SS; this is just for show and can be removed
SELECT STUFF(CONVERT(VARCHAR(8), DATEADD(SECOND, total_duration, 0), 108),
       1, 2, CAST(FLOOR(total_duration / 3600) AS VARCHAR(5)))
FROM (
    -- Step 2: select records with status ON (1) and aggregate total duration in seconds
    SELECT sum(duration) as total_duration
    FROM (
        -- Step 1: Use LEAD to get the next adjacent timestamp and calculate the date
        -- difference (time span) between the current record and the next one in time order
        SELECT TOP 100 PERCENT
            DATEDIFF(SECOND, timestamp, LEAD(timestamp, 1, '2022-12-19 06:00:00') OVER (ORDER BY timestamp)) as duration,
            pointvalue
        FROM [dbo].[DataPoints]
        -- filtered by sensor name and time range
        WHERE pointname = 'Heaters'
        AND (timestamp BETWEEN '2022-12-18 06:00:00' AND '2022-12-19 06:00:00')
        ORDER BY timestamp ASC
    ) AS tmp
    WHERE tmp.pointvalue = 1
) as tmp2
Note: as the last record has no next adjacent timestamp, LEAD's third argument fills in the end of the inspection window (in this case, 6 AM of the next day).
I do not really think this is achievable within a single query.
Option 1:
Implement a stored procedure where you can put the logic for calculating these periods.
Option 2:
Add a new column (duration); on inserting a new record, calculate the difference between NOW and the previous timestamp and update the previous record's duration. A sketch of this option follows below.
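A minimal sketch of option 2, assuming a new nullable column [duration_minutes] is added to the table from the question (the column and trigger names are hypothetical):
ALTER TABLE [dbo].[DataPoints] ADD [duration_minutes] int NULL;
GO
CREATE TRIGGER trg_DataPoints_SetDuration
ON [dbo].[DataPoints]
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- For each newly inserted reading, stamp the previous record of the
    -- same point with the minutes that elapsed until this reading arrived.
    UPDATE d
    SET d.duration_minutes = DATEDIFF(MINUTE, d.[timestamp], i.[timestamp])
    FROM [dbo].[DataPoints] AS d
    JOIN inserted AS i
        ON d.pointname = i.pointname
    WHERE d.[timestamp] = (SELECT MAX(d2.[timestamp])
                           FROM [dbo].[DataPoints] AS d2
                           WHERE d2.pointname = i.pointname
                             AND d2.[timestamp] < i.[timestamp]);
END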

Data aggregation by sliding time periods

[Query and question edited and fixed thanks to comments from @Gordon Linoff and @shawnt00]
I recently inherited a SQL query that calculates the number of certain events in 30-day time windows from a log database. It uses a CTE (Common Table Expression) to generate 30-day ranges from '2019-01-01' to now, and then counts the cases in those 30/60/90-day intervals. I am not sure this is the best method; all I know is that it takes a long time to run and I do not fully understand how it works. So I am trying to rebuild it in an efficient way (maybe what it does now is already the most efficient way, I do not know).
I have several questions:
One of the things I notice is that instead of using DATEDIFF, the query simply subtracts a number of days from the date. Is that good practice at all?
Is there a better way of doing the time comparisons?
Is there a better way to do the whole thing? The bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Note: LogDate original format is like 2019-04-01 18:30:12.000.
DECLARE @dt1 Datetime = '2019-01-01'
DECLARE @dt2 Datetime = getDate();
WITH ctedaterange
AS (SELECT [Dates] = @dt1
    UNION ALL
    SELECT [dates] + 30
    FROM ctedaterange
    WHERE [dates] + 30 <= @dt2)
SELECT
    [dates],
    lt.Activity, COUNT(*) as Total,
    SUM(CASE WHEN lt.LogDate <= [dates] and lt.LogDate > [dates] - 90 THEN 1 ELSE 0 END) AS Activity90days,
    SUM(CASE WHEN lt.LogDate <= [dates] and lt.LogDate > [dates] - 60 THEN 1 ELSE 0 END) AS Activity60days,
    SUM(CASE WHEN lt.LogDate <= [dates] and lt.LogDate > [dates] - 30 THEN 1 ELSE 0 END) AS Activity30days
FROM ctedaterange AS cte
JOIN (SELECT Activity, CONVERT(DATE, LogDate) as LogDate FROM LogTable) AS lt
    ON cte.[dates] = lt.LogDate
GROUP BY [dates], lt.Activity
OPTION (maxrecursion 0)
Sample dataset (LogTable):
LogDate, Activity
2020-02-25 01:10:10.000, Activity01
2020-04-14 01:12:10.000, Activity02
2020-08-18 02:03:53.000, Activity02
2019-10-29 12:25:55.000, Activity01
2019-12-24 18:11:11.000, Activity03
2019-04-02 03:33:09.000, Activity01
Expected Output (the output does not reflect the data shown above, as I would need too many sample rows to show in this post):
As I said above, the bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Activity, Activity90days, Activity60days, Activity30days
Activity01, 3, 0, 1
Activity02, 1, 10, 2
Activity03, 5, 1, 3
Thank you for any suggestion.
SQL Server doesn't yet have the option to range over values in the window frame of an analytic function. Since you've generated all possible dates, though, and you already have the counts by date, it's very easy to look back a specific number of (aggregated) rows to get the right totals. Here is my suggested expression for 90 days:
sum(count(LogDate)) over (
    partition by Activity
    order by [dates]
    rows between 89 preceding and current row
)
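A sketch of how that expression could slot into a rewritten query, assuming the recursive CTE is changed to step one day at a time instead of 30. One caveat: the 29/59/89-row lookbacks only equal 30/60/90 days if every Activity has a row for every date, since dates with no rows for an Activity silently shorten its window:
DECLARE @dt1 Datetime = '2019-01-01'
DECLARE @dt2 Datetime = getDate();
WITH ctedaterange
AS (SELECT [dates] = @dt1
    UNION ALL
    SELECT [dates] + 1
    FROM ctedaterange
    WHERE [dates] + 1 <= @dt2)
SELECT
    [dates],
    lt.Activity,
    SUM(COUNT(lt.LogDate)) OVER (PARTITION BY lt.Activity ORDER BY [dates]
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS Activity30days,
    SUM(COUNT(lt.LogDate)) OVER (PARTITION BY lt.Activity ORDER BY [dates]
        ROWS BETWEEN 59 PRECEDING AND CURRENT ROW) AS Activity60days,
    SUM(COUNT(lt.LogDate)) OVER (PARTITION BY lt.Activity ORDER BY [dates]
        ROWS BETWEEN 89 PRECEDING AND CURRENT ROW) AS Activity90days
FROM ctedaterange AS cte
JOIN (SELECT Activity, CONVERT(DATE, LogDate) as LogDate FROM LogTable) AS lt
    ON cte.[dates] = lt.LogDate
GROUP BY [dates], lt.Activity
OPTION (maxrecursion 0)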

sql query to calculate the difference between two dates of different columns and adjacent rows

Below is my table. I want to populate the break column by calculating the difference between two dates.
I want to calculate the time between EndD/EndT of one row and StartD/StartT of the next row.
For example, if EndD/EndT is '2016-06-01 18:17:48' and the next row's StartD/StartT is '2016-06-01 18:46:05', the break time should be calculated between these two datetimes.
Output should be like this:
Try this:
SELECT ID, AgentID, StartD, StartT, EndD, EndT,
    DATEDIFF(minute,
        CAST(t1.EndD + ' ' + t1.EndT AS DATETIME),
        (SELECT CAST(t2.StartD + ' ' + t2.StartT AS DATETIME)
         FROM yourTable t2
         WHERE t2.ID = t1.ID + 1 AND t1.AgentID = t2.AgentID)) as [break]
FROM yourTable t1
The DATEDIFF function takes the minutes between the current row's end date and time and an embedded query (which takes the start date and time from the next row with the same agent). Also, be sure to use brackets around your column name "break", since it is a SQL Server reserved word.
UPDATE:
User reported negative values from my answer. This should be resolved by switching the two date parameters in the DATEDIFF function. I have updated the code (above) to reflect this.
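For comparison, on SQL Server 2012 or later the same next-row lookup can be written with LEAD instead of a correlated subquery (a sketch against the same hypothetical yourTable):
SELECT ID, AgentID, StartD, StartT, EndD, EndT,
    DATEDIFF(minute,
        CAST(EndD + ' ' + EndT AS DATETIME),
        LEAD(CAST(StartD + ' ' + StartT AS DATETIME))
            OVER (PARTITION BY AgentID ORDER BY ID)) as [break]
FROM yourTable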
For T-SQL, this would work:
update a
set a.break = datediff(minute, a.endt, b.startt)
from tbl a
inner join tbl b
    on a.startD = b.endD and a.ID = b.ID - 1
SELECT DATEDIFF(minute, '2016-06-01 20:17:48', '2016-06-01 20:50:17')
Note that you should adapt this to the ordering you want, always pairing a row's end time with the start time of the following row (ID + 1).

Find closest date in SQL Server

I have a table dbo.X with DateTime column Y which may have hundreds of records.
My stored procedure has a parameter @CurrentDate; I want to find the date in column Y of table dbo.X above that is less than and closest to @CurrentDate.
How to find it?
The WHERE clause matches all rows with a date less than @CurrentDate and, since they are ordered descending, TOP 1 returns the closest such date to the current date.
SELECT TOP 1 *
FROM x
WHERE x.date < @CurrentDate
ORDER BY x.date DESC
Use DATEDIFF and order your result by how many days or seconds lie between each date and the input. Something like this:
select top 1 rowId, dateCol, datediff(second, dateCol, @CurrentDate) as SecondsBetweenDates
from myTable
where dateCol < @CurrentDate
order by datediff(second, dateCol, @CurrentDate)
I think I have a better solution for this problem.
I will show a few images to support and explain the final solution.
Background
In my solution I have a table of FX rates. These represent market rates for different currencies. However, our service provider has had a problem with the rate feed, and as a result some rates have zero values. I want to fill in the missing data with the rate for the same currency that is closest in time to the missing rate. Basically, I want to get the RateId of the nearest non-zero rate, which I will then substitute. (The substitution itself is not shown in this example.)
1) To start off, let's identify the missing rates:
Query showing the missing rates, i.e. those with a rate value of zero
2) Next, let's identify the rates that are not missing.
Query showing rates that are not missing
3) This query is where the magic happens. I have made an assumption here which can be removed, but it was added to improve the efficiency/performance of the query: the assumption on line 26 (the cast of importDate to date in the join) is that a substitute transaction will be found on the same day as the missing/zero transaction.
The magic happens on line 23: the ROW_NUMBER function numbers rows starting at 1 for the shortest time difference between the missing and non-missing transactions; the next closest transaction gets a RowNum of 2, and so on.
Please note that on line 25 I must join on currency so that I do not mismatch the currency types. That is, I don't want to substitute an AUD rate with CHF values; I want the closest match within the same currency.
Combining the two data sets with a row_number to identify nearest transaction
4) Finally, let's get the data where RowNum is 1.
The final query
The full query is as follows:
;with cte_zero_rates as
(
    select *
    from fxrates
    where (spot_exp = 0 or spot_imp = 0)
),
cte_non_zero_rates as
(
    select *
    from fxrates
    where (spot_exp > 0 and spot_imp > 0)
),
cte_Nearest_Transaction as
(
    select z.FXRatesID as Zero_FXRatesID
          ,z.importDate as Zero_importDate
          ,z.currency as Zero_Currency
          ,nz.currency as NonZero_Currency
          ,nz.FXRatesID as NonZero_FXRatesID
          ,nz.spot_imp
          ,nz.importDate as NonZero_importDate
          ,DATEDIFF(ss, z.importDate, nz.importDate) as TimeDifference
          ,ROW_NUMBER() over(partition by z.FXRatesID order by abs(DATEDIFF(ss, z.importDate, nz.importDate)) asc) as RowNum
    from cte_zero_rates z
    left join cte_non_zero_rates nz on nz.currency = z.currency
        and cast(nz.importDate as date) = cast(z.importDate as date)
    --order by z.currency desc, z.importDate desc
)
select n.Zero_FXRatesID
      ,n.Zero_Currency
      ,n.Zero_importDate
      ,n.NonZero_importDate
      ,DATEDIFF(s, n.NonZero_importDate, n.Zero_importDate) as Delay_In_Seconds
      ,n.NonZero_Currency
      ,n.NonZero_FXRatesID
from cte_Nearest_Transaction n
where n.RowNum = 1
and n.NonZero_FXRatesID is not null
order by n.Zero_Currency, n.NonZero_importDate

Looking for SQL count performance improvements.

I'm refactoring some older SQL, which is struggling after 4 years and 1.7m rows of data. Is there a way to improve the following MS SQL Query:
SELECT ServiceGetDayRange_1.[Display Start Date],
    SUM(CASE WHEN Calls.line_date BETWEEN [Start Date] AND [End Date] THEN 1 ELSE 0 END) AS PerDayCount
FROM dbo.ServiceGetDayRange(GETUTCDATE(), 30, @standardBias, @daylightBias,
    @DST_startMonth, @DST_endMonth, @DST_startWeek, @DST_endWeek,
    @DST_startHour, @DST_endHour, @DST_startDayNumber, @DST_endDayNumber) AS ServiceGetDayRange_1
CROSS JOIN
    (select [line_date] from dbo.l_log where dbo.l_log.line_date > dateadd(day, -31, GETUTCDATE())) as Calls
GROUP BY ServiceGetDayRange_1.[Display Start Date], ServiceGetDayRange_1.[Display End Date]
ORDER BY [Display Start Date]
It counts log entries over the previous 30 days (the ServiceGetDayRange function returns a table detailing the ranges, TZ aligned) for plotting on a chart. Useless information, but I'm not the client.
The execution plan states 99% of the execution time is spent counting the entries, as you would expect. There is very little overhead in working out the TZ offsets (remember, at most 30 rows).
Stupidly I thought 'ah, indexed view', but then realised I can't bind an indexed view to a function.
Current execution time is 6.25 seconds. Any improvement on that gets +rep.
Thanks in advance.
Is it faster if you turn the CASE into a WHERE?
SELECT ServiceGetDayRange_1.[Display Start Date], COUNT(*) AS PerDayCount
FROM dbo.ServiceGetDayRange(GETUTCDATE(), 30, @standardBias, @daylightBias,
    @DST_startMonth, @DST_endMonth, @DST_startWeek, @DST_endWeek,
    @DST_startHour, @DST_endHour, @DST_startDayNumber, @DST_endDayNumber) AS ServiceGetDayRange_1
CROSS JOIN
    (select [line_date] from dbo.l_log where dbo.l_log.line_date > dateadd(day, -31, GETUTCDATE())) as Calls
WHERE Calls.line_date BETWEEN [Start Date] AND [End Date]
GROUP BY ServiceGetDayRange_1.[Display Start Date], ServiceGetDayRange_1.[Display End Date]
ORDER BY [Display Start Date]
6.25 seconds for nearly 2m rows is pretty good. Maybe try a COUNT of valid rows (your 1/0 conditional should allow that) as opposed to a SUM of values; I think that's more efficient in Oracle environments. A sketch follows below.
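For what it's worth, here is a minimal sketch of that suggestion applied to the original query: COUNT ignores NULLs, so dropping the ELSE branch turns the conditional SUM into a conditional count (the table-valued function and parameters are as in the question):
SELECT ServiceGetDayRange_1.[Display Start Date],
    COUNT(CASE WHEN Calls.line_date BETWEEN [Start Date] AND [End Date] THEN 1 END) AS PerDayCount
FROM dbo.ServiceGetDayRange(GETUTCDATE(), 30, @standardBias, @daylightBias,
    @DST_startMonth, @DST_endMonth, @DST_startWeek, @DST_endWeek,
    @DST_startHour, @DST_endHour, @DST_startDayNumber, @DST_endDayNumber) AS ServiceGetDayRange_1
CROSS JOIN
    (select [line_date] from dbo.l_log where dbo.l_log.line_date > dateadd(day, -31, GETUTCDATE())) as Calls
GROUP BY ServiceGetDayRange_1.[Display Start Date], ServiceGetDayRange_1.[Display End Date]
ORDER BY [Display Start Date]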