Count unique entries based on previous groups - sql

I'm trying to find the unique values in each group, but with a look-back at the previously grouped items. The data is grouped by time, so if a value already appeared in a previous time block, it should not be counted again in a later one. The look-back should span all previous time blocks: at time 2 it looks at times 0 and 1, while at time 10 it looks back at times 0 to 9.
I also want to do this dynamically, without manually offsetting each time block with a subquery, since time here is continuous rather than a discrete set.
Sample data:
2018-03-25 00:00:00.000, 123
2018-03-25 00:00:00.000, 231
2018-03-26 00:00:00.000, 234
2018-03-26 00:00:00.000, 123
2018-03-27 00:00:00.000, 123
2018-03-27 00:00:00.000, 231
2018-03-27 00:00:00.000, 234
2018-03-27 00:00:00.000, 432
Sample output:
2018-03-25 00:00:00.000, 2
2018-03-26 00:00:00.000, 1
2018-03-27 00:00:00.000, 1

If I understand you correctly, a value should be excluded from the result set if it exists in any earlier group.
I think this kind of approach should help:
select grouped.t, count(*)
from (
    select distinct base.t, base.v
    from foo as base
    where base.v not in (
        select u.v from foo as u where u.t < base.t
    )
) as grouped
group by grouped.t;
Hope this helps. Here's also a fiddle: http://sqlfiddle.com/#!18/4a65e/1
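For anyone who wants to try the idea locally, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server, loaded with the sample rows from the question (the names foo, t and v follow the answer). It swaps NOT IN for NOT EXISTS, because NOT IN returns no rows at all if the subquery ever yields a NULL. It also shows an alternative that is not from the answer: find each value's first appearance with MIN(t), then count values per first-appearance time.

```python
import sqlite3

# Sample rows from the question: (time block, value).
ROWS = [
    ("2018-03-25 00:00:00.000", 123),
    ("2018-03-25 00:00:00.000", 231),
    ("2018-03-26 00:00:00.000", 234),
    ("2018-03-26 00:00:00.000", 123),
    ("2018-03-27 00:00:00.000", 123),
    ("2018-03-27 00:00:00.000", 231),
    ("2018-03-27 00:00:00.000", 234),
    ("2018-03-27 00:00:00.000", 432),
]

# Correlated-subquery version: keep a value only if no earlier block has it.
# ISO text timestamps compare correctly as strings in SQLite.
NOT_EXISTS_SQL = """
    SELECT grouped.t, COUNT(*)
    FROM (
        SELECT DISTINCT base.t, base.v
        FROM foo AS base
        WHERE NOT EXISTS (
            SELECT 1 FROM foo AS u
            WHERE u.t < base.t AND u.v = base.v
        )
    ) AS grouped
    GROUP BY grouped.t
    ORDER BY grouped.t
"""

# Alternative: find each value's first block, then count values per block.
FIRST_SEEN_SQL = """
    SELECT first_t, COUNT(*)
    FROM (SELECT v, MIN(t) AS first_t FROM foo GROUP BY v) AS firsts
    GROUP BY first_t
    ORDER BY first_t
"""

def run(sql, rows=ROWS):
    """Load rows into an in-memory foo table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE foo (t TEXT, v INTEGER)")
    con.executemany("INSERT INTO foo VALUES (?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result

print(run(NOT_EXISTS_SQL))
```

Both queries skip a time block that contributes no new values; join against the list of distinct blocks if you need explicit zero rows.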

Related

Access Query: Match Two FKs, Select Record with Max (Latest) Time, Return 3rd Field From Record

I have an Access table (Logs) like this:
pk  modID  relID  DateTime        TxType
1   1234   22.3   10/1/22 04:00   1
2   1234   23.1   10/10/22 06:00  1
3   1234   23.1   10/11/22 07:00  2
4   1234   23.1   10/12/22 08:00  3
5   4321   22.3   10/2/22 06:00   7
6   4321   23.1   10/10/22 06:00  1
7   4321   23.1   10/11/22 07:30  3
I'm trying to write a query, as part of a function, that searches this table and:
1. finds all records matching a given modID and relID (e.g. 1234 and 23.1),
2. picks the most recent one (the MAX of DateTime), and
3. returns the TxType for that record.
However, I'm a bit new to Access and its query structure is vexing me. I landed on the query below, but because I have to include a Total/Aggregate function for TxType, I had to choose either Group By (not what I want) or Last (closer, but it returns junk results). The SQL for my query is currently:
SELECT Last(Logs.TxType) AS LastOfTxType, Max(Logs.DateTime) AS MaxOfDateTime
FROM Logs
GROUP BY Logs.dmID, Logs.relID
HAVING (((Logs.dmID)=[EnterdmID]) AND ((Logs.relID)=[EnterrelID]));
It returns a TxType value when I pass it the right parameters, but not from the correct record. I would like to get rid of the Last() call, but if I remove it, Access complains that TxType is not part of an aggregate function.
Can anyone point me in the right direction?
Have you tried:
SELECT TOP 1 Logs.TxType
FROM Logs
WHERE Logs.dmID = [EnterdmID] AND Logs.relID = [EnterrelID]
ORDER BY Logs.DateTime DESC;
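That will give you the single latest row, based on your DateTime field and the other criteria. Note that Access's TOP 1 returns every row tied on the ORDER BY value, so if two matching records can share the same DateTime, add a tiebreaker such as pk to the ORDER BY.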
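A minimal sketch of the same idea, run against the sample Logs rows with Python's sqlite3 module standing in for Access. Assumptions: SQLite spells TOP 1 as LIMIT 1; the dates are stored as ISO text (reading "22" as 2022) so that they sort chronologically; and the filter uses modID from the table rather than the dmID name in the query. pk is added to the ORDER BY as a tiebreaker for rows with equal DateTime.

```python
import sqlite3

# Sample Logs rows from the question; dates converted to ISO text (assumed year 2022).
LOGS = [
    (1, 1234, 22.3, "2022-10-01 04:00", 1),
    (2, 1234, 23.1, "2022-10-10 06:00", 1),
    (3, 1234, 23.1, "2022-10-11 07:00", 2),
    (4, 1234, 23.1, "2022-10-12 08:00", 3),
    (5, 4321, 22.3, "2022-10-02 06:00", 7),
    (6, 4321, 23.1, "2022-10-10 06:00", 1),
    (7, 4321, 23.1, "2022-10-11 07:30", 3),
]

def latest_tx_type(mod_id, rel_id):
    """Return TxType of the most recent row for (mod_id, rel_id), or None."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Logs (pk INTEGER, modID INTEGER, relID REAL, "
        "DateTime TEXT, TxType INTEGER)"
    )
    con.executemany("INSERT INTO Logs VALUES (?, ?, ?, ?, ?)", LOGS)
    row = con.execute(
        """
        SELECT TxType FROM Logs
        WHERE modID = ? AND relID = ?
        ORDER BY DateTime DESC, pk DESC  -- pk breaks ties on equal DateTime
        LIMIT 1                          -- Access spells this SELECT TOP 1
        """,
        (mod_id, rel_id),
    ).fetchone()
    con.close()
    return row[0] if row else None

print(latest_tx_type(1234, 23.1))
```

No aggregate is needed here: sorting and keeping the first row picks the whole latest record, so TxType comes from the same row as the maximum DateTime.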

Calculating the difference (or delta) between the current and previous row with ClickHouse

It would be awesome if there were a way to index rows during a query.
Is there a way to SELECT (compute) the difference of a single column between consecutive rows?
Say, something like the following query:
SELECT
    toStartOfDay(stamp) AS day,
    count() AS events,
    events[current] - events[previous] AS difference, -- how do I calculate this
    events[current] / events[previous] AS percent     -- and this
FROM records
GROUP BY day
ORDER BY day
I want to get the integer and percentage difference between the current row's 'events' column and the previous row's, to produce something like this:
day                  events  difference  percent
2022-01-06 00:00:00  197     NULL        NULL
2022-01-07 00:00:00  656     459         3.32
2022-01-08 00:00:00  15      -641        0.02
2022-01-09 00:00:00  7       -8          0.46
2022-01-10 00:00:00  137     130         19.5
My version of ClickHouse doesn't support window functions, but while looking into the LAG() function mentioned in the comments, I found neighbor(), which works perfectly for what I'm trying to do:
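SELECT
    day,
    events,
    events - neighbor(events, -1) AS diff,
    events / neighbor(events, -1) AS perc
FROM
(
    SELECT toStartOfDay(stamp) AS day, count() AS events
    FROM records
    GROUP BY day
    ORDER BY day
)
neighbor() only looks within the current data block, and the order of rows it sees can differ from the final ORDER BY, so the ClickHouse documentation recommends ordering in a subquery and calling neighbor() from the outer query, as above. For the first row, neighbor() returns the column's default value (0), so the first diff equals events and the first perc is inf rather than NULL.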
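For engines whose window-function support includes LAG() (for example SQLite 3.25 and later, SQL Server 2012 and later, PostgreSQL), the LAG() approach mentioned in the comments returns NULL for the first day instead of 0, matching the desired output. A minimal sketch using Python's sqlite3 module as the stand-in engine: the records table is rebuilt from the daily counts in the question (one row per event, at an assumed noon timestamp), and SQLite's date() replaces ClickHouse's toStartOfDay().

```python
import sqlite3

# Daily event counts taken from the question's desired output.
DAILY_COUNTS = {
    "2022-01-06": 197,
    "2022-01-07": 656,
    "2022-01-08": 15,
    "2022-01-09": 7,
    "2022-01-10": 137,
}

# Aggregate per day first, then compare each day with the previous one.
QUERY = """
    SELECT day,
           events,
           events - LAG(events) OVER (ORDER BY day) AS difference,
           ROUND(events * 1.0 / NULLIF(LAG(events) OVER (ORDER BY day), 0), 2) AS percent
    FROM (
        SELECT date(stamp) AS day, COUNT(*) AS events
        FROM records
        GROUP BY day
    ) AS daily
    ORDER BY day
"""

def daily_deltas():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE records (stamp TEXT)")
    # One row per event; the noon timestamp is an arbitrary assumption.
    con.executemany(
        "INSERT INTO records VALUES (?)",
        [(f"{day} 12:00:00",) for day, n in DAILY_COUNTS.items() for _ in range(n)],
    )
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

for row in daily_deltas():
    print(row)
```

ROUND rounds rather than truncates, so the percent column can differ from the question's table in the last digit (3.33 instead of 3.32, for example). NULLIF guards against dividing by a zero count.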

Transpose and fill missing dates in date range

My main goal is to check how many users had an active product on a given date.
My data looks like this:
UserID ActiveFrom ActiveTo
1 2019-02-03 2019-03-05
2 2019-04-01 2019-04-30
1 2019-03-06 2019-04-04
3 2019-05-01 2019-05-31
I think the solution could be to select all the ActiveFrom dates, union them with the ActiveTo dates, and then fill in the missing dates so that the result looks something like this:
UserID ActiveOnDate
1 2019-02-03
1 2019-02-04
1 2019-02-05
And so on
Then I could count the UserIDs for each date. But I can't find a query that fills in the missing dates in the date range, and I also don't know whether this is the easiest solution. Any ideas?
If your dates use the DATE datatype (and not VARCHAR, for example), you can use the SQL BETWEEN operator:
https://sql.sh/cours/where/between
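SELECT COUNT(DISTINCT UserID) FROM user WHERE [dateToTest] BETWEEN ActiveFrom AND ActiveTo;
Replace [dateToTest] with the date you want to check. COUNT(DISTINCT UserID) keeps a user with two overlapping rows from being counted twice.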
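To get a count for every date rather than for a single [dateToTest], the fill-in-the-missing-dates idea from the question can be done by generating a calendar with a recursive CTE and joining it to the ranges. A minimal sketch with Python's sqlite3 module as a stand-in engine; the table name activity is hypothetical, and the rows are the sample data from the question. In SQL Server, the same recursive CTE fails with an error after 100 recursion levels (about 100 days) unless you add OPTION (MAXRECURSION n), which is why a permanent calendar or numbers table is a common alternative.

```python
import sqlite3

# Sample rows from the question: (UserID, ActiveFrom, ActiveTo), inclusive ranges.
ACTIVITY = [
    (1, "2019-02-03", "2019-03-05"),
    (2, "2019-04-01", "2019-04-30"),
    (1, "2019-03-06", "2019-04-04"),
    (3, "2019-05-01", "2019-05-31"),
]

QUERY = """
    WITH RECURSIVE calendar(d) AS (
        SELECT MIN(ActiveFrom) FROM activity           -- first date in the data
        UNION ALL
        SELECT date(d, '+1 day') FROM calendar
        WHERE d < (SELECT MAX(ActiveTo) FROM activity) -- stop at the last date
    )
    SELECT c.d, COUNT(DISTINCT a.UserID) AS active_users
    FROM calendar AS c
    LEFT JOIN activity AS a ON c.d BETWEEN a.ActiveFrom AND a.ActiveTo
    GROUP BY c.d
    ORDER BY c.d
"""

def active_users_per_day(rows=ACTIVITY):
    """Return {date: number of distinct active users} for every calendar day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE activity (UserID INTEGER, ActiveFrom TEXT, ActiveTo TEXT)")
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result

counts = active_users_per_day()
print(counts["2019-04-02"])
```

The LEFT JOIN keeps days on which nobody was active, with a count of 0, so the output has no missing dates.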

CTE recursion infinite loop

I'm working on a stored procedure that uses a CTE in SQL Server to retrieve data from two tables, but when execution reaches the CTE query it goes into an infinite loop and never ends. Is there a way to prevent that infinite loop?
This is the query I wrote:
WITH tableName(Id, enddate, statusDte, closeId, shceDte, calcDte, closeEndDte, ParentId, LastClose, lasCloseDte, closeClass,addSe,twon,code)
AS
(
SELECT
tba.Id,
CASE WHEN tb.ParentId IS NOT NULL
THEN tb.Id
WHEN tb.statusDte IN (1,2,3)
THEN tb.calcDte ELSE tb.shceDte
END ForecastDueDate,
statusDte, closeId, shceDte, calcDte,
CASE WHEN tb.ParentId IS NULL
THEN closeEndDte ELSE NULL END, tb.ParentId, 0,
CASE WHEN tb.ParentId IS NOT NULL
THEN statusDte
WHEN tb.statusDte = 5
AND (tb.calcDte BETWEEN '1/1/2020 12:00:00 AM' AND '12/31/2020 11:59:59 PM'
OR tb.closeEndDte BETWEEN '1/1/2020 12:00:00 AM' AND '12/31/2020 11:59:59 PM')
THEN ams.GetPreviousNthFullAuditDate(tb.Id, tb.AuditID, 2) ELSE a.statusDate END as lastDate,
a.closeClass, tba.addSe,tba.town,tba.code
FROM
tableA tba
INNER JOIN
tableB tb ON tb.Id = tba.Id
WHERE
statusDte NOT IN (3,4) AND tba.IsAtve = 1
UNION ALL
SELECT
Id, enddate,
statusDte, statusDte, shceDte, calcDte, closeEndDte, ParentId,
0, lasCloseDte, closeClass,addSe,twon,code
FROM
tableName
WHERE
enddate BETWEEN enddate AND '12/31/2020 11:59:59 PM'
)
SELECT *
FROM tableName
OPTION (maxrecursion 0)
Expected results
Id enddate statusDte closeId shceDte calcDte closeEndDte parentId lastClose lastCloseDte closeClass addSe town code
----------- ----------------------- ------------- ----------- ----------------------- ----------------------- ----------------------- ----------------------- ----------- ----------------------- ----------- --------------------------------- ---------------------- --------------------------------------------------
133 2011-04-04 00:00:00.000 22 14453 NULL 2011-04-04 00:00:00.000 2099-12-31 00:00:00.000 NULL 0 NULL 1 4707 EXECUTIVE DRIVE '' SAN DIEGO 123
56 2018-12-07 13:00:00.000 22 52354 NULL 2018-12-07 13:00:00.000 2019-12-07 00:00:00.000 NULL 0 NULL 1 75 STATE ST FL 24 '' BOSTON 345
12 2021-02-05 17:00:00.000 22 75751 NULL 2021-02-05 17:00:00.000 NULL NULL 0 NULL 1 1450 FRAZEE RD STE 308 '' SAN DIEGO 678
334 2019-03-07 16:30:00.000 15 66707 2019-03-07 16:30:00.000 2019-03-23 21:00:00.000 NULL NULL 0 2019-03-07 16:30:00.000 1 42690 RIO NEDO, STE E '' TEMECULA 91011
33 2020-01-10 17:00:00.000 22 65568 NULL 2020-01-10 17:00:00.000 NULL NULL 0 2018-01-10 17:00:00.000 1 2518 UNICORNIO ST. '' CARLSBAD 136
55 2020-04-16 20:00:00.000 22 67812 NULL 2020-04-16 20:00:00.000 NULL NULL 0 2018-04-17 20:00:00.000 1 4534 OSPREY STREET '' SAN DIEGO 653
66 2020-02-21 17:00:00.000 22 75956 NULL 2020-02-21 17:00:00.000 NULL NULL 0 2019-02-21 17:00:00.000 1 3511 CAMINO DEL RIO S, STE 305 '' SAN DIEGO 0484
094 2021-02-20 21:00:00.000 22 75629 NULL 2021-02-20 21:00:00.000 NULL NULL 0 NULL 1 29349 EAGLE DR '' MURRIETA 345
First, let's apply some best practices. Qualify all your columns with the appropriate table alias. Qualifying only some of them is inconsistent, and an inconsistent style is difficult to read and prone to errors.
Next, you've (hopefully) simplified your actual query for posting. Generic names like "tableA" hinder understanding.
Next, your first CASE expression seems highly suspicious. One branch returns tb.Id and the others return what appears to be a date (or datetime). Unfortunately, an int converts implicitly to datetime, so this won't generate an error even if the result makes no sense. So, does this make sense?
Next, you've made a common mistake with your datetime boundaries. Depending on your data, you might never notice. But there is no reason to rely on that, and every reason to write your logic so that it avoids the possibility. Tibor discusses this in great detail here. The short version: your upper boundary should always be exclusive, so that it covers every possible time value for your datatype. An upper bound of 23:59:59 will miss any time values with non-zero fractional seconds. Also, use a literal format that does not depend on language or connection settings.
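To see the boundary problem concretely, here is a minimal sketch using Python's sqlite3 module, where datetimes are stored as ISO text and compared as strings; that is a stand-in for SQL Server's datetime type, not the same thing, and the events table is hypothetical. A value with fractional seconds just before midnight falls outside BETWEEN ... '23:59:59' but inside the half-open range. In T-SQL the half-open form would read tb.calcDte >= '20200101' AND tb.calcDte < '20210101'; the unseparated yyyymmdd literal is interpreted the same way regardless of language and DATEFORMAT settings.

```python
import sqlite3

# Hypothetical timestamps stored as ISO-8601 text.
STAMPS = [
    "2020-06-15 10:00:00",
    "2020-12-31 23:59:59.500",  # non-zero fractional seconds on the last day
    "2021-01-01 00:00:00",      # first instant of the next year
]

def count_2020(sql):
    """Run a COUNT(*) query against an in-memory events table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (stamp TEXT)")
    con.executemany("INSERT INTO events VALUES (?)", [(s,) for s in STAMPS])
    (n,) = con.execute(sql).fetchone()
    con.close()
    return n

# Closed range with a 23:59:59 upper bound: misses the .500 row.
CLOSED = """SELECT COUNT(*) FROM events
            WHERE stamp BETWEEN '2020-01-01 00:00:00' AND '2020-12-31 23:59:59'"""

# Half-open range: inclusive start, exclusive start of the next year.
HALF_OPEN = """SELECT COUNT(*) FROM events
               WHERE stamp >= '2020-01-01' AND stamp < '2021-01-01'"""

print(count_2020(CLOSED), count_2020(HALF_OPEN))
```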
Next, you add confusion. You named your columns in the CTE declaration, but your code also includes aliases for some (but not all; refer to the consistency comment) columns, and those aliases differ significantly from the CTE's column names. The 2nd column of the CTE is "enddate", but the anchor query uses the alias "ForecastDueDate".
Next, you have this: tb.statusDte = 5. The name implies date; the literal implies something different. You have other columns that end in "Dte" that are obviously dates, but not this one? Danger, danger!
Next, you refer to columns "a.closeClass" and "a.statusDate". There is no table or alias named "a".
Lastly, you have:
WHERE enddate BETWEEN enddate AND '12/31/2020 11:59:59 PM'
Think about what you wrote. Isn't enddate always between enddate and Dec 31 2020 (as long as enddate <= that value)? I think this is the source of your issue. You're not computing or adjusting anything from the anchor, so the recursive part just keeps selecting the same rows over and over. There is no logic to end the recursion.
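For contrast, a recursive member only terminates if each pass changes something and filters on that changed value. A minimal sketch with Python's sqlite3 module as a stand-in engine, under the assumption (the question does not say) that the goal is to project each due date forward one year at a time through the end of 2020; the items table and its columns are hypothetical. Because every pass pushes due one year later and the WHERE clause tests the new value against a fixed bound, the recursion has to stop.

```python
import sqlite3

# Hypothetical rows: (id, first due date).
ITEMS = [(133, "2017-04-04"), (56, "2018-12-07")]

QUERY = """
    WITH RECURSIVE schedule(id, due) AS (
        SELECT id, due FROM items                 -- anchor: the starting dates
        UNION ALL
        SELECT id, date(due, '+1 year')           -- each pass moves due forward
        FROM schedule
        WHERE date(due, '+1 year') < '2021-01-01' -- so this test eventually fails
    )
    SELECT id, due FROM schedule ORDER BY id, due
"""

def project_due_dates(rows=ITEMS):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE items (id INTEGER, due TEXT)")
    con.executemany("INSERT INTO items VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in project_due_dates():
    print(row)
```

In SQL Server, date(due, '+1 year') would be DATEADD(year, 1, due); the essential part is that the recursive member's WHERE clause tests a value that moves toward the bound.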
The next question is obviously "how do I fix it?". That is impossible to say without knowing your schema, what it represents, and your goal. It is not obvious why recursion is needed here at all.
If the hierarchy between records in your data contains a loop, the recursion never terminates, which causes a problem in SQL Server. You will see the resources used by the SQL Server process increase tremendously.
If you set MAXRECURSION to a value other than 0 (zero lets SQL Server continue the recursion without limit), you can cap the recursion depth.
With data that loops, or whose rows reference each other, you can use this MAXRECURSION hint.
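MAXRECURSION caps the depth, but when the cap is hit SQL Server ends the statement with an error rather than returning a clean result. Another common option is to carry the path visited so far and stop when a row would be revisited. A minimal sketch with Python's sqlite3 module as a stand-in engine; the nodes table, with a deliberate 1 → 2 → 3 → 1 loop, is hypothetical. In SQL Server, the path column would need an explicit CAST(... AS varchar(max)) in both the anchor and recursive members, because their types must match.

```python
import sqlite3

# Hypothetical parent links containing a cycle: 1 -> 2 -> 3 -> 1, plus 4 -> 1.
NODES = [(1, 2), (2, 3), (3, 1), (4, 1)]

QUERY = """
    WITH RECURSIVE walk(id, parent_id, depth, path) AS (
        SELECT id, parent_id, 0, ',' || id || ','
        FROM nodes
        WHERE id = :start
        UNION ALL
        SELECT n.id, n.parent_id, w.depth + 1, w.path || n.id || ','
        FROM walk AS w
        JOIN nodes AS n ON n.id = w.parent_id
        WHERE instr(w.path, ',' || n.id || ',') = 0  -- skip already-visited ids
          AND w.depth < 100                          -- safety cap, like MAXRECURSION
    )
    SELECT id FROM walk ORDER BY depth
"""

def ancestors(start, rows=NODES):
    """Follow parent links from start, stopping before any id repeats."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE nodes (id INTEGER, parent_id INTEGER)")
    con.executemany("INSERT INTO nodes VALUES (?, ?)", rows)
    result = [r[0] for r in con.execute(QUERY, {"start": start})]
    con.close()
    return result

print(ancestors(4))  # [4, 1, 2, 3]
```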

How to get non-occurring time gaps in SQL Server?

I have a table of time entries, each with a start and an end time. I want to find the time ranges that are not covered by any entry in the table.
Example: I have one entry from 08.00 to 09.00 and another from 10.20 to 11.00. I need a record that contains 09.00 - 10.19. Since I need to do this for multiple occurrences, can anybody help me write this query?
If I have a time range of 07.00 to 17.00 in which to show the non-occurring entries, it should return 07.00 to 08.45 and 14.00 to 17.00.
I lack the reputation to comment (the comments by Mihai and TT were amazing and interesting), but a possible solution may be as simple as:
SELECT a.endDT, Min(b.startDT)
FROM sched a, sched b
WHERE a.endDT<b.startDT
GROUP BY a.endDT
For your sample data, that will return:
2017-07-30 08:45:00.000 2017-07-30 09:30:00.000
2017-07-30 09:45:00.000 2017-07-30 10:30:00.000
2017-07-30 11:45:00.000 2017-07-30 13:15:00.000
2017-07-30 13:45:00.000 2017-07-30 14:00:00.000
However, as Mihai's and TT's comments point out, this will not find the gap between, say, midnight and 8am (the start of the first record).
I cannot tell what your sample data has to do with your description, but the problem seems to be solvable with lag(). For the data you have provided:
select activity, prev_endtime as gap_start, starttime as gap_end
from (select t.*,
lag(endtime) over (partition by activity order by id) as prev_endtime
from t
) t
where starttime <> prev_endtime;
I should note that this will not work for every possible combination of start and end times. But your time slots don't appear to overlap, and they appear to be ordered by id, so this should work.
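Neither query above handles every overlap, nor the edges of a reporting window such as 07.00 to 17.00. A minimal sketch of a common variant (not taken from either answer): add zero-length sentinel rows at the window start and end, then compare each start time with the running MAX of all earlier end times instead of only the previous row's end time. It uses Python's sqlite3 module as a stand-in for SQL Server (SQL Server 2012 and later support the same ROWS frame). The entries table, the 'HH:MM' text times, and the assumption that every entry lies inside the window are all hypothetical. Gaps are half-open, so the question's 09.00 - 10.19 appears as 09:00 to 10:20.

```python
import sqlite3

GAPS_SQL = """
    WITH bounded(start_t, end_t) AS (
        SELECT start_t, end_t FROM entries
        UNION ALL SELECT :win_start, :win_start   -- sentinel at the window start
        UNION ALL SELECT :win_end, :win_end       -- sentinel at the window end
    ),
    ordered AS (
        SELECT start_t,
               MAX(end_t) OVER (
                   ORDER BY start_t, end_t
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS covered_until                 -- latest end seen so far
        FROM bounded
    )
    SELECT covered_until AS gap_start, start_t AS gap_end
    FROM ordered
    WHERE covered_until < start_t                 -- nothing covers this stretch
    ORDER BY gap_start
"""

def find_gaps(entries, win_start, win_end):
    """Return uncovered (start, end) stretches inside [win_start, win_end)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE entries (start_t TEXT, end_t TEXT)")
    con.executemany("INSERT INTO entries VALUES (?, ?)", entries)
    gaps = con.execute(GAPS_SQL, {"win_start": win_start, "win_end": win_end}).fetchall()
    con.close()
    return gaps

print(find_gaps([("08:00", "09:00"), ("10:20", "11:00")], "07:00", "17:00"))
```

Using the running maximum rather than lag() means an entry nested inside a longer one, or two back-to-back entries, does not produce a false gap.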