Calculate the height of a row in this scheduler calendar? - sql

I am looking for an SQL solution to pass onto my HTML javascript to alter the height of a row.
SCENARIO
I want to count the number of "rows" that will determine, how high (CSS height) this row should be to accommodate all the possible number of bars in this calendar.
For example, here we have 5 vessels in my calendar from the 5th Feb 2023, up to the 8th Feb 2023.
And here is the row data I want to "count" to give me '3' not '5'.
How do I count these records to find what's overlapped so I get the result of '3'? This I will use in javascript to adjust the height of my row to allow the number of bars to fit inside it.
Thanks

I think you are looking of maximum number of overlapping dates. You can use CROSS APPLY to unpivot the 2 dates and then perform a cumulative sum of the value (1 as Arrival, -1 as Departure). Finally MAX() will gives you the value that you want
select max(c)
from
(
select c = sum(e) over (order by dt)
from vessels
cross apply
(
values
(Arrival, 1), -- arrival
(Departure, -1) -- departure
) e (dt, e)
) t

Related

adjust recursive sql query to exclude holidays and weekends

I have a dataset like this called data_per_day
instructional_day
points
2023-01-24
2
2023-01-23
2
2023-01-20
1
2023-01-19
0
and so on. the table shows weekdays (days minus holidays and weekends) and the number of points someone has earned. 1 is the start of a streak and 0 is the end of a streak. 2 is max points after a streak has started.
I need to find how long is the latest streak. so in this case the result should be 3
I created a recursive cte but the query returns 2 as the streak count because i'm using lag mechanism with days. instead I need to adjust so that the instructional days are used rather than all dates.
RECURSIVE cte AS (
SELECT
student_unique_id,
instructional_day,
points,
1 AS cnt
FROM
`data_per_day`
WHERE
instructional_day = DATE_ADD(CURRENT_DATE('America/Chicago'), INTERVAL -1 DAY)
UNION ALL
SELECT
a.student_unique_id,
a.instructional_day,
a.points,
c.cnt+1
FROM (
SELECT
*
FROM
`data_per_day`
WHERE
points > 0 ) a
INNER JOIN
cte c
ON
a.student_unique_id = c.student_unique_id
AND a.instructional_day = c.instructional_day - INTERVAL '1' day )
SELECT
student_unique_id,
MAX(cnt) AS streak
FROM
cte --
WHERE
student_unique_id = "419"
GROUP BY
student_unique_id
How do I adjust the query?
This is not a trivial coding exercise, so I won't actually write the code and provide it.
What you have here is a gaps and islands question. You want to identify the largest "island" of days with points within a date range. Depending upon what dates are contained in your data, you may need to generate a list of sequential dates that meet your criteria.
One problem I see is that you are trying to combine the steps to generate the date range (the recursive CTE) with the points. You'll need to separate those steps.
Define the date range.
Generate the dates within the range.
Filter the dates with isweekday = 'no' and isholiday = 'no'. You will probably want to add a row number during this step.
[left] join the dates to your data, including coalesce(points, 0)
Filter the data to points > 0.
Identify the islands.
Identify the largest island per student.

Identifying if a column is in descending order

I am using Microsoft SQL Server 2005 Management Studio. I am a bit new so I hope I am not breaking any rules. My data is 15 columns and almost a million rows, however I am just giving you a sample to get assistance on one area where I am stuck.
In the above example as you can see the column 'lastlevel' values are decreasing. Also you can see that data in the 'Last_read' column date range is from today to 14 days prior (it was ran yesterday hence April 27, also pls. disregard that for 1st customer date 2021/04/14 is missing, it is an anomaly).
Column 'Shipto' provides the customer number and each customer has max 14 rows of data.
Please disregard column 'current_reading' and rn
If look at 'lastlevel' again you will notice that the values are going down consistently, however on April 18th, it goes from 0.73 to 0.74, showing an increase of 0.01.
What I want to do is that whenever there is an increase at all, I want that whole customer's all 14 rows be removed from the output i.e. I only want to see customers that have the prefect descending data and no increases.
Can you help?
WITH
deltas AS
(
-- For each [Shipto]; deduct the preceding row's value and record it as the [delta]
-- Note, each [Shipto]'s first row's delta with therefor be NULL
SELECT
*,
lastlevel - LAG(lastlevel) OVER (PARTITION BY Shipto ORDER BY Last_Read, lastlevel DESC) AS delta
FROM
yourTable
),
max_deltas AS
(
-- Get the maximum of the deltas per [Shipto]
SELECT
*,
MAX(delta) OVER (PARTITION BY Shipto) AS max_delta
FROM
deltas
)
-- Return only rows where the delta never exceeds 0 (thus, never ascending over any timestep)
SELECT
*
FROM
max_deltas
WHERE
max_delta <= 0
I've ordered by Last_Read, lastlevel DESC such that if two readings are on the same date, it is assumed that the highest value should be considered to have happened first.

How to group timestamps into islands (based on arbitrary gap)?

Consider this list of dates as timestamptz:
I grouped the dates by hand using colors: every group is separated from the next by a gap of at least 2 minutes.
I'm trying to measure how much a given user studied, by looking at when they performed an action (the data is when they finished studying a sentence.) e.g.: on the yellow block, I'd consider the user studied in one sitting, from 14:24 till 14:27, or roughly 3 minutes in a row.
I see how I could group these dates with a programming language by going through all of the dates and looking for the gap between two rows.
My question is: how would go about grouping dates in this way with Postgres?
(Looking for 'gaps' on Google or SO brings too many irrelevant results; I think I'm missing the vocabulary for what I'm trying to do here.)
SELECT done, count(*) FILTER (WHERE step) OVER (ORDER BY done) AS grp
FROM (
SELECT done
, lag(done) OVER (ORDER BY done) <= done - interval '2 min' AS step
FROM tbl
) sub
ORDER BY done;
The subquery sub returns step = true if the previous row is at least 2 min away - sorted by the timestamp column done itself in this case.
The outer query adds a rolling count of steps, effectively the group number (grp) - combining the aggregate FILTER clause with another window function.
fiddle
Related:
Query to find all timestamps more than a certain interval apart
How to label groups in postgresql when group belonging depends on the preceding line?
Select longest continuous sequence
Grouping or Window
About the aggregate FILTER clause:
Aggregate columns with additional (distinct) filters
Conditional lead/lag function PostgreSQL?
Building up on Erwin's answer, here is the full query for tallying up the amount of time people spent on those sessions/islands:
My data only shows when people finished reviewing something, not when they started, which means we don't know when a session truly started; and some islands only have one timestamp in them (leading to a 0-duration estimate.) I'm accounting for both by calculating the average review time and adding it to the total duration of islands.
This is likely very idiosyncratic to my use case, but I learned a thing or two in the process, so maybe this will help someone down the line.
-- Returns estimated total study time and average time per review, both in seconds
SELECT (EXTRACT( EPOCH FROM logged) + countofislands * avgreviewtime) as totalstudytime, avgreviewtime -- add total logged time to estimate for first-review-in-island and 1-review islands
FROM
(
SELECT -- get the three key values that will let us calculate total time spent
sum(duration) as logged
, count(island) as countofislands
, EXTRACT( EPOCH FROM sum(duration) FILTER (WHERE duration != '00:00:00'::interval) )/( sum(reviews) FILTER (WHERE duration != '00:00:00'::interval) - count(reviews) FILTER (WHERE duration != '00:00:00'::interval)) as avgreviewtime
FROM
(
SELECT island, age( max(done), min(done) ) as duration, count(island) as reviews -- calculate the duration of islands
FROM
(
SELECT done, count(*) FILTER (WHERE step) OVER (ORDER BY done) AS island -- give a unique number to each island
FROM (
SELECT -- detect the beginning of islands
done,
(
lag(done) OVER (ORDER BY done) <= done - interval '2 min'
) AS step
FROM review
WHERE clicker_id = 71 AND "done" > '2015-05-13' AND "done" < '2015-05-13 15:00:00' -- keep the queries small and fast for now
) sub
ORDER BY done
) grouped
GROUP BY island
) sessions
) summary

SQL Server - Retrieve monthly data (not accumulated)

I have a table with two fields
DATE_YYYYMM
TOTAL
That table contains the accumulated total per month for the whole year.
I'd like a query retrieving the "unacumulated" total for each month.
For example, the total of 201806 would be equal to the total of 201806 minus the one of 201805.
Any tip please ? Thanks !
The window function LAG can be used to get a previous value based on an order.
And it seems you just want to subtract the previous month's total from the total.
SELECT
DATE_YYYYMM,
TOTAL,
TOTAL - ISNULL(LAG(TOTAL) OVER (ORDER BY DATE_YYYYMM), 0) AS UNACCUMULATED
FROM YourYearTotalsTable
You can use the analytic function LAG (starting from sql-server-2012) to compare values in the current row with values in a previous row.
SELECT
[date_yyyymm]
,[current_total] = [total]
,[previous_total] = LAG([total], 1, 0) OVER (ORDER BY [date_yyyymm])
,[unacumulated] = [total] - LAG([total], 1, 0) OVER (ORDER BY [date_yyyymm])
FROM [your_table]
ORDER BY [date_yyyymm];

Max partition by DAX measure equivalent?

In DAX/Power BI, I wondering if it is possible to create an aggregate calculation on a subset of data within a dataset.
I have a listing of customer scores for a period of time, e.g.
date, customer, score
-----------------------
1.1.17, A, 12
2.1.17, A, 16
4.1.17, B, 10
5.1.17, B, 14
I would like to identify to Max date per customer eg.
date, customer, score, max date per client
-------------------------------------------
1.1.17, A, 12, 2.1.17
2.1.17, A, 11, 2.1.17
4.1.17, B, 10, 5.1.17
5.1.17, B, 14, 5.1.17
The SQL equivalent would something like-
MAX(date) OVER (PARTITION BY customer).
In DAX/Power BI I realise that a calculated column can be used in combination with EARLIER but this will not be suitable because the calculated column is not responsive to filtering from a slicer. I.e I would like to find the MAX date per client as illustrated above for a filtered date range controlled from a slicer and not for the full data set which is what a calculated column does. Is such a measure possible?
You will want a measure like this:
Max Date by Customer =
CALCULATE(
MAX(Table1[Date]),
FILTER(
ALLSELECTED(Table1),
Table1[customer] = MAX(Table1[customer])
)
)
The ALLSELECTED removes the local filter context while preserving any slicer filtering.
The filter Table1[customer] = MAX(Table1[customer]) is basically the measure equivalent of Table1[customer] = EARLIER(Table1[customer]) in a calculated column.
You can use subquery :
select *, (select max(t1.date)
from table t1
where t1.customer = t.customer
) as max_date_per_client
from table t;