Define groups of row by logic - sql

I have a unique scenario that I can't find a solution for, so I thought I'd ask the experts :)
I have a query that returns a course syllabus, in which each row represents a day of training. You can see in the picture below that there are rest days in the middle of the training.
I can't find a way to group each run of consecutive training days.
Please see the screenshot below, which details the rows and what I want to achieve.
I am using MS SQL Server 2014.
Here is a Fiddle with the data I have and the expected results:
SQL Fiddle

The simplest method is a difference of row_number(). The following identifies each consecutive group with a number:
select td.*,
dense_rank() over (order by dateadd(day, - seqnum, DayOfTraining)) as grpnum
from (select td.*,
row_number() over (order by DayOfTraining) as seqnum
from TrainingDays td
) td;
The key idea is that subtracting a sequence from consecutive days produces a constant for those days.
Here is the SQL Fiddle.
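To make the "constant" visible, here is a minimal Python sketch of the same arithmetic, assuming each training day appears only once. The function name and the sample dates are made up for illustration and are not taken from the fiddle.

```python
from datetime import date, timedelta

def group_consecutive_days(days):
    """Return (day, anchor, group) tuples, where anchor = day minus its 1-based row number."""
    ordered = sorted(days)
    # Same idea as dateadd(day, -seqnum, DayOfTraining): consecutive days share one anchor.
    anchors = [d - timedelta(days=i) for i, d in enumerate(ordered, start=1)]
    # Same idea as dense_rank() over the anchors: number the distinct anchors in order.
    rank = {a: n for n, a in enumerate(sorted(set(anchors)), start=1)}
    return [(d, a, rank[a]) for d, a in zip(ordered, anchors)]

# Hypothetical sample: three training days, a rest gap, then two more days.
sample = [date(2017, 1, 1), date(2017, 1, 2), date(2017, 1, 3),
          date(2017, 1, 6), date(2017, 1, 7)]

for day, anchor, grp in group_consecutive_days(sample):
    print(day, anchor, grp)
```

The first three rows share one anchor date and the last two share another, which is why ranking the anchors yields the group numbers.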

After much trial and error, this is the closest I could come up with:
http://rextester.com/ECBQ88563
The problem here is that if the last row belongs to another group, it will still be placed in the previous group. So in your sample, if you change the last date from 19 to 20, the output will still be the same. Maybe another condition could eliminate this. Other than that, this should work.
SELECT DayOfTraining1,
dense_rank() over (ORDER BY grp_dt) AS grp
FROM
(SELECT DayOfTraining1,
min(DayOfTraining) AS grp_dt
FROM
(SELECT trng.DayOfTraining AS DayOfTraining1,
dd.DayOfTraining
FROM trng
CROSS JOIN
(SELECT d.*
FROM
(SELECT trng.*,
lag (DayOfTraining,1) OVER (
ORDER BY DayOfTraining) AS prev_DayOfTraining,
lead (DayOfTraining,1) OVER (
ORDER BY DayOfTraining) AS nxt_DayOfTraining,
datediff(DAY, lag (DayOfTraining,1) OVER (
ORDER BY DayOfTraining), DayOfTraining) AS ddf
FROM trng
) d
WHERE d.ddf <> 1
OR nxt_DayOfTraining IS NULL
) dd
WHERE trng.DayOfTraining <= dd.DayOfTraining
) t
GROUP BY DayOfTraining1
) t1;
Explanation: The inner query d uses the lag and lead functions to capture the previous and next rows' values. We then take the difference in days and keep the dates where that difference is not 1. These are the dates where the group should switch; the derived table dd holds them.
Now cross join this with the main table and use an aggregate function to determine the continuous groups (it took me a lot of trial and error to get this far).
Then use the dense_rank function on the result to get the group number.
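The "switch groups where the gap to the previous day is not 1" idea can also be written as a running count of group starts. The Python sketch below is a different formulation from the cross join above, not a translation of it, and it assumes each day appears once. The function name and dates are hypothetical.

```python
from datetime import date

def groups_by_gap(days):
    """Number each run of consecutive days: 1, 2, 3, ..."""
    result = []
    grp = 0
    prev = None
    for d in sorted(days):
        # A new group starts on the first row, or when the gap to the
        # previous day is anything other than exactly one day.
        if prev is None or (d - prev).days != 1:
            grp += 1
        result.append((d, grp))
        prev = d
    return result

# Hypothetical data: the last day is moved from the 19th to the 20th.
days = [date(2017, 3, 16), date(2017, 3, 17), date(2017, 3, 18), date(2017, 3, 20)]
print(groups_by_gap(days))
```

In this formulation the trailing day gets its own group when it is not adjacent to the previous one.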

Related

SQL Server: loop once a month value to the whole month

I have a table that gets a value on only one day of each month. I want to copy that value across the whole month until a new value shows up. The result will be a table with data for each day of the month, based on the last known value.
Can someone help me write this query?
This is untested, due to a lack of consumable sample data, but this looks like a gaps-and-islands problem. Here you can count the number of non-NULL values for Yield to assign the group "number" and then get the windowed MAX in the outer SELECT:
WITH CTE AS(
SELECT Yield,
[Date],
COUNT(yield) OVER (ORDER BY [Date]) AS Grp
FROM dbo.YourTable)
SELECT MAX(yield) OVER (PARTITION BY grp) AS yield,
[Date],
DATENAME(WEEKDAY,[Date]) AS [Day]
FROM CTE;
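As a rough illustration of why the running COUNT works as a group number, here is a small Python sketch of the same two steps. The rows are invented sample data and `fill_forward` is a hypothetical helper name. The rows are assumed to be sorted by date already.

```python
from itertools import accumulate

def fill_forward(rows):
    """rows: list of (date_string, yield_or_None), sorted by date."""
    # Running count of non-NULL yields, like COUNT(yield) OVER (ORDER BY [Date]).
    grp = list(accumulate(1 if y is not None else 0 for _, y in rows))
    # Each group holds exactly one non-NULL yield, so MAX over the group returns it.
    value_of = {g: y for g, (_, y) in zip(grp, rows) if y is not None}
    # Rows before the first known value fall in group 0 and stay None.
    return [(d, value_of.get(g)) for g, (d, _) in zip(grp, rows)]

# Invented sample: one known value per month, the other days unknown.
rows = [("2020-01-01", 1.5), ("2020-01-02", None), ("2020-01-03", None),
        ("2020-02-01", 2.0), ("2020-02-02", None)]
print(fill_forward(rows))
```

The count only increases on rows that carry a value, so every following NULL row inherits the group number of the last known value.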
You seem to have data on the first of the month. That suggests an alternative approach:
select t.*, t2.yield as imputed_yield
from t cross apply
(select t2.*
from t t2
where t2.date = datefromparts(year(t.date), month(t.date), 1)
) t2;
This should be able to take advantage of an index on (date, yield). It does assume that the value you want is on the first day of the month.

Teradata SQL -Min Max transaction dates from Rows

I tried QUALIFY ROW_NUMBER() and QUALIFY with the MIN and MAX functions, but I am still not able to get the range of dates for each transaction. See the data structure below.
I need help producing the following output.
Thank you in advance.
You need to find the groups of consecutive dates first. There are several ways to do this; in your case the best should be based on comparing a sequence to another sequence with gaps in it:
with cte as
(
select t.*
-- consecutive numbers = sequence without gaps
,row_number()
over (partition by location, cust#, cust_type -- ??
order by transaction_date) as rn
-- consecutive numbers as long as there's no missing date = sequence with gaps
,(transaction_date - date '0001-01-01') as rn2
-- assign a common (but meaningless) value to consecutive dates,
-- value changes when there's a gap
,rn2 - rn as grp
from tab as t
)
select location, cust#, cust_type -- ??
,min(transaction_date), max(transaction_date)
,min(amount), max(amount)
from cte
-- add the calculated "grp" to GROUP BY
group by location, cust#, cust_type, grp
The columns used for PARTITION BY/GROUP BY depend on your rules.
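The same comparison of two sequences can be sketched in Python to show how the ranges fall out per partition. This is only an illustration under assumptions: a single hypothetical key stands in for location, cust# and cust_type, the data is invented, and each key is assumed to have at most one row per date.

```python
from datetime import date
from itertools import groupby

def date_ranges(rows):
    """rows: iterable of (key, transaction_date). Returns (key, first, last) per run."""
    out = []
    for key, part in groupby(sorted(rows), key=lambda r: r[0]):
        dates = [d for _, d in part]
        # toordinal() plays the role of (transaction_date - date '0001-01-01'),
        # and rn is the gap-free row_number() within the partition.
        numbered = [(d.toordinal() - rn, d) for rn, d in enumerate(dates, start=1)]
        # Rows with the same difference belong to the same run of consecutive dates.
        for _, run in groupby(numbered, key=lambda t: t[0]):
            run_dates = [d for _, d in run]
            out.append((key, run_dates[0], run_dates[-1]))
    return out

# Invented sample rows.
tx = [("A", date(2020, 1, 1)), ("A", date(2020, 1, 2)),
      ("A", date(2020, 1, 5)), ("B", date(2020, 1, 2))]
for row in date_ranges(tx):
    print(row)
```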

Find the second largest value with Groupings

In SQL Server, I am attempting to pull the second latest NOTE_ENTRY_DT_TIME (items highlighted in screenshot). With the query written below it still pulls the latest date (I believe it's because of the grouping but the grouping is required to join later). What is the best method to achieve this?
SELECT
hop.ACCOUNT_ID,
MAX(hop.NOTE_ENTRY_DT_TIME) AS latest_noteid
FROM
NOTES hop
WHERE
hop.GEN_YN IS NULL
AND hop.NOTE_ENTRY_DT_TIME < (SELECT MAX(hope.NOTE_ENTRY_DT_TIME)
FROM NOTES hope
WHERE hop.GEN_YN IS NULL)
GROUP BY
hop.ACCOUNT_ID
Data sample in the table:
One of the "easier" ways to get the Nth row in a group is to use a CTE and ROW_NUMBER:
WITH CTE AS(
SELECT Account_ID,
Note_Entry_Dt_Time,
ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Note_Entry_Dt_Time DESC) AS RN
FROM dbo.YourTable)
SELECT Account_ID,
Note_Entry_Dt_Time
FROM CTE
WHERE RN = 2;
Of course, if an ACCOUNT_ID only has 1 row, then it will not be returned in the result set.
The OP's statement "The row will not always be 2." from the comments conflicts with their statement "I am attempting to pull the second latest NOTE_ENTRY_DT_TIME" in the question. At a best guess, this means that the OP has several rows with the same date, any of which could be the "latest" date. If so, then you would simply need to replace ROW_NUMBER with DENSE_RANK. Their sample data, however, doesn't suggest this is the case.
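The difference between the two ranking functions only shows up when timestamps tie. The Python sketch below mimics both behaviours for a single account. `nth_latest` and the timestamps are hypothetical, and the function returns a value rather than whole rows.

```python
def nth_latest(timestamps, n=2, dense=False):
    """Return the n-th latest timestamp, or None if there are fewer than n candidates."""
    if dense:
        # DENSE_RANK-like: tied timestamps share a rank, so rank over distinct values.
        ordered = sorted(set(timestamps), reverse=True)
    else:
        # ROW_NUMBER-like: every row gets its own number, ties included.
        ordered = sorted(timestamps, reverse=True)
    return ordered[n - 1] if len(ordered) >= n else None

# Hypothetical notes for one account: the latest timestamp appears twice.
notes = ["2020-03-02 10:00", "2020-03-02 10:00", "2020-03-01 09:00"]
print(nth_latest(notes))              # row-number style
print(nth_latest(notes, dense=True))  # dense-rank style
```

With the tie, the row-number style still lands on the latest timestamp for row 2, while the dense-rank style moves on to the next distinct one.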
You can use window functions:
select *
from (
select
n.*,
row_number() over(partition by account_id order by note_entry_dt_time desc) rn
from notes n
) t
where rn = 2

SQL Server - count distinct over function or row_number with rows window function

I am currently trying to get a distinct count of customers over a rolling 90-day period. I already have the amount, using SUM of the amount with OVER and PARTITION BY. However, when I try the same with COUNT DISTINCT, SQL Server doesn't support it.
I have attempted to use row_number() with OVER and PARTITION BY and a ROWS frame from 89 preceding to the current row, but this isn't available either.
I would greatly appreciate any suggested workaround for this problem.
I have attempted to solve the problem using two approaches, both of which failed because of the limitations outlined above.
Approach 1
select date
,count(distinct(customer_id)) over partition () order by date rows current row and 89 preceding as cust_count_distinct
from table
Approach 2
select date
,customer_id
,row_number() over partition (customer_id) order by date rows current row and 89 preceding as rn
from table
-- was then going to filter for rn = '1' but the rows functionality not possible with ranking function windows.
The simplest method is a correlated subquery of some sort:
select d.date, c.cnt
from (select distinct date from t) d cross apply
(select count(distinct customerid) as cnt
from t t2
where t2.date >= dateadd(day, -89, d.date) and
t2.date <= d.date
) c;
This is not particularly efficient (i.e. a killer) on even a medium data set. But it might serve your needs.
You can restrict the dates being returned to test to see if it works.
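For checking the query's output on a small extract, the same logic can be sketched in Python. This is a brute-force illustration with invented rows, where `rolling_distinct` is a hypothetical helper, and like the subquery it rescans the data for every date.

```python
from datetime import date, timedelta

def rolling_distinct(rows, window_days=90):
    """rows: iterable of (date, customer_id). Returns {date: distinct customers in window}."""
    rows = list(rows)
    result = {}
    for day in sorted({d for d, _ in rows}):
        # Window covers the current day and the 89 days before it.
        start = day - timedelta(days=window_days - 1)
        result[day] = len({c for d, c in rows if start <= d <= day})
    return result

# Invented sample: customer "b" drops out of the window on 2020-03-31.
visits = [(date(2020, 1, 1), "a"), (date(2020, 1, 1), "b"),
          (date(2020, 3, 30), "a"), (date(2020, 3, 31), "a")]
for day, cnt in rolling_distinct(visits).items():
    print(day, cnt)
```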

can we get totalcount and last record from postgresql

I have a table with 23 records. I am trying to get the total count of records and also the last record in a single query, something like this:
select count(*) ,(m order by createdDate) from music m ;
Is there any way to get only the last record as well as the total count in PostgreSQL?
This can be done using window functions
select *
from (
select m.*,
row_number() over (order by createddate desc) as rn,
count(*) over () as total_count
from music m
) t
where rn = 1;
Another option would be to use a scalar sub-query and combine it with a limit clause:
select *,
(select count(*) from music) as total_count
from music
order by createddate desc
limit 1;
Depending on the indexes, your memory configuration and the table definition, this might be faster than the two window functions.
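The scalar sub-query variant can be tried out quickly with Python's built-in sqlite3 module. Note the assumptions: SQLite stands in for PostgreSQL here, the `title` column and the three rows are invented, and only the `music` table name and `createddate` column come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music (title TEXT, createddate TEXT)")
conn.executemany(
    "INSERT INTO music VALUES (?, ?)",
    [("song a", "2021-01-01"), ("song b", "2021-02-01"), ("song c", "2021-03-01")],
)

# Latest row plus the total row count, in one statement.
row = conn.execute(
    """
    SELECT m.title,
           m.createddate,
           (SELECT count(*) FROM music) AS total_count
    FROM music m
    ORDER BY m.createddate DESC
    LIMIT 1
    """
).fetchone()
print(row)
conn.close()
```

The sub-query is evaluated independently of the ORDER BY and LIMIT, so the count covers the whole table even though only one row is returned.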
No, it's not possible to do what is being asked; SQL does not work that way. The moment you ask for a count(), SQL changes the level of your data to an aggregation. The only way to do what you are asking is to do the count() and the ORDER BY in separate queries.
Another solution using windowing functions and no subquery:
SELECT DISTINCT count(*) OVER w, last_value(m) OVER w
FROM music m
WINDOW w AS (ORDER BY createddate RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
The point here is that last_value applies to partitions defined by windows and not to groups defined by GROUP BY.
I did not run any tests, but I suspect my solution is the least efficient of the three posted so far. It is also the closest to your example query, though.