Group rows if certain column value directly after another in SQL - sql

I'm using MS SQL Server 2008, and I would like this:
+------+--------+--------+------------+
| id | Name | Event | Timestamp |
+------+--------+--------+------------+
| 0 | bob | note | 14:20 |
| 1 | bob | time | 14:22 |
| 2 | bob | time | 14:40 |
| 3 | bob | time | 14:45 |
| 4 | bob | send | 14:48 |
| 5 | bob | time | 15:30 |
| 6 | bob | note | 15:35 |
| 7 | bob | note | 18:00 |
+------+--------+--------+------------+
To become this:
+------+--------+--------+------------+
| id | Name | Event | Timestamp |
+------+--------+--------+------------+
| 0 | bob | note | 14:20 |
| 1 | bob | time | 14:22 |
| 4 | bob | send | 14:48 |
| 5 | bob | time | 15:30 |
| 6 | bob | note | 15:35 |
+------+--------+--------+------------+
I.e., rows are "grouped" by the "event" column. Only one of each grouped identical "event" is to be shown.
If one event, e.g. "note" as with id 0, is in the table with no row directly before or after it (row with nearest timestamps) that has an equal "event" value, it is shown;
If more than one row with the same event, e.g. "time" as with id 1-3. comes after each others (i.e. no row with a different "event" has a timestamp that is "between" them), any one of them is shown (doesn't matter for me, all other columns are identical anyway).
Those two are the only rules.

If ids are one after one try do it this way:
select * into #tab from(
select 0 as id, 'bob' as name, 'note' as event, '14:20' as time union
select 1, 'bob', 'time', '14:22' union
select 2, 'bob', 'time', '14:40' union
select 3, 'bob', 'time', '14:45' union
select 4, 'bob', 'send', '14:48' union
select 5, 'bob', 'time', '15:30' union
select 6, 'bob', 'note', '15:35' union
select 7, 'bob', 'note', '18:00'
) t
select t.*
from #tab t
left join #tab t1 on t.id = t1.id + 1 and t1.event = t.event
-- and t1.name = t.name -- if there are more names you are going to need this one as well
where t1.id is null
result:
id name event time
0 bob note 14:20
1 bob time 14:22
4 bob send 14:48
5 bob time 15:30
6 bob note 15:35
Added:
If ids aren't one after one, you can make them to be:
select identity(int, 1, 1) as id, name, event, time
into #tab_ordered_ids
from #tab order by name, id, time

Related

SQL SELECT most recently created row WHERE something is true

I am trying to SELECT the most recently created row, WHERE the ID field in the row is a certain number, so I don't want the most recently created row in the WHOLE table, but the most recently created one WHERE the ID field is a specific number.
My Table:
Table:
| name | value | num |SecondName| Date |
| James | HEX124 | 1 | Carl | 11022020 |
| Jack | JEU836 | 4 | Smith | 19042020 |
| Mandy | GER234 | 33 | Jones | 09042020 |
| Mandy | HER575 | 7 | Jones | 10052020 |
| Jack | JEU836 | 4 | Smith | 14022020 |
| Ryan | GER631 | 33 | Jacque | 12042020 |
| Sarah | HER575 | 7 | Barlow | 01022019 |
| Jack | JEU836 | 4 | Smith | 14042020 |
| Ryan | HUH233 | 33 | Jacque | 15042020 |
| Sarah | HER575 | 7 | Barlow | 02022019 |
My SQL:
SELECT name, value, num, SecondName, Date
FROM MyTable
INNER JOIN (SELECT NAME, MAX(DATE) AS MaxTime FROM MyTable GROUP BY NAME) grouped ON grouped.NAME = NAME
WHERE NUM = 33
AND grouped.MaxTime = Date
What I'm doing here, is selecting the table, and creating an INNER JOIN where I'm taking the MAX Date value (the biggest/newest value), and grouping by the Name, so this will return the newest created row, for each person (Name), WHERE the NUM field is equal to 33.
Results:
| Ryan | HUH233 | 33 | Jacque | 15042020 |
As you can see, it is returning one row, as there are 3 rows with the NUM value of 33, two of which are with the Name 'Ryan', so it is grouping by the Name, and returning the latest entry for Ryan (This works fine).
But, Mandy is missing, as you can see in my first table, she has two entries, one under the NUM value of 33, and the other with the NUM value of 7. Because the entry with the NUM value of 7 was created most recently, my query where I say 'grouped.MaxTime = Date' is taking that row, and it is not being displayed, as the NUM value is not 33.
What I want to do, is read every row WHERE the NUM field is 33, THEN select the Maximum Time inside of the rows with the value of 33.
I believe what it is doing, prioritising the Maximum Date value first, then filtering the selected fields with the NUM value of 33.
Desired Results:
| Ryan | HUH233 | 33 | Jacque | 15042020 |
| Mandy | GER234 | 33 | Jones | 09042020 |
Any help would be appreciated, thank you.
If I folow you correctly, you can filter with a subquery:
select t.*
from mytable t
where t.num = 33 and t.date = (
select max(t1.date) from mytable t1 where t1.name = t.name and t1.num = t.num
)
Look at your subquery. You want the maximum dates for num 33, but you are selecting the maximum dates independent from num.
I think you want:
select *
from mytable
where (name, date) in
(
select name, max(date)
from mytable
where num = 33
group by name
);

SQL: how to check for neither overlapping nor holes in payment records

I do have a table PaymentSchedules with percentages info, and dates from/to for which those
percentages are valid, resource by resource:
| auto_numbered | res_id | date_start | date_end | org | pct |
|---------------+--------+------------+------------+-------+-----|
| 1 | A | 2018-01-01 | 2019-06-30 | One | 100 |
| 2 | A | 2019-07-01 | (NULL) | One | 60 |
| 3 | A | 2019-07-02 | 2019-12-31 | Two | 40 |
| 4 | A | 2020-01-01 | (NULL) | Two | 40 |
| 5 | B | (NULL) | (NULL) | Three | 100 |
| 6 | C | 2018-01-01 | (NULL) | One | 100 |
| 7 | C | 2019-11-01 | (NULL) | Four | 100 |
(Records #3 and #4 could be summarized onto just one line, but duplicated on purpose, to show that there are many combinations of date_start and date_end.)
A quick reading of the data:
Org "One" is fully paying for resource A up to 2019-06-30; then, it continues
to pay 60% of the cost, but the rest (40%) is being paid by org "Two" since
2019-07-02.
This should begin on 2019-07-01... small encoding error… provoking a 1-day gap.
Org "Three" is fully paying for resource B, at all times.
Org "One" is fully paying for resource C from 2018-01-01... but, starting on
2019-01-11, org "Four" is paying for it...
... and, there, there is an encoding error: we do have 200% of resource C being
taken into account since 2019-11-01: the record #6 should have been closed
(date_end set to 2019-10-31), but hasn't...
So, when we generate a financial report for the year 2019 (from 2019-01-01 to
2019-12-31), we will have calculation errors...
So, question: how can we make sure we don't have overlapping payments for
resources, or -- also the contrary -- "holes" for some period of times?
How is it possible to write an SQL query to check that there are neither
underpaid nor overpaid resources? That is, all resources in the table should be
paid, for every single day of the financial period being looked at, by exactly
one or more organizations, in a way that the summed up percentage is always
equal to 100%.
I don't see how to proceed with such a query. Anybody able to give hints, to put
me on track?
EDIT -- Working with both SQL Server and Oracle.
EDIT -- I don't own the DB, I can't add triggers or views. I need to be able to detect things "after the facts"... Need to easily spot the conflictual records, or the "missing" ones (in case of "period holes"), fix them by hand, and then re-run the financial report.
EDIT -- If we make an analysis for 2019, the following report would be desired:
| res_id | pct_sum | date |
|--------+---------+------------|
| A | 60 | 2019-07-01 |
| C | 200 | 2019-11-01 |
| C | 200 | 2019-11-02 |
| C | 200 | ... |
| C | 200 | ... |
| C | 200 | ... |
| C | 200 | 2019-12-30 |
| C | 200 | 2019-12-31 |
or, of course, an even much better version -- certainly unobtainable? -- where one
type of problem would one be present once, with the relevant date range for
which the problem is observed:
| res_id | pct_sum | date_start | date_end |
|--------+---------+------------+------------|
| A | 60 | 2019-07-01 | 2019-07-01 |
| C | 200 | 2019-11-01 | 2019-12-31 |
EDIT -- Fiddle code: db<>fiddle here
Here's an incomplete attempt for Sql Server.
Basically, the idea was to use a recursive CTE to unfold months for each res_id.
Then left join 'what could be' to the existing date ranges.
But I doubt it can be done in a sql that would work both for Oracle & MS Sql Server.
Sure, both have window functions and CTE's.
But the datetime functions are rarely the same for different RDMS.
So I give up.
Maybe someone else finds an easier solution.
create table PaymentSchedules
(
auto_numbered int identity(1,1) primary key,
res_id varchar(30),
date_start date,
date_end date,
org varchar(30),
pct decimal(3,0)
)
GO
✓
insert into PaymentSchedules
(res_id, org, pct, date_start, date_end)
values
('A', 'One', 100, '2018-01-01', '2018-06-30')
, ('A', 'One', 100, '2019-01-01', '2019-06-30')
, ('A', 'One', 60, '2019-07-01', null)
, ('A', 'Two', 40, '2019-07-02', '2019-12-31')
, ('A', 'Two', 40, '2020-01-01', null)
, ('B', 'Three', 100, null, null)
, ('C', 'One', 100, '2018-01-01', null)
, ('C', 'Four', 100, '2019-11-01', null)
;
GO
8 rows affected
declare #MaxEndDate date;
set #MaxEndDate = (select max(iif(date_start > date_end, date_start, isnull(date_end, date_start))) from PaymentSchedules);
;with rcte as
(
select res_id
, datefromparts(year(min(date_start)), month(min(date_start)), 1) as month_start
, eomonth(coalesce(max(date_end), #MaxEndDate)) as month_end
, 0 as lvl
from PaymentSchedules
group by res_id
having min(date_start) is not null
union all
select res_id
, dateadd(month, 1, month_start)
, month_end
, lvl + 1
from rcte
where dateadd(month, 1, month_start) < month_end
)
, cte_gai as
(
select c.res_id, c.month_start, c.month_end
, t.org, t.pct, t.auto_numbered
, sum(isnull(t.pct,0)) over (partition by c.res_id, c.month_start) as res_month_pct
, count(t.auto_numbered) over (partition by c.res_id, c.month_start) as cnt
from rcte c
left join PaymentSchedules t
on t.res_id = c.res_id
and c.month_start >= datefromparts(year(t.date_start), month(t.date_start), 1)
and c.month_start <= coalesce(t.date_end, #MaxEndDate)
)
select *
from cte_gai
where res_month_pct <> 100
order by res_id, month_start
GO
res_id | month_start | month_end | org | pct | auto_numbered | res_month_pct | cnt
:----- | :---------- | :--------- | :--- | :--- | ------------: | :------------ | --:
A | 2018-07-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-08-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-09-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-10-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-11-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-12-01 | 2019-12-31 | null | null | null | 0 | 0
C | 2019-11-01 | 2020-01-31 | One | 100 | 7 | 200 | 2
C | 2019-11-01 | 2020-01-31 | Four | 100 | 8 | 200 | 2
C | 2019-12-01 | 2020-01-31 | One | 100 | 7 | 200 | 2
C | 2019-12-01 | 2020-01-31 | Four | 100 | 8 | 200 | 2
C | 2020-01-01 | 2020-01-31 | One | 100 | 7 | 200 | 2
C | 2020-01-01 | 2020-01-31 | Four | 100 | 8 | 200 | 2
db<>fiddle here
I am not giving the full answer here but I think that you are after cursors(https://learn.microsoft.com/en-us/sql/t-sql/language-elements/declare-cursor-transact-sql?view=sql-server-ver15).
This allows you to iterate through the database, checking all of the records.
This is bad practice because even though the idea is really good, they are quite heavy, and they are slow, and they block the involved tables.
I know some people have found a method to rewrite cursors using loops (while probably), so you need to understand a cursor, get how you would implement it and then translate it into a loop. (https://www.sqlbook.com/advanced/sql-cursors-how-to-avoid-them/)
Also, views can be helpful, but I am assuming that you know how to use them already.
The algorithm should be something like:
have table1 and table2 (table2 is a copy of table1, https://www.tutorialrepublic.com/sql-tutorial/sql-cloning-tables.php)
iterate through all of the records (I would use in the first instance a cursor for this) from table1. Picking up a record from table1.
if overlapping dates (check it against table2) do something
else do something else
pick another record from table1 and go to step 2.
Drop unnecessary tables

SQL Calculate daily increment with Date and ID on multiple variables

I have a table with the following structure:
+----------+-----+-------+----------+
| Date | ID | Likes | Comments |
+----------+-----+-------+----------+
| 1/1/2018 | AAA | 70 | 90 |
| 1/2/2018 | AAA | 80 | 110 |
| 1/1/2018 | BBB | 60 | 5 |
| 1/2/2018 | BBB | 90 | 6 |
+----------+-----+-------+----------+
For each day and ID, I need to calculate the incremental value. The desired output should something like this: desired output
+----------+-----+-------+----------+-------------+----------------+
| Date | ID | Likes | Comments | daily_likes | daily_comments |
+----------+-----+-------+----------+-------------+----------------+
| 1/1/2018 | AAA | 70 | 90 | 70 | 90 |
| 1/2/2018 | AAA | 80 | 110 | 10 | 20 |
| 1/1/2018 | BBB | 60 | 5 | 60 | 5 |
| 1/2/2018 | BBB | 90 | 6 | 30 | 1 |
+----------+-----+-------+----------+-------------+----------------+
I have tried this code but it keeps aggregating the daily_likes or daily_comments together.
SELECT
"date",
"created_time",
("likes_count" - LAG("likes_count", 1) OVER (ORDER BY "date")) AS "daily_likes",
("comments_count" - LAG("comments_count", 1) OVER (ORDER BY "date")) AS "daily_comments",
"id",
"likes_count",
"comments_count",
"user_username"
FROM
"blablabla"
GROUP BY
1,
"id",
"likes_count",
"comments_count",
"user_username",
"created_time"
ORDER BY
1 DESC;
The table structure you postet doesn't match the query. And I'm wondering why you group at all, let alone on nearly every column and a literal...
A query, that would produce the desired output you posted from a table like you posted would be:
SELECT "date",
"id",
"likes",
"comments",
"likes"
- lag("likes",
1,
0) OVER (PARTITION BY "id"
ORDER BY "date") "daily_likes",
"comments"
- lag("comments",
1,
0) OVER (PARTITION BY "id"
ORDER BY "date") "daily_comments"
FROM "elbat"
ORDER BY "date",
"id";
SQL Fiddle
There is no need to group at all.
(However note, to show the daily increase, this requires the source table to have data for every day for every user. If you got gaps, you'd need to join all days (in the respective range) for all users to fill them.

SQL - Rows that are repetitive with a particular condition

We have a table like this:
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 3 | Steve | SomeService3 | | | | 2 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 4 | Steve | SomeService4 | | | | 12 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
Every digit in zones is a tooth (dental science) and it means "John" has got "SomeService1" twice for tooth #3.
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
Note that Steve has received services twice for tooth #2 (4th Zone) but services are not one.
I'd write some code that gives me a table with duplicate rows (Checking the only patient and received service)(using "group by" clause") but I need to check zones too.
I've tried this:
select ROW_NUMBER() over(order by vv.ID_sick) as RowNum,
bb.Radif,
bb.VCount as 'Count',
vv.ID_sick 'ID_Sick',
vv.ID_service 'ID_Service',
sick.FNamesick + ' ' + sick.LNamesick as 'Sick',
serv.NameService as 'Service',
vv.Mab_Service as 'MabService',
vv.Mab_daryafti as 'MabDaryafti',
vv.datevisit as 'DateVisit',
vv.Zone1,
vv.Zone2,
vv.Zone3,
vv.Zone4,
vv.ID_dentist as 'ID_Dentist',
dent.FNamedentist + ' ' + dent.LNamedentist as 'Dentist',
vv.id_do as 'ID_Do',
do.FNamedentist + ' ' + do.LNamedentist as 'Do'
from visiting vv inner join (
select ROW_NUMBER() OVER(ORDER BY a.ID_sick ASC) AS Radif,
count(a.ID_sick) as VCount,
a.ID_sick,
a.ID_service
from visiting a
group by a.ID_sick, a.ID_service, a.Zone1, a.Zone2, a.Zone3, a.Zone4
having count(a.ID_sick)>1)bb
on vv.ID_sick = bb.ID_sick and vv.ID_service = bb.ID_service
left join InfoSick sick on vv.ID_sick = sick.IDsick
left join infoService serv on vv.ID_service = serv.IDService
left join Infodentist dent on vv.ID_dentist = dent.IDdentist
left join infodentist do on vv.id_do = do.IDdentist
order by bb.ID_sick, bb.ID_service,vv.datevisit
But this code only returns rows with all tooths repeated. What I want is even one tooth repeats ...
How can I implement it?
I need to check characters in zones.
**Zone's datatype is varchar
This is a bad datamodel for what you are trying to do. By storing the teeth as a varchar, you have kind of decided that you are not interested in single teeth, but only in the group of teeth. Now, however, you are trying to investigate on single teeth.
You'd want a datamodel like this:
service
+------------+--------+-----------------+
| service_id | Name | RecievedService |
+------------+--------+-----------------+
| 1 | John | SomeService1 |
+------------+--------+-----------------+
| 3 | Steve | SomeService3 |
+------------+--------+-----------------+
| 4 | Steve | SomeService4 |
+------------+-------+-----------------+
service_detail
+------------+------+-------+
| service_id | zone | tooth |
+------------+------+-------+
| 1 | 1 | 1 |
| 1 | 1 | 3 |
| 1 | 3 | 4 |
+------------+------+-------+
| 1 | 1 | 3 |
| 1 | 1 | 4 |
+------------+------+-------+
| 3 | 4 | 2 |
+------------+------+-------+
| 4 | 4 | 1 |
| 4 | 4 | 2 |
+------------+------+-------+
What you can do with the given datamodel is to create such table on-the-fly using a recursive query and string manipulation:
with unpivoted(service_id, name, zone, teeth) as
(
select recievedservice, name, 1, firstzoneteeth
from mytable where len(firstzoneteeth) > 0
union all
select recievedservice, name, 2, secondzoneteeth
from mytable where len(secondzoneteeth) > 0
union all
select recievedservice, name, 3, thirdzoneteeth
from mytable where len(thirdzoneteeth) > 0
union all
select recievedservice, name, 4, fourthzoneteeth
from mytable where len(fourthzoneteeth) > 0
)
, service_details(service_id, name, zone, tooth, teeth) as
(
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from unpivoted
union all
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from service_details
where len(teeth) > 0
)
, duplicates(service_id, name) as
(
select distinct service_id, name
from service_details
group by service_id, name, zone, tooth
having count(*) > 1
)
select m.*
from mytable m
join duplicates d on d.service_id = m.recievedservice and d.name = m.name;
A lot of work and a rather slow query due to a bad datamodel, but still feasable.
Rextester demo: http://rextester.com/JVWK49901

Union in outer query

I'm attempting to combine multiple rows using a UNION but I need to pull in additional data as well. My thought was to use a UNION in the outer query but I can't seem to make it work. Or am I going about this all wrong?
The data I have is like this:
+------+------+-------+---------+---------+
| ID | Time | Total | Weekday | Weekend |
+------+------+-------+---------+---------+
| 1001 | AM | 5 | 5 | 0 |
| 1001 | AM | 2 | 0 | 2 |
| 1001 | AM | 4 | 1 | 3 |
| 1001 | AM | 5 | 3 | 2 |
| 1001 | PM | 5 | 3 | 2 |
| 1001 | PM | 5 | 5 | 0 |
| 1002 | PM | 4 | 2 | 2 |
| 1002 | PM | 3 | 3 | 0 |
| 1002 | PM | 1 | 0 | 1 |
+------+------+-------+---------+---------+
What I want to see is like this:
+------+---------+------+-------+
| ID | DayType | Time | Tasks |
+------+---------+------+-------+
| 1001 | Weekday | AM | 9 |
| 1001 | Weekend | AM | 7 |
| 1001 | Weekday | PM | 8 |
| 1001 | Weekend | PM | 2 |
| 1002 | Weekday | PM | 5 |
| 1002 | Weekend | PM | 3 |
+------+---------+------+-------+
The closest I've come so far is using UNION statement like the following:
SELECT * FROM
(
SELECT Weekday, 'Weekday' as 'DayType' FROM t1
UNION
SELECT Weekend, 'Weekend' as 'DayType' FROM t1
) AS X
Which results in something like the following:
+---------+---------+
| Weekday | DayType |
+---------+---------+
| 2 | Weekend |
| 0 | Weekday |
| 2 | Weekday |
| 0 | Weekend |
| 10 | Weekday |
+---------+---------+
I don't see any rhyme or reason as to what the numbers are under the 'Weekday' column, I suspect they're being grouped somehow. And of course there are several other columns missing, but since I can't put a large scope in the outer query with this as inner one, I can't figure out how to pull those in. Help is greatly appreciated.
It looks like you want to union all a pair of aggregation queries that use sum() and group by id, time, one for Weekday and one for Weekend:
select Id, DayType = 'Weekend', [time], Tasks=sum(Weekend)
from t
group by id, [time]
union all
select Id, DayType = 'Weekday', [time], Tasks=sum(Weekday)
from t
group by id, [time]
Try with this
select ID, 'Weekday' as DayType, Time, sum(Weekday)
from t1
group by ID, Time
union all
select ID, 'Weekend', Time, sum(Weekend)
from t1
group by ID, Time
order by order by 1, 3, 2
Not tested, but it should do the trick. It may require 2 proc sql steps for the calculation, one for summing and one for the case when statements. If you have extra lines, just use a max statement and group by ID, Time, type_day.
Proc sql; create table want as select ID, Time,
sum(weekday) as weekdayTask,
sum(weekend) as weekendTask,
case when calculated weekdaytask>0 then weekdaytask
when calculated weekendtask>0 then weekendtask else .
end as Task,
case when calculated weekdaytask>0 then "Weekday"
when calculated weekendtask>0 then "Weekend"
end as Day_Type
from have
group by ID, Time
;quit;
Proc sql; create table want2 as select ID, Time, Day_Type, Task
from want
;quit;