Merge Two Rows When DATEDIFF>3 - sql

I have a temp table that holds the results of a main query in which all records have been pivoted out. However, there are two date fields that, when they do not match, prevent the record from pivoting into a single row.
I am checking whether the dates are more than 3 days apart; if they are, I need to drop the oldest date and merge the rest of the columns together.
I am using SQL Server 2014.
Example table:
+---------+----------+-----------+-----------+
| Lname   | Date1    | idCode1   | idCode2   |
+---------+----------+-----------+-----------+
| Higgins | 11/30/16 | 9008 2172 | NULL      |
| Higgins | 12/31/16 | NULL      | 4007 3589 |
| Shaffer | 11/15/16 | 9000 1541 | NULL      |
| Shaffer | 11/21/16 | NULL      | 7889 9412 |
+---------+----------+-----------+-----------+
Needs to look like this.
+---------+----------+-----------+-----------+
| Lname   | Date1    | idCode1   | idCode2   |
+---------+----------+-----------+-----------+
| Higgins | 12/31/16 | 9008 2172 | 4007 3589 |
| Shaffer | 11/21/16 | 9000 1541 | 7889 9412 |
+---------+----------+-----------+-----------+

Unless I'm missing something here, a simple group by should do it (Assuming you are only going to get max 2 rows for each Lname):
Create and populate sample table (Please save us this step in your future questions)
-- Table variable holding the sample rows
DECLARE @T AS TABLE
(
Lname varchar(10),
Date1 date,
idCode1 varchar(20),
idCode2 varchar(20)
)
INSERT INTO @T VALUES
('Higgins', '11/30/16', '9008 2172', NULL ),
('Higgins', '12/31/16', NULL , '4007 3589'),
('Shaffer', '11/15/16', '9000 1541', NULL ),
('Shaffer', '11/21/16', NULL , '7889 9412')
The query:
-- MAX() ignores NULLs, so each group keeps its latest date and its non-NULL codes
SELECT LName,
MAX(Date1) As Date1,
MAX(IdCode1) As IdCode1,
Max(IdCode2) As IdCode2
FROM @T
GROUP BY LName
-- Only return groups whose earliest and latest dates are more than 3 days apart
HAVING DATEDIFF(DAY, MIN(Date1), MAX(Date1)) > 3
Results:
LName Date1 IdCode1 IdCode2
Higgins 31.12.2016 00:00:00 9008 2172 4007 3589
Shaffer 21.11.2016 00:00:00 9000 1541 7889 9412
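If you want to try the grouping idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the answer's exact query: SQLite has no DATEDIFF, so the difference of two julianday() values stands in for it, and the dates are stored as ISO strings. Table and column names are lowercased copies of the ones above.

```python
import sqlite3

# In-memory database with the four sample rows (dates as ISO strings).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (lname TEXT, date1 TEXT, idcode1 TEXT, idcode2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("Higgins", "2016-11-30", "9008 2172", None),
        ("Higgins", "2016-12-31", None, "4007 3589"),
        ("Shaffer", "2016-11-15", "9000 1541", None),
        ("Shaffer", "2016-11-21", None, "7889 9412"),
    ],
)

# MAX() skips NULLs, so each group collapses to its latest date plus the
# non-NULL codes. julianday() differences replace DATEDIFF(DAY, ...).
merged = conn.execute(
    """
    SELECT lname, MAX(date1), MAX(idcode1), MAX(idcode2)
    FROM t
    GROUP BY lname
    HAVING julianday(MAX(date1)) - julianday(MIN(date1)) > 3
    ORDER BY lname
    """
).fetchall()

for row in merged:
    print(row)
```

Note that, as in the answer, groups whose dates are 3 days apart or less are filtered out entirely by the HAVING clause rather than left unmerged.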

Related

Build/Rebuild a full history Type-2 Table from source data

I have a table in a PSA where I am capturing changes to records in a source table. Let's say it looks like this:
+-----+------------+----------------+------------------+------------------+
| PK | Check_cols | Dont_care_cols | start_ts | end_ts |
+-----+------------+----------------+------------------+------------------+
| 123 | abc | def | 1/1/20 00:10:00 | 1/2/20 13:13:23 |
| 123 | abc | dhf | 1/2/20 13:13:23 | 1/3/20 04:21:00 |
| 123 | abc | dhz | 1/3/20 04:21:00 | 1/5/20 12:15:00 |
| 123 | abd | dyz | 1/5/20 12:15:00 | 1/9/20 15:16:00 |
| 123 | abc | dyz | 1/9/20 15:16:00 | null |
| 456 | ghi | jkl | 1/2/20 03:45:00 | 1/10/20 00:00:00 |
| 456 | lmn | opq | 1/10/20 00:00:00 | null |
+-----+------------+----------------+------------------+------------------+
I would like to build a type-2 dimension (tracks changes with record start and stop times) from that table using only the values of check_cols, like the one shown below. I am looking for a pure SQL solution, with no looping.
check_cols is made up of multiple columns, but I will use an MD5 hash to look for changes. Since my dimension only cares about check_cols, there are situations where the timestamp records aren't what I need, for instance when a value in dont_care_cols changes but none of the check_cols values change.
From the data above, I want the following result set:
+-----+------------+------------------+------------------+
| PK | Check_cols | start_ts | end_ts |
+-----+------------+------------------+------------------+
| 123 | abc | 1/1/20 00:10:00 | 1/5/20 12:15:00 |
| 123 | abd | 1/5/20 12:15:00 | 1/9/20 15:16:00 |
| 123 | abc | 1/9/20 15:16:00 | null |
| 456 | ghi | 1/2/20 03:45:00 | 1/10/20 00:00:00 |
| 456 | lmn | 1/10/20 00:00:00 | null |
+-----+------------+------------------+------------------+
I've tried using window functions to compare lead and lag values, get mins and maxes, etc, but I can't figure out this edge case shown for PK 123 in the first table. I also have not found a solution via google/stackoverflow/etc. Most methods rely on daily snapshots running. I want to be able to rebuild the target table if I have a logic change. Anyone have thoughts?
I don't know if this is the best answer or whether it solves all of your use-cases, but give it a try and let me know if there is an edge case that stumbles over it. It's a bit of a hack. Also, I did add a few records to the use-case:
CREATE OR REPLACE TEMP TABLE tran_data (pk int, check_cols varchar, dont_care_cols varchar, start_ts timestamp, end_ts timestamp);
INSERT INTO tran_data
SELECT *
FROM (VALUES(123,'abc','def',TO_TIMESTAMP('1/1/20 00:10:00','MM/DD/YY hh:mi:ss'),TO_TIMESTAMP('1/2/20 13:13:23','MM/DD/YY hh:mi:ss')),
(123,'abc','dhf',TO_TIMESTAMP('1/2/20 13:13:23','MM/DD/YY hh:mi:ss'),TO_TIMESTAMP('1/3/20 04:21:00','MM/DD/YY hh:mi:ss')),
(123,'abc','dhz',TO_TIMESTAMP('1/3/20 04:21:00','MM/DD/YY hh:mi:ss'),TO_TIMESTAMP('1/5/20 12:15:00','MM/DD/YY hh:mi:ss')),
(123,'abd','dyz',TO_TIMESTAMP('1/5/20 12:15:00','MM/DD/YY hh:mi:ss'),TO_TIMESTAMP('1/9/20 15:16:00','MM/DD/YY hh:mi:ss')),
(123,'abd','dyz',TO_TIMESTAMP('1/9/20 15:16:00','MM/DD/YY hh:mi:ss'),TO_TIMESTAMP('1/11/20 14:14:00','MM/DD/YY hh:mi:ss')),
(123,'abc','dyz',TO_TIMESTAMP('1/11/20 14:14:00','MM/DD/YY hh:mi:ss'),TO_TIMESTAMP('1/14/20 09:14:00','MM/DD/YY hh:mi:ss')),
(123,'abc','dyz',TO_TIMESTAMP('1/14/20 09:14:00','MM/DD/YY hh:mi:ss'),null),
(456,'ghi','jkl',TO_TIMESTAMP('1/2/20 03:45:00','MM/DD/YY hh:mi:ss'),TO_TIMESTAMP('1/10/20 00:00:00','MM/DD/YY hh:mi:ss')),
(456,'lmn','opq',TO_TIMESTAMP('1/10/20 00:00:00','MM/DD/YY hh:mi:ss'),null)
);
From there, I tried to find a way to create "groups" using a method that I hope will stand up to all of your use-cases:
SELECT DISTINCT
PK
, check_cols
, FIRST_VALUE(start_ts) OVER (PARTITION BY PK, check_cols, group_num ORDER BY start_ts) as new_start_ts
, LAST_VALUE(end_ts) OVER (PARTITION BY PK, check_cols, group_num ORDER BY start_ts) as new_end_ts
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY PK, check_cols ORDER BY start_ts) as group_cnt
, group_cnt - pk_row as group_num
, *
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY PK ORDER BY start_ts) as pk_row
, IFNULL(LAG(check_cols) OVER (PARTITION BY PK ORDER BY start_ts),check_cols) as prev_check_cols
, *
FROM tran_data
)
)
ORDER BY pk, new_start_ts;
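For anyone who wants to check the row-number-difference trick on the question's original data, here is a rough sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). It is a variation, not the query above: it groups with MIN/MAX instead of FIRST_VALUE/LAST_VALUE, and the CASE expression is my own addition so that an open-ended (NULL) end_ts survives the aggregation. Timestamps are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tran_data (pk INTEGER, check_cols TEXT, dont_care_cols TEXT,"
    " start_ts TEXT, end_ts TEXT)"
)
conn.executemany(
    "INSERT INTO tran_data VALUES (?, ?, ?, ?, ?)",
    [
        (123, "abc", "def", "2020-01-01 00:10:00", "2020-01-02 13:13:23"),
        (123, "abc", "dhf", "2020-01-02 13:13:23", "2020-01-03 04:21:00"),
        (123, "abc", "dhz", "2020-01-03 04:21:00", "2020-01-05 12:15:00"),
        (123, "abd", "dyz", "2020-01-05 12:15:00", "2020-01-09 15:16:00"),
        (123, "abc", "dyz", "2020-01-09 15:16:00", None),
        (456, "ghi", "jkl", "2020-01-02 03:45:00", "2020-01-10 00:00:00"),
        (456, "lmn", "opq", "2020-01-10 00:00:00", None),
    ],
)

# Rows in an unbroken run of the same check_cols share the same difference
# between "row number within pk" and "row number within (pk, check_cols)".
history = conn.execute(
    """
    WITH numbered AS (
        SELECT pk, check_cols, start_ts, end_ts,
               ROW_NUMBER() OVER (PARTITION BY pk ORDER BY start_ts)
             - ROW_NUMBER() OVER (PARTITION BY pk, check_cols ORDER BY start_ts) AS grp
        FROM tran_data
    )
    SELECT pk,
           check_cols,
           MIN(start_ts) AS new_start_ts,
           -- keep NULL when any row in the run is still open
           CASE WHEN COUNT(*) > COUNT(end_ts) THEN NULL ELSE MAX(end_ts) END AS new_end_ts
    FROM numbered
    GROUP BY pk, check_cols, grp
    ORDER BY pk, new_start_ts
    """
).fetchall()

for row in history:
    print(row)
```

The second 'abc' run for PK 123 lands in its own group because the 'abd' row in between shifts the per-pk row number without shifting the per-(pk, check_cols) one.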

Inserting different values from a row from 'Patients' table to 'visits' table as different entries

I am in a predicament where I have to take some values from one row from Patient's table and insert them into a different table (Visits) but each as a different row. Is there any way to do this in using SQL?
Patients Table:
| Jan | Feb | March | Apr | May | June |UniqueID|
| NULL | 2018-02-01 | 2019-03-01 | NULL |2018-05-01 | NULL | 1 |
| 2019-01-01| 2019-02-01 | NULL | NULL | NULL |2018-06-01| 2 |
Expected Visits Table:
| UniqueID | DateOfVist |
| 1 | 2018-02-01 |
| 1 | 2019-03-01 |
| 1 | 2018-05-01 |
| 2 | 2018-06-01 |
| 2 | 2019-01-01 |
| 2 | 2019-02-01 |
I think cross apply pretty much does what you want:
select t.uniqueid, v.dateofvisit
from patients t cross apply
(values (jan), (feb), (march), (apr), (may), (june)
) v(dateofvisit)
where v.dateofvisit is not null;
To load an existing visits table, put insert into visits (uniqueid, dateofvisit) in front of the select; to create the table instead, add into visits after the select list.
You can use APPLY :
insert into visits (UniqueID, DateOfVist )
select UniqueID, dateofvisist
from patients t cross apply
( values (Jan), (Feb), . . ) v(dateofvisist)
where dateofvisist is not null;
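Where CROSS APPLY is not available, the same unpivot can be approximated with one SELECT per month column glued together with UNION ALL. The sketch below runs that on SQLite through Python's sqlite3 module; it is a swapped-in technique for illustration, not what either answer above uses, and the lowercase table and column names are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (jan TEXT, feb TEXT, march TEXT, apr TEXT,"
    " may TEXT, june TEXT, uniqueid INTEGER)"
)
conn.execute("CREATE TABLE visits (uniqueid INTEGER, dateofvisit TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (None, "2018-02-01", "2019-03-01", None, "2018-05-01", None, 1),
        ("2019-01-01", "2019-02-01", None, None, None, "2018-06-01", 2),
    ],
)

# One SELECT per month column; the column names are fixed constants, so
# building the statement with string formatting is safe here.
month_columns = ["jan", "feb", "march", "apr", "may", "june"]
unpivot_sql = " UNION ALL ".join(
    f"SELECT uniqueid, {col} AS dateofvisit FROM patients" for col in month_columns
)
conn.execute(
    f"INSERT INTO visits (uniqueid, dateofvisit) "
    f"SELECT uniqueid, dateofvisit FROM ({unpivot_sql}) WHERE dateofvisit IS NOT NULL"
)

visits = conn.execute(
    "SELECT uniqueid, dateofvisit FROM visits ORDER BY uniqueid, dateofvisit"
).fetchall()

for row in visits:
    print(row)
```

This scans the patients table once per column, so on SQL Server the cross apply version is usually the better choice.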

Get dates missing from multiple date ranges

I have one table that stores when a customer support employee is in a particular location and for what date. Each separate date is its own record.
I have a second table that stores a range of dates that customers have asked for onsite support.
I need to extract a list of dates that a given location does NOT have any support representation. All I need is the location and the date(s). I don't care which employee in that location or which customer has requested the support.
So in the sample data below, I need to see as my query results:
+--------+------------+
| London | 04/01/2019 |
| London | 07/01/2019 |
| Paris | 05/01/2019 |
+--------+------------+
Table: Employee_Location
+----------+----------+------------+
| Employee | Location | Date |
+----------+----------+------------+
| 1111 | London | 01/01/2019 |
| 1111 | London | 02/01/2019 |
| 1111 | London | 03/01/2019 |
| 2222 | Paris | 01/01/2019 |
| 2222 | Paris | 02/01/2019 |
| 2222 | Paris | 03/01/2019 |
| 2222 | Paris | 04/01/2019 |
| 3333 | London | 05/01/2019 |
| 3333 | Paris | 06/01/2019 |
| 3333 | Paris | 07/01/2019 |
| 4444 | London | 06/01/2019 |
+----------+----------+------------+
Table: Customer_Request
+----------+----------+---------------+------------+
| Customer | Location | Request From | Request To |
+----------+----------+---------------+------------+
| AAAA | London | 01/01/2019 | 06/01/2019 |
| BBBB | Paris | 01/01/2019 | 06/01/2019 |
| CCCC | London | 05/01/2019 | 07/01/2019 |
+----------+----------+---------------+------------+
Here is my current code ...
select c.CALENDARDTM
from CALENDAR c, Employee_Location el
join Customer_Request cr on el.location = cr.location
where c.CALENDARDTM NOT BETWEEN cr.RequestFrom and cr.RequestTo
and c.CALENDARDTM between '2019-01-01' AND '2019-01-07'
The key to solving this problem is to create a recordset that contains all dates between your nominated start and end dates.
There are a variety of methods you can use to do this. In the example below I have used a recursive CTE; for larger date ranges you will need to tweak it slightly, because SQL Server stops a recursive CTE after 100 levels by default.
Once you have a list of all the dates, you combine it with a list of all locations, so you have all dates at all locations.
Then you remove the records that match records which already exist. In the example below a 'Not Exists' is used, but you can use a variety of approaches to get the desired outcome.
CREATE TABLE #Employee_Location (Employee int, [Location] varchar(100), [date] date)
INSERT INTO #Employee_Location (Employee, [Location], [Date])
VALUES (1111,'London','2019-01-01')
,(1111,'London','2019-01-02')
,(1111,'London','2019-01-03')
,(2222,'Paris','2019-01-01')
,(2222,'Paris','2019-01-02')
,(2222,'Paris','2019-01-03')
,(2222 ,'Paris','2019-01-04')
,(3333,'London','2019-01-05')
,(3333,'Paris','2019-01-06')
,(3333,'Paris','2019-01-07')
,(4444,'London','2019-01-06')
DECLARE @StartDate date = '2019-01-01'
DECLARE @EndDate date = '2019-01-07'
;WITH Dates AS (
SELECT @StartDate as d
UNION ALL
SELECT DateAdd(DAY, 1, d) as d
FROM Dates
WHERE d < @EndDate
)
,Locations AS (
SELECT DISTINCT [Location]
FROM #Employee_Location
)
,AllRecords AS (
SELECT d
,[Location]
FROM Dates
FULL OUTER JOIN Locations
ON 1=1
)
SELECT *
FROM AllRecords
WHERE NOT EXISTS (SELECT 1
FROM #Employee_Location e
WHERE e.[date] = Allrecords.d
AND e.[Location] = Allrecords.[Location])
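The three steps (generate all dates, pair them with all locations, drop the pairs that already exist) can be tried on SQLite through Python's sqlite3 module. This is only a sketch of the same idea under a few assumptions of mine: the date column is renamed work_date, dates are ISO strings, SQLite's date(d, '+1 day') replaces DateAdd, and a CROSS JOIN replaces the FULL OUTER JOIN ... ON 1=1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_location (employee INTEGER, location TEXT, work_date TEXT)")
conn.executemany(
    "INSERT INTO employee_location VALUES (?, ?, ?)",
    [
        (1111, "London", "2019-01-01"), (1111, "London", "2019-01-02"),
        (1111, "London", "2019-01-03"), (2222, "Paris", "2019-01-01"),
        (2222, "Paris", "2019-01-02"), (2222, "Paris", "2019-01-03"),
        (2222, "Paris", "2019-01-04"), (3333, "London", "2019-01-05"),
        (3333, "Paris", "2019-01-06"), (3333, "Paris", "2019-01-07"),
        (4444, "London", "2019-01-06"),
    ],
)

missing = conn.execute(
    """
    WITH RECURSIVE dates(d) AS (
        SELECT :start_date                 -- step 1: every date in the range
        UNION ALL
        SELECT date(d, '+1 day') FROM dates WHERE d < :end_date
    ),
    locations AS (
        SELECT DISTINCT location FROM employee_location
    )
    SELECT l.location, dates.d
    FROM dates
    CROSS JOIN locations l                 -- step 2: every date at every location
    WHERE NOT EXISTS (                     -- step 3: drop the covered pairs
        SELECT 1
        FROM employee_location e
        WHERE e.work_date = dates.d
          AND e.location = l.location
    )
    ORDER BY l.location, dates.d
    """,
    {"start_date": "2019-01-01", "end_date": "2019-01-07"},
).fetchall()

for row in missing:
    print(row)
```

Like the answer above, this ignores Customer_Request and reports every uncovered date in the range, which happens to match the expected output for the sample data.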

Repeating ID based on

I have a very simple requirement but I'm struggling to find a way around this.
I have a very simple query:
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM #tmpAvailability A
LEFT JOIN vwRSBooking B
ON B.Depart = A.StartDate
AND B.ServiceCode = A.SupplierCode
AND B.StatusID IN (2640, 2621)
ORDER BY StartDate;
Made up of 2 tables
#tmpAvailability which consists of the following fields:
SupplierCode
StartDate
Available
vwRSBooking which consists of the following fields
BookingID
DepartDate
Code
Nights
StatusID
Departure and startdate can be joined to link the first day, and the servicecode and suppliercode can be joined to make sure that the availability is linked to the same supplier.
Which produces an output like this:
Code | Dates | Available | Nights | BookingID
TEST | 2018-01-04 | 1 | NULL | NULL
TEST | 2018-01-05 | 1 | NULL | NULL
TEST | 2018-01-06 | 0 | 4 | 123456
TEST | 2018-01-07 | 0 | NULL | NULL
TEST | 2018-01-08 | 0 | NULL | NULL
TEST | 2018-01-09 | 0 | NULL | NULL
TEST | 2018-01-10 | 1 | NULL | NULL
TEST | 2018-01-11 | 1 | NULL | NULL
TEST | 2018-01-12 | 1 | NULL | NULL
TEST | 2018-01-13 | 0 | NULL | 234567
TEST | 2018-01-14 | 0 | NULL | NULL
TEST | 2018-01-15 | 0 | NULL | NULL
What I need is that when a booking runs for 4 nights, the BookingID and the Nights value are repeated across each of those days, for example:
Code | Dates | Available | Nights | BookingID
TEST | 2018-01-04 | 1 | NULL | NULL
TEST | 2018-01-05 | 1 | NULL | NULL
TEST | 2018-01-06 | 0 | 4 | 123456
TEST | 2018-01-07 | 0 | 4 | 123456
TEST | 2018-01-08 | 0 | 4 | 123456
TEST | 2018-01-09 | 0 | 4 | 123456
TEST | 2018-01-10 | 1 | NULL | NULL
TEST | 2018-01-11 | 1 | NULL | NULL
TEST | 2018-01-12 | 1 | NULL | NULL
TEST | 2018-01-13 | 0 | 3 | 234567
TEST | 2018-01-14 | 0 | 3 | 234567
TEST | 2018-01-15 | 0 | 3 | 234567
TEST | 2018-01-16 | 1 | NULL | NULL
If anyone has any ideas on how to solve it would be most appreciated.
Andrew
You could replace your vwRSBooking with another view which uses a CTE to obtain all the dates the booking covers. Then use the view's coverdate for joining to the #tmpAvailability table:
CREATE VIEW vwRSBookingFull
AS
WITH cte ( bookingid, nights, depart, code, coverdate)
AS (SELECT bookingid,
nights,
depart,
code,
depart
FROM vwRSBooking
UNION ALL
SELECT c.bookingid,
c.nights,
c.depart,
c.code,
DATEADD(d, 1, c.coverdate)
FROM cte c
WHERE DATEDIFF(d, c.depart, c.coverdate) < (c.nights - 1))
SELECT c.bookingid,
c.nights,
c.depart,
c.code,
c.coverdate
FROM cte c
GO
You will need a calendar table with all the dates in the date range your dates may fall into. For this example, I build one for January 2018. We can then join onto this table to create the additional rows.
Here is the sample code I used. You can see it at SQL Fiddle.
CREATE TABLE code (
code varchar(max),
dates date,
available int,
nights int,
bookingid int
)
INSERT INTO code VALUES
('TEST','2018-01-04','1',NULL,NULL),
('TEST','2018-01-05','1',NULL,NULL),
('TEST','2018-01-06','0',4,123456),
('TEST','2018-01-07','0',NULL,NULL),
('TEST','2018-01-08','0',NULL,NULL),
('TEST','2018-01-09','0',NULL,NULL),
('TEST','2018-01-10','1',NULL,NULL),
('TEST','2018-01-11','1',NULL,NULL),
('TEST','2018-01-12','1',NULL,NULL),
('TEST','2018-01-13','0',3,234567),
('TEST','2018-01-14','0',NULL,NULL),
('TEST','2018-01-15','0',NULL,NULL)
CREATE TABLE dates (
dates date
)
INSERT INTO dates VALUES
('2018-01-01'),('2018-01-02'),('2018-01-03'),('2018-01-04'),('2018-01-05'),('2018-01-06'),('2018-01-07'),('2018-01-08'),('2018-01-09'),('2018-01-10'),('2018-01-11'),('2018-01-12'),('2018-01-13'),('2018-01-14'),('2018-01-15'),('2018-01-16'),('2018-01-17'),('2018-01-18'),('2018-01-19'),('2018-01-20'),('2018-01-21'),('2018-01-22'),('2018-01-23'),('2018-01-24'),('2018-01-25'),('2018-01-26'),('2018-01-27'),('2018-01-28'),('2018-01-29'),('2018-01-30'),('2018-01-31')
Here is the query based on this dataset:
SELECT
code.code,
dates.dates,
code.available,
code.nights,
code.bookingid
FROM code
LEFT JOIN dates ON
dates.dates >= code.dates
AND dates.dates < DATEADD(DAY,nights,code.dates)
Edit: Here is an example using your initial query as a subquery to join your result set onto the dates table if you want a copy & paste. Still requires creating the dates table.
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM (
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM #tmpAvailability A
LEFT JOIN vwRSBooking B
ON B.Depart = A.StartDate
AND B.ServiceCode = A.SupplierCode
AND B.StatusID IN (2640, 2621)
) code
LEFT JOIN dates ON
dates.dates >= code.StartDate
AND dates.dates < DATEADD(DAY,nights,code.StartDate)
ORDER BY StartDate;
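Since the availability table already has one row per date, another option is to join each availability row to any booking whose stay covers that date, which avoids a separate calendar table. Below is a rough sketch of that range join on SQLite via Python's sqlite3 module. It is an alternative I am suggesting, not either answer's code; the table and column names (availability, booking, depart, nights) are hypothetical simplifications of the ones in the question, and date(depart, '+N days') stands in for DATEADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE availability (supplier_code TEXT, start_date TEXT, available INTEGER)")
conn.execute("CREATE TABLE booking (booking_id INTEGER, depart TEXT, code TEXT, nights INTEGER)")

# One availability row per day from 2018-01-04 to 2018-01-16.
booked_days = {6, 7, 8, 9, 13, 14, 15}
conn.executemany(
    "INSERT INTO availability VALUES (?, ?, ?)",
    [("TEST", f"2018-01-{day:02d}", 0 if day in booked_days else 1) for day in range(4, 17)],
)
conn.executemany(
    "INSERT INTO booking VALUES (?, ?, ?, ?)",
    [(123456, "2018-01-06", "TEST", 4), (234567, "2018-01-13", "TEST", 3)],
)

# A day belongs to a booking when depart <= day < depart + nights.
rows = conn.execute(
    """
    SELECT a.supplier_code, a.start_date, a.available, b.nights, b.booking_id
    FROM availability a
    LEFT JOIN booking b
      ON b.code = a.supplier_code
     AND a.start_date >= b.depart
     AND a.start_date < date(b.depart, '+' || b.nights || ' days')
    ORDER BY a.start_date
    """
).fetchall()

for row in rows:
    print(row)
```

In T-SQL the last join condition would be written with DATEADD(DAY, B.Nights, B.Depart), and the StatusID filter from the original query would go into the same ON clause.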

HOW TO: SQL Server select distinct field based on max value in other field

id | tmpname  | date_used        | tkt_nr |
---|----------|------------------|--------|
1  | template | 04/03/2009 16:10 | 00011  |
2  | templat1 | 04/03/2009 16:11 | 00011  |
5  | templat2 | 04/03/2009 16:12 | 00011  |
3  | diffname | 03/03/2009 15:11 | 00022  |
4  | diffname | 03/03/2009 16:12 | 00022  |
6  | another  | 03/03/2009 16:13 | NULL   |
7  | somethin | 24/12/2008 11:12 | 00023  |
8  | name     | 01/01/2009 12:12 | 00026  |
I would like to have the result:
id | tmpname  | date_used        | tkt_nr |
---|----------|------------------|--------|
5  | templat2 | 04/03/2009 16:12 | 00011  |
4  | diffname | 03/03/2009 16:12 | 00022  |
7  | somethin | 24/12/2008 11:12 | 00023  |
8  | name     | 01/01/2009 12:12 | 00026  |
So what I'm looking for is to have distinct tkt_nr values excluding NULL, based on the max value of datetime.
I have tried several options but always failed
SELECT *
FROM templateFeedback a
JOIN (
SELECT ticket_number, MAX(date_used) date_used
FROM templateFeedback
GROUP BY ticket_number
) b
ON a.ticket_number = b.ticket_number AND a.date_used = b.date_used
I would appreciate any help. Unfortunately I need the code to be compatible with SQL Server.
I've stopped doing things this way since I discovered windowing functions. Too often, there are two records with the same timestamp and I get two records in the result set. Here's the code for T-SQL. It is similar for Oracle. MySQL only supports window functions from version 8.0 onward.
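Select id, tmpname, date_used, tkt_nr
From
(
-- number the rows within each ticket, newest first
-- (tkt_nr as shown in the question's table; templateFeedback from its query)
Select id, tmpname, date_used, tkt_nr,
row_num = Row_Number() Over (Partition by tkt_nr Order by date_used desc)
From templateFeedback
Where tkt_nr Is Not Null
) x
Where row_num=1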
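Here is a small sketch of the same ROW_NUMBER pattern run against the question's sample rows, using Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions). The table name templateFeedback comes from the question's query, the column name tkt_nr from its sample table, and the dates are rewritten as ISO strings so they sort correctly as text; all three are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE templateFeedback (id INTEGER, tmpname TEXT, date_used TEXT, tkt_nr TEXT)"
)
conn.executemany(
    "INSERT INTO templateFeedback VALUES (?, ?, ?, ?)",
    [
        (1, "template", "2009-03-04 16:10", "00011"),
        (2, "templat1", "2009-03-04 16:11", "00011"),
        (5, "templat2", "2009-03-04 16:12", "00011"),
        (3, "diffname", "2009-03-03 15:11", "00022"),
        (4, "diffname", "2009-03-03 16:12", "00022"),
        (6, "another", "2009-03-03 16:13", None),
        (7, "somethin", "2008-12-24 11:12", "00023"),
        (8, "name", "2009-01-01 12:12", "00026"),
    ],
)

# Rank rows per ticket, newest first, then keep only rank 1.
# Rows without a ticket number are filtered out before ranking.
latest = conn.execute(
    """
    SELECT id, tmpname, date_used, tkt_nr
    FROM (
        SELECT id, tmpname, date_used, tkt_nr,
               ROW_NUMBER() OVER (PARTITION BY tkt_nr ORDER BY date_used DESC) AS row_num
        FROM templateFeedback
        WHERE tkt_nr IS NOT NULL
    ) x
    WHERE row_num = 1
    ORDER BY tkt_nr
    """
).fetchall()

for row in latest:
    print(row)
```

Unlike the self-join on MAX(date_used), this returns exactly one row per ticket even when two rows share the same timestamp, although which of the tied rows wins is then arbitrary unless you add a tiebreaker such as id to the ORDER BY.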