SQL from per day table to date range table transformation - sql

I need to transform the following input table into the output table, which holds one row per range of consecutive days instead of one row per day.
Input:
Asin day is_instock
--------------------
A1 1 0
A1 2 0
A1 3 1
A1 4 1
A1 5 0
A2 3 0
A2 4 0
Output:
asin start_day end_day is_instock
---------------------------------
A1 1 2 0
A1 3 4 1
A1 5 5 0
A2 3 4 0

This is what is referred to as the "gaps and islands" problem. There are a fair number of articles and references you can find if you use that search term.
Solution below:
/*Data setup*/
DROP TABLE IF EXISTS #Stock
CREATE TABLE #Stock ([Asin] Char(2),[day] int,is_instock bit)
INSERT INTO #Stock
VALUES
('A1',1,0)
,('A1',2,0)
,('A1',3,1)
,('A1',4,1)
,('A1',5,0)
,('A2',3,0)
,('A2',4,0);
/*Solution*/
WITH cte_Prev AS (
SELECT *
/*Compare previous day's stock status with current row's status. Every time it changes, return 1*/
,StockStatusChange = CASE WHEN is_instock = LAG(is_instock) OVER (PARTITION BY [Asin] ORDER BY [day]) THEN 0 ELSE 1 END
FROM #Stock
)
,cte_Groups AS (
/*Cumulative sum, so every time the stock status changes, the 1 from StockStatusChange starts the next group*/
SELECT GroupID = SUM(StockStatusChange) OVER (PARTITION BY [Asin] ORDER BY [day])
,*
FROM cte_Prev
)
SELECT [Asin]
,start_day = MIN([day])
,end_day = MAX([day])
,is_instock
FROM cte_Groups
GROUP BY [Asin],GroupID,is_instock

You are looking for an operator described in the temporal data literature, and "best known" as PACK.
This operator was not made part of the SQL standard (SQL:2011) that introduced the temporal features of the literature into the language, so there's extremely little chance you're going to find anything to support you in any SQL product/dialect.
It boils down to this: you'll have to write out the algorithm to do the PACKing yourself.
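As a rough illustration of what "writing out the PACKing yourself" amounts to, here is a minimal procedural sketch in Python rather than SQL. The function name `pack` and the tuple layout are made up for this example. It also assumes that a range is only extended when the next day is exactly one day later, which is stricter than the SQL answer above (that one only looks at status changes); on the sample data both give the same result.

```python
def pack(rows):
    """Collapse per-day rows (asin, day, is_instock) into day ranges.

    Hypothetical helper for illustration only. Assumption: a range is
    extended only when asin and status match AND the day is consecutive.
    """
    packed = []  # each entry: [asin, start_day, end_day, is_instock]
    for asin, day, status in sorted(rows):
        last = packed[-1] if packed else None
        if last and last[0] == asin and last[3] == status and last[2] + 1 == day:
            last[2] = day  # same island: push the end day forward
        else:
            packed.append([asin, day, day, status])  # start a new island
    return [tuple(r) for r in packed]


rows = [
    ("A1", 1, 0), ("A1", 2, 0), ("A1", 3, 1), ("A1", 4, 1),
    ("A1", 5, 0), ("A2", 3, 0), ("A2", 4, 0),
]
ranges = pack(rows)
for r in ranges:
    print(r)
```

A single ordered pass like this is essentially what the LAG plus running-sum query expresses declaratively.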

Related

Selecting top n matches without matching the same rows twice

I am given two tables. Table 1 contains a list of appointment entries and Table 2 contains a list of date ranges, where each date range has an acceptable number of appointments it can be matched with.
I need to match an appointment from table 1 (starting with an appointment with the lowest date) to a date range in table 2. Once we've matched N appointments (where N = Allowed Appointments), we can no longer consider that date range.
Moreover, once we've matched an appointment from table 1 we can no longer consider that appointment for other matches.
Based on the matches I return table 3, with a bit column telling me if there was a match.
I am able to do this successfully using a cursor, but that solution does not scale well to larger datasets. I tried to match the top n groups using row_count(); however, this allows the same appointment to be matched multiple times, which is not what I'm looking for.
Would anyone have suggestions on how to perform this matching using a set-based approach?
Table 1
ApptID ApptDate
-----------------
1      01-01-2022
2      01-04-2022
3      01-05-2022
4      01-20-2022
5      01-21-2022
Table 2
DateRangeId Date From  Date To    Allowed Num Appointments
----------------------------------------------------------
1           01-01-2020 01-05-2020 2
2           01-06-2020 01-11-2020 1
3           01-12-2020 01-18-2020 2
4           01-20-2020 01-25-2020 1
5           01-20-2020 01-26-2020 1
Table 3 (Expected Output):
ApptID ApptDate   Matched DateRangeId
-------------------------------------
1      01-01-2022 1       1
2      01-04-2022 1       1
3      01-05-2022 0       NULL
4      01-20-2022 1       4
5      01-21-2022 1       5
Here's a set-based, iterative solution. Depending on the size of your data it might benefit from indexing on the temp table. It works by filling in appointment slots in order of appointment id and range id. You should be able to adjust that if something more optimal is important.
declare @r int = 0;
create table #T3 (ApptID int, ApptDate date, DateRangeId int, UsedSlot int);
insert into #T3 (ApptID, ApptDate, DateRangeId, UsedSlot)
select ApptID, ApptDate, null, 0
from T1;
set @r = @@rowcount;
/* Keep assigning until a pass matches no further appointments */
while @r > 0
begin
    with ranges as (
        select r.DateRangeId, r.DateFrom, r.DateTo, s.ApptID, r.Allowed,
            coalesce(max(s.UsedSlot) over (partition by r.DateRangeId), 0) as UsedSlots
        from T2 r left outer join #T3 s on s.DateRangeId = r.DateRangeId
    ), appts as (
        select ApptID, ApptDate from #T3 where DateRangeId is null
    ), candidates as (
        select
            a.ApptID, r.DateRangeId, r.Allowed,
            UsedSlots + row_number() over (partition by r.DateRangeId
                                           order by a.ApptID) as CandidateSlot
        from appts a inner join ranges r
            on a.ApptDate between r.DateFrom and r.DateTo
        where r.UsedSlots < r.Allowed
    ), culled as (
        select ApptID, DateRangeId, CandidateSlot,
            row_number() over (partition by ApptID order by DateRangeId)
                as CandidateSequence
        from candidates
        where CandidateSlot <= Allowed
    )
    update #T3
    set DateRangeId = culled.DateRangeId,
        UsedSlot = culled.CandidateSlot
    from #T3 inner join culled on culled.ApptID = #T3.ApptID
    where culled.CandidateSequence = 1;
    set @r = @@rowcount;
end
select ApptID, ApptDate,
case when DateRangeId is null then 0 else 1 end as Matched, DateRangeId
from #T3 order by ApptID;
https://dbfiddle.uk/-5nUzx6Q
It has also occurred to me that you don't really need to store the UsedSlot column. Since the ranges CTE only looks for its maximum, you might as well use a windowed count(*) over the same partition. Keeping the column might still help in making sense of what's going on.
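For checking the output of a set-based version on small data, a plain greedy reference (roughly what the cursor does) can be sketched in Python. Everything here is illustrative: `match_appointments` is a hypothetical name, and the sample rows place both tables in 2022, since the question's tables use different years while its expected output implies they overlap. It takes appointments in date order and gives each one the lowest-numbered range that covers it and still has capacity. Whether the iterative SQL above picks the same range in every tie case is not something this sketch establishes.

```python
from datetime import date


def match_appointments(appts, ranges):
    """Greedy reference matcher (hypothetical helper, not the SQL answer).

    appts:  list of (appt_id, appt_date)
    ranges: list of (range_id, date_from, date_to, allowed)
    Returns (appt_id, appt_date, matched, range_id) per appointment.
    """
    remaining = {rid: allowed for rid, _, _, allowed in ranges}
    result = []
    # Earliest appointment first; ties broken by appointment id.
    for appt_id, appt_date in sorted(appts, key=lambda a: (a[1], a[0])):
        chosen = None
        for rid, d_from, d_to, _ in sorted(ranges):
            if d_from <= appt_date <= d_to and remaining[rid] > 0:
                remaining[rid] -= 1  # consume one slot of this range
                chosen = rid
                break
        result.append((appt_id, appt_date, 1 if chosen is not None else 0, chosen))
    return result


# Assumption: both tables use the same year.
appts = [(1, date(2022, 1, 1)), (2, date(2022, 1, 4)), (3, date(2022, 1, 5)),
         (4, date(2022, 1, 20)), (5, date(2022, 1, 21))]
ranges = [(1, date(2022, 1, 1), date(2022, 1, 5), 2),
          (2, date(2022, 1, 6), date(2022, 1, 11), 1),
          (3, date(2022, 1, 12), date(2022, 1, 18), 2),
          (4, date(2022, 1, 20), date(2022, 1, 25), 1),
          (5, date(2022, 1, 20), date(2022, 1, 26), 1)]
matches = match_appointments(appts, ranges)
for m in matches:
    print(m)
```

On this sample the greedy pass reproduces Table 3: appointment 3 stays unmatched because range 1 is already full.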

Separate columns for product counts using CTEs

Asking a question again as my post did not follow community rules.
I first tried to write a PIVOT statement to get the desired output. However, I am now trying to approach this using CTEs.
Here's the raw data. Let's call it ProductMaster:
PRODUCT_NUM CO_CD PROD_CD    MASTER_ID   Date     ROW_NUM
---------------------------------------------------------
1854        MAWC  STATIONERY 10003493039 1/1/2021 1
1567        PREF  PRINTER    10003493039 2/1/2021 2
2151        MAWC  STATIONERY 10003497290 3/2/2021 1
I require the Count of each product for every Household from this data in separate columns, Printer_CT, Stationery_Ct
Each Master_ID represents a household. And a household can have multiple products.
So each household represents one row in my final output and I need the Product Counts in separate columns. There can be multiple products in each household, 4 or even more. But I have simplified this example.
I'm writing a query with CTEs to give me the output that I want. In my output, each row is grouped by Master ID
ORGL_CO_CD ORGL_PROD_CD STATIONERY_CT PRINTER_CT
------------------------------------------------
MAWC       STATIONERY   1             1
MAWC       STATIONERY   1             0
Here's my query. I'm not sure where to introduce Column 'Stationery_Ct'
WITH CTE AS
(
SELECT
CO_CD, Prod_CD, MASTER_ID,
'' as S1_CT, '' as P1_CT
FROM
ProductMaster
WHERE
ROW_NUM = 1
), CTE_2 AS
(
SELECT Prod_CD, MASTER_ID
FROM ProductMaster
WHERE ROW_NUM = 2
)
SELECT
CO_CD AS ORGL_CO_CD,
c.Prod_CD AS ORGL_PROD_CD,
(CASE WHEN c2.Prod_CD = 'PRINTER' THEN P1_CT = 1 END) PRINTER_CT
FROM
CTE AS c
LEFT OUTER JOIN
CTE_2 AS c2 ON c.MASTER_ID = c2.MASTER_ID
Any pointers would be appreciated.
Thank you!
I guess you can solve that using just GROUP BY and SUM:
-- Test data
DECLARE @ProductMaster AS TABLE (PRODUCT_NUM INT, CO_CD VARCHAR(30), PROD_CD VARCHAR(30), MASTER_ID BIGINT)
INSERT @ProductMaster VALUES (1854, 'MAWC', 'STATIONERY', 10003493039)
INSERT @ProductMaster VALUES (1567, 'PREF', 'PRINTER', 10003493039)
INSERT @ProductMaster VALUES (2151, 'MAWC', 'STATIONERY', 10003497290)
-- One row per household; each SUM counts the rows of one product type
SELECT
    MASTER_ID,
    SUM(CASE PROD_CD WHEN 'STATIONERY' THEN 1 ELSE 0 END) AS STATIONERY_CT,
    SUM(CASE PROD_CD WHEN 'PRINTER' THEN 1 ELSE 0 END) AS PRINTER_CT
FROM @ProductMaster
GROUP BY MASTER_ID
The result is:
MASTER_ID   STATIONERY_CT PRINTER_CT
------------------------------------
10003493039 1             1
10003497290 1             0

SQL Server - Find similarities in column and write them into new column

I have a big table with data like this:
ID Title
-- ------------------------
1 01_SOMESTRING_038
2 01_SOMESTRING K5038
3 01_SOMESTRING-648
4 K-OTHERSTRING_T_73474
5 K-OTHERSTRING_T_ffk
6 ABC
7 DEF
And the task is now to find similarities in that column, and write that found similarity to a new column.
So the desired output would be like this:
ID Title Similarity
-- ------------------------ -----------------
1 01_SOMESTRING_038 01_SOMESTRING
2 01_SOMESTRING K5038 01_SOMESTRING
3 01_SOMESTRING-648 01_SOMESTRING
4 K-OTHERSTRING_T_73474 K-OTHERSTRING_T_
5 K-OTHERSTRING_T_ffk K-OTHERSTRING_T_
6 ABC NULL
7 DEF NULL
How can I achieve that in MS SQL Server 17?
Any help is much appreciated. Thanks!
EDIT: The strings are not only broken by delimiters such as "-", "_".
And to handle competing similarities I would set a minimum length for the similarity, for instance 10.
Try the following, using a recursive CTE to split out the letters, then we can group them up to find the greatest match:
WITH TITLE_EXPAND AS (
SELECT
1 MatchLen
,CAST(SUBSTRING(Title,1,1) as NVARCHAR(255)) MatchString
,Title
,ID
FROM
[SourceDataTable]
UNION ALL
SELECT
MatchLen + 1
,CAST(SUBSTRING(Title,1,MatchLen+1) AS NVARCHAR(255))
,Title
,ID
FROM
TITLE_EXPAND
WHERE
MatchLen < LEN(Title)
)
SELECT DISTINCT
SDT.ID
,SDT.title
,FIRST_VALUE(MatchString) OVER (PARTITION BY SDT.ID ORDER BY SC.MatchLen DESC, SC.MatchCount DESC) Similarity
FROM
[SourceDataTable] SDT
LEFT JOIN
(SELECT
*
,COUNT(*) OVER (PARTITION BY MatchString, MatchLen) MatchCount
FROM
TITLE_EXPAND) SC
ON
SDT.ID = SC.ID
AND
SC.MatchCount > 1
ORDER BY SDT.ID
Where SourceDataTable is your source table. The Similarity value will be the longest matched similar value.
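To make the logic of the query easier to follow, here is the same idea as a small Python sketch: count every prefix of every title, then give each row its longest prefix that occurs in more than one row. The function name `shared_prefixes` and the `min_len` parameter are assumptions added for illustration. `min_len` mirrors the minimum length mentioned in the question's edit, which the SQL above does not apply.

```python
def shared_prefixes(titles, min_len=1):
    """Map each id to its longest title prefix shared with another row.

    titles:  dict of id -> title
    min_len: hypothetical cutoff; shorter shared prefixes are ignored
    Returns dict of id -> prefix, or None when nothing is shared.
    """
    counts = {}
    for title in titles.values():
        for n in range(1, len(title) + 1):
            prefix = title[:n]
            counts[prefix] = counts.get(prefix, 0) + 1

    result = {}
    shortest = max(min_len, 1)
    for key, title in titles.items():
        best = None
        # Walk from the longest prefix down and stop at the first shared one.
        for n in range(len(title), shortest - 1, -1):
            if counts.get(title[:n], 0) > 1:
                best = title[:n]
                break
        result[key] = best
    return result


titles = {
    1: "01_SOMESTRING_038",
    2: "01_SOMESTRING K5038",
    3: "01_SOMESTRING-648",
    4: "K-OTHERSTRING_T_73474",
    5: "K-OTHERSTRING_T_ffk",
    6: "ABC",
    7: "DEF",
}
similarity = shared_prefixes(titles, min_len=10)
for key in sorted(similarity):
    print(key, similarity[key])
```

Like the recursive CTE, this only looks at prefixes, so a similarity that starts in the middle of a title would not be found.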

Selecting and sorting data from a single table

Correction to my question....
I'm trying to select and sort in a query from a single table. The primary key for the table is a combination of a serialized number and a time/date stamp.
The table's name in the database is "A12", the columns are defined as:
Serial2D (PK, char(25), not null)
Completed (PK, datetime, not null)
Result (smallint, null)
MachineID (FK, smallint, null)
PT_1 (float, null)
PT_2 (float, null)
PT_3 (float, null)
PT_4 (float, null)
Since the primary key for the table is a combination of the "Serial2D" and "Completed", there can be multiple "Serial2D" entries with different values in the "Completed" and "Result" columns. (I did not make this database... I have to work with what I got)
I want to write a query that uses the value of the "Result" column (always a "0" or "1") and retrieves only one row for each "Serial2D" value. If the "Result" column has a "1" for a row, I want to choose it over any entries for that Serial2D that have a "0" in the Result column. There should be only one entry in the table that has a Result column entry of "1" for any Serial2D value.
Ex. table
Serial2d Completed Result PT_1 PT_2 PT_3 PT_4
------- ------- ------ ---- ---- ---- ----
A1 1:00AM 0 32.5 20 26 29
A1 1:02AM 0 32.5 10 29 40
A1 1:03AM 1 10 5 4 3
B1 1:04AM 0 29 4 1 9
B1 1:05AM 0 40 3 4 9
C1 1:06AM 1 9 7 6 4
What I would like to be able to retrieve is:
Serial2d Completed Result PT_1 PT_2 PT_3 PT_4
------- ------- ------ ---- ---- ---- ----
A1 1:03AM 1 10 5 4 3
B1 1:05AM 0 40 3 4 9
C1 1:06AM 1 9 7 6 4
I'm new to SQL and I'm still learning ALL the syntax. I'm finding it difficult to search for the correct operators to use since I'm not sure what I need, so please forgive my ignorance. A post with my answer could be staring me right in the face and I wouldn't know it, so please just point me to it.
I appreciate the answers to my previous post, but the answers weren't sufficient for me due to MY lack of information and ineptness with SQL. I know this is probably insanely easy for some, but try to remember when you first started SQL... that's where I'm at.
Since you are using SQL Server, you can use Windowing Functions to get this data.
Using a sub-query:
select *
from
(
select *,
row_number() over(partition by serial2d
order by result desc, completed desc) rn
from a12
) x
where rn = 1
See SQL Fiddle with Demo
Or you can use CTE for this query:
;with cte as
(
select *,
row_number() over(partition by serial2d
order by result desc, completed desc) rn
from a12
)
select *
from cte c
where rn = 1;
See SQL Fiddle With Demo
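The ordering used by row_number() can be read as "prefer Result = 1, then the latest Completed". A minimal Python sketch of that selection rule follows. The name `pick_rows`, the tuple layout, and the zero-padded time strings are assumptions for illustration, and the PT columns are left out.

```python
def pick_rows(rows):
    """Keep one row per serial: highest result first, then latest completed.

    rows: list of (serial2d, completed, result); hypothetical layout.
    """
    best = {}
    for serial, completed, result in rows:
        current = best.get(serial)
        # Same ranking as ORDER BY result DESC, completed DESC.
        if current is None or (result, completed) > (current[2], current[1]):
            best[serial] = (serial, completed, result)
    return [best[s] for s in sorted(best)]


# Times written as zero-padded strings so they compare in time order.
rows = [
    ("A1", "01:00", 0), ("A1", "01:02", 0), ("A1", "01:03", 1),
    ("B1", "01:04", 0), ("B1", "01:05", 0), ("C1", "01:06", 1),
]
picked = pick_rows(rows)
for p in picked:
    print(p)
```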
You can group by Serial to get the MAX of each Time.
SELECT Serial, MAX([Time]) AS [Time]
FROM myTable
GROUP BY Serial
HAVING MAX(Result) >= 0
SELECT
t.Serial,
max_Result,
MAX([time]) AS max_time
FROM
myTable t inner join
(SELECT
Serial,
MAX([Result]) AS max_Result
FROM
myTable
GROUP BY
Serial) m on
t.serial = m.serial and
t.result = m.max_result
group by
t.serial,
max_Result
This can be solved using a correlated sub-query:
SELECT
T.serial,
T.[time],
T.result
FROM tablename T
WHERE
T.result = 1
OR
NOT EXISTS(
SELECT 1
FROM tablename
WHERE
serial = T.serial
AND (
[time] > T.[time]
OR
result = 1
)
)

Best way to interpolate values in SQL

I have a table with rates at certain dates:
Rates
Id | Date | Rate
----+---------------+-------
1 | 01/01/2011 | 4.5
2 | 01/04/2011 | 3.2
3 | 04/06/2011 | 2.4
4 | 30/06/2011 | 5
I want to get the output rate based on a simple linear interpolation.
So if I enter 17/06/2011:
Date Rate
---------- -----
01/01/2011 4.5
01/04/2011 3.2
04/06/2011 2.4
17/06/2011
30/06/2011 5.0
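Since 17/06/2011 lies exactly halfway between 04/06/2011 and 30/06/2011 (13 days from each), the linear interpolation is (5 + 2.4) / 2 = 3.7.
Is there a way to do this with a simple query (SQL Server 2005), or does this kind of thing need to be done programmatically (C#...)?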
Something like this (corrected):
SELECT CASE WHEN next.Date IS NULL THEN prev.Rate
            WHEN prev.Date IS NULL THEN next.Rate
            WHEN next.Date = prev.Date THEN prev.Rate
            ELSE ( DATEDIFF(d, prev.Date, @InputDate) * next.Rate
                 + DATEDIFF(d, @InputDate, next.Date) * prev.Rate
                 ) / DATEDIFF(d, prev.Date, next.Date)
       END AS interpolationRate
FROM
    ( SELECT TOP 1
          Date, Rate
      FROM Rates
      WHERE Date <= @InputDate
      ORDER BY Date DESC
    ) AS prev
  CROSS JOIN
    ( SELECT TOP 1
          Date, Rate
      FROM Rates
      WHERE Date >= @InputDate
      ORDER BY Date ASC
    ) AS next
As @Mark already pointed out, the CROSS JOIN has its limitations. As soon as the target value falls outside the range of defined values, no records will be returned.
Also, the above solution is limited to a single result. For my project I needed an interpolation for a whole list of x values and came up with the following solution. Maybe it is of interest to other readers too.
-- generate some grid data values in table #ddd:
CREATE TABLE #ddd (id int,x float,y float, PRIMARY KEY(id,x));
INSERT INTO #ddd VALUES (1,3,4),(1,4,5),(1,6,3),(1,10,2),
(2,1,4),(2,5,6),(2,6,5),(2,8,2);
SELECT * FROM #ddd;
-- target x-values in table #vals (results are to go into column yy):
CREATE TABLE #vals (xx float PRIMARY KEY,yy float null, itype int);
INSERT INTO #vals (xx) VALUES (1),(3),(4.3),(9),(12);
-- do the actual interpolation
WITH valstyp AS (
SELECT id ii,xx,
CASE WHEN min(x)<xx THEN CASE WHEN max(x)>xx THEN 1 ELSE 2 END ELSE 0 END flag,
min(x) xmi,max(x) xma
FROM #vals INNER JOIN #ddd ON id=1 GROUP BY xx,id
), ipol AS (
SELECT v.*,(b.x-xx)/(b.x-a.x) f,a.y ya,b.y yb
FROM valstyp v
INNER JOIN #ddd a ON a.id=ii AND a.x=(SELECT max(x) FROM #ddd WHERE id=ii
AND (flag=0 AND x=xmi OR flag=1 AND x<xx OR flag=2 AND x<xma))
INNER JOIN #ddd b ON b.id=ii AND b.x=(SELECT min(x) FROM #ddd WHERE id=ii
AND (flag=0 AND x>xmi OR flag=1 AND x>xx OR flag=2 AND x=xma))
)
UPDATE v SET yy=ROUND(f*ya+(1-f)*yb,8),itype=flag FROM #vals v INNER JOIN ipol i ON i.xx=v.xx;
-- list the interpolated results table:
SELECT * FROM #vals
When running the above script you will get the following data grid points in table #ddd
id x y
-- -- -
1 3 4
1 4 5
1 6 3
1 10 2
2 1 4
2 5 6
2 6 5
2 8 2
[[ The table contains grid points for two identities (id=1 and id=2). In my example I referenced only the id=1 group by using ON id=1 in the valstyp CTE. This can be changed to suit your requirements. ]]
and the results table #vals with the interpolated data in column yy:
xx yy itype
--- ---- -----
1 2 0
3 4 0
4.3 4.7 1
9 2.25 1
12 1.5 2
The last column itype indicates the type of interpolation/extrapolation that was used to calculate the value:
0: extrapolation to lower end
1: interpolation within given data range
2: extrapolation to higher end
This working example can be found here.
The catch with CROSS JOIN here is that it won't return any records if either subquery returns no rows (1 * 0 = 0), so the query may break. A better way is to use a FULL OUTER JOIN with an inequality condition (to avoid getting more than one row):
    ( SELECT TOP 1
          Date, Rate
      FROM Rates
      WHERE Date <= @InputDate
      ORDER BY Date DESC
    ) AS prev
  FULL OUTER JOIN
    ( SELECT TOP 1
          Date, Rate
      FROM Rates
      WHERE Date >= @InputDate
      ORDER BY Date ASC
    ) AS next
  ON (prev.Date <> next.Date) [or Rate depending on what is unique]
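For comparison, the arithmetic behind the CASE expression in the first answer, including the fallbacks when the date lies before the first or after the last known rate, can be sketched in Python. The function name `interpolate` is made up for this example, and the dates from the question are read as day/month/year.

```python
from datetime import date


def interpolate(rates, target):
    """Linearly interpolate a rate at `target` (hypothetical helper).

    rates: list of (date, rate) with unique dates.
    Outside the known range the nearest rate is returned, mirroring the
    IS NULL branches of the CASE expression.
    """
    prev = max((r for r in rates if r[0] <= target), key=lambda r: r[0], default=None)
    nxt = min((r for r in rates if r[0] >= target), key=lambda r: r[0], default=None)
    if prev is None and nxt is None:
        return None  # no rates at all
    if nxt is None:
        return prev[1]
    if prev is None:
        return nxt[1]
    if prev[0] == nxt[0]:
        return prev[1]  # exact hit on a known date
    span = (nxt[0] - prev[0]).days
    # Each neighbour is weighted by the distance to the other one.
    return ((target - prev[0]).days * nxt[1]
            + (nxt[0] - target).days * prev[1]) / span


rates = [
    (date(2011, 1, 1), 4.5),
    (date(2011, 4, 1), 3.2),
    (date(2011, 6, 4), 2.4),
    (date(2011, 6, 30), 5.0),
]
rate = interpolate(rates, date(2011, 6, 17))
print(round(rate, 4))
```

For 17/06/2011 both neighbours are 13 days away, so the weights are equal and the result is the plain average from the question.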