SQL select a row X times and insert into new table

I am trying to migrate a bunch of data from an old database to a new one. The old database just stored the number of alarms that occurred in a single row, whereas the new database inserts a new record for each alarm that occurs. Here is a basic version of how it might look: I want to select each row from Table 1 and insert that many rows, one per alarm, into Table 2.
Table 1:
| Alarm ID | Alarm Value |
|--------------|----------------|
| 1 | 3 |
| 2 | 2 |
Should go into the alarm table as the below values.
Table 2:
| Alarm New ID | Value |
|--------------|----------|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
I want to create a SELECT/INSERT script that will do this, so the SELECT statement brings back as many rows as the number in the "Alarm Value" column.

A recursive CTE can be convenient for this:
with cte as (
      select id, alarm, 1 as n
      from t
      union all
      select id, alarm, n + 1
      from cte
      where n < alarm
     )
select row_number() over (order by id) as alarm_id, id as value
from cte
order by 1
option (maxrecursion 0);
Note: If your values do not exceed 100, then you can remove OPTION (MAXRECURSION 0).
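To turn this into the actual select/insert script the question asks for, the same CTE can feed an INSERT directly. A minimal sketch, assuming the real tables are named Table1(AlarmID, AlarmValue) and Table2(AlarmNewID, Value); adjust the names, and drop AlarmNewID from the column list if it is an IDENTITY column:
-- Sketch only: table and column names are assumptions based on the question.
with cte as (
      select AlarmID, AlarmValue, 1 as n
      from dbo.Table1
      union all
      select AlarmID, AlarmValue, n + 1
      from cte
      where n < AlarmValue
     )
insert into dbo.Table2 (AlarmNewID, Value)
select row_number() over (order by AlarmID) as AlarmNewID,
       AlarmID as Value
from cte
option (maxrecursion 0);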

Replicate values out with a CTE.
DECLARE @T TABLE(AlarmID INT, Value INT)
INSERT @T VALUES
(1,3),
(2,2)
;WITH ReplicateAmount AS
(
    SELECT AlarmID, Value FROM @T
    UNION ALL
    SELECT R.AlarmID, Value = (R.Value - 1)
    FROM ReplicateAmount R
    INNER JOIN @T T ON R.AlarmID = T.AlarmID
    WHERE R.Value > 1
)
SELECT
    AlarmID = ROW_NUMBER() OVER( ORDER BY AlarmID),
    Value   = AlarmID   -- the output Value is the original AlarmID
FROM
    ReplicateAmount
ORDER BY
    AlarmID
This answers your question. I would think the query below would be more useful; however, you did not include the usage context.
SELECT
    AlarmID,
    Value
FROM
    ReplicateAmount
ORDER BY
    AlarmID

Rather than using an rCTE, which is recursive (as the name suggests) and will fail once it needs more than 100 recursions unless you override MAXRECURSION, you can use a Tally table, which tends to be far faster as well:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3)
SELECT ROW_NUMBER() OVER (ORDER BY V.AlarmID,T.I) AS AlarmNewID,
V.AlarmID
FROM (VALUES(1,3),(2,2))V(AlarmID,AlarmValue)
JOIN Tally T ON V.AlarmValue >= T.I;
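As with the recursive version above, this can be pointed at real tables and wrapped in an INSERT. A sketch, again assuming the names Table1(AlarmID, AlarmValue) and Table2(AlarmNewID, Value):
-- Sketch only: source and target names are assumptions based on the question.
WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3)   -- 1,000 numbers; add N N4 if AlarmValue can exceed that
INSERT INTO dbo.Table2 (AlarmNewID, Value)
SELECT ROW_NUMBER() OVER (ORDER BY T1.AlarmID, T.I) AS AlarmNewID,
       T1.AlarmID AS Value
FROM dbo.Table1 T1
JOIN Tally T ON T1.AlarmValue >= T.I;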

How to remove duplicates while sorting by unique datetime

I am working with a fairly bad data source. The column that has the information I need is a delimited varchar(max). However, the data can be duplicated across multiple rows, so I am trying to remove these duplicates.
This can be done by trimming the column I am interested in, because when a repeat occurs the "ID" gets re-appended to the end of the column. Then I take a DISTINCT of that and concatenate the results; it isn't pretty.
Example data and the query I currently use are in this SQL Fiddle:
Data Table
| id | callID | callDateTime | history |
|----|--------|-----------------------------|-------------------------------------|
| 1 | 1 | 2021-01-01 10:00:00.0000000 | Amount: 10, Ref:123, ID:123 |
| 2 | 1 | 2021-01-01 10:01:00.0000000 | Amount: 10, Ref:123, ID:123, ID:123 |
| 3 | 2 | 2021-01-01 11:00:00.0000000 | Amount:12.44, Ref:SIS, ID:124 |
| 4 | 2 | 2021-01-01 11:02:00.0000000 | Amount:11.22, Ref:Dad, ID:124 |
| 5 | 2 | 2021-01-01 11:01:00.0000000 | Amount:11.22, Ref:Mum, ID:124 |
| 6 | 3 | 2021-01-01 12:00:00.0000000 | Amount:11, ID:125 |
Query
select CallID, Concat([1], ',', [2], ',', [3])
from
(
    select CallID, historyEdit,
           ROW_NUMBER() over (partition by callID order by callID) as rowNum
    from
    (
        select distinct callID,
               substring(history, 0, charindex(', ID:', history)) historyEdit
        from test
    ) a
) b
PIVOT (max(historyEdit) for rowNum IN ([1],[2],[3])) piv
Result
| CallID | |
|--------|-------------------------------------------------------------------|
| 1 | Amount: 10, Ref:123,, |
| 2 | Amount:11.22, Ref:Dad,Amount:11.22, Ref:Mum,Amount:12.44, Ref:SIS |
| 3 | Amount:11,, |
The issue is that I need to ensure the concatenation happens in the order the events occurred. In the result above you can see that CallID 2 is in the wrong order, with later entries appearing before earlier ones. I did try sorting the base table by callDateTime first and then running the query; however, it seems to yield somewhat random results. Sometimes it will be in the correct order, other times it won't be. I assume this is because I am not specifying any ORDER BY clause in the query.
Including the callDateTime in the results then stops the DISTINCT from returning unique data rows, because the callDateTime is still unique to each duplicated row of data.
I am using SQL Server v12.
Desired Result
| CallID | |
|--------|-------------------------------------------------------------------|
| 1 | Amount: 10, Ref:123,, |
| 2 | Amount:12.44, Ref:SIS,Amount:11.22, Ref:Mum,Amount:11.22, Ref:Dad |
| 3 | Amount:11,, |
If I understand correctly, you want to break apart the history and recombine -- without duplicates -- for each callid. If so, you can use string_split() and string_agg():
select callid, string_agg(value, ', ')
from (select distinct t.callid, s.value
from test t cross apply
(select trim(s.value) as value
from string_split(t.history, ',') s
) s
) st
group by callid;
Here is a db<>fiddle.
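Note that string_agg() requires SQL Server 2017+ and string_split() requires 2016+, while the question mentions v12 (2014). On 2014 the same dedupe-then-concatenate idea can be expressed with the classic FOR XML PATH trick. A rough sketch that reuses the question's historyEdit expression and orders each group by the earliest callDateTime, assuming the table is named test as in the fiddle:
select d.callID,
       stuff((select ',' + x.historyEdit
              from (select callID,
                           substring(history, 0, charindex(', ID:', history)) as historyEdit,
                           min(callDateTime) as firstSeen
                    from test
                    group by callID, substring(history, 0, charindex(', ID:', history))
                   ) x
              where x.callID = d.callID
              order by x.firstSeen
              for xml path(''), type).value('.', 'nvarchar(max)'), 1, 1, '') as history
from (select distinct callID from test) d;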
You could use a TOP clause inside the SELECT to order the records before pivoting the results, if you are sure of the number of records, like below:
select callID, historyEdit
from
(
select distinct top 100000 callID, callDateTime,
substring(history, 0, charindex(', ID:',history)) historyEdit
from test
order by callDateTime
)t
Please see the results here.
One way to calculate the strings could be to CROSS APPLY using an ordinal splitter to separate the 'history' column into components which can be enumerated. The result is very close to what was provided in the question. Maybe the provided expected results aren't accurately representative? Something like this
Ordinal splitter described here
CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
                SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
              ),                                  --10E+1 or 10 rows
     E2(N) AS (SELECT 1 FROM E1 a, E1 b),         --10E+2 or 100 rows
     E4(N) AS (SELECT 1 FROM E2 a, E2 b),         --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
                -- for both a performance gain and prevention of accidental "overruns"
                SELECT 0 UNION ALL
                SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
               ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT t.N+1
                 FROM cteTally t
                 WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
       Item       = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s
;
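As a quick sanity check, this is roughly what the splitter returns for a single history value (assuming the function above has been created):
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8K_LEAD('Amount: 10, Ref:123, ID:123', ',');
-- ItemNumber  Item
-- 1           Amount: 10
-- 2            Ref:123    (leading space kept; the main query trims it)
-- 3            ID:123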
Query:
with
unq_cte as (
select distinct callID
from #test),
exp_cte as (
select callID, callDateTime , dl.*,
row_number() over (partition by callID, dl.Item order by callDateTime) as rn
from #test t
cross apply dbo.DelimitedSplit8K_LEAD(t.history, ',') dl)
select t.callID,
stuff((select ',' + case when rn>1 then '' else Item end
from exp_cte tt
where t.callID = tt.callID
and ltrim(rtrim(Item)) not like 'ID%'
order by tt.callDateTime, tt.ItemNumber for xml path('')), 1, 1, '') [value1]
from unq_cte t
group by t.callID;
callID value1
1 Amount: 10, Ref:123,,
2 Amount:12.44, Ref:SIS,Amount:11.22, Ref:Mum,, Ref:Dad
3 Amount:11

SQL SELECT Convert Min/Max into Separate Rows

I have a table that has a min and max value that I'd like create a row for each valid number in a SELECT statement.
Original table:
| Foobar_ID | Min_Period | Max_Period |
|-----------|------------|------------|
| 1 | 0 | 2 |
| 2 | 1 | 4 |
I'd like to turn that into:
| Foobar_ID | Period_Num |
|-----------|------------|
| 1 | 0 |
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
The SELECT results need to come out as one result-set, so I'm not sure if a WHILE loop would work in my case.
If you expect just a handful of rows per foobar, then this is a good opportunity to learn about recursive CTEs:
with cte as (
      select foobar_id, min_period as period_num, max_period
      from original t
      union all
      select foobar_id, period_num + 1, max_period
      from cte
      where period_num < max_period
     )
select foobar_id, period_num
from cte
order by foobar_id, period_num;
You can extend this to any number of periods by adding OPTION (MAXRECURSION 0) to the end of the final SELECT.
One method would be to use a Tally table; there are plenty of examples out there, but I'm going to create a very small one in this example. Then you can JOIN onto that and return your result set.
--Create the Tally Table
CREATE TABLE #Tally (I int);
WITH ints AS(
SELECT 0 AS i
UNION ALL
SELECT i + 1
FROM ints
WHERE i + 1 <= 10)
--And in the numbers go!
INSERT INTO #Tally
SELECT i
FROM ints;
GO
--Create the sample table
CREATE TABLE #Sample (ID int IDENTITY(1,1),
MinP int,
MaxP int);
--Sample data
INSERT INTO #Sample (Minp, MaxP)
VALUES (0,2),
(1,4);
GO
--And the solution
SELECT S.ID,
T.I AS P
FROM #Sample S
JOIN #Tally T ON T.I BETWEEN S.MinP AND S.MaxP
ORDER BY S.ID, T.I;
GO
--Clean up
DROP TABLE #Sample;
DROP TABLE #Tally;
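If you would rather not create or populate anything at all, a ready-made numbers source also works for small ranges. A sketch using the undocumented master..spt_values table (its type = 'P' rows are simply the integers 0 to 2047), with the question's table and column names assumed:
SELECT f.Foobar_ID,
       v.number AS Period_Num
FROM Foobar f                      -- assumed table name from the question
JOIN master..spt_values v
  ON v.type = 'P'
 AND v.number BETWEEN f.Min_Period AND f.Max_Period
ORDER BY f.Foobar_ID, v.number;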
Depending on the size of the data and the range of the period, the easiest way to do this is to use a dynamic number fact table, as follows:
WITH rn AS (SELECT ROW_NUMBER() OVER (ORDER BY object_id) -1 as period_num FROM sys.objects)
SELECT f.foobar_id, rn.period_num
FROM foobar f
INNER JOIN rn ON rn.period_num BETWEEN f.min_period AND f.max_period
However, if you're working with a larger volume of data, it will be worth creating a number fact table with an index. You can even use a table variable for this:
-- Declare the number fact table
DECLARE @rn TABLE (period_num INT IDENTITY(0, 1) PRIMARY KEY, dummy int)
-- Populate the fact table so that all periods are covered
WHILE (SELECT COUNT(1) FROM @rn) < (SELECT MAX(max_period) FROM foobar)
    INSERT @rn SELECT 1 FROM sys.objects
-- Select using a join to the fact table
SELECT f.foobar_id, rn.period_num
FROM foobar f
INNER JOIN @rn rn ON rn.period_num BETWEEN f.min_period AND f.max_period
Just create a function, as in the sample below, and use it:
CREATE FUNCTION [dbo].[Ufn_GetMInToMaxVal] (@Min_Period INT, @Max_Period INT)
RETURNS @OutTable TABLE
(
    DATA INT
)
AS
BEGIN
    ;WITH cte
    AS
    (
        SELECT @Min_Period AS Min_Period
        UNION ALL
        SELECT Min_Period + 1
        FROM cte
        WHERE Min_Period < @Max_Period
    )
    INSERT INTO @OutTable
    SELECT * FROM cte
    RETURN
END
Get the result by executing this SQL statement:
DECLARE @Temp AS TABLE(
    Foobar_ID INT,
    Min_Period INT,
    Max_Period INT
)
INSERT INTO @Temp
SELECT 1, 0, 2 UNION ALL
SELECT 2, 1, 4

SELECT Foobar_ID,
       DATA
FROM @Temp
CROSS APPLY
    [dbo].[Ufn_GetMInToMaxVal] (Min_Period, Max_Period)
Result
Foobar_ID DATA
----------------
1 0
1 1
1 2
2 1
2 2
2 3
2 4

How to order an already ordered subquery

Creating this table:
CREATE TABLE #Test (id int, name char(10), list int, priority int)
INSERT INTO #Test VALUES (1, 'One', 1, 1)
INSERT INTO #Test VALUES (2, 'Two', 2, 1)
INSERT INTO #Test VALUES (3, 'Three', 3, 2)
INSERT INTO #Test VALUES (4, 'Four', 4, 1)
INSERT INTO #Test VALUES (5, 'THREE', 3, 1)
and ordering it by list and priority:
SELECT * FROM #Test ORDER BY list, priority
1 | One | 1 | 1
2 | Two | 2 | 1
5 | THREE | 3 | 1
3 | Three | 3 | 2
4 | Four | 4 | 1
However I want to step through rows one by one selecting the top one for each list ordered by priority, and start over when I get to the end.
For example with this simpler table:
1 | One | 1 | 1
2 | Two | 2 | 1
3 | Three | 3 | 1
4 | Four | 4 | 1
and this query:
SELECT TOP 1 * FROM #Test ORDER BY (CASE WHEN list > @PreviousList THEN 1 ELSE 2 END)
If @PreviousList is the list for the previous row I got, then this will select the next row and gracefully jump back to the top when I have selected the last row.
But there are rows that will have the same list only ordered by priority - like my first example:
1 | One | 1 | 1
2 | Two | 2 | 1
5 | THREE | 3 | 1
3 | Three | 3 | 2
4 | Four | 4 | 1
Here id=3 should be skipped because id=5 has the same list value and a better priority. The only way I can think of doing this is to first order the entire table by list and priority, and then run the ORDER BY that steps through the rows one by one, like this:
SELECT TOP 1 * FROM (
    SELECT * FROM #Test ORDER BY list, priority
) ORDER BY (CASE WHEN list > @PreviousList THEN 1 ELSE 2 END)
But of course I cannot order by an already ordered subquery and get the error:
The ORDER BY clause is invalid in views, inline functions, derived tables,
subqueries, and common table expressions, unless TOP or FOR XML is also
specified.
Is there any way I can get past this problem, or get it down to a single query and ORDER BY?
Another possible solution is to use a subquery to select the min priority grouped by list and join it back to the table for the rest of the details
SELECT T2.*
FROM (SELECT MIN(priority) as priority, list
FROM #Test
GROUP BY list) AS T1
INNER JOIN #Test T2 ON T1.list = T2.list AND T1.priority = T2.priority
ORDER BY T1.list, T1.priority
I want to step through rows one by one selecting the top one for each
list ordered by priority, and start over when I get to the end.
You can use the built in ROW_NUMBER function that is designed for these scenarios with OVER(PARTITION BY name ORDER BY priority) to do this directly:
WITH CTE
AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY name ORDER BY priority) AS RN
FROM #Test
)
SELECT *
FROM CTE
WHERE RN = 1;
Live DEMO
The ranking number rn generated by ROW_NUMBER() OVER(PARTITION BY name ORDER BY priority) ranks each group of rows that has the same name, ordered by priority. When you then filter with WHERE rn = 1, it removes all the duplicates with the same name and leaves only the row with the best priority.
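If the grouping should be per list rather than per name (which is how the question words it), the same pattern applies with a different PARTITION BY column. A sketch:
WITH CTE AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY list ORDER BY priority) AS RN
    FROM #Test
)
SELECT *
FROM CTE
WHERE RN = 1
ORDER BY list;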
SELECT TOP 1 * FROM (
    SELECT * FROM #Test
) AS t ORDER BY (CASE WHEN list > @PreviousList THEN 1 ELSE 2 END)
Try this; ORDER BY is not allowed inside a derived table, so leave the ordering to the outer query.
Perhaps I am missing the requirement that makes this harder than I realize, but what about a nice simple join to select the highest priority for each list? To scale, performance would require an index on list.
select t.*
, ttop.id as firstid
from #test t
JOIN #test ttop on ttop.id = (SELECT TOP 1 ID
FROM #TEST tbest
WHERE t.list = tbest.list order by priority)
and ttop.id = t.id -- this does the trick!

SELECT First Group

Problem Definition
I have an SQL query that looks like:
SELECT *
FROM table
WHERE criteria = 1
ORDER BY group;
Result
I get:
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
B | 2 | 1
B | 3 | 1
Expected Result
However, I would like to limit the results to only the first group (in this instance, A), i.e.,
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
What I've tried
Group By
SELECT *
FROM table
WHERE criteria = 1
GROUP BY group;
I can aggregate the groups using a GROUP BY clause, but that would give me:
group | value
-------------
A | 0
B | 2
or some aggregate function of EACH group. However, I don't want to aggregate the rows!
Subquery
I can also specify the group by subquery:
SELECT *
FROM table
WHERE criteria = 1 AND
group = (
SELECT group
FROM table
WHERE criteria = 1
ORDER BY group ASC
LIMIT 1
);
This works, but as always, subqueries are messy. Particularly, this one requires specifying my WHERE clause for criteria twice. Surely there must be a cleaner way to do this.
You can try the following query:
SELECT *
FROM table
WHERE criteria = 1
AND group = (SELECT MIN(group) FROM table)
ORDER BY value;
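If the "first group" should be determined only among rows matching the criteria, the subquery presumably needs the same filter. A sketch of that variation:
SELECT *
FROM table
WHERE criteria = 1
  AND group = (SELECT MIN(group) FROM table WHERE criteria = 1)
ORDER BY value;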
If your database supports the WITH clause, try this. It's similar to using a subquery, but you only need to specify the criteria input once. It's also easier to understand what's going on.
with main_query as (
    select *
    from table
    where criteria = 1
    order by group, value
),
min_group as (
    select min(group) as group from main_query
)
select *
from main_query
where group in (select group from min_group);
-- this where clause should be fast since there will only be 1 record in min_group
Use DENSE_RANK()
DECLARE @yourTbl AS TABLE (
    [group] NVARCHAR(50),
    value INT,
    criteria INT
)
INSERT INTO @yourTbl VALUES ( 'A', 0, 1 )
INSERT INTO @yourTbl VALUES ( 'A', 1, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 2, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 3, 1 )
;WITH cte AS
(
    SELECT i.* ,
           DENSE_RANK() OVER (ORDER BY i.[group]) AS gn
    FROM @yourTbl AS i
    WHERE i.criteria = 1
)
SELECT *
FROM cte
WHERE gn = 1
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1

Group rows into sequences using a sliding window on a DateTime column

I have a table that stores timestamped events. I want to group the events into 'sequences' by using a 5-minute sliding window on the timestamp column, and write the 'sequence ID' (any ID that can distinguish sequences) and 'order in sequence' into another table.
Input - event table:
+----+-------+-----------+
| Id | Name | Timestamp |
+----+-------+-----------+
| 1 | test | 00:00:00 |
| 2 | test | 00:06:00 |
| 3 | test | 00:10:00 |
| 4 | test | 00:14:00 |
+----+-------+-----------+
Desired output - sequence table. Here SeqId is the ID of the starting event, but it doesn't have to be, just something to uniquely identify a sequence.
+---------+-------+----------+
| EventId | SeqId | SeqOrder |
+---------+-------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 3 |
+---------+-------+----------+
What would be the best way to do it? This is MSSQL 2008, I can use SSAS and SSIS if they make things easier.
CREATE TABLE #Input (Id INT, Name VARCHAR(20), Time_stamp TIME)
INSERT INTO #Input
VALUES
( 1 ,'test','00:00:00' ),
( 2 ,'test','00:06:00' ),
( 3 ,'test','00:10:00' ),
( 4 ,'test','00:14:00' )
SELECT * FROM #Input;
WITH cte AS -- add a sequential number
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY Id) AS sort
FROM #Input
), cte2 as -- find the Id's with a difference of more than 5min
(
SELECT cte.*,
CASE WHEN DATEDIFF(MI, cte_1.Time_stamp,cte.Time_stamp) < 5 THEN 0 ELSE 1 END as GrpType
FROM cte
LEFT OUTER JOIN
cte as cte_1 on cte.sort =cte_1.sort +1
), cte3 as -- assign a SeqId
(
SELECT GrpType, Time_Stamp,ROW_NUMBER() OVER(ORDER BY Time_stamp) SeqId
FROM cte2
WHERE GrpType = 1
), cte4 as -- find the Time_Stamp range per SeqId
(
SELECT cte3.*,cte_2.Time_stamp as TS_to
FROM cte3
LEFT OUTER JOIN
cte3 as cte_2 on cte3.SeqId =cte_2.SeqId -1
)
-- final query
SELECT
t.Id,
cte4.SeqId,
ROW_NUMBER() OVER(PARTITION BY cte4.SeqId ORDER BY t.Time_stamp) AS SeqOrder
FROM cte4 INNER JOIN #Input t ON t.Time_stamp>=cte4.Time_stamp AND (t.Time_stamp <cte4.TS_to OR cte4.TS_to IS NULL);
This code is slightly more complex, but it returns the expected output (which Gordon Linoff's solution doesn't...) and it's even slightly faster.
You seem to want things grouped together when they are less than five minutes apart. You can assign the groups by getting the previous time stamp and marking the beginning of a group. You then need to do a cumulative sum to get the group id:
with e as (
select e.*,
(case when datediff(minute, prev_timestamp, timestamp) < 5 then 1 else 0 end) as flag
from (select e.*,
(select top 1 e2.timestamp
from events e2
where e2.timestamp < e.timestamp
order by e2.timestamp desc
) as prev_timestamp
from events e
) e
)
select e.Id as eventId, e.seqId,
       row_number() over (partition by seqId order by timestamp) as seqOrder
from (select e.*, (select sum(flag) from e e2 where e2.timestamp <= e.timestamp) as seqId
from e
) e;
By the way, this logic is easier to express in SQL Server 2012+ because the window functions are more powerful.
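For reference, a sketch of that 2012+ version using LAG plus a windowed SUM, assuming an events table with Id and Timestamp columns as in the question (a new sequence starts whenever the gap from the previous event is 5 minutes or more):
with flagged as (
    select e.Id, e.[Timestamp],
           case when datediff(minute,
                              lag(e.[Timestamp]) over (order by e.[Timestamp]),
                              e.[Timestamp]) < 5
                then 0 else 1 end as is_start   -- first row has no previous event, so it starts a sequence
    from events e
),
grouped as (
    select Id, [Timestamp],
           sum(is_start) over (order by [Timestamp] rows unbounded preceding) as SeqId
    from flagged
)
select Id as EventId,
       SeqId,
       row_number() over (partition by SeqId order by [Timestamp]) as SeqOrder
from grouped;
For the sample data this yields EventId 1 in its own sequence and EventIds 2-4 in a second sequence, matching the desired output.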