SQL - Generate "missing rows" with a select

SQL - Generate "missing rows" with a select - sql

This question relates to SQL 2012 -
Lets say I have 3 rows generated as follows:
Start Position = 10
End Position = 13
Value = 100
Start position = 14
End Position = 14
Value = 250
Start Position = 15
End Position = 25
Value = 300
on 3 rows ..
Is there a way I can force SQL to write the output:
10 - 100
11 - 100
12 - 100
13 - 100
14 - 250
15 - 300
16 - 300
etc and so on and so forth
Been wracking the brains but cant work out an easy way to do it
Thanks a lot
J

You can do this with a recursive CTE or a numbers table. Assuming the gaps are no more than a few hundred or thousand:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master.spt_values
)
select (t.startpos + n.n) as position, value
from t join
n
on t.startpos + n.n <= t.endpos;

No database is complete without its table of numbers! Instructions on how to create one are all over the net, here is an example:
Create a numbers table
I have a numbers table in my database, its called t_numbers, it has a single column "n" with a row for each number starting from 1. It goes up to 999,999 but it takes up very little space on disk. Once you have that, you can write something like this:
set up a bit of data to use first
declare #Rows table
(
StartPos int,
EndPos int,
Value int
)
insert into #Rows values (10, 13, 100), (14, 14, 250), (15, 25, 300)
If you want don't want gaps for nulls use an inner join
select n.n, Value
from t_numbers n
inner join #Rows r on n.n >= StartPos and n.n <= EndPos
If you want the gaps then left join, but limit the return with a where clause
select n.n, Value
from t_numbers n
left join #Rows r on n.n >= StartPos and n.n <= EndPos
where n <= (select MAX(EndPos) from #Rows)

Thankyou people! that did it
In the end I created a table of numbers (thankyou guys)
+ said
CROSS JOIN MyNumbers
WHERE T.Real_Position between T.triangle_start_position and T.triangle_end_position
This gave me the exact resultset I was looking for

Related

Out of range integer: infinity

So I'm trying to work through a problem thats a bit hard to explain and I can't expose any of the data I'm working with but what Im trying to get my head around is the error below when running the query below - I've renamed some of the tables / columns for sensitivity issues but the structure should be the same
"Error from Query Engine - Out of range for integer: Infinity"
WITH accounts AS (
SELECT t.user_id
FROM table_a t
WHERE t.type like '%Something%'
),
CTE AS (
SELECT
st.x_user_id,
ad.name as client_name,
sum(case when st.score_type = 'Agility' then st.score_value else 0 end) as score,
st.obs_date,
ROW_NUMBER() OVER (PARTITION BY st.x_user_id,ad.name ORDER BY st.obs_date) AS rn
FROM client_scores st
LEFT JOIN account_details ad on ad.client_id = st.x_user_id
INNER JOIN accounts on st.x_user_id = accounts.user_id
--WHERE st.x_user_id IN (101011115,101012219)
WHERE st.obs_date >= '2020-05-18'
group by 1,2,4
)
SELECT
c1.x_user_id,
c1.client_name,
c1.score,
c1.obs_date,
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
FROM CTE c1
LEFT JOIN CTE c2 on c1.x_user_id = c2.x_user_id and c1.client_name = c2.client_name and c1.rn = c2.rn +2
I know the query works for sure because when I get rid of the first CTE and hard code 2 id's into a where clause i commented out it returns the data I want. But I also need it to run based on the 1st CTE which has ~5k unique id's
Here is a sample output if i try with 2 id's:
Based on the above number of row returned per id I would expect it should return 5000 * 3 rows = 150000.
What could be causing the out of range for integer error?

This line is likely your problem:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
When the value of c2.score is 0, 1.0/c2.score will be infinity and will not fit into an integer type that you’re trying to cast it into.
The reason it’s working for the two users in your example is that they don’t have a 0 value for c2.score.
You might be able to fix this by changing to:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / NULLIF(c2.score, 0)) * 100, 0) AS INT) AS score_diff

How to Read Data Number by Number

I have a field that contains numbers such as the examples below in #Numbers. Each number within each row in #Numbers relates
to many different values that are contained within the #Area table.
I need to make a relationship from #Numbers to #Area using each number within each row.
CREATE TABLE #Numbers
(
Number int
)
INSERT INTO #Numbers
(
Number
)
SELECT 102 UNION
SELECT 1 UNION
SELECT 2 UNION
select * from #Numbers
CREATE TABLE #Area
(
Number int,
Area varchar(50)
)
INSERT INTO #Area
(
Number,
Area
)
SELECT 0,'Area1' UNION
SELECT 1,'Area2' UNION
SELECT 1,'Area3' UNION
SELECT 1,'Area5' UNION
SELECT 1,'Area8' UNION
SELECT 1,'Area9' UNION
SELECT 2,'Area12' UNION
SELECT 2,'Area43' UNION
SELECT 2,'Area25' UNION
select * from #Area
It would return the following for 102:
102,Area2
102,Area3
102,Area5
102,Area8
102,Area9
102,Area1
102,Area12
102,Area43
102,Area25
For 1 it would return:
1,Area2
1,Area3
1,Area5
1,Area8
1,Area9
For 2 it would return:
2,Area12
2,Area43
2,Area25
Note how the numbers match up to the individual Areas and return the values accordingly.

Well, the OP marked an answer already, which even got votes. Maybe he will not read this, but here is another option using direct simple select, which (according to the EP) seems like using a lot less resources:
SELECT *
FROM #Numbers t1
LEFT JOIN #Area t2 ON CONVERT(VARCHAR(10), t1.Number) like '%' + CONVERT(CHAR(1), t2.Number) + '%'
GO
Note! According to Execution Plan this solution uses only 27% while the selected answer (written by Squirrel) uses 73%, but Execution Plan can be misleading sometimes and you should check IO and TIME statistics as well using the real table structure and real data.

looks like you need to extract individual digit from #Number and then used it to join to #Area
; with tally as
(
select n = 1
union all
select n = n + 1
from tally
where n < 10
)
select n.Number, a.Area
from #Numbers n
cross apply
(
-- here it convert n.Number to string
-- then extract 1 digit
-- and finally convert back to integer
select num = convert(int,
substring(convert(varchar(10), n.Number),
t.n,
1)
)
from tally t
where t.n <= len(convert(varchar(10), n.Number))
) d
inner join #Area a on d.num = a.Number
order by n.Number
or if you prefer to do it in arithmetic and not string
; with Num as
(
select Number, n = 0, Num = Number / power(10, 0) % 10
from #Numbers
union all
select Number, n = n + 1, Num = Number / power(10, n + 1) % 10
from Num
where Number > power(10, n + 1)
)
select n.Number, a.Area
from Num n
inner join #Area a on n.Num = a.Number
order by n.Number

Here is my idea. In theory, it should work.
Have a table (temp or permanent) with the values and it's translation
I.E.
ID value
1 Area1, Area2, Area7, Area8, Area15
2 Area28, Area35
etc
Take each row and put a some special character between each number. Use a function like string_split with that character to turn it into a column of values.
e.g 0123 will then be something like 0|1|2|3 and when you run that through string_split you would get
0
1
2
3
Now join each value to your lookup table and return the Value.
Now you have a row with all the values that you want. Use another function like STUFF FOR XML and put those values back into a single column.
This doesn't sound very efficient.. but this is one way of achieving what you desire..
Another is to do a replace().. but that would be very messy!

Create a third table called n which contains a single column also called n that contains integers from 1 to the maximum number of digits in your number. Make it 1000 if you like, doesn't matter. Then:
select #numbers.number, substring(convert(varchar,#numbers.number),n,1) as chr, Area
from #numbers
join n on n>0 and n <=len(convert(varchar,number))
join #area on #area.number=substring(convert(varchar,#numbers.number),n,1)
The middle column chr is just there to show you what it's doing, and would be removed from the final result.

Find overlapping range in PL/SQL

Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that will return all the overlap start-end inclusive between a, b, and c (but extendable to more). So the expected data will look like the following
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the current valid intervals then computes the new intervals during the iterate phase. However I can't see this approach being practical when working with large datasets. So if some bright mind out there have a query or some innate function that I'm not aware of I would greatly appreciate the help.

You can do this using aggregation and a join. Assuming no internal overlaps for "a" and "b":
select greatest(ta.start, tb.start) as start,
least(ta.end, tb.end) as end
from t ta join
t tb
on ta.start <= tb.end and ta.end >= tb.start and
ta.id = 'a' and tb.id = 'b';

This is a lot uglier and more complex than Gordon's solution, but I think it gives the expected answer better and should extend to work with more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I came across this question which shows a better way to find the start/ends and refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";

Query Split string into rows

I have a table that looks like this:
ID Value
1 1,10
2 7,9
I want my result to look like this:
ID Value
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
2 7
2 8
2 9
I'm after both a range between 2 numbers with , as the delimiter (there can only be one delimiter in the value) and how to split this into rows.

Splitting the comma separated numbers is a small part of this problem. The parsing should be done in the application and the range stored in separate columns. For more than one reason: Storing numbers as strings is a bad idea. Storing two attributes in a single column is a bad idea. And, actually, storing unsanitized user input in the database is also often a bad idea.
In any case, one way to generate the list of numbers is to use a recursive CTE:
with t as (
select t.*, cast(left(value, charindex(',', value) - 1) as int) as first,
cast(substring(value, charindex(',', value) + 1, 100) as int) as last
from table t
),
cte as (
select t.id, t.first as value, t.last
from t
union all
select cte.id, cte.value + 1, cte.last
from cte
where cte.value < cte.last
)
select id, value
from cte
order by id, value;
You may need to fiddle with the value of MAXRECURSION if the ranges are really big.

Any table that a field with multiple values such as this is a problem in terms of design. The only way to deal with these records as it is is to split the values on the delimiter and put them into a temporary table, implement custom splitting code, integrate a CTE as noted, or redesign the original table to put the comma-delimited fields into separate fields, eg
ID LOWLIMIT HILIMIT
1 1 10

similar with Gordon Linoff variant, but has some difference
--create temp table for data sample
DECLARE #Yourdata AS TABLE ( id INT, VALUE VARCHAR(20) )
INSERT #Yourdata
( id, VALUE )
VALUES ( 1, '1,10' ),
( 2, '7,9' )
--final query
;WITH Tally
AS ( SELECT MIN(CONVERT(INT, SUBSTRING(y.VALUE, 1, CHARINDEX(',', y.value) - 1))) AS MinV ,
MAX(CONVERT(INT, SUBSTRING(y.VALUE, CHARINDEX(',', y.value) + 1, 18))) AS MaxV
FROM #yourdata AS y
UNION ALL
SELECT MinV = MinV + 1 , MaxV
FROM Tally
WHERE MinV < Maxv
)
SELECT y.id , t.minV AS value
FROM #yourdata AS y
JOIN tally AS t ON t.MinV BETWEEN CONVERT(INT, SUBSTRING(y.VALUE, 1, CHARINDEX(',', y.value) - 1))
AND CONVERT(INT, SUBSTRING(y.VALUE, CHARINDEX(',', y.value) + 1, 18))
ORDER BY id, minV
OPTION ( MAXRECURSION 999 ) --change it if required
output

Get all missing numbers in the sequence

The numbers are originally alpha numeric so I have a query to parse out the numbers:
My query here gives me a list of numbers:
select distinct cast(SUBSTRING(docket,7,999) as INT) from
[DHI_IL_Stage].[dbo].[Violation] where InsertDataSourceID='40' and
ViolationCounty='Carroll' and SUBSTRING(docket,5,2)='TR' and
LEFT(docket,4)='2011' order by 1
Returns the list of numbers parsed out.
For example, the number will be 2012TR557. After using the query it will be 557.
I need to write a query that will give back the missing numbers in a sequence.

Here is one approach
The following should return one row for each sequence of missing numbers. So, if you series is 3, 5, 6, 9, then it should return:
4 4
7 8
The query is:
with nums as (
select distinct cast(SUBSTRING(docket, 7, 999) as INT) as n,
row_number() over (order by cast(SUBSTRING(docket, 7, 999) as INT)) as seqnum
from [DHI_IL_Stage].[dbo].[Violation]
where InsertDataSourceID = '40' and
ViolationCounty = 'Carroll' and
SUBSTRING(docket,5,2) = 'TR' and
LEFT(docket, 4) = '2011'
)
select (nums_prev.n + 1) as first_missing, nums.n - 1 as last_missing
from nums left outer join
nums nums_prev
on nums.seqnum = nums_prev.seqnum + 1
where nums.n <> nums_prev.n + 1 ;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL - Generate "missing rows" with a select - sql

Thankyou people! that did it In the end I created a table of numbers (thankyou guys) + said CROSS JOIN MyNumbers WHERE T.Real_Position between T.triangle_start_position and T.triangle_end_position This gave me the exact resultset I was looking for

Related

Out of range integer: infinity

How to Read Data Number by Number

Find overlapping range in PL/SQL

Query Split string into rows

Get all missing numbers in the sequence

Categories

Resources