How to select unoccupied ranges between two numbers - sql

Consider the table below:
Id | Title | Start | End
-----+--------------+---------+-----
1 | Group A | 100 | 200
-----+--------------+---------+-----
2 | Group B | 350 | 500
-----+--------------+---------+-----
3 | Group C | 600 | 800
I want to get the unoccupied ranges between 100 and 999.
My required final result would be:
Id | Start | End
-----+----------+-----
1 | 201 | 349
-----+----------+-----
2 | 501 | 599
-----+----------+-----
3 | 801 | 999

You can use the lead() window function to do so.
Select Id, [End]+1 as Start, coalesce((lead(start)over(order by id) -1),999) [End]
from mytable
Since lead() returns null for the last row, I use coalesce() to replace it with 999.
Schema:
create table mytable( Id int, Title varchar(50),[Start] int , [End] int);
insert into mytable values(1, 'Group A', 100, 200);
insert into mytable values(2, 'Group B', 350, 500);
insert into mytable values(3, 'Group C', 600, 800);
Query:
Select Id, [End]+1 as [Start], coalesce((lead([start])over(order by id) -1),999) [End]
from mytable
Output:
Id | Start | End
-----+----------+-----
1 | 201 | 349
2 | 501 | 599
3 | 801 | 999
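To sanity-check the lead()/coalesce() logic outside the database, here is a minimal Python sketch of the same idea. The function name gaps_via_lead and the upper parameter are made up for illustration, and it assumes, like the query, that the rows do not overlap and that ordering by id matches ordering by start.

```python
def gaps_via_lead(rows, upper=999):
    """rows: (id, start, end) tuples; mimics lead(start) over (order by id)."""
    ordered = sorted(rows, key=lambda r: r[0])
    result = []
    for i, (row_id, _start, end) in enumerate(ordered):
        # lead(): the next row's start, or None on the last row
        next_start = ordered[i + 1][1] if i + 1 < len(ordered) else None
        # coalesce(next_start - 1, upper)
        gap_end = next_start - 1 if next_start is not None else upper
        result.append((row_id, end + 1, gap_end))
    return result


rows = [(1, 100, 200), (2, 350, 500), (3, 600, 800)]
for gap in gaps_via_lead(rows):
    print(gap)
```

With the sample rows this prints the three gaps from the question: (1, 201, 349), (2, 501, 599), (3, 801, 999).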

This is a tricky problem. If I make the following assumptions:
All the values are between 100 and 999.
The values have no overlaps.
Then you can handle this with lead() and union all:
select null, 100, min(starti) - 1
from t
having min(starti) > 100
union all
select title, endi + 1, next_starti - 1
from (select lead(starti, 1, 1000) over (order by starti) as next_starti, t.*
from t
) t
where next_starti >= endi + 1;
Note that the first subquery is for a condition not in your sample data, but where the first value starts after 100.
For the more general solution where you could have overlaps, the simplest method might be to generate all possible values, remove the ones that exist, and then combine the adjacent values:
with n as (
select 100 as n
union all
select n + 1
from n
where n < 999
)
select min(n), max(n)
from (select n.*, row_number() over (order by n) as seqnum
from n
where not exists (select 1 from t where n.n between t.starti and t.endi)
) tn
group by (n - seqnum)
order by min(n)
option (maxrecursion 0);
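The "generate, remove, then group by n - seqnum" idea can be checked with a short Python sketch. This is only an illustration of the logic, not a replacement for the query; free_ranges and its low/high parameters are hypothetical names.

```python
from itertools import groupby


def free_ranges(occupied, low=100, high=999):
    """occupied: (start, end) pairs; overlaps are allowed."""
    # Generate every candidate value and drop the ones covered by any range.
    free = [n for n in range(low, high + 1)
            if not any(s <= n <= e for s, e in occupied)]
    ranges = []
    # n - seqnum is constant within each run of consecutive numbers,
    # which is what "group by (n - seqnum)" relies on.
    for _, run in groupby(enumerate(free, start=1), key=lambda p: p[1] - p[0]):
        values = [n for _, n in run]
        ranges.append((values[0], values[-1]))
    return ranges


print(free_ranges([(100, 200), (350, 500), (600, 800)]))
print(free_ranges([(100, 200), (150, 400), (600, 800)]))  # overlapping input
```

The second call shows why this version tolerates overlaps: the two overlapping ranges simply remove the same numbers twice.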

Related

SQL: Repeat patterns between date range

DECLARE
@startDate date = '2020-07-03',
@endDate date = '2020-07-06'
I have a table as below
---------------------------------------------------------
|EmployeeID | EmpName |Pattern | Frequency |
---------------------------------------------------------
| 11 | X | 1,2,3 | 1 |
| 12 | Y | 4,5 | 1 |
| 13 | Y | 1,2 | 3 |
| 14 | Z | 1,2 | 2 |
---------------------------------------------------------
I want to generate dates between the given date range.
The desired result table is as follows:
--------------------------------
| EmpId | Dates | Pattern |
--------------------------------
| 11 |2020-07-03 | 1 |
| 11 |2020-07-04 | 2 |
| 11 |2020-07-05 | 3 |
| 11 |2020-07-06 | 1 |
| 12 |2020-07-03 | 4 |
| 12 |2020-07-04 | 5 |
| 12 |2020-07-05 | 4 |
| 12 |2020-07-06 | 5 |
| 13 |2020-07-03 | 1 |
| 13 |2020-07-04 | 1 |
| 13 |2020-07-05 | 1 |
| 13 |2020-07-06 | 2 |
| 14 |2020-07-03 | 1 |
| 14 |2020-07-04 | 1 |
| 14 |2020-07-05 | 2 |
| 14 |2020-07-06 | 2 |
Generate the dates in the given date range for each employee, and repeat each employee's pattern according to its frequency (in days).
That is, the pattern value changes every Frequency days.
What I have achieved:
I am able to generate the records for each employee between the given dates.
What I am not able to get:
I am not able to repeat the pattern based on the frequency for each employee across the date range.
Everything else works; I only need help repeating the pattern based on the frequency.
Note:
The data is stored this way, and I cannot change the existing schema.
I came up with this. It is basically a splitter, a tally table, and some logic.
Each split pattern value is joined to Frequency tally rows, so it appears the right number of times, and the rows are ordered by their position in the pattern string.
Everything is then joined together, and the pattern is repeated using modulo.
DECLARE @t TABLE( EmployeeID INT
, EmpName VARCHAR(20)
, Pattern VARCHAR(255)
, Frequency INT )
DECLARE @startDate DATE = '2020-07-03'
DECLARE @endDate DATE = '2020-07-09'
INSERT INTO @t
VALUES (11, 'X', '1,2,3', 1),
(12, 'Y', '4,5', 1),
(13, 'Y', '1,2', 3),
(14, 'Z', '1,2', 2)
DECLARE @delimiter CHAR(1) = ',';
WITH split(Txt
, i
, elem
, EmployeeID)
AS (SELECT STUFF(Pattern, 1, CHARINDEX(@delimiter, Pattern+@delimiter+'~'), '')
, 1
, CAST(LEFT(Pattern, CHARINDEX(@delimiter, Pattern+@delimiter+'~')-1) AS VARCHAR(MAX))
, EmployeeID
FROM @t
UNION ALL
SELECT STUFF(Txt, 1, CHARINDEX(@delimiter, Txt+@delimiter+'~'), '')
, i + 1
, CAST(LEFT(Txt, CHARINDEX(@delimiter, Txt+@delimiter+'~')-1) AS VARCHAR(MAX))
, EmployeeID
FROM split
WHERE Txt > ''),
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 AS a, E1 AS b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 AS a, E2 AS b), --10E+4 or 10,000 rows
E8(N) AS (SELECT 1 FROM E4 AS a , E4 AS b), --10E+8 or 100,000,000 rows
PatternXFrequency(EmployeeID
, Sort
, elem)
AS (SELECT split.EmployeeID
, ROW_NUMBER() OVER(PARTITION BY split.EmployeeID ORDER BY i) - 1
, elem
FROM split
INNER JOIN @t AS t ON t.EmployeeID = split.EmployeeID
CROSS APPLY (SELECT TOP (t.Frequency) 1
FROM E8
) AS Freq(Dummy))
SELECT EmployeeID
, DATEADD(DAY, i_count, @startDate) AS Dates
, elem
FROM (SELECT DATEDIFF(DAY, @startDate, @endDate) + 1) AS t_datediff(t_days)
CROSS APPLY (SELECT TOP (t_days) ROW_NUMBER() OVER(ORDER BY (SELECT 0) ) - 1 FROM E8
) AS t_dateadd(i_count)
CROSS APPLY (SELECT PatternXFrequency.*
FROM (SELECT DISTINCT EmployeeID FROM @t) AS t(EmpID)
CROSS APPLY (SELECT COUNT(Sort)
FROM PatternXFrequency
WHERE EmployeeID = EmpID
) AS EmpPattern(sortCount)
CROSS APPLY (SELECT *
FROM PatternXFrequency
WHERE EmployeeID = EmpID
AND Sort = ((i_count % sortCount))
) AS PatternXFrequency
) AS t
ORDER BY t.EmployeeID
, Dates
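The core of the approach (repeat each pattern element Frequency times, then cycle through the result with modulo) can be sketched in a few lines of Python. This is only meant to make the logic easy to verify against the expected table; expand_schedule is a hypothetical name, and the date range used is the one from the question.

```python
from datetime import date, timedelta


def expand_schedule(employees, start, end):
    """employees: (employee_id, pattern_csv, frequency) tuples."""
    days = (end - start).days + 1
    rows = []
    for emp_id, pattern, freq in employees:
        # Repeat each element `freq` times: '1,2' with frequency 3 -> 1,1,1,2,2,2
        cycle = [elem for elem in pattern.split(",") for _ in range(freq)]
        for i in range(days):
            # Modulo wraps around once the expanded pattern is exhausted.
            rows.append((emp_id, start + timedelta(days=i), cycle[i % len(cycle)]))
    return rows


employees = [(11, "1,2,3", 1), (12, "4,5", 1), (13, "1,2", 3), (14, "1,2", 2)]
for emp_id, day, elem in expand_schedule(employees, date(2020, 7, 3), date(2020, 7, 6)):
    print(emp_id, day.isoformat(), elem)
```

For employee 13 (pattern 1,2 with frequency 3) the four days come out as 1, 1, 1, 2, matching the expected table in the question.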
This isn't particularly pretty, but it avoids the recursion of an rCTE, so it should be faster. As STRING_SPLIT doesn't return ordinal positions (at least before SQL Server 2022 added its optional enable_ordinal argument), we have to use something else here; I use DelimitedSplit8k_LEAD.
I also assume your expected results are wrong, as they stop short of your end date (20200709). This results in the below:
CREATE TABLE dbo.YourTable (EmployeeID int,
EmpName char(1),
Pattern varchar(8000), --This NEEDS fixing
Frequency tinyint);
INSERT INTO dbo.YourTable
VALUES(11,'X','1,2,3',1),
(12,'Y','4,5',1),
(13,'Y','1,2',3),
(14,'Z','1,2',2);
GO
DECLARE @StartDate date = '20200703',
@EndDate date = '20200709';
WITH CTE AS(
SELECT *,
MAX(ItemNumber) OVER (PARTITION BY EmployeeID) AS MaxItemNumber
FROM dbo.YourTable YT
CROSS APPLY dbo.DelimitedSplit8K_LEAD(YT.Pattern,',') DS),
N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP (SELECT DATEDIFF(DAY,@StartDate, @EndDate)+1)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS I
FROM N N1, N N2, N N3) --1000 Rows
SELECT C.EmployeeID,
DATEADD(DAY,T.I, @StartDate),
C.Item
FROM CTE C
JOIN Tally T ON ISNULL(NULLIF((T.I +1) % C.MaxItemNumber,0),C.MaxItemNumber) = C.ItemNumber
ORDER BY EmployeeID,
T.I;
GO
DROP TABLE dbo.YourTable;
As mentioned in the comments, fix your data model.
Your output pattern is a little bit strange.
But is it something like this you are looking for?
DECLARE @startDate date = '2020-07-03'
DECLARE @endDate date = '2020-07-09'
DECLARE @Dates TABLE([Date] Date)
;WITH seq(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM seq
WHERE n < DATEDIFF(DAY, @startDate, @endDate)
)
INSERT INTO @Dates ([Date])
SELECT DATEADD(Day,n, @startDate) Date
FROM seq
ORDER BY n
OPTION (MAXRECURSION 0);
SELECT e.EmployeeId, d.Date, x.Value Pattern
FROM Employee e
CROSS APPLY STRING_SPLIT(e.Pattern, ',') x
INNER JOIN @Dates d on 1=1
-- Correct for the first iteration of the pattern
AND DATEDIFF(DAY, DATEADD(DAY, -1, @startDate), d.Date) = x.Value

SQL query to search first rows until sum = value and skip big value that can exceed the value

I have a table
id | amount
---+--------
1 | 500
2 | 300
3 | 750
4 | 200
5 | 500
I want to select rows ascending until the sum is 1000 or until all rows are searched (and skip a big value (750) that can exceed 1000).
How can I write a query that returns rows like the ones below?
Thanks for help
id | amount
---+--------
1 | 500
2 | 300
4 | 200
I think that you need a common table expression for this.
The idea is to do a cumulative sum that skips the rows that would cause the sum to go above 1000 (aliased sm in the CTE), and to flag the records to skip (aliased keep in the CTE). Then the outer query just filters on the flag.
with recursive cte as (
select
id,
amount,
case when amount > 1000 then 0 else amount end sm,
case when amount > 1000 then 0 else 1 end keep
from mytable
where id = 1
union all
select
t.id,
t.amount,
case when c.sm + t.amount > 1000 then c.sm else c.sm + t.amount end,
case when c.sm + t.amount > 1000 then 0 else 1 end
from cte c
inner join mytable t on t.id = c.id + 1
)
select id, amount from cte where keep = 1 order by id
Demo on DB Fiddle:
id | amount
-: | -----:
1 | 500
2 | 300
4 | 200
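What the recursive CTE computes row by row is a greedy accumulation that skips any amount that would push the total past the limit. A minimal Python sketch of that logic, with take_until_limit as a made-up name and rows processed in id order:

```python
def take_until_limit(rows, limit=1000):
    """rows: (id, amount) pairs; keeps a row only if the total stays <= limit."""
    total = 0
    kept = []
    for row_id, amount in sorted(rows):
        if total + amount > limit:
            continue  # corresponds to keep = 0: the running sum is carried over unchanged
        total += amount
        kept.append((row_id, amount))
    return kept


rows = [(1, 500), (2, 300), (3, 750), (4, 200), (5, 500)]
print(take_until_limit(rows))
```

On the sample data this keeps ids 1, 2 and 4: row 3 would bring the sum to 1550 and row 5 to 1500, so both are skipped.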
You should get the expected result using a recursive common table expression,
doing something like this:
with RECURSIVE yourtableOrdered as (select row_number() over (order by id) row_num, id, val from (values (1, 500), (2, 300), (3, 750), (4, 200), (5, 500)) V (id, val)),
lineSum as (
select row_num, id, val,
case when val <= 1000 then val else 0 end totalSum,
case when val <= 1000 then true else false end InResult
from yourtableOrdered
where row_num = 1
union all
select y.row_num, y.id, y.val,
case when previousLine.totalSum + y.val <= 1000 then previousLine.totalSum + y.val else previousLine.totalSum end totalSum,
case when previousLine.totalSum + y.val <= 1000 then true else false end InResult
from yourtableOrdered y
inner join lineSum previousLine
on y.row_num = previousLine.row_num + 1
),
yourExpectedResult as (
select * from lineSum where InResult = true
)
select * from yourExpectedResult
see a working sample in
http://sqlfiddle.com/#!17/2cbcf/1/0
Use a cumulative sum:
select t.*
from (select t.*,
sum(amount) over (order by id) as running_amount
from t
) t
where running_amount - amount < 1000;
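It is worth checking this last query against the sample data, because a plain running sum cannot skip a row. The Python sketch below mirrors the condition running_amount - amount < 1000; running_sum_filter is a hypothetical name used only for this check.

```python
def running_sum_filter(rows, limit=1000):
    """rows: (id, amount) pairs; mimics sum(amount) over (order by id)."""
    running = 0
    kept = []
    for row_id, amount in sorted(rows):
        running += amount
        # Same predicate as the query: the sum *before* this row is below the limit.
        if running - amount < limit:
            kept.append((row_id, amount))
    return kept


rows = [(1, 500), (2, 300), (3, 750), (4, 200), (5, 500)]
print(running_sum_filter(rows))
```

On the sample data this keeps ids 1, 2 and 3, so it includes the row that crosses 1000 rather than skipping it. It fits a "take rows until the total is reached" requirement, but not the skip behavior the question asks for.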

Group by range of values in bigquery

Is there any way in Bigquery to group by not the absolute value but a range of values?
I have a query that looks in a product table with 4 different numeric group by's.
What I am looking for is an efficient way to group by in a way like:
group by "A±1000" etc. or "A±10%ofA".
thanks in advance,
You can generate a column as a "named range" then group by the column. As an example for your A+-1000 case:
with data as (
select 100 as v union all
select 200 union all
select 2000 union all
select 2100 union all
select 2200 union all
select 4100 union all
select 8000 union all
select 8000
)
select count(v), ARRAY_AGG(v), ranges
FROM data, unnest([0, 2000, 4000, 6000, 8000]) ranges
WHERE data.v >= ranges - 1000 AND data.v < ranges + 1000
GROUP BY ranges
Output:
+-----+------------------------+--------+
| f0_ | f1_ | ranges |
+-----+------------------------+--------+
| 2 | ["100","200"] | 0 |
| 3 | ["2000","2100","2200"] | 2000 |
| 1 | ["4100"] | 4000 |
| 2 | ["8000","8000"] | 8000 |
+-----+------------------------+--------+
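The same "value within center ± 1000" join can be sketched in Python to see how values land in groups. group_by_center and half_width are illustrative names, and the centers are the ones from the query above.

```python
def group_by_center(values, centers, half_width=1000):
    """Assign each value to every center c with c - half_width <= v < c + half_width."""
    groups = {}
    for c in centers:
        members = [v for v in values if c - half_width <= v < c + half_width]
        if members:  # like the inner join, empty ranges produce no row
            groups[c] = members
    return groups


values = [100, 200, 2000, 2100, 2200, 4100, 8000, 8000]
print(group_by_center(values, [0, 2000, 4000, 6000, 8000]))
```

Because the centers are spaced exactly 2 × half_width apart, every value falls into one group only. With centers closer together, a value would be counted in several groups.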
The example below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.example` AS (
SELECT * FROM
UNNEST([STRUCT<id INT64, price FLOAT64>
(1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
])
)
SELECT
CASE
WHEN price > 0 AND price <= 100 THEN ' 0 - 100'
WHEN price > 100 AND price <= 200 THEN '100 - 200'
ELSE '200+'
END AS range_group,
COUNT(1) AS cnt
FROM `project.dataset.example`
GROUP BY range_group
-- ORDER BY range_group
with result
Row | range_group | cnt
----+-------------+-----
1 | 0 - 100 | 2
2 | 100 - 200 | 3
3 | 200+ | 1
As you can see, the solution above requires a CASE expression that spells out every range, which gets tedious when there are many of them. Below is a more generic (but more verbose) solution that uses the recently introduced RANGE_BUCKET function
#standardSQL
WITH `project.dataset.example` AS (
SELECT * FROM
UNNEST([STRUCT<id INT64, price FLOAT64>
(1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
])
), ranges AS (
SELECT [100.0, 200.0] ranges_array
), temp AS (
SELECT OFFSET, IF(prev_val = val, CONCAT(prev_val, ' - '), CONCAT(prev_val, ' - ', val)) rng FROM (
SELECT OFFSET, IFNULL(CAST(LAG(val) OVER(ORDER BY OFFSET) AS STRING), '') prev_val, CAST(val AS STRING) AS val
FROM ranges, UNNEST(ARRAY_CONCAT(ranges_array, [ARRAY_REVERSE(ranges_array)[OFFSET(0)]])) val WITH OFFSET
)
)
SELECT
RANGE_BUCKET(price, ranges_array) range_group,
rng,
COUNT(1) AS cnt
FROM `project.dataset.example`, ranges
JOIN temp ON RANGE_BUCKET(price, ranges_array) = OFFSET
GROUP BY range_group, rng
-- ORDER BY range_group
with result
Row | range_group | rng | cnt
----+-------------+-----------+-----
1 | 0 | - 100 | 2
2 | 1 | 100 - 200 | 3
3 | 2 | 200 - | 1
As you can see, in the second solution you define your ranges in the ranges CTE as a simple array listing the boundaries: SELECT [100.0, 200.0] ranges_array
The temp CTE then does all the needed calculation
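For intuition, bucketing against a sorted boundary array is a binary search. The Python sketch below assumes RANGE_BUCKET behaves like bisect_right, i.e. a value equal to a boundary falls into the upper bucket; that assumption is worth checking against the BigQuery documentation. The range_bucket helper is a stand-in, not the BigQuery function itself.

```python
from bisect import bisect_right
from collections import Counter


def range_bucket(point, boundaries):
    """0-based bucket index of point in a sorted list of boundaries."""
    return bisect_right(boundaries, point)


prices = [15, 50, 125, 150, 175, 250]
boundaries = [100.0, 200.0]
counts = Counter(range_bucket(p, boundaries) for p in prices)
print(sorted(counts.items()))
```

With the sample prices this gives 2, 3 and 1 rows in buckets 0, 1 and 2, the same counts as the result above.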
You can do math operations on the GROUP BY, creating groups by any arbitrary criteria.
For example:
WITH data AS (
SELECT repo.name, COUNT(*) price
FROM `githubarchive.month.201909`
GROUP BY 1
HAVING price>100
)
SELECT FORMAT('range %i-%i', MIN(price), MAX(price)) price_range, COUNT(*) c
FROM data
GROUP BY CAST(LOG(price) AS INT64)
ORDER BY MIN(price)

SQL Order By On two columns but same priority

I'm stuck on this simple select and don't know what to do.
I have this:
ID | Group
===========
1 | NULL
2 | 100
3 | 100
4 | 100
5 | 200
6 | 200
7 | 100
8 | NULL
and want this:
ID | Group
===========
1 | NULL
2 | 100
3 | 100
4 | 100
7 | 100
5 | 200
6 | 200
8 | NULL
All group members stay together, and everything else is ordered by ID.
I cannot write this query because of the NULL records. NULL means the record does not belong to any group.
First you want to order your rows by the minimum ID of their group - or their own ID in case they belong to no group. Then you want to order by ID. That is:
order by min(id) over (partition by case when grp is null then id else grp end), id
If IDs and groups can overlap (i.e. the same number can be used for an ID and for a group, e.g. add a record for ID 9 / group 1 to your sample data) you should change the partition clause to something like
order by min(id) over (partition by case when grp is null
then 'ID' + cast(id as varchar)
else 'GRP' + cast(grp as varchar) end),
id;
Rextester demo: http://rextester.com/GPHBW5600
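The ordering rule (minimum ID of the group first, then ID) can be checked with a small Python sketch. order_keeping_groups is a hypothetical name, and the tagged keys play the role of the 'ID' / 'GRP' prefixes in the second query, so an ID and a group number can never collide.

```python
def order_keeping_groups(rows):
    """rows: (id, grp) pairs where grp may be None."""
    def partition_key(row):
        row_id, grp = row
        # Ungrouped rows form their own partition, keyed by their ID.
        return ("GRP", grp) if grp is not None else ("ID", row_id)

    # min(id) over (partition by ...)
    group_min = {}
    for row in rows:
        key = partition_key(row)
        group_min[key] = min(group_min.get(key, row[0]), row[0])

    return sorted(rows, key=lambda row: (group_min[partition_key(row)], row[0]))


rows = [(1, None), (2, 100), (3, 100), (4, 100),
        (5, 200), (6, 200), (7, 100), (8, None)]
print([row_id for row_id, _ in order_keeping_groups(rows)])
```

On the sample data the IDs come out as 1, 2, 3, 4, 7, 5, 6, 8, which is the order requested in the question.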
What about data after a null? In a comment you said don't sort the null.
declare @T table (ID int primary key, grp int);
insert into @T values
(1, NULL)
, (3, 100)
, (5, 200)
, (6, 200)
, (7, 100)
, (8, NULL)
, (9, 200)
, (10, 100)
, (11, NULL)
, (12, 150);
select ttt.*
from ( select tt.*
, sum(ff) over (order by tt.ID) as sGrp
from ( select t.*
, iif(grp is null or lag(grp) over (order by id) is null, 1, 0) as ff
from @T t
) tt
) ttt
order by ttt.sGrp, ttt.grp, ttt.id
ID grp ff sGrp
----------- ----------- ----------- -----------
1 NULL 1 1
3 100 1 2
7 100 0 2
5 200 0 2
6 200 0 2
8 NULL 1 3
10 100 0 4
9 200 1 4
11 NULL 1 5
12 150 1 6
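To see how the ff flag and its running sum split the data into segments at each NULL, here is a rough Python equivalent. The function name is made up; ff and the segment counter correspond to the ff and sGrp columns in the output above.

```python
def order_within_null_segments(rows):
    """rows: (id, grp) pairs; returns (id, grp, segment) in the final order."""
    keyed = []
    segment = 0
    prev_grp = None  # lag(grp): NULL for the first row
    for row_id, grp in sorted(rows, key=lambda r: r[0]):
        # A new segment starts at a NULL row and right after one.
        ff = 1 if grp is None or prev_grp is None else 0
        segment += ff  # running sum of ff, i.e. sGrp
        keyed.append((row_id, grp, segment))
        prev_grp = grp
    # order by sGrp, grp, id (a NULL grp is always alone in its segment)
    keyed.sort(key=lambda k: (k[2], k[1] is not None, k[1] or 0, k[0]))
    return keyed


rows = [(1, None), (3, 100), (5, 200), (6, 200), (7, 100),
        (8, None), (9, 200), (10, 100), (11, None), (12, 150)]
for row in order_within_null_segments(rows):
    print(row)
```

The IDs come out as 1, 3, 7, 5, 6, 8, 10, 9, 11, 12, matching the query output: rows are regrouped only inside the stretch between two NULL rows.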

How to write a query to allow null in minimum function

I need to write a query that gets the minimum value of a column from a table, and if the value is null I want to include that row. I wrote the following query, but it ignores the null values. How can I modify this query to include null values in the result?
select * from TABLE where COLUMN = (select min(COLUMN) from TABLE );
If the table is like below
|ID | VALUE | NAME
101 1 John
101 null John
102 1 Bill
103 1 Tina
103 null Tina
104 null James
Result Should be
|ID | VALUE | NAME
101 1 John
102 1 Bill
103 1 Tina
104 null James
You need distinct on:
with my_table(id, value, name) as (
values
(101, 1, 'John'),
(101, null, 'John'),
(102, 1, 'Bill'),
(103, 1, 'Tina'),
(103, null, 'Tina'),
(104, null, 'James')
)
select distinct on (id) *
from my_table
order by id, value
id | value | name
-----+-------+-------
101 | 1 | John
102 | 1 | Bill
103 | 1 | Tina
104 | | James
(4 rows)
Distinct on is a fantastic feature specific for Postgres. An alternative in other RDBMS may be:
select t.id, t.value, t.name
from my_table t
join (
select id, min(value) as value
from my_table
group by id
) u on u.id = t.id and u.value is not distinct from t.value;
Note, you should use is not distinct from because value may be null.
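Both variants pick, per id, the smallest non-null value and fall back to null only when the id has nothing else (Postgres sorts nulls last in ascending order by default, which is what makes distinct on work here). A minimal Python sketch of that rule, with min_per_id as an illustrative name and None standing in for null:

```python
def min_per_id(rows):
    """rows: (id, value, name) tuples where value may be None."""
    best = {}
    for row_id, value, name in rows:
        if row_id not in best:
            best[row_id] = (row_id, value, name)
            continue
        current = best[row_id][1]
        # A non-null value beats null; otherwise keep the smaller value.
        if value is not None and (current is None or value < current):
            best[row_id] = (row_id, value, name)
    return [best[key] for key in sorted(best)]


rows = [(101, 1, "John"), (101, None, "John"), (102, 1, "Bill"),
        (103, 1, "Tina"), (103, None, "Tina"), (104, None, "James")]
for row in min_per_id(rows):
    print(row)
```

This yields one row per id, with James (104) keeping his null value because no other value exists for him.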
SQL SERVER
select DISTINCT j.ID,j.VALUE,j.NAME from Table1 j
join (
select id, MIN(VALUE) VALUE from Table1
group by id
) as t
on t.ID = j.ID and (t.VALUE = j.VALUE or t.VALUE is null)
You cannot do an equals (=) for a null value, you have to check is null or so. So one simple solution is to default the null value to a number that would not otherwise be used:
select * from TABLE where coalesce(COLUMN, -9999) = (select min(coalesce(COLUMN,-9999)) from TABLE );
The coalesce function returns the first non-null value passed to it.
with c as (
select column as c
from table
order by column nulls first
limit 1
)
select *
from table cross join c
where column = c or column is null
If you want to use order by:
select t.*
from t
order by t.column asc nulls first
limit 1;
Alternatively, use rank():
select t.*
from (select t.*,
rank() over (order by col asc nulls first) as seqnum
from t
) t
where seqnum = 1;
I hope this solves your problem.
SELECT id,
CASE WHEN MIN(
CASE WHEN value IS NULL THEN 0 ELSE 1 END) = 0 THEN null
ELSE MIN(value) END
FROM tableName
GROUP BY id
or using COALESCE.
SELECT id,
CASE WHEN MIN(COALESCE(value, 0)) = 0 THEN null
ELSE MIN(value) END
FROM tableName
GROUP BY id
I am on mobile phone now, so I cannot test.