TSQL Make partitions "gaps and island"

I need to create partitions. Suppose I have this table:
CREATE TABLE MyTable (Pos INT UNIQUE, X INT)
INSERT INTO MyTable VALUES (3, 2)
INSERT INTO MyTable VALUES (5, 0)
INSERT INTO MyTable VALUES (6, 0)
INSERT INTO MyTable VALUES (9, 0)
INSERT INTO MyTable VALUES (43, 9)
INSERT INTO MyTable VALUES (53, 8)
INSERT INTO MyTable VALUES (56, 0)
INSERT INTO MyTable VALUES (81, 0)
INSERT INTO MyTable VALUES (163, 1)
INSERT INTO MyTable VALUES (9716, 0)
The query result should be this table with an added column Y, defined as follows (rows ordered by Pos):
IF X<>0 : Y = X
IF X=0 : Y = the most recent preceding X<>0 (or NULL if there is none)
The desired result looks like this:
SELECT *
FROM MyQuery as a function of MyTable
ORDER BY Pos
Pos X Y
3 2 2
5 0 2
6 0 2
9 0 2
43 9 9
53 8 8
56 0 8
81 0 8
163 1 1
9716 0 1

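This is a type of gaps-and-islands problem.
There are many solutions; here is one:
Use a running conditional count to number the rows that we want to group together. NULLIF(X, 0) turns zeros into NULL and COUNT ignores NULLs, so the count only increases on rows where X<>0.
Use a partitioned conditional MIN to take the only non-zero value in each group, which is the value we actually want.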
WITH StartPoints AS (
SELECT *,
GroupId = COUNT(NULLIF(X, 0)) OVER (ORDER BY Pos)
FROM MyTable
)
SELECT
Pos,
X,
Y = MIN(NULLIF(X, 0)) OVER (PARTITION BY GroupId)
FROM StartPoints
ORDER BY Pos;
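To see why the grouping works, it can help to look at the intermediate values. The following is only a sketch: it uses a table variable @MyTable with the question's sample data so that it runs on its own, and the alias NonZeroX and the explicit ROWS frame are additions for illustration rather than part of the answer above. It assumes SQL Server 2012 or later. Because Pos is unique, the explicit frame should give the same result as the default frame.

```sql
-- Sample data from the question, held in a table variable for this sketch
DECLARE @MyTable TABLE (Pos INT UNIQUE, X INT);
INSERT INTO @MyTable VALUES
(3, 2), (5, 0), (6, 0), (9, 0), (43, 9),
(53, 8), (56, 0), (81, 0), (163, 1), (9716, 0);

-- NonZeroX is an illustrative alias: zeros become NULL.
-- COUNT skips NULLs, so GroupId only increases on rows where X <> 0.
SELECT
    Pos,
    X,
    NonZeroX = NULLIF(X, 0),
    GroupId  = COUNT(NULLIF(X, 0)) OVER (ORDER BY Pos ROWS UNBOUNDED PRECEDING)
FROM @MyTable
ORDER BY Pos;
```

With the sample data, GroupId comes out as 1, 1, 1, 1, 2, 3, 3, 3, 4, 4, so every run of zeros shares a group with the non-zero row before it. Any rows with X = 0 that came before the first non-zero X would get GroupId 0, and MIN over that group would return NULL.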

Related

SQL Server : set a row value based on a condition

I don't know what would be the appropriate title for this problem, but here is what I need to accomplish
Here is my dataset:
State TimeInState
--------------------------
1 20
3 0
4 5
8 2
5 10
1 18
3 30
12 2
2 0
What I want is another column here, let's say FooID. FooID is an int value that stays the same until the State is 1 again, at which point it increments.
So the dataset would look like this:
State TimeInState FooID
------------------------------------------
1 20 1
3 0 1
4 5 1
8 2 1
5 10 1
1 18 2
3 30 2
12 2 2
2 0 2
So if there were another row at the end with State=1, FooID would be 3 from that row until the next row with State=1.
How can I accomplish this in T-SQL?
Thanks in advance.

If you have some way of ordering rows (like an ID of sorts), then here is an example of how you could do something like this:
DECLARE @T TABLE (ID INT IDENTITY(1, 1), State INT, TimeInState INT);
INSERT @T (State, TimeInState)
VALUES (1, 20), (3, 0), (4, 5), (8, 2), (5, 10), (1, 18)
, (3, 30), (12, 2), (2, 0), (1, 1), (1, 1), (2, 1);
-- Rows with State = 1 sort first, so they are numbered 1, 2, 3, ... in ID order
WITH CTE AS (
SELECT *
, ROW_NUMBER() OVER (ORDER BY CASE WHEN State = 1 THEN 0 ELSE 1 END, ID) RN
FROM @T
)
-- For each row, take the number of the latest State = 1 row at or before it
SELECT State, TimeInState, Foo.FooID
FROM CTE T
CROSS APPLY (SELECT MAX(RN) FooID FROM CTE WHERE State = 1 AND ID <= T.ID) Foo
ORDER BY ID;
But if you don't have the data ordered in some way already, then I don't think you can ensure the result set will sort the data in the way you want to sort it.
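On SQL Server 2012 or later, where ordered window aggregates are available, the same FooID can probably also be computed with a running conditional sum. This is a sketch of an alternative technique, not the method of the answer above; it reuses the same sample data and still assumes an ID column that defines the row order.

```sql
DECLARE @T TABLE (ID INT IDENTITY(1, 1), State INT, TimeInState INT);
INSERT @T (State, TimeInState)
VALUES (1, 20), (3, 0), (4, 5), (8, 2), (5, 10), (1, 18)
     , (3, 30), (12, 2), (2, 0), (1, 1), (1, 1), (2, 1);

-- Add 1 to the running total every time a row with State = 1 is reached
SELECT State, TimeInState,
       FooID = SUM(CASE WHEN State = 1 THEN 1 ELSE 0 END)
               OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING)
FROM @T
ORDER BY ID;
```

For the twelve sample rows this gives FooID 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 4. One difference to be aware of: rows that come before the first State = 1 row get 0 here, whereas the CROSS APPLY version returns NULL for them.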

Add column with row number

I want to add a column to my select showing a repeating sequence of numbers, say from 1 to 4.
Example:
Select * gives me
Id Transaction
1 10
2 11
3 12
4 13
5 14
6 15
I want to add a column called "Flow". The result should be like this.
Id Transaction Flow
1 10 1
2 11 2
3 12 3
4 13 4
5 14 1
6 15 2
In this example the flow is from 1-4. Could be 1-n.
No particular relation between Id and Flow is needed.

If you're using SQL Server or other DBMS that allows ROW_NUMBER, you could do this:
CREATE TABLE #Tbl(Id INT, [Transaction] INT);
INSERT INTO #Tbl VALUES
(1, 10), (2, 11), (3, 12), (4, 13), (5, 14), (6, 15);
DECLARE @N INT = 4;
SELECT *,
Flow = 1 + ((ROW_NUMBER() OVER(ORDER BY Id) - 1) % @N)
FROM #Tbl;
DROP TABLE #Tbl;

If you are using MySQL:
Query
set @r := 0;
select Id, `Transaction`,
@r := (@r % 4) + 1 as Flow
from your_table_name
order by Id;
EDIT
The following SQL query should work on most RDBMSs (some, such as Oracle, use MOD() instead of the % operator).
Query
select *, (
select ((count(*) - 1) % 4) + 1 as Flow
from your_table_name t2
where t1.Id >= t2.Id
) as Flow
from your_table_name t1;

Finding Missing Numbers When Data Is Grouped In SQL Server

I need to write a query that will calculate the missing numbers in a sequence when the data is "grouped". The data in each group is in sequence, but each individual group has its own sequence. The data would look something like this:
Id| Number|
-----------
1 | 250 |
1 | 270 | <260 Missing
1 | 280 | <290 Missing
1 | 300 |
1 | 310 |
2 | 110 |
2 | 130 | <120 Missing
2 | 140 |
3 | 260 |
3 | 270 |
3 | 290 | <280 Missing
3 | 300 |
3 | 340 | <310, 320 & 330 Missing
I have found a solution based on this post from CELKO here:
http://bytes.com/topic/sql-server/answers/511668-query-find-missing-number
In essence to set up a demo run the following:
CREATE TABLE Sequence
(seq INT NOT NULL
PRIMARY KEY (seq));
INSERT INTO Sequence VALUES (1);
INSERT INTO Sequence VALUES (2);
INSERT INTO Sequence VALUES (3);
INSERT INTO Sequence VALUES (4);
INSERT INTO Sequence VALUES (5);
INSERT INTO Sequence VALUES (6);
INSERT INTO Sequence VALUES (7);
INSERT INTO Sequence VALUES (8);
INSERT INTO Sequence VALUES (9);
INSERT INTO Sequence VALUES (10);
CREATE TABLE Tickets
(buyer CHAR(5) NOT NULL,
ticket_nbr INTEGER DEFAULT 1 NOT NULL
PRIMARY KEY (buyer, ticket_nbr));
INSERT INTO Tickets VALUES ('a', 2);
INSERT INTO Tickets VALUES ('a', 3);
INSERT INTO Tickets VALUES ('a', 4);
INSERT INTO Tickets VALUES ('b', 4);
INSERT INTO Tickets VALUES ('c', 1);
INSERT INTO Tickets VALUES ('c', 2);
INSERT INTO Tickets VALUES ('c', 3);
INSERT INTO Tickets VALUES ('c', 4);
INSERT INTO Tickets VALUES ('c', 5);
INSERT INTO Tickets VALUES ('d', 1);
INSERT INTO Tickets VALUES ('d', 6);
INSERT INTO Tickets VALUES ('d', 7);
INSERT INTO Tickets VALUES ('d', 9);
INSERT INTO Tickets VALUES ('e', 10);
SELECT DISTINCT T1.buyer, S1.seq
FROM Tickets AS T1, Sequence AS S1
WHERE seq <= (SELECT MAX(ticket_nbr) -- set the range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq NOT IN (SELECT ticket_nbr -- get missing numbers
FROM Tickets AS T3
WHERE T1.buyer = T3.buyer);
CELKO does mention that this is for a small number of tickets. In my case, the numbers table is limited to 200 rows: a single primary-key column in which each row is an increment of 10, as that is what I am interested in. I modified CELKO's query as follows (adding a minimum range):
SELECT DISTINCT T1.buyer, S1.seq
FROM Tickets AS T1, Sequence AS S1
WHERE seq <= (SELECT MIN(ticket_nbr) -- set the MIN range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq <= (SELECT MAX(ticket_nbr) -- set the MAX range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq NOT IN (SELECT ticket_nbr -- get missing numbers
FROM Tickets AS T3
WHERE T1.buyer = T3.buyer)
ORDER BY buyer, seq;
The output would be those numbers that are missing:
buyer seq
a 1
b 1
b 2
b 3
e 1
e 2
e 3
e 4
e 5
e 6
e 7
e 8
e 9
This works exactly as I want. However, on my data set it is very slow (an 11-second run time at the moment). It appears to be the DISTINCT that slows things down tremendously, and presumably it will get worse as the base data set grows. I have tried all manner of things to make it more efficient, but sadly my ambition exceeds my knowledge. Is it possible to make the query above more efficient/faster? My only constraint is that the dataset I am making needs to be a SQL view (as it feeds a report) and will execute on SQL Azure.
Cheers
David

If my understanding is correct, you want to fill in the missing data from the table. The table would consist of ID and a Number which is incremented by 10.
CREATE TABLE Test(
ID INT,
Number INT
)
INSERT INTO Test VALUES
(1, 250), (1, 270), (1, 280), (1, 300), (1, 310),
(2, 110), (2, 130), (2, 140), (3, 260), (3, 270),
(3, 290), (3, 300), (3, 340);
You could do this by using a Tally Table and doing a CROSS JOIN on the Test table:
;WITH E1(N) AS(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS(SELECT 1 FROM E1 a, E1 b)
,E4(N) AS(SELECT 1 FROM E2 a, E2 b)
,Tally(N) AS(
SELECT TOP (SELECT MAX(Number)/10 FROM Test)
(ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) - 1) * 10
FROM E4
),
MinMax AS(
SELECT
ID,
Minimum = MIN(Number),
Maximum = MAX(Number)
FROM Test
GROUP BY ID
),
CrossJoined AS(
SELECT
m.ID,
Number = Minimum + t.N
FROM MinMax m
CROSS JOIN Tally t
WHERE
Minimum + t.N <= Maximum
)
SELECT * FROM CrossJoined c
ORDER BY c.ID, c.Number
RESULT
ID Number
----------- --------------------
1 250
1 260
1 270
1 280
1 290
1 300
1 310
2 110
2 120
2 130
2 140
3 260
3 270
3 280
3 290
3 300
3 310
3 320
3 330
3 340
If you only want to find the missing Number from Test grouped by ID, just replace the final SELECT statement:
SELECT * FROM CrossJoined c
ORDER BY c.ID, c.Number
to:
SELECT c.ID, c.Number
FROM CrossJoined c
WHERE NOT EXISTS(
SELECT 1 FROM Test t
WHERE
t.ID = c.ID
AND t.Number = c.Number
)
ORDER BY c.ID, c.Number
RESULT
ID Number
----------- --------------------
1 260
1 290
2 120
3 280
3 310
3 320
3 330
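If a list of gap ranges is enough (rather than one row per missing number), the gaps can probably also be found without a tally table, by comparing each row with the next one in its group. The sketch below is an alternative, not part of the answer above. It assumes SQL Server 2012 or later (or Azure SQL Database) for LEAD and a fixed step of 10, and it uses a temp table #Test with the same sample data so that it runs on its own. GapStart, GapEnd and NextNumber are illustrative names.

```sql
CREATE TABLE #Test (ID INT, Number INT);
INSERT INTO #Test VALUES
(1, 250), (1, 270), (1, 280), (1, 300), (1, 310),
(2, 110), (2, 130), (2, 140), (3, 260), (3, 270),
(3, 290), (3, 300), (3, 340);

-- A gap exists wherever the next Number in the same ID is more than one step away
SELECT ID,
       GapStart = Number + 10,
       GapEnd   = NextNumber - 10
FROM (
    SELECT ID, Number,
           NextNumber = LEAD(Number) OVER (PARTITION BY ID ORDER BY Number)
    FROM #Test
) s
WHERE NextNumber - Number > 10   -- the last row per ID has NULL here and drops out
ORDER BY ID, GapStart;

DROP TABLE #Test;
```

For the sample data this returns the ranges 260-260 and 290-290 for ID 1, 120-120 for ID 2, and 280-280 and 310-330 for ID 3. If used inside a view, the final ORDER BY would have to move to the query that selects from the view.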

How to select multi record depending on some column's condition?

Say there is a SQL Server table which contains two columns: ID and Value.
The sample data looks like this:
ID value
------------------
1 30
1 30
2 50
2 50
3 50
When I run this query:
select ID, NEWID(), value
from table1
order by ID
The result looks like this:
1 30 E152AD19-9920-4567-87FF-C4822FD9E485
1 30 54F28C58-ABA9-4DFB-9A80-CE9C4C390CBB
2 50 ........
2 50 ........
3 50 4E5A9E26-FEEC-4CC7-9AC5-96747053B6B2
But what I want is for the number of rows returned for each ID to depend on the result of (sum of value / 30). For example, for ID 2 the sum of value is 50 + 50 = 100, and 100 / 30 = 3 (integer division), so ID 2 should appear in the query result three times.
The final result I want looks like this:
1 E152AD19-9920-4567-87FF-C4822FD9E485
1 54F28C58-ABA9-4DFB-9A80-CE9C4C390CBB
2 4E5A9E26-FEEC-4CC7-9AC5-96747053B6B2
2 ....
2 ....
3 D861563E-E01A-4198-9E92-7BEB4678E5D1
Please note that ID 2 is displayed three times. Thanks for your help.

How about something like this:
CREATE TABLE Table1
([ID] int, [value] int)
;
INSERT INTO Table1
([ID], [value])
VALUES
(1, 30),
(1, 30),
(2, 50),
(2, 50),
(3, 50)
;
;WITH SummedVals AS (
SELECT ID,
SUM(value) / 30 Cnt
FROM Table1
GROUP BY ID
)
, Vals AS (
SELECT ID,
Cnt - 1 Cnt
FROM SummedVals
UNION ALL
SELECT ID,
Cnt - 1 Cnt
FROM Vals
WHERE Cnt > 0
)
SELECT ID,
NEWID()
FROM Vals
ORDER BY 1
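By default, SQL Server stops a recursive CTE with an error after 100 levels of recursion unless OPTION (MAXRECURSION n) is specified, so for large counts a join against a list of numbers may be worth considering instead. The following is a sketch of that alternative, not the method of the answer above. The Numbers CTE is a hypothetical inline list that assumes the count never exceeds 10 (a real numbers table would replace it), and #Table1 and RowGuid are illustrative names.

```sql
CREATE TABLE #Table1 ([ID] INT, [value] INT);
INSERT INTO #Table1 ([ID], [value])
VALUES (1, 30), (1, 30), (2, 50), (2, 50), (3, 50);

WITH SummedVals AS (
    -- Integer division: 60 / 30 = 2, 100 / 30 = 3, 50 / 30 = 1
    SELECT ID, SUM(value) / 30 AS Cnt
    FROM #Table1
    GROUP BY ID
),
Numbers AS (
    -- Hypothetical inline numbers list; assumes Cnt never exceeds 10
    SELECT N
    FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) AS v(N)
)
-- Each ID is repeated once for every N from 1 to Cnt
SELECT s.ID, NEWID() AS RowGuid
FROM SummedVals s
JOIN Numbers n ON n.N <= s.Cnt
ORDER BY s.ID;

DROP TABLE #Table1;
```

With the sample data this returns two rows for ID 1, three for ID 2 and one for ID 3. Unlike the recursive version, an ID whose sum is below 30 returns no rows at all here, which may or may not be what is wanted.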

How can I group a set split by change in a field with respect to an order?

I have a set of records.
ID Value
1 a
2 b
3 b
4 b
5 a
6 a
7 b
8 b
And I would like to group them like so.
MIN(ID) MAX(ID) Value
1 1 a
2 4 b
5 6 a
7 8 b
I'm vaguely aware of Oracle's OVER() analytic functions, which look to be the right direction, but I don't know what this problem is called, much less how to solve it.

There is probably an easier way, but this may help you get started. I ran it on Postgres, but it should work (maybe with a minor tweak) on Oracle. The innermost query puts the previous value on each row. We can use that to detect a group change (when value does not equal the previous value). Every time there is a group change, we flag it with a "1". Summing these flags as a running total gives a group id which increments every time the value changes. Then we can perform a normal GROUP BY.
create table x(id int, value varchar(1));
insert into x values(1, 'a');
insert into x values(2, 'b');
insert into x values(3, 'b');
insert into x values(4, 'b');
insert into x values(5, 'a');
insert into x values(6, 'a');
insert into x values(7, 'b');
insert into x values(8, 'b');
SELECT MIN(id), MAX(id), value
FROM ( SELECT id
,value
,previous_value
,SUM( CASE WHEN value = previous_value THEN 0 ELSE 1 END ) OVER(ORDER BY id) AS group_id
FROM ( SELECT id
,value
,COALESCE( LAG(value) OVER(ORDER BY id), value ) previous_value
FROM x
ORDER BY id
) y
) z
GROUP BY group_id, value
ORDER BY 1, 2;
min | max | value
-----+-----+-------
1 | 1 | a
2 | 4 | b
5 | 6 | a
7 | 8 | b
(4 rows)
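Another common way to form the groups is the difference of two row numbers: one numbered over all rows and one numbered within each value. Inside a run of equal values both counters advance together, so their difference stays constant. This is a sketch of that alternative, not the approach of the answer above. It is written with Postgres/SQL Server syntax in mind, and x_demo, grp, min_id and max_id are illustrative names.

```sql
CREATE TABLE x_demo (id INT, value VARCHAR(1));
INSERT INTO x_demo VALUES
(1, 'a'), (2, 'b'), (3, 'b'), (4, 'b'),
(5, 'a'), (6, 'a'), (7, 'b'), (8, 'b');

SELECT MIN(id) AS min_id, MAX(id) AS max_id, value
FROM (
    SELECT id, value,
           -- overall position minus position within the same value:
           -- constant for consecutive rows that share a value
           ROW_NUMBER() OVER (ORDER BY id)
         - ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS grp
    FROM x_demo
) t
GROUP BY value, grp
ORDER BY min_id;
```

For the sample rows the differences are 0 for id 1, 1 for ids 2 to 4, 3 for ids 5 and 6, and 3 for ids 7 and 8. Grouping by value together with grp keeps the last two runs apart, which gives the same four rows as above.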
