How to select pairs of rows from a table in SQLite?

I have the following schema:
CREATE TABLE video_segments (
video_id TEXT,
segment_num INTEGER,
data BLOB
)
I want to select 100 pairs of rows from the table, such that each pair contains 2 segments from the same video, and no 2 pairs are from the same video.
What would be the best query for this?
I tried doing:
WITH CTE AS (
SELECT video_id
FROM video_segments
GROUP BY video_id
HAVING COUNT(*) >= 2
)
SELECT *
FROM video_segments
WHERE video_id IN (
SELECT video_id
FROM CTE
ORDER BY RANDOM()
LIMIT 100
)
ORDER BY RANDOM()
LIMIT 200;
but this did not work. Because the final ORDER BY RANDOM() LIMIT 200 picks rows at random from all segments of the chosen videos, sometimes there would be 0 rows with a given video id, and sometimes there would be more than 2.
Sample data:
video_id | segment_num | data
foo | 0 | <bin>
foo | 1 | <bin>
foo | 2 | <bin>
foo | 3 | <bin>
bar | 0 | <bin>
bar | 1 | <bin>
baz | 0 | <bin>
baz | 1 | <bin>
baz | 2 | <bin>
Let's say I wanted to select 3 random pairs. A valid result might be:
foo 0 <bin>
foo 2 <bin>
bar 0 <bin>
bar 1 <bin>
baz 0 <bin>
baz 2 <bin>
The results should be random, so the result should be different each time.

Something like this should do:
The fiddle
Assume the videos with vid <= 10 are the ones chosen, and we wish to select 2 sids per selected vid:
Randomly assign a row number to each sid within each vid separately.
Then pick the first 2 row numbers (ord < 3) for each of those vids.
Note: The first CTE term (the recursive one) only creates test data for segments.
WITH segments (sid, vid) AS (
SELECT 0 , 1 UNION ALL
SELECT sid+1, 1 + (sid+1)/10 FROM segments WHERE sid < 100
)
, cte1 AS (
SELECT t.*, row_number() OVER (PARTITION BY vid ORDER BY random()) AS ord
FROM segments AS t
WHERE vid <= 10
)
SELECT * FROM cte1
WHERE ord < 3
ORDER BY vid, ord
;
Now applying that to your schema and initial logic, we have something like this:
The updated fiddle
WITH cte AS (
SELECT video_id
FROM video_segments
GROUP BY video_id
HAVING COUNT(*) >= 2
)
, cte1 AS (
SELECT *, row_number() OVER (PARTITION BY video_id ORDER BY random()) AS ord
FROM video_segments
WHERE video_id IN (
SELECT video_id
FROM cte
ORDER BY RANDOM()
LIMIT 100
)
)
SELECT * FROM cte1
WHERE ord < 3
ORDER BY video_id, ord
;
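To try the applied query locally, here is a minimal sketch using Python's standard-library sqlite3 module (SQLite 3.25 or newer is needed for window functions). It loads the question's sample rows with NULL data, plus a hypothetical single-segment video `qux` that should never be picked, and turns the hard-coded LIMIT 100 into a bound parameter. The helpers `make_db` and `pick_pairs` are names chosen for this sketch.

```python
import sqlite3

# The "applied" query from the answer above. Only video_id and segment_num are
# selected, and LIMIT 100 becomes a bound parameter so it can be varied.
PAIRS_SQL = """
WITH cte AS (
  SELECT video_id
  FROM video_segments
  GROUP BY video_id
  HAVING COUNT(*) >= 2
)
, cte1 AS (
  SELECT *, row_number() OVER (PARTITION BY video_id ORDER BY random()) AS ord
  FROM video_segments
  WHERE video_id IN (
    SELECT video_id
    FROM cte
    ORDER BY RANDOM()
    LIMIT ?
  )
)
SELECT video_id, segment_num FROM cte1
WHERE ord < 3
ORDER BY video_id, ord
"""


def make_db():
    """In-memory copy of the question's sample data (data left NULL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE video_segments (video_id TEXT, segment_num INTEGER, data BLOB)"
    )
    rows = (
        [("foo", n) for n in range(4)]
        + [("bar", n) for n in range(2)]
        + [("baz", n) for n in range(3)]
        + [("qux", 0)]  # hypothetical video with a single segment
    )
    conn.executemany(
        "INSERT INTO video_segments (video_id, segment_num) VALUES (?, ?)", rows
    )
    return conn


def pick_pairs(conn, n_videos):
    """Return (video_id, segment_num) rows: 2 random segments per chosen video."""
    return conn.execute(PAIRS_SQL, (n_videos,)).fetchall()


for video_id, segment_num in pick_pairs(make_db(), 3):
    print(video_id, segment_num)
```

Each run prints a different selection, but every chosen video contributes exactly two distinct segments.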
Fiddle updated with the new data
Result 1:
video_id | segment_num | data | ord
bar | 1 | null | 1
bar | 0 | null | 2
baz | 0 | null | 1
baz | 2 | null | 2
foo | 1 | null | 1
foo | 0 | null | 2
Result 2:
video_id | segment_num | data | ord
bar | 1 | null | 1
bar | 0 | null | 2
baz | 1 | null | 1
baz | 0 | null | 2
foo | 2 | null | 1
foo | 3 | null | 2
etc.
The larger set of data generated in the first fiddle test case shows the random behavior a bit better.

Use the window function ROW_NUMBER() with random ordering:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY video_id ORDER BY RANDOM()) rn,
COUNT(*) OVER (PARTITION BY video_id) cnt
FROM video_segments
)
SELECT video_id, segment_num, data
FROM cte
WHERE cnt >= 2 AND rn <= 2
ORDER BY video_id;
See the demo.
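The query above returns one random pair for every video that has at least 2 segments, rather than a fixed number of pairs. If you need exactly 100 pairs (assuming at least 100 videos qualify), one option is to add a CTE that keeps n random eligible videos. The sketch below does that with Python's sqlite3; the `picked` CTE, the helper `sample_pairs`, and the single-segment video `qux` are assumptions added for illustration.

```python
import sqlite3

# ROW_NUMBER()/COUNT() query from the answer above, plus a "picked" CTE
# (added for this sketch) that keeps only n random eligible videos.
CAPPED_SQL = """
WITH cte AS (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY video_id ORDER BY RANDOM()) AS rn,
    COUNT(*) OVER (PARTITION BY video_id) AS cnt
  FROM video_segments
), picked AS (
  -- each video has exactly one rn = 1 row, so each eligible video appears once
  SELECT video_id
  FROM cte
  WHERE cnt >= 2 AND rn = 1
  ORDER BY RANDOM()
  LIMIT ?
)
SELECT video_id, segment_num, data
FROM cte
WHERE cnt >= 2 AND rn <= 2
  AND video_id IN (SELECT video_id FROM picked)
ORDER BY video_id
"""


def sample_pairs(conn, n_videos):
    """Return up to n_videos random pairs; both rows of a pair share a video."""
    return conn.execute(CAPPED_SQL, (n_videos,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE video_segments (video_id TEXT, segment_num INTEGER, data BLOB)"
)
conn.executemany(
    "INSERT INTO video_segments (video_id, segment_num) VALUES (?, ?)",
    [("foo", 0), ("foo", 1), ("foo", 2), ("foo", 3),
     ("bar", 0), ("bar", 1),
     ("baz", 0), ("baz", 1), ("baz", 2),
     ("qux", 0)],  # qux: hypothetical single-segment video, never eligible
)
print(sample_pairs(conn, 2))
```

Even if SQLite evaluates `cte` separately for `picked` and for the final SELECT, each evaluation has exactly one `rn = 1` row and two `rn <= 2` rows per eligible video, so the result is still two distinct segments from each of n videos.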

Related

How to sample from different values in a column but only return records that are unique from another column?

I am struggling with a sampling issue using Teradata
Below is the format of the data
ID Group Rank
1 dog 1
1 cat 1
1 lion 1
1 elephant 2
2 dog 1
2 cat 1
2 lion 1
2 elephant 1
3 dog 1
3 cat 2
3 lion 1
3 elephant 1
4 dog 2
4 cat 1
4 lion 1
4 elephant 1
...
Ideally, I would like to return a sample of a given size for each value in Group, but with each ID appearing only once.
Below is the query I currently have, but it returns duplicate IDs:
SELECT ID, Group FROM Table
WHERE rank = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END
with cte as
(
SELECT ID, Group,
random(1,10000) as rnd -- RANDOM can't be directly used in OLAP-functions
FROM Table
WHERE rank = 1
)
SELECT ID, Group
FROM cte
QUALIFY
ROW_NUMBER() -- get one random row per ID
OVER (PARTITION BY ID
ORDER BY rnd) = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END
Assuming you have enough records, choose a random row for each id and then choose the appropriate numbers from that:
select t.*
from (select t.*,
row_number() over (partition by group order by seqnum) as seqnum_g
from (select t.*,
row_number() over (partition by id order by random(1, 1000000)) as seqnum
from t
) t
where seqnum = 1
) t
where (group in ('dog', 'cat') and seqnum_g <= 10) or
(group in ('elephant', 'lion') and seqnum_g <= 5) ;
This doesn't guarantee that the groups will be big enough in the result set. But if you have enough data relative to the size of the groups, then it should work.
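Here is a sketch of the same two-step idea (one random row per ID, then a per-group quota) in SQLite through Python's sqlite3. It is not Teradata syntax: SQLite's RANDOM() takes no arguments, and the table `t(id, grp, rnk)` and the `quotas(grp, quota)` lookup table are hypothetical stand-ins for the question's `Table(ID, Group, Rank)` and its SAMPLE WHEN sizes.

```python
import sqlite3

# Step 1: one random rank-1 row per id. Step 2: number rows within each group
# at random and keep at most that group's quota.
SAMPLE_SQL = """
WITH one_per_id AS (
  SELECT id, grp,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY RANDOM()) AS seqnum
  FROM t
  WHERE rnk = 1
), numbered AS (
  SELECT id, grp,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY RANDOM()) AS seqnum_g
  FROM one_per_id
  WHERE seqnum = 1
)
SELECT n.id, n.grp
FROM numbered AS n
JOIN quotas AS q ON q.grp = n.grp
WHERE n.seqnum_g <= q.quota
ORDER BY n.id
"""

# The question's sample rows: (ID, Group, Rank).
DATA = [
    (1, "dog", 1), (1, "cat", 1), (1, "lion", 1), (1, "elephant", 2),
    (2, "dog", 1), (2, "cat", 1), (2, "lion", 1), (2, "elephant", 1),
    (3, "dog", 1), (3, "cat", 2), (3, "lion", 1), (3, "elephant", 1),
    (4, "dog", 2), (4, "cat", 1), (4, "lion", 1), (4, "elephant", 1),
]


def sample_unique_ids(quotas):
    """Return (id, grp) rows: each id at most once, each grp within its quota."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INT, grp TEXT, rnk INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", DATA)
    conn.execute("CREATE TABLE quotas (grp TEXT, quota INT)")
    conn.executemany("INSERT INTO quotas VALUES (?, ?)", quotas.items())
    return conn.execute(SAMPLE_SQL).fetchall()


print(sample_unique_ids({"dog": 10, "cat": 10, "elephant": 5, "lion": 5}))
```

As the answer notes, a group can come out smaller than its quota when few IDs land in it.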

ROW_Number with Custom Group

I am trying to have row_number based on custom grouping but I am not able to produce it.
Below is my Query
CREATE TABLE mytbl (wid INT, id INT)
INSERT INTO mytbl Values(1,1),(2,1),(3,0),(4,2),(5,3)
Current Output
wid id
1 1
2 1
3 0
4 2
5 3
Query
SELECT *, RANK() OVER(PARTITION BY wid, CASE WHEN id = 0 THEN 0 ELSE 1 END ORDER BY ID)
FROM mytbl
I would like to rank the rows based on a custom condition: when ID is 0, a new group should start with the next non-0 ID.
Expected Output
wid id RN
1 1 1
2 1 1
3 0 1
4 2 2
5 3 2
Guessing here, as we don't have much clarification, but perhaps this:
SELECT wid,
id,
COUNT(CASE id WHEN 0 THEN 1 END) OVER (ORDER BY wid ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) +1 AS [Rank]
FROM mytbl ;
If I understand you correctly, you may use the following approach. Note that you need an ordering column (I assume this is the wid column):
Statement:
;WITH ChangesCTE AS (
SELECT
*,
CASE WHEN LAG(id) OVER (ORDER BY wid) = 0 THEN 1 ELSE 0 END AS ChangeIndex
FROM mytbl
), GroupsCTE AS (
SELECT
*,
SUM(ChangeIndex) OVER (ORDER BY wid) AS GroupIndex
FROM ChangesCTE
)
SELECT
wid,
id,
DENSE_RANK() OVER (ORDER BY GroupIndex) AS Rank
FROM GroupsCTE
Result:
wid id Rank
1 1 1
2 1 1
3 0 1
4 2 2
5 3 2
Without much clarification on the logic required, my understanding is that you want to increase the Rank by 1 after each row where id = 0:
select wid, id,
[Rank] = sum(case when id = 0 then 1 else 0 end) over(order by wid)
+ case when id <> 0 then 1 else 0 end
from mytbl
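To check the running-sum formula, here is a minimal Python sketch using sqlite3 (SQLite 3.25+ for window functions). It runs the same expression on the question's rows and on the longer data set used in the next answer; the alias `grp_rank` and the helper `custom_rank` are names chosen for this sketch.

```python
import sqlite3

# Running count of id = 0 rows up to and including the current row, plus 1 for
# non-zero rows: a 0 closes the current group and the next row opens a new one.
RANK_SQL = """
SELECT wid, id,
       SUM(CASE WHEN id = 0 THEN 1 ELSE 0 END) OVER (ORDER BY wid)
       + CASE WHEN id <> 0 THEN 1 ELSE 0 END AS grp_rank
FROM mytbl
ORDER BY wid
"""


def custom_rank(rows):
    """rows: (wid, id) pairs; returns the computed rank for each row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytbl (wid INT, id INT)")
    conn.executemany("INSERT INTO mytbl VALUES (?, ?)", rows)
    return [r[2] for r in conn.execute(RANK_SQL)]


print(custom_rank([(1, 1), (2, 1), (3, 0), (4, 2), (5, 3)]))  # → [1, 1, 1, 2, 2]
```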
Try this,
CREATE TABLE #mytbl (wid INT, id INT)
INSERT INTO #mytbl Values(1,1),(2,1),(3,0)
,(4,2),(5,3),(6,0),(7,4),(8,5),(9,6)
;with CTE as
(
select *,ROW_NUMBER()over(order by wid)rn
from #mytbl where id=0
)
,CTE1 as
(
select max(rn)+1 ExtraRN from CTE
)
select a.*, isnull(ca.rn, ca1.ExtraRN) as RN from #mytbl a
outer apply(select top 1 * from CTE b
where a.wid <= b.wid order by b.wid) ca
cross apply(select ExtraRN from CTE1)ca1
drop table #mytbl
Here neither OUTER APPLY nor CROSS APPLY will increase the cardinality estimate: each returns at most one row, so the query returns exactly one row per row of #mytbl.

How to get closest n rows for specific row in table?

I have a table foo with its primary key id and some other columns.
My goal is, for instance, to find the rows with id=3 and id=4 and the rows with id=6 and id=7 for the row with id=5, when I want the 2 closest previous and next rows.
If there is only one such row or none (e.g. for id=2 there is only one previous row), I would like to get only the rows that exist.
The problem is that some rows can be missing (there may be gaps in id).
Is there a common practice to make such queries?
I would try the following:
SELECT * FROM table WHERE id > ? ORDER BY id ASC LIMIT 2
followed by
SELECT * FROM table WHERE id < ? ORDER BY id DESC LIMIT 2
You may be able to combine the above into the following (databases such as MySQL and PostgreSQL require the parentheses when each branch has its own ORDER BY and LIMIT):
(SELECT * FROM table WHERE id > ? ORDER BY id ASC LIMIT 2)
UNION
(SELECT * FROM table WHERE id < ? ORDER BY id DESC LIMIT 2)
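Here is a minimal sketch of the two-LIMIT idea in SQLite via Python's sqlite3, using a hypothetical table `foo(id)` with gaps. SQLite does not accept ORDER BY or LIMIT on the individual members of a UNION, even in parentheses, so each half is wrapped in a subquery instead. The helper `neighbors` is a name chosen for this sketch.

```python
import sqlite3

# n closest previous and next rows around :target; gaps in id do not matter
# because each half takes the nearest n existing ids rather than a fixed range.
NEIGHBORS_SQL = """
SELECT id FROM (SELECT id FROM foo WHERE id < :target ORDER BY id DESC LIMIT :n)
UNION ALL
SELECT id FROM (SELECT id FROM foo WHERE id > :target ORDER BY id ASC LIMIT :n)
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO foo VALUES (?)", [(i,) for i in (1, 2, 4, 6, 7, 8, 11, 12)])


def neighbors(target_id, n):
    """Ids of up to n rows before and n rows after target_id (target excluded)."""
    rows = conn.execute(NEIGHBORS_SQL, {"target": target_id, "n": n})
    return [r[0] for r in rows]


print(neighbors(6, 2))  # → [2, 4, 7, 8]
```

The target id does not even need to exist: `neighbors(5, 1)` returns the nearest rows on either side of 5.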
I think this would fit your description.
Select * from table where id between #n-2 and #n+2 and id <> #n
One way is this:
with your_table(id) as(
select 1 union all
select 2 union all
select 4 union all
select 5 union all
select 10 union all
select 11 union all
select 12 union all
select 13 union all
select 14
)
select * from (
(select * from your_table where id <= 10 order by id desc limit 3+1)
union all
(select * from your_table where id > 10 order by id limit 3)
) t
order by id
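(Here 10 is the starting point and 3 is the number of rows n you want on each side; the first branch fetches 3+1 rows because it also includes the starting row.)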
This is a possible solution: number all the records, then fetch those whose row number is at most 2 greater or 2 lower than the row number of the selected ID.
create table foo(id int);
insert into foo values (1),(2),(4),(6),(7),(8),(11),(12);
-- using ID = 6
with rnum as
(
select id, row_number() over (order by id) rn
from foo
)
select *
from rnum
where rn >= (select rn from rnum where id = 6) - 2
and rn <= (select rn from rnum where id = 6) + 2;
id | rn
-: | -:
2 | 2
4 | 3
6 | 4
7 | 5
8 | 6
-- using ID = 2
with rnum as
(
select id, row_number() over (order by id) rn
from foo
)
select *
from rnum
where rn >= (select rn from rnum where id = 2) - 2
and rn <= (select rn from rnum where id = 2) + 2;
id | rn
-: | -:
1 | 1
2 | 2
4 | 3
6 | 4
dbfiddle here
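The same ROW_NUMBER() approach can be parameterized so that the target id and n are not hard-coded. This is a sketch in SQLite via Python's sqlite3 using the `foo` data from the answer; the helper `around` is a name chosen here. It reproduces the two outputs shown above.

```python
import sqlite3

# Number all rows by id, then keep rows whose number is within :n of the
# target row's number (the target row itself is included, as in the answer).
ROWNUM_SQL = """
WITH rnum AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
  FROM foo
)
SELECT id, rn
FROM rnum
WHERE rn BETWEEN (SELECT rn FROM rnum WHERE id = :target) - :n
             AND (SELECT rn FROM rnum WHERE id = :target) + :n
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INT)")
conn.executemany("INSERT INTO foo VALUES (?)", [(i,) for i in (1, 2, 4, 6, 7, 8, 11, 12)])


def around(target_id, n):
    """(id, rn) rows within n row numbers of target_id."""
    return conn.execute(ROWNUM_SQL, {"target": target_id, "n": n}).fetchall()


print(around(6, 2))  # → [(2, 2), (4, 3), (6, 4), (7, 5), (8, 6)]
```

Unlike the two-LIMIT version, this one returns nothing when the target id is absent, because the subquery yields NULL.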

SELECT records until new value SQL

I have a table
Val | Number
08 | 1
09 | 1
10 | 1
11 | 3
12 | 0
13 | 1
14 | 1
15 | 1
I need to return the last values where Number = 1 (however many there may be) until Number changes, but I do not need the first instances where Number = 1. Essentially I need to select back until Number changes to 0 (15, 14, 13).
Is there a proper way to do this in MSSQL?
Based on following:
I need to return the last values where Number = 1
Essentially I need to select back until Number changes to 0 (15, 14, 13)
Try this (Fiddle demo):
select val, number
from T
where val > (select max(val)
from T
where number<>1)
EDIT: to address all possible combinations (Fiddle demo 2)
;with cte1 as
(
select 1 id, max(val) maxOne
from T
where number=1
),
cte2 as
(
select 1 id, isnull(max(val),0) maxOther
from T
where val < (select maxOne from cte1) and number<>1
)
select val, number
from T cross join
(select maxOne, maxOther
from cte1 join cte2 on cte1.id = cte2.id
) X
where val>maxOther and val<=maxOne
I think you can use window functions, something like this:
with cte as (
-- generate two row_number to enumerate distinct groups
select
Val, Number,
row_number() over(partition by Number order by Val) as rn1,
row_number() over(order by Val) as rn2
from Table1
), cte2 as (
-- get groups with Number = 1 and last group
select
Val, Number,
rn2 - rn1 as rn1, max(rn2 - rn1) over() as rn2
from cte
where Number = 1
)
select Val, Number
from cte2
where rn1 = rn2
sql fiddle demo
DEMO: http://sqlfiddle.com/#!3/e7d54/23
DDL
create table T(val int identity(8,1), number int)
insert into T values
(1),(1),(1),(3),(0),(1),(1),(1),(0),(2)
DML
; WITH last_1 AS (
SELECT Max(val) As val
FROM t
WHERE number = 1
)
, last_non_1 AS (
SELECT Coalesce(Max(val), -937) As val
FROM t
WHERE EXISTS (
SELECT val
FROM last_1
WHERE last_1.val > t.val
)
AND number <> 1
)
SELECT t.val
, t.number
FROM t
CROSS
JOIN last_1
CROSS
JOIN last_non_1
WHERE t.val <= last_1.val
AND t.val > last_non_1.val
I know it's a little verbose, but I've deliberately kept it that way to illustrate the methodology:
1. Find the highest val where number = 1.
2. Among the rows whose val is less than the val found in step 1, find the largest val where number <> 1.
3. Finally, find the rows whose val falls within the range uncovered in steps 1 and 2 (greater than the step 2 value, up to and including the step 1 value).
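The three steps can be checked with a small SQLite sketch through Python's sqlite3, using the DDL data above (val starting at 8). One deliberate change, labeled as such: the fallback when no non-1 row precedes the last run is MIN(val) - 1 rather than the fixed -937. The helper `last_run_of_ones` is a name chosen for this sketch.

```python
import sqlite3

# Steps 1-3 from the answer. Fallback for "no non-1 row before the last run"
# is MIN(val) - 1 (a choice made for this sketch, not the answer's -937).
LAST_RUN_SQL = """
WITH last_1 AS (
  SELECT MAX(val) AS val FROM t WHERE number = 1          -- step 1
), last_non_1 AS (
  SELECT COALESCE(MAX(val), (SELECT MIN(val) - 1 FROM t)) AS val
  FROM t
  WHERE number <> 1 AND val < (SELECT val FROM last_1)    -- step 2
)
SELECT t.val, t.number                                    -- step 3
FROM t, last_1, last_non_1
WHERE t.val <= last_1.val AND t.val > last_non_1.val
ORDER BY t.val DESC
"""


def last_run_of_ones(numbers, first_val=8):
    """numbers[i] is the Number for val = first_val + i (as in the DDL above)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (val INT, number INT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(first_val + i, n) for i, n in enumerate(numbers)],
    )
    return conn.execute(LAST_RUN_SQL).fetchall()


print(last_run_of_ones([1, 1, 1, 3, 0, 1, 1, 1, 0, 2]))
# → [(15, 1), (14, 1), (13, 1)]
```

If every row has number = 1, the fallback makes step 3 return all rows; if no row has number = 1, the result is empty.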
select val, count (number) from
yourtable
group by val
having count(number) > 1
The having clause is the key here, giving you all the vals that have more than one value of 1.
This is a common approach for getting rows until some value changes. For your specific case, use DESC in the appropriate places.
Create a sample table:
select * into #tmp from
(select 1 as id, 'Alpha' as value union all
select 2 as id, 'Alpha' as value union all
select 3 as id, 'Alpha' as value union all
select 4 as id, 'Beta' as value union all
select 5 as id, 'Alpha' as value union all
select 6 as id, 'Gamma' as value union all
select 7 as id, 'Alpha' as value) t
Pull top rows until value changes:
with cte as (select * from #tmp t)
select * from
(select cte.*, ROW_NUMBER() over (order by id) rn from cte) OriginTable
inner join
(
select cte.*, ROW_NUMBER() over (order by id) rn from cte
where cte.value = (select top 1 cte.value from cte order by cte.id)
) OnlyFirstValueRecords
on OriginTable.rn = OnlyFirstValueRecords.rn and OriginTable.id = OnlyFirstValueRecords.id
On the left side we put the original table. On the right side we put only the rows whose value equals the value in the first row.
Rows in both tables line up until the target value changes. After row 3, the row numbers on the right are associated with different IDs because of the offset, so they never join with the original table:
LEFT RIGHT
ID Value RN ID Value RN
1 Alpha 1 | 1 Alpha 1
2 Alpha 2 | 2 Alpha 2
3 Alpha 3 | 3 Alpha 3
----------------------- result set ends here
4 Beta 4 | 5 Alpha 4
5 Alpha 5 | 7 Alpha 5
6 Gamma 6 |
7 Alpha 7 |
The ID must be unique, and both ROW_NUMBER() functions must order by this same ID.
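To see the join trick run end to end, here is a sketch that ports the #tmp example to SQLite via Python's sqlite3. The table name `tmp`, the CTE names, and the helper `rows_until_change` are choices made for this sketch; the logic is the same join on matching row number and id.

```python
import sqlite3

# Left: all rows numbered by id. Right: only rows with the first row's value,
# numbered by id. They agree on (rn, id) only until the value first changes.
UNTIL_CHANGE_SQL = """
WITH origin AS (
  SELECT id, value, ROW_NUMBER() OVER (ORDER BY id) AS rn
  FROM tmp
), first_value_rows AS (
  SELECT id, value, ROW_NUMBER() OVER (ORDER BY id) AS rn
  FROM tmp
  WHERE value = (SELECT value FROM tmp ORDER BY id LIMIT 1)
)
SELECT o.id, o.value
FROM origin AS o
JOIN first_value_rows AS f ON o.rn = f.rn AND o.id = f.id
ORDER BY o.id
"""


def rows_until_change(rows):
    """rows: (id, value) pairs with unique ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp (id INT, value TEXT)")
    conn.executemany("INSERT INTO tmp VALUES (?, ?)", rows)
    return conn.execute(UNTIL_CHANGE_SQL).fetchall()


sample = [(1, "Alpha"), (2, "Alpha"), (3, "Alpha"), (4, "Beta"),
          (5, "Alpha"), (6, "Gamma"), (7, "Alpha")]
print(rows_until_change(sample))  # → [(1, 'Alpha'), (2, 'Alpha'), (3, 'Alpha')]
```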

Second maximum and minimum values

Given a table with multiple rows of an int field that share the same identifier, is it possible to return the 2nd maximum and 2nd minimum values for each identifier?
The table consists of:
ID | number
------------------------
1 | 10
1 | 11
1 | 13
1 | 14
1 | 15
1 | 16
Final Result would be
ID | nMin | nMax
--------------------------------
1 | 11 | 15
You can use row_number to assign a ranking per ID. Then you can group by id and pick the rows with the ranking you're after. The following example picks the second lowest and third highest:
select id
, max(case when rnAsc = 2 then number end) as SecondLowest
, max(case when rnDesc = 3 then number end) as ThirdHighest
from (
select ID
, number
, row_number() over (partition by ID order by number) as rnAsc
, row_number() over (partition by ID order by number desc) as rnDesc
from YourTable -- replace with your table name
) as SubQueryAlias
group by
id
The max is just to pick out the one non-null value; you can replace it with min or even avg and it would not affect the outcome.
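Here is the same conditional-aggregation pattern, run in SQLite via Python's sqlite3 and set to pick the 2nd value from each end, which is what the question asks for. The table name `nums` and the extra IDs 2 and 3 are hypothetical test data; ID 1 uses the question's numbers.

```python
import sqlite3

# Two ROW_NUMBER() rankings per id (ascending and descending), then MAX(CASE ...)
# picks the single row ranked 2nd in each direction.
SECOND_SQL = """
SELECT id,
       MAX(CASE WHEN rn_asc = 2 THEN number END) AS n_min,
       MAX(CASE WHEN rn_desc = 2 THEN number END) AS n_max
FROM (
  SELECT id, number,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY number) AS rn_asc,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY number DESC) AS rn_desc
  FROM nums
) AS ranked
GROUP BY id
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (id INT, number INT)")
conn.executemany(
    "INSERT INTO nums VALUES (?, ?)",
    [(1, n) for n in (10, 11, 13, 14, 15, 16)] + [(2, 5), (2, 7), (3, 9)],
)
print(conn.execute(SECOND_SQL).fetchall())
# → [(1, 11, 15), (2, 7, 5), (3, None, None)]
```

With duplicate values, ROW_NUMBER() counts each copy, so for 10, 10, 11 the second lowest is 10; using DENSE_RANK() in both places would return 11 instead.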
This will work, but see caveats:
SELECT Id, number
INTO #T
FROM (
SELECT 1 ID, 10 number
UNION
SELECT 1 ID, 10 number
UNION
SELECT 1 ID, 11 number
UNION
SELECT 1 ID, 13 number
UNION
SELECT 1 ID, 14 number
UNION
SELECT 1 ID, 15 number
UNION
SELECT 1 ID, 16 number
) U;
WITH EX AS (
SELECT Id, MIN(number) MinNumber, MAX(number) MaxNumber
FROM #T
GROUP BY Id
)
SELECT #T.Id, MIN(number) nMin, MAX(number) nMax
FROM #T INNER JOIN
EX ON #T.Id = EX.Id
WHERE #T.number <> MinNumber AND #T.number <> MaxNumber
GROUP BY #T.Id
DROP TABLE #T;
If the maximum (or minimum) value occurs more than once, every copy is excluded, so the duplicate will not be picked up as the second value. Depending on your data, you could therefore get the wrong result.
You could select the next minimum value by using the following method:
SELECT MAX(Number)
FROM
(
SELECT top 2 (Number)
FROM table1 t1
WHERE ID = {MyNumber}
order by Number
)a
It only works if you can restrict the inner query to a single ID with a WHERE clause.
This would be a better way. I quickly put this together, but if you can combine the two queries, you will get exactly what you were looking for.
select *
from
(
select
myID,
myNumber,
row_number() over (partition by myID order by myNumber) as myRowNumber
from MyTable
) x
where x.myRowNumber = 2;
select *
from
(
select
myID,
myNumber,
row_number() over (partition by myID order by myNumber desc) as myRowNumber
from MyTable
) y
where y.myRowNumber = 2;
Let the table name be tblName:
select max(number) from tblName where number not in (select max(number) from tblName);
The same works for min; just replace max with min.
As I learned just today, one solution is to use LIMIT. You order the results so that the highest values are on top and limit the result to 2. Then you select from that subquery, order it the other way round, and take only the first row.
SELECT somefield FROM (
SELECT somefield FROM table
ORDER BY somefield DESC LIMIT 2) AS top2
ORDER BY somefield ASC LIMIT 1