MYSQL query to get 'n' rows nearby given row


I have a MySQL table named 'videos', in which one of the columns is 'cat' (INT) and 'id' is the PRIMARY KEY.
If 'x' is the id of a row and 'n' is the category id, I need to get the 15 rows nearest to 'x'.
Case 1: There are many rows in the category before and after 'x'. Just get 7 rows before and 7 rows after 'x':
SELECT * FROM videos WHERE cat=n AND id<x ORDER BY id DESC LIMIT 0,7
SELECT * FROM videos WHERE cat=n AND id>x ORDER BY id LIMIT 0,7
Case 2: 'x' is near the beginning/end of the table. Print all the rows before/after 'x' (suppose there are 'y' of them) and then print 15-y rows after/before 'x'.
Case 1 is not a problem, but I am stuck on Case 2. Is there a generic method to get the 'p' rows nearest to a row 'x'?

This query always positions N (the exact id match) at the centre of the data, unless there are not enough rows in one direction, in which case extra rows are taken from the other side as required, while still keeping as many rows as are available on the short side.
set @n := 28;
SELECT * FROM
(
SELECT * FROM
(
(SELECT v.*, 0 as prox FROM videos v WHERE cat=1 AND id = @n)
union all
(SELECT v.*, @rn1:=@rn1+1 FROM (select @rn1:=0) x, videos v WHERE cat=1 AND id < @n ORDER BY id DESC LIMIT 15)
union all
(SELECT v.*, @rn2:=@rn2+1 FROM (select @rn2:=0) y, videos v WHERE cat=1 AND id > @n ORDER BY id LIMIT 15)
) z
ORDER BY prox
LIMIT 15
) w
order by id
For example, if you had 30 ids for cat=1 and were looking at item #28, it would show items 16 through 30, with #28 as the 3rd row from the bottom.
Some explanation:
SELECT v.*, 0 as prox FROM videos v WHERE cat=1 AND id = @n
v.* means to select all columns in the table/alias v. In this case, v is the alias for the table videos.
0 as prox means to create a column named prox, and it will contain just the value 0
The next query:
SELECT v.*, @rn1:=@rn1+1 FROM (select @rn1:=0) x, videos v WHERE cat=1 AND id < @n ORDER BY id DESC LIMIT 15
v.* - as above
@rn1:=@rn1+1 uses a variable to return a sequence number for each record in this subquery. It starts at 1 and, following the ORDER BY id DESC, each subsequent record is numbered 2, then 3, etc.
(select @rn1:=0) x creates a subquery aliased as x; all it does is initialise the variable @rn1 to 0, so that the first row is numbered 1.
The end result is that the variables and 0 as prox rank each row by how close it is to the value @n. The clause order by prox limit 15 takes the 15 rows that are closest to N.
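The same ranking idea can also be done client-side. Below is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for MySQL; the function name nearby_rows and the 30-row demo table are assumptions made for illustration, not part of the answer above. It fetches up to p rows on each side, numbers them by proximity (the prox column above), and keeps the p closest.

```python
import sqlite3


def nearby_rows(conn, cat, x, p=15):
    """Return up to p ids in category `cat` closest (by rank) to id `x`, sorted by id."""
    exact = [r[0] for r in conn.execute(
        "SELECT id FROM videos WHERE cat = ? AND id = ?", (cat, x))]
    before = [r[0] for r in conn.execute(
        "SELECT id FROM videos WHERE cat = ? AND id < ? ORDER BY id DESC LIMIT ?",
        (cat, x, p))]
    after = [r[0] for r in conn.execute(
        "SELECT id FROM videos WHERE cat = ? AND id > ? ORDER BY id ASC LIMIT ?",
        (cat, x, p))]
    # prox is 0 for the exact match, then 1, 2, 3, ... moving away on each side.
    ranked = [(0, i) for i in exact]
    ranked += [(k, i) for k, i in enumerate(before, start=1)]
    ranked += [(k, i) for k, i in enumerate(after, start=1)]
    ranked.sort()  # closest first; ties are broken by the smaller id
    return sorted(i for _, i in ranked[:p])


# Hypothetical demo data: ids 1..30, all in category 1.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, cat INTEGER)")
demo_conn.executemany("INSERT INTO videos (id, cat) VALUES (?, 1)",
                      [(i,) for i in range(1, 31)])

print(nearby_rows(demo_conn, 1, 28))  # ids 16 through 30
```

Unlike the single-statement version, this issues three small queries, but it avoids assigning user variables inside a SELECT.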

Related

SQL - Extracting an ID range for a packet of records

I have a table with about 40,000,000 records, where MIN(id) = 2 and MAX(id) = 80,000,000.
I would like to create an automated script that runs in a loop.
But I don't want to create about 80 iterations, because some of them will be empty.
How can I find the MIN(id) and MAX(id) range for the first iteration, and then for each following one?
I used mod, but it doesn't work correctly:
SELECT MIN(ID), MAX(ID)
FROM (
SELECT mod(id,45), id FROM table
WHERE mod(id,45) = 0
GROUP BY mod(id,45), id
ORDER BY id desc
)
What I want is:
the first iteration covers a range of 1 million records: min(id) = 2, max(id) = 1 500 000
the second iteration covers a range of 1 million records: min(id) = 1 550 000, max(id) = 5 000 000
and so on

This should be easy in any DBMS that supports ordered numbering of rows.
Here is an example for Db2.
Each generated SELECT returns 2 rows, except the last one, which may return fewer.
SELECT 'SELECT * FROM MYTAB WHERE I BETWEEN ' || MIN (I) || ' AND ' || MAX (I) AS STMT
FROM
(
SELECT I, (ROW_NUMBER () OVER (ORDER BY I) - 1) / 2 AS RN_
FROM (VALUES 1, 9, 2, 7, 4) MYTAB (I)
) G
GROUP BY RN_
The result is:
STMT
SELECT * FROM MYTAB WHERE I BETWEEN 1 AND 2
SELECT * FROM MYTAB WHERE I BETWEEN 4 AND 7
SELECT * FROM MYTAB WHERE I BETWEEN 9 AND 9
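To try the same grouping outside Db2, here is a rough sketch in Python with the standard sqlite3 module (window functions need SQLite 3.25 or newer). The helper name id_batches is made up for this example; the table and values mirror the Db2 sample above.

```python
import sqlite3


def id_batches(conn, table, column, batch_size):
    """Return (min_id, max_id) pairs, each covering at most batch_size existing rows."""
    # table and column are interpolated into the SQL text,
    # so they must be trusted identifiers, never user input.
    sql = f"""
        SELECT MIN({column}), MAX({column})
        FROM (
            SELECT {column},
                   (ROW_NUMBER() OVER (ORDER BY {column}) - 1) / ? AS rn
            FROM {table}
        )
        GROUP BY rn
        ORDER BY rn
    """
    # Integer division puts rows 1..batch_size in group 0, the next ones in group 1, etc.
    return conn.execute(sql, (batch_size,)).fetchall()


batch_conn = sqlite3.connect(":memory:")
batch_conn.execute("CREATE TABLE mytab (i INTEGER PRIMARY KEY)")
batch_conn.executemany("INSERT INTO mytab (i) VALUES (?)",
                       [(1,), (9,), (2,), (7,), (4,)])

print(id_batches(batch_conn, "mytab", "i", 2))  # [(1, 2), (4, 7), (9, 9)]
```

For the table in the question, batch_size would be 1000000, and each pair can be fed to a WHERE id BETWEEN ... AND ... clause in the loop.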

How to update each column one at time for each row in snowflake

Suppose I have 10 columns in my table and I want to update each column, but only one column per row, for up to 10 rows.
If the table looks like
1,2,3
4,5,6
7,8,9
I want to update it like
x,2,3
4,y,6
7,8,z
The number of columns can vary, so I need a dynamic approach. Sometimes I also need to exclude some columns.
I tried to see if I could update a row based on a row id, but no row id is available. I don't want to change the design of the table to include a counter column.

You can use a window function to assign a row number and update based on that:
update tablename
set col1 = case when r.rn = 1 then <updatevalue> else col1 end
, col2 = case when r.rn = 2 then <updatevalue> else col2 end
, col3 = case when r.rn = 3 then <updatevalue> else col3 end
, ...
from (
select id, row_number() over (order by id) as rn
from tablename
) r
where tablename.id = r.id;

The requirement "Columns can be of any count so need dynamic approach" looks like an attempt to store a matrix as a table.
An alternative approach would be to use the ARRAY type and store the entire structure as a single "cell" in the table.
CREATE OR REPLACE TABLE t
AS
SELECT ARRAY_CONSTRUCT(ARRAY_CONSTRUCT(1,2,3),
ARRAY_CONSTRUCT(4,5,6),
ARRAY_CONSTRUCT(7,8,9)) c
UNION ALL
SELECT ARRAY_CONSTRUCT(ARRAY_CONSTRUCT(10,20,30),
ARRAY_CONSTRUCT(40,50,60),
ARRAY_CONSTRUCT(70,80,90)) c;
SELECT *
FROM t;
/*
C
[[1,2,3],[4,5,6],[7,8,9]]
[[10,20,30],[40,50,60],[70,80,90]]
*/
Accessing elements:
SELECT c[0][0], c[0][1], c[0][2],
c[1][0], c[1][1], c[1][2],
c[2][0], c[2][1], c[2][2]
FROM t;
/*
C[0][0] C[0][1] C[0][2] C[1][0] C[1][1] C[1][2] C[2][0] C[2][1] C[2][2]
1 2 3 4 5 6 7 8 9
10 20 30 40 50 60 70 80 90
*/
Update:
UPDATE t
SET c = ARRAY_CONSTRUCT(ARRAY_CONSTRUCT('x' , c[0][1], c[0][2])
,ARRAY_CONSTRUCT(c[1][0], 'y' ,c[1][2])
,ARRAY_CONSTRUCT(c[2][0], c[2][1] , 'z' )
);
SELECT * FROM t;
/*
C
[["x",2,3],[4,"y",6],[7,8,"z"]]
[["x",20,30],[40,"y",60],[70,80,"z"]]
*/
More robust transformations could be performed via user-defined functions.

Select random sample of N rows from Oracle SQL query result

I want to reduce the number of rows exported from a query result. I have had no luck adapting the accepted solution posted on this thread.
My query looks as follows:
select
round((to_date('2019-12-31') - date_birth) / 365, 0) as age
from
personal_info a
where
exists
(
select b.person_id from credit_info b where credit_type = 'C' and a.person_id = b.person_id
)
;
This query returns way more rows than I need, so I was wondering if there's a way to use sample() to select a fixed number of rows (not a percentage) from however many rows result from this query.

You can sample your data by ordering randomly with DBMS_RANDOM.RANDOM and then fetching the first N rows.
select round((to_date('2019-12-31') - date_birth) / 365, 0) as age
from personal_info a
where exists ( select b.person_id from credit_info b where credit_type = 'C' and a.person_id = b.person_id )
order by DBMS_RANDOM.RANDOM
fetch first 250 rows only
Edit: for Oracle 11g and earlier:
select * from (
select round((to_date('2019-12-31') - date_birth) / 365, 0) as age
from personal_info a
where exists ( select b.person_id from credit_info b where credit_type = 'C' and a.person_id = b.person_id )
order by DBMS_RANDOM.RANDOM
)
where rownum <= 250
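The "order randomly, then keep the first N" pattern is not specific to Oracle. As a small sketch, here it is in Python with the standard sqlite3 module, where RANDOM() and LIMIT play the roles of DBMS_RANDOM.RANDOM and the row-limiting clause. The function name random_sample and the 1000-row demo table are assumptions for illustration only.

```python
import sqlite3


def random_sample(conn, n):
    """Return n person_ids picked at random (or all rows if there are fewer than n)."""
    rows = conn.execute(
        "SELECT person_id FROM personal_info ORDER BY RANDOM() LIMIT ?", (n,))
    return [r[0] for r in rows]


# Hypothetical demo data: 1000 people with ids 1..1000.
sample_conn = sqlite3.connect(":memory:")
sample_conn.execute("CREATE TABLE personal_info (person_id INTEGER PRIMARY KEY)")
sample_conn.executemany("INSERT INTO personal_info (person_id) VALUES (?)",
                        [(i,) for i in range(1, 1001)])

sample = random_sample(sample_conn, 250)
print(len(sample))  # 250
```

Note that this sorts the whole result set before cutting it down, so on very large results it can be slow.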

You can use fetch first to return a fixed number of rows. Just add:
fetch first 100 rows only
to the end of your query.
If you want these sampled in some fashion, you need to explain what type of sampling you want.

If you are using Oracle 12c or later, you can use the row-limiting clause as below:
select
round((to_date('2019-12-31') - date_birth) / 365, 0) as age
from
personal_info a
where
exists
(
select b.person_id from credit_info b where credit_type = 'C' and a.person_id = b.person_id
)
FETCH NEXT 5 ROWS ONLY;
Instead of 5, you can use any number you want.

Find overlapping range in PL/SQL

Sample data below
id start end
a 1 3
a 5 6
a 8 9
b 2 4
b 6 7
b 9 10
c 2 4
c 6 7
c 9 10
I'm trying to come up with a query that will return all the overlapping ranges (start and end inclusive) shared by a, b, and c (but extendable to more ids). The expected output looks like this:
start end
2 3
6 6
9 9
The only way I can picture this is with a custom aggregate function that tracks the current valid intervals and then computes the new intervals during the iterate phase. However, I can't see this approach being practical when working with large datasets. So if some bright mind out there has a query or some built-in function that I'm not aware of, I would greatly appreciate the help.

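You can do this using a self-join with GREATEST and LEAST. This version handles the two ids "a" and "b" and assumes that the ranges within each id do not overlap one another (START is a reserved word in Oracle, so the column names are quoted):
select greatest(ta."START", tb."START") as "START",
least(ta."END", tb."END") as "END"
from t ta join
t tb
on ta."START" <= tb."END" and ta."END" >= tb."START" and
ta.id = 'a' and tb.id = 'b';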

This is a lot uglier and more complex than Gordon's solution, but I think it matches the expected output more closely and should extend to work with more ids:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
),
SEQS(N,START_RANK,END_RANK) AS (
SELECT N,
CASE WHEN IS_START=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_START ORDER BY N) ELSE 0 END START_RANK, --ASSIGN A RANK TO EACH RANGE START
CASE WHEN IS_END=1 THEN ROW_NUMBER() OVER (PARTITION BY IS_END ORDER BY N) ELSE 0 END END_RANK --ASSIGN A RANK TO EACH RANGE END
FROM (
SELECT N,
CASE WHEN NVL(LAG(N) OVER (ORDER BY N),N) + 1 <> N THEN 1 ELSE 0 END IS_START, --MARK N AS A RANGE START
CASE WHEN NVL(LEAD(N) OVER (ORDER BY N),N) -1 <> N THEN 1 ELSE 0 END IS_END /* MARK N AS A RANGE END */
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
) WHERE IS_START + IS_END > 0
)
SELECT STARTS.N "START",ENDS.N "END" FROM SEQS STARTS
JOIN SEQS ENDS ON (STARTS.START_RANK=ENDS.END_RANK AND STARTS.N <= ENDS.N) ORDER BY "START"; --MATCH CORRESPONDING RANGE START/END VALUES
First we generate all the numbers between the smallest start value and the largest end value.
Then we find the numbers that are included in all the provided "id" ranges by joining our generated numbers to the ranges, and selecting each number "n" that appears once for each "id".
Then we determine whether each of these values "n" starts or ends a range. To determine that, for each N we say:
If the previous value of N does not exist or is not 1 less than current N, current N starts a range. If the next value of N does not exist or is not 1 greater than current N, current N ends a range.
Next, we assign a "rank" to each start and end value so we can match them up.
Finally, we self-join where the ranks match (and where the start <= the end) to get our result.
EDIT: After some searching, I came across this question which shows a better way to find the start/ends and refactored the query to:
WITH NUMS(N) AS ( --GENERATE NUMBERS N FROM THE SMALLEST START VALUE TO THE LARGEST END VALUE
SELECT MIN("START") N FROM T
UNION ALL
SELECT N+1 FROM NUMS WHERE N < (SELECT MAX("END") FROM T)
)
SELECT MIN(N) "START",MAX(N) "END" FROM (
SELECT N,ROW_NUMBER() OVER (ORDER BY N)-N GRP_ID
FROM (
SELECT DISTINCT N FROM ( --GET THE SET OF NUMBERS N THAT ARE INCLUDED IN ALL ID RANGES
SELECT NUMS.*,T.*,COUNT(*) OVER (PARTITION BY N) N_CNT,COUNT(DISTINCT "ID") OVER () ID_CNT
FROM NUMS
JOIN T ON (NUMS.N >= T."START" AND NUMS.N <= T."END")
) WHERE N_CNT=ID_CNT
)
)
GROUP BY GRP_ID ORDER BY "START";
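For anyone who wants to check the logic outside the database, the steps of the refactored query can be sketched in plain Python. This is only an illustration under the same assumption of integer, inclusive bounds; the function name common_ranges is made up here. It compares the set of ids covering each number instead of comparing counts, and it groups consecutive numbers with the same "value minus position" trick as ROW_NUMBER() - N.

```python
from itertools import groupby


def common_ranges(rows):
    """rows: list of (id, start, end) tuples with inclusive integer bounds.

    Returns the (start, end) ranges covered by every id.
    """
    if not rows:
        return []
    all_ids = {r[0] for r in rows}
    lo = min(r[1] for r in rows)
    hi = max(r[2] for r in rows)
    # Keep each number n that falls inside at least one range of every id.
    covered = [n for n in range(lo, hi + 1)
               if {i for i, s, e in rows if s <= n <= e} == all_ids]
    # Consecutive numbers share the same value of (n - position).
    result = []
    for _, grp in groupby(enumerate(covered), key=lambda p: p[1] - p[0]):
        run = [n for _, n in grp]
        result.append((run[0], run[-1]))
    return result


sample_rows = [
    ("a", 1, 3), ("a", 5, 6), ("a", 8, 9),
    ("b", 2, 4), ("b", 6, 7), ("b", 9, 10),
    ("c", 2, 4), ("c", 6, 7), ("c", 9, 10),
]
print(common_ranges(sample_rows))  # [(2, 3), (6, 6), (9, 9)]
```

Like the SQL version, this enumerates every integer between the smallest start and the largest end, so it suits small ranges better than very wide ones.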

detect gaps in integer sequence

Intention: detect whether a numeric sequence contains gaps. No need to identify the missing elements, just flag (true / false) the sequence if it contains gaps.
CREATE TABLE foo(x INTEGER);
INSERT INTO foo(x) VALUES (1), (2), (4);
Below is my (apparently correctly functioning) query to detect gaps:
WITH cte AS
(SELECT DISTINCT x FROM foo)
SELECT
( (SELECT COUNT(*) FROM cte a
CROSS JOIN cte b
WHERE b.x=a.x-1)
=(SELECT COUNT(*)-1 FROM cte))
OR (NOT EXISTS (SELECT 1 FROM cte))
where the OR is needed for the edge case where the table is empty. The query's logic is based on the observation that in a contiguous sequence the number of links equals the number of elements minus 1.
Is there anything more idiomatic or performant? (Should I be worried by the CROSS JOIN for particularly long sequences?)

Try this:
SELECT
CASE WHEN ((MAX(x)-MIN(x)+1 = COUNT(DISTINCT X)) OR
(COUNT(DISTINCT X) = 0) )
THEN 'TRUE'
ELSE 'FALSE'
END
FROM foo

The following should detect whether or not there are gaps:
select (case when max(x) - min(x) + 1 = count(distinct x)
then 'No Gaps'
else 'Some Gaps'
end)
from foo;
If there are no gaps, then the number of distinct values of x equals the max minus the min plus 1. Because the count is over distinct values, duplicates do not affect the result.
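Here is a quick way to try this check, sketched in Python with the standard sqlite3 module. The helper name has_gaps is an assumption for this example, and the empty-table case is treated as "no gaps", as in the question.

```python
import sqlite3


def has_gaps(values):
    """Return True if the integers in `values` do not form a contiguous run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (x INTEGER)")
    conn.executemany("INSERT INTO foo (x) VALUES (?)", [(v,) for v in values])
    # Contiguous means: empty table, or max - min + 1 equals the distinct count.
    (contiguous,) = conn.execute(
        "SELECT (COUNT(DISTINCT x) = 0) "
        "OR (MAX(x) - MIN(x) + 1 = COUNT(DISTINCT x)) FROM foo"
    ).fetchone()
    conn.close()
    return not contiguous


print(has_gaps([1, 2, 4]))     # True
print(has_gaps([1, 2, 2, 3]))  # False
```

The query makes a single pass over the table with no self-join, which is why it tends to scale better than the CROSS JOIN version.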

A different approach...
If you subtract your min value from the max value, and add 1, you should equal the count.
if count = (max-min)+1 then "no gaps!"
If you can express that in SQL, it should be very efficient.

SELECT 'Has ' || (count(*) - 1) || ' gaps.' AS gaps
FROM foo f1
LEFT JOIN foo f2 ON f2.x = f1.x + 1
WHERE f2.x IS NULL;
The trick is to count rows, where the next row is missing - which only happens for the last row(s) if there are no gaps.
If there are no rows, you get 'Has -1 gaps.'.
If there are no gaps, you get 'Has 0 gaps.'.
Else you get 'Has n gaps.' .. n being the exact number of gaps, no matter how big.
Duplicates can inflate the count, including duplicates of the largest value, so deduplicate first (for example with SELECT DISTINCT x) if x is not unique.
