Subquery: select only one column - SQL

I have this query that creates a pagination system. I want to SELECT only A.*; I don't want to show the row_number value even though I need it.
SELECT *
FROM (SELECT A.*, rownum row_number
      FROM (select * from dual) A
      WHERE rownum <= 10)
WHERE row_number >= 1
The result:
D ROW_NUMBER
- ----------
X 1
What I want:
D
-
X
Thanks for the help.

If your table has a primary key, you can perform the pagination filter on this key only and, in a second step, select the data based on the PK.
This will allow you to use SELECT *:
select *
from tab
where id in (
    SELECT id
    FROM (SELECT id, rownum row_number
          FROM (select id from tab) A
          WHERE rownum <= 10)
    WHERE row_number >= 1)
You'll pay a small performance penalty, as each selected row must additionally be accessed via the primary key index (but this will not be noticeable for 10 rows or so).
Another point about pagination is that you typically need to present the data in some order, not randomly as in your example.
In that case the innermost subquery will be
select id from tab order by <some column>
Here you benefit, as you need to sort only the PK and the sort key, not the whole row (but again, this will not be noticeable for 10 rows).
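Putting both points together, a sketch of the full query (sort_col stands in for whatever column you order by):
select *
from tab
where id in (
    SELECT id
    FROM (SELECT id, rownum row_number
          FROM (select id from tab
                order by sort_col) A
          WHERE rownum <= 10)
    WHERE row_number >= 1)
order by sort_col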

Related

How to skip/offset rows in Oracle database?

I am writing a very simple query for an Oracle DB (version 9).
Somehow I can get the first 5 rows:
select * from cities where rownum <= 5
But skipping 5 rows returns an empty result:
select * from cities where rownum >= 5
Using:
Oracle SQL Developer
Oracle DB version 9
Why is the second query returning an empty result?
In Oracle Database 12c (release 1) and above, you can do this very simply. To skip 5 rows:
SELECT * FROM T OFFSET 5 ROWS
and to skip 5 rows and take the next 15 rows:
SELECT * FROM T OFFSET 5 ROWS FETCH NEXT 15 ROWS ONLY
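Note that neither query has an ORDER BY, so which 5 rows get skipped is not deterministic; in practice you would sort first. A sketch, assuming T has an id column to order by:
SELECT * FROM T ORDER BY id OFFSET 5 ROWS FETCH NEXT 15 ROWS ONLY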
You can use the following query to skip the first n rows:
select * from (
    select rslts.*, rownum as rec_no from (
        <<Query with proper order by (If you don't have proper order by you will see weird results)>>
    ) rslts
) where rec_no > <<startRowNum - n>>
The above query is similar to the pagination query below.
select * from (
    select rslts.*, rownum as rec_no from (
        <<Query with proper order by (If you don't have proper order by you will see weird results)>>
    ) rslts where rownum <= <<endRowNum>>
) where rec_no > <<startRowNum>>
Your cities query:
select * from (
    select rslts.*, rownum as rec_no from (
        select * from cities order by 1
    ) rslts
) where rec_no > 5   -- <<startRowNum>>
Note: this assumes the first column in the cities table is a unique key.
Oracle increments rownum each time it adds a row to the result set. So saying rownum <= 5 is fine: as each of the first 5 rows is added, rownum is incremented, but once rownum reaches 6 the WHERE clause stops matching, no more rows are added to the result, and (though you don't notice it) rownum stops incrementing.
But if you say WHERE rownum > 5, then right off the bat the WHERE clause doesn't match: the first candidate row is not added to the result set, so rownum is never incremented... meaning rownum can never reach a value greater than 5 and the WHERE clause can never match.
To get the result you want, you can use row_number() over() in a subquery, like
select *
from (select row_number() over() rn
             -- , other values you need
      from my_table
      where ...)
where rn > 5
Update: as noted by others, this kind of query only makes sense if you can control the order of the row numbering, so you should really use row_number() over(order by something), where something is a useful ordering key for deciding which records are "the first 5 records".
rownum is incremented only when a row is output, so this type of condition won't work.
In any case, you are not ordering your rows, so what's the point?
Used row_number() over (order by id):
select * from
(select row_number() over (order by id) rn, c.* from countries c)
where rn > 5
Used ROWNUM:
select * from
(select rownum rn, c.* from countries c)
where rn > 5
Important note:
Using an alias, as in countries c instead of just countries, is required! Without it, you get a "missing expression" error.
Even better would be:
select * from mytab sample(5) fetch next 1 rows only;
The SAMPLE clause specifies the percentage probability of each row being picked up in the sampling process. The FETCH NEXT clause specifies the number of rows you want returned.
With this code, you can query your table with skip and take parameters:
select * from (
    select a.*, rownum rnum from (
        select * from cities
    ) a
) WHERE rnum >= :skip + 1 AND rnum <= :skip + :take
This code works with Oracle 11g. With Oracle 12c there is a better way to write such queries, using OFFSET and FETCH.
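For instance, a sketch of the same skip/take in 12c syntax (ordering by id here as a placeholder; pick the column that defines your page order):
SELECT *
FROM cities
ORDER BY id
OFFSET :skip ROWS FETCH NEXT :take ROWS ONLY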

Global row numbers in chunked query

I would like to include a column row_number in my result set with the row number sequence, where 1 is the newest item, without gaps. This works:
SELECT id, row_number() over (ORDER BY id desc) AS row_number, title
FROM mytable
WHERE group_id = 10;
Now I would like to query for the same data in chunks of 1000 each to be easier on memory:
SELECT id, row_number() over (ORDER BY id desc) AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 0 AND id < 1000
ORDER BY id ASC;
Here the row_number restarts from 1 for every chunk, but I would like it to be as if it were part of the global query, as in the first case. Is there an easy way to accomplish this?
Assuming:
id is defined as PRIMARY KEY - which means UNIQUE and NOT NULL. Otherwise you may have to deal with NULL values and/or duplicates (ties).
You have no concurrent write access to the table - or you don't care what happens after you have taken your snapshot.
A MATERIALIZED VIEW, like you demonstrate in your answer, is a good choice.
CREATE MATERIALIZED VIEW mv_temp AS
SELECT row_number() OVER (ORDER BY id DESC) AS rn, id, title
FROM mytable
WHERE group_id = 10;
But the index and subsequent queries must be on the row number rn to get data in chunks of 1000:
CREATE INDEX ON mv_temp (rn);
SELECT * FROM mv_temp WHERE rn BETWEEN 1000 AND 2000;
Your implementation would require a guaranteed gap-less id column - which would void the need for an added row number to begin with ...
When done:
DROP MATERIALIZED VIEW mv_temp;
The index dies with the table (materialized view in this case) automatically.
Related, with more details:
Optimize query with OFFSET on large table
You want to have a query for the first 1000 rows, then one for the next 1000, and so on?
Usually you just write one query (the one you already use), have your app fetch 1000 records, do something with them, then fetch the next 1000, and so on. Hence, no need for separate queries.
However, it would be rather easy to write such partial queries:
select *
from
(
SELECT id, row_number() over (ORDER BY id desc) AS rn, title
FROM mytable
WHERE group_id = 10
) numbered
where rn between 1 and 1000; -- <- simply change the row number range here
-- e.g. where rn between 1001 and 2000 for the second chunk
You need pagination. Try this:
SELECT id, row_number() over (ORDER BY id desc)+0 AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 0 AND id < 1000
ORDER BY id ASC;
The next time, when you change the start value of id in the WHERE clause, change the offset added to row_number() as well, like below:
SELECT id, row_number() over (ORDER BY id desc)+1000 AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 1000 AND id < 2000
ORDER BY id ASC;
Or better, you can use the OFFSET and LIMIT approach for pagination:
https://wiki.postgresql.org/images/3/35/Pagination_Done_the_PostgreSQL_Way.pdf
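The linked slides describe keyset (seek) pagination; a minimal sketch of that idea for this table, assuming the application remembers the last id of the previous chunk in :last_id. It chunks efficiently, but it does not produce the global row number by itself; you would keep a running counter in the application:
SELECT id, title
FROM mytable
WHERE group_id = 10
  AND id < :last_id   -- seek past the previous chunk; omit for the first chunk
ORDER BY id DESC
LIMIT 1000;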
In the end I did it this way:
First I create a temporary materialized view:
CREATE MATERIALIZED VIEW vw_temp AS SELECT id, row_number() over (ORDER BY id desc) AS rn, title
FROM mytable
WHERE group_id = 10;
Then I define the index:
CREATE INDEX idx_temp ON vw_temp USING btree(id);
Now I can perform all operations very quickly, and with numbered rows:
SELECT * FROM vw_temp WHERE id BETWEEN 1000 AND 2000;
After doing the operations, cleanup:
DROP INDEX idx_temp;
DROP MATERIALIZED VIEW vw_temp;
Even though Thorsten Kettner's answer seems the cleanest one, it was not practical for me due to being too slow. Thanks for contributing, everyone. For those interested in the practical use case: I use this for feeding data to the Sphinx indexer.

Select a row with preceding and following rows

I have a table as follows:
CREATE TABLE results (
id uuid primary key UNIQUE,
score integer NOT NULL
)
I need to select a record with a particular UUID and the records around it (say, 5 before and 5 after) ordered by score:
SELECT * FROM results
WHERE id = <SOME_UUID>
ORDER BY score
OFFSET -5 LIMIT 10; -- apparently this is wrong
How can I effectively do that?
It's not 'effective', but you could try this:
select a.* from (
    SELECT * FROM results
    WHERE id <> <SOME_UUID>
      AND score <= (select score from results WHERE id = <SOME_UUID>)
    ORDER BY score desc, id desc
    LIMIT 5) as a
UNION ALL
SELECT * FROM results
WHERE id = <SOME_UUID>
UNION ALL
select b.* from (
    SELECT * FROM results
    WHERE id <> <SOME_UUID>
      AND score >= (select score from results WHERE id = <SOME_UUID>)
    ORDER BY score, id asc
    LIMIT 5) as b
I tried this on SQL Server, which needed the ALL for it to run.
So you may get records with equal score as duplicates. To avoid this, turn it into a subquery again and use select distinct, as sketched below.
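A minimal sketch of that wrapping (same <SOME_UUID> placeholder as above; DISTINCT removes the rows with the target's exact score that both branches pick up):
SELECT DISTINCT u.*
FROM (
    (SELECT * FROM results
     WHERE id <> <SOME_UUID>
       AND score <= (SELECT score FROM results WHERE id = <SOME_UUID>)
     ORDER BY score desc, id desc
     LIMIT 5)
    UNION ALL
    SELECT * FROM results WHERE id = <SOME_UUID>
    UNION ALL
    (SELECT * FROM results
     WHERE id <> <SOME_UUID>
       AND score >= (SELECT score FROM results WHERE id = <SOME_UUID>)
     ORDER BY score, id asc
     LIMIT 5)
) AS u
ORDER BY u.score;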
One way of solving this is with a rank for each row assigned using a window function and then finding out which ranks you are interested in:
WITH ranked AS (
    SELECT id, score, rank() OVER (ORDER BY score) AS rnk
    FROM results),
this_rank AS (
    SELECT rnk - 5 AS low_rnk FROM ranked
    WHERE id = <some uuid>::uuid)
SELECT id, score
FROM ranked, this_rank
WHERE rnk >= low_rnk
ORDER BY rnk
LIMIT 11;
For very low or high scores you get fewer than 11 rows, rather than rows with NULLs.
SQLFiddle
One further detail: a PRIMARY KEY already implies uniqueness, so you do not need the UNIQUE clause in your table definition.
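A minimal rewrite of the table definition from the question:
CREATE TABLE results (
    id    uuid PRIMARY KEY,   -- PRIMARY KEY already implies UNIQUE and NOT NULL
    score integer NOT NULL
);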

How to use ROWNUM for a maximum and another minimum ordering in ORACLE?

Currently I am trying to output the top row for 2 conditions: one is a max and one is a min.
Current code:
Select *
from (MY SELECT STATEMENT order by A desc)
where ROWNUM <= 1
UPDATE
I am now able to do it for both conditions. But I need A to be the highest, and if A is tied, then to check for the lowest B.
E.g. say there are 2 rows; A is 100 in both, and B is 50 in one and 60 in the other.
In this case 100:50 should be chosen, because A is the same and that row has the lowest B.
E.g. say there are 2 rows; A is 100 in one and 90 in the other. Since one is higher, there is no need to check B.
I tried using max and min, but this method seems to work better. Any suggestions?
Well, after your clarification: you are looking for one record, with the maximum A, and with the smallest B in case there is more than one record with the maximum A. This is simply:
Select *
from (MY SELECT STATEMENT order by A desc, B)
where ROWNUM = 1;
This sorts by A descending first, so you get all maximal-A records first. Then it sorts by B, so within each A group you get the least B first. This gives you the desired record first, whether or not the maximal A is unique.
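Applied to the example from the question (a hypothetical table t with columns A and B holding the rows 100:50 and 100:60):
Select *
from (select * from t order by A desc, B)
where ROWNUM = 1;
-- A ties at 100, so the lower B wins: the 100:50 row is returned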
Or avoid the vagaries of rownum and go for row_number() instead:
SELECT *
FROM (
    SELECT sq.*
         , ROW_NUMBER() OVER (ORDER BY A DESC) adesc
         , ROW_NUMBER() OVER (ORDER BY B ASC) basc
    FROM SomeQuery sq
)
WHERE adesc = 1
   OR basc = 1
Footnote: select * is a convenience only; please replace it with the actual columns required, along with table names etc.
Try this and see if it works:
Select *
from (MY SELECT STATEMENT order by A desc)
where ROWNUM <= 1
union
Select *
from (MY SELECT STATEMENT order by A asc)
where ROWNUM <= 1
SELECT * FROM
    (Select foo.*, 0 as union_order
     from (MY SELECT STATEMENT order by A desc) foo
     where ROWNUM <= 1
     UNION
     Select foo.*, 1
     from (MY SELECT STATEMENT order by B asc) foo
     where ROWNUM <= 1)
ORDER BY union_order

How can I get a random cartesian product in PostgreSQL?

I have two tables, custassets and tags. To generate some test data I'd like to do an INSERT INTO a many-to-many table with a SELECT that gets random rows from each (so that a random primary key from one table is paired with a random primary key from the second). To my surprise this isn't as easy as I first thought, so I'm persisting with this to teach myself.
Here's my first attempt. I select 10 custassets and 3 tags, but the same 3 tags are paired with every custasset. I'd be fine with the first table being fixed, but I'd like to randomise the tags assigned.
SELECT
custassets_rand.id custassets_id,
tags_rand.id tags_rand_id
FROM
(
SELECT id FROM custassets WHERE defunct = false ORDER BY RANDOM() LIMIT 10
) AS custassets_rand
,
(
SELECT id FROM tags WHERE defunct = false ORDER BY RANDOM() LIMIT 3
) AS tags_rand
This produces:
custassets_id | tags_rand_id
---------------+--------------
9849 | 3322 }
9849 | 4871 } this pattern of tag PKs is repeated
9849 | 5188 }
12145 | 3322
12145 | 4871
12145 | 5188
17837 | 3322
17837 | 4871
17837 | 5188
....
I then tried the following approach: doing the second RANDOM() call in the SELECT column list. However this one was worse, as it chooses a single tag PK and sticks with it.
SELECT
custassets_rand.id custassets_id,
(SELECT id FROM tags WHERE defunct = false ORDER BY RANDOM() LIMIT 1) tags_rand_id
FROM
(
SELECT id FROM custassets WHERE defunct = false ORDER BY RANDOM() LIMIT 30
) AS custassets_rand
Result:
custassets_id | tags_rand_id
---------------+--------------
16694 | 1537
14204 | 1537
23823 | 1537
34799 | 1537
36388 | 1537
....
This would be easy in a scripting language, and I'm sure it can be done quite easily with a stored procedure or temporary table. But can I do it with just an INSERT INTO ... SELECT?
I did think of choosing integer primary keys using a random function, but unfortunately the primary keys of both tables have gaps in their sequences (so a nonexistent row might be chosen in either table). It would have been fine otherwise!
Note that what you are looking for is not a Cartesian product, which would produce n*m rows; rather a random 1:1 association, which produces GREATEST(n,m) rows.
To produce truly random combinations, it's enough to randomize rn for the bigger set:
SELECT c_id, t_id
FROM (
SELECT id AS c_id, row_number() OVER (ORDER BY random()) AS rn
FROM custassets
) x
JOIN (SELECT id AS t_id, row_number() OVER () AS rn FROM tags) y USING (rn);
If arbitrary combinations are good enough, this is faster (especially for big tables):
SELECT c_id, t_id
FROM (SELECT id AS c_id, row_number() OVER () AS rn FROM custassets) x
JOIN (SELECT id AS t_id, row_number() OVER () AS rn FROM tags) y USING (rn);
If the number of rows in both tables do not match and you do not want to lose rows from the bigger table, use the modulo operator % to join rows from the smaller table multiple times:
SELECT c_id, t_id
FROM (
SELECT id AS c_id, row_number() OVER () AS rn
FROM custassets -- table with fewer rows
) x
JOIN (
SELECT id AS t_id, (row_number() OVER () % small.ct) + 1 AS rn
FROM tags
, (SELECT count(*) AS ct FROM custassets) AS small
) y USING (rn);
Window functions were added with PostgreSQL 8.4.
WITH a_ttl AS (
SELECT count(*) AS ttl FROM custassets c),
b_ttl AS (
SELECT count(*) AS ttl FROM tags),
rows AS (
SELECT gs.*
FROM generate_series(1,
(SELECT max(ttl) AS ttl FROM
(SELECT ttl FROM a_ttl UNION SELECT ttl FROM b_ttl) AS m))
AS gs(row)),
tab_a_rand AS (
SELECT custassets_id, row_number() OVER (order by random()) as row
FROM custassets),
tab_b_rand AS (
SELECT id, row_number() OVER (order by random()) as row
FROM tags)
SELECT a.custassets_id, b.id
FROM rows r
JOIN a_ttl ON 1=1 JOIN b_ttl ON 1=1
LEFT JOIN tab_a_rand a ON a.row = (r.row % a_ttl.ttl)+1
LEFT JOIN tab_b_rand b ON b.row = (r.row % b_ttl.ttl)+1
ORDER BY 1,2;
You can test this query on SQL Fiddle.
Here is a different approach to picking a single random combination from 2 tables, assuming two tables a and b, both with primary key id. The tables needn't be the same size, and the second row is chosen independently of the first, which may not matter much for test data.
SELECT * FROM a, b
WHERE a.id = (
        SELECT id
        FROM a
        OFFSET (SELECT random() * (SELECT count(*) FROM a))
        LIMIT 1)
  AND b.id = (
        SELECT id
        FROM b
        OFFSET (SELECT random() * (SELECT count(*) FROM b))
        LIMIT 1);
Tested with two tables, one with 7000 rows and one with 100k rows; the result comes back immediately. For more than one result you have to call the query repeatedly - increasing the LIMIT and changing x.id = to x.id IN would produce (aA, aB, bA, bB) result patterns, as sketched below.
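A sketch of that IN variant with LIMIT 2 per side (each side contributes two adjacent rows, and the WHERE clause keeps every combination of the two sets):
SELECT *
FROM a, b
WHERE a.id IN (
        SELECT id FROM a
        OFFSET (SELECT random() * (SELECT count(*) FROM a))
        LIMIT 2)
  AND b.id IN (
        SELECT id FROM b
        OFFSET (SELECT random() * (SELECT count(*) FROM b))
        LIMIT 2);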
It bugs me that after all these years of relational databases, there don't seem to be very good cross-database ways of doing things like this. The MSDN article http://msdn.microsoft.com/en-us/library/cc441928.aspx seems to have some interesting ideas, but of course that's not PostgreSQL. And even then, their solution requires a single pass, when I'd think it ought to be possible without the scan.
I can imagine a few ways that might work without a pass (during selection), but they would involve creating another table that maps your table's primary keys to random numbers (or to linear sequences that you later select from randomly, which in some ways may actually be better), and of course that may have issues as well.
I realize this is probably a non-useful comment, I just felt I needed to rant a bit.
If you just want to get a random set of rows from each side, use a pseudo-random number generator. I would use something like:
select *
from (select a.*, row_number() over (order by NULL) as rownum -- NULL may not work, "(SELECT NULL)" works in MSSQL
from a
) a cross join
(select b.*, row_number() over (order by NULL) as rownum
from b
) b
where a.rownum <= 30 and b.rownum <= 30
This is doing a Cartesian product, which returns 900 rows assuming a and b each have at least 30 rows.
However, I interpreted your question as getting random combinations. Once again, I'd go for the pseudo-random approach.
select *
from (select a.*, row_number() over (order by NULL) as rownum -- NULL may not work, "(SELECT NULL)" works in MSSQL
from a
) a cross join
(select b.*, row_number() over (order by NULL) as rownum
from b
) b
where mod(a.rownum*107 + b.rownum*257 + 17, 101) < <some value>
This lets you get combinations of arbitrary rows.
Just a plain Cartesian product ON random() appears to work reasonably well. Simple as can be...
-- Cartesian product
-- EXPLAIN ANALYZE
INSERT INTO dirgraph(point_from,point_to,costs)
SELECT p1.the_point , p2.the_point, (1000*random() ) +1
FROM allpoints p1
JOIN allpoints p2 ON random() < 0.002
;