I have a SQL query that looks like this:
SELECT foo "c0",
bar "c1",
baz "c2",
...
FROM some_table
WHERE ...
In order to apply a limit and return only a subset of records from this query, I use the following wrapper SQL:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER (ORDER BY ...) rnum
FROM (
... original SQL goes here ...
) t
)
WHERE rnum BETWEEN 1 AND 10
My problem is that the original query is selecting over 1000 columns across a large number of joins to other tables. Oracle has an internal limit of 1000 columns per table or view, and apparently the wrapper SQL I'm using to limit the result set is creating a temporary view to which this limit is applied, causing the whole thing to fail.
Is there another method of pagination that doesn't create such a view, or wouldn't otherwise be affected by the 1000 column limit?
I'm not interested in suggestions to break the work up into chunks, not select > 1000 columns, etc., as I'm already fully aware of all of these methods.
Okay, this will perform worse than what you were planning, but my point is that you could try pagination this way:
WITH CTE AS
(
... original SQL goes here ...
)
SELECT A.*
FROM CTE A
INNER JOIN (SELECT YourKey,
ROW_NUMBER() OVER (ORDER BY ...) rnum
FROM CTE) B
ON A.YourKey = B.YourKey
WHERE rnum BETWEEN 1 AND 10;
You can't have a view with 1000+ columns, so cheat a little.
select *
from foo f, foo2 f2
where (f.rowid, f2.rowid) in (select r, r2
                              from (select r, r2, rownum rn
                                    from (select /*+ first_rows */ f.rowid r, f2.rowid r2
                                          from foo f, foo2 f2
                                          where f.c1 = f2.a1
                                          and f.c2 = '1'
                                          order by f.c1))
                              where rn >= AAA
                              and rownum <= BBB)
order by whatever;
Now put any WHERE clauses in the innermost bit (e.g. I put f.c2 = '1'). BBB is the page size; AAA is the starting row.
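For instance, to fetch the second page with a page size of 10, the filter becomes rn >= 11 and rownum <= 10 (in general, AAA = (page - 1) * pagesize + 1 and BBB = pagesize).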
Is the problem pagination or just returning the first 10 rows? If it is the latter, you can do:
SELECT foo "c0",
bar "c1",
baz "c2",
...
FROM some_table
WHERE ... and
rownum <= 10
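One caveat: rownum is assigned before ORDER BY is applied, so if the 10 rows have to come from an ordered result, the ordering must go into an inline view first (a standard Oracle idiom; ordering by foo here is just illustrative):
SELECT *
FROM (SELECT foo "c0",
             bar "c1",
             baz "c2",
             ...
      FROM some_table
      WHERE ...
      ORDER BY foo)
WHERE rownum <= 10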
Let's say we have a simple table tt, created like this:
WITH x AS (SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n)), t1 AS
(
SELECT ones.n + 10 * tens.n + 100 * hundreds.n + 1000 * thousands.n + 10000 * tenthousands.n as id
FROM x ones, x tens, x hundreds, x thousands, x tenthousands, x hundredthousands
)
SELECT id,
id % 100 groupby,
row_number() over (partition by id % 100 order by id) orderby,
row_number() over (partition by id % 100 order by id) / (id % 100 + 1) local_search
INTO tt
FROM t1
I have a simple query Q1:
select distinct g1.groupby,
(select count(*) from tt g2
where local_search = 1 and g1.groupby = g2.groupby) as orderby
from tt g1
option(maxdop 1)
I would like to know why SQL Server estimates the result size so badly for Q1 (see the screenshot). Most of the operators in the query plan are estimated precisely; however, the Hash Match operator at the root introduces a completely insane guess.
To make it more interesting, I have tried different rewritings of Q1. If I decorrelate the subquery, I obtain an equivalent query Q2:
select main.groupby,
coalesce(sub1.orderby,0) orderby
from
(
select distinct g1.groupby
from tt g1
) main
left join
(
select groupby, count(*) orderby
from tt g2
where local_search = 1
group by groupby
) sub1 on sub1.groupby = main.groupby
option(maxdop 1)
This query is interesting in two respects: (1) the estimate is accurate (see the screenshot), and (2) it has a different query plan, which is more efficient than the plan for Q1.
So the question is: why is the estimate for Q1 incorrect, while the estimate for Q2 is precise? Please do not post other rewritings of this SQL (I know it can be written even without subqueries); I'm interested only in an explanation of the selectivity estimator's behaviour. Thanks.
It doesn't recognize that the orderby value will be the same for all rows with the same groupby, so it assumes DISTINCT groupby, orderby will have more combinations than just DISTINCT groupby.
It multiplies the estimate for DISTINCT orderby (for me this is 35.0367) by the estimate for DISTINCT groupby (for me this is 100) as if they were uncorrelated, which gives the estimate of 35.0367 × 100 = 3503.67 for the root node in Q1.
This rewrite avoids the problem, as it now groups only by the single groupby column.
SELECT groupby,
max(orderby) AS orderby
FROM (SELECT g1.groupby,
(SELECT count(*)
FROM tt g2
WHERE local_search = 1
AND g1.groupby = g2.groupby) AS orderby
FROM tt g1) d
GROUP BY groupby
OPTION(maxdop 1)
This is not the optimal approach to the query, though, as shown by your Q2 and the comment @GarethD makes about the inefficiency of running the correlated subquery multiple times and discarding the duplicates.
Please see the image below to understand my issue. I have a table as shown in the picture. I need to get only the highlighted (yellow) records. What is the best method to find these records?
In SQL Server 2012+, you could use the lead() and lag() functions. However, these are not available in SQL Server 2008. Here is a method using outer apply:
select t.*
from t outer apply
     (select top 1 tprev.*
      from t tprev
      where tprev.time < t.time
      order by tprev.time desc
     ) tprev outer apply
     (select top 1 tnext.*
      from t tnext
      where tnext.time > t.time
      order by tnext.time asc
     ) tnext
where (t.cardtype = 1 and tnext.cardtype = 2) or
      (t.cardtype = 2 and tprev.cardtype = 1);
With your sample data, it would also be possible to use self joins on the id column. This seems unsafe, though, because there could be gaps in that column's values.
I haven't tried this, but I think it should work. First, make a view of the table in your question, with the row number included as one column:
CREATE VIEW v AS
SELECT
ROW_NUMBER() OVER(ORDER BY id) AS rownum,
id,
time,
card,
card_type
FROM table
Then, you can get all the rows of type 1 followed by a row of type 2 like this:
SELECT
a.id,
-- And so on...
FROM v AS a
JOIN v AS b ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2
And all the rows of type 2 preceded by a row of type 1 like this:
SELECT
b.id,
-- And so on...
FROM v AS b
JOIN v AS a ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2
To get them both in the same set of results, you can just use UNION ALL, as in the sketch below. Technically, you don't need the view. You could use nested selects instead, but since you will need to query the table four times it might be nice to have it as a view.
Also, if the ID is continuous (it goes 1, 2, 3 without any gaps), you don't need the rownum and can just use the ID instead.
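Putting it together, a minimal sketch of the combined query (assuming the view v and the column names above):
SELECT a.id, a.time, a.card, a.card_type
FROM v AS a
JOIN v AS b ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2
UNION ALL
SELECT b.id, b.time, b.card, b.card_type
FROM v AS b
JOIN v AS a ON b.rownum = a.rownum + 1
WHERE a.card_type = 1 AND b.card_type = 2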
Here is a query you can run in SQL Server:
select * from Table_name where id in (1,2,6,7,195,160,164,165)
I have a postgresql query like this:
with r as (
select
1 as reason_type_id,
rarreason as reason_id,
count(*) over() count_all
from
workorderlines
where
rarreason != 0
and finalinsdate >= '2012-12-01'
)
select
r.reason_id,
rt.desc,
count(r.reason_id) as num,
round((count(r.reason_id)::float / (select count(*) as total from r) * 100.0)::numeric, 2) as pct
from r
left outer join
rtreasons as rt
on
r.reason_id = rt.rtreason
and r.reason_type_id = rt.rtreasontype
group by
r.reason_id,
rt.desc
order by r.reason_id asc
This returns a table of results with 4 columns: the reason id, the description associated with that reason id, the number of entries having that reason id, and the percent of the total that number represents.
What I would like to do is only display the top 10 results based off the total number of entries having a reason id. However, whatever is leftover, I would like to compile into another row with a description called "Other". How would I do this?
with r2 as (
    ...everything before the select list...
    dense_rank() over(order by pct desc) cause_rank
    ...the rest of your query...
)
select * from r2 where cause_rank < 11
union all
select
    NULL as reason_id,
    'Other' as desc,
    sum(r2.num) as num,
    sum(r2.pct) as pct,
    11 as cause_rank
from r2
where cause_rank >= 11
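Spliced into the original query from the question, the whole thing could look roughly like this (a sketch using your table and column names, ranking by the count directly; ties at the cutoff share a rank):
with r as (
    select
        1 as reason_type_id,
        rarreason as reason_id
    from workorderlines
    where rarreason != 0
    and finalinsdate >= '2012-12-01'
),
r2 as (
    select
        r.reason_id,
        rt.desc,
        count(r.reason_id) as num,
        round((count(r.reason_id)::float / (select count(*) from r) * 100.0)::numeric, 2) as pct,
        dense_rank() over (order by count(r.reason_id) desc) as cause_rank
    from r
    left outer join rtreasons as rt
    on r.reason_id = rt.rtreason
    and r.reason_type_id = rt.rtreasontype
    group by r.reason_id, rt.desc
)
select reason_id, r2.desc, num, pct from r2 where cause_rank < 11
union all
select NULL, 'Other', sum(num), sum(pct)
from r2
where cause_rank >= 11;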
As said above, use LIMIT, and for skipping rows and getting the rest, use OFFSET.
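For example (PostgreSQL syntax; my_results stands for a hypothetical relation holding the aggregated rows):
select reason_id, num, pct
from my_results      -- hypothetical relation with the aggregated rows
order by num desc
limit 10 offset 0;   -- the next page would use offset 10, and so on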
Not sure about Postgres, but SELECT TOP 10 ... should do the trick if you sort correctly.
However, about the second part: you might use a RIGHT JOIN for this. Join the top 10 result with the whole table's data and use only the records not appearing on the left side. If you calculate the sum of those, you should get your "sum of the rest" result.
I assume that vw_my_top_10 is the view showing you the top 10 records. vw_all_records shows all records (including the top 10).
Like this:
SELECT SUM(vw_all_records.a_field)
FROM vw_my_top_10
RIGHT JOIN vw_all_records
ON (vw_my_top_10.Key = vw_all_records.Key)
WHERE vw_my_top_10.Key IS NULL
Given that I've got a table with the following, very simple content:
# select * from messages;
id | verbosity
----+-----------
1 | 20
2 | 20
3 | 20
4 | 30
5 | 100
(5 rows)
I would like to select N messages whose verbosity sums to less than Y (for testing purposes, let's say Y is 70; the correct result is then the messages with ids 1, 2, 3).
It's really important to me that the solution is database independent (it should work at least on Postgres and SQLite).
I was trying with something like:
SELECT * FROM messages GROUP BY id HAVING SUM(verbosity) < 70;
However, it doesn't work as expected: GROUP BY id puts every row in its own group, so SUM(verbosity) only ever sees a single row's value and never accumulates across rows.
I would be very grateful for any hints/help.
SELECT m.id, sum(m1.verbosity) AS total
FROM messages m
JOIN messages m1 ON m1.id <= m.id
WHERE m.verbosity < 70 -- optional, to avoid pointless evaluation
GROUP BY m.id
HAVING SUM(m1.verbosity) < 70
ORDER BY total DESC
LIMIT 1;
This assumes a unique, ascending id like you have in your example.
In modern Postgres - or generally with modern standard SQL (but not in SQLite):
Simple CTE
WITH cte AS (
SELECT *, sum(verbosity) OVER (ORDER BY id) AS total
FROM messages
)
SELECT *
FROM cte
WHERE total < 70
ORDER BY id;
Recursive CTE
Should be faster for big tables where you only retrieve a small set.
WITH RECURSIVE cte AS (
( -- parentheses required
SELECT id, verbosity, verbosity AS total
FROM messages
ORDER BY id
LIMIT 1
)
UNION ALL
SELECT c1.id, c1.verbosity, c.total + c1.verbosity
FROM cte c
JOIN LATERAL (
SELECT *
FROM messages
WHERE id > c.id
ORDER BY id
LIMIT 1
) c1 ON c1.verbosity < 70 - c.total
WHERE c.total < 70
)
SELECT *
FROM cte
ORDER BY id;
All standard SQL, except for LIMIT.
Strictly speaking, there is no such thing as "database-independent". There are various SQL standards, but no RDBMS complies completely. LIMIT works for PostgreSQL and SQLite (and some others); use TOP 1 for SQL Server and rownum for Oracle. Here's a comprehensive list on Wikipedia.
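To illustrate with the messages table from the question, the same "first row" filter in three dialects:
-- PostgreSQL / SQLite
SELECT * FROM messages ORDER BY id LIMIT 1;
-- SQL Server
SELECT TOP 1 * FROM messages ORDER BY id;
-- Oracle (rownum; the ordering must go into an inline view)
SELECT * FROM (SELECT * FROM messages ORDER BY id) WHERE rownum <= 1;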
The SQL:2008 standard would be:
...
FETCH FIRST 1 ROWS ONLY
... which PostgreSQL supports - but hardly any other RDBMS.
The pure alternative that works with more systems would be to wrap it in a subquery and
SELECT max(total) FROM <subquery>
But that is slow and unwieldy.
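For completeness, applied to the first query above, that wrapper would look like this (a sketch, returning just the total rather than the whole row):
SELECT max(total) AS total
FROM (
   SELECT m.id, sum(m1.verbosity) AS total
   FROM messages m
   JOIN messages m1 ON m1.id <= m.id
   GROUP BY m.id
   HAVING sum(m1.verbosity) < 70
) sub;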
This will work...
select *
from messages
where id<=
(
select MAX(id) from
(
select m2.id, SUM(m1.verbosity) sv
from messages m1
inner join messages m2 on m1.id <=m2.id
group by m2.id
) v
where sv<70
)
However, you should understand that SQL is designed as a set-based language rather than an iterative one: it is meant to treat data as a set, not row by row.
I want to select some rows based on certain criteria, and then take one entry from that set and the 5 rows before it and after it.
Now, I can do this numerically if there is a primary key on the table, (e.g. primary keys that are numerically 5 less than the target row's key and 5 more than the target row's key).
So select the row with the primary key of 7 and the nearby rows:
select primary_key from table where primary_key > (7-5) order by primary_key limit 11;
2
3
4
5
6
-=7=-
8
9
10
11
12
But if I select only certain rows to begin with, I lose that numeric method of using primary keys (and that was assuming the keys had no gaps in the first place), and I need another way to get the closest rows before and after a targeted row.
The primary key output of such a select might look more random and thus be less susceptible to mathematical locating (since some results would be filtered out, e.g. with a where active=1):
select primary_key from table where primary_key > (34-5) and active=1
order by primary_key limit 11;
30
-=34=-
80
83
100
113
125
126
127
128
129
Note how, due to the gaps in the primary keys caused by the example where condition (for example, because there are many inactive items), I'm no longer getting the closest 5 above and 5 below; instead I'm getting the closest 1 below and the closest 9 above.
There's a lot of ways to do it if you run two queries with a programming language, but here's one way to do it in one SQL query:
(SELECT * FROM table WHERE id >= 34 AND active = 1 ORDER BY id ASC LIMIT 6)
UNION
(SELECT * FROM table WHERE id < 34 AND active = 1 ORDER BY id DESC LIMIT 5)
ORDER BY id ASC
This would return the 5 rows above, the target row, and 5 rows below.
Here's another way to do it with the analytic functions lead and lag. It would be nice if we could use analytic functions in the WHERE clause, but since we can't, you need to use subqueries or CTEs. Here's an example that will work with the pagila sample database.
WITH base AS (
SELECT lag(customer_id, 5) OVER (ORDER BY customer_id) lag,
lead(customer_id, 5) OVER (ORDER BY customer_id) lead,
c.*
FROM customer c
WHERE c.active = 1
AND c.last_name LIKE 'B%'
)
SELECT base.* FROM base
JOIN (
-- Select the center row, coalesce so it still works if there aren't
-- 5 rows in front or behind
SELECT COALESCE(lag, 0) AS lag, COALESCE(lead, 99999) AS lead
FROM base WHERE customer_id = 280
) sub ON base.customer_id BETWEEN sub.lag AND sub.lead
The problem with sgriffinusa's solution is that you don't know which row_number your center row will end up being. He assumed it would be row 30.
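One way around that, sketched against the same pagila customer example (untested), is to look up the target's row number within the same numbered set and filter relative to it:
WITH numbered AS (
    SELECT row_number() OVER (ORDER BY customer_id) AS rn, c.*
    FROM customer c
    WHERE c.active = 1
      AND c.last_name LIKE 'B%'
)
SELECT n.*
FROM numbered n
JOIN numbered target ON target.customer_id = 280
WHERE n.rn BETWEEN target.rn - 5 AND target.rn + 5
ORDER BY n.rn;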
For a similar query I use analytic functions without a CTE. Something like:
select ...,
LEAD(gm.id) OVER (ORDER BY Cit DESC) as leadId,
LEAD(gm.id, 2) OVER (ORDER BY Cit DESC) as leadId2,
LAG(gm.id) OVER (ORDER BY Cit DESC) as lagId,
LAG(gm.id, 2) OVER (ORDER BY Cit DESC) as lagId2
...
where id = 25912
or leadId = 25912 or leadId2 = 25912
or lagId = 25912 or lagId2 = 25912
Such a query runs faster for me than a CTE with a join (the answer from Scott Bailey), but of course it is less elegant.
You could do this using row_number() (available as of 8.4). This may not be the exact syntax (I'm not familiar with PostgreSQL), but hopefully the idea is illustrated:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
FROM table
WHERE active=1) t
WHERE 25 < r and r < 35
This will generate a first column having sequential numbers. You can use this to identify the single row and the rows above and below it.
If you wanted to do it in a 'relationally pure' way, you could write a query that sorted and numbered the rows. Like:
select (
select count(*) from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
Then use that as a common table expression. Write a select which filters it down to the rows you're interested in, then join it back onto itself using a criterion that the index of the right-hand copy of the table is no more than k larger or smaller than the index of the row on the left. Project over just the rows on the right. Like:
with numbered_emps as (
select (
select count(*)
from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
)
select b.*
from numbered_emps a, numbered_emps b
where a.name like '% Smith' -- this is your main selection criterion
and ((b.idx - a.idx) between -5 and 5) -- this is your adjacency fuzzy-join criterion
What could be simpler!
I'd imagine the row-number based solutions will be faster, though.