There is any equivalent for TOP and BOTTOM in SQL that available in influxdb? - sql

I am porting the queries in influx DB to timescaleDB (Postgres SQL). I am currently stuck in the TOP and BOTTOM functions. Is there any equivalent in Postgres SQL or any suggestions to achieve it?
For constant one, I did like that,
TOP('field', 1) -> MAX('field')
BOTTOM('field', 1) -> MIN('field')
What about others like,
TOP('field', 5)
BOTTOM('field', 5)
Edit 1:
Does Using LIMIT with ORDER BY also work with GROUP BY because the limit is executed after group by Right What if want something like this
Thank You

Using window functions is probably the most versatile way to do this:
select *
from (
select t.*,
dense_rank() over (partition by ??? order by ??? asc) as rnk
from the_table t
) x
where x.rnk = 3; --<< adjust here
Rows in a relational database have no implied sort order. So "top" or "bottom" only makes sense if you also provide an order by. From your question is completely unclear what that would be.
Using order by .. asc returns the "bottom rows", using order by .. desc returns the "top rows"
If you want top/bottom for the entire table (instead of one row "per group"), the leave out the partition by
dense_rank() will return multiple rows with the same "rank" when the rows have the same highest (or lowest) value in the column you are sorting by. If you don't want that (and pick an arbitrary one from those "duplicates") then use row_number() instead.

Related

Reverse initial order of SELECT statement

I want to run a SQL query in Postgres that is exactly the reverse of the one that you'd get by just running the initial query without an order by clause.
So if your query was:
SELECT * FROM users
Then
SELECT * FROM users ORDER BY <something here to make it exactly the reverse of before>
Would it just be this?
ORDER BY Desc
You are building on the incorrect assumption that you would get rows in a deterministic order with:
SELECT * FROM users;
What you get is really arbitrary. Postgres returns rows in any way it sees fit. For simple queries typically in order of their physical storage, which typically is the order in which rows were entered. But there are no guarantees, and the order may change any time between two calls. For instance after any UPDATE (writing a new physical row version), or when any background process reorders rows - like VACUUM. Or a more complex query might return rows according to an index or a join. Long story short: there is no reliable order for table rows in a relational database unless you specify it with ORDER BY.
That said, assuming you get rows from the above simple query in the order of physical storage, this would get you the reverse order:
SELECT * FROM users
ORDER BY ctid DESC;
ctid is the internal tuple ID signifying physical order. Related:
In-order sequence generation
How list all tables with data changes in the last 24 hours?
here is a tsql solution, thid might give you an idea how to do it in postgres
select * from (
SELECT *, row_number() over( order by (select 1)) rowid
FROM users
) x
order by rowid desc

Does ROW_NUMBER queried data require to fetch the entire result set?

I'm trying to write query that's not using offset (because as I just have learnt offset fetches all data which causes performance overhead). with ROW_NUMBER window function. For instance:
SELECT id
FROM(
SELECT id, ROW_NUMBER() over (order by id) rn
FROM users) sq
WHERE rn > 1000
Does it require all rows to be fetched as it would be with offset 1000? I mean, does it make a sense to use such query instead of
SELECT if
FROM users
OFFSET 1000
? Do I get performance improvement on large amount of data?
Check out the window function docs. Window functions operate on the result set, after the fetch:
Window functions are permitted only in the SELECT list and the ORDER
BY clause of the query. They are forbidden elsewhere, such as in GROUP
BY, HAVING and WHERE clauses. This is because they logically execute
after the processing of those clauses. Also, window functions execute
after regular aggregate functions. This means it is valid to include
an aggregate function call in the arguments of a window function, but
not vice versa.
Does it make sense to use the row_number() query? Well, it produces the same result set. However, the query basically has to assign row_number() to all the rows in order to find the ones that meet the requirement.
The second query, however, is lacking an order by. When using offset, you should have an order by:
SELECT id
FROM users u
ORDER BY id
OFFSET 1000
I would imagine that this is more efficient than using row_number(), but actual timings would demonstrate that.

Calculating SQL Server ROW_NUMBER() OVER() for a derived table

In some other databases (e.g. DB2, or Oracle with ROWNUM), I can omit the ORDER BY clause in a ranking function's OVER() clause. For instance:
ROW_NUMBER() OVER()
This is particularly useful when used with ordered derived tables, such as:
SELECT t.*, ROW_NUMBER() OVER()
FROM (
SELECT ...
ORDER BY
) t
How can this be emulated in SQL Server? I've found people using this trick, but that's wrong, as it will behave non-deterministically with respect to the order from the derived table:
-- This order here ---------------------vvvvvvvv
SELECT t.*, ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM (
SELECT TOP 100 PERCENT ...
-- vvvvv ----redefines this order here
ORDER BY
) t
A concrete example (as can be seen on SQLFiddle):
SELECT v, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM (
SELECT TOP 100 PERCENT 1 UNION ALL
SELECT TOP 100 PERCENT 2 UNION ALL
SELECT TOP 100 PERCENT 3 UNION ALL
SELECT TOP 100 PERCENT 4
-- This descending order is not maintained in the outer query
ORDER BY 1 DESC
) t(v)
Also, I cannot reuse any expression from the derived table to reproduce the ORDER BY clause in my case, as the derived table might not be available as it may be provided by some external logic.
So how can I do it? Can I do it at all?
The Row_Number() OVER (ORDER BY (SELECT 1)) trick should NOT be seen as a way to avoid changing the order of underlying data. It is only a means to avoid causing the server to perform an additional and unneeded sort (it may still perform the sort but it's going to cost the minimum amount possible when compared to sorting by a column).
All queries in SQL server ABSOLUTELY MUST have an ORDER BY clause in the outermost query for the results to be reliably ordered in a guaranteed way.
The concept of "retaining original order" does not exist in relational databases. Tables and queries must always be considered unordered until and unless an ORDER BY clause is specified in the outermost query.
You could try the same unordered query 100,000 times and always receive it with the same ordering, and thus come to believe you can rely on said ordering. But that would be a mistake, because one day, something will change and it will not have the order you expect. One example is when a database is upgraded to a new version of SQL Server--this has caused many a query to change its ordering. But it doesn't have to be that big a change. Something as little as adding or removing an index can cause differences. And more: Installing a service pack. Partitioning a table. Creating an indexed view that includes the table in question. Reaching some tipping point where a scan is chosen instead of a seek. And so on.
Do not rely on results to be ordered unless you have said "Server, ORDER BY".

What is the result set ordering when using window functions that have `order by` components?

I'm working on a query on the SEDE:
select top 20
row_number() over(order by "percentage approved" desc, approved desc),
row_number() over(order by "total edits" asc),
*
from editors
where "total edits" > 30
What is the ordering of the result set, taking into account the two window functions?
I suspect it's undefined but couldn't find a definitive answer. OTOH, results from queries with one such window function were ordered according to the over(order by ...) clause.
The results can be returned in any order.
Now, they will often be returned in the same order as specified in the OVER clause, but this is just because SQL Server is likely to pick a query plan that sorts the rows to calculate the aggregate. This is by no means guaranteed, as it could pick a different query plan at any time, especially as you make your query more complex which extends the space of possible query plans.
The result set of ANY SQL Server query that doesn't have an explicit ORDER BY is undefined.
This includes when you have window functions within the query, or an ORDER BY in a subquery. The result order will depend on a lot of factors, none of which are guaranteed unless you specify an ORDER BY.

Select last n rows without use of order by clause

I want to fetch the last n rows from a table in a Postgres database. I don't want to use an ORDER BY clause as I want to have a generic query. Anyone has any suggestions?
A single query will be appreciated as I don't want to use FETCH cursor of Postgres.
That you get what you expect with Lukas' solution (as of Nov. 1st, 2011) is pure luck. There is no "natural order" in an RDBMS by definition. You depend on implementation details that could change with a new release without notice. Or a dump / restore could change that order. It can even change out of the blue when db statistics change and the query planner chooses a different plan that leads to a different order of rows.
The proper way to get the "last n" rows is to have a timestamp or sequence column and ORDER BY that column. Every RDBMS you can think of has ORDER BY, so this is as 'generic' as it gets.
The manual:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
Lukas' solution is fine to avoid LIMIT, which is implemented differently in various RDBMS (for instance, SQL Server uses TOP n instead of LIMIT), but you need ORDER BY in any case.
Use window functions!
select t2.* from (
select t1.*, row_number() over() as r, count(*) over() as c
from (
-- your generic select here
) t1
) t2
where t2.r + :n > t2.c
In the above example, t2.r is the row number of every row, t2.c is the total records in your generic select. And :n will be the n last rows that you want to fetch. This also works when you sort your generic select.
EDIT: A bit less generic from my previous example:
select * from (
select my_table.*, row_number() over() as r, count(*) over() as c
from my_table
-- optionally, you could sort your select here
-- order by my_table.a, my_table.b
) t
where t.r + :n > t.c