Query rows 9,000,000 to 10,000,000 - sql

I'm new to PostgreSQL. I have a table with 10,000,000 rows, and I've been querying the data a million rows at a time:
SELECT mmsi, report_timestamp, position_geom, ST_X(position_geom) AS Long, ST_Y(position_geom) AS Lat
FROM reports4
WHERE position_geom IS NOT NULL
ORDER BY report_timestamp ASC
LIMIT 1000000
OFFSET 8000000
When I try to query the last million rows, nothing shows up:
SELECT mmsi, report_timestamp, position_geom, ST_X(position_geom) AS Long, ST_Y(position_geom) AS Lat
FROM reports4
WHERE position_geom IS NOT NULL
ORDER BY report_timestamp ASC
LIMIT 1000000
OFFSET 9000000
Not sure if I'm doing the query right, or if I'm overlooking something.

The table may have 10,000,000 rows, but how many of those rows satisfy WHERE position_geom IS NOT NULL?
What do you get with:
SELECT count(*)
FROM reports4
WHERE position_geom IS NOT NULL;
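To see the suspected behaviour concretely, here is a small self-contained sketch using SQLite from Python (the table and data are invented stand-ins for the real reports4; the point carries over to PostgreSQL): if the filter leaves fewer rows than the OFFSET skips, the page comes back empty.

```python
import sqlite3

# Hypothetical miniature of the situation: 10 rows total, but only 8
# satisfy the WHERE filter, so an OFFSET of 9 lands past the end.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE reports4 (id INTEGER, position_geom TEXT)")
con.executemany(
    "INSERT INTO reports4 VALUES (?, ?)",
    [(i, None if i % 5 == 0 else f"POINT({i} {i})") for i in range(10)])

filtered = con.execute(
    "SELECT count(*) FROM reports4 WHERE position_geom IS NOT NULL"
).fetchone()[0]
rows = con.execute(
    "SELECT id FROM reports4 WHERE position_geom IS NOT NULL "
    "ORDER BY id LIMIT 5 OFFSET 9").fetchall()
print(filtered)  # 8 of the 10 rows pass the filter
print(rows)      # [] -- the OFFSET starts beyond the last matching row
```

The same thing happens at full scale: if fewer than 9,000,000 rows have a non-null position_geom, an OFFSET of 9,000,000 returns nothing.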

Related

Why are LIMIT and OFFSET (or OFFSET ... ROWS FETCH FIRST ... ROWS ONLY) not working in PostgreSQL?

I am trying to use LIMIT and OFFSET, or OFFSET ... ROWS
FETCH FIRST ... ROWS ONLY. PostgreSQL gives me the wrong number of rows in the result.
select user_id, max(order_ts) as lastorder
from production.orders
group by user_id
order by lastorder desc, user_id desc
OFFSET 10 ROWS
FETCH FIRST 20 ROW only
or
select user_id, max(order_ts) as lastorder
from production.orders
group by user_id
order by lastorder desc, user_id desc
OFFSET 10
limit 20
It still gives me 20 rows (it should be 10: from the 10th row to the 20th is 10 rows).
Why is that? Any help, please?
LIMIT 20 tells the server to return no more than 20 records. FETCH FIRST 20 ROWS ONLY is exactly the same. The query might return 20 rows or fewer depending on the data and query conditions. If you are trying to get rows 11 through 20, you need to specify LIMIT 10 OFFSET 10.
See the LIMIT Clause section of the documentation for details:
https://www.postgresql.org/docs/15/sql-select.html#SQL-LIMIT
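A quick runnable illustration of that point, using SQLite from Python with a made-up orders table: OFFSET skips rows, while LIMIT caps how many are returned. LIMIT is a count, not an end position.

```python
import sqlite3

# Invented stand-in data: user_ids 1..100 in an in-memory table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (user_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 101)])

twenty = con.execute(
    "SELECT user_id FROM orders ORDER BY user_id LIMIT 20 OFFSET 10").fetchall()
ten = con.execute(
    "SELECT user_id FROM orders ORDER BY user_id LIMIT 10 OFFSET 10").fetchall()
print(len(twenty))          # 20 -- LIMIT 20 returns up to 20 rows, whatever the OFFSET
print([r[0] for r in ten])  # rows 11 through 20
```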

How remove rows or filter rows with some identical columns values by some criteria?

I have a table which I filter and sort with this query
select *
from XXXX
where Segment = 'Gewerbe'
  and Division = 'Strom'
  and consumption >= 1000
  and consumption < 2000
  and rank <= 5
  and market_id = '39a2e05fd43300c998558ef56bca18e2'
order by consumption, rank
The result set basically contains three groups of results, grouped by MARKET_ID and RANK. Each subresult differs by RANK (1..N).
The difficult part: I am interested only in the subresults with the highest RANK. In this case I need each row with RANK=5, so I want to eliminate the rows with RANK=1..4. Note that the highest RANK for each subresult might be smaller than 5.
Result table
I think the following would work too:
select market_id, consumption, max(costs_netto), max(rank)
from XXX
where Segment = 'Gewerbe'
  and Division = 'Strom'
  and consumption >= 1000
  and consumption < 2000
  and rank <= 5
  and market_id = '39a2e05fd43300c998558ef56bca18e2'
group by market_id, consumption
order by market_id, consumption
Result with grouping
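One caveat with mixing max(costs_netto) and max(rank) in the same GROUP BY: the two maxima can come from different rows. A sketch of the "keep only the highest-RANK row per group" idea using a window function instead, runnable with SQLite from Python (table name, columns, and data are invented for illustration; requires SQLite 3.25+ for window functions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE xxxx (market_id TEXT, consumption INTEGER,"
            " rank INTEGER, costs_netto REAL)")
con.executemany("INSERT INTO xxxx VALUES (?, ?, ?, ?)", [
    ("m1", 1000, 1, 10.0), ("m1", 1000, 2, 11.0), ("m1", 1000, 3, 12.0),
    ("m1", 1500, 1, 20.0), ("m1", 1500, 2, 21.0),
])

# Number the rows within each (market_id, consumption) group by rank
# descending, then keep only the top-ranked row of each group intact.
rows = con.execute("""
    SELECT market_id, consumption, rank, costs_netto FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY market_id, consumption ORDER BY rank DESC) AS rn
        FROM xxxx
    ) WHERE rn = 1
    ORDER BY consumption
""").fetchall()
print(rows)  # one whole row per group, each carrying its group's highest rank
```

Because the whole row survives, costs_netto is guaranteed to belong to the same row as the highest rank.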

How to query exact range of rows in Impala?

I want to get a result set of rows from 1000 to 2000 from an ordered query.
In Oracle I would use the condition "rownum >= 1000 and rownum <= 2000"
Is there some way to do the same thing using Impala?
Managed to do it by using:
"ORDER BY field_id
LIMIT 1000 OFFSET 1000"

SQL AVG() function returning incorrect values

I want to use the AVG function in SQL to return a working average for some values (i.e. based on the last week, not an overall average). I have two values I am calculating, weight and restingHR (resting heart rate). I have the following SQL statements for each:
SELECT AVG( weight ) AS average
FROM stats
WHERE userid='$userid'
ORDER BY date DESC LIMIT 7
SELECT AVG( restingHR ) AS average
FROM stats
WHERE userid='$userid'
ORDER BY date DESC LIMIT 7
The value I get for weight is 82.56 but it should be 83.35
This is not a massive error, and I'm rounding it when I use it, so it's not too big a deal.
However for restingHR I get 45.96 when it should be 57.57 which is a massive difference.
I don't understand why this is going so wrong. Any help is much appreciated.
Thanks
Use a subquery to separate selecting the rows from computing the average:
SELECT AVG(weight) average
FROM (SELECT weight
FROM stats
WHERE userid = '$userid'
ORDER BY date DESC
LIMIT 7) subq
It seems you want to filter your data with ORDER BY date DESC LIMIT 7, but you have to consider that the ORDER BY clause takes effect after everything else is done. So your AVG() function considers all of your $userid's restingHR values, not just the 7 latest.
To overcome this...okay, Barmar just posted a query.
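The bug is easy to reproduce in miniature. A toy version using SQLite from Python (invented data: ten daily weights 1..10): the aggregate runs over every matching row, and the LIMIT only caps the already-aggregated single-row result, so it has no effect. The subquery forces the LIMIT to happen first.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stats (userid TEXT, date TEXT, weight REAL)")
con.executemany("INSERT INTO stats VALUES ('u1', ?, ?)",
                [(f"2024-01-{d:02d}", float(d)) for d in range(1, 11)])

# Broken: AVG() aggregates all ten rows; LIMIT applies after aggregation.
naive = con.execute(
    "SELECT AVG(weight) FROM stats WHERE userid='u1' "
    "ORDER BY date DESC LIMIT 7").fetchone()[0]
# Fixed: the subquery selects the 7 most recent rows first, then averages.
last7 = con.execute(
    "SELECT AVG(weight) FROM (SELECT weight FROM stats WHERE userid='u1' "
    "ORDER BY date DESC LIMIT 7)").fetchone()[0]
print(naive)  # 5.5 -- average of all ten weights; the LIMIT did nothing
print(last7)  # 7.0 -- average of the seven most recent weights (4..10)
```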

Oracle SQL: slight alteration in query makes a large difference in result return time

Hi,
I'm trying to figure out why two similar but slightly different SQL queries take such drastically different times to run.
I would appreciate input based on the two samples below and the times reported.
The first query is as follows and it takes approximately 115 - 154 seconds to run.
SELECT * FROM
(
SELECT a.*, ROWNUM rnum
FROM
(
SELECT ERR_ID, ERR_CREATED_BY, TO_CHAR(ERR_CREATED_DATE, 'DD/MM/YYYY HH24:MI'),
ERR_SOURCE, ERR_DETAIL
FROM tdsys_errors err
WHERE ERR_SOURCE = 'Online Transaction'
ORDER BY ERR_ID DESC
) a
WHERE ROWNUM <= 25
)
WHERE rnum > 0;
The second query differs only in the position of the "ORDER BY ERR_ID DESC" piece and takes approximately 0.07 seconds to run:
SELECT * FROM
(
SELECT a.*, ROWNUM rnum
FROM
(
SELECT ERR_ID, ERR_CREATED_BY, TO_CHAR(ERR_CREATED_DATE, 'DD/MM/YYYY HH24:MI'),
ERR_SOURCE, ERR_DETAIL
FROM tdsys_errors err
WHERE ERR_SOURCE = 'Online Transaction'
) a
WHERE ROWNUM <= 25
)
WHERE rnum > 0
ORDER BY ERR_ID DESC;
I'm guessing the second query sorts AFTER the results arrive, while the first query tries to do it all at once.
Is this an SQL best-practice issue, I'm wondering, and why is there such a difference?
Thanks
Your own surmise is correct: the first query orders the rows from tdsys_errors by err_id, takes the first 25 of those, and returns them. The second query just outputs 25 rows (no order guaranteed) and then orders those arbitrary 25 rows.
In the first case you're selecting the first 25 rows, i.e. those with the highest err_id. It has to find all the results from your query and then order them all before it knows which 25 to use, which is clearly taking a while.
In the second case you're pulling the first 25 rows returned by the unordered query, which could be anything but is quick, and then ordering only those 25.
You are likely to get different results from the two queries - you certainly shouldn't assume they'll always be the same, even if they happen to be sometimes.
The reason is that first query has to order all the rows in the tdsys_errors table, whereas the second query only has the 25 rows returned from the inner query to order.
Note that the output of the two queries can be different.
Assuming you're using Oracle 9i or higher, you can use the window/analytic function ROW_NUMBER() so you need not use multiple subqueries:
SELECT * FROM (
SELECT ERR_ID, ERR_CREATED_BY, TO_CHAR(ERR_CREATED_DATE, 'DD/MM/YYYY HH24:MI')
, ERR_SOURCE, ERR_DETAIL, ROW_NUMBER() OVER (ORDER BY ERR_ID DESC) AS rnum
FROM tdsys_errors err
WHERE ERR_SOURCE = 'Online Transaction'
) WHERE rnum <= 25
ORDER BY ERR_ID DESC;
Hope this helps.
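The shape of that ROW_NUMBER() query can be tried out anywhere with window-function support. A runnable sketch using SQLite from Python (invented data; err_ids 1..10 stand in for the real tdsys_errors, and 3 stands in for 25): number the rows by ERR_ID descending, then keep only the first few.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tdsys_errors (err_id INTEGER, err_source TEXT)")
con.executemany("INSERT INTO tdsys_errors VALUES (?, 'Online Transaction')",
                [(i,) for i in range(1, 11)])

# ROW_NUMBER() assigns 1, 2, 3, ... in descending err_id order, so
# filtering on rnum <= 3 keeps the three most recent error ids.
rows = con.execute("""
    SELECT err_id FROM (
        SELECT err_id, ROW_NUMBER() OVER (ORDER BY err_id DESC) AS rnum
        FROM tdsys_errors
        WHERE err_source = 'Online Transaction'
    ) WHERE rnum <= 3
    ORDER BY err_id DESC
""").fetchall()
print([r[0] for r in rows])  # [10, 9, 8] -- the three highest err_ids
```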