PostgreSQL - repeating rows from LIMIT OFFSET

I noticed some repeating rows in a paginated recordset.
When I run this query:
SELECT "students".*
FROM "students"
ORDER BY "students"."status" asc
LIMIT 3 OFFSET 0
I get:
| id | name | status |
| 1 | foo | active |
| 12 | alice | active |
| 4 | bob | active |
Next query:
SELECT "students".*
FROM "students"
ORDER BY "students"."status" asc
LIMIT 3 OFFSET 3
I get:
| id | name | status |
| 1 | foo | active |
| 6 | cindy | active |
| 2 | dylan | active |
Why does "foo" appear in both queries?

Why does "foo" appear in both queries?
Because every returned row has the same value in the status column. The rows tie on the only sort key, and in that case the database is free to return them in any order it wants.
If you want a reproducible ordering, add a second column to the ORDER BY clause as a tiebreaker, e.g. the id column:
SELECT students.*
FROM students
ORDER BY students.status asc,
students.id asc
If two rows have the same value for the status column, they will be sorted by the id.
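To see the tiebreaker at work, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The in-memory table, the sixth row ("erin") and the fetch_page helper are made up for illustration; only the ORDER BY status, id idea comes from the answer above.

```python
import sqlite3

# Hypothetical in-memory copy of the "students" table from the question.
# The row (9, "erin") is invented to bring the table to six rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO students (id, name, status) VALUES (?, ?, ?)",
    [(1, "foo", "active"), (12, "alice", "active"), (4, "bob", "active"),
     (6, "cindy", "active"), (2, "dylan", "active"), (9, "erin", "active")],
)

def fetch_page(page, page_size=3):
    """Return the ids on a zero-based page, ordered by status, then by id."""
    rows = conn.execute(
        "SELECT id FROM students ORDER BY status ASC, id ASC LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    return [row[0] for row in rows]

page0 = fetch_page(0)
page1 = fetch_page(1)
print(page0, page1)
```

Because id is unique, every row has exactly one position in the ordering, so the two pages cannot overlap no matter which plan the database picks.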

For more details, see the PostgreSQL documentation (http://www.postgresql.org/docs/8.3/static/queries-limit.html):
When using LIMIT, it is important to use an ORDER BY clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows. You might be asking for the tenth through twentieth rows, but tenth through twentieth in what ordering? The ordering is unknown, unless you specified ORDER BY.
The query optimizer takes LIMIT into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.

select * from(
Select "students".*
from "students"
order by "students"."status" asc
limit 6
) as temp limit 3 offset 0;
select * from(
Select "students".*
from "students"
order by "students"."status" asc
limit 6
) as temp limit 3 offset 3;
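Here 6 is the total number of rows under examination. The inner query is identical for both pages, but the order of rows that tie on status is still not guaranteed, so adding a unique tiebreaker column to the ORDER BY remains the reliable fix.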

Related

Count results in SQL statement additional row

I am trying to get 3% of total membership, which the code below does, but the result comes back with two rows: one has the percentage and the other is 0. I am not sure why, or how to get rid of it.
select
sum(Diabetes_FLAG) * 100 / (select round(count(medicaid_no) * 0.03) as percent
from membership) AS PERCENT_OF_Dia
from
prefinal
group by
Diabetes_Flag
I am not sure why it brought back a second row; I only need the percentage.
What am I doing wrong?
Output:
PERCENT_OF_DIA
1 11.1111111111111
2 0
SELECT sum(Diabetes_FLAG)*100 / (SELECT round(count(medicaid_no)*0.03) as percentt
FROM membership) AS PERCENT_OF_Dia
FROM prefinal
WHERE Diabetes_FLAG = 1
-- GROUP BY Diabetes_Flag  -- not needed: the WHERE clause already restricts the rows to a single flag value
Remove the group by if you want one row:
select sum(Diabetes_FLAG)*100/( SELECT round(count(medicaid_no)*0.03) as percentt
from membership) AS PERCENT_OF_Dia
from prefinal;
When you include group by Diabetes_FLAG, it creates a separate row for each value of Diabetes_FLAG. Based on your results, I'm guessing that it takes on the values 0 and 1.
Not sure why it brought back a second row
This is how a GROUP BY query works. The GROUP BY clause groups rows by a given column: it collects the distinct values of that column and returns one row for each distinct value.
Please consider this simple demo: http://sqlfiddle.com/#!9/3a38df/1
SELECT * FROM prefinal;
| Diabetes_Flag |
|---------------|
| 1 |
| 1 |
| 5 |
Usually the GROUP BY column is also listed in the SELECT clause, like this:
SELECT Diabetes_Flag, sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| Diabetes_Flag | sum(Diabetes_Flag) |
|---------------|--------------------|
| 1 | 2 |
| 5 | 5 |
As you see, GROUP BY displays two rows, one for each unique value of the Diabetes_Flag column.
If you remove the Diabetes_Flag column from the SELECT clause, you get the same result as above, but without that column:
SELECT sum(Diabetes_Flag)
FROM prefinal
GROUP BY Diabetes_Flag;
| sum(Diabetes_Flag) |
|--------------------|
| 2 |
| 5 |
So the reason you get 2 rows is that Diabetes_Flag has 2 distinct values in the table.
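The same effect can be reproduced in a few lines. This is only a sketch using Python's sqlite3 module, and it assumes the flag takes the values 0 and 1, as guessed in one of the answers above. The three sample rows are invented.

```python
import sqlite3

# Hypothetical prefinal table: two members flagged 1, one flagged 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefinal (Diabetes_Flag INTEGER)")
conn.executemany("INSERT INTO prefinal (Diabetes_Flag) VALUES (?)", [(1,), (1,), (0,)])

# With GROUP BY: one output row per distinct flag value (0 and 1).
grouped = conn.execute(
    "SELECT SUM(Diabetes_Flag) FROM prefinal "
    "GROUP BY Diabetes_Flag ORDER BY Diabetes_Flag"
).fetchall()

# Without GROUP BY: the whole table is a single group, so one row comes back.
ungrouped = conn.execute("SELECT SUM(Diabetes_Flag) FROM prefinal").fetchall()

print(grouped)
print(ungrouped)
```

The group for flag 0 sums to 0, which would explain the extra 0 row in the question's output.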

rownum / fetch first n rows

select * from Schem.Customer
where cust='20' and cust_id >= '890127'
and rownum between 1 and 2 order by cust, cust_id;
Execution time appr 2 min 10 sec
select * from Schem.Customer where cust='20'
and cust_id >= '890127'
order by cust, cust_id fetch first 2 rows only ;
Execution time: approx. 0.069 ms
The difference in execution time is huge, but the results are the same. My team is not adopting the latter one. Don't ask why.
So what is the difference between ROWNUM and FETCH FIRST 2 ROWS, and what should I do to improve this or convince my team to adopt it?
DBMS: DB2 LUW
Although both statements return the same result set, that only happens for your data. There is a good chance that the result sets would differ. Let me explain why.
I will simplify your SQL to make it easier to understand:
SELECT * FROM customer
WHERE ROWNUM BETWEEN 1 AND 2;
In this SQL, you want only the first and second rows. That's fine. DB2 will optimize your query and never look at rows beyond the second, because only the first 2 rows qualify.
Then you add an ORDER BY clause:
SELECT * FROM customer
WHERE ROWNUM BETWEEN 1 AND 2
ORDER BY cust, cust_id;
In this case, DB2 first fetches 2 rows, then orders them by cust and cust_id, and then sends them to the client (you). So far so good. But what if you want to order by cust and cust_id first, and then ask for the first 2 rows? There is a big difference between the two.
This is the simplified SQL for this case:
SELECT * FROM customer
ORDER BY cust, cust_id
FETCH FIRST 2 ROWS ONLY;
In this SQL, all rows qualify, so DB2 conceptually fetches all of the rows, sorts them, and then sends the first 2 rows to the client.
In your case, both queries give the same results because the first 2 rows fetched happen to be the first 2 rows in cust, cust_id order. That will not hold if other rows sort ahead of them.
A hint about this: FETCH FIRST n ROWS comes after ORDER BY, which means DB2 orders the result and then retrieves the first n rows.
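The difference between "take 2 rows, then sort" and "sort, then take 2 rows" can be sketched with Python's sqlite3 module. SQLite has no ROWNUM, so the first variant is emulated here with a subquery that takes the first two rows in storage (rowid) order. That emulation and the sample cust_id values are assumptions for illustration, not DB2 behaviour.

```python
import sqlite3

# Hypothetical customer table whose storage order differs from cust_id order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust TEXT, cust_id TEXT)")
conn.executemany(
    "INSERT INTO customer (cust, cust_id) VALUES (?, ?)",
    [("20", "890300"), ("20", "890200"), ("20", "890127"), ("20", "890150")],
)

# Emulated ROWNUM filter: pick the first two stored rows, then sort only those.
limit_then_sort = conn.execute(
    """
    SELECT cust_id FROM (
        SELECT cust, cust_id FROM customer ORDER BY rowid LIMIT 2
    )
    ORDER BY cust, cust_id
    """
).fetchall()

# FETCH FIRST equivalent: sort every qualifying row, then keep the first two.
sort_then_limit = conn.execute(
    "SELECT cust_id FROM customer ORDER BY cust, cust_id LIMIT 2"
).fetchall()

print(limit_then_sort)
print(sort_then_limit)
```

With this data the two statements return different rows, which is the situation the answer warns about.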
Excellent answer here:
https://blog.dbi-services.com/oracle-rownum-vs-rownumber-and-12c-fetch-first/
Now the index range scan is chosen, with the right cardinality estimation.
So which solution is the best one? I prefer row_number() for several reasons:
I like analytic functions. They have larger possibilities, such as setting the limit as a percentage of total number of rows for example.
11g documentation for rownum says:
The ROW_NUMBER built-in SQL function provides superior support for ordering the results of a query
12c allows the ANSI syntax ORDER BY…FETCH FIRST…ROWS ONLY which is translated to row_number() predicate
12c documentation for rownum adds:
The row_limiting_clause of the SELECT statement provides superior support
rownum has first_rows_n issues as well
PLAN_TABLE_OUTPUT
SQL_ID 49m5a3f33cmd0, child number 0
-------------------------------------
select /*+ FIRST_ROWS(10) */ * from test where contract_id=500
order by start_validity fetch first 10 rows only
Plan hash value: 1912639229
--------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 | 15 |
|* 1 | VIEW | | 1 | 10 | 10 | 15 |
|* 2 | WINDOW NOSORT STOPKEY | | 1 | 10 | 10 | 15 |
| 3 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 10 | 11 | 15 |
|* 4 | INDEX RANGE SCAN | TEST_PK | 1 | | 11 | 4 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber" <=10)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."START_VALIDITY") <=10 )
4 - access("CONTRACT_ID"=500)

Sort by data from multiple columns

I store customer reviews of my products in a SQL table, something like this:
durability | cost | appearance
----------------------------------
5 | 3 | 4
2 | 4 | 2
1 | 5 | 5
Each value is a score out of five in one of the three categories.
When I print this information on the page, I'd like to sort the reviews in descending order of each review's average score.
SELECT *
FROM reviews
ORDER BY (durability+cost+appearance)/3 DESC
Obviously this doesn't work, but is there a way to get my result? I don't want to include an average column in SQL because outside of this one small application, it serves zero purpose.
Use ORDER BY instead of SORT BY:
SELECT *
FROM reviews
ORDER BY (durability+cost+appearance)/3 DESC
EDIT:
To see the order by value, try adding one more column in the select clause:
SELECT *,(durability+cost+appearance)/3 as OrderValue
FROM reviews
ORDER BY (durability+cost+appearance)/3 DESC
Sample output (here the database uses integer division, so the averages are truncated):
DURABILITY COST APPEARANCE ORDERVALUE
5 3 4 4
1 5 5 3
2 4 2 2
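A quick way to check the ordering, and to see the effect of integer division on the computed average, is the following sketch with Python's sqlite3 module. SQLite truncates integer division like the sample output above, while some databases, MySQL for example, return a fractional result for the same expression. The int_avg and real_avg aliases are names chosen for this example.

```python
import sqlite3

# The three sample reviews from the question, in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (durability INTEGER, cost INTEGER, appearance INTEGER)")
conn.executemany(
    "INSERT INTO reviews (durability, cost, appearance) VALUES (?, ?, ?)",
    [(5, 3, 4), (2, 4, 2), (1, 5, 5)],
)

# Dividing by 3 truncates in SQLite; dividing by 3.0 keeps the fraction.
# Sorting on the fractional value avoids ties created by truncation.
rows = conn.execute(
    """
    SELECT durability, cost, appearance,
           (durability + cost + appearance) / 3   AS int_avg,
           (durability + cost + appearance) / 3.0 AS real_avg
    FROM reviews
    ORDER BY (durability + cost + appearance) / 3.0 DESC
    """
).fetchall()

for row in rows:
    print(row)
```

No stored average column is needed; the expression is evaluated only for this query.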

How to get the first rows after order

How can I get only the first few rows
after I have applied ORDER BY to a table?
In SQL 2012, let's say I have a table:
| Sales | ProductType |
|-------|-------------|
| 120 | Foodstuff |
| 100 | Electronic |
| 200 | Mobile |
Now the problem is:
I select with order by Sales DESC
and I only want to get 2 rows.
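You can use the LIMIT clause (MySQL, PostgreSQL, SQLite):
SELECT *
FROM tablename
ORDER BY sales DESC
LIMIT n;
where n is the number of rows you want to select.
If "SQL 2012" means SQL Server 2012, LIMIT is not supported there; use TOP instead:
SELECT TOP (n) *
FROM tablename
ORDER BY sales DESC;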
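For the sample table in the question, the sort-then-limit pattern can be tried out with Python's sqlite3 module, which accepts LIMIT. This is a sketch only. The table name sales_by_type and the top_n helper are made up for the example.

```python
import sqlite3

# The three sample rows from the question, in a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_by_type (Sales INTEGER, ProductType TEXT)")
conn.executemany(
    "INSERT INTO sales_by_type (Sales, ProductType) VALUES (?, ?)",
    [(120, "Foodstuff"), (100, "Electronic"), (200, "Mobile")],
)

def top_n(n):
    """Return the n rows with the highest Sales, highest first."""
    return conn.execute(
        "SELECT Sales, ProductType FROM sales_by_type ORDER BY Sales DESC LIMIT ?",
        (n,),
    ).fetchall()

top_two = top_n(2)
print(top_two)
```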

How to get row count in all rows?

select id from table;
+------+
| id |
+------+
| 774 |
| 2775 |
+------+
returns 2 rows
select count(id) as count, id from table;
+-------+-----+
| count | id |
+-------+-----+
| 2 | 774 |
+-------+-----+
but this returns only 1 row
How can I return all rows, with the count in each record?
SQL ???
+-------+------+
| count | id |
+-------+------+
| 2 | 774 |
| 2 | 2775 |
+-------+------+
SELECT id, (select count(*) from table) AS TotalRows
FROM table;
Although this seems unnecessary, as the total count will not change per row.
Use a group by
select id, count(id)
from table
group by id;
(BTW, your SQL in question does not work, at least in oracle and AFAIK in MySql)
I'm not sure what you're trying to do, but if you want to fetch the rows and get the total count in the same query because it's resource-intensive and you don't want to repeat your joins/conditions/whatever in two queries, under MySQL you can do:
# Returns a regular results set
SELECT SQL_CALC_FOUND_ROWS foo, bar FROM baz WHERE qux = 'corge' LIMIT 2;
# Returns the total count of found rows (without the LIMIT)
SELECT FOUND_ROWS();
If you want the total number of rows after the LIMIT, or don't have a LIMIT at all, you can skip the SQL_CALC_FOUND_ROWS.
However, generally speaking, counting the total number of rows doesn't scale very well. If you can, find an alternative that doesn't require it. For example, if it's for paging, consider showing only 'next' / 'prev' buttons, without displaying the total number of pages. If you have 30 rows per page, you can LIMIT 31 instead of 30, display only the first 30 rows, and check whether the 31st row exists to know if a 'next' button should be displayed.
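The LIMIT 31 trick could look roughly like this. It is a sketch with Python's sqlite3 module. The items table, its 70 rows and the fetch_page helper are invented for the example.

```python
import sqlite3

# Hypothetical table with 70 rows, ids 1..70.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 71)])

def fetch_page(page, page_size=30):
    """Fetch one extra row to learn whether a next page exists, without COUNT(*)."""
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size + 1, page * page_size),
    ).fetchall()
    has_next = len(rows) > page_size
    # Only the first page_size rows are shown; the extra row is just a probe.
    return [row[0] for row in rows[:page_size]], has_next

first_ids, first_has_next = fetch_page(0)
last_ids, last_has_next = fetch_page(2)
print(len(first_ids), first_has_next)
print(len(last_ids), last_has_next)
```

The extra row costs almost nothing compared with counting every matching row on each page load.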
If you are using an Oracle database, you can also use the COUNT analytic function to achieve this, as follows:
SELECT COUNT(*) OVER (PARTITION BY 1) AS COUNT, ID FROM TABLE
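The scalar-subquery answer and the analytic-function answer should give the same result. Here is a sketch that compares them with Python's sqlite3 module. It assumes SQLite 3.25 or newer for window functions and skips the window query otherwise. The table is named records because table is a reserved word, and the alias total is chosen for this example.

```python
import sqlite3

# The two ids from the question, in a hypothetical table called records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER)")
conn.executemany("INSERT INTO records (id) VALUES (?)", [(774,), (2775,)])

# Variant 1: an uncorrelated scalar subquery repeats the total on every row.
via_subquery = conn.execute(
    "SELECT (SELECT COUNT(*) FROM records) AS total, id FROM records ORDER BY id"
).fetchall()

# Variant 2: a window function over the whole result set (empty OVER clause).
if sqlite3.sqlite_version_info >= (3, 25, 0):
    via_window = conn.execute(
        "SELECT COUNT(*) OVER () AS total, id FROM records ORDER BY id"
    ).fetchall()
else:
    via_window = None  # window functions are unavailable in older SQLite builds

print(via_subquery)
print(via_window)
```

An empty OVER () treats the whole result set as one partition, which has the same effect as PARTITION BY 1 in the Oracle query above.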