Efficient way to get top 5 rows with max value without using order by? - sql

A relational database table holds insurance details, say id and amount. The table consists of millions of records. The requirement is to fetch the top 5 records with the maximum amount without using an ORDER BY clause.
One solution I could think of is to use a temp table that holds the top 5 records and to update those entries each time the main table is updated, but I would like to know whether there is a better solution to this problem.

An efficient way is to put an index on amount desc and use order by. Something like:
select t.*
from t
order by t.amount desc
fetch first 5 rows only; -- or however your database does this
This should be quite efficient.
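As a rough illustration of the index-plus-ORDER BY approach, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The table name t and the columns id and amount follow the question; the index name idx_t_amount and the generated data are assumptions made up for the example, and LIMIT is SQLite's spelling of FETCH FIRST.

```python
import sqlite3

# In-memory SQLite stands in for the real database; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO t (id, amount) VALUES (?, ?)",
    [(i, (i * 37) % 101) for i in range(1, 1001)],
)

# A descending index on amount can let the engine read the largest values
# straight from the index instead of sorting the whole table.
# idx_t_amount is a hypothetical name.
conn.execute("CREATE INDEX idx_t_amount ON t (amount DESC)")

# id is added as a tie-breaker so that equal amounts come back in a stable order.
top5 = conn.execute(
    "SELECT id, amount FROM t ORDER BY amount DESC, id LIMIT 5"
).fetchall()
print(top5)
```

Whether the optimizer actually uses the index depends on the database and its statistics, so it is worth checking the execution plan on the real system.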

You can try using analytic functions (example below), but you still have to order at some stage:
select id,
amount
from (select id,
amount,
row_number() over (order by amount desc nulls last) as rn
from t)
where rn<=5;
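If the rows are processed on the client side, the "keep only the current top five" idea from the question can be sketched with a bounded heap. This is only an illustration under the assumption that the rows arrive as (id, amount) tuples; the function name top_n_by_amount and the sample data are made up. It avoids a full sort, but it still compares amounts, so some ordering work happens either way.

```python
import heapq
from typing import Iterable, List, Tuple


def top_n_by_amount(rows: Iterable[Tuple[int, int]], n: int = 5) -> List[Tuple[int, int]]:
    """Return the n (id, amount) rows with the largest amount, largest first."""
    # nlargest scans the rows once and never holds more than n candidates.
    return heapq.nlargest(n, rows, key=lambda row: row[1])


# Made-up sample rows: (id, amount)
rows = [(1, 250), (2, 900), (3, 120), (4, 900), (5, 40), (6, 610), (7, 330)]
top5 = top_n_by_amount(rows)
print(top5)
```

For millions of rows this still means reading every row once, which is why an index on amount in the database is usually the cheaper option.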

Related

Use of rownum in batching the data from oracle

Recently I was using ROWNUM in a SQL query to fetch data from an Oracle database in batches of 1000 records, for example 1 to 1000, 1001 to 2000, and so on.
Note:
I have about a million records in the table
I am querying the table to get records 1000 at a time
I used the below query
SELECT NAME FROM (
SELECT NAME, ROWNUM RN
FROM employee
) WHERE RN >= ? AND RN <= ?
to fetch the data, but I ran into a strange issue.
The records in the database are unique, but after the complete execution I end up with some duplicate records.
Is there any issue with the query? Is ROWNUM causing the issue? Is it possible that a record fetched in the first batch of 1000 also appears in a subsequent batch?
You can't trust the order of records returned from a query without an explicit order by clause. If the inner query doesn't return values in the same order in each execution, you may get the same row in two separate executions, and thus duplicates in the overall execution.
Is there any issue with the query? Is rownum causing issue?
Yes. Oracle does not guarantee that the same ordering will be used each time you run a query, so ROWNUM without an ORDER BY is not useful. Try these options.
In 11g use row_number with an ORDER BY; in Oracle 12c use the OFFSET ... FETCH syntax. You need a primary key or unique key column for this to work.
SELECT NAME
FROM (SELECT NAME,
row_number()
OVER(
ORDER BY primary_key_col ) rn
FROM employee)
WHERE rn >=?
AND rn <=? ;
SELECT NAME
FROM employee
ORDER BY primary_key_col OFFSET ? ROWS FETCH NEXT ? ROWS ONLY;
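To see why a deterministic ORDER BY removes the duplicates, here is a small sketch using Python's sqlite3 module in place of Oracle. The employee table and the primary_key_col column follow the answer; the helper names fetch_batch and fetch_all_in_batches, the generated names, and the use of SQLite's LIMIT/OFFSET instead of Oracle's syntax are assumptions for the example. It also assumes the table is not modified while the batches are being read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (primary_key_col INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employee (primary_key_col, name) VALUES (?, ?)",
    [(i, f"emp{i:04d}") for i in range(1, 2501)],
)


def fetch_batch(offset, batch_size):
    # Ordering by a unique key gives every execution the same row sequence,
    # so consecutive offsets cannot overlap.
    cur = conn.execute(
        "SELECT name FROM employee ORDER BY primary_key_col LIMIT ? OFFSET ?",
        (batch_size, offset),
    )
    return [name for (name,) in cur]


def fetch_all_in_batches(batch_size=1000):
    names, offset = [], 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            break
        names.extend(batch)
        offset += batch_size
    return names


names = fetch_all_in_batches()
print(len(names), len(set(names)))
```

Every row appears exactly once across the batches because each batch is cut from the same ordered sequence.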

Set Order Of column by Sequence sql

If I have a table like this:
Id Rnk
1 1
1 1
1 2
1 2
and I want to arrange the table like that:
Id Rnk
1 1
1 2
1 1
1 2
I want this order to be fixed, so that whenever I select from the table the rows come back like this. Any help on how I can do it?
I want this order to be fixed, so that whenever I select from the table
the rows come back like this.
Quick answer: it cannot be done. You have to always use ORDER BY clause in the query if you want to get rows in desired order.
A few related questions and answers on this topic:
What is the default order of records for a SELECT statement in MySQL?
Default row ordering for select query in oracle
SQL: What is the default Order By of queries?
MySQL, ORDER BY insertion order, no sorting columns
Quote from the Wikipedia: Order by
ORDER BY is the only way to sort the rows in the result set. Without
this clause, the relational database system may return the rows in any
order. If an ordering is required, the ORDER BY must be provided in
the SELECT statement sent by the application.
Another quote from the Wikipedia: Relational database
The relational model specifies that the tuples of a relation have no
specific order and that the tuples, in turn, impose no order on the
attributes.
In order to get this concrete order you can use row_number analytic functions in this way:
SELECT "Id", "Rnk"
FROM (
SELECT t.*,
row_number() over (partition by "Id", "Rnk" order by "Id", "Rnk") as rn
FROM Table1 t
) x
ORDER BY "Id", rn, "Rnk"
A demo for PostgreSQL: http://dbfiddle.uk/?rdbms=postgres_10&fiddle=0b86522f37e927a86701e778006e8cad
row_number is now supported by most databases; MySQL supports it as of version 8.0.
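The numbering logic can be traced in plain Python. This is only a sketch of what the query computes, not how a database executes it: each (Id, Rnk) pair gets an occurrence number, and the rows are then sorted by Id, occurrence number, and Rnk. The Rnk tie-breaker in the sort and the function name interleave are assumptions added here so that the result is deterministic.

```python
from collections import defaultdict


def interleave(rows):
    """Number repeats of each (Id, Rnk) pair, then order by Id, rn, Rnk."""
    seen = defaultdict(int)
    numbered = []
    for row_id, rnk in rows:
        seen[(row_id, rnk)] += 1  # rn restarts at 1 for every (Id, Rnk) pair
        numbered.append((row_id, seen[(row_id, rnk)], rnk))
    numbered.sort()  # tuple sort: Id first, then rn, then Rnk
    return [(row_id, rnk) for row_id, _rn, rnk in numbered]


rows = [(1, 1), (1, 1), (1, 2), (1, 2)]
print(interleave(rows))  # [(1, 1), (1, 2), (1, 1), (1, 2)]
```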

Select last N rows SQL Server 2012

I am currently working with Big Data. I am importing data into a table, about 200 million records per import. I want to see how many records were loaded by the current import, but currently my script runs through 1 billion records first before it finally counts the most recently imported data.
SELECT Datum, COUNT(Datum) AS recCount
FROM PF161DailyAggregates
GROUP BY Datum
That is my current code, which shows the number of rows per date.
I can change the code so that it only shows the current import job, but it will still go through all the other records.
Currently this query takes about an hour. How can I make it faster so that it only counts the last N rows?
Thanks in advance
This restricts the result to the 100 most recent Datum values: order by Datum descending and fetch only the first 100 groups.
SELECT Datum, COUNT(Datum) AS recCount
FROM PF161DailyAggregates
GROUP BY Datum
ORDER BY Datum DESC
OFFSET 0 ROWS -- an offset of 1 would skip the most recent Datum
FETCH NEXT 100 ROWS ONLY;
That's a hard one. As long as you want to find the last records AFTER the import, you need some ordering on the Datum column. You can try various tricks, but as long as this column does not have an index you will be lost, since any ordering then requires a full table scan. So my first suggestion is to create an index on that column; then you can use any technique that restricts your result to the last date, like:
select top 1 Datum, count(Datum)
from PF161DailyAggregates
group by Datum
order by Datum desc
or
select count(*)
from PF161DailyAggregates
where Datum = (select top 1 Datum
from PF161DailyAggregates
order by Datum desc)
Another idea would be to think outside the box and make the import job write the number of records per Datum to a separate table each time it runs. That would be much cheaper.
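The separate-table idea might look roughly like this, sketched with Python's sqlite3 module instead of SQL Server. The import_log table, the value column, the run_import function, and the sample dates are all hypothetical; the sketch also assumes that each Datum is imported only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PF161DailyAggregates (Datum TEXT, value INTEGER)")
# import_log is a hypothetical bookkeeping table, not part of the original schema.
conn.execute("CREATE TABLE import_log (Datum TEXT PRIMARY KEY, recCount INTEGER)")


def run_import(datum, values):
    """Load one batch and record its size in the same transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO PF161DailyAggregates (Datum, value) VALUES (?, ?)",
            [(datum, v) for v in values],
        )
        conn.execute(
            "INSERT INTO import_log (Datum, recCount) VALUES (?, ?)",
            (datum, len(values)),
        )


run_import("2017-01-01", list(range(300)))
run_import("2017-01-02", list(range(450)))

# Reading the latest count touches only the small log table.
latest = conn.execute(
    "SELECT Datum, recCount FROM import_log ORDER BY Datum DESC LIMIT 1"
).fetchone()
print(latest)  # ('2017-01-02', 450)
```

The count for the current import then comes from a table with one row per import rather than from a scan over a billion rows.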
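The fastest way to get the row count of a single table is to read it from the system catalog. Note that this is the count for the whole table, so you would compare it before and after the import: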
SELECT T.name AS [TABLE NAME],
I.rows AS [ROWCOUNT]
FROM sys.tables AS T
INNER JOIN sys.sysindexes AS I
ON T.object_id = I.id
AND I.indid < 2
where T.name ='PF161DailyAggregates'
ORDER BY I.rows DESC
Alternatively, you can add an identity column.
Before the insert, read the maximum id, which is easy and fast.
After the insert, store SCOPE_IDENTITY() in a variable.
Then subtract the two values.
If the table already contains a sequential row-number column,
you can use the same technique with FIRST_VALUE in SQL Server 2012.
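The before/after subtraction can be sketched as follows, again with Python's sqlite3 module standing in for SQL Server. The row_id column and the helper names are hypothetical, and reading MAX(row_id) after the insert stands in for SCOPE_IDENTITY(). The result is only correct under the assumptions that no other session inserts at the same time and that the identity values have no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PF161DailyAggregates ("
    "row_id INTEGER PRIMARY KEY AUTOINCREMENT, Datum TEXT)"
)


def max_row_id():
    # MAX over the primary key is a cheap index lookup rather than a scan.
    (value,) = conn.execute(
        "SELECT COALESCE(MAX(row_id), 0) FROM PF161DailyAggregates"
    ).fetchone()
    return value


def import_and_count(datum, n_rows):
    before = max_row_id()
    conn.executemany(
        "INSERT INTO PF161DailyAggregates (Datum) VALUES (?)",
        [(datum,)] * n_rows,
    )
    after = max_row_id()
    # Valid only without concurrent writers and without gaps in row_id.
    return after - before


first = import_and_count("2017-01-01", 300)
second = import_and_count("2017-01-02", 450)
print(first, second)  # 300 450
```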

How to retrieve the last 2 records from table?

I have a table with n records.
How can I retrieve the nth and (n-1)th records from my table in SQL without using a derived table?
I have tried using ROWID:
select * from table where rowid in (select max(rowid) from table);
This returns the nth record, but I also want the (n-1)th record.
Is there any other method that does not use max, a derived table, or pseudocolumns?
Thanks
You cannot depend on rowid to get you to the last row in the table. You need an auto-incrementing id or creation time to have the proper ordering.
You can use, for instance:
select *
from (select t.*, row_number() over (order by <id> desc) as seqnum
from t
) t
where seqnum <= 2
Although allowed in the syntax, the order by clause in a subquery is ignored (for instance http://docs.oracle.com/javadb/10.8.2.2/ref/rrefsqlj13658.html).
Just to be clear, rowids have nothing to do with the ordering of rows in a table. The Oracle documentation is quite clear that they specify a physical access path for the data (http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i6732). It is true that in an empty database, inserting records into a new table will probably create a monotonically increasing sequence of row ids. But you cannot depend on this. The only guarantees with rowids are that they are unique within a table and are the fastest way to access a particular row.
I have to admit that I cannot find good documentation on Oracle handling or not handling order by's in subqueries in its most recent versions. ANSI SQL does not require compliant databases to support order by in subqueries. Oracle syntax allows it, and it seems to work in some cases, at least. My best guess is that it would probably work on a single processor, single threaded instance of Oracle, or if the data access is through an index. Once parallelism is introduced, the results would probably not be ordered. Since I started using Oracle (in the mid-1990s), I have been under the impression that order bys in subqueries are generally ignored. My advice would be to not depend on the functionality, until Oracle clearly states that it is supported.
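As a small illustration of ordering by an auto-incrementing id rather than by rowid, the sketch below uses Python's sqlite3 module. The table t, the payload column, and the sample rows are assumptions for the example, and LIMIT 2 plays the role of the seqnum <= 2 filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")
conn.executemany(
    "INSERT INTO t (payload) VALUES (?)",
    [("a",), ("b",), ("c",), ("d",)],
)

# The insertion order is recorded explicitly in id, so "last two" is well defined.
last_two = conn.execute(
    "SELECT id, payload FROM t ORDER BY id DESC LIMIT 2"
).fetchall()
print(last_two)  # [(4, 'd'), (3, 'c')]
```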
select * from (select * from my_table order by rowid) where rownum <= 2
and for rows between N and M:
select * from (
select t.*, rownum as rn from (
select * from my_table order by rowid
) t where rownum <= M
) where rn >= N -- filter on the aliased rn; a fresh "rownum >= N" never matches for N > 1
Try this
select top 2 * from table order by rowid desc
Assuming rowid as column in your table:
SELECT * FROM table ORDER BY rowid DESC LIMIT 2

Teradata - limiting the results using TOP

I am trying to fetch a huge set of records from Teradata using JDBC, and I need to break this set into parts, for which I'm using the "Top N" clause in the select.
But I don't know how to set the offset the way we do in MySQL:
SELECT * FROM tbl LIMIT 5,10
so that next select statement would fetch me the records from (N+1)th position.
RANK and QUALIFY are, I believe, your friends here,
for example
SEL RANK(custID), custID
FROM mydatabase.tblcustomer
QUALIFY RANK(custID) < 1000 AND RANK(custID) > 900
ORDER BY custID;
RANK(field) will (conceptually) retrieve all the rows of the resultset,
order them by the ORDER BY field and assign an incrementing rank ID to them.
QUALIFY allows you to slice that by limiting the rows returned to the qualification expression, which now can legally view the RANKs.
To be clear, this returns the 900th to 1000th rows of the "select all from customers" result,
NOT the customers with IDs between 900 and 1000.
You can also use the ROW_NUMBER window aggregate on Teradata.
SELECT ROW_NUMBER() OVER (ORDER BY custID) AS RowNum_
, custID
FROM myDatabase.myCustomers
QUALIFY RowNum_ BETWEEN 900 and 1000;
Unlike the RANK window aggregate, ROW_NUMBER gives you a sequence with no ties or gaps, regardless of whether the column you are ordering by (within the optional partition) is unique or not.
Just another option to consider.
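The difference matters for paging when the ordering column has duplicates. The sketch below mimics the standard window-function semantics of ROW_NUMBER and RANK in plain Python. It is not Teradata code, and the function names and sample custID values are made up for the illustration.

```python
def row_numbers(values):
    """ROW_NUMBER() OVER (ORDER BY value): 1, 2, 3, ... even across ties."""
    return [(i, v) for i, v in enumerate(sorted(values), start=1)]


def ranks(values):
    """RANK() OVER (ORDER BY value): ties share a rank, later ranks skip ahead."""
    out = []
    for i, v in enumerate(sorted(values), start=1):
        if out and out[-1][1] == v:
            out.append((out[-1][0], v))  # same value, same rank as the previous row
        else:
            out.append((i, v))
    return out


cust_ids = [10, 20, 20, 20, 30, 40]

# Ask for "rows 2 to 3" with each numbering scheme.
page_by_row_number = [v for n, v in row_numbers(cust_ids) if 2 <= n <= 3]
page_by_rank = [v for r, v in ranks(cust_ids) if 2 <= r <= 3]

print(page_by_row_number)  # [20, 20]
print(page_by_rank)        # [20, 20, 20]
```

With ROW_NUMBER every page has the requested size, while a RANK-based page can grow or shrink when the ordering values repeat.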