Update top N values using PostgreSQL - sql

I want to update the top 10 values of a column in a table. I have three columns: id, account and accountrank. To get the top 10 values I can use the following:
SELECT * FROM accountrecords
ORDER BY account DESC
LIMIT 10;
What I would like to do is to set the value in accountrank to be a series of 1 - 10, based on the magnitude of account. Is this possible to do in PostgreSQL?

WITH cte AS (
SELECT id, row_number() OVER (ORDER BY account DESC NULLS LAST) AS rn
FROM accountrecords
ORDER BY account DESC NULLS LAST
LIMIT 10
)
UPDATE accountrecords a
SET accountrank = cte.rn
FROM cte
WHERE cte.id = a.id;
Joining in a table expression is typically faster than correlated subqueries. It is also shorter.
With the window function row_number() distinct numbers are guaranteed. Use rank() (or possibly dense_rank()) if you want rows with equal values for account to share the same number.
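To make the difference between the three window functions concrete, here is a minimal sketch. It uses Python's sqlite3 module with an in-memory database as a stand-in for PostgreSQL (this assumes the bundled SQLite is 3.25 or newer, which is when window functions arrived), and the four sample rows are made up for illustration:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accountrecords (id INTEGER PRIMARY KEY, account INTEGER)")
conn.executemany(
    "INSERT INTO accountrecords (id, account) VALUES (?, ?)",
    [(1, 500), (2, 300), (3, 500), (4, 100)],  # ids 1 and 3 tie on account
)

rows = conn.execute(
    """
    SELECT id,
           row_number() OVER (ORDER BY account DESC, id) AS rn,
           rank()       OVER (ORDER BY account DESC)     AS rnk,
           dense_rank() OVER (ORDER BY account DESC)     AS dense_rnk
    FROM accountrecords
    ORDER BY account DESC, id
    """
).fetchall()

# Each tuple is (id, row_number, rank, dense_rank).
for row in rows:
    print(row)
```

The tied rows get distinct row_number() values but share a rank(); rank() then skips a number after the tie, while dense_rank() does not.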
Only if there can be NULL values in account, you need to append NULLS LAST for descending sort order, or NULL values sort on top:
Sort by column ASC, but NULL values first?
If there can be concurrent write access, the above query is subject to a race condition. Consider:
Atomic UPDATE .. SELECT in Postgres
Postgres UPDATE … LIMIT 1
However, if that was the case, the whole concept of hard-coding the top ten would be a dubious approach to begin with.
Use a CTE instead of a plain subquery to enforce the LIMIT reliably. See links above.

Sure, you can use your select statement in a subquery. Generating the rank-order isn't trivial, but here's at least one way to do it. I haven't tested this, but off the top of my head:
update accountrecords a
set accountrank =
(select count(*) + 1 from accountrecords r where r.account > a.account)
where a.id in (select id from accountrecords order by account desc limit 10);
This has the quirk that if two records have the same value for account, then they will get the same rank. You could consider that a feature... :-)

Related

How to pick first record from the duplicates, With only duplicate column values

Here is the situation where I have a table in bigquery like following.
As the table shows, records 1 and 3 have the same id but different first_name values (say the person with id 1 changed their first_name); all other fields are identical in both records. I need to select just one of those two records. How can I do that? I tried a self join, but that discards both records. GROUP BY will not work because the rows are not full duplicates, only the id is duplicated; the same goes for DISTINCT.
Thanks!!!!
The query I am using right now is
select * from table t group by 1,2,3,4,5;
You can use the ROW_NUMBER function to assign row numbers to each of the records in your table.
select *
from(
select *, ROW_NUMBER() OVER(PARTITION BY t.id) rn
from t)
Where rn = 1
ROW_NUMBER does not require the ORDER BY clause. Returns the sequential row ordinal (1-based) of each row for each ordered partition. If the ORDER BY clause is unspecified then the result is non-deterministic.
If you have a record created date or modified date, you can use it in the ORDER BY clause to always pick the latest record.
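As a rough sketch of that last point, the following picks the most recently modified row per id. It runs against an in-memory SQLite database via Python's sqlite3 module rather than BigQuery, and the modified_at column and sample rows are hypothetical, not taken from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# modified_at is a hypothetical ordering column; the question does not name one.
conn.execute("CREATE TABLE t (id INTEGER, first_name TEXT, modified_at TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "Jon", "2020-01-01"),
        (2, "Ann", "2020-02-01"),
        (1, "John", "2021-06-15"),  # same id, newer first_name
    ],
)

latest = conn.execute(
    """
    SELECT id, first_name, modified_at
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY modified_at DESC) AS rn
        FROM t
    )
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(latest)
```

With the ORDER BY inside the window, the choice of row is deterministic as long as modified_at is unique within each id.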
SQL tables represent unordered sets. There is no first row unless you have a column that specifies the ordering. Let me assume you have such a column.
If you want a particular row, you can use aggregation with an order by:
select array_agg(t order by ? asc limit 1)[ordinal(1)].*
from t
group by id;
? is the column that specifies the ordering.
You can also leave out the order by:
select array_agg(t limit 1)[ordinal(1)].*
from t
group by id;

How to identify duplicates in SQL?

I have a requirement to identify the duplicate values in the result data and append a colour to them, where the limit is 10 records only (not checking for duplicates in the entire table).
Now my issue is how to find the duplicates within that result set.
I have tried it this way, but it checks the whole table for duplicates. I want the check to happen within the limit.
SELECT count(*)
,safer_id
,CONCAT (
(DAYS_OPEN)
,CASE
WHEN (count(*) > 1)
THEN '~#0a9ec1'
END
) AS DAYS_OPEN
FROM table_gear
WHERE SAFER_ID NOT LIKE '%WYN%'
GROUP BY safer_id
,url
,DAYS_OPEN
ORDER BY Days_open DESC limit 10 offset 0;
I would use COUNT as an analytic function here:
SELECT
safer_id,
url,
CASE WHEN cnt > 1 THEN '~#0a9ec1' END AS color
FROM
(
SELECT t.*, COUNT(*) OVER (PARTITION BY safer_id, url) cnt
FROM table_gear t
WHERE safer_id NOT LIKE '%WYN%'
) a
ORDER BY cnt DESC
LIMIT 10;
This query conditionally assigns a hex color to those records which are duplicate with respect to the combination of safer_id and url values. I'm not entirely sure about your limit or ordering logic, but you can easily modify what I wrote above to fit your needs.
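If the duplicate check should only look at the rows that survive the LIMIT, one option is to apply the limit in a derived table first and count afterwards. This is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module, treats rows as duplicates when they share safer_id alone, uses a limit of 4 instead of 10 to keep the sample small, and the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_gear (safer_id TEXT, days_open INTEGER)")
conn.executemany(
    "INSERT INTO table_gear VALUES (?, ?)",
    [("A", 90), ("B", 80), ("A", 70), ("C", 60), ("C", 10)],
)

flagged = conn.execute(
    """
    SELECT safer_id,
           days_open,
           CASE WHEN COUNT(*) OVER (PARTITION BY safer_id) > 1
                THEN '~#0a9ec1' END AS color
    FROM (
        -- Limit first, so the window only sees the rows that will be shown.
        SELECT safer_id, days_open
        FROM table_gear
        ORDER BY days_open DESC
        LIMIT 4
    ) AS top_rows
    ORDER BY days_open DESC
    """
).fetchall()

for row in flagged:
    print(row)
```

Here "C" is duplicated in the full table but appears only once inside the limited set, so it gets no color, while both "A" rows do.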

How to select the rows in original order in Hive?

I want to select a definite number of rows from mytable in their original order.
As we know, the keyword 'limit' will select rows randomly. The rows in mytable are in order, and I just want to select them in that original order. For example, selecting 10000 rows would mean rows 1 to 10000.
How can I do this?
Thanks.
Try:
SET mapred.reduce.tasks = 1;
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER () AS row_num
FROM table ) table1
SORT BY row_num LIMIT 10000;
Rows in your table may be in order but...
Tables are read in parallel, and the results returned from different mappers or reducers are not in the original order. That is why you need to know the rule that defines the "original order".
If you know it, you can use row_number() or order by. For example:
select * from table order by ... limit 10000;

Oracle Select query help please

SELECT id
FROM (
SELECT id
FROM table
WHERE
PROCS_DT is null
ORDER BY prty desc, cret_dt ) where rownum >0 and rownum <=100
The above query is giving me back 100 records as expected
SELECT id
FROM (
SELECT id
FROM table
WHERE
PROCS_DT is null
ORDER BY prty desc, cret_dt ) where rownum >101 and rownum <=200
Why is the above query returning zero records?
Can someone help me with how I can keep going? I am dumb in Oracle...
Try this:
SELECT id
FROM
(SELECT id,
rownum AS rn
FROM
(SELECT id
FROM TABLE
WHERE PROCS_DT IS NULL
ORDER BY prty DESC, cret_dt) )
WHERE rn > 100 -- rows 101 to 200, the page after the first 100
AND rn <= 200
If you are comfortable using the ANALYTIC functions, try this:
SELECT id
FROM
(
SELECT id,
ROW_NUMBER() OVER(ORDER BY prty DESC, cret_dt ) rn
FROM table
WHERE procs_dt IS NULL
)
WHERE rn > 100 and rn <= 200 -- rows 101 to 200, the page after the first 100
ROWNUM values are assigned to rows as they are returned from a query (or subquery). If a row is not returned, it is not assigned a ROWNUM value at all; so the ROWNUM values returned always begin at 1 and increment by 1 for each row.
(Note that these values are assigned prior to any sorting indicated by the ORDER BY clause. This is why in your case you need to check rownum outside the subquery.)
The odd bit of logic you have to understand is that when you have a predicate on ROWNUM, you are filtering on a value that will only exist if the row passes the filter. Conceptually, Oracle applies any other filters in the query first, then tentatively assigns ROWNUM 1 to the first matching row and checks it against the filter on ROWNUM. If it passes this check, it will be returned with that ROWNUM value, and the next row will be tentatively assigned ROWNUM 2. But if it does not pass the check, the row is discarded, and the same ROWNUM value is tentatively assigned to the next row.
Therefore, if the filter on ROWNUM does not accept a value of 1, no rows will ever pass the filter.
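The conceptual rule described above can be mimicked in a few lines of Python. This is a toy model of the tentative-assignment logic only, not how Oracle actually executes the query; the function name rownum_filter and the sample ids are made up for illustration:

```python
from typing import Callable, Iterable, Iterator, Tuple


def rownum_filter(
    rows: Iterable[int], predicate: Callable[[int], bool]
) -> Iterator[Tuple[int, int]]:
    """Toy model of a ROWNUM predicate: a number is consumed only when a row passes."""
    rownum = 1  # tentatively assigned to the next candidate row
    for row in rows:
        if predicate(rownum):
            yield rownum, row
            rownum += 1  # only a returned row advances the counter
        # A rejected row is discarded and the same rownum is reused.


ids = list(range(1000, 1300))  # 300 already-sorted candidate rows

first_page = list(rownum_filter(ids, lambda rn: 0 < rn <= 100))
second_page = list(rownum_filter(ids, lambda rn: 101 < rn <= 200))

print(len(first_page))   # 100
print(len(second_page))  # 0
```

Because the second predicate rejects the value 1, the counter never advances, so no row ever reaches a number above 101.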
The use of the analytic function ROW_NUMBER() shown in the other answers is one way around this. This function explicitly assigns row numbers (distinct from ROWNUM) based on a given ordering. However, this can change performance significantly, as the optimizer does not necessarily realize that it can avoid assigning numbers to every possible row in order to complete the query.
The traditional ROWNUM-based way of doing what you want is:
SELECT id
FROM (
SELECT rownum rn, id
FROM (
SELECT id
FROM table
WHERE
PROCS_DT is null
ORDER BY prty desc, cret_dt
) where rownum <=200
) where rn > 100
The innermost query conceptually finds all matching rows and sorts them. The next layer assigns ROWNUMs to these and returns only the first 200 matches. (And actually, the Oracle optimizer understands the significance of a sort followed by a ROWNUM filter, and will usually do the sort in such a way as to identify the top 200 rows without caring about the specific ordering of the other rows.)
The middle layer also takes the ROWNUMs that it assigns and returns them as part of its result set with the alias "rn". This allows the outermost layer to filter on that value to establish the lower limit.
I would experiment with this variant and the analytic function to see which performs better in your case.

SQL Server SELECT LAST N Rows

This is a known question but the best solution I've found is something like:
SELECT TOP N *
FROM MyTable
ORDER BY Id DESC
I have a table with lots of rows. Using that query is not a possibility because it takes a lot of time. So how can I select the last N rows without using ORDER BY?
EDIT
Sorry duplicated question of this one
You can get SQL Server to select the last N rows with the following query (T-SQL uses TOP rather than LIMIT):
select top (N) * from tbl_name order by id desc;
I tested JonVD's code, but found it was very slow, 6s.
This code took 0s.
SELECT TOP(5) ORDERID, CUSTOMERID, OrderDate
FROM Orders where EmployeeID=5
Order By OrderDate DESC
You can also do it by using ROW_NUMBER() with PARTITION BY. A great example can be found here:
I am using the Orders table of the Northwind database... Now let us retrieve the Last 5 orders placed by Employee 5:
SELECT ORDERID, CUSTOMERID, OrderDate
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY OrderDate DESC) AS OrderedDate,*
FROM Orders
) as ordlist
WHERE ordlist.EmployeeID = 5
AND ordlist.OrderedDate <= 5
If you want to select the last rows from a table,
the syntax will be like:
select * from table_name except select top
(number of rows - how many rows you want) * from table_name
These statements work, but in different ways. Thank you, guys.
select * from Products except select top (77-10) * from Products
This way you get the last 10 rows, but they are shown in descending order:
select top 10 * from products
order by productId desc
select * from products
where productid in (select top 10 productID from products order by productID desc)
order by productID desc
select * from products where productID not in
(select top((select COUNT(*) from products ) -10 )productID from products)
First you must get the record count:
DECLARE @TableRowsCount INT
SELECT @TableRowsCount = COUNT(*) FROM <Your_Table>
And then:
In SQL Server 2012
SELECT *
FROM <Your_Table> AS L
ORDER BY L.<your Field>
OFFSET @TableRowsCount - @N ROWS
FETCH NEXT @N ROWS ONLY;
In SQL Server 2008
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY <your Field>) AS sequencenumber, *
FROM <Your_Table>
) AS TempTable
WHERE sequencenumber > @TableRowsCount - @N
In a very general way, and to support SQL Server, here is:
SELECT TOP(N) *
FROM tbl_name
ORDER BY tbl_id DESC
As for performance, it is not bad (less than one second for more than 10,000 records on a server machine).
Is "Id" indexed? If not, that's an important thing to do (I suspect it is already indexed).
Also, do you need to return ALL columns? You may be able to get a substantial improvement in speed if you only actually need a smaller subset of columns which can be FULLY catered for by the index on the ID column - e.g. if you have a NONCLUSTERED index on the Id column, with no other fields included in the index, then it would have to do a lookup on the clustered index to actually get the rest of the columns to return and that could be making up a lot of the cost of the query. If it's a CLUSTERED index, or a NONCLUSTERED index that includes all the other fields you want to return in the query, then you should be fine.
select * from (select top 6 * from vwTable order by Hours desc) T order by Hours
Here's something you can try without an order by but I think it requires that each row is unique. N is the number of rows you want, L is the number of rows in the table.
select * from tbl_name except select top (L-N) * from tbl_name
As noted before, which rows are returned is undefined.
EDIT: this is actually dog slow. Of no value really.
A technique I use to query the MOST RECENT rows in very large tables (100+ million or 1+ billion rows) is to limit the query to "reading" only the most recent N percent of rows. This comes from real-world applications; for example, I do this for non-historic recent weather data, recent news feed searches, or recent GPS location data points.
This is a huge performance improvement if you know for certain that your rows are in, for example, the most recent 5% of the table. Even if there are indexes on the tables, it further limits the candidates to only 5% of the rows in tables that have 100+ million or 1+ billion rows. This matters especially when older data would require physical disk reads and not only logical in-memory reads.
This is far more efficient than SELECT TOP | PERCENT | LIMIT, as it does not select the rows but merely limits the portion of the data to be searched.
DECLARE @RowIdTableA BIGINT
DECLARE @RowIdTableB BIGINT
DECLARE @TopPercent FLOAT
-- Given that there is a sequential identity column
-- Limit query to only rows in the most recent TOP 5% of rows
SET @TopPercent = .05
SELECT @RowIdTableA = (MAX(TableAId) - (MAX(TableAId) * @TopPercent)) FROM TableA
SELECT @RowIdTableB = (MAX(TableBId) - (MAX(TableBId) * @TopPercent)) FROM TableB
SELECT *
FROM TableA a
INNER JOIN TableB b ON a.KeyId = b.KeyId
WHERE a.TableAId > @RowIdTableA AND b.TableBId > @RowIdTableB AND
a.SomeOtherCriteria = 'Whatever'
MS doesn't support LIMIT in T-SQL. Most of the time I just get MAX(ID) and then subtract.
select * from ORDERS where ID >(select MAX(ID)-10 from ORDERS)
This will return fewer than 10 records when ID is not sequential.
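A small sketch of that caveat, using an in-memory SQLite database through Python's sqlite3 module as a stand-in for SQL Server (so LIMIT appears where T-SQL would use TOP). The table contents are made up, with deliberate gaps in the ids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
# ids 1..20, then a gap, then 25 and 30 (as if rows in between were deleted).
conn.executemany(
    "INSERT INTO orders (id) VALUES (?)",
    [(i,) for i in list(range(1, 21)) + [25, 30]],
)

# MAX(id) - 10 is 20, so only ids above 20 qualify.
by_max = conn.execute(
    "SELECT id FROM orders WHERE id > (SELECT MAX(id) - 10 FROM orders) ORDER BY id"
).fetchall()

# Ordering and limiting always yields 10 rows when at least 10 exist.
by_order = conn.execute(
    "SELECT id FROM orders ORDER BY id DESC LIMIT 10"
).fetchall()

print(len(by_max), len(by_order))
```

The subtraction trick only reads a narrow id range, which is why it is fast, but the row count it returns depends on how dense the ids are.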
This query returns the last N rows in the correct order, but its performance is poor
select *
from (
select top N *
from TableName t
order by t.[Id] desc
) as temp
order by temp.[Id]
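The same "take the last N descending, then re-sort ascending" pattern can be tried out quickly with Python's sqlite3 module. This is only an illustration on SQLite, where LIMIT plays the role of T-SQL's TOP; the table name and the ten sample rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO TableName (Id) VALUES (?)", [(i,) for i in range(1, 11)])

n = 3  # how many trailing rows to return
last_rows = conn.execute(
    """
    SELECT Id
    FROM (
        -- Inner query: the N highest ids, newest first.
        SELECT Id FROM TableName ORDER BY Id DESC LIMIT ?
    ) AS temp
    ORDER BY Id  -- Outer query: flip them back to ascending order.
    """,
    (n,),
).fetchall()

print(last_rows)
```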
Use DESC with ORDER BY at the end of the query to get the last values.
This may not be quite the right fit to the question, but…
OFFSET clause
The OFFSET number clause enables you to skip over a number of rows and then return rows after that.
That doc link is to Postgres; I don't know if this applies to Sybase/MS SQL Server.
DECLARE @MYVAR NVARCHAR(100)
DECLARE @step int
SET @step = 0;
DECLARE MYTESTCURSOR CURSOR
DYNAMIC
FOR
SELECT col FROM [dbo].[table]
OPEN MYTESTCURSOR
FETCH LAST FROM MYTESTCURSOR INTO @MYVAR
print @MYVAR;
WHILE @step < 10
BEGIN
FETCH PRIOR FROM MYTESTCURSOR INTO @MYVAR
print @MYVAR;
SET @step = @step + 1;
END
CLOSE MYTESTCURSOR
DEALLOCATE MYTESTCURSOR
In order to get the result in ascending order (note that LIMIT is MySQL/PostgreSQL syntax; SQL Server needs TOP in the inner query instead):
SELECT n.*
FROM
(
SELECT *
FROM MyTable
ORDER BY id DESC
LIMIT N
) n
ORDER BY n.id ASC
I stumbled across this issue while using SQL Server.
What I did to resolve it was to order the results descending and give a row number to each of them. After that, I filtered the results and turned them around again.
SELECT *
FROM (
SELECT *
,[rn] = ROW_NUMBER() OVER (ORDER BY [column] DESC)
FROM [table]
) A
WHERE A.[rn] < 3
ORDER BY [column] ASC
Easy copy paste answer
To display last 3 rows without using order by:
select * from Lms_Books_Details where Book_Code not in
(select top((select COUNT(*) from Lms_Books_Details ) -3 ) book_code from Lms_Books_Details)
Try using the EXCEPT syntax.
Something like this:
SELECT *
FROM clientDetails
EXCEPT
(SELECT TOP (numbers of rows - how many rows you want) *
FROM clientDetails)