Is it possible to select the first 50 rows in Postgres with select * from yellow_tripdata_staging fetch first 50 rows only and after that sort the results by column?
If so, how?
edit: the table is really big, and is not really important which rows i get.
this question was because i was using Redash to visualise the data and was getting some weird order on the sorted results.then i realized that the column i was using to order was not numerical but char, which causes values like 11 and 10 to come before 2 and 3.
Im sorry for this dumb question
It's not completely clear how your first 50 rows are identified and in what order they shall be returned. There is no "natural order" in tables of a relational database. No guarantees without explicit ORDER BY.
However, there is a current physical order of rows you can (ab-)use. And by default that's the order in which rows have been inserted - as long as nothing else has happened to that table. But the RDBMS is free to change the physical order any time, so the physical order is not reliable. Results can and will change with write operations to the table (including VACUUM or other utility commands).
Let's call your column used to sort after 50 rows sort_col.
( -- parentheses required
TABLE yellow_tripdata_staging LIMIT 50
)
UNION ALL
( -- parentheses required
SELECT *
FROM (TABLE yellow_tripdata_staging OFFSET 50) sub
ORDER BY sort_col
);
More explanation (incl. TABLE and parentheses):
Is there a shortcut for SELECT * FROM in psql?
Get n grouped categories and sum others into one
Or, assuming sort_col is defined NOT NULL:
SELECT *
FROM yellow_tripdata_staging
ORDER BY CASE WHEN row_number() OVER () > 50 THEN sort_col END NULLS FIRST;
The window function row_number() is allowed to appear in the ORDER BY clause.
row_number() OVER () (with empty OVER clause) will attach serial numbers according to the current physical order of row - all the disclaimers above still apply.
The CASE expression replaces the first 50 row numbers with NULL, which sort first due to attached NULLS FIRST. In effect, the first 50 rows are unsorted the rest is sorted by sort_col.
Or, if you actually mean to take the first 50 rows according to sort_col and leave them unsorted, while the rest is to be sorted:
SELECT *
FROM yellow_tripdata_staging
ORDER BY GREATEST (row_number() OVER (ORDER BY sort_col), 50);
Or, if you just mean to fetch the "first" 50 rows according to current physical order or some other undisclosed (more reliable) criteria, you need a subquery or CTE to sort those 50 rows in the outer SELECT:
SELECT *
FROM (TABLE yellow_tripdata_staging LIMIT 50) sub
ORDER BY sort_col;
You need to define your requirements clearly.
You can order by two different columns. For instance:
select yts.*
from (select yts.*,
row_number() over (order by id) as seqnum
from yellow_tripdata_staging yts
) yts
order by (seqnum <= 50)::int desc,
(case when seqnum <= 50 then id end),
col
Related
A Relational database table holds the information of Insurance details, say id and amount. Table consists of millions of records. requirement is to fetch top 5 records with max amount without using order by clause.
A solution I could think of is to use the temp table to maintain the max 5 records and update these entries each time the main table is updated but would like to know if there are better solution to above problem ?
An efficient way is to put an index on amount desc and use order by. Something like:
select t.*
from t
order by t.amount desc
fetch first 5 rows only; -- or however your database does this
This should be quite efficient.
You can try using analytic functions (example below), but you still have to order at some stage
select id,
amount
from (select id,
amount,
row_number() over (order by amount desc nulls last) as rn
from t)
where rn<=5;
I am looking for the fastest way to get the 1st record (columns a,b,c ) for every partition (a,b) using SQL. Table is ~10, 000, 000 rows.
Approach #1:
SELECT * FROM (
SELECT a,b,c,
ROW_NUMBER() OVER ( PARTITION by a, b ORDER BY date DESC) as row_num
FROM T
) WHERE row_num =1
But it probably does extra work behind the scene - I need only 1st row per partition.
Approach #2 using FIRST_VALUE(). Since FIRST_VALUE() returns expression
let pack/concatenate a,b,c using some separator into single expression, e.g.:
SELECT FIRST_VALUE(a+','+'b'+','+c)
OVER ( PARTITION by a, b ORDER BY date DESC rows unbounded preceding) FROM T
But in this case I need to unpack the result, which is extra step.
Approach #3 using FIRST_VALUE() - repeat OVER (...) for a , b :
SELECT
FIRST_VALUE(a)
OVER ( PARTITION by a, b ORDER BY date DESC rows unbounded preceding),
FIRST_VALUE(b)
OVER ( PARTITION by a, b ORDER BY date DESC rows unbounded preceding),
c
FROM T
In approach #3 I do not know if database engine (Redshift) smart enough to partition only once
The first query is different from the other two. The first only returns one row per group. The other two return the same rows as in the original query.
You should use the version that does what you want, which I presume is the first one. If you add select distinct or group by to the other queries, that will probably add overhead that will make them slower -- but you can test on your data to see if that is true.
Your intuition is correct that the first query does unnecessary work. In databases that support indexes fully, a correlated subquery is often faster. I don't think that would be the case in Redshift, however.
Can we select specific rows to range in oracle? for example, I have a table of 100 rows I have to select only a range of 10 to 20-row numbers. Is it possible to do that
You can do with an auxiliary operation. Firstly number the rows by row_number() function and then order by them :
select * from
(
select row_number() over (order by 0) rn, t.*
from tab t
)
where rn between 10 and 20;
but this is not a stable operation, since SQL statements are unordered sets. Therefore it's better to define a unique identity column and order depending on it.
Replace zero in the order by clause with some columns of your table to be able to reach a rigid ordering criteria. If a primary key column exists, it might be better to include only it in the order by list.
would LIMIT and OFFSET work?
ie.
SELECT * FROM table
LIMIT 20
OFFSET 20
will read rows 20 -> 40. Is this what you are trying to do?
I searched everywhere to find an SQL query to select rows randomly without changing the order. Almost everyone uses something like this:
SELECT * FROM table WHERE type = 1 ORDER BY RAND() LIMIT 25
But above query changes the order. I need a query which selects randomly among the rows but doesn't changes the order, cause every record has a date also.
Select the random rows and then re-order them:
select t.*
from (select *
from table t
where type = 1
order by rand()
limit 25
) t
order by datecol;
In SQL, if you want rows in a particular order, you need to use an explicit order by clause. You should never depend on the ordering of results with no order by. SQL does not guarantee the ordering. MySQL does not guarantee the ordering, unless the query has an order by.
I am trying to fetch a huge set of records from Teradata using JDBC. And I need to break this set into parts for which I'm using "Top N" clause in select.
But I dont know how to set the "Offset" like how we do in MySQL -
SELECT * FROM tbl LIMIT 5,10
so that next select statement would fetch me the records from (N+1)th position.
RANK and QUALIFY I beleive are your friends here
for example
SEL RANK(custID), custID
FROM mydatabase.tblcustomer
QUALIFY RANK(custID) < 1000 AND RANK(custID) > 900
ORDER BY custID;
RANK(field) will (conceptually) retrieve all the rows of the resultset,
order them by the ORDER BY field and assign an incrementing rank ID to them.
QUALIFY allows you to slice that by limiting the rows returned to the qualification expression, which now can legally view the RANKs.
To be clear, I am returning the 900-1000th rows in the query select all from cusotmers,
NOT returning customers with IDs between 900 and 1000.
You can also use the ROW_NUMBER window aggregate on Teradata.
SELECT ROW_NUMBER() OVER (ORDER BY custID) AS RowNum_
, custID
FROM myDatabase.myCustomers
QUALIFY RowNum_ BETWEEN 900 and 1000;
Unlike the RANK windows aggregate, ROW_NUMBER will provide you a sequence regardless of whether the column you are ordering over the optional partition set is unique or not.
Just another option to consider.