I have a view and I request data from it with a simple query like:
SELECT * FROM my_view WHERE id IN (...)
In general on "normal data" it should return 10-100 entries per id, but for some ids, it may return more than 1,000,000 entries!
I would like to limit my query so that it would not return more than 100 entries per id, but I really have no idea other than running a query for each id separately.
Use row_number():
select v.*
from (select v.*, row_number() over (partition by id order by id) as seqnum
from my_view
where id in (...)
) v
where seqnum <= 100;
Related
Can I rewrite this select without using aggregate function to retrieve the highest value
Select *
From A
Where Id= 123
and value = (
select max(value)
from A inner
where inner.id = 123 )
If you are certain that only one record would have the max value, or, if there are ties you don't care which gets returned, then you may use this limit query:
SELECT *
FROM A
WHERE Id = 123
ORDER BY value DESC
LIMIT 1;
If this doesn't meet your expectations, then stick with your current approach. Note that you could also use RANK() here:
WITH cte AS (
SELECT *, RANK() OVER (ORDER BY value DESC) rnk
FROM A
WHERE Id = 123
)
SELECT *
FROM cte
WHERE rnk = 1;
But like your version, the above rank query also requires a subquery.
I am attempting to sample 20% of a table in impala. I have heard somewhere that the built in impala sampling function has issues.
Is there a way to pass in a subquery to the impala limit function to sample n percent of the entire table.
I have something like this:
select
* from
table_a
order by rand()
limit
(
select
round( (count(distinct ids)) *.2,0)
from table_a)
)
The sub query gives me 20% of all records
I'm not sure if Impala has specific sampling logic (some databases do). But you can use window functions:
select a.*
from (select a.*,
row_number() over (order by rand()) as seqnum,
count(*) over () as cnt
from table_a
) a
where seqnum <= cnt * 0.2;
Group by the highest Number in a column worked great with MAX(), but what if I would like to get the cell that is at most common.
As example:
ID
100
250
250
300
200
250
So I would like to group by ID and instead of get the lowest (MIN) or highest (MAX) number, I would like to get the most common one (that would be 250, because there 3x).
Is there an easy way in SQL Server 2012 or am I forced to add a second SELECT where I COUNT(DISTINCT ID) and add that somehow to my first SELECT statement?
You can use dense_rank to return all the id's with the highest counts. This would handle cases when there are ties for the highest counts as well.
select id from
(select id, dense_rank() over(order by count(*) desc) as rnk from tablename group by id) t
where rnk = 1
A simple way to do what you want uses top and order by:
SELECT top 1 id
FROM t
GROUP BY id
ORDER BY COUNT(*) DESC;
This is a statistic called the mode. Getting the mode and max is a bit challenging in SQL Server. I would approach it as:
WITH cte AS (
SELECT t.id, COUNT(*) AS cnt,
row_number() OVER (ORDER BY COUNT(*) DESC) AS seqnum
FROM t
GROUP BY id
)
SELECT MAX(id) AS themax, MAX(CASE WHEN seqnum = 1 THEN id END) AS MODE
FROM cte;
I need to retrieve the last few entries from a table. I can retrieve them using:
SELECT TOP n *
FROM table
ORDER BY id DESC
That I looked everywhere and that's the only answer I could find, But that way I get them in reverse order. I need them in the same order as they are in the table because it's for a messaging interface.
Use a derived table:
select id, ...
from
(
select top n id, ...
from t
order by id desc
) dt
order by id
I suggest you to use a ROW_NUMBER() like this:
SELECT *
FROM (
SELECT
*, ROW_NUMBER() OVER (ORDER BY id DESC) AS RowNo
FROM
yourTable
) AS t
WHERE
(RowNO < #n)
ORDER BY
id
I have an Oracle table with ID, SUBJECT, and PAYLOAD (CLOB). I'd like to get a listing of the TOP 10 records who have the biggest PAYLOAD (LENGTH(PAYLOAD)) grouped by subject. So if I have 10 DISTINCT SUBJECT's in the table, the query should return 100 rows (top 10 per subject).
Use row_number():
select t.*
from (select t.*, row_number() over (partition by subject order by length(payload) desc) as seqnum
from table t
) t
where seqnum <= 10;