TOP 1 with grouping - sql

I have the following table structure
person_id organization_id
1 1
1 2
1 3
2 4
2 2
I want the result set as
person_id organization_id
1 1
2 4
In other words, the top 1 row for each person_id.

You are using SQL Server, so you can use row_number(). However, you really cannot define top without ordering -- the results are not guaranteed.
So, the following picks one row per person without relying on any row order (it returns the smallest organization_id for each person):
select person_id, min(organization_id)
from t
group by person_id;
However, I assume that you intend for the order of the rows to be the intended order. Alas, SQL tables are unordered so the ordering is not valid. You really need an id or creationdate or something to specify the order.
All that said, you can try the following:
select person_id, organization_id
from (select t.*,
row_number() over (partition by person_id order by (select NULL)) as seqnum
from t
) t
where seqnum = 1;
It is definitely not guaranteed to work. In my experience, order by (select NULL) returns rows in the same order as the select, although there is no documentation to this effect (that I have found). Note that in a parallel system on a decent-sized table, the order in which SQL Server returns rows has little to do with the order of the rows on the pages or the insert order.
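
The point about needing an explicit ordering column can be sketched with a small runnable example. This is only an illustration under stated assumptions, not SQL Server itself: it uses Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in engine, and the seq column is a hypothetical addition that records insertion order, which the original table does not have.

```python
import sqlite3


def top_one_per_person():
    conn = sqlite3.connect(":memory:")
    # seq is a hypothetical column (not in the original table) that records
    # insertion order, so "first row per person" has a defined meaning.
    conn.execute(
        "create table t (seq integer primary key, person_id int, organization_id int)"
    )
    conn.executemany(
        "insert into t (person_id, organization_id) values (?, ?)",
        [(1, 1), (1, 2), (1, 3), (2, 4), (2, 2)],
    )

    # Aggregate version: deterministic, but picks the smallest organization_id.
    by_min = conn.execute(
        "select person_id, min(organization_id) from t "
        "group by person_id order by person_id"
    ).fetchall()

    # Window version: first row per person according to the explicit seq order.
    by_seq = conn.execute(
        """
        select person_id, organization_id
        from (select t.*,
                     row_number() over (partition by person_id order by seq) as seqnum
              from t
             ) ranked
        where seqnum = 1
        order by person_id
        """
    ).fetchall()
    conn.close()
    return by_min, by_seq
```

With the sample data, the aggregate version gives organization 2 for person 2, while the version ordered by seq gives organization 4, which matches the result set asked for in the question.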

Related

SQL query looping for each value in a list

New to SQL here - I am trying to get 1 row from a table matching a particular criterion.
Typically this would look like
SELECT TOP 1 *
FROM myTable
WHERE id = 'abc'
The output may look like
value id
--------------
1 abc
The table has many entries for an 'id', and I am trying to get one entry per 'id'. Now I have a list of 'id's. How would I execute something like
SELECT TOP 1 *
FROM myTable
FOR EACH id
WHERE id IN ('abc', 'edf', 'fgh')
Expecting result like
value id
--------------
1 abc
10 edf
12 fgh
I do not know if it is some sort of union or concat operation, but would like to learn. I am working on Azure SQL Server.
"The table has many entries for an 'id', and I am trying to get one entry per 'id'. Now I have a list of 'id's."
A typical method is row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by id) as seqnum
from mytable t
) t
where seqnum = 1;
Note: you can filter on particular ids, if you want. It is unclear if that is really required for your question.
If you happen to be using SQL Server (as select top suggests), you can use the more concise, but somewhat less performant:
select top (1) with ties t.*
from mytable t
order by row_number() over (partition by id order by (select null));
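
As a rough, runnable illustration of the row_number() approach combined with a filter on a list of ids, here is a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of Azure SQL Server, made-up sample rows, and an ordering by value inside each id, which is my assumption about which row should win.

```python
import sqlite3


def one_row_per_id(ids):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (value int, id text)")
    # Hypothetical sample data: several rows per id, plus one id we do not ask for.
    conn.executemany(
        "insert into mytable values (?, ?)",
        [(1, "abc"), (2, "abc"), (10, "edf"), (11, "edf"),
         (12, "fgh"), (13, "fgh"), (99, "xyz")],
    )
    # One "?" placeholder per requested id, so the values are bound, not spliced in.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        select value, id
        from (select t.*,
                     row_number() over (partition by id order by value) as seqnum
              from mytable t
              where id in ({placeholders})
             ) ranked
        where seqnum = 1
        order by id
    """
    rows = conn.execute(sql, list(ids)).fetchall()
    conn.close()
    return rows
```

Calling one_row_per_id(["abc", "edf", "fgh"]) returns one row for each of the three ids, here the row with the lowest value.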

How to get the nth record in a SQL Server table without changing the order? (SQL Server)

For example, I have data like this (SQL Server):
id name
4 anu
3 lohi
1 pras
2 chand
I want the 2nd record in the table (meaning 3 lohi).
If I use the row_number() function, it changes the order and I get (2 chand).
I want the 2nd record as it appears in the table data.
Can anyone please give me a query for the above scenario?
There is no such thing as the nth row in a table. And for a simple reason: SQL tables represent unordered sets (technically multi-sets because they allow duplicates).
You can do what you want using offset/fetch:
select t.*
from t
order by id desc
offset 1 row fetch first 1 row only;
This assumes that the descending ordering on id is what you want, based on your example data.
You can also do this using row_number():
select t.*
from (select t.*,
row_number() over (order by id desc) as seqnum
from t
) t
where seqnum = 2;
I should note that SQL Server allows you to assign row_number() without having an effective sort using something like this:
select t.*
from (select t.*,
row_number() over (order by (select NULL)) as seqnum
from t
) t
where seqnum = 2;
However, this returns an arbitrary row. There is no guarantee it returns the same row each time it runs, nor that the row is "second" in any meaningful use of the term.
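
To make the "nth row needs an explicit order" point concrete, here is a minimal runnable sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server, so the paging variant is written with SQLite's LIMIT ... OFFSET instead of offset/fetch; the nth_row helper is a hypothetical name.

```python
import sqlite3


def nth_row(n):
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id int, name text)")
    conn.executemany(
        "insert into t values (?, ?)",
        [(4, "anu"), (3, "lohi"), (1, "pras"), (2, "chand")],
    )

    # Variant 1: number the rows by descending id, then keep row n.
    via_row_number = conn.execute(
        """
        select id, name
        from (select t.*,
                     row_number() over (order by id desc) as seqnum
              from t
             ) ranked
        where seqnum = ?
        """,
        (n,),
    ).fetchone()

    # Variant 2: skip n - 1 rows in the same order and take the next one.
    via_offset = conn.execute(
        "select id, name from t order by id desc limit 1 offset ?",
        (n - 1,),
    ).fetchone()
    conn.close()
    return via_row_number, via_offset
```

Both variants agree because they share the same explicit ordering; for n = 2 they return the row (3, 'lohi') from the example data.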

Oracle Sql: Select only the latest records by an id and a date

My table looks like this:
ID FROM WHEN
1 mario 24.10.19
1 robin 23.10.19
2 mario 24.10.19
3 robin 23.10.19
3 mario 22.10.19
I just want the newest records from an ID. So the result should look like this:
ID FROM WHEN
1 mario 24.10.19
2 mario 24.10.19
3 robin 23.10.19
I don't know how to get this result.
There are multiple methods. For just three columns in Oracle, I have had good luck with group by:
select id,
max("from") keep (dense_rank first order by "when" desc) as "from",
max("when") as "when"
from t
group by id;
Often a correlated subquery performs well, in this case, with an index on (id, when):
select t.*
from t
where t."when" = (select max(t2."when") from t t2 where t2.id = t.id);
And the canonical solution is to use window functions:
select t.*
from (select t.*,
row_number() over (partition by id order by "when" desc) as seqnum
from t
) t
where seqnum = 1;
Oracle has a smart optimizer but this has to do a bit more work, because row numbers are assigned to all rows before the filtering. That can make this a wee bit slower (in some databases) than the alternatives, but it is still a very viable solution.
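
The correlated subquery and the window function can be compared side by side in a small runnable sketch. Assumptions: Python's built-in sqlite3 module (SQLite 3.25 or later) stands in for Oracle, so the keep (dense_rank first ...) form is not shown, and the dates are stored as ISO yyyy-mm-dd text so that they sort correctly, unlike the dd.mm.yy display format in the question.

```python
import sqlite3


def latest_per_id():
    conn = sqlite3.connect(":memory:")
    # "from" and "when" are keywords, so they are quoted as in the answer.
    conn.execute('create table t (id int, "from" text, "when" text)')
    conn.executemany(
        "insert into t values (?, ?, ?)",
        [(1, "mario", "2019-10-24"), (1, "robin", "2019-10-23"),
         (2, "mario", "2019-10-24"),
         (3, "robin", "2019-10-23"), (3, "mario", "2019-10-22")],
    )

    # Correlated subquery: keep rows whose date equals the max date for their id.
    correlated = conn.execute(
        """
        select id, "from", "when"
        from t
        where t."when" = (select max(t2."when") from t t2 where t2.id = t.id)
        order by id
        """
    ).fetchall()

    # Window function: number rows per id from newest to oldest, keep the first.
    windowed = conn.execute(
        """
        select id, "from", "when"
        from (select t.*,
                     row_number() over (partition by id order by "when" desc) as seqnum
              from t
             ) ranked
        where seqnum = 1
        order by id
        """
    ).fetchall()
    conn.close()
    return correlated, windowed
```

On this data both queries return the same three rows. They differ when two rows for the same id share the latest date: the correlated subquery returns all of the tied rows, while row_number() keeps exactly one of them.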

I need to find the top 2 most frequently occurring device_id values and how many times they occur

I have a table like this
I want to find top 2 most frequently occurring device ids with their counts.
device_id count
32145678665 3
3214567866555 4
I'm only really answering because it is slightly more tricky than simply using GROUP BY and COUNT()
SELECT *
FROM(
SELECT device_id , COUNT(*)
FROM table_name
GROUP BY device_id
ORDER BY COUNT(*) DESC
)
WHERE rownum <=2
The subquery (inline view) will find all device_ids and how often they come up as well as order them from most frequent to least frequent.
Then we can just query from there and keep only the two most frequent rows by using the pseudocolumn ROWNUM
With TOP (SQL Server syntax):
select top 2 DEVICEID,COUNT(DEVICEID) as CountOfDevice from yourtable
group by DEVICEID
order by COUNT(DEVICEID) DESC
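
The group, count, sort and cut-off steps can be sketched in a runnable form as follows. This assumes Python's built-in sqlite3 module, where LIMIT plays the role of ROWNUM or TOP; the sample rows are made up, since the original table was not preserved, and the secondary sort on device_id is my addition to make ties deterministic.

```python
import sqlite3


def top_two_devices():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table table_name (device_id text)")
    # Hypothetical sample data: one device seen 4 times, one 3 times, one once.
    device_ids = ["3214567866555"] * 4 + ["32145678665"] * 3 + ["111"]
    conn.executemany(
        "insert into table_name values (?)", [(d,) for d in device_ids]
    )
    rows = conn.execute(
        """
        select device_id, count(*) as cnt
        from table_name
        group by device_id
        order by cnt desc, device_id
        limit 2
        """
    ).fetchall()
    conn.close()
    return rows
```

The query returns the two most frequent device ids together with their counts, most frequent first.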

Global row numbers in chunked query

I would like to include a column row_number in my result set with the row number sequence, where 1 is the newest item, without gaps. This works:
SELECT id, row_number() over (ORDER BY id desc) AS row_number, title
FROM mytable
WHERE group_id = 10;
Now I would like to query for the same data in chunks of 1000 each to be easier on memory:
SELECT id, row_number() over (ORDER BY id desc) AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 0 AND id < 1000
ORDER BY id ASC;
Here the row_number restarts from 1 for every chunk, but I would like it to be as if it were part of the global query, as in the first case. Is there an easy way to accomplish this?
Assuming:
id is defined as PRIMARY KEY - which means UNIQUE and NOT NULL. Else you may have to deal with NULL values and / or duplicates (ties).
You have no concurrent write access on the table - or you don't care what happens after you have taken your snapshot.
A MATERIALIZED VIEW, like you demonstrate in your answer, is a good choice.
CREATE MATERIALIZED VIEW mv_temp AS
SELECT row_number() OVER (ORDER BY id DESC) AS rn, id, title
FROM mytable
WHERE group_id = 10;
But the index and subsequent queries must be on the row number rn to get data in chunks of 1000:
CREATE INDEX ON mv_temp (rn);
SELECT * FROM mv_temp WHERE rn BETWEEN 1001 AND 2000;  -- second chunk; BETWEEN is inclusive
Your implementation would require a guaranteed gap-less id column - which would void the need for an added row number to begin with ...
When done:
DROP MATERIALIZED VIEW mv_temp;
The index dies with the table (materialized view in this case) automatically.
Related, with more details:
Optimize query with OFFSET on large table
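
The snapshot-then-chunk idea can be tried out with a small runnable sketch. Assumptions: Python's built-in sqlite3 module (SQLite 3.25 or later) is used instead of PostgreSQL, and since SQLite has no materialized views, a plain table created with create table ... as select stands in for mv_temp; the sample ids, the chunk size and the helper names are hypothetical.

```python
import sqlite3


def build_snapshot(conn):
    # Stand-in for the materialized view: row numbers are computed once.
    conn.execute(
        """
        create table mv_temp as
        select row_number() over (order by id desc) as rn, id, title
        from mytable
        where group_id = 10
        """
    )
    conn.execute("create index mv_temp_rn on mv_temp (rn)")


def fetch_chunks(conn, chunk_size):
    chunks = []
    start = 1
    while True:
        # BETWEEN is inclusive, so each range covers exactly chunk_size numbers.
        rows = conn.execute(
            "select rn, id, title from mv_temp where rn between ? and ? order by rn",
            (start, start + chunk_size - 1),
        ).fetchall()
        if not rows:
            break
        chunks.append(rows)
        start += chunk_size
    return chunks


def demo(chunk_size=4):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (id integer primary key, group_id int, title text)"
    )
    # ids with gaps; odd ids go to group 10, even ids to group 20.
    conn.executemany(
        "insert into mytable values (?, ?, ?)",
        [(i, 10 if i % 2 else 20, f"title {i}") for i in range(1, 60, 3)],
    )
    build_snapshot(conn)
    chunks = fetch_chunks(conn, chunk_size)
    conn.execute("drop table mv_temp")  # the index is dropped with the table
    conn.close()
    return chunks
```

Because the chunks are cut on rn rather than on id, the row numbers stay continuous across chunks even though the ids have gaps.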
You want to have a query for the first 1000 rows, then one for the next 1000, and so on?
Usually you just write one query (the one you already use), have your app fetch 1000 records, do something with them, then fetch the next 1000 and so on. No need for separate queries, hence.
However, it would be rather easy to write such partial queries:
select *
from
(
SELECT id, row_number() over (ORDER BY id desc) AS rn, title
FROM mytable
WHERE group_id = 10
) numbered
where rn between 1 and 1000; -- <- simply change the row number range here
-- e.g. where rn between 1001 and 2000 for the second chunk
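
The suggestion to run one query and let the application fetch the records in batches might look like the following sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL, a made-up data set, and hypothetical helper names; cursor.fetchmany is the standard DB-API call for pulling a limited number of rows at a time.

```python
import sqlite3


def make_demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (id integer primary key, group_id int, title text)"
    )
    # ids with gaps; odd ids go to group 10, even ids to group 20.
    conn.executemany(
        "insert into mytable values (?, ?, ?)",
        [(i, 10 if i % 2 else 20, f"title {i}") for i in range(1, 60, 3)],
    )
    return conn


def iter_batches(conn, batch_size):
    # A single query numbers all rows globally; the client reads it in pieces.
    cur = conn.execute(
        """
        select id, row_number() over (order by id desc) as rn, title
        from mytable
        where group_id = 10
        order by id desc
        """
    )
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch
```

Each batch holds at most batch_size rows, and rn keeps counting across batches because it is computed by the one underlying query.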
You need pagination. Try this:
SELECT id, row_number() over (ORDER BY id desc)+0 AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 0 AND id < 1000
ORDER BY id ASC;
Next time, when you change the start value of id in the WHERE clause, change the offset added to row_number() as well, like below:
SELECT id, row_number() over (ORDER BY id desc)+1000 AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 1000 AND id < 2000
ORDER BY id ASC;
Or, better, you can use the OFFSET and LIMIT approach for pagination:
https://wiki.postgresql.org/images/3/35/Pagination_Done_the_PostgreSQL_Way.pdf
In the end I did it this way:
First I create a temporary materialized view:
CREATE MATERIALIZED VIEW vw_temp AS SELECT id, row_number() over (ORDER BY id desc) AS rn, title
FROM mytable
WHERE group_id = 10;
Then I define the index:
CREATE INDEX idx_temp ON vw_temp USING btree(id);
Now I can perform all operations very quickly, and with numbered rows:
SELECT * FROM vw_temp WHERE id BETWEEN 1000 AND 2000;
After doing the operations, cleanup:
DROP INDEX idx_temp;
DROP MATERIALIZED VIEW vw_temp;
Even though Thorsten Kettner's answer seems the cleanest one, it was not practical for me because it was too slow. Thanks for contributing, everyone. For those interested in the practical use case, I use this for feeding data to the Sphinx indexer.