Last entry for non-unique id where X=Y - sql

Please see the above image for rows I would like returned, those highlighted in yellow.
From the picture attached, I would like it to only return id 133766 and 133792 as they end at stage 5.
I want to pull the last entry for a non-unique id where there could be X amount of entries per non-unique id.
I am not overly experience in SQL, but what I know is;
I could do
SELECT max(stage), id
FROM [dbo].[table] group by id
and this gives me a pretty good starting point. I'd rather sort on the date field, as the "stage" isn't actually an int, I've done that for simplicity here.
So I essentially need to get the last entry (figured out by date) for all non-unique id's where stage doesn't equal X
I feel like it's a really simple, everyday query, but I just can not wrap my head around a simple, efficient way to do it.
Any help is much appreciated.

try this
SELECT *
FROM(
SELECT Id, Stage, CompletionDate
,Row_number() OVER(PARTITION BY ID ORDER BY CompletionDate DESC) AS RN
FROM YourTable
) AS t
WHERE RN = 1 AND Stage = 5;

I want to point out that not exists is also a way to approach this:
select t.*
from t
where t.stage = 5 and
not exists (select 1
from t t2
where t2.id = t.id and t2.stage > t.stage
);
With an index on (id, stage), you might be surprised at how good the performance is.

You could use the window version of MAX
;WITH CTE_DATA AS
(
SELECT *
, MAX(stage) OVER (PARTITION BY id) AS max_stage
FROM [dbo].[table]
)
SELECT *
FROM CTE_DATA
WHERE stage = max_stage
AND max_stage = 5;

Related

Global row numbers in chunked query

I would like to include a column row_number in my result set with the row number sequence, where 1 is the newest item, without gaps. This works:
SELECT id, row_number() over (ORDER BY id desc) AS row_number, title
FROM mytable
WHERE group_id = 10;
Now I would like to query for the same data in chunks of 1000 each to be easier on memory:
SELECT id, row_number() over (ORDER BY id desc) AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 0 AND id < 1000
ORDER BY id ASC;
Here the row_number restarts from 1 for every chunk, but I would like it to be as if it were part of the global query, as in the first case. Is there an easy way to accomplish this?
Assuming:
id is defined as PRIMARY KEY - which means UNIQUE and NOT NULL. Else you may have to deal with NULL values and / or duplicates (ties).
You have no concurrent write access on the table - or you don't care what happens after you have taken your snapshot.
A MATERIALIZED VIEW, like you demonstrate in your answer, is a good choice.
CREATE MATERIALIZED VIEW mv_temp AS
SELECT row_number() OVER (ORDER BY id DESC) AS rn, id, title
FROM mytable
WHERE group_id = 10;
But index and subsequent queries must be on the row number rn to get
data in chunks of 1000
CREATE INDEX ON mv_temp (rn);
SELECT * FROM mv_temp WHERE rn BETWEEN 1000 AND 2000;
Your implementation would require a guaranteed gap-less id column - which would void the need for an added row number to begin with ...
When done:
DROP MATERIALIZED VIEW mv_temp;
The index dies with the table (materialized view in this case) automatically.
Related, with more details:
Optimize query with OFFSET on large table
You want to have a query for the first 1000 rows, then one for the next 1000, and so on?
Usually you just write one query (the one you already use), have your app fetch 1000 records, do something with them, then fetch the next 1000 and so on. No need for separate queries, hence.
However, it would be rather easy to write such partial queries:
select *
from
(
SELECT id, row_number() over (ORDER BY id desc) AS rn, title
FROM mytable
WHERE group_id = 10
) numbered
where rn between 1 and 1000; -- <- simply change the row number range here
-- e.g. where rn between 1001 and 2000 for the second chunk
You need a pagination. Try this
SELECT id, row_number() over (ORDER BY id desc)+0 AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 0 AND id < 1000
ORDER BY id ASC;
Next time, when you change the start value of id in the WHERE clause change it in row_number() as well like below
SELECT id, row_number() over (ORDER BY id desc)+1000 AS row_number, title
FROM mytable
WHERE group_id = 10 AND id >= 1000 AND id < 2000
ORDER BY id ASC;
or Better you can use OFFSET and LIMIT approach for pagination
https://wiki.postgresql.org/images/3/35/Pagination_Done_the_PostgreSQL_Way.pdf
In the end I ended up doing it this way:
First I create a temporary materialized view:
CREATE MATERIALIZED VIEW vw_temp AS SELECT id, row_number() over (ORDER BY id desc) AS rn, title
FROM mytable
WHERE group_id = 10;
Then I define the index:
CREATE INDEX idx_temp ON vw_temp USING btree(id);
Now I can perform all operations very quickly, and with numbered rows:
SELECT * FROM vw_temp WHERE id BETWEEN 1000 AND 2000;
After doing the operations, cleanup:
DROP INDEX idx_temp;
DROP MATERIALIZED VIEW vw_temp;
Even though Thorsten Kettner's answer seems the cleanest one, it was not practical for me due to being too slow. Thanks for contributing everyone. For those interesed in the practical use case, I use this for feeding data to the Sphinx indexer.

Filter SQL data by repetition on a column

Very simple basic SQL question here.
I have this table:
Row Id __________Hour__Minute__City_Search
1___1409346767__23____24_____Balears (Illes)
2___1409346767__23____13_____Albacete
3___1409345729__23____7______Balears (Illes)
4___1409345729__23____3______Balears (Illes)
5___1409345729__22____56_____Balears (Illes)
What I want to get is only one distinct row by ID and select the last City_Search made by the same Id.
So, in this case, the result would be:
Row Id __________Hour__Minute__City_Search
1___1409346767__23____24_____Balears (Illes)
3___1409345729__23____7______Balears (Illes)
What's the easier way to do it?
Obviously I don't want to delete any data just query it.
Thanks for your time.
SELECT Row,
Id,
Hour,
Minute,
City_Search
FROM Table T
JOIN
(
SELECT MIN(Row) AS Row,
ID
FROM Table
GROUP BY ID
) AS M
ON M.Row = T.Row
AND M.ID = T.ID
Can you change hour/minute to a timestamp?
What you want in this case is to first select what uniquely identifies your row:
Select id, max(time) from [table] group by id
Then use that query to add the data to it.
SELECT id,city search, time
FROM (SELECT id, max(time) as lasttime FROM [table] GROUP BY id) as Tkey
INNER JOIN [table] as tdata
ON tkey.id = tdata.id AND tkey.lasttime = tdata.time
That should do it.
two options to do it without join...
use Row_Number function to find the last one
Select * FROM
(Select *,
row_number() over(Partition BY ID Order BY Hour desc Minute Desc) as RNB
from table)
Where RNB=1
Manipulate the string and using simple Max function
Select ID,Right(MAX(Concat(Hour,Minute,RPAD(Searc,20,''))),20)
From Table
Group by ID
avoiding Joins is usually much faster...
Hope this helps

Select all but last row in Oracle SQL

I want to pull all rows except the last one in Oracle SQL
My database is like this
Prikey - Auto_increment
common - varchar
miles - int
So I want to sum all rows except the last row ordered by primary key grouped by common. That means for each distinct common, the miles will be summed (except for the last one)
Note: the question was changed after this answer was posted. The first two queries work for the original question. The last query (in the addendum) works for the updated question.
This should do the trick, though it will be a bit slow for larger tables:
SELECT prikey, authnum FROM myTable
WHERE prikey <> (SELECT MAX(prikey) FROM myTable)
ORDER BY prikey
This query is longer but for a large table it should faster. I'll leave it to you to decide:
SELECT * FROM (
SELECT
prikey,
authnum,
ROW_NUMBER() OVER (ORDER BY prikey DESC) AS RowRank
FROM myTable)
WHERE RowRank <> 1
ORDER BY prikey
Addendum There was an update to the question; here's the updated answer.
SELECT
common,
SUM(miles)
FROM (
SELECT
common,
miles,
ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
FROM myTable
)
WHERE RowRank <> 1
GROUP BY common
Looks like I am a little too late but here is my contribution, similar to Ed Gibbs' first solution but instead of calculating the max id for each value in the table and then comparing I get it once using an inline view.
SELECT d1.prikey,
d1.authnum
FROM myTable d1,
(SELECT MAX(prikey) prikey myTable FROM myTable) d2
WHERE d1.prikey != d2.prikey
At least I think this is more efficient if you want to go without the use of Analytics.
query to retrieve all the records in the table except first row and last row
select * from table_name
where primary_id_column not in
(
select top 1 * from table_name order by primary_id_column asc
)
and
primary_id_column not in
(
select top 1 * from table_name order by primary_id_column desc
)

Removing dups and updating null values

I've just been tasked with removing all the duplicate values in a database. Simple enough. But they also want me to go through and check if there are any Null values that were not Null in previous entries for that record.
So let's say that we have user 123. User 123 doesn't have a zip code listed for whatever reason. But in a past entry he had zip code 55555. I'm supposed to update the latest entry with that zip code from a past entry and then delete the past entry. Leaving me with only one entry for user 123 AND having the zip code 55555.
I'm just unsure how to do the update portion. Anybody have any suggestions?
Thanks!
Here is how you can do the update. It finds the last value for zip, and then updates the field, if necessary:
with lastval as (
select *
from (select id, zip, row_number() over (partition by id order by datecreated desc) as seqnum
from t
where zip is not null
) t
where seqnum = 1
)
update t
set t.zip = lastval.zip
from lastval
where t.id = lastval.id
However, I would suggest that you create a new table with the data that you want. Don't both deleting and updating a zilion rows, create a table using a query such as:
select *
from (select t.*, row_number() over (partition by id order by datecreated desc) as seqnum
from t
where zip is not null
) t
where seqnum = 1
And insert the rows into a new table.
And, one more suggestion. Ask another question, with a better notion of what the fields are like in the table, and which ones you want to look up last values for. That will provide additional information for better solutions.
You could use a statement similar to the following one:
update t1
set t1.address = dt.address,
t1.city = dt.city,
... and so on ...
from your_table as t1
inner join
(
select
max(id) as id,
companyname,
max(address) as address,
max(city) as city,
... and so on ...
from your_table
group by companyname -- your duplicate detection goes here
) dt
on dt.id = t1.id
This way you fill up all gaps in your duplicates. Then you just have to delete the duplicates.

Order by clause is changing my result set

I know why it's happening but I want to find a way around it if possible.
For example I have 4 rows in my database and each has a datetime (which are all different). What I want to do is get the latest 2 rows but use ascending order, so that the oldest is at the top of the result set.
I currently am using
SELECT TOP 2 *
FROM mytable
WHERE someid = #something
ORDER BY added DESC
This gets me the correct rows but in the wrong order. If I change the DESC to ASC it gets the right order, but the older two of the four rows. This all makes sense to me but is there a way around it?
EDIT: Solved with Elliot's answer below. The syntax would not work without setting an alias for the derived table however. Here is the result
SELECT * FROM
(SELECT TOP 2 * FROM mytable WHERE someid = #something ORDER BY added DESC) AS tbl
ORDER BY tbl.added ASC
I'd think one brute-force solution would be:
SELECT *
FROM (SELECT TOP 2 * FROM mytable WHERE someid = #something ORDER BY added DESC)
ORDER BY added
This will allow "top 2 per something" with a PARTITION BY added to the OVER clause
SELECT *
FROM
(
SELECT *, ROW_NUMBER() OVER (ORDER BY added DESC) as rn
FROM mytable
WHERE someid = #something
) foo
WHERE rn <= 2
ORDER BY added
Note that the derived table requires an alias