Query optimization with rows referring to parents - sql

I have one table that has "Id", "ParentId", and "CreatedDate". If the row is an original submission, it will have no "ParentId". When an edit is made to an original submission, a new row is created where NewRow."ParentId" = Original."Id". Every new edit made from there on will take the proper "ParentId". This gives a way to see the history of edits.
Now for the query. I rushed together a query that will get all of the latest and unique entries. For example, if I have 3 unique original forms, I only want to see their most recent revision (most recent child) unless they have none, in which case I want the original where "ParentId" IS NULL.
This is the query I am using:
SELECT DISTINCT A.*
FROM "dbo"."customercomplaint" AS A
RIGHT OUTER JOIN "dbo"."customercomplaint" AS B
ON B."parentid" != A."id"
WHERE A."parentid" IS NULL
AND A."id" IS NOT NULL
UNION
SELECT t1.*
FROM "dbo"."customercomplaint" t1
JOIN (SELECT "parentid" AS id,
Max("createddate") AS "CreatedDate"
FROM "dbo"."customercomplaint"
GROUP BY id) t2
ON t1."parentid" = t2.id
AND t1."createddate" = t2."createddate"
This query feels a little sloppy to me and I would like to seek out a better solution. Let me know if any further information is required. I appreciate any and all advice.

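You can simplify the query using the ROW_NUMBER() function.
The example below assumes every edit's ParentID points at the original submission, so ISNULL(ParentID, ID) puts an original and all of its edits in the same partition: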
select ID, ParentID, CreatedDate
from (
select ID, ParentID, CreatedDate, row_number() over(partition by isnull(ParentID, ID) order by CreatedDate desc) RowNumber
from CustomerComplaint
) t
where
t.RowNumber = 1
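A minimal sketch of the same query, run against an in-memory SQLite database from Python so it can be tried without a SQL Server instance. The sample rows are made up, COALESCE stands in for ISNULL, and the sketch assumes SQLite 3.25 or later (for window functions) and that every edit's ParentId points at the original submission rather than at the previous edit.

```python
import sqlite3

def latest_revisions(rows):
    """Return the newest row of each submission family as (Id, ParentId, CreatedDate)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CustomerComplaint (Id INTEGER, ParentId INTEGER, CreatedDate TEXT)"
    )
    conn.executemany("INSERT INTO CustomerComplaint VALUES (?, ?, ?)", rows)
    query = """
        SELECT Id, ParentId, CreatedDate
        FROM (
            SELECT Id, ParentId, CreatedDate,
                   ROW_NUMBER() OVER (
                       PARTITION BY COALESCE(ParentId, Id)  -- original and its edits share a key
                       ORDER BY CreatedDate DESC            -- newest row gets RowNumber = 1
                   ) AS RowNumber
            FROM CustomerComplaint
        ) AS t
        WHERE t.RowNumber = 1
        ORDER BY Id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical sample data: complaint 1 has two edits, complaint 2 has none.
sample = [
    (1, None, "2020-01-01"),
    (2, None, "2020-01-02"),
    (3, 1, "2020-01-03"),
    (4, 1, "2020-01-04"),
]
print(latest_revisions(sample))
```

If edits instead point at the previous edit (a chain), this partitioning no longer groups the whole history, and a recursive query would be needed to find each row's root first.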

Related

Foreach/per-item iteration in SQL

I'm new to SQL and I think I must just be missing something, but I can't find any resources on how to do the following:
I have a table with three relevant columns: id, creation_date, latest_id. latest_id refers to the id of another entry (a newer revision).
For each entry, I would like to find the min creation date of all entries with latest_id = this.id. How do I perform this type of iteration in SQL / reference the value of the current row in an iteration?
select
t.id, min(t2.creation_date) as min_creation_date
from
mytable t
left join
mytable t2 on t2.latest_id = t.id
group by
t.id
You could solve this with a loop, but it's not anywhere close to the best strategy. Instead, try this:
SELECT tf.id, tf.Creation_Date
FROM
(
SELECT t0.id, t1.Creation_Date,
row_number() over (partition by t0.id order by t1.creation_date) rn
FROM [MyTable] t0 -- table prime
INNER JOIN [MyTable] t1 ON t1.latest_id = t0.id -- table 1
) tf -- table final
WHERE tf.rn = 1
This connects the id to the latest_id by joining the table to itself. Then it uses a windowing function to help identify the smallest Creation_Date for each match.
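The two queries above differ in one respect that is easy to miss: the LEFT JOIN version keeps entries that nothing points to (with a NULL minimum), while the INNER JOIN version drops them. The sketch below shows this on made-up rows, using an in-memory SQLite database from Python as a stand-in for the real server (SQLite 3.25 or later is assumed for ROW_NUMBER).

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER, creation_date TEXT, latest_id INTEGER)"
    )
    # Hypothetical rows: entries 1 and 2 are older revisions of entry 3.
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [(1, "2021-01-01", 3), (2, "2021-01-05", 3), (3, "2021-02-01", None)],
    )
    return conn

# Aggregate version: one output row per entry, NULL when no entry refers to it.
LEFT_JOIN_SQL = """
    SELECT t.id, MIN(t2.creation_date) AS min_creation_date
    FROM mytable AS t
    LEFT JOIN mytable AS t2 ON t2.latest_id = t.id
    GROUP BY t.id
    ORDER BY t.id
"""

# Window version: only entries that at least one other entry refers to.
ROW_NUMBER_SQL = """
    SELECT tf.id, tf.creation_date
    FROM (
        SELECT t0.id, t1.creation_date,
               ROW_NUMBER() OVER (PARTITION BY t0.id ORDER BY t1.creation_date) AS rn
        FROM mytable AS t0
        INNER JOIN mytable AS t1 ON t1.latest_id = t0.id
    ) AS tf
    WHERE tf.rn = 1
    ORDER BY tf.id
"""

conn = build_db()
left_join_rows = conn.execute(LEFT_JOIN_SQL).fetchall()
row_number_rows = conn.execute(ROW_NUMBER_SQL).fetchall()
print(left_join_rows)
print(row_number_rows)
```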

SQL Help: SELECT CASE?

I'm looking for some general guidance on the best solution for a recurring SQL query. Basically, I want to create a view of a table which has a lot of nearly identical rows (except for one distinguishing column called [Status], which can be either 'Closed' or 'Draft').
I want to return distinct data for each [Port], if both 'Closed' and 'Draft' exist, then return only the 'Draft' row data, and if only 'Closed' exists, then return the 'Closed' row data.
Please refer to the attached files for a visual. Any assistance is greatly appreciated! I believe this solution will lend itself well to other practical cases/solutions for me in the future - thank you!
Original Table Data:
Example Output:
Try this,
select c.Port,c.DateAdded,max(Status) as Status
from myTable c
group by c.Port,c.DateAdded
Basically, group the table and take the highest status value. 'Draft' sorts after 'Closed' alphabetically, so if both exist, MAX(Status) returns 'Draft'.
Use NOT EXISTS:
SELECT t1.*
FROM tablename t1
WHERE t1.Status = 'Draft'
OR NOT EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.Port = t1.Port AND t2.Status = 'Draft'
)
Or with ROW_NUMBER() window function:
SELECT Port, DateAdded, Status
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Port ORDER BY CASE WHEN Status = 'Draft' THEN 1 ELSE 2 END) rn
FROM tablename
) t
WHERE rn = 1
A rewording of your requirement is to return just one row per Port, with Draft rows taking precedence over Closed rows.
You don't make clear whether the rows can have different dates, though. If one port has two Draft rows or two Closed rows, do you want the earlier dated row or the later dated row?
The code below presumes the dates can indeed be different, and that you prefer the later dated row.
WITH
sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY port ORDER BY status DESC, dateAdded DESC) AS seq_num
FROM
YourTable
)
SELECT
*
FROM
sorted
WHERE
seq_num = 1
If the dates are always identical, MAX(status) with GROUP BY port, dateAdded is easily sufficient.
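As a quick check of the ordering logic, here is a minimal sketch that runs the CTE above on made-up rows in an in-memory SQLite database from Python (SQLite 3.25 or later assumed). Like the original, it relies on 'Draft' sorting after 'Closed' as text, so `Status DESC` puts Draft rows first. The port names and dates are hypothetical.

```python
import sqlite3

def preferred_rows(rows):
    """Return one (Port, DateAdded, Status) row per port: Draft first, then latest date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Port TEXT, DateAdded TEXT, Status TEXT)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    query = """
        WITH sorted AS (
            SELECT Port, DateAdded, Status,
                   ROW_NUMBER() OVER (
                       PARTITION BY Port
                       ORDER BY Status DESC, DateAdded DESC
                   ) AS seq_num
            FROM YourTable
        )
        SELECT Port, DateAdded, Status
        FROM sorted
        WHERE seq_num = 1
        ORDER BY Port
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical data: port A has an older Draft and a newer Closed row,
# port B has two Closed rows with different dates.
sample = [
    ("A", "2020-01-05", "Closed"),
    ("A", "2020-01-01", "Draft"),
    ("B", "2020-01-01", "Closed"),
    ("B", "2020-01-03", "Closed"),
]
print(preferred_rows(sample))
```

If more status values are added later, an explicit CASE expression in the ORDER BY is safer than relying on alphabetical order.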
I'd use a full outer join, and coalesce the results so that the "draft" row is preferred over the "closed" row:
SELECT COALESCE(d.Port, c.Port),
COALESCE(d.DateAdded, c.DateAdded),
COALESCE(d.Status, c.Status)
FROM (SELECT Port, DateAdded, Status
FROM mytable
WHERE Status = 'Draft') d
FULL OUTER JOIN (SELECT Port, DateAdded, Status
FROM mytable
WHERE Status = 'Closed') c ON d.Port = c.Port

Last entry for non-unique id where X=Y

Please see the above image for rows I would like returned, those highlighted in yellow.
From the picture attached, I would like it to only return id 133766 and 133792 as they end at stage 5.
I want to pull the last entry for a non-unique id where there could be X amount of entries per non-unique id.
I am not overly experienced in SQL, but I know I could do:
SELECT max(stage), id
FROM [dbo].[table] group by id
and this gives me a pretty good starting point. I'd rather sort on the date field, as the "stage" isn't actually an int, I've done that for simplicity here.
So I essentially need to get the last entry (figured out by date) for all non-unique id's where stage doesn't equal X
I feel like it's a really simple, everyday query, but I just can not wrap my head around a simple, efficient way to do it.
Any help is much appreciated.
Try this:
SELECT *
FROM(
SELECT Id, Stage, CompletionDate
,Row_number() OVER(PARTITION BY ID ORDER BY CompletionDate DESC) AS RN
FROM YourTable
) AS t
WHERE RN = 1 AND Stage = 5;
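A minimal sketch of that query on made-up rows, run in an in-memory SQLite database from Python (SQLite 3.25 or later assumed). The ids 133766 and 133792 come from the question; id 133770, the stage values, and the dates are hypothetical. Note that the filter on Stage is applied after the row numbering, so an id that passed through stage 5 but ended elsewhere is excluded.

```python
import sqlite3

def ids_ending_at_stage(rows, stage):
    """Return the latest row per Id, keeping only those whose latest Stage matches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Id INTEGER, Stage INTEGER, CompletionDate TEXT)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    query = """
        SELECT Id, Stage, CompletionDate
        FROM (
            SELECT Id, Stage, CompletionDate,
                   ROW_NUMBER() OVER (PARTITION BY Id ORDER BY CompletionDate DESC) AS RN
            FROM YourTable
        ) AS t
        WHERE RN = 1 AND Stage = ?
        ORDER BY Id
    """
    result = conn.execute(query, (stage,)).fetchall()
    conn.close()
    return result

sample = [
    (133766, 1, "2019-01-01"),
    (133766, 5, "2019-01-03"),
    (133770, 5, "2019-01-02"),  # reached stage 5 ...
    (133770, 3, "2019-01-04"),  # ... but its last entry is stage 3
    (133792, 5, "2019-01-05"),
]
print(ids_ending_at_stage(sample, 5))
```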
I want to point out that not exists is also a way to approach this:
select t.*
from t
where t.stage = 5 and
not exists (select 1
from t t2
where t2.id = t.id and t2.stage > t.stage
);
With an index on (id, stage), you might be surprised at how good the performance is.
You could use the window version of MAX
;WITH CTE_DATA AS
(
SELECT *
, MAX(stage) OVER (PARTITION BY id) AS max_stage
FROM [dbo].[table]
)
SELECT *
FROM CTE_DATA
WHERE stage = max_stage
AND max_stage = 5;

Filter SQL data by repetition on a column

Very simple basic SQL question here.
I have this table:
Row  Id          Hour  Minute  City_Search
1    1409346767  23    24      Balears (Illes)
2    1409346767  23    13      Albacete
3    1409345729  23    7       Balears (Illes)
4    1409345729  23    3       Balears (Illes)
5    1409345729  22    56      Balears (Illes)
What I want to get is only one distinct row by ID and select the last City_Search made by the same Id.
So, in this case, the result would be:
Row  Id          Hour  Minute  City_Search
1    1409346767  23    24      Balears (Illes)
3    1409345729  23    7       Balears (Illes)
What's the easier way to do it?
Obviously I don't want to delete any data just query it.
Thanks for your time.
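-- Assumes the lowest Row per Id is the most recent search, as in the sample data.
-- Columns are qualified with T because Row and Id exist in both T and M.
SELECT T.Row,
       T.Id,
       T.Hour,
       T.Minute,
       T.City_Search
FROM Table T
JOIN
(
    SELECT MIN(Row) AS Row,
           ID
    FROM Table
    GROUP BY ID
) AS M
ON M.Row = T.Row
AND M.ID = T.ID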
Can you change hour/minute to a timestamp?
What you want in this case is to first select what uniquely identifies your row:
Select id, max(time) from [table] group by id
Then use that query to add the data to it.
SELECT tdata.id, tdata.City_Search, tdata.time
FROM (SELECT id, max(time) as lasttime FROM [table] GROUP BY id) as Tkey
INNER JOIN [table] as tdata
ON tkey.id = tdata.id AND tkey.lasttime = tdata.time
That should do it.
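Two options that do it without a join.
Use the ROW_NUMBER() function to find the last row per ID:
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Hour DESC, Minute DESC) AS RNB
FROM Table) AS ranked
WHERE RNB = 1
Or build a sortable string and use a simple MAX. Hour and Minute must be zero-padded so the strings sort chronologically, and the city is padded to a fixed width so it can be cut back off the end (LPAD/RPAD as in MySQL; adjust for your database):
SELECT ID, RTRIM(RIGHT(MAX(CONCAT(LPAD(Hour, 2, '0'), LPAD(Minute, 2, '0'), RPAD(City_Search, 20, ' '))), 20)) AS City_Search
FROM Table
GROUP BY ID
Avoiding the join is often faster, though that depends on the indexes and the optimizer.
Hope this helps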
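To check the ROW_NUMBER() approach against the rows from the question, here is a minimal sketch using an in-memory SQLite database from Python (SQLite 3.25 or later assumed). The table name Searches and the column name RowNo are placeholders chosen for the sketch, since Table and Row collide with SQL keywords in several databases.

```python
import sqlite3

def last_search_per_id(rows):
    """Return the latest (RowNo, Id, Hour, Minute, City_Search) row for each Id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Searches "
        "(RowNo INTEGER, Id INTEGER, Hour INTEGER, Minute INTEGER, City_Search TEXT)"
    )
    conn.executemany("INSERT INTO Searches VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT RowNo, Id, Hour, Minute, City_Search
        FROM (
            SELECT s.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY Id
                       ORDER BY Hour DESC, Minute DESC  -- numeric columns, so 13 sorts above 7
                   ) AS RNB
            FROM Searches AS s
        ) AS ranked
        WHERE RNB = 1
        ORDER BY RowNo
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# The five rows from the question.
sample = [
    (1, 1409346767, 23, 24, "Balears (Illes)"),
    (2, 1409346767, 23, 13, "Albacete"),
    (3, 1409345729, 23, 7, "Balears (Illes)"),
    (4, 1409345729, 23, 3, "Balears (Illes)"),
    (5, 1409345729, 22, 56, "Balears (Illes)"),
]
print(last_search_per_id(sample))
```

Ordering by hour and minute alone breaks down if one Id has searches on both sides of midnight, which is one reason a full timestamp column is preferable.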

Removing dups and updating null values

I've just been tasked with removing all the duplicate values in a database. Simple enough. But they also want me to go through and check if there are any Null values that were not Null in previous entries for that record.
So let's say that we have user 123. User 123 doesn't have a zip code listed for whatever reason. But in a past entry he had zip code 55555. I'm supposed to update the latest entry with that zip code from a past entry and then delete the past entry. Leaving me with only one entry for user 123 AND having the zip code 55555.
I'm just unsure how to do the update portion. Anybody have any suggestions?
Thanks!
Here is how you can do the update. It finds the last value for zip, and then updates the field, if necessary:
with lastval as (
select *
from (select id, zip, row_number() over (partition by id order by datecreated desc) as seqnum
from t
where zip is not null
) t
where seqnum = 1
)
update t
set t.zip = lastval.zip
from lastval
where t.id = lastval.id
However, I would suggest that you create a new table with the data that you want. Don't bother deleting and updating a zillion rows; create a table using a query such as:
select *
from (select t.*, row_number() over (partition by id order by datecreated desc) as seqnum
from t
where zip is not null
) t
where seqnum = 1
And insert the rows into a new table.
And, one more suggestion. Ask another question, with a better notion of what the fields are like in the table, and which ones you want to look up last values for. That will provide additional information for better solutions.
You could use a statement similar to the following one:
update t1
set t1.address = dt.address,
t1.city = dt.city,
... and so on ...
from your_table as t1
inner join
(
select
max(id) as id,
companyname,
max(address) as address,
max(city) as city,
... and so on ...
from your_table
group by companyname -- your duplicate detection goes here
) dt
on dt.id = t1.id
This way you fill up all gaps in your duplicates. Then you just have to delete the duplicates.
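The fill-then-delete sequence discussed in this thread can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses the single-column table t(id, zip, datecreated) from the first answer, made-up rows, and an in-memory SQLite database from Python. The UPDATE is written with a correlated subquery rather than UPDATE ... FROM so that it runs on older SQLite versions; it is not the exact statement from either answer.

```python
import sqlite3

def fill_and_dedupe(rows):
    """Fill NULL zips from the most recent non-NULL entry, then keep the latest row per id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, zip TEXT, datecreated TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

    # Step 1: copy the most recent known zip into rows where zip is missing.
    conn.execute("""
        UPDATE t
        SET zip = (SELECT p.zip
                   FROM t AS p
                   WHERE p.id = t.id AND p.zip IS NOT NULL
                   ORDER BY p.datecreated DESC
                   LIMIT 1)
        WHERE zip IS NULL
    """)

    # Step 2: delete every row that is older than the latest row for its id.
    conn.execute("""
        DELETE FROM t
        WHERE datecreated < (SELECT MAX(n.datecreated) FROM t AS n WHERE n.id = t.id)
    """)

    result = conn.execute("SELECT id, zip, datecreated FROM t ORDER BY id").fetchall()
    conn.close()
    return result

# Hypothetical data: user 123 lost the zip in the latest entry,
# user 456 gained one, user 789 never had one.
sample = [
    (123, "55555", "2021-01-01"),
    (123, None, "2021-03-01"),
    (456, None, "2021-01-01"),
    (456, "11111", "2021-02-01"),
    (789, None, "2021-01-01"),
]
print(fill_and_dedupe(sample))
```

On a real table both statements would normally run inside one transaction, after taking a backup, and the delete step assumes no two rows for the same id share a datecreated value.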