Select last item for each unique column value - sql

I have a table containing message logs. Each conversation has a conversation ID.
I want to select distinct conversation IDs, and for each of them, find the latest message with that conversation ID and join it into the row.
This is what I tried, but the result contains only two columns (conversationId and id). I want to get all columns from that table for the row holding the latest message of each conversation.
SELECT
logs.conversationId,
-- latest message id
MAX(logs.id) AS id
FROM [dbo].[Logs] AS logs
-- trying to get the remaining columns for the last message with that conversation ID
LEFT JOIN [dbo].[Logs] AS logs2 ON logs.id = logs2.id
WHERE
-- only conversations for last month
logs.timestamp >= DATEADD(month, -1, GETDATE())
GROUP BY logs.conversationId
When I try to add another column into SELECT, I get the error saying I need to add that column into the GROUP BY clause. But that causes the statement to run for an extremely long time, over 20 seconds for just a few dozen rows in the result.

Use the row_number() window function:
select *
from (
select *,
row_number() over(partition by conversationId order by id desc) as rn
from logs
) as t where t.rn=1
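A minimal sketch of the row_number() approach, run against an in-memory SQLite database instead of SQL Server so it can be tried anywhere (window functions need SQLite 3.25 or newer). The table contents and the message column are made-up sample data, not taken from the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, conversationId TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [(1, "a", "hi"), (2, "b", "hello"), (3, "a", "bye"), (4, "b", "later"), (5, "c", "solo")],
)

# Number the rows of each conversation from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    SELECT id, conversationId, message
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY conversationId ORDER BY id DESC) AS rn
        FROM logs
    ) AS t
    WHERE t.rn = 1
    ORDER BY conversationId
    """
).fetchall()
print(latest)
```

Every column of the winning row is available in the outer SELECT, which is what the GROUP BY attempt in the question could not provide.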

First get the max log id per conversation from logs, and then apply a left join:
select * from
(SELECT
logs.conversationId,
MAX(logs.id) AS id
FROM [dbo].[Logs] AS logs group by logs.conversationId)a
left join [dbo].[Logs] AS logs2 ON a.id = logs2.id and a.conversationid=logs2.conversationid

I would use a correlated subquery in the WHERE clause:
select *
from logs t
where t.id = (
SELECT MAX(tt.id)
from logs tt
WHERE tt.conversationId = t.conversationId
GROUP BY tt.conversationId
)
Note: if you create an index on id, this might be faster than the row_number() version.
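Here is a small sketch of the correlated-subquery version together with an index, again using SQLite as a stand-in. The composite index on (conversationId, id) and its name are my assumption about what would help the subquery most; the note above only mentions an index on id, and whether it beats row_number() depends on the engine and the data.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, conversationId TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [(1, "a", "hi"), (2, "b", "hello"), (3, "a", "bye"), (4, "b", "later")],
)

# Assumed index: lets MAX(id) per conversationId be answered from the index.
conn.execute("CREATE INDEX ix_logs_conversation_id ON logs (conversationId, id)")

# Keep a row only if its id is the largest id within its own conversation.
latest = conn.execute(
    """
    SELECT t.id, t.conversationId, t.message
    FROM logs AS t
    WHERE t.id = (
        SELECT MAX(tt.id)
        FROM logs AS tt
        WHERE tt.conversationId = t.conversationId
    )
    ORDER BY t.conversationId
    """
).fetchall()
print(latest)
```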

Related

How to create a unique ID based on conditions in SQL?

I would like to get a new ID, no matter the format (in the example below 11,12,13...)
Based on the following condition:
Every time the days column value is greater than 1 and not null, the current row and all following rows get the same ID until a new value meets the condition, within the same email.
Below you can see the expected 1 (in the format of XX)
I thought about combining two conditions, in this order:
1. Every time the days column value is greater than 1, all following rows get the same ID until a new value meets the condition.
2. AND when lag (previous) is equal to 0/1/null.
Assuming you have an EmailDate column over which you're ordering (a DATETIME field, really), try something like this:
WITH
TableNameWithEmailDateIDs AS (
SELECT
*,
ROW_NUMBER() OVER (
ORDER BY
Email DESC,
EmailDate
) AS EmailDateID
FROM
TableName
),
IDs AS (
SELECT
*,
LEAD(EmailDateID, 1) OVER (
ORDER BY
Email,
EmailDate
) AS LeadEmailDateID
FROM
(
SELECT
*,
-- REMOVE +10 if you don't want 11 to be starting ID
ROW_NUMBER() OVER (
ORDER BY
Email DESC,
EmailDate
)+10 AS ID
FROM
TableNameWithEmailDateIDs
WHERE
Days > 1
OR Days IS NULL
) X
)
SELECT
COALESCE(TableName.EmailDate, IDs.EmailDate) AS EmailDate,
IDs.Email,
COALESCE(TableName.Days, IDs.Days) AS Days,
IDs.ID
FROM
IDs
LEFT JOIN TableNameWithEmailDateIDs TableName
ON IDs.Email = TableName.Email
AND TableName.EmailDateID BETWEEN
IDs.EmailDateID
AND IDs.LeadEmailDateID-1
ORDER BY
ID DESC,
TableName.EmailDate DESC
;
First, create a CTE that generates IDs for each distinct Email/Date combo (helpful for LEFT JOIN condition later). Then, create a CTE that generates IDs for rows that meet your condition (i.e. the important rows). Finally, LEFT JOIN your main table onto that CTE to fill in the "gaps", so to speak.
I suggest running each of the components of this query independently to fully understand what's going on.
Hope it helps!
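For comparison, here is a sketch of a different technique from the query above: a running sum of a "condition met" flag (often called gaps and islands). It is not the method used in the answer. The sample rows are invented, SQLite stands in for the real database, and I am assuming the rows before the first qualifying row of an email should share their own ID.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (Email TEXT, EmailDate TEXT, Days INTEGER)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [
        ("a@x", "2020-01-01", None),
        ("a@x", "2020-01-02", 1),
        ("a@x", "2020-01-05", 3),
        ("a@x", "2020-01-06", 1),
        ("a@x", "2020-01-10", 4),
        ("b@x", "2020-01-01", None),
        ("b@x", "2020-01-04", 3),
    ],
)

rows = conn.execute(
    """
    WITH flagged AS (
        SELECT Email, EmailDate, Days,
               -- running count of rows with Days > 1, restarted per email
               SUM(CASE WHEN Days > 1 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY Email ORDER BY EmailDate) AS grp
        FROM TableName
    )
    SELECT Email, EmailDate, Days,
           -- one ID per (Email, grp) pair; +10 makes the first ID 11
           DENSE_RANK() OVER (ORDER BY Email, grp) + 10 AS ID
    FROM flagged
    ORDER BY Email, EmailDate
    """
).fetchall()
ids = [row[3] for row in rows]
print(ids)
```

A NULL in Days fails the `Days > 1` test, so it never starts a new group; that matches the "greater than 1 and not null" rule in the question.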

Need to find a difference of data from the same table in hive

I have a history table with loaded timestamp column. I need to fetch the subtracted data using the timestamp column.
Logic: To get the email address by subtracting data from (loaded_timestamp - 1) and current_timestamp. Only the subtracted data should be the output.
Select query :
select t1.email_addr
from (select *
from table t1
where loaded_timestamp = current_timestamp
) left outer join
(select *
from table t2
where loaded_timestamp = date_sub(current_timestamp,1)
)
where t1.email!=t2.email;
Table has following columns
Email address, First name , last name, loaded_timestamp.
xxx#gmail.com,xxx,aaa,2020-03-08.
yyy#gmail.com,yyy,bbb,2020-03-08.
zzz#gmail.com,zzz,ccc,2020-03-08.
xxx#gmail.com,xxx,aaa,2020-03-09.
yyy#gmail.com,yyy,bbb,2020-03-09.
Desired Result
zzz#gmail.com
So if I subtract the two dates from the same table, i.e. (2020-03-09 - 2020-03-08), I should get only the record which does not match. Matching records should be discarded and the unmatched record should be the output.
The best I can figure out is that you want emails that appear only once. If that is the case, use window functions:
select t.*
from (select t.*, count(*) over (partition by email) as cnt
from t
) t
where cnt = 1;
If you want emails in the data but not loaded on the current date, then:
select t.email
from t
group by t.email
having max(timestamp) <> current_date;
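If the goal is "emails loaded on the previous day but missing from the current load", an anti-join is another way to read the requirement. The sketch below runs it in SQLite on the sample rows from the question; the table name history, the snake_case column names, and the two hard-coded dates are assumptions. In Hive you would presumably replace the dates with current_date and date_sub(current_date, 1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history "
    "(email_addr TEXT, first_name TEXT, last_name TEXT, loaded_timestamp TEXT)"
)
# Sample rows as given in the question.
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        ("xxx#gmail.com", "xxx", "aaa", "2020-03-08"),
        ("yyy#gmail.com", "yyy", "bbb", "2020-03-08"),
        ("zzz#gmail.com", "zzz", "ccc", "2020-03-08"),
        ("xxx#gmail.com", "xxx", "aaa", "2020-03-09"),
        ("yyy#gmail.com", "yyy", "bbb", "2020-03-09"),
    ],
)

# Left join yesterday's rows to today's rows on the email address and keep
# only those without a partner in today's load.
missing = conn.execute(
    """
    SELECT prev.email_addr
    FROM history AS prev
    LEFT JOIN history AS cur
           ON cur.email_addr = prev.email_addr
          AND cur.loaded_timestamp = :today
    WHERE prev.loaded_timestamp = :yesterday
      AND cur.email_addr IS NULL
    """,
    {"today": "2020-03-09", "yesterday": "2020-03-08"},
).fetchall()
print(missing)
```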

SQL query for filtering duplicate rows of a column by the minimum DateTime of those corresponding rows

I have a SQL database table, "Helium_Test_Data", that has multiple entries based on the KeyID column (the KeyID represents a single tested part ). I need to query the entries and only show one entry per KeyID (part) based on the earliest creation date-time (format example is 2018-12-29 08:22:11.123). This is because the same part was tested several times but the first reading is the one I need to use. Here is the query currently tried:
SELECT mt.*
FROM Helium_Test_Data mt
INNER JOIN
(
SELECT
KeyID,
MIN(DateTime) AS DateTime
FROM Helium_Test_Data
WHERE PSNo='11166565'
GROUP BY KeyID
) t ON mt.KeyID = t.KeyID AND mt.DateTime = t.DateTime
WHERE PSNo='11167197'
AND (mt.DateTime > '2018-12-29 07:00')
AND (mt.DateTime < '2018-12-29 18:00') AND OK=1
ORDER BY KeyId,DateTime
It returns only the rows that have no duplicate KeyID present in the table whereas I need one row per every single KeyID (duplicate or not). And for the duplicate ones, I need the earliest date.
Thanks in advance for the help.
Use the row_number() window function, which most DBMSs support:
select * from
(
select *,row_number() over(partition by KeyID order by DateTime) rn
from Helium_Test_Data
) t where t.rn=1
Or you could use a correlated subquery:
select t1.* from Helium_Test_Data t1
where t1.DateTime= (select min(DateTime)
from Helium_Test_Data t2
where t2.KeyID=t1.KeyID
)
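Neither query above carries over the filters from the question (PSNo, the time window, OK=1). A sketch of one way to combine them, assuming the filters are meant to decide which readings are candidates before the earliest one is picked: put them inside the inner query. The rows are invented and SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical sample readings; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Helium_Test_Data (KeyID INTEGER, PSNo TEXT, DateTime TEXT, OK INTEGER)"
)
conn.executemany(
    "INSERT INTO Helium_Test_Data VALUES (?, ?, ?, ?)",
    [
        (1, "11167197", "2018-12-29 08:22:11.123", 1),
        (1, "11167197", "2018-12-29 09:00:00.000", 1),  # retest of part 1
        (2, "11167197", "2018-12-29 10:15:00.000", 1),
        (2, "11167197", "2018-12-29 06:30:00.000", 1),  # before the time window
        (3, "11166565", "2018-12-29 08:00:00.000", 1),  # different PSNo
    ],
)

# Filter first, then rank what is left per KeyID by creation time.
first_readings = conn.execute(
    """
    SELECT KeyID, DateTime
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY KeyID ORDER BY DateTime) AS rn
        FROM Helium_Test_Data
        WHERE PSNo = '11167197'
          AND DateTime > '2018-12-29 07:00'
          AND DateTime < '2018-12-29 18:00'
          AND OK = 1
    ) AS t
    WHERE t.rn = 1
    ORDER BY KeyID
    """
).fetchall()
print(first_readings)
```

If the earliest reading should instead be chosen across all rows and only then filtered, move the WHERE conditions to the outer query.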

Filter SQL data by repetition on a column

Very simple basic SQL question here.
I have this table:
Row  Id          Hour  Minute  City_Search
1    1409346767  23    24      Balears (Illes)
2    1409346767  23    13      Albacete
3    1409345729  23    7       Balears (Illes)
4    1409345729  23    3       Balears (Illes)
5    1409345729  22    56      Balears (Illes)
What I want to get is only one distinct row by ID and select the last City_Search made by the same Id.
So, in this case, the result would be:
Row  Id          Hour  Minute  City_Search
1    1409346767  23    24      Balears (Illes)
3    1409345729  23    7       Balears (Illes)
What's the easiest way to do it?
Obviously I don't want to delete any data just query it.
Thanks for your time.
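-- In the sample, the lowest Row per ID is the most recent search,
-- so MIN(Row) picks the latest row for each ID.
-- Columns are qualified with T because Row and ID exist in both T and M.
SELECT T.Row,
T.Id,
T.Hour,
T.Minute,
T.City_Search
FROM Table T
JOIN
(
SELECT MIN(Row) AS Row,
ID
FROM Table
GROUP BY ID
) AS M
ON M.Row = T.Row
AND M.ID = T.ID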
Can you change hour/minute to a timestamp?
What you want in this case is to first select what uniquely identifies your row:
Select id, max(time) from [table] group by id
Then use that query to add the data to it.
SELECT tdata.id, tdata.City_Search, tdata.time
FROM (SELECT id, max(time) as lasttime FROM [table] GROUP BY id) as Tkey
INNER JOIN [table] as tdata
ON tkey.id = tdata.id AND tkey.lasttime = tdata.time
That should do it.
Two options to do it without a join:
1. Use the Row_Number function to find the last one:
Select * FROM
(Select *,
row_number() over(Partition BY ID Order BY Hour desc, Minute Desc) as RNB
from table) as t
Where RNB=1
2. Manipulate the string and use a simple Max function (zero-pad Hour and Minute so the strings sort chronologically):
Select ID,Right(MAX(Concat(LPAD(Hour,2,'0'),LPAD(Minute,2,'0'),RPAD(City_Search,20,' '))),20)
From Table
Group by ID
Avoiding joins is often faster, but check the execution plan on your own data.
Hope this helps
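To check the Row_Number option against the expected output in the question, here is a sketch that loads the five sample rows into SQLite. The table name searches and the column name RowNo (used instead of Row to avoid a keyword clash) are my own choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE searches "
    "(RowNo INTEGER, Id INTEGER, Hour INTEGER, Minute INTEGER, City_Search TEXT)"
)
# The five rows from the question.
conn.executemany(
    "INSERT INTO searches VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1409346767, 23, 24, "Balears (Illes)"),
        (2, 1409346767, 23, 13, "Albacete"),
        (3, 1409345729, 23, 7, "Balears (Illes)"),
        (4, 1409345729, 23, 3, "Balears (Illes)"),
        (5, 1409345729, 22, 56, "Balears (Illes)"),
    ],
)

# Rank each Id's searches from latest to earliest and keep the latest one.
last_searches = conn.execute(
    """
    SELECT RowNo, Id, Hour, Minute, City_Search
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Hour DESC, Minute DESC) AS rnb
        FROM searches
    ) AS t
    WHERE t.rnb = 1
    ORDER BY RowNo
    """
).fetchall()
print(last_searches)
```

The result is rows 1 and 3, the same two rows the question lists as the desired output.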

SQL - Group By unique column combination

I am trying to write a script that will return the latest values for a unique documentid-physician-patient triplet. I need the script to act similar to a GROUP BY statement, except group by only works with one column at a time. I need the date and status information for only the most recent row of each unique triplet. Please let me know what you will need to see from me to help. Here is the current, very bare, statement:
SELECT
TransmissionSend.CreateTimestamp,
TransmissionSendItem.Status,
TransmissionSendItem.PhysicianId,
TransmissionSendItem.DocumentIdDisplay,
Utility.SqlFunctions_NdnListToAccountList(TransmissionSendItem.NdocNum) AS AccountNum
FROM
Interface_SFAX.TransmissionSend,
Interface_SFAX.TransmissionSendItem
WHERE
TransmissionSend.ID = TransmissionSendItem.childsub --I don't know exactly what this does, I did not write this script. It must stay here though for the exact results.
ORDER BY TransmissionSend.CreateTimestamp DESC -- In the end, each latest result of the unique triplet will be ordered from most recent to oldest in return
My question is, again, how can I limit results to only the latest status for each physician id, document id, and account number combination?
First select MAX(date) grouped by the document ID, then use that result to select the full rows from the table, for example with an inner join:
SELECT table.additionalData, J.id, J.date
FROM table
INNER JOIN (SELECT id, MAX(date) AS date
FROM table GROUP BY id) AS J
ON J.id = table.id
AND J.date /* this is the max date */ = table.date
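The same join-back pattern extends to a triplet, because GROUP BY accepts several columns at once. Below is a sketch in SQLite with a single flattened table; the table name sends and its rows are hypothetical, and the real query would still need the join between TransmissionSend and TransmissionSendItem from the question.

```python
import sqlite3

# Hypothetical flattened data; column names are borrowed from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sends (CreateTimestamp TEXT, Status TEXT, PhysicianId INTEGER, "
    "DocumentIdDisplay TEXT, AccountNum TEXT)"
)
conn.executemany(
    "INSERT INTO sends VALUES (?, ?, ?, ?, ?)",
    [
        ("2021-01-01 10:00", "Sent", 1, "D1", "A1"),
        ("2021-01-02 10:00", "Failed", 1, "D1", "A1"),
        ("2021-01-01 09:00", "Sent", 1, "D2", "A1"),
        ("2021-01-03 08:00", "Sent", 2, "D1", "A1"),
        ("2021-01-01 08:00", "Queued", 2, "D1", "A1"),
    ],
)

# Find the newest timestamp per triplet, then join back to fetch the status.
latest = conn.execute(
    """
    SELECT s.CreateTimestamp, s.Status, s.PhysicianId, s.DocumentIdDisplay, s.AccountNum
    FROM sends AS s
    INNER JOIN (
        SELECT PhysicianId, DocumentIdDisplay, AccountNum,
               MAX(CreateTimestamp) AS CreateTimestamp
        FROM sends
        GROUP BY PhysicianId, DocumentIdDisplay, AccountNum
    ) AS j
        ON j.PhysicianId = s.PhysicianId
       AND j.DocumentIdDisplay = s.DocumentIdDisplay
       AND j.AccountNum = s.AccountNum
       AND j.CreateTimestamp = s.CreateTimestamp
    ORDER BY s.CreateTimestamp DESC
    """
).fetchall()
print(latest)
```

If two rows of one triplet share the same newest timestamp, this returns both of them; a row_number() ranking with a tie-breaker column would return exactly one.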