SQL Server: Select all but the first result in query - sql

We have an issue with two tables, let's call them Item and ItemStatuses. We track each change to the status, so there is a start date and end date in the ItemStatuses table. We track the current status by looking for the one with an end date that is null.
Through an error in the system the newest status was added multiple times to a number of items. I need to select all but the first status for each item. I have the following query which gives me all the open statuses. I was trying this route because I figured I could use the row number to skip the first one, but there are multiple Items in these sets, so I need to skip the first status for each item. I think I'm pretty close with my query, but I'm not sure what I need to do.
SELECT ID, rn = ROW_NUMBER() OVER (ORDER BY ItemID)
FROM ItemStatuses WHERE ID IN
(
SELECT
s.ID
FROM Items as i
INNER JOIN ItemStatuses AS s ON
i.ID = s.ItemID AND
s.EndDate IS NULL
GROUP BY i.ID
HAVING COUNT(i.ID) > 1
)

To illustrate how to update all but the first status of your table:
declare @itemstatuses table (id int, Enddate datetime, theStatus int)
insert into @itemstatuses
values (1,getdate()-3,1),(1,getdate()-2,2),(1,getdate()-1,3),
(2,getdate()-3,2),(2,getdate()-2,2),(2,getdate()-1,99),
(3,getdate(),1)
select 'before',* from @itemstatuses
-- number the rows within each id, oldest Enddate first
;with sorted
as (
select [r] = row_number() over (partition by id order by Enddate), *
from @itemstatuses
)
-- updating the CTE updates the underlying table; the first row per id (r = 1) is left alone
update sorted
set theStatus = 100
where r > 1
select 'after',* from @itemstatuses
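Since the question asks to select the extra rows rather than update them, the same partitioned row_number idea can also be used in a plain SELECT. The following is a minimal sketch, run through Python's sqlite3 module so that it is self-contained (SQLite 3.25 or newer is assumed for window functions). The StartDate column name and the sample rows are assumptions, and the inner SELECT should carry over to SQL Server largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemStatuses (ID INTEGER PRIMARY KEY, ItemID INT, StartDate TEXT, EndDate TEXT);
    INSERT INTO ItemStatuses (ID, ItemID, StartDate, EndDate) VALUES
        (1, 10, '2024-01-01', '2024-01-05'),  -- closed status, ignored
        (2, 10, '2024-01-05', NULL),          -- first open status for item 10
        (3, 10, '2024-01-05', NULL),          -- duplicate
        (4, 10, '2024-01-05', NULL),          -- duplicate
        (5, 20, '2024-02-01', NULL),          -- only open status for item 20
        (6, 30, '2024-03-01', NULL),          -- first open status for item 30
        (7, 30, '2024-03-01', NULL);          -- duplicate
""")

# Number the open statuses per item, then keep everything except row 1.
# ID is used as a tie-breaker so the numbering is deterministic.
duplicate_ids = [row[0] for row in conn.execute("""
    SELECT ID
    FROM (
        SELECT ID,
               ROW_NUMBER() OVER (PARTITION BY ItemID ORDER BY StartDate, ID) AS rn
        FROM ItemStatuses
        WHERE EndDate IS NULL
    ) AS ranked
    WHERE rn > 1
    ORDER BY ID
""")]
print(duplicate_ids)
```

The key change from the query in the question is PARTITION BY ItemID, which restarts the numbering for every item so that rn > 1 skips exactly one row per item.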

I would simplify your SQL query, since this can become overly complicated and expensive.
Then use your server-side language to perform any filtering or conditions.

Related

How create a unique ID based on conditions in SQL?

I would like to get a new ID, no matter the format (in the example below 11,12,13...)
Based on the following condition:
Every time the days column value is greater than 1 and not null, the current row and all following rows get the same ID, until a new value meets the condition.
This applies within the same email.
Below you can see the expected 1 (in the format of XX)
I thought about using two conditions, in the following order:
1. Every time the days column value is greater than 1, all following rows get the same ID until a new value meets the condition.
2. AND when the lag (previous) value is equal to 0, 1, or null.
Assuming you have an EmailDate column over which you're ordering (a DATETIME field, really), try something like this:
WITH
TableNameWithEmailDateIDs AS (
SELECT
*,
ROW_NUMBER() OVER (
ORDER BY
Email DESC,
EmailDate
) AS EmailDateID
FROM
TableName
),
IDs AS (
SELECT
*,
LEAD(EmailDateID, 1) OVER (
ORDER BY
Email,
EmailDate
) AS LeadEmailDateID
FROM
(
SELECT
*,
-- REMOVE +10 if you don't want 11 to be starting ID
ROW_NUMBER() OVER (
ORDER BY
Email DESC,
EmailDate
)+10 AS ID
FROM
TableNameWithEmailDateIDs
WHERE
Days > 1
OR Days IS NULL
) X
)
SELECT
COALESCE(TableName.EmailDate, IDs.EmailDate) AS EmailDate,
IDs.Email,
COALESCE(TableName.Days, IDs.Days) AS Days,
IDs.ID
FROM
IDs
LEFT JOIN TableNameWithEmailDateIDs TableName
ON IDs.Email = TableName.Email
AND TableName.EmailDateID BETWEEN
IDs.EmailDateID
AND IDs.LeadEmailDateID-1
ORDER BY
ID DESC,
TableName.EmailDate DESC
;
First, create a CTE that generates IDs for each distinct Email/Date combo (helpful for LEFT JOIN condition later). Then, create a CTE that generates IDs for rows that meet your condition (i.e. the important rows). Finally, LEFT JOIN your main table onto that CTE to fill in the "gaps", so to speak.
I suggest running each of the components of this query independently to fully understand what's going on.
Hope it helps!
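A different technique that is often used for this kind of "start a new group whenever a condition is met" problem is a running sum over a flag. This is not the approach in the answer above, just a minimal sketch of the alternative, run through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). The table name EmailLog, its columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmailLog (Email TEXT, EmailDate TEXT, Days INT);
    INSERT INTO EmailLog (Email, EmailDate, Days) VALUES
        ('a@x', '2024-01-01', NULL),  -- before any qualifying row: group 0
        ('a@x', '2024-01-02', 5),     -- Days > 1 starts group 1
        ('a@x', '2024-01-03', 1),
        ('a@x', '2024-01-04', 0),
        ('a@x', '2024-01-05', 3),     -- Days > 1 starts group 2
        ('b@x', '2024-01-01', 2),     -- numbering restarts per email
        ('b@x', '2024-01-02', NULL);
""")

# Flag each row where Days > 1 (NULL compares as unknown, so it falls into ELSE 0),
# then take a running total of the flags within each email.
rows = conn.execute("""
    SELECT Email, EmailDate, Days,
           SUM(CASE WHEN Days > 1 THEN 1 ELSE 0 END) OVER (
               PARTITION BY Email
               ORDER BY EmailDate
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS GroupNo
    FROM EmailLog
    ORDER BY Email, EmailDate
""").fetchall()

group_numbers = [row[3] for row in rows]
print(group_numbers)
```

The pair (Email, GroupNo) identifies each group. If a single ID value is needed, it could be derived from that pair, for example with DENSE_RANK over both columns.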

Select last item for each unique column value

I have a table containing message logs. Each conversation has a conversation ID.
I want to select distinct conversation IDs, and for each of them, find the latest message with that conversation ID and join it into the row.
This is what I tried but it doesn't add any data into the table except the two columns (conversationId and id). I want to get all columns from that table for each row with the latest
SELECT
logs.conversationId,
-- latest message id
MAX(logs.id) AS id
FROM [dbo].[Logs] AS logs
-- trying to get the remaining columns for the last message with that conversation ID
LEFT JOIN [dbo].[Logs] AS logs2 ON logs.id = logs2.id
WHERE
-- only conversations for last month
logs.timestamp >= DATEADD(month, -1, GETDATE())
GROUP BY logs.conversationId
When I try to add another column into SELECT, I get the error saying I need to add that column into the GROUP BY clause. But that causes the statement to run for an extremely long time, over 20 seconds for just a few dozen rows in the result.
Use the row_number() function:
select *
from (
select *,
row_number() over(partition by conversationId order by id desc) as rn
from logs
) as t where t.rn=1
First get the max log id per conversation from logs, and then apply a left join:
select * from
(SELECT
logs.conversationId,
MAX(logs.id) AS id
FROM [dbo].[Logs] AS logs group by logs.conversationId) a
left join [dbo].[Logs] AS logs2 ON a.id = logs2.id and a.conversationId = logs2.conversationId
I would use a correlated subquery in the WHERE clause to do it.
select *
from logs t
where t.id = (
SELECT MAX(tt.id)
from logs tt
WHERE tt.conversationId = t.conversationId
GROUP BY tt.conversationId
)
Note: if you create an index on id, this might be faster than the row_number version.
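To check that the row_number version and the correlated-subquery version return the same rows, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ assumed for window functions). The message column, the sample rows, and the composite index are assumptions. The timestamp filter from the question is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Logs (id INTEGER PRIMARY KEY, conversationId TEXT, message TEXT);
    -- Assumed index: lets the engine find the highest id per conversation quickly.
    CREATE INDEX ix_logs_conversation_id ON Logs (conversationId, id);
    INSERT INTO Logs (id, conversationId, message) VALUES
        (1, 'c1', 'hi'),
        (2, 'c2', 'hello'),
        (3, 'c1', 'bye'),
        (4, 'c2', 'later'),
        (5, 'c3', 'solo');
""")

# Variant 1: rank the rows per conversation, newest id first, and keep rank 1.
by_row_number = conn.execute("""
    SELECT id, conversationId, message
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY conversationId ORDER BY id DESC) AS rn
        FROM Logs
    ) AS t
    WHERE t.rn = 1
    ORDER BY id
""").fetchall()

# Variant 2: keep the rows whose id equals the max id of their conversation.
by_subquery = conn.execute("""
    SELECT id, conversationId, message
    FROM Logs AS t
    WHERE t.id = (
        SELECT MAX(tt.id)
        FROM Logs AS tt
        WHERE tt.conversationId = t.conversationId
    )
    ORDER BY id
""").fetchall()

print(by_row_number)
print(by_subquery)
```

Both variants return every column of the latest row per conversation, which is what the GROUP BY attempt in the question could not do.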

Retrieving most recent data in SQL

Total disclosure: I'm a SQL beginner.
I have a data set of certain accounting and governance metrics for US companies. It has about 15 columns and roughly 18 million rows. Each row is a unique combination of company, date and metric being measured. The columns include certain identifiers like isin number, ticker symbol, etc, the date the metric was released, the metric description, and the metric itself.
What I'm trying to do is write a query that will yield the NEWEST values for a certain metric for all companies. In my hopeless search over the past few days I've come to think that the GROUP BY clause may be what I'm looking for. However, it doesn't seem to do exactly what I need. I've got it working with just 2 columns: isin number (company identifier), and date. In other words, I can spit out a list that shows the most recent date for each company, but I'm not sure how to add more columns to this, how to specify what metric to look at.
Any guidance would be appreciated, even if it's just pointing me in the right direction towards what kind of commands I should be looking into.
Thanks!
EDIT: Wow. Thanks for the quick and thorough replies. And point taken on the clarity and example data sets/starting query. Update: I think I have it working. Here's what I used:
SELECT a1.["id_isin_number"], a1.["metric_description"], a1.["date_period_ends"], a1.["company_metric_value"], a2.maxdate
FROM [AGR Metrics].[dbo].[Audit_Integrity_Metric_Data_File_NA Original_0] a1
INNER JOIN (
SELECT a2.["id_isin_number"], MAX(a2.["date_period_ends"]) AS maxdate
FROM [AGR Metrics].[dbo].[Audit_Integrity_Metric_Data_File_NA Original_0] a2
GROUP BY a2.["id_isin_number"]
) a2
ON a1.["date_period_ends"] = a2.maxdate
AND a1.["id_isin_number"] = a2.["id_isin_number"]
WHERE a1.["metric_description"] = '"Litigation: Class Action"'
I'm looking over the responses now to make sure I'm doing this as efficiently as possible.
You can use the ROW_NUMBER() function for this (if using SQL Server 2005 or newer):
SELECT *
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY isin ORDER BY [date] DESC) AS RowRank
FROM YourTable
)sub
WHERE RowRank = 1
Just list out the fields you want in place of * if you don't want them all returned.
The ROW_NUMBER() function adds a number to each row, PARTITION BY is optional and is used to define a group for which numbering will start over at 1, in this case, you want the most recent for each value of isin so we PARTITION BY that. ORDER BY is required and defines the order of the numbering, in this case by date.
Your current query can also be used, but the ROW_NUMBER() method is simpler and more efficient:
SELECT a.*
FROM YourTable a
JOIN (SELECT isin, MAX([date]) AS [date]
FROM YourTable
GROUP BY isin
)b
ON a.isin = b.isin
AND a.[date] = b.[date]
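One detail that may matter for the "newest value of a certain metric" requirement: if different metrics are released on different dates, the metric filter probably needs to be applied before picking the newest row, not after. The sketch below illustrates the difference using Python's sqlite3 module (SQLite 3.25+ assumed). The Metrics table, its column names, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Metrics (isin TEXT, metric TEXT, period_end TEXT, value REAL);
    INSERT INTO Metrics (isin, metric, period_end, value) VALUES
        ('US1', 'Litigation', '2023-12-31', 1.0),
        ('US1', 'Litigation', '2024-03-31', 2.0),
        ('US1', 'Board',      '2024-06-30', 9.0),  -- newer row, different metric
        ('US2', 'Litigation', '2024-03-31', 5.0);
""")

# Filter on the metric first, then rank per company: one newest row per company.
filter_then_rank = conn.execute("""
    SELECT isin, period_end, value
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY isin ORDER BY period_end DESC) AS RowRank
        FROM Metrics
        WHERE metric = 'Litigation'
    ) AS sub
    WHERE RowRank = 1
    ORDER BY isin
""").fetchall()

# Take the max date over ALL metrics, then filter: US1 drops out, because its
# overall newest date belongs to a different metric.
max_then_filter = conn.execute("""
    SELECT a.isin, a.period_end, a.value
    FROM Metrics AS a
    JOIN (
        SELECT isin, MAX(period_end) AS maxdate
        FROM Metrics
        GROUP BY isin
    ) AS b
      ON a.isin = b.isin AND a.period_end = b.maxdate
    WHERE a.metric = 'Litigation'
    ORDER BY a.isin
""").fetchall()

print(filter_then_rank)
print(max_then_filter)
```

If every metric always shares the same release dates, the two queries agree. Otherwise, putting the metric condition inside the subquery is the safer choice.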
Since you mention the date the metric was released, you can use it to sort your table with ORDER BY.
This is a very basic example that simply sorts the data and selects the top 1 value:
CREATE TABLE trialOne (
Id INT NULL,
NAME VARCHAR(50) NULL,
[Date] DATETIME NULL
)
INSERT INTO trialone VALUES(1,'john','2009-01-06 11:39:51.827')
INSERT INTO trialone VALUES(2,'joseph','2010-01-06' )
INSERT INTO trialone VALUES(3,'Ajay','2009-05-06' )
INSERT INTO trialone VALUES(4,'Dave','2009-11-06' )
INSERT INTO trialone VALUES(5,'jonny','2004-01-06')
INSERT INTO trialone VALUES(6,'sunny','2005-01-06')
INSERT INTO trialone VALUES(7,'elle','2013-01-06' )
INSERT INTO trialone VALUES(8,'mac','2012-01-06' )
INSERT INTO trialone VALUES(9,'Sam','2008-01-06' )
INSERT INTO trialone VALUES(10,'xxxxx','2013-08-06')
SELECT TOP(1)name FROM trialone ORDER BY Date DESC

sql query to get earliest date

If I have a table with columns id, name, score, date
and I wanted to run a sql query to get the record where id = 2 with the earliest date in the data set.
Can you do this within the query or do you need to loop after the fact?
I want to get all of the fields of that record..
If you just want the date:
SELECT MIN(date) as EarliestDate
FROM YourTable
WHERE id = 2
If you want all of the information:
SELECT TOP 1 id, name, score, date
FROM YourTable
WHERE id = 2
ORDER BY Date
Avoid loops when you can. Loops often lead to cursors, and cursors are almost never necessary and very often really inefficient.
SELECT TOP 1 ID, Name, Score, [Date]
FROM myTable
WHERE ID = 2
Order BY [Date]
While using TOP or a sub-query both work, I would break the problem into steps:
Find target record
SELECT MIN( date ) AS date, id
FROM myTable
WHERE id = 2
GROUP BY id
Join to get other fields
SELECT mt.id, mt.name, mt.score, mt.date
FROM myTable mt
INNER JOIN
(
SELECT MIN( date ) AS date, id
FROM myTable
WHERE id = 2
GROUP BY id
) x ON x.date = mt.date AND x.id = mt.id
While this solution, using derived tables, is longer, it is:
Easier to test
Self documenting
Extendable
It is easier to test as parts of the query can be run standalone.
It is self-documenting, as the query directly reflects the requirement, i.e. the derived table lists the row where id = 2 with the earliest date.
It is extendable as if another condition is required, this can be easily added to the derived table.
Try
select * from dataset
where id = 2
order by date limit 1
Been a while since I did sql, so this might need some tweaking.
Using "limit" and "top" will not work with all SQL servers (for example with Oracle).
You can try a more complex query in pure sql:
select mt1.id, mt1."name", mt1.score, mt1."date" from mytable mt1
where mt1.id=2
and mt1."date"= (select min(mt2."date") from mytable mt2 where mt2.id=2)

Correct sql/hql query (aggregate in where clause)

I want to do query as below. Query is wrong but describes my intentions.
SELECT name, dateTime, data
FROM Record
WHERE dateTime = MAX(dateTime)
Update: OK, the query doesn't describe my intentions very well. My bad.
I want to select latest record for each person.
Try This:
SELECT name, dateTime, data
FROM Record
WHERE dateTime = SELECT MAX(dateTime) FROM Record
You could also write it using an inner join:
SELECT R.name, R.dateTime, R.data
FROM Record R
INNER JOIN (SELECT MAX(dateTime) AS dateTime FROM Record) RMax ON R.dateTime = RMax.dateTime
Which is the same but written from a different perspective
SELECT R.name, R.dateTime, R.data
FROM Record R,
(SELECT MAX(dateTime) AS dateTime FROM Record) RMax
WHERE R.dateTime = RMax.dateTime
I like Miky's answer and the one from Quassnoi (and upvoted Miky's) but, if your needs are similar to mine, you should keep in mind some limitations. First and most importantly, they only work if you are looking for the latest record overall or the latest record for a single name. If you want the latest record for each person in a set (one record per person, but the latest record for each), then the above solutions fall short. Second, and less importantly, if you'll be working with large datasets, they might prove a bit slow over the long run. So, what is the work-around?
What I do is to add a bit field to the table marked "newest." Then, when I store a record (which is done in a stored procedure in SQL Server) I follow this pattern:
Update Table Set Newest=0 Where Name=@Name
Insert into Table (Name, dateTimeVal, Data, Newest) Values (@Name, GetDate(), @Data, 1);
Also, there is an index on Name and Newest to make Selects very fast.
Then the Select is just:
Select dateTimeVal, Data From Table Where (Name=@Name) and (Newest=1);
A select for a group will be something like:
Select Name, dateTimeVal, Data from Table Where (Newest=1); -- Gets multiple records
If the records may not be entered in date order, then your logic is a little bit different:
Update Table Set Newest=0 Where Name=@Name
Insert into Table (Name, dateTimeVal, Data, Newest) Values (@Name, GetDate(), @Data, 0); -- NOTE ZERO
Update Table Set Newest=1 Where Name=@Name And dateTimeVal=(Select Max(dateTimeVal) From Table Where Name=@Name);
The rest stays the same.
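As a rough illustration of the "newest" flag pattern for the in-date-order case, here is a minimal sketch using Python's sqlite3 module in place of a SQL Server stored procedure. The table name Records, the helper store_record, the index name, and the sample data are all assumptions. The two statements run in one transaction so that readers never see a name with no newest row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Records (Name TEXT, dateTimeVal TEXT, Data TEXT, Newest INTEGER)"
)
# Index on Name and Newest, as described above, to keep the lookups cheap.
conn.execute("CREATE INDEX ix_records_name_newest ON Records (Name, Newest)")


def store_record(name, date_time_val, data):
    """Hypothetical helper: clear the old flag, then insert the new row as newest."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("UPDATE Records SET Newest = 0 WHERE Name = ?", (name,))
        conn.execute(
            "INSERT INTO Records (Name, dateTimeVal, Data, Newest) VALUES (?, ?, ?, 1)",
            (name, date_time_val, data),
        )


store_record("ann", "2024-01-01", "a1")
store_record("bob", "2024-01-02", "b1")
store_record("ann", "2024-01-03", "a2")

# One row per name, without any aggregate or window function at read time.
newest = conn.execute(
    "SELECT Name, dateTimeVal, Data FROM Records WHERE Newest = 1 ORDER BY Name"
).fetchall()
print(newest)
```

The trade-off is an extra UPDATE on every write in exchange for a trivial read query, which tends to pay off when reads far outnumber writes.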
In MySQL and PostgreSQL:
SELECT name, dateTime, data
FROM Record
ORDER BY
dateTime DESC
LIMIT 1
In SQL Server:
SELECT TOP 1 name, dateTime, data
FROM Record
ORDER BY
dateTime DESC
In Oracle
SELECT *
FROM (
SELECT name, dateTime, data
FROM Record
ORDER BY
dateTime DESC
)
WHERE rownum = 1
Update:
To select one record for each person, in SQL Server, use this:
WITH q AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY person ORDER BY dateTime DESC) AS rn
FROM Record
)
SELECT *
FROM q
WHERE rn = 1
or this:
SELECT ro.*
FROM (
SELECT DISTINCT person
FROM Record
) d
CROSS APPLY
(
SELECT TOP 1 *
FROM Record r
WHERE r.person = d.person
ORDER BY
dateTime DESC
) ro
See this article in my blog:
SQL Server: Selecting records holding group-wise maximum
for benefits and drawbacks of both solutions.
I tried Milky's advice but all three ways of constructing subquery resulted in HQL parser errors.
What does work though, is a slight change to the first method (added extra parentheses).
SELECT name, dateTime, data
FROM Record
WHERE dateTime = (SELECT MAX(dateTime) FROM Record)
PS: This is just for pointing out the obvious to HQL newbies and the like. Thought it would help.