SQL-query help needed - sql

I guess my first question was not really clear, so here it is again.
I wrote a query that does not select the last five rows in my dataset:
with query AS
(select ROW_NUMBER() over (PARTITION BY SchemaName, TableName order by StartTime desc) AS num, *
from logging.ImportLogging_MSH)
Select *
from query
where num > 5 and SchemaName = 'opps' and TableName = 'vs4_status_history_address'
I want to extend my query so that, if any of the last five rows has a NULL value in EndTime, the query skips that row and it does not count towards the five. I do not, however, want to remove all NULL values in EndTime.
My idea was to create an integer variable that gets incremented if EndTime has a null value
e.g. in java it'll usually be like this:
int x = 5;
if (EndTime == null) {
x++;
}
so I tried to add this to my query which is not correct:
declare @n INT = 5
with query AS
(select ROW_NUMBER() over (PARTITION BY SchemaName, TableName order by StartTime desc) AS num, *
from logging.ImportLogging_MSH)
Select *,
case when EndTime is null then @n = @n + 1 end
from query
where num > @n and SchemaName = 'opps' and TableName = 'vs4_status_history_address'
It's an assignment from my new student job and I don't want to ask around a lot on my first day so if you guys have any suggestions it'll help me a lot :)
Both queries, with and without the last 5 rows:
The first query gives the result I'm trying to get, but the problem with this approach is that all NULL values in EndTime are removed, which I should not do.
with query AS
(select ROW_NUMBER() over (PARTITION BY SchemaName, TableName order by StartTime desc) AS num, *
from logging.ImportLogging_MSH
where EndTime is not null)
Select *
from query
where num > 5 and SchemaName = 'opps' and TableName = 'vs4_status_history_address'
select *
from logging.ImportLogging_MSH
where SchemaName = 'opps' and TableName = 'vs4_status_history_address'
As you can see here, the last four rows that contain NULL values in EndTime are removed and the next last 5 rows are not selected.
I'm trying to write this same query but without removing all NULL values from EndTime.
The answer I'm getting when writing Steve's query
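One possible reading of the requirement: walking from the newest row backwards, skip rows until five rows with a non-NULL EndTime have been passed, then keep everything older, NULL or not. Instead of a counter variable, the sketch below counts the non-NULL EndTime values in the rows that come before each row. It runs on an in-memory SQLite table as a stand-in for SQL Server (an assumption, as are the made-up sample rows). The same OVER clause with a ROWS frame should also be accepted by SQL Server 2012+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ImportLogging_MSH "
    "(SchemaName TEXT, TableName TEXT, StartTime INTEGER, EndTime INTEGER)"
)
# Made-up rows: StartTime 1..10; rows 10, 9 and 3 have no EndTime.
rows = [
    ("opps", "vs4_status_history_address", t, None if t in (10, 9, 3) else t + 1)
    for t in range(1, 11)
]
conn.executemany("INSERT INTO ImportLogging_MSH VALUES (?, ?, ?, ?)", rows)

query = """
WITH query AS (
    SELECT *,
           -- COUNT(EndTime) ignores NULLs, so this is the number of
           -- finished rows that are newer than the current row.
           COUNT(EndTime) OVER (
               PARTITION BY SchemaName, TableName
               ORDER BY StartTime DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS finished_before
    FROM ImportLogging_MSH
)
SELECT StartTime, EndTime
FROM query
WHERE finished_before >= 5
  AND SchemaName = 'opps' AND TableName = 'vs4_status_history_address'
ORDER BY StartTime DESC
"""
kept = conn.execute(query).fetchall()
print(kept)  # [(3, None), (2, 3), (1, 2)]
```

Rows 10 and 9 (NULL) and the five finished rows 8 to 4 are skipped, while the older NULL row 3 is still returned.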

Related

How to get this column result in SQL Server

Looking to try to return the dataset below in SQL Server. I have the 2 columns TrailerPosition & Divider and looking to also return Zone. Zone would be calculated as starting with Zone 1 and then would change to zone 2 on the record after divider = 1. And then to 3 after the next record where Divider = 1. The screenshot below looks like the column I'm trying to return.
any ideas how this can be done in SQL Server?
Test data for the below:
declare @t table (TrailerPosition nvarchar(5),Divider bit);
insert into @t values ('01L',0),('01R',0),('02L',0),('02R',1),('03L',1),('03R',0),('04L',0),('04R',0),('05L',1),('05R',1),('06L',0),('06R',0),('07L',0),('07R',0),('08L',0),('08R',0),('09L',0),('09R',0),('10L',0),('10R',0),('11L',0),('11R',0),('12L',0),('12R',0),('13L',0),('13R',0),('14L',0),('14R',0),('15L',0),('15R',0);
If 2012+, the window functions would be a nice fit here
Select TrailerPosition
,Divider
,Zone = 1+sum(Flag) over (Order By TrailerPosition)
From (
Select *
,Flag = case when Lag(Divider,1) over (Order By TrailerPosition) =1 and Divider=0 then 1 else 0 end
From YourTable
) A
Returns
So, Zone = 1 + the number of rows, up to and including the current one, that have a Divider value of 0 and whose previous row has a Divider value of 1.
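To check that rule against the first rows of the test data, here is a small sketch that runs the LAG plus running-SUM query in an in-memory SQLite database. SQLite is only a stand-in for SQL Server here, so the `Zone = ...` alias style is replaced by `AS`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (TrailerPosition TEXT, Divider INTEGER)")
sample = [("01L", 0), ("01R", 0), ("02L", 0), ("02R", 1), ("03L", 1),
          ("03R", 0), ("04L", 0), ("04R", 0), ("05L", 1), ("05R", 1),
          ("06L", 0), ("06R", 0)]
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", sample)

zones = dict(conn.execute("""
    SELECT TrailerPosition,
           1 + SUM(Flag) OVER (ORDER BY TrailerPosition) AS Zone
    FROM (
        -- Flag marks the first row after a run of Divider = 1 rows.
        SELECT *,
               CASE WHEN LAG(Divider, 1) OVER (ORDER BY TrailerPosition) = 1
                         AND Divider = 0
                    THEN 1 ELSE 0 END AS Flag
        FROM YourTable
    ) AS A
""").fetchall())

print(zones["03L"], zones["03R"], zones["05R"], zones["06L"])  # 1 2 2 3
```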
UPDATED
SELECT TrailerPosition, Divider,
(SELECT COUNT(*)
FROM #MyTable T1
WHERE (T1.TrailerPosition <= t0.TrailerPosition)
AND (T1.Divider = 0)
AND (SELECT Divider
FROM #MyTable t2
WHERE T2.TrailerPosition =
(SELECT MAX(T3.TrailerPosition)
FROM #MyTable T3
WHERE T3.TrailerPosition < T1.TrailerPosition)) = 1) + 1
AS Zone
FROM #MyTable t0

ROW_NUMBER() Query Plan SORT Optimization

The query below accesses the Votes table, which contains over 30 million rows. The outer query then filters its result set using WHERE n = 1. In the query plan, the SORT operation for the ROW_NUMBER() windowed function is 95% of the query's cost, and the query takes over 6 minutes to complete.
I already have an index on same_voter, eid, country include vid, nid, sid, vote, time_stamp, new to cover the where clause.
Is the most efficient way to correct this to add an index on vid, nid, sid, new DESC, time_stamp DESC or is there an alternative to using the ROW_NUMBER() function for this to achieve the same results in a more efficient manner?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
ROW_NUMBER() OVER (
PARTITION BY v.vid, v.nid, v.sid ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
AND v.eid <= @EId
AND v.eid > (@EId - 5)
AND v.country = @Country
One possible alternative to using ROW_NUMBER():
SELECT
V.vid,
V.nid,
V.sid,
V.vote,
V.time_stamp,
V.new,
V.eid
FROM
dbo.Votes V
LEFT OUTER JOIN dbo.Votes V2 ON
V2.vid = V.vid AND
V2.nid = V.nid AND
V2.sid = V.sid AND
V2.same_voter <> 1 AND
V2.eid <= @EId AND
V2.eid > (@EId - 5) AND
V2.country = @Country AND
(V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
V.same_voter <> 1 AND
V.eid <= @EId AND
V.eid > (@EId - 5) AND
V.country = @Country AND
V2.vid IS NULL
The query basically says to get all rows matching your criteria, then join each one to any other rows that match the same criteria but would be ranked higher within the partition based on the new and time_stamp columns. If no such row is found, the current row must be the one that you want (it's ranked highest), and V2.vid will be NULL. I'm assuming that vid otherwise can never be NULL. If it's a NULLable column in your table then you'll need to adjust that last line of the query.
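As a sanity check that the self-join picks the same rows as ROW_NUMBER() with n = 1, here is a cut-down sketch on an in-memory SQLite table. The sample rows are made up, and the same_voter, eid and country filters are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Votes (vid INT, nid INT, sid INT, vote INT, new INT, time_stamp INT)"
)
conn.executemany("INSERT INTO Votes VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 10, 0, 100),
    (1, 1, 1, 20, 1, 90),   # new = 1 outranks the later time_stamp above
    (1, 1, 1, 30, 1, 80),
    (2, 1, 1, 40, 0, 50),
    (2, 1, 1, 50, 0, 60),   # same new, later time_stamp wins
])

# Anti-join: keep a row only if no higher-ranked row exists in its group.
anti_join = conn.execute("""
    SELECT V.vid, V.vote
    FROM Votes V
    LEFT OUTER JOIN Votes V2
      ON V2.vid = V.vid AND V2.nid = V.nid AND V2.sid = V.sid
     AND (V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
    WHERE V2.vid IS NULL
    ORDER BY V.vid
""").fetchall()

# Reference result using ROW_NUMBER(), as in the question.
row_number = conn.execute("""
    SELECT vid, vote FROM (
        SELECT vid, vote,
               ROW_NUMBER() OVER (PARTITION BY vid, nid, sid
                                  ORDER BY new DESC, time_stamp DESC) AS n
        FROM Votes
    ) AS ranked
    WHERE n = 1
    ORDER BY vid
""").fetchall()

print(anti_join)  # [(1, 20), (2, 50)]
```

One difference to keep in mind: if two rows in a group tie on both new and time_stamp, the anti-join returns both of them, while ROW_NUMBER() returns exactly one.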

troubles with next and previous query

I have a list and the returned table looks like this. I took the preview of only one car but there are many more.
What I need to do now is check that the current KM value is larger than the previous one and smaller than the next one. I need to add a field called Trustworthy and fill it with 1 or 0 (true/false) depending on whether that check passes.
The result that I have so far is this:
validKMstand and validkmstand2 are how I calculate it. It did not work in one list so that is why I separated it.
In both of my tries my code does not work.
Here is the code that I have so far.
FullList as (
SELECT
*
FROM
eMK_Mileage as Mileage
)
, ValidChecked1 as (
SELECT
UL1.*,
CASE WHEN EXISTS(
SELECT TOP(1)UL2.*
FROM FullList AS UL2
WHERE
UL2.FK_CarID = UL1.FK_CarID AND
UL1.KM_Date > UL2.KM_Date AND
UL1.KM > UL2.KM
ORDER BY UL2.KM_Date DESC
)
THEN 1
ELSE 0
END AS validkmstand
FROM FullList as UL1
)
, ValidChecked2 as (
SELECT
List1.*,
(CASE WHEN List1.KM > ulprev.KM
THEN 1
ELSE 0
END
) AS validkmstand2
FROM ValidChecked1 as List1 outer apply
(SELECT TOP(1)UL3.*
FROM ValidChecked1 AS UL3
WHERE
UL3.FK_CarID = List1.FK_CarID AND
UL3.KM_Date <= List1.KM_Date AND
List1.KM > UL3.KM
ORDER BY UL3.KM_Date DESC) ulprev
)
SELECT * FROM ValidChecked2 order by FK_CarID, KM_Date
Maybe something like this is what you are looking for?
;with data as
(
select *, rn = row_number() over (partition by fk_carid order by km_date)
from eMK_Mileage
)
select
d.FK_CarID, d.KM, d.KM_Date,
valid =
case
when (d.KM > d_prev.KM /* or d_prev.KM is null */)
and (d.KM < d_next.KM /* or d_next.KM is null */)
then 1 else 0
end
from data d
left join data d_prev on d.FK_CarID = d_prev.FK_CarID and d_prev.rn = d.rn - 1
left join data d_next on d.FK_CarID = d_next.FK_CarID and d_next.rn = d.rn + 1
order by d.FK_CarID, d.KM_Date
With SQL Server versions 2012+ you could have used the lag() and lead() analytical functions to access the previous/next rows, but in versions before you can accomplish the same thing by numbering rows within partitions of the set. There are other ways too, like using correlated subqueries.
I left a couple of conditions commented out that deal with the first and last rows for every car. Maybe those should be considered valid if they fulfill only one part of the comparison (since the previous/next rows are null)?
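For comparison, the lag()/lead() variant mentioned above could look roughly like this. It is a sketch on an in-memory SQLite table with made-up readings, not tested against SQL Server. Like the query above with the NULL checks left commented out, it marks the first and last reading of each car as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eMK_Mileage (FK_CarID INT, KM INT, KM_Date TEXT)")
conn.executemany("INSERT INTO eMK_Mileage VALUES (?, ?, ?)", [
    (1, 1000, "2015-01-01"),
    (1, 2000, "2015-02-01"),
    (1, 9000, "2015-03-01"),   # higher than the next reading
    (1, 4000, "2015-04-01"),   # lower than the previous reading
    (1, 5000, "2015-05-01"),
    (1, 6000, "2015-06-01"),
])

rows = conn.execute("""
    SELECT KM,
           CASE WHEN KM > LAG(KM)  OVER (PARTITION BY FK_CarID ORDER BY KM_Date)
                 AND KM < LEAD(KM) OVER (PARTITION BY FK_CarID ORDER BY KM_Date)
                THEN 1 ELSE 0 END AS valid
    FROM eMK_Mileage
    ORDER BY FK_CarID, KM_Date
""").fetchall()

print(rows)
# [(1000, 0), (2000, 1), (9000, 0), (4000, 0), (5000, 1), (6000, 0)]
```

Note that both neighbours of a bad reading can end up flagged: here 4000 is marked 0 only because the 9000 before it is too high.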

Make a single column value from multiple rows column

I am facing a rather isolated problem with a dynamic SQL query. I have two queries running in a single stored procedure. They are as follows.
First query:
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY viwPerformance.LastModifiedOn DESC) AS rowNumber,viwPerformance.* FROM viwPerformance WHERE OrgId=218 AND EmployeeId = 1668 AND IsTerminate = 0 AND TagId LIKE '%' + CAST(2893 AS VARCHAR) + '%' AND Archive='False' AND SmartGoalId IS NOT NULL
) AS E
WHERE rowNumber >= 1 AND
rowNumber < 11
It returns all the column values, with the SmartGoalId values as
4471,2815,4751,4733,4863,4690,4691,4692,4693,4694
And the second query (here I need only SmartGoalId from the above query, so I use STUFF):
SELECT @strGoalIds = STUFF((SELECT ',' + CAST(SmartGoalId AS VARCHAR)
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY viwPerformance.LastModifiedOn DESC) AS rowNumber,viwPerformance.* FROM viwPerformance WHERE OrgId=218 AND EmployeeId = 1668 AND IsTerminate = 0 AND TagId LIKE '%' + CAST(2893 AS VARCHAR) + '%' AND Archive='False' AND SmartGoalId IS NOT NULL
) AS E
WHERE rowNumber >= 1 AND
rowNumber < 11 FOR XML PATH('')), 1, 1, '')
and it returns the SmartGoalId values as
4471,2815,4751,4733,4863,4651,4690,4691,4692,4693
Please note that the last id, "4694", is missing from the second result, while "4651" has been added to it. "4651" is not in the first query's result, and it should not be in the second query's result either.
So my main question is: why does the second query give different results when it is the same as the first query?
Note: Am I right that the STUFF function is reversing the values and not returning them in the correct order?
Because you have some rows with the same value for LastModifiedOn, the result depends on how ties are handled, and nothing guarantees that both queries break them the same way.
If you want this query to always return the 10 most "recent" rows and always return the same ones when there are ties, you can add another column to your ORDER BY viwPerformance.LastModifiedOn DESC clause that will make the sort unique and unchanging, like:
ORDER BY viwPerformance.LastModifiedOn DESC, viwPerformance.SmartGoalId DESC)
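A small sketch of the effect, using an in-memory SQLite table named after the view (the dates are made up, and SQLite is only a stand-in for SQL Server). The same rows are loaded in two different physical orders, and with the tie-breaker in place both runs return the same ids.

```python
import sqlite3

def top_ids(rows, limit=3):
    """Return the first `limit` SmartGoalIds using a fully determined sort."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE viwPerformance (SmartGoalId INT, LastModifiedOn TEXT)")
    conn.executemany("INSERT INTO viwPerformance VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT SmartGoalId FROM (
            SELECT SmartGoalId,
                   ROW_NUMBER() OVER (ORDER BY LastModifiedOn DESC,
                                               SmartGoalId DESC) AS rowNumber
            FROM viwPerformance
        ) AS E
        WHERE rowNumber >= 1 AND rowNumber < ?
        ORDER BY rowNumber
    """, (limit + 1,)).fetchall()
    conn.close()
    return [r[0] for r in result]

# 4651 and 4694 share the same LastModifiedOn value (a tie).
rows = [(4651, "2016-05-02"), (4694, "2016-05-02"),
        (4693, "2016-05-03"), (4692, "2016-05-04")]

first = top_ids(rows)
second = top_ids(list(reversed(rows)))
print(first, second)  # [4692, 4693, 4694] [4692, 4693, 4694]
```

Without SmartGoalId in the ORDER BY, either of the two tied ids could legitimately take the last slot.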

How to find the average time difference between rows in a table?

I have a mysql database that stores some timestamps. Let's assume that all there is in the table is the ID and the timestamp. The timestamps might be duplicated.
I want to find the average time difference between consecutive rows that are not duplicates (timewise). Is there a way to do it in SQL?
If your table is t, and your timestamp column is ts, and you want the answer in seconds:
SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts) )
/
(COUNT(DISTINCT(ts)) -1)
FROM t
This will be miles quicker for large tables as it has no n-squared JOIN.
This uses a cute mathematical trick which helps with this problem. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows - 1.
Proof: The average distance between consecutive rows is the sum of the distances between consecutive rows, divided by the number of consecutive pairs. But the sum of the differences between consecutive rows is just the distance between the first row and the last row (assuming they are sorted by timestamp), because all the intermediate timestamps cancel out. And the number of consecutive pairs is the total number of rows - 1.
Then we just restrict the timestamps to be distinct, which is what COUNT(DISTINCT ts) does.
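A quick numeric check of that argument, as a sketch: the timestamps are stored as plain integer seconds in an in-memory SQLite table (an assumption, since SQLite has no TIMESTAMPDIFF), and the shortcut is compared with a brute-force average over consecutive distinct values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, ts INTEGER)")
# Made-up timestamps in seconds; 130 appears twice.
conn.executemany("INSERT INTO t (ts) VALUES (?)",
                 [(100,), (130,), (130,), (190,), (400,)])

# Shortcut: (last - first) / (number of distinct timestamps - 1).
(shortcut,) = conn.execute(
    "SELECT (MAX(ts) - MIN(ts)) * 1.0 / (COUNT(DISTINCT ts) - 1) FROM t"
).fetchone()

# Brute force: average the gaps between consecutive distinct timestamps.
distinct = [row[0] for row in conn.execute("SELECT DISTINCT ts FROM t ORDER BY ts")]
gaps = [later - earlier for earlier, later in zip(distinct, distinct[1:])]
brute_force = sum(gaps) / len(gaps)

print(shortcut, brute_force)  # 100.0 100.0
```

If the table holds only one distinct timestamp, the divisor is zero, so the query needs a guard for that case.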
Are the ID's contiguous ?
You could do something like,
SELECT
a.ID
, b.ID
, a.Timestamp
, b.Timestamp
, a.Timestamp - b.Timestamp as Difference
FROM
MyTable a
JOIN MyTable b
ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp
That'll give you a list of time differences on each consecutive row pair...
Then you could wrap that up in an AVG grouping...
Here's one way:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on cur.id = prev.id + 1
and cur.datecol <> prev.datecol
The timestampdiff function allows you to choose between days, months, seconds, and so on.
If the id's are not consecutive, you can select the previous row by adding a rule that there are no other rows in between:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on prev.datecol < cur.datecol
and not exists (
select *
from table inbetween
where prev.datecol < inbetween.datecol
and inbetween.datecol < cur.datecol)
Old post, but the easiest way is to use the LAG function (MySQL 8.0+) together with TIMESTAMPDIFF:
SELECT
id,
TIMESTAMPDIFF(MINUTE, PREVIOUS_TIMESTAMP, TIMESTAMP) AS TIME_DIFF_IN_MINUTES
FROM (
SELECT
id,
TIMESTAMP,
LAG(TIMESTAMP, 1) OVER (ORDER BY TIMESTAMP) AS PREVIOUS_TIMESTAMP
FROM TABLE_NAME
) AS t
Adapted for SQL Server from this discussion.
Essential columns used are:
cmis_load_date: A date/time stamp associated with each record.
extract_file: The full path to a file from which the record was loaded.
Comments:
There can be many records in each file. Records have to be grouped by the files loaded on the extract_file column. Intervals of days may pass between one file and the next being loaded. There is no reliable sequential value in any column, so the grouped rows are sorted by the minimum load date in each file group, and the ROW_NUMBER() function then serves as an ad hoc sequential value.
SELECT
AVG(DATEDIFF(day, t2.MinCMISLoadDate, t1.MinCMISLoadDate)) as ElapsedAvg
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t1
LEFT JOIN
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t2 on t2.RowNumber + 1 = t1.RowNumber