I have a SQL query that I'm trying to debug. It works fine for small data sets, but on large data sets this particular part of it takes 45-50 seconds instead of running sub-second. The subquery below is one of the select items in a larger query. I'm basically trying to find the earliest work date that falls in the same category as the current row we are looking at (from table dr):
ISNULL(CONVERT(varchar(25), (SELECT MIN(drsd.DateWorked) FROM [TableName] drsd
    WHERE drsd.UserID = dr.UserID
        AND drsd.Val1 = dr.Val1
        OR (((drsd.Val2 = dr.Val2 AND LEN(dr.Val2) > 0) AND (drsd.Val3 = dr.Val3 AND LEN(dr.Val3) > 0) AND (drsd.Val4 = dr.Val4 AND LEN(dr.Val4) > 0))
            OR (drsd.Val5 = dr.Val5 AND LEN(dr.Val5) > 0)
            OR ((drsd.Val6 = dr.Val6 AND LEN(dr.Val6) > 0) AND (drsd.Val7 = dr.Val7 AND LEN(dr.Val7) > 0))))), '') AS WorkStartDate,
This winds up executing a key lookup some 18 million times on a table that has 346,000 records. I've tried creating an index for it, but haven't had any success. Also, selecting a MAX value in this same query runs sub-second, as it doesn't have to execute very many times at all.
Any suggestions of a different approach to try? Thanks!
Create a composite index on drsd (UserID, DateWorked).
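In T-SQL, that would be something like this (a sketch; the index name is a placeholder, and [TableName] is the table aliased as drsd in the question):

CREATE NONCLUSTERED INDEX IX_drsd_UserID_DateWorked
    ON [TableName] (UserID, DateWorked);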
It is also possible that the record distribution in drsd is skewed towards the greater dates, like this:
DateWorked Condition
01.01.2001 FALSE
02.01.2001 FALSE
…
18.04.2010 FALSE
19.04.2010 TRUE
In this case, the MAX query only needs to read 1 record, while the MIN query has to scan all records from 2001 onward.
In this case, you'll need to create four separate indexes:
UserId, Val1, DateWorked
UserId, Val2, Val3, Val4, DateWorked
UserId, Val5, DateWorked
UserId, Val6, Val7, DateWorked
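As DDL, these would look something like this (a sketch; the index names are placeholders, and [TableName] is the table aliased as drsd above):

CREATE NONCLUSTERED INDEX IX_drsd_User_Val1 ON [TableName] (UserID, Val1, DateWorked);
CREATE NONCLUSTERED INDEX IX_drsd_User_Val234 ON [TableName] (UserID, Val2, Val3, Val4, DateWorked);
CREATE NONCLUSTERED INDEX IX_drsd_User_Val5 ON [TableName] (UserID, Val5, DateWorked);
CREATE NONCLUSTERED INDEX IX_drsd_User_Val67 ON [TableName] (UserID, Val6, Val7, DateWorked);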
Then rewrite the subquery:
SELECT MIN(DateWorked)
FROM (
    SELECT MIN(drsd.DateWorked) AS DateWorked
    FROM [TableName] drsd
    WHERE drsd.UserID = dr.UserID
        AND drsd.Val1 = dr.Val1
    UNION ALL
    SELECT MIN(drsd.DateWorked)
    FROM [TableName] drsd
    WHERE drsd.UserID = dr.UserID
        AND drsd.Val2 = dr.Val2 AND LEN(dr.Val2) > 0
        AND drsd.Val3 = dr.Val3 AND LEN(dr.Val3) > 0
        AND drsd.Val4 = dr.Val4 AND LEN(dr.Val4) > 0
    UNION ALL
    SELECT MIN(drsd.DateWorked)
    FROM [TableName] drsd
    WHERE drsd.UserID = dr.UserID
        AND drsd.Val5 = dr.Val5 AND LEN(dr.Val5) > 0
    UNION ALL
    SELECT MIN(drsd.DateWorked)
    FROM [TableName] drsd
    WHERE drsd.UserID = dr.UserID
        AND drsd.Val6 = dr.Val6 AND LEN(dr.Val6) > 0
        AND drsd.Val7 = dr.Val7 AND LEN(dr.Val7) > 0
) q
Each query will use its own index, and the final query will just select the minimum of the four values (which is instant).
I guess my first question was not really clear, so here it is again.
I wrote a query that does not select the last five rows in my dataset:
with query AS
(select ROW_NUMBER() over (PARTITION BY SchemaName, TableName order by StartTime desc) AS num, *
from logging.ImportLogging_MSH)
Select *
from query
where num > 5 and SchemaName = 'opps' and TableName = 'vs4_status_history_address'
I want to extend my query so that if any of the last five rows has a NULL value in EndTime, the query skips that row as well. However, I do not want to remove all NULL values in EndTime.
My idea was to create an integer variable that gets incremented whenever EndTime has a NULL value; e.g. in Java it would usually be like this:
int x = 5;
if (EndTime == null) {
x++;
}
So I tried to add this to my query, which is not correct:
declare @n INT = 5;
with query AS
(select ROW_NUMBER() over (PARTITION BY SchemaName, TableName order by StartTime desc) AS num, *
from logging.ImportLogging_MSH)
Select *,
case when EndTime is null then @n = @n + 1 end
from query
where num > @n and SchemaName = 'opps' and TableName = 'vs4_status_history_address'
It's an assignment from my new student job and I don't want to ask around a lot on my first day so if you guys have any suggestions it'll help me a lot :)
Here are both queries, with and without the last 5 rows. The first gives the desired result, but the problem with this approach is that all NULL values in EndTime are removed, which I should not do:
with query AS
(select ROW_NUMBER() over (PARTITION BY SchemaName, TableName order by StartTime desc) AS num, *
from logging.ImportLogging_MSH
where EndTime is not null)
Select *
from query
where num > 5 and SchemaName = 'opps' and TableName = 'vs4_status_history_address'
select *
from logging.ImportLogging_MSH
where SchemaName = 'opps' and TableName = 'vs4_status_history_address'
In the result, the last four rows that contain NULL values in EndTime are removed, and then the next last 5 rows are not selected.
I'm trying to write this same query but without removing all NULL values from EndTime.
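For what it's worth, T-SQL can't increment a variable per row inside a SELECT the way the Java snippet does, but the same counter idea can be expressed with a windowed running count (a sketch, assuming SQL Server 2012+ for the windowed COUNT with a frame; null_count is a made-up name):

declare @n INT = 5;
with query AS
(select ROW_NUMBER() over (PARTITION BY SchemaName, TableName order by StartTime desc) AS num,
        -- running count of NULL EndTime values in this row and all newer rows
        COUNT(case when EndTime is null then 1 end)
            over (PARTITION BY SchemaName, TableName order by StartTime desc
                  ROWS UNBOUNDED PRECEDING) AS null_count,
        *
 from logging.ImportLogging_MSH)
select *
from query
-- widen the cutoff by one for every NULL row among the newest rows,
-- so those rows are skipped without filtering NULLs out of the whole set
where num > @n + null_count
  and SchemaName = 'opps' and TableName = 'vs4_status_history_address'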
(Screenshot: the result I'm getting when running Steve's query.)
I am not sure how to solve this problem through an SQL query.
Imagine you have a table like this one:
user_id   timestamp    quantity1   quantity2
A         2021/01/10   10          0
B         2021/01/17   10          0
A         2021/01/19   1           12
B         2021/01/25   10          8
A         2021/01/27   2           8
Now I want to aggregate by performing an ordered difference between quantity1 and quantity2.
So by a simple group by and sum I would have this result:
user_id   result
A         -7
B         12
However what I want is that whenever the next intermediary sum is smaller than 0, then that sum is set to 0 before aggregating the next quantity.
In this case I would like to keep B = 12, but A instead should be:
First aggregation: (10-0)
Second aggregation: 10 + (1-12) = -1
Now since this is a negative result, it should be set to 0.
Third aggregation: 0 + (2-8) = -6
And if there was another entry, a negative previous aggregation should always start from 0.
I hope that was clear. Does anybody know how to do this in SQL?
Consider the approach below:
declare result, delta, count int64;
declare id, prev_id string;

-- empty results table with the desired schema
create temp table results as (
  select * from (select '' as user_id, 0 as result) where false
);
-- sample data standing in for the real table
create temp table your_table as (
  select 'A' user_id, '2021/01/10' timestamp, 10 quantity1, 0 quantity2 union all
  select 'B', '2021/01/17', 10, 0 union all
  select 'A', '2021/01/19', 1, 12 union all
  select 'B', '2021/01/25', 10, 8 union all
  select 'A', '2021/01/27', 2, 8
);

set (count, result, delta, prev_id) = (0, 0, 0, '');
for record in (select * from your_table order by user_id, timestamp) do
  -- flush the running total whenever we move on to a new user
  if record.user_id != prev_id and prev_id != '' then
    insert into results values(prev_id, result + delta);
    set result = 0;
  end if;
  set prev_id = record.user_id;
  set result = result + record.quantity1 - record.quantity2;
  -- clamp a negative running total to 0, remembering the overshoot in delta
  if result < 0 then set (result, delta) = (0, result); else set delta = 0; end if;
end for;
insert into results values(prev_id, result + delta);

select * from results;
The output (in temp table results) is A = -6 and B = 12, matching the expected result above.
Obviously, instead of creating the temp table your_table, you should just use your real table.
Also, note the use of the recently introduced FOR...IN loop.
The query below accesses the Votes table, which contains over 30 million rows. The result set is then filtered using WHERE n = 1. In the query plan, the SORT operation for the ROW_NUMBER() windowed function is 95% of the query's cost, and the query takes over 6 minutes to execute.
I already have an index on (same_voter, eid, country) INCLUDE (vid, nid, sid, vote, time_stamp, new) to cover the WHERE clause.
Is the most efficient fix to add an index on (vid, nid, sid, new DESC, time_stamp DESC), or is there an alternative to the ROW_NUMBER() function that achieves the same results more efficiently?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
ROW_NUMBER() OVER (
PARTITION BY v.vid, v.nid, v.sid ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
AND v.eid <= @EId
AND v.eid > (@EId - 5)
AND v.country = @Country
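Spelled out as DDL, the index being considered would be something like this (a sketch; the index name is a placeholder and the INCLUDE list is an assumption, chosen to cover the remaining columns):

CREATE NONCLUSTERED INDEX IX_Votes_vid_nid_sid
    ON dbo.Votes (vid, nid, sid, new DESC, time_stamp DESC)
    INCLUDE (vote, eid, same_voter, country);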
One possible alternative to using ROW_NUMBER():
SELECT
V.vid,
V.nid,
V.sid,
V.vote,
V.time_stamp,
V.new,
V.eid
FROM
dbo.Votes V
LEFT OUTER JOIN dbo.Votes V2 ON
V2.vid = V.vid AND
V2.nid = V.nid AND
V2.sid = V.sid AND
V2.same_voter <> 1 AND
V2.eid <= @EId AND
V2.eid > (@EId - 5) AND
V2.country = @Country AND
(V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
V.same_voter <> 1 AND
V.eid <= @EId AND
V.eid > (@EId - 5) AND
V.country = @Country AND
V2.vid IS NULL
The query basically says: get all rows matching your criteria, then join to any other rows that match the same criteria but would be ranked higher for the partition based on the new and time_stamp columns. If no such row is found, then this must be the highest-ranked row you want, and V2.vid will be NULL. I'm assuming that vid can never otherwise be NULL; if it's a NULLable column in your table, you'll need to adjust that last line of the query.
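The same anti-join can also be written with NOT EXISTS, which SQL Server typically plans the same way and which avoids the NULLable-column caveat entirely (a sketch of the equivalent form):

SELECT V.vid, V.nid, V.sid, V.vote, V.time_stamp, V.new, V.eid
FROM dbo.Votes V
WHERE V.same_voter <> 1
  AND V.eid <= @EId
  AND V.eid > (@EId - 5)
  AND V.country = @Country
  AND NOT EXISTS (
      SELECT 1
      FROM dbo.Votes V2
      WHERE V2.vid = V.vid
        AND V2.nid = V.nid
        AND V2.sid = V.sid
        AND V2.same_voter <> 1
        AND V2.eid <= @EId
        AND V2.eid > (@EId - 5)
        AND V2.country = @Country
        -- a row that would rank higher within the same partition
        AND (V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
  )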
I have a list, and the returned table looks like this. The preview shows only one car, but there are many more.
What I need to do now is check that the current KM value is larger than the previous one and smaller than the next one. If this is not the case, I need to make a field called Trustworthy and fill it with either 1 or 0 (true/false).
The result that I have so far is this:
validKMstand and validkmstand2 are how I calculate it. It did not work in one list, so that is why I separated them. Neither of my tries works, though.
Here is the code that I have so far.
WITH FullList as (
SELECT
*
FROM
eMK_Mileage as Mileage
)
, ValidChecked1 as (
SELECT
UL1.*,
CASE WHEN EXISTS(
SELECT TOP(1) UL2.*
FROM FullList AS UL2
WHERE
UL2.FK_CarID = UL1.FK_CarID AND
UL1.KM_Date > UL2.KM_Date AND
UL1.KM > UL2.KM
ORDER BY UL2.KM_Date DESC
)
THEN 1
ELSE 0
END AS validkmstand
FROM FullList as UL1
)
, ValidChecked2 as (
SELECT
List1.*,
(CASE WHEN List1.KM > ulprev.KM
THEN 1
ELSE 0
END
) AS validkmstand2
FROM ValidChecked1 as List1 outer apply
(SELECT TOP(1) UL3.*
FROM ValidChecked1 AS UL3
WHERE
UL3.FK_CarID = List1.FK_CarID AND
UL3.KM_Date <= List1.KM_Date AND
List1.KM > UL3.KM
ORDER BY UL3.KM_Date DESC) ulprev
)
SELECT * FROM ValidChecked2 order by FK_CarID, KM_Date
Maybe something like this is what you are looking for?
;with data as
(
select *, rn = row_number() over (partition by fk_carid order by km_date)
from eMK_Mileage
)
select
d.FK_CarID, d.KM, d.KM_Date,
valid =
case
when (d.KM > d_prev.KM /* or d_prev.KM is null */)
and (d.KM < d_next.KM /* or d_next.KM is null */)
then 1 else 0
end
from data d
left join data d_prev on d.FK_CarID = d_prev.FK_CarID and d_prev.rn = d.rn - 1
left join data d_next on d.FK_CarID = d_next.FK_CarID and d_next.rn = d.rn + 1
order by d.FK_CarID, d.KM_Date
With SQL Server versions 2012+ you could use the lag() and lead() analytical functions to access the previous/next rows, but in earlier versions you can accomplish the same thing by numbering rows within partitions of the set, as done above. There are other ways too, like correlated subqueries.
I left a couple of conditions commented out that deal with the first and last rows for every car; maybe those should be considered valid if they fulfill only one part of the comparison (since the previous/next row is NULL)?
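For completeness, on SQL Server 2012+ the lag()/lead() version would look something like this (a sketch; same validity rule, with the same NULL caveat at the first and last row of each car):

select
    m.FK_CarID, m.KM, m.KM_Date,
    valid =
        case
            when m.KM > lag(m.KM) over (partition by m.FK_CarID order by m.KM_Date)
             and m.KM < lead(m.KM) over (partition by m.FK_CarID order by m.KM_Date)
            then 1 else 0
        end
from eMK_Mileage m
order by m.FK_CarID, m.KM_Date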
I have a monstrous query in Oracle SQL. My problem is that I need to take the percentage between two related queries.
So, what I'm doing is:
SELECT type*100/decode(total, 0, 1, total) as result
from (SELECT
(select count(*) from tb1, tb2, tb3
where tb1.fieldA = tb2.fieldB
and tb2.fieldC = tb3.fieldD
and tb3.fieldE = 'Some stuf') as type,
(select count(*) from tb1, tb2, tb3
where tb1.fieldA = tb2.fieldB
and tb2.fieldC = tb3.fieldD) as total from dual) auxTable;
As you can see, my variable called type is a subset of the total variable. This is a simplified example of a much bigger problem.
Is there any efficient way of selecting the subset (type) from the total and then getting the percentage?
Yes, there is a more efficient way of doing this:
SELECT type*100/DECODE(total, 0, 1, total) FROM (
SELECT COUNT(*) AS total, SUM(DECODE(tb3.fieldE, 'Some stuf', 1, 0)) AS type
FROM tb1, tb2, tb3
WHERE tb1.fieldA = tb2.fieldB
AND tb2.fieldC = tb3.fieldD
);
This SUM(DECODE(tb3.fieldE, 'Some stuf', 1, 0)) will get a count of all records for which tb3.fieldE = 'Some stuf'. Alternatively, you can use:
COUNT(CASE WHEN tb3.fieldE = 'Some stuf' THEN 1 END) AS type
In that case, the CASE expression returns NULL when fieldE is not the chosen value, and NULLs are not counted by COUNT().
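Putting that CASE variant into the full query gives the same shape as before (only the type expression changes):

SELECT type*100/DECODE(total, 0, 1, total) FROM (
  SELECT COUNT(*) AS total,
         COUNT(CASE WHEN tb3.fieldE = 'Some stuf' THEN 1 END) AS type
  FROM tb1, tb2, tb3
  WHERE tb1.fieldA = tb2.fieldB
    AND tb2.fieldC = tb3.fieldD
);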