Unknown processing time of SQL statement

I have a query like this:
SELECT
    a.LeaseNo, RIGHT(a.LeaseNo, 1) AS idx
FROM
    Leases a
LEFT OUTER JOIN
    DebitNoteItems b ON a.LeaseNo = b.LeaseNo
WHERE
    status = 'A'
    AND PortfolioType = 'R'
    AND b.NoteItemID IS NULL
The result set comes back in less than 1 second.
Then I tried to find the records with idx = 0, so I wrote:
SELECT LeaseNo
FROM
    (SELECT
         a.LeaseNo, RIGHT(a.LeaseNo, 1) AS idx
     FROM
         Leases a
     LEFT OUTER JOIN
         DebitNoteItems b ON a.LeaseNo = b.LeaseNo
     WHERE
         status = 'A'
         AND PortfolioType = 'R'
         AND b.NoteItemID IS NULL) tmp
WHERE
    tmp.idx = '0'
However, this query is very slow.
Then I tried:
SELECT LeaseNo
FROM
    (SELECT
         a.LeaseNo, RIGHT(a.LeaseNo, 1) AS idx
     FROM
         Leases a
     LEFT OUTER JOIN
         DebitNoteItems b ON a.LeaseNo = b.LeaseNo
     WHERE
         status = 'A'
         AND PortfolioType = 'R'
         AND b.NoteItemID IS NULL) tmp
WHERE
    tmp.idx LIKE '%0%'
This one is executed in less than 1 second.
I want to know why the second query is so much faster than the first when the only difference is one simple condition (= '0' vs. LIKE '%0%'). And it is not a matter of a few seconds: the LIKE version finishes in under a second, while the = version runs for more than a minute (in fact it never finished; I terminated it manually, and it does not look like it is merely comparing idx in the queried tmp table).
Is there something wrong or inappropriate in the query?
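One thing worth trying is to skip the derived table and apply the filter directly, so the idx comparison is not pushed into the join in a surprising way by the optimizer. This is only a sketch of the same query under that assumption; whether it actually helps depends on the execution plan:
SELECT a.LeaseNo
FROM Leases a
LEFT OUTER JOIN DebitNoteItems b ON a.LeaseNo = b.LeaseNo
WHERE status = 'A'
  AND PortfolioType = 'R'
  AND b.NoteItemID IS NULL
  AND RIGHT(a.LeaseNo, 1) = '0'  -- inline the idx filter instead of wrapping it in tmp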

Related

Match specific or default value on multiple columns

raw_data:

name    account_id  type   element_id  cost
First   1           type1  element1    0.1
Second  2           type2  element2    0.2
First   11          type2  element11   0.11
components (defaults: account_id = -1, type = null, element_id = null):

name    account_id  type   element_id  cost
First   -1          null   null        0.1
Second  2           type2  null        0.2
First   11          type2  element11   0.11
I seek to check whether the cost logged in raw_data is the same as that in components for a given combination. They need to be joined on column name.
Remaining fields in raw_data are always populated. In components, any row can be a combination of specific values and the default values.
I seek to match the columns from raw_data to components wherever I find a match and otherwise need to use the default value to get the cost.
I have tried LEFT JOIN, UNION, and IN without success.
E.g. for the first row in the raw_data table with name "First", there is no account_id = 1 in the components table, so I need to fall back to account_id = -1.
Match as many specific values as can be found in components; otherwise resort to the default values.
I think one way you could do this is something like:
SELECT *
FROM
(
    SELECT rd.name, rd.account_id, rd.type, rd.element_id, rd.cost AS raw_cost,
           c.account_id AS component_account_id, c.type AS component_type,
           c.element_id AS component_element_id, c.cost AS component_cost,
           row_number() OVER (PARTITION BY rd.name, rd.account_id, rd.type, rd.element_id
               ORDER BY
                   -- one point per specifically matched column; ELSE 0 keeps the sum from going NULL
                     CASE WHEN c.account_id <> -1 THEN 1 ELSE 0 END
                   + CASE WHEN c.type IS NOT NULL THEN 1 ELSE 0 END
                   + CASE WHEN c.element_id IS NOT NULL THEN 1 ELSE 0 END DESC) AS rn
    FROM raw_data rd LEFT OUTER JOIN components c
        ON rd.name = c.name
        AND (rd.account_id = c.account_id OR c.account_id = -1)
        AND (rd.type = c.type OR c.type IS NULL)
        AND (rd.element_id = c.element_id OR c.element_id IS NULL)
) iq
WHERE rn = 1
The idea here is to match on an actual match or the default. Then the row_number window function is used to prioritize the matches based on a count of how many columns actually matched (you said you don't care about ties, so this doesn't handle that). The outer query throws away the matches that aren't the best.
With the sample data above, this could be an inner join, but I left it as a left join since that's what was mentioned.
Here's a fiddle of it working. Hopefully this is close to what you want.
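Traced against the sample data above: (First, 1, type1, element1) matches only the all-defaults row, so it gets cost 0.1; (Second, 2, type2, element2) scores 2 against the (2, type2, null) row, cost 0.2; and (First, 11, type2, element11) scores 3 against its fully specific row, cost 0.11.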
If the records in the two tables are not related 1-to-1, an unambiguous selection cannot be made.
Likewise, if the selection condition is "at least one parameter coincides", an unambiguous selection will not be possible either.
Below is an example that should bring you closer to solving the problem. It selects data that matches one of the selection criteria; however, there may be duplicates!
Try this variant and report the result, ideally on a larger set of samples and records.
Maybe this will get you closer to the solution.
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
left join components c on rd.name = c.name
where (c.account_id = rd.account_id or c.account_id = -1) or
      (c.type = rd.type or c.type is null) or
      (c.element_id = rd.element_id or c.element_id is null)
You can build the priority of checking values through UNION:
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
left join components c on rd.name = c.name
where c.account_id = rd.account_id and
      c.type = rd.type and
      c.element_id = rd.element_id
union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
left join components c on rd.name = c.name
where c.account_id = rd.account_id and
      c.type = rd.type
union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
join components c on rd.name = c.name
where c.account_id = rd.account_id
etc.
Without seeing the whole problem and all the possible data in the tables, it is difficult to give the right solution; there may not even be one...

SQL CASE statement only updates one value from join table

I am trying to use a CASE statement to update multiple columns, but only the first value from my join table AS 'b' is getting picked up. Below I have EDLCode = 1 and this value gets picked up. If I set EDLCode = 2, it does not update anything. I can separate the updates into their own blocks and those update correctly, but I was hoping to combine them and make this code a little more concise. Any ideas?
UPDATE mainTable
SET
col_one = CASE WHEN EDLCode = 1 THEN b.Amount END,
col_two = CASE WHEN EDLCode = 2 THEN b.Amount END
FROM #mainDataTable AS a
INNER JOIN #deductLiabData AS b
ON a.Employee = b.Employee
WHERE b.EDLType = 'D'
With #deductLiabData containing:

EDLCode  Amount
1        100
2        200

the result would be:

col_one  col_two
100      null
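A likely cause: UPDATE ... FROM applies at most one joined row per target row, so when an employee has rows for both EDLCode 1 and EDLCode 2, only one of them is used. A sketch of a common workaround, pivoting the amounts with conditional aggregation before the update (table and column names taken from the question; untested):
UPDATE a
SET col_one = x.amt1,
    col_two = x.amt2
FROM #mainDataTable AS a
INNER JOIN (
    SELECT Employee,
           MAX(CASE WHEN EDLCode = 1 THEN Amount END) AS amt1,  -- amount for code 1, if present
           MAX(CASE WHEN EDLCode = 2 THEN Amount END) AS amt2   -- amount for code 2, if present
    FROM #deductLiabData
    WHERE EDLType = 'D'
    GROUP BY Employee
) AS x ON a.Employee = x.Employee;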

Best way to compare two sets of data w/ SQL

What I have is a query that grabs a set of data. This query is run at a certain time. Then, 30 minutes later, another query (same syntax) runs and grabs that same set of data. Finally, a third query (the query in question) compares both sets of data.

The records it pulls out are ones where: "FEDVIP_Active" was FALSE in the first data set and TRUE in the second data set, OR "UniqueID" didn't exist in the first data set, does exist in the second data set, and FEDVIP_Active is TRUE.

I'm questioning the performance of the query below that does the comparison. It times out after 30 minutes. Is there anything you can see that I shouldn't be doing in order to run as efficiently as possible? The two identical-ish data sets I'm comparing have around a million records each.
First query that grabs the initial set of data:
select Unique_ID, First_Name, FEDVIP_Active, Email_Primary
from Master_Subscribers_Prospects
Second query is exactly the same as the first.
Then, the third query below compares the data:
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b
on 1 = 1
where a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0 or
(b.Unique_ID not in (select Unique_ID from Master_Subscribers_Prospects_1) and b.FEDVIP_Active = 1)
If I understand correctly, you want all records from the second data set where the corresponding unique id in the first data set is not active (either by not existing or by having the flag set to not active).
I would suggest exists:
select a.*
from Master_Subscribers_Prospects_2 a
where a.FEDVIP_Active = 1 and
      not exists (select 1
                  from Master_Subscribers_Prospects_1 b
                  where b.Unique_ID = a.Unique_ID and
                        b.FEDVIP_Active = 1
                 );
For performance, you want an index on Master_Subscribers_Prospects_1(Unique_ID, FEDVIP_Active).
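A minimal sketch of creating that index (the index name is illustrative):
CREATE INDEX ix_prospects_1_uid_active
    ON Master_Subscribers_Prospects_1 (Unique_ID, FEDVIP_Active);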
An inner join on 1 = 1 is a disguised cross join and the number of rows a cross join produces can grow rapidly. It's the product of the number of rows in both relations involved. For performance you want to keep intermediate results as small as possible.
Also, EXISTS often performs better than IN when the subquery returns many rows.
But I think you don't need IN or EXISTS at all.
Assuming unique_id identifies a record and is not null, you could left join the second table to the first one on common unique_ids. Then, if and only if no record for a unique_id exists in the first table, the first table's unique_id in the join result is null, so you can check for that.
SELECT b.fedvip_active,
       b.unique_id,
       b.first_name,
       b.email_primary
FROM master_subscribers_prospects_2 b
LEFT JOIN master_subscribers_prospects_1 a
       ON b.unique_id = a.unique_id
WHERE (a.fedvip_active = 1
       AND b.fedvip_active = 0)
   OR (a.unique_id IS NULL
       AND b.fedvip_active = 1);
For that query indexes on master_subscribers_prospects_1 (unique_id, fedvip_active) and master_subscribers_prospects_2 (unique_id, fedvip_active) might also help to speed things up.
Doing an inner select in a WHERE clause usually performs badly.
Here is the same version with a left join that might work for you:
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b on 1 = 1
left join Master_Subscribers_Prospects_1 sa on sa.Unique_ID = b.Unique_ID
where (a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0) or
(sa.Unique_ID is null and b.FEDVIP_Active = 1)

How to check on which column to create Index to optimize performance

I have the query below, which is taking too much time, and I have to optimize its performance. There is no index on any of the tables.
Now, for query performance optimization, I am thinking of creating an index. But I am not sure on exactly which filtered column I should create it.
I am thinking I will group by and count the number of distinct records for each of the filtered column conditions and then decide which column to index, but I am not sure about this approach.
Select *
from ORDER_MART FOL
where FOL.PARENT_PROD_SRCID IN
(
    select e.PARENT_PROD_SRCID
    from SRC_GRP a
    JOIN MAR_GRP b ON a.h_lpgrp_id = b.h_lpgrp_id
    JOIN DATA_GRP e ON e.parent_prod_srcid = b.H_LOCPR_ID
    WHERE a.CHILD_LOCPR_ID != 0
    AND dt_id BETWEEN 20170101 AND 20170731
    AND valid_order = 1
    AND a.PROD_TP_CODE like 'C%'
)
AND FOL.PROD_SRCID = 0 and IS_CAPS = 1;
Below is my query execution plan: (execution plan screenshot not included here)
Select *
from ORDER_MART FOL
INNER JOIN (
select distinct e.PARENT_PROD_SRCID
from SRC_GRP a
JOIN MAR_GRP b ON a.h_lpgrp_id = b.h_lpgrp_id
JOIN DATA_GRP e ON e.parent_prod_srcid = b.H_LOCPR_ID
WHERE a.CHILD_LOCPR_ID != 0 -- remove the lines from INT_CDW_DV.S_LOCAL_PROD_GRP_MAIN with child prod srcid equal to 0
AND dt_id BETWEEN 20170101 AND 20170731
AND valid_order = 1 --and is_caps=1
AND a.PROD_TP_CODE like 'C%'
) sub ON sub.PARENT_PROD_SRCID=FOL.PARENT_PROD_SRCID
where FOL.PROD_SRCID = 0 and IS_CAPS = 1;
What if you use JOIN instead of IN and add DISTINCT, as in the query above, to reduce the number of rows coming out of the subquery?
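Since there are no indexes at all yet, a hedged starting point would be indexes on the join and filter columns named above (index names are illustrative, and dt_id / valid_order should be indexed on whichever table actually owns them):
CREATE INDEX ix_order_mart ON ORDER_MART (PROD_SRCID, IS_CAPS, PARENT_PROD_SRCID);
CREATE INDEX ix_src_grp   ON SRC_GRP (h_lpgrp_id, PROD_TP_CODE, CHILD_LOCPR_ID);
CREATE INDEX ix_mar_grp   ON MAR_GRP (h_lpgrp_id, H_LOCPR_ID);
CREATE INDEX ix_data_grp  ON DATA_GRP (parent_prod_srcid);
Then compare the execution plan before and after to see which of them are actually used.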

Performance Issue in Left outer join Sql server

In my project I need to find the tasks that differ between the old and the new revision in the same table.
id | task | latest_Rev
1  | A    | N
1  | B    | N
2  | C    | Y
2  | A    | Y
2  | B    | Y
Expected result:
id | task | latest_Rev
2  | C    | Y
So I tried the following query:
Select nw.*
from Rev_tmp nw with (nolock)
left outer join rev_tmp old with (nolock)
    on nw.id - 1 = old.id
    and nw.task = old.task
    and nw.latest_rev = 'y'
where old.task is null
When my table has more than 20k records, this query takes much more time. How can I reduce the time?
My company doesn't allow the use of subqueries.
Use the LAG function to remove the self join:
SELECT id, task, latest_Rev
FROM (SELECT *,
             Lag(latest_Rev) OVER (PARTITION BY task ORDER BY id) AS prev_rev
      FROM Rev_tmp) a
WHERE latest_Rev = 'y'   -- keep only latest-revision rows
  AND prev_rev IS NULL   -- that have no earlier revision of the same task
My answer assumes:
- You can't change the indexes
- You can't use subqueries
- All fields are indexed separately
If you look at the query, the only condition that really reduces the result set is latest_rev = 'Y'. If you were to eliminate that condition, you'd definitely get a table scan. So we want that condition to be evaluated using an index. Unfortunately, a field that only holds the values 'Y' and 'N' is likely to be ignored, because it will have terrible selectivity. You might get better performance if you coax SQL Server into using it anyway. If the index on latest_rev is called idx_latest_rev, then try this:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

Select nw.*
from Rev_tmp nw with (index(idx_latest_rev))
left outer join rev_tmp old
    on nw.id - 1 = old.id
    and nw.task = old.task
where old.task is null
and nw.latest_rev = 'y'
latest_Rev should be a bit type (the boolean equivalent), which is better for performance (detail here).
Maybe you can add an index on the id, task, latest_Rev columns.
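A minimal sketch of such an index (the name is illustrative):
CREATE INDEX ix_rev_tmp_task_id ON Rev_tmp (task, id, latest_Rev);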
You can try this query (replacing the left outer join with NOT EXISTS):
Select *
from Rev_tmp nw
where nw.latest_rev = 'y' and not exists
(
select * from rev_tmp old
where nw.id -1 = old.id and nw.task = old.task
)