Match specific or default value on multiple columns - sql

raw_data :
name
account_id
type
element_id
cost
First
1
type1
element1
0.1
Second
2
type2
element2
0.2
First
11
type2
element11
0.11
components:
name
account_id (default = -1)
type (default = null)
element_id (default = null)
cost
First
-1
null
null
0.1
Second
2
type2
null
0.2
First
11
type2
element11
0.11
I seek to check whether the cost logged in raw_data is the same as that in components for a given combination. They need to be joined on column name.
Remaining fields in raw_data are always populated. In components, any row can be a combination of specific values and the default values.
I seek to match the columns from raw_data to components wherever I find a match and otherwise need to use the default value to get the cost.
I failed with left join and union and IN.
E.g. For the first row in raw_data table with name "First", I do not have account_id = 1 in the components table. So I need to go with account_id = -1.
Match as many specific values as found in components, Otherwise resort to default values.

I think one way you could do this is something like:
SELECT *
FROM
(
SELECT rd.name, rd.account_id, rd.type, rd.element_id, rd.cost raw_cost, c.account_id component_account_id, c.type component_type, c.element_id component_element_id, c.cost component_cost,
row_number() OVER (PARTITION BY rd.name, rd.account_id, rd.type, rd.element_id
ORDER BY
CASE WHEN c.account_id <> -1 THEN 1 END
+ CASE WHEN c.type IS NOT NULL THEN 1 END
+ CASE WHEN c.element_id IS NOT NULL THEN 1 END DESC) rd
FROM raw_data rd LEFT OUTER JOIN components c
ON rd.name = c.name
AND (rd.account_id = c.account_id or c.account_id = -1)
AND (rd.type = c.type OR c.type IS NULL)
AND (rd.element_id = c.element_id OR c.element_id IS NULL)
) iq
WHERE rd = 1
The idea here is to match on an actual match or the default. Then the row_number window function is used to prioritize the matches based on a count of how many columns actually matched (you said you don't care about ties, so this doesn't handle that). The outer query throws away the matches that aren't the best.
With the sample data above, this could be an inner join, but I left it as a left join since that's what was mentioned.
Here's a fiddle of it working. Hopefully this is close to what you want.

If the ratio of records in the tables is not 1 to 1, then an unambiguous sample cannot be made.
Also, if the selection condition is "Coincidence of at least one parameter", then it will also not work to make an unambiguous selection.
Below is an example that will bring you closer to solving the problem. It selects data that matches one of the selection criteria, however there may be duplicates!
Try this variant and report the result, possibly on a larger variant of samples and records
Maybe this will get you closer to the solution.
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
left join components c on rd.name = c.name
where (c.account_id = rd.account_id or c.account_id = -1) OR
(c.type = rd.type OR c.type is null) OR
(c.element_id = rd.element_id OR c.element_id = null)
You can build the priority of checking values through union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
left join components c on rd.name = c.name
where c.account_id = rd.account_id and
c.type = rd.type
c.element_id = rd.element_id
union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
left join components c on rd.name = c.name
where c.account_id = rd.account_id and
c.type = rd.type
union
select rd.name, rd.account_id, rd.type, rd.element_id, rd.cost, c.cost
from raw_data rd
join components c on rd.name = c.name
where c.account_id = rd.account_id
etc
Without seeing all the problems, all the data options in the tables, it is difficult to give the right solution, which may not be...

Related

Trying to update a field conditionally in SQL Stored Procedure

I have a procedure that populates two sets of application information into the same fields. First the fields are filled out with applicable accounts from group "A" and then the same process happens for group "B" accounts.
Most of the group B fields are filled in by a insert/select statement. However, the query to select "account number" is a little more complex and that is in an UPDATE statement. I will paste the code below but I cannot get it to properly update the rows (for group B) with account numbers, despite the fact the query works on its own outside the procedure (essentially, the account numbers do exist).
Any idea why? I tried adding a case statement to single out group B rows (the where clause is hardcoded for group B... e.g. clfcode = 3) but that didn't work. Let me know if you need more information. I haven't much experience with update statements in stored procedures.
update src
set account_key = case when src.clfcode = 3 and src.branch_key = 12 then a.account_key else src.account_key end
from #src_table src
inner join SDFDW_Landing.cu.FICS_ms_Investor_Loan l
on l.loan_id = src.application_number
left join dm.dim_product p
on p.product_key = src.product_key
left join (
Select Distinct t.PARENTACCOUNT, t.USERCHAR1 as loan_id
from SDFDW_Landing.dbo.tracking t
where t.TYPE = 1
and t.ProcessDate = #v_max_last_processed_date
and t.USERCHAR1 is not null
) t on t.loan_id = l.loan_id
left join dm.dim_account a
on t.PARENTACCOUNT = a.account_nkey
WHERE p.bdw_report_category = 'Mortgage'
and l.processdate = #v_max_last_processed_date
The join on a subquery might cause the issue. You could try to replace it with an apply and see if that helps.
update
src
set
account_key =
case
when
src.clfcode = 3
and src.branch_key = 12
then
a.account_key
else
src.account_key
end
from
#src_table src
inner join
SDFDW_Landing.cu.FICS_ms_Investor_Loan l
on l.loan_id = src.application_number
left join
dm.dim_product p
on p.product_key = src.product_key
outer apply (
Select
acc.*
from
dm.dim_account acc
inner join
SDFDW_Landing.dbo.tracking t
on acc.account_nkey = t.parentaccount
where
t.TYPE = 1
and t.ProcessDate = #v_max_last_processed_date
and t.USERCHAR1 is not null
and t.loan_id = l.loan_id
) a
WHERE
p.bdw_report_category = 'Mortgage'
and l.processdate = #v_max_last_processed_date
alternatively since you are already within a stored procedure, I'd populate a temp table with the data from your subquery and simply join on that temp table from your update statement.

Performance Issue in Left outer join Sql server

In my project I need find difference task based on old and new revision in the same table.
id | task | latest_Rev
1 A N
1 B N
2 C Y
2 A Y
2 B Y
Expected Result:
id | task | latest_Rev
2 C Y
So I tried following query
Select new.*
from Rev_tmp nw with (nolock)
left outer
join rev_tmp old with (nolock)
on nw.id -1 = old.id
and nw.task = old.task
and nw.latest_rev = 'y'
where old.task is null
when my table have more than 20k records this query takes more time?
How to reduce the time?
In my company don't allow to use subquery
Use LAG function to remove the self join
SELECT *
FROM (SELECT *,
CASE WHEN latest_Rev = 'y' THEN Lag(latest_Rev) OVER(partition BY task ORDER BY id) ELSE NULL END AS prev_rev
FROM Rev_tmp) a
WHERE prev_rev IS NULL
My answer assumes
You can't change the indexes
You can't use subqueries
All fields are indexed separately
If you look at the query, the only value that really reduces the resultset is latest_rev='Y'. If you were to eliminate that condition, you'd definitely get a table scan. So we want that condition to be evaluated using an index. Unfortunately a field that just values 'Y' and 'N' is likely to be ignored because it will have terrible selectivity. You might get better performance if you coax SQL Server into using it anyway. If the index on latest_rev is called idx_latest_rev then try this:
Set transaction isolated level read uncommitted
Select new.*
from Rev_tmp nw with (index(idx_latest_rev))
left outer
join rev_tmp old
on nw.id -1 = old.id
and nw.task = old.task
where old.task is null
and nw.latest_rev = 'y'
latest_Rev should be a Bit type (boolean equivalent), i better for performance (Detail here)
May be can you add index on id, task
, latest_Rev columns
You can try this query (replace left outer by not exists)
Select *
from Rev_tmp nw
where nw.latest_rev = 'y' and not exists
(
select * from rev_tmp old
where nw.id -1 = old.id and nw.task = old.task
)

SQL Join / Union

I have two statements that I want to merge into one output.
Statement One:
select name from auxiliary_variable_inquiry
where inquiry_idbr_code = '063'
Returns the following list of names:
Name
------------
Affiliates
NetBookValue
Parents
Worldbase
Statement Two:
select name, value from auxiliary_variable_value
where inquiry_idbr_code = '063'
and ru_ref = 20120000008
and period = 200912
Returns the following:
Name Value
-------------------
Affiliates 112
NetBookValue 225.700
I would like to have an output like this:
Name Value
-------------------
Affiliates 112
NetBookValue 225.700
Parents 0
Worldbase 0
So basically, if the second query only returns 2 names and values, I'd still like to display the complete set of names from the first query, with no values. If all four values were returned by both queries, then all four would be displayed.
Sorry I must add, im using Ingres SQL so im unable to use the ISNULL function.
You can do a left join. This ensures that all records from the first table will stay included. Where value is null, no child record was found, and we use coalesce to display 0 in these cases.
select i.name, COALESCE(v.Value,0) from auxiliary_variable_inquiry i
left join auxiliary_variable_value v
on v.inquiry_idbr_code = i.inquiry_idbr_code
and v.ru_ref = 20120000008
and v.period = 200912
where i.inquiry_idbr_code = '063'
I'd recommend a self-JOIN using the LEFT OUTER JOIN syntax. Include your 'extra' conditions from the second query in the JOIN condition, while the first conditions stay in the WHERE, like this:
select a.name, CASE WHEN b.Value IS NULL THEN 0 ELSE b.Value END AS Value
from
auxiliary_variable_inquiry a
LEFT JOIN
auxiliary_variable_inquiry b ON
a.name = b.name and -- replace this with your real ID-based JOIN
a.inquiry_idbr_code = b.inquiry_idbr_code AND
b.ru_ref = 20120000008 AND
b.period = 200912
where a.inquiry_idbr_code = '063'
if i got right, you should use something like:
SELECT i.NAME,
v.NAME,
v.value
FROM auxiliary_variable_inquiry i
LEFT JOIN auxiliary_variable_value v
ON i.inquiry_idbr_code = v.inquiry_idbr_code
WHERE v.ru_ref = 20120000008
AND v.period = 200912

How can I join on multiple columns within the same table that contain the same type of info?

I am currently joining two tables based on Claim_Number and Customer_Number.
SELECT
A.*,
B.*,
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbp.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This returns exactly the data I'm looking for. The problem is that I now need to join to another table to get information based on a Product_ID. This would be easy if there was only one Product_ID in the Compound_Info table for each record. However, there are 10. So basically I need to SELECT 10 additional columns for Product_Name based on each of those Product_ID's that are being selected already. How can do that? This is what I was thinking in my head, but is not working right.
SELECT
A.*,
B.*,
PD_Info_1.Product_Name,
PD_Info_2.Product_Name,
....etc {Up to 10 Product Names}
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
LEFT JOIN Company.dbo.Product_Info AS PD_Info_1 ON B.Product_ID_1 = PD_Info_1.Product_ID
LEFT JOIN Company.dbo.Product_Info AS PD_Info_2 ON B.Product_ID_2 = PD_Info_2.Product_ID
.... {Up to 10 LEFT JOIN's}
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This query not only doesn't return the correct results, it also takes forever to run. My actual SQL is a lot longer and I've changed table names, etc but I hope that you can get the idea. If it matters, I will be creating a view based on this query.
Please advise on how to select multiple columns from the same table correctly and efficiently. Thanks!
I found put my extra stuff into CTE and add ROW_NUMBER to insure that I get only 1 row that I care about. it would look something like this. I only did for first 2 product info.
WITH PD_Info
AS ( SELECT Product_ID
,Product_Name
,Effective_Date
,ROW_NUMBER() OVER ( PARTITION BY Product_ID, Product_Name ORDER BY Effective_Date DESC ) AS RowNum
FROM Company.dbo.Product_Info)
SELECT A.*
,B.*
,PD_Info_1.Product_Name
,PD_Info_2.Product_Name
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B
ON A.Claim_Number = B.Claim_Number
AND A.Customer_Number = B.Customer_Number
LEFT JOIN PD_Info AS PD_Info_1
ON B.Product_ID_1 = PD_Info_1.Product_ID
AND B.Fill_Date >= PD_Info_1.Effective_Date
AND PD_Info_2.RowNum = 1
LEFT JOIN PD_Info AS PD_Info_2
ON B.Product_ID_2 = PD_Info_2.Product_ID
AND B.Fill_Date >= PD_Info_2.Effective_Date
AND PD_Info_2.RowNum = 1

Query not working as expected

Goal
Select distinct ids from blog_news where
active = 1
title is not empty
has at least one picture unless picture is logo, or at least one video
The statement so far
select distinct n.id from blog_news n
left join blog_pics p ON n.id = p.blogid and active = '1' and trim(n.title) != ''
left join blog_vdos v ON n.id = v.blogid
where (p.islogo = '0' and p.id is not null) OR (v.id is not null)
order by `newsdate` desc, `createdate` desc
The issue
selects blog_news ids that have pictures, unless they're logos [correct]
selects blog_news ids that have both videos and pictures [correct]
does not select blog_news ids that have only videos [wrong]
How about this:
SELECT DISTINCT n.id
FROM blog_news n
WHERE n.active = '1'
AND trim(n.title) != ''
AND (EXISTS (SELECT 1
FROM blog_pics p
WHERE p.blogid = n.id
AND p.islogo = 0)
OR EXISTS (SELECT 1
FROM blog_vdos v
WHERE v.blogid = n.id)
)
ORDER BY n.newsdate desc, n.createdate desc
Where you are just interested in the existence (or not) of child rows then it is often clearer and easier to use EXISTS.
I can't see any problem in your query.
I expect active is column in blog_news table, you should call it n.active. If this column is in blog_pics table, then this is the problem.
I would add the condition (n.active, n.title) to WHERE, as it's not related to left join (blog_pics) - but that's just for better readability, the result would be the same.
You can write the query using sub selects as well:
SELECT n.id FROM blog_news n
WHERE n.active = 1 AND TRIM(n.title) != '' AND n.id IN (
SELECT DISTINCT p.blogid FROM blog_pics p WHERE p.islogo = 0 UNION
SELECT DISTINCT v.blogid FROM blog_vdos
);