I'm currently working with a B2B sales database that doesn't have a unique identifier for each customer. New records are allocated an ID code when loaded but the a person can have more than one ID code for various reasons.
The business regularly runs events where they capture registration data. I'm trying to match the event data (pre DB load) to a table of existing contacts. This is proving challenging as nothing is unique (not even email addresses as they can be shared or associated with multiple records in the contacts master table).
I need to run some code to flag where new contacts exist in the database already. My thinking led me to a cascading process of 'try this, if not then try this' etc. using a CASE in the join.
However, it's not stopping at the first condition that's met and just returns everything that meets any condition - resulting in the same record in the contacts master being joined on more than once and duplicated in the results in many cases.
Is there a way to improve the join or is there just a better way to achieve matching across these data sets?
SELECT nc.email
,nc.firstname
,nc.lastname
,nc.company
,cm.id_code
,cm.sales_region
FROM [sales].[new_contacts] nc
LEFT JOIN [sales].[contact_master] cm
ON CASE when nc.email = cm.email AND nc.fullname = cm.fullname and nc.company = cm.company then 1
when nc.email = cm.email AND nc.fullname = cm.fullname then 1
when nc.email = cm.email then 1
when nc.fullname = cm.fullname then 1
else 0 END = 1
Related
I have a complex set of schema that I am trying to pull data out of for a report. The query for it joins a bunch of tables together and I am specifically looking to pull a subset of data where everything for it might be null. The original relations for the tables look as such.
Location.DeptFK
Dept.PK
Section.DeptFK
Subsection.SectionFK
Question.SubsectionFK
Answer.QuestionFK, SubmissionFK
Submission.PK, LocationFK
From here my problems begin to compound a little.
SELECT Section.StepNumber + '-' + Question.QuestionNumber AS QuestionNumberVar,
Question.Question,
Subsection.Name AS Subsection,
Section.Name AS Section,
SUM(CASE WHEN (Answer.Answer = 0) THEN 1 ELSE 0 END) AS NA,
SUM(CASE WHEN (Answer.Answer = 1) THEN 1 ELSE 0 END) AS AnsNo,
SUM(CASE WHEN (Answer.Answer = 2) THEN 1 ELSE 0 END) AS AnsYes,
(select count(distinct Location.Abbreviation) from Department inner join Plant on location.DepartmentFK = Department.PK WHERE(Department.Name = 'insertParameter'))
as total
FROM Department inner join
section on Department.PK = section.DepartmentFK inner JOIN
subsection on Subsection.SectionFK = Section.PK INNER JOIN
question on Question.SubsectionFK = Subsection.PK INNER JOIN
Answer on Answer.QuestionFK = question.PK inner JOIN
Submission on Submission.PK = Answer.SubmissionFK inner join
Location on Location.DepartmentFK = Department.PK AND Location.pk = Submission.PlantFK
WHERE (Department.Name = 'InsertParameter') AND (Submission.MonthTested = '1/1/2017')
GROUP BY Question.Question, QuestionNumberVar, Subsection.Name, Section.Name, Section.StepNumber
ORDER BY QuestionNumberVar;
There are 15 total locations, with this query I get 12. If I remove a relation in the join for Location I get 15 total locations but my answer data gets multiplied by 15. My issue is that not all locations are required to test at the same time so their answers should default to NA, They don't get records placed in the DB so the relationship between Location/Submission is absent.
I have a workaround almost in place via the select count distinct but, The second part is a query for finding what each location answered instead of a sum which brings the problem right back around. It also has to be dynamic because the input parameters for a department won't bring a static number of locations back each time.
I am still learning my SQL so any additional material to look at for building this query would also be appreciated. So I guess the big question here is, How would I go about creating default data in this query for anytime the Location/Submission relation has a null value?
Edit: Dummy Data
QuestionNumberVar | Section | Subsection | Question | AnsYes | AnsNo | NA (expected)
1-1.1 Math Algebra Did you do your homework? 10 1 1(4)
1-1.2 Math Algebra Did your dog eat it? 9 3 0(3)
2-1.1 English Greek Did you do your homework? 8 0 4(7)
I have tried making left joins at various applicable portions of the code to no avail. All attempts at left joins have ended with no effect on info output. This query feeds into the Dataset for an SSRS report. There are a couple workarounds for this particular section via an expression to take total Locations and subtract AnsYes and AnsNo to get the true NA value but as explained above doesn't help with my next query.
Edit: SQL Server 2012 for those who asked
Edit: my attempt at an isnull() on the missing data returns nothing I suspect because the query already eliminates the "null/missing" data. Left joining while doing this has also failed. The point of failure is on Submissions. if we bind it to Locations there are locations missing but if we don't bind it there are multiplied duplicates because Department has a One-To-Many with Location and not vice versa. I am unable to make any schema changes to improve this process.
There is a previous report that I am trying to emulate/update. It used C# logic to process data and run multiple queries to attain the same data. I don't have this luxury. (previous report exports to excel directly instead of SSRS). Here is the previous logic used.
select PK from Department where Name = 'InsertParameter';
select PK from Submission where LocationFK = 'Location.PK_var' and MonthTested = '1/1/2017'
Then it runs those into a loop where it processes nulls into NA using C# logic
EDIT (Mediocre Solution): I ended up doing the workaround of making a calculated field that subtracts Yes and No from the total # of Locations that have that Dept. This is a mediocre solution because I didn't solve my original problem and made 3 datasets that should have been displayed as a singular dataset. One for question info, one for each locations answer and one for locations that didnt participate. If a true answer comes up I will check its validity but for now, Problem psuedo solved.
I am very new to Access, and what I am trying to do seems like it should be very simple, but I can't seem to get it.
I am a structural engineer by trade and am making a database to design buildings.
My Diaphragm Analysis Table includes the fields "Floor_Name", "Story_Number", "Wall_Left", and "Wall_Right". I want to write a new query that looks in another query called "Shear_Wall_incremental_Deflection" and pulls information from it based on input from Diaphragm Analysis. I want to take the value in "Wall_Right" (SW01), find the corresponding value in "Shear_Wall_incremental_Deflection", and report the "Elastic_Deflection" corresponding to the "Story_Below" instead of the "Story_Number" in the Diaphragm Analysis Table. In the case where "Story_Number" = 1, "Story_Below" will be 0 and I want the output to be 0.
Same procedure for "Wall_Left", but I'm just taking it one step at a time.
It seems that I need to use a "DLookup" in the expression builder with TWO criteria, one that Wall_Right = Shear_Wall and one that Story_Number = Story_Below, but when I try this I just get errors.
"Shear_Wall_incremental_Deflection" includes shearwalls for all three stories, i.e. it starts at SW01 and goes through SWW for Story Number 3 and then starts again at SW01 for Story Number 2, and so on until Story Number 1. I only show a part of the query results in the image, but rest assured, there are "Elastic_Deflection" values for story numbers below 3.
Here is my attempt in the Expression Builder:
Right_Defl_in: IIf(IsNull([Diaphragm_Analysis]![Wall_Right]),0,DLookUp("[Elastic_Deflection_in]","[Shear_Wall_incremental_Deflection]","[Shear_Wall_incremental_Deflection]![Story_Below]=" & [Diaphragm_Analysis]![Story_Number]))
I know my join from Diaphragm_Analysis "Wall_Left" and "Wall_Right" must include all records from Diaphragm_Analysis and only those from "Shear_Wall_incremental_Deflection"![Shear_Walls] where the joined fields are equal, but that's about all I know.
Please let me know if I need to include more information or send out the database file.
Thanks for your help.
Diaphragm Analysis (Input Table)
Shear_Wall_incremental_Deflection (Partial Image of Query)
I think what you are missing is that you can and should join to Diaphragm_Analysis twice, first time to get the Story_Below value and second to use it to get the corresponding Elastic_Deflection value.
To handle the special case where Story_Below is zero, I would write a separate query (only requires one join this time) and 'OR together' the two queries using the UNION set operation (note the following SQL is untested):
SELECT swid.Floor_Name,
swid.Story_Number,
swid.Wall_Left,
da2.Elastic_Deflection AS Story_Below_Elastic_Deflection
FROM ( Shear_Wall_incremental_Deflection swid
INNER JOIN Diaphragm_Analysis da1
ON da1.ShearWall = swid.Wall_Left )
INNER JOIN Diaphragm_Analysis da2
ON da2.ShearWall = swid.Wall_Left
AND da2.Story_Number = da1.Story_Below
UNION
SELECT swid.Floor_Name,
swid.Story_Number,
swid.Wall_Left,
0 AS Story_Below_Elastic_Deflection
FROM Shear_Wall_incremental_Deflection swid
INNER JOIN Diaphragm_Analysis da1
ON da1.ShearWall = swid.Wall_Left
WHERE da1.Story_Below = 0;
I've assumed that there is no data where Story_Number is zero.
So Here is the problem I have a requirement where I need a customer type to equal two different things.
To Cover the requirement I don't need the customer type to equal Client, or Non client but equal Client, and Non_Client. Each Customer_No can have multiple Customer Types
Here is an example of what I have worked on so far. If you know a better way of optimizing this as well as solving the problem please let me know.
The out put should look like this
CustomerID CustomerType CustomerType
--------------------------------------
2345 Client NonClient
Select TB1.Customer_ID, IB1.Customer_Type, AS Non_client IB1.Customer_Type AS Client
From Client TB1, Client_ReF XB1, Client_Instr IB1, Client_XREC FB1
Where XB1.Client_NO = TB1.Client_NO
AND FB1.Client_ACCT = TB1.ACCT
AND XB1.Client_Instruct_NO = IB1.Client_Instruct_NO
AND FB1.Customer_ID= TB1. Client_NO
AND IB1.Client = 'Client'
AND IB1.Non_Client = 'NonClient'
I have omitted a few other filters that I felt were unnecessary. This also may not make sense, but I tried to change up the names of stuff as to keep myself in compliance.
First a small syntactic error:
You mustn't have a comma before the "AS Non_client "
Then what you are trying to do is make 1 value equal 2 different things for the same column which can never be true:
IB1.Customer_Type for 1 record can never be equal to "Client" and "NonClient" simultaneously.
The key here is that 1 customer can have multiple records and the records can differ in the customer_type. So to use that we need to join those records together which is easy since they share a Customer_ID:
Select TB1.Customer_ID,
IB1.Customer_Type AS Client,
IB2.Customer_Type AS Non_client
From Client TB1,
Client_ReF XB1,
Client_Instr IB1,
Client_Instr IB2,
Client_XREC FB1
Where XB1.Client_NO = TB1.Client_NO
AND FB1.Client_ACCT = TB1.ACCT
AND XB1.Client_Instruct_NO = IB1.Client_Instruct_NO
AND FB1.Customer_ID= TB1.Client_NO
AND IB1.Client = 'Client'
AND XB1.Client_Instruct_NO = IB2.Client_Instruct_NO
AND IB2.Non_Client = 'NonClient';
The above may not actually work due to me not fully understanding your data and structures but should put you on the right path. Particularly around the join of IB2 with XB1, you might have to join IB2 with all the same tables as IB1.
A better way than that however, and i'll leave you to research it, is using the EXISTS statement. The difference is that the above will join all records for the same customer together whereas EXISTS will just be satisfied if there's at least 1 instance of a "NonClient" record.
Ok I have a rather unique situation and I can't believe there is not a better way of doing this than my solution.
Requirements:
Table 2 - EpmTask_UserView_RM is a subset of table 1 -
MSP_EpmTask_UserView So while all the fields match Table 1 has many
more rows than table 2
Table 2 needs to get updated from table 1 based on the date a task has changed (We can't do a drop and replace) There are three cases:
Task updates where something has changed about the task (We will know based on the task date stamp)
Task Deletes where a task has been deleted
Task Adds where a new task exists
I have 3 different queries that do this and am thinking there is a better way.
**** DELETE Tasks from ZZZ_TEST_OF_UPDATE_MSP_EpmTask_UserView_RM table if no longer present in Production***/
USE [ProjectWebApp]
GO
DELETE FROM [dbo].[ZZZ_TEST_OF_UPDATE_MSP_EpmTask_UserView_RM]
WHERE [dbo].[ZZZ_TEST_OF_UPDATE_MSP_EpmTask_UserView_RM].TaskUID IN
(SELECT
/*Subquery to select all records in ZZZ_TEST_OF_UPDATE_MSP_EpmTask_UserView_RM NOT found in MSP_EpmTask_UserView_RM */
[ProjectWebApp].[dbo].[ZZZ_TEST_OF_UPDATE_MSP_EpmTask_UserView_RM].[TaskUID]
FROM [ProjectWebApp].[dbo].[ZZZ_TEST_OF_UPDATE_MSP_EpmTask_UserView_RM]
LEFT JOIN [MSPSPRO].[ProjectWebApp].[dbo].[MSP_EpmTask_UserView] as Prod
on Prod.TaskUID = [ProjectWebApp].[dbo].[ZZZ_TEST_OF_UPDATE_MSP_EpmTask_UserView_RM].TASKuid
where Prod.TaskUID is NULL)
Query 2 the Update
UPDATE [dbo].[ZZZ_TEST_OF_UPDATE_MSP_EpmTask_UserView_RM]
SET
[ProjectUID] = Source.[ProjectUID]
,[TaskUID] = Source.[TaskUID]
,[TaskName] = Source.[TaskName]
,[TaskIndex] = Source.[TaskIndex]
,[TaskOutlineLevel] = Source.[TaskOutlineLevel]
,[TaskOutlineNumber] = Source.[TaskOutlineNumber]
,[TaskStartDate] = Source.[TaskStartDate]
,[TaskFinishDate] = Source.[TaskFinishDate]
,[TaskActualStartDate] = Source.[TaskActualStartDate]
,[TaskActualFinishDate] = Source.[TaskActualFinishDate]
,[TaskPercentCompleted] = Source.[TaskPercentCompleted]
,[Health] = Source.[Health]
,[Milestone Significance Level] = Source.[Milestone Significance Level]
,[TaskModifiedDate] = Source.[TaskModifiedDate]
,[TaskBaseline1StartDate] = Source.[TaskBaseline1StartDate]
,[TaskBaseline1FinishDate] = Source.[TaskBaseline1FinishDate]
,[TaskBaseline1Duration] = Source.[TaskBaseline1Duration]
,[QueryTimestamp] = GetDate()
FROM [MSPSPRO].[ProjectWebApp].[dbo].[MSP_EpmTask_UserView] AS Source
WHERE Source.TaskUID = [dbo].[ZZZ_TEST_OF_UPDATE_MSP_EpmTask_UserView_RM].TaskUID
AND GetDate() - Source.TaskModifiedDate <= .01 -- Update any task changed in last 14 minutes (14 minutes = 1% of a full day, ie '.01')
GO
Task 3 the add
SELECT
[MSP_EpmProject_UserView].[ProjectUID]
,[TaskUID]
,[TaskName]
,[TaskIndex]
,[TaskOutlineLevel]
,[TaskOutlineNumber]
,[TaskStartDate]
,[TaskFinishDate]
,[TaskActualStartDate]
,[TaskActualFinishDate]
,[TaskPercentCompleted]
,[Health]
,[Milestone Significance Level]
,[TaskModifiedDate]
,[TaskBaseline1StartDate]
,[TaskBaseline1FinishDate]
,[TaskBaseline1Duration]
,GetDate() as QueryTimestamp
INTO [ProjectWebApp].[dbo].[MSP_EpmTask_UserView_RM]
FROM [MSPSPRO].[ProjectWebApp].[dbo].[MSP_EpmTask_UserView]
Inner Join [MSPSPRO].[ProjectWebApp].[dbo].[MSP_EpmProject_UserView]
on [MSP_EpmProject_UserView].projectUID = [MSP_EpmTask_UserView].ProjectUID
WHERE [SMO Programs] = 'SMO Day 1 Release Management'
AND [Milestone Significance Level] is not null
/*AND [TaskModifiedDate] > (getdate() - 1)*/
Thoughts?
This looks like an ideal situation for a MERGE statement. If you haven't used them much or at all, I'd strongly suggest this site as a primer.
A MERGE can carry out INSERT, UPDATE, and DELETE in one shot, in the right conditions. The basic idea is that you compare rows in two tables, your source and destination, and from that comparison (and potentially other conditions) you then take the appropriate action.
MERGE can perform very well because it carries out these actions in bulk - but do test it out. Sometimes people have found them to be slower than using the separate statements in some situations. Indexing correctly (Microsoft suggest indexing the columns used to join in both tables) can help immensely. Writing the MERGE statement correctly and well is important in terms of both getting the right result, and getting good performance - so definitely do your reading up if you haven't used them before. The above link is a good starter, but there are plenty of other articles around.
INSERT INTO Shipments (Column1...Column200)
SELECT
O.Value1,...
CL.Value199,
isnull(P.PriceFactor1,1)
FROM Orders O
JOIN Clients CL on O.ClientNo = CL.ClientNo
JOIN Calc C on CL.CalcCode = C.CalcCode
JOIN Prices P on CL.PriceKey = P.PriceKey
WHERE O.PriceFactor1 = P.PriceFactor1
AND O.PriceFactor2 = P.PriceFactor2
AND O.PriceFactor3 = P.PriceFactor3
The above query (part of a new stored procedure meant to replace an old and nasty one that used a cursor...) fails to return some rows, because the rows in Orders do not have matching rows in Prices. In such cases, we want the last value in the INSERT list to be 1 by default. Instead, the row is never built; or, when we tried to fix it by changing the WHERE conditions, it brought PriceFactor1 from a different row, which is also no good.
How it's supposed to work:
A row is created in table Orders. A third-party program then executes a stored procedure (Asp_BuildShipments) and displays the results once they have been inserted into table Shipments. This SP is meant to populate the table Shipments by pulling values from Orders, Clients, Drivers, Vehicles, Prices, Routes, and others. It's a long SP, and the array of tables is big and varied.
In table Orders:
PriceFactor1 | PriceFactor2 | PriceFactor3
12 | 10 | 8
In table Prices:
PriceFactor1 | PriceFactor2 | PriceFactor3
18 | 12 | 10
In a case such as this, the SP needs to recognize that no such rows exist in Prices and use a default value of 1 rather than skipping the row or pulling the price from a different row.
We've tried isnull(), CASE statements, and WHERE EXISTS, but to no avail.
The new SP is set based, and we want to leave it that way - the old one took minutes, the new one takes only a few seconds. But without passing row by row, we aren't sure how to check each individual Order to see if has a matching Price before building the row in Shipments.
I know there are details missing here, but I didn't want to write a 1,000 page question. If these details are insufficient, I'll post as much as I need to to help get your brains storming. Been stuck on this for a while now...
Thanks in advance :)
Could this be as simple as:
SELECT
O.Value1,...
CL.Value199,
isnull(P.PriceFactor1,1)
FROM Orders O
JOIN Clients CL on O.ClientNo = CL.ClientNo
JOIN Calc C on CL.CalcCode = C.CalcCode
LEFT JOIN Prices P on CL.PriceKey = P.PriceKey
AND O.PriceFactor1 = P.PriceFactor1
AND O.PriceFactor2 = P.PriceFactor2
AND O.PriceFactor3 = P.PriceFactor3
Richard Hansell's answer should work if the problem is as you suggest there are no matching rows in Prices. Since it's not working the problem is in the other joins.
One of these joins filters out data.
JOIN Clients CL on O.ClientNo = CL.ClientNo
JOIN Calc C on CL.CalcCode = C.CalcCode