Two SQL Queries With Some Overlapping Data - sql

I'm trying to combine two different SQL queries into one table. I've tried various joins and Union but it either duplicates rows or doesn't show all of them.
The first query is
Select
HW.DisplayName,
HW.LocationDetails_0B39A057_2BE8_11B2_BBE2_1E03564AA5CA,
HW.Notes_5CFC0E2A_AB82_5830_D4BB_0596CBED1984
FROM MT_Cireson$AssetManagement$HardwareAsset HW
where HardwareAssetStatus_3019ADDF_4F3D_2C55_2024_72C22E11F4CF = '866879DF-8FB6-E521-F0E3-FEF86EE1BC92'
This gives all of my hardware assets that have the status I'm looking for.
The second query is:
SELECT
hw.DisplayName,
HW.LocationDetails_0B39A057_2BE8_11B2_BBE2_1E03564AA5CA,
HW.Notes_5CFC0E2A_AB82_5830_D4BB_0596CBED1984,
UB.UPN_7641DFF7_7A20_DC04_FC1C_B6FA8715DA02
FROM MT_Cireson$AssetManagement$HardwareAsset HW
inner join Relationship Rel on HW.BaseManagedEntityId = Rel.SourceEntityId
inner join RelationshipType RT on RT.RelationshipTypeId = Rel.RelationshipTypeId
inner join MT_Microsoft$AD$UserBase UB on UB.BaseManagedEntityId = Rel.TargetEntityId
where RT.RelationshipTypeName = 'Cireson.AssetManagement.HardwareAssetHasPrimaryUser'
and HardwareAssetStatus_3019ADDF_4F3D_2C55_2024_72C22E11F4CF = '866879DF-8FB6-E521-F0E3-FEF86EE1BC92'
This gives all of the hardware assets I'm looking for that have a primary user configured, but doesn't give the assets without a primary user. I'm not sure how to either A: combine the results just putting in NULL as a primary user for records that don't have one, or B: actually query all the assets at one time and include the primary user column.
I didn't write the second query and I'm not sure exactly how it works. I've tried doing union between the queries but that duplicates the rows because the first query already contains all the elements in the second.
Edit: The PrimaryUser comes from the MT_Microsoft$AD$UserBase table. I've tried adding another column to the first and just setting it as null like:
null as primaryUser,

How about a LEFT JOIN to include all records from HW that are not in UB:
SELECT
hw.DisplayName,
HW.LocationDetails_0B39A057_2BE8_11B2_BBE2_1E03564AA5CA,
HW.Notes_5CFC0E2A_AB82_5830_D4BB_0596CBED1984,
UB.UPN_7641DFF7_7A20_DC04_FC1C_B6FA8715DA02
FROM
MT_Cireson$AssetManagement$HardwareAsset HW
INNER JOIN
Relationship Rel
ON
HW.BaseManagedEntityId = Rel.SourceEntityId
INNER JOIN
RelationshipType RT
ON
RT.RelationshipTypeId = Rel.RelationshipTypeId
LEFT JOIN
MT_Microsoft$AD$UserBase UB
ON
UB.BaseManagedEntityId = Rel.TargetEntityId
WHERE
RT.RelationshipTypeName = 'Cireson.AssetManagement.HardwareAssetHasPrimaryUser'
AND HardwareAssetStatus_3019ADDF_4F3D_2C55_2024_72C22E11F4CF = '866879DF-8FB6-E521-F0E3-FEF86EE1BC92'
UPDATE:
If the null primary users is what you want, I would recraft the query like:
SELECT
hw.DisplayName,
HW.LocationDetails_0B39A057_2BE8_11B2_BBE2_1E03564AA5CA,
HW.Notes_5CFC0E2A_AB82_5830_D4BB_0596CBED1984,
UB.UPN_7641DFF7_7A20_DC04_FC1C_B6FA8715DA02
FROM
MT_Cireson$AssetManagement$HardwareAsset HW
LEFT JOIN
MT_Microsoft$AD$UserBase UB
ON
HW.BaseManagedEntityId = UB.SourceEntityId
INNER JOIN
RelationshipType RT
ON
RT.RelationshipTypeId = Rel.RelationshipTypeId
INNER JOIN
Relationship Rel
ON
UB.BaseManagedEntityId = Rel.TargetEntityId
WHERE
RT.RelationshipTypeName = 'Cireson.AssetManagement.HardwareAssetHasPrimaryUser'
AND HardwareAssetStatus_3019ADDF_4F3D_2C55_2024_72C22E11F4CF = '866879DF-8FB6-E521-F0E3-FEF86EE1BC92'
I LEFT JOIN'ed HW and UB tables.
As I said earlier, you'll have to tweak the joins. I would try a LEFT JOIN on all tables.

you have an extra column in your 1st query that's why you have duplicates in your union. I would suggest using CTE, there might be other better efficient solutions out there.
;WITH query1
AS
(
SELECT col1, col2
FROM table
),
query2 AS
(
SELECT col1, col2
FROM table
)
SELECT *
FROM cteTable1
UNION ALL
SELECT *
FROM cteTable2
WHERE NOT EXISTS(SELECT * FROM cteTable1 WHERE cteTable1.col1 = cteTable2.col2)

Related

SQL - How to put a condition for which table is selected without left join

I have a flag in a table which value ( 1 for US, or 2 for Global) indicates if the data will be in Table A or Table B.
A solution that works is to left join both tables; however this slows down significantly the scripts (from less than a second to over 15 seconds).
Is there any other clever way to do this? an equivalent of
join TableA only if TableCore.CountryFlag = "US"
join TableB only if TableCore.CountryFlag = "global"
Thanks a lot for the help.
You can try using this approach:
-- US data
SELECT
YourColumns
FROM
TableCore
INNER JOIN TableA AS T ON TableCore.JoinColumn = T.JoinColumn
WHERE
TableCore.CountryFlag = 'US'
UNION ALL
-- Non-US Data
SELECT
YourColumns -- These columns must match in number and datatype with previous SELECT
FROM
TableCore
INNER JOIN TableB AS T ON TableCore.JoinColumn = T.JoinColumn
WHERE
TableCore.CountryFlag = 'global'
However, if the result is still slow, you might want to check if the TableCore table has a index on CountryFlag and JoinColumn, and TableA and TableB an index on JoinColumn.
The basic structure is:
select . . ., coalesce(a.?, b.?) as ?
from tablecore c left join
tablea a
on c.? = a.? and c.countryflag = 'US' left join
tableb b
on c.? b.? and c.counryflag = 'global';
This version of the query can take advantage of indexes on tablea(?) and tableb(?).
If you have a complex query, this portion is probably not responsible for the performance problem.

SQL select results not appearing if a value is null

I am building a complex select statement, and when one of my values (pcf_auto_key) is null it will not disipaly any values for that header entry.
select c.company_name, h.prj_number, h.description, s.status_code, h.header_notes, h.cm_udf_001, h.cm_udf_002, h.cm_udf_008, l.classification_code
from project_header h, companies c, project_status s, project_classification l
where exists
(select company_name from companies where h.cmp_auto_key = c.cmp_auto_key)
and exists
(select status_code from project_status s where s.pjs_auto_key = h.pjs_auto_key)
and exists
(select classification_code from project_classification where h.pcf_auto_key = l.pcf_auto_key)
and pjm_auto_key = 11
--and pjt_auto_key = 10
and c.cmp_auto_key = h.cmp_auto_key
and h.pjs_auto_key = s.pjs_auto_key
and l.pcf_auto_key = h.pcf_auto_key
and s.status_type = 'O'
How does my select statement look? Is this an appropriate way of pulling info from other tables?
This is an oracle database, and I am using SQL Developer.
Assuming you want to show all the data that you can find but display the classification as blank when there is no match in that table, you can use a left outer join; which is much clearer with explicit join syntax:
select c.company_name, h.prj_number, h.description, s.status_code, h.header_notes,
h.cm_udf_001, h.cm_udf_002, h.cm_udf_008, l.classification_code
from project_header h
join companies c on c.cmp_auto_key = h.cmp_auto_key
join project_status s on s.pjs_auto_key = h.pjs_auto_key
left join project_classification l on l.pcf_auto_key = h.pcf_auto_key
where pjm_auto_key = 11
and s.status_type = 'O'
I've taken out the exists conditions as they just seem to be replicating the join conditions.
If you might not have matching data in any of the other tables you can make the other inner joins into outer joins in the same way, but be aware that if you outer join to project_status you will need to move the statatus_type check into the join condition as well, or Oracle will convert that back into an inner join.
Read more about the different kinds of joins.

Joining two tables on a key and then left outer joining a table on a number of criteria

I'm attempting to join 3 tables together in a single query. The first two have a key so each entry has a matching entry. This joined table will then be joined by a third table that could produce multiple entries for each entry from the first table (the joined ones).
select * from
(select a.bidentifier, a.bsession, a.symbol, b.jidentifier, b.JSession
from trade_monthly a, trade_monthly_second b
where
a.bidentifier = b.jidentifier AND
a.bsession = b.JSession)
left outer join
trade c
on c.symbol = a.symbol
order by a.bidentifier, a.bsession, a.symbol, b.jidentifier, b.JSession, c.symbol
There will be more criteria (not just c.symbol = a.symbol) on the left outer join but for now this should be useful. How can I nest the queries this way? I'm gettin gan SQL command not properly ended error.
Any help is appreciated.
Thanks
For what I know every derived table must be given a name; so try something like this:
SELECT * FROM
(SELECT a.bidentifier, ....
...
a.bsession = b.JSession) t
LEFT JOIN trade c
ON c.symbol = t.symbol
ORDER BY t.bidentifier, ...
Anyway I think you could use a simpler query:
SELECT a.bidentifier, a.bsession, a.symbol, b.jidentifier, b.JSession, c.*
FROM trade_monthly a
INNER JOIN trade_monthly_second b
ON a.bidentifier = b.jidentifier
AND a.bsession = b.JSession
LEFT JOIN trade c
ON c.symbol = a.symbol
ORDER BY a.bidentifier, a.bsession, a.symbol, b.jidentifier, b.JSession, c.symbol
Try this:
SELECT
`trade_monthly`.`bidentifier` AS `bidentifier`,
`trade_monthly`.`bsession` AS `bsession`,
`trade_monthly`.`symbol` AS `symbol`,
`trade_monthly_second`.`jidentifier` AS `jidentifier`,
`trade_monthly_second`.`jsession` AS `jsession`
FROM
(
(
`trade_monthly`
JOIN `trade_monthly_second` ON(
(
(
`trade_monthly`.`bidentifier` = `trade_monthly_second`.`jidentifier`
)
AND(
`trade_monthly`.`bsession` = `trade_monthly_second`.`jsession`
)
)
)
)
JOIN `trade` ON(
(
`trade`.`symbol` = `trade_monthly`.`symbol`
)
)
)
ORDER BY
`trade_monthly`.`bidentifier`,
`trade_monthly`.`bsession`,
`trade_monthly`.`symbol`,
`trade_monthly_second`.`jidentifier`,
`trade_monthly_second`.`jsession`,
`trade`.`symbol`
Why don't you just create a view of the two inner joined tables. Then you can build a query that joins this view to the trade table using the left outer join matching criteria.
In my opinion, views are one of the most overlooked solutions to a lot of complex queries.

SQL Server - getting info from two tables

I'm having trouble creating a query which in my mind should be simple.
I have two tables (tblReviews and tblRating). Both these tables have a venueId and a userId.
I want to create a single query that will return the review and the rating using the same venueId and userId. is this possible or should I use two queries?
Thanks in advance
SELECT Rev.column_name, Rat.column_name
FROM dbo.tblReview AS Rev
FULL OUTER JOIN dbo.tblRating AS Rat
ON Rev.VenueId = Rat.VenueId
AND Rev.UserId = Rat.UserId;
If you want all for a specific user:
SELECT Rev.column_name, Rat.column_name
FROM dbo.tblReview AS Rev
FULL OUTER JOIN dbo.tblRating AS Rat
ON Rev.VenueId = Rat.VenueId
AND Rev.UserId = Rat.UserId
WHERE (Rev.UserId = #UserId OR Rat.UserId = #UserId);
If you want all for a specific venue:
SELECT Rev.column_name, Rat.column_name
FROM dbo.tblReview AS Rev
FULL OUTER JOIN dbo.tblRating AS Rat
ON Rev.VenueId = Rat.VenueId
AND Rev.UserId = Rat.UserId
WHERE (Rev.VenueId = #VenueId OR Rat.VenueId = #VenueId);
You can
Join both tables using these fields;
..or (SELECT first) UNION (SELECT second)
SELECT *
FROM tblReviews AS rev INNER JOIN
tblRating AS rat ON rev.venueid = rat.venueid AND rev.userid = rat.userid
This query returns the matching rows from each tables. You can use outer joins (LEFT OUTER JOIN, RIGHT OUTER JOIN, also FULL OUTER JOIN) if you want all records from first, second, or both table.
I don't know if you're asking this.
Try:
SELECT re.venueId, re.userId, re.review, ra.rating
FROM tblReviews re INNER JOIN tblRating ra
ON re.venueId = ra.venueId AND re.userId = ra.userId

Super Slow Query - sped up, but not perfect... Please help

I posted a query yesterday (see here) that was horrible (took over a minute to run, resulting in 18,215 records):
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
ORDER BY EMAIL
With a lot of coaching and research, I've tuned it up to the following:
SELECT contacts.ContactID,
contacts.InstitutionID,
contacts.First,
contacts.Last,
institutionswithzipcodesadditional.CountyID,
institutionswithzipcodesadditional.StateID,
institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN contacts_link_emails ON
contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN institutionswithzipcodesadditional ON
contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
WHERE
(contacts.ContactID IN
(SELECT contacts_2.ContactID
FROM contacts AS contacts_2
INNER JOIN contacts_link_emails AS contacts_link_emails_2 ON
contacts_2.ContactID = contacts_link_emails_2.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions ON
contacts_2.JobTitle = contacts_def_jobfunctions.JobID
RIGHT OUTER JOIN newsletterremovelist ON
contacts_link_emails_2.Email = newsletterremovelist.EmailAddress
WHERE (contacts_def_jobfunctions.ParentJobID <> 1841)
GROUP BY contacts_2.ContactID
UNION
SELECT contacts_1.ContactID
FROM contacts_link_jobfunctions
INNER JOIN contacts_def_jobfunctions AS contacts_def_jobfunctions_1 ON
contacts_link_jobfunctions.JobID = contacts_def_jobfunctions_1.JobID
AND contacts_def_jobfunctions_1.ParentJobID <> 1841
INNER JOIN contacts AS contacts_1 ON
contacts_link_jobfunctions.ContactID = contacts_1.ContactID
INNER JOIN contacts_link_emails AS contacts_link_emails_1 ON
contacts_link_emails_1.ContactID = contacts_1.ContactID
LEFT OUTER JOIN newsletterremovelist AS newsletterremovelist_1 ON
contacts_link_emails_1.Email = newsletterremovelist_1.EmailAddress
GROUP BY contacts_1.ContactID))
While this query is now super fast (about 3 seconds), I've blown part of the logic somewhere - it only returns 14,863 rows (instead of the 18,215 rows that I believe is accurate).
The results seem near correct. I'm working to discover what data might be missing in the result set.
Can you please coach me through whatever I've done wrong here?
Thanks,
Russell Schutte
The main problem with your original query was that you had two extra joins just to introduce duplicates and then a DISTINCT to get rid of them.
Use this:
SELECT cle.Email,
c.ContactID,
c.First AS ContactFirstName,
c.Last AS ContactLastName,
c.InstitutionID,
izip.CountyID,
izip.StateID,
izip.DistrictID
FROM dbo.contacts c
INNER JOIN
dbo.institutionswithzipcodesadditional izip
ON izip.InstitutionID = c.InstitutionID
INNER JOIN
dbo.contacts_link_emails cle
ON cle.ContactID = c.ContactID
WHERE cle.Email NOT IN
(
SELECT EmailAddress
FROM dbo.newsletterremovelist
)
AND EXISTS
(
SELECT NULL
FROM dbo.contacts_def_jobfunctions cdj
WHERE cdj.JobId = c.JobTitle
AND cdj.ParentJobId <> '1841'
UNION ALL
SELECT NULL
FROM dbo.contacts_link_jobfunctions clj
JOIN dbo.contacts_def_jobfunctions cdj
ON cdj.JobID = clj.JobID
WHERE clj.ContactID = c.ContactID
AND cdj.ParentJobId <> '1841'
)
ORDER BY
email
Create the following indexes:
newsletterremovelist (EmailAddress)
contacts_link_jobfunctions (ContactID, JobID)
contacts_def_jobfunctions (JobID)
Do you get the same results when you do:
SELECT count(*)
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
SELECT COUNT(*)
FROM
contacts
INNER JOIN contacts_link_jobfunctions
ON contacts.ContactID = contacts_link_jobfunctions.ContactID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
If so keep adding each join conditon on until you don't get the same results and you will see where your mistake was. If all the joins are the same, then look at the where clauses. But I will be surprised if it isn't in the first join because the syntax you have orginally won't even work on SQL Server and it is pretty nonstandard SQL and may have been incorrect all along but no one knew.
Alternatively, pick a few of the records that are returned in the orginal but not the revised. Track them through the tables one at a time to see if you can find why the second query filters them out.
I'm not directly sure what is wrong, but when I run in to this situation, the first thing I do is start removing variables.
So, comment out the where clause. How many rows are returned?
If you get back the 11,604 rows then you've isolated the problems to the joins. Work though the joins, commenting each one out (remove the associated columns too) and figure out how many rows are eliminated.
As you do this, aim to find what is causing the desired rows to be eliminated. Once isolated, consider the join differences between the first query and the second query.
In looking at the first query, you could probably just modify that to eliminate any INs and instead do a EXISTS instead.
Consider your indexes as well. Any thing in the where or join clauses should probably be indexed.