Getting duplicate rows from a DISTINCT selection - SQL

This is going to look like a mess. I have a request to output data from multiple tables into one spreadsheet. I'm still new at this and have only really used inner joins before. Here's what my SQL looks like:
select distinct o.objectnumber, g.locale, g.locus, g.excavation, g.mapreferencenumber,
case when t.texttypeid like '9' then t.textentry end,
case when f.userfieldid like '25' then f.fieldvalue end
from objects o
inner join TextEntries t on t.id = o.objectid
inner join ObjGeography g on g.ObjectID = o.objectid
inner join userfieldxrefs f on f.id = o.objectid
inner join PackageList pl on o.objectID = pl.ID
inner join Packages p on pl.PackageID = p.PackageID
where p.packageid like '8502'
order by g.mapreferencenumber asc
I know, it's a mess, right? It's giving me the correct output, but it also creates multiple rows with the same data. I've done some googling on this and have seen some remarks about outer joins, but I'm not sure how to apply them correctly to my statement. Any ideas?
Sorry; I got called away and only just now had a chance to check on this. It's my first question, and I guess I didn't really understand what you needed to see in order to understand what I wanted. It looks like the GROUP BY clause will work best. I had seen it while researching the statement but didn't understand how to implement it properly. Thanks, everyone!

It is better to use GROUP BY on the columns you want in the result set:
select o.objectnumber,
g.locale,
g.locus,
g.excavation,
g.mapreferencenumber,
case when t.texttypeid like '9' then t.textentry end,
case when f.userfieldid like '25' then f.fieldvalue end
from objects o
inner join TextEntries t on t.id = o.objectid
inner join ObjGeography g on g.ObjectID = o.objectid
inner join userfieldxrefs f on f.id = o.objectid
inner join PackageList pl on o.objectID = pl.ID
inner join Packages p on pl.PackageID = p.PackageID
where p.packageid like '8502'
group by o.objectnumber,
g.locale,
g.locus,
g.excavation,
g.mapreferencenumber,
case when t.texttypeid like '9' then t.textentry end,
case when f.userfieldid like '25' then f.fieldvalue end
order by g.mapreferencenumber asc
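To see where the repeated rows come from, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table names and rows are hypothetical, cut down to the two joins that can return several rows per object. In this example, GROUP BY over every selected expression returns exactly the same rows as SELECT DISTINCT. The extra rows only disappear when the texttypeid and userfieldid tests move out of the CASE expressions and into the join conditions, which is a different technique from the GROUP BY suggested above.

```python
import sqlite3

# Hypothetical, cut-down stand-ins for objects, TextEntries and userfieldxrefs.
SCHEMA = """
CREATE TABLE objects (objectid INTEGER PRIMARY KEY, objectnumber TEXT);
CREATE TABLE text_entries (id INTEGER, texttypeid INTEGER, textentry TEXT);
CREATE TABLE user_fields (id INTEGER, userfieldid INTEGER, fieldvalue TEXT);
INSERT INTO objects VALUES (1, 'OBJ-1');
-- One object with two text entries and two user fields; only type 9 / field 25 are wanted.
INSERT INTO text_entries VALUES (1, 9, 'wanted text'), (1, 3, 'other text');
INSERT INTO user_fields VALUES (1, 25, 'wanted value'), (1, 7, 'other value');
"""

# The question's pattern: filter inside CASE, then de-duplicate with DISTINCT.
DISTINCT_QUERY = """
SELECT DISTINCT o.objectnumber,
       CASE WHEN t.texttypeid = 9 THEN t.textentry END AS text9,
       CASE WHEN f.userfieldid = 25 THEN f.fieldvalue END AS field25
FROM objects o
JOIN text_entries t ON t.id = o.objectid
JOIN user_fields f ON f.id = o.objectid
"""

# The answer's pattern: GROUP BY on every selected expression.
GROUP_BY_QUERY = """
SELECT o.objectnumber,
       CASE WHEN t.texttypeid = 9 THEN t.textentry END AS text9,
       CASE WHEN f.userfieldid = 25 THEN f.fieldvalue END AS field25
FROM objects o
JOIN text_entries t ON t.id = o.objectid
JOIN user_fields f ON f.id = o.objectid
GROUP BY o.objectnumber,
         CASE WHEN t.texttypeid = 9 THEN t.textentry END,
         CASE WHEN f.userfieldid = 25 THEN f.fieldvalue END
"""

# Alternative: test the type/field in the join conditions, so unwanted
# text entries and user fields never multiply the rows.
FILTERED_JOIN_QUERY = """
SELECT o.objectnumber, t.textentry AS text9, f.fieldvalue AS field25
FROM objects o
JOIN text_entries t ON t.id = o.objectid AND t.texttypeid = 9
JOIN user_fields f ON f.id = o.objectid AND f.userfieldid = 25
"""

def run(query):
    # Fresh in-memory database per run so the queries don't affect each other.
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(query).fetchall()
    con.close()
    return rows

for name, query in [("DISTINCT", DISTINCT_QUERY),
                    ("GROUP BY", GROUP_BY_QUERY),
                    ("filtered join", FILTERED_JOIN_QUERY)]:
    print(name, len(run(query)), "row(s)")
```

With the question's tables, the equivalent change would be something like `inner join TextEntries t on t.id = o.objectid and t.texttypeid like '9'` (and likewise for userfieldxrefs), assuming each object has at most one entry of each kind. Use a left join instead if objects without such an entry should still appear.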

Related

SQL Multiple inner joins with max() for latest recorded entry

Attempting to build SQL with INNER JOINs. The INNER JOINs work OK; now I need to add the MAX() function to limit the rows to just the most recent. Added this INNER JOIN client_diagnosis_record ON SELECT cr.PATID, cr.date_of_diagnosis, cr.most_recent_diagnosis...
Received this SQL error and need some help. I'm sure it's a simple oversight, but my eyes are getting dim from looking at it so long...
Syntax error: [SQLCODE: <-4>:
SQLCODE: <-4>:<A term expected, beginning with one of the following: identifier, constant, aggregate, %ALPHAUP, %EXACT, %MVR, %SQLSTRING, %
[%msg: < The SELECT list of the subquery
SELECT pd.patient_name,
cr.PATID,
cr.date_of_diagnosis,
cr.EPISODE_NUMBER,
ce.diagnosing_clinician_value,
ce.data_entry_user_name,
most_recent_diagnosis
FROM client_diagnosis_record cr
INNER JOIN patient_current_demographics pd ON cr.patid = pd.patid
INNER JOIN client_diagnosis_entry ce ON ce.patid = pd.patid
AND cr.ID = ce.DiagnosisRecord
INNER JOIN client_diagnosis_record ON (SELECT cr.PATID,
cr.date_of_diagnosis,
cr.most_recent_diagnosis
FROM ( SELECT patid,
date_of_diagnosis,
MAX(ID) AS most_recent_diagnosis
FROM client_diagnosis_record) cr
INNER JOIN RADplus_users ru ON ru.staff_member_id = ce.diagnosing_clinician_code
WHERE cr.PATID <> '1'
AND ce.diagnosis_status_value ='Active'
AND (ru.user_description LIKE '%SOA' OR ru.user_description LIKE '%OA')
GROUP BY cr.PATID
I tried to reformat your query, and its syntax is not correct. You could try the query below:
SELECT pd.patient_name,
cr.PATID,
cr.date_of_diagnosis,
cr.EPISODE_NUMBER,
ce.diagnosing_clinician_value,
ce.data_entry_user_name,
most_recent_diagnosis
FROM client_diagnosis_record cr
INNER JOIN (SELECT patid,
date_of_diagnosis,
MAX(ID) AS most_recent_diagnosis
FROM client_diagnosis_record
GROUP BY patid,
date_of_diagnosis) cr2 ON cr.PATID = cr2.PATID
AND cr.date_of_diagnosis = cr2.date_of_diagnosis
AND cr.ID = cr2.most_recent_diagnosis
INNER JOIN patient_current_demographics pd ON cr.patid = pd.patid
INNER JOIN client_diagnosis_entry ce ON ce.patid = pd.patid
AND cr.ID = ce.DiagnosisRecord
INNER JOIN RADplus_users ru ON ru.staff_member_id = ce.diagnosing_clinician_code
WHERE cr.PATID <> '1'
AND ce.diagnosis_status_value ='Active'
AND (ru.user_description LIKE '%SOA' OR ru.user_description LIKE '%OA')
GROUP BY cr.PATID
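The derived-table pattern in this answer can be tried out in isolation. The sketch below uses Python's sqlite3 module and a hypothetical, trimmed diagnosis_record table, and it assumes (as the question does) that the highest ID is the most recent record. It also shows that the GROUP BY inside the derived table decides what "latest" means. Grouping by patid alone keeps one row per patient, while grouping by patid and date_of_diagnosis, as the answer does, keeps one row per patient per diagnosis date.

```python
import sqlite3

def setup():
    # Hypothetical, trimmed stand-in for client_diagnosis_record.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE diagnosis_record (
            id INTEGER PRIMARY KEY,
            patid TEXT,
            date_of_diagnosis TEXT
        );
        INSERT INTO diagnosis_record VALUES
            (1, 'A', '2020-01-01'),
            (2, 'A', '2020-06-01'),
            (3, 'B', '2021-03-01');
    """)
    return con

# Derived table keyed on patid only: one "latest" ID per patient.
PER_PATIENT = """
    SELECT cr.patid, cr.date_of_diagnosis, cr.id
    FROM diagnosis_record cr
    JOIN (SELECT patid, MAX(id) AS most_recent_diagnosis
          FROM diagnosis_record
          GROUP BY patid) cr2
      ON cr.patid = cr2.patid
     AND cr.id = cr2.most_recent_diagnosis
    ORDER BY cr.patid
"""

# Same pattern keyed on (patid, date_of_diagnosis):
# one "latest" ID per patient per diagnosis date.
PER_PATIENT_AND_DATE = """
    SELECT cr.patid, cr.date_of_diagnosis, cr.id
    FROM diagnosis_record cr
    JOIN (SELECT patid, date_of_diagnosis, MAX(id) AS most_recent_diagnosis
          FROM diagnosis_record
          GROUP BY patid, date_of_diagnosis) cr2
      ON cr.patid = cr2.patid
     AND cr.date_of_diagnosis = cr2.date_of_diagnosis
     AND cr.id = cr2.most_recent_diagnosis
    ORDER BY cr.patid, cr.id
"""

con = setup()
print(con.execute(PER_PATIENT).fetchall())
print(con.execute(PER_PATIENT_AND_DATE).fetchall())
```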

Retrieve additional rows if bit flag is true

I have a large stored procedure that is used to return results for a dialog with many selections. I have a new criterion: get "extra" rows if a particular bit column is set to true. The current setup looks like this:
SELECT
CustomerID,
FirstName,
LastName,
...
FROM HumongousQuery hq
LEFT JOIN (
-- New Query Text
) nsq ON hq.CustomerID = nsq.CustomerID
I have the first half of the new query:
SELECT DISTINCT
c.CustomerID,
pp.ProjectID,
ep.ProductID
FROM Customers c
JOIN Evaluations e (NOLOCK)
ON c.CustomerID = e.CustomerID
JOIN EvaluationProducts ep (NOLOCK)
ON e.EvaluationID = ep.EvaluationID
JOIN ProjectProducts pp (NOLOCK)
ON ep.ProductID = pp.ProductID
JOIN Projects p
ON pp.ProjectID = p.ProjectID
WHERE
c.EmployeeID = #EmployeeID
AND e.CurrentStepID = 5
AND p.IsComplete = 0
The Projects table has a bit column, AllowIndirectCustomers, which tells me that this project can use additional customers when the value is true. As far as I can tell, most SQL constructs are geared towards adding columns to the result set, not rows. I tried different permutations of the UNION operator, with no luck. Normally I would turn to a table-valued function, but I haven't been able to make one work in this scenario.
This one has been a stumper for me. Any ideas?
So basically, you're looking to negate the need to match pp.ProjectID = p.ProjectID when the flag is set. You can do that right in the JOIN criteria:
JOIN Projects p
ON pp.ProjectID = p.ProjectID OR p.AllowIndirectCustomers = 1
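A small sketch of what the OR does to the join, using Python's sqlite3 with hypothetical, reduced ProjectProducts and Projects tables. Every project flagged with AllowIndirectCustomers = 1 matches every product row, not just its own, so the flagged project's matches are added on top of the normal ones. Selecting p.project_id makes those extra matches visible. That fan-out is worth keeping in mind on large tables. The DISTINCT in the question's query collapses any exact duplicates it produces.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical, trimmed versions of ProjectProducts and Projects.
con.executescript("""
    CREATE TABLE project_products (project_id INTEGER, product_id TEXT);
    CREATE TABLE projects (project_id INTEGER PRIMARY KEY,
                           allow_indirect_customers INTEGER);
    INSERT INTO project_products VALUES (10, 'P1'), (20, 'P2');
    INSERT INTO projects VALUES (10, 0), (20, 0), (30, 1);
""")

# Plain key match: each product row joins only to its own project.
STRICT = """
    SELECT pp.product_id, p.project_id
    FROM project_products pp
    JOIN projects p ON pp.project_id = p.project_id
    ORDER BY pp.product_id, p.project_id
"""

# With the flag OR'd in, project 30 also joins to every product row.
WITH_FLAG = """
    SELECT pp.product_id, p.project_id
    FROM project_products pp
    JOIN projects p
      ON pp.project_id = p.project_id OR p.allow_indirect_customers = 1
    ORDER BY pp.product_id, p.project_id
"""

print(con.execute(STRICT).fetchall())
print(con.execute(WITH_FLAG).fetchall())
```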
Depending on the complexity of your tables, this might not work out easily, but you could use a CASE expression on your bit column. Something like this:
select table1.id, table1.value,
case table1.flag
when 1 then
table2.value
else null
end as secondvalue
from table1
left join table2 on table1.id = table2.id
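Here is that query run against tiny, made-up table1/table2 data with Python's sqlite3 module. The table names are the answer's placeholders. A row whose flag is 0 keeps its own columns but gets NULL for the joined value, and so does a row with no match in table2. Note that this approach adds a column rather than extra rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical sample data for the answer's placeholder tables.
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, value TEXT, flag INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO table1 VALUES (1, 'a', 1), (2, 'b', 0), (3, 'c', 1);
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y');
""")

QUERY = """
    SELECT table1.id, table1.value,
           CASE table1.flag
               WHEN 1 THEN table2.value
               ELSE NULL
           END AS secondvalue
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    ORDER BY table1.id
"""

# Row 2 gets NULL because its flag is 0; row 3 because table2 has no id 3.
for row in con.execute(QUERY):
    print(row)
```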

How to improve the performance of a SQL query even after adding indexes?

I am trying to execute the following SQL query, but it takes 22 seconds to run. The number of returned rows is 554,192. I need to make this faster, and I have already added indexes to all the tables involved.
SELECT mc.name AS MediaName,
lcc.name AS Country,
i.overridedate AS Date,
oi.rating,
bl1.firstname + ' ' + bl1.surname AS Byline,
b.id BatchNo,
i.numinbatch ItemNumberInBatch,
bah.changedatutc AS BatchDate,
pri.code AS IssueNo,
pri.name AS Issue,
lm.neptunemessageid AS MessageNo,
lmt.name AS MessageType,
bl2.firstname + ' ' + bl2.surname AS SourceFullName,
lst.name AS SourceTypeDesc
FROM profiles P
INNER JOIN profileresults PR
ON P.id = PR.profileid
INNER JOIN items i
ON PR.itemid = I.id
INNER JOIN batches b
ON b.id = i.batchid
INNER JOIN itemorganisations oi
ON i.id = oi.itemid
INNER JOIN lookup_mediachannels mc
ON i.mediachannelid = mc.id
LEFT OUTER JOIN lookup_cities lc
ON lc.id = mc.cityid
LEFT OUTER JOIN lookup_countries lcc
ON lcc.id = mc.countryid
LEFT OUTER JOIN itembylines ib
ON ib.itemid = i.id
LEFT OUTER JOIN bylines bl1
ON bl1.id = ib.bylineid
LEFT OUTER JOIN batchactionhistory bah
ON b.id = bah.batchid
INNER JOIN itemorganisationissues ioi
ON ioi.itemorganisationid = oi.id
INNER JOIN projectissues pri
ON pri.id = ioi.issueid
LEFT OUTER JOIN itemorganisationmessages iom
ON iom.itemorganisationid = oi.id
LEFT OUTER JOIN lookup_messages lm
ON iom.messageid = lm.id
LEFT OUTER JOIN lookup_messagetypes lmt
ON lmt.id = lm.messagetypeid
LEFT OUTER JOIN itemorganisationsources ios
ON ios.itemorganisationid = oi.id
LEFT OUTER JOIN bylines bl2
ON bl2.id = ios.bylineid
LEFT OUTER JOIN lookup_sourcetypes lst
ON lst.id = ios.sourcetypeid
WHERE p.id = #profileID
AND b.statusid IN ( 6, 7 )
AND bah.batchactionid = 6
AND i.statusid = 2
AND i.isrelevant = 1
When I look at the execution plan, I can see one step that accounts for 42% of the cost. Is there any way I could bring that down, or any way to improve the performance of the whole query?
Remove the profiles table, since it is not needed, and change the WHERE clause to:
WHERE PR.profileid = #profileID
You have a left outer join on the batchactionhistory table, but you also have a condition on it in your WHERE clause, which turns it back into an inner join. Move the condition into the join (and remove AND bah.batchactionid = 6 from the WHERE clause):
LEFT OUTER JOIN batchactionhistory bah
ON b.id = bah.batchid
AND bah.batchactionid = 6
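The difference is easy to check with a toy example, using Python's sqlite3 and hypothetical two-table data. With the test in WHERE, batches that have no action 6 disappear, exactly as with an inner join. With the test in ON, those batches stay and their bah columns are NULL. If the intent really is to keep only batches with action 6, the WHERE version (or a plain INNER JOIN) is the correct one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical minimal versions of batches and batchactionhistory.
con.executescript("""
    CREATE TABLE batches (id INTEGER PRIMARY KEY);
    CREATE TABLE batchactionhistory (batchid INTEGER, batchactionid INTEGER);
    INSERT INTO batches VALUES (1), (2);
    INSERT INTO batchactionhistory VALUES (1, 6), (2, 3);
""")

# Filter in WHERE: rows where bah is NULL or has another action are discarded,
# so the LEFT JOIN behaves like an INNER JOIN.
IN_WHERE = """
    SELECT b.id, bah.batchactionid
    FROM batches b
    LEFT OUTER JOIN batchactionhistory bah ON b.id = bah.batchid
    WHERE bah.batchactionid = 6
    ORDER BY b.id
"""

# Filter in ON: every batch is kept; only matching history rows are attached.
IN_ON = """
    SELECT b.id, bah.batchactionid
    FROM batches b
    LEFT OUTER JOIN batchactionhistory bah
      ON b.id = bah.batchid AND bah.batchactionid = 6
    ORDER BY b.id
"""

print(con.execute(IN_WHERE).fetchall())
print(con.execute(IN_ON).fetchall())
```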
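You don't need the batches table: it is only used to join other tables that could be joined directly, and to show the id in your SELECT, which is also available from other tables. (The WHERE clause also filters on b.statusid, though, so keep the join if that filter is still needed.) Make the following changes: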
i.batchid AS BatchNo,
LEFT OUTER JOIN batchactionhistory bah
ON i.batchid = bah.batchid
Are any of the columns used in joins or in the WHERE clause from tables that contain large amounts of data but are not indexed? If so, try adding indexes one at a time, starting with the largest table.
Do you need every column in the result? If you could lose one or two, you might be able to reduce the number of tables further.
First, if this is not a stored procedure, make it one. That's a lot of text for SQL Server to compile.
Next, my experience is that "worst practices" are occasionally a good idea. Specifically, I have been able to improve performance by splitting large queries into two or three smaller ones and assembling the results.
If this query is called from a .NET, ColdFusion, Java or similar application, you might be able to do the split and reassembly in your application code. If not, a temporary table might come in handy.
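A minimal sketch of the temporary-table approach, using Python's sqlite3 and hypothetical cut-down items and lookup_mediachannels tables. The selective filters run first into a temp table, and the lookup joins then run against that smaller set. Whether this is actually faster depends on the data and the optimizer, so the sketch only shows the mechanics. On SQL Server the equivalent would be a #temp table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical, trimmed versions of items and lookup_mediachannels.
con.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, statusid INTEGER,
                        isrelevant INTEGER, mediachannelid INTEGER);
    CREATE TABLE lookup_mediachannels (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO items VALUES (1, 2, 1, 100), (2, 2, 0, 100),
                             (3, 5, 1, 200), (4, 2, 1, 200);
    INSERT INTO lookup_mediachannels VALUES (100, 'Daily Paper'), (200, 'Radio One');
""")

# Step 1: apply the selective filters once and stage the surviving rows.
con.execute("""
    CREATE TEMP TABLE filtered_items AS
    SELECT id, mediachannelid
    FROM items
    WHERE statusid = 2 AND isrelevant = 1
""")

# Step 2: join the descriptive lookup tables only to the staged rows.
rows = con.execute("""
    SELECT fi.id, mc.name
    FROM filtered_items fi
    JOIN lookup_mediachannels mc ON mc.id = fi.mediachannelid
    ORDER BY fi.id
""").fetchall()
print(rows)
```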

Super Slow Query - sped up, but not perfect... Please help

I posted a query yesterday that was horrible (it took over a minute to run and returned 18,215 records):
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
ORDER BY EMAIL
With a lot of coaching and research, I've tuned it up to the following:
SELECT contacts.ContactID,
contacts.InstitutionID,
contacts.First,
contacts.Last,
institutionswithzipcodesadditional.CountyID,
institutionswithzipcodesadditional.StateID,
institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN contacts_link_emails ON
contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN institutionswithzipcodesadditional ON
contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
WHERE
(contacts.ContactID IN
(SELECT contacts_2.ContactID
FROM contacts AS contacts_2
INNER JOIN contacts_link_emails AS contacts_link_emails_2 ON
contacts_2.ContactID = contacts_link_emails_2.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions ON
contacts_2.JobTitle = contacts_def_jobfunctions.JobID
RIGHT OUTER JOIN newsletterremovelist ON
contacts_link_emails_2.Email = newsletterremovelist.EmailAddress
WHERE (contacts_def_jobfunctions.ParentJobID <> 1841)
GROUP BY contacts_2.ContactID
UNION
SELECT contacts_1.ContactID
FROM contacts_link_jobfunctions
INNER JOIN contacts_def_jobfunctions AS contacts_def_jobfunctions_1 ON
contacts_link_jobfunctions.JobID = contacts_def_jobfunctions_1.JobID
AND contacts_def_jobfunctions_1.ParentJobID <> 1841
INNER JOIN contacts AS contacts_1 ON
contacts_link_jobfunctions.ContactID = contacts_1.ContactID
INNER JOIN contacts_link_emails AS contacts_link_emails_1 ON
contacts_link_emails_1.ContactID = contacts_1.ContactID
LEFT OUTER JOIN newsletterremovelist AS newsletterremovelist_1 ON
contacts_link_emails_1.Email = newsletterremovelist_1.EmailAddress
GROUP BY contacts_1.ContactID))
While this query is now super fast (about 3 seconds), I've blown part of the logic somewhere: it only returns 14,863 rows, instead of the 18,215 rows that I believe are accurate.
The results seem nearly correct. I'm working to discover what data might be missing from the result set.
Can you please coach me through whatever I've done wrong here?
Thanks,
Russell Schutte
The main problem with your original query was that you had two extra joins that only introduced duplicates, and then a DISTINCT to get rid of them.
Use this:
SELECT cle.Email,
c.ContactID,
c.First AS ContactFirstName,
c.Last AS ContactLastName,
c.InstitutionID,
izip.CountyID,
izip.StateID,
izip.DistrictID
FROM dbo.contacts c
INNER JOIN
dbo.institutionswithzipcodesadditional izip
ON izip.InstitutionID = c.InstitutionID
INNER JOIN
dbo.contacts_link_emails cle
ON cle.ContactID = c.ContactID
WHERE cle.Email NOT IN
(
SELECT EmailAddress
FROM dbo.newsletterremovelist
)
AND EXISTS
(
SELECT NULL
FROM dbo.contacts_def_jobfunctions cdj
WHERE cdj.JobId = c.JobTitle
AND cdj.ParentJobId <> '1841'
UNION ALL
SELECT NULL
FROM dbo.contacts_link_jobfunctions clj
JOIN dbo.contacts_def_jobfunctions cdj
ON cdj.JobID = clj.JobID
WHERE clj.ContactID = c.ContactID
AND cdj.ParentJobId <> '1841'
)
ORDER BY
email
Create the following indexes:
newsletterremovelist (EmailAddress)
contacts_link_jobfunctions (ContactID, JobID)
contacts_def_jobfunctions (JobID)
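The index list above is shorthand. As a hedged sketch, here are those indexes created in SQLite (via Python's sqlite3) on empty, hypothetical copies of the tables, with a check that the planner picks them up for equality lookups. The index names are made up. On SQL Server you would check the actual execution plan instead of EXPLAIN QUERY PLAN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE newsletterremovelist (EmailAddress TEXT);
    CREATE TABLE contacts_link_jobfunctions (ContactID INTEGER, JobID INTEGER);
    CREATE TABLE contacts_def_jobfunctions (JobID INTEGER, ParentJobID INTEGER);
    -- The three suggested indexes (index names are hypothetical).
    CREATE INDEX ix_newsletterremovelist_email ON newsletterremovelist (EmailAddress);
    CREATE INDEX ix_clj_contact_job ON contacts_link_jobfunctions (ContactID, JobID);
    CREATE INDEX ix_cdj_job ON contacts_def_jobfunctions (JobID);
""")

def plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]

print(plan("SELECT 1 FROM newsletterremovelist WHERE EmailAddress = ?",
           ("a@example.com",)))
print(plan("SELECT JobID FROM contacts_link_jobfunctions WHERE ContactID = ?", (42,)))
```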
Do you get the same results when you do:
SELECT count(*)
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle;
SELECT COUNT(*)
FROM
contacts
INNER JOIN contacts_link_jobfunctions
ON contacts.ContactID = contacts_link_jobfunctions.ContactID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
If so, keep adding each join condition until you don't get the same results, and you will see where your mistake was. If all the joins match, then look at the WHERE clauses. But I will be surprised if the problem isn't in the first join, because the nested join syntax you used originally is unusual and hard to read, and it may have been incorrect all along without anyone noticing.
Alternatively, pick a few of the records that are returned by the original query but not by the revised one. Track them through the tables one at a time to see if you can find why the second query filters them out.
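Finding those records can be automated with EXCEPT, which returns the rows from the first query that the second one does not produce. SQL Server supports EXCEPT as well. The sketch below uses Python's sqlite3 and two hypothetical stand-in queries. With the real queries, you would compare on a column both return, such as ContactID, since the revised query no longer selects Email.

```python
import sqlite3

def missing_rows(con, original_sql, revised_sql):
    # Rows produced by the original query but not by the revised one.
    # Both queries must return the same number of columns, in the same order;
    # EXCEPT also removes duplicate rows from the result.
    sql = f"SELECT * FROM ({original_sql}) EXCEPT SELECT * FROM ({revised_sql})"
    return con.execute(sql).fetchall()

con = sqlite3.connect(":memory:")
# Hypothetical sample data.
con.executescript("""
    CREATE TABLE contacts (ContactID INTEGER PRIMARY KEY, JobTitle INTEGER);
    CREATE TABLE contacts_link_emails (ContactID INTEGER, Email TEXT);
    INSERT INTO contacts VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO contacts_link_emails VALUES (1, 'a@example.com'), (2, 'b@example.com');
""")

# Hypothetical stand-ins: the "revised" query silently requires an email row.
ORIGINAL = "SELECT ContactID FROM contacts"
REVISED = """SELECT c.ContactID FROM contacts c
             JOIN contacts_link_emails e ON e.ContactID = c.ContactID"""

print(missing_rows(con, ORIGINAL, REVISED))
```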
I'm not sure exactly what is wrong, but when I run into this situation, the first thing I do is start removing variables.
So, comment out the WHERE clause. How many rows are returned?
If you get back the 11,604 rows, then you've isolated the problem to the joins. Work through the joins, commenting each one out (remove the associated columns too), and figure out how many rows are eliminated.
As you do this, aim to find what is causing the desired rows to be eliminated. Once isolated, consider the join differences between the first query and the second query.
Looking at the first query, you could probably just modify it to replace the INs with EXISTS.
Consider your indexes as well. Anything in the WHERE or JOIN clauses should probably be indexed.
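One concrete reason to prefer EXISTS over IN here, shown with Python's sqlite3 and made-up rows: if the subquery behind a NOT IN ever returns a NULL (for example, a NULL EmailAddress in newsletterremovelist), NOT IN returns no rows at all. The equivalent NOT EXISTS still behaves as intended. The table contents below are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical sample data.
con.executescript("""
    CREATE TABLE contacts_link_emails (ContactID INTEGER, Email TEXT);
    CREATE TABLE newsletterremovelist (EmailAddress TEXT);
    INSERT INTO contacts_link_emails VALUES (1, 'a@example.com'), (2, 'b@example.com');
    -- One removed address plus one NULL row.
    INSERT INTO newsletterremovelist VALUES ('b@example.com'), (NULL);
""")

NOT_IN = """
    SELECT ContactID FROM contacts_link_emails
    WHERE Email NOT IN (SELECT EmailAddress FROM newsletterremovelist)
"""

NOT_EXISTS = """
    SELECT cle.ContactID FROM contacts_link_emails cle
    WHERE NOT EXISTS (SELECT 1 FROM newsletterremovelist nrl
                      WHERE nrl.EmailAddress = cle.Email)
"""

# 'a@example.com' NOT IN ('b@example.com', NULL) evaluates to NULL, not TRUE,
# so the NOT IN query filters out every row.
print(con.execute(NOT_IN).fetchall())      # []
print(con.execute(NOT_EXISTS).fetchall())  # [(1,)]
```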

How do I use rows-as-fields in a SQL database

I've got an SQL-related question about a general database structure that seems to be fairly common. I came up with it one day while trying to solve a problem, and later on I've seen other people do the same thing (or something remarkably similar), so I think the structure itself makes sense. I just have trouble forming certain queries against it.
The idea is that you've got a table with "items" in it, and you want to store a set of fields and their values for each item. Normally this would be done by simply adding columns to the items table; the problem is that the fields themselves (not just the values) vary from item to item. For example, I might have two items:
Article 1
product_id = aproductid
hidden_key = ahiddenkeyvalue
Article 2
product_id = anotherproductid
address = anaddress
You can see that both items have a product_id field (with different values) but the data stored for each item is different.
The structure in the database ends up something like this:
ItemsTable
id
itemdata_1
itemdata_2
...
FieldsTable
id
name
...
And the table that relates them and makes it work
FieldsItemRelationsTable
field_id
item_id
value
When I'm doing something that involves just one "dynamic field" value, there's no problem. I usually do something similar to:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
WHERE v.value = 50 AND f.name = 'product_id';
This selects all items where product_id = 50.
The problem arises when I need to do something involving multiple "dynamic field" values. Say I want to select all items where product_id = 50 AND hidden_key = 30. Is it possible with a single SQL statement? I've tried:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
WHERE (v.value = 50 AND f.name = 'product_id')
AND (v.value = 30 AND f.name = 'hidden_key');
But it just returns zero rows.
You'll need to do a separate join for each value you are bringing back...
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
INNER JOIN FieldsItemRelationsTable v2 ON v2.item_id = i.id
INNER JOIN FieldsTable f2 ON f2.id = v2.field_id
WHERE v.value = 50 AND f.name = 'product_id'
AND (v2.value = 30 AND f2.name = 'hidden_key');
Er... that query might not work as-is (a bit of a copy/paste sludge job on my part), but you get the idea: you need the second value held in a second instance of the table(s) (v2 and f2 in my example here) that is separate from the first instance, so that v.value = 50 and v2.value = 30. A condition like v.value = 50 AND v.value = 30 can never return rows, because a single value can't equal 50 and 30 at the same time.
As an afterthought, the query will probably read more easily if you put the WHERE conditions in the JOIN clauses:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id and v.value = 50
INNER JOIN FieldsTable f ON f.id = v.field_id and f.name = 'product_id'
INNER JOIN FieldsItemRelationsTable v2 ON v2.item_id = i.id and v2.value = 30
INNER JOIN FieldsTable f2 ON f2.id = v2.field_id and f2.name = 'hidden_key'
Functionally, both queries should behave the same, though. I'm not sure whether there's a practical limit on the number of joins. In scheduling systems you'll often see a setup for 'exceptions', and I've got a report query that joins like this 28 times, one for each exception type returned.
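Here is the join-per-field idea run end to end with Python's sqlite3, using the question's table names and a few made-up rows; values are stored as text, so the literals are quoted. The second query is a different, commonly used alternative, not part of the answer: filter on any of the wanted field/value pairs, group by item, and keep the items that matched all of them. It assumes each item has at most one value per field.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical sample data in the question's table layout.
con.executescript("""
    CREATE TABLE ItemsTable (id INTEGER PRIMARY KEY);
    CREATE TABLE FieldsTable (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE FieldsItemRelationsTable (field_id INTEGER, item_id INTEGER, value TEXT);
    INSERT INTO ItemsTable VALUES (1), (2), (3);
    INSERT INTO FieldsTable VALUES (1, 'product_id'), (2, 'hidden_key'), (3, 'address');
    INSERT INTO FieldsItemRelationsTable VALUES
        (1, 1, '50'), (2, 1, '30'),          -- item 1: product_id=50, hidden_key=30
        (1, 2, '50'), (3, 2, 'somewhere'),   -- item 2: no hidden_key
        (1, 3, '60'), (2, 3, '30');          -- item 3: product_id is 60
""")

# One pair of joins per dynamic field, as in the answer.
SELF_JOIN = """
    SELECT i.id FROM ItemsTable i
    INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id AND v.value = '50'
    INNER JOIN FieldsTable f ON f.id = v.field_id AND f.name = 'product_id'
    INNER JOIN FieldsItemRelationsTable v2 ON v2.item_id = i.id AND v2.value = '30'
    INNER JOIN FieldsTable f2 ON f2.id = v2.field_id AND f2.name = 'hidden_key'
    ORDER BY i.id
"""

# Alternative: match any wanted pair, then require all of them per item.
HAVING_VERSION = """
    SELECT v.item_id
    FROM FieldsItemRelationsTable v
    JOIN FieldsTable f ON f.id = v.field_id
    WHERE (f.name = 'product_id' AND v.value = '50')
       OR (f.name = 'hidden_key' AND v.value = '30')
    GROUP BY v.item_id
    HAVING COUNT(*) = 2   -- number of conditions that must all match
    ORDER BY v.item_id
"""

print(con.execute(SELF_JOIN).fetchall())
print(con.execute(HAVING_VERSION).fetchall())
```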
It's called EAV (entity-attribute-value).
Some people hate it, and there are alternatives.
Sorry to be vague, but I would investigate your options more.
Try doing some left or right joins to see if you get any results. Inner joins won't return a row when the join column is NULL or has no match on the other side.
It's a start.
Don't forget, though: a join without a proper join condition turns into a Cartesian product.