Deadlock - Lock of a column whereas there is no data - sql

I have a deadlock when I execute this stored procedure :
-- Delete transactions
delete from ADVICESEQUENCETRANSACTION
where ADVICESEQUENCETRANSACTION.id in (
select TR.id from ADVICESEQUENCETRANSACTION TR
inner join ACCOUNTDESCRIPTIONITEM IT on TR.ACCOUNTDESCRIPTIONITEMID = IT.id
inner join ACCOUNTDESCRIPTION ACC on IT.ACCOUNTDESCRIPTIONID = ACC.id
inner join RECOMMENDATIONDESCRIPTION RD on ACC.RECOMMENDATIONDESCRIPTIONID = RD.id
inner join RECOMMENDATION REC on REC.id = RD.RECOMMENDATIONID
inner join ADVICESEQUENCE ADV on ADV.id = REC.ADVICESEQUENCEID
where adv.Id = #AdviceSequenceId AND (#RecommendationState is NULL OR #RecommendationState=REC.[State])
);
Here is the schema of the table :
Here is the deadlock graph :
you can see the detail of the deadlock graph here
So, when I retrieve the associatedobjid of the ressource node, I identify that it's the primary key and an index of the table AdviceSequenceTransaction :
SELECT OBJECT_SCHEMA_NAME([object_id]), * ,
OBJECT_NAME([object_id])
FROM sys.partitions
WHERE partition_id = 72057595553120256 OR partition_id = 72057595553316864;
SELECT name FROM sys.indexes WHERE object_id = 31339176 and (index_id = 1 or index_id = 4)
PK_AdviceSequenceTransaction
IX_ADVICESEQUENCEID_ADVICE
As there is a relation on the table AdviceSequenceTransaction on the key ParentTransactionId and the key Primary key, I have created an index on the column ParentTransactionId.
And I have no more Deadlock. But the problem is I don't know exactly why there is no more deadlock :-/
Moreover, on the set of data to test it, there is no data in ParentTransactionId. All are NULL.
So, Even is there no data (null) in the ParentTransactionId, is there an access to the Primary key by SQL Server ???
An other thing is that I want to remove a join in the delete statement :
delete from ADVICESEQUENCETRANSACTION
where ADVICESEQUENCETRANSACTION.id in (
select TR.id from ADVICESEQUENCETRANSACTION TR
inner join ACCOUNTDESCRIPTIONITEM IT on TR.ACCOUNTDESCRIPTIONITEMID = IT.id
inner join ACCOUNTDESCRIPTION ACC on IT.ACCOUNTDESCRIPTIONID = ACC.id
inner join RECOMMENDATIONDESCRIPTION RD on ACC.RECOMMENDATIONDESCRIPTIONID = RD.id
inner join RECOMMENDATION REC on REC.id = RD.RECOMMENDATIONID
inner join ADVICESEQUENCE ADV on ADV.id = REC.ADVICESEQUENCEID
where adv.Id = #AdviceSequenceId AND (#RecommendationState is NULL OR #RecommendationState=REC.[State])
);
into :
delete from ADVICESEQUENCETRANSACTION
where ADVICESEQUENCETRANSACTION.id in (
select TR.id from ADVICESEQUENCETRANSACTION TR
inner join ACCOUNTDESCRIPTIONITEM IT on TR.ACCOUNTDESCRIPTIONITEMID = IT.id
inner join ACCOUNTDESCRIPTION ACC on IT.ACCOUNTDESCRIPTIONID = ACC.id
inner join RECOMMENDATIONDESCRIPTION RD on ACC.RECOMMENDATIONDESCRIPTIONID = RD.id
inner join RECOMMENDATION REC on REC.id = RD.RECOMMENDATIONID
where TR.AdviceSequenceId = #AdviceSequenceId AND (#RecommendationState is NULL OR #RecommendationState=REC.[State])
);
I removed the last join. But if I do this, I have again the deadlock ! And here again, I don't know why...
Thank you for your enlightment :)

Using a complex, compound join in your WHERE clause can often cause problems. SQL Server processes the clauses using the following Logical Processing Order (view here):
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Using views, or derived tables or views greatly reduces the number of iterative (TABLE) scans required to obtain the desired result because the emphasis in your query is better aligned to use the logical execution order. The FROM clause for each of the derived tables (or views) is executed first, limiting the result set passed to the ON cause, then the JOIN clause and so on, because you're passing your parameters to an "inside FROM" rather than the "outermost WHERE".
So your code could look something like this:
delete from (SELECT ADVICESEQUENCETRANSACTION
FROM (SELECT tr.id
FROM ADVICESEQUENCETRANSACTION WITH NOLOCK
WHERE AdviceSequenceId = #AdviceSequenceId
)TR
INNER JOIN (SELECT [NEXT COLUMN]
FROM [NEXT TABLE] WITH NOLOCK
WHERE COLUMN = [PARAM]
)B
ON TR.COL = B.COL
)ALIAS
WHERE [COLUMN] = COL.PARAM
);
and so on...
(I know the code isn't cut and paste usable, but it should serve to convey the general idea)
In this way, you're passing your parameters to "inner queries" first, pre-loading your limited result set (particularly if you should use views), and then working outward. Using Locking Hints everywhere appropriate will also help to prevent some of the problems you might otherwise encounter. This technique can also help make the execution plan more effective in helping you diagnose where your blocks are coming from, should you still have any.

one approach is to use WITH (nolock) on table ADVICESEQUENCETRANSACTION TR if you dont mind dirty reads.

Related

Speed up SQL query performance with nested queries

Could anyone help me speed this query up? It currently take 17 minutes to run but does return the correct data and it populates a subform in MS Access. Functions in the rest of the VBA are declared as long to try to speed up more.
Here's the full query:
SELECT lots of things
FROM (((((((((((((((ngstest
INNER JOIN patients
ON ngstest.internalpatientid = patients.internalpatientid)
INNER JOIN referral
ON ngstest.referralid = referral.referralid)
INNER JOIN checker
ON ngstest.bookby = checker.check1id)
INNER JOIN ngspanel
ON ngstest.ngspanelid = ngspanel.ngspanelid)
LEFT JOIN ngspanel AS ngspanel_1
ON ngstest.ngspanelid_b = ngspanel_1.ngspanelid)
INNER JOIN status
ON ngstest.statusid = status.statusid)
INNER JOIN dbo_patient_table
ON patients.patientid = dbo_patient_table.patienttrustid)
LEFT JOIN dna
ON ngstest.dna = dna.dnanumber)
INNER JOIN status AS status_1
ON patients.s_statusoverall = status_1.statusid)
LEFT JOIN gw_gendertable
ON dbo_patient_table.genderid = gw_gendertable.genderid)
LEFT JOIN ngswesbatch
ON ngstest.wesbatch = ngswesbatch.ngswesbatchid)
LEFT JOIN checker AS checker_1
ON ngstest.check1id = checker_1.check1id)
LEFT JOIN checker AS checker_2
ON ngstest.check2id = checker_2.check1id)
LEFT JOIN checker AS checker_3
ON ngstest.check3id = checker_3.check1id)
LEFT JOIN ngspanel AS ngspanel_2
ON ngstest.ngspanelid_c = ngspanel_2.ngspanelid)
LEFT JOIN checker AS checker_4
ON ngstest.check4id = checker_4.check1id
WHERE ((ngstest.referralid IN
(SELECT referralid FROM referral
WHERE grouptypeid = 14)
AND ngstest.ngstestid IN
(SELECT ngstest.ngstestid
FROM ngsanalysis
INNER JOIN ngstest
ON ngsanalysis.ngstestid = ngstest.ngstestid
WHERE ngsanalysis.pedigree = 3302) )
AND status.statusid = 1202218800)
ORDER BY ngstest.priority,
ngstest.daterequested;
The two nested queries are strings from elsewhere in the code so are called in the vba as " & includereferralls & " And " & ParentsStatusesFilter & "
They are:
ParentsStatusesFilter = "NGSTest.NGSTestID in
(SELECT NGSTest.NGSTestID
FROM NGSAnalysis
INNER JOIN NGSTest
ON NGSAnalysis.NGSTestID = NGSTest.NGSTestID
WHERE NGSAnalysis.Pedigree IN (3302,3303,3304)"
And
includereferrals = "NGSTest.ReferralID
(SELECT referralid FROM referral WHERE referral.grouptypeid = 14)"
The query needs to remain readable (and therefore editable) so can't use things like Distinct, Group By or contain any Unions. Have tried Exists instead of In for the nested queries but that stops it from actually filtering the results.
WHERE EXISTS (SELECT NGSTest.NGSTestID
FROM NGSAnalysis
INNER JOIN NGSTest
ON NGSAnalysis.NGSTestID = NGSTest.NGSTestID
WHERE NGSAnalysis.Pedigree IN (3302,3303,3304)
So the exist clause you have there isn't tied to the outer query which would run similar to just added 1 = 1 to the where clause. I took your where clause and converted it. It should look something like this...
WHERE EXISTS (
SELECT referralid
FROM referral
WHERE grouptypeid = 14 AND ngstest.referralid = referral.referralid)
AND EXISTS (
SELECT ngsanalysis.ngstestid
FROM ngsanalysis
WHERE ngsanalysis.pedigree IN (3302,3303,3304) AND ngstest.ngstestid = ngsanalysis.ngstestid
)
AND status.statusid = 1202218800
Adding exists will speed it up a bit, but the the bulk of the slowness is the left joins. Access does not handle the left joins as well as SQL Server does. Change all your joins to inner joins and you will see the query runs very fast. This is obviously not ideal since some relationships are optional. What I have done to get around this is add a default record that replaces a null relationship.
Here is what that looks like for you: In the checker table you could add a record that represents a null value. So put a record into the checker table with check1id of -1 or 0. Then default check1id, check2id, check3id on ngstest to -1 or 0. You will need to do that type of thing for all tables you need to left join on.

Super Slow Query - sped up, but not perfect... Please help

I posted a query yesterday (see here) that was horrible (took over a minute to run, resulting in 18,215 records):
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
ORDER BY EMAIL
With a lot of coaching and research, I've tuned it up to the following:
SELECT contacts.ContactID,
contacts.InstitutionID,
contacts.First,
contacts.Last,
institutionswithzipcodesadditional.CountyID,
institutionswithzipcodesadditional.StateID,
institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN contacts_link_emails ON
contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN institutionswithzipcodesadditional ON
contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
WHERE
(contacts.ContactID IN
(SELECT contacts_2.ContactID
FROM contacts AS contacts_2
INNER JOIN contacts_link_emails AS contacts_link_emails_2 ON
contacts_2.ContactID = contacts_link_emails_2.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions ON
contacts_2.JobTitle = contacts_def_jobfunctions.JobID
RIGHT OUTER JOIN newsletterremovelist ON
contacts_link_emails_2.Email = newsletterremovelist.EmailAddress
WHERE (contacts_def_jobfunctions.ParentJobID <> 1841)
GROUP BY contacts_2.ContactID
UNION
SELECT contacts_1.ContactID
FROM contacts_link_jobfunctions
INNER JOIN contacts_def_jobfunctions AS contacts_def_jobfunctions_1 ON
contacts_link_jobfunctions.JobID = contacts_def_jobfunctions_1.JobID
AND contacts_def_jobfunctions_1.ParentJobID <> 1841
INNER JOIN contacts AS contacts_1 ON
contacts_link_jobfunctions.ContactID = contacts_1.ContactID
INNER JOIN contacts_link_emails AS contacts_link_emails_1 ON
contacts_link_emails_1.ContactID = contacts_1.ContactID
LEFT OUTER JOIN newsletterremovelist AS newsletterremovelist_1 ON
contacts_link_emails_1.Email = newsletterremovelist_1.EmailAddress
GROUP BY contacts_1.ContactID))
While this query is now super fast (about 3 seconds), I've blown part of the logic somewhere - it only returns 14,863 rows (instead of the 18,215 rows that I believe is accurate).
The results seem near correct. I'm working to discover what data might be missing in the result set.
Can you please coach me through whatever I've done wrong here?
Thanks,
Russell Schutte
The main problem with your original query was that you had two extra joins just to introduce duplicates and then a DISTINCT to get rid of them.
Use this:
SELECT cle.Email,
c.ContactID,
c.First AS ContactFirstName,
c.Last AS ContactLastName,
c.InstitutionID,
izip.CountyID,
izip.StateID,
izip.DistrictID
FROM dbo.contacts c
INNER JOIN
dbo.institutionswithzipcodesadditional izip
ON izip.InstitutionID = c.InstitutionID
INNER JOIN
dbo.contacts_link_emails cle
ON cle.ContactID = c.ContactID
WHERE cle.Email NOT IN
(
SELECT EmailAddress
FROM dbo.newsletterremovelist
)
AND EXISTS
(
SELECT NULL
FROM dbo.contacts_def_jobfunctions cdj
WHERE cdj.JobId = c.JobTitle
AND cdj.ParentJobId <> '1841'
UNION ALL
SELECT NULL
FROM dbo.contacts_link_jobfunctions clj
JOIN dbo.contacts_def_jobfunctions cdj
ON cdj.JobID = clj.JobID
WHERE clj.ContactID = c.ContactID
AND cdj.ParentJobId <> '1841'
)
ORDER BY
email
Create the following indexes:
newsletterremovelist (EmailAddress)
contacts_link_jobfunctions (ContactID, JobID)
contacts_def_jobfunctions (JobID)
Do you get the same results when you do:
SELECT count(*)
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
SELECT COUNT(*)
FROM
contacts
INNER JOIN contacts_link_jobfunctions
ON contacts.ContactID = contacts_link_jobfunctions.ContactID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
If so keep adding each join conditon on until you don't get the same results and you will see where your mistake was. If all the joins are the same, then look at the where clauses. But I will be surprised if it isn't in the first join because the syntax you have orginally won't even work on SQL Server and it is pretty nonstandard SQL and may have been incorrect all along but no one knew.
Alternatively, pick a few of the records that are returned in the orginal but not the revised. Track them through the tables one at a time to see if you can find why the second query filters them out.
I'm not directly sure what is wrong, but when I run in to this situation, the first thing I do is start removing variables.
So, comment out the where clause. How many rows are returned?
If you get back the 11,604 rows then you've isolated the problems to the joins. Work though the joins, commenting each one out (remove the associated columns too) and figure out how many rows are eliminated.
As you do this, aim to find what is causing the desired rows to be eliminated. Once isolated, consider the join differences between the first query and the second query.
In looking at the first query, you could probably just modify that to eliminate any INs and instead do a EXISTS instead.
Consider your indexes as well. Any thing in the where or join clauses should probably be indexed.

Complex SQL Query to NHibernate DetachedCriteria or HQL

I have the following SQL Query returning the results I need:
SELECT
Person.FirstName,Person.LastName,OrganisationUnit.Name AS UnitName, RS_SkillsArea.Name AS SkillsArea, Activity.Name AS ActivityName, Activity.CLASS, Activity.StartsOn, Activity.EndsOn,
SUM(ActivityCost.CostAmount) /
NULLIF(
(
SELECT COUNT(Registration.ActivityId) FROM
Registration INNER JOIN AttemptResultsSummary ON Registration.CurrentResultId = AttemptResultsSummary.AttemptResultsSummaryId AND
Registration.RegistrationId = AttemptResultsSummary.RegistrationId
WHERE (Registration.Status = 1) AND (Registration.ActivityId = Activity.ActivityId)
AND (AttemptResultsSummary.AttendanceStatus <> 1)
)
,0)
AS IndividualCost
FROM Registration AS Registration_1 INNER JOIN
Activity ON Registration_1.ActivityId = Activity.ActivityId INNER JOIN
Person ON Registration_1.PersonId = Person.PersonId INNER JOIN
OrganisationUnit ON Person.OrganisationUnitId = OrganisationUnit.OrganisationUnitId INNER JOIN
AttemptResultsSummary ON Registration_1.CurrentResultId = AttemptResultsSummary.AttemptResultsSummaryId AND
Registration_1.RegistrationId = AttemptResultsSummary.RegistrationId AND Activity.ActivityId = AttemptResultsSummary.ActivityId AND
Person.PersonId = AttemptResultsSummary.PersonId INNER JOIN
ActivityCost ON Activity.ActivityId = ActivityCost.ActivityId LEFT OUTER JOIN
(SELECT Category.Name, Category.CategoryId
FROM Category INNER JOIN
CategoryGroup ON Category.[Group] = CategoryGroup.CategoryGroupId
WHERE (CategoryGroup.Name = N'Skills Area')) AS RS_SkillsArea INNER JOIN
ActivityInCategory ON RS_SkillsArea.CategoryId = ActivityInCategory.CategoryId ON Activity.ActivityId = ActivityInCategory.ActivityId
AND AttemptResultsSummary.AttendanceStatus <> 1
GROUP BY RS_SkillsArea.Name, Person.FirstName,Person.LastName,Activity.Name, Activity.CLASS, Activity.StartsOn, Activity.EndsOn, Activity.ActivityId, OrganisationUnit.Name,
AttemptResultsSummary.CompletionStatus, AttemptResultsSummary.AttendanceStatus
HAVING AttemptResultsSummary.AttendanceStatus <> 1
Essentially is there any way using either DetachedCriteria or HQL to do the same against the entities rather than direct SQL?
The two challenges are:
The query for cost calculation per row.
The derived table join (which needs to be an outer join as this value may not exist)
I'd appreciate any pointers. I'd rather not use SQL because of infrastructure changes and the issues with (lack of) refactoring support
Take a look at the official HQL examples # http://docs.jboss.org/hibernate/stable/core/reference/en/html/queryhql.html#queryhql-examples .
In my opinion, the 'derived joins' would be even easier to pull off using HQL.
In the case of performance, my first start would be to catch how much it costs using native SQL using your prefered profiler, and then how much it costs on NHibernate using NHProf.

Replace IN with EXISTS or COUNT. How to do it. What is missing here?

I am using IN keyword in the query in the middle of a section. Since I am using nested query and want to replace In with Exists due to performance issues that my seniors have told me might arise.
Am I missing some column, what you are looking for in this query. This query contain some aliases for readibility.
How can I remove it.
SELECT TX.PK_MAP_ID AS MAP_ID
, MG.PK_GUEST_ID AS Guest_Id
, MG.FIRST_NAME
, H.PK_CATEGORY_ID AS Preference_Id
, H.DESCRIPTION AS Preference_Name
, H.FK_CATEGORY_ID AS Parent_Id
, H.IMMEDIATE_PARENT AS Parent_Name
, H.Department_ID
, H.Department_Name
, H.ID_PATH, H.DESC_PATH
FROM
dbo.M_GUEST AS MG
LEFT OUTER JOIN
dbo.TX_MAP_GUEST_PREFERENCE AS TX
ON
(MG.PK_GUEST_ID = TX.FK_GUEST_ID)
LEFT OUTER JOIN
dbo.GetHierarchy_Table AS H
ON
(TX.FK_CATEGORY_ID = H.PK_CATEGORY_ID)
WHERE
(MG.IS_ACTIVE = 1)
AND
(TX.IS_ACTIVE = 1)
AND
(H.Department_ID IN -----How to remove this IN operator with EXISTS or Count()
(
SELECT C.PK_CATEGORY_ID AS DepartmentId
FROM
dbo.TX_MAP_DEPARTMENT_OPERATOR AS D
INNER JOIN
dbo.M_OPERATOR AS M
ON
(D.FK_OPERATOR_ID = M.PK_OPERATOR_ID)
AND
(D.IS_ACTIVE = M.IS_ACTIVE)
INNER JOIN
dbo.L_USER_ROLE AS R
ON
(M.FK_ROLE_ID = R.PK_ROLE_ID)
AND
(M.IS_ACTIVE = R.IS_ACTIVE)
INNER JOIN
dbo.L_CATEGORY_TYPE AS C
ON
(D.FK_DEPARTMENT_ID = C.PK_CATEGORY_ID)
AND
(D.IS_ACTIVE = C.IS_ACTIVE)
WHERE
(D.IS_ACTIVE = 1)
AND
(M.IS_ACTIVE = 1)
AND
(R.IS_ACTIVE = 1)
AND
(C.IS_ACTIVE = 1)
)--END INNER QUERY
)--END Condition
What new problems might I get if I replace IN with EXISTS or COUNT ?
Basically, as I understand your question, you are asking how can I replace this:
where H.department_id in (select departmentid from...)
with this:
where exists (select...)
or this:
where (select count(*) from ...) > 1
It is fairly straight forward. One method might be this:
WHERE...
AND EXISTS (select c.pk_category_id
from tx_map_department_operator d
inner join m_operator as m
on d.fk_operator_id = m.pk_operator_id
inner join l_user_role l
on m.fk_role_id = r.pk_role_id
inner join l_category_type c
on d.fk_department_id = c.pk_category_id
where h.department_id = c.pk_category_id
and d.is_active = 1
and m.is_active = 1
and r.is_active = 1
and c.is_active = 1
)
I removed the extra joins on is_active because they were redundant. You should test how it runs with your indexes, because that might have been faster. I doubt it though. But it is worth comparing whether it is faster to add the join clause (join on ... and x.is_active=y.is_active) or to check in the where clause (x.is_active=1 and y.is_active=1 and z.is_active=1...)
And I'd recommend you just use exists, instead of count(*), because I know that exists should stop after finding 1 row, whereas count probably continues to execute until done, and then compares to your reference value (count > 1).
As an aside, that is a strange column naming standard you have. Do you really have PK prefixes for the primary keys, and FK prefixes for the foreign keys? I have never seen that.

Selecting the first row out of many sql joins

Alright, so I'm putting together a path to select a revision of a particular novel:
SELECT Catalog.WbsId, Catalog.Revision, NovelRevision.Revision
FROM Catalog, BookInCatalog
INNER JOIN NovelMaster
INNER JOIN HasNovelRevision
INNER JOIN NovelRevision
ON HasNovelRevision.right = NovelRevision.obid
ON HasNovelRevision.Left=NovelMaster.obid
ON NovelMaster.obid = BookInCatalog.Right
WHERE Catalog.obid = BookInCatalog.Left;
This returns all revisions that are in the Novel Master for each Novel Master that is in the catalog.
The problem is, I only want the FIRST revision of each novel master in the catalog. How do I go about doing that? Oh, and btw: my flavor of sql is hobbled, as many others are, in that it does not support the LIMIT Function.
****UPDATE****
So using answer 1 as a guide I upgraded my query to this:
SELECT Catalog.wbsid
FROM Catalog, BookInCatalog, NovelVersion old, NovelMaster, HasNovelRevision
LEFT JOIN NovelVersion newRevs
ON old.revision < newRevs.revision AND HasNovelRevision.right = newRevs.obid
LEFT JOIN HasNovelRevision NewerHasNovelRevision
ON NewerHasNovelRevision.right = newRevs.obid
LEFT JOIN NovelMaster NewTecMst
ON NewerHasNovelRevision.left = NewTecMst.obid
WHERE Catalog.programName = 'E18' AND Catalog.obid = BookInCatalog.Left
AND BookInCatalog.right = NewTecMst.obid AND newRevs.obid = null
ORDER BY newRevs.documentname;
I get an error on the fourth line:
"old"."revision": invalid identifier
SOLUTION
Well, I had to go to another forum, but I got a working solution:
select nr1.title, nr1.revision
from novelrevision nr1
where nr1.revision in (select min(revision) from novelrevision nr2
where nr1.title = nr2.title)
So this solution uses the JOIN mentioned by the OA, along with the IN keyword to match it to a revision.
Something like this might work, it's called an exclusive left join:
....
INNER JOIN NovelRevision
ON HasNovelRevision.right = NovelRevision.obid
LEFT JOIN NovelRevision as NewerRevision
ON HasNovelRevision.right = NewerRevision.obid
AND NewerRevision.revision > NovelRevision.revision
...
WHERE NeverRevision.obid is null
The where clause filters out rows for which a newer revision exists. This effectively limits the query to the newest revisions.
In response to your comment, you could filter out only revisions that have a newer revision in the same NovelMaster. For example:
....
LEFT JOIN NovelRevision as NewerRevision
ON HasNovelRevision.right = NewerRevision.obid
AND NewerRevision.revision > NovelRevision.revision
LEFT JOIN HasNovelRevision as NewerHasNovelRevision
ON NewerHasNovelRevision.right = NewerRevision.obid
LEFT JOIN NovelMaster as NewerNovelMaster
ON NewerHasNovelRevision.left = NewerNovelMaster.obid
AND NewerNovelMaster.obid = NovelMaster.obid
....
WHERE NeverNovelMaster.obid is null
P.S. I don't think you can group JOINs and follow them with a group of ON conditions. An ON must directly follow its JOIN.
You can use CTE
Check this
WITH NovelRevesion_CTE(obid,RevisionDate)
AS
(
SELECT obid,MIN(RevisionDate) RevisionDate FROM NovelRevision Group by obid
)
SELECT Catalog.WbsId, Catalog.Revision, NovelRevision.Revision
FROM Catalog, BookInCatalog
INNER JOIN NovelMaster
INNER JOIN HasNovelRevision
INNER JOIN NovelRevesion
INNER JOIN NovelRevesion_CTE
ON HasNovelRevision.[right] = NovelRevision.obid
ON HasNovelRevision.[Left]=NovelMaster.obid
ON NovelMaster.obid = BookInCatalog.[Right]
ON NovelRevesion_CTE.obid = NovelRevesion.obid
WHERE Catalog.obid = BookInCatalog.[Left];
First it select the first revision written for each novel (assuming obid is novel foriegn key) by taking the smallest date and group them.
then add it as join in your query