Remove duplicates after joining three different tables - sql

I'm working now on SQL query for Polarion widget, where I want get items with unresolved comments.
Unfortunately, I have a problem with joining three tables where one has optional empty/null.
In Polarion is a few different tables to contain comments, most of them has a resolved status. I'm using WORKITEM, COMMENT, MODULECOMMENT tables, because sometimes workitem has comment here, sometimes in another table.
Difference is one: when workitem has "normal" comment, then there is always FK key in COMMENT table, but not each workitem has a FK MODULE key (table structure).
In default widget, I can use SQL query with a one big limit:
I can't replace whole select and from, because there is a static line: SELECT WORKITEM.C_URI and FROM WORKITEM, so I can only add something (I tried DISTINCT ON, but how can you see, I can't replace that line)
Polarion has another problem too, each document has headings, texts etc. and each of this element is separate workitem with the same FK_URI_MODULE (and documents use MODULECOMMENT table).
I want remove duplicates from MODULECOMMENT (ignore situation where workitem.FK_URI_MODULE is empty/null, because it's normal state).
The best what I created, It's here (in this situation I replace is null to is not null, because It's return less items):
select workitem.c_uri, workitem.FK_URI_MODULE
from workitem
left join COMMENT on COMMENT.FK_URI_WORKITEM = WORKITEM.C_URI
left join MODULECOMMENT on MODULECOMMENT.FK_URI_MODULE = WORKITEM.FK_URI_MODULE
where true
and workitem.fk_uri_project = 231
and (comment.c_resolved is not null
or modulecomment.c_resolved is not null);
and my results:
c_uri | fk_uri_module
--------+---------------
7952 | 7940
7949 | 7940
7953 | 7940
7964 | 7940
124141 | 124138
609 |
1347 |
609 |
but I want get something like that:
c_uri | fk_uri_module
--------+---------------
7952 | 7940
124141 | 124138
609 |
1347 |
609 |

Don't generate the duplicates to begin with by not joining, but using an EXISTS condition:
select workitem.c_uri, workitem.fk_uri_module
from workitem
where exists (select *
from comment
left join modulecomment on modulecomment.fk_uri_module = workitem.fk_uri_module
where comment.fk_uri_workitem = workitem.c_uri
and (comment.c_resolved is not null
or modulecomment.c_resolved is not null))
where workitem.fk_uri_project = 231

You can use DISTINCT ON for this, which returns only the first record of a group:
demos:db<>fiddle
select DISTINCT ON (workitem.FK_URI_MODULE)
workitem.c_uri, workitem.FK_URI_MODULE
from workitem
left join COMMENT on COMMENT.FK_URI_WORKITEM = WORKITEM.C_URI
left join MODULECOMMENT on MODULECOMMENT.FK_URI_MODULE = WORKITEM.FK_URI_MODULE
where true
and workitem.fk_uri_project = 231
and (comment.c_resolved is not null
or modulecomment.c_resolved is not null)
The interesting part is the NULL records. The query above returns only one of the NULL records, because DISTINCT ON also removes duplicates for NULL.
So, if you want to keep all NULL values, we need to treat the NULL records differently:
SELECT DISTINCT ON (fk_uri_module) -- 1
*
FROM <your_query>
WHERE fk_uri_module IS NOT NULL
UNION ALL
SELECT -- 2
*
FROM <your_query>
WHERE fk_uri_module IS NULL
Apply DISTINCT ON on all records with non-NULL values
UNION ALL records with NULL afterwards
To avoid executing your entire query twice, you can move them into a CTE (WITH clause) and reference it later:
WITH my_result AS (
-- <your_query>
)
SELECT DISTINCT ON (fk_uri_module)
*
FROM my_result
WHERE fk_uri_module IS NOT NULL
UNION ALL
SELECT
*
FROM my_result
WHERE fk_uri_module IS NULL
Edit:
demo:db<>fiddle
If you cannot use DISTINCT ON, a valid alternative for this is the row_number() window function:
SELECT
*
FROM (
select
workitem.c_uri, workitem.FK_URI_MODULE,
row_number() OVER (PARTITION BY workitem.FK_URI_MODULE)
from workitem
left join COMMENT on COMMENT.FK_URI_WORKITEM = WORKITEM.C_URI
left join MODULECOMMENT on MODULECOMMENT.FK_URI_MODULE = WORKITEM.FK_URI_MODULE
where true
and workitem.fk_uri_project = 231
and (comment.c_resolved is not null
or modulecomment.c_resolved is not null)
) s
WHERE row_number = 1

Related

Don't select rows where column A is duplicated AND any row of column B is a specific value

I'm working on generating a report merging multiple tables. The report requires only showing projects that did not have any document marked 'Not Received' These document markings are listed in a table that lists each document in an individual line. So when merged into my other table it creates multiple rows of the same project. For example the following table
Project Number
ChecklistValue
565
Received
565
Not Received
465
Received
465
Not Applicable
As you can see really only two projects are listed on this table but the desired output is:
Project Number
Other Info
465
etc
I do not need the checklist value on the actual report, so I can use the GROUP BY to combine all the good rows, but where I have an Issue is that would still include project 565 even if I include something like where ChecklistValue <> 'Not Received', 565 needs to be hidden from the report entirely because any row for 565 contains 'Not Received'.
So that's my actual question, how do I exclude all project numbers rows that have any row containing 'Not Received'?
I'm adding the entire query will generalized names below:
SELECT
Project Number
,Name
,Contractor
,ABS(DATEDIFF(day,(ActualDate),(EstDate))) AS DelayPeriod
,S.NoteDate
,S.FinalAppDate
,Status
,S.ONE
,S.TWO
,S.THREE
,S.FOUR
,CH.ChecklistValue
FROM [DB1] A
INNER JOIN [DB2] C ON A.Contractor = C.Contractor
INNER JOIN [DB3] S ON A.AppID = S.AppID
INNER JOIN [DB4] LS ON S.StatusID = LS.StatusID
LEFT OUTER JOIN [DB5] CH ON A.AppID = CH.AppID AND CH.OtherID = 1
WHERE C.TypeID = 4 AND A.YEAR = 2022, AND S.THING = 1 AND
(CH.CheckListValue IS NULL OR A.AppID NOT IN (SELECT * FROM [DB5] WHERE
CheckListValue = 'Not Reveived'))
GROUP BY Project Number,Name,Contractor,ABS(DATEDIFF(day,(ActualDate),(EstDate))) AS DelayPeriod,S.NoteDate,S.FinalAppDate,Status,S.ONE,S.TWO,S.THREE,S.FOUR
The last portion of the WHERE clause was added from a suggestion, but I'm clearly not implementing it correctly as it errors
You can use not in like:
create table test(
num int,
description varchar(20)
);
insert into test(num,description)
values(565,'Received'),
(565,'Not Received'),
(465,'Received'),
(465,'Not Applicable');
select *
from test
where num not in
(
select num -- Only select one column here
from test
where description = 'Not Received'
);
Results:
+-----+---------------+
| num | description |
+-----+---------------+
| 465 | Received |
| 465 | Not Applicable|
+-----+---------------+
db<>fiddle this is on sql-server but works on other dbms as well.
So in your query you should have (in my understanding):
OR A.AppID NOT IN
(
SELECT AppID -- Not select *
FROM [DB5]
WHERE CheckListValue = 'Not Reveived'
)
Other way to do it is with a cte but it is complicated at first glance:
with x as(
select num
from test
where description = 'Not Received'
)
select t.num, t.description
from test t
left join x
on t.num = x.num
where x.num is null
I'm first creating a cte on the num column where the description = not received then I'm selecting all from the test table, and I'm left joining to the cte but I'm only selecting the num column that are not in the cte by using where x.num is null, and this will only return 465.
Now which one is better? I don't know sometimes join would be faster and sometimes in, for more you can find on this post.

Returning a number when result set is null

Each lot object contains a corresponding list of work orders. These work orders have tasks assigned to them which are structured by the task set on the lots parent (the phase). I am trying to get the LOT_ID back and a count of TASK_ID where the TASK_ID is found to exist for the where condition.
The problem is if the TASK_ID is not found, the result set is null and the LOT_ID is not returned at all.
I have uploaded a single row for LOT, PHASE, and WORK_ORDER to the following SQLFiddle. I would have added more data but there is a fun limiter .. err I mean character limiter to the editor.
SQLFiddle
SELECT W.[LOT_ID], COUNT(*) AS NUMBER_TASKS_FOUND
FROM [PHASE] P
JOIN [LOT] L ON L.[PHASE_ID] = P.[PHASE_ID]
JOIN [WORK_ORDER] W ON W.[LOT_ID] = L.[LOT_ID]
WHERE P.[TASK_SET_ID] = 1 AND W.[TASK_ID] = 41
GROUP BY W.[LOT_ID]
The query returns the expected result when the task id is found (46) but no result when the task id is not found (say 41). I'd expect in that case to see something like:
+--------+--------------------+
| LOT_ID | NUMBER_TASKS_FOUND |
+--------+--------------------+
| 500 | 0 |
| 506 | 0 |
+--------+--------------------+
I have a feeling this needs to be wrapped in a sub-query and then joined but I am uncertain what the syntax would be here.
My true objective is to be able to pass a list of TASK_ID and get back any LOT_ID that doesn't match, but for now I am just doing a query per task until I can figure that out.
You want to see all lots with their counts for the task. So either outer join the tasks or cross apply their count or use a subquery in the select clause.
select l.lot_id, count(wo.work_order_id) as number_tasks_found
from lot l
left join work_order wo on wo.lot_id = l.lot_id and wo.task_id = 41
where l.phase_id in (select p.phase_id from phase p where p.task_set_id = 1)
group by l.lot_id
order by l.lot_id;
or
select l.lot_id, w.number_tasks_found
from lot l
cross apply
(
select count(*) as number_tasks_found
from work_order wo
where wo.lot_id = l.lot_id
and wo.task_id = 41
) w
where l.phase_id in (select p.phase_id from phase p where p.task_set_id = 1)
order by l.lot_id;
or
select l.lot_id,
(
select count(*)
from work_order wo
where wo.lot_id = l.lot_id
and wo.task_id = 41
) as number_tasks_found
from lot l
where l.phase_id in (select p.phase_id from phase p where p.task_set_id = 1)
order by l.lot_id;
Another option would be to outer join the count and use COALESCE to turn null into zero in your result.

Join two tables and concatenate same column names in same statement

I have a large raw data table. It's long and I'm trying to sort of transpose it. I'm joining two select statements from within it. As such I get multiple columns with the same name. It's a full outer join and I would like to have the two separate columns of the same name as one column. Since it is an outer join I don't want to pick just one tables column for it either like select t1.c1
Thanks!
SELECT *
FROM (SELECT * FROM [LabData].[dbo].[FermHourlyDCSData] where Attribute='Urea') P
full outer JOIN
(SELECT * FROM [LabData].[dbo].[FermHourlyDCSData] where Attribute='Water to Mash Total Water') FPD ON
P.[TimeStamp] = FPD.[TimeStamp]
and P.Site = FPD.Site
and P.Element = FPD.Element
Actual:
Site Attribute Timestamp Value Site Attribute Timestamp Value
AD Urea 1/1/2019 127 Null Null Null Null
Null Null Null Null AD Water 1/1/2019 7.5
Expected/Desired:
Site Attribute Timestamp Value Value
AD Urea 1/1/2019 127 Null
AD Water 1/1/2019 Null 7.5
Try this, it's not very pretty, but it does work:
SELECT
[Site] = ISNULL(P.[Site], FPD.[Site]),
[Attribute] = ISNULL(P.[Attribute], FPD.[Attribute]),
[Timestamp] = ISNULL(P.[Timestamp], FPD.[Timestamp]),
[Value] = ISNULL(P.[Value], FPD.[Value]),
[Element] =ISNULL(P.[Element], FPD.[Element])
FROM (SELECT * FROM [dbo].[FermHourlyDCSData] where Attribute='Urea') P
full outer JOIN
(SELECT * FROM [dbo].[FermHourlyDCSData] where Attribute='Water to Mash Total Water') FPD ON
P.[TimeStamp] = FPD.[TimeStamp]
and P.Site = FPD.Site
and P.Element = FPD.Element
ISNULL is what you should use for this
ISNULL(p.Site,fpd.Site) as [Site]
Maybe I'm missing something, but you seem to want a much simpler query:
select Site, Attribute, Timestamp,
(case when Attribute = 'Urea' then Value end) as value_u,
(case when Attribute = 'Water to Mash Total Water' then Value end) as value_2
from [LabData].[dbo].[FermHourlyDCSData]
where Attribute in ('Urea', 'Water to Mash Total Water')

Is there a simpler way to write this query? [MS SQL Server]

I'm wondering if there is a simpler way to accomplish my goal than what I've come up with.
I am returning a specific attribute that applies to an object. The objects go through multiple iterations and the attributes might change slightly from iteration to iteration. The iteration will only be added to the table if the attribute changes. So the most recent iteration might not be in the table.
Each attribute is uniquely identified by a combination of the Attribute ID (AttribId) and Generation ID (GenId).
Object_Table
ObjectId | AttribId | GenId
32 | 2 | 3
33 | 3 | 1
Attribute_Table
AttribId | GenId | AttribDesc
1 | 1 | Text
2 | 1 | Some Text
2 | 2 | Some Different Text
3 | 1 | Other Text
When I query on a specific object I would like it to return an exact match if possible. For example, Object ID 33 would return "Other Text".
But if there is no exact match, I would like for the most recent generation (largest Gen ID) to be returned. For example, Object ID 32 would return "Some Different Text". Since there is no Attribute ID 2 from Gen 3, it uses the description from the most recent iteration of the Attribute which is Gen ID 2.
This is what I've come up with to accomplish that goal:
SELECT attr.AttribDesc
FROM Attribute_Table AS attr
JOIN Object_Table AS obj
ON obj.AttribId = obj.AttribId
WHERE attr.GenId = (SELECT MIN(GenId)
FROM(SELECT CASE obj2.GenId
WHEN attr2.GenId THEN attr2.GenId
ELSE(SELECT MAX(attr3.GenId)
FROM Attribute_Table AS attr3
JOIN Object_Table AS obj3
ON obj3.AttribId = attr3.AttribId
WHERE obj3.AttribId = 2
)
END AS GenId
FROM Attribute_Table AS attr2
JOIN Object_Table AS obj2
ON attr2.AttribId = obj2.AttribId
WHERE obj2.AttribId = 2
) AS ListOfGens
)
Is there a simpler way to accomplish this? I feel that there should be, but I'm relatively new to SQL and can't think of anything else.
Thanks!
The following query will return the matching value, if found, otherwise use a correlated subquery to return the value with the highest GenId and matching AttribId:
SELECT obj.Object_Id,
CASE WHEN attr1.AttribDesc IS NOT NULL THEN attr1.AttribDesc ELSE attr2.AttribDesc END AS AttribDesc
FROM Object_Table AS obj
LEFT JOIN Attribute_Table AS attr1
ON attr1.AttribId = obj.AttribId AND attr1.GenId = obj.GenId
LEFT JOIN Attribute_Table AS attr2
ON attr2.AttribId = obj.AttribId AND attr2.GenId = (
SELECT max(GenId)
FROM Attribute_Table AS attr3
WHERE attr3.AttribId = obj.AttribId)
In the case where there is no matching record at all with the given AttribId, it will return NULL. If you want to get no record at all in this case, make the second JOIN an INNER JOIN rather than a LEFT JOIN.
Try this...
Incase the logic doesn't find a match for the Object_table GENID it maps it to the next highest GENID in the ON clause of the JOIN.
SELECT AttribDesc
FROM object_TABLE A
INNER JOIN Attribute_Table B
ON A.AttrbId = B.AttrbId
AND (
CASE
WHEN A.Genid <> B.Genid
THEN (
SELECT MAX(C.Genid)
FROM Attribute_Table C
WHERE A.AttrbId = C.AttrbId
)
ELSE A.Genid
END
) -- Selecting the right GENID in the join clause should do the job
= B.Genid
This should work:
with x as (
select *, row_number() over (partition by AttribId order by GenId desc) as rn
from Attribute_Table
)
select isnull(a.attribdesc, x.attribdesc)
from Object_Table o
left join Attribute_Table a
on o.AttribId = a.AttribId and o.GenId = a.GenId
left join x on o.AttribId = x.AttribId and rn = 1

Using (IN operator) OR condition in Where clause as AND condition

Please look at following image, I have explained my requirements in the image.
alt text http://img30.imageshack.us/img30/5668/shippment.png
I can't use here WHERE UsageTypeid IN(1,2,3,4) because this will behave as an OR condition and fetch all records.
I just want those records, of first table, which are attached with all 4 ShipmentToID .
All others which are attached with 3 or less ShipmentToIDs are not needed in result set.
Thanks.
if (EntityId, UsageTypeId) is unique:
select s.PrimaryKeyField, s.ShipmentId from shipment s, item a
where s.PrimaryKeyField = a.EntityId and a.UsageTypeId in (1,2,3,4)
group by s.PrimaryKeyField, s.ShipmentId having count(*) = 4
otherwise, 4-way join for the 4 fields,
select distinct s.* from shipment s, item a, item b, item c, item d where
s.PrimaryKeyField = a.EntityId = b.EntityId = c.EntityId = d.EntityId and
a.UsageTypeId = 1 and b.UsageTypeId = 2 and c.UsageTypeId = 3 and
d.UsageTypeId = 4
you'll want appropriate index on (EntityId, UsageTypeId) so it doesn't hang...
If there will never be duplicates of the UsageTypeId-EntityId combo in the 2nd table, so you'll never see:
EntityUsageTypeId | EntityId | UsageTypeId
22685 | 4477 | 1
22687 | 4477 | 1
You can count matching EntityIds in that table.
WHERE (count(*) in <tablename> WHERE EntityId = 4477) = 4
DECLARE #numShippingMethods int;
SELECT #numShippingMethods = COUNT(*)
FROM shippedToTable;
SELECT tbl1.shipmentID, COUNT(UsageTypeId) as Usages
FROM tbl2 JOIN tbl1 ON tbl2.EntityId = tbl1.EntityId
GROUP BY tbl1.EntityID
HAVING COUNT(UsageTypeId) = #numShippingMethods
This way is preferred to the multiple join against same table method, as you can simply modify the IN clause and the COUNT without needing to add or subtract more tables to the query when your list of IDs changes:
select EntityId, ShipmentId
from (
select EntityId
from (
select EntityId
from EntityUsage eu
where UsageTypeId in (1,2,3,4)
group by EntityId, UsageTypeId
) b
group by EntityId
having count(*) = 4
) a
inner join Shipment s on a.EntityId = s.EntityId