Using GROUP-BY-HAVING on two joined tables - sql

I'm trying to join two tables, and select columns from both based on where constraints as a well as a group-by-having condition. I'm experiencing some issues and behavior that I do not understand. I'm using sybase. A simple example below
CREATE TABLE #test(
name varchar(4),
num int,
cat varchar(3)
)
CREATE TABLE #other(
name varchar(4),
label varchar(20)
)
Insert #test VALUES('a',2,'aa')
Insert #test VALUES ('b',2,'aa')
Insert #test VALUES ('c',3,'bb')
Insert #test VALUES ( 'a',3,'aa')
Insert #test VALUES ( 'd',4,'aa')
Insert #other VALUES('a','this label is a')
Insert #other VALUES ('b','this label is b')
Insert #other VALUES ('c','this label is c')
Insert #other VALUES ( 'd','this label is d')
SELECT t.name,t.num,o.label
FROM #other o inner JOIN #test t ON o.name=t.name
WHERE t.name='a'
GROUP BY t.name
HAVING t.num=MAX(t.num)
I get non-sense when I have the GROUP BY (the label columns are clearly related to a different t.name). If I cut out the GROUP BY statement the query behaves as I would expect, but then I am forced to use this as a subquery and then apply
SELECT * FROM (subquery here) s GROUP BY s.name having s.num=MAX(s.num)
There has to be a better way of doing this. Any help and explanation for this behavior would be very appreciated.
**I should clarify. In my actual query I have something like
SELECT .... FROM (joined tables) WHERE name IN (long list of names), GROUP BY .....

You can try the following, if I understand your query well.
SELECT t.name,t.num,o.label
FROM #other o inner JOIN #test t ON o.name=t.name
WHERE t.name='a' AND t.num=MAX(t.num)

Your GROUP BY must include t.name, t.num and o.label. If you do this
GROUP BY t.name, t.num, o.label
then the query executes without error.
However you're not computing any aggregate values in the group by.
What are you trying to do?

If I understand your requirement correctly, running this...
SELECT t.name, num=MAX(t.num), o.label
FROM #other o
INNER JOIN #test t ON o.name=t.name
WHERE t.name='a'
GROUP BY t.name, o.label;
...gives me this result:
name num label
---- ----------- --------------------
a 3 this label is a

Related

How to select from a master table but replace certain rows using a secondary, linked table?

I have two tables with a foreign key relationship on an ID. I'll refer to them as master and secondary to make things easier and also not worry about the FK for now. Here is cut down, easy to reproduce example using table variables to represent the problem:
DECLARE #Master TABLE (
[MasterID] Uniqueidentifier NOT NULL
,[Description] NVARCHAR(50)
)
DECLARE #Secondary TABLE (
[SecondaryID] Uniqueidentifier NOT NULL
,[MasterID] Uniqueidentifier NOT NULL
,[OtherInfo] NVARCHAR(50)
)
INSERT INTO #Master ([MasterID], [Description])
VALUES ('0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3', 'Test')
,('2696ECD2-FFDB-4E26-83D0-F146ED419C9C', 'Test 2')
,('F21568F0-59C5-4950-B936-AA73DA6009B5', 'Test 3')
INSERT INTO #Secondary (SecondaryID, MasterID, Otherinfo)
VALUES ('514673A6-8B5C-429B-905F-15BD8B55CB5D','0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3','Other info')
SELECT [MasterID], [Description], NULL AS [OtherInfo] FROM #Master
UNION
SELECT S.[MasterID], M.[Description], [OtherInfo] FROM #Secondary S
JOIN #Master M ON M.MasterID = S.MasterID
With the results.....
0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3 Test NULL
0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3 Test Other info
F21568F0-59C5-4950-B936-AA73DA6009B5 Test 3 NULL
2696ECD2-FFDB-4E26-83D0-F146ED419C9C Test 2 NULL
.... I would like to only return records from #Secondary if there is a duplicate MasterID, so this is my expected output:
0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3 Test Other info
F21568F0-59C5-4950-B936-AA73DA6009B5 Test 3 NULL
2696ECD2-FFDB-4E26-83D0-F146ED419C9C Test 2 NULL
I tried inserting my union query into a temporary table, then using a CTE with the partition function. This kind of works but unfortunately returns the row from the #Master table rather than the #Secondary table (regardless of the order I select). See below.
DECLARE #Results TABLE (MasterID UNIQUEIDENTIFIER,[Description] NVARCHAR(50),OtherInfo NVARCHAR(50))
INSERT INTO #Results
SELECT [MasterID], [Description], NULL AS [OtherInfo] FROM #Master
UNION
SELECT S.[MasterID], M.[Description], [OtherInfo] FROM #Secondary S
JOIN #Master M ON M.MasterID = S.MasterID
;WITH CTE AS (
SELECT *, RN= ROW_NUMBER() OVER (PARTITION BY [MasterID] ORDER BY [Description] DESC) FROM #Results
)
SELECT * FROM CTE WHERE RN =1
Results:
0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3 Test NULL 1
F21568F0-59C5-4950-B936-AA73DA6009B5 Test 3 NULL 1
2696ECD2-FFDB-4E26-83D0-F146ED419C9C Test 2 NULL 1
Note that I am not just trying to select the rows which have a value for OtherInfo, this is just to help differentiate the two tables in the result set.
Just to reiterate, what I need to only return the rows present in #Secondary, when there is a duplicate MasterID. If #Secondary has a row for a particular MasterID, I don't need the row from #Master. I hope this makes sense.
What is the best way to do this? I am happy to redesign my database structure. I'm effectively trying to have a master list of items but sometimes take one of those and assign extra info to it + tie it to another ID. In this instance, that record replaces the master list.
You are way overcomplicating this. All you need is a left join.
SELECT M.[MasterID], M.[Description], S.[OtherInfo] FROM #Master M
LEFT JOIN #Secondary S ON M.MasterID = S.MasterID
Union seems to be the wrong approach... I would suggest a left join:
SELECT m.[MasterID], m.[Description], s.[OtherInfo]
FROM #Master m
LEFT JOIN #Secondary s ON s.MasterID = m.MasterID

HAVING clause with subquery -- Checking if group has at least one row matching conditions

Suppose I have the following table
DROP TABLE IF EXISTS #toy_example
CREATE TABLE #toy_example
(
Id int,
Pet varchar(10)
);
INSERT INTO #toy
VALUES (1, 'dog'),
(1, 'cat'),
(1, 'emu'),
(2, 'cat'),
(2, 'turtle'),
(2, 'lizard'),
(3, 'dog'),
(4, 'elephant'),
(5, 'cat'),
(5, 'emu')
and I want to fetch all Ids that have certain pets (for example either cat or emu, so Ids 1, 2 and 5).
DROP TABLE IF EXISTS #Pets
CREATE TABLE #Pets
(
Animal varchar(10)
);
INSERT INTO #Pets
VALUES ('cat'),
('emu')
SELECT Id
FROM #toy_example
GROUP BY Id
HAVING COUNT(
CASE
WHEN Pet IN (SELECT Animal FROM #Pets)
THEN 1
END
) > 0
The above gives me the error Cannot perform an aggregate function on an expression containing an aggregate or a subquery. I have two questions:
Why is this an error? If I instead hard code the subquery in the HAVING clause, i.e. WHEN Pet IN ('cat','emu') then this works. Is there a reason why SQL server (I've checked with SQL server 2017 and 2008) does not allow this?
What would be a nice way to do this? Note that the above is just a toy example. The real problem has many possible "Pets", which I do not want to hard code. It would be nice if the suggested method could check for multiple other similar conditions too in a single query.
If I followed you correctly, you can just join and aggregate:
select t.id, count(*) nb_of_matches
from #toy_example t
inner join #pets p on p.animal = t.pet
group by t.id
The inner join eliminates records from #toy_example that have no match in #pets. Then, we aggregate by id and count how many recors remain in each group.
If you want to retain records that have no match in #pets and display them with a count of 0, then you can left join instead:
select t.id, count(*) nb_of_records, count(p.animal) nb_of_matches
from #toy_example t
left join #pets p on p.animal = t.pet
group by t.id
How about this approach?
SELECT e.Id
FROM #toy_example e JOIN
#pets p
ON e.pet = p.animal
GROUP BY e.Id
HAVING COUNT(DISTINCT e.pet) = (SELECT COUNT(*) FROM #pets);

Join on resultant table of another join without using subquery,CTE or temp tables

My question is can we join a table A to resultant table of inner join of table A and B without using subquery, CTE or temp tables ?
I am using SQL Server.
I will explain the situation with an example
The are two tables GoaLScorers and GoalScoredDetails.
GoaLScorers
gid Name
-----------
1 A
2 B
3 A
GoalScoredDetails
DetailId gid stadium goals Cards
---------------------------------------------
1 1 X 2 1
2 2 Y 5 2
3 3 Y 2 1
The result I am expecting is if I select a stadium 'X' (or 'Y')
I should get name of all who may or may not have scored there, also aggregate total number of goals,total cards.
Null value is acceptable for names if no goals or no cards.
I can get the result I am expecting with the below query
SELECT
gs.name,
SUM(goal) as TotalGoals,
SUM(cards) as TotalCards
FROM
(SELECT
gid, stadium, goal, cards
FROM
GoalScoredDetails
WHERE
stadium = 'Y') AS vtable
RIGHT OUTER JOIN
GoalScorers AS gs ON vtable.gid = gs.gid
GROUP BY
gs.name
My question is can we get the above result without using a subquery or CTE or temp table ?
Basically what we need to do is OUTER JOIN GoalScorers to resultant virtual table of INNER JOIN OF GoalScorers and GoalScoredDetails.
But I am always faced with ambiguous column name error as "gid" column is present in GoalScorers and also in resultant table. Error persists even if I try to use alias for column names.
I have created a sql fiddle for this her: http://sqlfiddle.com/#!3/40162/8
SELECT gs.name, SUM(gsd.goal) AS totalGoals, SUM(gsd.cards) AS totalCards
FROM GoalScorers gs
LEFT JOIN GoalScoredDetails gsd ON gsd.gid = gs.gid AND
gsd.Stadium = 'Y'
GROUP BY gs.name;
IOW, you could push your where criteria onto joining expression.
The error Ambiguous column name 'ColumnName' occurs when SQL Server encounters two or more columns with the same and it hasn't been told which to use. You can avoid the error by prefixing your column names with either the full table name, or an alias if provided. For the examples below use the following data:
Sample Data
DECLARE #GoalScorers TABLE
(
gid INT,
Name VARCHAR(1)
)
;
DECLARE #GoalScoredDetails TABLE
(
DetailId INT,
gid INT,
stadium VARCHAR(1),
goals INT,
Cards INT
)
;
INSERT INTO #GoalScorers
(
gid,
Name
)
VALUES
(1, 'A'),
(2, 'B'),
(3, 'A')
;
INSERT INTO #GoalScoredDetails
(
DetailId,
gid,
stadium,
goals,
Cards
)
VALUES
(1, 1, 'x', 2, 1),
(2, 2, 'y', 5, 2),
(3, 3, 'y', 2, 1)
;
In this first example we recieve the error. Why? Because there is more than one column called gid it cannot tell which to use.
Failed Example
SELECT
gid
FROM
#GoalScoredDetails AS gsd
RIGHT OUTER JOIN #GoalScorers as gs ON gs.gid = gsd.gid
;
This example works because we explicitly tell SQL which gid to return:
Working Example
SELECT
gs.gid
FROM
#GoalScoredDetails AS gsd
RIGHT OUTER JOIN #GoalScorers as gs ON gs.gid = gsd.gid
;
You can, of course, return both:
Example
SELECT
gs.gid,
gsd.gid
FROM
#GoalScoredDetails AS gsd
RIGHT OUTER JOIN #GoalScorers as gs ON gs.gid = gsd.gid
;
In multi table queries I would always recommend prefixing every column name with a table/alias name. This makes the query easier to follow, and reduces the likelihood of this sort of error.

Query for a group of IDs

I use the simple query below to output a list of partIDs based on a modelID that we get from a printout on a sheet of paper.
We've always just used one modelId at a time like this:
SELECT gm.partId, 343432346 as modelId
FROM dbo.systemMemberTable sm
WHERE gm.partID NOT IN (SELECT partID FROM systemsPartTable)
Is there a way to create a query that uses 10 modelIds at a time?
I can't think of a way because the modelIds are not in any table, just a printout that is handed to us.
Thanks!
SELECT gm.partId, T.number as modelId
FROM ( values (4),(9),(16),(25)
) as T(number)
CROSS JOIN dbo.systemMemberTable sm
WHERE gm.partID NOT IN (SELECT partID FROM systemsPartTable)
op said getting an error but this is the test that runs for me
SELECT T.number as [modelID]
,[main].[name]
FROM ( values (4),(9),(16),(25)
) as T(number)
cross join [Gabe2a_ENRONb].[dbo].[docFieldDef] as [main]
where [main].[ID] not in (1)
insert model ids into a table variable and then do the join with this table variable
Also use not exists instead of not in as not in doesn't work if there are null values in the parts table.
declare #modelIds table
(
model_id int
)
insert into #modelIds values (343432346) , (123456)
your select would be
As you want same model id repeated for all parts, you can just use cross join
select gm.partId, m.model_id
from dbo.systemMeberTable sm
cross join #modelIds m
where not exists ( select 1 from systemPartsTable SPT where SPT.partId = gm.PartID )
Try this:
DECLARE #T TABLE (ModelId INT);
INSERT INTO #T (ModelID)
VALUES (343432346), (343432347) -- And so on and so forth
SELECT gm.partId, T.ModelId
FROM dbo.systemMemberTable sm
INNER JOIN #T AS T
ON T.ModelId = SM.ModelID
WHERE gm.partID NOT IN (SELECT partID FROM systemsPartTable)
Create table #tempmodelTable
(
#modelid int
)
insert all modelid here, then use join with your select query
INSERT INTO #tempmodelTable values(123123)
INSERT INTO #tempmodelTable values(1232323)
INSERT INTO #tempmodelTable values(1232343123)
SELECT gm.partId, modelId
FROM dbo.systemMemberTable gm inner join #tempmodelTable
WHERE gm.partID NOT IN (SELECT partID FROM systemsPartTable)

Join query on comma separated values

This is my 1st table :
this is another table on which i want to perform join operation :
I want to retrieve first_name for "activity_cc" column
For example, I want to show Pritam,Niket for activity_id=2
How can I retrieve those values?
From http://mikehillyer.com/articles/an-introduction-to-database-normalization/
The first normal form (or 1NF) requires that the values in each column
of a table are atomic. By atomic we mean that there are no sets of
values within a column.
Your database design violates the first normal form of database design. It is a simply unworkable design and it must be changed (and frankly the database designer who created this should be fired as this is gross incompetence) or there will be severe performance problems and querying will always be difficult. There is a reason why the very first rule of database design is never store more than one piece of information in a field.
Yes you could use some hack methods to get the answer you want, but they will cause performance issues and they are the wrong thing to do. A hack to fix this data into a related table used one-time is fine, a hack to continuallly query your database is simply a poor choice. It will cost less time in the long run to fix this cancer at the heart of your database right now. But in general the process to fix this is to split the data out into a related table using some version of fn_split (look up the various implementations of this for a script to create the function). You can use a temp table in your query or do the right thing and fix the database.
If you want to retrieve the result on the basis of Join then why don't you join your both tables on the "registration_id" by using inner-join. And please clearify me you want to perform the join on the active_cc, but its actually not present in your second table. So how you preform join in that case.
I completely agree with #HLGEM, but to solve this particular problem cost will be high.
I had given a try to want you want to achive here. Please modify the join if needed.
Let me know if any further help needed.
Sample Schema
create table tableC (ACTIVITY_ID int, REG_ID int,PROJ_ID int,DOSS_ID int,ACTIVITY_TO int, ACTIVITY_CC varchar(500))
insert into tableC select 4, 1,1,1,1, '3,4';
insert into tableC select 5, 2,2,2,2, '5,6';
insert into tableC select 6, 3,3,3,3, '3,5';
create table tableD (REG_ID int , FIRST_NAME VARCHAR(100), LAST_NAME VARCHAR(100))
insert into tableD select 3, 'Pritam', 'Sharma';
insert into tableD select 4, 'Pratik', 'Gupta';
insert into tableD select 5, 'Niket', 'Vaidya';
insert into tableD select 6, 'Ajinkya', 'Satwa';
Sample Query
with names as
(
select C.ACTIVITY_ID,C.ACTIVITY_CC
,Names = D.FIRST_NAME
from tableC C
inner join tableD D on charindex(cast(D.REG_ID as varchar), C.ACTIVITY_CC) > 0
)
select
C.ACTIVITY_ID,C.REG_ID,PROJ_ID,DOSS_ID,ACTIVITY_TO,ACTIVITY_CC
,Names = stuff
(
(
select ',' + Names
from names n
where n.ACTIVITY_ID = D.REG_ID
for xml path('')
)
, 1
, 1
, ''
)
from tableD D
inner join tableC C on C.ACTIVITY_ID = D.REG_ID
Added to SQLFiddle also
Considering Pratik's structure
CREATE TABLE tableC
(
ACTIVITY_ID int,
REG_ID int,
PROJ_ID int,
DOSS_ID int,
ACTIVITY_TO int,
ACTIVITY_CC varchar(500)
);
INSERT INTO tableC select 4, 1,1,1,1, '3,4';
INSERT INTO tableC select 5, 2,2,2,2, '5,6';
INSERT INTO tableC select 6, 3,3,3,3, '3,5';
CREATE TABLE tableD
(
REG_ID int,
FIRST_NAME VARCHAR(100),
LAST_NAME VARCHAR(100)
);
INSERT INTO tableD select 3, 'Pritam', 'Sharma';
INSERT INTO tableD select 4, 'Pratik', 'Gupta';
INSERT INTO tableD select 5, 'Niket', 'Vaidya';
INSERT INTO tableD select 6, 'Ajinkya', 'Satwa';
You can do this:
SELECT tableD.FIRST_NAME
FROM tableD
JOIN tableC ON tableC.ACTIVITY_CC LIKE CONCAT('%', tableD.REG_ID, '%')
GROUP BY tableD.FIRST_NAME;
OR
SELECT FIRST_NAME
FROM tableD, tableC
WHERE tableC.ACTIVITY_CC LIKE CONCAT('%', tableD.REG_ID, '%')
GROUP BY tableD.FIRST_NAME;