SQL logical AND operator for bit fields

I have two tables that have a many-to-many relationship: an Individual can belong to many Groups, and a Group can have many Individuals.
Individuals basically just have their primary key ID.
Groups have a primary key ID, an IndividualID (the same as the ID in the Individual table), and a bit flag indicating whether that group is the primary group for the individual.
In theory, all but one of the entries for any given individual in the Group table should have that bit flag set to false, because every individual must have exactly one primary group.
I know that for my current dataset this assumption doesn't hold, and I have some individuals whose primary flag is set to false for ALL of their groups.
I'm having trouble generating a query that will return those individuals to me.
The closest I've gotten is:
SELECT * FROM Individual i
LEFT JOIN Group g ON g.IndividualID = i.ID
WHERE g.IsPrimaryGroup = 0
but going further than that with SUM or MAX doesn't work, because the field is a bit field, not a numeric type.
Any suggestions?

I don't know your data, but with that filter in the WHERE clause, the LEFT JOIN effectively becomes an INNER JOIN.
What happens when you change the WHERE to AND?
SELECT * FROM Individual i
LEFT JOIN Group g ON g.IndividualID = i.ID
AND g.IsPrimaryGroup = 0
Here, try running this (untested, of course, since you didn't provide any sample data):
SELECT SUM(convert(int,g.IsPrimaryGroup)), i.ID
FROM Individual i
LEFT JOIN [Group] g ON g.IndividualID = i.ID
AND g.IsPrimaryGroup = 0
GROUP BY i.ID
HAVING COUNT(*) > 1

Try not using a bit field if you need to do SUM and MAX; use a TINYINT instead. In addition, from what I remember, bit fields cannot be indexed, so you will lose some performance in your joins.
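A rough sketch of that change, assuming SQL Server, the table and column names from the question, and that the column currently has no NULLs (untested):
-- Change the flag from bit to tinyint so SUM/MAX work on it directly
ALTER TABLE [Group]
    ALTER COLUMN IsPrimaryGroup TINYINT NOT NULL;

-- Individuals whose flags never sum to 1 (i.e. no primary group at all)
SELECT g.IndividualID, SUM(g.IsPrimaryGroup) AS PrimaryCount
FROM [Group] g
GROUP BY g.IndividualID
HAVING SUM(g.IsPrimaryGroup) = 0;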

Update: I got it working with a subselect: select the IndividualID from Group where the primary-group flag is false, and the IndividualID is NOT IN (select IndividualID from Group where the flag is true) — roughly as sketched below.
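In SQL form, that subselect might look something like this (a minimal sketch using the names from the question; Group is bracketed because it is a reserved word in SQL Server):
-- Individuals that have group rows but no row flagged as primary
SELECT DISTINCT g.IndividualID
FROM [Group] g
WHERE g.IsPrimaryGroup = 0
  AND g.IndividualID NOT IN (
        SELECT g2.IndividualID
        FROM [Group] g2
        WHERE g2.IsPrimaryGroup = 1
      );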

You need to include the IsPrimaryGroup condition in the JOIN clause. This query finds all individuals with no primary group set:
SELECT * FROM Individual i
LEFT OUTER JOIN Group g ON g.IndividualID = i.ID AND g.IsPrimaryGroup = 1
WHERE g.ID IS NULL
However, the ideal way to solve your problem (in terms of relational db) is to have a PrimaryGroupID in the Individual table.
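As a rough sketch of that schema change (the constraint and column names here are made up for illustration):
-- Point each individual at its primary group directly
ALTER TABLE Individual
    ADD PrimaryGroupID INT NULL;

ALTER TABLE Individual
    ADD CONSTRAINT FK_Individual_PrimaryGroup
    FOREIGN KEY (PrimaryGroupID) REFERENCES [Group](ID);
Once the existing data is cleaned up, the column could be made NOT NULL to enforce that every individual has exactly one primary group.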

SELECT COUNT(bitFlag), individualId
FROM Groups
WHERE bitFlag = 1
GROUP BY individualId
HAVING COUNT(bitFlag) <> 1
ORDER BY COUNT(bitFlag)
That will give you each individual with more than one primary group and how many they have (individuals with no primary rows at all won't appear, because the WHERE clause filters them out before grouping).

I don't know if this is optimal from a performance standpoint, but I believe something along these lines should work. I'm using OrgIndividual as the name of the resolution table between the Individual and the Group.
SELECT DISTINCT(i.IndividualID)
FROM
Individual i INNER JOIN OrgIndividual oi
ON i.IndividualID = oi.IndividualID AND oi.PrimaryOrg = 0
LEFT JOIN OrgIndividual oip
ON oi.IndividualID = oip.IndividualID AND oip.PrimaryOrg = 1
WHERE
oip.IndividualID IS NULL

SELECT IndividualID
FROM Group g
WHERE NOT EXISTS (
SELECT NULL FROM Group
WHERE PrimaryOrg = 1
AND IndividualID = g.IndividualID)
GROUP BY IndividualID

Sum fields of an Inner join

How can I sum two fields that belong to an inner join?
I have this code:
select
SUM(ACT.NumberOfPlants ) AS NumberOfPlants,
SUM(ACT.NumOfJornales) AS NumberOfJornals
FROM dbo.AGRMastPlanPerformance MPR (NOLOCK)
INNER JOIN GENRegion GR ON (GR.intGENRegionKey = MPR.intGENRegionLink )
INNER JOIN AGRDetPlanPerformance DPR (NOLOCK) ON
(DPR.intAGRMastPlanPerformanceLink =
MPR.intAGRMastPlanPerformanceKey)
INNER JOIN vwGENPredios P (NOLOCK) ON ( DPR.intGENPredioLink =
P.intGENPredioKey )
INNER JOIN AGRSubActivity SA (NOLOCK) ON (SA.intAGRSubActivityKey =
DPR.intAGRSubActivityLink)
LEFT JOIN (SELECT RA.intGENPredioLink, AR.intAGRActividadLink,
AR.intAGRSubActividadLink, SUM(AR.decNoPlantas) AS
intPlantasTrabajads, SUM(AR.decNoPersonas) AS NumOfJornales,
SUM(AR.decNoPlants) AS NumberOfPlants
FROM AGRRecordActivity RA WITH (NOLOCK)
INNER JOIN AGRActividadRealizada AR WITH (NOLOCK) ON
(AR.intAGRRegistroActividadLink = RA.intAGRRegistroActividadKey AND
AR.bitActivo = 1)
INNER JOIN AGRSubActividad SA (NOLOCK) ON (SA.intAGRSubActividadKey
= AR.intAGRSubActividadLink AND SA.bitEnabled = 1)
WHERE RA.bitActive = 1 AND
AR.bitActive = 1 AND
RA.intAGRTractorsCrewsLink IN(2)
GROUP BY RA.intGENPredioLink,
AR.decNoPersons,
AR.decNoPlants,
AR.intAGRAActivityLink,
AR.intAGRSubActividadLink) ACT ON (ACT.intGENPredioLink IN(
DPR.intGENPredioLink) AND
ACT.intAGRAActivityLink IN( DPR.intAGRAActivityLink) AND
ACT.intAGRSubActivityLink IN( DPR.intAGRSubActivityLink))
WHERE
MPR.intAGRMastPlanPerformanceKey IN(4) AND
DPR.intAGRSubActivityLink IN( 1153)
GROUP BY
P.vchRegion,
ACT.NumberOfFloors,
ACT.NumOfJournals
ORDER BY ACT.NumberOfFloors DESC
However, it does not produce the complete sum. It only returns the individual column values and adds them one by one, instead of doing the complete sum of the whole column.
What I expect is the final totals: for NumberOfPlants the sum would be 163,237, and for NumberOfJornals it would be 61.
How can I do this?
First of all, the (NOLOCK) hints are probably not giving you the benefit you hope for. It's not an automatic "go faster" option; if such an option existed, you can be sure it would already be enabled. It can help in some situations, but the way it works allows the possibility of reading stale data, and the situations where it's likely to make any improvement are the same situations where the risk of stale data is highest.
That out of the way, with that much code in the question we're better served with a general explanation and solution for you to adapt.
The issue here is GROUP BY. When you use a GROUP BY in SQL, you're telling the database you want to see separate results per group for any aggregate functions like SUM() (and COUNT(), AVG(), MAX(), etc).
So if you have this:
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
You will get a separate row per ColumnA group, even though it's not in the SELECT list.
If you don't really care about that, you can do one of two things:
Remove the GROUP BY
If there are no grouped columns in the SELECT list, the GROUP BY clause is probably not accomplishing anything important.
Nest the query
If option 1 is somehow not possible (say, the original is actually a view) you could do this:
SELECT SUM(SumB)
FROM (
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
) t
Note in both cases any JOIN is irrelevant to the issue.

SQL: how to count entries in multiple tables by value?

I have two tables, question and field. I need to count the entries that have a matching template_id value (a column both tables contain).
Please advise how to do it.
select count(q.*)
from question q
left join field f on f.template.id = q.template_id
On Stack Overflow one should show one's own attempt, to demonstrate that some effort was made.
An inner join is probably what you meant above. Try selecting q.*, f.* first.
SELECT
COUNT(*) AS TotalRecords
FROM question q
INNER JOIN field f ON f.template_id = q.template_id
If you want the count of distinct template_id in the two tables, use JOIN and COUNT(DISTINCT):
select count(distinct q.template_id)
from question q join
field f
on f.template_id = q.template_id;
If you use count(*) you will get a count of matching rows, not template_ids, so duplicates will affect the result.
If template_id is known to be unique in one of the tables (say question), then exists is probably more efficient:
select count(*)
from question q
where exists (select 1
from field f
where f.template_id = q.template_id
);

SQL - Teams and Their Player Count

I need some help here. I'm stuck on this SQL query.
I have two tables, team and function. Team contains ID and Teamname; function contains ID and Role.
Role decides whether a person is a player or a coach (0 for player, 1 for coach).
I am supposed to list all teams that exist, but only players should be counted, and even teams with no players should be listed too. There are teams that only have coaches, for example.
I can't find any way to get this count for every team.
This is the closest I've come:
select teamname,
count(*)
from function left outer join team on team.id = function.id
WHERE role=0
group by teamname;
Here's one way to get the result:
SELECT t.teamname
, COUNT(1) AS player_count
FROM team t
LEFT
JOIN function f
ON f.id = t.id
AND f.role = 0
GROUP BY t.teamname;
-- or --
SELECT t.teamname
, IFNULL(SUM(f.role=0),0) AS player_count
FROM team t
LEFT
JOIN function f
ON f.id = t.id
GROUP BY t.teamname;
These queries use the team table as the driver, so we get rows from team even when there isn't a matching row in function. In the first query, we include the role=0 condition as a join predicate, but it's an outer join. In the second query, it's still an outer join, but we only "count" rows that have role=0.
NOTE: It looks as if something is missing from your model; the id = id join predicate is odd. It would be unusual to have a column named id in the function table that is a foreign key referencing team.id.
Normally, id is the surrogate primary key in a table, and it's unique in that table. If it were unique in the function table, then that implies a one-to-one relationship between team and function, and the query will never return a count greater than 1.
Again, I strongly suspect that the "member" entity is missing from the model.
What we'd expect is something like this:
team
id PK
teamname
member
id PK
team_id FK references team.id
role 0=player, 1=coach
such that the query would look something like this:
SELECT t.teamname
, IFNULL(SUM(m.role=0),0) AS player_count
, IFNULL(SUM(m.role=1),0) AS coach_count
FROM team t
LEFT
JOIN member m
ON m.team_id = t.id
You can't select a non-aggregated column alongside an aggregate function unless that column appears in the GROUP BY clause.
To get a count of rows together with the name of the team, every column in the SELECT list must be either aggregated or grouped. For example:
SELECT COUNT(*), team.teamname
FROM function LEFT OUTER JOIN team ON team.id = function.id
WHERE function.role = 0
GROUP BY team.teamname
This gives you the name of each team plus a total of its rows where role is equal to 0.
Use the following instead:
SELECT teamname,SUM(IF(role=0,1,0)) players
FROM function
LEFT JOIN team USING (id)
GROUP BY teamname;
This counts only those entries from 'function' that have player role (0).

Filter a SQL Server table dynamically using multiple joins

I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is, filter tables without any matches simply must not affect the resulting set. The output should only consist of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
    OR mt.id IN (select id from filter_table_1))
AND (0 = (select count(*) from filter_table_2)
    OR mt.id IN (select id from filter_table_2))
AND (0 = (select count(*) from filter_table_3)
    OR mt.id IN (select id from filter_table_3))
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working solution, I would keep it.
Do an inner join to get the results for (A) only, and do a left join to get the results for (B) only (you will have to put something like filterN.column IS NULL in the WHERE clause), then combine the results of the inner join and the left join with UNION.
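Sketched for a single hypothetical filter table (master, filter1, and the id columns are made-up names, and the emptiness check for case (B) is expressed with NOT EXISTS rather than a left join):
-- Case (A): filter1 has rows, so keep only the matching master rows.
SELECT m.*
FROM master m
INNER JOIN filter1 f ON f.id = m.id
UNION
-- Case (B): filter1 is empty, so keep every master row.
SELECT m.*
FROM master m
WHERE NOT EXISTS (SELECT 1 FROM filter1);
With more than one filter table, the per-filter results would need to be intersected rather than unioned, which is where this approach becomes awkward.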
A LEFT OUTER JOIN gives you the MISSING entries in the master table:
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
If FOREIGN has keys that are not part of MASTER, you will get "null columns" where the matches are missing.
I think that is what you are looking for.
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
Select ...
From master
Where Exists (
Select 1
From filter_1
Where filter_1 = master.col1
Union All
Select 1
From ( Select 1 As dummy ) d
Where Not Exists (
Select 1
From filter_1
)
Intersect
Select 1
From filter_2
Where filter_2 = master.col2
Union All
Select 1
From ( Select 1 As dummy ) d
Where Not Exists (
Select 1
From filter_2
)
...
Intersect
Select 1
From filter_N
Where filter_N = master.colN
Union All
Select 1
From ( Select 1 As dummy ) d
Where Not Exists (
Select 1
From filter_N
)
)
I have previously posted a - now deleted - answer based on wrong assumptions about your problem.
But I think you could go for a solution where you split your initial search problem into two parts: constructing the set of IDs from the master table, and then selecting the data by joining on that set of IDs. Here I naturally assume you have some kind of ID on your master table, and that the filter tables contain the filter values only. This can be combined into the statement below, where each SELECT in the eligible subset provides a set of master IDs, these are UNIONed to avoid duplicates, and that set of IDs is joined to the table with the data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test was made against the tables and values shown below; these are just an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, NULLIF turns a blank value in the filter column into NULL, and ISNULL then falls back to the same column from the master data table, so the row effectively joins to itself. The gist is that you have to actively populate the filter column with a blank value whenever you do not have any filter records with which you want to curtail the master data. Once the filter table holds that blank row, the JOIN condition falls through to the master column and the filter no longer restricts anything. This should work for all the cases you mentioned in your question.
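For example, the "blank row" setup described above might look like this (a sketch only; it reuses the filter_table_1/filter_column_1 names from the snippet and assumes the column is a character type):
-- When the user applies no filter, seed the table with a single blank value
-- so the ISNULL/NULLIF join falls back to the master data column.
IF NOT EXISTS (SELECT 1 FROM filter_table_1)
    INSERT INTO filter_table_1 (filter_column_1) VALUES ('');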
I have found this ISNULL/NULLIF join to be faster in terms of performance.
Hope this helps. Please let me know in the comments.

A messy SQL statement

I have a case where I want to select any database entry that has an invalid Country, Region, or Area ID. By invalid, I mean an ID for a country, region, or area that no longer exists in my tables. I have four tables: Properties, Countries, Regions, and Areas.
I was thinking of doing it like this:
SELECT * FROM Properties WHERE
Country_ID NOT IN
(
SELECT CountryID FROM Countries
)
OR
RegionID NOT IN
(
SELECT RegionID FROM Regions
)
OR
AreaID NOT IN
(
SELECT AreaID FROM Areas
)
Now, is my query right? And what do you suggest I do to achieve the same result with better performance?
Your query is in fact optimal.
The LEFT JOINs proposed by others are worse, as they select ALL values and then filter them out.
Most probably your subquery will be optimized to this:
SELECT *
FROM Properties p
WHERE NOT EXISTS
(
SELECT 1
FROM Countries i
WHERE i.CountryID = p.CountryID
)
OR
NOT EXISTS
(
SELECT 1
FROM Regions i
WHERE i.RegionID = p.RegionID
)
OR
NOT EXISTS
(
SELECT 1
FROM Areas i
WHERE i.AreaID = p.AreaID
)
, which you should use.
This query selects at most one row from each table, and jumps to the next iteration as soon as it finds that row (i.e. if it does not find a Country for a given Property, it will not even bother checking for a Region).
Again, SQL Server is smart enough to build the same plan for this query and your original one.
Update:
Tested on 512K rows in each table.
All corresponding IDs in the dimension tables are CLUSTERED PRIMARY KEYs; all measure fields in Properties are indexed.
For each row in Property, PropertyID = CountryID = RegionID = AreaID, no actual missing rows (worst case in terms of execution time).
NOT EXISTS 00:11 (11 seconds)
LEFT JOIN 01:08 (68 seconds)
You could rewrite it differently as follows:
SELECT p.*
FROM Properties p
LEFT JOIN Countries c ON p.Country_ID = c.CountryID
LEFT JOIN Regions r on p.RegionID = r.RegionID
LEFT JOIN Areas a on p.AreaID = a.AreaID
WHERE c.CountryID IS NULL
OR r.RegionID IS NULL
OR a.AreaID IS NULL
Test the performance difference (if there is any; there should be, since NOT IN is a nasty search, especially over a lot of items, as it HAS to test every single one).
You can also make this faster by indexing the IDs being searched; in each master table (Country, Region, Area) they should be clustered primary keys.
Since this seems to be cleanup SQL, this should be OK. But how about adding foreign keys so that it does not bother you next time around?
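For example, something along these lines (the constraint names are made up, and the orphaned rows would have to be cleaned up first):
ALTER TABLE Properties ADD CONSTRAINT FK_Properties_Countries
    FOREIGN KEY (Country_ID) REFERENCES Countries (CountryID);
ALTER TABLE Properties ADD CONSTRAINT FK_Properties_Regions
    FOREIGN KEY (RegionID) REFERENCES Regions (RegionID);
ALTER TABLE Properties ADD CONSTRAINT FK_Properties_Areas
    FOREIGN KEY (AreaID) REFERENCES Areas (AreaID);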
Well, you could try things like UNION (instead of OR) - but I expect that the optimizer is already doing the best it can given the information available:
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
Subqueries in the conditions can be quite inefficient. Instead, you can do left joins against the related tables. Where there is no matching record, you get a NULL value. You can use this in the condition to select only the records where a matching record is missing:
select p.*
from Properties p
left join Countries c on c.CountryID = p.Country_ID
left join Regions r on r.RegionID = p.RegionID
left join Areas a on a.AreaID = p.AreaID
where c.CountryID is null or r.RegionID is null or a.AreaID is null
If you're not grabbing the row data from countries/regions/areas you can try using "exists":
SELECT Properties.*
FROM Properties
WHERE Properties.CountryID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
OR Properties.RegionID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
OR Properties.AreaID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
This will typically hint to use the primary key indices of Countries et al. for the existence check... but whether that is an improvement depends on your data statistics; you simply have to plug it into the query analyzer and try it.