SQL - Select with specific conditions in sub-query

I'm trying to use a query to select all the data of a company that answers a specific condition but I have trouble doing so. The following is what I have done so far:
SELECT *
FROM company a
WHERE a.id IN (SELECT b.company_id
FROM provider b
WHERE b.service_id IN (2, 4));
Using the table below, I want the sub-query to select each company_id that has both service_id 2 and service_id 4.
So in this example, it would only return the company_id 5:
provider table:
+-----+------------+------------+
| id  | company_id | service_id |
+-----+------------+------------+
| 1   | 3          | 2          |
| 2   | 5          | 2          |
| 3   | 5          | 4          |
| 4   | 9          | 6          |
| 5   | 9          | 7          |
| ... | ...        | ...        |
+-----+------------+------------+
As you may have guessed, the IN in the sub-query does not do what I need: it selects company_id 5 but also company_id 3.
I understand why: IN checks whether a value matches any value in a list, so it is not really what I need.
So my question is:
How can I replace the IN in my sub-query to select company_id
having the service_id 2 and 4?

The subquery should be:
SELECT b.company_id
FROM provider b
WHERE b.service_id IN (2, 4)
GROUP BY b.company_id
HAVING COUNT(DISTINCT b.service_id) = 2
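To see the difference between the plain IN filter and the grouped version, here is a minimal sketch that runs both against the sample rows from the question. It uses Python's built-in sqlite3 module with an in-memory database; that setup and the variable names are assumptions made for illustration, not part of the original question. COUNT(DISTINCT service_id) is used so that duplicate provider rows would not inflate the count.

```python
import sqlite3

# In-memory stand-in for the provider table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provider (id INTEGER, company_id INTEGER, service_id INTEGER)")
conn.executemany(
    "INSERT INTO provider VALUES (?, ?, ?)",
    [(1, 3, 2), (2, 5, 2), (3, 5, 4), (4, 9, 6), (5, 9, 7)],
)

# Plain IN: matches companies that have service 2 OR service 4.
any_match = [r[0] for r in conn.execute(
    "SELECT DISTINCT company_id FROM provider "
    "WHERE service_id IN (2, 4) ORDER BY company_id"
)]

# IN plus GROUP BY/HAVING: keeps only companies that have BOTH services.
all_match = [r[0] for r in conn.execute(
    "SELECT company_id FROM provider "
    "WHERE service_id IN (2, 4) "
    "GROUP BY company_id "
    "HAVING COUNT(DISTINCT service_id) = 2"
)]

print(any_match)  # [3, 5]
print(all_match)  # [5]
```

The number on the right-hand side of HAVING has to equal the number of values in the IN list.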

You can self-JOIN the provider table to find companies that own both needed services.
SELECT p1.company_id
FROM provider p1
INNER JOIN provider p2 on p2.company_id = p1.company_id and p2.service_id = 2
WHERE p1.service_id = 4

The other answers are correct, but if you want to see all of the company's details instead of only the company_id, you can use one EXISTS() per service_id.
SELECT *
FROM Company C
WHERE EXISTS (SELECT 1
FROM Provider P1
WHERE C.id = P1.company_id
AND P1.service_id = 2)
AND EXISTS (SELECT 1
FROM Provider P2
WHERE C.id = P2.company_id
AND P2.service_id = 4)

I offer this option for the subquery...
SELECT b.company_id
FROM provider b WHERE b.service_id = 2
INTERSECT
SELECT b.company_id
FROM provider b WHERE b.service_id = 4
Often, I find the performance of these set operations to be outstanding, even with very large data sets: UNION, INTERSECT, and EXCEPT (or MINUS in Oracle).
This article has some good insights:
You Probably Don't Use SQL Intersect or Except Often Enough
Hope it helps.
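As a quick check that the INTERSECT and EXISTS variants agree, the sketch below runs both on the question's sample data using Python's built-in sqlite3 module. The company names ('c3', 'c5', 'c9') and the in-memory database are made up for illustration; the question only shows the provider table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE provider (id INTEGER, company_id INTEGER, service_id INTEGER);
    -- company names are hypothetical; provider rows come from the question
    INSERT INTO company VALUES (3, 'c3'), (5, 'c5'), (9, 'c9');
    INSERT INTO provider VALUES (1,3,2), (2,5,2), (3,5,4), (4,9,6), (5,9,7);
""")

# Set intersection of "has service 2" and "has service 4".
intersect_ids = [r[0] for r in conn.execute("""
    SELECT company_id FROM provider WHERE service_id = 2
    INTERSECT
    SELECT company_id FROM provider WHERE service_id = 4
""")]

# One EXISTS per required service, returning the full company row.
exists_rows = conn.execute("""
    SELECT c.id, c.name
    FROM company c
    WHERE EXISTS (SELECT 1 FROM provider p
                  WHERE p.company_id = c.id AND p.service_id = 2)
      AND EXISTS (SELECT 1 FROM provider p
                  WHERE p.company_id = c.id AND p.service_id = 4)
""").fetchall()

print(intersect_ids)  # [5]
print(exists_rows)    # [(5, 'c5')]
```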

Related

How do I transform the specific row value into column headers in hive [duplicate]

I tried to search posts, but I only found solutions for SQL Server/Access. I need a solution in MySQL (5.X).
I have a table (called history) with 3 columns: hostid, itemname, itemvalue.
If I do a select (select * from history), it will return
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1 | A | 10 |
+--------+----------+-----------+
| 1 | B | 3 |
+--------+----------+-----------+
| 2 | A | 9 |
+--------+----------+-----------+
| 2 | C | 40 |
+--------+----------+-----------+
How do I query the database to return something like
+--------+------+-----+-----+
| hostid | A | B | C |
+--------+------+-----+-----+
| 1 | 10 | 3 | 0 |
+--------+------+-----+-----+
| 2 | 9 | 0 | 40 |
+--------+------+-----+-----+
I'm going to add a somewhat longer and more detailed explanation of the steps to take to solve this problem. I apologize if it's too long.
I'll start out with the base you've given and use it to define a couple of terms that I'll use for the rest of this post. This will be the base table:
select * from history;
+--------+----------+-----------+
| hostid | itemname | itemvalue |
+--------+----------+-----------+
| 1 | A | 10 |
| 1 | B | 3 |
| 2 | A | 9 |
| 2 | C | 40 |
+--------+----------+-----------+
This will be our goal, the pretty pivot table:
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | 0 |
| 2 | 9 | 0 | 40 |
+--------+------+------+------+
Values in the history.hostid column will become y-values in the pivot table. Values in the history.itemname column will become x-values (for obvious reasons).
When I have to solve the problem of creating a pivot table, I tackle it using a three-step process (with an optional fourth step):
1. select the columns of interest, i.e. y-values and x-values
2. extend the base table with extra columns -- one for each x-value
3. group and aggregate the extended table -- one group for each y-value
4. (optional) prettify the aggregated table
Let's apply these steps to your problem and see what we get:
Step 1: select columns of interest. In the desired result, hostid provides the y-values and itemname provides the x-values.
Step 2: extend the base table with extra columns. We typically need one column per x-value. Recall that our x-value column is itemname:
create view history_extended as (
select
history.*,
case when itemname = "A" then itemvalue end as A,
case when itemname = "B" then itemvalue end as B,
case when itemname = "C" then itemvalue end as C
from history
);
select * from history_extended;
+--------+----------+-----------+------+------+------+
| hostid | itemname | itemvalue | A | B | C |
+--------+----------+-----------+------+------+------+
| 1 | A | 10 | 10 | NULL | NULL |
| 1 | B | 3 | NULL | 3 | NULL |
| 2 | A | 9 | 9 | NULL | NULL |
| 2 | C | 40 | NULL | NULL | 40 |
+--------+----------+-----------+------+------+------+
Note that we didn't change the number of rows -- we just added extra columns. Also note the pattern of NULLs -- a row with itemname = "A" has a non-null value for new column A, and null values for the other new columns.
Step 3: group and aggregate the extended table. We need to group by hostid, since it provides the y-values:
create view history_itemvalue_pivot as (
select
hostid,
sum(A) as A,
sum(B) as B,
sum(C) as C
from history_extended
group by hostid
);
select * from history_itemvalue_pivot;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | NULL |
| 2 | 9 | NULL | 40 |
+--------+------+------+------+
(Note that we now have one row per y-value.) Okay, we're almost there! We just need to get rid of those ugly NULLs.
Step 4: prettify. We're just going to replace any null values with zeroes so the result set is nicer to look at:
create view history_itemvalue_pivot_pretty as (
select
hostid,
coalesce(A, 0) as A,
coalesce(B, 0) as B,
coalesce(C, 0) as C
from history_itemvalue_pivot
);
select * from history_itemvalue_pivot_pretty;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | 0 |
| 2 | 9 | 0 | 40 |
+--------+------+------+------+
And we're done -- we've built a nice, pretty pivot table using MySQL.
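The extend, aggregate, and prettify steps can also be collapsed into a single statement. The following is a minimal sketch, assuming an in-memory SQLite database driven from Python's built-in sqlite3 module instead of MySQL views; the CASE/SUM/COALESCE logic is the same as in the steps above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (hostid INTEGER, itemname TEXT, itemvalue INTEGER);
    INSERT INTO history VALUES (1,'A',10), (1,'B',3), (2,'A',9), (2,'C',40);
""")

pivot = conn.execute("""
    SELECT hostid,
           -- extend (CASE), aggregate (SUM), prettify (COALESCE) in one pass
           COALESCE(SUM(CASE WHEN itemname = 'A' THEN itemvalue END), 0) AS A,
           COALESCE(SUM(CASE WHEN itemname = 'B' THEN itemvalue END), 0) AS B,
           COALESCE(SUM(CASE WHEN itemname = 'C' THEN itemvalue END), 0) AS C
    FROM history
    GROUP BY hostid
    ORDER BY hostid
""").fetchall()

print(pivot)  # [(1, 10, 3, 0), (2, 9, 0, 40)]
```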
Considerations when applying this procedure:
what value to use in the extra columns. I used itemvalue in this example
what "neutral" value to use in the extra columns. I used NULL, but it could also be 0 or "", depending on your exact situation
what aggregate function to use when grouping. I used sum, but count and max are also often used (max is often used when building one-row "objects" that had been spread across many rows)
using multiple columns for y-values. This solution isn't limited to using a single column for the y-values -- just plug the extra columns into the group by clause (and don't forget to select them)
Known limitations:
this solution doesn't allow n columns in the pivot table -- each pivot column needs to be manually added when extending the base table. So for 5 or 10 x-values, this solution is nice. For 100, not so nice. There are some solutions with stored procedures generating a query, but they're ugly and difficult to get right. I currently don't know of a good way to solve this problem when the pivot table needs to have lots of columns.
SELECT
hostid,
sum( if( itemname = 'A', itemvalue, 0 ) ) AS A,
sum( if( itemname = 'B', itemvalue, 0 ) ) AS B,
sum( if( itemname = 'C', itemvalue, 0 ) ) AS C
FROM
history
GROUP BY
hostid;
Another option, especially useful if you have many items to pivot, is to let MySQL build the query for you:
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'ifnull(SUM(case when itemname = ''',
itemname,
''' then itemvalue end),0) AS `',
itemname, '`'
)
) INTO @sql
FROM
history;
SET @sql = CONCAT('SELECT hostid, ', @sql, '
FROM history
GROUP BY hostid');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
I added some extra values to the test data to see it working.
The result of GROUP_CONCAT is truncated to group_concat_max_len, which defaults to 1024 in MySQL, so if the generated query is really big, raise this parameter before running it:
SET SESSION group_concat_max_len = 1000000;
Test:
DROP TABLE IF EXISTS history;
CREATE TABLE history
(hostid INT,
itemname VARCHAR(5),
itemvalue INT);
INSERT INTO history VALUES(1,'A',10),(1,'B',3),(2,'A',9),
(2,'C',40),(2,'D',5),
(3,'A',14),(3,'B',67),(3,'D',8);
hostid A B C D
1 10 3 0 0
2 9 0 40 5
3 14 67 0 8
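The same idea (discover the pivot columns from the data, generate the statement, then run it) can be sketched from application code as well. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than MySQL's PREPARE/EXECUTE, and quote_ident is a hypothetical helper written for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (hostid INTEGER, itemname TEXT, itemvalue INTEGER);
    INSERT INTO history VALUES
        (1,'A',10), (1,'B',3), (2,'A',9), (2,'C',40), (2,'D',5),
        (3,'A',14), (3,'B',67), (3,'D',8);
""")

# Step 1: discover the pivot columns from the data.
names = [r[0] for r in conn.execute(
    "SELECT DISTINCT itemname FROM history ORDER BY itemname"
)]


def quote_ident(name):
    """Hypothetical helper: quote a value for use as a column alias."""
    return '"' + name.replace('"', '""') + '"'


# Step 2: build one aggregate expression per item name. The name itself is
# passed as a bound parameter; only the quoted alias is spliced into the SQL.
columns = ", ".join(
    f"COALESCE(SUM(CASE WHEN itemname = ? THEN itemvalue END), 0) AS {quote_ident(n)}"
    for n in names
)
sql = f"SELECT hostid, {columns} FROM history GROUP BY hostid ORDER BY hostid"

# Step 3: run the generated statement.
rows = conn.execute(sql, names).fetchall()

print(names)  # ['A', 'B', 'C', 'D']
print(rows)   # [(1, 10, 3, 0, 0), (2, 9, 0, 40, 5), (3, 14, 67, 0, 8)]
```

Binding the item names as parameters keeps the data values out of the SQL text, which is harder to do with the pure GROUP_CONCAT approach.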
Taking advantage of Matt Fenwick's idea, which helped me solve the problem (many thanks), let's reduce it to a single query:
select
hostid,
coalesce(sum(case when itemname = "A" then itemvalue end), 0) as A,
coalesce(sum(case when itemname = "B" then itemvalue end), 0) as B,
coalesce(sum(case when itemname = "C" then itemvalue end), 0) as C
from history
group by hostid
I edited Agung Sagita's answer to use joins instead of subqueries.
I'm not sure how much difference there is between the two approaches, but here it is as another reference.
SELECT DISTINCT T1.hostid, T2.VALUE AS A, T3.VALUE AS B, T4.VALUE AS C
FROM TableTest AS T1
LEFT JOIN TableTest T2 ON T2.hostid=T1.hostid AND T2.ITEMNAME='A'
LEFT JOIN TableTest T3 ON T3.hostid=T1.hostid AND T3.ITEMNAME='B'
LEFT JOIN TableTest T4 ON T4.hostid=T1.hostid AND T4.ITEMNAME='C'
Use subqueries:
SELECT hostid,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='A' AND hostid = t1.hostid) AS A,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='B' AND hostid = t1.hostid) AS B,
(SELECT VALUE FROM TableTest WHERE ITEMNAME='C' AND hostid = t1.hostid) AS C
FROM TableTest AS T1
GROUP BY hostid
However, this fails if a subquery returns more than one row; in that case, use an aggregate function inside the subquery.
If you can use MariaDB, there is a very easy solution.
Since MariaDB 10.0.2 there is a storage engine called CONNECT that can convert the results of another query or table into a pivot table, which is exactly what you want:
You can have a look at the docs.
First of all install the connect storage engine.
Now the pivot column of our table is itemname and the data for each item is located in itemvalue column, so we can have the result pivot table using this query:
create table pivot_table
engine=connect table_type=pivot tabname=history
option_list='PivotCol=itemname,FncCol=itemvalue';
Now we can select what we want from the pivot_table:
select * from pivot_table
More details here
My solution :
select h.hostid, sum(ifnull(h.A,0)) as A, sum(ifnull(h.B,0)) as B, sum(ifnull(h.C,0)) as C from (
select
hostid,
case when itemName = 'A' then itemvalue end as A,
case when itemName = 'B' then itemvalue end as B,
case when itemName = 'C' then itemvalue end as C
from history
) h group by hostid
It produces the expected results in the submitted case.
When I change that to GROUP BY hostid, it shows only the first row that has values, like:
A B C
1 10
2 3
I figured out a way to make my reports convert rows to columns almost dynamically, using simple queries. You can see and test it online here.
The number of columns in the query is fixed, but the values are dynamic and based on the row values. I use one query to build the table header and another one to retrieve the values:
SELECT distinct concat('<th>',itemname,'</th>') as column_name_table_header FROM history order by 1;
SELECT
hostid
,(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue else '' end) as col1
,(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue else '' end) as col2
,(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue else '' end) as col3
,(case when itemname = (select distinct itemname from history a order by 1 limit 3,1) then itemvalue else '' end) as col4
FROM history order by 1;
You can summarize it, too:
SELECT
hostid
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 0,1) then itemvalue end) as A
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 1,1) then itemvalue end) as B
,sum(case when itemname = (select distinct itemname from history a order by 1 limit 2,1) then itemvalue end) as C
FROM history group by hostid order by 1;
+--------+------+------+------+
| hostid | A | B | C |
+--------+------+------+------+
| 1 | 10 | 3 | NULL |
| 2 | 9 | NULL | 40 |
+--------+------+------+------+
Results of RexTester:
http://rextester.com/ZSWKS28923
As a real example of use, the report below shows the departure and arrival hours of boats/buses in columns, as a visual schedule. You will see one additional, unused column at the end, which does not confuse the visualization:
** a ticketing system for selling tickets online and in person
This isn't the exact answer you are looking for, but it was a solution that I needed on my project and I hope it helps someone. It lists 1 to n row items separated by commas. GROUP_CONCAT makes this possible in MySQL.
select
cemetery.cemetery_id as "Cemetery_ID",
GROUP_CONCAT(distinct(names.name)) as "Cemetery_Name",
cemetery.latitude as Latitude,
cemetery.longitude as Longitude,
c.Contact_Info,
d.Direction_Type,
d.Directions
from cemetery
left join cemetery_names on cemetery.cemetery_id = cemetery_names.cemetery_id
left join names on cemetery_names.name_id = names.name_id
left join cemetery_contact on cemetery.cemetery_id = cemetery_contact.cemetery_id
left join
(
select
cemetery_contact.cemetery_id as cID,
group_concat(contacts.name, char(32), phone.number) as Contact_Info
from cemetery_contact
left join contacts on cemetery_contact.contact_id = contacts.contact_id
left join phone on cemetery_contact.contact_id = phone.contact_id
group by cID
)
as c on c.cID = cemetery.cemetery_id
left join
(
select
cemetery_id as dID,
group_concat(direction_type.direction_type) as Direction_Type,
group_concat(directions.value , char(13), char(9)) as Directions
from directions
left join direction_type on directions.type = direction_type.direction_type_id
group by dID
)
as d on d.dID = cemetery.cemetery_id
group by Cemetery_ID
This cemetery has two common names so the names are listed in different rows connected by a single id but two name ids and the query produces something like this
CemeteryID Cemetery_Name Latitude
1 Appleton,Sulpher Springs 35.4276242832293
You can use a couple of LEFT JOINs. Kindly use this code
SELECT t.hostid,
COALESCE(t1.itemvalue, 0) A,
COALESCE(t2.itemvalue, 0) B,
COALESCE(t3.itemvalue, 0) C
FROM history t
LEFT JOIN history t1
ON t1.hostid = t.hostid
AND t1.itemname = 'A'
LEFT JOIN history t2
ON t2.hostid = t.hostid
AND t2.itemname = 'B'
LEFT JOIN history t3
ON t3.hostid = t.hostid
AND t3.itemname = 'C'
GROUP BY t.hostid
I'm sorry to say this, and maybe I'm not solving your problem exactly, but PostgreSQL is 10 years older than MySQL and is extremely advanced compared to MySQL, and there are many ways to achieve this easily. Install PostgreSQL and execute this query
CREATE EXTENSION tablefunc;
then voila! And here's extensive documentation: PostgreSQL: Documentation: 9.1: tablefunc or this query
CREATE EXTENSION hstore;
then again voila! PostgreSQL: Documentation: 9.0: hstore

Comparing different columns in SQL for each row

After some transformation, I have a result from a cross join (of tables a and b) that I want to analyze. The table looks like this:
+-----+------+------+------+------+-----+------+------+------+------+
| id | 10_1 | 10_2 | 11_1 | 11_2 | id | 10_1 | 10_2 | 11_1 | 11_2 |
+-----+------+------+------+------+-----+------+------+------+------+
| 111 | 1 | 0 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
| 111 | 1 | 0 | 1 | 0 | 333 | 0 | 0 | 0 | 0 |
| 111 | 1 | 0 | 1 | 0 | 444 | 1 | 0 | 1 | 1 |
| 112 | 0 | 1 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
+-----+------+------+------+------+-----+------+------+------+------+
The ids in the first column are different from the ids in the sixth column.
In a row are always two different IDs that are matched with each other. The other columns always have either 0 or 1 as a value.
I am now trying to find out how many values(meaning both have "1" in 10_1, 10_2 etc) two IDs have on average in common, but I don't really know how to do so.
I was trying something like this as a start:
SELECT SUM(CASE WHEN a.10_1 = 1 AND b.10_1 = 1 then 1 end)
But this would obviously only count how often two ids have 10_1 in common. I could make something like this for example for different columns:
SELECT SUM(CASE WHEN (a.10_1 = 1 AND b.10_1 = 1)
OR (a.10_2 = 1 AND b.10_2 = 1) OR [...] then 1 end)
To count in general how often two IDs have one thing in common, but this would of course also count if they have two or more things in common. Plus, I would also like to know how often two IDS have two things, three things etc in common.
One "problem" in my case is also that I have like ~30 columns I want to look at, so I can hardly write down for each case every possible combination.
Does anyone know how I can approach my problem in a better way?
Thanks in advance.
Edit:
A possible result could look like this:
+-----------+---------+
| in_common | count |
+-----------+---------+
| 0 | 100 |
| 1 | 500 |
| 2 | 1500 |
| 3 | 5000 |
| 4 | 3000 |
+-----------+---------+
With the codes as column names, you're going to have to write some code that explicitly references each column name. To keep that to a minimum, you could write those references in a single union statement that normalizes the data, such as:
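-- src is a placeholder name for your source table
select id, '10_1' as code from src where "10_1" = 1
union
select id, '10_2' as code from src where "10_2" = 1
union
select id, '11_1' as code from src where "11_1" = 1
union
select id, '11_2' as code from src where "11_2" = 1;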
This needs to be modified to include whatever additional columns you need to link up different IDs. For the purpose of this illustration, I assume the following data model
create table p (
id integer not null primary key,
sex character(1) not null,
age integer not null
);
create table t1 (
id integer not null,
code character varying(4) not null,
constraint pk_t1 primary key (id, code)
);
Though your data evidently does not currently resemble this structure, normalizing your data into a form like this would allow you to apply the following solution to summarize your data in the desired form.
select
in_common,
count(*) as count
from (
select
count(*) as in_common
from (
select
a.id as a_id, a.code,
b.id as b_id, b.code
from
(select p.*, t1.code
from p left join t1 on p.id=t1.id
) as a
inner join (select p.*, t1.code
from p left join t1 on p.id=t1.id
) as b on b.sex <> a.sex and b.age between a.age-10 and a.age+10
where
a.id < b.id
and a.code = b.code
) as c
group by
a_id, b_id
) as summ
group by
in_common;
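To make the effect of normalizing concrete, here is a small sketch that computes the in_common histogram for the four pairs shown in the question. Everything about the setup is an assumption for illustration: the tables a_codes, b_codes and pairs are hypothetical names, the flags have already been unpivoted into (id, code) rows, and the queries run in an in-memory SQLite database via Python's built-in sqlite3 module. Unlike the query above, it uses LEFT JOINs so that pairs with nothing in common are counted as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical normalized form: one row per (id, code) whose flag is 1
    CREATE TABLE a_codes (id INTEGER, code TEXT, PRIMARY KEY (id, code));
    CREATE TABLE b_codes (id INTEGER, code TEXT, PRIMARY KEY (id, code));
    -- the id pairs that appear in the question's cross-join sample
    CREATE TABLE pairs (a_id INTEGER, b_id INTEGER);
    INSERT INTO a_codes VALUES (111,'10_1'), (111,'11_1'), (112,'10_2'), (112,'11_1');
    INSERT INTO b_codes VALUES (222,'10_1'), (222,'11_1'),
                               (444,'10_1'), (444,'11_1'), (444,'11_2');
    INSERT INTO pairs VALUES (111,222), (111,333), (111,444), (112,222);
""")

histogram = conn.execute("""
    SELECT in_common, COUNT(*) AS pair_count
    FROM (
        -- COUNT(b.code) ignores NULLs, so unmatched codes contribute 0
        SELECT p.a_id, p.b_id, COUNT(b.code) AS in_common
        FROM pairs p
        LEFT JOIN a_codes a ON a.id = p.a_id
        LEFT JOIN b_codes b ON b.id = p.b_id AND b.code = a.code
        GROUP BY p.a_id, p.b_id
    ) AS per_pair
    GROUP BY in_common
    ORDER BY in_common
""").fetchall()

print(histogram)  # [(0, 1), (1, 1), (2, 2)]
```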
The proposed solution requires first to take one step back from the cross-join table, as the identical column names are super annoying. Instead, we take the ids from the two tables and put them in a temporary table. The following query gets the result wanted in the question. It assumes table_a and table_b from the question are the same and called tbl, but this assumption is not needed and tbl can be replaced by table_a and table_b in the two sub-SELECT queries. It looks complicated and uses the JSON trick to flatten the columns, but it works here:
WITH idtable AS (
SELECT a.id as id_1, b.id as id_2 FROM
-- put cross join of table a and table b here
)
SELECT in_common,
count(*)
FROM
(SELECT idtable.*,
sum(CASE
WHEN meltedR.value::text=meltedL.value::text THEN 1
ELSE 0
END) AS in_common
FROM idtable
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_a
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedL ON (idtable.id_1 = meltedL.id)
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_b
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedR ON (idtable.id_2 = meltedR.id
AND meltedL.key = meltedR.key)
GROUP BY idtable.id_1,
idtable.id_2) tt
GROUP BY in_common ORDER BY in_common;
The output here looks like this:
in_common | count
-----------+-------
2 | 2
3 | 1
4 | 1
(3 rows)

Left Joining table with values in lookup table

I have two tables on SQL-Server. One containing clients, and one a client profile lookup table. So a bit like this (note that Fred doesn't have any values in the lookup table):
Table: Clients
ID | Name  | Status
-----------------------
1  | John  | Current
2  | Peter | Past
3  | Fred  | Current

Table: Profile
ClientID | Type | Value
-----------------------
1        | x    | 1
1        | y    | 2
2        | x    | 3
2        | y    | 4
I then am trying to create a tmp table that needs to contain all current clients like this:
ID | Name | TypeY
==================
1 | John | 2
3 | Fred |
My knowledge of SQL is limited, but I think I should be able to do this with a Left Join, so I tried this (#tmpClient is already created):
insert into #tmpClient
select a.ID, a.Name, b.Value
from Clients a
left join Profile b
on a.ID = b.ClientID
where a.Status = 'Current' and b.Type = 'y'
However this will always miss Fred out of the temporary table. I am probably doing something very simple wrong, but as I said I am missing the SQL skills to work this one out. Please can someone help me with getting this query right.
You have to move the predicate concerning the second table of the LEFT JOIN operation from WHERE to ON clause:
insert into #tmpClient
select a.ID, a.Name, b.Value
from Clients a
left join Profile b
on a.ID = b.ClientID and b.Type = 'y'
where a.Status = 'Current'
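The reason this works: a condition on the right-hand table in WHERE is evaluated after the join, so the NULL-extended row for Fred is filtered out, whereas the same condition in ON only restricts which Profile rows can match. A minimal sketch of both variants, assuming an in-memory SQLite database (via Python's built-in sqlite3 module) in place of SQL Server and plain SELECTs in place of the insert into #tmpClient:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (ID INTEGER, Name TEXT, Status TEXT);
    CREATE TABLE Profile (ClientID INTEGER, Type TEXT, Value INTEGER);
    INSERT INTO Clients VALUES (1,'John','Current'), (2,'Peter','Past'), (3,'Fred','Current');
    INSERT INTO Profile VALUES (1,'x',1), (1,'y',2), (2,'x',3), (2,'y',4);
""")

# Filter on the right-hand table in WHERE: NULL-extended rows are discarded.
in_where = conn.execute("""
    SELECT a.ID, a.Name, b.Value
    FROM Clients a
    LEFT JOIN Profile b ON a.ID = b.ClientID
    WHERE a.Status = 'Current' AND b.Type = 'y'
    ORDER BY a.ID
""").fetchall()

# Same filter moved into ON: clients without a match survive with NULL.
in_on = conn.execute("""
    SELECT a.ID, a.Name, b.Value
    FROM Clients a
    LEFT JOIN Profile b ON a.ID = b.ClientID AND b.Type = 'y'
    WHERE a.Status = 'Current'
    ORDER BY a.ID
""").fetchall()

print(in_where)  # [(1, 'John', 2)]
print(in_on)     # [(1, 'John', 2), (3, 'Fred', None)]
```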

MySQL query help (involving joins?)

Although I've figured out several queries that almost do this, I can't quite get it perfectly and I'm getting frustrated. Here is the setup:
Table: Issue
| id | name | value |
+-------------------+
| 1 | a | 10 |
| 2 | b | 3 |
| 3 | c | 4 |
| 4 | d | 9 |
Table: Link
| source | dest |
+---------------+
| 1 | 2 |
| 1 | 3 |
The link table sets up a source/dest relationship between rows in the issue table. Yes, I know this is normalized terribly, but I did not create this schema even though I now have to write queries against it :(.
What I want is results that look like this:
| name | value |
+--------------+
| a | 17 |
| d | 9 |
The values in the results should be the sum of the values in the issue table when you aggregate together a source with all its dests along with the name of the source.
Some notes
(1) A source->dest is a one->many relationship.
(2) The best answer will not have any hardcoded id's or names in the query (meaning, it will be generalized for all setups like this).
(3) This is in MySQL
Thank you and let me know if I should include any more information
It's fairly simple; the sticking point is that A is not listed as a destination of A, yet its own value is included in the result. The robust solution would involve modifying the data to add a self-link:
Table: Link
| source | dest |
+---------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
Then a simple
SELECT a.name, SUM(d.value) FROM
Issue AS a
JOIN Link AS b ON a.id = b.source
JOIN Issue AS d ON b.dest = d.id
GROUP BY a.name;
If you can't modify the data,
SELECT a.name, SUM(d.value) + a.value FROM
Issue AS a
JOIN Link AS b ON a.id = b.source
JOIN Issue AS d ON b.dest = d.id
GROUP BY a.name, a.value;
may work.
SELECT S.name, S.value + SUM(D.value) as value
FROM Link AS L
LEFT JOIN Issue AS S ON L.source = S.id
LEFT JOIN Issue AS D ON L.dest = D.id
GROUP BY S.name, S.value
You could use a double join to find all linked rows, and add the sum to the value of the source row itself:
select src.name, src.value + coalesce(sum(dest.value), 0)
from Issue src
left join Link l
on l.source = src.id
left join Issue dest
on dest.id = l.dest
group by src.name, src.value
This one should return the SUM of both source and dests, and only return items which are source.
SELECT s.name, COALESCE( SUM(d.value), 0 ) + s.value value
FROM Issue s
LEFT JOIN Link l ON ( l.source = s.id )
LEFT JOIN Issue d ON ( d.id = l.dest )
WHERE s.id NOT IN ( SELECT dest FROM Link )
GROUP BY s.name, s.value
ORDER BY s.name;
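Running this last query against the sample data from the question gives the requested result. The sketch below assumes an in-memory SQLite database driven from Python's built-in sqlite3 module as a stand-in for MySQL; the query text itself is unchanged apart from layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Issue (id INTEGER, name TEXT, value INTEGER);
    CREATE TABLE Link (source INTEGER, dest INTEGER);
    INSERT INTO Issue VALUES (1,'a',10), (2,'b',3), (3,'c',4), (4,'d',9);
    INSERT INTO Link VALUES (1,2), (1,3);
""")

rows = conn.execute("""
    SELECT s.name, COALESCE(SUM(d.value), 0) + s.value AS value
    FROM Issue s
    LEFT JOIN Link l ON l.source = s.id
    LEFT JOIN Issue d ON d.id = l.dest
    -- drop rows that are themselves a destination (b and c)
    WHERE s.id NOT IN (SELECT dest FROM Link)
    GROUP BY s.name, s.value
    ORDER BY s.name
""").fetchall()

print(rows)  # [('a', 17), ('d', 9)]
```

The COALESCE matters for 'd': it has no links, so SUM(d.value) is NULL and the addition would otherwise yield NULL.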

SQL searching for rows that contain multiple criteria

I have 3 tables
Customer
Groups
CustomerGroupJoins
Fields to be used
Customer:Key
Groups:Key
CustomerGroupJoins:KeyCustomer, KeyGroup
I need to find all customers that are in all of the groups with keys 1, 2 and 3.
I was thinking something like (but have no idea whether this is the right/best way to go):
SELECT
*
FROM
Customer
WHERE
Key = (
SELECT KeyCustomer
FROM CustomerGroupJoins
WHERE KeyGroup = a
) = (
SELECT KeyCustomer
FROM CustomerGroupJoins
WHERE KeyGroup = b
) = (
SELECT KeyCustomer
FROM CustomerGroupJoins
WHERE KeyGroup = c
)
I created this test data:
srh@srh@[local] =# select * from customer join customergroupjoins on customer.key = customergroupjoins.keycustomer join groups on groups.key = customergroupjoins.keygroup;
key | name | keycustomer | keygroup | key | name
-----+--------+-------------+----------+-----+---------
1 | fred | 1 | 1 | 1 | alpha
1 | fred | 1 | 2 | 2 | beta
1 | fred | 1 | 3 | 3 | gamma
2 | jim | 2 | 1 | 1 | alpha
2 | jim | 2 | 2 | 2 | beta
2 | jim | 2 | 4 | 4 | delta
2 | jim | 2 | 5 | 5 | epsilon
3 | shelia | 3 | 1 | 1 | alpha
3 | shelia | 3 | 3 | 3 | gamma
3 | shelia | 3 | 5 | 5 | epsilon
(10 rows)
So "fred" is the only customer in all of (alpha, beta, gamma). To determine that:
srh@srh@[local] =# select * from customer
where exists (select 1 from customergroupjoins where keycustomer = customer.key and keygroup = 1)
and exists (select 1 from customergroupjoins where keycustomer = customer.key and keygroup = 2)
and exists (select 1 from customergroupjoins where keycustomer = customer.key and keygroup = 3);
key | name
-----+------
1 | fred
(1 row)
This is one approach. The (1,2,3) - your known group keys - are the parameters in the subqueries. Someone already mentioned you don't actually need to join to the groups table at all.
Another way:
select customer.*
from customer
join customergroupjoins g1 on g1.keycustomer = customer.key
join customergroupjoins g2 on g2.keycustomer = customer.key
join customergroupjoins g3 on g3.keycustomer = customer.key
where g1.keygroup = 1 and g2.keygroup = 2 and g3.keygroup = 3
The general problem of finding users with all groups (g_1, g_2 .. g_N) is a bit trickier. The queries above join to the link table (customergroupjoins) N times, so the query is different depending on the number of groups you're checking against.
One approach is to create a temporary table to use as a query parameter: the table contains the list of groups that the customers must have all of. So, for instance, create a temp table called "paramgroup" (or "#paramgroup" on SQL Server to mark it as temporary), populate it with the group keys you're interested in, and then do this:
select * from customer where key in (
select keycustomer
from customergroupjoins
join paramgroup on paramgroup.keygroup = customergroupjoins.keygroup
group by keycustomer
having count(*) = (select count(*) from paramgroup))
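Here is a minimal sketch of the parameter-table approach using the test data above. It assumes an in-memory SQLite database driven from Python's built-in sqlite3 module; customers_in_all is a hypothetical helper written for this sketch, and the composite primary key on customergroupjoins is an assumption that guarantees a customer cannot be counted twice for the same group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customergroupjoins (keycustomer INTEGER, keygroup INTEGER,
                                     PRIMARY KEY (keycustomer, keygroup));
    CREATE TEMP TABLE paramgroup (keygroup INTEGER PRIMARY KEY);
    INSERT INTO customer VALUES (1,'fred'), (2,'jim'), (3,'shelia');
    INSERT INTO customergroupjoins VALUES
        (1,1), (1,2), (1,3),
        (2,1), (2,2), (2,4), (2,5),
        (3,1), (3,3), (3,5);
""")


def customers_in_all(groups):
    """Hypothetical helper: names of customers that belong to every group key given."""
    # Reload the parameter table with the requested (distinct) group keys.
    conn.execute("DELETE FROM paramgroup")
    conn.executemany("INSERT INTO paramgroup VALUES (?)", [(g,) for g in groups])
    return [r[0] for r in conn.execute("""
        SELECT name FROM customer WHERE "key" IN (
            SELECT j.keycustomer
            FROM customergroupjoins j
            JOIN paramgroup p ON p.keygroup = j.keygroup
            GROUP BY j.keycustomer
            HAVING COUNT(*) = (SELECT COUNT(*) FROM paramgroup))
        ORDER BY name
    """)]


print(customers_in_all([1, 2, 3]))  # ['fred']
print(customers_in_all([1, 5]))     # ['jim', 'shelia']
```

The query text never changes; only the contents of paramgroup do, which is what makes this version independent of the number of groups.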
Also, as a beginner, I strongly recommend you look into advice about naming conventions for database tables and columns. Everyone has different ideas (and they can spark off holy wars), but pick some standards (if they aren't dictated to you) and stick to them. For instance you named one table "customer" (singular) and one table "groups" (plural) which looks bad. It's more usual to use "id" rather than "key", and to use it as a suffix ("customer_id" or "CustomerID") than a prefix. The whole CamelCase vs old_skool argument is more a matter of style, as is the primary-key-is-just-"id"-not-"table_id".
The above solutions will work if the customer is in any of the three groups, but won't check for membership in all of them.
Try this instead:
SELECT a.*
FROM (SELECT c.*, substring((SELECT (', ' + CAST(cg.KeyGroup AS varchar(10)))
FROM CustomerGroupJoins cg
WHERE cg.KeyCustomer = c.[Key]
AND cg.KeyGroup IN (1,2,3)
ORDER BY cg.KeyGroup ASC
FOR XML PATH('')), 3, 2000) AS GroupList
FROM Customer AS c) AS a
WHERE a.GroupList = ('1, 2, 3')
This will also work:
SELECT c.*
FROM Customer c
WHERE c.[Key] IN (SELECT cg.KeyCustomer
FROM CustomerGroupJoins cg
WHERE cg.KeyGroup IN (1,2,3)
GROUP BY cg.KeyCustomer
HAVING count(*) = 3)
Maybe something like this?
SELECT c.Key, g.Key, cgj.KeyCustomer, cgj.KeyGroup
FROM Customer c
LEFT JOIN CustomerGroupJoins cgj ON cgj.KeyCustomer = c.Key
LEFT JOIN Groups g ON g.Key = cgj.KeyGroup
WHERE g.key IN (1, 2, 3)
From what you described, try this:
SELECT * FROM Customer c
INNER JOIN CustomerGroupJoins cgj
ON c.key = cgj.keyCustomer
INNER JOIN groups g
ON cgj.keyGroup = g.key
WHERE g.key IN (1,2,3)
SELECT *
FROM customer c
INNER JOIN customerGroupJoins j ON (j.KeyCustomer = c.key)
WHERE j.keyGroup IN (1, 2, 3)
You don't need to join against groups-table, as long as you are only interested in the group key, which is found in your join table.
Here's a possible answer, not tested:
select custid
from CustomerGroupJoins
where groupid in (1,2,3)
group by custid
having count(*) = 3
This finds customers that have 3 rows with groupid 1, 2, or 3, which means they are in all 3 groups, assuming you have a primary key on (custid, groupid).