I have three tables in a many-to-many arrangement, i.e., tables A, B, and AB set up as you'd expect.
Given some set of A ids, I need to select only the rows in AB that match all of the ids.
Something like the following won't work:
"SELECT * FROM AB WHERE A_id = 1 AND A_id = 2 AND A_id = 3 AND ... "
As no single row will have more than one A_id.
Using an OR in the SQL statement is no better, as it yields all rows that have at least one of the A ids (whereas I only want those that have all of the ids).
Edit:
Sorry, I should explain. I don't know if the actual many-to-many relationship is relevant to the actual problem. The tables are outlined as follows:
Table People
int id
char name
Table Options
int id
char option
Table peoples_options
int id
int people_id
int option_id
And so I have a list of people, and a list of options, and a table of options and people.
So, given a list of option ids such as (1, 34, 44, ...), I need to select only those people that have all the options.
Your database doesn't appear to be normalized correctly. Your AB table should have a single A_id and a single B_id in each of its rows. If that were the case, your OR-version should work (although I would use IN myself).
Ignore the preceding paragraph. From your edit, you really wanted to know all the B's that have all of a subset of A's in the many-to-many table - see below for the query.
Please tell us the actual schema details, it's a little hard to figure out what you want without that.
I'd expect to see something like:
table a:
a_id integer
a_payload varchar(20)
table b:
b_id integer
b_payload varchar(20)
table ab:
a_id integer
b_id integer
Based on your description, the only thing I can think of is that you want a list of all the B's that have all of a set of A's in the AB table. In which case, you're looking at something like (to get the list of B's that have A's of 1, 3 and 4):
select distinct b_id from ab n1
where exists (select b_id from ab where a_id = 1 and b_id = n1.b_id)
and exists (select b_id from ab where a_id = 3 and b_id = n1.b_id)
and exists (select b_id from ab where a_id = 4 and b_id = n1.b_id);
This works in DB2 but I'm not sure how much of SQL your chosen server implements.
A bit of a hacky solution is to use IN with a group by and having filter. Like so:
SELECT B_id FROM AB
WHERE A_id IN (1,2,3)
GROUP BY B_id
HAVING COUNT(DISTINCT A_id) = 3;
That way, you only get the B_id values that have exactly 3 A_id values, and they have to be from your list. I used DISTINCT in the COUNT just in case (A_id, B_id) isn't unique. If you need other columns, you could then join to this query as a sub-select in the FROM clause of another select statement.
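The GROUP BY / HAVING approach above can be tried out with a small sketch. It assumes SQLite through Python's built-in sqlite3 module (the thread does not name a database), and the sample rows and the helper name b_ids_having_all are made up for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ab (a_id INTEGER NOT NULL, b_id INTEGER NOT NULL);
    INSERT INTO ab (a_id, b_id) VALUES
        (1, 10), (2, 10), (3, 10),           -- b 10 has all of 1, 2, 3
        (1, 20), (2, 20),                    -- b 20 is missing 3
        (1, 30), (2, 30), (3, 30), (4, 30);  -- b 30 has all three plus an extra
""")


def b_ids_having_all(conn, wanted_a_ids):
    """Return the b_ids linked to every a_id in wanted_a_ids (hypothetical helper)."""
    wanted = sorted(set(wanted_a_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT b_id FROM ab
        WHERE a_id IN ({placeholders})
        GROUP BY b_id
        HAVING COUNT(DISTINCT a_id) = ?
        ORDER BY b_id
    """
    # The count in HAVING is derived from the list, so the two cannot drift apart.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


matching_b_ids = b_ids_having_all(conn, [1, 2, 3])
print(matching_b_ids)
```

Deriving the HAVING count from the list itself avoids the easy mistake of changing the IN list and forgetting to update the number.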
Try this. It returns all people who are associated with every option.
In other words, the query returns each person for whom there is no option that is not associated with them. To test against a given list rather than every option, add a filter such as o.id in (1, 34, 44) to the middle subquery.
select
p.*
from
people p
where
not exists (
select
1
from
options o
where
not exists
(
select 1
from peoples_options po
where po.people_id = p.id
AND po.option_id = o.id
)
)
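A sketch of the double NOT EXISTS query, restricted to a given list of option ids, might look like the following. It assumes SQLite through Python's sqlite3 module and the People / Options / peoples_options schema from the question; the sample rows and the helper name people_with_all_options are invented for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database with the schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE options (id INTEGER PRIMARY KEY, "option" TEXT);
    CREATE TABLE peoples_options (
        id INTEGER PRIMARY KEY,
        people_id INTEGER NOT NULL,
        option_id INTEGER NOT NULL
    );
    INSERT INTO people (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO options (id, "option") VALUES (1, 'x'), (34, 'y'), (44, 'z');
    INSERT INTO peoples_options (people_id, option_id) VALUES
        (1, 1), (1, 34), (1, 44),   -- Ann has all three options
        (2, 1), (2, 44);            -- Bob lacks option 34
""")


def people_with_all_options(conn, option_ids):
    """Names of people for whom no wanted option is missing (hypothetical helper)."""
    placeholders = ", ".join("?" for _ in option_ids)
    sql = f"""
        SELECT p.name
        FROM people p
        WHERE NOT EXISTS (
            SELECT 1
            FROM options o
            WHERE o.id IN ({placeholders})
              AND NOT EXISTS (
                  SELECT 1
                  FROM peoples_options po
                  WHERE po.people_id = p.id
                    AND po.option_id = o.id
              )
        )
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql, tuple(option_ids))]


all_three = people_with_all_options(conn, [1, 34, 44])
just_two = people_with_all_options(conn, [1, 44])
print(all_three, just_two)
```

Note that an id in the list that does not exist in the options table is silently ignored by this form, because the middle subquery only iterates over rows that are actually in options.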
I used a query to find a list of primary keys, one primary key per foreign key in a table, using the query below.
select foreignKey, min(primaryKey)
from t
group by foreignKey;
Let us say this is the result: 1, 4, 5
Now I have another table, Table B, that has a list of all primary keys. It has 1, 2, 3, 6, 7, 8, 9
I want to write a query, based on the query above, that returns the subset of its results that does not exist in Table B. I want 4 and 5 back from the new query.
Use a having clause:
select foreignKey, min(primaryKey)
from t
group by foreignKey
having min(primarykey) not in (select pk from b);
You should also be able to express this as not exists:
having not exists (select 1
from b
where b.pk = min(t.primaryKey)
)
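To check the HAVING ... NOT IN form against the numbers in the question, here is a small sketch. It assumes SQLite through Python's sqlite3 module; the foreign key values 100, 200, and 300 are invented, and only the primary key values follow the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database; foreign key values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (primaryKey INTEGER PRIMARY KEY, foreignKey INTEGER NOT NULL);
    CREATE TABLE b (pk INTEGER PRIMARY KEY);
    -- The minimum primary key per foreign key is 1, 4, and 5.
    INSERT INTO t (primaryKey, foreignKey) VALUES
        (1, 100), (2, 100), (4, 200), (6, 200), (5, 300);
    INSERT INTO b (pk) VALUES (1), (2), (3), (6), (7), (8), (9);
""")

rows = conn.execute("""
    SELECT foreignKey, MIN(primaryKey) AS first_pk
    FROM t
    GROUP BY foreignKey
    HAVING MIN(primaryKey) NOT IN (SELECT pk FROM b)
    ORDER BY foreignKey
""").fetchall()

print(rows)
```

The filter has to live in HAVING rather than WHERE because it tests the aggregated value, which only exists after grouping.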
Postgresql:
I have two tables, 'abc' and 'xyz', in PostgreSQL. Both tables have the same 'id' column, whose type is 'serial primary key'.
The abc id column holds the values 1,2,3 and the xyz id column holds 1,2,3,4.
I want to combine both tables with 'union all', but with the 'xyz' id values shifted so that they continue after the last 'abc' id, giving 1,2,3,4,5,6,7.
select id from abc
union all
select id from xyz
|id|
1
2
3
1
2
3
4
My wanted result is:
|id|
1
2
3
4
5
6
7
BETTER - Thanks to @CaiusJard
This should do it for you
select id FROM abc
UNION ALL select x.id + a.maxid FROM xyz x,
(SELECT MAX(id) as maxid from abc) a
ORDER BY id
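The offset query above can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module rather than PostgreSQL, writes the comma join as an explicit CROSS JOIN, and adds a COALESCE so that an empty abc table yields an offset of 0; that COALESCE is an addition for illustration, not part of the answer.

```python
import sqlite3

# Assumption: an in-memory SQLite database standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE abc (id INTEGER PRIMARY KEY);
    CREATE TABLE xyz (id INTEGER PRIMARY KEY);
    INSERT INTO abc (id) VALUES (1), (2), (3);
    INSERT INTO xyz (id) VALUES (1), (2), (3), (4);
""")

ids = [row[0] for row in conn.execute("""
    SELECT id FROM abc
    UNION ALL
    SELECT x.id + a.maxid
    FROM xyz x
    CROSS JOIN (SELECT COALESCE(MAX(id), 0) AS maxid FROM abc) a
    ORDER BY 1
""")]

print(ids)
```

The shifted ids only line up as 1 to 7 here because both tables are gap-free; with gaps in xyz, the result keeps those gaps, just moved past MAX(abc.id).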
For anyone who's doing something like this:
I had a similar problem to this: I had table A and table B, which had two different serials. My solution was to create a new table C which was identical to table B except that it had an "oldid" column, and its id column was set to use the same sequence as table A. I then inserted all the data from table B into table C (putting the id in the oldid field). Once I had fixed the references to point to the (new) id instead of the oldid, I was able to drop the oldid column.
In my case I needed to fix the old relations, and needed the ids to remain unique in the future (but I don't care whether the ids from table A all come before those from table C). Depending on what you're trying to accomplish, this approach may be useful.
If anyone is going to use this approach, strictly speaking, there should be a trigger to prevent someone from manually setting an id in one table to match another. You should also alter the sequence to be owned by NONE so it's not dropped with table A, if table A is ever dropped.
I'm using PostgreSQL 9.3, and I'm trying to write a SQL script to insert some data for unit tests, and I've run into a bit of a problem.
Let's say we have three tables, structured like this:
------- Table A ------- -------- Table B -------- -------- Table C --------
id | serial NOT NULL id | serial NOT NULL id | serial NOT NULL
foo | character varying a_id | integer NOT NULL b_id | integer NOT NULL
bar | character varying baz | character varying
The columns B.a_id and C.b_id are foreign keys to the id column of tables A and B, respectively.
What I'm trying to do is to insert a row into each of these three tables with pure SQL, without having the ID's hard-coded into the SQL (making assumptions about the database before this script is run seems undesirable, since if those assumptions change I'll have to go back and re-compute the proper ID's for all of the test data).
Note that I do realize I could do this programmatically, but in general writing pure SQL is way less verbose than writing program code to execute SQL, so it makes more sense for test suite data.
Anyway, here's the query I wrote which I figured would work:
WITH X AS (
WITH Y AS (
INSERT INTO A (foo)
VALUES ('abc')
RETURNING id
)
INSERT INTO B (a_id, bar)
SELECT id, 'def'
FROM Y
RETURNING id
)
INSERT INTO C (b_id, baz)
SELECT id, 'ghi'
FROM X;
However, this doesn't work, and results in PostgreSQL telling me:
ERROR: WITH clause containing a data-modifying statement must be at the top level
Is there a correct way to write this type of query in general, without hard-coding the ID values?
(You can find a fiddle here which contains this example.)
Don't nest the common table expressions, just write one after the other:
WITH Y AS (
INSERT INTO A (foo)
VALUES ('abc')
RETURNING id
), x as (
INSERT INTO B (a_id, bar)
SELECT id, 'def'
FROM Y
RETURNING id
)
INSERT INTO C (b_id, baz)
SELECT id, 'ghi'
FROM X;
I am currently trying to delete from Table A where a corresponding record is not being used in Table B. Table A has Section, SubSection, Code, Text as fields, where the first three are the primary key. Table B has ID, Section, SubSection, Code as fields, where all four are the primary key. There are more columns, but they are irrelevant to this question; I just wanted to point that out before I get questioned on why all columns are part of the primary key for Table B. Essentially, Table A is a repository of all possible data that can be assigned to an entity, and Table B is where it is assigned. I want to delete all records from Table A that are not in use in Table B. I have tried the following with no success:
DELETE FROM Table A
WHERE NOT EXISTS (SELECT * from Table B
WHERE A.section = B.section
AND A.subsection = B.subsection
AND A.code = b.code)
If I do a Select instead of a delete, I get the subset I am looking for, but when I do a delete, I get an error saying that there is a syntax error at Table A. I would use a NOT IN statement, but with multiple columns being part of the Primary Key, I just don't see how that would work. Any help would be greatly appreciated.
In SQL Server, when using NOT EXISTS, you need to give the tables aliases and, in the DELETE statement, specify which table to delete rows from.
DELETE a FROM Table_A a
WHERE NOT EXISTS (SELECT * from Table_B b
WHERE a.section = b.section
AND a.subsection = b.subsection
AND a.code = b.code)
Please try:
DELETE FROM Table_A
WHERE NOT EXISTS (SELECT 1 FROM Table_B
WHERE Table_A.section = Table_B.section
AND Table_A.subsection = Table_B.subsection
AND Table_A.code = Table_B.code)
1 is just a placeholder, any constant/single non-null column will work.
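A runnable sketch of the NOT EXISTS delete on a composite key is shown below. It assumes SQLite through Python's sqlite3 module instead of SQL Server, so it references the tables by name rather than using the DELETE a FROM ... alias form; the table names table_a / table_b and the sample rows are invented.

```python
import sqlite3

# Assumption: an in-memory SQLite database; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (
        section TEXT, subsection TEXT, code TEXT, "text" TEXT,
        PRIMARY KEY (section, subsection, code)
    );
    CREATE TABLE table_b (
        id INTEGER, section TEXT, subsection TEXT, code TEXT,
        PRIMARY KEY (id, section, subsection, code)
    );
    INSERT INTO table_a VALUES
        ('s1', 'a', '001', 'in use'),
        ('s1', 'a', '002', 'unused'),
        ('s2', 'b', '001', 'also unused');
    INSERT INTO table_b VALUES (7, 's1', 'a', '001');
""")

# Delete every table_a row whose three-part key has no match in table_b.
cursor = conn.execute("""
    DELETE FROM table_a
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_b
        WHERE table_b.section = table_a.section
          AND table_b.subsection = table_a.subsection
          AND table_b.code = table_a.code
    )
""")
deleted = cursor.rowcount
remaining = conn.execute(
    "SELECT section, subsection, code FROM table_a"
).fetchall()

print(deleted, remaining)
```

Running the same WHERE clause in a SELECT first, as the question already did, is a cheap way to confirm which rows will go before deleting anything.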
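Try something like this (row-value comparisons with NOT IN work in databases such as PostgreSQL, MySQL, and Oracle, but SQL Server does not support them):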
delete from Table_A
where (section, subsection, code) not in (select section,
subsection,
code
from Table_B)
I inherited a large existing DB and I'd like to know if I should refactor it because 95% of my queries require joining at least 4 tables.
The DB has 5 tables that only have an ID and a Name column, with fewer than 20 rows each. I assume the author did this so he could change the names there and not change them in the other tables, but many of those tables are only referenced in one other table. Should I refactor these small 2-column tables into a larger table and add a constraint to the column so users can't input incorrect names, instead of having separate tables?
Resist that urge. From your description I can deduce that the existing design is solid and probably well normalized. Your refactoring may actually undo a good db structure.
If you are bothered by writing a lot of joins in your queries I would suggest creating views to mitigate the boilerplate.
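As a rough illustration of the view suggestion, the sketch below hides two lookup joins behind a view. It assumes SQLite through Python's sqlite3 module, and every table, column, and view name in it (status, category, item, item_named) is hypothetical, since the question does not show its schema.

```python
import sqlite3

# Assumption: an in-memory SQLite database; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (status_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE item (
        item_id INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES status (status_id),
        category_id INTEGER NOT NULL REFERENCES category (category_id)
    );
    INSERT INTO status VALUES (1, 'open'), (2, 'closed');
    INSERT INTO category VALUES (1, 'bug'), (2, 'feature');
    INSERT INTO item VALUES (10, 1, 2), (11, 2, 1);

    -- The joins are written once, inside the view.
    CREATE VIEW item_named AS
        SELECT i.item_id, s.name AS status, c.name AS category
        FROM item i
        JOIN status s ON s.status_id = i.status_id
        JOIN category c ON c.category_id = i.category_id;
""")

# Callers query the view like a plain table, with no joins in sight.
rows = conn.execute(
    "SELECT item_id, status, category FROM item_named ORDER BY item_id"
).fetchall()

print(rows)
```

The underlying tables stay normalized; only the boilerplate moves out of the day-to-day queries.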
...the author did this so he could change the names there not change
them in the other tables...
That is evidence of good design and exactly what you should strive for in a normalized database.
no.
your db is normalized and proper.
and you save space, lookup time, and indexing by storing an int rather than a varchar name
small tables are optimized away if they are properly keyed.
Sounds like what you have are lookup tables. Let me tell you what happens when you decide to put all lookups in one table with an additional column to specify which type it is. First, instead of joining to 4 different tables in one query, you have to join to the same table 4 times. There ends up being more contention for the resources in the "one table to rule them all". Further, you lose FK constraints. That means you eventually lose data integrity. So if one lookup is state, nothing will prevent you from putting the id values for a different lookup, say customer type, in the stateid column in the customeraddress table. When the lookups are separate, you can enforce that relationship.
Suppose instead of one big table you decide to have a constraint on the column for customer type. Constraints are now enforced, but you have a problem when they need to change. Now you have to alter the database in order to add a new type. Again, this is usually a very bad idea, especially when the table gets large.
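The integrity point about a combined lookup table can be made concrete with a small sketch. It assumes SQLite through Python's sqlite3 module (where foreign keys must be switched on with a PRAGMA); the table names, the id 99, and the helper try_insert are all invented for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Design 1: separate lookup tables
    CREATE TABLE state (state_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE customer_type (customer_type_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE customer_address (
        id INTEGER PRIMARY KEY,
        state_id INTEGER NOT NULL REFERENCES state (state_id)
    );
    INSERT INTO state VALUES (1, 'Ohio');
    INSERT INTO customer_type VALUES (99, 'Wholesale');

    -- Design 2: one combined lookup table with a "kind" column
    CREATE TABLE lookup (lookup_id INTEGER PRIMARY KEY, kind TEXT NOT NULL, name TEXT NOT NULL);
    CREATE TABLE customer_address_combined (
        id INTEGER PRIMARY KEY,
        state_id INTEGER NOT NULL REFERENCES lookup (lookup_id)
    );
    INSERT INTO lookup VALUES (1, 'state', 'Ohio'), (99, 'customer_type', 'Wholesale');
""")


def try_insert(sql, params):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# 99 is a customer type id, not a state id.
separate_accepts_wrong_id = try_insert(
    "INSERT INTO customer_address (state_id) VALUES (?)", (99,))
combined_accepts_wrong_id = try_insert(
    "INSERT INTO customer_address_combined (state_id) VALUES (?)", (99,))

print(separate_accepts_wrong_id, combined_accepts_wrong_id)
```

With separate tables the database rejects the customer type id in the state column; with the combined table the plain foreign key cannot tell the two kinds apart.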
Short story: Replacing strings with ID numbers has nothing to do with normalization. Using natural keys in your case might improve performance. In my tests, queries using natural keys were faster by a factor of roughly 6 to 100.
You might have accepted an answer too quickly.
The DB has a 5 tables that only have an ID and Name column with less
than 20 rows.
I'm assuming these tables have a structure something like this.
create table a (
a_id integer primary key,
a_name varchar(30) not null unique
);
create table b (...
-- Just like a
create table your_data (
yet_another_id integer primary key,
a_id integer not null references a (a_id),
b_id integer not null references b (b_id),
c_id integer not null references c (c_id),
d_id integer not null references d (d_id),
unique (a_id, b_id, c_id, d_id),
-- other columns go here
);
And it's obvious that your_data will require four joins (at least) to get usable information from it.
But the names in table a, b, c, and d are unique (ahem), so you can use the unique names as targets for foreign key references. You could rewrite the table your_data like this.
create table your_data (
yet_another_id integer primary key,
a_name varchar(30) not null references a (a_name),
b_name varchar(30) not null references b (b_name),
c_name varchar(30) not null references c (c_name),
d_name varchar(30) not null references d (d_name),
unique (a_name, b_name, c_name, d_name),
-- other columns go here
);
Replacing id numbers with strings doesn't change the normal form. (And replacing strings with id numbers doesn't have anything to do with normalization.) If the original table were in 5NF, then this rewrite will be in 5NF, too.
But what about performance? Aren't id numbers plus joins supposed to be faster than strings?
I tested that by inserting 20 rows into each of the four tables a, b, c, and d. Then I generated a Cartesian product to fill one test table written with id numbers, and another using the names. (So, 160K rows in each.) I updated the statistics, and ran a couple of queries.
explain analyze
select a.a_name, b.b_name, c.c_name, d.d_name
from your_data_id
inner join a on (a.a_id = your_data_id.a_id)
inner join b on (b.b_id = your_data_id.b_id)
inner join c on (c.c_id = your_data_id.c_id)
inner join d on (d.d_id = your_data_id.d_id)
...
Total runtime: 808.472 ms
explain analyze
select a_name, b_name, c_name, d_name
from your_data
Total runtime: 132.098 ms
The query using id numbers takes a lot longer to execute. Next, I added a WHERE clause on all four columns, so that each query returns a single row.
explain analyze
select a.a_name, b.b_name, c.c_name, d.d_name
from your_data_id
inner join a on (a.a_id = your_data_id.a_id and a.a_name = 'a one')
inner join b on (b.b_id = your_data_id.b_id and b.b_name = 'b one')
inner join c on (c.c_id = your_data_id.c_id and c.c_name = 'c one')
inner join d on (d.d_id = your_data_id.d_id and d.d_name = 'd one')
...
Total runtime: 14.671 ms
explain analyze
select a_name, b_name, c_name, d_name
from your_data
where a_name = 'a one' and b_name = 'b one' and c_name = 'c one' and d_name = 'd one';
...
Total runtime: 0.133 ms
The tables using id numbers took about 100 times longer to query.
Tests used PostgreSQL 9.something.
My advice: Try before you buy. I mean, test before you invest. Try rewriting your data table to use natural keys. Think carefully about ON UPDATE CASCADE and ON DELETE CASCADE. Test performance with representative sample data. Edit your original question and let us know what you found.
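One thing worth trying while testing is how ON UPDATE CASCADE behaves with a natural key. The sketch below assumes SQLite through Python's sqlite3 module rather than PostgreSQL, uses only table a from the example schema, and the renamed value 'a uno' is invented.

```python
import sqlite3

# Assumption: an in-memory SQLite database standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE a (
        a_id INTEGER PRIMARY KEY,
        a_name VARCHAR(30) NOT NULL UNIQUE
    );
    CREATE TABLE your_data (
        yet_another_id INTEGER PRIMARY KEY,
        a_name VARCHAR(30) NOT NULL
            REFERENCES a (a_name) ON UPDATE CASCADE
    );
    INSERT INTO a VALUES (1, 'a one'), (2, 'a two');
    INSERT INTO your_data (a_name) VALUES ('a one'), ('a one'), ('a two');
""")

# Renaming the lookup value rewrites every referencing row in your_data.
conn.execute("UPDATE a SET a_name = 'a uno' WHERE a_name = 'a one'")

names = [row[0] for row in conn.execute(
    "SELECT a_name FROM your_data ORDER BY yet_another_id")]

print(names)
```

A rename in the lookup table now touches every referencing row, which is the cost to weigh against the saved joins when the data table is large.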