Copy column to another table, but keep equal ID's - sql

I've searched all over, but can't find an answer to this.. I have two tables user_setting and fibruser which both contains ID's to users. fibruser contains all possible ID's, but user_setting only contain a few. I need to copy all fibruser.id to user_setting.user_id where user_id is NULL and make sure none of the existing ones get overridden.
I do this Query:
SELECT user_setting.user_id, fibruser.id
FROM user_setting
FULL JOIN fibruser
ON user_id = fibruser.id;
to compare ID's and find all the ID's, but I simply cannot get UPDATE user_setting working like I want to...
UPDATE user_setting
SET user_id = fibruser.id
FROM fibruser
WHERE user_id IS null;
This only gives me `UPDATE user_setting
SET user_id = fibruser.id
FROM fibruser
WHERE user_id = id
AND user_id IS null;
This only outputs 0 row(s) affected.. What am I doing wrong and/or missing?
EDIT: Forgot to mention datatypes.
id: INT(auto increment)
user_id: INT

When you do a FULL JOIN, all those null values you are seeing represent cases where there is a row in fibruser with the given ID value, but there is NOT any row with that ID value in user_setting. Thus, when you do an UPDATE on user_setting, by definition you are going to have 0 rows affected; you have specifically told it to only update rows that you know do not exist!
What I think you want to do is to INSERT those rows, something like this:
INSERT INTO user_setting (user_id)
SELECT f.id
FROM fibruser f
WHERE f.id NOT IN (SELECT user_id FROM user_setting);
Alternatively, you could write it with NOT EXISTS:
INSERT INTO user_setting (user_id)
SELECT f.id
FROM fibruser f
WHERE NOT EXISTS (SELECT 1 FROM user_setting u WHERE f.id = u.user_id);
You may find that one way or the other will run faster, or they may be identical; however, since I'm assuming you're only doing this one time, you probably don't care too much about performance, so you can just pick whichever syntax floats your boat.

Related

How to group by one column and limit to rows where another column has the same value for all rows in group?

I have a table like this
CREATE TABLE userinteractions
(
userid bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);
My problem is that some of the data is polluted with multiple dobyr values for the same user.
The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.
I want to be able to create a clean table that contains unique userid and dobyr limited to the cases where there is only one value of dobyr for the userid in userinteractions.
For example I start with data like this:
userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989
And I want to select from this to get a table like this:
userid,dobyr
1,1995
2,1999
4,1989
Is there an elegant, efficient way to get this in a single sql query?
I am using postgres.
EDIT: I do not have permissions to modify the userinteractions table, so I need a SELECT solution, not a DELETE solution.
Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:
If there are many rows with the same userid value but also the same dobyr value, one of them is kept (doesn't matter which one), rest gets discarded.
All rows for a given userid are discarded if it occurs with different dobyr values.
create table userinteractions_clean as
select distinct on (userid,dobyr) *
from userinteractions
where userid in (
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1 )
order by userid,dobyr;
This could also be done with an not in, not exists or exists conditions. Also, select which combination to keep by adding columns at the end of order by.
Updated demo with tests and more rows.
If you don't need the other columns in the table, only something you'll later use as a filter/whitelist, plain userid's from records with (userid,dobyr) pairs matching your criteria are enough, as they already uniquely identify those records:
create table userinteractions_whitelist as
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1
Just use a HAVING clause to assert that all rows in a group must have the same dobyr.
SELECT
userid,
MAX(dobyr) AS dobyr
FROM
userinteractions
GROUP BY
userid
HAVING
COUNT(DISTINCT dobyr) = 1

Returning value from other table in place of foreign key

Let's say I have 3 tables.
table_1
id
fk_table_2
fk_table_3
1
1
1
table_2
id
name
1
"foo"
table_3
id
name
1
"bar"
I'd like to query a row in table_1 but instead of returning the fk_table_2 & fk_table_3, is there a way to return the name associated the the row in their respective tables, without individually selecting fields.
Should return something like this:
id
fk_table_2
fk_table_3
1
"foo"
"bar"
For the moment I have this:
SELECT * FROM table_1
INNER JOIN table_2
ON table_1.fk_table_2 = table_2.id
INNER JOIN table_3
ON table_1.fk_table_3 = table_3.id;
which returns all the data I need, but incorrectly structured.
Can anyone help? Thank you.
Maybe SELECT table_2.*, table_3.* FROM ... does what you want -- if your DBMS supports it -- and is enough to get around your "without individually selecting fields" limitation.
Otherwise, you're saying "I want to individually select fields without individually selecting fields", and that's obviously a contradiction. Why are you subject to such a bizarre restriction, anyhow?

Delete duplicates with no primary key

Here want to delete rows with a duplicated column's value (Product) which will be then used as a primary key.
The column is of type nvarchar and we don't want to have 2 rows for one product.
The database is a large one with about thousands rows we need to remove.
During the query for all the duplicates, we want to keep the first item and remove the second one as the duplicate.
There is no primary key yet, and we want to make it after this activity of removing duplicates.
Then the Product columm could be our primary key.
The database is SQL Server CE.
I tried several methods, and mostly getting error similar to :
There was an error parsing the query. [ Token line number = 2,Token line offset = 1,Token in error = FROM ]
A method which I tried :
DELETE FROM TblProducts
FROM TblProducts w
INNER JOIN (
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
)Dup ON w.Product = Dup.Product
The preferred way trying to learn and adjust my code with something similar
(It's not correct yet):
SELECT Product, COUNT(*) TotalCount
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
--
;WITH cte -- These 3 lines are the lines I have more doubt on them
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Product
ORDER BY ( SELECT 0)) RN
FROM Word)
DELETE FROM cte
WHERE RN > 1
If you have two DIFFERENT records with the same Product column, then you can SELECT the unwanted records with some criterion, e.g.
CREATE TABLE victims AS
SELECT MAX(entryDate) AS date, Product, COUNT(*) AS dups FROM ProductsTable WHERE ...
GROUP BY Product HAVING dups > 1;
Then you can do a DELETE JOIN between ProductTable and Victims.
Or also you can select Product only, and then do a DELETE for some other JOIN condition, for example having an invalid CustomerId, or EntryDate NULL, or anything else. This works if you know that there is one and only one valid copy of Product, and all the others are recognizable by the invalid data.
Suppose you instead have IDENTICAL records (or you have both identical and non-identical, or you may have several dupes for some product and you don't know which). You run exactly the same query. Then, you run a SELECT query on ProductsTable and SELECT DISTINCT all products matching the product codes to be deduped, grouping by Product, and choosing a suitable aggregate function for all fields (if identical, any aggregate should do. Otherwise I usually try for MAX or MIN). This will "save" exactly one row for each product.
At that point you run the DELETE JOIN and kill all the duplicated products. Then, simply reimport the saved and deduped subset into the main table.
Of course, between the DELETE JOIN and the INSERT SELECT, you will have the DB in a unstable state, with all products with at least one duplicate simply disappeared.
Another way which should work in MySQL:
-- Create an empty table
CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE false;
CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);
-- DROP duplicate rows, Joe the Butcher's way
INSERT IGNORE INTO deduped SELECT * FROM ProductsTable;
ALTER TABLE ProductsTable RENAME TO ProductsBackup;
ALTER TABLE deduped RENAME TO ProductsTable;
-- TODO: Copy all indexes from ProductsTable on deduped.
NOTE: the way above DOES NOT WORK if you want to distinguish "good records" and "invalid duplicates". It only works if you have redundant DUPLICATE records, or if you do not care which row you keep and which you throw away!
EDIT:
You say that "duplicates" have invalid fields. In that case you can modify the above with a sorting trick:
SELECT * FROM ProductsTable ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;
Then if you have only one row for product, all well and good, it will get selected. If you have more, the one for which (FieldWhichShouldNeverBeNull IS NULL) is FALSE (i.e. the one where the FieldWhichShouldNeverBeNull is actually not null as it should) will be selected first, and inserted. All others will bounce, silently due to the IGNORE clause, against the uniqueness of Product. Not a really pretty way to do it (and check I didn't mix true with false in my clause!), but it ought to work.
EDIT
actually more of a new answer
This is a simple table to illustrate the problem
CREATE TABLE ProductTable ( Product varchar(10), Description varchar(10) );
INSERT INTO ProductTable VALUES ( 'CBPD10', 'C-Beam Prj' );
INSERT INTO ProductTable VALUES ( 'CBPD11', 'C Proj Mk2' );
INSERT INTO ProductTable VALUES ( 'CBPD12', 'C Proj Mk3' );
There is no index yet, and no primary key. We could still declare Product to be primary key.
But something bad happens. Two new records get in, and both have NULL description.
Yet, the second one is a valid product since we knew nothing of CBPD14 before now, and therefore we do NOT want to lose this record completely. We do want to get rid of the spurious CBPD10 though.
INSERT INTO ProductTable VALUES ( 'CBPD10', NULL );
INSERT INTO ProductTable VALUES ( 'CBPD14', NULL );
A rude DELETE FROM ProductTable WHERE Description IS NULL is out of the question, it would kill CBPD14 which isn't a duplicate.
So we do it like this. First get the list of duplicates:
SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1;
We assume that: "There is at least one good record for every set of bad records".
We check this assumption by positing the opposite and querying for it. If all is copacetic we expect this query to return nothing.
SELECT Dups.Product FROM ProductTable
RIGHT JOIN ( SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1 ) AS Dups
ON (ProductTable.Product = Dups.Product
AND ProductTable.Description IS NOT NULL)
WHERE ProductTable.Description IS NULL;
To further verify, I insert two records that represent this mode of failure; now I do expect the query above to return the new code.
INSERT INTO ProductTable VALUES ( "AC5", NULL ), ( "AC5", NULL );
Now the "check" query indeed returns,
AC5
So, the generation of Dups looks good.
I proceed now to delete all duplicate records that are not valid. If there are duplicate, valid records, they will stay duplicate unless some condition may be found, distinguishing among them one "good" record and declaring all others "invalid" (maybe repeating the procedure with a different field than Description).
But ay, there's a rub. Currently, you cannot delete from a table and select from the same table in a subquery ( http://dev.mysql.com/doc/refman/5.0/en/delete.html ). So a little workaround is needed:
CREATE TEMPORARY TABLE Dups AS
SELECT Product, COUNT(*) AS Duplicates
FROM ProductTable GROUP BY Product HAVING Duplicates > 1;
DELETE ProductTable FROM ProductTable JOIN Dups USING (Product)
WHERE Description IS NULL;
Now this will delete all invalid records, provided that they appear in the Dups table.
Therefore our CBPD14 record will be left untouched, because it does not appear there. The "good" record for CBPD10 will be left untouched because it's not true that its Description is NULL. All the others - poof.
Let me state again that if a record has no valid records and yet it is a duplicate, then all copies of that record will be killed - there will be no survivors.
To avoid this can may first SELECT (using the query above, the check "which should return nothing") the rows representing this mode of failure into another TEMPORARY TABLE, then INSERT them back into the main table after the deletion (using transactions might be in order).
Create a new table by scripting the old one out and renaming it. Also script all objects (indexes etc..) from the old table to the new. Insert the keepers into the new table. If you're database is in bulk-logged or simple recovery model, this operation will be minimally logged. Drop the old table and then rename the new one to the old name.
The advantage of this over a delete will be that the insert can be minimally logged. Deletes do double work because not only does the data get deleted, but the delete has to be written to the transaction log. For big tables, minimally logged inserts will be much faster than deletes.
If it's not that big and you have some downtime, and you have Sql Server Management studio, you can put an identity field on the table using the GUI. Now you have the situation like your CTE, except the rows themselves are truly distinct. So now you can do the following
SELECT MIN(table_a.MyTempIDField)
FROM
table_a lhs
join table_1 rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
table_a.MyTempIDField <> table_b.MyTempIDField
GROUP BY
lhs.field1, rhs.field2 etc
This gives you all the 'good' duplicates. Now you can wrap this query with a DELETE FROM query.
DELETE FROM lhs
FROM table_a lhs
join table_b rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
and lhs.MyTempIDField not in (
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 etc
)
Try this:
DELETE FROM TblProducts
WHERE Product IN
(
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1)
This suffers from the defect that it deletes ALL the records with a duplicated Product. What you probably want to do is delete all but one of each group of records with a given Product. It might be worthwhile to copy all the duplicates to a separate table first, and then somehow remove duplicates from that table, then apply the above, and then copy remaining products back to the original table.

unique pair in a "friendship" database

I'm posting this question which is somewhat a summary of my other question.
I have two databases:
1) db_users.
2) db_friends.
I stress that they're stored in separate databases on different servers and therefore no foreign keys can be used.
In 'db_friends' I have the table 'tbl_friends' which has the following columns:
- id_user
- id_friend
Now how do I make sure that each pair is unique at this table ('tbl_friends')?
I'd like to enfore that at the table level, and not through a query.
For example these are invalid rows:
1 - 2
2 - 1
I'd like this to be impossible to add.
Additionally - how would I seach for all of the friends of user 713 while he could be mentioned, on some friendship rows, at the second column ('id_friend')?
You're probably not going to be able to do this at the database level -- your application code is going to have to do this. If you make sure that your tbl_friends records always go in with (lowId, highId), then a typical PK/Unique Index will solve the duplicate problem. In fact, I'd go so far to rename the columns in your tbl_friends to (id_low, id_high) just to reinforce this.
Your query to find anything with user 713 would then be something like
SELECT id_low AS friend FROM tbl_friends WHERE (id_high = ?)
UNION ALL
SELECT id_high AS friend FROM tbl_friends WHERE (id_low = ?)
For efficiency, you'd probably want to index it forward and backward -- that is by (id_user, id_friend) and (id_friend, id_user).
If you must do this at a DB level, then a stored procedure to swap arguments to (low,high) before inserting would work.
You'd have to use a trigger to enforce that business rule.
Making the two columns in tbl_friends the primary key (unique constraint failing that) would only ensure there can't be duplicates of the same set: 1, 2 can only appear once but 2, 1 would be valid.
how would I seach for all of the friends of user 713 while he could be mentioned, on some friendship rows, at the second column ('id_friend')?
You could use an IN:
WHERE 713 IN (id_user, id_friend)
..or a UNION:
JOIN (SELECT id_user AS user
FROM TBL_FRIENDS
UNION ALL
SELECT id_friend
FROM TBL_FRIENDS) x ON x.user = u.user
Well, a unique constraint on the pair of columns will get you half way there. I think the easiest way to ensure you don't get the reversed version would be to add a constraint ensuring that id_user < id_friend. You will need to compensate for this ordering at insertion time, but it will get you the database Level constraint you desire without duplicating data or relying on foreign keys.
As for the second question, to find all friends for id=1 you could select id_user, id_friend from tbl_friend where id_user = 1 or id_friend = 1 and then in your client code throw out all the 1's regardless of column.
One way you could do it is to store the two friends on two rows:
CREATE TABLE FriendPairs (
pair_id INT NOT NULL,
friend_id INT NOT NULL,
PRIMARY KEY (pair_id, friend_id)
);
INSERT INTO FriendPairs (pair_id, friend_id)
VALUES (1234, 317), (1234, 713);
See? It doesn't matter which order you insert them, because both friends go in the friend_id column. So you can enforce uniqueness easily.
You can also query easily for friends of 713:
SELECT f2.friend_id
FROM FriendPairs AS f1
JOIN FriendPairs AS f2 ON (f1.pair_id = f2.pair_id)
WHERE f1.friend_id = 713

MySQL strict select of rows involving many to many tables

I have three tables in the many-to-many format. I.e, table A, B, and AB set up as you'd expect.
Given some set of A ids, I need to select only the rows in AB that match all of the ids.
Something like the following won't work:
"SELECT * FROM AB WHERE A_id = 1 AND A_id = 2 AND A_id = 3 AND ... "
As no single row will have more than one A_id
Using, an OR in the sql statment is no better as it yields results all results that have at least one of the A ids (whereas I only want those rows that have all of the ids).
Edit:
Sorry, I should explain. I don't know if the actual many-to-many relationship is relevant to the actual problem. The tables are outlined as follows:
Table People
int id
char name
Table Options
int id
char option
Table peoples_options
int id
int people_id
int option_id
And so I have a list of people, and a list of options, and a table of options and people.
So, given a list of option ids such as (1, 34, 44, ...), I need to select only those people that have all the options.
Your database doesn't appear to be normalized correctly. Your AB table should have a single A_id and a single B_id in each of its rows. If that were the case, your OR-version should work (although I would use IN myself).
Ignore the preceding paragraph. From your edit, you really wanted to know all the B's that have all of a subset of A's in the many-to-many table - see below for the query.
Please tell us the actual schema details, it's a little hard to figure out what you want without that.
I'd expect to see something like:
table a:
a_id integer
a_payload varchar(20)
table b:
b_id integer
b_payload varchar(20)
table ab:
a_id integer
b_id integer
Based on your description, the only thing I can think of is that you want a list of all the B's that have all of a set of A's in the AB table. In which case, you're looking at something like (to get the list of B's that have A's of 1, 3 and 4):
select distinct b_id from ab n1
where exists (select b_id from ab where a_id = 1 and b_id = n1.b_id)
and exists (select b_id from ab where a_id = 3 and b_id = n1.b_id)
and exists (select b_id from ab where a_id = 4 and b_id = n1.b_id);
This works in DB2 but I'm not sure how much of SQL your chosen server implements.
A bit of a hacky solution is to use IN with a group by and having filter. Like so:
SELECT B_id FROM AB
WHERE A_id IN (1,2,3)
GROUP BY B_id
HAVING COUNT(DISTINCT A_id) = 3;
That way, you only get the B_id values that have exactly 3 A_id values, and they have to be from your list. I used DISTINCT in the COUNT just in case (A_id, B_id) isn't unique. If you need other columns, you could then join to this query as a sub-select in the FROM clause of another select statement.
Try this....It brings all people that is associated with all options.
In other words the query bring all people that there it doesnt exists an option that were not associated to it.
select
p.*
from
people p
where
not exists (
select
1
from
options o
where
not exists
(
select 1
from peoples_options po
where po.people_id = p.people_id
AND po.option_id = o.option_id
)
)