SQL to normalize existing (many-to-many) data - sql

Summary:
See below for Details. I'm copying the [unanswered] many-to-many question here to the top for readability:
Given the "Input" table, what is the SQL to generate the 3rd "Output"
table (Person_plays_Instrument)?
Current input (1 table):
OriginalTable:
PersonId PersonName Instrument_1 Instrument_2 Instrument_3 MailingAddress HomePhone
--------|----------|------------|------------|------------|--------------|------------
1        Bob        Violin       Viola        Trumpet      someplace      111-111-1111
2        Suzie      Cello        Flute        <null>       otherplace     222-222-2222
3        Jim        Violin       <null>       <null>       thirdplace     333-333-3333
Desired output (3 tables):
Person:
Id Name   MailingAddress HomePhone
--|------|--------------|------------
1  Bob    someplace      111-111-1111
2  Suzie  otherplace     222-222-2222
3  Jim    thirdplace     333-333-3333
Instrument:
Id Name
--|-------
1  Violin
2  Cello
3  Viola
4  Flute
5  Trumpet
Person_plays_Instrument:
PersonId InstrumentId
--------|------------
1        1
1        3
1        5
2        2
2        4
3        1
Details:
I have a single flat SQL table which started out as a spreadsheet. I'd like to normalize it. I'll split this into 1 question for each table.
Questions 1 and 2 have been answered, but I am leaving them in in case others find them helpful.
Questions:
Question #1: [answered]
How do I generate Person table?
Answer #1:
This wonderful post gets me 2/3rds of the way there. For the one-to-many tables, I'm set. Here's the code:
[add autonumber field to OriginalTable, name it PersonId]
[create empty Person table with Id, Name, MailingAddress, HomePhone fields]
INSERT INTO Person (Id, Name, MailingAddress, HomePhone)
SELECT o.PersonID, o.PersonName, o.MailingAddress, o.HomePhone
FROM OriginalTable as o
WHERE o.PersonName Is Not Null;
Question #2: [attempted] (better version by @Branko in Accepted Answer)
How do I generate Instrument table?
Answer #2:
Again, one-to-many. At first, the multiple columns had me stumped.
The solution came in two parts:
1. I'd just have to repeat the INSERT command, once for each column.
2. Using this post and the IN operator, I can check each time to confirm I hadn't already inserted that value.
Here's the code:
[create empty Instrument table with Id[autonumber], Name fields]
INSERT INTO Instrument (Name)
SELECT Distinct o.Instrument_1
FROM OriginalTable as o
WHERE o.Instrument_1 Is Not Null
AND o.Instrument_1 Not In (SELECT Name from Instrument);
INSERT INTO Instrument (Name)
SELECT Distinct o.Instrument_2
FROM OriginalTable as o
WHERE o.Instrument_2 Is Not Null
AND o.Instrument_2 Not In (SELECT Name from Instrument);
INSERT INTO Instrument (Name)
SELECT Distinct o.Instrument_3
FROM OriginalTable as o
WHERE o.Instrument_3 Is Not Null
AND o.Instrument_3 Not In (SELECT Name from Instrument);
Question #3: [unanswered]
How do I generate Person_plays_Instrument table?

Assuming there is OriginalTable.PersonID, which you haven't shown us, but is implied by your own answer #1, the answer #3 can be expressed simply as:
INSERT INTO Person_plays_Instrument (PersonId, InstrumentId)
SELECT PersonID, Instrument.Id
FROM
OriginalTable
JOIN Instrument
ON OriginalTable.Instrument_1 = Instrument.Name
OR OriginalTable.Instrument_2 = Instrument.Name
OR OriginalTable.Instrument_3 = Instrument.Name;
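To check the join above against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for whatever DBMS you actually use (an assumption, since the question is not tagged), and the Instrument ids are inserted by hand so they match the desired output in the question.

```python
import sqlite3

# In-memory SQLite database; a stand-in for the real (unspecified) DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OriginalTable (
    PersonId INTEGER PRIMARY KEY,
    PersonName TEXT,
    Instrument_1 TEXT, Instrument_2 TEXT, Instrument_3 TEXT
);
INSERT INTO OriginalTable VALUES
    (1, 'Bob',   'Violin', 'Viola', 'Trumpet'),
    (2, 'Suzie', 'Cello',  'Flute', NULL),
    (3, 'Jim',   'Violin', NULL,    NULL);

-- Ids fixed by hand to match the desired Instrument table in the question.
CREATE TABLE Instrument (Id INTEGER PRIMARY KEY, Name TEXT UNIQUE);
INSERT INTO Instrument VALUES
    (1, 'Violin'), (2, 'Cello'), (3, 'Viola'), (4, 'Flute'), (5, 'Trumpet');

CREATE TABLE Person_plays_Instrument (PersonId INTEGER, InstrumentId INTEGER);

-- One row per (person, instrument) pair; NULL columns never match a name.
INSERT INTO Person_plays_Instrument (PersonId, InstrumentId)
SELECT o.PersonId, i.Id
FROM OriginalTable AS o
JOIN Instrument AS i
  ON o.Instrument_1 = i.Name
  OR o.Instrument_2 = i.Name
  OR o.Instrument_3 = i.Name;
""")

links = conn.execute(
    "SELECT PersonId, InstrumentId FROM Person_plays_Instrument "
    "ORDER BY PersonId, InstrumentId"
).fetchall()
print(links)
```

Because each Instrument row is matched at most once per person, a person who has the same instrument in two columns still gets only one link row.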
BTW, there is a more concise way to express the answer #2:
INSERT INTO Instrument (Name)
SELECT *
FROM (
SELECT o.Instrument_1 I
FROM OriginalTable as o
UNION
SELECT o.Instrument_2
FROM OriginalTable as o
UNION
SELECT o.Instrument_3
FROM OriginalTable as o
) Q
WHERE I IS NOT NULL;
And here is a fully working SQL Fiddle example for MS SQL Server. Other DBMSes should behave similarly. BTW, you should tag your question appropriately to indicate your DBMS.
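If you can't reach the fiddle, the UNION version can also be tried locally. This is a rough sketch with Python's sqlite3 module, so SQLite is an assumption here rather than your actual DBMS. The ids that the autonumber column hands out depend on the order in which UNION returns the names, so only the set of names is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OriginalTable (
    PersonId INTEGER PRIMARY KEY,
    PersonName TEXT,
    Instrument_1 TEXT, Instrument_2 TEXT, Instrument_3 TEXT
);
INSERT INTO OriginalTable VALUES
    (1, 'Bob',   'Violin', 'Viola', 'Trumpet'),
    (2, 'Suzie', 'Cello',  'Flute', NULL),
    (3, 'Jim',   'Violin', NULL,    NULL);

-- INTEGER PRIMARY KEY plays the role of the autonumber Id column.
CREATE TABLE Instrument (Id INTEGER PRIMARY KEY, Name TEXT);

-- UNION removes duplicates across the three columns in one pass.
INSERT INTO Instrument (Name)
SELECT I
FROM (
    SELECT Instrument_1 AS I FROM OriginalTable
    UNION
    SELECT Instrument_2 FROM OriginalTable
    UNION
    SELECT Instrument_3 FROM OriginalTable
) AS Q
WHERE I IS NOT NULL;
""")

names = sorted(row[0] for row in conn.execute("SELECT Name FROM Instrument"))
print(names)
```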

Related

How can I delete equal rows in a table in SQL? [duplicate]

I have a table with some data inserted in it. The issue is that there are many rows that are equal to other rows and I want to delete them leaving just one of those rows. For example:
Table Person
name    pet
---------------------------
Mike    Dog
Kevin   Dog
Claudia Cat
Mike    Dog
Mike    Dog
Kevin   Snake
As you can see, the row saying that Mike has a Dog appears multiple times, but I would like to see it only once. So the output I want after updating this table is:
name    pet
---------------------------
Mike    Dog
Kevin   Dog
Claudia Cat
Kevin   Snake
How can this be done?
You can do this with exists. In the apparent absence of a primary key, PostgreSQL's system column ctid can be used:
delete from mytable t
where exists (
select 1
from mytable t1
where t1.name = t.name and t1.pet = t.pet and t1.ctid > t.ctid
);
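The same exists pattern can be tried outside PostgreSQL. The sketch below uses Python's sqlite3 module, where the implicit rowid column plays the part of ctid; that substitution is my assumption and not part of the answer above. The outer table is referenced by its full name instead of an alias to stay compatible with older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (name TEXT, pet TEXT);
INSERT INTO mytable VALUES
    ('Mike', 'Dog'), ('Kevin', 'Dog'), ('Claudia', 'Cat'),
    ('Mike', 'Dog'), ('Mike', 'Dog'), ('Kevin', 'Snake');

-- Delete every row for which a later identical row exists,
-- so only the last copy of each (name, pet) pair survives.
-- rowid is SQLite's stand-in for PostgreSQL's ctid here.
DELETE FROM mytable
WHERE EXISTS (
    SELECT 1
    FROM mytable AS t1
    WHERE t1.name = mytable.name
      AND t1.pet = mytable.pet
      AND t1.rowid > mytable.rowid
);
""")

remaining = sorted(conn.execute("SELECT name, pet FROM mytable").fetchall())
print(remaining)
```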
The simplest method is probably to recreate the table:
create table temp_t as
select distinct name, pet
from t;
truncate table t; -- back it up first!
insert into t (name, pet)
select name, pet
from temp_t;
create unique index unq_t_name_pet on t(name, pet);
The last step is to prevent this problem in the future.
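Here is a small sketch of those steps run end to end with Python's sqlite3 module. SQLite is an assumption on my part: it has no TRUNCATE, so DELETE FROM is used in its place, and dropping the helper table at the end is an extra cleanup step. The final insert shows the unique index rejecting a new duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (name TEXT, pet TEXT);
INSERT INTO t VALUES
    ('Mike', 'Dog'), ('Kevin', 'Dog'), ('Claudia', 'Cat'),
    ('Mike', 'Dog'), ('Mike', 'Dog'), ('Kevin', 'Snake');

-- Copy the distinct rows aside, empty the table, and copy them back.
CREATE TABLE temp_t AS SELECT DISTINCT name, pet FROM t;
DELETE FROM t;  -- SQLite has no TRUNCATE TABLE
INSERT INTO t (name, pet) SELECT name, pet FROM temp_t;
DROP TABLE temp_t;

-- Guard against the same problem coming back.
CREATE UNIQUE INDEX unq_t_name_pet ON t (name, pet);
""")

row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# A repeated (name, pet) pair should now be refused by the unique index.
try:
    conn.execute("INSERT INTO t (name, pet) VALUES ('Mike', 'Dog')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(row_count, duplicate_rejected)
```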

How to display data in SQL from multiple tables, but only if one column data matches another column?

I'm still learning SQL, so this may just be my ignorance or inability to express in a search what I'm looking for. I've spent roughly an hour searching for some variation of the title (both here and general searches on Google). I apologize, I apparently also don't know how to format here. I'll try to clean it up now that I've posted.
I have a database of customer data that I did not design. In the GUI, there are multiple tabs, and it seems like each tab earned its own table. The tables are linked together with a field called RecordID. One of the tables backs the Customer Data tab. The way that it's organized is that a single customer record from table A can have multiple rows in table B. I only want data if column B in table B is "CompanyA" and if column A in table B = 1. Sample data is below.
Expected output:
CardNumber LastName FirstName CustomerID DataItem
------------------------------------------------------
32154      Clapton  Eric      181212     CompanyA
Table A:
RecordID CardNumber LastName FirstName CustomerID
---------------------------------------------------------------
1        12345      Smith    John      190201
2        12346      Jones    Sandy     190202
3        23456      Petty    Tom       190203
4        32154      Clapton  Eric      181212
5        14728      Tyler    Steven    180225
Table B:
RecordID DataID DataItem
--------------------------------
1 0 CompanyA
1 1 Yes
1 2 No
1 3 Revoked
1 4 NULL
1 5 CompanyB
2 0 CompanyB
2 1 Yes
2 2 No
2 3 NULL
2 4 24-54A
2 5 CompanyC
3 0 CompanyA
3 1 No
3 2 No
3 3 NULL
3 4 68-69B
3 5 NULL
4 0 CompanyA
4 1 Yes
4 2 Yes
5 0 CompanyB
5 1 No
5 2 No
5 5 CompanyA
The concept you're looking for is a JOIN. In this case specifically you need an INNER JOIN. A join connects two tables based on criteria you specify (such as matching values in fields) and merges the result into one table in the output.
Here's an example to suit your scenario:
SELECT
A.CardNumber,
A.LastName,
A.FirstName,
A.CustomerID,
B.DataItem
FROM
TableA A
INNER JOIN TableB B -- join tableB onto tableA
ON A.RecordID = B.RecordID -- in the ON clause you specify the criteria by which you match the rows
WHERE
B.columnA = 'CompanyA'
AND B.columnB = 1
Here's the relevant SQL Server Documentation
Also I'd advise you to potentially take a comprehensive introductory SQL tutorial, and/or find a book. A good one will introduce all of the basic, key concepts such as this to you in a logical way, then you're not grasping in the dark trying to google things for which you don't know the correct terminology.
select a.CardNumber, a.LastName, a.FirstName, a.CustomerID, b.dataitem
from tableA A inner join TableB b
on a.recordid = b.recordid
where b.columnA= 'CompanyA' and b.columnB = 1
Here is your solution:
select a.CardNumber, a.LastName, a.FirstName, a.CustomerID, b.DataItem from
tableA a
inner join tableB b
on (a.RecordID = b.RecordID)
where
b.DataItem='CompanyA'
and b.RecordID=1;
Let me know if the result is not as expected.
Your question is quite hard to understand, but let me give you an example that resembles what I think you are asking.
SELECT a.*, b.DataItem FROM A a INNER JOIN B b
ON a.RecordID = b.RecordID AND
b.DataItem = 'CompanyA'
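To see what a join like this returns, here is a rough sketch with Python's sqlite3 module. It loads only a subset of the question's sample rows (records 1, 2 and 4, DataID 0 and 1), and the table names TableA and TableB are assumptions. Note that it returns both Smith and Clapton: the question's expected output lists only Clapton, and which extra condition excludes Smith is not clear from the question, so the sketch does not try to guess it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (
    RecordID INTEGER PRIMARY KEY, CardNumber INTEGER,
    LastName TEXT, FirstName TEXT, CustomerID INTEGER
);
-- Subset of the question's Table A.
INSERT INTO TableA VALUES
    (1, 12345, 'Smith',   'John',  190201),
    (2, 12346, 'Jones',   'Sandy', 190202),
    (4, 32154, 'Clapton', 'Eric',  181212);

CREATE TABLE TableB (RecordID INTEGER, DataID INTEGER, DataItem TEXT);
-- Subset of the question's Table B.
INSERT INTO TableB VALUES
    (1, 0, 'CompanyA'), (1, 1, 'Yes'),
    (2, 0, 'CompanyB'), (2, 1, 'Yes'),
    (4, 0, 'CompanyA'), (4, 1, 'Yes');
""")

# Match rows on RecordID and keep only the 'CompanyA' rows of Table B.
rows = conn.execute("""
    SELECT a.CardNumber, a.LastName, a.FirstName, a.CustomerID, b.DataItem
    FROM TableA AS a
    INNER JOIN TableB AS b
        ON a.RecordID = b.RecordID
       AND b.DataItem = 'CompanyA'
    ORDER BY a.RecordID
""").fetchall()
print(rows)
```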
At the database engine level, if you are using Microsoft technology, the most efficient structure is to use an indexed foreign key constraint (FKC) on Table B, and a Primary Surrogate Key (PSK) column on Table A. The Primary Surrogate Key in your case is on the Parent table, Table A, and is called RecordID. The foreign key column with the FKC is on Table B, on the column named RecordID. Once you verify that there is a FKC on Table B (which pins both columns named RecordID between both tables on matched values), then address the GUI.
At the GUI, between the tabs, you generally indicate you have a parent table with a unique set of Record IDs (one column named Record ID with absolutely unique values in each row and no empty rows on that column). There will also be child tables on each Tab in your GUI, and those are bound to the parent table in a "1 to Many (1:M)" fashion, where 1 parent has many children.
Your commentary or question indicates that you also want to filter, where Record ID on the child in one of the related tabs equates to the integer value 1 on the Record ID. So, there needs to be a query somewhere:
SELECT [columns]
FROM [Table B] AS B
INNER JOIN [Table A] AS A
ON A.RecordID = B.RecordID
AND B.RecordID = 1;
Does that help?

Insert values from one table to another if certain columns are unique from different databases

The setup I have is the following: I have two databases, A and B, on the same server.
A and B have identical table names. I want to append data from tables of A to tables of B. However, I want to append a given row from A to B if and only if that row is unique on certain fields. For instance, say I have a table named People.
in A
People:
ID name Surname
1 Mark Anthony
2 Julius Ceasar
3 Marcus Crassus
in B
People
ID name Surname
1 Marcus Caelius
2 Julius Ceasar
3 Sevilius Casca
4 Marcus Crassus
I want to add to the People table in database B those rows of the People table in A whose name and surname do not already exist in People of B.
so the result would be
in B
People
ID name Surname
1 Marcus Caelius
2 Julius Ceasar
3 Sevilius Casca
4 Marcus Crassus
5 Mark Anthony
I think this is just an insert . . . select:
insert into b.people(name, surname)
select name, surname
from a.people a
where not exists (select 1 from b.people b where b.name = a.name and b.surname = a.surname);
This assumes that the id field is a serial/identity/auto increment column. That is a best practice anyway.
You can also write this as:
insert into b.people(name, surname)
select name, surname
from a.people a
except
select name, surname
from b.people b;
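Both forms can be checked against the sample data. The sketch below uses Python's sqlite3 module and attaches two in-memory databases under the schema names a and b; using SQLite and ATTACH to imitate two databases on one server is my assumption, and cross-database naming differs between DBMSes. The aliases src and dst are just illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two separate in-memory databases standing in for databases A and B.
ATTACH DATABASE ':memory:' AS a;
ATTACH DATABASE ':memory:' AS b;

CREATE TABLE a.people (id INTEGER PRIMARY KEY, name TEXT, surname TEXT);
CREATE TABLE b.people (id INTEGER PRIMARY KEY, name TEXT, surname TEXT);

INSERT INTO a.people (name, surname) VALUES
    ('Mark', 'Anthony'), ('Julius', 'Ceasar'), ('Marcus', 'Crassus');
INSERT INTO b.people (name, surname) VALUES
    ('Marcus', 'Caelius'), ('Julius', 'Ceasar'),
    ('Sevilius', 'Casca'), ('Marcus', 'Crassus');
""")

# EXCEPT form: rows of A whose (name, surname) pair is not in B.
missing = conn.execute("""
    SELECT name, surname FROM a.people
    EXCEPT
    SELECT name, surname FROM b.people
""").fetchall()

# NOT EXISTS form: insert those rows; id is generated automatically.
conn.execute("""
    INSERT INTO b.people (name, surname)
    SELECT src.name, src.surname
    FROM a.people AS src
    WHERE NOT EXISTS (
        SELECT 1 FROM b.people AS dst
        WHERE dst.name = src.name AND dst.surname = src.surname
    )
""")

result = conn.execute(
    "SELECT id, name, surname FROM b.people ORDER BY id"
).fetchall()
print(missing)
print(result)
```

One difference worth knowing: EXCEPT also removes duplicates within A itself, while the NOT EXISTS form would insert a pair twice if A contained it twice.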

SQL Insert with value from different table

I have 2 tables storing information. For example:
Table 1 contains persons:
ID NAME CITY
1 BOB 1
2 JANE 1
3 FRED 2
CITY is an id referencing a different table:
ID NAME
1 Amsterdam
2 London
The problem is that i want to insert data that i receive in the format:
ID NAME CITY
1 PETER Amsterdam
2 KEES London
3 FRED London
Given that the list of cities is complete (I never receive a city that is not in my list), how can I insert the new persons received from outside into the table with the right ID for the city?
Should I replace the city names before I try to insert them, or is there a performance-friendly way (I might have to insert thousands of lines at once) to make SQL do this for me?
The SQL server i'm using is Microsoft SQL Server 2012
First, load the data to be inserted into a table.
Then, you can just use a join:
insert into persons(id, name, city)
select st.id, st.name, c.id
from #StagingTable st left join
cities c
on st.city = c.name;
Note: The persons.id should probably be an identity column so it wouldn't be necessary to insert it.
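A minimal sketch of the staging-table approach, run with Python's sqlite3 module (SQLite is an assumption here; the question is about SQL Server 2012, where #StagingTable would be a temp table). Following the note above, persons.id is generated automatically and left out of the insert. The table name staging is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
INSERT INTO cities VALUES (1, 'Amsterdam'), (2, 'London');

CREATE TABLE persons (
    id INTEGER PRIMARY KEY,           -- generated automatically
    name TEXT,
    city INTEGER REFERENCES cities(id)
);
INSERT INTO persons VALUES (1, 'BOB', 1), (2, 'JANE', 1), (3, 'FRED', 2);

-- Received data, loaded as-is with the city still given by name.
CREATE TEMP TABLE staging (id INTEGER, name TEXT, city TEXT);
INSERT INTO staging VALUES
    (1, 'PETER', 'Amsterdam'), (2, 'KEES', 'London'), (3, 'FRED', 'London');

-- The join translates each city name into its id in one set-based insert.
INSERT INTO persons (name, city)
SELECT st.name, c.id
FROM staging AS st
LEFT JOIN cities AS c ON st.city = c.name;
""")

new_rows = sorted(
    conn.execute("SELECT name, city FROM persons WHERE id > 3").fetchall()
)
print(new_rows)
```

With a left join, a person whose city is not in the list is still inserted, with a NULL city; an inner join would silently drop that row instead.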
insert into persons (ID, NAME, CITY) -- you don't need to include ID if it is auto-increment
values
(1, 'PETER', (select ID from city where NAME = 'Amsterdam')) -- the subquery looks up the city's ID by its name
If you want to add 1000 rows at a time, it would be better to use a stored procedure, as in this link.

mysql select data from multiple rows

I have a table
id, name, keyword
1 bob guy
2 bob developer
3 mary girl
4 joe guy
Q1 : What would be the sql to get back the row (bob) containing both keywords 'guy' AND 'developer'?
Intuitively, I thought it'd be SELECT * FROM TABLE WHERE keyword = 'guy' AND keyword = 'developer'
Q2: But I suppose the first conditional AND removes the 2nd row (bob, developer) which causes the sql to return no result? Am I correct about this speculation?
SELECT * FROM TABLE WHERE keyword = 'guy' AND name in (SELECT name FROM TABLE WHERE keyword = 'developer')
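To answer Q2 concretely: yes, a WHERE clause is evaluated one row at a time, and no single row can have keyword equal to both values, so the intuitive query returns nothing. The sketch below checks this with Python's sqlite3 module (SQLite is an assumption; the question is about MySQL) and uses the hypothetical table name people in place of TABLE. It also shows a GROUP BY / HAVING variant, which is an alternative I am adding and not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, keyword TEXT);
INSERT INTO people VALUES
    (1, 'bob', 'guy'), (2, 'bob', 'developer'),
    (3, 'mary', 'girl'), (4, 'joe', 'guy');
""")

# Intuitive attempt: both conditions are tested against the same row.
naive = conn.execute(
    "SELECT * FROM people WHERE keyword = 'guy' AND keyword = 'developer'"
).fetchall()

# Subquery form: the second keyword is looked up on other rows by name.
subquery = conn.execute("""
    SELECT * FROM people
    WHERE keyword = 'guy'
      AND name IN (SELECT name FROM people WHERE keyword = 'developer')
""").fetchall()

# Alternative: group by name and require both keywords to be present.
grouped = conn.execute("""
    SELECT name FROM people
    WHERE keyword IN ('guy', 'developer')
    GROUP BY name
    HAVING COUNT(DISTINCT keyword) = 2
""").fetchall()

print(naive, subquery, grouped)
```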