Convert from an EAV table in SQL

I'm having trouble converting an EAV table into something more useful. The link table is confusing me, and I don't really know where to start. Does anyone have suggestions?
Contacts table
con_id Name Data
1 email a#gmail.com
2 phone 123
3 email b#gmail.com
4 phone 456
Link table (maps actual user accounts to rows in the Contacts table):
acct_id con_id
1 1
1 2
2 3
2 4
END GOAL:
acct_id Email Phone
1 a#gmail.com 123
2 b#gmail.com 456

http://sqlfiddle.com/#!4/7cf20/5/0
CREATE TABLE Contacts
(con_id int, Name varchar2(5), Data varchar2(11))
;
INSERT ALL
INTO Contacts (con_id, Name, Data)
VALUES (1, 'email', 'a#gmail.com')
INTO Contacts (con_id, Name, Data)
VALUES (2, 'phone', '123')
INTO Contacts (con_id, Name, Data)
VALUES (3, 'email', 'b#gmail.com')
INTO Contacts (con_id, Name, Data)
VALUES (4, 'phone', '456')
SELECT * FROM dual
;
CREATE TABLE Link
(acct_id int, con_id int)
;
INSERT ALL
INTO Link (acct_id, con_id)
VALUES (1, 1)
INTO Link (acct_id, con_id)
VALUES (1, 2)
INTO Link (acct_id, con_id)
VALUES (2, 3)
INTO Link (acct_id, con_id)
VALUES (2, 4)
SELECT * FROM dual
;
Query -
select * from (
select acct_id, name, Data
from contacts c, Link l
where c.con_id = l.con_id
)
pivot (max(Data) for name in ('email' as Email,'phone' as Phone));
Output -
ACCT_ID EMAIL PHONE
1 a#gmail.com 123
2 b#gmail.com 456
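The PIVOT clause above is Oracle syntax. On engines without PIVOT, the same reshaping is commonly written as conditional aggregation: one MAX(CASE ...) per attribute, grouped by account. The following is a minimal sketch of that idea, assuming SQLite driven from Python's sqlite3 module as a stand-in engine; the column types are simplified and are not those of the original fiddle.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (con_id INTEGER, Name TEXT, Data TEXT);
INSERT INTO Contacts VALUES
  (1, 'email', 'a#gmail.com'), (2, 'phone', '123'),
  (3, 'email', 'b#gmail.com'), (4, 'phone', '456');
CREATE TABLE Link (acct_id INTEGER, con_id INTEGER);
INSERT INTO Link VALUES (1, 1), (1, 2), (2, 3), (2, 4);
""")

# One MAX(CASE ...) per attribute name; rows for other attributes yield NULL,
# which MAX ignores, so each account collapses to a single row.
pivot_sql = """
SELECT l.acct_id,
       MAX(CASE WHEN c.Name = 'email' THEN c.Data END) AS Email,
       MAX(CASE WHEN c.Name = 'phone' THEN c.Data END) AS Phone
FROM Link l
JOIN Contacts c ON c.con_id = l.con_id
GROUP BY l.acct_id
ORDER BY l.acct_id
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

As with PIVOT, the attribute names have to be listed explicitly, so a new attribute type means adding another MAX(CASE ...) column.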

Related

Get records from table with join where records in join not contain specific value

I have two tables:
Table user:
create table user (
id bigserial not null primary key,
username varchar(256),
active boolean not null default true
);
And table address:
create table address (
id bigserial not null primary key,
user_id integer not null,
country varchar(256),
city varchar(256),
street varchar(256)
);
And some data as example:
insert into user(id, username, active) values (1, 'john', true);
insert into user(id, username, active) values (2, 'alex', true);
insert into user(id, username, active) values (3, 'alice', true);
insert into user(id, username, active) values (4, 'tom', true);
insert into user(id, username, active) values (5, 'dave', true);
insert into address(id, user_id, country, city, street) values (1, 1, 'Germany', 'Berlin', '');
insert into address(id, user_id, country, city, street) values (2, 2, 'Germany', 'Berlin', '');
insert into address(id, user_id, country, city, street) values (3, 2, 'Great Britain', 'London', '');
insert into address(id, user_id, country, city, street) values (4, 3, 'France', 'Paris', '');
insert into address(id, user_id, country, city, street) values (5, 4, 'USA', 'New York', '');
insert into address(id, user_id, country, city, street) values (6, 5, 'South Korea', 'Seoul', '');
Every user can have several addresses. I need to get all users who have no address in a specific country, for example 'Germany'.
What I tried:
select u.* from user u
left join address a on u.id=a.user_id where a.country not like '%Germany%'
But it also returns users who have an address in that country plus some other address in a different country. With the data above that is alex, who has two addresses, one in Germany and one in Great Britain:
id username active
--------------------
2 alex True
3 alice True
4 tom True
5 dave True
Any suggestions on how to write such a query?
Your code checks whether each user has at least one address outside of Germany, while you want to ensure that they have none.
I would recommend not exists:
select c.*
from client c
where not exists (
select 1
from address a
where a.user_id = c.id and a.country = 'Germany'
)
This query would take advantage of an index on address(user_id, country).
Note that it is unclear whether your table is called user or client... I used the latter.
Note that this also returns clients that have no address at all. If that's not what you want, then an alternative uses aggregation:
select c.*
from client c
inner join address a on a.user_id = c.id
group by c.id
having not bool_or(a.country = 'Germany')
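To see how the two variants differ, here is a small sketch run against SQLite through Python's sqlite3 module. Several things in it are assumptions made for illustration: the table is named users, a sixth user 'zoe' with no address is added, the columns are trimmed down, and bool_or (a PostgreSQL aggregate) is swapped for MAX over the 0/1 comparison result, which is how SQLite represents booleans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE address (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, country TEXT);
-- 'zoe' is a hypothetical extra user with no address at all
INSERT INTO users VALUES (1,'john'),(2,'alex'),(3,'alice'),(4,'tom'),(5,'dave'),(6,'zoe');
INSERT INTO address VALUES
  (1,1,'Germany'),(2,2,'Germany'),(3,2,'Great Britain'),
  (4,3,'France'),(5,4,'USA'),(6,5,'South Korea');
""")

# Anti-join: keep users for whom no German address exists.
not_exists = [r[0] for r in conn.execute("""
SELECT u.username
FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM address a
    WHERE a.user_id = u.id AND a.country = 'Germany'
)
ORDER BY u.id
""")]

# Aggregation: the inner join drops users without addresses first;
# MAX(...) = 0 means no joined row had country = 'Germany'.
aggregated = [r[0] for r in conn.execute("""
SELECT u.username
FROM users u
JOIN address a ON a.user_id = u.id
GROUP BY u.id, u.username
HAVING MAX(a.country = 'Germany') = 0
ORDER BY u.id
""")]

print(not_exists)
print(aggregated)
```

The only difference between the two result lists is the user without any address, which is the case the answer warns about.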
This is the query:
select user_id from address where country = 'Germany'
that returns all the users that you want to filter out.
Use it with NOT IN:
select u.*
from user u
where id not in (select user_id from address where country = 'Germany')
See the demo.
Results:
> id | username | active
> -: | :------- | :-----
> 3 | alice | t
> 4 | tom | t
> 5 | dave | t

How to get data from multiple tables using SQL query

I have three tables: Administration, which has a one-to-many relation with each of the Telephone and Fax tables:
Administration : Id_Administration, Lib_Administration
Telephone: Id_Phone, Phone_Number, Id_Administration
Fax: Id_Fax, Fax_Number, Id_Administration
Administration table contains:
Id_Administration Lib_Administration
1 adminstration1
2 adminstration2
Telephone table contains:
Id_Phone Phone_Number Id_Administration
1 0313131 1
2 0212121 1
3 0353535 2
4 0343434 2
Fax table contains:
Id_Fax Fax_Number Id_Administration
1 0323232 1
2 0363636 2
3 0373737 2
I want to make a query to show this result:
Id_Administration  Lib_Administration  Phone_Number  Fax_Number
1                  adminstration1      0313131       0323232
                                       0212121
2                  adminstration2      0353535       0363636
                                       0343434       0373737
I used this query
SELECT Administration.Id_Administration, Administration.Lib_Administration,Telephone.Phone_Number, Fax.Fax_Number
FROM ((Administration INNER JOIN
Telephone ON Administration.Id_Administration = Telephone.Id_Administration) INNER JOIN
Fax ON Administration.Id_Administration = Fax.Id_Administration)
But the result was iterated like this:
Id_Administration  Lib_Administration  Phone_Number  Fax_Number
1                  adminstration1      0313131       0323232
                                       0212121       0323232
2                  adminstration2      0353535       0363636
                                       0343434       0363636
                                       0353535       0373737
                                       0343434       0373737
I used left join but I didn't get the right result. Where is the problem in my query?
Everything is possible, but sometimes hard work to achieve our goal is not worth it. Are you trying to prepare a recordset for your form?
In this scenario I would have 2 subforms, for phone and fax numbers. No need to load all in one recordset.
I don't think the original table structure works. In the posted DDL you have administration-level granularity. What you want to do is pair a phone number with a fax number; I would call this office-level granularity. You can add that level of granularity by creating an Office table. In the inserts, link each phone and fax number to an office ID as well. For the select, also join the phone and fax tables on this Id_Office. This pairs the phones and faxes by both administration and office to give the expected output.
For example if you include an Office table similar to below:
CREATE TABLE Administration (Id_Administration int, Lib_Administration varchar(100))
INSERT INTO Administration (Id_Administration, Lib_Administration) VALUES
(1, 'adminstration1'), (2, 'adminstration2')
CREATE TABLE Office(Id_Administration int, Id_Office int, Lib_Office varchar(100))
INSERT INTO Office (Id_Administration, Id_Office, Lib_Office) VALUES
(1, 1, 'A1 Office 1'), (1, 2, 'A1 Office 2')
,(2, 1, 'A2 Office 1'), (2, 2, 'A2 Office 2')
CREATE TABLE Fax (Id_Fax int, Fax_Number char(7), Id_Administration int, Id_Office int)
INSERT INTO Fax (Id_Fax, Fax_Number, Id_Administration, Id_Office) VALUES
(1, '0323232', 1, 1), (2, '0363636', 2, 1)
,(3, '0373737', 2, 2)
CREATE TABLE Telephone (Id_Phone int, Phone_Number char(7), Id_Administration int, Id_Office int)
INSERT INTO Telephone (Id_Phone, Phone_Number, Id_Administration, Id_Office) VALUES
(1, '0313131', 1, 1)
,(2, '0212121', 1, 2)
,(3, '0353535', 2, 1)
,(4, '0343434', 2, 2)
SELECT A.Id_Administration, A.Lib_Administration, T.Phone_Number, COALESCE(F.Fax_Number, '') AS Fax_Number
FROM Administration A LEFT JOIN Office O ON A.Id_Administration = O.Id_Administration
LEFT JOIN Fax F ON F.Id_Administration = A.Id_Administration AND F.Id_Office = O.Id_Office
LEFT JOIN Telephone T ON T.Id_Administration = A.Id_Administration AND T.Id_Office = O.Id_Office
This produces output:
Id_Administration Lib_Administration Phone_Number Fax_Number
1 adminstration1 0313131 0323232
1 adminstration1 0212121
2 adminstration2 0353535 0363636
2 adminstration2 0343434 0373737
http://sqlfiddle.com/#!18/27964/3/0
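The repeated rows in the question come from joining two independent one-to-many children on the same parent key: each administration yields (number of phones) x (number of faxes) rows. The sketch below checks that arithmetic on the question's original tables, assuming SQLite via Python's sqlite3 module as the engine and simplified column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Administration (Id_Administration INTEGER, Lib_Administration TEXT);
INSERT INTO Administration VALUES (1, 'adminstration1'), (2, 'adminstration2');
CREATE TABLE Telephone (Id_Phone INTEGER, Phone_Number TEXT, Id_Administration INTEGER);
INSERT INTO Telephone VALUES
  (1,'0313131',1),(2,'0212121',1),(3,'0353535',2),(4,'0343434',2);
CREATE TABLE Fax (Id_Fax INTEGER, Fax_Number TEXT, Id_Administration INTEGER);
INSERT INTO Fax VALUES (1,'0323232',1),(2,'0363636',2),(3,'0373737',2);
""")

# Per administration: phone count, fax count, and the number of rows
# produced when both child tables are joined on Id_Administration only.
counts = conn.execute("""
SELECT a.Id_Administration,
       (SELECT COUNT(*) FROM Telephone t
         WHERE t.Id_Administration = a.Id_Administration) AS phones,
       (SELECT COUNT(*) FROM Fax f
         WHERE f.Id_Administration = a.Id_Administration) AS faxes,
       (SELECT COUNT(*) FROM Telephone t
         JOIN Fax f ON f.Id_Administration = t.Id_Administration
         WHERE t.Id_Administration = a.Id_Administration) AS joined_rows
FROM Administration a
ORDER BY a.Id_Administration
""").fetchall()

for admin_id, phones, faxes, joined_rows in counts:
    print(admin_id, phones, faxes, joined_rows)
```

The joined row count equals phones times faxes for each administration, which is why an extra key such as Id_Office is needed if each phone should line up with exactly one fax.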

Querying SQL Server table with different values in same column with same ID [duplicate]

I have an SQL Server 2012 table with ID, First Name and Last name. The ID is unique per person but due to an error in the historical feed, different people were assigned the same id.
------------------------------
ID FirstName LastName
------------------------------
1 ABC M
1 ABC M
1 ABC M
1 ABC N
2 BCD S
3 CDE T
4 DEF T
4 DEF T
Two IDs are present multiple times: 1 and 4. The rows with ID 4 are identical, and I don't want those in my result. For the rows with ID 1, the first name is the same but the last name differs in one row. I want only those IDs that occur more than once with a different first or last name.
I tried loading ID's which have multiple occurrences into a temp table and tried to compare it against the parent table albeit unsuccessfully. Any other ideas that I can try and implement?
This is the output I am looking for
ID
---
1
If you want the ids, then use aggregation and having:
select id
from t
group by id
having min(firstname) <> max(firstname) or min(lastname) <> max(lastname);
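A quick way to check this query is to run it on the question's sample rows. This is a minimal sketch assuming SQLite through Python's sqlite3 module rather than SQL Server 2012; the table name t is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, firstname TEXT, lastname TEXT);
INSERT INTO t VALUES
  (1,'ABC','M'),(1,'ABC','M'),(1,'ABC','M'),(1,'ABC','N'),
  (2,'BCD','S'),(3,'CDE','T'),(4,'DEF','T'),(4,'DEF','T');
""")

# An id qualifies when its smallest and largest name values differ,
# i.e. when the group holds more than one distinct first or last name.
ids = [r[0] for r in conn.execute("""
SELECT id
FROM t
GROUP BY id
HAVING MIN(firstname) <> MAX(firstname)
    OR MIN(lastname) <> MAX(lastname)
ORDER BY id
""")]
print(ids)
```

One caveat worth keeping in mind: MIN and MAX skip NULLs, so an id whose rows differ only in that one name is NULL would not be reported by this form.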
Try This:
CREATE TABLE #myTable(id INT, firstname VARCHAR(50), lastname VARCHAR(50))
INSERT INTO #myTable VALUES
(1, 'ABC', 'M'),
(1, 'ABC', 'M'),
(1, 'ABC', 'M'),
(1, 'ABC', 'N'),
(2, 'BCD', 'S'),
(3, 'CDE', 'T'),
(4, 'DEF', 'T'),
(4, 'DEF', 'T')
SELECT id FROM (
SELECT DISTINCT id, firstname, lastname
FROM #myTable) t GROUP BY id HAVING COUNT(*)>1
OUTPUT is : 1

Check duplicates in sql table and replace the duplicates ID in another table

I have a table with duplicate entries (I forgot to make NAME column unique)
So I now have this Duplicate entry table called 'table 1'
ID NAME
1 John F Smith
2 Sam G Davies
3 Tom W Mack
4 Bob W E Jone
5 Tom W Mack
I.e., IDs 3 and 5 are duplicates.
Table 2
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 5 item34
NAMEID are ID from table 1. Table 2 ID 4 and 5 I want to have NAMEID of 3 (Tom W Mack's Orders) like so
Table 2 (correct version)
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 3 item34
Is there an easy way to find and update the duplicate NAMEIDs in table 2 and then remove the duplicates from table 1?
In this case, first find out how many duplicate records you have.
To find the duplicate records you can use:
SELECT NAME, COUNT(1) as CNT FROM TABLE1 GROUP BY NAME HAVING COUNT(1) > 1
This gives you each duplicated name with its count (grouping by ID as well would put every row in its own group, because ID is unique), so you can find all the duplicate records
and delete them manually.
Don't forget to alter your table (add the unique constraint on NAME) after removing all the duplicate records.
Here's how you can do it:
-- set up the environment
create table #t (ID int, NAME varchar(50))
insert #t values
(1, 'John F Smith'),
(2, 'Sam G Davies'),
(3, 'Tom W Mack'),
(4, 'Bob W E Jone'),
(5, 'Tom W Mack')
create table #t2 (ID int, NAMEID int, ORDERS varchar(10))
insert #t2 values
(1, 2, 'item4'),
(2, 1, 'item5'),
(3, 4, 'item6'),
(4, 3, 'item23'),
(5, 5, 'item34')
go
-- update the referencing table first
;with x as (
select id,
first_value(id) over(partition by name order by id) replace_with
from #t
),
y as (
select #t2.nameid, x.replace_with
FROM #t2
join x on #t2.nameid = x.id
where #t2.nameid <> x.replace_with
)
update y set nameid = replace_with
-- delete duplicates from referenced table
;with x as (
select *, row_number() over(partition by name order by id) rn
from #t
)
delete x where rn > 1
select * from #t
select * from #t2
Please test first for performance and validity.
Let's use the example data
INSERT INTO TableA
(`ID`, `NAME`)
VALUES
(1, 'NameA'),
(2, 'NameB'),
(3, 'NameA'),
(4, 'NameC'),
(5, 'NameB'),
(6, 'NameD')
and
INSERT INTO TableB
(`ID`, `NAMEID`, `ORDERS`)
VALUES
(1, 2, 'itemB1'),
(2, 1, 'itemA1'),
(3, 4, 'itemC1'),
(4, 3, 'itemA2'),
(5, 5, 'itemB2'),
(5, 6, 'itemD1')
(makes it a bit easier to spot the duplicates and check the result)
Let's start with a simple query to get the smallest ID for a given NAME
SELECT
NAME, min(ID)
FROM
tableA
GROUP BY
NAME
And the result is [NameA,1], [NameB,2], [NameC,4], [NameD,6]
Now if you use that as an uncorrelated subquery for a JOIN with the base table like
SELECT
keep.kid, dup.id
FROM
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
It finds all duplicates that have the same name as in the result of the subquery but a different id + it also gives you the id of the "original", i.e. the smallest id for that name.
For the example it's [1,3], [2,5]
Now you can use that in an UPDATE query like
UPDATE
TableB as b
JOIN
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
SET
b.NAMEID=keep.kid
WHERE
b.NAMEID=dup.id
And the result is
ID,NAMEID,ORDERS
1, 2, itemB1
2, 1, itemA1
3, 4, itemC1
4, 1, itemA2 <- now has NAMEID=1
5, 2, itemB2 <- now has NAMEID=2
5, 6, itemD1
To eliminate the duplicates from tableA you can use the first query again.
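The UPDATE ... JOIN form above is MySQL syntax. As a rough sketch of the same two steps (repoint the references, then delete the extra rows) on an engine without that syntax, the following uses a correlated subquery in SQLite via Python's sqlite3 module. The table names TableA and TableB follow this answer, the data is the question's, and the index name ux_tablea_name is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ID INTEGER PRIMARY KEY, NAME TEXT);
INSERT INTO TableA VALUES
  (1,'John F Smith'),(2,'Sam G Davies'),(3,'Tom W Mack'),
  (4,'Bob W E Jone'),(5,'Tom W Mack');
CREATE TABLE TableB (ID INTEGER PRIMARY KEY, NAMEID INTEGER, ORDERS TEXT);
INSERT INTO TableB VALUES
  (1,2,'item4'),(2,1,'item5'),(3,4,'item6'),(4,3,'item23'),(5,5,'item34');
""")

with conn:  # one transaction: commit on success, roll back on error
    # Step 1: point every reference at the smallest ID sharing the same NAME.
    conn.execute("""
    UPDATE TableB
    SET NAMEID = (SELECT MIN(keep.ID)
                  FROM TableA dup
                  JOIN TableA keep ON keep.NAME = dup.NAME
                  WHERE dup.ID = TableB.NAMEID)
    WHERE NAMEID IN (SELECT ID FROM TableA)
    """)
    # Step 2: delete every row that is not the smallest ID for its NAME.
    conn.execute("""
    DELETE FROM TableA
    WHERE ID NOT IN (SELECT MIN(ID) FROM TableA GROUP BY NAME)
    """)
    # Step 3: prevent the duplicates from coming back.
    conn.execute("CREATE UNIQUE INDEX ux_tablea_name ON TableA (NAME)")

name_ids = [r[0] for r in conn.execute("SELECT NAMEID FROM TableB ORDER BY ID")]
remaining = [r[0] for r in conn.execute("SELECT ID FROM TableA ORDER BY ID")]
print(name_ids)
print(remaining)
```

The order matters: the references are updated before the duplicates are deleted, otherwise rows in TableB would briefly point at IDs that no longer exist.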

How to detect duplicate records with sub table records

Let's say I'm creating an address book in which the main table contains the basic contact information and a phone number sub table -
Contact
===============
Id [PK]
Name
PhoneNumber
===============
Id [PK]
Contact_Id [FK]
Number
So, a Contact record may have zero or more related records in the PhoneNumber table. There is no constraint on uniqueness of any column other than the primary keys. In fact, this must be true because:
Two contacts having different names may share a phone number, and
Two contacts may have the same name but different phone numbers.
I want to import a large dataset which may contain duplicate records into my database and then filter out the duplicates using SQL. The rules for identifying duplicate records are simple ... they must share the same name and the same number of phone records having the same content.
Of course, this works quite effectively for selecting duplicates from the Contact table but doesn't help me to detect actual duplicates given my rules:
SELECT * FROM Contact
WHERE EXISTS
(SELECT 'x' FROM Contact t2
WHERE t2.Name = Contact.Name AND
t2.Id > Contact.Id);
It seems as if what I want is a logical extension to what I already have, but I must be overlooking it. Any help?
Thanks!
In my question, I created a greatly simplified schema that reflects the real-world problem I'm solving. Przemyslaw's answer is indeed a correct one and did what I was asking both with the sample schema and, when extended, with the real one.
But, after doing some experiments with the real schema and a larger (~10k records) dataset, I found that performance was an issue. I don't claim to be an index guru, but I wasn't able to find a better combination of indices than what was already in the schema.
So, I came up with an alternate solution which fills the same requirements but executes in a small fraction (< 10%) of the time, at least using SQLite3 - my production engine. In hopes that it may assist someone else, I'll offer it as an alternative answer to my question.
DROP TABLE IF EXISTS Contact;
DROP TABLE IF EXISTS PhoneNumber;
CREATE TABLE Contact (
Id INTEGER PRIMARY KEY,
Name TEXT
);
CREATE TABLE PhoneNumber (
Id INTEGER PRIMARY KEY,
Contact_Id INTEGER REFERENCES Contact (Id) ON UPDATE CASCADE ON DELETE CASCADE,
Number TEXT
);
BEGIN TRANSACTION;
INSERT INTO Contact (Id, Name) VALUES
(1, 'John Smith'),
(2, 'John Smith'),
(3, 'John Smith'),
(4, 'Jane Smith'),
(5, 'Bob Smith'),
(6, 'Bob Smith');
INSERT INTO PhoneNumber (Id, Contact_Id, Number) VALUES
(1, 1, '555-1212'),
(2, 1, '222-1515'),
(3, 2, '222-1515'),
(4, 2, '555-1212'),
(5, 3, '111-2525'),
(6, 4, '111-2525');
COMMIT;
SELECT *
FROM Contact c1
WHERE EXISTS (
SELECT 1
FROM Contact c2
WHERE c2.Id > c1.Id
AND c2.Name = c1.Name
AND (SELECT COUNT(*) FROM PhoneNumber WHERE Contact_Id = c2.Id) = (SELECT COUNT(*) FROM PhoneNumber WHERE Contact_Id = c1.Id)
AND (
SELECT COUNT(*)
FROM PhoneNumber p1
WHERE p1.Contact_Id = c2.Id
AND EXISTS (
SELECT 1
FROM PhoneNumber p2
WHERE p2.Contact_Id = c1.Id
AND p2.Number = p1.Number
)
) = (SELECT COUNT(*) FROM PhoneNumber WHERE Contact_Id = c1.Id)
)
;
The results are as expected:
Id Name
====== =============
1 John Smith
5 Bob Smith
Other engines are bound to have differing performance which may be quite acceptable. This solution seems to work quite well with SQLite for this schema.
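If the comparison does not have to happen inside the database, another option is to build a "signature" per contact (the name plus the sorted list of its numbers) and group on that. This is a different technique from the SQL above, sketched here with Python's sqlite3 module on the same sample data; the variable names are illustrative only.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PhoneNumber (
  Id INTEGER PRIMARY KEY,
  Contact_Id INTEGER REFERENCES Contact (Id),
  Number TEXT
);
INSERT INTO Contact VALUES
  (1,'John Smith'),(2,'John Smith'),(3,'John Smith'),
  (4,'Jane Smith'),(5,'Bob Smith'),(6,'Bob Smith');
INSERT INTO PhoneNumber VALUES
  (1,1,'555-1212'),(2,1,'222-1515'),(3,2,'222-1515'),
  (4,2,'555-1212'),(5,3,'111-2525'),(6,4,'111-2525');
""")

# Collect every contact's numbers in one pass over PhoneNumber.
numbers = defaultdict(list)
for contact_id, number in conn.execute("SELECT Contact_Id, Number FROM PhoneNumber"):
    numbers[contact_id].append(number)

# Signature = (name, sorted numbers); contacts without numbers get an empty tuple.
groups = defaultdict(list)
for contact_id, name in conn.execute("SELECT Id, Name FROM Contact ORDER BY Id"):
    signature = (name, tuple(sorted(numbers[contact_id])))
    groups[signature].append(contact_id)

# Any signature shared by more than one contact is a set of duplicates.
duplicates = sorted(ids for ids in groups.values() if len(ids) > 1)
print(duplicates)
```

This reads each table once instead of running correlated subqueries per pair of contacts, at the cost of pulling the rows into application memory.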
The author stated the requirement of "two people being the same person" as:
Having the same name and
Having the same number of phone numbers and all of which are the same.
So the problem is a bit more complex than it seems (or maybe I just overthought it).
Below are sample data and a sample query (an ugly one, I know, but the general idea is there). I tested it on the test data below and it seems to work correctly (I'm using Oracle 11g R2):
CREATE TABLE contact (
id NUMBER PRIMARY KEY,
name VARCHAR2(40))
;
CREATE TABLE phone_number (
id NUMBER PRIMARY KEY,
contact_id REFERENCES contact (id),
phone VARCHAR2(10)
);
INSERT INTO contact (id, name) VALUES (1, 'John');
INSERT INTO contact (id, name) VALUES (2, 'John');
INSERT INTO contact (id, name) VALUES (3, 'Peter');
INSERT INTO contact (id, name) VALUES (4, 'Peter');
INSERT INTO contact (id, name) VALUES (5, 'Mike');
INSERT INTO contact (id, name) VALUES (6, 'Mike');
INSERT INTO contact (id, name) VALUES (7, 'Mike');
INSERT INTO phone_number (id, contact_id, phone) VALUES (1, 1, '123'); -- John having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (2, 1, '456'); -- John having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (3, 2, '123'); -- John the second having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (4, 2, '456'); -- John the second having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (5, 3, '123'); -- Peter having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (6, 3, '456'); -- Peter having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (7, 3, '789'); -- Peter having number 789
INSERT INTO phone_number (id, contact_id, phone) VALUES (8, 4, '456'); -- Peter the second having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (9, 5, '123'); -- Mike having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (10, 5, '456'); -- Mike having number 456
INSERT INTO phone_number (id, contact_id, phone) VALUES (11, 6, '123'); -- Mike the second having number 123
INSERT INTO phone_number (id, contact_id, phone) VALUES (12, 6, '789'); -- Mike the second having number 789
-- Mike the third having no number
COMMIT;
-- does not meet the requirements described in the question - will return Peter when it should not
SELECT DISTINCT c.name
FROM contact c JOIN phone_number pn ON (pn.contact_id = c.id)
GROUP BY c.name, pn.phone
HAVING COUNT(c.id) > 1
;
-- returns correct results for provided test data
-- take all people that have a namesake in contact table and
-- take all this person's phone numbers that this person's namesake also has
-- finally (outer query) check that the number of both persons' phone numbers is the same and
-- the number of the same phone numbers is equal to the number of (either) person's phone numbers
SELECT c1_id, name
FROM (
SELECT c1.id AS c1_id, c1.name, c2.id AS c2_id, COUNT(1) AS cnt
FROM contact c1
JOIN contact c2 ON (c2.id != c1.id AND c2.name = c1.name)
JOIN phone_number pn ON (pn.contact_id = c1.id)
WHERE
EXISTS (SELECT 1
FROM phone_number
WHERE contact_id = c2.id
AND phone = pn.phone)
GROUP BY c1.id, c1.name, c2.id
)
WHERE cnt = (SELECT COUNT(1) FROM phone_number WHERE contact_id = c1_id)
AND (SELECT COUNT(1) FROM phone_number WHERE contact_id = c1_id) = (SELECT COUNT(1) FROM phone_number WHERE contact_id = c2_id)
;
-- cleanup
DROP TABLE phone_number;
DROP TABLE contact;
Check at SQL Fiddle: http://www.sqlfiddle.com/#!4/36cdf/1
Edited
Answer to author's comment: Of course I didn't take that into account... here's a revised solution:
-- new test data
INSERT INTO contact (id, name) VALUES (8, 'Jane');
INSERT INTO contact (id, name) VALUES (9, 'Jane');
SELECT c1_id, name
FROM (
SELECT c1.id AS c1_id, c1.name, c2.id AS c2_id, COUNT(1) AS cnt
FROM contact c1
JOIN contact c2 ON (c2.id != c1.id AND c2.name = c1.name)
LEFT JOIN phone_number pn ON (pn.contact_id = c1.id)
WHERE pn.contact_id IS NULL
OR EXISTS (SELECT 1
FROM phone_number
WHERE contact_id = c2.id
AND phone = pn.phone)
GROUP BY c1.id, c1.name, c2.id
)
WHERE (SELECT COUNT(1) FROM phone_number WHERE contact_id = c1_id) IN (0, cnt)
AND (SELECT COUNT(1) FROM phone_number WHERE contact_id = c1_id) = (SELECT COUNT(1) FROM phone_number WHERE contact_id = c2_id)
;
We now allow for contacts with no phone numbers (LEFT JOIN), and in the outer query we compare the person's phone-number count: it must either be 0 or equal the count returned by the inner query.
The keyword "having" is your friend. The generic use is:
select field1, field2, count(*) records
from wherever
where whatever
group by field1, field2
having records > 1
Whether or not you can use the alias in the having clause depends on the database engine. You should be able to apply this basic principle to your situation.
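As a concrete instance of that pattern, here is a small sketch that lists names occurring more than once. It assumes SQLite via Python's sqlite3 module and a made-up four-row contact table, and it repeats COUNT(*) in the HAVING clause instead of using the alias so that it does not depend on engine-specific alias support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
-- hypothetical sample rows
INSERT INTO contact VALUES (1,'John'),(2,'John'),(3,'Peter'),(4,'Mike');
""")

# Group by the field(s) that define a duplicate and keep groups with > 1 row.
rows = conn.execute("""
SELECT name, COUNT(*) AS records
FROM contact
GROUP BY name
HAVING COUNT(*) > 1
ORDER BY name
""").fetchall()
print(rows)
```

On its own this only finds repeated names; applying it to the phone-number rule from the question would still need a per-contact summary of the numbers to group on.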