Joining a table with dictionary to unique id column in different table - sql

I have two tables, Table A and Table B, that I am trying to extract information from to create a resulting table.
Table A
user_ids value
{user_123, user_234} apples
{user_456, user_123} oranges
{user_234} kiwi
Table B
id name
123 John Smith
234 Jane Doe
456 John Doe
I want to join the two tables in a way that will result in the following:
Table C
user_ids value user_names
{user_123, user_234} apples {John Smith, Jane Doe}
{user_456, user_123} oranges {John Doe, John Smith}
{user_234} kiwi {Jane Doe}
Any help would be really appreciated!

Others have already encouraged you to normalize your design, and there are numerous posts on why this is recommended. Using your current dataset, the following was done in PostgreSQL with user_ids treated as a text array. I also tested with user_ids stored as text and used cast(user_ids as text[]) to convert it to a text array.
See fiddle and result below:
Schema (PostgreSQL v11)
CREATE TABLE table_a (
"user_ids" text[],
"value" VARCHAR(7)
);
INSERT INTO table_a
("user_ids", "value")
VALUES
('{user_123, user_234}', 'apples'),
('{user_456, user_123}', 'oranges'),
('{user_234}', 'kiwi');
CREATE TABLE table_b (
"id" INTEGER,
"name" VARCHAR(10)
);
INSERT INTO table_b
("id", "name")
VALUES
('123', 'John Smith'),
('234', 'Jane Doe'),
('456', 'John Doe');
The first CTE, user_values, creates a row for each user_id and value. The second CTE, merged_values, left joins table_b on the pattern user_<id>, so a user_id with no match is kept with a NULL name, and uses DISTINCT to ensure unique results. The final projection groups by value and uses array_agg to collect all user_ids or names into a single row.
Query #1
WITH user_values AS (
SELECT
unnest(a.user_ids) user_id,
a.value
FROM
table_a a
),
merged_values AS (
SELECT DISTINCT
a.user_id,
a.value,
b.name
FROM
user_values a
LEFT JOIN
table_b b ON a.user_id = CONCAT('user_',b.id)
)
SELECT
array_agg(user_id) user_ids,
value,
array_agg(name) "names"
FROM
merged_values
GROUP BY
value;
user_ids           value    names
------------------ -------- --------------------
user_123,user_456  oranges  John Smith,John Doe
user_123,user_234  apples   John Smith,Jane Doe
user_234           kiwi     Jane Doe
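Since normalizing came up, here is a minimal sketch of what that could look like, run through Python's built-in sqlite3 module so it is easy to try. The table names users and user_values are my own assumptions, not the asker's schema, and SQLite's group_concat stands in for Postgres's array_agg. With one row per (user, value) pair, the lookup becomes a plain join and no string matching on user_<id> is needed.

```python
import sqlite3

# Hypothetical normalized schema: one row per (user, value) pair
# instead of an array of prefixed ids in a single column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_values (user_id INTEGER REFERENCES users(id), value TEXT);
INSERT INTO users VALUES (123, 'John Smith'), (234, 'Jane Doe'), (456, 'John Doe');
INSERT INTO user_values VALUES
    (123, 'apples'), (234, 'apples'),
    (456, 'oranges'), (123, 'oranges'),
    (234, 'kiwi');
""")

# A plain equality join replaces the CONCAT('user_', id) match.
rows = conn.execute("""
    SELECT uv.value, group_concat(u.name, '|') AS names
    FROM user_values uv
    JOIN users u ON u.id = uv.user_id
    GROUP BY uv.value
""").fetchall()

# group_concat does not guarantee an order, so compare names as sets.
names_by_value = {value: set(names.split("|")) for value, names in rows}
print(names_by_value)
```

The same names come back per value as in the result above; only the storage changed.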

Related

Merging Duplicate Rows with SQL

I have a table that contains usernames. The names are duplicated in various forms; for example, Mr. John also appears as John Mr. I want to combine the two rows using their unique phone numbers in SQL.
I want a new table in this form after removing the duplicates
You can do this with the ROW_NUMBER window function.
Partition the data by your unique column (Phone_Number), then order by name within each partition.
Preparing the table and example data:
DECLARE @vCustomers TABLE (
Name NVARCHAR(25),
Phone_Number NVARCHAR(9),
Address NVARCHAR(25)
)
INSERT INTO @vCustomers
VALUES
('Mr John', '234881675', 'Lagos'),
('Mr Felix', '234867467', 'Atlanta'),
('Mrs Ayo', '234786959', 'Doha'),
('John Mr', '234881675', 'Lagos'),
('Mr Jude', '235689760', 'Rabat'),
('Ayo', '234786959', 'Doha'),
('Jude', '235689760', 'Rabat')
After that, remove the duplicate rows:
-- RN = 1 is the row kept for each phone number; every later row is deleted
DELETE
vc
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY Phone_Number ORDER BY Name DESC) AS RN
FROM @vCustomers
) AS vc
WHERE RN > 1
SELECT * FROM @vCustomers
The final result:
Name      Phone_Number  Address
Mr John   234881675     Lagos
Mr Felix  234867467     Atlanta
Mrs Ayo   234786959     Doha
Mr Jude   235689760     Rabat
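If you want to try the same idea outside SQL Server, here is a rough equivalent using Python's built-in sqlite3 module. The customers table name is my own assumption, and it needs SQLite 3.25 or newer for window functions. SQLite cannot delete through a derived table, so this sketch deletes by rowid instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (name TEXT, phone_number TEXT, address TEXT);
INSERT INTO customers VALUES
    ('Mr John',  '234881675', 'Lagos'),
    ('Mr Felix', '234867467', 'Atlanta'),
    ('Mrs Ayo',  '234786959', 'Doha'),
    ('John Mr',  '234881675', 'Lagos'),
    ('Mr Jude',  '235689760', 'Rabat'),
    ('Ayo',      '234786959', 'Doha'),
    ('Jude',     '235689760', 'Rabat');
""")

# Number the rows within each phone number (name descending) and
# delete everything except the first row of each partition.
conn.execute("""
    DELETE FROM customers
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY phone_number ORDER BY name DESC
                   ) AS rn
            FROM customers
        )
        WHERE rn > 1
    )
""")

remaining = conn.execute(
    "SELECT name, phone_number, address FROM customers ORDER BY phone_number"
).fetchall()
for row in remaining:
    print(row)
```

Note that which duplicate survives depends entirely on the ORDER BY inside the window, so pick an ordering that matches the form of the name you want to keep.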

Find missing setting

I have two tables in my DB. Table A is an informational data table and Table B is a settings table. How do I find which settings from Table B each user in Table A is missing?
E.G.
Table A
username setting
Mark 1
Mark 2
Martin 2
Jane 1
Table B
Possible_Setting
1
2
3
Result Table
username missing_setting
Mark 3
Martin 1
Martin 3
Jane 2
Jane 3
Thanks for help!
This may be inefficient if the tables are large, owing to the cross join, but it's the only answer I could come up with.
SELECT a.username, b.Possible_Setting AS missing_setting
FROM
(SELECT DISTINCT username FROM TableA a) a
CROSS JOIN TableB b
WHERE
NOT EXISTS (
SELECT *
FROM TableA real_a
WHERE real_a.username = a.username
AND real_a.setting = b.Possible_Setting)
ORDER BY 1, 2
Setup code:
CREATE TABLE TableA (username varchar(20), setting tinyint)
CREATE TABLE TableB (Possible_Setting tinyint PRIMARY KEY)
INSERT TableA VALUES
('Mark', 1),
('Mark', 2),
('Martin', 2),
('Jane', 1)
INSERT TableB VALUES
(1),
(2),
(3)
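For comparison, the same "every possible pair minus the pairs that exist" idea can also be written with EXCEPT. This is only a sketch, run through Python's built-in sqlite3 module; the lowercase table and column names are my own, and I have not compared its performance against the NOT EXISTS version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (username TEXT, setting INTEGER);
CREATE TABLE table_b (possible_setting INTEGER PRIMARY KEY);
INSERT INTO table_a VALUES ('Mark', 1), ('Mark', 2), ('Martin', 2), ('Jane', 1);
INSERT INTO table_b VALUES (1), (2), (3);
""")

# Build every (user, setting) combination, then subtract the
# combinations that are actually present in table_a.
missing = conn.execute("""
    SELECT u.username, b.possible_setting AS missing_setting
    FROM (SELECT DISTINCT username FROM table_a) AS u
    CROSS JOIN table_b AS b
    EXCEPT
    SELECT username, setting FROM table_a
    ORDER BY 1, 2
""").fetchall()

for username, setting in missing:
    print(username, setting)
```

The rows match the Result Table in the question, just sorted by username.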

One SQL statement for counting the records in the master table based on matching records in the detail table?

I have the following master table called Master and sample data
ID Date
1 2014-09-07
2 2014-09-07
3 2014-09-08
The following details table called Details
masterId Name
1 John Walsh
1 John Jones
2 John Carney
1 Peter Lewis
3 John Wilson
Now I want to count the Master records (grouped on the Date column) that have at least one corresponding Details record whose Name contains "John".
I cannot figure how to write a single SQL statement for this job.
Please note that a join is needed to find the master records to count. However, the join duplicates master records that have several matching details. I need to keep those duplicates out of the count when grouping on the Date column in the Master table.
The correct results should be:
count: grouped on Date column
2 2014-09-07
1 2014-09-08
Thanks and regards!
This answer assumes the following:
The Name field is always FirstName LastName.
You are matching only on the first name John. The search criteria would differ depending on what you need.
SELECT Date, Count(*)
FROM tblmaster
INNER JOIN tbldetails ON tblmaster.ID=tbldetails.masterId
WHERE NAME LIKE 'John%'
GROUP BY Date, tbldetails.masterId
What we're doing here is using a wildcard character in our string search to say "look for John followed by any characters of any length".
Also, here is a way to create table variables based on what we're working with
DECLARE @tblmaster as table(
ID int,
[date] datetime
)
DECLARE @tbldetails as table(
masterID int,
name varchar(50)
)
INSERT INTO @tblmaster (ID,[date])
VALUES
(1,'2014-09-07'),(2,'2014-09-07'),(3,'2014-09-08')
INSERT INTO @tbldetails(masterID, name) VALUES
(1,'John Walsh'),
(1,'John Jones'),
(2,'John Carney'),
(1,'Peter Lewis'),
(3,'John Wilson')
Based on all comments below, this SQL statement in its clunky glory should do the trick.
SELECT date,count(t1.ID) FROM @tblmaster mainTable INNER JOIN
(
SELECT ID, COUNT(*) as countOfAll
FROM @tblmaster t1
INNER JOIN @tbldetails t2 ON t1.ID=t2.masterId
WHERE NAME LIKE 'John%'
GROUP BY id)
as t1 on t1.ID = mainTable.id
GROUP BY mainTable.date
Is this what you want?
select date, count(distinct m.id)
from master m join
details d
on d.masterid = m.id
where name like '%John%'
group by date;
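To see why the DISTINCT matters, here is a small sketch using Python's built-in sqlite3 module with the sample data from the question. The table and column names are my own lowercase versions. It runs the same join twice, once with COUNT(*) and once with COUNT(DISTINCT m.id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE details (master_id INTEGER, name TEXT);
INSERT INTO master VALUES (1, '2014-09-07'), (2, '2014-09-07'), (3, '2014-09-08');
INSERT INTO details VALUES
    (1, 'John Walsh'), (1, 'John Jones'), (2, 'John Carney'),
    (1, 'Peter Lewis'), (3, 'John Wilson');
""")

# Same join and filter, only the aggregate expression differs.
query = """
    SELECT m.date, {agg}
    FROM master m
    JOIN details d ON d.master_id = m.id
    WHERE d.name LIKE 'John %'
    GROUP BY m.date
    ORDER BY m.date
"""

# COUNT(*) counts joined rows, so master 1 is counted twice.
naive = conn.execute(query.format(agg="COUNT(*)")).fetchall()
# COUNT(DISTINCT m.id) counts each master record once per date.
deduped = conn.execute(query.format(agg="COUNT(DISTINCT m.id)")).fetchall()

print(naive)
print(deduped)
```

Master 1 has two John rows, so the plain count reports 3 for 2014-09-07 while the distinct count reports the expected 2.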

SQL Query Retrieving Latest Row When Other Columns Are Equal

I'm having trouble figuring out the SQL statement to retrieve a specific set of data. Where all columns are equal except the last update date, I want the most recent. For example.
Book Author Update
John Foo 1/21/2010
John Foo 1/22/2010
Fred Foo2 1/21/2010
Fred Foo2 1/22/2010
What's the query that retrieves the most recent rows? That is, the query that returns:
Book Author Update
John Foo 1/22/2010
Fred Foo2 1/22/2010
TIA,
Steve
SELECT
book,
author,
MAX(update)
FROM
My_Table
GROUP BY
book,
author
This only works in this particular case because all of the other columns have the same value. If you wanted to get the latest row by book, but where the author (or some other column that you will retrieve) might be different then you could use:
SELECT
T.book,
T.author,
T.update
FROM
(SELECT book, MAX(update) AS max_update FROM My_Table GROUP BY book) SQ
INNER JOIN My_Table T ON
T.book = SQ.book AND
T.update = SQ.max_update
Fixed it up
DROP TABLE #tmpBooks
CREATE TABLE #tmpBooks
(
Book VARCHAR(100),
Author VARCHAR(100),
Updated DATETIME
)
INSERT INTO #tmpBooks VALUES ('Foo', 'Bar', '1/1/1980')
INSERT INTO #tmpBooks VALUES ('Foo', 'Bar', '1/1/1990')
INSERT INTO #tmpBooks VALUES ('Foo', 'Bar', '1/1/2000')
INSERT INTO #tmpBooks VALUES ('Foo', 'Bar', '1/1/2010')
INSERT INTO #tmpBooks VALUES ('Foo2', 'Bar2', '1/1/1980')
INSERT INTO #tmpBooks VALUES ('Foo2', 'Bar2', '1/1/1990')
INSERT INTO #tmpBooks VALUES ('Foo2', 'Bar2', '1/1/2000')
SELECT Book, Author, Max(Updated) as MaxUpdated
FROM #tmpBooks
GROUP BY Book, Author
Results:
Book Author MaxUpdated
--------------- --------------- -----------------------
Foo Bar 2010-01-01 00:00:00.000
Foo2 Bar2 2000-01-01 00:00:00.000
(2 row(s) affected)
This will get what you asked for, but something tells me it's not what you want.
SELECT Book, Author, MAX(Update)
FROM BookUpdates
GROUP BY Book, Author
Is there more to the table schema?
Try this (I don't have the data to test on, but it should work):
SELECT
bu.Book,
bu.Author,
bu.Updated
FROM
BookUpdates bu
JOIN
(SELECT Book, MAX(Updated) AS MaxUpdated FROM BookUpdates GROUP BY Book) latest
ON
bu.Book = latest.Book AND
bu.Updated = latest.MaxUpdated;
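Another common way to get the latest row per book is a window function instead of a join back to the table. Below is a hedged sketch using Python's built-in sqlite3 module (SQLite 3.25 or newer). The book_updates table name and the ISO-formatted dates are my own assumptions, chosen so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book_updates (book TEXT, author TEXT, updated TEXT);
INSERT INTO book_updates VALUES
    ('Foo',  'Bar',  '1980-01-01'),
    ('Foo',  'Bar',  '1990-01-01'),
    ('Foo',  'Bar',  '2000-01-01'),
    ('Foo',  'Bar',  '2010-01-01'),
    ('Foo2', 'Bar2', '1980-01-01'),
    ('Foo2', 'Bar2', '1990-01-01'),
    ('Foo2', 'Bar2', '2000-01-01');
""")

# Rank rows within each book, newest first, and keep only rank 1.
latest = conn.execute("""
    SELECT book, author, updated
    FROM (
        SELECT book, author, updated,
               ROW_NUMBER() OVER (
                   PARTITION BY book ORDER BY updated DESC
               ) AS rn
        FROM book_updates
    )
    WHERE rn = 1
    ORDER BY book
""").fetchall()

for row in latest:
    print(row)
```

Unlike the GROUP BY with MAX, this returns whole rows, so any extra columns come from the same row as the latest date.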

Add or delete repeated row

I have an output like this:
id name date school school1
1 john 11/11/2001 nyu ucla
1 john 11/11/2001 ucla nyu
2 paul 11/11/2011 uft mit
2 paul 11/11/2011 mit uft
I would like to achieve this:
id name date school school1
1 john 11/11/2001 nyu ucla
2 paul 11/11/2011 mit uft
I am using direct join as in:
select distinct
a.id, a.name,
b.date,
c.school
a1.id, a1.name,
b1.date,
c1.school
from table a, table b, table c,table a1, table b1, table c1
where
a.id=b.id
and...
Any ideas?
We will need more information such as what your tables contain and what you are after.
One thing I noticed is that you have a school column and then school1. Normalization (first normal form) says you should not repeat a field and append numbers to it to hold more values, even if you think the relationship will only ever need one or two additional items. Instead, create a second table that associates a user with one or many schools.
I agree with everyone else that both your source table and your desired output are poor design. While you probably can't do anything about your source table, I recommend the following code and output:
Select id, name, date, school from MyTable
union
Select id, name, date, school1 from MyTable;
(repeat as necessary)
This will give you results in the format:
id name date school
1 john 11/11/2001 nyu
1 john 11/11/2001 ucla
2 paul 11/11/2011 mit
2 paul 11/11/2011 uft
(Note: UNION removes duplicate rows by default, so the distinct flag isn't needed; UNION ALL would keep them.)
With this format, you could easily count the number of schools per student, number of students per school, etc.
If processing time and/or storage space is a factor here, you could then split this into 2 tables, 1 with the id,name & date, the other with the id & school (basically what JonH just said). But if you're just working up some simple statistics, this should suffice.
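Here is a quick way to check the union approach against the sample output from the question, using Python's built-in sqlite3 module. The my_table name and the assumption that the duplicated output is stored as a table are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER, name TEXT, date TEXT, school TEXT, school1 TEXT);
INSERT INTO my_table VALUES
    (1, 'john', '11/11/2001', 'nyu',  'ucla'),
    (1, 'john', '11/11/2001', 'ucla', 'nyu'),
    (2, 'paul', '11/11/2011', 'uft',  'mit'),
    (2, 'paul', '11/11/2011', 'mit',  'uft');
""")

# Stack both school columns into one; UNION drops the duplicate rows.
rows = conn.execute("""
    SELECT id, name, date, school FROM my_table
    UNION
    SELECT id, name, date, school1 FROM my_table
    ORDER BY id, school
""").fetchall()

for row in rows:
    print(row)
```

Each student ends up with one row per school, regardless of which column the school was in originally.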
This problem was just too irresistible, so I took a guess at the data structures we are dealing with. The technology wasn't specified in the question, so this is in Transact-SQL.
create table student
(
id int not null primary key identity,
name nvarchar(100) not null default '',
graduation_date date not null default getdate(),
)
go
create table school
(
id int not null primary key identity,
name nvarchar(100) not null default ''
)
go
create table student_school_asc
(
student_id int not null foreign key references student (id),
school_id int not null foreign key references school (id),
primary key (student_id, school_id)
)
go
insert into student (name, graduation_date) values ('john', '2001-11-11')
insert into student (name, graduation_date) values ('paul', '2011-11-11')
insert into school (name) values ('nyu')
insert into school (name) values ('ucla')
insert into school (name) values ('uft')
insert into school (name) values ('mit')
insert into student_school_asc (student_id, school_id) values (1,1)
insert into student_school_asc (student_id, school_id) values (1,2)
insert into student_school_asc (student_id, school_id) values (2,3)
insert into student_school_asc (student_id, school_id) values (2,4)
select
s.id,
s.name,
s.graduation_date as [date],
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s1 where s1.rank_num = 1) as school,
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s2 where s2.rank_num = 2) as school1
from
student s
Result:
id name date school school1
--- ----- ---------- ------- --------
1 john 2001-11-11 nyu ucla
2 paul 2011-11-11 mit uft