Add or delete repeated row - sql

I have an output like this:
id name date school school1
1 john 11/11/2001 nyu ucla
1 john 11/11/2001 ucla nyu
2 paul 11/11/2011 uft mit
2 paul 11/11/2011 mit uft
I would like to achieve this:
id name date school school1
1 john 11/11/2001 nyu ucla
2 paul 11/11/2011 mit uft
I am using direct join as in:
select distinct
a.id, a.name,
b.date,
c.school,
a1.id, a1.name,
b1.date,
c1.school
from table a, table b, table c,table a1, table b1, table c1
where
a.id=b.id
and...
Any ideas?

We will need more information such as what your tables contain and what you are after.
One thing I noticed is you have a school and then school1. Normalization rules (repeating groups like this violate first normal form) say that you should never duplicate fields and append numbers to them to get more information, even if you think that the relationship will only be 1 or 2 additional items. You need to create a second table that associates a user with one to many schools.

I agree with everyone else that both your source table and your desired output are poor design. While you probably can't do anything about your source table, I recommend the following code and output:
Select id, name, date, school from MyTable
union
Select id, name, date, school1 from MyTable;
(repeat as necessary)
This will give you results in the format:
id name date school
1 john 11/11/2001 nyu
1 john 11/11/2001 ucla
2 paul 11/11/2011 mit
2 paul 11/11/2011 uft
(Note: UNION removes duplicate rows by default, so the distinct keyword isn't needed; UNION ALL would keep them.)
With this format, you could easily count the number of schools per student, number of students per school, etc.
If processing time and/or storage space is a factor here, you could then split this into 2 tables, 1 with the id,name & date, the other with the id & school (basically what JonH just said). But if you're just working up some simple statistics, this should suffice.
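As a rough sketch of the counting idea above, the snippet below runs the UNION query through Python's built-in sqlite3 module (used here only so the example is self-contained; the answer itself does not name a database). The table name MyTable comes from the answer, while the column types and the in-memory database are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (id INTEGER, name TEXT, date TEXT, school TEXT, school1 TEXT);
    INSERT INTO MyTable VALUES
        (1, 'john', '11/11/2001', 'nyu',  'ucla'),
        (1, 'john', '11/11/2001', 'ucla', 'nyu'),
        (2, 'paul', '11/11/2011', 'uft',  'mit'),
        (2, 'paul', '11/11/2011', 'mit',  'uft');
""")

# UNION (unlike UNION ALL) drops duplicate rows, so each student/school pair appears once.
unpivoted = """
    SELECT id, name, date, school  FROM MyTable
    UNION
    SELECT id, name, date, school1 FROM MyTable
"""

# Count the schools per student on top of the one-school-per-row shape.
schools_per_student = conn.execute(
    f"SELECT name, COUNT(*) FROM ({unpivoted}) GROUP BY id, name ORDER BY id"
).fetchall()
print(schools_per_student)
```

The same derived table could be grouped by school instead to count students per school.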

This problem was just too irresistible, so I took a guess at the data structures that we are dealing with. The technology wasn't specified in the question. This is in Transact-SQL.
create table student
(
id int not null primary key identity,
name nvarchar(100) not null default '',
graduation_date date not null default getdate(),
)
go
create table school
(
id int not null primary key identity,
name nvarchar(100) not null default ''
)
go
create table student_school_asc
(
student_id int not null foreign key references student (id),
school_id int not null foreign key references school (id),
primary key (student_id, school_id)
)
go
insert into student (name, graduation_date) values ('john', '2001-11-11')
insert into student (name, graduation_date) values ('paul', '2011-11-11')
insert into school (name) values ('nyu')
insert into school (name) values ('ucla')
insert into school (name) values ('uft')
insert into school (name) values ('mit')
insert into student_school_asc (student_id, school_id) values (1,1)
insert into student_school_asc (student_id, school_id) values (1,2)
insert into student_school_asc (student_id, school_id) values (2,3)
insert into student_school_asc (student_id, school_id) values (2,4)
select
s.id,
s.name,
s.graduation_date as [date],
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s1 where s1.rank_num = 1) as school,
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s2 where s2.rank_num = 2) as school1
from
student s
Result:
id name date school school1
--- ----- ---------- ------- --------
1 john 2001-11-11 nyu ucla
2 paul 2011-11-11 mit uft
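If the schema cannot be changed, one lighter-weight option for the output shown in the question may be to keep only the row of each mirrored pair whose schools are in alphabetical order. This is a sketch under the assumption that every pair really does appear in both orders, as in the sample; the table name joined_output is hypothetical and stands in for the result of the original join, and Python's sqlite3 module is used only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- joined_output is a hypothetical stand-in for the result of the original join
    CREATE TABLE joined_output (id INTEGER, name TEXT, date TEXT, school TEXT, school1 TEXT);
    INSERT INTO joined_output VALUES
        (1, 'john', '11/11/2001', 'nyu',  'ucla'),
        (1, 'john', '11/11/2001', 'ucla', 'nyu'),
        (2, 'paul', '11/11/2011', 'uft',  'mit'),
        (2, 'paul', '11/11/2011', 'mit',  'uft');
""")

# Of each mirrored pair (x, y) / (y, x), only one row satisfies school < school1.
deduped = conn.execute("""
    SELECT id, name, date, school, school1
    FROM joined_output
    WHERE school < school1
    ORDER BY id
""").fetchall()
print(deduped)
```

In the original query the same condition would go into the where clause, comparing c.school with c1.school.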

Related

Merging Duplicate Rows with SQL

I have a table that contains usernames, these names are duplicated in various forms, for example, Mr. John is replicated as John Mr. I want to combine the two rows using their unique phone numbers in SQL.
I want a new table in this form after removing the duplicates
You can do it with the ROW_NUMBER window function.
First, you partition the data by your unique column (Phone_Number), then sort by name within each partition.
Preparing the table and example data:
DECLARE @vCustomers TABLE (
Name NVARCHAR(25),
Phone_Number NVARCHAR(9),
Address NVARCHAR(25)
)
INSERT INTO @vCustomers
VALUES
('Mr John', '234881675', 'Lagos'),
('Mr Felix', '234867467', 'Atlanta'),
('Mrs Ayo', '234786959', 'Doha'),
('John Mr', '234881675', 'Lagos'),
('Mr Jude', '235689760', 'Rabat'),
('Ayo', '234786959', 'Doha'),
('Jude', '235689760', 'Rabat')
After that, removing the duplicate rows:
DELETE
vc
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY Phone_Number ORDER BY Name DESC) AS RN
FROM @vCustomers
) AS vc
WHERE RN > 1
SELECT * FROM @vCustomers
The final result:
Name Phone_Number Address
Mr John 234881675 Lagos
Mr Felix 234867467 Atlanta
Mrs Ayo 234786959 Doha
Mr Jude 235689760 Rabat
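For readers without SQL Server at hand, the same idea can be tried with Python's built-in sqlite3 module. This is only a sketch: SQLite cannot delete through a derived table the way T-SQL does, so the row numbers are matched back through rowid instead, and it assumes an SQLite build with window function support (3.25 or later). The table name customers is a stand-in for the table variable above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, phone_number TEXT, address TEXT);
    INSERT INTO customers VALUES
        ('Mr John',  '234881675', 'Lagos'),
        ('Mr Felix', '234867467', 'Atlanta'),
        ('Mrs Ayo',  '234786959', 'Doha'),
        ('John Mr',  '234881675', 'Lagos'),
        ('Mr Jude',  '235689760', 'Rabat'),
        ('Ayo',      '234786959', 'Doha'),
        ('Jude',     '235689760', 'Rabat');
""")

# Number the rows within each phone number (name descending) and delete every row after the first.
conn.execute("""
    DELETE FROM customers
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY phone_number ORDER BY name DESC) AS rn
            FROM customers
        )
        WHERE rn > 1
    )
""")

remaining = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY name")]
print(remaining)
```

Which spelling of a name survives depends entirely on the ORDER BY inside the window; sorting by name descending happens to keep the titled forms in this sample.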

SQL - EXIST OR ALL?

I have two tables, student and grade;
the grade table has an attribute student_id which references student_id in the student table.
How do I find which student has every grade that exists?
In case this is not clear:
Student ID Name
1 1 John
2 2 Paul
3 3 George
4 4 Mike
5 5 Lisa
Grade Student_Id Course Grade
1 1 Math A
2 1 English B
3 1 Physics C
4 2 Math A
5 2 English A
6 2 Physics B
7 3 Economics A
8 4 Art C
9 5 Biology A
Assume there are only grades A, B and C (no D, E or fail).
I want to find only John, because he has grades A, B and C, while
other students like Paul (2) should not be selected because he does not have grade C. It does not matter which courses he took; I just need to find out whether he has all the grades available.
I feel like I should use something like the EXISTS or ALL function in SQL, but I'm not sure.
Please help. Thank you in advance.
I would use GROUP BY and HAVING, but like this:
SELECT s.Name
FROM Student s JOIN
Grade g
ON s.ID = g.Student_Id
GROUP BY s.id, s.Name
HAVING COUNT(DISTINCT g.Grade) = (SELECT COUNT(DISTINCT g2.grade) FROM grade g2);
You say "all the grades out there", so the query should not use a constant for that.
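A quick way to check this query against the sample data from the question is to run it through Python's built-in sqlite3 module. The sketch below assumes the table and column names from the question (Student.ID, Grade.Student_Id) and picks the column types itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Grade (Student_Id INTEGER, Course TEXT, Grade TEXT);
    INSERT INTO Student VALUES (1,'John'),(2,'Paul'),(3,'George'),(4,'Mike'),(5,'Lisa');
    INSERT INTO Grade VALUES
        (1,'Math','A'),(1,'English','B'),(1,'Physics','C'),
        (2,'Math','A'),(2,'English','A'),(2,'Physics','B'),
        (3,'Economics','A'),(4,'Art','C'),(5,'Biology','A');
""")

# A student qualifies when their number of distinct grades equals
# the number of distinct grades found anywhere in the table.
students_with_every_grade = [row[0] for row in conn.execute("""
    SELECT s.Name
    FROM Student s
    JOIN Grade g ON s.ID = g.Student_Id
    GROUP BY s.ID, s.Name
    HAVING COUNT(DISTINCT g.Grade) = (SELECT COUNT(DISTINCT g2.Grade) FROM Grade g2)
""")]
print(students_with_every_grade)
```

Because the right-hand side is computed from the data, the query keeps working if a grade D shows up later.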
You can use HAVING COUNT(DISTINCT Grade) = 3 to check that the student has all 3 grades:
SELECT Name
FROM Student S
JOIN Grade G ON S.ID = G.Student_Id
GROUP BY Name
HAVING COUNT(DISTINCT Grade) = 3
Guessing at S.ID vs S.Student on the join. Not sure what the difference is there.
By using exists
select * from student s
where exists ( select 1
from grades g where g.Student_Id=s.ID
group by g.Student_Id
having count(distinct Grade)=3
)
Example
with Student as
(
select 1 as id,'John' as person
union all
select 2 as id,'Paul' as person
union all
select 3 as id,'jorge'
),
Grades as
(
select 1 as Graden, 1 as Student_Id, 'Math' as Course, 'A' as Grade
union all
select 2 as Graden, 1 as Student_Id, 'English' as Course, 'B' as Grade
union all
select 3 as Graden, 1 as Student_Id, 'Physics' as Course, 'C' as Grade
union all
select 4 as Graden, 2 as Student_Id, 'Math' as Course, 'A' as Grade
union all
select 5 as Graden, 2 as Student_Id, 'English' as Course, 'A' as Grade
union all
select 6 as Graden, 2 as Student_Id, 'Physics' as Course, 'B' as Grade
)
select * from Student s
where exists ( select 1
from Grades g where g.Student_Id=s.ID
group by g.Student_Id
having count(distinct Grade)=3
)
Note: I used having count(distinct Grade)=3 because your sample data contains 3 distinct grades.
Before delving into the answer, here's a working SQL Fiddle Example so you can see this in action.
As Gordon Linoff points out in his excellent answer, you should use GroupBy and Having Count(Distinct ... ) ... as an easy way to check.
However, I'd recommend changing your design to ensure that you have tables for each concern.
Currently your Grade table holds each student's grade per course. So it's more of a StudentCourse table (i.e. it's the combination of student and course that's unique / gives you that table's natural key). You should have an actual Grade table to give you the list of available grades; e.g.
create table Grade
(
Code char(1) not null constraint PK_Grade primary key clustered
)
insert Grade (Code) values ('A'),('B'),('C')
This allows your query to keep working if you later add grades D and E, without amending any code. It also means you only have to query a small table to get the complete list of grades, rather than a potentially huge one, which should perform better. Finally, it helps you maintain good data: because the validation lives in the database as a constraint, a typo cannot accidentally give a student grade X.
select Name from Student s
where s.Id in
(
select sc.StudentId
from StudentCourse sc
group by sc.StudentId
having count(distinct sc.Grade) = (select count(Code) from Grade)
)
order by s.Name
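Both benefits can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, so the types are simplified, the course is stored as plain text, and foreign key enforcement has to be switched on with a pragma; these are assumptions of the example, not part of the schema proposed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to
conn.executescript("""
    CREATE TABLE Grade (Code TEXT PRIMARY KEY);
    INSERT INTO Grade VALUES ('A'), ('B'), ('C');
    CREATE TABLE StudentCourse (
        StudentId INTEGER NOT NULL,
        Course    TEXT NOT NULL,
        Grade     TEXT REFERENCES Grade (Code),
        UNIQUE (StudentId, Course)
    );
    INSERT INTO StudentCourse VALUES
        (1, 'Math', 'A'), (1, 'English', 'B'), (1, 'Physics', 'C'), (2, 'Math', 'A');
""")

# A typo such as grade 'X' is rejected by the foreign key.
try:
    conn.execute("INSERT INTO StudentCourse VALUES (2, 'Art', 'X')")
    typo_rejected = False
except sqlite3.IntegrityError:
    typo_rejected = True

query = """
    SELECT sc.StudentId
    FROM StudentCourse sc
    GROUP BY sc.StudentId
    HAVING COUNT(DISTINCT sc.Grade) = (SELECT COUNT(Code) FROM Grade)
"""
before = [row[0] for row in conn.execute(query)]

# Adding a new grade changes the answer without touching the query.
conn.execute("INSERT INTO Grade VALUES ('D')")
after = [row[0] for row in conn.execute(query)]
print(typo_rejected, before, after)
```

Student 1 holds every grade while only A, B and C exist, and stops qualifying as soon as D is added to the lookup table.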
Likewise, it's sensible to create a Course table. In this case holding Ids for each course; since holding the full course name in your StudentCourse table (as we're now calling it) uses up a lot more space and again lacks validation / constraints. As such, I'd propose amending your database schema to look like this:
create table Grade
(
Code char(1) not null constraint PK_Grade primary key clustered
)
insert Grade (Code) values ('A'),('B'),('C')
create table Course
(
Id bigint not null identity(1,1) constraint PK_Course primary key clustered
, Name nvarchar(128) not null constraint UK_Course_Name unique
)
insert Course (Name) values ('Math'),('English'),('Physics'),('Economics'),('Art'),('Biology')
create table Student
(
Id bigint not null identity(1,1) constraint PK_Student primary key clustered
,Name nvarchar(128) not null constraint UK_Student_Name unique
)
set identity_insert Student on --inserting with IDs to ensure the ids of these students match data from your question
insert Student (Id, Name)
values (1, 'John')
, (2, 'Paul')
, (3, 'George')
, (4, 'Mike')
, (5, 'Lisa')
set identity_insert Student off
create table StudentCourse
(
Id bigint not null identity(1,1) constraint PK_StudentCourse primary key
, StudentId bigint not null constraint FK_StudentCourse_StudentId foreign key references Student(Id)
, CourseId bigint not null constraint FK_StudentCourse_CourseId foreign key references Course(Id)
, Grade char /* allow null in case we use this table for pre-results; otherwise make non-null */ constraint FK_StudentCourse_Grade foreign key references Grade(Code)
, Constraint UK_StudentCourse_StudentAndCourse unique clustered (StudentId, CourseId)
)
insert StudentCourse (StudentId, CourseId, Grade)
select s.Id, c.Id, x.Grade
from (values
('John', 'Math', 'A')
,('John', 'English', 'B')
,('John', 'Physics', 'C')
,('Paul', 'Math', 'A')
,('Paul', 'English', 'A')
,('Paul', 'Physics', 'B')
,('George', 'Economics','A')
,('Mike', 'Art', 'C')
,('Lisa', 'Biology', 'A')
) x(Student, Course, Grade)
inner join Student s on s.Name = x.Student
inner join Course c on c.Name = x.Course

Select all entries from one table which has two specific entries in another table

So, I have 2 tables defined like this:
CREATE TABLE tblPersons (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
CREATE TABLE tblHobbies (
person_id INTEGER REFERENCES tblPersons (id),
hobby TEXT
);
And for example I have 3 persons added to tblPersons:
1 | John
2 | Bob
3 | Eve
And next hobbies in tblHobbies:
1 | skiing
1 | surfing
1 | hiking
1 | gunsmithing
1 | driving
2 | table tennis
2 | driving
2 | hiking
3 | reading
3 | scuba diving
What I need is a query which will return a list of persons who have several specific hobbies.
The only thing I could come up with is this:
SELECT id, name FROM tblPersons
INNER JOIN tblHobbies as hobby1 ON hobby1.hobby = 'driving'
INNER JOIN tblHobbies as hobby2 ON hobby2.hobby = 'hiking'
WHERE tblPersons.id = hobby1.person_id and tblPersons.id = hobby2.person_id;
But it is rather slow. Isn't there any better solution?
First, you don't have a primary key on tblHobbies; this is one cause of the slow query (and of other problems). You should also consider creating an index on tblHobbies.hobby.
Second, I'd advise you to create a third table to express the N:N cardinality that exists in your model and avoid redundant hobbies. Something like:
--Person
CREATE TABLE tblPersons (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
--Hobby
CREATE TABLE tblHobbies (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hobby TEXT
);
--Associative table between Person and Hobby
CREATE TABLE tblPersonsHobbies (
person_id INTEGER REFERENCES tblPersons (id),
hobby_id INTEGER REFERENCES tblHobbies (id),
PRIMARY KEY (person_id, hobby_id)
);
Adds an extra table but it's worth it.
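One thing the extra table buys is that the composite primary key stops the same person/hobby pair from being stored twice. A small sketch with Python's built-in sqlite3 module, using the three tables above and one made-up row in each:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPersons (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    CREATE TABLE tblHobbies (id INTEGER PRIMARY KEY AUTOINCREMENT, hobby TEXT);
    CREATE TABLE tblPersonsHobbies (
        person_id INTEGER REFERENCES tblPersons (id),
        hobby_id  INTEGER REFERENCES tblHobbies (id),
        PRIMARY KEY (person_id, hobby_id)
    );
    INSERT INTO tblPersons (name) VALUES ('John');
    INSERT INTO tblHobbies (hobby) VALUES ('hiking');
    INSERT INTO tblPersonsHobbies VALUES (1, 1);
""")

# Linking John to hiking a second time violates the composite primary key.
try:
    conn.execute("INSERT INTO tblPersonsHobbies VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```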
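--Query on your current model (persons who have both hobbies)
SELECT tblPersons.id, tblPersons.name FROM tblPersons
INNER JOIN tblHobbies as hobby1 ON tblPersons.id = hobby1.person_id
WHERE hobby1.hobby IN ('driving', 'hiking')
GROUP BY tblPersons.id, tblPersons.name
HAVING COUNT(DISTINCT hobby1.hobby) = 2;
--Query on suggested model
SELECT tblPersons.id, tblPersons.name FROM tblPersons
INNER JOIN tblPersonsHobbies as personsHobby ON tblPersons.id = personsHobby.person_id
INNER JOIN tblHobbies as hobby1 ON hobby1.id = personsHobby.hobby_id
WHERE hobby1.hobby IN ('driving', 'hiking')
GROUP BY tblPersons.id, tblPersons.name
HAVING COUNT(DISTINCT hobby1.hobby) = 2;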
You can aggregate the hobbies table to get persons with both hobbies:
select person_id
from tblhobbies
group by person_id
having count(case when hobby = 'driving' then 1 end) > 0
and count(case when hobby = 'hiking' then 1 end) > 0
Or better with a WHERE clause restricting the records to read:
select person_id
from tblhobbies
where hobby in ('driving', 'hiking')
group by person_id
having count(distinct hobby) =2
(There should be a unique constraint on person + hobby in the table, though. Then you could remove the DISTINCT. And as I said in the comments section it should even be person_id + hobby_id with a separate hobbies table. EDIT: Oops, I should have read the other answer. Michal suggested this data model three hours ago already :-)
If you want the names, select from the persons table where you find the IDs in above query:
select id, name
from tblpersons
where id in
(
select person_id
from tblhobbies
where hobby in ('driving', 'hiking')
group by person_id
having count(distinct hobby) =2
);
With the better data model you'd replace
from tblhobbies
where hobby in ('driving', 'hiking')
group by person_id
having count(distinct hobby) =2
with
from tblPersonsHobbies
where hobby_id in (select id from tblhobbies where hobby in ('driving', 'hiking'))
group by person_id
having count(*) =2
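Put together, the query against the better data model can be tried out as follows. This is a sketch using Python's built-in sqlite3 module and the sample data from the question; the junction table name tblPersonsHobbies and the numeric hobby ids are assumptions based on the schema suggested in the other answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPersons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblHobbies (id INTEGER PRIMARY KEY, hobby TEXT);
    CREATE TABLE tblPersonsHobbies (
        person_id INTEGER REFERENCES tblPersons (id),
        hobby_id  INTEGER REFERENCES tblHobbies (id),
        PRIMARY KEY (person_id, hobby_id)
    );
    INSERT INTO tblPersons VALUES (1, 'John'), (2, 'Bob'), (3, 'Eve');
    INSERT INTO tblHobbies VALUES
        (1, 'skiing'), (2, 'surfing'), (3, 'hiking'), (4, 'gunsmithing'),
        (5, 'driving'), (6, 'table tennis'), (7, 'reading'), (8, 'scuba diving');
    INSERT INTO tblPersonsHobbies VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
        (2, 6), (2, 5), (2, 3),
        (3, 7), (3, 8);
""")

# Keep the persons whose rows for the two wanted hobbies number exactly two.
# COUNT(*) is enough because the primary key rules out duplicate pairs.
both_hobbies = conn.execute("""
    SELECT id, name
    FROM tblPersons
    WHERE id IN (
        SELECT person_id
        FROM tblPersonsHobbies
        WHERE hobby_id IN (SELECT id FROM tblHobbies WHERE hobby IN ('driving', 'hiking'))
        GROUP BY person_id
        HAVING COUNT(*) = 2
    )
    ORDER BY id
""").fetchall()
print(both_hobbies)
```

John and Bob both drive and hike in the sample data, so both are returned; Eve is not.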

Check for uniqueness within update statement

I have a table that links a class to the students that are in that class:
create table class_student
(class_id int,
student_id int,
constraint class_student_u unique nonclustered (class_id, student_id))
If I want to transfer all the classes from one student to another (remove one student from all the classes he/she is enrolled in and add another student to each of the classes the old student was enrolled in), I use the following query:
update class_student
set student_id = @newStudent
where student_id = @oldStudent
and class_id not in (select class_id
from class_student
where student_id = @newStudent)
delete from class_student
where student_id = @oldStudent
How can I transfer the classes from more than one student to the new student? I can't just put where student_id in (@oldStudent1, @oldStudent2) because if both old students are in the same class, after running the above query there will be a violation of the unique constraint. Also, I'd like to do the update in as few queries as possible (I could just run the above queries twice, but I'd like to do it in fewer).
I'm using SQL Server 2008 R2.
Edit: To clarify, here's an example:
class_id student_id
===================
1 1
1 2
2 3
3 1
3 3
4 2
4 3
This means that student 1 is in class 1 and 3, student 2 is in class 1 and 4, and student 3 is in class 2, 3, and 4. If I want to transfer all the classes from student 1 to student 3, I would run the following query:
update class_student
set student_id = 3
where student_id = 1
and class_id not in (select class_id
from class_student
where student_id = 3)
delete from class_student
where student_id = 1
Our data would look like this:
class_id student_id
===================
1 3
1 2
2 3
3 3
4 2
4 3
If, instead, I had run this query:
update class_student
set student_id = 3
where student_id in (1, 2)
and class_id not in (select class_id
from class_student
where student_id = 3)
delete from class_student
where student_id in (1, 2)
Ignoring the unique constraint on the table, the data would look like this:
class_id student_id
===================
1 3
1 3
2 3
3 3
4 3
The double (1, 3) record is what I'm trying to avoid, because it will cause a unique constraint violation in the table.
When setting up the original table you should always include a unique row id with which to reference any specific row (please see below the 'identity' column called row_id):
DROP TABLE class_student
create table class_student
(
row_id int identity(1,1),
class_id int,
student_id int,
constraint class_student_u unique nonclustered (class_id, student_id)
)
insert class_student (class_id,student_id) values (1,1)
insert class_student (class_id,student_id) values (1,2)
insert class_student (class_id,student_id) values (2,3)
insert class_student (class_id,student_id) values (3,1)
insert class_student (class_id,student_id) values (3,3)
insert class_student (class_id,student_id) values (4,2)
insert class_student (class_id,student_id) values (4,3)
In a situation where students 1 and 2 are leaving and you are passing any classes they were taking to student 3 (unless student 3 is already attending those classes), the code could look something like this:
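WITH CTE
AS
(
-- ordering by student_id makes the numbering deterministic: within a class, students 1 and 2 are numbered before student 3
SELECT row_Id,class_id,student_id,RN = ROW_NUMBER()OVER(PARTITION BY
class_id ORDER BY student_id) FROM class_student WHERE student_id in (1,2,3)
)
DELETE FROM class_student where class_id in (select class_id from
class_student group by class_id having count(class_id) > 1) and student_id
<> 3 and row_id not in (select row_id from cte where student_id <> 3 and
rn >= 2)
-- reassign only the rows still held by the departing students
Update class_student set student_id = 3 where student_id in (1,2)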
I am using a 'common table expression' with ROW_NUMBER to number the rows within each class_id. To see this you can run the code below after creating the class_student table and inserting the data (see top) but before you run the CTE code above:
WITH CTE
AS
(
SELECT row_Id,class_id,student_id,RN = ROW_NUMBER()OVER(PARTITION BY
class_id ORDER BY student_id) FROM class_student WHERE student_id in (1,2,3)
)
SELECT * FROM CTE
Because class_id 1,3 and 4 are duplicated, they have a value of 2 in the RN (Row Number) column.
I'm using this result in the CTE to delete the rows we don't need from the class_student table and this is where the importance of always having a unique row_id can be seen.
The Delete query deletes rows in the class_student table which are Class ID duplicates. In the case of a class attended by both student 3 and one or both of the other students, it takes the rows where the Student ID is not 3 (because Student 3 is not leaving).
To do this successfully (without taking rows that we want to retain to be assigned to student 3), it requires (by comparing row_ids) that rows where RN = 2 (i.e. class_id is duplicated) and student_id is not 3 are retained, so that we keep one of the rows for classes that both student 1 and 2 were doing but student 3 was not (i.e. where neither student_id was 3).
Finally, update all remaining rows in the table to a student_id of 3 so that Student 3 gets all the courses.
To see the result you can run:
select * from class_student
I think you'll need at least 2 DML statements to accomplish your goal. And if you really need it to happen in one go, then you can wrap the statements in a stored procedure.
insert into class_student (class_id, student_id)
select distinct class_id, @newStudent
from class_student
where student_id in (@oldStudent1, @oldStudent2)
and class_id not in (select class_id
from class_student
where student_id = @newStudent);
delete from class_student
where student_id in (@oldStudent1, @oldStudent2);
EDIT: Fixed insert to include the "not in" clause.
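To see the two statements at work on the sample data from the question, here is a sketch using Python's built-in sqlite3 module instead of SQL Server. The named parameters (:new, :old1, :old2) stand in for the T-SQL variables, and wrapping both statements in one transaction is an addition of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class_student (
        class_id INTEGER,
        student_id INTEGER,
        UNIQUE (class_id, student_id)
    );
    INSERT INTO class_student VALUES (1,1),(1,2),(2,3),(3,1),(3,3),(4,2),(4,3);
""")

params = {"new": 3, "old1": 1, "old2": 2}

with conn:  # commit both statements together, or roll both back on error
    # DISTINCT collapses a class shared by both old students into a single new row.
    conn.execute("""
        INSERT INTO class_student (class_id, student_id)
        SELECT DISTINCT class_id, :new
        FROM class_student
        WHERE student_id IN (:old1, :old2)
          AND class_id NOT IN (SELECT class_id FROM class_student WHERE student_id = :new)
    """, params)
    conn.execute(
        "DELETE FROM class_student WHERE student_id IN (:old1, :old2)",
        {"old1": params["old1"], "old2": params["old2"]},
    )

result = conn.execute(
    "SELECT class_id, student_id FROM class_student ORDER BY class_id, student_id"
).fetchall()
print(result)
```

Class 1, which both old students attended, ends up with a single row for student 3, so the unique constraint is never violated.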

SQL Server table - Update Order by

I have a SQL Server table with fields: id, city, country. I imported this table from an Excel file and everything was imported successfully, but the id field is not ordered by number. The tool I used imported the rows in some random order.
What kind of Update command should I use from SQL Server Management Studio Express to re-order my ids?
Do you have a primary key and a clustered index on your table? If not, id is a good candidate for a primary key, and when you create the primary key it will by default also become the clustered index.
Assuming this is your table
create table CityCountry(id int, city varchar(10), country varchar(10))
And you add data like this.
insert into CityCountry values (2, '2', '')
insert into CityCountry values (1, '1', '')
insert into CityCountry values (4, '4', '')
insert into CityCountry values (3, '3', '')
The output of select * from CityCountry will be
id city country
----------- ---------- ----------
2 2
1 1
4 4
3 3
A column that is primary key can not accept null values so first you have to do
alter table CityCountry alter column id int not null
Then you can add the primary key
alter table CityCountry add primary key (id)
When you do select * from CityCountry now, the rows typically come back in clustered index order (only an order by guarantees it):
id city country
----------- ---------- ----------
1 1
2 2
3 3
4 4
Just use the order by part of the select statement to order them.
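A minimal sketch of that suggestion, run through Python's built-in sqlite3 module with the CityCountry sample rows from the answer above: the stored order of the rows is left alone and the sorting happens at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CityCountry (id INTEGER, city TEXT, country TEXT);
    INSERT INTO CityCountry VALUES (2, '2', ''), (1, '1', ''), (4, '4', ''), (3, '3', '');
""")

# The rows were inserted out of order; ORDER BY sorts them when they are read.
ordered_ids = [row[0] for row in conn.execute("SELECT id FROM CityCountry ORDER BY id")]
print(ordered_ids)
```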
If I understood you correctly, you want all the ids to have consecutive numbers 1,2,3,4...
Imagine your table contents are:
select *
from yourTable
id city country
----------- ---------- ----------
1 Madrid Spain
3 Lisbon Portugal
7 Moscow Russia
10 Brasilia Brazil
(4 row(s) affected)
To reorder the ids, just run this:
declare @counter int = 0
update yourTable
set @counter = id = @counter + 1
(4 row(s) affected)
You can now check, that indeed all the ids are reordered:
select *
from yourTable
id city country
----------- ---------- ----------
1 Madrid Spain
2 Lisbon Portugal
3 Moscow Russia
4 Brasilia Brazil
(4 row(s) affected)
However, you need to be careful with this. If some table has a foreign key to this id column, then you first need to disable that FK, update this table, update the values in the other tables that have FKs pointing to yourTable, and finally enable the FKs again.
First, I think you may have some misconceptions about the purpose of the Id column. The Id column is probably a surrogate key; i.e. an arbitrary value that is unique and non-null that is never shown to the user. Thus, it should not be assumed to have any inherent meaning or sequence. In fact, you should always have another column or columns that are marked as being unique to represent a "business key" or a set of values that are unique to the user. In your case, city, country should probably be unique (although you will likely need to add province or state as it is common to have the same city exist in the same country multiple times.)
Now, that said, it is possible to re-sequence your Ids if the following are true:
The Id column is not an identity column. Since this was from an import, I'm going to guess this is true.
There does not exist a relationship to the table where Cascade Update is not enabled.
You are using SQL Express 2005 or later:
Update T2
Set Id = T1.NewId
From (
Select Id
, Row_Number() Over ( Order By Id ) As NewId
From MyTable
) As T1
Join MyTable As T2
On T2.Id = T1.Id
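The re-sequencing can be tried on a throwaway copy with Python's built-in sqlite3 module. This is a sketch rather than a translation: SQLite is used in place of SQL Server, the old-to-new mapping is materialised in a temporary table (id_map, a name made up for this example) before the update, the sample rows are borrowed from the other answer, and window function support (SQLite 3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (Id INTEGER, city TEXT, country TEXT);
    INSERT INTO MyTable VALUES
        (10, 'Brasilia', 'Brazil'),
        (3,  'Lisbon',   'Portugal'),
        (1,  'Madrid',   'Spain'),
        (7,  'Moscow',   'Russia');

    -- Work out the new consecutive numbers first, keyed by the old Id.
    CREATE TEMP TABLE id_map AS
        SELECT Id AS old_id, ROW_NUMBER() OVER (ORDER BY Id) AS new_id
        FROM MyTable;

    -- Then replace each Id with its new number.
    UPDATE MyTable
    SET Id = (SELECT new_id FROM id_map WHERE old_id = MyTable.Id);
""")

renumbered = conn.execute("SELECT Id, city FROM MyTable ORDER BY Id").fetchall()
print(renumbered)
```

The relative order of the rows is preserved; only the gaps between the ids disappear.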