Retrieving duplicate and original rows from a table using sql query - sql

Say I have a student table with the following fields: student id, student name, age, gender, marks, class. Assume that, due to some error, there are multiple entries for each student. I need to identify the duplicate rows in the table, where the filter criterion is the student name and the class. But in the query result, in addition to identifying the duplicate records, I also need to find the original student record that was duplicated. Is there any way to do this? I went through this answer: SQL: How to find duplicates based on two fields?. But it only shows how to find the duplicate rows, not how to identify the actual row that was duplicated. Kindly shed some light on a possible solution. Thanks.

First of all: if the columns you've listed are all in the same table, it looks like your database structure could use some normalization.
In terms of your question: I'm assuming your StudentID field is a database-generated primary key and so has not been duplicated. (If this is not the case, I think you have bigger problems than just duplicates.)
I'm also assuming the duplicate row has a higher value for StudentID than the original row.
I think the following should work. (Note: I haven't created a table to verify this, so it might not be perfect straight away. If it isn't, it should be fairly close.)
select dup.StudentID as DuplicateStudentID,
       dup.StudentName, dup.Age, dup.Gender, dup.Marks, dup.Class,
       orig.StudentID as OriginalStudentID
from StudentTable dup
inner join (
    -- Find the first student record for each unique combination
    select Min(StudentID) as StudentID, StudentName, Age, Gender, Marks, Class
    from StudentTable t
    group by StudentName, Age, Gender, Marks, Class
) orig on dup.StudentName = orig.StudentName
    and dup.Age = orig.Age
    and dup.Gender = orig.Gender
    and dup.Marks = orig.Marks
    and dup.Class = orig.Class
    and dup.StudentID > orig.StudentID -- Don't identify the original record as a duplicate
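
For anyone who wants to try the idea without setting up a server, here is a minimal sketch that runs the same kind of self-join against an in-memory SQLite database from Python. The table name `student`, the column names, and the sample rows are all made up for illustration. It matches on name and class only (the criterion from the question) and, like the answer above, assumes the row with the lowest id is the original.

```python
import sqlite3


def make_sample_db():
    """Build a small in-memory table with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student ("
        "student_id INTEGER PRIMARY KEY, student_name TEXT, class TEXT)"
    )
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [
            (1, "Ann", "10A"),
            (2, "Bob", "10A"),
            (3, "Ann", "10A"),  # duplicate of row 1
            (4, "Ann", "10B"),  # same name, different class: not a duplicate
            (5, "Bob", "10A"),  # duplicate of row 2
        ],
    )
    return conn


def find_duplicates(conn):
    """Return (duplicate_id, original_id, name, class) for each duplicated row."""
    sql = """
        SELECT dup.student_id, orig.student_id, dup.student_name, dup.class
        FROM student AS dup
        JOIN (
            -- Lowest id per (name, class) is assumed to be the original
            SELECT MIN(student_id) AS student_id, student_name, class
            FROM student
            GROUP BY student_name, class
        ) AS orig
          ON dup.student_name = orig.student_name
         AND dup.class = orig.class
         AND dup.student_id > orig.student_id
        ORDER BY dup.student_id
    """
    return conn.execute(sql).fetchall()
```

Calling `find_duplicates(make_sample_db())` pairs each duplicate id with the id of the row it duplicates, so both can be shown side by side.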

Related

SQL Query to return a table of specific matching values based on a criteria

I have 3 tables in PostgreSQL database:
person (id, first_name, last_name, age)
interest (id, title, person_id REFERENCES person)
location (id, city, state text NOT NULL, country, person_id REFERENCES person)
city can be null, but state and country cannot.
A person can have many interests but only one location. My challenge is to return a table of people who share the same interest and location.
All ID's are serialized and thus created automatically.
Let's say I have 4 people living in "TX". They each have two interests, BUT only persons 1 and 3 share a similar interest, let's say "Guns" (because it's Texas after all). I need to select all people from the person table where the person's interest title equals that of another person's interest title AND the city or state is also equal. (I compare titles because the id is auto-generated, so two "Guns" interests would have two different ID keys.)
I was looking at the answer to this question here Select Rows with matching columns from SQL Server and I feel like the logic is sort of similar to my question, the difference is he has two tables, to join together where I have three.
return a table of people who share the same interest and location.
I'll interpret this as "all rows from table person where another row exists that shares at least one matching row in interest and a matching row in location. No particular order."
A simple solution with a window function in a subquery:
SELECT p.*
FROM (
SELECT person_id AS id, i.title, l.city, l.state, l.country
, count(*) OVER (PARTITION BY i.title, l.city, l.state, l.country) AS ct
FROM interest i
JOIN location l USING (person_id)
) x
JOIN person p USING (id)
WHERE x.ct > 1;
This treats NULL values as "equal". (You did not specify clearly.)
Depending on undisclosed cardinalities, there may be faster query styles. (Like reducing to duplicative interests and / or locations first.)
Asides 1:
It's almost always better to have a column birthday (or year_of_birth) than age, which starts to bit-rot immediately.
Asides 2:
A person can have [...] only one location.
You might at least add a UNIQUE constraint on location.person_id to enforce that. (If you cannot make it the PK or just append location columns to the person table.)
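
To see the window-function approach behave on data like the question's, here is a rough sketch using an in-memory SQLite database from Python instead of PostgreSQL (it needs SQLite 3.25 or later for window functions). The sample people and interests are invented, and `DISTINCT` is my own addition so that a person who shares several interests is listed once. Persons 1 and 3 both have a NULL city, which shows that `PARTITION BY` puts NULLs in the same group.

```python
import sqlite3


def make_sample_db():
    """Hypothetical data: four people in TX, only 1 and 3 share an interest."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT);
        CREATE TABLE interest (id INTEGER PRIMARY KEY, title TEXT, person_id INTEGER);
        CREATE TABLE location (id INTEGER PRIMARY KEY, city TEXT,
                               state TEXT NOT NULL, country TEXT NOT NULL,
                               person_id INTEGER UNIQUE);
        INSERT INTO person VALUES (1, 'Al'), (2, 'Bea'), (3, 'Cy'), (4, 'Di');
        INSERT INTO location (city, state, country, person_id) VALUES
            (NULL, 'TX', 'US', 1), ('Austin', 'TX', 'US', 2),
            (NULL, 'TX', 'US', 3), ('Austin', 'TX', 'US', 4);
        INSERT INTO interest (title, person_id) VALUES
            ('Guns', 1), ('Chess', 1), ('Golf', 2), ('Poetry', 2),
            ('Guns', 3), ('Opera', 3), ('Hiking', 4), ('Baking', 4);
        """
    )
    return conn


def people_sharing_interest_and_location(conn):
    """Return (id, first_name) of people who share a title and a location."""
    sql = """
        SELECT DISTINCT p.id, p.first_name
        FROM (
            SELECT i.person_id AS id,
                   COUNT(*) OVER (
                       PARTITION BY i.title, l.city, l.state, l.country
                   ) AS ct
            FROM interest AS i
            JOIN location AS l ON l.person_id = i.person_id
        ) AS x
        JOIN person AS p ON p.id = x.id
        WHERE x.ct > 1
        ORDER BY p.id
    """
    return conn.execute(sql).fetchall()
```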

How to combine multiple row values in Oracle SQL in a single table

I've been having trouble combining multiple row values into a single column for each semester.
The table (GENERALEDPATHWAY) has these columns:
*STUDENTID
*SEMESTER
*CLASS
*CLASS_COMBINATION (all values currently null)
*YEAR
*CLASS_GRADE
*ENTRYPOINT
*DEGREE
*CLASS_DISTRIBUTION
*DEGREE
*GRADUATED_IN
I'm only currently worried about the STUDENTID, SEMESTER, CLASS, and Class_Combination. Every student has a unique ID and may have a different combination of classes each semester. Instead of having a separate row for every class every semester, I want to put the class values into the CLASS_COMBINATION column. EX: instead of having 5 rows for 5 classes taken in a single semester, I just want 1 row for that semester with all classes listed alphabetically separated by commas in the CLASS_COMBINATION column.
The difficulty I'm having is that all of the information is in a single table and needs to work in Oracle SQL Developer.
Try this query:
Select studentid, semester,
       Listagg(class, ', ') within group (order by class) as CLASS_COMBINATION
From GENERALEDPATHWAY
Group by studentid, semester;
If you want to update the table, use this query in an UPDATE or MERGE statement to update each record, and then remove the duplicate rows.
Cheers!!
What you describe is an aggregation query, with one row per student and semester.
You do not need a separate table for this. My recommendation is either a simple query or a view.
Your data structure describes STUDENTID as the primary key. However, you specify that you have multiple rows for a STUDENTID, so something is wrong -- either your explanation or the data model. From what you describe, you want:
select STUDENTID, semester,
       listagg(class, ', ') within group (order by class) as classes
from GENERALEDPATHWAY
group by STUDENTID, semester;
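
If it helps to see what that aggregation computes, here is a small plain-Python sketch of the same grouping. It is not Oracle code and does not call LISTAGG; it only mimics the result of grouping by student and semester and joining the sorted class names with ", ". The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def combine_classes(rows):
    """Group (student_id, semester, class_name) rows into one string per group.

    Mirrors GROUP BY student_id, semester with the class names sorted
    alphabetically and joined by ', '.
    """
    grouped = defaultdict(list)
    for student_id, semester, class_name in rows:
        grouped[(student_id, semester)].append(class_name)
    return {key: ", ".join(sorted(names)) for key, names in grouped.items()}


# Hypothetical sample rows: one row per class per semester
sample_rows = [
    (1, "Fall", "MATH101"),
    (1, "Fall", "ART100"),
    (1, "Spring", "BIO110"),
    (2, "Fall", "ENG102"),
]
combined = combine_classes(sample_rows)
```

Here `combined[(1, "Fall")]` holds both fall classes of student 1 in a single alphabetically ordered string, which is what one output row of the query would contain.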

SQL - Append counter to recurring value in query output

I am in the process of creating an organizational chart for my company, and to create the chart, the data must have a unique role identifier and a unique 'reports to role' identifier for each line. Unfortunately my data is not playing ball, and it is out of my scope to change the source.
I have two source tables, simplified in the image below. It is important to note a couple of things in the data.
An employee's manager in the query needs to come from the [EmpData] table. The 'ReportsTo' field is only in the [Role] table to be used when a role is vacant.
Any number of employees can hold the same role, but for simplicity let's assume that there will only ever be one person in the 'Reports to' role.
Using this sample data, my query is as follows:
-- Join the Role table with the employee data table.
-- Right join so roles with more than one employee will generate a row each.
SELECT [Role].RoleId As PositionId
,[EmpData].ReportsToRole As ReportsToPosition
,[Role].RoleTitle
,[Empdata].EmployeeName
FROM [Role]
RIGHT JOIN [EmpData] ON [Role].RoleId=[EmpData].[Role]
UNION
-- Output all roles that do not have a holder, with 'VACANT' as the employee name.
SELECT [Role].RoleId
,[Role].ReportsToRole
,[Role].RoleTitle
,'VACANT'
FROM [Role]
WHERE [Role].RoleID NOT IN (SELECT RoleID from [empdata])
This almost creates the intended output, but each operator role has 'OPER' in the PositionId column.
For the charting software to work, each position must have a unique identifier.
Any thoughts on how to achieve this outcome? I'm specifically chasing the appended -01, -02, -03 etc. highlighted yellow in the Desired Query Output.
If you are using T-SQL, you should look into using the ROW_NUMBER function with a PARTITION BY clause and combining the resulting number with your existing column.
Specifically, you would add a column to your select: ROW_NUMBER() OVER (PARTITION BY PositionID ORDER BY ReportsToPosition, EmployeeName) AS SeqNum
I would add that to your first query, and then, in your second, I would do something like SELECT PositionID + CASE SeqNum WHEN 1 THEN '' ELSE '-' + CAST(SeqNum AS VarChar(100)) END, ...
There are multiple ways to do this, but this one leaves the first row of each position without a suffix and appends one only to the rest. The major differences from your scheme are that it doesn't zero-pad the number on the left, which is easy to add, and that the first "OPER" would not be "OPER-1" but simply "OPER"; this can also be worked around.
Hopefully this gets you what you need!
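
Here is a minimal sketch of the numbering idea, run against an in-memory SQLite database from Python rather than SQL Server (SQLite 3.25 or later is needed for window functions). The table `emp`, its columns, and the rows are invented for illustration, and SQLite's `printf` stands in for T-SQL string concatenation. In this variant every holder of a shared role gets a zero-padded suffix, and roles with a single holder are left alone.

```python
import sqlite3


def make_sample_db():
    """Hypothetical data: one manager and three operators sharing 'OPER'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (role_id TEXT, employee_name TEXT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("MGR", "Mia"), ("OPER", "Zed"), ("OPER", "Amy"), ("OPER", "Lou")],
    )
    return conn


def numbered_positions(conn):
    """Return (position_id, employee_name) with -01, -02, ... on shared roles."""
    sql = """
        SELECT CASE
                 -- Only roles held by more than one employee get a suffix
                 WHEN COUNT(*) OVER (PARTITION BY role_id) > 1
                 THEN printf('%s-%02d', role_id,
                             ROW_NUMBER() OVER (PARTITION BY role_id
                                                ORDER BY employee_name))
                 ELSE role_id
               END AS position_id,
               employee_name
        FROM emp
        ORDER BY position_id
    """
    return conn.execute(sql).fetchall()
```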

DISTINCT in a simple SQL query

When executing SQL queries I have been trying to figure out the following:
In this example:
SELECT DISTINCT AL.id, AL.name
FROM albums AL
Why is there a need to specify DISTINCT? I thought that the id being a primary key was enough to avoid duplicate results.
When you specify distinct you are specifying that you want the whole row to be distinct. For example if you have two rows:
ID=1 and Name='Joe Smith'
ID=2 and Name='Joe Smith'
then your query is going to return both rows because the different ID values make the rows distinct.
However, if you are selecting only the ID column (and it's your primary key) then the distinct is pointless.
If you're trying to find all of the unique names then you'd want to:
SELECT DISTINCT AL.name
FROM albums AL
You are right: in your case there should be no need for the word distinct, because you are asking for the id and the name. Now, for the sake of an example where distinct is necessary, say you had multiple ids with the same name. Let It Be is an album by both the Beatles and the Replacements. And let's say you were using your database to write out labels that only included the names of the albums. The query you would want would be:
select distinct al.name
from albums al;
Sometimes your database is not perfect and it ends up with a bunch of junk data. If the id has not been designated as unique, you might end up with duplicate records, and then you might want to avoid seeing the duplicates in your query results.
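
A quick way to convince yourself of this is to run the variants side by side. The sketch below uses an in-memory SQLite database from Python; the `artist` column and the sample rows are my own additions for illustration.

```python
import sqlite3


def distinct_demo():
    """Return the results of three queries over a tiny albums table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE albums (id INTEGER PRIMARY KEY, name TEXT, artist TEXT)"
    )
    conn.executemany(
        "INSERT INTO albums VALUES (?, ?, ?)",
        [
            (1, "Let It Be", "The Beatles"),
            (2, "Let It Be", "The Replacements"),
            (3, "Abbey Road", "The Beatles"),
        ],
    )
    # With the primary key in the select list, DISTINCT changes nothing
    plain = conn.execute("SELECT id, name FROM albums ORDER BY id").fetchall()
    with_id = conn.execute(
        "SELECT DISTINCT id, name FROM albums ORDER BY id"
    ).fetchall()
    # Without the key, DISTINCT collapses the two 'Let It Be' rows into one
    names = conn.execute(
        "SELECT DISTINCT name FROM albums ORDER BY name"
    ).fetchall()
    return plain, with_id, names
```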

Extract info from one table based on data from another

I am kind of new to SQL and I made a couple of tables to practice. The columns may have some unrelated categories, but I don't know what else to write...
Anyway, basically what I want to do is get info from two tables based on the first and last name from one table.
Here are my tables:
Order
Host
I want to create a query to pull the ticket number, height, order, subtotal and total by first and last name. The only orders I want to pull are from John Smith and Sam Ting. So in the end, I want my extraction to have the following columns:
Ticket Number
First Name
Last Name
Height
Order
Subtotal
Total
Any help or direction would be awesome!
This assumes that both tables have unique Ticket_Numbers, which provides a one-to-one mapping between them.
SELECT
    "Order".Ticket_Number,
    First_Name,
    Last_Name,
    Height,
    "Order"."Order", -- ORDER is a reserved word, so the table and column names are quoted
    Subtotal,
    Total
FROM "Order"
JOIN Host ON Host.Ticket_Number = "Order".Ticket_Number
WHERE
    (First_Name = 'John' AND Last_Name = 'Smith')
    OR (First_Name = 'Sam' AND Last_Name = 'Ting')
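
Since the original table screenshots are not available here, the following is only a sketch under assumptions: it guesses that the name, height and order columns live in the Order table, that Subtotal and Total live in Host, and that Ticket_Number links the two. It uses an in-memory SQLite database from Python with invented rows, and passes the names as parameters instead of embedding them in the SQL text.

```python
import sqlite3


def make_sample_db():
    """Hypothetical layout; the real column split between tables is unknown."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE "Order" (Ticket_Number INTEGER PRIMARY KEY,
                              First_Name TEXT, Last_Name TEXT,
                              Height INTEGER, "Order" TEXT);
        CREATE TABLE Host (Ticket_Number INTEGER PRIMARY KEY,
                           Subtotal REAL, Total REAL);
        INSERT INTO "Order" VALUES
            (1, 'John', 'Smith', 180, 'Pizza'),
            (2, 'Sam', 'Ting', 170, 'Salad'),
            (3, 'Jane', 'Doe', 165, 'Soup'),
            (4, 'John', 'Doe', 175, 'Pasta');
        INSERT INTO Host VALUES
            (1, 10.0, 11.0), (2, 8.0, 8.8), (3, 6.0, 6.6), (4, 9.0, 9.9);
        """
    )
    return conn


def orders_for_two_people(conn, first, second):
    """first and second are (first_name, last_name) pairs to keep."""
    sql = """
        SELECT o.Ticket_Number, o.First_Name, o.Last_Name, o.Height,
               o."Order", h.Subtotal, h.Total
        FROM "Order" AS o
        JOIN Host AS h ON h.Ticket_Number = o.Ticket_Number
        WHERE (o.First_Name = ? AND o.Last_Name = ?)
           OR (o.First_Name = ? AND o.Last_Name = ?)
        ORDER BY o.Ticket_Number
    """
    return conn.execute(sql, (*first, *second)).fetchall()
```

Pairing each first name with its last name inside parentheses matters: without them, a row such as John Doe could slip through depending on how the AND and OR conditions are combined.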
You need to qualify each column with its table name first. After that you need to join the two tables, and finally you need the WHERE clause. I didn't check the details, so you need to verify the names.
SELECT "Order".Ticket_Number, "Order".First_Name, "Order".Last_Name, "Order".Height, "Order"."Order", Cost.Subtotal, Cost.Total
FROM "Order"
INNER JOIN Cost ON Cost.Ticket_Number = "Order".Ticket_Number -- assumes Ticket_Number is the shared key
WHERE ("Order".First_Name = 'John' AND "Order".Last_Name = 'blablabla')
   OR ("Order".First_Name = 'SecondGuy' AND "Order".Last_Name = 'blablabla')