Oracle SQL to get Unique Records - sql

Does anyone know the sql to pull 4 rows from the following table which contains 8 rows?
Just want one row for each arbitrary person.
The real data will have thousands of records, so the query must be generic and use only the IDs, not the names.
table

You seem to have a symmetric relationship. So, you can do:
select t.*
from t
where t.id < t.pid;
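A minimal sketch of the `id < pid` filter, run against SQLite from Python as a stand-in for Oracle. The two-column layout and the sample pairs are assumptions, since the original table was not shown; the point is only that each symmetric pair survives exactly once.

```python
import sqlite3

# Hypothetical data: every relationship is stored twice, once in each direction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, pid INTEGER)")
pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]
conn.executemany("INSERT INTO t VALUES (?, ?)", pairs + [(b, a) for a, b in pairs])

# Keep only the direction where id is the smaller of the two IDs.
kept = conn.execute("SELECT id, pid FROM t WHERE id < pid ORDER BY id").fetchall()
print(len(kept), kept)
```

This only works if every pair really is stored in both directions; a pair stored once with the larger ID first would be dropped.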

select
ID,
FName,
LName
from your_table
union
select
PID,
PFName,
PLName
from your_table
order by 3, 2, 1

Related

SQL - How to select rows with the same ID values where all other column values are also identical

I cannot find an answer to the problem in the title.
I need to select all rows that share the same ID and whose other column values are all identical too. The table has 20 columns.
Any suggestion would be much appreciated! Many thanks.
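How about this:
select id, name, ...other fields
from my_table
where (id, name, ...other fields) in (
select id, name, ...other fields
from my_table
group by id, name, ...other fields
having count(*) > 1
)
Adjust the group by and where conditions accordingly. The subquery has to return the same columns as the IN list; returning an extra count column raises an error.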
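A small sketch of the grouping idea, using SQLite from Python rather than Oracle. The three-column `my_table` and its rows are made up for illustration; a real 20-column table would list all 20 columns in both the SELECT and the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "Ann", "Oslo"),
        (1, "Ann", "Oslo"),   # exact duplicate of the row above
        (2, "Bob", "Rome"),
        (2, "Bob", "Pisa"),   # same id, but one column differs
        (3, "Cy", "Bonn"),
    ],
)

# Group on every column; a group with more than one row is a full duplicate.
dupes = conn.execute(
    """
    SELECT id, name, city, COUNT(*) AS copies
    FROM my_table
    GROUP BY id, name, city
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dupes)
```

Only the `(1, 'Ann', 'Oslo')` group is reported; the two rows for ID 2 share an ID but are not identical.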

Is it possible to UNION distinct rows but disregard one column to determine uniqueness?

select d.id, d.registration_number
from DOCUMENTS d
union
select dd.id, dd.registration_number
from DIFFERENT_DOCUMENTS dd
Would it be possible to union those results based solely on the uniqueness of the registration_number, disregarding the id of the documents?
Or, is it possible to achieve the same result in a different way?
Just to add: actually I'm unioning 5 queries, each ~20 lines long, with 4 columns that should be disregarded in determining uniqueness.
you basically need to wrap the unioned data with something else to get only the ones you want.
SELECT min(id), registration_number
FROM (SELECT id, registration_number
FROM documents
UNION ALL
SELECT id, registration_number
FROM different_documents)
GROUP BY registration_number
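To see what the wrapped UNION ALL returns, here is a sketch against SQLite from Python. The sample rows are invented; `R-2` appears in both tables, so it should come back once, with the smaller id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER, registration_number TEXT);
    CREATE TABLE different_documents (id INTEGER, registration_number TEXT);
    INSERT INTO documents VALUES (1, 'R-1'), (2, 'R-2');
    INSERT INTO different_documents VALUES (10, 'R-2'), (11, 'R-3');
    """
)

# UNION ALL keeps every row; GROUP BY then collapses on registration_number only.
unique_regs = conn.execute(
    """
    SELECT MIN(id), registration_number
    FROM (SELECT id, registration_number FROM documents
          UNION ALL
          SELECT id, registration_number FROM different_documents) u
    GROUP BY registration_number
    ORDER BY registration_number
    """
).fetchall()
print(unique_regs)
```

With several disregarded columns, MIN() on each of them can mix values from different source rows, which is one reason the ROW_NUMBER variant may be preferable.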
Union will check the combination of all the columns for uniqueness. You could, however, use union all (that does not remove duplicates) and then apply the logic yourself using the row_number window function:
SELECT id, registration_number
FROM (SELECT id, registration_number,
ROW_NUMBER() OVER (PARTITION BY registration_number ORDER BY id) AS rn
FROM (SELECT id, registration_number
FROM documents
UNION ALL
SELECT id, registration_number
FROM different_documents) u
) r
WHERE rn = 1
Since the other answers are already correct, may I ask why you need to retrieve the other columns in that query, given that its primary purpose appears to be gathering unique registration numbers?
Wouldn't it be simpler to gather the unique registration numbers first and then retrieve the other info?
Or, in your actual query, first gather the info without the columns that should be disregarded, and then fetch those columns if need be?
For example, by making a view with
SELECT d.registration_number
FROM DOCUMENTS d
UNION
SELECT dd.registration_number
FROM DIFFERENT_DOCUMENTS dd
and then gathering the remaining information by joining against that view?
Assuming registration_number is unique in each table, you can use not exists:
select d.id, d.registration_number
from DOCUMENTS d
union all
select dd.id, dd.registration_number
from DIFFERENT_DOCUMENTS dd
where not exists (select 1
from DOCUMENTS d
where dd.registration_number = d.registration_number
);
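A sketch of the NOT EXISTS variant on invented data, again using SQLite from Python as a stand-in. It relies on the stated assumption that registration_number is unique within each table; duplicates inside a single table would pass through untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER, registration_number TEXT);
    CREATE TABLE different_documents (id INTEGER, registration_number TEXT);
    INSERT INTO documents VALUES (1, 'R-1'), (2, 'R-2');
    INSERT INTO different_documents VALUES (10, 'R-2'), (11, 'R-3');
    """
)

# Take all of documents, then only those different_documents rows
# whose registration_number is not already present in documents.
combined = conn.execute(
    """
    SELECT d.id, d.registration_number
    FROM documents d
    UNION ALL
    SELECT dd.id, dd.registration_number
    FROM different_documents dd
    WHERE NOT EXISTS (SELECT 1
                      FROM documents d2
                      WHERE d2.registration_number = dd.registration_number)
    ORDER BY 1
    """
).fetchall()
print(combined)
```

Here the first table always wins for a shared registration number, so row `(10, 'R-2')` is dropped.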

How to count unique rows in Oracle

I have an oracle database table with a lot of columns. I'd like to count the number of fully unique rows. The only thing I could find is:
SELECT COUNT(DISTINCT col_name) FROM table;
This however would require me listing all the columns and I haven't been able to come up with syntax that will do that for me. I'm guessing the reason for that is that this query would be very low performance? Is there a recommended way of doing this?
How about
SELECT COUNT(*)
FROM (SELECT DISTINCT * FROM Table)
It depends on what you are trying to accomplish.
To count rows per distinct value of a specific column, so that you know which values exist and how many rows carry each one:
SELECT DISTINCT
A_CODE, COUNT(*)
FROM MY_ARCHV
GROUP BY A_CODE
--This informs me there are 93 unique codes, and how many of each of those codes there are.
Another method
-- Count how many times each value occurs in an Oracle table:
select A_CDE, -- the value you need to count
count(*) as numInstances -- how many rows have that value
from A_ARCH -- the table where it resides
group by A_CDE -- one output row per distinct value
Either way, you get something that looks like this:
A_CODE Count(*)
1603 32
1600 2
1605 14
I think you want a count of all distinct rows from a table like this
select count(1) as c
from (
select distinct *
from tbl
) distinct_tbl;
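A quick check of the `SELECT DISTINCT *` subquery, sketched with SQLite from Python. The two-column `tbl` is a made-up example; no column has to be listed in the query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "x"), (1, "x"), (1, "y"), (2, "x")],  # one fully duplicated row
)

total_rows = conn.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM tbl) distinct_tbl"
).fetchone()[0]
print(total_rows, distinct_rows)
```

Four rows go in and three distinct rows are counted, since `(1, 'x')` appears twice.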
SELECT DISTINCT col_name, count(*) FROM table_name group by col_name

Check if tables are identical using SQL in Oracle

I was asked this question during an interview for a Junior Oracle Developer position, the interviewer admitted it was a tough one:
Write a query/queries to check if the table 'employees_hist' is an exact copy of the table 'employees'. Any ideas how to go about this?
EDIT: Consider that tables can have duplicate records so a simple MINUS will not work in this case.
EXAMPLE
EMPLOYEES
NAME
--------
Jack Crack
Jack Crack
Jill Hill
These two would not be identical.
EMPLOYEES_HIST
NAME
--------
Jack Crack
Jill Hill
Jill Hill
If the tables have the same columns, you can use this; this will return no rows if the rows in both tables are identical:
(
select * from test_data_01
minus
select * from test_data_02
)
union
(
select * from test_data_02
minus
select * from test_data_01
);
Identical regarding what? Metadata or the actual table data too?
Anyway, use MINUS.
select * from table_1
MINUS
select * from table_2
So, if the two tables are really identical, i.e. both the metadata and the actual data, it returns no rows. Otherwise, the data is different.
If you receive an error, the metadata itself is different.
Update: if the data is not the same and one of the tables has duplicates, just select the unique records from one of the tables and apply MINUS against the other table.
One possible solution, which caters for duplicates, is to create a subquery that does a UNION ALL on the two tables and includes the number of duplicates contained within each table by grouping on all the columns. The outer query can then group on all the columns, including the row count column. If the tables match, no rows should be returned:
create table employees (name varchar2(100));
create table employees_hist (name varchar2(100));
insert into employees values ('Jack Crack');
insert into employees values ('Jack Crack');
insert into employees values ('Jill Hill');
insert into employees_hist values ('Jack Crack');
insert into employees_hist values ('Jill Hill');
insert into employees_hist values ('Jill Hill');
with both_tables as
(select name, count(*) as row_count
from employees
group by name
union all
select name, count(*) as row_count
from employees_hist
group by name)
select name, row_count from both_tables
group by name, row_count having count(*) <> 2;
gives you:
Name Row_count
Jack Crack 1
Jack Crack 2
Jill Hill 1
Jill Hill 2
This tells you that both names appear once in one table and twice in the other, and therefore the tables don't match.
select name, count(*) n from EMPLOYEES group by name
minus
select name, count(*) n from EMPLOYEES_HIST group by name
union all (
select name, count(*) n from EMPLOYEES_HIST group by name
minus
select name, count(*) n from EMPLOYEES group by name)
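The difference between a plain set difference and the counted version can be sketched with SQLite from Python. SQLite spells MINUS as EXCEPT and does not accept parenthesised operands in a compound query, so each half is wrapped in a subquery here. `employees_copy` and the helper `count_mismatches` are hypothetical names added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT);
    CREATE TABLE employees_hist (name TEXT);
    CREATE TABLE employees_copy (name TEXT);
    INSERT INTO employees VALUES ('Jack Crack'), ('Jack Crack'), ('Jill Hill');
    INSERT INTO employees_hist VALUES ('Jack Crack'), ('Jill Hill'), ('Jill Hill');
    INSERT INTO employees_copy SELECT name FROM employees;
    """
)

def count_mismatches(left, right):
    """Return (name, n) pairs whose duplicate count differs between two tables."""
    # Table names are interpolated directly; acceptable only for trusted names.
    counted = "SELECT name, COUNT(*) AS n FROM {} GROUP BY name"
    sql = (
        f"SELECT * FROM ({counted.format(left)} EXCEPT {counted.format(right)}) "
        "UNION ALL "
        f"SELECT * FROM ({counted.format(right)} EXCEPT {counted.format(left)})"
    )
    return sorted(conn.execute(sql).fetchall())

# A plain set difference sees no problem, because it ignores duplicates.
plain = conn.execute(
    "SELECT name FROM employees EXCEPT SELECT name FROM employees_hist"
).fetchall()
print(plain)
print(count_mismatches("employees", "employees_hist"))
print(count_mismatches("employees", "employees_copy"))
```

The plain difference is empty even though the tables differ, while the counted comparison reports four mismatching rows for `employees_hist` and none for the exact copy.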
You could merge the two tables and then subtract one of the tables from the result. If the result of the subtraction is an empty table, then you know that the tables must be the same, since the merge had no effect (every row and column were effectively the same).
How do I merge two tables with different column number while removing duplicates?
That link provides a good way to merge the two tables without duplicates without knowing what the columns are.
Ensure the rows are unique by adding a pseudo column
WITH t1 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees)
, t2 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees_hist)
(SELECT *
FROM t1
MINUS
SELECT *
FROM t2)
UNION ALL
(SELECT *
FROM t2
MINUS
SELECT *
FROM t1)
Use row_number to make sure there are no duplicate rows. Now you can use minus and if there are no results, the tables are identical.
SELECT ROW_NUMBER() OVER (ORDER BY name) AS rn, t1.*
FROM tab1 t1
MINUS
SELECT ROW_NUMBER() OVER (ORDER BY name) AS rn, t2.*
FROM tab2 t2

Create a UNION query that identifies which table the unique data came from

I have two tables with data. Both tables have a CUSTOMER_ID column (which is numeric). I am trying to get a list of all the unique values for CUSTOMER_ID and know whether or not the CUSTOMER_ID exists in both tables or just one (and which one).
I can easily get a list of the unique CUSTOMER_ID:
SELECT tblOne.CUSTOMER_ID
FROM tblOne
UNION
SELECT tblTwo.CUSTOMER_ID
FROM tblTwo
I can't just add an identifier column to the SELECT statement (like: SELECT tblOne.CUSTOMER_ID, "Table1" AS DataSource) because then the records wouldn't be unique and the union would return both sets of data.
I feel I need to add it somewhere else in this query but am not sure how.
Edit for clarity:
For the union query output I need an additional column that can tell me if the unique value I am seeing exists in: (1) both tables, (2) table one, or (3) table two.
If the CUSTOMER_ID appears in both tables then we'll have to arbitrarily pick which table to call the source. The following query uses "tblOne" as the [SourceTable] in that case:
SELECT
CUSTOMER_ID,
MIN(Source) AS SourceTable,
COUNT(*) AS TableCount
FROM
(
SELECT DISTINCT
CUSTOMER_ID,
"tblOne" AS Source
FROM tblOne
UNION ALL
SELECT DISTINCT
CUSTOMER_ID,
"tblTwo" AS Source
FROM tblTwo
)
GROUP BY CUSTOMER_ID
Gord Thompson's answer is correct. But, it is not necessary to do a distinct in the subqueries. And, you can return a single column with the information you are looking for:
select customer_id,
iif(min(which) = max(which), min(which), "both") as DataSource
from (select customer_id, "tblone" as which
from tblOne
UNION ALL
select customer_id, "tbltwo" as which
from tblTwo
) t
group by customer_id
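The min/max trick can be tried out with SQLite from Python. The answer's `iif` and double-quoted strings look like Access syntax, so this sketch swaps in a standard CASE expression and single-quoted literals; the sample IDs are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblOne (customer_id INTEGER);
    CREATE TABLE tblTwo (customer_id INTEGER);
    INSERT INTO tblOne VALUES (1), (2), (2);
    INSERT INTO tblTwo VALUES (2), (3);
    """
)

# If the smallest and largest label in a group agree, the ID came from one table.
sources = conn.execute(
    """
    SELECT customer_id,
           CASE WHEN MIN(which) = MAX(which) THEN MIN(which) ELSE 'both' END
               AS data_source
    FROM (SELECT customer_id, 'tblOne' AS which FROM tblOne
          UNION ALL
          SELECT customer_id, 'tblTwo' AS which FROM tblTwo) t
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
print(sources)
```

The repeated ID 2 in `tblOne` does no harm, which is why the inner DISTINCT is unnecessary for this version.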
We could add an identifier column with the integer data type and then do an outer query:
SELECT
CUSTOMER_ID,
SUM(SourceFlag) AS SourceSum
FROM
(
SELECT
DISTINCT CUSTOMER_ID,
1 AS SourceFlag
FROM tblOne
UNION
SELECT
DISTINCT CUSTOMER_ID,
2 AS SourceFlag
FROM tblTwo
)
GROUP BY CUSTOMER_ID
So if the sum is 1, the ID comes from tblOne; if it is 2, it comes from tblTwo; and if it is 3, it exists in both.
If you want to add a third table to the union, give it a value of 4 so that each combination of tables has a unique sum.
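A sketch of the three-table case with flags 1, 2 and 4, run on SQLite from Python. `tblThree`, the `flag` alias and the sample IDs are assumptions for illustration. Because each flag is a distinct power of two, the sum can be decoded bit by bit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblOne (customer_id INTEGER);
    CREATE TABLE tblTwo (customer_id INTEGER);
    CREATE TABLE tblThree (customer_id INTEGER);
    INSERT INTO tblOne VALUES (1), (2);
    INSERT INTO tblTwo VALUES (2), (3);
    INSERT INTO tblThree VALUES (2), (3), (4);
    """
)

# UNION (not UNION ALL) removes repeated (id, flag) pairs, so each table
# contributes its flag at most once per customer.
rows = conn.execute(
    """
    SELECT customer_id, SUM(flag) AS mask
    FROM (SELECT customer_id, 1 AS flag FROM tblOne
          UNION
          SELECT customer_id, 2 AS flag FROM tblTwo
          UNION
          SELECT customer_id, 4 AS flag FROM tblThree) u
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

# Decode each sum back into the tables it represents.
names = {1: "tblOne", 2: "tblTwo", 4: "tblThree"}
sources = {cid: [n for bit, n in names.items() if mask & bit] for cid, mask in rows}
print(rows)
print(sources)
```

Using 3 instead of 4 for the third table would make "in tblThree only" indistinguishable from "in tblOne and tblTwo".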