I was asked this question during an interview for a Junior Oracle Developer position; the interviewer admitted it was a tough one:
Write a query/queries to check if the table 'employees_hist' is an exact copy of the table 'employees'. Any ideas how to go about this?
EDIT: Consider that tables can have duplicate records so a simple MINUS will not work in this case.
EXAMPLE
These two would not be identical:
EMPLOYEES
NAME
--------
Jack Crack
Jack Crack
Jill Hill
EMPLOYEES_HIST
NAME
--------
Jack Crack
Jill Hill
Jill Hill
If the tables have the same columns, you can use this; this will return no rows if the rows in both tables are identical:
(
select * from test_data_01
minus
select * from test_data_02
)
union
(
select * from test_data_02
minus
select * from test_data_01
);
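To see how this behaves on the question's example, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite rather than Oracle, so MINUS is spelled EXCEPT and each half is wrapped in a subquery. Because MINUS/EXCEPT compare distinct rows, the result is empty even though the duplicate counts differ:

```python
import sqlite3

# In-memory database holding the example data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT);
    CREATE TABLE employees_hist (name TEXT);
    INSERT INTO employees VALUES ('Jack Crack'), ('Jack Crack'), ('Jill Hill');
    INSERT INTO employees_hist VALUES ('Jack Crack'), ('Jill Hill'), ('Jill Hill');
""")

# Symmetric difference; EXCEPT is SQLite's spelling of Oracle's MINUS.
symmetric_diff = conn.execute("""
    SELECT * FROM (SELECT * FROM employees EXCEPT SELECT * FROM employees_hist)
    UNION
    SELECT * FROM (SELECT * FROM employees_hist EXCEPT SELECT * FROM employees)
""").fetchall()

print(symmetric_diff)  # [] even though the tables differ in duplicate counts
```

So an empty result here only shows that both tables contain the same set of distinct rows.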
Identical regarding what? Metadata or the actual table data too?
Anyway, use MINUS.
select * from table_1
MINUS
select * from table_2
So, if the two tables are really identical, i.e. the metadata and the actual data, it would return no rows. Else, it would prove that the data is different.
If you receive an error, it would mean the metadata itself is different.
Update: if the data is not the same and one of the tables has duplicates,
just select the unique records from one of the tables and apply MINUS against the other table.
One possible solution, which caters for duplicates, is to create a subquery which does a UNION ALL on the two tables and includes the number of duplicates contained within each table by grouping on all the columns. The outer query can then group on all the columns, including the row count column. If the tables match, there should be no rows returned:
create table employees (name varchar2(100));
create table employees_hist (name varchar2(100));
insert into employees values ('Jack Crack');
insert into employees values ('Jack Crack');
insert into employees values ('Jill Hill');
insert into employees_hist values ('Jack Crack');
insert into employees_hist values ('Jill Hill');
insert into employees_hist values ('Jill Hill');
with both_tables as
(select name, count(*) as row_count
from employees
group by name
union all
select name, count(*) as row_count
from employees_hist
group by name)
select name, row_count from both_tables
group by name, row_count having count(*) <> 2;
gives you:
NAME        ROW_COUNT
----------  ---------
Jack Crack  1
Jack Crack  2
Jill Hill   1
Jill Hill   2
This tells you that both names appear once in one table and twice in the other, and therefore the tables don't match.
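The same idea can be wrapped in a small helper so it works for any column list. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, the function name count_mismatches is hypothetical, and the table and column names are interpolated into the SQL, so they must come from trusted code, never from user input.

```python
import sqlite3

def count_mismatches(conn, table_a, table_b, columns):
    """Return (column values..., row_count) rows that do not pair up across both tables."""
    cols = ", ".join(columns)
    query = f"""
        WITH both_tables AS (
            SELECT {cols}, COUNT(*) AS row_count FROM {table_a} GROUP BY {cols}
            UNION ALL
            SELECT {cols}, COUNT(*) AS row_count FROM {table_b} GROUP BY {cols}
        )
        SELECT {cols}, row_count FROM both_tables
        GROUP BY {cols}, row_count
        HAVING COUNT(*) <> 2
        ORDER BY {cols}, row_count
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT);
    CREATE TABLE employees_hist (name TEXT);
    INSERT INTO employees VALUES ('Jack Crack'), ('Jack Crack'), ('Jill Hill');
    INSERT INTO employees_hist VALUES ('Jack Crack'), ('Jill Hill'), ('Jill Hill');
""")

mismatches = count_mismatches(conn, "employees", "employees_hist", ["name"])
self_check = count_mismatches(conn, "employees", "employees", ["name"])
print(mismatches)  # [('Jack Crack', 1), ('Jack Crack', 2), ('Jill Hill', 1), ('Jill Hill', 2)]
print(self_check)  # []
```

An empty list means every distinct row occurs the same number of times in both tables.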
select name, count(*) n from EMPLOYEES group by name
minus
select name, count(*) n from EMPLOYEES_HIST group by name
union all (
select name, count(*) n from EMPLOYEES_HIST group by name
minus
select name, count(*) n from EMPLOYEES group by name)
You could merge the two tables and then subtract one of the tables from the result. If the result of the subtraction is an empty table, then you know that the tables must be the same, since the merge had no effect (every row and column was effectively the same).
How do I merge two tables with different column number while removing duplicates?
That link provides a good way to merge the two tables without duplicates without knowing what the columns are.
Ensure the rows are unique by adding a pseudo column
WITH t1 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees)
, t2 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees_hist)
(SELECT *
FROM t1
MINUS
SELECT *
FROM t2)
UNION ALL
(SELECT *
FROM t2
MINUS
SELECT *
FROM t1)
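A runnable sketch of the pseudo-column approach on the question's example data, assuming SQLite 3.25 or later (for window functions) via Python's sqlite3 module instead of Oracle. EXCEPT stands in for MINUS, and the side label is a hypothetical addition that shows which table each leftover row came from:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT);
    CREATE TABLE employees_hist (name TEXT);
    INSERT INTO employees VALUES ('Jack Crack'), ('Jack Crack'), ('Jill Hill');
    INSERT INTO employees_hist VALUES ('Jack Crack'), ('Jill Hill'), ('Jill Hill');
""")

# Number the copies of each row so that duplicates become distinct rows,
# then take the difference in both directions.
leftovers = sorted(conn.execute("""
    WITH t1 AS (
        SELECT name, ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS row_num
        FROM employees
    ),
    t2 AS (
        SELECT name, ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS row_num
        FROM employees_hist
    )
    SELECT 'employees only' AS side, name, row_num
    FROM (SELECT * FROM t1 EXCEPT SELECT * FROM t2)
    UNION ALL
    SELECT 'employees_hist only' AS side, name, row_num
    FROM (SELECT * FROM t2 EXCEPT SELECT * FROM t1)
""").fetchall())

print(leftovers)
# [('employees only', 'Jack Crack', 2), ('employees_hist only', 'Jill Hill', 2)]
```

The second copy of Jack Crack exists only in employees and the second copy of Jill Hill only in employees_hist, so the tables are not identical.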
Use row_number to make sure there are no duplicate rows. Now you can use minus, and if there are no results in either direction (run it again with tab1 and tab2 swapped), the tables are identical.
SELECT ROW_NUMBER() OVER (ORDER BY Name) AS rn, t.*
FROM tab1 t
MINUS
SELECT ROW_NUMBER() OVER (ORDER BY Name) AS rn, t.*
FROM tab2 t
Related
I cannot find the answer to the abovementioned problem in the title.
I need to select all rows that have the identical ID and all other column values must also be identical as well. This table consists of 20 columns.
Any suggestion would be much appreciated! Many thanks.
How about this
select id, name, ...other fields
from my_table
where id in (
select id
from my_table
group by id, name, ...other fields
having count(id) > 1
)
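A small sketch of this query shape, assuming SQLite via Python's sqlite3 module and a hypothetical three-column my_table (id, name, city) standing in for the real 20-column table. The subquery returns only id, since IN against a single column needs a single-column subquery:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, name TEXT, city TEXT);
    INSERT INTO my_table VALUES
        (1, 'Ann', 'Oslo'),
        (1, 'Ann', 'Oslo'),
        (2, 'Bill', 'Rome'),
        (2, 'Bill', 'Bonn'),
        (3, 'Chris', 'Lima');
""")

# Group on every column; only groups with more than one member are full duplicates.
duplicate_rows = conn.execute("""
    SELECT id, name, city
    FROM my_table
    WHERE id IN (
        SELECT id
        FROM my_table
        GROUP BY id, name, city
        HAVING COUNT(id) > 1
    )
    ORDER BY id
""").fetchall()

print(duplicate_rows)  # [(1, 'Ann', 'Oslo'), (1, 'Ann', 'Oslo')]
```

Note that the outer filter is by id only, so if an id had both a duplicated row and a different row, all of its rows would be returned.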
Change group by and where conditions accordingly
Does anyone know the sql to pull 4 rows from the following table which contains 8 rows?
Just want one row for each arbitrary person.
The real data will be thousands of records so it must be generic and use only the ID's not the names.
table
You seem to have a symmetric relationship. So, you can do:
select t.*
from t
where t.id < t.pid;
select
ID,
FName,
LName
from your_table
union
select
PID,
PFName,
PLName
from your_table
order by 3, 2, 1
I have two tables that I was going to join, but I understand it's more efficient to use CREATE VIEW. This is what I have:
CREATE OR REPLACE VIEW view0_joinedTablesGrouped
AS
Select table1.*,table2.*
FROM table1
inner join table2 on table1.col =
table2.matchingcol
group by table2.matchingcol;
which causes the following error:
ERROR: column "table1.col" must appear in the GROUP BY clause or be
used in an aggregate function
LINE 3: Select table.*,table2.*
Group By cannot do what you are trying to do.
Consider a simple table:
Name Age
-------
Ann 10
Bill 10
Chris 11
If you try to group by age with:
Select * from Table group by Age
What, exactly, do you expect to appear in the Name column for Age=10? Ann, or Bill or both or neither or ....? There is no good answer.
So, when you group by, every column in the output has to be either one of the grouping columns or an aggregate, meaning a function of every row in the group.
So these are valid:
Select Age, Count(*) from Table group by Age
Select Age, Max( Length(Name)) from Table group by Age
Select Age, Max(Name) from Table group by Age
But this is impossible to do, and isn't valid:
Select Age,Name from Table group by Age
So your select * is the problem -- you can't just select column values because when you group by there's a whole group of column values for every output row, and you can't stuff all those values into one column of one row.
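The valid forms above can be tried on the sample data with this sketch, which assumes SQLite via Python's sqlite3 module and a table named people (a hypothetical name, since Table is a reserved word). SQLite is more permissive than PostgreSQL about bare columns in grouped queries, so only the valid forms are run here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT, age INTEGER);
    INSERT INTO people VALUES ('Ann', 10), ('Bill', 10), ('Chris', 11);
""")

# Each output column is either the grouping column or an aggregate over the group.
count_by_age = conn.execute(
    "SELECT age, COUNT(*) FROM people GROUP BY age ORDER BY age"
).fetchall()
longest_name_by_age = conn.execute(
    "SELECT age, MAX(LENGTH(name)) FROM people GROUP BY age ORDER BY age"
).fetchall()
max_name_by_age = conn.execute(
    "SELECT age, MAX(name) FROM people GROUP BY age ORDER BY age"
).fetchall()

print(count_by_age)         # [(10, 2), (11, 1)]
print(longest_name_by_age)  # [(10, 4), (11, 5)]
print(max_name_by_age)      # [(10, 'Bill'), (11, 'Chris')]
```

In each case the group for age 10 (Ann and Bill) collapses to a single value per output column.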
As for using a view, @systemjack's comment is correct.
I am new to SQL and stackoverflow, but I was hoping someone could help me out with the following problem.
There are three different databases for a company, each of which represents a different division within the company.
Each database contains employee information in the employee table:
id, first name, last name, division.
There are three divisions. Each employee can be in more than one division. The id is unique to the employee; an employee in multiple divisions has the same id in each table.
How can I write a query that selects each unique employee and the divisions that they work in (in one row)?
My results from the following code are incomplete, meaning that a few employees are missing and unaccounted for.
Insert into #temp1 (id, first name, last name, division AS division1) from db1.table WHERE active_flag = 1 AND termination_date IS NULL
Insert into #temp2 (id, first name, last name, division AS division2) from db2.table WHERE active_flag = 1 AND termination_date IS NULL
Insert into #temp3 (id, first name, last name, division AS division3) from db3.table WHERE active_flag = 1 AND termination_date IS NULL
Insert into #uniqueids (id, first name, last name)
SELECT id, first name, last name FROM #temp1
UNION SELECT id, first name, last name FROM #temp2
UNION SELECT id, first name, last name FROM #temp3
SELECT #uniqueids.id, #uniqueids.first name, #uniqueids.last name,
division1+division2+division3 AS divisions
FROM #uniqueids
LEFT JOIN #temp1 ON #uniqueids.id=#temp1.id
LEFT JOIN #temp2 ON #uniqueids.id=#temp2.id
LEFT JOIN #temp3 ON #uniqueids.id=#temp3.id
WHERE #uniqueids.id NOT LIKE '%default%' AND #uniqueids.id NOT LIKE 'S%'
*** I edited the code to make it more clear
I know that certain employees are unaccounted for because I was given a result set that has a list of 778 unique employees.
Example row:
[id, first name, last name, divisions]
[asd1234, Julie, Wong, 1 2 3]
Inserting into and then selecting from #uniqueids displays 900 employees, because it includes ids that contain "default" or start with "S", which do not count. These are addressed in the last section of the code, which I have just now included.
My current result set has 770 unique employees after running the entire query, which means that I am missing 8.
I am using SQL Server 2014.
Does anyone happen to know a way of basically taking the 'Distinct' command but only using it on a single column? For lack of a better example, something similar to this:
Select (Distinct ID), Name, Term from Table
So it would get rid of rows with duplicate IDs but still use the other columns' information. I would use distinct on the full query, but the rows all differ because of the data in certain columns. I need to output only the topmost term of the two duplicates:
ID  Name   Term
1   Suzy   A
1   Suzy   B
2   John   A
2   John   B
3   Pete   A
4   Carl   A
5   Sally  B
Any suggestions would be helpful.
select t.ID, t.Name, t.Term
from Table t
join (select ID, min(Term) as Term from Table group by ID) d
on d.ID = t.ID and d.Term = t.Term
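You can use row number for this
Select ID, Name, Term from (
Select ID, Name, Term, ROW_NUMBER()
OVER (PARTITION BY ID order by Term) as rn from Table
) as tbl
Where rn = 1
The filter on rn has to sit in the outer query, because rn does not exist yet inside the subquery's own WHERE clause. The order by inside OVER determines which row of each ID is numbered 1 and is therefore picked; ordering by Term keeps the topmost term.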
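The ROW_NUMBER approach can be checked against the sample rows from the question with this sketch. It assumes SQLite 3.25 or later (for window functions) via Python's sqlite3 module, a table named students (a hypothetical name, since Table is a reserved word), and ordering by term so that the topmost term is the one kept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER, name TEXT, term TEXT);
    INSERT INTO students VALUES
        (1, 'Suzy', 'A'), (1, 'Suzy', 'B'),
        (2, 'John', 'A'), (2, 'John', 'B'),
        (3, 'Pete', 'A'), (4, 'Carl', 'A'), (5, 'Sally', 'B');
""")

# Number the rows within each id, then keep only the first row of each group.
# The rn filter lives in the outer query because rn is computed by the inner one.
one_row_per_id = conn.execute("""
    SELECT id, name, term FROM (
        SELECT id, name, term,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY term) AS rn
        FROM students
    ) AS tbl
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(one_row_per_id)
# [(1, 'Suzy', 'A'), (2, 'John', 'A'), (3, 'Pete', 'A'), (4, 'Carl', 'A'), (5, 'Sally', 'B')]
```

Changing the ORDER BY inside OVER (for example to term DESC) changes which of the duplicate rows survives.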