Create a UNION query that identifies which table the unique data came from - sql

I have two tables with data. Both tables have a CUSTOMER_ID column (which is numeric). I am trying to get a list of all the unique values for CUSTOMER_ID and know whether or not the CUSTOMER_ID exists in both tables or just one (and which one).
I can easily get a list of the unique CUSTOMER_ID:
SELECT tblOne.CUSTOMER_ID
FROM tblOne
UNION
SELECT tblTwo.CUSTOMER_ID
FROM tblTwo
I can't just add an identifier column to the SELECT statement (like: SELECT tblOne.CUSTOMER_ID, "Table1" AS DataSource) because then the records would no longer be unique and the query would return both sets of data.
I feel I need to add it somewhere else in this query but am not sure how.
Edit for clarity:
For the union query output I need an additional column that can tell me if the unique value I am seeing exists in: (1) both tables, (2) table one, or (3) table two.

If the CUSTOMER_ID appears in both tables then we'll have to arbitrarily pick which table to call the source. The following query uses "tblOne" as the [SourceTable] in that case:
SELECT
CUSTOMER_ID,
MIN(Source) AS SourceTable,
COUNT(*) AS TableCount
FROM
(
SELECT DISTINCT
CUSTOMER_ID,
"tblOne" AS Source
FROM tblOne
UNION ALL
SELECT DISTINCT
CUSTOMER_ID,
"tblTwo" AS Source
FROM tblTwo
)
GROUP BY CUSTOMER_ID

Gord Thompson's answer is correct, but the DISTINCT in the subqueries is not necessary, and you can return a single column with the information you are looking for:
select customer_id,
iif(min(which) = max(which), min(which), "both") as DataSource
from (select customer_id, "tblone" as which
from tblOne
UNION ALL
select customer_id, "tbltwo" as which
from tblTwo
) t
group by customer_id
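To check the behaviour of the tagged UNION ALL plus GROUP BY on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption here (the question is about Access), so iif is written as a CASE expression and the string literals use single quotes; the function name classify_customers and the sample IDs are made up for illustration.

```python
import sqlite3

def classify_customers(ids_one, ids_two):
    """Return {CUSTOMER_ID: 'tblOne' | 'tblTwo' | 'both'} for two lists of IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblOne (CUSTOMER_ID INTEGER)")
    conn.execute("CREATE TABLE tblTwo (CUSTOMER_ID INTEGER)")
    conn.executemany("INSERT INTO tblOne VALUES (?)", [(i,) for i in ids_one])
    conn.executemany("INSERT INTO tblTwo VALUES (?)", [(i,) for i in ids_two])
    rows = conn.execute(
        """
        SELECT CUSTOMER_ID,
               -- same tag at both ends means the ID was seen in one table only
               CASE WHEN MIN(which) = MAX(which) THEN MIN(which) ELSE 'both' END
                   AS DataSource
        FROM (SELECT CUSTOMER_ID, 'tblOne' AS which FROM tblOne
              UNION ALL
              SELECT CUSTOMER_ID, 'tblTwo' AS which FROM tblTwo) AS t
        GROUP BY CUSTOMER_ID
        ORDER BY CUSTOMER_ID
        """
    ).fetchall()
    conn.close()
    return dict(rows)

# ID 2 is duplicated inside tblOne on purpose; ID 3 is in both tables.
result = classify_customers([1, 2, 2, 3], [3, 4])
print(result)
```

The duplicate ID 2 still comes back as a single "tblOne" row, which is why the DISTINCT in the subqueries can be dropped.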

We could add an identifier column with the integer data type and then do an outer query:
SELECT
CUSTOMER_ID,
SUM(SourceFlag) AS SourceSum
FROM
(
SELECT
DISTINCT CUSTOMER_ID,
1 AS SourceFlag
FROM tblOne
UNION
SELECT
DISTINCT CUSTOMER_ID,
2 AS SourceFlag
FROM tblTwo
)
GROUP BY CUSTOMER_ID
So if the sum is 1, the ID comes from tblOne; if it is 2, it comes from tblTwo; and if it is 3, it exists in both.
If you want to add a third table to the union, give it a value of 4 so that each combination of tables has a unique sum.
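The flag values 1, 2 and 4 are powers of two, so the sum can be decoded back into a list of tables with a bitwise AND. A rough sketch with Python's sqlite3 module follows; the third table tblThree, the column name SourceFlag, and the helpers source_flags and decode are assumptions made for the example, not part of the original query.

```python
import sqlite3

# One distinct power of two per table, so every combination has a unique sum.
FLAGS = {"tblOne": 1, "tblTwo": 2, "tblThree": 4}

def source_flags(tables):
    """tables maps a table name to its CUSTOMER_ID values; returns {id: flag sum}."""
    conn = sqlite3.connect(":memory:")
    parts = []
    for name, ids in tables.items():
        conn.execute(f"CREATE TABLE {name} (CUSTOMER_ID INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])
        # DISTINCT matters here: a duplicate ID would otherwise be summed twice.
        parts.append(
            f"SELECT DISTINCT CUSTOMER_ID, {FLAGS[name]} AS SourceFlag FROM {name}"
        )
    sql = (
        "SELECT CUSTOMER_ID, SUM(SourceFlag) AS SourceSum FROM ("
        + " UNION ALL ".join(parts)
        + ") AS t GROUP BY CUSTOMER_ID ORDER BY CUSTOMER_ID"
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return dict(rows)

def decode(flag_sum):
    """Turn a flag sum back into the list of table names it represents."""
    return [name for name, bit in FLAGS.items() if flag_sum & bit]

flags = source_flags({"tblOne": [1, 2, 2], "tblTwo": [2, 3], "tblThree": [3, 1, 4]})
for customer_id, flag_sum in flags.items():
    print(customer_id, flag_sum, decode(flag_sum))
```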

Related

PostgreSQL create count, count distinct columns

I'm fairly new to PostgreSQL and trying out a few count queries. I'm looking to count and count distinct all values in a table. Pretty straightforward -
CountD Count
351 400
With a query like this:
SELECT COUNT(*)
COUNT(id) AS count_id,
COUNT DISTINCT(id) AS count_d_id
FROM table
I see that I can create a single column this way:
SELECT COUNT(*) FROM (SELECT DISTINCT id FROM table) AS count_d_id
But the title (count_d_id) doesn't come through properly, and I'm unsure how to add an additional column. Guidance appreciated.
This is the correct syntax:
SELECT COUNT(id) AS count_id,
COUNT(DISTINCT id) AS count_d_id
FROM table
Your original query aliases the subquery rather than the column. You seem to want:
SELECT COUNT(*) AS count_d_id FROM (SELECT DISTINCT id FROM table) t
-- column alias --^ -- subquery alias --^
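One difference between the two forms is worth knowing: COUNT(DISTINCT id) ignores NULLs, while counting the rows of a SELECT DISTINCT subquery counts NULL as one more value. A small sketch with Python's sqlite3 module shows this; SQLite stands in for PostgreSQL here, and the table name items and its sample rows are invented for the example (table itself is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
# Six rows: three distinct non-NULL ids, two repeats, and one NULL.
conn.executemany(
    "INSERT INTO items VALUES (?)", [(1,), (1,), (2,), (3,), (3,), (None,)]
)

# All three counts in a single pass, each with its own column alias.
count_all, count_id, count_d_id = conn.execute(
    """
    SELECT COUNT(*)           AS count_all,
           COUNT(id)          AS count_id,
           COUNT(DISTINCT id) AS count_d_id
    FROM items
    """
).fetchone()

# The subquery form: the column alias goes on COUNT(*), the table alias on the subquery.
(sub_count,) = conn.execute(
    "SELECT COUNT(*) AS count_d_id FROM (SELECT DISTINCT id FROM items) AS t"
).fetchone()

print(count_all, count_id, count_d_id, sub_count)
conn.close()
```

If id can never be NULL, both forms give the same number.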

Oracle SQL to get Unique Records

Does anyone know the sql to pull 4 rows from the following table which contains 8 rows?
Just want one row for each arbitrary person.
The real data will be thousands of records so it must be generic and use only the ID's not the names.
table
You seem to have a symmetric relationship. So, you can do:
select t.*
from t
where t.id < t.pid;
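Since the original table did not survive, here is a sketch of what the id < pid filter does on made-up data, using Python's sqlite3 module. It assumes every relationship is stored twice, once in each direction, in a table t with columns id and pid; the eight sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, pid INTEGER)")

# Four relationships, each stored in both directions: 8 rows in total.
pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]
conn.executemany("INSERT INTO t VALUES (?, ?)", pairs + [(b, a) for a, b in pairs])

# Keeping only the direction where id < pid leaves one row per relationship.
kept = conn.execute("SELECT t.* FROM t WHERE t.id < t.pid ORDER BY t.id").fetchall()
print(kept)
conn.close()
```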
select
ID,
FName,
LName
from your_table
union
select
PID,
PFName,
PLName
from your_table
order by 3, 2, 1

Join two select statements together

I am trying to work out how much we have taken in for entry fees.
I have two separate queries, both returning values, but I need them as one query instead of two.
SELECT SUM(ENTRY) AS TOTAL1 FROM MONEY
SELECT SUM(ENTRY) AS TOTAL1 FROM MONEY2
I needed to use UNION in order to get the statements together. Then used the below to get one number.
SELECT SUM(X.TOTAL1) from
(
SELECT SUM(ENTRY) AS TOTAL1 FROM MONEY
UNION
SELECT SUM(ENTRY) AS TOTAL1 FROM MONEY2
) X;
select sum(entry) as grand_total
from ( select entry from money
union all
select entry from money2
);
The point being, you SHOULD use UNION ALL; and how many columns each table has is irrelevant, because you don't need to UNION ALL the two tables (all columns from each); you only need to UNION ALL the ENTRY column from the first table and the ENTRY column from the second table.
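The reason UNION is risky here is that it removes duplicate rows, so if both tables happen to have the same subtotal, one of them is silently dropped. A minimal sketch with Python's sqlite3 module demonstrates this; SQLite is used as a stand-in engine, and the function name totals and the sample amounts are made up for the example.

```python
import sqlite3

def totals(entries_money, entries_money2):
    """Return (total via UNION of subtotals, total via UNION ALL of entries)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MONEY (ENTRY INTEGER)")
    conn.execute("CREATE TABLE MONEY2 (ENTRY INTEGER)")
    conn.executemany("INSERT INTO MONEY VALUES (?)", [(e,) for e in entries_money])
    conn.executemany("INSERT INTO MONEY2 VALUES (?)", [(e,) for e in entries_money2])

    # UNION de-duplicates, so two equal subtotals collapse into one row.
    (union_total,) = conn.execute(
        """
        SELECT SUM(X.TOTAL1) FROM (
            SELECT SUM(ENTRY) AS TOTAL1 FROM MONEY
            UNION
            SELECT SUM(ENTRY) AS TOTAL1 FROM MONEY2
        ) AS X
        """
    ).fetchone()

    # UNION ALL keeps every row, so nothing is lost.
    (union_all_total,) = conn.execute(
        """
        SELECT SUM(ENTRY) FROM (
            SELECT ENTRY FROM MONEY
            UNION ALL
            SELECT ENTRY FROM MONEY2
        ) AS X
        """
    ).fetchone()
    conn.close()
    return union_total, union_all_total

# Both tables total 10, so the UNION version reports 10 instead of 20.
print(totals([5, 5], [10]))
# Different subtotals: both versions agree.
print(totals([1, 2], [10]))
```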

select multiple records based on order by

I have a table with a bunch of customer IDs. The customers table also contains these IDs, but each ID can appear on multiple records for the same customer. I want to select the most recently used record, which I can get by doing order by <my_field> desc.
Say I have 100 customer IDs in this table and the customers table has 120 records with these IDs (some are duplicates). How can I apply my order by condition to get only the most recent matching records?
The DBMS is SQL Server 2000.
The table is basically like this:
loc_nbr and cust_nbr are the primary key.
A customer shops at location 1. They get assigned loc_nbr = 1 and cust_nbr = 1,
then a customer_id of 1.
They shop again, but this time at location 2, so they get assigned loc_nbr = 2 and cust_Nbr = 1, then the same customer_id of 1 based on their other attributes like name and address.
Because they shopped at location 2 AFTER location 1, that record will have a more recent rec_alt_ts value, and it is the record I want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY <group_by_fields> ORDER BY <sorting_fields> DESC) AS ItemCount
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just pseudocode for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 (LocationKey) as LatestLocationKey from location_table where cust_id = t1.cust_id order by some_field desc)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
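To get the whole row and not just the maximum timestamp, one option is to join the table back to the grouped result. This avoids ROW_NUMBER(), so it should also suit engines without window functions. The sketch below runs the idea through Python's sqlite3 module; the table name customers, the timestamp format and the sample rows are assumptions based on the description in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        loc_nbr     INTEGER,
        cust_nbr    INTEGER,
        customer_id INTEGER,
        rec_alt_ts  TEXT,          -- ISO timestamps sort correctly as text
        PRIMARY KEY (loc_nbr, cust_nbr)
    )
    """
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2009-01-05 10:00:00"),  # customer 1 at location 1
        (2, 1, 1, "2009-03-20 09:30:00"),  # customer 1 again, later, at location 2
        (1, 2, 2, "2009-02-11 14:15:00"),  # customer 2, single record
    ],
)

# Find the latest timestamp per customer_id, then join back for the full row.
latest = conn.execute(
    """
    SELECT c.loc_nbr, c.cust_nbr, c.customer_id, c.rec_alt_ts
    FROM customers AS c
    JOIN (SELECT customer_id, MAX(rec_alt_ts) AS max_ts
          FROM customers
          GROUP BY customer_id) AS m
      ON m.customer_id = c.customer_id
     AND m.max_ts = c.rec_alt_ts
    ORDER BY c.customer_id
    """
).fetchall()
print(latest)
conn.close()
```

If two records for the same customer share the same latest rec_alt_ts, this returns both of them, so a tie-breaker would be needed in that case.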
You can use the date and time to pick out the most recent detail for the customer.
Order by the column that holds the customer's date and time.
E.g.:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');

Check if tables are identical using SQL in Oracle

I was asked this question during an interview for a Junior Oracle Developer position, the interviewer admitted it was a tough one:
Write a query/queries to check if the table 'employees_hist' is an exact copy of the table 'employees'. Any ideas how to go about this?
EDIT: Consider that tables can have duplicate records so a simple MINUS will not work in this case.
EXAMPLE
EMPLOYEES
NAME
--------
Jack Crack
Jack Crack
Jill Hill
These two would not be identical.
EMPLOYEES_HIST
NAME
--------
Jack Crack
Jill Hill
Jill Hill
If the tables have the same columns, you can use this; this will return no rows if the rows in both tables are identical:
(
select * from test_data_01
minus
select * from test_data_02
)
union
(
select * from test_data_02
minus
select * from test_data_01
);
Identical regarding what? Metadata or the actual table data too?
Anyway, use MINUS.
select * from table_1
MINUS
select * from table_2
So, if the two tables are really identical, i.e. both the metadata and the actual data, it would return no rows. Otherwise, it would show that the data is different. Run it in both directions, since rows that exist only in table_2 would not show up here.
If you receive an error, it would mean the metadata itself is different.
Update: If the data is not the same and one of the tables has duplicates,
just select the unique records from that table and apply MINUS against the other table.
One possible solution, which caters for duplicates, is to create a subquery which does a UNION ALL on the two tables, and includes the number of duplicates contained within each table by grouping on all the columns. The outer query can then group on all the columns, including the row count column. If the tables match, there should be no rows returned:
create table employees (name varchar2(100));
create table employees_hist (name varchar2(100));
insert into employees values ('Jack Crack');
insert into employees values ('Jack Crack');
insert into employees values ('Jill Hill');
insert into employees_hist values ('Jack Crack');
insert into employees_hist values ('Jill Hill');
insert into employees_hist values ('Jill Hill');
with both_tables as
(select name, count(*) as row_count
from employees
group by name
union all
select name, count(*) as row_count
from employees_hist
group by name)
select name, row_count from both_tables
group by name, row_count having count(*) <> 2;
gives you:
Name Row_count
Jack Crack 1
Jack Crack 2
Jill Hill 1
Jill Hill 2
This tells you that both names appear once in one table and twice in the other, and therefore the tables don't match.
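The same example can be run end to end with Python's sqlite3 module to see why a plain set difference is not enough. This is only a sketch: SQLite is an assumption and spells MINUS as EXCEPT, and the helper name compare is made up for the example.

```python
import sqlite3

def compare(employees, employees_hist):
    """Return (plain_diff, counted_diff) for two lists of names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT)")
    conn.execute("CREATE TABLE employees_hist (name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?)", [(n,) for n in employees])
    conn.executemany(
        "INSERT INTO employees_hist VALUES (?)", [(n,) for n in employees_hist]
    )

    # Symmetric set difference: blind to how many times each row occurs.
    plain_diff = conn.execute(
        """
        SELECT * FROM (SELECT name FROM employees
                       EXCEPT SELECT name FROM employees_hist)
        UNION ALL
        SELECT * FROM (SELECT name FROM employees_hist
                       EXCEPT SELECT name FROM employees)
        """
    ).fetchall()

    # Count duplicates per table first; a matching (name, row_count) pair
    # must then appear exactly twice, once from each table.
    counted_diff = conn.execute(
        """
        WITH both_tables AS (
            SELECT name, COUNT(*) AS row_count FROM employees GROUP BY name
            UNION ALL
            SELECT name, COUNT(*) AS row_count FROM employees_hist GROUP BY name
        )
        SELECT name, row_count
        FROM both_tables
        GROUP BY name, row_count
        HAVING COUNT(*) <> 2
        ORDER BY name, row_count
        """
    ).fetchall()
    conn.close()
    return plain_diff, counted_diff

plain, counted = compare(
    ["Jack Crack", "Jack Crack", "Jill Hill"],
    ["Jack Crack", "Jill Hill", "Jill Hill"],
)
print(plain)    # the set difference sees no mismatch
print(counted)  # the counted version lists the mismatched rows
```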
select name, count(*) n from EMPLOYEES group by name
minus
select name, count(*) n from EMPLOYEES_HIST group by name
union all (
select name, count(*) n from EMPLOYEES_HIST group by name
minus
select name, count(*) n from EMPLOYEES group by name)
You could merge the two tables and then subtract one of the tables from the result. If the result of the subtraction is an empty table, then you know that the tables must be the same, since the merge had no effect (every row and column were effectively the same).
How do I merge two tables with different column number while removing duplicates?
That link provides a good way to merge the two tables without duplicates without knowing what the columns are.
Ensure the rows are unique by adding a pseudo column
WITH t1 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees)
, t2 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees_hist)
(SELECT *
FROM t1
MINUS
SELECT *
FROM t2)
UNION ALL
(SELECT *
FROM t2
MINUS
SELECT *
FROM t1)
Use row_number to make sure there are no duplicate rows. Now you can use minus and if there are no results, the tables are identical.
SELECT ROW_NUMBER() OVER (ORDER BY name) AS rn, t1.*
FROM tab1 t1
MINUS
SELECT ROW_NUMBER() OVER (ORDER BY name) AS rn, t2.*
FROM tab2 t2