Comparing between rows in same table in Oracle SQL - sql

I'm trying to find the best way to compare rows by CustomerID and Status. In other words, show a CustomerID only when the status is the same across all rows for that CustomerID. Otherwise, don't show the CustomerID.
Example data
CUSTOMERID STATUS
1000 ACTIVE
1000 ACTIVE
1000 NOT ACTIVE
2000 ACTIVE
2000 ACTIVE
RESULT I'm hoping for
CUSTOMERID STATUS
2000 ACTIVE

You can do this with a WHERE NOT EXISTS:
Select Distinct CustomerId, Status
From YourTable A
Where Not Exists
(
Select *
From YourTable B
Where A.CustomerId = B.CustomerId
And A.Status <> B.Status
)

SELECT DISTINCT o.*
FROM
(
SELECT
CustomerId
FROM
TableName
GROUP BY
CustomerId
HAVING
COUNT(DISTINCT Status) = 1
) t
INNER JOIN TableName o
ON t.CustomerId = o.CustomerId

The only actual query here is the last 4 lines of the code block; the rest sets up sample data.
with T1 as (
Select 1000 as CUSTOMERID, 'ACTIVE' as STATUS from dual union all
select 1000, 'ACTIVE' from dual union all
select 1000, 'NOT ACTIVE' from dual union all
select 2000, 'ACTIVE' from dual union all
select 2000, 'ACTIVE' from dual )
SELECT customerID, max(status) as status
FROM T1
GROUP BY customerID
HAVING count(distinct Status) = 1
I used a CTE (common table expression), named T1, to set up the sample data.
The order of operations matters here:
First, the table T1 is identified.
Second, the engine groups by customer ID.
Third, the engine limits the results to those groups that have exactly one distinct status value.
Fourth, the engine picks the max status, which is always that single value. MIN or MAX, it doesn't matter, as there is only one possible value. Note that we have to use an aggregate here: if we grouped by status as well, every (customer, status) pair would form its own group and the HAVING filter would no longer exclude anything.
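The grouping steps above can be tried out on the question's sample data. The following is a minimal sketch that uses Python's built-in sqlite3 module in place of Oracle (an assumption made only so the example runs anywhere). The table name T1 follows the answer, and the sample rows are inserted directly instead of being selected from dual.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (customerid INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?)",
    [(1000, "ACTIVE"), (1000, "ACTIVE"), (1000, "NOT ACTIVE"),
     (2000, "ACTIVE"), (2000, "ACTIVE")],
)

# Group per customer, keep only groups with exactly one distinct status,
# then report that single status via MAX (MIN would return the same value).
rows = conn.execute(
    """
    SELECT customerid, MAX(status) AS status
    FROM T1
    GROUP BY customerid
    HAVING COUNT(DISTINCT status) = 1
    """
).fetchall()

print(rows)  # [(2000, 'ACTIVE')]
```

Customer 1000 has two distinct statuses, so its group is dropped by the HAVING clause before MAX is ever reported.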

Here's a pretty simple one using IN:
SELECT DISTINCT CustomerID, Status
FROM My_Table
WHERE CustomerID IN
(SELECT CustomerID
FROM My_Table
GROUP BY CustomerID
HAVING COUNT(Distinct Status) = 1)
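Addition: based on your comment, it seems what you really want is all the IDs that do not have a 'NOT ACTIVE' row, which is actually easier. The literal has to match the stored value exactly, since string comparison is case-sensitive in Oracle. Also note that NOT IN returns no rows at all if the subquery yields a NULL CustomerID:
SELECT Distinct CustomerID, Status
FROM My_Table
WHERE CustomerID NOT IN
(SELECT CustomerID
FROM My_Table
WHERE Status = 'NOT ACTIVE')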
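The two queries in this answer are not interchangeable, which a small experiment makes visible. The sketch below again uses sqlite3 as a stand-in for Oracle (an assumption) and adds a hypothetical customer 3000, whose rows are all 'NOT ACTIVE', to the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle here
conn.execute("CREATE TABLE My_Table (CustomerID INTEGER, Status TEXT)")
conn.executemany(
    "INSERT INTO My_Table VALUES (?, ?)",
    [(1000, "ACTIVE"), (1000, "ACTIVE"), (1000, "NOT ACTIVE"),
     (2000, "ACTIVE"), (2000, "ACTIVE"),
     (3000, "NOT ACTIVE"), (3000, "NOT ACTIVE")],  # hypothetical extra customer
)

# Query 1: customers whose rows all share one status (whatever it is).
same_status = conn.execute(
    """
    SELECT DISTINCT CustomerID, Status
    FROM My_Table
    WHERE CustomerID IN (SELECT CustomerID
                         FROM My_Table
                         GROUP BY CustomerID
                         HAVING COUNT(DISTINCT Status) = 1)
    ORDER BY CustomerID
    """
).fetchall()

# Query 2: customers that have no 'NOT ACTIVE' row at all.
no_inactive = conn.execute(
    """
    SELECT DISTINCT CustomerID, Status
    FROM My_Table
    WHERE CustomerID NOT IN (SELECT CustomerID
                             FROM My_Table
                             WHERE Status = 'NOT ACTIVE')
    ORDER BY CustomerID
    """
).fetchall()

print(same_status)  # [(2000, 'ACTIVE'), (3000, 'NOT ACTIVE')]
print(no_inactive)  # [(2000, 'ACTIVE')]
```

Which of the two is right depends on whether a customer that is consistently 'NOT ACTIVE' should appear in the result.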

This is a SQL Server answer, but I believe it should work in Oracle.
SELECT
a.AGMTNUM
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.Status = 'NOT ACTIVE' AND a.AGMTNUM = b.AGMTNUM)
AND EXISTS (SELECT 1 FROM TableB c WHERE c.Status = 'ACTIVE' AND a.AGMTNUM = c.AGMTNUM)
This will only return values that have at least one 'ACTIVE' value and no 'NOT ACTIVE' values.

Related

Using Groupby with additional filter

I have the data in Initial format:
STEP 1: To find out the users having more than 1 record and show those records. This was achieved using the below.
SELECT ID,
USER,
STATUS
FROM TABLE
WHERE USER in
(SELECT USER
FROM TABLE
GROUP BY USER
HAVING COUNT(*) > 1)
STEP 2: From the above set of records, find the records for which all the values are either 1 or 2. So the data should be something like:
Can I get some suggestions on how to achieve that? Note that status is NVARCHAR, hence aggregate functions can't be used.
The simplest thing is to check that the status is the same in your subquery. Assuming that status only takes on the values 1 and 2:
SELECT t.ID, t.USER, t.STATUS
FROM TABLE t
WHERE t.USER IN (SELECT t2.USER
FROM TABLE t2
GROUP BY t2.USER
HAVING COUNT(*) > 1 AND
MIN(t2.status) = MAX(t2.status)
);
If there are other status values and you particularly care about 1 and 2, you would use:
SELECT t.ID, t.USER, t.STATUS
FROM TABLE t
WHERE t.USER IN (SELECT t2.USER
FROM TABLE t2
GROUP BY t2.USER
HAVING COUNT(*) > 1 AND
MIN(t2.status) = MAX(t2.status) AND
MIN(t2.status) IN ('1', '2') -- status is NVARCHAR, so compare against strings
);
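MIN and MAX do work on character columns, which is what the MIN = MAX test relies on. Here is a minimal sketch using sqlite3 rather than SQL Server (an assumption), with hypothetical names records and user_name chosen to avoid the reserved words TABLE and USER, and with invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.execute("CREATE TABLE records (id INTEGER, user_name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, "ann", "1"), (2, "ann", "1"),   # two rows, same status    -> kept
     (3, "bob", "1"), (4, "bob", "2"),   # two rows, mixed statuses -> dropped
     (5, "cat", "2"),                    # only one row             -> dropped
     (6, "dan", "2"), (7, "dan", "2")],  # two rows, same status    -> kept
)

# Users with more than one record whose smallest and largest status
# coincide, i.e. every record of that user carries the same status.
rows = conn.execute(
    """
    SELECT t.id, t.user_name, t.status
    FROM records t
    WHERE t.user_name IN (SELECT t2.user_name
                          FROM records t2
                          GROUP BY t2.user_name
                          HAVING COUNT(*) > 1
                             AND MIN(t2.status) = MAX(t2.status))
    ORDER BY t.id
    """
).fetchall()

print(rows)
# [(1, 'ann', '1'), (2, 'ann', '1'), (6, 'dan', '2'), (7, 'dan', '2')]
```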
Please check if this helps
SELECT ID,
[USER],
[STATUS]
FROM TABLE
WHERE [USER] in
(SELECT [USER]
FROM TABLE
GROUP BY [USER]
HAVING COUNT([USER]) > 1 AND ((MIN(STATUS) != MAX(STATUS) AND COUNT(STATUS) > 2) OR (MIN(STATUS) = MAX(STATUS))))

Select Customer ID who hasn't purchased product X

I have a table of customer IDs and Products Purchased. A customer ID can purchase multiple products over time.
customerID, productID
In BigQuery I need to find the CustomerID for those who have not purchased product A.
I've been going around in circles trying to do self joins, inner joins, but I'm clueless.
Any help appreciated.
select customerID
from your_table
group by customerID
having sum(case when productID = 'A' then 1 else 0 end) = 0
and to check if it only contains a name
sum(case when productID contains 'XYZ' then 1 else 0 end) = 0
Below is for BigQuery Standard SQL
#standardSQL
SELECT CustomerID
FROM `project.dataset.yourTable`
GROUP BY CustomerID
HAVING COUNTIF(Product = 'A') = 0
You can test / play with it using dummy data as below
#standardSQL
WITH `project.dataset.yourTable` AS (
SELECT 1234 CustomerID, 'A' Product UNION ALL
SELECT 11234, 'A' UNION ALL
SELECT 4567, 'A' UNION ALL
SELECT 7896, 'C' UNION ALL
SELECT 5432, 'B'
)
SELECT CustomerID
FROM `project.dataset.yourTable`
GROUP BY CustomerID
HAVING COUNTIF(Product = 'A') = 0
How would I adjust this so that it matches when productID contains "xyz"?
#standardSQL
WITH `project.dataset.yourTable` AS (
SELECT 1234 CustomerID, 'Axyz' Product UNION ALL
SELECT 11234, 'A' UNION ALL
SELECT 4567, 'A' UNION ALL
SELECT 7896, 'Cxyz' UNION ALL
SELECT 5432, 'B'
)
SELECT CustomerID
FROM `project.dataset.yourTable`
GROUP BY CustomerID
HAVING COUNTIF(REGEXP_CONTAINS(Product, 'xyz')) = 0
If you have a customer table, you might want:
select c.*
from customers c
where not exists (select 1 from t where t.customer_id = c.customer_id and t.productID = 'A');
This will return customers who have made no purchases as well as those who have purchased all but product A. Of course, the definition of a customer in your data might be that the customer has made a purchase, in which case I like Juergen's solution.
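The difference between the two approaches shows up as soon as a customer has no purchases at all. The sketch below uses sqlite3 instead of BigQuery (an assumption), and the table names customers and purchases, along with the sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for BigQuery here
conn.execute("CREATE TABLE customers (customer_id INTEGER)")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, product_id TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "B"), (2, "C")],  # customer 3 bought nothing
)

# Conditional aggregation: only sees customers present in the purchases table.
purchasers_without_a = conn.execute(
    """
    SELECT customer_id
    FROM purchases
    GROUP BY customer_id
    HAVING SUM(CASE WHEN product_id = 'A' THEN 1 ELSE 0 END) = 0
    ORDER BY customer_id
    """
).fetchall()

# NOT EXISTS against the customer table: also returns customers with no purchases.
customers_without_a = conn.execute(
    """
    SELECT c.customer_id
    FROM customers c
    WHERE NOT EXISTS (SELECT 1
                      FROM purchases p
                      WHERE p.customer_id = c.customer_id
                        AND p.product_id = 'A')
    ORDER BY c.customer_id
    """
).fetchall()

print(purchasers_without_a)  # [(2,)]
print(customers_without_a)   # [(2,), (3,)]
```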

Diff between two tables (using sql) -> incremental changes

I have a need to identify differences between two tables. I have looked at sql query to return differences between two tables but it was a bit too different for me to extrapolate with my current SQL skills.
Table A is a snapshot of a certain group of people where the snapshot was taken yesterday, where each row is a unique person and certain characteristics about the person. Table B is the same snapshot taken 24 hours later. Within the 24 hour period:
New people may have been added.
People from yesterday may have been removed.
People from yesterday may have changed (i.e., original row is there, but one or more column values have changed).
My output should have the following:
a row for each new person added
a row for each person removed
a row for each person who has changed
I would be grateful for any ideas. Thanks!
This type of problem has a very simple and efficient solution that does not use joins (it doesn't even use a union of the results of two MINUS operations); it just uses one UNION ALL and a GROUP BY operation. The solution was developed in a thread on AskTom many years ago, and it is surprising that it is not more widely known and used. For example (but not only): https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:24371552251735
In your case, assuming there is a primary key constraint on PERSON_ID (which makes the solution simpler):
select max(flag) as flag, PERSON_ID, first_name, last_name, (etc. - all the columns)
from ( select 'old' as flag, t1.*
from old_table t1
union all
select 'new' as flag, t2.*
from new_table t2
)
group by PERSON_ID, first_name, last_name, (etc.)
having count(*) = 1
order by PERSON_ID -- optional
;
If for a PERSON_ID all the data is the same in both tables, that will result in a count of 2 for that group, so it won't pass the HAVING condition. The only groups that will have a count of 1 (and therefore will be just one row each!) are rows that are in one table but not the other. If a person was added, that will show only one row, with the flag 'new'. If a person was deleted, you will get only one row, with the flag 'old'. If there were updates, the same PERSON_ID will appear twice, but since at least one field is different, the two rows (one with flag 'new' and the other with 'old') will be in different groups, they will pass the HAVING filter, and they will BOTH be in the output.
Which is slightly different from what you requested; you will get both the old AND the new information for updates, labeled as 'old' and 'new'. You said you wanted only one of those but didn't state which one. This will give you both (which makes more sense anyway), but if you really only want one, it can be done easily in the query above.
Note - the outer select must have max(flag) rather than flag because flag is not a GROUP BY column; but it's the max() over exactly one row, so it WILL be the flag for that row anyway.
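The UNION ALL / GROUP BY / HAVING technique described above can be checked on a small data set. This is a minimal sketch that runs the query in sqlite3 rather than Oracle (an assumption); the two snapshot tables and their three columns are hypothetical sample data, with one changed, one removed, and one added person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle here
for name in ("old_table", "new_table"):
    conn.execute(
        f"CREATE TABLE {name} (person_id INTEGER, first_name TEXT, last_name TEXT)"
    )
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?, ?)",
    [(101, "John", "Smith"), (102, "Mary", "Green"),
     (103, "July", "Dobbs"), (104, "Will", "Scott")],
)
conn.executemany(
    "INSERT INTO new_table VALUES (?, ?, ?)",
    [(101, "Joe", "Smith"), (102, "Mary", "Green"),
     (104, "Will", "Scott"), (105, "Andy", "Brown")],
)

# Stack both snapshots, group on every data column, and keep groups of size 1:
# a row identical in both snapshots forms a group of 2 and is filtered out.
rows = conn.execute(
    """
    SELECT MAX(flag) AS flag, person_id, first_name, last_name
    FROM (SELECT 'old' AS flag, t1.* FROM old_table t1
          UNION ALL
          SELECT 'new' AS flag, t2.* FROM new_table t2)
    GROUP BY person_id, first_name, last_name
    HAVING COUNT(*) = 1
    ORDER BY person_id, MAX(flag)
    """
).fetchall()

for row in rows:
    print(row)
# ('new', 101, 'Joe', 'Smith')
# ('old', 101, 'John', 'Smith')
# ('old', 103, 'July', 'Dobbs')
# ('new', 105, 'Andy', 'Brown')
```

Person 101 appears twice because the changed row exists in both snapshots with different values, while 102 and 104 are unchanged and vanish.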
Added - OP indicated he would like to get only the "new" row for a person with updated (changed, modified) data. The approach shown below will change the flag to "changed" in this case.
with old_table ( person_id, first_name, last_name ) as (
select 101, 'John', 'Smith' from dual union all
select 102, 'Mary', 'Green' from dual union all
select 103, 'July', 'Dobbs' from dual union all
select 104, 'Will', 'Scott' from dual
),
new_table ( person_id, first_name, last_name ) as (
select 101, 'Joe' , 'Smith' from dual union all
select 102, 'Mary', 'Green' from dual union all
select 104, 'Will', 'Scott' from dual union all
select 105, 'Andy', 'Brown' from dual
)
-- end of test data; solution (SQL query) begins below this line
select case ct when 1 then flag else 'changed' end as flag,
person_id, first_name, last_name
from (
select max(flag) as flag, person_id, first_name, last_name,
count(*) over (partition by person_id) as ct,
row_number() over (partition by person_id order by max(flag)) as rn
from ( select 'old' as flag, t1.*
from old_table t1
union all
select 'new' as flag, t2.*
from new_table t2
)
group by person_id, first_name, last_name
having count(*) = 1
)
where rn = 1
order by person_id -- ORDER BY clause is optional
;
Output:
FLAG    PERSON_ID FIRST_NAME LAST_NAME
------- --------- ---------- ---------
changed       101 Joe        Smith
old           103 July       Dobbs
new           105 Andy       Brown
The first 2 parts are easy:
select 'New', name from B where not exists (select name from A where A.name=B.name)
union select 'Removed', name from A where not exists (select name from B where B.name = A.name)
The last one is where you need to compare characteristics. How many of them are there? Do you want to list what has changed or only that they have changed?
For argument's sake, let us only say that the characteristics are address and telephone #:
union select 'Phone', A.name from A, B where A.name = B.name and A.telephone != B.telephone
union select 'Address', A.name from A, B where A.name = B.name and A.address != B.address
Note: The question isn't currently tagged with the dbms. I use sql-server, so that's what I used to write the below. There may be slight differences in another dbms.
You can do something along these lines:
select *
from TableA a
right join TableB b on b.ID = a.ID -- keep every TableB row so unmatched (new) ones survive
where a.ID is null -- added since yesterday
union
select *
from TableA a
left join TableB b on b.ID = a.ID
where b.ID is null -- removed since yesterday
union
select *
from TableA a
inner join TableB b on b.ID = a.ID -- restrict to records in both tables
where a.SomeValue <> b.SomeValue
or a.SomeOtherValue <> b.SomeOtherValue
--etc
Each select handles one portion of your expected output. In this manner, they'd all be joined into 1 result set. If you drop the union, you'll end up with a separate set for each.
I suggest using EXCEPT to get the changed records. The query below should work if the database is SQL Server.
-- added since yesterday
SELECT B.*
FROM TableA A
RIGHT OUTER JOIN TableB B ON B.ID = A.ID
WHERE A.ID IS NULL
UNION
-- removed since yesterday
SELECT A.*
FROM TableA A
LEFT OUTER JOIN TableB B on B.ID = A.ID
WHERE B.ID IS NULL
UNION
-- changed since yesterday (today's values)
SELECT B.* FROM TableB B WHERE EXISTS(SELECT A.ID FROM TableA A WHERE A.ID = B.ID)
EXCEPT
SELECT A.* FROM TableA A WHERE EXISTS(SELECT B.ID FROM TableB B WHERE B.ID = A.ID)
Assuming you have a unique id for each person in the table, you can use a full outer join:
select coalesce(ty.customerid, tt.customerid) as customerid,
(case when ty.customerid is null then 'New'
when tt.customerid is null then 'Removed'
else 'Modified'
end) as status
from tyesterday ty full outer join
ttoday tt
on ty.customerid= tt.customerid
where ty.customerid is null or
tt.customerid is null or
(tt.col1 <> ty.col1 or tt.col2 <> ty.col2 or . . . ); -- may need to take `NULL`s into account
mathguy provided a successful answer to my initial problem. I asked him for a revision (to make it even better). He provided a revision, but I am getting a "missing keyword" error when executing against my code. Here is my code:
select case when ct = 1 then flag else 'changed' as flag, PERSON_ID, FIRSTNAME, LASTNAME
from (
select max(flag), PERSON_ID, FIRSTNAME, LASTNAME
count() over (partition by PERSON_ID) as ct,
row_number() over (partition by PERSON_ID
order by case when flag = 'new' then 0 end) as rn
from ( select 'old' as flag, t1.*
from YESTERDAY_TABLE t1
union all
select 'new' as flag, t2.*
from TODAY_TABLE t2
)
group by PERSON_ID, FIRSTNAME, LASTNAME
having count(*) = 1
)
where rn = 1
order by PERSON_ID;

apply order by after combining results from two tables?

I have two tables with the same columns. I am using Oracle 10g.
TableA
------
id status
---------------
1 W
2 R
TableB
------
id status
---------------
1 W
3 S
I get results from both tables using UNION, as below.
select id, status
from TableA
union
select id, status
from TableB
order by status;
If I do that, is the ORDER BY applied to both queries?
My requirement is that the results are combined first and the ORDER BY is applied afterwards.
How can I do that?
Thanks!
Given the data you've shown, your query will return this:
ID STATUS
-- ------
2 R
3 S
1 W
That's because UNION will return only unique rows and the (1, 'W') row has a duplicate.
If you want to include all rows, even duplicates, use UNION ALL instead of UNION:
select id, status
from TableA
union all
select id, status
from TableB
order by status;
With UNION ALL your query will return this:
ID STATUS
-- ------
2 R
3 S
1 W
1 W
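Both behaviours are easy to confirm: the ORDER BY at the end of a compound query sorts the combined result, and UNION removes the duplicate row while UNION ALL keeps it. A minimal sketch with the question's data, run in sqlite3 rather than Oracle 10g (an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle here
conn.execute("CREATE TABLE TableA (id INTEGER, status TEXT)")
conn.execute("CREATE TABLE TableB (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO TableA VALUES (?, ?)", [(1, "W"), (2, "R")])
conn.executemany("INSERT INTO TableB VALUES (?, ?)", [(1, "W"), (3, "S")])

# The trailing ORDER BY applies to the whole combined result set.
union_rows = conn.execute(
    """
    SELECT id, status FROM TableA
    UNION
    SELECT id, status FROM TableB
    ORDER BY status
    """
).fetchall()

union_all_rows = conn.execute(
    """
    SELECT id, status FROM TableA
    UNION ALL
    SELECT id, status FROM TableB
    ORDER BY status
    """
).fetchall()

print(union_rows)      # [(2, 'R'), (3, 'S'), (1, 'W')]
print(union_all_rows)  # [(2, 'R'), (3, 'S'), (1, 'W'), (1, 'W')]
```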
Try the following query
select id, status
from (select id, status from TableA
union select id, status from TableB)
order by status
select id, status
from TableA
union
select id, status
from TableB
order by 2; -- status
I think you need Distinct :
Select Distinct id, status
from (
select id, status from TableA
union
select id, status from TableB)
order by status

TSQL: Return row(s) with earliest dates

Given 2 tables called "table1" and "table1_hist" that structurally resemble this:
TABLE1
id status date_this_status
1 open 2008-12-12
2 closed 2009-01-01
3 pending 2009-05-05
4 pending 2009-05-06
5 open 2009-06-01
TABLE1_hist
id status date_this_status
2 open 2008-12-24
2 pending 2008-12-26
3 open 2009-04-24
4 open 2009-05-04
With table1 being the current status and table1_hist being a history table of table1, how can I return the row for each id that has the earliest date? In other words, for each id, I need to know its earliest status and date.
EXAMPLE:
For id 1 earliest status and date is open and 2008-12-12.
For id 2 earliest status and date is open and 2008-12-24.
I've tried using MIN(datetime), unions, dynamic SQL, etc. I've just reached tsql writers block today and I'm stuck.
Edited to add: Ugh. This is for a SQL2000 database, so Alex Martelli's answer won't work. ROW_NUMBER wasn't introduced until SQL2005.
SQL Server 2005 and later support an interesting (relatively recent) aspect of SQL Standards, "ranking/windowing functions", allowing:
WITH AllRows AS (
SELECT id, status, date_this_status,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY date_this_status ASC) AS row
FROM (SELECT * FROM Table1 UNION SELECT * FROM Table1_hist) Both_tables
)
SELECT id, status, date_this_status
FROM AllRows
WHERE row = 1
ORDER BY id;
where I'm also using the nice (and equally "new") WITH syntax to avoid nesting the sub-query in the main SELECT.
This article shows how one could hack the equivalent of ROW_NUMBER (and also RANK and DENSE_RANK, the other two "new" ranking/windowing functions) in SQL Server 2000 -- but that's not necessarily pretty nor especially well-performing, alas.
The following code sample is completely self-sufficient, just copy and paste it into a management studio query and hit F5 =)
DECLARE @TABLE1 TABLE
(
id INT,
status VARCHAR(50),
date_this_status DATETIME
)
DECLARE @TABLE1_hist TABLE
(
id INT,
status VARCHAR(50),
date_this_status DATETIME
)
--TABLE1
INSERT @TABLE1
SELECT 1, 'open', '2008-12-12' UNION ALL
SELECT 2, 'closed', '2009-01-01' UNION ALL
SELECT 3, 'pending', '2009-05-05' UNION ALL
SELECT 4, 'pending', '2009-05-06' UNION ALL
SELECT 5, 'open', '2009-06-01'
--TABLE1_hist
INSERT @TABLE1_hist
SELECT 2, 'open', '2008-12-24' UNION ALL
SELECT 2, 'pending', '2008-12-26' UNION ALL
SELECT 3, 'open', '2009-04-24' UNION ALL
SELECT 4, 'open', '2009-05-04'
-- For each id, prefer the earliest history row; fall back to the current row
SELECT x.id,
ISNULL(y.[status], x.[status]) AS [status],
ISNULL(y.date_this_status, x.date_this_status) AS date_this_status
FROM @TABLE1 x
LEFT JOIN (
SELECT a.*
FROM @TABLE1_hist a
INNER JOIN (
SELECT id,
MIN(date_this_status) AS date_this_status
FROM @TABLE1_hist
GROUP BY id
) b
ON a.id = b.id
AND a.date_this_status = b.date_this_status
) y
ON x.id = y.id
SELECT id,
status,
date_this_status
FROM ( SELECT *
FROM Table1
UNION
SELECT *
from TABLE1_hist
) a
WHERE date_this_status = ( SELECT MIN(date_this_status)
FROM ( SELECT *
FROM Table1
UNION
SELECT *
from TABLE1_hist
) t
WHERE id = a.id
)
This is a bit ugly, but seems to work in MS SQL Server 2005.
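The correlated-MIN approach over the union of both tables can be tried on the question's data. The sketch below runs it in sqlite3 rather than SQL Server (an assumption), stores the dates as ISO-8601 text so that string comparison matches date order, and uses a hypothetical CTE named all_rows to avoid writing the union twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
for name in ("Table1", "Table1_hist"):
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER, status TEXT, date_this_status TEXT)"
    )
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, "open", "2008-12-12"), (2, "closed", "2009-01-01"),
     (3, "pending", "2009-05-05"), (4, "pending", "2009-05-06"),
     (5, "open", "2009-06-01")],
)
conn.executemany(
    "INSERT INTO Table1_hist VALUES (?, ?, ?)",
    [(2, "open", "2008-12-24"), (2, "pending", "2008-12-26"),
     (3, "open", "2009-04-24"), (4, "open", "2009-05-04")],
)

# Combine current and history rows, then keep each id's row whose date
# equals the minimum date found for that id across both tables.
rows = conn.execute(
    """
    WITH all_rows AS (
        SELECT * FROM Table1
        UNION
        SELECT * FROM Table1_hist
    )
    SELECT a.id, a.status, a.date_this_status
    FROM all_rows a
    WHERE a.date_this_status = (SELECT MIN(t.date_this_status)
                                FROM all_rows t
                                WHERE t.id = a.id)
    ORDER BY a.id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'open', '2008-12-12')
# (2, 'open', '2008-12-24')
# (3, 'open', '2009-04-24')
# (4, 'open', '2009-05-04')
# (5, 'open', '2009-06-01')
```

If two rows for the same id shared the earliest date, both would be returned; the sample data has no such tie.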
You can do this with an exclusive self join. Join on the history table, and then another time on all earlier history entries. In the where statement, you specify that there are not allowed to be any earlier entries.
select t1.id,
isnull(hist.status, t1.status),
isnull(hist.date_this_status, t1.date_this_status)
from table1 t1
left join (
select h1.id, h1.status, h1.date_this_status
from table1_hist h1
left join table1_hist h2
on h2.id = h1.id
and h2.date_this_status < h1.date_this_status
where h2.date_this_status is null
) hist on hist.id = t1.id
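A bit of a mind-bender, but fairly flexible and efficient!
This assumes there are no two history entries for the same id with the exact same date. If there are, extend the self join with a tie-breaker, as below. Note that the tie-breaker has to compare a column that is unique per history row: h2.id = h1.id already holds in this join, so comparing id alone cannot separate ties.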
left join table1_hist h2
on h2.id = h1.id
and (
h2.date_this_status < h1.date_this_status
or (h2.date_this_status = h1.date_this_status and h2.id < h1.id)
)
If I understand the OP correctly, a given ID may appear in TABLE1 or TABLE1_hist or both.
In your result set, you want back each distinct ID and the oldest status/date associated with that ID, regardless of which table the oldest one happens to be in.
So, look in BOTH tables and return any record where there is no record in either table for its ID that has a smaller date_this_status.
Try this:
SELECT ID, status, date_this_status FROM table1 ta WHERE
NOT EXISTS(SELECT null FROM table1 tb WHERE
tb.id = ta.id
AND tb.date_this_status < ta.date_this_status)
AND NOT EXISTS(SELECT null FROM table1_hist tbh WHERE
tbh.id = ta.id
AND tbh.date_this_status < ta.date_this_status)
UNION ALL
SELECT ID, status, date_this_status FROM table1_hist tah WHERE
NOT EXISTS(SELECT null FROM table1 tb WHERE
tb.id = tah.id
AND tb.date_this_status < tah.date_this_status)
AND NOT EXISTS(SELECT null FROM table1_hist tbh WHERE
tbh.id = tah.id
AND tbh.date_this_status < tah.date_this_status)
Three underlying assumptions here:
Every ID you want back will have at least one record in at least one of the tables.
There won't be multiple records for the same ID in the same table with the same date_this_status value (can be mitigated by using DISTINCT)
There won't be records for the same ID in the other table with the same date_this_status value (can be mitigated by using UNION instead of UNION ALL)
There are two slight optimizations we can make, provided the following hold:
If an ID has a record in TABLE1_hist, it will always be older than the record in TABLE1 for that ID.
TABLE1 will never contain multiple records for the same ID (but the history table may).
So:
SELECT ID, status, date_this_status FROM table1 ta WHERE
NOT EXISTS(SELECT null FROM table1_hist tbh WHERE
tbh.id = ta.id
)
UNION ALL
SELECT ID, status, date_this_status FROM table1_hist tah WHERE
NOT EXISTS(SELECT null FROM table1_hist tbh WHERE
tbh.id = tah.id
AND tbh.date_this_status < tah.date_this_status)
If that is the actual structure of your tables, you can't get a 100% accurate answer. The issue is that you can have two different statuses for the same (earliest) date for any given record, and you would not know which one was entered first, because you don't have a primary key on the history table.
Ignoring the "two tables" issues for a moment, I'd use the following logic...
SELECT
id, status, date_this_status
FROM
Table1_hist AS [data]
WHERE
[data].date_this_status = (SELECT MIN(date_this_status) FROM Table1_hist WHERE id = [data].id)
(EDIT: As per BlackTigerX's comment, this assumes no id can have more than one status with the same datetime.)
The simple way to extrapolate this to two tables is to use breitak67's answer. Replace all instances of "my_table" with subqueries that UNION the two tables together. A potential issue here is that of performance, as you may find that indexes become unusable.
One method of speeding this up could be to use implied knowledge:
1. The main table always has a record for each id.
2. The history table doesn't always have a record.
3. Any record in the history table is always 'older' than the one in main table.
SELECT
[main].id,
ISNULL([hist].status, [main].status) AS status,
ISNULL([hist].date_this_status, [main].date_this_status) AS date_this_status
FROM
Table1 AS [main]
LEFT JOIN
(
SELECT
id, status, date_this_status
FROM
Table1_hist AS [data]
WHERE
[data].date_this_status = (SELECT MIN(date_this_status) FROM Table1_hist WHERE id = [data].id)
)
AS [hist]
ON [hist].id = [main].id
Find the oldest status for each id in the history table. (Can use its indexes)
LEFT JOIN that to the main table (which always has exactly one record for each id)
If [hist] contains a value, it's the older by definition
If the [hist] doesn't have a value, use the [main] value
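The four steps above can be sketched end to end on the question's data. This version runs in sqlite3 rather than SQL Server (an assumption), so COALESCE replaces ISNULL, the aliases cur, h, and h2 are hypothetical, and dates are stored as ISO-8601 text so they compare in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
for name in ("Table1", "Table1_hist"):
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER, status TEXT, date_this_status TEXT)"
    )
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, "open", "2008-12-12"), (2, "closed", "2009-01-01"),
     (3, "pending", "2009-05-05"), (4, "pending", "2009-05-06"),
     (5, "open", "2009-06-01")],
)
conn.executemany(
    "INSERT INTO Table1_hist VALUES (?, ?, ?)",
    [(2, "open", "2008-12-24"), (2, "pending", "2008-12-26"),
     (3, "open", "2009-04-24"), (4, "open", "2009-05-04")],
)

# 1) Find the oldest history row per id.
# 2) LEFT JOIN it to the main table (one row per id).
# 3) and 4) Prefer the history value; fall back to the main row when none exists.
rows = conn.execute(
    """
    SELECT cur.id,
           COALESCE(hist.status, cur.status) AS status,
           COALESCE(hist.date_this_status, cur.date_this_status) AS date_this_status
    FROM Table1 AS cur
    LEFT JOIN (
        SELECT h.id, h.status, h.date_this_status
        FROM Table1_hist AS h
        WHERE h.date_this_status = (SELECT MIN(h2.date_this_status)
                                    FROM Table1_hist AS h2
                                    WHERE h2.id = h.id)
    ) AS hist ON hist.id = cur.id
    ORDER BY cur.id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'open', '2008-12-12')
# (2, 'open', '2008-12-24')
# (3, 'open', '2009-04-24')
# (4, 'open', '2009-05-04')
# (5, 'open', '2009-06-01')
```

The result is only correct under the three stated assumptions, in particular that every history row is older than the current row for the same id.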