Filter out identical rows of data based on multiple criteria to only show outliers


I'm creating a stored procedure that should return a particular result set.
I have two data sets that I'm trying to join so that only the outlier records (rows with no match in the other set) are returned. I've thought about using UNION and EXCEPT, but they don't seem to work in this scenario. To make this less complicated, I currently have two CTEs in my procedure (alternatively, I can use #TempTables).
Query result 1: this query returns three fields. The Field3 text value is always the same here.
Field1 Field2 Field3
123 BAK 'Missing in Query 2'
234 HAO 'Missing in Query 2'
345 OPP 'Missing in Query 2'
Query result 2: same deal here; Field3 always has the same text value.
Field1 Field2 Field3
123 BAK 'Missing in Query 1'
234 HAO 'Missing in Query 1'
678 UTO 'Missing in Query 1'
DESIRED RESULT: These two rows are returned because the first row (Field1 = 345) is missing in Query 2 and the second row (Field1 = 678) is missing in Query 1. A row counts as matched only when Query1.Field1 = Query2.Field1 and Query1.Field2 = Query2.Field2.
Field1 Field2 Field3
345 OPP 'Missing in Query 2' <- from Query 1
678 UTO 'Missing in Query 1' <- from Query 2
I've tried to use a FULL JOIN for this, but it returns three additional columns filled with NULL values. I'm trying to avoid that and display the result exactly as shown above. Any help would be appreciated.

I think you want the rows that are not in both tables. One method is:
select qr1.*
from qr1
where not exists (select 1 from qr2 where qr2.field1 = qr1.field1 and qr2.field2 = qr1.field2)
union all
select qr2.*
from qr2
where not exists (select 1 from qr1 where qr1.field1 = qr2.field1 and qr1.field2 = qr2.field2);
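A minimal sketch of the NOT EXISTS / UNION ALL approach, run against an in-memory SQLite database with the sample rows from the question. The tables `qr1` and `qr2` are assumptions standing in for the two CTEs; in SQL Server the query text would be the same.

```python
import sqlite3

# Hypothetical tables qr1 and qr2 stand in for the two CTEs from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE qr1 (field1 INTEGER, field2 TEXT, field3 TEXT);
    CREATE TABLE qr2 (field1 INTEGER, field2 TEXT, field3 TEXT);
    INSERT INTO qr1 VALUES (123, 'BAK', 'Missing in Query 2'),
                           (234, 'HAO', 'Missing in Query 2'),
                           (345, 'OPP', 'Missing in Query 2');
    INSERT INTO qr2 VALUES (123, 'BAK', 'Missing in Query 1'),
                           (234, 'HAO', 'Missing in Query 1'),
                           (678, 'UTO', 'Missing in Query 1');
""")

# Rows of qr1 with no (field1, field2) match in qr2, followed by the reverse.
outliers = conn.execute("""
    SELECT qr1.* FROM qr1
    WHERE NOT EXISTS (SELECT 1 FROM qr2
                      WHERE qr2.field1 = qr1.field1 AND qr2.field2 = qr1.field2)
    UNION ALL
    SELECT qr2.* FROM qr2
    WHERE NOT EXISTS (SELECT 1 FROM qr1
                      WHERE qr1.field1 = qr2.field1 AND qr1.field2 = qr2.field2)
""").fetchall()

for row in outliers:
    print(row)
```

Because each half keeps its own `field3`, the "Missing in Query N" label comes straight from the source table and no extra NULL columns appear.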

You can also use a FULL OUTER JOIN; coalescing the key columns keeps the output to three columns:
select coalesce(qr1.field1, qr2.field1) as field1,
coalesce(qr1.field2, qr2.field2) as field2,
(case when qr1.field1 is null
then 'Missing in Query 1'
else 'Missing in Query 2'
end) as Field3
from qr1 full outer join
qr2
on qr1.field1 = qr2.field1 and qr1.field2 = qr2.field2
where (qr1.field1 is null or qr2.field1 is null);
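The FULL OUTER JOIN version can be tried the same way. SQLite only added FULL OUTER JOIN in version 3.39.0, so this sketch runs the answer's query when the bundled SQLite supports it and otherwise swaps in an equivalent pair of LEFT JOINs. The table names and sample data are assumptions mirroring the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE qr1 (field1 INTEGER, field2 TEXT, field3 TEXT);
    CREATE TABLE qr2 (field1 INTEGER, field2 TEXT, field3 TEXT);
    INSERT INTO qr1 VALUES (123, 'BAK', 'x'), (234, 'HAO', 'x'), (345, 'OPP', 'x');
    INSERT INTO qr2 VALUES (123, 'BAK', 'y'), (234, 'HAO', 'y'), (678, 'UTO', 'y');
""")

# The answer's query: COALESCE merges the key columns from whichever side matched.
FULL_JOIN_SQL = """
    SELECT COALESCE(qr1.field1, qr2.field1) AS field1,
           COALESCE(qr1.field2, qr2.field2) AS field2,
           CASE WHEN qr1.field1 IS NULL THEN 'Missing in Query 1'
                ELSE 'Missing in Query 2' END AS field3
    FROM qr1 FULL OUTER JOIN qr2
      ON qr1.field1 = qr2.field1 AND qr1.field2 = qr2.field2
    WHERE qr1.field1 IS NULL OR qr2.field1 IS NULL
"""

# Fallback for SQLite builds older than 3.39.0: each LEFT JOIN keeps the
# unmatched rows from one side, which is what the FULL JOIN filter selects.
EMULATED_SQL = """
    SELECT qr1.field1, qr1.field2, 'Missing in Query 2' AS field3
    FROM qr1 LEFT JOIN qr2
      ON qr1.field1 = qr2.field1 AND qr1.field2 = qr2.field2
    WHERE qr2.field1 IS NULL
    UNION ALL
    SELECT qr2.field1, qr2.field2, 'Missing in Query 1'
    FROM qr2 LEFT JOIN qr1
      ON qr1.field1 = qr2.field1 AND qr1.field2 = qr2.field2
    WHERE qr1.field1 IS NULL
"""

sql = FULL_JOIN_SQL if sqlite3.sqlite_version_info >= (3, 39, 0) else EMULATED_SQL
outliers = sorted(conn.execute(sql).fetchall())
print(outliers)
```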

Related

SQL: select 3 values and count how many times they are different

Say that you need to query some data and that there are three fields like the following (this is part of a larger query):
Field1, Field2, Field3.
So you select them like this:
SELECT Field1=MyTable.Field1, Field2=MyTable.Field2, Field3=MyTable.Field3
FROM MyTable
I need to compare these values and return the variable Result that is computed as follows:
0 if they are all the same
1/2 if two are the same and one is different
1 if they are all different.
How should I restructure my query? I think I need a subquery but I am not sure how to structure it.

You can use a CASE expression:
select (case when field1 = field2 and field1 = field3 then 0
when field1 in (field2, field3) or field2 = field3 then 0.5
else 1
end) as result
from MyTable;
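A quick check of the CASE logic against a few hand-made rows, using SQLite in memory. The `MyTable` contents and the `id` column used for ordering are assumptions for illustration, and the sketch assumes the three fields are never NULL (a NULL would make the equality tests unknown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, field1 INTEGER, field2 INTEGER, field3 INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", [
    (1, 7, 7, 7),  # all the same
    (2, 7, 7, 9),  # field1 = field2, field3 differs
    (3, 1, 2, 3),  # all different
    (4, 5, 4, 4),  # field2 = field3, field1 differs
])

results = [r[0] for r in conn.execute("""
    SELECT CASE WHEN field1 = field2 AND field1 = field3 THEN 0
                WHEN field1 IN (field2, field3) OR field2 = field3 THEN 0.5
                ELSE 1
           END AS result
    FROM MyTable
    ORDER BY id
""")]
print(results)
```

The second WHEN covers all three "two equal, one different" pairings: `field1 IN (field2, field3)` handles pairs involving field1, and `field2 = field3` handles the remaining one.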

SQL Server 2014 - SQL Case statement on columns

This is the table I have:
For every unique TID there are two records. For each TID, I want the names of the fields that are populated in both records. For example, for T01, Field2 and Field4 are populated in both records.
My current approach is to create a column of comma-separated field names:
INSERT INTO TEMP
SELECT TID,
(CASE WHEN COUNT(IIF(Field1 IS NOT NULL,1,NULL)) = 2 THEN 'FIELD1' ELSE 'NO' END) + ',' +
(CASE WHEN COUNT(IIF(Field2 IS NOT NULL,1,NULL)) = 2 THEN 'FIELD2' ELSE 'NO' END) + ',' +
(CASE WHEN COUNT(IIF(Field3 IS NOT NULL,1,NULL)) = 2 THEN 'FIELD3' ELSE 'NO' END) + ',' +
(CASE WHEN COUNT(IIF(Field4 IS NOT NULL,1,NULL)) = 2 THEN 'FIELD4' ELSE 'NO' END) AS ATTR
FROM ORIGINAL_TABLE
GROUP BY TID;
I then split the comma-separated column into multiple records:
SELECT *, S.ITEMS as ATTRIBUTES
FROM TEMP
CROSS APPLY DBO.SPLIT(ATTR, ',') S
WHERE S.ITEMS NOT LIKE '%NO%'
For T101, the command above gives me this output:
This does give me the fields that satisfy the condition for every unique TID, but I want it to be more specific. I run this on very large data with over 100 columns, so this approach is slow.
Is there a way to display just the fields that satisfy the condition, together with their values for both records of T101?
Edit: Apologies. It should be Field2 instead of Field1 in the table.
I am fairly new to SQL, any help would be much appreciated!

Your question is rather complicated, and I'm not 100% sure what you really want. But based on:
For a unique TID if both records in a field is populated I want the name of the field.
You can unpivot and aggregate. Assuming that your columns all have a similar data type, you can use:
SELECT t.tId, v.fieldname
FROM ORIGINAL_TABLE t CROSS APPLY
(VALUES ('Field1', Field1),
('Field2', Field2),
('Field3', Field3),
('Field4', Field4)
) v(fieldname, val)
GROUP BY t.tID, v.fieldname
HAVING COUNT(*) = COUNT(v.val) -- all populated
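SQLite has no CROSS APPLY (that is a SQL Server feature), so this sketch swaps in a UNION ALL unpivot that produces the same (tid, fieldname, val) rows before the identical GROUP BY / HAVING step. The table contents are invented for illustration: T01 has Field2 and Field4 populated in both records, T02 has Field1 and Field3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (tid TEXT, field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT)")
conn.executemany("INSERT INTO original_table VALUES (?, ?, ?, ?, ?)", [
    ("T01", None, "a", None, "x"),
    ("T01", "b", "c", None, "y"),
    ("T02", "p", None, "q", None),
    ("T02", "r", None, "s", "t"),
])

# Unpivot each column into (tid, fieldname, val), then keep only the
# fields where no record for that TID has a NULL value.
rows = conn.execute("""
    SELECT tid, fieldname
    FROM (
        SELECT tid, 'Field1' AS fieldname, field1 AS val FROM original_table
        UNION ALL SELECT tid, 'Field2', field2 FROM original_table
        UNION ALL SELECT tid, 'Field3', field3 FROM original_table
        UNION ALL SELECT tid, 'Field4', field4 FROM original_table
    )
    GROUP BY tid, fieldname
    HAVING COUNT(*) = COUNT(val)  -- COUNT(val) skips NULLs
    ORDER BY tid, fieldname
""").fetchall()
print(rows)
```

With 100+ columns the unpivot list gets long either way, but it is a single pass with one GROUP BY instead of building and re-splitting a string per row.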

SQL: where clause with priority

I want to write a SELECT query with a WHERE clause that has two conditions.
select distinct user_name from table1 where column1 = 'AAAA' or is null
If records with both AAAA and NULL are found, I want to give priority to AAAA.
How can I do that?

Consider the NULL entries only if the corresponding AAAA row does not exist:
select distinct user_name
from table1 as t_outer
where column1 = 'AAAA'
or (column1 is null and not exists (select * from table1 as t_inner where t_inner.user_name = t_outer.user_name and t_inner.column1 = 'AAAA'))
A variation using EXCEPT would be closer to relational-algebra-style thinking, but it would probably fail precisely because you're working with NULL here.
EDIT
(Keep this answer in mind only for cases where you need more attributes from the row than just user_name, and those attributes must come from exactly the row retained for the result set.)
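A small SQLite sketch of the per-user priority rule. The extra `other_col` column is a hypothetical attribute added to show why it matters which row survives: for alice the AAAA row wins over her NULL row, while bob, who has only a NULL row, is still returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_name TEXT, column1 TEXT, other_col TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("alice", "AAAA", "x1"),
    ("alice", None, "x2"),
    ("bob", None, "y1"),
    ("carol", "BBBB", "z1"),
])

# A NULL row is kept only when the same user has no 'AAAA' row.
rows = conn.execute("""
    SELECT t_outer.user_name, t_outer.column1, t_outer.other_col
    FROM table1 AS t_outer
    WHERE t_outer.column1 = 'AAAA'
       OR (t_outer.column1 IS NULL
           AND NOT EXISTS (SELECT 1 FROM table1 AS t_inner
                           WHERE t_inner.user_name = t_outer.user_name
                             AND t_inner.column1 = 'AAAA'))
    ORDER BY t_outer.user_name
""").fetchall()
print(rows)
```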

It's just
select distinct user_name from table1 where column1 = 'AAAA' or column1 is null

What do you mean by "if records with both AAAA and null are found, give priority to AAAA"?
One interpretation is to return the AAAA records and, only if none exist, the NULL ones:
select distinct user_name
from table1
where column1 = 'AAAA'
union all
select distinct user_name
from table1
where column1 is null and not exists (select 1 from table1 where column1 = 'AAAA');
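This interpretation is a table-wide fallback rather than a per-user one. The sketch below, a minimal illustration with invented rows and a hypothetical helper `priority_users`, runs the query on two small tables: one containing an AAAA row and one without any.

```python
import sqlite3

QUERY = """
    SELECT DISTINCT user_name FROM table1 WHERE column1 = 'AAAA'
    UNION ALL
    SELECT DISTINCT user_name FROM table1
    WHERE column1 IS NULL
      AND NOT EXISTS (SELECT 1 FROM table1 WHERE column1 = 'AAAA')
"""

def priority_users(rows):
    """Run the fallback query against a fresh in-memory table1 holding `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (user_name TEXT, column1 TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    return sorted(name for (name,) in conn.execute(QUERY))

# At least one 'AAAA' row exists, so the NULL rows are ignored entirely.
with_aaaa = priority_users([("alice", "AAAA"), ("bob", None)])
# No 'AAAA' rows at all, so the NULL rows are returned instead.
without_aaaa = priority_users([("bob", None), ("carol", None), ("dave", "BBBB")])
print(with_aaaa, without_aaaa)
```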

What is the fastest/easiest way to tell if 2 records in the same SQL table are different?

I want to be able to compare 2 records in the same SQL table and tell if they are different. I do not need to tell what is different, just that they are different.
Also, I only need to compare 7 of the 10 columns in the records, i.e., each record has 10 columns but I only care about 7 of them.
Can this be done in SQL, or should I load the records in C# and hash them to see whether they differ?

You can write a group by query like this:
SELECT field1, field2, field3, .... field7, COUNT(*)
FROM table
[WHERE primary_key = key1 OR primary_key = key2]
GROUP BY field1, field2, field3, .... field7
HAVING COUNT(*) > 1
That way you get all records with same values for field 1 to 7, along with the number of occurrences.
Add the part between brackets to limit your search for duplicates, either with OR, or with IN (...).
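Applied to exactly two records, the GROUP BY approach says "same" when both rows land in one group. In this SQLite sketch the table `records`, its `id` key and the three compared columns `a`, `b`, `c` are assumptions standing in for the 7 real columns; `note` plays the role of a column that is ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, note TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)", [
    (1, "x", "y", "z", "first"),
    (2, "x", "y", "z", "second"),  # same a/b/c as id 1, different note
    (3, "x", "q", "z", "third"),
])

def same_values(key1, key2):
    """True when both rows fall into one GROUP BY group, i.e. a, b and c all match."""
    groups = conn.execute("""
        SELECT a, b, c, COUNT(*)
        FROM records
        WHERE id IN (?, ?)
        GROUP BY a, b, c
        HAVING COUNT(*) > 1
    """, (key1, key2)).fetchall()
    return len(groups) > 0

print(same_values(1, 2), same_values(1, 3))
```

GROUP BY puts NULLs into the same group, so two rows that are both NULL in a compared column still count as equal here.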

IF EXISTS (SELECT Col1, Col2, ColEtc...
from MyTable
where condition1
EXCEPT SELECT Col1, Col2, ColEtc...
from MyTable
where condition2)
BEGIN
-- Query returns all rows from first set that are not column for column
-- also in the second (EXCEPT) set. So if there are any, there will be
-- rows returned, which meets the EXISTS criteria. Since you're only
-- checking EXISTS, SQL doesn't actually need to return columns.
END
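The same EXISTS ... EXCEPT test can be evaluated directly as a value in SQLite, which makes it easy to try. The table and column names are assumptions. Note that EXCEPT treats two NULLs as matching, so rows 1 and 2 below count as identical. With one row on each side, a single direction is enough; for larger sets you would check both directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", [
    (1, "x", None, "z"),
    (2, "x", None, "z"),  # identical to id 1, including the NULL
    (3, "x", "y", "z"),
])

def differs(id1, id2):
    """True if row id1 has a column combination that row id2 does not."""
    (flag,) = conn.execute("""
        SELECT EXISTS (
            SELECT col1, col2, col3 FROM MyTable WHERE id = ?
            EXCEPT
            SELECT col1, col2, col3 FROM MyTable WHERE id = ?
        )
    """, (id1, id2)).fetchone()
    return bool(flag)

print(differs(1, 2), differs(1, 3))
```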

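No hash is necessary. A normal equality comparison is enough:
select isEqual = case when t1.a <> t2.a or t1.b <> t2.b /* or ... remaining columns */ then 0 else 1 end
from MyTable t1, MyTable t2 -- MyTable, id, @id1 and @id2 are placeholders
where t1.id = @id1 and t2.id = @id2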
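A sketch of the plain `<>` comparison in SQLite, with 1 meaning the rows match. The table, the `id` key and the two columns are assumptions. The last check shows a caveat: `<>` is not NULL-safe, so comparing 'y' with NULL yields unknown rather than true, and the CASE falls through to "equal".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, "x", "y"),
    (2, "x", "y"),
    (3, "x", "q"),
    (4, "x", None),
])

def is_equal(id1, id2):
    """1 from the CASE means no compared column differs (per the <> test)."""
    (flag,) = conn.execute("""
        SELECT CASE WHEN t1.a <> t2.a OR t1.b <> t2.b THEN 0 ELSE 1 END
        FROM MyTable AS t1 CROSS JOIN MyTable AS t2
        WHERE t1.id = ? AND t2.id = ?
    """, (id1, id2)).fetchone()
    return flag == 1

print(is_equal(1, 2), is_equal(1, 3), is_equal(1, 4))
```

If NULLs can occur, each column test needs explicit NULL handling (for example an extra `IS NULL` check on each side).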

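If your database supports row-value comparisons (PostgreSQL and MySQL do; SQL Server does not), you can compare all 7 columns at once:
SELECT
CASE WHEN (a.column1, a.column2, ..., a.column7)
= (b.column1, b.column2, ..., b.column7)
THEN 'all 7 columns same'
ELSE 'one or more of the 7 columns differ'
END AS result
FROM tableX AS a
JOIN tableX AS b
ON a.PK = #PK_of_row_one
AND b.PK = #PK_of_row_two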
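SQLite has supported row values since 3.15.0, so the row-constructor comparison can be tried there. This sketch shortens the column list to three hypothetical columns and binds the two primary keys as parameters in place of the `#PK` placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (pk INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", [
    (1, "x", "y", "z"),
    (2, "x", "y", "z"),
    (3, "x", "y", "w"),
])

def compare(pk1, pk2):
    """Compare the selected columns of two rows with a single row-value equality."""
    (result,) = conn.execute("""
        SELECT CASE WHEN (a.column1, a.column2, a.column3)
                       = (b.column1, b.column2, b.column3)
                    THEN 'all columns same'
                    ELSE 'one or more columns differ'
               END AS result
        FROM tableX AS a
        JOIN tableX AS b
          ON a.pk = ? AND b.pk = ?
    """, (pk1, pk2)).fetchone()
    return result

print(compare(1, 2))
print(compare(1, 3))
```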

Can't you just use the DISTINCT keyword? Duplicates are collapsed, so each row you receive is unique (and different from the others).
http://www.mysqlfaqs.net/mysql-faqs/SQL-Statements/Select-Statement/How-does-DISTINCT-work-in-MySQL
So you could write this query:
SELECT DISTINCT x,y,z FROM RandomTable WHERE x = something
This returns only one row for each unique x, y, z combination.
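One way to apply DISTINCT to the two-record question: select the compared columns for both records and count what survives. One row means the records match; two rows means they differ. The `id` column and sample data are assumptions, and DISTINCT, like GROUP BY, treats NULLs as equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RandomTable (id INTEGER PRIMARY KEY, x TEXT, y TEXT, z TEXT)")
conn.executemany("INSERT INTO RandomTable VALUES (?, ?, ?, ?)", [
    (1, "a", "b", "c"),
    (2, "a", "b", "c"),
    (3, "a", "b", "d"),
])

def identical(id1, id2):
    """The records match on x, y, z when DISTINCT collapses them to one row."""
    rows = conn.execute(
        "SELECT DISTINCT x, y, z FROM RandomTable WHERE id IN (?, ?)", (id1, id2)
    ).fetchall()
    return len(rows) == 1

print(identical(1, 2), identical(1, 3))
```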

Access query with no returned results

I have a query in Access and would like to know whether it is possible to use a WHERE NOT EXISTS clause to display specific text for each field when no rows are returned.
Example query:
Select Field1, Field2, Field3
From TableA
Where Field1 = "test";
If there are no returned results I would like the following to return:
Field1 = "test"
Field2 = "not provided"
Field3 = "not provided"

How about:
SELECT Field1, Field2
FROM Table
WHERE ID=3
UNION ALL SELECT DISTINCT "None","None" FROM AnyTableWithAtLeastOneRow
WHERE 3 NOT IN (SELECT ID FROM Table)
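The UNION ALL fallback can be tried in SQLite. Here `tbl` and `one_row` are hypothetical tables, the latter standing in for AnyTableWithAtLeastOneRow. It must not be empty, otherwise the fallback SELECT returns nothing. SQLite string literals use single quotes, and the ID is bound as a parameter instead of the literal 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER, field1 TEXT, field2 TEXT);
    INSERT INTO tbl VALUES (1, 'a', 'b'), (3, 'c', 'd');
    -- stands in for AnyTableWithAtLeastOneRow
    CREATE TABLE one_row (dummy INTEGER);
    INSERT INTO one_row VALUES (0);
""")

QUERY = """
    SELECT field1, field2 FROM tbl WHERE id = ?
    UNION ALL
    SELECT DISTINCT 'None', 'None' FROM one_row
    WHERE ? NOT IN (SELECT id FROM tbl)
"""

def lookup(wanted_id):
    """Return the matching row, or a single placeholder row if none exists."""
    return conn.execute(QUERY, (wanted_id, wanted_id)).fetchall()

print(lookup(3))
print(lookup(5))
```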

The usual way to do what you're asking is:
Select Field1, isnull(Field2, 'Not Provided'), isnull(Field3, 'Not Provided')
From TableA
edit
Whoops, you're using Access; in that case the equivalent function is Nz (what?! :p)
Select Field1, nz(Field2, 'Not Provided'), nz(Field3, 'Not Provided')
From TableA
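For comparison, SQLite's IFNULL plays the same role as ISNULL in SQL Server or Nz in Access. This sketch, with an invented one-row TableA, shows that the replacement applies only to NULL values in rows that are returned. When the WHERE clause matches nothing, the result is still empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Field1 TEXT, Field2 TEXT, Field3 TEXT)")
conn.execute("INSERT INTO TableA VALUES ('test', NULL, 'abc')")

QUERY = """
    SELECT Field1, IFNULL(Field2, 'Not Provided'), IFNULL(Field3, 'Not Provided')
    FROM TableA
    WHERE Field1 = ?
"""

matched = conn.execute(QUERY, ("test",)).fetchall()       # NULL replaced
unmatched = conn.execute(QUERY, ("missing",)).fetchall()  # no rows at all
print(matched, unmatched)
```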
