Minus operator in sql - sql

I am trying to write a SQL query using MINUS.
I have query1, which returns 28 rows with 2 columns.
I have query2, which returns 22 rows with the same 2 columns.
When I run query1 MINUS query2, I expect to see only 28 - 22 = 6 rows.
But it returns all 28 rows from query1.
Please advise.

Try using EXCEPT instead of MINUS (MINUS is Oracle syntax; SQL Server and PostgreSQL use EXCEPT). For example:
Let's consider a case where you want to find out which tasks in a table haven't been assigned to you (so basically you are trying to find which tasks could be available to do).
SELECT TaskID, TaskType
FROM Tasks
EXCEPT
SELECT TaskID, TaskType
FROM Tasks
WHERE Username = 'Vidya'
That would return all the tasks that haven't been assigned to you. Hope that helps.
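To try the EXCEPT query above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the variable name `unassigned_to_me` are made up for illustration; only the table and column names come from the answer.

```python
import sqlite3

# In-memory database with a hypothetical Tasks table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tasks (TaskID INTEGER, TaskType TEXT, Username TEXT);
    INSERT INTO Tasks VALUES
        (1, 'review', 'Vidya'),
        (2, 'deploy', 'Sam'),
        (3, 'test',   NULL);
""")

# All tasks, minus the tasks assigned to Vidya.
unassigned_to_me = conn.execute("""
    SELECT TaskID, TaskType FROM Tasks
    EXCEPT
    SELECT TaskID, TaskType FROM Tasks WHERE Username = 'Vidya'
    ORDER BY TaskID
""").fetchall()

print(unassigned_to_me)
```

SQLite, like SQL Server and PostgreSQL, spells the operator EXCEPT; in Oracle the same query would use MINUS.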

If MINUS won't work for you, the general form is to put the main query in the outer select and a variation of the other query in a NOT EXISTS clause.
select <insert list of fields here>
from mytable a
join myothertable b
on b.aId = a.aid
where not exists (select * from tablec c where a.aid = c.aid)
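As a rough illustration of the NOT EXISTS form, the sketch below compares two small tables on both columns. The tables `q1` and `q2` stand in for the results of query1 and query2 and are hypothetical, as are the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- q1 and q2 are hypothetical stand-ins for the two query results.
    CREATE TABLE q1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE q2 (col1 INTEGER, col2 TEXT);
    INSERT INTO q1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO q2 VALUES (2, 'b'), (3, 'x');
""")

# Keep rows of q1 for which no row of q2 matches on BOTH columns.
only_in_q1 = conn.execute("""
    SELECT a.col1, a.col2
    FROM q1 AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM q2 AS b
        WHERE b.col1 = a.col1 AND b.col2 = a.col2
    )
    ORDER BY a.col1
""").fetchall()

print(only_in_q1)
```

Row (3, 'c') survives because q2 only has (3, 'x'): every compared column must match for a row to be removed. Unlike MINUS or EXCEPT, this form keeps duplicate rows and never matches NULL to NULL.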

The fields might not be exactly alike. Maybe one of the fields is char(10) and the other is char(20), and they both contain the string "TEST". They might "look" the same.
If the database you are working on supports "INTERSECT", try this query and see how many are perfectly matching results.
select field1, field2 from table1
intersect
select field1, field2 from table2
If you are going to get the 6 rows you expect, this query should return 22 rows.
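The sketch below shows how values that "look" the same can fail to match. It assumes SQLite, where trailing spaces are significant in comparisons; other engines may compare padded CHAR values differently, so treat this only as a way to reproduce the symptom. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field1 TEXT, field2 TEXT);
    CREATE TABLE table2 (field1 TEXT, field2 TEXT);
    INSERT INTO table1 VALUES ('TEST', 'x'), ('OTHER', 'y');
    -- Same visible text, but padded with trailing spaces.
    INSERT INTO table2 VALUES ('TEST      ', 'x');
""")

# Exact comparison: the padded value does not match.
raw_matches = conn.execute("""
    SELECT field1, field2 FROM table1
    INTERSECT
    SELECT field1, field2 FROM table2
""").fetchall()

# Comparing trimmed values reveals the intended match.
trimmed_matches = conn.execute("""
    SELECT TRIM(field1), TRIM(field2) FROM table1
    INTERSECT
    SELECT TRIM(field1), TRIM(field2) FROM table2
""").fetchall()

print(len(raw_matches), len(trimmed_matches))
```

If the INTERSECT count is lower than expected, normalizing the columns (trimming, casting) on both sides of the MINUS is one thing to try.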

Something like this:
select field1, field2, ... field_n
from tables
MINUS
select field1, field2, ... field_n
from tables;

MINUS works on the same principle as set difference. Suppose you have sets A and B:
A = {1,2,3,4}; B = {3,5,6}
then A-B = {1,2,4}
If A = {1,3,5} and B = {2,4,6}
then A-B = {1,3,5}. Here count(A) is the same before and after the MINUS operation, because A has no elements in common with B.
Along the same lines, the result set of query2 may have no rows that match the result of query1. Hence you still get 28 rows instead of 6.
Hope this helps.
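The same sets can be checked with Python's built-in set type. This is only an illustration of the arithmetic: the size of A-B is count(A) minus the number of elements A shares with B, not count(A) minus count(B).

```python
# Sets from the example above.
a = {1, 2, 3, 4}
b = {3, 5, 6}
overlap_diff = a - b            # only the shared element 3 is removed

c = {1, 3, 5}
d = {2, 4, 6}
disjoint_diff = c - d           # nothing shared, so nothing is removed

# |A - B| equals |A| - |A ∩ B|; it equals |A| - |B| only when B is a subset of A.
size_by_intersection = len(a) - len(a & b)
size_by_naive_subtraction = len(a) - len(b)

print(overlap_diff, disjoint_diff, size_by_intersection, size_by_naive_subtraction)
```

So 28 - 22 = 6 is the expected row count only if all 22 rows of query2 also appear, value for value, in query1.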

It returns the rows of the first query that are not contained in the second query.
In your case, for example:
A = {1,2,3,4,5,...,28} and B = {29,30}, then A-B = {1,2,3,...,28}

Related

How to use MINUS in google bigquery?

I am trying to do a MINUS on 2 tables that have the same schema in BigQuery. As I understand it, MINUS does not work in BigQuery.
You can do something like:
SELECT
field
FROM `project_id.dataset.tableA` A
WHERE NOT EXISTS(SELECT 1 FROM `project_id.dataset.tableB` b WHERE a.field = b.field)
I see that there is an EXCEPT set operator in BigQuery Standard SQL.
The EXCEPT operator returns rows from the left input query that are not present in the right input query. This is similar to what MINUS does in Oracle.
SELECT fieldId FROM dataset.table1 EXCEPT DISTINCT SELECT fieldId FROM dataset.table2
Note: the data types of the corresponding columns should be the same in both tables.

SQL - Find duplicate fields and count how many fields are matched

I have a large customer database where customers have been added multiple times in some circumstances, which is causing problems. I am able to use a query to identify the records that are an exact match, but some records have slight variations such as different addresses or given names.
I want to query across 10 fields. Some records will match on all 10, which is clearly a duplicate, while others may match another record on only 5 fields and require further investigation. Therefore I want to create a result set with a field that counts how many fields matched, basically a rating of the likelihood that the result is an actual match. All 10 would be a clear duplicate, but 5 would only be a possible duplicate.
Some will only match on POSTCODE and FIRSTNAME, which can generally be discounted.
Something like this helps, but as it only returns records that explicitly match on all 3 fields, it's not really useful given the sheer amount of data.
SELECT field1,field2,field3, count(*)
FROM table_name
GROUP BY field1,field2,field3
HAVING count(*) > 1
You are just missing the magic of CUBE(), which generates all the combinations of columns automatically:
DECLARE @duplicate_column_threshold int = 5;
WITH cte AS (
SELECT
field1,field2,...,field10
-- COUNT(col) skips NULLs, so it counts the columns that are non-NULL in this grouping
,duplicate_column_count = (SELECT COUNT(col) FROM (VALUES (field1),(field2),...,(field10)) c(col))
FROM table_name
GROUP BY CUBE(field1,field2,...,field10)
HAVING COUNT(*) > 1
)
SELECT *
INTO #duplicated_rows
FROM cte
WHERE duplicate_column_count >= @duplicate_column_threshold
Update: to fetch the rows from the original table, join it against the #duplicated_rows using a technique that treats NULLs as wildcards when comparing the columns.
SELECT
a.*
,b.duplicate_column_count
FROM table_name a
INNER JOIN #duplicated_rows b
ON NULLIF(b.field1,a.field1) IS NULL
AND NULLIF(b.field2,a.field2) IS NULL
...
AND NULLIF(b.field10,a.field10) IS NULL
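The NULLIF join condition can be hard to read, so here is a small sketch of how it behaves. NULLIF(x, y) returns NULL when x equals y, and also when x is already NULL, so a NULL on the pattern side acts as a wildcard. The `customers` and `patterns` tables and their rows are hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, first_name TEXT, post_code TEXT);
    CREATE TABLE patterns (first_name TEXT, post_code TEXT);
    INSERT INTO customers VALUES
        (1, 'Ann', 'AB1'),
        (2, 'Ann', 'ZZ9'),
        (3, 'Bob', 'AB1');
    -- NULL post_code means "any post code" in this pattern.
    INSERT INTO patterns VALUES ('Ann', NULL);
""")

matched_ids = [row[0] for row in conn.execute("""
    SELECT c.id
    FROM customers AS c
    JOIN patterns AS p
      ON NULLIF(p.first_name, c.first_name) IS NULL
     AND NULLIF(p.post_code,  c.post_code)  IS NULL
    ORDER BY c.id
""")]

print(matched_ids)
```

Both rows named Ann match regardless of post code, while Bob is rejected because NULLIF('Ann', 'Bob') returns 'Ann', which is not NULL.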
You might try something like
Select field1, field2, field3, ... , field10, count(1)
from customerdatabase
group by field1, field2, field3, ... , field10
order by field1, field2, field3, ... , field10
Where field1 through field10 are ordered by the "most identifiable/important" to least.
This is as close as I've got to what I'm trying to achieve; it returns all records that have any duplicate fields. I want to add a column to the results that indicates how many fields match any other record in the table. There are around 40,000 records in total.
select * from [CUST].[dbo].[REPORTA] as a
where exists
(select [GIVEN.NAMES],[FAMILY.NAME],[DATE.OF.BIRTH],[POST.CODE],[STREET],[TOWN.COUNTRY]
from [CUST].[dbo].[REPORTA] as b
where a.[GIVEN.NAMES] = b.[GIVEN.NAMES]
or a.[FAMILY.NAME] = b.[FAMILY.NAME]
or a.[DATE.OF.BIRTH] = b.[DATE.OF.BIRTH]
or a.[POST.CODE] = b.[POST.CODE]
or a.[STREET] = b.[STREET]
or a.[TOWN.COUNTRY] = b.[TOWN.COUNTRY]
group by [GIVEN.NAMES],[FAMILY.NAME],[DATE.OF.BIRTH],[POST.CODE],[STREET],[TOWN.COUNTRY]
having count(*) >= 1)
This query returns thousands of records, but I'm mainly interested in the records with a high count of exactly matching fields.
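One possible way to get a per-pair count of matching fields is a self-join that adds 1 for every equal column. This is a sketch under assumptions, not a tested solution for the real table: it assumes each record has a unique key (here a hypothetical `id` column), uses four invented columns instead of ten, and runs on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified customer table with a unique id.
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        given_names TEXT, family_name TEXT, date_of_birth TEXT, post_code TEXT
    );
    INSERT INTO customers VALUES
        (1, 'Ann',  'Lee',   '1980-01-01', 'AB1'),
        (2, 'Ann',  'Lee',   '1980-01-01', 'AB1'),
        (3, 'Anne', 'Lee',   '1980-01-01', 'ZZ9'),
        (4, 'Bob',  'Stone', '1975-05-05', 'AB1');
""")

# Compare each pair once (a.id < b.id) and count equal columns.
# CASE ... ELSE 0 also treats NULL comparisons as "no match".
pair_scores = conn.execute("""
    SELECT id_a, id_b, matched_fields
    FROM (
        SELECT a.id AS id_a, b.id AS id_b,
               CASE WHEN a.given_names   = b.given_names   THEN 1 ELSE 0 END
             + CASE WHEN a.family_name   = b.family_name   THEN 1 ELSE 0 END
             + CASE WHEN a.date_of_birth = b.date_of_birth THEN 1 ELSE 0 END
             + CASE WHEN a.post_code     = b.post_code     THEN 1 ELSE 0 END
               AS matched_fields
        FROM customers AS a
        JOIN customers AS b ON a.id < b.id
    )
    WHERE matched_fields >= 2
    ORDER BY matched_fields DESC, id_a, id_b
""").fetchall()

print(pair_scores)
```

With around 40,000 records an unrestricted pairwise join is roughly 800 million comparisons, so in practice the join would probably need an extra condition, for example requiring at least one equal column such as the post code, to stay manageable.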

Need help with optimizing "not in" query

I have an SQL query that I am looking to optimize.
SELECT *
FROM QUEUE_SMS_ALERT Q1
where ALERT_ORIGIN = "FOO"
AND RECORD_ID is null
and PHONE NOT IN (
SELECT DISTINCT PHONE
FROM QUEUE_SMS_ALERT Q2
where Q2.ALERT_ORIGIN = "BAR"
);
Basically I need to get all rows where ALERT_ORIGIN is "FOO" which do not have a corresponding row in the same table with ALERT_ORIGIN "BAR". The table contains about 17000 rows and there are only about 1000 records with ALERT_ORIGIN "BAR". So my query is supposed to give me about 16000 rows.
EDIT : The current query is very slow. I do not have any indexes currently.
I'm guessing that you have NULL values in the PHONE column, which means NOT IN returns no rows (so it's a "fix", not an "optimise"). So I've written it with NOT EXISTS:
SELECT *
FROM QUEUE_SMS_ALERT Q1
WHERE
Q1.ALERT_ORIGIN = 'FOO'
AND
Q1.RECORD_ID is null
AND
NOT EXISTS (SELECT *
FROM QUEUE_SMS_ALERT Q2
WHERE
Q2.ALERT_ORIGIN = 'BAR'
AND
Q1.PHONE = Q2.PHONE)
If it is slow rather than "wrong" then you need to use indexes. What do you have now?
For this query, you need an index on (ALERT_ORIGIN, PHONE, RECORD_ID).
Note: use single quotes for string delimiters
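The NULL problem with NOT IN is easy to reproduce. The sketch below uses SQLite and a handful of invented rows; the table and column names follow the question, and the index name `idx_origin_phone` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE QUEUE_SMS_ALERT (ALERT_ORIGIN TEXT, RECORD_ID INTEGER, PHONE TEXT);
    INSERT INTO QUEUE_SMS_ALERT VALUES
        ('FOO', NULL, '111'),
        ('FOO', NULL, '222'),
        ('BAR', NULL, '222'),
        ('BAR', NULL, NULL);   -- a single NULL phone on the BAR side
    -- Index suggested in the answer (the index name is hypothetical).
    CREATE INDEX idx_origin_phone
        ON QUEUE_SMS_ALERT (ALERT_ORIGIN, PHONE, RECORD_ID);
""")

# NOT IN: '111' NOT IN ('222', NULL) evaluates to NULL, so the row is dropped.
not_in_rows = conn.execute("""
    SELECT Q1.PHONE FROM QUEUE_SMS_ALERT Q1
    WHERE Q1.ALERT_ORIGIN = 'FOO' AND Q1.RECORD_ID IS NULL
      AND Q1.PHONE NOT IN (SELECT Q2.PHONE FROM QUEUE_SMS_ALERT Q2
                           WHERE Q2.ALERT_ORIGIN = 'BAR')
""").fetchall()

# NOT EXISTS: the NULL phone simply never matches, so '111' is returned.
not_exists_rows = conn.execute("""
    SELECT Q1.PHONE FROM QUEUE_SMS_ALERT Q1
    WHERE Q1.ALERT_ORIGIN = 'FOO' AND Q1.RECORD_ID IS NULL
      AND NOT EXISTS (SELECT 1 FROM QUEUE_SMS_ALERT Q2
                      WHERE Q2.ALERT_ORIGIN = 'BAR' AND Q2.PHONE = Q1.PHONE)
""").fetchall()

print(not_in_rows, not_exists_rows)
```

One NULL in the subquery is enough to make the NOT IN version return nothing, which matches the guess in the answer above.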

how to compare two rows in one mdb table?

I have one mdb table with the following structure:
Field1 Field2 Field3 Field4
A ...
B ...
I am trying to use a query to list all the fields that differ between rows A and B in a result set:
SELECT * From Table1
WHERE Field1 = 'A'
UNION
SELECT * From Table1
WHERE Field1 = 'B';
However, this query has two problems:
It lists all the fields, including the identical cells.
With a large table, it gives an error message: too many fields defined.
How could I get around these issues?
Is it not easiest to just select all fields needed from the table, based on the Field1 value and group on the values needed?
So something like this:
SELECT field1, field2,...field195
FROM Table1
WHERE field1 = 'A' or field1 = 'B'
GROUP BY field1, field2, ....field195
This will give you all rows where field1 is A or B and there is a difference in one of the selected fields.
Oh and for the group by statement as well as the SELECT part, indeed use the previously mentioned edit mode for the query. There you can add all fields (by selecting them in the table and dragging them down) that are needed in the result, then click the 'totals' button in the ribbon to add the group by- statements for all. Then you only have to add the Where-clause and you are done.
Now that the question is clearer (you want the query to select fields instead of records based on the particular requirements), I'll have to change my answer to:
This is not possible.
(until proven otherwise) ;)
As far as I know, a query is used to select records, for example with a WHERE clause; it is never used to decide which fields are shown based on a certain criterion.
One thing that MIGHT help in this case is to look at the database design. Are those tables correctly made?
Suppose you have 190 of those fields that are merely details of the main data. You could separate this in another table, so you have a main table and details table.
The details table could look something like:
ID ID_Main Det_desc Det_value
This way you can filter out all detail values that are equal between the two main values A and B, keeping only the ones that differ, using something like:
SELECT a.Det_desc, a.Det_value, b.Det_value
FROM (SELECT Det_desc, Det_value
FROM tblDetails
WHERE ID_Main = 'A') AS a
INNER JOIN
(SELECT Det_desc, Det_value
FROM tblDetails
WHERE ID_Main = 'B') AS b
ON a.Det_desc = b.Det_desc AND a.Det_value <> b.Det_value
This you can join with your main table again if needed.
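A minimal sketch of the details-table idea, run on SQLite rather than Access. The column names follow the layout given above; the sample details ('colour', 'size') are invented, and ID_Main is assumed to hold the values 'A' and 'B'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblDetails (
        ID INTEGER PRIMARY KEY,
        ID_Main TEXT, Det_desc TEXT, Det_value TEXT
    );
    INSERT INTO tblDetails (ID_Main, Det_desc, Det_value) VALUES
        ('A', 'colour', 'red'),
        ('A', 'size',   'L'),
        ('B', 'colour', 'blue'),
        ('B', 'size',   'L');
""")

# One row per detail whose value differs between main records A and B.
differences = conn.execute("""
    SELECT a.Det_desc, a.Det_value, b.Det_value
    FROM (SELECT Det_desc, Det_value FROM tblDetails WHERE ID_Main = 'A') AS a
    INNER JOIN
         (SELECT Det_desc, Det_value FROM tblDetails WHERE ID_Main = 'B') AS b
      ON a.Det_desc = b.Det_desc AND a.Det_value <> b.Det_value
    ORDER BY a.Det_desc
""").fetchall()

print(differences)
```

Because each former column is now a row, the query returns only the details that differ, which is what the original wide-table query could not do.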
You can full join the table on itself, matching identical rows. Then you can filter on mismatches if one of the two join parts is null. For example:
select *
from (
select *
from Table1
where Field1 = 'A'
) A
full join
(
select *
from Table1
where Field1 = 'B'
) B
on A.Field2 = B.Field2
and A.Field3 = B.Field3
where A.Field1 is null
or B.Field1 is null
If you have 200 fields, ask Access to generate the column list by creating a query in design view. Switch to SQL view and copy/paste. An editor with column mode (like UltraEdit) will help create the query.

Optionally use a UNION from another table in T-SQL without using temporary tables or dynamic sql?

I have two sql server tables with the same structure. In a stored procedure I have a Select from the first table. Occasionally I want to select from the second table as well based on a passed in parameter.
I would like a way to do this without resorting to using dynamic sql or temporary tables.
Pass in param = 1 to union, anything else to only return the first result set:
select field1, field2, ... from table1 where cond
union
select field1, field2, ... from table2 where cond AND param = 1
If they are both the exact same structure, then why not have a single table with a parameter that differentiates the two tables? At that point, it becomes a simple matter of a case statement on the parameter on which results set you receive back.
A second alternative is dual result sets. You can select multiple result sets out of a stored procedure. Then in code, you would either use DataReader.NextResult or DataSet.Tables(1) to get at the second set of data. It will then be your code's responsibility to place them into the same collection or merge the two tables.
A THIRD possibility is to utilize an IF Statement. Say, pass in an integer with the expected possible values of 1,2, 3 and then have something along this in your actual stored procedure code
IF @Param = 1
SELECT * FROM Table1
IF @Param = 2
SELECT * FROM Table2
IF @Param = 3
SELECT * FROM Table1 UNION SELECT * FROM Table2
A fourth possibility would be to have two distinct procedures one which runs a union and one which doesn't and then make it your code's responsibility to determine which one to call based on that parameter, something like:
myCommandObject.CommandText = IIf(myParamVariable = True, "StoredProc1", "StoredProc2")
It's pretty easy.
/* Always return tableX */
select colA, colB
from tableX
union
select colA, colB
from tableY
where @parameter = 'IncludeTableY' /* Will union with an empty set otherwise */
If this isn't immediately apparent (it often isn't), consider the examples below. The primary thing to remember is that if the WHERE clause evaluates to true for a row, the row is returned; otherwise it is discarded.
This always evaluates to true so every row is returned.
select *
from tableX
where 1 = 1
This always evaluates to false so no rows are returned (sometimes used as a quick and dirty get-me-the-columns query).
select *
from tableX
where 1 = 0
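The parameter-controlled UNION can be tried out as follows. This sketch uses SQLite with a bound named parameter in place of the T-SQL variable; the helper `fetch_rows` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableX (colA INTEGER, colB TEXT);
    CREATE TABLE tableY (colA INTEGER, colB TEXT);
    INSERT INTO tableX VALUES (1, 'x1');
    INSERT INTO tableY VALUES (2, 'y1');
""")

def fetch_rows(connection, parameter):
    """Always return tableX; add tableY only when parameter is 'IncludeTableY'."""
    return connection.execute("""
        SELECT colA, colB FROM tableX
        UNION
        SELECT colA, colB FROM tableY
        WHERE :parameter = 'IncludeTableY'  -- false (or NULL) for every row otherwise
        ORDER BY colA
    """, {"parameter": parameter}).fetchall()

only_x = fetch_rows(conn, "SomethingElse")
both = fetch_rows(conn, "IncludeTableY")
with_null = fetch_rows(conn, None)

print(only_x, both, with_null)
```

Passing NULL behaves like any other non-matching value, because the comparison then evaluates to NULL and the second branch contributes no rows.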
This will return values from either table, depending on whether you passed a value for the parameter:
select field1, field2, ... from table1 where @p1 is null
union
select field1, field2, ... from table2 where @p1 is not null
You just need to add the rest of your criteria to each WHERE clause.
Use a view.
CREATE VIEW view_both
AS
SELECT *, 1 AS source
FROM table1
UNION ALL
SELECT *, 2 AS source
FROM table2
SELECT * FROM view_both WHERE source < @source_flag
The optimizer determines which table, or both, to read based on source, without requiring it to be indexed.
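Here is a small sketch of the view approach on SQLite. The view and `source` column follow the answer; the single `field1` column, the sample rows, and the helper `rows_below` are assumptions for illustration. It shows only the results, not what any particular optimizer does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field1 TEXT);
    CREATE TABLE table2 (field1 TEXT);
    INSERT INTO table1 VALUES ('from_table1');
    INSERT INTO table2 VALUES ('from_table2');

    -- Tag each row with the table it came from.
    CREATE VIEW view_both AS
        SELECT field1, 1 AS source FROM table1
        UNION ALL
        SELECT field1, 2 AS source FROM table2;
""")

def rows_below(connection, source_flag):
    """Return rows whose source tag is strictly less than source_flag."""
    return connection.execute(
        "SELECT field1, source FROM view_both WHERE source < ? ORDER BY source",
        (source_flag,),
    ).fetchall()

first_only = rows_below(conn, 2)   # source < 2 keeps table1 only
both_tables = rows_below(conn, 3)  # source < 3 keeps table1 and table2

print(first_only, both_tables)
```

With this tagging, a flag of 2 selects only the first table and a flag of 3 selects both, so the stored procedure needs no dynamic SQL.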