SQL queries producing unexpected results

I've got a strange situation with two SQL queries that aren't producing the expected results. Here are the queries:
Query 1:
SELECT DISTINCT SomeCharValue
FROM Table1
JOIN Table2
ON Table1.SomeCharValue = Table2.SomeCharValue
ORDER BY SomeCharValue
Query 2:
SELECT DISTINCT SomeCharValue
FROM Table1
JOIN Table2
ON Table1.SomeCharValue <> Table2.SomeCharValue
ORDER BY SomeCharValue
I have two tables with columns of varchar(15). Table2 is essentially a small subset of the values in Table1, thus Table1 has all values stored in Table2. The problem is, the two queries should never produce the same results, yet they do. Both queries will produce the same result for certain values; for example, if Table1 and Table2 contain the word 'hello', then Query 1 should return it, while Query 2 should not. However, BOTH queries return 'hello'. It doesn't make sense that 'hello' in both tables is equal and not equal at the same time. I ran a length query to test the values, and some were a different size with trailing white spaces, but even after changing these to be an exact match, and verifying the hexadecimal value of the characters to be the same, the same results occur. I can't compare numeric key fields since there is no key relationship between these tables. I can only compare the exact character values in the columns. Any ideas?

Imagine you have table1 containing a and b as separate rows, and table2 has the exact same contents.
Now for your second query, table1's row a will be compared to both the rows in table2. It will pass the ON clause when comparing to row b in table2, and hence a will be in your result set. Similarly for the b row in table1 which will pass the ON clause when compared to the a row in table2.
You could rewrite the query as
SELECT DISTINCT SomeCharValue
FROM TABLE1
WHERE SomeCharValue NOT IN (SELECT DISTINCT SomeCharValue FROM Table2)
ORDER BY SomeCharValue
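The behaviour described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient test bed (an assumption; the original question does not name a database), and the sample rows 'a', 'b' and 'c' are made up for illustration. Column names are qualified with the table name because an unqualified SomeCharValue is ambiguous when both tables are joined.

```python
import sqlite3

# In-memory database with made-up sample data: Table2 is a subset of Table1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (SomeCharValue VARCHAR(15));
    CREATE TABLE Table2 (SomeCharValue VARCHAR(15));
    INSERT INTO Table1 VALUES ('a'), ('b'), ('c');
    INSERT INTO Table2 VALUES ('a'), ('b');
""")

def values(sql):
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

# Query 1: values of Table1 that have an equal partner in Table2.
equal_join = values("""
    SELECT DISTINCT Table1.SomeCharValue
    FROM Table1
    JOIN Table2 ON Table1.SomeCharValue = Table2.SomeCharValue
    ORDER BY Table1.SomeCharValue
""")

# Query 2: values of Table1 that differ from AT LEAST ONE row of Table2.
# 'a' differs from 'b' and 'b' differs from 'a', so both still show up.
not_equal_join = values("""
    SELECT DISTINCT Table1.SomeCharValue
    FROM Table1
    JOIN Table2 ON Table1.SomeCharValue <> Table2.SomeCharValue
    ORDER BY Table1.SomeCharValue
""")

# Rewrite: values of Table1 that match NO row of Table2.
not_in = values("""
    SELECT DISTINCT SomeCharValue
    FROM Table1
    WHERE SomeCharValue NOT IN (SELECT SomeCharValue FROM Table2)
    ORDER BY SomeCharValue
""")

print(equal_join, not_equal_join, not_in)
```

One caveat with the NOT IN form: if the sub-select returns a NULL, NOT IN yields no rows at all, so a NOT EXISTS sub-query may be the safer choice when Table2.SomeCharValue is nullable.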

Did you try using NOT LIKE instead of <>?

Related

How to Identify matching records in two tables?

I have two tables with the same column names. There are a total of 40 columns in each table. Both tables have the same unique IDs. If I perform an inner join on the ID columns I get a match on 80% of the data. However, I would like to see whether the matching rows have exactly the same data in each of the columns.
If there were a few rows like say 50-100 I could have performed a simple union operation ordered by ID and manually checked for the data. But both the tables contain more than 5000 records.
Is a join on each of the columns a valid solution for this or do I need to perform concatenation?
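Suppose you have N columns. Combine both tables with UNION ALL inside a derived table and then GROUP BY COL1, COL2, ..., COLN. Any group with more than one row is a row that appears in both tables with identical values (assuming neither table contains duplicate rows of its own). The GROUP BY has to be applied to the combined result; written directly after the second SELECT it would apply to table2 only.
select COL1, COL2, ... , COLN
from (
select * from table1
union all
select * from table2
) t
group by COL1, COL2, ... , COLN
having count(*) > 1;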
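As a rough illustration of the UNION ALL plus GROUP BY approach, here is a sketch using Python's built-in sqlite3 module. The three-column tables and their rows are hypothetical stand-ins for the 40-column tables in the question.

```python
import sqlite3

# Hypothetical 3-column tables standing in for the real 40-column ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'x', 'y'), (2, 'p', 'q'), (3, 'm', 'n');
    INSERT INTO table2 VALUES (1, 'x', 'y'), (2, 'p', 'CHANGED'), (4, 'm', 'n');
""")

# Stack both tables, then group by every column. A group of size 2 means
# the same full row is present in both tables.
identical_ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM (
        SELECT * FROM table1
        UNION ALL
        SELECT * FROM table2
    ) AS both_tables
    GROUP BY id, col1, col2
    HAVING COUNT(*) > 1
    ORDER BY id
""")]

print(identical_ids)
```

Only id 1 is reported: id 2 exists in both tables but differs in col2, and ids 3 and 4 each exist in one table only.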

SQL MINUS showing no difference between first and second while shows difference between second and first

SQL MINUS is used as:
SELECT expression1, expression2, ... expression_n
FROM tables
[WHERE conditions]
MINUS
SELECT expression1, expression2, ... expression_n
FROM tables
[WHERE conditions];
In case I see no difference between first minus second but see a difference between second minus first, what does this signify? Is there any real difference? If so, then why and how may I get first minus second as no difference?
Please help.
You can refer to the docs to understand the MINUS operator:
The Oracle MINUS operator is used to return all rows in the first
SELECT statement that are not returned by the second SELECT statement.
Each SELECT statement will define a dataset. The MINUS operator will
retrieve all records from the first dataset and then remove from the
results all records from the second dataset.
So if all the records of table1 exist in table2, the output will contain no records. But when you reverse the tables, any records that exist only in table2 will be shown.
Of course there is. MINUS subtracts from the first table all the records that also appear in the second table.
Take this example:
TABLE1:
ID
1
2
4
Table2:
ID
1
2
4
5
SELECT * FROM TABLE1
MINUS
SELECT * FROM TABLE2
Will return nothing, since 1, 2 and 4 (all of Table1's records) appear in Table2, even though the two tables don't have exactly the same content.
As opposed to:
SELECT * FROM TABLE2
MINUS
SELECT * FROM TABLE1
Will return 5, because it's the only value that doesn't appear in Table1.
So even if you are selecting from the same table, if the two selects return different content (different WHERE conditions), MINUS won't give the same result in both directions.
If I understand you correctly, there are two cases of the problem as follows:
Table1 and Table2 have the same number of rows but different values: in this case Table1 MINUS Table2 returns the same number of rows as Table2 MINUS Table1 (assuming neither table contains duplicate rows), although the rows themselves differ.
Different number of rows: in this case Table1 MINUS Table2 returns only those rows that exist in Table1 and do not exist in Table2.
If you want to return the rows that exist in Table2 and do not exist in Table1, you have to write Table2 MINUS Table1.
If you want to return all the differences, you can combine both with UNION ALL, using parentheses so that each MINUS is evaluated before the UNION ALL:
(SELECT * FROM Table1 MINUS SELECT * FROM Table2)
UNION ALL
(SELECT * FROM Table2 MINUS SELECT * FROM Table1)
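The asymmetry can be checked with the example data from the earlier answer (IDs 1, 2, 4 versus 1, 2, 4, 5). This sketch uses Python's built-in sqlite3 module as a stand-in for Oracle, which is an assumption made for runnability: SQLite spells the operator EXCEPT instead of MINUS and does not accept parentheses around the branches of a compound query, so each difference is wrapped in a sub-query.

```python
import sqlite3

# Same sample data as the TABLE1 / Table2 example in the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (4);
    INSERT INTO table2 VALUES (1), (2), (4), (5);
""")

def ids(sql):
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

# EXCEPT is SQLite's name for the operator Oracle calls MINUS.
first_minus_second = ids("SELECT id FROM table1 EXCEPT SELECT id FROM table2")
second_minus_first = ids("SELECT id FROM table2 EXCEPT SELECT id FROM table1")

# Both directions at once: each difference is computed in its own
# sub-query, then the two results are concatenated.
all_differences = ids("""
    SELECT id FROM (SELECT id FROM table1 EXCEPT SELECT id FROM table2)
    UNION ALL
    SELECT id FROM (SELECT id FROM table2 EXCEPT SELECT id FROM table1)
""")

print(first_minus_second, second_minus_first, all_differences)
```

An empty first difference together with a non-empty second one simply means the first result set is a proper subset of the second.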

SQL insert into, where not exists (select 1... what this "1" stands for?

INSERT INTO table1
SELECT * FROM table2
WHERE NOT EXISTS
(SELECT 1 FROM table1
WHERE table2.id = table1.id)
What is the role of that 1 in the fourth line of code? I want to make an incremental update of table1 with records from table2. A friendly soul advised me to use the above query, which I find is very common on the web for incremental updates of a table. Can someone please explain how this mechanism works?
Exists checks for the presence of rows in the sub-select, not for the data returned by those rows.
So we are only interested if there is a row or not.
But as you can't have a select without selecting something, you need to put an expression into the select list.
That could be any expression. The actual expression is of no interest. You could use select some_column or select * or select null or select 42 - that would all be the same.
You can select whatever you like in the case of EXISTS (sub-select); the only thing that matters is whether a row is found (EXISTS is true) or no rows are found (EXISTS is false).
The EXISTS keyword, as the name suggests, is used to determine whether or not any rows exist in a table that meet the specified condition. Since we only need to filter out those rows which meet the condition, but do not need to actually retrieve the values of individual columns, we use select 1 instead. For what it's worth, you can also write it as
INSERT INTO table1
SELECT * FROM table2
WHERE NOT EXISTS
(SELECT id FROM table1
WHERE table2.id = table1.id)
without affecting the filtering logic.
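To see that the select list inside EXISTS really makes no difference, the following sketch tries several expressions and then runs the incremental insert once. It uses Python's built-in sqlite3 module; the two-column tables and their rows are hypothetical.

```python
import sqlite3

# Hypothetical data: table2 holds one row (id 3) that table1 lacks.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 VALUES (1, 'one'), (2, 'two');
    INSERT INTO table2 VALUES (2, 'two'), (3, 'three');
""")

def new_ids(select_list):
    """IDs in table2 with no matching row in table1, for a given
    expression in the select list of the EXISTS sub-query."""
    sql = f"""
        SELECT id FROM table2
        WHERE NOT EXISTS
            (SELECT {select_list} FROM table1 WHERE table2.id = table1.id)
    """
    return [row[0] for row in conn.execute(sql)]

# The expression selected inside EXISTS is irrelevant; only the presence
# of a row matters, so every variant finds the same missing IDs.
candidates = {expr: new_ids(expr) for expr in ("1", "NULL", "42", "*", "table1.id")}

# The incremental insert from the question.
conn.execute("""
    INSERT INTO table1
    SELECT * FROM table2
    WHERE NOT EXISTS
        (SELECT 1 FROM table1 WHERE table2.id = table1.id)
""")
final_ids = [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]

print(candidates, final_ids)
```

Running the insert a second time adds nothing, because every id in table2 now has a match in table1.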

How to compare two tables each having 500 columns using PL-SQL

I need to compare two tables in different databases and check whether the data in both tables are matching or not.
The compare should return a result showing rows that don't match using an exact column to column data check.
Is this possible in PL-SQL?
To return all rows in table1 that do not exactly match a row in table2 (Oracle has traditionally spelled this operator MINUS rather than EXCEPT):
select * from table1 except select * from table2
And to return all rows in table1 that match exactly what is in table2:
select * from table1 intersect select * from table2

Returning more than one value from a sql statement

I was looking at sql inner queries (bit like the sql equivalent of a C# anon method), and was wondering, can I return more than one value from a query?
For example, return the number of rows in a table as one output value, and also, as another output value, return the distinct number of rows?
Also, how does distinct work? Is this based on whether one field may be the same as another (thus classified as "distinct")?
I am using Sql Server 2005. Would there be a performance penalty if I return one value from one query, rather than two from one query?
Thanks
You could do your first question by doing this:
SELECT
COUNT(field1),
COUNT(DISTINCT field2)
FROM table
(For the first field you could do * if needed to count null values.)
Distinct means the definition of the word. It eliminates duplicate returned rows.
Returning 2 values instead of 1 would depend on what the values were, if they were indexed or not and other undetermined possible variables.
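A short sketch of how the different COUNT forms behave, using Python's built-in sqlite3 module rather than SQL Server 2005 (an assumption made so the example is self-contained). The table t and its rows are made up, with NULLs included on purpose.

```python
import sqlite3

# Made-up table with NULLs in both columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (field1 TEXT, field2 TEXT);
    INSERT INTO t VALUES ('a', 'x'), ('b', 'x'), (NULL, 'y'), ('d', NULL);
""")

# One query, one result row, three values:
#   COUNT(*)               counts every row, NULLs included
#   COUNT(field1)          counts rows where field1 is not NULL
#   COUNT(DISTINCT field2) counts the different non-NULL values of field2
total_rows, non_null_field1, distinct_field2 = conn.execute("""
    SELECT COUNT(*), COUNT(field1), COUNT(DISTINCT field2)
    FROM t
""").fetchone()

print(total_rows, non_null_field1, distinct_field2)
```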
If you mean subqueries within the select list, then no, each one can only return 1 value. If you want more than 1 value you will have to use the subquery as a join.
If the inner query is inline in the SELECT, you may struggle to select multiple values. However, it is often possible to JOIN to a sub-query instead; that way, the sub-query can be named and you can get multiple results:
SELECT a.Foo, a.Bar, x.[Count], x.[Avg]
FROM a
INNER JOIN (SELECT Something, COUNT(1) AS [Count], AVG(SomeValue) AS [Avg]
            FROM b GROUP BY Something) x -- b and SomeValue are placeholder names
ON x.Something = a.Something
Which might help.
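Here is a runnable sketch of joining to an aggregating sub-query, using Python's built-in sqlite3 module. The tables a and b, the Amount column and the aliases RowCount and AvgAmount are all hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical tables: a holds the outer rows, b holds the detail rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (Foo TEXT, Something INTEGER);
    CREATE TABLE b (Something INTEGER, Amount REAL);
    INSERT INTO a VALUES ('first', 1), ('second', 2);
    INSERT INTO b VALUES (1, 10.0), (1, 20.0), (2, 5.0);
""")

# The sub-query computes two aggregates per Something value; joining to
# it makes both available to the outer SELECT in a single pass.
rows = conn.execute("""
    SELECT a.Foo, x.RowCount, x.AvgAmount
    FROM a
    INNER JOIN (
        SELECT Something, COUNT(*) AS RowCount, AVG(Amount) AS AvgAmount
        FROM b
        GROUP BY Something
    ) AS x ON x.Something = a.Something
    ORDER BY a.Foo
""").fetchall()

print(rows)
```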
DISTINCT does what it says. IIRC, you can SELECT COUNT(DISTINCT Foo) etc to query distinct data.
You can return multiple results in 3 ways (off the top of my head):
By having a select with multiple values, e.g. select col1, col2, col3
With multiple queries, e.g. select 1; select '2'; select colA. You would get to them in a data reader by calling .NextResult()
Using output parameters: declare the parameters before executing the query, then get the values from them afterwards, e.g. set @param1 = '2', then string myparam2 = sqlCommand.Parameters["@param1"].Value.ToString()
Distinct filters the resulting rows to be unique.
Inner queries in the form:
SELECT * FROM tbl WHERE fld in (SELECT fld2 FROM tbl2 WHERE tbl.fld = tbl2.fld2)
cannot return multiple columns. When you need multiple columns from a secondary query, you usually need to do an inner join on the other query.
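To return two counts from one query:
SELECT count(*), count(distinct fld) from tbl
will return a dataset with one row containing two columns. Column 1 is the total number of rows in the table. Column 2 counts only the distinct non-null values of fld. (count(distinct *) is not valid syntax in SQL Server; to count distinct whole rows, count the rows of a SELECT DISTINCT * sub-query instead.)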
Distinct means the returned dataset will not have any duplicate rows. Distinct can only appear once usually directly after the select. Thus a query such as:
SELECT distinct a, b, c FROM table
might have this result:
a1 b1 c1
a1 b1 c2
a1 b2 c2
a1 b3 c2
Note that values are duplicated across the whole result set but each row is unique.
I'm not sure what your last question means. You should return from a query all the data relevant to the query. As for faster, only benchmarking can tell you which approach is faster.