Database Table Content Comparison - sql

We use SAP HANA as our database.
How can I check whether two tables have the same content?
I have already compared the primary keys using SQL:
select COUNT (*) from Schema.table1;
select COUNT (*) from Schema.table2;
select COUNT (*)
from Schema.table1 p
join schema.table2 r
on p.keyPart1 = r.keyPart1
and p.keyPart2 = r.keyPart2
and p.keyPart3 = r.keypart3;
So I compared the row counts of both tables and of the join. All three counts are the same.
But I still don't know whether the content of all rows is exactly the same. One or more cells in a non-key column could still differ.
I thought about putting all columns in the join condition, but that did not feel right.

You might want to use EXCEPT, in both directions:
SELECT * FROM A
EXCEPT
SELECT * FROM B;
SELECT * FROM B
EXCEPT
SELECT * FROM A;
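Below is a minimal sketch of the two-way EXCEPT check. It uses Python's built-in sqlite3 module as a stand-in for SAP HANA, and the tables a and b and their rows are invented for illustration. The keys match in both tables, so the COUNT and join checks from the question would pass, yet EXCEPT still finds the differing non-key value:

```python
import sqlite3

# Hypothetical tables: same keys (k1, k2) in both, one non-key value differs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k1 INTEGER, k2 TEXT, val TEXT);
    CREATE TABLE b (k1 INTEGER, k2 TEXT, val TEXT);
    INSERT INTO a VALUES (1, 'x', 'same'), (2, 'y', 'old'), (3, 'z', 'same');
    INSERT INTO b VALUES (1, 'x', 'same'), (2, 'y', 'new'), (3, 'z', 'same');
""")


def rows_only_in(left: str, right: str) -> list:
    """Distinct rows of `left` with no identical row (all columns) in `right`."""
    # Table names are interpolated into the SQL, so pass only trusted identifiers.
    return conn.execute(f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}").fetchall()


only_a = rows_only_in("a", "b")  # [(2, 'y', 'old')]
only_b = rows_only_in("b", "a")  # [(2, 'y', 'new')]
print("identical" if not only_a and not only_b else "different")
```

Both directions are needed. A single EXCEPT only shows that every row of the first table also occurs in the second. Because EXCEPT removes duplicates, a row that is duplicated a different number of times in each table goes unnoticed.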

Related

How to find different rows in two tables with same columns?

I have two tables that have exact same set of columns. I'd like to select all rows that don't exactly match. Is there a way to do that without joining by every column or typing every column's name in any other way (I have a large number of them)?
If the number, type and order of columns are exactly the same, you can use the EXCEPT operator (MINUS in some DBMSs) to remove from the first table every row that matches a row of the second table in every column.
SELECT *
FROM table1
EXCEPT
SELECT *
FROM table2;
(Use EXCEPT ALL if you don't want or need duplicate elimination. If you also want the result with the operands swapped, you can use UNION (or UNION ALL) to combine it with the result of a second EXCEPT operation. When in doubt, use parentheses to control the order of the operations.)
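Here is a sketch of the combined form, run with Python's sqlite3 module on invented table contents. SQLite does not accept parentheses around the operands of a compound SELECT, so each EXCEPT is wrapped in a derived table instead. DBMSs that allow (... EXCEPT ...) can use parentheses directly. A literal column records which side each row came from:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    INSERT INTO table2 VALUES (1, 'ann'), (2, 'bobby'), (4, 'dee');
""")

# Symmetric difference: rows only in table1 plus rows only in table2, labelled.
diff_sql = """
    SELECT 'only in table1' AS side, * FROM (
        SELECT * FROM table1 EXCEPT SELECT * FROM table2)
    UNION ALL
    SELECT 'only in table2' AS side, * FROM (
        SELECT * FROM table2 EXCEPT SELECT * FROM table1)
    ORDER BY 1, 2
"""
diff = conn.execute(diff_sql).fetchall()
for row in diff:
    print(row)
```

UNION ALL is enough here because the two EXCEPT results can never share a row: each carries a different label.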
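Use MINUS:
select * from tableA
minus
select * from tableB
If the query returns no rows, every row of tableA also exists in tableB. Run it again with the tables swapped: only if both directions return no rows is the data the same. (MINUS removes duplicates, so differences in how often a row is duplicated are not detected.)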
You could join on the primary key and compare all other columns using:
SELECT *
FROM src s
FULL OUTER JOIN trg t
ON s.id = t.id
WHERE NOT EXISTS (SELECT s.col1, s.col2, s.col3, s.col4
INTERSECT
SELECT t.col1, t.col2, t.col3, t.col4);
Note that this approach lets you compare the data side by side.
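Below is a runnable sketch of the join-and-INTERSECT idea using Python's sqlite3 module. The tables src and trg and their data are invented. It uses LEFT JOIN because FULL OUTER JOIN only exists in newer SQLite versions, so rows that exist only in trg are not reported here.

The point of NOT EXISTS (... INTERSECT ...) over plain <> comparisons is NULL handling. INTERSECT treats two NULLs as equal and a NULL on only one side as a difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE trg (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO src VALUES (1, 'a', NULL), (2, 'b', 'x'), (3, 'c', 'y'), (5, 'e', NULL);
    INSERT INTO trg VALUES (1, 'a', NULL), (2, 'b', 'z'), (4, 'd', 'w'), (5, 'e', 'v');
""")


def changed_ids(conn: sqlite3.Connection) -> list:
    """Keys of src rows whose non-key columns differ from trg (or have no trg row)."""
    sql = """
        SELECT s.id
        FROM src s
        LEFT JOIN trg t ON s.id = t.id
        WHERE NOT EXISTS (SELECT s.col1, s.col2
                          INTERSECT
                          SELECT t.col1, t.col2)
        ORDER BY s.id
    """
    return [row[0] for row in conn.execute(sql)]


print(changed_ids(conn))  # [2, 3, 5]
```

Key 5 (NULL in src, 'v' in trg) is the case a condition like s.col2 <> t.col2 would miss, because NULL <> 'v' evaluates to NULL rather than true.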
EDIT:
"Doesn't that still require mentioning every column explicitly? I'd rather not."
Yes, but you can drag and drop the columns from the object explorer (SSMS/TOAD/Oracle SQL Developer) to avoid typing them manually.
There is also SELECT * EXCEPT (Google BigQuery only):
SELECT *
FROM src s
FULL OUTER JOIN trg t
ON s.id = t.id
WHERE NOT EXISTS (SELECT s.* EXCEPT (id)
INTERSECT DISTINCT
SELECT t.* EXCEPT (id));
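You can avoid both dragging columns and BigQuery-only syntax by generating the column list from the database catalog. The sketch below does this with Python's sqlite3 module, where PRAGMA table_info supplies the column names. Other DBMSs expose this information through their own catalog views.

The function name build_diff_query and the tables are hypothetical. It builds a LEFT JOIN variant of the join-and-INTERSECT query above. The identifiers are interpolated into the SQL, so they must come from a trusted source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, col1 TEXT, col2 REAL, col3 TEXT);
    CREATE TABLE trg (id INTEGER PRIMARY KEY, col1 TEXT, col2 REAL, col3 TEXT);
    INSERT INTO src VALUES (1, 'a', 1.5, NULL), (2, 'b', 2.0, 'x'), (3, 'c', 3.0, 'y');
    INSERT INTO trg VALUES (1, 'a', 1.5, NULL), (2, 'b', 2.5, 'x'), (3, 'c', 3.0, 'y');
""")


def build_diff_query(conn: sqlite3.Connection, src: str, trg: str, key: str) -> str:
    """Generate the join-and-INTERSECT comparison without typing the column names."""
    # PRAGMA table_info yields one row per column; field 1 is the column name.
    cols = [r[1] for r in conn.execute(f"PRAGMA table_info({src})") if r[1] != key]
    s_cols = ", ".join(f"s.{c}" for c in cols)
    t_cols = ", ".join(f"t.{c}" for c in cols)
    return (
        f"SELECT s.{key} FROM {src} s LEFT JOIN {trg} t ON s.{key} = t.{key} "
        f"WHERE NOT EXISTS (SELECT {s_cols} INTERSECT SELECT {t_cols}) "
        f"ORDER BY s.{key}"
    )


query = build_diff_query(conn, "src", "trg", "id")
print(query)
print([row[0] for row in conn.execute(query)])  # [2]
```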

INTERSECT and UNION giving different counts of duplicate rows

I have two tables, A and B, with the same column names. I have to combine them into table C.
When I run the following query, the count does not match:
select * into C
from
(
select * from A
union
select * from B
)X
The record count of C does not match the combined count of A and B; there is a difference of 89 rows. So I figured out that there are duplicates.
I used the following query to find the duplicates:
select * from A
INTERSECT
select * from B
-- 80 rows returned
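Can anybody tell me why INTERSECT returns 80 duplicates whereas the count difference when using UNION is 89?
There are probably duplicates within A and/or B as well. All set operators except the ALL variants (such as UNION ALL) perform an implicit DISTINCT on the result (logically, not necessarily physically). So UNION also removes duplicates that exist inside a single table, whereas INTERSECT only reports rows present in both tables.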
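You can check the arithmetic behind this with a small sqlite3 sketch on invented single-column tables. The rows UNION drops add up to three parts:

- the duplicates inside A
- the duplicates inside B
- one copy of each distinct row common to both

Only the last part is what INTERSECT counts. Applied to the question, and assuming 89 is the count of A plus B minus C, that would mean 89 - 80 = 9 rows are duplicates inside A and/or B:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (v INTEGER);
    CREATE TABLE B (v INTEGER);
    INSERT INTO A VALUES (1), (1), (2), (3);  -- 1 is duplicated inside A
    INSERT INTO B VALUES (3), (4), (4), (5);  -- 4 is duplicated inside B; 3 is in both
""")


def scalar(sql: str) -> int:
    return conn.execute(sql).fetchone()[0]


total = scalar("SELECT (SELECT COUNT(*) FROM A) + (SELECT COUNT(*) FROM B)")
union_rows = scalar("SELECT COUNT(*) FROM (SELECT * FROM A UNION SELECT * FROM B)")
intersect_rows = scalar("SELECT COUNT(*) FROM (SELECT * FROM A INTERSECT SELECT * FROM B)")
dups_in_a = scalar("SELECT COUNT(*) - (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM A)) FROM A")
dups_in_b = scalar("SELECT COUNT(*) - (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM B)) FROM B")

lost_by_union = total - union_rows
print(lost_by_union, intersect_rows, dups_in_a + dups_in_b)  # 3 1 2
```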
Duplicate rows are usually a data-quality issue or an outright bug. I usually mitigate this risk by adding unique indexes on all columns and column sets that are supposed to be unique. I especially make sure that every table has a primary key if that is at all possible.
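Here is a minimal sketch of that mitigation in Python's sqlite3 module; the table and index names are hypothetical. Once a unique index exists, the database rejects a duplicate instead of silently storing it. CREATE UNIQUE INDEX itself is widely supported, though the available options differ between DBMSs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (email TEXT, name TEXT)")
# The column (or column set) that is supposed to be unique gets a unique index.
conn.execute("CREATE UNIQUE INDEX ux_customer_email ON customer (email)")

conn.execute("INSERT INTO customer VALUES ('ann@example.com', 'Ann')")
try:
    conn.execute("INSERT INTO customer VALUES ('ann@example.com', 'Ann again')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 1
```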

Combining the results of two SQL queries as separate columns

I have two queries that return separate result sets, and both return the correct output.
How can I combine these two queries into one so that I can get one single result set with each result in a separate column?
Query 1:
SELECT SUM(Fdays) AS fDaysSum From tblFieldDays WHERE tblFieldDays.NameCode=35 AND tblFieldDays.WeekEnding=?
Query 2:
SELECT SUM(CHdays) AS hrsSum From tblChargeHours WHERE tblChargeHours.NameCode=35 AND tblChargeHours.WeekEnding=?
Thanks.
You can alias both queries and select from them in the outer query:
http://sqlfiddle.com/#!2/ca27b/1
SELECT x.a, y.b FROM (SELECT * from a) as x, (SELECT * FROM b) as y
You can use a CROSS JOIN:
SELECT *
FROM ( SELECT SUM(Fdays) AS fDaysSum
FROM tblFieldDays
WHERE tblFieldDays.NameCode=35
AND tblFieldDays.WeekEnding=1) A -- use your real query here
CROSS JOIN (SELECT SUM(CHdays) AS hrsSum
FROM tblChargeHours
WHERE tblChargeHours.NameCode=35
AND tblChargeHours.WeekEnding=1) B -- use your real query here
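Below is a runnable version of the CROSS JOIN approach using Python's sqlite3 module. It turns the ? placeholder from the question into a named parameter (:week) so the same value is bound to both subqueries. The table contents and the week value are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblFieldDays (NameCode INTEGER, WeekEnding TEXT, Fdays INTEGER);
    CREATE TABLE tblChargeHours (NameCode INTEGER, WeekEnding TEXT, CHdays REAL);
    INSERT INTO tblFieldDays VALUES (35, '2024-01-05', 2), (35, '2024-01-05', 3),
                                    (35, '2024-01-12', 4), (12, '2024-01-05', 9);
    INSERT INTO tblChargeHours VALUES (35, '2024-01-05', 7.5), (35, '2024-01-05', 8),
                                      (35, '2024-01-12', 1);
""")

# Each subquery returns exactly one row, so the CROSS JOIN yields one row, two columns.
sql = """
    SELECT A.fDaysSum, B.hrsSum
    FROM (SELECT SUM(Fdays) AS fDaysSum
          FROM tblFieldDays
          WHERE NameCode = 35 AND WeekEnding = :week) A
    CROSS JOIN (SELECT SUM(CHdays) AS hrsSum
                FROM tblChargeHours
                WHERE NameCode = 35 AND WeekEnding = :week) B
"""
row = conn.execute(sql, {"week": "2024-01-05"}).fetchone()
print(row)  # (5, 15.5)
```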
You could also use CTEs to gather the groups of information you want and join them together, if you want them in the same row. For example (the exact syntax depends on your DBMS):
WITH group1 AS (
SELECT testA
FROM tableA
),
group2 AS (
SELECT testB
FROM tableB
)
SELECT *
FROM group1
JOIN group2 ON group1.testA = group2.testB --your choice of join
;
Decide what kind of JOIN you want based on the data you are pulling, and make sure the groups share the field you join on so that everything ends up in a single row. If you have multiple columns, give them all clear names so you know which is which. CTEs mainly help readability. Whether they perform better than inline SELECTs depends on the database's optimizer, so check the execution plan if performance matters. Hope this helps.
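Here is a sketch of the CTE variant where the groups share a column to join on. Each CTE aggregates per NameCode, and a LEFT JOIN keeps employees who have field days but no charge hours. It uses invented versions of the two tables from the question and runs under Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblFieldDays (NameCode INTEGER, WeekEnding TEXT, Fdays INTEGER);
    CREATE TABLE tblChargeHours (NameCode INTEGER, WeekEnding TEXT, CHdays REAL);
    INSERT INTO tblFieldDays VALUES (35, '2024-01-05', 2), (35, '2024-01-05', 3),
                                    (12, '2024-01-05', 9);
    INSERT INTO tblChargeHours VALUES (35, '2024-01-05', 7.5), (35, '2024-01-05', 8);
""")

sql = """
    WITH field AS (
        SELECT NameCode, SUM(Fdays) AS fDaysSum
        FROM tblFieldDays
        WHERE WeekEnding = :week
        GROUP BY NameCode
    ),
    charge AS (
        SELECT NameCode, SUM(CHdays) AS hrsSum
        FROM tblChargeHours
        WHERE WeekEnding = :week
        GROUP BY NameCode
    )
    SELECT field.NameCode, field.fDaysSum, charge.hrsSum
    FROM field
    LEFT JOIN charge ON charge.NameCode = field.NameCode  -- shared join column
    ORDER BY field.NameCode
"""
rows = conn.execute(sql, {"week": "2024-01-05"}).fetchall()
print(rows)  # [(12, 9, None), (35, 5, 15.5)]
```

With an inner JOIN instead, employee 12 would disappear from the result because it has no charge hours that week.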
How can I combine the following four queries into a single query?
Show the following in one query:
1. total number of cases pending
2. cases filed during this month (based on SYSDATE)
3. total number of cases (1 + 2)
4. number of cases disposed (where nsc = disposed)
5. number of cases pending (where nsc <> disposed)
nsc = nature of case
The report is taken on the 6th of every month (the monthly report covers the 5th of the previous month to the 5th of the current month).

SQL select from data in query where this data is not already in the database?

Before making a web service call, I want to check my database for records that I have already recorded.
Here is what I imagine the query looks like; I just can't figure out the syntax.
SELECT *
FROM (1,2,3,4) as temp_table
WHERE temp_table.id
LEFT JOIN table ON id IS NULL
Is there a way to do this? What is a query like this called?
I want to pass a list of IDs to MySQL and have it return the IDs that are not already in the database.
Use:
SELECT x.id
FROM (SELECT @param_1 AS id
FROM DUAL
UNION ALL
SELECT @param_2
FROM DUAL
UNION ALL
SELECT @param_3
FROM DUAL
UNION ALL
SELECT @param_4
FROM DUAL) x
LEFT JOIN TABLE t ON t.id = x.id
WHERE t.id IS NULL
If you need to support a varying number of parameters, you can either use:
a temporary table to populate & join to
MySQL's Prepared Statements to dynamically construct the UNION ALL statement
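Below is a sketch of the second option: building the UNION ALL list dynamically with one bound placeholder per ID. It uses Python's sqlite3 module in place of MySQL. SQLite needs no FROM DUAL, and the table name records is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO records (id) VALUES (?)", [(1,), (3,)])


def missing_ids(conn: sqlite3.Connection, ids: list) -> list:
    """Return the ids from `ids` that have no row in `records`."""
    if not ids:
        return []
    # One "SELECT ? AS id" per value; the values are bound, never pasted into the SQL.
    derived = " UNION ALL ".join(["SELECT ? AS id"] * len(ids))
    sql = f"""
        SELECT x.id
        FROM ({derived}) x
        LEFT JOIN records t ON t.id = x.id
        WHERE t.id IS NULL
        ORDER BY x.id
    """
    return [row[0] for row in conn.execute(sql, ids)]


print(missing_ids(conn, [1, 2, 3, 4]))  # [2, 4]
```

For very long lists, the temporary-table option is likely the better choice, since databases limit the number of bound parameters per statement.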
To confirm I've understood correctly: you want to pass in a list of numbers and see which of those numbers aren't present in the existing table? In effect:
SELECT Item
FROM IDList I
LEFT JOIN TABLE T ON I.Item=T.ID
WHERE T.ID IS NULL
It looks like you're OK with building this query on the fly, in which case you can do this with a numbers (tally) table by changing the above into:
SELECT Number
FROM (SELECT Number FROM Numbers WHERE Number IN (1,2,3,4)) I
LEFT JOIN TABLE T ON I.Number=T.ID
WHERE T.ID IS NULL
This is relatively prone to SQL injection attacks, though, because of the way the query is built. It would be better to pass in '1,2,3,4' as a string and split it into its parts, so the list of numbers to join against is generated in a safer way. For an example of how to do that, see http://www.sqlteam.com/article/parsing-csv-values-into-multiple-rows
All of this presumes you've got a numbers / tally table in your database, but they're sufficiently useful in general that I'd strongly recommend you do.
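One way to combine a numbers table with a single string parameter is sketched below in Python's sqlite3 module. The tables Numbers and records and the 1 to 1000 range are assumptions. The CSV string is bound as a parameter and each number is matched with a LIKE pattern, so no SQL text is assembled from user input.

This is a swapped-in technique, not the method from the linked article, and the simple pattern assumes the string contains no spaces:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Numbers (Number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(n,) for n in range(1, 1001)])
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO records VALUES (?)", [(1,), (3,)])

# ',1,2,3,4,' LIKE '%,3,%' is true, so every listed number is picked out of Numbers.
sql = """
    SELECT n.Number
    FROM Numbers n
    LEFT JOIN records t ON t.id = n.Number
    WHERE ',' || :csv || ',' LIKE '%,' || n.Number || ',%'
      AND t.id IS NULL
    ORDER BY n.Number
"""
missing = [row[0] for row in conn.execute(sql, {"csv": "1,2,3,4"})]
print(missing)  # [2, 4]
```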
SELECT * FROM table where id NOT IN (1,2,3,4)
I would probably just do:
SELECT id
FROM table
WHERE id IN (1,2,3,4);
And then process the list of results, removing any returned by the query from your list of "records to submit".
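Here is a sketch of that client-side approach in Python's sqlite3 module; the table name records is hypothetical. The IN list uses one bound placeholder per ID, and the set difference happens in the application:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO records VALUES (?)", [(1,), (3,)])

candidates = [1, 2, 3, 4]  # the "records to submit"
placeholders = ", ".join("?" for _ in candidates)
existing = {row[0] for row in conn.execute(
    f"SELECT id FROM records WHERE id IN ({placeholders})", candidates)}

# Keep only the candidates the database does not know about yet.
to_submit = [i for i in candidates if i not in existing]
print(to_submit)  # [2, 4]
```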
How about a nested query? This may work. If not, it may point you in the right direction.
SELECT * FROM table WHERE id NOT IN (
SELECT id FROM table WHERE 1
);

Returning more than one value from a sql statement

I was looking at SQL inner queries (a bit like the SQL equivalent of a C# anonymous method) and was wondering: can I return more than one value from a query?
For example, can I return the number of rows in a table as one output value and the number of distinct rows as another?
Also, how does distinct work? Is this based on whether one field may be the same as another (thus classified as "distinct")?
I am using SQL Server 2005. Would there be a performance penalty if I return two values from one query rather than one?
Thanks
You could answer your first question with this:
SELECT
COUNT(field1),
COUNT(DISTINCT field2)
FROM table
(For the first field, use COUNT(*) instead if rows where the field is NULL should be counted too; COUNT(field1) skips NULLs.)
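Here is a small sqlite3 sketch, with invented data, that returns several values from one query and shows how NULLs affect each COUNT form. SQL Server accepts the same three expressions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,), (2,), (None,)])

# One query, several output values: all rows, non-NULL values, distinct non-NULL values.
row = conn.execute(
    "SELECT COUNT(*), COUNT(field1), COUNT(DISTINCT field1) FROM t"
).fetchone()
print(row)  # (4, 3, 2)
```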
DISTINCT means what the word says: it eliminates duplicate rows from the result.
Whether returning 2 values instead of 1 costs more depends on what the values are, whether they are indexed, and other factors.
If you mean subqueries within the SELECT list, then no: each of those can return only one value. If you want more than one value, you have to use the subquery as a join.
If the inner query is inline in the SELECT, you may struggle to select multiple values. However, it is often possible to JOIN to a sub-query instead; that way, the sub-query can be named and you can get multiple results:
SELECT a.Foo, a.Bar, x.[Count], x.[Avg]
FROM a
INNER JOIN (SELECT Something, COUNT(1) AS [Count], AVG(Amount) AS [Avg]
FROM b -- b and Amount are placeholder names
GROUP BY Something) x
ON x.Something = a.Something
Which might help.
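The derived-table join runs under Python's sqlite3 module once the placeholders are filled in. The tables a and b, the columns Something, Foo, Bar and Amount, and the data are all invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (Something TEXT, Foo TEXT, Bar TEXT);
    CREATE TABLE b (Something TEXT, Amount REAL);
    INSERT INTO a VALUES ('k1', 'foo1', 'bar1'), ('k2', 'foo2', 'bar2');
    INSERT INTO b VALUES ('k1', 10), ('k1', 20), ('k2', 5);
""")

# The derived table x returns two values per key (count and average),
# which a scalar subquery in the SELECT list could not do in one go.
sql = """
    SELECT a.Foo, a.Bar, x.cnt, x.avg_amount
    FROM a
    INNER JOIN (SELECT Something, COUNT(1) AS cnt, AVG(Amount) AS avg_amount
                FROM b
                GROUP BY Something) x
      ON x.Something = a.Something
    ORDER BY a.Foo
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('foo1', 'bar1', 2, 15.0), ('foo2', 'bar2', 1, 5.0)]
```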
DISTINCT does what it says. IIRC, you can SELECT COUNT(DISTINCT Foo) etc to query distinct data.
You can return multiple results in three ways (off the top of my head):
By having a SELECT with multiple values, e.g. select col1, col2, col3
With multiple queries, e.g. select 1; select '2'; select colA. You move from one result set to the next in a SqlDataReader by calling .NextResult().
Using output parameters: declare the parameters before executing the query, then read their values afterwards, e.g. set @param1 = '2' in the SQL and string myparam2 = sqlcommand.Parameters["@param1"].Value.ToString() in C#.
DISTINCT filters the resulting rows so that each one is unique.
Inner queries in the form:
SELECT * FROM tbl WHERE fld in (SELECT fld2 FROM tbl2 WHERE tbl.fld = tbl2.fld2)
cannot return multiple columns: they may return many rows, but only the single column that is compared with fld. When you need multiple columns from a secondary query, you usually need to do an inner join on the other query.
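To count rows and distinct rows in one query:
SELECT (SELECT COUNT(*) FROM table) AS total_rows,
(SELECT COUNT(*) FROM (SELECT DISTINCT * FROM table) d) AS distinct_rows
will return a dataset with one row containing two columns. Column 1 is the total number of rows in the table. Column 2 counts only distinct rows.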
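SQL Server's COUNT accepts DISTINCT only together with an expression, not with *, so distinct whole rows have to be counted through a derived table. Here is a sketch in Python's sqlite3 module with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (1, "x"), (2, "y")])

# Distinct whole rows are counted through a derived table of SELECT DISTINCT *.
row = conn.execute("""
    SELECT (SELECT COUNT(*) FROM t) AS total_rows,
           (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM t)) AS distinct_rows
""").fetchone()
print(row)  # (3, 2)
```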
DISTINCT means the returned dataset will not have any duplicate rows. It applies to the whole row and usually appears only once, directly after SELECT. Thus a query such as:
SELECT distinct a, b, c FROM table
might have this result:
a1 b1 c1
a1 b1 c2
a1 b2 c2
a1 b3 c2
Note that individual values repeat across the result set, but each row as a whole is unique.
I'm not sure what your last question means. A query should return all the data relevant to it. As for speed, only benchmarking can tell you which approach is faster.