Difference between tables with the same structure - sql

I have two tables with the same structure and with slightly different rows - Table A, and Table B.
I would like to extract all the rows that are contained in table A but not in Table B.
CAn you help me do that?
By the way - Table A is in definition form, it does not previously created.
And additionaly - I have 15 sql scripts to analyse.
I would like to find some software that can help me with visualization of the entire proces (composed of 15 sql scripts).
Can you suggest something good?

try
SELECT * FROM Table_A
EXCEPT
SELECT * FROM Table_B
See http://en.wikipedia.org/wiki/Set_operations_%28SQL%29#EXCEPT_operator

One way is to use an left outer join this selects all in the first table and then matches these in the second. If the extra columns coming from the second table a NULL then there is no matching record in the second.
Suppose columns a to c are unique in both tables
select a.*
from tableA a
left outer join tableB on a.a = b.a and ... a.c = b.c
where b.a is null and ... and b.c is null

I was facing on a regular basis the same problem so I wrote my own software that can handle large databases (dozens of columns, tens of thousands of line) efficiently. I imagine you solved your problem but I post here if anybody else face the same problem.
The software is in R and can query and save to a MySQL server. To test it out though it may be easier to export your bases to two csv files as configuring the MySQL link (via RMySQL) may take a little time. Check it out on gitHub.
We use it on a very regular basis in my team and are happy with it.

A pain in the butt to write the query manually, so there are tools (like RedGate's SQL Compare) that do it for you. But...
SELECT
A.*
,B.*
FROM
A LEFT OUTER JOIN B
ON A.Field1 = B.Field1
AND A.Field2 = B.Field2
... -- join on each field
WHERE
B.Field1 IS NULL OR
B.Field2 IS NULL OR
... -- check for any NULL fields in B
If you're not interested in all data differences and only key differences then just change the list of fields you join on and filter on to the key fields.

Related

Oracle: Use only few tables in WHERE clause but mentioned more tables in 'FROM' in a jon SQL

What will happen in an Oracle SQL join if I don't use all the tables in the WHERE clause that were mentioned in the FROM clause?
Example:
SELECT A.*
FROM A, B, C, D
WHERE A.col1 = B.col1;
Here I didn't use the C and D tables in the WHERE clause, even though I mentioned them in FROM. Is this OK? Are there any adverse performance issues?
It is poor practice to use that syntax at all. The FROM A,B,C,D syntax has been obsolete since 1992... more than 30 YEARS now. There's no excuse anymore. Instead, every join should always use the JOIN keyword, and specify any join conditions in the ON clause. The better way to write the query looks like this:
SELECT A.*
FROM A
INNER JOIN B ON A.col1 = B.col1
CROSS JOIN C
CROSS JOIN D;
Now we can also see what happens in the question. The query will still run if you fail to specify any conditions for certain tables, but it has the effect of using a CROSS JOIN: the results will include every possible combination of rows from every included relation (where the "A,B" part counts as one relation). If each of the three parts of those joins (A&B, C, D) have just 100 rows, the result set will have 1,000,000 rows (100 * 100 * 100). This is rarely going to give the results you expect or intend, and it's especially suspect when the SELECT clause isn't looking at any of the fields from the uncorrelated tables.
Any table lacking join definition will result in a Cartesian product - every row in the intermediate rowset before the join will match every row in the target table. So if you have 10,000 rows and it joins without any join predicate to a table of 10,000 rows, you will get 100,000,000 rows as a result. There are only a few rare circumstances where this is what you want. At very large volumes it can cause havoc for the database, and DBAs are likely to lock your account.
If you don't want to use a table, exclude it entirely from your SQL. If you can't for reason due to some constraint we don't know about, then include the proper join predicates to every table in your WHERE clause and simply don't list any of their columns in your SELECT clause. If there's a cost to the join and you don't need anything from it and again for some very strange reason can't leave the table out completely from your SQL (this does occasionally happen in reusable code), then you can disable the joins by making the predicates always false. Remember to use outer joins if you do this.
Native Oracle method:
WITH data AS (SELECT ROWNUM col FROM dual CONNECT BY LEVEL < 10) -- test data
SELECT A.*
FROM data a,
data b,
data c,
data d
WHERE a.col = b.col
AND DECODE('Y','Y',NULL,a.col) = c.col(+)
AND DECODE('Y','Y',NULL,a.col) = d.col(+)
ANSI style:
WITH data AS (SELECT ROWNUM col FROM dual CONNECT BY LEVEL < 10)
SELECT A.*
FROM data a
INNER JOIN data b ON a.col = b.col
LEFT OUTER JOIN data c ON DECODE('Y','Y',NULL,a.col) = b.col
LEFT OUTER JOIN data d ON DECODE('Y','Y',NULL,a.col) = d.col
You can plug in a variable for the first Y that you set to Y or N (e.g. var_disable_join). This will bypass the join and avoid both the associated performance penalty and the Cartesian product effect. But again, I want to reiterate, this is an advanced hack and is probably NOT what you need. Simply leaving out the unwanted tables it the right approach 95% of the time.

Is exist a simple way to select all data from specific table

I have two tables: A, B. I need to select all data from B.
I could do this like
SELECT id, b1, b2, ... b20 FROM A,B WHERE A.id = B.id;
it's not cool solution. I will need to update this statement if I modify B's database. Is exist something like ?
SELECT *(B)
It selected all data from B and didn't selected any data from A.
My databases
A
id
a1
a2
...
a20
B
id
b1
b2
...
b20
So if you want create database seriously you shouldn't see on complexity of way but on efficiency and speed. So my advice to you is use JOIN that is the best solution for selecting data from two and more tables, because this way is fast as possible, at least for my more cleaner like for example inserted select.
You wrote: I need to select all data from B this means SELECT * FROM B not that you wrote.
My advice to your is using this:
SELECT * FROM A <INNER / LEFT / RIGHT / NATURAL> JOIN B ON A.id = B.id;
or to select specific columns
SELECT A.column1, A.column2, ... FROM A <INNER / LEFT / RIGHT / NATURAL> JOIN B ON A.id = B.id;
Note:
NATURAL JOIN will work in above example since the primary key
and the foreign key in the two tables have the same name. So you must be very
careful in using NATURAL JOIN queries in the absence of properly
matched columns.
So you really should think about how you will create database and how you will working with database, how you will pulling data for View from database, how you will insert new potential data to database etc.
Regards man!
Use following query:
SELECT * FROM B;
or
SELECT * FROM B INNER JOIN A ON A.id = B.id;
if you want to join tables A and B.
I suspect that the others have sufficiently answered your question about selecting all fields from table B. That's great, as you really should understand the SQL basics. If they haven't, I'd also advise that you check out SQLite.org site for a clarification on SQL syntax understood by SQLite.
But, assuming you've answered your question, I just want to voice two words of caution about using the asterisk syntax.
First, what if, at some later date, you add a column to B that is a big hairy blob (e.g. a multimegabyte image). If you use the * (or B.*) syntax to retrieve all of the columns from B, you may be retrieving a ton of information you might not need for your particular function. Don't retrieve data from SQLite if you don't need it.
Second, is your Objective C retrieving the data from your select statement on the basis of the column names of the result, or based upon the index number of the column in question. If you're doing the latter, then using the * syntax can be dangerous, because you can break your code if the physical order of columns in your table ever changes.
Using named columns can solve the problem of memory/performance issues in terms of retrieving too much data, as well as isolating the Objective C from the physical implementation of the table in SQLite. Generally, I would not advise developers to use the * syntax when retrieving data from a SQL database. Perhaps this isn't an issue for a trivial project, but as projects become more complicated, you may want to think carefully about the implications of the * syntax.
I don't know if the following query would work in sqlite, I know it works in Oracle, but you can give it a try...
SELECT B.* FROM A,B WHERE A.id = B.id;

SQL Method of checking that INNER / LEFT join doesn't duplicate rows

Is there a good or standard SQL method of asserting that a join does not duplicate any rows (produces 0 or 1 copies of the source table row)? Assert as in causes the query to fail or otherwise indicate that there are duplicate rows.
A common problem in a lot of queries is when a table is expected to be 1:1 with another table, but there might exist 2 rows that match the join criteria. This can cause errors that are hard to track down, especially for people not necessarily entirely familiar with the tables.
It seems like there should be something simple and elegant - this would be very easy for the SQL engine to detect (have I already joined this source row to a row in the other table? ok, error out) but I can't seem to find anything on this. I'm aware that there are long / intrusive solutions to this problem, but for many ad hoc queries those just aren't very fun to work out.
EDIT / CLARIFICATION: I'm looking for a one-step query-level fix. Not a verification step on the results of that query.
If you are only testing for linked rows rather than requiring output, then you'd use EXISTS.
More correctly, you need a "semi-join" but this isn't supported by most RDBMS unless as EXISTS
SELECT a.*
FROM TableA a
WHERE EXISTS (SELECT * FROM TableB b WHERE a.id = b.id)
Also see:
Using 'IN' with a sub-query in SQL Statements
EXISTS vs JOIN and use of EXISTS clause
SELECT JoinField
FROM MyJoinTable
GROUP BY JoinField
HAVING COUNT(*) > 1
LIMIT 1
Is that simple enough? Don't have Postgres but I think it's valid syntax.
Something along the lines of
SELECT a.id, COUNT(b.id)
FROM TableA a
JOIN TableB b ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.id) > 1
Should return rows in TableA that have more than one associated row in TableB.

mysql query add [something]_ infront of all table rows when grabbing multiple tables

Example of what I want to do:
select a.* as a_, b. as b_* FROM a, b WHERE a.id = b.id
Cannot be done automatically in SQL. Either type or generate the following:
SELECT a.field AS a_field, a.field2 AS a_field2, ...
At any rate, listing the fields by hand is good practice anyways.

SQL (any) Request for insight on a query optimization

I have a particularly slow query due to the vast amount of information being joined together. However I needed to add a where clause in the shape of id in (select id from table).
I want to know if there is any gain from the following, and more pressing, will it even give the desired results.
select a.* from a where a.id in (select id from b where b.id = a.id)
as an alternative to:
select a.* from a where a.id in (select id from b)
Update:
MySQL
Can't be more specific sorry
table a is effectively a join between 7 different tables.
use of * is for examples
Edit, b doesn't get selected
Your question was about the difference between these two:
select a.* from a where a.id in (select id from b where b.id = a.id)
select a.* from a where a.id in (select id from b)
The former is a correlated subquery. It may cause MySQL to execute the subquery for each row of a.
The latter is a non-correlated subquery. MySQL should be able to execute it once and cache the results for comparison against each row of a.
I would use the latter.
Both queries you list are the equivalent of:
select a.*
from a
inner join b on b.id = a.id
Almost all optimizers will execute them in the same way.
You could post a real execution plan, and someone here might give you a way to speed it up. It helps if you specify what database server you are using.
YMMV, but I've often found using EXISTS instead of IN makes queries run faster.
SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
Of course, without seeing the rest of the query and the context, this may not make the query any faster.
JOINing may be a more preferable option, but if a.id appears more than once in the id column of b, you would have to throw a DISTINCT in there, and you more than likely go backwards in terms of optimization.
I would never use a subquery like this. A join would be much faster.
select a.*
from a
join b on a.id = b.id
Of course don't use select * either (especially never use it when doing a join as at least one field is repeated) and it wastes network resources to send unnneeded data.
Have you looked at the execution plan?
How about
select a.*
from a
inner join b
on a.id = b.id
presumably the id fields are primary keys?
Select a.* from a
inner join (Select distinct id from b) c
on a.ID = c.AssetID
I tried all 3 versions and they ran about the same. The execution plan was the same (inner join, IN (with and without where clause in subquery), Exists)
Since you are not selecting any other fields from B, I prefer to use the Where IN(Select...) Anyone would look at the query and know what you are trying to do (Only show in a if in b.).
your problem is most likely in the seven tables within "a"
make the FROM table contain the "a.id"
make the next join: inner join b on a.id = b.id
then join in the other six tables.
you really need to show the entire query, list all indexes, and approximate row counts of each table if you want real help