Suppose I have 2 tables, each tables has N columns. There are NO duplicate rows in table1
And now we want to know what datasets in table2 (including duplicates) are also contained in table1.
I tried
select * from table1
intersect
select * from table2
But this only gives me unique rows that are in both tables. But I don't want unique rows, are want to see all rows in table2 that are in table1...
Keep in mind!! I cannot do
select *
from table1 a, table b
where a.table1col = b.table2col
...because I don't know the number of columns of the tables at runtime.
Sure I could do something with dynamic SQL and iterate over the column numbers but I'm asking this precisely because it seems too simple a query for that kind of stuff..
Example:
create table table1 (table1col int)
create table table2 (table2col int)
insert into table1 values (8)
insert into table1 values (7)
insert into table2 values (1)
insert into table2 values (8)
insert into table2 values (7)
insert into table2 values (7)
insert into table2 values (2)
insert into table2 values (9)
I want my query then to return:
8
7
7
If the amount of columns is not know, you will have to resort to a value computed over a row to make a match.
One such function is CHECKSUM.
Returns the checksum value computed over a row of a table, or over a
list of expressions. CHECKSUM is intended for use in building hash
indices.
SQL Statement
SELECT tm.*
FROM (
SELECT CS = CHECKSUM(*)
FROM Table2
) tm
INNER JOIN (
SELECT CS = CHECKSUM(*)
FROM Table2
INTERSECT
SELECT CHECKSUM(*)
FROM Table1
) ti ON ti.CS = tm.CS
Note that CHECKSUM might introduce collisions. You will have to test for that before doing any operation on your data.
Edit
In case you are using SQL Server 2005, you might make this a bit more robust by throwing in HASH_BYTES.
The downside of HASH_BYTESis that you need to specify the columns on which you want to operate but for all the columns you do known up-front, you could use this to prevent collisions.
EXCEPT vs INTERSECT - link
EXCEPT returns any distinct values from the left query that are not also found on the right query.
INTERSECT returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.
Maybe EXCEPT can solve your problem
Related
Let's say var_Environment = QA1,QA2.
I have two tables table 1 and table 2. Table 1 has some column name var_Environment =QA1; where as table 2 does have the same column name but no values
Now my task is, I need to copy all the values from table1 to table2 only if condition meets var_Environment =QA2 How will I perform this operation ? Can I use Union All with some condition? Can someone help?
You can use INSERT INTO SELECT like below.
The WHERE clause may need adjustment, so run the the SELECT query by itself to check if the correct rows are selected.
INSERT INTO table2
SELECT var_Environment
FROM Table1
WHERE var_Environment LIKE '%QA2%'
I'm trying to append two tables in MS Access at the moment. This is my SQL View of my Query at the moment:
INSERT INTO MainTable
SELECT
FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University;
Where "University" is the only field name that would have similarities between the two tables. When I try and run the query, I get this error:
Query must have at least one destination field.
I assumed that the INSERT INTO MainTable portion of my SQL was defining the destination, but apparently I am wrong. How can I specify my destination?
You must select something from your select statement.
INSERT INTO MainTable
SELECT col1, col2
FROM Table1 INNER JOIN Table2 ON Table1.University = Table2.University;
Besides Luke Ford's answer (which is correct), there's another gotcha to consider:
MS Access (at least Access 2000, where I just tested it) seems to match the columns by name.
In other words, when you execute the query from Luke's answer:
INSERT INTO MainTable
SELECT col1, col2
FROM ...
...MS Access assumes that MainTable has two columns named col1 and col2, and tries to insert col1 from your query into col1 in MainTable, and so on.
If the column names in MainTable are different, you need to specify them in the INSERT clause.
Let's say the columns in MainTable are named foo and bar, then the query needs to look like this:
INSERT INTO MainTable (foo, bar)
SELECT col1, col2
FROM ...
As other users have mentioned, your SELECT statement is empty. If you'd like to select more that just col1, col2, however, that is possible. If you want to select all columns in your two tables that are to be appended, you can use SELECT *, which will select everything in the tables.
I have two table one is having all field VARCHAR2 but other having different type for different data.
For Example :
Table One
==========================
Col 1 VARCHAR2 UNIQUE KEY
Col 2 VARCHAR2
Col 3 VARCHAR2
===========================
Table Two
==========================
Col One VARCHAR2 UNIQUE KEY
Col Two TIMESTAMP
Col Three NUMBER
==========================
we are having one mapping table. it denotes which column of Table One has to compare with which column of Table Two.
For Example
Mapping Table
==============================
Table One Table Two
==============================
Col 1 Col One
Col 2 Col Three
Col 3 Col Two
==============================
Now with the help of UNIQUE KEY of TABLE ONE we have to find same row in TABLE TWO and compare rows column by column and get changes in data.
Currently we are using java program for comparing data row by row and column by column and getting changes between data in rows with same UNIQUE KEY. it is working fine but taking too much time as we are having 100000 records in DB.
Now my question is : is there any way i can compare data at SQL level and get changes in data?
You can do it 'manually' with a query like this: It's a lot of work, but there are only three different types of checks you need to do, so it's not very complex:
select
*
from
Table1 t1
full outer join Table2 t2 on t2.ID = t1.ID
where
-- Check ID, either record does not exist in either table.
t1.ID is null or
t2.ID = null or
-- Not nullable field can be easily compared.
t1.NotNullableField1 <> t2.NotNUllableField1 or
-- Nullable field is slightly more work.
t1.NullableField1 <> t2.NullableField1 or
(t1.NullableField1 is null and t2.NullableField1 is not null) or
(t1.NullableField1 is not null and t2.NullableField1 is null)
Another solution is to use MINUS, which is a bit like UNION, only it returns a dataset minus the records in a second dataset:
select * from Table1 t1
MINUS
select * from Table2 t2
This works only one way (which might be fine for your purpose), but you can also combine it with UNION to make it bidirectional.
select
*
from
( select * from Table1
MINUS
select * from Table2)
UNION ALL
( select * from Table2
MINUS
select * from Table1)
The output of both solutions is a bit different.
In the FULL OUTER JOIN query, the IDs will be joined and the values of the matching rows will be displayed next to each other as a single row.
In the MINUS query, the result will be presented as a single dataset. If a record does not exist in either one table, it will be displayed. If a record (ID) exists in both tables, but other fields are different, you will get both rows. So it's a bit harder to compare them.
See: http://www.techonthenet.com/oracle/minus.php
I have two tables in my SQLite Database (dummy names):
Table 1: FileID F_Property1 F_Property2 ...
Table 2: PointID ForeignKey(fileid) P_Property1 P_Property2 ...
The entries in Table2 all have a foreign key column that references an entry in Table1.
I now would like to select entries from Table2 where for example F_Property1 of the referenced file in Table1 has a specific value.
I tried something naive:
select * from Table2 where fileid=(select FileID from Table1 where F_Property1 > 1)
Now this actually works..kind of. It selects a correct file id from Table1 and returns entries from Table2 with this ID. But it only uses the first returned ID. What I need it to do is basically connect the returned IDs from the inner select by OR so it returns data for all the IDs.
How can I do this? I think it is some kind of cross-table-query like what is asked here What is the proper syntax for a cross-table SQL query? but these answers contain no explaination of what they are actually doing so I'm struggeling with any implementation.
They are using JOIN statements, but wouldn't this mix entries from Table1 and Table2 together while only checking matching IDs in both tables? At least that is how I understand this http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
As you may have noticed from the style, I'm very new to using databases in general, so please forgive me if not everything is clear about what I want. Please leave a comment and I will try to improve the question if neccessary.
The = operator compares a single value against another, so it is assumed that the subquery returns only a single row.
To check whether a (column) value is in a set of values, use IN:
SELECT *
FROM Table2
WHERE fileid IN (SELECT FileID
FROM Table1
WHERE F_Property1 > 1)
The way joins work is not by "mixing" the data, but sort of combining them based on the key.
In your case (I am assuming the key field in Table 1 is unique), if you join those two tables on the primary key field, you will end up with all the entries in table2 plus all corresponding fields from table1. If you were doing this:
select * from table1, table2 where table1.fieldID=table2.foreignkey;
then, providing your key fields are set up right, you will end up with the following:
PointID ForeignKey(fileid) P_Property1 P_Property2 FileID F_Property1 F_Property2
The field values from table1 would be from matching rows.
Now, if you do this:
select table1.* from table 1, table2 where
table1.fieldID=table2.foreignkey and F_Property1>1;
Would essentially get the same set of records, but will only show the columns from the second table, and only those that satisfy the where condition for the first one.
Hope this helps :)
If I understood your question correctly this will get the job done.
Select t2.*
from table1 t1
inner join table2 t2 on t2.id = t1.id
where t1.Prop = 'SomeValue'
How do I append only distinct records from a master table to another table, when the master may have duplicates. Example - I only want the distinct records in the smaller table but I need to insert/append records to what I already have in the smaller table.
Ignoring any concurency issues:
insert into smaller (field, ... )
select distinct field, ... from bigger
except
select field, ... from smaller;
You can also rephrase it as a join:
insert into smaller (field, ... )
select distinct b.field, ...
from bigger b
left join smaller s on s.key = b.key
where s.key is NULL
If you don't like NOT EXISTS and EXCEPT/MINUS (cute, Remus!), you have also LEFT JOIN solution:
INSERT INTO smaller(a,b)
SELECT DISTINCT master.a, master.b FROM master
LEFT JOIN smaller ON smaller.a=master.a AND smaller.b=master.b
WHERE smaller.pkey IS NULL
You don't say the scale of the problem so I'll mention something I recently helped a friend with.
He works for an insurance company that provides supplemental Dental and Vision benefits management for other insurance companies. When they get a new client they also get a new database that can have 10's of millions of records. They wanted to identify all possible dupes with the data they already had in a master database of 100's of millions of records.
The solution we came up with was to identify two distinct combinations of field values (normalized in various ways) that would indicate a high probability of a dupe. We then created a new table containing MD5 hashes of the combos plus the id of the master record they applied to. The MD5 columns were indexed. All new records would have their combo hashes computed and if either of them had a collision with the master the new record would be kicked out to an exceptions file for some human to deal with it.
The speed of this surprised the hell out of us (in a nice way) and it has had a very acceptable false-positive rate.
You could use the distinct keyword to filter out duplicates:
insert into AnotherTable
(col1, col2, col3)
select distinct col1, col2, col3
from MasterTable
Based on Microsoft SQL Server and its Transact-SQL. Untested as always and the target_table has the same amount of rows as the source table (otherwise use columnnames between INSERT INTO and SELECT
INSERT INTO target_table
SELECT DISTINCT row1, row2
FROM source_table
WHERE NOT EXISTS(
SELECT row1, row2
FROM target_table)
Something like this would work for SQL Server (you don't mention what RDBMS you're using):
INSERT INTO table (col1, col2, col3)
SELECT DISTINCT t2.a, t2.b, t2.c
FROM table2 AS t2
WHERE NOT EXISTS (
SELECT 1
FROM table
WHERE table.col1 = t2.a AND table.col2 = t2.b AND table.col3 = t2.c
)
Tune where appropriate, depending on exactly what defines "distinctness" for your table.