Oracle: compare data between two different tables - SQL

I have two tables. In one, every field is VARCHAR2; in the other, each column has a type that fits its data.
For example:
Table One
==========================
Col 1 VARCHAR2 UNIQUE KEY
Col 2 VARCHAR2
Col 3 VARCHAR2
===========================
Table Two
==========================
Col One VARCHAR2 UNIQUE KEY
Col Two TIMESTAMP
Col Three NUMBER
==========================
We have a mapping table that specifies which column of Table One has to be compared with which column of Table Two.
For example:
Mapping Table
==============================
Table One | Table Two
==============================
Col 1 | Col One
Col 2 | Col Three
Col 3 | Col Two
==============================
Using the UNIQUE KEY of Table One, we have to find the same row in Table Two, compare the two rows column by column, and get the changes in the data.
Currently we use a Java program that compares the data row by row and column by column and reports the changes between rows with the same UNIQUE KEY. It works, but it takes too much time because we have 100,000 records in the DB.
My question is: is there any way I can compare the data at the SQL level and get the changes?

You can do it 'manually' with a query like this. It's a lot of work, but there are only three different types of checks you need to do, so it's not very complex:
select
*
from
Table1 t1
full outer join Table2 t2 on t2.ID = t1.ID
where
-- Check ID: the record is missing from one of the two tables.
t1.ID is null or
t2.ID is null or
-- Not nullable fields can be compared directly. If the column types
-- differ, convert one side first (e.g. with TO_NUMBER or TO_TIMESTAMP).
t1.NotNullableField1 <> t2.NotNullableField1 or
-- Nullable fields need extra checks, because NULL <> value is never true.
t1.NullableField1 <> t2.NullableField1 or
(t1.NullableField1 is null and t2.NullableField1 is not null) or
(t1.NullableField1 is not null and t2.NullableField1 is null)
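Since the question already has a mapping table, the WHERE clause does not have to be written by hand; it can be generated from the mapping. The sketch below assumes Python's sqlite3 module as a stand-in for Oracle, and the table and column names (table_one, table_two, col_mapping, col1...) are hypothetical. SQLite's null-safe `IS NOT` replaces the three nullable-field checks, CAST to TEXT plays the role of a type conversion such as TO_CHAR, and because FULL OUTER JOIN only arrived in SQLite 3.39, it is emulated with a LEFT JOIN in each direction.

```python
import sqlite3


def build_diff_query(mapping, key_pair):
    """Generate a query returning every key whose mapped columns differ.

    mapping holds (table_one column, table_two column) pairs read from the
    mapping table; key_pair is the pair holding the unique key. Column names
    are interpolated into the SQL, so they must come from the trusted mapping
    table, never from user input.
    """
    key_one, key_two = key_pair
    # "a IS NOT b" is SQLite's null-safe inequality: it covers the three
    # nullable-field checks. CAST ... AS TEXT converts the typed side.
    checks = " OR ".join(
        f"t1.{c1} IS NOT CAST(t2.{c2} AS TEXT)"
        for c1, c2 in mapping
        if (c1, c2) != key_pair
    )
    # FULL OUTER JOIN is emulated with a LEFT JOIN in each direction.
    return f"""
        SELECT t1.{key_one} FROM table_one t1
        LEFT JOIN table_two t2 ON t2.{key_two} = t1.{key_one}
        WHERE t2.{key_two} IS NULL OR {checks}
        UNION ALL
        SELECT t2.{key_two} FROM table_two t2
        LEFT JOIN table_one t1 ON t1.{key_one} = t2.{key_two}
        WHERE t1.{key_one} IS NULL
    """


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (col1 TEXT PRIMARY KEY, col2 TEXT, col3 TEXT);
    CREATE TABLE table_two (col_one TEXT PRIMARY KEY, col_two TEXT, col_three INTEGER);
    CREATE TABLE col_mapping (one_col TEXT, two_col TEXT);
    INSERT INTO col_mapping VALUES ('col1', 'col_one'), ('col2', 'col_three'), ('col3', 'col_two');
    INSERT INTO table_one VALUES ('1', '10', '2024-01-01'), ('2', '20', '2024-01-02'),
                                 ('3', '30', NULL), ('4', '40', '2024-01-04');
    INSERT INTO table_two VALUES ('1', '2024-01-01', 10), ('2', '2024-01-02', 25),
                                 ('3', '2024-01-03', 30), ('5', '2024-01-05', 50);
""")
mapping = conn.execute("SELECT one_col, two_col FROM col_mapping").fetchall()
changed = sorted(k for (k,) in conn.execute(build_diff_query(mapping, ("col1", "col_one"))))
print(changed)  # ['2', '3', '4', '5']
```

Key 2 differs in a number, key 3 has a NULL on one side, and keys 4 and 5 exist in only one table. An Oracle version of the generated text would use the explicit null checks from the query above and TO_CHAR or TO_NUMBER for the conversions.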
Another solution is to use MINUS, which is a bit like UNION, except that it returns the rows of the first dataset minus the rows found in the second dataset:
select * from Table1 t1
MINUS
select * from Table2 t2
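This works only one way (which might be fine for your purpose), but you can also combine two MINUS queries with UNION ALL to make it bidirectional. Note that set operators require the corresponding columns to have compatible types, so with the tables from the question you would list the columns in mapping order and convert the typed ones (for example with TO_CHAR) instead of using select *.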
select
*
from
( select * from Table1
MINUS
select * from Table2)
UNION ALL
( select * from Table2
MINUS
select * from Table1)
The output of the two solutions is a bit different.
In the FULL OUTER JOIN query, the rows are joined on ID, and the values of matching rows are displayed next to each other in a single row.
In the MINUS query, the result is presented as a single dataset. A record that exists in only one of the tables is displayed once. If a record (ID) exists in both tables but other fields differ, you get both rows, which makes them a bit harder to compare.
See: http://www.techonthenet.com/oracle/minus.php
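Here is the bidirectional MINUS idea as a runnable sketch, using Python's sqlite3 module, where the operator is spelled EXCEPT. The tables and rows are made up for illustration. The extra literal column naming the source table is an addition, not part of the answer; it shows which side each row came from, which makes the two rows for ID 2 easier to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'a'), (2, 'B'), (4, 'd');
""")

# EXCEPT is the standard (and SQLite) spelling of Oracle's MINUS.
# The literal "source" column records which side each differing row came from.
SYMMETRIC_DIFF = """
    SELECT 'only in table1' AS source, id, name FROM (
        SELECT id, name FROM table1 EXCEPT SELECT id, name FROM table2)
    UNION ALL
    SELECT 'only in table2', id, name FROM (
        SELECT id, name FROM table2 EXCEPT SELECT id, name FROM table1)
    ORDER BY 2, 1  -- by id, then by source
"""

rows = conn.execute(SYMMETRIC_DIFF).fetchall()
for row in rows:
    print(row)
```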

Related

How do I check whether rows in the same table share (intersect) data, when the table is huge, in SQL Server?

I have a table with the sample data below. I want to compare each record with all the other records in the same table and return the IDs of the records it collides with. The column contains comma-separated data, so if one record has 'A,C' as Name and another has 'A' (see the input below), they collide, because 'A' is common to both.
Likewise, if a record has no Name (it is NULL), it should collide with all the other records. Besides Name, I have around 10 similar columns to check.
Input
ID | Name
1 | A,C
2 | B
3 | A
4 | NULL
OUTPUT
ID | ColloidID
1 | 3
1 | 4
2 | 4
3 | 1
3 | 4
4 | 1
4 | 2
4 | 3
Problem: I have implemented the solution below, and it works as expected. However, it is only fast enough when the table holds little data (<100k rows); with millions of rows (e.g. >20M) it takes much more time and space.
SELECT DISTINCT A.ID,B.ID AS ColloidID
FROM #Temp1 A
CROSS APPLY #Temp1 B
WHERE A.ID<>B.ID
AND master.dbo.fIntersection(COALESCE(A.Name,B.Name,''),COALESCE(B.Name,A.Name,'')) = 1
Ideally, you should not store multiple pieces of information in a single column.
Be that as it may, you can use a nested EXISTS with STRING_SPLIT to compare the two columns.
SELECT t1.ID, t2.ID
FROM #Temp1 t1
JOIN #Temp1 t2 ON t2.ID <> t1.ID
AND (t1.Name IS NULL OR t2.Name IS NULL
OR EXISTS (SELECT 1
FROM STRING_SPLIT(t1.Name, ',') s1
JOIN STRING_SPLIT(t2.Name, ',') s2 ON s2.value = s1.value
)
)
ORDER BY
t1.ID,
t2.ID;
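Before tuning the query, it can help to pin down the matching rule in a few lines of code and check it against the expected output. The sketch below is a plain-Python reference of the rule (either Name is NULL, or the split values intersect), not a replacement for the SQL; the function name is made up. Like the nested EXISTS, it compares every pair, so it is only meant for small samples.

```python
from itertools import permutations


def colliding_pairs(names_by_id):
    """Return sorted (ID, ColloidID) pairs under the question's rules.

    names_by_id maps each ID to its comma-separated Name, or None for NULL.
    Two different IDs collide if either Name is NULL or their split values
    share an element, i.e. the nested EXISTS over two STRING_SPLIT calls.
    """
    parts = {
        i: None if name is None else set(name.split(","))
        for i, name in names_by_id.items()
    }
    return sorted(
        (a, b)
        for a, b in permutations(parts, 2)  # every ordered pair with a != b
        if parts[a] is None or parts[b] is None or parts[a] & parts[b]
    )


sample = {1: "A,C", 2: "B", 3: "A", 4: None}
print(colliding_pairs(sample))
# [(1, 3), (1, 4), (2, 4), (3, 1), (3, 4), (4, 1), (4, 2), (4, 3)]
```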
20M isn't a lot of data, provided a good database design is used, with proper indexes. This is definitely not a good design. It violates the most basic design rule - one value per field. As a result, it's impossible to index Name, forcing 4*10^14 comparisons.
The only way to get acceptable performance is to fix the design. To do that, Name has to be split into separate rows, stored in a table whose Name column is covered by an index or primary key:
create table #Id_Names (
ID bigint not null,
Name varchar(30) null,
INDEX IX_Id_Names (Name,ID)
);
GO
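-- OUTER APPLY keeps IDs whose Name is NULL: STRING_SPLIT(NULL, ',') returns no rows,
-- so CROSS APPLY would drop them and the IS NULL checks in the next query could never match.
INSERT INTO #Id_Names (ID,Name)
select ID,value
from #Temp1 t
OUTER APPLY STRING_SPLIT(Name,',');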
After that, the query simplifies to:
-- DISTINCT: an ID with several values (e.g. 'A,C') can match the same partner twice.
SELECT DISTINCT
t1.ID,t2.ID as ColloidID
FROM #Id_Names t1
INNER JOIN #Id_Names t2
ON t1.ID<>t2.ID
AND (t1.Name=t2.Name
OR t1.Name IS NULL
OR t2.Name IS NULL)
This can run a lot faster. The only real problem is the logic of treating NULL as a wildcard: each NULL matches the entire table. Since the table is joined to itself, each NULL row pairs with every other row in both directions, adding roughly 2 * 20M rows to the result, and the same relation appears twice, e.g. (1,4) and (4,1).
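A scaled-down sketch of the split-table approach, run in SQLite through Python's sqlite3 module. SQLite has no STRING_SPLIT, so Python splits the names on insert; the table and index names are hypothetical. Two details matter for a correct result: a NULL Name must survive the split as its own row, and DISTINCT is needed because an ID with several values can match the same partner more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_names (id INTEGER NOT NULL, name TEXT)")
conn.execute("CREATE INDEX ix_id_names ON id_names (name, id)")

sample = {1: "A,C", 2: "B", 3: "A", 4: None}
# One row per (ID, value). A NULL Name is kept as a single NULL row, which is
# what OUTER APPLY (rather than CROSS APPLY) does with STRING_SPLIT.
conn.executemany(
    "INSERT INTO id_names (id, name) VALUES (?, ?)",
    [(i, v) for i, n in sample.items() for v in (n.split(",") if n is not None else [None])],
)

# DISTINCT removes repeats such as (1, 4), which is produced once for 'A'
# and once for 'C'.
PAIRS = """
    SELECT DISTINCT t1.id, t2.id AS colloid_id
    FROM id_names t1
    JOIN id_names t2
      ON t1.id <> t2.id
     AND (t1.name = t2.name OR t1.name IS NULL OR t2.name IS NULL)
    ORDER BY 1, 2
"""
pairs = conn.execute(PAIRS).fetchall()
print(pairs)
```

For the sample data, dropping DISTINCT from this query returns 10 rows instead of 8.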
If #Temp1 were a proper table, an alternative would be to create an indexed view. Creating an index on a view essentially generates, stores and updates its results automatically.
Another option is to create a clustered columnstore index, which provides both compression and faster scans. The data is stored per column in rowgroups of roughly 1M rows, and within each rowgroup each distinct column value is stored only once.
create table #Id_Names (
ID bigint not null,
Name varchar(30) null,
INDEX CCI_Id_Names CLUSTERED COLUMNSTORE
);

SQL Server multiply multiple columns by values in a column in a second table

I have a very large data set with the following columns:
[ID] [code_1] [code_2] [code_3] [code_4]
[days_code_1] [days_code_2] [days_code_3] [days_code_4]
The ID column is not unique, the [code_n] columns are text and the [days_code_n] columns are numeric.
In a second table I have two columns, one with code values which match [code_n], and [cost value] which corresponds to each code.
I want to be able to multiply the [days_code_n] by the [cost value]. I can do this individually, but for reasons out of my control I have 50 [code_n] and [days_code_n] columns. As the ID value is not unique I have to keep the data in the current format.
Can anyone advise me how to multiply the values in the [days_code_n] columns by the [cost_value] in the second table without running 50 queries?
Since you didn't mention the names of your tables I call them Table1 and Table2.
You can join in the second table for every code_n column you have:
SELECT Table1.ID
,Table1.days_code_1 * t2_01.cost_value AS result1
,Table1.days_code_2 * t2_02.cost_value AS result2
,Table1.days_code_3 * t2_03.cost_value AS result3
...
,Table1.days_code_50 * t2_50.cost_value AS result50
FROM Table1
-- LEFT JOIN keeps rows whose code is NULL or has no cost; that result is then NULL.
LEFT JOIN Table2 t2_01 ON Table1.[code_1] = t2_01.[code_n]
LEFT JOIN Table2 t2_02 ON Table1.[code_2] = t2_02.[code_n]
LEFT JOIN Table2 t2_03 ON Table1.[code_3] = t2_03.[code_n]
...
LEFT JOIN Table2 t2_50 ON Table1.[code_50] = t2_50.[code_n]
And make sure that Table2.code_n is indexed, for example as its primary key (which a foreign key from each Table1.code_... column would require anyway). Otherwise the query could be very slow.
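The 50 joins do not have to be typed by hand. A minimal sketch, assuming Python is used to generate the SQL text, shown against SQLite with three code columns instead of 50 and made-up costs. The names follow the answer's Table1/Table2. LEFT JOIN is a deliberate choice in this sketch: a row whose code is NULL or missing from Table2 keeps its other results instead of disappearing.

```python
import sqlite3


def build_cost_query(n):
    """Generate the n-way join from the answer instead of typing it by hand.

    LEFT JOIN keeps a row whose code is NULL or has no cost entry;
    that particular result is then NULL.
    """
    cols = [
        f"Table1.days_code_{i} * t2_{i:02d}.cost_value AS result{i}"
        for i in range(1, n + 1)
    ]
    joins = [
        f"LEFT JOIN Table2 t2_{i:02d} ON Table1.code_{i} = t2_{i:02d}.code_n"
        for i in range(1, n + 1)
    ]
    return "SELECT Table1.ID, " + ", ".join(cols) + " FROM Table1 " + " ".join(joins)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- code_n as PRIMARY KEY gives every join an index to look up.
    CREATE TABLE Table2 (code_n TEXT PRIMARY KEY, cost_value REAL);
    INSERT INTO Table2 VALUES ('X', 2.0), ('Y', 3.5);
    CREATE TABLE Table1 (ID INTEGER, code_1 TEXT, code_2 TEXT, code_3 TEXT,
                         days_code_1 INTEGER, days_code_2 INTEGER, days_code_3 INTEGER);
    INSERT INTO Table1 VALUES (1, 'X', 'Y', NULL, 10, 2, 0), (1, 'Y', 'X', 'X', 1, 1, 4);
""")
rows = conn.execute(build_cost_query(3) + " ORDER BY Table1.days_code_1").fetchall()
print(rows)  # [(1, 3.5, 2.0, 8.0), (1, 20.0, 7.0, None)]
```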

Compare one value of column A with all the values of column B in Hive HQL

I have two columns in one table, say Column A and Column B. I need to search for each value of Column A among all the values of Column B and return true if the Column A value is found in any row of Column B. How can I do this?
I have tried the following query:
select column_A, column_B, if(column_A = column_B, true, false) as test from sample;
With this query, the check is only done within the same row. But I need true if a value of Column A is found in any of the rows of Column B.
How can I check one value of Column A against all the values of Column B?
Or is there any way to iterate over and compare each value between the two columns?
Solution
create temporary table t as select rand() as id, column_A, column_B from sample; --> Refer 1
select distinct t3.id,t3.column_A,t3.column_B,t3.match from ( --> Refer 3
select t1.id as id, t1.column_A as column_A, t1.column_B as column_B,--> Refer 2
if(t2.column_B is null, False, True) as match from t t1 LEFT OUTER JOIN
t t2 ON t1.column_A = t2.column_B
) t3;
Explanation
1. Create an identifier column to keep track of the rows in the original table. I am using rand() here; we will use it to get back the original rows in step 3. A temporary table t is created here to keep the next steps simple.
2. Use a LEFT OUTER JOIN of the table with itself to do the test, which requires matching each value against the other column across all rows; this yields the match column. Note that this may produce more rows than the Sample table has (duplicates), but we can still identify them, since duplicates share the same id.
3. Apply DISTINCT to get back the original rows of the Sample table. You can then drop the id column.
Notes
Self joins are costly in terms of performance, but one is unavoidable for this problem.
The DISTINCT used in step 3 is costly too. A more performant approach would be to use window functions: partition by the id and pick the first row in each window. You can explore that.
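The window-function alternative mentioned in the note can be sketched like this, run in SQLite through Python's sqlite3 module for illustration (Hive also supports ROW_NUMBER). SQLite's implicit rowid stands in for the rand()-based id, the column alias found replaces match, and the sample rows are made up; 'y' appears twice in column_B so that the self join produces duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (column_A TEXT, column_B TEXT);
    INSERT INTO sample VALUES ('x', 'y'), ('y', 'x'), ('z', 'y'), ('y', 'w');
""")

# rowid stands in for the rand()-based id column. The self join repeats a row
# whenever its column_A value occurs several times in column_B; ROW_NUMBER()
# keeps exactly one row per id instead of applying DISTINCT to every column.
QUERY = """
    SELECT column_A, column_B, found
    FROM (
        SELECT t1.rowid AS id, t1.column_A, t1.column_B,
               t2.column_B IS NOT NULL AS found,
               ROW_NUMBER() OVER (PARTITION BY t1.rowid) AS rn
        FROM sample t1
        LEFT JOIN sample t2 ON t1.column_A = t2.column_B
    )
    WHERE rn = 1
    ORDER BY id
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [('x', 'y', 1), ('y', 'x', 1), ('z', 'y', 0), ('y', 'w', 1)]
```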
You can left-join the table to itself and check whether the joined column is null. If it is null, the value was not found in the other column. Use IF or CASE WHEN to check whether it is null.
Select t1.column_A,
t1.column_B,
IF(t2.column_B is null, 'False', 'True') as test
from Sample t1
Left Join Sample t2
On t1.column_A = t2.column_B;
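This self join can still return more rows than the table has when a value occurs several times in column_B. A sketch of one way around that, assuming SQLite (through Python's sqlite3 module) as a stand-in for Hive and made-up data: join against the distinct values of column_B, so each row finds at most one partner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (column_A TEXT, column_B TEXT);
    INSERT INTO sample VALUES ('x', 'y'), ('y', 'x'), ('z', 'y'), ('y', 'w');
""")

# Joining against the DISTINCT values of column_B means each row finds at most
# one partner, so the output has exactly one row per row of sample.
QUERY = """
    SELECT t1.column_A, t1.column_B,
           CASE WHEN b.column_B IS NULL THEN 'False' ELSE 'True' END AS test
    FROM sample t1
    LEFT JOIN (SELECT DISTINCT column_B FROM sample) b
      ON t1.column_A = b.column_B
    ORDER BY t1.rowid
"""
result = conn.execute(QUERY).fetchall()
print(result)
```

With the same data, the plain self join from the answer returns 6 rows, because 'y' occurs twice in column_B.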

SQL: How to update an empty column with pre-defined set of values

I have a table with, let's say, 100 records. The table has two columns. The first column (A) has unique values. The second column (B) has NULL values.
For 4 values from column A, I'd like to assign some previously defined values, which are unique as well.
I don't care which of those values is associated with which value from column A. I'd like to associate 4 unique values with another 4 unique values, basically like cutting and pasting a block of values from one column to another in Excel.
How can I do it without using cursors?
I'd like to use one UPDATE statement for ALL rows instead of one UPDATE statement for EVERY row, as I do now.
Try this:
UPDATE t
SET ColumnB = BValue
FROM [Table] t -- TABLE is a reserved word: bracket it or use your real table name
INNER JOIN
(
SELECT 1 AValue, 'Mouse' BValue UNION
SELECT 2, 'Cat' UNION
SELECT 3, 'Dog' UNION
SELECT 4, 'Wolf'
) PreDefined ON(t.ColumnA = PreDefined.AValue)
Use any numbers you want in the PreDefined derived table, as long as they are unique and exist in ColumnA of your original table.
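A runnable sketch of the same single-statement update, assuming SQLite through Python's sqlite3 module and a hypothetical table t. SQLite only gained UPDATE ... FROM in version 3.33, so this version uses a correlated subquery instead of the join; the four pairs are the ones from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column_a INTEGER PRIMARY KEY, column_b TEXT)")
conn.executemany("INSERT INTO t (column_a) VALUES (?)", [(i,) for i in range(1, 101)])

# The answer's four predefined pairs, loaded into a small lookup table.
predefined = [(1, "Mouse"), (2, "Cat"), (3, "Dog"), (4, "Wolf")]
conn.execute("CREATE TEMP TABLE predefined (a_value INTEGER PRIMARY KEY, b_value TEXT)")
conn.executemany("INSERT INTO predefined VALUES (?, ?)", predefined)

# One set-based UPDATE. The WHERE clause limits it to rows that have a
# predefined value; without it, all other rows would be overwritten with NULL.
conn.execute("""
    UPDATE t
    SET column_b = (SELECT p.b_value FROM predefined p WHERE p.a_value = t.column_a)
    WHERE column_a IN (SELECT a_value FROM predefined)
""")
filled = conn.execute(
    "SELECT column_a, column_b FROM t WHERE column_b IS NOT NULL ORDER BY column_a"
).fetchall()
print(filled)  # [(1, 'Mouse'), (2, 'Cat'), (3, 'Dog'), (4, 'Wolf')]
```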
If you are only trying to fill a table for testing purposes, I guess you could:
A) Use the value from Column A itself (as it is already unique).
B) If they have to be different, apply some function to column A's value to obtain a column B value (something simple, like ColumnA * 10).
C) Create a temp table acting as a "dictionary" that sets a B value for each possible A value, and then update the desired rows in your table by looking up values in this dictionary table.
Anyway, if you explain your purpose a little further, it will be easier to suggest a solution.
If your animal data is already in a database table, you can use a single UPDATE statement like this:
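update target_table t4
set columnb = (
select animal_name
from (select columna, animal_name
from (select rownum rowNumber, animal_name from animal_table) t1
join (select rownum rowNumber, columna from target_table where columnb is null) t2
on t1.rowNumber = t2.rowNumber
) t3
where t4.columna = t3.columna
)
-- Only touch the empty rows; without this filter, rows that already have
-- a columnb value would be set to NULL (the subquery finds no match for them).
where t4.columnb is null
;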
This works by selecting a sequence number and animal name from the source table, then selecting a sequence number and columna value from your target table. By joining those records on the sequence number, you guarantee that each columna value gets at most one animal name (exactly one, as long as there are enough animal names). You can then join those columna-to-animal records to your target table to update columnb.
For more background on updating one table from values in another, you might consider the solutions presented here: Update rows in one table with data from another table based on one column in each being equal. The only difference is that in your example, there is no column that matches between your target table and your animal names table, so you need to use rownum to create an arbitrary 1-to-1 matching of records.
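The rownum pairing can be reproduced in SQLite through Python's sqlite3 module, with ROW_NUMBER() standing in for rownum. This sketch assumes made-up data and materializes the pairing in a temporary table before updating. Oracle evaluates the statement against a consistent snapshot taken when it starts; materializing first means the sketch does not depend on whether SQLite does the same for a correlated subquery inside an UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target_table (columna INTEGER PRIMARY KEY, columnb TEXT);
    INSERT INTO target_table (columna) VALUES (10), (20), (30), (40), (50);
    CREATE TABLE animal_table (animal_name TEXT);
    INSERT INTO animal_table VALUES ('mouse'), ('cat'), ('dog'), ('wolf');

    -- Number both sides and join on the number: an arbitrary 1-to-1 pairing.
    -- ROW_NUMBER() stands in for Oracle's rownum; the ORDER BY only makes the
    -- pairing repeatable here, rownum itself promises no particular order.
    CREATE TEMP TABLE pairing AS
    SELECT t.columna, a.animal_name
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY columna) AS rn, columna
          FROM target_table WHERE columnb IS NULL) t
    JOIN (SELECT ROW_NUMBER() OVER (ORDER BY animal_name) AS rn, animal_name
          FROM animal_table) a ON a.rn = t.rn;

    -- Update from the materialized pairing, so the numbering cannot shift
    -- while the UPDATE itself changes columnb.
    UPDATE target_table
    SET columnb = (SELECT p.animal_name FROM pairing p
                   WHERE p.columna = target_table.columna)
    WHERE columna IN (SELECT columna FROM pairing);
""")
rows = conn.execute("SELECT columna, columnb FROM target_table ORDER BY columna").fetchall()
print(rows)  # [(10, 'cat'), (20, 'dog'), (30, 'mouse'), (40, 'wolf'), (50, None)]
```

With five target rows and four animals, the last row stays NULL, which is the "at most one" case.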
If your unique options are in a text file or spreadsheet, you can format them into a fixed-width, space-padded string and pick the one you want using the rownum index, like so:
update table_name
set columnb = trim(substr('mouse cat   dog   wolf  ', rownum*6-5, 6))
where columnb is null
and rownum <= 4;
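The position arithmetic is easy to get off by one, so here is a small Python check of a 6-character layout. Oracle's SUBSTR is 1-based, so entry n starts at position n*6-5; a Python slice uses the 0-based index (n-1)*6. The function name is made up for illustration.

```python
def pick_fixed_width(options, width, n):
    """Return the n-th (1-based) entry of a space-padded, fixed-width string.

    Mirrors trim(substr(options, n*width - (width - 1), width)): SUBSTR is
    1-based, so entry n starts at position (n - 1)*width + 1, which is
    Python index (n - 1)*width.
    """
    start = (n - 1) * width
    return options[start:start + width].strip()


OPTIONS = "mouse cat   dog   wolf  "  # every entry padded to 6 characters
print([pick_fixed_width(OPTIONS, 6, n) for n in range(1, 6)])
# ['mouse', 'cat', 'dog', 'wolf', '']
```

Past the last entry the slice is an empty string; Oracle's SUBSTR returns NULL there instead, so any extra rows would stay NULL.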

Short but tough SQL query (T-SQL, SQL Server)

Suppose I have 2 tables, each with N columns. There are NO duplicate rows in table1.
Now I want to know which rows in table2 (including duplicates) are also contained in table1.
I tried
select * from table1
intersect
select * from table2
But this only gives me distinct rows that are in both tables. I don't want distinct rows; I want to see all rows in table2 that are in table1.
Keep in mind that I cannot do
select *
from table1 a, table2 b
where a.table1col = b.table2col
...because I don't know the number of columns of the tables at runtime.
Sure, I could do something with dynamic SQL and iterate over the columns, but I'm asking precisely because this seems too simple a query for that kind of thing.
Example:
create table table1 (table1col int)
create table table2 (table2col int)
insert into table1 values (8)
insert into table1 values (7)
insert into table2 values (1)
insert into table2 values (8)
insert into table2 values (7)
insert into table2 values (7)
insert into table2 values (2)
insert into table2 values (9)
I want my query then to return:
8
7
7
If the number of columns is not known, you will have to resort to a value computed over the whole row to make a match.
One such function is CHECKSUM.
Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indices.
SQL Statement
SELECT tm.*
FROM (
SELECT *, CS = CHECKSUM(*) -- keep the row data, not just its checksum
FROM Table2
) tm
INNER JOIN (
SELECT CS = CHECKSUM(*)
FROM Table2
INTERSECT
SELECT CHECKSUM(*)
FROM Table1
) ti ON ti.CS = tm.CS
Note that CHECKSUM might introduce collisions. You will have to test for that before doing any operation on your data.
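The two steps in this answer, matching on a row-level hash and then ruling out collisions, can be sketched in plain Python: tuples stand in for rows, the built-in hash() stands in for CHECKSUM, and the data is the question's example. It illustrates the technique; it is not T-SQL, and the function name is made up.

```python
from collections import defaultdict


def rows_in_both(table1, table2):
    """Rows of table2 (duplicates kept) that also occur in table1.

    As with joining on CHECKSUM(*), rows are bucketed by a hash of the whole
    row, so no column list is needed. Each candidate is then compared in full,
    which is the collision test: equal hashes alone do not prove equal rows.
    """
    buckets = defaultdict(list)
    for row in table1:
        buckets[hash(row)].append(row)
    # .get() avoids creating empty buckets for hashes that are not present.
    return [row for row in table2 if row in buckets.get(hash(row), [])]


table1 = [(8,), (7,)]
table2 = [(1,), (8,), (7,), (7,), (2,), (9,)]
print(rows_in_both(table1, table2))  # [(8,), (7,), (7,)]
```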
Edit
If you are using SQL Server 2005 or later, you can make this a bit more robust by using HASHBYTES.
The downside of HASHBYTES is that you need to specify the columns on which you want to operate, but for all the columns you do know up front, you could use it to make collisions far less likely.
EXCEPT vs INTERSECT:
EXCEPT returns any distinct values from the left query that are not also found in the right query.
INTERSECT returns any distinct values that are returned by both the query on the left and the query on the right side of the INTERSECT operator.
Maybe EXCEPT can solve your problem.