Comparing two tables that don't have a unique key - sql

I need to compare the data in two tables and check which attributes are mismatched. The tables have the same definition, but the problem is that I don't have a unique key to compare on. I tried to use
CONCAT(CONCAT(CONCAT(table1.A, Table1.B))
=CONCAT(CONCAT(CONCAT(table2.A, Table2.B))
but I still get duplicate rows. I also tried NVL on a few columns, but that didn't work:
SELECT
UT.cat,
PD.cat
FROM
EM UT, EM_63 PD
WHERE
NVL(UT.cat, 1) = NVL(PD.cat, 1) AND
NVL(UT.AT_NUMBER, 1) = NVL(PD.AT_NUMBER, 1) AND
NVL(UT.OFFSET, 1) = NVL(PD.OFFSET, 1) AND
NVL(UT.PROD, 1) = NVL(PD.PROD, 1)
;
There are 34k records in one table and 35k records in the other, but if I run the above query, it returns about 3 million rows.
Columns in table:
COUNTRY
CATEGORY
TYPE
DESCRIPTION
Sample data:
Table 1:
COUNTRY  CATEGORY  TYPE  DESCRIPTION
US       C         T1    In
IN       A         T2    OUT
B        C         T2    IN
Y        C         T1    INOUT
Table 2:
COUNTRY  CATEGORY  TYPE  DESCRIPTION
US       C         T2    In
IN       B         T2    Out
Q        C         T2    IN
Expected output:
column       Matched  unmatched
COUNTRY      2        1
CATEGORY     2        1
TYPE         2        1
DESCRIPTION  3        0

In the most general case (when you may have duplicate rows, and you want to see which rows exist in one table but not in the other, and ALSO which rows may exist in both tables, but the row exists 3 times in the first table but 5 times in the other):
This is a very common problem with a settled "best solution" which for some reason it seems most people are still not aware of, even though it was developed on AskTom many years ago and has been presented numerous times.
You do NOT need a join, you do not need a unique key of any kind, and you don't need to read either table more than once. The idea is to add two columns to show from which table each row comes, do a UNION ALL, then GROUP BY all the columns except the "source" columns and show the count for each table. Something like this:
select count(t_1) as count_table_1, count(t_2) as count_table_2, col1, col2, ...
from (
select 'x' as t_1, null as t_2, col1, col2, ...
from table_1
union all
select null as t_1, 'x' as t_2, col1, col2, ...
from table_2
)
group by col1, col2, ...
having count(t_1) != count(t_2)
;
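As a rough illustration of the UNION ALL / GROUP BY pattern above, here is a minimal sketch that runs it on an in-memory SQLite database from Python. The two-column tables and their rows are made up for the demo, and SQLite is only a stand-in for whatever database you actually use; the query itself follows the shape shown above.

```python
import sqlite3

# Hypothetical demo data: ('a','x') and ('c', NULL) match once on each side,
# ('b','y') appears twice in table_1 but once in table_2,
# ('d','z') exists only in table_2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table_1 (col1 text, col2 text);
    create table table_2 (col1 text, col2 text);
    insert into table_1 values ('a','x'), ('b','y'), ('b','y'), ('c', null);
    insert into table_2 values ('a','x'), ('b','y'), ('d','z'), ('c', null);
""")

# Tag each row with its source, stack both tables, then count per source
# for every distinct combination of column values.
diff = conn.execute("""
    select count(t_1) as count_table_1, count(t_2) as count_table_2, col1, col2
    from (
        select 'x' as t_1, null as t_2, col1, col2 from table_1
        union all
        select null as t_1, 'x' as t_2, col1, col2 from table_2
    )
    group by col1, col2
    having count(t_1) != count(t_2)
    order by col1, col2
""").fetchall()

for row in diff:
    print(row)
```

Only the rows whose counts differ between the two tables come back. Because GROUP BY puts NULLs in the same group, the ('c', NULL) row is treated as matching without any NVL workaround.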

Start with this query to check if these 4 columns form a key.
select occ_total,occ_ut,occ_pd
,count(*) as records
from (select count (*) as occ_total
,count (case tab when 'UT' then 1 end) as occ_ut
,count (case tab when 'PD' then 1 end) as occ_pd
from (select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD from EM
union all select 'PD' ,cat,AT_NUMBER,OFFSET,PROD from EM_63 PD
) t
group by cat,AT_NUMBER,OFFSET,PROD
) t
group by occ_total,occ_ut,occ_pd
order by records desc
;
After you have chosen your "key", you can use the following query to see the attributes' values:
select count (*) as occ_total
,count (case tab when 'UT' then 1 end) as occ_ut
,count (case tab when 'PD' then 1 end) as occ_pd
,count (distinct att1) as cnt_dst_att1
,count (distinct att2) as cnt_dst_att2
,count (distinct att3) as cnt_dst_att3
,...
,listagg (case tab when 'UT' then att1 end) within group (order by att1) as att1_vals_ut
,listagg (case tab when 'PD' then att1 end) within group (order by att1) as att1_vals_pd
,listagg (case tab when 'UT' then att2 end) within group (order by att2) as att2_vals_ut
,listagg (case tab when 'PD' then att2 end) within group (order by att2) as att2_vals_pd
,listagg (case tab when 'UT' then att3 end) within group (order by att3) as att3_vals_ut
,listagg (case tab when 'PD' then att3 end) within group (order by att3) as att3_vals_pd
,...
from (select 'UT' as tab,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM
union all select 'PD' ,cat,AT_NUMBER,OFFSET,PROD,att1,att2,att3,... from EM_63 PD
) t
group by cat,AT_NUMBER,OFFSET,PROD
;

The problem with CONCAT is that you could get invalid matches if your data looks similar to this:
table1.A = '123'
table1.B = '456'
concatenates to: '123456'
table2.A = '12'
table2.B = '3456'
concatenates also to: '123456'
You have to compare the fields individually: table1.A = table2.A AND table1.B = table2.B
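The collision described above can be checked directly. This is a small sketch using SQLite from Python, where the `||` operator stands in for CONCAT (an assumption for the demo; the values are the ones from the example above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Compare the concatenated values versus comparing each field on its own.
concat_match, column_match = conn.execute("""
    select ('123' || '456') = ('12' || '3456')   as concat_match,
           ('123' = '12' and '456' = '3456')     as column_match
""").fetchone()

print(concat_match)   # 1: the concatenated strings are equal
print(column_match)   # 0: the individual fields are not
```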

Related

Is there a way to make this run without a case statement? [duplicate]

I'm relatively new to coding and SQL so please bear with me.
I'm currently working on a query and I have no idea how to get the infinite loop to stop without using a case statement. When I use the case statement I get each value on its own row rather than the values all together in the combination they're supposed to be in.
Case statement SQL
select
CASE
When Attribute_id = '5024923' Then attribute_value
END Page_Name,
CASE
When Attribute_id = '5024925' Then attribute_value
END Site_Name,
CASE
When Attribute_id = '5024924' Then attribute_value
END Last_Touch_Channel,
count(distinct MASTER_CONTACT_ID) known_contact_count,
count (distinct visitor_id) total_contact_Count,
ACTION_DATE
from Adobe_Analytics_Staging
where ATTRIBUTE_ID in ('5024925','5024924','5024923')
group by ATTRIBUTE_ID, ACTION_DATE, ATTRIBUTE_VALUE
Example:
Error with Case statement:
Column A  Column B  Column C
value1    NULL      NULL
NULL      value2    NULL
NULL      NULL      value3
When in the data it is value1, value2, value3 on the same row.
So I'm trying a new avenue. I suspect the loop is because I'm linking back to the table so many times but I have limited the amount of results to the best of my ability to reduce the amount of records being sent through. Each query works and works fast individually. It's collectively that it slows down a ton.
The reason for joining to the table so many times is because I have to distinguish different types of values within one column.
Note: Not sure if it's relevant, but the different values in the table correlate to a specific ID number within that table. Attribute value and attribute ID are different columns.
For example in Table A the column looks like this
Column
A
B
C
I have to make it look like this:
Column 1  Column 2  Column 3
A         B         C
select
a.ATTRIBUTE_VALUE,
b.ATTRIBUTE_VALUE,
c.ATTRIBUTE_VALUE,
count(distinct aas.MASTER_CONTACT_ID) known_contact_count,
count (distinct d.visitor_id) total_contact_Count,
aas.ACTION_DATE
from Adobe_Analytics_Staging aas
left join (select ATTRIBUTE_VALUE, VISITOR_ID from Adobe_Analytics_Staging
where Attribute_id = '5024923') a on a.VISITOR_ID = aas.VISITOR_ID
left join (select ATTRIBUTE_VALUE, VISITOR_ID from Adobe_Analytics_Staging
where Attribute_id = '5024925') b on b.VISITOR_ID = aas.VISITOR_ID
left join (select ATTRIBUTE_VALUE, VISITOR_ID from Adobe_Analytics_Staging
where Attribute_id = '5024924') c on c.VISITOR_ID = aas.VISITOR_ID
inner join (select visitor_id from Adobe_Analytics_Staging
where ATTRIBUTE_ID in ('5024923','5024925','5024924')) d
on d.VISITOR_ID = aas.VISITOR_ID
--where aas.VISITOR_ID = '3438634761938550664_6795123974460253552'
group by a.ATTRIBUTE_VALUE, b.ATTRIBUTE_VALUE, c.ATTRIBUTE_VALUE, aas.ACTION_DATE
SELECT
VISITOR_ID,
MAX(CASE WHEN Attribute_id = '5024923' Then attribute_value END) Page_Name,
MAX(CASE WHEN Attribute_id = '5024925' Then attribute_value END) Site_Name,
MAX(CASE WHEN Attribute_id = '5024924' Then attribute_value END) Last_Touch_Channel,
COUNT(distinct MASTER_CONTACT_ID) known_contact_count,
COUNT(distinct visitor_id) total_contact_Count,
ACTION_DATE
FROM ContactTargeting.dbo.Adobe_Analytics_Staging
GROUP BY VISITOR_ID, ACTION_DATE
See this fiddle with some demo data
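To see why wrapping each CASE in MAX and grouping by the visitor collapses the values onto one row, here is a minimal sketch on an in-memory SQLite database from Python. The rows, the visitor and contact IDs, and the unqualified table name are invented for the demo; the attribute IDs and column names are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Adobe_Analytics_Staging (
        visitor_id text, master_contact_id text,
        attribute_id text, attribute_value text, action_date text);
    -- Hypothetical rows: visitor v1 has all three attributes, v2 only one.
    insert into Adobe_Analytics_Staging values
        ('v1', 'm1', '5024923', 'home',  '2023-01-01'),
        ('v1', 'm1', '5024925', 'siteA', '2023-01-01'),
        ('v1', 'm1', '5024924', 'email', '2023-01-01'),
        ('v2', null, '5024923', 'cart',  '2023-01-01');
""")

# Each CASE yields the value for one attribute and NULL otherwise;
# MAX ignores the NULLs, so one row per (visitor, date) remains.
pivoted = conn.execute("""
    select visitor_id,
           max(case when attribute_id = '5024923' then attribute_value end) as page_name,
           max(case when attribute_id = '5024925' then attribute_value end) as site_name,
           max(case when attribute_id = '5024924' then attribute_value end) as last_touch_channel,
           action_date
    from Adobe_Analytics_Staging
    group by visitor_id, action_date
    order by visitor_id
""").fetchall()

for row in pivoted:
    print(row)
```

The table is read once, with no self-joins, which is why this tends to be much faster than joining the table to itself once per attribute.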

How to get the value as per type from one column and display it as multiple column in hive?

The query below populates only the check number and account number, even though there is data for the rest of the columns. How can I populate all the columns from the values?
query:
select distinct * from(
select
(CASE WHEN b.dtl_typ = 'CheckNumber' THEN b.col_val END) as `Check Number`,
a.acc_id as Account_Number,
(CASE WHEN b.dtl_typ = 'AddressLine1' THEN b.col_val END) as `Address Line 1`,
(CASE WHEN b.dtl_typ = 'AddressLine2' THEN b.col_val END) as `Address Line 2`,
(CASE WHEN b.dtl_typ = 'City' THEN b.col_val END) as City,
(CASE WHEN b.dtl_typ = 'State' THEN b.col_val END) as State,
(CASE WHEN b.dtl_typ = 'Zipcode' THEN b.col_val END) as ZipCode
from
(select *from(select acc_id as acc_id, row_number() over(partition by acc_id order by booking_ts desc) rw_nbr
from TABLE1 where P_CODE = 'CheckTrnsfr' ) d where rw_nbr = 1) a
LEFT OUTER JOIN
(select acc_id, dtl_val as col_val, dtl_typ
from TABLE2 where dtl_typ in('AddressLine1','AddressLine2','City','State','Zipcode','CheckNumber')) b on b.acc_id = a.acc_id)a where
`Check Number` is not null
output:
expected output: The picture above only populates the check number and account number; the rest of the columns are null even though the data is present, which should not happen. All columns should be populated with the available data. It is possible if I use more than one left join, each with its own where clause, like dtl_typ = 'AddressLine1' left join dtl_typ = 'AddressLine2' and so on, but that would be a performance issue, since it would hit the DB multiple times.
select acc_id, dtl_val as col_val, dtl_typ from TABLE2 where dtl_typ in('AddressLine1','AddressLine2','City','State','Zipcode')
output:

SQL CASE statement returns duplicate values

Here is how my data looks
title value
------------
t1 v1
t2 v2
t3 v3
Now I want t1 and t2 to be inferred as the same value t12. So, I do:
SELECT
CASE
WHEN title = 't1' OR title = 't2'
THEN 't12'
ELSE title
END AS inferred_title,
COUNT(value)
FROM
my_table
GROUP BY
inferred_title;
I expected the output to be:
inferred title values
-----------------------
t12 2
t3 1
But what I end up getting is:
inferred title values
--------------------------
t12 1
t12 1
t3 1
How do I make it behave the way I want it to? I don't want the duplicated rows.
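The problem is scoping. You must have a column named inferred_title in the table, so GROUP BY inferred_title refers to that column rather than to the alias in the SELECT. Either give a new column alias or repeat the expression: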
SELECT (CASE WHEN title IN ('t1', 't2') THEN 't12'
ELSE title
END) AS inferred_title,
COUNT(value)
FROM my_table
GROUP BY (CASE WHEN title IN ('t1', 't2') THEN 't12'
ELSE title
END);
Do the "merge" case in a derived table (sub-query), group by its result:
SELECT inferred_title, COUNT(value)
FROM
(
SELECT CASE WHEN title = 't1' OR title = 't2' THEN 't12'
ELSE title
END AS inferred_title,
value
FROM my_table
) dt
GROUP BY inferred_title;
This saves you some typing, is less error prone and easier to maintain - and is ANSI SQL compliant!
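As a quick check of the derived-table approach, the sketch below runs it against the question's three sample rows in an in-memory SQLite database from Python (SQLite is just a convenient stand-in here; the ORDER BY is added only to make the output order predictable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table my_table (title text, value text);
    insert into my_table values ('t1', 'v1'), ('t2', 'v2'), ('t3', 'v3');
""")

# The CASE runs inside the derived table, so the outer GROUP BY can only
# see the merged inferred_title and cannot fall back to a base column.
counts = conn.execute("""
    select inferred_title, count(value)
    from (
        select case when title = 't1' or title = 't2' then 't12'
                    else title
               end as inferred_title,
               value
        from my_table
    ) dt
    group by inferred_title
    order by inferred_title
""").fetchall()

print(counts)
```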
Select Title, COUNT(Title) AS Totals
From my_table
Group By Title
Having COUNT(Title)>1
Order By 2 desc

How to use pivot to find the max value in a row with three columns, that is, the max value out of three columns

I have the following table, named 'Table':
I want a result like the following table: for example, taking the first row and its last three columns, the value should be 56.
I want SQL Server code that takes the table 'Table' above and produces the second table. Here MaxV-1 and MaxV-2 depend on the 'Number' column: MaxV-1 is the max value out of FirstV, SecondV and ThirdV when Number is equal to 1, and the same logic applies to MaxV-2.
One method is an unpivot and conditional aggregation:
select t.model,
max(case when t.number = 1 then t.pro_code end) as pro_code_1,
max(case when t.number = 2 then t.pro_code end) as pro_code_2,
max(case when t.number = 1 then v.v end) as max_val_1,
max(case when t.number = 2 then v.v end) as max_val_2
from t cross apply
(select max(v.v) as v
from (values (t.firstv), (t.secondv), (t.thirdv)) v(v)
) v
group by t.model;
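The same two-step idea (take the largest of the three columns per row, then spread the results by Number with conditional aggregation) can be tried on a small in-memory SQLite database from Python. Note the swap: SQLite has no CROSS APPLY, so this sketch uses SQLite's multi-argument scalar max() for the per-row step instead of the VALUES unpivot above. The two rows are invented, since the original tables were images.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (model text, number integer, pro_code text,
                    firstv integer, secondv integer, thirdv integer);
    -- Hypothetical rows for one model, one per Number value.
    insert into t values ('M1', 1, 'P1', 10, 56, 3),
                         ('M1', 2, 'P2',  7,  8, 9);
""")

# Inner max(a, b, c) is SQLite's scalar "greatest of the arguments";
# the outer single-argument max(...) is the aggregate over the group.
result = conn.execute("""
    select model,
           max(case when number = 1 then pro_code end) as pro_code_1,
           max(case when number = 2 then pro_code end) as pro_code_2,
           max(case when number = 1 then max(firstv, secondv, thirdv) end) as max_val_1,
           max(case when number = 2 then max(firstv, secondv, thirdv) end) as max_val_2
    from t
    group by model
""").fetchall()

print(result)
```

One difference to keep in mind: SQLite's scalar max() returns NULL if any argument is NULL, whereas the aggregate max over the VALUES rows in the SQL Server version skips NULLs.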

SQL using CASE in SELECT with GROUP BY. Need CASE-value but get row-value

So basically there is 1 question and 1 problem:
1. question - when I have like 100 columns in a table(and no key or uindex is set) and I want to join or subselect that table with itself, do I really have to write out every column name?
2. problem - the example below shows the 1. question and my actual SQL-statement problem
Example:
A.FIELD1,
(SELECT CASE WHEN B.FIELD2 = 1 THEN B.FIELD3 ELSE null FROM TABLE B WHERE A.* = B.*) AS CASEFIELD1
(SELECT CASE WHEN B.FIELD2 = 2 THEN B.FIELD4 ELSE null FROM TABLE B WHERE A.* = B.*) AS CASEFIELD2
FROM TABLE A
GROUP BY A.FIELD1
The story is: if I don't put the CASE into its own select statement then I have to put the actual rowname into the GROUP BY and the GROUP BY doesn't group the NULL-value from the CASE but the actual value from the row. And because of that I would have to either join or subselect with all columns, since there is no key and no uindex, or somehow find another solution.
DBServer is DB2.
So now to describing it just with words and no SQL:
I have "order items" which can be divided into "ZD" and "EK" (1 = ZD, 2 = EK) and can be grouped by "distributor". Even though "order items" can have one of two different "departements"(ZD, EK), the fields/rows for "ZD" and "EK" are always both filled. I need the grouping to consider the "departement" and only if the designated "departement" (ZD or EK) is changing, then I want a new group to be created.
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
ZD,
EK,
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
This worked with the CASE in the SELECT and ZD, EK in the GROUP BY. The only problem was that even if EK was not the designated DEPARTEMENT, a change in EK still opened a new group, because the GROUP BY was using the real EK value and not the NULL from the CASE, as I already explained above.
And here ladies and gentleman is the solution to the problem:
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END),
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END),
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
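To illustrate the effect of grouping by the CASE expressions instead of the raw columns, here is a minimal sketch on an in-memory SQLite database from Python. The table name order_items and its rows are hypothetical (TABLE is a reserved word, so it cannot be used as-is), and SQLite stands in for DB2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table order_items (departement integer, zd text, ek text,
                              distributor text, something integer);
    -- Departement 1 (ZD): ZD stays 'Z1' while the irrelevant EK changes.
    -- Departement 2 (EK): EK stays 'E1' while the irrelevant ZD changes.
    insert into order_items values
        (1, 'Z1', 'E1', 'D1', 10),
        (1, 'Z1', 'E2', 'D1',  5),
        (2, 'Z1', 'E1', 'D1',  7),
        (2, 'Z2', 'E1', 'D1',  3);
""")

# Grouping by the CASE results means the non-designated column is NULL
# in every row, so its changes no longer split the group.
groups = conn.execute("""
    select (case when departement = 1 then zd else null end) as zd,
           (case when departement = 2 then ek else null end) as ek,
           distributor,
           sum(something) as something
    from order_items
    group by (case when departement = 1 then zd else null end),
             (case when departement = 2 then ek else null end),
             distributor,
             departement
    order by departement
""").fetchall()

for row in groups:
    print(row)
```

With the raw zd and ek columns in the GROUP BY, the same data would produce four groups instead of two.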
@t-clausen.dk: Thank you!
#others: ...
Actually there is a wildcard equality test.
I am not sure why you would group by field1, that would seem impossible in your example. I tried to fit it into your question:
SELECT FIELD1,
CASE WHEN FIELD2 = 1 THEN FIELD3 END AS CASEFIELD1,
CASE WHEN FIELD2 = 2 THEN FIELD4 END AS CASEFIELD2
FROM
(
SELECT * FROM A
INTERSECT
SELECT * FROM B
) C
UNION -- results in a distinct
SELECT
FIELD1,
null,
null
FROM
(
SELECT * FROM A
EXCEPT
SELECT * FROM B
) C
This will fail for datatypes that are not comparable.
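A small sketch of how INTERSECT and EXCEPT compare whole rows without naming any column, run on an in-memory SQLite database from Python. The two-column tables a and b and their rows are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (field1 integer, field2 text);
    create table b (field1 integer, field2 text);
    insert into a values (1, 'x'), (2, 'y');
    insert into b values (1, 'x'), (3, 'z');
""")

# Rows present in both tables, compared across every column.
in_both = conn.execute(
    "select * from a intersect select * from b").fetchall()

# Rows present in a that have no identical row in b.
only_in_a = conn.execute(
    "select * from a except select * from b").fetchall()

print(in_both)
print(only_in_a)
```

Both operators require the two SELECTs to return the same number of columns, and both remove duplicate rows from their result.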
No, there's no wildcard equality test. You'd have to list every field you want tested individually. If you don't want to test each individual field, you could use a hack such as concatenating all the fields, e.g.
WHERE (a.foo + a.bar + a.baz) = (b.foo + b.bar + b.baz)
but either way, you're listing all of the fields.
I might tend to solve it something like this
WITH q AS
(SELECT
DEPARTEMENT
, (CASE WHEN DEPARTEMENT = 1 THEN ZD
WHEN DEPARTEMENT = 2 THEN EK
ELSE null
END) AS GRP
, DISTRIBUTOR
, SOMETHING
FROM mytable
)
SELECT
DEPARTEMENT
, GRP
, DISTRIBUTOR
, sum(SOMETHING) AS SumTHING
FROM q
GROUP BY
DEPARTEMENT
, GRP
, DISTRIBUTOR
If you need to find all rows in TableA that match in TableB, how about INTERSECT or INTERSECT DISTINCT?
select * from A
INTERSECT DISTINCT
select * from B
However, if you only want rows from A where the entire row matches the values in a row from B, then why does your sample code take some values from A and others from B? If the row matches on all columns, then that would seem pointless. (Perhaps your question could be explained a bit more fully?)