How to check how many times a record is repeated in different tables - sql

I have two tables here:
Table 1:
process_id customer_id
16 1
21 1
22 1
Table 2:
process_id customer_id
16 1
16 1
22 1
I would like to check how many times each row in table 1 is repeated in table 2.
For example, row 1 in table 1 is repeated 2 times in table 2, row 2 repeated 0 times and row 3 repeated 1 time. I'm not sure how to loop through each row in table 1 and get this result.

As I understand it, this is what you are asking for:
select table1.process_id, table1.customer_id, count(table2.process_id) as table2count
from table1 left outer join table2 on table1.process_id = table2.process_id and table1.customer_id = table2.customer_id
group by table1.process_id, table1.customer_id;
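To check the left-join approach against the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite, the t1/t2 aliases, the ORDER BY, and the count_repeats helper are assumptions made for illustration; other engines should behave the same way for this query.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (process_id INTEGER, customer_id INTEGER);
    CREATE TABLE table2 (process_id INTEGER, customer_id INTEGER);
    INSERT INTO table1 VALUES (16, 1), (21, 1), (22, 1);
    INSERT INTO table2 VALUES (16, 1), (16, 1), (22, 1);
""")


def count_repeats(connection):
    """Return (process_id, customer_id, table2count) for every row of table1."""
    query = """
        SELECT t1.process_id, t1.customer_id, COUNT(t2.process_id) AS table2count
        FROM table1 AS t1
        LEFT OUTER JOIN table2 AS t2
          ON t1.process_id = t2.process_id
         AND t1.customer_id = t2.customer_id
        GROUP BY t1.process_id, t1.customer_id
        ORDER BY t1.process_id
    """
    return connection.execute(query).fetchall()


rows = count_repeats(conn)
print(rows)  # [(16, 1, 2), (21, 1, 0), (22, 1, 1)]
```

COUNT(t2.process_id) skips the NULLs produced by the outer join, which is why the unmatched row (21, 1) reports 0 rather than 1. COUNT(*) would count the padded row and report 1.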

Related

Find whether id matches and substitute using Case Hive query

I have a table called "Scan" that holds customer transactions. An individual_id appears once for every transaction, and the table contains columns such as scan_id.
I have another table called ids, which contains random individual_ids sampled from the Scan table.
I would like to join ids with scan and get a single record per id, with a flag that shows whether any of its scan_id values matches certain values.
Suppose data is like below
Scan table
Ids scan_id
---- ------
1 100
1 111
1 1000
2 100
2 111
3 124
4 1000
4 111
Ids table
id
1
2
3
4
5
I want the output below, i.e. MT is 1 if any scan_id for the id matches either 100 or 1000:
Id MT
------ ------
1 1
2 1
3 0
4 1
I executed the query below and got an error:
select MT, d.individual_id
from
(
select
CASE
when scan_id in (90069421,53971306,90068594,136739913,195308160) then 1
ELSE 0
END as MT
from scan cs join ids r
on cs.individual_id = r.individual_id
where
base_div_nbr =1
and
country_code ='US'
and
retail_channel_code=1
and visit_date between '2019-01-01' and '2019-12-31'
) as d
group by individual_id;
I would appreciate any suggestions or help with this Hive query. If there is a more efficient way of getting this done, let me know.
Use a group by:
select s.individual_id,
max(case when s.scan_id in (100, 1000) then 1 else 0 end) as mt
from scan s
group by s.individual_id;
The ids table doesn't seem to be needed for this query.
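A quick way to confirm that the MAX(CASE ...) aggregate collapses each id to a single flag is to run it on the sample data. The sketch below does that with Python's sqlite3 module; SQLite stands in for Hive here, and the individual_id column name is taken from the answer's query rather than from the sample's "Ids" header, so both are assumptions.

```python
import sqlite3

# In-memory stand-in for the scan table, using the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scan (individual_id INTEGER, scan_id INTEGER)")
conn.executemany(
    "INSERT INTO scan VALUES (?, ?)",
    [(1, 100), (1, 111), (1, 1000), (2, 100), (2, 111), (3, 124), (4, 1000), (4, 111)],
)

# MAX over the 0/1 CASE expression is 1 as soon as one row of the group matches.
query = """
    SELECT s.individual_id,
           MAX(CASE WHEN s.scan_id IN (100, 1000) THEN 1 ELSE 0 END) AS mt
    FROM scan AS s
    GROUP BY s.individual_id
    ORDER BY s.individual_id
"""
flags = conn.execute(query).fetchall()
print(flags)  # [(1, 1), (2, 1), (3, 0), (4, 1)]
```

Id 5 from the ids table does not appear because it has no rows in scan, which agrees with the expected output in the question.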

Get row counts for different lookup values

A temp table has 700+ records with a PK. 12 columns contain Id values from lookup tables. Each lookup table has 4-8 records in it. How can I get a record count for each Id value in table LookupA that has a relationship via the PK to Id values in every other lookup table? Each lookup value in each lookup table needs to be compared for a record count to every other lookup table and value.
I can write a SQL statement to get specific values for specific columns, but that's a long exercise and will slow down the proc.
Here's a sample of the data.
PK LookupA LookupB LookupC
1 1 1 3
2 1 2 3
3 1 3 2
4 2 4 2
5 4 1 1
6 3 2 1
7 2 3 3
8 4 4 3
9 4 3 2
10 1 1 2
The results need to compare LookupA with LookupB and LookupC to get a row count.
                 LookupB        LookupC
Table    Value   1  2  3  4     1  2  3
LookupA  1       2  1  1  0     0  2  2
         2       0  0  1  1     0  1  1
         3       0  1  0  0     1  0  0
         4       1  0  1  1     1  1  1
Then LookupB would be compared to LookupA and LookupC.
And LookupC would be compared to LookupA and LookupB.
With this code you can get the record counts for all pairwise combinations of A, B and C:
select 'A-B' as Combination, LookupA, LookupB, count(*) as NumRecords
from table
group by LookupA, LookupB
UNION ALL
select 'A-C' as Combination, LookupA, LookupC, count(*) as NumRecords
from table
group by LookupA, LookupC
UNION ALL
select 'B-C' as Combination, LookupB, LookupC, count(*) as NumRecords
from table
group by LookupB, LookupC
After this, if you want to see all the values for LookupA compared to B and C, just
look at Combinations A-B and A-C.
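As a check on the pairwise-count idea, the sketch below loads the ten sample rows into SQLite through Python's sqlite3 module and collects the counts into a dictionary. The table name lookups, the left_value/right_value aliases, and the use of SQLite are assumptions for illustration; the question's temp table is not named.

```python
import sqlite3

# Sample rows from the question: (PK, LookupA, LookupB, LookupC).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lookups (PK INTEGER PRIMARY KEY, "
    "LookupA INTEGER, LookupB INTEGER, LookupC INTEGER)"
)
conn.executemany(
    "INSERT INTO lookups VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 3), (2, 1, 2, 3), (3, 1, 3, 2), (4, 2, 4, 2), (5, 4, 1, 1),
        (6, 3, 2, 1), (7, 2, 3, 3), (8, 4, 4, 3), (9, 4, 3, 2), (10, 1, 1, 2),
    ],
)

# One grouped SELECT per pair of lookup columns, labelled by a constant.
query = """
    SELECT 'A-B' AS Combination, LookupA AS left_value, LookupB AS right_value,
           COUNT(*) AS NumRecords
    FROM lookups GROUP BY LookupA, LookupB
    UNION ALL
    SELECT 'A-C', LookupA, LookupC, COUNT(*) FROM lookups GROUP BY LookupA, LookupC
    UNION ALL
    SELECT 'B-C', LookupB, LookupC, COUNT(*) FROM lookups GROUP BY LookupB, LookupC
"""
counts = {(combo, left, right): n for combo, left, right, n in conn.execute(query)}

print(counts[("A-B", 1, 1)])         # 2
print(counts[("A-C", 1, 2)])         # 2
print(counts.get(("A-B", 1, 4), 0))  # 0, the pair never occurs so it has no row
```

Pairs that never occur together produce no row at all, so the zeros in the desired grid have to be filled in by the caller, as the counts.get(..., 0) lookup does here.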
If I understand correctly, your temp table contains foreign keys to other tables, so why not simply use joins? Something like this:
SELECT COUNT(DISTINCT a.id) as CountA
, COUNT(DISTINCT b.id) as CountB
, etc...
FROM #temp_table t
LEFT OUTER JOIN lookupA a on a.id = t.lookupA
LEFT OUTER JOIN lookupB b on b.id = t.lookupB
...etc
I would suggest reviewing the design if possible. Having so many small tables complicates things. Is it not possible to consolidate them into one lookup table? With an additional "LookupType" field, all the lookups could live in the same place, which would make retrieval much simpler.
I used a slight derivative of the statement below without any UNIONs to get me where I wanted to go.
/*
select 'A-B' as Combination, LookupA, LookupB, count(*) as NumRecords
from table
group by Combination, LookupA, LookupB
*/
I used a variable and a WHILE loop to place the various summaries where they need to be.

Why are 9 rows fetched by this query?

Given 2 tables T1 and T2.
T1 T2
A 1
B 2
C 3
You run the query SELECT * FROM T1, T2.
What is the number of rows fetched by this query?
The answer is 9.
This query results in a Cartesian product because no join condition is provided. Every row from the first table is matched with every row from the second table.
The result is
A 1
A 2
A 3
B 1
B 2
B 3
C 1
C 2
C 3
Because each record from the first table is returned along with each record of the second table and the result is not filtered.
The exact output will be:
T1 T2
A 1
A 2
A 3
B 1
B 2
B 3
C 1
C 2
C 3
(order may vary)
It is a cartesian product: select all rows from one table (3) and all rows from another table (3) and combine them, so 3*3=9.
That's what you asked it to do. You got all the rows from T1 and all the rows from T2. They don't just get added together -- that won't work if the columns are different, for example, though you can do this with UNION -- they get merged in what's known as a "cartesian product". You essentially get all combinations of rows from both tables. And 3*3 = 9.
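The 3*3 = 9 result is easy to reproduce. This sketch builds the two tables in SQLite via Python's sqlite3 module and compares the query result with itertools.product; the column names letter and num are hypothetical, since the question does not name the columns.

```python
import itertools
import sqlite3

# Two three-row tables, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (letter TEXT);
    CREATE TABLE T2 (num INTEGER);
    INSERT INTO T1 VALUES ('A'), ('B'), ('C');
    INSERT INTO T2 VALUES (1), (2), (3);
""")

# A comma-separated FROM list with no join condition pairs every T1 row with every T2 row.
pairs = conn.execute("SELECT * FROM T1, T2").fetchall()

# The same set of combinations computed directly in Python.
expected = set(itertools.product(["A", "B", "C"], [1, 2, 3]))

print(len(pairs))              # 9
print(set(pairs) == expected)  # True
```

Adding a join condition or a WHERE clause is what filters these combinations down to the matching rows.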

SQL query to find rows that aren't present in other tables

Here's what I'm trying to accomplish:
I've got two tables, call them first and second. They each have an ID column. They might have other columns, but those aren't important. I have a third table, call it third. It contains two columns, ID and OTHERID. OTHERID references entries that may or may not exist in tables first and second.
I want to query third and look for rows whose OTHERID value is not found in either table first or table second. The goal is to delete those rows from table third.
Example:
first table:
ID
1
2
3
second table:
ID
6
7
8
third table
ID | OTHERID
21 1
22 2
23 3
24 4
25 5
26 6
27 7
28 8
In this case, I'd want to retrieve the IDs from third that don't have a matching ID in either table first or table second. I'd expect to get back the following IDs:
24
25
What I've tried:
I've done something like this to get back the entries in third that aren't in first:
select t.* from third t where not exists (select * from first f where t.otherid = f.id);
and this will get me back the following rows:
ID | OTHERID
24 4
25 5
26 6
27 7
28 8
Similarly, I can get the ones that aren't in second:
select t.* from third t where not exists (select * from second s where t.otherid = s.id);
and I'll get:
ID | OTHERID
21 1
22 2
23 3
24 4
25 5
What I can't wrap my brain around this morning is how to combine the two queries to get the intersection of the two result sets, so that just the rows with IDs 24 and 25 are returned. Those would be the two rows I could remove, since they are orphans.
How would you solve this? I think I'm on the right track but I'm just spinning at this point making no progress.
Maybe this:
SELECT third.*
FROM third
LEFT JOIN first ON third.otherID = first.id
LEFT JOIN second ON third.otherID = second.id
WHERE first.id IS NULL AND second.id IS NULL
Just use
select t.*
from third t
where
not exists (select * from first f where t.otherid = f.id)
and not exists (select * from second s where t.otherid = s.id)
SELECT t.ID
FROM third t
WHERE t.OTHERID NOT IN (
SELECT ID
FROM first
UNION
SELECT ID
FROM second
)
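The three suggestions above (double LEFT JOIN, double NOT EXISTS, NOT IN over a UNION) can be compared side by side on the sample data. The sketch below does so in SQLite through Python's sqlite3 module; the engine choice, the f/s/t aliases, the ORDER BY clauses, and the quoting of "first" (to avoid any clash with a keyword) are assumptions for illustration.

```python
import sqlite3

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "first"  (id INTEGER);
    CREATE TABLE "second" (id INTEGER);
    CREATE TABLE third    (id INTEGER, otherid INTEGER);
    INSERT INTO "first"  VALUES (1), (2), (3);
    INSERT INTO "second" VALUES (6), (7), (8);
    INSERT INTO third VALUES (21, 1), (22, 2), (23, 3), (24, 4),
                             (25, 5), (26, 6), (27, 7), (28, 8);
""")

queries = {
    "left_join": """
        SELECT t.id FROM third AS t
        LEFT JOIN "first"  AS f ON t.otherid = f.id
        LEFT JOIN "second" AS s ON t.otherid = s.id
        WHERE f.id IS NULL AND s.id IS NULL
        ORDER BY t.id
    """,
    "not_exists": """
        SELECT t.id FROM third AS t
        WHERE NOT EXISTS (SELECT * FROM "first"  AS f WHERE t.otherid = f.id)
          AND NOT EXISTS (SELECT * FROM "second" AS s WHERE t.otherid = s.id)
        ORDER BY t.id
    """,
    "not_in": """
        SELECT t.id FROM third AS t
        WHERE t.otherid NOT IN (SELECT id FROM "first" UNION SELECT id FROM "second")
        ORDER BY t.id
    """,
}

# Run each variant and keep just the list of orphan ids.
orphans = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
print(orphans)
# {'left_join': [24, 25], 'not_exists': [24, 25], 'not_in': [24, 25]}
```

All three agree on this data. One difference to keep in mind: if first.id or second.id can contain NULL, the NOT IN form returns no rows at all, while the NOT EXISTS and LEFT JOIN forms are unaffected.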

How to find recursively self-joined records from a table

I've got a simple problem that really has me stumped.
I have a master table Table X
Table X
ID
_________
1
2
3
4
5
I have a join table for Table X that allows records to be self joined. Let's call this JoinTableX
JoinTableX
RecordAID RecordBID
--------- --------
1 2 (So Record 1 from Table X has a link to Record 2 from Table X)
1 3
1 4
2 3
2 4
3 1
3 2
4 1
4 2
So how do I write a SQL query to show me all the records in JoinTableX that have a duplicate dependency on each other (in the example above, Table X Record 1 is linked to Table X Record 4, and Table X Record 4 is linked to Table X Record 1)?
select *
from JoinTableX a
inner join JoinTableX b on a.RecordAID = b.RecordBID
and a.RecordBID = b.RecordAID
(SELECT RecordAID, RecordBID FROM JoinTableX)
INTERSECT
(SELECT RecordBID, RecordAID FROM JoinTableX)
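Both answers can be tried on the sample links. The sketch below runs the self-join and the INTERSECT in SQLite through Python's sqlite3 module. SQLite is an assumption here, and because it does not accept parentheses around the members of a compound SELECT, the INTERSECT is written without them; the explicit column list and the ORDER BY clauses are also additions for readable output.

```python
import sqlite3

# Link rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE JoinTableX (RecordAID INTEGER, RecordBID INTEGER)")
conn.executemany(
    "INSERT INTO JoinTableX VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 1), (3, 2), (4, 1), (4, 2)],
)

# Self-join: keep a link (A, B) only when the reverse link (B, A) also exists.
self_join = conn.execute("""
    SELECT a.RecordAID, a.RecordBID
    FROM JoinTableX AS a
    INNER JOIN JoinTableX AS b
       ON a.RecordAID = b.RecordBID
      AND a.RecordBID = b.RecordAID
    ORDER BY a.RecordAID, a.RecordBID
""").fetchall()

# INTERSECT: links that also appear in the table with their columns swapped.
intersect = conn.execute("""
    SELECT RecordAID, RecordBID FROM JoinTableX
    INTERSECT
    SELECT RecordBID, RecordAID FROM JoinTableX
    ORDER BY 1, 2
""").fetchall()

print(self_join)
# [(1, 3), (1, 4), (2, 3), (2, 4), (3, 1), (3, 2), (4, 1), (4, 2)]
print(self_join == intersect)  # True
```

The link (1, 2) is dropped because there is no (2, 1) row. Each mutual pair shows up in both directions; adding a condition such as a.RecordAID < a.RecordBID to the self-join would list each pair once.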