Count on UNION in Oracle - sql

I have 3 tables and I need to get the details from 2 of them where the count over their UNION is greater than 1. I also need to apply certain conditions.
Table A
id entity_id name category
1 45 abcd win_1
2 46 efgh win_2
3 47 efgh1 win_2
4 48 dfgh win_5
5 49 adfgh win_4
Table B
id product_id name parent_id
1 P123 asdf win_1
2 P234 adfgh win_4
Table 3 category_list
id cat_id name
1 win_1 Households
2 win_2 Outdoors
3 win_3 Mixed
4 win_4 Omni
Now I need to have the count of UNION from Table A and Table B where they have count of cat_id greater than 1 and Table A.name != Table B.name
The result which I require is
p_id name cat_id
45 abcd win_1
P123 asdf win_1
46 efgh win_2
47 efgh1 win_2
win_5 is excluded because its count is one, and win_4 should be excluded because the name in Table A and Table B is the same.
I have run out of ideas as I am relatively new to Oracle and databases. Any help is appreciated.

I think you can use exists to ensure that the cat_id is present in both tables
select entity_id as p_id, name, category as cat_id
from table_a a
where exists (select null from table_b where a.category = table_b.parent_id)
union
select product_id, name, parent_id
from table_b b
where exists (select null from table_a where b.parent_id = table_a.category)

I believe you are looking for something like this -
Select T2.*
from
(Select category
from
(Select name, category from TableA
Union all
Select name, parent_id as category from TableB) t
group by category
having count(distinct name) > 1) T1
Join
(Select entity_id as Pid, name, category from TableA
Union
Select product_id as Pid, name, parent_id as category from TableB) T2
ON T1.category = T2.category;
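As a rough check of the GROUP BY / HAVING approach above, here is a minimal sketch that runs an equivalent query against the question's sample data. It uses Python's built-in sqlite3 module rather than Oracle, which is an assumption made only so the example is self-contained; `CAST(entity_id AS TEXT)` stands in for Oracle's `TO_CHAR`, and the `ORDER BY` is added just to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER, entity_id INTEGER, name TEXT, category TEXT);
CREATE TABLE TableB (id INTEGER, product_id TEXT, name TEXT, parent_id TEXT);
INSERT INTO TableA VALUES
  (1, 45, 'abcd',  'win_1'),
  (2, 46, 'efgh',  'win_2'),
  (3, 47, 'efgh1', 'win_2'),
  (4, 48, 'dfgh',  'win_5'),
  (5, 49, 'adfgh', 'win_4');
INSERT INTO TableB VALUES
  (1, 'P123', 'asdf',  'win_1'),
  (2, 'P234', 'adfgh', 'win_4');
""")

# T1: categories with more than one distinct name across both tables.
# T2: the combined detail rows; the join keeps only rows in those categories.
query = """
SELECT T2.Pid, T2.name, T2.category
FROM (SELECT category
      FROM (SELECT name, category FROM TableA
            UNION ALL
            SELECT name, parent_id AS category FROM TableB) t
      GROUP BY category
      HAVING COUNT(DISTINCT name) > 1) T1
JOIN (SELECT CAST(entity_id AS TEXT) AS Pid, name, category FROM TableA
      UNION
      SELECT product_id AS Pid, name, parent_id AS category FROM TableB) T2
  ON T1.category = T2.category
ORDER BY T2.category, T2.Pid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this keeps win_1 and win_2 and drops win_4 (one distinct name) and win_5 (one row), which matches the result the question asks for.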

Would you try this code?
The first CTE (common table expression), "list_union", takes from each table the records whose names do not appear in the other table and combines them with UNION ALL. The second CTE, "list_cnt", counts the rows per category, and the final SELECT keeps only the rows with cnt > 1, giving the result you described.
With
list_union AS (
SELECT
id,
----------
TO_CHAR(entity_id) entity_id,
----------
name,
category
FROM table_A a
WHERE NOT EXISTS(SELECT 1 FROM table_B b WHERE a.name=b.name)
----------
UNION ALL
----------
SELECT
id,
product_id,
name,
parent_id
FROM table_B b
WHERE NOT EXISTS(SELECT 1 FROM table_A a WHERE a.name=b.name)
)
,list_cnt AS (
SELECT
l.*,
----------
COUNT(*) over (PARTITION BY category) cnt
----------
FROM list_union l
)
SELECT
entity_id AS p_id,
name,
category AS cat_id
FROM list_cnt
WHERE cnt>1
ORDER BY cat_id ASC, p_id ASC
;

Just use a union all and window functions:
select ab.*
from (select ab.*,
count(distinct name) over (partition by category) as cnt
from ((select a.* from a
) union all
(select b.* from b
)
) ab
) ab
where cnt > 1;
Although you describe the problem as:
Now I need to have the count of UNION from Table A and Table B where they have count of cat_id greater than 1 and Table A.name != Table B.name
You seem to just want cat_ids that have different names across the two tables. Your sample data includes cat_id = 'win_2', which is not even in the second table.

Related

Hive query optimization

My requirement is to get the id and name of the students who have more than one email ID and type=1.
I am using a query like
select distinct b.id, b.name, b.email, b.type,a.cnt
from (
select id, count(email) as cnt
from (
select distinct id, email
from table1
) c
group by id
) a
join table1 b on a.id = b.id
where b.type=1
order by b.id
Please let me know whether this is fine or whether a simpler version is available.
Sample data is like:
id name email type
123 AAA abc#xyz.com 1
123 AAA acd#xyz.com 1
123 AAA ayx#xyz.com 3
345 BBB nch#xyz.com 1
345 BBB nch#xyz.com 1
678 CCC iuy#xyz.com 1
Expected Output:
123 AAA abc#xyz.com 1 2
123 AAA acd#xyz.com 1 2
345 BBB nch#xyz.com 1 1
678 CCC iuy#xyz.com 1 1
You can use group by with having count() for this requirement.
select distinct b.id
, b.name
, b.email
, b.type
from table1 b
where b.id in
(select id from table1 group by id having count(distinct email) > 1)
and b.type=1
order by b.id
You can try the analytic form of the count() function:
SELECT sub.ID, sub.NAME
FROM (SELECT ID, NAME, TYPE, COUNT (*) OVER (PARTITION BY ID, EMAIL) cnt
FROM raw.crddacia_raw) sub
WHERE sub.cnt > 1 AND sub.TYPE = 1
I strongly recommend using window functions. However, Hive does not support count(distinct) as a window function. There are different methods to solve this. One is the sum of dense_rank()s:
select id, name, email, type, cnt
from (select t1.*,
(dense_rank() over (partition by id order by email) +
dense_rank() over (partition by id order by email desc) - 1
) as cnt
from table1 t1
) t
where type = 1;
I would expect this to have better performance than your version. However, it is worth testing different versions to see which has the better performance (and feel free to come back to let others know which is better).
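To see why two dense ranks can stand in for a distinct count, note that for any row the ascending dense rank plus the descending dense rank equals the number of distinct values plus one, so subtracting 1 gives the distinct count. The sketch below checks that arithmetic in plain Python; `dense_ranks` is a hypothetical helper written for this illustration that mimics `DENSE_RANK()` within a single partition, not a Hive function.

```python
def dense_ranks(values, descending=False):
    """Dense rank of each value within one partition, like DENSE_RANK() OVER (ORDER BY value)."""
    ordered = sorted(set(values), reverse=descending)
    position = {value: index + 1 for index, value in enumerate(ordered)}
    return [position[value] for value in values]


def distinct_count_via_ranks(values):
    """Per-row distinct count computed as asc rank + desc rank - 1."""
    asc = dense_ranks(values)
    desc = dense_ranks(values, descending=True)
    return [a + d - 1 for a, d in zip(asc, desc)]


# Emails for id 123 and id 345 from the sample data (all types included).
emails_123 = ["abc#xyz.com", "acd#xyz.com", "ayx#xyz.com"]
emails_345 = ["nch#xyz.com", "nch#xyz.com"]

counts_123 = distinct_count_via_ranks(emails_123)
counts_345 = distinct_count_via_ranks(emails_345)
print(counts_123, counts_345)
```

Every row in a partition gets the same value, which is what a windowed count(distinct) would return if Hive supported it.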
One more method using collect_set and taking the size of returned array for calculating distinct emails.
Demo:
--your data example
with table1 as ( --use your table instead of this
select stack(6,
123, 'AAA', 'abc#xyz.com', 1,
123, 'AAA', 'acd#xyz.com', 1,
123, 'AAA', 'ayx#xyz.com', 3,
345, 'BBB', 'nch#xyz.com', 1,
345, 'BBB', 'nch#xyz.com', 1,
678, 'CCC', 'iuy#xyz.com', 1
) as (id, name, email, type )
)
--query
select distinct id, name, email, type,
size(collect_set(email) over(partition by id)) cnt
from table1
where type=1
Result:
id name email type cnt
123 AAA abc#xyz.com 1 2
123 AAA acd#xyz.com 1 2
345 BBB nch#xyz.com 1 1
678 CCC iuy#xyz.com 1 1
We still need DISTINCT here because an analytic function does not remove duplicate rows, as in the case of 345 BBB nch#xyz.com.
This is very similar to your query, but here I am filtering the data at the initial step (in the inner query) so that the join happens on less data:
select distinct orig_table.id,orig_table.name,orig_table.email,orig_table.type,intr_table.cnt from table1 orig_table join
(
select a.id,a.type,count(a.email) as cnt from table1 as a where a.type=1 group by a.id,a.type
) intr_table on intr_table.id=orig_table.id and intr_table.type=orig_table.type

Add values from count that have come from a UNION

I am attempting to add together the value_occurrence values for the IDs that match. For example, one record shows ID number 1 with 3 and another record shows ID number 1 with 4. I want to add 3 + 4 because the IDs match.
I am trying to see how many entries each ID has in each table and then add them together.
ID value_occurrence
--------------------
1 3
1 4
so far this is what I have.
SELECT ID, COUNT(ID) AS value_occurrence
FROM TABLE1
GROUP BY ID
UNION
SELECT ID, COUNT(ID) AS value_occurrence
FROM TABLE2
GROUP BY ID
ORDER BY ID ASC;
Any help would be appreciated.
Do a union all and then aggregate:
select id, count(*) as total_cnt, sum(t1) as t1_cnt, sum(t2) as t2_cnt
from ((select id, 1 as t1, 0 as t2 from table1) union all
(select id, 0, 1 from table2)
) t
group by id
order by id;
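A minimal sketch of the union-all-then-aggregate idea, run with Python's built-in sqlite3 module. The table contents are made up for illustration (ID 1 appears 3 times in table1 and 4 times in table2, as in the question), and the parentheses around each SELECT are dropped because SQLite does not accept them in a compound query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER);
CREATE TABLE table2 (id INTEGER);
-- Hypothetical rows: id 1 occurs 3 times here and 4 times in table2.
INSERT INTO table1 VALUES (1), (1), (1), (2);
INSERT INTO table2 VALUES (1), (1), (1), (1), (3);
""")

# Tag each row with the table it came from, stack the rows, then sum per id.
query = """
SELECT id, COUNT(*) AS total_cnt, SUM(t1) AS t1_cnt, SUM(t2) AS t2_cnt
FROM (SELECT id, 1 AS t1, 0 AS t2 FROM table1
      UNION ALL
      SELECT id, 0, 1 FROM table2) t
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

UNION ALL matters here: a plain UNION would collapse identical rows before they are counted.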

Select rows not in another table by comparing two table

I have following two tables TableA and TableB
TableA
Id Month_Id Customer_Id Total_Amount
1 1 1 50
2 2 1 150
3 3 1 200
4 1 2 75
5 2 2 100
6 1 3 400
7 2 3 200
TableB
Id Month_Id Customer_Id Total_Amount
1 1 1 50
2 2 1 150
3 1 2 75
I want to compare Month_Id Customer_Id Total_Amount in both tables and select Id from TableA. The output should be as follow.
Output
Id
3
5
6
7
My concept is:
SELECT TableA.Id FROM TableA
WHERE TableA.Month_Id <> TableB.MonthId AND
TableA.Customer_Id <> TableB.Customer_Id AND
TableA.Total_Amount <> TableB.Total_Amount
SELECT TableA.Id
FROM TableA
WHERE NOT EXISTS (
SELECT 1
FROM TableB
WHERE TableB.Month_Id = TableA.Month_Id
AND TableB.Customer_Id = TableA.Customer_Id
AND TableB.Total_Amount = TableA.Total_Amount
)
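The NOT EXISTS query above can be checked against the sample data from the question. This is a small sketch using Python's built-in sqlite3 module as a stand-in for the real database (an assumption made for the sake of a runnable example); an ORDER BY is added so the result order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Id INTEGER, Month_Id INTEGER, Customer_Id INTEGER, Total_Amount INTEGER);
CREATE TABLE TableB (Id INTEGER, Month_Id INTEGER, Customer_Id INTEGER, Total_Amount INTEGER);
INSERT INTO TableA VALUES
  (1, 1, 1, 50), (2, 2, 1, 150), (3, 3, 1, 200),
  (4, 1, 2, 75), (5, 2, 2, 100), (6, 1, 3, 400), (7, 2, 3, 200);
INSERT INTO TableB VALUES
  (1, 1, 1, 50), (2, 2, 1, 150), (3, 1, 2, 75);
""")

# Keep the TableA rows for which no TableB row matches on all three columns.
query = """
SELECT TableA.Id
FROM TableA
WHERE NOT EXISTS (
    SELECT 1
    FROM TableB
    WHERE TableB.Month_Id = TableA.Month_Id
      AND TableB.Customer_Id = TableA.Customer_Id
      AND TableB.Total_Amount = TableA.Total_Amount
)
ORDER BY TableA.Id
"""
ids = [row[0] for row in conn.execute(query)]
print(ids)
```

Because Id is not part of the comparison, TableA row 4 (1, 2, 75) is matched by TableB row 3 even though the two Id values differ.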
select Id
from (
select Id, Month_Id, Customer_Id, Total_Amount from TableA
except
select Id, Month_Id, Customer_Id, Total_Amount from TableB
) q
SELECT id FROM
(SELECT id, month_id, customer_id, total_amount FROM TableA
EXCEPT
SELECT id, month_id, customer_id, total_amount FROM TableB);
You can use the EXCEPT set operator:
SELECT id
FROM (SELECT * FROM table_a
EXCEPT
SELECT * FROM table_b) t
You can use Merge with WHEN NOT MATCHED
place your condition in ON <merge_search_condition>
SELECT A.Id FROM TableA A LEFT JOIN TableB B
ON A.Month_Id = B.Month_Id
AND A.Customer_Id = B.Customer_Id
AND A.Total_Amount = B.Total_Amount
WHERE B.Id IS NULL
In Oracle SQL it would be:
SELECT ID FROM
(SELECT ID, Month_Id, Customer_Id, Total_Amount FROM TABLE_A
MINUS
SELECT ID, Month_Id, Customer_Id, Total_Amount FROM TABLE_B);
Is this what you want?
(I am not sure whether SQL Server supports the MINUS operator, though.)

In Oracle, how do I get a page of distinct values from sorted results?

I have 2 columns in a one-to-many relationship. I want to sort on the "many" and return the first occurrence of the "one". I need to page through the data so, for example, I need to be able to get the 3rd group of 10 unique "one" values.
I have a query like this:
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id;
There can be multiple rows in table2 for each row in table1.
The results of my query look like this:
id | name
----------------
2 | apple
23 | banana
77 | cranberry
23 | dark chocolate
8 | egg
2 | yak
19 | zebra
I need to page through the result set with each page containing n unique ids. For example, if start=1 and n=4 I want to get back
2
23
77
8
in the order they were sorted on (i.e., name), where id is returned in the position of its first occurrence. Likewise if start=3 and n=4 and order = desc I want
8
23
77
2
I tried this:
SELECT * FROM (
SELECT id, ROWNUM rnum FROM (
SELECT DISTINCT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
which gave me the ids in numerical order, instead of being ordered as the names would be.
I also tried:
SELECT * FROM (
SELECT DISTINCT id, ROWNUM rnum FROM (
SELECT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
but that gave me duplicate values.
How can I page through the results of this data? I just need the ids, nothing from the "many" table.
update
I suppose I'm getting closer with changing my inner query to
SELECT id, name, rank() over (order by name, id)
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
...but I'm still getting duplicate ids.
You may need to debug it a little, but it will be something like this:
SELECT id FROM (
SELECT id, ROWNUM rnum FROM (
SELECT id, name FROM (
SELECT id, name, row_number() over (partition by id order by name) rn
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
) WHERE rn = 1 ORDER BY name, id
) WHERE ROWNUM <= 4
) WHERE rnum >= 1;
It's a bit convoluted (and I would tend to suspect that it could be simplified) but it should work. You can put whatever start and end position you want in the WHERE clause. I'm showing it here with start=2 and n=4 pulled from a separate subquery, but you could simplify things by using a couple of parameters instead.
with t as (
select 2 id, 'apple' name from dual union all
select 23, 'banana' from dual union all
select 77, 'cranberry' from dual union all
select 23, 'dark chocolate' from dual union all
select 8, 'egg' from dual union all
select 2, 'yak' from dual union all
select 19, 'zebra' from dual
),
x as (
select 2 start_pos, 4 n from dual
)
select *
from (
select distinct
id,
dense_rank() over (order by min_id_rnk) outer_rnk
from (
select id,
min(rnk) over (partition by id) min_id_rnk
from (
select id,
name,
rank() over (order by name) rnk
from t
)
)
)
where outer_rnk between (select start_pos from x) and (select start_pos+n-1 from x)
order by outer_rnk;
ID OUTER_RNK
---------- ----------
23 2
77 3
8 4
19 5
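The core idea in both answers is to reduce each id to the position of its first occurrence and then page over those positions. Here is a minimal sketch of that idea using Python's built-in sqlite3 module. It is not Oracle syntax: `LIMIT ... OFFSET` is SQLite's paging clause, and in Oracle you would use ROWNUM as above or, from 12c on, `OFFSET ... FETCH`. The table `t` and the helper `page_of_ids` are hypothetical names for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, name TEXT);
INSERT INTO t VALUES
  (2, 'apple'), (23, 'banana'), (77, 'cranberry'), (23, 'dark chocolate'),
  (8, 'egg'), (2, 'yak'), (19, 'zebra');
""")


def page_of_ids(start, n):
    """Return n distinct ids, beginning at the start-th distinct id (1-based),
    ordered by the first name each id appears under."""
    sql = """
        SELECT id
        FROM t
        GROUP BY id            -- one row per id
        ORDER BY MIN(name)     -- position of the id's first occurrence
        LIMIT ? OFFSET ?
    """
    return [row[0] for row in conn.execute(sql, (n, start - 1))]


first_page = page_of_ids(1, 4)
from_second = page_of_ids(2, 4)
print(first_page, from_second)
```

With start=2 and n=4 this returns the same ids as the dense_rank query above: 23, 77, 8, 19.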

t-sql query to get all the rows from Table_A which have at least one matching relatedid in Table_B

Based on the below tables
Table_A
Id RelatedId
---------------
1 1
1 2
1 3
2 4
2 5
2 2
3 7
3 8
4 9
Table_B
RelatedId Name
--------------
2 A
3 B
I want to get all the rows from Table_A which have at least one matching relatedid in Table_B. The Ids from Table_A that do not have match in Table_B will have single row in the result table.
So output (result table) in this case will be
Id RelatedId
---------------
1 1
1 2
1 3
2 4
2 5
2 2
3 Null
4 Null
EDIT: Seems like the question text is confusing for many. So a detailed explanation:
Table_A Id 1 has both 2 and 3 (RelatedIds) matching in Table_B, so the output will have all the rows for 1 from Table_A. Similarly, Id 2 from Table_A has 2 (RelatedId) matching in Table_B, so all rows corresponding to 2 from Table_A will be picked up. Since 3 does not have any matching RelatedId in Table_B, it will be displayed, but with NULL as the RelatedId in the result table.
with validids as (
select distinct id
from Table_A
inner join Table_B on
Table_A.relatedid = Table_B.relatedid
)
select
id,
relatedid
from Table_A
where id in (select id from validids)
union
select distinct
id,
null
from Table_A
where id not in (select id from validids)
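The CTE-plus-UNION query above can be tried on the question's sample data. The sketch below uses Python's built-in sqlite3 module instead of SQL Server (an assumption made so the example is self-contained); the `ORDER BY 1, 2` is added only to make the output order predictable, and SQLite sorts NULLs first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_A (Id INTEGER, RelatedId INTEGER);
CREATE TABLE Table_B (RelatedId INTEGER, Name TEXT);
INSERT INTO Table_A VALUES
  (1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, 2), (3, 7), (3, 8), (4, 9);
INSERT INTO Table_B VALUES (2, 'A'), (3, 'B');
""")

# validids: ids with at least one RelatedId present in Table_B.
# First branch: every row for those ids. Second branch: one NULL row per other id.
query = """
WITH validids AS (
    SELECT DISTINCT Table_A.Id AS id
    FROM Table_A
    INNER JOIN Table_B ON Table_A.RelatedId = Table_B.RelatedId
)
SELECT Id, RelatedId FROM Table_A WHERE Id IN (SELECT id FROM validids)
UNION
SELECT DISTINCT Id, NULL FROM Table_A WHERE Id NOT IN (SELECT id FROM validids)
ORDER BY 1, 2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ids 3 and 4 each collapse to a single row with a NULL RelatedId, as the question requires.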
Try:
Select Distinct Id, RelatedId
From Table_A
Where RelatedId In
(Select RelatedId From Table_B)
This should do what you want:
with IdsWithMatchInB(Id) as (
select distinct
Id
from Table_A
where Table_A.RelatedId in (
select Table_B.RelatedId
from Table_B
)
)
select
Table_A.Id,
CASE WHEN IdsWithMatchInB.Id IS NULL
THEN NULL
ELSE Table_A.RelatedId END AS RelatedId
from Table_A
left outer join IdsWithMatchInB
on IdsWithMatchInB.Id = Table_A.Id
Here's another option that avoids DISTINCT by putting the subquery inside the CASE expression:
select
Id,
case when Id in (
select Id
from Table_A as Acopy
where Acopy.RelatedId in (
select
RelatedId
from Table_B
)
)
then RelatedId
else null end as RelatedId
from Table_A;
One or the other might be more efficient for your particular data and indexes.