Sum analytical function or any other easy way - sql

I have the data below and need to select all columns along with the sum of one column:
id size desc1 desc2
1 13 xxx yyy
1 13 xxx yyy
1 10 mmm kkk
1 10 mmm kkk
I need the output below:
id total_size desc1 desc2
1 23 xxx yyy
1 23 xxx yyy
1 23 mmm kkk
1 23 mmm kkk
total_size should be sum (distinct size)

select a.id
,a.size
,sum(b.size) as 'total_size'
,a.desc1
,a.desc2
from (
select *, row_number() over (order by id, size, desc1, desc2) as 'RowNumber'
from #tmp
) a
left join (
select *, row_number() over(partition by id, size order by id) as 'dupe'
from #tmp
) b
on a.id = b.id
and b.dupe=1
group by a.RowNumber
,a.id
,a.size
,a.desc1
,a.desc2
Not here to argue, but you should really consider reviewing the data structure you're working with.
1. Select your data, adding a column to number the rows.
2. Join a copy of your data (with distinct records only).
3. Sum the size column from the list of distinct records.
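The steps above can be checked against the sample rows. This is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server; the table name tmp and the helper name total_size_rows are assumptions made for the illustration, and the distinct copy is written as a plain subquery instead of row_number().

```python
import sqlite3


def total_size_rows():
    """Return every row of tmp with the sum of the distinct sizes for its id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tmp (id integer, size integer, desc1 text, desc2 text)"
    )
    conn.executemany(
        "insert into tmp values (?, ?, ?, ?)",
        [
            (1, 13, "xxx", "yyy"),
            (1, 13, "xxx", "yyy"),
            (1, 10, "mmm", "kkk"),
            (1, 10, "mmm", "kkk"),
        ],
    )
    rows = conn.execute(
        """
        select a.id, s.total_size, a.desc1, a.desc2
        from tmp a
        join (
            -- one row per id: sum over the distinct (id, size) pairs
            select d.id, sum(d.size) as total_size
            from (select distinct id, size from tmp) d
            group by d.id
        ) s on s.id = a.id
        order by a.desc1 desc
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in total_size_rows():
        print(row)
```

Because the join keeps every row of the original table, duplicates in the input stay duplicated in the output; only the sum is computed over distinct sizes.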

You just need to add sum(distinct "size") over (partition by id) to your SQL to compute the total_size column for each row:
with tab(id,"size","desc1","desc2") as
(
select 1 ,13,'xxx','yyy' from dual union all
select 1 ,13,'xxx','yyy' from dual union all
select 1 ,10,'mmm','kkk' from dual union all
select 1 ,10,'mmm','kkk' from dual
)
select t.id,
sum(distinct t."size") over (partition by id) as "total_size",
t."desc1",t."desc2"
from tab t;
P.S. size is a reserved keyword, so it cannot be used as a column name unless it is quoted, as in "size".

Related

Count on UNION in Oracle

I have 3 tables and I need to get the details from 2 of them where the count of the UNION is greater than 1. I also need to apply certain conditions.
Table A
id entity_id name category
1 45 abcd win_1
2 46 efgh win_2
3 47 efgh1 win_2
4 48 dfgh win_5
5 49 adfgh win_4
Table B
id product_id name parent_id
1 P123 asdf win_1
2 P234 adfgh win_4
Table 3 category_list
id cat_id name
1 win_1 Households
2 win_2 Outdoors
3 win_3 Mixed
4 win_4 Omni
Now I need to have the count of UNION from Table A and Table B where they have count of cat_id greater than 1 and Table A.name != Table B.name
The result which I require is
p_id name cat_id
45 abcd win_1
P123 asdf win_1
46 efgh win_2
47 efgh1 win_2
win_5 is excluded because its count is one, and win_4 should be excluded because the name in Table A and Table B is the same.
I have run out of ideas, as I am relatively new to Oracle and databases. Any help is appreciated.
I think you can use exists to ensure that the cat_id is present in both tables
select entity_id as p_id, name, category as cat_id
from table_a a
where exists (select null from table_b where a.category = table_b.parent_id)
union
select entity_id, name, parent_id
from table_b b
where exists (select null from table_a where b.parent_id = table_a.category)
I believe you are looking for something like this -
Select T2.*
from
(Select category
from
(Select name, category from TableA
Union all
Select name, parent_id as category from TableB) t
group by category
having count(distinct name) > 1) T1
Join
(Select entity_id as Pid, name, category from TableA
Union
Select product_id as Pid, name, parent_id as category from TableB) T2
ON T1.category = T2.category;
Please try this code.
The first CTE (Common Table Expression), "list_union", takes from each table the records whose names do not appear in the other table and unions them. The second CTE, "list_cnt", counts the rows per category, and the final select statement returns the rows with cnt>1, as in your expected result.
With
list_union AS (
SELECT
id,
----------
TO_CHAR(entity_id) entity_id,
----------
name,
category
FROM table_A a
WHERE NOT EXISTS(SELECT 1 FROM table_B b WHERE a.name=b.name)
----------
UNION ALL
----------
SELECT
id,
product_id,
name,
parent_id
FROM table_B b
WHERE NOT EXISTS(SELECT 1 FROM table_A a WHERE a.name=b.name)
)
,list_cnt AS (
SELECT
l.*,
----------
COUNT(*) over (PARTITION BY category) cnt
----------
FROM list_union l
)
SELECT
entity_id AS p_id,
name,
category AS cat_id
FROM list_cnt
WHERE cnt>1
ORDER BY cat_id ASC, p_id ASC
;
Just use a union all and window functions:
select ab.*
from (select ab.*,
count(distinct name) over (partition by category) as cnt
from ((select a.* from a
) union all
(select b.* from b
)
) ab
) ab
where cnt > 1;
Although you describe the problem as:
Now I need to have the count of UNION from Table A and Table B where they have count of cat_id greater than 1 and Table A.name != Table B.name
You seem to just want cat_ids that have different names across the two tables. Your sample data includes cat_id = 'win_2', which is not even in the second table.
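That reading (keep the categories that carry more than one distinct name across both tables) can be checked against the sample data. The following is a sketch only: it runs on SQLite through Python's sqlite3 module rather than on Oracle, stores entity_id as text so that both id columns share a type, and the function name categories_with_different_names is an assumption made for the illustration.

```python
import sqlite3


def categories_with_different_names():
    """Rows from both tables whose category has more than one distinct name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table table_a (id integer, entity_id text, name text, category text);
        create table table_b (id integer, product_id text, name text, parent_id text);
        insert into table_a values
            (1, '45', 'abcd', 'win_1'),
            (2, '46', 'efgh', 'win_2'),
            (3, '47', 'efgh1', 'win_2'),
            (4, '48', 'dfgh', 'win_5'),
            (5, '49', 'adfgh', 'win_4');
        insert into table_b values
            (1, 'P123', 'asdf', 'win_1'),
            (2, 'P234', 'adfgh', 'win_4');
        """
    )
    rows = conn.execute(
        """
        with ab as (
            select entity_id as p_id, name, category as cat_id from table_a
            union all
            select product_id, name, parent_id from table_b
        )
        select p_id, name, cat_id
        from ab
        where cat_id in (
            -- categories with at least two different names
            select cat_id from ab group by cat_id having count(distinct name) > 1
        )
        order by cat_id, p_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in categories_with_different_names():
        print(row)
```

win_5 drops out because it has a single row, and win_4 drops out because both of its rows share the name adfgh.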

Hive query optimization

My requirement is to get the id and name of the students who have more than one email id and type=1.
I am using a query like
select distinct b.id, b.name, b.email, b.type,a.cnt
from (
select id, count(email) as cnt
from (
select distinct id, email
from table1
) c
group by id
) a
join table1 b on a.id = b.id
where b.type=1
order by b.id
Please let me know whether this is fine or whether a simpler version is available.
Sample data is like:
id name email type
123 AAA abc#xyz.com 1
123 AAA acd#xyz.com 1
123 AAA ayx#xyz.com 3
345 BBB nch#xyz.com 1
345 BBB nch#xyz.com 1
678 CCC iuy#xyz.com 1
Expected Output:
123 AAA abc#xyz.com 1 2
123 AAA acd#xyz.com 1 2
345 BBB nch#xyz.com 1 1
678 CCC iuy#xyz.com 1 1
You can use group by with having count() for this requirement.
select distinct b.id
, b.name
, b.email
, b.type
from table1 b
where b.id in
(select id from table1 group by id having count(distinct email) > 1)
and b.type=1
order by b.id
You can try the analytic form of the count() function:
SELECT sub.ID, sub.NAME
FROM (SELECT ID, NAME, TYPE, COUNT (*) OVER (PARTITION BY ID, EMAIL) cnt
FROM table1) sub
WHERE sub.cnt > 1 AND sub.TYPE = 1
I strongly recommend using window functions. However, Hive does not support count(distinct) as a window function. There are different methods to solve this. One is the sum of two dense_rank()s (ascending plus descending), minus one:
select id, name, email, type, cnt
from (select t1.*,
(dense_rank() over (partition by id order by email) +
dense_rank() over (partition by id order by email desc) - 1
) as cnt
from table1 t1
) t
where type = 1;
I would expect this to have better performance than your version. However, it is worth testing different versions to see which has the better performance (and feel free to come back to let others know which is better).
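The dense_rank() approach relies on an identity: within a partition, the ascending rank plus the descending rank of a value, minus one, equals the number of distinct values. Below is a small sketch that checks this on the sample data. It runs on SQLite (version 3.25 or later, for window functions) through Python's sqlite3 module, not on Hive; the type = 1 filter inside the subquery and the outer distinct are assumptions chosen to reproduce the expected output from the question, and distinct_email_counts is a hypothetical helper name.

```python
import sqlite3


def distinct_email_counts():
    """Type-1 rows with the number of distinct type-1 emails per id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table1 (id integer, name text, email text, type integer)"
    )
    conn.executemany(
        "insert into table1 values (?, ?, ?, ?)",
        [
            (123, "AAA", "abc#xyz.com", 1),
            (123, "AAA", "acd#xyz.com", 1),
            (123, "AAA", "ayx#xyz.com", 3),
            (345, "BBB", "nch#xyz.com", 1),
            (345, "BBB", "nch#xyz.com", 1),
            (678, "CCC", "iuy#xyz.com", 1),
        ],
    )
    rows = conn.execute(
        """
        select distinct id, name, email, type, cnt
        from (
            select t1.*,
                   -- ascending rank + descending rank - 1 = distinct count
                   dense_rank() over (partition by id order by email)
                 + dense_rank() over (partition by id order by email desc)
                 - 1 as cnt
            from table1 t1
            where type = 1
        ) t
        order by id, email
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in distinct_email_counts():
        print(row)
```

Filtering before the window functions run means the type 3 email of id 123 is not counted; moving the filter to the outer query would count it.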
One more method uses collect_set and takes the size of the returned array to count the distinct emails.
Demo:
--your data example
with table1 as ( --use your table instead of this
select stack(6,
123, 'AAA', 'abc#xyz.com', 1,
123, 'AAA', 'acd#xyz.com', 1,
123, 'AAA', 'ayx#xyz.com', 3,
345, 'BBB', 'nch#xyz.com', 1,
345, 'BBB', 'nch#xyz.com', 1,
678, 'CCC', 'iuy#xyz.com', 1
) as (id, name, email, type )
)
--query
select distinct id, name, email, type,
size(collect_set(email) over(partition by id)) cnt
from table1
where type=1
Result:
id name email type cnt
123 AAA abc#xyz.com 1 2
123 AAA acd#xyz.com 1 2
345 BBB nch#xyz.com 1 1
678 CCC iuy#xyz.com 1 1
We still need DISTINCT here because an analytic function does not remove duplicate rows, such as the two 345 BBB nch#xyz.com rows.
This is very similar to your query, but here I am filtering the data at the initial step (in the inner query) so that the join happens on less data:
select distinct orig_table.id, orig_table.name, orig_table.email, orig_table.type, intr_table.cnt
from table1 orig_table
join
(
select a.id, a.type, count(distinct a.email) as cnt from table1 a where a.type=1 group by a.id, a.type
) intr_table on intr_table.id=orig_table.id and intr_table.type=orig_table.type

Running count but reset on some column value in select query

I want to compute a running count that resets on a specific column value.
Here is my select statement:
with tbl(emp,salary,ord) as
(
select 'A',1000,1 from dual union all
select 'B',1000,2 from dual union all
select 'K',1000,3 from dual union all
select 'A',1000,4 from dual union all
select 'B',1000,5 from dual union all
select 'D',1000,6 from dual union all
select 'B',1000,7 from dual
)
select * from tbl
I want the count to reset on emp B: a row where the column value is B closes the current run, so the next row starts again at 0 and the count then increments by 1 per row:
emp salary ord running_count
A 1000 1 0
B 1000 2 1
K 1000 3 0
A 1000 4 1
B 1000 5 2
D 1000 6 0
B 1000 7 1
Here the ordering column is ord.
I want to achieve the whole thing by select statement, not using the cursor.
You want to define groups where the counting takes place. Within a group, the solution is row_number().
You can define the group by doing a cumulative sum of B values. Because B ends the group, you want to count the number of B after each record.
This results in:
select t.*,
row_number() over (partition by grp order by ord) - 1 as running_count
from (select t.*,
sum(case when emp = 'B' then 1 else 0 end) over (order by ord desc) as grp
from tbl t
) t;
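To see why the reverse cumulative sum works, the query can be run on the sample rows. This sketch uses SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of Oracle; the helper name running_counts is an assumption made for the illustration.

```python
import sqlite3


def running_counts():
    """Return (emp, ord, running_count), where the count restarts after each B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (emp text, salary integer, ord integer)")
    conn.executemany(
        "insert into tbl values (?, ?, ?)",
        [(emp, 1000, ord_) for ord_, emp in enumerate("ABKABDB", start=1)],
    )
    rows = conn.execute(
        """
        select emp, ord,
               row_number() over (partition by grp order by ord) - 1
                   as running_count
        from (
            select t.*,
                   -- number of B rows at or after this row: constant within
                   -- each run that ends with a B
                   sum(case when emp = 'B' then 1 else 0 end)
                       over (order by ord desc) as grp
            from tbl t
        ) t
        order by ord
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in running_counts():
        print(row)
```

For the sample data, grp is 3 for ord 1 and 2, 2 for ord 3 to 5, and 1 for ord 6 and 7, so each run ends with its B row and row_number() restarts at the row that follows.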

SQL Server group by first then ungroup?

I have a list of data that needs to be grouped, but we only want to group values whose count is at least 3; all other rows should be returned individually with a count of 1.
AA
AA
BB
CCC
CCC
CCC
return
AA 1
AA 1
BB 1
CCC 3
Thank you for your help
select data, case when total < 3 then 1 else total end total
from
(
select data, Count(Data) Total
from tbl
group by data
) g
join (select 1 union all select 2) a(b)
on a.b <= case when total < 3 then Total else 1 end
order by data
This should perform faster than LittleBobbyTables's answer most of the time.
Off the top of my head, you could get a count of everything with a count greater than 2, and then use UNION ALL to get any records not in the first query:
SELECT 'AA' AS Data
INTO #Temp
UNION ALL SELECT 'AA'
UNION ALL SELECT 'BB'
UNION ALL SELECT 'CCC'
UNION ALL SELECT 'CCC'
UNION ALL SELECT 'CCC'
SELECT Data, COUNT(Data) AS MyCount
FROM #Temp
GROUP BY Data
HAVING COUNT(Data) > 2
UNION ALL
SELECT Data, 1
FROM #Temp
WHERE Data NOT IN (
SELECT Data
FROM #Temp
GROUP BY Data
HAVING COUNT(Data) > 2
)
ORDER BY Data
DROP TABLE #Temp
Use the window functions for this:
select col, count(*) as cnt
from (select col, count(*) over (partition by col) as colcnt,
row_number() over (order by (select NULL)) as seqnum
from t
) t
group by col, (case when colcnt < 3 then seqnum else NULL end)
This calculates the total count over the column and a unique identifier for each row. The group by clause then tests for the condition. If less than 3, then it uses the identifier to get each row. If greater, it uses a constant value (NULL) in this case.
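The conditional grouping key can be tried on the sample values. The following is a sketch on SQLite (3.25 or later) through Python's sqlite3 module rather than SQL Server, so row_number() over () stands in for the order by (select NULL) form; the function name grouped_counts and its threshold parameter are assumptions made for the illustration.

```python
import sqlite3


def grouped_counts(threshold=3):
    """Collapse values that occur at least `threshold` times; keep the rest per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (col text)")
    conn.executemany(
        "insert into t values (?)",
        [(v,) for v in ["AA", "AA", "BB", "CCC", "CCC", "CCC"]],
    )
    rows = conn.execute(
        """
        select col, count(*) as cnt
        from (
            select col,
                   count(*) over (partition by col) as colcnt,
                   row_number() over () as seqnum
            from t
        ) s
        -- rare values get a unique key per row, frequent values share NULL
        group by col, case when colcnt < ? then seqnum else null end
        order by col
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in grouped_counts():
        print(row)
```

With the default threshold, AA and BB come back one row per occurrence with a count of 1, while the three CCC rows collapse into a single row with a count of 3.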

In Oracle, how do I get a page of distinct values from sorted results?

I have 2 columns in a one-to-many relationship. I want to sort on the "many" and return the first occurrence of the "one". I need to page through the data so, for example, I need to be able to get the 3rd group of 10 unique "one" values.
I have a query like this:
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id;
There can be multiple rows in table2 for each row in table1.
The results of my query look like this:
id | name
----------------
2 | apple
23 | banana
77 | cranberry
23 | dark chocolate
8 | egg
2 | yak
19 | zebra
I need to page through the result set with each page containing n unique ids. For example, if start=1 and n=4 I want to get back
2
23
77
8
in the order they were sorted on (i.e., name), where id is returned in the position of its first occurrence. Likewise if start=3 and n=4 and order = desc I want
8
23
77
2
I tried this:
SELECT * FROM (
SELECT id, ROWNUM rnum FROM (
SELECT DISTINCT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
which gave me the ids in numerical order, instead of being ordered as the names would be.
I also tried:
SELECT * FROM (
SELECT DISTINCT id, ROWNUM rnum FROM (
SELECT id FROM (
SELECT id, name
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
ORDER BY name, id)
WHERE ROWNUM <= 4)
WHERE rnum >=1)
but that gave me duplicate values.
How can I page through the results of this data? I just need the ids, nothing from the "many" table.
update
I suppose I'm getting closer with changing my inner query to
SELECT id, name, rank() over (order by name, id)
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
...but I'm still getting duplicate ids.
You may need to debug it a little, but it will be something like this:
SELECT id FROM (
SELECT * FROM (
SELECT id, name, rn FROM (
SELECT id, name, row_number() over (partition by id order by name) rn
FROM table1
INNER JOIN table2 ON table2.fkid = table1.id
)
) WHERE rn=1 ORDER BY name, id
) WHERE rownum>=1 and rownum<=4;
It's a bit convoluted (and I would tend to suspect that it could be simplified) but it should work. You can put whatever start and end position you want in the WHERE clause. Here start=2 and n=4 are pulled from a separate subquery (x), but you could simplify things by using a couple of parameters instead.
with t as (
select 2 id, 'apple' name from dual union all
select 23, 'banana' from dual union all
select 77, 'cranberry' from dual union all
select 23, 'dark chocolate' from dual union all
select 8, 'egg' from dual union all
select 2, 'yak' from dual union all
select 19, 'zebra' from dual
),
x as (
select 2 start_pos, 4 n from dual
)
select *
from (
select distinct
id,
dense_rank() over (order by min_id_rnk) outer_rnk
from (
select id,
min(rnk) over (partition by id) min_id_rnk
from (
select id,
name,
rank() over (order by name) rnk
from t
)
)
)
where outer_rnk between (select start_pos from x) and (select start_pos+n-1 from x)
order by outer_rnk;
Result:
ID OUTER_RNK
---------- ----------
23 2
77 3
8 4
19 5
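The paging logic discussed in this thread (order each id by the first name it appears under, then take a window of n ids) can also be sketched more compactly. This example is an illustration only: it runs on SQLite through Python's sqlite3 module, uses LIMIT/OFFSET rather than Oracle's ROWNUM, handles ascending order only, and treats start as a 1-based position in the list of unique ids, as the start_pos value does. The function name page_of_ids is hypothetical.

```python
import sqlite3


def page_of_ids(start, n):
    """Return n unique ids, ordered by first name, beginning at position `start`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, name text)")
    conn.executemany(
        "insert into t values (?, ?)",
        [
            (2, "apple"),
            (23, "banana"),
            (77, "cranberry"),
            (23, "dark chocolate"),
            (8, "egg"),
            (2, "yak"),
            (19, "zebra"),
        ],
    )
    rows = conn.execute(
        """
        select id
        from (
            -- the smallest name per id marks its first occurrence
            select id, min(name) as first_name
            from t
            group by id
        ) f
        order by first_name
        limit ? offset ?
        """,
        (n, start - 1),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(page_of_ids(1, 4))
    print(page_of_ids(2, 4))
```

Grouping by id before sorting removes the duplicates up front, so the paging step only ever sees one row per id.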