Email Duplicate Percentage calculation - sql

I have a table like below
email table_name
a#mail.com a1
a#mail.com b2
b#mail.com a1
c#mail.com c1
d#mail.com d1
e#mail.com e
g#mail.com e
g#mail.com e
e#mail.com f
g#mail.com g
So from here, how can I calculate the email duplicate percentage for each table?
table_name total_email duplicate_email duplicate_percentage
a1 2 1 50%
b2 1 1 100%
c1 1 0 0
d1 1 0 0
e 2 2 100%
f 1 1 100%
g 1 1 100%

Here's my try. Setup:
DECLARE @Test TABLE (
    email VARCHAR(100),
    table_name VARCHAR(100));

INSERT INTO @Test (email, table_name)
VALUES
    ('a#mail.com', 'a1'),
    ('a#mail.com', 'b2'),
    ('b#mail.com', 'a1'),
    ('c#mail.com', 'c1'),
    ('d#mail.com', 'd1'),
    ('e#mail.com', 'e'),
    ('g#mail.com', 'e'),
    ('g#mail.com', 'e'),
    ('e#mail.com', 'f'),
    ('g#mail.com', 'g');
Solution:
;WITH DupDetail AS
(
    SELECT
        T.email,
        T.table_name,
        IsDup = CASE WHEN COUNT(*) OVER (PARTITION BY T.email) > 1 THEN 1 ELSE 0 END
    FROM
        @Test AS T
),
DupStats AS
(
    SELECT
        T.table_name,
        total_email = COUNT(DISTINCT T.email),
        duplicate_email = COUNT(DISTINCT CASE WHEN T.IsDup = 1 THEN T.email END)
    FROM
        DupDetail AS T
    GROUP BY
        T.table_name
)
SELECT
    D.table_name,
    D.total_email,
    D.duplicate_email,
    duplicate_percentage = CONVERT(
        DECIMAL(5,2),
        D.duplicate_email * 100.0 / D.total_email)
FROM
    DupStats AS D
The IsDup column marks an email with 1 if it is repeated anywhere (in any table). duplicate_email is then a COUNT(DISTINCT ...) over those duplicated emails, grouped by each table name.
Result:
table_name total_email duplicate_email duplicate_percentage
a1 2 1 50.00
b2 1 1 100.00
c1 1 0 0.00
d1 1 0 0.00
e 2 2 100.00
f 1 1 100.00
g 1 1 100.00
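For anyone who wants to check this logic outside SQL Server, here is a minimal sketch of the same query in Python with SQLite (window functions require SQLite 3.25+). The table name Test and the data mirror the setup above; only T-SQL-specific syntax (table variable, CONVERT) is swapped for portable equivalents.

```python
import sqlite3

# Same sample data as the setup above, loaded into an in-memory SQLite DB.
rows = [
    ('a#mail.com', 'a1'), ('a#mail.com', 'b2'), ('b#mail.com', 'a1'),
    ('c#mail.com', 'c1'), ('d#mail.com', 'd1'), ('e#mail.com', 'e'),
    ('g#mail.com', 'e'),  ('g#mail.com', 'e'),  ('e#mail.com', 'f'),
    ('g#mail.com', 'g'),
]
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE Test (email TEXT, table_name TEXT)')
con.executemany('INSERT INTO Test VALUES (?, ?)', rows)

query = """
WITH DupDetail AS (
    SELECT email, table_name,
           CASE WHEN COUNT(*) OVER (PARTITION BY email) > 1
                THEN 1 ELSE 0 END AS IsDup
    FROM Test
)
SELECT table_name,
       COUNT(DISTINCT email) AS total_email,
       COUNT(DISTINCT CASE WHEN IsDup = 1 THEN email END) AS duplicate_email,
       ROUND(COUNT(DISTINCT CASE WHEN IsDup = 1 THEN email END) * 100.0
             / COUNT(DISTINCT email), 2) AS duplicate_percentage
FROM DupDetail
GROUP BY table_name
ORDER BY table_name
"""
results = list(con.execute(query))
for row in results:
    print(row)
```

Running this reproduces the result table above: a1 at 50%, b2/e/f/g at 100%, c1/d1 at 0%.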

You can use window functions, then aggregation:
select
    table_name,
    sum(cnt1) duplicate,
    sum(1.0 * cnt1) / cnt2 percentage
from (
    select
        t.*,
        count(*) over(partition by email) - 1 cnt1,
        count(*) over(partition by table_name) cnt2
    from mytable t
) t
group by table_name, cnt2
order by table_name
Demo on DB Fiddle:
table_name | duplicate | percentage
:--------- | --------: | :---------
a1 | 1 | 0.500000
b2 | 1 | 1.000000
c1 | 0 | 0.000000
d1 | 0 | 0.000000


SQL server pivot + sum + group by

I have data as follows and I need to group, sum, and pivot it:
AA  BB  date
a   1   01/01/2020
a   2   01/01/2020
b   5   01/01/2020
b   1   01/01/2020
c   5   01/01/2020
d   1   01/01/2020
d   8   02/01/2020
e   1   01/01/2020
What I obtain with my SQL code:
            a  b  c  d  e
01/01/2020  3  6  5  1  1
02/01/2020  /  /  /  8  /
What I need to obtain: a and d grouped as f, c and e grouped as g, and b kept separate:
            b  f  g
01/01/2020  6  4  6
02/01/2020  /  8  /
I have got the following SQL, but I can't seem to do the group summing. Do you do it before pivoting or after?
SELECT * FROM (
    SELECT AA, BB, Date
    FROM [dbo].[Data]) AS SourceTable
PIVOT(SUM([BB])
    FOR [AA] IN ([a],[b],[c],[d],[e])) AS PivotTable
If I try this, it doesn't work:
SELECT * FROM (
    SELECT AA, BB, Date
    FROM [dbo].[Data]) AS SourceTable
PIVOT(SUM([BB])
    FOR [AA] IN ([a]+[d],[b],[c]+[e])) AS PivotTable
Use conditional aggregation as follows:
select date,
       sum(case when aa in ('a','d') then BB end) as f,
       sum(case when aa in ('c','e') then BB end) as g,
       sum(case when aa = 'b' then BB end) as b
from table_name
group by date
I find that this is more simply done with conditional aggregation:
select
date,
sum(case when d.aa = 'b' then bb else 0 end) as b,
sum(case when d.aa in ('a', 'd') then bb else 0 end) as f,
sum(case when d.aa in ('c', 'e') then bb else 0 end) as g
from data d
group by date
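As a quick sanity check, here is the conditional-aggregation approach run against the sample data in SQLite via Python (a sketch: the table is named data here, and missing cells come out as 0 rather than /):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE data (AA TEXT, BB INT, date TEXT)')
con.executemany('INSERT INTO data VALUES (?, ?, ?)', [
    ('a', 1, '01/01/2020'), ('a', 2, '01/01/2020'),
    ('b', 5, '01/01/2020'), ('b', 1, '01/01/2020'),
    ('c', 5, '01/01/2020'), ('d', 1, '01/01/2020'),
    ('d', 8, '02/01/2020'), ('e', 1, '01/01/2020'),
])

# One output column per target group; each SUM only "sees" its own codes.
results = list(con.execute("""
    SELECT date,
           SUM(CASE WHEN AA = 'b'         THEN BB ELSE 0 END) AS b,
           SUM(CASE WHEN AA IN ('a', 'd') THEN BB ELSE 0 END) AS f,
           SUM(CASE WHEN AA IN ('c', 'e') THEN BB ELSE 0 END) AS g
    FROM data
    GROUP BY date
    ORDER BY date
"""))
for row in results:
    print(row)
```

This yields b=6, f=4, g=6 on 01/01/2020 and f=8 on 02/01/2020, matching the desired pivot.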

Get Count for Each Column values

Input
Create Table #t1 (CaseId Int, NewValue char(2),Attribute char(2),TimeStamp datetime)
insert into #t1 values
(1, 'A', 'X' , '2020-01-01 13:01'),
(1, 'Au', 'WB' , '2020-01-01 13:02'),
(1 , 'C' , 'P' , '2020-01-01 13:03'),
(1 , 'Ma', 'WB' , '2020-01-01 13:04'),
(1 , 'C' , 'D', '2020-01-01 13:05'),
(1, 'D' , 'E', '2020-01-01 13:04'),
(2 , 'M' , 'P' , '2020-05-01 15:20'),
(2 , 'X' , 'WB' , '2020-05-01 15:26'),
(2 , 'Y' , 'WB', '2020-05-01 15:29'),
(2 , 'X' , 'P' , '2020-05-01 15:31')
I need output like the following.
CaseId NewValue Attribute TimeStamp         NewColumn NewColumn Count
1      A        X         2020-01-01 13:01  NULL      NULL      0
1      Au       WB        2020-01-01 13:02  Au-WB     Au-WB     2
1      C        P         2020-01-01 13:03  Au-WB     Au-WB     2
1      Ma       WB        2020-01-01 13:04  Ma-WB     Ma-WB     3
1      C        D         2020-01-01 13:05  Ma-WB     Ma-WB     3
1      D        E         2020-01-01 13:04  Ma-WB     Ma-WB     3
2      M        P         2020-05-01 15:20  NULL      NULL      0
2      X        WB        2020-05-01 15:26  X -WB     X -WB     1
2      Y        WB        2020-05-01 15:29  Y -WB     Y -WB     2
2      X        P         2020-05-01 15:31  Y -WB     Y -WB     2
Squirrel helped to get everything minus count. The query is as follows. Does anyone know how to get that count?
select *, wb.NewColumn
from #t1 t
outer apply
(
select top 1 x.NewValue + '-' + x.Attribute as NewColumn
from #t1 x
where x.CaseId = t.CaseId
and x.TimeStamp <= t.TimeStamp
and x.Attribute = 'WB'
order by x.TimeStamp desc
) wb
This looks like a gaps-and-islands problem, where a new island starts every time a record with Attribute 'WB' is encountered.
If so, here is one way to solve it using window functions:
select
caseId,
newValue,
attribute,
timeStamp,
case when grp > 0
then first_value(newValue) over(partition by caseId, grp order by timeStamp)
+ '-'
+ first_value(attribute) over(partition by caseId, grp order by timeStamp)
end newValue,
case when grp > 0
then count(*) over(partition by caseId, grp)
else 0
end cnt
from (
select
t.*,
sum(case when attribute = 'WB' then 1 else 0 end)
over(partition by caseId order by timeStamp) grp
from #t1 t
) t
order by caseId, timeStamp
The inner query does a window sum() to define the groups: every time attribute 'WB' is met for a given caseId, a new group starts. The outer query then uses first_value() to recover the first value in each group, and performs a window count() to compute the number of records per group. This is wrapped in conditional logic so the additional columns are not filled before the first 'WB' attribute is met.
Demo on DB Fiddle:
caseId | newValue | attribute | timeStamp | newValue | cnt
-----: | :------- | :-------- | :---------------------- | :------- | --:
1 | A | X | 2020-01-01 13:01:00.000 | null | 0
1 | Au | WB | 2020-01-01 13:02:00.000 | Au-WB | 2
1 | C | P | 2020-01-01 13:03:00.000 | Au-WB | 2
1 | Ma | WB | 2020-01-01 13:04:00.000 | Ma-WB | 3
1 | D | E | 2020-01-01 13:04:00.000 | Ma-WB | 3
1 | C | D | 2020-01-01 13:05:00.000 | Ma-WB | 3
2 | M | P | 2020-05-01 15:20:00.000 | null | 0
2 | X | WB | 2020-05-01 15:26:00.000 | X -WB | 1
2 | Y | WB | 2020-05-01 15:29:00.000 | Y -WB | 2
2 | X | P | 2020-05-01 15:31:00.000 | Y -WB | 2
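The same gaps-and-islands idea can be sketched in Python with SQLite (3.25+). One detail is changed here: two rows in case 1 share the timestamp 13:04, so an explicit tie-break (WB rows first) is added to make the running sum and first_value() deterministic; otherwise the column names follow the question (the timestamp column is renamed ts).

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE t1 (CaseId INT, NewValue TEXT, Attribute TEXT, ts TEXT)')
con.executemany('INSERT INTO t1 VALUES (?, ?, ?, ?)', [
    (1, 'A',  'X',  '2020-01-01 13:01'), (1, 'Au', 'WB', '2020-01-01 13:02'),
    (1, 'C',  'P',  '2020-01-01 13:03'), (1, 'Ma', 'WB', '2020-01-01 13:04'),
    (1, 'C',  'D',  '2020-01-01 13:05'), (1, 'D',  'E',  '2020-01-01 13:04'),
    (2, 'M',  'P',  '2020-05-01 15:20'), (2, 'X',  'WB', '2020-05-01 15:26'),
    (2, 'Y',  'WB', '2020-05-01 15:29'), (2, 'X',  'P',  '2020-05-01 15:31'),
])

# Running count of 'WB' rows defines the island; first_value() recovers the
# 'WB' row that opened it, and COUNT(*) per island gives the group size.
results = list(con.execute("""
    SELECT CaseId, NewValue, Attribute, ts,
           CASE WHEN grp > 0
                THEN first_value(NewValue)  OVER (PARTITION BY CaseId, grp
                                                  ORDER BY ts, Attribute <> 'WB')
                     || '-' ||
                     first_value(Attribute) OVER (PARTITION BY CaseId, grp
                                                  ORDER BY ts, Attribute <> 'WB')
           END AS NewColumn,
           CASE WHEN grp > 0
                THEN COUNT(*) OVER (PARTITION BY CaseId, grp)
                ELSE 0
           END AS cnt
    FROM (
        SELECT t.*,
               SUM(Attribute = 'WB')
                   OVER (PARTITION BY CaseId
                         ORDER BY ts, Attribute <> 'WB') AS grp
        FROM t1 t
    )
    ORDER BY CaseId, ts, Attribute <> 'WB'
"""))
for row in results:
    print(row)
```

Since SQLite uses TEXT rather than padded char(2), the labels come out as 'X-WB' and 'Y-WB' without the trailing space seen in the SQL Server demo; the group assignments and counts are the same.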
Using your query's output, create a CTE and perform the count with a window function partitioned on caseid, newcolumn, as follows:
with data
as (
select *, wb.NewColumn
from #t1 t
outer apply
(
select top 1 x.NewValue + '-' + x.Attribute as NewColumn
from #t1 x
where x.CaseId = t.CaseId
and x.TimeStamp <= t.TimeStamp
and x.Attribute = 'WB'
order by x.TimeStamp desc
) wb
)
select *,count(*) over(partition by caseid,newcolumn) as cnt
from data

Find duplicates and remove it applying conditions SQL Server

I'm new to SQL Server, and I'm trying to remove duplicates from a table, but with some conditions; my question is how to apply these conditions in the query.
I need to remove duplicates from the Users table, eg:
Id  Code  Name  SysName
-----------------------
1   D1    N1
2   D1
3   D1    N1    N-1
4   E2    N2
5   E2    N2
6   E2    N2
7   X3
8   X3          N-3
9
10
11  Z4    W2    N-4-4
12  Z4    W2    N-44
In the above table: for code D1, I want to keep Id = 3, which has all columns filled (Code, Name, and SysName), and delete Id = 1 and Id = 2.
For code E2, I want to keep any one of them and remove the two duplicates.
For code X3, keep the one which has SysName = N-3.
For Id = 9 and Id = 10 (empty code, everything empty), remove both.
For code Z4, remove Id = 11 and keep the row with SysName N-44.
One last thing: I have an FK from another table, so I think I first need to get all the Ids from Users, delete those Ids from the dependent table, and finally delete from the Users table.
Do you have any idea how to achieve this? I'm not expecting a complete solution, just a code structure or similar examples/scenarios; any suggestion would be fine.
EDIT:
To summarize: I have the Users table:
Id  Code  Name  SysName
-----------------------
1   D1    N1
2   D1
3   D1    N1    N-1
4   E2    N2
5   E2    N2
6   E2    N2
7   X3
8   X3          N-3
9
10
11  Z4    W2    N-4-4
12  Z4    W2    N-44
And I want to keep only:
Id  Code  Name  SysName
-----------------------
3   D1    N1    N-1
4   E2    N2
8   X3          N-3
12  Z4    W2    N-44
Are you looking for something like
SELECT Code,
MAX(ISNULL(Name, '')) Name,
MAX(ISNULL(SysName, '')) SysName
FROM T
WHERE Code IS NOT NULL
GROUP BY Code;
Returns:
+------+------+---------+
| Code | Name | SysName |
+------+------+---------+
| D1 | N1 | N-1 |
| E2 | N2 | |
| X3 | | N-3 |
| Z4 | W2 | N-4-4 |
+------+------+---------+
Demo
The next query shows the list of Ids to remove, according to the following rules, in order of importance:
1. If the user has all fields empty/null, it will be deleted.
2. The user with more invalid fields is considered first for removal (for example, SysName cannot contain two '-' characters).
3. The user with more empty/null fields is considered first for removal.
;WITH
[Ids]
AS
(
SELECT
[U].[Id]
,[Importance] =
CASE
WHEN [X].[NumberOfFilledFields] = 0
THEN -1
ELSE ROW_NUMBER() OVER (PARTITION BY [U].[Code] ORDER BY [X].[NumberOfInvalidFields], [X].[NumberOfFilledFields] DESC)
END
FROM [Users] AS [U]
CROSS APPLY
(
SELECT
[NumberOfFilledFields] =
+ CASE WHEN NULLIF([U].[Code], '') IS NULL THEN 0 ELSE 1 END
+ CASE WHEN NULLIF([U].[Name], '') IS NULL THEN 0 ELSE 1 END
+ CASE WHEN NULLIF([U].[SysName], '') IS NULL THEN 0 ELSE 1 END
,[NumberOfInvalidFields] =
+ CASE WHEN [U].[SysName] LIKE '%-%-%' THEN 1 ELSE 0 END
) AS [X]
)
SELECT
[Id]
FROM [Ids]
WHERE (1 = 1)
AND ([Importance] = -1 OR [Importance] > 1);
This uses window functions and coalesce:
DECLARE #t TABLE ([Id] INT, [Code] CHAR(2), [Name] CHAR(2), [SysName] VARCHAR(10))
INSERT INTO #t values
(1 , 'D1', 'N1', Null ), (2 , 'D1', Null, Null ), (3 , 'D1', 'N1', 'N-1' ), (4 , 'E2', 'N2', Null ), (5 , 'E2', 'N2', Null ), (6 , 'E2', 'N2', Null )
, (7 , 'X3', Null, Null ), (8 , 'X3', Null, 'N-3' ) , (9 , Null, Null, Null ), (10, Null, Null, Null ), (11, 'Z4', 'W2', 'N-44'), (12, 'Z4', 'W2', 'N-44' )
;WITH t AS (
SELECT DISTINCT
[code]
, COALESCE([name], max([name]) OVER(PARTITION BY [code])) AS [Name]
, COALESCE([sysname], COALESCE(MAX([sysname]) OVER(PARTITION BY [code], [name]), MAX([sysname]) OVER(PARTITION BY [code]))) AS [SysName]
FROM #t
WHERE [code] IS NOT NULL)
SELECT MIN(t2.id), t.Code, t.Name, t.SysName
from #t t2
INNER JOIN t ON t.code = t2.code AND ISNULL(t.[Name], 'null') = ISNULL(t2.[Name], 'Null') AND ISNULL(t.[SysName], 'Null') = ISNULL(t2.[SysName], 'Null')
GROUP BY t.Code, t.Name, t.SysName
DEMO
(Any other answer: feel free to borrow the demos to test your answer or use it in yours! no need to duplicate effort!)
One could use an analytic/window function like row_number() to assign a rank to each record, then keep all the #1 rows except those where code is null; do this in a CTE and then just delete.
We determine what to keep by looking at the record having the most data and, in case of ties, using the earliest Id.
With cte as (
SELECT id, code, name, sysname,
row_number() over (partition by code order by (case when name is not null then 1 else 0 end + case when sysname is not null then 1 else 0 end) desc, ID) RN
FROM users)
Delete from cte where RN <> 1 or code is null;
Results in:
+----+----+------+------+---------+
| | ID | Code | Name | Sysname |
+----+----+------+------+---------+
| 1 | 3 | D1 | N1 | N-1 |
| 2 | 4 | E2 | N2 | NULL |
| 3 | 8 | X3 | NULL | N-3 |
| 4 | 11 | Z4 | W2 | N-4-4 |
+----+----+------+------+---------+
One could use the CTE to first delete the related FK records that would get purged, and then use the CTE again to delete the users.
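The keep-the-best-row ranking can be sketched in Python with SQLite. Since SQLite cannot DELETE through a CTE, this version selects the survivors instead, using the same ranking (most filled fields first, ties broken by lowest id):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE users (id INT, code TEXT, name TEXT, sysname TEXT)')
con.executemany('INSERT INTO users VALUES (?, ?, ?, ?)', [
    (1, 'D1', 'N1', None),  (2, 'D1', None, None), (3, 'D1', 'N1', 'N-1'),
    (4, 'E2', 'N2', None),  (5, 'E2', 'N2', None), (6, 'E2', 'N2', None),
    (7, 'X3', None, None),  (8, 'X3', None, 'N-3'),
    (9, None, None, None),  (10, None, None, None),
    (11, 'Z4', 'W2', 'N-4-4'), (12, 'Z4', 'W2', 'N-44'),
])

# Rank rows per code: most non-null fields first, ties broken by lowest id.
survivors = list(con.execute("""
    WITH ranked AS (
        SELECT id, code, name, sysname,
               ROW_NUMBER() OVER (
                   PARTITION BY code
                   ORDER BY (name IS NOT NULL) + (sysname IS NOT NULL) DESC, id
               ) AS rn
        FROM users
    )
    SELECT id, code, name, sysname
    FROM ranked
    WHERE rn = 1 AND code IS NOT NULL
    ORDER BY id
"""))
for row in survivors:
    print(row)
```

As in the answer above, the earliest-Id tie-break keeps Id 11 (N-4-4) for Z4 rather than the Id 12 the asker wanted; a different ORDER BY expression in the window would be needed to encode that preference.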
You need some knowledge of CASE expressions; then you can change the conditions accordingly. You can see the sample code below; just tweak the CASE in the WHERE clause to match your requirements.
;with C as
(
    select Dense_rank() over(partition by code order by id) as rn, *
    from Users
)
delete from C
where rn =
    (case
        when (code = 'D1' and name is not null and sysname != '') then 0
        when (code = 'E2' and rn = 1) then 0
        when (code = 'X3' and sysname != '') then 0
        when (code = 'Z4' and name is not null and sysname != '') then 0
        else rn
    end)
Output:
3  D1  N1  N-1
4  E2  N2
8  X3      N-3
11 Z4  W2  N-4-4
12 Z4  W2  N-44

display 3 or more consecutive rows(Sql)

I have a table with below data
+------+------------+-----------+
| id | date1 | people |
+------+------------+-----------+
| 1 | 2017-01-01 | 10 |
| 2 | 2017-01-02 | 109 |
| 3 | 2017-01-03 | 150 |
| 4 | 2017-01-04 | 99 |
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-08 | 188 |
+------+------------+-----------+
Now what I am trying to do is to display runs of 3 or more consecutive rows where people is >= 100, like this:
+------+------------+-----------+
| id | date1 | people |
+------+------------+-----------+
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-08 | 188 |
+------+------------+-----------+
Can anyone help me with this query using an Oracle database? I am able to display the rows which are above 100, but not the consecutive ones.
Table creation (reducing typing time for people who will be helping):
CREATE TABLE stadium
( id int
, date1 date
, people int
);
Insert into stadium values (1,TO_DATE('2017-01-01','YYYY-MM-DD'),10);
Insert into stadium values (2,TO_DATE('2017-01-02','YYYY-MM-DD'),109);
Insert into stadium values (3,TO_DATE('2017-01-03','YYYY-MM-DD'),150);
Insert into stadium values (4,TO_DATE('2017-01-04','YYYY-MM-DD'),99);
Insert into stadium values (5,TO_DATE('2017-01-05','YYYY-MM-DD'),145);
Insert into stadium values (6,TO_DATE('2017-01-06','YYYY-MM-DD'),1455);
Insert into stadium values (7,TO_DATE('2017-01-07','YYYY-MM-DD'),199);
Insert into stadium values (8,TO_DATE('2017-01-08','YYYY-MM-DD'),188);
Thanks in advance for the help
Assuming you mean >= 100, there are a couple of ways. One method just uses lead() and lag(). But a simple method defines each group by the number of values < 100 before it; after filtering out those sub-100 rows, count(*) gives the size of each consecutive run:
select s.*
from (select s.*, count(*) over (partition by grp) as num100pl
      from (select s.*,
                   sum(case when people < 100 then 1 else 0 end) over (order by date1) as grp
            from stadium s
           ) s
      where people >= 100
     ) s
where num100pl >= 3;
Here is a SQL Fiddle showing that the syntax works.
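Here is a sketch of that grouping trick in Python with SQLite (window functions need 3.25+). The breaker rows are filtered out with people >= 100 before the per-group count, so only the qualifying rows are counted per group:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE stadium (id INT, date1 TEXT, people INT)')
con.executemany('INSERT INTO stadium VALUES (?, ?, ?)', [
    (1, '2017-01-01', 10),   (2, '2017-01-02', 109),
    (3, '2017-01-03', 150),  (4, '2017-01-04', 99),
    (5, '2017-01-05', 145),  (6, '2017-01-06', 1455),
    (7, '2017-01-07', 199),  (8, '2017-01-08', 188),
])

# grp = running count of sub-100 rows: it stays constant inside each
# run of >= 100 rows, so it identifies the consecutive groups.
results = list(con.execute("""
    SELECT id, date1, people
    FROM (
        SELECT s.*, COUNT(*) OVER (PARTITION BY grp) AS num100pl
        FROM (
            SELECT s.*,
                   SUM(people < 100) OVER (ORDER BY date1) AS grp
            FROM stadium s
        ) s
        WHERE people >= 100
    )
    WHERE num100pl >= 3
    ORDER BY id
"""))
for row in results:
    print(row)
```

Rows 2 and 3 form a run of only two, so they are excluded; rows 5 through 8 form a run of four and are returned, matching the expected output.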
You can use the following sql script to get the desired output.
WITH partitioned AS (
SELECT *, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
FROM stadium
WHERE people >= 100
),
counted AS (
SELECT *, COUNT(*) OVER (PARTITION BY grp) AS cnt
FROM partitioned
)
select id, date1, people
from counted
where cnt >= 3
I'm assuming that both the id and date columns are sequential and correspond to each other (there will need to be additional ROW_NUMBER() if the ids are not sequential with the dates, and more complex logic included if the dates are not necessarily sequential).
SELECT
    *
FROM
(
    SELECT
        *
        ,COUNT(date1) OVER (PARTITION BY sequential_group_num) AS num_days_in_sequence
    FROM
    (
        SELECT
            *
            ,(id - ROW_NUMBER() OVER (ORDER BY date1)) AS sequential_group_num
        FROM
            stadium
        WHERE
            people >= 100
    ) AS subquery1
) AS subquery2
WHERE
    num_days_in_sequence >= 3
That produces the following output:
id          date1      people      sequential_group_num num_days_in_sequence
----------- ---------- ----------- -------------------- --------------------
5           2017-01-05 145         2                    4
6           2017-01-06 1455        2                    4
7           2017-01-07 199         2                    4
8           2017-01-08 188         2                    4
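The id - ROW_NUMBER() trick can likewise be checked in Python with SQLite: after filtering to people >= 100, the difference is constant within each run of consecutive ids, so it serves as a group key (this sketch assumes sequential ids, as the answer above notes):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE stadium (id INT, date1 TEXT, people INT)')
con.executemany('INSERT INTO stadium VALUES (?, ?, ?)', [
    (1, '2017-01-01', 10),   (2, '2017-01-02', 109),
    (3, '2017-01-03', 150),  (4, '2017-01-04', 99),
    (5, '2017-01-05', 145),  (6, '2017-01-06', 1455),
    (7, '2017-01-07', 199),  (8, '2017-01-08', 188),
])

# Consecutive surviving ids share the same id - row_number value;
# keep only the groups with 3 or more members.
results = list(con.execute("""
    WITH runs AS (
        SELECT id, date1, people,
               id - ROW_NUMBER() OVER (ORDER BY id) AS grp
        FROM stadium
        WHERE people >= 100
    )
    SELECT id, date1, people
    FROM runs
    WHERE grp IN (SELECT grp FROM runs GROUP BY grp HAVING COUNT(*) >= 3)
    ORDER BY id
"""))
for row in results:
    print(row)
```

Ids 2 and 3 get grp = 1 (a run of two) and are dropped; ids 5 through 8 all get grp = 2 and are kept.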
Using correlated subqueries against neighboring ids, we can display the consecutive rows like this:
SELECT id, date1, people FROM stadium a WHERE people >= 100
AND (SELECT people FROM stadium b WHERE b.id = a.id + 1) >= 100
AND (SELECT people FROM stadium c WHERE c.id = a.id + 2) >= 100
OR people >= 100
AND (SELECT people FROM stadium e WHERE e.id = a.id - 1) >= 100
AND (SELECT people FROM stadium f WHERE f.id = a.id + 1) >= 100
OR people >= 100
AND (SELECT people FROM stadium g WHERE g.id = a.id - 1) >= 100
AND (SELECT people FROM stadium h WHERE h.id = a.id - 2) >= 100
order by id;
select distinct
t1.*
from
stadium t1
cross join
stadium t2
cross join
stadium t3
where
t1.people >= 100
and t2.people >= 100
and t3.people >= 100
and
(
(t1.id + 1 = t2.id
and t2.id + 1 = t3.id)
or
(
t2.id + 1 = t1.id
and t1.id + 1 = t3.id
)
or
(
t2.id + 1 = t3.id
and t3.id + 1 = t1.id
)
)
order by
id;
SQL script:
SELECT DISTINCT SS.*
FROM STADIUM SS
INNER JOIN
(SELECT S1.ID
FROM STADIUM S1
WHERE 3 = (
SELECT COUNT(1)
FROM STADIUM S2
WHERE (S2.ID=S1.ID OR S2.ID=S1.ID+1 OR S2.ID=S1.ID+2)
AND S2.PEOPLE >= 100
)) AS SS2
ON SS.ID>=SS2.ID AND SS.ID<SS2.ID+3
select *
from (
    select *, count(*) over (partition by grp) as total
    from (
        select *, sum(case when people < 100 then 1 else 0 end) over (order by date1) as grp
        from stadium
    ) T  -- inner query 1
    where people >= 100
) S  -- inner query 2
where total >= 3  -- outer query
I wrote the following solution for this similar leetcode problem:
with groupVisitsOver100 as (
select *,
sum(
case
when people < 100 then 1
else 0
end
) over (order by date1) as visitGroups
from stadium
),
filterUnder100 as (
select
*
from groupVisitsOver100
where people >= 100
),
countGroupsSize as (
select
*,
count(*) over (partition by visitGroups) as groupsSize
from filterUnder100
)
select id, date1, people from countGroupsSize where groupsSize >= 3 order by date1

join 2 tables in oracle sql

Here is the configuration I am starting with:
DROP TABLE ruleset1;
CREATE TABLE ruleset1 (id int not null unique,score_rule1 float default 0.0,score_rule2 float default 0.0,score_rule3 float default 0.0);
DROP TABLE ruleset2;
CREATE TABLE ruleset2 (id int not null unique,score_rule1 float default 0.0,score_rule2 float default 0.0,score_rule3 float default 0.0);
insert into ruleset1 (id, score_rule1, score_rule2, score_rule3) values (0,0.8,0,0);
insert into ruleset1 (id, score_rule1, score_rule2, score_rule3) values (1,0,0.1,0);
insert into ruleset2 (id, score_rule1, score_rule2, score_rule3) values (0,0,0,0.3);
insert into ruleset2 (id, score_rule1, score_rule2, score_rule3) values (2,0,0.2,0);
What I have now is these 2 tables:
ruleset1:
| ID | SCORE_RULE1 | SCORE_RULE2 | SCORE_RULE3
================================================
| 0 | 0.8 | 0 | 0
| 1 | 0 | 0.1 | 0
and ruleset2:
| ID | SCORE_RULE1 | SCORE_RULE2 | SCORE_RULE3
================================================
| 0 | 0 | 0 | 0.3
| 2 | 0 | 0.2 | 0
and I want to outer join them and calculate the mean of the non-zero columns, like this:
| ID | Average
================
| 0 | 0.55
| 1 | 0.1
| 2 | 0.2
My current query is:
select * from ruleset1 full outer join ruleset2 on ruleset1.id = ruleset2.id;
which gives an ugly result:
| ID | SCORE_RULE1 | SCORE_RULE2 | SCORE_RULE3 | ID | SCORE_RULE1 | SCORE_RULE2 | SCORE_RULE3
============================================================================================
| 0 | .8 | 0 | 0 | 0 | 0 | 0 | .3
| - | - | - | - | 2 | 0 | .2 | 0
| 1 | 0 | .1 | 0 | - | - | - | -
Can anyone help with a better query please?
Thank you very much!
Of course avg ignores only NULLs, not zeroes, so NULLIF(column, 0) could be used.
But since you have denormalized data, you can simply normalize it on the fly:
select id, avg(score)
from
(
select id, score_rule1 score
from ruleset1 where score_rule1 <> 0
union all
select id, score_rule2 from ruleset1 where score_rule2 <> 0
union all
select id, score_rule3 from ruleset1 where score_rule3 <> 0
union all
select id, score_rule1 from ruleset2 where score_rule1 <> 0
union all
select id, score_rule2 from ruleset2 where score_rule2 <> 0
union all
select id, score_rule3 from ruleset2 where score_rule3 <> 0
) dt
group by id;
To avoid five Unions you could use only one and do some additional logic:
select id, sum(score) / sum(score_count)
from
(
select id, score_rule1 + score_rule2 + score_rule3 score,
case when score_rule1 = 0 then 0 else 1 end +
case when score_rule2 = 0 then 0 else 1 end +
case when score_rule3 = 0 then 0 else 1 end score_count
from ruleset1
union all
select id, score_rule1 + score_rule2 + score_rule3 score,
case when score_rule1 = 0 then 0 else 1 end +
case when score_rule2 = 0 then 0 else 1 end +
case when score_rule3 = 0 then 0 else 1 end score_count
from ruleset2
) dt
group by id;
This assumes there are no NULLs in the score_rule columns.
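Both variants can be double-checked with a quick sketch in Python with SQLite; this uses the normalize-then-filter shape of the first query, where the score <> 0 filter plays the role of the per-column CASE counting:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
    CREATE TABLE ruleset1 (id INT, score_rule1 REAL, score_rule2 REAL, score_rule3 REAL);
    CREATE TABLE ruleset2 (id INT, score_rule1 REAL, score_rule2 REAL, score_rule3 REAL);
    INSERT INTO ruleset1 VALUES (0, 0.8, 0, 0), (1, 0, 0.1, 0);
    INSERT INTO ruleset2 VALUES (0, 0, 0, 0.3), (2, 0, 0.2, 0);
""")

# Unpivot both tables into (id, score) rows, drop the zeros, then average.
results = list(con.execute("""
    SELECT id, AVG(score) AS average
    FROM (
        SELECT id, score_rule1 AS score FROM ruleset1
        UNION ALL SELECT id, score_rule2 FROM ruleset1
        UNION ALL SELECT id, score_rule3 FROM ruleset1
        UNION ALL SELECT id, score_rule1 FROM ruleset2
        UNION ALL SELECT id, score_rule2 FROM ruleset2
        UNION ALL SELECT id, score_rule3 FROM ruleset2
    )
    WHERE score <> 0
    GROUP BY id
    ORDER BY id
"""))
for row in results:
    print(row)
```

Up to floating-point rounding, this gives 0.55 for id 0, 0.1 for id 1, and 0.2 for id 2, matching the desired output.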
Here's an example with PostgreSQL that you could adapt to Oracle (sorry, SQLFiddle's Oracle isn't cooperating). Thanks to Juan Carlos Oropeza's suggestion, the code below runs well on Oracle too: http://rextester.com/DVP59353
select
r.id,
sum(coalesce(r1.score_rule1,0) +
coalesce(r1.score_rule2,0) +
coalesce(r1.score_rule3,0) +
coalesce(r2.score_rule1,0) +
coalesce(r2.score_rule2,0) +
coalesce(r2.score_rule3,0)
)
/
sum(case when coalesce(r1.score_rule1,0) <> 0 then 1 else 0 end +
case when coalesce(r1.score_rule2,0) <> 0 then 1 else 0 end +
case when coalesce(r1.score_rule3,0) <> 0 then 1 else 0 end +
case when coalesce(r2.score_rule1,0) <> 0 then 1 else 0 end +
case when coalesce(r2.score_rule2,0) <> 0 then 1 else 0 end +
case when coalesce(r2.score_rule3,0) <> 0 then 1 else 0 end) as Average
from
(select id from ruleset1
union
select id from ruleset2) r
left join ruleset1 r1 on r.id = r1.id
left join ruleset2 r2 on r.id = r2.id
group by r.id
SQLFiddle with PostgreSQL version is here: http://sqlfiddle.com/#!15/24e3f/1.
This example combines id from both tables using a union. Doing so allows the same ID in both ruleset1 and ruleset2 to appear just once in the result. r is an alias given to this generated table.
All the ids are then left joined with both tables. During the summation process, it is possible that the NULL values resulting from left join may impact the result. So the NULLs are coalesced to zero in the math.
dnoeth's is the easy and clean answer.
Here I was just playing with COALESCE and NVL2:
select COALESCE(r.ID, s.ID),
COALESCE(r.score_rule1, 0) +
COALESCE(r.score_rule2, 0) +
COALESCE(r.score_rule3, 0) +
COALESCE(s.score_rule1, 0) +
COALESCE(s.score_rule2, 0) +
COALESCE(s.score_rule3, 0) as sum,
NVL2(r.score_rule1, 0, 1) +
NVL2(r.score_rule2, 0, 1) +
NVL2(r.score_rule3, 0, 1) +
NVL2(s.score_rule1, 0, 1) +
NVL2(s.score_rule2, 0, 1) +
NVL2(s.score_rule3, 0, 1) as tot
from ruleset1 r
full outer join ruleset2 s
on r.id = s.id;
Then your avg is sum/tot
union all your two tables, unpivot, change the zeros into null with nullif, and use standard avg() aggregate function:
select id, avg(nullif(value, 0)) as avg_value from (
select * from ruleset1
union all
select * from ruleset2
)
unpivot ( value for column_name in (score_rule1, score_rule2, score_rule3))
group by id
order by id
;
ID AVG_VALUE
---------- ----------
0 .55
1 .1
2 .2
SELECT s.id, AVG(s.score)
FROM(
SELECT id,score_rule1+score_rule2+score_rule3 as score
FROM ruleset2
UNION ALL
SELECT id,(score_rule1+score_rule2+score_rule3) as score
FROM ruleset1) s
group by s.id