Is it possible to filter by one measure group, take the distinct members (people) from it, and use them as a filter for a query against a different measure group?
Equivalent SQL query:
SELECT
sum(spend)
FROM
(SELECT distinct person_id FROM enrollment_fact WHERE program = 'blah') AS a
JOIN
(SELECT person_id, sum(spend) AS spend FROM sales_fact GROUP BY person_id) AS b
ON a.person_id = b.person_id
Assume I have two different measure groups in SSAS, one for spend and one for enrollment, that are unrelated except for time and person_id.
Also assume there are too many programs (thousands) to create a column/attribute per program on each person to act as a member filter.
Is that the same as:
SELECT sum(spend) AS spend
FROM sales_fact
WHERE person_id IN
(
SELECT person_id
FROM enrollment_fact
WHERE program = 'blah'
GROUP BY person_id
)
If both measure groups are in the same cube and the person_id dimension is shared, then this looks possible in MDX.
The sub-query can be done with the NonEmpty function:
NONEMPTY(
[person].[person_id].[person_id].MEMBERS
, (
[program].[program].[blah]
, [measures].[someEnrollmentMeasure]
)
)
Adding this to an MDX WHERE clause slices the cube by those person_ids, so the spend measure is totalled only for people enrolled in the program:
SELECT
[measures].[someSpendMeasure] ON 0
FROM [yourCube]
WHERE
NONEMPTY(
[person].[person_id].[person_id].MEMBERS
,(
[program].[program].[blah]
,[measures].[someEnrollmentMeasure]
)
);
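As a quick sanity check of the two SQL formulations in the question, here is a minimal sketch that runs both against a tiny in-memory SQLite database through Python's sqlite3 module. SQLite and the sample rows are assumptions made for illustration; the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrollment_fact (person_id INTEGER, program TEXT);
CREATE TABLE sales_fact (person_id INTEGER, spend REAL);
-- made-up rows: person 1 is enrolled in 'blah' twice, person 3 only elsewhere
INSERT INTO enrollment_fact VALUES (1,'blah'),(1,'blah'),(2,'blah'),(3,'other');
INSERT INTO sales_fact VALUES (1,10),(1,5),(2,20),(3,100);
""")

# Formulation 1: join the distinct enrolled people to per-person spend.
join_sql = """
SELECT sum(b.spend)
FROM (SELECT DISTINCT person_id FROM enrollment_fact WHERE program = 'blah') AS a
JOIN (SELECT person_id, sum(spend) AS spend FROM sales_fact GROUP BY person_id) AS b
  ON a.person_id = b.person_id
"""

# Formulation 2: filter sales with an IN sub-query.
in_sql = """
SELECT sum(spend)
FROM sales_fact
WHERE person_id IN (SELECT person_id FROM enrollment_fact WHERE program = 'blah')
"""

join_total = conn.execute(join_sql).fetchone()[0]
in_total = conn.execute(in_sql).fetchone()[0]
print(join_total, in_total)
```

Both totals agree here because IN, like the DISTINCT in the first form, ignores duplicate enrollment rows for the same person.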
I am trying to join two CTEs to get the difference between planned and actual performance across countries, grouped by campaign ID. Here is my example.
Every campaign can run in different countries, so how can I group at the end to get one row per campaign ID?
CTE 1: (planned)
select
country
, campaign_id
, sum(sales) as planned_sales
from table x
group by 1,2
CTE 2: (Actual)
select
country
, campaign_id
, sum(sales) as actual_sales
from table y
group by 1,2
Outer select:
select
cte1.country,
planned_sales,
actual_sales,
planned_sales - actual_sales as diff
from cte1
join cte2
on cte1.campaign_id = cte2.campaign_id
This should do it:
select
cte1.campaign_id,
sum(cte1.planned_sales) as planned_sales,
sum(cte2.actual_sales) as actual_sales,
sum(cte1.planned_sales) - sum(cte2.actual_sales) as diff
from cte1
join cte2
on cte1.campaign_id = cte2.campaign_id and cte1.country = cte2.country
group by 1
I would suggest using a full join, so that rows present in only one of the two tables are still included. Your query is basically correct, but it needs a group by.
select campaign_id,
sum(cte1.planned_sales) as planned_sales,
sum(cte2.actual_sales) as actual_sales,
(coalesce(sum(cte1.planned_sales), 0) -
coalesce(sum(cte2.actual_sales), 0)
) as diff
from cte1 full join
cte2
using (campaign_id, country)
group by campaign_id;
That said, there is no reason why the CTEs should aggregate by both campaign and country. They could just aggregate by campaign id -- simplifying the query and improving performance.
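A minimal sketch of that simplification, with each CTE aggregated by campaign only. It runs on SQLite through Python's sqlite3 module, which is an assumption, as are the table names `planned` and `actual` (stand-ins for the question's "table x" and "table y") and the sample rows. Because older SQLite versions lack FULL JOIN, the sketch swaps in a UNION ALL followed by a GROUP BY, which keeps campaigns that appear on only one side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE planned (country TEXT, campaign_id INTEGER, sales REAL);
CREATE TABLE actual  (country TEXT, campaign_id INTEGER, sales REAL);
-- made-up rows: campaign 2 has no actuals, campaign 3 has no plan
INSERT INTO planned VALUES ('DE',1,100),('FR',1,50),('DE',2,30);
INSERT INTO actual  VALUES ('DE',1,80),('US',3,40);
""")

sql = """
WITH cte1 AS (SELECT campaign_id, sum(sales) AS planned_sales
              FROM planned GROUP BY campaign_id),
     cte2 AS (SELECT campaign_id, sum(sales) AS actual_sales
              FROM actual GROUP BY campaign_id),
     combined AS (              -- UNION ALL stands in for the full join
         SELECT campaign_id, planned_sales, 0 AS actual_sales FROM cte1
         UNION ALL
         SELECT campaign_id, 0, actual_sales FROM cte2
     )
SELECT campaign_id,
       sum(planned_sales) AS planned_sales,
       sum(actual_sales)  AS actual_sales,
       sum(planned_sales) - sum(actual_sales) AS diff
FROM combined
GROUP BY campaign_id
ORDER BY campaign_id
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)  # one row per campaign_id
```

On an engine that supports FULL JOIN, the `combined` step can be replaced by `cte1 full join cte2 using (campaign_id)` with the same coalesce handling as in the query above.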
Group by and Pivot operations give different counts.
I used Group by to get count of vehicles by City and used Pivot to get count of vehicles by Make.
SELECT MAKE, [AMB],[BNG],[CBE],[GBM],[KKE],[OMR],[PDR]
FROM
(
SELECT MAKE, BRANCH, COUNT(DISTINCT [VEH NO]) [VEHICLE COUNT]
FROM MAKE_MODEL_DESCRIPTION
GROUP BY MAKE, BRANCH
) X
PIVOT
(
SUM([VEHICLE COUNT]) FOR BRANCH IN ([AMB],[BNG],[CBE],[GBM],[KKE],
[OMR],[PDR])
) AS PVT
The total count I get for above Pivot query is 150.
select BRANCH, COUNT(distinct [VEH NO])
from MAKE_MODEL_DESCRIPTION
group by BRANCH
The total count I get for above GROUP BY query is 140.
Shouldn't both give the same number, given that they come from the same data source?
Can someone let me know where I am going wrong?
No, you should not expect the counts to be the same. The GROUP BY query counts distinct vehicles per branch, across all makes. The PIVOT query counts distinct vehicles within each combination of make and branch, then sums them.
In other words, a vehicle that appears under more than one make in the same branch is counted once by the GROUP BY but once per make by the PIVOT.
If you include the make in the GROUP BY, then the numbers should be the same:
select MAKE, BRANCH, COUNT(distinct [VEH NO])
from MAKE_MODEL_DESCRIPTION
group by MAKE, BRANCH
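The effect can be reproduced with a few rows. The sketch below is an illustration only: it uses SQLite through Python's sqlite3 module (an assumption, since the question's PIVOT syntax suggests SQL Server) and made-up data in which one vehicle number is listed under two makes in the same branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MAKE_MODEL_DESCRIPTION (MAKE TEXT, BRANCH TEXT, [VEH NO] TEXT);
INSERT INTO MAKE_MODEL_DESCRIPTION VALUES
  ('FORD', 'AMB', 'V1'),
  ('TATA', 'AMB', 'V1'),   -- same vehicle number under a second make
  ('FORD', 'AMB', 'V2'),
  ('TATA', 'BNG', 'V3');
""")

# Total of distinct vehicles counted per branch (the GROUP BY BRANCH query).
total_by_branch = conn.execute("""
SELECT sum(c) FROM (
    SELECT COUNT(DISTINCT [VEH NO]) AS c
    FROM MAKE_MODEL_DESCRIPTION
    GROUP BY BRANCH)
""").fetchone()[0]

# Total of distinct vehicles counted per make and branch (what the pivot sums).
total_by_make_branch = conn.execute("""
SELECT sum(c) FROM (
    SELECT COUNT(DISTINCT [VEH NO]) AS c
    FROM MAKE_MODEL_DESCRIPTION
    GROUP BY MAKE, BRANCH)
""").fetchone()[0]

print(total_by_branch, total_by_make_branch)
```

Vehicle V1 is counted once in the per-branch total but twice in the per-make-and-branch total, which is the same kind of gap as 140 versus 150.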
I'm trying to query each distinct medical speciality (e.g. oncologist, pediatrician, etc.) in a table and then count the number of times a claim (claim_id) is linked to it, which I've done using this:
select distinct specialization, count(distinct claim_id) AS Claim_Totals
from table1
group by specialization
order by Claim_Totals DESC
However, I also want to include an additional column which lists the % that each speciality makes up in the table (based on the number of claim_id related to it). So for instance, if there were 100 total claims and "cardiologist" had 25 claim_id records related to it, "oncologist" had 15, "general surgeon" had 10, and so forth, I want the output to look like this:
specialization | Claims_Totals | PERCENTAGE
___________________________________________
cardiologist 25 25%
oncologist 15 15%
general surgeon 10 10%
You could do this. I'm not familiar with the syntax in Barbaros's answer; if that works, it's more concise and better.
select specialization, count(distinct claim_id) AS Claim_Totals, count(distinct claim_id)/total_claims AS percentage_calc
from table1
INNER JOIN ( SELECT COUNT(DISTINCT claim_id)*1.0000 AS total_claims
FROM table1 ) TMP
ON 1 = 1
group by specialization, total_claims
order by Claim_Totals DESC
Or, with a scalar subquery in the select list:
select specialization,
count(distinct claim_id) AS claim_by_spec,
count(distinct claim_id)/
( SELECT COUNT(DISTINCT claim_id)*1.0000
FROM table1 ) AS percentage_calc
from table1
group by specialization
order by claim_by_spec DESC
You can use sum(count(distinct claim_id)) over() to get the overall number of claims and use it as the denominator for the percentage. Multiplying by 100.0 rather than 100 avoids integer division on engines that truncate.
select specialization
,count(distinct claim_id) AS Claim_Totals
,round(100.0*count(distinct claim_id)/sum(count(distinct claim_id)) over(),3) as percentage
from table1
group by specialization
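To show the value with a percent sign, you can append
,concat_ws('',count(distinct claim_id),'%') as percentage
or
,concat(count(distinct claim_id),'%') as percentage
to the end of the select list. Note that this formats the raw count, which equals the percentage only when the total is exactly 100, as in the sample data.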
By the way, the distinct before specialization in the select list is redundant, since specialization is already in the group by list.
Because you are using count(distinct), window functions are less useful. You can try:
select t1.specialization,
count(distinct t1.claim_id) AS Claim_Totals,
count(distinct t1.claim_id) * 1.0 / tt1.num_claims AS ratio
from table1 t1 cross join
(select count(distinct claim_id) as num_claims
from table1
) tt1
group by t1.specialization, tt1.num_claims
order by Claim_Totals DESC
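To see the percentage calculation end to end, here is a small sketch using a scalar subquery as the denominator. It runs on SQLite through Python's sqlite3 module (an assumption; the question does not name an engine), and the four sample claims are made up. The percent sign is added in Python rather than in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (specialization TEXT, claim_id INTEGER);
-- made-up rows: claim 1 appears twice and must be counted once
INSERT INTO table1 VALUES
  ('cardiologist', 1), ('cardiologist', 1), ('cardiologist', 2),
  ('oncologist', 3),
  ('general surgeon', 4);
""")

sql = """
SELECT specialization,
       COUNT(DISTINCT claim_id) AS Claim_Totals,
       -- 100.0 forces floating-point division
       100.0 * COUNT(DISTINCT claim_id)
             / (SELECT COUNT(DISTINCT claim_id) FROM table1) AS percentage
FROM table1
GROUP BY specialization
ORDER BY Claim_Totals DESC, specialization
"""

rows = conn.execute(sql).fetchall()
for spec, total, pct in rows:
    print(f"{spec:<16} {total:>3} {pct:.0f}%")
```

Keeping the percentage numeric in SQL and formatting it at display time avoids sorting or aggregating on a string column later.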
Using this query I can find the Company Assignee number for company with most patents but I can't seem to print the company name.
SELECT count(*), patent.assignee
FROM Patent
GROUP BY patent.assignee
HAVING count(*) =
(SELECT max(count(*))
FROM Patent
Group by patent.assignee);
COUNT(*) --- ASSIGNEE
9 19715
9 27895
Nesting above query into
SELECT company.compname
FROM company
WHERE ( company.assignee = ( *above query* ) );
would give an error "too many values" since there are two companies with most patents but above query takes only one assignee number in the WHERE clause. How do I solve this problem? I need to print name of BOTH companies with assignee number 19715 and 27895. Thank you.
You have started down the path of using nested queries. All you need to do is remove COUNT(*) from the select list of your query and compare with IN instead of =:
SELECT company.compname
FROM company
WHERE company.assignee IN
(SELECT patent.assignee
FROM Patent
GROUP BY patent.assignee
HAVING count(*) = (SELECT max(count(*))
FROM Patent
GROUP BY patent.assignee
)
);
I wouldn't write the query this way. The use of max(count(*)) is particularly jarring, but it is valid Oracle syntax.
Applying an aggregate function on another aggregate function (like max(count(*))) is illegal in many databases but I believe using the ALL operator instead and a join to get the company name would solve your problem.
Try this:
SELECT COUNT(*), p.assignee, c.compname
FROM Patent p
JOIN Company c ON c.assignee = p.assignee
GROUP BY p.assignee, c.compname
HAVING COUNT(*) >= ALL -- this predicate will return those rows
( -- for which the comparison holds true
SELECT COUNT(*) -- for all instances.
FROM Patent -- it can only be true for the highest count
GROUP BY assignee
);
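For engines that accept neither max(count(*)) nor the ALL operator, the same idea can be written with a derived table that holds the per-assignee counts. The sketch below is one such variant, not the approach of either answer above. It runs on SQLite through Python's sqlite3 module (an assumption), and the company names and patent rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (assignee INTEGER, compname TEXT);
CREATE TABLE patent (patent_id INTEGER, assignee INTEGER);
-- hypothetical companies; two of them tie for the most patents
INSERT INTO company VALUES (19715, 'Acme'), (27895, 'Globex'), (30000, 'Initech');
INSERT INTO patent VALUES (1, 19715), (2, 19715), (3, 27895), (4, 27895), (5, 30000);
""")

sql = """
SELECT c.compname, COUNT(*) AS num_patents
FROM patent p
JOIN company c ON c.assignee = p.assignee
GROUP BY p.assignee, c.compname
HAVING COUNT(*) = (SELECT MAX(cnt)            -- highest per-assignee count
                   FROM (SELECT COUNT(*) AS cnt
                         FROM patent
                         GROUP BY assignee) AS counts)
ORDER BY c.compname
"""

top_companies = conn.execute(sql).fetchall()
print(top_companies)
```

Both tied companies are returned because the HAVING clause compares each group's count with the maximum instead of picking a single row.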
Assuming you have Oracle, I thought about this a bit differently:
select
c.compname
from
company c
join
(
select
assignee,
dense_rank() over (order by count(1) desc) rnk
from
patent
group by
assignee
) p
on p.assignee = c.assignee
where
p.rnk = 1
;
I like this because it lets you find any rank. For example, if you want the top 3, you would just change p.rnk = 1 to p.rnk <= 3. If you want 10th place, you just change it to p.rnk = 10. Adding the total count and rank to the results would be easy from here too. Overall I think it's more versatile.
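A rough sketch of that "any rank" idea, with the rank cut-off passed as a parameter. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module rather than Oracle, computes the counts in an inner query before ranking them, and uses invented company names. The helper `top_companies` is hypothetical.

```python
import sqlite3

def top_companies(conn, max_rank):
    """Return (compname, rank) for companies ranked max_rank or better by patent count."""
    sql = """
    SELECT c.compname, p.rnk
    FROM company c
    JOIN (SELECT assignee,
                 DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
          FROM (SELECT assignee, COUNT(*) AS cnt
                FROM patent
                GROUP BY assignee) AS counts) AS p
      ON p.assignee = c.assignee
    WHERE p.rnk <= ?
    ORDER BY p.rnk, c.compname
    """
    return conn.execute(sql, (max_rank,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (assignee INTEGER, compname TEXT);
CREATE TABLE patent (patent_id INTEGER, assignee INTEGER);
-- hypothetical data: two companies tie for first place
INSERT INTO company VALUES (19715, 'Acme'), (27895, 'Globex'), (30000, 'Initech');
INSERT INTO patent VALUES (1, 19715), (2, 19715), (3, 27895), (4, 27895), (5, 30000);
""")

top1 = top_companies(conn, 1)
top2 = top_companies(conn, 2)
print(top1)
print(top2)
```

Because dense_rank gives tied companies the same rank and leaves no gaps, a cut-off of 2 returns the two tied leaders plus the next company.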
I have two tables with the following attributes:
DOCTORS          OPERATIONS
D_ID             DATE
Name             TYPE
Specialiation    DOCTORS_D_ID
                 PACIENTS_PACIENT_ID
I want to return the name and ID of doctors who performed more than the average number of operations per doctor.
I have created following SQL command
SELECT Name D_ID,COUNT(*) FROM DOCTORS
JOIN OPERATION
ON D_ID = DOCTORS_D_ID
GROUP BY D_ID,Name
HAVING COUNT(*) > ( SELECT AVG(COUNT(DOCTORs_D_ID))
FROM OPERATIONS GROUP by DOCTORS_D_ID )
This results in the following table:
D_ID          COUNT(*)
Dr. Martin    3
The D_ID column contains the name instead of the ID, so only one of the two attributes is returned. How can I return both the name and D_ID from this command?
Your select list is missing a comma between Name and D_ID, so D_ID is treated as a column alias for Name. Beyond that, I am not a fan of nested aggregation functions. I would just do this by calculating the average directly:
SELECT Name, D_ID, COUNT(*)
FROM DOCTORS JOIN
OPERATIONS
ON D_ID = DOCTORS_D_ID
GROUP BY D_ID, Name
HAVING COUNT(*) > (SELECT COUNT(*) / COUNT(DISTINCT DOCTORS_D_ID)
FROM OPERATIONS
);
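The D_ID column in the question's result holds names because `SELECT Name D_ID` has no comma, so D_ID is parsed as an alias for Name. A minimal sketch of that behavior, using SQLite through Python's sqlite3 module as a stand-in engine (an assumption) and a made-up one-row DOCTORS table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DOCTORS (D_ID INTEGER, Name TEXT)")
conn.execute("INSERT INTO DOCTORS VALUES (7, 'Dr. Martin')")  # hypothetical row

# Without the comma, D_ID is only a label for the Name column.
cur = conn.execute("SELECT Name D_ID FROM DOCTORS")
alias_columns = [d[0] for d in cur.description]
alias_rows = cur.fetchall()

# With the comma, both columns come back.
cur = conn.execute("SELECT Name, D_ID FROM DOCTORS")
fixed_columns = [d[0] for d in cur.description]
fixed_rows = cur.fetchall()

print(alias_columns, alias_rows)
print(fixed_columns, fixed_rows)
```

Writing aliases with an explicit AS makes this kind of slip easier to spot when reading the query.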
This approach does not count doctors who performed no operations when computing the average. An average taken only from the operations table (or from an inner join with it) will therefore be higher than the true figure, which is the number of rows in the operations table divided by the number of doctors in the doctors table.
To compensate for this you can do:
SELECT Name,
D_ID,
num_operations
FROM ( SELECT Name,
D_ID,
COUNT( 1 ) OVER () AS num_doctors
FROM doctors ) d
LEFT OUTER JOIN
( SELECT DISTINCT
DOCTORS_D_ID,
COUNT( 1 ) OVER ( PARTITION BY DOCTORS_D_ID ) AS num_operations,
COUNT( 1 ) OVER () AS total_operations
FROM operations ) o
ON ( d.d_id = o.doctors_d_id )
WHERE num_operations > total_operations / num_doctors;
It has the added bonus of using analytic functions to calculate the counts rather than performing a third table scan.
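The difference between the two averages is easy to see on a small data set. The sketch below is illustrative only: it uses SQLite through Python's sqlite3 module (an assumption), four made-up doctors of whom two never operate, and a hypothetical helper `doctors_above` that takes the threshold as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctors (d_id INTEGER, name TEXT);
CREATE TABLE operations (doctors_d_id INTEGER);
-- hypothetical data: A has 3 operations, B has 2, C and D have none
INSERT INTO doctors VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO operations VALUES (1), (1), (1), (2), (2);
""")

# Average over doctors that appear in operations only.
avg_operating = conn.execute(
    "SELECT 1.0 * COUNT(*) / COUNT(DISTINCT doctors_d_id) FROM operations"
).fetchone()[0]

# Average over every doctor, including those with no operations.
avg_all = conn.execute(
    "SELECT 1.0 * (SELECT COUNT(*) FROM operations) / (SELECT COUNT(*) FROM doctors)"
).fetchone()[0]

def doctors_above(threshold):
    """Names of doctors whose operation count exceeds the given threshold."""
    rows = conn.execute("""
        SELECT d.name
        FROM doctors d
        JOIN operations o ON o.doctors_d_id = d.d_id
        GROUP BY d.d_id, d.name
        HAVING COUNT(*) > ?
        ORDER BY d.name
    """, (threshold,)).fetchall()
    return [name for (name,) in rows]

above_operating_avg = doctors_above(avg_operating)
above_all_avg = doctors_above(avg_all)
print(avg_operating, above_operating_avg)
print(avg_all, above_all_avg)
```

Doctor B is above average only when the idle doctors are included in the denominator, so the choice of average changes the answer.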
with num_operations as (
select doctors_d_id, count(*) as operations
from operations
group by doctors_d_id
having count(*) >
(select avg(count(doctors_d_id)) from operations group by doctors_d_id)
)
select a.doctors_d_id, a.operations, b.name
from num_operations a, doctors b
where a.doctors_d_id = b.d_id