SQL add multiple calculated rows after using pivot - sql

I have a table of data that looks like this:
|Agency|Rating |AVG|
------------------------
|Army |Exceptional|10 |
|Navy |Very Good |8.5|
And I need to pivot/count the number of each type of rating as well as calculate the totals and percentages of each rating category so it looks like this:
|Rating |Army|Navy|
--------------------------
|Exceptional |1 |0 |
|Very Good |0 |1 |
|Satisfactory |0 |0 |
|Marginal |0 |0 |
|Unsatisfactory|0 |0 |
|Total |1 |1 |
|% of Ex |100 |0 |
|% of VG |0 |100 |
|% of Sat |0 |0 |
|% of Mar |0 |0 |
|% of Uns |0 |0 |
Using the following Query:
select RatingWords as Rating, case when grouping([RatingWords]) = 1 then
'Total' else [RatingWords] end as [RatingWords], Sum([Navy]) as Navy,
sum([Army]) as Army
from
(select ratingwords, agency,
CASE WHEN Average BETWEEN 8.6 AND 10 THEN 1 else 0 end as Exceptional,
case WHEN Average BETWEEN 7.1 AND 8.5 THEN 1 else 0 end as VeryGood,
case WHEN Average BETWEEN 3.1 AND 7.0 THEN 1 else 0 end as Satisfactory,
case WHEN Average BETWEEN 0.1 AND 3.0 THEN 1 else 0 end as Marginal,
case WHEN Average = 0 THEN 1 ELSE 0 END as Unsatisfactory
from dbo.DOD_Average) as sourcetable
pivot
(
Count(exceptional)
for Agency in ([Navy], [Army])
) as PivotTable
group by grouping sets ((RatingWords),())
I get the following table:
|Rating |Army|Navy|
--------------------------
|Exceptional |1 |0 |
|Very Good |0 |1 |
|Satisfactory |0 |0 |
|Marginal |0 |0 |
|Unsatisfactory|0 |0 |
|Total |1 |1 |
So my question is, how can I add another group below the Total row to calculate the percentages? Or, if this is not the best way to create this report, I'm open to reworking it.

Keep the result of your query in a table variable, then UNION ALL the percentage rows onto it. Something like this:
Declare @table table (Rating nvarchar(60), Army int, Navy int);
-- Name the target columns so the select list lines up with the table variable
Insert into @table (Rating, Navy, Army)
select case when grouping([RatingWords]) = 1 then
'Total' else [RatingWords] end as Rating, Sum([Navy]) as Navy,
sum([Army]) as Army
from
(select ratingwords, agency,
CASE WHEN Average BETWEEN 8.6 AND 10 THEN 1 else 0 end as Exceptional,
case WHEN Average BETWEEN 7.1 AND 8.5 THEN 1 else 0 end as VeryGood,
case WHEN Average BETWEEN 3.1 AND 7.0 THEN 1 else 0 end as Satisfactory,
case WHEN Average BETWEEN 0.1 AND 3.0 THEN 1 else 0 end as Marginal,
case WHEN Average = 0 THEN 1 ELSE 0 END as Unsatisfactory
from dbo.DOD_Average) as sourcetable
pivot
(
Count(exceptional)
for Agency in ([Navy], [Army])
) as PivotTable
group by grouping sets ((RatingWords),());
Declare @army int, @navy int;
Select @army = Army, @navy = Navy from @table where Rating = 'Total';
Select Rating, Army, Navy from @table
union all
-- 100.0 forces decimal division; nullif avoids dividing by a zero total
Select '% of ' + Rating, Army * 100.0 / nullif(@army, 0), Navy * 100.0 / nullif(@navy, 0) from @table
where Rating <> 'Total'

Using your existing query as a CTE, you can SELECT * from it, and follow that with a series of UNION ALL queries that get each of the percentage rows that you want.
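As a rough illustration of the CTE plus UNION ALL idea, here is a minimal sketch that runs on SQLite through Python's sqlite3 module rather than on SQL Server. It swaps PIVOT for conditional aggregation (SQLite has no PIVOT), and the table name ratings, the categories CTE, and the abbr and pos columns are assumptions made for the example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (agency TEXT, rating TEXT);  -- hypothetical table
    INSERT INTO ratings VALUES ('Army', 'Exceptional'), ('Navy', 'Very Good');
""")

query = """
WITH categories(rating, abbr, pos) AS (   -- fixed list so empty categories still appear
    VALUES ('Exceptional', 'Ex', 1), ('Very Good', 'VG', 2),
           ('Satisfactory', 'Sat', 3), ('Marginal', 'Mar', 4),
           ('Unsatisfactory', 'Uns', 5)
),
counts AS (                               -- one row per category, one column per agency
    SELECT c.rating, c.abbr, c.pos,
           SUM(CASE WHEN r.agency = 'Army' THEN 1 ELSE 0 END) AS army,
           SUM(CASE WHEN r.agency = 'Navy' THEN 1 ELSE 0 END) AS navy
    FROM categories c
    LEFT JOIN ratings r ON r.rating = c.rating
    GROUP BY c.rating, c.abbr, c.pos
),
totals AS (
    SELECT SUM(army) AS army, SUM(navy) AS navy FROM counts
)
SELECT rating, army, navy FROM (
    SELECT 1 AS grp, pos, rating, army, navy FROM counts
    UNION ALL
    SELECT 2, 0, 'Total', army, navy FROM totals
    UNION ALL
    SELECT 3, c.pos, '% of ' || c.abbr,
           100.0 * c.army / NULLIF(t.army, 0),   -- NULLIF guards against a zero total
           100.0 * c.navy / NULLIF(t.navy, 0)
    FROM counts c CROSS JOIN totals t
)
ORDER BY grp, pos
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The grp and pos columns exist only to keep the three blocks in report order, since UNION ALL by itself does not guarantee any ordering.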

Related

In PostgreSQL, conditionally count rows

Background
I'm a novice Postgres user running a local server on a Windows 10 machine. I've got a dataset g that looks like this:
+--+---------+----------------+
|id|treatment|outcome_category|
+--+---------+----------------+
|a |1 |cardiovascular |
|a |0 |cardiovascular |
|b |0 |metabolic |
|b |0 |sensory |
|c |1 |NULL |
|c |0 |cardiovascular |
|c |1 |sensory |
|d |1 |NULL |
|d |0 |cns |
+--+---------+----------------+
The Problem
I'd like to get a count of outcome_category by outcome_category for those id who are "ever treated" -- defined as "id's who have any row where treatment=1".
Here's the desired result:
+----------------+---------+
|outcome_category| count |
+----------------+---------+
|cardiovascular | 3 |
|sensory | 1 |
|cns | 1 |
+----------------+---------+
It would be fine if the result had to contain metabolic, like so:
+----------------+---------+
|outcome_category|treatment|
+----------------+---------+
|cardiovascular | 3 |
|metabolic | 0 |
|sensory | 1 |
|cns | 1 |
+----------------+---------+
Obviously I don't need the rows to be in any particular order, though descending would be nice.
What I've tried
Here's a query I've written:
select treatment, outcome_category, sum(outcome_ct)
from (select max(treatment) as treatment,
outcome_category,
count(outcome_category) as outcome_ct
from g
group by outcome_category) as sub
group by outcome_category, sub.treatment;
But it's a mishmash result:
+---------+----------------+---+
|treatment|outcome_category|sum|
+---------+----------------+---+
|1 |cardiovascular |3 |
|1 |sensory |2 |
|0 |metabolic |1 |
|1 |NULL |0 |
|0 |cns |1 |
+---------+----------------+---+
I'm trying to identify the "ever exposed" id's using that first line in the subquery: select max(treatment) as treatment. But I'm not quite getting at the rest of it.
EDIT
I realized that the toy dataset g I originally gave you above doesn't correspond to the idiosyncrasies of my real dataset. I've updated g to reflect that many id's who are "ever treated" won't have a non-null outcome_category next to a row with treatment=1.
Interesting little problem. You can do:
select
outcome_category,
count(x.id) as count
from g
left join (
select distinct id from g where treatment = 1
) x on x.id = g.id
where outcome_category is not null
group by outcome_category
order by count desc
Result:
outcome_category count
----------------- -----
cardiovascular 3
sensory 1
cns 1
metabolic 0
See running example at db<>fiddle.
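For anyone who wants to try the left-join approach without a Postgres server, the following is a small sketch using Python's built-in sqlite3 module and the sample rows from the question. The alias n (instead of count) and the secondary sort on outcome_category are choices made here to keep the output deterministic, not something the answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE g (id TEXT, treatment INTEGER, outcome_category TEXT)")
conn.executemany("INSERT INTO g VALUES (?, ?, ?)", [
    ("a", 1, "cardiovascular"), ("a", 0, "cardiovascular"),
    ("b", 0, "metabolic"), ("b", 0, "sensory"),
    ("c", 1, None), ("c", 0, "cardiovascular"), ("c", 1, "sensory"),
    ("d", 1, None), ("d", 0, "cns"),
])

# x holds the ids that were ever treated; COUNT(x.id) skips the NULLs
# produced by the left join for ids that never were.
rows = conn.execute("""
    SELECT g.outcome_category, COUNT(x.id) AS n
    FROM g
    LEFT JOIN (SELECT DISTINCT id FROM g WHERE treatment = 1) AS x
           ON x.id = g.id
    WHERE g.outcome_category IS NOT NULL
    GROUP BY g.outcome_category
    ORDER BY n DESC, g.outcome_category
""").fetchall()
for row in rows:
    print(row)
```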
This would appear to be just a simple aggregation,
select outcome_category, Count(*) count
from t
where treatment=1
group by outcome_category
order by Count(*) desc
Demo fiddle

Conditional count of rows where at least one peer qualifies

Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1 |0 |
|a |1 |1 |
|b |0 |1 |
|c |1 |0 |
|c |0 |1 |
|c |1 |1 |
+--+---------+-------+
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
+-----------------------+-----+
|ever treated |count|
+-----------------------+-----+
|0 |1 |
|1 |3 |
+-----------------------+-----+
First, identify id that have ever been treated. Being "ever treated" means having any row with treatment = 1.
Second, count rows with outcome = 1 for each of those two groups. From my original table, the ids who are "ever treated" have a total of 3 rows with outcome = 1, and the "never treated", so to speak, have 1 row with outcome = 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
+---------+-----+
|treatment|count|
+---------+-----+
|0 |2 |
|1 |4 |
+---------+-----+
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM (
SELECT id
, max(treatment) AS ever_treated
, count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
) sub
GROUP BY 1;
ever_treated | count
--------------+-------
0 | 1
1 | 3
db<>fiddle here
Read:
For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.
Would be simpler and faster with proper boolean values instead of integer.
(Answer to updated question)
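Here is easy-to-follow subquery logic that works with integers. Note that each id is classified using all of its rows; filtering on outcome = 1 before taking max(treatment) would misclassify an id whose treated rows all have outcome = 0:
select subq.ever_treated, sum(subq.count) as count
from (select id, max(treatment) as ever_treated,
sum(case when outcome = 1 then 1 else 0 end) as count
from t
group by id) as subq
group by subq.ever_treated;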
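One way to sanity-check a two-level aggregation like this is to add an id whose treated rows all have outcome = 0. The sketch below uses Python's sqlite3 module with the question's rows plus a hypothetical id e (not in the original data), and compares classifying each id over all of its rows against a variant that filters on outcome = 1 before grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, treatment INTEGER, outcome INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("a", 1, 0), ("a", 1, 1), ("b", 0, 1),
    ("c", 1, 0), ("c", 0, 1), ("c", 1, 1),
    ("e", 1, 0), ("e", 0, 1),  # hypothetical extra id, added for this check
])

# Classify each id using all of its rows, then count outcome = 1 rows.
classify_first = """
SELECT ever_treated, SUM(outcome_ct) AS n
FROM (SELECT id, MAX(treatment) AS ever_treated,
             SUM(CASE WHEN outcome = 1 THEN 1 ELSE 0 END) AS outcome_ct
      FROM t GROUP BY id) AS sub
GROUP BY ever_treated ORDER BY ever_treated
"""
# Variant that drops outcome = 0 rows before MAX(treatment) is computed.
filter_first = """
SELECT ever_treated, SUM(outcome_ct) AS n
FROM (SELECT id, MAX(treatment) AS ever_treated, COUNT(*) AS outcome_ct
      FROM t WHERE outcome = 1 GROUP BY id) AS sub
GROUP BY ever_treated ORDER BY ever_treated
"""
good = conn.execute(classify_first).fetchall()
bad = conn.execute(filter_first).fetchall()
print(good)  # e is counted under ever_treated = 1
print(bad)   # e is counted under ever_treated = 0
```

On the question's own six rows both variants agree; they only diverge once an id like e appears.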

Big Query or SQL reshape data

I use BigQuery to store data.
For example, I have this table:
userId|event |count
--------------------
1 |event1 |1
1 |event2 |2
2 |event1 |2
2 |event2 |1
2 |event3 |4
3 |event1 |3
4 |event3 |5
4 |event4 |5
How can I get the following table, where each event{index} column holds the sum of count for that event,
using only BigQuery (or plain SQL)?
userId|event1 |event2|event3|event4|
------------------------------------
1 |1 |2 |0 |0 |
2 |2 |1 |4 |0 |
3 |3 |0 |0 |0 |
4 |0 |0 |5 |5 |
If you have just a few events, the query below will work for you: you need one SUM(CASE ...) line per distinct event. If the set of expected events is constant, you can build such a query once and reuse it.
SELECT
userID,
SUM(CASE WHEN event = 'event1' THEN [count] ELSE 0 END) AS event1,
SUM(CASE WHEN event = 'event2' THEN [count] ELSE 0 END) AS event2,
SUM(CASE WHEN event = 'event3' THEN [count] ELSE 0 END) AS event3,
SUM(CASE WHEN event = 'event4' THEN [count] ELSE 0 END) AS event4
FROM YourTable
GROUP BY userId
If you need something more dynamic, look at the very similar example at https://stackoverflow.com/a/36623258/5221944
In your case, the query that builds the dynamic SQL will look like this:
SELECT 'SELECT userId, ' +
GROUP_CONCAT_UNQUOTED(
'SUM(IF(event="'+event+'",[count],0)) as [d_'+REPLACE(event,'/','_')+']'
)
+ ' FROM YourTable GROUP BY userId ORDER BY userId'
FROM (
SELECT event FROM YourTable GROUP BY event ORDER BY event
)
Note the line below:
'SUM(IF(event="'+event+'",[count],0)) as [d_'+REPLACE(event,'/','_')+']'
It makes sure each event name complies with the requirements for field/column names.
If your events always look like event1, event2, etc., you can simplify this line to:
'SUM(IF(event = "' + event + '", [count], 0)) as ' + event
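The same "generate the pivot query from the distinct event names" idea can also be sketched outside BigQuery. The example below is only an illustration using Python's sqlite3 module: the table name events, the column name cnt (standing in for count), and the quote_ident helper are assumptions made here. Event values are bound as parameters and aliases are quoted, which plays the role of the REPLACE call above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (userId INTEGER, event TEXT, cnt INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "event1", 1), (1, "event2", 2),
    (2, "event1", 2), (2, "event2", 1), (2, "event3", 4),
    (3, "event1", 3),
    (4, "event3", 5), (4, "event4", 5),
])

def quote_ident(name):
    """Quote a column alias so unusual event names stay valid identifiers."""
    return '"' + name.replace('"', '""') + '"'

# Step 1: discover the distinct events that should become columns.
names = [r[0] for r in conn.execute("SELECT DISTINCT event FROM events ORDER BY event")]

# Step 2: build one SUM(CASE ...) expression per event.
cols = ", ".join(
    f"SUM(CASE WHEN event = ? THEN cnt ELSE 0 END) AS {quote_ident(n)}" for n in names
)
sql = f"SELECT userId, {cols} FROM events GROUP BY userId ORDER BY userId"

# Step 3: run the generated query, binding the event names in the same order.
pivot = conn.execute(sql, names).fetchall()
for row in pivot:
    print(row)
```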

SQL Find duplicates against specific records

Edit: It is MS SQL Server 2008
I want to find duplicates only against specific records.
In the example below, I want to find duplicates of the records that have Status = 1.
Here is an example data set:
ID |Name |Status
------------------------
1 |ABC |1
2 |BAC |1
3 |CBA |1
4 |ABC |2
5 |BAC |5
6 |BAC |7
7 |DAE |8
8 |DAE |2
What I want to get is this
Name |Count
-----------------
ABC |2
BAC |3
Originally I thought to use this
SELECT Name, COUNT(*)
FROM table t
GROUP BY Name
HAVING COUNT(*) > 1
But the result would be
Name |Count
-----------------
ABC |2
BAC |3
DAE |2
But that's not what I need.
You are close. You want to change the having clause to just count values where status is 1:
SELECT Name, COUNT(*)
FROM table t
GROUP BY Name
HAVING sum(case when status = 1 then 1 else 0 end) > 0;
EDIT:
If you only want things with a count greater than 1 as well as a status of 1:
SELECT Name, COUNT(*)
FROM table t
GROUP BY Name
HAVING sum(case when status = 1 then 1 else 0 end) > 0 and
count(*) > 1;
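To see the combined HAVING condition at work on the question's data, here is a small sketch using Python's sqlite3 module instead of SQL Server. The table name records is a placeholder chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, name TEXT, status INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", [
    (1, "ABC", 1), (2, "BAC", 1), (3, "CBA", 1), (4, "ABC", 2),
    (5, "BAC", 5), (6, "BAC", 7), (7, "DAE", 8), (8, "DAE", 2),
])

# Keep a name only if at least one of its rows has status = 1
# and the name occurs more than once overall.
rows = conn.execute("""
    SELECT name, COUNT(*) AS n
    FROM records
    GROUP BY name
    HAVING SUM(CASE WHEN status = 1 THEN 1 ELSE 0 END) > 0
       AND COUNT(*) > 1
    ORDER BY name
""").fetchall()
print(rows)
```

CBA is dropped because it occurs only once, and DAE is dropped because none of its rows has status = 1.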

selecting records matching condition A and at least X matching B

I have table of data as follows
|table_id|ref_table_id|is_used| date |url|
|--------+------------+-------+-------------------+---|
|1 |1 | | |abc|
|2 |1 | |2016-01-01 00:00:00|abc|
|3 |1 |0 | |abc|
|4 |1 |1 | |abc|
|5 |2 | | | |
|6 |2 | |2016-01-01 00:00:00|abc|
|7 |2 |1 | |abc|
|8 |2 |1 |2016-01-01 00:00:00|abc|
|9 |2 |1 |2016-01-01 00:00:00|abc|
|10 |3 | | | |
|11 |3 | |2016-01-01 00:00:00|abc|
|12 |3 |0 | | |
|13 |3 |0 | | |
|14 |3 |0 |2016-01-01 00:00:00| |
|15 |3 |1 |2016-01-01 00:00:00|abc|
...
|int |int |boolean|timestamp |varchar|
As is obvious, the combination of null and filled values in the columns is_used, date, url follows no rules.
Now I want to get the distinct ref_table_id values that meet both conditions:
there is at least 1 row that is not used and has an empty date and url
there are fewer than X rows that are not used and have either date or url filled
The table has many rows (~7 million), and a single ref_table_id group can range from 50 rows to 600k rows.
I tried to create this select, which runs for more than 2 seconds.
select
distinct on (ref_table_id) t1.ref_table_id,
count(1) as my_count
from my_table t1 inner join (
select distinct t2.ref_table_id from my_table t2
where t2.is_used is not true -- null or false
and t2.url is null
and t2.date is null
group by t2.ref_table_id
) tjoin on t1.ref_table_id = tjoin.ref_table_id
where t1.is_used is not true
and (t1.date is not null
or t1.url is not null)
group by t1.ref_table_id
having my_count < X
order by 1,2;
Can I rewrite it using INTERSECT, VIEW or other db features so that it would be faster?
This sounds like aggregation with a having clause:
select ref_table_id
from my_table t
group by ref_table_id
having sum(case when is_used = false and date is null and url is null
then 1 else 0 end) > 0 and
sum(case when is_used = false and (date is not null or url is not null)
then 1 else 0 end) < X;
This checks explicitly for is_used to be false as the meaning of "not used". I'm not sure what the blanks represent, so the logic may need to be tweaked (for example, is_used is not true would also treat NULL as "not used").
As a note, you can simplify the query by moving the common condition on is_used into a where clause:
select ref_table_id
from my_table t
where is_used = false -- or is_used is not true, to include NULL
group by ref_table_id
having sum(case when date is null and url is null
then 1 else 0 end) > 0 and
sum(case when (date is not null or url is not null)
then 1 else 0 end) < X;
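The two conditions from the question can be tried against the 15 sample rows with a short sketch. It uses Python's sqlite3 module rather than the original database, reads blank cells as NULL, and treats "not used" as NULL or 0, following the comment in the question's own query. The candidates function and its parameter x are names made up for this example.

```python
import sqlite3

D = "2016-01-01 00:00:00"
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE my_table (
    table_id INTEGER, ref_table_id INTEGER, is_used INTEGER, date TEXT, url TEXT)""")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", [
    (1, 1, None, None, "abc"), (2, 1, None, D, "abc"),
    (3, 1, 0, None, "abc"), (4, 1, 1, None, "abc"),
    (5, 2, None, None, None), (6, 2, None, D, "abc"),
    (7, 2, 1, None, "abc"), (8, 2, 1, D, "abc"), (9, 2, 1, D, "abc"),
    (10, 3, None, None, None), (11, 3, None, D, "abc"),
    (12, 3, 0, None, None), (13, 3, 0, None, None),
    (14, 3, 0, D, None), (15, 3, 1, D, "abc"),
])

def candidates(x):
    """Return ref_table_ids with at least one unused empty row and fewer than x unused filled rows."""
    sql = """
        SELECT ref_table_id
        FROM my_table
        WHERE COALESCE(is_used, 0) = 0          -- NULL or false counts as not used
        GROUP BY ref_table_id
        HAVING SUM(CASE WHEN date IS NULL AND url IS NULL THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN date IS NOT NULL OR url IS NOT NULL THEN 1 ELSE 0 END) < ?
        ORDER BY ref_table_id
    """
    return [r[0] for r in conn.execute(sql, (x,))]

print(candidates(2))
print(candidates(3))
```

Because the whole check is a single grouped pass over the table, it avoids the self-join in the original query. Whether that is actually faster on 7 million rows depends on the available indexes and would need to be measured.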