Apply COUNT function on a subgroup of groups - sql

I made up this weird example trying to illustrate what I want to do (it's kind of stupid, but bear with me):
Consider the following table:
EMPLOYEES
married, certified and religious are just boolean fields (in case of Oracle, they are of type NUMBER(1,0)).
I need to come up with SQL that displays for each hire_year, count of married, certified and religious employees within the following salary categories:
A SALARY > 2000
B SALARY BETWEEN 1000 AND 2000
C SALARY < 1000
Based on the above dataset, here is what I expect to get:
So far, I've only come up with the following SQL:
SELECT
COUNT(CASE WHEN married = 1 THEN 1 END) as MARRIED,
COUNT(CASE WHEN certified = 1 THEN 1 END) as certified,
COUNT(CASE WHEN religious = 1 THEN 1 END) as religious,
hire_year
FROM employees
GROUP BY hire_year;
The result of executing this SQL is:
Which is almost what I need, but I also need to divide these counters further down into the groups based on a salary range.
I guess that some analytic function, that divides groups into the buckets based on some SQL expression would help, but I can't figure out which one. I tried with NTILE, but it expects a positive constant as a parameter, rather than an SQL expression (such as SALARY BETWEEN X and Y).

Nope, no need for analytic functions; they're difficult to have in the same query as an aggregate function anyway.
You're looking for the CASE expression again; you just have to put it in the GROUP BY as well.
select hire_year
     , sum(married) as married
     , sum(certified) as certified
     , sum(religious) as religious
     , case when salary > 2000 then 'A'
            when salary >= 1000 then 'B'
            else 'C' end as salary_class
  from employees
 group by hire_year
     , case when salary > 2000 then 'A'
            when salary >= 1000 then 'B'
            else 'C' end
Note that I've changed your count(case when...) to sum(). Because the fields are boolean 1/0 values, summing them gives the same result as counting the 1s, and it's a lot cleaner.
I've also dropped the BETWEEN from your salary calculation; there's no need for it, because the CASE branches are evaluated in order: any salary above 2000 has already been caught by the first WHEN, so the second only needs to check salary >= 1000.
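To see the grouping in action, here is a minimal sketch that runs the same query against SQLite through Python's sqlite3 module instead of Oracle. The sample rows are made up for illustration (the original dataset is not reproduced here), and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample rows: (hire_year, salary, married, certified, religious).
rows = [
    (2010, 2500, 1, 0, 1),
    (2010, 2200, 1, 1, 0),
    (2010, 1500, 0, 1, 1),
    (2011, 900, 1, 1, 1),
    (2011, 1000, 0, 0, 1),
    (2011, 2000, 1, 0, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (hire_year INTEGER, salary INTEGER, "
    "married INTEGER, certified INTEGER, religious INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)

# The CASE expression appears twice: once to display the salary class,
# once in GROUP BY so each (hire_year, class) pair gets its own row.
query = """
SELECT hire_year,
       CASE WHEN salary > 2000 THEN 'A'
            WHEN salary >= 1000 THEN 'B'
            ELSE 'C' END AS salary_class,
       SUM(married)   AS married,
       SUM(certified) AS certified,
       SUM(religious) AS religious
FROM employees
GROUP BY hire_year,
         CASE WHEN salary > 2000 THEN 'A'
              WHEN salary >= 1000 THEN 'B'
              ELSE 'C' END
ORDER BY hire_year, salary_class
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that a salary of exactly 2000 lands in class B and exactly 1000 also lands in B, which matches the BETWEEN 1000 AND 2000 definition from the question.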

Related

I need to do a query from 2 tables using count function

The query contains 4 columns: the full name of the doctor, the number of male patients, the number of female patients, and the total number of patients seen by that doctor.
My problem is that I don't know how to count the number of males and females.
I am only supposed to use COUNT, GROUP BY and basic DML (I can't use CASE WHEN).
data in the table PACIENTE
er diagram
data in table medico
This depends on which database you are using specifically. One possible way to write this is:
SELECT
doc_name,
COUNT(CASE WHEN PAT_SEX = 'M' THEN 1 END) males,
COUNT(CASE WHEN PAT_SEX = 'F' THEN 1 END) females
FROM
...
Some databases (SQL Anywhere, for example) also accept an IF expression here:
COUNT(IF PAT_SEX = 'M' THEN 1 ENDIF)
Others (BigQuery, for example) support a conditional count directly:
COUNTIF(PAT_SEX = 'M')
If you would really like to avoid any kind of conditional, then you could add gender to your groups but then you will have two rows for each doctor:
SELECT
doc_name,
pat_sex,
count(*)
FROM
...
GROUP BY
doc_name,
pat_sex
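As a rough illustration of the difference between the two shapes, here is a sketch using SQLite through Python's sqlite3 module. The single visits table, its columns and its rows are all hypothetical stand-ins for the joined doctor/patient data, which is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened data: one row per patient seen by a doctor.
conn.execute("CREATE TABLE visits (doc_name TEXT, pat_sex TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("Dr. Ruiz", "M"), ("Dr. Ruiz", "F"), ("Dr. Ruiz", "F"), ("Dr. Soto", "M")],
)

# Conditional counts: one row per doctor.
per_doctor = conn.execute("""
    SELECT doc_name,
           COUNT(CASE WHEN pat_sex = 'M' THEN 1 END) AS males,
           COUNT(CASE WHEN pat_sex = 'F' THEN 1 END) AS females,
           COUNT(*) AS total
    FROM visits
    GROUP BY doc_name
    ORDER BY doc_name
""").fetchall()

# No conditionals: one row per (doctor, sex) pair that actually occurs.
per_doctor_sex = conn.execute("""
    SELECT doc_name, pat_sex, COUNT(*)
    FROM visits
    GROUP BY doc_name, pat_sex
    ORDER BY doc_name, pat_sex
""").fetchall()

print(per_doctor)
print(per_doctor_sex)
```

One thing the sketch makes visible: with the plain GROUP BY version, a doctor who has seen no female patients simply has no 'F' row, rather than a row with a count of 0.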

Select multiple maximum values from the same column

I have a user database where users are assigned an ID number in a certain range according to their type. For example, board members get an ID between 1 and 100, children get an ID between 1001 and 3000, parents get an ID between 3001 and 7000 etc.*
I'd like to get a list of the highest number in use for each "segment" of my IDs.
I can of course get the highest number of all by doing
SELECT MAX(Persons.Number) as Maximum FROM Persons
and get the highest number below 3000 like this:
SELECT MAX(Persons.Number) as MaxChild FROM Persons WHERE Persons.Number<=3000
...but how could I get the highest number below 100 AND the highest number below 1000 AND the highest number below 3000 etc. etc. with a single SELECT statement?
*I do have these characteristics stored in the database elsewhere; the "bucketing" of ID numbers is just for making it easier to spot at first glance where a certain user belongs
Just use IF():
SELECT
MAX(IF(Persons.Number BETWEEN x AND y, Persons.Number, NULL)) AS max_range_x_y,
MAX(IF(Persons.Number BETWEEN i AND j, Persons.Number, NULL)) AS max_range_i_j, ...
FROM Persons;
Above is MySQL syntax. In SQL Server you might use IIF() instead. What should work in every RDBMS (because it's ANSI-SQL Standard) is
SELECT
MAX(CASE WHEN Persons.Number BETWEEN x AND y THEN Persons.Number ELSE NULL END) AS max_range_x_y,
MAX(CASE WHEN Persons.Number BETWEEN i AND j THEN Persons.Number ELSE NULL END) AS max_range_i_j, ...
FROM Persons;
SELECT
MAX(CASE WHEN Number BETWEEN 1 AND 100 THEN Number ELSE NULL END) AS BoardMax,
MAX(CASE WHEN Number BETWEEN 1001 AND 3000 THEN Number ELSE NULL END) AS ChildMax,
MAX(CASE WHEN Number BETWEEN 3001 AND 7000 THEN Number ELSE NULL END) AS ParentMax
FROM
Persons
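A small sketch of the CASE-inside-MAX pattern, run on SQLite through Python's sqlite3 module. The ID values are invented, and the 7001 to 9000 range is a hypothetical extra segment added only to show what happens when a range has no members.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (Number INTEGER)")
# Hypothetical ID numbers spread over three of the segments.
conn.executemany(
    "INSERT INTO Persons VALUES (?)",
    [(n,) for n in (3, 57, 1001, 1420, 3001, 3050)],
)

# Each CASE yields the number only when it falls in the range, otherwise NULL;
# MAX ignores NULLs, so every column is the maximum within its own segment.
row = conn.execute("""
    SELECT MAX(CASE WHEN Number BETWEEN 1    AND 100  THEN Number END) AS BoardMax,
           MAX(CASE WHEN Number BETWEEN 1001 AND 3000 THEN Number END) AS ChildMax,
           MAX(CASE WHEN Number BETWEEN 3001 AND 7000 THEN Number END) AS ParentMax,
           MAX(CASE WHEN Number BETWEEN 7001 AND 9000 THEN Number END) AS EmptyMax
    FROM Persons
""").fetchone()
print(row)
```

A segment with no IDs in use comes back as NULL (None in Python), so the caller may want to treat that case explicitly. Omitting ELSE NULL is equivalent, since a CASE with no matching branch and no ELSE already yields NULL.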

Calculate percentages of columns in Oracle SQL

I have three columns, all consisting of 1's and 0's. For each of these columns, how can I calculate the percentage of people (one person is one row/ id) who have a 1 in the first column and a 1 in the second or third column in oracle SQL?
For instance:
id marketing_campaign personal_campaign sales
1 1 0 0
2 1 1 0
1 0 1 1
4 0 0 1
So in this case, of all the people who were subjected to a marketing_campaign, 50 percent were subjected to a personal campaign as well, but zero percent is present in sales (no one bought anything).
Ultimately, I want to find out the order in which people get to the sales moment. Do they first go from marketing campaign to a personal campaign and then to sales, or do they buy anyway regardless of these channels.
This is a fictional example, so I realize that in this example there are many other ways to do this, but I hope anyone can help!
The outcome that I'm looking for is something like this:
percentage marketing_campaign/ personal campaign = 50 %
percentage marketing_campaign/sales = 0%
etc (for all the three column combinations)
Use COUNT, SUM and CASE expressions, together with the basic arithmetic operators +, / and *:
COUNT(*) gives the total number of people (rows) in the table
SUM(column) gives the number of rows that have a 1 in the given column
CASE expressions make it possible to implement more complex conditions
The common pattern is X / COUNT(*) * 100, which calculates what percentage X is of the total (val / total * 100%).
An example:
SELECT
-- percentage of people that have 1 in marketing_campaign column
SUM( marketing_campaign ) / COUNT(*) * 100 As marketing_campaign_percent,
-- percentage of people that have 1 in sales column
SUM( sales ) / COUNT(*) * 100 As sales_percent,
-- complex condition:
-- percentage of people (one person is one row/ id) who have a 1
-- in the first column and a 1 in the second or third column
COUNT(
CASE WHEN marketing_campaign = 1
AND ( personal_campaign = 1 OR sales = 1 )
THEN 1 END
) / COUNT(*) * 100 As complex_condition_percent
FROM table;
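Here is a minimal sketch of that pattern on the four sample rows from the question, using SQLite through Python's sqlite3 module rather than Oracle. Two things in it are assumptions rather than part of the answer above: the table name the_table, and the last two columns, which divide by SUM(marketing_campaign) instead of COUNT(*) to get the "of those who had a marketing campaign" percentages the question describes. SQLite truncates integer division, so the sketch multiplies by 100.0 first; Oracle's NUMBER arithmetic does not need that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (id INTEGER, marketing_campaign INTEGER, "
    "personal_campaign INTEGER, sales INTEGER)"
)
# The four sample rows from the question (id 1 appears twice there).
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 0), (2, 1, 1, 0), (1, 0, 1, 1), (4, 0, 0, 1)],
)

row = conn.execute("""
    SELECT
        -- share of all rows with marketing_campaign = 1
        100.0 * SUM(marketing_campaign) / COUNT(*) AS marketing_pct,
        -- share of all rows matching the compound condition
        100.0 * COUNT(CASE WHEN marketing_campaign = 1
                            AND (personal_campaign = 1 OR sales = 1)
                           THEN 1 END) / COUNT(*) AS complex_pct,
        -- of the rows with marketing_campaign = 1, share with personal_campaign = 1
        100.0 * SUM(CASE WHEN marketing_campaign = 1 AND personal_campaign = 1
                         THEN 1 ELSE 0 END) / SUM(marketing_campaign) AS personal_of_marketing_pct,
        -- of the rows with marketing_campaign = 1, share with sales = 1
        100.0 * SUM(CASE WHEN marketing_campaign = 1 AND sales = 1
                         THEN 1 ELSE 0 END) / SUM(marketing_campaign) AS sales_of_marketing_pct
    FROM the_table
""").fetchone()
print(row)
```

The last two values come out as 50 and 0, which matches the outcome the question asks for on this data.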
You can get your percentages like this:
SELECT COUNT(*),
       ROUND(100*(SUM(personal_campaign) / sum(count(*)) over ()),2) perc_personal_campaign,
       ROUND(100*(SUM(sales) / sum(count(*)) over ()),2) perc_sales
FROM (
    SELECT ID,
           CASE
               WHEN SUM(personal_campaign) > 0 THEN 1
               ELSE 0
           end AS personal_campaign,
           CASE
               WHEN SUM(sales) > 0 THEN 1
               ELSE 0
           end AS sales
    FROM the_table
    WHERE ID IN
          (SELECT ID FROM the_table WHERE marketing_campaign = 1)
    GROUP BY ID
)
I have overcomplicated things a bit because your data is still unclear to me. The subquery ensures that all duplicates are cleaned up, so that each person has exactly one row with a 1 or 0 in personal_campaign and in sales.
About your second question:
Ultimately, I want to find out the order in which people get to the
sales moment. Do they first go from marketing campaign to a personal
campaign and then to sales, or do they buy anyway regardless of these
channels.
This is impossible to do with the table as it stands, because it has neither:
a unique row identifier that would preserve the order in which the rows were inserted, nor
a timestamp column that would tell when the rows were inserted.
Without one of these, the order of the rows returned from your table is unpredictable, or, if you prefer, purely random.

SQL Count Expressions

I am trying to create a table that will count the occurrences of each position for various offices.
So if my data is as follows:
Office Position
A Manager
A Supervisor
A Entry Level
A Entry Level
B Manager
B Entry Level
I would want my code to return:
Office Managers Supervisors EntryLevel
A 1 1 2
B 1 0 1
I have my code below. The issue is that this code counts the total amount of occurrences, not the unique count to each office. The results are as follows
A 2 1 3
B 2 1 3
CREATE TABLE OfficeTest AS
SELECT DISTINCT Office,
(Select COUNT(Position) FROM OfficeData WHERE Make_Name = 'Manager') as Managers,
(Select COUNT(Position) FROM OfficeData WHERE Make_Name = 'Supervisor') as Supervisors,
(Select COUNT(Position) FROM OfficeData WHERE Make_Name = 'Entry Level') as EntryLevel
FROM OfficeData
GROUP BY Office;
Any ideas on how to fix this?
The easiest way I can think of doing this is like this:
SELECT Office,
COUNT(CASE WHEN Make_Name = 'Manager' THEN Position END) AS Managers,
COUNT(CASE WHEN Make_Name = 'Supervisor' THEN Position END) AS Supervisors,
COUNT(CASE WHEN Make_Name = 'Entry Level' THEN Position END) AS EntryLevel
FROM OfficeData
GROUP BY Office
COUNT ignores MISSING values; if the Position is not the one specified in the CASE clause, it will return a MISSING value and won't be counted. This way each case considers only the value of Position you compare.
Another option, as stated in the comments, would be pivoting the table. The SAS equivalent is the TRANSPOSE procedure. I don't have a SAS system to create and test a query using it, but here's the documentation in case you want to check it out.
Just to flesh out Danny's comment a bit, the SUM code would look like:
proc sql;
CREATE TABLE want AS
SELECT office,
SUM( (position='Manager') ) as Managers,
SUM( (position='Supervisor') ) as Supervisors,
SUM( (position='Entry Level') ) as EntryLevel
FROM OfficeData
GROUP BY office
;quit;
The (position='Manager') bit resolves to 0 or 1, depending on whether it's true for the current record. I find the SUM version a lot more concise and legible, but both should work for your situation. Plus, it's easily extensible to more than one criterion, like (position='Manager')*(sex='F') to count only female managers.
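The same trick can be tried outside SAS. The sketch below uses SQLite through Python's sqlite3 module, where a comparison also evaluates to 0 or 1; other databases may not allow summing a comparison directly. The office and position values are the ones from the question, while the sex column and its values are hypothetical, added only to demonstrate the multiplied condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OfficeData (office TEXT, position TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO OfficeData VALUES (?, ?, ?)",
    [
        ("A", "Manager", "F"),      # sex values are made up for illustration
        ("A", "Supervisor", "M"),
        ("A", "Entry Level", "F"),
        ("A", "Entry Level", "M"),
        ("B", "Manager", "M"),
        ("B", "Entry Level", "F"),
    ],
)

# Each comparison is 1 when true and 0 when false, so SUM counts the matches.
# Multiplying two comparisons acts as a logical AND.
result = conn.execute("""
    SELECT office,
           SUM(position = 'Manager')     AS Managers,
           SUM(position = 'Supervisor')  AS Supervisors,
           SUM(position = 'Entry Level') AS EntryLevel,
           SUM((position = 'Manager') * (sex = 'F')) AS FemaleManagers
    FROM OfficeData
    GROUP BY office
    ORDER BY office
""").fetchall()
print(result)
```

The first three counts reproduce the table the question asks for (A: 1, 1, 2 and B: 1, 0, 1).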
SUM with a CASE expression should resolve the issue. Below is some reference code:
proc sql;
create table result as
select age
, sum(case sex when 'F' then 1 else 0 end) as Female
, sum(case sex when 'M' then 1 else 0 end) as Male
from sashelp.class
group by age;
quit;
proc print data=result;run;

How to count 2 different data in one query

I need to count occurrences of some data in two columns in one query. The DB is SQL Server 2005.
For example I have this table:
Person: Id, Name, Age
And I need to get these results in one query:
1. Count of persons named 'John'
2. Count of persons named 'John' who are older than 30
I can do that with subqueries in this way (it is only an example):
SELECT (SELECT COUNT(Id) FROM Persons WHERE Name = 'John'),
(SELECT COUNT (Id) FROM Persons WHERE Name = 'John' AND age > 30)
FROM Persons
But this is very slow, and I'm searching for a faster method.
I found this solution for MySQL (it almost solves my problem, but it is not for SQL Server).
Do you know a better way to calculate several counts in one query than using subqueries?
Using a CASE statement lets you count whatever you want in a single query:
SELECT
SUM(CASE WHEN Persons.Name = 'John' THEN 1 ELSE 0 END) AS JohnCount,
SUM(CASE WHEN Persons.Name = 'John' AND Persons.Age > 30 THEN 1 ELSE 0 END) AS OldJohnsCount,
COUNT(*) AS AllPersonsCount
FROM Persons
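For a quick check of the single-pass version, here is a sketch on SQLite through Python's sqlite3 module instead of SQL Server 2005. The four sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (Id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "John", 25), (2, "John", 35), (3, "John", 40), (4, "Mary", 50)],
)

# All three counts are computed in one scan of the table.
counts = conn.execute("""
    SELECT SUM(CASE WHEN Name = 'John' THEN 1 ELSE 0 END)              AS JohnCount,
           SUM(CASE WHEN Name = 'John' AND Age > 30 THEN 1 ELSE 0 END) AS OldJohnsCount,
           COUNT(*)                                                    AS AllPersonsCount
    FROM Persons
""").fetchone()
print(counts)
```

Unlike the subquery version in the question, which selects FROM Persons and therefore repeats the same two counts once per row, this returns a single row.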
Use:
SELECT COUNT(p.id),
SUM(CASE WHEN p.age > 30 THEN 1 ELSE 0 END)
FROM PERSONS p
WHERE p.name = 'John'
When a query accesses the same table more than once, it's always worth checking whether the work can be done in a single pass (one SELECT statement). It won't always be possible.
Edit:
If you need to do other things in the query, see Chris Shaffer's answer.