Add a Total row and convert to percentages - sql

I have a database with information about people's work condition and neighbourhood.
I have to display a chart of information in percentages like this:
Neighbourhood  Total  Employed  Unemployed  Inactive
Total          100    50        25          25
1              100    45        30          25
2              100    55        20          25
To do that, the code that I've made so far is:
select neighbourhood, Count (*) as Total,
Count(Case when (condition = 1) then 'employed' end) as employed,
Count (case when (condition = 2) then 'unemployed' end) as unemployed,
Count (Case when (condition =3) then 'Inactive' end) as Inactive
from table
group by neighbourhood
order by neighbourhood
The output of that code is (the absolute numbers are made up; they don't result in the percentages above):
Neighbourhood  Total  Employed  Unemployed  Inactive
1              600    300       200         100
2              450    220       159         80
So I have to turn the absolute numbers into percentages and add the Total row (summing the values from the neighbourhoods), but all my efforts have failed. I can't work out how to add that Total row, nor how to get the total for each neighbourhood for calculating the percentages.
I started studying SQL just two weeks ago, so I apologize for any inconvenience. I tried my best to keep it simple (there are 15 neighbourhoods in my database, and it's OK if they are labeled by numbers).
Thanks

You need a UNION to add the total row:
select 'All' as neighbourhood, Count (*) as Total,
Count(Case when (condition = 1) then 1 end) as employed,
Count (case when (condition = 2) then 1 end) as unemployed,
Count (Case when (condition =3) then 1 end) as Inactive
from table
UNION all
select neighbourhood, Count (*) as Total,
Count(Case when (condition = 1) then 1 end) as employed,
Count (case when (condition = 2) then 1 end) as unemployed,
Count (Case when (condition =3) then 1 end) as Inactive
from table
group by neighbourhood
order by neighbourhood
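The query above adds the total row but still returns absolute counts. A minimal sketch of the percentage step, assuming SQLite driven from Python's sqlite3 module (not necessarily the asker's engine), a hypothetical table name `people`, made-up sample rows, and two helper columns (`sort_key`, `nb`) that exist only to keep the Total row first and the neighbourhoods in numeric order:

```python
import sqlite3

# Hypothetical rows: (neighbourhood, condition); 1 = employed, 2 = unemployed, 3 = inactive.
sample_people = [(1, 1), (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (2, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (neighbourhood INTEGER, condition INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", sample_people)

percent_sql = """
SELECT label, total, employed, unemployed, inactive
FROM (
    -- Total row: no GROUP BY, so the aggregates cover the whole table.
    SELECT 0 AS sort_key, NULL AS nb, 'Total' AS label,
           100.0 * COUNT(*) / COUNT(*) AS total,
           100.0 * COUNT(CASE WHEN condition = 1 THEN 1 END) / COUNT(*) AS employed,
           100.0 * COUNT(CASE WHEN condition = 2 THEN 1 END) / COUNT(*) AS unemployed,
           100.0 * COUNT(CASE WHEN condition = 3 THEN 1 END) / COUNT(*) AS inactive
    FROM people
    UNION ALL
    -- One row per neighbourhood: COUNT(*) is now that neighbourhood's own total.
    SELECT 1, neighbourhood, CAST(neighbourhood AS TEXT),
           100.0 * COUNT(*) / COUNT(*),
           100.0 * COUNT(CASE WHEN condition = 1 THEN 1 END) / COUNT(*),
           100.0 * COUNT(CASE WHEN condition = 2 THEN 1 END) / COUNT(*),
           100.0 * COUNT(CASE WHEN condition = 3 THEN 1 END) / COUNT(*)
    FROM people
    GROUP BY neighbourhood
)
ORDER BY sort_key, nb
"""

percent_rows = conn.execute(percent_sql).fetchall()
for row in percent_rows:
    print(row)
```

Multiplying by 100.0 rather than 100 forces a floating-point division, which matters in engines that otherwise truncate integer division.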

You can add the total rows using grouping sets:
select neighbourhood, Count(*) as Total,
sum((condition = 1)::int) as employed,
sum((condition = 2)::int) as unemployed,
sum((condition = 3)::int) as Inactive
from table
group by grouping sets ( (neighbourhood), () )
order by neighbourhood;
If you want averages within each row, then use avg() rather than sum().
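The avg() hint can be illustrated as follows. This is only a sketch, assuming SQLite via Python's sqlite3 module (which has no grouping sets, so only the per-neighbourhood part is shown), a hypothetical table name `people`, and made-up rows. Averaging an indicator that is 100.0 for matching rows and 0 otherwise yields the percentage directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (neighbourhood INTEGER, condition INTEGER)")
# Hypothetical rows: two employed, one unemployed, one inactive in neighbourhood 1.
conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, 1), (1, 1), (1, 2), (1, 3)])

avg_rows = conn.execute("""
SELECT neighbourhood,
       AVG(CASE WHEN condition = 1 THEN 100.0 ELSE 0 END) AS employed_pct,
       AVG(CASE WHEN condition = 2 THEN 100.0 ELSE 0 END) AS unemployed_pct,
       AVG(CASE WHEN condition = 3 THEN 100.0 ELSE 0 END) AS inactive_pct
FROM people
GROUP BY neighbourhood
""").fetchall()
print(avg_rows)
```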

Related

Get count With Distinct in SQL Server

Select
Count(Distinct iif(t.HasReplyTask = 1, t.CustomerID, Null)) As Reply,
Count(Distinct iif(t.HasOverdueTask = 1, t.CustomerID, Null)) As Overdue,
Count(Distinct t.CustomerID) As Total
From
Table1 t
If a customer is counted in Reply, that customer must be excluded from the Overdue count. That is, if customer 123 is in both, the Overdue count should be one less. How can I do this?
I am adding some data here.
Customer 123 has "HasReplyTask", so we have to exclude that customer from the Overdue count (even though that customer has one overdue task without HasReplyTask). Customer 234 counts as one, and the distinct count for 456 is one.
So the Overdue count should be 2, but the query above returns 3.
If I've got it right, this can be done using a subquery to get the numbers for each customer, and then get the summary information as follows:
Select Sum(HasReplyTask) As Reply,
Sum(HasOverdueTask) As Overdue,
Count(CustomerID) As Total
From (
Select CustomerID,
IIF(Max(Cast(HasReplyTask As TinyInt))<>0, 0, Max(Cast(HasOverdueTask As TinyInt))) As HasOverdueTask,
Max(Cast(HasReplyTask As TinyInt)) As HasReplyTask
From Table1
Group by CustomerID) As T
I don't know the column data types, so I used Cast so that Max can be applied to them.
db<>fiddle
Reply  Overdue  Total
1      2        3
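The per-customer subquery can be checked with a small sketch. It assumes SQLite via Python's sqlite3 module rather than SQL Server, so CASE stands in for IIF and no cast is needed. The rows are hypothetical, reconstructed only from the description in the question (123 has a reply task and an overdue task, 234 has one overdue task, 456 has two):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (CustomerID INTEGER, HasReplyTask INTEGER, HasOverdueTask INTEGER)"
)
# Hypothetical rows matching the prose description, not the asker's real data.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(123, 1, 0), (123, 0, 1), (234, 0, 1), (456, 0, 1), (456, 0, 1)],
)

summary = conn.execute("""
SELECT SUM(HasReplyTask)   AS Reply,
       SUM(HasOverdueTask) AS Overdue,
       COUNT(CustomerID)   AS Total
FROM (
    -- One row per customer; a customer with any reply task never counts as overdue.
    SELECT CustomerID,
           MAX(HasReplyTask) AS HasReplyTask,
           CASE WHEN MAX(HasReplyTask) <> 0 THEN 0
                ELSE MAX(HasOverdueTask) END AS HasOverdueTask
    FROM Table1
    GROUP BY CustomerID
) AS T
""").fetchone()
print(summary)
```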
What would probably be more efficient for you is to pre-aggregate your table by customer ID and have counts per customer. Then your outer query can test for whatever you are really looking for. Something like
select
sum( case when PQ.ReplyCount > 0 then 1 else 0 end ) UniqReply,
sum( case when PQ.OverdueCount > 0 then 1 else 0 end ) UniqOverdue,
sum( case when PQ.OverdueCount - PQ.ReplyCount > 0 then 1 else 0 end ) PendingReplies,
count(*) as UniqCustomers
from
( select
yt.customerid,
count(*) CustRecs,
sum( case when yt.HasReplyTask = 1 then 1 else 0 end ) ReplyCount,
sum( case when yt.HasOverdueTask = 1 then 1 else 0 end ) OverdueCount
from
yourTable yt
group by
yt.customerid ) PQ
Now, to isolate the count you are REALLY looking for, you might need to compare ReplyCount with OverdueCount in the prequery (PQ). For a single customer ID (hence the prequery), if the OverdueCount is GREATER than the ReplyCount, is the customer still considered overdue? So for customer 123, they had 3 overdue, but only 2 replies. Do you want that counted once? But customers 234 and 456 only had overdue entries and NO replies. So the total where Overdue - Reply > 0 is 3 distinct people.
Is that your ultimate goal?

How to compare between 2 COUNT() SQL Server

I have a table like this:
Name Profit
==============
A 50
A -10
A 60
I want to count the rows for each Name and compare that with the number of rows that made a profit. So, from the data above I would get a result like this:
Name Total Profit Percentage
===============================
A 3 2 66.7
Please help me to solve this problem. Thanks in advance.
I think a simple GROUP BY query should work:
SELECT
Name,
COUNT(*) AS Total,
SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit,
100.0 * SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) / COUNT(*) AS Percentage
FROM yourTable
GROUP BY
Name;
The only part of the query which might not be self-explanatory is the summation of the CASE expression. This sum tallies, for each group of records having the same name, the number of times that Profit has a positive value. This technique is called conditional aggregation, and we also reuse this sum when calculating the percentage.
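As a quick check of conditional aggregation on the sample data, here is a minimal sketch assuming SQLite via Python's sqlite3 module instead of SQL Server. The aliases `ProfitCount` and `TruncatedPercentage` and the ROUND call are additions for illustration. The last column shows what happens in engines with integer division when 100 is used instead of 100.0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Name TEXT, Profit INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)", [("A", 50), ("A", -10), ("A", 60)]
)

profit_rows = conn.execute("""
SELECT Name,
       COUNT(*) AS Total,
       SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS ProfitCount,
       ROUND(100.0 * SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) / COUNT(*), 1) AS Percentage,
       -- Integer operands only: the division truncates.
       100 * SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) / COUNT(*) AS TruncatedPercentage
FROM yourTable
GROUP BY Name
""").fetchall()
print(profit_rows)
```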
A somewhat enhanced version of Tim's answer (i.e. eliminate the calculation repetition):
SELECT Name, Total, Profit, 100.0 * Profit / Total AS Percentage
FROM (SELECT Name,
COUNT(*) AS Total,
SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit
FROM yourTable
GROUP BY Name) q;
The engine may optimize away the repetition, so this is mainly for readability and maintainability (in case the calculation needs to change). In this case there is not much gain because there is only one repetition, but in cases where there are several repetitions, this solution becomes more useful.
In case the maintainability point is not clear, let's say the definition of profitability has changed in the company and now we consider > 10 to be profitable. In Tim's query, you'll have to change every calculation from > 0 to > 10. In the query above, you'll only have to change it in one place.
Try this
declare @t table
(
Name varchar(10),
Profit int
)
insert into @t
values('A',50),('A',60),('A',-10)
SELECT
Name,
Profit = SUM(CASE WHEN Profit>0 THEN 1 ELSE 0 END),
Total = COUNT(1),
Average = (SUM(CASE WHEN Profit>0 THEN 1.0 ELSE 0.0 END)/COUNT(1))*100
FROM @t
GROUP BY Name

Calculate percentages of columns in Oracle SQL

I have three columns, all consisting of 1's and 0's. For each of these columns, how can I calculate the percentage of people (one person is one row/ id) who have a 1 in the first column and a 1 in the second or third column in oracle SQL?
For instance:
id marketing_campaign personal_campaign sales
1 1 0 0
2 1 1 0
1 0 1 1
4 0 0 1
So in this case, of all the people who were subjected to a marketing_campaign, 50 percent were subjected to a personal campaign as well, but zero percent is present in sales (no one bought anything).
Ultimately, I want to find out the order in which people get to the sales moment. Do they first go from marketing campaign to a personal campaign and then to sales, or do they buy anyway regardless of these channels.
This is a fictional example, so I realize that in this example there are many other ways to do this, but I hope anyone can help!
The outcome that I'm looking for is something like this:
percentage marketing_campaign/ personal campaign = 50 %
percentage marketing_campaign/sales = 0%
etc (for all the three column combinations)
Use COUNT, SUM and CASE expressions, together with the basic arithmetic operators +, /, *:
- COUNT(*) gives the total number of people in the table
- SUM(column) gives the number of 1s in the given column
- CASE expressions make it possible to implement more complex conditions
The common pattern is X / COUNT(*) * 100, which calculates the percentage of a given value (val / total * 100%).
An example:
SELECT
-- percentage of people that have 1 in marketing_campaign column
SUM( marketing_campaign ) / COUNT(*) * 100 As marketing_campaign_percent,
-- percentage of people that have 1 in sales column
SUM( sales ) / COUNT(*) * 100 As sales_percent,
-- complex condition:
-- percentage of people (one person is one row/ id) who have a 1
-- in the first column and a 1 in the second or third column
COUNT(
CASE WHEN marketing_campaign = 1
AND ( personal_campaign = 1 OR sales = 1 )
THEN 1 END
) / COUNT(*) * 100 As complex_condition_percent
FROM table;
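The query above divides by all rows in the table. If the denominator should instead be only the people who received a marketing campaign, as in the question's expected output, one option is to filter first. A minimal sketch, assuming SQLite via Python's sqlite3 module rather than Oracle, a hypothetical table name `campaigns`, and the question's four sample rows taken as-is (one row treated as one person):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaigns (id INTEGER, marketing_campaign INTEGER,"
    " personal_campaign INTEGER, sales INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO campaigns VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 0), (2, 1, 1, 0), (1, 0, 1, 1), (4, 0, 0, 1)],
)

marketing_shares = conn.execute("""
SELECT 100.0 * SUM(personal_campaign) / COUNT(*) AS pct_personal_campaign,
       100.0 * SUM(sales) / COUNT(*)             AS pct_sales
FROM campaigns
WHERE marketing_campaign = 1   -- denominator: marketing recipients only
""").fetchone()
print(marketing_shares)
```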
You can get your percentages like this :
SELECT COUNT(*),
ROUND(100*(SUM(personal_campaign) / sum(count(*)) over ()),2) perc_personal_campaign,
ROUND(100*(SUM(sales) / sum(count(*)) over ()),2) perc_sales
FROM (
SELECT ID,
CASE
WHEN SUM(personal_campaign) > 0 THEN 1
ELSE 0
end AS personal_campaign,
CASE
WHEN SUM(sales) > 0 THEN 1
ELSE 0
end AS sales
FROM the_table
WHERE ID IN
(SELECT ID FROM the_table WHERE marketing_campaign = 1)
GROUP BY ID
)
I have overcomplicated things a bit because your data is still unclear to me. The subquery ensures that all duplicates are cleaned up and that for each person you only have a 1 or 0 in personal_campaign and sales.
About your second question :
Ultimately, I want to find out the order in which people get to the
sales moment. Do they first go from marketing campaign to a personal
campaign and then to sales, or do they buy anyway regardless of these
channels.
This is impossible with the table in its current state, because it has neither:
- a unique row identifier that would preserve the order in which the rows were inserted, nor
- a timestamp column that would tell when the rows were inserted.
Without one of these, the order of rows returned from your table is unpredictable or, if you prefer, purely random.

How to subtract Total from conditioned sum in SQL

I want to do the following:
1) Find the total rows in a table
2) Find the total rows that meets a certain criteria.
3) Subtract (1) from (2).
Sample table Employees:
EmployeeID Nationality
1 Brazil
2 Korea
3 Germany
4 Brazil
5 Brazil
What I've tried:
SELECT count(EmployeeID) as Total from Employees
UNION
SELECT count(EmployeeID) as Brazilians from Employees
WHERE Nationality = 'Brazil'
Result:
Total
5
3
Row 1 will give me the total Employees. Row 2 will give me the Brazilian Employees.
I used UNION to see if I could subtract row 2 from row 1.
I could do this using CASE and SUM(), but that would require the row_number() function, which I can't use given that I'm using WebSQL. Is there another way to index these rows to be able to subtract?
Is there another approach I could use to solve this seemingly simple problem?
How about counting the rows that don't meet that criteria?
SELECT COUNT(EmployeeID) as non_brazilians
FROM Employees
WHERE Nationality <> 'Brazil';
You can use conditional aggregation:
select count(*) as TotalRows,
sum(case when Nationality = 'Brazil' then 1 else 0 end) as Brazilians,
sum(case when Nationality <> 'Brazil' then 1 else 0 end) as nonBrazilians
from Employees;
This assumes that Nationality is never NULL. If that is possible, the last condition should be:
sum(case when Nationality = 'Brazil' then 0 else 1 end) as nonBrazilians
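The difference between the two conditions only shows up when a NULL is present. A minimal sketch, assuming SQLite via Python's sqlite3 module (the question mentions WebSQL) and one extra, hypothetical employee with no nationality added to the question's sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Nationality TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(1, "Brazil"), (2, "Korea"), (3, "Germany"), (4, "Brazil"), (5, "Brazil"),
     (6, None)],  # hypothetical row with an unknown nationality
)

null_check = conn.execute("""
SELECT COUNT(*) AS TotalRows,
       SUM(CASE WHEN Nationality = 'Brazil'  THEN 1 ELSE 0 END) AS Brazilians,
       -- NULL <> 'Brazil' is not true, so the NULL row is skipped here.
       SUM(CASE WHEN Nationality <> 'Brazil' THEN 1 ELSE 0 END) AS nonBraziliansStrict,
       -- Counting everything that is not a match also picks up the NULL row.
       SUM(CASE WHEN Nationality = 'Brazil'  THEN 0 ELSE 1 END) AS nonBraziliansNullSafe
FROM Employees
""").fetchone()
print(null_check)
```

With the NULL-safe form, Brazilians plus non-Brazilians always adds up to the total row count.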
Try this:
SELECT count(*) AS TotalRows
, (SELECT count(EmployeeID) FROM Employees WHERE Nationality = 'Brazil') as Brazilians
, (count(*) - (SELECT count(EmployeeID) FROM Employees WHERE Nationality = 'Brazil')) AS Subtract1From2
FROM Employees

SQL Server Group By which counts occurrences of a score

This might be a bit difficult to explain, but I have two tables in my SQL Server database, which I have simplified...
Items
    ID
    itemName
    voteCount
    score
Votes
    itemID
    score
So, basically I am storing every vote that is placed in one table, but also a count of the number of votes for each item in the item table along with its average score (out of 10) (which I know is duplicating data but it makes things easier).
Anyway, I want to create a SQL query which finds the 2 items which have the lowest score. This would be easy you would think as you'd just do this...
SELECT TOP 2 itemName FROM Items ORDER BY score ASC;
However, the client has added the following complication.
When 2 or more items have the same score then the item with the highest number of 10/10 votes would be placed above. If 2 or more items have the same score AND the same number of 10/10 votes then it would rank the item with the most 9/10 votes above the others and so on, right down to the number of 0/10 votes if everything else is equal.
So, the challenge is to rank all the items by these criteria then pick off the bottom 2. I have tried every combination of grouping, aggregating and "sub-querying" to work this out but I think I need the help of somebody much cleverer than me.
Any help would really be appreciated.
Clarification
The average score for an item is stored in the item table and the score cast against each vote is kept in the votes table. Initially we need to rank by the average score (I.score) and where 2 items have the same score we need to start counting the number of 10/10's in the votes linked to that item (v.score).
So, we might have an item called "t-shirt" which has an average score of 5/10. This comes from 6 votes with the following scores 5,5,5,5,5,5.
The next item is called "Ferrari" and also has an average score of 5/10, but this item only has 4 votes with the following scores 6,5,5,4
Clearly, the Ferrari should win because the SQL would see that it has no 10's, no 9's, no 8's, no 7's, but it does have a vote of 6, which trumps the t-shirt.
SELECT TOP 2 i.itemName
FROM Items i
left outer join (
select ItemID,
sum(case when score = 10 then 1 end) as Score10,
sum(case when score = 9 then 1 end) as Score9,
sum(case when score = 8 then 1 end) as Score8,
sum(case when score = 7 then 1 end) as Score7,
sum(case when score = 6 then 1 end) as Score6,
sum(case when score = 5 then 1 end) as Score5,
sum(case when score = 4 then 1 end) as Score4,
sum(case when score = 3 then 1 end) as Score3,
sum(case when score = 2 then 1 end) as Score2,
sum(case when score = 1 then 1 end) as Score1
from Votes
group by ItemID
) v on i.ID = v.ItemID
ORDER BY i.score,
v.Score10,
v.Score9,
v.Score8,
v.Score7,
v.Score6,
v.Score5,
v.Score4,
v.Score3,
v.Score2,
v.Score1
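The tie-break can be tried on the t-shirt and Ferrari example from the question. This is only a sketch, assuming SQLite via Python's sqlite3 module, so LIMIT 2 replaces TOP 2. It also differs from the query above in two labeled ways: it uses ELSE 0 so that missing vote levels count as 0 rather than NULL, and it adds a Score0 column. The third item ("mug") and the Python-generated column lists are hypothetical additions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (ID INTEGER, itemName TEXT, voteCount INTEGER, score REAL);
CREATE TABLE Votes (itemID INTEGER, score INTEGER);
""")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?, ?)",
    [(1, "t-shirt", 6, 5.0), (2, "Ferrari", 4, 5.0), (3, "mug", 2, 7.0)],
)
votes = [(1, 5)] * 6 + [(2, s) for s in (6, 5, 5, 4)] + [(3, 7), (3, 7)]
conn.executemany("INSERT INTO Votes VALUES (?, ?)", votes)

# Build one counting column per vote level, from 10 down to 0.
levels = range(10, -1, -1)
count_cols = ", ".join(
    f"SUM(CASE WHEN score = {s} THEN 1 ELSE 0 END) AS Score{s}" for s in levels
)
order_cols = ", ".join(f"v.Score{s}" for s in levels)

bottom_two_sql = f"""
SELECT i.itemName
FROM Items i
LEFT JOIN (SELECT itemID, {count_cols} FROM Votes GROUP BY itemID) v
       ON i.ID = v.itemID
ORDER BY i.score, {order_cols}   -- lowest average first, then fewest high votes
LIMIT 2
"""
bottom_two = [name for (name,) in conn.execute(bottom_two_sql)]
print(bottom_two)
```

Both items average 5, so the order is decided at the 6/10 level: the t-shirt has no such vote and therefore ranks below the Ferrari.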