Calculate percentages of columns in Oracle SQL

I have three columns, all consisting of 1s and 0s. For each of these columns, how can I calculate in Oracle SQL the percentage of people (one person is one row/id) who have a 1 in the first column and a 1 in the second or third column?
For instance:
id  marketing_campaign  personal_campaign  sales
1   1                   0                  0
2   1                   1                  0
1   0                   1                  1
4   0                   0                  1
So in this case, of all the people who were subjected to a marketing_campaign, 50 percent were subjected to a personal campaign as well, but zero percent appear in sales (none of them bought anything).
Ultimately, I want to find out the order in which people get to the sales moment. Do they first go from a marketing campaign to a personal campaign and then to sales, or do they buy anyway, regardless of these channels?
This is a fictional example, so I realize there are many other ways to do this, but I hope someone can help!
The outcome that I'm looking for is something like this:
percentage marketing_campaign/personal_campaign = 50%
percentage marketing_campaign/sales = 0%
etc. (for all three column combinations)

Use COUNT, SUM and CASE expressions, together with the basic arithmetic operators +, / and *:
COUNT(*) gives the total number of people (rows) in the table
SUM(column) gives the number of 1s in the given column, since the column holds only 0s and 1s
CASE expressions make it possible to implement more complex conditions
The common pattern is X / COUNT(*) * 100, which calculates the percentage of a given value (val / total * 100%)
An example:
SELECT
-- percentage of people that have 1 in marketing_campaign column
SUM( marketing_campaign ) / COUNT(*) * 100 As marketing_campaign_percent,
-- percentage of people that have 1 in sales column
SUM( sales ) / COUNT(*) * 100 As sales_percent,
-- complex condition:
-- percentage of people (one person is one row/ id) who have a 1
-- in the first column and a 1 in the second or third column
COUNT(
CASE WHEN marketing_campaign = 1
AND ( personal_campaign = 1 OR sales = 1 )
THEN 1 END
) / COUNT(*) * 100 As complex_condition_percent
FROM your_table; -- TABLE is a reserved word in Oracle, so use your actual table name here
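The complex-condition column above divides by COUNT(*), i.e. by everyone in the table. For the ratios in the desired output (of the people in the marketing campaign, what share also had a personal campaign), the denominator is instead the number of rows where the first column is 1. Below is a minimal sketch of that pairwise variant, run through Python's built-in sqlite3 module so it is self-contained. The table name campaign_data is hypothetical, rows are counted as given (no deduplication of the repeated id 1), and the SQL should carry over to Oracle, where the 100.0 is harmless because Oracle does not truncate integer division.

```python
import sqlite3

# Sample rows from the question: (id, marketing_campaign, personal_campaign, sales).
# id 1 appears twice in the question's data; each row is kept as one record here.
ROWS = [
    (1, 1, 0, 0),
    (2, 1, 1, 0),
    (1, 0, 1, 1),
    (4, 0, 0, 1),
]

# Each ratio = rows where both flags are 1 / rows where the first flag is 1.
# 100.0 forces decimal division (SQLite, like SQL Server, truncates int / int).
# NULLIF guards against division by zero, which is an error in Oracle and SQL Server.
QUERY = """
SELECT
    100.0 * SUM(CASE WHEN marketing_campaign = 1 AND personal_campaign = 1 THEN 1 ELSE 0 END)
          / NULLIF(SUM(marketing_campaign), 0) AS marketing_to_personal,
    100.0 * SUM(CASE WHEN marketing_campaign = 1 AND sales = 1 THEN 1 ELSE 0 END)
          / NULLIF(SUM(marketing_campaign), 0) AS marketing_to_sales,
    100.0 * SUM(CASE WHEN personal_campaign = 1 AND sales = 1 THEN 1 ELSE 0 END)
          / NULLIF(SUM(personal_campaign), 0) AS personal_to_sales
FROM campaign_data
"""


def pair_percentages(rows):
    """Load rows into an in-memory table and return the three ratios."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE campaign_data (id INTEGER, marketing_campaign INTEGER, "
        "personal_campaign INTEGER, sales INTEGER)"
    )
    con.executemany("INSERT INTO campaign_data VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


print(pair_percentages(ROWS))  # (50.0, 0.0, 50.0)
```

With the sample rows this gives 50% for marketing/personal and 0% for marketing/sales, matching the expected output; further combinations are added as extra CASE columns in the same way.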

You can get your percentages like this:
SELECT COUNT(*),
ROUND(100*(SUM(personal_campaign) / sum(count(*)) over ()),2) perc_personal_campaign,
ROUND(100*(SUM(sales) / sum(count(*)) over ()),2) perc_sales
FROM (
SELECT ID,
CASE
WHEN SUM(personal_campaign) > 0 THEN 1
ELSE 0
end AS personal_campaign,
CASE
WHEN SUM(sales) > 0 THEN 1
ELSE 0
end AS sales
FROM the_table
WHERE ID IN
(SELECT ID FROM the_table WHERE marketing_campaign = 1)
GROUP BY ID
)
I have overcomplicated things a bit because your data is still unclear to me. The subquery cleans up duplicate ids, so that for each person who had marketing_campaign = 1 you have a single 1 or 0 in personal_campaign and in sales.
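The per-person deduplication can be checked on the question's sample rows. This is a hedged SQLite translation run through Python's sqlite3: the table name the_table is kept from the answer, the inner aliases are renamed to has_personal and has_sales (hypothetical names) so they don't clash with the column names, and a plain COUNT(*) is used as the denominator, which is equivalent here because the outer query returns a single group. Because the two rows for id 1 are merged, the result differs from the row-level 50% / 0%.

```python
import sqlite3

# Sample rows from the question: (id, marketing_campaign, personal_campaign, sales).
ROWS = [
    (1, 1, 0, 0),
    (2, 1, 1, 0),
    (1, 0, 1, 1),
    (4, 0, 0, 1),
]

QUERY = """
SELECT COUNT(*) AS people,
       ROUND(100.0 * SUM(has_personal) / COUNT(*), 2) AS perc_personal_campaign,
       ROUND(100.0 * SUM(has_sales) / COUNT(*), 2) AS perc_sales
FROM (
    -- one row per person: 1 if any of that person's rows has the flag set
    SELECT id,
           CASE WHEN SUM(personal_campaign) > 0 THEN 1 ELSE 0 END AS has_personal,
           CASE WHEN SUM(sales) > 0 THEN 1 ELSE 0 END AS has_sales
    FROM the_table
    -- keep only people with marketing_campaign = 1 on at least one row
    WHERE id IN (SELECT id FROM the_table WHERE marketing_campaign = 1)
    GROUP BY id
) AS per_person
"""


def per_person_percentages(rows):
    """Return (people, perc_personal_campaign, perc_sales) for the given rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE the_table (id INTEGER, marketing_campaign INTEGER, "
        "personal_campaign INTEGER, sales INTEGER)"
    )
    con.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


print(per_person_percentages(ROWS))  # (2, 100.0, 50.0)
```

Person 1's second row carries both a personal campaign and a sale, so once the rows are merged both people in the marketing campaign count as personal-campaign recipients and one of the two as a buyer.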
About your second question:
Ultimately, I want to find out the order in which people get to the
sales moment. Do they first go from marketing campaign to a personal
campaign and then to sales, or do they buy anyway regardless of these
channels.
This is impossible with your table as it stands, because it has neither:
a unique, increasing row identifier that records the order in which the rows were inserted, nor
a timestamp column that records when each row was inserted.
Without one of these, the order in which rows are returned from your table is not guaranteed; it is effectively arbitrary.

Related

In SQL, how do you count if any rows in a group match a certain criteria?

I'm new to SQL, but I have a dataset that has students, their class subjects, and whether there was an error in their work. I want to know how many students have at least 1 error in any subject. Thus, whether a student has one subject with an error (like students 2 and 3 in the example) or multiple errors (like student 4), they'd be flagged. Only if they have no errors should they be categorized as 'no'.
I know I have to use GROUP BY and COUNT, and I'm thinking I have to use HAVING as well, but I can't seem to put it together. Here's a sample dataset:
ID Rating Error
==========================================
1 English No
1 Math No
2 English Yes
2 Math No
2 Science No
3 English Yes
4 English Yes
4 Math Yes
And the desired output:
Error Count Percent
==========================================
No 1 .25
Yes 3 .75
There are many different ways you can do it; here is one example using a CTE (common table expression):
with t as (
select
id,
case when sum(case when error='Yes' then 1 else 0 end) > 0 then 'Yes' else 'No' end as error
from students
group by id
)
select
error,
count(*),
(0.0 + count(*)) / (select count(*) from t) as perc
from t
group by error
Basically, the inner query (t) calculates the error status for each student, and the outer query calculates the error distribution (counts and percentages).
If you are on Presto/Trino, there are several useful functions you can use:
bool_or(boolean) → boolean - Returns TRUE if any input value is TRUE, otherwise FALSE.
if(condition, true_value, false_value) - Evaluates and returns true_value if condition is true, otherwise evaluates and returns false_value.
count(distinct id) - counts distinct ids.
with dataset (ID,Rating,Error) as (
values (1,'Math','No'),
(2,'English','Yes'),
(1,'English','No'),
(2,'Math','No'),
(2,'Science','No'),
(3,'English','Yes'),
(4,'English','Yes'),
(4,'Math','Yes')
)
select if(has_error, 'Yes', 'No') Error,
count(*) Count,
cast(count(*) as double) / (select count(distinct id) from dataset) Percent
from (
select bool_or(Error = 'Yes') has_error
from dataset
group by id
)
group by has_error;
Output:
Error Count Percent
Yes   3     0.75
No    1     0.25
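bool_or and if() are not available in every engine (SQLite, for example, has no bool_or aggregate). A portable stand-in for bool_or over a Yes/No column is MAX of a 0/1 CASE flag: it is 1 exactly when at least one row in the group matches. The sketch below applies that to the CTE approach from the first answer, using the question's sample rows in SQLite via Python's sqlite3; the names per_student, has_error and cnt are hypothetical.

```python
import sqlite3

# Sample data from the question: (id, rating, error).
ROWS = [
    (1, "English", "No"), (1, "Math", "No"),
    (2, "English", "Yes"), (2, "Math", "No"), (2, "Science", "No"),
    (3, "English", "Yes"),
    (4, "English", "Yes"), (4, "Math", "Yes"),
]

QUERY = """
WITH per_student AS (
    -- MAX over a 0/1 flag is 1 if ANY row in the group matches (like bool_or)
    SELECT id, MAX(CASE WHEN error = 'Yes' THEN 1 ELSE 0 END) AS has_error
    FROM students
    GROUP BY id
)
SELECT CASE WHEN has_error = 1 THEN 'Yes' ELSE 'No' END AS error,
       COUNT(*) AS cnt,
       1.0 * COUNT(*) / (SELECT COUNT(*) FROM per_student) AS perc  -- 1.0 avoids int division
FROM per_student
GROUP BY has_error
ORDER BY has_error
"""


def error_distribution(rows):
    """Return [(error, count, percent), ...] with one row per Yes/No status."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE students (id INTEGER, rating TEXT, error TEXT)")
    con.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(error_distribution(ROWS))  # [('No', 1, 0.25), ('Yes', 3, 0.75)]
```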

Get count With Distinct in SQL Server

Select
Count(Distinct iif(t.HasReplyTask = 1, t.CustomerID, Null)) As Reply,
Count(Distinct iif(t.HasOverdueTask = 1, t.CustomerID, Null)) As Overdue,
Count(Distinct t.CustomerID) As Total
From
Table1 t
If a customer is counted in Reply, we need to exclude that customer from the Overdue count. That means if customer 123 is in both, the Overdue count should be one less. How can I do this?
Here is some more detail about the data:
Customer 123 has a HasReplyTask row, so we have to exclude that customer from the Overdue count (even though that customer also has one overdue task without HasReplyTask). Customer 234 counts once, and customer 456 (counted distinctly) counts once.
So the Overdue count should be 2, but the query above returns 3.
If I've got it right, this can be done using a subquery to get the numbers for each customer, and then get the summary information as follows:
Select Sum(HasReplyTask) As Reply,
Sum(HasOverdueTask) As Overdue,
Count(CustomerID) As Total
From (
Select CustomerID,
IIF(Max(Cast(HasReplyTask As TinyInt))<>0, 0, Max(Cast(HasOverdueTask As TinyInt))) As HasOverdueTask,
Max(Cast(HasReplyTask As TinyInt)) As HasReplyTask
From Table1
Group by CustomerID) As T
I don't know the column data types, so I cast them to TINYINT in order to use MAX (MAX does not accept a BIT column).
Reply Overdue Total
1     2       3
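The pre-aggregation idea can be checked on a few rows. The asker's actual data is not shown, so the rows below are hypothetical, chosen only to match the description (123 has both a reply task and an overdue task; 234 and 456 only have overdue tasks, 456 on two rows). It is a SQLite sketch run through Python's sqlite3, with CASE in place of IIF and plain 0/1 integers standing in for the BIT columns, so no CAST is needed.

```python
import sqlite3

# HYPOTHETICAL rows invented to match the description in the question
# (the real data is not shown): (customer_id, has_reply_task, has_overdue_task).
ROWS = [
    (123, 1, 0),  # 123 has a reply task ...
    (123, 0, 1),  # ... and an overdue task, so it must not count as Overdue
    (234, 0, 1),
    (456, 0, 1),
    (456, 0, 1),  # two overdue rows for 456 still count once
]

QUERY = """
SELECT SUM(has_reply)   AS reply,
       SUM(has_overdue) AS overdue,
       COUNT(*)         AS total
FROM (
    -- one row per customer; a reply task cancels the overdue flag
    SELECT customer_id,
           MAX(has_reply_task) AS has_reply,
           CASE WHEN MAX(has_reply_task) <> 0 THEN 0
                ELSE MAX(has_overdue_task) END AS has_overdue
    FROM table1
    GROUP BY customer_id
) AS t
"""


def reply_overdue_totals(rows):
    """Return (reply, overdue, total) distinct-customer counts."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (customer_id INTEGER, "
        "has_reply_task INTEGER, has_overdue_task INTEGER)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


print(reply_overdue_totals(ROWS))  # (1, 2, 3)
```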
What would probably be more efficient for you is to pre-aggregate your table by customer ID and get counts per customer. Then your outer query can test for whatever you are really looking for. Something like:
select
sum( case when PQ.ReplyCount > 0 then 1 else 0 end ) UniqReply,
sum( case when PQ.OverdueCount > 0 then 1 else 0 end ) UniqOverdue,
sum( case when PQ.OverdueCount - PQ.ReplyCount > 0 then 1 else 0 end ) PendingReplies,
count(*) as UniqCustomers
from
( select
yt.customerid,
count(*) CustRecs,
sum( case when yt.HasReplyTask = 1 then 1 else 0 end ) ReplyCount,
sum( case when yt.HasOverdueTask = 1 then 1 else 0 end ) OverdueCount
from
yourTable yt
group by
yt.customerid ) PQ
Now, to get the count you are REALLY looking for, you might need to compare ReplyCount with OverdueCount in the prequery (PQ). For a single customer ID (hence the prequery), if the OverdueCount is GREATER than the ReplyCount, is it still considered overdue? So for customer 123, they had 3 overdue but only 2 replies. Do you want that counted once? But customers 234 and 456 only had overdue entries and NO replies. So the total where Overdue - Reply > 0 = 3 distinct people.
Is that your ultimate goal?

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, some of them are duplicates. Based on the way the request was made, we need to use this hierarchy, and although we see different Company_Name values, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE clauses that are not germane to this question.
Results:
#  Customer_Name_Count  Company_Name        Total_Sales
--------------------------------------------------------
1  3                    Blockbuster         1,000
2  6                    Jimmy's Bar         1,500
3  6                    Jimmy's Restaurant  1,500
4  9                    Impala Hotel        2,000
5  12                   Sports Drink        2,500
In the above set, we can see that rows 2 and 3 have the same count, the same Total_Sales and similar company names. Is there a way to write a CASE expression that takes these three factors into account and then drops one or the other of the Jimmy's entries? The other issue is that this has to be dynamic, as there are other instances where this happens. I only want this to happen if the count and sales numbers match and the company names are similar.
Desired result:
#  Customer_Name_Count  Company_Name        Total_Sales
--------------------------------------------------------
1  3                    Blockbuster         1,000
2  6                    Jimmy's Bar         1,500
3  9                    Impala Hotel        2,000
4  12                   Sports Drink        2,500
The other answers look accurate under the assumption that the Company_IDs are the same for both.
If Jimmy's Bar and Jimmy's Restaurant have different Company_IDs, you can use something like this. I suggest you get functional users involved and do some data clean-up; otherwise you'll be maintaining this every time the issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12 -- the alias must match the A12. prefix used above
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). It would be better to fix the join so that it returns only one name. One possibility is that it is a type-2 (slowly changing) dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need a function that returns a common name for the companies (dbo.GetCommonName is a user-defined function you would have to write), and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
You can try using ROW_NUMBER as a window function to number the rows within each (Customer_Name_Count, Total_Sales) partition, then keep only rn = 1:
SELECT * FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY Customer_Name_Count,Total_Sales ORDER BY Company_Name) rn
FROM (
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
Company_Name
)t1
) t2
WHERE rn = 1
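To see what the ROW_NUMBER approach keeps, here is a sketch that loads the result rows shown in the question into SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). The table name aggregated is hypothetical and stands in for the inner GROUP BY query. Note that this keeps only the alphabetically first name among rows whose count and sales match, whatever the names are; it does not check that the names are similar.

```python
import sqlite3

# Result rows shown in the question: (customer_name_count, company_name, total_sales).
ROWS = [
    (3, "Blockbuster", 1000),
    (6, "Jimmy's Bar", 1500),
    (6, "Jimmy's Restaurant", 1500),
    (9, "Impala Hotel", 2000),
    (12, "Sports Drink", 2500),
]

QUERY = """
SELECT customer_name_count, company_name, total_sales
FROM (
    SELECT *,
           -- rows with the same count and sales share a partition;
           -- the alphabetically first company name gets rn = 1
           ROW_NUMBER() OVER (
               PARTITION BY customer_name_count, total_sales
               ORDER BY company_name
           ) AS rn
    FROM aggregated
) AS t2
WHERE rn = 1
ORDER BY total_sales, company_name
"""


def dedupe_rows(rows):
    """Return the rows that survive the rn = 1 filter."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE aggregated (customer_name_count INTEGER, "
        "company_name TEXT, total_sales INTEGER)"
    )
    con.executemany("INSERT INTO aggregated VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in dedupe_rows(ROWS):
    print(row)
```

Jimmy's Restaurant is the row dropped, leaving the four rows of the desired result.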

How to compare two COUNT() results in SQL Server

I have a table like this:
Name Profit
==============
A 50
A -10
A 60
I want to count the rows for each Name and then compare that with the number of rows that have a profit. So, from the data above I want a result like this:
Name Total Profit Percentage
===============================
A 3 2 66.7
Please help me to solve this problem. Thanks in advance.
I think a simple GROUP BY query should work:
SELECT
Name,
COUNT(*) AS Total,
SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit,
100.0 * SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) / COUNT(*) AS Percentage
FROM yourTable
GROUP BY
Name;
The only part of the query which might not be self-explanatory is the summation of the CASE expression. This sum tallies, for each group of records having the same name, the number of times that Profit is positive (greater than zero). This technique is called conditional aggregation, and we also reuse this sum when calculating the percentage.
A somewhat enhanced version of Tim's answer (i.e. eliminating the repeated calculation):
SELECT Name, Total, Profit, 100.0 * Profit / Total AS Percentage -- 100.0 avoids integer division
FROM (SELECT Name,
COUNT(*) AS Total,
SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit
FROM yourTable
GROUP BY Name) q;
The engine may optimize the repetition away, but the point is mainly readability and maintainability (in case the calculation needs to change). In this case there is not much gain because there is only one repetition, but where there are several, this approach becomes more useful.
In case the maintainability point is not clear, let's say the definition of profitability has changed in the company and now we consider > 10 to be profitable. In Tim's query, you'll have to change every calculation from > 0 to > 10. In the query above, you'll only have to change it in one place.
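Both points (one place to change the profitability rule, and decimal rather than integer division) can be tried in a small SQLite sketch via Python's sqlite3. The threshold is passed as a query parameter, and the inner column is named profitable (a hypothetical name) to keep it distinct from the Profit column. Like SQL Server, SQLite truncates integer division, so 100 * 2 / 3 would give 66 rather than 66.7.

```python
import sqlite3

ROWS = [("A", 50), ("A", -10), ("A", 60)]  # sample data from the question

# The profitability rule ("profit > ?") appears once, in the inner query;
# the outer query only reuses the pre-computed columns.
QUERY = """
SELECT name, total, profitable,
       ROUND(100.0 * profitable / total, 1) AS percentage  -- 100.0 avoids int division
FROM (
    SELECT name,
           COUNT(*) AS total,
           SUM(CASE WHEN profit > ? THEN 1 ELSE 0 END) AS profitable
    FROM your_table
    GROUP BY name
) AS q
"""


def profit_share(rows, threshold=0):
    """Return [(name, total, profitable, percentage)] for a given threshold."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (name TEXT, profit INTEGER)")
    con.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    result = con.execute(QUERY, (threshold,)).fetchall()
    con.close()
    return result


print(profit_share(ROWS))      # [('A', 3, 2, 66.7)]
print(profit_share(ROWS, 55))  # [('A', 3, 1, 33.3)]
```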
Try this:
declare @t table -- a table variable; #t would be a temp table created with CREATE TABLE
(
Name varchar(10),
Profit int
)
insert into @t
values('A',50),('A',60),('A',-10)
SELECT
Name,
Profit = SUM(CASE WHEN Profit>0 THEN 1 ELSE 0 END),
Total = COUNT(1),
Percentage = (SUM(CASE WHEN Profit>0 THEN 1.0 ELSE 0.0 END)/COUNT(1))*100
FROM @t
GROUP BY Name

Find the proportion of rows satisfying a condition in a single SQL query

Suppose I have a sales table which is as follows:
ID | Price
----------------------
1 0.33
2 1.5
3 0.5
4 10
5 0.99
I would like to find, in a single query, the proportion of rows satisfying a given condition. For example, if the condition is Price < 1, the result should be 3/5 = 0.6.
The only workaround that I have found so far is:
SELECT
SUM(
CASE
WHEN Price < 1
THEN 1
WHEN Price >= 1
THEN 0
END
)/COUNT(*)
FROM sales
but is there a way to do this without CASE?
You can do it with IF:
SELECT SUM(IF(Price < 1, 1, 0))/COUNT(*) FROM sales
but that's not much different from CASE (your logic here is correct).
You might want to use WHERE (to count only the rows with Price < 1), but since you also need the total COUNT, WHERE alone doesn't work in your case. Another option is to get the total count separately:
SELECT
COUNT(sales.Price) / MAX(c.total_count)
FROM
sales
CROSS JOIN (SELECT COUNT(*) AS total_count FROM sales) AS c
WHERE
-- you're summing 1 or 0 depending on Price, so your sum is
-- just the count of rows where Price < 1
-- (MAX() wraps total_count so the query also runs under ONLY_FULL_GROUP_BY)
sales.Price<1
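Without CASE or IF, MySQL and SQLite also let you average the comparison itself, because a comparison evaluates to 1 or 0 there. This is not portable (SQL Server, for example, does not accept a bare comparison as a value). A minimal sketch in SQLite via Python's sqlite3, using the question's prices; note that AVG skips rows where Price is NULL, whereas COUNT(*) in the original query counts them.

```python
import sqlite3

PRICES = [0.33, 1.5, 0.5, 10, 0.99]  # sample data from the question


def proportion_below(prices, limit):
    """Fraction of rows with price < limit, computed as AVG of the comparison."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, price REAL)")
    con.executemany("INSERT INTO sales (price) VALUES (?)", [(p,) for p in prices])
    # In SQLite (and MySQL) "price < ?" evaluates to 1 or 0, so its average is
    # the share of rows that satisfy it; NULL prices are ignored by AVG.
    (result,) = con.execute("SELECT AVG(price < ?) FROM sales", (limit,)).fetchone()
    con.close()
    return result


print(proportion_below(PRICES, 1))  # 0.6
```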