How to compare two COUNT() results in SQL Server

I have a table like this:
Name Profit
==============
A 50
A -10
A 60
I want to count how many rows there are per Name and compare that with how many of those rows have a positive profit. So, from the data above, I should get a result like this:
Name Total Profit Percentage
===============================
A 3 2 66.7
Please help me to solve this problem. Thanks in advance.

I think a simple GROUP BY query should work:
SELECT
Name,
COUNT(*) AS Total,
SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit,
100.0 * SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) / COUNT(*) AS Percentage
FROM yourTable
GROUP BY
Name;
The only part of the query which might not be self-explanatory is the summation of the CASE expression. This sum tallies, for each group of records having the same name, the number of times that Profit has a positive value. This technique is called conditional aggregation, and we also reuse this sum when calculating the percentage.
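To see conditional aggregation produce the expected row, here is a minimal sketch that runs the same query against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption made only so the example is self-contained; the query text itself is unchanged.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Name TEXT, Profit INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [("A", 50), ("A", -10), ("A", 60)],
)

query = """
SELECT
    Name,
    COUNT(*) AS Total,
    SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit,
    100.0 * SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) / COUNT(*) AS Percentage
FROM yourTable
GROUP BY Name
"""

rows = conn.execute(query).fetchall()
for name, total, profit, pct in rows:
    # Each row: name, row count, count of positive profits, percentage
    print(name, total, profit, round(pct, 1))  # A 3 2 66.7
```

The 100.0 literal matters: it forces decimal arithmetic, so the division is not truncated to a whole number.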

A somewhat enhanced version of Tim's answer (i.e. eliminate the calculation repetition):
SELECT Name, Total, Profit, 100.0 * Profit / Total AS Percentage
FROM (SELECT Name,
COUNT(*) AS Total,
SUM(CASE WHEN Profit > 0 THEN 1 ELSE 0 END) AS Profit
FROM yourTable
GROUP BY Name) q;
The engine may optimize the repetition away, so this is mainly for readability and maintainability (in case the calculation needs to change). In this case, there is not much gain because there is only one repetition, but in cases where there are several repetitions, this solution becomes more useful.
In case the maintainability point is not clear, let's say the definition of profitability has changed in the company and now we consider > 10 to be profitable. In Tim's query, you'll have to change every calculation from > 0 to > 10. In the query above, you'll only have to change it in one place.
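The maintainability point can be sketched as follows: with the derived table, the profitability threshold appears in exactly one place, so it can even be passed in as a parameter. This again uses Python's sqlite3 as a stand-in for SQL Server; the function name profit_share and the alias Profitable are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Name TEXT, Profit INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [("A", 50), ("A", -10), ("A", 60)],
)

# The threshold (the ? placeholder) occurs once, inside the derived table.
QUERY = """
SELECT Name, Total, Profitable, 100.0 * Profitable / Total AS Percentage
FROM (SELECT Name,
             COUNT(*) AS Total,
             SUM(CASE WHEN Profit > ? THEN 1 ELSE 0 END) AS Profitable
      FROM yourTable
      GROUP BY Name) q
"""


def profit_share(threshold):
    """Return (Name, Total, Profitable, Percentage) rows for a threshold."""
    return conn.execute(QUERY, (threshold,)).fetchall()


print(profit_share(0))   # two of the three rows exceed 0
print(profit_share(50))  # only the row with 60 exceeds 50
```

Changing the definition of "profitable" is then a one-argument change rather than an edit to every repeated expression.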

Try this
declare @t table
(
Name varchar(10),
Profit int
)
insert into @t
values('A',50),('A',60),('A',-10)
SELECT
Name,
Profit = SUM(CASE WHEN Profit>0 THEN 1 ELSE 0 END),
Total = COUNT(1),
Average = (SUM(CASE WHEN Profit>0 THEN 1.0 ELSE 0.0 END)/COUNT(1))*100
FROM @t
GROUP BY Name

Related

Get count With Distinct in SQL Server

Select
Count(Distinct iif(t.HasReplyTask = 1, t.CustomerID, Null)) As Reply,
Count(Distinct iif(t.HasOverdueTask = 1, t.CustomerID, Null)) As Overdue,
Count(Distinct t.CustomerID) As Total
From
Table1 t
If a customer is in Reply, we need to exclude that customer from the Overdue count. That is, if customer 123 is in both, the Overdue count should be one less. How can I do this?
I am adding some data here:
Customer 123 has "HasReplyTask", so we have to exclude that customer from the Overdue count (even though that customer has one overdue task without HasReplyTask). 234 counts as one, and the distinct count for 456 is one.
So the Overdue count should be 2, but the query above returns 3.
If I've got it right, this can be done using a subquery to get the numbers for each customer, and then get the summary information as follows:
Select Sum(HasReplyTask) As Reply,
Sum(HasOverdueTask) As Overdue,
Count(CustomerID) As Total
From (
Select CustomerID,
IIF(Max(Cast(HasReplyTask As TinyInt))<>0, 0, Max(Cast(HasOverdueTask As TinyInt))) As HasOverdueTask,
Max(Cast(HasReplyTask As TinyInt)) As HasReplyTask
From Table1
Group by CustomerID) As T
I don't know about column data types, so I used cast function to use max function.
Result (db<>fiddle):
Reply Overdue Total
1 2 3
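Since the question's sample data did not survive, here is a small sketch of the per-customer subquery idea on made-up rows that match the description (customer 123 has both a reply task and an overdue task, 234 has one overdue task, 456 has two). The rows are hypothetical, SQLite via Python stands in for SQL Server, and a CASE expression replaces IIF so that it runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (CustomerID INTEGER, HasReplyTask INTEGER, HasOverdueTask INTEGER)"
)
# Hypothetical rows consistent with the description in the question.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(123, 1, 0), (123, 0, 1), (234, 0, 1), (456, 0, 1), (456, 0, 1)],
)

query = """
SELECT SUM(HasReplyTask)   AS Reply,
       SUM(HasOverdueTask) AS Overdue,
       COUNT(CustomerID)   AS Total
FROM (SELECT CustomerID,
             MAX(HasReplyTask) AS HasReplyTask,
             -- A customer with any reply task is never counted as overdue.
             CASE WHEN MAX(HasReplyTask) <> 0 THEN 0
                  ELSE MAX(HasOverdueTask) END AS HasOverdueTask
      FROM Table1
      GROUP BY CustomerID) AS T
"""

summary = conn.execute(query).fetchone()
print(summary)  # (1, 2, 3)
```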
What would probably be more efficient for you is to pre-aggregate your table by customer ID and have counts per customer. Then your outer query can test for whatever you are really looking for. Something like
select
sum( case when PQ.ReplyCount > 0 then 1 else 0 end ) UniqReply,
sum( case when PQ.OverdueCount > 0 then 1 else 0 end ) UniqOverdue,
sum( case when PQ.OverdueCount - PQ.ReplyCount > 0 then 1 else 0 end ) PendingReplies,
count(*) as UniqCustomers
from
( select
yt.customerid,
count(*) CustRecs,
sum( case when yt.HasReplyTask = 1 then 1 else 0 end ) ReplyCount,
sum( case when yt.HasOverdueTask = 1 then 1 else 0 end ) OverdueCount
from
yourTable yt
group by
yt.customerid ) PQ
To get the count you are really after, you may need to compare ReplyCount against OverdueCount in the prequery (PQ). For a single customer ID (hence the prequery), if OverdueCount is greater than ReplyCount, is the customer still considered overdue? Customer 123 had 3 overdue entries but only 2 replies. Do you want that counted once? Customers 234 and 456 only had overdue entries and no replies. So the total where Overdue - Reply > 0 is 3 distinct people.
Is that your ultimate goal?

SQL SUM and value conversion

I'm looking to transform data in my SUM query to acknowledge that some numeric values are negative in nature, although not represented as such.
I look for customer balance where the example dataset includes also credit transactions that are not written as negative in the database (although all records that have value C for credit in inv_type column should be treated as negative in the SQL SUM function). As an example:
INVOICES
inv_no inv_type cust_no value
1 D 25 10
2 D 35 30
3 C 25 5
4 D 25 50
5 C 35 2
My simple SUM function would not give me the correct answer:
select cust_no, sum(value) from INVOICES
group by cust_no
This query would obviously sum the balance of customer no 25 for 65 and no 35 for 32, although the anticipated answer would be 10-5+50 = 55 and 30 - 2 = 28
Should I perhaps use the CAST function somehow? Unfortunately I don't know which database engine is underneath, although there is a good chance it is of IBM origin. Most basic SQL code has worked so far, though.
You can use the case expression inside of a sum(). The simplest syntax would be:
select cust_no,
sum(case when inv_type = 'C' then - value else value end) as total
from invoices
group by cust_no;
Note that value could be a reserved word in your database, so you might need to escape the column name.
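As a quick check against the sample data in the question, the sketch below runs that query in SQLite through Python's sqlite3 module (a stand-in, since the actual engine is unknown). The column name is double-quoted as "value" to show one way of escaping it; whether your database needs this, and which quoting style it accepts, is an assumption to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE invoices (inv_no INTEGER, inv_type TEXT, cust_no INTEGER, "value" INTEGER)'
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [(1, "D", 25, 10), (2, "D", 35, 30), (3, "C", 25, 5), (4, "D", 25, 50), (5, "C", 35, 2)],
)

query = """
SELECT cust_no,
       SUM(CASE WHEN inv_type = 'C' THEN -"value" ELSE "value" END) AS total
FROM invoices
GROUP BY cust_no
ORDER BY cust_no
"""

balances = dict(conn.execute(query).fetchall())
print(balances)  # {25: 55, 35: 28}
```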
You should be able to write a projection (select) first to obtain a signed value column based on inv_type or whatever, and then do a sum over that.
Like this:
select cust_no, sum(value) from (
select cust_no
, case when inv_type='D' then [value] else -[value] end [value]
from INVOICES
) SUMS
group by cust_no
You can put an expression in the sum that calculates a negative value if the invoice is a credit:
select
cust_no,
sum
(
case inv_type
when 'C' then -[value]
else [value]
end
) as [Total]
from INVOICES
group by cust_no

Add a Total Row and converting into percentages

I have a database with information about people's work condition and neighbourhood.
I have to display a chart of information in percentages like this:
Neighbourhood Total Employed Unemployed Inactive
Total 100 50 25 25
1 100 45 30 25
2 100 55 20 25
To do that, the code that I've made so far is:
select neighbourhood, Count (*) as Total,
Count(Case when (condition = 1) then 'employed' end) as employed,
Count (case when (condition = 2) then 'unemployed' end) as unemployed,
Count (Case when (condition =3) then 'Inactive' end) as Inactive
from table
group by neighbourhood
order by neighbourhood
The output of that code is (the absolute numbers are made up; they don't produce the percentages above):
Neighbourhood Total Employed Unemployed Inactive
1 600 300 200 100
2 450 220 159 80
So, I have to turn the absolute numbers into percentages and add the Total row (summing the values from the neighbourhoods), but all my efforts have failed. I can't work out how to add that Total row, nor how to get the total for each neighbourhood in order to calculate the percentages.
I started studying SQL just two weeks ago so I apologize for any inconvenience. I tried my best to keep it simple (in my database are 15 neighbourhoods and it's ok if they are labeled by numbers)
Thanks
You need a UNION to add the total row:
select 'All' as neighbourhood, Count (*) as Total,
Count(Case when (condition = 1) then 1 end) as employed,
Count (case when (condition = 2) then 1 end) as unemployed,
Count (Case when (condition =3) then 1 end) as Inactive
from table
UNION all
select neighbourhood, Count (*) as Total,
Count(Case when (condition = 1) then 1 end) as employed,
Count (case when (condition = 2) then 1 end) as unemployed,
Count (Case when (condition =3) then 1 end) as Inactive
from table
group by neighbourhood
order by neighbourhood
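The UNION ALL query above returns absolute counts. One possible way to get the percentages the question asks for is to divide each conditional count by COUNT(*) in the same SELECT. The sketch below does this in SQLite through Python's sqlite3 module; the table name people, the sample rows and the sort_key column are all hypothetical, and it assumes neighbourhood labels are positive integers so that the total row (sort_key 0) sorts first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (neighbourhood INTEGER, condition INTEGER)")
# Hypothetical data: condition 1 = employed, 2 = unemployed, 3 = inactive.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 2), (2, 3)],
)

query = """
SELECT neighbourhood, Total, employed, unemployed, inactive
FROM (
    SELECT 'Total' AS neighbourhood,
           0 AS sort_key,
           100.0 AS Total,
           100.0 * COUNT(CASE WHEN condition = 1 THEN 1 END) / COUNT(*) AS employed,
           100.0 * COUNT(CASE WHEN condition = 2 THEN 1 END) / COUNT(*) AS unemployed,
           100.0 * COUNT(CASE WHEN condition = 3 THEN 1 END) / COUNT(*) AS inactive
    FROM people
    UNION ALL
    SELECT CAST(neighbourhood AS TEXT),
           neighbourhood,
           100.0,
           100.0 * COUNT(CASE WHEN condition = 1 THEN 1 END) / COUNT(*),
           100.0 * COUNT(CASE WHEN condition = 2 THEN 1 END) / COUNT(*),
           100.0 * COUNT(CASE WHEN condition = 3 THEN 1 END) / COUNT(*)
    FROM people
    GROUP BY neighbourhood
) AS t
ORDER BY sort_key
"""

report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

Casting the neighbourhood to text keeps both branches of the UNION the same type, and the separate numeric sort key avoids sorting '10' before '2'.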
You can add the total rows using grouping sets:
select neighbourhood, Count(*) as Total,
sum((condition = 1)::int) as employed,
sum((condition = 2)::int) as unemployed,
sum((condition = 3)::int) as Inactive
from table
group by grouping sets ( (neighbourhood), () )
order by neighbourhood;
If you want averages within each row, then use avg() rather than sum().

Pivot for redshift database

I know this question has been asked before, but none of the answers met my requirements, so I am asking it in a new thread.
In Redshift, how can I pivot the data into one row per unique dimension set, e.g.:
id Name Category count
8660 Iced Chocolate Coffees 105
8660 Iced Chocolate Milkshakes 10
8662 Old Monk Beer 29
8663 Burger Snacks 18
to
id Name Cofees Milkshakes Beer Snacks
8660 Iced Chocolate 105 10 0 0
8662 Old Monk 0 0 29 0
8663 Burger 0 0 0 18
The categories listed above keep changing.
Redshift does not support the pivot operator, and a case expression would not be of much help (if it would, please suggest how to do it).
How can I achieve this result in redshift?
(The above is just an example; we would have 1000+ categories, and these categories keep changing.)
I don't think there is an easy way to do that in Redshift.
You also say you have more than 1000 categories and the number is growing.
You need to take into account that there is a limit of 1600 columns per table;
see the link below:
http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_usage.html
You can use case, but then you need to write a case expression for each category:
select id,
name,
sum(case when Category='Coffees' then count end) as Cofees,
sum(case when Category='Milkshakes' then count end) as Milkshakes,
sum(case when Category='Beer' then count end) as Beer,
sum(case when Category='Snacks' then count end) as Snacks
from my_table
group by 1,2
Another option is to load the table into R, for example, and use the cast function:
cast(data, name~ category)
and then upload the data back to S3 or Redshift
We do a lot of pivoting at Ro, so we built a Python-based tool for autogenerating pivot queries. This tool allows for the same basic options as what you'd find in Excel, including specifying aggregation functions as well as whether you want overall aggregates.
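That tool is not shown here, but the general idea of generating the CASE-based pivot from the current list of categories can be sketched in a few lines of Python. This is not the tool mentioned above; build_pivot_sql and the quoting helpers are hypothetical names, the demo runs on SQLite rather than Redshift, and the 1600-column limit still applies to whatever the generated query produces.

```python
import sqlite3


def quote_literal(text):
    """Quote a string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def quote_identifier(text):
    """Quote a column alias, doubling embedded double quotes."""
    return '"' + text.replace('"', '""') + '"'


def build_pivot_sql(categories):
    """Build one SUM(CASE ...) column per category."""
    columns = ",\n".join(
        f'    SUM(CASE WHEN Category = {quote_literal(c)} THEN "count" ELSE 0 END)'
        f" AS {quote_identifier(c)}"
        for c in categories
    )
    return f"SELECT id, Name,\n{columns}\nFROM my_table\nGROUP BY id, Name\nORDER BY id"


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE my_table (id INTEGER, Name TEXT, Category TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (8660, "Iced Chocolate", "Coffees", 105),
        (8660, "Iced Chocolate", "Milkshakes", 10),
        (8662, "Old Monk", "Beer", 29),
        (8663, "Burger", "Snacks", 18),
    ],
)

# Read the current categories first, then generate and run the pivot query.
categories = [r[0] for r in conn.execute("SELECT DISTINCT Category FROM my_table ORDER BY Category")]
pivot_sql = build_pivot_sql(categories)
pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

Because the category list is read at run time, the query does not need to be edited by hand when categories change; ELSE 0 gives zeros instead of NULLs for missing combinations.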
Redshift released PIVOT/UNPIVOT functionality at re:Invent 2021 (December 2021): https://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause-pivot-unpivot-examples.html
SELECT *
FROM (SELECT id, Name, Category, count FROM my_table) PIVOT (
SUM(count) FOR Category IN ('Coffees', 'Milkshakes', 'Beer', 'Snacks')
);
If you will typically want to query specific subsets of the categories from the pivot table, a workaround based on the approach linked in the comments might work.
You can populate your "pivot_table" from the original like so:
insert into pivot_table (id, Name, json_cats) (
select id, Name,
'{' || listagg(quote_ident(Category) || ':' || count, ',')
within group (order by Category) || '}' as json_cats
from to_pivot
group by id, Name
)
And access specific categories this way:
select id, Name,
nvl(json_extract_path_text(json_cats, 'Snacks')::int, 0) Snacks,
nvl(json_extract_path_text(json_cats, 'Beer')::int, 0) Beer
from pivot_table
Using varchar(max) for the JSON column type will give 65535 bytes which should be room for a couple thousand categories.
@user3600910 is right about the approach; however, 'END' is required, otherwise a '500310' invalid operation error occurs.
select id,
name,
sum(case when Category='Coffees' then count END) as Cofees,
sum(case when Category='Milkshakes' then count END) as Milkshakes,
sum(case when Category='Beer' then count END) as Beer,
sum(case when Category='Snacks' then count END) as Snacks
from my_table
group by 1,2
The answer given above worked for me after switching count to 1
select id,
name,
sum(case when Category='Coffees' then 1 end) as Cofees,
sum(case when Category='Milkshakes' then 1 end) as Milkshakes,
sum(case when Category='Beer' then 1 end) as Beer,
sum(case when Category='Snacks' then 1 end) as Snacks
from my_table
group by 1,2

Calculate percentages of columns in Oracle SQL

I have three columns, all consisting of 1's and 0's. For each of these columns, how can I calculate the percentage of people (one person is one row/ id) who have a 1 in the first column and a 1 in the second or third column in oracle SQL?
For instance:
id marketing_campaign personal_campaign sales
1 1 0 0
2 1 1 0
1 0 1 1
4 0 0 1
So in this case, of all the people who were subjected to a marketing_campaign, 50 percent were subjected to a personal campaign as well, but zero percent is present in sales (no one bought anything).
Ultimately, I want to find out the order in which people get to the sales moment. Do they first go from marketing campaign to a personal campaign and then to sales, or do they buy anyway regardless of these channels.
This is a fictional example, so I realize that in this example there are many other ways to do this, but I hope anyone can help!
The outcome that I'm looking for is something like this:
percentage marketing_campaign/ personal campaign = 50 %
percentage marketing_campaign/sales = 0%
etc (for all the three column combinations)
Use count, sum and case expressions, together with basic arithmetic operators +,/,*
COUNT(*) gives a total count of people in the table
SUM(column) gives a sum of 1 in given column
case expressions make it possible to implement more complex conditions
The common pattern is X / COUNT(*) * 100 which is used to calculate a percent of given value ( val / total * 100% )
An example:
SELECT
-- percentage of people that have 1 in marketing_campaign column
SUM( marketing_campaign ) / COUNT(*) * 100 As marketing_campaign_percent,
-- percentage of people that have 1 in sales column
SUM( sales ) / COUNT(*) * 100 As sales_percent,
-- complex condition:
-- percentage of people (one person is one row/ id) who have a 1
-- in the first column and a 1 in the second or third column
COUNT(
CASE WHEN marketing_campaign = 1
AND ( personal_campaign = 1 OR sales = 1 )
THEN 1 END
) / COUNT(*) * 100 As complex_condition_percent
FROM table;
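The query above divides by the number of all rows. If the denominator should instead be only the people who were subjected to a marketing campaign, as in the question's expected outcome (50 % and 0 %), one possible variation is to divide by SUM(marketing_campaign). The sketch below runs that on the question's sample rows using SQLite through Python's sqlite3 module instead of Oracle; the table name campaigns and the column aliases are hypothetical, and each row is treated as one person, ignoring the repeated id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaigns (id INTEGER, marketing_campaign INTEGER, "
    "personal_campaign INTEGER, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO campaigns VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 0), (2, 1, 1, 0), (1, 0, 1, 1), (4, 0, 0, 1)],
)

# NULLIF guards against dividing by zero when nobody had a marketing campaign.
query = """
SELECT
    100.0 * SUM(CASE WHEN marketing_campaign = 1 AND personal_campaign = 1 THEN 1 ELSE 0 END)
          / NULLIF(SUM(marketing_campaign), 0) AS marketing_to_personal_pct,
    100.0 * SUM(CASE WHEN marketing_campaign = 1 AND sales = 1 THEN 1 ELSE 0 END)
          / NULLIF(SUM(marketing_campaign), 0) AS marketing_to_sales_pct
FROM campaigns
"""

percentages = conn.execute(query).fetchone()
print(percentages)  # (50.0, 0.0)
```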
You can get your percentages like this :
SELECT COUNT(*),
ROUND(100*(SUM(personal_campaign) / sum(count(*)) over ()),2) perc_personal_campaign,
ROUND(100*(SUM(sales) / sum(count(*)) over ()),2) perc_sales
FROM (
SELECT ID,
CASE
WHEN SUM(personal_campaign) > 0 THEN 1
ELSE 0
end AS personal_campaign,
CASE
WHEN SUM(sales) > 0 THEN 1
ELSE 0
end AS sales
FROM the_table
WHERE ID IN
(SELECT ID FROM the_table WHERE marketing_campaign = 1)
GROUP BY ID
)
I have overcomplicated things a bit because your data is still unclear to me. The subquery ensures that all duplicates are cleaned up and that each person has only a 1 or 0 in personal_campaign and sales.
About your second question :
Ultimately, I want to find out the order in which people get to the
sales moment. Do they first go from marketing campaign to a personal
campaign and then to sales, or do they buy anyway regardless of these
channels.
This is impossible with the table in its current state, because it has neither:
a unique row identifier that records the order in which the rows were inserted, nor
a timestamp column that records when the rows were inserted.
Without one of these, the order of rows returned from your table is unpredictable or, if you prefer, effectively random.