Find the proportion of rows satisfying a condition in a single SQL query - sql

Suppose I have a sales table which is as follows:
ID | Price
---|------
1  | 0.33
2  | 1.5
3  | 0.5
4  | 10
5  | 0.99
I would like to find, in a single query, the proportion of rows satisfying a given condition. For example, if the condition is Price < 1, the result should be 3/5 = 0.6.
The only workaround that I have found so far is:
SELECT
    SUM(
        CASE
            WHEN Price < 1 THEN 1
            WHEN Price >= 1 THEN 0
        END
    ) / COUNT(*)
FROM sales
But is there a way to do this without CASE?
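One caveat worth checking before relying on this query: MySQL's / always returns a decimal, but SQLite, SQL Server and PostgreSQL truncate when both operands are integers, so the same text can return 0 instead of 0.6. The sketch below runs the query from the question with Python's built-in sqlite3 module (SQLite is only a stand-in here, since the question does not say which database it uses) and shows the usual fix of multiplying by 1.0:

```python
import sqlite3

# In-memory SQLite database holding the sample sales table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO sales (id, price) VALUES (?, ?)",
    [(1, 0.33), (2, 1.5), (3, 0.5), (4, 10), (5, 0.99)],
)

# The query as written: SUM(...) and COUNT(*) are both integers here,
# so SQLite performs integer division and 3 / 5 becomes 0.
as_written = conn.execute(
    """
    SELECT SUM(CASE WHEN price < 1 THEN 1 WHEN price >= 1 THEN 0 END) / COUNT(*)
    FROM sales
    """
).fetchone()[0]

# Multiplying by 1.0 first forces floating-point division.
as_float = conn.execute(
    """
    SELECT 1.0 * SUM(CASE WHEN price < 1 THEN 1 WHEN price >= 1 THEN 0 END) / COUNT(*)
    FROM sales
    """
).fetchone()[0]

print(as_written)  # 0
print(as_float)    # 0.6
```

In MySQL the original query already returns a decimal, so the 1.0 * is only needed on engines that truncate integer division.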

In MySQL, you can do it with IF():
SELECT SUM(IF(Price < 1, 1, 0))/COUNT(*) FROM sales
But that is not much different from CASE (your logic here is correct).
You might want to use WHERE (to count only the rows with Price < 1), but since you also need the total COUNT, filtering the main query alone won't work in your case. Another option is to get the total count separately:
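SELECT
    -- MAX() just returns the single total; strict GROUP BY modes
    -- (e.g. ONLY_FULL_GROUP_BY) reject a bare c.total_count here
    COUNT(sales.Price) / MAX(c.total_count)
FROM
    sales
    CROSS JOIN (SELECT COUNT(*) AS total_count FROM sales) AS c
WHERE
    -- you're summing 1 or 0 depending on Price, so your sum is
    -- just the count of rows where Price < 1
    sales.Price < 1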
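Both ideas from this answer can be checked with a small Python sketch, again using the standard-library sqlite3 module as a stand-in database. The cross-join query wraps c.total_count in MAX() because databases that enforce strict GROUP BY rules (PostgreSQL, or MySQL with ONLY_FULL_GROUP_BY) reject a bare non-aggregated column next to COUNT(). The last query answers "without CASE" directly: in SQLite and MySQL a comparison evaluates to 1 or 0, so averaging it gives the proportion. That shortcut is not portable; SQL Server, for one, does not allow a comparison to be used as a value.

```python
import sqlite3

# Sample sales table from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO sales (id, price) VALUES (?, ?)",
    [(1, 0.33), (2, 1.5), (3, 0.5), (4, 10), (5, 0.99)],
)

# Count the matching rows with WHERE and take the total from a one-row
# derived table. 1.0 * avoids SQLite's integer division.
cross_join = conn.execute(
    """
    SELECT 1.0 * COUNT(sales.price) / MAX(c.total_count)
    FROM sales
    CROSS JOIN (SELECT COUNT(*) AS total_count FROM sales) AS c
    WHERE sales.price < 1
    """
).fetchone()[0]

# No CASE or IF: "price < 1" is 1 or 0 in SQLite, so its average is the proportion.
avg_of_comparison = conn.execute("SELECT AVG(price < 1) FROM sales").fetchone()[0]

print(cross_join, avg_of_comparison)  # 0.6 0.6
```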

Related

Get count With Distinct in SQL Server

Select
    Count(Distinct iif(t.HasReplyTask = 1, t.CustomerID, Null)) As Reply,
    Count(Distinct iif(t.HasOverdueTask = 1, t.CustomerID, Null)) As Overdue,
    Count(Distinct t.CustomerID) As Total
From
    Table1 t
If a customer is counted in Reply, we need to remove that customer from the Overdue count. That means if customer 123 is in both, the Overdue count should be one less. How can I do this?
About the data: customer 123 has "HasReplyTask", so we have to exclude that customer from the Overdue count (even though that customer also has one overdue task without HasReplyTask). Customer 234 counts as one, and the distinct count for 456 is one.
So the Overdue count should be 2, but the query above returns 3.
If I've got it right, this can be done by using a subquery to compute the flags for each customer and then summarizing them as follows:
Select Sum(HasReplyTask) As Reply,
       Sum(HasOverdueTask) As Overdue,
       Count(CustomerID) As Total
From (
    Select CustomerID,
           IIF(Max(Cast(HasReplyTask As TinyInt)) <> 0, 0, Max(Cast(HasOverdueTask As TinyInt))) As HasOverdueTask,
           Max(Cast(HasReplyTask As TinyInt)) As HasReplyTask
    From Table1
    Group by CustomerID
) As T
I don't know the column data types, so I used CAST to make MAX usable (SQL Server's MAX does not accept bit columns).
Reply | Overdue | Total
------|---------|------
1     | 2       | 3
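To see why the per-customer subquery fixes the count, here is a hedged sketch in Python with the standard-library sqlite3 module. The rows are hypothetical, invented to match the question's description because the original sample data is not shown, and the snake_case column names are stand-ins. SQLite has no bit type, so the flags are plain integers and no CAST is needed, and CASE replaces IIF (which only newer SQLite versions provide):

```python
import sqlite3

# Hypothetical rows matching the description: 123 has a reply task and an
# overdue task, 234 has one overdue task, 456 has two overdue tasks.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (customer_id INTEGER, has_reply_task INTEGER, has_overdue_task INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(123, 1, 0), (123, 0, 1), (234, 0, 1), (456, 0, 1), (456, 0, 1)],
)

# Original query: COUNT(DISTINCT ...) still counts 123 as overdue.
original = conn.execute(
    """
    SELECT COUNT(DISTINCT CASE WHEN has_reply_task = 1 THEN customer_id END),
           COUNT(DISTINCT CASE WHEN has_overdue_task = 1 THEN customer_id END),
           COUNT(DISTINCT customer_id)
    FROM table1
    """
).fetchone()

# Answer's approach: one row per customer, with the overdue flag zeroed
# for any customer that has a reply task, then sum the flags.
per_customer = conn.execute(
    """
    SELECT SUM(has_reply) AS reply, SUM(has_overdue) AS overdue, COUNT(customer_id) AS total
    FROM (
        SELECT customer_id,
               CASE WHEN MAX(has_reply_task) <> 0 THEN 0
                    ELSE MAX(has_overdue_task) END AS has_overdue,
               MAX(has_reply_task) AS has_reply
        FROM table1
        GROUP BY customer_id
    ) AS t
    """
).fetchone()

print(original)      # (1, 3, 3)
print(per_customer)  # (1, 2, 3)
```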
What would probably be more efficient is to pre-aggregate your table by customer ID, with counts per customer. Then your outer query can test for whatever you are really looking for. Something like:
select
    sum( case when PQ.ReplyCount > 0 then 1 else 0 end ) UniqReply,
    sum( case when PQ.OverdueCount > 0 then 1 else 0 end ) UniqOverdue,
    sum( case when PQ.OverdueCount - PQ.ReplyCount > 0 then 1 else 0 end ) PendingReplies,
    count(*) as UniqCustomers
from
    ( select
          yt.customerid,
          count(*) CustRecs,
          sum( case when yt.HasReplyTask = 1 then 1 else 0 end ) ReplyCount,
          sum( case when yt.HasOverdueTask = 1 then 1 else 0 end ) OverdueCount
      from
          yourTable yt
      group by
          yt.customerid ) PQ
Now, to get the count you are REALLY looking for, you might need to compare ReplyCount with OverdueCount in the prequery (PQ). For a single customer ID (hence the prequery), if OverdueCount is GREATER than ReplyCount, is the customer still considered overdue? For example, customer 123 had 3 overdue entries but only 2 replies. Do you want that counted once? Customers 234 and 456 only had overdue entries and NO replies. So the total where Overdue - Reply > 0 would be 3 distinct people.
Is that your ultimate goal?
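The prequery approach can be tried with Python's sqlite3 module as well. This sketch uses hypothetical rows (customer 123 with one reply and one overdue task, 234 with one overdue task, 456 with two), and your_table plus the snake_case column names are placeholders; it shows how the three counters diverge once the data is pre-aggregated per customer:

```python
import sqlite3

# Hypothetical rows: 123 has one reply and one overdue task,
# 234 has one overdue task, 456 has two overdue tasks.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (customer_id INTEGER, has_reply_task INTEGER, has_overdue_task INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(123, 1, 0), (123, 0, 1), (234, 0, 1), (456, 0, 1), (456, 0, 1)],
)

# Pre-aggregate per customer (PQ), then test the per-customer counts.
uniq_reply, uniq_overdue, pending_replies, uniq_customers = conn.execute(
    """
    SELECT SUM(CASE WHEN pq.reply_count > 0 THEN 1 ELSE 0 END),
           SUM(CASE WHEN pq.overdue_count > 0 THEN 1 ELSE 0 END),
           SUM(CASE WHEN pq.overdue_count - pq.reply_count > 0 THEN 1 ELSE 0 END),
           COUNT(*)
    FROM (
        SELECT yt.customer_id,
               COUNT(*) AS cust_recs,
               SUM(CASE WHEN yt.has_reply_task = 1 THEN 1 ELSE 0 END) AS reply_count,
               SUM(CASE WHEN yt.has_overdue_task = 1 THEN 1 ELSE 0 END) AS overdue_count
        FROM your_table yt
        GROUP BY yt.customer_id
    ) pq
    """
).fetchone()

# 123: 1 reply and 1 overdue -> not pending; 234 and 456: overdue, no replies.
print(uniq_reply, uniq_overdue, pending_replies, uniq_customers)  # 1 3 2 3
```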

Calculate percentages of columns in Oracle SQL

I have three columns, all consisting of 1s and 0s. For each of these columns, how can I calculate the percentage of people (one person is one row/id) who have a 1 in the first column and a 1 in the second or third column in Oracle SQL?
For instance:
id | marketing_campaign | personal_campaign | sales
1  | 1 | 0 | 0
2  | 1 | 1 | 0
1  | 0 | 1 | 1
4  | 0 | 0 | 1
So in this case, of all the people who were subjected to a marketing_campaign, 50 percent were also subjected to a personal campaign, but zero percent appear in sales (no one bought anything).
Ultimately, I want to find out the order in which people get to the sales moment. Do they first go from a marketing campaign to a personal campaign and then to sales, or do they buy anyway, regardless of these channels?
This is a fictional example, so I realize that in this example there are many other ways to do this, but I hope anyone can help!
The outcome that I'm looking for is something like this:
percentage marketing_campaign / personal_campaign = 50%
percentage marketing_campaign / sales = 0%
etc. (for all three column combinations)
Use COUNT, SUM and CASE expressions, together with the basic arithmetic operators +, /, *:
COUNT(*) gives the total number of people (rows) in the table
SUM(column) gives the number of 1s in the given column
CASE expressions make it possible to implement more complex conditions
The common pattern is X / COUNT(*) * 100, which calculates the percentage of a given value (val / total * 100%).
An example:
SELECT
    -- percentage of people that have 1 in marketing_campaign column
    SUM( marketing_campaign ) / COUNT(*) * 100 AS marketing_campaign_percent,
    -- percentage of people that have 1 in sales column
    SUM( sales ) / COUNT(*) * 100 AS sales_percent,
    -- complex condition:
    -- percentage of people (one person is one row/id) who have a 1
    -- in the first column and a 1 in the second or third column
    COUNT(
        CASE WHEN marketing_campaign = 1
              AND ( personal_campaign = 1 OR sales = 1 )
             THEN 1 END
    ) / COUNT(*) * 100 AS complex_condition_percent
FROM the_table;  -- TABLE is a reserved word in Oracle; use your table's name
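A quick way to sanity-check these expressions is to run them on the four sample rows. The sketch uses Python's sqlite3 module rather than Oracle, so two details are assumptions of the sketch: the table is called the_table, and the multiplication by 100.0 comes before the division, because SQLite truncates integer division whereas Oracle's NUMBER division does not:

```python
import sqlite3

# The four sample rows from the question (note that id 1 appears twice).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (id INTEGER, marketing_campaign INTEGER,"
    " personal_campaign INTEGER, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 0), (2, 1, 1, 0), (1, 0, 1, 1), (4, 0, 0, 1)],
)

# Same pattern as the answer, with * 100.0 first to avoid integer division.
marketing_pct, sales_pct, complex_pct = conn.execute(
    """
    SELECT SUM(marketing_campaign) * 100.0 / COUNT(*),
           SUM(sales) * 100.0 / COUNT(*),
           COUNT(CASE WHEN marketing_campaign = 1
                       AND (personal_campaign = 1 OR sales = 1)
                      THEN 1 END) * 100.0 / COUNT(*)
    FROM the_table
    """
).fetchone()

# Only row 2 satisfies the complex condition: 1 of 4 rows.
print(marketing_pct, sales_pct, complex_pct)  # 50.0 50.0 25.0
```

Note that every percentage here is relative to all rows; dividing by the number of marketing_campaign people instead requires filtering or a different denominator.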
You can get your percentages like this:
SELECT COUNT(*),
       ROUND(100*(SUM(personal_campaign) / sum(count(*)) over ()),2) perc_personal_campaign,
       ROUND(100*(SUM(sales) / sum(count(*)) over ()),2) perc_sales
FROM (
    SELECT ID,
           CASE
               WHEN SUM(personal_campaign) > 0 THEN 1
               ELSE 0
           end AS personal_campaign,
           CASE
               WHEN SUM(sales) > 0 THEN 1
               ELSE 0
           end AS sales
    FROM the_table
    WHERE ID IN
        (SELECT ID FROM the_table WHERE marketing_campaign = 1)
    GROUP BY ID
)
I have overcomplicated things a bit because your data is still unclear to me. The subquery ensures that all duplicates are cleaned up and that, for each person who had a marketing_campaign, you have only a single 1 or 0 in personal_campaign and sales.
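The deduplicating subquery can also be checked on the sample rows with Python's sqlite3 module. In this sketch COUNT(*) replaces sum(count(*)) over (); the two are equal here because the outer query collapses to a single group, and avoiding the window function keeps the sketch working on older SQLite builds. Because id 1 appears twice in the sample, its second row (personal_campaign = 1, sales = 1) is merged into the same person:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (id INTEGER, marketing_campaign INTEGER,"
    " personal_campaign INTEGER, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 0), (2, 1, 1, 0), (1, 0, 1, 1), (4, 0, 0, 1)],
)

# Inner query: one row per person who had a marketing_campaign, with each
# flag collapsed to 1 if any of that person's rows has a 1.
result = conn.execute(
    """
    SELECT COUNT(*) AS marketed_people,
           ROUND(100.0 * SUM(personal_campaign) / COUNT(*), 2) AS perc_personal_campaign,
           ROUND(100.0 * SUM(sales) / COUNT(*), 2) AS perc_sales
    FROM (
        SELECT id,
               CASE WHEN SUM(personal_campaign) > 0 THEN 1 ELSE 0 END AS personal_campaign,
               CASE WHEN SUM(sales) > 0 THEN 1 ELSE 0 END AS sales
        FROM the_table
        WHERE id IN (SELECT id FROM the_table WHERE marketing_campaign = 1)
        GROUP BY id
    ) AS per_person
    """
).fetchone()

# ids 1 and 2 both have a personal campaign; only id 1 has a sale.
print(result)  # (2, 100.0, 50.0)
```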
About your second question:
"Ultimately, I want to find out the order in which people get to the sales moment. Do they first go from marketing campaign to a personal campaign and then to sales, or do they buy anyway regardless of these channels."
This is impossible in the current state, because your table has neither of these:
a row identifier that increases in the order in which the rows were inserted
a timestamp column that records when the rows were inserted.
Without one of these, the order in which rows are returned from your table is unspecified: effectively arbitrary.

Select percentage of rows that have a certain value - SQL

Say I have a table like this in Sequel Pro:
SaleID | VendorID
1 | A
2 | C
3 | E
4 | C
5 | D
And I want to find the percentage of sales in which Vendor C was the vendor. (Obviously in this case it is 40%, but I'm working with much bigger tables.) How would I do that? I was thinking of something using the COUNT function, but I'm not sure exactly how. Thanks!
select sum(case when vendorID = 'C' then 1 else 0 end) * 100 / count(*)
from your_table
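Sequel Pro is a MySQL client, and in MySQL / on integers returns a decimal, so the query above gives 40 (as 40.0000) as written. On databases that truncate integer division (SQLite, SQL Server, PostgreSQL), multiplying by 100.0 instead of 100 keeps the fractional part; here 40 happens to be exact, but 1 sale out of 3 would truncate to 33. A small check with Python's sqlite3 module, using the sample rows and the placeholder table name your_table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (sale_id INTEGER PRIMARY KEY, vendor_id TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "A"), (2, "C"), (3, "E"), (4, "C"), (5, "D")],
)

# Count vendor C's sales, scale by 100.0 (not 100) to avoid integer
# division in SQLite, then divide by the total number of sales.
pct = conn.execute(
    """
    SELECT SUM(CASE WHEN vendor_id = 'C' THEN 1 ELSE 0 END) * 100.0 / COUNT(*)
    FROM your_table
    """
).fetchone()[0]

print(pct)  # 40.0
```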

MS Access query table without primary key

Claim# | Total | ValuationDt
1 | 100  | 1/1/12
2 | 550  | 1/1/12
1 | 2000 | 3/1/12
2 | 100  | 4/1/12
1 | 2100 | 8/1/12
3 | 200  | 8/1/12
3 | 250  | 11/1/12
Using MS Access, I need a query that returns only claims that have been valued at more than $500 at some point in the claim's lifetime. In this example, the query should return:
Claim# | Total | ValuationDt
1 | 100  | 1/1/12
2 | 550  | 1/1/12
1 | 2000 | 3/1/12
2 | 100  | 4/1/12
1 | 2100 | 8/1/12
because claim #1 was valued at more than $500 on 3/1/12, claim #2 was valued at more than $500 on 1/1/12, and claim #3 was never valued at more than $500.
You can use IN:
SELECT *
FROM Table1
WHERE Claim IN (SELECT Claim
FROM Table1
WHERE Total > 500)
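The IN approach can be tried outside Access with Python's sqlite3 module. SQLite accepts Access-style [square bracket] identifiers, so, unlike the answer (which uses Claim), this sketch keeps the question's [Claim#] column name; storing the dates as text is a simplification of the sketch (Access would use a Date/Time column). It also runs the inner query on its own to show the claim numbers the IN list receives:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 ([Claim#] INTEGER, Total INTEGER, ValuationDt TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, 100, "1/1/12"), (2, 550, "1/1/12"), (1, 2000, "3/1/12"),
        (2, 100, "4/1/12"), (1, 2100, "8/1/12"), (3, 200, "8/1/12"),
        (3, 250, "11/1/12"),
    ],
)

# Inner query alone: claims whose total exceeded 500 at least once.
qualifying = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT [Claim#] FROM Table1 WHERE Total > 500")
)

# Outer query keeps every row of those claims, including rows with Total <= 500.
rows = conn.execute(
    """
    SELECT *
    FROM Table1
    WHERE [Claim#] IN (SELECT [Claim#] FROM Table1 WHERE Total > 500)
    """
).fetchall()

print(qualifying)  # [1, 2]
print(len(rows))   # 5
```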
Try this:
Select * from table where claim in (Select claim from table where total > 500)
Here, replace table with the name of your table.
This could be the solution:
SELECT DISTINCT *
FROM YourTableName
WHERE [Claim#] IN (SELECT DISTINCT [Claim#]
                   FROM YourTableName
                   WHERE Total > 500)
ORDER BY 3;
The ORDER BY is optional.
This should work:
Select DISTINCT Claim FROM yourtable Where Total > 500
EDIT:
If my initial answer does not fulfill your requirements, you can use a subquery. A subquery is a query inside your query (nested queries). We have to do it that way because if you use something like
Select * FROM yourtable Where Total > 500
then the result set would contain only the rows where the claim's total was higher than 500, and it would omit the other rows of the same claim where the total was less than or equal to 500.
Therefore, as others have stated, you use a subquery like:
SELECT *
FROM Table1
WHERE Claim IN (SELECT Claim
FROM Table1
WHERE Total > 500)
Note that there is a query after the IN keyword, so we have nested queries (a subquery, if you prefer).
Why does it work? Because:
SELECT Claim
FROM Table1
WHERE Total > 500
will return every claim (only the claim number) whose total was greater than 500 at some point. For this data, it returns 1 and 2. If you substitute that into the original query, you get:
SELECT *
FROM Table1
WHERE Claim IN (1, 2)
That returns every column of every row whose claim number is 1 or 2.
You can identify which [Claim#] values satisfy your condition ...
SELECT DISTINCT [Claim#]
FROM YourTable
WHERE [Total] > 500
If that was correct, use it as a subquery which you INNER JOIN to your table, to restrict the result set to only those claims.
SELECT y.[Claim#], y.[Total], y.[ValuationDt]
FROM YourTable AS y
INNER JOIN
    (
        SELECT DISTINCT [Claim#]
        FROM YourTable
        WHERE [Total] > 500
    ) AS sub
    ON y.[Claim#] = sub.[Claim#];
Compare this approach vs. the IN() suggestions and see whether you notice any difference in execution speed.
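Whichever form is faster in Access, the two should return the same rows, which is easy to confirm on the sample data with Python's sqlite3 module (again a stand-in for Access, with [Claim#] bracketed and dates stored as text). The DISTINCT in the derived table matters: claim 1 exceeds 500 twice (2000 and 2100), so without it every claim 1 row would be joined twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable ([Claim#] INTEGER, Total INTEGER, ValuationDt TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (1, 100, "1/1/12"), (2, 550, "1/1/12"), (1, 2000, "3/1/12"),
        (2, 100, "4/1/12"), (1, 2100, "8/1/12"), (3, 200, "8/1/12"),
        (3, 250, "11/1/12"),
    ],
)

# INNER JOIN against the distinct qualifying claim numbers.
join_rows = conn.execute(
    """
    SELECT y.[Claim#], y.Total, y.ValuationDt
    FROM YourTable AS y
    INNER JOIN (SELECT DISTINCT [Claim#] FROM YourTable WHERE Total > 500) AS sub
        ON y.[Claim#] = sub.[Claim#]
    """
).fetchall()

# Equivalent IN() form.
in_rows = conn.execute(
    """
    SELECT [Claim#], Total, ValuationDt
    FROM YourTable
    WHERE [Claim#] IN (SELECT [Claim#] FROM YourTable WHERE Total > 500)
    """
).fetchall()

# Same join without DISTINCT: claim 1 matches two subquery rows.
no_distinct = conn.execute(
    """
    SELECT y.[Claim#], y.Total, y.ValuationDt
    FROM YourTable AS y
    INNER JOIN (SELECT [Claim#] FROM YourTable WHERE Total > 500) AS sub
        ON y.[Claim#] = sub.[Claim#]
    """
).fetchall()

print(sorted(join_rows) == sorted(in_rows))  # True
print(len(join_rows), len(no_distinct))      # 5 8
```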
You should be able to use:
SELECT [Claim#], [Total], [ValuationDt]
FROM yourtable
WHERE [Claim#] IN (SELECT [Claim#]
                   FROM yourtable
                   WHERE Total > 500)
This should return every row, for any ValuationDt, of each claim whose total exceeded 500 at some point.

DB2 SQL filter query result by evaluating an ID which has two types of entries

After many attempts, I have failed at this and am hoping someone can help. The query returns every entry a user makes when items are made in the factory against an order number. For example:
Order Number | Entry type | Quantity
3000 | 1 | 1000
3000 | 1 | 500
3000 | 2 | 300
3000 | 2 | 100
4000 | 2 | 1000
5000 | 1 | 1000
What I want the query to do is filter the results like this:
If the order number has entries of both type 1 and type 2, return only the type 1 rows.
Otherwise, return the rows for that order number whatever their type.
So the above would end up as:
Order Number | Entry type | Quantity
3000 | 1 | 1000
3000 | 1 | 500
4000 | 2 | 1000
5000 | 1 | 1000
Currently my query (DB2, in very basic terms) looks like this, and it was correct until a change request came through!
Select * from bookings where type=1 or type=2
thanks!
select *
from bookings
left outer join (
    select order_number,
           max(case when type = 1 then 1 else 0 end) +
           max(case when type = 2 then 1 else 0 end) as type_1_and_2
    from bookings
    group by order_number
) has_1_and_2
    on has_1_and_2.type_1_and_2 = 2
   and has_1_and_2.order_number = bookings.order_number
where
    bookings.type = 1 or
    has_1_and_2.order_number is null
This finds all the orders that have both type 1 and type 2 and then joins them to bookings:
If the row matched the join, return it only if it is type 1.
If the row did not match the join (has_1_and_2.order_number is null), return it no matter what the type is.
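The corrected join can be checked against the sample bookings with Python's sqlite3 module (a stand-in for DB2; the column names order_number, type and quantity are assumptions based on the question's headings). The sketch selects bookings.* so that the joined helper columns are left out, and adds an ORDER BY only to make the output stable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (order_number INTEGER, type INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [(3000, 1, 1000), (3000, 1, 500), (3000, 2, 300),
     (3000, 2, 100), (4000, 2, 1000), (5000, 1, 1000)],
)

# Orders with both types get type_1_and_2 = 2 and match the join;
# for those, only type 1 rows survive the WHERE clause.
result = conn.execute(
    """
    SELECT bookings.*
    FROM bookings
    LEFT OUTER JOIN (
        SELECT order_number,
               MAX(CASE WHEN type = 1 THEN 1 ELSE 0 END)
             + MAX(CASE WHEN type = 2 THEN 1 ELSE 0 END) AS type_1_and_2
        FROM bookings
        GROUP BY order_number
    ) has_1_and_2
        ON has_1_and_2.type_1_and_2 = 2
       AND has_1_and_2.order_number = bookings.order_number
    WHERE bookings.type = 1
       OR has_1_and_2.order_number IS NULL
    ORDER BY bookings.order_number, bookings.quantity DESC
    """
).fetchall()

print(result)  # [(3000, 1, 1000), (3000, 1, 500), (4000, 2, 1000), (5000, 1, 1000)]
```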
A "common table expression" (CTE) can often simplify your logic. You can think of it as a way to break a complex problem into conceptual steps. In the example below, you can think of g as the name of the result set of the CTE, which is then joined to the bookings table.
WITH g as
( SELECT order_number, min(type) as low_type
FROM bookings
GROUP BY order_number
)
SELECT b.*
FROM g
JOIN bookings b ON g.order_number = b.order_number
AND g.low_type = b.type
The JOIN ON conditions work so that if both types are present, low_type will be 1 and only type 1 records will be chosen. If there is only one type, every row's type is identical to low_type, so all of that order's rows are kept.
This should work fine as long as 1 and 2 are the only types allowed in the bookings table. If not, you can simply add a WHERE clause to the CTE and to the outer SELECT.
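The CTE version can be checked the same way with Python's sqlite3 module, with the assumed column names order_number, type and quantity standing in for the DB2 table. On the sample data it should keep exactly the type 1 rows of order 3000 plus the single-type orders 4000 and 5000 (the ORDER BY is only there to make the output stable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (order_number INTEGER, type INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [(3000, 1, 1000), (3000, 1, 500), (3000, 2, 300),
     (3000, 2, 100), (4000, 2, 1000), (5000, 1, 1000)],
)

# g holds the lowest type per order; joining on it keeps only rows of that
# type, which is 1 whenever an order has both types 1 and 2.
result = conn.execute(
    """
    WITH g AS (
        SELECT order_number, MIN(type) AS low_type
        FROM bookings
        GROUP BY order_number
    )
    SELECT b.*
    FROM g
    JOIN bookings b ON g.order_number = b.order_number
                   AND g.low_type = b.type
    ORDER BY b.order_number, b.quantity DESC
    """
).fetchall()

print(result)  # [(3000, 1, 1000), (3000, 1, 500), (4000, 2, 1000), (5000, 1, 1000)]
```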