Nested groups and counts within SQL

I can't eloquently explain my end problem, feel free to edit both this title and content of this question if you can describe the SQL solution better.
I have data in a table containing the columns "Item", "Category", and I have the following query calculating the number of distinct categories within each item.
SELECT
[ItemID], COUNT(DISTINCT [CategoryText] ) AS 'number of categories'
FROM [SampleTable]
GROUP BY
ItemID
This gives me the output
'ItemID', 'number of categories'
100 1
101 3
102 1
What I now want to do is group the number of items by the number of categories they have, with my aim being to determine 'Do most items only have one category?'
For the example above, I would expect the outcome to be
'Number of categories', 'Number of items'
1, 2
3, 1
I'm sure there is a simple query to get to this but am going around in circles without making progress.

Aggregate the results by number of categories.
select [number of categories], count(*) [number of items]
from (SELECT [ItemID], COUNT(DISTINCT [CategoryText]) AS 'number of categories'
FROM [SampleTable]
GROUP BY ItemID) t
group by [number of categories]
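To see the two levels of grouping in action, here is a minimal sketch using Python's built-in sqlite3 module. The category values ("A", "B", "C") and the alias names n_categories / n_items are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SampleTable (ItemID INTEGER, CategoryText TEXT)")
conn.executemany(
    "INSERT INTO SampleTable VALUES (?, ?)",
    [
        (100, "A"), (100, "A"),              # item 100: one distinct category
        (101, "A"), (101, "B"), (101, "C"),  # item 101: three distinct categories
        (102, "B"),                          # item 102: one distinct category
    ],
)

# Inner query: categories per item. Outer query: items per category count.
rows = conn.execute(
    """
    SELECT n_categories, COUNT(*) AS n_items
    FROM (
        SELECT ItemID, COUNT(DISTINCT CategoryText) AS n_categories
        FROM SampleTable
        GROUP BY ItemID
    ) AS per_item
    GROUP BY n_categories
    ORDER BY n_categories
    """
).fetchall()

print(rows)  # [(1, 2), (3, 1)]
```

The result matches the expected outcome in the question: two items with one category, one item with three.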

Related

Countif or CASE with multiple conditions

I am trying to figure out the most efficient way to count products being placed in an online cart. I have ranked the first 3 items placed in a cart by purchase time (the time they were put in the cart, not the actual checkout time), but now I am struggling to figure out a way to count the different combinations of items going into the cart.
Counting the individual ranks is easy enough, but I need to figure out a count for purchasing product 1 first and product 1 second, as well as all the other possible combinations (5 products total). I only need to count first items in the cart, all combinations of first item in cart to second item in cart, and all combinations of second item in cart to third item in cart.
SELECT
COUNTIF(product = 'Product1' and rank = 1) as firstpurchase_product1,
COUNTIF((product = 'Product1' and rank = 1) and (product = 'Product1' and rank = 2)) as firstpurchase_product1_secondpurchase_product1,
COUNTIF((product = 'Product1' and rank = 1) and (product = 'Product2' and rank = 2)) as firstpurchase_product1_secondpurchase_product2,
#code would continue for all combinations.
FROM (
SELECT
customer_info.customer_id as customer_id,
customer_info.session_id as session_id,
customer_info.product_purchased as product,
ROW_NUMBER() OVER (PARTITION BY customer_info.session_id ORDER BY customer_info.purchase_time ASC) AS rank
FROM customer_purchases AS customer_info
WHERE p_date >= "2022-04-12"
) rnk
where rnk.rank in (1,2,3)
This seems like a lot of code. Is there a better way to do it? The query returns 0 for every line except the ones counting just first purchases. Should I be using CASE instead?
Any thoughts or ideas would be appreciated.
Thanks!
Example of input:
Product 1, Product 2, Product 3
Product 1, Product 1, Product 1
Product 4, Product 2, Product 1
Product 3, Product 3, Product 5
Product 4, Product 2, Product 4
--this goes on for hundreds of lines
Output:
Count Product 1 in first column
Count Product 2 in first column
#continue for all 5
Count of customers who put product 1 in cart first AND product 1 in cart second
Count of customers who put product 1 in cart first AND product 2 in cart second
###continue with all combinations with product 1
Count of customers who put product 2 in cart first and product 1 in cart second
Count of customers who put product 2 in the cart first and product 2 in the cart second
###continue with all combinations of product 2,3,4, and 5
It seems to me that you want to GROUP BY a set of columns (item1, item2, item3) and produce a count of the number of times each combination occurs.
Possibly (it's a little unclear from your wording - a well-formatted table showing example raw data and desired results for that example would be helpful), you want to know an overall count for values of item1 regardless of the other items. This can be achieved via GROUP BY ROLLUP(item1, item2, item3).
So, our aim is to get an unaggregated table with those columns, so that we can aggregate it as described!
You have a long-format table (customer ID, session ID, product, rank) and we want a wide-format table with a column for each value of rank. This is a PIVOT operation:
WITH rnk AS (
SELECT
customer_id,
session_id,
product_purchased AS product,
ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY purchase_time ASC) AS rank
FROM customer_info
WHERE p_date >= "2022-04-12"
QUALIFY rank IN (1,2,3)
),
pivoted AS (
SELECT *
FROM rnk PIVOT(
ANY_VALUE(product) AS item FOR rank in (1,2,3)
)
)
SELECT
item_1,
item_2,
item_3,
COUNT(*) AS N
FROM
pivoted
GROUP BY
ROLLUP(item_1, item_2, item_3)
Does that get you what you want?
A couple of features to note:
I use common table expressions (WITH) to make this more readable
QUALIFY is a filter clause that is applied to the output of a window function
Pivoting requires an aggregation function because in general there could be many records with the same value of session, product, and rank. Here we know there will be one record only, so it's safe to use ANY_VALUE (which 'aggregates' by non-deterministically choosing one of the values).
Just to prevent confusion: ROLLUP will give you something like 'Product A', NULL, NULL for some of its records - this doesn't mean items 2 and 3 don't exist, it's just how it signals those records that group only by item 1 and aggregate over all values of the other items.
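For anyone who wants to try the pivot-then-group idea without a warehouse that supports PIVOT, QUALIFY and ROLLUP, here is a rough sketch in Python's sqlite3 (window functions need SQLite 3.25 or newer). It swaps in conditional aggregation, MAX(CASE WHEN ...), for PIVOT, and runs two plain GROUP BY queries instead of ROLLUP. The integer purchase_time values and the alias rn are assumptions; the carts are the five example rows from the question. It also shows why the original COUNTIF returned 0: a single long-format row can never have rank 1 and rank 2 at the same time, so the ranks have to sit side by side in one row first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_info "
    "(session_id INTEGER, product_purchased TEXT, purchase_time INTEGER)"
)
carts = [
    ["Product 1", "Product 2", "Product 3"],
    ["Product 1", "Product 1", "Product 1"],
    ["Product 4", "Product 2", "Product 1"],
    ["Product 3", "Product 3", "Product 5"],
    ["Product 4", "Product 2", "Product 4"],
]
for session_id, cart in enumerate(carts, start=1):
    for t, product in enumerate(cart):  # t stands in for the purchase time
        conn.execute(
            "INSERT INTO customer_info VALUES (?, ?, ?)", (session_id, product, t)
        )

# Long format -> one row per session with item_1, item_2, item_3 columns.
PIVOTED = """
WITH rnk AS (
    SELECT session_id,
           product_purchased AS product,
           ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY purchase_time) AS rn
    FROM customer_info
),
pivoted AS (
    SELECT session_id,
           MAX(CASE WHEN rn = 1 THEN product END) AS item_1,
           MAX(CASE WHEN rn = 2 THEN product END) AS item_2,
           MAX(CASE WHEN rn = 3 THEN product END) AS item_3
    FROM rnk
    WHERE rn <= 3
    GROUP BY session_id
)
"""

first_counts = conn.execute(
    PIVOTED + "SELECT item_1, COUNT(*) FROM pivoted GROUP BY item_1 ORDER BY item_1"
).fetchall()
pair_counts = conn.execute(
    PIVOTED
    + "SELECT item_1, item_2, COUNT(*) FROM pivoted "
    + "GROUP BY item_1, item_2 ORDER BY item_1, item_2"
).fetchall()

print(first_counts)  # [('Product 1', 2), ('Product 3', 1), ('Product 4', 2)]
print(pair_counts)
```

Only combinations that actually occur show up in the output, which is usually what you want instead of one hand-written column per combination.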

count the number of times a combination of values occurs

Dataset looking at the types of crime for a given city.
Incident ID | Incident Code | Incident Category                         | Incident Subcategory     | Incident Description
------------|---------------|-------------------------------------------|--------------------------|---------------------
618691      | 4134          | Assault                                   | Simple Assault           | Battery
618691      | 15300         | Offences Against The Family And Children  | Other                    | Hate Crime (secondary only)
618701      | 7053          | Vehicle Impounded                         | Vehicle Impounded        | Vehicle, Impounded
618701      | 65010         | Traffic Violation Arrest                  | Traffic Violation Arrest | Traffic Violation Arrest
618701      | 65050         | Other Miscellaneous                       | Other                    | Driving While Under The Influence Of Alcohol
626010      | 5043          | Burglary                                  | Burglary - Residential   | Burglary, Residence, Unlawful Entry
626010      | 6381          | Larceny Theft                             | Larceny Theft - Other    | Embezzlement from Dependent or Elder Adult by Caretaker
626010      | 7041          | Recovered Vehicle                         | Recovered Vehicle        | Vehicle, Recovered, Auto
626010      | 16650         | Drug Offense                              | Drug Violation           | Methamphetamine Offense
Each IncidentID has 2, 3, or 4 Incident Codes associated with it.
I want to be able to count the number of times each combination of 2, 3, or 4 Incident Codes appears in the entire dataset.
For example:
Incident Codes 4134, 15300: x amount of times
Incident Codes 7053, 65010, 65050: x amount of times
Incident Codes 5043, 6381, 7041, 16650: x amount of times
I apologize if I've given a poor explanation - this is my first post on SO and quite frankly I don't know how to best communicate this question.
I don't know what SQL code to run to get my answer. The closest I've come to finding an answer is this post, Select combination of two columns, and count occurrences of this combination, but it already has the data separated into two columns, which my data is not.
My thought is to split the additional codes into other columns, but perhaps there is a way to avoid doing that by having the code run the calculation for me without it.
I appreciate any and all input you may be able to give!
Let's suppose your table is named "TableX". I think this query should be near to what you need:
Select T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode, Count(1) AS AmountOfTimes
From TableX T1
Join TableX T2 ON T2.IncidentID = T1.IncidentID AND
T2.IncidentCode <> T1.IncidentCode
Left Join TableX T3 ON T3.IncidentID = T1.IncidentID AND
T3.IncidentCode <> T1.IncidentCode AND
T3.IncidentCode <> T2.IncidentCode
Left Join TableX T4 ON T4.IncidentID = T1.IncidentID AND
T4.IncidentCode <> T1.IncidentCode AND
T4.IncidentCode <> T2.IncidentCode AND
T4.IncidentCode <> T3.IncidentCode
Group By T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode
You would probably be better off NOT trying to get all 3 parts in one query, and here is why. Let's say, for example, that one officer enters their data as codes 1, 2, 3. Another enters the codes as 3, 1, 2, and yet another as 2, 3, 1. They are all the same "set" of codes, just in a different order. If you rely on the codes lining up in the same positions, you would get 3 different rows showing the same thing, each with a count of 1.
You would be better served by running 3 distinct queries, each with a WHERE and HAVING clause based on just the codes in the "set" you are interested in. Something simple like
select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
group by
YT.IncidentID
having
count(*) = 2
This will return all incidents that have BOTH codes, even if the incident was also associated with a 3rd and/or 4th additional code. The total number of rows returned IS your count.
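Here is a small sketch of that WHERE / HAVING filter, run through Python's sqlite3. Incidents 700001 and 700002 are hypothetical rows added only to show the "extra code" and "only one of the two" cases. As a variation on the query above, it uses COUNT(DISTINCT ...) rather than COUNT(*), on the assumption that a code might be recorded twice for the same incident.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (IncidentID INTEGER, IncidentCode INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [
        (618691, 4134), (618691, 15300),                  # from the question
        (618701, 7053), (618701, 65010), (618701, 65050), # from the question
        (700001, 15300), (700001, 4134), (700001, 7041),  # hypothetical: both codes, reversed, plus an extra
        (700002, 4134), (700002, 7041),                   # hypothetical: only one of the two codes
    ],
)

matching = conn.execute(
    """
    SELECT YT.IncidentID
    FROM YourTable YT
    WHERE YT.IncidentCode IN (4134, 15300)
    GROUP BY YT.IncidentID
    HAVING COUNT(DISTINCT YT.IncidentCode) = 2  -- both codes must be present
    ORDER BY YT.IncidentID
    """
).fetchall()

print(matching)       # [(618691,), (700001,)]
print(len(matching))  # 2 incidents contain both codes
```

Entry order and the extra code on 700001 make no difference, while 700002 is filtered out because it only has one of the two codes.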
So now, take your codes of interest (e.g. 1 and 2): each incident can carry up to 2 more incident codes, which adds 30+ possible combinations of codes 3 and 4 into the mix. If you don't care about the others that may be "extra", they do not affect your count of the precise piece(s) you are looking for.
Then, all you have to do to get your other "what if" scenario counts is change your IN clause and change the HAVING count to match. Since you are only filtering on the specific codes in question, you want the incidents whose matching-row count equals the number of codes in the list, regardless of any extra incident codes, as in the examples stated.
YT.IncidentCode in ( 7053, 65010, 65050 )
group by
YT.IncidentID
having
count(*) = 3
YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
group by
YT.IncidentID
having
count(*) = 4
Now, if you only really care about the final count of each respectively, just wrap that up one more to get the count of rows returned such as
select
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
group by
YT.IncidentID
having
count(*) = 2 ) PreQualified
Then, if you wanted to do this on some time period basis such as you have a given date of the incident, and you wanted to keep running the same query / counts, you could expand and do something like this by doing a UNION to each query.
select
'Assault and Offenses against Family and Children' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 2 ) PreQualified
UNION
select
'Vehicle Impound, Traffic Arrest, Other Misc' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 7053, 65010, 65050 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 3 ) PreQualified
UNION
select
'Burglary, Theft, Drugs and Vehicle Recovery' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 4 ) PreQualified
Notice that each query in the UNION returns the same number and order of columns. So it will just return a list (in this case) of 3 rows with a description and count per category, regardless of the physical order in which the incident codes were entered, even IF they were entered as the 3rd and 4th codes when you are only looking for a 2-code combination.
Sometimes a generic query (as in the left-join sample) is OK, and there is nothing wrong with it, but ask yourself how much flexibility you need and whether you want to drill into each permutation just to get your final result numbers.
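If what you actually want is a count for every combination that occurs, without listing the combinations up front, one possible approach is to canonicalise each incident's codes (sort them) before counting, so that entry order no longer matters. The sketch below does that step in Python on top of sqlite3 instead of in SQL. Incident 700001 is a hypothetical row showing the reversed-order case, and note that this counts exact sets of codes, unlike the WHERE / HAVING queries above, which tolerate extra codes.

```python
import sqlite3
from collections import Counter, defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (IncidentID INTEGER, IncidentCode INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [
        (618691, 4134), (618691, 15300),
        (618701, 7053), (618701, 65010), (618701, 65050),
        (626010, 5043), (626010, 6381), (626010, 7041), (626010, 16650),
        (700001, 15300), (700001, 4134),  # hypothetical: same pair, entered in reverse order
    ],
)

# Collect the set of codes for each incident.
codes_by_incident = defaultdict(set)
for incident_id, code in conn.execute("SELECT IncidentID, IncidentCode FROM YourTable"):
    codes_by_incident[incident_id].add(code)

# A sorted tuple is the canonical form of a combination, so (15300, 4134)
# and (4134, 15300) end up under the same key.
combo_counts = Counter(tuple(sorted(codes)) for codes in codes_by_incident.values())

for combo, n in sorted(combo_counts.items()):
    print(combo, n)
```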

Retrieve the total number of orders made and the number of orders for which payment has been done

Retrieve the total number of orders made and the number of orders for which payment has been done(delivered).
TABLE ORDER
------------------------------------------------------
ORDERID QUOTATIONID STATUS
----------------------------------------------------
O1001 Q1002 Delivered
O1002 Q1006 Ordered
O1003 Q1003 Delivered
O1004 Q1006 Delivered
O1005 Q1002 Delivered
O1006 Q1008 Delivered
O1007 Q1009 Ordered
O1008 Q1013 Ordered
Unable to get the total number of orderid i.e 8
select count(orderid) as "TOTALORDERSCOUNT",count(Status) as "PAIDORDERSCOUNT"
from orders
where status ='Delivered'
The expected output is
TOTALORDERSCOUNT PAIDORDERSCOUNT
8 5
I think you want conditional aggregation:
select count(*) as TOTALORDERSCOUNT,
sum(case when status = 'Delivered' then 1 else 0 end) as PAIDORDERSCOUNT
from orders;
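A quick way to see the difference is to run both versions against the sample data. This sketch uses Python's sqlite3; the order IDs are generated and the quotation IDs left out for brevity, since only the statuses matter here, and the statuses are the eight from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid TEXT, status TEXT)")
statuses = ["Delivered", "Ordered", "Delivered", "Delivered",
            "Delivered", "Delivered", "Ordered", "Ordered"]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(f"O{1001 + i}", status) for i, status in enumerate(statuses)],
)

# The WHERE clause removes the 'Ordered' rows before anything is counted,
# so both counts only ever see the delivered orders.
filtered = conn.execute(
    "SELECT COUNT(orderid), COUNT(status) FROM orders WHERE status = 'Delivered'"
).fetchone()

# Conditional aggregation keeps every row and decides per row what to add.
conditional = conn.execute(
    """
    SELECT COUNT(*),
           SUM(CASE WHEN status = 'Delivered' THEN 1 ELSE 0 END)
    FROM orders
    """
).fetchone()

print(filtered)     # (5, 5)
print(conditional)  # (8, 5)
```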
Try this:
SELECT COUNT(ORDERID) TOTALORDERSCOUNT,
SUM(CASE WHEN STATUS = 'Delivered' THEN 1 ELSE 0 END) PAIDORDERSCOUNT
FROM orders
You can also use COUNT in place of SUM, as below:
SELECT COUNT(ORDERID) TOTALORDERSCOUNT,
COUNT(CASE WHEN STATUS = 'Delivered' THEN 1 ELSE NULL END) PAIDORDERSCOUNT
FROM orders
You could also use a cross join between the two counts:
select count(orderid) as TOTALORDERSCOUNT, t.PAIDORDERSCOUNT
from orders
cross join (
select count(Status) PAIDORDERSCOUNT
from orders where Status ='Delivered'
) t
What I've used in the past for summarizing totals is
SELECT
count(*) 'Total Orders',
sum( iif( orders.STATUS = 'Delivered', 1, 0 ) ) 'Total Paid Orders'
FROM orders
I personally don't like using CASE WHEN if I don't have to. This logic may look like its a little too much for a simple summation of totals, but it allows for more conditions to be added quite easily and also just involves less typing, at least for what I use this regularly for.
The iif() expression sets up the condition: it looks at each row's STATUS value and, if the status is 'Delivered', yields 1 for that order. If the status is 'Ordered' or any other value (including NULL, or a status such as 'Pending' that you might need later), it yields 0, so the count stays accurate.
Then, nesting this within the 'sum' function totals all of the 1's denoted from your matched values. I use this method regularly for report querying when there's a need for many conditions to be narrowed down to a summed value. This also opens up a lot of options in the case you need to join tables in your FROM statement.
Also just out of personal preference and depending on which SQL environment you're using this in, I tend to only use AS statements for renaming when absolutely necessary and instead just denote the column name with a single quoted string. Does the same thing, but that's just personal preference.
As stated before, this may seem like it's doing too much, but for me, good SQL allows for easy change to conditions without having to rewrite an entire query.
EDIT** I forgot to mention using count(*) only works if the orderid's are all unique values. Generally speaking for an orders table, orderid is an expected unique value, but just wanted to add that in as a side note.
SELECT COUNT(DISTINCT ORDERID) AS [TOTALORDERSCOUNT],
COUNT(CASE WHEN STATUS = 'Delivered' THEN ORDERID ELSE NULL END) AS [PAIDORDERSCOUNT]
FROM ORDERS
TOTALORDERSCOUNT will count all distinct values in ORDERID, while the CASE expression in PAIDORDERSCOUNT will filter out any rows that do not have the desired status.

Proper way to count how many times an item from 2 lists of items are ordered together

I am currently self-learning SQL so for most of you this would probably seem like a simple question (if I expressed it correctly).
I have a table 'orders' that look like this:
Orddid(Uid) Ordmid Odate Itmsid
----------- ------ ----- ------
100101 100101 01.12.2018 12
100102 100101 01.12.2018 88
100103 100101 01.12.2018 57
100104 100102 01.12.2018 12
What I want to do is count the number of Ordmids in which any item from one list coexists with any item from a second list (as in two IN (itmsid1, itmsid2) conditions).
For example, if I query for itmsid in (12,99) and also itmsid in (22,57), I would get a count of 1 at the end.
How do I do that?
EDIT: I have to say that this community is amazing! Lightning fast responses and very supportive even. Thank you very much people. I owe you!
I interpret your question as:
How many times does an ordmid group feature itmsid 12 or 99, in combination with itmsid 22 or 57?
Meaning, an ordmid group should have either a 12 and a 22, or 12 and 57, or 99 and 22, or 99 and 57 (at least; 12,22,57 etc. would also be permitted). In plain English this might be expressed as "How many times did someone buy (a keyboard or a mouse) in combination with (a memory stick or a printer cartridge)": to qualify for a special offer, someone has to buy at least one item from group 1 and one item from group 2.
Many ways to do, here's one:
SELECT COUNT(distinct t_or_nn.ordmid) FROM
(SELECT ordmid FROM orders WHERE itmsid in (12,99)) t_or_nn
INNER JOIN
(SELECT ordmid FROM orders WHERE itmsid in (22,57)) tt_or_fs
ON t_or_nn.ordmid = tt_or_fs.ordmid
How it works:
Two subqueries: one pulls a list of all the ordmids that have a 12 or 99. The other pulls a list of all the ordmids that have a 22 or 57.
When these lists are joined, only the ordmids that appear in both will survive to become a result of the join.
We thus end up with a list of only those ordmids that have a 12 or 99 in combination with a 22 or 57. Counting this (distinctly, to prevent an ordmid with items 12,99,22 being counted as 2, or an ordmid with items 12,22,57,99 being counted as 4) provides our answer.
If you need more detail on why having an ordmid with items 12,99,22,57 results in a count of 4, let me know. I won't launch into talking about cartesian products right away, as you might already know.
There are a few ways to solve things like this. I've picked this way because it's fairly easy to explain: the query logic is fairly well aligned with the way a human might think about it.
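For the curious, the "counted as 4" effect is easy to reproduce. Below is a minimal sketch with Python's sqlite3, using the four rows from the question plus a hypothetical order 100103 that contains two items from each list; the aliases a and b are just shorter stand-ins for the subquery names above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ordmid INTEGER, itmsid INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (100101, 12), (100101, 88), (100101, 57),  # rows from the question
        (100102, 12),
        # hypothetical order with two items from each list
        (100103, 12), (100103, 99), (100103, 22), (100103, 57),
    ],
)

JOINED = """
FROM (SELECT ordmid FROM orders WHERE itmsid IN (12, 99)) a
INNER JOIN (SELECT ordmid FROM orders WHERE itmsid IN (22, 57)) b
    ON a.ordmid = b.ordmid
"""

# Every matching row on the left pairs with every matching row on the right,
# so order 100103 contributes 2 x 2 = 4 joined rows and 100101 contributes 1.
joined_rows = conn.execute("SELECT COUNT(*) " + JOINED).fetchone()[0]
distinct_orders = conn.execute("SELECT COUNT(DISTINCT a.ordmid) " + JOINED).fetchone()[0]

print(joined_rows, distinct_orders)  # 5 2
```

That gap between 5 joined rows and 2 qualifying orders is the reason for counting distinctly.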
You can use group by and having:
select o.ordmid
from orders o
where o.itmsid in (12, 99)
group by o.ordmid
having count(distinct o.itmsid) = 2; -- number of items in the list
If items cannot repeat within an order, then use count(o.itmsid) in the having rather than count(distinct).
If you want the number of ordmids where this occurs, then just use this as a subquery and use count():
select count(*)
from (select o.ordmid
from orders o
where o.itmsid in (12, 99)
group by o.ordmid
having count(distinct o.itmsid) = 2 -- number of items in the list
) o;
EDIT:
If you have two separate lists and you want orders that have at least one item from each list you can do:
select o.ordmid
from orders o
group by o.ordmid
having sum(case when o.itmsid in (<list 1>) then 1 else 0 end) > 0 and
sum(case when o.itmsid in (<list 2>) then 1 else 0 end) > 0 ;
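As a sanity check, here is the two-list HAVING query run against the four sample rows from the question, sketched with Python's sqlite3. The lists (12, 99) and (22, 57) are the ones from the question's example; the cut-down two-column table is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ordmid INTEGER, itmsid INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(100101, 12), (100101, 88), (100101, 57), (100102, 12)],
)

# Each SUM counts how many rows of the order fall in one list;
# the order qualifies only if both sums are above zero.
qualifying = conn.execute(
    """
    SELECT o.ordmid
    FROM orders o
    GROUP BY o.ordmid
    HAVING SUM(CASE WHEN o.itmsid IN (12, 99) THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN o.itmsid IN (22, 57) THEN 1 ELSE 0 END) > 0
    """
).fetchall()

print(qualifying)       # [(100101,)]
print(len(qualifying))  # 1, the count expected in the question
```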
You want only orders that order at least one from set1 and at least one from set 2. This means that you need to select all records where both conditions are true and then count the distinct orders:
For example, you can use exists to check, for each order, whether it has a record in each set:
select count(distinct ordmid)
from orders o
where exists (
select *
from orders
where ordmid = o.ordmid and itmsid in (12, 99)
)
and exists (
select *
from orders
where ordmid = o.ordmid and itmsid in (22, 57)
)
Alternatively, you could use the quantified comparison predicate (ANY or SOME) (see also Quantified Subquery Predicates in the Firebird 2.5 Language Reference). Contrary to the exists solution, this removes the need for a correlated subquery.
select count(distinct ordmid)
from orders o
where ordmid = any(
select ordmid
from orders
where itmsid in (12, 99)
)
and ordmid = any(
select ordmid
from orders
where itmsid in (22, 57)
)

Ratio or Percentage from group by SQL query from column with condition and without condition

I am having some trouble with a SQL query. From a table let's call it Reports:
I want to group all the reports by the name column.
Then for each of those name groups I want to go to the rating column and count the number of times the rating was 15 or less. Let's say this happened 10 times for one of the groups with the name BOBBO.
I also want to know the number of times ratings were submitted (same as total number of records for each name group). So using the name group BOBBO let's say he has 20 ratings.
So under the condition the group BOBBO 50% of the time has a rating 15 or less.
I've seen these posts -- I am still having some trouble cracking this.
using-count-and-return-percentage-against-sum-of-records
getting-two-counts-and-then-dividing-them
getting-a-percentage-from-mysql-with-a-group-by-condition-and-precision
divide-two-counts-from-one-select
After reading those I tried queries like these:
ActiveRecord::Base.connection.execute
("SELECT COUNT(*) Matched,
(select COUNT(rating) from reports group by name) Total,
CAST(COUNT(*) AS FLOAT)/CAST((SELECT COUNT(*) FROM reports group by name) AS FLOAT)*100 Percentage from reports
where rating <= 15 order by Percentage")
ActiveRecord::Base.connection.execute
("select name, sum(rating) / count(rating) as bad_rating
from reports group by name having bad_rating <= 15")
Any help would be very much appreciated!
Consider a conditional aggregate for the bad ratings divided by full count:
SELECT [name],
-- 100.0 forces floating-point division; integer / integer would truncate to 0
100.0 * SUM(CASE WHEN [rating] <= 15 THEN 1 ELSE 0 END) / Count(*) AS bad_rating
FROM Reports
GROUP BY [name]
Or, as @CL. points out, a shorter conditional aggregate (where the logical expression itself is summed):
SELECT [name],
100.0 * SUM([rating] <= 15) / Count(*) AS bad_rating
FROM Reports
GROUP BY [name]
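One thing worth checking on your own database is how it divides two integers. The sketch below uses Python's sqlite3, where integer / integer truncates, to compare a plain ratio with one multiplied by 100.0. The BOBBO rows follow the question's description (20 ratings, 10 of them 15 or less); ALICE and the alias names ratio / pct are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Reports (name TEXT, rating INTEGER)")
rows = (
    [("BOBBO", 10)] * 10 + [("BOBBO", 40)] * 10  # 10 of 20 ratings are <= 15
    + [("ALICE", 5)] + [("ALICE", 90)] * 3       # hypothetical: 1 of 4 ratings is <= 15
)
conn.executemany("INSERT INTO Reports VALUES (?, ?)", rows)

# Integer divided by integer truncates in SQLite, so every ratio below 1 becomes 0.
truncated = conn.execute(
    """
    SELECT name, SUM(CASE WHEN rating <= 15 THEN 1 ELSE 0 END) / COUNT(*) AS ratio
    FROM Reports GROUP BY name ORDER BY name
    """
).fetchall()

# Multiplying by 100.0 first makes the whole expression floating point.
percentages = conn.execute(
    """
    SELECT name, 100.0 * SUM(rating <= 15) / COUNT(*) AS pct
    FROM Reports GROUP BY name ORDER BY name
    """
).fetchall()

print(truncated)    # [('ALICE', 0), ('BOBBO', 0)]
print(percentages)  # [('ALICE', 25.0), ('BOBBO', 50.0)]
```

The shorter SUM(rating <= 15) form relies on the database treating a comparison as 0 or 1, which SQLite and MySQL do; elsewhere the CASE form is the safer bet.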