SQL multiple constrained counts in query - sql

I am trying to get with 1 query multiple count results where each one is a subset of the previous one.
So my table would be called Recipe and has these columns:
recipe_num(Primary_key) decimal,
recipe_added date,
is_featured bit,
liked decimal
And what I want is to make a query that will return the amount of likes grouped by day for any particular month with
total recipes as total_recipes,
total recipes that were featured as featured_recipes,
total number of recipes that were featured and had more than 100 likes liked_recipes
So as you can see each they are all counts with each being a subset of the previous one.
Ideally I don't want to run separate select count's where that query the whole table but rather get from the previous one.
I am not very good at using count with Where, Having, etc... and not exactly sure how to do it, so far I have the following which I managed via digging around here.
select
recipe_added,
count(*) total_recipes,
count(case is_featured when 1 then 1 else null end) total_featured_recipes
from
RECIPES
group by
recipe_added
I am not exactly sure why I have to use case inside the count but I wasn't able to get it to work using WHERE, would like to know if this is possible as well.
Thanks

With a CASE expression inside COUNT() you are doing conditional aggregation and this is exactly what you need for this requirement:
select recipe_added,
count(*) total_recipes,
count(case when is_featured = 1 then 1 end) total_featured_recipes,
count(case when is_featured = 1 and liked > 100 then 1 end) liked_recipes
from Recipes
group by recipe_added
There is no need for ELSE null because the default behavior of a CASE expression is to return null when no other branch returns a value.
If you want results for a specific month, say October 2020, you can add a WHERE clause before the GROUP BY:
where format(recipe_added, 'yyyyMM') = '202010'
This will work for SQL Server.
If you are using a different database then you can use a similar approach.

Related

SQL question with attempt on customer information

Schema
Question: List all paying customers with users who had 4 or 5 activities during the week of February 15, 2021; also include how many of the activities sent were paid, organic and/or app store. (i.e. include a column for each of the three source types).
My attempt so far:
SELECT source_type, COUNT(*)
FROM activities
WHERE activity_time BETWEEN '02-15-21' AND '02-19-21'
GROUP BY source_type
I would like to get a second opinion on it. I didn't include the accounts table because I don't believe that I need it for this query, but I could be wrong.
Have you tried to run this? It doesn't satisfy the brief on FOUR counts:
List all the ... customers (that match criteria)
There is no customer information included in the results at all, so this is an outright fail.
paying customers
This is the top level criteria, only customers that are not free should be included in the results.
Criteria: users who had 4 or 5 activities
There has been no attempt to evaluate this user criteria in the query, and the results do not provide enough information to deduce it.
there is further ambiguity in this requirement, does it mean that it should only include results if the account has individual users that have 4 or 5 acitvities, or is it simply that the account should have 4 or 5 activities overall.
If this is a test question (clearly this is contrived, if it is not please ask for help on how to design a better schema) then the use of the term User is usually very specific and would suggest that you need to group by or otherwise make specific use of this facet in your query.
Bonus: (i.e. include a column for each of the three source types).
This is the only element that was attempted, as the data is grouped by source_type but the information cannot be correlated back to any specific user or customer.
Next time please include example data and the expected outcome with your post. In preparing the data for this post you would have come across these issues yourself and may have been inspired to ask a different question, or through the process of writing the post up you may have resolved the issue yourself.
without further clarification, we can still start to evolve this query, a good place to start is to exclude the criteria and focus on the format of the output. the requirement mentions the following output requirements:
List Customers
Include a column for each of the source types.
Firstly, even though you don't think you need to, the request clearly states that Customer is an important facet in the output, and in your schema account holds the customer information, so although we do not need to, it makes the data readable by humans if we do include information from the account table.
This is a standard PIVOT style response then, we want a row for each customer, presenting a count that aggregates each of the values for source_type. Most RDBMS will support some variant of a PIVOT operator or function, however we can achieve the same thing with simple CASE expressions to conditionally put a value into projected columns in the result set that match the values we want to aggregate, then we can use GROUP BY to evaluate the aggregation, in this case a COUNT
The following syntax is for MS SQL, however you can achieve something similar easily enough in other RBDMS
OP please tag this question with your preferred database engine...
NOTE: there is NO filtering in this query... yet
SELECT accounts.company_id
, accounts.company_name
, paid = COUNT(st_paid)
, organic = COUNT(st_organic)
, app_store = COUNT(st_app_store)
FROM activities
INNER JOIN accounts ON activities.company_id = accounts.company_id
-- PIVOT the source_type
CROSS APPLY (SELECT st_paid = CASE source_type WHEN 'paid' THEN 1 END
,st_organic = CASE source_type WHEN 'organic' THEN 1 END
,st_app_store = CASE source_type WHEN 'app store' THEN 1 END
) as PVT
GROUP BY accounts.company_id, accounts.company_name
This results in the following shape of result:
company_id
company_name
paid
organic
app_store
apl01
apples
4
8
0
ora01
oranges
6
12
0
Criteria
When you are happy with the shpe of the results and that all the relevant information is available, it is time to apply the criteria to filter this data.
From the requirement, the following criteria can be identified:
paying customers
The spec doesn't mention paying specifically, but it does include a note that (free customers have current_mrr = 0)
Now aren't we glad we did join on the account table :)
users who had 4 or 5 activities
This is very specific about explicitly 4 or 5 activities, no more, no less.
For the sake of simplicity, lets assume that the user facet of this requirement is not important and that is is simply a reference to all users on an account, not just users who have individually logged 4 or 5 activities on their own - this would require more demo data than I care to manufacture right now to prove.
during the week of February 15, 2021.
This one was correctly identified in the original post, but we need to call it out just the same.
OP has used Monday to Friday of that week, there is no mention that weeks start on a Monday or that they end on Friday but we'll go along, it's only the syntax we need to explore today.
In the real world the actual values specified in the criteria should be parameterised, mainly because you don't want to manually re-construct the entire query every time, but also to sanitise input and prevent SQL injection attacks.
Even though it seems overkill for this post, using parameters even in simple queries helps to identify the variable elements, so I will use parameters for the 2nd criteria to demonstrate the concept.
DECLARE #from DateTime = '2021-02-15' -- Date in ISO format
DECLARE #to DateTime = (SELECT DateAdd(d, 5, #from)) -- will match Friday: 2021-02-19
/* NOTE: requirement only mentioned the start date, not the end
so your code should also only rely on the single fixed start date */
SELECT accounts.company_id, accounts.company_name
, paid = COUNT(st_paid), organic = COUNT(st_organic), app_store = COUNT(st_app_store)
FROM activities
INNER JOIN accounts ON activities.company_id = accounts.company_id
-- PIVOT the source_type
CROSS APPLY (SELECT st_paid = CASE source_type WHEN 'paid' THEN 1 END
,st_organic = CASE source_type WHEN 'organic' THEN 1 END
,st_app_store = CASE source_type WHEN 'app store' THEN 1 END
) as PVT
WHERE -- paid accounts = exclude 'free' accounts
accounts.current_mrr > 0
-- Date range filter
AND activity_time BETWEEN #from AND #to
GROUP BY accounts.company_id, accounts.company_name
-- The fun bit, we use HAVING to apply a filter AFTER the grouping is evaluated
-- Wording was explicitly 4 OR 5, not BETWEEN so we use IN for that
HAVING COUNT(source_type) IN (4,5)
I believe you are missing some information there.
without more information on the tables, I can only guess that you also have a customer table. i am going to assume there is a customer_id key that serves as key between both tables
i would take your query and do something like:
SELECT customer_id,
COUNT() AS Total,
MAX(CASE WHEN source_type = "app" THEN "numoperations" END) "app_totals"),
MAX(CASE WHEN source_type = "paid" THEN "numoperations" END) "paid_totals"),
MAX(CASE WHEN source_type = "organic" THEN "numoperations" END) "organic_totals"),
FROM (
SELECT source_type, COUNT() AS num_operations
FROM activities
WHERE activity_time BETWEEN '02-15-21' AND '02-19-21'
GROUP BY source_type
) tb1 GROUP BY customer_id
This is the most generic case i can think of, but does not scale very well. If you get new source types, you need to modify the query, and the structure of the output table also changes. Depending on the sql engine you are using (i.e. mysql vs microsoft sql) you could also use a pivot function.
The previous query is a little bit rough, but it will give you a general idea. You can add "ELSE" statements to the clause, to zero the fields when they have no values, and join with the customer table if you want only active customers, etc.

Django ORM remove unwanted Group by when annotate multiple aggregate columns

I want to create a query something like this in django ORM.
SELECT COUNT(CASE WHEN myCondition THEN 1 ELSE NULL end) as numyear
FROM myTable
Following is the djang ORM query i have written
year_case = Case(When(added_on__year = today.year, then=1), output_field=IntegerField())
qs = (ProfaneContent.objects
.annotate(numyear=Count(year_case))
.values('numyear'))
This is the query which is generated by django orm.
SELECT COUNT(CASE WHEN "analyzer_profanecontent"."added_on" BETWEEN 2020-01-01 00:00:00+00:00 AND 2020-12-31 23:59:59.999999+00:00 THEN 1 ELSE NULL END) AS "numyear" FROM "analyzer_profanecontent" GROUP BY "analyzer_profanecontent"."id"
All other things are good, but django places a GROUP BY at the end leading to multiple rows and incorrect answer. I don't want that at all. Right now there is just one column but i will place more such columns.
EDIT BASED ON COMMENTS
I will be using the qs variable to get values of how my classifications have been made in the current year, month, week.
UPDATE
On the basis of comments and answers i am getting here let me clarify. I want to do this at the database end only (obviously using Django ORM and not RAW SQL). Its a simple sql query. Doing anything at Python's end will be inefficient since the data can be too large. Thats why i want the database to get me the sum of records based on the CASE condition.
I will be adding more such columns in the future so something like len() or .count will not work.
I just want to create the above mentioned query using Django ORM (without an automatically appended GROUP BY).
When using aggregates in annotations, django needs to have some kind of grouping, if not it defaults to primary key. So, you need to use .values() before .annotate(). Please see django docs.
But to completely remove group by you can use a static value and django is smart enough to remove it completely, so you get your result using ORM query like this:
year_case = Case(When(added_on__year = today.year, then=1), output_field=IntegerField())
qs = (ProfaneContent.objects
.annotate(dummy_group_by = Value(1))
.values('dummy_group_by')
.annotate(numyear=Count(year_case))
.values('numyear'))
If you need to summarize only to one row then you should to use an .aggregate() method instead of annotate().
result = ProfaneContent.objects.aggregate(
numyear=Count(year_case),
# ... more aggregated expressions are possible here
)
You get a simple dictionary of result columns:
>>> result
{'numyear': 7, ...}
The generated SQL query is without groups, exactly how required:
SELECT
COUNT(CASE WHEN myCondition THEN 1 ELSE NULL end) as numyear
-- and more possible aggregated expressions
FROM myTable
What about a list comprehension:
# get all the objects
profane = ProfaneContent.objects.all()
# Something like this
len([pro for pro in profane if pro.numyear=today.year])
if the num years are equal it will add it to the list, so at the and you can check the len()
to get the count
Hopefully this is helpfull!
This is how I would write it in SQL.
SELECT SUM(CASE WHEN myCondition THEN 1 ELSE 0 END) as numyear
FROM myTable
SELECT
SUM(CASE WHEN "analyzer_profanecontent"."added_on"
BETWEEN 2020-01-01 00:00:00+00:00
AND 2020-12-31 23:59:59.999999+00:00
THEN 1
ELSE 0
END) AS "numyear"
FROM "analyzer_profanecontent"
GROUP BY "analyzer_profanecontent"."id"
If you intend to use other items in the SELECT clause I would recommend using a group by as well which would look like this:
SELECT SUM(CASE WHEN myCondition THEN 1 ELSE 0 END) as numyear
FROM myTable
GROUP BY SUM(CASE WHEN myCondition THEN 1 ELSE 0 END)

Creating indvidual columns with SQL

I'm new to sql and very novice with this package. SQL is running in Watson DashDB. For the past few hours I've been struggling to find the correct code.
The code is trying to accomplish a few things.
Create a new view called SENTIMENT
Join two tables together
Have the new table show 4 columns with A. USER_SCREEN_NAME, B. Total Tweets, C. Postive SENTIMENT Count D. Negative SENTIMENT Count
The code below only creates 2 columns, I am needing 4. SPACEX_SENTIMENTS.SENTIMENT_POLARITY contain both Negative and Positive.
CREATE VIEW SENTIMENT
AS
(SELECT SPACEX_TWEETS.USER_SCREEN_NAME, SPACEX_SENTIMENTS.SENTIMENT_POLARITY
FROM dash015214.SPACEX_TWEETS
LEFT JOIN dash015214.SPACEX_SENTIMENTS ON
SPACEX_TWEETS.MESSAGE_ID=SPACEX_SENTIMENTS.MESSAGE_ID);
SELECT USER_SCREEN_NAME, COUNT(1) tweetsCount
FROM dash015214.SENTIMENT
GROUP BY USER_SCREEN_NAME
HAVING COUNT (1)>1
ORDER BY COUNT (USER_SCREEN_NAME) DESC
FETCH FIRST 20 ROWS ONLY;
It looks like your view SENTIMENT has one row per tweet, with two columns: the user name and then a polarity column. From your comment, I assume the polarity column can have the values of either 'POSITIVE' or 'NEGATIVE'. I think you can get what you want with this query:
SELECT
USER_SCREEN_NAME,
COUNT(1) AS "Total Tweets",
COUNT(CASE SENTIMENT_POLARITY WHEN 'POSITIVE' THEN 1 ELSE NULL END) AS "Positive Tweets",
COUNT(CASE SENTIMENT_POLARITY WHEN 'NEGATIVE' THEN 1 ELSE NULL END) AS "Negative Tweets"
FROM
SENTIMENT
GROUP BY USER_SCREEN_NAME
HAVING COUNT(1) > 1
ORDER BY COUNT(1) DESC;
This will give you everyone with at least 2 tweets (is that what you want?), and tell you the number of tweets per user and how many were positive and how many were negative. Replace " with whatever your SQL uses to indicate column names.

Return NULL instead of 0 when using COUNT(column) SQL Server

I have query which running fine and its doing two types of work, COUNT and SUM.
Something like
select
id,
Count (contracts) as countcontracts,
count(something1),
count(something1),
count(something1),
sum(cost) as sumCost
from
table
group by
id
My problem is: if there is no contract for a given ID, it will return 0 for COUNT and Null for SUM. I want to see null instead of 0
I was thinking about case when Count (contracts) = 0 then null else Count (contracts) end but I don't want to do it this way because I have more than 12 count positions in query and its prepossessing big amount of records so I think it may slow down query performance.
Is there any other ways to replace 0 with NULL?
Try this:
select NULLIF ( Count(something) , 0)
Here are three methods:
1. (case when count(contracts) > 0 then count(contracts) end) as countcontracts
2. sum(case when contracts is not null then 1 end) as countcontracts
3. nullif(count(contracts), 0)
All three of these require writing more complicated expressions. However, this really isn't that difficult. Just copy the line multiple times, and change the name of the variable on each one. Or, take the current query, put it into a spreadsheet and use spreadsheet functions to make the transformation. Then copy the function down. (Spreadsheets are really good code generators for repeated lines of code.)

Writing a query to include certain values but exclude others when looking for a latest time period

I am trying to write a query that looks for a people that have a certain code with the latest period (year) but not if they have another code with that latest period(year). I'll be explicit just so my example makes sense.
I want people who have the code A1,A2,A3,A4,A5 but not AG,AP,AQ. There are people who have an A1 code for a period (like 2014) and an AG code for a the same period. I'd like to exclude them. Not everyone has a code so the field value could be NULL.
Is there a way to express this in a different way (i.e. less characters) than the way I did?
SELECT
people.firstName
FROM
people
WHERE EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (SELECT MAX(period) FROM codes codes2 WHERE codes2.people_id = codes.people_id)
AND code LIKE 'A[1-5]'
)
AND NOT EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (
SELECT MAX(period)
FROM codes codes2
WHERE codes2.people_id = codes.people_id
)
AND code LIKE 'A[GPQ]'
)
Schema is as follows:
People
id (PK)
firstName
Codes
people_id (FK) many to one relation with People table
code (e.g. "A1", "A2", "AG")
period (e.g. "2013", "2014")
There are so many ways you could do that, I'm not an SQL expert but I can't see your query being too bad, if you want to try and reduce the number of sub-queries you could consider using the GROUP BY clause along with a SUM Aggregate function in a HAVING clause.
I started updating your code as follows:
SELECT
people.firstName
FROM
people
LEFT JOIN codes AS a15 ON a15.people_id = people.id AND a15.code LIKE 'A[1-5]'
LEFT JOIN codes AS agpq ON agpq.people_id = people.id AND agpq.code LIKE 'A[GPQ]'
GROUP BY
people.firstName
HAVING
SUM(CASE WHEN a15.code IS NULL THEN 0 ELSE 1 END) > 0
AND SUM(CASE WHEN agpq.code IS NULL THEN 0 ELSE 1 END) = 0
This however doesn't take into account anything to do with period specific requirements described. You could add the period to the GROUP BY clause or add it to a WHERE or one of the JOIN constraints but I'm not quite sure from your description exactly what you're after (I don't believe this is through any fault of your own, I just can't personally align the code provided to the description).
I would also like to point out that the SUM functions above will not give an accurate count of the number of matching codes. This is because if both A[GPQ] and A[1_5] return at least one row, the number returned by each constraint will be multiplied by the number returned for the other, it can however be used to determine if there are "any" returned items as if the criteria is matched it will have a SUM(...) > 0
I'm sure a more experienced SQL Developer / DBA will be able to poke many holes in my proposed query but it might give them or someone else something to work from and hopefully gives you ideas for alternatives to using sub-queries.