SQL months_between grouping issue - sql

First post; go easy on me.
Relatively new to SQL (anything beyond simple queries really), but attempting to learn more complex functions in an effort to take advantage of superior server resources. My issue:
I would like to use a SUM function to aggregate cash flows across a very large variety of sources. I would like to see these cash flows along a monthly time period. Because the cash flows start at different times, I would like to season them so that they are all aligned. My current code:
select
months_between(A.reporting_date, B.start_date) as season,
sum(case when A.current_balance is null then B.original_balance
else A.current_balance end) as cashflow
from dataset1 A, dataset2 B
group by season
order by season
Now, executing the code like this generates an error message that states that A.reporting_date and B.start_date must be GROUPED or part of an AGGREGATE function.
The problem is, if I add them to the GROUP BY statement, while it generates output without error, I get cash flow sums that are essentially Cartesian crosses with all the grouped variables.
So long story short, is there any way for me to get cash flow sums grouped by only the season? If so, any ideas how to do it?
Thank you.

Many databases don't allow a column alias defined in the select list to be used in the where or group by clauses (order by usually does accept it).
For your query, use months_between(A.reporting_date, B.start_date) instead of the alias season in group by; repeating the expression in order by is also safe.
Also, your query will return a cross product, as no join condition is specified.
select
months_between(A.reporting_date, B.start_date) as season,
sum(case when A.current_balance is null then B.original_balance
else A.current_balance end) as cashflow
from dataset1 A
JOIN dataset2 B ON --add a join condition
group by months_between(A.reporting_date, B.start_date)
order by months_between(A.reporting_date, B.start_date)
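A minimal runnable sketch of the same idea, using Python's built-in sqlite3 so it can be tried without a server. Everything beyond the question's column names is an assumption: the loan_id join key is hypothetical, the sample rows are made up, and the strftime arithmetic is only a stand-in for months_between(), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset2 (loan_id INTEGER, start_date TEXT, original_balance REAL);
    CREATE TABLE dataset1 (loan_id INTEGER, reporting_date TEXT, current_balance REAL);
    INSERT INTO dataset2 VALUES (1, '2020-01-01', 100.0), (2, '2020-03-01', 200.0);
    INSERT INTO dataset1 VALUES
        (1, '2020-01-01', NULL), (1, '2020-02-01', 90.0),
        (2, '2020-03-01', NULL), (2, '2020-04-01', 180.0);
""")

# Stand-in for months_between(): whole calendar months between two ISO dates.
season = (
    "(CAST(strftime('%Y', A.reporting_date) AS INTEGER) * 12"
    " + CAST(strftime('%m', A.reporting_date) AS INTEGER))"
    " - (CAST(strftime('%Y', B.start_date) AS INTEGER) * 12"
    " + CAST(strftime('%m', B.start_date) AS INTEGER))"
)

# The expression itself (not the alias) is repeated in GROUP BY and ORDER BY,
# and the join condition prevents the cross product.
query = f"""
    SELECT {season} AS season,
           SUM(COALESCE(A.current_balance, B.original_balance)) AS cashflow
    FROM dataset1 A
    JOIN dataset2 B ON B.loan_id = A.loan_id   -- assumed join key
    GROUP BY {season}
    ORDER BY {season}
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Both sources start in different calendar months, but after seasoning they line up: month 0 sums the two original balances and month 1 sums the two current balances.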

Related

SQL question with attempt on customer information

Schema
Question: List all paying customers with users who had 4 or 5 activities during the week of February 15, 2021; also include how many of the activities sent were paid, organic and/or app store. (i.e. include a column for each of the three source types).
My attempt so far:
SELECT source_type, COUNT(*)
FROM activities
WHERE activity_time BETWEEN '02-15-21' AND '02-19-21'
GROUP BY source_type
I would like to get a second opinion on it. I didn't include the accounts table because I don't believe that I need it for this query, but I could be wrong.
Have you tried to run this? It doesn't satisfy the brief on FOUR counts:
List all the ... customers (that match criteria)
There is no customer information included in the results at all, so this is an outright fail.
paying customers
This is the top level criteria, only customers that are not free should be included in the results.
Criteria: users who had 4 or 5 activities
There has been no attempt to evaluate this user criteria in the query, and the results do not provide enough information to deduce it.
There is further ambiguity in this requirement: does it mean that results should only be included if the account has individual users with 4 or 5 activities, or simply that the account should have 4 or 5 activities overall?
If this is a test question (clearly this is contrived, if it is not please ask for help on how to design a better schema) then the use of the term User is usually very specific and would suggest that you need to group by or otherwise make specific use of this facet in your query.
Bonus: (i.e. include a column for each of the three source types).
This is the only element that was attempted, as the data is grouped by source_type but the information cannot be correlated back to any specific user or customer.
Next time please include example data and the expected outcome with your post. In preparing the data for this post you would have come across these issues yourself and may have been inspired to ask a different question, or through the process of writing the post up you may have resolved the issue yourself.
Without further clarification, we can still start to evolve this query. A good place to start is to set the criteria aside and focus on the format of the output. The requirement mentions the following output requirements:
List Customers
Include a column for each of the source types.
Firstly, even though you don't think you need to, the request clearly states that Customer is an important facet in the output, and in your schema account holds the customer information, so although we do not need to, it makes the data readable by humans if we do include information from the account table.
This is a standard PIVOT style response then: we want a row for each customer, presenting a count that aggregates each of the values for source_type. Most RDBMS will support some variant of a PIVOT operator or function, however we can achieve the same thing with simple CASE expressions that conditionally put a value into projected columns matching the values we want to aggregate. Then we can use GROUP BY to evaluate the aggregation, in this case a COUNT.
The following syntax is for MS SQL, however you can achieve something similar easily enough in other RDBMS.
OP please tag this question with your preferred database engine...
NOTE: there is NO filtering in this query... yet
SELECT accounts.company_id
, accounts.company_name
, paid = COUNT(st_paid)
, organic = COUNT(st_organic)
, app_store = COUNT(st_app_store)
FROM activities
INNER JOIN accounts ON activities.company_id = accounts.company_id
-- PIVOT the source_type
CROSS APPLY (SELECT st_paid = CASE source_type WHEN 'paid' THEN 1 END
,st_organic = CASE source_type WHEN 'organic' THEN 1 END
,st_app_store = CASE source_type WHEN 'app store' THEN 1 END
) as PVT
GROUP BY accounts.company_id, accounts.company_name
This results in the following shape of result:
company_id
company_name
paid
organic
app_store
apl01
apples
4
8
0
ora01
oranges
6
12
0
Criteria
When you are happy with the shape of the results and that all the relevant information is available, it is time to apply the criteria to filter this data.
From the requirement, the following criteria can be identified:
paying customers
The spec doesn't mention paying specifically, but it does include a note that (free customers have current_mrr = 0)
Now aren't we glad we did join on the account table :)
users who had 4 or 5 activities
This is very specific about explicitly 4 or 5 activities, no more, no less.
For the sake of simplicity, let's assume that the user facet of this requirement is not important and that it is simply a reference to all users on an account, not just users who have individually logged 4 or 5 activities on their own. Proving the per-user variant would require more demo data than I care to manufacture right now.
during the week of February 15, 2021.
This one was correctly identified in the original post, but we need to call it out just the same.
OP has used Monday to Friday of that week, there is no mention that weeks start on a Monday or that they end on Friday but we'll go along, it's only the syntax we need to explore today.
In the real world the actual values specified in the criteria should be parameterised, mainly because you don't want to manually re-construct the entire query every time, but also to sanitise input and prevent SQL injection attacks.
Even though it seems overkill for this post, using parameters even in simple queries helps to identify the variable elements, so I will use parameters for the 2nd criteria to demonstrate the concept.
DECLARE @from DateTime = '2021-02-15' -- Date in ISO format
DECLARE @to DateTime = (SELECT DateAdd(d, 5, @from)) -- 2021-02-20 00:00, so the range covers all of Friday 2021-02-19
/* NOTE: requirement only mentioned the start date, not the end
so your code should also only rely on the single fixed start date */
SELECT accounts.company_id, accounts.company_name
, paid = COUNT(st_paid), organic = COUNT(st_organic), app_store = COUNT(st_app_store)
FROM activities
INNER JOIN accounts ON activities.company_id = accounts.company_id
-- PIVOT the source_type
CROSS APPLY (SELECT st_paid = CASE source_type WHEN 'paid' THEN 1 END
,st_organic = CASE source_type WHEN 'organic' THEN 1 END
,st_app_store = CASE source_type WHEN 'app store' THEN 1 END
) as PVT
WHERE -- paid accounts = exclude 'free' accounts
accounts.current_mrr > 0
-- Date range filter
AND activity_time BETWEEN @from AND @to
GROUP BY accounts.company_id, accounts.company_name
-- The fun bit, we use HAVING to apply a filter AFTER the grouping is evaluated
-- Wording was explicitly 4 OR 5, not BETWEEN so we use IN for that
HAVING COUNT(source_type) IN (4,5)
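For engines without CROSS APPLY, the same pivot, filter and HAVING steps can be sketched with plain CASE expressions. This is only an illustration run through Python's sqlite3: the sample rows and the 'pea01' account are invented, the source_type values are assumed, and a half-open date range with named parameters is used in place of BETWEEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (company_id TEXT, company_name TEXT, current_mrr REAL);
    CREATE TABLE activities (company_id TEXT, source_type TEXT, activity_time TEXT);
    INSERT INTO accounts VALUES
        ('apl01', 'apples', 50.0), ('ora01', 'oranges', 0.0), ('pea01', 'pears', 20.0);
    INSERT INTO activities VALUES
        ('apl01', 'paid',      '2021-02-15 09:00:00'),
        ('apl01', 'paid',      '2021-02-16 09:00:00'),
        ('apl01', 'organic',   '2021-02-17 09:00:00'),
        ('apl01', 'app store', '2021-02-18 09:00:00'),
        ('apl01', 'organic',   '2021-03-01 09:00:00'),  -- outside the week
        ('ora01', 'paid',      '2021-02-15 09:00:00'),  -- free account, 4 activities
        ('ora01', 'paid',      '2021-02-16 09:00:00'),
        ('ora01', 'organic',   '2021-02-17 09:00:00'),
        ('ora01', 'organic',   '2021-02-18 09:00:00'),
        ('pea01', 'paid',      '2021-02-15 09:00:00');  -- paying, but only 1 activity
""")

query = """
    SELECT a.company_id, a.company_name,
           COUNT(CASE WHEN act.source_type = 'paid'      THEN 1 END) AS paid,
           COUNT(CASE WHEN act.source_type = 'organic'   THEN 1 END) AS organic,
           COUNT(CASE WHEN act.source_type = 'app store' THEN 1 END) AS app_store
    FROM activities act
    JOIN accounts a ON a.company_id = act.company_id
    WHERE a.current_mrr > 0                    -- paying customers only
      AND act.activity_time >= :week_start     -- half-open range: start inclusive
      AND act.activity_time <  :week_end       -- end exclusive
    GROUP BY a.company_id, a.company_name
    HAVING COUNT(*) IN (4, 5)                  -- filter applied after grouping
"""
rows = conn.execute(
    query, {"week_start": "2021-02-15", "week_end": "2021-02-20"}
).fetchall()
print(rows)
```

Only 'apples' survives: 'oranges' has four activities but is a free account, and 'pears' is paying but has a single activity in the week.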
I believe you are missing some information there.
Without more information on the tables, I can only guess that you also have a customer table. I am going to assume there is a customer_id key that serves as the key between both tables.
I would take your query and do something like:
SELECT customer_id,
SUM(num_operations) AS total,
MAX(CASE WHEN source_type = 'app' THEN num_operations END) AS app_totals,
MAX(CASE WHEN source_type = 'paid' THEN num_operations END) AS paid_totals,
MAX(CASE WHEN source_type = 'organic' THEN num_operations END) AS organic_totals
FROM (
-- customer_id is the assumed key; it must be selected and grouped here
-- so that the outer query can group by it
SELECT customer_id, source_type, COUNT(*) AS num_operations
FROM activities
WHERE activity_time BETWEEN '02-15-21' AND '02-19-21'
GROUP BY customer_id, source_type
) tb1 GROUP BY customer_id
This is the most generic case I can think of, but it does not scale very well. If you get new source types, you need to modify the query, and the structure of the output table also changes. Depending on the SQL engine you are using (e.g. MySQL vs. Microsoft SQL Server) you could also use a pivot function.
The previous query is a little bit rough, but it will give you a general idea. You can add ELSE branches to the CASE expressions to zero the fields when they have no values, and join with the customer table if you want only active customers, etc.

SQL multiple constrained counts in query

I am trying to get with 1 query multiple count results where each one is a subset of the previous one.
So my table would be called Recipe and has these columns:
recipe_num(Primary_key) decimal,
recipe_added date,
is_featured bit,
liked decimal
And what I want is to make a query that will return the amount of likes grouped by day for any particular month with
total recipes as total_recipes,
total recipes that were featured as featured_recipes,
total number of recipes that were featured and had more than 100 likes liked_recipes
So as you can see, they are all counts, with each being a subset of the previous one.
Ideally I don't want to run separate select counts that each query the whole table, but rather derive each count from the previous one.
I am not very good at using count with Where, Having, etc... and not exactly sure how to do it, so far I have the following which I managed via digging around here.
select
recipe_added,
count(*) total_recipes,
count(case is_featured when 1 then 1 else null end) total_featured_recipes
from
RECIPES
group by
recipe_added
I am not exactly sure why I have to use case inside the count but I wasn't able to get it to work using WHERE, would like to know if this is possible as well.
Thanks
With a CASE expression inside COUNT() you are doing conditional aggregation and this is exactly what you need for this requirement:
select recipe_added,
count(*) total_recipes,
count(case when is_featured = 1 then 1 end) total_featured_recipes,
count(case when is_featured = 1 and liked > 100 then 1 end) liked_recipes
from Recipes
group by recipe_added
There is no need for ELSE null because the default behavior of a CASE expression is to return null when no other branch returns a value.
If you want results for a specific month, say October 2020, you can add a WHERE clause before the GROUP BY:
where format(recipe_added, 'yyyyMM') = '202010'
This will work for SQL Server.
If you are using a different database then you can use a similar approach.
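As a rough illustration of that "similar approach" on another engine, here is the same conditional aggregation run against SQLite from Python. The sample rows are invented, and the month filter is written as a date range instead of format(), which is SQL Server specific. It also shows why WHERE alone cannot do the job: WHERE removes rows from every count at once, while CASE inside COUNT narrows each count independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Recipes (
        recipe_num   INTEGER PRIMARY KEY,
        recipe_added TEXT,
        is_featured  INTEGER,
        liked        INTEGER
    );
    INSERT INTO Recipes VALUES
        (1, '2020-10-01', 1, 150),
        (2, '2020-10-01', 1, 20),
        (3, '2020-10-01', 0, 500),
        (4, '2020-10-02', 1, 101),
        (5, '2020-11-01', 1, 300);   -- outside October, removed by WHERE
""")

rows = conn.execute("""
    SELECT recipe_added,
           COUNT(*) AS total_recipes,
           COUNT(CASE WHEN is_featured = 1 THEN 1 END) AS total_featured_recipes,
           COUNT(CASE WHEN is_featured = 1 AND liked > 100 THEN 1 END) AS liked_recipes
    FROM Recipes
    WHERE recipe_added >= '2020-10-01' AND recipe_added < '2020-11-01'
    GROUP BY recipe_added
    ORDER BY recipe_added
""").fetchall()
print(rows)
```

The table is scanned once, and each column is a subset of the one before it.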

Transposing and summing the top 5 results in Teradata SQL Assistant

I have a query that I converted from Access and is currently working correctly in Teradata SQL Assistant. The data pulled is just a standard table full of all of the data I need.
What I am wondering is: Can something be added to this query that will essentially sum up all of the Exposure values and then only show the top 5 Divisions by greatest to smallest sum (of those Top 5). Also, transposing the data so that my Topics are the left most column.
Here is the working code, details omitted.
SELECT
A.AS_OF_DT
, B.DIVISION
, B.CLASS
, Sum(A.BALANCE/1000000) AS "Bal in MMs"
, Sum(A.EXPOSURE/1000000) AS "Exp in MMs"
, Sum(CASE WHEN A.STATUS = 'NACC' THEN (B.BALANCE/1000000) ELSE 0 END) AS "NPL Bal as MMs"
FROM DB.TABLE1 A LEFT JOIN DB.TABLE2 B ON A.NAICS = B.NAICS_CD
WHERE A.AS_OF_DT= '2017-03-31'
GROUP BY
A.AS_OF_DT,
B.DIVISION,
B.CLASS
ORDER BY SUM (A.EXPOSURE/1000000) DESC
Essentially I want the columns to be the following:
DIVISION|DATE|
Below DIVISION would only be the Top 5 DIVISIONS summarized by EXPOSURE (under DATE)
I can try and clarify if needed. Just let me know.
Thanks!
The end result is to have a data paste I can throw into Excel without the manual work of transposing the data in Excel, along with writing formulas to rummage through the 1000's of results of the base query to summarize the individual Divisions and then pick the top 5 each month.
Thanks!
Shill
To get the 5 top for each division, you can use QUALIFY.
Add this to the end of your query:
QUALIFY ROW_NUMBER() OVER (PARTITION BY A.AS_OF_DT, B.DIVISION ORDER BY SUM(A.EXPOSURE/1000000) DESC) <= 5
For your other questions, SQL Assistant isn't much of a presentation tool, it won't do what you are asking for.
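QUALIFY is a Teradata extension, so here is a hedged sketch of the equivalent pattern in portable SQL: aggregate first, number the rows per group with ROW_NUMBER(), then filter on that number in an outer query. It runs on SQLite (3.25 or later, for window functions) through Python; the exposures table, its rows and the top_n value of 2 are all made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exposures (as_of_dt TEXT, division TEXT, class TEXT, exposure REAL);
    INSERT INTO exposures VALUES
        ('2017-03-31', 'Retail', 'A', 10), ('2017-03-31', 'Retail', 'A', 5),
        ('2017-03-31', 'Retail', 'B', 30), ('2017-03-31', 'Retail', 'C', 20),
        ('2017-03-31', 'Mining', 'X', 7),  ('2017-03-31', 'Mining', 'Y', 9);
""")

top_n = 2  # the thread asks for 5; 2 keeps the sample data small

rows = conn.execute("""
    WITH totals AS (            -- step 1: one row per date/division/class
        SELECT as_of_dt, division, class, SUM(exposure) AS total_exposure
        FROM exposures
        GROUP BY as_of_dt, division, class
    ),
    ranked AS (                 -- step 2: number rows within each division
        SELECT as_of_dt, division, class, total_exposure,
               ROW_NUMBER() OVER (
                   PARTITION BY as_of_dt, division
                   ORDER BY total_exposure DESC
               ) AS rn
        FROM totals
    )
    SELECT division, class, total_exposure   -- step 3: what QUALIFY does in one line
    FROM ranked
    WHERE rn <= ?
    ORDER BY division, rn
""", (top_n,)).fetchall()
print(rows)
```

Dropping division from the PARTITION BY (and from the inner GROUP BY's class) would instead rank whole divisions against each other, which may be closer to what the question describes.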
If your query already works,
try replacing:
SELECT
By:
SELECT top 10
(line 1)

SQL getting a percentage from a column of quantities

I have a set of shelves and some are not being used but some are. I want to get the percentage of shelves that are being used (I am using an ajax call from javascript) what is a good way to do this and could you please provide an example?
This is the query I have so far which gets the nulls and sets the quantity to 0:
SELECT warehouse_locations.location, ISNULL(product_stock_warehouse.quantity, 0) as quantity
FROM product_stock_warehouse
RIGHT JOIN warehouse_locations ON product_stock_warehouse.location = warehouse_locations.location
WHERE warehouse_locations.location LIKE 'A21%'
ORDER BY product_stock_warehouse.quantity
If the shelf is not 0, it is "Full" and therefore counts towards the percentage being used.
I am using MS-SQL
I think you need aggregation. The idea is something like this:
SELECT wl.location, COALESCE(SUM(psw.quantity), 0) as total_quantity,
(CASE WHEN COALESCE(SUM(psw.quantity), 0) = 0 THEN 'EMPTY' ELSE 'USED' END) as status
FROM warehouse_locations wl LEFT JOIN
product_stock_warehouse psw
ON psw.location = wl.location
WHERE wl.location LIKE 'A21%'
GROUP BY wl.location
ORDER BY total_quantity;
Some notes:
LEFT JOIN is much easier to follow than RIGHT JOIN. It means "keep all rows in the first table" rather than "keep all rows in some table later in the FROM clause that I haven't seen yet".
Table aliases make a query easier to write and to read.
Use GROUP BY to get one row per "shelf", which I assume is the same as a "location".
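To get from one row per shelf to the percentage the question asks for, one option is to wrap the per-shelf query in a derived table and aggregate again. The sketch below uses SQLite via Python purely so it can be run as is; the shelf names and quantities are invented, and the 100.0 literal is there to force non-integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_locations (location TEXT);
    CREATE TABLE product_stock_warehouse (location TEXT, quantity INTEGER);
    INSERT INTO warehouse_locations VALUES
        ('A21-01'), ('A21-02'), ('A21-03'), ('A21-04'), ('B01-01');
    INSERT INTO product_stock_warehouse VALUES
        ('A21-01', 5), ('A21-01', 3), ('A21-02', 0), ('A21-03', 12), ('B01-01', 9);
""")

row = conn.execute("""
    SELECT COUNT(*) AS shelves,
           SUM(CASE WHEN total_quantity <> 0 THEN 1 ELSE 0 END) AS used_shelves,
           100.0 * SUM(CASE WHEN total_quantity <> 0 THEN 1 ELSE 0 END)
                 / COUNT(*) AS pct_used
    FROM (
        -- one row per shelf; shelves with no stock rows get a total of 0
        SELECT wl.location, COALESCE(SUM(psw.quantity), 0) AS total_quantity
        FROM warehouse_locations wl
        LEFT JOIN product_stock_warehouse psw ON psw.location = wl.location
        WHERE wl.location LIKE 'A21%'
        GROUP BY wl.location
    ) AS per_shelf
""").fetchone()
print(row)
```

Of the four A21 shelves, two hold stock (A21-02 has a zero quantity and A21-04 has no stock rows at all), so the query reports 50 percent in a single row that is easy to return from an ajax call.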

Filtering based on SSRS total/sum

I am using Visual Studio 2010 and SQL Server 2012.
I have searched the net and the stackoverflow archives and found a few others who have had this problem but I could not get their solutions to work for me. The problem involves filtering based on the aggregate value of a dataset using a user definable report parameter.
I have the following report.
The ultimate goal of this report is to filter portfolios that have a user definable % of cash as a percent of total assets. The default will be 30% or greater. So for example, if a portfolio's total market value was $100,000 and its cash asset class was $40,000, this would be a cash percent of 40% and this portfolio would appear on the report.
I have been able to select just the cash asset class using a filter on the dataset itself so that is not an issue. I easily added a cash percent parameter to the dataset but soon realized this is filtering on the row detail not the aggregated sum, and sometimes portfolios have multiple cash accounts. I need the sum of all cash accounts so I can truly know if cash is 30% or greater of total market value.
At first I thought the report was working correctly.
But after cross referencing this against another report I realized this portfolio only has 2.66 % total Cash because it has a large negative balance in a second cash account as well. It's the sum of all cash accounts I care about.
I want to filter portfolios that have >= cash based on the total line not the detail lines. I suspect I may need to alter the dataset using the aggregate function SUM() and then build a parameter off that, but I have not had success writing a query to do that. I also would be very interested to know if somehow this can be done in the .rdl layer rather than at the sql dataset level. The SQL for this is a little complicated because the program that this is reporting on requires the use of SQL functions, stored procedures, and parameters.
If the solution involves altering the query to include sum() in the dataset itself, I suspect it is line 20 that needs to be summed
a.PercentAssets,
Here is the data set for the report.
https://www.dropbox.com/s/bafdo2i6pfvdkk4/CashPercentDataSet.sql
and here is the .rdl file.
https://www.dropbox.com/s/htg09ypyh7f1a98/cashpercent2.rdl
Thank you
I would do this in SQL personally. You can use a SUM() aggregate to get the value you're after for each account. You won't want to sum the percents though, you'll want to SUM the values and then create the percent afterwards once you have the totals. A sum of averages isn't the same as an average of sums (except in some circumstances but we can't assume that).
I would do something like this, though you'll probably need to tweak it to get it working correctly for you. (I also may not fully understand the meaning of all the columns you're using.)
SELECT PortfolioBaseCode,totalValue,cashValue,
(CAST(cashValue AS FLOAT)/totalValue) * 100 AS pcntCash
FROM (
SELECT b.PortfolioBaseCode,
SUM(a.MarketValue) AS totalValue,
SUM(CASE s.SecuritySymbol WHEN 'CASH' THEN a.MarketValue ELSE 0 END) AS cashValue
FROM APXUser.fAppraisal (@ReportData) a
LEFT JOIN APXUser.vPortfolioBaseSettingEx b ON b.PortfolioBaseID = a.PortfolioBaseID
LEFT JOIN APXUser.vSecurityVariant s ON s.SecurityID = a.SecurityID
AND s.SecTypeCode = a.SecTypeCode
AND s.IsShort = a.IsShortPosition
AND a.PercentAssets >= @PercentAssets
GROUP BY b.PortfolioBaseCode
) AS t
WHERE (CAST(cashValue AS FLOAT)/totalValue)>=@pcntParameter
Alternatively you can use the HAVING clause to filter aggregate functions like a WHERE clause would (though I find this a little less readable):
SELECT b.PortfolioBaseCode,
SUM(a.MarketValue) AS totalValue,
SUM(CASE s.SecuritySymbol WHEN 'CASH' THEN a.MarketValue ELSE 0 END) AS cashValue,
-- cash divided by total (not the other way round) gives the cash percent
(SUM(CASE s.SecuritySymbol WHEN 'CASH' THEN a.MarketValue ELSE 0 END)/SUM(a.MarketValue))*100 AS cashPcnt
FROM APXUser.fAppraisal (@ReportData) a
LEFT JOIN APXUser.vPortfolioBaseSettingEx b ON b.PortfolioBaseID = a.PortfolioBaseID
LEFT JOIN APXUser.vSecurityVariant s ON s.SecurityID = a.SecurityID
AND s.SecTypeCode = a.SecTypeCode
AND s.IsShort = a.IsShortPosition
AND a.PercentAssets >= @PercentAssets
GROUP BY b.PortfolioBaseCode
HAVING (SUM(CASE s.SecuritySymbol WHEN 'CASH' THEN a.MarketValue ELSE 0 END)/SUM(a.MarketValue))>=@pcntParameter
Basically you can use grouping and aggregate functions to get the total value and the cash value for each account and then calculate the proper percentage from that. This will take into account any negatives, etc in other holdings.
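A small self-contained sketch of the sum-then-divide idea, run on SQLite through Python rather than against the APX views. The holdings table, its rows and the min_pct parameter (expressed on a 0 to 100 scale) are all hypothetical; the second portfolio mimics the situation in the question, where a large negative cash account offsets a positive one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE holdings (portfolio TEXT, symbol TEXT, market_value REAL);
    INSERT INTO holdings VALUES
        ('P1', 'CASH',  40000), ('P1', 'AAPL',  60000),
        ('P2', 'CASH',  50000), ('P2', 'CASH', -46000), ('P2', 'MSFT', 146000);
""")

query = """
    SELECT portfolio,
           SUM(market_value) AS total_value,
           SUM(CASE WHEN symbol = 'CASH' THEN market_value ELSE 0 END) AS cash_value,
           100.0 * SUM(CASE WHEN symbol = 'CASH' THEN market_value ELSE 0 END)
                 / SUM(market_value) AS cash_pct
    FROM holdings
    GROUP BY portfolio
    -- HAVING filters on the aggregated totals, not on the detail rows
    HAVING 100.0 * SUM(CASE WHEN symbol = 'CASH' THEN market_value ELSE 0 END)
                 / SUM(market_value) >= :min_pct
    ORDER BY portfolio
"""
rows = conn.execute(query, {"min_pct": 30}).fetchall()
print(rows)
```

P2 has one cash row that on its own would pass a row-level 30% filter, but its net cash is only 4,000 out of 150,000, so it is correctly left out once the filter is applied to the sums.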
Some references for aggregate function and grouping:
MSDN - GROUP BY (TSQL)
MSDN - Aggregate Functions
MSDN - HAVING Clause
If you find you're needing to do some kind of aggregation but you cannot group your results you should look into using the TSQL window functions which are very handy:
Working with Window Functions in SQL Server
MSDN - OVER Clause