SQL question with attempt on customer information

SQL question with attempt on customer information - sql

Schema
Question: List all paying customers with users who had 4 or 5 activities during the week of February 15, 2021; also include how many of the activities sent were paid, organic and/or app store. (i.e. include a column for each of the three source types).
My attempt so far:
SELECT source_type, COUNT(*)
FROM activities
WHERE activity_time BETWEEN '02-15-21' AND '02-19-21'
GROUP BY source_type
I would like to get a second opinion on it. I didn't include the accounts table because I don't believe that I need it for this query, but I could be wrong.

Have you tried to run this? It doesn't satisfy the brief on FOUR counts:
List all the ... customers (that match criteria)
There is no customer information included in the results at all, so this is an outright fail.
paying customers
This is the top level criteria, only customers that are not free should be included in the results.
Criteria: users who had 4 or 5 activities
There has been no attempt to evaluate this user criteria in the query, and the results do not provide enough information to deduce it.
there is further ambiguity in this requirement, does it mean that it should only include results if the account has individual users that have 4 or 5 acitvities, or is it simply that the account should have 4 or 5 activities overall.
If this is a test question (clearly this is contrived, if it is not please ask for help on how to design a better schema) then the use of the term User is usually very specific and would suggest that you need to group by or otherwise make specific use of this facet in your query.
Bonus: (i.e. include a column for each of the three source types).
This is the only element that was attempted, as the data is grouped by source_type but the information cannot be correlated back to any specific user or customer.
Next time please include example data and the expected outcome with your post. In preparing the data for this post you would have come across these issues yourself and may have been inspired to ask a different question, or through the process of writing the post up you may have resolved the issue yourself.
without further clarification, we can still start to evolve this query, a good place to start is to exclude the criteria and focus on the format of the output. the requirement mentions the following output requirements:
List Customers
Include a column for each of the source types.
Firstly, even though you don't think you need to, the request clearly states that Customer is an important facet in the output, and in your schema account holds the customer information, so although we do not need to, it makes the data readable by humans if we do include information from the account table.
This is a standard PIVOT style response then, we want a row for each customer, presenting a count that aggregates each of the values for source_type. Most RDBMS will support some variant of a PIVOT operator or function, however we can achieve the same thing with simple CASE expressions to conditionally put a value into projected columns in the result set that match the values we want to aggregate, then we can use GROUP BY to evaluate the aggregation, in this case a COUNT
The following syntax is for MS SQL, however you can achieve something similar easily enough in other RBDMS
OP please tag this question with your preferred database engine...
NOTE: there is NO filtering in this query... yet
SELECT accounts.company_id
, accounts.company_name
, paid = COUNT(st_paid)
, organic = COUNT(st_organic)
, app_store = COUNT(st_app_store)
FROM activities
INNER JOIN accounts ON activities.company_id = accounts.company_id
-- PIVOT the source_type
CROSS APPLY (SELECT st_paid = CASE source_type WHEN 'paid' THEN 1 END
,st_organic = CASE source_type WHEN 'organic' THEN 1 END
,st_app_store = CASE source_type WHEN 'app store' THEN 1 END
) as PVT
GROUP BY accounts.company_id, accounts.company_name
This results in the following shape of result:
company_id
company_name
paid
organic
app_store
apl01
apples
4
8
0
ora01
oranges
6
12
0
Criteria
When you are happy with the shpe of the results and that all the relevant information is available, it is time to apply the criteria to filter this data.
From the requirement, the following criteria can be identified:
paying customers
The spec doesn't mention paying specifically, but it does include a note that (free customers have current_mrr = 0)
Now aren't we glad we did join on the account table :)
users who had 4 or 5 activities
This is very specific about explicitly 4 or 5 activities, no more, no less.
For the sake of simplicity, lets assume that the user facet of this requirement is not important and that is is simply a reference to all users on an account, not just users who have individually logged 4 or 5 activities on their own - this would require more demo data than I care to manufacture right now to prove.
during the week of February 15, 2021.
This one was correctly identified in the original post, but we need to call it out just the same.
OP has used Monday to Friday of that week, there is no mention that weeks start on a Monday or that they end on Friday but we'll go along, it's only the syntax we need to explore today.
In the real world the actual values specified in the criteria should be parameterised, mainly because you don't want to manually re-construct the entire query every time, but also to sanitise input and prevent SQL injection attacks.
Even though it seems overkill for this post, using parameters even in simple queries helps to identify the variable elements, so I will use parameters for the 2nd criteria to demonstrate the concept.
DECLARE #from DateTime = '2021-02-15' -- Date in ISO format
DECLARE #to DateTime = (SELECT DateAdd(d, 5, #from)) -- will match Friday: 2021-02-19
/* NOTE: requirement only mentioned the start date, not the end
so your code should also only rely on the single fixed start date */
SELECT accounts.company_id, accounts.company_name
, paid = COUNT(st_paid), organic = COUNT(st_organic), app_store = COUNT(st_app_store)
FROM activities
INNER JOIN accounts ON activities.company_id = accounts.company_id
-- PIVOT the source_type
CROSS APPLY (SELECT st_paid = CASE source_type WHEN 'paid' THEN 1 END
,st_organic = CASE source_type WHEN 'organic' THEN 1 END
,st_app_store = CASE source_type WHEN 'app store' THEN 1 END
) as PVT
WHERE -- paid accounts = exclude 'free' accounts
accounts.current_mrr > 0
-- Date range filter
AND activity_time BETWEEN #from AND #to
GROUP BY accounts.company_id, accounts.company_name
-- The fun bit, we use HAVING to apply a filter AFTER the grouping is evaluated
-- Wording was explicitly 4 OR 5, not BETWEEN so we use IN for that
HAVING COUNT(source_type) IN (4,5)

I believe you are missing some information there.
without more information on the tables, I can only guess that you also have a customer table. i am going to assume there is a customer_id key that serves as key between both tables
i would take your query and do something like:
SELECT customer_id,
COUNT() AS Total,
MAX(CASE WHEN source_type = "app" THEN "numoperations" END) "app_totals"),
MAX(CASE WHEN source_type = "paid" THEN "numoperations" END) "paid_totals"),
MAX(CASE WHEN source_type = "organic" THEN "numoperations" END) "organic_totals"),
FROM (
SELECT source_type, COUNT() AS num_operations
FROM activities
WHERE activity_time BETWEEN '02-15-21' AND '02-19-21'
GROUP BY source_type
) tb1 GROUP BY customer_id
This is the most generic case i can think of, but does not scale very well. If you get new source types, you need to modify the query, and the structure of the output table also changes. Depending on the sql engine you are using (i.e. mysql vs microsoft sql) you could also use a pivot function.
The previous query is a little bit rough, but it will give you a general idea. You can add "ELSE" statements to the clause, to zero the fields when they have no values, and join with the customer table if you want only active customers, etc.

Related

My Joins in query not pulling through correctly

Good evening. Could someone please help me with the following. I am trying to join two tables.The first id wbr_global.gl_ap_details. This stores historic GL information. The second table sandbox.utr_fixed_mapping is where account mapping is stored. For example, ana ccount number 60820 is mapped as Employee relation. The first table needs the mapping from the second table linked on the account number. The output I am getting is not right and way to bug. Any help would be appreciated!
Output
select sandbox.utr_fixed_mapping_na.new_mapping_1,sum(wbr_global.gl_ap_details.amount)
from wbr_global.gl_ap_details
LEFT JOIN sandbox.utr_fixed_mapping_na ON wbr_global.gl_ap_details.account_number = sandbox.utr_fixed_mapping_na.account_number
Where gl_ap_details.cost_center = '1172'
and gl_ap_details.period_name = 'JUL-21'
and gl_ap_details.ledger_name = 'Amazon.com, Inc.'
Group by 1;
I tried adding the cast function but after 5000 seconds of the query running I canceled it.

The query itself appears ok, but minor changes. Learn to use table "aliases". This way you don't have to keep typing long database.table.column all over. Additionally, SQL is easier to read doing it that way anyhow.
Notice the aliases "gl" and "fm" after the tables are declared, then these aliases are used to represent the columns.. Easier to read, would you agree.
Added GL Account number as described below the query.
select
gl.account_number,
fm.new_mapping_1,
sum(gl.amount)
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
gl.account_number,
fm.new_mapping_1
Now, as for your query and getting null. This just means that there are records within the gl_ap_details table with an account number that is not found in the utr_fixed_mapping_na table. So, to see WHAT gl account number does NOT exist, I have added it to the query. Its possible there are MULTIPLE records in the gl_ap_details that are not found in the mapping table. So, you may get
GLAccount Description SumOfAmount
glaccount1 null $someAmount
glaccount37 null $someAmount
glaccount49 null $someAmount
glaccount72 Depreciation $someAmount
glaccount87 Real Estate $someAmount
glaccount92 Building $someAmount
glaccount99 Salaries $someAmount
I obviously made-up glaccounts just to show the purpose. You may have multiple where the null's total amount is actually masking how many different gl account numbers were NOT found.
Once you find which are missing, you can check / confirm they SHOULD be in the mapping table.
FEEDBACK.
Since you do realize the missing numbers, lets consider a Cartesian result. If there are multiple entries in the mapping table for the same G/L account number, you will get a Cartesian result thus bloating your numbers. To clarify, lets say your mapping table has
Mapping file.
GL Descr1 NewMapping
1 test Salaries
1 testView Buildings
1 Another Depreciation
And your GL_AP_Details has
GL Amount
1 $100
Your total for the query would result in $300 because the query is trying to join the AP Details GL #1 to EACH of the entries in the mapping file thus bloating the amount. You could also add a COUNT(*) as NumberOfEntries to the query to see how many transactions it THINKS it is processing. Is there some "unique ID" in the GL_AP_Details table? If so, then you could also do a count of DISTINCT ID values. If they are different (distinct is lower than # of entries), I think THAT is your culprit.
select
fm.new_mapping_1,
sum(gl.amount),
count(*) as NumberOfEntries,
count( distinct gl.UniqueIdField ) as DistinctTransactions
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
fm.new_mapping_1
Might you also need to limit the mapping table for a specific prophecy or mec view?

If you "think" that the result of an aggregate is wrong, then the easiest way to verify this is to select the individual rows that correlate to 1 record in the aggregate output and inspect the records, looking for duplications.
For instance, pick 'Building Management':
SELECT fixed.new_mapping_1,details.amount,*
FROM wbr_global.gl_ap_details details
LEFT JOIN sandbox.utr_fixed_mapping_na fixed ON details.account_number = fixed.account_number
WHERE details.cost_center = '1172'
AND details.period_name = 'JUL-21'
AND details.ledger_name = 'Amazon.com, Inc.'
AND details.account_number = 'Building Management'
Notice that we tack on a ,* to the end of the projection, this will show you everything that the query has access to, you should look for repeating sections of data that you were not expecting, then depending on which table they originate from your might add additional criteria to the JOIN, or to the WHERE or you might need to group by additional columns.
This type of issue is really hard to comment on in a forum like this because it is highly specific to your schema, and the data contained within it, making solutions highly subjective to criteria you are not likely to publish online.
Generally if you think a calculation is wrong, you need to manually compute it to verify, this above advice helps you to inspect the data your query is using, you should either construct your own query or use other tools to build the data set that helps you to manually compute the correct values, then work them back into or replace your original query.
The speed issues are out of scope here, we can comment on the poor schema design but I suspect you don't have a choice. In the utr_fixed_mapping_na table you should make the account_number have the same column type as the source data, or add a new column that has the data in the original type, then you can setup indexes on the columns to improve the speed of the join.

Access query, grouped sum of 2 columns where either column contains values

Another team has an Access database that they use to track call logs. It's very basic, really just a table with a few lookups, and they enter data directly in the datasheet view. They've asked me to assist with writing a report to sum up their calls by week and reason and I'm a bit stumped on this problem because I'm not an Access guy by any stretch.
The database consists of two core tables, one holding the call log entries (Calls) and one holding the lookup list of call reasons (ReasonsLookup). Relevant table structures are:
Calls
-----
ID (autonumber, PK)
DateLogged (datetime)
Reason (int, FK to ReasonLookup.ID)
Reason 2 (int, FK to ReasonLookup.ID)
ReasonLookup
------------
ID (autonumber PK)
Reason (text)
What they want is a report that looks like this:
WeekNum Reason Total
------- ---------- -----
10 Eligibility Request 24
10 Extension Request 43
10 Information Question 97
11 Eligibility Request 35
11 Information Question 154
... ... etc ...
My problem is that there are TWO columns in the Calls table, because they wanted to log a primary and secondary reason for receiving the call, i.e. someone calls for reason A and while on the phone also requests something under reason B. Every call will have a primary reason column value (Calls.Reason not null) but not necessarily a secondary reason column value (Calls.[Reason 2] is often null).
What they want is, for each WeekNum, a single (distinct) entry for each possible Reason, and a Total of how many times that Reason was used in either the Calls.Reason or Calls.[Reason 2] column for that week. So in the example above for Eligibility Request, they want to see one entry for Eligibility Request for the week and count every record in Calls that for that week that has Calls.Reason = Eligibility Request OR Calls.[Reason 2] = Eligibility Request.
What is the best way to approach a query that will display as shown above? Ideally this is a straight query, no VBA required. They are non-technical so the simpler and easier to maintain the better if possible.
Thanks in advance, any help much appreciated.

The "normal" approach would be to use a union all query as a subquery to create a set of weeks and reasons, however Access doesn't support this, but what you can do that should work is to first define a query to make the union and then use that query as a source for the "main" query.
So the first query would be
SELECT datepart("ww",datelogged) as week, Reason from calls
UNION ALL
SELECT datepart("ww",datelogged), [Reason 2] from calls;
Save this as UnionQuery and make another query mainQuery:
SELECT uq.week, rl.reason, Count(*) AS Total
FROM UnionQuery AS uq
INNER JOIN reasonlookup AS rl ON uq.reason = rl.id
GROUP BY uq.week, rl.reason;

You can use a Union query to append individual Group By Aggregate queries for both Reason and Reason 2:
SELECT DatePart("ww", Calls.DateLogged) As WeekNum, ReasonLookup.Reason,
Sum(Calls.ID) As [Total]
FROM Calls
INNER JOIN Calls.Reason = ReasonLookup.ID
GROUP BY DatePart("ww", Calls.DateLogged) As WeekNum, ReasonLookup.Reason;
UNION
SELECT DatePart("ww", Calls.DateLogged) As WeekNum, ReasonLookup.Reason,
Sum(Calls.ID) As [Total]
FROM Calls
INNER JOIN Calls.[Reason 2] = ReasonLookup.ID
GROUP BY DatePart("ww", Calls.DateLogged) As WeekNum, ReasonLookup.Reason;
DatePart() outputs the specific date's week number in the calendar year. Also, UNION as opposed to UNION ALL prevents duplicate rows from appearing.

Writing a query to include certain values but exclude others when looking for a latest time period

I am trying to write a query that looks for a people that have a certain code with the latest period (year) but not if they have another code with that latest period(year). I'll be explicit just so my example makes sense.
I want people who have the code A1,A2,A3,A4,A5 but not AG,AP,AQ. There are people who have an A1 code for a period (like 2014) and an AG code for a the same period. I'd like to exclude them. Not everyone has a code so the field value could be NULL.
Is there a way to express this in a different way (i.e. less characters) than the way I did?
SELECT
people.firstName
FROM
people
WHERE EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (SELECT MAX(period) FROM codes codes2 WHERE codes2.people_id = codes.people_id)
AND code LIKE 'A[1-5]'
)
AND NOT EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (
SELECT MAX(period)
FROM codes codes2
WHERE codes2.people_id = codes.people_id
)
AND code LIKE 'A[GPQ]'
)
Schema is as follows:
People
id (PK)
firstName
Codes
people_id (FK) many to one relation with People table
code (e.g. "A1", "A2", "AG")
period (e.g. "2013", "2014")

There are so many ways you could do that, I'm not an SQL expert but I can't see your query being too bad, if you want to try and reduce the number of sub-queries you could consider using the GROUP BY clause along with a SUM Aggregate function in a HAVING clause.
I started updating your code as follows:
SELECT
people.firstName
FROM
people
LEFT JOIN codes AS a15 ON a15.people_id = people.id AND a15.code LIKE 'A[1-5]'
LEFT JOIN codes AS agpq ON agpq.people_id = people.id AND agpq.code LIKE 'A[GPQ]'
GROUP BY
people.firstName
HAVING
SUM(CASE WHEN a15.code IS NULL THEN 0 ELSE 1 END) > 0
AND SUM(CASE WHEN agpq.code IS NULL THEN 0 ELSE 1 END) = 0
This however doesn't take into account anything to do with period specific requirements described. You could add the period to the GROUP BY clause or add it to a WHERE or one of the JOIN constraints but I'm not quite sure from your description exactly what you're after (I don't believe this is through any fault of your own, I just can't personally align the code provided to the description).
I would also like to point out that the SUM functions above will not give an accurate count of the number of matching codes. This is because if both A[GPQ] and A[1_5] return at least one row, the number returned by each constraint will be multiplied by the number returned for the other, it can however be used to determine if there are "any" returned items as if the criteria is matched it will have a SUM(...) > 0
I'm sure a more experienced SQL Developer / DBA will be able to poke many holes in my proposed query but it might give them or someone else something to work from and hopefully gives you ideas for alternatives to using sub-queries.

SQL: Are we able to combine multiple conditions into one result column?

I'm standardizing & migrating some of our old data from an ugly heap of a DB into something more useful.
BTW, the source DB is MSSQL/SQL-Server.
(and it's plagued with column and table sprawl, very little use of FK's, and where there are FK's, they are often broken because some unique ID's change in certain UI use cases !!! :)
Right now, I'm collecting some customer data. There are two main categories of customer here. They are not mutually exclusive. The terminology used is rather cryptic, so I'll simplify it here to User, and Reseller.
As it stands right now, there are 4 (yes, four) columns in the source table which can be used to determine whether a customer is a Reseller or a User: is_user, isuser, is_reseller, and cust_type.
The cust_type column stores hand-typed values, so we face variations in spelling/abbreviations too :)
I already have some earlier queries which select each of the customer types separately, but now I'd like to consolidate those into one qry where the results have a simple is_user bool and a simple is_reseller bool.
Here's an example of my qry for users (trimmed for brevity, functionality intact):
SELECT TOP 1 name, email, phone
FROM customer
WHERE name = '{$row['cust']}'
AND ( isuser=1
OR is_user=1
OR cust_type LIKE 'USER%'
OR cust_type = 'DIRECT USER'
)
My goal is to eliminate that condition, and replace it with an additional column in the result set to indicate whether is_user is TRUE/FALSE.
How doable is this in SQL (specifically the MS flavor)?

You can use a CASE expression but you need to make sure your WHERE clause is less exclusive (if it's the same set of criteria then every row in the result will be true).
SELECT TOP 1 name, email, phone, is_user = CASE
WHEN ( isuser = 1
OR is_user = 1
OR cust_type LIKE 'USER%'
OR cust_type = 'DIRECT USER')
THEN 1 ELSE 0 END,
is_reseller = CASE
WHEN (...any condition that makes a customer a reseller...)
THEN 1 ELSE 0 END
FROM dbo.customer
...WHERE...

Splitting one table based on criteria and comparing

I'm not quite sure on the best way to phrase this particular query, so I hope the title is adequate, however, I will attempt to describe what it is I need to be able to understand how to do. Just to clarify, this is for oracle sql.
We have a table called assessments. There are different kinds of assessments within this table, however, some assessments should follow others in a logical order and within set time frames. The problems come in when a client has multiple assessments of the same type, as we have to use a fairly inefficient array formula in excel to identify which 'full' assessment corresponds with the 'initial' assessment.
I have an earlier query that was resolved on this site (Returning relevant date from multiple tables including additional table info) which I believe includes a lot of the logic for what is required (particularly in identifying a corresponding event which has occurred within a specified timeframe). However, whilst that query pulls data from 3 seperate tables (assessments, events, responsiblities), I now need to create a query that generates a similar outcome but pulling from 1 main table and a 2nd table to return worker information. I thought the most logical way would be be to create a query that looks at the assessment table with one type of assessment, and then joins to the assessment table again (possibly a temporary table?) with assessment type that would follow the initial one.
For example:
Table 1 (Assessments):
Client ID Assessment Type Start End
P1 1 Initial 01/01/2012 05/01/2012
Table 2 (Assessments temp?):
Client ID Assessment Type Start End
P1 2 Full 12/01/2012
Table 3:
ID Worker Team
1 Bob Team1
2 Lyn Team2
Result:
Client ID Initial Start Initial End Initial Worker Full Start Full End
P1 1 01/01/2012 05/01/2012 Bob 12/01/2012
So table 1 and table 2 draw from the same table, except it's bringing back different assessments. Ideally, there'd be a check to make sure that the 'full' assessment started within X days of the end of the 'initial' assessment (similar to the 'likely' check in the previous query mentioned earlier). If this can be achieved, it's probably worth mentioning that I'd also be interested in expanding this to look at multiple assessment types, as roughly in the cycle a client could be expected to have between 4 or 5 different types of assessment. Any pointers would be appreciated, I've already had a great deal of help from this community which is very valuable.
Edit:
Edited to include solution following MBs advice.
Select
*
From(
Select
I.ASM_SUBJECT_ID as PNo,
I.ASM_ID As IAID,
I.ASM_QSA_ID as IAType,
I.ASM_START_DATE as IAStart,
I.ASM_END_DATE as IAEnd,
nvl(olm_bo.get_ref_desc(I.ASM_OUTCOME,'ASM_OUTCOME'),'') as IAOutcome,
C.ASM_ID as CAID,
C.ASM_QSA_ID as CAType,
C.ASM_START_DATE as CAStart,
C.ASM_END_DATE as CAEnd,
nvl(olm_bo.get_ref_desc(C.ASM_OUTCOME,'ASM_OUTCOME'),'') as CAOutcome,
ROUND(C.ASM_START_DATE -I.ASM_START_DATE,0) as "Likely",
row_number() over(PARTITION BY I.ASM_ID
ORDER BY
abs(I.ASM_START_DATE - C.ASM_START_DATE))as "Row Number"
FROM
O_ASSESSMENTS I
left join O_ASSESSMENTS C
on I.ASM_SUBJECT_ID = C.ASM_SUBJECT_ID
and C.ASM_QSA_ID IN ('AA523','AA1326') and
ROUND(C.ASM_START_DATE - I.ASM_START_DATE,0) >= -2
AND
ROUND(C.ASM_START_DATE - I.ASM_START_DATE,0) <= 25
and C.ASM_OUTCOME <>'ABANDON'
Where I.ASM_QSA_ID IN ('AA501','AA1323')
AND I.ASM_OUTCOME <> 'ABANDON'
AND
I.ASM_END_DATE >= '01-04-2011') WHERE "Row Number" = 1

You can access the same table multiple times in a given query in SQL, simply by using table aliases. So one way of doing this would be:
select i.client,
i.id initial_id,
i.start initial_start,
i.end initial_end,
w.worker initial_worker,
f.id full_id,
f.start full_start,
f.end full_end
from assessments i
join workers w on i.id = w.id
left join assessments f
on i.client = f.client and
f.assessment_type = 'Full' and
f.start between i.end and i.end + X
/* replace X with appropriate number of days */
where i.assessment_type = 'Initial'
Note: column names such as end (that are reserved words in Oracle SQL) should normally be double-quoted, but from the previous question it looks as though these are simplified versions of the actual column names.

From your post, I assume that you're using Oracle here (as I see "Oracle" in the question).
In terms of "temp" tables, Views come right to mind. An Oracle View can give you different looks of a table which is what it sounds like you're looking for with different kinds of assessments.
Don Burleson is a good source for anything Oracle related and he gives some tips on Oracle Views at http://www.dba-oracle.com/concepts/views.htm

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas