SQL query to append a column which counts the # instance of a particular ID

SQL query to append a column which counts the # instance of a particular ID - sql

I run analytics for a call center that more or less cold calls leads our customers give us from their CRM to sell/upsell software. I'm trying to figure out a way to calculate pick up rates based on the 1st call, 2nd call, 3rd call, etc. All data associated with our call center is logged in a table in PostgreSQL, each contact has a unique ID and each call has a unique ctc_id. My thought is to order the table by created_at/ctc_id ASC, then append a calculated column to count the number-th occurrence of the contact ID for each contact_id in a dial session/campaign [let's call it nTH_Call]. With that, I could group by nTH_Call, and based on a pickup vs voicemail (or any other negative call disposition), calculate a pickup rate based on N-th attempt to call each contact. An example table below hopefully illustrates my question:
ctc_id
call_time
contact_id
nTH_Call
347723
2021-04-06 14:14:13.287698
163836
1
354172
2021-04-12 09:29:02.564833
81153
1
354174
2021-04-12 09:29:09.759067
81153
2
367971
2021-04-21 11:57:26.050931
81153
3
369961
2021-04-22 11:45:40.009285
163836
2
374106
2021-04-26 13:41:10.687522
163836
3
In a way, I'm also trying to answer - "It takes about X.X dials (on average) to get a customer to pickup on a cold call..." etc. as well as "by the 2nd call, our average pick up rates are X.X% and by the 3rd call through, our average pick up rates are X.X%..." etc.
If it were in Excel and I had the data in a table, I'd do this --
With a =COUNTIF($C$2:C2,C2) in cell D2, and drag the formula down, then pivot the data however I'd want. But I'm trying to illustrate the rates real time on a live dashboard programmatically.
Any ideas or radically different methods I can try to answer my questions? What would I need to do in my SQL statement to make this happen?
Any help would be appreciated.
Sample Query:
SELECT
ctc_id,
call_time,
contact_id,
[contact_id OCCURRENCE order count column here!!]
FROM
call_table
ORDER BY
call_time ASC

You can calculate the nth_call order by contact in this way:
SELECT
ctc_id,
call_time,
contact_id,
ROW_NUMBER() OVER(partition by contact_id order by call_time) as nth_call
FROM
call_table
ORDER BY
call_time ASC

Related

How to calculate a bank's deposit growth from one call report to the next, as a percentage?

I downloaded the entire FDIC bank call reports dataset, and uploaded it to BigQuery.
The table I currently have looks like this:
What I am trying to accomplish is adding a column showing the deposit growth rate since the last quarter for each bank:
Note:The first reporting date for each bank (e.g. 19921231) will not have a "Quarterly Deposit Growth". Hence the two empty cells for the two banks.
I would like to know if a bank is increasing or decreasing its deposits each quarter/call report (viewed as a percentage).
e.g. "On their last call report (19921231)First National Bank had deposits of 456789 (in 1000's). In their next call report (19930331)First National bank had deposits of 567890 (in 1000's). What is the percentage increase (or decrease) in deposits"?
This "_%_Change_in_Deposits" column would be displayed as a new column.
This is the code I have written so far:
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1992.All_Reports_19921231_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
UNION ALL
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1993.All_Reports_19930331_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
The table looks like this:
Additional notes:
I would also like to view the last column (SFR2TotalLoansRatio) as a percentage.
This code runs correctly, however, previously I was getting a "division by zero" error when attempting to run 50,000 rows (1992 to the present).

Addressing each of your question individually.
First) Retrieving SFR2TotalLoanRatio as percentage, I assume you want to see 9.988% instead of 0.0988 in your results. Currently, in BigQuery you can achieve this by casting the field into a STRING then, concatenating the % sign. Below there is an example with sample data:
WITH data as (
SELECT 0.0123 as percentage UNION ALL
SELECT 0.0999 as percentage UNION ALL
SELECT 0.3456 as percentage
)
SELECT CONCAT(CAST(percentage*100 as String),"%") as formatted_percentage FROM data
And the output,
Row formatted_percentage
1 1.23%
2 9.99%
3 34.56%
Second) Regarding your question about the division by zero error. I am assuming IEEE_DIVIDE(arg1,arg2) is a function to perform the division, in which arg1 is the divisor and arg2 is the dividend. Therefore, I would adivse your to explore your data in order to figured out which records have divisor equals to zero. After gathering these results, you can determine what to do with them. In case you decide to discard them you can simply add within your WHERE statement in each of your JOINs: AL.lnlsnet = 0. On the other hand, you can also modify the records where lnlsnet = 0 using a CASE WHEN or IF statements.
UPDATE:
In order to add this piece of code your query, you u have to wrap your code within a temporary table. Then, I will make two adjustments, first a temporary function in order to calculate the percentage and format it with the % sign. Second, retrieving the previous number of deposits to calculate the desired percentage. I am also assuming that cert is the individual id for each of the bank's clients. The modifications will be as follows:
#the following function MUST be the first thing within your query
CREATE TEMP FUNCTION percent(dep INT64, prev_dep INT64) AS (
Concat(Cast((dep-prev_dep)/prev_dep*100 AS STRING), "%")
);
#followed by the query you have created so far as a temporary table, notice the the comma I added after the last parentheses
WITH data AS(
#your query
),
#within this second part you need to select all the columns from data, and LAG function will be used to retrieve the previous number of deposits for each client
data_2 as (
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans,Deposits, loans_and_leases, SFR2TotalLoanRatio,
CASE WHEN cert = lag(cert) OVER (PARTITION BY id ORDER BY d) THEN lag(Deposits) OVER (PARTITION BY id ORDER BY id) ELSE NULL END AS prev_dep FROM data
)
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans,Deposits, loans_and_leases, SFR2TotalLoanRatio, percent(Deposits,prev_dep) as dept_growth_rate FROM data_2
Note that the built-in function LAG is used together with CASE WHEN in order to retrieve the previous amount of deposits per client.

SQL Time Series Homework

Imagine you have this two tables.
a) streamers: it contains time series data, at a 1-min granularity, of all the channels that broadcast on
Twitch. The columns of the table are:
username: Channel username
timestamp: Epoch timestamp, in seconds, corresponding to the moment the data was captured
game: Name of the game that the user was playing at that time
viewers: Number of concurrent viewers that the user had at that time
followers: Number of total followers that the channel had at that time
b) games_metadata: it contains information of all the games that have ever been broadcasted on Twitch.
The columns of the table are:
game: Name of the game
release_date: Timestamp, in seconds, corresponding to the date when the game was released
publisher: Publisher of the game
genre: Genre of the game
Now I want the Top 10 publishers that have been watched the most during the first quarter of 2019. The output should contain publisher and hours_watched.
The problem is I don't have any database, I created one and inputted some values by hand.
I thought of this query, but I'm not sure if it is what I want. It may be right (I don't feel like it is ), but I'd like a second opinion
SELECT publisher,
(cast(strftime('%m', "timestamp") as integer) + 2) / 3 as quarter,
COUNT((strftime('%M',`timestamp`)/(60*1.0)) * viewers) as total_hours_watch
FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game
WHERE quarter = 3
GROUP BY publisher,quarter
ORDER BY total_hours_watch DESC

Looks about right to me. You don't need to include quarter in the GROUP BY since the where clause limits you to only one quarter. You can modify the query to get only the top 10 publishers in a couple of ways depending on the SQL server you've created.
For SQL Server / MS Access modify your select statement: SELECT TOP 10 publisher, ...
For MySQL add a limit clause at the end of your query: ... LIMIT 10;

Microsoft Access query to retrieve random transactions with seemingly impossible criteria

I was asked to assist with developing a report to retrieve a 25% sample of random transactions within a specific date range. I am not a programmer but I was able to devise the following fairly quickly:
SELECT TOP 25 PERCENT account.CID, account.ACCT, account.NAME, log.DATE, log.action_txt, log.field_nm, log.from_data, log.to_data, log.tran_id, log.init
FROM account INNER JOIN log ON account.ACCT = log.ACCT
GROUP BY account.CID, account.ACCT, account.NAME, log.DATE, log.action_txt, log.field_nm, log.from_data, log.to_data, log.tran_id, log.init
HAVING (((log.DATE) Between #2/7/2018# And #6/15/2018#) AND ((log.action_txt)="mod" Or (log.action_txt)="del") AND ((log.init)="J1X"
ORDER BY log.tran_dt
This returns 25% of the records within the date range. Each record row is unique but each account number potentially has multiple records on each day. In some cases the records have the same date and tran_id as well.
Upon further discussion with the requester, he actually wants to see all of the transactions for 25% of the accounts that have activity on each day within the date range. Thus if there were 100 accounts on 3/1/2018 with records in this table, he wants to see all of the transactions for 25 of those accounts; if there were 60 accounts on 3/2/2018 with records in this table, he wants to see all of the transactions for 15 of those accounts; and so on.
I was thinking that an Access module would work best in this scenario as I believe there are multiple parts to this. I figured that I need a function to loop through the date range and for each day:
1. Count the account numbers only one time
2. Return all of the transactions for 25% of the total accounts
But as I mentioned, I am not a programmer and I am exhausted from searching possible solutions for the many parts.

I think the key to your question is that you only really need a pseudo random selection of results for your report. So you can force the Random number generator to reorder your results based on a value in the record and the current time.
Something like this should work - I assume your actiontxt field is a text field and pull out the length of each field and apply that to current date/time to create a pseudo random number that can be sorted.
All I really do is change your ORDER BY line
See if this works for you
SELECT TOP 25 PERCENT
account.CID, account.ACCT, account.NAME, log.DATE, log.action_txt, log.field_nm, log.from_data,
log.to_data, log.tran_id, log.init
FROM account
INNER JOIN log ON account.ACCT = log.ACCT
GROUP BY account.CID, account.ACCT, account.NAME, log.DATE, log.action_txt, log.field_nm, log.from_data, log.to_data, log.tran_id, log.init
HAVING (((log.DATE) Between #2/7/2018# And #6/15/2018#) AND ((log.action_txt)="mod" Or (log.action_txt)="del") AND ((log.init)="J1X"
ORDER BY Rnd(CLng(Now()*Len(log.action_txt))-(Now()*Len(log.action_txt)));
Modified similar idea from other StackOverflow question and response

Access query, grouped sum of 2 columns where either column contains values

Another team has an Access database that they use to track call logs. It's very basic, really just a table with a few lookups, and they enter data directly in the datasheet view. They've asked me to assist with writing a report to sum up their calls by week and reason and I'm a bit stumped on this problem because I'm not an Access guy by any stretch.
The database consists of two core tables, one holding the call log entries (Calls) and one holding the lookup list of call reasons (ReasonsLookup). Relevant table structures are:
Calls
-----
ID (autonumber, PK)
DateLogged (datetime)
Reason (int, FK to ReasonLookup.ID)
Reason 2 (int, FK to ReasonLookup.ID)
ReasonLookup
------------
ID (autonumber PK)
Reason (text)
What they want is a report that looks like this:
WeekNum Reason Total
------- ---------- -----
10 Eligibility Request 24
10 Extension Request 43
10 Information Question 97
11 Eligibility Request 35
11 Information Question 154
... ... etc ...
My problem is that there are TWO columns in the Calls table, because they wanted to log a primary and secondary reason for receiving the call, i.e. someone calls for reason A and while on the phone also requests something under reason B. Every call will have a primary reason column value (Calls.Reason not null) but not necessarily a secondary reason column value (Calls.[Reason 2] is often null).
What they want is, for each WeekNum, a single (distinct) entry for each possible Reason, and a Total of how many times that Reason was used in either the Calls.Reason or Calls.[Reason 2] column for that week. So in the example above for Eligibility Request, they want to see one entry for Eligibility Request for the week and count every record in Calls that for that week that has Calls.Reason = Eligibility Request OR Calls.[Reason 2] = Eligibility Request.
What is the best way to approach a query that will display as shown above? Ideally this is a straight query, no VBA required. They are non-technical so the simpler and easier to maintain the better if possible.
Thanks in advance, any help much appreciated.

The "normal" approach would be to use a union all query as a subquery to create a set of weeks and reasons, however Access doesn't support this, but what you can do that should work is to first define a query to make the union and then use that query as a source for the "main" query.
So the first query would be
SELECT datepart("ww",datelogged) as week, Reason from calls
UNION ALL
SELECT datepart("ww",datelogged), [Reason 2] from calls;
Save this as UnionQuery and make another query mainQuery:
SELECT uq.week, rl.reason, Count(*) AS Total
FROM UnionQuery AS uq
INNER JOIN reasonlookup AS rl ON uq.reason = rl.id
GROUP BY uq.week, rl.reason;

You can use a Union query to append individual Group By Aggregate queries for both Reason and Reason 2:
SELECT DatePart("ww", Calls.DateLogged) As WeekNum, ReasonLookup.Reason,
Sum(Calls.ID) As [Total]
FROM Calls
INNER JOIN Calls.Reason = ReasonLookup.ID
GROUP BY DatePart("ww", Calls.DateLogged) As WeekNum, ReasonLookup.Reason;
UNION
SELECT DatePart("ww", Calls.DateLogged) As WeekNum, ReasonLookup.Reason,
Sum(Calls.ID) As [Total]
FROM Calls
INNER JOIN Calls.[Reason 2] = ReasonLookup.ID
GROUP BY DatePart("ww", Calls.DateLogged) As WeekNum, ReasonLookup.Reason;
DatePart() outputs the specific date's week number in the calendar year. Also, UNION as opposed to UNION ALL prevents duplicate rows from appearing.

Select Average of Top 25% of Values in SQL

I'm currently writing a stored procedure for my client to populate some tables that will be used to generate SSRS reports later on. Some of the data is based on specific stock formulas that are run on each of their clients' quarterly data (sent to them by their clients). The other part of the data is generated by comparing those results against those from other, similar sized clients. One of the things that they want tracked in their reports is the average of the top 25% of formula results for that particular comparison group.
To give a better picture of it, imagine the following fields that I have in a temp table:
FormulaID int
Value decimal (18,6)
I want to do the following: Given a specific FormulaID return the average of the top 25% of Value.
I know how to take an average in SQL, but I don't know how to do it against only the top 25% of a specific group.
How would I write this query?

I guess you can do something like this...
SELECT AVG(Q.ColA) Avg25Prec
FROM (
SELECT TOP 25 Percent ColA
FROM Table_Name
ORDER BY SomeCOlumn
) Q

Here's what I did, given the table shown above:
select AVG(t.Value)
from (select top 25 percent Value
from #TempGroupTable
where FormulaID = #PassedInFormulaID
order by Value desc) as t
The desc must be there, because the percent command will not actually do comparisons. It will just simply grab the first x number of records, with x being equal to 25% of the count of records it's querying. Therefore, the order by Value desc line then will grab the top 25% records which have the highest Value, and then sends that info to be averaged.
As a side note to all of this, this also means that if you wanted to grab the bottom 25% instead, or if your formula results are like a golf score (i.e. lowest is the best), all you would need to do is remove the desc part and you would be good to go.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas