Trying to summarize loan origination by month - sql

I am trying to summarize loan originations by month from a table that contains loan-level data going back to the late 90s. Every month, the most recent loan-level data are added to the table; the month_key field identifies the most recent records. I want to group the loan origination dates by month and sum the total loan commitments originated in each month. The attached table shows how I want the query to summarize the data, and the code below is what I have written so far: it outputs data summarized by month dating back to the 90s. Thanks for the help.
Edit: JMB's solution worked. Once I added the month_key field back in, filtered for the latest month on record, and summed the original loan balance, I got the output I needed.
select SUM(INDIVIDUAL_LOAN_BALANCE) AS MONTHLY_LOAN_ORIGINATIONS, ltrim(TO_CHAR(ORIG_OBGN_DATE,'mm-yyyy'), '0') AS ORIG_MONTH
FROM LOAN_TABLE
WHERE MONTH_KEY = 202002
group by ltrim(TO_CHAR(ORIG_OBGN_DATE,'mm-yyyy'), '0')
HAVING ltrim(TO_CHAR(ORIG_OBGN_DATE,'mm-yyyy'), '0') >= '12-2018'
ORDER BY ltrim(TO_CHAR(ORIG_OBGN_DATE,'mm-yyyy'), '0') DESC
Example for how I want the query to depict the data:
How the query currently outputs the data

I would suggest leveraging the date datatype to filter, aggregate and sort: this makes things much easier and safer (notably, your having clause compares strings, and does not do what you expect). You can handle the formatting in the select clause.
select
sum(individual_loan_balance) as monthly_loan_originations,
ltrim(to_char(trunc(orig_obgn_date, 'mm'),'mm-yyyy'), '0') as orig_month
from loan_table
where orig_obgn_date >= date'2018-01-01'
group by trunc(orig_obgn_date, 'mm')
order by trunc(orig_obgn_date, 'mm')
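To see why comparing the formatted strings misbehaves, here is a minimal Python sketch. The sample dates are made up, and `orig_month_label` is a hypothetical helper that only mimics `ltrim(to_char(d, 'mm-yyyy'), '0')`; it is an illustration of the string-versus-date comparison, not the Oracle engine itself.

```python
from datetime import date

def orig_month_label(d):
    # Mimics ltrim(to_char(d, 'mm-yyyy'), '0'): month first, leading zero stripped.
    return d.strftime("%m-%Y").lstrip("0")

cutoff_label = "12-2018"
cutoff_date = date(2018, 12, 1)

# Hypothetical origination dates.
samples = [date(1999, 2, 15), date(2019, 1, 10), date(2018, 12, 5)]

# String comparison: character by character, so '2-1999' sorts after '12-2018'
# and '1-2019' sorts before it.
string_filter = [d for d in samples if orig_month_label(d) >= cutoff_label]

# Date comparison: chronological, which is what the filter is meant to do.
date_filter = [d for d in samples if d >= cutoff_date]

print(string_filter)  # keeps February 1999, drops January 2019
print(date_filter)    # keeps January 2019 and December 2018
```

The string filter lets a 1999 month through and drops January 2019, which would explain output reaching back to the 90s.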

Related

Cohort retention with SQL BigQuery

I am trying to create a retention table like the following using SQL in BigQuery, but with MONTHLY cohorts:
I have the following columns in my dataset. I am only using one table, and its name is 'curious-furnace-341507.TEST.Test_Dataset_-_Orders'
order_date   order_id   customer_id
2020-01-02   12345      6789
I do not need the new user column, and the data goes through June 2020. Ideally, I think, I would have a cohort month column that lists the January-June cohorts, and then 5 periods across.
I have tried so many different things and keep getting errors in BigQuery; I think I am approaching it all wrong. The online queries I am trying to adapt seem to use dates rather than months, which is also causing some confusion, as I think I need to truncate my date column to months in the query?
Does anyone have a go-to query that will work in BigQuery for a retention table or can help me approach this? Thanks!
This may help you:
WITH cohorts AS (
-- the first order date per customer defines the cohort
SELECT
customer_id,
MIN(DATE(order_date)) AS cohort_date
FROM `curious-furnace-341507.TEST.Test_Dataset_-_Orders`
GROUP BY 1)
SELECT
FORMAT_DATE("%Y-%m", c.cohort_date) AS cohort_mth,
t.customer_id AS cust_id,
DATE_DIFF(DATE(t.order_date), c.cohort_date, MONTH) AS order_period
FROM `curious-furnace-341507.TEST.Test_Dataset_-_Orders` t
JOIN cohorts c ON t.customer_id = c.customer_id
WHERE c.cohort_date >= DATE '2020-01-01'
AND DATE_DIFF(DATE(t.order_date), c.cohort_date, MONTH) <= 5
-- grouping collapses repeat orders by the same customer in the same period
GROUP BY 1, 2, 3
I typically do pivots and % calcs in Excel/Sheets, so this will just give you the input data you need for that.
NOTE:
This gives one row per customer per period (repeat orders within a period are ignored), so counting rows per cohort and period gives the number of unique customers who ordered in period X.
It also includes period 0 (orders placed in the cohort month itself), which you may wish to keep or exclude.
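If you would rather not do the pivot in a spreadsheet, the same step can be sketched in Python. The rows below are invented sample data in the shape the query returns (cohort_mth, cust_id, order_period), and `retention_counts` / `retention_rates` are hypothetical helper names, not part of any library.

```python
from collections import defaultdict

# Hypothetical query output: (cohort_mth, cust_id, order_period).
rows = [
    ("2020-01", 101, 0),
    ("2020-01", 101, 1),
    ("2020-01", 102, 0),
    ("2020-02", 103, 0),
    ("2020-02", 103, 2),
]

def retention_counts(rows):
    """Count unique customers per (cohort, period) and nest by cohort."""
    customers = defaultdict(set)
    for cohort, cust, period in rows:
        customers[(cohort, period)].add(cust)
    table = defaultdict(dict)
    for (cohort, period), custs in customers.items():
        table[cohort][period] = len(custs)
    return dict(table)

def retention_rates(counts):
    """Divide each period's count by the cohort size (period 0).

    Assumes every cohort has a period 0 entry, which holds when the
    cohort date is the customer's first order date.
    """
    return {
        cohort: {p: n / periods[0] for p, n in periods.items()}
        for cohort, periods in counts.items()
    }

counts = retention_counts(rows)
rates = retention_rates(counts)
print(counts)
print(rates)
```

Periods with no returning customers are simply absent from the nested dicts, so treat a missing key as 0 when laying out the final grid.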

Sum dates with different timestamps and picking the min date?

Beginner here. I want to have only one row for each delivery date but it is important to keep the hours and the minutes. I have the following table in Oracle (left):
As you can see, there are days when a certain SKU (e.g. SKU A) was delivered twice in the same day. The table on the right is the desired result. Essentially, I want the quantities that arrived on the 28th summed up, and in the Supplier_delivery column I want the earliest delivery timestamp.
I need to keep the hours and the minutes; otherwise I know I could achieve this by writing something like: SELECT SKU, TRUNC(TO_DATE(SUPPLIER_DELIVERY), 'DDD'), SUM(QTY) FROM TABLE GROUP BY SKU, TRUNC(TO_DATE(SUPPLIER_DELIVERY), 'DDD')
Any ideas?
You can use MIN():
SELECT SKU, MIN(SUPPLIER_DELIVERY), SUM(QTY)
FROM TABLE
GROUP BY SKU, TRUNC(SUPPLIER_DELIVERY);
This assumes that SUPPLIER_DELIVERY is a date and does not need to be converted to one. But it would work with TO_DATE() in the GROUP BY as well.
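A quick way to check the idea is to run it against an in-memory SQLite database from Python. This is only a stand-in for Oracle: the table name `deliveries` and the sample rows are made up, timestamps are stored as ISO text, and SQLite's `date()` plays the role of `TRUNC()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deliveries (sku TEXT, supplier_delivery TEXT, qty INTEGER)"
)
# Hypothetical rows: SKU A arrives twice on the 28th.
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?)",
    [
        ("A", "2021-05-28 08:30", 10),
        ("A", "2021-05-28 14:45", 5),
        ("A", "2021-05-29 09:00", 7),
        ("B", "2021-05-28 10:15", 3),
    ],
)

# Group on the day only, but report the earliest full timestamp in each group.
result = conn.execute(
    """
    SELECT sku, MIN(supplier_delivery), SUM(qty)
    FROM deliveries
    GROUP BY sku, date(supplier_delivery)
    ORDER BY sku, MIN(supplier_delivery)
    """
).fetchall()

for row in result:
    print(row)
```

The two deliveries of SKU A on the 28th collapse into one row carrying the 08:30 timestamp and a quantity of 15.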

SQL query to count number of checkins per month

To make a long story short, I am working on a PostgreSQL database that manages Yelp checkins. The checkin table has the attributes business_id (string), date (string in the form yyyy-mm-dd), and time (string in the form 00:00:00).
What I simply need to do is, given a business_id, I need to return a list of the total number of checkins based on just the mm (month) value.
So for instance, I need to retrieve the total checkins that were in Jan, Feb, March, April, etc, not based upon the year.
Any help is greatly appreciated. I've already considered group by clauses but I didn't know how to factor in '%mm%'.
Reiterating Gordon, class or not, storing dates and times as strings makes things harder, slower, and more likely to break. It's harder to take advantage of Postgres's powerful date math functions. Storing dates and times separately makes things even harder; you have to concatenate them together to get the full timestamp which means it will not be indexed. Determining the time between two events becomes unnecessarily difficult.
It should be a single timestamp column. Hopefully your class will introduce that shortly.
"What I simply need to do is, given a business_id, I need to return a list of the total number of checkins based on just the mm (month) value."
This is deceptively straightforward. Cast your strings to dates; fortunately, they're in ISO 8601 format, so no reformatting is required. Then use extract to pull out just the month part.
select
extract('month' from checkin_date::date) as month,
count(*)
from yelp_checkins
where business_id = ?
group by month
order by month
But there's a catch. What if there are no checkins for a business on a given month? We'll get no entry for that month. This is a pretty common problem.
If we want a row for every month, we need to generate a table with our desired months with generate_series, then left join with our checkin table. A left join ensures all the months (the "left" table) will be there even if there is no corresponding month in the join table (the "right" table).
select
months.month,
count(business_id)
from generate_series(1,12) as months(month)
left join yelp_checkins
on months.month = extract('month' from checkin_date::date)
and business_id = ?
group by months.month
order by months.month
Now that we have a table of months, we can group by that. We can't use a where business_id = ? clause or that will filter out empty months after the left join has happened. Instead we must put that as part of the left join.
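The difference between filtering in the join condition and filtering in a where clause can be checked with a small Python sketch using SQLite. This is a stand-in for Postgres, not the same engine: a recursive CTE replaces generate_series, `strftime` replaces extract, and the sample checkins are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yelp_checkins (business_id TEXT, checkin_date TEXT)")
# Hypothetical data: business b1 has checkins in January (twice) and March only.
conn.executemany(
    "INSERT INTO yelp_checkins VALUES (?, ?)",
    [
        ("b1", "2017-01-05"),
        ("b1", "2018-01-20"),
        ("b1", "2017-03-02"),
        ("b2", "2017-02-14"),
    ],
)

# {filter_keyword} is either AND (filter inside the join) or WHERE (filter after it).
QUERY = """
WITH RECURSIVE months(month) AS (
    SELECT 1 UNION ALL SELECT month + 1 FROM months WHERE month < 12
)
SELECT months.month, COUNT(c.business_id)
FROM months
LEFT JOIN yelp_checkins c
  ON months.month = CAST(strftime('%m', c.checkin_date) AS INTEGER)
 {filter_keyword} c.business_id = ?
GROUP BY months.month
ORDER BY months.month
"""

on_filter = conn.execute(QUERY.format(filter_keyword="AND"), ("b1",)).fetchall()
where_filter = conn.execute(QUERY.format(filter_keyword="WHERE"), ("b1",)).fetchall()

print(len(on_filter), on_filter[:3])
print(where_filter)
```

With the filter in the join, all twelve months come back and the empty ones count as 0; with the where clause, only January and March survive.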
Why would you store the date as a string? That is a broken data model. You should fix the data.
That said, I recommend converting to a date and truncating to the first day of the month:
select date_trunc('month', datestr::date) as yyyymm, count(*)
from t
group by yyyymm
order by yyyymm;
If you don't want these based on the year, then use extract():
select extract(month from datestr::date) as mm, count(*)
from t
group by mm
order by mm;
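The two groupings answer different questions, which a few lines of Python make concrete. The checkin strings are invented sample data; truncating to the first of the month stands in for `date_trunc('month', ...)` and `d.month` for `extract(month from ...)`.

```python
from collections import Counter
from datetime import date

# Hypothetical checkin dates stored as ISO 8601 strings.
checkins = ["2017-01-05", "2018-01-20", "2017-03-02"]
dates = [date.fromisoformat(s) for s in checkins]

# Truncate to the first day of the month: January 2017 and January 2018 stay separate.
by_year_month = Counter(d.replace(day=1) for d in dates)

# Keep only the month number: both Januaries land in the same bucket.
by_month = Counter(d.month for d in dates)

print(by_year_month)
print(by_month)
```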

SQL Select Items and link field to another table in same select

Currently I am retrieving a list of Purchase Order Lines (POL), each of which has a Due Date. For each line where the POL.Due Date is a future date, i.e. >= Current Date, I need to determine the Period Name and Financial Week the POL.Due Date falls into.
The SQL Database has a table for the Financial Period and one for the Financial Week. Each table is driven by a date.
PERIODPER: PERIOD_DATE, PERIOD, PERIOD_NAME, PERIOD_YEAR
PERIOD_WEEK: START_DATE, WEEK, YEAR
Against each report line along with the Due Date I am trying to link to each of the above tables to determine the PERIOD_NAME and WEEK for the POL.Due Date.
Where the POL.Due Date has elapsed i.e. < Current Date, I need to retrieve the PERIOD_NAME and WEEK for the Current Date.
I would like to try and do this in an SQL select as my only other option is to write a VBA report which initially retrieves all the Purchase Order Lines and then serially reads through each and links to the other tables to determine the Financial Period Name and Week Number.
I am looking for an end result something on the lines of:
PO_NUMBER PO_LINE DUE_DATE WEEK_NO PERIOD_NAME
I would appreciate any assistance on this as my SQL knowledge does not extend to what to me appears to be a complex selection.
Do you mean something like this (SQL Server syntax)?
select pol.po_number,
pol.po_line,
pol.due_date,
pw.week as week_no,
pp.period_name
from purchaseOrderLines pol
left join period_week pw on pol.due_date > GetDate()
and pw.start_date <= pol.due_date
and dateAdd(d, 7, pw.start_date) > pol.due_date
left join periodPer pp on pol.due_date > GetDate()
and pp.period_date = pol.due_date
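The question also asks for the current date's week when the due date has already elapsed. One way to think about that rule is "look up the later of the due date and today". The Python sketch below illustrates that lookup only; the week start dates and week numbers are made up, and `week_for` is a hypothetical helper, not something from the poster's schema.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical PERIOD_WEEK rows, sorted by start date.
week_starts = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15)]
week_numbers = [1, 2, 3]

def week_for(due_date, today):
    """Return the week containing max(due_date, today), or None if before the first week."""
    effective = max(due_date, today)  # an elapsed due date falls back to today
    idx = bisect_right(week_starts, effective) - 1  # last week starting on or before it
    if idx < 0:
        return None
    return week_numbers[idx]

today = date(2024, 1, 2)
print(week_for(date(2024, 1, 10), today))  # future due date: its own week
print(week_for(date(2023, 12, 1), today))  # elapsed due date: today's week
print(week_for(date(2024, 1, 8), today))   # a week's start date belongs to that week only
```

In SQL the same idea would be to join on the later of the two dates instead of on the due date alone; dates past the last listed start are assumed here to fall in the last week.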
Thanks for your response.
I tried your logic which gave me what I was looking for but upon further investigation I located another table which detailed every date up to 2025, including the week and month. I'm simply now creating a JOIN to this new table using my Due Date value.
I would just like to thank you for your time and effort in answering my query. Although I am not using your suggestion for the stated example, I am however looking at what you've presented, and it certainly gives me a few ideas for some other SELECT statements I'm currently working on.

Querying SQLITE DB for Data from One Column Based On Another Column

I hope the title of this post makes sense.
The db in question has two columns that are related to my issue: a date column that follows the format xx/xx/xxxx and a price column. What I want to do is get a sum of the prices in the price column based on the month and year in which they occurred, but that data is in the date column. Doing so will allow me to determine the total for a given month of a given year. The problem is that I have no idea how to construct a query that would do what I need. I have done some reading on the web, but I'm not really sure how to go about this. Can anyone provide some advice/tips?
Thanks for your time!
Mike
I was able to find a solution using a LIKE clause:
SELECT sum(price) FROM purchases WHERE date LIKE '11%1234%'
The "11" could be any 2-digit month and the "1234" is any 4 digit year. The % sign acts as a wildcard. This query, for example, returns the sum of any prices that were from month 11 of year 1234 in the db.
Thanks for your input!
You cannot use the built-in date functions on these date values because you have stored them formatted for displaying instead of in one of the supported date formats.
If the month and day fields always have two digits, you can use substr:
SELECT substr(MyDate, 7, 4) AS Year,
substr(MyDate, 1, 2) AS Month,
sum(Price)
FROM Purchases
GROUP BY Year,
Month
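Here is a small runnable check of the substr approach using Python's sqlite3 module. It assumes the xx/xx/xxxx values are mm/dd/yyyy with two-digit month and day; the table contents are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (mydate TEXT, price REAL)")
# Hypothetical purchases, dates assumed to be mm/dd/yyyy.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        ("11/05/2019", 10.0),
        ("11/20/2019", 5.5),
        ("12/01/2019", 2.0),
        ("11/03/2020", 1.0),
    ],
)

# Characters 7-10 hold the year, characters 1-2 the month.
totals = conn.execute(
    """
    SELECT substr(mydate, 7, 4) AS year,
           substr(mydate, 1, 2) AS month,
           SUM(price)
    FROM purchases
    GROUP BY year, month
    ORDER BY year, month
    """
).fetchall()

for row in totals:
    print(row)
```

Each month of each year gets its own total, so November 2019 and November 2020 are not mixed together.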
So, the goal is to get an aggregate grouping by the month?
select strftime('%m', mydate), sum(price)
from mytable
group by strftime('%m', mydate)
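One caveat, as far as I know: strftime only understands SQLite's supported date formats such as yyyy-mm-dd, so on xx/xx/xxxx strings it returns NULL. The sketch below shows that and one possible workaround, rebuilding an ISO date with substr first. It assumes the stored values are mm/dd/yyyy; the sample value is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# strftime on the display format: SQLite cannot parse it and yields NULL (None in Python).
raw_month = conn.execute("SELECT strftime('%m', '11/05/2019')").fetchone()[0]

# Rebuild yyyy-mm-dd from the mm/dd/yyyy pieces, then strftime works as expected.
iso_month = conn.execute(
    "SELECT strftime('%m', "
    "substr(:d, 7, 4) || '-' || substr(:d, 1, 2) || '-' || substr(:d, 4, 2))",
    {"d": "11/05/2019"},
).fetchone()[0]

print(raw_month)
print(iso_month)
```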
Look into group by