SQL - Average number of records within a time period

I'm trying to compile some lifetime value information for customers within one of our databases.
We have an MS SQL Server database which stores all of our customer/transactional information.
My issue is that I don't have much experience with MS SQL Server (or SQL in general). I'd like to be able to run a query against the database that pulls the average number of loans and the average revenue, based on three criteria:
1.) Loans are only counted if they are 'approved'
2.) Loans from a customer_id are only counted if that customer's first loan (identified by the date_created field) is on or after a certain 'mm/yyyy'
3.) I'm able to specify how many months after the first 'mm/yyyy' the loans / revenue are tallied into the average
Here is what the database would look like:
customer_id | loan_status | date_created | revenue
111 | 'approved' | 2010-06-20 17:17:09 | 100.00
222 | 'approved' | 2010-06-21 09:54:43 | 255.12
333 | 'denied' | 2011-06-21 12:47:30 | NULL
333 | 'approved' | 2011-06-21 12:47:20 | 56.87
222 | 'denied' | 2011-06-21 09:54:48 | NULL
222 | 'approved' | 2011-06-21 09:54:18 | 50.00
111 | 'approved' | 2011-06-20 17:17:23 | 100.00
... loads' of records ...
555 | 'approved' | 2012-01-02 09:08:42 | 24.70
111 | 'denied' | 2012-01-05 02:10:36 | NULL
666 | 'denied' | 2012-02-05 03:31:16 | NULL
555 | 'approved' | 2012-02-17 09:32:26 | 197.10
777 | 'approved' | 2012-04-03 18:28:45 | 300.50
777 | 'approved' | 2012-06-28 02:42:01 | 201.80
555 | 'approved' | 2012-06-21 22:16:59 | 10.00
666 | 'approved' | 2012-09-30 01:17:20 | 50.00
If I wanted to find the average count of approved transactions, and the average revenue per approved transaction, for all customers whose first loan was in or after 2012-01, and for a period of 4 months after that, how would I go about querying the database?
Any help is greatly appreciated.

Something like this (there may be a few typos here and there)...
You could first calculate each customer's first approved loan date (here yourtable stands in for your table's name):
select customer_id, min(date_created) as min_date from yourtable where loan_status = 'approved' group by customer_id
Then you can join to it:
select t.customer_id, count(t.date_created) as loan_count, avg(t.revenue) as avg_revenue
from yourtable t
join (
    select customer_id, min(date_created) as min_date
    from yourtable
    where loan_status = 'approved'
    group by customer_id
) s on t.customer_id = s.customer_id
where t.date_created between s.min_date and DATEADD(month, 4, s.min_date)
  and t.loan_status = 'approved'
group by t.customer_id
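If a single pair of overall figures is wanted (as the question asks) rather than per-customer numbers, the per-customer query can be wrapped in an outer aggregate. A rough sketch, using the same yourtable placeholder and adding the first-loan filter from criterion 2 (this is my extension, not part of the original answer):

select AVG(cast(per_customer.loan_count as decimal(10, 2)))          as avg_loans_per_customer,
       SUM(per_customer.total_revenue) / SUM(per_customer.loan_count) as avg_revenue_per_loan
from (
    select t.customer_id,
           count(*)       as loan_count,
           sum(t.revenue) as total_revenue
    from yourtable t
    join (
        select customer_id, min(date_created) as min_date
        from yourtable
        where loan_status = 'approved'
        group by customer_id
        having min(date_created) >= '20120101'               -- first approved loan in/after Jan 2012
    ) s on t.customer_id = s.customer_id
    where t.loan_status = 'approved'
      and t.date_created < DATEADD(month, 4, s.min_date)     -- 4 months from each customer's first loan
    group by t.customer_id
) per_customer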

Rename tbl to your table name.
Specify dates in the format YYYYMMDD.
select fl.customer_id, AVG(t.revenue) average_revenue
from
(
    select customer_id
    from tbl
    group by customer_id
    having min(date_created) >= '20120101'
) fl
join tbl t on t.customer_id = fl.customer_id
where t.loan_status = 'approved'
and t.date_created < '20120501' -- NOT including May the first, so Jan through Apr (4 months)
group by fl.customer_id
If you mean 4 months after each customer's first loan, leave me a comment, state whether it's 4 calendar months (e.g. 15-Jan to 15-May) or up to the last day of the 4th month (15-Jan to 30-Apr), and I'll update the answer.
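For reference, a rough sketch of that per-customer variant, counting 4 calendar months from each customer's own first loan (my guess at what was meant, not the author's updated answer; tbl is still the placeholder table name):

select fl.customer_id,
       count(*)       loan_count,
       AVG(t.revenue) average_revenue
from (
    select customer_id, min(date_created) as first_loan
    from tbl
    group by customer_id
    having min(date_created) >= '20120101'
) fl
join tbl t on t.customer_id = fl.customer_id
where t.loan_status = 'approved'
  and t.date_created < DATEADD(month, 4, fl.first_loan)   -- e.g. a 15-Jan first loan counts loans before 15-May
group by fl.customer_id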

Related

SQL Day-over-Day count miscalculation

I'm encountering a bug in my SQL code that calculates the day-over-day (DoD) count difference. This table (curr_day) summarizes the count of trades on any business day (i.e. excluding weekends and government-mandated holidays) and is joined to a similar table (prev_day) that is day-lagged (the previous day). The join is based on the day's rank; for example, the first day in the curr_day table is Jan-01 and its rank is 1, while the first day (rank 1) in the prev_day table is Dec-31.
My issue is that the trade count does not seem to produce positive changes (see table below), only negative or zero changes. The problem does not affect other fields that calculate the value of a trade, only the count of trades on a given day.
Sample of query
with curr_day as (select GROUP, COUNT from DB where DATE is not HOLIDAY),
prev_day as (select rank() over (partition by GROUP order by DATE) as RANK, GROUP, DATE, COUNT
             from curr_day where DATE is not HOLIDAY)
select ID, DATE, curr_day.COUNT - prev_day.COUNT
from (select rank() over (partition by curr_day.GROUP order by curr_day.DATE) as RANK
      from curr_day
      where curr_day.DATE >= (select min(curr_day.DATE) + 1 from curr_day))
left join prev_day on curr_day.RANK = prev_day.RANK and curr_day.GROUP = prev_day.GROUP
;
Output table
Date | Group | Count | DoD_Cnt_Diff
2020-12-31 | A | 1 | 0
2021-01-01 | A | 1 | 0
2021-01-02 | A | 0 | -1
2021-01-03 | A | 1 | (null)
2021-01-04 | A | 0 | -1
2021-01-05 | A | 0 | 0
2021-12-31 | B | 0 | 0
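For comparison, a minimal sketch of computing the same day-over-day difference with LAG() instead of a rank-based self-join, assuming a business-day summary table trade_counts(group_id, trade_date, cnt) (these names are mine, not taken from the question):

-- Hypothetical LAG-based alternative (table/column names assumed):
-- LAG returns the previous business day's count within each group, so positive,
-- negative and zero differences all fall out of a single pass.
select group_id,
       trade_date,
       cnt,
       cnt - lag(cnt) over (partition by group_id order by trade_date) as dod_cnt_diff
from trade_counts
order by group_id, trade_date;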

How to create a chart to get the number of accounts for a customer by period in SQL

I have an issue: we want to build a query that returns a customer's number of accounts by period.
For each account I have: accountid, customerid, createddate and deleteddate
select accountid,customerid, createddate , deleteddate from account
where customerid = 1
This customer has 4 accounts:
accountid | customerid | createddate | deleteddate
2145 | 6641 | 2018-12-12 10:39:16.457 | 2020-03-26 00:00:12.540
2718 | 6641 | 2020-02-11 15:04:51.643 | 2020-03-26 00:00:04.947
2825 | 46818 | 2020-04-14 15:28:30.400 | 2020-04-29 15:58:30.651
2851 | 46818 | 2020-06-05 12:41:45.790 | NULL
So I want a chart for the current year showing the customer's number of accounts, not for each month but for each modification.
For example, on 02/01/2020 I would have 1 account,
and on 03/01/2020 I would have 0 accounts.
Is it possible to do that (or something like it) in SQL? And if so, how can I do it?
get the nb of account of the customer not for each month but for each modification
Is this what you want?
select
x.customer_id,
x.modifdate,
sum(x.cnt) over(partition by x.customer_id order by x.modifdate) no_active_accounts
from mytable t
cross apply (
values (customer_id, createddate, 1), (customer_id, deleteddate, -1)
) as x(customer_id, modifdate, cnt)
where modifdate is not null
For each customer, this generates one record every time an account is created or deleted, with the modification date and the running count of active accounts.
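Since the chart should cover the current year only, here is a rough adaptation of the query above that shows only this year's modification events (my sketch, assuming SQL Server and the account table/column names from the question). The running count is still computed over the full history, so accounts created in earlier years are counted correctly:

select customerid, modifdate, no_active_accounts
from (
    select
        x.customerid,
        x.modifdate,
        sum(x.cnt) over (partition by x.customerid order by x.modifdate) as no_active_accounts
    from account t
    cross apply (
        values (t.customerid, t.createddate,  1),   -- +1 when an account is created
               (t.customerid, t.deleteddate, -1)    -- -1 when an account is deleted
    ) as x(customerid, modifdate, cnt)
    where x.modifdate is not null
) events
where year(modifdate) = year(getdate())             -- display only this year's changes
order by customerid, modifdate;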

Oracle SQL Join Data Sequentially

I am trying to track the usage of material with my SQL. There is no way in our database to link when a part is used to the order it originally came from. A part simply ends up in a bin after an order arrives, and then usage of parts basically just creates a record for the number of parts used at a time of transaction. I am attempting to, as best I can, link usage to an order number by summing over the data and sequentially assigning it to order numbers.
My sub queries have gotten me this far. Each order number is received on a date. I then join the usage table records based on the USEDATE needing to be equal to or greater than the RECEIVEDATE of the order. The data produced by this is as such:
| ORDERNUM | PARTNUM | RECEIVEDATE | ORDERQTY | USEQTY | USEDATE |
|----------|----------|-------------------------|-----------|---------|------------------------|
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 11/18/2016 1:40:55 PM |
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 3 | 12/26/2016 2:19:32 PM |
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 1 | 11/18/2016 1:40:55 PM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 3 | 12/26/2016 2:19:32 PM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 3 | 12/26/2016 2:19:32 PM |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 1 | 1/3/2017 8:31:21 AM |
| 7812 | E1125 | 12/27/2016 10:56:01 AM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 1191 | E1125 | 1/5/2017 1:12:01 PM | 2 | 0 | null |
The query for the above section looks as such:
SELECT
    B.*,
    NVL(B2.QTY, '0') USEQTY,
    B2.USEDATE USEDATE
FROM <<Sub Query B>> B
LEFT JOIN USETABLE B2 ON B.PARTNUM = B2.PARTNUM AND B2.USEDATE >= B.RECEIVEDATE
My ultimate goal here is to join USEQTY records sequentially until they have filled enough ORDERQTY’s. I also need to add an ORDERUSE column that represents what QTY from the USEQTY column was actually applied to that record. Not really sure how to word this any better so here is example of what I need to happen based on the table above:
| ORDERNUM | PARTNUM | RECEIVEDATE | ORDERQTY | USEQTY | USEDATE | ORDERUSE |
|----------|----------|-------------------------|-----------|---------|------------------------|-----------|
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 11/18/2016 1:40:55 PM | 1 |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 3 | 12/26/2016 2:19:32 PM | 1 |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 2 | 12/26/2016 2:19:32 PM | 2 |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 1 | 1/3/2017 8:31:21 AM | 1 |
| 7812 | E1125 | 12/27/2016 10:56:01 AM | 1 | 0 | null | 0 |
| 1191 | E1125 | 1/5/2017 1:12:01 PM | 2 | 0 | null | 0 |
If I can get the query to pull the information like above, I will then be able to group the records together and sum the ORDERUSE column which would get me the information I need to know what orders have been used and which have not been fully used. So in the example above, if I were to sum the ORDERUSE column for each of the ORDERNUMs, orders 4412, 4111, 0393 would all show full usage. Orders 7812, 1191 would show not being fully used.
If I am reading this correctly you want to determine how many parts have been used. In your example it looks like you have 5 usages and 5 orders, coming to a total of 8 parts, with the following orders having been used:
4412 - one part - one used
4111 - one part - one used
7812 - one part - one used
0393 - three parts - two used
After a bit of hacking away I came up with the following SQL. Not sure if this works outside of your sample data, since that's the only thing I used to test, and I am no expert.
WITH data
AS (SELECT *
FROM (SELECT *
FROM sub_b1
join (SELECT ROWNUM rn
FROM dual
CONNECT BY LEVEL < 15) a
ON a.rn <= sub_b1.orderqty
ORDER BY receivedate)
WHERE ROWNUM <= (SELECT SUM(useqty)
FROM sub_b2))
SELECT sub_b1.ordernum,
partnum,
receivedate,
orderqty,
usage
FROM sub_b1
join (SELECT ordernum,
Max(rn) AS usage
FROM data
GROUP BY ordernum) b
ON sub_b1.ordernum = b.ordernum
You are looking for "FIFO" inventory accounting.
The proper data model should have two tables, one for "received" parts and the other for "delivered" or "used" parts. Each table should show an order number, a part number and quantity (received or used) for that order, and a timestamp or date-time. I model both as CTEs in my query below, but in your business they should be two separate tables. Also, a trigger or similar should enforce the constraint that a part cannot be used until it is available in stock (that is: for each part id, the total quantity used since inception, at any point in time, should not exceed the total quantity received since inception, also at the same point in time). I assume that the two input tables do, in fact, satisfy this condition, and I don't check it in the solution.
The output shows a timeline of quantity used, by timestamp, matching "received" and "delivered" (used) quantities for each part_id. In the sample data I illustrate a single part_id, but the query will work with multiple part_id's, and orders (both for received and for delivered or used) that include multiple parts (part id's) with different quantities.
with
received ( order_id, part_id, ts, qty ) as (
select '0030', '11A4', timestamp '2015-03-18 15:00:33', 20 from dual union all
select '0032', '11A4', timestamp '2015-03-22 15:00:33', 13 from dual union all
select '0034', '11A4', timestamp '2015-03-24 10:00:33', 18 from dual union all
select '0036', '11A4', timestamp '2015-04-01 15:00:33', 25 from dual
),
delivered ( order_id, part_id, ts, qty ) as (
select '1200', '11A4', timestamp '2015-03-18 16:30:00', 14 from dual union all
select '1210', '11A4', timestamp '2015-03-23 10:30:00', 8 from dual union all
select '1220', '11A4', timestamp '2015-03-23 11:30:00', 7 from dual union all
select '1230', '11A4', timestamp '2015-03-23 11:30:00', 4 from dual union all
select '1240', '11A4', timestamp '2015-03-26 15:00:33', 1 from dual union all
select '1250', '11A4', timestamp '2015-03-26 16:45:11', 3 from dual union all
select '1260', '11A4', timestamp '2015-03-27 10:00:33', 2 from dual union all
select '1270', '11A4', timestamp '2015-04-03 15:00:33', 16 from dual
),
-- End of test data; the SQL query begins below. (If you replace the test-data
-- CTEs above with your real tables, add the word WITH back at the top.)
combined ( part_id, rec_ord, rec_ts, rec_sum, del_ord, del_ts, del_sum) as (
select part_id, order_id, ts,
sum(qty) over (partition by part_id order by ts, order_id),
null, cast(null as date), cast(null as number)
from received
union all
select part_id, null, cast(null as date), cast(null as number),
order_id, ts,
sum(qty) over (partition by part_id order by ts, order_id)
from delivered
),
prep ( part_id, rec_ord, del_ord, del_ts, qty_sum ) as (
select part_id, rec_ord, del_ord, del_ts, coalesce(rec_sum, del_sum)
from combined
)
select part_id,
last_value(rec_ord ignore nulls) over (partition by part_id
order by qty_sum desc) as rec_ord,
last_value(del_ord ignore nulls) over (partition by part_id
order by qty_sum desc) as del_ord,
last_value(del_ts ignore nulls) over (partition by part_id
order by qty_sum desc) as used_date,
qty_sum - lag(qty_sum, 1, 0) over (partition by part_id
order by qty_sum, del_ts) as used_qty
from prep
order by qty_sum
;
Output:
PART_ID REC_ORD DEL_ORD USED_DATE USED_QTY
------- ------- ------- ----------------------------------- ----------
11A4 0030 1200 18-MAR-15 04.30.00.000000000 PM 14
11A4 0030 1210 23-MAR-15 10.30.00.000000000 AM 6
11A4 0032 1210 23-MAR-15 10.30.00.000000000 AM 2
11A4 0032 1220 23-MAR-15 11.30.00.000000000 AM 7
11A4 0032 1230 23-MAR-15 11.30.00.000000000 AM 4
11A4 0032 1230 23-MAR-15 11.30.00.000000000 AM 0
11A4 0034 1240 26-MAR-15 03.00.33.000000000 PM 1
11A4 0034 1250 26-MAR-15 04.45.11.000000000 PM 3
11A4 0034 1260 27-MAR-15 10.00.33.000000000 AM 2
11A4 0034 1270 03-APR-15 03.00.33.000000000 PM 12
11A4 0036 1270 03-APR-15 03.00.33.000000000 PM 4
11A4 0036 21
12 rows selected.
Notes: (1) One needs to be careful if at some moment the cumulative used quantity exactly matches the cumulative received quantity. All rows must be included in all the intermediate results, otherwise there will be bad data in the output; but this may result (as you can see in the output above) in a few rows with a "used quantity" of 0. Depending on how this output is consumed (for further processing, for reporting, etc.) these rows may be left as they are, or they may be discarded in a further outer query with the condition where used_qty > 0.
(2) The last row shows a quantity of 21 with no used_date and no del_ord. This is, in fact, the "current" quantity in stock for that part_id as of the last date in both tables - available for future use. Again, if this is not needed, it can be removed in an outer query. There may be one or more rows like this at the end of the table.
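Putting notes (1) and (2) together, a rough sketch of such an outer query, wrapping the statement above as an inline view (this is my addition, not part of the original answer):

select part_id, rec_ord, del_ord, used_date, used_qty
from (
    -- paste the full query above here (everything up to, but not including, its
    -- final ORDER BY); the dummy row below only makes this sketch parse on its own
    select cast(null as varchar2(10)) part_id, cast(null as varchar2(10)) rec_ord,
           cast(null as varchar2(10)) del_ord, cast(null as timestamp) used_date,
           cast(null as number) used_qty
    from dual where 1 = 0
) t
where used_qty > 0          -- note (1): drop the zero-quantity rows
  and del_ord is not null   -- note (2): drop the trailing "still in stock" row
order by used_date;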

How to join 2 columns as one and order by date?

I'm using SQL Server Compact Edition 4.0 and there are 2 tables called debit and credit as below.
tbl_debit
invoice | dealer | price| purchasedate
=========================================
001 | AAA | 1000 | 2/9/2016 8:46:38 PM
002 | AAA | 1500 | 2/20/2016 8:46:38 PM
tbl_credit
dealer | settlement| purchasedate
=========================================
AAA | 800 | 2/12/2016 8:46:38 PM
AAA | 400 | 2/22/2016 8:46:38 PM
I want to create a single table that should include 4 columns:
Invoice, Dealer, Amount, date
Amount should include both settlement from tbl_credit and price from tbl_debit, and the result needs to be ordered by date.
I really appreciate if anyone can help me.
Here's a script to logically approach the problem based on the limited information presented to us:
SELECT A.invoice, A.dealer, A.amount, A.purchasedate
FROM (SELECT A.invoice, A.dealer, A.price [amount], A.purchasedate
      FROM tbl_debit A
      UNION
      SELECT ' ', B.dealer, B.settlement, B.purchasedate
      FROM tbl_credit B) A
ORDER BY 4
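One design note: UNION removes duplicate rows from the combined result. If identical rows (for example two equal settlements from the same dealer at the same time) must all be kept, UNION ALL can be swapped in; a minimal variant under the same assumptions:

SELECT A.invoice, A.dealer, A.amount, A.purchasedate
FROM (SELECT A.invoice, A.dealer, A.price [amount], A.purchasedate
      FROM tbl_debit A
      UNION ALL                      -- keep duplicates instead of de-duplicating
      SELECT ' ', B.dealer, B.settlement, B.purchasedate
      FROM tbl_credit B) A
ORDER BY 4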

SQL: Select distinct sum of column with max(column)

I have a salary table like this:
id | person_id | start_date | pay
1 | 1234 | 2012-01-01 | 3000
2 | 1234 | 2012-05-01 | 3500
3 | 5678 | 2012-01-01 | 5000
4 | 5678 | 2013-01-01 | 6000
5 | 9101 | 2012-09-01 | 2000
6 | 9101 | 2014-04-01 | 3000
7 | 9101 | 2011-01-01 | 1500
and so on...
Now I want to query the sum of the salaries of a specific month for all persons of a company.
I already have the ids of the persons who worked in the specific month in the specific company, so I can do something like WHERE person_id IN (...)
I have some problems with the salaries query though. The result for e.g. the month 2012-08 should be:
10000
which is 3500+5000+1500.
So I need to find the summed up pay value (for all persons in the IN clause) for the maximum start_date <= the specific month.
I tried various INNER JOINS but it's been a long day and I can't think straight at the moment.
Any hint is highly appreciated.
You need to get the active record. The following does this by calculating the max start date before the month in question:
select sum(s.pay)
from (select person_id, max(start_date) as maxstartdate
      from salary
      where person_id in ( . . . ) and
            start_date < <first day of month of interest>
      group by person_id
     ) p join
     salary s
     on s.person_id = p.person_id and
        s.start_date = p.maxstartdate
You need to fill in the month and list of ids.
You can also do this with ranking functions, but you don't specify which SQL engine you are using.
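For reference, a rough sketch of the ranking-function variant mentioned above (my interpretation, written for a dialect with window functions such as SQL Server or PostgreSQL, using the question's 2012-08 example month and sample person ids):

-- Hypothetical ROW_NUMBER variant: take each person's latest salary row that
-- started before the month of interest, then sum the pay across those rows.
select sum(pay)
from (
    select pay,
           row_number() over (partition by person_id
                              order by start_date desc) as rn
    from salary
    where person_id in (1234, 5678, 9101)   -- the ids for the company/month
      and start_date < '2012-08-01'         -- first day of the month of interest
) latest
where rn = 1;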
You have to use GROUP BY for these things:
select person_id, sum(pay) from salary where person_id in (...) group by person_id
Maybe it will help you.