Monthly Count with Daily records - sql

I work with credit card accounts. Each day, every account adds a record to our database. There is associated data (not relevant here), but one column holds a boolean (1, 0) showing whether the account is active or not, and another holds the date of that record.
The data looks a little like this:
ACCOUNT DATA1 DATA2 ISACTIVE INSERT DATE
1234 XXX XXXX 1 5/1/2019
1234 XXX XXXX 1 5/2/2019
1234 XXX XXXX 1 5/3/2019
1234 XXX XXXX 1 5/4/2019
5678 XXX XXXX 1 5/1/2019
5678 XXX XXXX 1 5/2/2019
5678 XXX XXXX 1 5/3/2019
5678 XXX XXXX 1 5/4/2019
I am looking to get a distinct count of accounts that are active per month (based on the 1st of each month), going back about 18 months. I am not sure how to code this, though.
I appreciate any help

SELECT Count(DISTINCT account)
FROM t
WHERE isactive = 1
GROUP BY Month(insert_date),
Year(insert_date)

Try this:
SELECT YEAR([INSERT DATE]) AS [Year], MONTH([INSERT DATE]) AS [Month], COUNT(DISTINCT [Account]) AS [UniqueActiveAccounts]
FROM [YourTableName]
WHERE [ISACTIVE] = 1 AND [INSERT DATE] > DATEADD(MONTH,-19,GETDATE())
GROUP BY YEAR([INSERT DATE]), MONTH([INSERT DATE])
This query looks back from the time you run it: DATEADD(MONTH,-19,GETDATE()) reaches 19 months back, which covers the 18 months you asked for. You can adjust this period in the DATEADD part of the query.
You will of course need to insert the name of your table after FROM.

You could try grouping on month(INSERT DATE):
SELECT month([INSERT DATE]) as month_num,count(distinct ACCOUNT) as ACCOUNT_num
from table
group by month([INSERT DATE])
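The question asks for the count as of the 1st of each month, which a plain month grouping does not isolate. Here is a minimal sketch of one way to do that. It assumes SQLite through Python's sqlite3 module rather than SQL Server, ISO-formatted dates, a hypothetical table name daily_records, and a few made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_records (account INTEGER, isactive INTEGER, insert_date TEXT)"
)
# Made-up sample rows; account 5678 is inactive on 2019-06-01.
conn.executemany(
    "INSERT INTO daily_records VALUES (?, ?, ?)",
    [
        (1234, 1, "2019-05-01"), (1234, 1, "2019-05-02"),
        (5678, 1, "2019-05-01"), (5678, 0, "2019-06-01"),
        (1234, 1, "2019-06-01"), (1234, 1, "2019-06-02"),
    ],
)

query = """
SELECT strftime('%Y-%m', insert_date) AS month,
       COUNT(DISTINCT account)        AS active_accounts
FROM daily_records
WHERE isactive = 1
  AND strftime('%d', insert_date) = '01'   -- only the record from the 1st
  AND insert_date >= date(:as_of, 'start of month', '-18 months')
GROUP BY month
ORDER BY month
"""
# as_of stands in for "today" so the result is reproducible.
monthly_counts = conn.execute(query, {"as_of": "2019-06-15"}).fetchall()
print(monthly_counts)
```

In SQL Server the same idea would filter on DAY([INSERT DATE]) = 1 and group by year and month. Grouping by month alone would merge the same calendar month from different years over an 18-month window.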

Related

How to calculate average monthly number of some action in some period in Teradata SQL?

I have a table in Teradata SQL like the one below:
ID | trans_date
------------------------
123 | 2021-01-01
887 | 2021-01-15
123 | 2021-02-10
45 | 2021-03-11
789 | 2021-10-01
45 | 2021-09-02
I need to calculate the average monthly number of transactions made by customers in the period between 2021-01-01 and 2021-09-01, so the client with "ID" = 789 will not be counted because he made his transaction later.
In the first month (01) there were 2 transactions
In the second month there was 1 transaction
In the third month there was 1 transaction
In the ninth month there was 1 transaction
So the result should be (2+1+1+1) / 4 = 1.25, shouldn't it?
How can I calculate this in Teradata SQL? Of course, what I showed is only a sample of my data.
SELECT ID, AVG(txns) FROM
(SELECT ID, TRUNC(trans_date,'MON') as mth, COUNT(*) as txns
FROM mytable
-- WHERE condition matches the question but likely want to
-- use end date 2021-09-30 or use mth instead of trans_date
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id, mth) mth_txn
GROUP BY id;
Your logic translated to SQL:
--(2+1+1+1) / 4
SELECT id, COUNT(*) / COUNT(DISTINCT TRUNC(trans_date,'MON')) AS avg_tx
FROM mytable
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id;
You should compare this to Fred's answer to see which is more efficient on your data.
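The arithmetic in the question, (2+1+1+1) / 4, is taken over all customers, not per ID. A minimal sketch of that overall figure follows. It assumes SQLite through Python's sqlite3 module in place of Teradata (strftime stands in for TRUNC(trans_date,'MON')), and an end date of 2021-09-30 so that the 2021-09-02 row is counted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, trans_date TEXT)")
# Sample rows from the question, with dates as ISO strings.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (123, "2021-01-01"), (887, "2021-01-15"), (123, "2021-02-10"),
        (45, "2021-03-11"), (789, "2021-10-01"), (45, "2021-09-02"),
    ],
)

query = """
SELECT COUNT(*)                                      AS txns,
       COUNT(DISTINCT strftime('%Y-%m', trans_date)) AS active_months,
       1.0 * COUNT(*)
           / COUNT(DISTINCT strftime('%Y-%m', trans_date)) AS avg_per_month
FROM mytable
WHERE trans_date BETWEEN '2021-01-01' AND '2021-09-30'
"""
txns, active_months, avg_per_month = conn.execute(query).fetchone()
print(txns, active_months, avg_per_month)
```

Months with no transactions are not counted in the denominator here, which matches the question's own calculation. Dividing by the number of calendar months in the period would give a different average.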

Combining Two Tables & Summing REV amts by Mth

Below are my two tables of data
Acct BillingDate REV
101 01/05/2018 5
101 01/30/2018 4
102 01/15/2018 2
103 01/4/2018 3
103 02/05/2018 2
106 03/06/2018 5
Acct BillingDate Lease_Rev
101 01/15/2018 2
102 01/16/2018 1
103 01/19/2018 2
104 02/05/2018 3
105 04/02/2018 1
Desired Output
Acct  Jan  Feb  Mar  Apr
101   11
102   3
103   5    2
104        3
105                  1
106             5
My SQL Script is Below:
SELECT [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,SUM(case when [NewSalesHistory].[billingdate] between '6/1/2016' and '6/30/2016' then REV else 0 end ) + [X].[Jun-16] AS 'Jun-16'
FROM [NewSalesHistory]
FULL join (SELECT [Account]
,SUM(case when [BWLease].[billingdate] between '6/1/2016' and '6/30/2016' then Lease_REV else 0 end ) as 'Jun-16'
FROM [AirgasPricing].[dbo].[BWLease]
GROUP BY [Account]) X ON [NewSalesHistory].[Account] = [X].[Account]
GROUP BY [NewSalesHistory].[Region]
,[NewSalesHistory].[Account]
,[X].[Jun-16]
I am having trouble combining these tables. If an account has both a rev amount and a lease rev amount, the two are combined (summed) for that account. If there is no lease rev amount (which is the case most of the time), the query returns NULLs for the rev amounts of all the other accounts in table one. Table one can have duplicate accounts with different REV values, while table two has one row per account, with Lease_Rev only. The output above is how I would like to see the data.
What am I missing here? Thanks!
I would suggest union all and group by:
select acct,
sum(case when billingdate >= '2016-01-01' and billingdate < '2016-02-01' then rev end) as rev_201601,
sum(case when billingdate >= '2016-02-01' and billingdate < '2016-03-01' then rev end) as rev_201602,
. . .
from ((select nsh.acct, nsh.billingdate, nsh.rev
from NewSalesHistory nsh
) union all
(select bl.acct, bl.billingdate, bl.lease_rev as rev
from AirgasPricing..BWLease bl
)
) x
group by acct;
Okay, so there are a few things going on here:
1) As Gordon Linoff mentioned you can perform a union all on the two tables. Be sure to limit your column selections and name your columns appropriately:
select
x as consistentname1,
y as consistentname2,
z as consistentname3
from [NewSalesHistory]
union all
select
a as consistentname1,
b as consistentname2,
c as consistentname3
from [BWLease]
2) Your desired result contains a pivoted month column. Generate a column with your desired granularity on the result of the union in step one. For example, for months:
concat(datepart(yy, Date_),'-',datename(mm,Date_)) as yyyyM
Then perform aggregation using a group by:
select sum(...) as desiredcolumnname
...
group by PK1, PK2, yyyyM
Finally, PIVOT to obtain your result: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
3) If you have other fields/columns that you wish to present then you first need to determine whether they are measures (can be aggregated) or are dimensions. That may be best addressed in a follow up question after you've achieved what you set out for in this part.
Hope it helps
As an aside, it seems like you are preparing data for reporting. Performing these transformations can be facilitated using a GUI such as MS Power Query. As long as your end goal is not data manipulation in the DB itself, you do not need to resort to raw sql.
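To make the union all plus conditional aggregation approach concrete, here is a minimal sketch run against the sample data from the question. It assumes SQLite through Python's sqlite3 module rather than SQL Server, ISO-formatted dates, and the hypothetical table names sales and lease:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (acct INTEGER, billingdate TEXT, rev INTEGER);
CREATE TABLE lease (acct INTEGER, billingdate TEXT, lease_rev INTEGER);
INSERT INTO sales VALUES
  (101,'2018-01-05',5),(101,'2018-01-30',4),(102,'2018-01-15',2),
  (103,'2018-01-04',3),(103,'2018-02-05',2),(106,'2018-03-06',5);
INSERT INTO lease VALUES
  (101,'2018-01-15',2),(102,'2018-01-16',1),(103,'2018-01-19',2),
  (104,'2018-02-05',3),(105,'2018-04-02',1);
""")

query = """
SELECT acct,
       SUM(CASE WHEN strftime('%m', billingdate) = '01' THEN rev END) AS jan,
       SUM(CASE WHEN strftime('%m', billingdate) = '02' THEN rev END) AS feb,
       SUM(CASE WHEN strftime('%m', billingdate) = '03' THEN rev END) AS mar,
       SUM(CASE WHEN strftime('%m', billingdate) = '04' THEN rev END) AS apr
FROM (SELECT acct, billingdate, rev FROM sales
      UNION ALL
      -- rename lease_rev so both branches expose the same column
      SELECT acct, billingdate, lease_rev AS rev FROM lease) AS x
WHERE billingdate >= '2018-01-01' AND billingdate < '2019-01-01'
GROUP BY acct
ORDER BY acct
"""
pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
```

Stacking the rows first means an account that appears in only one table still gets a row, and no NULL from a missing join partner can wipe out a sum. Months with no revenue come back as NULL (None in Python).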

SQL JOIN - retrieve MAX DateTime from second table and the first DateTime after previous MAX for other value

I have an issue creating a proper SQL expression.
I have a table TICKET with a column TICKETID
TICKETID
1000
1001
I then have a table STATUSHISTORY, from which I need to retrieve the last (maximum) time the ticket entered VENDOR status and the time it exited that status. By exiting VENDOR status I mean the first INPROG status after it; the next status after VENDOR is always INPROG. It is also possible that no VENDOR status exists for an ID in STATUSHISTORY at all (then nulls should be returned). INPROG always exists: it can come before VENDOR status, and also after it once the ID is no longer in VENDOR status.
Here is an example of STATUSHISTORY.
ID TICKETID STATUS DATETIME
1 1000 INPROG 01.01.2017 10:00
2 1000 VENDOR 02.01.2017 10:00
3 1000 INPROG 03.01.2017 10:00
4 1000 VENDOR 04.01.2017 10:00
5 1000 INPROG 05.01.2017 10:00
6 1000 HOLD 06.01.2017 10:00
7 1000 INPROG 07.01.2017 10:00
8 1001 INPROG 02.02.2017 10:00
9 1001 VENDOR 03.02.2017 10:00
10 1001 INPROG 04.02.2017 10:00
11 1001 VENDOR 05.02.2017 10:00
So the result when doing the query from TICKET table and doing the JOIN with table STATUSHISTORY should be:
ID VENDOR_ENTERED VENDOR_EXITED
1000 04.01.2017 10:00 05.01.2017 10:00
1001 05.02.2017 10:00 null
This is because for ID 1000 the last VENDOR status was at 04.01.2017 and the first INPROG status after it was at 05.01.2017, while for ID 1001 the last VENDOR status was at 05.02.2017 and no INPROG status has happened since.
If VENDOR does not exist for an ID, both columns should be null in the result.
I am really stuck with this, trying different JOINs but without any progress.
Thank you in advance if you can help me.
You can do this with window functions. First, assign a "vendor" group to the tickets. You can do this using a cumulative sum counting the number of "vendor" records on or before each record.
Then, aggregate the records to get one record per "vendor" group. And use row numbers to get the most recent records. So:
with vg as (
select ticketid,
min(datetime) as vendor_entered,
min(case when status = 'INPROG' then datetime end) as vendor_exited
from (select sh.*,
sum(case when status = 'VENDOR' then 1 else 0 end) over (partition by ticketid order by datetime) as grp
from statushistory sh
) sh
group by ticketid, grp
)
select vg.ticketid, vg.vendor_entered, vg.vendor_exited
from (select vg.*,
row_number() over (partition by ticketid order by vendor_entered desc) as seqnum
from vg
) vg
where seqnum = 1;
You can aggregate to get max time, then join onto all of the date values higher than that time, and then re-aggregate:
select a.TicketID,
a.VENDOR_ENTERED,
min( EXIT_TIME ) as VENDOR_EXITED
from (
select TicketID,
max( DATETIME ) as VENDOR_ENTERED
from StatusHistory
where Status = 'VENDOR'
group by TicketID
) as a
left join
(
select TicketID,
DATETIME as EXIT_TIME
from StatusHistory
where Status = 'INPROG'
) as b
on a.TicketID = b.TicketID
and EXIT_TIME >= a.VENDOR_ENTERED
group by a.TicketID,
a.VENDOR_ENTERED
DB2 is not supported in SQLfiddle, but a standard SQL example can be found here.
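For checking the expected output, here is a minimal sketch of the "latest VENDOR, then first later INPROG" logic. It is a variant, not either answer verbatim: it starts from TICKET with a left join so tickets without any VENDOR row return nulls, and it uses a correlated subquery for the exit time. It assumes SQLite through Python's sqlite3 module, ISO timestamps, a hypothetical column name changed_at in place of DATETIME, and an extra made-up ticket 1002 that never entered VENDOR:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (ticketid INTEGER);
CREATE TABLE statushistory (id INTEGER, ticketid INTEGER,
                            status TEXT, changed_at TEXT);
INSERT INTO ticket VALUES (1000),(1001),(1002);
INSERT INTO statushistory VALUES
 (1,1000,'INPROG','2017-01-01 10:00'),(2,1000,'VENDOR','2017-01-02 10:00'),
 (3,1000,'INPROG','2017-01-03 10:00'),(4,1000,'VENDOR','2017-01-04 10:00'),
 (5,1000,'INPROG','2017-01-05 10:00'),(6,1000,'HOLD','2017-01-06 10:00'),
 (7,1000,'INPROG','2017-01-07 10:00'),(8,1001,'INPROG','2017-02-02 10:00'),
 (9,1001,'VENDOR','2017-02-03 10:00'),(10,1001,'INPROG','2017-02-04 10:00'),
 (11,1001,'VENDOR','2017-02-05 10:00'),(12,1002,'INPROG','2017-03-01 10:00');
""")

query = """
SELECT t.ticketid,
       v.vendor_entered,
       -- first INPROG strictly after the last VENDOR; NULL if none yet
       (SELECT MIN(s.changed_at)
          FROM statushistory s
         WHERE s.ticketid = t.ticketid
           AND s.status = 'INPROG'
           AND s.changed_at > v.vendor_entered) AS vendor_exited
FROM ticket t
LEFT JOIN (SELECT ticketid, MAX(changed_at) AS vendor_entered
             FROM statushistory
            WHERE status = 'VENDOR'
            GROUP BY ticketid) v
       ON v.ticketid = t.ticketid
ORDER BY t.ticketid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

When vendor_entered is NULL the comparison in the subquery matches nothing, so vendor_exited is NULL as well, which is the behaviour the question asks for.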

how to count days between two dates with where conditions

I have a table with the following data:
USERID NAME DATEFROM DATETO
1 xxx 2014-05-10 2014-05-15
1 xxx 2014-05-20 2014-05-25
4 yyy 2014-04-20 2014-04-21
Now I have an SQL query like:
select * from leave where datefrom>='2014-05-01' and dateto<='2014-05-31'
and I want this output:
userid name total_leave_days
1 xxx 12
4 yyy 2
(2014-05-10 - 2014-05-15 )=6 days
(2014-05-20 - 2014-05-25 )=6 days
total = 12 days for userid 1
(2014-04-20 - 2014-04-21)= 2 days for userid 4
How can I calculate the total days?
Please try:
select
USERID,
NAME,
SUM(DATEDIFF(day, DATEFROM, DATETO)+1) total_leave_days
From leave
group by USERID, NAME
SQL Fiddle Demo
It's important to note that you need "+1" to match the expected calculation, because the question assumes "start of day" for the start date and "end of day" for the end date, but a DBMS doesn't think that way: a date is always stored as "start of day".
select
USERID
, name
, sum( datediff(day,DATEFROM,DATETO) + 1 ) as leave_days
from leavetable
group by
USERID
, name
produces this:
| USERID | NAME | LEAVE_DAYS |
|--------|------|------------|
| 1 | xxx | 12 |
| 4 | yyy | 2 |
see: http://sqlfiddle.com/#!3/ebe5d/1
You can use DateDiff.
SELECT UserID, Name, SUM(DATEDIFF(DAY, DateFrom, DateTo) + 1) AS total_leave_days
FROM leave
WHERE datefrom >= '2014-05-01' AND dateto <= '2014-05-31'
GROUP BY UserID, Name
The + 1, of course, is there because DATEDIFF returns the exclusive count, whereas it sounds like you want the inclusive number of days.
Try this:
select userid, name, sum (1 + datediff(day,datefrom,dateto)) as total_leave_days
from leaves
where datefrom>='2014-05-01' and dateto<='2014-05-31'
group by userid, name
This will sum the total leaves per userid. Note that datediff will give you 5 days difference for the range 2014-05-10 to 2014-05-15, so we need to add 1 to the result to get 6 days i.e. range inclusive of both ends.
Demo
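The inclusive day count can be checked with a minimal sketch. It assumes SQLite through Python's sqlite3 module, where a difference of julianday values stands in for SQL Server's DATEDIFF(day, ...), and a hypothetical table name leave_periods:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leave_periods (userid INTEGER, name TEXT, datefrom TEXT, dateto TEXT)"
)
conn.executemany(
    "INSERT INTO leave_periods VALUES (?, ?, ?, ?)",
    [
        (1, "xxx", "2014-05-10", "2014-05-15"),
        (1, "xxx", "2014-05-20", "2014-05-25"),
        (4, "yyy", "2014-04-20", "2014-04-21"),
    ],
)

# +1 turns the exclusive difference (5 days) into an inclusive count (6 days).
base = """
SELECT userid, name,
       SUM(CAST(julianday(dateto) - julianday(datefrom) AS INTEGER) + 1)
           AS total_leave_days
FROM leave_periods
{where}
GROUP BY userid, name
ORDER BY userid
"""
totals = conn.execute(base.format(where="")).fetchall()
may_only = conn.execute(
    base.format(where="WHERE datefrom >= '2014-05-01' AND dateto <= '2014-05-31'")
).fetchall()
print(totals)
print(may_only)
```

Note that the May filter from the question drops userid 4, whose leave falls in April, so the expected output with both users only appears when the WHERE clause is left out.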

SQL Access -- Keep record only with most recent timestamp

I have a table that appears as follows:
Time Name Cust_ID Num_Calls Num_Orders
12.00 ABC 100 20 10
12.25 PQR 102 23 12
12.30 ABC 100 26 15
01.00 ABC 100 26 18
02.00 PQR 102 23 14
04.00 PQR 102 25 20
How do I delete the earlier records for each "Name & Cust_ID" pair and keep only the most recent one? The other fields in the record may change as I run them through my Access database, but Name and ID remain the same.
My output at the end of the day should be:
Time Name Cust_ID Num_Calls Num_Orders
01.00 ABC 100 26 18
04.00 PQR 102 25 20
I think if your cust_id is unique, you should make it a primary key in the table.
Then whenever you have a new entry, first check and see if the current cust_id already exists.
If yes, update that entry in the table.
Else do an insert.
Try this...
This should give you your most recent records based on max(time), you could delete the complement of this set.
SELECT * FROM YOUR_TABLE A
INNER JOIN
( SELECT MAX(time) MAX_time
, NAME , CUST_ID
FROM YOUR_TABLE
GROUP BY
NAME , CUST_ID )B
ON A.NAME=B.Name
and A.CUST_ID=B.CUst_ID
and A.time =B.max_time
So you would delete the following records
DELETE FROM YOUR_TABLE
WHERE TIME <> ( SELECT MAX(C.TIME)
FROM YOUR_TABLE C
WHERE C.NAME = YOUR_TABLE.NAME
and C.CUST_ID = YOUR_TABLE.CUST_ID )
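A minimal sketch of the "delete everything except the latest row per Name and Cust_ID" step follows. It assumes SQLite through Python's sqlite3 module rather than Access, a hypothetical table name calls, and a hypothetical column call_time holding sortable 24-hour times (the sample times in the question would not sort correctly as written):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (call_time TEXT, name TEXT, cust_id INTEGER, "
    "num_calls INTEGER, num_orders INTEGER)"
)
# Same rows as the question, with times rewritten in 24-hour form.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
    [
        ("12:00", "ABC", 100, 20, 10),
        ("12:25", "PQR", 102, 23, 12),
        ("12:30", "ABC", 100, 26, 15),
        ("13:00", "ABC", 100, 26, 18),
        ("14:00", "PQR", 102, 23, 14),
        ("16:00", "PQR", 102, 25, 20),
    ],
)

# Delete every row whose time is not the maximum for its (name, cust_id) group.
conn.execute("""
DELETE FROM calls
WHERE call_time <> (SELECT MAX(c.call_time)
                      FROM calls c
                     WHERE c.name = calls.name
                       AND c.cust_id = calls.cust_id)
""")
remaining = conn.execute(
    "SELECT call_time, name, cust_id, num_calls, num_orders "
    "FROM calls ORDER BY name"
).fetchall()
print(remaining)
```

The comparison is made directly on the row being deleted. Wrapping it in an EXISTS over the whole group would also match the latest row, because every group with more than one row contains some non-latest row.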