Total contracts by month - sql

I am trying to find the total number of contracts by month. The data is stored in two columns, (Start Date) and (End Date), with multiple rows of data for each month.
SELECT e.CustomerID, e.AgentID,
COUNT(*) engagementnumber
FROM Engagements e
GROUP BY EndDate
The first time I ran the code with
SELECT COUNT(*) engagementnumber
FROM Engagements
GROUP BY EndDate
I got a count but it wasn't grouped by month.

You can unpivot the dates and use aggregation:
select year(dte), month(dte),
sum(inc) as change_in_month,
sum(sum(inc)) over (order by min(dte)) as active_in_month
from ((select startdate as dte, 1 as inc from Engagements) union all
(select enddate, -1 as inc from Engagements)
) t
group by year(dte), month(dte)
order by year(dte), month(dte);
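The unpivot idea above can be tried out on a small in-memory table. This is only a sketch: it uses Python's sqlite3 module (SQLite 3.25 or later for window functions), `strftime('%Y-%m', ...)` in place of `year()`/`month()`, ISO date strings, and made-up sample rows, none of which come from the original question. Note that `active_in_month` is the number of engagements still open at the end of each month.

```python
import sqlite3

def monthly_active(engagements):
    """engagements: list of (StartDate, EndDate) ISO strings (hypothetical sample data)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Engagements (StartDate TEXT, EndDate TEXT)")
    con.executemany("INSERT INTO Engagements VALUES (?, ?)", engagements)
    rows = con.execute("""
        WITH events AS (
            -- unpivot: every start counts +1, every end counts -1
            SELECT StartDate AS dte,  1 AS inc FROM Engagements
            UNION ALL
            SELECT EndDate   AS dte, -1 AS inc FROM Engagements
        ),
        per_month AS (
            SELECT strftime('%Y-%m', dte) AS ym, SUM(inc) AS change_in_month
            FROM events
            GROUP BY ym
        )
        -- running total of the monthly net change
        SELECT ym, change_in_month,
               SUM(change_in_month) OVER (ORDER BY ym) AS active_in_month
        FROM per_month
        ORDER BY ym
    """).fetchall()
    con.close()
    return rows

sample = [
    ("2024-01-05", "2024-02-10"),
    ("2024-01-20", "2024-03-15"),
    ("2024-02-01", "2024-02-25"),
]
for row in monthly_active(sample):
    print(row)
# ('2024-01', 2, 2)
# ('2024-02', -1, 1)
# ('2024-03', -1, 0)
```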

You can try something like
SELECT MONTH(enddate), COUNT(*) OVER (PARTITION BY MONTH(enddate)) engagementnumber FROM engagements

Related

Top N items in every month - BIGQUERY

I have the BigQuery query below:
WITH cte AS(
SELECT *
FROM (
SELECT project_name,
SUM(reward_value) AS total_reward_value,
DATE_TRUNC(date_signing, MONTH) as month,
date_signing,
Row_number() over (partition by DATE_TRUNC(date_signing, MONTH)
order by SUM(reward_value) desc) AS rank
FROM `deals`
WHERE CAST(date_signing as DATE) > '2019-12-31'
AND CAST(date_signing as DATE) < '2020-02-01'
AND target_category = 'achieved'
AND project_name IS NOT NULL
GROUP BY project_name, month, date_signing
)
)
SELECT * FROM cte WHERE rank <= 5
that returns the following result:
I expect each unique project to be summed within each month, and then to keep only the top 5.
Something like this:
I get the following error if the date_signing grouping is removed:
PARTITION BY expression references column date_signing which is neither grouped nor aggregated at [16:48]
Any hints what should be corrected will be appreciated!
One more subquery maybe then?
WITH cte AS(
SELECT project_name,
SUM(reward_value) as reward_sum,
DATE_TRUNC(date_signing, MONTH) as month
FROM `deals`
WHERE CAST(date_signing as DATE) > '2019-12-31'
AND CAST(date_signing as DATE) < '2020-02-01'
AND target_category = 'achieved'
AND project_name IS NOT NULL
GROUP BY project_name, month
),
ranks AS (
SELECT
project_name,
reward_sum,
month,
ROW_NUMBER() over (PARTITION BY month ORDER BY reward_sum DESC) AS rank
FROM cte
)
SELECT *
FROM ranks
WHERE rank <= 5
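The "aggregate first, then rank" structure can be checked on toy data. The sketch below is not BigQuery: it assumes Python's sqlite3 module (SQLite 3.25+), replaces `DATE_TRUNC(date_signing, MONTH)` with `strftime('%Y-%m', ...)`, and uses invented sample deals. The CTE names `totals` and `ranked` are illustrative only.

```python
import sqlite3

def top_projects(deals, n):
    """deals: list of (project_name, reward_value, date_signing) rows (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE deals (project_name TEXT, reward_value INTEGER, date_signing TEXT)"
    )
    con.executemany("INSERT INTO deals VALUES (?, ?, ?)", deals)
    rows = con.execute("""
        WITH totals AS (
            -- step 1: one row per project and month
            SELECT project_name,
                   strftime('%Y-%m', date_signing) AS month,
                   SUM(reward_value) AS reward_sum
            FROM deals
            GROUP BY project_name, month
        ),
        ranked AS (
            -- step 2: rank the monthly totals inside each month
            SELECT project_name, month, reward_sum,
                   ROW_NUMBER() OVER (PARTITION BY month ORDER BY reward_sum DESC) AS rn
            FROM totals
        )
        SELECT month, project_name, reward_sum, rn
        FROM ranked
        WHERE rn <= ?
        ORDER BY month, rn
    """, (n,)).fetchall()
    con.close()
    return rows

sample = [
    ("A", 10, "2020-01-03"), ("A", 15, "2020-01-20"),
    ("B", 20, "2020-01-10"), ("C", 5, "2020-01-11"),
    ("B", 7, "2020-02-02"), ("C", 9, "2020-02-05"),
]
for row in top_projects(sample, 2):
    print(row)
# ('2020-01', 'A', 25, 1)
# ('2020-01', 'B', 20, 2)
# ('2020-02', 'C', 9, 1)
# ('2020-02', 'B', 7, 2)
```

Because the signing date is no longer part of the grouping, project A's two January deals collapse into a single row before ranking.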
Yeah, you can't do that. You can show the last signing date instead:
WITH cte AS(
SELECT project_name,
SUM(reward_value),
DATE_TRUNC(date_signing, MONTH) as month,
MAX(date_signing) as last_signing_date,
Row_number() over (partition by DATE_TRUNC(date_signing, MONTH)
order by SUM(reward_value) desc) AS rank
FROM `deals`
WHERE CAST(date_signing as DATE) > '2019-12-31'
AND CAST(date_signing as DATE) < '2020-02-01'
AND target_category = 'achieved'
AND project_name IS NOT NULL
GROUP BY project_name, month
)
SELECT * FROM cte WHERE rank <= 5

Get last data recorded of the date and group it by month

tbl_totalMonth has id, time, date, and kwh columns.
I want to get the last recorded row of each month, so the result would be the name of the month and its kwh.
The result should be something like this:
month | kwh
------------
January | 150
February | 400
the query I tried: (but it returns the max kwh not the last kwh recorded)
SELECT DATENAME(MONTH, a.date) as monthly, max(a.kwh) as kwh
from tbl_totalMonth a
WHERE date >= DATEADD(yy,DATEDIFF(yy,0, GETDATE() -1 ),0)
group by DATENAME(MONTH, a.date)
I suspect you need something quite different:
select *
from (
select *
, row_number() over(partition by month(a.date), year(a.date) order by a.date DESC) as rn
from tbl_totalMonth a
WHERE date >= DATEADD(yy,DATEDIFF(yy,0, GETDATE() -1 ),0)
) d
where rn = 1
To get "the last kwh recorded (per month)" you need to use row_number() which - per month - will order the rows (descending) and give each one a row number. When that number is 1 you have "the most recent" row for that month, and you won't need group by at all.
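A small runnable sketch of the row_number() approach follows. It is written against SQLite through Python's sqlite3 module rather than SQL Server, so `strftime('%Y-%m', ...)` stands in for `month()`/`year()`, the `time` column is left out, `id` is used as an assumed tie-breaker for rows sharing a date, and the readings are invented.

```python
import sqlite3

def last_reading_per_month(readings):
    """readings: list of (id, date, kwh) rows with ISO dates (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl_totalMonth (id INTEGER, date TEXT, kwh INTEGER)")
    con.executemany("INSERT INTO tbl_totalMonth VALUES (?, ?, ?)", readings)
    rows = con.execute("""
        SELECT month, kwh
        FROM (
            SELECT strftime('%Y-%m', date) AS month,
                   kwh,
                   -- newest row in each month gets rn = 1
                   ROW_NUMBER() OVER (
                       PARTITION BY strftime('%Y-%m', date)
                       ORDER BY date DESC, id DESC
                   ) AS rn
            FROM tbl_totalMonth
        ) d
        WHERE rn = 1
        ORDER BY month
    """).fetchall()
    con.close()
    return rows

sample = [
    (1, "2024-01-10", 200),  # highest January value, but not the last one
    (2, "2024-01-31", 150),
    (3, "2024-02-05", 90),
    (4, "2024-02-28", 400),
]
print(last_reading_per_month(sample))
# [('2024-01', 150), ('2024-02', 400)]
```

January returns 150 rather than 200, which is the difference between "last recorded" and max(kwh).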
You could use group by and month
select datename(month, date), sum(kwh)
from tbl_totalMonth
where date = (select max(date) from tbl_totalMonth )
group by datename(month, date)
If you need only the last row for each month, then you should use
select datename(month, a.date), a.kwh
from tbl_totalMonth a
inner join (
select max(date) as max_date
from tbl_totalMonth
group by month(date)) t on t.max_date = a.date

How can I count users in a month that were not present in the month before?

I am trying to count unique users on a monthly basis that were not present in the previous month. So if a user has a record for January and then another one for February, then I would only count January for that user.
user_id time
a1 1/2/17
a1 2/10/17
a2 2/18/17
a4 2/5/17
a5 3/25/17
My results should look like this
Month User Count
January 1
February 2
March 1
I'm not really familiar with BigQuery, but here's how I would solve the problem using TSQL. I imagine that you'd be able to use similar logic in BigQuery.
1). Order the data by user_id first, and then time. In TSQL, you can accomplish this with the following and store it in a common table expression, which you will query in the step after this.
;WITH cte AS
(
select ROW_NUMBER() OVER (PARTITION BY [user_id] ORDER BY [time]) AS rn,*
from dbo.employees
)
2). Next query for only the rows with rn = 1 (the first occurrence for a particular user) and group by the month.
select DATENAME(month, [time]) AS [Month], count(*) AS user_count
from cte
where rn = 1
group by DATENAME(month, [time])
This is assuming that 2017 is the only year you're dealing with. If you're dealing with more than one year, you probably want step #2 to look something like this:
select year([time]) as [year], DATENAME(month, [time]) AS [month],
count(*) AS user_count
from cte
where rn = 1
group by year([time]), DATENAME(month, [time])
First aggregate by the user id and the month. Then use lag() to see if the user was present in the previous month:
with du as (
select date_trunc(time, month) as yyyymm, user_id
from t
group by date_trunc(time, month), user_id
)
select yyyymm, count(*)
from (select du.*,
lag(yyyymm) over (partition by user_id order by yyyymm) as prev_yyyymm
from du
) du
where prev_yyyymm is null or
prev_yyyymm < date_sub(yyyymm, interval 1 month)
group by yyyymm;
Note: This uses the date functions, but similar functions exist for timestamp.
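The lag() logic can be checked against the sample data from the question. The sketch below assumes SQLite (3.25+) via Python's sqlite3 module instead of BigQuery, so `date(time, 'start of month')` replaces `date_trunc` and `date(yyyymm, '-1 month')` replaces the interval arithmetic. The dates are rewritten as ISO strings, which is also an assumption.

```python
import sqlite3

def new_users_per_month(visits):
    """visits: list of (user_id, time) rows with ISO dates."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (user_id TEXT, time TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", visits)
    rows = con.execute("""
        WITH du AS (
            -- one row per user and month
            SELECT DISTINCT user_id, date(time, 'start of month') AS yyyymm
            FROM t
        ),
        with_prev AS (
            SELECT user_id, yyyymm,
                   LAG(yyyymm) OVER (PARTITION BY user_id ORDER BY yyyymm) AS prev_yyyymm
            FROM du
        )
        SELECT yyyymm, COUNT(*) AS user_count
        FROM with_prev
        -- count the user unless their previous active month is the one just before
        WHERE prev_yyyymm IS NULL
           OR prev_yyyymm < date(yyyymm, '-1 month')
        GROUP BY yyyymm
        ORDER BY yyyymm
    """).fetchall()
    con.close()
    return rows

sample = [
    ("a1", "2017-01-02"), ("a1", "2017-02-10"),
    ("a2", "2017-02-18"), ("a4", "2017-02-05"),
    ("a5", "2017-03-25"),
]
print(new_users_per_month(sample))
# [('2017-01-01', 1), ('2017-02-01', 2), ('2017-03-01', 1)]
```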
As I understand the question, a user should be excluded from a given month's count only if the same user was present in the previous month. If the user was present a few months earlier, but not in the immediately preceding month, the user should be counted.
If this is correct, try the query below for BigQuery Standard SQL:
#standardSQL
SELECT Year, Month, COUNT(DISTINCT user_id) AS User_Count
FROM (
SELECT *,
DATE_DIFF(time, LAG(time) OVER(PARTITION BY user_id ORDER BY time), MONTH) AS flag
FROM (
SELECT
user_id,
DATE_TRUNC(PARSE_DATE('%x', time), MONTH) AS time,
EXTRACT(YEAR FROM PARSE_DATE('%x', time)) AS Year,
FORMAT_DATE('%B', PARSE_DATE('%x', time)) AS Month
FROM yourTable
GROUP BY 1, 2, 3, 4
)
)
WHERE IFNULL(flag, 0) <> 1
GROUP BY Year, Month, time
ORDER BY time
You can test the query above using the example below, which uses the dummy data from your question:
#standardSQL
WITH yourTable AS (
SELECT 'a1' AS user_id, '1/2/17' AS time UNION ALL
SELECT 'a1', '2/10/17' UNION ALL
SELECT 'a2', '2/18/17' UNION ALL
SELECT 'a4', '2/5/17' UNION ALL
SELECT 'a5', '3/25/17'
)
SELECT Year, Month, COUNT(DISTINCT user_id) AS User_Count
FROM (
SELECT *,
DATE_DIFF(time, LAG(time) OVER(PARTITION BY user_id ORDER BY time), MONTH) AS flag
FROM (
SELECT
user_id,
DATE_TRUNC(PARSE_DATE('%x', time), MONTH) AS time,
EXTRACT(YEAR FROM PARSE_DATE('%x', time)) AS Year,
FORMAT_DATE('%B', PARSE_DATE('%x', time)) AS Month
FROM yourTable
GROUP BY 1, 2, 3, 4
)
)
WHERE IFNULL(flag, 0) <> 1
GROUP BY Year, Month, time
ORDER BY time
The output is
Year Month User_Count
2017 January 1
2017 February 2
2017 March 1
Try this query:
SELECT
t1.d,
count(DISTINCT t1.user_id)
FROM
(
SELECT
EXTRACT(MONTH FROM time) AS d,
--EXTRACT(MONTH FROM time)-1 AS d2,
user_id
FROM nbitra.tmp
) t1
LEFT JOIN
(
SELECT
EXTRACT(MONTH FROM time) AS d,
user_id
FROM nbitra.tmp
) t2
ON t1.d = t2.d+1
AND t1.user_id = t2.user_id --Match the same user in the previous month
WHERE
t2.user_id IS NULL --Keep users with no record in the previous month (this also covers January)
GROUP BY t1.d;

Total Count of Active Employees by Date

I have in the past written queries that give me counts by date (hires, terminations, etc...) as follows:
SELECT per.date_start AS "Date",
COUNT(peo.EMPLOYEE_NUMBER) AS "Hires"
FROM hr.per_all_people_f peo,
hr.per_periods_of_service per
WHERE per.date_start BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
AND per.date_start BETWEEN :PerStart AND :PerEnd
AND per.person_id = peo.person_id
GROUP BY per.date_start
I was now looking to create a count of active employees by date, however I am not sure how I would date the query as I use a range to determine active as such:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.current_employee_flag = 'Y'
and TRUNC(sysdate) BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
Here is a simple way to get started. This works for all the effective and end dates in your data:
select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
It works by adding one person for each start and subtracting one for each end (via num) and doing a cumulative sum. This might return duplicate dates, so you might also do an aggregation to eliminate those duplicates:
select thedate, max(numActives)
from (select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
) t
group by thedate;
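To see how the cumulative sum behaves, here is a minimal sketch on an in-memory SQLite database (Python's sqlite3, SQLite 3.25+), not Oracle. The table name `periods` and its rows are hypothetical, and the filter on NULL end dates is an added assumption so that open-ended periods are never subtracted. An employee stops being counted on the end date itself.

```python
import sqlite3

def active_by_date(periods):
    """periods: list of (person_id, effective_start_date, effective_end_date); end may be None."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE periods (person_id INTEGER, "
        "effective_start_date TEXT, effective_end_date TEXT)"
    )
    con.executemany("INSERT INTO periods VALUES (?, ?, ?)", periods)
    rows = con.execute("""
        WITH dates AS (
            SELECT effective_start_date AS thedate, 1 AS num FROM periods
            UNION ALL
            SELECT effective_end_date AS thedate, -1 AS num FROM periods
            WHERE effective_end_date IS NOT NULL
        ),
        running AS (
            -- cumulative sum of +1/-1 events in date order
            SELECT thedate, SUM(num) OVER (ORDER BY thedate) AS numActives
            FROM dates
        )
        -- collapse duplicate dates into one row each
        SELECT thedate, MAX(numActives)
        FROM running
        GROUP BY thedate
        ORDER BY thedate
    """).fetchall()
    con.close()
    return rows

sample = [
    (1, "2012-04-01", "2012-04-10"),
    (2, "2012-04-01", "2012-04-20"),
    (3, "2012-04-10", "2012-04-30"),
    (4, "2012-04-15", None),  # still employed
]
for row in active_by_date(sample):
    print(row)
# ('2012-04-01', 2)
# ('2012-04-10', 2)
# ('2012-04-15', 3)
# ('2012-04-20', 2)
# ('2012-04-30', 1)
```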
If you really want all dates, then it is best to start with a calendar table, and use a simple variation on your original query:
select c.thedate, count(pos.person_id) as NumActives
from calendar c left outer join
hr.per_periods_of_service pos
on c.thedate between pos.effective_start_date and pos.effective_end_date
group by c.thedate;
If you want to count all employees who were active during the entire input date range:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.EFFECTIVE_START_DATE <= :StartDate
AND (peo.EFFECTIVE_END_DATE IS NULL OR peo.EFFECTIVE_END_DATE >= :EndDate)
Here is my example, based on Gordon Linoff's answer with a small modification: in the subtracting subquery, every record appeared with -1 in NUM even when the end date was NULL, so I filter those rows out.
use AdventureWorksDW2012 -- selects the database to work with in MS SSMS;
-- this statement may not work on other platforms
select
t.thedate
,max(t.numActives) AS "Total Active Employees"
from (
select
dates.thedate
,SUM(dates.num) over (order by dates.thedate) as numActives
from
(
(
select
StartDate as thedate
,1 as num
from DimEmployee
)
union all
(
select
EndDate as thedate
,-1 as num
from DimEmployee
where EndDate IS NOT NULL
)
) AS dates
) AS t
group by thedate
ORDER BY thedate
worked for me, hope it will help somebody
I was able to get the results I was looking for with the following:
--Active Team Members by Date
SELECT "a_date",
COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo,
(SELECT DATE '2012-04-01'-1 + LEVEL AS "a_date"
FROM dual
CONNECT BY LEVEL <= DATE '2012-04-30'+2 - DATE '2012-04-01'-1
)
WHERE peo.current_employee_flag = 'Y'
AND "a_date" BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
GROUP BY "a_date"
ORDER BY "a_date"

sql to find row for min date in each month

I have a table, let's say "Records", with this structure:
id date
-- ----
1 2012-08-30
2 2012-08-29
3 2012-07-25
I need to write an SQL query in PostgreSQL to get record_id for MIN date in each month.
month record_id
----- ---------
8 2
7 3
As we can see, 2012-08-29 < 2012-08-30 and both fall in month 8, so we should show record_id = 2.
I tried something like this,
SELECT
EXTRACT(MONTH FROM date) as month,
record_id,
MIN(date)
FROM Records
GROUP BY 1,2
but it shows 3 records.
Can anybody help?
SELECT DISTINCT ON (EXTRACT(MONTH FROM date))
id,
date
FROM Records1
ORDER BY EXTRACT(MONTH FROM date),date
SQLFiddle http://sqlfiddle.com/#!12/76ca2/3
UPD: This query:
1) Orders the records by month and date
2) For every month picks the first record (the first record has MIN(date) because of ordering)
Details here http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
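DISTINCT ON is specific to PostgreSQL, so the two steps described above are reproduced here in plain Python as an illustration of the logic only: sort by month and date, then keep the first row of each month. The function name is made up, and the rows are the three from the question.

```python
from datetime import date
from itertools import groupby

def first_record_per_month(records):
    """records: list of (record_id, date). Returns [(month, record_id), ...]."""
    # Step 1: order by month, then by date (like ORDER BY EXTRACT(MONTH ...), date)
    ordered = sorted(records, key=lambda r: (r[1].month, r[1]))
    # Step 2: for every month keep only the first row, which has the minimum date
    return [
        (month, next(group)[0])
        for month, group in groupby(ordered, key=lambda r: r[1].month)
    ]

sample = [
    (1, date(2012, 8, 30)),
    (2, date(2012, 8, 29)),
    (3, date(2012, 7, 25)),
]
print(first_record_per_month(sample))
# [(7, 3), (8, 2)]
```

As in the SQL, grouping on the month number alone merges the same month from different years.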
This will return multiples if you have duplicate minimum dates:
Select
minbymonth.Month,
r.record_id
From (
Select
Extract(Month From date) As Month,
Min(date) As Date
From
records
Group By
Extract(Month From date)
) minbymonth
Inner Join
records r
On minbymonth.date = r.date
Order By
1;
Or if you have CTEs
With MinByMonth As (
Select
Extract(Month From date) As Month,
Min(date) As Date
From
records
Group By
Extract(Month From date)
)
Select
m.Month,
r.record_id
From
MinByMonth m
Inner Join
Records r
On m.date = r.date
Order By
1;
http://sqlfiddle.com/#!1/2a054/3
select extract(month from date)
, record_id
, date
from
(
select
record_id
, date
, rank() over (partition by extract(month from date) order by date asc) r
from records
) x
where r=1
order by date
SQL Fiddle
select distinct on (date_trunc('month', date))
date_trunc('month', date) as month,
id,
date
from records
order by 1, 3
I think you need to use a subquery, something like this:
SELECT
EXTRACT(MONTH FROM r.date) as month,
r.record_id
FROM Records as r
INNER JOIN (
SELECT
EXTRACT(MONTH FROM date) as month,
MIN(date) as mindate
FROM Records
GROUP BY EXTRACT(MONTH FROM date)
) as sub on EXTRACT(MONTH FROM r.date) = sub.month and r.date = sub.mindate