SQL: return the 1st day of each month in a table

I have a sql table like so with two columns...
3/1/17 100
3/2/17 200
3/3/17 300
4/3/17 600
4/4/17 700
4/5/17 800
I am trying to run a query that returns the first date of each month that appears in the table above, and grabs the corresponding value.
The results should be:
3/1/17 100
4/3/17 600
Then, once I have these results, I want to do something with each one.
Any ideas on how I can get started?

In standard SQL, you would use row_number():
select t.*
from (select t.*,
row_number() over (partition by extract(year from dte), extract(month from dte)
order by dte asc) as seqnum
from t
) t
where seqnum = 1;
Most databases support this functionality, but the exact functions (particularly for dates) may differ depending on the database.
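As a quick check of the row_number() approach, here is a minimal sketch that runs the same idea against the sample rows with Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (for window functions) and dates stored as ISO-8601 text; the names t, dte and val are hypothetical, and strftime('%Y-%m', dte) stands in for the two extract() calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dte TEXT, val INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2017-03-01", 100),
        ("2017-03-02", 200),
        ("2017-03-03", 300),
        ("2017-04-03", 600),
        ("2017-04-04", 700),
        ("2017-04-05", 800),
    ],
)

# Number the rows inside each year-month by date, then keep row 1 of each month.
first_per_month = conn.execute(
    """
    SELECT dte, val
    FROM (
        SELECT dte, val,
               ROW_NUMBER() OVER (
                   PARTITION BY strftime('%Y-%m', dte)
                   ORDER BY dte
               ) AS seqnum
        FROM t
    )
    WHERE seqnum = 1
    ORDER BY dte
    """
).fetchall()

print(first_per_month)
```

Because the partition is the year-month and the ordering is the date itself, the query returns the earliest date that actually exists in each month, which is why April yields 2017-04-03 rather than a calendar first of the month.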

An alternative (SQL Server flavour):
SELECT t.*
FROM YourTable t
JOIN (
select MIN(DateColumn) as MinimumDate
from YourTable
group by FORMAT(DateColumn,'yyyyMM')
) q on (t.DateColumn = q.MinimumDate)
ORDER BY t.DateColumn;
For the GROUP BY this will also be fine:
group by YEAR(DateColumn), MONTH(DateColumn)
or
group by DATEPART(YEAR,DateColumn), DATEPART(MONTH,DateColumn)

Related

Past 7 days running amounts average as progress per each date

So, the query is simple, but I am facing issues implementing the SQL logic. Here's the problem: suppose I have records like
Phoneno Company Date Amount
83838 xyz 20210901 100
87337 abc 20210902 500
47473 cde 20210903 600
The expected output is the past 7 days' progress as a running avg of amount for each date (the current date and the 6 days before it)
Date amount avg
20210901 100 100
20210902 500 300
20210903 600 400
I tried
Select date, amount, select
avg(lg) from (
Select case when lag(amount)
Over (order by NULL) IS NULL
THEN AMOUNT
ELSE
lag(amount)
Over (order by NULL) END AS LG)
From table
WHERE DATE>=t.date-7) as avg
From table t;
But I am getting wrong avg values. Could anyone please help?
Note: I've tried without lag too, and it also gives the wrong avgs.
You could use a self join to group the dates
select distinct
a.dt,
b.dt as preceding_dt, --just for QA purpose
a.amt,
b.amt as preceding_amt,--just for QA purpose
avg(b.amt) over (partition by a.dt) as avg_amt
from t a
join t b on a.dt-b.dt between 0 and 6
group by a.dt, b.dt, a.amt, b.amt; --to dedupe the data after the join
If you want to make your correlated subquery approach work, you don't really need the lag.
select dt,
amt,
(select avg(b.amt) from t b where a.dt-b.dt between 0 and 6) as avg_lg
from t a;
If you have exactly one row per date, with no missing dates, this gets even simpler:
select dt,
amt,
avg(amt) over (order by dt rows between 6 preceding and current row) as avg_lg
from t;
Also, the condition DATE>=t.date-7 that you used is open on one side: it has no upper bound, so it also matches every later date, which should not qualify. It also reaches back 7 days rather than 6.
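To see the bounded condition at work, here is a minimal sketch of the correlated-subquery variant using Python's sqlite3 module. It assumes dates stored as ISO-8601 text and uses julianday() for the date difference. The table t and its columns dt and amt follow the answer's naming, and the fourth row is a hypothetical addition that falls outside the earlier rows' 7-day window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, amt REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2021-09-01", 100),
        ("2021-09-02", 500),
        ("2021-09-03", 600),
        ("2021-09-10", 200),  # hypothetical row, 7+ days after the others
    ],
)

# For each row, average every row whose date is 0 to 6 days earlier.
# Both ends of the window are bounded, so later dates never qualify.
rolling = conn.execute(
    """
    SELECT a.dt, a.amt,
           (SELECT AVG(b.amt)
            FROM t b
            WHERE julianday(a.dt) - julianday(b.dt) BETWEEN 0 AND 6) AS avg_amt
    FROM t a
    ORDER BY a.dt
    """
).fetchall()

for row in rolling:
    print(row)
```

The first three averages match the expected output in the question (100, 300, 400), and the last row averages only itself because the earlier dates are more than 6 days back.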
You can use analytical function with the windowing clause to get your results:
SELECT DISTINCT BillingDate,
AVG(amount) OVER (ORDER BY BillingDate
RANGE BETWEEN TO_DSINTERVAL('6 00:00:00') PRECEDING -- 6 days back plus the current day = 7 days
AND TO_DSINTERVAL('0 00:00:00') FOLLOWING) AS RUNNING_AVG
FROM accounts
ORDER BY BillingDate;
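A RANGE frame measures distance in the ordering value, whereas a ROWS frame only counts rows, so the two disagree as soon as dates are missing. The sketch below shows the difference with Python's sqlite3 module. It assumes SQLite 3.28 or newer (needed for RANGE with a numeric offset), uses julianday(dt) as the numeric ordering key in place of Oracle's interval arithmetic, and uses a hypothetical table t(dt, amt) with a deliberate gap before the last row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, amt REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2021-09-01", 100),
        ("2021-09-02", 500),
        ("2021-09-03", 600),
        ("2021-09-10", 200),  # gap: no rows for 09-04 .. 09-09
    ],
)

frames = conn.execute(
    """
    SELECT dt,
           -- counts the 6 previous rows, whatever their dates are
           AVG(amt) OVER (ORDER BY julianday(dt)
                          ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rows_avg,
           -- counts rows whose date is at most 6 days before this one
           AVG(amt) OVER (ORDER BY julianday(dt)
                          RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS range_avg
    FROM t
    ORDER BY dt
    """
).fetchall()

for row in frames:
    print(row)
```

On the last row the ROWS frame still pulls in the three September 1 to 3 rows (average 350), while the RANGE frame sees only 2021-09-10 itself (average 200).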

Add columns to SQL query and filter by min(date) and sum(price)

I am trying to generate a list of users whose first purchase was in December 2018 and who have spent over 100 dollars since then, in SQL. I'm able to generate the list of users, but I'm unable to determine what their first purchase was, or to include other variables. The issue appears to be that the columns I'm trying to include are neither grouped nor aggregated. I'm new to SQL, so I'm hoping someone can point me in the right direction.
Here's my code to generate the list I want to add more columns to:
select billing_address.name, contact_email, min(processed_at) as First_Purchase_Date, sum(total_price) as Total_Revenue
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
FROM `table.orders`
) orders -- identify duplicate rows
WHERE instance = 1
group by contact_email, billing_address.name
having min(processed_at) between '2019-01-01 00:00:00 UTC' and '2019-02-01 00:00:00 UTC' and sum(total_price) > 100
order by sum(total_price) desc
Is there some way I can modify this to pull each user's purchase from this list into a separate row and include more columns? So I'd pull in each user (and ALL of their purchases) who has a min(processed_at) in December 2018 AND their sum(total_price) > 100? something like this:
SELECT contact_email, billing_address, line_items, min(processed_at), sum(total_price) OVER (PARTITION BY contact_email)
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
FROM `table.orders`
) orders -- identify duplicate rows
WHERE instance = 1
However, the sum(total_price) doesn't work in this case and I can't filter by min(processed_at). Can someone guide me in the right direction?
I think you should use window functions instead of aggregation. You can compute the date of the first purchase and the total amount spent on the fly in a subquery, without aggregating (your original group by columns become the partition columns of the window functions). Then you can use this information to filter in the outer query.
This should get you close to what you want:
select o.*
from (
select
o.*,
min(processed_at) over(partition by contact_email, billing_address) min_processed_at,
sum(total_price) over(partition by contact_email, billing_address) sum_total_price
from (
select
o.*,
row_number() over(partition by id) instance
from orders o
) o
where instance = 1
) o
where
min_processed_at between '2019-01-01 00:00:00 UTC' and '2019-02-01 00:00:00 UTC'
and sum_total_price > 100
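The pattern of computing per-user window values first and filtering on them afterwards can be tried out on a small scale. The following is a minimal sketch with Python's sqlite3 module, not the BigQuery query itself: the orders table, its columns and all rows are hypothetical, duplicates are assumed to be already removed, and the date bounds follow the "December 2018" wording of the question rather than the literals in its code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, contact_email TEXT, "
    "processed_at TEXT, total_price INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "a@example.com", "2018-12-05", 60),   # first purchase in Dec 2018
        (2, "a@example.com", "2019-01-10", 70),   # total 130 -> qualifies
        (3, "b@example.com", "2018-11-20", 500),  # first purchase too early
        (4, "b@example.com", "2018-12-15", 20),
        (5, "c@example.com", "2018-12-20", 40),   # total too small
    ],
)

# Inner query: attach each user's first purchase date and total to every row.
# Outer query: filter on those per-user values, keeping all of the user's rows.
qualifying_orders = conn.execute(
    """
    SELECT id, contact_email, processed_at, total_price
    FROM (
        SELECT o.*,
               MIN(processed_at) OVER (PARTITION BY contact_email) AS first_purchase,
               SUM(total_price)  OVER (PARTITION BY contact_email) AS total_spent
        FROM orders o
    )
    WHERE first_purchase >= '2018-12-01'
      AND first_purchase <  '2019-01-01'
      AND total_spent > 100
    ORDER BY processed_at
    """
).fetchall()

for row in qualifying_orders:
    print(row)
```

Filtering on the windowed first_purchase column, rather than on processed_at, is what keeps the later January order of the qualifying user in the result.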
Your question was a bit unclear as you did not provide much detail about your input tables or your expected output, so this is a guess.
The following query gets all transactions from users who meet the criteria:
-- BigQuery StandardSQL
with ordered_orders as (
--rank each ID by processed_at date first to last
select *, row_number() over(partition by id order by processed_at asc) as rn
from `table.orders`
),
first_criteria as (
-- select IDs where first processed_at date is in 2018-12
select id, processed_at as first_order_date
from ordered_orders
where rn = 1
and extract(year from processed_at) = 2018
and extract(month from processed_at) = 12
),
second_criteria as (
-- further select IDs who meet first criteria and have a total of > 100
select id, sum(total_price) as total_revenue
from ordered_orders
inner join first_criteria using(id)
group by id
having total_revenue > 100
),
orders_with_criteria as (
-- get all orders for users who meet both criteria
select ordered_orders.* except(rn), first_order_date, total_revenue
from ordered_orders
inner join first_criteria using(id)
inner join second_criteria using(id)
)
-- select any fields you want
select * from orders_with_criteria
I prefer liberal use of CTEs in cases like this to keep the logic clear.
I also wouldn't be surprised if this query doesn't work as you intend. I think it is highly doubtful that the ID column in your orders table refers to the customer id, which is what you/we are partitioning on. Depending on who set up your tables, id probably refers to the order id. If you have a customer_id (or account #, etc), then I would use that instead of id in the query.
No need to use row_number() in BigQuery for this:
SELECT billing_address.name, contact_email,
MIN(processed_at) as First_Purchase_Date,
SUM(total_price) as Total_Revenue,
ARRAY_AGG(o ORDER BY processed_at LIMIT 1) as first_order
FROM `table.orders` o
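-- no "WHERE instance = 1" here: that column only exists in the question's ROW_NUMBER() subquery, so deduplicate first if the table contains duplicate rows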
GROUP BY contact_email, billing_address.name
HAVING MIN(processed_at) >= '2019-01-01 00:00:00 UTC' AND
MIN(processed_at) < '2019-02-01 00:00:00 UTC' AND
SUM(total_price) > 100
ORDER BY SUM(total_price) desc;
This returns the entire first order as a one-element array of structs (add [OFFSET(0)] after the ARRAY_AGG(...) call to get the struct itself). You can select specific columns, if you prefer.

ORACLE SQL: Find last minimum and maximum consecutive period

I have the sample data set below, which lists the water meters that were not working for a specific reason over a certain period (January 2016 to December 2018).
I would like a query that retrieves the last consecutive period, with its minimum and maximum, in which the meter was not working within that range.
Any help will be greatly appreciated.
You have two options:
select code, to_char(min_period, 'yyyymm') min_period, to_char(max_period, 'yyyymm') max_period
from (
select code, min(period) min_period, max(period) max_period,
max(min(period)) over (partition by code) max_min_period
from (
select code, period, sum(flag) over (partition by code order by period) grp
from (
select code, period,
case when add_months(period, -1)
= lag(period) over (partition by code order by period)
then 0 else 1 end flag
from (select mrdg_acc_code code, to_date(mrdg_per_period, 'yyyymm') period from t)))
group by code, grp)
where min_period = max_min_period
Explanation:
flag each row whose period is not equal to the previous period plus one month,
create the column grp, which is a running sum of those flags, so every consecutive run gets its own number,
group the data by code and grp, additionally finding the latest start of a period per code,
show only the rows where min_period = max_min_period.
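The four steps above can be sketched on a small data set with Python's sqlite3 module. This is an illustration under assumptions, not the Oracle query: the readings table and its rows are hypothetical, periods are stored as yyyymm integers, a month number (year * 12 + month) replaces add_months, and a correlated subquery replaces the max(min(period)) over (...) expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (code TEXT, period INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("A", 201601), ("A", 201602),                 # first run
        ("A", 201611), ("A", 201612), ("A", 201701),  # last run, crosses a year
        ("B", 201805),
    ],
)

last_runs = conn.execute(
    """
    WITH numbered AS (
        SELECT code, period,
               (period / 100) * 12 + period % 100 AS month_no
        FROM readings
    ),
    flagged AS (   -- step 1: flag = 1 where the previous month is missing
        SELECT code, period, month_no,
               CASE WHEN month_no - 1 =
                         LAG(month_no) OVER (PARTITION BY code ORDER BY month_no)
                    THEN 0 ELSE 1 END AS flag
        FROM numbered
    ),
    grouped AS (   -- step 2: running sum of flags numbers each run
        SELECT code, period,
               SUM(flag) OVER (PARTITION BY code ORDER BY month_no) AS grp
        FROM flagged
    ),
    runs AS (      -- step 3: one row per consecutive run
        SELECT code, grp, MIN(period) AS min_period, MAX(period) AS max_period
        FROM grouped
        GROUP BY code, grp
    )
    SELECT code, min_period, max_period   -- step 4: keep the latest run per code
    FROM runs r
    WHERE min_period = (SELECT MAX(min_period) FROM runs WHERE code = r.code)
    ORDER BY code
    """
).fetchall()

print(last_runs)
```

Code A has two runs, and only the later one (201611 to 201701) survives the final filter.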
The second option is a recursive CTE, available in Oracle 11g and above:
with
data(period, code) as (
select to_date(mrdg_per_period, 'yyyymm'), mrdg_acc_code from t
where mrdg_per_period between 201601 and 201812),
cte (period, code) as (
select to_char(period, 'yyyymm'), code from data
where (period, code) in (select max(period), code from data group by code)
union all
select to_char(data.period, 'yyyymm'), cte.code
from cte
join data on data.code = cte.code
and data.period = add_months(to_date(cte.period, 'yyyymm'), -1))
select code, min(period) min_period, max(period) max_period
from cte group by code
Explanation:
The subquery data keeps only the rows from 2016 to 2018 and converts period to a date, which the add_months function needs in order to work.
cte is recursive. The anchor finds the starting rows, those with the maximum period for each code. After union all comes the recursive member, which looks for the row one month older than the current one. If it finds one, it moves on to that row; if not, it stops.
The final select groups the data. Notice that periods which were not consecutive were rejected by cte.
Though recursive queries are slower than traditional ones, there can be scenarios where the second solution is better.
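The backwards walk of the recursive approach can be reproduced in SQLite as a rough sketch. The readings table and its rows are hypothetical, periods are yyyymm integers, and a month number (year * 12 + month) is used instead of add_months so that stepping back one month is a plain subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (code TEXT, period INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("A", 201601), ("A", 201602),
        ("A", 201611), ("A", 201612), ("A", 201701),
        ("B", 201805),
    ],
)

walked = conn.execute(
    """
    WITH RECURSIVE walk(code, month_no, period) AS (
        -- anchor: the latest period of each code
        SELECT code, MAX((period / 100) * 12 + period % 100), MAX(period)
        FROM readings
        GROUP BY code
        UNION ALL
        -- recursive member: step back one month for as long as a row exists
        SELECT r.code, w.month_no - 1, r.period
        FROM walk w
        JOIN readings r
          ON r.code = w.code
         AND (r.period / 100) * 12 + r.period % 100 = w.month_no - 1
    )
    SELECT code, MIN(period), MAX(period)
    FROM walk
    GROUP BY code
    ORDER BY code
    """
).fetchall()

print(walked)
```

The recursion stops at the first missing month, so the earlier run of code A (201601 to 201602) is never visited.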
Use aggregate functions with GROUP BY:
select max(mrdg_per_period) mrdg_per_period, mrdg_acc_code, max(mrdg_date_read), rea_Desc, min(mrdg_per_period) not_working_as_from
from tablename
group by mrdg_acc_code,rea_Desc
This is a bit tricky. This is a gap-and-islands problem. To get all continuous periods, it will help if you have an enumeration of months. So, convert the period to a number of months and then subtract a sequence generated using row_number(). The difference is constant for a group of adjacent months.
This looks like:
select acc_code, min(period), max(period)
from (select t.*,
row_number() over (partition by acc_code order by period_num) as seqnum
from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
from t
) t
where rea_desc = 'METER NOT WORKING'
) t
group by acc_code, (period_num - seqnum);
Then, if you want the last one for each account, you can use a subquery:
select t.*
from (select acc_code, min(period) as min_period, max(period) as max_period,
row_number() over (partition by acc_code order by max(period) desc) as seqnum
from (select t.*,
row_number() over (partition by acc_code order by period_num) as seqnum
from (select t.*, floor(period / 100) * 12 + mod(period, 100) as period_num
from t
) t
where rea_desc = 'METER NOT WORKING'
) t
group by acc_code, (period_num - seqnum)
) t
where seqnum = 1;
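To make the "difference is constant" idea concrete, here is a small sketch in SQLite via Python's sqlite3 module. The readings table and its rows are hypothetical, and periods are yyyymm integers; the island column is the month number minus the row number, which stays the same within each run of adjacent months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (code TEXT, period INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("A", 201601), ("A", 201602),
        ("A", 201611), ("A", 201612), ("A", 201701),
        ("B", 201805),
    ],
)

islands = conn.execute(
    """
    SELECT code, MIN(period) AS first_period, MAX(period) AS last_period
    FROM (
        SELECT code, period,
               -- month number minus sequence number: constant per island
               (period / 100) * 12 + period % 100
                 - ROW_NUMBER() OVER (PARTITION BY code ORDER BY period) AS island
        FROM readings
    )
    GROUP BY code, island
    ORDER BY code, first_period
    """
).fetchall()

for row in islands:
    print(row)
```

Every consecutive run comes back as one row, and picking the last run per code is then a matter of ranking these rows by last_period, as in the second query above.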

How to calculate daily average from aggregate results with SQL?

I'm working on outputting some data and I want to pull the daily average of some numbers.
As you can see, what I want to do is count the amount of rows received/results(think the row ID) and then divide it against the day value to make the daily average.(30/1) , (64/2) etc I've tried everything, but I keep running into a wall with this.
As it stands, I'm guessing to make this work a sub query of some sort is needed. I just don't know how to get the day(Row id 1,2,3,4 etc) to use for the division.
SELECT calendar_date, SUM(NY_dayscore * cAttendance)
FROM vw_Appointments
WHERE status = 'Confirmed'
Group by calendar_date
I attempted a count with distinct, to no avail:
SUM(NY_dayscore * cAttendance) / count(distinct calendar_date)
My original code is too long to post in full, so I'm posting a small sample to get guidance on the issue.
In SQL Server 2012+, you would use the cumulative average:
select calendar_date, sum(NY_dayscore * cAttendance),
avg(sum(NY_dayscore * cAttendance)) over (order by calendar_date) as running_average
from vw_appointments a
where status = 'Confirmed'
group by calendar_date
order by calendar_date;
In SQL Server 2008, this is more difficult:
with a as (
select calendar_date, sum(NY_dayscore * cAttendance) as showed
from vw_appointments a
where status = 'Confirmed'
group by calendar_date
)
select a.*, a2.running_average
from a outer apply
(select avg(showed) as running_average
from a a2
where a2.calendar_date <= a.calendar_date
) a2
order by calendar_date;
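The running average over daily totals can be checked on a few rows. This is a minimal sketch with Python's sqlite3 module, assuming SQLite 3.25 or newer. The appts table and its rows are hypothetical, and a single score column stands in for NY_dayscore * cAttendance. The daily totals are computed in a CTE first and the window function is applied on top of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appts (calendar_date TEXT, score INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO appts VALUES (?, ?, ?)",
    [
        ("2019-01-02", 10, "Confirmed"),
        ("2019-01-02", 20, "Confirmed"),
        ("2019-01-02", 99, "Cancelled"),  # filtered out by the WHERE clause
        ("2019-01-03", 34, "Confirmed"),
        ("2019-01-04", 41, "Confirmed"),
    ],
)

running = conn.execute(
    """
    WITH daily AS (
        SELECT calendar_date, SUM(score) AS showed
        FROM appts
        WHERE status = 'Confirmed'
        GROUP BY calendar_date
    )
    SELECT calendar_date, showed,
           -- average of all daily totals up to and including this date
           AVG(showed) OVER (ORDER BY calendar_date) AS running_average
    FROM daily
    ORDER BY calendar_date
    """
).fetchall()

for row in running:
    print(row)
```

With daily totals of 30, 34 and 41, the running average is 30/1, 64/2 and 105/3, which matches the (30/1), (64/2) pattern described in the question.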
Is it ROW_NUMBER() that you are missing?
SELECT
calendar_date,
SUM(NY_dayscore * cAttendance) / (ROW_NUMBER() OVER (ORDER BY calendar_date ASC)) AS average
FROM vw_Appointments
WHERE status = 'Confirmed'
GROUP BY calendar_date
ORDER BY calendar_date
I think you need sum(showed) over (..)/row_number() over (..)
WITH Table1(date, showed) AS
(
SELECT '2019-01-02', 30 UNION ALL
SELECT '2019-01-03', 34 UNION ALL
SELECT '2019-01-03', 41 UNION ALL
SELECT '2019-01-04', 48
)
SELECT date,
sum(showed) over (order by date) /
row_number() over (order by date)
as daily_average
FROM Table1
GROUP BY showed, date;
date daily_average
2019-01-02 30
2019-01-03 52
2019-01-03 35
2019-01-04 38

SQL Aggregates OVER and PARTITION

All,
This is my first post on Stackoverflow, so go easy...
I am using SQL Server 2008.
I am fairly new to writing SQL queries, and I have a problem that I thought was pretty simple, but I've been fighting for 2 days. I have a set of data that looks like this:
UserId Duration(Seconds) Month
1 45 January
1 90 January
1 50 February
1 42 February
2 80 January
2 110 February
3 45 January
3 62 January
3 56 January
3 60 February
Now, what I want is to write a single query that gives me the average for a particular user and compares it against the average across all users for that month. So the resulting dataset after a query for user #1 would look like this:
UserId Duration(seconds) OrganizationDuration(Seconds) Month
1 67.5 63 January
1 46 65.5 February
I've been batting around different subqueries and group by scenarios and nothing ever seems to work. Lately, I've been trying OVER and PARTITION BY, but with no success there either. My latest query looks like this:
select Userid,
AVG(duration) OVER () as OrgAverage,
AVG(duration) as UserAverage,
DATENAME(mm,MONTH(StartDate)) as Month
from table.name
where YEAR(StartDate)=2014
AND userid=119
GROUP BY MONTH(StartDate), UserId
This query bombs out with a "'Duration' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause" error.
Please keep in mind I'm dealing with a very large amount of data. I think I can make it work with CASE statements, but I'm looking for a cleaner, more efficient way to write the query if possible.
Thank you!
You are joining two queries together here:
Per-User average per month
All Organisation average per month
If you are only going to return data for one user at a time then an inline select may give you joy:
SELECT AVG(a.duration) AS UserAverage,
(SELECT AVG(b.Duration) FROM tbl b WHERE MONTH(b.StartDate) = MONTH(a.StartDate)) AS OrgAverage
...
FROM tbl a
WHERE userid = 119
GROUP BY MONTH(StartDate), UserId
Note: comparing on MONTH(...) for every row may be slow. You may be better off computing the monthly averages once in a CTE (Common Table Expression).
You are missing the PARTITION BY clause in the average function:
OVER ( Partition by MONTH(StartDate))
Please try this. It works fine for me.
WITH C1
AS
(
SELECT
AVG(Duration) AS TotalAvg,
[Month]
FROM [dbo].[Test]
GROUP BY [Month]
),
C2
AS
(
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY UserID, [Month]) AS DetailedAvg, -- no ORDER BY: SQL Server 2008 does not accept it for aggregates in OVER
[Month]
FROM [dbo].[Test]
)
SELECT C2.*, C1.TotalAvg
FROM C2 c2
INNER JOIN C1 c1 ON c1.[Month] = c2.[Month]
ORDER BY c2.UserID, c2.[Month] desc;
I was able to get it done using a self join. There's probably a better way.
Select UserId, AVG(t1.Duration) as Duration, t2.duration as OrgDur, t1.Month
from #temp t1
inner join (Select Distinct MONTH, AVG(Duration) over (partition by Month) as duration
from #temp) t2 on t2.Month = t1.Month
group by t1.Month, t1.UserId, t2.Duration
order by t1.UserId, Month desc
Here's a version using a CTE, which is probably a better solution and definitely easier to read:
With MonthlyAverage
as
(
Select MONTH, AVG(Duration) as OrgDur
from #temp
group by Month
)
Select UserId, AVG(t1.Duration) as Duration, m.OrgDur as OrgDur , t1.Month
from #temp t1
inner join MonthlyAverage m on m.Month = t1.Month
group by UserId, t1.Month, m.OrgDur
You can try the query below, which needs less code:
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY [Month]) AS TotalAvg,
AVG(Duration) OVER(PARTITION BY UserID, [Month]) AS DetailedAvg, -- no ORDER BY: SQL Server 2008 does not accept it for aggregates in OVER
[Month]
FROM [dbo].[Test]
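The two-partition approach can be verified against the sample data from the question. Below is a minimal sketch with Python's sqlite3 module, assuming SQLite 3.25 or newer; the table name calls and the column names are hypothetical. The user filter is applied in an outer query, because filtering inside would restrict the organisation-wide average to that user's rows as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (user_id INTEGER, duration INTEGER, month TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        (1, 45, "January"), (1, 90, "January"),
        (1, 50, "February"), (1, 42, "February"),
        (2, 80, "January"), (2, 110, "February"),
        (3, 45, "January"), (3, 62, "January"), (3, 56, "January"),
        (3, 60, "February"),
    ],
)

comparison = conn.execute(
    """
    SELECT user_id, user_avg, org_avg, month
    FROM (
        SELECT DISTINCT user_id, month,
               AVG(duration) OVER (PARTITION BY user_id, month) AS user_avg,
               AVG(duration) OVER (PARTITION BY month)          AS org_avg
        FROM calls
    )
    WHERE user_id = 1   -- filter only after both averages are computed
    ORDER BY CASE month WHEN 'January' THEN 1 ELSE 2 END
    """
).fetchall()

for row in comparison:
    print(row)
```

The output reproduces the table the question asks for: 67.5 against 63 for January and 46 against 65.5 for February.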