How do I correctly use the SQL Sum function with multiple variables and grouping? - sql

I am trying to write an SQL statement based on the following code.
CREATE TABLE mytable (
year INTEGER,
month INTEGER,
day INTEGER,
hoursWorked INTEGER )
Assuming that each employee works multiple days over each month in a 3 year period.
I need to write an sql statement that returns the total hours worked in each month, grouped by earliest year/month first.
I tried doing this, but I don't think it is correct:
SELECT Sum(hoursWorked) FROM mytable
ORDER BY(year,month)
GROUP BY(month);
I am a little confused about how to use the SUM function in conjunction with GROUP BY or ORDER BY. How does one go about doing this?

Try this:
SELECT year, month, SUM(hoursWorked)
FROM mytable
GROUP BY year, month
ORDER BY year, month
This way you will have for example:
2014 December 30
2015 January 12
2015 February 40
Every non-aggregated column you put in the SELECT part of the query must also appear in GROUP BY. The reverse is not strictly required, but selecting the columns you group by (here year and month) is what lets you tell the result rows apart.
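As a quick check of the grouping behaviour described above, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The sample rows are made up for illustration, and SQLite stands in for whatever engine the question actually uses.

```python
import sqlite3

# In-memory database with the table from the question (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (year INTEGER, month INTEGER, day INTEGER, hoursWorked INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (2015, 1, 5, 8),
        (2014, 12, 1, 10),
        (2015, 1, 6, 4),
        (2014, 12, 2, 20),
        (2015, 2, 3, 40),
    ],
)

# Group on both year and month so the same month in different years stays separate,
# then order the groups chronologically.
monthly_totals = conn.execute(
    """
    SELECT year, month, SUM(hoursWorked) AS total_hours
    FROM mytable
    GROUP BY year, month
    ORDER BY year, month
    """
).fetchall()

for row in monthly_totals:
    print(row)
```

Note that GROUP BY comes before ORDER BY; the attempt in the question has them the other way round, which is a syntax error in most engines.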

SELECT year, month, SUM(hoursWorked) AS workedhours
FROM mytable
GROUP BY year,month
ORDER BY year,month;
You have to group by year and month.

Is this what you are trying to do? This will sum by year/month and order by year/month.
Select [Year], [Month], Sum(HoursWorked) as WorkedHours
From mytable
Group By [Year], [Month]
Order by [Year], [Month]

You have to group by year and month; otherwise you will have the hours you worked in March 2014 and March 2015 in one record :)
SELECT SUM(hoursWorked) AS hoursWorked, year, month
FROM mytable
GROUP BY year, month
ORDER BY year, month;

Related

Remove Duplicates and show Total sales by year and month

I am trying to work with this query to produce a list of all 11 years and the 12 months within each year, with the sales data for each month. Any suggestions? This is my query so far:
SELECT
distinct(extract(year from date)) as year
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by date
It just creates a long list of over 2000 results, when I am expecting 132 at most, one for each month in the years.
You should change your group by statement if you have more results than you expected.
You can try:
group by YEAR(date), MONTH(date)
or
group by EXTRACT(YEAR_MONTH FROM date)
GROUP BY takes a part of the date, in your case the year and month, collects all rows that share it, and sums them up.
So GROUP BY date makes no sense here, as you don't want a sum for every single day.
Use this instead:
SELECT
extract(year from date) as year
,extract(MONTH from date) as month
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by 1,2
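Or you can combine year and month into a single value (the YEAR_MONTH unit is MySQL syntax, so other engines may need a different expression):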
SELECT
extract(YEAR_MONTH from date) as year
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by 1
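To make the difference between grouping by the full date and grouping by year and month concrete, here is a small sketch using SQLite from Python. It is not the asker's engine: the table name `sales`, the sample rows, and the use of `strftime` in place of EXTRACT are all assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the sales table; dates are ISO-8601 text.
conn.execute("CREATE TABLE sales (date TEXT, sale_dollars REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2020-01-03", 100.0),
        ("2020-01-17", 50.0),
        ("2020-02-01", 25.0),
        ("2021-01-09", 10.0),
    ],
)

# Grouping by the raw date yields one row per distinct day.
per_day = conn.execute(
    "SELECT date, SUM(sale_dollars) FROM sales GROUP BY date"
).fetchall()

# Grouping by the extracted year and month yields one row per calendar month.
per_month = conn.execute(
    """
    SELECT CAST(strftime('%Y', date) AS INTEGER) AS year,
           CAST(strftime('%m', date) AS INTEGER) AS month,
           SUM(sale_dollars) AS month_sales
    FROM sales
    GROUP BY year, month
    ORDER BY year, month
    """
).fetchall()

print(len(per_day), "rows per day;", len(per_month), "rows per month")
```

With eleven years of data the second query returns at most 132 rows, which is what the question expects.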

Using () OVER or HAVING clause to get monthly aggregates of counts

I have a big dataset on ticket sales throughout a single year. The schema I am working with is:
ID
date_time_sale (Timestamp, yyyy-MM-dd hh-mm-ss)
weekday (varchar, Mon to Sun)
number_tickets (integer)
ticket_price (float)
total_price (float)
I am trying to get the weekday of every month of the year where the highest number of tickets was sold, so, for example, the output would be:
year month weekday total_tickets
2015 01 SAT 5400
2015 02 SUN 4300
2015 03 SUN 6400
I tried using the following, but admittedly SQL is not my strongest skill:
SELECT DISTINCT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
RANK () OVER (PARTITION BY YEAR, MOMTH ORDER BY count(week_day) ASC) weekday_count
from ticket_sales
order by YEAR, MONTH
But I keep running into errors. I tried using a HAVING clause, but I couldn't get anywhere. Any tip on how to effectively use the RANK () OVER (PARTITION BY) clause to get this output, please? Or do I need to use COUNT () OVER?
The analysis exception says:
`cannot resolve '`YEAR`' given input columns: [ticket_sales.YEAR, ticket_sales.MONTH, weekday]; line 1 pos 292;\n'Sort ['YEAR ASC NULLS FIRST, 'MONTH ASC NULLS FIRST], true\n+- Project [YEAR#342, MONTH#358
but then it is quite a long error.
Update:
So I tried this code:
SELECT DISTINCT year,
month,
week_day,
COUNT (week_day) OVER (PARTITION BY year, month, week_day) AS weekday_count
from ticket_sales
order by year, month, weekday_count DESC
What that did is give the results for all weekdays in every month, so the output is 12*7 rows instead of 12. I still have a way to go with this, but at least I am getting somewhere.
Try this query and let me know if it returns the desired result.
I'm not sure whether the field name is number_tickets or total_tickets; I used number_tickets.
First I sum the number of tickets by year, month and weekday, then return one row per year and month with the weekday on which the most tickets were sold.
WITH total_by_day AS (SELECT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
SUM(number_tickets) AS number_tickets
FROM ticket_sales
GROUP BY YEAR, MONTH, week_day)
SELECT DISTINCT
YEAR,
MONTH,
FIRST_VALUE(week_day) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS week_day,
FIRST_VALUE(number_tickets) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS total_tickets
FROM total_by_day
ORDER BY YEAR, MONTH;
In a PostgreSQL database I got the desired result.
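The same two-step idea (aggregate per weekday, then keep the top weekday per month) can also be sketched with ROW_NUMBER instead of FIRST_VALUE plus DISTINCT. The example below is an illustration only: it runs on SQLite from Python (window functions need SQLite 3.25 or later), uses `strftime` in place of EXTRACT, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket_sales (date_time_sale TEXT, week_day TEXT, number_tickets INTEGER)"
)
conn.executemany(
    "INSERT INTO ticket_sales VALUES (?, ?, ?)",
    [
        ("2015-01-03 10:00:00", "SAT", 30),
        ("2015-01-10 11:00:00", "SAT", 25),
        ("2015-01-04 12:00:00", "SUN", 40),
        ("2015-02-01 09:00:00", "SUN", 20),
        ("2015-02-07 09:30:00", "SAT", 15),
    ],
)

best_weekday_per_month = conn.execute(
    """
    WITH total_by_day AS (
        -- Step 1: total tickets per (year, month, weekday).
        SELECT strftime('%Y', date_time_sale) AS year,
               strftime('%m', date_time_sale) AS month,
               week_day,
               SUM(number_tickets) AS total_tickets
        FROM ticket_sales
        GROUP BY 1, 2, 3
    ),
    ranked AS (
        -- Step 2: number the weekdays within each month, busiest first.
        SELECT total_by_day.*,
               ROW_NUMBER() OVER (
                   PARTITION BY year, month
                   ORDER BY total_tickets DESC
               ) AS rn
        FROM total_by_day
    )
    SELECT year, month, week_day, total_tickets
    FROM ranked
    WHERE rn = 1
    ORDER BY year, month
    """
).fetchall()

for row in best_weekday_per_month:
    print(row)
```

ROW_NUMBER picks one weekday arbitrarily when two weekdays tie within a month; swapping in RANK would keep all tied weekdays instead.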

SQL - lag variable creation using window function

I have daily city-level data with some counts. I have to aggregate this data at monthly level (1st day of each month) and then create lag variables based on the last 1 week from the 1st day of the month.
I have used the following code to create lag variables for the last 1 month, after aggregating the data at monthly level (with the 1st date of the month):
sum(count) over (partition by City order by month_date rows between 1 preceding and 1 preceding) as last_1_month_count
Is there a way to aggregate data at monthly level and create lag variables based on last 7,14,21,28 days using window function?
You can use this:
select
CITY
, month(Date)
, year(date)
, sum(count)
from table1
where date < dateadd(day, -7, getdate())
group by
City
, month(Date)
, year(date)
I think you're looking for something like this. The first CTE summarizes city counts by day, week, month and year. The second summarizes the counts by week, month and year. To group sales by weeks starting from the 1st day, it uses the DAY function along with YEAR and MONTH. Since DAY returns an integer, groups of distinct weeks can be created by dividing by 7, i.e. DAY(day_dt)/7.
One way to get the prior week's sales would be to join the week sales summary CTE to itself where the week is offset by -1. Since the prior week might possibly have 0 sales, it seems safer to LEFT JOIN than to use LAG, in my opinion.
with
day_sales_cte(city, day_dt, yr, mo, wk, sum_count) as (
select city, day_dt, year(day_dt), month(day_dt), day(day_dt)/7, sum([count])
from city_level_data
group by city, day_dt, year(day_dt), month(day_dt), day(day_dt)/7),
wk_sales_cte(city, yr, mo, wk, sum_count) as (
select city, yr, mo, wk, sum(sum_count)
from day_sales_cte
group by city, yr, mo, wk)
select ws.*, ws2.sum_count prior_wk_sales
from wk_sales_cte ws
left join wk_sales_cte ws2 on ws.city=ws2.city
and ws.yr=ws2.yr
and ws.mo=ws2.mo
and ws2.wk=ws.wk-1;
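The week bucketing and self-join described above can be tried out with a small sketch on SQLite from Python. This is an illustration under assumptions rather than the answer's exact SQL Server code: `strftime` replaces YEAR/MONTH/DAY, the count column is called `cnt` (a hypothetical name, to avoid quoting `count`), the two CTEs are collapsed into one, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_level_data (city TEXT, day_dt TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO city_level_data VALUES (?, ?, ?)",
    [
        ("Austin", "2021-03-01", 5),   # day 1  -> bucket 0
        ("Austin", "2021-03-03", 2),   # day 3  -> bucket 0
        ("Austin", "2021-03-08", 4),   # day 8  -> bucket 1
        ("Austin", "2021-03-20", 9),   # day 20 -> bucket 2
    ],
)

weekly = conn.execute(
    """
    WITH wk_sales AS (
        -- Integer division of the day of month by 7 gives the week bucket.
        SELECT city,
               CAST(strftime('%Y', day_dt) AS INTEGER) AS yr,
               CAST(strftime('%m', day_dt) AS INTEGER) AS mo,
               CAST(strftime('%d', day_dt) AS INTEGER) / 7 AS wk,
               SUM(cnt) AS sum_count
        FROM city_level_data
        GROUP BY city, yr, mo, wk
    )
    -- LEFT JOIN each bucket to the bucket just before it in the same month.
    SELECT ws.city, ws.yr, ws.mo, ws.wk, ws.sum_count,
           ws2.sum_count AS prior_wk_count
    FROM wk_sales ws
    LEFT JOIN wk_sales ws2
           ON ws.city = ws2.city
          AND ws.yr = ws2.yr
          AND ws.mo = ws2.mo
          AND ws2.wk = ws.wk - 1
    ORDER BY ws.city, ws.yr, ws.mo, ws.wk
    """
).fetchall()

for row in weekly:
    print(row)
```

One thing the sketch makes visible: dividing the day by 7 puts days 1 to 6 in bucket 0 and days 7 to 13 in bucket 1, so the first bucket is only six days long. Using (day - 1) / 7 would give full seven-day buckets starting on the 1st, if that is what is wanted.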

group by year month in postgresql

customer Date location
1 25Jan2018 texas
2 15Jan2018 texas
3 12Feb2018 Boston
4 19Mar2017 Boston.
I am trying to find the count of customers grouped by the year and month of the Date column. The Date column is of text data type.
E.g. in Jan2018, the count is 2.
I would do something like the following:
SELECT
date_part('year', formattedDate) as Year
,date_part('month', formattedDate) as Month
,count(*) as CustomerCountByYearMonth
FROM
(SELECT to_date(Date,'DDMonYYYY') as formattedDate from <table>) as tbl1
GROUP BY
date_part('year', formattedDate)
,date_part('month', formattedDate)
Any additional formatting for dates could be done on the inner query that will allow for adjustments in case some single digit days need to be padded or a month has four letters instead of three etc.
By converting to date type, you can properly order by date type and not alphabetical etc.
Optionally:
SELECT
Year
,Month
,count(*) as CustomerCountByYearMonth
FROM
(SELECT
date_part('year', to_date(Date,'DDMonYYYY')) as Year
,date_part('month', to_date(Date,'DDMonYYYY')) as Month
FROM <table>) as tbl1
GROUP BY
Year
,Month
You shouldn't store dates in a text column...
select substring(Date, length(Date)-6), count(*)
from tablename
group by substring(Date, length(Date)-6)
I thought @Jarlh asked a good question: what about dates like January 1? Is it 01Jan2019 or 1Jan2019? If it can be either, perhaps a regex would work.
select
substring (date from '\d+(\D{3}\d{4})') as month,
count (distinct customer)
from t
group by month
The 'distinct customer' also presupposes you may have the same customer listed in the same month, but you only want to count it once. If that's not the case, just remove 'distinct.'
And, if you wanted the output in date format:
select
to_date (substring (date from '\d+(\D{3}\d{4})'), 'monyyyy') as month,
count (distinct customer)
from t
group by month
If it is a date column, you can truncate the date:
select date_trunc('month', date) as yyyymm, count(*)
from t
group by yyyymm
order by yyyymm;
I really read that the type was date. For a string, just use string functions:
select substr(date, 3, 7) as mmmyyyy, count(*)
from t
group by mmmyyyy;
Unfortunately, ordering doesn't work in this case. You should really be storing dates using the proper type.
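To illustrate both the string-based grouping and the ordering problem just mentioned, here is a minimal sketch using SQLite from Python with the four rows from the question. The table name `t` follows the answers above; taking the last seven characters via `length(date) - 6` mirrors the substring answer and also copes with single-digit days. The chronological sort is done in Python with `strptime`, which assumes English month abbreviations (the default C locale).

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (customer INTEGER, date TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "25Jan2018", "texas"),
        (2, "15Jan2018", "texas"),
        (3, "12Feb2018", "Boston"),
        (4, "19Mar2017", "Boston"),
    ],
)

# Group on the trailing 'MonYYYY' part of the text column.
counts = conn.execute(
    """
    SELECT substr(date, length(date) - 6) AS mon_yyyy, COUNT(*) AS customers
    FROM t
    GROUP BY mon_yyyy
    """
).fetchall()

# Sorting the strings alphabetically would put Feb2018 before Jan2018,
# so parse them into real dates before ordering.
counts_sorted = sorted(counts, key=lambda r: datetime.strptime(r[0], "%b%Y"))

for row in counts_sorted:
    print(row)
```

This is a workaround for the text column; with a proper date column the database can do the ordering itself.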

Calculate totals of field based on current fiscal year only - SQL

I have seen many examples of calculating the sum of fields using the fiscal year, but I cannot find one that fits my needs. What I am trying to do is get just the current fiscal year totals for a field using an SQL query. The fields I have are userid, startdate, total_hours, and missed_hours. Here is the query I have so far:
SELECT
userid,
SUM(total_hours) - SUM(missed_hours) AS hours
FROM mytable
GROUP BY userid
This works great, but all I need is the total number of hours for the current fiscal year for each of the userid's. Our fiscal year runs from July to June. I only need the current fiscal year and I need it to start over again this coming July.
Assuming this is SQLServer, try:
SELECT userid, SUM(total_hours) - SUM(missed_hours) AS hours
FROM mytable
WHERE startdate >= cast( cast(year(dateadd(MM,-6,getdate())) as varchar(4)) +
'-07-01' as date ) and
startdate < cast( cast(year(dateadd(MM, 6,getdate())) as varchar(4)) +
'-07-01' as date )
GROUP BY userid
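An alternative to building the boundaries inside the query is to compute them in application code and pass them in as parameters. The sketch below does that in Python against an in-memory SQLite table; the helper name `fiscal_year_bounds`, the sample rows, and the ISO-8601 text dates are assumptions made for the illustration.

```python
import sqlite3
from datetime import date
from typing import Tuple


def fiscal_year_bounds(today: date, start_month: int = 7) -> Tuple[date, date]:
    """Return (start, end) of the fiscal year containing `today`; end is exclusive."""
    start_year = today.year if today.month >= start_month else today.year - 1
    return date(start_year, start_month, 1), date(start_year + 1, start_month, 1)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (userid INTEGER, startdate TEXT, total_hours REAL, missed_hours REAL)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "2011-06-30", 8, 0),  # previous fiscal year, excluded
        (1, "2011-07-01", 8, 1),
        (1, "2012-06-30", 8, 0),
        (2, "2011-12-01", 6, 2),
        (2, "2012-07-01", 8, 0),  # next fiscal year, excluded
    ],
)

# Pretend "today" is 1 March 2012, i.e. inside fiscal year July 2011 to June 2012.
start, end = fiscal_year_bounds(date(2012, 3, 1))
hours = conn.execute(
    """
    SELECT userid, SUM(total_hours) - SUM(missed_hours) AS hours
    FROM mytable
    WHERE startdate >= ? AND startdate < ?
    GROUP BY userid
    ORDER BY userid
    """,
    (start.isoformat(), end.isoformat()),
).fetchall()

print(start, end, hours)
```

Using a half-open range (>= start, < end) avoids having to know the last day of June and rolls over automatically each July.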
Add a where clause:
FROM mytable
WHERE startdate >= '2011-07-01'
GROUP BY userid
Or with the start of the year dynamically:
where startdate >= dateadd(yy, datepart(yy, getdate())-2001, '2000-07-01')
Maybe something like this:
SELECT
userid,
SUM(total_hours) - SUM(missed_hours) AS hours
FROM
mytable
WHERE
(YEAR(startdate) = 2011 AND MONTH(startdate) >= 7)
OR (YEAR(startdate) = 2012 AND MONTH(startdate) <= 6)
GROUP BY userid
For a complete solution, two additional pieces of information are needed:
the name of the date column
the vendor of the RDBMS you are using
I assumed your date column is date_col and that you are using MySQL:
SELECT
userid,
SUM(total_hours) - SUM(missed_hours) AS hours
FROM mytable
WHERE date_col between STR_TO_DATE('01,7,2010','%d,%m,%Y') and STR_TO_DATE('30,6,2011','%d,%m,%Y')
GROUP BY userid