I have a big dataset on ticket sales throughout a single year. The schema I am working with is:
ID
date_time_sale (Timestamp, yyyy-MM-dd hh-mm-ss)
weekday (varchar, Mon to Sun)
number_tickets (integer)
ticket_price (float)
total_price (float)
I am trying to get to get the weekday of every month of the year where the highest number of tickets was sold, so, for example, the output would be:
year
month
weekday
total_tickets
2015
01
SAT
5400
2015
02
SUN
4300
2015
03
SUN
6400
I tried using the following, but admittedly SQL is not my strongest skill:
SELECT DISTINCT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
RANK () OVER (PARTITION BY YEAR, MOMTH ORDER BY count(week_day) ASC) weekday_count
from ticket_sales
order by YEAR, MONTH
But I keep running into errors. I tried using a HAVING clause, but I coludn't go anywhere. Any tip on how to effectively use the RANK () OVER (PARTITION BY) clause to get this output, please? Or do I need to use COUNT () OVER?
The analysis exception says:
`cannot resolve '`YEAR`' given input columns: [ticket_sales.YEAR, ticket_sales.MONTH, weekday]; line 1 pos 292;\n'Sort ['YEAR ASC NULLS FIRST, 'MONTH ASC NULLS FIRST], true\n+- Project [YEAR#342, MONTH#358
but then it is quite a long error.
Update:
So I tried this code:
SELECT DISTINCT year,
month,
week_day,
COUNT (week_day) OVER (PARTITION BY year, month, week_day) AS weekday_count
from ticket_sales
order by year, month, weekday_count DESC
And what that did is give the results of all week days in the for every months, so the output is 12*7 instead of 12 rows. Still ways to learn around this but at least I am somewhere.
Try this query and let me know if return the desire result:
I'm not sure if field name is number_tickets or total_tickets, I used number_tickets.
First I sum numbers tickets from year, month and week day, then return a row per year and month with the week's day in which more tickets were sold.
WITH total_by_day AS (SELECT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
SUM(number_tickets) AS number_tickets
FROM ticket_sales
GROUP BY YEAR, MONTH, week_day)
SELECT DISTINCT
YEAR,
MONTH,
FIRST_VALUE(week_day) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS week_day,
FIRST_VALUE(number_tickets) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS total_tickets
FROM total_by_day
ORDER BY YEAR, MONTH;
In Postgresql database I got the desire result.
Related
i am trying to work with this query to produce a list of all 11 years and 12 months within the years with the sales data for each month. Any suggestions? this is my query so far.
SELECT
distinct(extract(year from date)) as year
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by date
it just creates a long list of over 2000 results when i am expecting 132 max one for each month in the years.
You should change your group by statement if you have more results than you expected.
You can try:
group by YEAR(date), MONTH(date)
or
group by EXTRACT(YEAR_MONTH FROM date)
A Grouping function is for takes a subsection of the date in your case year and moth and collect all rows that fit, and sum it up,
So a sĀ“GROUp BY date makes no sense, what so ever as you don't want the sum of every day
So make this
SELECT
extract(year from date) as year
,extract(MONTH from date) as month
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by 1,2
Or you can combine both year and month
SELECT
extract(YEAR_MONTH from date) as year
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by 1
This is the initial query that groups, sums up, and orders the busiest day of the week per month and year for a small retail store:
SELECT year, month, day_of_week, SUM(total_revenue)
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month, SUM DESC
and returns the table in attached image. And that is what I want to see INITIALLY.
Now I want to do a query on this result (image) that only shows the MAX sums of each month - essentially ONLY the rows that I circled, which is the best day (highest SUM) is each of the months of January (1) , February(2), ...
I tried the following:
SELECT year, month, day_of_week, MAX(SUM(total_revenue))
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month
But I got this error:
ERROR: aggregate function calls cannot be nested
LINE 1: SELECT year, month, day_of_week, MAX(SUM(total_revenue))
^
SQL state: 42803
Character: 38
Then I tried:
SELECT year, month, day_of_week, MAX(SUM)
FROM
(SELECT year, month, day_of_week, SUM(total_revenue)
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month, SUM DESC)
ORDER BY year, month
And I got another error with hint:
ERROR: subquery in FROM must have an alias
LINE 3: (SELECT year, month, day_of_week, SUM(total_revenue)
^
HINT: For example, FROM (SELECT ...) [AS] foo.
SQL state: 42601
Character: 51
So then I tried:
SELECT year, month, day_of_week, MAX(SUM)
FROM
(SELECT year, month, day_of_week, SUM(total_revenue)
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month, SUM DESC) AS foo
GROUP BY foo.year, foo.month, foo.day_of_week
ORDER BY foo.year, foo.month, MAX DESC
AND
SELECT foo.year, foo.month, foo.day_of_week, MAX(foo.SUM)
FROM
(SELECT year, month, day_of_week, SUM(total_revenue)
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month, SUM DESC) AS foo
GROUP BY foo.year, foo.month, foo.day_of_week
ORDER BY foo.year, foo.month, MAX DESC
But they are redundant and both return the SAME results as in the image - all days of the week in that month, and NOT the day of the week which is the day with maximum sales in that month in that year.
I googled 'nested queries' and 'sub queries" but I tried some techniques but got errors with no hints. I am not finding anything that logically explains how to do SUM and then query the MAXIMUM of the SUMs.
Any suggestions?
You can use ROW_NUMBER() to create a custom partition
SELECT year, month, day, thesum
FROM (
SELECT year, month, day, thesum,
ROW_NUMBER() OVER (PARTITION BY year, month ORDER BY thesum DESC) RN
FROM (
SELECT year, month, day_of_week, SUM(total_revenue) as thesum
FROM vip_sales
GROUP BY year, month, day_of_week
--ORDER BY year, month, SUM DESC
) x
) y
WHERE RN = 1
I am exploring a dataset in Microsoft SQL Server Management, regarding sales.
I want to obtain the day with the highest number of items sold for each year, therefore a table like this (the values in the rows are totally random):
Year
Purchase Day
Max_Daily_Sales
2011
2011-11-12
48
2012
2012-12-22
123
I first tried to run this query:
WITH CTE_DailySales AS
(
SELECT DISTINCT
Purchase_Day,
Year,
SUM(Order_Quantity) OVER (PARTITION BY Purchase_Day, Year) AS Daily_Quantity_Sold
FROM
[sql_cleaning].[dbo].[Sales$]
)
SELECT
Year, MAX(Daily_Quantity_Sold) AS Max_Daily_Sales
FROM
CTE_DailySales
GROUP BY
Year
ORDER BY
Year
It partially works since it gives me the highest quantity of items sold in a day for each year. However, I would also like to specify what day of the year it was.
If I try to write Purchase_Day in the Select statement, it returns the max for each day, not the single day with the highest number of items sold.
How could I resolve this problem?
I hope I've been clear enough and thanks you all for your help
I suggest you use ROW_NUMBER to get you max value, your query would be:
WITH CTE_DailySales AS
(
SELECT Purchase_Day,
Year,
SUM(Order_Quantity) Daily_Quantity_Sold,
ROW_NUMBER() OVER(PARTITION BY Year ORDER BY SUM(Order_Quantity) DESC) as rn
FROM
[sql_cleaning].[dbo].[Sales$]
GROUP BY Purchase_Day,
Year
)
SELECT
*
FROM
CTE_DailySales
WHERE rn = 1
Simply :
SELECT Purchase_Day,
Year,
SUM(Order_Quantity) OVER(PARTITION BY Purchase_Day, Year) AS Daily_Quantity_Sold,
MAX(SUM(Order_Quantity)) OVER(PARTITION BY Purchase_Day, Year) AS MAX_QTY_YEAR
FROM [sql_cleaning].[dbo].[Sales$];
I have daily city level data with some counts. I have to aggregate this data at monthly level(1st day of each month) and then create lag variables based on last 1 week from 1st day of month.
I have used following code to create lag variables for last 1 month using (after aggregating data at monthly level ( with 1st date of month)
sum(count) over (partition by City order by month_date rows between 1 preceding and 1 preceding) as last_1_month_count
Is there a way to aggregate data at monthly level and create lag variables based on last 7,14,21,28 days using window function?
you can use this L
select
CITY
, month(Date)
, year(date)
, sum(count)
from table1
where date < Datediff(days , 7 , getdate())
group by
City
, month(Date)
, year(date)
I think you're looking for something like this. The first cte summarizes city counts to the day, week, month, year. The second summarizes the counts to the week, month, year. To group sales by weeks starting from the 1st day it uses the DAY function along with YEAR and MONTH. Since DAY returns and integer, groups of distinct weeks can be created by dividing by 7, i.e. DAY(day_dt)/7.
One way to get the prior week sales would be to join the week sales summary cte to itself where the week is offset by -1. Since the prior week might possible have 0 sales it seems safer to LEFT JOIN than to use LAG imo
with
day_sales_cte(city, day_dt, yr, mo, wk, sum_count) as (
select city, day_dt, year(day_dt), month(day_cte), day(day_dt)/7, sum([count]) sum_counts
from city_level_data
group by city, day_dt, year(day_dt), month(day_cte), day(day_dt)/7)
wk_sales_cte(city, yr, mo, wk, sum_count) as (
select city, yr, mo, wk, sum(sum_counts) sum_counts
from sales_cte
group by city, yr, mo, wk)
select ws.*, ws2.sum_sales prior_wk_sales
from wk_sales_cte ws
left join wk_sales_cte ws2 on ws.city=ws2.city
and ws.yr=ws2.yr
and ws.mo=ws2.mo
and ws.wk=ws.wk-1;
I am trying to write an SQL statement based on the following code.
CREATE TABLE mytable (
year INTEGER,
month INTEGER,
day INTEGER,
hoursWorked INTEGER )
Assuming that each employee works multiple days over each month in a 3 year period.
I need to write an sql statement that returns the total hours worked in each month, grouped by earliest year/month first.
I tried doing this, but I don't think it is correct:
SELECT Sum(hoursWorked) FROM mytable
ORDER BY(year,month)
GROUP BY(month);
I am a little confused about how to operate the sum function in conjunction with thee GROUP BY or ORDER BY function. How does one go about doing this?
Try this:
SELECT year, month, SUM(hoursWorked)
FROM mytable
GROUP BY year, month
ORDER BY year, month
This way you will have for example:
2014 December 30
2015 January 12
2015 February 40
Fields you want to group by always have be present in SELECT part of query. And vice-versa - what you put in SELECT part, need be also in GROUP BY.
SELECT year, month, Sum(hoursWorked)as workedhours
FROM mytable
GROUP BY year,month
ORDER BY year,month;
You have to group by year and month.
Is this what you are trying to do. This will sum by Year/Month and Order by Year/Month.
Select [Year], [Month], Sum(HoursWorked) as WorkedHours
From mytable
Group By [Year], [Month]
Order by [Year], [Month]
You have to group by year and month, otherwise you will have the hours you worked on March 2014 and 2015 in one record :)
SELECT Sum(hoursWorked) as hoursWorked, year, month
FROM mytable
GROUP BY(year, month)
ORDER BY(year,month)
;