Querying sales, grouped by weeks - sql

I'm trying to put together a query in SQLite for my app and wanted to ask for a little help, please.
The query I want to write:
"I'm a user with UserID x. For my personal work week, which can start on any day D that I want (Sunday to Saturday, Monday to Sunday, etc), I'd like to see which work week was my worst and which was my best in terms of sales."
To keep it simple, "worst" and "best" simply mean the lowest and highest sales summed for that week.
If everyone's work week started on Sunday, this is easy, but this is not the case. I have to overcome a SQL challenge of grouping all of the rows in the database table not just by week, but a custom week (users define which day starts and ends the week).
As an example, if my work week starts on Sunday, then this past week began on Sunday, May 28th and ends this Saturday, June 3rd. I would follow this pattern for all records in the table.
However, a different user could have their work week start on Monday, May 29th, and end on Friday, June 2nd.
So this means for user 1, I'd want to group his rows from the starting day of Sunday, and the ending day of Saturday (and then aggregate them all, and take the first and last records for sales).
For user 2, however, I'd want to group his records in the date ranges of Monday to Sunday
Here is where I'm at so far. I'm close I think.
(Note that I store the date as a unix timestamp in milliseconds, hence the division by 1000 and the unixepoch part.) The +d part is an integer offset based on the day the work week starts on, but I haven't figured out what that number should really be. Just when I think I get it, it fails for someone else's start day.
SELECT 'Date', SUM(Amount) 'Amount'
FROM Sales WHERE UserID = x
GROUP BY CAST(( julianday((datetime(CreationDate / 1000, 'unixepoch', 'localtime')) + d) / 7 ) AS INT)
Does anyone think they could give me a hand? :)
Thanks so much!
EDIT
Thank you so much for your help!
For the +d question (what should the value 'd' be to offset)?
Here is what I found after testing, and this works as far as I can tell. I understand SQLite uses 0 for Sunday, 1 for Monday, etc, and I understand we are grouping by dividing by 7 (for the 7 days in a week), but any idea why these would be the right values for 'd' as the offset? It seems to be working now. I see the pattern goes 2,1,0,6,5,4,3, which is a kinda strange order to go in, eh?
if (day == Sunday) //if your work week starts on Sunday, d=2
return 2
else if (day == Monday)
return 1
else if (day == Tuesday)
return 0
else if (day == Wednesday)
return 6
else if (day == Thursday)
return 5
else if (day == Friday)
return 4
else if (day == Saturday)
return 3

You are very close.
You are adding d to the datetime. I don't know whether this actually adds days; I could not find out what happens if you add an integer to a datetime in SQLite. To play it safe, add the days to the Julian day instead. By the way, you don't have to get a datetime first and then the Julian day from it; you can do that in one step:
julianday(CreationDate / 1000, 'unixepoch', 'localtime') + d
This is the only real flaw I see in your query.
The Julian day is a fractional number such as 2457907.5. When you divide it with /, you get a fractional result. I see that you convert this result to INT, but I would suggest converting to INT first and only then dividing, which implicitly makes this an integer division.
cast(julianday(CreationDate / 1000, 'unixepoch', 'localtime') + d as int) / 7
This is just for readability; I get a day number (2457907 rather than some decimal 2457907.5) and integer-divide by 7 (e.g. 2457907 / 7 = 351129).
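One thing worth checking when picking d: Julian days roll over at noon, not at midnight, and Julian day number 0 fell on a Monday. The sketch below mirrors the expression above in Python so the effect can be seen in isolation. The helper names (`ms`, `week_index`, `week_index_midnight`, `offset_for`) are hypothetical, and timestamps are treated as UTC instead of 'localtime' purely to keep the result reproducible; the extra + 0.5 is a suggested adjustment, not part of the query above.

```python
from datetime import datetime, timezone

UNIX_EPOCH_JD = 2440587.5  # Julian day of 1970-01-01 00:00 UTC

def ms(year, month, day, hour=0):
    """Unix timestamp in milliseconds for a UTC date and hour."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp() * 1000)

def julian_day(ts_ms):
    """Fractional Julian day for a millisecond timestamp (UTC)."""
    return ts_ms / 86_400_000 + UNIX_EPOCH_JD

def week_index(ts_ms, d):
    """Mirrors CAST(julianday(...) + d AS INT) / 7: days roll over at noon."""
    return int(julian_day(ts_ms) + d) // 7

def week_index_midnight(ts_ms, d):
    """Same idea with + 0.5 added, so that days roll over at midnight."""
    return int(julian_day(ts_ms) + 0.5 + d) // 7

def offset_for(start_weekday):
    """d for week_index_midnight; start_weekday is 0=Monday ... 6=Sunday."""
    return (7 - start_weekday) % 7

sat_morning = ms(2017, 6, 3, 9)
sat_afternoon = ms(2017, 6, 3, 13)
sun_morning = ms(2017, 6, 4, 9)

# With d = 2, Saturday morning and Sunday morning are split as intended...
print(week_index(sat_morning, 2) == week_index(sun_morning, 2))    # False
# ...but Saturday afternoon already lands in the week of Sunday.
print(week_index(sat_afternoon, 2) == week_index(sun_morning, 2))  # True
# With + 0.5 and d = 1 (Sunday start), the split happens at midnight.
print(week_index_midnight(sat_afternoon, offset_for(6))
      == week_index_midnight(sun_morning, offset_for(6)))          # False
```

Under these assumptions the offsets 2,1,0,6,5,4,3 from the question equal (8 - start_weekday) % 7 with Monday as 0, which would be consistent with timestamps that fall before noon. With the + 0.5 adjustment the pattern becomes (7 - start_weekday) % 7, i.e. 1 for Sunday, 0 for Monday, 6 for Tuesday, and so on.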
The whole query:
SELECT
MIN(DATE(CreationDate / 1000, 'unixepoch', 'localtime')) AS from_date,
MAX(DATE(CreationDate / 1000, 'unixepoch', 'localtime')) AS till_date,
SUM(Amount) AS total
FROM Sales
WHERE UserID = x
GROUP BY CAST(JULIANDAY(CreationDate / 1000, 'unixepoch', 'localtime') + d as INT) / 7
ORDER BY SUM(Amount);
from_date and till_date don't always represent the full seven days though, but only the worked days (e.g. in a week from Sunday to Saturday in which only Monday, Wednesday and Friday were worked, it would show the dates for Monday and Friday). It would take slightly more work to show the real week. (I'd better not try this now, for it's so easy to be one day off when I'm not able to try the query.)
EDIT: Here is my try at the start and end days of the weeks. When we invoke DATE on a floating point value, this value is considered a Julian day. (Maybe it would work with the integer, too; I cannot be sure from the documentation.)
SELECT
-- week number * 7 - d is the Julian day on which the group starts (Julian days begin at noon)
DATE(CAST(CAST(JULIANDAY(CreationDate / 1000, 'unixepoch', 'localtime') + d as INT) / 7 * 7 - d as REAL)) AS from_date,
DATE(CAST(CAST(JULIANDAY(CreationDate / 1000, 'unixepoch', 'localtime') + d as INT) / 7 * 7 - d as REAL), '+6 day') AS till_date,
MIN(DATE(CreationDate / 1000, 'unixepoch', 'localtime')) AS first_working_day,
MAX(DATE(CreationDate / 1000, 'unixepoch', 'localtime')) AS last_working_day,
SUM(Amount) AS total
FROM Sales
WHERE UserID = x
GROUP BY CAST(JULIANDAY(CreationDate / 1000, 'unixepoch', 'localtime') + d as INT) / 7
ORDER BY SUM(Amount);
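To try the grouping end to end, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory table. It is an illustration under assumptions, not the exact query above: the sample rows are made up, timestamps are interpreted as UTC (no 'localtime') so the result is reproducible, and + 0.5 is added before the cast so that days are cut at midnight, which makes d = 1 the offset for a Sunday start.

```python
import sqlite3
from datetime import datetime, timezone

def ms(year, month, day, hour=12):
    """Unix timestamp in milliseconds for a UTC date and hour."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp() * 1000)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (UserID INTEGER, Amount REAL, CreationDate INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, 100.0, ms(2017, 5, 28)),     # Sunday
        (1, 50.0, ms(2017, 6, 3, 20)),   # Saturday evening, same work week
        (1, 30.0, ms(2017, 6, 4, 8)),    # Sunday morning, next work week
        (1, 20.0, ms(2017, 6, 7)),       # Wednesday
    ],
)

QUERY = """
SELECT
  DATE(CAST(week_no * 7 - :d AS REAL))            AS from_date,
  DATE(CAST(week_no * 7 - :d AS REAL), '+6 days') AS till_date,
  SUM(Amount)                                     AS total
FROM (
  SELECT Amount,
         CAST(JULIANDAY(CreationDate / 1000, 'unixepoch') + 0.5 + :d AS INT) / 7 AS week_no
  FROM Sales
  WHERE UserID = :user_id
)
GROUP BY week_no
ORDER BY total
"""

# d = 1 corresponds to a work week starting on Sunday (with the + 0.5 variant).
weeks = conn.execute(QUERY, {"d": 1, "user_id": 1}).fetchall()
worst, best = weeks[0], weeks[-1]
print("worst:", worst)
print("best:", best)
```

Because the rows are ordered by total, the first row is the worst week and the last row is the best one.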

Please try executing the query below (it uses Oracle syntax, so it would need adapting for SQLite):
select
min(to_char(to_date(order_date,'mm/dd/yyyy'),'Day'))
keep(dense_rank first order by sum(sales) desc) best_day,
min(to_char(to_date(order_date,'mm/dd/yyyy'),'Day'))
keep(dense_rank last order by sum(sales) desc)worst_day
from orders
where userid=x
group by to_char(to_date(order_date,'mm/dd/yyyy'),'Day');

Related

Working days between two dates in Snowflake

Is there any way to calculate working days between two dates in Snowflake without creating a calendar table, using only the "datediff" function?
After doing research work on snowflake datediff function, I have found the following conclusions.
DATEDIFF(DAY/WEEK, START_DATE, END_DATE) calculates the difference, but END_DATE itself is not counted, so the last date considered is END_DATE - 1.
DATEDIFF(WEEK, START_DATE, END_DATE) counts the number of Sundays between the two dates.
By summarizing these two points, I have implemented the logic below.
SELECT
( DATEDIFF(DAY, START_DATE, DATEADD(DAY, 1, END_DATE))
- DATEDIFF(WEEK, START_DATE, DATEADD(DAY, 1, END_DATE))*2
- (CASE WHEN DAYNAME(START_DATE) != 'Sun' THEN 1 ELSE 0 END)
+ (CASE WHEN DAYNAME(END_DATE) != 'Sat' THEN 1 ELSE 0 END)
) AS WORKING_DAYS
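A quick way to gain confidence in an expression like this is to mirror it outside the database and compare it with a brute-force count. The Python sketch below does that. The function names are hypothetical, and `datediff_week` assumes weeks start on Monday, which is how DATEDIFF(WEEK, ...) is understood to behave under Snowflake's default WEEK_START setting; with a different setting the formula would need rechecking.

```python
from datetime import date, timedelta

def datediff_week(start, end):
    """Number of Monday-based week boundaries crossed between two dates."""
    return (end.toordinal() - 1) // 7 - (start.toordinal() - 1) // 7

def working_days(start, end):
    """Mirror of the DATEDIFF-based expression (start and end inclusive)."""
    end_exclusive = end + timedelta(days=1)
    days = (end_exclusive - start).days
    weeks = datediff_week(start, end_exclusive)
    return (days
            - weeks * 2
            - (1 if start.weekday() != 6 else 0)   # start is not a Sunday
            + (1 if end.weekday() != 5 else 0))    # end is not a Saturday

def working_days_brute(start, end):
    """Reference implementation: count Monday-Friday dates one by one."""
    total = 0
    day = start
    while day <= end:
        if day.weekday() < 5:
            total += 1
        day += timedelta(days=1)
    return total

# Monday 2023-05-01 to Friday 2023-05-05 is one full working week.
print(working_days(date(2023, 5, 1), date(2023, 5, 5)))
```

The two helpers agree for every start day and range length tried here, which suggests the formula holds as long as the week boundary falls between Sunday and Monday.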
Here's an article with a calendar table solution that also includes a UDF to solve this in Snowflake (the business days are hard-coded, so that does require some maintenance, but you don't have to maintain a calendar table at least):
https://medium.com/dandy-engineering-blog/how-to-calculate-the-number-of-working-hours-between-two-timestamps-in-sql-b5696de66e51
The best way to count the number of Sundays between two dates is possibly as follows:
CREATE OR REPLACE FUNCTION SUNDAYS_BETWEEN(a DATE,b DATE)
RETURNS INTEGER
AS $$
FLOOR( (DAYOFWEEKISO(a) + DATEDIFF('days',a,b)) / 7 ,0)
$$
The above is better than using DATEDIFF(WEEK, ...) because the output of that function changes if the WEEK_START session parameter is altered away from the legacy default of 0.
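The arithmetic behind SUNDAYS_BETWEEN can be checked outside Snowflake. This is a small Python sketch of the same formula (the function names are hypothetical); `isoweekday()` plays the role of DAYOFWEEKISO, with Monday as 1 and Sunday as 7, and both endpoints are counted.

```python
from datetime import date, timedelta

def sundays_between(a, b):
    """Sundays in the inclusive range [a, b], assuming a <= b."""
    return (a.isoweekday() + (b - a).days) // 7

def sundays_between_brute(a, b):
    """Reference implementation: walk the range day by day."""
    count = 0
    day = a
    while day <= b:
        if day.isoweekday() == 7:
            count += 1
        day += timedelta(days=1)
    return count

# Monday 2023-05-01 to Sunday 2023-05-14 contains two Sundays (7th and 14th).
print(sundays_between(date(2023, 5, 1), date(2023, 5, 14)))  # 2
```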
I have a way to calculate the number of business hours that elapse between a start time and end time but it only works if you make the following assumptions.
Assume only 1 time zone for all timestamps
Any start or end times that occur outside of business hours should be rounded to nearest business hour time. (I.e. Assuming a schedule of 10:00am - 6:00 pm, timestamps occurring from midnight to 9:59am should be rounded to 10am, times after 6:00pm should be set to the next day at 10:00am)
Timestamps that occur on the weekends should be set to the opening time of the next business day. (In this case Monday at 10:00am)
My model does not account for any holidays.
If these 4 conditions are met then the following code should be enough for a rough estimate of business hours elapsed.
(DATEDIFF(seconds, start_time, end_time) --accounts for the pure number of seconds in between the two dates
- (DATEDIFF(DAY, start_time, end_time) * 16 * 60*60) --For every day between the two dates, we need to subtract out X number of hours, where X is the number of hours not worked in a day (i.e. for a standard 8 hour work day, set X = 16; for a 10 hour day, set X = 14, etc.). We multiply by (60*60*16) to convert days into seconds.
- (DATEDIFF(WEEK, start_time, end_time)*(8*2*60*60)) --This accounts for the fact that weekends are not work days, which is why we need to subtract an additional 8 hours each for Saturday and Sunday.
)/(60*60*8) --We then divide by 60*60*8 to convert the business seconds into business days. We use 8 hours here instead of 24 hours since our "business day" is only 8 hours long.

SQLite - Determine average sales made for each day of week

I am trying to produce a query in SQLite where I can determine the average sales made each weekday in the year.
As an example, I'd like to say
"The average sales for Monday are $400.50 in 2017"
I have a sales table - each row represents a sale you made. You can have multiple sales for the same day. Columns that would be of interest here:
Id, SalesTotal, DayCreated, MonthCreated, YearCreated, CreationDate, PeriodOfTheDay
DayCreated, MonthCreated and YearCreated are integers that represent the day, month and year of the sale. CreationDate is a unix timestamp that represents the date/time it was created (and obviously agrees with day/month/year).
PeriodOfTheDay is 0, or 1 (day, or night). You can have multiple records for a given day (typically you can have at most 2 but some people like to add all of their sales in individually, so you could have 5 or more for a day).
Where I am stuck
Because you can have two records on the same day (i.e. a day sales, and a night sales, or multiple of each) I can't just group by day of the week (i.e. group all records by Saturday).
This is because the number of sales you made does not equal the number of days you worked (i.e. I could have worked 10 Saturdays but had 30 sales, so grouping by 'Saturday' would produce 30 sales, since 30 records exist for Saturday and some just happen to share the same day).
Furthermore, if I group by daycreated,monthcreated,yearcreated it works in the sense it produces x rows (where x is the number of days you worked) however that now means I need to return this resultset to the back end and do a row count. I'd rather do this in the query so I can take the sales and divide it by the number of days you worked.
Would anyone be able to assist?
Thanks!
UPDATE
I think I got it - I would love someone to tell me if I'm right:
SELECT COUNT(DISTINCT CAST(( julianday((datetime(CreationDate / 1000, 'unixepoch', 'localtime'))) ) / 7 AS INT))
FROM Sales
WHERE strftime('%w', datetime(CreationDate / 1000, 'unixepoch'), 'localtime') = '6'
AND YearCreated = 2017
This would produce the number for saturday, and then I'd just put this in as an inner query, dividing the sale total by this number of days.
Buddy,
You can group your query by getting the day of week and week number of day created or creation date.
In MSSQL
DATEPART(WEEK,'2017-08-14') -- Will give you week 33
DATEPART(WEEKDAY,'2017-08-14') -- Will give you day 2
In MYSQL
WEEK('2017-08-14') -- Will give you week 33
DAYOFWEEK('2017-08-14') -- Will give you day 2
See these figures:
Day of Week
1-Sunday, 2-Monday, 3-Tuesday, 4-Wednesday, 5-Thursday, 6-Friday, 7-Saturday
Week Number
1 - 53 (weeks in a year)
Grouping on both values is the key, so that each Saturday in the year is counted separately.
Hope this can help in building your query.
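For SQLite specifically, one way to avoid the back-end row count is to collapse the sales to one row per calendar day first and then average those daily totals per weekday. The sketch below runs that idea with Python's sqlite3 module on made-up rows. The column names follow the question, but the data, the UTC interpretation of CreationDate (no 'localtime'), and the alias names are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timezone

def ms(year, month, day, hour=12):
    """Unix timestamp in milliseconds for a UTC date and hour."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp() * 1000)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SalesTotal REAL, YearCreated INTEGER, CreationDate INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (100.0, 2017, ms(2017, 6, 3, 10)),  # Saturday, day shift
        (50.0, 2017, ms(2017, 6, 3, 20)),   # same Saturday, night shift
        (30.0, 2017, ms(2017, 6, 10)),      # following Saturday
        (40.0, 2017, ms(2017, 6, 5)),       # Monday
    ],
)

QUERY = """
SELECT strftime('%w', sale_day) AS weekday,   -- '0' = Sunday ... '6' = Saturday
       AVG(day_total)           AS avg_per_worked_day
FROM (
  SELECT DATE(CreationDate / 1000, 'unixepoch') AS sale_day,
         SUM(SalesTotal)                        AS day_total
  FROM Sales
  WHERE YearCreated = 2017
  GROUP BY sale_day
)
GROUP BY weekday
"""

averages = dict(conn.execute(QUERY).fetchall())
print(averages)  # Saturday: (150 + 30) / 2 worked Saturdays
```

The inner query yields one row per worked day, so AVG in the outer query divides by the number of days worked, not by the number of sale records.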

How to calculate ages in BigQuery?

I have two TIMESTAMP columns in my table: customer_birthday and purchase_date. I want to create a query to show the number of purchases by customer age, to create a chart.
But how do I calculate ages, in years, using BigQuery? In other words, how do I get the difference in years between two TIMESTAMPs? The age calculation cannot be made using days or hours, because of leap years, so the function DATEDIFF(<timestamp1>,<timestamp2>) is not appropriate.
Thanks.
First of all, I'd really love BigQuery to have a function which calculates current age based on a date. That seems to be like a very common use case and it's not really easy due to the whole leap year thing.
I found a great article about this issue: https://towardsdatascience.com/how-to-accurately-calculate-age-in-bigquery-999a8417e973
Their final approach is similar to Lars Haugseth's and Saad's answer, but they do not use the DAYOFYEAR part in order to avoid issues with leap years. It also gives you the flexibility not only to calculate the current age, but also the age at a particular date that you pass to the function as argument:
CREATE OR REPLACE FUNCTION workspace.age_calculation(as_of_date DATE, date_of_birth DATE)
AS (
DATE_DIFF(as_of_date,date_of_birth, YEAR) -
IF(EXTRACT(MONTH FROM date_of_birth)*100 + EXTRACT(DAY FROM date_of_birth) >
EXTRACT(MONTH FROM as_of_date)*100 + EXTRACT(DAY FROM as_of_date)
,1,0)
)
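The month-and-day comparison in this function is easy to verify away from BigQuery. Here is a minimal Python sketch of the same logic (the function name `age_on` is hypothetical): take the difference in calendar years, then subtract one if the birthday's (month, day) has not yet been reached in the as-of year.

```python
from datetime import date

def age_on(as_of_date, date_of_birth):
    """Age in whole years on as_of_date, comparing (month, day) pairs."""
    years = as_of_date.year - date_of_birth.year
    birthday_still_ahead = (
        (date_of_birth.month, date_of_birth.day) > (as_of_date.month, as_of_date.day)
    )
    return years - (1 if birthday_still_ahead else 0)

print(age_on(date(2022, 12, 14), date(2000, 12, 30)))  # 21, birthday not reached yet
print(age_on(date(2022, 12, 30), date(2000, 12, 30)))  # 22, birthday is today
print(age_on(date(2023, 3, 1), date(2000, 2, 29)))     # 23, leap-day birthday has passed
```

Comparing (month, day) tuples is equivalent to comparing month*100 + day, and neither depends on how many days the year has.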
Regarding the difference between dates - you could consider user-defined functions (https://cloud.google.com/bigquery/user-defined-functions) with a JavaScript date library, such as Datejs or Moment.js
You can use DATE_DIFF to get the difference in years, but you need to subtract one if the birthday has not yet occurred this year:
IF(EXTRACT(DAYOFYEAR FROM CURRENT_DATE) < EXTRACT(DAYOFYEAR FROM birthdate),
DATE_DIFF(CURRENT_DATE, birthdate, YEAR) - 1,
DATE_DIFF(CURRENT_DATE, birthdate, YEAR)) AS age
Here it is in a user defined function:
CREATE TEMP FUNCTION calculateAge(birthdate DATE) AS (
DATE_DIFF(CURRENT_DATE, birthdate, YEAR) +
IF(EXTRACT(DAYOFYEAR FROM CURRENT_DATE) < EXTRACT(DAYOFYEAR FROM birthdate), -1, 0) -- subtract 1 if birthdate has not yet occurred this year
);
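A caveat with comparing day-of-year values: from March onwards, the same calendar date has a day number that is one higher in a leap year than in a normal year. The Python sketch below reproduces the DAYOFYEAR logic (with a hypothetical function name and an explicit `today` argument in place of CURRENT_DATE) to show two dates where the result is off by one.

```python
from datetime import date

def age_by_day_of_year(today, birthdate):
    """Mirror of the DAYOFYEAR-based calculation."""
    years = today.year - birthdate.year
    if today.timetuple().tm_yday < birthdate.timetuple().tm_yday:
        years -= 1
    return years

# Born 2000-03-01 (day 61 of a leap year); on 2023-03-01 (day 60) the person
# turns 23, but the day-of-year comparison still reports 22.
print(age_by_day_of_year(date(2023, 3, 1), date(2000, 3, 1)))   # 22

# Born 2001-03-01 (day 60); on 2024-02-29 (also day 60) the person is still 22,
# but the comparison already reports 23.
print(age_by_day_of_year(date(2024, 2, 29), date(2001, 3, 1)))  # 23
```

Both mismatches only occur on a single day around the birthday, so whether this matters depends on how exact the age needs to be.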
You can compute the number of days it would be if all years were 365 days long, take the difference, and divide by 365. For example:
SELECT (day2-day1)/365
FROM (
SELECT YEAR(t1) * 365 + DAYOFYEAR(t1) as day1,
YEAR(t2) * 365 + DAYOFYEAR(t2) as day2
FROM (
SELECT TIMESTAMP('20000201') as t1,
TIMESTAMP('20140201') as t2))
This returns 14.0, even though there are intervening leap years. If you want the final result as an integer instead of floating point, you can use the INTEGER() function to cast the result.
Note that if one of the dates is a leap day (Feb 29), it will appear to be exactly one year away from March 1, but I think this sounds like the intended behavior.
Another way to calculate age that takes leap years into account is to:
1. Calculate a simple age based on the difference in years.
2. Decide whether to subtract 1 by:
   a. Adding the difference in years to the birthday (e.g. if today is 2022-12-14 and the birthday is 2000-12-30, then the "new" birthday becomes 2022-12-30).
   b. Taking a DAY-based difference between today and the "new" birthday, which gives either a non-negative number (birthday has passed this year) or a negative number (birthday is still to come this year).
   c. Subtracting 1 year from the simple age if that number is negative.
In BigQuery SQL code this looks like:
SELECT
bd AS birthday
,today
,DATE_DIFF(today, bd, YEAR) AS simpleAge
,DATE_DIFF(today, bd, YEAR) +
(CASE
WHEN DATE_DIFF(today, DATE_ADD(bd, INTERVAL DATE_DIFF(today, bd, YEAR) YEAR), DAY) >= 0
THEN 0
ELSE -1
END) AS age
FROM
(SELECT
PARSE_DATE("%Y-%m-%d", "2000-12-30") AS bd
,CURRENT_DATE("Asia/Tokyo") AS today
)
Outputs:
birthday   | today      | simpleAge | age
2000-12-30 | 2022-12-14 | 22        | 21

Group SQL results by week and specify "week-ending" day

I'm trying to select data grouped by week, which I have working, but I need to be able to specify a different day as the last day of the week. I think something needs to go near INTERVAL (6-weekday('datetime')) but not sure. This kind of SQL is above my pay-grade ($0) :P
SELECT
sum(`value`) AS `sum`,
DATE(adddate(`datetime`, INTERVAL (6-weekday(`datetime`)) DAY)) AS `dt`
FROM `values`
WHERE id = '123' AND DATETIME BETWEEN '2010-04-22' AND '2010-10-22'
GROUP BY `dt`
ORDER BY `datetime`
Thanks!
select
sum(value) as sum,
CASE WHEN (weekday(datetime)<=3) THEN date(datetime + INTERVAL (3-weekday(datetime)) DAY)
ELSE date(datetime + INTERVAL (3+7-weekday(datetime)) DAY)
END as dt
FROM `values`
WHERE id = '123' and DATETIME between '2010-04-22' AND '2010-10-22'
GROUP BY dt
ORDER BY datetime
This does look pretty evil, but this query will provide you with a sum of value grouped by a week ending on a Thursday (weekday() returns 3 for Thursday).
If you wish to change which day ends the week, you just need to replace the 3's in the case statement, i.e. if you wanted Tuesday you would have it say
CASE WHEN (weekday(datetime)<=1) THEN date(datetime + INTERVAL (1-weekday(datetime)) DAY)
ELSE date(datetime + INTERVAL (1+7-weekday(datetime)) DAY)
END as dt
I hope this helps.
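The two CASE branches collapse into a single modulo expression, which may make it easier to see what changes when a different week-ending day is chosen. A minimal Python sketch of that arithmetic follows (the function name is hypothetical); `weekday()` uses the same numbering as MySQL's WEEKDAY(), with 0 for Monday through 6 for Sunday.

```python
from datetime import date, timedelta

def week_ending(day, end_weekday=3):
    """First date on or after `day` that falls on end_weekday (default Thursday)."""
    days_ahead = (end_weekday - day.weekday()) % 7
    return day + timedelta(days=days_ahead)

print(week_ending(date(2010, 4, 22)))     # 2010-04-22, a Thursday ends its own week
print(week_ending(date(2010, 4, 23)))     # 2010-04-29, Friday rolls to next Thursday
print(week_ending(date(2010, 4, 23), 1))  # 2010-04-27, week ending on Tuesday
```

Grouping by this value gives one bucket per custom week, labelled with the date the week ends on.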
Here is a simple solution that I like. This will return the date of the start of the week, assuming the week starts on Monday and ends on Sunday.
DATE(`datetime`) - INTERVAL WEEKDAY(`datetime`) DAY AS `dt`
This can easily be adjusted to have a week ending on Thursday, because Thursday is 3 days earlier than Sunday:
DATE(`datetime`) - INTERVAL WEEKDAY(`datetime` + INTERVAL 3 DAY) DAY AS `dt`
This returns the start of a week that starts on Friday and ends on Thursday.
You can group on this no problem. If you want to get the end of the week instead of the start, you do this:
DATE(`datetime`) + INTERVAL (6 - WEEKDAY(`datetime` + INTERVAL 3 DAY)) DAY AS `dt`
I think you must choose between Sunday and Monday? Then you can use DATE_FORMAT to group by a string format of the date, using %v for weeks starting on Monday and %V for weeks starting on Sunday.
SELECT
sum(`value`) AS `sum`,
DATE_FORMAT(`datetime`,'%v.%m.%Y') AS `dt`
FROM `values`
WHERE id = '123' AND DATETIME BETWEEN '2010-04-22' AND '2010-10-22'
GROUP BY DATE_FORMAT(`datetime`,'%v.%m.%Y')
ORDER BY `datetime`
How to use DATE_FORMAT
I don't remember the exact math, but you can get WEEKDAY to wrap around on different days of the week by adding or subtracting days to its argument. You'll need to tinker with different values of x and y in the expression:
x-weekday(adddate(`datetime`, INTERVAL y DAY))

SQL that list all birthdays within the next and previous 14 days

I have a MySQL member table, with a DOB field which stores all members' dates of birth in DATE format (Notice: it has the "Year" part)
I'm trying to find the correct SQL to:
List all birthdays within the next 14 days
and another query to:
List all birthdays within the previous 14 days
Directly comparing the current date by:
(DATEDIFF(DOB, now()) <= 14 and DATEDIFF(DOB, now()) >= 0)
will fetch nothing since the current year and the DOB year is different.
However, transforming the DOB to 'this year' won't work at all, because today could be Jan 1 and the candidate could have a DOB of Dec 31 (or vice versa)
It will be great if you can give a hand to help, many thanks! :)
@Eli had a good response, but hardcoding 351 makes it a little confusing, and it is off by 1 during leap years.
This checks if the birthday (dob) is within the next 14 days. The first check is for birthdays in the same year. The second check covers the year boundary: if it's, say, Dec 27, you'll want to include January dates too.
With DAYOFYEAR( CONCAT(YEAR(NOW()),'-12-31') ), we are deciding whether to use 365 or 366 based on the current year (for leap year).
SELECT dob
FROM birthdays
WHERE DAYOFYEAR(dob) - DAYOFYEAR(NOW()) BETWEEN 0 AND 14
OR
DAYOFYEAR( CONCAT(YEAR(NOW()),'-12-31') ) - ( DAYOFYEAR(NOW()) - DAYOFYEAR(dob) ) BETWEEN 0 AND 14
Here's the simplest code to get the birthdays in the next x days and the previous x days.
This query is also not affected by leap years.
SELECT name, date_of_birth
FROM users
WHERE DATE(CONCAT(YEAR(CURDATE()), RIGHT(date_of_birth, 6)))
BETWEEN
DATE_SUB(CURDATE(), INTERVAL 14 DAY)
AND
DATE_ADD(CURDATE(), INTERVAL 14 DAY)
My first thought was that it would be easy to just use DAYOFYEAR and take the difference, but that actually gets kinda tricky near the start/end of a year. However:
WHERE
DAYOFYEAR(NOW()) - DAYOFYEAR(dob) BETWEEN 0 AND 14
OR DAYOFYEAR(dob) - DAYOFYEAR(NOW()) > 351
Should work, depending on how much you care about leap years. A "better" answer would probably be to extract the DAY() and MONTH() from the dob and use MAKEDATE() to build a date in the current (or potential past/following) year and compare to that.
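The "build a date in the previous, current and following year" idea can be sketched without any day-of-year arithmetic. The Python below is one way to do it, as an illustration and not a translation of any query in this thread. The function names are hypothetical, and mapping a Feb 29 birthday to Feb 28 in non-leap years is an assumption that may not suit every application.

```python
from datetime import date

def birthday_in_year(dob, year):
    """The birthday as it falls in the given year."""
    try:
        return dob.replace(year=year)
    except ValueError:
        # Assumption: a Feb 29 birthday is observed on Feb 28 in non-leap years.
        return date(year, 2, 28)

def days_to_nearest_birthday(dob, today):
    """Signed day count to the closest birthday (negative means it has passed)."""
    candidates = (birthday_in_year(dob, today.year + k) for k in (-1, 0, 1))
    return min(((c - today).days for c in candidates), key=abs)

def is_upcoming(dob, today, window=14):
    return 0 <= days_to_nearest_birthday(dob, today) <= window

def is_recent(dob, today, window=14):
    return -window <= days_to_nearest_birthday(dob, today) < 0

print(is_recent(date(1990, 12, 31), date(2023, 1, 1)))   # True, one day ago
print(is_upcoming(date(1985, 1, 3), date(2023, 12, 27)))  # True, seven days ahead
```

Checking three candidate years is what handles the Jan 1 / Dec 31 case from the question in both directions.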
Easy.
We can obtain the nearest birthday (i.e. this year's birthday) with this code (SQL Server syntax):
dateadd(year,datediff(year,dob,getdate()),DOB)
Use this in your comparisons and it will work.
There are a number of options, I would first try to transform by number of years between current year and row's year (i.e. Add their age).
Another option is day number within the year (but then you have still to worry about the rollover arithmetic or modulo).
This is my query for the 30 days before check:
select id from users where
((TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d')))-TO_DAYS(NOW()))>=-30
AND (TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d')))-TO_DAYS(NOW()))<=0)
OR (TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d')))-TO_DAYS(NOW()))>=(365-31)
and 30 days after:
select id from users where
((TO_DAYS(NOW())-TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d'))))>=-31
AND (TO_DAYS(NOW())-TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d'))))<=0)
OR (TO_DAYS(NOW())-TO_DAYS(concat(DATE_FORMAT(NOW(),'%Y'), '-', DATE_FORMAT(date_of_birth, '%m-%d'))))>=(365-30)
My solution is as follows:
select cm.id from users cm where
date(concat(
year(curdate()) - (year(subdate(curdate(), 14)) < year(curdate())
and month(curdate()) < month(cm.birthday)) + (year(adddate(curdate(), 14)) > year(curdate())
and month(curdate()) > month(cm.birthday)), date_format(cm.birthday, '-%m-%d'))) between subdate(curdate(), 14)
and adddate(curdate(), 14);
It looks like it works fine when the period spans the current and next year, or the current and previous year.