Get the latest full week's data for analysis in SQL - sql

I was given sales data, where I have items and sales on a particular date. Now, the company wants to analyze the latest full week’s data against the total sales of company.
Item | date | Sales
Apple | 08/25/2020 | 10
Orange | 08/24/2020 | 20
Orange | 08/21/2020 | 30
Now the full week is defined as a complete week from Sunday to Saturday. In the made-up example above, the two rows Apple 08/25/2020 10 and Orange 08/24/2020 20 fall on Tuesday and Monday respectively, so that week is not complete and we cannot use its data. We need to look at the previous week's data instead, which here is the row for 08/21/2020.
I was given 10 minutes to think about this. My immediate solution was to find the weekday number of the maximum date in the table and subtract it from 7. If the result is 0, the maximum date is a Saturday, so we have a full week: the max date is the end date of the analysis, and dateadd() gives the start date 6 days earlier. If the result is anything other than 0, I use dateadd() to go back from the max date by its weekday number, which lands on the previous Saturday, and use that as the end date; the start date is again 6 days before it.
CREATE TABLE SALES(Item nvarchar(10), dates date, Sales Numeric);
INSERT INTO SALES VALUES('Apple',CAST('08/25/2020' AS DATE),10),
('Orange',CAST('08/24/2020' AS DATE),20),
('Orange',CAST('08/21/2020' AS DATE),30); -- a CTE's WITH must follow a terminated statement
WITH end_dates AS
(
SELECT CASE WHEN 7-DATEPART(dw, max(dates))=0 THEN max(dates)
ELSE DATEADD(day,- DATEPART(dw, max(dates)),max(dates)) END AS end_date
FROM SALES
),
Full_Week_Date AS
(
SELECT DATEADD(day,-6,end_date) as start_date ,end_date FROM end_dates
)
SELECT (SELECT SUM(SALES.sales)*100 FROM SALES JOIN Full_Week_Date ON(dates BETWEEN start_date AND end_date))/(SELECT SUM(SALES.sales) FROM SALES) AS revenue_per
This is the best I could think of, but the interviewer said, given a large amount of data, this would run like forever. What would be an optimum solution for this problem? I only want to know, how to get start and end date of the week that I want to analyze. Rest the revenue and % revenue will be fairly easy I believe if I have this in place.

In an actual database the query to use will depend on indexes and other things. For a basic answer, there are a few things to consider here.
They state "a complete week from Sunday-Saturday". Unless you are reporting at 11:59 PM on Saturday, you will never really have that full week's sales within the same week. Since that is the case, there is no reason to do all the checks you mentioned. They only cause unnecessary processing.
One thing you didn't mention is whether the total company sales include the week you are checking. I am going to assume they want to exclude that week's sales.
I am not going to claim this is the most efficient way, but I would do it like this.
-- Table definition assumed from the columns used below
CREATE TABLE #Sales (Item nvarchar(10), SalesDate date, SalesAmt numeric);
INSERT INTO #Sales
VALUES
('Apple', '08/25/2020', 10),
('Orange', '08/24/2020', 20),
('Orange', '08/21/2020', 30),
('Apple', '11/14/2020', 25);
-- Get week to check (last week)
DECLARE @curWeek int = DATEPART(WW, DATEADD(wk, -1, GETDATE()));
-- Get Sales
SELECT
SUM(COALESCE(SalesAmt, 0)) AS CompanySales
, (SELECT SUM(COALESCE(SalesAmt, 0)) FROM #Sales WHERE DATEPART(WW, SalesDate) = @curWeek) AS WeekSales
FROM #Sales
WHERE DATEPART(WW, SalesDate) <= @curWeek;
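On the narrower question of how to get the start and end date: the boundaries of the latest full Sunday-Saturday week depend only on the maximum date, so they can be computed once and then used as a plain range filter on the date column, which an index can typically serve better than a function wrapped around the column. Here is a minimal Python sketch of that arithmetic, assuming (as in the question) that a week counts as full once the maximum date is a Saturday. The function name `latest_full_week` is made up for illustration.

```python
from datetime import date, timedelta
from typing import Tuple


def latest_full_week(max_date: date) -> Tuple[date, date]:
    """Return (sunday, saturday) of the most recent complete Sunday-Saturday week."""
    # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6.
    # Days elapsed since the most recent Saturday (0 if max_date is a Saturday).
    days_since_saturday = (max_date.weekday() - 5) % 7
    end = max_date - timedelta(days=days_since_saturday)
    start = end - timedelta(days=6)
    return start, end


# Sample data from the question: the maximum date is 08/25/2020.
start, end = latest_full_week(date(2020, 8, 25))
print(start, end)  # 2020-08-16 2020-08-22
```

The row for 08/21/2020 falls inside that range, which matches the week the question expects to analyze.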

Related

Generate custom start and end of months in SQL Server

I'm facing an issue while working with custom dates in T-SQL. We have a client that uses a different definition of the start and end of a month: instead of starting on day 01 and ending on 31, 30 or 29, its month starts on day 26 and ends on day 25 of the next month.
E.g., the usual way:
select sum(sales) as sales
from sales
where salesDate between '2020-09-01' and '2020-09-30'
-- or left(salesDate,7) = '2020-09'
Client custom month:
select sum(sales) as sales
from sales
where salesDate between '2020-08-26' and '2020-09-25' -- for September
So, for this need, I have to calculate how many sales this client did from january until now, month per month, with this custom way... how can I do that?
Example of the query result I want to perform with this month determination:
This is a pretty awful situation. One method is to construct the first date of the month based on the rules:
select v.mon, sum(s.sales) as sales
from sales s cross apply
     (values (case when day(s.salesdate) >= 26
                   then dateadd(month, 1, datefromparts(year(s.salesdate), month(s.salesdate), 1))
                   else datefromparts(year(s.salesdate), month(s.salesdate), 1)
              end)
     ) v(mon)
where v.mon >= '2020-01-01'
group by v.mon;
I would recommend adding the fiscal month as a persisted computed column in the sales table so you can add an index and not have to worry about the computation.
Or, better yet, add a calendar table where you can look up the fiscal month for any (reasonable) date.
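To make the rule concrete, here is a small Python sketch of the same mapping the CASE expression performs: a date on or after the 26th is reported under the following calendar month. The function name `fiscal_month_start` is made up for illustration, and such a function could also be one way to populate a calendar table.

```python
from datetime import date


def fiscal_month_start(d: date) -> date:
    """First day of the month a date is reported under (months run 26th..25th)."""
    if d.day >= 26:
        # Roll forward to the next calendar month, wrapping December into January.
        if d.month == 12:
            return date(d.year + 1, 1, 1)
        return date(d.year, d.month + 1, 1)
    return date(d.year, d.month, 1)


# The client's "September" runs from 2020-08-26 to 2020-09-25.
print(fiscal_month_start(date(2020, 8, 26)))  # 2020-09-01
print(fiscal_month_start(date(2020, 9, 25)))  # 2020-09-01
print(fiscal_month_start(date(2020, 9, 26)))  # 2020-10-01
```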

Count distinct customers, active within a year, for every week of the year

I am working with an existing E-commerce database. Actually, this process is usually done in Excel, but we want to try it directly with a query in PostgreSQL (version 10.6).
We define as an active customer a person who has bought at least once within 1 year. This means, if I analyze week 22 in 2020, an active customer will be the one that has bought at least once since week 22, 2019.
I want the output for each week of the year (2020). Basically what I need is ...
select
email,
orderdate,
id
from
orders_table
where
paid = true;
|---------------------|-------------------|-----------------|
| email | orderdate | id |
|---------------------|-------------------|-----------------|
| email1@email.com |2020-06-02 05:04:32| Order-2736 |
|---------------------|-------------------|-----------------|
I can't create new tables. And I would like to see the output like this:
Year| Week | Active customers
2020| 25 | 6978
2020| 24 | 3948
Depending on whether there is a year and week column, you can use OVER (PARTITION BY ...) with extract:
SELECT
extract(year from orderdate) AS year,
extract(week from orderdate) AS week,
sum(1) OVER (PARTITION BY extract(year FROM orderdate),
extract(week FROM orderdate)) AS customer_count_in_week
FROM orders_table
WHERE paid = true;
Which should bucket all orders by year and week, thus showing the total count per week in a year where paid is true.
references:
https://www.postgresql.org/docs/9.1/tutorial-window.html
https://www.postgresql.org/docs/8.1/functions-datetime.html
if I analyze week 22 in 2020, an active customer will be the one that has bought at least once since week 22, 2019.
Problems on your side
This method has some corner case ambiguities / issues:
Do you include or exclude "week 22 in 2020"? (I exclude it below to stay closer to "a year".)
A year can have 52 or 53 full weeks. Depending on the current date, the calculation is based on 52 or 53 weeks, causing a possible bias of almost 2 %!
If you start the time range on "the same date last year", then the margin of error is only 1 / 365 or ~ 0.3 %, due to leap years.
A fixed "period of 365 days" (or 366) would eliminate the bias altogether.
Problems on the SQL side
Unfortunately, window functions do not currently allow the DISTINCT key word (for good reasons). So something of the form:
SELECT count(DISTINCT email) OVER (ORDER BY year, week
GROUPS BETWEEN 52 PRECEDING AND 1 PRECEDING)
FROM ...
.. triggers:
ERROR: DISTINCT is not implemented for window functions
The GROUPS keyword has only been added in Postgres 11 and would otherwise be just what we need.
What's more, your odd frame definition wouldn't even work exactly, since the number of weeks to consider is not always 52, as discussed above.
So we have to roll our own.
Solution
The following simply generates all weeks of interest, and computes the distinct count of customers for each. Simple, except that date math is never entirely simple. But, depending on details of your setup, there may be faster solutions. (I had several other ideas.)
The time range for which to report may change. Here is an auxiliary function to generate weeks of a given year:
CREATE OR REPLACE FUNCTION f_weeks_of_year(_year int)
RETURNS TABLE(year int, week int, week_start timestamp)
LANGUAGE sql STABLE STRICT PARALLEL SAFE
ROWS 52 COST 10 AS
$func$
SELECT _year, d.week::int, d.week_start
FROM generate_series(date_trunc('week', make_date(_year, 01, 04)::timestamp) -- first day of first week
, LEAST(date_trunc('week', localtimestamp), make_date(_year, 12, 28)::timestamp) -- latest possible start of week
, interval '1 week') WITH ORDINALITY d(week_start, week)
$func$;
Call:
SELECT * FROM f_weeks_of_year(2020);
It returns 1 row per week, but stops at the current week for the current year. (Empty set for future years.)
The calculation is based on these facts:
The first ISO week of the year always contains January 04.
The last ISO week cannot start after December 28.
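Both facts are easy to verify outside the database. The Python sketch below checks them with the standard library's ISO calendar support; the helper names `iso_week1_start` and `iso_weeks_in_year` are made up for illustration and are not part of the SQL function above.

```python
from datetime import date, timedelta


def iso_week1_start(year: int) -> date:
    """Monday of the ISO week that contains January 4 of the given year."""
    jan4 = date(year, 1, 4)
    return jan4 - timedelta(days=jan4.weekday())  # weekday(): Monday == 0


def iso_weeks_in_year(year: int) -> int:
    """Number of ISO weeks in the year, read off December 28."""
    return date(year, 12, 28).isocalendar()[1]


# January 4 is always in ISO week 1, and December 28 is always in the
# last ISO week of its own year (52 or 53).
for y in range(2000, 2031):
    assert date(y, 1, 4).isocalendar()[:2] == (y, 1)
    assert date(y, 12, 28).isocalendar()[0] == y

print(iso_week1_start(2020), iso_weeks_in_year(2020))  # 2019-12-30 53
```

Note that the first ISO week of 2020 starts in December 2019, which is why the calendar year of a date and its ISO week number do not always belong together.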
Actual week numbers are computed on the fly using WITH ORDINALITY. See:
PostgreSQL unnest() with element number
Aside, I stick to timestamp and avoid timestamptz for this purpose. See:
Generating time series between two dates in PostgreSQL
The function also returns the timestamp of the start of the week (week_start), which we don't need for the problem at hand. But I left it in to make the function more useful in general.
Makes the main query simpler:
WITH weekly_customer AS (
SELECT DISTINCT
EXTRACT(ISOYEAR FROM orderdate)::int AS year -- ISO year, to pair correctly with the ISO week below
, EXTRACT(WEEK FROM orderdate)::int AS week
, email
FROM orders_table
WHERE paid
AND orderdate >= date_trunc('week', timestamp '2019-01-04') -- max range for 2020!
ORDER BY 1, 2, 3 -- optional, might improve performance
)
SELECT d.year, d.week
, (SELECT count(DISTINCT email)
FROM weekly_customer w
WHERE (w.year, w.week) >= (d.year - 1, d.week) -- row values, see below
AND (w.year, w.week) < (d.year , d.week) -- exclude current week
) AS active_customers
FROM f_weeks_of_year(2020) d; -- (year int, week int, week_start timestamp)
The CTE weekly_customer folds to unique customers per calendar week once, as duplicate entries are just noise for our calculation. It's used many times in the main query. The cut-off condition is based on Jan 04 once more. Adjust to your actual reporting period.
The actual count is done with a lowly correlated subquery. Could be a LEFT JOIN LATERAL ... ON true instead. See:
What is the difference between LATERAL and a subquery in PostgreSQL?
Using row value comparison to make the range definition simple. See:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
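For readers less familiar with row value comparison: it orders (year, week) pairs lexicographically, which is exactly how Python compares tuples. The sketch below reproduces the counting logic of the correlated subquery on a few made-up rows. The function name `active_customers` and the sample data are illustrative assumptions only.

```python
from typing import Iterable, Tuple


def active_customers(orders: Iterable[Tuple[int, int, str]], year: int, week: int) -> int:
    """Count distinct emails with an order in [(year - 1, week), (year, week))."""
    lo, hi = (year - 1, week), (year, week)
    # Tuples compare element by element, like SQL row values.
    return len({email for y, w, email in orders if lo <= (y, w) < hi})


# Made-up rows of (year, week, email).
orders = [
    (2019, 21, "a@example.com"),  # one week too old for week 22 of 2020
    (2019, 22, "b@example.com"),  # exactly on the lower bound, included
    (2020, 10, "b@example.com"),  # same customer again, counted once
    (2020, 21, "c@example.com"),
    (2020, 22, "d@example.com"),  # the current week itself is excluded
]

print(active_customers(orders, 2020, 22))  # 2
print(active_customers(orders, 2020, 23))  # 3
```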

SQLite - Determine average sales made for each day of week

I am trying to produce a query in SQLite where I can determine the average sales made each weekday in the year.
As an example, I'd like to say
"The average sales for Monday are $400.50 in 2017"
I have a sales table - each row represents a sale you made. You can have multiple sales for the same day. Columns that would be of interest here:
Id, SalesTotal, DayCreated, MonthCreated, YearCreated, CreationDate, PeriodOfTheDay
DayCreated, MonthCreated and YearCreated are integers that represent the day, month and year of the sale. CreationDate is a unix timestamp that represents the date/time it was created (and obviously matches the day/month/year columns).
PeriodOfTheDay is 0, or 1 (day, or night). You can have multiple records for a given day (typically you can have at most 2 but some people like to add all of their sales in individually, so you could have 5 or more for a day).
Where I am stuck
Because you can have two records on the same day (i.e. a day sales, and a night sales, or multiple of each) I can't just group by day of the week (i.e. group all records by Saturday).
This is because the number of sales you made does not equal the number of days you worked. For example, I could have worked 10 Saturdays but had 30 sales, so grouping by 'Saturday' would produce 30 sales, since 30 records exist for Saturday (some just happen to share the same day).
Furthermore, if I group by daycreated,monthcreated,yearcreated it works in the sense it produces x rows (where x is the number of days you worked) however that now means I need to return this resultset to the back end and do a row count. I'd rather do this in the query so I can take the sales and divide it by the number of days you worked.
Would anyone be able to assist?
Thanks!
UPDATE
I think I got it - I would love someone to tell me if I'm right:
SELECT COUNT(DISTINCT CAST(( julianday((datetime(CreationDate / 1000, 'unixepoch', 'localtime'))) ) / 7 AS INT))
FROM Sales
WHERE strftime('%w', datetime(CreationDate / 1000, 'unixepoch'), 'localtime') = '6'
AND YearCreated = 2017
This would produce the number for saturday, and then I'd just put this in as an inner query, dividing the sale total by this number of days.
Buddy,
You can group your query by getting the day of week and week number of day created or creation date.
In MSSQL
DATEPART(WEEK,'2017-08-14') -- Will give you week 33
DATEPART(WEEKDAY,'2017-08-14') -- Will give you day 2
In MySQL
WEEK('2017-08-14') -- Will give you week 33
DAYOFWEEK('2017-08-14') -- Will give you day 2
See these figures:
Day of Week
1-Sunday, 2-Monday, 3-Tuesday, 4-Wednesday, 5-Thursday, 6-Friday, 7-Saturday
Week Number
1 - 53 Weeks in a year
This pair will be the key, so that you get a separate Saturday for every week.
Hope this can help in building your query.
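Since the question is about SQLite, here is a hedged sketch of one way to get the average in a single query: divide the summed sales per weekday by the number of distinct calendar days worked on that weekday. It runs against an in-memory database through Python's sqlite3 module, builds the date from the YearCreated/MonthCreated/DayCreated columns rather than from the timestamp (an assumption made to sidestep time zone handling), and uses made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Id INTEGER PRIMARY KEY, SalesTotal REAL, "
    "DayCreated INT, MonthCreated INT, YearCreated INT)"
)
rows = [
    (100.0, 7, 1, 2017),  # Saturday, day shift
    (50.0, 7, 1, 2017),   # Saturday, night shift (same calendar day)
    (30.0, 14, 1, 2017),  # the following Saturday
    (80.0, 9, 1, 2017),   # Monday
]
conn.executemany(
    "INSERT INTO Sales (SalesTotal, DayCreated, MonthCreated, YearCreated) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# strftime('%w') yields '0' (Sunday) through '6' (Saturday).
query = """
SELECT strftime('%w', sale_date) AS weekday,
       SUM(SalesTotal) / COUNT(DISTINCT sale_date) AS avg_per_day_worked
FROM (
    SELECT SalesTotal,
           printf('%04d-%02d-%02d', YearCreated, MonthCreated, DayCreated) AS sale_date
    FROM Sales
    WHERE YearCreated = 2017
)
GROUP BY weekday
ORDER BY weekday
"""
averages = dict(conn.execute(query).fetchall())
print(averages)  # {'1': 80.0, '6': 90.0}
```

The two Saturdays carry 180 in total sales over 2 distinct days, so the Saturday average is 90 even though three records exist.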

Sales Grouped by Week of the year

I have a requirement to output the number of sales in the year to date in weekly format, where Monday is the first day of the week and Sunday is the last.
The table structure is as follows.
SalesId | Representative | DateOfSale.
Below is what I have tried but it doesn't seem to give me the correct result. The counts don't seem to add up for a given week. The Sunday results are not included in the correct week. I am thinking it has something to do with the date not including 11:59:59.999 for the last day of the week.
SELECT DATEADD(wk, DATEDIFF(wk, 6, Sales.DateOfSale), 6) as [Week Ending], count(SalesID) as Sales,
count(distinct(representative)) as Agents, count(SalesID) / count(distinct(representative)) as SPA
FROM Sales
where DateOfSale >= DATEADD(yy, DATEDIFF(yy,0,getdate()), 0)
GROUP BY DATEADD(wk, DATEDIFF(wk, 6, Sales.DateOfSale), 6)
ORDER BY DATEADD(wk, DATEDIFF(wk, 6, Sales.DateOfSale), 6)
I am hoping to have something like this:
Week Ending | Sales
01/05/2014 | 5
01/12/2014 | 8
01/19/2014 | 11
01/26/2014 | 14
Please excuse the formatting of the table above. I couldn't seem to figure out how to create a pipe/newline based table using the editor.
~Nick
I suggest creating a table or table parameter that has all of your calendar information. In this case, it would need at minimum the column WeekEnding.
For example
DECLARE @MyCalendar TABLE
(
WeekEnding date
);
Populate this with your valid WeekEnding dates. I might also make parameters to limit the amount of sales data, e.g. @BeginDate and @EndDate.
If you join using "<=" on the week ending date, then I believe you will get the return you want:
SELECT
MyCalendar.WeekEnding,
COUNT(Sales.SalesId) Sales,
COUNT(DISTINCT Sales.Representative) Agents,
CAST(COUNT(Sales.SalesId) AS float) / CAST(COUNT(DISTINCT Sales.Representative) AS float) Spa
FROM
Sales
INNER JOIN
@MyCalendar MyCalendar
ON
Sales.DateOfSale <= MyCalendar.WeekEnding
AND Sales.DateOfSale > DATEADD(day, -7, MyCalendar.WeekEnding) -- keeps each sale in one week only
WHERE
Sales.DateOfSale BETWEEN @BeginDate AND @EndDate
GROUP BY
MyCalendar.WeekEnding;
I am assuming you are using SQL 2012, but I believe this will work in 2008 too. I might point out two other things. First, consider your data type when dividing the COUNT of SalesId by the distinct count of Representative. You may not get the return you expect, and that is why I cast as float. Second, you apply count distinct slightly differently than I do; the extra parentheses are not needed.
I have a simplified version in SQL Fiddle.
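As a cross-check on the bucketing itself, the Python sketch below maps each sale to the Sunday that closes its Monday-Sunday week and counts per bucket. It drops the time of day before bucketing, so a sale late on Sunday still lands in the week ending that Sunday. The function name `week_ending` and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def week_ending(d: date) -> date:
    """Sunday that closes the Monday-Sunday week containing d."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    return d + timedelta(days=6 - d.weekday())


# Made-up sale timestamps around a week boundary.
sales = [
    datetime(2014, 1, 5, 23, 59),   # Sunday night, week ending 01/05/2014
    datetime(2014, 1, 6, 0, 0),     # Monday, week ending 01/12/2014
    datetime(2014, 1, 12, 18, 30),  # Sunday evening, week ending 01/12/2014
]

counts = Counter(week_ending(ts.date()) for ts in sales)
for ending, n in sorted(counts.items()):
    print(ending.strftime("%m/%d/%Y"), n)
```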

T-SQL absence by month from start date end date

I have an interesting query to do and am trying to find the best way to do it. Basically, I have an absence table in our personnel database that records the staff id and then a start date and end date for each absence. The end date is null if not yet entered (not returned). I cannot change the design.
They would like a report by month on number of absences (12 month trend). With staff being off over the month change it obviously may be difficult to calculate.
e.g. Staff off 25/11/08 to 05/12/08 (dd/MM/yy) I would want the days in November to go into the November count and the ones in December in the December count.
I am currently thinking in order to count the number of days I need to separate the start and end date into a record for each day of the absence, assigning it to the month it is in. then group the data for reporting. As for the ones without an end date I would assume null is the current date as they are presently still absent.
What would be the best way to do this?
Any better ways?
Edit: This is SQL 2000 server currently. Hoping for an upgrade soon.
I have had a similar issue where there has been a table of start/end dates designed for data storage but not for reporting.
I sought out the "fastest executing" solution and found that it was to create a 2nd table with the monthly values in there. I populated it with the months from Jan 2000 to Jan 2070. I'm expecting it will suffice or that I get a large pay cheque in 2070 to come and update it...
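CREATE TABLE months (start DATETIME)
-- Populate with all month start dates that may ever be needed
-- And I would recommend indexing / primary keying by start
SELECT
months.start,
data.id,
-- Count only the part of each absence that lies inside the month:
-- from the later of the two starts to the earlier of the two ends
SUM(DATEDIFF(DAY,
CASE WHEN data.start < months.start THEN months.start ELSE data.start END,
CASE WHEN data.[end] > DATEADD(month, 1, months.start)
THEN DATEADD(month, 1, months.start) ELSE data.[end] END
)) AS days
FROM
data
INNER JOIN
months
ON data.start < DATEADD(month, 1, months.start)
AND data.[end] > months.start -- [end] is bracketed because END is a reserved word
GROUP BY
months.start,
data.id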
That join can be quite slow for various reasons, I'll search out another answer to another question to show why and how to optimise the join.
EDIT:
Here is another answer relating to overlapping date ranges and how to speed up the joins...
Query max number of simultaneous events
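To illustrate the splitting that the question describes, here is a small Python sketch that divides one absence into day counts per month. It assumes both the start and end dates count as days off (inclusive) and treats a missing end date as "still absent today", as the question suggests. The function name `absence_days_by_month` is made up for illustration.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple


def absence_days_by_month(
    start: date, end: Optional[date], today: date
) -> Dict[Tuple[int, int], int]:
    """Split an inclusive absence range into day counts keyed by (year, month)."""
    last = end if end is not None else today  # open absences run up to today
    result: Dict[Tuple[int, int], int] = {}
    cur = start
    while cur <= last:
        # First day of the month after cur's month.
        if cur.month == 12:
            next_month = date(cur.year + 1, 1, 1)
        else:
            next_month = date(cur.year, cur.month + 1, 1)
        segment_end = min(last, next_month - timedelta(days=1))
        result[(cur.year, cur.month)] = (segment_end - cur).days + 1
        cur = next_month
    return result


# The example from the question: off from 25/11/08 to 05/12/08.
print(absence_days_by_month(date(2008, 11, 25), date(2008, 12, 5), date(2009, 1, 1)))
# {(2008, 11): 6, (2008, 12): 5}
```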