Table Self Join for Year over Year comparison - sql

I am having trouble self joining a table of data to provide year over year results in a single row.
My Data is currently stored as follows in table sales. I can work with either Postgres or sqlite3.
we(date) | store | category | planu | planrev | merchu | merchrev
by desired outcome is:
I need to be able to show values for LY for 1/7/17 in the last 4 columns.
I will then union the results with those for all other partners replicating the same query for other tables.
It can be assumed that all current year data will be we>1/1/18 to match the date 364 days ago for previous year results.
Through reading other posts I think I may need to craft a CTE query, I just don't know where to start.
I hope this was clear.
Any help in working this out would be greatly appreciated.

It looks like you want to join on identical store and category as well as same month and day of the year. That would look as follows in PostgreSQL:
select
'PartnerA' as channel,
cy.we as date,
cy.month,
cy.year,
cy.store,
'PartnerA ' || cy.store as ch_store,
cy.category,
cy.planu,
cy.planrev,
cy.merchu,
cy.merchrev,
ly.planu as planu_ly,
ly.planrev as planrev_ly,
ly.merchu as merchu_ly,
ly.merchrev as merchrev_ly
from sales cy
join sales ly on cy.store = ly.store and cy.category = ly.category
and cy.we - interval '1 year' = ly.we
;

Related

Performing math on SELECT result rows

I have a table that houses customer balances and I need to be able to see when accounts figures have dropped by a certain percentage over the previous month's balance per account.
My output consists of an account id, year_month combination code, and the month ending balance. So I want to see if February's balance dropped by X% from January's, and if January's dropped by the same % from December. If it did drop then I would like to be able to see what year_month code it dropped in, and yes I could have 1 account with multiple drops and I hope to see that.
Anyone have an ideas on how to perform this within SQL?
EDIT: Adding some sample data as requested. On the table I am looking at I have year_month as a column, but I do have access to get the last business day date per month as well
account_id | year_month | ending balance
1 | 2016-1 | 50000
1 | 2016-2 | 40000
1 | 2016-3 | 25
Output that I would like to see is the year_month code when the ending balance has at least a 50% decline from the previous month.
First I would recommend making Year_Month a yyyy-mm-dd format date for this calculation. Then take the current table and join it to itself, but the date that you join on will be the prior month. Then perform your calculation in the select. So you could do something like this below.
SELECT x.*,
x.EndingBalance - y.EndingBalance
FROM Balances x
INNER JOIN Balances y ON x.AccountID = y.AccountID
and x.YearMonth = DATEADD(month, DATEDIFF(month, 0, x.YearMonth) - 1, 0)

Join to Calendar Table - 5 Business Days

So this is somewhat of a common question on here but I haven't found an answer that really suits my specific needs. I have 2 tables. One has a list of ProjectClosedDates. The other table is a calendar table that goes through like 2025 which has columns for if the row date is a weekend day and also another column for is the date a holiday.
My end goal is to find out based on the ProjectClosedDate, what date is 5 business days post that date. My idea was that I was going to use the Calendar table and join it to itself so I could then insert a column into the calendar table that was 5 Business days away from the row-date. Then I was going to join the Project table to that table based on ProjectClosedDate = RowDate.
If I was just going to check the actual business-date table for one record, I could use this:
SELECT actual_date from
(
SELECT actual_date, ROW_NUMBER() OVER(ORDER BY actual_date) AS Row
FROM DateTable
WHERE is_holiday= 0 and actual_date > '2013-12-01'
ORDER BY actual_date
) X
WHERE row = 65
from here:
sql working days holidays
However, this is just one date and I need a column of dates based off of each row. Any thoughts of what the best way to do this would be? I'm using SQL-Server Management Studio.
Completely untested and not thought through:
If the concept of "business days" is common and important in your system, you could add a column "Business Day Sequence" to your table. The column would be a simple unique sequence, incremented by one for every business day and null for every day not counting as a business day.
The data would look something like this:
Date BDAY_SEQ
========== ========
2014-03-03 1
2014-03-04 2
2014-03-05 3
2014-03-06 4
2014-03-07 5
2014-03-08
2014-03-09
2014-03-10 6
Now it's a simple task to find the N:th business day from any date.
You simply do a self join with the calendar table, adding the offset in the join condition.
select a.actual_date
,b.actual_date as nth_bussines_day
from DateTable a
join DateTable b on(
b.bday_seq = a.bday_seq + 5
);

Is there an established pattern for SQL queries which group by a range?

I've seen a lot of questions on SO concerning how to group data by a range in a SQL query.
The exact scenarios vary, but the general underlying problem in each is to group by a range of values rather than each discrete value in the GROUP BY column. In other words, to group by a less precise granularity than you're storing in the database table.
This crops up often in the real world when producing things like histograms, calendar representations, pivot tables and other bespoke reporting outputs.
Some example data (tables unrelated):
| OrderHistory | | Staff |
--------------------------- ------------------------
| Date | Quantity | | Age | Name |
--------------------------- ------------------------
|01-Jul-2012 | 2 | | 19 | Barry |
|02-Jul-2012 | 5 | | 53 | Nigel |
|08-Jul-2012 | 1 | | 29 | Donna |
|10-Jul-2012 | 3 | | 26 | James |
|14-Jul-2012 | 4 | | 44 | Helen |
|17-Jul-2012 | 2 | | 49 | Wendy |
|28-Jul-2012 | 6 | | 62 | Terry |
--------------------------- ------------------------
Now let's say we want to use the Date column of the OrderHistory table to group by weeks, i.e. 7-day ranges. Or perhaps group the Staff into 10-year age ranges:
| Week | QtyCount | | AgeGroup | NameCount |
-------------------------------- -------------------------
|01-Jul to 07-Jul | 7 | | 10-19 | 1 |
|08-Jul to 14-Jul | 8 | | 20-29 | 2 |
|15-Jul to 21-Jul | 2 | | 30-39 | 0 |
|22-Jul to 28-Jul | 6 | | 40-49 | 2 |
-------------------------------- | 50-59 | 1 |
| 60-69 | 1 |
-------------------------
GROUP BY Date and GROUP BY Age on their own won't do it.
The most common answers I see (none of which are consistently voted "correct") are to use one or more of:
a bunch of CASE statements, one per grouping
a bunch of UNION queries, with a different WHERE clause per grouping
as I'm working with SQL Server, PIVOT() and UNPIVOT()
a two-stage query using a sub-select, temp table or View construct
Is there an established generic pattern for dealing with such queries?
You can use some of the dimensional modeling techniques, such as fact tables and dimension tables. Order History can act as a fact table with DateKey foreign key relation to a Date dimension.
Date dimension can have a schema such as below:
Note that Date table is pre-filled with data up-to N number of years.
Using an example above, here is a sample query to get the result:
select CalendarWeek, sum(Quantity)
from OrderHistory a
join DimDate b
on a.DateKey = b.DateKey
group by CalendarWeek
For Staff table, you can store Birthday Key instead of age and let the query calculate the age and ranges.
Here is SQL Fiddle
Date dimension population script was taken from here.
As is often the case this SQL problem requires using more than one pattern in composition.
In this case the two you can use are
NTILE
Numbers Table
You can use NTITLE to create a set number of groups. However since you don't have each member of the groups represented you also need to use a numbers table Since you're using SQL Server you have it easy as you don't have to simulate either.
Here's an example for the Staff problem
WITH g as (
SELECT
NTILE(6) OVER (ORDER BY number) grp,
NUMBER
FROM
master..spt_values
WHERE
TYPE = 'P'
and number >=10 and number <=69
)
SELECT
CAST(min(g.number) as varchar) + ' - ' +
CAST(max(g.number) as varchar) AgeGroup ,
COUNT(s.age) NameCount
FROM
g
LEFT JOIN Staff s
ON g.NUMBER = s.Age
GROUP BY
grp
DEMO
You can apply this to dates as well it just requires some date to day maniplulation
Take a look at the OVER clause and its associated clauses: PARTITION BY, ROW, RANGE...
Determines the partitioning and ordering of a rowset before the
associated window function is applied. That is, the OVER clause
defines a window or user-specified set of rows within a query result
set. A window function then computes a value for each row in the
window. You can use the OVER clause with functions to compute
aggregated values such as moving averages, cumulative aggregates,
running totals, or a top N per group results.
My favorite case in this genre is where transactions must be grouped by fiscal quarter or fiscal year. The fiscal quarter or fiscal year boundaries of various enterprises can border on the bizarre.
My favorite way to implement this is to create a separate table for the attributes of a date. Let's call the table "Almanac". One of the columns in this table is the fiscal quarter, and another one is the fiscal year. The key to this table is of course the date. Ten years worth of data fill up 3,650 rows, plus a few for leap years. You then need a program that can populate this table from scratch. All the enterprise calendar rules are built into this one program.
When you need to group transaction data by fiscal quarter, you just join with this table over date, and then group by fiscal quarter.
I figure this pattern could be extended to groupings by other kinds of ranges, but I've never done it myself.
In your first example your intervals are regular so you can achieve the desired result simply by using functions. Below is an example that gets the data as you require it. The first query keeps the first column in date format (how I would preferably deal with it doing any formatting outside of SQL), the second does the string conversion for you.
DECLARE #OrderHistory TABLE (Date DATE, Quantity INT)
INSERT #OrderHistory VALUES
('20120701', 2), ('20120702', 5), ('20120708', 1), ('20120710', 3),
('20120714', 4), ('20120717', 2), ('20120728', 6)
SET DATEFIRST 7
SELECT DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date) AS WeekStart,
SUM(Quantity) AS Quantity
FROM #OrderHistory
GROUP BY DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date)
SELECT WeekStart,
SUM(Quantity) AS Quantity
FROM #OrderHistory
CROSS APPLY
( SELECT CONVERT(VARCHAR(6), DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date), 6) + ' to ' +
CONVERT(VARCHAR(6), DATEADD(DAY, 7 - DATEPART(WEEKDAY, Date), Date), 6) AS WeekStart
) ws
GROUP BY WeekStart
Something similar can be done for your age grouping using:
SELECT CAST(FLOOR(Age / 10.0) * 10 AS INT)
However this fails for 30-39 because there is no data for this group.
My stance on the matter would be, if you are doing the query as a one off, using a temp table, cte or case statement should work just fine, this should also extend to reusing the same query on small sets of data.
If you are likely to reuse the group however, or you are referring to significant amounts of data then create a permanent table with the ranges defined and indices applied to any columns required. This is the basis of creating dimensions in OLAP.
Couldn't you treat the age (or date) as a foreign key in a new, tiny table that is just ages (or dates) and their corresponding ranges? A join statement could provide a new table with a column that contains AgeGroups. With the new table you could use the standard group-by method.
It does seem reckless to make a new table for grouping, but it would be easy to make programatically and I think it would be easier to maintain (or drop and recreate) than a case statement or a where clause. If the result of this query is a one-off, a throwaway sql statement would probably work best, but I think my method makes the most sense for long-term use.
Well, some years ago with Oracle DB we did it the following way:
We had two tables: Sessions and Ranges. Ranges had foreign key that referenced Session.
When we needed to perform SQL, we created a new record in Sessions and several new records in Ranges that referred to that session.
Our SQL joined Ranges with filter by Session:
select sum(t.Value), r.Name
from DataTable t
join Ranges r on (r.Session = ? and r.Start t.MyDate)
group by r.Name
After we got results we deleted that record from Sessions and records from Ranges where deleted by cascade.
We had daemon job that purged Sessions from junk records that were leaked in case of extraordinary situation (killed processes, etc).
This worked perfectly. Since that time Oracle added new SQL clauses, and maybe they could be used instead. But on other RDBMSes this is still a valid way.
Another approach is to create a number of functions such as GET_YEAR_BY_DATE or GET_QUARTER_BY_DATE or GET_WEEK_BY_DATE (they would return start date of corresponding
period, for example, for any date return start date of year). And then group by them:
select sum(Value), GET_YEAR_BY_DATE(MyDate) from DataTable
group by GET_YEAR_BY_DATE(MyDate)

Adjust date column for change over time

This is an easy enough problem, but wondering if anyone can provide a more elegant solution.
I've got a table that consists of a date column (month end dates over time) and several value columns--say the price on a variety of stocks over time, one column for each stock. I'd like to calculate the change in value columns for each period represented in the date column (eg, a daily return from a table filled with prices).
My current plan is to join the table to itself and simply create a new column for the return as ret = b.price/a.price - 1. Code as follows:
select Date, Ret = (b.stock1/a.stock1 - 1)
from #temp a, #temp b
where datediff(day, a.Date,b.Date) between 25 and 35
order by a.Date
This works fine, BUT:
(1) I need to do this for, say, dozens of stocks--is there a good way to replicate the calculation without copying and pasting the return calculation and replacing 'stock1' with each other stock name?
(2) Is there a better way to do this join? I'm effectively doing a cross join at this point and only keeping entries that are adjacent (as defined by the datediff and range), but wondering if there's a better way to join a table like this to itself.
EDIT: Per request, data is in the form (my data has multiple price columns though):
Date Price
7/1/1996 349.22
7/31/1996 337.72
8/30/1996 343.70
9/30/1996 357.23
10/31/1996 364.07
11/29/1996 385.04
12/31/1996 383.68
And from that, I'd like to calculate return, to generate a table like this (again, with additional columns for the extra price columns that exist in the actual table):
Date Ret
7/31/1996 -0.03
8/30/1996 0.02
9/30/1996 0.04
10/31/1996 0.02
11/29/1996 0.06
12/31/1996 0.00
I would do the following. First, use the month and year to do the self join. I woudl recommend you take the year * 12 + the month number to get a unique value for each month and year combination. So, Jan of 2011 would have a value of (2011 * 12 + 1 = 24133) and December of 2010 would have a value of (2010 * 12 + 12 = 24132). This will allow you to accurately compare months without having to mess with rolling over from December to January. Next, you need to supply the calculations in the select clause. If you have the stock values in different columns then you will have to type them out as a.stock1-b.stock1, a.stock2-b.stock2, etc. The only way around that would be to massage the data to where there is only one stock value column and add a stockname column that would identify what stock that value is for.
Using the Month and Year for the self join, the following query should work:
select Date, Ret = (b.stock1/a.stock1 - 1)
from #temp a
inner join #temp b on (YEAR(a.Date) * 12) + MONTH(a.Date) = (YEAR(b.Date) * 12) + MONTH(b.Date) + 1
order by a.Date

SQL query with week days

I would like to know what is the best way of creating a report that will be grouped by the last 7 days - but not every day i have data. for example:
08/01/10 | 0
08/02/10 | 5
08/03/10 | 6
08/04/10 | 10
08/05/10 | 0
08/06/10 | 11
08/07/10 | 1
is the only option is to create a dummy table with those days and join them altogether?
thank you
Try something like this
WITH LastDays (calc_date)
AS
(SELECT DATEADD(DAY, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP) - 6, 0)
UNION ALL
SELECT DATEADD(DAY, 1, calc_date)
FROM LastDays
WHERE DATEADD(DAY, 1, calc_date) < CURRENT_TIMESTAMP)
SELECT ...
FROM LastDays l LEFT JOIN (YourQuery) t ON (l.cal_date = t.YourDateColumn);
Many people will suggest methods for dynamically creating a range of dates that you can then join against. This will certainly work but in my experience a calendar table is the way to go. This will make the SQL trivial and generic at the cost of maintaining the calendar table.
At some point in the future someone will come along and ask for another report that excludes weekends. You then have to make your dynamic days generation account for weekends. Then someone will ask for working-days excluding public-holidays at which point you have no choice but to create a calendar table.
I would suggest you bite the bullet and create a calendar table to join against. Pre-populate it with every date and if you want to think ahead then add columns for "Working Day" and maybe even week number if your company uses a non-standard week-number for reporting
You don't mention the specific language (please do for a more detailed answer), but most versions of sql have a function for the current date (GetDate(), for instance). You could take that date, subtract x (7) days and build your WHERE statement like that.
Then you could GROUP BY the day-part of that date.
select the last 7 transactions and left join it with your query and then group by the date column. hope this helps.