Access/SQL Cumulative Query with GroupBy - sql

I need a query that basically improves upon a query I already have. My original post was here: Access Cumulative Total by Date
The existing query is a running sum of the energy capacity online as of the end of each year. I need one that computes the same running sum but broken out by project type, preferably in a crosstab format.
The project capacity and online date are stored in the Projects table. The Alternative Energy Type that I need to group by is in a different table, Project Types, and the two are related through a type ID.
I'm not very good with SQL, so I've been trying to just add in the Project Types table from the Access Query Builder, but just adding another GROUP BY column with the Alternative Energy Type doesn't give the right result.
Original:
SELECT Year(p.[Online Date]) AS yr, (SELECT SUM(p2.[System Size AC])
FROM Projects as p2
WHERE YEAR(p2.[Online Date]) <= YEAR(p.[Online Date])
) AS running_sum
FROM Projects AS p
GROUP BY Year([Online Date]);
Modified (wrong):
SELECT Year(p.[Online Date]) AS yr, (SELECT SUM(p2.[System Size AC])
FROM Projects as p2
WHERE YEAR(p2.[Online Date]) <= YEAR(p.[Online Date])
) AS running_sum, [Project Types].[Alternative Energy Type]
FROM [Project Types] INNER JOIN Projects AS p ON [Project Types].[Type ID] = p.[Project Type]
GROUP BY Year([Online Date]), [Project Types].[Alternative Energy Type];
The results of the modified query just show the total yearly running sum with the Alternative Energy Types next to it. This isn't correct because it shows the same total over and over; nothing is broken out by type.
I need it to be broken out so that it answers the question "How much rooftop solar did we have as of 12/31/2015, 12/31/2016, etc and how much offsite wind did we have as of 12/31/2015, 12/31/2016, etc"

You need to add an equality condition on the grouping field, Alternative Energy Type, to the correlated subquery:
SELECT t.[Alternative Energy Type],
YEAR(p.[Online Date]) AS yr,
(SELECT SUM(subp.[System Size AC])
FROM Projects subp
INNER JOIN [Project Types] subt
ON subt.[Type ID] = subp.[Project Type]
WHERE YEAR(subp.[Online Date]) <= YEAR(p.[Online Date])
AND subt.[Alternative Energy Type] = t.[Alternative Energy Type]
) AS running_sum
FROM Projects p
INNER JOIN [Project Types] t
ON t.[Type ID] = p.[Project Type]
GROUP BY t.[Alternative Energy Type],
YEAR(p.[Online Date]);
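To check the logic of the per-type running sum outside Access, here is a minimal sketch using Python's built-in sqlite3 module. It is an assumption-laden stand-in, not the Access query itself: the table and column names (projects, project_types, size_ac, energy_type) and the sample rows are made up, and SQLite's strftime('%Y', ...) stands in for Access's YEAR().

```python
import sqlite3

# In-memory SQLite stand-in for the Access tables; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_types (type_id INTEGER PRIMARY KEY, energy_type TEXT);
    CREATE TABLE projects (online_date TEXT, size_ac REAL, project_type INTEGER);
    INSERT INTO project_types VALUES (1, 'Rooftop Solar'), (2, 'Offsite Wind');
    INSERT INTO projects VALUES
        ('2015-03-01', 10, 1),
        ('2016-07-15', 5, 1),
        ('2015-09-30', 100, 2),
        ('2017-01-20', 50, 2);
""")

# Same shape as the answer: the correlated subquery is restricted both by year
# and by the energy type of the outer row.
RUNNING_SUM_SQL = """
    SELECT t.energy_type,
           CAST(strftime('%Y', p.online_date) AS INTEGER) AS yr,
           (SELECT SUM(sp.size_ac)
              FROM projects sp
              JOIN project_types st ON st.type_id = sp.project_type
             WHERE strftime('%Y', sp.online_date) <= strftime('%Y', p.online_date)
               AND st.energy_type = t.energy_type) AS running_sum
      FROM projects p
      JOIN project_types t ON t.type_id = p.project_type
     GROUP BY t.energy_type, strftime('%Y', p.online_date)
     ORDER BY t.energy_type, yr
"""

rows = conn.execute(RUNNING_SUM_SQL).fetchall()
for energy_type, yr, running_sum in rows:
    print(energy_type, yr, running_sum)
```

Each type accumulates only its own capacity, which is the behaviour the extra equality condition is meant to produce.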
For the crosstab, first save the query above (here as RunningSumQ) and run a make-table action query on it, since the running sum subquery will cause issues inside a crosstab:
SELECT *
INTO RunningSumTbl
FROM RunningSumQ
Then run the crosstab:
TRANSFORM Sum(r.running_sum) AS SumOfRunningSum
SELECT r.[Alternative Energy Type]
FROM RunningSumTbl r
GROUP BY r.[Alternative Energy Type]
PIVOT r.[yr];
Use PIVOT clause to subset and order columns:
PIVOT r.[yr] IN (2019, 2018, 2017, 2016, 2015)

Related

New to SQL. Would like to convert an IF(COUNTIFS()) Excel formula to SQL code and have SQL calculate it instead of Excel

I am running SQL Server 2008 R2 (RTM).
I have a SQL query that pulls Dates, Products, Customers and Units:
select
[Transaction Date] as Date,
[SKU] as Product,
[Customer Name] as Customer,
sum(Qty) as Units
from dataset
where [Transaction Date] < '2019-03-01' and [Transaction Date] >= '2016-01-01'
group by [Transaction Date], [SKU], [Customer Name]
order by [Transaction Date]
This pulls hundreds of thousands of records and I wanted to determine if a certain transaction was a new order or reorder based on the following logic:
Reorder: That specific Customer has ordered that specific product in the last 6 months
New Order: That specific Customer hasn’t ordered that specific product in the last 6 months
For that I have this formula in Excel that seems to be working:
=IF(COUNTIFS(A$1:A1,">="&DATE(YEAR(A2),MONTH(A2)-6,DAY(A2)),C$1:C1,C2,B$1:B1,B2),"Reorder","New Order")
The formula works when I paste it individually or in a smaller dataset, but when I try to copy paste it to all 500K+ rows, Excel gives up because it loops for each calculation.
This could probably be done in SQL, but I don't know how to convert this Excel formula to SQL; I have only just started studying it.
You're doing pretty well with the start of your query there. There are three additional functions you're looking to add to your query.
The first thing you'll need is the easiest. GETDATE() simply returns the current date. You'll need that when you're comparing the current date to the transaction date.
The second function is DATEDIFF, which will give you a unit of time between two dates (months, days, years, quarters, etc). Using DATEDIFF, you can say "is this date within the last 6 months". The format for this is pretty easy. It's DATEDIFF(interval, date1, date2).
The third function you're looking for is CASE, which allows you to tell SQL to give you one answer if one condition is met, but a different answer if a different condition is met. For your example, you can say "if the difference in months is <= 6, return 'Reorder'; if not, give me 'New Order'".
Putting it all together:
SELECT CASE
WHEN DATEDIFF(MONTH, [Transaction Date], GETDATE()) <= 6
THEN 'Reorder'
ELSE 'New Order'
END as ORDER_TYPE
,[Transaction Date] AS DATE
,[SKU] AS PRODUCT
,[Customer Name] AS CUSTOMER
,Qty AS UNITS
FROM DATASET
For additional examples on CASE, take a look at this site: https://www.w3schools.com/sql/sql_ref_case.asp
For additional examples on DATEDIFF, and a chance to try it out, take a look here: https://www.w3schools.com/sql/func_sqlserver_datediff.asp
SELECT CASE
WHEN Datediff(day, [transaction date], Getdate()) <= 180 THEN 'reorder'
ELSE 'Neworder'
END,
[transaction date] AS Date,
[sku] AS Product,
[customer name] AS Customer,
qty AS Units
FROM dataset
If I understand correctly, you want to peek at the previous date and make a comparison. This suggests lag():
select (case when lag([Transaction Date]) over (partition by SKU, [Customer Name] order by [Transaction Date]) >
dateadd(month, -6, [Transaction Date])
then 'Reorder'
else 'New Order'
end) as Order_Type,
[Transaction Date] as Date,
[SKU] as Product,
[Customer Name] as Customer,
sum(Qty) as Units
from dataset d
group by [Transaction Date], [SKU], [Customer Name];
EDIT:
In SQL Server 2008, you can emulate the LAG() using OUTER APPLY:
select (case when dprev.[Transaction Date] >
dateadd(month, -6, d.[Transaction Date])
then 'Reorder'
else 'New Order'
end) as Order_Type,
d.[Transaction Date] as Date,
d.[SKU] as Product,
d.[Customer Name] as Customer,
sum(d.Qty) as Units
from dataset d outer apply
(select top (1) dprev.*
from dataset dprev
where dprev.SKU = d.SKU and
dprev.[Customer Name] = d.[Customer Name] and
dprev.[Transaction Date] < d.[Transaction Date]
order by dprev.[Transaction Date] desc
) dprev
-- dprev.[Transaction Date] is grouped as well because the CASE expression references it
group by d.[Transaction Date], d.[SKU], d.[Customer Name], dprev.[Transaction Date];
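The "look at the previous order for the same customer and product" idea can be tried on a few rows before running it against 500K+ records. The following is only a sketch in Python with SQLite, not SQL Server: the column names (txn_date, sku, customer, qty) and the sample rows are hypothetical, a correlated MAX() subquery plays the role of the OUTER APPLY ... TOP (1), and SQLite's date(x, '-6 months') stands in for DATEADD(month, -6, x).

```python
import sqlite3

# Hypothetical sample data; dates are ISO strings so they compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset (txn_date TEXT, sku TEXT, customer TEXT, qty INTEGER);
    INSERT INTO dataset VALUES
        ('2016-01-10', 'A1', 'Acme', 5),
        ('2016-03-15', 'A1', 'Acme', 2),
        ('2017-02-01', 'A1', 'Acme', 1),
        ('2016-03-20', 'B2', 'Acme', 7);
""")

# For each row, find the latest earlier order of the same SKU by the same customer.
# If there is none, the subquery yields NULL, the comparison is not true,
# and the row falls through to 'New Order'.
ORDER_TYPE_SQL = """
    SELECT d.txn_date, d.sku, d.customer,
           CASE WHEN (SELECT MAX(prev.txn_date)
                        FROM dataset prev
                       WHERE prev.sku = d.sku
                         AND prev.customer = d.customer
                         AND prev.txn_date < d.txn_date)
                     > date(d.txn_date, '-6 months')
                THEN 'Reorder'
                ELSE 'New Order'
           END AS order_type
      FROM dataset d
     ORDER BY d.txn_date
"""

order_types = [row[3] for row in conn.execute(ORDER_TYPE_SQL)]
print(order_types)
```

In this sample the second A1 order falls within six months of the first and is tagged as a reorder, while the 2017 order comes more than six months after the previous one and counts as new again.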

mssql: add column with the same value for all rows to search results

I have my query:
SELECT [Shipment Date], [Amount] as [Running Costs], Sum([Amount]) OVER
(ORDER BY [Shipment Date]) as [Total Running Costs]
FROM...
This gets me 3 columns:
Shipment Date | Running Costs | Total Running Costs
I would like to add a fourth column to this query which has the same value for all rows, and the same number of rows as my original query results.
I know you can add a constant, for example '999' AS Something, to the search results, but how can I do the same for the sum of another column? For example, imagine the total of a column in another table is 1500, and I want 1500 on every row of the fourth column, something like select sum(column_name).
The database engine is MSSQL.
You can use a nested query:
SELECT [Shipment Date], [Running Costs], [Total Running Costs], SUM([Total Running Costs]) OVER ()
FROM
(
SELECT [Shipment Date], [Amount] as [Running Costs], Sum([Amount]) OVER
(ORDER BY [Shipment Date]) as [Total Running Costs]
FROM...
) AS t
Nested window function should also work
SUM(SUM([Running costs]) OVER (ORDER BY [Shipment Date])) OVER ()
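If the constant really comes from a different table, as the question describes, an uncorrelated scalar subquery in the SELECT list is another option; note this is a different technique from the window-function approach above. The sketch below uses Python with SQLite rather than SQL Server, and the tables (shipments, budget) and their columns are invented purely for illustration.

```python
import sqlite3

# Hypothetical tables: shipments drive the rows, budget supplies the constant.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (shipment_date TEXT, amount REAL);
    CREATE TABLE budget (allocated REAL);
    INSERT INTO shipments VALUES
        ('2019-01-01', 100), ('2019-01-02', 250), ('2019-01-03', 50);
    INSERT INTO budget VALUES (1000), (500);
""")

# The scalar subquery does not reference the outer row, so it yields
# the same value on every row of the result.
rows = conn.execute("""
    SELECT s.shipment_date,
           s.amount,
           (SELECT SUM(b.allocated) FROM budget b) AS budget_total
      FROM shipments s
     ORDER BY s.shipment_date
""").fetchall()

for row in rows:
    print(row)
```

The result keeps the same number of rows as the shipments query, with the other table's total repeated in the extra column.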

SQL Select Same Table Multiple Sums

I am trying to wrap my head round this issue and I am sure the answer exists here a million times but then I am not searching for the right question.
I have a huge sales table [SALES] and I am extracting
SELECT DISTINCT S1.[ORDER ID], S1.SUPPLIER, SUM(S1.[ORDER TOTAL]) AS SUPPLIERTOTAL
FROM [SALES] S1
LEFT JOIN
(
Select s2.[Order ID], S2.[Supplier], S2.[Supplier Colour], SUM(S2.[Order TOTAL]) AS COLOURTOTAL
FROM [SALES]
WHERE [SALES].[SALESDATE] Between '20160101' and '20170101'
) AS s2
ON s1.[Order ID] = s2.[Order ID]
I have thrown this code together as an illustration as I am not by my work PC at present. My issue is that when I do get the re-select to work it produces the correct order value from the first select.
E.g. let's say the manufacturer was Ford and the total value was 100000 over ten orders. It returns the 100000 correctly, but the sub-select appears to take the total value and multiply it by the total number of rows in the table. I am trying to work out what is going on with the data and query but cannot see the issue.
The only factor, if it's of influence, is that the table has no primary key, but as I am providing referential integrity with the join I didn't believe that would matter...
Is anyone able to answer, or has anyone come across this issue?
A bit of guessing here, as the question is not too clear, but I think you are looking for something like this:
SELECT S1.[ORDER ID], S1.SUPPLIER, SUM(S1.[ORDER TOTAL]) AS SUPPLIERTOTAL, SUM(S2.COLOURTOTAL) as COLOURTOTAL
FROM [SALES] S1
LEFT JOIN
(
Select s2.[Order ID], S2.[Supplier], S2.[Supplier Colour], S2.[Order TOTAL] AS COLOURTOTAL
FROM [SALES] S2
WHERE S2.[SALESDATE] Between '20160101' and '20170101'
) AS s2
ON s1.[Order ID] = s2.[Order ID]
GROUP BY S1.[ORDER ID], S1.SUPPLIER

SQL sorting query

I have following table with following relevant columns-
Invoices-
Invoice Number | Invoice Date | Invoice Status
Invoice status can have values : PENDING, SENT, COURIER, LR, CANCELLED, DONE
I want to order the records in this table such that,
Invoices older than 2 days whose status is neither CANCELLED nor DONE should appear first
then
Invoices newer than 2 days whose status is neither CANCELLED nor DONE should appear next
then
All invoices having CANCELLED or DONE should be last on the list
How to achieve this in SQL.
I am using SQL Server, and I am not keen on writing a stored procedure.
Ideally it should be single SQL statement.
What would be the solution for this problem?
You do this using expressions in the order by:
select i.*
from invoices i
order by (case when [Invoice Date] < dateadd(day, -2, getdate()) and
[Invoice Status] not in ('CANCELLED', 'DONE')
then 1
when [Invoice Status] not in ('CANCELLED', 'DONE')
then 2
else 3
end);
A stored procedure is not necessary for this (nor in my opinion is it desirable).
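To see the CASE-based ordering on concrete rows, here is a small sketch in Python with SQLite (not SQL Server). Everything specific in it is an assumption for illustration: the snake_case column names, the sample invoices, a fixed reference date passed in as a parameter instead of GETDATE() so the result is reproducible, and a secondary sort on invoice date that the question did not ask for.

```python
import sqlite3

# Hypothetical invoices; dates are ISO strings so text comparison matches date order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (invoice_number INTEGER, invoice_date TEXT, invoice_status TEXT);
    INSERT INTO invoices VALUES
        (1, '2024-01-09', 'PENDING'),
        (2, '2024-01-01', 'DONE'),
        (3, '2024-01-05', 'SENT'),
        (4, '2024-01-10', 'CANCELLED'),
        (5, '2024-01-02', 'COURIER');
""")

TODAY = "2024-01-10"  # fixed stand-in for the current date

# Bucket 3: closed invoices; bucket 1: open and older than 2 days; bucket 2: other open ones.
SORT_SQL = """
    SELECT invoice_number
      FROM invoices
     ORDER BY CASE
                WHEN invoice_status IN ('CANCELLED', 'DONE') THEN 3
                WHEN invoice_date < date(?, '-2 days') THEN 1
                ELSE 2
              END,
              invoice_date
"""

ordered = [row[0] for row in conn.execute(SORT_SQL, (TODAY,))]
print(ordered)
```

Checking the closed statuses first keeps each WHEN branch to a single condition; the buckets are the same three as in the answer's expression.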
Please try the following...
-- SortGroup makes the required order explicit; UNION on its own does not guarantee row order
SELECT [Invoice Number],
[Invoice Date],
[Invoice Status],
1 AS SortGroup
FROM Invoices
WHERE [Invoice Status] NOT IN ( 'CANCELLED', 'DONE' )
AND DATEADD( DAY, -2, GETDATE() ) > [Invoice Date]
UNION
SELECT [Invoice Number],
[Invoice Date],
[Invoice Status],
2 AS SortGroup
FROM Invoices
WHERE [Invoice Status] NOT IN ( 'CANCELLED', 'DONE' )
AND DATEADD( DAY, -2, GETDATE() ) <= [Invoice Date]
UNION
SELECT [Invoice Number],
[Invoice Date],
[Invoice Status],
3 AS SortGroup
FROM Invoices
WHERE [Invoice Status] IN ( 'CANCELLED', 'DONE' )
ORDER BY SortGroup;
In each of my SELECT statements I have chosen to show all the fields, in the same order. Note that the corresponding fields in the SELECT statements making up a UNION should always be of the same type. In some situations a field of one type may be converted to match another, but this can sometimes lead to trouble, and it is strongly recommended that you avoid such situations.
In my first statement I compare the date two days before the current date to the [Invoice Date], and if it is greater (i.e. the invoice is older than 2 days) then the row is included (subject to its [Invoice Status]).
For the second statement I choose invoices where the date is no more than two days before the current one.
For the third statement I show all Invoices where the status is either CANCELLED or DONE.
You can specify sorts, groupings, etc. of similar or different patterns within each list, but since you have not specified any such in your Question I have not attempted to implement them.
If you have any questions or comments, then please feel free to post a Comment accordingly.

GROUPING A TABLE BASED ON MULTIPLE FIELDS

I have a table with following format
I am using Access and want to create a table that gives me the sum of units where Segment is Commercial. At the same time, I only want to sum rows whose Sub Region is the same (MCA + MCA + MCA, and so on). The period should match as well: 2013Q1 should only be added to 2013Q1, and the result should be grouped in that format.
Something like below
Thanks and Regards
The following query totals the units while grouping on the other fields. To change the range that it totals over, either add a WHERE clause or change the fields listed in the GROUP BY.
SELECT [Sub Region], [Corporate Family], [HP Segment], [Product Category], [Period], Sum([Units])
FROM TableName
GROUP BY [Sub Region], [Corporate Family], [HP Segment], [Product Category], [Period]
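As a rough illustration of combining a WHERE filter with the GROUP BY, here is a sketch in Python with SQLite rather than Access. The table name (sales), the reduced set of columns, and the sample rows are all assumptions, since the original table layout was not included in the post.

```python
import sqlite3

# Hypothetical, simplified version of the table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sub_region TEXT, segment TEXT, period TEXT, units INTEGER);
    INSERT INTO sales VALUES
        ('MCA', 'Commercial', '2013Q1', 10),
        ('MCA', 'Commercial', '2013Q1', 15),
        ('MCA', 'Commercial', '2013Q2', 7),
        ('MCA', 'Consumer',   '2013Q1', 99),
        ('SEA', 'Commercial', '2013Q1', 4);
""")

# WHERE restricts the rows to one segment before grouping;
# GROUP BY then totals units per sub region and period.
totals = conn.execute("""
    SELECT sub_region, period, SUM(units) AS total_units
      FROM sales
     WHERE segment = 'Commercial'
     GROUP BY sub_region, period
     ORDER BY sub_region, period
""").fetchall()

for row in totals:
    print(row)
```

Rows from other segments never reach the SUM, and units are only added together when both the sub region and the period match.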