SQL: Two columns from a single row based on a third column

I have a dataset (call it TableA) that records all data for a single year, in columns by month.
What I want is a single query that returns a beginning and ending balance for the first month on a single row, but that requires pulling in Balance12 from the previous year, similar to this:
I need to be able to do this as a Direct Query in PowerBI, so merging separate queries in PowerQuery won't work.
Are there any suggestions on how to accomplish this?
Thanks in advance!

SELECT CurrentYear.Item, CurrentYear.FiscYr, PreviousYear.Balance12, CurrentYear.Balance1
FROM TableA CurrentYear
LEFT OUTER JOIN TableA PreviousYear ON CurrentYear.Item = PreviousYear.Item AND PreviousYear.FiscYr = CurrentYear.FiscYr - 1

You can use lag():
select item, fiscyr, lag(balance12) over (partition by item order by fiscyr) as beginning,
balance1 as ending
from t;
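To compare the two answers, here is a small sketch that runs both queries on toy data with Python's built-in sqlite3 module. SQLite stands in for the real database (window functions need SQLite 3.25 or later), and the sample rows are invented for illustration. Item A deliberately has no 2021 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Item TEXT, FiscYr INTEGER, Balance1 REAL, Balance12 REAL);
    -- Invented sample rows; item A has a gap (no 2021 row).
    INSERT INTO TableA VALUES
        ('A', 2019, 10, 20),
        ('A', 2020, 25, 40),
        ('A', 2022, 50, 60),
        ('B', 2020,  5,  7);
""")

# Self-join: matches only the row whose FiscYr is exactly one less.
self_join = conn.execute("""
    SELECT cur.Item, cur.FiscYr, prev.Balance12, cur.Balance1
    FROM TableA cur
    LEFT OUTER JOIN TableA prev
        ON cur.Item = prev.Item AND prev.FiscYr = cur.FiscYr - 1
    ORDER BY cur.Item, cur.FiscYr
""").fetchall()

# lag(): takes the previous row per item, whatever its year is.
lagged = conn.execute("""
    SELECT Item, FiscYr,
           LAG(Balance12) OVER (PARTITION BY Item ORDER BY FiscYr) AS beginning,
           Balance1 AS ending
    FROM TableA
    ORDER BY Item, FiscYr
""").fetchall()

for a, b in zip(self_join, lagged):
    print(a, b)
```

The two queries agree when every item has a row for every year. For the 2022 row of item A they differ: the self-join returns NULL as the beginning balance, while lag() picks up Balance12 from 2020, the nearest earlier row. Which behaviour is correct depends on the data.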


Sum Column for Running Total where Overlapping Date

I have a table with about 3 million rows of Customer Sales by Date.
For each CustomerID row I need to get the sum of the Spend_Value
WHERE Order_Date BETWEEN Order_Date_m365 AND Order_Date
Order_Date_m365 = OrderDate minus 365 days.
I just tried a self join but of course, this gave the wrong results due to rows overlapping dates.
If there is a way with Window Functions this would be ideal but I tried and can't do the between dates in the function, unless I missed a way.
The only way I can think of now is to loop: process all rank 1 rows into a table, then rank 2 rows into a table, etc., but this will be really inefficient on 3 million rows.
Any ideas on how this is usually handled in SQL?
SELECT CustomerID,Order_Date_m365,Order_Date,Spend_Value
FROM dbo.CustomerSales
Window functions likely won't help you here, so you are going to need to reference the table again. I would suggest that you use an APPLY with a subquery to do this. Provided you have relevant indexes then this will likely be the more efficient approach:
SELECT CS.CustomerID,
CS.Order_Date_m365,
CS.Order_Date,
CS.Spend_Value,
o.output
FROM dbo.CustomerSales CS
CROSS APPLY (SELECT SUM(Spend_Value) AS output
FROM dbo.CustomerSales ca
WHERE ca.CustomerID = CS.CustomerID
AND ca.Order_Date >= CS.Order_Date_m365 -- Or should this be >?
AND ca.Order_Date <= CS.Order_Date) o;
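The APPLY approach can be tried on a handful of rows before running it on the full table. The sketch below uses Python's sqlite3 module with invented sample data. SQLite has no CROSS APPLY, so a correlated scalar subquery is swapped in; it performs the same per-row lookup. Order_Date_m365 is filled with SQLite's date() function here purely to build the test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerSales (
        CustomerID INTEGER,
        Order_Date TEXT,          -- ISO dates, so string comparison orders correctly
        Order_Date_m365 TEXT,
        Spend_Value REAL
    );
    -- Invented sample rows.
    INSERT INTO CustomerSales (CustomerID, Order_Date, Spend_Value) VALUES
        (1, '2020-01-01', 100),
        (1, '2020-06-01',  50),
        (1, '2021-03-01',  30),
        (2, '2020-02-01',  10);
    UPDATE CustomerSales SET Order_Date_m365 = date(Order_Date, '-365 days');
""")

# Correlated subquery standing in for CROSS APPLY: for each row, sum the
# same customer's spend within the trailing 365-day window (inclusive).
rows = conn.execute("""
    SELECT cs.CustomerID, cs.Order_Date, cs.Spend_Value,
           (SELECT SUM(ca.Spend_Value)
            FROM CustomerSales ca
            WHERE ca.CustomerID = cs.CustomerID
              AND ca.Order_Date >= cs.Order_Date_m365
              AND ca.Order_Date <= cs.Order_Date) AS rolling_spend
    FROM CustomerSales cs
    ORDER BY cs.CustomerID, cs.Order_Date
""").fetchall()

for row in rows:
    print(row)
```

Each source row appears exactly once in the result, which is what a plain self-join without aggregation fails to guarantee. The 2021-03-01 order no longer counts the 2020-01-01 order because it falls outside the window.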

SQL: Getting the latest date using Max() while using group by

I'm struggling to get the correct result with this query:
select max(kts.my_date), kts.name
join ktt on ktt.someId = kts.someOtherId
where ktt.someId = 'example'
group by kts.name;
I have two (possibly stupid) questions:
Will this max() take time into account? I know that order by does if the dates are the same. Does max do the same?
This is connected to my previous question: when I run the query above and the dates are the same, it orders the rows by name. I want the latest date at the top. Do I need to add an order by clause for the date? If so, using max() is pointless, right?
Thanks for the help.
1. Yes, max() compares the full value, so if my_date is a datetime the time portion is taken into account.
2. Give this a try:
select max(kts.my_date) over (partition by kts.name) as maxdate, kts.name
from -- choose your table
join ktt on ktt.someId = kts.someOtherId
where ktt.someId = 'example'
order by -- choose your column here
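On the second question, max() and order by do different jobs: max() picks the latest value within each group, and order by then sorts the groups. A minimal sketch with Python's sqlite3 module, using an invented kts table and leaving out the join to ktt for brevity. SQLite stores these timestamps as ISO text, which sorts the same way a datetime column would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kts (name TEXT, my_date TEXT);
    -- Invented rows: three share the same day and differ only in time.
    INSERT INTO kts VALUES
        ('alpha', '2021-05-01 08:00:00'),
        ('alpha', '2021-05-01 17:30:00'),
        ('beta',  '2021-05-01 23:15:00'),
        ('beta',  '2021-04-30 09:00:00');
""")

# MAX() finds the latest timestamp per name; ORDER BY puts the most
# recent group first.
rows = conn.execute("""
    SELECT name, MAX(my_date) AS latest
    FROM kts
    GROUP BY name
    ORDER BY latest DESC
""").fetchall()

for row in rows:
    print(row)
```

Here beta comes first because its latest timestamp is later in the day than alpha's, even though both fall on the same date.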

Subtracting 2 values from a query and sub-query using CROSS JOIN in SQL

I have a question that I'm having trouble answering.
Find out what is the difference in number of invoices and total of invoiced products between May and June.
One way of doing it is to use sub-queries: one for June and the other one for May, and to subtract the results of the two queries. Since each of the two subqueries will return one row you can (should) use CROSS JOIN, which does not require the "on" clause since you join "all" the rows from one table (i.e. subquery) to all the rows from the other one.
To find the month of a certain date, you can use MONTH function.
Here is the Erwin document
This is what I got so far. I have no idea how to use CROSS JOIN in this situation
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 5
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 6
If you guys can help that would be great.
Thanks
The problem statement suggests you use the following steps:
Construct a query, with a single result row giving the values for June.
Construct a query, with a single result row giving the values for May.
Compare the results of the two queries.
The issue is that, in SQL, it's not super easy to do that third step. One way to do it is by doing a cross join, which yields a row containing all the values from both subqueries; it's then easy to use SELECT (b - a) ... to get the differences you're looking for. This isn't the only way to do the third step, but what you have definitely doesn't work.
Can't you do something with subqueries? I haven't tested this, but something like the below should give you 4 columns: invoices and products for May and June.
select june.june_invoices, june.products as june_products, may.may_invoices, may.products as may_products
from (
select 'stuff' a, count(*) as june_invoices, sum(products) as products from invoices
where month = 'june'
) june , (
select 'stuff' a, count(*) as may_invoices, sum(products) as products from invoices
where month = 'may'
) may
where june.a = may.a
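The CROSS JOIN approach described in the problem statement can be sketched as follows, using Python's sqlite3 module. The schema is hypothetical, since the Erwin document is not reproduced here: Invoice(InvoiceId, inv_date) and InvoiceLine(InvoiceId, ProductId), with "total of invoiced products" taken to mean the number of invoice lines. SQLite has no MONTH() function, so strftime('%m', ...) is used in its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and invented rows, for illustration only.
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, inv_date TEXT);
    CREATE TABLE InvoiceLine (InvoiceId INTEGER, ProductId INTEGER);
    INSERT INTO Invoice VALUES
        (1, '2021-05-03'), (2, '2021-05-20'),
        (3, '2021-06-02'), (4, '2021-06-15'), (5, '2021-06-30');
    INSERT INTO InvoiceLine VALUES
        (1, 10), (1, 11), (2, 10),
        (3, 10), (3, 11), (3, 12), (4, 10), (5, 11), (5, 12);
""")

# Each subquery returns exactly one row, so CROSS JOIN yields one combined
# row and the differences can be computed in the outer SELECT.
row = conn.execute("""
    SELECT june.invoices - may.invoices AS invoice_diff,
           june.products - may.products AS product_diff
    FROM (SELECT COUNT(DISTINCT i.InvoiceId) AS invoices,
                 COUNT(il.ProductId)         AS products
          FROM Invoice i
          LEFT JOIN InvoiceLine il ON il.InvoiceId = i.InvoiceId
          WHERE strftime('%m', i.inv_date) = '06') AS june
    CROSS JOIN
         (SELECT COUNT(DISTINCT i.InvoiceId) AS invoices,
                 COUNT(il.ProductId)         AS products
          FROM Invoice i
          LEFT JOIN InvoiceLine il ON il.InvoiceId = i.InvoiceId
          WHERE strftime('%m', i.inv_date) = '05') AS may
""").fetchone()

print(row)
```

With this sample data May has 2 invoices and 3 lines, June has 3 invoices and 6 lines, so the result row holds the differences 1 and 3. Filtering on month alone mixes years together; a real query would likely filter on the year as well.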

Obtain maximum row_number inside a cross apply

I am having trouble in calculating the maximum of a row_number in my sql case.
I will explain it directly on the SQL Fiddle example, as I think it will be faster to understand: SQL Fiddle
Columns 'OrderNumber', 'HourMinute' and 'Code' are just to represent my table and hence, should not be relevant for coding purposes
Column 'DateOnly' contains the dates
Column 'Phone' contains the phones of my customers
Column 'Purchases' contains the number of times customers have bought in the last 12 months. Note that this value is provided for each date, so the 12 months time period is relative to the date we're evaluating.
Finally, the column I am trying to produce is the 'PREVIOUSPURCHASES' which counts the number of times the figure provided in the column 'Purchases' has appeared in the previous 12 months (for each phone).
You can see in the SQL Fiddle example what I have achieved so far. The column 'PREVIOUSPURCHASES' is producing what I want; however, it is also producing lower values (i.e. only the maximum one is the one I need).
For instance, you can see that rows 4 and 5 are duplicated, one with a 'PREVIOUSPURCHASES' of 1 and the other with 2. I don't want to have the 4th row, in this case.
I have thought about replacing the row_number with something like max(row_number) but I haven't been able to produce it (I have already looked at similar posts on Stack Overflow...).
This should be implemented in SQL Server 2012.
Thanks in advance.
I'm not sure what kind of result set you want to see but is there anything wrong with what's returned with this?
SELECT c.OrderNumber, c.DateOnly, c.HourMinute, c.Code, c.Phone, c.Purchases, MAX(o.PreviousPurchases)
FROM cte c CROSS APPLY (
SELECT t2.DateOnly, t2.Phone,t2.ordernumber, t2.Purchases, ROW_NUMBER() OVER(PARTITION BY c.DateOnly ORDER BY t2.DateOnly) AS PreviousPurchases
FROM CurrentCustomers_v2 t2
WHERE c.Phone = t2.Phone AND t2.purchases<=c.purchases AND DATEDIFF(DAY, t2.DateOnly, c.DateOnly) BETWEEN 0 AND 365
) o
WHERE c.OrderNumber = o.OrderNumber
GROUP BY c.OrderNumber, c.DateOnly, c.HourMinute, c.Code, c.Phone, c.Purchases
ORDER BY c.DateOnly

How do I count data from 2 different tables by date

I have 2 tables with no relations, both tables have different number of columns, but there are a few columns that are the same but hold different data. I was able to create a function or view of only the data I wanted, but when I try to count the data by filtering the date, I always get the wrong count in return. Let me explain by showing the 2 functions and what I try to do:
Function 1
ID - number from 1 to 8
data sent - YES or NO
Date - date value
Function 2
ID - number from 1 to 8
data sent - yes or no
date - date value
Upon running both separately, I get all the rows from the tables and everything looks good.
Then I try to add the following to each function:
select
count([data sent]), ID
from function1
Where (date between #date1 and #date2)
group by ID
The above statement works great and gives me the right result for each function.
Now I thought what if I want to add those 2 functions into one and get the count from both functions on 1 page.
So I created the following function:
Function 3
select
count(Function1.[data sent]) as Expr1,
Function1.id,
count(Function2.[data sent]) as Expr2,
Function1.date
from
Function1
LEFT OUTER JOIN
Function2 on Function1.id = Function2.id
Where
(Function1.date between #date1 and #date2)
group by
Function1.id
Upon running the above, I get the following table:
ID Expr1 Expr2
On both Expr1 and Expr2, I get results which I am not sure where they come from. I guess something is being multiplied by 100000 since one table holds almost 15000 rows and the other around 5000 rows.
What I would like to know first is whether it is possible at all to filter by date and count records from both tables at the same time. If anyone needs more information, please let me know and I will be glad to share and explain more.
Thank you
The LEFT OUTER JOIN is taking each row of the left table, finding ALL of the rows in the right table with the same id field, and creating that many rows in the result table. Since id isn't what we usually think of as an identity field (it looks more like a "deviceId" or something), you'll get lots of matches for each one. Repeat 15000 times and you get your combinatorial explosion.
Tip: To debug things like this, you can create sample tables with a tiny subset of the real data, say 10 rows from each, and run your query on them. You'll see the issue immediately.
It's possible to filter by date. It's hard to recommend an actual solution without better understanding your phrase "I want to add those 2 functions into one and get the count from both functions on 1 page".
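The row multiplication described above is easy to reproduce on a tiny sample. The sketch below uses Python's sqlite3 module with two invented tables named after the functions in the question: three rows in Function1 and two in Function2, all with id 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample rows; both tables share id = 1.
    CREATE TABLE Function1 (id INTEGER, [data sent] TEXT, date TEXT);
    CREATE TABLE Function2 (id INTEGER, [data sent] TEXT, date TEXT);
    INSERT INTO Function1 VALUES
        (1, 'YES', '2021-01-05'), (1, 'NO', '2021-01-06'), (1, 'YES', '2021-01-07');
    INSERT INTO Function2 VALUES
        (1, 'yes', '2021-01-05'), (1, 'no', '2021-01-08');
""")

# Joining on a non-unique id pairs every Function1 row with every
# Function2 row, so both counts become 3 * 2 = 6.
joined = conn.execute("""
    SELECT f1.id, COUNT(f1.[data sent]), COUNT(f2.[data sent])
    FROM Function1 f1
    LEFT OUTER JOIN Function2 f2 ON f1.id = f2.id
    WHERE f1.date BETWEEN '2021-01-01' AND '2021-01-31'
    GROUP BY f1.id
""").fetchall()

# Counting each table on its own gives the expected 3 and 2.
separate = conn.execute("""
    SELECT
        (SELECT COUNT([data sent]) FROM Function1
         WHERE id = 1 AND date BETWEEN '2021-01-01' AND '2021-01-31'),
        (SELECT COUNT([data sent]) FROM Function2
         WHERE id = 1 AND date BETWEEN '2021-01-01' AND '2021-01-31')
""").fetchone()

print(joined, separate)
```

The joined query reports 6 for both counts while the true counts are 3 and 2, which is why each table has to be aggregated before the results are combined.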
Why can't you create a temporary table for each function then join them together?
Maybe subqueries can help you to achieve what you want:
SELECT
ID = COALESCE(f1.ID, f2.ID),
Date = COALESCE(f1.Date, f2.Date),
f1.Expr1,
f2.Expr2
FROM (
SELECT
ID,
Date,
Expr1 = COUNT([data sent])
FROM Function1
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f1
FULL JOIN (
SELECT
ID,
Date,
Expr2 = COUNT([data sent])
FROM Function2
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f2
ON f1.ID = f2.ID AND f1.Date = f2.Date
This query also uses full (outer) join instead of left join, in case the right side of the join contains rows that have no match in the left side (and you want those rows).