Compare 2 subsets of data from table? - sql

I'm not sure if this is possible - I'm having real trouble getting my head around it.
This is for a product schedule, showing how much we are expecting to deliver on a given date. Data is imported into this schedule weekly which creates a new entry.
For example, if the schedule for the day currently totals 10, and you import 15, a new row is inserted with Qty 5, bringing the sum to 15.
The data I have is like so:
Product | Delivery Required Date | Qty
Prod1 | 1/1/13 | 10
Prod1 | 1/1/13 | -10
Prod1 | 1/1/13 | 10
Prod1 | 1/1/13 | -10
Prod1 | 1/1/13 | 25
I want to design a query which shows the variance between the previous schedule, and the current schedule.
For example, the query will sum all of the rows "Qty", excluding the last entry - and compare it to the last entry. In the data above, the variance is 25 (Existing total was 0, latest entry is 25, 0+25 =25).
Is this possible?
Thanks

I suspect there's a better answer using Common Table Expressions, but a quick & ugly solution might be
select sum(case when EntryNo <> (select max(EntryNo) from MyTable) then Qty else 0 end) as sumLessLast
from MyTable
If MyTable has a million rows in it you'll want a better solution.

SqlServer 2005 and 2008:
;with r1 as (
select DeliveryReqDate, sum(Qty) as TotalQty
from TableName
group by DeliveryReqDate)
, r2 as (
select DeliveryReqDate, Qty
, row_number() over (partition by DeliveryReqDate order by EntryNo desc) rn
from TableName)
select r1.DeliveryReqDate, r1.TotalQty, r2.Qty as LastQty
, r1.TotalQty - r2.Qty as TotalButLastQty
from r1
join r2 on r2.DeliveryReqDate = r1.DeliveryReqDate and r2.rn = 1
SqlServer 2012:
;with r1 as (
select DeliveryReqDate, Qty
, sum(Qty) over (partition by DeliveryReqDate) as TotalQty
, row_number() over (partition by DeliveryReqDate order by EntryNo desc) rn
from TableName)
select DeliveryReqDate, TotalQty, Qty as LastQty
, TotalQty - Qty as TotalButLastQty
from r1
where rn = 1
I'm not sure I completely understand the logic for accounting by product and date, but I hope you can adapt the above queries to your needs.
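For anyone who wants to try the idea end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table name schedule and the auto-incrementing EntryNo column are assumptions (the question never shows an entry-order column), and I partition by Product as well as the date, which the queries above do not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EntryNo is an assumed column that records the import order.
conn.execute(
    "CREATE TABLE schedule ("
    "EntryNo INTEGER PRIMARY KEY, Product TEXT, DeliveryReqDate TEXT, Qty INTEGER)"
)
rows = [
    ("Prod1", "2013-01-01", 10),
    ("Prod1", "2013-01-01", -10),
    ("Prod1", "2013-01-01", 10),
    ("Prod1", "2013-01-01", -10),
    ("Prod1", "2013-01-01", 25),
]
conn.executemany(
    "INSERT INTO schedule (Product, DeliveryReqDate, Qty) VALUES (?, ?, ?)", rows
)

query = """
WITH ranked AS (
    SELECT Product, DeliveryReqDate, Qty,
           SUM(Qty) OVER (PARTITION BY Product, DeliveryReqDate) AS TotalQty,
           ROW_NUMBER() OVER (PARTITION BY Product, DeliveryReqDate
                              ORDER BY EntryNo DESC) AS rn
    FROM schedule
)
SELECT Product, DeliveryReqDate,
       TotalQty - Qty AS PreviousTotal,  -- everything except the latest entry
       Qty            AS LastQty,        -- the latest entry, i.e. the variance
       TotalQty       AS CurrentTotal
FROM ranked
WHERE rn = 1
"""
result = conn.execute(query).fetchall()
print(result)
```

With the sample rows from the question, the previous total is 0 and the latest entry is 25, which matches the variance described there.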

Related

Cumulative Sum Query in SQL table with distinct elements

I have a table like this, with column names as Date of Sale and insurance Salesman Names -
Date of Sale | Salesman Name | Sale Amount
2021-03-01 | Jack | 40
2021-03-02 | Mark | 60
2021-03-03 | Sam | 30
2021-03-03 | Mark | 70
2021-03-02 | Sam | 100
I want to do a group by, using the date of sale. The next column should display the cumulative count of the sellers who have made the sale till that date. But same sellers shouldn't be considered again.
For example,
The following table is incorrect,
Date of Sale | Count(Salesman Name) | Sum(Sale Amount)
2021-03-01 | 1 | 40
2021-03-02 | 3 | 200
2021-03-03 | 5 | 300
The following table is correct,
Date of Sale | Count(Salesman Name) | Sum(Sale Amount)
2021-03-01 | 1 | 40
2021-03-02 | 3 | 200
2021-03-03 | 3 | 300
I am not sure how to frame the SQL query, because there are two conditions involved here: a cumulative count, and ignoring duplicates. I think the OVER clause along with ROWS UNBOUNDED PRECEDING may be of some use here? Requesting your help.
Edit - I have added the Sale Amount as a column. I need the cumulative sum of the Sale Amount as well. In this case, though, all the sale amounts should be counted, unlike the salesman names, where only unique names are counted.
One approach uses a self join and aggregation:
WITH cte AS (
SELECT t1.SaleDate,
COUNT(CASE WHEN t2.Salesman IS NULL THEN 1 END) AS cnt,
SUM(t1.SaleAmount) AS amt
FROM yourTable t1
LEFT JOIN yourTable t2
ON t2.Salesman = t1.Salesman AND
t2.SaleDate < t1.SaleDate
GROUP BY t1.SaleDate
)
SELECT
SaleDate,
SUM(cnt) OVER (ORDER BY SaleDate) AS NumSalesman,
SUM(amt) OVER (ORDER BY SaleDate) AS TotalAmount
FROM cte
ORDER BY SaleDate;
The logic in the CTE is that we try to find, for each salesman, an earlier record for the same salesman. If we can't find such a record, then we assume the record in question is the first appearance. Then we aggregate by date to get the counts per day, and finally take a rolling sum of counts in the outer query.
The best way to do this uses window functions to determine the first time a sales person appears. Then, you just want cumulative sums:
select saledate,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by saledate) as num_salespersons,
sum(sum(sales)) over (order by saledate) as running_sales
from (select t.*,
row_number() over (partition by salesperson order by saledate) as seqnum
from t
) t
group by saledate
order by saledate;
Note that in addition to being more concise, this should have much, much better performance than a solution that uses a self-join.
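Here is a small runnable sketch of the first-appearance idea, using Python's sqlite3 module instead of SQL Server. The table name sales and the column names are my own assumptions. I split the per-day aggregation and the running totals into two CTEs, which is a little more verbose than nesting an aggregate inside a window function but should behave the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (SaleDate TEXT, Salesman TEXT, SaleAmount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2021-03-01", "Jack", 40),
        ("2021-03-02", "Mark", 60),
        ("2021-03-03", "Sam", 30),
        ("2021-03-03", "Mark", 70),
        ("2021-03-02", "Sam", 100),
    ],
)

query = """
WITH first_seen AS (
    -- seqnum = 1 marks the first sale of each salesman
    SELECT SaleDate, SaleAmount,
           ROW_NUMBER() OVER (PARTITION BY Salesman ORDER BY SaleDate) AS seqnum
    FROM sales
),
per_day AS (
    -- new salesmen and total amount per day
    SELECT SaleDate,
           SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS new_salesmen,
           SUM(SaleAmount) AS day_amount
    FROM first_seen
    GROUP BY SaleDate
)
SELECT SaleDate,
       SUM(new_salesmen) OVER (ORDER BY SaleDate) AS num_salespersons,
       SUM(day_amount)   OVER (ORDER BY SaleDate) AS running_sales
FROM per_day
ORDER BY SaleDate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this reproduces the table the question marks as correct: 1, 3, 3 salesmen and 40, 200, 300 in running sales.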

Database schema pattern for grouping transactions

I am working on an accounting system in which there is a way to revert transactions which are made by mistake.
There are processes which run on invoices which generate transactions.
One process can generate multiple transactions for an invoice. There can be multiple processes which can be run on an invoice.
The schema looks as under:
Transactions
========================================================
Id | InvoiceId | InvoiceProcessType | Amount | CreatedOn
1 | 1 | 23 | 10.00 | Today
2 | 1 | 23 | 13.00 | Today
3 | 1 | 23 | 17.00 | Yesterday
4 | 1 | 23 | 32.00 | Yesterday
Now 1 and 2 happened together and 3 and 4 happened together and I want to revert the latter (3,4), what would be a possible solution to group them.
One possible solution is to add a column ProcessCount which is incremented on every process.
The new schema would look as under.
Transactions
==============================================================================
Id | InvoiceId | InvoiceProcessType | Amount | CreatedOn | ProcessCount
1 | 1 | 23 | 10.00 | Today | 1
2 | 1 | 23 | 13.00 | Today | 1
3 | 1 | 23 | 17.00 | Yesterday | 2
4 | 1 | 23 | 32.00 | Yesterday | 2
Is there any other way I can implement this ?
TIA
If you are basing the batching on an arbitrary time frame between the createdon date/time values, then you can use lag() and a cumulative sum. For instance, if two rows are in the same batch if they are within an hour, then:
select t.*,
sum(case when prev_createdon > dateadd(hour, -1, createdon) then 0 else 1 end) over
(partition by invoiceid order by createdon, id) as processcount
from (select t.*,
lag(createdon) over (partition by invoiceid order by createdon, id) as prev_createdon
from transactions t
) t;
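To see the lag-plus-cumulative-sum trick in action, here is a sketch in Python with sqlite3. SQLite has no dateadd, so I compare julianday() differences against one hour instead. The concrete timestamps are invented for illustration; the question only says "Today" and "Yesterday".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "Id INTEGER PRIMARY KEY, InvoiceId INTEGER, Amount REAL, CreatedOn TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        # Hypothetical timestamps: rows 1-2 from one run, rows 3-4 from an earlier run.
        (1, 1, 10.00, "2021-01-02 09:00:00"),
        (2, 1, 13.00, "2021-01-02 09:00:05"),
        (3, 1, 17.00, "2021-01-01 14:30:00"),
        (4, 1, 32.00, "2021-01-01 14:30:02"),
    ],
)

query = """
SELECT Id,
       -- a row starts a new batch when it has no predecessor
       -- or the predecessor is an hour or more older
       SUM(CASE WHEN prev_createdon IS NOT NULL
                 AND julianday(CreatedOn) - julianday(prev_createdon) < 1.0 / 24
                THEN 0 ELSE 1 END)
           OVER (PARTITION BY InvoiceId ORDER BY CreatedOn, Id) AS ProcessCount
FROM (SELECT t.*,
             LAG(CreatedOn) OVER (PARTITION BY InvoiceId
                                  ORDER BY CreatedOn, Id) AS prev_createdon
      FROM transactions t) AS t
ORDER BY Id
"""
result = conn.execute(query).fetchall()
print(result)
```

Note that the batches are numbered in chronological order here, so the older pair (3, 4) gets 1 and the newer pair (1, 2) gets 2.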
That said, it would seem that your processing needs to be enhanced. Each time the code runs, a row should be inserted into some table (say processes). The id generated from that insertion should be used to insert into transactions. That way, you can keep the information about when -- and who and what and so on -- inserted particular transactions.
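A rough sketch of that design, again in Python with sqlite3. The column names on processes, the run_process helper and the dates are all hypothetical; the point is only that every transaction carries the id of the run that created it, so reverting a run becomes a lookup by ProcessId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processes (
        Id INTEGER PRIMARY KEY,
        InvoiceId INTEGER NOT NULL,
        InvoiceProcessType INTEGER NOT NULL,
        StartedOn TEXT NOT NULL
    );
    CREATE TABLE transactions (
        Id INTEGER PRIMARY KEY,
        ProcessId INTEGER NOT NULL REFERENCES processes(Id),
        Amount REAL NOT NULL
    );
    """
)


def run_process(invoice_id, process_type, started_on, amounts):
    """Insert one processes row, then tag every transaction with its id."""
    cur = conn.execute(
        "INSERT INTO processes (InvoiceId, InvoiceProcessType, StartedOn) "
        "VALUES (?, ?, ?)",
        (invoice_id, process_type, started_on),
    )
    process_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO transactions (ProcessId, Amount) VALUES (?, ?)",
        [(process_id, amount) for amount in amounts],
    )
    return process_id


# Hypothetical dates standing in for "Yesterday" and "Today".
yesterday_run = run_process(1, 23, "2021-01-01", [17.00, 32.00])
today_run = run_process(1, 23, "2021-01-02", [10.00, 13.00])

# All transactions of the run to revert, found without guessing from timestamps.
to_revert = conn.execute(
    "SELECT Amount FROM transactions WHERE ProcessId = ? ORDER BY Id",
    (yesterday_run,),
).fetchall()
print(to_revert)
```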
You can use dense_rank() to identify each batch as follows:
select t.*,
dense_rank() over (partition by InvoiceId
order by CreatedOn desc) as ProcessCount
from your_table t
You can then revert (or delete) the rows as required. There is no need to maintain the ProcessCount column explicitly, since it can be derived with the query above.

Trying to consolidate and/or CONCAT ItemNumber with NumberOfDaysBetweenOrders

Right now I currently have the NumberOfDaysBetweenPurchases per ItemNumber by AccountNumber. I'm trying to CONCATENATE the ItemNumber|AVERAGENumberOfDaysBetweenPurchases
I have two issues. 1, I cannot find a way to get the average for all of the Days between Orders. 2, I cannot get this to display without Ordering/Grouping by Account Number or Purchase Date. So it always displays by breaking out millions of rows for each time the product was ordered by a customer.
Here is what I'm using to get the NumberOfDaysBetweenPurchases:
datediff(day, lag(OrderDate,1) over (partition by AccountNumber
order by OrderDate), OrderDate) as [NumOfDaysBetweenOrdersByAccountNum]
How do I consolidate the ItemNumbers so that they only show up once along with the Average Number of Days Between Orders by Account Number? I'd like it to show up like this:
ItemNumber | AvgNumberOfDaysBetweenOrdersByAccountNumber
12345 | 6
452234 | 45
5235 | 3
Here's an example of what my current info looks like (millions of rows):
ItemNumber | NumberDays(...) | OrderDate | AccountNumber
123 | 0 | ---- | 101010
123 | 1 | ---- | 101010
123 | 4 | ---- | 101010
123 | 7 | ---- | 101010
123 | 8 | ---- | 101010
From what I can understand of your question, try this.
WITH cte AS (
SELECT ItemNumber, AccountNumber,
datediff(day, lag(OrderDate,1) over (partition by AccountNumber order by OrderDate), OrderDate) as [NumOfDaysBetweenOrdersByAccountNum]
FROM YourTable -- placeholder: the table name was not given in the question
)
SELECT
ItemNumber
, AVG(NumOfDaysBetweenOrdersByAccountNum) AS AvgNumberOfDaysBetweenOrdersByAccountNumber
FROM
cte
GROUP BY
ItemNumber
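If it helps, here is a runnable sketch of the same lag-then-average idea in Python with sqlite3. The table name orders and the sample rows are made up. SQLite has no datediff, so I subtract julianday() values, and I partition by both AccountNumber and ItemNumber on the assumption that the gap should be measured between orders of the same item by the same account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (ItemNumber INTEGER, AccountNumber INTEGER, OrderDate TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        # Invented sample data.
        (123, 101010, "2021-01-01"),
        (123, 101010, "2021-01-02"),  # 1 day after the previous order
        (123, 101010, "2021-01-06"),  # 4 days
        (456, 101010, "2021-01-01"),
        (456, 101010, "2021-01-11"),  # 10 days
        (456, 202020, "2021-02-01"),
        (456, 202020, "2021-02-21"),  # 20 days
    ],
)

query = """
WITH gaps AS (
    SELECT ItemNumber,
           julianday(OrderDate)
           - julianday(LAG(OrderDate) OVER (PARTITION BY AccountNumber, ItemNumber
                                            ORDER BY OrderDate)) AS days_between
    FROM orders
)
-- AVG skips the NULL gap of each account's first order
SELECT ItemNumber, AVG(days_between) AS avg_days
FROM gaps
GROUP BY ItemNumber
ORDER BY ItemNumber
"""
result = conn.execute(query).fetchall()
for item, avg_days in result:
    print(item, avg_days)
```

Each item shows up once, with the average taken over all gaps from all accounts.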

SQL Azure - Create complex Pivot Table

This question is all for SQL Azure. I have a data set for various commodity prices by year and a unit price like:
Rice - 2007 - 0.5
Rice - 2007 - 0.3
Rice - 2007 - 0.8
Wheat - 2006 - 1.1
Wheat - 2006 - 1.4
etc
How can I create a pivot table that gives me the MAX and MIN price paid for each year for each commodity? I know how to do a pivot table that would give me something like the average - that's pretty easy. But I need my "main" pivot column to be the year, and then each year would have its 2 "sub columns" for a MIN and MAX price, and I'm not quite sure how to do that. Help!
Unless I am missing something in your explanation, you can do this easily without the PIVOT function:
select product,
year,
min(price) MinPrice,
max(price) MaxPrice
from yourtable
group by product, year
If you want the data in separate columns, then there are a few ways that you can do this.
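Aggregate function with CASE (because of the else 0, a MinPrice column comes out as 0 whenever the product also has rows for another year; omit the else to get NULL for missing years instead):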
select product,
min(case when year=2006 then price else 0 end) [2006_MinPrice],
max(case when year=2006 then price else 0 end) [2006_MaxPrice],
min(case when year=2007 then price else 0 end) [2007_MinPrice],
max(case when year=2007 then price else 0 end) [2007_MaxPrice]
from yourtable
group by product
UNPIVOT and PIVOT:
The UNPIVOT is used to transform your column data into rows. Once in the rows, you can create the new columns with the year and then pivot:
select *
from
(
select product,
cast(year as varchar(4))+'_'+col as piv_col,
value
from
(
select product,
year,
min(price) MinPrice,
max(price) MaxPrice
from yourtable
group by product, year
) x
unpivot
(
value for col in (minPrice, maxPrice)
) u
) d
pivot
(
max(value)
for piv_col in ([2006_MinPrice], [2006_MaxPrice],
[2007_MinPrice], [2007_MaxPrice])
) piv;
These give the result:
| PRODUCT | 2006_MINPRICE | 2006_MAXPRICE | 2007_MINPRICE | 2007_MAXPRICE |
---------------------------------------------------------------------------
| Rice | 0 | 0 | 0.3 | 0.8 |
| Wheat | 1.1 | 1.4 | 0 | 0 |
If you have an unknown number of years, then you could also implement dynamic SQL.
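As a rough illustration of the dynamic approach, the sketch below builds the column list from whatever years are in the table and then runs the generated statement. It uses Python with sqlite3 rather than T-SQL, so treat it as a sketch of the idea, not the SQL Server syntax. The table name prices is an assumption, and the CASE expressions have no ELSE, so missing years come back as NULL (None in Python) instead of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (product TEXT, year INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("Rice", 2007, 0.5),
        ("Rice", 2007, 0.3),
        ("Rice", 2007, 0.8),
        ("Wheat", 2006, 1.1),
        ("Wheat", 2006, 1.4),
    ],
)

# Step 1: find out which years exist.
years = [row[0] for row in conn.execute("SELECT DISTINCT year FROM prices ORDER BY year")]

# Step 2: generate one MIN and one MAX column per year.
columns = []
for year in years:
    y = int(year)  # int() ensures only a number is spliced into the SQL text
    columns.append(f'MIN(CASE WHEN year = {y} THEN price END) AS "{y}_MinPrice"')
    columns.append(f'MAX(CASE WHEN year = {y} THEN price END) AS "{y}_MaxPrice"')

# Step 3: assemble and run the generated statement.
query = (
    "SELECT product, " + ", ".join(columns)
    + " FROM prices GROUP BY product ORDER BY product"
)
cursor = conn.execute(query)
header = [d[0] for d in cursor.description]
result = cursor.fetchall()
print(header)
for row in result:
    print(row)
```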

How to find N Consecutive records in a table using SQL

I have the following Table definition with sample data. In the following table, Customer Product & Date are key fields
Table One
Customer Product Date SALE
X A 01/01/2010 YES
X A 02/01/2010 YES
X A 03/01/2010 NO
X A 04/01/2010 NO
X A 05/01/2010 YES
X A 06/01/2010 NO
X A 07/01/2010 NO
X A 08/01/2010 NO
X A 09/01/2010 YES
X A 10/01/2010 YES
X A 11/01/2010 NO
X A 12/01/2010 YES
In the above table, I need to find runs of N or more consecutive records where there was no sale, i.e. the Sale value was 'NO'.
For example, if N is 2, the result set would return the following:
Customer Product Date SALE
X A 03/01/2010 NO
X A 04/01/2010 NO
X A 06/01/2010 NO
X A 07/01/2010 NO
X A 08/01/2010 NO
Can someone help me with a SQL query to get the desired results. I am using SQL Server 2005. I started playing using ROW_NUMBER() AND PARTITION clauses but no luck.
Thanks for any help
You need to match your table against itself, as if there were 2 tables. So you use two aliases, o1 and o2, to refer to your table:
SELECT DISTINCT o1.customer, o1.product, o1.datum, o1.sale
FROM one o1, one o2
WHERE (o1.datum = o2.datum-1 OR o1.datum = o2.datum +1)
AND o1.sale = 'NO'
AND o2.sale = 'NO';
customer | product | datum | sale
----------+---------+------------+------
X | A | 2010-01-03 | NO
X | A | 2010-01-04 | NO
X | A | 2010-01-06 | NO
X | A | 2010-01-07 | NO
X | A | 2010-01-08 | NO
Note that I ran the query on a PostgreSQL database. The syntax may differ on MS SQL Server, perhaps at the alias ('FROM one AS o1'), and maybe you cannot add/subtract dates in that way.
A different approach, inspired by munchs last line.
Get - for a given date the first date with YES later than that, and the last date with YES earlier than that. These form the boundary, where our dates shall fit in.
SELECT (o1.datum),
MAX (o3.datum) - MIN (o2.datum) AS diff
FROM one o1, one o2, one o3
WHERE o1.sale = 'NO'
AND o3.datum <
(SELECT MIN (datum)
FROM one
WHERE datum >= o1.datum
AND SALE = 'YES')
AND o2.datum >
(SELECT MAX (datum)
FROM one
WHERE datum <= o1.datum
AND SALE = 'YES')
GROUP BY o1.datum
HAVING MAX (o3.datum) - MIN (o2.datum) >= 2
ORDER BY o1.datum;
Maybe it needs some kind of optimization, because table one is 5 times involved in the query. :)
Thanks to everyone for posting your solutions. I thought I would also share my solution with everyone. Just as an FYI, I received this solution from another SQL Server Central forum member, so I am definitely not going to take credit for it.
DECLARE @CNT INT
SELECT @CNT = 3
SELECT * FROM
(
SELECT
[Customer], [Product], [Date], [Sale], groupID,
COUNT(*) OVER (PARTITION BY [Customer], [Product], [Sale], groupID) AS groupCnt
FROM
(
SELECT
[Customer], [Product], [Date], [Sale],
ROW_NUMBER() OVER (PARTITION BY [Customer], [Product] ORDER BY [Date])
- ROW_NUMBER() OVER (PARTITION BY [Customer], [Product], [Sale] ORDER BY [Date]) AS groupID
FROM
[TableSales]
) T1
) T2
WHERE
T2.[Sale] = 'NO' AND T2.[groupCnt] >= @CNT
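For anyone wondering why the difference of the two ROW_NUMBER() values works: within an unbroken run of the same Sale value both counters advance together, so their difference stays constant and can serve as a group id. Below is a small sketch in Python with sqlite3 that runs the same query shape on the question's sample data. I renamed the date column to SaleDate and assumed the dates are consecutive days in January 2010.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableSales (Customer TEXT, Product TEXT, SaleDate TEXT, Sale TEXT)"
)
flags = ["YES", "YES", "NO", "NO", "YES", "NO", "NO", "NO", "YES", "YES", "NO", "YES"]
conn.executemany(
    "INSERT INTO TableSales VALUES (?, ?, ?, ?)",
    [("X", "A", f"2010-01-{day:02d}", flag) for day, flag in enumerate(flags, start=1)],
)

query = """
SELECT SaleDate, Sale
FROM (
    SELECT Customer, Product, SaleDate, Sale,
           -- size of each run of identical Sale values
           COUNT(*) OVER (PARTITION BY Customer, Product, Sale, groupID) AS groupCnt
    FROM (
        SELECT Customer, Product, SaleDate, Sale,
               -- constant within a run, different between runs of the same Sale value
               ROW_NUMBER() OVER (PARTITION BY Customer, Product ORDER BY SaleDate)
             - ROW_NUMBER() OVER (PARTITION BY Customer, Product, Sale
                                  ORDER BY SaleDate) AS groupID
        FROM TableSales
    ) AS T1
) AS T2
WHERE Sale = 'NO' AND groupCnt >= ?
ORDER BY SaleDate
"""
n = 2  # minimum run length
result = conn.execute(query, (n,)).fetchall()
for row in result:
    print(row)
```

With n = 2 this returns the five rows the question asks for (the 3rd, 4th, 6th, 7th and 8th); the lone 'NO' on the 11th is dropped because its run has length 1.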
Ok, we need a variable answer. We search for a date, where we have N following dates, all with the sale-field being NO.
SELECT d1.datum
FROM one d1, one d2, i
WHERE d1.sale = 'NO' AND d2.sale = 'NO'
AND d1.datum = (d2.datum - i)
AND i > 0 AND i < 4
GROUP BY d1.datum
HAVING COUNT (*) = 3;
This will give us the date, which we use for subquerying.
Notes:
I used 'datum' instead of date, because date is a reserved keyword on postgresql.
In Oracle you can use a virtual dummy table, which contains anything you ask for, like 'SELECT foo FROM dual WHERE foo in (1, 2, 3);' which will give you 1, 2, 3, if I remember correctly. Depending on the vendor, there might be other tricks to get a sequence 1 to N. I created a table i with column i, and filled it with the values 1 to 100, and I expect N not to exceed 100. For a few versions now, postgresql has had a function 'generate_series (from, to)' which would solve the problem too, and might have similarities with solutions for your specific database. But table i should work vendor-independently.
If N == 17, you have to modify 3 places from 3 to 17.
The final query will be:
SELECT o4.*
FROM one o3, one o4
WHERE o3.datum = (
SELECT d1.datum
FROM one d1, one d2, i
WHERE d1.sale = 'NO' AND d2.sale = 'NO'
AND d1.datum = (d2.datum - i)
AND i > 0 AND i <= 3
GROUP BY d1.datum
HAVING COUNT (*) = 3)
AND o4.datum <= o3.datum + 3
AND o4.datum >= o3.datum;
customer | product | datum | sale
----------+---------+------------+------
X | A | 2010-02-06 | NO
X | A | 2010-02-07 | NO
X | A | 2010-02-08 | NO
X | A | 2010-02-09 | NO