Update records and set values - sql

I have a table called Transaction which contains some columns: [TransactionID, Type(credit or debit), Amount, Cashout, CreditPaid, EndTime]
Customers can buy on credit, and these transactions are stored in the Transaction table. If a customer pays an amount at the end of the month that covers some or all of the credit transactions, I want those transactions to be updated accordingly.
For example, a customer pays in 300. If the transaction 'Amount' is 300 and 'Type' is credit, then the 'CreditPaid' amount should be 300. (This is a simple update statement) but...
If there are two credit transactions, say one of 300 and another of 400, and the monthly payment is 600, then the oldest transaction should be paid its full 300, and the next should receive the remaining 300, leaving 100 outstanding.
Any ideas how to do this?
TrID Buyin Type Cashout CustID StartTime EndTime AddedBy CreditPaid
72 200 Credit 0 132 2013-05-21 NULL NULL NULL
73 300 Credit 0 132 2013-05-22 NULL NULL NULL
75 400 Credit 0 132 2013-05-23 NULL NULL NULL
Desired Results after customer pays 600
TrID Buyin Type Cashout CustID StartTime EndTime AddedBy CreditPaid
72 200 Credit 0 132 2013-05-21 2013-05-24 NULL 200
73 300 Credit 0 132 2013-05-22 2013-05-24 NULL 300
75 400 Credit 0 132 2013-05-23 NULL NULL 100
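For reference, here is a minimal sketch of the table the examples assume (column types are guesses from the sample data, not the actual DDL):
CREATE TABLE dbo.Trans (
   TrID int IDENTITY(1,1) PRIMARY KEY,
   Buyin decimal(11, 2) NOT NULL,             -- transaction amount
   Type varchar(10) NOT NULL,                 -- 'Credit' or 'Debit'
   Cashout decimal(11, 2) NOT NULL DEFAULT 0,
   CustID int NOT NULL,
   StartTime datetime NOT NULL,
   EndTime datetime NULL,                     -- set when the credit is fully paid
   AddedBy int NULL,
   CreditPaid decimal(11, 2) NULL             -- amount repaid so far
);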

Here's a SQL 2008 version:
CREATE PROCEDURE dbo.PaymentApply
   @CustID int,
   @Amount decimal(11, 2),
   @AsOfDate datetime
AS
WITH Totals AS (
   SELECT
      T.*,
      RunningTotal =
         Coalesce(
            (SELECT Sum(S.Buyin - Coalesce(S.CreditPaid, 0))
             FROM dbo.Trans S
             WHERE
                T.CustID = S.CustID
                AND S.Type = 'Credit'
                AND S.Buyin > Coalesce(S.CreditPaid, 0)
                AND (
                   T.StartTime > S.StartTime
                   OR (
                      T.StartTime = S.StartTime
                      AND T.TrID > S.TrID
                   )
                )
            ),
            0
         )
   FROM
      dbo.Trans T
   WHERE
      T.CustID = @CustID
      AND T.Type = 'Credit'
      AND T.Buyin > Coalesce(T.CreditPaid, 0)
)
UPDATE T
SET
   T.EndTime = P.EndTime,
   T.CreditPaid = Coalesce(T.CreditPaid, 0) + P.CreditPaid
FROM
   Totals T
   CROSS APPLY (
      SELECT TOP 1
         V.*
      FROM
         (VALUES
            (T.Buyin - Coalesce(T.CreditPaid, 0), @AsOfDate),
            (@Amount - T.RunningTotal, NULL)
         ) V (CreditPaid, EndTime)
      ORDER BY
         V.CreditPaid,
         V.EndTime DESC
   ) P
WHERE
   T.RunningTotal < @Amount
   AND @Amount > 0;
See a Live Demo at SQL Fiddle
Or, for anyone using SQL 2012, you can replace the contents of the CTE with a better-performing and simpler query using the new windowing functions:
SELECT
   *,
   RunningTotal =
      Sum(Buyin - Coalesce(CreditPaid, 0)) OVER(
         ORDER BY StartTime
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      ) - (Buyin - Coalesce(CreditPaid, 0))
FROM dbo.Trans
WHERE
   CustID = @CustID
   AND Type = 'Credit'
   AND Buyin - Coalesce(CreditPaid, 0) > 0
See a Live Demo at SQL Fiddle
Here's how they work:
We calculate the running total for all the prior rows where the CreditPaid amount is less than the Buyin amount. Note this does NOT include the current row.
From this we can determine what portion of the payment will apply to each row, and which rows will be involved in the payment. If the credits for all the prior rows already consume the whole payment, then this row will NOT be included, thus T.RunningTotal < @Amount. That's because the prior rows will have fully consumed the payment by this point, so we can stop applying it.
For each row where we will apply a payment, we want to pay as much as possible, but we have to pay attention to the last row, where we may not be paying the full amount (as is the case with the third credit in the example). So we'll be paying one of two amounts: the full credit amount (when there are more rows to receive payments) or only the portion left over, which could be less than the full credit for that row (and this is the last row). We accomplish this by taking the lesser of either 1) the full remaining Buyin - CreditPaid amount, or 2) what's left of the payment, @Amount - RunningTotal. I could have done this as a CASE expression, but I like taking the Min this way (TOP 1 over a VALUES list), especially because we would have needed two CASE expressions to also determine whether to update the EndTime column (per your requirements).
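To see that Min-via-TOP 1 trick in isolation, here is a tiny standalone sketch (the literal values are made up):
-- Picks the smaller of two candidate payments, carrying a different
-- EndTime along with each candidate (NULL for the partial-payment case).
SELECT TOP 1 V.*
FROM (VALUES
   (300, CAST('2013-05-24' AS datetime)), -- full remaining credit: row would be closed
   (100, NULL)                            -- leftover payment only: row stays open
) V (CreditPaid, EndTime)
ORDER BY V.CreditPaid, V.EndTime DESC;
-- Returns (100, NULL): the lesser amount wins, and its EndTime comes with it.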
The SQL 2012 version simply calculates the same thing as the 2008 version: the sum of Buyin - CreditPaid for all the prior rows, using a windowing function instead of a correlated subquery.
Finally, we perform the update to all rows where the RunningTotal is less than the amount to be applied (since if it were equal to the amount, there would be no payment left for the current row).
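Applied to the sample data with @Amount = 600, the pieces line up like this:
TrID  Buyin  RunningTotal  Min(credit, @Amount - RunningTotal)  CreditPaid  EndTime
72    200    0             Min(200, 600) = 200                  200         @AsOfDate
73    300    200           Min(300, 400) = 300                  300         @AsOfDate
75    400    500           Min(400, 100) = 100                  100         NULL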
Now, there are some larger considerations that you should think about.
Some of your scheme I like--I am not convinced that, as some commenters have said, you should ignore the individual transactions. I think that handling the individual transactions can be very important. It's much like how hospitals have one medical record number for each patient (MRN) but open a new account / file / visit each time the patient has a service performed. Each account is treated separately, and this is for many reasons, including--and this is where it seems important for you, too--the need for the customer to understand what exactly is comprising the total. It can be shocking to see the total all added up, but when this is broken out into individual transactions on individual dates, this makes a lot more sense to people and they can begin to understand exactly how they spent more money than they remembered at the time. "You owe me 600 bucks" can be harder to face than "your transactions for $100, $300, and $200 are still unpaid". :)
So, on to some big considerations here.
If you go with the theory that a transactional or balance-based account starts at 0 as a sort of "anchor", and to find the current balance you simply have to add up all the transactions: well, this does indeed satisfy relational theory, but in practice it is completely unworkable because it does not provide a fast, accurate way to get the current balance. It is imperative to have the current balance saved as a discrete value. If you were a bank, how would you know how much money you had, without adding up perhaps dozens of years of transaction history each time? Instead, it may be better to think of the current balance as the "anchor" (instead of 0) and think of the transactions as going backward in time. Additionally, there is no harm in recording periodic balances. Banks do this by closing out periods into statements, with a defined balance as of each statement closing date. There is no need to go all the way back to zero, since you don't care too much about the balance at the old, unanchored end of the history. You know that eventually every account started at 0. Big deal!
Given these thoughts, it is important for you to have a table where the customer's total account balance is simply stated. You also need a place to record his payments, refunds, cancellations, and so on. This should be separate from the accounts (in your case, transactions) themselves, because there is not a one-to-one correspondence between payment transactions and credit transactions. Already in your current scheme you have partially paid transactions with no date recorded--this is a huge gap in the system that will come back to bite you. What if a customer paid $10 a day toward a $200 credit for 20 days? 19 of those payments would show no date paid.
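As a starting point, something like this hypothetical pair of tables (the names are mine, purely a sketch) keeps the stated balance and the payment events separate from the credit transactions:
CREATE TABLE dbo.CustomerBalance (
   CustID int NOT NULL PRIMARY KEY,
   Balance decimal(11, 2) NOT NULL DEFAULT 0  -- current total owed
);
CREATE TABLE dbo.Payment (
   PaymentID int IDENTITY(1,1) PRIMARY KEY,
   CustID int NOT NULL,
   PaymentType varchar(20) NOT NULL,  -- 'Payment', 'Refund', 'Cancellation', ...
   Amount decimal(11, 2) NOT NULL,
   ReceivedDate datetime NOT NULL     -- every payment gets a date, every time
);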
What I recommend, then, is that you create a stored procedure (SP) that applies payments to totals first, and then create another one that will "rewrite" the payments into the transactions in an on-demand way. Think about what a credit card company has to do if they "re-rate" your account. Perhaps they acted on incorrect information and increased your interest rate on a certain date. (This actually happened to me. I proved to them that the collections activity they were responding to was not my fault--it had been retracted by the original company after I showed them that one of their staff had mistakenly changed my mailing address, and I had never received a bill to be able to pay. So they had to be able to re-run all the purchase/debit/interest rate calculations on my account retroactively, to recalculate everything after the original change date based on the correct interest rate.) Think about this a bit and you will see that it is quite possible to operate this way, as long as you design your system properly. Your SP is given a date range or set of transactions within which it is allowed to work, and then "rewrites" history as if the old history had never existed.
But, you don't actually want to destroy history, so this is further complicated by the fact that at one point in time, your best knowledge of the customer's account balance for a time period was a different amount than your current best knowledge of their account balance for that time period--both are true data and need to be kept.
Let's say you discover that your system occasionally doubled up Credit transactions mistakenly. When you fix the customer data, you need to be able to see the fact that they had the problem, even though they don't have it now. This is done by using additional date columns EffectiveDate and ExpirationDate--or whatever you want to call them. These then need to be part of the clustered index, and used on every query to always get the current values. I highly recommend using 9999-12-31 instead of NULL as your ExpirationDate value for current rows--this will have a huge positive impact on performance when querying for current data. I also recommend putting ExpirationDate as the first column in the clustered index (or at least, before the EffectiveDate column), since history will always potentially have many more records than the future, so it will be more selective than EffectiveDate being first (think a bit: all past knowledge will have EffectiveDate <= GetDate(), but only current or future data will have ExpirationDate > GetDate()).
To drive the point home: you don't delete. You expire old rows by setting a column to the date the knowledge became obsolete, and you insert new rows representing the new knowledge, with a column showing the date you learned this information and an indefinitely-open "to the future" value in the other date column.
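A sketch of the expire-and-insert pattern (the column names follow the suggestion above; the corrected values and key handling are placeholders):
DECLARE @Now datetime = GetDate();
-- Expire the row carrying the now-obsolete knowledge...
UPDATE dbo.Trans
SET ExpirationDate = @Now
WHERE TrID = 73 AND ExpirationDate = '99991231';
-- ...and insert the corrected version, open "to the future".
-- (In a real design the business key survives across versions,
-- separate from the row's surrogate key.)
INSERT dbo.Trans (Buyin, Type, Cashout, CustID, StartTime, EffectiveDate, ExpirationDate)
VALUES (300, 'Credit', 0, 132, '20130522', @Now, '99991231');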
And finally, a couple of individual points:
The CreditPaid column should be NOT NULL with a default of 0. I had to throw in a bunch of Coalesces to deal with the NULLs.
You need to handle overpayments somehow. Either by preventing them, or by storing the overpaid portion value and applying it later. You could OUTPUT the results of the UPDATE statement into a table, then select the Sum from this and make the SP return any unused payment value. There are many ways to handle this. If you build the "re-rate" SP as I suggested, then this won't be too much of a problem, as you can rerun it after receiving new transactions (then immediately (re)apply all payments for any open periods).
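A simplified sketch of just the OUTPUT plumbing, inside the procedure where @CustID and @Amount exist (this naive UPDATE pays every credit in full; in the real procedure you would OUTPUT from the CROSS APPLY version above):
DECLARE @Applied TABLE (CreditApplied decimal(11, 2));
UPDATE dbo.Trans
SET CreditPaid = Buyin
OUTPUT Inserted.CreditPaid - Coalesce(Deleted.CreditPaid, 0) INTO @Applied
WHERE CustID = @CustID AND Type = 'Credit' AND Buyin > Coalesce(CreditPaid, 0);
-- Whatever the UPDATE did not consume is the overpayment to return or bank.
SELECT UnusedPayment = @Amount - Coalesce(Sum(CreditApplied), 0)
FROM @Applied;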
At this point I can't go on too much more, but I hope that these thoughts help you. Your design is a good start, but it needs some work to get it to the point where it will function well as an enterprise-quality system.
UPDATE
I corrected a glitch in the 2008 version (adding the conditions from the outer query to the subquery).
And here's my last edit (all: please do not edit this answer again or it will be converted to community wiki).
If you do go with a scheme where rows are marked with the dates they are understood to be true (EffectiveDate and ExpirationDate), you can make coding in your system a little easier by creating inline table functions that select only the active rows from the table WHERE EffectiveDate <= GetDate() AND GetDate() < ExpirationDate. Pay careful attention to the comparison operators you're using (e.g., <= vs <), and use date ranges that are inclusive at the start and exclusive at the end. If you aren't sure what that means, please do look these terms up and understand them before proceeding. You want to be able to change the resolution of your date data type in the future, without breaking any of your queries. If you use an inclusive end date, this will not be possible. There are many posts online talking about how to properly query for dates in SQL.
Something like this:
CREATE FUNCTION dbo.TransCurrent ()
RETURNS TABLE
AS
RETURN (
   SELECT *
   FROM dbo.Trans
   WHERE
      EffectiveDate <= GetDate()
      AND GetDate() < ExpirationDate --make clustered index have this first!
);
Do NOT confuse this with a multi-statement table-valued function. That will NOT perform well. The function type here will perform well because it can be inlined into the query: the engine takes the logical intent of what the function is doing and dispenses with the function call entirely. Using any other kind of function will defeat this, and your performance will go into the pot as your table grows in size.
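Usage is then simply (continuing the example above):
SELECT *
FROM dbo.TransCurrent()
WHERE CustID = 132
  AND Type = 'Credit';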

Related

Need to perform some not so straight forward data processing in Access 2010

I have a table in Access that is setup like the one in the photo. What I need to do is this:
For each part no, I want to sum the total Qty for each month and type (Ordered and Demand). Then, when the sum of the Qty for Ordered is greater than the sum for Demand, I need to cap the Qty in the rows where the type is Ordered so that the total matches the Demand total. Let me try to explain it another way.
I want to look at a subset of the master data, in this case the subset is by part no (rows with identical part numbers). For this subset I want two sums: 1. the sum of Qty with type = Ordered, and 2. the sum of Qty with type = Demand. If the sum for Ordered is greater than the sum for Demand, I want to change the Qty for Ordered to match the Qty for Demand.
Essentially, the business reason is that for reporting purposes the total Qty for Ordered shouldn't be more than Demand in a given month, for a part number.
Looking at the photo, the rows in red will need to change because the sum of the qty is 30, which is greater than the sum of qty for the green rows (25). The red rows qty should be changed to 20 and 5 to match the green rows.
Whew, hope this made sense, because it is hard to explain. I have tried many things for a couple of weeks now, and I am a bit fuzzy on the details, so I will just give a high-level summary. OK, so what have I tried:
I have tried to join the table to itself, using part no (and date, I believe) to join on, but that doesn't work because the sum would sometimes come out incorrect.
I tried pivoting the table, using the transform and pivot functions in Access, but it's important for me to keep the individual dates intact, and when I pivoted I had to roll the data up on a month basis. This gives me the row structure I need to make the changes, but I don't know how to get back to the original date format after I am done.
I am guessing I need some VBA code that loops through each part no, but I am not big on VBA and I don't have much time to learn it. Any suggestions? I know this is long-winded, but it's a complicated problem (at least for me). Thanks in advance.
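To restate the core requirement from the question as a query: the cap condition compares two grouped sums, something like this hypothetical Access SQL (the table and column names are guesses from the description):
SELECT PartNo,
       Format(OrderDate, "yyyy-mm") AS OrderMonth,
       Sum(IIf(Type = "Ordered", Qty, 0)) AS OrderedQty,
       Sum(IIf(Type = "Demand", Qty, 0)) AS DemandQty
FROM MasterData
GROUP BY PartNo, Format(OrderDate, "yyyy-mm");
Rows where OrderedQty > DemandQty are the ones whose Ordered quantities would need to be scaled down to the Demand total.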

SQL how to implement if and else by checking column value

The table below contains customer reservations. A customer arrives and creates one record in this table; on the last day, the checkout_date field is updated with the current time.
The Table
Now I need to extract each customer's number of nights spent.
The Query
SELECT reservations.customerid, reservations.roomno, rooms.rate,
       reservations.checkin_date, reservations.billed_nights, reservations.status,
       DateDiff("d", reservations.checkin_date, Date())
          + Abs(DateDiff("s", #12/30/1899 14:30:0#, Time()) > 0) AS Due_nights
FROM reservations, rooms
WHERE reservations.roomno = rooms.roomno;
What I need is: if the customer has checkout status, the due nights should be calculated by subtracting checkin_date from checkout_date instead of the current date; also, if the customer has a checkout date, there is no need to add the extra value from the 14:30 comparison.
My current query output is below; also, my computer time is 14:39, so it adds 1 to every row.
Since you want to calculate the due nights up to the checkout date, and use the current date if they are still checked in, I would suggest you use an Immediate If.
The condition to check is the status of the room: if it is checkout, then use the checkout_date, else use Now(). Something like:
SELECT
reservations.customerid,
reservations.roomno,
rooms.rate,
reservations.checkin_date,
reservations.billed_nights,
reservations.status,
DateDiff("d", checkin_date, IIF(status = 'checkout', checkout_date, Now())) As DueNights
FROM
reservations
INNER JOIN
rooms
ON reservations.roomno = rooms.roomno;
As you might have noticed, I used a JOIN. This is more efficient than merging the two tables on a common identifier in the WHERE clause. Hope this helps!

Using SQL Can I get incremental changes in data from query results? Loops?

NOTE: I am not making changes to a database. I am creating a report.
The purpose of the report is to show pending orders that need to be assembled for shipment, but not until there is enough stock to fill the order. An order includes multiple inventory items, but the inventory on hand must be >= the ordered amount per each inventory item and in order by oldest date first before the order can be added to the report.
I've written the query so that it pulls the orders, but I need it to loop through to the next order and carry over the quantity of inventory On Hand from the calculation for the prior order. When the calculation is < 0, I don't need to see the order.
EXAMPLE OUTPUT:
Order Date | Order No | Item No | Quantity Ordered | On Hand | Available Qty
2015-01-01 123456 555555 50 60 10
2015-01-02 555544 555555 10 10 00
Notice On Hand says 60 for Item No 555555 in the first row. This is the actual QOH, but the report needs to subtract the amount that was ordered in the previous line from my On Hand stock, and give me the remainder, or show the new available total under On Hand. When my On Hand amount can't fulfill an order, I don't want the order to appear on my report. My current report shows On Hand to be 60 in both rows, and instead of zero, like above, it just subtracts 10 from 60, as if it's my only order.
I don't know what approach to take to do this type of incremental change in a field, but I am assuming it involves a loop and a variable (if I need to add a variable, then it needs to begin with the actual quantity on hand). Could someone please point me in a direction? My search for an answer has only left me more unsure of how to do this. I can provide the SQL, but it is rather complicated, so I am trying to keep this on a more general level.
"Looping" should be used as a last resort in SQL. You can do so using a CURSOR but they tend to run slower and require more work than standard SQL commands.
I would recommend trying to break this problem down into smaller tables using sub-queries / CTEs (Common Table Expressions). Can you create a query that shows the total on hand amounts for each item number? Now put that into a sub-query and start building on top of it.
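For instance, if the platform supports window functions, the carried-over quantity is just a running total per item; a sketch with invented table and column names:
WITH OrderedItems AS (
   SELECT OrderDate, OrderNo, ItemNo, QtyOrdered,
          SUM(QtyOrdered) OVER (
             PARTITION BY ItemNo
             ORDER BY OrderDate, OrderNo
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
          ) AS RunningOrdered
   FROM Orders
)
SELECT O.OrderDate, O.OrderNo, O.ItemNo, O.QtyOrdered,
       I.OnHand - O.RunningOrdered + O.QtyOrdered AS OnHand,  -- stock before this order
       I.OnHand - O.RunningOrdered AS AvailableQty            -- stock after this order
FROM OrderedItems O
JOIN Inventory I ON I.ItemNo = O.ItemNo
WHERE I.OnHand - O.RunningOrdered >= 0;  -- hide orders the stock can't fill
This handles the per-item running total; the "every item on the order must be fillable" condition would be one more layer on top (e.g., keep only orders whose minimum per-item availability is >= 0).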

Is there a way to handle immutability that's robust and scalable?

Since BigQuery is append-only, I was thinking about stamping each record I upload with an 'effective date', similar to how PeopleSoft works, if anybody is familiar with that pattern.
Then I could issue a select statement and join on the max effective date:
select UTC_USEC_TO_MONTH(timestamp) as month, sum(amt)/100 as sales
from foo.orders as all
join (select id, max(effdt) as max_effdt from foo.orders group by id) as latest
on all.effdt = latest.max_effdt and all.id = latest.id
group by month
order by month;
Unfortunately, I believe this won't scale because of the BigQuery 'small joins' restriction, so I wanted to see if anyone else had thought through this use case.
Yes, adding a timestamp for each record (or in some cases, a flag that captures the state of a particular record) is the right approach. The small side of a BigQuery "Small Join" can actually return at least 8MB (this value is compressed on our end, so is usually 2 to 10 times larger), so for "lookup" table type subqueries, this can actually provide a lot of records.
In your case, it's not clear to me what exact query you are trying to run. It looks like you are trying to return the most recent sales times of every individual item, and then JOIN this information with the SUM of sales amt per month for each item? Can you provide more info about the query?
It might be possible to do this all in one query. For example, in our wikipedia dataset, an example might look something like...
SELECT contributor_username, UTC_USEC_TO_MONTH(timestamp * 1000000) as month,
SUM(num_characters) as total_characters_used FROM
[publicdata:samples.wikipedia] WHERE (contributor_username != '' or
contributor_username IS NOT NULL) AND timestamp > 1133395200
AND timestamp < 1157068800 GROUP BY contributor_username, month
ORDER BY contributor_username DESC, month DESC;
...to provide wikipedia contributions per user per month (like sales per month per item). This result is actually really large, so you would have to limit by date range.
UPDATE (based on comments below) a similar query that finds "num_characters" for the latest wikipedia revisions by contributors after a particular time...
SELECT current.contributor_username, current.num_characters
FROM
(SELECT contributor_username, num_characters, timestamp as time FROM [publicdata:samples.wikipedia] WHERE contributor_username != '' AND contributor_username IS NOT NULL)
AS current
JOIN
(SELECT contributor_username, MAX(timestamp) as time FROM [publicdata:samples.wikipedia] WHERE contributor_username != '' AND contributor_username IS NOT NULL AND timestamp > 1265073722 GROUP BY contributor_username) AS latest
ON
current.contributor_username = latest.contributor_username
AND
current.time = latest.time;
If your query requires you to first build a large aggregate (for example, you need to run an essentially accurate COUNT DISTINCT), another option is to break this query up into two queries. The first query could provide the max effective date by month along with a count, and save this result as a new table. Then you could run a sum query on the resulting table.
You could also store monthly sales records in separate tables, and only query the particular table for the months you are interested in, simplifying your monthly sales summaries (this could also be a more economical use of BigQuery). When you need to find aggregates across all tables, you could run your queries with multiple tables listed after the FROM clause.

Date range intersection in SQL

I have a table where each row has a start and stop date-time. These can be arbitrarily short or long spans.
I want to query the sum duration of the intersection of all rows with two start and stop date-times.
How can you do this in MySQL?
Or do you have to select the rows that intersect the query start and stop times, then calculate the actual overlap of each row and sum it client-side?
To give an example, using milliseconds to make it clearer:
Some rows:
ROW START STOP
1 1010 1240
2 950 1040
3 1120 1121
And we want to know the sum time that these rows were between 1030 and 1100.
Let's compute the overlap of each row:
ROW INTERSECTION
1 70
2 10
3 0
So the sum in this example is 80.
Given the overlaps in your example (70 for the first row), and assuming @range_start and @range_end as your condition parameters:
SELECT SUM( LEAST(@range_end, stop) - GREATEST(@range_start, start) )
FROM Table
WHERE @range_start < stop AND @range_end > start
Using the GREATEST/LEAST and date functions, you should be able to get what you need, operating directly on the date type.
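With real datetime columns the same shape works; a hypothetical sketch using TIMESTAMPDIFF (assuming a table named events with start/stop datetime columns):
SELECT SUM(
   TIMESTAMPDIFF(
      SECOND,
      GREATEST(@range_start, start),
      LEAST(@range_end, stop)
   )
) AS overlap_seconds
FROM events
WHERE start < @range_end AND stop > @range_start;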
I fear you're out of luck.
Since you don't know the number of rows that you will be "cumulatively intersecting", you need either a recursive solution, or an aggregation operator.
The aggregation operator you need is not an option, because SQL does not have the data type that it is supposed to operate on (that type being an interval type, as described in "Temporal Data and the Relational Model").
The recursive solution may be possible, but it is likely to be difficult to write, difficult for other programmers to read, and it is also questionable whether the optimizer can turn that query into an optimal data access strategy.
Or I misunderstood your question.
There's a fairly interesting solution if you know the maximum time you'll ever have. Create a table with all the numbers in it from one to your maximum time.
millisecond
-----------
1
2
3
...
1240
Call it time_dimension (this technique is often used in dimensional modelling in data warehousing.)
Then this:
SELECT
COUNT(*)
FROM
your_data
INNER JOIN time_dimension ON time_dimension.millisecond BETWEEN your_data.start AND your_data.stop
WHERE
time_dimension.millisecond BETWEEN 1030 AND 1100
...will give you the total number of milliseconds of running time between 1030 and 1100.
Of course, whether you can use this technique depends on whether you can safely predict the maximum number of milliseconds that will ever be in your data.
This is often used in data warehousing, as I said; it fits well with some kinds of problems -- for example, I've used it for insurance systems, where a total number of days between two dates was needed, and where the overall date range of the data was easy to estimate (from the earliest customer date of birth to a date a couple of years into the future, beyond the end date of any policies that were being sold.)
Might not work for you, but I figured it was worth sharing as an interesting technique!
After you added the example, it is clear that indeed I misunderstood your question.
You are not "cumulatively intersecting rows".
The steps that will bring you to a solution are:
Intersect each row's start and end point with the given start and end points. This should be doable using CASE expressions, something in the style of:
SELECT
   CASE WHEN startdate < givenstartdate THEN givenstartdate ELSE startdate END AS retainedstartdate,
   CASE WHEN enddate > givenenddate THEN givenenddate ELSE enddate END AS retainedenddate
FROM ...
Cater for NULLs and that sort of stuff as needed.
With retainedstartdate and retainedenddate, use a date function to compute the length of the retained interval (which is the overlap of your row with the given time period).
SELECT the SUM() of those.
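Putting the three steps together in MySQL (8.0+ for the CTE; the table and parameter names are illustrative):
WITH clipped AS (
   SELECT
      CASE WHEN start < @givenstart THEN @givenstart ELSE start END AS retainedstart,
      CASE WHEN stop  > @givenend   THEN @givenend   ELSE stop  END AS retainedend
   FROM your_table
   WHERE start < @givenend AND stop > @givenstart  -- step 1: keep intersecting rows, clipped
)
SELECT SUM(TIMESTAMPDIFF(SECOND, retainedstart, retainedend)) AS total_overlap
FROM clipped;                                      -- steps 2 and 3: measure and sum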