Database schema pattern for grouping transactions (SQL)

I am working on an accounting system that needs a way to revert transactions made by mistake.
Processes run on invoices and generate transactions.
One process can generate multiple transactions for an invoice, and multiple processes can be run on the same invoice.
The schema looks like this:
Transactions
========================================================
Id | InvoiceId | InvoiceProcessType | Amount | CreatedOn
 1 |         1 |                 23 |  10.00 | Today
 2 |         1 |                 23 |  13.00 | Today
 3 |         1 |                 23 |  17.00 | Yesterday
 4 |         1 |                 23 |  32.00 | Yesterday
Now, rows 1 and 2 happened together and rows 3 and 4 happened together. If I want to revert the latter pair (3, 4), what would be a possible solution to group them?
One possible solution is to add a column ProcessCount which is incremented on every process run.
The new schema would look like this:
Transactions
==============================================================================
Id | InvoiceId | InvoiceProcessType | Amount | CreatedOn | ProcessCount
 1 |         1 |                 23 |  10.00 | Today     |            1
 2 |         1 |                 23 |  13.00 | Today     |            1
 3 |         1 |                 23 |  17.00 | Yesterday |            2
 4 |         1 |                 23 |  32.00 | Yesterday |            2
Is there any other way I can implement this?
TIA

If you are basing the batching on an arbitrary time frame between the createdon date/time values, then you can use lag() and a cumulative sum. For instance, if two rows belong to the same batch whenever they are within an hour of each other, then:
select t.*,
       sum(case when prev_createdon > dateadd(hour, -1, createdon) then 0 else 1 end) over
           (partition by invoiceid order by createdon, id) as processcount
from (select t.*,
             lag(createdon) over (partition by invoiceid order by createdon, id) as prev_createdon
      from transactions t
     ) t;
That said, it would seem that your processing needs to be enhanced. Each time the code runs, a row should be inserted into some table (say processes). The id generated from that insertion should be used to insert into transactions. That way, you keep the information about when (and who, and what, and so on) inserted particular transactions.
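As a rough sketch of that idea (the table and column names here are illustrative, not something your schema already has), the batching table and the link from transactions might look like:
-- Hypothetical batching table; one row per process run
create table Processes (
    Id int identity(1,1) primary key,
    InvoiceId int not null,
    InvoiceProcessType int not null,
    RunBy varchar(100) null,              -- who triggered the run
    RunOn datetime not null default getdate()
);

-- Transactions then get a ProcessId column referencing Processes(Id);
-- every transaction generated by a run is stamped with that run's id,
-- so reverting a batch is simply:
--   delete from Transactions where ProcessId = @ProcessIdToRevert;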

You can use dense_rank() to identify the batches as follows:
select t.*,
       dense_rank() over (partition by InvoiceId
                          order by CreatedOn desc) as ProcessCount
from your_table t
You can then revert (or delete) as per your requirement. There is no need to explicitly maintain a ProcessCount column; it can be derived with the above query.
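For example, reverting the most recent batch for invoice 1 could go through a deletable CTE (a sketch assuming SQL Server; the invoice id 1 is just taken from the sample data):
with ranked as (
    select t.*,
           dense_rank() over (partition by InvoiceId
                              order by CreatedOn desc) as ProcessCount
    from your_table t
)
delete from ranked          -- deleting through the CTE removes the underlying rows
where InvoiceId = 1
  and ProcessCount = 1;     -- 1 = the most recent batch for each invoice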

Related

Calculating time with datetime by groups

I have two tables, Tickets and Tasks. When a ticket is registered it appears in the Tickets table, and every action made on the ticket is saved in the Tasks table. The Tickets table includes information like who created the ticket, start and end dates (if it is closed), etc. The Tasks table looks like this:
ID | Ticket_ID | Task_type_ID | Task_type | Group_ID | Submit_Date
 1 |       120 |            1 | Opened    |        3 | 2016-12-09 11:10:22.000
 2 |       120 |            2 | Assign    |        4 | 2016-12-09 12:10:22.000
 3 |       120 |            3 | Paused    |        4 | 2016-12-09 12:30:22.000
 4 |       120 |            4 | Unpause   |        4 | 2016-12-10 10:30:22.000
 5 |       120 |            2 | Assign    |        6 | 2016-12-12 10:30:22.000
 6 |       120 |            2 | Assign    |        7 | 2016-12-12 15:30:22.000
 7 |       120 |            5 | Modify    |     NULL | 2016-12-13 15:30:22.000
 8 |       120 |            6 | Closed    |     NULL | 2016-12-13 16:30:22.000
I would like to calculate how long each group took to complete its task. The start time is when the ticket was assigned to a certain group, and the end time is when that group completes its task (by assigning it elsewhere or closing it). It should not include the paused time (Task_type_ID 3 to 4). Also, when a ticket is assigned to another group, the new Group_ID appears in the previous task/row. If the ticket goes through multiple groups, the query should calculate how long the ticket was in the hands of each group.
I know it is complicated but maybe someone has an idea that I can start to build from.
This is quite a sophisticated gaps-and-islands problem.
Here is one approach at it:
select distinct
       ticket_id,
       group_id,
       sum(sum(datediff(minute, submit_date, lead_submit_date)))
           over (partition by ticket_id, group_id) as elapsed_minutes
from (
    select t.*,
           row_number() over (partition by ticket_id order by submit_date) as rn1,
           row_number() over (partition by ticket_id, group_id order by submit_date) as rn2,
           lead(submit_date) over (partition by ticket_id order by submit_date) as lead_submit_date
    from mytable t
) t
where task_type <> 'Paused' and group_id is not null
group by ticket_id, group_id, rn1 - rn2
In the subquery, we assign row numbers to records within two different partitions (by tickets vs by ticket and group), and recover the date of the next record with lead().
We can then use the difference between the row numbers to build groups of "adjacent" records (where the tickets stays in the same group), while not taking into account periods when the ticket was paused. Aggregation comes into play here.
The final step is to compute the overall time spent in each group: this handles the case when a ticket is assigned to the same group more than once during its lifecycle (although that does not show in your sample data, the description of the question makes it sound like that may happen). We could do this with another level of aggregation, but I went for a window sum and distinct, which avoids adding one more level of nesting to the query.
Executing the subquery independently might help in understanding the logic (see the db fiddle below).
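For illustration (hand-computed from the sample data, not a fiddle output), the subquery's extra columns come out roughly as:
ID | Group_ID | rn1 | rn2 | rn1 - rn2 | minutes to next row
 1 |        3 |   1 |   1 |         0 |   60
 2 |        4 |   2 |   1 |         1 |   20
 3 |        4 |   3 |   2 |         1 | 1320   (Paused, filtered out)
 4 |        4 |   4 |   3 |         1 | 2880
 5 |        6 |   5 |   1 |         4 |  300
 6 |        7 |   6 |   1 |         5 | 1440
 7 |     NULL |   7 |   1 |         6 |   60   (NULL group, filtered out)
 8 |     NULL |   8 |   2 |         6 | NULL   (NULL group, filtered out)
Rows 2 and 4 share the same (group_id, rn1 - rn2) pair, so their 20 + 2880 minutes end up summed into group 4's 2900 even though a Paused row sits between them.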
For your sample data, the query yields:
ticket_id | group_id | elapsed_minutes
--------: | -------: | --------------:
120 | 3 | 60
120 | 4 | 2900
120 | 6 | 300
120 | 7 | 1440
I actually think this is pretty simple. Just use lead() to get the next submit time value and aggregate by ticket and group, ignoring pauses:
select ticket_id, group_id, sum(dur_sec)
from (select t.*,
             datediff(second, submit_date,
                      lead(submit_date) over (partition by ticket_id order by submit_date)
             ) as dur_sec
      from mytable t
     ) t
where task_type <> 'Paused' and group_id is not null
group by ticket_id, group_id;
Here is a db<>fiddle (with thanks to GMB for creating the original fiddle).

Finding the number of days between the first 2 date points

The question seems to be quite difficult, so I wonder if I could get some advice here. I am trying to solve this with SQLite 3. I have data in this format:
customer | purchase date
       1 | date 1
       1 | date 2
       1 | date 3
       2 | date 4
       2 | date 5
       2 | date 6
       2 | date 7
The number of times a customer repeats is random.
I just want to find whether customer 1's 1st and 2nd purchase dates fall within a specific time period, and repeat that for the other customers; only the 1st and 2nd dates need to be considered.
Any help would be appreciated!
We can try using ROW_NUMBER here:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer ORDER BY "purchase date") AS rn
    FROM yourTable
)
SELECT customer,
       CAST(MAX(CASE WHEN rn = 2 THEN julianday("purchase date") END) -
            MAX(CASE WHEN rn = 1 THEN julianday("purchase date") END) AS INTEGER) AS diff_in_days
FROM cte
GROUP BY customer;
The idea here is to aggregate by customer and then take the date difference between the second and first purchases. ROW_NUMBER is used to find these first and second purchases for each customer.
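If the end goal is a yes/no check rather than the raw day count, the same aggregates can feed a HAVING clause. A sketch assuming a 30-day window (the 30 is a placeholder, not something given in the question):
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer ORDER BY "purchase date") AS rn
    FROM yourTable
)
SELECT customer
FROM cte
GROUP BY customer
HAVING MAX(CASE WHEN rn = 2 THEN julianday("purchase date") END) -
       MAX(CASE WHEN rn = 1 THEN julianday("purchase date") END) <= 30;  -- 2nd purchase within 30 days of the 1st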

SQL Calculating time from last transaction for each ID

Hello, I'm stuck trying to calculate the time difference between consecutive transactions for each ID.
The data looks like:
Customer_ID | Transaction_Time
          1 | 00:30
          1 | 00:35
          1 | 00:37
          1 | 00:38
          2 | 00:20
          2 | 00:21
          2 | 00:23
I'm trying to get the result to look something like
Customer_ID | Time_diff
          1 | 5
          1 | 2
          1 | 1
          2 | 1
          2 | 2
I would really appreciate any help.
Thanks
Most databases support the LAG() function. However, the date/time functions can depend on the database. Here is an example for SQL Server:
select t.*
from (select t.*,
             datediff(minute,
                      lag(transaction_time) over (partition by customer_id
                                                  order by transaction_time),
                      transaction_time
             ) as diff
      from t
     ) t
where diff is not null;
The logic would be similar in most databases, although the function for calculating the time difference varies.
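For instance, a rough Postgres equivalent (a sketch assuming the same table and column names) would subtract the times and convert the resulting interval to minutes:
select *
from (select t.*,
             extract(epoch from transaction_time
                     - lag(transaction_time) over (partition by customer_id
                                                   order by transaction_time)) / 60 as diff
      from t
     ) t
where diff is not null;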

Find how many attempts it took to achieve a particular outcome in a SQL table

I need to find out how many attempts it takes to achieve an outcome from a SQL table. For example, my table contains CustomerID, Outcome, OutcomeType. The outcome I am looking for is Sale.
So if I had these records:
CID | Outcome     | OutcomeID | Date
  1 | No Answer   |         0 | 01/01/2015 08:00:00
  1 | No Interest |         0 | 02/01/2015 09:00:00
  1 | Sale        |         1 | 02/02/2015 10:00:00
  1 | Follow up   |         2 | 03/02/2015 10:00:00
I can see it took 2 attempts to get a sale. I need to do this for all the customers in a table which contains thousands of entries. They may have entries after the sale and I need to exclude these; they may also have additional sales after the first, but I am only interested in the first sale.
I hope this is enough info; many thanks in advance.
Edit: as requested, the output I would look for would be:
CID | CountToOutcome
  1 |              2
  2 |              3
  3 |              5
etc
You can do this with window functions and aggregation:
select cid,
       min(case when Outcome = 'Sale' then seqnum end) - 1 as AttemptsBeforeSale
from (select t.*,
             row_number() over (partition by cid order by date) as seqnum
      from t
     ) t
group by cid;
Note: This provides the value for the first sale for each cid.
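Customers who never reach a Sale would come back with a NULL count; if those rows should be dropped, a hedged tweak is to add a HAVING clause:
select cid,
       min(case when Outcome = 'Sale' then seqnum end) - 1 as AttemptsBeforeSale
from (select t.*,
             row_number() over (partition by cid order by date) as seqnum
      from t
     ) t
group by cid
having min(case when Outcome = 'Sale' then seqnum end) is not null;  -- keep only customers with a sale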

Compare 2 subsets of data from a table?

I'm not sure if this is possible - I'm having real trouble getting my head around it.
This is for a product schedule, showing how much we are expecting to deliver on a given date. Data is imported into this schedule weekly, which creates a new entry.
For example, if the schedule for the day currently totals 10, and you import 15, a new row is inserted with Qty 5, bringing the sum to 15.
The data I have is like so:
Product | Delivery Required Date | Qty
Prod1   | 1/1/13                 |  10
Prod1   | 1/1/13                 | -10
Prod1   | 1/1/13                 |  10
Prod1   | 1/1/13                 | -10
Prod1   | 1/1/13                 |  25
I want to design a query which shows the variance between the previous schedule, and the current schedule.
For example, the query will sum the Qty of all rows excluding the last entry, and compare that to the last entry. In the data above, the variance is 25 (the existing total was 0, the latest entry is 25, 0 + 25 = 25).
Is this possible?
Thanks
I suspect there's a better answer using Common Table Expressions, but a quick & ugly solution might be:
select sum(case when EntryNo <> (select max(EntryNo) from MyTable) then Qty else 0 end) as sumLessLast
from MyTable
If MyTable has a million rows in it you'll want a better solution.
SqlServer 2005 and 2008:
;with r1 as (
    select DeliveryReqDate, sum(Qty) as TotalQty
    from TableName
    group by DeliveryReqDate
)
, r2 as (
    select DeliveryReqDate, Qty
         , row_number() over (partition by DeliveryReqDate order by EntryNo desc) rn
    from TableName
)
select r1.DeliveryReqDate, r1.TotalQty, r2.Qty as LastQty
     , r1.TotalQty - r2.Qty as TotalButLastQty
from r1
join r2 on r2.DeliveryReqDate = r1.DeliveryReqDate and r2.rn = 1
SqlServer 2012
;with r1 as (
    select DeliveryReqDate, Qty
         , sum(Qty) over (partition by DeliveryReqDate) as TotalQty
         , row_number() over (partition by DeliveryReqDate order by EntryNo desc) rn
    from TableName
)
select DeliveryReqDate, TotalQty, Qty as LastQty
     , TotalQty - Qty as TotalButLastQty
from r1
where rn = 1
I'm not sure that I completely understand the logic regarding the accounting of product and date, but I hope you can adapt the above queries to your needs.
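For instance, if the variance should be tracked per product as well as per date (as the sample data with its Product column suggests), a sketch of the 2012 query with Product added to the partitions might look like:
;with r1 as (
    select Product, DeliveryReqDate, Qty
         , sum(Qty) over (partition by Product, DeliveryReqDate) as TotalQty
         , row_number() over (partition by Product, DeliveryReqDate order by EntryNo desc) rn
    from TableName
)
select Product, DeliveryReqDate, TotalQty, Qty as LastQty
     , TotalQty - Qty as TotalButLastQty
from r1
where rn = 1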