Multiple scenarios in a WHERE clause - SQL

I have the following data:
Invoice | Status | StatusDate
1111111 | BackOrd | null
1111111 | Delivd | 2020-01-01
2222222 | BackOrd | null
3333333 | Delivd | 2020-02-29
In the data above, invoice 1111111 was on BackOrd at one time and has since been delivered, 2222222 is currently on BackOrd, and 3333333 was never on BackOrd and was delivered. 2222222 and 3333333 are easy, but 1111111 is vexing me because I only want to show its current status, Delivd.
I've tried
where case when StatusDate is null then 'BackOrd' else 'Delivd' end = Status
and various iterations; however, for 1111111 my attempts bring back both rows, which makes sense since it had both statuses at one time. I feel like this shouldn't be that hard, and maybe I haven't had enough coffee, but something isn't making sense to me.

You want the latest row per invoice, so this is a top-1-per-group problem. You can use window functions:
select *
from (
    select t.*,
        row_number() over(partition by invoice order by statusdate desc) rn
    from mytable t
) t
where rn = 1
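This works because SQL Server puts null values last when using a descending sort. Other databases differ: PostgreSQL and Oracle put nulls first in a descending sort by default, so there you would write order by statusdate desc nulls last.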
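To check the behaviour locally, here is a minimal sketch that runs the same query against the sample rows in an in-memory SQLite database through Python's built-in sqlite3 module. It assumes a SQLite build of 3.25 or later (the release that added window functions). SQLite, like SQL Server, treats NULL as smaller than any value, so the NULL backorder row sorts last in a descending sort.

```python
import sqlite3

# Sample rows from the question in an in-memory SQLite database
# (assumes SQLite 3.25+ for window-function support).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (invoice text, status text, statusdate text)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        ("1111111", "BackOrd", None),
        ("1111111", "Delivd", "2020-01-01"),
        ("2222222", "BackOrd", None),
        ("3333333", "Delivd", "2020-02-29"),
    ],
)

# Same query as in the answer: the dated row (if any) gets rn = 1 because
# NULL statusdate values come last in a descending sort.
rows = conn.execute(
    """
    select invoice, status, statusdate
    from (
        select t.*,
               row_number() over (partition by invoice
                                  order by statusdate desc) as rn
        from mytable t
    ) t
    where rn = 1
    order by invoice
    """
).fetchall()

for row in rows:
    print(row)
```

Invoice 1111111 comes back only once, with its Delivd row.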

Related

Sort group members and aggregate with navigation function

I'm trying to group the table below, on columns id and due_month:
id | created_at | due_month | status
1 | 2021-02-05 | 2021-02 | paused
1 | 2021-01-31 | 2021-02 | normal
1 | 2021-01-15 | 2021-01 | normal
2 | 2021-03-18 | 2021-03 | normal
2 | 2021-03-07 | 2021-03 | paused
2 | 2021-03-31 | 2021-08 | normal
Then, within each group, sort the members by created_at in ascending order, and finally pick the status of the last item, i.e. the one with the latest created_at date (assuming that created_at never repeats for records with the same id).
Hence the output will look like this:
id | due_month | status
1 | 2021-01 | normal
1 | 2021-02 | paused
2 | 2021-03 | normal
2 | 2021-08 | normal
I tried a query like this, but it didn't work (syntax error):
SELECT
`id`,
`due_month`,
LAST_VALUE(`status`) OVER (ORDER BY `created_at`) AS `status`
FROM `some_table`
GROUP BY
`id`,
`due_month`
;
Also I know it's possible to join information like MAX(`created_at`) AS latest to the original table, then filter by WHERE created_at = latest to get what's needed, but that doesn't look very efficient.
Any better ideas for expressing this kind of logic in BigQuery?
Consider the approach below:
select id, due_month, status
from your_table
where true
qualify 1 = row_number() over win
window win as (partition by id, due_month order by created_at desc)
If applied to the sample data in your question, this returns the expected output shown above.
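QUALIFY and named WINDOW clauses are BigQuery features, so as a hedged sketch for testing the same logic outside BigQuery, the row_number() filter can move into an outer query. This uses Python's sqlite3 with an in-memory database (SQLite 3.25+ assumed); the table name some_table is taken from the question.

```python
import sqlite3

# Sample rows from the question (SQLite 3.25+ assumed for window functions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table some_table (id integer, created_at text, due_month text, status text)"
)
conn.executemany(
    "insert into some_table values (?, ?, ?, ?)",
    [
        (1, "2021-02-05", "2021-02", "paused"),
        (1, "2021-01-31", "2021-02", "normal"),
        (1, "2021-01-15", "2021-01", "normal"),
        (2, "2021-03-18", "2021-03", "normal"),
        (2, "2021-03-07", "2021-03", "paused"),
        (2, "2021-03-31", "2021-08", "normal"),
    ],
)

# Without QUALIFY, filter on the window result in an outer query instead.
# The partition and ordering are the same as the `win` window above.
rows = conn.execute(
    """
    select id, due_month, status
    from (
        select id, due_month, status,
               row_number() over (partition by id, due_month
                                  order by created_at desc) as rn
        from some_table
    )
    where rn = 1
    order by id, due_month
    """
).fetchall()

print(rows)
```

The LAST_VALUE idea from the question can also work, but only without GROUP BY and with an explicit frame such as last_value(status) over (partition by id, due_month order by created_at rows between unbounded preceding and unbounded following), followed by DISTINCT. With the default frame, which ends at the current row, LAST_VALUE just returns the current row's status.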

Database schema pattern for grouping transactions

I am working on an accounting system in which there is a way to revert transactions which are made by mistake.
Processes that run on invoices generate transactions.
One process can generate multiple transactions for an invoice, and multiple processes can be run on an invoice.
The schema looks like this:
Transactions
========================================================
Id | InvoiceId | InvoiceProcessType | Amount | CreatedOn
1 | 1 | 23 | 10.00 | Today
2 | 1 | 23 | 13.00 | Today
3 | 1 | 23 | 17.00 | Yesterday
4 | 1 | 23 | 32.00 | Yesterday
Transactions 1 and 2 happened together, and 3 and 4 happened together. If I want to revert the latter (3 and 4), what would be a possible way to group them?
One possible solution is to add a column, ProcessCount, which is incremented on every process run.
The new schema would look like this:
Transactions
==============================================================================
Id | InvoiceId | InvoiceProcessType | Amount | CreatedOn | ProcessCount
1 | 1 | 23 | 10.00 | Today | 1
2 | 1 | 23 | 13.00 | Today | 1
3 | 1 | 23 | 17.00 | Yesterday | 2
4 | 1 | 23 | 32.00 | Yesterday | 2
Is there any other way I can implement this?
TIA
If you are basing the batching on an arbitrary time window between the createdon date/time values, you can use lag() and a cumulative sum. For instance, if two rows belong to the same batch when they are within an hour of each other, then:
select t.*,
       sum(case when prev_createdon > dateadd(hour, -1, createdon) then 0 else 1 end)
           over (partition by invoiceid order by createdon, id) as processcount
from (select t.*,
             lag(createdon) over (partition by invoiceid order by createdon, id) as prev_createdon
      from transactions t
     ) t;
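A minimal sketch of the same lag-plus-running-sum idea, run through Python's sqlite3 (SQLite 3.25+ assumed). The question only gives "Today" and "Yesterday", so the timestamps below are hypothetical, and SQLite's datetime(x, '-1 hour') stands in for SQL Server's dateadd(hour, -1, x).

```python
import sqlite3

# Hypothetical timestamps: the question only says "Today" / "Yesterday".
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transactions (Id integer, InvoiceId integer, "
    "InvoiceProcessType integer, Amount real, CreatedOn text)"
)
conn.executemany(
    "insert into transactions values (?, ?, ?, ?, ?)",
    [
        (1, 1, 23, 10.00, "2024-05-02 09:00:00"),
        (2, 1, 23, 13.00, "2024-05-02 09:00:05"),
        (3, 1, 23, 17.00, "2024-05-01 14:30:00"),
        (4, 1, 23, 32.00, "2024-05-01 14:30:02"),
    ],
)

# A row adds 1 to the running sum (starts a new batch) unless the previous
# row for the same invoice is less than one hour older. The first row has
# no previous row, so the comparison is NULL and it falls into "else 1".
# Comparing ISO-8601 strings compares the instants they represent.
rows = conn.execute(
    """
    select Id,
           sum(case when prev_createdon > datetime(CreatedOn, '-1 hour')
                    then 0 else 1 end)
               over (partition by InvoiceId order by CreatedOn, Id) as processcount
    from (select t.*,
                 lag(CreatedOn) over (partition by InvoiceId
                                      order by CreatedOn, Id) as prev_createdon
          from transactions t) t
    order by Id
    """
).fetchall()
print(rows)
```

Note that batches are numbered oldest first here, whereas the ProcessCount in the question gives the newest batch the number 1.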
That said, it seems that your processing needs to be enhanced. Each time the code runs, a row should be inserted into some table (say, processes). The id generated by that insertion should then be used when inserting into transactions. That way, you can keep track of when (and by whom, for what, and so on) particular transactions were inserted.
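One way that suggestion could look, as a hypothetical sketch: the processes table, its columns, the process_id foreign key and the run_process helper are all illustrative names, not part of the original schema. It uses Python's sqlite3 so that it runs as-is.

```python
import sqlite3

# Hypothetical schema: each run of a process inserts one row into
# `processes`, and every transaction it creates stores that row's id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table processes (
        id integer primary key,
        invoice_process_type integer not null,
        created_on text not null default current_timestamp
    );
    create table transactions (
        id integer primary key,
        invoice_id integer not null,
        process_id integer not null references processes(id),
        amount real not null
    );
    """
)


def run_process(invoice_id, process_type, amounts):
    """Record one (hypothetical) process run and the transactions it made."""
    with conn:  # commits both inserts together, or rolls back on error
        cur = conn.execute(
            "insert into processes (invoice_process_type) values (?)",
            (process_type,),
        )
        process_id = cur.lastrowid
        conn.executemany(
            "insert into transactions (invoice_id, process_id, amount) "
            "values (?, ?, ?)",
            [(invoice_id, process_id, amount) for amount in amounts],
        )
    return process_id


first = run_process(1, 23, [17.00, 32.00])
second = run_process(1, 23, [10.00, 13.00])

# Reverting one process means selecting (or deleting) its rows by process_id.
to_revert = conn.execute(
    "select id, amount from transactions where process_id = ? order by id",
    (first,),
).fetchall()
print(to_revert)
```

The processes row is also the natural place for audit columns such as who ran the process.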
You can use dense_rank() to identify the groups as follows:
select t.*,
dense_rank() over (partition by InvoiceId
order by CreatedOn desc) as ProcessCount
from your_table t
You can then revert (or delete) rows as required. There is no need to maintain the ProcessCount column explicitly; it can be derived with the query above.
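A small sketch of the dense_rank() derivation using Python's sqlite3 (SQLite 3.25+ assumed). "Today" and "Yesterday" are replaced with hypothetical timestamps, and it assumes that all rows written by one process share exactly the same CreatedOn value.

```python
import sqlite3

# The question's rows with hypothetical timestamps for "Today"/"Yesterday".
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table your_table (Id integer, InvoiceId integer, "
    "InvoiceProcessType integer, Amount real, CreatedOn text)"
)
conn.executemany(
    "insert into your_table values (?, ?, ?, ?, ?)",
    [
        (1, 1, 23, 10.00, "2024-05-02 09:00:00"),
        (2, 1, 23, 13.00, "2024-05-02 09:00:00"),
        (3, 1, 23, 17.00, "2024-05-01 14:30:00"),
        (4, 1, 23, 32.00, "2024-05-01 14:30:00"),
    ],
)

# dense_rank() gives rows with equal CreatedOn the same number, newest first,
# and leaves no gaps between consecutive numbers.
rows = conn.execute(
    """
    select Id,
           dense_rank() over (partition by InvoiceId
                              order by CreatedOn desc) as ProcessCount
    from your_table
    order by Id
    """
).fetchall()
print(rows)
```

If a process writes its rows a few milliseconds apart, each row gets its own rank, which is where the one-hour window or a stored process id is the safer choice.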

Calculating time with datetime by groups

I have two tables, Tickets and Tasks. When a ticket is registered, it appears in the Tickets table, and every action taken on the ticket is saved in the Tasks table. The Tickets table includes information such as who created the ticket, the start and end dates (if it is closed), etc. The Tasks table looks like this:
ID | Ticket_ID | Task_type_ID | Task_type | Group_ID | Submit_Date
1 | 120 | 1 | Opened | 3 | 2016-12-09 11:10:22.000
2 | 120 | 2 | Assign | 4 | 2016-12-09 12:10:22.000
3 | 120 | 3 | Paused | 4 | 2016-12-09 12:30:22.000
4 | 120 | 4 | Unpause | 4 | 2016-12-10 10:30:22.000
5 | 120 | 2 | Assign | 6 | 2016-12-12 10:30:22.000
6 | 120 | 2 | Assign | 7 | 2016-12-12 15:30:22.000
7 | 120 | 5 | Modify | NULL | 2016-12-13 15:30:22.000
8 | 120 | 6 | Closed | NULL | 2016-12-13 16:30:22.000
I would like to calculate how long each group took to complete its task. The start time is when the ticket was assigned to a certain group, and the end time is when that group completes its task (by assigning the ticket elsewhere or closing it). But it should not include the paused time (task_type_ID 3 to 4). Also, when a ticket is assigned to another group, the new group ID appears in the previous task/row. If the task goes through multiple groups, it should calculate how long the ticket was in the hands of each group.
I know it is complicated, but maybe someone has an idea that I can start to build from.
This is a fairly sophisticated gaps-and-islands problem.
Here is one approach:
select distinct
    ticket_id,
    group_id,
    -- window sum per ticket and group, across all of its islands
    sum(sum(datediff(minute, submit_date, lead_submit_date)))
        over(partition by ticket_id, group_id) minutes_elapsed
from (
    select
        t.*,
        row_number() over(partition by ticket_id order by submit_date) rn1,
        row_number() over(partition by ticket_id, group_id order by submit_date) rn2,
        lead(submit_date) over(partition by ticket_id order by submit_date) lead_submit_date
    from mytable t
) t
where task_type <> 'Paused' and group_id is not null
group by ticket_id, group_id, rn1 - rn2
In the subquery, we assign row numbers to records within two different partitions (by ticket vs. by ticket and group), and retrieve the date of the next record with lead().
We can then use the difference between the row numbers to build groups of "adjacent" records (where the ticket stays in the same group), while excluding the periods when the ticket was paused. This is where aggregation comes into play.
The final step is to compute the overall time spent in each group. This handles the case where a ticket is assigned to the same group more than once during its lifecycle (this does not happen in your sample data, but your description suggests that it may). We could do this with another level of aggregation, but I went for a window sum and distinct, which avoids one more level of nesting in the query.
Executing the subquery on its own may help in understanding the logic.
For your sample data, the query yields:
ticket_id | group_id | minutes_elapsed
--------: | -------: | --------------:
120 | 3 | 60
120 | 4 | 2900
120 | 6 | 300
120 | 7 | 1440
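To see the islands for yourself, here is a hedged sketch using Python's sqlite3 (SQLite 3.25+ assumed). It first runs the subquery on its own to print rn1, rn2 and rn1 - rn2, then totals each group with the extra level of aggregation mentioned above instead of sum(sum(...)) over. SQL Server's datediff(minute, ...) is replaced by a julianday difference multiplied by 1440, which matches here because every timestamp falls on the same second of the minute.

```python
import sqlite3

# The question's Tasks rows (SQLite 3.25+ assumed for window functions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (id integer, ticket_id integer, task_type_id integer, "
    "task_type text, group_id integer, submit_date text)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?, ?, ?)",
    [
        (1, 120, 1, "Opened", 3, "2016-12-09 11:10:22.000"),
        (2, 120, 2, "Assign", 4, "2016-12-09 12:10:22.000"),
        (3, 120, 3, "Paused", 4, "2016-12-09 12:30:22.000"),
        (4, 120, 4, "Unpause", 4, "2016-12-10 10:30:22.000"),
        (5, 120, 2, "Assign", 6, "2016-12-12 10:30:22.000"),
        (6, 120, 2, "Assign", 7, "2016-12-12 15:30:22.000"),
        (7, 120, 5, "Modify", None, "2016-12-13 15:30:22.000"),
        (8, 120, 6, "Closed", None, "2016-12-13 16:30:22.000"),
    ],
)

# The subquery: two row numberings plus the next row's timestamp.
SUBQUERY = """
    select t.*,
           row_number() over (partition by ticket_id order by submit_date) as rn1,
           row_number() over (partition by ticket_id, group_id
                              order by submit_date) as rn2,
           lead(submit_date) over (partition by ticket_id
                                   order by submit_date) as lead_submit_date
    from mytable t
"""

# On its own: rn1 - rn2 stays constant while the ticket stays in one group.
for r in conn.execute(
    f"select id, group_id, rn1, rn2, rn1 - rn2 as island from ({SUBQUERY}) order by id"
):
    print(r)

# Minutes per island, then one more aggregation level to total each group.
rows = conn.execute(
    f"""
    select ticket_id, group_id, sum(island_minutes) as minutes_elapsed
    from (
        select ticket_id, group_id,
               sum(cast(round((julianday(lead_submit_date)
                               - julianday(submit_date)) * 1440) as integer))
                   as island_minutes
        from ({SUBQUERY}) t
        where task_type <> 'Paused' and group_id is not null
        group by ticket_id, group_id, rn1 - rn2
    ) islands
    group by ticket_id, group_id
    order by ticket_id, group_id
    """
).fetchall()
print(rows)
```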
I actually think this is pretty simple. Just use lead() to get the next submit time and aggregate by ticket and group, ignoring pauses:
select ticket_id, group_id, sum(dur_sec) as total_sec
from (select t.*,
             datediff(second, submit_date,
                      lead(submit_date) over (partition by ticket_id order by submit_date)
                     ) as dur_sec
      from mytable t
     ) t
where task_type <> 'Paused' and group_id is not null
group by ticket_id, group_id;
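A minimal sketch of this lead()-based version in Python's sqlite3 (SQLite 3.25+ assumed), with a julianday difference times 86400 standing in for datediff(second, ...). Each non-paused row's duration is the time until the next action, so the pause interval, which starts on the Paused row, is never counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (id integer, ticket_id integer, task_type_id integer, "
    "task_type text, group_id integer, submit_date text)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?, ?, ?)",
    [
        (1, 120, 1, "Opened", 3, "2016-12-09 11:10:22.000"),
        (2, 120, 2, "Assign", 4, "2016-12-09 12:10:22.000"),
        (3, 120, 3, "Paused", 4, "2016-12-09 12:30:22.000"),
        (4, 120, 4, "Unpause", 4, "2016-12-10 10:30:22.000"),
        (5, 120, 2, "Assign", 6, "2016-12-12 10:30:22.000"),
        (6, 120, 2, "Assign", 7, "2016-12-12 15:30:22.000"),
        (7, 120, 5, "Modify", None, "2016-12-13 15:30:22.000"),
        (8, 120, 6, "Closed", None, "2016-12-13 16:30:22.000"),
    ],
)

# lead() is computed before the WHERE filter, so every row still sees the
# true next action; only then are Paused and group-less rows dropped.
rows = conn.execute(
    """
    select ticket_id, group_id,
           sum(cast(round((julianday(next_submit_date)
                           - julianday(submit_date)) * 86400) as integer)) as total_sec
    from (
        select t.*,
               lead(submit_date) over (partition by ticket_id
                                       order by submit_date) as next_submit_date
        from mytable t
    ) t
    where task_type <> 'Paused' and group_id is not null
    group by ticket_id, group_id
    order by ticket_id, group_id
    """
).fetchall()

print([(ticket, group, secs // 60) for ticket, group, secs in rows])
```

The totals agree with the gaps-and-islands version, because summing per row and summing per island add up the same durations; the island step only matters if you want each stint in a group reported separately.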

Running Count of Unique Identifier Occurrences in SQL

So I'm trying to get a running count of uses over time for each unique identifier, e.g.:
Date | UniqueID | Running Count
1/1/2019 | 234567 | 1
1/1/2019 | 123456 | 1
1/2/2019 | 234567 | 2
1/3/2019 | 234567 | 3
1/3/2019 | 123456 | 2
Basically, I want to be able to see that on 1/3/2019 it was the 3rd time that UniqueID 234567 showed up in the data.
I tried:
SELECT Date, UniqueID,
count(UniqueID) OVER (ORDER BY Date, UniqueID rows unbounded preceding) AS RunningTotal
but this just gives an overall running total, so it doesn't reset for each new UniqueID.
Is there anything I could do to make it reset for each UniqueID?
Based on your sample data, you want either ROW_NUMBER() or DENSE_RANK():
SELECT Date, UniqueID,
       ROW_NUMBER() OVER (PARTITION BY UniqueID ORDER BY Date) AS RunningTotal
FROM your_table; -- replace with your table name
You would use DENSE_RANK() if you could have duplicates on one day that you wanted to count only once.
By the way, you could also express this using COUNT(*):
SELECT Date, UniqueID,
       COUNT(*) OVER (PARTITION BY UniqueID ORDER BY Date) AS RunningTotal
FROM your_table; -- replace with your table name
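There are some subtle differences in the handling of duplicate values. With the default frame, COUNT(*) gives all rows that share the same UniqueID and Date the same count, which includes every one of those rows; ROW_NUMBER() numbers them 1, 2, ... in arbitrary order; and DENSE_RANK() gives them the same rank. Normally, COUNT() is not used for this purpose because the ranking functions are so pervasive (and useful).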
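A short sketch that compares the three functions side by side, using Python's sqlite3 (SQLite 3.25+ assumed). The dates are rewritten as ISO 8601 so that they sort correctly as text, the table name visits is hypothetical, and one same-day duplicate for 234567 is added (not in the question) to show where the functions diverge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table visits (Date text, UniqueID integer)")
conn.executemany(
    "insert into visits values (?, ?)",
    [
        ("2019-01-01", 234567),
        ("2019-01-01", 123456),
        ("2019-01-02", 234567),
        ("2019-01-03", 234567),
        ("2019-01-03", 123456),
        ("2019-01-03", 234567),  # hypothetical duplicate, not in the question
    ],
)

# PARTITION BY UniqueID restarts every counter for each identifier; the
# functions only differ for rows that share the same Date.
rows = conn.execute(
    """
    select UniqueID, Date,
           row_number() over (partition by UniqueID order by Date) as rn,
           dense_rank() over (partition by UniqueID order by Date) as dr,
           count(*)     over (partition by UniqueID order by Date) as cnt
    from visits
    order by UniqueID, Date, rn
    """
).fetchall()

for row in rows:
    print(row)
```

For the duplicated 2019-01-03 rows, rn gives 3 and 4, dr gives 3 for both, and cnt gives 4 for both.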

Find how many times it took to achieve a particular outcome in SQL Table

I need to find out, from a SQL table, how many attempts it takes to achieve an outcome. For example, my table contains CustomerID, Outcome, and OutcomeType. The outcome I am looking for is Sale.
So if I had these records:
CID | Outcome | OutcomeID | Date
1 | No Answer | 0 | 01/01/2015 08:00:00
1 | No Interest | 0 | 02/01/2015 09:00:00
1 | Sale | 1 | 02/02/2015 10:00:00
1 | Follow up | 2 | 03/02/2015 10:00:00
I can see it took 2 attempts to get a sale. I need to do this for all the customers in a table that contains thousands of entries. Customers may have entries after the sale, which I need to exclude; they may also have additional sales after the first, but I am only interested in the first sale.
I hope this is enough info.
Many thanks in advance.
Edit: as requested, the output I am looking for would be:
CID | CountToOutcome
1 | 2
2 | 3
3 | 5
etc.
You can do this with window functions and aggregation:
select cid,
       min(case when Outcome = 'Sale' then seqnum end) - 1 as AttemptsBeforeSale
from (select t.*,
             row_number() over (partition by cid order by date) as seqnum
      from t
     ) t
group by cid;
Note: this gives the number of attempts before the first sale for each cid; customers with no sale get NULL.
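A minimal sketch of the query using Python's sqlite3 (SQLite 3.25+ assumed). Customer 1 comes from the question, with its dates assumed to be day-first and stored as ISO 8601 text so that they sort correctly. Customers 2 and 3 are hypothetical: one with two sales and one with no sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (cid integer, outcome text, outcomeid integer, date text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        # Customer 1 from the question (dates assumed day-first, stored as ISO).
        (1, "No Answer", 0, "2015-01-01 08:00:00"),
        (1, "No Interest", 0, "2015-01-02 09:00:00"),
        (1, "Sale", 1, "2015-02-02 10:00:00"),
        (1, "Follow up", 2, "2015-02-03 10:00:00"),
        # Hypothetical customer 2: two sales, only the first one counts.
        (2, "No Answer", 0, "2015-01-05 10:00:00"),
        (2, "Sale", 1, "2015-01-06 10:00:00"),
        (2, "Sale", 1, "2015-01-09 10:00:00"),
        # Hypothetical customer 3: never reached a sale.
        (3, "No Answer", 0, "2015-01-07 11:00:00"),
    ],
)

# seqnum numbers each customer's rows in time order; the smallest seqnum of a
# 'Sale' row, minus 1, is the number of attempts before the first sale.
rows = conn.execute(
    """
    select cid,
           min(case when outcome = 'Sale' then seqnum end) - 1 as AttemptsBeforeSale
    from (select t.*,
                 row_number() over (partition by cid order by date) as seqnum
          from t) t
    group by cid
    order by cid
    """
).fetchall()
print(rows)
```

Rows after the first sale do not affect the result, because min() only picks up the earliest Sale position.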