I have some data that looks like this:
+---+--------+-------------+---------------+--------------+
| | A | B | C | D |
+---+--------+-------------+---------------+--------------+
| 1 | row_id | disposal_id | excess_weight | total_weight |
| 2 | 1 | 1 | 0 | 30 |
| 3 | 2 | 1 | 10 | 30 |
| 4 | 3 | 1 | 0 | 30 |
| 5 | 4 | 2 | 5 | 50 |
| 6 | 5 | 2 | 0 | 50 |
| 7 | 6 | 2 | 15 | 50 |
| 8 | 7 | 2 | 5 | 50 |
| 9 | 8 | 2 | 5 | 50 |
+---+--------+-------------+---------------+--------------+
And I am transforming it to look like this:
+---+--------+-------------+---------------+--------------+
| | A | B | C | D |
+---+--------+-------------+---------------+--------------+
| 1 | row_id | disposal_id | excess_weight | total_weight |
| 2 | 1 | 1 | 0 | 30 |
| 3 | 2 | 1 | 10 | 30 |
| 4 | 3 | 1 | 0 | 20 |
| 5 | 4 | 2 | 5 | 50 |
| 6 | 5 | 2 | 0 | 45 |
| 7 | 6 | 2 | 15 | 45 |
| 8 | 7 | 2 | 5 | 30 |
| 9 | 8 | 2 | 5 | 25 |
+---+--------+-------------+---------------+--------------+
Basically, I need to update the total_weight column by subtracting the sum of the excess_weight values from previous rows in the table that belong to the same disposal_id.
I'm currently using a cursor because it's faster than the other solutions I've tried (a CTE, a triangular join, a cross apply). My cursor solution keeps a running total that is reset to zero for each new disposal_id, increments it by the excess weight, and performs updates where needed; it runs in about 40 seconds. The other solutions took anywhere from 3 to 5 minutes. Is there a relatively performant way to do this using set-based operations?
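A minimal sketch of that cursor logic, assuming a hypothetical table name dbo.disposals and that row_id gives the processing order:

DECLARE @row_id int, @disposal_id int, @excess int,
        @prev_disposal int = NULL, @running int = 0;

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT row_id, disposal_id, excess_weight
    FROM dbo.disposals
    ORDER BY disposal_id, row_id;

OPEN c;
FETCH NEXT FROM c INTO @row_id, @disposal_id, @excess;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- reset the running total whenever a new disposal_id starts
    IF @prev_disposal IS NULL OR @disposal_id <> @prev_disposal
        SET @running = 0;

    -- subtract the excess accumulated on earlier rows of this disposal_id
    IF @running > 0
        UPDATE dbo.disposals
        SET total_weight = total_weight - @running
        WHERE row_id = @row_id;

    SET @running = @running + @excess;
    SET @prev_disposal = @disposal_id;
    FETCH NEXT FROM c INTO @row_id, @disposal_id, @excess;
END
CLOSE c;
DEALLOCATE c;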
I've spent a lot of time optimizing such queries and ended up with two performant options: either store precalculated running totals, as described in Denormalizing to enforce business rules: Running Totals, or calculate them on the client, which is also fast and easy.
The other solution, which you have probably already tried, is to do something like the answers found here.
Unless you are using Oracle, which has decent aggregates for cumulative sums, you're better off using a cursor. At best, you're going to have to rejoin the table to itself or use other methods for what should be an O(n) operation. In general, the set-based solutions for problems like these are messy or really messy.
'Previous rows' implies an ordering, so no: there are no set-based operations for that.
Oracle's LEAD and LAG are built for this, but SQL Server forces you into triangular joins... which I suppose you have investigated.
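For completeness: on SQL Server 2012 and later (newer than these answers assume), a windowed SUM over a frame gives a set-based alternative. A sketch, again against the hypothetical dbo.disposals:

-- Sum the excess_weight of strictly earlier rows in the same disposal_id,
-- then subtract it from total_weight through an updatable CTE.
WITH running AS (
    SELECT total_weight,
           SUM(excess_weight) OVER (PARTITION BY disposal_id
                                    ORDER BY row_id
                                    ROWS BETWEEN UNBOUNDED PRECEDING
                                             AND 1 PRECEDING) AS prior_excess
    FROM dbo.disposals
)
UPDATE running
SET total_weight = total_weight - prior_excess
WHERE prior_excess > 0;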
I changed the context a bit, but it's basically the same issue.
Imagine we are in a never-ending tunnel, shaped like a circle. We split the circle into sections numbered 1 to 10, and we'll call each section a slot (sl). There are 2 groups (gr) of living things walking in the tunnel. Each group has 2 bands, and each band has a name and global hitpoints (hp). Every group walks forward (although the bands might change order); if a group is at slot #10 and moves forward, it will be at slot #1. We snapshot their information every day. All the data gathered is stored in a table with this structure:
+--------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+
| day_id | gr_1_sl_1_id | gr_1_sl_1_name | gr_1_sl_1_hp | gr_1_sl_2_id | gr_1_sl_2_name | gr_1_sl_2_hp | gr_2_sl_1_id | gr_2_sl_1_name | gr_2_sl_1_hp | gr_2_sl_2_id | gr_2_sl_2_name | gr_2_sl_2_hp |
+--------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+
| 1      | 3            | orc            | 100          | 4            | goblin         | 10           | 10           | human          | 50           | 1            | dwarf          | 25           |
| 2      | 6            | goblin         | 7            | 7            | orc            | 76           | 2            | human          | 60           | 3            | dwarf          | 28           |
+--------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+
As you can see, the column names encode the group and slot positions sequentially, while the data holds the actual values. What I want is to have the information shaped this way instead:
+---------+-------+-------+-----------+---------+
| id_game | gr_id | sl_id | band_name | band_hp |
+---------+-------+-------+-----------+---------+
| 1 | 1 | 3 | orc | 100 |
| 1 | 1 | 4 | goblin | 10 |
| 1 | 2 | 10 | human | 50 |
| 1 | 2 | 1 | dwarf | 25 |
| 2 | 1 | 6 | goblin | 7 |
| 2 | 1 | 7 | orc | 76 |
| 2 | 2 | 2 | human | 60 |
| 2 | 2 | 3 | dwarf | 28 |
+---------+-------+-------+-----------+---------+
I have this information in Power BI, although I can create views in SQL Server if need be. I have tried many things; the closest I got was unpivoting and parsing the original columns to get day_id, gr_id, sl_id, attributes and values. The attributes and values are basically name and hp with their corresponding values (I changed hp into a string), but then I'm stuck; I'm not sure what to do next.
Does anyone have any ideas? Keep in mind that I oversimplified the problem; there are more groups, more slots, more bands and more statistics (e.g. attack and defense ratings).
You seem to want to unpivot the table. In SQL Server, I recommend using apply:
select t.day_id as id_game, v.*
from t cross apply
     (values (1, gr_1_sl_1_id, gr_1_sl_1_name, gr_1_sl_1_hp),
             (1, gr_1_sl_2_id, gr_1_sl_2_name, gr_1_sl_2_hp),
             (2, gr_2_sl_1_id, gr_2_sl_1_name, gr_2_sl_1_hp),
             (2, gr_2_sl_2_id, gr_2_sl_2_name, gr_2_sl_2_hp)
     ) v(gr_id, sl_id, band_name, band_hp);
In other databases, you can do something similar with union all.
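For example, a minimal union all version of the same unpivot (a sketch against the same table t; it should work on most databases):

select day_id as id_game, 1 as gr_id, gr_1_sl_1_id as sl_id,
       gr_1_sl_1_name as band_name, gr_1_sl_1_hp as band_hp
from t
union all
select day_id, 1, gr_1_sl_2_id, gr_1_sl_2_name, gr_1_sl_2_hp from t
union all
select day_id, 2, gr_2_sl_1_id, gr_2_sl_1_name, gr_2_sl_1_hp from t
union all
select day_id, 2, gr_2_sl_2_id, gr_2_sl_2_name, gr_2_sl_2_hp from t;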
I want to store some number sequences in my database. So I have a table like this:
+-----+---------+-----+
| idx | seq_id | x |
+-----+---------+-----+
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
| 4 | 1 | 3 |
| 5 | 1 | 5 |
| 6 | 1 | 7 |
| 1 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 4 |
| 4 | 2 | 8 |
| 5 | 2 | 16 |
| ... |
+-----+---------+-----+
but when I look at it, it feels like I'm storing more overhead in idx and seq_id than meaningful information.
In some sense I am, but I wouldn't find it strange if the database engine optimized away most of the repetition here. Is this the case for SQLite, MySQL, Postgres...?
And what can I do, perhaps in terms of the table definition, to help the database optimize this storage pattern?
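For what it's worth, a sketch of one common layout: most engines do not compress repeated values in a plain rowstore, but a composite primary key at least avoids storing a separate index on top of the data. Table and column names follow the sample; the WITHOUT ROWID clause is SQLite-specific.

-- The primary key B-tree is the table itself, clustered on (seq_id, idx),
-- so there is no extra rowid column or secondary index to store.
CREATE TABLE sequence_values (
    seq_id INTEGER NOT NULL,
    idx    INTEGER NOT NULL,
    x      INTEGER NOT NULL,
    PRIMARY KEY (seq_id, idx)
) WITHOUT ROWID;  -- SQLite-specific; drop this clause for MySQL/Postgres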
I want to perform pagination for my web page. The method that I am using (and mostly found on the internet) is explained below with an example.
Suppose I have the following table user
+----+------+----------+
| id | name | category |
+----+------+----------+
| 1 | a | 1 |
| 2 | b | 2 |
| 3 | c | 2 |
| 4 | d | 3 |
| 5 | e | 1 |
| 6 | f | 3 |
| 7 | g | 1 |
| 8 | h | 3 |
| 9 | i | 2 |
| 10 | j | 2 |
| 11 | k | 1 |
| 12 | l | 3 |
| 13 | m | 3 |
| 14 | n | 3 |
| 15 | o | 1 |
| 16 | p | 1 |
| 17 | q | 2 |
| 18 | r | 1 |
| 19 | s | 3 |
| 20 | t | 3 |
| 21 | u | 3 |
| 22 | v | 3 |
| 23 | w | 1 |
| 24 | x | 1 |
| 25 | y | 2 |
| 26 | z | 2 |
+----+------+----------+
And I want to show information about category 3 users, two users per page. I am using the following query for this:
select * from user where category=3 limit 0,2;
+----+------+----------+
| id | name | category |
+----+------+----------+
| 4 | d | 3 |
| 6 | f | 3 |
+----+------+----------+
and for the next two:
select * from user where category=3 limit 2,2;
+----+------+----------+
| id | name | category |
+----+------+----------+
| 8 | h | 3 |
| 12 | l | 3 |
+----+------+----------+
and so on.
Now, in practice I have around 7000 tuples in a single table. Is there any better way to achieve this in terms of speed, or any fallback this method may have?
Thanks.
You don't want to fetch more rows than your current page can handle, so yes, you will essentially be making one query per page. Other solutions (such as Rails' will_paginate) execute essentially the same queries.
Now, you could build some logic into your client side to do the pagination there: prefetch multiple (or all) pages at once and store them on the client. That way pagination is handled completely on the client side without the need for further queries. It is a bit wasteful if a user is likely to look at only a small percentage of the pages, though.
If your actual production table has more columns in it, you could select only the relevant columns instead of *. You should also add an order by: without one, the database is free to return rows in any order, so pages may overlap or skip rows between requests.
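For example, a minimal version of the second page with a deterministic order (the column list and ordering are assumptions based on the sample table):

-- Order by the primary key so limit/offset always sees rows in the
-- same order; without this, pages can overlap or skip rows.
select id, name, category
from user
where category = 3
order by id
limit 2 offset 2;  -- page 2: rows 3 and 4 of the ordered result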
I hope this helps. Put your page number in place of your_page_number, and the records per page in place of records_per_page (which in your sample is 2):
select A.*
from (select @row := @row + 1 as row_number, User.*
      from User
      join (select @row := 0) Row_Temp_View
      where category = 3
     ) A
where A.row_number
      between (your_page_number * records_per_page) - records_per_page + 1
          and your_page_number * records_per_page;
Notice that this will fetch you the right records, where your sample will not, because your sample always fetches two records from a fixed offset. Say you have 3 users to show across two pages: your sample could show the first and second users on page one and then the second and third on page two, which is not right. This code shows the first and second on page one, and only the third on page two.
You can use DataTables; it's meant for exactly what you are looking for. I successfully use it to paginate more than a million rows, and it's very fast and easy to implement.
I'm trying to set up a report based on several tables.
I have a table Actual that looks like this:
+--------+------+
| status | date |
+--------+------+
| 5 | 7/10 |
| 8 | 7/9 |
| 8 | 7/11 |
| 5 | 7/18 |
+--------+------+
Table Targets looks like this:
+--------+-------------+--------+------------+
| status | weekEndDate | target | cumulative |
+--------+-------------+--------+------------+
| 5 | 7/12 | 4 | 45 |
| 5 | 7/19 | 5 | 50 |
| 8 | 7/12 | 4 | 45 |
| 8 | 7/19 | 5 | 50 |
+--------+-------------+--------+------------+
Grouping the Actual records by which Targets.weekEndDate they fall under, I have the following aggregate query GroupActual:
+-------------+------------+--------------+--------+------------+
| weekEndDate | status | weeklyTarget | actual | cumulative |
+-------------+------------+--------------+--------+------------+
| 7/12 | 5 | 4 | 1 | 45 |
| 7/12 | 8 | 4 | 2 | 41 |
| 7/19 | 5 | 5 | 1 | 50 |
| 7/19 | 8 | 4 | | 45 |
+-------------+------------+--------------+--------+------------+
I'm trying to create this report:
+--------+------------+------+------+
| status | category | 7/12 | 7/19 | ...etc for every weekEndDate entry in Targets
+--------+------------+------+------+
| 5 | actual | 1 | 1 |
| 5 | target | 4 | 5 |
| 5 | cumulative | 45 | 50 |
+--------+------------+------+------+
| 8 | actual | 2 | |
| 8 | target | 4 | 5 |
| 8 | cumulative | 45 | 50 |
+--------+------------+------+------+
I can use a crosstab query to make the date columns, but I'm not sure how to have rows for "actual", "target", and "cumulative". They aren't values in the same table, which means (I think) that a crosstab query won't be useful for this breakdown. Should I try to change GroupActual so that it puts the data in the shape I'm looking for? Kind of confused as to where to go next with this...
EDIT: I've made some headway on the crosstabs as per PowerUser's solution, but I'm having trouble with the one for Target. I modified the wizard's generated SQL in an attempt to get what I want, but it's not working out. I used a version of GroupActual that has only the weekEndDate, status, and weeklyTarget columns; here's the SQL:
TRANSFORM weeklyTarget
SELECT status
FROM TargetStatus_forCrosstab_Target
GROUP BY status,weeklyTarget
PIVOT Format([weekEndDate],"Short Date");
You're almost there. The problem is that you can't do this all in a single crosstab. You need to make 3 crosstabs (one for 'actual', one for 'target', and one for 'cumulative'), then make a Union query to combine them all.
Additional Tip: In your individual crosstabs, add a Sort column. Your 'actual' crosstab will have a Sort value of 1, 'Target' will have a Sort value of 2, and 'Cumulative' will have 3. That way, when you union them together, you can get them all in the right order.
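For illustration, a sketch of that union in Access SQL, assuming the three crosstabs are saved as xtabActual, xtabTarget and xtabCumulative (hypothetical names) and that the pivoted date columns come out as [7/12] and [7/19]:

SELECT status, "actual" AS category, 1 AS SortKey, [7/12], [7/19]
FROM xtabActual
UNION ALL
SELECT status, "target", 2, [7/12], [7/19]
FROM xtabTarget
UNION ALL
SELECT status, "cumulative", 3, [7/12], [7/19]
FROM xtabCumulative
ORDER BY status, SortKey;

You can hide the SortKey column in the final report; it exists only to keep the category rows in order.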
OK, so I have a bit of a complex query I am trying to come up with in my Rails application. I have four tables: Clients, Projects, Invoices, Invoice_Line_Items. I am trying to get certain bits of data from all of those tables and display them in a "reports" type view in my application. This is what the structures look like for the four tables:
Clients
| id | name | archive |
----------------------------------------
| 1 | Client 1 | 0 |
| 2 | Client 2 | 0 |
Projects
| id | client_id | name | archive |
------------------------------------------------------
| 1 | 1 | Project 1 | 0 |
| 2 | 1 | Project 2 | 1 |
| 3 | 2 | Project 3 | 0 |
| 4 | 2 | Project 4 | 1 |
Invoices
| id | client_id | project_id | name | archive |
----------------------------------------------------------------------
| 1 | 1 | 1 | Invoice 1 | 0 |
| 2 | 1 | 1 | Invoice 2 | 0 |
| 3 | 1 | 2 | Invoice 3 | 1 |
| 4 | 1 | 2 | Invoice 4 | 1 |
| 5 | 2 | 3 | Invoice 5 | 0 |
| 6 | 2 | 3 | Invoice 6 | 0 |
| 7 | 2 | 4 | Invoice 7 | 1 |
| 8 | 2 | 4 | Invoice 8 | 1 |
Invoice_Line_Items
| id | invoice_id | name | amount_due |
---------------------------------------------------------
| 1 | 1 | Item 1 | 500 |
| 2 | 1 | Item 2 | 500 |
| 3 | 2 | Item 3 | 500 |
| 4 | 2 | Item 4 | 500 |
| 5 | 3 | Item 5 | 500 |
| 6 | 3 | Item 6 | 500 |
| 7 | 4 | Item 7 | 500 |
| 8 | 4 | Item 8 | 500 |
| 9 | 5 | Item 9 | 500 |
| 10 | 5 | Item 10 | 500 |
| 11 | 6 | Item 11 | 500 |
| 12 | 6 | Item 12 | 500 |
| 13 | 7 | Item 13 | 500 |
| 14 | 7 | Item 14 | 500 |
| 15 | 8 | Item 15 | 500 |
| 16 | 8 | Item 16 | 500 |
OK, I hope those diagrams make sense. What I am looking for as a result set is this (example data taken from the tables above):
| clients.name | current_projects | archived_projects | total_amount_due | total_amount_paid |
-----------------------------------------------------------------------------------------------------------
| Client 1 | 1 | 1 | 2000 | 2000 |
| Client 2 | 1 | 1 | 2000 | 2000 |
Ok, so here's what's going on there:
Getting all non-archived clients
Getting a count of all non-archived projects
Getting a count of all archived projects
Getting a total_amount_due from the invoice_line_items table that is a sum of all of the non-archived invoices
Getting a total_amount_paid from the invoice_line_items table that is a sum of all of the archived invoices
I am relatively new to Rails and this is a fairly complex query (at least in my head). Please let me know if there is a simpler solution that I am overlooking, or if I am just overcomplicating it. If I need to do multiple queries in my controller, that's fine; I just wanted to see if I could get away with one SQL call. I'm pretty sure I can do this fairly easily with some subqueries, but I'm not sure how to write those in the controller in Rails.
Thanks for any help or direction you can provide. If this question is just outrageous, let me know and I'll delete it and go search the Googles some more (I have tried already, to no avail).
OK, well I ended up figuring out a solution myself. Not quite sure it's the best solution; it feels heavy and messy, but I created quite a few objects in the controller to run the SQL statements I needed to pull the data from the database. I basically have one object for each column (column, not each row). Let me know if anyone can figure out a better solution.
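For later readers, a hedged sketch of a single set-based query using conditional aggregation; table and column names are guesses based on the diagrams above (Rails-style lowercase), and in Rails it could be run with find_by_sql:

SELECT c.name,
       -- count each project once, split by its archive flag
       COUNT(DISTINCT CASE WHEN p.archive = 0 THEN p.id END) AS current_projects,
       COUNT(DISTINCT CASE WHEN p.archive = 1 THEN p.id END) AS archived_projects,
       -- each line item appears on exactly one row of the join,
       -- so the sums split cleanly by the invoice's archive flag
       SUM(CASE WHEN i.archive = 0 THEN li.amount_due ELSE 0 END) AS total_amount_due,
       SUM(CASE WHEN i.archive = 1 THEN li.amount_due ELSE 0 END) AS total_amount_paid
FROM clients c
LEFT JOIN projects p            ON p.client_id  = c.id
LEFT JOIN invoices i            ON i.project_id = p.id
LEFT JOIN invoice_line_items li ON li.invoice_id = i.id
WHERE c.archive = 0
GROUP BY c.name;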