How to integrate over segments using SQL - sql

I have a table with columns t_b; t_e; x were [t_b, t_e) denotes a period during which x resources where used. I want to compute a table were for each hour h I have amount of resources that where used during [h, h+1) period.
So far my only idea was to generate multiple rows from each input row for each hour (I use an extension of SQL with UDFs) and then simply group by by hour, but I'm afraid this may be too slow considering large amount of data at hand.
Say for example I have a table with two rows:
+-----+-----+---+
| t_b | t_e | x |
+-----+-----+---+
| 1 | 3.5 | a |
| 0.5 | 4 | b |
+-----+-----+---+
Then resulting table should be:
+---+-------------+
| h | x |
+---+-------------+
| 0 | 0*a + 0.5*b |
| 1 | 1*a + 1*b |
| 2 | 1*a + 1*b |
| 3 | 0.5*a + 1*b |
+---+-------------+

You can have a trigger on insert into the stats table that also adds to the aggregate table (the per-hour sums).
If you also need to convert the existing data, you need to run over every row of your current table, split it into amounts/hours and add to the aggregate table.

This is an sql-server example for all number columns
with h as (
-- your hours tally here
select top(24) row_number() over(order by (select null)) eoh from sys.all_objects
), myTable as (
select 1 t_b, 3.5 t_e, 20 v union all
select 0.5, 4, 40
)
select eoh-1 h_starth
, sum(v * (case when t_e < eoh then t_e else eoh end - case when t_b > eoh-1 then t_b else eoh-1 end)) usage
from h
left join myTable t on t_e > eoh - 1 and eoh > t_b -- [..) intresection with [..)
group by eoh;
Fiddle

Related

How to pivot column data into a row where a maximum qty total cannot be exceeded?

Introduction:
I have come across an unexpected challenge. I'm hoping someone can help and I am interested in the best method to go about manipulating the data in accordance to this problem.
Scenario:
I need to combine column data associated to two different ID columns. Each row that I have associates an item_id and the quantity for this item_id. Please see below for an example.
+-------+-------+-------+---+
|cust_id|pack_id|item_id|qty|
+-------+-------+-------+---+
| 1 | A | 1 | 1 |
| 1 | A | 2 | 1 |
| 1 | A | 3 | 4 |
| 1 | A | 4 | 0 |
| 1 | A | 5 | 0 |
+-------+-------+-------+---+
I need to manipulate the data shown above so that 24 rows (for 24 item_ids) is combined into a single row. In the example above I have chosen 5 items to make things easier. The selection format I wish to get, assuming 5 item_ids, can be seen below.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 4 | 0 | 0 |
+---------+---------+---+---+---+---+---+
However, here's the condition that is making this troublesome. The maximum total quantity for each row must not exceed 5. If the total quantity exceeds 5 a new row associated to the cust_id and pack_id must be created for the rest of the item_id quantities. Please see below for the desired output.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 3 | 0 | 0 |
| 1 | A | 0 | 0 | 1 | 0 | 0 |
+---------+---------+---+---+---+---+---+
Notice how the quantities of item_ids 1, 2 and 3 summed together equal 6. This exceeds the maximum total quantity of 5 for each row. For the second row the difference is created. In this case only item_id 3 has a single quantity remaining.
Note, if a 2nd row needs to be created that total quantity displayed in that row also cannot exceed 5. There is a known item_id limit of 24. But, there is no known limit of the quantity associated for each item_id.
Here's an approach which goes from left-field a bit.
One approach would have been to do a recursive CTE, building the rows one-by-one.
Instead, I've taken an approach where I
Create a new (virtual) table with 1 row per item (so if there are 6 items, there will be 6 rows)
Group those items into groups of 5 (I've called these rn_batches)
Pivot those (based on counts per item per rn_batch)
For these, processing is relatively simple
Creating one row per item is done using INNER JOIN to a numbers table with n <= the relevant quantity.
The grouping then just assigns rn_batch = 1 for the first 5 items, rn_batch = 2 for the next 5 items, etc - until there are no more items left for that order (based on cust_id/pack_id).
Here is the code
/* Data setup */
CREATE TABLE #Order (cust_id int, pack_id varchar(1), item_id int, qty int, PRIMARY KEY (cust_id, pack_id, item_id))
INSERT INTO #Order (cust_id, pack_id, item_id, qty) VALUES
(1, 'A', 1, 1),
(1, 'A', 2, 1),
(1, 'A', 3, 4),
(1, 'A', 4, 0),
(1, 'A', 5, 0);
/* Pivot results */
WITH Nums(n) AS
(SELECT (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
),
ItemBatches AS
(SELECT cust_id, pack_id, item_id,
FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id ORDER BY item_id, N.n)-1) / 5) + 1 AS rn_batch
FROM #Order O
INNER JOIN Nums N ON N.n <= O.qty
)
SELECT *
FROM (SELECT cust_id, pack_id, rn_batch, 'Item_' + LTRIM(STR(item_id)) AS item_desc
FROM ItemBatches
) src
PIVOT
(COUNT(item_desc) FOR item_desc IN ([Item_1], [Item_2], [Item_3], [Item_4], [Item_5])) pvt
ORDER BY cust_id, pack_id, rn_batch;
And here are results
cust_id pack_id rn_batch Item_1 Item_2 Item_3 Item_4 Item_5
1 A 1 1 1 3 0 0
1 A 2 0 0 1 0 0
Here's a db<>fiddle with
additional data in the #Orders table
the answer above, and also the processing with each step separated.
Notes
This approach (with the virtual numbers table) assumes a maximum of 1,000 for a given item in an order. If you need more, you can easily extend that numbers table by adding additional CROSS JOINs.
While I am in awe of the coders who made SQL Server and how it determines execution plans in millisends, for larger datasets I give SQL Server 0 chance to accurately predict how many rows will be in each step. As such, for performance, it may work better to split the code up into parts (including temp tables) similar to the db<>fiddle example.

Generating Rows Based on Column Value

One of my tables in my database contains rows with requisition numbers and other related info. I am trying to create a second table (populated with an INSERT INTO statement) that duplicates these rows and adds a series value based on the value in the QuantityOrdered column.
For example, the first table is shown below:
+-------------+----------+
| Requisition | Quantity |
+-------------+----------+
| 10001_01_AD | 4 |
+-------------+----------+
and I would like the output to be as follows:
+-------------+----------+----------+
| Requisition | Quantity | Series |
+-------------+----------+----------+
| 10001_01_AD | 4 | 1 |
| 10001_01_AD | 4 | 2 |
| 10001_01_AD | 4 | 3 |
| 10001_01_AD | 4 | 4 |
+-------------+----------+----------+
I've been attempting to use Row_Number() to sequence the values but it's numbering rows based on instances of Requisition values, not based on the Quantity value.
Non-recursive way:
SELECT *
FROM tab t
CROSS APPLY (SELECT n
FROM (SELECT ROW_NUMBER() OVER(ORDER BY 1/0) AS n
FROM master..spt_values s1) AS sub
WHERE sub.n <= t.Quantity) AS s2(Series);
db<>fiddle demo
You need recursive way :
with t as (
select Requisition, 1 as start, Quantity
from table
union all
select Requisition, start + 1, Quantity
from t
where start < Quantity
)
select Requisition, Quantity, start as Series
from t;
However, by default it has limited to only 100 Quantities, if you have a more then you need to specify the query hint by using option (maxrecursion 0).
A simple method uses recursive CTEs:
with cte as (
select requsition, quantity, 1 as series
from t
union all
select requsition, quantity, 1 + series
from t
where lev < quantity
)
select requsition, quantity, series
from cte;
With default setting, this works up to a quantity of 100. For larger quantities, you can add option (maxrecursion 0) to the query.

SQL union / join / intersect multiple select statements

I have two select statements. One gets a list (if any) of logged voltage data in the past 60 seconds and related chamber names, and one gets a list (if any) of logged arc event data in the past 5 minutes. I am trying to append the arc count data as new columns to the voltage data table. I cannot figure out how to do this.
Note that, there may or may not be arc count rows, for a given chamber name that is in the voltage data table. If there are no rows, I want to set the arc count column value to zero.
Any ideas on how to accomplish this?
Voltage Data:
SELECT DISTINCT dbo.CoatingChambers.Name,
AVG(dbo.CoatingGridVoltage_Data.ChanA_DCVolts) AS ChanADC,
AVG(dbo.CoatingGridVoltage_Data.ChanB_DCVolts) AS ChanBDC,
AVG(dbo.CoatingGridVoltage_Data.ChanA_RFVolts) AS ChanARF,
AVG(dbo.CoatingGridVoltage_Data.ChanB_RFVolts) AS ChanBRF FROM
dbo.CoatingGridVoltage_Data LEFT OUTER JOIN dbo.CoatingChambers ON
dbo.CoatingGridVoltage_Data.CoatingChambersID =
dbo.CoatingChambers.CoatingChambersID WHERE
(dbo.CoatingGridVoltage_Data.DT > DATEADD(second, - 60,
SYSUTCDATETIME())) GROUP BY dbo.CoatingChambers.Name
Returns
Name | ChanADC | ChanBDC | ChanARF | ChanBRF
-----+-------------------+--------------------+---------------------+------------------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848
Arc count table:
SELECT CoatingChambers.Name,
SUM(ArcCount) as ArcCount
FROM CoatingChambers
LEFT JOIN CoatingArc_Data
ON dbo.[CoatingArc_Data].CoatingChambersID = dbo.CoatingChambers.CoatingChambersID
where EventDT > DATEADD(mi,-5, GETDATE())
Group by Name
Returns
Name | ArcCount
-----+---------
L1 | 283
L4 | 0
L6 | 1
S2 | 55
To be clear, I want this table (with added arc count column), given the two tables above:
Name | ChanADC | ChanBDC | ChanARF | ChanBRF | ArcCount
-----+-------------------+--------------------+---------------------+-------------------+---------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618 | 0
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112 | 55
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848 | 0
You can treat the select statements as virtual tables and just join them together:
select
x.Name,
x.ChanADC,
x.ChanBDC,
x.ChanARF,
x.ChanBRF,
isnull( y.ArcCount, 0 ) ArcCount
from
(
select distinct
cc.Name,
AVG(cgv.ChanA_DCVolts) AS ChanADC,
AVG(cgv.ChanB_DCVolts) AS ChanBDC,
AVG(cgv.ChanA_RFVolts) AS ChanARF,
AVG(cgv.ChanB_RFVolts) AS ChanBRF
from
dbo.CoatingGridVoltage_Data cgv
left outer join
dbo.CoatingChambers cc
on
cgv.CoatingChambersID = cc.CoatingChambersID
where
cgv.DT > dateadd(second, - 60, sysutcdatetime())
group by
cc.Name
) as x
left outer join
(
select
cc.Name,
sum(ac.ArcCount) as ArcCount
from
dbo.CoatingChambers cc
left outer join
dbo.CoatingArc_Data ac
on
ac.CoatingChambersID = cc.CoatingChambersID
where
EventDT > dateadd(mi,-5, getdate())
group by
Name
) as y
on
x.Name = y.Name
Also, it's worthwhile to simplify your names with aliases and format the queries for readability...which I shamelessly took a stab at.

Reference the output of a calculated column in Hive SQL

I have a self-referencing/recursive calculation in Excel that needs to be moved to Hive SQL. Basically the column needs to SUM the two values only if the total of the concrete column plus the result from the previous calculation is greater than 0.
The data is as follows, A is the value and B is the expected output:
| A | B |
|-----|-----|
| -1 | 0 |
| 2 | 2 |
| -2 | 0 |
| 2 | 2 |
| 2 | 4 |
| -1 | 3 |
| 2 | 5 |
In Excel it would be written in column B as:
=MAX(0,B1+A2)
The problem in SQL is you need to have the output of the current calculation. I think I've got it sorted in SQL as the following:
DECLARE #Numbers TABLE(A INT, Rn INT)
INSERT INTO #Numbers VALUES (-1,1),(2,2),(-2,3),(2,4),(2,5),(-1,6),(2,7);
WITH lagged AS
(
SELECT A, 0 AS B, Rn
FROM #Numbers
WHERE Rn = 1
UNION ALL
SELECT i.A,
CASE WHEN ((i.A + l.B) >= 0) THEN (i.A + l.B)
ELSE l.B
END,
i.Rn
FROM #Numbers i INNER JOIN lagged l
ON i.Rn = l.Rn + 1
)
SELECT *
FROM lagged;
But this being Hive, it doesn't support CTEs so I need to dumb the SQL down a touch. Is that possible using LAG/LEAD? My brain is hurting having got this far!
I initially thought that it would help to first compute the Sum of all elements until each rank and then fix the values somehow using negative elements.
However, one big negative that would zero the B column will carry forward in the sum and will make all following elements negative.
It's as Gordon commented - 0 is max in the calculation =MAX(0,B1+A2) depends on the previous location where it happened and it seems to be impossible to compute them in advance analytically.

how to get daily profit from sql table

I'm stucking for a solution at the problem of finding daily profits from db (ms access) table. The difference wrt other tips I found online is that I don't have in the table a field "Price" and one "Cost", but a field "Type" which distinguish if it is a revenue "S" or a cost "C"
this is the table "Record"
| Date | Price | Quantity | Type |
-----------------------------------
|01/02 | 20 | 2 | C |
|01/02 | 10 | 1 | S |
|01/02 | 3 | 10 | S |
|01/02 | 5 | 2 | C |
|03/04 | 12 | 3 | C |
|03/03 | 200 | 1 | S |
|03/03 | 120 | 2 | C |
So far I tried different solutions like:
SELECT
(SELECT SUM (RS.Price* RS.Quantity)
FROM Record RS WHERE RS.Type='S' GROUP BY RS.Data
) as totalSales,
(SELECT SUM (RC.Price*RC.Quantity)
FROM Record RC WHERE RC.Type='C' GROUP BY RC.Date
) as totalLosses,
ROUND(totalSales-totaleLosses,2) as NetTotal,
R.Date
FROM RECORD R";
in my mind it could work but obviously it doesn't
and
SELECT RC.Data, ROUND(SUM (RC.Price*RC.QuantitY),2) as DailyLoss
INTO #DailyLosses
FROM Record RC
WHERE RC.Type='C' GROUP BY RC.Date
SELECT RS.Date, ROUND(SUM (RS.Price*RS.Quantity),2) as DailyRevenue
INTO #DailyRevenues
FROM Record RS
WHERE RS.Type='S'GROUP BY RS.Date
SELECT Date, DailyRevenue - DailyLoss as DailyProfit
FROM #DailyLosses dlos, #DailyRevenues drev
WHERE dlos.Date = drev.Date";
My problem beyond the correct syntax is the approach to this kind of problem
You can use grouping and conditional summing. Try this:
SELECT data.Date, data.Income - data.Cost as Profit
FROM (
SELECT Record.Date as Date,
SUM(IIF(Record.Type = 'S', Record.Price * Record.Quantity, 0)) as Income,
SUM(IIF(Record.Type = 'C', Record.Price * Record.Quantity, 0)) as Cost,
FROM Record
GROUP BY Record.Date
) data
In this case you first create a sub-query to get separate fields for Income and Cost, and then your outer query uses subtraction to get actual profit.