SQL Query to continuously bucket data - sql

I have a table as follows:
Datetime | ID | Price | Quantity
2013-01-01 13:30:00 | 1 | 139 | 25
2013-01-01 13:30:15 | 2 | 140 | 25
2013-01-01 13:30:30 | 3 | 141 | 15
Supposing that I wish to end up with a table like this, which buckets the data into quantities of 50 as follows:
Bucket_ID | Max | Min | Avg
1 | 140 | 139 | 139.5
2 | 141 | 141 | 141
Is there a simple query to do this? Data will constantly be added to the first table, so it would be nice if the query did not recalculate the completed buckets of 50 and instead automatically started averaging the next incomplete bucket. Ideas appreciated! Thanks

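You may try this solution. It should work even if a single "number" value (the quantity) is bigger than 50, but it relies on avg(number) being less than 50: the candidate buckets are generated from the table's own rowids, so there can only be as many buckets as there are rows.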
select
    bucket_id,
    max(price),
    min(price),
    avg(price)
from
(
    select
        price,
        bucket_id,
        (select sum(t2.number) from test t2 where t2.id <= t1.id) as accumulated
    from test t1
    join
        (select
            rowid as bucket_id,
            50 * rowid as bucket
        from test) buckets on (buckets.bucket - 50) < accumulated
            and buckets.bucket > (accumulated - number))
group by
    bucket_id;
You can have a look at this fiddle http://sqlfiddle.com/#!7/4c63c/1 if it is what you want.
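To make the bucketing rule in that query easier to follow, here is a minimal Python sketch of the same logic. It is an illustration rather than the answerer's code: the function name bucket_stats is made up, and it assumes positive integer quantities and rows supplied in ID order. As in the SQL, a row whose quantity straddles a bucket boundary is counted in every bucket it overlaps, and the average is unweighted.

```python
from collections import defaultdict


def bucket_stats(rows, size=50):
    """Group (price, quantity) rows into buckets of `size` units of quantity.

    Returns {bucket_id: (max_price, min_price, avg_price)}.
    Assumes quantities are positive integers and rows arrive in ID order.
    """
    prices = defaultdict(list)
    accumulated = 0
    for price, quantity in rows:
        start = accumulated          # quantity consumed before this row
        accumulated += quantity      # running total including this row
        first = start // size + 1                # first bucket the row touches
        last = (accumulated - 1) // size + 1     # last bucket the row touches
        for bucket_id in range(first, last + 1):
            prices[bucket_id].append(price)
    return {
        bucket_id: (max(p), min(p), sum(p) / len(p))
        for bucket_id, p in sorted(prices.items())
    }


sample = [(139, 25), (140, 25), (141, 15)]
print(bucket_stats(sample))
```

Since later rows can only affect the last (incomplete) bucket, a caller could in principle store the finished buckets and re-run this only from the first row of the open bucket; that caching step is left out of the sketch.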

Related

SQL update statement to sum column in one table, then add the total to a different column/table

Evening all, hoping for some pointers with an SQL Server query if possible.
I have two tables in a database, example as follows:
PostedTran
PostedTranID AccountID PeriodID Value TransactionDate
1 100 120 100 2019-01-01
2 100 120 200 2020-01-01
3 100 130 300 2021-01-01
4 101 120 400 2020-01-01
5 101 130 500 2021-01-01
PeriodValue
PeriodValueID AccountID PeriodID ActualValue
10 100 120 500
11 101 120 600
I have a mismatch in the two tables, and I'm failing miserably in my attempts. From the PostedTran table, I'm trying to select all transaction lines dated before 2021-01-01, then sum the Value for each AccountID from the results. I then need to add that value to the existing ActualValue in the PeriodValue table.
So, in the above example, the ActualValue on PeriodValueID 10 will update to 800, and 11 to 1000. The PeriodID in this example is constant and will always be 120.
Thanks in advance for any help.
Since the RDBMS is not mentioned, pseudo-SQL looks like:
with DataSum as
(
    select AccountID, PeriodID, sum(Value) as TotalValue
    from PostedTran
    where TransactionDate<'1/1/2021'
    group by AccountID, PeriodID
)
update PeriodValue set ActualValue = ActualValue + ds.TotalValue
from PeriodValue pv inner join DataSum ds
    on pv.accountid=ds.accountid and pv.periodid=ds.periodid
The following should do what you ask. I haven't included PeriodId in the correlation as you did not specify it in your description, however you can just include it if it's required.
update pv set pv.ActualValue=pv.ActualValue + t.Value
from PeriodValue pv
cross apply (
select Sum(value) value
from PostedTran pt
where pt.AccountId=pv.AccountId and pt.TransactionDate <'20210101'
)t
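For anyone who wants to check the arithmetic without a SQL Server instance, the sketch below replays the question's sample data in SQLite through Python's sqlite3 module. This is an assumption-laden stand-in: SQLite has no CROSS APPLY or UPDATE ... FROM in older versions, so a correlated subquery is used instead; the function name apply_period_update is hypothetical; dates are stored as ISO text; and PeriodID is included in the correlation. COALESCE guards against accounts with no matching transactions, where SUM would otherwise yield NULL.

```python
import sqlite3


def apply_period_update():
    """Load the sample rows, run the update, and return the resulting values."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE PostedTran (
            PostedTranID INTEGER, AccountID INTEGER, PeriodID INTEGER,
            Value INTEGER, TransactionDate TEXT);
        INSERT INTO PostedTran VALUES
            (1, 100, 120, 100, '2019-01-01'),
            (2, 100, 120, 200, '2020-01-01'),
            (3, 100, 130, 300, '2021-01-01'),
            (4, 101, 120, 400, '2020-01-01'),
            (5, 101, 130, 500, '2021-01-01');
        CREATE TABLE PeriodValue (
            PeriodValueID INTEGER, AccountID INTEGER, PeriodID INTEGER,
            ActualValue INTEGER);
        INSERT INTO PeriodValue VALUES
            (10, 100, 120, 500),
            (11, 101, 120, 600);
    """)
    # Add the pre-2021 transaction total to each matching PeriodValue row.
    con.execute("""
        UPDATE PeriodValue
        SET ActualValue = ActualValue + COALESCE((
            SELECT SUM(pt.Value)
            FROM PostedTran pt
            WHERE pt.AccountID = PeriodValue.AccountID
              AND pt.PeriodID = PeriodValue.PeriodID
              AND pt.TransactionDate < '2021-01-01'), 0)
    """)
    rows = con.execute(
        "SELECT PeriodValueID, ActualValue FROM PeriodValue ORDER BY PeriodValueID"
    ).fetchall()
    con.close()
    return rows


print(apply_period_update())
```

With the sample data this gives 800 for PeriodValueID 10 and 1000 for 11, matching the values the question expects.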

Dividing a sum value into multiple rows due to field length constraint

I am migrating financial data from a very large table (100 million+ rows) by summarizing the amounts and inserting them into a summary table. I ran into a problem when a summary amount (3 billion) was larger than what the field in the summary table can hold (it can only hold up to 999 million). Changing the field size is not an option, as it requires a change process.
The only option I have is to divide any amount that breaches the size limit into smaller ones so that it can be inserted into the table.
I came across the question "SQL - I need to divide a total value into multiple rows in another table", which is similar except that the number of rows I need to insert is dynamic.
For simplicity, this is what the source table might look like:
account_table
acct_num | amt
-------------------------------
101 125.00
101 550.00
101 650.00
101 375.00
101 475.00
102 15.00
103 325.00
103 875.00
104 200.00
104 275.00
The summary records are as follows
select acct_num, sum(amt)
from account_table
group by acct_num
Account Summary
acct_num | amt
-------------------------------
101 2175.00
102 15.00
103 1200.00
104 475.00
Assuming the maximum value in the destination table is 1000.00, the expected output will be
summary_table
acct_num | amt
-------------------------------
101 1000.00
101 1000.00
101 175.00
102 15.00
103 1000.00
103 200.00
104 475.00
How do I create a query to get the expected result? Thanks in advance.
You need a numbers table. If you have a handful of values, you can define it manually. Otherwise, you might have one on hand or use a similar logic:
with n as (
select (rownum - 1) as n
from account_table
where rownum <= 10
),
a as (
select acct_num, sum(amt) as amt
from account_table
group by acct_num
)
select acct_num,
(case when (n.n + 1) * 1000 < amt then 1000
else amt - n.n * 1000
end) as amt
from a join
n
on n.n * 1000 < amt ;
A variation along these lines might give some ideas (using the 1,000 of your sample data):
WITH summary AS (
SELECT acct_num
,TRUNC(SUM(amt) / 1000) AS times
,MOD(SUM(amt), 1000) AS remainder
FROM account_table
GROUP BY acct_num
), x(acct_num, times, remainder) AS (
SELECT acct_num, times, remainder
FROM summary
UNION ALL
SELECT s.acct_num, x.times - 1, s.remainder
FROM summary s
,x
WHERE s.acct_num = x.acct_num
AND x.times > 0
)
SELECT acct_num
,CASE WHEN times = 0 THEN remainder ELSE 1000 END AS amt
FROM x
ORDER BY acct_num, amt DESC
The idea is to first build a summary table with div and modulo:
ACCT_NUM TIMES REMAINDER
101 2 175
102 0 15
103 1 200
104 0 475
Then run a recursive query over the summary table that emits one row for each of the "times" full amounts, plus an extra row for the remainder.
ACCT_NUM AMT
101 1000
101 1000
101 175
102 15
103 1000
103 200
104 475
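The div-and-modulo idea behind both answers can be sketched outside SQL as well. The Python below is only an illustration under stated assumptions: the function name split_summary is invented, amounts are non-negative, and Decimal is used so that cents are not distorted by floating point. Unlike the recursive query above, it skips the remainder row when a total is an exact multiple of the cap.

```python
from collections import defaultdict
from decimal import Decimal


def split_summary(rows, cap):
    """Sum amounts per account, then split each total into pieces of at most `cap`."""
    totals = defaultdict(Decimal)          # Decimal() starts each total at 0
    for acct_num, amt in rows:
        totals[acct_num] += amt

    result = []
    for acct_num in sorted(totals):
        full, remainder = divmod(totals[acct_num], cap)
        result.extend((acct_num, cap) for _ in range(int(full)))
        if remainder:                      # no zero-amount row for exact multiples
            result.append((acct_num, remainder))
    return result


account_table = [
    (101, Decimal("125.00")), (101, Decimal("550.00")), (101, Decimal("650.00")),
    (101, Decimal("375.00")), (101, Decimal("475.00")), (102, Decimal("15.00")),
    (103, Decimal("325.00")), (103, Decimal("875.00")), (104, Decimal("200.00")),
    (104, Decimal("275.00")),
]
for acct_num, amt in split_summary(account_table, Decimal("1000.00")):
    print(acct_num, amt)
```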

add column based on a column value in one row

I've this table with the following data
user Date Dist Start
1 2014-09-03 150 12500
1 2014-09-04 220 null
1 2014-09-05 100 null
2 2014-09-03 290 18000
2 2014-09-04 90 null
2 2014-09-05 170 null
Based on the value in the Start column, I need to add another column that repeats the non-null Start value on every row for the same user.
The resultant table should be as below
user Date Dist Start StartR
1 2014-09-03 150 12500 12500
1 2014-09-04 220 null 12500
1 2014-09-05 100 null 12500
2 2014-09-03 290 18000 18000
2 2014-09-04 90 null 18000
2 2014-09-05 170 null 18000
Can someone please help me out with this query? I don't have any idea how to do it.
For the data you have, you can use a window function:
select t.*, min(t.start) over (partition by user) as StartR
from table t
You can readily update using the same idea:
with toupdate as (
select t.*, min(t.start) over (partition by user) as new_StartR
from table t
)
update toupdate
set StartR = new_StartR;
Note: this works for the data in the question and how you have phrased the question. It would not work if there were multiple Start values for a given user, or if there were NULL values that you wanted to keep before the first non-NULL Start value.
You can use COALESCE/ISNULL and a correlated sub-query:
SELECT [user], [Date], [Dist], [Start],
StartR = ISNULL([Start], (SELECT MIN([Start])
FROM dbo.TableName t2
WHERE t.[User] = t2.[User]
AND t2.[Start] IS NOT NULL))
FROM dbo.TableName t
I have used MIN([Start]) since you haven't said what should happen if there are multiple Start values for one user that are not NULL.
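As a quick way to try the correlated sub-query approach on the question's data, here is a small sketch using SQLite through Python's sqlite3 module. The table name trips and the function name fill_start are hypothetical, and SQLite stands in for whatever database the question targets. MIN ignores NULLs, so each row picks up the single non-null Start of its user.

```python
import sqlite3


def fill_start():
    """Return the sample rows with Start repeated per user as StartR."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE trips ("user" INTEGER, "Date" TEXT, Dist INTEGER, Start INTEGER)')
    con.executemany(
        "INSERT INTO trips VALUES (?, ?, ?, ?)",
        [
            (1, "2014-09-03", 150, 12500),
            (1, "2014-09-04", 220, None),
            (1, "2014-09-05", 100, None),
            (2, "2014-09-03", 290, 18000),
            (2, "2014-09-04", 90, None),
            (2, "2014-09-05", 170, None),
        ],
    )
    rows = con.execute("""
        SELECT t."user", t."Date", t.Dist, t.Start,
               (SELECT MIN(t2.Start)
                FROM trips t2
                WHERE t2."user" = t."user") AS StartR
        FROM trips t
        ORDER BY t."user", t."Date"
    """).fetchall()
    con.close()
    return rows


for row in fill_start():
    print(row)
```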

Oracle SQL Create PDF from Data

So I am trying to create a Probability Density Function from data in an Oracle SQL table through a SQL query. So consider the below table:
Name | Spend
--------------
Anne | 110
Phil | 40
Sue | 99
Jeff | 190
Stan | 80
Joe | 90
Ben | 100
Lee | 85
Now if I want to create a PDF from that data, I need to count the number of customers whose spend falls within each bucket (between 0 and 50, between 50 and 100, and so on). An example graph would look something like this (forgive my poor ASCII art):
5|
4|       *
3|       *
2|       *     *
1| *     *     *     *
 |_____________________
  50    100   150   200
So the axes are:
X-axis: the buckets
Y-axis: the number of customers
I am currently using the Oracle SQL CASE function to determine whether the spend falls within a bucket and then summing the number of customers that do. However, this is taking forever, as there are a couple of million records.
Any idea on how to do this effectively?
Thanks!
You can try using WIDTH_BUCKET function.
select bucket , count(name)
from (select name, spend,
WIDTH_BUCKET(spend, 0, 200, 4) bucket
from mytable
)
group by bucket
order by bucket;
Here I have divided the range 0 to 200 into 4 buckets, and the function assigns a bucket number to each value. You can group by this bucket and count how many records fall in each bucket.
Demo here.
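To show what bucket numbers WIDTH_BUCKET(spend, 0, 200, 4) would assign to the question's rows, here is a rough Python sketch of equal-width bucketing. The helper name width_bucket is my own and only mimics the usual behaviour for an ascending range: values below the lower bound go to bucket 0, values at or above the upper bound go to bucket n + 1, and each bucket includes its lower edge. It is meant for checking counts by hand, not as a replacement for the database function.

```python
from collections import Counter


def width_bucket(value, low, high, n):
    """Equal-width bucket number for `value` in [low, high) split into n buckets."""
    if value < low:
        return 0            # underflow bucket
    if value >= high:
        return n + 1        # overflow bucket
    return (value - low) * n // (high - low) + 1


spend = {"Anne": 110, "Phil": 40, "Sue": 99, "Jeff": 190,
         "Stan": 80, "Joe": 90, "Ben": 100, "Lee": 85}

# Count customers per bucket, like GROUP BY bucket with COUNT(name).
histogram = dict(sorted(Counter(width_bucket(s, 0, 200, 4) for s in spend.values()).items()))
print(histogram)
```

For the sample data, bucket 2 (50 up to but not including 100) holds four customers, which lines up with the tallest column in the question's sketch.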
You can even display the actual bucket range.
select bucket,
cast(min_value + ((bucket-1) * (max_value-min_value)/buckets) as varchar2(10))
||'-'
||cast(min_value + ((bucket) * (max_value-min_value)/buckets) as varchar2(10)),
count(name) c
from (select name,
spend,
WIDTH_BUCKET(spend, min_value, max_value, buckets) bucket
from mytable)
group by bucket
order by bucket;
Sample here.
SELECT SUM(y_axis) y_axis,
       x_axis
FROM
  (SELECT COUNT(*) y_axis,
          CASE
            WHEN spend <= 50 THEN 50
            WHEN spend < 100 AND spend > 50 THEN 100
            WHEN spend < 150 AND spend >= 100 THEN 150
            WHEN spend < 200 AND spend >= 150 THEN 200
          END x_axis
   FROM your_table
   GROUP BY spend
  )
GROUP BY x_axis;
y_axis x_axis
-----------------
4 100
1 50
1 200
2 150

oracle sql query to get data from two tables of similar type

I have two tables, ACTUAL and ESTIMATE, which share the columns sal_id, gal_id, amount and tax.
In ACTUAL table I have
actual_id, sal_id, gal_id, process_flag, amount, tax
1 111 222 N 100 1
2 110 223 N 200 2
In ESTIMATE table I have
estimate_id, sal_id, gal_id, process_flag, amount, tax
3 111 222 N 50 1
4 123 250 N 150 2
5 212 312 Y 10 1
Now I want a final table that contains every record from the ACTUAL table and, where no record exists for a sal_id + gal_id combination in ACTUAL but one exists in ESTIMATE, the estimate record instead (with total being the sum of amount and tax).
In FINAL table
id sal_id, gal_id, actual_id, estimate_id, total
1 111 222 1 null 101 (since a record exists in the actual table for 111 222)
2 110 223 2 null 202 (since a record exists in the actual table for 110 223)
3 123 250 null 4 152 (since no record exists in the actual table but an estimate exists for 123 250)
(The 212 312 combination in the estimate table has already been processed, so there is no need to process it again.)
I am using Oracle 11g. Could you please help me write this logic as a single SQL query?
Thanks.
There are several ways to write this query. One way is to use join and coalesce:
select coalesce(a.sal_id, e.sal_id) as sal_id,
coalesce(a.gal_id, e.gal_id) as gal_id,
coalesce(a.actual_value, e.estimate_value) as actual_value
from actual a full outer join
estimate e
on a.sal_id = e.sal_id and
a.gal_id = e.gal_id
This assumes that sal_id/gal_id provides a unique match between the tables.
Since you are using Oracle, here is perhaps a clearer way of doing it:
select sal_id, gal_id, actual_value
from (select t.*,
             max(isactual) over (partition by sal_id, gal_id) as hasactual
      from ((select 1 as isactual, a.*
             from actual a
            ) union all
            (select 0 as isactual, e.*
             from estimate e
            )
           ) t
     ) t
where isactual = 1 or hasactual = 0
This query uses a window function to determine whether there is an actual record with the matching sal_id/gal_id. The logic is to take all actuals and then all records that have no match in the actuals.
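The "all actuals, then estimates with no matching actual" rule can also be tried against the question's sample rows. The sketch below uses SQLite through Python's sqlite3 module rather than Oracle 11g, and it makes several assumptions that the answers above do not state: total is taken as amount + tax, estimates with process_flag = 'Y' are skipped, and the function name build_final is hypothetical. It uses NOT EXISTS instead of a window function so that it runs on older SQLite builds.

```python
import sqlite3


def build_final():
    """Return (sal_id, gal_id, actual_id, estimate_id, total) rows for the sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE actual (actual_id INTEGER, sal_id INTEGER, gal_id INTEGER,
                             process_flag TEXT, amount INTEGER, tax INTEGER);
        INSERT INTO actual VALUES (1, 111, 222, 'N', 100, 1),
                                  (2, 110, 223, 'N', 200, 2);
        CREATE TABLE estimate (estimate_id INTEGER, sal_id INTEGER, gal_id INTEGER,
                               process_flag TEXT, amount INTEGER, tax INTEGER);
        INSERT INTO estimate VALUES (3, 111, 222, 'N', 50, 1),
                                    (4, 123, 250, 'N', 150, 2),
                                    (5, 212, 312, 'Y', 10, 1);
    """)
    rows = con.execute("""
        -- every actual record
        SELECT a.sal_id, a.gal_id, a.actual_id, NULL AS estimate_id,
               a.amount + a.tax AS total
        FROM actual a
        UNION ALL
        -- unprocessed estimates that have no actual for the same sal_id/gal_id
        SELECT e.sal_id, e.gal_id, NULL, e.estimate_id, e.amount + e.tax
        FROM estimate e
        WHERE e.process_flag = 'N'
          AND NOT EXISTS (SELECT 1 FROM actual a
                          WHERE a.sal_id = e.sal_id AND a.gal_id = e.gal_id)
    """).fetchall()
    con.close()
    return rows


for row in build_final():
    print(row)
```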