Create View with cumulative counts for a fixed set of value-range - google-bigquery

I want to create BQ View on this table:
mytable
id | value
s1 | 21
s2 | 31
s3 | 71
The view needs to count, for each of 10 fixed milestones, the rows (ids) above where value <= milestoneValue.
Resulting view with 10 milestone rows for mytable will be:
milestoneValue | count
100 | 3
90 | 3 (all s1 s2 s3)
80 | 3
70 | 2 (s1,s2)
60..
30 | 1
20 | 1
10 | 1
I did not find any suitable function to compute this. I could add the 10 binary flags as columns on the raw data in mytable, which I could then SUM, but I do not see a way to transform that into a 10-row milestone view.
I tried:
SELECT id, value,
IF(value <= 10 , 1, 0) as M10,
IF(value <= 20 , 1, 0) as M20,
...
IF(value <= 90 , 1, 0) as M90,
IF(value <= 100 , 1, 0) as M100
FROM mytable ;
Help appreciated,
Thanks

SELECT bucket, SUM(word_count>bucket)
FROM [publicdata:samples.shakespeare] a
CROSS JOIN (
SELECT bucket FROM (SELECT 10 bucket), (SELECT 20 bucket), (SELECT 30 bucket), (SELECT 40 bucket)
) b
GROUP BY bucket
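For a runnable illustration, the same cross-join idea works in standard SQL; here is a sketch using SQLite as a stand-in for BigQuery, with a VALUES CTE playing the role of the subquery union of buckets:

```python
import sqlite3

# Sketch of the cross-join approach in standard SQL, using SQLite as
# a stand-in for BigQuery.  SUM(value <= m) adds 1 for every row at
# or below the milestone, giving the cumulative count per milestone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id TEXT, value INT);
    INSERT INTO mytable VALUES ('s1', 21), ('s2', 31), ('s3', 71);
""")

rows = conn.execute("""
    WITH milestones(m) AS (
        VALUES (10), (20), (30), (40), (50),
               (60), (70), (80), (90), (100)
    )
    SELECT m AS milestoneValue, SUM(value <= m) AS cnt
    FROM mytable CROSS JOIN milestones
    GROUP BY m
    ORDER BY m DESC
""").fetchall()

for milestone, cnt in rows:
    print(milestone, cnt)
```

This produces one row per milestone, e.g. 3 for milestone 100 and 2 for milestone 70, matching the view described in the question.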

Find 3 or more consecutive transaction record where the transaction amount greater than 100 and the records belong to the same category

I have a customer transaction table with 3 columns: Id, Category, TranAmount. I want to find runs of 3 or more consecutive transaction records that belong to the same category and whose TranAmount is greater than 100.
Below is the sample table:
Id Category TranAmount
1 A 190
2 A 160
3 A 169
4 B 190
5 A 90
6 B 219
7 B 492
8 B 129
9 B 390
10 B 40
11 A 110
12 A 130
And the output should be:
Id Category TranAmount
1 A 190
2 A 160
3 A 169
6 B 219
7 B 492
8 B 129
9 B 390
Read up on the "gaps and islands" pattern for a deeper understanding of the approach. Here's one of many articles you could read: https://www.red-gate.com/simple-talk/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/
In this specific problem you have two conditions that cause a break in a consecutive series, those being a change in category or an amount that doesn't meet the threshold.
with data as (
select *,
row_number() over (order by Id) as rn,
row_number() over (partition by
Category, case when TranAmount >= 100 then 1 else 0 end order by Id) as cn
from Transactions
), grp as (
select *, count(*) over (partition by rn - cn) as num
from data
where TranAmount >= 100
)
select * from grp where num >= 3;
https://rextester.com/DUM44618
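Here is a self-contained sketch of the query above, run against SQLite (whose window functions match SQL Server's here) with the sample data from the question:

```python
import sqlite3

# Gaps-and-islands: rn - cn is constant within each consecutive run
# of same-category rows over the threshold, so counting rows per
# rn - cn group and keeping groups of 3+ finds the wanted runs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Transactions (Id INT, Category TEXT, TranAmount INT);
    INSERT INTO Transactions VALUES
        (1,'A',190),(2,'A',160),(3,'A',169),(4,'B',190),
        (5,'A',90),(6,'B',219),(7,'B',492),(8,'B',129),
        (9,'B',390),(10,'B',40),(11,'A',110),(12,'A',130);
""")

rows = conn.execute("""
    WITH data AS (
        SELECT *,
               ROW_NUMBER() OVER (ORDER BY Id) AS rn,
               ROW_NUMBER() OVER (
                   PARTITION BY Category,
                                CASE WHEN TranAmount >= 100 THEN 1 ELSE 0 END
                   ORDER BY Id) AS cn
        FROM Transactions
    ), grp AS (
        SELECT *, COUNT(*) OVER (PARTITION BY rn - cn) AS num
        FROM data
        WHERE TranAmount >= 100
    )
    SELECT Id, Category, TranAmount FROM grp WHERE num >= 3 ORDER BY Id
""").fetchall()

for r in rows:
    print(r)
```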
This will work if there are no gaps between the ids:
select distinct t.*
from tablename t inner join (
select t.id from tablename t
where t.tranamount > 100
and
exists (
select 1 from tablename
where id = t.id - 1 and category = t.category and tranamount > 100
)
and
exists (
select 1 from tablename
where id = t.id + 1 and category = t.category and tranamount > 100
)
) tt on t.id in (tt.id - 1, tt.id, tt.id + 1)
See the demo.
Results:
Id | Category | TranAmount
-: | :------- | ---------:
1 | A | 190
2 | A | 160
3 | A | 169
6 | B | 219
7 | B | 492
8 | B | 129
9 | B | 390
I can't really test this out yet, but give this a try.
SELECT Id, Category, TranAmount FROM Table
WHERE TranAmount > 100
AND Category IN
(SELECT Category FROM Table
WHERE TranAmount > 100
GROUP BY Category HAVING COUNT(Category) >= 3)

finding rows against summed value of specific id's in sql

I have a table like below--
Id| Amount|DateAdded |
--|-------|-----------|
1 20 20-Jun-2018
1 10 05-Jun-2018
1 4 21-May-2018
1 5 15-May-2018
1 15 05-May-2018
2 25 15-Jun-2018
2 25 12-Jun-2018
2 65 05-Jun-2018
2 65 20-May-2018
Here, if I sum the Amount for Id = 1 I get 54. For a given value such as 35, I want the rows for Id = 1, taken newest first, up to and including the row where the running sum first reaches or exceeds that value.
In case of given value 35 the expected Output for id = 1 should be--
Id| Amount|DateAdded |
--|-------|-----------|
1 20 20-Jun-2018
1 10 05-Jun-2018
1 4 21-May-2018
1 5 15-May-2018
In case of given value 50 the expected Output for Id = 2 should be--
Id| Amount|DateAdded |
--|-------|-----------|
2 25 15-Jun-2018
2 25 12-Jun-2018
You would use a cumulative sum. To get all the rows:
select t.*
from (select t.*,
sum(amount) over (partition by id order by dateadded desc) as running_amount
from t
) t
where t.running_amount - amount < 35;
To get just the row that passes the mark:
where t.running_amount - amount < 35 and
t.running_amount >= 35
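Here is a runnable sketch of this cumulative-sum approach, using SQLite. Note that the window orders by DateAdded descending, because the expected output keeps the newest rows, so the running total must accumulate from the latest date backwards:

```python
import sqlite3

# Cumulative sum per Id, newest date first; a row is kept while the
# running total *before* it (running_amount - Amount) is under the
# limit, so the crossing row itself is included.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (Id INT, Amount INT, DateAdded TEXT);
    INSERT INTO t VALUES
        (1, 20, '2018-06-20'), (1, 10, '2018-06-05'), (1,  4, '2018-05-21'),
        (1,  5, '2018-05-15'), (1, 15, '2018-05-05'),
        (2, 25, '2018-06-15'), (2, 25, '2018-06-12'),
        (2, 65, '2018-06-05'), (2, 65, '2018-05-20');
""")

rows = conn.execute("""
    SELECT Id, Amount, DateAdded
    FROM (SELECT t.*,
                 SUM(Amount) OVER (PARTITION BY Id
                                   ORDER BY DateAdded DESC) AS running_amount
          FROM t) t
    WHERE Id = 1 AND running_amount - Amount < 35
    ORDER BY DateAdded DESC
""").fetchall()

for r in rows:
    print(r)
```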

Oracle: Get the smaller values and the first greater value

I have a table like this:
ID Name Value
1 Sample1 10
2 Sample2 20
3 Sample3 30
4 Sample4 40
And I would like to get all of the rows that contain smaller values and the first row that contains greater value.
For example when I send '25' as a parameter to Value column, I want to have following table;
ID Name Value
1 Sample1 10
2 Sample2 20
3 Sample3 30
I'm stuck at this point, thanks in advance.
Analytic functions to the rescue!
create table your_table (
id number,
value number);

insert into your_table
select level, level * 10
from dual
connect by level <= 5;

select * from your_table;
id | value
----+------
1 | 10
2 | 20
3 | 30
4 | 40
5 | 50
Ok, now we use lag(). Specify field, offset and the default value (for the first row that has no previous one).
select id, value, lag(value, 1, value) over (order by value) previous_value
from your_table
id | value | previous_value
---+-------+---------------
1 | 10 | 10
2 | 20 | 10
3 | 30 | 20
4 | 40 | 30
5 | 50 | 40
Now apply where.
select id, value
from (
select id, value, lag(value, 1, value) over (order by value) previous_value
from your_table)
where previous_value < 25
Works for me.
id | value
----+------
1 | 10
2 | 20
3 | 30
Of course you have to have some policy on ties. For example, if two rows have the same value and both are first, do you want to keep both or only one of them? Or maybe you have some other criterion for breaking the tie (say, sort by id). But the idea is fairly simple.
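The lag() approach above can be tried end to end outside Oracle as well; here is a sketch using SQLite, whose LAG(expr, offset, default) signature matches:

```python
import sqlite3

# A row qualifies when the *previous* value (by value order) is still
# below the parameter; this keeps all smaller values plus the first
# greater one.  The default third argument covers the first row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (id INT, name TEXT, value INT);
    INSERT INTO your_table VALUES
        (1, 'Sample1', 10), (2, 'Sample2', 20),
        (3, 'Sample3', 30), (4, 'Sample4', 40);
""")

rows = conn.execute("""
    SELECT id, value
    FROM (SELECT id, value,
                 LAG(value, 1, value) OVER (ORDER BY value) AS previous_value
          FROM your_table)
    WHERE previous_value < ?
    ORDER BY id
""", (25,)).fetchall()

print(rows)  # the smaller values plus the first greater one
```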
You can try a query like this (using ROWNUM to pick the first row with a greater value, since Oracle has no TOP):
SELECT * FROM YourTableName
WHERE Value < 25
OR ID IN (SELECT ID
FROM (SELECT ID FROM YourTableName WHERE Value >= 25 ORDER BY Value)
WHERE ROWNUM = 1)
In Oracle, you can try this (but see "That Young Man"'s answer, I think it's better than mine):
SELECT * FROM (
SELECT ID, NAME, VALUE, 1 AS RN
FROM YT
WHERE VALUE < 25
UNION ALL
SELECT ID, NAME, VALUE, ROW_NUMBER()OVER (ORDER BY VALUE) AS RN
FROM YT
WHERE VALUE > 25
) A
WHERE RN=1;

SQL Server 2012 buckets based on running total

For SQL Server 2012, I am trying to assign rows to sequential buckets based on the maximum size of the bucket (100 in the sample below) and the running total of a column. Most of the solutions I found partition by a known changing column value, e.g. partition by department id, etc. However, in this situation all I have are a sequential id and a size. The closest solution I found is discussed in this thread for SQL Server 2008; I tried it, but for large row sets the performance was very slow, much worse than a cursor-based solution: https://dba.stackexchange.com/questions/45179/how-can-i-write-windowing-query-which-sums-a-column-to-create-discrete-buckets
This table can contain up to 10 million rows. Since SQL Server 2012 supports SUM ... OVER and the LAG and LEAD functions, I am wondering if someone can suggest a solution based on 2012 features.
CREATE TABLE raw_data (
id INT PRIMARY KEY
, size INT NOT NULL
);
INSERT INTO raw_data
(id, size)
VALUES
( 1, 96) -- new bucket here, maximum bucket size is 100
, ( 2, 10) -- and here
, ( 3, 98) -- and here
, ( 4, 20)
, ( 5, 50)
, ( 6, 15)
, ( 7, 97)
, ( 8, 96) -- and here
;
--Expected output
--bucket_size is for illustration only, actual needed output is bucket only
id size bucket_size bucket
-----------------------------
1 100 100 1
2 10 10 2
3 98 98 3
4 20 85 4
5 50 85 4
6 15 85 4
7 97 98 5
8 1 98 5
TIA
You can achieve this quite easily in SQL Server 2012 using a window function and framing. The syntax looks quite complex, but the concept is simple - sum all the previous rows up to and including the current one. The cumulative_bucket_size column in this example is for demonstration purposes, as it is part of the equation used to derive the bucket number:
DECLARE @Bucket_Size AS INT;
SET @Bucket_Size = 100;
SELECT
id,
size,
SUM(size) OVER (
PARTITION BY 1 ORDER BY id ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_bucket_size,
1 + SUM(size) OVER (
PARTITION BY 1 ORDER BY id ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) / @Bucket_Size AS bucket
FROM
raw_data
The PARTITION BY clause is optional, but would be useful if you had different "bucket sets" for column groupings. I have added it here for completeness.
Results:
id size cumulative_bucket_size bucket
------------------------------------------
1 96 96 1
2 10 106 2
3 98 204 3
4 20 224 3
5 50 274 3
6 15 289 3
7 97 386 4
8 96 482 5
You can read more about window framing in the following article:
https://www.simple-talk.com/sql/learn-sql-server/window-functions-in-sql-server-part-2-the-frame/
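As a runnable sketch of the frame-based running total, here is essentially the same query against SQLite; the PARTITION BY 1 is dropped (it partitions nothing) and the bucket size of 100 is inlined, with SQLite's integer division standing in for SQL Server's:

```python
import sqlite3

# Running total over a ROWS frame from the first row to the current
# one; integer-dividing the cumulative sum by the bucket size yields
# the bucket number.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_data (id INT PRIMARY KEY, size INT NOT NULL);
    INSERT INTO raw_data VALUES
        (1, 96), (2, 10), (3, 98), (4, 20),
        (5, 50), (6, 15), (7, 97), (8, 96);
""")

rows = conn.execute("""
    SELECT id, size,
           SUM(size) OVER (ORDER BY id
                           ROWS BETWEEN UNBOUNDED PRECEDING
                                    AND CURRENT ROW) AS cumulative_bucket_size,
           1 + SUM(size) OVER (ORDER BY id
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND CURRENT ROW) / 100 AS bucket
    FROM raw_data
""").fetchall()

for r in rows:
    print(r)
```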
Before you can use the running total method to assign bucket numbers, you need to generate that bucket_size column, because the numbers would be produced based on that column.
Based on your expected output, the bucket ranges are
1..10
11..85
86..100
You could use a simple CASE expression like this to generate a bucket_size column like in your example:
CASE
WHEN size <= 10 THEN 10
WHEN size <= 85 THEN 85
ELSE 100
END
Then you would use LAG() to determine if a row starts a new sequence of sizes belonging to the same bucket:
CASE bucket_size
WHEN LAG(bucket_size) OVER (ORDER BY id) THEN 0
ELSE 1
END
These two calculations could be done in the same (sub)query with the help of CROSS APPLY:
SELECT
d.id,
d.size,
x.bucket_size, -- for illustration only
is_new_seq = CASE x.bucket_size
WHEN LAG(x.bucket_size) OVER (ORDER BY d.id) THEN 0
ELSE 1
END
FROM dbo.raw_data AS d
CROSS APPLY
(
SELECT
CASE
WHEN size <= 10 THEN 10
WHEN size <= 85 THEN 85
ELSE 100
END
) AS x (bucket_size)
The above query would produce this output:
id size bucket_size is_new_seq
-- ---- ----------- ----------
1 96 100 1
2 10 10 1
3 98 100 1
4 20 85 1
5 50 85 0
6 15 85 0
7 97 100 1
8 96 100 0
Now use that result as a derived table and apply SUM() OVER to is_new_seq to produce the bucket numbers, like this:
SELECT
id,
size,
bucket = SUM(is_new_seq) OVER (ORDER BY id)
FROM
(
SELECT
d.id,
d.size,
is_new_seq = CASE x.bucket_size
WHEN LAG(x.bucket_size) OVER (ORDER BY d.id) THEN 0
ELSE 1
END
FROM dbo.raw_data AS d
CROSS APPLY
(
SELECT
CASE
WHEN size <= 10 THEN 10
WHEN size <= 85 THEN 85
ELSE 100
END
) AS x (bucket_size)
) AS s
;
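The LAG-plus-SUM technique above can be checked end to end in SQLite; since SQLite has no CROSS APPLY, the bucket_size CASE moves into a plain derived table, but the rest of the query is unchanged:

```python
import sqlite3

# is_new_seq flags a row whose bucket_size differs from the previous
# row's; the running SUM of those flags numbers the buckets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_data (id INT PRIMARY KEY, size INT NOT NULL);
    INSERT INTO raw_data VALUES
        (1, 96), (2, 10), (3, 98), (4, 20),
        (5, 50), (6, 15), (7, 97), (8, 96);
""")

rows = conn.execute("""
    SELECT id, size,
           SUM(is_new_seq) OVER (ORDER BY id) AS bucket
    FROM (SELECT id, size,
                 CASE bucket_size
                     WHEN LAG(bucket_size) OVER (ORDER BY id) THEN 0
                     ELSE 1
                 END AS is_new_seq
          FROM (SELECT id, size,
                       CASE WHEN size <= 10 THEN 10
                            WHEN size <= 85 THEN 85
                            ELSE 100
                       END AS bucket_size
                FROM raw_data))
""").fetchall()

for r in rows:
    print(r)
```

The bucket column comes out as 1, 2, 3, 4, 4, 4, 5, 5, matching the expected output in the question.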

How to find the range of a number where the ranges come dynamically from another table?

If I had two tables:
PersonID | Count
-----------------
1 | 45
2 | 5
3 | 120
4 | 87
5 | 60
6 | 200
7 | 31
SizeName | LowerLimit
-----------------
Small | 0
Medium | 50
Large | 100
I'm trying to figure out how to do a query to get a result similar to:
PersonID | SizeName
-----------------
1 | Small
2 | Small
3 | Large
4 | Medium
5 | Medium
6 | Large
7 | Small
Basically, one table specifies an unknown number of "range names" and their associated integer boundaries. So a count in the range 0 to 49 gets the 'Small' designation, 50 to 99 gets 'Medium', and so on. It needs to be dynamic because I do not know the range names or boundary values in advance. Can I do this in a single query, or would I have to write a separate function to loop through the possibilities?
Try this out:
SELECT PersonID, SizeName
FROM
(
SELECT
PersonID,
(SELECT MAX([LowerLimit]) FROM dbo.[Size] WHERE [LowerLimit] <= [Count]) As LowerLimit
FROM dbo.Person
) A
INNER JOIN dbo.[SIZE] B ON A.LowerLimit = B.LowerLimit
With Ranges As
(
Select 'Small' As Name, 0 As LowerLimit
Union All Select 'Medium', 50
Union All Select 'Large', 100
)
, Person As
(
Select 1 As PersonId, 45 As [Count]
Union All Select 2, 5
Union All Select 3, 120
Union All Select 4, 87
Union All Select 5, 60
Union All Select 6, 200
Union All Select 7, 31
)
, RangeStartEnd As
(
Select R1.Name
, Case When Min(R1.LowerLimit) = 0 Then -1 Else MIN(R1.LowerLimit) End As StartValue
, Coalesce(MIN(R2.LowerLimit), 2147483647) As EndValue
From Ranges As R1
Left Join Ranges As R2
On R2.LowerLimit > R1.LowerLimit
Group By R1.Name
)
Select P.PersonId, P.[Count], RSE.Name
From Person As P
Join RangeStartEnd As RSE
On P.[Count] > RSE.StartValue
And P.[Count] <= RSE.EndValue
Although I'm using common-table expressions (cte for short) which only exist in SQL Server 2005+, this can be done with multiple queries where you create a temp table to store the equivalent of the RangeStartEnd cte. The trick is to create a view that has a start column and end column.
SELECT p.PersonID, Ranges.SizeName
FROM People P
JOIN
(
SELECT SizeName, LowerLimit, MIN(COALESCE(upperlimit, 2000000)) AS upperlimit
FROM (
SELECT rl.SizeName, rl.LowerLimit, ru.LowerLimit AS UpperLimit
FROM Ranges rl
LEFT OUTER JOIN Ranges ru ON rl.LowerLimit < ru.LowerLimit
) r
WHERE r.LowerLimit < COALESCE(r.UpperLimit, 2000000)
GROUP BY SizeName, LowerLimit
) Ranges ON p.Count >= Ranges.LowerLimit AND p.Count < Ranges.upperlimit
ORDER BY PersonID
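Both answers boil down to the same lookup: for each person, take the size whose LowerLimit is the largest one not exceeding the count. Here is a compact correlated-subquery sketch of that idea in SQLite:

```python
import sqlite3

# For each person, pick the Size row with the largest LowerLimit
# that is still <= the person's count.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonID INT, "Count" INT);
    CREATE TABLE Size (SizeName TEXT, LowerLimit INT);
    INSERT INTO Person VALUES
        (1, 45), (2, 5), (3, 120), (4, 87), (5, 60), (6, 200), (7, 31);
    INSERT INTO Size VALUES ('Small', 0), ('Medium', 50), ('Large', 100);
""")

rows = conn.execute("""
    SELECT p.PersonID,
           (SELECT s.SizeName
            FROM Size s
            WHERE s.LowerLimit <= p."Count"
            ORDER BY s.LowerLimit DESC
            LIMIT 1) AS SizeName
    FROM Person p
    ORDER BY p.PersonID
""").fetchall()

for r in rows:
    print(r)
```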