divide data into subgroups

I have y = 20 rows and would like to create a new column that divides the rows into n subgroups. With n = 4, the result would be:
RowNumber NewColumn
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2
9 2
10 2
11 3
12 3
13 3
14 3
15 3
16 4
17 4
18 4
19 4
20 4
How could I achieve this in SQL/Teradata, please?
PS:
To add to the accepted answer, I am using something along these lines:
1 + FLOOR((ROW_NUMBER() OVER (ORDER BY Id DESC ) - 1) / 100) AS SubGroup

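You can just use arithmetic. With 20 rows and 4 subgroups, each subgroup holds 20 / 4 = 5 rows, so divide by 5:
select RowNumber, 1 + floor((RowNumber - 1) / 5) as NewColumn
from t;
Note: Teradata performs integer division on integer operands, so floor() is not strictly necessary.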
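The arithmetic can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Teradata (an assumption made only so the example runs anywhere); SQLite also does integer division on integer operands. The table name t, the column RowNumber, and the helper subgroup_rows are illustrative. The divisor is the number of rows per group (20 / 4 = 5 here), rounded up when the rows do not divide evenly.

```python
import sqlite3


def subgroup_rows(total_rows: int, groups: int):
    """Return (RowNumber, NewColumn) pairs for rows 1..total_rows."""
    # Rows per group, rounded up so no more than `groups` buckets appear.
    rows_per_group = -(-total_rows // groups)

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (RowNumber INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO t VALUES (?)",
        [(i,) for i in range(1, total_rows + 1)],
    )
    # Integer / integer truncates in SQLite, so no floor() is needed.
    result = conn.execute(
        "SELECT RowNumber, 1 + (RowNumber - 1) / ? AS NewColumn "
        "FROM t ORDER BY RowNumber",
        (rows_per_group,),
    ).fetchall()
    conn.close()
    return result


rows = subgroup_rows(20, 4)
print(rows[4], rows[5], rows[19])  # (5, 1) (6, 2) (20, 4)
```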

There's an old function to bucket data into percentiles, QUANTILE, but it's deprecated:
QUANTILE(4, ORDER BY whatever ASC)
If you already use another OLAP function, you'd better rewrite it as
4 * (RANK() OVER (ORDER BY whatever) - 1)
/ COUNT(*) OVER()
Both return a value between 0 and n - 1, so you have to add 1 to get your expected result.
By the way, Standard SQL has NTILE, which returns a slightly different result; see Missing Functions: CUME_DIST & NTILE.
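To see where the rank-based formula and NTILE differ, here is a plain-Python sketch of both. It is not Teradata code: quantile_buckets mimics n * (RANK - 1) / COUNT with integer division plus 1, and ntile_buckets follows the Standard SQL rule that the leftover rows go to the first buckets. Both function names are made up for this illustration.

```python
def quantile_buckets(values, n):
    """1-based bucket per value using n * (rank - 1) / count, integer division."""
    count = len(values)
    buckets = []
    for v in values:
        # RANK(): 1 + number of values strictly smaller (ties share a rank).
        rank = 1 + sum(1 for other in values if other < v)
        buckets.append(1 + n * (rank - 1) // count)
    return buckets


def ntile_buckets(count, n):
    """1-based NTILE(n) bucket for each of `count` ordered rows."""
    base, extra = divmod(count, n)
    buckets = []
    for b in range(1, n + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if b <= extra else 0)
        buckets.extend([b] * size)
    return buckets


print(quantile_buckets(list(range(1, 11)), 4))  # [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]
print(ntile_buckets(10, 4))                     # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
```

With 20 rows and n = 4 both approaches give four groups of five; they only diverge when the row count is not a multiple of n.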

Related

Convert column into the rows

This is my current result set of my query:
Question Sol25A Sol25B Sol25C Sol40A Sol40B
======================================================
A 1 4 2 6 0
B 2 3 2 1 9
C 6 7 1 0 8
======================================================
Total = 9 14 5 7 17
======================================================
And I want the result in this form:
Product Total
===============
Sol25A 9
Sol25B 14
Sol25C 5
Sol40A 7
Sol40B 17
Can you please provide the query? This would be a great help to me.

I would suggest that you unpivot using cross apply and then aggregate:
select product, sum(val)
from t cross apply
(values ('Sol25A', Sol25A), ('Sol25B', Sol25B), ('Sol25C', Sol25C),
('Sol40A', Sol40A), ('Sol40B', Sol40B)
) v(product, val)
group by product;
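The unpivot-then-aggregate idea can be tried without SQL Server. The sketch below uses Python's sqlite3 module, which has no CROSS APPLY, so it swaps in a UNION ALL of one SELECT per column as the unpivot step (a different technique with the same result). The table name t and the helper product_totals are assumptions; the data is the sample from the question.

```python
import sqlite3

COLUMNS = ["Sol25A", "Sol25B", "Sol25C", "Sol40A", "Sol40B"]
ROWS = [
    ("A", 1, 4, 2, 6, 0),
    ("B", 2, 3, 2, 1, 9),
    ("C", 6, 7, 1, 0, 8),
]


def product_totals():
    """Return (product, total) pairs, one per former column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (Question TEXT, Sol25A INT, Sol25B INT, "
        "Sol25C INT, Sol40A INT, Sol40B INT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    # Unpivot: one SELECT per column, tagged with the column name.
    unpivot = " UNION ALL ".join(
        f"SELECT '{c}' AS product, {c} AS val FROM t" for c in COLUMNS
    )
    result = conn.execute(
        f"SELECT product, SUM(val) FROM ({unpivot}) "
        "GROUP BY product ORDER BY product"
    ).fetchall()
    conn.close()
    return result


print(product_totals())
# [('Sol25A', 9), ('Sol25B', 14), ('Sol25C', 5), ('Sol40A', 7), ('Sol40B', 17)]
```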

Calculate scattered rows' average in SQL Server?

The table looks like below:
testid stepid serverid duration
1 1 1 10
1 2 1 11
2 1 2 12
2 2 2 13
3 1 1 14
3 2 1 15
4 1 2 16
4 2 2 17
Four tests ran on two servers, and each test has 2 steps. Given a set of test ids, I would like to calculate the average duration of each step across those tests. For example, if the given test ids are 1 and 2, the final table looks like this:
stepid avg_duration
1 (10 + 12) / 2
2 (11 + 13) / 2

This is just a group by, right?
select stepid, avg(duration)
from t
where testid in (1, 2)
group by stepid;
Note: If duration is an integer column, avg(duration) returns a truncated integer in SQL Server; use avg(duration * 1.0) if you want "normal" (decimal) division.
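A quick way to try the group by on the sample data is Python's sqlite3 module, used here as a stand-in for SQL Server. One difference to keep in mind: SQLite's AVG always returns a floating-point value, so the integer-average concern from the note does not show up in this sketch. The helper name step_averages is illustrative.

```python
import sqlite3

ROWS = [
    (1, 1, 1, 10), (1, 2, 1, 11),
    (2, 1, 2, 12), (2, 2, 2, 13),
    (3, 1, 1, 14), (3, 2, 1, 15),
    (4, 1, 2, 16), (4, 2, 2, 17),
]


def step_averages(test_ids):
    """Average duration per stepid, restricted to the given test ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (testid INTEGER, stepid INTEGER, "
        "serverid INTEGER, duration INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    # One "?" placeholder per requested test id.
    placeholders = ", ".join("?" for _ in test_ids)
    result = conn.execute(
        f"SELECT stepid, AVG(duration) FROM t "
        f"WHERE testid IN ({placeholders}) "
        "GROUP BY stepid ORDER BY stepid",
        list(test_ids),
    ).fetchall()
    conn.close()
    return result


print(step_averages([1, 2]))  # [(1, 11.0), (2, 12.0)]
```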

Using temporary extended table to make a sum

From a given table I want to be able to sum values having the same number (should be easy, right?)
Problem: A given value can be assigned to anywhere from 2 to n consecutive numbers.
For some reason, this information is stored in a single row describing the value, the starting number and the ending number, as below.
TABLE A
id | starting_number | ending_number | value
----+-----------------+---------------+-------
1 2 5 8
2 0 3 5
3 4 6 6
4 7 8 10
For instance the first row means:
value '8' is assigned to numbers: 2, 3 and 4 (5 is excluded)
So, I would like the following intermediate result table
TABLE B
id | number | value
----+--------+-------
1 2 8
1 3 8
1 4 8
2 0 5
2 1 5
2 2 5
3 4 6
3 5 6
4 7 10
So I can sum 'value' for elements having identical 'number'
SELECT number, sum(value)
FROM B
GROUP BY number
TABLE C
number | sum(value)
--------+------------
2 13
3 8
4 14
0 5
1 5
5 6
7 10
I don't know how to do this and didn't find any answer on the web (maybe I wasn't searching with the right keywords...)
Any idea?

You can do what you want with generate_series(). So, TableB is basically:
select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA;
Your aggregation is then:
select n, sum(value)
from (select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA
) a
group by n;
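generate_series() as used above is a PostgreSQL feature. For a runnable check on the sample data, the sketch below uses Python's sqlite3 module and replaces generate_series() with a recursive CTE that expands each row into the numbers from starting_number up to ending_number - 1. This is a swapped-in technique, not the answer's exact query, and the helper name sums_per_number is made up.

```python
import sqlite3

TABLE_A = [
    (1, 2, 5, 8),
    (2, 0, 3, 5),
    (3, 4, 6, 6),
    (4, 7, 8, 10),
]


def sums_per_number():
    """Expand each range (end excluded) and sum value per number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableA (id INTEGER, starting_number INTEGER, "
        "ending_number INTEGER, value INTEGER)"
    )
    conn.executemany("INSERT INTO tableA VALUES (?, ?, ?, ?)", TABLE_A)
    result = conn.execute(
        """
        WITH RECURSIVE b(id, n, ending_number, value) AS (
            -- first number of every non-empty range
            SELECT id, starting_number, ending_number, value
            FROM tableA
            WHERE starting_number < ending_number
            UNION ALL
            -- next number, stopping before ending_number
            SELECT id, n + 1, ending_number, value
            FROM b
            WHERE n + 1 < ending_number
        )
        SELECT n, SUM(value) FROM b GROUP BY n ORDER BY n
        """
    ).fetchall()
    conn.close()
    return result


print(sums_per_number())
# [(0, 5), (1, 5), (2, 13), (3, 8), (4, 14), (5, 6), (7, 10)]
```

The output matches TABLE C from the question, just ordered by number.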

sql server 2008 - calculated and ordered list needs to return only 2 entries per supplier

I have a dataset like the one below, but longer. I want to pick each 'fleet_id' ranked by its overall 'StarDriver' value, but return only two results for each 'supplier_id' and a maximum of 20 in total.
(I'm sorry, I didn't work out how to paste the data below with proper formatting. I couldn't find it in the toolbar above, and the Google results were about copying data. I would also be grateful if someone could point out how.)
fleet_id supplier_id Ratings Driver Punctuality Car StarDriver
19442 151 10 5 5 5 5
19634 151 11 5 5 5 5
19437 151 12 5 5 5 5
12832 10 14 5 4.92857142857143 5 4.97619047619048
12217 111 10 5 5 4.9 4.96666666666667
21135 158 19 5 4.89473684210526 5 4.96491228070175
19436 151 14 4.85714285714286 5 5 4.95238095238095
12239 111 12 4.91666666666667 5 4.91666666666667 4.94444444444445
10520 92 12 4.91666666666667 5 4.91666666666667 4.94444444444445
19997 151 12 5 5 4.83333333333333 4.94444444444444

To limit the result to the top 2 for each supplier, use row_number(). It enumerates the rows within each supplier, and you can keep just two with where seqnum <= 2.
The rest of the query just selects the 20 best rows by StarDriver:
select top 20 t.*
from (select t.*,
             row_number() over (partition by supplier_id order by StarDriver desc) as seqnum
      from table t
     ) t
where seqnum <= 2
order by StarDriver desc;
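The row_number() approach can be tried on the sample rows with Python's sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions). SQLite uses LIMIT where SQL Server uses TOP. The table name fleet, the helper top_fleets, and the fleet_id tie-breaker in the ORDER BY clauses are additions for this sketch; the tie-breaker only makes the result deterministic when StarDriver values are equal.

```python
import sqlite3

# (fleet_id, supplier_id, StarDriver) taken from the question's sample.
FLEET = [
    (19442, 151, 5.0),
    (19634, 151, 5.0),
    (19437, 151, 5.0),
    (12832, 10, 4.97619047619048),
    (12217, 111, 4.96666666666667),
    (21135, 158, 4.96491228070175),
    (19436, 151, 4.95238095238095),
    (12239, 111, 4.94444444444445),
    (10520, 92, 4.94444444444445),
    (19997, 151, 4.94444444444444),
]


def top_fleets(per_supplier=2, overall=20):
    """Best fleets by StarDriver, capped per supplier and overall."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fleet (fleet_id INTEGER, supplier_id INTEGER, "
        "StarDriver REAL)"
    )
    conn.executemany("INSERT INTO fleet VALUES (?, ?, ?)", FLEET)
    result = conn.execute(
        """
        SELECT fleet_id, supplier_id, StarDriver
        FROM (SELECT f.*,
                     ROW_NUMBER() OVER (PARTITION BY supplier_id
                                        ORDER BY StarDriver DESC, fleet_id
                                       ) AS seqnum
              FROM fleet f)
        WHERE seqnum <= ?
        ORDER BY StarDriver DESC, fleet_id
        LIMIT ?
        """,
        (per_supplier, overall),
    ).fetchall()
    conn.close()
    return result


for row in top_fleets():
    print(row)
```

On this sample, supplier 151 contributes only two of its five rows, so seven rows come back in total.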

Return the last sub sorted row in a table (sql)

It's quite hard to describe this problem, but it's easy to see it graphically:
x y
1 1
2 1
3 1
* 4 1 *
5 2
* 6 2 *
7 3
8 3
9 3
* 10 3 *
I have sorted a table by x, then sub-sorted by y. I need to return the x value of the last item in each sub-sorted section (the starred rows).
I'm aware of the LAST command, but I don't know how to apply this recursively i.e. to each sub-sorted section.
Best,
Dan

SELECT y, Max(x) FROM [table] group by Y
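Because x increases within each y section, the last row of a section is the one with the largest x, which is why MAX with GROUP BY works. A minimal sketch on the sample data, using Python's sqlite3 module; the table name points and the helper last_x_per_y are placeholders.

```python
import sqlite3

# (x, y) pairs from the question.
POINTS = [
    (1, 1), (2, 1), (3, 1), (4, 1),
    (5, 2), (6, 2),
    (7, 3), (8, 3), (9, 3), (10, 3),
]


def last_x_per_y():
    """Return (y, largest x) for every y section."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (x INTEGER, y INTEGER)")
    conn.executemany("INSERT INTO points VALUES (?, ?)", POINTS)
    result = conn.execute(
        "SELECT y, MAX(x) FROM points GROUP BY y ORDER BY y"
    ).fetchall()
    conn.close()
    return result


print(last_x_per_y())  # [(1, 4), (2, 6), (3, 10)]
```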
