divide data into subgroups - sql

I have y=20 rows and would like to create a new column that divides the rows into n subgroups. If n is 4, the result would be:
RowNumber NewColumn
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2
9 2
10 2
11 3
12 3
13 3
14 3
15 3
16 4
17 4
18 4
19 4
20 4
How could I achieve this in SQL/TeraData please?
PS:
To add to the accepted answer, I am using something along those lines:
1 + FLOOR((ROW_NUMBER() OVER (ORDER BY Id DESC ) - 1) / 100) AS SubGroup

You can just use arithmetic. Divide by the number of rows per group (20 rows / 4 groups = 5):
select 1 + floor((RowNumber - 1) / 5) as newColumn
from t;
Note: Teradata performs integer division on integer operands, so floor() is not strictly necessary.
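To check the arithmetic, here is a minimal sketch that runs the same expression through Python's built-in sqlite3 module rather than Teradata. The table name t and column RowNumber follow the question; the helper assign_subgroups and the rounding-up of the group size are assumptions made for illustration. SQLite also does integer division on integer operands, so floor() is left out.

```python
import sqlite3


def assign_subgroups(total_rows, n_groups):
    """Return (RowNumber, NewColumn) pairs for 1 + (RowNumber - 1) / group_size."""
    # Rows per subgroup, rounded up so no more than n_groups groups appear.
    group_size = -(-total_rows // n_groups)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (RowNumber INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO t VALUES (?)", [(i,) for i in range(1, total_rows + 1)]
    )
    rows = conn.execute(
        "SELECT RowNumber, 1 + (RowNumber - 1) / ? AS NewColumn "
        "FROM t ORDER BY RowNumber",
        (group_size,),
    ).fetchall()
    conn.close()
    return rows


subgroups = assign_subgroups(20, 4)
```

With 20 rows and 4 groups, rows 1 to 5 land in subgroup 1 and rows 16 to 20 in subgroup 4, matching the table in the question.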

There's an old function to bucket data into percentiles, QUANTILE, but it's deprecated:
QUANTILE(4, ORDER BY whatever ASC)
If you already use another OLAP function, you'd better rewrite it as
4 * (RANK() OVER (ORDER BY whatever) - 1)
/ COUNT(*) OVER()
Both return a value between 0 and n - 1, so you have to add 1 to get your expected result.
Btw, in Standard SQL there's NTILE, which returns a slightly different result, see Missing Functions: CUME_DIST & NTILE
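As a rough illustration of how the RANK-based expression and NTILE differ, the sketch below evaluates both in SQLite (not Teradata) through Python's sqlite3 module. It assumes a SQLite build with window functions (3.25 or later); the function name compare_buckets and the integer test data are made up for the example.

```python
import sqlite3


def compare_buckets(total_rows, n_groups):
    """Return (whatever, via_rank, via_ntile) for each row, ordered by whatever."""
    n = int(n_groups)  # validated integer, safe to place in the SQL text
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (whatever INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?)", [(i,) for i in range(1, total_rows + 1)]
    )
    rows = conn.execute(
        f"""
        SELECT whatever,
               -- the RANK expression yields 0 .. n - 1, so add 1
               1 + {n} * (RANK() OVER (ORDER BY whatever) - 1)
                   / COUNT(*) OVER () AS via_rank,
               NTILE({n}) OVER (ORDER BY whatever) AS via_ntile
        FROM t
        ORDER BY whatever
        """
    ).fetchall()
    conn.close()
    return rows


even_split = compare_buckets(20, 4)
uneven_split = compare_buckets(10, 4)
```

For 20 rows and 4 groups both columns agree. For 10 rows they diverge: the RANK expression produces group sizes 3, 2, 3, 2, while NTILE produces 3, 3, 2, 2.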

Related

Convert column into the rows

This is my current result set of my query:
Question Sol25A Sol25B Sol25C Sol40A Sol40B
======================================================
A 1 4 2 6 0
B 2 3 2 1 9
C 6 7 1 0 8
======================================================
Total = 9 14 5 7 17
======================================================
And I want the result in this form:
Product Total
===============
Sol25A 9
Sol25B 14
Sol25C 5
Sol40A 7
Sol40B 17
Can you please provide the query? This would be a great help.
I would suggest that you unpivot using cross apply and then aggregate:
select product, sum(val)
from t cross apply
(values ('Sol25A', Sol25A), ('Sol25B', Sol25B), ('Sol25C', Sol25C),
('Sol40A', Sol40A), ('Sol40B', Sol40B)
) v(product, val)
group by product;
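CROSS APPLY is SQL Server syntax, so to try the unpivot-then-aggregate idea elsewhere, the sketch below swaps in a different technique: one SELECT per column glued together with UNION ALL, run in SQLite via Python's sqlite3 module. The table name t matches the answer; the helper names and the assumption that t holds only the three question rows (without the Total line) are illustrative.

```python
import sqlite3

PRODUCTS = ["Sol25A", "Sol25B", "Sol25C", "Sol40A", "Sol40B"]
ROWS = [("A", 1, 4, 2, 6, 0), ("B", 2, 3, 2, 1, 9), ("C", 6, 7, 1, 0, 8)]


def product_totals():
    """Unpivot the product columns with UNION ALL, then sum per product."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f"{p} INTEGER" for p in PRODUCTS)
    conn.execute(f"CREATE TABLE t (Question TEXT, {columns})")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    # One (product, val) SELECT per column stands in for the VALUES list.
    unpivot = " UNION ALL ".join(
        f"SELECT '{p}' AS product, {p} AS val FROM t" for p in PRODUCTS
    )
    rows = conn.execute(
        f"SELECT product, SUM(val) FROM ({unpivot}) v "
        "GROUP BY product ORDER BY product"
    ).fetchall()
    conn.close()
    return rows


totals = product_totals()
```

The result lists each product name with the same totals as the question's Total line: 9, 14, 5, 7 and 17.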

Calculate scattered rows' average in SQL Server?

The table looks like below:
testid stepid serverid duration
1 1 1 10
1 2 1 11
2 1 2 12
2 2 2 13
3 1 1 14
3 2 1 15
4 1 2 16
4 2 2 17
Four tests ran on two servers, and each test has 2 steps. I would like to calculate the average duration of each step across a given set of test ids, regardless of server. For example, if the given test ids are 1 and 2, the final table looks like this:
stepid avg_duration
1 (10 + 12) / 2
2 (11 + 13) / 2
This is just a group by, right?
select stepid, avg(duration)
from t
where testid in (1, 2)
group by stepid;
Note: You might want avg(duration*1.0) if you want "normal" division.
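A small sketch of the same GROUP BY, run against the question's sample rows in SQLite through Python's sqlite3 module. The helper name avg_duration_by_step is hypothetical. One difference from SQL Server worth noting: SQLite's avg() returns a floating-point value even for integer columns, so the duration*1.0 workaround is not needed in this sketch.

```python
import sqlite3

# (testid, stepid, serverid, duration) rows from the question
ROWS = [
    (1, 1, 1, 10), (1, 2, 1, 11),
    (2, 1, 2, 12), (2, 2, 2, 13),
    (3, 1, 1, 14), (3, 2, 1, 15),
    (4, 1, 2, 16), (4, 2, 2, 17),
]


def avg_duration_by_step(test_ids):
    """Average duration per stepid, restricted to the given test ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (testid INTEGER, stepid INTEGER, "
        "serverid INTEGER, duration INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    placeholders = ", ".join("?" for _ in test_ids)
    rows = conn.execute(
        f"SELECT stepid, AVG(duration) FROM t "
        f"WHERE testid IN ({placeholders}) "
        "GROUP BY stepid ORDER BY stepid",
        list(test_ids),
    ).fetchall()
    conn.close()
    return rows


averages = avg_duration_by_step([1, 2])
```

For test ids 1 and 2 this gives (10 + 12) / 2 for step 1 and (11 + 13) / 2 for step 2, as requested in the question.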

Using temporary extended table to make a sum

From a given table I want to be able to sum values having the same number (should be easy, right?)
Problem: A given value can be assigned from 2 to n consecutive numbers.
For some reason, this information is stored in a single row holding the value, the starting number, and the ending number, as below.
TABLE A
id | starting_number | ending_number | value
----+-----------------+---------------+-------
1 2 5 8
2 0 3 5
3 4 6 6
4 7 8 10
For instance the first row means:
value '8' is assigned to numbers: 2, 3 and 4 (5 is excluded)
So, I would like the following intermediate result table
TABLE B
id | number | value
----+--------+-------
1 2 8
1 3 8
1 4 8
2 0 5
2 1 5
2 2 5
3 4 6
3 5 6
4 7 10
So I can sum 'value' for elements having identical 'number'
SELECT number, sum(value)
FROM B
GROUP BY number
TABLE C
number | sum(value)
--------+------------
2 13
3 8
4 14
0 5
1 5
5 6
7 10
I don't know how to do this and didn't find any answer on the web (maybe not looking with appropriate key words...)
Any idea?
You can do what you want with generate_series(). So, TableB is basically:
select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA;
Your aggregation is then:
select n, sum(value)
from (select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA
) a
group by n;
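generate_series() is PostgreSQL-specific, so as a way to sanity-check the expected output, here is a plain Python sketch of the same expand-then-sum logic. It is not the SQL from the answer: range(start, end) plays the role of generate_series(starting_number, ending_number - 1, 1), and the names TABLE_A and sum_by_number are invented for the example.

```python
from collections import defaultdict

# (id, starting_number, ending_number, value) rows of TABLE A
TABLE_A = [(1, 2, 5, 8), (2, 0, 3, 5), (3, 4, 6, 6), (4, 7, 8, 10)]


def sum_by_number(rows):
    """Expand each row to one entry per number, then sum value per number."""
    totals = defaultdict(int)
    for _id, start, end, value in rows:
        # range() excludes the end, just like ending_number - 1 in the SQL
        for number in range(start, end):
            totals[number] += value
    return dict(sorted(totals.items()))


table_c = sum_by_number(TABLE_A)
```

The resulting mapping matches TABLE C from the question: number 2 sums to 13 (8 + 5) and number 4 to 14 (8 + 6).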

sql server 2008 - calculated and ordered list needs to return only 2 entries per supplier

I have a dataset like below, but longer. I want to ensure I am picking the 'fleet_id' in terms of their 'StarDriver' value overall, but I want to return only two results for each 'supplier_id' and return a max of 20 in total.
(I'm sorry, I didn't work out how to paste the data below with proper formatting. I couldn't find it in the toolbar above, and Google results were about copying data. I would also be grateful if someone could point out how.)
fleet_id supplier_id Ratings Driver Punctuality Car StarDriver
19442 151 10 5 5 5 5
19634 151 11 5 5 5 5
19437 151 12 5 5 5 5
12832 10 14 5 4.92857142857143 5 4.97619047619048
12217 111 10 5 5 4.9 4.96666666666667
21135 158 19 5 4.89473684210526 5 4.96491228070175
19436 151 14 4.85714285714286 5 5 4.95238095238095
12239 111 12 4.91666666666667 5 4.91666666666667 4.94444444444445
10520 92 12 4.91666666666667 5 4.91666666666667 4.94444444444445
19997 151 12 5 5 4.83333333333333 4.94444444444444
To limit to the top 2 for each supplier, use row_number(). This enumerates the rows within each supplier, and you can keep just two with where seqnum <= 2.
The rest of the query just selects the top 20 rows by StarDriver:
select top 20 t.*
from (select t.*,
row_number() over (partition by supplier_id order by StarDriver desc) as seqnum
from table t
) t
where seqnum <= 2
order by StarDriver desc;
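The row_number() approach can be tried on the question's sample data with the sketch below, which uses SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of SQL Server. Assumptions made for the example: the table is called fleets, only three of the columns are loaded, LIMIT replaces TOP, and fleet_id is added as a tie-breaker so the result is deterministic when StarDriver values are equal.

```python
import sqlite3

# (fleet_id, supplier_id, StarDriver) taken from the question's sample rows
FLEETS = [
    (19442, 151, 5.0), (19634, 151, 5.0), (19437, 151, 5.0),
    (12832, 10, 4.97619047619048), (12217, 111, 4.96666666666667),
    (21135, 158, 4.96491228070175), (19436, 151, 4.95238095238095),
    (12239, 111, 4.94444444444445), (10520, 92, 4.94444444444445),
    (19997, 151, 4.94444444444444),
]


def top_fleets(per_supplier=2, overall=20):
    """Best fleets by StarDriver, capped per supplier and overall."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fleets (fleet_id INTEGER, supplier_id INTEGER, "
        "StarDriver REAL)"
    )
    conn.executemany("INSERT INTO fleets VALUES (?, ?, ?)", FLEETS)
    rows = conn.execute(
        """
        SELECT fleet_id, supplier_id, StarDriver
        FROM (SELECT f.*,
                     ROW_NUMBER() OVER (PARTITION BY supplier_id
                                        ORDER BY StarDriver DESC, fleet_id)
                         AS seqnum
              FROM fleets f) ranked
        WHERE seqnum <= ?
        ORDER BY StarDriver DESC, fleet_id  -- fleet_id is an assumed tie-breaker
        LIMIT ?
        """,
        (per_supplier, overall),
    ).fetchall()
    conn.close()
    return rows


best = top_fleets()
```

On the sample rows, supplier 151 contributes only two of its five fleets, and the other suppliers contribute whatever they have, up to two each.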

Return the last sub sorted row in a table (sql)

It's quite hard to describe this problem, but it's easy to see it graphically:
x y
1 1
2 1
3 1
* 4 1 *
5 2
* 6 2 *
7 3
8 3
9 3
* 10 3 *
I have sorted a table by x, then sub-sorted by y. I need to return the x value of the last item in each sub-sorted section (the starred rows).
I'm aware of the LAST command, but I don't know how to apply this recursively i.e. to each sub-sorted section.
Best,
Dan
SELECT y, Max(x) FROM [table] group by Y
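To confirm that the grouped maximum picks out the starred rows, here is a brief sketch using SQLite via Python's sqlite3 module. The table name points and the helper last_x_per_y are hypothetical; the data is the ten rows from the question.

```python
import sqlite3

# (x, y) rows from the question
POINTS = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2),
          (7, 3), (8, 3), (9, 3), (10, 3)]


def last_x_per_y():
    """Return (y, largest x) for every distinct y."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (x INTEGER, y INTEGER)")
    conn.executemany("INSERT INTO points VALUES (?, ?)", POINTS)
    rows = conn.execute(
        "SELECT y, MAX(x) FROM points GROUP BY y ORDER BY y"
    ).fetchall()
    conn.close()
    return rows


last_rows = last_x_per_y()
```

This returns x values 4, 6 and 10 for y values 1, 2 and 3, which are the starred rows. It relies on x increasing within each y section, as it does in the question's data.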