split no. of rows into equal batches


I have a table and would like to split the rows into batches by adding a column such as batch_no.
For example, I have a table with a single column cust_no and 20 rows in total, and I want to split them into 5 batches numbered 1, 2, 3, 4 and 5.
Expected output:
cust_no batch_no
23123 1
2313 1
23 1
323123 1
123 1
23 2
213 2
123 2
2123 2
2123 2
23123 3
2313 3
23 3
323123 3
123 3
23 4
...
...
...
...

If you know the number of batches then use ntile():
select t.*,
ntile(5) over (order by null) as batch
from t;
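Note: The order by null means that you do not care about the ordering. It is arbitrary and not reproducible. You might want to include some criteria, say an id column or a creation date, so the batches are more homogeneous.
You can also assign batches with row_number() and arithmetic. Dividing gives batches of a fixed size: floor((row_number() over (order by null) - 1) / 4) + 1 puts 4 rows in each batch. The modulo version below instead deals the rows round-robin into 4 batches numbered 0 to 3: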
select t.*,
(row_number() over (order by null) % 4) as batch
from t;
In some databases, this would be:
select t.*,
mod(row_number() over (order by null), 4) as batch
from t;
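As a minimal sketch of the two approaches above, the following runs them against an in-memory SQLite database through Python's built-in sqlite3 module. The table name t, the generated cust_no values, and the choice of SQLite are assumptions made for illustration, and window functions need SQLite 3.25 or newer. It orders by cust_no rather than null so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cust_no INTEGER)")
# 20 made-up customer numbers, matching the row count in the question
conn.executemany("INSERT INTO t VALUES (?)", [(n,) for n in range(101, 121)])

rows = conn.execute(
    """
    SELECT cust_no,
           -- fixed number of batches: 5
           NTILE(5) OVER (ORDER BY cust_no) AS batch_no,
           -- fixed batch size: 4 rows each (integer division in SQLite)
           (ROW_NUMBER() OVER (ORDER BY cust_no) - 1) / 4 + 1 AS batch_of_4
    FROM t
    ORDER BY cust_no
    """
).fetchall()

# Count how many rows landed in each NTILE batch
batch_sizes = {}
for _, batch_no, _ in rows:
    batch_sizes[batch_no] = batch_sizes.get(batch_no, 0) + 1
print(batch_sizes)
```

With 20 rows, 5 batches and a batch size of 4 happen to coincide; with other row counts the two columns diverge, because one fixes the number of batches and the other fixes their size.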

Related

Average and sort by this based on other conditional columns in a table

I have a table in SQL Server 2017 like below:
Name Rank1 Rank2 Rank3 Rank4
Jack null 1 1 3
Mark null 3 2 2
John null 2 3 1
What I need to do is to add an average rank column then rank those names based on those scores. We ignore null ranks. Expected output:
Name Rank1 Rank2 Rank3 Rank4 AvgRank FinalRank
Jack null 1 1 3 1.66 1
Mark null 3 2 2 2.33 3
John null 2 3 1 2 2
My query now looks like this:
;with cte as (
select *, AvgRank= (Rank1+Rank2+Rank3+Rank4)/#NumOfRankedBy
from mytable
)
select *, FinalRank= row_number() over (order by AvgRank)
from cte
I am stuck at finding the value of #NumOfRankedBy, which should be 3 in our case because Rank1 is null for all.
What is the best way to approach such an issue?
Thanks.

Your conundrum stems from the fact that your table is not normalised: you are treating data (Rank) as structure (columns).
You should have a table for Ranks where each rank is a row; then your query is easy.
You can unpivot your columns into rows and then make use of avg, which ignores nulls:
select *, FinalRank = row_number() over (order by AvgRank)
from mytable
cross apply (
select Avg(r * 1.0) AvgRank
from (values(rank1),(rank2),(rank3),(rank4))r(r)
)r;
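The unpivot-then-average idea can be sketched outside SQL Server as well. The example below is an illustration only: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer), and because SQLite has no cross apply it swaps in a UNION ALL unpivot followed by GROUP BY. The CTE names unpivoted and averaged are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (name TEXT, rank1 INT, rank2 INT, rank3 INT, rank4 INT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [("Jack", None, 1, 1, 3), ("Mark", None, 3, 2, 2), ("John", None, 2, 3, 1)],
)

query = """
WITH unpivoted AS (          -- one row per (name, rank) pair
    SELECT name, rank1 AS r FROM mytable
    UNION ALL SELECT name, rank2 FROM mytable
    UNION ALL SELECT name, rank3 FROM mytable
    UNION ALL SELECT name, rank4 FROM mytable
),
averaged AS (                -- AVG skips NULLs, so rank1 is ignored
    SELECT name, AVG(r) AS avg_rank FROM unpivoted GROUP BY name
)
SELECT name, avg_rank,
       ROW_NUMBER() OVER (ORDER BY avg_rank) AS final_rank
FROM averaged
ORDER BY final_rank
"""
result = conn.execute(query).fetchall()
for name, avg_rank, final_rank in result:
    print(name, round(avg_rank, 2), final_rank)
```

Because the aggregate skips nulls, there is no need to compute the number of non-null rank columns separately.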

SQL compares the value of 2 columns and select the column with max value row-by-row

I have table something like:
GROUP NAME Value_1 Value_2
1 ABC 0 0
1 DEF 4 4
50 XYZ 6 6
50 QWE 6 7
100 XYZ 26 2
100 QWE 26 2
What I would like to do is group by GROUP and select the name with the highest Value_1. If the Value_1 values are tied, compare Value_2 and select the row with the higher one. If they are still tied, select the first one.
The output will be something like:
GROUP NAME Value_1 Value_2
1 DEF 4 4
50 QWE 6 7
100 XYZ 26 2
The challenge for me is that I don't know how many categories there are in NAME, so a simple CASE WHEN does not work. Thanks for your help.

You can use window functions to solve the bulk of your problem:
select t.*
from (select t.*,
row_number() over (partition by "group" order by value_1 desc, value_2 desc) as seqnum -- group is a reserved word, so it must be quoted
from t
) t
where seqnum = 1;
The one caveat is the condition:
If they're still the same, select the first one.
SQL tables represent unordered (multi-) sets. There is no "first" one unless a column specifies the ordering. The best you can do is choose an arbitrary value when all the other values are the same.
That said, you might have another column that has an ordering. If so, add that as a third key to the order by.
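To illustrate the tie-break, here is a small sketch run through Python's built-in sqlite3 module (SQLite 3.25 or newer). The id column is hypothetical: it stands in for whatever column defines "first" in your data, and it is added as the third key of the order by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is an assumed insertion-order column, not part of the original table
conn.execute(
    'CREATE TABLE t (id INTEGER PRIMARY KEY, "group" INT, name TEXT, '
    "value_1 INT, value_2 INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "ABC", 0, 0),
        (2, 1, "DEF", 4, 4),
        (3, 50, "XYZ", 6, 6),
        (4, 50, "QWE", 6, 7),
        (5, 100, "XYZ", 26, 2),
        (6, 100, "QWE", 26, 2),
    ],
)

winners = conn.execute(
    """
    SELECT "group", name, value_1, value_2
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY "group"
                                    ORDER BY value_1 DESC, value_2 DESC, id) AS seqnum
          FROM t) AS ranked
    WHERE seqnum = 1
    ORDER BY "group"
    """
).fetchall()
print(winners)
```

Group 100 is a full tie on both values, so only the extra id key makes the choice of XYZ deterministic.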

how to rank/group 3 rows each in sql

I have a table with 6 records, and need to get the results in the format below, grouping them into 3 rows each.
Input table:
id Value
-------------
1 abcd
2 defgh
3 ijkl
4 mnop
5 qrst
6 uvwx
Output format needed:
Rank id Value
--------------------
1 1 abcd
1 2 defgh
1 3 ijkl
2 4 mnop
2 5 qrst
2 6 uvwx

Here is one method:
select dense_rank() over (order by (id - 1)/3) as grp, id, value
from t;
This assumes, as in your sample data, that id starts at 1 and increases with no gaps, and that / performs integer division on integers.
If the ids have gaps, then an alternative is:
select dense_rank() over (order by seqnum/3) as grp, id, value
from (select t.*, row_number() over (order by id) - 1 as seqnum
from t
) t;
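A quick way to check the gap-tolerant version is to run it on ids that are not consecutive. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or newer) with made-up gapped ids. It computes the group directly as the integer quotient plus 1, which yields the same numbers as dense_rank over the quotient.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value TEXT)")
# ids deliberately have gaps and do not start at 1
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(2, "abcd"), (5, "defgh"), (9, "ijkl"), (10, "mnop"), (14, "qrst"), (20, "uvwx")],
)

grouped = conn.execute(
    """
    SELECT (ROW_NUMBER() OVER (ORDER BY id) - 1) / 3 + 1 AS grp, id, value
    FROM t
    ORDER BY id
    """
).fetchall()
for row in grouped:
    print(row)
```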

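You can use NTILE() here:
SELECT NTILE(2) OVER (ORDER BY id) AS grp, id FROM TABLE_NAME;
Think of it as buckets: NTILE(2) makes 2 buckets, so half the rows get the value 1 and the other half the value 2. With the 6 rows in the question that gives 3 rows per bucket; for a different row count, the number of buckets has to change to keep 3 rows in each.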
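The difference between a fixed bucket count and a fixed group size shows up as soon as the row count changes. This sketch, run through Python's built-in sqlite3 module (SQLite 3.25 or newer) on a hypothetical 7-row table, puts NTILE(2) next to the row_number division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 8)])  # 7 rows

compared = conn.execute(
    """
    SELECT id,
           NTILE(2) OVER (ORDER BY id) AS bucket,               -- always 2 buckets
           (ROW_NUMBER() OVER (ORDER BY id) - 1) / 3 + 1 AS grp -- always 3 rows per group
    FROM t
    ORDER BY id
    """
).fetchall()

buckets = [row[1] for row in compared]
groups = [row[2] for row in compared]
print(buckets)
print(groups)
```

NTILE gives the extra row to the first bucket, while the division approach opens a third group for it.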

Add a series column that will increment if reaches the maximum number

I would like to add a column that will count according to the max number of series.
I have generated the SERIES_NO column using below:
MOD(ROW_NUMBER() OVER (ORDER BY item_code, loc_code, cargo_sts) - 1, 3) + 1
In this case the Max series no is 3. Below is the sample result set. Now, I want to generate the SHEET_NO column. Any suggestion? Thanks.
CARGO_STS LOC_CODE ITEM_CODE AVAIL_QTY SERIES_NO SHEET_NO
NORMAL D1867BD1 0000044500 6 1 1
NORMAL D1947GD1 0000055401 2 2 1
NORMAL D3351AA1 0000058000 2 3 1
NORMAL D1945DC2 0000058201 1 1 2
STO-DAMAGE 205-12BB 0000058300 1 2 2
NORMAL D3446FB1 0000058300 1 3 2
NORMAL Q00-37CA 0000060401 128 1 3
NORMAL D1158FA1 0000079901 36 2 3

Something like the following will do (assuming / is not integer division, as in Oracle):
ceil (ROW_NUMBER() OVER (ORDER BY item_code, loc_code, cargo_sts) /3 )
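Here is a small sketch of SERIES_NO and SHEET_NO computed side by side on the sample rows, using Python's built-in sqlite3 module (SQLite 3.25 or newer). The table name stock is hypothetical. Since SQLite divides integers as integers, the sketch uses (rn - 1) / 3 + 1 in place of ceil(rn / 3); both give the same sheet numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (cargo_sts TEXT, loc_code TEXT, item_code TEXT)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [
        ("NORMAL", "D1867BD1", "0000044500"),
        ("NORMAL", "D1947GD1", "0000055401"),
        ("NORMAL", "D3351AA1", "0000058000"),
        ("NORMAL", "D1945DC2", "0000058201"),
        ("STO-DAMAGE", "205-12BB", "0000058300"),
        ("NORMAL", "D3446FB1", "0000058300"),
        ("NORMAL", "Q00-37CA", "0000060401"),
        ("NORMAL", "D1158FA1", "0000079901"),
    ],
)

numbered = conn.execute(
    """
    WITH ordered AS (
        SELECT item_code, loc_code,
               ROW_NUMBER() OVER (ORDER BY item_code, loc_code, cargo_sts) AS rn
        FROM stock
    )
    SELECT item_code, loc_code,
           (rn - 1) % 3 + 1 AS series_no,  -- cycles 1, 2, 3
           (rn - 1) / 3 + 1 AS sheet_no    -- advances after every 3 rows
    FROM ordered
    ORDER BY rn
    """
).fetchall()

series = [row[2] for row in numbered]
sheets = [row[3] for row in numbered]
print(series)
print(sheets)
```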

How to find the SQL medians for a grouping

I am working with SQL Server 2008
If I have a Table as such:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227

The solution using rank works nicely when you have an odd number of members in each group, i.e. when the median exists within the sample. With an even number of members the rank method falls down, e.g.
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger), but the rank method will return 3. To get around this, you need to take the top value from the bottom half of the group and the bottom value from the top half of the group, and average the two.
WITH CTE AS
( SELECT Code,
Value,
[half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
[half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
FROM T
WHERE Value IS NOT NULL
)
SELECT Code,
(MAX(CASE WHEN Half1 = 1 THEN Value END) +
MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM CTE
GROUP BY Code;
In SQL Server 2012 and later you can use PERCENTILE_CONT:
SELECT DISTINCT
Code,
Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM T;
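The NTILE(2) approach above can be checked against the sample data from the question. The sketch below runs it through Python's built-in sqlite3 module (SQLite 3.25 or newer); SQLite is only a stand-in for SQL Server here, and it has no PERCENTILE_CONT, so only the NTILE query is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
     (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)],
)

medians = dict(
    conn.execute(
        """
        WITH cte AS (
            SELECT code, value,
                   NTILE(2) OVER (PARTITION BY code ORDER BY value)      AS half1,
                   NTILE(2) OVER (PARTITION BY code ORDER BY value DESC) AS half2
            FROM t
            WHERE value IS NOT NULL
        )
        -- top of the lower half and bottom of the upper half, averaged
        SELECT code,
               (MAX(CASE WHEN half1 = 1 THEN value END) +
                MIN(CASE WHEN half2 = 1 THEN value END)) / 2.0 AS median
        FROM cte
        GROUP BY code
        """
    ).fetchall()
)
print(medians)
```

For odd-sized groups the middle row falls into the first bucket in both orderings, so it is averaged with itself and comes out unchanged.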

SQL Server 2008 does not have a built-in function to calculate medians, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
SELECT Code, Value,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY VALUE) AS Rnk,
COUNT(*) OVER (PARTITION BY Code) AS Cnt
FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE for the first eight rows of the sample data:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
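Now from this result set, if you only return those rows where Rnk equals Cnt / 2 + 1 (integer division), you get one row per group. For groups with an odd number of rows that is the median; for groups with an even number of rows it is the upper of the two middle values rather than their average. NULL values are ranked along with the rest (first, in ascending order), so filter them out with WHERE Value IS NOT NULL if they should not count.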
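One possible variation on the ROW_NUMBER approach, sketched here as an assumption rather than as part of the original answer, is to keep both middle rows and average them. For a group of Cnt rows the middle positions are (Cnt + 1) / 2 and (Cnt + 2) / 2 under integer division; they coincide when Cnt is odd. The example runs through Python's built-in sqlite3 module (SQLite 3.25 or newer) on the full sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (code INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
     (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)],
)

row_number_medians = dict(
    conn.execute(
        """
        WITH ranked AS (
            SELECT code, value,
                   ROW_NUMBER() OVER (PARTITION BY code ORDER BY value) AS rnk,
                   COUNT(*)     OVER (PARTITION BY code)                AS cnt
            FROM mytable
            WHERE value IS NOT NULL   -- NULLs must not count toward cnt
        )
        SELECT code, AVG(value) AS median
        FROM ranked
        WHERE rnk IN ((cnt + 1) / 2, (cnt + 2) / 2)  -- one or two middle rows
        GROUP BY code
        """
    ).fetchall()
)
print(row_number_medians)
```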
