How to split data in ratio using SQL - sql

Suppose we have a table with columns X and Y where Y is the total count of values present in X.
Column X    Column Y
3000        23
8000        50
4000        20
9000        70
5000        64
How can I split the rows for each X in an 8:1:1 ratio of column Y?
Example: Y is 23, so an 8:1:1 split of Y is approximately 18, 3 and 2: 18 rows for train, 3 rows for test and 2 rows for val.
Similarly, an 8:1:1 split of 64 is 51, 7 and 6.
Expected output table is like this:
Column X    Column Y    Column Z
3000        1           Train
3000        .           Train
3000        .           Train
3000        18          Train
3000        1           Test
3000        .           Test
3000        3           Test
3000        1           Val
3000        2           Val
8000        1           Train
8000        .           Train
8000        40          Train
8000        1           Test
8000        .           Test
8000        5           Test
8000        1           Val
8000        .           Val
8000        5           Val
4000        1           Train
4000        .           Train
4000        .           Train
4000        16          Train
4000        1           Test
4000        2           Test
4000        1           Val
4000        2           Val
5000        1           Train
5000        .           Train
5000        51          Train
5000        1           Test
5000        .           Test
5000        .           Test
5000        7           Test
5000        1           Val
5000        .           Val
5000        6           Val
To summarize, I want to split all the rows into train, test and val sets in the proportion 8:1:1, based on the value of column Y.
I have done a similar task in Pandas, but I am unable to do it in SQL.

Here's a brute-force method that will work. I had to make an assumption about what you mean by 8:1:1 so that the results are integers:
Val8 = FLOOR(ColumnY * 0.8)
Val1a = FLOOR(ColumnY * 0.1)
Val1b = ColumnY - Val8 - Val1a
So, you may need to adjust if you have clearer requirements.
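To see what this assumption does to the question's numbers, here is a minimal Python sketch of the same three formulas. The function name `split_counts` is made up for illustration, and integer arithmetic stands in for FLOOR. Note that the remainder lands in the last bucket, so 23 becomes 18/2/3 rather than the 18/3/2 shown in the question.

```python
def split_counts(total):
    """Return (train, test, val) sizes for an 8:1:1 split of a non-negative int."""
    train = total * 8 // 10      # same as FLOOR(total * 0.8), without float rounding
    test = total // 10           # same as FLOOR(total * 0.1)
    val = total - train - test   # the remainder, so the three always sum to total
    return train, test, val


for y in (23, 50, 20, 70, 64):
    print(y, split_counts(y))
```

If the extra rows should go to test instead of val, swapping the last two assignments is probably the smallest change.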
The code below is written for SQL Server. Among other possible differences, other RDBMSs (such as PostgreSQL or MySQL) require WITH RECURSIVE for the recursive CTE to process without error.
with a as (
    -- sample data from the question
    select *
    from (
        values (1, 3000, 23)
             , (2, 8000, 50)
             , (3, 4000, 20)
             , (4, 9000, 70)
             , (5, 5000, 64)
    ) t (id, ColumnX, ColumnY)
),
b as (
    -- Train and Test sizes, truncated to integers
    select id
         , 'Train' as ColumnZ
         , ColumnX
         , cast(ColumnY * 0.8 as int) as ColumnY
    from a
    union all
    select id
         , 'Test' as ColumnZ
         , ColumnX
         , cast(ColumnY * 0.1 as int) as ColumnY
    from a
),
c as (
    -- Val gets whatever is left after Train and Test
    select b.id
         , 'Val' as ColumnZ
         , b.ColumnX
         , a.ColumnY - sum(b.ColumnY) as ColumnY
    from b
    inner join a on a.id = b.id
    group by b.id
           , b.ColumnX
           , a.ColumnY
),
iterator as (
    -- numbers 1..m; m is taken from a so that it also covers the largest Train split
    -- (SQL Server allows 100 recursion levels by default; see OPTION (MAXRECURSION n))
    select 1 as ColumnY
         , max(ColumnY) as m
    from a
    union all
    select ColumnY + 1
         , m
    from iterator
    where ColumnY < m
)
select b.ColumnX
     , i.ColumnY
     , b.ColumnZ
from b
inner join iterator i on i.ColumnY <= b.ColumnY
union all
select c.ColumnX
     , i.ColumnY
     , c.ColumnZ
from c
inner join iterator i on i.ColumnY <= c.ColumnY
order by 1, 3, 2
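For readers not on SQL Server, here is a rough sketch of the same idea for SQLite, run through Python's sqlite3 module so it can be tried without a database server. The CTE names `counts` and `parts`, the `ord` sort column and the function name `split_rows` are my own assumptions, and integer division replaces the cast. Treat it as an illustration, not a drop-in translation.

```python
import sqlite3

SPLIT_SQL = """
WITH RECURSIVE
a(id, ColumnX, ColumnY) AS (
    VALUES (1, 3000, 23), (2, 8000, 50), (3, 4000, 20), (4, 9000, 70), (5, 5000, 64)
),
counts AS (
    -- integer division truncates, which matches FLOOR for non-negative values
    SELECT id, ColumnX,
           ColumnY * 8 / 10 AS n_train,
           ColumnY / 10 AS n_test,
           ColumnY - ColumnY * 8 / 10 - ColumnY / 10 AS n_val
    FROM a
),
parts AS (
    SELECT id, ColumnX, 'Train' AS ColumnZ, 1 AS ord, n_train AS n FROM counts
    UNION ALL
    SELECT id, ColumnX, 'Test', 2, n_test FROM counts
    UNION ALL
    SELECT id, ColumnX, 'Val', 3, n_val FROM counts
),
iterator(i, m) AS (
    -- numbers 1..m, where m is the largest ColumnY in the input
    SELECT 1, (SELECT MAX(ColumnY) FROM a)
    UNION ALL
    SELECT i + 1, m FROM iterator WHERE i < m
)
SELECT p.ColumnX, it.i AS ColumnY, p.ColumnZ
FROM parts p
JOIN iterator it ON it.i <= p.n
ORDER BY p.id, p.ord, it.i
"""


def split_rows():
    """Run the split query in an in-memory SQLite database and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(SPLIT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in split_rows()[:5]:
        print(row)
```

Each input row produces exactly ColumnY output rows, so the five sample rows expand to 23 + 50 + 20 + 70 + 64 = 227 rows in total.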

Related

Value Calculation on oracle sql

I have this table:
CREATE TABLE TEST
(
TITLE VARCHAR2(199 BYTE),
AMOUNT NUMBER,
VALUE NUMBER
)
and this INSERT statement:
INSERT INTO TEST (TITLE, AMOUNT, VALUE)
VALUES ('Switch', 3000, 12);
COMMIT;
We have an amount of 3000 and a value of 12, and now we need to calculate the amount multiplied by each number from 1 to 12:
3000 multiplied by 1 = 3000
3000 multiplied by 2 = 6000
3000 multiplied by 3 = 9000
3000 multiplied by 4 = 12000
3000 multiplied by 5 = 15000
3000 multiplied by 6 = 18000
3000 multiplied by 7 = 21000
3000 multiplied by 8 = 24000
3000 multiplied by 9 = 27000
3000 multiplied by 10 = 30000
3000 multiplied by 11 = 33000
3000 multiplied by 12 = 36000
Output is needed in the following format.
Title Amount Total
Switch 30000 3000 6000 9000 12000 15000 18000 21000 24000 27000 30000 33000 36000 231000
plug
board
Can somebody help me how to get this output in SQL?
You can use a recursive query:
WITH data (title, amount, value, idx) AS (
  SELECT title, amount, value, 1
  FROM   test
  UNION ALL
  SELECT title, amount, value, idx + 1
  FROM   data
  WHERE  idx < value
) SEARCH DEPTH FIRST BY title SET order_num
SELECT title, amount * idx AS value
FROM   data;
Or a correlated hierarchical query:
SELECT t.title, t.amount * l.idx AS value
FROM   test t
       CROSS JOIN LATERAL (
         SELECT LEVEL AS idx FROM DUAL CONNECT BY LEVEL <= t.value
       ) l;
Which, for the sample data:
CREATE TABLE TEST ( TITLE VARCHAR2(199 BYTE), AMOUNT NUMBER, VALUE NUMBER );
INSERT INTO TEST ( TITLE, AMOUNT, VALUE ) VALUES ( 'Switch', 3000, 12);
Both output:
TITLE     VALUE
Switch    3000
Switch    6000
Switch    9000
Switch    12000
Switch    15000
Switch    18000
Switch    21000
Switch    24000
Switch    27000
Switch    30000
Switch    33000
Switch    36000
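The recursive query is not Oracle-specific in spirit. As a rough sketch, the same row generation can be tried in SQLite through Python's sqlite3 module; here WITH RECURSIVE plus an ORDER BY replaces Oracle's SEARCH clause, and the function name `multiples` is an assumption of this example.

```python
import sqlite3


def multiples():
    """Return (title, amount * idx) for idx = 1..value, using a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (title TEXT, amount INTEGER, value INTEGER)")
        conn.execute("INSERT INTO test VALUES ('Switch', 3000, 12)")
        return conn.execute(
            """
            WITH RECURSIVE data (title, amount, value, idx) AS (
                SELECT title, amount, value, 1 FROM test
                UNION ALL
                SELECT title, amount, value, idx + 1 FROM data WHERE idx < value
            )
            SELECT title, amount * idx AS value
            FROM data
            ORDER BY title, idx
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for title, value in multiples():
        print(title, value)
```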
Or for your output format:
WITH data (title, amount, value, idx) AS (
  SELECT title, amount, value, 1
  FROM   test
  UNION ALL
  SELECT title, amount, value, idx + 1
  FROM   data
  WHERE  idx < value
) SEARCH DEPTH FIRST BY title SET order_num
SELECT title,
       LISTAGG(amount * idx, ' ') WITHIN GROUP (ORDER BY idx) AS amounts,
       SUM(amount * idx) AS total
FROM   data
GROUP BY title;
or
SELECT t.title,
       l.amounts,
       t.amount * t.value * (t.value + 1) / 2 AS total
FROM   test t
       CROSS JOIN LATERAL (
         SELECT LISTAGG(LEVEL * t.amount, ' ') WITHIN GROUP (ORDER BY LEVEL) AS amounts
         FROM   DUAL CONNECT BY LEVEL <= t.value
       ) l;
Which both output:
TITLE     AMOUNTS                                                                  TOTAL
Switch    3000 6000 9000 12000 15000 18000 21000 24000 27000 30000 33000 36000    234000
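The second query computes the total without summing the generated rows. As a short check of that closed form, writing a for AMOUNT and n for VALUE (symbols introduced here only for the derivation):

```latex
\text{total} \;=\; \sum_{k=1}^{n} a\,k \;=\; a\,\frac{n(n+1)}{2},
\qquad
a = 3000,\; n = 12 \;\Rightarrow\; 3000 \cdot \frac{12 \cdot 13}{2} = 3000 \cdot 78 = 234000
```

This agrees with the SUM(amount * idx) result of the recursive version.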
Try it like this:
WITH
tbl AS
(
Select 'Switch' "TITLE", 3000 "AMOUNT", 12 "VAL" From Dual
)
--
Select TITLE, AMOUNT, TOTAL
From (Select LEVEL "ID", TITLE "TITLE", Sum(AMOUNT * LEVEL) OVER() "TOTAL",
LISTAGG(AMOUNT * LEVEL, ' ') WITHIN GROUP (ORDER BY LEVEL) OVER() "AMOUNT"
From tbl
Connect By LEVEL <= VAL )
Where ID = 1
--
-- R e s u l t :
-- TITLE AMOUNT TOTAL
-- ------- --------------------------------------------------------------------- ------
-- Switch 3000 6000 9000 12000 15000 18000 21000 24000 27000 30000 33000 36000 234000
Thanks for your help.
I have Oracle 10g, which does not support LISTAGG. What do I need to do in 10g?

Case when statement with summed values in SQL

I have a dataset with two columns. I want to categorise one of the columns into bins, and then sum the values in the other column that are within each bin.
I have tried the following code
select DISTINCT (
    CASE WHEN H=1 THEN '1'
         WHEN H BETWEEN 2 AND 3 THEN '2-3'
         WHEN H BETWEEN 4 AND 6 THEN '4-6'
         ELSE '' END
    ) AS H , sum(V) [V]
from
    TABLE1 inner join TABLE2 on TABLE1.X=TABLE2.X
where
    TABLE1.X=1 and Y='id'
GROUP BY H
ORDER BY H ASC
The table below gives a sample of my data (where H and V are headers)
H V
1 100
1 1000
1 1500
2 300
3 500
4 9000
5 800
6 1100
My desired output is
H V
1 2600
2 TO 3 800
4 TO 6 10900
However, I am getting duplicated bins, because column V is not being summed across all values in each bin:
H V
1 100
1 1000
1 1500
2-3 300
2-3 500
4-6 9000
4-6 800
4-6 1100
You seem to want aggregation on a computed column:
select (CASE WHEN H = 1 THEN '1'
             WHEN H BETWEEN 2 AND 3 THEN '2-3'
             WHEN H BETWEEN 4 AND 6 THEN '4-6'
             ELSE ''
        END) AS H , sum(V) as V
from TABLE1 inner join
     TABLE2
     on TABLE1.X = TABLE2.X
where TABLE1.X = 1 and Y = 'id'
GROUP BY (CASE WHEN H = 1 THEN '1'
               WHEN H BETWEEN 2 AND 3 THEN '2-3'
               WHEN H BETWEEN 4 AND 6 THEN '4-6'
               ELSE ''
          END)
ORDER BY MIN(H) ASC;
You should qualify all column references in the query.
SELECT DISTINCT is almost never appropriate with GROUP BY.
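As a quick way to check the grouping on the sample data, here is a small sketch that runs the same pattern in SQLite through Python's sqlite3 module. The single table `t`, the alias `bin` and the function name `binned_totals` are assumptions made for the example; the join and the WHERE filter from the question are left out because their data was not shown.

```python
import sqlite3

SAMPLE = [(1, 100), (1, 1000), (1, 1500), (2, 300),
          (3, 500), (4, 9000), (5, 800), (6, 1100)]


def binned_totals():
    """Sum V per bin of H by grouping on the CASE expression itself."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (H INTEGER, V INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", SAMPLE)
        return conn.execute(
            """
            SELECT CASE WHEN H = 1 THEN '1'
                        WHEN H BETWEEN 2 AND 3 THEN '2-3'
                        WHEN H BETWEEN 4 AND 6 THEN '4-6'
                        ELSE '' END AS bin,
                   SUM(V) AS total
            FROM t
            GROUP BY CASE WHEN H = 1 THEN '1'
                          WHEN H BETWEEN 2 AND 3 THEN '2-3'
                          WHEN H BETWEEN 4 AND 6 THEN '4-6'
                          ELSE '' END
            ORDER BY MIN(H)
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for bin_label, total in binned_totals():
        print(bin_label, total)
```

Giving the computed column a name different from the source column (here `bin` rather than `H`) also avoids any ambiguity about which one GROUP BY refers to.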

Teradata query pagination - batch every 1000 records

I've this Teradata query:
WITH ID(ROW_NUM) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY PRSN_ID) AS ROW_NUM
FROM MyTable
WHERE ACTIVE_IND = 'Y'
GROUP BY PRSN_ID
)
SELECT ROW_NUM-ROW_NUM MOD 2 AS FirstIndex,
ROW_NUM-(ROW_NUM-1) MOD 2 AS SecondIndex
FROM ID
WHERE ROW_NUM MOD 2=1
This query will generate an ID column, and the result will be something like this:
FirstIndex SecondIndex
0 1
2 3
4 5
. .
. .
etc etc
I would like to change the selection so that it batches every 1000 records, like below:
FirstIndex SecondIndex
0 1000
1001 2000
2001 3000
3001 4000
. .
. .
etc etc
Your help is appreciated.
Your calculation is way too complex. Because of the ROW_NUM MOD 2=1 filter, it simplifies to:
ROW_NUM-1 AS FirstIndex,
ROW_NUM AS SecondIndex
To get your requested range you simply need to modify the calculation, e.g.
SELECT
CASE WHEN ROW_NUM = 1 THEN 0 ELSE SecondIndex-999 END AS FirstIndex,
(ROW_NUM+2)/2 * 1000 AS SecondIndex
FROM ID
WHERE ROW_NUM MOD 2=1
But what do you want to do with that result, batch what?
Edit:
It's still unclear why you need a 2nd table to calculate the ranges, but this creates ranges starting from 1 for any pagesize of n rows:
WITH ID(ROW_NUM) AS
( -- just to get some rows
SELECT day_of_calendar AS row_num
FROM sys_calendar.CALENDAR
WHERE row_num BETWEEN 1 AND 10
)
SELECT 1000 AS RowsPerPage,
Row_Num AS page_num,
rownum_to - (RowsPerPage-1) AS rownum_from,
page_num * RowsPerPage AS rownum_to
FROM ID
GROUP BY page_num
ORDER BY page_num
RowsPerPage    page_num    rownum_from    rownum_to
1000           1           1              1000
1000           2           1001           2000
1000           3           2001           3000
1000           4           3001           4000
1000           5           4001           5000
1000           6           5001           6000
1000           7           6001           7000
1000           8           7001           8000
1000           9           8001           9000
1000           10          9001           10000
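The arithmetic behind that table is just rownum_from = (page_num - 1) * n + 1 and rownum_to = page_num * n. A hedged sketch of the same calculation, using a recursive CTE in SQLite via Python's sqlite3 module instead of Teradata's sys_calendar (the function name `page_ranges` and its parameters are assumptions, and total_rows is taken to be at least 1):

```python
import sqlite3

PAGES_SQL = """
WITH RECURSIVE pages(page_num) AS (
    SELECT 1
    UNION ALL
    SELECT page_num + 1 FROM pages
    WHERE page_num * :n < :total      -- stop once the pages cover every row
)
SELECT page_num,
       (page_num - 1) * :n + 1 AS rownum_from,
       page_num * :n           AS rownum_to
FROM pages
ORDER BY page_num
"""


def page_ranges(total_rows, rows_per_page=1000):
    """Return (page_num, rownum_from, rownum_to) tuples covering total_rows rows."""
    conn = sqlite3.connect(":memory:")
    try:
        params = {"n": rows_per_page, "total": total_rows}
        return conn.execute(PAGES_SQL, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for page in page_ranges(10000):
        print(page)
```

When total_rows is not a multiple of the page size, the last rownum_to overshoots the real row count, which is harmless if the range is only used in a BETWEEN filter.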
You can try the query below; it will give you the desired sequence.
select ROW_NUM,
       FirstIndex,
       coalesce(case when SecondIndex < 1000 then (SecondIndex + 2000) else (SecondIndex + 1000) end,
                FirstIndex + 1001) as SecondIndex
from (
    SELECT ROW_NUM,
           case when (FirstIndex < 1000) then (FirstIndex + 1001) else (FirstIndex + 1) end as FirstIndex,
           (min(FirstIndex) OVER (ORDER BY FirstIndex ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)) SecondIndex
    FROM (
        Select (ROW_NUMBER() OVER (ORDER BY PRSN_ID) - 1) AS ROW_NUM,
               (ROW_NUMBER() OVER (ORDER BY PRSN_ID) - 1) AS FirstIndex
        from Mytable
        GROUP BY PRSN_ID
    ) A
) B
order by ROW_NUM;

SQL expand row mapping every grouped value to a coordinated value

I have a table where I'm doing some iterative calculations for an engineering application. ColD has been calculated by another query, so this is basically an attempt to find a best fit. Part of the best-fit strategy is to take each grouping of ColC (groupings are signified by ColB) and have each value in the group reference every value of ColD in that group.
In essence, I need Table A to be converted to Table B
Table A:
ColA ColB ColC ColD
1 1 A 200
2 2 B 300
3 3 C 400
4 1 X 200
5 2 Y 400
6 3 Z 600
Table B:
A 200
B 200
C 200
A 300
B 300
C 300
A 400
B 400
C 400
X 200
Y 200
Z 200
X 400
Y 400
Z 400
X 600
Y 600
Z 600
It looks like you want something like this:
WITH cte
AS
(
    SELECT
        ColA
        , ColB
        , ColC
        , ColD
        -- a new group starts whenever ColB stops increasing
        , SUM(CASE WHEN L < ColB THEN 0 ELSE 1 END) OVER (ORDER BY ColA) GroupID
    FROM
    (
        SELECT
            ColA
            , ColB
            , ColC
            , ColD
            , LAG(ColB, 1, NULL) OVER (ORDER BY ColA) L
        FROM YourTable
    ) Q
)
SELECT
    C1.ColC
    , C2.ColD
FROM
    cte C1
    JOIN cte C2 ON C1.GroupID = C2.GroupID
-- C1.ColA as a tie-breaker keeps the rows in the order shown in Table B
ORDER BY C2.ColA, C1.ColA
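To try the grouping logic on the sample data, here is a minimal sketch that runs essentially the same query in SQLite through Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the function name `expand_groups` is made up for the example, and the ORDER BY includes C1.ColA so the output order is deterministic.

```python
import sqlite3

TABLE_A = [(1, 1, "A", 200), (2, 2, "B", 300), (3, 3, "C", 400),
           (4, 1, "X", 200), (5, 2, "Y", 400), (6, 3, "Z", 600)]


def expand_groups():
    """Pair every ColC in a group with every ColD of the same group."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE YourTable (ColA INTEGER, ColB INTEGER, ColC TEXT, ColD INTEGER)"
        )
        conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", TABLE_A)
        return conn.execute(
            """
            WITH cte AS (
                SELECT ColA, ColC, ColD,
                       -- running count of "ColB did not increase" marks the group
                       SUM(CASE WHEN L < ColB THEN 0 ELSE 1 END)
                           OVER (ORDER BY ColA) AS GroupID
                FROM (SELECT ColA, ColB, ColC, ColD,
                             LAG(ColB) OVER (ORDER BY ColA) AS L
                      FROM YourTable) Q
            )
            SELECT C1.ColC, C2.ColD
            FROM cte C1
            JOIN cte C2 ON C1.GroupID = C2.GroupID
            ORDER BY C2.ColA, C1.ColA
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for col_c, col_d in expand_groups():
        print(col_c, col_d)
```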

An sql statement to select the closest values

In the table below, which I'll call TableA, there are two numerical columns. I need to create a SELECT statement in which a value of B is specified and either one or two rows are returned. I'm not sure whether this can be done in a single SQL statement. If a row exists where the value of B matches, then just that row is returned. If the value lies between two existing values of B, the rows with the two closest values are both returned. If a larger value exists but no smaller one, then the row with the larger value is returned. If no larger value exists but a smaller one does, then the row with the smaller value is returned. Here are some examples. It would be nice if the SQL worked in SQLite:
A B
50 400
10 200
30 100
40 800
20 500
B = 10
A B
30 100
----------
B = 250
A B
10 200
50 400
----------
B = 100
A B
30 100
----------
B = 410
A B
50 400
20 500
----------
B = 900
A B
40 800
SELECT * FROM A WHERE B = 10
UNION
SELECT * FROM A WHERE B = (SELECT MAX(B) FROM A WHERE B < 10)
UNION
SELECT * FROM A WHERE B = (SELECT MIN(B) FROM A WHERE B > 10);
See it working live in an sqlfiddle.
SELECT * FROM TableA WHERE B = (SELECT MAX(B) FROM TableA WHERE B <= 10)
UNION
SELECT * FROM TableA WHERE B = (SELECT MIN(B) FROM TableA WHERE B >= 10)
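Since the question asks for SQLite, here is a small sketch that runs this last query against the sample table using Python's sqlite3 module. The named parameter `:target`, the added ORDER BY B and the function name `closest_rows` are assumptions of the example, not part of the answer above.

```python
import sqlite3

CLOSEST_SQL = """
SELECT * FROM TableA WHERE B = (SELECT MAX(B) FROM TableA WHERE B <= :target)
UNION
SELECT * FROM TableA WHERE B = (SELECT MIN(B) FROM TableA WHERE B >= :target)
ORDER BY B
"""

SAMPLE = [(50, 400), (10, 200), (30, 100), (40, 800), (20, 500)]


def closest_rows(target):
    """Return the exact match for B, or its nearest neighbour(s) in TableA."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE TableA (A INTEGER, B INTEGER)")
        conn.executemany("INSERT INTO TableA VALUES (?, ?)", SAMPLE)
        return conn.execute(CLOSEST_SQL, {"target": target}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for b in (10, 250, 100, 410, 900):
        print(b, closest_rows(b))
```

An exact match satisfies both subqueries with the same value, and UNION removes the duplicate, which is why a single row comes back in that case.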