Split a column into multiple columns - sql

select distinct account_num from account order by account_num;
The above query gave the below result
account_num
1
2
4
7
12
18
24
37
45
59
I want to split the account_num column into tuples of three account_num's, like (1,2,4); (7,12,18); (24,37,45); (59). The last tuple has only one entry because there are no more account_num's left. Now I want a query that outputs the min and max of each tuple. (Please observe that the max of one tuple is less than the min of the next tuple.) The desired output is shown below:
1 4
7 18
24 45
59 59
Edit: I have explained my requirement in the best way I could

You can use the example below as a starting point; it is only based on the information you have provided so far. For further details, you can consult Oracle's analytic functions documentation:
with src as ( -- create the source data
  select 1 col from dual union
  select 2 from dual union
  select 4 from dual union
  select 7 from dual union
  select 12 from dual union
  select 18 from dual union
  select 24 from dual union
  select 37 from dual union
  select 45 from dual union
  select 59 from dual
)
select
  col,
  decode(col_2, 0, max_col, col_2) col_2 -- for the last row, fall back to the maximum value
from (
  select
    col,
    lead(col, 2, 0) over (order by col) col_2, -- the value from two rows ahead
    max(col) over () max_col, -- the overall max, used for the last row in the result
    rownum rn -- the row number, used to keep every third row
  from src
) where mod(rn - 1, 3) = 0 -- only keep rows in steps of three

This is another solution.
SELECT *
FROM (SELECT DISTINCT MIN(val) over(PARTITION BY gr) min_,
                      MAX(val) over(PARTITION BY gr) max_
      FROM (SELECT val,
                   decode(trunc(rn / 3), rn / 3, rn / 3, ceil(rn / 3)) gr
            FROM (SELECT val,
                         row_number() over(ORDER BY val) rn
                  FROM (select distinct account_num val from account))))
ORDER BY min_
UPDATED
Solution without analytic functions.
SELECT MIN(val) min_,
       MAX(val) max_
FROM (SELECT val,
             ceil(rn / 3) gr
      FROM (SELECT val,
                   rownum rn
            FROM A_DEL_ME))
GROUP BY gr
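The `ceil(rn / 3)` bucketing that both answers rely on can be sketched outside the database. The following is an illustrative Python sketch of the same grouping logic (not part of any answer's query); `chunk_min_max` is a hypothetical helper name:

```python
from math import ceil

def chunk_min_max(values, size=3):
    """Mimic the SQL grouping: assign each row gr = ceil(rn / size),
    then take MIN and MAX per group."""
    ordered = sorted(values)
    groups = {}
    for rn, val in enumerate(ordered, start=1):
        groups.setdefault(ceil(rn / size), []).append(val)
    return [(min(g), max(g)) for _, g in sorted(groups.items())]

account_nums = [1, 2, 4, 7, 12, 18, 24, 37, 45, 59]
print(chunk_min_max(account_nums))
# [(1, 4), (7, 18), (24, 45), (59, 59)]
```

The last group has fewer than three members, so its min and max coincide (59, 59), matching the desired output.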

Please add more information about what you want to do. What is the connection between account_num 1 and 4, or 7 and 18? Is there any? If not, why would you want to split this into two columns, and what is the rule for splitting it?
With what you have posted, you could do something like this:
select 1 as account_num, 4 as account_num1 from dual
union all select 7 as account_num, 18 as account_num1 from dual
...
and so on, but I don't see the use for this.

Related

How to filter an ordered table until the first row where a threshold is reached?

I'm using PostgreSQL and I have a table with the following columns: id, distance, and length. I want to order the table by distance and create a new column called cum_length using a window function. I also want to filter the rows so that only rows up to and including the first row where the cum_length value crosses a certain threshold are included in the final result.
Example of input table:
id  distance  length
1   10        1
2   5         2
3   8         1
4   1         3
5   3         2
6   9         2
Desired output for a threshold of 6:
id  distance  length  cum_length
4   1         3       3
5   3         2       5
2   5         2       7
This is the SQL that I came up with:
WITH ordered_table AS (
SELECT id,
distance,
length,
SUM(length) OVER (ORDER BY distance) AS cum_length
FROM table)
SELECT *
FROM ordered_table
WHERE cum_length <= 6
But this omits the last row of the desired result.
Try the following:
WITH ordered_table AS
(
SELECT id, distance, length,
SUM(length) OVER (ORDER BY distance) AS cum_length
FROM table_name
)
SELECT id, distance, length, cum_length
FROM ordered_table
WHERE cum_length <= COALESCE((SELECT MIN(cum_length) FROM ordered_table WHERE cum_length > 6), 6)
For your sample data, this is equivalent to WHERE cum_length <= 7.
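The intent of that COALESCE bound ("keep everything up to and including the first row that crosses the threshold") can be sketched procedurally. This is an illustrative Python sketch of the logic, with a hypothetical helper name, not the SQL itself:

```python
def rows_until_threshold(rows, threshold):
    """Order by distance, accumulate length, and keep rows up to and
    including the first one whose running total crosses the threshold."""
    out, running = [], 0
    for row in sorted(rows, key=lambda r: r["distance"]):
        if running > threshold:  # a previous row already crossed: stop
            break
        running += row["length"]
        out.append({**row, "cum_length": running})
    return out

data = [
    {"id": 1, "distance": 10, "length": 1},
    {"id": 2, "distance": 5,  "length": 2},
    {"id": 3, "distance": 8,  "length": 1},
    {"id": 4, "distance": 1,  "length": 3},
    {"id": 5, "distance": 3,  "length": 2},
    {"id": 6, "distance": 9,  "length": 2},
]
print([(r["id"], r["cum_length"]) for r in rows_until_threshold(data, 6)])
# [(4, 3), (5, 5), (2, 7)]
```

The row with cum_length 7 is kept because the running total before it (5) had not yet crossed 6; this is exactly the row the plain `WHERE cum_length <= 6` drops.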
Here is an alternative based on your ordered_table query plus an extra lag window function that calculates previous_is_less, i.e. whether the previous row's cum_length is below the threshold. Rows where previous_is_less is true or null qualify for selection.
with t as
(
select *,
lag(cum_length) over (order by cum_length) < 6 as previous_is_less
from
(
SELECT id, distance, length,
SUM(length) OVER (order BY distance) AS cum_length
from the_table
) as ordered_table
)
select id, distance, length, cum_length
from t
where coalesce(previous_is_less, true)
order by cum_length;
It may be interesting to compare this recursive approach on a large data set: it needs only one sort by distance and stops scanning data as soon as the threshold is reached.
with recursive data(id, distance, length) as (
select 1, 10, 1 union all
select 2, 5, 2 union all
select 3, 8, 1 union all
select 4, 1, 3 union all
select 5, 3, 2 union all
select 6, 9, 2
),
rdata(id, distance, length, rn) as (
select id, distance, length, row_number() over(order by distance) as rn
from data
),
recdata(id, distance, length, rn, cumlength) as (
select id, distance, length, rn, length
from rdata d
where rn = 1
union all
select d.id, d.distance, d.length, d.rn, d.length + r.cumlength
from recdata r
join rdata d on d.rn = r.rn + 1 and r.cumlength <= 6
)
select id, distance, length, cumlength from recdata
;

How to use distinct keyword on two columns in oracle sql?

I used the distinct keyword on one column and it worked very well, but when I add a second column to the select query it doesn't work for me, as both columns have duplicate values. I want the query not to show the duplicate values in either column. Is there a proper select query for that?
The sample data is:
For Col001:
555
555
7878
7878
89
Col002:
43
43
56
56
56
67
67
67
79
79
79
I want these data in this format:
Col001:
555
7878
89.
Col002:
43
56
67
79
I tried the following query:
Select distinct col001, col002 from tbl1
Use a set operator. UNION will give you the set of unique values from two subqueries.
select col001 as unq_col_val
from your_table
union
select col002
from your_table;
This presumes you're not fussed whether the value comes from COL001 or COL002. If you are fussed, this variant preserves that information:
select 'COL001' as source_col
,col001 as unq_col_val
from your_table
union
select 'COL002' as source_col
,col002
from your_table;
Note that this result set will contain more rows if the same value exists in both columns.
DISTINCT works across the entire row, considering all values in the row, and removes rows only where the entire row is duplicated.
For example, given the sample data:
CREATE TABLE table_name (col001, col002) AS
SELECT 1, 1 FROM DUAL UNION ALL
SELECT 1, 2 FROM DUAL UNION ALL
SELECT 1, 3 FROM DUAL UNION ALL
SELECT 2, 1 FROM DUAL UNION ALL
SELECT 2, 2 FROM DUAL UNION ALL
--
SELECT 1, 2 FROM DUAL UNION ALL -- These are duplicates
SELECT 2, 2 FROM DUAL;
Then:
SELECT DISTINCT
col001,
col002
FROM table_name
Outputs:
COL001  COL002
1       1
1       2
1       3
2       1
2       2
And the duplicates have been removed.
If you want to only display distinct values for each column then you need to consider each column separately and can use something like:
SELECT c1.col001,
c2.col002
FROM ( SELECT DISTINCT
col001,
DENSE_RANK() OVER (ORDER BY col001) AS rnk
FROM table_name
) c1
FULL OUTER JOIN
( SELECT DISTINCT
col002,
DENSE_RANK() OVER (ORDER BY col002) AS rnk
FROM table_name
) c2
ON (c1.rnk = c2.rnk)
Which outputs:
COL001  COL002
1       1
2       2
null    3
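The FULL OUTER JOIN on DENSE_RANK is essentially "pair the sorted distinct values of each column, padding the shorter list". A minimal Python sketch of that pairing (illustrative only; `distinct_side_by_side` is a hypothetical name):

```python
from itertools import zip_longest

def distinct_side_by_side(rows):
    """Pair the sorted distinct values of each column, padding the
    shorter list with None (like the FULL OUTER JOIN on DENSE_RANK)."""
    col1 = sorted({r[0] for r in rows})
    col2 = sorted({r[1] for r in rows})
    return list(zip_longest(col1, col2))

rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (1, 2), (2, 2)]
print(distinct_side_by_side(rows))
# [(1, 1), (2, 2), (None, 3)]
```

None here plays the role of the SQL null in the last output row.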

Return distinct rows based on only one column in oracle sql

I want to return an n number of distinct rows. The distinct rows should be based on one column (SN) only.
I have the query below which is expected to return 4 rows where the serial number is greater than 2 and no rows with similar SN column values are returned.
Table
SN letter value
1 test 25
1 bread 26
3 alpha 43
4 beta 23
4 gamma 5
5 omega 60
6 omega 60
Expected Result
SN letter value
3 alpha 43
4 beta 23
5 omega 60
6 omega 60
This is the query I have. It does not work correctly; it returns duplicates because it filters distinct values by all the columns combined instead of just the single column, SN.
SELECT * FROM (SELECT a.*, row_number() over(order by SN) rowRank
FROM (SELECT distinct SN, letter, value from table where SN > 2 order by SN) a)
WHERE rowRank BETWEEN 1 AND 4
You do not need to use DISTINCT before filtering your results. You can modify the ORDER BY clause of the row_rank analytic function if you need to change which duplicate of an SN is returned. Right now it returns the first LETTER value alphabetically, which matches your example result.
Query
WITH
some_table (sn, letter, VALUE)
AS
(SELECT 1, 'test', 25 FROM DUAL
UNION ALL
SELECT 1, 'bread', 26 FROM DUAL
UNION ALL
SELECT 3, 'alpha', 43 FROM DUAL
UNION ALL
SELECT 4, 'beta', 23 FROM DUAL
UNION ALL
SELECT 4, 'gamma', 5 FROM DUAL
UNION ALL
SELECT 5, 'omega', 60 FROM DUAL
UNION ALL
SELECT 6, 'omega', 60 FROM DUAL)
--Above is to set up the sample data. Use the query below with your real table
SELECT sn, letter, VALUE
FROM (SELECT sn,
letter,
VALUE,
ROW_NUMBER () OVER (PARTITION BY sn ORDER BY letter) AS row_rank
FROM some_table
WHERE sn > 2)
WHERE row_rank = 1
ORDER BY sn
FETCH FIRST 4 ROWS ONLY;
Result
SN LETTER VALUE
_____ _________ ________
3 alpha 43
4 beta 23
5 omega 60
6 omega 60
SELECT * FROM
(
SELECT
t.*
,ROW_NUMBER() OVER (PARTITION BY sn ORDER BY value ) rn
FROM
t
WHERE sn > 2
) t1
WHERE t1.rn = 1
ORDER BY sn;
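The "one row per SN" logic both answers implement with ROW_NUMBER() can be sketched procedurally. An illustrative Python sketch (hypothetical helper name, keeping the alphabetically first letter per SN as in the accepted answer):

```python
def first_per_sn(rows, min_sn, limit):
    """Keep one row per SN (lowest letter alphabetically), for SN > min_sn,
    and return at most `limit` rows, mirroring ROW_NUMBER() ... = 1."""
    best = {}
    for sn, letter, value in rows:
        if sn > min_sn and (sn not in best or letter < best[sn][1]):
            best[sn] = (sn, letter, value)
    return [best[sn] for sn in sorted(best)][:limit]

rows = [(1, "test", 25), (1, "bread", 26), (3, "alpha", 43),
        (4, "beta", 23), (4, "gamma", 5), (5, "omega", 60), (6, "omega", 60)]
print(first_per_sn(rows, 2, 4))
# [(3, 'alpha', 43), (4, 'beta', 23), (5, 'omega', 60), (6, 'omega', 60)]
```

SN 4 keeps ('beta', 23) rather than ('gamma', 5) because 'beta' sorts first, just as ORDER BY letter does inside the window function.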

How to get all sums of values without each element using BigQuery?

I have a table in BigQuery. I want to compute the sum of the values in a column, removing each element in turn by id. As output I want to see the removed id and the sum of the other values.
WITH t as (SELECT 1 AS id, "LY" as code, 34 AS value
UNION ALL
SELECT 2, "LY", 45
UNION ALL
SELECT 3, "LY", 23
UNION ALL
SELECT 4, "LY", 5
UNION ALL
SELECT 5, "LY", 54
UNION ALL
SELECT 6, "LY", 78)
SELECT lv id, SUM(lag) sum_wo_id
FROM
(SELECT *, FIRST_VALUE(id) OVER (ORDER BY id DESC) lv, LAG(value) OVER (Order by id) lag from t)
GROUP BY lv
In the example above I can see the sum of values without id = 6. How can I modify this query to get the sums without each of the other ids, i.e. over the id sets 12346, 12356, 12456, 13456, 23456, and see which one was removed?
Below is for BigQuery Standard SQL
Assuming ids are distinct - you can simply use below
#standardSQL
SELECT id AS removed_id,
SUM(value) OVER() - value AS sum_wo_id
FROM t
if applied to sample data from your question - output is
Row removed_id sum_wo_id
1 1 205
2 2 194
3 3 216
4 4 234
5 5 185
6 6 161
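The trick here is simply "grand total minus this row's value", which is what `SUM(value) OVER() - value` computes. An illustrative Python sketch of the same arithmetic (hypothetical helper name):

```python
def sums_without_each(rows):
    """For each id, report the grand total minus that id's value."""
    total = sum(v for _, v in rows)
    return {i: total - v for i, v in rows}

rows = [(1, 34), (2, 45), (3, 23), (4, 5), (5, 54), (6, 78)]
print(sums_without_each(rows))
# {1: 205, 2: 194, 3: 216, 4: 234, 5: 185, 6: 161}
```

The grand total is 239, so removing id 1 (value 34) leaves 205, matching the first output row.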
In case if id is not unique - you can first group by id as in below example
#standardSQL
SELECT id AS removed_id,
SUM(value) OVER() - value AS sum_wo_id
FROM (
SELECT id, SUM(value) AS value
FROM t
GROUP BY id
)

SQL Grouping by Ranges

I have a data set that has timestamped entries over various sets of groups.
Timestamp -- Group -- Value
---------------------------
1 -- A -- 10
2 -- A -- 20
3 -- B -- 15
4 -- B -- 25
5 -- C -- 5
6 -- A -- 5
7 -- A -- 10
I want to sum these values by the Group field, but parsed as it appears in the data. For example, the above data would result in the following output:
Group -- Sum
A -- 30
B -- 40
C -- 5
A -- 15
I do not want the following, which is all I've been able to come up with on my own so far:
Group -- Sum
A -- 45
B -- 40
C -- 5
Using Oracle 11g, this is what I've hobbled together so far. I know that it is wrong, but I'm hoping I'm at least on the right track with RANK(). In the real data, entries with the same group could be 2 timestamps apart, or 100; there could be one entry in a group, or 100 consecutive. It does not matter; I need them separated.
WITH SUB_Q AS
(SELECT K_ID
, GRP
, VAL
-- GET THE RANK FROM TIMESTAMP TO SEPARATE GROUPS WITH SAME NAME
, RANK() OVER(PARTITION BY K_ID ORDER BY TMSTAMP) AS RNK
FROM MY_TABLE
WHERE K_ID = 123)
SELECT T1.K_ID
, T1.GRP
, SUM(CASE
WHEN T1.GRP = T2.GRP THEN
T1.VAL
ELSE
0
END) AS TOTAL_VALUE
FROM SUB_Q T1 -- MAIN VALUE
INNER JOIN SUB_Q T2 -- TIMSTAMP AFTER
ON T1.K_ID = T2.K_ID
AND T1.RNK = T2.RNK - 1
GROUP BY T1.K_ID
, T1.GRP
Is it possible to group in this way? How would I go about doing this?
I approach this problem by defining a group key which is the difference of two row_number() values:
select group, sum(value)
from (select t.*,
(row_number() over (order by timestamp) -
row_number() over (partition by group order by timestamp)
) as grp
from my_table t
) t
group by group, grp
order by min(timestamp);
The difference of two row numbers is constant for adjacent values.
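This is the classic gaps-and-islands pattern: once rows are ordered by timestamp, each run of consecutive identical groups becomes one island. Outside SQL, the same effect falls out of grouping adjacent equal keys; an illustrative Python sketch using itertools.groupby (hypothetical helper name):

```python
from itertools import groupby

def sum_runs(rows):
    """Sum values over consecutive runs of the same group, ordered by
    timestamp -- the effect the row_number() difference achieves in SQL."""
    ordered = sorted(rows)                       # rows are (timestamp, group, value)
    runs = groupby(ordered, key=lambda r: r[1])  # adjacent rows with equal group
    return [(grp, sum(v for _, _, v in run)) for grp, run in runs]

rows = [(1, "A", 10), (2, "A", 20), (3, "B", 15), (4, "B", 25),
        (5, "C", 5), (6, "A", 5), (7, "A", 10)]
print(sum_runs(rows))
# [('A', 30), ('B', 40), ('C', 5), ('A', 15)]
```

Note that 'A' appears twice in the output, once per run, which is exactly what a plain GROUP BY cannot produce.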
A solution using LAG and windowed analytic functions:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST ( "Timestamp", "Group", Value ) AS
SELECT 1, 'A', 10 FROM DUAL
UNION ALL SELECT 2, 'A', 20 FROM DUAL
UNION ALL SELECT 3, 'B', 15 FROM DUAL
UNION ALL SELECT 4, 'B', 25 FROM DUAL
UNION ALL SELECT 5, 'C', 5 FROM DUAL
UNION ALL SELECT 6, 'A', 5 FROM DUAL
UNION ALL SELECT 7, 'A', 10 FROM DUAL;
Query 1:
WITH changes AS (
SELECT t.*,
CASE WHEN LAG( "Group" ) OVER ( ORDER BY "Timestamp" ) = "Group" THEN 0 ELSE 1 END AS hasChangedGroup
FROM TEST t
),
groups AS (
SELECT "Group",
VALUE,
SUM( hasChangedGroup ) OVER ( ORDER BY "Timestamp" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp
FROM changes
)
SELECT "Group",
SUM( VALUE )
FROM Groups
GROUP BY "Group", grp
ORDER BY grp
Results:
| Group | SUM(VALUE) |
|-------|------------|
| A | 30 |
| B | 40 |
| C | 5 |
| A | 15 |
This is the typical "start_of_group" problem (see here: https://timurakhmadeev.wordpress.com/2013/07/21/start_of_group/).
In your case, it would be as follows:
with t as (
select 1 timestamp, 'A' grp, 10 value from dual union all
select 2, 'A', 20 from dual union all
select 3, 'B', 15 from dual union all
select 4, 'B', 25 from dual union all
select 5, 'C', 5 from dual union all
select 6, 'A', 5 from dual union all
select 7, 'A', 10 from dual
)
select min(timestamp), grp, sum(value) sum_value
from (
select t.*
, sum(start_of_group) over (order by timestamp) grp_id
from (
select t.*
, case when grp = lag(grp) over (order by timestamp) then 0 else 1 end
start_of_group
from t
) t
)
group by grp_id, grp
order by min(timestamp)
;