How can I select a different column from each row? - sql

I have a table which has three columns with three records. How can I select first value of the first column, second value of the second column, third value of the third column?
table
============
test_tab
id1 id2 id2
=== === ====
100 400 700
200 500 800
300 600 900
Required output:
100 500 900
How can I achieve this by using Oracle SQL or PL/SQL?

First of all, how would you identify which row is first and which is second?
Oracle does not guarantee the order of records: unless you order them explicitly with an ORDER BY clause, the default output comes back in what is effectively a random order.
With your test data and expected result, you can use the following query.
Note: I am treating the third column as ID3 (your heading lists ID2 twice) and ordering the rows by ID1.
SELECT MAX(CASE RN WHEN 1 THEN ID1 END) AS ID1,
       MAX(CASE RN WHEN 2 THEN ID2 END) AS ID2,
       MAX(CASE RN WHEN 3 THEN ID3 END) AS ID3
FROM (
    SELECT ID1, ID2, ID3,
           ROW_NUMBER() OVER (ORDER BY ID1) AS RN
    FROM TEST_TAB
);
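As a quick sanity check, the same pivot can be run against SQLite (3.25 or later, which added window functions) through Python's sqlite3 module, using the question's sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test_tab (id1 INTEGER, id2 INTEGER, id3 INTEGER);
    INSERT INTO test_tab VALUES (100, 400, 700), (200, 500, 800), (300, 600, 900);
""")
# Number the rows by id1, then pick the Nth column from the Nth row.
row = con.execute("""
    SELECT MAX(CASE rn WHEN 1 THEN id1 END) AS id1,
           MAX(CASE rn WHEN 2 THEN id2 END) AS id2,
           MAX(CASE rn WHEN 3 THEN id3 END) AS id3
    FROM (SELECT id1, id2, id3,
                 ROW_NUMBER() OVER (ORDER BY id1) AS rn
          FROM test_tab)
""").fetchone()
print(row)  # (100, 500, 900)
```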
Cheers!!

You will need a means of establishing what "first" means.
In Oracle, here is one way to address the example from your question:
create table test_tab(
id1 integer,
id2 integer,
id3 integer
);
insert into test_tab values(100,400,700);
insert into test_tab values(200,500,800);
insert into test_tab values(300,600,900);
commit;
select sum(decode(pos, 1, id1, null)) id1,
       sum(decode(pos, 2, id2, null)) id2,
       sum(decode(pos, 3, id3, null)) id3
from (
    -- this subquery produces a rank column for the whole dataset
    -- with an explicit order
    select id1, id2, id3,
           rank() over (order by id1, id2, id3) pos
    from test_tab
);
In this implementation, the subquery is used to establish an ordering of the rows, adding a new pos column based on the rank() function.
The sum(decode(pos, 3, id3, null)) construct is an Oracle idiom for picking one specific row (row 3 in this case) while ignoring the others.
Basically, for your three rows the decode results in NULL for every row except the one with the specified number. The expression for id3 therefore has a non-null value only for the third row, so the sum over the group equals id3 of row 3.
There are many ways to do it, this is just one, and you will likely need to make some adjustments to this implementation for it to work properly in your real code.
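DECODE is Oracle-specific; on engines without it, the same idiom is written with SUM(CASE ...). A sketch against SQLite via Python's sqlite3 module, with the same sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test_tab (id1 INTEGER, id2 INTEGER, id3 INTEGER);
    INSERT INTO test_tab VALUES (100, 400, 700), (200, 500, 800), (300, 600, 900);
""")
# sum(decode(pos, N, idN, null)) translates to SUM(CASE pos WHEN N THEN idN END)
row = con.execute("""
    SELECT SUM(CASE pos WHEN 1 THEN id1 END) AS id1,
           SUM(CASE pos WHEN 2 THEN id2 END) AS id2,
           SUM(CASE pos WHEN 3 THEN id3 END) AS id3
    FROM (SELECT id1, id2, id3,
                 RANK() OVER (ORDER BY id1, id2, id3) AS pos
          FROM test_tab)
""").fetchone()
print(row)  # (100, 500, 900)
```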

Related

create a new column that contains a list of values from another column subsequent rows

I have a table like below,
and want to create a new column that contains a list of values from another column subsequent rows like below,
for copy paste:
timestamp ID Value
2021-12-03 04:03:45 ID1 O
2021-12-03 04:03:46 ID1 P
2021-12-03 04:03:47 ID1 Q
2021-12-03 04:03:48 ID1 R
2021-12-03 04:03:49 ID1 NULL
2021-12-03 04:03:50 ID1 S
2021-12-03 04:03:51 ID1 T
2021-12-04 11:09:03 ID2 A
2021-12-04 11:09:04 ID2 B
2021-12-04 11:09:05 ID2 C
Using windowed functions and range JOIN:
WITH cte AS (
SELECT tab.*,
COALESCE(FIRST_VALUE(CASE WHEN VALUE IS NULL THEN tmp END) IGNORE NULLS
OVER(PARTITION BY ID ORDER BY TMP
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
,MAX(tmp) OVER(PARTITION BY ID)) AS next_tmp
FROM tab
)
SELECT c1.tmp, c1.id, c1.value,
LISTAGG(c2.value, ',') WITHIN GROUP(ORDER BY c2.tmp) AS list
FROM cte c1
LEFT JOIN cte c2
ON c1.ID = c2.ID
AND (c1.tmp < c2.tmp AND c2.tmp <= c1.next_tmp)
GROUP BY c1.tmp, c1.id, c1.value
ORDER BY c1.ID, c1.tmp;
db<>fiddle demo
How does it work:
The idea is to find first timestamp corresponding to NULL value per each ID:
SELECT tab.*,
COALESCE(FIRST_VALUE(CASE WHEN VALUE IS NULL THEN tmp END) IGNORE NULLS
OVER(PARTITION BY ID ORDER BY TMP
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, MAX(tmp) OVER(PARTITION BY ID)) AS next_tmp
FROM tab;
The rules are complex and this wasn't easy to code, but a triple join, NULL-handling rules, and removing the first element can produce the desired results:
with data as (
select (x[0]||' '||x[1])::timestamp ts, x[2]::string id, iff(x[3]='NULL', null, x[3])::string value
from (
select split(value, ' ') x
from table(split_to_table($$2021-12-03 04:03:45 ID1 O
2021-12-03 04:03:46 ID1 P
2021-12-03 04:03:47 ID1 Q
2021-12-03 04:03:48 ID1 R
2021-12-03 04:03:49 ID1 NULL
2021-12-03 04:03:50 ID1 S
2021-12-03 04:03:51 ID1 T
2021-12-04 11:09:03 ID2 A
2021-12-04 11:09:04 ID2 B
2021-12-04 11:09:05 ID2 C$$, '\n'))
))
select ts, id, value, iff( -- return null for null values
value is null
, null
, array_to_string(
array_slice( -- remove first element
array_agg(bvalue) within group (order by bts)
, 1, 99999)
, ',')
) list
from (
select a.*, b.ts bts, b.value bvalue
, coalesce( -- find max null after current value, or max value if none
(
select max(ts)
from data
where a.id=id
and value is null
and a.ts<ts
),
(
select max(ts)
from data
where a.id=id
)) maxts
from data a
join data b
on a.id=b.id
and a.ts<=b.ts
where maxts >= b.ts
)
group by id, ts, value
order by id, ts
Data ordered by TMP is logically also ordered by ID, since the rows for each ID are contiguous.
So you can:
group rows first by ID;
within each group, start a new subgroup whenever the previous VALUE is null;
within each subgroup, comma-join the VALUEs from the second row through the last non-null VALUE and make the result the value of the new column LIST.
A SQL set is unordered, which makes this kind of order-dependent computation complicated: you need to first create a marker column with a window function, perform a self-join on that marker column, then group rows and concatenate the VALUE values to get the desired result.
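The three steps above can be sketched in plain Python; rows are assumed to arrive already ordered by ID and timestamp, as in the question:

```python
from itertools import groupby

rows = [
    ("2021-12-03 04:03:45", "ID1", "O"), ("2021-12-03 04:03:46", "ID1", "P"),
    ("2021-12-03 04:03:47", "ID1", "Q"), ("2021-12-03 04:03:48", "ID1", "R"),
    ("2021-12-03 04:03:49", "ID1", None), ("2021-12-03 04:03:50", "ID1", "S"),
    ("2021-12-03 04:03:51", "ID1", "T"), ("2021-12-04 11:09:03", "ID2", "A"),
    ("2021-12-04 11:09:04", "ID2", "B"), ("2021-12-04 11:09:05", "ID2", "C"),
]

result = []
for _, grp in groupby(rows, key=lambda r: r[1]):     # 1. group by ID
    sub, subs = [], []
    for r in grp:                                    # 2. close a subgroup at each NULL
        sub.append(r)
        if r[2] is None:
            subs.append(sub)
            sub = []
    if sub:
        subs.append(sub)
    for sub in subs:                                 # 3. LIST = later non-null values
        for i, (ts, id_, val) in enumerate(sub):
            later = [v for _, _, v in sub[i + 1:] if v is not None]
            result.append((ts, id_, val, ",".join(later) if later else None))

print(result[0])  # ('2021-12-03 04:03:45', 'ID1', 'O', 'P,Q,R')
```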
A common alternative is to fetch the original data out of the database and process it in Python or SPL. SPL, an open-source Java package, is easy to integrate into a Java program and produces much simpler code. It can get this done with only two lines:
A1: =ORACLE.query("SELECT * FROM TAB ORDER BY 1")
A2: =A1.group#o(#2).conj(~.group#i(#3[-1]==null).run(tmp=~.(#3).select(~),~=~.derive(tmp.m(#+1:).concat#c():LIST))).conj()

Second largest value in a group

I want to select second largest value in each group, how can I do that in SQL?
For example with the below table,
IDs value
ID1 2
ID1 3
ID1 4
ID2 1
ID2 2
ID2 5
When grouping by IDs, I want this output
IDs value
ID1 3
ID2 2
Thanks.
Use row_number():
select t.*
from (select t.*, row_number() over (partition by id order by value desc) as seqnum
from t
) t
where seqnum = 2;
An alternate way: you can use dense_rank(). It makes sure your SQL returns the second-largest value even when two records share the largest value.
select t.*
from (select t.*, dense_rank() over (partition by id order by value desc) as rrank
from t
) t
where rrank = 2;
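Both variants work in any engine with window functions. Here is the dense_rank() version run against SQLite (3.25+) through Python's sqlite3 module, using the question's sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (id TEXT, value INTEGER);
    INSERT INTO t VALUES ('ID1',2),('ID1',3),('ID1',4),('ID2',1),('ID2',2),('ID2',5);
""")
# Rank values within each id (descending), then keep rank 2.
rows = con.execute("""
    SELECT id, value
    FROM (SELECT t.*, DENSE_RANK() OVER (PARTITION BY id ORDER BY value DESC) AS rrank
          FROM t)
    WHERE rrank = 2
    ORDER BY id
""").fetchall()
print(rows)  # [('ID1', 3), ('ID2', 2)]
```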

reducing oracle table entity

I have a table like this:
I want a query that makes my table like this:
You can use aggregation:
select max(id1) as id1, max(id2) as id2, max(id3) as id3
from . . .
I would prefer to use SUM() here, because you might have negative numbers in a given column:
select sum(id1) as id1, sum(id2) as id2, sum(id3) as id3
from yourTable
If we used MAX() and, for example, one of the columns contained -2 (with zeros elsewhere in that column), the max would return zero instead of -2.
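The original tables are not shown, so as an illustration assume each column holds one real value and zeros elsewhere; a quick SQLite check via Python's sqlite3 module shows the difference:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE your_table (id1 INTEGER, id2 INTEGER, id3 INTEGER);
    -- assumed sample data: one real value per column, zeros elsewhere
    INSERT INTO your_table VALUES (-2, 0, 0), (0, 5, 0), (0, 0, 7);
""")
mx = con.execute("SELECT MAX(id1), MAX(id2), MAX(id3) FROM your_table").fetchone()
sm = con.execute("SELECT SUM(id1), SUM(id2), SUM(id3) FROM your_table").fetchone()
print(mx)  # (0, 5, 7)  -- the -2 is lost
print(sm)  # (-2, 5, 7)
```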
I'm not sure this solution is optimal to execute, but:
select (select id1 from table where id1 <> 0) id1,
(select id2 from table where id2 <> 0) id2,
(select id3 from table where id3 <> 0) id3
from dual;

How to label groups in postgresql when group belonging depends on the preceding line?

In a query, I want to fill all NULL values with the last known value.
When the data is in a table rather than produced by a query, it's easy:
If I define and fill my table as follows:
CREATE TABLE test_fill_null (
date INTEGER,
value INTEGER
);
INSERT INTO test_fill_null VALUES
(1,2),
(2, NULL),
(3, 45),
(4,NULL),
(5, null);
SELECT * FROM test_fill_null ;
date | value
------+-------
1 | 2
2 |
3 | 45
4 |
5 |
Then I just have to fill like that:
UPDATE test_fill_null t1
SET value = (
SELECT t2.value
FROM test_fill_null t2
WHERE t2.date <= t1.date AND value IS NOT NULL
ORDER BY t2.date DESC
LIMIT 1
);
SELECT * FROM test_fill_null;
date | value
------+-------
1 | 2
2 | 2
3 | 45
4 | 45
5 | 45
But now the data comes from a query, like this one:
WITH
pre_table AS(
SELECT
id1,
id2,
tms,
CASE
WHEN tms - lag(tms) over w < interval '5 minutes' THEN NULL
ELSE id2
END as group_id
FROM
table0
window w as (partition by id1 order by tms)
)
Here group_id is set to id2 when the previous point is more than 5 minutes away, and NULL otherwise. The goal is to end up with groups of points that follow each other by less than 5 minutes, separated by gaps of more than 5 minutes.
Then I don't know how to proceed. I tried:
SELECT distinct on (id1, id2)
t0.id1,
t0.id2,
t0.tms,
t1.group_id
FROM
pre_table t0
LEFT JOIN (
select
id1,
tms,
group_id
from pre_table t2
where t2.group_id is not null
order by tms desc
) t1
ON
t1.tms <= t0.tms AND
t1.id1 = t0.id1
WHERE
t0.id1 IS NOT NULL
ORDER BY
id1,
id2,
t1.tms DESC
But in the final result I have some groups containing two consecutive points that are more than 5 minutes apart. There should be two different groups in that case.
A "select within a select" is more commonly called "subselect" or "subquery" In your particular case it's a correlated subquery. LATERAL joins (new in postgres 9.3) can largely replace correlated subqueries with more flexible solutions:
What is the difference between LATERAL and a subquery in PostgreSQL?
I don't think you need either here.
For your first case this query is probably faster and simpler, though:
SELECT date, max(value) OVER (PARTITION BY grp) AS value
FROM (
SELECT *, count(value) OVER (ORDER BY date) AS grp
FROM test_fill_null
) sub;
count() only counts non-null values, so grp is incremented with every non-null value, thereby forming groups as desired. It's trivial to pick the one non-null value per grp in the outer SELECT.
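The count()-as-group-marker trick is portable; here it is verified against SQLite (3.25+) through Python's sqlite3 module, with the sample table from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test_fill_null (date INTEGER, value INTEGER);
    INSERT INTO test_fill_null VALUES (1,2),(2,NULL),(3,45),(4,NULL),(5,NULL);
""")
# grp increments only on non-null values, so each group holds one
# non-null value followed by its trailing NULLs.
rows = con.execute("""
    SELECT date, MAX(value) OVER (PARTITION BY grp) AS value
    FROM (SELECT *, COUNT(value) OVER (ORDER BY date) AS grp
          FROM test_fill_null)
    ORDER BY date
""").fetchall()
print(rows)  # [(1, 2), (2, 2), (3, 45), (4, 45), (5, 45)]
```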
For your second case, I'll assume the initial order of rows is determined by (id1, id2, tms) as indicated by one of your queries.
SELECT id1, id2, tms
, count(step) OVER (ORDER BY id1, id2, tms) AS group_id
FROM (
SELECT *, CASE WHEN lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ORDER BY id2, tms)
< tms - interval '5 min'
THEN true END AS step
FROM table0
) sub
ORDER BY id1, id2, tms;
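A sketch of the same idea in SQLite via Python's sqlite3 module; tms is stored as epoch seconds here so the 5-minute interval becomes 300, and the ids and timestamps are made-up sample values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table0 (id1 TEXT, id2 INTEGER, tms INTEGER);  -- tms as epoch seconds
    INSERT INTO table0 VALUES
        ('a', 1, 0), ('a', 2, 60),       -- 1 minute apart -> same group
        ('a', 3, 1000), ('a', 4, 1060);  -- > 5 minute gap -> new group
""")
# step marks rows that start a new group; the running COUNT(step)
# then numbers the groups (a large negative stands in for -infinity).
rows = con.execute("""
    SELECT id1, id2, tms,
           COUNT(step) OVER (ORDER BY id1, id2, tms) AS group_id
    FROM (SELECT *,
                 CASE WHEN LAG(tms, 1, -1000000) OVER (PARTITION BY id1 ORDER BY id2, tms)
                           < tms - 300   -- 300 s = 5 minutes
                      THEN 1 END AS step
          FROM table0)
    ORDER BY id1, id2, tms
""").fetchall()
print(rows)  # [('a', 1, 0, 1), ('a', 2, 60, 1), ('a', 3, 1000, 2), ('a', 4, 1060, 2)]
```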
Adapt to your actual order. One of these might cover it:
PARTITION BY id1 ORDER BY id2 -- ignore tms
PARTITION BY id1 ORDER BY tms -- ignore id2
SQL Fiddle with an extended example.
Related:
Select longest continuous sequence
While editing my question I found a solution. It's pretty slow though, much slower than my example within a table. Any suggestions to improve it?
SELECT
t2.id1,
t2.id2,
t2.tms,
(
SELECT t1.group_id
FROM pre_table t1
WHERE
t1.tms <= t2.tms
AND t1.group_id IS NOT NULL
AND t1.id1 = t2.id1
ORDER BY t1.tms DESC
LIMIT 1
) as group_id
FROM
pre_table t2
ORDER BY
t2.id1,
t2.id2,
t2.tms;
So as I said, a select within a select

Return most recent data

I have a database table, which is structured like this:
CREATE TABLE match (ID1 int, ID2 int, CreatedDate date, ConnectDisconnect char(1))
I am trying to write an SQL statement that will return the most recent match record grouped by ID1 and ID2. For example, please see the data below:
1,2,'2014-06-05', C
1,3,'2014-06-05', C
1,4,'2014-06-05', C
N1,N2,'2014-06-05',D
Please see the SQL statement below:
select max(CreatedDate), ID1,ID2 FROM match
group by ID1,ID2
This will show the most recent decision that was made on ID1 and ID2. The problem is that the matching records can be either way around. For example:
1,2,'2014-06-04', C
2,1,'2014-06-05', D
The data above shows that records 1 and 2 were connected on 04/06/2014 and disconnected on 05/06/2014. My query above will return two rows, however I only want it to return one row i.e. the most recent (the data dated 05/06/14 in the case above).
Here is one approach, using case and row_number():
select m.*
from (select m.*,
row_number() over (partition by (case when id1 < id2 then id1 else id2 end),
(case when id1 < id2 then id2 else id1 end)
order by CreatedDate desc
) as seqnum
from match m
) m
where seqnum = 1;
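Run against SQLite via Python's sqlite3 module (the table is named matches here because MATCH is a reserved word in some engines; the two sample rows are from the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE matches (id1 INTEGER, id2 INTEGER, created_date TEXT, cd TEXT);
    INSERT INTO matches VALUES (1, 2, '2014-06-04', 'C'),
                               (2, 1, '2014-06-05', 'D');
""")
# Normalize each pair to (least, greatest) so (1,2) and (2,1) share
# a partition, then keep the newest row per partition.
rows = con.execute("""
    SELECT id1, id2, created_date, cd
    FROM (SELECT m.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY CASE WHEN id1 < id2 THEN id1 ELSE id2 END,
                                  CASE WHEN id1 < id2 THEN id2 ELSE id1 END
                     ORDER BY created_date DESC) AS seqnum
          FROM matches m)
    WHERE seqnum = 1
""").fetchall()
print(rows)  # [(2, 1, '2014-06-05', 'D')]
```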
It's ugly, but this may do what you want (not sure what the concatenation syntax is in SQL Server):
select max(CreatedDate),
       case when ID1 < ID2 then concat(ID1, ',', ID2) else concat(ID2, ',', ID1) end as combinedIds
from match
group by case when ID1 < ID2 then concat(ID1, ',', ID2) else concat(ID2, ',', ID1) end;