Using multilist column as foreign key reference


I have a table, TableA, that stores data in multilist columns, for example ColumnA with ',2562,2563,2564,' and ColumnB with ',121,122,123,'.
These columns actually hold foreign key values that come from other tables.
The data in TableA looks like this:
ID  NAME   ColumnA           ColumnB
1   ITEM1  ,2562,2563,2564,  ,121,122,123
2   ITEM2  NULL              ,6455,545,
3   ITEM3  ,1221,1546,       NULL
4   ITEM4  NULL              NULL
I want to join these columns with their parent tables and extract the data.
I expect the result set to have 8 rows, for example:
ITEM   ColumnA  ColumnB
ITEM1  2562     121
ITEM1  2563     122
ITEM1  2564     123
ITEM2  NULL     6455
ITEM2  NULL     545
....
I tried the query below with some help, but it does not work once I also use ColumnB, and it ignores the items with NULL values.
ColumnA stores IDs from the USER_GROUP table, while ColumnB holds IDs from some other table, say GROUP1, and there could be a further column, ColumnC, storing values from yet another table. That is the situation I am stuck in. I hope the explanation is clear; I am happy to improve it if needed.
SELECT ug.*
FROM USER_GROUP ug
WHERE EXISTS (SELECT 1
FROM TableA t1
WHERE t1.COLUMNA LIKE '%,' || ug.ID || ',%'
)
AND EXISTS (SELECT 1
FROM TableA t1
WHERE t1.COLUMNB LIKE '%,' || ug.ID || ',%'
);

Here's one option:
with test (id, name, cola, colb) as
  (select 1, 'item1', ',2562,2563,2564,', ',121,122,123,' from dual union all
   select 2, 'item2', null              , ',6455,545,'    from dual union all
   select 3, 'item3', ',1221,1546,'     , null            from dual union all
   select 4, 'item4', null              , null            from dual
  ),
remcom
  -- remove leading and trailing commas
  as (select id,
             name,
             rtrim(ltrim(cola, ','), ',') cola,
             rtrim(ltrim(colb, ','), ',') colb
      from test
     )
select id,
       name,
       regexp_substr(cola, '[^,]+', 1, column_value) cola,
       regexp_substr(colb, '[^,]+', 1, column_value) colb
from remcom r cross join
     table(cast(multiset(select level from dual
                         connect by level <= regexp_count(nvl(r.cola, r.colb), ',') + 1
                        ) as sys.odcinumberlist))
order by id, name, cola, colb;
        ID NAME  COLA       COLB
---------- ----- ---------- ----------
         1 item1 2562       121
         1 item1 2563       122
         1 item1 2564       123
         2 item2            545
         2 item2            6455
         3 item3 1221
         3 item3 1546
         4 item4

8 rows selected.
Now that you have it, join this result with the other tables you have.
By the way, this example nicely shows why it is a bad idea to store multiple values in the same column. Don't do that.
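The positional split performed by the query above can also be sketched outside the database. The following is a minimal Python illustration, not the Oracle solution itself: `split_list`, `split_rows` and the `sample` tuples are hypothetical names that mirror the `test` CTE. It pairs the n-th value of `cola` with the n-th value of `colb`, pads the shorter list with `None`, and keeps one all-`None` row when both lists are empty.

```python
from itertools import zip_longest

def split_list(value):
    """Turn ',2562,2563,' into ['2562', '2563']; NULL (None) becomes []."""
    if value is None:
        return []
    return [part for part in value.strip(",").split(",") if part]

def split_rows(rows):
    """Yield (id, name, a, b) with one output row per list position."""
    for row_id, name, cola, colb in rows:
        a_values, b_values = split_list(cola), split_list(colb)
        # Keep rows whose lists are both empty, as the SQL output does.
        pairs = list(zip_longest(a_values, b_values)) or [(None, None)]
        for a, b in pairs:
            yield (row_id, name, a, b)

sample = [
    (1, "item1", ",2562,2563,2564,", ",121,122,123,"),
    (2, "item2", None, ",6455,545,"),
    (3, "item3", ",1221,1546,", None),
    (4, "item4", None, None),
]
result = list(split_rows(sample))
for row in result:
    print(row)
```

Unlike the SQL, which sizes the row generator from `nvl(r.cola, r.colb)`, this sketch pads to the longer of the two lists, so it also copes with lists of different lengths; that is a design choice of the sketch, not something the original answer claims.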

You don't need to use (slow) regular expressions and can do it with simple string functions in a recursive sub-query factoring clause:
WITH split_data ( id, name, columna, columnb, starta, enda, startb, endb ) AS (
SELECT id,
name,
columna,
columnb,
INSTR(columna,',',1,1),
INSTR(columna,',',1,2),
INSTR(columnb,',',1,1),
INSTR(columnb,',',1,2)
FROM test_data
UNION ALL
SELECT id,
name,
columna,
columnb,
enda,
CASE WHEN enda = 0 THEN 0 ELSE INSTR(columna,',',enda+1,1) END,
endb,
CASE WHEN endb = 0 THEN 0 ELSE INSTR(columnb,',',endb+1,1) END
FROM split_data
WHERE enda > 0
OR endb > 0
)
SELECT id,
name,
CASE
WHEN starta = 0 THEN NULL
WHEN enda = 0 THEN SUBSTR( columna, starta + 1 )
ELSE SUBSTR( columna, starta + 1, enda - starta - 1 )
END AS valuea,
CASE
WHEN startb = 0 THEN NULL
WHEN endb = 0 THEN SUBSTR( columnb, startb + 1 )
ELSE SUBSTR( columnb, startb + 1, endb - startb - 1 )
END as valueb
FROM split_data
ORDER BY id, starta, startb;
Which for your test data:
CREATE TABLE test_data ( ID, NAME, ColumnA, ColumnB ) AS
SELECT 1, 'ITEM1', ',2562,2563,2564', ',121,122,123' FROM DUAL UNION ALL
SELECT 2, 'ITEM2', NULL, ',6455,545' FROM DUAL UNION ALL
SELECT 3, 'ITEM3', ',1221,1546', NULL FROM DUAL UNION ALL
SELECT 4, 'ITEM4', NULL, NULL FROM DUAL;
Outputs:
ID | NAME | VALUEA | VALUEB
-: | :---- | :----- | :-----
1 | ITEM1 | 2562 | 121
1 | ITEM1 | 2563 | 122
1 | ITEM1 | 2564 | 123
2 | ITEM2 | null | 6455
2 | ITEM2 | null | 545
3 | ITEM3 | 1221 | null
3 | ITEM3 | 1546 | null
4 | ITEM4 | null | null

Related

Aggregate multiple columns into an array only when the columns have non null value in Bigquery

I have a table that looks like this:
+----+------+------+------+------+------+
| id | col1 | col2 | col3 | col4 | col5 |
+----+------+------+------+------+------+
| a | 1 | null | null | null | null |
| b | 1 | 2 | 3 | 4 | null |
| c | 1 | 2 | 3 | 4 | 5 |
| d | 2 | 1 | 7 | null | 4 |
+----+------+------+------+------+------+
I want to create an aggregated table where for each id I want an array that contains non null value from all the other columns. The output should look like this:
+-----+-------------+
| id | agg_col |
+-----+-------------+
| a | [1] |
| b | [1,2,3,4] |
| c | [1,2,3,4,5] |
| d | [2,1,7,4] |
+-----+-------------+
Is it possible to produce the output using bigquery standard sql?

The solution below is not fully generic, but it works for the specific example you provided, where id is alphanumeric (not starting with a digit) and the remaining columns are integers.
#standardSQL
SELECT id,
ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
You can test and play with the above using the sample data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id
with the result:
Row  id  agg_col_as_array  agg_col_as_string
1    a   1                 [1]
2    b   1                 [1,2,3,4]
         2
         3
         4
3    c   1                 [1,2,3,4,5]
         2
         3
         4
         5
4    d   2                 [2,1,7,4]
         1
         7
         4
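To see why the `WHERE col != ''` filter is needed, the regular expression can be tried on a JSON string outside BigQuery. This is a minimal Python sketch under stated assumptions: `json.dumps` with compact separators stands in for `TO_JSON_STRING` (assumed here to produce a one-line object such as `{"id":"a","col1":1,"col2":null}`), and `non_null_digits` is a hypothetical helper name.

```python
import json
import re

rows = [
    {"id": "a", "col1": 1, "col2": None, "col3": None, "col4": None, "col5": None},
    {"id": "d", "col1": 2, "col2": 1, "col3": 7, "col4": None, "col5": 4},
]

def non_null_digits(row):
    # Compact separators approximate a one-line JSON rendering of the row.
    text = json.dumps(row, separators=(",", ":"))
    # ':(\d*)' captures the digits that follow each colon. For the quoted
    # id and for null the capture is empty, so empty strings are dropped.
    return [m for m in re.findall(r":(\d*)", text) if m != ""]

agg = {row["id"]: non_null_digits(row) for row in rows}
print(agg)
```

The captured values are strings, which matches the string array that `REGEXP_EXTRACT_ALL` returns; a cast would be needed to get integers back.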
Do you think it is possible to do this by mentioning specific columns and then binding them into an array?
Sure, it is doable - see below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
ARRAY(
SELECT col
FROM UNNEST([col1, col2, col3, col4, col5]) col
WHERE NOT col IS NULL
) AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(
ARRAY(
SELECT CAST(col AS STRING)
FROM UNNEST([col1, col2, col3, col4, col5]) col
WHERE NOT col IS NULL
), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id
BUT ... this is not the best option, because you have to manage and adjust the number and names of the columns for each different use.
The solution below is an adjusted version of my original answer that addresses your latest comment: "Actually the sample was too simple. Both of my id and other columns have alphanumeric and special characters."
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
ARRAY(
SELECT col
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET
WHERE col != 'null' AND OFFSET > 0
) AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(
ARRAY(
SELECT col
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET
WHERE col != 'null' AND OFFSET > 0
), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id
Both queries return the same result as before:
Row  id  agg_col_as_array  agg_col_as_string
1    a   1                 [1]
2    b   1                 [1,2,3,4]
         2
         3
         4
3    c   1                 [1,2,3,4,5]
         2
         3
         4
         5
4    d   2                 [2,1,7,4]
         1
         7
         4

How to exclude certain rows from sql select

How do I exclude certain rows?
For example, I have the following table:
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 1 | 1 | R |
| 1 | 2 | D |
| 2 | 3 | R |
| 2 | 4 | R |
| 3 | 5 | R |
| 4 | 6 | D |
+------+------+------+
I need to select only:
| 2 | 3 | R |
| 2 | 4 | R |
| 3 | 5 | R |
My select that does not work properly:
with t (c1,c2,c3) as(
select 1 , 1 , 'R' from dual union all
select 1 , 2 , 'D' from dual union all
select 2 , 3 , 'R' from dual union all
select 2 , 4 , 'R' from dual union all
select 3 , 5 , 'R' from dual union all
select 4 , 6 , 'D' from dual),
tt as (select t.*,count(*) over (partition by c1) cc from t ) select * from tt where cc=1 and c3='R';
Thanks in advance!

select * from table where col3 = 'R'
or, if you only want to exclude the rows with the value D:
select * from table where col3 != 'D'

It depends on your requirements, but you can do it this way:
SELECT * FROM `table` WHERE col1 = 2 AND col3 = "R"
If you want to exclude rows, use a condition such as WHERE col1 != 1.
You can also use the IN clause, e.g.
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);
This syntax is for MySQL, but you can adapt it to your requirements or to the database you are using.

this will work :
select * from (select * from table_name) where rownum<=4
minus
select * from ( select * from table_name) where rownum<=2

My guess is that you want all rows for a col1 value where no row with that col1 has col3 = 'D' and at least one row with that col1 has col3 = 'R'. WHERE [NOT] EXISTS may do:
DROP TABLE T;
CREATE TABLE T
(Col1 NUMBER, Col2 NUMBER, Col3 VARCHAR(1));
INSERT INTO T VALUES ( 1 , 1 , 'R');
INSERT INTO T VALUES ( 1 , 2 , 'D');
INSERT INTO T VALUES ( 2 , 3 , 'R');
INSERT INTO T VALUES ( 2 , 4 , 'R');
INSERT INTO T VALUES ( 3 , 5 , 'R');
INSERT INTO T VALUES ( 3 , 6 , 'D');
INSERT INTO T VALUES ( 4 , 5 , 'X');
INSERT INTO T VALUES ( 4 , 6 , 'Y');
INSERT INTO T VALUES ( 5 , 6 , 'X');
INSERT INTO T VALUES ( 5 , 5 , 'R');
INSERT INTO T VALUES ( 5 , 6 , 'Y');
SELECT *
FROM T
WHERE NOT EXISTS(SELECT 1 FROM T T1 WHERE T1.COL1 = T.COL1 AND COL3 = 'D') AND
EXISTS(SELECT 1 FROM T T1 WHERE T1.COL1 = T.COL1 AND COL3 = 'R');
Result
COL1 COL2 COL3
---------- ---------- ----
5 6 X
5 5 R
5 6 Y
2 3 R
2 4 R
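The same NOT EXISTS / EXISTS filter can be checked against the six rows from the question. This is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for Oracle; the lowercase table name `t` and the variable `kept` are illustrative choices, and the inner `col3` is qualified with `t1` for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "R"), (1, 2, "D"), (2, 3, "R"), (2, 4, "R"), (3, 5, "R"), (4, 6, "D")],
)

# Keep a row only if its col1 group has no 'D' row and at least one 'R' row.
query = """
SELECT col1, col2, col3
FROM t
WHERE NOT EXISTS (SELECT 1 FROM t t1 WHERE t1.col1 = t.col1 AND t1.col3 = 'D')
  AND EXISTS (SELECT 1 FROM t t1 WHERE t1.col1 = t.col1 AND t1.col3 = 'R')
ORDER BY col1, col2
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

With the question's data this keeps exactly the three rows the asker listed, which supports the guess about the intended rule.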

use row_number() window function
with t (c1,c2,c3) as(
select 1 , 1 , 'R' from dual union all
select 1 , 2 , 'D' from dual union all
select 2 , 3 , 'R' from dual union all
select 2 , 4 , 'R' from dual union all
select 3 , 5 , 'R' from dual union all
select 4 , 6 , 'D' from dual
),
t1 as
(
select c1,c2,c3,row_number() over(order by c2) rn from t
) select * from t1 where t1.rn>=3 and t1.rn<=5
C1 C2 C3
2 3 R
2 4 R
3 5 R

You can try using correlated subquery
select * from tablename a
where exists (select 1 from tablename b where a.col1 = b.col1 having count(*) > 1)

Based on what you have provided, I can only surmise that the sole requirement is for COL1 to be equal to 2 or 3. In that case all you have to do is the following (assuming that you actually have a table):
SELECT * FROM <table_name>
WHERE col1 IN (2,3);
This gives the desired output for the particular example in the question. If the selection requirement goes beyond retrieving rows where column 1 is either 2 or 3, then a more specific answer can be provided.

SQL theory: Filtering out duplicates in one column, picking lowest value in other column

I am trying to figure out the best way to remove rows from a result set where either the value in one column or the value in a different column has a duplicate in the result set.
Imagine the results of a query are as follows:
a_value | b_value
-----------------
1 | 1
2 | 1
2 | 2
3 | 1
4 | 3
5 | 2
6 | 4
6 | 5
What I want to do is:
Eliminate all rows that have duplicate values in a_value
Pick only 1 row for a given b_value
So I'd want the filtered results to end up like this after eliminating a_value duplicates:
a_value | b_value
-----------------
1 | 1
3 | 1
4 | 3
5 | 2
And then like this after picking only a single b_value:
a_value | b_value
-----------------
1 | 1
4 | 3
5 | 2
I'd appreciate suggestions on how to accomplish this task in an efficient way via SQL.

with
q_res ( a_value, b_value ) as (
select 1, 1 from dual union all
select 2, 1 from dual union all
select 2, 2 from dual union all
select 3, 1 from dual union all
select 4, 3 from dual union all
select 5, 2 from dual union all
select 6, 4 from dual union all
select 6, 5 from dual
)
-- end test data; solution begins below
select min(a_value) as a_value, b_value
from (
select a_value, min(b_value) as b_value
from q_res
group by a_value
having count(*) = 1
)
group by b_value
order by a_value -- ORDER BY is optional
;
A_VALUE B_VALUE
------- -------
1 1
4 3
5 2
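The two grouping steps can be run separately to see the intermediate result that the question describes. The sketch below uses Python's built-in `sqlite3` module rather than Oracle, so `FROM dual` is replaced by a real table; `step1` and `step2` are illustrative variable names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q_res (a_value INTEGER, b_value INTEGER)")
conn.executemany(
    "INSERT INTO q_res VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 1), (4, 3), (5, 2), (6, 4), (6, 5)],
)

# Step 1: keep only the a_values that occur exactly once.
step1 = conn.execute("""
    SELECT a_value, MIN(b_value) AS b_value
    FROM q_res
    GROUP BY a_value
    HAVING COUNT(*) = 1
    ORDER BY a_value
""").fetchall()

# Step 2: for each b_value keep the lowest surviving a_value.
step2 = conn.execute("""
    SELECT MIN(a_value) AS a_value, b_value
    FROM (SELECT a_value, MIN(b_value) AS b_value
          FROM q_res
          GROUP BY a_value
          HAVING COUNT(*) = 1)
    GROUP BY b_value
    ORDER BY a_value
""").fetchall()

print(step1)
print(step2)
```

In step 1 the `MIN(b_value)` is only there to satisfy the GROUP BY; since each surviving group has a single row, it returns that row's value unchanged.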

1) The inner query discards every a_value that occurs more than once and keeps the remaining values as t2. Joining t2 back to t1 then returns the full rows without any duplicates, as per requirement #1.
SELECT t1.*
FROM Table t1,
(
SELECT a_value
FROM Table
GROUP BY a_value
HAVING COUNT(*) = 1
) t2
WHERE t1.a_value = t2.a_value;
2) Once the filtered data is obtained, I number the rows within each b_value of the step-1 result and select only the rows with rn = 1.
SELECT X.a_value,
X.b_value
FROM
(
SELECT t1.*,
ROW_NUMBER() OVER ( PARTITION BY t1.b_value ORDER BY t1.a_value,t1.b_value ) AS rn
FROM Table t1,
(
SELECT a_value
FROM Table
GROUP BY a_value
HAVING COUNT(*) = 1
) t2
WHERE t1.a_value = t2.a_value
) X
WHERE X.rn = 1;

Difference between the values of multiple rows in SQL

My table in SQL is like:-
RN  Name   value1  value2  Timestamp
1   Mark   110     210     20160119
1   Mark   106     205     20160115
1   Mark   103     201     20160112
2   Steve  120     220     20151218
2   Steve  111     210     20151210
2   Steve  104     206     20151203
Desired Output:-
RN  Name   value1Lag1  value1lag2  value2lag1  value2lag2
1   Mark   4           3           5           4
2   Steve  9           7           10          4
The difference is calculated from the most recent row to the second most recent, and then from the second most recent to the third most recent. For RN 1:
value1lag1 = 110 - 106 = 4
value1lag2 = 106 - 103 = 3
value2lag1 = 210 - 205 = 5
value2lag2 = 205 - 201 = 4
The same applies to the other RNs.
Note: For each RN there are 3 and only 3 rows.
I have tried in several ways by taking help from similar posts but no luck.

I've assumed that RN and Name are linked here. It's a bit messy, but if each RN always has 3 values and you always want to check them in this order, then something like this should work.
SELECT
t1.Name
, AVG(CASE WHEN table_ranked.Rank = 1 THEN table_ranked.value1 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value1 ELSE NULL END) value1Lag1
, AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value1 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 3 THEN table_ranked.value1 ELSE NULL END) value1Lag2
, AVG(CASE WHEN table_ranked.Rank = 1 THEN table_ranked.value2 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value2 ELSE NULL END) value2Lag1
, AVG(CASE WHEN table_ranked.Rank = 2 THEN table_ranked.value2 ELSE NULL END) - AVG(CASE WHEN table_ranked.Rank = 3 THEN table_ranked.value2 ELSE NULL END) value2Lag2
FROM table t1
INNER JOIN
(
SELECT
t1.Name
, t1.value1
, t1.value2
, COUNT(t2.TimeStamp) Rank
FROM table t1
INNER JOIN table t2
ON t2.name = t1.name
AND t1.TimeStamp <= t2.TimeStamp
GROUP BY t1.Name, t1.value1, t1.value2
) table_ranked
ON table_ranked.Name = t1.Name
GROUP BY t1.Name

There are other answers here, but I think your problem is calling for analytic functions, specifically LAG():
select
rn,
name,
-- calculate the differences
value1 - v1l1 value1lag1,
v1l1 - v1l2 value1lag2,
value2 - v2l1 value2lag1,
v2l1 - v2l2 value2lag2
from (
select
rn,
name,
value1,
value2,
timestamp,
-- these two are the values from the row before this one ordered by timestamp (ascending)
lag(value1) over(partition by rn, name order by timestamp asc) v1l1,
lag(value2) over(partition by rn, name order by timestamp asc) v2l1,
-- these two are the values from two rows before this one ordered by timestamp (ascending)
lag(value1, 2) over(partition by rn, name order by timestamp asc) v1l2,
lag(value2, 2) over(partition by rn, name order by timestamp asc) v2l2
from (
select
1 rn, 'Mark' name, 110 value1, 210 value2, '20160119' timestamp
from dual
union all
select
1 rn, 'Mark' name, 106 value1, 205 value2, '20160115' timestamp
from dual
union all
select
1 rn, 'Mark' name, 103 value1, 201 value2, '20160112' timestamp
from dual
union all
select
2 rn, 'Steve' name, 120 value1, 220 value2, '20151218' timestamp
from dual
union all
select
2 rn, 'Steve' name, 111 value1, 210 value2, '20151210' timestamp
from dual
union all
select
2 rn, 'Steve' name, 104 value1, 206 value2, '20151203' timestamp
from dual
) data
)
where
-- return only the rows that have defined values
v1l1 is not null and
v1l2 is not null and
v2l1 is not null and
v2l2 is not null
This approach has the benefit that Oracle does all the necessary buffering internally, avoiding self-joins and the like. For big data sets this can be important from a performance viewpoint.
As an example, the explain plan for that query would be something like
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 150 | 13 (8)| 00:00:01 |
|* 1 | VIEW | | 6 | 150 | 13 (8)| 00:00:01 |
| 2 | WINDOW SORT | | 6 | 138 | 13 (8)| 00:00:01 |
| 3 | VIEW | | 6 | 138 | 12 (0)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 8 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("V1L1" IS NOT NULL AND "V1L2" IS NOT NULL AND "V2L1" IS
Note that there are no joins, just a WINDOW SORT that buffers the necessary data from the "data source" (in our case, the VIEW 3 that is the UNION ALL of our SELECT ... FROM DUAL) to partition and calculate the different lags.
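The LAG-based calculation can be tried without an Oracle instance. The sketch below is an approximation using Python's built-in `sqlite3` module, which supports window functions when the bundled SQLite is 3.25 or newer (an assumption about your environment). The table name `readings` and the column name `ts` are hypothetical replacements for the inline `FROM dual` data and the `timestamp` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (rn INTEGER, name TEXT, value1 INTEGER, value2 INTEGER, ts TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Mark", 110, 210, "20160119"),
        (1, "Mark", 106, 205, "20160115"),
        (1, "Mark", 103, 201, "20160112"),
        (2, "Steve", 120, 220, "20151218"),
        (2, "Steve", 111, 210, "20151210"),
        (2, "Steve", 104, 206, "20151203"),
    ],
)

# LAG(x) is the previous row's value and LAG(x, 2) the one before that,
# both ordered by ascending timestamp within each (rn, name) partition.
query = """
SELECT rn, name,
       value1 - v1l1 AS value1lag1, v1l1 - v1l2 AS value1lag2,
       value2 - v2l1 AS value2lag1, v2l1 - v2l2 AS value2lag2
FROM (
    SELECT rn, name, value1, value2,
           LAG(value1)    OVER (PARTITION BY rn, name ORDER BY ts) AS v1l1,
           LAG(value1, 2) OVER (PARTITION BY rn, name ORDER BY ts) AS v1l2,
           LAG(value2)    OVER (PARTITION BY rn, name ORDER BY ts) AS v2l1,
           LAG(value2, 2) OVER (PARTITION BY rn, name ORDER BY ts) AS v2l2
    FROM readings
)
WHERE v1l2 IS NOT NULL AND v2l2 IS NOT NULL
ORDER BY rn
"""
lags = conn.execute(query).fetchall()
for row in lags:
    print(row)
```

Only the most recent row of each partition has both lags defined, so filtering on the two-row lags is enough to leave one row per RN.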

If it is just this case, it is not that difficult. You need a few steps.
1. Self-join on consecutive row numbers and subtract:
select t1.RN,
t1.Name,
t1.rm,
t1.value1 - t2.value1 as value1, -- newer row minus the next older row
t1.value2 - t2.value2 as value2
from
(select RN, Name, value1, value2,
row_number() over (partition by Name order by Timestamp desc) as rm from table) t1
left join
(select RN, Name, value1, value2,
row_number() over (partition by Name order by Timestamp desc) as rm from table) t2
on t1.Name = t2.Name and t1.rm = t2.rm - 1
where t2.RN is not null
Save this result as a table, let's say table3.
2. Pivot it:
select * from (
select t3.RN, t3.Name, t3.rm, t3.value1 from table3 t3 -- value2 is left out so it does not split the groups
)
pivot
(
max(value1)
for rm in ('1','2')
) v1
3. Build a second pivot for value2 in the same way and join the two pivots together to get the result.
I think there may be a better way. I am not sure whether both pivots can be done in one go, so I join after pivoting, which creates two more tables. It is not ideal, but it is the best I can do.

-- test data
with data(rn,
name,
value1,
value2,
timestamp) as
(select 1, 'Mark', 110, 210, to_date('20160119', 'YYYYMMDD')
from dual
union all
select 1, 'Mark', 106, 205, to_date('20160115', 'YYYYMMDD')
from dual
union all
select 1, 'Mark', 103, 201, to_date('20160112', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 120, 220, to_date('20151218', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 111, 210, to_date('20151210', 'YYYYMMDD')
from dual
union all
select 2, 'Steve', 104, 206, to_date('20151203', 'YYYYMMDD') from dual),
-- first transform value1, value2 to value_id (1,2), value
data2 as
(select d.rn, d.name, 1 as val_id, d.value1 as value, d.timestamp
from data d
union all
select d.rn, d.name, 2 as val_id, d.value2 as value, d.timestamp
from data d)
select * -- find previous row P of row D, evaluate difference and build column name as desired
from (select d.rn,
d.name,
d.value - p.value as value,
'value' || d.val_id || 'Lag' || row_number() over(partition by d.rn, d.val_id order by d.timestamp desc) as col
from data2 p, data2 d
where p.rn = d.rn
and p.val_id = d.val_id
and p.timestamp =
(select max(pp.timestamp)
from data2 pp
where pp.rn = p.rn
and pp.val_id = p.val_id
and pp.timestamp < d.timestamp))
-- pivot
pivot(sum(value) for col in('value1Lag1',
'value1Lag2',
'value2Lag1',
'value2Lag2'));

Oracle grouping/changing rows to columns

I have the following table named foo:
ID | KEY | VAL
----------------
1 | 47 | 97
2 | 47 | 98
3 | 47 | 99
4 | 48 | 100
5 | 48 | 101
6 | 49 | 102
I want to run a select query and have the results show like this
UNIQUE_ID | KEY | ID1 | VAL1 | ID2 | VAL2 | ID3 | VAL3
--------------------------------------------------------------
47_1:97_2:98_3:99| 47 | 1 | 97 | 2 | 98 | 3 | 99
48_4:100_5:101 | 48 | 4 | 100 | 5 | 101 | |
49_6:102 | 49 | 6 | 102 | | | |
So, basically all rows with the same KEY get collapsed into 1 row. There can be anywhere from 1-3 rows per KEY value
Is there a way to do this in a sql query (without writing a stored procedure or scripts)?
If not, I could also work with the less desirable choice of
UNIQUE_ID | KEY | IDS | VALS
--------------------------------------------------------------
47_1:97_2:98_3:99| 47 | 1,2,3 | 97,98,99
48_4:100_5:101 | 48 | 4,5 | 100, 101
49_6:102 | 49 | 6 | 102
Thanks!
UPDATE:
Unfortunately my real-world problem seems to be much more difficult than this example, and I'm having trouble getting either example to work :( My query is over 120 lines so it's not very easy to post. It kind of looks like:
with v_table as (select ...),
v_table2 as (select foo from v_table where...),
v_table3 as (select foo from v_table where ...),
...
v_table23 as (select foo from v_table where ...)
select distinct (...) as "UniqueID", myKey, myVal, otherCol1, ..., otherCol18
from tbl1 inner join tbl2 on...
...
inner join tbl15 on ...
If I try any of the methods below it seems that I cannot do group-bys correctly because of all the other data being returned.
Ex:
with v_table as (select ...),
v_table2 as (select foo from v_table where...),
v_table3 as (select foo from v_table where ...),
...
v_table23 as (select foo from v_table where ...)
select "Unique ID",
myKey, max(decode(id_col,1,id_col)) as id_1, max(decode(id_col,1,myVal)) as val_1,
max(decode(id_col,2,id_col)) as id_2,max(decode(id_col,2,myVal)) as val_2,
max(decode(id_col,3,id_col)) as id_3,max(decode(id_col,3,myVal)) as val_3
from (
select distinct (...) as "UniqueID", myKey, row_number() over (partition by myKey order by id) as id_col, id, myVal, otherCol1, ..., otherCol18
from tbl1 inner join tbl2 on...
...
inner join tbl15 on ...
) group by myKey;
Gives me the error: ORA-00979: not a GROUP BY expression
This is because I am selecting the UniqueID from the inner select. I will need to do this as well as select other columns from the inner table.
Any help would be appreciated!

Take a look at this article about the LISTAGG function; it will help you get the comma-separated results. LISTAGG is available only from Oracle 11g Release 2 onward.

You may try this
select key,
max(decode(id_col,1,id)) as id_1,max(decode(id_col,1,val)) as val_1,
max(decode(id_col,2,id)) as id_2,max(decode(id_col,2,val)) as val_2,
max(decode(id_col,3,id)) as id_3,max(decode(id_col,3,val)) as val_3
from (
select key, row_number() over (partition by key order by id) as id_col,id,val
from your_table
)
group by key
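The pattern in the query above is conditional aggregation over a per-key row number. Here is a hedged sketch of that pattern using Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer, which is assumed). `CASE WHEN` replaces Oracle's `DECODE`, and the column is renamed to the hypothetical `item_key` to stay clear of the `KEY` keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, item_key INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, 47, 97), (2, 47, 98), (3, 47, 99), (4, 48, 100), (5, 48, 101), (6, 49, 102)],
)

# Number the rows inside each key, then spread positions 1..3 into columns.
# MAX ignores the NULLs produced when the CASE does not match.
query = """
SELECT item_key,
       MAX(CASE WHEN pos = 1 THEN id  END) AS id1,
       MAX(CASE WHEN pos = 1 THEN val END) AS val1,
       MAX(CASE WHEN pos = 2 THEN id  END) AS id2,
       MAX(CASE WHEN pos = 2 THEN val END) AS val2,
       MAX(CASE WHEN pos = 3 THEN id  END) AS id3,
       MAX(CASE WHEN pos = 3 THEN val END) AS val3
FROM (
    SELECT item_key, id, val,
           ROW_NUMBER() OVER (PARTITION BY item_key ORDER BY id) AS pos
    FROM foo
)
GROUP BY item_key
ORDER BY item_key
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Keys with fewer than three rows get `None` in the unused columns, which corresponds to the blank cells in the desired output.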

As @O.D. suggests, you can generate the less desirable version with LISTAGG, for example (using a CTE to generate your sample data):
with foo as (
select 1 as id, 47 as key, 97 as val from dual
union select 2,47,98 from dual
union select 3,47,99 from dual
union select 4,48,100 from dual
union select 5,48,101 from dual
union select 6,49,102 from dual
)
select key ||'_'|| listagg(id ||':' ||val, '_')
within group (order by id) as unique_id,
key,
listagg(id, ',') within group (order by id) as ids,
listagg(val, ',') within group (order by id) as vals
from foo
group by key
order by key;
UNIQUE_ID KEY IDS VALS
----------------- ---- -------------------- --------------------
47_1:97_2:98_3:99 47 1,2,3 97,98,99
48_4:100_5:101 48 4,5 100,101
49_6:102 49 6 102
With a bit more manipulation you can get your preferred results:
with foo as (
select 1 as id, 47 as key, 97 as val from dual
union select 2,47,98 from dual
union select 3,47,99 from dual
union select 4,48,100 from dual
union select 5,48,101 from dual
union select 6,49,102 from dual
)
select unique_id, key,
max(id1) as id1, max(val1) as val1,
max(id2) as id2, max(val2) as val2,
max(id3) as id3, max(val3) as val3
from (
select unique_id,key,
case when r = 1 then id end as id1, case when r = 1 then val end as val1,
case when r = 2 then id end as id2, case when r = 2 then val end as val2,
case when r = 3 then id end as id3, case when r = 3 then val end as val3
from (
select key ||'_'|| listagg(id ||':' ||val, '_')
within group (order by id) over (partition by key) as unique_id,
key, id, val,
row_number() over (partition by key order by id) as r
from foo
)
)
group by unique_id, key
order by key;
UNIQUE_ID          KEY  ID1 VAL1  ID2 VAL2  ID3 VAL3
----------------- ---- ---- ---- ---- ---- ---- ----
47_1:97_2:98_3:99   47    1   97    2   98    3   99
48_4:100_5:101      48    4  100    5  101
49_6:102            49    6  102
Can't help feeling there ought to be a simpler way though...
