I have this table:
Vals
Val1  Val2  Score
A     B     1
C           2
      D     3
I would like the output to be a single column that is the "superset" of the Val1 and Val2 variables, keeping the Score value associated with each value.
The output should be:
Val  Score
A    1
B    1
C    2
D    3
Selecting from this table twice and then unioning is absolutely not a possibility, because producing it is very expensive. In addition, I cannot use a WITH clause, because this query already uses one in a sub-query, and for some reason Oracle doesn't support two WITH clauses.
I don't really care how repeated values are dealt with; whatever is easiest/fastest.
How can I generate my appropriate output?
Here is a solution without using UNPIVOT:
with columns as (
  -- row generator: one row per source column (1 = Val1, 2 = Val2)
  select level as colNum from dual connect by level <= 2
),
results as (
  -- cross join each Vals row with the generator, picking one column per copy
  select case colNum
           when 1 then Val1
           when 2 then Val2
         end as Val,
         score
  from vals, columns
)
select * from results where val is not null
Here is essentially the same query without the WITH clause:
select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
(select level as colNum from dual connect by level <= 2) columns
where case colNum
when 1 then Val1
when 2 then Val2
end is not null
Or, a bit more concisely:
select *
from ( select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
(select level as colNum from dual connect by level <= 2) columns
) results
where val is not null
Try this; it looks like you want to convert column values into rows:
select val1, score from vals where val1 is not null
union
select val2,score from vals where val2 is not null
If you're on Oracle 11, UNPIVOT will help:
SELECT *
FROM vals
UNPIVOT ( val FOR origin IN (val1, val2) )
You can choose any names instead of 'val' and 'origin'.
See the Oracle article on PIVOT / UNPIVOT.
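For instance, applied to the sample Vals table above (UNPIVOT excludes NULLs by default, which is exactly what's wanted here; row order is not guaranteed):
SELECT val, score
FROM vals
UNPIVOT ( val FOR origin IN (val1, val2) );

VAL SCORE
--- -----
A   1
B   1
C   2
D   3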
Related
I have a table which has the following format (Google BigQuery):
user  url  val1  val2  val3  ...  val300
A     a    0.5   0     -3    ...  1
A     b    1     2     3     ...  2
B     c    5     4     -10   ...  2
I would like to obtain a new table with the number of URLs per user, and the vals aggregated by average. (The number of different vals can vary, so I would like something rather flexible.)
user  nb_url  val1  val2  val3  ...  val300
A     2       0.75  1     0     ...  1.5
B     1       ...
What is the correct syntax?
Thank you in advance.
Aggregate by user, select the count of URLs, and the average of the other columns.
SELECT
user,
COUNT(*) AS nb_url,
AVG(val1) AS val1,
AVG(val2) AS val2,
AVG(val3) AS val3,
...
AVG(val300) AS val300
FROM yourTable
GROUP BY user
ORDER BY user;
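If writing 300 AVG() calls by hand is impractical, one alternative is a sketch using BigQuery scripting that generates the column list from the table schema. (Here your_dataset is a placeholder, and yourTable is the assumed table name from above.)
DECLARE col_list STRING;

-- build "AVG(val1) AS val1, ..., AVG(val300) AS val300" from the schema
SET col_list = (
  SELECT STRING_AGG(FORMAT('AVG(%s) AS %s', column_name, column_name), ', ')
  FROM your_dataset.INFORMATION_SCHEMA.COLUMNS
  WHERE table_name = 'yourTable'
    AND column_name NOT IN ('user', 'url')
);

-- run the generated aggregation query
EXECUTE IMMEDIATE FORMAT("""
SELECT user, COUNT(*) AS nb_url, %s
FROM your_dataset.yourTable
GROUP BY user
ORDER BY user
""", col_list);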
Generating a pivot for 300 columns can be quite expensive even for BigQuery - instead I would recommend the below [unpivoted] solution:
select user, count(url) nb_url,
offset + 1 col, avg(cast(val as float64)) as val
from your_table t,
unnest(split(translate(format('%t', (select as struct * except(user, url) from unnest([t]))), '() ', ''))) val with offset
group by user, col
If applied to the sample data in your question, the output is one row per user and column position, with nb_url and the averaged val.
I just had this idea: how can I loop in SQL?
For example
I have this column
PARAMETER_VALUE
E,C;S,C;I,X;G,T;S,J;S,F;C,S;
I want to store each value before the (,) in a temp column and each value after it, up to the (;), in another column,
and it won't stop until there are no more values after a (;).
Expected Output for Example
COL1 E S I G S S C
COL2 C C X T J F S
etc . . .
You can get this by using the regexp_substr() function together with a connect by level <= clause:
with t1(PARAMETER_VALUE) as
(
 select 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual
), t2 as
(
 -- one row per pair; str1/str2 still carry extra text that is trimmed below
 select level as rn,
        regexp_substr(PARAMETER_VALUE,'([^,]+)',1,level) as str1,
        regexp_substr(PARAMETER_VALUE,'([^;]+)',1,level) as str2
   from t1
 connect by level <= regexp_count(PARAMETER_VALUE,';')
)
-- keep only the trailing token of each fragment, then aggregate into lists
select listagg( regexp_substr(str1,'([^;]+$)') ,' ') within group (order by rn) as col1,
       listagg( regexp_substr(str2,'([^,]+$)') ,' ') within group (order by rn) as col2
  from t2;
COL1 COL2
------------- -------------
E S I G S S C C C X T J F S
Assuming that you need to separate the input into rows, at the ; delimiters, and then into columns at the , delimiter, you could do something like this:
-- WITH clause included to simulate input data. Not part of the solution;
-- use actual table and column names in the SELECT statement below.
with
t1(id, parameter_value) as (
select 1, 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual union all
select 2, ',U;,;V,V;' from dual union all
select 3, null from dual
)
-- End of simulated input data
select id,
level as ord,
regexp_substr(parameter_value, '(;|^)([^,]*),', 1, level, null, 2) as col1,
regexp_substr(parameter_value, ',([^;]*);' , 1, level, null, 1) as col2
from t1
connect by level <= regexp_count(parameter_value, ';')
and id = prior id
and prior sys_guid() is not null
order by id, ord
;
ID ORD COL1 COL2
--- --- ---- ----
1 1 E C
1 2 S C
1 3 I X
1 4 G T
1 5 S J
1 6 S F
1 7 C S
2 1 U
2 2
2 3 V V
3 1
Note - this is not the most efficient way to split the inputs (nothing will be very efficient - the data model, which is in violation of First Normal Form, is the reason). This can be improved using standard instr and substr, but the query will be more complicated, and for that reason, harder to maintain.
I generated more input data, to illustrate a few things. You may have several inputs that must be broken up at the same time; that must be done with care. (Note the additional conditions in CONNECT BY). I also illustrate the handling of NULL - if a comma comes right after a semicolon, that means that the "column 1" part of that pair must be NULL. That is shown in the output.
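For reference, here is a minimal sketch of that instr/substr alternative, for a single well-formed input (assuming exactly one comma in each semicolon-terminated pair); the pair count uses a length() trick instead of regexp_count:
with t1(parameter_value) as (
  select 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual
)
select level as ord,
       -- col1: from the start of the nth pair up to (not including) its comma
       substr(parameter_value,
              decode(level, 1, 1, instr(parameter_value, ';', 1, level - 1) + 1),
              instr(parameter_value, ',', 1, level)
                - decode(level, 1, 1, instr(parameter_value, ';', 1, level - 1) + 1)) as col1,
       -- col2: between the nth comma and the nth semicolon
       substr(parameter_value,
              instr(parameter_value, ',', 1, level) + 1,
              instr(parameter_value, ';', 1, level)
                - instr(parameter_value, ',', 1, level) - 1) as col2
  from t1
connect by level <= length(parameter_value) - length(replace(parameter_value, ';'))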
I have a SQL table (actually a BigQuery table) that has a huge number of columns (over a thousand). I want to quickly find the min and max value of each column. Is there a way to do that?
It is impossible for me to list all the columns. Looking for ways to do something like
SELECT MAX(*) FROM mytable;
and then running
SELECT MIN(*) FROM mytable;
I have been unable to Google a way of doing that. Not sure that's even possible.
For example, if my table has the following schema:
col1 col2 col3 .... col1000
the (say, max) query should return
Row col1 col2 col3 ... col1000
1 3 18 0.6 ... 45
and the min query should return (say)
Row col1 col2 col3 ... col1000
1 -5 4 0.1 ... -5
The numbers are just for illustration. The column names could be different strings and not easily scriptable.
See the below example for BigQuery Standard SQL - it works for any number of columns and does not require explicitly referencing column names:
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
)
SELECT
MIN(CAST(value AS INT64)) AS min_value,
MAX(CAST(value AS INT64)) AS max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value
with the result:
Row min_value max_value
1 -1 11
Note: if your columns are of STRING data type, you should remove CAST ... AS INT64.
Or, if they are of FLOAT64, replace INT64 with FLOAT64 in the CAST function.
Update
Below is an option to get the MIN/MAX for each column and present the result as a list of the respective values in the order of the columns.
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 14 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
), temp AS (
SELECT pos, MIN(CAST(value AS INT64)) min_value, MAX(CAST(value AS INT64)) max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value WITH OFFSET pos
GROUP BY pos
)
SELECT 'min_values' stats, TO_JSON_STRING(ARRAY_AGG(min_value ORDER BY pos)) vals FROM temp UNION ALL
SELECT 'max_values', TO_JSON_STRING(ARRAY_AGG(max_value ORDER BY pos)) FROM temp
with the result:
Row stats vals
1 min_values [-1,2,3,4]
2 max_values [7,11,5,14]
Hope this is something you can still apply to whatever your final goal is.
Currently I have a table that looks like below:
ID|Date |Val1|Val2|
1 |1/1/2016|1000|0
2 |1/1/2016|Null|0
3 |1/1/2016|Null|0
1 |2/1/2016|1000|0
2 |2/1/2016|Null|0
3 |2/1/2016|1000|0
1 |3/1/2016|1000|0
2 |3/1/2016|1000|0
3 |3/1/2016|1000|0
I want val2 to become 1 if Val1 is populated in the previous month, so the output would look like:
ID|Date |Val1|Val2|
1 |1/1/2016|1000|0
2 |1/1/2016|Null|0
3 |1/1/2016|Null|0
1 |2/1/2016|1000|1
2 |2/1/2016|Null|0
3 |2/1/2016|1000|0
1 |3/1/2016|1000|1
2 |3/1/2016|1000|0
3 |3/1/2016|1000|1
I've tried a few code combinations, but the conditional of updating the value by the previous date where Val1 first appears is tripping me up. I'd appreciate any help!
You can do this with a windowed LAG() to find the previous value, and update Val2 if it's NOT NULL.
;With Cte As
(
Select Id, [Date], Val1, Val2,
       Lag(Val1) Over (Partition By Id Order By [Date] Asc) As Prev
From LikeBelow
)
Update Cte
Set Val2 = 1
Where Prev Is Not Null;
If you are actually storing your dates as a VARCHAR and not a DATE, you'll need to convert it:
;With Cte As
(
Select Id, [Date], Val1, Val2,
       Lag(Val1) Over (Partition By Id
                       Order By Convert(Date, [Date]) Asc) As Prev
From LikeBelow
)
Update Cte
Set Val2 = 1
Where Prev Is Not Null;
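To sanity-check the window before updating, you can run the CTE as a plain SELECT first (same assumed table name, LikeBelow):
With Cte As
(
    Select Id, [Date], Val1, Val2,
           Lag(Val1) Over (Partition By Id Order By [Date] Asc) As Prev
    From LikeBelow
)
Select * From Cte
Order By Id, [Date];
Rows where Prev is not null are exactly the ones the UPDATE will set to 1.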
I was wondering which is a better/faster/more efficient way of turning arbitrary strings into columns:
UNION ALL
SELECT my_field,
CASE WHEN my_field = 'str1'
THEN ...
...
END,
...
FROM (
SELECT 'str1' AS my_field FROM DUAL
UNION ALL
SELECT 'str2' AS my_field FROM DUAL
UNION ALL
SELECT 'str3' AS my_field FROM DUAL
),
...
CONNECT BY LEVEL
SELECT CASE WHEN rowno = 1
THEN 'str1'
...
END AS my_field,
CASE WHEN rowno = 1
THEN ...
...
END,
...
FROM (
SELECT ROWNUM rowno
FROM DUAL
CONNECT BY LEVEL <= 3
),
...
I'm inclined to go with the UNION ALL version if only because it makes the outermost SELECT simpler: I don't have to do a second CASE statement to get the desired string values. It also is more readable to see WHEN my_field = 'str1' rather than WHEN rowno = 1. The only reason I ask about the CONNECT BY LEVEL version is because it was suggested in Example of Data Pivots in SQL (rows to columns and columns to rows) (see the "From Two rows to Six rows (a column to row pivot)" section).
I have only SELECT access to the Oracle database I'm using, so I cannot run EXPLAIN PLAN. I have also tried to use WITH ... AS before, without luck.
I think you're confusing the UNION ALL and CONNECT BY methods proposed in "Example of Data Pivots in SQL (rows to columns and columns to rows)".
The UNION ALL in your question is used to transform multiple rows with a single column into a single row with multiple columns:
label, 1, val1
label, 2, val2
label, 3, val3
into
label, val1, val2, val3
The CONNECT BY sub-query is used to transform a single row with multiple columns into multiple rows with a single column, so it uses a generator sub-query to multiply the existing data set:
label, val1, val2, val3
+
1
2
3
resulting in:
label, 1, val1, val2, val3
label, 2, val1, val2, val3
label, 3, val1, val2, val3
transformed into:
label, 1, val1
label, 2, val2
label, 3, val3
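As a concrete sketch of that generator pattern (the table and column names here are illustrative, not from your question):
select label,
       case colNum
         when 1 then val1
         when 2 then val2
         when 3 then val3
       end as val
from some_table,
     (select level as colNum from dual connect by level <= 3)
Each source row is multiplied by the three generated rows, and the CASE picks a different column for each copy.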
I would use connect by for any but the most trivial number of rows. Not having explain plan is a pain though ... you're really having your hands tied there. I'd be really keen on knowing what the optimiser's estimate of cardinality is.
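(Side note, hedged: if your DBA will grant read access to the V$ views - not a given with SELECT-only access - you can see the optimizer's cardinality estimates without EXPLAIN PLAN by running the query with the gather_plan_statistics hint and then reading the cached plan:)
select /*+ gather_plan_statistics */ ... your query ... ;

select *
from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
-- E-Rows is the optimizer's estimate; A-Rows is the actual row count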