Convert data from wide format to long format in SQL - sql

I have some data in the format:
VAR1 VAR2 Score1 Score2 Score3
A B 1 2 3
I need to convert it into the format
VAR1 VAR2 VarName Value
A B Score1 1
A B Score2 2
A B Score3 3
How can I do this in SQL?

Provided your score columns are fixed and you require no aggregation, you can use one SELECT per score column, combined with UNION ALL, to generate the shape of data you requested. For example:
SELECT [VAR1], [VAR2], [VarName] = 'Score1', [Value] = [Score1]
FROM [dbo].[UnknownMe]
UNION ALL
SELECT [VAR1], [VAR2], [VarName] = 'Score2', [Value] = [Score2]
FROM [dbo].[UnknownMe]
UNION ALL
SELECT [VAR1], [VAR2], [VarName] = 'Score3', [Value] = [Score3]
FROM [dbo].[UnknownMe]
SQL Fiddle: http://sqlfiddle.com/#!6/f54b2/4/0
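A different technique from the UNION ALL answer above, which reads the table once per score column: on SQL Server 2008 or later, CROSS APPLY with a VALUES list can produce the same shape in one pass. This is only a sketch. The inline VALUES row stands in for [dbo].[UnknownMe], and the aliases t and v are made up for illustration.

```sql
-- Sample row from the question, inlined so the query runs on its own.
SELECT t.[VAR1], t.[VAR2], v.[VarName], v.[Value]
FROM (VALUES ('A', 'B', 1, 2, 3))
     AS t ([VAR1], [VAR2], [Score1], [Score2], [Score3])
-- Each source row is expanded into one row per (name, value) pair.
CROSS APPLY (VALUES ('Score1', t.[Score1]),
                    ('Score2', t.[Score2]),
                    ('Score3', t.[Score3])) AS v ([VarName], [Value]);
```

As with the UNION ALL version, the list of score columns still has to be written out by hand.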

In Hive, you can combine the named_struct, array, and explode functions with the LATERAL VIEW construct:
SELECT VAR1, VAR2, var_struct.varname, var_struct.value
FROM (
    SELECT
        VAR1,
        VAR2,
        array(
            named_struct("varname", "Score1", "value", Score1),
            named_struct("varname", "Score2", "value", Score2),
            named_struct("varname", "Score3", "value", Score3)
        ) AS struct_array1
    FROM OriginalTable
) t1
LATERAL VIEW explode(struct_array1) t2 AS var_struct;

Related

Convert String to Tuple in BigQuery

I have a variable passed as an argument in BigQuery which is in the format "('a','b','c')"
with vars as (
select "{0}" as var1,
)
-- where, {0} = "('a','b','c')"
To use it in BigQuery I need to make it a tuple ('a','b','c').
How can it be done?
Any alternate approach is also welcome.
Example:
with vars as (
select "('a','b','c')" as index
)
select * from `<some_other_db>.table` where index in (
select index from vars)
-- gives me empty results because index is now a string
Present output:
select * from <db_name>.table where index in "('a','b','c')"
Required output:
select * from <db_name>.table where index in ('a','b','c')
Below is for BigQuery Standard SQL
#standardSQL
WITH vars AS (
SELECT "('a','b','c')" AS var
)
SELECT *
FROM `<some_other_db>.table`
WHERE index IN UNNEST((
SELECT SPLIT(REGEXP_REPLACE(var, r'[()\']', ''))
FROM vars
))
You can test and play with the above using some dummy data, as in the example below:
#standardSQL
WITH vars AS (
SELECT "('a','b','c')" AS var
), `<some_other_db>.table` AS (
SELECT 1 id, 'a' index UNION ALL
SELECT 2, 'd' UNION ALL
SELECT 3, 'c' UNION ALL
SELECT 4, 'e'
)
SELECT *
FROM `<some_other_db>.table`
WHERE index IN UNNEST((
SELECT SPLIT(REGEXP_REPLACE(var, r'[()\']', ''))
FROM vars
))
with output:
Row  id  index
1    1   a
2    3   c
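To see what the inner expression hands to UNNEST, the two steps can be run on their own. This is a small sketch: the column names cleaned and parts are made up for illustration, and the character class is written in a double-quoted raw string so the single quote needs no escaping.

```sql
#standardSQL
SELECT
  cleaned,
  SPLIT(cleaned) AS parts  -- SPLIT on a STRING defaults to ',' as the delimiter
FROM (
  -- Strip the parentheses and single quotes from the literal.
  SELECT REGEXP_REPLACE("('a','b','c')", r"[()']", '') AS cleaned
);
```

Here cleaned should come back as the string a,b,c and parts as a three-element array, which is the form IN UNNEST(...) expects.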
I think this does what you are asking for:
with vars as ( select "('a','b','c')" as var1)
select as struct
MAX(CASE WHEN n = 0 then var END) as f1,
MAX(CASE WHEN n = 1 then var END) as f2,
MAX(CASE WHEN n = 2 then var END) as f3
from vars v cross join
unnest(SPLIT(REPLACE(REPLACE(var1, '(', ''), ')', ''), ',')) var with offset n;

Generating additional SQL rows from columns

In presto, I have rows of the form
name, var1, var2, var3
Foo, A, B, C
And because I need to group by var1, and also by var2 and var3 (each separately), I want to transform each row into three rows of the form:
name, key
Foo, var1=A
Foo, var2=B
Foo, var3=C
So that I can then just group by key. Presto doesn't have an UNPIVOT function, so any advice would be appreciated!
You can multiply the rows by cross joining to a subquery that returns as many rows as you need, e.g.
select
    t.name
  , case when n.n = 1 then t.var1
         when n.n = 2 then t.var2
         when n.n = 3 then t.var3
    end as key
from sourcetbl t
cross join (
    select 1 as n union all
    select 2 as n union all
    select 3 as n
) n
A case expression then picks one of the source columns for each generated row, based on the supplied "row number" (n.n in my example).
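The question asked for keys of the form var1=A so that rows can be grouped by key. A minimal sketch of that, building on the cross join above: the sample row is inlined with VALUES so the query is self-contained, and the names unpivoted and row_count are made up for illustration.

```sql
-- Sample row from the question, inlined in place of the real table.
WITH sourcetbl (name, var1, var2, var3) AS (
    VALUES ('Foo', 'A', 'B', 'C')
),
unpivoted AS (
    SELECT
        t.name,
        -- Prefix each value with its source column name to build the key.
        CASE n.n
            WHEN 1 THEN 'var1=' || t.var1
            WHEN 2 THEN 'var2=' || t.var2
            WHEN 3 THEN 'var3=' || t.var3
        END AS key
    FROM sourcetbl t
    CROSS JOIN (VALUES 1, 2, 3) AS n (n)
)
SELECT key, count(*) AS row_count
FROM unpivoted
GROUP BY key;
```

This assumes var1, var2 and var3 are all varchar. Other types would need a cast before the concatenation.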

Bigquery union two arrays of different struct

I have two tables
v1
ARRAY<STRUCT<a int64>>
and
v2
ARRAY<STRUCT<a int64, b int64>>
I want to write a query that unions both tables using UNION ALL and, for the v1 rows, puts NULLs in place of the b field. Any help is appreciated :)
I'm using standard SQL.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
ARRAY(SELECT AS STRUCT val.a, NULL AS b FROM UNNEST(arr1) val) arr
FROM `project.dataset.v1`
UNION ALL
SELECT arr2 AS arr
FROM `project.dataset.v2`
You can test and play with the above using dummy data, as below:
#standardSQL
WITH `project.dataset.v1` AS (
SELECT [STRUCT<a INT64>(1),STRUCT(2),STRUCT(3)] arr1
), `project.dataset.v2` AS (
SELECT [STRUCT<a INT64, b INT64>(100, 1),STRUCT(100, 2),STRUCT(100, 3)] arr2
)
SELECT
ARRAY(SELECT AS STRUCT val.a, NULL AS b FROM UNNEST(arr1) val) arr
FROM `project.dataset.v1`
UNION ALL
SELECT arr2 AS arr
FROM `project.dataset.v2`
with the result:
Row  arr.a  arr.b
1    1      null
     2      null
     3      null
2    100    1
     100    2
     100    3

How do I combine 2 records with a single field into 1 row with 2 fields (Oracle 11g)?

Here's some sample data:
record1: field1 = test2
record2: field1 = test3
The actual output I want is
record1: field1 = test2 | field2 = test3
I've looked around the net but can't find what I'm looking for. I can use a custom function to get it in this format but I'm trying to see if there's a way to make it work without resorting to that.
thanks a lot
You need to use pivot:
with t(id, d) as (
select 1, 'field1 = test2' from dual union all
select 2, 'field1 = test3' from dual
)
select *
from t
pivot (max (d) for id in (1, 2))
If you don't have an id column, you can generate one with ROW_NUMBER(). Using a subquery in the IN list requires PIVOT XML, though, so the result comes back as an XML type:
with t(d) as (
select 'field1 = test2' from dual union all
select 'field1 = test3' from dual
), t1(id, d) as (
select ROW_NUMBER() OVER(ORDER BY d), d from t
)
select *
from t1
pivot xml (max (d) for id in (select id from t1))
There are several ways to approach this; search for "pivot rows to columns". Here is one set of answers: http://www.dba-oracle.com/t_converting_rows_columns.htm
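One of those other approaches is conditional aggregation, which does not need the PIVOT clause. A minimal sketch, assuming the two values are test2 and test3 and that ordering by the value itself is an acceptable way to decide which becomes field1; the names numbered and rn are made up for illustration.

```sql
WITH t (d) AS (
    SELECT 'test2' FROM dual UNION ALL
    SELECT 'test3' FROM dual
),
numbered AS (
    -- Number the rows so each one can be routed to its own output column.
    SELECT d, ROW_NUMBER() OVER (ORDER BY d) AS rn
    FROM t
)
SELECT MAX(CASE WHEN rn = 1 THEN d END) AS field1,
       MAX(CASE WHEN rn = 2 THEN d END) AS field2
FROM numbered;
```

Like PIVOT with a fixed IN list, this only works when the number of output columns is known in advance.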

Oracle SQL -- select from two columns and combine into one

I have this table:
Vals
Val1 Val2 Score
A B 1
C 2
D 3
I would like the output to be a single column that is the "superset" of the Val1 and Val2 columns. It should also keep the Score value associated with each value.
The output should be:
Val Score
A 1
B 1
C 2
D 3
Selecting from this table twice and then unioning is absolutely not a possibility because producing it is very expensive. In addition I cannot use a with clause because this query uses one in a sub-query and for some reason Oracle doesn't support two with clauses.
I don't really care about how repeat values are dealt with, whatever is easiest/fastest.
How can I generate my appropriate output?
Here is a solution that does not use UNPIVOT.
with columns as (
    select level as colNum from dual connect by level <= 2
),
results as (
    select case colNum
               when 1 then Val1
               when 2 then Val2
           end Val,
           score
    from vals,
         columns
)
select * from results where val is not null
Here is essentially the same query without the WITH clause:
select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
(select level as colNum from dual connect by level <= 2) columns
where case colNum
when 1 then Val1
when 2 then Val2
end is not null
Or a bit more concisely
select *
from ( select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
(select level as colNum from dual connect by level <= 2) columns
) results
where val is not null
Try this; it looks like you want to convert column values into rows:
select val1, score from vals where val1 is not null
union
select val2,score from vals where val2 is not null
If you're on Oracle 11g or later, UNPIVOT will help:
SELECT *
FROM vals
UNPIVOT ( val FOR origin IN (val1, val2) )
You can choose any names in place of 'val' and 'origin'.
See the Oracle article on PIVOT / UNPIVOT.
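To try UNPIVOT without the real table, the sample data can be inlined. This is only a sketch: the question's table does not make clear which column holds C and D, so putting C in val1 and D in val2 is an assumption.

```sql
-- Sample data from the question; the placement of the blanks is assumed.
WITH vals (val1, val2, score) AS (
    SELECT 'A',  'B',  1 FROM dual UNION ALL
    SELECT 'C',  NULL, 2 FROM dual UNION ALL
    SELECT NULL, 'D',  3 FROM dual
)
SELECT val, score
FROM vals
UNPIVOT (val FOR origin IN (val1, val2));
```

UNPIVOT excludes NULL values by default, so the empty cells produce no rows; writing UNPIVOT INCLUDE NULLS keeps them. The origin column is simply left out of the select list here because the question only asks for Val and Score.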