I need to merge a table with ID and various bit flags like this
-----------------
a1 | x |   | x |
a1 |   | x |   |
a1 |   |   |   |
b2 | x |   |   |
b2 |   |   |   |
c3 | x | x | x |
-----------------
into this form:
-----------------
a1 | x | x | x |
b2 | x |   |   |
c3 | x | x | x |
-----------------
The problem is that the data are joined by a kind of option ID: each option has a unique ID which is joined to a1, b2, and so on. When I try to SELECT it using DISTINCT I still receive the results from table number 1. I can do it with subqueries in the SELECT list, but that is a weak solution for performance reasons.
Do you have any idea how to select and combine all these flags into a single row?
Use aggregation:
select col1, max(col2), max(col3), max(col4)
from table_name
group by col1
For the given result set you can use MIN with GROUP BY:
SELECT
tbl.Col
, MIN(tbl.Col1) Col1
, MIN(tbl.Col2) Col2
, MIN(tbl.Col3) Col3
FROM #table tbl
GROUP BY tbl.Col
However, if the blanks are empty strings rather than NULLs, use MAX() instead; otherwise MIN() returns the empty values:
SELECT
tbl.Col
, MAX(tbl.Col1) Col1
, MAX(tbl.Col2) Col2
, MAX(tbl.Col3) Col3
FROM #table tbl
GROUP BY tbl.Col
For example:
CREATE TABLE #table
(
Col VARCHAR(50),
Col1 VARCHAR(50),
Col2 VARCHAR(50),
Col3 VARCHAR(50)
)
INSERT INTO #table
(
Col,
Col1,
Col2,
Col3
)
VALUES
( 'a1', -- Col - varchar(50)
'x', -- Col1 - varchar(50)
Null, -- Col2 - varchar(50)
'x' -- Col3 - varchar(50)
)
, ('a1', NULL, 'x', null)
, ('a1', NULL, 'x', null)
, ('b2', 'x', null, null)
, ('b2', null, null, null)
, ('c3', 'x', 'x', 'x')
SELECT
tbl.Col
, MIN(tbl.Col1) Col1
, MIN(tbl.Col2) Col2
, MIN(tbl.Col3) Col3
FROM #table tbl
GROUP BY tbl.Col
OUTPUT:
Col Col1 Col2 Col3
a1 x x x
b2 x NULL NULL
c3 x x x
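As a hedged illustration of the empty-string case mentioned above (a hypothetical #table2, not part of the original example), MAX() still returns 'x' while MIN() would pick up the empty strings:
-- Hypothetical data: blanks stored as empty strings ('') instead of NULL.
-- MIN() would return '' because '' sorts before 'x'; MAX() still returns 'x'.
CREATE TABLE #table2
(
Col VARCHAR(50),
Col1 VARCHAR(50),
Col2 VARCHAR(50),
Col3 VARCHAR(50)
)
INSERT INTO #table2 (Col, Col1, Col2, Col3)
VALUES ('a1', 'x', '', 'x')
, ('a1', '', 'x', '')
, ('a1', '', '', '')

SELECT
tbl.Col
, MAX(tbl.Col1) Col1 -- 'x'
, MAX(tbl.Col2) Col2 -- 'x'
, MAX(tbl.Col3) Col3 -- 'x'
FROM #table2 tbl
GROUP BY tbl.Col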
You want aggregation:
select col1, max(col2), max(col3), max(col4)
from your_table t
group by col1;
This assumes the blank values are stored as NULL.
The general solution for such a situation is to simply aggregate and either use MIN or MAX on the columns.
SQL Server's data type BIT, however, is quirky. It's a little like a BOOLEAN, but not a real boolean. It is a little like a very limited numeric data type, but it isn't really a numeric type either. And there simply exist no aggregation functions for this data type. In standard SQL you'd have ANY and EVERY for the BOOLEAN type. In PostgreSQL you have BIT_OR and BIT_AND for BIT and BOOL_OR and BOOL_AND for BOOLEAN. SQL Server has nothing.
So convert your columns to a numeric type before using MIN (which would be a bitwise AND) or MAX (which would be a bitwise OR) on it. E.g.
select
id,
max(bit1 + 0) as bit1agg,
max(bit2 + 0) as bit2agg,
max(bit3 + 0) as bit3agg
from mytable
group by id
order by id;
You can also use CAST or CONVERT instead of course.
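For instance, a minimal sketch of the same aggregation using CAST (same hypothetical mytable and bit columns as above):
-- convert the BIT columns explicitly with CAST before aggregating
select
id,
max(cast(bit1 as tinyint)) as bit1agg,
max(cast(bit2 as tinyint)) as bit2agg,
max(cast(bit3 as tinyint)) as bit3agg
from mytable
group by id
order by id;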
I have a dataset table like this in Google BigQuery:
| col1 | col2 | col3 | col4 | col5 | col6 |
-------------------------------------------
| a1 | b1 | c1 | d1 | e2 | f1 |
| a2 | b2 | c2 | d1 | e2 | f2 |
| a1 | b3 | c3 | d1 | e3 | f2 |
| a2 | b1 | c4 | d1 | e4 | f2 |
| a1 | b2 | c5 | d1 | e5 | f2 |
Let's say the given threshold number is 4, in that case, I want to transform this into one of the tables given below:
| col1    | col2       | col4 | col5          | col6    |
---------------------------------------------------------
| [a1,a2] | [b1,b2,b3] | [d1] | [e2,e3,e4,e5] | [f1,f2] |
Or like this:
| col | values |
------------------------
| col1 | [a1,a2] |
| col2 | [b1,b2,b3] |
| col4 | [d1] |
| col5 | [e2,e3,e4,e5] |
| col6 | [f1,f2] |
Please note col3 was removed because it contained more than 4 (the threshold) distinct values. I explored a lot of documents here but was not able to figure out the required query. Can somebody help or point me in the right direction?
Edit: I have one solution in mind, where I do something like this:
select * from (select 'col1', array_agg(distinct col1) as values union all
select 'col2', array_agg(distinct col2) as values union all
select 'col3', array_agg(distinct col3) as values union all
select 'col4', array_agg(distinct col4) as values union all
select 'col5', array_agg(distinct col5) as values union all
select 'col6', array_agg(distinct col6) as values) X where array_length(values) <= 4;
This will give me the second result but requires complex query construction, assuming I don't know the number and names of the columns up front. Also, this might cross the 100 MB per-row limit for a BigQuery table, as I will have more than a billion rows. Please also suggest if there is a better way to do this.
How about:
WITH arrays AS (
SELECT * FROM UNNEST((
SELECT [
STRUCT("col_repo_name" AS col, ARRAY_AGG(DISTINCT repo.name IGNORE NULLS LIMIT 1001) AS values)
, ('col_actor_login', ARRAY_AGG(DISTINCT actor.login IGNORE NULLS LIMIT 1001))
, ('col_type', ARRAY_AGG(DISTINCT type IGNORE NULLS LIMIT 1001))
, ('col_org_login', ARRAY_AGG(DISTINCT org.login IGNORE NULLS LIMIT 1001))
]
FROM `githubarchive.year.2017`
))
)
SELECT *
FROM arrays
WHERE ARRAY_LENGTH(values)<=1000
This query processed 20.6 GB in 11.9 s (half a billion rows). It only returned one row, because every other column had more than 1000 unique values (my threshold).
That's traditional SQL -- but see here an even simpler query that produces similar results:
SELECT col, ARRAY_AGG(DISTINCT value IGNORE NULLS LIMIT 1001) values
FROM (
SELECT REGEXP_EXTRACT(x, r'"([^\"]*)"') col , REGEXP_EXTRACT(x, r'":"([^\"]*)"') value
FROM (
SELECT SPLIT(TO_JSON_STRING(STRUCT(repo.name, actor.login, type, org.login)), ',') x
FROM `githubarchive.year.2017`
), UNNEST(x) x
)
GROUP BY col
HAVING ARRAY_LENGTH(values)<=1000
# 17.0 sec elapsed, 20.6 GB processed
Caveat: This will only run if there are no special characters in the column values, like quotes or commas. If you have those, it won't be as straightforward (but still possible).
Below is for BigQuery Standard SQL
#standardSQL
SELECT col, STRING_AGG(DISTINCT value) `values`
FROM (
SELECT
TRIM(z[OFFSET(0)], '"') col,
TRIM(z[OFFSET(1)], '"') value
FROM `project.dataset.table` t,
UNNEST(SPLIT(TRIM(to_JSON_STRING(t), '{}'))) kv,
UNNEST([STRUCT(SPLIT(kv, ':') AS z)])
)
GROUP BY col
HAVING COUNT(DISTINCT value) < 5
You can test and play with the above using the sample data from your question - the result will be:
Row col values
1 col1 a1,a2
2 col2 b1,b2,b3
3 col4 d1
4 col5 e2,e3,e4,e5
5 col6 f1,f2
@FelipeHoffa I was able to use your idea with a little modification in the query for my use case.
SELECT * FROM UNNEST((
SELECT [
STRUCT("col_repo_name" AS col, ARRAY_AGG(DISTINCT repo.name IGNORE NULLS LIMIT 1001) AS values)
, ('col_actor_login', ARRAY_AGG(DISTINCT actor.login IGNORE NULLS LIMIT 1001))
, ('col_type', ARRAY_AGG(DISTINCT type IGNORE NULLS LIMIT 1001))
, ('col_org_login', ARRAY_AGG(DISTINCT org.login IGNORE NULLS LIMIT 1001))
]
FROM `githubarchive.year.2017`
))
This UNNEST on an array of structs will not work as-is, because the underlying columns have different data types and BigQuery will not be able to put the arrays under a single column (failing with an error like: Array elements of types {STRUCT<...>, STRUCT<...>} do not have a common supertype). I modified it to something like this to serve my use case.
SELECT * FROM UNNEST((
SELECT [
STRUCT("col_repo_name" AS col, to_json_string(ARRAY_AGG(DISTINCT repo.name IGNORE NULLS LIMIT 1001)) AS values)
, ('col_actor_login', to_json_string(ARRAY_AGG(DISTINCT actor.login IGNORE NULLS LIMIT 1001)))
, ('col_type', to_json_string(ARRAY_AGG(DISTINCT type IGNORE NULLS LIMIT 1001)))
, ('col_org_login', to_json_string(ARRAY_AGG(DISTINCT org.login IGNORE NULLS LIMIT 1001)))
]
FROM `githubarchive.year.2017`
))
And this worked well!
I currently have a few unpivot queries that yield about 2,000 rows each. I need to take the results of those queries and put them in a new table, matching on a key.
Query Example:
Select DeviceSlot
FROM tbl1
unpivot(
DeviceSlot
For col in(
col1,
col2,
col3
)
) AS [Unpivot]
Now I need to match the results from the query, and insert it into a new table with about 20,000 rows.
Pseudo-Code for this:
Insert Into tbl2(DeviceSlot)
Select DeviceSlot
FROM tbl1
unpivot(
DeviceSlot
For col in(
col1,
col2,
col3
)
)AS Unpivot2
Where tbl1.key = tbl2.key
I've been pretty confused on how to do this, and I apologize if it is not clear.
I also have another unpivot query doing the same thing for different columns.
Not sure what you are asking for. When unpivoting to "normalize" data, the wanted "key" is typically derived during the unpivot. For example, below, the id column of the original table is repeated in the un-pivoted data to represent a foreign key for some new table.
SQL Fiddle
MS SQL Server 2014 Schema Setup:
CREATE TABLE Table1
([id] int, [col1] varchar(2), [col2] varchar(2), [col3] varchar(2))
;
INSERT INTO Table1
([id], [col1], [col2], [col3])
VALUES
(1, 'a', 'b', 'c'),
(2, 'aa', 'bb', 'cc')
;
Query 1:
select id as table1_fk, colheading, colvalue
from (
select * from table1
) t
unpivot (
colvalue for colheading in (col1, col2, col3)
) u
Results:
| table1_fk | colheading | colvalue |
|-----------|------------|----------|
| 1 | col1 | a |
| 1 | col2 | b |
| 1 | col3 | c |
| 2 | col1 | aa |
| 2 | col2 | bb |
| 2 | col3 | cc |
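If the goal is then to load those un-pivoted rows into another table, a minimal sketch (assuming a hypothetical target table tbl2 with columns matching the result above) could be:
-- Hypothetical target table; adjust names and types to your schema
CREATE TABLE tbl2
([table1_fk] int, [colheading] varchar(10), [colvalue] varchar(2));

INSERT INTO tbl2 (table1_fk, colheading, colvalue)
SELECT id AS table1_fk, colheading, colvalue
FROM (
SELECT * FROM table1
) t
UNPIVOT (
colvalue FOR colheading IN (col1, col2, col3)
) u;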
I have a table like this:
| Col1 | Col2 |
|:-----------|------------:|
| 1 | a;b; |
| 1 | b;c; |
| 2 | c;d; |
| 2 | d;e; |
I want the result to be something like this:
| Col1 | Col2 |
|:-----------|------------:|
| 1 | a;b;c;|
| 2 | c;d;e;|
Is there some way to write a set function which adds unique values in a column into an array and then displays them? I am using the Redshift database, which mostly follows PostgreSQL, with the following difference:
Unsupported PostgreSQL Functions
Have a look at Redshift's listagg() function which is similar to MySQL's group_concat. You would need to split the items first and then use listagg() to give you a list of values. Do take note, though, that, as the documentation states:
LISTAGG does not support DISTINCT expressions
(Edit: As of 11th October 2018, DISTINCT is now supported. See the docs.)
So you will have to take care of that yourself. Assuming you have the following table set up:
create table _test (col1 int, col2 varchar(10));
insert into _test values (1, 'a;b;'), (1, 'b;c;'), (2, 'c;d;'), (2, 'd;e;');
Fixed number of items in Col2
Perform as many split_part() operations as there are items in Col2:
select
col1
, listagg(col2, ';') within group (order by col2) as col2
from (
select col1, split_part(col2, ';', 1) as col2 from _test
union select col1, split_part(col2, ';', 2) as col2 from _test
) t
group by col1
;
Varying number of items in Col2
You would need a helper here. If there are more rows in the table than items in Col2, a workaround with row_number() could work (but is expensive for large tables):
with _helper as (
select
(row_number() over())::int as part_number
from
_test
),
_values as (
select distinct
col1
, split_part(col2, ';', part_number) as col2
from
_test, _helper
where
length(split_part(col2, ';', part_number)) > 0
)
select
col1
, listagg(col2, ';') within group (order by col2) as col2
from
_values
group by
col1
;
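Since DISTINCT is now supported in LISTAGG (per the edit above), a simpler sketch of the fixed-item case could let LISTAGG do the de-duplication itself; this assumes the same _test table with two items per row:
-- Relies on LISTAGG(DISTINCT ...), available in Redshift since October 2018
select
col1
, listagg(distinct col2, ';') within group (order by col2) as col2
from (
select col1, split_part(col2, ';', 1) as col2 from _test
union all select col1, split_part(col2, ';', 2) as col2 from _test
) t
group by col1;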
Before I begin, I know there is a whole bunch of questions on Stackoverflow on this topic but I could not find any of them relevant to my case because they involve something much more complicated than what I need.
What I want is a simple dumb transpose with no logic involved.
Here is the original table that my select query returns:
Name  Age  Sex  DOB  Col1  Col2  Col3  ....
A     12   M    8/7  aa    bb    cc
Typically, this is going to contain only one record, i.e. for one person.
Now what I want is
Field Value
Name A
Age 12
Sex M
DOB 8/7
Col1 aa
Col2 bb
Col3 cc
.
.
So there is no counting, summing or any complicated logic involved like most of the similar question on Stackoverflow.
How do I do it?
I read through the PIVOT and UNPIVOT help and it was not that helpful at all.
PS: If, by chance, it contains more than one record, is it possible to return each record as its own value column, somewhat like
Field Value1 Value2 Value3 ...
Name A B C ...
Age .. .. .. ...
.
.
I want to know how to do this for Oracle 10g and 11g.
PS:Feel free to tag as duplicate if you find a question that is truly similar to mine.
I would suggest applying the UNPIVOT function first to your multiple columns, then using row_number() to create your new column names that will be used in the PIVOT.
The basic syntax for the unpivot will be
select field,
value,
'value'||
to_char(row_number() over(partition by field
order by value)) seq
from yourtable
unpivot
(
value
for field in (Name, Age, Sex, DOB, col1, col2, col3)
) u;
See SQL Fiddle with Demo. This is going to convert your multiple columns of data into multiple rows. I used row_number() to create a unique value for your new column names; the data from this query looks like:
| FIELD | VALUE | SEQ |
|-------|-------------------------|--------|
| AGE | 12 | value1 |
| AGE | 15 | value2 |
| COL1 | aa | value1 |
| COL1 | xx | value2 |
Then you can apply the PIVOT function to this result:
select field, value1, value2
from
(
select field,
value,
'value'||
to_char(row_number() over(partition by field
order by value)) seq
from yourtable
unpivot
(
value
for field in (Name, Age, Sex, DOB, col1, col2, col3)
) u
) d
pivot
(
max(value)
for seq in ('value1' as value1, 'value2' as value2)
) piv
See SQL Fiddle with Demo. This gives a final result:
| FIELD | VALUE1 | VALUE2 |
|-------|-------------------------|-------------------------|
| AGE | 12 | 15 |
| COL1 | aa | xx |
| COL2 | bb | yy |
| COL3 | cc | zz |
| DOB | 07-Aug-2001 12:00:00 AM | 26-Aug-2001 12:00:00 AM |
| NAME | A | B |
| SEX | F | M |
Note: when you are applying the unpivot function, the datatype of all of the columns must be the same, so you might have to convert your data in a subquery before you can unpivot it.
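For instance, a minimal sketch of such a conversion subquery (assuming, hypothetically, that Age is a NUMBER and DOB a DATE in yourtable) casts everything to a string first:
select field,
value,
'value'||
to_char(row_number() over(partition by field
order by value)) seq
from (
-- cast every column to the same datatype before unpivoting
select to_char(Name) Name,
to_char(Age) Age,
to_char(Sex) Sex,
to_char(DOB, 'DD-Mon-YYYY') DOB,
to_char(col1) col1,
to_char(col2) col2,
to_char(col3) col3
from yourtable
)
unpivot
(
value
for field in (Name, Age, Sex, DOB, col1, col2, col3)
) u;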
The UNPIVOT/PIVOT functions were introduced in Oracle 11g; if you are using Oracle 10g, then you can edit the query to use:
with cte as
(
select 'name' field, name value
from yourtable
union all
select 'Age' field, Age value
from yourtable
union all
select 'Sex' field, Sex value
from yourtable
union all
select 'DOB' field, DOB value
from yourtable
union all
select 'col1' field, col1 value
from yourtable
union all
select 'col2' field, col2 value
from yourtable
union all
select 'col3' field, col3 value
from yourtable
)
select
field,
max(case when seq = 'value1' then value end) value1,
max(case when seq = 'value2' then value end) value2
from
(
select field, value,
'value'||
to_char(row_number() over(partition by field
order by value)) seq
from cte
) d
group by field;
See SQL Fiddle with Demo
Is there any way to map the first table to the second table with an SQL query or, if too complicated, a PL/SQL block?
Original
--------------------------------------
| col1 | col2 | col3 | col4 |
--------------------------------------
| key | case 1 | case 2 | case 3 |
| value1 | v1c1 | v1c2 | v1c3 |
| value2 | v2c1 | v2c2 | v2c3 |
--------------------------------------
Target
-----------------------------
| key | case | result |
-----------------------------
| value1 | case 1 | v1c1 |
| value1 | case 2 | v1c2 |
| value1 | case 3 | v1c3 |
| value2 | case 1 | v2c1 |
| value2 | case 2 | v2c2 |
| value2 | case 3 | v2c3 |
-----------------------------
The original table can have a variable number of columns, and 'key' is a hardcoded string that is always in column 1 of the original table. No other row has 'key' in column 1, so this row is a unique pivot.
Thank you
If dynamic SQL is allowed, then it is possible to have all your requirements fulfilled using one query:
SELECT col1 as "key"
,extractvalue(dbms_xmlgen.getXMLType('select "' || tc.Column_Name ||
'" as v from Original where col1 = ''key''')
,'/ROWSET/ROW/V') "case"
,extractvalue(dbms_xmlgen.getXMLType('select "' || tc.Column_Name ||
'" as v from Original where col1 = ''' ||
replace(col1, '''', '''''') || '''')
,'/ROWSET/ROW/V') "result"
FROM Original
,(SELECT Column_Name
FROM All_Tab_Columns tc
WHERE tc.Owner = 'YOURSCHEMA'
and tc.Table_Name = 'ORIGINAL'
and Column_Name != 'COL1'
ORDER BY tc.COLUMN_ID) tc
WHERE col1 != 'key'
ORDER BY "key"
,"case"
Some more details as requested:
dbms_xmlgen.getXMLType returns an XmlType instance which is basically the result of the supplied query string as XML.
The format is ROWSET for the root node and ROW for each row. Every column will be an element as well.
The two selects that I am creating only return one value each, and to make things easier, I gave them a column alias "V" so that I know which value to pick from the XML.
extractValue is a function that returns the result of an XPath expression from an XmlType.
'/ROWSET/ROW/V' returns the first V node, from the first ROW node that resides under the root node ROWSET.
<ROWSET><ROW><V>Abc</V></ROW></ROWSET>
"The original table can have a variable number of columns"
Really?
The straightforward way is to select and union the parts you want.
select col1 as key, 'case1' as "case", col2 as result
from test
where col1 <> 'key'
union all
select col1 as key, 'case2' as "case", col3 as result
from test
where col1 <> 'key'
union all
select col1 as key, 'case3' as "case", col4 as result
from test
where col1 <> 'key'
Straightforward, but not dynamic.
Later . . .
Based on your comment . . . although I don't think it's necessary.
select col1 as key, (select col2 from test where col1='key') as "case", col2 as result
from test
where col1 <> 'key'
union all
select col1 as key, (select col3 from test where col1='key') as "case", col3 as result
from test
where col1 <> 'key'
union all
select col1 as key, (select col4 from test where col1='key') as "case", col4 as result
from test
where col1 <> 'key'
Oracle 11 also supports UNPIVOT, which I haven't used.
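For reference, a hedged sketch of what the UNPIVOT form might look like for this table (column names as in the question, with the case labels hardcoded rather than read from the 'key' row; untested):
-- Sketch only: unpivot col2..col4 into case/result pairs, skipping the header row
select col1 as key, case_name as "case", result
from test
unpivot
(
result for case_name in (col2 as 'case1', col3 as 'case2', col4 as 'case3')
)
where col1 <> 'key';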
I don't know which parts can change, but this should be a start for you. If the column names can change (key, case 1, etc.) you will have to have another query to get the correct column names. If you have questions feel free to ask:
declare
v_query VARCHAR2(5000);
v_case VARCHAR2(255);
v_colcount PLS_INTEGER;
begin
-- Get number of columns
select count(*)
INTO v_colcount
from user_tab_columns
where table_name = 'T1';
-- Build case statement to get correct value for result column
v_case := 'case';
for i in 1 .. v_colcount-1
loop
v_case := v_case||' when rn = '||to_char(i)||' then col'||to_char(i+1);
end loop;
v_case := v_case||' end result';
-- Build final query
v_query := 'select col1 key, ''case ''||rn "case", '||v_case||'
from t1
cross join (
select rownum rn
from dual
connect by level <= '||to_char(v_colcount-1)||'
) cj
where col1 <> ''key''
order by key, "case"';
-- Display query (would probably be replaced with an insert using execute immediate)
dbms_output.put_line(v_query);
end;
This produces the following query (which assumes your original table is called t1):
select col1 key, 'case '||rn "case", case when rn = 1 then col2 when rn = 2 then col3 when rn = 3 then col4 end result
from t1
cross join (
select rownum rn
from dual
connect by level <= 3
) cj
where col1 <> 'key'
order by key, "case"
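If you want the block to populate a target table instead of just printing the query, a hedged sketch (assuming a hypothetical, pre-created table target_tbl(key_val, case_val, result_val)) would replace the dbms_output call with a dynamic insert:
-- Sketch only: run the generated query as a dynamic insert instead of printing it
execute immediate 'insert into target_tbl (key_val, case_val, result_val) ' || v_query;
commit;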
Try this:
with data as
(select level l from dual connect by level <= 3)
select col1,
'case' || l as "case",
decode(l,1,col2,2,col3,3,col4) as "values"
from myTable, data
order by 1,2;
Cheers