I have a table like:
id  name  value
--------------------
1   x     100
1   y     200
1   z     300
2   x     10
2   y     abc
2   z     001
3   x     1
...
--------------------
and I need to transform it into something like this:
id  x    y    z
---------------------
1   100  200  300
2   10   abc  001
3   1    ...
---------------------
The names are fixed and known in advance. I could make multiple joins, but I'm looking for a more elegant solution.
Use conditional aggregation, which in Postgres can use the FILTER syntax:
select id,
       max(value) filter (where name = 'x') as x,
       max(value) filter (where name = 'y') as y,
       max(value) filter (where name = 'z') as z
from t
group by id;
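If you need to run this on a database without FILTER support, the same conditional aggregation can be written with CASE; a portable sketch against the same table t:
select id,
       max(case when name = 'x' then value end) as x,
       max(case when name = 'y' then value end) as y,
       max(case when name = 'z' then value end) as z
from t
group by id;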
The additional module tablefunc provides variants of the crosstab() function, which is typically fastest:
SELECT *
FROM   crosstab(
          'SELECT id, name, value
           FROM   tbl
           ORDER  BY 1, 2'
       ) AS ct (id int, x text, y text, z text);
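Note that crosstab() only becomes available after the tablefunc extension has been installed once per database:
CREATE EXTENSION IF NOT EXISTS tablefunc;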
You seem to have a mix of numbers and strings in your value column, so I chose text as the output type.
See:
PostgreSQL Crosstab Query
I have an Oracle table X. I want to update column X in this table with a value from a column in table Y. The columns are the same, but there is no join between these tables. I have written a select to get the IDs from table Y; I am just not sure how to update the table X records with each value from that select.
It does not matter which ID goes where because it is mock data anyway. I just want to populate column X with data from table Y.
Suppose these are your tables; both have an ID column, but you can't join on it as they have nothing in common.
SQL> select * from x order by id;
        ID          X
---------- ----------
         1        100
         2        200
         3        300
SQL> select * From y order by id;
I          Y
- ----------
A         10
B         20
C         30
D         40
SQL>
You want to put the Y values (10, 20, 30, ...) into the X column (instead of 100, 200, ...). Here's one option: use ROWNUM as a fake ID column, which is then used to establish a join between the tables. That's not a problem here because you don't really care which value goes where.
SQL> update x set
2 x.x = (with
3 a as (select id, x, rownum rn from x),
4 b as (select id, y, rownum rn from y)
5 select b.y
6 from a join b on a.rn = b.rn
7 where a.id = x.id
8 );
3 rows updated.
New table X contents:
SQL> select * from x order by id;
        ID          X
---------- ----------
         1         10
         2         20
         3         30
SQL>
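If you prefer a single MERGE statement over the correlated subquery, the same ROWNUM trick can be phrased like this (a sketch against the same tables, not a transcript from a real session):
merge into x
using (select a.id, b.y
       from (select id, rownum rn from x) a
       join (select y, rownum rn from y) b on a.rn = b.rn) src
on (x.id = src.id)
when matched then update set x.x = src.y;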
I want to assign a value to new_col based on the value in column 'ind' when months = 1:
idnum1  months  ind  new_col
1       1       X    X
1       2       X    X
1       3       Y    X
1       4       Y    X
1       5       X    X
2       1       Y    Y
2       2       Y    Y
2       3       X    Y
2       4       X    Y
2       5       X    Y
The query below only assigns the value where months = 1, but I want new_col filled in on every row for each id:
create table tmp as
select t1.*,
case when months = 1 then ind end as new_col
from table t1;
I am trying to do this in SAS using proc sql.
Ideally you would use RETAIN within a data step:
data want;
set have;
retain new_var;
if months = 1 then new_var = ind;
run;
SQL isn't as good at this as a data step.
But assuming your ID variable is repeated across rows, this would work; if it's not, you really do need the data step approach.
proc sql;
create table want as
select *, max(ind) as new_col
from have
group by ID;
quit;
EDIT: If you want to retain the first value per ID, just use FIRST.id instead of if months = 1 (note that BY-group processing requires the data to be sorted by ID).
data want;
set have;
by ID;
retain new_var;
if first.id then new_var = ind;
run;
A more robust proc sql approach that deals with a possibly repeated first month by choosing the lowest ind to distribute to the group:
data have;
input idnum1 months ind $ new_col $;
datalines;
1 1 X X
1 2 X X
1 3 Y X
1 4 Y X
1 5 X X
2 1 Y Y
2 2 Y Y
2 3 X Y
2 4 X Y
2 5 X Y
3 1 Z .
3 1 Y .
3 1 X .
3 2 A .
;
proc sql;
create table want as
select
  have.idnum1, months, ind, new_col, lowest_first_ind
from
  have
join
  ( select idnum1, min(ind) as lowest_first_ind from
    (
      select idnum1, ind
      from have
      group by idnum1
      having months = min(months)
    )
    group by idnum1
  ) value_seeker
on
  have.idnum1 = value_seeker.idnum1
;
quit;
You can use a window function:
select t1.*,
       max(case when months = 1 then ind end) over (partition by idnum1) as new_col
from t1;
If there is only one months = 1 observation per BY group, then just use a simple join.
create table WANT as
select t1.*, t2.ind as new_col
from table t1
left join (select idnum1, ind from table where months = 1) t2
on t1.idnum1 = t2.idnum1
;
I have a SQL table of the following format:
ID  Cat
1   A
1   B
1   D
1   F
2   B
2   C
2   D
3   A
3   F
Now, I want to create a table with one ID per row and the Cats spread across columns. My desired output looks as follows:
ID  A  B  C  D  E  F
1   1  1  0  1  0  1
2   0  1  1  1  0  0
3   1  0  0  0  0  1
I have found:
Transform table to one-hot-encoding of single column value
However, I have more than 1000 Cats, so I am looking for code that writes this automatically rather than manually. Who can help me with this?
First let me transform the data you pasted into an actual table:
WITH data AS (
  SELECT REGEXP_EXTRACT(data2, '[0-9]') id, REGEXP_EXTRACT(data2, '[A-Z]') cat
  FROM (
    SELECT SPLIT("""1 A
1 B
1 D
1 F
2 B
2 C
2 D
3 A
3 F""", '\n') AS data1
  ), UNNEST(data1) data2
)
SELECT * FROM data
(try sharing a table next time)
Now we can do some manual 1-hot encoding:
SELECT id
, MAX(IF(cat='A',1,0)) cat_A
, MAX(IF(cat='B',1,0)) cat_B
, MAX(IF(cat='C',1,0)) cat_C
FROM data
GROUP BY id
Now we want to write a script that will automatically create the columns we want:
SELECT STRING_AGG(FORMAT("MAX(IF(cat='%s',1,0))cat_%s", cat, cat), ', ')
FROM (
SELECT DISTINCT cat
FROM data
ORDER BY 1
)
That generates a string that you can copy-paste into a query that one-hot encodes your rows:
SELECT id
,
MAX(IF(cat='A',1,0))cat_A, MAX(IF(cat='B',1,0))cat_B, MAX(IF(cat='C',1,0))cat_C, MAX(IF(cat='D',1,0))cat_D, MAX(IF(cat='F',1,0))cat_F
FROM data
GROUP BY id
And that's exactly what the question was asking for. You can generate SQL with SQL, but you'll need to run a new query using that result (or script it, as sketched below).
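If you'd rather not copy-paste by hand, BigQuery scripting can build and run the generated SQL in one go; a sketch, assuming data is a real table rather than the CTE used above:
DECLARE cols STRING;

-- Build one MAX(IF(...)) expression per distinct cat.
SET cols = (
  SELECT STRING_AGG(FORMAT("MAX(IF(cat='%s',1,0)) AS cat_%s", cat, cat), ', ' ORDER BY cat)
  FROM (SELECT DISTINCT cat FROM data)
);

-- Build and run the pivot query with the generated column list.
EXECUTE IMMEDIATE FORMAT("""
SELECT id, %s
FROM data
GROUP BY id
""", cols);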
BigQuery has no dynamic columns in standard SQL, but depending on what you want to do in the next step, there might be a way to make it easier.
The following code sample groups Cat by ID and uses a JavaScript function to do the one-hot encoding and return a JSON string.
CREATE TEMP FUNCTION trans(cats ARRAY<STRING>)
RETURNS STRING
LANGUAGE js
AS
"""
// TODO: do the one-hot encoding for this ID's cats and return it as a JSON string
return "{a:1}";
"""
;
WITH id_cat AS (
SELECT 1 as ID, 'A' As Cat UNION ALL
SELECT 1 as ID, 'B' As Cat UNION ALL
SELECT 1 as ID, 'C' As Cat UNION ALL
SELECT 2 as ID, 'A' As Cat UNION ALL
SELECT 3 as ID, 'C' As Cat)
SELECT ID, trans(ARRAY_AGG(Cat))
FROM id_cat
GROUP BY ID;
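One possible body for that UDF, assuming a simple presence map per ID is all you need (hypothetical, not part of the original answer):
CREATE TEMP FUNCTION trans(cats ARRAY<STRING>)
RETURNS STRING
LANGUAGE js
AS
"""
// Mark every category present for this ID with a 1
// and return the map as a JSON string, e.g. {"A":1,"C":1}.
var encoded = {};
cats.forEach(function(c) { encoded[c] = 1; });
return JSON.stringify(encoded);
"""
;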
I'm having trouble finding the right SQL query. I want to select all rows with a unique x value, and if there are rows with the same x value, I want to select the row with the greatest y value. As an example, I've put part of my database below.
ID  x  y
1   2  3
2   1  5
3   4  6
4   4  7
5   2  6
The selected rows should then be those with ID 2, 4 and 5.
This is what I've got so far:
SELECT *
FROM base
WHERE x IN
(
    SELECT x
    FROM base
    GROUP BY x
    HAVING COUNT(*) > 1
)
But this only results in the rows that occur more than once. I've added the tags R, postgresql and sqldf because I'm working in R with those packages.
Here is a typical way to formulate the query in ANSI SQL:
select b.*
from base b
where not exists (select 1
                  from base b2
                  where b2.x = b.x and
                        b2.y > b.y
                 );
In Postgres, you would use distinct on for performance:
select distinct on (x) b.*
from base b
order by x, y desc;
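Another option on databases with window functions is row_number(); a sketch against the same base table:
select id, x, y
from (select b.*,
             row_number() over (partition by x order by y desc) as rn
      from base b) ranked
where rn = 1;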
You could try this query:
select x, max(y) from base group by x;
And, if you'd also like the id column in the result:
select base.*
from base join (select x, max(y) from base group by x) as maxima
on (base.x = maxima.x and base.y = maxima.max);
Example:
CREATE TABLE tmp(id int, x int, y int);
INSERT INTO .....
test=# SELECT x, max(y) AS y FROM tmp GROUP BY x;
x | y
---+---
4 | 7
1 | 5
2 | 6
I have a table like this.
ID  NAME  VALUE
_______________
1   A     X
2   A     Y
3   A     Z
4   B     X
5   B     Y
6   C     X
7   C     Z
8   D     Z
9   E     X
And the query:
SELECT * FROM TABLE1 T WHERE T.VALUE IN ('X', 'Z')
This query gives me
ID  NAME  VALUE
_______________
1   A     X
3   A     Z
4   B     X
6   C     X
7   C     Z
8   D     Z
9   E     X
But I want to see all the rows for the names that have all of the params. Only A and C have both X and Z values, so my desired result is:
ID  NAME  VALUE
_______________
1   A     X
2   A     Y
3   A     Z
6   C     X
7   C     Z
How can I get the desired result? It can be done in SQL or in the reporting service. Maybe a "GROUP BY ..... HAVING" clause will help, but I'm not sure.
By the way, I don't know how many params will be in the list.
I really appreciate any help.
The standard approach would be something like
SELECT id, name, value
FROM table1 a
WHERE name IN (SELECT name
               FROM table1 b
               WHERE b.value in (x, y)
               GROUP BY name
               HAVING COUNT(distinct value) = 2)
That would require that you determine how many values are in the list so that you can use a 2 in the HAVING clause if there are 2 elements, a 5 if there are 5 elements, etc. You could also use analytic functions:
SELECT id, name, value
FROM (SELECT id,
             name,
             value,
             count(distinct value) over (partition by name) cnt
      FROM table1 t1
      WHERE t1.value in (x, y))
WHERE cnt = 2
I prefer to structure these "sets within sets" queries as an aggregation. I find this is the most flexible approach:
select t.*
from t
where t.name in (select name
                 from t
                 group by name
                 having sum(case when value = 'X' then 1 else 0 end) > 0 and
                        sum(case when value = 'Y' then 1 else 0 end) > 0
                )
The subquery for the in finds all names that have at least one X value and one Y value. Using the same logic, it is easy to adjust for other conditions (X and Y and Z; X and Y but not Z; and so on), as sketched below. The outer query just returns all the rows instead of only the names.
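For instance, a sketch of the "X and Y but not Z" variant mentioned above, using the same placeholder table t:
select t.*
from t
where t.name in (select name
                 from t
                 group by name
                 having sum(case when value = 'X' then 1 else 0 end) > 0 and
                        sum(case when value = 'Y' then 1 else 0 end) > 0 and
                        sum(case when value = 'Z' then 1 else 0 end) = 0
                )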