Reformat SAS data set into multiple observations - formatting

I am trying to reformat a data set in SAS that I am outputting as a csv. It is currently in the format:
Type, Name, data1, data2, data3…
Dog, retriever, 20, 40, 60…
Dog, corgi, 10, 30, 50…
Cat, Persian, 15, 25, 35…
Cat, stray, 1, 2, 3…
And I am trying to get it in the format:
Dog, retriever, data1, 20
Dog, retriever, data2, 40
Dog, retriever, data3, 60
Dog, corgi, data1, 10
Dog, corgi, data2, 30
Dog, corgi, data3, 50
Cat, Persian, data1, 15
Cat, Persian, data2, 25
Cat, Persian, data3, 35
Cat, stray, data1, 1
Cat, stray, data2, 2
Cat, stray, data3, 3
Do you know the best way to go about this in SAS?
Thanks

With PROC TRANSPOSE, something like this:
PROC TRANSPOSE DATA = ...
OUT=...
NAME=ValueSource
LABEL=ValueDescription
;
/* BY expects the input sorted by type and name: run PROC SORT first, or add NOTSORTED */
BY type name;
ID <a column with the same value for all your observations>;
VAR data1 data2 data3;
RUN;
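To make the reshaping itself explicit, here is a minimal sketch in Python rather than SAS. It only illustrates the wide-to-long logic and is not what PROC TRANSPOSE does internally. The function name `wide_to_long` and the inlined sample rows are assumptions made for the example.

```python
import csv
import io

# Sample input in the wide layout from the question (values inlined for illustration).
WIDE_CSV = """Type,Name,data1,data2,data3
Dog,retriever,20,40,60
Dog,corgi,10,30,50
Cat,Persian,15,25,35
Cat,stray,1,2,3
"""


def wide_to_long(text, id_columns=("Type", "Name")):
    """Return one [Type, Name, column name, value] row per data column."""
    reader = csv.DictReader(io.StringIO(text))
    long_rows = []
    for row in reader:
        for column in reader.fieldnames:
            if column in id_columns:
                continue  # identifier columns are repeated on every output row
            long_rows.append([row[c] for c in id_columns] + [column, row[column]])
    return long_rows


long_rows = wide_to_long(WIDE_CSV)
for r in long_rows:
    print(", ".join(r))
```

Every column that is not an identifier becomes its own output row, which is the same role the VAR list plays in the PROC TRANSPOSE call.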

Related

Turn one column into multiple columns by key words

How can I split a column into more columns based on specific words? For example, I have table A and I want to split col wherever the words "AND, OR, PLUS" appear, so that I get table B as the result.
A
ID | col
1 | THE BIG APPLE AND ORANGE OR PEAR
2 | BANNANA EATS GRAPE OR BLUEBERRY
3 | THE BEST FRUIT IS WATERMELON
4 | FRUITS OR CANDY ARE THE BEST OR WATER
5 | APPLE STRAWBERRY AND PLUM PLUS SUGAR OR PEACH
6 | MELON IN MY BELLY
B
ID | col1 | col2 | col3 | col4
1 | THE BIG APPLE | ORANGE | PEAR |
2 | BANNANA EATS GRAPE | BLUEBERRY | |
3 | THE BEST FRUIT IS WATERMELON | | |
4 | FRUITS | CANDY ARE THE BEST | WATER |
5 | APPLE STRAWBERRY | PLUM | SUGAR | PEACH
6 | MELON IN MY BELLY | | |
You can split the string and then PIVOT:
SELECT *
FROM   (
  SELECT id,
         idx,
         match
  FROM   table_name
         CROSS APPLY (
           SELECT LEVEL AS idx,
                  REGEXP_SUBSTR(
                    col,
                    '(.+?)(\s+(AND|OR|PLUS)\s+|$)',
                    1,
                    LEVEL,
                    'i',
                    1
                  ) AS match
           FROM   DUAL
           CONNECT BY LEVEL <= REGEXP_COUNT(
                    col,
                    '(.+?)(\s+(AND|OR|PLUS)\s+|$)',
                    1,
                    'i'
                  )
         )
)
PIVOT (
  MAX(match)
  FOR idx IN (1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4)
);
Note: SQL statements MUST have a fixed number of output columns so you cannot dynamically set the number of columns with a static SQL statement. It would possibly be better to just use the inner query (without the outer wrapper which performs the PIVOT) and output the values as rows rather than columns and then if you want to transpose to columns then do it in whatever front-end you are using to access the database.
Which, for the sample data:
CREATE TABLE table_name (ID, col) AS
SELECT 1, 'THE BIG APPLE AND ORANGE OR PEAR' FROM DUAL UNION ALL
SELECT 2, 'BANNANA EATS GRAPE OR BLUEBERRY' FROM DUAL UNION ALL
SELECT 3, 'THE BEST FRUIT IS WATERMELON' FROM DUAL UNION ALL
SELECT 4, 'FRUITS OR CANDY ARE THE BEST OR WATER' FROM DUAL UNION ALL
SELECT 5, 'APPLE STRAWBERRY AND PLUM PLUS SUGAR OR PEACH' FROM DUAL UNION ALL
SELECT 6, 'MELON IN MY BELLY' FROM DUAL;
Outputs:
ID | COL1 | COL2 | COL3 | COL4
1 | THE BIG APPLE | ORANGE | PEAR | null
2 | BANNANA EATS GRAPE | BLUEBERRY | null | null
3 | THE BEST FRUIT IS WATERMELON | null | null | null
4 | FRUITS | CANDY ARE THE BEST | WATER | null
5 | APPLE STRAWBERRY | PLUM | SUGAR | PEACH
6 | MELON IN MY BELLY | null | null | null
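The note in this answer suggests returning the pieces as rows and pivoting in the front end. As a rough sketch of what the front-end side of the split could look like, assuming a Python client (the names `SEPARATOR`, `split_on_keywords` and `samples` are made up for the example), the same keyword split can be written with a regular expression:

```python
import re

# Split on AND, OR or PLUS only when surrounded by whitespace, ignoring case,
# so words such as ORANGE are left intact.
SEPARATOR = re.compile(r"\s+(?:AND|OR|PLUS)\s+", re.IGNORECASE)


def split_on_keywords(text):
    """Return the pieces of text that lie between the keyword separators."""
    return SEPARATOR.split(text)


# A few rows from the sample table, keyed by ID.
samples = {
    1: "THE BIG APPLE AND ORANGE OR PEAR",
    3: "THE BEST FRUIT IS WATERMELON",
    5: "APPLE STRAWBERRY AND PLUM PLUS SUGAR OR PEACH",
}

# Emit one (id, idx, match) row per piece, like the inner query in the answer.
rows = [
    (row_id, idx, part)
    for row_id, col in samples.items()
    for idx, part in enumerate(split_on_keywords(col), start=1)
]
for row in rows:
    print(row)
```

Because the pieces come back as a list of any length, the fixed-column limit of the PIVOT does not apply here.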

Conversion of columns to rows

ID Name M E H S
1 Sally 78 85 91 76
2 Edward 87 90 82 87
convert to
ID Name Subject Marks
1 Sally M 78
1 Sally E 85
1 Sally H 91
1 Sally S 76
2 Edward M 87
2 Edward E 90
2 Edward H 82
2 Edward S 87
The UNPIVOT operator does what you're looking for. Try the following:
with sample_data as (
SELECT 1 as id, 'Sally' as name, 78 as M, 85 as E, 91 as H, 76 as S UNION ALL
SELECT 2, 'Edward', 87, 90, 82, 87
)
SELECT id, name, subject, marks
from sample_data
unpivot(marks for subject in (M,E,H,S));
For more information on UNPIVOT, see the docs here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#unpivot_operator
One simple approach uses a series of unions:
SELECT ID, Name, 'M' AS Subject, M AS Marks, 1 AS pos FROM yourTable UNION ALL
SELECT ID, Name, 'E', E, 2 FROM yourTable UNION ALL
SELECT ID, Name, 'H', H, 3 FROM yourTable UNION ALL
SELECT ID, Name, 'S', S, 4 FROM yourTable
ORDER BY ID, pos;
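To try the UNION ALL approach without a BigQuery project, the sketch below runs the same query against an in-memory SQLite database from Python. SQLite is only an assumption made for local testing; the table definition and column types are invented for the example.

```python
import sqlite3

# In-memory database holding the two sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (ID INTEGER, Name TEXT, "
    "M INTEGER, E INTEGER, H INTEGER, S INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "Sally", 78, 85, 91, 76), (2, "Edward", 87, 90, 82, 87)],
)

# One SELECT per subject column; pos keeps the subjects in M, E, H, S order.
query = """
SELECT ID, Name, 'M' AS Subject, M AS Marks, 1 AS pos FROM yourTable UNION ALL
SELECT ID, Name, 'E', E, 2 FROM yourTable UNION ALL
SELECT ID, Name, 'H', H, 3 FROM yourTable UNION ALL
SELECT ID, Name, 'S', S, 4 FROM yourTable
ORDER BY ID, pos
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The helper column pos is part of the result set here; drop it in an outer query if only ID, Name, Subject and Marks are wanted.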
In many practical cases the number of columns is too large to list in the query, or is not even known in advance. The approach below covers these more generic cases: you don't need to know the number or names of the columns beforehand.
select id, name, key, value
from your_table t,
unnest([to_json_string((select as struct * except(id, name) from unnest([t])))]) json,
unnest(bqutil.fn.json_extract_keys(json)) key with offset
join unnest(bqutil.fn.json_extract_values(json)) value with offset
using (offset)
If applied to the sample data in your question, the output has one row per id, name and column, with the column name in key and its value in value.

How to update a value within a column with another value - Oracle

I have a static table in which I want to replace a particular value with a string. In other words, in my table, I want to replace the value dog in the type_1 column with 'primate'.
Here is my table:
t_id|type_1|count
1, dog, 22
2, cat, 55
3, bird, 12
Here is my expected output:
t_id|type_1|count
1, primate, 22
2, cat, 55
3, bird, 12
As you can see, I just want to replace the value dog with the value primate.
Here is my code so far:
SELECT REPLACE(t.type_1, 'dog', 'primate')
FROM table_1 t
where t.type_1 = 'dog' and t.t_id='1'
I am new to Oracle SQL, so the syntax is a bit confusing to me. Any ideas or suggestions would help.
If you just want a query, use a case expression:
select t.t_id,
(case when t.type_1 = 'dog' then 'primate' else t.type_1 end) as type_1,
t.count
from table_1 t;
If you want to change the value in the table (i.e. permanently), use update:
update table_1
set type_1 = 'primate'
where type_1 = 'dog';
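The difference between the two statements can be checked with a small sketch. It uses Python with an in-memory SQLite database instead of Oracle, which is an assumption made purely so the example is self-contained, and the count column is named cnt here to keep the example clear of keyword clashes.

```python
import sqlite3

# Recreate the sample table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (t_id INTEGER, type_1 TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [(1, "dog", 22), (2, "cat", 55), (3, "bird", 12)],
)

# CASE only changes what the query returns; the stored rows are untouched.
selected = conn.execute(
    "SELECT t_id, "
    "CASE WHEN type_1 = 'dog' THEN 'primate' ELSE type_1 END AS type_1, "
    "cnt FROM table_1 ORDER BY t_id"
).fetchall()
stored_before = conn.execute(
    "SELECT type_1 FROM table_1 WHERE t_id = 1"
).fetchone()[0]

# UPDATE rewrites the stored value itself.
cursor = conn.execute("UPDATE table_1 SET type_1 = 'primate' WHERE type_1 = 'dog'")
conn.commit()
stored_after = conn.execute(
    "SELECT type_1 FROM table_1 WHERE t_id = 1"
).fetchone()[0]

print(selected)
print(stored_before, "->", stored_after, "rows updated:", cursor.rowcount)
```

After the SELECT with CASE the table still contains dog; only the UPDATE makes the change permanent.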
Looks like update.
This is what you have:
SQL> select * from animal;
T_ID TYPE_1 CNT
---------- ---------- ----------
1 dog 22
2 cat 55
3 bird 12
Change "dog" to "primate":
SQL> update animal set type_1 = 'primate' where type_1 = 'dog';
1 row updated.
Result:
SQL> select * from animal;
T_ID TYPE_1 CNT
---------- ---------- ----------
1 primate 22
2 cat 55
3 bird 12
SQL>
Or, if you just need a select statement, then case might do:
SQL> select t_id,
2 case when type_1 = 'dog' then 'primate'
3 else type_1
4 end as type,
5 cnt
6 from animal;
T_ID TYPE CNT
---------- ---------- ----------
1 primate 22
2 cat 55
3 bird 12
SQL>

Remove all duplicates except latest occurrence in BigQuery standard SQL based on two columns

If I have a table in BigQuery that contains the following:
fruit color quantity age other_field
apple red 3 1 foo
grapes green 5 1 young
apple green 1 3 word
apple red 4 5 bar
How would I delete all rows except the last instance with the same fruit and color values, so that my table would then look like this?
fruit color quantity age other_field
grapes green 5 1 young
apple green 1 3 word
apple red 4 5 bar
Essentially, I want to keep only a single row for every unique pair of fruit and color, using BigQuery standard SQL.
Below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'apple' fruit, 'red' color, 3 quantity, 1 age, 'foo' other_field UNION ALL
SELECT 'grapes', 'green', 5, 1, 'young' UNION ALL
SELECT 'apple', 'green', 1, 3, 'word' UNION ALL
SELECT 'apple', 'red', 4, 5, 'bar'
)
SELECT fruit, color,
ARRAY_AGG(STRUCT(quantity, age, other_field) ORDER BY age DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.table` t
GROUP BY fruit, color
with result
Row fruit color quantity age other_field
1 apple red 4 5 bar
2 grapes green 5 1 young
3 apple green 1 3 word
Another version of same is:
#standardSQL
SELECT AS VALUE
ARRAY_AGG(t ORDER BY age DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY fruit, color
with same result ... but obviously I like this version better :o)
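For comparison, the same "keep the row with the highest age per (fruit, color)" rule can be sketched in plain Python. This illustrates the logic only and is not a BigQuery API; the function name `keep_latest` is an assumption made for the example.

```python
# Sample rows from the question.
rows = [
    {"fruit": "apple", "color": "red", "quantity": 3, "age": 1, "other_field": "foo"},
    {"fruit": "grapes", "color": "green", "quantity": 5, "age": 1, "other_field": "young"},
    {"fruit": "apple", "color": "green", "quantity": 1, "age": 3, "other_field": "word"},
    {"fruit": "apple", "color": "red", "quantity": 4, "age": 5, "other_field": "bar"},
]


def keep_latest(rows):
    """Keep, for each (fruit, color) pair, the row with the largest age."""
    latest = {}
    for row in rows:
        key = (row["fruit"], row["color"])
        # Replace the stored row only when this one is strictly newer.
        if key not in latest or row["age"] > latest[key]["age"]:
            latest[key] = row
    return list(latest.values())


deduplicated = keep_latest(rows)
for row in deduplicated:
    print(row)
```

The dictionary plays the role of GROUP BY fruit, color, and the age comparison plays the role of ORDER BY age DESC LIMIT 1.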

How to Pivot data from distinct rows into a single row under different columns?

I have a table called Value whose rows are associated with different 'Fields':
VALUE_ID VALUE_TX FIELD_NAME SUB_ID
1 Yes Adult 1
2 18 Age 1
3 Black Eye Color 1
4 Brown Hair Color 1
5 Female Gender 1
I have a table called Submitted that looks like the following:
SUB_ID Submitted_Name
1 TEST_RUN
I need a result set that looks like this:
Submitted_Name Adult Age Eye Color Hair Color Gender
TEST_RUN Yes 18 Black Brown Female
I've tried the following:
SELECT * FROM (
select value_Tx, field_name, sub_id
from VALUE
)
PIVOT (max (value_tx) for field_name in ('Adult', 'Age', 'Eye Color', 'Hair Color', 'Gender')
);
What am I doing wrong? Please let me know if I need to add any additional details / data.
Thanks in advance!