How can I convert a string to a map in Azure Data Factory/Synapse Data Flow?

I have a string column in a Parquet file:
{"name":"bob","age":"35"}
I need to create a separate column for each data item.
How can I achieve this? Can I convert the string to a map, or parse it directly into columns?

We can use the Parse transformation in Mapping Data Flow to achieve that.
Select Single document as the Document form, select Column_1, enter Column_1 as the column name, and enter (name as string, age as integer) as the Output column type:
The data preview then shows name and age as separate columns.
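For reference, the underlying Data Flow Script for such a Parse transformation looks roughly like this (a sketch; the source stream name source1 and output stream name ParseJson are illustrative):

```
source1 parse(Column_1 = Column_1 ? (name as string, age as integer),
    format: 'json',
    documentForm: 'singleDocument') ~> ParseJson
```

Here Column_1 on the left is the output column, and the expression after ? declares the types parsed out of the JSON string.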

Related

Extract key value pair from json column in redshift

I have a table mytable that stores columns in the form of JSON strings, which contain multiple key-value pairs. Now, I want to extract only a particular value corresponding to one key.
The column that stores these strings is of varchar datatype, and rows are inserted as:
insert into mytable(empid, json_column) values (1, '{"FIRST_NAME":"TOM", "LAST_NAME":"JENKINS", "DATE_OF_JOINING":"2021-06-10", "SALARY":"1000"}');
As you can see, json_column holds only a string. Now, I want to do something like:
select json_column.FIRST_NAME from mytable
I just want to extract the value corresponding to key FIRST_NAME.
My actual table is far more complex than this example, and I cannot convert these JSON keys into separate columns themselves, but this example clearly illustrates my issue.
This needs to be done on Redshift; any valuable suggestions are appreciated.
Redshift's json_extract_path_text function solves this problem easily, as follows:
select json_extract_path_text(json_column, 'FIRST_NAME') from mytable;
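For what it's worth, json_extract_path_text also accepts multiple path elements for drilling into nested objects, plus an optional null_if_invalid flag that returns NULL instead of raising an error on malformed JSON; a sketch (the nested NAME object is hypothetical):

```sql
-- Returns NULL instead of erroring if json_column is not valid JSON.
select json_extract_path_text(json_column, 'FIRST_NAME', true) from mytable;

-- If json_column held a nested object such as '{"NAME":{"FIRST":"TOM"}}',
-- extra path elements walk down into it.
select json_extract_path_text(json_column, 'NAME', 'FIRST') from mytable;
```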

How to extract null values from Bigquery table as a TableRow object

I am trying to extract data from a BigQuery table using Google Cloud Dataflow.
My BigQuery table has a few empty values (for String columns) and nulls (for numeric columns).
When I try to extract the data in Dataflow using BigQueryIO.readTableRows().fromQuery("select * from table_name"), I don't see the columns with null values.
How can I get all the columns as part of the TableRow object?
Any help is appreciated.
I believe this is the current behavior of the BigQueryIO connector: null values are omitted from the resulting element, but empty string values should be available. Can you simply assume that values absent from the resulting element are null?
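If treating absent keys as null is not acceptable downstream, one workaround (a sketch; column names are hypothetical, and note that substituting a default does change the data) is to coalesce nullable columns in the query itself, so every key appears in each TableRow:

```sql
-- score is NULLABLE; IFNULL materializes an explicit default so the
-- "score" key is always present in the resulting TableRow.
SELECT name, IFNULL(score, 0) AS score
FROM `project.dataset.table_name`;
```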

Why array values appear in impala but not hive?

I have a column defined as an array in my table (Hive):
create external table rule (
  id string,
  names array<string>
)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '|'
STORED AS PARQUET
LOCATION 'hdfs://folder';
Example of a value in names: Joe|Jimmy
When I query the table in Impala, I retrieve the data, but in Hive I only get NULL. Why this behavior? I would even understand the inverse.
I found the answer: the data was written from a Spark job as a string instead of an array.
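Once the Parquet data really is written as array<string>, the column can be queried element-by-element in Hive; a minimal sketch:

```sql
-- Expand each element of names onto its own row.
SELECT id, name
FROM rule
LATERAL VIEW explode(names) n AS name;
```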

How to access individual elements of a blob in DynamoDB using a Hive script?

I am transferring data from DynamoDB to S3 using a Hive script in AWS Data Pipeline. I am using a script like this:
CREATE EXTERNAL TABLE dynamodb_table (
  PROPERTIES STRING,
  EMAIL STRING,
  .............
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "${DYNAMODB_INPUT_TABLE}",
  "dynamodb.column.mapping" = "PROPERTIES:Properties,EMAIL:EmailId....");

CREATE EXTERNAL TABLE s3_table (
  PROPERTIES STRING,
  EMAIL STRING,
  ......
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
LOCATION '${S3_OUTPUT_BUCKET}';

INSERT OVERWRITE TABLE s3_table SELECT * FROM dynamodb_table;
The Properties column in the DynamoDB table looks like this:
Properties : String
{\"deal\":null,\"MinType\":null,\"discount\":null}
That is, it contains multiple attributes. I want each attribute in Properties to come out as a separate column (not just a string in a single column). I want the output in this schema:
deal MinType discount EMAIL
How can I do this?
Is your Properties column in proper JSON format? If so, it looks like you can use Hive's get_json_object UDF: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object
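A sketch of what that could look like with get_json_object, assuming Properties holds valid JSON once the escaping is resolved:

```sql
-- get_json_object returns NULL when a key is missing or the JSON is invalid.
SELECT
  get_json_object(PROPERTIES, '$.deal')     AS deal,
  get_json_object(PROPERTIES, '$.MinType')  AS MinType,
  get_json_object(PROPERTIES, '$.discount') AS discount,
  EMAIL
FROM dynamodb_table;
```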

extract value from xml data column oracle sql

There is a database column which contains XML data in a format like:
<old_template_code> something </old_template_code><old_template_name> new
code</old_template_name><new_template_code>BEVA24M</new_template_code>
How can I extract the values from the XML and make a column for each of the different XML elements? For example, I would like to do something like:
select EXTRACTVALUE(table.column, table.column)
but the latter doesn't work for me.
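For reference, EXTRACTVALUE takes an XMLTYPE instance and an XPath expression. A sketch, with illustrative table and column names: the fragment shown has no single root element, so one has to be wrapped around it first. Note that EXTRACTVALUE is deprecated in recent Oracle versions in favor of XMLTABLE/XMLQUERY.

```sql
-- Wrap the fragment in a root element so it parses as one document,
-- then pull out each element with an XPath expression.
SELECT
  EXTRACTVALUE(XMLTYPE('<root>' || t.xml_col || '</root>'),
               '/root/old_template_code') AS old_template_code,
  EXTRACTVALUE(XMLTYPE('<root>' || t.xml_col || '</root>'),
               '/root/new_template_code') AS new_template_code
FROM my_table t;
```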