How to dynamically convert source folder structure to delta table partition using ADF data flow?

Source (CSV): in myfile.csv I have a country column that needs to become the partition column on the target side.
/raw/myfolder/myfile.csv
Target (Delta)
/raw/myfolder/country=<value>/delta_log
/raw/myfolder/country=<value>/part*.parquet
This seems possible by setting the partition type to Key under Optimize, but that has to be configured manually. In my case, however, I want to pass the partition column (here, country) as a parameter dynamically.

Since the key column has to be part of the schema before runtime, and the schema can only be inferred dynamically at runtime, I don't think you can do this with key partitioning.
You can, however, pass the key column in from the executing pipeline and create the folder structure that way, since you can specify the folder as a parameter.
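For example, a minimal sketch (the parameter name and expression below are illustrative, not from the question): define a pipeline parameter for the partition value, pass it to the data flow, and build the sink folder path with an expression such as

@concat('raw/myfolder/country=', pipeline().parameters.partitionValue)

The data flow then writes its part files into whatever folder the pipeline hands it, which reproduces the country=<value> layout without configuring key partitioning in the Optimize tab.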

Related

Storing SQL metadata

I have a generated SQL table which fits nicely in rows/columns, but I'd also like to store some information about that table such as:
When was it generated?
Who generated it?
What script was run to generate it?
Where did the data come from?
A list of key-pair values which didn't fit in the table and really only describe more information about the source
I'm a little new to SQL, but how would one normally store this information?
I would imagine that after I CREATE TABLE foo, I would then need to CREATE TABLE foo_meta, where each question is a column and I INSERT just one row: the answers to those questions. Is that a normal way to handle this?
Some clarifications:
I'm using SQLite3
Each table represents recorded time-history data where each column is a parameter and each row is a time-step.
Each table will have associated metadata
The meta-data contains things like initial conditions and other conditions which weren't recorded in the time-history.
I will add a bunch of test-results to the same database. Each test will have a data table (time-history) and a meta-data table.
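Concretely, here is the kind of layout I am imagining: a shared metadata table keyed by the data table's name, plus a key/value side table for the leftovers. A minimal SQLite sketch (all names and values below are invented for illustration):

CREATE TABLE test_001 (            -- one time-history table per test
    time_step REAL,
    param_a   REAL,
    param_b   REAL
);

CREATE TABLE table_meta (          -- one metadata row per data table
    table_name   TEXT PRIMARY KEY, -- which data table this row describes
    generated_at TEXT,             -- SQLite has no datetime type; store ISO 8601 text
    generated_by TEXT,
    script       TEXT,             -- script that generated the table
    source       TEXT              -- where the data came from
);

CREATE TABLE table_meta_kv (       -- key/value pairs that fit nowhere else
    table_name TEXT NOT NULL REFERENCES table_meta(table_name),
    key        TEXT NOT NULL,
    value      TEXT,
    PRIMARY KEY (table_name, key)
);

INSERT INTO table_meta VALUES
    ('test_001', '2024-01-01T12:00:00Z', 'alice', 'run_sim.py', 'rig #4');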

Table within a SQL table

I have used this type of data storage in a VBA-based application, storing a record where one of the fields was an array. Is this possible in a SQL table?
E.g. I need to store data relating to customers and their assets. Each client has their own list of assets. I would use a second joined table, but then each customer would seem to require their own new table.
Is it possible to store this in an array within the original table?
If you're on an RDBMS, you can either
1) use an XML or JSON column to maintain the asset information against each client, or
2) create a separate table for assets and link it to the client table.
It depends on the individual use case and context:
1) if you only have one or two assets to maintain against each client, XML/JSON would be ideal;
2) if you maintain high volumes of assets against each client, a separate table is ideal;
3) if you are unsure of the volume, a separate table for assets is also the safer default.
Both options are sketched below.
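All names here are invented; the JSON column type varies by engine (e.g. JSON/JSONB in Postgres):

-- Option 1: assets embedded as a JSON document on the client row
CREATE TABLE client (
    client_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    assets    TEXT  -- e.g. '["forklift", "truck"]'; use a native JSON type where available
);

-- Option 2: one shared assets table for all clients
CREATE TABLE asset (
    asset_id  INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES client(client_id),
    name      TEXT NOT NULL
);

Note that option 2 is a single table holding every customer's assets; the client_id foreign key does the linking, so no per-customer tables are needed.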

Update column type in a table from Boolean to string

I want to update an empty column from type Boolean to string in BigQuery.
How can I do it without overwriting the table and reloading all the data?
Thanks!
You can only add new fields at the end of the table; on the old columns, you have the option to change required to nullable. So what you want is not possible directly: you can either add a new field or, as you say, completely overwrite the table.
There are two table operations, Update and Patch.
You need to use the Update command, to add new columns to your schema.
Important side notes:
order is important. If you change the ordering, it will look like an incompatible schema.
you can only add new fields at the end of the table. On the old columns, you have the option to change required to nullable.
you cannot add a required field to an existing schema.
you cannot remove old fields; once a table's schema has been specified, you cannot change it without first deleting all of the data associated with it. If you want to change a table's schema, you must specify a writeDisposition of WRITE_TRUNCATE. For more information, see the Jobs resource.
Here is an example of a curl session that adds fields to a schema. It should be relatively easy to adapt to Java. It uses auth.py from here
When using Table.Update(), you must include the full table schema again. If you don't provide an exactly matching schema you can get: Provided Schema does not match Table. For example, I didn't pay attention to the details, left out an existing field like created in one of my update calls, and it failed.
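For what it's worth, newer BigQuery releases also expose the append-a-column case as a DDL statement; a minimal sketch (dataset, table, and column names are placeholders):

-- Appends a NULLABLE STRING column at the end of the schema.
-- The existing boolean column itself still cannot be altered or dropped this way.
ALTER TABLE mydataset.mytable
  ADD COLUMN new_string_col STRING;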

Update liquibase data to uppercase

I have a table managed by Liquibase. I want to retrieve the "name" field for a particular "type" from that table and then convert the name to upper case. How can I use a changeset or a ServletContextListener to do this?
For example:
My table structure is table1 { id, name, type }, and I retrieve the names associated with a given type. Now I want to convert these names (there can be more than one) to upper case before the data is actually populated in the database.
This is more of an ETL task than a schema task. You could use Liquibase for it, but it would probably require custom code. If it is just a small set of things you might be able to do it using custom SQL.
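For instance, if plain SQL is enough, a SQL-formatted Liquibase changelog could carry the update (the changeset author/id and the type value are placeholders):

--liquibase formatted sql
--changeset someone:uppercase-names
UPDATE table1 SET name = UPPER(name) WHERE type = 'some_type';

Note this rewrites the stored values; if you only need upper case at read time, SELECT UPPER(name) does it without any changeset.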

Is it possible to specify Postgres column storage type at table creation?

Can one specify the EXTENDED storage type for a column as part of CREATE TABLE? If yes, would it also work under Postgres 8.1?
I don't see a way to do this except through ALTER TABLE, which seems weird for something that really belongs with the rest of the table definition.
I don't think you can.
http://www.postgresql.org/docs/9.1/static/storage-toast.html
"Each TOAST-able data type specifies a default strategy for columns of that data type, but the strategy for a given table column can be altered with ALTER TABLE SET STORAGE."