Creating metadata dynamically from a flat .csv file in CC - gooddata

I am having difficulty dynamically creating metadata that needs to be extracted from the header line of a flat .csv file in CC.
Usually, I define the metadata manually by selecting New Metadata --> Extract from flat file in CC. However, the metadata of the file may change when columns are added. Thus, I do not know the metadata of the file in advance and cannot define it with this static approach.
It would be helpful if you could suggest a way to create metadata dynamically and use this newly created metadata when connecting to other components. An example graph file for demonstration would be great.
Thanks,
Andy

I have come up with the following solution.
You just have to fill in the flat .csv filename in the csv readers and writers.
MetaDataMaster.grf - runs the two graphs below.
MetaDataCreator.grf - creates metadata from the csv header and writes it into the meta_example.fmt file.
MetaDataUser.grf - reads the csv according to the created meta_example.fmt file. You can add a Reformat there and use just some predefined fields.
You can run the 2nd and 3rd graphs separately to test it.
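Outside of the graphs themselves, the idea behind MetaDataCreator.grf (derive the field list from the csv header instead of defining it by hand) can be sketched in a few lines of Python. This is only an illustration: the function name fields_from_header and the rule used to clean up column names are assumptions, and the sketch does not write the actual .fmt file.

```python
import csv
import io
import re


def fields_from_header(csv_text, delimiter=","):
    """Return cleaned-up field names taken from the first line of csv_text."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)  # only the header line is needed
    names = []
    for index, raw in enumerate(header):
        # Assumed rule: replace runs of non-word characters with "_"
        # and fall back to a positional name for empty headers.
        name = re.sub(r"\W+", "_", raw.strip()).strip("_") or f"field{index}"
        names.append(name)
    return names
```

Because the field list is rebuilt from the header on every run, an extra column in the file simply shows up as an extra entry in the returned list.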

Related

Convert bulk .xlsx files to .csv (UTF-8) in Pentaho

I am new to Pentaho. I am trying to build a transformation that can convert a bunch of .xlsx files to .csv (utf-8).
I tried Get file Names and Text File Output, but it saves a single file as csv and the content of that file is the file properties.
I also tried Microsoft Excel Input and Microsoft Excel Output and that did not work either.
Any help will be appreciated. TIA!
I have prepared a solution for you. I made it fully dynamic, so it is a combination of 6 transformations and jobs. You only need to define the following 2 things:
Source folder location
Destination folder location
Everything else works dynamically.
Also, I learned a lot from this solution.
Would you like to generate a separate CSV for each Excel file?
It is better to do it like this:
Using the Get File Names component, read the list of Excel files from the folder.
Then call Execute Transformation and pass it the name of the file.
The called transformation then runs once per file and generates a separate CSV for each Excel file.
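For comparison, the same "one run per file, one CSV per Excel file" pattern can be sketched in Python. This is not a Pentaho transformation, just an illustration of the loop. It assumes the third-party openpyxl package is installed, converts only the first worksheet, and uses hypothetical helper names (csv_target, convert_one, convert_folder).

```python
import csv
from pathlib import Path


def csv_target(xlsx_path, dest_dir):
    """Map one Excel file to its own CSV path in dest_dir (assumed naming rule)."""
    return Path(dest_dir) / (Path(xlsx_path).stem + ".csv")


def convert_one(xlsx_path, dest_dir):
    """Write the first worksheet of xlsx_path to a UTF-8 CSV file."""
    # openpyxl is third-party; importing it here keeps csv_target usable without it.
    from openpyxl import load_workbook

    workbook = load_workbook(xlsx_path, read_only=True, data_only=True)
    sheet = workbook.worksheets[0]
    target = csv_target(xlsx_path, dest_dir)
    with open(target, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for row in sheet.iter_rows(values_only=True):
            writer.writerow(row)  # empty cells (None) become empty fields
    workbook.close()
    return target


def convert_folder(source_dir, dest_dir):
    """Call convert_one once per .xlsx file, like one transformation run per file."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    return [convert_one(path, dest_dir) for path in sorted(Path(source_dir).glob("*.xlsx"))]
```

As in the answer above, only the source and destination folders need to be supplied; the file list is discovered at run time.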

Azure Data Factory 2 : How to split a file into multiple output files

I'm using Azure Data Factory and am looking for the complement to the "Lookup" activity. Basically I want to be able to write a single line to a file.
Here's the setup:
Read from a CSV file in blob store using a Lookup activity
Connect the output of that to a For Each
Within the For Each, take each record (a line from the file read by the Lookup activity) and write it to a distinct file, named dynamically.
Any clues on how to accomplish that?
Use a Data Flow: add a Derived Column transformation to create a filename column, then use that filename column in the sink. Details on how to implement dynamic filenames in ADF are described here: https://kromerbigdata.com/2019/04/05/dynamic-file-names-in-adf-with-mapping-data-flows/
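The logic of that approach (compute a file name per record, then let the sink group records by it) can be illustrated outside ADF with a small Python sketch. It is not Data Flow code; the function split_rows and the naming rule passed to it are hypothetical.

```python
import csv
import io
from collections import defaultdict


def split_rows(csv_text, filename_for):
    """Group CSV records by a derived file name; return {file name: CSV text}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = defaultdict(list)
    for row in reader:
        # filename_for plays the role of the derived filename column.
        groups[filename_for(row)].append(row)

    outputs = {}
    for name, rows in groups.items():
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=reader.fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        outputs[name] = buffer.getvalue()
    return outputs
```

If the derived name is unique per record, as in the question, every record ends up in its own output; if several records share a name, they are written together.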
Data Flow would probably be better for this, but as a quick hack, you can do the following to read the text file line by line in a pipeline:
Define your source dataset to output a line as a single column. Normally I would use "NoDelimiter" for this, but that isn't supported by Lookup. As a workaround, define it with an incorrect Column Delimiter (like | or \t for a CSV file). You should also go to the Schema tab, and CLEAR the schema. This will generate a column in the output named "Prop_0".
In the foreach activity, set the Items to the Lookup's "output.value" and check "Sequential".
Inside the foreach, you can use item().Prop_0 to grab the text of the line.
To the best of my understanding, creating a blob isn't directly supported by pipelines [hence my suggestion above to look into Data Flow]. It is, however, very simple to do in Logic Apps. If I was tackling this problem, I would create a logic app with an HTTP Request Received trigger, then call it from ADF with a Web activity and send the text line and dynamic file name in the payload.

Capture Multiple File Names into a column on SSIS

Hope you can help me.
I want to capture the file names of multiple files that I am loading into SQL and store them in a custom column in SSIS.
I am aware of using the Advanced Editor on the Flat File Source, under Component Properties > FileNameColumnName. However, I am not sure how to make sure it picks up all the file names, or what to enter in this field.
I can load the files and all the data they hold, but I cannot get the filename into a column.

How to create format files using bcp from flat files

I want to use a format file to help import a comma-delimited file using bulk insert. I want to know how you generate format files from a flat file source. The Microsoft guidance on this subject makes it seem as though you can only generate a format file from a SQL table. But I want it to look at a text file and tell me what the delimiters are in that file.
Surely this is possible.
Thanks
The format file can, and usually does include more than just delimiters. It also frequently includes column data types, which is why it can only be automatically generated from the Table or view the data is being retrieved from.
If you need to find the delimiters in a flat file, I'm sure there are a number of ways to create a script that could accomplish that, as well as creating a format file.
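As one possible sketch of such a script, Python's standard csv.Sniffer can guess the delimiter from a sample of the file. The function name detect_delimiter and the candidate list are assumptions. The sketch only reports the delimiter and does not write a bcp format file, since column types and lengths cannot be read from the text alone.

```python
import csv


def detect_delimiter(sample, candidates=",;\t|"):
    """Guess the field delimiter from a text sample of the flat file."""
    # sniff() raises csv.Error if no consistent delimiter is found.
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter
```

In practice the sample would be the first few kilobytes of the file, for example `detect_delimiter(open(path, newline="").read(4096))`, where `path` is whatever file you are inspecting.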

Kettle - Read multiple files from a folder

I'm trying to read multiple XML files from a folder, compile all the data they contain (they all have the same XML structure), and then save that data in a CSV file.
I already have a 'read-files' Transformation with the steps Get File Names and Copy Rows to Result, to get all the XML files. (It's working: I print a file with all the file names.)
Then I enter a 'for-each-file' Job, which has a Transformation with the Get Rows from Result step, followed by another Job to process those files.
I think I'm losing information between the 'read-files' Transformation and the Transformation in the 'for-each-file' Job that gets all the rows. (I print another file with all the file names, but it is empty.)
Am I thinking about this the right way? Do I have to set some variables, or is some option disabled? Thanks.
Here is an example of "How to process a Kettle transformation once per filename"
http://www.timbert.net/doku.php?id=techie:kettle:jobs:processtransonceperfile