Export Jobs with Transformations in Pentaho Kettle

I want to export all jobs in my ETL repository, but individually: for every job I want its own XML file (including its transformations).
I read a guide on the official wiki (WIKI FOR EXPORT XML), but with that solution I can only export all objects into a single file, export only the jobs (still into a single file), or export only the transformations (same as with the jobs); the other two options cause errors, and I don't know why.
However, I can get what I want from File -> Export -> To XML..., which gives me the XML of the job currently open in the Kettle main view.
I want to automate all of this. How can I do it?
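One way to script this, if you already have the single repository export file that the wiki method produces, is to split that file into one XML file per object. The sketch below assumes the export groups jobs under a <jobs> element and transformations under a <transformations> element, each entry carrying a <name> tag; the file and directory names are placeholders, and the extracted XML may not be byte-for-byte identical to what File -> Export -> To XML produces.

```python
import os
import xml.etree.ElementTree as ET

# Placeholders for this sketch: the single export produced via the wiki
# method, and the directory where the per-object files should be written.
EXPORT_FILE = "repository_export.xml"
OUTPUT_DIR = "exported"

def split_export(export_file=EXPORT_FILE, output_dir=OUTPUT_DIR):
    os.makedirs(output_dir, exist_ok=True)
    root = ET.parse(export_file).getroot()
    # Assumption: jobs sit under <jobs>/<job> and transformations under
    # <transformations>/<transformation>, each with a <name> child element.
    for xpath, extension in (("./jobs/job", ".kjb"),
                             ("./transformations/transformation", ".ktr")):
        for element in root.findall(xpath):
            name = element.findtext("name", default="unnamed")
            out_path = os.path.join(output_dir, name + extension)
            ET.ElementTree(element).write(out_path, encoding="utf-8",
                                          xml_declaration=True)
            print("Wrote", out_path)

if __name__ == "__main__":
    split_export()
```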

Related

BigQuery: Export to GCS option disappeared from BigQuery UI

The option to export to a GCS bucket has disappeared from the BigQuery UI and was replaced with "Export to Google Drive". It's a feature I used a lot for large results, and exporting to Drive is not useful at all: it takes very long, and I can't work with the file in Drive the same way I would in GCS. Is there any way I can still export to GCS from the BigQuery UI?
The "workaround" for BigQuery UI is to save result as a table (or just have destination table set for query) and after result is available in the table - just use "Export to GCS" option which is "still" available in both Classic and New BQ UI

Is there a way to import backups in NiFi?

Using NiFi v0.6.1, is there a way to import backups/archives?
And by backups I mean the files that are generated when you call
POST /controller/archive using the REST api or "Controller Settings" (tool bar button) and then "Back-up flow" (link).
I tried unzipping the backup and importing it as a template, but that didn't work. Comparing it to an exported template file, the formats are quite different. Perhaps there is a way to transform it into a template?
At the moment my workaround is to select no components on the top-level flow and then choose "create template", which adds a template containing all my components. Then I just export that. My issue with this is that it's a bit trickier to automate via the REST API. I used Fiddler to see what the UI is doing: it first generates a snippet that includes all the components (labels, processors, connections, etc.), then calls create template (POST /nifi-api/controller/templates) using the snippet ID. The template call is easy enough, but generating the definition for the snippet is going to take some work.
Note: Once the following feature request is implemented I'm assuming I would just use that instead:
https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
The entire flow for a NiFi instance is stored in a file called flow.xml.gz in the conf directory (flow.xml.tar in a cluster). The back-up functionality is essentially taking a snapshot of that file at the given point in time and saving it to the conf/archive directory. At a later point in time you could stop NiFi and replace conf/flow.xml.gz with one of those back-ups to restore the flow to that state.
Templates are a different format from the flow.xml.gz. Templates are more public facing and shareable, and can be used to represent portions of a flow, or the entire flow if no components are selected. Some people have used templates as a model to deploy their flows, essentially organizing their flow into process groups and making a template for each group. This project provides some automation for working with templates: https://github.com/aperepel/nifi-api-deploy
You just need to stop NiFi, replace the NiFi flow configuration file (typically flow.xml.gz in the conf directory) and start NiFi back up.
If you have trouble finding it, check your nifi.properties file for the nifi.flow.configuration.file= property to find out what you've set it to.
If you are using clustered mode, you only need to do this on the NCM (NiFi Cluster Manager).
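To automate that restore, a small sketch like the one below stops NiFi, copies an archived snapshot over conf/flow.xml.gz, and starts NiFi again. The install path and archive file name are placeholders for this sketch; bin/nifi.sh is the standard start/stop script on a Linux install.

```python
import shutil
import subprocess

# Placeholder paths: adjust to your installation and to the archived
# snapshot you want to restore from conf/archive.
NIFI_HOME = "/opt/nifi"
ARCHIVE_FLOW = f"{NIFI_HOME}/conf/archive/20160601120000-flow.xml.gz"
ACTIVE_FLOW = f"{NIFI_HOME}/conf/flow.xml.gz"

def restore_flow():
    # Stop NiFi so the flow file is not rewritten while we replace it.
    subprocess.run([f"{NIFI_HOME}/bin/nifi.sh", "stop"], check=True)
    # Replace the active flow with the archived snapshot.
    shutil.copy2(ARCHIVE_FLOW, ACTIVE_FLOW)
    # Start NiFi back up; it will load the restored flow.
    subprocess.run([f"{NIFI_HOME}/bin/nifi.sh", "start"], check=True)

if __name__ == "__main__":
    restore_flow()
```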

Can I use RavenDB during an Export? What happens to documents added?

If I've started a large export of data using smuggler, can I continue to use RavenDB while the export is happening?
If a document gets added during the export, will it get exported?
The documentation states you can continue to use RavenDB while importing. See https://ravendb.net/docs/article-page/1.0/csharp/server/administration/export-import. I cannot find anything about exporting though.
I will be exporting from RavenDB server v1 (build 888).
You can use RavenDB during the export.
In current versions of RavenDB, the export runs on a snapshot of the data, but it queries the database again when it is done to pick up documents that were added during the export.
In build 888, however, you get the documents that were in the database when the export started, not the ones added during the export.

Exporting ERwin models to XML or .erwin files programmatically

I have a requirement to programmatically export models in ERwin data modeler. The exported files could be saved in a directory (on server or local machine). We also want the process to export only the models that were changed after previous export.
Anybody know how to do that?
Thanks in advance,
Vivek
The ERwin API can be used to programmatically access the model.
You can write a program that steps through the model, then extracts and formats the information you want to export.
One of the model properties is the date the model was last updated. If your export program saves the date it was last run, you can compare the two.
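The ERwin API itself is a COM automation interface, so the model-reading and export calls are left as placeholders here, but the "only export what changed since the last run" bookkeeping is generic. A minimal sketch of that part, assuming you can read each model's last-updated property and supply some export routine for it:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Placeholder: where the timestamp of the previous export run is remembered.
STATE_FILE = Path("last_export_run.json")

def load_last_run():
    if STATE_FILE.exists():
        return datetime.fromisoformat(json.loads(STATE_FILE.read_text())["last_run"])
    return datetime.min.replace(tzinfo=timezone.utc)

def save_last_run(when):
    STATE_FILE.write_text(json.dumps({"last_run": when.isoformat()}))

def export_changed_models(models):
    """models: iterable of (name, last_updated, export_fn) tuples.

    last_updated would come from the model's last-updated property read
    through the ERwin API; export_fn is whatever routine writes the XML
    or .erwin file. Both are placeholders in this sketch.
    """
    last_run = load_last_run()
    now = datetime.now(timezone.utc)
    for name, last_updated, export_fn in models:
        if last_updated > last_run:
            print(f"Exporting {name} (changed {last_updated.isoformat()})")
            export_fn()
        else:
            print(f"Skipping {name} (unchanged since last export)")
    save_last_run(now)
```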

Export Google Cloud Datastore and import to BigQuery programmatically

I'm looking for a way to export my Cloud Datastore and import it into BigQuery daily. The manual way is described on this Google page. I cannot find a clean way to automate it.
There isn't a simple way to do this, but you can separate out the two parts: creating App Engine backups and loading them into BigQuery.
You can use scheduled backups to create datastore backups periodically (https://cloud.google.com/appengine/articles/scheduled_backups).
You can then use Apps Script to automate the BigQuery portion (https://developers.google.com/apps-script/advanced/bigquery#load_csv_data) or use an AppEngine cron to do the same thing.
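If you end up driving the load step from your own code instead of Apps Script, a minimal sketch with the google-cloud-bigquery Python client might look like this; the table name and the backup metadata file path in GCS are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholders for this sketch: the table to load into and the metadata
# file that the Datastore backup/export wrote to your GCS bucket.
table_id = "my-project.my_dataset.my_kind"
backup_uri = "gs://my-backup-bucket/2019-01-01/my_kind.export_metadata"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
)
load_job = client.load_table_from_uri(backup_uri, table_id, job_config=job_config)
load_job.result()  # wait for the load to complete
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```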
As of last week there's a proper way to automate this. The most important part is gcloud beta datastore export.
I created a script around it: https://github.com/chees/datastore2bigquery
You could run this in a cron job.
See here for a demo of how it works: https://www.youtube.com/watch?v=dGyQCE3bWkU
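If you would rather roll your own than use the linked script, a minimal cron-friendly sketch that shells out to gcloud and bq could look like the following. The project, bucket, kind, and table names are placeholders, and the exact layout of the export's metadata file path should be checked against what the export actually writes to your bucket.

```python
import datetime
import subprocess

# Placeholders for this sketch.
PROJECT = "my-project"
BUCKET = "gs://my-datastore-exports"
KIND = "MyKind"
DATASET_TABLE = "my_dataset.my_kind"

def export_and_load():
    stamp = datetime.datetime.utcnow().strftime("%Y%m%d_%H%M%S")
    export_prefix = f"{BUCKET}/{stamp}"

    # 1) Export the kind from Datastore to GCS. Newer gcloud versions also
    #    offer this without the beta track as "gcloud datastore export".
    subprocess.run(
        ["gcloud", "beta", "datastore", "export", export_prefix,
         f"--kinds={KIND}", f"--project={PROJECT}"],
        check=True,
    )

    # 2) Load the export into BigQuery, pointing at the per-kind metadata
    #    file. Verify this path against the files the export produced.
    metadata_uri = (f"{export_prefix}/default_namespace/kind_{KIND}/"
                    f"default_namespace_kind_{KIND}.export_metadata")
    subprocess.run(
        ["bq", "load", "--source_format=DATASTORE_BACKUP",
         DATASET_TABLE, metadata_uri],
        check=True,
    )

if __name__ == "__main__":
    export_and_load()
```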
Building on @Jordan's answer above, the steps to do this would be:
1) Make a storage bucket
2) Export datastore entities to this bucket
3) Open the BigQuery Web UI, and load using the Google Cloud file path.
A full tutorial with images is available at this post.
It is possible using the following code. It basically uses App Engine cron jobs and the BigQuery API.
https://github.com/wenzhe/appengine_datastore_bigquery