GoodData and AWS S3 Integration

I have two questions about setting up the GoodData-S3 integration for all projects under a segment:
Will LCM help me deploy the ADD component across all the projects in that segment, or do I need to deploy the ADD component separately in each project?
Do I need to specify the client_id for each project in the ADD component, or will it use the client_id from the segment?

1) Will LCM help me to deploy the ADD Component across all the projects in that segment, or do I need to deploy the ADD component separately in each of the projects?
If you have multiple workspaces to load data to and you have the workspaces organized into segments (see Set Up Automated Data Distribution v2 for Object Storage), you only need to deploy the ADD v2 to the service workspace.
2) Do I need to specify the client_id for each project in the ADD Component or will it use the client_id from the segment?
You need to make sure that the source files you want to distribute per workspace have the x__client_id column. This column should contain values corresponding to the client IDs of your client workspaces, so that each row can be routed to the right workspace accordingly.
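For illustration, a minimal sketch of preparing such a source file before uploading it to S3 (the file names, the "customer" column, and the client-ID mapping are all hypothetical):
import pandas as pd

# Assumed combined source file with one row per record, across all clients
sales = pd.read_csv("sales_all_clients.csv")

# Assumed mapping from an attribute in the data to the client IDs of the
# client workspaces provisioned under the segment
client_ids = {"Acme": "client_acme", "Zenith": "client_zenith"}

# ADD v2 uses x__client_id to route each row to the matching client workspace
sales["x__client_id"] = sales["customer"].map(client_ids)
sales.to_csv("sales_with_client_id.csv", index=False)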

Related

Specify the GCP project in tf.io.GFile()

Is there a way to specify the GCP project when downloading objects using tf.io.gfile.GFile? I know it can be used like this:
import tensorflow as tf
with tf.io.gfile.GFile("gs://<bucket>/<path>") as f:
    f.read()
but this does not have any parameter for the project. I know you can select the active project using the CLI tools, but I want to download data from different projects. Is it possible, or do I need to use some other GCS client? If so, which one is most compatible with TF and can most easily be used in a tf.function?
Bucket names are globally unique across projects, so although the page https://console.cloud.google.com/storage/browser?project=<project>&prefix=&forceOnObjectsSortingFiltering=false only shows buckets created as part of that project, you can query a bucket regardless of the project it belongs to. As long as you have access, you can read the data without specifying a project.
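As a minimal sketch (the bucket names are hypothetical), the same call works for buckets owned by different projects, as long as your credentials have read access:
import tensorflow as tf

# Buckets created under different GCP projects; no project parameter is needed
for path in ("gs://bucket-from-project-a/data.txt",
             "gs://bucket-from-project-b/data.txt"):
    with tf.io.gfile.GFile(path) as f:
        print(path, len(f.read()))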

Is there a way to trigger a child plan in Bamboo and pass it information like a version number?

We're using Go.Cd and transitioning to Bamboo.
One of the features we use in Go.Cd is value stream maps. This enables triggering another pipeline and passing information (and build artifacts) to the downstream pipeline.
This is valuable when an upstream build has a particular version number, and you want to pass that version number to the downstream build.
I want to replicate this setup in Bamboo (without a plugin).
My question is: Is there a way to trigger a child plan in Bamboo and pass it information like a version number?
This has three steps.
1. Use a parent plan/child plan to set up the relationship.
2. Using the Artifacts tab, set up shared artifacts to transfer files from one plan to another.
3. Pass the build information through that artifact:
3a. At the end of the parent build, dump the environment variables to a file:
env > env.txt
3b. Set up (using the Artifacts tab) an artifact selector that picks this file up.
3c. Set up a fetch for this artifact from the shared artifacts in the child plan.
3d. Using the Inject Variables task, read the env.txt file you have transferred over. The build number from the original pipeline is now available in the downstream pipeline, just like in Go.Cd. (A sketch of dumping only selected variables follows these steps.)
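If you prefer not to ship the whole environment, a small Python sketch that writes only the values you need, in the key=value format the Inject Variables task reads (the variable names are assumptions based on Bamboo's usual bamboo_* environment variables):
import os

# Hypothetical selection of Bamboo-provided variables to hand to the child plan
wanted = ["bamboo_buildNumber", "bamboo_planRepository_revision"]

with open("env.txt", "w") as handover:
    for name in wanted:
        # The Inject Variables task expects one key=value pair per line
        handover.write(f"{name}={os.environ.get(name, '')}\n")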

What is the difference and relationship between an Azure DevOps Build Definition, and a Pipeline?

I am trying to automate a process in Azure DevOps, using the REST API. I think it should go like this (at least, this is the current manual process):
fork repo
create pipeline(s) using the YAML files in the newly forked repo
run the pipelines in a particular way
I am new to the Azure DevOps REST API and I am struggling to understand what I have done and what I should be doing.
Using the REST API, I seem to be able to create what I would call a pipeline, using the pipeline endpoint; I do notice that if I want to run it, I have to interact with its build definition instead.
Also, looking at code other colleagues have written, it seems (though I may be wrong) like they are able to achieve the same thing by simply creating a build definition, without explicitly creating a pipeline.
This lack of understanding is driving me bonkers so I am hoping someone can enlighten me!
Question
What is the difference, and relationship, between a Build Definition and a Pipeline?
Additional info: I am not interested in working with the older Release Pipelines, and I have tried to find the answer in the Azure DevOps REST API docs, but to no avail.
If you want to create a pipeline, you can do it through either endpoint. The difference is really conceptual:
build definitions are part of the original flow, which consisted of Build and Release: the build was responsible for building, testing, and publishing artifacts for later use in releases to deploy
pipelines are the newer approach, which uses a YAML-defined process for building/testing/deploying code
You can find more info here: What's the difference between a build pipeline and a release pipeline in Azure DevOps?
For instance, for this pipeline/build:
https://dev.azure.com/thecodemanual/DevOps%20Manual/_build?definitionId=157
where the definition ID is 157, you will get responses from both endpoints:
https://dev.azure.com/{{organization}}/{{project}}/_apis/build/definitions/157?api-version=5.1
and
https://dev.azure.com/{{organization}}/{{project}}/_apis/pipelines/157?api-version=6.0-preview.1
and in that sense, pipeline ID = build definition ID.
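A quick way to see this is to fetch the same ID from both endpoints, for example with a small Python sketch (organization, project, and the personal access token are placeholders):
import requests

org, project, definition_id = "<organization>", "<project>", 157
auth = ("", "<personal-access-token>")  # PAT as the basic-auth password

build_def = requests.get(
    f"https://dev.azure.com/{org}/{project}/_apis/build/definitions/{definition_id}",
    params={"api-version": "5.1"}, auth=auth).json()
pipeline = requests.get(
    f"https://dev.azure.com/{org}/{project}/_apis/pipelines/{definition_id}",
    params={"api-version": "6.0-preview.1"}, auth=auth).json()

# Both responses describe the same pipeline, just in different shapes
print(build_def["name"], pipeline["name"])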
The pipelines endpoint is not very useful:
https://dev.azure.com/{Organization}/{ProjectName}/_apis/pipelines?api-version=6.0-preview.1
It will only give you a list of pipelines with very basic info such as name, ID, folder etc.
To create and update YAML pipelines you need to use the Build definitions endpoint. The IDs you use in the endpoint are the same IDs as the Pipelines endpoint uses.
Get definition, Get list, Create, Update:
https://dev.azure.com/{Organization}/{ProjectName}/_apis/build/definitions?api-version=6.0
(To create a working pipeline you must first Get an existing pipeline, modify the JSON you receive, then POST it as a new definition.)
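A rough sketch of that get/modify/create flow, assuming a YAML definition (the repository name, pipeline name, and YAML file name are made up, and other fields may need adjusting):
import requests

org, project = "<organization>", "<project>"
auth = ("", "<personal-access-token>")
base = f"https://dev.azure.com/{org}/{project}/_apis/build/definitions"

# 1. Get an existing YAML pipeline definition to use as a template
template = requests.get(f"{base}/157", params={"api-version": "6.0"}, auth=auth).json()

# 2. Modify the JSON: drop identity fields and point it at the forked repo/YAML file
template.pop("id", None)
template["name"] = "my-new-pipeline"
template["repository"]["name"] = "<forked-repo>"          # assumed repository block
template["process"]["yamlFilename"] = "azure-pipelines.yml"

# 3. POST it as a new definition; the response includes the new definition/pipeline ID
created = requests.post(base, params={"api-version": "6.0"}, json=template, auth=auth)
print(created.json().get("id"))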

Call a pipeline from a pipeline in Amazon Data Pipeline

My team at work is currently looking for a replacement for a rather expensive ETL tool that, at this point, we are using as a glorified scheduler. We have improved on any of the integrations the ETL tool offers with our own Python code, so I really just need its scheduling ability. One option we are looking at is Data Pipeline, which I am currently piloting.
My problem is thus: imagine we have two datasets to load - products and sales. Each of these datasets requires a number of steps to load (get source data, call a python script to transform, load to Redshift). However, product needs to be loaded before sales runs, as we need product cost, etc to calculate margin. Is it possible to have a "master" pipeline in Data Pipeline that calls products first, waits for its successful completion, and then calls sales? If so, how? I'm open to other product suggestions as well if Data Pipeline is not well-suited to this type of workflow. Appreciate the help
I think I can relate to this use case. Anyhow, Data Pipeline does not do this kind of dependency management on its own. It can, however, be simulated using file preconditions.
In this approach, your child pipelines depend on a file being present (as a precondition) before starting. A master pipeline would create trigger files based on logic executed in its activities. A child pipeline may create other trigger files that start a subsequent pipeline further downstream.
Another solution is to use the Simple Workflow product. That has the features you are looking for, but it would need custom coding using the Flow SDK.
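As a hypothetical sketch of the trigger-file idea, the last activity of the master pipeline could drop a marker object in S3 that the child pipeline's S3KeyExists precondition watches (the bucket and key below are made up):
import boto3

s3 = boto3.client("s3")

# Written by the final activity of the master pipeline once 'products' has loaded
s3.put_object(Bucket="<etl-bucket>", Key="triggers/products_done", Body=b"")

# The child pipeline defines an S3KeyExists precondition on the same key,
# so its activities only start once this marker appears.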
This is a basic use case for Data Pipeline and should definitely be possible. You can use its graphical pipeline editor to create this pipeline. Breaking down the problem:
There are two datasets:
Product
Sales
Steps to load these datasets:
Get source data: Say from S3. For this, use S3DataNode
Call a python script to transform: Use ShellCommandActivity with staging. Data Pipeline stages data implicitly for S3DataNodes attached to a ShellCommandActivity; you can access the staged data through the special environment variables provided (see the documentation for details).
Load output to Redshift: Use RedshiftDatabase
You will need to add the above components for each of the datasets you need to work with (product and sales in this case). For easy management, you can run these on a single EC2 instance.
Condition: 'product' needs to be loaded before 'sales' runs
Add a dependsOn relationship: add this field to the ShellCommandActivity of Sales and have it refer to the ShellCommandActivity of Product. See the dependsOn field in the documentation; it says: 'One or more references to other Activities that must reach the FINISHED state before this activity will start'. (A sketch of a pipeline definition using this field follows the tip below.)
Tip: In most cases, you would not want the next day's execution to start while the previous day's execution is still active (i.e., RUNNING). To avoid such a scenario, use the 'maxActiveInstances' field and set it to '1'.
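A trimmed sketch of how those pipeline objects could look when pushed with boto3's put_pipeline_definition (the pipeline ID, commands, and Ec2Resource reference are assumptions, and the schedule/default objects are omitted):
import boto3

client = boto3.client("datapipeline", region_name="us-east-1")

objects = [
    {"id": "LoadProduct", "name": "LoadProduct", "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        {"key": "command", "stringValue": "python transform_product.py"},
        {"key": "runsOn", "refValue": "Ec2Resource"},
    ]},
    {"id": "LoadSales", "name": "LoadSales", "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        {"key": "command", "stringValue": "python transform_sales.py"},
        {"key": "runsOn", "refValue": "Ec2Resource"},
        # Sales will not start until Product reaches FINISHED
        {"key": "dependsOn", "refValue": "LoadProduct"},
        # Avoid overlapping daily runs
        {"key": "maxActiveInstances", "stringValue": "1"},
    ]},
]

client.put_pipeline_definition(pipelineId="<pipeline-id>", pipelineObjects=objects)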

Adding specific references from a NuGet Package

I have created a package with a bunch of assemblies that we will provide to our users. I want our users to be able to pick and add only references they need from within the package to a project. The user should be able to add this package at a solution level and then pick the references to be added to each project from the package added. Is this possible with NuGet?
Example:
MyPackage - contains foo.dll, bar.dll, bla.dll
User installs package "MyPackage" at the solution level
Project 1 - selects and adds references to foo.dll and bar.dll
Project 2 - selects and adds a reference to bla.dll
Currently, every reference in the package is added to every project. This is not the desired setup; I want only the selected references added. Is there a way to do this with NuGet?
NuGet is not designed to work this way. Packages are whole delivery units. Our recommendation in this scenario would be to package the individual assemblies according to how you want them individually installable.