Google Data Fusion Salesforce to Bigquery Pipeline, automatic way of managing schema updates in Salesforce - google-bigquery

Hey I am trying to create some batch jobs that reads from a couple Salesforce Objects and pushes them to BQ. Every-time batch process runs it will truncate the table in BQ and push all the data in the SF object back into BQ. Is it possible for google data fusion to automatically detect changes in an object in Salesforce(like adding a new column or changing data types of a column) then be registered and pushed to BQ via google data fusion?

For SF side of the puzzle you could look into https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_describeGlobal.htm and If-Modified-Since header telling you if the definition of table(s) changed. That url is for all tables in the org or you run table-specific metadata describe calls with https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_sobject_describe.htm
But I can't tell you how to use it in your job.

You can use the provided answer of #eyescream to be the condition or the trigger for the update to BigQuery. You may push changes to BigQuery using the pre-built plugin Stream Source approach from Datafusion in which, as mentioned in this docmentation, it
tracks updates in Salesforce sObjects. Examples of sObjects are
opportunities, contacts, accounts, leads, any custom object, etc.
You may use this approach to automatically track changes and push them to BigQuery. You can also find the whole Salesforce Streaming Source configuration reference in this documentation as also redirected from google's official documentation.
However, if you want a more dynamic approach for your overall use case, you may also use the integration of BigQuery with Salesforce. However in this approach, you will need to build your own code in which you can also use #eyescream 's answer as the primary condition/trigger and then automatically push the update to your BigQuery schema.

Related

How to save API output data to a Dataset in Azure Data Factory

I'm currently working on a project in Azure Data Factory, which involves collecting data from a Dataset, using this data to make API calls, and thereafter taking the output of the calls, and posting them to another dataset.
In this way I wish to end up with a dataset containing various different data, that the API call returns to me.
My current difficulty with this is, that do not know how to make the "Web activity" (which I use to make the API Call) save its output to my dataset.
I have tried numerous different solutions found online, however none of them seem to work. I am not sure if the official documentation is outdated or if I'm misunderstanding parts of it. Below I've listed links to the solutions I've tried and failed:
Copy data from a REST source
Copy data from an HTTP source
(among others, including similar posts to mine.)
The current flow in my pipeline is, that a "Lookup" collects a list of variables named "User_ID". These user ID's are put in to a ForEach loop, which makes an API call with the "Web" activity, using each of the USER_ID's. And this is where in the pipeline I wish to implement an activity or other, that can post each of these Web activity outputs into my new dataset.
I've tried to use the "Copy data" activity, but all it seems to do, is copying data straight from one dataset to another, and not being able to manipulate the data (which i wish to do with my api call).
Does anyone have a solution to how this is done?
Thanks a lot in advance.
Not sure why you could not achieve this following Copy data from a REST endpoint. I tested the below which works fine. I used schema mapping feature of 'Copy data' activity.
For example, I used a sample API http://dummy.restapiexample.com/api/v1/employees as source and for my testing, I used CosmosDB as sink. Of course you can choose any other dataset as per your requirement.
Create 'Linked Service' for the REST API. For simplicity I do not have authentication for this API. Of course, you have that option if required.
Create 'Linked Service' for the target data store. In my case, it is CosmosDB.
Create Dataset for the REST API and link to the linked service created in #1.
Create Dataset for the Data store (in my case CosmosDB) and link to the linked service created in #2.
In the pipeline, add a 'Copy data' activity like below with source as the REST dataset created in #3 and sink as the dataset created in #4. Also, in my case I had to add schema mapping to select the employees array from the API output and map to each field in my datastore.
And voila, that's it. When I run the pipeline, it calls the REST API and saves the output in my DB with my desired mapping.

Retrieve Overwritten Saved Query in Big Query

I accidentally overwrote a saved project query in BQ with a completely unrelated query. I can't find any documentation about retrieving overwritten queries or about any sort of version control. Has anyone done this as well and recovered their query?
Unfortunately, "Saved Query" is UI internal feature (see How to access “Saved Queries” programmatically? and there is respective feature request REST API for Saved Queries), so we really have no way to manage / control this cases
Meantime you can use query history (either in UI or via respective API or in Stackdriver) to locate use of that query and recreate/re-save it again

How can I move data from BigQuery or DataPrep to Firestore?

I just cleaned up my firestore collection data using DataPrep and verified the data via BigQuery. I now want to move the data back to Firestore. Is there a way to do this?
I have used manual method of exporting to JSON and then uploading using a code provided by AngularFirebase. But It is not automated as there is a need to periodically cleanup this data.
I am looking for a process within Google Cloud console. Any help will be appreciated
This is not an answer, more like a partial answer. I could not add a comment as I don't have 50 reputation yet. I am in a similar boat but not entirely the same situation. My situation being that I want to use a subset of BigQuery data and add it to Firestore. My thinking is to do the following:
Use the BigQuery API to query the data periodically using BigQuery Jobs' Load (in your case) or Query (in my case)
Convert it to JSON in code
Use batch commit in Firestore's API to update the firestore database
This is my idea and I am not sure whether this will work, but I will you know more once I am done with this. Unless someone else has better insights to help me and the person asking this question

Backfill Google Analytics in BigQuery

I'm looking for a workaround on the following issue. Hope someone can help.
I'm unable to backfill data in the ga_sessions_ table in BigQuery through product linking in GA. e.g. partition ga_sessions_20180517 is missing
This specific view has already been linked before. Google documentation says that historical load is only done once per view (hence, the issue) (https://support.google.com/analytics/answer/3416092?hl=en)
Is there any way to work around it?
Kind regards,
Martijn
You can use Google Analytics Reporting API to get the data for that view. This method has lot of restrictions like sometimes the data is sampled/only 7 dimensions can be exported in one call, but at least you will be able to fetch your data in a partitioned manner.
Documentation hereDoc
If you need a lot of dimensions/metrics in hit level format, scitylana.com has a service that can provide this data historically.
If you have a clientId set in a custom dimension the data-quality is near perfect.
It also works without a clientId set.
You can get all history as available through the API.
You can get 100+ dimensions/metrics in one batch into BQ.

Getting DoubleClick data from Google Analytics exported to BigQuery

I have GA360 and I have exported raw Google Analytics data to BigQuery through the integration. I'm just wondering if the DCM dimensions and metrics are part of the export?
Ideally these link
I can't find them as part of the export, what would be the best way to access these dimensions for all my raw data? Core reporting API?
This Google Analytics documentation page shows the BigQuery Export schema for the Google Analytics data that is exported to BigQuery. As you will see, the DoubleClick Campaign Manager (DCM) data is not part of the BigQuery export, unlike the DoubleClick for Publishers (DFP), which does appear (these are all the hits.publisher.<DFP_METRIC> metrics).
Therefore, as explained by #BenP in the comments, you may be interested in the BigQuery Data Transfer service, which does have a feature for DoubleClick Campaign Manager Transfers. Unfortunately, I am not an expert in Google Analytics or DCM, and therefore cannot add much relevant info to the process of linking both sets of data, but maybe you can try combining them yourself and then post a new question for that matter if you do not success in doing so.