I would like to send data from an API to a BigQuery table with Google Workflows (YAML format).
But the API response that I want to send to the BigQuery table does not match the structure expected by the BigQuery connector's "insertAll" method.
main:
  params: [input]
  steps:
    - retrieveMatomoData:
        call: http.get
        args:
          url: https://.....
        result: matomoData
    - insertAll:
        call: googleapis.bigquery.v2.tabledata.insertAll
        args:
          datasetId: myDatasetId
          projectId: myProjectId
          tableId: myTableId
          body:
            "rows": [
              {
                json: should be the full "matomoData" response
              }
            ]
The response structure of the API I use is:
{
"body": [
{
…
},
{
…
}
]
}
(the "body" field is an array whose elements correspond to the rows to insert)
It does not match the structure required to insert rows into BigQuery:
"rows": [
{
json: …
},
{
json: …
}
]
Do you have any idea how I can handle this?
While the Workflows syntax and standard library can perform simple data extraction and transformation, larger JSON transformations are likely unwieldy inside Workflows for now. I'd recommend using a Cloud Function with a JSON transformation library.
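As a rough illustration of the reshaping such a Cloud Function would have to do, here is a minimal Python sketch. The function name and the sample records are hypothetical; the only thing taken from the question is that the API returns an array of objects and that insertAll expects each of them wrapped as {"json": ...} inside "rows".

```python
def to_insert_all_body(api_body):
    """Wrap each element of the API's "body" array as one insertAll row."""
    if not isinstance(api_body, list):
        raise TypeError("expected the API response body to be a list of objects")
    return {"rows": [{"json": record} for record in api_body]}


# Hypothetical records standing in for matomoData.body
sample = [{"label": "a", "nb_visits": 3}, {"label": "b", "nb_visits": 5}]
request_body = to_insert_all_body(sample)
print(request_body)
```

The returned dictionary has the shape shown in the question for the insertAll request body, so it could be passed on as-is.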
I have an api link https://apilink.com?_fields=id,name,images which gives me the following format
[
{
"id": 229210,
"name": "Basic Electrical Knowledge",
"images": [
{
"id": 229211,
"date_created": "2023-01-13T18:34:39",
"date_created_gmt": "2023-01-13T07:34:39",
"date_modified": "2023-01-13T18:34:39",
"date_modified_gmt": "2023-01-13T07:34:39",
"src": "https://sampleSite.in/wp-content/uploads/2023/01/SomeUrlSource.jpg",
"name": "Basic Electrical Knowledge",
"alt": ""
}
]
}
]
I want to access only src from images[]. How do I retrieve this from the link? When clicking the link, I want to display this:
[
{
"id": 229210,
"name": "Basic Electrical Knowledge",
"src": "https://sampleSite.in/wp-content/uploads/2023/01/SomeUrlSource.jpg"
}
]
How do I do this?
I tried to solve this by providing these parameters:
https://apilink.com?_fields=id,name,images=src
You can achieve this by making a GET request to the API link and parsing the JSON response (for example with response.json()). After that, you can loop over the returned items and, for each one, extract the 'src' key from the first object in its 'images' array. Finally, you can construct a new object with the desired format and return it.
fetch('https://apilink.com?_fields=id,name,images')
  .then(response => response.json())
  .then(data => {
    let newData = []
    data.forEach(item => {
      // Keep id and name, and take src from the first image only
      let newItem = {
        id: item.id,
        name: item.name,
        src: item.images[0].src
      }
      newData.push(newItem)
    });
    return newData;
  })
  .then(newData => {
    console.log(newData);
  });
Note that this code snippet is simplified and doesn't handle errors (for example, an item with an empty images array); it only serves as an example of how you could do it.
Assuming that you can't make server-side changes or write a small script, and want to get this result just by manipulating the URI, the answer is no.
The URI refers to a resource on the server, and _fields looks like a projection over the attributes of that resource.
In this case, you are trying to transform the resource that the server returns. If the server does not implement such functionality, you must do it yourself.
You want to transform the attribute images, which has type [Object], into a String.
A code snippet like the one in the answer by @RASIKA EKANAYAKA would fit your requirement.
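The same client-side transformation can be sketched in Python. This is only an illustration: the function name is hypothetical, and returning None for items without images is an assumption about the desired behavior, not something the API defines.

```python
def flatten_images(items):
    """Replace each item's "images" list with the "src" of its first image."""
    flattened = []
    for item in items:
        images = item.get("images") or []
        flattened.append({
            "id": item.get("id"),
            "name": item.get("name"),
            # None when the item has no images, instead of raising IndexError
            "src": images[0].get("src") if images else None,
        })
    return flattened


# Sample data copied from the response shown in the question
sample = [{
    "id": 229210,
    "name": "Basic Electrical Knowledge",
    "images": [{
        "id": 229211,
        "src": "https://sampleSite.in/wp-content/uploads/2023/01/SomeUrlSource.jpg",
    }],
}]
result = flatten_images(sample)
print(result)
```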
It is possible to read data from a sheet only with API key (without OAuth 2.0), but it seems that reading the developer metadata requires OAuth 2.0.
Is there some way to read the metadata from an app without asking the user to connect their Google account?
You want to retrieve the developer metadata of the Spreadsheet using the API key.
You have already been able to get values from Spreadsheet using the API key.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Issue and workaround:
Unfortunately, the methods of "REST Resource: spreadsheets.developerMetadata" in the Sheets API cannot be used with an API key. For those methods, OAuth2 is required, as mentioned in your question. However, the developer metadata can also be retrieved with the spreadsheets.get method of the Sheets API, and that method can be used with an API key. It returns all of the developer metadata, so when you want to find a specific entry, please search for it within the returned list.
IMPORTANT POINTS:
In this case, please set the visibility of the developer metadata to DOCUMENT. With this setting, the developer metadata can be retrieved with the API key. If the visibility is PROJECT, it cannot be retrieved with the API key. Please be careful about this.
When you want to retrieve the developer metadata with the API key, please share the Spreadsheet publicly. Otherwise, it cannot be retrieved with the API key. Please be careful about this.
Sample situation 1:
As a sample situation, suppose that a new Spreadsheet is created and new developer metadata is added to the Spreadsheet with the key "sampleKey" and the value "sampleValue".
In this case, the sample request body of spreadsheets.batchUpdate is as follows.
{
"requests": [
{
"createDeveloperMetadata": {
"developerMetadata": {
"location": {
"spreadsheet": true
},
"metadataKey": "sampleKey",
"metadataValue": "sampleValue",
"visibility": "DOCUMENT"
}
}
}
]
}
Sample curl command:
To retrieve the developer metadata from the above sample Spreadsheet, please use the following curl command.
curl "https://sheets.googleapis.com/v4/spreadsheets/### spreadsheetId ###?key=### your API key ###&fields=developerMetadata"
In this case, fields=developerMetadata is used to make the response easier to read. Of course, you can also use * as fields.
Because this is a GET request, you can also paste the above endpoint into the browser to see the retrieved value.
Result:
{
"developerMetadata": [
{
"metadataId": 123456789,
"metadataKey": "sampleKey",
"metadataValue": "sampleValue",
"location": {
"locationType": "SPREADSHEET",
"spreadsheet": true
},
"visibility": "DOCUMENT"
}
]
}
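Since spreadsheets.get returns every developer metadata entry, the lookup by key has to happen on the client. A minimal Python sketch of that search, assuming the response has already been parsed into a dictionary (the function name is hypothetical):

```python
def find_metadata(response, key):
    """Return the developer metadata entries whose metadataKey equals key."""
    return [
        entry
        for entry in response.get("developerMetadata", [])
        if entry.get("metadataKey") == key
    ]


# Response copied from the sample result above
response = {
    "developerMetadata": [
        {
            "metadataId": 123456789,
            "metadataKey": "sampleKey",
            "metadataValue": "sampleValue",
            "location": {"locationType": "SPREADSHEET", "spreadsheet": True},
            "visibility": "DOCUMENT",
        }
    ]
}
matches = find_metadata(response, "sampleKey")
print([entry["metadataValue"] for entry in matches])
```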
Sample situation 2:
As another situation, suppose that a new Spreadsheet is created and new developer metadata is added to the 1st column (column "A") with the key "sampleKey" and the value "sampleValue".
In this case, the sample request body is as follows.
{
"requests": [
{
"createDeveloperMetadata": {
"developerMetadata": {
"location": {
"dimensionRange": {
"sheetId": 0,
"startIndex": 0,
"endIndex": 1,
"dimension": "COLUMNS"
}
},
"metadataKey": "sampleKey",
"metadataValue": "sampleValue",
"visibility": "DOCUMENT"
}
}
}
]
}
Sample curl command:
To retrieve the developer metadata from the above sample Spreadsheet, please use the following curl command.
curl "https://sheets.googleapis.com/v4/spreadsheets/### spreadsheetId ###?key=### your API key ###&fields=sheets(data(columnMetadata(developerMetadata)))"
In this case, fields=sheets(data(columnMetadata(developerMetadata))) is used to make the response easier to read. Of course, you can also use * as fields.
Result:
{
"sheets": [
{
"data": [
{
"columnMetadata": [
{
"developerMetadata": [
{
"metadataId": 123456789,
"metadataKey": "sampleKey",
"metadataValue": "sampleValue",
"location": {
"locationType": "COLUMN",
"dimensionRange": {
"dimension": "COLUMNS",
"startIndex": 0,
"endIndex": 1
}
},
"visibility": "DOCUMENT"
}
]
},
{},
...
]
}
]
}
]
}
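For column-level metadata, the entries are nested under sheets, data, and columnMetadata, and columns without metadata appear as empty objects. A minimal Python sketch of collecting them from a parsed response (the function name is hypothetical, and the sample is a shortened copy of the result above):

```python
def collect_column_metadata(response):
    """Gather every developerMetadata entry found under columnMetadata."""
    found = []
    for sheet in response.get("sheets", []):
        for grid in sheet.get("data", []):
            for column in grid.get("columnMetadata", []):
                # Columns without metadata are empty objects, so default to []
                found.extend(column.get("developerMetadata", []))
    return found


response = {
    "sheets": [{
        "data": [{
            "columnMetadata": [
                {"developerMetadata": [{
                    "metadataId": 123456789,
                    "metadataKey": "sampleKey",
                    "metadataValue": "sampleValue",
                    "location": {
                        "locationType": "COLUMN",
                        "dimensionRange": {"dimension": "COLUMNS", "startIndex": 0, "endIndex": 1},
                    },
                    "visibility": "DOCUMENT",
                }]},
                {},
            ]
        }]
    }]
}
entries = collect_column_metadata(response)
print([(e["location"]["dimensionRange"]["startIndex"], e["metadataKey"]) for e in entries])
```

Each entry carries its own location.dimensionRange, so the column it belongs to can be read from the entry itself.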
References:
Method: spreadsheets.developerMetadata.get
DeveloperMetadataVisibility
If I misunderstood your question and this was not the direction you want, I apologize.
I need to modify the Google Data Studio - Google BigQuery Connector to meet custom requirements.
https://support.google.com/datastudio/answer/6370296
First Question: How could I find the source code for this data connector?
Second question:
According to the guide, https://developers.google.com/datastudio/connector/reference, getData(),
Returns the tabular data for the given request.
And the response is in this format
{
"schema":[
{
"name":"OpportunityName",
"dataType":"STRING"
},
{
"name":"IsVerified",
"dataType":"BOOLEAN"
},
{
"name":"Created",
"dataType":"STRING"
},
{
"name":"Amount",
"dataType":"NUMBER"
}
],
"rows":[
{
"values":[
"Interesting",
true,
"2017-05-23",
"120453.65"
]
},
{
"values":[
"SF",
false,
"2017-03-03",
"362705286.92"
]
},
{
"values":[
"Spring Sale",
true,
"2017-04-21",
"870.12"
]
}
],
"cachedData":true
}
But BigQuery could have 100 million records in the table. Do we just return the response in this format anyway, regardless of the fact that it could contain 100 million records?
Thanks!
The existing DS-BQ connector is not open source, hence you won't be able to modify its behavior.
With that said:
The DS-BQ connector has a "smarter" API contract than the open one - queries and filters will be passed down.
Feel free to create your own DS-BQ connector with whatever logic you might require! Community connectors would love your contributions.
Can someone please help me with the Google Analytics API V4?
How do I pass the max-results parameter with this class:
Google_Service_AnalyticsReporting
I am unable to find the relevant function to assign a value to the max-results parameter.
Based on https://stackoverflow.com/a/38922925/1224827 , the parameter you're looking for is pageSize:
The correct name of the parameter you are looking for is: pageSize. The Reference Docs provide the full API specifications.
def get_report(analytics):
    # Use the Analytics Service Object to query the Analytics Reporting API V4.
    return analytics.reports().batchGet(
        body={
            'reportRequests': [
            {
                'viewId': VIEW_ID,
                'pageSize': 10000,
                'dateRanges': [{'startDate': '2016-04-01', 'endDate': '2016-08-09'}],
                'dimensions': [{'name': 'ga:date'},
                               {'name': 'ga:channelGrouping'}],
                'metrics': [{'expression': 'ga:sessions'},
                            {'expression': 'ga:newUsers'},
                            {'expression': 'ga:goal15Completions'},
                            {'expression': 'ga:goal9Completions'},
                            {'expression': 'ga:goal10Completions'}]
            }]
        }
    ).execute()
Note: the API returns a maximum of 100,000 rows per request, no matter how many you ask for (according to the documentation). Since you attempted max_results, I assume you are migrating from the Core Reporting API V3; check out the Migration Guide - Pagination documentation to understand how to request the next 100,000 rows.
Stack Overflow extra tip: include your error responses in your question, as it will likely improve your chances of someone being able to help.
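Requesting the following pages can be sketched as a loop that copies each report's nextPageToken into the next request's pageToken. This is a minimal sketch, not the official client's behavior: fetch_all_rows and fake_batch_get are hypothetical names, and the fake stands in for the real call so the example runs offline.

```python
def fetch_all_rows(batch_get, report_request):
    """Collect rows across pages for a single report request.

    batch_get is any callable that takes a request body dict and returns
    the parsed batchGet response dict.
    """
    rows = []
    request = dict(report_request)  # copy so the caller's dict is not modified
    while True:
        response = batch_get({"reportRequests": [request]})
        report = response["reports"][0]
        rows.extend(report.get("data", {}).get("rows", []))
        token = report.get("nextPageToken")
        if not token:
            return rows
        request["pageToken"] = token


# Stand-in for the real API, with made-up pages
def fake_batch_get(body):
    token = body["reportRequests"][0].get("pageToken")
    pages = {
        None: {"reports": [{"data": {"rows": ["row1", "row2"]}, "nextPageToken": "2"}]},
        "2": {"reports": [{"data": {"rows": ["row3"]}}]},
    }
    return pages[token]


all_rows = fetch_all_rows(fake_batch_get, {"viewId": "VIEW_ID", "pageSize": 2})
print(all_rows)
```

With the client from the snippet above, batch_get could be something like lambda body: analytics.reports().batchGet(body=body).execute().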
You can use parameter page_size: 10000. Hope this helps.
I checked these docs but couldn't find any example for max-result
v3 doc https://developers.google.com/analytics/devguides/reporting/core/v3/reference#maxResults
v4 batchGet doc https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet
It would be great if someone shares the JSON example of max-result. I'm getting an error message when I add "start-index" : 1 and "max-results": 10
"Invalid JSON payload received. Unknown name \"start-index\" at 'report_requests[0]':
Cannot find field.\nInvalid JSON payload received. Unknown name \"max-results\" at
'report_requests[0]': Cannot find field.", {
Here is my JSON
{
"reportRequests": [
{
"viewId": "112211828",
"dateRanges": [
{
"startDate": "30daysAgo",
"endDate": "yesterday"
}
],
"metrics": [
{
"formattingType": "METRIC_TYPE_UNSPECIFIED",
"expression": "ga:searchUniques"
}
],
"dimensions": [
{
"name": "ga:searchKeyword"
}
],
"orderBys": [
{
"orderType": "VALUE",
"sortOrder": "DESCENDING",
"fieldName": "ga:searchUniques"
}
],
"samplingLevel": "DEFAULT",
"start-index" : 1,
"max-results": 10 // [Update] it should be "pageSize": 10
}
]
}
[UPDATE]
"pageSize": 10 works instead of "max-results"
I'm in the process of writing a node wrapper for RavenDB.
I'm using version 3 but as there are no HTTP docs for it, I've been relying on the 2.0 and 2.5 docs.
In regards to single document operations, I've used this doc page successfully for PUTs, DELETEs and multiple PATCHs to individual documents.
Similarly, I've used this doc page successfully for multiple PUTs and DELETEs of several documents in one HTTP call, but the docs are a bit vague in regards to PATCHing multiple documents in one call.
Under the "Batching Requests" heading, it clearly states it's possible:
Request batching in RavenDB is handled using the '/bulk_docs' endpoint, which accepts an array of operations to execute. The format for the operations is:
method - PUT, PATCH or DELETE.
...
For PUTs, I POST to /bulk_docs:
[
{
Method: 'PUT',
Key: 'users/1',
Document: { username: 'dummy' },
Metadata: { 'Raven-Entity-Type': 'Users' }
},
...
]
For DELETEs, I POST to /bulk_docs:
[
{
Method: 'DELETE',
Key: 'users/1'
},
...
]
For PATCHs, I've tried POSTing the following without any luck:
[
{
Method: 'PATCH',
Key: 'users/1',
Document: {
Type: 'Set',
Name:'username',
Value: 'new-username'
}
},
...
]
and
[
{
Method: 'PATCH',
Key: 'users/1',
Type: 'Set',
Name:'username',
Value: 'new-username'
},
...
]
All I'm getting back is 500 - Internal Server Error and without any examples of PATCHing multiple documents on that docs page I'm kind of stuck...
Any help would be appreciated :)
The structure for PATCH is:
[
{
Method: 'PATCH',
Key: 'users/1',
Patches: [{
Type: 'Set',
Name:'username',
Value: 'new-username'
}]
},
...
]
The full structure can be seen here:
https://github.com/ayende/ravendb/blob/master/Raven.Abstractions/Commands/PatchCommandData.cs#L72
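As an illustration of the structure above, here is a minimal Python sketch that builds the array of PATCH commands for several documents at once. The function name and the input shape (a mapping from document key to the fields to set) are hypothetical; only the Method, Key, and Patches layout comes from the answer.

```python
import json


def build_patch_commands(updates):
    """Build /bulk_docs PATCH commands from {document key: {field: value}}."""
    commands = []
    for key, fields in updates.items():
        commands.append({
            "Method": "PATCH",
            "Key": key,
            # One "Set" patch per field to update
            "Patches": [
                {"Type": "Set", "Name": name, "Value": value}
                for name, value in fields.items()
            ],
        })
    return commands


commands = build_patch_commands({
    "users/1": {"username": "new-username"},
    "users/2": {"username": "other-username"},
})
payload = json.dumps(commands)  # body to POST to /bulk_docs
print(payload)
```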