Google BigQuery connector (Connect Data Studio to BigQuery tables): I would like to modify this connector for my special requirements

I need to modify the Google Data Studio - Google BigQuery Connector for my custom requirements:
https://support.google.com/datastudio/answer/6370296
First question: how can I find the source code for this data connector?
Second question:
According to the guide, https://developers.google.com/datastudio/connector/reference, getData()
"Returns the tabular data for the given request."
The response is in this format:
{
  "schema": [
    {
      "name": "OpportunityName",
      "dataType": "STRING"
    },
    {
      "name": "IsVerified",
      "dataType": "BOOLEAN"
    },
    {
      "name": "Created",
      "dataType": "STRING"
    },
    {
      "name": "Amount",
      "dataType": "NUMBER"
    }
  ],
  "rows": [
    {
      "values": [
        "Interesting",
        true,
        "2017-05-23",
        "120453.65"
      ]
    },
    {
      "values": [
        "SF",
        false,
        "2017-03-03",
        "362705286.92"
      ]
    },
    {
      "values": [
        "Spring Sale",
        true,
        "2017-04-21",
        "870.12"
      ]
    }
  ],
  "cachedData": true
}
But a BigQuery table could hold 100 million records. Do we just ignore that and return the response in this format anyway?
Thanks!

The existing DS-BQ connector is not open source, hence you won't be able to modify its behavior.
With that said:
The DS-BQ connector has a "smarter" API contract than the one open to community connectors: queries and filters are passed down to BigQuery.
Feel free to create your own DS-BQ connector with whatever logic you might require! Community connectors would love your contributions.
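Whatever rows a custom connector returns still have to be serialized into the getData shape shown in the question, so one common approach (an assumption, not how the native DS-BQ connector is documented to work) is to ask BigQuery only for the fields Data Studio requested and to limit or aggregate rows in SQL. Community connectors are written in Apps Script; the Python below only illustrates the response-building half. The helper name, the type mapping and the max_rows cap are all hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from BigQuery-style type names to getData dataType values.
_TYPE_MAP = {
    "STRING": "STRING",
    "BOOL": "BOOLEAN",
    "BOOLEAN": "BOOLEAN",
    "INT64": "NUMBER",
    "FLOAT64": "NUMBER",
    "NUMERIC": "NUMBER",
}


def build_get_data_response(
    schema: dict[str, str],
    rows: list[dict[str, Any]],
    requested_fields: list[str],
    max_rows: int,
    cached: bool = False,
) -> dict[str, Any]:
    """Build a getData-shaped response containing only the requested fields.

    schema maps column name -> BigQuery-style type name; rows are dicts keyed
    by column name. Rows beyond max_rows are dropped (a hypothetical cap).
    """
    response_schema = [
        # Unknown types fall back to STRING in this sketch.
        {"name": name, "dataType": _TYPE_MAP.get(schema[name], "STRING")}
        for name in requested_fields
    ]
    response_rows = [
        # Values are emitted in the same order as the schema entries.
        {"values": [row[name] for name in requested_fields]}
        for row in rows[:max_rows]
    ]
    return {"schema": response_schema, "rows": response_rows, "cachedData": cached}
```

In a real connector the cap and the field selection would ideally be pushed into the SQL query itself, so BigQuery never sends the unneeded rows at all.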

Related

How do I restructure a json in YAML?

I would like to send data from an API to a BigQuery table with Google Workflows (YAML format).
But the API response that I want to send to the BigQuery table does not match the structure expected by the "insertAll" BigQuery connector method.
main:
  params: [input]
  steps:
    - retrieveMatomoData:
        call: http.get
        args:
          url: https://.....
        result: matomoData
    - insertAll:
        call: googleapis.bigquery.v2.tabledata.insertAll
        args:
          datasetId: myDatasetId
          projectId: myProjectId
          tableId: myTableId
          body:
            "rows": [
              {
                json: should be the full "matomoData" response
              }
            ]
The response structure of the API I use is:
{
  "body": [
    {
      …
    },
    {
      …
    }
  ]
}
(which is an array that corresponds to several rows to insert)
It does not match the structure for inserting rows in BigQuery:
"rows": [
  {
    "json": …
  },
  {
    "json": …
  }
]
Do you have any idea how I can handle this?
While the Workflows syntax and standard library can perform simple data extraction and transformation, larger JSON transformations are likely unwieldy inside Workflows for now. I'd recommend using a Cloud Function with a JSON transformation library.
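Following the suggestion to move the transformation into a Cloud Function, here is a minimal sketch of the transformation itself, assuming the API response has the {"body": [...]} shape shown in the question. The function name and the optional key_field parameter are hypothetical; only the "rows", "json" and "insertId" field names come from the tabledata.insertAll request format.

```python
from __future__ import annotations

from typing import Any


def to_insert_all_body(api_response: dict[str, Any],
                       key_field: str | None = None) -> dict[str, Any]:
    """Wrap each element of api_response["body"] as one insertAll row.

    If key_field is given, that field's value becomes the row's insertId,
    which BigQuery uses for best-effort de-duplication of retried inserts.
    """
    rows = []
    for item in api_response.get("body", []):
        row: dict[str, Any] = {"json": item}
        if key_field is not None:
            row["insertId"] = str(item[key_field])
        rows.append(row)
    return {"rows": rows}
```

The returned dict can be passed as the body of the insertAll step (or sent with the BigQuery client library from inside the function).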

Is it possible to read google sheets *metadata* only with API key?

It is possible to read data from a sheet with only an API key (without OAuth 2.0), but reading the developer metadata seems to require OAuth 2.0.
Is there some way to read the metadata from an app without asking the user to connect their Google account?
You want to retrieve the developer metadata of the Spreadsheet using the API key.
You have already been able to get values from Spreadsheet using the API key.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Issue and workaround:
Unfortunately, "REST Resource: spreadsheets.developerMetadata" in the Sheets API cannot be used with an API key; as mentioned in your question, OAuth2 is required there. However, the developer metadata can also be retrieved with the spreadsheets.get method of the Sheets API, and that method works with an API key. It returns all developer metadata, so when you want to search for specific developer metadata, search within the retrieved results.
IMPORTANT POINTS:
In this case, please set the visibility of the developer metadata to DOCUMENT. With this setting, the developer metadata can be retrieved with the API key. If the visibility is PROJECT, it cannot be retrieved with the API key. Please be careful about this.
To retrieve the developer metadata with the API key, please share the Spreadsheet publicly. Otherwise it cannot be retrieved with the API key. Please be careful about this.
Sample situation 1:
As a sample situation, suppose that you create a new Spreadsheet and add developer metadata to it with the key "sampleKey" and the value "sampleValue".
In this case, the sample request body of spreadsheets.batchUpdate is as follows.
{
  "requests": [
    {
      "createDeveloperMetadata": {
        "developerMetadata": {
          "location": {
            "spreadsheet": true
          },
          "metadataKey": "sampleKey",
          "metadataValue": "sampleValue",
          "visibility": "DOCUMENT"
        }
      }
    }
  ]
}
Sample curl command:
To retrieve the developer metadata from the sample Spreadsheet above, use the following curl command.
curl "https://sheets.googleapis.com/v4/spreadsheets/### spreadsheetId ###?key=### your API key ###&fields=developerMetadata"
Here, fields=developerMetadata is used to make the response easier to read. Of course, you can also use * as fields.
Because this is a GET request, you can also open the endpoint above in a browser to see the retrieved value.
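The same request can be made from code. A minimal sketch using only the Python standard library, assuming a publicly shared Spreadsheet and placeholder spreadsheet ID and API key; the helper names are hypothetical. Because spreadsheets.get returns all spreadsheet-level developer metadata, find_metadata filters by key on the client side, as the answer suggests.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Any

SHEETS_ENDPOINT = "https://sheets.googleapis.com/v4/spreadsheets/"


def developer_metadata_url(spreadsheet_id: str, api_key: str,
                           fields: str = "developerMetadata") -> str:
    """Build the spreadsheets.get URL used in the curl command above."""
    query = urllib.parse.urlencode({"key": api_key, "fields": fields})
    return f"{SHEETS_ENDPOINT}{urllib.parse.quote(spreadsheet_id)}?{query}"


def fetch_json(url: str) -> dict[str, Any]:
    """Plain GET request; succeeds only if the Spreadsheet is shared publicly."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_metadata(response: dict[str, Any], key: str) -> list[dict[str, Any]]:
    """Keep only the developer metadata entries whose metadataKey matches key."""
    return [m for m in response.get("developerMetadata", [])
            if m.get("metadataKey") == key]
```

Usage: find_metadata(fetch_json(developer_metadata_url(spreadsheet_id, api_key)), "sampleKey").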
Result:
{
  "developerMetadata": [
    {
      "metadataId": 123456789,
      "metadataKey": "sampleKey",
      "metadataValue": "sampleValue",
      "location": {
        "locationType": "SPREADSHEET",
        "spreadsheet": true
      },
      "visibility": "DOCUMENT"
    }
  ]
}
Sample situation 2:
As another situation, suppose that you create a new Spreadsheet and add developer metadata to the first column (column "A") with the key "sampleKey" and the value "sampleValue".
In this case, the sample request body is as follows.
{
  "requests": [
    {
      "createDeveloperMetadata": {
        "developerMetadata": {
          "location": {
            "dimensionRange": {
              "sheetId": 0,
              "startIndex": 0,
              "endIndex": 1,
              "dimension": "COLUMNS"
            }
          },
          "metadataKey": "sampleKey",
          "metadataValue": "sampleValue",
          "visibility": "DOCUMENT"
        }
      }
    }
  ]
}
Sample curl command:
To retrieve the developer metadata from the sample Spreadsheet above, use the following curl command.
curl "https://sheets.googleapis.com/v4/spreadsheets/### spreadsheetId ###?key=### your API key ###&fields=sheets(data(columnMetadata(developerMetadata)))"
Here, fields=sheets(data(columnMetadata(developerMetadata))) is used to make the response easier to read. Of course, you can also use * as fields.
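Column-level metadata is nested several levels deep in this response. A minimal sketch for walking it, assuming the response was requested with the fields value above; the function name is hypothetical. It relies on columnMetadata having one entry per column (empty objects for columns without metadata) and on GridData's startColumn, which is omitted when it is 0.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_column_metadata(response: dict[str, Any]) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield (column index, developerMetadata entry) pairs from a
    spreadsheets.get response with column metadata included."""
    for sheet in response.get("sheets", []):
        for grid in sheet.get("data", []):
            start = grid.get("startColumn", 0)
            # One columnMetadata entry per column; empty ones have no metadata.
            for offset, column in enumerate(grid.get("columnMetadata", [])):
                for meta in column.get("developerMetadata", []):
                    yield start + offset, meta
```

As with the spreadsheet-level case, you would then filter the yielded entries by metadataKey.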
Result:
{
  "sheets": [
    {
      "data": [
        {
          "columnMetadata": [
            {
              "developerMetadata": [
                {
                  "metadataId": 123456789,
                  "metadataKey": "sampleKey",
                  "metadataValue": "sampleValue",
                  "location": {
                    "locationType": "COLUMN",
                    "dimensionRange": {
                      "dimension": "COLUMNS",
                      "startIndex": 0,
                      "endIndex": 1
                    }
                  },
                  "visibility": "DOCUMENT"
                }
              ]
            },
            {},
            …
          ]
        }
      ]
    }
  ]
}
References:
Method: spreadsheets.developerMetadata.get
DeveloperMetadataVisibility
If I misunderstood your question and this was not the direction you want, I apologize.

Google analytics API v4 max results

Can someone please help me with the Google Analytics API V4:
how do I pass the max-result parameter with this class:
Google_Service_AnalyticsReporting
I am unable to find a relevant function to assign the max-result parameter value.
Based on https://stackoverflow.com/a/38922925/1224827 , the parameter you're looking for is pageSize:
The correct name of the parameter you are looking for is: pageSize. The Reference Docs provide the full API specifications.
def get_report(analytics):
    # Use the Analytics Service Object to query the Analytics Reporting API V4.
    # VIEW_ID is your Analytics view ID, defined elsewhere.
    return analytics.reports().batchGet(
        body={
            'reportRequests': [
                {
                    'viewId': VIEW_ID,
                    'pageSize': 10000,  # v4 name for the row limit (v3: max-results)
                    'dateRanges': [{'startDate': '2016-04-01', 'endDate': '2016-08-09'}],
                    'dimensions': [{'name': 'ga:date'},
                                   {'name': 'ga:channelGrouping'}],
                    'metrics': [{'expression': 'ga:sessions'},
                                {'expression': 'ga:newUsers'},
                                {'expression': 'ga:goal15Completions'},
                                {'expression': 'ga:goal9Completions'},
                                {'expression': 'ga:goal10Completions'}]
                }]
        }
    ).execute()
Note: according to the documentation, the API returns a maximum of 100,000 rows per request, no matter how many you ask for. Since you attempted max_results, I assume you are migrating from the Core Reporting API V3; check out the Migration Guide - Pagination documentation to understand how to request the next 100,000 rows.
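The v4 pagination mentioned above uses pageSize and pageToken on the request and nextPageToken on each report. A minimal sketch, assuming a batch_get callable you supply (hypothetical, for example lambda body: analytics.reports().batchGet(body=body).execute() using the service object from the code above); the function name is also hypothetical. It keeps requesting pages until a report comes back without nextPageToken.

```python
from __future__ import annotations

import copy
from typing import Any, Callable, Iterator


def iter_report_rows(batch_get: Callable[[dict[str, Any]], dict[str, Any]],
                     report_request: dict[str, Any],
                     page_size: int = 10000) -> Iterator[dict[str, Any]]:
    """Yield every row of a single-report request, following nextPageToken."""
    request = copy.deepcopy(report_request)  # don't mutate the caller's dict
    request["pageSize"] = page_size
    while True:
        # Send a fresh copy each time so earlier bodies are not altered.
        response = batch_get({"reportRequests": [dict(request)]})
        report = response["reports"][0]
        yield from report.get("data", {}).get("rows", [])
        token = report.get("nextPageToken")
        if not token:
            break
        request["pageToken"] = token
```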
Stack Overflow extra tip: include your error responses in your question, as it will likely improve your chances of someone being able to help.
You can use parameter page_size: 10000. Hope this helps.
I checked these docs but couldn't find any example for max-result:
v3 doc https://developers.google.com/analytics/devguides/reporting/core/v3/reference#maxResults
v4 batchGet doc https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet
It would be great if someone could share a JSON example of max-result. I'm getting an error message when I add "start-index": 1 and "max-results": 10:
"Invalid JSON payload received. Unknown name \"start-index\" at 'report_requests[0]': Cannot find field.\nInvalid JSON payload received. Unknown name \"max-results\" at 'report_requests[0]': Cannot find field."
Here is my JSON
{
  "reportRequests": [
    {
      "viewId": "112211828",
      "dateRanges": [
        {
          "startDate": "30daysAgo",
          "endDate": "yesterday"
        }
      ],
      "metrics": [
        {
          "formattingType": "METRIC_TYPE_UNSPECIFIED",
          "expression": "ga:searchUniques"
        }
      ],
      "dimensions": [
        {
          "name": "ga:searchKeyword"
        }
      ],
      "orderBys": [
        {
          "orderType": "VALUE",
          "sortOrder": "DESCENDING",
          "fieldName": "ga:searchUniques"
        }
      ],
      "samplingLevel": "DEFAULT",
      "start-index": 1,
      "max-results": 10 // [Update] it should be "pageSize": 10
    }
  ]
}
[UPDATE]
"pageSize": 10 works instead of "max-results".

REST pattern create, update and delete same endpoint

I have a page where I list the books of a school. The user can update a book, add a new book or delete an existing book. All actions must be saved when the form is submitted.
How can I map a REST API for that? I could reuse the endpoints I already have:
UPDATE
PUT /schools/1/books
{
  "books": [
    {
      "id": "1",
      "name": "Book 1"
    }
  ]
}
CREATE
POST /schools/1/books
{
  "books": [
    {
      "name": "Book 2"
    },
    {
      "name": "Book 3"
    }
  ]
}
DELETE
DELETE /schools/1/books
{
  "books": [
    {
      "id": 2
    }
  ]
}
But I need everything to run in the same transaction, and it wouldn't make sense to submit three requests.
I also thought of creating a new endpoint that would create books that don't exist, update books that exist, and remove books that are not present in the request.
So if this school has Book 1 and Book 2, I could update Book 1, create New Book and remove Book 2 with:
PUT /schools/1/batch-books
{
  "books": [
    {
      "id": "1",
      "name": "Updated Book 1"
    },
    {
      "name": "New Book"
    }
  ]
}
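The server-side logic for such a batch-books endpoint boils down to diffing the payload against what is stored. A minimal sketch, with a hypothetical function name and the assumption that existing books are keyed by the same string ids used in the payload; the three resulting lists would then be applied inside one database transaction. Rejecting unknown ids (rather than silently creating them) is a design choice of this sketch.

```python
from __future__ import annotations

from typing import Any


def plan_book_sync(existing: dict[str, dict[str, Any]],
                   submitted: list[dict[str, Any]]) -> dict[str, list]:
    """Split a batch-books payload into create/update/delete lists.

    existing maps book id -> stored book; submitted is the "books" array.
    Books without an id are new; stored ids missing from the payload are deleted.
    """
    to_create: list[dict[str, Any]] = []
    to_update: list[dict[str, Any]] = []
    seen: set[str] = set()
    for book in submitted:
        book_id = book.get("id")
        if book_id is None:
            to_create.append(book)
        elif book_id in existing:
            seen.add(book_id)
            to_update.append(book)
        else:
            raise ValueError(f"unknown book id: {book_id}")
    to_delete = [book_id for book_id in existing if book_id not in seen]
    return {"create": to_create, "update": to_update, "delete": to_delete}
```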
Do you guys have other options?
I would separate things into different resources:
/books and /books/{id} for books. They give book details and allow managing them.
/schools and /schools/{id} for schools. They give school details and allow managing them.
/schools/{id}/books to associate books with schools, meaning the books that are available within a school. This resource provides methods to manage a list of links to books.
Let me detail the last resource. In fact, this is related to hypermedia. In the following, I'll use JSON-LD but you're free to use other hypermedia tools.
A GET method will return the list of associated books:
GET /schools/1/books
[
  {
    "@id": "http://api.example.com/books/1895638109"
  },
  {
    "@id": "http://api.example.com/books/8371023509"
  }
]
You can also implement mechanisms that let clients get more details when needed. Leveraging the Prefer header seems to be a great approach (see the link below for more details).
In addition, you could provide the following methods:
POST to add a link to the school. The request payload would be: {"@id": "http://api.example.com/books/1895638109"}. The response should be a 201 status code.
DELETE to delete a specific link from a school. A query parameter could be used to specify which link to remove.
PATCH to perform several operations in one call, which effectively provides batch processing. At this level you can leverage JSON-PATCH for the request, and the response can describe what happened. There is no specification for that response, so you're free to use whatever format you want. Here is a sample request payload (loosely modeled on JSON Patch; strict RFC 6902 requires every operation, including "add", to carry a "path" that is a JSON Pointer):
PATCH /schools/1/books/
[
  {
    "op": "add", "value": "http://api.example.com/books/1895638109"
  },
  {
    "op": "remove", "path": "http://api.example.com/books/8371023509"
  }
]
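On the server, such a PATCH can be applied all-or-nothing so the batch behaves like a single transaction. A minimal sketch for the payload format above (not strict RFC 6902), with a hypothetical function name and the assumption that a school's book links are stored as a list of URLs. It works on a copy and raises on the first invalid operation, so the stored list is only replaced when every operation succeeds.

```python
from __future__ import annotations

from typing import Any


def apply_link_patch(links: list[str], operations: list[dict[str, Any]]) -> list[str]:
    """Apply add/remove operations to a copy of links and return the new list.

    Raises ValueError on an unknown op or a missing link; the caller's list
    is never modified, so a failure leaves the stored state untouched.
    """
    result = list(links)
    for op in operations:
        kind = op.get("op")
        if kind == "add":
            if op["value"] not in result:  # adding an existing link is a no-op
                result.append(op["value"])
        elif kind == "remove":
            if op["path"] not in result:
                raise ValueError(f"link not found: {op['path']}")
            result.remove(op["path"])
        else:
            raise ValueError(f"unsupported op: {kind}")
    return result
```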
Reading the following links could give you some hints on how to design such a use case:
Implementing bulk updates within RESTful services: http://restlet.com/blog/2015/05/18/implementing-bulk-updates-within-restful-services/
On choosing a hypermedia type: http://sookocheff.com/post/api/on-choosing-a-hypermedia-format/
Creating Client-Optimized Resource Representations in APIs: http://www.freshblurbs.com/blog/2015/06/25/api-representations-prefer.html
Hope it helps you,
Thierry

RESTful API: Modelling a collection of resources that have access to another resource

I'm building a RESTful API that exposes my application's users as a users resource.
My application also features 'documents' and each user has access to specific documents. I'm thinking the natural way to represent that is by exposing the accessible documents through users/{user-id}/documents.
However, from a usability perspective, it's important for my clients to be able to fetch (and modify) the users that have access to a specific document. Because of that I'm considering 'reversing' this representation to documents/{document-id}/users.
Do these (and especially the latter) seem like proper ways to model this relationship? If I do go with such a solution, how do I model 'granting access to a document'?
I'm leaning towards PUTting a pre-existing user (presumably acquired by GETting users) into documents/{document-id}/users/{user-id}. That seems unsatisfactory, however, as I'd be doing an 'update' operation not to actually update the resource but to insert it into a collection. It is especially problematic semantically, because I expect my server side to ignore most of the sent user representation and only cross-reference the id with the ids of pre-existing users in order to create an association.
On the other hand, I can't POST into documents/{document-id}/users as I'm not aiming at the creation of a new resource - I specifically don't want one to be created.
Am I doing it wrong?
The users don't really belong to the document resource, right? What you're really saying is these users have access to this document. So what should probably be returned from /documents/{document-id}/users is not a direct representation of the user entity, but instead some kind of representation of the user's permission to the entity. Perhaps inside of that representation is a link to the full user itself.
So, if you were returning the Collection+JSON media type, maybe you'd have something like:
{
  "collection": {
    "version": "1.0",
    "href": "/documents/document123",
    "items": [
      {
        "href": "/documents/document123/users/user3841",
        "data": [
          { "name": "userName", "value": "John Doe", "prompt": "User Name" },
          { "name": "permissions", "value": ["Read"], "prompt": "User Permissions" }
        ],
        "links": [
          { "rel": "user", "href": "/users/3841" }
        ]
      },
      {
        "href": "/documents/document123/users/user9387",
        "data": [
          { "name": "userName", "value": "John Doe", "prompt": "User Name" },
          { "name": "permissions", "value": ["Read"], "prompt": "User Permissions" }
        ],
        "links": [
          { "rel": "user", "href": "/users/9387" }
        ]
      }
    ]
  }
}
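A minimal sketch of how a server might build this representation, reusing the URL patterns from the example above; the function names and the flat argument list are hypothetical. Each item describes the user's permission on the document and links to the full user resource, rather than embedding the user itself.

```python
from __future__ import annotations

from typing import Any


def permission_item(document_id: str, user_id: str, user_name: str,
                    permissions: list[str]) -> dict[str, Any]:
    """Build one Collection+JSON item describing a user's access to a document."""
    return {
        "href": f"/documents/{document_id}/users/user{user_id}",
        "data": [
            {"name": "userName", "value": user_name, "prompt": "User Name"},
            {"name": "permissions", "value": permissions, "prompt": "User Permissions"},
        ],
        # Link to the full user resource instead of embedding it.
        "links": [{"rel": "user", "href": f"/users/{user_id}"}],
    }


def permission_collection(document_id: str, items: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap permission items in a Collection+JSON envelope for one document."""
    return {"collection": {"version": "1.0",
                           "href": f"/documents/{document_id}",
                           "items": items}}
```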