How to Get Data Location Info of a BigQuery Dataset in a Script - google-bigquery

We are aware that when using the bq mk command to create a dataset in BigQuery, we can use the --data_location flag to specify which region the table data under this dataset should be located in.
We now want to set up a monitor so that whenever someone creates a dataset outside of our designated location, we can trigger an alert to the dataset owner. To do this, we'll need a script that can automatically scan through all the datasets and get their location information. We looked at both the API calls and the bq command-line tool, but found no way to show or query the data location of a dataset. Is there a way to accomplish our goal?

To get all your datasets in the current project:
bq ls -d --format=json
If you run
bq show --format=json <dataset_name>
you get back a JSON that contains the location key:
{
  "kind": "bigquery#dataset",
  "datasetReference": {
    "projectId": "<edited>",
    "datasetId": "wr_temp"
  },
  "creationTime": "1479393712602",
  "access": [
    {
      "specialGroup": "projectWriters",
      "role": "WRITER"
    },
    {
      "specialGroup": "projectOwners",
      "role": "OWNER"
    },
    {
      "role": "OWNER",
      "userByEmail": "<edited>"
    },
    {
      "specialGroup": "projectReaders",
      "role": "READER"
    }
  ],
  "defaultTableExpirationMs": "604800000",
  "etag": "<edited>",
  "location": "US",
  "lastModifiedTime": "1479393712602",
  "id": "<edited>",
  "selfLink": "https://www.googleapis.com/bigquery/v2/projects/<edited>"
}
Also, regarding the API: if you call the dataset's GET method, you get back the same JSON. https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/get#try-it
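Putting those two bq commands together is enough for the monitoring script mentioned in the question. Here is a minimal sketch; jq and the EXPECTED_LOCATION value are assumptions on my side, not something bq requires:
#!/usr/bin/env bash
# Sketch: flag datasets whose location differs from the designated one.
# Assumes the bq CLI is authenticated against the right project and jq is installed.
EXPECTED_LOCATION="US"   # your designated location (placeholder)

for ds in $(bq ls -d --format=json | jq -r '.[].datasetReference.datasetId'); do
  location=$(bq show --format=json "$ds" | jq -r '.location')
  if [ "$location" != "$EXPECTED_LOCATION" ]; then
    echo "ALERT: dataset $ds is in $location (expected $EXPECTED_LOCATION)"
  fi
done
The owner's e-mail can be taken from the access array of the same bq show output (the entry with "role":"OWNER" and a userByEmail key) if you want to address the alert to the dataset owner.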

We looked at both the API calls and the bq command-line tool, but found no way to show or query the data location of a dataset. Is there a way to accomplish our goal?
You can use the API to accomplish this:
With the Datasets: list API you can list all datasets in the specified project.
Then, with the Datasets: get API, you can retrieve the dataset specified by datasetId and check its location property.
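The same check can be scripted directly against the REST API. A rough sketch with curl and jq; MY_PROJECT is a placeholder and the access token is taken from gcloud, both assumptions on my side:
#!/usr/bin/env bash
# Sketch: Datasets: list to enumerate dataset IDs, then Datasets: get to read location.
PROJECT="MY_PROJECT"                      # placeholder project ID
TOKEN=$(gcloud auth print-access-token)   # OAuth token for the API calls
BASE="https://www.googleapis.com/bigquery/v2/projects/$PROJECT/datasets"

for ds in $(curl -s -H "Authorization: Bearer $TOKEN" "$BASE" \
              | jq -r '.datasets[].datasetReference.datasetId'); do
  curl -s -H "Authorization: Bearer $TOKEN" "$BASE/$ds" \
    | jq -r '"\(.datasetReference.datasetId): \(.location)"'
done
Note that Datasets: list is paginated, so a production script would also have to follow nextPageToken.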

Related

Export Data from Branch

I need to download data from a particular data source with the required rows only.
But when I run this request, it returns the entire data source with all columns in the response.
{
  "branch_key": "MY_BRANCH_KEY",
  "branch_secret": "MY_BRANCH_SECRET_KEY",
  "export_date": "2018-12-02",
  "custom_data": "eo_custom_event",
  "dimensions": [
    "last_attributed_touch_data_tilde_feature",
    "last_attributed_touch_data_tilde_channel",
    "last_attributed_touch_data_tilde_campaign",
    "last_attributed_touch_data_plus_current_feature"
  ]
}
The Data Export API does not have an option to retrieve data based on dimensions and filters. You can check out our Query API which will let you add dimensions, filters, data sources, and aggregation rules. We also have a Query Recipe Book to help you build queries.

How to get all Storage ID to Authorize with VM ID..?

I want to authorize a storage volume with VMs. For that I need all the VM IDs for a storage volume, which I get using the following call:
https://[username]:[apikey]@api.softlayer.com/rest/v3/SoftLayer_Network_Storage_Iscsi/9653497/getAllowableVirtualGuests?objectMask=mask[id,fullyQualifiedDomainName]
This gives me all the VM IDs corresponding to 9653497 (the storage/order ID). However, I need all those storage IDs (like 9653497) that are not assigned to any VM ID. I am using the call below to get all storage IDs:
https://[username]:[apikey]@api.softlayer.com/rest/v3/SoftLayer_Account/getNetworkStorage?objectMask=mask[id,username,nasType,storageType, billingItem[description,location[id,longName]]]&objectFilter={"networkStorage":{"nasType":{"operation":"ISCSI"},"billingItem":{"description":{"operation":"Endurance Storage"}}}}
The data that you are using in the filter is probably wrong. Try calling the getObject method, GET /SoftLayer_Network_Storage/9653497/getObject?objectMask=mask[nasType,billingItem[description]], and see whether the values in the response are the same as those in your objectFilter.
The filter in your request gets Block Storage ("nasType":{"operation":"ISCSI"}); maybe you need File Storage. We can remove it to get more "Endurance" items (Block and File).
Please try the following, removing some filters:
https://[username]:[apikey]@api.softlayer.com/rest/v3/SoftLayer_Account/getNetworkStorage?objectMask=mask[id,username,nasType,storageType, billingItem[description,location[id,longName]]]&objectFilter={ "networkStorage": { "billingItem": { "description": { "operation": "Endurance Storage" } } } }
Method: GET
If we don't want only Endurance items, we can remove that filter too.
However, when trying to add properties such as allowableVirtualGuests to the SoftLayer_Account::getNetworkStorage objectMask, that property is not present on SoftLayer_Network_Storage.
For that reason, the only way to get the allowable virtual guests is to call SoftLayer_Network_Storage::getAllowableVirtualGuests for each storage volume.
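Putting the two calls together in one small script is then a way to find the storage volumes with no virtual guests returned. A rough sketch; jq is an assumption, and username/apikey are placeholders for your credentials:
#!/usr/bin/env bash
# Sketch: list Endurance storage IDs, then query each volume's allowable virtual guests
# via SoftLayer_Network_Storage::getAllowableVirtualGuests.
API="https://username:apikey@api.softlayer.com/rest/v3"   # replace with your credentials
FILTER='{"networkStorage":{"billingItem":{"description":{"operation":"Endurance Storage"}}}}'

storage_ids=$(curl -s -G "$API/SoftLayer_Account/getNetworkStorage" \
                --data-urlencode "objectMask=mask[id]" \
                --data-urlencode "objectFilter=$FILTER" | jq -r '.[].id')

for sid in $storage_ids; do
  count=$(curl -s -G "$API/SoftLayer_Network_Storage/$sid/getAllowableVirtualGuests" \
            --data-urlencode "objectMask=mask[id]" | jq 'length')
  if [ "$count" -eq 0 ]; then
    echo "Storage $sid has no allowable virtual guests"
  fi
done
Whether getAllowableVirtualGuests is the right relational call depends on whether you want the guests that could be authorized or the ones that already are; the loop itself is the same either way.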

Jenkins API: Get a list of jobs filtered by build parameter - What jobs have built this Git commit?

We are sending different parameters to our Jenkins jobs, among them the Git commit SHA1. We want to get a list of jobs that used a given parameter value (the Git SHA1: which jobs ran this commit?).
The following URL will give us all builds:
http://jenkins.example.com/api/json?tree=jobs[name,builds[number,actions[parameters[name,value]]]]&pretty=true
It takes some time to render (6 seconds) and contains too many builds (5 MB of builds).
Sample output from that URL:
{
  "jobs" : [
    {
      "name" : "Job name - Build",
      "builds" : [
        {
          "actions" : [
            {
              "parameters" : [
                {
                  "name" : "GIT_COMMIT_PARAM",
                  "value" : "5447e2f43ea44eb4168d6b32e1a7487a3fdf237f"
                }
              ]
            },
            (...)
How can we use the Jenkins JSON API to list all jobs with a certain build parameter value?
I've also been looking for this, and luckily I found a helpful gist:
https://gist.github.com/justlaputa/5634984
To answer your question:
jenkins_url + /api/json?tree=jobs[name,color]
Using your example from above
http://jenkins.example.com/api/json?tree=jobs[name,color]
So it seems like all you need to do is remove the builds parameter from your original URL, and you should be fine.
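If you do still need to filter by the parameter value, one workaround with the JSON API is to fetch the full tree from your original URL and filter it client-side. A rough sketch with jq (jq is an assumption here, not part of Jenkins):
# Keep only jobs where at least one build ran with the given GIT_COMMIT_PARAM value.
curl -s 'http://jenkins.example.com/api/json?tree=jobs[name,builds[actions[parameters[name,value]]]]' \
  | jq -r --arg sha 5447e2f43ea44eb4168d6b32e1a7487a3fdf237f '
      .jobs[]
      | select([.builds[]?.actions[]?.parameters[]?
                | select(.name == "GIT_COMMIT_PARAM" and .value == $sha)]
               | length > 0)
      | .name'
This does not reduce the 5 MB response, but it moves the filtering out of your own parsing code.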
How can we use the Jenkins JSON API to list all jobs with a certain build parameter value?
Not sure about JSON API, but you can use XML API and combine tree and xpath parameters:
http://jenkins_url/api/xml?tree=jobs[name,builds[actions[parameters[name,value]]]]&xpath=/hudson/job[build/action/parameter[name="GIT_COMMIT_PARAM"][value="5447e2f43ea44eb4168d6b32e1a7487a3fdf237f"]]/name&wrapper=job_names&pretty=true
Result sample:
<job_names>
<name>JOB1</name>
<name>JOB2</name>
<name>JOB3</name>
...
</job_names>
Note: a job appears in this list if at least one of its builds was run with the desired parameter.
It looks like it isn't supported in the JSON API; however, if you can use the XML API, it is possible to query via XPath, see the sample below:
http://jenkins.example.com/api/xml?tree=jobs[name,builds[number,actions[parameters[name,value]]]]&exclude=hudson/job/build/action/parameter[value!=%275447e2f43ea44eb4168d6b32e1a7487a3fdf237f%27]
You can tune the query string to better fit your needs.
credit to http://blog.dahanne.net/2014/04/02/using-jenkins-hudson-remote-api-to-check-jobs-status/
Here's the query for passing jobs only:
http://jenkinsURL/job/ProjectFolderName/api/xml?tree=jobs[name,color=blue]
Here's the query for unstable (yellow) jobs only:
http://jenkinsURL/job/ProjectFolderName/api/xml?tree=jobs[name,color=yellow]

Wsapi data store query

I am looking to get all projects under a selected project (i.e. the entire child project branch) using a WsapiDataStore query in Rally SDK 2.0rc1. Is it possible to recursively get all child project names with a query, or will I have to write a separate recursive function to get that information? If a separate recursive function is required, how should I populate that data into, for example, a combo box? Do I need to create a separate data store, push the data from my recursive function into it, and then link the combo box's store to it?
Also, how do I get the "current workspace name" (the workspace I am working in, inside Rally) in Rally SDK 2.0rc1?
Use the 'context' config option to specify which project level to start at and add 'projectScopeDown' to make sure child projects are returned. That would look something like this:
Ext.create('Rally.data.WsapiDataStore', {
    limit: Infinity,
    model: 'Project',
    fetch: ['Name', 'ObjectID'],
    context: {
        project: '/project/' + PROJECT_OID,
        projectScopeDown: true
    }
}).load({
    callback: function(store) {
        // Use project store data here
    }
});
To get your current context data, use: this.getContext().
var workspace = this.getContext().getWorkspace();
var project = this.getContext().getProject();
If you log this.getContext().getWorkspace() and this.getContext().getProject() with console.log, you will get a better idea of what is returned and what is required. In one of my cases I had to use this.getContext().getProject().project.
Using console debug statements is the best way to figure out what you need based on how it is used.

Delete a field and its contents in all the records and recreate it with new mapping

I have a field, field10, which was created by accident when I updated a particular record in my index. I want to remove this field and all its contents from my index and recreate it with the mapping below:
"mytype":{
"properties":{
"field10":{
"type":"string",
"index":"not_analyzed",
"include_in_all":"false",
"null_value":"null"
}
}
}
When I try to create this mapping using the Put Mapping API, I get an error: {"error":"MergeMappingException[Merge failed with failures {[mapper [field10] has different index values, mapper [field10] has different index_analyzer, mapper [field10] has different search_analyzer]}]","status":400}.
How do I change the mapping of this field? I don't want to reindex millions of records just for this small accident.
Thanks
AFAIK, you can't remove a single field and recreate it.
Nor can you just modify a mapping and have everything reindexed automagically. Imagine that you don't store _source: how could Elasticsearch know what your data looked like before it was indexed?
But you can probably modify your mapping using a multi-field, with field10.field10 keeping the old mapping and field10.new using the new analyzer.
If you don't reindex, only new documents will have content in field10.new.
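As a rough illustration of that multi-field idea, the put-mapping call might look something like the following. This is only a sketch: myindex is a placeholder, the existing field10 definition must stay exactly as it is, and the sub-field ("fields") syntax differs between Elasticsearch versions:
# Sketch: add a not_analyzed sub-field "new" next to the existing field10 mapping.
curl -XPUT 'localhost:9200/myindex/mytype/_mapping' -d '{
  "mytype": {
    "properties": {
      "field10": {
        "type": "string",
        "fields": {
          "new": {
            "type": "string",
            "index": "not_analyzed",
            "include_in_all": false,
            "null_value": "null"
          }
        }
      }
    }
  }
}'
Queries and aggregations that need the not_analyzed value would then target field10.new, while field10 itself stays untouched.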
If you want to handle old documents, you have to either:
send all your docs again (this will update everything), i.e. reindex (you can use the scan & scroll API to retrieve your old documents), or
try to update your docs with the Update API.
You can probably try to run a query like:
curl -XPOST localhost:9200/crunchbase/person/1/_update -d '{
"script" : "ctx._source.field10 = ctx._source.field10"
}'
But, as you can see, you have to run it document by document, and I think it will take more time than reindexing everything with the Bulk API.
Does it help?