Is it possible to retrieve an extended or full query history in google bigquery? - google-bigquery

I recently discovered I accidentally deleted a table from BigQuery, which was constructed by querying other tables (which I still have).
As table deletes in BigQuery are permanent (right?), I would like to reconstruct the lost table, preferably without rewriting the query.
The query history in the web UI obviously only displays a limited number of queries. The one I am looking for is unfortunately not in that list.
So my question is, is it possible to somehow recover the queries which have disappeared from the query history?
(I do know the creation date of the query I am looking for)

In the CLI, you can run bq ls -j -a to retrieve jobs for all users in a project.
Then, for each job ID, you can run bq show -j <job_id>; to get more details, use the JSON response:
bq show --format=prettyjson -j job_joQEqPwOiOoBlOhDBEgKxQAlKJQ
This returns the following format, which contains your query, your user, bytes processed, etc.:
{
  "configuration": {
    "dryRun": false,
    "query": {
      "createDisposition": "CREATE_IF_NEEDED",
      "destinationTable": {
        "datasetId": "",
        "projectId": "",
        "tableId": ""
      },
      "query": "",
      "writeDisposition": "WRITE_TRUNCATE"
    }
  },
  "etag": "",
  "id": "",
  "jobReference": {
    "jobId": "",
    "projectId": ""
  },
  "kind": "bigquery#job",
  "selfLink": "",
  "statistics": {
    "creationTime": "1435006022346",
    "endTime": "1435006144730",
    "query": {
      "cacheHit": false,
      "totalBytesProcessed": "105922683030"
    },
    "startTime": "1435006023171",
    "totalBytesProcessed": "105922683030"
  },
  "status": {
    "state": "DONE"
  },
  "user_email": ""
}
Using the API, you need to pass the allUsers property to list jobs from all users: https://cloud.google.com/bigquery/docs/reference/v2/jobs/list#allUsers
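If you prefer the API over the CLI, here is a minimal sketch of the same listing with the google-cloud-bigquery Python client (not part of the original answer; the project name is a placeholder):

# Sketch: list query jobs for all users in a project and print their SQL.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Equivalent of `bq ls -j -a`: list jobs for all users.
for job in client.list_jobs(all_users=True, max_results=1000):
    if job.job_type == "query":
        print(job.job_id, job.created, job.user_email)
        print(job.query)  # the SQL text you are trying to recover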

Using the JobID, you can query for a specific job (documented here). This will give you a Jobs resource, which will contain your query.
If you don't know the JobID... it depends on how the query was run, I assume. It may have been logged by App Engine (if you ran it via code) in the Logs section of the Developer Console. You could also take a look at the Jobs List (credit to the OP for that one) and look there for your recent jobs. The list gives you Jobs resources as well, so they will contain all you need.
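As an illustration (not part of the original answer), the same jobs.get lookup with the google-cloud-bigquery Python client might look like this, reusing the example job ID from above:

# Sketch: fetch one job by ID and read back its SQL.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID
job = client.get_job("job_joQEqPwOiOoBlOhDBEgKxQAlKJQ")  # calls jobs.get
print(job.query)        # the original SQL text
print(job.destination)  # destination table of the results, if any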

This can be done using Stackdriver audit logs. Here is more info.
Even if you haven't set up Stackdriver logging, you can still try to find your query. BigQuery audit logs are enabled by default, and you can retrieve them from Stackdriver for the last 30 days.
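As a rough, untested sketch of that approach with the google-cloud-logging Python client (the project ID is a placeholder, and the exact payload layout depends on which audit log format your project emits):

# Sketch: pull BigQuery job-completed audit log entries from Stackdriver / Cloud Logging.
from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project ID
log_filter = (
    'resource.type="bigquery_resource" '
    'AND protoPayload.methodName="jobservice.jobcompleted"'
)
for entry in client.list_entries(filter_=log_filter):
    # The payload contains the completed job's configuration, including the query text.
    print(entry.timestamp, entry.payload)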

Related

How to filter response with query parameters on POST methods on Microsoft Graph API?

I am attempting to make a simple room booking application within my office. Users can select a time frame, see the available rooms, and book the room (create an event in their calendar in that time frame in that room).
In order to see what rooms are available, I am attempting to use the Microsoft Graph REST API, and specifically the POST method - getSchedule.
An example request for getSchedule looks like this:
{
  "schedules": ["adelev@contoso.onmicrosoft.com", "meganb@contoso.onmicrosoft.com"],
  "startTime": {
    "dateTime": "2019-03-15T09:00:00",
    "timeZone": "Pacific Standard Time"
  },
  "endTime": {
    "dateTime": "2019-03-15T18:00:00",
    "timeZone": "Pacific Standard Time"
  },
  "availabilityViewInterval": "60"
}
I place all of the rooms in the office in the schedules list, and then can see their availabilities in the response based on the availability view.
"#odata.context": "https://graph.microsoft.com/v1.0/$metadata#Collection(microsoft.graph.scheduleInformation)",
"value": [
{
"scheduleId": "adelev#contoso.onmicrosoft.com",
"availabilityView": "000220000",
"scheduleItems": [
{
"isPrivate": false,
"status": "busy",
"subject": "Let's go for lunch",
"location": "Harry's Bar",
"start": {
"dateTime": "2019-03-15T12:00:00.0000000",
"timeZone": "Pacific Standard Time"
},
"end": {
"dateTime": "2019-03-15T14:00:00.0000000",
"timeZone": "Pacific Standard Time"
}
}
],
"workingHours": {
"daysOfWeek": [
"monday",
"tuesday",
"wednesday",
"thursday",
"friday"
],
"startTime": "08:00:00.0000000",
"endTime": "17:00:00.0000000",
"timeZone": {
"name": "Pacific Standard Time"
}
}
},
However, I don't need any of the other information provided in the response. I only want to see the scheduleId and the availabilityView, because the response takes forever to load with many rooms in the schedules request.
I've been looking at the available ways to filter a response through parameters in the POST request at: https://learn.microsoft.com/en-us/graph/query-parameters. However, none of the filters I apply to the URL seem to have any effect on the response.
I've tried
https://graph.microsoft.com/v1.0/me/calendar/getschedule?$select=availabilityView
for the request and other similar variants without any success. They all return the full JSON response.
It is an OData protocol limitation. Query options are only supported on GET requests, as documented here.
Besides asking for fewer rooms to begin with, a shorter period, or a bigger interval, I don't think there is a way to get less data today.
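Until that changes, the practical workaround is to trim the response client-side after it arrives. A minimal sketch, assuming a valid access token and placeholder room addresses:

# Sketch: call getSchedule, then keep only scheduleId and availabilityView.
import requests

url = "https://graph.microsoft.com/v1.0/me/calendar/getSchedule"
headers = {"Authorization": "Bearer <ACCESS_TOKEN>", "Content-Type": "application/json"}
body = {
    "schedules": ["room1@contoso.onmicrosoft.com", "room2@contoso.onmicrosoft.com"],
    "startTime": {"dateTime": "2019-03-15T09:00:00", "timeZone": "Pacific Standard Time"},
    "endTime": {"dateTime": "2019-03-15T18:00:00", "timeZone": "Pacific Standard Time"},
    "availabilityViewInterval": "60",
}

resp = requests.post(url, headers=headers, json=body)
resp.raise_for_status()

# Discard everything except the two fields we care about.
slim = [
    {"scheduleId": s["scheduleId"], "availabilityView": s["availabilityView"]}
    for s in resp.json()["value"]
]
print(slim)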

Revoking default permission fails in new Office 365 Group

After a file is uploaded to an Office 365 Group’s OneDrive using the Graph API, we want to revoke the default permissions on the file. However, in groups which have only recently been created, this fails.
By default, a file's permissions are “GroupName Owners”, “GroupName Members” and “GroupName Visitors”. We want to remove these permissions and grant access to specific AD Security Groups.
After uploading a file we are seeing two different results when getting the default permissions (in preparation to delete them).
In one case, we get four permissions – the three listed above, plus a ‘special’ permission which has grantedTo with a user with id matching the group id. We have learned not to delete this permission, as we lose the ability to delete the other permissions.
Here, this ‘special’ permission is the first one listed:
{
  "@odata.context": "https://graph.microsoft.com/V1.0/$metadata#drives('b%21Y25ow5oitkOvNToutf7LrYZ-y78P2jBEjoGLzb3oPqnw0a3YKFDwTobjTB4gYxKt')/root/permissions",
  "value": [
    {
      "grantedTo": {
        "user": {
          "id": "273c2c33-8533-445d-ae65-4b63be296995",
          "displayName": "SharePoint Tests"
        }
      },
      "id": "Yzowby5jfGZlZGVyYXRlZGRpcmVjdG9yeWNsYWltcHJvdmlkZXJ8MjczYzJjMzMtODUzMy00NDVkLWFlNjUtNGI2M2JlMjk2OTk1X28",
      "roles": [
        "write"
      ]
    },
    {
      "grantedTo": {
        "user": {
          "displayName": "SharePoint Tests Owners"
        }
      },
      "id": "U2hhcmVQb2ludCBUZXN0cyBPd25lcnM",
      "roles": [
        "SP.Full Control",
        "write"
      ]
    },
    {
      "grantedTo": {
        "user": {
          "displayName": "SharePoint Tests Visitors"
        }
      },
      "id": "U2hhcmVQb2ludCBUZXN0cyBWaXNpdG9ycw",
      "roles": [
        "read"
      ]
    },
    {
      "grantedTo": {
        "user": {
          "displayName": "SharePoint Tests Members"
        }
      },
      "id": "U2hhcmVQb2ludCBUZXN0cyBNZW1iZXJz",
      "roles": [
        "SP.Edit"
      ]
    }
  ]
}
However, for a period after the group has been created, after uploading a file we only get 3 permissions back – the special one mentioned above is missing. In this case, trying to delete the other permissions fails with an ‘unauthenticated’ error code. E.g.
DELETE https://graph.microsoft.com/V1.0/drives/b!zn7l0OHTmUa3lGABIbIGQIZ-y78P2jBEjoGLzb3oPqnw0a3YKFDwTobjTB4gYxKt/items/013LUA5IQEPURED3OSURAI27FBHDYLFQJP/permissions/U2FnZSAtIFBBUiBTZWN1cml0eSA0IE93bmVycw
We can still add permissions, just not revoke the default ones.
This condition seems to persist for all files created within a given Office 365 Unified Group until several minutes after it has been created.
Our only option at the moment looks to be to create a dummy file and see if we get 3 or 4 permissions back (or just try deleting the default permissions). If we only get 3, try again after some time. But this seems like a fragile hack, and adds significant time (several minutes) to our upload process.
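For reference, a rough sketch of that polling workaround in Python (tokens and IDs are placeholders; waiting for the fourth permission is just the heuristic described above):

# Sketch: poll a probe file's permissions until the 'special' fourth entry shows up.
import time
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
headers = {"Authorization": "Bearer <ACCESS_TOKEN>"}
drive_id, item_id = "<driveId>", "<itemId>"  # the probe (dummy) file

def permission_count():
    r = requests.get(f"{GRAPH}/drives/{drive_id}/items/{item_id}/permissions", headers=headers)
    r.raise_for_status()
    return len(r.json()["value"])

# Poll for up to ~10 minutes; new groups seem to need several minutes to settle.
for _ in range(60):
    if permission_count() >= 4:
        break  # the default permissions should now be deletable
    time.sleep(10)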
Does anyone have any better suggestions, or an explanation of this behaviour?
Thanks
Peter, Group files are stored in a SharePoint document library, and hence the permissions (owners & members) are inherited from Azure AD and cannot be changed; see this documentation for more information: https://support.office.com/en-us/article/Learn-about-Office-365-groups-b565caa1-5c40-40ef-9915-60fdb2d97fa2?ui=en-US&rs=en-US&ad=US&fromAR=1
You can't break inheritance. Please also see these additional features we are rolling out specific to the SharePoint document library: https://techcommunity.microsoft.com/t5/SharePoint/UPDATE-Create-Office-365-Groups-with-team-sites-from-SharePoint/m-p/48277

logging all BigQuery queries

Is it possible to have all BigQuery requests logged to a file in Cloud Storage (or, even better, into a BigQuery table)? It seems like the --apilog option available in bq is intended mainly for debugging purposes, but what I'd like to do is keep track of all queries, just like logging all access requests on a particular file in Cloud Storage.
To be more specific, I don't just want to log my own queries, but (a) queries by all users within the same project, and optimally also (b) queries by anyone touching a table in a dataset that I own.
I know it's late, but GCP in its latest releases introduced this new feature of audit logs.
Refer to this: Audit Logs BQ
In the CLI, you can run bq ls -j -a to retrieve jobs for all users in a project. You can redirect all output to a storage file.
Then, for each job ID, you can run bq show -j <job_id>; to get more details, use the JSON response:
bq show --format=prettyjson -j job_joQEqPwOiOoBlOhDBEgKxQAlKJQ
This returns the following format, which contains your query, your user, bytes processed, etc.:
{
  "configuration": {
    "dryRun": false,
    "query": {
      "createDisposition": "CREATE_IF_NEEDED",
      "destinationTable": {
        "datasetId": "",
        "projectId": "",
        "tableId": ""
      },
      "query": "",
      "writeDisposition": "WRITE_TRUNCATE"
    }
  },
  "etag": "",
  "id": "",
  "jobReference": {
    "jobId": "",
    "projectId": ""
  },
  "kind": "bigquery#job",
  "selfLink": "",
  "statistics": {
    "creationTime": "1435006022346",
    "endTime": "1435006144730",
    "query": {
      "cacheHit": false,
      "totalBytesProcessed": "105922683030"
    },
    "startTime": "1435006023171",
    "totalBytesProcessed": "105922683030"
  },
  "status": {
    "state": "DONE"
  },
  "user_email": ""
}
Using the API, you need to pass the allUsers property to list jobs from all users: https://cloud.google.com/bigquery/docs/reference/v2/jobs/list#allUsers
There's a better way to do this now with the INFORMATION_SCHEMA tables.
Here's a simple way to get all queries from a project in the last 90 days:
SELECT
job_id,
start_time,
user_email,
total_bytes_processed,
query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
AND CURRENT_TIMESTAMP()
AND job_type = "QUERY"
AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY) AND CURRENT_TIMESTAMP()
ORDER BY total_bytes_processed DESC
Full documentation can be found here: https://cloud.google.com/bigquery/docs/information-schema-jobs
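If you want that history to survive beyond the retention window, here is a hedged sketch with the google-cloud-bigquery Python client that appends the last day of job metadata to an illustrative audit.query_history table (the dataset must already exist):

# Sketch: copy recent job metadata from INFORMATION_SCHEMA into a logging table.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID
job_config = bigquery.QueryJobConfig(
    destination="my-project.audit.query_history",  # placeholder destination table
    write_disposition="WRITE_APPEND",
)
sql = """
SELECT job_id, creation_time, user_email, total_bytes_processed, query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = "QUERY"
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
"""
client.query(sql, job_config=job_config).result()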
BigQuery has the INFORMATION_SCHEMA.JOBS_BY_* views to retrieve real-time metadata about BigQuery jobs. These views contain currently running jobs, as well as the history of completed jobs from the last 180 days.
For more info see Getting jobs metadata using INFORMATION_SCHEMA

AWS data pipeline activity with multiple inputs

As part of an Amazon AWS data pipeline, I have a hive activity using two unstaged S3 data nodes as input. What I want is to be able to set two script variables on the activity, each pointing to an input data node, but I can't get the syntax right. With the single input, I could write the following and it would work just fine:
INPUT_FOO=#{input.directoryPath}
When I add the second input, I run into a problem of how to reference them since they are now an array of inputs, as you can see in the pipeline definition below. Essentially, I want to achieve the following, but can't figure out the correct syntax:
INPUT_FOO=#{input[1].directoryPath}
INPUT_BAR=#{input[2].directoryPath}
Here's the activity portion of the pipeline definition:
{
  "id": "ActivityId_7u1sR",
  "input": [
    {
      "ref": "DataNodeId_iYnxf"
    },
    {
      "ref": "DataNodeId_162Ka"
    }
  ],
  "schedule": {
    "ref": "DefaultSchedule"
  },
  "scriptUri": "#{myS3ScriptLocation}calculate-results.q",
  "name": "Perform Calculations",
  "runsOn": {
    "ref": "EmrClusterId_jHeiV"
  },
  "scriptVariable": [
    "INPUT_SOURCE1=#{input[1].directoryPath}",
    "OUTPUT=#{output.directoryPath}Results/",
    "INPUT_SOURCE2=#{input[2].directoryPath}"
  ],
  "output": {
    "ref": "DataNodeId_2jY6v"
  },
  "type": "HiveActivity",
  "stage": "false"
}
I plan to keep the tables unstaged and take care of table creation in the hive script so that it's easier to run each Hive activity in isolation as well as in the pipeline itself.
Here's the error I see when using array syntax:
Unable to resolve input[1].directoryPath for object ActivityId_7u1sR'
As it stands now, this scenario is not supported, but a feature request was added to support it in the future.

Is there any work around on fetching twitter conversations using latest Twitter REST API v1.1

I am working on a project where the conversation of a Twitter user needs to be retrieved. For example, I want to get all the replies to this tweet by BBC World Service. Using the REST API v1.1, I can get the timeline (tweets, re-tweets) of a Twitter user. But I did not find any documentation or working workaround for fetching the replies to a specific tweet. Is there any workaround for getting the replies to a specific tweet at all?
There is no API call to get replies to a specific tweet. You can, however, cheat!
Using the Search API you can construct a search query which is:
In reply to #bbcworldservice.
Occurred after the tweet was posted.
Optionally, before a specific date / time.
So, in this case, something like
https://api.twitter.com/1.1/search/tweets.json?
q=%23bbcworldservice&
since_id=489366839953489920&
count=100
You'll get a list of Tweets (up to 100). You will then need to search them for in_reply_to_status_id_str and see if it matches the status you're looking for.
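For illustration only (not from the original answer), the filtering step with the Tweepy library might look like this; the credentials are placeholders, and the search query uses the to: operator rather than a hashtag:

# Sketch: search recent tweets directed at the account, keep only true replies.
import tweepy

auth = tweepy.OAuth1UserHandler("<KEY>", "<SECRET>", "<TOKEN>", "<TOKEN_SECRET>")
api = tweepy.API(auth)
target_id = "489366839953489920"  # the tweet whose replies we want

replies = [
    status
    for status in api.search_tweets(q="to:bbcworldservice", since_id=target_id, count=100)
    if status.in_reply_to_status_id_str == target_id
]
for reply in replies:
    print(reply.user.screen_name, reply.text)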
The Twitter API v2 allows you to retrieve the entire conversation thread using just the conversation_id in search. (In v1.1 you had to write custom code to build it.)
Replies to a given Tweet, as well as replies to those replies, are all included in the conversation stemming from the single original Tweet. Regardless of how many reply threads result, they will all share a common conversation_id to the original Tweet that sparked the conversation. Using the Twitter API v2, you have the ability to retrieve and reconstruct an entire conversation thread, so that you can better understand what is being said, and how conversations and ideas evolve.
Example:
curl --request GET \
--url 'https://api.twitter.com/2/tweets?ids=1225917697675886593&tweet.fields=author_id,conversation_id,created_at,in_reply_to_user_id,referenced_tweets&expansions=author_id,in_reply_to_user_id,referenced_tweets.id&user.fields=name,username' \
--header 'Authorization: Bearer $BEARER_TOKEN'
The response will look like:
{
  "data": [
    {
      "id": "1225917697675886593",
      "text": "@TwitterEng",
      "created_at": "2020-02-07T23:02:10.000Z",
      "author_id": "2244994945",
      "in_reply_to_user_id": "6844292",
      "conversation_id": "1225912275971657728",
      "referenced_tweets": [
        {
          "type": "quoted",
          "id": "1200517737669378053"
        },
        {
          "type": "replied_to",
          "id": "1225912275971657728"
        }
      ]
    }
  ],
  "includes": {
    "users": [
      {
        "username": "TwitterDev",
        "name": "Twitter Dev",
        "id": "2244994945"
      },
      {
        "username": "TwitterEng",
        "name": "Twitter Engineering",
        "id": "6844292"
      }
    ],
    "tweets": [
      {
        "id": "1200517737669378053",
        "text": "| ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄|\n don't push \n to prod on \n Fridays \n|___________| \n(\\__/) ||\n(•ㅅ•) ||\n/   づ",
        "created_at": "2019-11-29T20:51:47.000Z",
        "author_id": "2244994945",
        "conversation_id": "1200517737669378053"
      },
      {
        "id": "1225912275971657728",
        "text": "Note to self: Don't deploy on Fridays",
        "created_at": "2020-02-07T22:40:37.000Z",
        "author_id": "6844292",
        "conversation_id": "1225912275971657728"
      }
    ]
  }
}
For more info, check out the Twitter API conversation documentation.
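As a rough sketch (not taken from the answer above), pulling a whole thread with the v2 recent search endpoint and a conversation_id filter could look like this; the bearer token is a placeholder, and recent search only covers roughly the last 7 days:

# Sketch: fetch tweets in a conversation via the v2 recent search endpoint.
import requests

headers = {"Authorization": "Bearer <BEARER_TOKEN>"}
params = {
    "query": "conversation_id:1225912275971657728",
    "tweet.fields": "author_id,in_reply_to_user_id,created_at",
    "max_results": 100,
}
resp = requests.get("https://api.twitter.com/2/tweets/search/recent",
                    headers=headers, params=params)
resp.raise_for_status()
for tweet in resp.json().get("data", []):
    print(tweet["created_at"], tweet["author_id"], tweet["text"])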