I have Stackdriver logs dating back to Sept. 3rd, and a sink I created on Sept. 14th pulling those logs into a BigQuery dataset. Currently, the data in BigQuery starts only from when I created the sink. Can I export the earlier logs to a giant .csv and then re-upload them? I found a similar question here, but with no answer.
Thanks, and sorry for not being more technical with my question -- I am new to Stackdriver logging!
As of late 2021, an alpha feature known as copy logs is available, which lets you dump older logs into a Cloud Storage (GCS) bucket; from there, it's a short trip back into BigQuery.
As a caveat, this must be done via the shell, and as an alpha feature it comes with no guarantees or SLAs.
Prior to this feature, you would have been out of luck: downloading old logs was limited to 10,000 entries per request.
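Once the copy has finished, the load back into BigQuery can be scripted. A minimal sketch with the Python client, assuming the copy produced newline-delimited JSON files in a bucket (the bucket, dataset, and table names here are placeholders):

# Load log files that were copied to Cloud Storage back into BigQuery.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # let BigQuery infer a schema from the log entries
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = client.load_table_from_uri(
    "gs://my-log-export-bucket/*.json",       # files produced by the copy
    "my_project.my_dataset.historical_logs",  # destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish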
Since exporting happens for new log entries only, you cannot export log entries that Logging received before your sink was created. Please refer to this documentation.
Related
I am currently exporting logs from Stackdriver to BigQuery using sinks, but I am only interested in the jsonPayload. I would like to ignore pretty much everything else.
But since the table creation and data insertion happen automatically, I could not do this.
Is there a way to preprocess data coming from the sink so that only what matters is stored?
If the answer is no, is there a way to run a cron job each day to copy yesterday's data into a separate table and then remove it? (The tables are named using timestamps, which makes it possible to query them by day.)
As far as I know, both options mentioned are currently not possible on the GCP platform. On my end, I also tried to create an internal reproduction of your request and noticed that there isn't a way to keep solely the jsonPayload.
I would therefore suggest creating a feature request for this on the public Issue Tracker. Note that feature requests do not have an ETA for when they'll be processed, or any guarantee that they'll be implemented.
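That said, the daily copy described in the question can still be scripted outside the platform, for example from a cron job. A rough sketch with the Python client, assuming the sink writes date-sharded tables (all dataset and table names here are placeholders):

# Copy only jsonPayload from yesterday's sink table into a slim table,
# then drop the full table.
from datetime import datetime, timedelta, timezone
from google.cloud import bigquery

client = bigquery.Client()
suffix = (datetime.now(timezone.utc) - timedelta(days=1)).strftime("%Y%m%d")
source = f"my_dataset.my_log_table_{suffix}"

job_config = bigquery.QueryJobConfig(
    destination=f"my_project.slim_logs.jsonpayload_{suffix}",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(f"SELECT jsonPayload FROM `{source}`", job_config=job_config).result()
client.delete_table(source, not_found_ok=True)  # remove the full table afterwards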
To track BigQuery usage, we created a new dataset and configured it in the billing export. But even after waiting for a day, the dataset seems to be empty, as no new tables have been created.
Is there any other setup that needs to be done for this to work?
Refer to this link:
https://cloud.google.com/billing/docs/how-to/export-data-bigquery
Thanks and regards,
Gour
You just need to follow the How to enable billing export to BigQuery steps to begin using this functionality. Keep in mind that you have to wait a certain amount of time before you start seeing your data, as mentioned in the Export Billing Data to BigQuery documentation:
After you enable BigQuery export, it might take a few hours to start seeing your data. Billing data automatically exports your data to BigQuery in regular intervals, but the frequency of updates in BigQuery varies depending on the services you're using.
If you continue having this issue, I recommend taking a look at the Issue Tracker tool, which you can use to raise a BigQuery ticket and have the Google Technical Support Team verify this scenario. Since this is an automated process, you might need their help to review your project's internal configuration.
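In the meantime, a quick way to check from code whether the export table has appeared is a sketch like this (the dataset name is a placeholder):

# List tables in the billing export dataset; export tables typically start
# with the gcp_billing_export_ prefix.
from google.cloud import bigquery

client = bigquery.Client()
tables = list(client.list_tables("my_billing_dataset"))
if tables:
    for table in tables:
        print(table.table_id)
else:
    print("No export tables yet - billing data can take a few hours to appear.")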
I am new to Google Cloud Bigtable and have a very basic question: does the cloud offering protect my data against user error or application corruption? I see a lot of mention on the Google website that the data is safe and protected, but it's not clear whether the scenario above is covered, because I did not see any references to restoring data from a previous point-in-time copy. I am sure someone on this forum knows!
Updated 7/24/2020: Bigtable now supports both backups and replication.
Currently we create backups to protect against catastrophic events and provide for disaster recovery.
As of February 2017, Cloud Bigtable does not provide backups that protect against user errors or application bugs. We hope to make this feature available in a future release, but there is no planned delivery date at this time. In the meantime, you may make your own snapshots using HBase or a similar process.
In addition to Google's disaster protection that @Greg Dubicki mentioned, at Egnyte we back up our mission-critical Bigtable data into GCS, as Hadoop sequence files, using a couple of Python wrappers around the Bigtable HBase shaded JAR.
This provides for a quick recovery, fully under our control (i.e., no need to wait for Google support to recover data on demand), in case our BT cluster fails or an error on our software/admin side corrupts the data. A useful side effect is access to historical BT data for debugging.
Last week I wrote about that on Egnyte's engineering blog: https://medium.com/egnyte-engineering/bigtable-backup-for-disaster-recovery-9eeb5ea8e0fb. And we are thinking about open-sourcing this. We'll see how it goes.
UPDATE: On Thu Feb 20 I have published the scripts on Egnyte’s GitHub, under MIT license - https://github.com/egnyte/bigtable-backup-and-restore.
As of February 2020, Cloud Bigtable does provide backups, but they are only vaguely described as:
(...) we [do] create backups of your data to protect against catastrophic events and provide for disaster recovery.
Source
We're using the BigQuery streaming API, and have been for some time now. We noticed that at about 4:05 AM UTC (June 18th), BigQuery stopped reporting any new data being streamed in. We checked all our logs and everything looks good; we're even getting back 200s from the insertAll() request.
As a test, we created a table, and used the online insertAll() 'Test it!' webpage. Again, everything looks good, but the data is not showing up in BigQuery. We know that data might not be visible for a while, but we've never seen it take more than 5 minutes max to be available.
Is there any known issue with BigQuery streaming currently?
The issue is under investigation, see:
https://code.google.com/p/google-bigquery/issues/detail?id=263
More information is available in this forum: https://groups.google.com/forum/#!forum/bigquery-downtime-notify
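For anyone debugging something similar, a minimal sketch with the current Python client that surfaces per-row streaming errors may help, since an HTTP 200 from insertAll() only means the request was accepted, not that every row was (the table name and rows are placeholders):

# Stream a few rows and inspect the per-row insert errors.
from google.cloud import bigquery

client = bigquery.Client()
rows = [{"event": "page_view", "ts": "2014-06-18T04:05:00Z"}]
errors = client.insert_rows_json("my_project.my_dataset.events", rows)
if errors:
    print("Some rows were rejected:", errors)
else:
    print("All rows accepted by the streaming buffer.")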
I'm trying to do log analysis with BigQuery. Specifically, I have an App Engine app and a JavaScript client that will be sending log data to BigQuery. In BigQuery, I'll store the full log text in one column but also extract important fields into other columns. I then want to be able to do ad hoc queries over those columns.
Two questions:
1) Is BigQuery particularly good or particularly bad at this use case?
2) How do I set up revolving logs? I.e., I want to store only the last N logs or the last X GB of log data. I see that delete is not supported.
Just so you know, there is an excellent demo of moving App Engine Log data to BigQuery via App Engine MapReduce called log2bq (http://code.google.com/p/log2bq/)
Re: "use case" - Stack Overflow is not a good place for judgements about best or worst, but BigQuery is used internally at Google to analyse really really big log data.
I don't see the advantage of storing full log text in a single column. If you decide that you must set up revolving "logs," you could ingest daily log dumps by creating separate BigQuery tables, perhaps one per day, and then delete the tables when they become old. See https://developers.google.com/bigquery/docs/reference/v2/tables/delete for more information on the Table.delete method.
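A sketch of that rotation with the current Python client, deleting daily tables once they are older than 30 days (the dataset name and the logs_ prefix are placeholders):

# Delete date-suffixed tables (logs_YYYYMMDD) older than the cutoff.
from datetime import datetime, timedelta, timezone
from google.cloud import bigquery

client = bigquery.Client()
cutoff = (datetime.now(timezone.utc) - timedelta(days=30)).strftime("%Y%m%d")

for table in client.list_tables("my_dataset"):
    if table.table_id.startswith("logs_") and table.table_id[len("logs_"):] < cutoff:
        client.delete_table(table.reference, not_found_ok=True)
        print(f"Deleted {table.table_id}")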
After implementing this - we decided to open source the framework we built for it. You can see the details of the framework here: http://blog.streak.com/2012/07/export-your-google-app-engine-logs-to.html
If you want your Google App Engine (Google Cloud) project's logs to be in BigQuery, Google has added this functionality built into the new Cloud Logging system. It is a beta feature known as "Logs Export".
https://cloud.google.com/logging/docs/install/logs_export
They summarize it as:
Export your Google Compute Engine logs and your Google App Engine logs to a Google Cloud Storage bucket, a Google BigQuery dataset, a Google Cloud Pub/Sub topic, or any combination of the three.
We use the "Stream App Engine Logs to BigQuery" feature in our Python GAE projects. This sends our app's logs directly to BigQuery as they are occurring to provide near real-time log records in a BigQuery dataset.
There is also a page describing how to use the exported logs.
https://cloud.google.com/logging/docs/export/using_exported_logs
When you want to query logs exported to BigQuery over multiple days (e.g. the last week), you can use a SQL query with a FROM clause like this:
FROM
  (TABLE_DATE_RANGE([my_bq_dataset.myapplog_],
    DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'),
    CURRENT_TIMESTAMP()))
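TABLE_DATE_RANGE is a legacy SQL function; in standard SQL the same seven-day window can be expressed with a wildcard table. A sketch using the Python client, with the dataset and table prefix following the example above:

# Query the last seven days of date-sharded log tables in standard SQL.
from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT *
FROM `my_bq_dataset.myapplog_*`
WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
"""
for row in client.query(query).result():
    print(dict(row))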