I am trying to create something similar to what weather apps have: a graph of today's temperatures.
I am currently using the onecall endpoint, which returns the current weather along with forecasts for the next 48 hours, but this doesn't cover my case of showing earlier data from the same day.
Am I misusing this endpoint, or is there a way to show the data for the current day (both historical data from earlier and the forecast for later)? Or do I need to use the historical data endpoints?
I am getting data into Splunk from Snowflake using Splunk DB Connect. This is just simple orders data. In Splunk Search & Reporting, I am running the following query on my table to build a visualization.
source="big_data_table_inner_join" "UNITS_SOLD" | top COUNTRY
What I am seeing is that each time I run the query, the number of events in Splunk increases quite heavily. For example, after the first run there were 342000 events, and when I ran the same query again there were 67445 events. Any idea why this is happening?
I have created a report in DataStudio using data pulled from BigQuery and saved as a view. After playing around with the report for a while I have noticed that I have been billed 100+ times for the query (all the exact same size of data), but I only ran it once to build the view. Am I getting charged every time I interact with the report e.g. apply a filter? If not, what is causing these costs?
Your report will run a query against the view for each element on the page so 4 graphs = 4 queries.
If you then change a filter for example, that will run a further 4 queries (assuming the filter affects them all).
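To make the arithmetic concrete, here is a minimal Python sketch of how the query count grows. The function name and parameters are hypothetical, and it assumes every chart re-queries on every filter change with no caching, which is a simplification.

```python
from typing import Optional


def queries_run(charts: int, filter_changes: int,
                charts_affected_per_change: Optional[int] = None) -> int:
    """Estimate queries issued by a report.

    One query per chart on the initial load, plus one query per affected
    chart for each filter change. Assumes no caching (a simplification).
    """
    if charts_affected_per_change is None:
        # By default, assume a filter affects every chart on the page.
        charts_affected_per_change = charts
    return charts + filter_changes * charts_affected_per_change


print(queries_run(charts=4, filter_changes=1))   # 8
print(queries_run(charts=4, filter_changes=24))  # 100
```

Under these assumptions, a page with 4 charts reaches 100 queries after only 24 filter changes, which is consistent with seeing 100+ billed queries after playing with a report for a while.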
I am relatively new to Splunk and I am trying to create a report that will display a hostname and the number of times that host failed to log in within the past five minutes, when it failed 3 or more times. The only way I was able to get the initial search results I want is to look only within the past 5 minutes, as you can see in my query:
index="wineventlog" EventCode=4625 earliest=-5min | stats count by host,_time | stats count by host | search count > 2
This returns the host and the count. The issue is if I use this query in my report, it can run every five minutes, but the hosts that were listed previously get removed as they no longer are included in the search results.
I found ways to generate logs that I can then search for separately (http://docs.splunk.com/Documentation/Splunk/6.6.2/Alert/LogEvents) but it didn't work the way I expected.
I am looking for an answer to any of these questions that can help me get the intended results:
Can my original search be improved to still only get results where the failed logins were within 5 minutes but be able to search over any time period?
Is there a way to send the results from the query I already have to a report, where the results will not be cleared out when the search is run again?
Is there any other option I haven't considered to achieve the desired result?
If you only care about the last 5 minutes then search only the last 5 minutes. Searching more is just wasting resources.
Consider writing your results to a summary index (using collect) with a scheduled search and have your report/dashboard display values from the summary index.
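For the first question (finding bursts of failures within 5 minutes while scanning a longer period), the underlying logic is a per-host sliding window. The following is a minimal Python sketch of that logic outside Splunk, not an SPL equivalent; the function name and the `(host, timestamp)` event tuples are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def hosts_with_failed_bursts(events, threshold=3, window=timedelta(minutes=5)):
    """Return {host: peak failure count} for hosts that reached `threshold`
    failures inside any `window`-long span.

    `events` is an iterable of (host, timestamp) pairs, one per failed login.
    """
    recent = defaultdict(deque)  # per-host timestamps still inside the window
    peak = {}
    for host, ts in sorted(events, key=lambda e: e[1]):
        q = recent[host]
        q.append(ts)
        # Drop failures that are older than the window relative to this one.
        while ts - q[0] > window:
            q.popleft()
        peak[host] = max(peak.get(host, 0), len(q))
    return {h: n for h, n in peak.items() if n >= threshold}


t0 = datetime(2021, 4, 1, 12, 0, 0)
events = [
    ("web01", t0),
    ("web01", t0 + timedelta(minutes=1)),
    ("web01", t0 + timedelta(minutes=4)),
    ("db01", t0),
    ("db01", t0 + timedelta(minutes=10)),
    ("db01", t0 + timedelta(minutes=20)),
]
print(hosts_with_failed_bursts(events))  # {'web01': 3}
```

Here `db01` also fails three times, but never three times within five minutes, so only `web01` is reported.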
I know that BigQuery offers the first "1 TB of data processed" per month for free but I can't figure out where to look on my dashboard to see my monthly usage. I used to be able to "revert" to the old dashboard which had the info but for the past couple of weeks the "old dashboard" isn't accessible.
From the Google Cloud Console overview page for your project, click on the "details" section on the top-right, next to the charge estimate.
You'll get an estimate of the charges for the current month for each service and item in the service, including BigQuery analysis.
If you want to track this usage, you can also export the data to CSV every day by going to the Billing settings and enabling the usage export feature. Do not worry about the fact that it only mentions Compute Engine; it actually works for other services as well.
You can also access the billing history directly by clicking on the billing account link.
You will get a detailed bill with the usage info.
Post GCP Console Redesign Answer
The GCP console was redesigned and now the other answer here no longer applies, but it is still possible to view your usage by going to IAM & Admin -> Quotas.
What you're looking for is "Big Query API: Query usage per day". It doesn't seem possible to view your usage over 30 days unfortunately, but you can see your current usage (per day) and your peak usage over the past 7 days. You can also set a daily quota. If you're just working infrequently or doing a lot in one day, you can set a quota to 1 TiB and prevent yourself from blowing your whole allocation in one day.
You can try sending feedback about these limitations, like I did, by clicking the question mark at the top right and then send feedback.
Theo is correct that there is no way to view the number of bytes processed or billed since the start of the month (inside of the free tier) in the GCP Billing Console. However, you can extract the bytes processed and bytes billed data from logs in Cloud Logging and calculate the total bytes processed/billed since the start of the month inside of BigQuery.
Here are the steps to count total bytes billed in a month:
Under Cloud Logging, go to Logs Explorer (NOT the Legacy Logs Explorer) and run the following query in the query builder frame:
resource.type="bigquery_project" AND
protoPayload.metadata.jobChange.job.jobStats.queryStats.totalBilledBytes>1 AND
timestamp>="2021-04-01T00:00:00Z"
The timestamp clause is not actually necessary, but it will speed up the query. You can set timestamp >= <value> to any valid timestamp you want as long as it returns at least one result.
In the Query Results frame, click the "Action" button, and select "Create Sink".
In the window that opens, give your sink a name, click "Next", and in the "Select sink service" dropdown menu select "BigQuery dataset".
In the "Select BigQuery dataset" dropdown menu, either select an existing dataset where you would like to create your sink (which is a table containing logs) or, if you prefer, choose "Create new BigQuery dataset".
Finally, you will likely want to check the box for Partition Table, since this will help you control costs whenever you query this sink. As of the time of this answer, however, Google limits partitioned tables to 4000 partitions, so you may find it necessary to clear out old logs eventually.
Click "Create Sink" (there is no need for any inclusion or exclusion filters).
Run a query in BigQuery that produces bytes billed (i.e. a query that does not return a previously cached result). This is necessary to instantiate the sink. Moments after your query runs, you should see a table called <your_bigquery_dataset>.cloudaudit_googleapis_com_data_access
Enter the following Standard SQL query in the BigQuery query editor:
WITH
  bytes_table AS (
  SELECT
    JSON_VALUE(protopayload_auditlog.metadataJson,
      '$.jobChange.job.jobStats.createTime') AS date_time,
    JSON_VALUE(protopayload_auditlog.metadataJson,
      '$.jobChange.job.jobStats.queryStats.totalBilledBytes') AS billedbytes
  FROM
    `<your_project>.<your_bigquery_dataset>.cloudaudit_googleapis_com_data_access`
  WHERE
    EXTRACT(MONTH
    FROM
      timestamp) = 4
    AND EXTRACT(YEAR
    FROM
      timestamp) = 2021)
SELECT
  (SUM(CAST(billedbytes AS INT64))/1073741824) AS total_GB
FROM
  bytes_table;
You will want to change the month from 4 to whatever month you intend to query, and 2021 to whatever year you intend to query. Also, you may find it helpful to save this query as a view if you intend to rerun it periodically.
Be advised that your sink does not contain your past BigQuery logs, only BigQuery logs produced after you created the sink. Therefore, in the first month, the number of GB returned by this query will not be an accurate count of your bytes billed in that month unless you happen to have created the sink before running any queries in BigQuery during the current month.
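The final aggregation in the query above is just a sum and a unit conversion. As a minimal Python sketch of the same arithmetic: the function name and sample values are hypothetical, and treating the free tier as exactly 1 TiB (1024^4 bytes) is an assumption made here for illustration.

```python
BYTES_PER_GIB = 1024 ** 3    # 1073741824, the divisor used in the SQL query
FREE_TIER_BYTES = 1024 ** 4  # assumption: monthly free tier treated as 1 TiB


def summarize_billed_bytes(billed_bytes):
    """Return (total GiB billed, fraction of the assumed free tier used).

    `billed_bytes` holds strings, mirroring what JSON_VALUE returns
    before the CAST to INT64 in the query.
    """
    total = sum(int(b) for b in billed_bytes)
    return total / BYTES_PER_GIB, total / FREE_TIER_BYTES


# Hypothetical sample: 10 MiB, 5 GiB, and 1 GiB billed.
total_gb, share = summarize_billed_bytes(["10485760", "5368709120", "1073741824"])
print(f"{total_gb:.2f} GiB billed, {share:.2%} of the assumed free tier")
```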
Might be related to How can I monitor incurred BigQuery billings costs (jobs completed) by table/dataset in real-time?
If you are fine with using BigQuery itself to get that information (instead of using a UI), you can use something like this:
DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;
SELECT
ROUND(SUM(total_bytes_processed) / gb_divisor,2) as bytes_processed_in_gb,
ROUND(SUM(IF(cache_hit != true, total_bytes_processed, 0)) * cost_factor,4) as cost_in_dollar,
user_email,
FROM (
(SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
UNION ALL
(SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
)
WHERE
DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) and CURRENT_DATE()
GROUP BY
user_email
Explanation
Please consider the caveats I mentioned in my answer here
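To illustrate the cost arithmetic used in the query, here is a minimal Python sketch of the same calculation on in-memory data. The `Job` class, `cost_by_user` function, and sample jobs are hypothetical; the $5 per TB rate and the exclusion of cache hits are taken from the query itself and may not match current pricing.

```python
from collections import defaultdict
from dataclasses import dataclass

GB_DIVISOR = 1024 ** 3
TB_DIVISOR = GB_DIVISOR * 1024


@dataclass
class Job:
    user_email: str
    total_bytes_processed: int
    cache_hit: bool


def cost_by_user(jobs, cost_per_tb_in_dollar=5):
    """Return {user_email: (processed_gb, cost_in_dollar)}.

    Mirrors the query: every job counts toward bytes processed, but only
    jobs that were not served from cache count toward cost.
    """
    processed = defaultdict(int)
    billable = defaultdict(int)
    for job in jobs:
        processed[job.user_email] += job.total_bytes_processed
        if not job.cache_hit:
            billable[job.user_email] += job.total_bytes_processed
    cost_factor = cost_per_tb_in_dollar / TB_DIVISOR
    return {
        user: (round(processed[user] / GB_DIVISOR, 2),
               round(billable[user] * cost_factor, 4))
        for user in processed
    }


jobs = [
    Job("alice@example.com", 2 * TB_DIVISOR, cache_hit=False),
    Job("alice@example.com", 1 * TB_DIVISOR, cache_hit=True),
    Job("bob@example.com", 512 * GB_DIVISOR, cache_hit=False),
]
print(cost_by_user(jobs))
# {'alice@example.com': (3072.0, 10.0), 'bob@example.com': (512.0, 2.5)}
```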