I'm using dashboards to monitor various output stats on AWS.
Let's say it looks something like this:
stats avg(myfield1), min(myfield2), max(myfield3) by bin(1m)
This works fine. However, I am using a default bin size of 1 minute, so the data retention period is only 3 days. If I want to look at a week or a month, I have to use a separate widget with a larger bin size. I still want the 1-minute resolution for shorter time periods, and I'd rather not double up the graphs, as the dashboard is already very busy.
All the built-in metric graphs adjust the bin size they query dynamically as the time range being viewed changes.
Is it possible to do this within a CloudWatch Logs Insights query, and if so, what is the syntax?
I have a custom namespace in CloudWatch that contains a list of metrics.
The metric name is the IP of each of the servers connecting to mine, and this can change over time: new ones appear, older ones stop connecting during a given time frame, etc.
What I'm trying to get is a graph that shows all the metrics in that namespace, automatically including newly arriving ones and setting to 0 the ones that aren't present in a specific time frame.
(For instance, if IP 1.2.3.4 connects at 9:01, 9:02 but not 9:03, 9:04 then reconnects at 9:05, the graph will show 0 for 9:03 and 9:04 for that IP. If a new IP arrives at 9:05, it will be added automatically in the graph).
Is it possible to do that, and if so, how? I haven't found a way to do it in CloudWatch so far.
The answer depends on how many metrics you have in the namespace.
A dashboard widget can show a maximum of 500 metrics (docs). If you have fewer than 500 metrics in the namespace, you can simply use the metric math SEARCH and FILL functions like this:
"FILL(SEARCH('{YOUR_NAMESPACE}', 'Average', 300), 0)"
SEARCH will fetch the metrics and FILL will default the values to 0 for intervals that don't have datapoints present. Also, if a metric didn't receive new datapoints in over two weeks, it won't be returned by the search.
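To make the FILL behaviour concrete, here is a small local Python sketch that does not call CloudWatch at all. It assumes each metric's datapoints are already available as a dict keyed by timestamp in seconds; `fill_missing` and `minute` are hypothetical helpers and the values are made up. Using the 1.2.3.4 example from the question, the missing 9:03 and 9:04 periods come out as 0:

```python
from typing import Dict, List


def fill_missing(datapoints: Dict[int, float], start: int, end: int,
                 period: int, fill_value: float = 0.0) -> List[float]:
    """One value per period in [start, end); periods without a datapoint
    get fill_value, imitating what FILL(..., 0) does to a gap."""
    return [datapoints.get(t, fill_value) for t in range(start, end, period)]


def minute(hour: int, minutes: int) -> int:
    """Seconds since midnight, used here as a simple timestamp."""
    return (hour * 60 + minutes) * 60


# Hypothetical values: IP 1.2.3.4 reports at 9:01, 9:02 and 9:05 only.
seen = {minute(9, 1): 1.0, minute(9, 2): 1.0, minute(9, 5): 1.0}
# One-minute periods from 9:01 up to (but not including) 9:06.
series = fill_missing(seen, minute(9, 1), minute(9, 6), period=60)
```

Here `series` holds one value per minute from 9:01 to 9:05, with 0.0 for the two minutes in which the IP did not report.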
If you have between 500 and 2500 metrics in the namespace (the limit is 500 metrics per widget and 2500 metrics per dashboard), you could potentially split IP ranges into multiple graphs with SEARCH expressions like this:
"FILL(SEARCH('{YOUR_NAMESPACE} MetricName="1.2', 'Average', 300), 0)"
This will include all metrics for IPs starting with 1.2 in one graph. You would then need to create similar graphs for different ranges.
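If you need to decide how to split the IPs, a rough planning sketch in Python could count metrics per two-octet prefix and check them against the 500-per-widget and 2500-per-dashboard limits quoted above. It reuses the answer's expression text unchanged, with `{YOUR_NAMESPACE}` left as a placeholder. `plan_widgets` and `two_octet_prefix` are hypothetical names, and the sketch does not model how SEARCH matches a partial term such as 1.2 (for example, whether it would also pick up 1.20.x.x), so check the returned metrics in the console.

```python
from collections import Counter
from typing import Dict, List

MAX_METRICS_PER_WIDGET = 500      # per-widget limit quoted in the answer
MAX_METRICS_PER_DASHBOARD = 2500  # per-dashboard limit quoted in the answer

# Expression copied from the answer; only the IP prefix is substituted.
# {{YOUR_NAMESPACE}} is escaped so str.format leaves it as a placeholder.
TEMPLATE = "FILL(SEARCH('{{YOUR_NAMESPACE}} MetricName=\"{prefix}', 'Average', 300), 0)"


def two_octet_prefix(ip: str) -> str:
    """'1.2.3.4' -> '1.2' (the first two octets of an IPv4 address)."""
    return ".".join(ip.split(".")[:2])


def plan_widgets(metric_names: List[str]) -> Dict[str, str]:
    """Map each two-octet prefix to one widget's SEARCH expression,
    refusing plans that exceed the quoted widget or dashboard limits."""
    if len(metric_names) > MAX_METRICS_PER_DASHBOARD:
        raise ValueError("more than 2500 metrics: consider a custom widget")
    counts = Counter(two_octet_prefix(name) for name in metric_names)
    plan: Dict[str, str] = {}
    for prefix, count in sorted(counts.items()):
        if count > MAX_METRICS_PER_WIDGET:
            raise ValueError(f"prefix {prefix} has {count} metrics; split it further")
        plan[prefix] = TEMPLATE.format(prefix=prefix)
    return plan
```

Each value in the returned dict would go into its own graph widget.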
You can still use CloudWatch to graph more than 2500 metrics on a graph or dashboard, but then you need to write custom widgets: a Lambda function that fetches all of the datapoints from every metric in the namespace and renders the graph using something like Matplotlib.
I have 2 data sources: one is Google Analytics, the other is a table from BigQuery:
I am simply trying to display the % change per day in both metrics, so for example in the first I would show:
(8239 - 1706) / 8239 * 100
(7802 - 8239) / 7802 * 100
# and so on, just daily percentage change
The same for the second table, with views.
Previously I used it like this:
And it would show me what I wanted, but now all the % changes are "-" and I can't understand why.
Maybe this can be resolved by using some custom SQL?
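For reference, the day-over-day arithmetic can be checked outside the dashboard with a short Python sketch. The formulas above correspond to the sequence 1706, 8239, 7802 with the current day as the denominator; the more common definition of percentage change divides by the previous day instead. The hypothetical `daily_pct_change` function below supports both, so the two can be compared against what the chart shows:

```python
from typing import List, Optional


def daily_pct_change(values: List[float],
                     divide_by_current: bool = False) -> List[Optional[float]]:
    """Day-over-day % change for a series of daily totals.

    Default: (today - yesterday) / yesterday * 100.
    divide_by_current=True: (today - yesterday) / today * 100, the form
    used in the question's formulas.
    The first day has no previous value, and a zero denominator yields None.
    """
    changes: List[Optional[float]] = [None]
    for prev, curr in zip(values, values[1:]):
        base = curr if divide_by_current else prev
        changes.append(None if base == 0 else (curr - prev) / base * 100)
    return changes


usual = daily_pct_change([1706, 8239, 7802])
as_in_question = daily_pct_change([1706, 8239, 7802], divide_by_current=True)
```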
Can you please help me with the timestamps of a summary index?
We're having a disk space issue and are clearing out old logs, but we want to keep some of the field data. If we schedule a summary index (SI), will it add all the data from the last month at once? And if so, why do we need to schedule it? I've gone through the Splunk documentation but can't understand the steps and logic.
The idea of a summary index is to store the results of a search until they are needed for a later search. The classic example is the end-of-month report. Rather than run a huge search over thirty days to crunch the thousands of events of each day into a final report, a daily search crunches the events of that day into a SI then the monthly report runs on day 30 to read the 30 summary events from the SI into a report that runs quickly. The same SI can then be used for end-of-week reports and to populate a dashboard with the daily sales (or whatever) figures.
The key is to make the summary smaller than the original data. One cannot dump 1 month of data into a SI and hope to save space - it won't happen.
A summary index can help save disk space by retaining a smaller set of summary data long after the original events have been discarded.
Summaries do not have to be scheduled, but that is the most common way of producing them. It means no one has to remember to run the daily sales report every day to be able to get the monthly sales report. That said, one can write events to a summary index in an ad hoc search using the collect command.
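The idea can be modelled in a few lines of plain Python (this is not SPL and does not talk to Splunk). Assuming hypothetical raw events of the form (day, sale amount), a daily job writes one summary record per day, and the monthly report reads only those records, so 30,000 raw events shrink to 30 summary records. `summarize_day` and `monthly_report` are illustrative names only.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, float]  # hypothetical raw event: (day_of_month, sale_amount)


def summarize_day(events: List[Event], day: int) -> Dict[str, float]:
    """What a scheduled daily search might write to the summary index:
    one small record per day instead of every raw event."""
    todays = [amount for d, amount in events if d == day]
    return {"day": day, "count": len(todays), "total": sum(todays)}


def monthly_report(summary_index: List[Dict[str, float]]) -> Dict[str, float]:
    """The end-of-month report reads only the ~30 summary records."""
    return {
        "count": sum(r["count"] for r in summary_index),
        "total": sum(r["total"] for r in summary_index),
    }


# 1,000 made-up sales of 10.0 on each of 30 days.
raw = [(day, 10.0) for day in range(1, 31) for _ in range(1000)]
summary = [summarize_day(raw, day) for day in range(1, 31)]
report = monthly_report(summary)
```

Once the summary records exist, the raw events could age out without losing the monthly figures, which is the space saving described above.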
I have Splunk holding 3 months of log data, after which it is cleared (we can't see any history beyond that). My requirement is to store those log details in another folder in Splunk that keeps the full history, by dumping them there. I'm not sure how to extract data from Splunk. Can we use Java code, or some API, to extract the log data from Splunk and store it somewhere else?
I'm new to splunk.
You need to investigate the following:
index retention settings (including SmartStore, if you use it)
storage availability
if you have an index set for 500G or 1 year, but you store 50G per day, you'll rotate at 10 days
if you have an index set for 500G or 1 year, but only have 400G of available storage, it will rotate sooner
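As a back-of-the-envelope check, the examples above boil down to taking whichever limit is reached first. This Python sketch is a simplification: it ignores bucket boundaries, compression and hot/warm/cold tiers, and `days_until_rotation` is a hypothetical helper, not a Splunk setting. The second call assumes the same 50G per day as the first example, since the 400G example does not give an ingest rate.

```python
def days_until_rotation(max_size_gb: float, max_age_days: float,
                        daily_ingest_gb: float, available_storage_gb: float) -> float:
    """Rough number of days of data an index can hold: the size cap is
    bounded by the storage actually available, and the age cap applies
    if it is reached first."""
    effective_size_gb = min(max_size_gb, available_storage_gb)
    return min(max_age_days, effective_size_gb / daily_ingest_gb)


plenty_of_disk = days_until_rotation(500, 365, 50, available_storage_gb=1000)
short_on_disk = days_until_rotation(500, 365, 50, available_storage_gb=400)
```

With these inputs, `plenty_of_disk` is 10 days and `short_on_disk` drops to 8 days.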
In addition to the answer by @warren, look into the coldToFrozenDir and coldToFrozenScript settings in indexes.conf. These settings govern where and how data is archived rather than deleted. The data is not exported, however; it is stored in Splunk's proprietary format.
I know that BigQuery offers the first "1 TB of data processed" per month for free but I can't figure out where to look on my dashboard to see my monthly usage. I used to be able to "revert" to the old dashboard which had the info but for the past couple of weeks the "old dashboard" isn't accessible.
From the Google Cloud Console overview page for your project, click the "details" section at the top right, next to the charge estimate:
You'll get an estimate of the charges for the current month for each service and for each item within a service, including BigQuery analysis:
If you want to track this usage, you can also export the data to CSV every day by going to the Billing settings and enabling the usage export feature. Don't worry that it only mentions Compute Engine; it actually works for other services too.
You can also access the billing history directly by clicking the billing account link:
You will get a detailed bill with the usage info:
Post GCP Console Redesign Answer
The GCP console was redesigned and now the other answer here no longer applies, but it is still possible to view your usage by going to IAM & Admin -> Quotas.
What you're looking for is "Big Query API: Query usage per day". It doesn't seem possible to view your usage over 30 days unfortunately, but you can see your current usage (per day) and your peak usage over the past 7 days. You can also set a daily quota. If you're just working infrequently or doing a lot in one day, you can set a quota to 1 TiB and prevent yourself from blowing your whole allocation in one day.
You can try sending feedback about these limitations, like I did, by clicking the question mark at the top right and then send feedback.
Theo is correct that there is no way to view the number of bytes processed or billed since the start of the month (inside of the free tier) in the GCP Billing Console. However, you can extract the bytes processed and bytes billed data from logs in Cloud Logging and calculate the total bytes processed/billed since the start of the month inside of BigQuery.
Here are the steps to count total bytes billed in a month:
Under Cloud Logging, go to Logs Explorer (NOT the Legacy Logs Explorer) and run the following query in the query builder frame:
resource.type="bigquery_project" AND
protoPayload.metadata.jobChange.job.jobStats.queryStats.totalBilledBytes>1 AND
timestamp>="2021-04-01T00:00:00Z"
The timestamp clause is not actually necessary, but it will speed up the query. You can set timestamp >= <value> to any valid timestamp you want as long as it returns at least one result.
In the Query Results frame, click the "Action" button, and select "Create Sink".
In the window that opens, give your sink a name, click "Next", and in the "Select sink service" dropdown menu select "BigQuery dataset".
In the "Select BigQuery dataset" dropdown menu, either select an existing dataset where you would like to create your sink (which is a table containing logs) or, if you prefer, choose "Create new BigQuery dataset".
Finally, you will likely want to check the box for Partition Table, since this will help you control costs whenever you query this sink. As of the time of this answer, however, Google limits partition tables to 4000 partitions, so you may find it is necessary to clear out old logs eventually.
Click "Create Sink" (there is no need for any inclusion or exclusion filters).
Run a query in BigQuery that produces bytes billed (i.e. a query that does not return a previously cached result). This is necessary to instantiate the sink. Moments after the query runs, you should see a table called <your_bigquery_dataset>.cloudaudit_googleapis_com_data_access.
Enter the following Standard SQL query in the BigQuery query editor:
WITH
bytes_table AS (
SELECT
JSON_VALUE(protopayload_auditlog.metadataJson,
'$.jobChange.job.jobStats.createTime') AS date_time,
JSON_VALUE(protopayload_auditlog.metadataJson,
'$.jobChange.job.jobStats.queryStats.totalBilledBytes') AS billedbytes
FROM
`<your_project>.<your_bigquery_dataset>.cloudaudit_googleapis_com_data_access`
WHERE
EXTRACT(MONTH
FROM
timestamp) = 4
AND EXTRACT(YEAR
FROM
timestamp) = 2021)
SELECT
(SUM(CAST(billedbytes AS INT64))/1073741824) AS total_GB
FROM
bytes_table;
You will want to change the month from 4 to whatever month you intend to query, and 2021 to whatever year you intend to query. You may also find it helpful to save this query as a view if you intend to rerun it periodically.
Be advised that your sink does not contain your past BigQuery logs, only BigQuery logs produced after you created the sink. Therefore, in the first month the number of GB returned by this query will not be an accurate count of your bytes billed that month, unless you happened to create the sink before running any queries in BigQuery during that month.
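If you pull the individual `billedbytes` values out of BigQuery (for example, by selecting them from `bytes_table` without the final SUM and exporting the result), the same conversion can be done in Python and compared against the free tier. This sketch assumes the free tier is 1 TiB per month, as mentioned in another answer here; `billed_summary` is a hypothetical helper, and the inputs are strings because JSON_VALUE returns strings.

```python
from typing import Dict, Iterable

GIB = 1024 ** 3              # same divisor as 1073741824 in the query above
FREE_TIER_BYTES = 1024 ** 4  # assumption: free tier taken as 1 TiB per month


def billed_summary(billed_bytes: Iterable[str]) -> Dict[str, float]:
    """Total billed GiB for the month and how much of the assumed
    free tier is left (never below zero)."""
    total = sum(int(b) for b in billed_bytes)
    return {
        "total_gib": total / GIB,
        "remaining_free_gib": max(FREE_TIER_BYTES - total, 0) / GIB,
    }


# Made-up values: one 10 MiB query and one 1 GiB query.
usage = billed_summary(["10485760", "1073741824"])
```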
Might be related to How can I monitor incurred BigQuery billings costs (jobs completed) by table/dataset in real-time?
If you are fine by using BigQuery itself to get that information (instead of using a UI), you can use something like this:
DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;
SELECT
ROUND(SUM(total_bytes_processed) / gb_divisor,2) as bytes_processed_in_gb,
ROUND(SUM(IF(cache_hit != true, total_bytes_processed, 0)) * cost_factor,4) as cost_in_dollar,
user_email
FROM (
(SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
UNION ALL
(SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
)
WHERE
DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) and CURRENT_DATE()
GROUP BY
user_email
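To sanity-check the query's arithmetic, here is a Python sketch of the same per-user aggregation over an in-memory list of jobs. `Job` and `cost_by_user` are hypothetical names. The sketch mirrors the query's constants, including the 5 dollars per TB figure, which you should check against current pricing. Like the query, it counts only non-cached jobs toward cost; it does not reproduce the 30-day date filter.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

GB_DIVISOR = 1024 ** 3
TB_DIVISOR = GB_DIVISOR * 1024
COST_PER_TB_IN_DOLLAR = 5  # copied from the query; not a current price quote


@dataclass
class Job:
    user_email: str
    total_bytes_processed: int
    cache_hit: bool


def cost_by_user(jobs: List[Job]) -> Dict[str, Dict[str, float]]:
    """GROUP BY user_email: GB processed (all jobs) and cost (non-cached jobs only)."""
    processed: Dict[str, int] = defaultdict(int)
    billable: Dict[str, int] = defaultdict(int)
    for job in jobs:
        processed[job.user_email] += job.total_bytes_processed
        if not job.cache_hit:
            billable[job.user_email] += job.total_bytes_processed
    return {
        user: {
            "bytes_processed_in_gb": round(processed[user] / GB_DIVISOR, 2),
            "cost_in_dollar": round(billable[user] * COST_PER_TB_IN_DOLLAR / TB_DIVISOR, 4),
        }
        for user in processed
    }


jobs = [
    Job("a@example.com", TB_DIVISOR, cache_hit=False),
    Job("a@example.com", TB_DIVISOR, cache_hit=True),
    Job("b@example.com", 512 * GB_DIVISOR, cache_hit=False),
]
costs = cost_by_user(jobs)
```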
Please consider the caveats I mentioned in my answer here