Parse a response-time column from an access log using a CloudWatch metric filter and convert it to a metric - amazon-cloudwatch

I have an access log. How can I create a CloudWatch metric using a CloudWatch metric filter for the values 0.021 or 0.016? These values are response times. How would I parse and filter this line so I can turn those values into a graph?
api.mydomain.com xxx.xxx.xxx.xxx - - [06/Sep/2020:23:30:57 +0000] "GET /documents/F3VX5A5X3X4Y HTTP/1.1" 200 2616 "-" "axios/0.18.1" "xxx.xxx.xxx.xxx" 0.021 0.016 . ucs="EXPIRED"

You can create 2 metric filters, one for each value. Use this filter pattern in both metric filters:
[..., response_time_1, response_time_2, dot, ucs]
Then in one filter set $response_time_1 as the metric value, and $response_time_2 in the other.
Now you'll have 2 metrics to graph.
See here for more details on metric filters: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html
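If you prefer to set the filters up programmatically, here is a minimal sketch using boto3; the log group name, filter names and metric namespace are placeholders you would replace with your own:

import boto3

logs = boto3.client("logs")

# One metric filter per response-time column; the space-delimited pattern
# anchors on the last four fields of the access log line.
for filter_name, metric_name, value in [
    ("response-time-1", "ResponseTime1", "$response_time_1"),
    ("response-time-2", "ResponseTime2", "$response_time_2"),
]:
    logs.put_metric_filter(
        logGroupName="/my/access-log-group",       # placeholder
        filterName=filter_name,
        filterPattern="[..., response_time_1, response_time_2, dot, ucs]",
        metricTransformations=[{
            "metricName": metric_name,
            "metricNamespace": "MyApp/AccessLog",  # placeholder
            "metricValue": value,
        }],
    )

Once the filters have matched some log events, both metrics appear under that namespace and can be added to a graph.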

Related

How do I obtain Monte Carlo error in R2OpenBUGS?

Has anyone managed to obtain the Monte Carlo error for a parameter when running a Bayesian model in R2OpenBUGS?
It is provided in the standard output of OpenBUGS, but when run under R2OpenBUGS the log file doesn't have the MC error. Is there a way to ask R2OpenBUGS to calculate the MC error? Or maybe there is a way to calculate it manually? Please let me know if you have heard of any way to do that. Thank you!
Here is the standard log output of R2OpenBUGS:
$stats
mean sd val2.5pc median val97.5pc sample
beta0 1.04700 0.13250 0.8130 1.03800 1.30500 1500
beta1 -0.31440 0.18850 -0.6776 -0.31890 0.03473 1500
beta2 -0.05437 0.05369 -0.1648 -0.05408 0.04838 1500
deviance 588.70000 7.87600 575.3000 587.50000 606.90000 1500
$DIC
Dbar Dhat DIC pD
t 588.7 570.9 606.5 17.78
total 588.7 570.9 606.5 17.78
A simple way to calculate the Monte Carlo standard error (MCSE) is to divide the standard deviation of the chain by the square root of the effective number of samples. For example, if beta0's chain were effectively independent, its MCSE would be roughly 0.1325 / sqrt(1500) ≈ 0.0034; correlation within the chain shrinks the effective sample size and inflates the MCSE accordingly. The standard deviation is provided in your output, but the effective sample size should be given as n.eff (the rightmost column) when you print the model output - or at least that is the impression I get from:
https://cran.r-project.org/web/packages/R2OpenBUGS/vignettes/R2OpenBUGS.pdf
I don't use OpenBUGS any more so can't easily check for you, but there should be something there that indicates the effective sample size (this is NOT the same as the number of iterations you have sampled, as it also takes into account the loss of information due to correlation within the chains).
Otherwise you can obtain it yourself by extracting the raw MCMC chains and then either computing the effective sample size using the coda package (?coda::effectiveSize) or just using LaplacesDemon::MCSE to calculate the Monte Carlo standard error directly. For more information see:
https://rdrr.io/cran/LaplacesDemon/man/MCSE.html
Note that some people (including me!) would suggest focusing on the effective sample size directly rather than looking at the MCSE, as the old "rule of thumb" that MCSE should be less than 5% of the sample standard deviation is equivalent to saying that the effective sample size should be at least 400 (1/0.05^2). But opinions do vary :)
The MC error is reported as "Time-series SE" and can be found in the statistics section of the summary of the coda object:
library(R2OpenBUGS)
library(coda)
# codaPkg = TRUE makes bugs() return the paths of the CODA output files
# instead of a bugs object, so the chains can be read back with coda
my_result <- bugs(...., codaPkg = TRUE)
my_coda <- read.bugs(my_result)   # an mcmc.list holding the chains
summary(my_coda)$statistics       # the "Time-series SE" column is the MC error

How does Stratified Sampling Work in Weka

I have a dataset that I want to split using stratified sampling, with 70% of the data in the training set and 30% in the test set. So I split the dataset into 10 equal subsets using the StratifiedRemoveFolds filter in Weka, then appended 7 of them to make the 70% training set and the remaining 3 to make the 30% test set. However, this is not a good option: I found that one value of the 1st attribute was missing from the test set. My 1st attribute has 7 values, but only 6 of them appeared in the test set. As a result, when I ran the classifier trained on the training set I got the error "Training set and Test set are incompatible".
I went through the link Stratified Sampling in WEKA. I found that if I want to generate a 5% subsample, I should set the number of folds to 20. If this is the strategy, then for the 30% test set do I need to set numFolds of the StratifiedRemoveFolds filter to 120? And what about the training set? What should I set numFolds to where the training set is 70% of the whole dataset?
You could try using the supervised Resample (weka.filters.supervised.instance.Resample) filter instead, with no replacement and a bias factor of 0 (to use the distribution of the input data). When using the invert flag, you get the remainder of the dataset.
If you really want to use StratifiedRemoveFolds, then use 10 folds, apply the filter 10 times to get all 10 folds out, and then combine 7 of them to make your 70% and the remaining 3 to get your 30%.
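For what it's worth, here is a rough sketch of the Resample approach using the python-weka-wrapper3 package (the file name, seed and class-attribute position are assumptions; the same -Z/-B/-no-replacement/-V options apply if you drive the filter from the Weka GUI or Java):

import weka.core.jvm as jvm
from weka.core.converters import Loader
from weka.filters import Filter

jvm.start()

loader = Loader(classname="weka.core.converters.ArffLoader")
data = loader.load_file("dataset.arff")   # placeholder file name
data.class_is_last()                      # class attribute assumed to be last

# 70% stratified sample without replacement; bias 0 keeps the input class distribution
resample = Filter(classname="weka.filters.supervised.instance.Resample",
                  options=["-Z", "70", "-B", "0", "-no-replacement", "-S", "1"])
resample.inputformat(data)
train = resample.filter(data)

# Same seed with -V (invert selection) should return the remaining 30% as the test set
invert = Filter(classname="weka.filters.supervised.instance.Resample",
                options=["-Z", "70", "-B", "0", "-no-replacement", "-V", "-S", "1"])
invert.inputformat(data)
test = invert.filter(data)

jvm.stop()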

test and train good practice wrt summary feature

When one feature of a dataset is a summary statistic computed over the entire pool of data, is it good practice to pool the training data with the test data in order to calculate that feature for validation?
For instance, let's say I have 1000 data points split into 800 training entries and 200 validation entries. From the 800 training entries I create a feature, say a rank quartile (it could be anything), which records as 0-3 the quartile that some other feature falls in. So in the training set there will be 200 data points in each quartile.
Once you train the model and need to calculate the feature again for the validation set, (a) do you use the already-set quartile boundaries, i.e. the 200 validation entries could end up with something other than a 50-50-50-50 quartile split, or (b) do you recalculate the quartiles using all 1000 entries, so the quartile-rank feature is rebuilt with 250 entries in each quartile?
Thanks very much
The ideal practice is to calculate the quartiles on the training dataset and then apply those boundaries to your holdout/validation dataset. To generate model diagnostics that honestly evaluate predictive performance, you do not want the distribution of the test dataset to influence your model training, because that data will not be available in real life when you apply the model to unseen data.
I also think you will find this article useful when thinking about train-test splitting: https://towardsdatascience.com/3-things-you-need-to-know-before-you-train-test-split-869dfabb7e50
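As a small illustration of option (a) in pandas (the frame and column names are made up for the example), you learn the quartile edges on the training rows and then apply those same edges to the validation rows:

import numpy as np
import pandas as pd

# train and valid are assumed DataFrames that share a numeric column "score"
train_rank, edges = pd.qcut(train["score"], q=4, labels=False, retbins=True)
train["rank_quartile"] = train_rank

# Reuse the training edges; widen the outer bins so validation values outside
# the training range still land in quartile 0 or 3 instead of becoming NaN
edges[0], edges[-1] = -np.inf, np.inf
valid["rank_quartile"] = pd.cut(valid["score"], bins=edges, labels=False)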

What does "max_batch_size" mean in tensorflow-serving batching_config.txt?

I'm using tensorflow-serving on GPUs with --enable-batching=true.
However, I'm a little confused with max_batch_size in batching_config.txt.
My client sends an input tensor with shape [-1, 1000] in a single gRPC request, where dim0 ranges over (0, 200]. I set max_batch_size = 100 and receive errors like:
"gRPC call return code: 3:Task size 158 is larger than maximum batch
size 100"
"gRPC call return code: 3:Task size 162 is larger than maximum batch
size 100"
It looks like max_batch_size limits dim0 of a single request, but since TensorFlow Serving batches multiple requests together, I thought it meant the total size summed over the batched requests.
Here is a direct description from the docs.
max_batch_size: The maximum size of any batch. This parameter governs the throughput/latency tradeoff, and also avoids having batches that are so large they exceed some resource constraint (e.g. GPU memory to hold a batch's data).
In ML the first dimension most of the time represents the batch. So, based on my understanding, TensorFlow Serving treats the first dimension of your request as the batch size and raises an error whenever it is bigger than the allowed value. You can verify this by issuing some requests where you manually keep the first dimension below 100; I expect the error to go away.
After that you can modify your client to send its inputs in an appropriately sized format.
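If you really need to send more than max_batch_size rows at a time, one option is to split the input along dim0 on the client side. A rough sketch of such a client (the server address, model name and input key are assumptions for illustration):

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

MAX_BATCH = 100   # must not exceed max_batch_size in batching_config.txt

channel = grpc.insecure_channel("localhost:8500")       # placeholder address
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

def predict(inputs: np.ndarray):
    """Send inputs of shape [n, 1000] in chunks whose dim0 stays <= MAX_BATCH."""
    results = []
    for start in range(0, inputs.shape[0], MAX_BATCH):
        chunk = inputs[start:start + MAX_BATCH]
        request = predict_pb2.PredictRequest()
        request.model_spec.name = "my_model"                     # placeholder
        request.inputs["input"].CopyFrom(tf.make_tensor_proto(chunk))
        results.append(stub.Predict(request, 10.0))              # 10 s timeout
    return results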

AWS CloudWatch alarm triggering autoscaling using multiple metrics

I want to create a CloudWatch alarm that triggers autoscaling based on more than one metric. Since this is not natively supported by CloudWatch (correct me if I am wrong), I was wondering how to overcome this.
Can we get the data from different metrics, say CPUUtilization, NetworkIn and NetworkOut, create a custom metric from them using mon-put-data, and then trigger autoscaling based on that new metric?
You can now make use of CloudWatch Metric Math.
Metric math enables you to query multiple CloudWatch metrics and use math expressions to create new time series based on these metrics. You can visualize the resulting time series in the CloudWatch console and add them to dashboards.
More information regarding Metric Math Syntax and Functions available here:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html#metric-math-syntax
However, note that metric math has no logical operators, so you have to express the condition using arithmetic functions.
To help anyone landing here, an example:
Let's say you want to trigger an alarm if CPUUtilization < 20% and MemoryUtilization < 30%.
m1 = Avg CPU Utilization % for 5mins
m2 = Avg Mem Utilization % for 5mins
Then:
Avg CPU Utilization % < 20 for 5 mins AND Avg Mem Utilization % < 30 for 5 mins ... (1)
is the same as
(m1 - 20) / ABS(m1 - 20) + (m2 - 30) / ABS(m2 - 30) < 0 ... (2)
So, define your two metrics and build a metric math expression that is the left-hand side of equation (2) above. Set the threshold to 0 and the comparison operator to LessThanThreshold.
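For reference, a sketch of how such an alarm could be created with boto3; the memory metric's namespace, name and dimensions depend on how you publish it (e.g. via the CloudWatch agent), so treat those identifiers as placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="low-cpu-and-low-memory",
    ComparisonOperator="LessThanThreshold",
    Threshold=0.0,
    EvaluationPeriods=1,
    Metrics=[
        {"Id": "m1", "ReturnData": False,
         "MetricStat": {"Metric": {"Namespace": "AWS/EC2",
                                   "MetricName": "CPUUtilization",
                                   "Dimensions": [{"Name": "AutoScalingGroupName",
                                                   "Value": "my-asg"}]},   # placeholder
                        "Period": 300, "Stat": "Average"}},
        {"Id": "m2", "ReturnData": False,
         "MetricStat": {"Metric": {"Namespace": "CWAgent",                 # placeholder
                                   "MetricName": "mem_used_percent",       # placeholder
                                   "Dimensions": [{"Name": "AutoScalingGroupName",
                                                   "Value": "my-asg"}]},
                        "Period": 300, "Stat": "Average"}},
        # Expression is the left-hand side of equation (2); the alarm fires when it is < 0
        {"Id": "e1", "ReturnData": True, "Label": "cpu<20 AND mem<30",
         "Expression": "(m1 - 20)/ABS(m1 - 20) + (m2 - 30)/ABS(m2 - 30)"},
    ],
)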
Yes - CloudWatch alarms can only trigger on a single CloudWatch metric, so you would need to publish your own 'aggregate' custom metric and alarm on that, as you suggest yourself.
Here is a blog post describing using custom metrics to trigger autoscaling.
http://www.thatsgeeky.com/2012/01/autoscaling-with-custom-metrics/
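The boto3 equivalent of mon-put-data looks roughly like this; how you combine CPUUtilization, NetworkIn, etc. into the single value is up to you, and the namespace and metric name below are made up:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a pre-computed aggregate value; an alarm on this single custom
# metric can then drive the Auto Scaling policy.
cloudwatch.put_metric_data(
    Namespace="Custom/Scaling",                    # placeholder
    MetricData=[{
        "MetricName": "AggregateLoad",             # placeholder
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": "my-asg"}],
        "Value": 42.0,                             # your combined score
        "Unit": "None",
    }],
)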
This is now supported. You can check https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html for details.
As an example, you can use an expression like (CPU Utilization > 80) OR (MEMORY Consumed > 55).