How can I set a unit in a CloudWatch math expression? - amazon-cloudwatch

I have a CloudWatch math metric e1 that is SUM([m1,m2]). m1 and m2 have the CloudWatch unit Milliseconds, but the SUM expression has the unit "No unit".
Is there any way to assign a unit to math expressions in CloudWatch? I want to show e1, m1 and m2 in the same chart, and the Y-axis label says "Various units" instead of "Milliseconds".

The only workaround that I know of is to go to the chart Options, set the Left Y axis Label to Milliseconds and uncheck Show units. This does not actually give the math expression a unit, but it looks OK in the chart.
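For a dashboard defined as JSON, the same workaround can be sketched as a widget definition. This is only a sketch: the namespace "MyApp", the metric names, the label and the region are hypothetical placeholders, and the resulting JSON would still need to be placed in a dashboard body.

```python
import json

# Hypothetical namespace, metric names and region; replace with your own.
widget = {
    "type": "metric",
    "properties": {
        "region": "us-east-1",
        "period": 300,
        "stat": "Average",
        "metrics": [
            ["MyApp", "BackendLatency", {"id": "m1"}],
            ["MyApp", "FrontendLatency", {"id": "m2"}],
            # The math expression itself still carries no unit.
            [{"expression": "SUM([m1, m2])", "id": "e1", "label": "Total latency"}],
        ],
        # Fixed axis label plus hidden units, as in the console workaround.
        "yAxis": {"left": {"label": "Milliseconds", "showUnits": False}},
    },
}

print(json.dumps(widget, indent=2))
```

As with the console option, this only changes how the axis is labelled; e1 itself stays unitless.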

Related

AWS Cloudwatch Math Expressions: draw null / empty

I am trying to create a dashboard widget that says "if a metric sample count is less than certain number, don't draw the graph".
The only math expression that seems promising is IF, however the value can only be a metric or a scalar. I'm trying to find a way to draw a null/no data point/empty instead.
Any way?
Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html#using-IF-expressions
CloudWatch will drop data points that are not numbers (NaN, +Infinity, -Infinity) when graphing the data. Also, metric math will evaluate basic operations in the expression, so you can divide by zero to get a non-number value.
So you can do something like this to trick it into dropping the values you don't want:
Have your metric in the graph as m1.
Have the sample count of your metric in the graph as m2.
Add an IF function to drop data points if the sample count is lower than some number (10 in this example): IF(m2 < 10, 1/0, m1)
Disable m1 and m2 on the graph and only show the expression.
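The steps above can be sketched as a dashboard widget definition. The namespace "MyApp", the metric name "Latency", the label and the region are hypothetical; only the IF expression comes from the answer.

```python
import json

MIN_SAMPLES = 10  # threshold used in the example above

# Hypothetical namespace, metric name and region; replace with your own.
widget = {
    "type": "metric",
    "properties": {
        "region": "us-east-1",
        "period": 300,
        "metrics": [
            # m1: the metric itself, hidden from the graph.
            ["MyApp", "Latency", {"id": "m1", "stat": "Average", "visible": False}],
            # m2: the sample count of the same metric, also hidden.
            ["MyApp", "Latency", {"id": "m2", "stat": "SampleCount", "visible": False}],
            # 1/0 yields a non-number, which is dropped when graphing.
            [{"expression": f"IF(m2 < {MIN_SAMPLES}, 1/0, m1)",
              "id": "e1", "label": "Latency (enough samples)"}],
        ],
    },
}

print(json.dumps(widget, indent=2))
```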

why is ggplot2 geom_col misreading discrete x axis labels as continuous?

Aim: plot a column chart representing concentration values at discrete sites
Problem: the 14 site labels are numeric, so I think ggplot2 is assuming continuous data and adding spaces for what it sees as 'missing numbers'. I only want 14 columns with 14 marks/labels, corresponding to the 14 values in the dataframe. I've tried assigning the sites as factors and characters but neither works.
Also, how do you ensure the y-axis ends at '0', so the bottom of the columns meet the x-axis?
Thanks
Data:
Sites: 2,4,6,7,8,9,10,11,12,13,14,15,16,17
Concentration: 10,16,3,15,17,10,11,19,14,12,14,13,18,16
You have two questions in one with two pretty straightforward answers:
1. How to force a discrete axis when your column is a continuous one? To make ggplot2 draw a discrete axis, the data must be discrete. You can force your numeric data to be discrete by converting to a factor. So, instead of x=Sites in your plot code, use x=as.factor(Sites).
2. How to eliminate the white space below the columns in a column plot? You can control the limits of the y axis via the scale_y_continuous() function. By default, the limits extend a bit past the actual data (in this case, from 0 to the max Concentration). You can override that behavior via the expand= argument. Check the documentation for expansion() for more details, but here I'm going to use mult=, which uses a multiplication to find the new limits based on the data. I'm using 0 for the lower limit to make the lower axis limit equal the minimum in your data (0), and 0.05 as the upper limit to expand the chart limits about 5% past the max value (this is default, I believe).
Here's the code and resulting plot.
library(ggplot2)
df <- data.frame(
  Sites = c(2,4,6,7,8,9,10,11,12,13,14,15,16,17),
  Concentration = c(10,16,3,15,17,10,11,19,14,12,14,13,18,16)
)
# as.factor() makes the x axis discrete: one column per site, no gaps
ggplot(df, aes(x = as.factor(Sites), y = Concentration)) +
  geom_col(color = "black", fill = "lightblue") +
  # no expansion below 0, 5% expansion above the maximum
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  theme_bw()

point cloud generation for XYZ-format in order to use in GLAP

As I read, there are two kinds of XYZ format:
x y z <--- in one line
and
x y z nx ny nz <--- in one line.
The function CGAL::make_surface_mesh() is extremely slow if I use just x y z (without normals).
What is the proper way to retrieve normals from PCD-format (PCL-lib) ?
Or how to generate it manually (by my own code)?
There are several methods to estimate normals. One possibility is to insert all the points in a KdTree, then get a certain number of nearest neighbors for each point. Once you have the nearest neighbors, you can either fit a higher-order surface (quadric) to the points and compute its normal, or you can do a principal component analysis of the points and take the eigenvector associated with the smallest eigenvalue. Both methods, as well as several refinements, are implemented in the Point Set Processing package of CGAL:
http://doc.cgal.org/latest/Point_set_processing_3/index.html#Point_set_processing_3NormalEstimation
Depending on your input pointset, different methods / tunings will perform differently (it may require experimentation / parameter tuning).
Note: you may also try the different reconstruction algorithms available there:
http://doc.cgal.org/latest/Surface_reconstruction_points_3/
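To illustrate the principal component analysis approach if you want to do it by your own code, here is a minimal sketch in Python with NumPy. It is not the CGAL or PCL implementation: the function name estimate_normals and the default k=8 are my own choices, and the neighbor search is brute force (a KdTree would replace it for large clouds). The sign of each normal is left unresolved, so a consistent orientation step is still needed afterwards.

```python
import numpy as np

def estimate_normals(points, k=8):
    """Estimate one unit normal per point by PCA over its k nearest neighbours."""
    pts = np.asarray(points, dtype=float)
    # Brute-force pairwise squared distances, O(n^2) memory; fine for small clouds.
    diffs = pts[:, None, :] - pts[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diffs, diffs)
    # Indices of the k closest points (each point counts as its own neighbour).
    neighbours = np.argsort(dist2, axis=1)[:, :k]
    normals = np.empty_like(pts)
    for i, idx in enumerate(neighbours):
        cov = np.cov(pts[idx], rowvar=False)    # 3x3 covariance of the neighbourhood
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]              # direction of least variance
    return normals

# Example: random points on the plane z = 0, whose normal is (0, 0, +/-1).
rng = np.random.default_rng(0)
cloud = np.column_stack([rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30), np.zeros(30)])
normals = estimate_normals(cloud)
xyz_with_normals = np.hstack([cloud, normals])  # rows of: x y z nx ny nz
```

The last line puts each point in the "x y z nx ny nz" layout described in the question.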

sampling 2-dimensional surface: how many sample points along X & Y axes?

I have a set of the first 25 Zernike polynomials. A few are shown below in the Cartesian coordinate system.
z2 = 2*x
z3 = 2*y
z4 = sqrt(3)*(2*x^2+2*y^2-1)
:
:
z24 = sqrt(14)*(15*(x^2+y^2)^2-20*(x^2+y^2)+6)*(x^2-y^2)
I am not using 1st since it is piston; so I have these 24 two-dim ANALYTICAL functions expressed in X-Y Cartesian co-ordinate system. All are defined over unit circle, as they are orthogonal over unit circle. The problem which I am describing here is relevant to other 2D surfaces also apart from Zernike Polynomials.
Suppose that origin (0,0) of the XY co-ordinate system and the centre of the unit circle are same.
Next, I take linear combination of these 24 polynomials to build a 2D wavefront shape. I use 24 random input coefficients in this combination.
w(x,y) = sum_over_i a_i*z_i (i=2,3,4,....24)
a_i = random coefficients
z_i = zernike polynomials
Up to this point, everything is analytical and can be done on paper.
Now comes the discretization!
I know that when you want to reconstruct a signal (1-D or 2-D), your sampling frequency should be at least twice the maximum frequency present in the signal (Nyquist-Shannon principle).
Here the signal is w(x,y) as mentioned above, which is nothing but a simple 2-D function of x & y. I want to represent it on a computer now. Obviously I cannot take all the infinitely many points from -1 to +1 along the x axis, and the same for the y axis.
I have to take finite no. of data points (which are called sample points or just samples) on this analytical 2Dim surface w(x,y)
I am measuring x & y in metres, and -1 <= x <= +1; -1 <= y <= +1.
e.g. If I divide my x-axis from -1 to 1, in 50 sample points then dx = 2/50= 0.04 metre. Same for y axis. Now my sampling frequency is 1/dx i.e. 25 samples per metre. Same for y axis.
But I took 50 samples arbitrarily; I could have taken 10 samples or 1000 samples. That is the crux of the matter here: how many sample points? How will I determine this number?
There is a theorem (the Nyquist-Shannon theorem mentioned above) which says that if I want to reconstruct w(x,y) faithfully, I must sample it on both axes so that my sampling frequency (i.e. the number of samples per metre) is at least twice the maximum frequency present in w(x,y). Finding that maximum frequency amounts to finding the power spectrum of w(x,y). The idea is that any function in the space domain can also be represented in the spatial-frequency domain, which is nothing but taking the Fourier transform of the function. This tells us which (spatial) frequencies are present in w(x,y) and what the maximum frequency among them is.
Now my question is, first, how to find this maximum frequency in my case. I cannot use MATLAB fft2() or any other tool, since that would mean I already have samples taken across the wavefront! Obviously the remaining option is to find it analytically. But that is time-consuming and difficult, since I have 24 polynomials and would have to use the continuous Fourier transform, i.e. I would have to go for pen and paper.
Any help will be appreciated.
Thanks
Key Assumptions
You want to use the "Nyquist-Shannon" theorem to determine the sampling frequency
Obviously remaining option is find it analytically ! But that is time
consuming and difficult since I have 21 polynomials & I have to use
continuous Fourier transform i.e. done by analytically.
Given the assumption I have made (and noting that consideration of other mathematical techniques is out of scope for StackOverflow), you have no option but to calculate the continuous Fourier Transform.
However, I believe you haven't considered all the options for calculating the transform other than a laborious paper exercise e.g.
Numerical approximation of the continuous F.T. using code
Symbolic Integration e.g. Wolfram Alpha
Surely a numerical approximation of the Fourier Transform will be adequate for your solution?
I am assuming this is for coursework or research, so as a physicist all you really care about is the quickest solution that is accurate enough within the scope of your problem.
So to conclude, IMHO, don't waste time searching for a more mathematically elegant solution or trick; just solve the problem with one of the above methods.
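As a rough illustration of the numerical option, the sketch below evaluates the analytical w(x,y) on a deliberately dense grid, approximates its Fourier transform with an FFT, and reports the spatial frequency below which a chosen fraction of the spectral energy lies. Several things here are my own assumptions rather than part of the question: the function names, the 512-point grid, the 99% energy threshold, the fixed coefficients, and the use of only the four polynomials written out in the question (z2, z3, z4, z24). An energy threshold is used because a polynomial cut off at the edge of the unit circle is not strictly band-limited, so there is no exact maximum frequency to find.

```python
import numpy as np

def wavefront(x, y, coeffs):
    """Linear combination of z2, z3, z4 and z24, set to zero outside the unit circle."""
    r2 = x**2 + y**2
    terms = [
        2 * x,                                                      # z2
        2 * y,                                                      # z3
        np.sqrt(3) * (2 * r2 - 1),                                  # z4
        np.sqrt(14) * (15 * r2**2 - 20 * r2 + 6) * (x**2 - y**2),   # z24
    ]
    w = sum(a * z for a, z in zip(coeffs, terms))
    return np.where(r2 <= 1.0, w, 0.0)

def energy_cutoff_frequency(coeffs, n=512, fraction=0.99):
    """Radial frequency (cycles per metre) containing `fraction` of the spectral energy."""
    x = np.linspace(-1.0, 1.0, n, endpoint=False)
    dx = x[1] - x[0]
    X, Y = np.meshgrid(x, x)
    power = np.abs(np.fft.fft2(wavefront(X, Y, coeffs))) ** 2
    f = np.fft.fftfreq(n, d=dx)              # frequency of each FFT bin along one axis
    FX, FY = np.meshgrid(f, f)
    radius = np.hypot(FX, FY).ravel()
    order = np.argsort(radius)               # accumulate energy from low to high frequency
    cumulative = np.cumsum(power.ravel()[order])
    idx = np.searchsorted(cumulative, fraction * cumulative[-1])
    return radius[order][idx]

coeffs = [0.3, -0.5, 0.8, 0.2]               # arbitrary illustrative coefficients
f_cut = energy_cutoff_frequency(coeffs)
print("estimated cutoff:", f_cut, "cycles/m; suggested sampling rate:", 2 * f_cut, "samples/m")
```

If the estimate comes out close to the Nyquist limit of the dense grid (n/4 cycles per metre here), the grid itself is too coarse and n should be increased before trusting the result.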

What exactly do the whiskers in pandas' boxplots specify?

In python-pandas boxplots with default settings, the red bar is the mean (edit: median), and the box spans the 25th to 75th percentiles, but what exactly do the whiskers mean in this case? Where is the documentation with the exact definition? I couldn't find it.
Example code:
df.boxplot()
Example result:
Pandas just wraps the boxplot function from matplotlib. The matplotlib docs have the definition of the whiskers in detail:
whis : float, sequence, or string (default = 1.5)
As a float, determines the reach of the whiskers beyond the
first and third quartiles. In other words, where IQR is the
interquartile range (Q3-Q1), the upper whisker will extend to last
datum less than Q3 + whis*IQR. Similarly, the lower whisker will
extend to the first datum greater than Q1 - whis*IQR. Beyond the
whiskers, data are considered outliers and are plotted as individual
points.
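The definition quoted above can be checked by hand with a few lines of NumPy. This is a sketch of the rule, not matplotlib's own code; the helper name whisker_ends and the sample data are made up for illustration.

```python
import numpy as np

def whisker_ends(values, whis=1.5):
    """Return (lower, upper) whisker positions for the float form of `whis`."""
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme data points that lie inside the fences.
    inside = data[(data >= lower_fence) & (data <= upper_fence)]
    return inside.min(), inside.max()

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
low, high = whisker_ends(sample)
print(low, high)  # 1.0 9.0 (30 lies beyond Q3 + 1.5*IQR = 14.5 and is drawn as an outlier)
```

Note that the whiskers end at actual data points, not at the fences themselves.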
Matplotlib (and Pandas) also gives you a lot of options to change this default definition of the whiskers:
Set this to an unreasonably high value to force the whiskers to show
the min and max values. Alternatively, set this to an ascending
sequence of percentile (e.g., [5, 95]) to set the whiskers at specific
percentiles of the data. Finally, whis can be the string 'range' to
force the whiskers to the min and max of the data.
Below a graphic that illustrates this from a stats.stackexchange answer. Note that k=1.5 if you don't supply the whis keyword in Pandas.
From Amelio Vazquez-Reina's answer in Boxplots in matplotlib: Markers and outliers:
The outliers (the + markers in the boxplot) are simply points outside of the wide [(Q1-1.5 IQR), (Q3+1.5 IQR)] margin below.
FYI: Confused by location of fences in box-whisker plots
You mention in your question that the red line is the mean - it is actually the median.
From the matplotlib link mentioned by Chang She above:
The box extends from the lower to upper quartile values of the data,
with a line at the median. The whiskers extend from the box to show
the range of the data. Flier points are those past the end of the
whiskers.
I didn't experiment, but there is a 'meanline' option which might put the line at the mean.
These are specified in the matplotlib documentation. By default, each whisker extends to the most extreme data point that lies within 1.5 times the interquartile range of the box.