Filter Point Cloud - pandas

I want to filter a point cloud. The figure shows the result of the intersection of a sphere with a trapezoid, so I only need the points that describe the curved surface. My idea was to find the unique values in X and Y and then find the lowest Z value for every combination of unique X and Y values.
A CSV file contains the entire point cloud:
import pandas as pd
data = pd.read_csv('test.csv', sep=' ')
uniqueX = data.X.unique()
uniqueY = data.Y.unique()
I am not sure how to iterate over and combine uniqueX and uniqueY as a filter to find the minimum Z.
Any ideas?

It would help if you could add sample data to your question, but if you want to find the minimum of Z for every combination of X and Y, I think you can use .groupby():
data.groupby(['X', 'Y'])['Z'].min()
Documentation and examples on the use of .groupby():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
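As a rough illustration of the groupby approach, here is a minimal sketch. The sample values are made up, since the real test.csv is not shown, and the column names X, Y and Z are taken from the question. Note that groupby only covers (X, Y) pairs that actually occur in the data, which is usually what you want for a point cloud.

```python
import pandas as pd

# Hypothetical sample data; the real 'test.csv' is not shown in the question.
data = pd.DataFrame({
    "X": [0, 0, 0, 1, 1],
    "Y": [0, 0, 1, 0, 0],
    "Z": [5.0, 2.0, 3.0, 4.0, 1.0],
})

# Minimum Z for each (X, Y) pair that occurs in the data.
min_z = data.groupby(["X", "Y"])["Z"].min()

# Same filter, but keeping the full rows (useful if there are more columns).
surface = data.loc[data.groupby(["X", "Y"])["Z"].idxmin()].reset_index(drop=True)
```

The second form may be the more useful one if the filtered cloud has to be written back to a file with all of its original columns.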

Related

How to do clustering when I have multiple categorical columns and fewer numerical columns in pandas?

Say I have one column (X) which holds the customer id, and multiple other columns x1, x2, x3, x4, x5, x6
which contain only these 4 distinct values ('High', 'Low', 'Medium', 'Nan'), repeated across rows. Please see the attachment above.
Recent update (16/12/2021): I have done one-hot encoding and now have 19 features along with the X column. I now need to know how to go ahead with the clustering part for such an unsupervised data set.
Regarding the question of which encoding to use, I found that this article gives a good understanding of when to use label encoding or one-hot encoding:
https://www.analyticsvidhya.com/blog/2020/03/one-hot-encoding-vs-label-encoding-using-scikit-learn/
In your case, as your data has an ordinal order (high > medium > low > nan), I would suggest using label encoding.
Regarding the clustering part: you have identified three different clusters. Do you want to identify which samples belong to which cluster, or what is your goal?
You could start by training a model with 3 cluster centroids, as you have identified yourself, but you could also use the elbow method to find an optimal number of clusters for your dataset (https://www.geeksforgeeks.org/elbow-method-for-optimal-value-of-k-in-kmeans/).
For label encoding on a column in your dataframe:
encoding_dict = {}

def label_encode(string_value):
    # Sets a numerical value for your string value
    num_value = encoding_dict.setdefault(string_value, len(encoding_dict))
    return num_value

for col in dataframe.columns:
    if dataframe[col].dtype == object:  # indicates string
        dataframe[col] = dataframe[col].apply(label_encode)
        encoding_dict = {}  # Reset encoding dict to not reuse same values (or don't if you always have same values)
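One caveat with assigning codes in order of first appearance is that the numbers need not follow high > medium > low. If the ordinal order matters for clustering, an explicit mapping may be safer. The sketch below assumes the values are spelled exactly as in the question and that 'Nan' should rank lowest; both are assumptions, and the sample data is made up.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
dataframe = pd.DataFrame({
    "X": [101, 102, 103, 104],
    "x1": ["High", "Low", "Medium", "Nan"],
    "x2": ["Low", "Low", "High", "Medium"],
})

# Explicit ordinal order; placing 'Nan' at 0 is an assumption.
order = {"Nan": 0, "Low": 1, "Medium": 2, "High": 3}

# Encode every column except the customer id.
feature_cols = [c for c in dataframe.columns if c != "X"]
encoded = dataframe.copy()
for col in feature_cols:
    encoded[col] = dataframe[col].map(order)
```

The resulting numeric columns can then be passed to whichever clustering method you choose, leaving X out as an identifier.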

How to determine if a point is above or below a limit?

I have a program written in vb.net that creates a graph and draws various lines on it parsed from an XML file. Each line defines if points on the graph must be above or below it.
Simply put, I am looking for a way to find the closest number ABOVE and BELOW a certain point.
So say we have a straight line {(0,0)(1,1)(2,2)(3,3)}
and a point we want to validate, (1.5,4). Say this point needs to be ABOVE the line.
Also, I should mention that the line may not always be straight; it may have many segments representing a curve.
I suspect the easiest way to do this is to find the 2 points on the line surrounding our point on the x axis, get the slope between them and then interpolate.
So I tried this:
pointBelow = validationLine.points.Aggregate(Function(x, y) If(Math.Abs(x.X - paramPoint.XValue) < Math.Abs(y.X - paramPoint.YValues(0)), x, y))
pointAbove = validationLine.points.Aggregate(Function(x, y) If(Math.Abs(x.X - paramPoint.XValue) < Math.Abs(y.X - paramPoint.YValues(0)), x, y))
As you can see, these will obviously both return the same value, so I would like to know how I can search for the closest number in a list BELOW a given value, then do the same thing but search ABOVE that value.
P.S. it is also possible that the point we are validating may be at the exact same place on the x axis as one of the vertices on our line and I am looking for a solution that will solve this regardless.
Sorry but it's too long to comment.
It depends on the line that you are comparing it to... if your line is a function, it means it will never 'go backwards', and you just have to compare the Y-values of the point and your line at point X.
If it's not a function, then it's harder, and maybe you should ask that question on a math Q&A site, like https://mathematica.stackexchange.com/
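For the case where the line is a function of x, the interpolation idea from the question can be sketched as follows. This is in Python rather than VB.NET, purely to show the logic; the helper names line_y_at and is_above are made up for illustration, and the vertices are assumed to be sorted by strictly increasing x.

```python
from bisect import bisect_left

def line_y_at(points, x):
    """Linearly interpolate the y value of a polyline at x.

    `points` must be sorted by x, with strictly increasing x values.
    """
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the line's range")
    i = bisect_left(xs, x)      # index of the first vertex with x-value >= x
    if xs[i] == x:              # x falls exactly on a vertex
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def is_above(points, point):
    """True if `point` lies strictly above the polyline."""
    x, y = point
    return y > line_y_at(points, x)

line = [(0, 0), (1, 1), (2, 2), (3, 3)]
result = is_above(line, (1.5, 4))
```

The binary search finds the vertex at or just after the point's x value, so the vertex before it is the neighbour "below". A point that shares its x value with a vertex is handled by the equality check.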

Smart indexing for matching values in Julia

Suppose I have the following:
# valuesToFind: n x 1 vector
# allValues: m x n matrix, in which every column allValues[:,i]
# contains among its components exactly 1 instance of the
# corresponding value valuesToFind[i] at some row position
I am trying to determine the position (row index) at which this match occurs for every value in valuesToFind and, currently, I achieve it with the following loop:
idx = Array(Int16, length(valuesToFind))
for (i, v) in enumerate(valuesToFind)
    idx[i] = findfirst(allValues[:,i], v)
end
Is it possible to do this without loop in a single statement?
Are you looking for:
[findfirst(allValues[:,i], v) for (i,v) in enumerate(valuesToFind)]
?
I'm not 100% convinced this is clearer (in terms of code readability) than a simple loop, but it would do the job in one line if that's what you're after.
Try:
map(x->ind2sub(allValues,x)[1],findin(allValues,valuesToFind))
This is a one-line solution to get the row number of each value in the column. Note that it relies on the assumption laid out in the question (a unique value in each column). It also relies on the column-major layout of a matrix. The layout assumption can be removed with a sort on the first index returned by ind2sub.
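For comparison only, the same column-wise lookup can be written without an explicit loop in Python with NumPy. This is not Julia code. The array contents are made up, and the snake_case names simply mirror the variables from the question.

```python
import numpy as np

# Hypothetical example data: m = 3 rows, n = 2 columns.
all_values = np.array([[10, 40],
                       [20, 50],
                       [30, 60]])
values_to_find = np.array([20, 60])

# Broadcasting compares column i with values_to_find[i]; argmax then returns
# the first True in each column. Indices are 0-based, unlike Julia's.
row_idx = (all_values == values_to_find).argmax(axis=0)
```

This relies on the guarantee from the question that every column contains its value; argmax would return 0 for a column with no match.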

How to add points in order along a stream reach in ArcGIS?

I have a stream network in ArcGIS - i.e. a series of polylines, and along each stream part I have added points. For each of the points I have extracted the height and flow from underlying rasters and I have also extracted data from the intersecting polylines including minimum, mean and max height of the polyline, the HydroID and the nextdownID. The points also have their own ID but I have noticed these are not in order.
What I would like is to add stepID to each of the points, where at the beginning of each river reach (each polyline) the first point is step 1 and this increments upwards until the end of the reach. So if there were 10 points along a polyline, the first point would have a stepID value of 1 and the last point would have a stepID value of 10.
This sounds quite easy but not sure how to do it. Any help would be great.
You can construct points along the line at specific intervals using the construct points tool/function.
Click the Edit tool on the Editor toolbar.
Click the line feature along which you want to generate points.
Click the Editor menu and click Construct Points.
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//001t00000029000000.htm
To automate the numbering, you might look into flipping the lines so all the tails point in one direction - up or downstream. Double click on a line, then right click to see the "flip" command. If you use the points set up from the method above, it might order from tail to head.
Another option is to create your own field for the stepID. Create an attribute join to the stream segment, and give each joined record a unique number. Go through your records selecting each group of ten, sort by FID (check these are in order), then calculate the value stepID = FID - x,
where x = the lowest FID in the stream segment (use the lowest FID minus 1 if the numbering should start at 1 rather than 0). This thought might help you figure out how to coax the numbers out correctly.
I had this problem before and solved it this way. It is NOT a pretty solution. Would love to hear if there is a more elegant way of doing this.
For clarity I'll call the pointdataset you mention the 'inputpoints'.
Step 1: getting the points in the right order
If your inputpoints are sometimes far away from the lines, first project them to your lines.
Give your lines a unique line number and join it to the closest inputpoint features.
Generate points along lines: use your polylines and generate a lot of points on them. I'll call this dataset the helperpoints. Fill in a distance that is smaller than the smallest distance between two of your inputpoints.
Make sure your polylines have the right 'direction'. You can check it by using a symbology with arrows, and if needed correct it with the flip editing tool.
Add an IDfield to your helperpoints, type float or double, and create sequential idnumbers in it (https://support.esri.com/en/technical-article/000011137).
Spatial join: the inputpoints are your target, the helperpoints the join features. Keep all the target features. You only need to join the IDfield from the helperpoints. Right click the IDfield in the field map, and make the merge rule 'Mean'. Set the Match option to 'within a distance', and make the search radius 1.5 x the distance that you used in the generate points along line step.
Use the Sort tool and sort your spatial join output on the IDfield you just added, then on the lineID you added in step one. If you have the Advanced licence you can sort on both fields at once.
Step 2: Generating the StepID
Add a new field to your sort output, and call it StepID
Use the field calculator to fill it. I used this code to make the numbering restart every time there is a new line.
rec = 0
oldid = -1
def autoIncrement(lineid):
    global rec
    global oldid
    pStart = 1
    pInterval = 1
    # Restart the counter on the first record and whenever the line changes
    if rec == 0 or lineid != oldid:
        rec = pStart
    else:
        rec += pInterval
    oldid = lineid
    return int(rec)
Expression: autoIncrement( !lineID! )
Expression type: Python
It might still mess up if you have lines very close to each other, or have weird curls on the end. But for the rest this should work!
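Outside the field calculator, the same restart-per-line numbering can be sketched in plain Python. The rows below are made-up (lineID, helper ID) pairs, and they are assumed to be sorted by line and then by position along the line, as in the sort step above.

```python
from itertools import groupby

# Hypothetical (lineID, helper_id) pairs, already sorted by lineID and then
# by position along the line.
rows = [(7, 1.0), (7, 2.5), (7, 4.0), (9, 5.0), (9, 6.5)]

step_ids = []
for _, group in groupby(rows, key=lambda row: row[0]):
    # enumerate restarts at 1 for every new lineID
    step_ids.extend(step for step, _ in enumerate(group, start=1))
```

Like the field calculator version, this only works if all records of one line sit next to each other in the input.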

Postgresql method for finding the slope of a line and forcing through origin

I have a temporary table with 2 numeric columns, Y and X.
CREATE TEMP TABLE findslope(y numeric,x numeric);
I then populate it with the X and Y values for the line I'm trying to fit with a least-squares best-fit line, for which I am currently using the following:
SELECT REGR_SLOPE(y, x) slope FROM findslope into slope_variable;
This works well, but is it possible to force the line through a point or to set the intercept? Specifically I'd like the line to go through the origin: 0,0.
Below is the working code:
SELECT
sum(y*x) / sum(x*x) as slope
FROM findslope INTO slope_variable
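It's been a while since I had to get this sort of math right, so verify this on your own. I'd start with Simple linear regression on wiki.
I think this will do it:
SELECT regr_sxy(y,x)/regr_sxy(x,x) AS slope FROM findslope INTO slope_variable;
(Do verify it: regr_sxy works on mean-centered sums, so for rows without NULLs this ratio gives the same value as REGR_SLOPE(y, x) rather than a fit forced through the origin.)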
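To check the math independently of the database: for a line y = b*x with no intercept, minimizing sum((y - b*x)^2) over b gives b = sum(x*y) / sum(x*x). The small Python sketch below compares that with the ordinary least-squares slope (the mean-centered formula, which is what REGR_SLOPE computes). The function names and sample data are made up for illustration.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope of y = b*x (no intercept): sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def slope_with_intercept(xs, ys):
    """Ordinary least-squares slope, using mean-centered sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]  # points on y = 2x + 1

b_origin = slope_through_origin(xs, ys)   # 34 / 14, steeper than 2
b_ols = slope_with_intercept(xs, ys)      # 2.0
```

Because the sample points have a non-zero intercept, the two slopes differ, which is a quick way to tell which formula a given query implements.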