Weighted Least Squares to find Best Fit Line

I am trying to find the line of best fit for a number of data points (x, y). I am using the method of least squares as described here: http://faculty.cs.niu.edu/~hutchins/csci230/best-fit.htm
However, I need to adjust the algorithm so that I can add a weight to each data point, so that the line leans toward the points that are more important than others in the data set.
Please note that I only know very basic statistics, so keep in mind that you are explaining to someone with basic knowledge when answering.
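For reference, a minimal sketch of one common way to add weights to a least-squares line fit, written in Python. The function name `weighted_line_fit` and its arguments are made up for this illustration. It minimizes the sum of w_i * (y_i - (m*x_i + b))^2, which amounts to replacing the plain averages in the ordinary method with weighted averages. A point with weight 2 then pulls on the line exactly as if it appeared twice in the data.

```python
def weighted_line_fit(xs, ys, ws):
    """Fit y = m*x + b by minimizing sum(w_i * (y_i - (m*x_i + b))**2).

    xs, ys: coordinates of the data points; ws: non-negative weights.
    Returns (m, b). Hypothetical helper written for this illustration.
    """
    sw = sum(ws)
    if sw <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted means of x and y.
    mean_x = sum(w * x for w, x in zip(ws, xs)) / sw
    mean_y = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted spread of x, and weighted co-variation of x and y.
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    if sxx == 0:
        raise ValueError("all weighted x values are identical; the line is vertical")
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b
```

Setting every weight to 1 gives the same line as the unweighted method, and setting a weight to 0 removes that point from the fit.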

Related

Information about CGAL and alternatives

I'm working on a problem that will eventually run in an embedded microcontroller (ESP8266). I need to perform some fairly simple operations on linear equations. I don't need much, but do need to be able to work with points and linear equations to:
Define an equation for a line either from two known points, or one point and a gradient
Calculate a new x,y point on an equation line that is a specific distance from another point on that equation line
Drop a perpendicular onto an equation line from a point
Perform variations of cosine-rule calculations on points and triangle sides defined as equations
I roughed out some code for this a while ago based on high school "y = mx + c" concepts, but it's flawed (it fails with infinities when lines are vertical), and it is currently in Scala. Since I suspect I'm reinventing a wheel that's not my primary goal, I'd like to use someone else's work for this!
I've come across CGAL, and it seems very likely it's capable of all this and more, but I have two questions about it (given that it seems to take ages to get enough understanding of this kind of huge library to actually be able to answer simple questions!)
It seems to assert some kind of mathematical perfection in its calculations, but that's not important to me, and my system will be severely memory constrained. Does it use/offer memory efficient approximations?
Is it possible (and hopefully easy) to separate out just a limited subset of features, or am I going to find the entire library (or even a very large subset) heading into my memory limited machine?
And, I suppose the inevitable follow up: are there more suitable libraries I'm unaware of?
TIA!
The problems that you are mentioning sound fairly simple indeed, so I'm wondering if you really need any library at all. Maybe if you post your original code we could help you fix it--your problem sounds like you need to redo a calculation avoiding a division by zero.
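As a rough illustration of avoiding that division by zero (a sketch only, in Python rather than the original Scala, with made-up function names): storing a line as a*x + b*y + c = 0 with (a, b) scaled to unit length, instead of as y = mx + c, means a vertical line is simply a = ±1, b = 0 and no gradient ever has to be computed. A direction vector stands in for the gradient here, since a vertical line has no finite gradient.

```python
import math

def line_through(p, q):
    """Line a*x + b*y + c = 0 through distinct points p and q, with a*a + b*b == 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y1 - y2, x2 - x1
    n = math.hypot(a, b)
    if n == 0:
        raise ValueError("points must be distinct")
    a, b = a / n, b / n
    return a, b, -(a * x1 + b * y1)

def line_from_point_direction(p, d):
    """Line through p running along direction vector d = (dx, dy)."""
    dx, dy = d
    n = math.hypot(dx, dy)
    if n == 0:
        raise ValueError("direction must be non-zero")
    a, b = -dy / n, dx / n
    return a, b, -(a * p[0] + b * p[1])

def foot_of_perpendicular(line, p):
    """Point on the line closest to p."""
    a, b, c = line
    d = a * p[0] + b * p[1] + c  # signed distance, because (a, b) has unit length
    return (p[0] - a * d, p[1] - b * d)

def point_at_distance(line, p, dist):
    """Point at signed distance dist from p along the line (p is assumed to lie on it)."""
    a, b, _ = line
    # (b, -a) is a unit vector along the line.
    return (p[0] + b * dist, p[1] - a * dist)
```

For example, `line_through((2, 0), (2, 5))` describes the vertical line x = 2 without any special casing, and `foot_of_perpendicular` on that line works the same way as for any other line.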
As for your point (2) about separating a limited number of features from CGAL: given the size and the coding style of that project, in my experience that will be significantly more complicated (if at all possible) than fixing your own code.
In case you want to try a simpler library than CGAL, maybe you could try Boost.Geometry.
Regards,

Continuous modification of a set of points - find all nearest neighbors

I have a 3D set of points. These points will undergo a series of tiny perturbations (all points will be perturbed at once). Example: if I have 100 points in a box, each point may be moved up to, but no more than 0.2% of the box width in each iteration of my program.
After each perturbation operation, I want to know the new distance to each point's nearest neighbor.
This needs to use a very fast data structure; I'm optimizing this for speed. It's a somewhat tricky problem because I'm modifying all points at once. Approximate NN algorithms are not suitable for this problem.
I feel like the answer is somewhere between kd-trees and Voronoi tessellations, but I am not an expert on data structures, so I am baffled about what to do. I am sure this is a very hard problem that would require a lot of research to reach a truly optimal solution, but even something fairly optimal will work for me.
Thanks
You can try a quadkey or a monster curve (a space-filling curve). It reduces the dimension and fills the plane. Microsoft's Bing Maps quadkey is a good place to start learning.
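A minimal sketch of the quadkey idea in Python, assuming the usual Bing Maps scheme in which each digit of the key interleaves one bit of the tile X index and one bit of the tile Y index. The helper `point_to_quadkey` and its unit-square convention are assumptions for this illustration, and the sketch is 2D only. A 3D version would interleave three coordinates in the same way. Note that points sharing a long key prefix are close together, but close points do not always share a prefix, so an exact nearest-neighbour search still has to check neighbouring cells.

```python
def tile_to_quadkey(tile_x, tile_y, level):
    """Interleave the bits of tile_x and tile_y into a base-4 string of length `level`."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)

def point_to_quadkey(x, y, level):
    """Quadkey of a point with x and y in [0, 1) (hypothetical convention for this sketch)."""
    n = 1 << level  # number of tiles per axis at this level
    tile_x = min(int(x * n), n - 1)
    tile_y = min(int(y * n), n - 1)
    return tile_to_quadkey(tile_x, tile_y, level)
```

Sorting points by their key groups spatially nearby points together, which is what makes the key usable as a one-dimensional index.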

Best fixed rectangular area fit over points

I'm using Google Maps and I'm trying to work out the maximum number of points visible in the viewport at a given zoom level.
My naive approach is to get the viewing area (in coordinates) and use that as a "fitting rectangle" and see how many points fit in the area.
I had a look around but I couldn't find any algorithm for "best fit" of random points in a rectangular area.
It seems a quite common problem so I probably don't know the right keywords to use.
Any help in getting me to a solution would be appreciated.
EDIT: thanks for the answers but I'm afraid I didn't make myself clear. Fitting a rectangle over ALL the points is pretty much a trivial affair (sort them all, get the min/max and voilà).
What I want to know is the maximum number of points that can be fit under a FIXED SIZED rectangle: I've got all my points and a "moving window" of fixed size and I want to know how many points I can fit in.
Sorry for the bad initial explanation.
Cheers.
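For the fixed-size "moving window" version of the question, one straightforward (not the fastest) approach can be sketched in Python as follows. It relies on the observation that any placement of the window can be slid right until its left edge touches a point without losing any points, so only left edges at point x values need to be tried. The function name is made up for this illustration, and the coordinates are assumed to be planar (for map data that means projected coordinates, not raw latitude/longitude).

```python
def max_points_in_window(points, width, height):
    """Largest number of points that fit in an axis-aligned width x height rectangle.

    Points on the rectangle's edges count as inside. Roughly O(n^2 log n).
    """
    best = 0
    pts = sorted(points)  # sorted by x, then y
    for i, (x0, _) in enumerate(pts):
        # Window with its left edge at x0: keep the y values of points in the x range.
        ys = sorted(y for x, y in pts[i:] if x <= x0 + width)
        # Slide a window of the given height over the sorted y values.
        lo = 0
        for hi in range(len(ys)):
            while ys[hi] - ys[lo] > height:
                lo += 1
            best = max(best, hi - lo + 1)
    return best
```

The function returns only the count. Recording `x0` and `ys[lo]` whenever `best` improves would also give a position of the best window.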
To find a best-fit rectangle over a set of points, and with the assumption that all points in the set need to be within the rectangle, all you need to do is find the min/max in both dimensions.
One way to do this would be to sort the points by their X dimension and take the first and last as the min/max in that dimension, and then repeat the process in the Y dimension to get that min/max. From that information, you have all you need to make a rectangle.
From a computational complexity standpoint, the complexity is 2x the complexity of the sort algorithm used (since you have to sort 2 times) + the complexity of getting the first and last elements of each sorted set, which, if you use an array, for example, is an O(1) operation.
If you use merge sort, and sort into arrays, you have an overall complexity of O(n log n). Broken down into number of operations, you have 2(n log n) + 4.
This won't give you the tightest fit on the set of points, because it doesn't ensure that one side of the rectangle is collinear with at least 2 of the points (for that you will need the Rotating Calipers algorithm that @Bart Kiers suggests). It is a much faster algorithm, though, since rotating calipers does essentially the same as I have described here, but then rotates the rectangle until one of its edges lines up with 2 of the min/max points.
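The min/max approach can be sketched in a few lines of Python (the function name is made up for this illustration). The sketch takes the minimum and maximum directly instead of sorting, which needs only a linear pass over the points and gives the same rectangle.

```python
def bounding_box(points):
    """Axis-aligned bounding box of a non-empty list of (x, y) points.

    Returns (min_x, min_y, max_x, max_y).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)
```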

How can I distribute a number of values Normally in Excel VBA

Sorry, I know the question isn't as specific as it could be. I am currently working on a replenishment forecasting system for a clothing company (don't ask why it's in VBA). The module I am currently working on distributes forecasts down to size level. The idea is that the planners can forecast the number to sell, then can specify a ratio between the sizes.
In order to make the interface a bit nicer I was going to give them 4 options: assess trend, manual entry, Poisson and Normal. The last two are where I am having an issue. Given a mean and SD I'd like to drop in a ratio (preferably as %s) between the different sizes. The number of sizes can vary from 1 to ~30, so it's going to need to be a calculation.
If anyone could point me towards a method I'd be eternally grateful. Likewise if you have suggestions for a better method.
Cheers
For the sake of anyone searching for this: whilst it is only a temporary solution, I used probability mass functions to get the ratios. This allowed the user to modify the mean and SD and thus skew the curve as they wished. I could then use the ratios for my calculations. Poisson also worked with this method, but turned out to be a slightly stupid idea in terms of choice.
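A minimal sketch of that idea, in Python rather than VBA, and not necessarily the exact calculation used above. It assumes the sizes are numbered 1 to n in order, evaluates the normal curve at each size position and rescales the values so they sum to 1. The function name and the 1..n numbering are assumptions made for this illustration. In Excel the same curve values are available from the NORM.DIST worksheet function with its cumulative argument set to FALSE.

```python
import math

def normal_size_ratios(n_sizes, mean, sd):
    """Ratios (summing to 1) for size positions 1..n_sizes, shaped like a normal curve."""
    if n_sizes < 1 or sd <= 0:
        raise ValueError("need at least one size and a positive SD")
    # Unnormalized normal density at each size position; the constant factor
    # cancels out when the values are rescaled below.
    weights = [math.exp(-0.5 * ((i - mean) / sd) ** 2) for i in range(1, n_sizes + 1)]
    total = sum(weights)
    if total == 0:
        raise ValueError("mean is too far from the size range for this SD")
    return [w / total for w in weights]
```

Multiplying each ratio by 100 gives percentages, and multiplying by the forecast total gives a quantity per size (which will still need rounding to whole units).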

Minimizing pen lifts in a pen plotter or similar device

I'm looking for references to algorithms for plotting on a mechanical pen plotter.
Specifically, I have a list of straight vectors, each representing a line to be plotted. First I want to remove duplicate vectors, so each line is only plotted once. That's easy enough.
Second, there are many vectors that intersect, sometimes at endpoints, but not always. They can be plotted in any order, but I want to find an order that reduces the number of times the pen must be lifted, preferably to a minimum though I understand that may take a long time to compute, if it's computable at all. Vectors that intersect can be broken into smaller vectors if that helps. But generally, if the pen is moving in a straight line, it's best to keep it moving that way as long as possible. So, two parallel vectors joined end to end could be combined into a single vector, etc.
This sounds like some variety of graph theory problem, but I don't know much about that. Can anyone point me to references or algorithms I need to study? Or maybe example code?
Thanks,
Neil
The problem is a variant of the Chinese postman (route inspection) problem. The basic undirected version can be solved in polynomial time, but several variants, such as the rural postman problem, are NP-hard. The best-known NP-complete problem is the Travelling Salesman problem. What all NP-complete problems have in common is that any of them can be translated into any other in polynomial time. No algorithm is known that solves any of them in time polynomial in the size of the input. (NP stands for "nondeterministic polynomial time", not "non-polynomial".)
For your case I would suggest some simple heuristics. Don't overdo it, just pick anything quite simple like going in a straight line as long as possible and then lift the pen to the closest available starting point and go on from there.
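A rough Python sketch of the "closest available starting point" part of such a heuristic. It is not an optimal solution and it does not implement the straight-line preference. Segments are assumed to be pairs of (x, y) endpoint tuples that can be drawn in either direction, and the function name and `eps` tolerance are made up for this illustration.

```python
import math

def greedy_plot_order(segments, eps=1e-9):
    """Order segments greedily by nearest endpoint.

    Returns (ordered_segments, pen_lifts). Each ordered segment is (start, end)
    in drawing direction. A lift is counted whenever the next segment does not
    start where the pen currently is (within eps).
    """
    remaining = list(segments)
    ordered = []
    lifts = 0
    pos = None  # current pen position; None before the first stroke
    while remaining:
        if pos is None:
            idx, flip = 0, False
        else:
            best = None
            for i, (a, b) in enumerate(remaining):
                # Either endpoint may serve as the start of the stroke.
                for end, flipped in ((a, False), (b, True)):
                    d = math.dist(pos, end)
                    if best is None or d < best[0]:
                        best = (d, i, flipped)
            d, idx, flip = best
            if d > eps:
                lifts += 1
        a, b = remaining.pop(idx)
        if flip:
            a, b = b, a
        ordered.append((a, b))
        pos = b
    return ordered, lifts
```

The inner search is quadratic in the number of segments, which is usually acceptable for a plot of a few thousand lines. A spatial index over the endpoints would speed it up for larger inputs.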