Distance (kernel) function for k-means on temporal geo data

I want to cluster geo data (lat, long, timestamp) with k-means. I'm searching for a good distance (kernel) function, but I can't find a good paper or other sources on the topic. At the moment I multiply the time distance by the spatial distance:
public static double dis(GeoData input1, GeoData input2)
{
    // temporal distance between the two points
    double timeDis = Math.abs(input1.getTime() - input2.getTime());
    // spatial distance, computed in a separate function
    double geoDis = geoDis(input1, input2);
    // current approach: combine the two by multiplication
    return timeDis * geoDis;
}
Does anyone know a good distance function for clustering temporal geo data?

There is already some work on using clustering techniques for geo data. Check this paper, which explains how to use k-means and density-based clustering on geo data.
http://paginas.fe.up.pt/~prodei/dsie12/papers/paper_13.pdf
The important step is to calculate the Euclidean distance in the 3D space (lat, long, timestamp).
I hope this paper helps you understand the approach. Please go through it.
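For illustration, such a 3D Euclidean distance could look like the sketch below. This is not code from the paper; the getLat()/getLong() accessors and the timeScale weight are assumptions, and the weight has to be chosen so that one unit of time is comparable to one degree.
// A sketch only: Euclidean distance over lat, long and a scaled timestamp.
// getLat()/getLong() and timeScale are assumed, not part of the original code.
public static double euclidean3d(GeoData a, GeoData b, double timeScale)
{
    double dLat  = a.getLat()  - b.getLat();
    double dLong = a.getLong() - b.getLong();
    double dTime = (a.getTime() - b.getTime()) * timeScale;
    return Math.sqrt(dLat * dLat + dLong * dLong + dTime * dTime);
}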


Optimize "1D" bin packing/sheet cutting

Our use case could be described as a variant of 1D bin packing or sheet cutting.
Imagine a drywall with a beam framing.
We want to optimize the number and size of gypsum boards that would be needed to cover the wall.
Boards must start and end on a beam.
Boards must not overlap (hard constraint).
The fewer (i.e. bigger) the boards, the better (soft constraint).
What we currently do:
Pre-generate all possible boards and pass them as problem facts.
Let the solver pick the best subset of those (nullable planning variable).
First Fit Decreasing + Simulated Annealing
Even relatively small walls (~6 m, fewer than 20 possible boards to pick from) sometimes take minutes, and while we mostly get a feasible solution, it's rarely optimal.
Is there a better way to model that?
EDIT
Our current domain model looks like the following. Please note that the planning entity only holds the selected/picked material but nothing else. I.e. currently our planning entities are all equal, which kind of prevents any optimization that depends on planning entity difficulty.
data class Assignment(
    @PlanningId
    private val id: Long? = null,

    @PlanningVariable(
        valueRangeProviderRefs = ["materials"],
        strengthComparatorClass = MaterialStrengthComparator::class,
        nullable = true
    )
    var material: Material? = null
)

data class Material(
    val start: Double,
    val stop: Double,
)
Activate (sub)pillar change and swap move selectors. See the OptaPlanner docs section about move selectors (move neighborhoods). The default moves (single swap and single change) are probably getting stuck in local optima (and even though SA helps them escape those, the escapes are probably not efficient enough).
That should help, but a custom move that swaps two subpillars of almost the same size might improve efficiency further.
Also, as you're using SA (Simulated Annealing), know that SA is parameter sensitive. Use optaplanner-benchmark to try multiple SA starting-temperature parameters with different dataset sizes. Also compare it to plain LA (Late Acceptance) in the benchmarks. LA isn't as fickle as SA can be. (By fickle I don't mean unstable; I mean parameter tweaking that is potentially sensitive to dataset size.)

Setting initial values for non-linear parameters via tabuSearch

I'm trying to fit the LPPL model to the KLSE index to predict the most probable crash time. Many papers suggest tabu search to identify the initial values for the non-linear parameters, but none of them publish their code. I have tried to fit the index using NLS and the Log-Periodic Power Law (LPPL) model in R, but the resulting errors and p-values are not significant. I believe the initial values are not accurate. Can anyone help me figure out how to find proper initial values?
library(tseries)
library(zoo)

ts <- get.hist.quote(instrument = "^KLSE", start = "2003-04-18", end = "2008-01-30",
                     quote = "Close", provider = "yahoo", origin = "1970-01-01",
                     compression = "d", retclass = "zoo")
df <- data.frame(ts)
df <- data.frame(Date = as.Date(rownames(df)), Y = df$Close)
df <- df[!is.na(df$Y), ]

library(minpack.lm)
library(ggplot2)

df$days <- as.numeric(df$Date - df[1, ]$Date)

# LPPL model: a + (tc - t)^m * (b + c * cos(omega * log(tc - t) + phi))
f <- function(pars, xx) {
  pars$a + (pars$tc - xx)^pars$m * (pars$b + pars$c * cos(pars$omega * log(pars$tc - xx) + pars$phi))
}
# residuals between the observed series and the model
resids <- function(p, observed, xx) { observed - f(p, xx) }

nls.out <- nls.lm(par = list(a = 600, b = -266, tc = 3000, m = 0.5, omega = 7.8, phi = -4, c = -14),
                  fn = resids, observed = df$Y, xx = df$days,
                  control = nls.lm.control(maxiter = 1024, ftol = 1e-6, maxfev = 1e6))
par <- nls.out$par

nls.final <- nls(Y ~ (a + (tc - days)^m * (b + c * cos(omega * log(tc - days) + phi))),
                 data = df, start = par, algorithm = "plinear",
                 control = nls.control(maxiter = 10024, minFactor = 1e-8))
summary(nls.final)
I would look at some of the newer research on this topic; there is a good trigonometric modification that will practically guarantee a singular optimization. Additionally, you can use R's built-in linear equation solver to find the linearizable parameters, so you only need to optimize over 3 dimensions. The link below should get you started. Based on recent literature and personal experience, I would strongly advise against a tabu search.
https://www.ethz.ch/content/dam/ethz/special-interest/mtec/chair-of-entrepreneurial-risks-dam/documents/dissertation/master%20thesis/MAS_final_Tuncay.pdf
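For reference, the trigonometric reformulation alluded to above (often attributed to Filimonov and Sornette) absorbs the phase into two extra linear coefficients:

y(t) = A + (t_c - t)^m \left[ B + C \cos\big(\omega \ln(t_c - t) + \phi\big) \right]
     = A + B\,(t_c - t)^m + C_1\,(t_c - t)^m \cos\big(\omega \ln(t_c - t)\big) + C_2\,(t_c - t)^m \sin\big(\omega \ln(t_c - t)\big),

with C_1 = C\cos\phi and C_2 = -C\sin\phi. For fixed (t_c, m, \omega) the model is linear in (A, B, C_1, C_2), so those four can be obtained with an ordinary least-squares solve (e.g. lm() in R), leaving only (t_c, m, \omega) for the non-linear search.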

How to get user location using accelerometer, gyroscope, and magnetometer in iPhone?

The simple equations for user location using the built-in inertial measurement unit (IMU), an approach also called pedestrian dead reckoning (PDR), are:
x = x(previous) + step length * sin(heading direction)
y = y(previous) + step length * cos(heading direction)
We can use the motionManager property (a CMMotionManager instance) to access raw values from the accelerometer, gyroscope, and magnetometer. We can also get attitude values as roll, pitch, and yaw. The step length can be estimated as the double square root of the acceleration. However, I'm confused about the heading direction. Some of the published literature uses a combination of magnetometer and gyroscope data to estimate the heading direction. I can see that CLHeading also gives heading information. There are some online tutorials, especially for the Android platform, like this one, that estimate user location. However, they don't give any proper mathematical explanation.
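As a minimal illustration of the update described above (a sketch in Java rather than the app's Objective-C; the PdrSketch class, the fuseHeading helper and the blending weight alpha are assumptions for illustration, not something taken from the cited literature):
// Hypothetical helper, not the question's app code.
public class PdrSketch {
    // Blend a gyro-integrated heading with the magnetometer heading
    // (simple complementary filter). All angles in radians.
    static double fuseHeading(double prevHeading, double gyroRateZ, double dt,
                              double magHeading, double alpha) {
        double gyroHeading = prevHeading + gyroRateZ * dt; // integrate rotation rate over dt
        // with alpha close to 1 the gyro dominates short-term and the
        // magnetometer slowly corrects its drift
        return alpha * gyroHeading + (1.0 - alpha) * magHeading;
    }

    // One dead-reckoning step, exactly the equations from the question:
    // x += L * sin(theta), y += L * cos(theta)
    static double[] pdrStep(double x, double y, double stepLength, double heading) {
        return new double[] { x + stepLength * Math.sin(heading),
                              y + stepLength * Math.cos(heading) };
    }
}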
I've followed many online resources like this, this, this, and this to make a PDR app. My app can detect steps and gives the step length properly; however, its output is full of errors. I think the error is due to the lack of a proper heading direction. I've used the following relation to get the heading direction from the magnetometer:
magnetometerHeading = atan2(-self.motionManager.magnetometerData.magneticField.y, self.motionManager.magnetometerData.magneticField.x);
Similarly, from gyroscope:
gyroscopeHeading += -self.motionManager.gyroData.rotationRate.z * 180 / M_PI;
Finally, I give proportional weights to the previous heading direction, gyroscopeHeading, and magnetometerHeading as follows:
headingDirection = (2 * headingDirection / 5) + (magnetometerHeading / 5) + (2 * gyroscopeHeading / 5);
I followed this method from a published journal paper. However, I'm getting lots of errors in my work. Is my approach wrong? What exactly should I do to get a proper heading direction such that the localization estimation error is minimal?
Any help would be appreciated.
Thank you.
EDIT
I noticed that while calculating the heading direction from the gyroscope data, I didn't multiply the rotation rate (which is in radians/sec) by the delta time. To fix this, I added the following code:
CMDeviceMotion *motion = self.motionManager.deviceMotion;
[_motionManager startDeviceMotionUpdates];
if (!previousTime)
    previousTime = motion.timestamp;
double deltaTime = motion.timestamp - previousTime;
previousTime = motion.timestamp;
Then I updated the gyroscope heading with:
gyroscopeHeading += -self.motionManager.gyroData.rotationRate.z * deltaTime * 180 / M_PI;
The localization result is still not close to the real location. Is my approach correct?

How to use Redis and geo proximity search to find two users at the same location?

I want to implement a service that, given users' geo coordinates, can detect whether two users are at the very same location in real time.
In order to do this in real time and at scale, it seems I should go with a distributed in-memory datastore like Redis. I have researched geohashing, but the problem is that points close to each other may not always share the same hash prefix. And geohashing may be overkill, since I'm only interested in whether two users are close enough to be standing next to each other.
The simple solution, of course, is just to test whether pairs of geo coordinates fall within a small distance of each other. But AFAIK, Redis and other in-memory datastores don't have the geospatial indexing to support that kind of look-up.
What is the best way to go about implementing this?
This functionality is baked into Redis 3.2+.
But for older versions the problem still exists. I've taken Yin Qiwen's answer and created a module for Node, and you can see how it uses Redis by examining the code. His instructions are perfect and I was able to follow them for great results.
https://github.com/arjunmehta/node-georedis
The same algorithm is essentially what is used for the native commands.
It is very fast, and avoids any kind of intersections/haversine type operations. The coolest thing (I think) about Yin Qiwen's method is that the most computationally intense parts of the algorithm can be distributed to clients (instead of all happening in the DB or on the server).
It's not 100% precise and uses preconfigured distance steps, but for most applications you won't need exact precision I'd imagine.
I've also paraphrased Yin Qiwen's article at the GIS stack exchange.
Sorry for all the linkage. :P
Generally, this can be done with geohashes and Redis's sorted sets. Here is a design I wrote a while back describing how to implement a spatial index service on Redis.
https://github.com/yinqiwen/ardb/wiki/Spatial-Index
Maybe you can try this one:
Redis Geography Edition
You really want to try it, it works awesome.
:)
I realize this doesn't answer your question... but I don't think that it's the correct tool.
PostgreSQL + PostGIS can perform really, really well. You can configure PostgreSQL to keep as much of the database as will fit in memory.
PostGIS uses (I think) R-tree indexes, so it's incredibly fast at performing the kind of lookup you are interested in.
Using a backend that fires off websocket requests would let you operate pretty much in real time: any time your backend receives a person's GPS coordinates, it performs the spatial lookup and notifies the applicable clients through websockets.
The Redis geography edition mentioned by other answers in this thread has been integrated into Redis since version 3.2 (also see this earlier comment).
You can find the new commands here (in beta for now):
GEOADD
GEODIST
GEOHASH
GEOPOS
GEORADIUS
GEORADIUSBYMEMBER
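As a minimal illustration of how these commands map onto the original question (detecting whether two users are at the same spot), here is a sketch using the Jedis client. It assumes a Jedis version that exposes the geo commands (roughly 2.8+; the GeoUnit import path differs between Jedis versions), and the 5 m threshold is an arbitrary choice.
import redis.clients.jedis.GeoUnit;
import redis.clients.jedis.Jedis;

public class SameSpotCheck {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // GEOADD key longitude latitude member
            jedis.geoadd("users", 13.361389, 38.115556, "alice");
            jedis.geoadd("users", 13.361390, 38.115558, "bob");
            // GEODIST key member1 member2 unit -> distance in metres
            Double metres = jedis.geodist("users", "alice", "bob", GeoUnit.M);
            boolean sameSpot = metres != null && metres < 5.0;
            System.out.println("distance = " + metres + " m, same spot: " + sameSpot);
        }
    }
}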
The Tarantool database keeps data in memory, pushes it to disk as transaction logs, has an R-tree spatial index (not only 2-dimensional), and supports a number of nice operations on such indexes (containment, overlap, distance).
I use it in a commercial project for storing and querying records that describe objects in 3D space.
http://tarantool.org/doc/book/box/box_index.html
https://github.com/tarantool/tarantool/wiki/R-tree-index-quick-start-and-usage
The standard client and the examples are in Lua, but there are a couple of other clients developed by the database authors. I use the Java client in a Scala application with success.
The database is also very fast. Here's a scientific comparison with other databases (leaving aside the spatial-database aspect):
http://airccse.org/journal/ijdms/papers/6314ijdms01.pdf
I would like to share some sample Java code for the Redis Geography edition.
public void geoadd(String objectId, BigDecimal latitude, BigDecimal longitude) {
    log.info("geoadd(): {} {} {}", objectId, latitude, longitude);
    try (Jedis jedis = jedisPool.getResource()) {
        if (geoaddSha == null) {
            // GEOADD expects longitude first, then latitude, then the member
            String script = "return redis.call('geoadd','" + GEOSET + "', ARGV[1], ARGV[2], KEYS[1])";
            geoaddSha = jedis.scriptLoad(script);
        }
        log.info("geoaddSha: {}", geoaddSha);
        log.info(jedis.evalsha(geoaddSha, 1, objectId, longitude.toString(), latitude.toString()).toString());
    }
}

@SuppressWarnings("unchecked")
public List<String> georadius(BigDecimal latitude, BigDecimal longitude, int radius, Unit unit) {
    log.info("georadius(): {} {} {} {}", latitude, longitude, radius, unit);
    try (Jedis jedis = jedisPool.getResource()) {
        if (georadiusSha == null) {
            // GEORADIUS also takes longitude before latitude
            String script = "return redis.call('georadius','" + GEOSET + "', ARGV[1], ARGV[2], ARGV[3], ARGV[4])";
            georadiusSha = jedis.scriptLoad(script);
        }
        log.info("georadiusSha: {}", georadiusSha);
        List<String> objectIdList = (List<String>) jedis.evalsha(georadiusSha, 0,
                longitude.toString(), latitude.toString(), String.valueOf(radius), unit.toString());
        log.info("objectIdList: {}", objectIdList);
        return objectIdList;
    }
}

public void remove(String objectId) {
    log.info("remove(): {}", objectId);
    try (Jedis jedis = jedisPool.getResource()) {
        jedis.zrem(GEOSET, objectId);
    }
}

How can I compare two NSImages for differences?

I'm attempting to gauge the percentage difference between two images.
Having done a lot of reading, I seem to have a number of options, but I'm not sure which is the best method to follow for:
Ease of coding
Performance.
The methods I've seen are:
Non language specific (academic): Image comparison - fast algorithm
Mac specific: direct pixel access - http://www.markj.net/iphone-uiimage-pixel-color/
Does anyone have any advice about which solutions make the most sense for the above two cases, and any code samples showing how to apply them?
I've had success calculating the difference between two images using the histogram technique mentioned here. redmoskito's answer in the SO question you linked to was actually my inspiration!
The following is an overview of the algorithm I used:
Convert the images to grayscale (compare one channel instead of three).
Divide each image into an n x n grid of "subimages". Then, for each subimage pair:
Calculate their colour composition histograms.
Calculate the absolute difference between the two histograms.
The maximum difference found between two subimages is a measure of the two images' overall difference. Other metrics could also be used (e.g. the average difference between subimages).
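A minimal sketch of the steps above (written in Java purely for illustration; the grid size and bucket count below are arbitrary choices, not values from the blog post):
import java.awt.image.BufferedImage;

public class HistogramDiff {
    static final int GRID = 4, BUCKETS = 64; // assumed tuning values

    // returns a value in [0, 1]: 0 = identical histograms, 1 = completely different
    static double difference(BufferedImage a, BufferedImage b) {
        double worst = 0;
        int cellW = Math.min(a.getWidth(), b.getWidth()) / GRID;
        int cellH = Math.min(a.getHeight(), b.getHeight()) / GRID;
        for (int gy = 0; gy < GRID; gy++)
            for (int gx = 0; gx < GRID; gx++) {
                int[] ha = histogram(a, gx * cellW, gy * cellH, cellW, cellH);
                int[] hb = histogram(b, gx * cellW, gy * cellH, cellW, cellH);
                long diff = 0, total = 0;
                for (int i = 0; i < BUCKETS; i++) {
                    diff += Math.abs(ha[i] - hb[i]);
                    total += ha[i] + hb[i];
                }
                worst = Math.max(worst, total == 0 ? 0 : (double) diff / total);
            }
        return worst;
    }

    // grayscale histogram of one grid cell
    static int[] histogram(BufferedImage img, int x0, int y0, int w, int h) {
        int[] hist = new int[BUCKETS];
        for (int y = y0; y < y0 + h; y++)
            for (int x = x0; x < x0 + w; x++) {
                int rgb = img.getRGB(x, y);
                int gray = ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3;
                hist[gray * BUCKETS / 256]++;
            }
        return hist;
    }
}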
As tskuzzy noted in his answer, if your ultimate goal is a binary "yes, these two images are (roughly) the same" or "no, they're not", you need some meaningful threshold value. You could produce such a value by passing images into the algorithm and tweaking the threshold based on its output and how similar you think the images are. A form of machine learning, I suppose.
I recently wrote a blog post on this very topic, albeit as part of a larger goal. I also created a simple iPhone app to demonstrate the algorithm. You can find the source on GitHub; perhaps it will help?
It is really difficult to suggest something when you don't tell us more about the images or the variations. Are they shapes? Are they different objects and you want to know what class of objects they belong to? Are they the same object and you want to distinguish the object instance? Are they faces? Are they fingerprints? Are the objects in the same pose? Under the same illumination?
When you say performance, what exactly do you mean? How large are the images? All in all it really depends. Given what you've said, if it is only about ease of coding and performance, I would suggest just taking the absolute value of the difference of pixels. That is super easy to code and about as fast as it gets, but it is really unlikely to work for anything other than the most synthetic examples.
That being said, I would like to point you to: DHOG, GLOH, SURF and SIFT.
You can use the fairly basic subtraction technique that the lads above suggested. @carlosdc has hit the nail on the head with regard to the type of image this basic technique can be used for. I have attached an example so you can see the results for yourself.
The first shows an image from a simulation at some time t. A second image, taken some (simulation) time later at t + dt, was subtracted from the first. The subtracted image (shown in black and white for clarity) then shows how the simulation has changed over that time. This was done as described above and is very powerful and easy to code.
Hope this aids you in some way.
This is some old, nasty FORTRAN, but it should give you the basic approach. It is not that difficult at all. Because I am doing it on a two-colour palette, you would do this operation for R, G and B separately. That is: compute the intensities or values in each cell/pixel and store them in an array; do the same for the other image; then subtract one array from the other, which leaves you with a colourful subtraction image. My advice would be to do as the lads suggest above and compute the magnitude of the sum of the R, G and B components so you just get one value per pixel. Write that to an array, do the same for the other image, then subtract. Then create a new range for either R, G or B and map the resulting subtracted array onto it; this will give a much clearer picture as a result.
* =============================================================
      SUBROUTINE SUBTRACT(FNAME1,FNAME2,IOS)
* This routine reads two stored images and subtracts them
* =============================================================
* Common :
      INCLUDE 'CONST.CMN'
      INCLUDE 'IO.CMN'
      INCLUDE 'SYNCH.CMN'
      INCLUDE 'PGP.CMN'
* Input :
      CHARACTER fname1*(sznam),fname2*(sznam)
* Output :
      integer IOS
* Variables:
      logical glue
      character fullname*(szlin)
      character dir*(szlin),ftype*(3)
      integer i,j,nxy1,nxy2
      real si1(2*maxc,2*maxc),si2(2*maxc,2*maxc)
* =================================================================
      IOS = 1
      nomap = .true.
      ftype = 'map'
      dir = './pictures'
! read the first image into si2
      if(.not.glue(dir,fname2,ftype,fullname))then
        write(*,31) fullname
        return
      endif
      OPEN(unit2,status='old',name=fullname,form='unformatted',
     &     err=10,iostat=ios)
      read(unit2,err=11) nxy2
      read(unit2,err=11) rad,dxy
      do i=1,nxy2
        do j=1,nxy2
          read(unit2,err=11) si2(i,j)
        enddo
      enddo
      CLOSE(unit2)
! read the second image into si1
      if(.not.glue(dir,fname1,ftype,fullname))then
        write(*,31) fullname
        return
      endif
      OPEN(unit2,status='old',name=fullname,form='unformatted',
     &     err=10,iostat=ios)
      read(unit2,err=11) nxy1
      read(unit2,err=11) rad,dxy
      do i=1,nxy1
        do j=1,nxy1
          read(unit2,err=11) si1(i,j)
        enddo
      enddo
      CLOSE(unit2)
! subtract the images, element by element
      if(nxy1.eq.nxy2)then
        nxy = nxy1
        do i=1,nxy1
          do j=1,nxy1
            si(i,j) = si2(i,j) - si1(i,j)
          enddo
        enddo
      else
        print *,'SUBTRACT: Different sizes of image arrays'
        IOS = 0
        return
      endif
* normal finishing
      IOS = 0
      nomap = .false.
      return
* exceptional finishing
   10 write(*,30) fullname
      return
   11 write(*,32) fullname
      return
   30 format('Cannot open file ',72A)
   31 format('Improper filename ',72A)
   32 format('Error reading from file ',72A)
      end
! =============================================================
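For reference, the same per-pixel subtraction idea as a minimal sketch in Java (a hypothetical helper, not part of the original code): each pixel is reduced to a single grayscale intensity, the absolute differences are summed, and the result is scaled to a percentage.
import java.awt.image.BufferedImage;

public class PixelDiff {
    static double percentDifference(BufferedImage a, BufferedImage b) {
        int w = Math.min(a.getWidth(), b.getWidth());
        int h = Math.min(a.getHeight(), b.getHeight());
        long sum = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                sum += Math.abs(gray(a.getRGB(x, y)) - gray(b.getRGB(x, y)));
            }
        }
        // 255 is the maximum possible per-pixel difference
        return 100.0 * sum / (255.0 * w * h);
    }

    // collapse R, G and B into one intensity, as suggested above
    static int gray(int rgb) {
        return ((rgb >> 16 & 0xFF) + (rgb >> 8 & 0xFF) + (rgb & 0xFF)) / 3;
    }
}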
Hope this is of some use. All the best.
Of the methods described in your first link, the histogram comparison method is by far the simplest to code and the fastest. However, key point matching will provide far more accurate results, since you want a precise number describing the difference between two images.
To implement the histogram method, I would do the following:
Compute the red, green, and blue histograms of each image
Add up the differences between each bucket
If the difference is above a certain threshold, then the match percentage is 0% (the images are considered different)
Otherwise the colors found in the images are similar, so do a pixel-by-pixel comparison and convert the difference into a percentage.
I don't know of any precise algorithms for finding the key points of an image. However, once you find them for each image, you can do a pixel-by-pixel comparison for each of the key points.