R/exams type num: answer Inf (numeric)

I have a Moodle exercise where the numerical answer is infinity. The XML file is generated without issues; however, when importing the question into Moodle an error occurred.
Here is a simplified version of the exercise.
```{r data generation, echo = FALSE, results = "hide"}
sol<-Inf
```
Question
========
blblblbl
Solution
========
$x=\infty$
Meta-information
================
extype: num
exsolution: `r fmt(sol, 4)`
extol: 0.0001
exname: prob complemento 2
Is it possible to have an infinite answer in a Moodle numerical-type question?
It is important to keep the numerical type, since in some random instances of the exercise the answer could be finite rather than infinite.

My understanding is that there is no proper way to do this in Moodle. Possibly you have already seen this discussion: https://moodle.org/mod/forum/discuss.php?d=406703.
Personally, I would avoid the problem by making sure that the data-generating process always yields a finite result.
Or if you want to have infinity as a possible answer, transform the exercise to single choice (schoice) and make infinity one of the answer options. For turning numeric into single-choice questions, see this R/exams tutorial: http://www.R-exams.org/tutorials/static_num_schoice/.
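If you go the schoice route, `exsolution` becomes a binary pattern over the answer options rather than a number; the `num_to_schoice()` helper from the exams package (covered in the linked tutorial) can generate distractors automatically for finite solutions. A minimal hand-rolled sketch of the data-generation chunk, with made-up distractors:
```r
## Sketch of an schoice variant (the distractor values here are hypothetical).
sol <- Inf
answers <- c("$\\infty$", "$0$", "$1$", "$1/2$")
correct <- c(TRUE, FALSE, FALSE, FALSE)
## For schoice questions, exsolution is a binary string such as "1000":
exsolution <- paste(as.integer(correct), collapse = "")
```
The Question section would then list `answers` (for example via `answerlist()` in a `results = "asis"` chunk) and the Meta-information would use `extype: schoice` with this `exsolution`.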


Is there any "modulo" equivalent representation for XNOR?

I don't know if this is the right home for this question, but since this kind of explanation is used in programming I am posting it here. If I am wrong, please say so in this post and I will move the question to another site.
After studying digital logic I learned that each of the logic gates has an equivalent in programming; AND, OR and NOT have their own operators. For XOR, I have been told that it is equivalent to taking the number of inputs of the gate modulo 2. But what about XNOR? Is there a representation of this kind for XNOR? And does that explanation generalize, the way 'modulo 2' generalizes to 3, 4, or any n inputs? Does this apply to XNOR as well?
Just to be clear: XOR is actually equivalent to the number of true inputs modulo 2. Your original statement (without the word "true") could be misread as counting both true and false inputs.
In any case, since xnor is simply the complement of xor, it's a relatively simple matter to invert the outgoing value by injecting one extra truth value. In pseudo-code, that would be:
def xor(inputs):
    # true (1) when an odd number of inputs are true
    return sum(inputs) % 2

def xnor(inputs):
    # complement of xor: one extra "true" input flips the parity
    return (sum(inputs) + 1) % 2

Setting initial values for non-linear parameters via tabuSearch

I'm trying to fit the LPPL model to the KLSE index to predict the most probable crash time. Many papers suggest tabuSearch to identify initial values for the non-linear parameters, but none of them publish their code. I have tried to fit the index using NLS and the Log-Periodic Power Law (LPPL) in R, but the resulting standard errors and p-values are not significant. I believe the initial values are not accurate. Can anyone help me find proper initial values?
library(tseries)
library(zoo)
library(minpack.lm)
library(ggplot2)

## daily closing prices for the KLSE index
ts <- get.hist.quote(instrument = "^KLSE", start = "2003-04-18", end = "2008-01-30",
                     quote = "Close", provider = "yahoo", origin = "1970-01-01",
                     compression = "d", retclass = "zoo")
df <- data.frame(ts)
df <- data.frame(Date = as.Date(rownames(df)), Y = df$Close)
df <- df[!is.na(df$Y), ]
df$days <- as.numeric(df$Date - df[1, ]$Date)

## LPPL model and residual function for nls.lm()
f <- function(pars, xx) pars$a + (pars$tc - xx)^pars$m * (pars$b + pars$c * cos(pars$omega * log(pars$tc - xx) + pars$phi))
resids <- function(p, observed, xx) observed - f(p, xx)

## Levenberg-Marquardt fit from rough starting values, then refine with nls()
nls.out <- nls.lm(par = list(a = 600, b = -266, tc = 3000, m = 0.5, omega = 7.8, phi = -4, c = -14),
                  fn = resids, observed = df$Y, xx = df$days,
                  control = nls.lm.control(maxiter = 1024, ftol = 1e-6, maxfev = 1e6))
par <- nls.out$par
nls.final <- nls(Y ~ a + (tc - days)^m * (b + c * cos(omega * log(tc - days) + phi)),
                 data = df, start = par, algorithm = "plinear",
                 control = nls.control(maxiter = 10024, minFactor = 1e-8))
summary(nls.final)
I would look at some of the more recent research on this topic: there is a good trigonometric modification of the model that makes the optimization much better behaved. Additionally, you can use R's built-in linear least-squares solver to recover the linear parameters exactly, so you only need to optimize over three dimensions. The link below should get you started. Based on recent literature and personal experience, I would strongly advise against a tabu search.
https://www.ethz.ch/content/dam/ethz/special-interest/mtec/chair-of-entrepreneurial-risks-dam/documents/dissertation/master%20thesis/MAS_final_Tuncay.pdf
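To illustrate the "solve the linear parameters exactly" point, here is a minimal sketch of one common reformulation (my own rough sketch, not code from the linked thesis), reusing the `df` data frame from the question:
```r
## For fixed (tc, m, omega) the LPPL model is linear in A, B, C1, C2
## (writing C*cos(omega*log(tc - t) + phi) as C1*cos(...) + C2*sin(...)),
## so those four can be solved exactly, leaving only three nonlinear parameters.
lppl_rss <- function(pars, t, y) {
  tc <- pars[1]; m <- pars[2]; omega <- pars[3]
  if (tc <= max(t)) return(Inf)            # the critical time must lie beyond the sample
  dt  <- tc - t
  X   <- cbind(1, dt^m, dt^m * cos(omega * log(dt)), dt^m * sin(omega * log(dt)))
  fit <- lm.fit(X, y)                      # linear parameters solved exactly
  sum(fit$residuals^2)
}

## Nelder-Mead over the three nonlinear parameters only (starting values are guesses):
opt <- optim(c(tc = max(df$days) + 30, m = 0.5, omega = 8),
             lppl_rss, t = df$days, y = df$Y, method = "Nelder-Mead")
opt$par
```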

How to find the fixpoint of a loop and why do we need this? [closed]

I know that in the static analysis of programs, we need to find a fixpoint in order to analyze the information a loop provides.
I have read the Wikipedia article as well as the related material in the book Secure Programming with Static Analysis.
But I am still confused by the concept of a fixpoint, so my questions are:
Could anyone give me an explanation of the concept of a fixpoint?
What are practical ways to find the fixpoint in static analysis?
What information can we get after finding the fixpoint?
Thank you!
Conceptually, the fixpoint corresponds to the most information you can obtain about the loop by repeatedly iterating it on some set of abstract values. I'm going to guess that by "static analysis" you're referring here to "data flow analysis" or the version of "abstract interpretation" that most closely follows data flow analysis: a simulation of program execution using abstractions of the possible program states at each point. (Model checking follows a dual intuition, in that you simulate program states using an abstraction of possible execution paths. Both are approximations of concrete program behavior.)
This "simulation" corresponds to the effect that we know a particular program construct must have on what we already know at a program point. For example, at some point in a program, we may know that x could (a) be uninitialized, or else have its value from statements (b) x = 0 or (c) x = f(5); but after (d) x = 42, its value can only have come from (d). On the other hand, if we have
if ( foo() ) {
x = 42; // (d)
bar();
} else {
baz();
x = x - 1; // (e)
}
then the value of x afterwards might have come from either of (d) or (e).
Now think about what can happen with a loop:
while ( x != 0 ) {
if ( foo() ) {
x = 42; // (d)
bar();
} else {
baz();
x = x - 1; // (e)
}
}
On entry, we have possible definitions of x from {a,b,c}. One pass through the loop means that the possible definitions are instead drawn from {d,e}. But what happens if foo() fails initially so that the loop does not run at all? What are the possibilities for x then? Well, in this case, the loop body has no effect, so the definitions of x would come from {a,b,c}. But if it ran, even once, then the answer is {d,e}. So what we know about x at the end of the loop is only that the loop either ran or it didn't, which means that the assignment to x could be any one of {a,b,c,d,e}: the only safe answer here is the union of the property known at loop entry ({a,b,c}) and the property known at the end of one iteration ({d,e}).
But this also means that we must associate x with {a,b,c,d,e} at the beginning of the loop body, too, since we have no way of determining whether this is the first or the four thousandth time through the loop. So we have to consider again what we can have on loop exit: the union of the loop body's effect with the property assumed to hold on entry to the last iteration. Happily, this is just {a,b,c,d,e} ∪ {d,e} = {a,b,c,d,e}. In other words, we've not obtained any additional information through this second simulation of the loop body, and thus we can stop, since no further simulated iterations will change the result.
That's the fixpoint: the abstraction of the program state that will cause simulation to produce exactly the same result.
Now as for ways to find it, there are many, though the most straightforward ("chaotic iteration") simply runs the simulation of every program point (according to some fair strategy) until the answer doesn't change. A good starting point for learning better algorithms can be found in most any compilers textbook, though it isn't usually taught in a first course. Steven Muchnick's Advanced Compiler Design and Implementation is a more thorough and very readable treatment of the subject. If you can find a copy, Matthew Hecht's Flow Analysis of Computer Programs is another classic treatment. Both books focus on the "data flow analysis" technique for static analysis. You might also try out Principles of Program Analysis, by Nielson/Nielson/Hankin, though the technical details in the book can be pretty hairy. On the other hand, it offers a more general treatment of static analysis overall.
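To make the chaotic-iteration idea concrete, here is a tiny sketch (written in R, with sets of definition labels mirroring the example above) of iterating a single loop's transfer function to a fixpoint:
```r
## Repeatedly apply the loop body's effect to the set of reaching definitions of x
## until the set stops changing -- that stable set is the fixpoint.
entry_defs  <- c("a", "b", "c")              # definitions reaching the loop head
body_effect <- function(defs) c("d", "e")    # the body always redefines x at (d) or (e)

defs <- entry_defs
repeat {
  new_defs <- union(defs, body_effect(defs)) # the loop may run zero or more times
  if (setequal(new_defs, defs)) break        # nothing changed: fixpoint reached
  defs <- new_defs
}
defs  # "a" "b" "c" "d" "e"
```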

How can I compare two NSImages for differences?

I'm attempting to gauge the percentage difference between two images.
Having done a lot of reading I seem to have a number of options, but I'm not sure which method is best for:
Ease of coding
Performance.
The methods I've seen are:
Non-language-specific (academic): the "Image comparison - fast algorithm" question; and Mac-specific direct pixel access: http://www.markj.net/iphone-uiimage-pixel-color/
Does anyone have any advice about what solutions make most sense for the above two cases and have code samples to show how to apply them?
I've had success calculating the difference between two images using the histogram technique mentioned here. redmoskito's answer in the SO question you linked to was actually my inspiration!
The following is an overview of the algorithm I used:
Convert the images to grayscale—compare one channel instead of three.
Divide each image into an n * n grid of "subimages". Then, for each subimage pair:
Calculate their colour composition histograms.
Calculate the absolute difference between the two histograms.
The maximum difference found between two subimages is a measure of the two images' difference. Other metrics could also be used (e.g. the average difference between subimages).
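To make that concrete, here is a minimal sketch of the same idea (plain R matrices with values in [0, 1] standing in for the grayscale images, not NSImage code; the grid size and the 16 histogram bins are arbitrary choices):
```r
subimage_distance <- function(img1, img2, n = 4, breaks = seq(0, 1, length.out = 17)) {
  stopifnot(all(dim(img1) == dim(img2)))
  rows <- split(seq_len(nrow(img1)), cut(seq_len(nrow(img1)), n))
  cols <- split(seq_len(ncol(img1)), cut(seq_len(ncol(img1)), n))
  diffs <- outer(seq_len(n), seq_len(n), Vectorize(function(i, j) {
    h1 <- hist(img1[rows[[i]], cols[[j]]], breaks = breaks, plot = FALSE)$counts
    h2 <- hist(img2[rows[[i]], cols[[j]]], breaks = breaks, plot = FALSE)$counts
    sum(abs(h1 - h2))                      # absolute histogram difference for this cell
  }))
  max(diffs)                               # worst subimage pair as the overall distance
}
```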
As tskuzzy noted in his answer, if your ultimate goal is a binary "yes, these two images are (roughly) the same" or "no, they're not", you need some meaningful threshold value. You could produce such a value by passing images into the algorithm and tweaking the threshold based on its output and how similar you think the images are. A form of machine learning, I suppose.
I recently wrote a blog post on this very topic, albeit as part of a larger goal. I also created a simple iPhone app to demonstrate the algorithm. You can find the source on GitHub; perhaps it will help?
It is really difficult to suggest something when you don't tell us more about the images or the variations. Are they shapes? Are they different objects, and you want to know their object class? Are they the same object, and you want to distinguish between object instances? Are they faces? Are they fingerprints? Are the objects in the same pose? Under the same illumination?
When you say performance, what exactly do you mean? How large are the images? All in all, it really depends. Given what you've said, if it is only about ease of coding and performance, I would suggest just taking the absolute value of the pixel-wise difference. That is super easy to code and about as fast as it gets, but it is unlikely to work for anything other than the most synthetic examples.
That being said, I would like to point you to: DHOG, GLOH, SURF and SIFT.
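If that simple baseline is enough for your case, it amounts to something like the following (again sketched with plain R matrices in [0, 1] standing in for the images, not NSImage code):
```r
## Mean absolute pixel difference, scaled to a percentage
## (0 = identical, 100 = maximally different).
percent_difference <- function(img1, img2) {
  stopifnot(all(dim(img1) == dim(img2)))
  100 * mean(abs(img1 - img2))
}
```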
You can use the fairly basic subtraction technique that the lads above suggested. #carlosdc has hit the nail on the head with regard to the type of image this basic technique can be used for. I have attached an example so you can see the results for yourself.
The first image shows a frame from a simulation at some time t. A second image, taken some (simulation) time t + dt later, was subtracted from the first. The subtracted image (shown in black and white for clarity) then shows how the simulation has changed over that time. This was done as described above and is very powerful and easy to code.
Hope this aids you in some way.
This is some old nasty FORTRAN, but it should give you the basic approach. It is not that difficult at all. Because I am working on a two-colour palette, you would do this operation for R, G and B. That is, compute the intensities or values in each cell/pixel and store them in an array. Do the same for the other image and subtract one array from the other; this will leave you with a colourful subtraction image. My advice would be to do as the lads suggest above: compute the magnitude of the sum of the R, G and B components so you get just one value per pixel. Write that to an array, do the same for the other image, then subtract. Then create a new range for either R, G or B and map the resulting subtracted array to it; this will give a much clearer picture as a result.
* =============================================================
SUBROUTINE SUBTRACT(FNAME1,FNAME2,IOS)
* This routine subtracts one image array from another
* =============================================================
* Common :
INCLUDE 'CONST.CMN'
INCLUDE 'IO.CMN'
INCLUDE 'SYNCH.CMN'
INCLUDE 'PGP.CMN'
* Input :
CHARACTER fname1*(sznam),fname2*(sznam)
* Output :
integer IOS
* Variables:
logical glue
character fullname*(szlin)
character dir*(szlin),ftype*(3)
integer i,j,nxy1,nxy2
real si1(2*maxc,2*maxc),si2(2*maxc,2*maxc)
* =================================================================
IOS = 1
nomap=.true.
ftype='map'
dir='./pictures'
! reading first image
if(.not.glue(dir,fname2,ftype,fullname))then
write(*,31) fullname
return
endif
OPEN(unit2,status='old',name=fullname,form='unformatted',err=10,iostat=ios)
read(unit2,err=11)nxy2
read(unit2,err=11)rad,dxy
do i=1,nxy2
do j=1,nxy2
read(unit2,err=11)si2(i,j)
enddo
enddo
CLOSE(unit2)
! reading second image
if(.not.glue(dir,fname1,ftype,fullname))then
write(*,31) fullname
return
endif
OPEN(unit2,status='old',name=fullname,form='unformatted',err=10,iostat=ios)
read(unit2,err=11)nxy1
read(unit2,err=11)rad,dxy
do i=1,nxy1
do j=1,nxy1
read(unit2,err=11)si1(i,j)
enddo
enddo
CLOSE(unit2)
! subtracting images
if(nxy1.eq.nxy2)then
nxy=nxy1
do i=1,nxy1
do j=1,nxy1
si(i,j)=si2(i,j)-si1(i,j)
enddo
enddo
else
print *,'SUBSTRACT: Different sizes of image arrays'
IOS=0
return
endif
* normal finishing
IOS=0
nomap=.false.
return
* exceptional finishing
10 write (*,30) fullname
return
11 write (*,32) fullname
return
30 format('Cannot open file ',72A)
31 format('Improper filename ',72A)
32 format('Error reading from file ',72A)
end
! =============================================================
Hope this is of some use. All the best.
Out of the methods described in your first link, the histogram comparison method is by far the simplest to code and the fastest. However, key-point matching will provide far more accurate results if you want a precise number describing the difference between two images.
To implement the histogram method, I would do the following:
Compute the red, green, and blue histograms of each image
Add up the differences between corresponding buckets
If the total difference is above a certain threshold, the images are too dissimilar and the similarity percentage is 0%
Otherwise the colors found in the images are similar, so do a pixel-by-pixel comparison and convert the difference into a percentage.
I don't know of any precise algorithms for finding the key points of an image. However, once you find them for each image, you can do a pixel-by-pixel comparison for each of the key points.

Circumventing R's `Error in if (nbins > .Machine$integer.max)`

This is a saga which began with the problem of how to do survey weighting. Now that I appear to be doing that correctly, I have hit a bit of a wall (see previous post for details on the import process and where the strata variable came from):
> require(foreign)
> ipums <- read.dta('/path/to/data.dta')
> require(survey)
> ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=perwt)
Error in if (nbins > .Machine$integer.max) stop("attempt to make a table with >= 2^31 elements") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
2: In pd * nl : NAs produced by integer overflow
> traceback()
9: tabulate(bin, pd)
8: as.vector(data)
7: array(tabulate(bin, pd), dims, dimnames = dn)
6: table(ids[, 1], strata[, 1])
5: inherits(x, "data.frame")
4: is.data.frame(x)
3: rowSums(table(ids[, 1], strata[, 1]) > 0)
2: svydesign.default(id = ~serial, weights = ~perwt, strata = ~strata,
data = ipums)
1: svydesign(id = ~serial, weights = ~perwt, strata = ~strata, data = ipums)
This error seems to come from the tabulate function, which I hoped would be straightforward enough to circumvent, first by changing .Machine$integer.max
> .Machine$integer.max <- 2^40
and when that didn't work the whole source code of tabulate:
> tabulate <- function(bin, nbins = max(1L, bin, na.rm=TRUE))
{
if(!is.numeric(bin) && !is.factor(bin))
stop("'bin' must be numeric or a factor")
#if (nbins > .Machine$integer.max)
if (nbins > 2^40) #replacement line
stop("attempt to make a table with >= 2^31 elements")
.C("R_tabulate",
as.integer(bin),
as.integer(length(bin)),
as.integer(nbins),
ans = integer(nbins),
NAOK = TRUE,
PACKAGE="base")$ans
}
Neither circumvented the problem. Apparently this is one reason why the ff package was created, but what worries me is the extent to which this is a problem I cannot avoid in R. This post seems to indicate that even if I were to use a package that would avoid this problem, I would only be able to access 2^31 elements at a time. My hope was to use sql (either sqlite or postgresql) to get around the memory problems, but I'm afraid I'll spend a while getting that to work, only to run into the same fundamental limit.
Attempting to switch back to Stata doesn't solve the problem either. Again see the previous post for how I use svyset, but the calculation I would like to run causes Stata to hang:
svy: mean age, over(strata)
Whether throwing more memory at it will solve the problem I don't know. I run R on my desktop which has 16 gigs, and I use Stata through a Windows server, currently setting memory allocation to 2000MB, but I could theoretically experiment with increasing that.
So in sum:
Is this a hard limit in R?
Would sql solve my R problems?
If I split it up into many separate files would that fix it (a lot of work...)?
Would throwing a lot of memory at Stata do it?
Am I seriously barking up the wrong tree somehow?
Yes, R uses 32-bit indexes for vectors, so they can contain no more than 2^31 - 1 entries, and you are trying to create something with 2^40. There is talk of introducing 64-bit indexes, but that is still some way off from appearing in R. Vectors have the stated hard limit, and that is it as far as base R is concerned.
I am too unfamiliar with the details of what you are doing to offer any further advice on the other parts of your question.
Why do you want to work with the full data set? Wouldn't a smaller sample that fits into the restrictions R places on you be just as useful? You could use SQL to store all the data and query it from R to return a random subset of more appropriate size.
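A minimal sketch of that workflow, assuming the data have been loaded into a local SQLite file with a table called `ipums` (both names are made up; adjust to your setup):
```r
## Pull a random subset of manageable size from SQLite into R.
library(DBI)
library(RSQLite)

con  <- dbConnect(SQLite(), "ipums.sqlite")
samp <- dbGetQuery(con, "SELECT * FROM ipums ORDER BY RANDOM() LIMIT 100000")
dbDisconnect(con)

## samp can now be passed to svydesign() as usual.
```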
Since this question was asked some time ago, I'd like to point out that my answer here uses version 3.3 of the survey package.
If you check the code of svydesign, you can see that the function causing the problem is part of a check step that determines whether you should set the nest parameter to TRUE or not. This step can be disabled by setting the option check.strata=FALSE.
Of course, you shouldn't disable a check step unless you know what you are doing. In this case, you should be able to decide yourself whether you need to set the nest option to TRUE or FALSE. nest should be set to TRUE when the same PSU (cluster) id is recycled in different strata.
Concretely for the IPUMS dataset, since you are using the serial variable for cluster identification and serial is unique for each household in a given sample, you may want to set nest to FALSE.
So, your survey design line would be:
ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=perwt, check.strata=FALSE, nest=FALSE)
Extra advice: even after circumventing this problem you will find that the code is pretty slow unless you remap strata to a range from 1 to length(unique(ipums$strata)):
ipums$strata <- match(ipums$strata,unique(ipums$strata))
Both #Gavin and #Martin deserve credit for this answer, or at least for leading me in the right direction. I'm mostly answering it separately to make it easier to read.
In the order I asked:
Yes, 2^31 is a hard limit in R, though it seems to matter what type the object is (which is a bit strange, given that the stated problem is the length of the vector rather than the amount of memory, of which I have plenty). Do not convert strata or id variables to factors; that will just fix their levels and nullify the effect of subsetting (which is the way to get around this problem).
SQL could probably help, provided I learn how to use it correctly. I did the following test:
library(multicore) # make svy fast!
ri.ny <- subset(ipums, statefips_num %in% c(36, 44))
ri.ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri.ny)
svyby(~incwage, ~strata, ri.ny.design, svymean, data=ri.ny, na.rm=TRUE, multicore=TRUE)
ri <- subset(ri.ny, statefips_num==44)
ri.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri)
ri.mean <- svymean(~incwage, ri.design, data=ri, na.rm=TRUE)
ny <- subset(ri.ny, statefips_num==36)
ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ny)
ny.mean <- svymean(~incwage, ny.design, data=ny, na.rm=TRUE, multicore=TRUE)
And found the means to be the same, which seems like a reasonable test.
So: in theory, provided I can split up the calculation by either using plyr or sql, the results should still be fine.
See 2.
Throwing a lot of memory at Stata definitely helps, but now I'm running into annoying formatting issues. I seem to be able to perform most of the calculation I want (much quicker and with more stability as well) but I can't figure out how to get it into the form I want. Will probably ask a separate question on this. I think the short version here is that for big survey data, Stata is much better out of the box.
In many ways, yes. Trying to do analysis with data this big is not something I should have taken on lightly, and I'm far from figuring it out even now. I was using the svydesign function correctly, but I didn't really understand what was going on. I have a (very slightly) better grasp now, and it's heartening to know I was generally correct about how to solve the problem. #Gavin's general suggestion of trying out small data with external results to compare against is invaluable, something I should have started doing ages ago. Many thanks to both #Gavin and #Martin.