Smart indexing for matching values in Julia - indexing

Suppose I have the following:
# valuesToFind: n x 1 vector
# allValues: m x n matrix, in which every column allValues[:,i]
# contains among its components exactly 1 instance of the
# corresponding value valuesToFind[i] at some row position
I am trying to determine the position (row index) at which this match occurs for every value in valuesToFind and, currently, I achieve it with the following loop:
idx = Array(Int16, length(valuesToFind))
for (i, v) in enumerate(valuesToFind)
    idx[i] = findfirst(allValues[:, i], v)
end
Is it possible to do this without a loop, in a single statement?

Are you looking for:
[findfirst(allValues[:,i], v) for (i,v) in enumerate(valuesToFind)]
?
I'm not 100% convinced this is clearer (in terms of code readability) than a simple loop, but it would do the job in one line if that's what you're after.

Try:
map(x->ind2sub(allValues,x)[1],findin(allValues,valuesToFind))
This is a one-line solution that returns the row number of each match. Note that it relies on the assumption laid out in the question (exactly one matching value per column). It also relies on the column-first (column-major) layout of a Julia matrix; that layout assumption can be removed with a sort on the first index returned by ind2sub.

Related

Summation iterated over a variable length

I have written an optimization problem in Pyomo and need a constraint that contains a summation of variable length:
u_i_t[i, t]*T_min_run - sum (tnewnew in (t-T_min_run+1)..t-1) u_i_t[i,tnewnew] <= sum (tnew in t..(t+T_min_run-1)) u_i_t[i,tnew]
T is my actual timeline and N my set of machines.
Usually I iterate over t, but here I need to guarantee that the machines stay turned on for a certain amount of time.
def HP_on_rule(model, i, t):
    return model.u_i_t[i, t]*T_min_run - sum(model.u_i_t[i, tnewnew] for tnewnew in range((t-T_min_run+1), (t-1))) <= sum(model.u_i_t[i, tnew] for tnew in range(t, (t+T_min_run-1)))

model.HP_on_rule = Constraint(N, rule=HP_on_rule)
I hope you can provide me with the correct formulation in pyomo/python.
The problem is that t is a running variable and I do not know how to implement this in Python. tnew is only a helper variable. E.g. with t=6 (variable), T_min_run=3 (constant), and u_i_t binary [00001111100000...], I get:
1*3 - 1 <= 3
As I said, I do not know how to implement this in my code and the current version is not running.
TypeError: HP_on_rule() missing 1 required positional argument: 't'
It seems like you didn't provide all the required arguments to your rule function.
Since t is a parameter of your function, I assume that it corresponds to an element of set T (your timeline).
Then, your last line of your code example should include not only the set N, but also the set T. Try this:
model.HP_on_rule = Constraint(N, T, rule=HP_on_rule)
Please note: when building a Constraint with a "for each" part, you must provide the Pyomo Sets that you want to iterate over at the beginning of the call that constructs the Constraint. As a rule of thumb, your constraint rule function should have one more argument than the number of Pyomo Sets specified in the Constraint initialization line.
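If a runnable reference helps, here is a minimal sketch of that pattern; the set sizes, the T_min_run value, and the boundary handling are placeholder assumptions, not taken from the question. Also note that Python's range() excludes its end point, so the upper bounds below differ by one from the (t-T_min_run+1)..t-1 style notation above.
# A minimal sketch, not the questioner's full model: sizes and data are assumed.
from pyomo.environ import ConcreteModel, RangeSet, Var, Constraint, Binary

T_min_run = 3                      # assumed minimum run time

model = ConcreteModel()
model.N = RangeSet(1, 2)           # machines (hypothetical size)
model.T = RangeSet(1, 10)          # timeline (hypothetical size)
model.u_i_t = Var(model.N, model.T, domain=Binary)

def HP_on_rule(model, i, t):
    # Time indices that fall outside T are simply skipped in this sketch.
    prev_sum = sum(model.u_i_t[i, tau]
                   for tau in range(t - T_min_run + 1, t) if tau in model.T)
    next_sum = sum(model.u_i_t[i, tau]
                   for tau in range(t, t + T_min_run) if tau in model.T)
    return model.u_i_t[i, t] * T_min_run - prev_sum <= next_sum

# Two Pyomo Sets in the declaration -> the rule takes model plus two indices.
model.HP_on_rule = Constraint(model.N, model.T, rule=HP_on_rule)
Declaring the constraint over model.N and model.T means Pyomo calls HP_on_rule once for every (i, t) pair, which is why the rule needs the extra leading model argument.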

error in LDA in r: each row of the input matrix needs to contain at least one non-zero entry

I am a beginner in text mining. When I run LDA() over a huge dataset with 996165 observations, it displays the following error:
Error in LDA(dtm, k, method = "Gibbs", control = list(nstart = nstart, :
Each row of the input matrix needs to contain at least one non-zero entry.
I am pretty sure that there are no missing values in my corpus. Checking the "DocumentTermMatrix" (a "simple_triplet_matrix") for NAs gives:
table(is.na(dtm[[1]]))
#FALSE
#57100956
table(is.na(dtm[[2]]))
#FALSE
#57100956
I am a little confused about where "57100956" comes from. Since my dataset is pretty large, I don't know how to check why this error occurs. My LDA command is:
ldaOut<-LDA(dtm,k, method="Gibbs", control=list(nstart=nstart, seed = seed, best=best, burnin = burnin, iter = iter, thin=thin))
Can anyone provide some insights? Thanks.
In my opinion the problem is not the presence of missing values, but the presence of all-zero rows.
To check it:
raw.sum = apply(dtm, 1, FUN=sum)  # sum over each row of the document-term matrix
Then you can delete all rows that are entirely zero with:
dtm = dtm[raw.sum != 0, ]
Now dtm should contain no all-zero rows.
I had the same problem. The document-term matrix, dtm in your case, had rows that were all zeroes because some documents did not contain any of the kept words (i.e. their frequencies were all zero). I suppose this somehow causes a singular-matrix problem somewhere along the line. I fixed this by adding a common word to each of the documents so that every row would have at least one non-zero entry. At the very least, the LDA ran successfully and classified each of the documents. Hope this helps!

Return highest or lowest value in Z notation (formal methods)

I am new to Z notation.
Let's say I have a function f defined as X |--> Y, where X is a string and Y is a number.
How can I get the highest Y value in this function? Do 'loops' exist in formal methods, so that I could solve this with a loop?
I know there is recursion in Z notation, but in the material provided I only found it applied to multisets (bags); can it be applied to functions?
Any extra reference on applications of 'loops' or recursion would be appreciated. Sorry for my English.
You can just use the predefined function max, which takes a set of integers as input and returns the maximum. Its input here is the range (the set of all values) of the function:
max(ran(f))
Please note that the maximum is not defined for empty sets.
Regarding your question about recursion or loops: you can actually define a function recursively, but I think your question aims more at a way to compute something. This is not easily expressed in Z, and in my opinion that is a good thing, because Z is used for specifications and is not a programming language. Even if there were no max or ran function, you could still specify the number m you are looking for by:
\exists s:String # (s,m):f /\
\forall s2:String, i2:Z # (s2,i2):f ==> i2 <= m
("m is a value of f, belonging to an s and all other values i2 of f are smaller or equal")
After getting used to the style it is usually far better to understand than any programming language (except your are trying to describe an algorithm itself and not its expected outcome).#
Just for reference: An example of a recursive definition (let's call it rmax) for the maximum would consist of a base case:
\forall e:Z # rmax({e}) = e
and a recursive case:
\forall e:Z; S:\pow(Z) | S \neq {} #
    rmax({e} \cup S) = \IF e > rmax(S) \THEN e \ELSE rmax(S)
But note that this is still not a "computation rule" for rmax, because e in the second rule can be an arbitrary element of the argument set. In more complex scenarios it might not even be obvious that the defined relation is a function at all, because different results could be computed depending on which elements are chosen.

How are records stored in erlang, and how are they mutated?

I recently came across some code that looked something like the following:
-record(my_rec, {f0, f1, f2...... f711}).
update_field({f0, Val}, R) -> R#my_rec{f0 = Val};
update_field({f1, Val}, R) -> R#my_rec{f1 = Val};
update_field({f2, Val}, R) -> R#my_rec{f2 = Val};
....
update_field({f711, Val}, R) -> R#my_rec{f711 = Val}.
generate_record_from_proplist(Props) ->
    lists:foldl(fun update_field/2, #my_rec{}, Props).
My question is about what actually happens to the record. Let's say the record has 711 fields and I'm generating it from a proplist. Since the record is immutable, we are, at least semantically, generating a new full record at every step of the foldl. That turns what looks like a function linear in the number of arguments into one that is actually quadratic, since every insert performs an update proportional to the length of the record. Am I correct in this assumption, or is the compiler intelligent enough to save me?
Records are tuples whose first element contains the name of the record, and whose remaining elements hold the record fields.
The field names are not stored; they are a facility for the compiler and, of course, the programmer. I think they were introduced only to avoid errors in the field order when writing programs, and to allow extending the tuple when releasing a new version without rewriting all pattern matches.
Your code will make 712 copies of a 713-element tuple.
I am afraid the compiler is not smart enough to save you.
You can read more in this SO answer. If you have such a big number of fields and you want to update them in O(1) time, you should use ETS tables.
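For intuition only, here is the same cost argument sketched in Python rather than Erlang (an analogy under the assumption that an immutable tuple approximates the record): filling n fields by rebuilding the whole tuple each time costs O(n^2) element copies, which is the quadratic behaviour described in the question.
# A rough Python analogy (not Erlang): an immutable tuple stands in for the
# record. Every "update" builds a brand-new tuple, copying every element.
def set_field(rec, index, value):
    # Return a new tuple with one element replaced; all others are copied.
    return rec[:index] + (value,) + rec[index + 1:]

record = ("my_rec",) + (None,) * 712      # record tag plus fields f0..f711
for i in range(712):
    record = set_field(record, i + 1, i)  # each call copies ~713 elements
This is why the answer above points to ETS tables when per-field updates need to be O(1).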

DFA minimization algorithm understanding

I'm trying to understand the DFA minimization algorithm at http://www.cs.umd.edu/class/fall2009/cmsc330/lectures/discussion2.pdf, where it says:
repeat until there is no change in the table contents:
    for each pair of states (p,q) and each character a in the alphabet:
        if Distinct(p,q) is empty and Distinct(δ(p,a), δ(q,a)) is not empty:
            set Distinct(p,q) to be x
The bit I don't understand is "Distinct(δ(p,a), δ(q,a))". I think I understand the transition function: δ(p,a) is whatever state is reached from p on input a. But with the following DFA:
http://i.stack.imgur.com/arZ8O.png
resulting in this table:
imgur.com/Vg38ZDN.png
shouldn't (c,b) also be marked with an x, since Distinct(δ(b,0), δ(c,0)) is not empty (it is d)?
Distinct(δ(p,a), δ(q,a)) will only be non-empty if δ(p,a) and δ(q,a) are distinct. In your example, δ(b,0) and δ(c,0) are both d. Distinct(d, d) is empty, since it doesn't make sense for d to be distinct from itself. Because Distinct(d, d) is empty, we don't mark Distinct(c, b).
In general, Distinct(p, p), where p is a state, will always be empty; better yet, we never even consider such pairs because they make no sense.
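To make the marking step concrete, here is a short Python sketch of the table-filling loop on a small made-up three-state DFA (it is not the DFA from the linked image; states, alphabet, and transitions are hypothetical). The len(succ) == 2 guard is exactly the "Distinct(d, d) is empty" case discussed above.
from itertools import combinations

# Hypothetical DFA used only for illustration (not the one in the question).
states    = ['p', 'q', 'r']
alphabet  = ['0', '1']
accepting = {'r'}
delta = {('p', '0'): 'q', ('p', '1'): 'r',
         ('q', '0'): 'q', ('q', '1'): 'r',
         ('r', '0'): 'r', ('r', '1'): 'r'}

# distinct[{p, q}] is True once the pair has been marked with an "x".
distinct = {frozenset(pair): False for pair in combinations(states, 2)}

# Base case: an accepting and a non-accepting state are always distinct.
for pair in distinct:
    p, q = tuple(pair)
    if (p in accepting) != (q in accepting):
        distinct[pair] = True

# Repeat until no change: mark (p, q) if some input a sends the pair to a
# pair of states that is already marked. If both states go to the same
# successor (as b and c both go to d on 0 in the question), there is no
# pair to look up, so (p, q) is not marked on account of that input.
changed = True
while changed:
    changed = False
    for pair in distinct:
        if distinct[pair]:
            continue
        p, q = tuple(pair)
        for a in alphabet:
            succ = frozenset({delta[(p, a)], delta[(q, a)]})
            if len(succ) == 2 and distinct[succ]:
                distinct[pair] = True
                changed = True
                break

# Unmarked pairs are equivalent and can be merged; here p and q end up merged.
print({tuple(k): v for k, v in distinct.items()})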