I'm currently writing a MIP in LPsolveAPI in R. The program itself is straightforward, but I can't find a way to write an either-or constraint without being able to directly call a new binary variable or the binary values on the lhs. Does LPsolveAPI not support this or am I missing something obvious?
Introducing new binary variables is the standard way to model either-or constraints in lp_solve (lpSolveAPI is based on the lp_solve solver), so you are not missing anything obvious.
That said, one thing that might help you, depending on your constraints, is the use of SOS (Special Ordered Sets). Check out the reference to SOS in the lp_solve documentation.
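As a sketch of the binary-variable formulation, assume the two alternatives are f(x) ≤ b1 and g(x) ≤ b2, and that M is a constant large enough that adding it makes either constraint non-binding. The symbols f, g, b1, b2, M and y are illustrative and do not come from the question:

```latex
\begin{aligned}
f(x) &\le b_1 + M\,y \\
g(x) &\le b_2 + M\,(1 - y) \\
y &\in \{0, 1\}
\end{aligned}
```

With y = 0 the first constraint is enforced and the second is relaxed; with y = 1 the roles are swapped. In the model, y is simply one more column that you declare as binary.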
My client is making database searches using a Django webapp that I've written. The query sends a regex search to the database and outputs the results.
Because the regex searches can be pretty long and unintuitive, the client has asked for certain custom "wildcards" to be created for the regex searches. For example:
Ω := [^aeiou] (all non-vowels)
etc.
This could be achieved with a simple permanent string substitution in the query, something like
query = query.replace("Ω", "[^aeiou]")
for all the elements in the substitution list. This seems like it should be safe, but I'm not really sure.
He has also asked that it be possible for the user to define custom wildcards for their searches on the fly. So that there would be some other input box where a user could define
∫ := some other regex
And to store them you might create a model
class RegexWildcard(models.Model):
    symbol = ...
    replacement = ...
I'm personally a bit wary of this, because it does not seem to add a whole lot of functionality, but does seem to add a lot of complexity and potential problems to the code. Clients can now write their queries to a db. Can they overwrite each other's symbols?
That I haven't seen this done anywhere before also makes me kind of wary of the idea.
Is this possible? Desirable? A great idea? A terrible idea? Resources and any guidance appreciated.
Well, you're getting paid by the hour....
I don't see how involving the Greek alphabet is to anyone's advantage. If the queries are stored anywhere, everyone approaching the system would have to learn the new syntax to understand them. Plus, there's the problem of how to type the special symbols.
If the client creates complex regular expressions they'd like to be able to reuse, that's understandable. Your application could maintain a list of such expressions that the user could add to and choose from. Notionally, the user would "click on" an expression, and it would be inserted into the query.
The saved expressions could have user-defined names, to make them easier to remember and refer to. And you could define a syntax that referenced them, something otherwise invalid in SQL, such as ::name. Before submitting the query to the DBMS, you substitute the regex for the name.
You still have the problem of choosing good names, and training.
To prevent malformed SQL, I imagine you'll want to ensure the regex is valid. You wouldn't want your system to store a ; drop table CUSTOMERS; as a "regular expression"! You'll either have to validate the expression or, if you can, treat the regex as data in a parameterized query.
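A minimal sketch of the name-substitution and validation steps described above, in Python since the application is a Django app. The `::name` syntax comes from the suggestion above; the function names and the `saved` dictionary are hypothetical. Note that `re.compile` only checks the pattern against Python's regex dialect, which may differ from the database's, and the expanded pattern should still be passed to the database as a query parameter rather than concatenated into SQL.

```python
import re

# Hypothetical reference syntax: "::name" points at a saved expression.
NAME_REF = re.compile(r"::([A-Za-z_][A-Za-z0-9_]*)")


def is_valid_regex(pattern):
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def expand_wildcards(query, saved):
    """Replace every ::name in query with the saved expression of that name."""
    def lookup(match):
        name = match.group(1)
        if name not in saved:
            raise KeyError(f"unknown wildcard: {name}")
        # The returned string is inserted literally, with no escape processing.
        return saved[name]

    return NAME_REF.sub(lookup, query)
```

For example, `expand_wildcards("^::consonant+a$", {"consonant": "[^aeiou]"})` gives `"^[^aeiou]+a$"`.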
The real question to me, though, is why you're in the vicinity of standardized regex queries. That need suggests a database design issue: it suggests the column being queried is composed of composite data, and should be represented as multiple columns that can be queried directly, without using regular expressions.
Question:
Is there a way to use names instead of question marks for parameterized queries? If so, can anyone suggest some material that explains how to do this/the syntax?
A bit more detail:
For example, if I have something like:
INSERT INTO inventory VALUES(?)
Is it possible to have something like this instead that does the exact same thing as the question mark:
INSERT INTO inventory VALUES("prices")
I tried checking to see if it would work myself before posting the question, but it didn't work. So, I thought I'd ask if it was possible.
I feel like if you have a really long query with, let's say 20 parameters, you don't want to have to count question marks to make sure you have enough parameters whenever you change something. Also, I think it might make the code a bit more readable (especially if you have a lot of parameters to keep track of).
I'm rather new to SQL, so I am not sure if it makes much of a difference (for this question) if I add that I'm using PostgreSQL.
Note:
There is a similar question here, but it didn't have an answer that was helpful
I suggest encapsulating the big query in a function, where you can use parameter names.
One example (out of many):
PostgreSQL parameterized Order By / Limit in table function
You can even set default values and call the function with named parameters, etc.:
Functions with variable number of input parameters
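Separately from the function approach, many client libraries accept named placeholders, which may be enough if the goal is only to stop counting question marks. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 module (not PostgreSQL) so that it is self-contained, and the table layout and function name are made up. SQLite's driver uses `:name`; psycopg2 for PostgreSQL uses `%(name)s` with a dictionary in the same way.

```python
import sqlite3


def insert_and_fetch(item, price):
    """Insert one row using named placeholders and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (item TEXT, price REAL)")
    # Values are matched to placeholders by name, not by position.
    conn.execute(
        "INSERT INTO inventory (item, price) VALUES (:item, :price)",
        {"item": item, "price": price},
    )
    rows = conn.execute(
        "SELECT item, price FROM inventory WHERE item = :item",
        {"item": item},
    ).fetchall()
    conn.close()
    return rows
```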
Let's say I have a string that contains stored procedure names, their parameters, and values, like this:
Dim sprocs As String = "Procedure: NameOfProcedure, Parameters: #Param1(int), #Param2(varchar) Values: Something, SomethingElse ; Procedure: NameOfProcedure, Parameters: #Param1(int), #Param2(varchar) Values: Something, Something Else"
My point of the above is I need to be able to take a user defined string that can store multiple procedures separated by... something. In this case I used a ; semicolon.
What I need to be able to do is a FOR EACH on these in vb.net.
I know how to use stored procedures and set parameters... I'm just not sure of the best way to go about this as far as how to split up that string to match parameters and values, etc.
So a bad example:
Dim ConnectionString As String = "String...removed"
Dim conn As New System.Data.SqlClient.SqlConnection(ConnectionString)
' So I need a for each here
Dim cmd As New System.Data.SqlClient.SqlCommand(ProcedureName)
With cmd
.Connection = conn
.CommandType = CommandType.StoredProcedure
' and a for each here on the parameters, datatypes, and their values
.Parameters.Add(ParamName, DataType).Value = TheValue
.Connection.Open()
.BeginExecuteReader()
.Connection.Close()
End With
So the splitting and parsing out the string is my biggest roadblock on this. I'm not sure how to separate it out at this level, match parameters, datatypes and their values all together. If there's a better way to have the end-user organize the string to make parsing and splitting easier, please share.
Any suggestions or examples are greatly appreciated.
Thank you very much.
There's a temptation here to use String.Split() or a regular expression. Neither solution is actually very good for this. To understand what I mean, take a quick moment to search Google for the many thousands of posts of programmers working with CSV running into edge cases where their regex didn't quite cut it. For this purpose, your semi-colon delimiter is not significantly different than a fancy comma.
Depending on the regex engine, it may actually be possible to build an expression that does this, especially if you can put certain constraints on the data. However, such expressions tend to be very tough to build and maintain, and they tend to require backtracking, which hurts performance.
The next alternative to examine is a dedicated parser. For CSV data, this is the panacea. There are a number of good ones available for most any platform that you can just plug in, and they're typically pretty easy to work with. However, what you have, while similar, goes a bit beyond the level of what these parsers are written to handle.
One related solution is to write your own CSV-like parser. This may be your best option, and if you choose this route all I can do is recommend that you build a state machine to go character by character through the input string.
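To illustrate what such a state machine might look like, here is a minimal sketch in Python rather than VB.NET. It assumes a quoting rule the original format does not have: values containing the delimiter are wrapped in double quotes. The function name and that rule are hypothetical.

```python
def split_outside_quotes(text, delimiter=";", quote='"'):
    """Split text on delimiter, ignoring delimiters inside quoted runs."""
    parts = []
    current = []
    in_quotes = False  # the only state this machine tracks
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == delimiter and not in_quotes:
            # An unquoted delimiter ends the current piece.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts
```

A real parser would need more states (escaped quotes, unterminated quotes), which is where most of the edge cases mentioned above come from.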
The next rung up the ladder is to use a generic parser/interpreter that relies on something like ANTLR. I haven't personally dug very deep into that well, but I know the tools are out there.
One related option, that I think is probably your best option here overall, is a Domain Specific Language (DSL). This is something that may be a little easier to get started with than a tool like ANTLR, especially because it's already built into Visual Studio. However, this option requires that you be able to control your input format.
If all this sounds like more work than it should be, you're right. The good news is that it is possible to skip all this. The trick is that you must first impose artificial constraints on your data. For example, a constraint that parameter values are not allowed to contain ; characters means you can get back to a String.Split() solution to divide out each separate procedure call. Now your function is no longer generally applicable, but then maybe it doesn't need to be. When you create these constraints, make sure they are documented, so that when something violates them and code explodes you'll have clear definitions for why.
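A sketch of the constrained split approach, written in Python for brevity. It assumes the format from the question plus these documented constraints, which are my assumptions: values contain no `;` or `,`, and the literal words `Parameters:` and `Values:` appear exactly once per call. The function name and output shape are hypothetical.

```python
def parse_calls(text):
    """Parse 'Procedure: N, Parameters: #A(int), ... Values: v1, ...' calls."""
    calls = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        head, values_part = chunk.split("Values:", 1)
        name_part, params_part = head.split("Parameters:", 1)
        name = name_part.replace("Procedure:", "", 1).strip().rstrip(",").strip()
        params = [p.strip() for p in params_part.split(",") if p.strip()]
        values = [v.strip() for v in values_part.split(",")]
        if len(params) != len(values):
            raise ValueError(f"{name}: {len(params)} parameters, {len(values)} values")
        parsed = []
        for param, value in zip(params, values):
            # "#Id(int)" becomes name "#Id" and type "int".
            pname, _, ptype = param.partition("(")
            parsed.append((pname.strip(), ptype.rstrip(")").strip(), value))
        calls.append({"procedure": name, "parameters": parsed})
    return calls
```

Each resulting entry can then drive one command, with an inner loop over its parameters.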
I have to load a log that should fit in the pattern. Unfortunately some records don't.
It occurs as an error when I'm trying to store data in HCatalog.
Is it possible to store the records that fit the pattern in HCatalog, and keep the others in a file for further processing?
Or maybe it is possible to do something like try-catch in Pig?
I can't find any solution, but it must be simple. I just can't believe nobody has faced this problem before!
I will be grateful for any hints.
Edited Answer
People have faced this issue before, but the answer is usually "UDF". Unfortunately, I think that's probably the best answer for your question: a UDF that performs the data validation using Java or Python try/catch error handling.
Another answer is to use SPLIT to evaluate the data in the field and direct the data into the appropriate alias. This is a common method of handling non-expected data.
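The validation logic such a UDF would wrap might look like the sketch below. It is plain Python, not a registered Pig UDF: the function names, the comma-separated layout, and the three-integer schema are assumptions for illustration, and wiring it into Pig is left out.

```python
def validate_record(line, expected_fields=3):
    """Return (parsed_tuple, None) for a good line, (None, reason) otherwise."""
    try:
        parts = line.rstrip("\n").split(",")
        if len(parts) != expected_fields:
            raise ValueError(f"expected {expected_fields} fields, got {len(parts)}")
        return tuple(int(p) for p in parts), None
    except ValueError as exc:
        # Both a wrong field count and a non-integer value end up here.
        return None, str(exc)


def partition_records(lines):
    """Route each line to the good list (parsed) or the bad list (raw)."""
    good, bad = [], []
    for line in lines:
        parsed, error = validate_record(line)
        if error is None:
            good.append(parsed)
        else:
            bad.append(line)
    return good, bad
```

The good records correspond to what would be stored, and the bad ones to what would be written to a side file for later processing.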
Original Answer:
In version 0.12 of Pig, you have the ASSERT operator, which isn't as good as try/catch, but it's better than nothing.
From the docs:
Suppose we have relation A.
A = LOAD 'data' AS (a0:int,a1:int,a2:int);
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
Now you can assert that the a0 column in your data is > 0, and fail otherwise:
ASSERT A BY a0 > 0, 'a0 should be greater than 0';
The ASSERT method in JamCon's answer is often helpful, but as you say, your particular issue can't be addressed by it. If you are simply looking to test for the presence of extra columns, one possible workaround would be to load your data as normal, but in the AS clause, add an extra column called error:chararray. Typically, you would expect this to be NULL, but if there are extra columns, it won't be. So
SPLIT a INTO good IF error IS NULL, bad IF error IS NOT NULL;
to separate out the lines which have extra columns.
Ugly, but in this particular case it should work for you.
Let's say I have a variable that contains the number of search engine names in a file. What would you name it?
number_of_search_engine_names
search_engine_name_count
num_search_engines
engines
engine_names
other name?
The first name describes precisely what the variable contains, but isn't it too long? Any advice for choosing variable names, especially how to shorten a name that is too long or what kind of abbreviations to use?
How about numEngineNames?
Choosing variable names is more art than science. You want something that doesn't take an epoch to type, but long enough to be expressive. It's a subjective balance.
Ask yourself, if someone were looking at the variable name for the first time, is it reasonably likely that person will understand its purpose?
A name is too long when there exists a shorter name that equally conveys the purpose of the variable.
I think engineCount would be fine here. The number of engine names is presumably equal to the number of engines.
See JaredPar's post.
It depends on the scope of the variable. A local variable in a short function is usually not worth a 'perfect name'; just call it engine_count or something like that. Usually the meaning will be easy to spot; if not, a comment might be better than a two-line variable name.
Variables of wider scope, i.e. global variables (if they are really necessary!) and member variables, deserve IMHO a name that is almost self-documenting. Of course looking up the original declaration is not difficult and most IDEs do it automatically, but the identifier of the variable should not be meaningless (e.g. number or count).
Of course, all this depends a lot on your personal coding style and the conventions at your work place.
It depends on the context. If it is a local variable, e.g.
int num = text.scan(SEARCH_ENGINE_NAME).size();
then the more explicit the right-hand side of the expression, the shorter the name I'd pick. The rationale is that we are in a limited scope of maybe 4-5 lines and can thus assume that the reader will be able to make the connection between the short name and the right-hand-side expression. If, however, it is the field of a class, I'd rather be as verbose as possible.
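A small sketch of that contrast in Python. The pattern, the function, and the class are hypothetical names chosen for illustration.

```python
import re

# Hypothetical pattern standing in for SEARCH_ENGINE_NAME above.
SEARCH_ENGINE_NAME = re.compile(r"\b(?:Google|Bing|DuckDuckGo)\b")


def count_engine_names(text):
    # Narrow scope: the right-hand side already says what n is.
    n = len(SEARCH_ENGINE_NAME.findall(text))
    return n


class EngineReport:
    def __init__(self, text):
        # Wider scope: the attribute is read far from where it is set,
        # so the name carries the full meaning.
        self.search_engine_name_count = count_engine_names(text)
```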
See similar question
The primary technical imperative is to reduce complexity. Variables should be named to reduce complexity. Sometimes this results in shorter names, sometimes longer names. It usually corresponds to how difficult it is for a maintainer to understand the complexity of the code.
On one end of the spectrum, you have for loop iterators and indexes. These can have names like i or j, because they are just that common and simple. Giving them longer names would only cause more confusion.
If a variable is used frequently but represents something more complex, then you have to give it a clear name so that the user doesn't have to relearn what it means every time they use it.
On the other end of the spectrum are variables that are used very rarely. You still want to reduce confusion here, but giving it a short name is less important, because the penalty for relearning the purpose of the variable is not paid very often.
When thinking about your code, try to look at it from the perspective of someone else. This will help not only with picking names, but with keeping your code readable as a whole.
Having really long variable names will muddle up your code's readability, so you want to avoid those. But on the other end of the spectrum, you want to avoid ultra-short names or acronyms like "n" or "ne." Short, cryptic names like these will cause someone trying to read your code to tear their hair out. Usually one to two letter variables are used for small tasks like being incremented in a for loop, for example.
So what you're left with is a balance between these two extremes. "Num" is a commonly used abbreviation, and any semi-experienced programmer will know what you mean immediately. So something like "numEngines" or "numEngineNames" would work well. In addition to this, you can also put a comment in your code next to the variable the very first time it's used. This will let the reader know exactly what you're doing and helps to avoid any possible confusion.
I'd name it "search_engine_count", because it holds a count of search engines.
Use Esc+_+Esc to write:
this_is_a_long_variable = 42
Esc+_+Esc and _ are not identical characters in Mathematica. That's why you are allowed to use the former but not the latter.
If it is a local variable in a function, I would probably call it n, or perhaps ne. Most functions only contain two or three variables, so a long name is unnecessary.