Django performance issue with exclude - SQL

I have a Word model and a Phrase model:
class Word(models.Model):
    checked = models.BooleanField(default=False)

class Phrase(models.Model):
    words = models.ManyToManyField(Word, null=True, related_name="phrases")
The Word model has a checked attribute and a many-to-many relation to Phrase.
I have a query, something like this:
Phrase.objects.exclude(words__checked=True).filter(words__group__pk__in=groups_ids)
But it runs really slowly on big datasets, and the problem is the exclude() part, because without it the query is fast enough.
I found a suggestion that I should use raw SQL here:
Performance issue with django exclude
So, how can I rewrite this query with raw SQL?
(I need this query to work on both PostgreSQL and MySQL due to requirements, or I will need two queries if one can't cover both; the PostgreSQL query has higher priority.)
So far I've tried the .extra() syntax, but it didn't work, so I'm asking here.
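For reference, one way to express this without exclude() is a NOT EXISTS subquery, which both PostgreSQL and MySQL often execute much faster than the nested-subquery SQL that exclude() produces for multi-valued relations. A minimal sketch with Phrase.objects.raw(), assuming the app label is app (so the default tables are app_phrase, app_word and the join table app_phrase_words) and that Word has a group foreign key stored in a group_id column; every table and column name here is an assumption to adjust to the real schema:

groups_ids = [1, 2, 3]  # example values

phrases = Phrase.objects.raw("""
    SELECT DISTINCT p.*
    FROM app_phrase p
    JOIN app_phrase_words pw ON pw.phrase_id = p.id
    JOIN app_word w ON w.id = pw.word_id
    WHERE w.group_id IN %s
      AND NOT EXISTS (
          SELECT 1
          FROM app_phrase_words pw2
          JOIN app_word w2 ON w2.id = pw2.word_id
          WHERE pw2.phrase_id = p.id
            AND w2.checked = TRUE
      )
""", [tuple(groups_ids)])

TRUE is accepted as a boolean literal by both backends (MySQL treats it as 1), and both psycopg2 and MySQLdb adapt a tuple parameter for the IN clause, but that is worth verifying against your driver versions.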


Is avoiding SQL statements in programs a good idea?

I recently came across a program that stores its SQL statements in a database table, with a code identifying each statement, rather than having the specific SQL statements in the program itself.
So, rather than having code like this:
string query = "SELECT id, name from [Users]";
cmd.ExecuteQuery(query);
They use code like this: (simplified)
string firstQuery = "SELECT queryText from [Queries] where queryCode = 'SELECT_ALL_USERS'";
string userQuery = cmd.ExecuteQuery(firstQuery);//pretend this directly returns the result of the first query
cmd.ExecuteQuery(userQuery);
The logic behind this, as far as I've heard, is that it makes the program easier to maintain, since the developer is free to change the "user SQL" without having to change the program itself.
However, this struck me as maybe a little counterproductive. Would this kind of code be considered a good idea?
EDIT: I'm not looking for suggestions like "use an ORM". Assume that sql queries are the only option.
In my opinion, this approach is ridiculous. There is value (maintainability, modularity) in separating as much SQL from the middle tier as possible, but to accomplish this end, I would recommend using stored procedures.
No, I really don't think it's a good idea to proceed further with this design.
As a test or learning exercise it's a different matter, but going forward with such an implementation is definitely not advisable.
Pros:
1. We get complete modularity. The real business schema can change at any time, and we do not need to modify the running application to get results from the different schema (assuming the result format doesn't change).
Cons:
1. With this implementation we fire two SQL statements at the database every time we want to execute one. I/O calls, including DB calls, are always a performance hit, and with this implementation we double that cost, which is definitely not advisable.

Doctrine - strange moved parenthesis in the query

I have tested and done quite a bit of research online, but still no luck. Has anyone ever encountered this problem?
Say I have a Doctrine query set up like this:
$q = Doctrine_Query::create()
    ->update('PckFolder')
    ->set('id_path', "CONCAT(?, RIGHT(id_path, LENGTH(id_path)-?))", array($newPath, $lenOld))
    ->where("id_path like '$oldPath%'");
// and I print the query out
$qstr = $q->getSqlQuery(array($newPath, $lenOld));
Instead of giving me:
UPDATE pck_folder SET id_path = CONCAT(?, RIGHT(id_path, LENGTH(id_path)-?)) WHERE (id_path like '1/2//%')
Doctrine gave me:
UPDATE pck_folder SET id_path = CONCAT(?, RIGHT(id_path, LENGTH(id_path-?))) WHERE (id_path like '1/2//%')
Please note this part: RIGHT(id_path, LENGTH(id_path)-?)
(Note: I'm assuming you're using Doctrine 1.2. I haven't used Doctrine 2.0 yet.)
I had not encountered that specific bug before, but I have found numerous problems with the implementation of update() in Doctrine_Query. Essentially, anything but the most straightforward update queries will cause the parser to generate wrong or invalid queries. For example, it can't handle sub-selects within an update.
Try writing a raw SQL query, or use a less efficient but fully functional workaround: select the records you want to update using Doctrine_Query, then iterate over them, set the field in PHP, and call save() on each one.
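A minimal, untested sketch of that select-then-save workaround with the Doctrine 1.2 API, reusing $oldPath, $newPath and $lenOld from the question:

$folders = Doctrine_Query::create()
    ->from('PckFolder')
    ->where('id_path LIKE ?', $oldPath . '%')
    ->execute();

foreach ($folders as $folder) {
    // do the string surgery in PHP instead of SQL
    $folder->id_path = $newPath . substr($folder->id_path, $lenOld);
    $folder->save(); // model hooks fire as usual
}

This issues one query per row instead of a single UPDATE, which is the price of keeping the hooks mentioned below intact.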
By the way, there's a big GOTCHA inherent with use of UPDATE queries and Doctrine that sort of forces you to use that workaround in many cases anyway. That is, if you or your plugins have made use of the nifty Doctrine hook methods within your models but you execute a SQL-level update that affects those records, the hooks will get silently circumvented. Depending on your application, that may wreck your business logic processing.

Guidance on creating a basic search function in Rails3

I'm still pretty new to Rails and hoping to develop a search function on a site that works in the manner detailed below:
User inputs a search term / phrase (string of words but unlikely to be more than 5 or 6)
String is chopped into its constituent words
Entries in a single model with a description (a single field in the model) are output
Having looked at previous questions on this site, I am aware that there are a number of add-ons commonly used for search queries; however, are these needed in such a simple situation?
I was thinking I could use an SQL query with a number of ANDed conditions to perform this task.
Currently the data is stored in SQLite3, but it will probably grow to about 100,000 rows (just 10 fields, though) in the near future; is this likely to cause problems?
Finally, is there an easy way to pull the words out of a string automatically, for any length of string up to a certain limit that is unlikely to be exceeded?
Thanks in advance for your time and patience
You can easily pull the words from a string with ruby: 'alice bob charlie'.split(/\s+/) will give you an array with the words.
Then you can string those words together into an SQL query to find the appropriate records. I don't know about the performance of this solution, though; you should definitely test it to see whether there are any performance issues.
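For example, a sketch of that approach (assuming a hypothetical Item model with a description column; Rails 3 ActiveRecord syntax):

words = params[:q].to_s.split.first(6)  # split on whitespace, cap the term count

scope = Item.scoped
words.each do |word|
  # one LIKE condition per word; chained where calls are ANDed together
  scope = scope.where('description LIKE ?', "%#{word}%")
end
@results = scope

Note that LIKE '%...%' scans cannot use ordinary indexes, so at ~100,000 rows it is worth benchmarking before ruling out a dedicated search add-on.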

Lazy loading/caching of SQL query results with a model

I'm developing a system (with Rails 2.3.2, Ruby 1.8.7-p72) that has a sizable reporting component. In order to improve performance, I've created a Report model to archive old reports. The idea is that if a matching report already exists for an arbitrary set of conditions then use it, otherwise generate the report and save the results.
Moreover, I'd like to design the Report model in such a way that only the requested attributes have their corresponding SQL queries run. This all stems from the fact that each attribute takes a long time to compute and I'd rather not generate results that won't be used. I.e. I would like to do something like:
def foo
  @foo ||= read_attribute(:foo)
  if @foo.nil?
    @foo = write_attribute(:foo, (expensive SQL query result))
  end
  @foo
end
The problem I'm experiencing, however, is that results aren't being properly written out to my database and, as a result, the code is constantly reevaluating the SQL query.
Can anyone tell me why write_attribute isn't working? Furthermore, is there a better approach?
Turns out that what I was doing was fine. The real problem was that the object's "id" lookup was being trumped by a piece of code I had elsewhere. I.e. the actual write was occurring to the database, but with the wrong primary key.
Don't you need to call "save" after doing write_attribute?

Prevent "Too Many Clauses" on lucene query

In my tests I suddenly bumped into a TooManyClauses exception when trying to get the hits from a boolean query that consisted of a term query and a wildcard query.
I searched around the net, and the resources I found suggest increasing BooleanQuery.setMaxClauseCount().
This sounds fishy to me. What should I raise it to? How can I trust that this new magic number will be sufficient for my query? How far can I increase this number before all hell breaks loose?
In general I feel this is not a solution. There must be a deeper problem.
The query was +{+companyName:mercedes +paintCode:a*} and the index has ~2.5M documents.
The paintCode:a* part of the query is a prefix query for any paintCode beginning with an "a". Is that what you're aiming for?
Lucene expands prefix queries into a boolean query containing all the possible terms that match the prefix. In your case, apparently there are more than 1024 possible paintCodes that begin with an "a".
If it sounds to you like prefix queries are useless, you're not far from the truth.
I would suggest you change your indexing scheme to avoid using a Prefix Query. I'm not sure what you're trying to accomplish with your example, but if you want to search for paint codes by first letter, make a paintCodeFirstLetter field and search by that field.
ADDED
If you're desperate, and are willing to accept partial results, you can build your own Lucene version from source. You need to make changes to the files PrefixQuery.java and MultiTermQuery.java, both under org/apache/lucene/search. In the rewrite method of both classes, change the line
query.add(tq, BooleanClause.Occur.SHOULD); // add to query
to
try {
    query.add(tq, BooleanClause.Occur.SHOULD); // add to query
} catch (TooManyClauses e) {
    break;
}
I did this for my own project and it works.
If you really don't like the idea of changing Lucene, you could write your own PrefixQuery variant and your own QueryParser, but I don't think it's much better.
It seems like you are using this on a field that is sort of a Keyword type (meaning there will not be multiple tokens in your data source field).
There is a suggestion here that seems pretty elegant to me: http://grokbase.com/t/lucene.apache.org/java-user/2007/11/substring-indexing-to-avoid-toomanyclauses-exception/12f7s7kzp2emktbn66tdmfpcxfya
The basic idea is to break down your term into multiple fields with increasing length until you are pretty sure you will not hit the clause limit.
Example:
Imagine a paintCode like this:
"a4c2d3"
When indexing this value, you create the following field values in your document:
[paintCode]: "a4c2d3"
[paintCode1n]: "a"
[paintCode2n]: "a4"
[paintCode3n]: "a4c"
At query time, the number of characters in the term decides which field to search on. This means you will perform a prefix query only for terms with more than 3 characters, which greatly decreases the internal result count, preventing the infamous TooManyClauses exception. Apparently this also speeds up the searching process.
You can easily automate a process that breaks down the terms and fills the documents with values according to a naming scheme during indexing.
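As a sketch, the indexing side could look like this (using the old Lucene 2.x/3.x Field API and the field names from the example above):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
String paintCode = "a4c2d3";

// the full code, stored and indexed as a single exact term
doc.add(new Field("paintCode", paintCode, Field.Store.YES, Field.Index.NOT_ANALYZED));

// prefix fields paintCode1n..paintCode3n for short query terms
for (int len = 1; len <= 3 && len <= paintCode.length(); len++) {
    doc.add(new Field("paintCode" + len + "n", paintCode.substring(0, len),
            Field.Store.NO, Field.Index.NOT_ANALYZED));
}

At query time, a term of one to three characters becomes an exact TermQuery on the matching paintCode<n>n field, and only longer terms fall back to a PrefixQuery on paintCode.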
Some issues may arise if you have multiple tokens in each field. You can find more details in the article linked above.