Optimizing a request doing tons of exclusions - sql

I have a huge and dirty SQL query doing many exclusions and I feel bad about it. Perhaps you know a better way to proceed.
Here's a part of my request:
select name, version, iteration, score
from article, articlemaster
where article.idmaster = articlemaster.id
and article.id not in (select article.id
from article, articlemaster
where article.idmaster = articlemaster.id
and articlemaster.name = 'nameOfMyArticle'
and article.version = 'A'
and article.state = 'CONTINUE')
and article.id not in....
and article.id not in....
You think it doesn't look that bad? Actually, this is only a portion of the request: each "and article.id not in ...." clause excludes one article, and I have more than 1000 to exclude, so I'm using a Java program to append the other 999.
Any idea how I could make a lighter version of this abomination?

You might be better off loading all of the articles to exclude into a temporary table, then joining that table in to your query.
For example, create exclude_articles:
name version state
---- ------- -----
nameOfMyArticle A CONTINUE
Then exclude its results from the query:
select
articlemaster.name,
article.version,
article.iteration,
article.score
from
article
join articlemaster
on article.idmaster = articlemaster.id
where
not exists (
select 1
from article article2
join articlemaster articlemaster2
on article2.idmaster = articlemaster2.id
join exclude_articles
on articlemaster2.name = exclude_articles.name
and article2.version = exclude_articles.version
and article2.state = exclude_articles.state
where article.id = article2.id)
This is all assuming that the version and state are actually necessary for the exclusion logic. It would be a much easier case if the name is unique.
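To make the temporary-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table layouts and sample rows are assumptions made up for illustration, and the correlated subquery is a slightly shorter form of the NOT EXISTS above (it matches the exclusion rules directly against the outer row instead of re-joining article and articlemaster).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articlemaster (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY, idmaster INTEGER, version TEXT,
        iteration INTEGER, score INTEGER, state TEXT
    );
    INSERT INTO articlemaster VALUES (1, 'nameOfMyArticle'), (2, 'otherArticle');
    INSERT INTO article VALUES
        (10, 1, 'A', 1, 50, 'CONTINUE'),  -- matches the rule: excluded
        (11, 1, 'B', 1, 60, 'CONTINUE'),  -- kept: version differs
        (12, 2, 'A', 1, 70, 'CONTINUE');  -- kept: name differs
    -- Temporary table holding one row per exclusion rule.
    CREATE TEMP TABLE exclude_articles (name TEXT, version TEXT, state TEXT);
""")

# The client program bulk-inserts its exclusion rules here instead of
# appending one NOT IN clause per rule to the query text.
rules = [("nameOfMyArticle", "A", "CONTINUE")]
conn.executemany("INSERT INTO exclude_articles VALUES (?, ?, ?)", rules)

kept_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM article a
    JOIN articlemaster am ON a.idmaster = am.id
    WHERE NOT EXISTS (
        SELECT 1
        FROM exclude_articles e
        WHERE e.name = am.name
          AND e.version = a.version
          AND e.state = a.state
    )
    ORDER BY a.id
""")]
print(kept_ids)
```

With this shape the query text stays the same size no matter how many rules are loaded; only the contents of exclude_articles change.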

If you're using Java to create the query and process the results, then why not do all the complicated logic in Java? Just ask the database for all the articles matching some basic criterion (or maybe you really do want to read through all of them) and then filter the results:
select am.name, a.version, a.iteration, a.score, a.state
from article a, articlemaster am
where a.idmaster = am.id
and <some other basic criteria>
Then in Java loop over all the results (sorry, my Java is super rusty) and filter out the ones you don't want:
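import java.util.ArrayList;
import java.util.List;
// ArticleRow and filterThisRecord are placeholders for your own row type and exclusion test.
List<ArticleRow> recordList = new ArrayList<>();  // filled from the query results
List<ArticleRow> finalList = new ArrayList<>();
for (ArticleRow record : recordList) {
    if (!filterThisRecord(record)) {
        finalList.add(record);  // keep only the rows that are not excluded
    }
}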


ORA-01841 happens on one environment but not all

I have the following SQL-code in my (SAP IdM) Application:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and to_date(v1.searchvalue,'YYYY-MM-DD')<= sysdate+3
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
I already found a fix for the ORA-01841 problem: it could either be a solution similar to MSSQL's try_to_date, as mentioned here: How to handle to_date exceptions in a SELECT statment to ignore those rows?
or a solution where I change the code to something like this, so that it works solely on strings:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and v1.searchvalue<= to_char(sysdate+3,'YYYY-MM-DD')
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
So for the actual problem I have a solution.
But now I got into a discussion with my customers and teammates about why the error happens at all.
Basically, for all entries of idmv_value_basic_active that meet the requirement "attrname = 'Start_of_company_change'" we can be sure that the values are dates. In addition, if we execute the query to check all values that would be delivered, all are in a valid format.
I learned in university that the DB engine can decide in which order it runs the individual segments of a query. So for me the most logical explanation would be that, on the development environment (where we face the problem), the section "to_date(v1.searchvalue,'YYYY-MM-DD')<= sysdate+3" is executed before the section "attrname = 'Start_of_company_change'",
whereas on the production environment, where everything works like a charm, the segments are executed in the order in which they are written in the SQL statement.
Now my Question is:
First: do I remember that right, since the teacher said that only once and at that time I could not really make sense out of it
And Second: Is this assumption of mine correct or is there another reason for the problem?
Background information:
The tool uses a kind of shifted data structure, which is why there can be quite a few different types of values in the actual "Searchvalue" column of the idmv_value_basic_active view. The datatype on the database layer is always a varchar one.
"the DB-Engine could decide in which order it will run individual segments of a query"
This is correct. A SQL query is just a description of the data you want and where it's stored. Oracle will calculate an execution plan to retrieve that data as best it can. That plan will vary based on any number of factors, like the number of actual rows in the table and the presence of indexes, so it will vary from environment to environment.
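So it sounds like to_date is being applied to a value somewhere in your table that is not a valid date (possibly a row belonging to a different attribute), and that raises the exception. You can use validate_conversion (available from Oracle 12.2) to find it.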
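As a rough analogy in Python (this does not model Oracle's optimizer, and the sample rows are invented), the sketch below applies the same two conditions in both orders. Checking the attribute name first never sends a non-date string to the parser; converting first fails as soon as it meets a value that belongs to another attribute.

```python
from datetime import datetime

FMT = "%Y-%m-%d"

# Hypothetical (attrname, searchvalue) rows; every attribute is stored as text.
rows = [
    ("Start_of_company_change", "2019-03-01"),
    ("Running_Changes_flag", "X"),
    ("Start_of_company_change", "2019-04-15"),
]

def filter_then_convert(rows):
    """Check attrname first, so only date strings ever reach the parser."""
    return [datetime.strptime(value, FMT).date()
            for attr, value in rows
            if attr == "Start_of_company_change"]

def convert_then_filter(rows):
    """Parse every value first, then filter: raises on the first non-date value."""
    converted = [(attr, datetime.strptime(value, FMT).date()) for attr, value in rows]
    return [d for attr, d in converted if attr == "Start_of_company_change"]

print(filter_then_convert(rows))
try:
    convert_then_filter(rows)
except ValueError as exc:
    print("conversion failed:", exc)
```

The string comparison in the rewritten query avoids the problem because it never converts searchvalue at all, so the evaluation order no longer matters.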

Django query not working in shell_plus but working in dbshell

I have a query that won't return me an expected value, but when I print the query itself, and run it in Dbshell, it does work. I am on Django 1.8.18 with SQLite version 3.11.0
My Recommendation has a Foreign Key on my Car, and I need to get all my Cars that do not have a Recommendation with is_active=True AND description=FOO. I know I could probably make it work in the other way, but it would be way easier for me to make it work this way.
class Car(models.Model):
    kind = models.CharField(max_length=100)
class Recommendation(models.Model):
    car = models.ForeignKey(Car)
    is_active = models.BooleanField(default=True)
    description = models.CharField(max_length=100)
I have created a Recommendation linked to my Car id 100, with is_active set to False, and description to FOO
Car.objects.exclude(recommendation__is_active=True, recommendation__description="FOO")
This query returns me nothing, when I expected it to return Car 100. I decided to print the actual query and try it in dbshell
SELECT "myapp_car"."id"
FROM "myapp_car"
WHERE NOT ("myapp_car"."id" IN (SELECT U1."car_id" AS Col1 FROM "myotherapp_recommendation" U1 WHERE U1."description" = 'FOO') AND "myapp_car"."id" IN (SELECT U1."car_id" AS Col1 FROM "myotherapp_recommendation" U1 WHERE U1."is_active" = 'True'))
However, this works properly! It returns my Car 100.
I have also tried with Q, but that didn't work either:
Car.objects.exclude(Q(recommendation__is_active=True) & Q(recommendation__description="FOO"))
It feels like a Django bug, but I'd rather have your opinion
What you have written here is basically two JOINs: you exclude Car objects that have a Recommendation that is is_active and that also have a Recommendation with description='FOO', but those two Recommendations are not necessarily the same one. This is a consequence of negative logic: the conditions in a single exclude() call that spans a multi-valued relation do not have to match the same related object.
With EXISTS
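If we want both conditions to apply to the same Recommendation, we can work with an EXISTS subquery (Exists and OuterRef are available from Django 1.11 onwards):
from django.db.models import Exists, OuterRef
to_exclude = Recommendation.objects.filter(
    car=OuterRef('pk'),
    is_active=True,
    description="FOO",
)
Now we can exclude the Cars for which such a Recommendation exists, by keeping only those where the annotation is False:
Car.objects.annotate(
    has_recommendation_to_exclude=Exists(to_exclude)
).filter(
    has_recommendation_to_exclude=False
)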
Using a COUNT-filter approach
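from django.db.models import Case, IntegerField, Sum, Value, When
# Count, per Car, the Recommendations that match both conditions at once.
Car.objects.annotate(
    nrec=Sum(
        Case(
            When(
                recommendation__is_active=True,
                recommendation__description="FOO",
                then=Value(1)
            ),
            default=Value(0),
            output_field=IntegerField(),
        )
    )
).exclude(nrec__gt=0)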
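The difference between the two query shapes can be checked outside Django. The sketch below uses sqlite3 with simplified table names and invented rows (both are assumptions): car 102 has one active Recommendation and a separate inactive 'FOO' Recommendation, so it is dropped by the two-subquery form but kept by the same-row NOT EXISTS form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car (id INTEGER PRIMARY KEY);
    CREATE TABLE recommendation (
        id INTEGER PRIMARY KEY, car_id INTEGER, is_active INTEGER, description TEXT
    );
    INSERT INTO car VALUES (100), (101), (102);
    INSERT INTO recommendation (car_id, is_active, description) VALUES
        (100, 0, 'FOO'),   -- inactive FOO only
        (101, 1, 'FOO'),   -- one row that is both active and FOO
        (102, 1, 'BAR'),   -- active, but not FOO
        (102, 0, 'FOO');   -- FOO, but not active
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Shape of the SQL printed in the question: two independent subqueries.
two_subqueries = ids("""
    SELECT id FROM car
    WHERE NOT (id IN (SELECT car_id FROM recommendation WHERE description = 'FOO')
           AND id IN (SELECT car_id FROM recommendation WHERE is_active = 1))
    ORDER BY id
""")

# Both conditions must hold on the same recommendation row.
same_row = ids("""
    SELECT c.id FROM car c
    WHERE NOT EXISTS (
        SELECT 1 FROM recommendation r
        WHERE r.car_id = c.id AND r.is_active = 1 AND r.description = 'FOO'
    )
    ORDER BY c.id
""")

print(two_subqueries, same_row)
```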

LINQ not returning all child records

I have a query in the DB:
SELECT GreenInventoryBlendGradeID,bgx.blendgradeid,
bgX.GreenBlendGradeTypeID,[Description]
FROM [GreenInventory] gi
INNER JOIN [GreenInventoryBlendGradeXref] bgX
ON bgX.[GreenInventoryID] = gi.[GreenInventoryID]
INNER JOIN [BlendGrade] bg
ON bg.[BlendGradeID]=bgx.[BlendGradeID]
That returns 3 records:
TypeID Desc
1 XR
2 XR
1 XF2
The LINQ:
var GreenInventory = (from g in Session.GreenInventory
.Include("GreenInventoryBlendGradeXref")
.Include("GreenInventoryBlendGradeXref.BlendGrade")
.Include("GreenInventoryBlendGradeXref.GreenBlendGradeType")
.Include("GreenInventoryWeightXref")
.Where(x => x.GreenInventoryID == id && x.GreenInventoryBlendGradeXref.Any(bg=>bg.GreenBlendGradeTypeID > 0) )
select g);
I have tried different Where clauses including the simple - (x => x.GreenInventoryID == id)
but always have only the first 2 records returned.
Any Ideas?
If I try the following:
var GreenInventory = (from gi in Session.GreenInventory.Where(y => y.GreenInventoryID == id)
join bgX in Session.GreenInventoryBlendGradeXref.DefaultIfEmpty() on gi.GreenInventoryID equals bgX.GreenInventoryID
join bg in Session.BlendGrade.DefaultIfEmpty() on bgX.BlendGradeID equals bg.BlendGradeID
select new { GreenInventory = gi, GreenInventoryBlendGradeXref = bgX, BlendGrade = bg });
I get back 3 of each object, and the correct information is in the BlendGrade objects. It looks like the 3 GreenInventory objects are the same. They each include 2 of the GreenInventoryBlendGradeXref objects, which show the same 2 records as before.
So I am not clear on what the original problem was. I also don't know if this is the best way to resolve it.
Thanks for the answers. If anyone has further thoughts, please let us know.
Based on the few details you present, I would assume that you are missing a join. I have no experience with EntityFramework (I assume that you use this ORM), but as far as I know, the ".Include" tries to ensure that the set of root entities will not change and will not contain duplicates.
Your manually created query seems to indicate that there is at least one 1:n relationship in the model. The result you get from LINQ show that only distinct GreenInventory entities are returned.
Therefore you need to adjust your query and explicitly declare that you want all results (and not only distinct root entities) - I would assume that with an explicit join EntityFramework will yield all expected results - or you need to adjust your mapping.
The first place I'd look in would be your model and joins you have defined between the entities. You might also want to check your generated SQL statement:
Trace.WriteLine(GreenInventory.Provider.ToString())
or use Visual Studio IntelliTrace to investigate what was sent to the database.

Exclusive filtering by tag

I'm using Rails 3.0 and MySQL 5.1.
I have these three models:
Question, Tag and QuestionTag.
Tag has a column called name.
Question has many Tags through QuestionTags and vice versa.
Suppose I have n tag names. How do I find only the questions that have all n tags, identified by tag name?
And how do I do it in a single query?
(If you can convince me that doing it in more than one query is optimal, I'll be open to that.)
A pure Rails 3 solution would be preferred, but I am not averse to a pure SQL solution either.
Please note that the difficulty is in making a query that does not return all the questions that have any of the tags, but only the questions that have all of the tags.
This is the solution I found for myself. Unmodified, it will only work in Rails 3 (or higher).
In the Tag model:
scope :find_by_names, lambda { |names|
unless names.empty?
where("tags.name IN (#{Array.new(names.length, "?").join(",")})", *names)
else
where("false")
end
}
In the Question model:
scope :tagged_with, lambda { |tag_names|
unless tag_names.blank?
joins(:question_tags).
where("questions.id = question_tags.question_id").
joins(:tags).where("tags.id = question_tags.tag_id").
group("questions.id").
having("count(questions.id) = ?", tag_names.count) & Tag.find_by_names(tag_names)
else
scoped
end
}
The & Tag.find_by_names(tag_names) combines the two scopes such that the join on tags is really a join on the scoped model.
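The SQL that this pair of scopes is aiming for (join, restrict to the requested tag names, group by question, and keep the groups whose count equals the number of names) can be sketched with sqlite3 as follows. The schema and sample rows are assumptions, and COUNT(DISTINCT t.id) is used here instead of count(questions.id) so that a duplicated question_tags row cannot inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE question_tags (question_id INTEGER, tag_id INTEGER);
    INSERT INTO questions VALUES (1), (2), (3);
    INSERT INTO tags VALUES (1, 'dogs'), (2, 'cats'), (3, 'birds');
    INSERT INTO question_tags VALUES
        (1, 1), (1, 2),          -- question 1: dogs, cats
        (2, 1),                  -- question 2: dogs
        (3, 1), (3, 2), (3, 3);  -- question 3: dogs, cats, birds
""")

def questions_with_all_tags(tag_names):
    """Return ids of the questions that carry every one of the given tag names."""
    names = sorted(set(tag_names))
    if not names:
        # Mirrors the scope above: no tag names means no filtering.
        return [row[0] for row in conn.execute("SELECT id FROM questions ORDER BY id")]
    placeholders = ",".join("?" for _ in names)
    sql = f"""
        SELECT q.id
        FROM questions q
        JOIN question_tags qt ON qt.question_id = q.id
        JOIN tags t ON t.id = qt.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY q.id
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY q.id
    """
    return [row[0] for row in conn.execute(sql, names + [len(names)])]

print(questions_with_all_tags(["dogs", "cats"]))
```

Only the placeholder list is interpolated into the SQL text; the tag names themselves are always passed as bound parameters.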
[Update]
My sql-fu has improved a little, so I thought I'd offer a pure SQL solution also:
SELECT q.*
FROM (
SELECT DISTINCT q.*
FROM `questions` q
JOIN question_tags qt
ON qt.question_id = q.id
JOIN tags t
ON t.id = qt.tag_id
WHERE t.name = 'dogs'
) AS q
JOIN question_tags qt
ON qt.question_id = q.id
JOIN tags t
ON t.id = qt.tag_id
WHERE t.name = 'cats'
This finds all the questions that have been tagged with both 'cats' and 'dogs'. The idea is to have a nested subquery for each tag I want to filter by.
There are several other ways to do this. I'm not sure if it makes a difference to have the subquery in the FROM clause instead of the WHERE clause. Any insight would be appreciated.

Django Generic Relations and ORM Queries

Say I have the following models:
class Image(models.Model):
    image = models.ImageField(max_length=200, upload_to=file_home)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()
class Article(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)
class BlogPost(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)
What's the most processor- and memory-efficient way to find all Articles that have at least one Image attached to them?
I've done this:
Article.objects.filter(pk__in=Image.objects.filter(content_type=ContentType.objects.get_for_model(Article)).values_list('object_id', flat=True))
Which works, but besides being ugly it takes forever.
I suspect there's a better solution using raw SQL, but that's beyond me. For what it's worth, the SQL generated by the above is as follows:
SELECT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` WHERE `issues_article`.`id` IN (SELECT U0.`object_id` FROM `uploads_image` U0 WHERE U0.`content_type_id` = 26 ) LIMIT 21
EDIT: czarchaic's suggestion has much nicer syntax but even worse (slower) performance. The SQL generated by his query looks like the following:
SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text`, COUNT(`uploads_image`.`id`) AS `num_images` FROM `issues_article` LEFT OUTER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) GROUP BY `issues_article`.`id` HAVING COUNT(`uploads_image`.`id`) > 0 ORDER BY NULL LIMIT 21
EDIT: Hooray for Jarret Hardie! Here's the SQL generated by his should-have-been-obvious solution:
SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` INNER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) WHERE (`uploads_image`.`id` IS NOT NULL AND `uploads_image`.`content_type_id` = 26 ) LIMIT 21
Thanks to generic relations, you should be able to query this structure using traditional query-set semantics for reverse relations:
Article.objects.filter(images__isnull=False)
This will produce duplicates for any Articles that are related to multiple Images, but you can eliminate that with the distinct() QuerySet method:
Article.objects.distinct().filter(images__isnull=False)
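The duplicate rows come from the INNER JOIN itself, which can be seen with a small sqlite3 sketch. The table names follow the generated SQL in the question and 26 is the content type id shown there; the sample rows and the second content type id (27) are assumptions for illustration.

```python
import sqlite3

ARTICLE_CT = 26  # content_type_id for Article, as it appears in the question's SQL

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issues_article (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE uploads_image (
        id INTEGER PRIMARY KEY, content_type_id INTEGER, object_id INTEGER
    );
    INSERT INTO issues_article VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO uploads_image (content_type_id, object_id) VALUES
        (26, 1), (26, 1),   -- article 1 has two images
        (27, 2),            -- hypothetical other model whose pk happens to be 2
        (26, 3);            -- article 3 has one image
""")

JOIN_SQL = """
    SELECT {distinct} a.id
    FROM issues_article a
    INNER JOIN uploads_image i ON a.id = i.object_id
    WHERE i.content_type_id = ?
    ORDER BY a.id
"""

def article_ids(distinct):
    """Ids of articles with at least one image, with or without DISTINCT."""
    sql = JOIN_SQL.format(distinct="DISTINCT" if distinct else "")
    return [row[0] for row in conn.execute(sql, (ARTICLE_CT,))]

with_duplicates = article_ids(distinct=False)
deduplicated = article_ids(distinct=True)
print(with_duplicates, deduplicated)
```

Article 2 is absent from both results because its only matching object_id belongs to a different content type, which is why the content_type_id condition in the generated SQL matters.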
I think your best bet would be to use aggregation
from django.db.models import Count
Article.objects.annotate(num_images=Count('images')).filter(num_images__gt=0)