Rails ignores columns from second table when using .select - sql

By example:
r = Model.arel_table
s = SomeOtherModel.arel_table
Model.select(r[:id], s[:othercolumn].as('othercolumn')).
joins(:someothermodel)
Will produce the SQL:
SELECT `model`.`id`, `someothermodel`.`othercolumn` AS othercolumn FROM `model` INNER JOIN `someothermodel` ON `model`.`id` = `someothermodel`.`model_id`
Which is correct. However, when the models are loaded, the attribute othercolumn is ignored because it is not an attribute of Model.
It's similar to eager loading and includes, but I don't want all columns, only the one specified so include is no good.
There must be an easy way of getting columns from other models? I'd prefer to have the items returned as instances of Model rather than as simple arrays/hashes.

When you do a select with joins or includes, you get back an ActiveRecord::Relation. This relation is composed only of objects of the class on which you called select. The selected columns from the joined models are added to those objects as extra attributes. Because these are not attributes of Model itself, they don't show up when you inspect the objects, and I believe this is the primary reason for the confusion.
You could try this out in your rails console:
> result = Model.select(r[:id], s[:othercolumn].as('othercolumn')).joins(:someothermodel)
=> #<ActiveRecord::Relation [#<Model id: 1>]>
# "othercolumn" is not shown in the result but doing the following will yield correct result
> result.first.othercolumn
=> "myothercolumnvalue"

Related

SQL join that happens in the view of Django Rest Framework

I just want to know what type of SQL join is happening in the following view. I read about types of SQL joins but I am not able to figure out what is happening here.
class WishListItemsView(ListAPIView):
    permission_classes = [IsAuthenticated]
    serializer_class = WishListItemsCreateSerializer
    def get_queryset(self):
        user = self.request.user
        return WishListItems.objects.filter(owner=user)
My models:
class WishListItems(models.Model):
    owner = models.ForeignKey(User, on_delete=models.CASCADE,blank=True)
    #wishlist = models.ForeignKey(WishList,on_delete=models.CASCADE, related_name='wishlistitems')
    item = models.ForeignKey(Product, on_delete=models.CASCADE,blank=True, null=True)
    wish_variants = models.ForeignKey(Variants,on_delete=models.CASCADE, related_name='wishitems')
I could check this in Django Debug Toolbar, but the view requires authentication, so I can't see the queries there.
No joins are happening in your code. The following line is the queryset you return:
WishListItems.objects.filter(owner=user)
This filter does not need any joins; Django simply uses a SQL WHERE clause. Suppose the primary key of the user is 1; the query would then look something like:
SELECT <ALL OF YOUR TABLES COLUMNS HERE> FROM "<APP_NAME>_wishlistitems" WHERE "<APP_NAME>_wishlistitems"."owner_id" = 1
You can see the exact query by writing:
print(WishListItems.objects.filter(owner=user).query)
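Moving further, if you do want a join to optimize or speed things up, use select_related, which makes Django fetch the related objects in the same query with a JOIN (an INNER JOIN for a non-nullable foreign key such as owner, a LEFT OUTER JOIN for a nullable one such as item):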
WishListItems.objects.filter(owner=user).select_related('owner', 'item') # select the related owner and item
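To make the difference between the two query shapes concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Django itself. The table and column names are simplified assumptions modelled on the question's models, not the exact names Django would generate. The first query corresponds roughly to filter(owner=user), the second to adding select_related('owner', 'item').

```python
import sqlite3

# Hypothetical, simplified tables standing in for the Django models above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE app_product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_wishlistitems (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES auth_user(id),
        item_id INTEGER REFERENCES app_product(id)  -- nullable, like null=True
    );
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO app_product VALUES (10, 'lamp');
    INSERT INTO app_wishlistitems VALUES (100, 1, 10), (101, 1, NULL), (102, 2, 10);
""")

# Shape of filter(owner=user): a WHERE on the foreign key column, no join.
plain = conn.execute(
    "SELECT id, owner_id, item_id FROM app_wishlistitems "
    "WHERE owner_id = ? ORDER BY id",
    (1,),
).fetchall()

# Shape of select_related('owner', 'item'): related rows come back in the
# same query. The nullable item uses a LEFT OUTER JOIN so that wishlist
# rows without an item are not dropped.
joined = conn.execute("""
    SELECT w.id, u.username, p.name
    FROM app_wishlistitems AS w
    INNER JOIN auth_user AS u ON w.owner_id = u.id
    LEFT OUTER JOIN app_product AS p ON w.item_id = p.id
    WHERE w.owner_id = ?
    ORDER BY w.id
""", (1,)).fetchall()

print(plain)
print(joined)
```

Both queries return the same two wishlist rows for user 1; the joined one simply carries the related username and product name along with them.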

Why are all my SQL queries being duplicated 4 times for Django using "Prefetch_related" for nested MPTT children?

I have a Child MPTT model that has a ForeignKey to itself:
class Child(MPTTModel):
    title = models.CharField(max_length=255)
    parent = TreeForeignKey(
        "self", on_delete=models.CASCADE, null=True, blank=True, related_name="children"
    )
I have a recursive Serializer as I want to show all levels of children for any given Child:
class ChildrenSerializer(serializers.HyperlinkedModelSerializer):
    url = HyperlinkedIdentityField(
        view_name="app:children-detail", lookup_field="pk"
    )
    class Meta:
        model = Child
        fields = ("url", "title", "children")
    def get_fields(self):
        fields = super(ChildrenSerializer, self).get_fields()
        fields["children"] = ChildrenSerializer(many=True)
        return fields
I am trying to reduce the number of duplicate/similar queries made when accessing a Child's DetailView.
The view below works for a depth of 2 - however, the "depth" is not always known or static.
class ChildrenDetailView(generics.RetrieveUpdateDestroyAPIView):
    queryset = Child.objects.prefetch_related(
        "children",
        "children__children",
        # A depth of 3 will additionally require "children__children__children",
        # A depth of 4 will additionally require "children__children__children__children",
        # etc.
    )
    serializer_class = ChildrenSerializer
    lookup_field = "pk"
Note: If I don't use prefetch_related and simply set the queryset to Child.objects.all(), every SQL query is duplicated four times, and I have no idea why.
How do I leverage a Child's depth (i.e. the Child's MPTT level field) to optimize prefetching? Should I be overwriting the view's get_object and/or retrieve?
Does it even matter if I add a ridiculous number of depths to the prefetch? E.g. children__children__children__children__children__children__children__children? It doesn't seem to increase the number of queries for Children objects that don't require that level of depth.
Edit:
Hm, not sure why but when I try to serialize any Child's top parent (i.e. MPTT's get_root), it duplicates the SQL query four times???
class Child(MPTTModel):
    ...
    @property
    def top_parent(self):
        return self.get_root()
class ChildrenSerializer(serializers.HyperlinkedModelSerializer):
    ...
    top_parent = ParentSerializer()
    fields = ("url", "title", "children", "top_parent")
Edit 2
Adding an arbitrary SerializerMethodField confirms it's being queried four times... for some reason? e.g.
class ChildrenSerializer(serializers.HyperlinkedModelSerializer):
    ...
    foo = serializers.SerializerMethodField()
    def get_foo(self, obj):
        print("bar")
        return obj.get_root().title
This will print "bar" four times. The SQL query is also repeated four times according to django-debug-toolbar:
SELECT ••• FROM "app_child" WHERE ("app_child"."parent_id" IS NULL AND "app_child"."tree_id" = '7') LIMIT 21
4 similar queries. Duplicated 4 times.
Are you using DRF's browsable API? It initializes the serializer 3 more times to build the HTML forms, in rest_framework.renderers.BrowsableAPIRenderer.get_context.
If you do the same request with, say, Postman, "bar" should get printed only once.

Get records with no related data using activerecord and RoR3?

I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
  has_many :publications
end
What I want to get is all press_releases that do not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods do not return an ActiveRecord::Relation as a scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
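The SQL that this scope relies on is an "anti-join": a LEFT OUTER JOIN keeps every press release, and the IS NULL filter keeps only those for which no publication row matched. A minimal sketch of that query, using Python's built-in sqlite3 module instead of ActiveRecord and a made-up three-row data set, might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE press_releases (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE publications (
        id INTEGER PRIMARY KEY,
        press_release_id INTEGER REFERENCES press_releases(id)
    );
    -- Sample data (assumed for illustration): press release 2 has no publications.
    INSERT INTO press_releases VALUES (1, 'launch'), (2, 'funding'), (3, 'hiring');
    INSERT INTO publications VALUES (1, 1), (2, 1), (3, 3);
""")

# Unmatched press releases get NULLs for every publications column,
# so filtering on publications.id IS NULL keeps exactly those rows.
without_publications = conn.execute("""
    SELECT press_releases.id, press_releases.title
    FROM press_releases
    LEFT OUTER JOIN publications
        ON publications.press_release_id = press_releases.id
    WHERE publications.id IS NULL
    ORDER BY press_releases.id
""").fetchall()

print(without_publications)
```

Press release 1 has two publications but never appears in the result, so no DISTINCT is needed for this particular filter.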
Explanation and response to comments
My initial thought about eager loading overhead was that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications, so that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
You can do the following in your PressRelease model:
scope :your_scope, -> { where('id NOT IN(select press_release_id from publications)') }
This will return all PressRelease records without publications.
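One caveat with the NOT IN form is worth checking against your data: if the subquery can return a NULL (for example a publication whose press_release_id is not set), standard SQL three-valued logic makes NOT IN match no rows at all. The sketch below reproduces this with Python's sqlite3 module; the orphaned publication row is an assumption made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE press_releases (id INTEGER PRIMARY KEY);
    CREATE TABLE publications (id INTEGER PRIMARY KEY, press_release_id INTEGER);
    INSERT INTO press_releases VALUES (1), (2);
    -- Publication 2 is orphaned: its press_release_id is NULL.
    INSERT INTO publications VALUES (1, 1), (2, NULL);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# The NULL in the subquery makes "id NOT IN (...)" evaluate to NULL, never true.
not_in = ids("""
    SELECT id FROM press_releases
    WHERE id NOT IN (SELECT press_release_id FROM publications)
    ORDER BY id
""")

# Excluding NULLs in the subquery restores the expected result.
not_in_guarded = ids("""
    SELECT id FROM press_releases
    WHERE id NOT IN (
        SELECT press_release_id FROM publications
        WHERE press_release_id IS NOT NULL
    )
    ORDER BY id
""")

print(not_in, not_in_guarded)
```

If press_release_id is declared NOT NULL in your schema, the unguarded form is safe as written.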
There are a couple of ways to do this. The first one requires two DB queries:
PressRelease.where.not(id: Publication.uniq.pluck(:press_release_id))
or if you don't want to hardcode association foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another one is to use the counter_cache feature. You will need to add a publications_count column to your press_releases table.
class Publication < ActiveRecord::Base
  belongs_to :press_release, counter_cache: true
end
Rails will keep this column in sync with the number of records associated with the given model, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])

Magento: Get Collection of Order Items for a product collection filtered by an attribute

I'm working on developing a category roll-up report for a Magento (1.6) store.
To that end, I want to get an Order Item collection for a subset of products: those products whose unique category id (a Magento product attribute that I created) matches a particular value.
I can get the relevant result set by basing the collection on catalog/product.
$collection = Mage::getModel('catalog/product')
->getCollection()
->addAttributeToFilter('unique_category_id', '75')
->joinTable('sales/order_item', 'product_id=entity_id', array('price'=>'price','qty_ordered' => 'qty_ordered'));
Magento doesn't like it, since there are duplicate entries for the same product id.
How do I craft the code to get this result set based on Order Items? Joining in the product collection filtered by an attribute is eluding me. This code isn't doing the trick, since it assumes that attribute is on the Order Item, and not the Product.
$collection = Mage::getModel('sales/order_item')
->getCollection()
->join('catalog/product', 'entity_id=product_id')
->addAttributeToFilter('unique_category_id', '75');
Any help is appreciated.
The only way to make cross-entity selects work cleanly and efficiently is by building the SQL with the collection's select object.
$attributeCode = 'unique_category_id';
$alias = $attributeCode.'_table';
$attribute = Mage::getSingleton('eav/config')
->getAttribute(Mage_Catalog_Model_Product::ENTITY, $attributeCode);
$collection = Mage::getResourceModel('sales/order_item_collection');
$select = $collection->getSelect()->join(
array($alias => $attribute->getBackendTable()),
"main_table.product_id = $alias.entity_id AND $alias.attribute_id={$attribute->getId()}",
array($attributeCode => 'value')
)
->where("$alias.value=?", 75);
This works quite well for me. I tend to skip going the full way of joining the eav_entity_type table, then eav_attribute, then the value table etc for performance reasons. Since the attribute_id is entity specific, that is all that is needed.
Depending on the scope of your attribute you might need to add in the store id, too.
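To show what the join above boils down to, here is a rough sketch of the resulting query shape using Python's sqlite3 module. The two tables are heavily simplified, hypothetical stand-ins for the order item table and the attribute's backend (value) table, and the attribute id 140 is invented for the example. The point is that matching on both entity_id and attribute_id is what keeps values of other attributes out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for Magento's tables.
    CREATE TABLE order_item (
        item_id INTEGER PRIMARY KEY, product_id INTEGER, qty_ordered REAL, price REAL
    );
    CREATE TABLE product_attr_value (entity_id INTEGER, attribute_id INTEGER, value TEXT);
    INSERT INTO order_item VALUES (1, 501, 2, 9.5), (2, 501, 1, 9.5), (3, 502, 4, 3.0);
    INSERT INTO product_attr_value VALUES
        (501, 140, '75'),   -- unique_category_id of product 501
        (502, 140, '80'),   -- unique_category_id of product 502
        (502, 141, '75');   -- a different attribute that happens to hold 75
""")

ATTRIBUTE_ID = 140  # assumed id of the unique_category_id attribute

# Order items stay the main table; the value table is joined per product
# and restricted to the one attribute we care about.
rows = conn.execute("""
    SELECT main_table.item_id, main_table.product_id, main_table.qty_ordered
    FROM order_item AS main_table
    INNER JOIN product_attr_value AS attr
        ON main_table.product_id = attr.entity_id
       AND attr.attribute_id = ?
    WHERE attr.value = ?
    ORDER BY main_table.item_id
""", (ATTRIBUTE_ID, "75")).fetchall()

print(rows)
```

Both order items of product 501 are returned, while product 502 is excluded even though one of its other attributes holds the value 75.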

Rails (or maybe SQL): Finding and deleting duplicate AR objects

ActiveRecord objects of the class 'Location' (representing the db-table Locations) have the attributes 'url', 'lat' (latitude) and 'lng' (longitude).
Lat-lng-combinations on this model should be unique. The problem is that there are a lot of Location objects in the database with duplicate lat-lng-combinations.
I need help in doing the following:
Find objects that share the same lat-lng-combination.
If the 'url' attribute of the object isn't empty, keep this object and delete the other duplicates. Otherwise just choose the oldest object (by checking the attribute 'created_at') and delete the other duplicates.
As this is a one-time-operation, solutions in SQL (MySQL 5.1 compatible) are welcome too.
If it's a one time thing then I'd just do it in Ruby and not worry too much about efficiency. I haven't tested this thoroughly, check the sorting and such to make sure it'll do exactly what you want before running this on your db :)
keep = []
locations = Location.find(:all)
locations.each do |loc|
  # get all Locations with the same coords as this one
  same_coords = locations.select { |l| l.lat == loc.lat && l.lng == loc.lng }
  # url may be nil as well as empty, so treat both as "no url"
  with_urls = same_coords.reject { |l| l.url.nil? || l.url.empty? }
  # decide which list to use depending on whether there were any urls
  same_coords = with_urls.any? ? with_urls : same_coords
  # pick the oldest one: ascending sort by created_at, then take the first
  keep << same_coords.sort { |a,b| a.created_at <=> b.created_at }.first.id
end
# only keep unique ids
keep.uniq!
# now we just delete all the rows we didn't decide to keep
locations.each do |loc|
  loc.destroy unless keep.include?( loc.id )
end
Now like I said, this is definitely poor, poor code. But sometimes just hacking out the thing that works is worth the time saved in thinking up something 'better', especially if it's just a one-off.
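If the table is large, the same selection rule can be applied in a single pass by grouping on the coordinate pair first, instead of rescanning the whole list for every record. The following is a language-neutral sketch of that idea written in Python, not a drop-in replacement for the Ruby above; the dictionaries and sample rows are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

def ids_to_keep(locations):
    """Return the id of the one record to keep for each (lat, lng) pair."""
    groups = defaultdict(list)
    for loc in locations:
        groups[(loc["lat"], loc["lng"])].append(loc)

    keep = set()
    for same_coords in groups.values():
        # Prefer records with a non-empty url; fall back to the whole group.
        with_urls = [loc for loc in same_coords if loc["url"]]
        candidates = with_urls or same_coords
        # Among the candidates, keep the oldest one.
        keep.add(min(candidates, key=lambda loc: loc["created_at"])["id"])
    return keep

# Made-up sample rows.
locations = [
    {"id": 1, "lat": 10.0, "lng": 20.0, "url": "", "created_at": datetime(2010, 1, 1)},
    {"id": 2, "lat": 10.0, "lng": 20.0, "url": "http://example.com/a", "created_at": datetime(2010, 3, 1)},
    {"id": 3, "lat": 10.0, "lng": 20.0, "url": "http://example.com/b", "created_at": datetime(2010, 2, 1)},
    {"id": 4, "lat": 30.0, "lng": 40.0, "url": None, "created_at": datetime(2010, 5, 1)},
    {"id": 5, "lat": 30.0, "lng": 40.0, "url": "", "created_at": datetime(2010, 4, 1)},
    {"id": 6, "lat": 50.0, "lng": 60.0, "url": None, "created_at": datetime(2010, 6, 1)},
]

keep = ids_to_keep(locations)
to_delete = sorted(loc["id"] for loc in locations if loc["id"] not in keep)
print(keep, to_delete)
```

When several duplicates have a url, this sketch keeps the oldest of them; the question does not specify that case, so adjust the tie-break if you need something else.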
If you have 2 MySQL columns, you can use the CONCAT function. Put a separator between the two values so that different pairs cannot concatenate to the same string.
SELECT * FROM table1 GROUP BY CONCAT(column_lat, ',', column_lng)
If you need to know the total
SELECT COUNT(*) AS total FROM table1 GROUP BY CONCAT(column_lat, ',', column_lng)
Or, you can combine both
SELECT COUNT(*) AS total, table1.* FROM table1
GROUP BY CONCAT(column_lat, ',', column_lng)
But if you can explain more on your question, perhaps we can have more relevant answers.
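To actually list the duplicated coordinate pairs, a HAVING COUNT(*) > 1 filter can be added to the grouping. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, so it uses SQLite's || operator where MySQL would use CONCAT, and it stores the coordinates as text to keep the demonstration deterministic. It also shows why grouping on the two columns directly is safer than grouping on a concatenation without a separator: two different pairs can produce the same string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, column_lat TEXT, column_lng TEXT);
    INSERT INTO table1 VALUES
        (1, '1.2',  '34'),   -- concatenates to '1.234'
        (2, '1.23', '4'),    -- also concatenates to '1.234', but is a different pair
        (3, '5.0',  '6.0'),
        (4, '5.0',  '6.0');  -- a genuine duplicate of row 3
""")

# Grouping on the concatenated string reports a false duplicate.
by_concat = conn.execute("""
    SELECT column_lat || column_lng AS k, COUNT(*) AS total
    FROM table1
    GROUP BY k
    HAVING COUNT(*) > 1
    ORDER BY k
""").fetchall()

# Grouping on both columns reports only the genuine duplicate.
by_columns = conn.execute("""
    SELECT column_lat, column_lng, COUNT(*) AS total
    FROM table1
    GROUP BY column_lat, column_lng
    HAVING COUNT(*) > 1
""").fetchall()

print(by_concat)
print(by_columns)
```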