CakePHP 3 ORM - Sorting based on two columns in two different related tables - sql

Currently I'm trying to implement timeline functionality, which requires sorting on the created column of two related tables and ordering the rows of the parent table (in my case pictures) accordingly.
More specifically, I have a pictures table which has many comments. I want to sort the pictures by the most recent timestamp found in the created column of either the comments or the pictures table.
I have the following query, which retrieves the necessary data but doesn't order it properly:
public function getPicturesAndCommentsOfUser($userId){
    return $this->find()
        ->contain([
            'Comments' => function ($q){
                return $q
                    ->contain(['Users' => function ($q) {
                        return $q->select($this->select);
                    }])
                    ->order(['Comments.created' => 'ASC']);
            },
            'Users' => function ($q) {
                return $q->select($this->select);
            },
            'Albums'
        ])
        ->matching('Albums.Users', function ($q) use ($userId) {
            return $q
                ->where(['Users.id' => $userId]);
        });
}
My question is how to combine the ordering of Pictures.created and Comments.created. I already tried calling the order function both inside ->contain(['Comments']) and at the outermost part of the chain, after the last matching call. I can't figure out how to relate the two tables to each other so that I can sort on both of them.
Moreover, I read in other sources that I could use a union statement, but all the information I can find about that option says it works on unrelated tables, not related ones.
Can anyone give me some directions on how to solve this?

First of all, your goal should be clear. Which ordering has priority for you: Pictures.created or Comments.created? You cannot give both equal weight in a single ORDER BY. When you do something like this:
$this->Comments->find()
    ->order(['Pictures.created' => 'ASC'])
    ->order(['Comments.created' => 'ASC']);
the order() calls are appended, so the clause becomes ORDER BY Pictures.created ASC, Comments.created ASC. The ordering is only guaranteed for the first column, Pictures.created. Comments.created only decides the order among rows whose Pictures.created values are equal.
There's another thing to consider. Build the query on the side that has more rows. If each Picture hasMany Comments, try to build your query on Comments. I don't know how the results get sorted when you build the query on Pictures: the retrieved records are not sorted by Pictures.id, but when the result is translated to object models the Comments are nested in their Pictures and the previous ordering is discarded.
Finally, don't make things complicated. Using PHP logic inside queries doesn't improve performance; rather, it reduces the readability of your code. So if you need PHP logic, first retrieve all the data and then process it with a few simple foreach loops.
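The "retrieve first, then process" suggestion can be sketched as follows. This is only an illustration in Python rather than PHP, and the rows, ids and dates are made-up stand-ins for the pictures-with-comments result of the query in the question. Each picture is ranked by the latest timestamp among its own created value and those of its comments.

```python
from datetime import datetime

# Hypothetical rows, shaped like pictures with their nested comments.
pictures = [
    {"id": 1, "created": datetime(2016, 1, 1), "comments": [
        {"id": 10, "created": datetime(2016, 1, 5)},
    ]},
    {"id": 2, "created": datetime(2016, 1, 3), "comments": []},
    {"id": 3, "created": datetime(2016, 1, 2), "comments": [
        {"id": 11, "created": datetime(2016, 1, 4)},
        {"id": 12, "created": datetime(2016, 1, 6)},
    ]},
]


def last_activity(picture):
    """Most recent timestamp among the picture and its comments."""
    stamps = [picture["created"]]
    stamps.extend(c["created"] for c in picture["comments"])
    return max(stamps)


# Newest activity first.
timeline = sorted(pictures, key=last_activity, reverse=True)
timeline_ids = [p["id"] for p in timeline]
print(timeline_ids)
```

Picture 2 has no comments, so it is ranked by its own created value alone.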

Related

Laravel 5.6.9 - database count gives different outputs

Does anyone know why these two ways of counting the users in my table give different answers when I run them in tinker?
App\Models\User::count()
=> 92269
$count = \DB::table('users')->count()
=> 92829
Running a SQL query in Sequel Pro gives 92829.
If you have the SoftDeletes trait on your User model, then querying via the model excludes soft-deleted entries (rows whose deleted_at column is set). You can include them by adding the withTrashed() method to the query:
App\Models\User::withTrashed()->count();
https://laravel.com/docs/7.x/eloquent#soft-deleting
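The difference between the two counts can be reproduced outside Laravel. The sketch below uses Python's sqlite3 module with a made-up three-row users table. It assumes the default soft-delete column name, deleted_at, and only imitates the extra filter that a soft-deleting model applies. It does not use Eloquent itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)"
)
# Hypothetical data: one of the three users has been soft-deleted.
conn.executemany(
    "INSERT INTO users (name, deleted_at) VALUES (?, ?)",
    [("ann", None), ("bob", "2020-01-01 00:00:00"), ("cid", None)],
)

# Comparable to the plain query builder count: every row is counted.
all_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Comparable to the model count: rows marked as deleted are filtered out.
not_trashed = conn.execute(
    "SELECT COUNT(*) FROM users WHERE deleted_at IS NULL"
).fetchone()[0]

print(all_rows, not_trashed)
```

The gap between the two numbers is the number of soft-deleted rows, which would match the difference of 560 reported in the question if soft deletes are indeed the cause.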

Abstract view of how distinct queries are implemented in NoSQL

I am developing a system using Google Datastore, where there is a kind, Posts, which has 2 properties:
1. message (string)
2. hashtags (list)
I want to query the distinct hashtags along with the number of posts that use each one. For example,
say the posts are
[
  {
    "message": "msg1",
    "tags": ["abc", "cde", "efr"]
  },
  {
    "message": "msg2",
    "tags": ["abc", "efgh", "efk"]
  },
  {
    "message": "msg3",
    "tags": ["abc", "efgh", "efr"]
  }
]
The output should be
{
  "abc": 3,
  "cde": 1,
  "efk": 1,
  "efgh": 2,
  "efr": 2
}
But Datastore, being a NoSQL store, can't run this query directly. To answer it I have to load all the messages and compute the distinct tags myself, which is time-consuming.
I have seen MongoDB's distinct function, db.collection.distinct(), which I think might solve this problem. If this has to be done on any NoSQL store, what would be the optimal solution?
Unfortunately, projection queries with 'distinct on' will only return a single result per distinct value (https://cloud.google.com/datastore/docs/concepts/queries#projection_queries). It will not provide a count of each distinct value. You'll need to do the count yourself, but you can use a projection query to save cost by only returning the tag values instead of the full entities.
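Doing the count yourself is a small amount of code once the tag values are in memory. The sketch below is plain Python, not Datastore client code: projected_tags is a hypothetical stand-in for whatever the projection query returns, shaped here as one tag list per post and using the sample data from the question.

```python
from collections import Counter

# Hypothetical stand-in for the tag values returned by a projection query.
projected_tags = [
    ["abc", "cde", "efr"],
    ["abc", "efgh", "efk"],
    ["abc", "efgh", "efr"],
]

# Flatten the lists and count each distinct tag.
tag_counts = Counter(tag for tags in projected_tags for tag in tags)

print(dict(tag_counts))
```

If the counts are needed often, keeping a per-tag counter up to date when posts are written may be cheaper than recounting on every read. That is a design choice to weigh, not something the answer above prescribes.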

Rails complex query setup with multiple records

I'm trying to figure out how to do a query in Rails that uses multiple ids returned from another query. I have a Food table that has_many compounds through contents. What I'm trying to do is get a list of foods that share at least one compound with the original food. Currently I have the following.
To get the compounds from the initial food I have this:
def set_food
  @food = Food.includes(:compounds).find(params[:id])
end
This sets the food and its compounds, and I can loop over the contents with each-do to show which compounds are in each food. The next step, which I'm not sure how to do, is to get all the food_ids from the contents table where compound_id equals one of the compound ids of the original food returned above (hopefully that makes sense). So something like this (I know this isn't right):
def show
  @pairs = Food.joins(:contents).where(contents: {compound_id: @food.contents.compound_id})
end
Any help would be appreciated. I tried googling for answers but I'm not even sure what to search for to get going in the right direction.
You can also do this with a subquery like this:
@pairs = Food.joins(:contents).where(contents: { compound_id:
  @food.compounds.select(:id) })
Then you get a query something like:
SELECT `foods`.* FROM `foods` INNER JOIN `contents` ON `contents`.`food_id` =
`foods`.`id` WHERE `contents`.`compound_id` IN(SELECT id FROM compounds WHERE
...)
This can be better for performance if you have lots of ids. If you use pluck, @food.compounds.ids or @food.compound_ids, you can end up with a huge array of ids, and passing that to your query can become very slow.
You already eager loaded your compounds while getting your @food, so you can do it in two steps.
First, get all the interesting compound ids related to your original food:
compound_ids = @food.compounds.pluck(:id)
I think with AR relations you should even be able to do:
# worth the try
compound_ids = @food.compound_ids
In step two you use pretty much the same query you had before:
@pairs = Food.joins(:contents).where(contents: { compound_id: compound_ids })
The { compound_id: [array] } thing should be converted to a SQL IN statement like: contents.compound_id IN (1,2,3).
This is the most straightforward way to do it without getting into subqueries and whatnot.
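To compare the two approaches at the SQL level, here is a rough sketch using Python's sqlite3 module instead of ActiveRecord. The tables mirror the foods and contents tables from the question, but the rows are invented, and DISTINCT and ORDER BY are added here only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foods (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contents (food_id INTEGER, compound_id INTEGER);
    INSERT INTO foods (id, name) VALUES
        (1, 'apple'), (2, 'pear'), (3, 'beef'), (4, 'plum');
    INSERT INTO contents (food_id, compound_id) VALUES
        (1, 10), (1, 11), (2, 11), (3, 12), (4, 10), (4, 11);
""")
food_id = 1  # the original food

# Two steps: fetch the compound ids, then pass them as an IN list.
compound_ids = [row[0] for row in conn.execute(
    "SELECT compound_id FROM contents WHERE food_id = ?", (food_id,))]
placeholders = ", ".join("?" for _ in compound_ids)
two_step = conn.execute(
    "SELECT DISTINCT foods.name FROM foods "
    "INNER JOIN contents ON contents.food_id = foods.id "
    f"WHERE contents.compound_id IN ({placeholders}) "
    "ORDER BY foods.name",
    compound_ids,
).fetchall()

# One step: let the database resolve the ids in a subquery.
one_step = conn.execute(
    "SELECT DISTINCT foods.name FROM foods "
    "INNER JOIN contents ON contents.food_id = foods.id "
    "WHERE contents.compound_id IN "
    "(SELECT compound_id FROM contents WHERE food_id = ?) "
    "ORDER BY foods.name",
    (food_id,),
).fetchall()

print(two_step, one_step)
```

Two things show up in this toy data. Without DISTINCT, a food that shares two compounds with the original would be listed twice, and the original food itself is part of the result. In Rails, adding .distinct and .where.not(id: @food.id) to either query would probably address both, if that is the behaviour you want.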

CakePHP query additions in controller

I am migrating raw PHP code to CakePHP and have run into some problems. Since I have big problems converting queries to the ORM, I am temporarily using raw SQL. All is going well, but I've ended up with ugly code and don't really know how to clean it up. I made a DealersController and added a function advanced($condition = null) (it will be called from AJAX with parameters 1-15 and 69). The function looks like this:
switch ($condition) {
    case '1':
        $cond_query = ' AND ( (d.email = \'\' OR d.email IS NULL) )';
        break;
    case '2':
        $cond_query = ' AND (d.id IN (SELECT dealer_id FROM dealer_logo))';
        break;
    // There are many cases, some long, some like these two
}
if ($user_group == 'group_1') {
    $query = 'LONG QUERY WITH 6+ TABLES JOINING' . $cond_query;
} elseif ($user_group == 'group_2') {
    $query = 'A LITTLE BIT DIFFERENT LONG QUERY WITH 6+ TABLES JOINING' . $cond_query;
} else {
    $query = 'A LITTLE MORE BIT DIFFERENT LONG QUERY WITH 10+ TABLES JOINING' . $cond_query;
}
// THERE IS $this->Dealer->query($query); and so on
So, as you can see, the code looks ugly. I have two options:
1) Move the query additions out and make a model method for every condition, so that each condition gets its own function. But this is not DRY, because the 3 big main queries are almost the same, and if I need to change something in one of them I will need to change 16+ queries.
2) Make small reusable model methods/queries which fetch small pieces of data from the DB, then stop using raw SQL and combine the methods. That would be good, but the performance will be low and I need it as high as possible.
Please give me some advice. Thank you!
If you're concerned about how CakePHP makes a database query for every joined table, you might find that the Linkable behaviour can help you reduce the number of queries (where the joins are simple associations on the one table).
Otherwise, I find that creating simple database querying methods at the Model level to get your smaller pieces of information, and then combining them afterwards, is a good approach. It allows you to clearly outline what your code does (through inline documentation). If you can migrate to using CakePHP's find methods instead of raw queries, you will be using the conditions array syntax. So one way you could approach your problem is to have public functions on your Model classes which append their appropriate conditions to an inputted conditions array. For example:
class SomeModel extends AppModel {
    ...
    public function addEmailCondition(&$conditions) {
        $conditions['OR'] = array(
            'alias.email_address' => null,
            'alias.email_address =' => ''
        );
    }
}
You would call these functions to build up one large conditions array which you can then use to retrieve the data you want from your controller (or from the model if you want to contain it all at the model layer). Note that in the above example, the conditions array is being passed by reference, so it can be edited in place. Also note that any existing 'OR' conditions in the array will be overwritten by this function: your real solution would have to be smarter in terms of merging your new conditions with any existing ones.
Don't worry about 'hypothetical' performance issues. Once you've tried the queries and found them too slow, then you can worry about how to increase performance. But for starters, try to write the code as cleanly as possible.
You also might want to consider splitting up that function advanced() call into multiple Controller Actions that are grouped by the similarity of their condition query.
Finally, in case you haven't already checked it out, here's the Book's entry on retrieving data from models. There might be some tricks you hadn't seen before: http://book.cakephp.org/view/1017/Retrieving-Your-Data
If the base part of the query is the same, you could have a function to generate that part of the query, and then use other small functions to append the different where conditions, etc.
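As a language-neutral illustration of that idea, here is a small Python sketch (not CakePHP code). The base queries are short hypothetical stand-ins for the long joins in the question, and the two fragments are the ones shown there. A lookup table replaces the switch statement, and a single function combines the base query for the user group with the optional condition.

```python
# Hypothetical stand-ins for the long base queries, one per user group.
BASE_QUERIES = {
    "group_1": "SELECT d.id FROM dealers d WHERE d.active = 1",
    "group_2": "SELECT d.id FROM dealers d WHERE d.active = 1 AND d.region_id = 2",
}
DEFAULT_BASE = "SELECT d.id FROM dealers d WHERE 1 = 1"

# One fragment per condition code, replacing the switch statement.
CONDITIONS = {
    "1": "(d.email = '' OR d.email IS NULL)",
    "2": "d.id IN (SELECT dealer_id FROM dealer_logo)",
}


def build_query(user_group, condition=None):
    """Pick the base query for the group and append the optional condition."""
    query = BASE_QUERIES.get(user_group, DEFAULT_BASE)
    fragment = CONDITIONS.get(condition)
    if fragment is not None:
        query += " AND " + fragment
    return query


print(build_query("group_1", "1"))
```

With this layout a change to a base query is made in one place, and an unknown condition code simply leaves the base query unfiltered. The fragments here are fixed strings. Anything that comes from user input should go through bound parameters rather than string concatenation.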

Django query for large number of relationships

I have Django models setup in the following manner:
model A has a one-to-many relationship to model B
each record in A has between 3,000 and 15,000 records in B
What is the best way to construct a query that will retrieve the newest (greatest pk) record in B that corresponds to a record in A for each record in A? Is this something that I must use SQL for in lieu of the Django ORM?
Create a helper function for safely extracting the 'top' item from any queryset. I use this all over the place in my own Django apps.
def top_or_none(queryset):
    """Safely pulls off the top element in a queryset"""
    # Extracts a single element collection w/ top item
    result = queryset[0:1]
    # Return that element or None if there weren't any matches
    return result[0] if result else None
This uses a bit of a trick w/ the slice operator to add a limit clause onto your SQL.
Now use this function anywhere you need to get the 'top' item of a query set. In this case, you want to get the top B item for a given A where the B's are sorted by descending pk, as such:
latest = top_or_none(B.objects.filter(a=my_a).order_by('-pk'))
There's also the recently added 'Max' function in Django Aggregation which could help you get the max pk, but I don't like that solution in this case since it adds complexity.
P.S. I don't really like relying on the 'pk' field for this type of query, as some RDBMSs don't guarantee that sequential pks match the logical creation order. If I have a table that I know I will need to query in this fashion, I usually add my own 'creation' datetime column that I can order by instead of pk.
Edit based on comment:
If you'd rather use queryset[0], you can modify the 'top_or_none' function thusly:
def top_or_none(queryset):
    """Safely pulls off the top element in a queryset"""
    try:
        return queryset[0]
    except IndexError:
        return None
I didn't propose this initially because I was under the impression that queryset[0] would pull back the entire result set, then take the 0th item. Apparently Django adds a 'LIMIT 1' in this scenario too, so it's a safe alternative to my slicing version.
Edit 2
Of course you can also take advantage of Django's related manager construct here and build the queryset through your 'A' object, depending on your preference:
latest = top_or_none(my_a.b_set.order_by('-pk'))
I don't think the Django ORM can do this (but I've been pleasantly surprised before...). If there's a reasonable number of A records (or if you're paging), I'd just add a method to the A model that returns this 'newest' B record. If you want to get a lot of A records, each with its own newest B, I'd drop to SQL.
Remember that no matter which route you take, you'll need a suitable composite index on the B table, and perhaps also ordering = ('a_fk', '-id') in the model's Meta class.
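For the "drop to SQL" route, a grouped query returns the newest B for every A in one pass. The sketch below uses Python's sqlite3 module with hypothetical table and column names (a, b, a_id), not actual Django-generated ones, and includes the kind of composite index mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    -- Composite index so the per-group maximum can be read from the index.
    CREATE INDEX b_a_id_id ON b (a_id, id);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO b (id, a_id) VALUES (1, 1), (2, 2), (3, 1), (4, 2), (5, 1);
""")

# One row per A that has at least one B: (a_id, greatest b.id).
newest_b_per_a = dict(conn.execute(
    "SELECT a_id, MAX(id) FROM b GROUP BY a_id ORDER BY a_id"
).fetchall())

print(newest_b_per_a)
```

A records without any B (id 3 here) do not appear in the result. Django's aggregation API may be able to express the same grouping, along the lines of B.objects.values('a').annotate(newest=Max('pk')), though that is untested against the models in the question.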