Django: how to filter for rows whose fields are contained in passed value? - sql

MyModel.objects.filter(field__icontains=value) returns all the rows whose field contains value. How to do the opposite? Namely, construct a queryset that returns all the rows whose field is contained in value?
Preferably without using custom SQL (ie only using the ORM) or without using backend-dependent SQL.

Lookups such as field__icontains are coded right into the ORM. The reverse lookup simply doesn't exist.
You could use the where parameter of QuerySet.extra(), described in the QuerySet API reference.
In this case, you would use something like:
MyModel.objects.extra(where=["%s LIKE CONCAT('%%',field,'%%')"], params=[value])
Of course, keep in mind that there is no standard method of string concatenation across DBMSs. So, as far as I know, there is no way to satisfy your requirement of avoiding backend-dependent SQL.
If you're okay with working with a list of dictionaries rather than a queryset, you could always do this instead:
qs = MyModel.objects.all().values()
# keep the rows whose field value occurs inside the passed value
matches = [r for r in qs if r['field'] in value]
although this is of course not ideal for huge data sets.
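The in-Python fallback above can be sketched a little more fully. This is only an illustration: rows_contained_in is a hypothetical helper name, the list of dictionaries stands in for list(MyModel.objects.values()), and the case-insensitive comparison is an assumption made to mirror icontains.

```python
def rows_contained_in(rows, field, value, ignore_case=True):
    """Return the rows whose `field` text occurs as a substring of `value`."""
    haystack = value.lower() if ignore_case else value
    matches = []
    for row in rows:
        needle = row[field]
        if needle is None:
            continue  # a NULL column cannot be contained in anything
        if ignore_case:
            needle = needle.lower()
        if needle in haystack:
            matches.append(row)
    return matches


# Made-up rows standing in for list(MyModel.objects.values()).
rows = [
    {"id": 1, "field": "jango"},
    {"id": 2, "field": "Flask"},
    {"id": 3, "field": "DJ"},
    {"id": 4, "field": None},
]
matches = rows_contained_in(rows, "field", "Django")
```

Note that a row whose field is the empty string matches every value, since the empty string is a substring of any string.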

Related

How to make criteria with array field in Hibernate

I'm using Hibernate and Postgres and defined a character(1)[] column type.
I don't know how to write a Criteria query that finds a value in the array.
Something like this query:
SELECT * FROM cpfbloqueado WHERE bloqueados @> ARRAY['V']::character[]
I am not familiar with Postgres and its types but you can define your own type using custom basic type mapping. That could simplify the query.
There are many threads here on SO regarding Postgres array types and Hibernate, for instance, this one. Another array mapping example that could be useful is here. Finally, here is an example of using Criteria with a user type.
A code example could be:
List result = session.createCriteria(Cpfbloqueado.class)
    .setProjection(Projections.projectionList()
        .add(Projections.property("characterColumn.attribute"), PostgresCharArrayType.class)
    )
    .setResultTransformer(Transformers.aliasToBean(Cpfbloqueado.class))
    .add(...) // add where restrictions here
    .list();
Also, if it is not important for the implementation, you can define the max length in the entity model by annotating your field with @Column(length = 1).
Or if you need to store an array of characters with length of 1 it is possible to use a collection type.
I hope I got the point right, however, it would be nice if the problem domain was better described.
So you have array of single characters... Problem is that in PG that is not fixed length. I had this problem, but around 10 years ago. At that time I had that column mapped as string, and that way I was able to process internal data - simply slice by comma, and do what is needed.
If you hate that way, as I did... Look for columns with text[] type - that is more common, so it is quite easy to find out something. Please look at this sample project:
https://github.com/phstudy/jpa-array-converter-sample

pig - transform data from rows to columns while inserting placeholders for non-existent fields in specific rows

Suppose I have the following flat file on HDFS (let's call this key_value):
1,1,Name,Jack
1,1,Title,Junior Accountant
1,1,Department,Finance
1,1,Supervisor,John
2,1,Title,Vice President
2,1,Name,Ron
2,1,Department,Billing
Here is the output I'm looking for:
(1,1,Department,Finance,Name,Jack,Supervisor,John,Title,Junior Accountant)
(2,1,Department,Billing,Name,Ron,,,Title,Vice President)
In other words, the first two columns form a unique identifier (similar to a composite key in db terminology) and for a given value of this identifier, we want one row in the output (i.e., the last two columns - which are effectively key-value pairs - are condensed onto the same row as long as the identifier is the same). Also notice the nulls in the second row to add placeholders for Supervisor piece that's missing when the unique identifier is (2, 1).
Towards this end, I started putting together this pig script:
data = LOAD 'key_value' USING PigStorage(',') as (i1:int, i2:int, key:chararray, value:chararray);
data_group = GROUP data by (i1, i2);
expected = FOREACH data_group {
    sorted = ORDER data BY key, value;
    GENERATE FLATTEN(BagToTuple(sorted));
};
dump expected;
The above script gives me the following output:
(1,1,Department,Finance,1,1,Name,Jack,1,1,Supervisor,John,1,1,Title,Junior Accountant)
(2,1,Department,Billing,2,1,Name,Ron,2,1,Title,Vice President)
Notice that the null place holders for missing Supervisor are not represented in the second record (which is expected). If I can get those nulls into place, then it seems just a matter of another projection to get rid of redundant columns (the first two which are replicated multiple times - once per every key value pair).
Short of using a UDF, is there a way to accomplish this in pig using the in-built functions?
UPDATE: As WinnieNicklaus correctly pointed out, the names in the output are redundant. So the output can be condensed to:
(1,1,Finance,Jack,John,Junior Accountant)
(2,1,Billing,Ron,,Vice President)
First of all, let me point out that if most of the columns are not filled out for most rows, a better solution IMO would be to use a map. The built-in TOMAP UDF, combined with a custom UDF to merge maps, would enable you to do this.
I am sure there is a way to solve your original question by computing a list of all possible keys, exploding it out with null values and then throwing away the instances where a non-null value also exists... but this would involve a lot of MR cycles, really ugly code, and I suspect is no better than organizing your data in some other way.
You could also write a UDF that takes in a bag of key/value pairs and another bag of all possible keys, and generates the tuple you're looking for. That would be clearer and simpler.
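To make the logic of such a UDF concrete, here is a rough sketch in plain Python rather than Pig. It is not a Pig UDF; pivot is a hypothetical name, and the rows are the sample data from the question. It collects every key that appears, then emits one tuple per (i1, i2) identifier with None where a key is missing.

```python
from collections import defaultdict


def pivot(rows):
    """Turn (i1, i2, key, value) rows into one tuple per (i1, i2) identifier."""
    # Every key seen anywhere, sorted so the output columns line up.
    all_keys = sorted({key for _, _, key, _ in rows})

    # Collect the key/value pairs belonging to each identifier.
    grouped = defaultdict(dict)
    for i1, i2, key, value in rows:
        grouped[(i1, i2)][key] = value

    # Missing keys become None placeholders.
    return [
        ident + tuple(values.get(key) for key in all_keys)
        for ident, values in sorted(grouped.items())
    ]


rows = [
    (1, 1, "Name", "Jack"),
    (1, 1, "Title", "Junior Accountant"),
    (1, 1, "Department", "Finance"),
    (1, 1, "Supervisor", "John"),
    (2, 1, "Title", "Vice President"),
    (2, 1, "Name", "Ron"),
    (2, 1, "Department", "Billing"),
]
result = pivot(rows)
```

Here result holds one tuple per identifier in the condensed form from the UPDATE, with None in the Supervisor position for (2, 1).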

What's the reasoning behind result columns being excluded from auto-select statements in PetaPoco

If I have a POCO class with a property marked with the ResultColumn attribute and I do a Single<Entity>() call, my result column isn't mapped. I've set my column to be a result column because its value should always be generated by the SQL column's default constraint. I don't want this column to be inserted or updated from the business layer. What I'm trying to say is that my column's type is a simple SQL data type and not a related entity type (which is where I've mostly seen ResultColumn used).
Looking at code I can see this line in PetaPoco:
// Build column list for automatic select
QueryColumns = ( from c in Columns
                 where !c.Value.ResultColumn
                 select c.Key
               ).ToArray();
Why are result columns excluded from the automatic select statement? As I understand it, they are read-only by nature, so they should be used in selects only. I can see the point when a column is actually a related (complex) entity type. OK, but then we should have a separate attribute, like ComputedColumnAttribute, that would always be returned in selects but never used in inserts or updates...
Why did PetaPoco team decide to omit result columns from selects then?
How am I supposed to read result columns then?
I can't answer why the creator did not add them to auto-selects, though I would assume it's because your particular use-case is not the main one that they were considering. If you look at the examples and explanation for that feature on their site, it's more geared towards extra columns you bring back in a join or calculation (like maybe a description from a lookup table for a code value). In these situations, you could not have them automatically added to the select because they are not part of the underlying table.
So if you want to use that attribute, and get a value for the property, you'll have to use your own manual select statement rather than relying on the auto-select.
Of course, the beauty of using PetaPoco is that you can easily modify it to suit your needs, by either creating a new attribute, like you suggest above, or modifying the code you showed to not exclude those fields from the select (assuming you are not using ResultColumn in other join-type situations).

Yii CSqlDataProvider confusion

I am having some trouble understanding CSqlDataProvider and how it works.
When I am using CActiveDataProvider, the results can be accessed as follows:
$data->userProfile['first_name'];
However, when I use CSqlDataProvider, I understand that the results are returned as an array not an object. However, the structure of the array is flat. In other words, I am seeing the following array:
$data['first_name']
instead of
$data['userProfile']['first_name']
But the problem here is what if I have another joined table (let's call it 'author') in my sql code that also contains a first_name field? With CActiveDataProvider, the two fields are disambiguated, so I can do the following to access the two fields:
$data->userProfile['first_name'];
$data->author['first_name'];
But with CSqlDataProvider, there doesn't seem to be any way I can access the data as follows:
$data['userProfile']['first_name'];
$data['author']['first_name'];
So, outside of assigning a unique name to those fields directly inside my SQL, by doing something like this:
select author.first_name as author_first_name, userProfile.first_name as user_first_name
And then referring to them like this:
$data['author_first_name'];
$data['user_first_name']
is there any way to get CSqlDataProvider to automatically structure the arrays so they are nested in the same way that CActiveDataProvider objects are? So that I can access them using $data['userProfile']['first_name']
Or is there another class I should be using to obtain these kinds of nested arrays?
Many thanks!
As far as I can tell, no Yii DB methods break out JOIN query results into 2D arrays like you are looking for. I think you will need to alias the column names in your select statement, as you suggest.
MySQL returns flat rows of data when you JOIN tables in a query, and CSqlDataProvider returns exactly what MySQL does: a single tabular array keyed by the column names, just like your query returns.
If you want to break apart your results into a multi-dimensional array, I would either alias the columns or use a regular CActiveDataProvider (to which you can still pass complex queries and joins via CDbCriteria).

Django query for large number of relationships

I have Django models setup in the following manner:
model A has a one-to-many relationship to model B
each record in A has between 3,000 and 15,000 records in B
What is the best way to construct a query that will retrieve the newest (greatest pk) record in B that corresponds to a record in A for each record in A? Is this something that I must use SQL for in lieu of the Django ORM?
Create a helper function for safely extracting the 'top' item from any queryset. I use this all over the place in my own Django apps.
def top_or_none(queryset):
    """Safely pulls off the top element in a queryset"""
    # Extracts a single element collection w/ top item
    result = queryset[0:1]
    # Return that element or None if there weren't any matches
    return result[0] if result else None
This uses a bit of a trick w/ the slice operator to add a limit clause onto your SQL.
Now use this function anywhere you need to get the 'top' item of a query set. In this case, you want to get the top B item for a given A where the B's are sorted by descending pk, as such:
latest = top_or_none(B.objects.filter(a=my_a).order_by('-pk'))
There's also the recently added 'Max' function in Django Aggregation which could help you get the max pk, but I don't like that solution in this case since it adds complexity.
P.S. I don't really like relying on the 'pk' field for this type of query, as some RDBMSs don't guarantee that sequential pks match logical creation order. If I have a table that I know I will need to query in this fashion, I usually add my own 'creation' datetime column that I can order by instead of pk.
Edit based on comment:
If you'd rather use queryset[0], you can modify the 'top_or_none' function thusly:
def top_or_none(queryset):
    """Safely pulls off the top element in a queryset"""
    try:
        return queryset[0]
    except IndexError:
        return None
I didn't propose this initially because I was under the impression that queryset[0] would pull back the entire result set, then take the 0th item. Apparently Django adds a 'LIMIT 1' in this scenario too, so it's a safe alternative to my slicing version.
Edit 2
Of course you can also take advantage of Django's related manager construct here and build the queryset through your 'A' object, depending on your preference:
latest = top_or_none(my_a.b_set.order_by('-pk'))
I don't think the Django ORM can do this (but I've been pleasantly surprised before...). If there's a reasonable number of A records (or if you're paging), I'd just add a method to the A model that returns this 'newest' B record. If you want to get a lot of A records, each with its own newest B, I'd drop to SQL.
Remember that no matter which route you take, you'll need a suitable composite index on the B table, and maybe an ordering = ('a_fk', '-id') in the Meta subclass.
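The "newest B for each A" selection discussed in this thread amounts to a grouped maximum, roughly what a GROUP BY on the foreign key with MAX(id) computes in SQL. A minimal plain-Python sketch of that logic follows. It does not use the Django API: newest_b_per_a is a hypothetical name, and the (pk, a_id) tuples are an assumed stand-in for rows of B.

```python
def newest_b_per_a(b_rows):
    """Map each a_id to the greatest pk among its B rows."""
    newest = {}
    for pk, a_id in b_rows:
        # Keep the largest pk seen so far for this A record.
        if a_id not in newest or pk > newest[a_id]:
            newest[a_id] = pk
    return newest


# Made-up (pk, a_id) pairs standing in for rows of B.
b_rows = [(1, 10), (2, 10), (3, 20), (4, 10), (5, 20)]
newest = newest_b_per_a(b_rows)
```

This is a single pass over B, which is why an index covering the foreign key and the pk matters once the same grouping is pushed down to the database.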