Django query to Postgres returning wrong column - sql

I'm facing a strange problem maybe related with some cache that I cannot find.
I have the following Models:
class Incubadores(models.Model):
    incubador = models.CharField(max_length=10, primary_key=True)
    posicion = models.CharField(max_length=10)

class Tareas(TimeStampedModel):
    priority = models.CharField(max_length=20, choices=PRIORITIES, default='normal')
    incubador = models.ForeignKey(Incubadores, on_delete=models.CASCADE, null=True, db_column='incubador')
    info = JSONField(null=True)
    datos = JSONField(null=True)

    class Meta:
        ordering = ('priority', 'modified', 'created')
I previously didn't have the db_column argument, so the Postgres column for that field was incubador_id.
I used db_column to change the name of the column, and then I ran python manage.py makemigrations and python manage.py migrate, but I'm still getting the key incubador_id whenever I perform a query such as:
>>> tareas = Tareas.objects.all().values()
>>> print(tareas)
<QuerySet [{'info': None, 'modified': datetime.datetime(2019, 11, 1, 15, 24, 58, 743803, tzinfo=<UTC>), 'created': datetime.datetime(2019, 11, 1, 15, 24, 58, 743803, tzinfo=<UTC>), 'datos': None, 'priority': 'normal', 'incubador_id': 'I1.1', 'id': 24}, {'info': None, 'modified': datetime.datetime(2019, 11, 1, 15, 25, 25, 49950, tzinfo=<UTC>), 'created': datetime.datetime(2019, 11, 1, 15, 25, 25, 49950, tzinfo=<UTC>), 'datos': None, 'priority': 'normal', 'incubador_id': 'I1.1', 'id': 25}]>
I need to modify this column name because I'm having other issues with Serializers. So the change is necessary.
If I perform the same query on other models where I've also changed the default column name, the problem is exactly the same.
It happens both in the shell and in the code.
I've tried different queries to make sure it's not related to Django's lazy query system, but the problem is the same. I've also tried executing django.db.connection.close().
If I run a direct SQL query against PostgreSQL, there is no incubador_id column, only incubador, which is correct.
Does anyone have any idea of what could be happening? I've been stuck on this for two days and I cannot find a reason :( It's a very basic operation.
Thanks!

This answer explains why this is happening: .values() keys the results on each field's attname, and for a ForeignKey the attname is always <field_name>_id. db_column only renames the column in the database, not the attribute name Django uses.
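You can confirm this from the shell (a quick check against the models above):
>>> field = Tareas._meta.get_field('incubador')
>>> field.column    # the database column, controlled by db_column
'incubador'
>>> field.attname   # the key used by .values()
'incubador_id'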
Django's built-in serializers don't have this issue, but probably won't yield exactly what you're looking for:
>>> from django.core import serializers
>>> serializers.serialize("json", Tareas.objects.all())
'[{"model": "inc.tareas", "pk": 1, "fields": {"priority": "normal", "incubador": "test-i"}}]'
You could use the fields attribute here, which seems like it would give you what you're looking for.
You don't specify what your "other issues with Serializers" are, but my suggestion would be to write custom serialization code. Relying on something like .values() or even serializers.serialize() is a bit too implicit for me; writing explicit serialization code makes it less likely you'll accidentally break a contract with a consumer of your serialized data if this model changes.
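For instance, a minimal sketch of explicit serialization for the model above (the key names are just illustrative; pick whatever your consumers expect):
def serialize_tarea(tarea):
    # Explicitly choose the keys the consumer will see.
    return {
        'id': tarea.pk,
        'priority': tarea.priority,
        'incubador': tarea.incubador_id,  # raw FK value, exposed under the name you want
        'info': tarea.info,
        'datos': tarea.datos,
    }

tareas = [serialize_tarea(t) for t in Tareas.objects.all()]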
Note: Please try to make the example you provide minimal and reproducible. I removed some fields to make this work with stock Django, which is why the serialized value is missing fields; the _id issue was still present without the third-party apps you're using, and was resolved with serializers. This also isn't specific to PG; it happens in sqlite as well.

Related

Solr: indexing nested JSON files + some fields independent of UniqueKey (need new core?)

I am working on an NLP project and I have a large amount of text data to index with Solr. I have already created an initial index (Solr core) with the fields title, authors, publication date, and abstract. There is an ID that is unique to each article (PMID). Since then, I have extracted more information from the dataset and I am stuck on how to incorporate this new info into the existing index. I don't know how to approach the problem and I would appreciate suggestions.
The new information is currently stored in JSON files that look like this:
{id: {entity: [[33, 39, 0, subj], [103, 115, 1, obj], ...],
      another_entity: [[88, 95, 0, subj], [444, 449, 1, obj], ...],
      ...},
 another id,
 ...}
where the integers are the character span and the index of the sentence the entity appears in.
Is there a way to have something like subfields in Solr? Since the id is the same as the unique key in the main index I was thinking of adding a field entities, but then this field would need to have its own subfields start character, end character, sentence index, dependency tag. I have come across Nested Child Documents and I am considering changing the structure of the extracted information to:
{id: {entity: [{start: 33, end: 39, sent_idx: 0, dep_tag: 'subj'},
               {start: 103, end: 115, sent_idx: 1, dep_tag: 'obj'}, ...],
      another_entity: [{}, {}, ...],
      ...},
 another id,
 ...}
Having keys for the nested values, I should be able to use the methods linked above, though I am still unsure if I am on the right track here. Is there a better way to approach this? All fields should be searchable. I am familiar with Python, and so far I have been using the subprocess library to post documents to Solr via a Python script:
sp.Popen(f"./post -c {core_name} {json_path}", shell=True, cwd=SOLR_BIN_DIR)
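For what it's worth, here is a minimal sketch of doing the same post over HTTP instead of shelling out (assuming a local Solr at http://localhost:8983 and the requests library, and that json_path contains a JSON array of documents, as the post tool expects):
import json
import requests

with open(json_path) as f:
    docs = json.load(f)

# Same effect as ./post -c <core> <file>, but without the subprocess call.
resp = requests.post(
    f"http://localhost:8983/solr/{core_name}/update?commit=true",
    json=docs,
)
resp.raise_for_status()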
Additionally, I want to index some information that is not linked to a specific PMID (does not have the same unique key), so I assume I need to create a new Solr core for it? Does it mean I have to switch to SolrCloud mode? So far I have been using a simple, single core.
Example of such information (abbreviations and the respective long form - also stored in a JSON file):
{"IEOP": "immunoelectroosmophoresis",
"ELISA": "enzyme-linked immunosorbent assay",
"GAGs": "glycosaminoglycans",
...}
I would appreciate any input - thank you!
S.

Dealing with special characters from data schema in GraphQL

I am dealing with data which is structured (by a third party) with special characters, like so:
"pageFansGenderAge": {
"current": {
"U.13-17": 1,
"U.55-64": 246,
"M.55-64": 11925,
"U.35-44": 370,
"F.45-54": 16443,
"M.18-24": 8996,
"M.35-44": 20641,
"F.25-34": 11687,
"U.65+": 148,
"U.18-24": 42,
"M.25-34": 22341,
"F.13-17": 177,
"U.45-54": 415,
"F.65+": 5916,
"F.55-64": 12172,
"M.13-17": 141,
"M.65+": 6576,
"F.35-44": 14491,
"U.25-34": 178,
"M.45-54": 17979,
"F.18-24": 5787
},
GraphQL is throwing errors because it can't accept the special characters; the full stop and the hyphen in the keys are causing issues. Is there a known way to parse these so the errors stop? Simply removing all the special characters (obviously) just returns null values.
Thanks in advance.
I have found a workaround.
I can return the current data as JSON, thanks to this Stack Overflow answer: GraphQL - Get all fields from nested JSON object. In short: declare a 'scalar JSON' type in your GraphQL schema and return the object through it.
🤘
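For reference, a minimal sketch of what that can look like (assuming Apollo Server and the graphql-type-json package; the type and resolver names are just illustrative):
const { gql } = require('apollo-server');
const { GraphQLJSON } = require('graphql-type-json');

const typeDefs = gql`
  scalar JSON

  type Query {
    pageFansGenderAge: JSON
  }
`;

const resolvers = {
  JSON: GraphQLJSON,
  Query: {
    // The JSON scalar passes the object through untouched, so keys like
    // "U.13-17" never have to become GraphQL field names.
    pageFansGenderAge: () => ({ current: { 'U.13-17': 1, 'M.55-64': 11925 } }),
  },
};

module.exports = { typeDefs, resolvers };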

ActiveRecord Generate Condition SQL

My goal in writing a function is to allow callers to pass in the same condition arguments they would to a where call in ActiveRecord, and I want the corresponding Rails-generated SQL.
Example
If my function receives a hash like this as an argument
role: 'Admin', id: [4, 8, 15]
I would expect to generate this string
"users"."role" = 'Admin' AND "users"."id" IN (4, 8, 15)
Possible Solutions
I get the closest with to_sql.
pry(main)> User.where(role: 'Admin', id: [4, 8, 15]).to_sql
=> "SELECT \"users\".* FROM \"users\" WHERE \"users\".\"role\" = 'Admin' AND \"users\".\"id\" IN (4, 8, 15)"
It returns almost exactly what I want; however, I would be more comfortable not stripping away the SELECT ... WHERE myself in case something changes in the way the SQL is generated. I realize the WHERE should always be there to split on, but I'd prefer an even less brittle approach.
My next approach was using Arel's where_sql function.
pry(main)> User.where(role: 'Admin', id: [4, 8, 15]).arel.where_sql
=> "WHERE \"users\".\"role\" = $1 AND \"users\".\"id\" IN (4, 8, 15)"
It gets rid of the SELECT but leaves the WHERE. I would prefer it to the approach above if it inlined the sanitized role value, but since it leaves the $1 placeholder it's quite a bit less desirable.
I've also considered generating the SQL myself, but I would prefer to avoid that.
Do any of you know if there's some method right under my nose I just haven't found yet? Or is there a better way of doing this altogether?
Ruby 2.3.7
Rails 5.1.4
I too would like to know how to get the conditions without the leading WHERE. I see in https://coderwall.com/p/lsdnsw/chain-rails-scopes-with-or that they used string manipulation to get rid of the WHERE, which seems messy but maybe the only solution currently. :/
scope.arel.where_sql.gsub(/\AWHERE /i, "")
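A minimal sketch of wrapping the to_sql approach from the question (a hypothetical helper; it still splits on WHERE, so it shares the brittleness noted above):
def condition_sql(model, conditions)
  # Render the full SELECT, then keep only what follows the first WHERE.
  model.where(conditions).to_sql.split(/\bWHERE\b/i, 2).last.to_s.strip
end

condition_sql(User, role: 'Admin', id: [4, 8, 15])
# => "\"users\".\"role\" = 'Admin' AND \"users\".\"id\" IN (4, 8, 15)"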

MongoDB using $and with slice and a match

I'm using @Query from the Spring Data package and I want to query on the last element of an array in a document.
For example the data structure could be like this:
{
    name: 'John',
    scores: [10, 12, 14, 16]
},
{
    name: 'Mary',
    scores: [78, 20, 14]
}
So I've built a query; however, it is complaining: "error message 'unknown operator: $slice' on server".
The $slice part of the query, when run separately, is fine:
db.getCollection('users').find({}, {scores: { $slice: -1 } })
However as soon as I combine it with a more complex check, it gives the error as mentioned.
db.getCollection('users').find({"$and": [{ }, {"scores": { "$slice": -1 }}, {"scores": "16"}]})
This query should return the list of users whose last score is 16; in my example, John would be returned but not Mary.
I've put it into a standard mongo query (to debug things); ideally, however, I need it to go into a Spring Data @Query construct - they should be fairly similar.
Is there any way of doing this without resorting to hand-cranked Java calls? I don't see much documentation for @Query, other than that it takes standard queries.
As commented on the linked post, that refers to aggregate - how does that work with @Query? Also, one of the main answers uses $where, which is inefficient.
The general way forward with this problem is unfortunately to change the data. Although @Veeram's response is correct, it means you will not hit indexes. This becomes an issue with very large data sets, of course, and you will see ever-decreasing return times. It's something $where and $arrayElemAt cannot help you with: they have to process the data per document, and that means a full collection scan. We analysed several queries with these constructs and they all involved a COLLSCAN.
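You can see this for yourself with an explain on the 3.4-compatible aggregation form (a quick check in the mongo shell, using the example data above):
db.getCollection('users').aggregate(
    [
        { "$addFields": { "lastScore": { "$arrayElemAt": [ "$scores", -1 ] } } },
        { "$match": { "lastScore": 16 } }
    ],
    { explain: true }
)
// the winning plan's stage is "COLLSCAN": the whole collection is scanned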
The solution is ideally to create a field that contains the last item, for instance:
{
    name: 'John',
    scores: [10, 12, 14, 16],
    lastScore: 16
},
{
    name: 'Mary',
    scores: [78, 20, 14],
    lastScore: 14
}
You could create a listener to maintain this as follows:
@Component
public class ScoreListener extends AbstractMongoEventListener<Scores>
You then get the ability to sniff the data and make any updates:
@Override
public void onBeforeConvert(BeforeConvertEvent<Scores> event) {
    // process any score and set lastScore
}
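A minimal sketch of what that override can look like (assuming the Scores document exposes getScores() and setLastScore() accessors - adjust to your own class):
@Override
public void onBeforeConvert(BeforeConvertEvent<Scores> event) {
    Scores doc = event.getSource();
    List<Integer> scores = doc.getScores();   // assumed accessor on your Scores document
    if (scores != null && !scores.isEmpty()) {
        // keep lastScore in sync with the final element of the array
        doc.setLastScore(scores.get(scores.size() - 1));
    }
}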
Don't forget to update your indexes (!):
@CompoundIndex(name = "lastScore", def = "{ 'lastScore': 1 }")
Although this has the disadvantage of slightly duplicating data, in current Mongo (3.4) it really is the only way of doing this AND having indexes used in the search. The speed differences were dramatic: from nearly a minute of response time down to milliseconds.
In Mongo 3.6 there may be better ways of doing this; however, we are fixed on this version, so this has to be our solution.
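Once lastScore exists and is indexed, the @Query side becomes straightforward (a sketch, assuming a Spring Data repository for the Scores document):
public interface ScoresRepository extends MongoRepository<Scores, String> {

    // Matches documents whose most recent score equals the given value,
    // hitting the lastScore index instead of inspecting the array.
    @Query("{ 'lastScore' : ?0 }")
    List<Scores> findByLastScore(int lastScore);
}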

How does knex handle default values in SQLite?

We currently use mysql / knex, and I'm adding SQLite as a database for testing purposes. I'm getting
Knex:warning - sqlite does not support inserting default values. Set the useNullAsDefault flag to hide this warning. (see docs http://knexjs.org/#Builder-insert).
How does Knex handle default values? Does it just drop any defaults, or does it add in the defaults after an insert as following UPDATE statements?
I don't want to change our whole codebase (swap out all default values); I'm trying to make the minimal change that will allow me to run SQLite in our tests, and I'm concerned this will introduce bugs.
I'm learning knex.js so I can use it in a project involving PostgreSQL. While trying out SQLite I came across this issue.
Turns out it's documented!
If one prefers that undefined keys are replaced with NULL instead of DEFAULT one may give useNullAsDefault configuration parameter in knex config.
And they give this code:
var knex = require('knex')({
    client: 'sqlite3',
    connection: {
        filename: "./mydb.sqlite"
    },
    useNullAsDefault: true
});

knex('coords').insert([{x: 20}, {y: 30}, {x: 10, y: 20}])
// insert into `coords` (`x`, `y`) values (20, NULL), (NULL, 30), (10, 20)
This removed the warning message for me.
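If you only want this for the SQLite test database, a minimal change is to set the flag in that environment's config only (a sketch, assuming a standard knexfile.js layout; the environment names are illustrative):
// knexfile.js
module.exports = {
    production: {
        client: 'mysql',
        connection: process.env.DATABASE_URL
    },
    test: {
        client: 'sqlite3',
        connection: { filename: ':memory:' },
        // Omitted keys are inserted as NULL instead of DEFAULT, which also
        // silences the warning; the mysql config above stays untouched.
        useNullAsDefault: true
    }
};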