Setting group_by in specialized query - sql

I need to perform data smoothing using averaging, with a non-standard group_by variable that is created on-the-fly. My model consists of two tables:
class WthrStn(models.Model):
    name = models.CharField(max_length=64, error_messages=MOD_ERR_MSGS)
    owner_email = models.EmailField('Contact email')
    location_city = models.CharField(max_length=32, blank=True)
    location_state = models.CharField(max_length=32, blank=True)
    ...

class WthrData(models.Model):
    stn = models.ForeignKey(WthrStn)
    date = models.DateField()
    time = models.TimeField()
    temptr_out = models.DecimalField(max_digits=5, decimal_places=2)
    temptr_in = models.DecimalField(max_digits=5, decimal_places=2)

    class Meta:
        ordering = ['-date', '-time']
        unique_together = (("date", "time", "stn"),)
The data in the WthrData table are loaded from an XML file at variable time increments, currently 15 or 30 minutes, but that could change over time. There are more than 20000 records in that table. I want to provide an option to display the data smoothed to variable time units, e.g. 30 minutes, or 1, 2 or N hours (60, 120, 180, etc. minutes).
I am using SQLite3 as the DB engine. I tested the following SQL, which proved quite adequate for smoothing the data in 'bins' of N minutes:
select id, date, time,
       24*60*julianday(datetime(date || time))/N jsec,
       avg(temptr_out) as temptr_out, avg(temptr_in) as temptr_in,
       avg(barom_mmhg) as barom_mmhg, avg(wind_mph) as wind_mph,
       avg(wind_dir) as wind_dir, avg(humid_pct) as humid_pct,
       avg(rain_in) as rain_in, avg(rain_rate) as rain_rate,
       datetime(avg(julianday(datetime(date || time)))) as avg_date
from wthr_wthrdata
where stn_id=19
group by round(jsec,0)
order by stn_id, date, time;
Note I create an output variable 'jsec' using the SQLite3 function 'julianday', which returns number of days in the integer part and fraction of day in the decimal part. So, multiplying by 24*60 gives me number of minutes. Dividing by N-minute resolution gives me a nice 'group by' variable, compensating for varying time increments of the raw data.
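The binning idea described above can be tried outside Django with Python's built-in sqlite3 module. This is only a minimal sketch under assumptions: the table is cut down to the columns needed, the readings in demo_connection are made up, and the date and time are joined with an explicit space before being passed to julianday. Because the query uses round(), each bin is centred on a multiple of N minutes.

```python
import sqlite3


def smoothed(conn, stn_id, n_minutes):
    """Return (bin, avg temptr_out, row count) per N-minute bin for one station."""
    sql = """
        SELECT CAST(round(24 * 60 * julianday(date || ' ' || time) / ?) AS INTEGER) AS bin,
               avg(temptr_out) AS temptr_out,
               count(*) AS n_rows
        FROM wthr_wthrdata
        WHERE stn_id = ?
        GROUP BY bin
        ORDER BY bin
    """
    return conn.execute(sql, (n_minutes, stn_id)).fetchall()


def demo_connection():
    """Build an in-memory table with a few made-up readings at uneven intervals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wthr_wthrdata (id INTEGER PRIMARY KEY, stn_id INTEGER, "
        "date TEXT, time TEXT, temptr_out REAL)"
    )
    rows = [
        (19, "2014-01-01", "00:05:00", 10.0),
        (19, "2014-01-01", "00:20:00", 12.0),
        (19, "2014-01-01", "00:40:00", 14.0),
        (19, "2014-01-01", "00:55:00", 16.0),
        (19, "2014-01-01", "01:10:00", 18.0),
    ]
    conn.executemany(
        "INSERT INTO wthr_wthrdata (stn_id, date, time, temptr_out) VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn
```

Calling smoothed(demo_connection(), 19, 60) groups the five readings into two hourly bins, regardless of the spacing between the raw rows.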
How can I implement this in Django? I have tried the objects.raw(), but that returns a RawQuerySet, not a QuerySet to the view, so I get error messages from the html template:
<p>
Number of data entries: {{ valid_form|length }}
</p>
I have tried using a standard Query, with code like this:
wthrdta=WthrData.objects.all()
wthrdta.extra(select={'jsec':'24*60*julianday(datetime(date || time))/{}'.format(n)})
wthrdta.extra(select={'temptr_out': 'avg(temptr_out)',
                      'temptr_in': 'avg(temptr_in)',
                      'barom_mmhg': 'avg(barom_mmhg)',
                      'wind_mph': 'avg(wind_mph)',
                      'wind_dir': 'avg(wind_dir)',
                      'humid_pct': 'avg(humid_pct)',
                      'rain_in': 'avg(rain_in)',
                      'rain_sum_in': 'sum(rain_in)',
                      'rain_rate': 'avg(rain_rate)',
                      'avg_date': 'datetime(avg(julianday(datetime(date || time))))'})
Note that here I use the SQL avg functions instead of Django's aggregate() or annotate(). This seems to generate correct SQL, but I can't get the group_by set properly to the jsec value that is created at the top.
Any suggestions for how to approach this? All I really need is for the raw() method to return a QuerySet, or something that can be converted to a QuerySet, instead of a RawQuerySet. I cannot find an easy way to do that.

The answer to this turns out to be really simple, using a hint I found at
https://gist.github.com/carymrobbins/8477219
though I modified the code slightly. To get a QuerySet from a raw query, all I did was add the following to my models.py file, right above the WthrData class definition:
from django.db import connection, models

class MyManager(models.Manager):
    def raw_as_qs(self, raw_query, params=()):
        """Execute a raw query and return a QuerySet. The first column in the
        result set must be the id field for the model.
        :type raw_query: str | unicode
        :type params: tuple[T] | dict[str | unicode, T]
        :rtype: django.db.models.query.QuerySet
        """
        cursor = connection.cursor()
        try:
            cursor.execute(raw_query, params)
            # Collect the ids into a list while the cursor is still open
            return self.filter(id__in=[x[0] for x in cursor])
        finally:
            cursor.close()
Then in my class definition for WthrData:
class WthrData(models.Model):
    objects = MyManager()
    ......
and later in the WthrData class:
    @staticmethod
    def get_smoothWthrData(stn_id, n):
        sqlcode = (
            'select id, date, time, '
            '24*60*julianday(datetime(date || time))/%s jsec, '
            'avg(temptr_out) as temptr_out, avg(temptr_in) as temptr_in, '
            'avg(barom_mmhg) as barom_mmhg, avg(wind_mph) as wind_mph, '
            'avg(wind_dir) as wind_dir, avg(humid_pct) as humid_pct, '
            'avg(rain_in) as rain_in, avg(rain_rate) as rain_rate, '
            'datetime(avg(julianday(datetime(date || time)))) as avg_date '
            'from wthr_wthrdata where stn_id=%s '
            'group by round(jsec,0) order by stn_id,date,time;'
        )
        return WthrData.objects.raw_as_qs(sqlcode, [n, stn_id])
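This lets me pull results from the heavily populated WthrData table in N-minute bins, and they come back as a QuerySet instead of a RawQuerySet. Note that raw_as_qs uses only the first column (id) of the raw result: the returned QuerySet re-selects those rows from the table, so each bin is represented by one stored row, and the avg() columns computed in the raw SQL are not carried over.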

Related

Slick: Pass in column to update

Let's say we have a FoodTable with the following columns: Name, Calories, Carbs, Protein. I have an entry for Name = Chocolate, Calories = 100, Carbs = "10g", and Protein = "2g".
I'm wondering if there's a way to pass in a column name and a new value to update with. For example, I want a method that's like
def updateFood(food, columnName, value):
  table.filter(_.name === food).map(x => x.columnName).update(value)
It seems like dynamic columns are not possible with Slick? I want to avoid writing a SQL query because that could lead to security flaws or bugs in the code. Is there really no way to do this?
I also don't want to have to pass in the entire object to update, since ideally, it should be:
I want to update column X to value Y. I should only need to pass in the id of the object, the column, and the value to update to.
"I'm wondering if there's a way to pass in a column name and a new value to update with"
This depends a little bit on what you want the "column name" to be. To maintain safety, what I'd suggest is having the "column name" be a function that can select a column in your table.
At a high level that would look like this:
// Won't compile, but we'll fix that in a moment
def updateFood[V](food: Food, column: FoodTable => Rep[V], value: V): DBIO[Int] =
  foods.filter(_.name === food.name).map(column).update(value)
...which we'd call like this:
updateFood(choc, _.calories, 99)
Notice how the "column name" is a function from FoodTable to a column of some value V. Then you provide a value for the V and we do a normal update.
The problem is that Slick knows how to map certain types of values (String, Int, etc) into SQL, but not any kind of value. And the code above won't compile because V is unconstrained.
We can sort of fix that by adding a constraint on V, and it will mostly work:
// Will compile, will work for basic types
def updateFood[V : slick.ast.BaseTypedType](food: Food, column: FoodTable => Rep[V], value: V): DBIO[Int] =
  foods.filter(_.name === food.name).map(column).update(value)
However, if you have custom column mappings, they won't match the constraint. We need to go another step on and have an implicit shape in scope:
def updateFood[V](food: Food, column: FoodTable => Rep[V], value: V)(implicit shape: Shape[_ <: FlatShapeLevel, Rep[V], V, _]): DBIO[Int] =
  foods.filter(_.name === food.name).map(column).update(value)
I think of Shape as an extra level of abstraction in Slick, above Rep[V]. The mechanisms of the "shape levels" and other details are not something I can explain because I don't understand them yet! (There is a talk that goes into the design of Slick called "Polymorphic Record Types in a Lifted Embedding" which you can find at http://slick.lightbend.com/docs/)
A final note: if you really want the column name to be a String or something like that, I'd suggest pattern matching the string (or validating it in some way) to a FoodTable => Rep function and using that in your update. That's going to be tricky, because your value V has to match the type of the column you want to update.
Off the top of my head, that could look something like this:
def tryUpdateFood(food: Food, columnName: String, value: String): DBIO[Int] =
  columnName match {
    case "calories" => updateFood(food, _.calories, value.toInt)
    case "carbs"    => updateFood(food, _.carbs, value)
    // etc...
    case unknown    => DBIO.failed(new Exception(s"Don't know how to update $unknown columns"))
  }
I can imagine better error handling, safer or smarter parsing of the value, but in outline the above could work.
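The same "match the string against known columns" pattern can be sketched outside Slick. The following is a Python/sqlite3 analogue, not Slick code: the food table, the UPDATABLE whitelist and try_update_food are hypothetical names chosen for illustration. The column name is only ever interpolated into the SQL after it has been found in the whitelist, and the value is parsed to the column's type and then bound as a parameter.

```python
import sqlite3

# Hypothetical whitelist: column name -> parser for the incoming string value.
UPDATABLE = {
    "calories": int,
    "carbs": str,
    "protein": str,
}


def try_update_food(conn, name, column_name, raw_value):
    """Update one whitelisted column of the named food; return rows changed."""
    if column_name not in UPDATABLE:
        raise ValueError(f"Don't know how to update {column_name!r} columns")
    # Parsing may itself raise ValueError, e.g. int("abc") for calories.
    value = UPDATABLE[column_name](raw_value)
    # Interpolating column_name is acceptable only because it came from UPDATABLE.
    cur = conn.execute(
        f"UPDATE food SET {column_name} = ? WHERE name = ?", (value, name)
    )
    return cur.rowcount
```

A call such as try_update_food(conn, "Chocolate", "calories", "99") updates a single column without the caller passing a whole row, while an unknown column name is rejected before any SQL is built.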
For hints at other ways to approach dynamic problems, take a look at the talk "Patterns for Slick database applications" (also listed at: http://slick.lightbend.com/docs/), and towards the end of the presentation there's a section on "Dynamic sorting".

doctrine native sql not accepting parameter list

I'm trying to do native SQL in Doctrine. Basically I have 2 parameters:
CANDIDATE_ID - the user whose entries we delete,
a list of FILE_IDs to keep
So I do:
$this->getEntityManager()->getConnection()->executeUpdate(
    "DELETE FROM FILE WHERE CANDIDATE_ID = :ID AND NOT ID IN :KEEPID",
    array(
        "ID" => $candidate->id,
        "KEEPID" => array(2)
    )
);
But Doctrine fails:
Notice: Array to string conversion in D:\xampp\htdocs\azk\vendor\doctrine\dbal\lib\Doctrine\DBAL\Connection.php on line 786
Is this a bug in Doctrine? Elsewhere I'm doing a SELECT with IN through QueryBuilder and it works. Maybe someone could suggest a better way of deleting entries, with QueryBuilder for example?
$stmt = $conn->executeQuery(
    'SELECT * FROM articles WHERE id IN (?)',
    array(array(1, 2, 3, 4, 5, 6)),
    array(\Doctrine\DBAL\Connection::PARAM_INT_ARRAY)
);
From Doctrine's documentation.
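The general idea behind an array-typed parameter like the one in the example above is that a single list is turned into one placeholder per element, so every value is still bound rather than concatenated. The sketch below shows that technique with Python's sqlite3 module; it is an illustration of the approach, not Doctrine's implementation, and the lower-case file table and the delete_except helper are hypothetical.

```python
import sqlite3


def delete_except(conn, candidate_id, keep_ids):
    """Delete a candidate's files except those whose id is in keep_ids."""
    keep_ids = [int(i) for i in keep_ids]
    if not keep_ids:
        # "NOT IN ()" is not portable SQL, so handle the empty list explicitly.
        cur = conn.execute(
            "DELETE FROM file WHERE candidate_id = ?", (candidate_id,)
        )
        return cur.rowcount
    # One "?" per id; only placeholders are interpolated, never the values.
    placeholders = ", ".join("?" for _ in keep_ids)
    sql = f"DELETE FROM file WHERE candidate_id = ? AND id NOT IN ({placeholders})"
    cur = conn.execute(sql, (candidate_id, *keep_ids))
    return cur.rowcount
```

Note that the IN list is wrapped in parentheses in the generated SQL, which the ":KEEPID" form in the question lacks.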
You can't bind an array of IDs to an ordinary placeholder. Binding works for scalar values; even if the array were converted to a string, the result would be a single string value, not a list of IDs.
String concatenation is one method,
"DELETE FROM FILE WHERE CANDIDATE_ID = :ID AND NOT ID IN (". implode(",", $list_of_ids) .")"
But this method bypasses parameter binding, so it is open to SQL injection unless every ID is validated as an integer. It is also harder to read, and it is limited by the maximum statement length, which varies between databases.
Another approach is to write a function returning a table result, which takes a string of IDs as a parameter.
You could also solve this with a join to a table containing the IDs to keep.
It's a problem I've seen many times with few good answers, but it's usually caused by a misunderstanding in the way the database is modelled. This is a 'code smell' for database access.

Call display value for choices with SQL

I want to write SQL against a Django field that uses choices=() and have the display value show up in the result rather than the stored value. Any idea how to do this? It's similar to get_FOO_display().
For reference's sake, here's my model:
class Person(models.Model):
    STUDENT_CHOICES = (
        (0, 'None'),
        (1, 'UA Current LDP'),
        (2, 'UA LDP Alumni'),
        (3, 'MSU Current LDP'),
        (4, 'MSU LDP Alumni')
    )
    ...
    studentStatus = models.IntegerField(choices=STUDENT_CHOICES, verbose_name="Student Status", null=True, blank=True)
And my query:
def mailingListQuery(request):
    ...
    if request.POST:
        ...
        sql = """
        ...
        per."studentStatus" # Here's where I want to access the display value
        left outer join person as per on (per.party_id = p.id)
        """
Thanks in advance!
You can store the display value itself as the choice value, with something like:
STUDENT_CHOICES = (
    ('None', 'None'),
)
Also, avoid raw SQL where you can. If you really need it, always use parameterized queries:
connection.cursor().execute('sql with %s params', [params])
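If the stored integers have to stay as they are, another option is to let the database do the lookup by generating a CASE expression from the choices tuple and binding both the values and the labels as parameters. This is a minimal sketch, assuming a cut-down person table and a hypothetical helper choices_case_sql; it is demonstrated with sqlite3 and "?" placeholders, whereas a Django cursor would use %s.

```python
import sqlite3

STUDENT_CHOICES = (
    (0, "None"),
    (1, "UA Current LDP"),
    (2, "UA LDP Alumni"),
    (3, "MSU Current LDP"),
    (4, "MSU LDP Alumni"),
)


def choices_case_sql(column, choices):
    """Return (sql_fragment, params) mapping stored values to display labels.

    column must be a trusted, hard-coded identifier, never user input.
    """
    whens = " ".join("WHEN ? THEN ?" for _ in choices)
    params = [item for pair in choices for item in pair]
    return f"CASE {column} {whens} END", params


def status_labels(conn):
    """Return the display label for every person row, in insertion order."""
    case_sql, params = choices_case_sql('per."studentStatus"', STUDENT_CHOICES)
    sql = f"SELECT {case_sql} AS status_display FROM person AS per ORDER BY per.id"
    return [row[0] for row in conn.execute(sql, params)]
```

Rows whose status is NULL or not listed in the choices come back as NULL, since the CASE expression has no ELSE branch.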

NHibernate Like with integer

I have a NHibernate search function where I receive integers and want to return results where at least the beginning coincides with the integers, e.g.
received integer: 729
returns: 729445, 7291 etc.
The database column is of type int, as is the property "Id" of Foo.
But
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
criteria.Add(NHibernate.Criterion.Expression.InsensitiveLike("Id", id.ToString() + "%"));
return criteria.List<Foo>();
does result in an error (Could not convert parameter string to int32). Is there something wrong in the code, a work around, or other solution?
How about this:
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
// MatchMode.Start matches values that begin with "729", as the question asks
criteria.Add(Expression.Like(Projections.Cast(NHibernateUtil.String, Projections.Property("Id")), id.ToString(), MatchMode.Start));
return criteria.List<Foo>();
Have you tried something like this:
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
criteria.Add(NHibernate.Criterion.Expression.Like(Projections.SqlFunction("to_char", NHibernate.NHibernateUtil.String, Projections.Property("Id")), id.ToString() + "%"));
return criteria.List<Foo>();
The idea is to convert the column to a string with a to_char function before comparing. Some databases do this conversion automatically.
AFAIK, you'll need to store your integer as a string in the database if you want to use the built in NHibernate functionality for this (I would recommend this approach even without NHibernate - the minute you start doing 'like' searches you are dealing with a string, not a number - think US Zip Codes, etc...).
You could also do it mathematically in a database-specific function (or convert to a string as described in Thiago Azevedo's answer), but I imagine these options would be significantly slower, and also have potential to tie you to a specific database.
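To make the "mathematical" option concrete: a positive integer starts with the digits 729 exactly when it falls in one of the ranges [729, 730), [7290, 7300), [72900, 73000), and so on. The sketch below is one way to express that, written in Python with sqlite3 rather than NHibernate; the foo table, the helper names and the max_digits cap are assumptions for illustration only.

```python
import sqlite3


def prefix_ranges(prefix, max_digits=9):
    """Half-open [lo, hi) ranges of positive ints whose decimal form starts with prefix."""
    if prefix <= 0:
        raise ValueError("prefix must be a positive integer")
    digits = len(str(prefix))
    return [
        (prefix * 10 ** extra, (prefix + 1) * 10 ** extra)
        for extra in range(max_digits - digits + 1)
    ]


def find_by_prefix(conn, prefix, max_digits=9):
    """Return ids in table foo that start with the given digits, using range tests only."""
    ranges = prefix_ranges(prefix, max_digits)
    if not ranges:
        return []  # prefix is longer than any id we are willing to consider
    where = " OR ".join("(id >= ? AND id < ?)" for _ in ranges)
    params = [bound for pair in ranges for bound in pair]
    rows = conn.execute(f"SELECT id FROM foo WHERE {where} ORDER BY id", params)
    return [row[0] for row in rows]
```

Because every condition is a plain integer comparison, the column does not need to be cast to a string, and the query does not depend on a database-specific conversion function.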

How to specify multiple values in where with AR query interface in rails3

Per section 2.2 of rails guide on Active Record query interface here:
which seems to indicate that I can pass a string specifying the condition(s), then an array of values that should be substituted at some point while the arel is being built. So I've got a statement that generates my conditions string, which can be a varying number of attributes chained together with either AND or OR between them, and I pass in an array as the second arg to the where method, and I get:
ActiveRecord::PreparedStatementInvalid: wrong number of bind variables (1 for 5)
which leads me to believe I'm doing this incorrectly. However, I'm not finding anything on how to do it correctly. To restate the problem another way, I need to pass in a string to the where method such as "table.attribute = ? AND table.attribute1 = ? OR table.attribute1 = ?" with an unknown number of these conditions anded or ored together, and then pass something, what I thought would be an array as the second argument that would be used to substitute the values in the first argument conditions string. Is this the correct approach, or, I'm just missing some other huge concept somewhere and I'm coming at this all wrong? I'd think that somehow, this has to be possible, short of just generating a raw sql string.
This is actually pretty simple:
Model.where(attribute: [value1,value2])
Sounds like you're doing something like this:
Model.where("attribute = ? OR attribute2 = ?", [value, value])
Whereas you need to do this:
# notice the lack of an array as the last argument
Model.where("attribute = ? OR attribute2 = ?", value, value)
Have a look at http://guides.rubyonrails.org/active_record_querying.html#array-conditions for more details on how this works.
Instead of passing the same parameter multiple times to where() like this
User.where(
"first_name like ? or last_name like ? or city like ?",
"%#{search}%", "%#{search}%", "%#{search}%"
)
you can easily provide a hash
User.where(
"first_name like :search or last_name like :search or city like :search",
{search: "%#{search}%"}
)
that makes your query much more readable for long argument lists.
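Naming a bind parameter once and reusing it is not specific to Active Record. As a rough comparison, here is the same search written with Python's sqlite3 module, which accepts :name placeholders and a dict of values; the users table and the search_users helper are hypothetical.

```python
import sqlite3


def search_users(conn, search):
    """Return users whose first name, last name or city contains the search text."""
    sql = """
        SELECT first_name, last_name, city
        FROM users
        WHERE first_name LIKE :search
           OR last_name LIKE :search
           OR city LIKE :search
        ORDER BY id
    """
    # The single "search" entry is bound to all three :search placeholders.
    return conn.execute(sql, {"search": f"%{search}%"}).fetchall()
```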
Sounds like you're doing something like this:
Model.where("attribute = ? OR attribute2 = ?", [value, value])
Whereas you need to do this:
#notice the lack of an array as the last argument
Model.where("attribute = ? OR attribute2 = ?", value, value)
Have a look at http://guides.rubyonrails.org/active_record_querying.html#array-conditions for more details on how this works.
That was really close. You can turn an array into a list of arguments with the splat operator, *my_list.
Model.where("id = ? OR id = ?", *["1", "2"])
OR
params = ["1", "2"]
Model.where("id = ? OR id = ?", *params)
Should work
If you want to chain together an open-ended list of conditions (attribute names and values), I would suggest using an arel table.
It's a bit hard to give specifics since your question is so vague, so I'll just explain how to do this for a simple case of a Post model and a few attributes, say title, summary, and user_id (i.e. a user has_many posts).
First, get the arel table for the model:
table = Post.arel_table
Then, start building your predicate (which you will eventually use to create an SQL query):
relation = table[:title].eq("Foo")
relation = relation.or(table[:summary].eq("A post about foo"))
relation = relation.and(table[:user_id].eq(5))
Here, table[:title], table[:summary] and table[:user_id] are representations of columns in the posts table. When you call table[:title].eq("Foo"), you are creating a predicate, roughly equivalent to a find condition (get all rows whose title column equals "Foo"). These predicates can be chained together with and and or.
When your aggregate predicate is ready, you can get the result with:
Post.where(relation)
which will generate the SQL:
SELECT "posts".* FROM "posts"
WHERE (("posts"."title" = 'Foo' OR "posts"."summary" = 'A post about foo')
  AND "posts"."user_id" = 5)
This will get you all posts that have either the title "Foo" or the summary "A post about foo", and which belong to a user with id 5.
Notice the way arel predicates can be endlessly chained together to create more and more complex queries. This means that if you have (say) a hash of attribute/value pairs, and some way of knowing whether to use AND or OR on each of them, you can loop through them one by one and build up your condition:
relation = table[:title].eq("Foo")
hash.each do |attr, value|
  relation = relation.and(table[attr].eq(value))
  # or relation = relation.or(table[attr].eq(value)) for an OR predicate
end
Post.where(relation)
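To show why chaining predicates keeps the SQL text and its bind values in step, here is a toy predicate builder in Python. It is only a sketch of the idea and is far simpler than Arel: Pred, eq, and_ and or_ are hypothetical names, and column names are assumed to be trusted identifiers rather than user input.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    """A SQL condition together with the values for its placeholders."""
    sql: str
    params: tuple = ()

    def and_(self, other):
        return Pred(f"({self.sql} AND {other.sql})", self.params + other.params)

    def or_(self, other):
        return Pred(f"({self.sql} OR {other.sql})", self.params + other.params)


def eq(column, value):
    # column must be a trusted identifier; only the value is bound.
    return Pred(f"{column} = ?", (value,))


def select_post_ids(conn, pred):
    """Run the predicate against a posts table and return matching ids."""
    rows = conn.execute(f"SELECT id FROM posts WHERE {pred.sql} ORDER BY id", pred.params)
    return [row[0] for row in rows]
```

Each combination step appends the right-hand parameters after the left-hand ones, which matches the left-to-right order of the placeholders in the combined SQL, so the two never drift apart however many conditions are chained.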
Aside from the ease of chaining conditions, another advantage of arel tables is that they are independent of database, so you don't have to worry whether your MySQL query will work in PostgreSQL, etc.
Here's a Railscast with more on arel: http://railscasts.com/episodes/215-advanced-queries-in-rails-3?view=asciicast
Hope that helps.
You can use a hash rather than a string. Build up a hash with however many conditions and corresponding values you are going to have and put it into the first argument of the where method.
WRONG
This is what I used to do for some reason.
keys = params[:search].split(',').map!(&:downcase)
# keys are now ['brooklyn', 'queens']
query = 'lower(city) LIKE ?'
if keys.size > 1
  # I need something like this depending on number of keys
  # 'lower(city) LIKE ? OR lower(city) LIKE ? OR lower(city) LIKE ?'
  query_array = []
  keys.size.times { query_array << query }
  # ['lower(city) LIKE ?', 'lower(city) LIKE ?']
  query = query_array.join(' OR ')
  # which gives me 'lower(city) LIKE ? OR lower(city) LIKE ?'
end
# now I can query my model
# if keys size is one then keys are just 'brooklyn',
# in this case it is 'brooklyn', 'queens'
# @posts = Post.where('lower(city) LIKE ? OR lower(city) LIKE ?', 'brooklyn', 'queens')
@posts = Post.where(query, *keys)
Now, however, it's very simple, as nfriend21 mentioned:
Model.where(attribute: [value1,value2])
does the same thing