Django: datediff SQL queries?

I'm trying to do the equivalent of the following SQL in Django:
SELECT * FROM applicant WHERE date_out - date_in >= 1 AND date_out - date_in <= 6
I can do this as a raw SQL query, but working with a RawQuerySet instead of a regular QuerySet is becoming frustrating, because I would like to be able to filter the result later in the code.

I came across the issue of Django not natively supporting Datediff (and other database equivalents), and needed to use such a function many times for a particular project.
Upon further reading, it became clear that the way an interval is calculated from two dates differs widely between the major database flavours, which is probably why Django has no native abstraction for it yet. So I wrote my own Django ORM function for datediff:
See: mike-db-tools GitHub repository
The docstrings document the syntax used for each database backend. Datediff supports SQLite, MySQL / MariaDB, PostgreSQL and Oracle.
Usage (Django 1.8+):
from db_tools import Datediff
# Define a new dynamic field to contain the calculated date difference
applicants = Applicant.objects.annotate(
    days_range=Datediff('date_out', 'date_in', interval='days'),
)
# Now you can use this dynamic field in your standard filter query
applicants = applicants.filter(days_range__gte=1, days_range__lte=6)
I'm really quite derpy when it comes to my code, so I encourage you to fork and improve.

You can use the extra() method and pass in a where keyword argument. The value of where should be a list containing the conditions from the SQL WHERE clause above. I tested this with PostgreSQL 8.4, and this is what it looked like in my case:
q = Applicant.objects.extra(where=["""date_part('day', age(date_out, date_in)) >= 1 and
    date_part('day', age(date_out, date_in)) <= 6"""])
This returns a regular QuerySet instance. Note that date_part('day', age(...)) extracts only the days field of the interval, so a gap of, say, one month and three days would also match.
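For comparison, here is a minimal sketch of the same day-range filter run against SQLite through Python's standard sqlite3 module, outside Django. It assumes date_in and date_out are stored as ISO-8601 text, and the sample rows are invented for illustration. SQLite has no DATEDIFF, so the difference in days is computed with julianday().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE applicant (id INTEGER PRIMARY KEY, date_in TEXT, date_out TEXT)"
)
# Hypothetical sample data; only the table and column names come from the question.
conn.executemany(
    "INSERT INTO applicant (id, date_in, date_out) VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-01-01"),  # 0 days: excluded
        (2, "2020-01-01", "2020-01-04"),  # 3 days: included
        (3, "2020-01-01", "2020-01-07"),  # 6 days: included
        (4, "2020-01-01", "2020-02-04"),  # 34 days: excluded
    ],
)

# julianday() returns a fractional day number, so subtracting gives total days.
rows = conn.execute(
    """
    SELECT id, CAST(julianday(date_out) - julianday(date_in) AS INTEGER) AS days_range
    FROM applicant
    WHERE julianday(date_out) - julianday(date_in) BETWEEN 1 AND 6
    ORDER BY id
    """
).fetchall()

matching_ids = [row[0] for row in rows]
print(matching_ids)
```

Because the difference is a total number of days rather than one field of an interval, the 34-day row is filtered out.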

Related

How to write an SQL NOT EXISTS query/scope in the Rails way?

I have a database scope that selects only the latest ProxyConfig version for a particular Proxy and environment.
This is the raw SQL that works very well with MySQL, PostgreSQL and Oracle:
class ProxyConfig < ApplicationRecord
  ...
  scope :current_versions, -> do
    where %(NOT EXISTS (
      SELECT 1 FROM proxy_configs pc
      WHERE proxy_configs.environment = environment
      AND proxy_configs.proxy_id = proxy_id
      AND proxy_configs.version < version
    ))
  end
  ...
end
You can find a simple test case in my baby_squeel issue.
But I find it nicer not to use SQL directly. I have spent a lot of time trying out different approaches to write it in the Rails way to no avail. I found generic Rails and baby_squeel examples but they always involved different tables.
PS The previous version used joins, but it was super slow and it broke some queries. For example, #count produced an SQL syntax error. So I'm not very open to other approaches; I would rather know how to implement exactly this query, although I'm still curious to see other simple solutions.
PPS On the question of whether direct SQL is fine: in this case, mostly yes. Probably every RDBMS can understand this quoting. Comparing text fields, though, requires special functions on Oracle, and on Postgres the case-insensitive LIKE is ILIKE. Arel can handle that automatically, whereas raw SQL would need a different string for each RDBMS.
This isn't actually a query that you can build with the ActiveRecord Query Interface alone. It can be done with a light sprinkling of Arel though:
class ProxyConfig < ApplicationRecord
  def self.current_versions
    pc = arel_table.alias("pc")
    where(
      unscoped.select(1)
        .where(pc[:environment].eq(arel_table[:environment]))
        .where(pc[:proxy_id].eq(arel_table[:proxy_id]))
        .where(pc[:version].gt(arel_table[:version]))
        .from(pc)
        .arel.exists.not
    )
  end
end
The generated SQL isn't identical, but I think it should be functionally equivalent.
SELECT "proxy_configs".* FROM "proxy_configs"
WHERE NOT (
  EXISTS (
    SELECT 1 FROM "proxy_configs" "pc"
    WHERE "pc"."environment" = "proxy_configs"."environment"
    AND "pc"."proxy_id" = "proxy_configs"."proxy_id"
    AND "pc"."version" > "proxy_configs"."version"
  )
)
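To sanity-check the logic of this NOT EXISTS query outside Rails, here is a small sketch that runs an equivalent statement against an in-memory SQLite database from Python. The sample rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE proxy_configs (
        id INTEGER PRIMARY KEY,
        proxy_id INTEGER,
        environment TEXT,
        version INTEGER
    )
    """
)
# Hypothetical data: several versions per (proxy_id, environment) pair.
conn.executemany(
    "INSERT INTO proxy_configs (proxy_id, environment, version) VALUES (?, ?, ?)",
    [
        (1, "sandbox", 1),
        (1, "sandbox", 2),
        (1, "production", 1),
        (2, "sandbox", 5),
        (2, "sandbox", 7),
    ],
)

# Keep a row only if no other row for the same proxy and environment
# has a higher version.
current_versions = conn.execute(
    """
    SELECT proxy_id, environment, version
    FROM proxy_configs
    WHERE NOT EXISTS (
        SELECT 1 FROM proxy_configs pc
        WHERE pc.environment = proxy_configs.environment
          AND pc.proxy_id = proxy_configs.proxy_id
          AND pc.version > proxy_configs.version
    )
    ORDER BY proxy_id, environment
    """
).fetchall()
print(current_versions)
```

Each (proxy_id, environment) pair yields exactly one row, the one with the highest version.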

Request last hour data from Big Query with Standard SQL

This is my problem.
I would like to request only the data of the last hour from Big Query.
I would like to use Standard Sql.
I would like to pay only for reading the data in that time interval.
Example:
My daily partition takes 200 GB. I request the data of the last hour (40 GB). Is it possible to pay only for 40 GB in Standard SQL?
Thanks!
You can use table decorators (specifically range decorators), but they are supported in BigQuery Legacy SQL only.
To get the data from the last hour, you can use the query below:
SELECT <list_of_fields>
FROM [yourproject:yourdataset.yourtable@-3600000-]
Of course, the preferred query syntax for BigQuery is standard SQL, so you can either build all of your query logic in legacy SQL and keep it in one query, or split it: first write the last hour's data to a temporary table using a legacy SQL decorator, then apply the rest of the logic with standard SQL.
In the meantime, see the following open issue on Google's Issue Tracker:
Support an equivalent to table decorators in standard SQL
From that thread, it looks like the closest feature for your case could be hourly partitioning, whenever it becomes available.

TABLE_DATE_RANGE for xxxx_yyyymm format tables

I'm having a problem trying to query 15 months' worth of data.
I know about BigQuery's wildcard functions, but I can't seem to get them to work with my tables.
For example, if my tables are called:
xxxx_201501,
xxxx_201502,
xxxx_201503,
...
xxxx_201606
How can I select everything from 201501 until today (current_timestamp)?
It seems that it's necessary to have the tables per day, am I wrong?
I've also read that you can use regex but can't find the way.
With Standard SQL, you can use a WHERE clause on a _TABLE_SUFFIX pseudo column as described here:
Is there an equivalent of table wildcard functions in BigQuery with standard SQL?
In this particular case, it would be:
SELECT ... FROM `mydataset.xxxx_*` WHERE _TABLE_SUFFIX >= '201501';
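The comparison on _TABLE_SUFFIX is a plain string comparison. It works here because zero-padded YYYYMM suffixes sort lexicographically in chronological order. A small Python sketch of the same selection logic, using made-up table names that follow the pattern from the question (the helper name is hypothetical):

```python
# Hypothetical table names following the xxxx_YYYYMM pattern from the question.
tables = ["xxxx_201412", "xxxx_201501", "xxxx_201502", "xxxx_201512", "xxxx_201606"]
prefix = "xxxx_"


def tables_from(start_suffix, names):
    """Return the names whose suffix is >= start_suffix by string comparison."""
    return [
        name
        for name in names
        if name.startswith(prefix) and name[len(prefix):] >= start_suffix
    ]


selected = tables_from("201501", tables)
print(selected)
```

The same reasoning would not hold for suffixes that are not zero-padded, such as 20151 versus 201510.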
This is a bit long for a comment.
If you are using the standard SQL dialect, then I don't think the functionality is yet implemented.
If you are using the legacy SQL dialect, then you can use a function such as TABLE_DATE_RANGE(). This and other table wildcard functions are well documented.
EDIT:
Oh, I see. The simplest way would be to store the tables as YYYYMM01 so you can use the range query.
But, you can also use table_query():
from table_query(t, 'right(table_id, 6) >= ''201501'' ')

Grouping by month in database and not with ruby

I'm trying to group calls by month but I need to do it in the database and not with ruby. Here is the current code:
Call.limit(1000).group_by { |t| t.created_at.month }
Which runs this query:
SELECT `calls`.* FROM `calls` ORDER BY created_at desc LIMIT 1000
Then Ruby does the grouping. What should I do to make the database do the work?
Thank you.
The short answer is that you cannot achieve the same result at the SQL level.
Here's the full explanation.
First of all, what should be the result of that call? You can use the SQL GROUP BY clause, but the result is likely not what you expect.
GROUP BY is designed to group rows that share a value and compute an aggregate function over each group. In your case, even if you write a query that uses date_trunc to group by part of the timestamp, the result is one aggregated row per group, not a dataset structured like the output of Ruby's group_by method.
Why do you want to compute such a grouping at the database level?
If you have specific requirements or computation limits, then work on a custom method.
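The difference between the two kinds of grouping can be sketched in Python with the standard sqlite3 module. This is only an illustration under assumptions: SQLite stands in for the real database, the sample rows are invented, and strftime('%m', ...) plays the role of a month-extraction function.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, created_at TEXT)")
# Hypothetical sample rows: two calls in January, one in February.
conn.executemany(
    "INSERT INTO calls (created_at) VALUES (?)",
    [("2016-01-05 10:00:00",), ("2016-01-20 12:30:00",), ("2016-02-03 09:15:00",)],
)

# GROUP BY in the database: one row per month, carrying only aggregate values.
counts_per_month = conn.execute(
    """
    SELECT CAST(strftime('%m', created_at) AS INTEGER) AS month, COUNT(*) AS total
    FROM calls
    GROUP BY strftime('%m', created_at)
    ORDER BY month
    """
).fetchall()

# Grouping in application code: every row is kept, bucketed by month.
rows_by_month = defaultdict(list)
for call_id, created_at in conn.execute("SELECT id, created_at FROM calls ORDER BY id"):
    rows_by_month[int(created_at[5:7])].append(call_id)

print(counts_per_month)
print(dict(rows_by_month))
```

The database query returns a count per month, while the in-memory grouping keeps the individual records under each month, which is what Ruby's group_by does.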
Use Call.limit(1000).group("month(created_at)")
Please check out the MySQL date and time functions appropriate for your case; .group() will do the grouping in MySQL.
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_month

SQL: Ordering by how much greater than something is?

I have two datetime fields here: actual_delivery and scheduled_delivery
What I want to ORDER BY is how much greater actual_delivery is than scheduled_delivery.
I'm using MySQL locally and PostgreSQL in production, so it needs to work for both.
If I were doing it in SQL Server I'd calculate DATEDIFF(actual_delivery, scheduled_delivery) AS [DeliveryDifference] then order by that computed column.
A quick search indicates there's a datediff function in MySQL, but the syntax may be slightly different in PostgreSQL, so you may have to create your own function there.
Try this:
SELECT actual_delivery, scheduled_delivery, actual_delivery - scheduled_delivery as difference FROM tablename ORDER BY difference
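The pattern of computing the difference as an aliased column and ordering by that alias can be tried out in a self-contained way with Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for MySQL and PostgreSQL, the deliveries table and its rows are hypothetical, and the difference is expressed in seconds via strftime('%s', ...), which is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE deliveries (
        id INTEGER PRIMARY KEY,
        scheduled_delivery TEXT,
        actual_delivery TEXT
    )
    """
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO deliveries (id, scheduled_delivery, actual_delivery) VALUES (?, ?, ?)",
    [
        (1, "2020-03-01 12:00:00", "2020-03-01 18:00:00"),  # 6 hours late
        (2, "2020-03-01 12:00:00", "2020-03-01 11:00:00"),  # 1 hour early
        (3, "2020-03-01 12:00:00", "2020-03-03 12:00:00"),  # 48 hours late
    ],
)

# Compute the difference once as an aliased column, then sort by the alias.
rows = conn.execute(
    """
    SELECT id,
           CAST(strftime('%s', actual_delivery) AS INTEGER)
             - CAST(strftime('%s', scheduled_delivery) AS INTEGER) AS difference_seconds
    FROM deliveries
    ORDER BY difference_seconds DESC
    """
).fetchall()
print(rows)
```

The expression that produces the difference is the database-specific part; ordering by the alias works the same way in each engine.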