PyPika control order of with clauses - pypika

I am using PyPika (version 0.37.6) to create queries to be used in BigQuery. I am building up a query that has two WITH clauses, and one clause is dependent on the other. Due to the dynamic nature of my application, I do not have control over the order in which those WITH clauses are added to the query.
Here is example working code:
a_alias = AliasedQuery("a")
b_alias = AliasedQuery("b")
a_subq = Query.select(Term.wrap_constant("1").as_("z")).select(Term.wrap_constant("2").as_("y"))
b_subq = Query.from_(a_alias).select("z")
q = Query.with_(a_subq, "a").from_(a_alias).select(a_alias.y)
q = q.with_(b_subq, "b").from_(b_alias).select(b_alias.z)
sql = q.get_sql(quote_char=None)
That generates a working query:
WITH a AS (SELECT '1' z,'2' y) ,b AS (SELECT a.z FROM a) SELECT a.y,b.z FROM a,b
However, if I add the b WITH clause first, then since a is not yet defined, the resulting query:
WITH b AS (SELECT a.z FROM a), a AS (SELECT '1' z,'2' y) SELECT a.y,b.z FROM a,b
does not work. Since BigQuery does not support WITH RECURSIVE, that is not an option for me.
Is there any way to control the order of the WITH clauses? I see the _with list in the QueryBuilder (the type of variable q), but since that's a private variable, I don't want to rely on that, especially as new versions of PyPika may not operate the same way.
One way I tried to do this is to always insert the first WITH clause at the beginning of the _with list, like this:
q._with.insert(0, q._with.pop())
Although this works, I'd like to use a PyPika supported way to do that.
In a related question, is there a supported way within PyPika to see what has already been added to the select list or other parts of the query? I noticed the q.selects member variable, but selects is not part of the public documentation. Using q.selects did not actually work for me when using our project's Python version (3.6) even though it did work in Python 3.7. The code I was trying to use is:
if any(field.name == "date" for field in q.selects if isinstance(field, Field))
The error I got was as follows:
def __getitem__(self, item: slice) -> "BetweenCriterion":
if not isinstance(item, slice):
> raise TypeError("Field' object is not subscriptable")
Thank you in advance for your help.

I could not figure out how to control the order of the WITH clauses after calling query.with_() (except for the hack already noted). As a result, I restructured my application to get around this problem. I am now calling query.with_() before building up the rest of the query.
This also made my related question moot, because I no longer need to see what I've already added to the query.

Related

How to run SPARQL queries in R (WBG Topical Taxonomy) without parallelization

I am an R user and I am interested to use the World Bank Group (WBG) Topical Taxonomy through SPARQL queries.
This can be done directly on the API https://vocabulary.worldbank.org/PoolParty/sparql/taxonomy but it can be done also through R by using the functions load.rdf (to load the taxonomy.rdf rdfxml file downloaded from https://vocabulary.worldbank.org/ ) and after using the sparql.rdf to perform the query. These functions are available on the "rrdf" package
These are the three lines of codes:
taxonomy_file <- load.rdf("taxonomy.rdf")
query <- "SELECT DISTINCT ?nodeOnPath WHERE {http://vocabulary.worldbank.org/taxonomy/435 http://www.w3.org/2004/02/skos/core#narrower* ?nodeOnPath}"
result_query_1 <- sparql.rdf(taxonomy_file,query)
What I obtain from result_query_1 is exactly the same I get through the API.
However, the load.rdf function uses all the cores available on my computer and not only one. This function is somehow parallelizing the load task over all the core available on my machine and I do not want that. I haven't found any option on that function to specify its serialized usege.
Therefore, I am trying to find other solutions. For instance, I have tried "rdf_parse" and "rdf_query" of the package "rdflib" but without any encouraging result. These are the code lines I have used.
taxonomy_file <- rdf_parse("taxonomy.rdf")
query <- "SELECT DISTINCT ?nodeOnPath WHERE {http://vocabulary.worldbank.org/taxonomy/435 http://www.w3.org/2004/02/skos/core#narrower* ?nodeOnPath}"
result_query_2 <- rdf_query(taxonomy_file , query = query)
Is there any other function that perform this task? The objective of my work is to run several queries simultaneously using foreach.
Thank you very much for any suggestion you could provide me.

Oracle spatial request working on one instance and not on another

I have this statement that is generated by Geoserver
SELECT
shape AS shape
FROM
(
SELECT
c.chantier_id id,
sdo_geom.sdo_buffer(c.shape, m.diminfo, 1) shape,
c.datedebut datedebut,
c.datefin datefin,
o.nom operation,
c.brouillon brouillon,
e.code etat,
u.utilisateur_id utilisateur,
u.groupe_id groupe
FROM
user_sdo_geom_metadata m, lyv_chantier c
JOIN lyv_utilisateur u ON c.createur_id = u.utilisateur_id
JOIN lyv_etat e ON c.etat_id = e.etat_id
JOIN lyv_operation o ON c.operation = o.id
WHERE
m.table_name = 'LYV_CHANTIER'
AND m.column_name = 'SHAPE'
) vtable
WHERE
( brouillon = 0
AND ( etat != 'archive'
OR etat IS NULL )
AND sdo_filter(shape, mdsys.sdo_geometry(2003, 4326, NULL, mdsys.sdo_elem_info_array(1, 1003, 1), mdsys.sdo_ordinate_array(
2.23365783691406, 48.665657043457, 2.23365783691406, 48.9341354370117, 2.76649475097656, 48.9341354370117, 2.76649475097656, 48.665657043457, 2.23365783691406, 48.665657043457)), 'mask=anyinteract querytype=WINDOW') = 'TRUE' );
On my local instance (dockerized if that can explain anything) it works fine, but on another instance I get an error :
ORA-13226: interface not supported without a spatial index
I guess that the SDO_FILTER is applied to the result of SDO_BUFFER which is therefore not indexed.
But why is it working on my local instance ?!
Is there some kind of weird configuration shenanigan that could explain the different behavior maybe ?
EDIT : The idea behind this is to get around a bug in Geoserver with Oracle databases where it renders only the first point of MultiPoint geometries, but works fine with MutltiPolygon.
I am using a SQL view as layer in Geoserver (hence the subselect I guess).
First, you need to do some debugging here.
Connect to each instance, on the same user as your Geoserver's datasource, and run the sql. From the same connections (in each instance) you must also verify that the user's metadata view (user_sdo_geom_metadata) have an entry for the table and the table has a spatial index - whose owner is the same user as the one you connect.
Also, your query ( select ... from 'vtable') has a column 'shape' which is a buffer of the column lyv_chantier.shape. The sdo_filter, in this sql, expects a spatial index on the vtable.shape - which cannot exist. You should try to use a different alias (e.g. buf_shape) and sdo_filter(buf_shape,...) - to see if the sql fails in both instances, as it should.
I'm in a bit of a hurry right now, so my instructions are summarized. If you want, do this debugging and post the results. We then can go into details.
EDIT: Judging from your efforts, I'd say that the simplest approach is: 1) add a second geometry column to lyv_chantier (e.g. buf_shp). 2) update lyv_chantier set buf_shp = sdo_geom.sdo_buffer(shape,...). 3) insert into user_sdo_geom_metadata the values (lyv_chantier, buf_shp, ...). 4) create a spatial index on column buf_shp. You may need to consider a trigger to update buf_shp whenever shape changes...
This is a very practical approach but you don't provide any info about your case (what is the oracle version, how many rows does the table have, how is it used, why do you want to use sdo_buffer, etc), so that's my recommendation for now.
Also, since you are, most likely, using an sql view as layer in Geoserver (you don't say anything about that, either), you could also consider using pure GS functionality to achieve your goal.
At the end, without describing your goal, it's difficult to provide anything more tailor-made.

Using PosgreSQL array_agg with join alias in JOOQ DSL

I want to convert this SQL query to JOOQ DSL.
select "p".*, array_agg("pmu") as projectmemberusers
from "Projects" as "p"
join "ProjectMemberUsers" as "pmu" on "pmu"."projectId" = "p"."id"
group by "p"."id";
Currently i have tried doing something like this using JOOQ:
val p = PROJECTS.`as`("p")
val pmu = PROJECTMEMBERUSERS.`as`("pmu")
val query = db.select(p.asterisk(), DSL.arrayAgg(pmu))
.from(p.join(pmu).on(p.ID.eq(pmu.PROJECTID)))
.groupBy(p.ID)
This does not work because DSL.arrayAgg expects something of type Field<T> as input.
I am new to JOOQ and not an SQL professional. Detailed explanations and impovement suggestions are highly appreciated.
First of all, the syntax indeed works, checked this in SQL Fiddle: http://sqlfiddle.com/#!17/e45b7/3
But it's not documented in detail: https://www.postgresql.org/docs/9.5/static/functions-aggregate.html
https://www.postgresql.org/docs/current/static/rowtypes.html#ROWTYPES-USAGE
That's probably the reason jOOQ doesn't support this currently: https://github.com/jOOQ/jOOQ/blob/master/jOOQ/src/main/java/org/jooq/impl/DSL.java#L16856
The only syntax that will work currently is with a single field: DSL.arrayAgg(pmu.field(1))
What you're looking for is a way to express PostgreSQL's "anonymous" nested records through the jOOQ API, similar to what is requested in this feature request: https://github.com/jOOQ/jOOQ/issues/2360
This is currently not possible in the jOOQ API as of version 3.11, but it definitely will be in the future.
Workaround 1
You could try using the experimental DSL.rowField() methods on a Row[N]<...> representation of your table type. This may or may not work yet, as the feature is currently not supported.
Workaround 2
A workaround is to create a type:
create type my_type as (...) -- Same row type as your table
And a view:
create view x as
select "p".*, array_agg("pmu"::my_type) as projectmemberusers
from "Projects" as "p"
join "ProjectMemberUsers" as "pmu" on "pmu"."projectId" = "p"."id"
group by "p"."id";
And then use the code generator to pick up the resulting type.

Django ORM Cross Product

I have three models:
class Customer(models.Model):
pass
class IssueType(models.Model):
pass
class IssueTypeConfigPerCustomer(models.Model):
customer=models.ForeignKey(Customer)
issue_type=models.ForeignKey(IssueType)
class Meta:
unique_together=[('customer', 'issue_type')]
How can I find all tuples of (custmer, issue_type) where there is no IssueTypeConfigPerCustomer object?
I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
Background: for every customer and for every issue-type, there should be a config in the DB.
If you can afford to make one database trip for each issue type, try something like this untested snippet:
def lacking_configs():
for issue_type in IssueType.objects.all():
for customer in Customer.objects.filter(
issuetypeconfigpercustomer__issue_type=None
):
yield customer, issue_type
missing = list(lacking_configs())
This is probably OK unless you have a lot of issue types or if you are doing this several times per second, but you may also consider having a sensible default instead of making a config object mandatory for each combination of issue type and customer (IMHO it is a bit of a design-smell).
[update]
I updated the question: I want to avoid a loop in Python. A solution which solves this in the DB would be preferred.
In Django, every Queryset is either a list of Model instances or a dict (values querysets), so it is impossible to return the format you want (a list of tuples of Model) without some Python (and possibly multiple trips to the database).
The closest thing to a cross product would be using the "extra" method without a where parameter, but it involves raw SQL and knowing the underlying table name for the other model:
missing = Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype']
)
As a result, each Customer object will have an extra attribute, "issue_type_id", containing the id of one IssueType. You can use the where parameter to filter based on NOT EXISTS (SELECT 1 FROM appname_issuetypeconfigpercustomer WHERE issuetype_id=appname_issuetype.id AND customer_id=appname_customer.id). Using the values method you can have something close to what you want - this is probably enough information to verify the rule and create the missing records. If you need other fields from IssueType just include them in the select argument.
In order to assemble a list of (Customer, IssueType) you need something like:
cross_product = [
(customer, IssueType.objects.get(pk=customer.issue_type_id))
for customer in
Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype'],
where=["""
NOT EXISTS (
SELECT 1
FROM appname_issuetypeconfigpercustomer
WHERE issuetype_id=appname_issuetype.id
AND customer_id=appname_customer.id
)
"""]
)
]
Not only this requires the same number of trips to the database as the "generator" based version but IMHO it is also less portable, less readable and violates DRY. I guess you can lower the number of database queries to a couple using something like this:
missing = Customer.objects.extra(
select={"issue_type_id": 'appname_issuetype.id'},
tables=['appname_issuetype'],
where=["""
NOT EXISTS (
SELECT 1
FROM appname_issuetypeconfigpercustomer
WHERE issuetype_id=appname_issuetype.id
AND customer_id=appname_customer.id
)
"""]
)
issue_list = dict(
(issue.id, issue)
for issue in
IssueType.objects.filter(
pk__in=set(m.issue_type_id for m in missing)
)
)
cross_product = [(c, issue_list[c.issue_type_id]) for c in missing]
Bottom line: in the best case you make two queries at the cost of legibility and portability. Having sensible defaults is probably a better design compared to mandatory config for each combination of Customer and IssueType.
This is all untested, sorry if some homework was left for you.

Translate SQL to OCL?

I have a piece of SQL that I want to translate to OCL. I'm not good at SQL so I want to increase maintainability by this. We are using Interbase 2009, Delphi 2007 with Bold and modeldriven development. Now my hope is that someone here both speaks good SQL and OCL :-)
The original SQL:
Select Bold_Id, MessageId, ScaniaId, MessageType, MessageTime, Cancellation, ChassieNumber, UserFriendlyFormat, ReceivingOwner, Invalidated, InvalidationReason,
(Select Parcel.MCurrentStates From Parcel
Where ScaniaEdiSolMessage.ReceivingOwner = Parcel.Bold_Id) as ParcelState From ScaniaEdiSolMessage
Where MessageType = 'IFTMBP' and
not Exists (Select * From ScaniaEdiSolMessage EdiSolMsg
Where EdiSolMsg.ChassieNumber = ScaniaEdiSolMessage.ChassieNumber and EdiSolMsg.ShipFromFinland = ScaniaEdiSolMessage.ShipFromFinland and EdiSolMsg.MessageType = 'IFTMBF') and
invalidated = 0 Order By MessageTime desc
After a small simplification:
Select Bold_Id, (Select Parcel.MCurrentStates From Parcel
where ScaniaEdiSolMessage.ReceivingOwner = Parcel.Bold_Id) From ScaniaEdiSolMessage
Where MessageType = 'IFTMBP' and not Exists (Select * From ScaniaEdiSolMessage
EdiSolMsg Where EdiSolMsg.ChassieNumber = ScaniaEdiSolMessage.ChassieNumber and
EdiSolMsg.ShipFromFinland = ScaniaEdiSolMessage.ShipFromFinland and
EdiSolMsg.MessageType = 'IFTMBF') and invalidated = 0
NOTE: There are 2 cases for MessageType, 'IFTMBP' and 'IFTMBF'.
So the table to be listed is ScaniaEdiSolMessage.
It has attributes like:
MessageType: String
ChassiNumber: String
ShipFromFinland: Boolean
Invalidated: Boolean
It has also a link to table Parcel named ReceivingOwner with BoldId as key.
So it seems like it list all rows of ScaniaEdiSolMessage and then have a subquery that also list all rows of ScaniaEdiSolMessage and name it EdiSolMsg. Then it exclude almost all rows. In fact the query above give one hit from 28000 records.
In OCL it is easy to list all instances:
ScaniaEdiSolMessage.allinstances
Also easy to filter rows by select for example:
ScaniaEdiSolMessage.allinstances->select(shipFromFinland and not invalidated)
But I do not understand how I should make a OCL to match the SQL above.
Listen to Gabriel and Stephanie, learn more SQL.
You state that you want to make the code more maintainable, yet the number of developers who understand SQL is greater by far than the number of developers who understand OCL.
If you leave the project tomorrow after converting this to OCL, the chances that you'd be able to find someone who could maintain the OCL are very slim. However, the chances that you could find someone to maintain the SQL are very high.
Don't try to fit a square peg in a round hole just because you're good with round hammers :)
There is a project, Dresden OCL, that might help you.
Dresden OCL provides a set of tools to parse and evaluate OCL constraints on various models like UML, EMF and Java. Furthermore Dresden OCL provides tools for Java/AspectJ and SQL code generation. The tools of Dresden OCL can be either used as a library for other project or as a plug-in project that extends Eclipse with OCL support.
I haven't used it, but there is a demo showing how the tool generates SQL from a model and OCL constraints. I realize you're asking for the opposite, but maybe using this you can figure it out. There is also a paper that describes OCL->SQL transformations by the same people.
With MDriven (successor of Bold for Delphi) I would do it like this:
When working with OCL to SQL everything becomes easier if you think about the different set's of information you need to check - and then use ocl operators as ->intersection to find the set you are after.
So in your case you might have a set like this:
ScaniaEdiSolMessage.allinstances->select(shipFromFinland and not invalidated)
but you also have a set like this:
ScaniaEdiSolMessage.allinstances->select(m|m.ReceivingOwner.MessageType = 'IFTMBP')
And you further more have this criteria:
Parcel.allinstances->select(p|p.Messages->exists(m|m.MessageType = 'IFTMBF')).Messages
If all these Sets have the same result type (collection of ScaniaEdiSolMessage) you can simply intersect them to get your desired result
ScaniaEdiSolMessage.allinstances->select(shipFromFinland and not invalidated)
->intersection(ScaniaEdiSolMessage.allinstances->select(m|m.ReceivingOwner.MessageType = 'IFTMBP'))
->intersection(Parcel.allinstances->select(p|p.Messages->exists(m|m.MessageType = 'IFTMBF')).Messages
)
And looking at that we can reduce it a bit to:
ScaniaEdiSolMessage.allinstances
->select(m|m.shipFromFinland and (not m.invalidated) and
(m.ReceivingOwner.MessageType = 'IFTMBP'))
->intersection(Parcel.allinstances->select(p|
p.Messages->exists(m|m.MessageType = 'IFTMBF')).Messages
)