Translate SQL to OCL? - sql

I have a piece of SQL that I want to translate to OCL. I'm not good at SQL so I want to increase maintainability by this. We are using Interbase 2009, Delphi 2007 with Bold and modeldriven development. Now my hope is that someone here both speaks good SQL and OCL :-)
The original SQL:
Select Bold_Id, MessageId, ScaniaId, MessageType, MessageTime, Cancellation, ChassieNumber, UserFriendlyFormat, ReceivingOwner, Invalidated, InvalidationReason,
(Select Parcel.MCurrentStates From Parcel
Where ScaniaEdiSolMessage.ReceivingOwner = Parcel.Bold_Id) as ParcelState From ScaniaEdiSolMessage
Where MessageType = 'IFTMBP' and
not Exists (Select * From ScaniaEdiSolMessage EdiSolMsg
Where EdiSolMsg.ChassieNumber = ScaniaEdiSolMessage.ChassieNumber and EdiSolMsg.ShipFromFinland = ScaniaEdiSolMessage.ShipFromFinland and EdiSolMsg.MessageType = 'IFTMBF') and
invalidated = 0 Order By MessageTime desc
After a small simplification:
Select Bold_Id, (Select Parcel.MCurrentStates From Parcel
where ScaniaEdiSolMessage.ReceivingOwner = Parcel.Bold_Id) From ScaniaEdiSolMessage
Where MessageType = 'IFTMBP' and not Exists (Select * From ScaniaEdiSolMessage
EdiSolMsg Where EdiSolMsg.ChassieNumber = ScaniaEdiSolMessage.ChassieNumber and
EdiSolMsg.ShipFromFinland = ScaniaEdiSolMessage.ShipFromFinland and
EdiSolMsg.MessageType = 'IFTMBF') and invalidated = 0
NOTE: There are 2 cases for MessageType, 'IFTMBP' and 'IFTMBF'.
So the table to be listed is ScaniaEdiSolMessage.
It has attributes like:
MessageType: String
ChassiNumber: String
ShipFromFinland: Boolean
Invalidated: Boolean
It has also a link to table Parcel named ReceivingOwner with BoldId as key.
So it seems like it list all rows of ScaniaEdiSolMessage and then have a subquery that also list all rows of ScaniaEdiSolMessage and name it EdiSolMsg. Then it exclude almost all rows. In fact the query above give one hit from 28000 records.
In OCL it is easy to list all instances:
ScaniaEdiSolMessage.allinstances
Also easy to filter rows by select for example:
ScaniaEdiSolMessage.allinstances->select(shipFromFinland and not invalidated)
But I do not understand how I should make a OCL to match the SQL above.

Listen to Gabriel and Stephanie, learn more SQL.
You state that you want to make the code more maintainable, yet the number of developers who understand SQL is greater by far than the number of developers who understand OCL.
If you leave the project tomorrow after converting this to OCL, the chances that you'd be able to find someone who could maintain the OCL are very slim. However, the chances that you could find someone to maintain the SQL are very high.
Don't try to fit a square peg in a round hole just because you're good with round hammers :)

There is a project, Dresden OCL, that might help you.
Dresden OCL provides a set of tools to parse and evaluate OCL constraints on various models like UML, EMF and Java. Furthermore Dresden OCL provides tools for Java/AspectJ and SQL code generation. The tools of Dresden OCL can be either used as a library for other project or as a plug-in project that extends Eclipse with OCL support.
I haven't used it, but there is a demo showing how the tool generates SQL from a model and OCL constraints. I realize you're asking for the opposite, but maybe using this you can figure it out. There is also a paper that describes OCL->SQL transformations by the same people.

With MDriven (successor of Bold for Delphi) I would do it like this:
When working with OCL to SQL everything becomes easier if you think about the different set's of information you need to check - and then use ocl operators as ->intersection to find the set you are after.
So in your case you might have a set like this:
ScaniaEdiSolMessage.allinstances->select(shipFromFinland and not invalidated)
but you also have a set like this:
ScaniaEdiSolMessage.allinstances->select(m|m.ReceivingOwner.MessageType = 'IFTMBP')
And you further more have this criteria:
Parcel.allinstances->select(p|p.Messages->exists(m|m.MessageType = 'IFTMBF')).Messages
If all these Sets have the same result type (collection of ScaniaEdiSolMessage) you can simply intersect them to get your desired result
ScaniaEdiSolMessage.allinstances->select(shipFromFinland and not invalidated)
->intersection(ScaniaEdiSolMessage.allinstances->select(m|m.ReceivingOwner.MessageType = 'IFTMBP'))
->intersection(Parcel.allinstances->select(p|p.Messages->exists(m|m.MessageType = 'IFTMBF')).Messages
)
And looking at that we can reduce it a bit to:
ScaniaEdiSolMessage.allinstances
->select(m|m.shipFromFinland and (not m.invalidated) and
(m.ReceivingOwner.MessageType = 'IFTMBP'))
->intersection(Parcel.allinstances->select(p|
p.Messages->exists(m|m.MessageType = 'IFTMBF')).Messages
)

Related

PyPika control order of with clauses

I am using PyPika (version 0.37.6) to create queries to be used in BigQuery. I am building up a query that has two WITH clauses, and one clause is dependent on the other. Due to the dynamic nature of my application, I do not have control over the order in which those WITH clauses are added to the query.
Here is example working code:
a_alias = AliasedQuery("a")
b_alias = AliasedQuery("b")
a_subq = Query.select(Term.wrap_constant("1").as_("z")).select(Term.wrap_constant("2").as_("y"))
b_subq = Query.from_(a_alias).select("z")
q = Query.with_(a_subq, "a").from_(a_alias).select(a_alias.y)
q = q.with_(b_subq, "b").from_(b_alias).select(b_alias.z)
sql = q.get_sql(quote_char=None)
That generates a working query:
WITH a AS (SELECT '1' z,'2' y) ,b AS (SELECT a.z FROM a) SELECT a.y,b.z FROM a,b
However, if I add the b WITH clause first, then since a is not yet defined, the resulting query:
WITH b AS (SELECT a.z FROM a), a AS (SELECT '1' z,'2' y) SELECT a.y,b.z FROM a,b
does not work. Since BigQuery does not support WITH RECURSIVE, that is not an option for me.
Is there any way to control the order of the WITH clauses? I see the _with list in the QueryBuilder (the type of variable q), but since that's a private variable, I don't want to rely on that, especially as new versions of PyPika may not operate the same way.
One way I tried to do this is to always insert the first WITH clause at the beginning of the _with list, like this:
q._with.insert(0, q._with.pop())
Although this works, I'd like to use a PyPika supported way to do that.
In a related question, is there a supported way within PyPika to see what has already been added to the select list or other parts of the query? I noticed the q.selects member variable, but selects is not part of the public documentation. Using q.selects did not actually work for me when using our project's Python version (3.6) even though it did work in Python 3.7. The code I was trying to use is:
if any(field.name == "date" for field in q.selects if isinstance(field, Field))
The error I got was as follows:
def __getitem__(self, item: slice) -> "BetweenCriterion":
if not isinstance(item, slice):
> raise TypeError("Field' object is not subscriptable")
Thank you in advance for your help.
I could not figure out how to control the order of the WITH clauses after calling query.with_() (except for the hack already noted). As a result, I restructured my application to get around this problem. I am now calling query.with_() before building up the rest of the query.
This also made my related question moot, because I no longer need to see what I've already added to the query.

Best design patterns for refactoring code without breaking other parts

I have some PHP code from an application that was written using Laravel. One of the modules was written quite poorly. The controller of this module has a whole bunch of reporting functions which uses the functions defined inside the model object to fetch data. And the functions inside the model object are super messy.
Following is a list of some of the functions from the controller (ReportController.php) and the model (Report.php)(I'm only giving names of functions and no implementations as my question is design related)
Functions from ReportController.php
questionAnswersReportPdf()
fetchStudentAnswerReportDetail()
studentAnswersReport()
wholeClassScoreReportGradebookCSV()
wholeClassScoreReportCSV()
wholeClassScoreReportGradebook()
formativeWholeClassScoreReportGradebookPdf()
wholeClassScoreReport()
fetchWholeClassScoreReport()
fetchCurriculumAnalysisReportData()
curriculumAnalysisReportCSV()
curriculumAnalysisReportPdf()
studentAnswersReportCSV()
fetchStudentScoreReportStudents()
Functions from Report.php
getWholeClassScoreReportData
getReportsByFilters
reportMeta
fetchCurriculumAnalysisReportData
fetchCurriculumAnalysisReportGraphData
fetchCurriculumAnalysisReportUsersData
fetchTestHistoryClassAveragesData
fetchAllTestHistoryClassAveragesData
fetchAllTestHistoryClassAveragesDataCsv
fetchHistoryClassAveragesDataCsv
fetchHistoryClassAveragesGraphData
The business logic has been written in quite a messy way also. Some parts of it are in the controller while other parts are in the model object.
I have 2 specific questions :
a) I have an ongoing goal of reducing code complexity and optimizing code structure. How can I leverage common OOP design patterns to ensure altering the code in any given report does not negatively affect the other reports? I specifically want to clean up the code for some critical reports first but want to ensure that by doing this none of the other reports will break.
b) The reporting module is relatively static in definition and unlikely to change over time. The majority of reports generated by the application involve nested sub-queries as well as standard grouping & filtering options. Most of these SQL queries have been hosed within the functions of the model object and contain some really complex joins. Without spending time evaluating the database structure or table indices, which solution architecture techniques would you recommend for scaling the report functionality to ensure optimized performance? Below is a snippet of one of the SQL queries
$sql = 'SELECT "Parent"."Id",
"Parent"."ParentId",
"Parent"."Name" as systemStandardName,
string_agg(DISTINCT((("SubsectionQuestions"."QuestionSerial"))::text) , \', \') AS "quesions",
count(DISTINCT("SubsectionQuestions"."QuestionId")) AS "totalQuestions",
case when sum("SQUA"."attemptedUsers")::float > 0 then
(COALESCE(round((
(
sum(("SQUA"."totalCorrectAnswers"))::float
/
sum("SQUA"."attemptedUsers")::float
)
*100
)::numeric),0))
else 0 end as classacuracy,
case when sum("SQUA"."attemptedUsers")::float > 0 then
(COALESCE((round(((1 -
(
(
sum(("SQUA"."totalCorrectAnswers"))::float
/
sum("SQUA"."attemptedUsers")::float
)
)
)::float * count(DISTINCT("SubsectionQuestions"."QuestionId")))::numeric,1)),0))
else 0 end as pgain
FROM "'.$gainCategoryTable.'" as "Parent"
'.$resourceTableJoin.'
INNER JOIN "SubsectionQuestions"
ON "SubsectionQuestions"."QuestionId" = "resourceTable"."ResourceId"
INNER JOIN "Subsections"
ON "Subsections"."Id" = "SubsectionQuestions"."SubsectionId"
LEFT Join (
Select "SubsectionQuestionId",
count(distinct case when "IsCorrect" = \'Yes\' then CONCAT ("UserId", \' \', "SubsectionQuestionId") else null end) AS "totalCorrectAnswers"
, count(distinct CONCAT ("UserId", \' \', "SubsectionQuestionId")) AS "attemptedUsers"
From "SubsectionQuestionUserAnswers"';
if(!empty($selectedUserIdsArr)){
$sql .= ' where "UserId" IN (' .implode (",", $selectedUserIdsArr).')' ;
}else {
$sql .= ' where "UserId" IN (' .implode (",", $assignmentUsers).')' ;
}
$sql .= ' AND "AssessmentAssignmentId" = '.$assignmentId.' AND "SubsectionQuestionId" IN ('.implode(",", $subsectionQuestions).') Group by "SubsectionQuestionId"
) as "SQUA" on "SQUA"."SubsectionQuestionId" = "SubsectionQuestions"."Id"
INNER JOIN "AssessmentAssignment"
ON "AssessmentAssignment"."assessmentId" = "Subsections"."AssessmentId"
INNER JOIN "AssessmentAssignmentUsers"
ON "AssessmentAssignmentUsers"."AssignmentId" = "AssessmentAssignment"."Id"
AND "AssessmentAssignmentUsers"."Type" = \'User\'
'.$conditaionlJoin.'
WHERE "Parent"."Id" IN ('.implode(',', $ssLeaf).')
'.$conditionalWhere.'
GROUP BY "Parent"."Id",
"Parent"."ParentId",
"Parent"."Name"
'.$sorter;
$results = DB::select(DB::raw($sql));
My take on a). In my experience when I want to reduce code complexity/sheer messiness I slowly refactor out code that violates the single responsibility principle while I'm working in that area already for either a bug fix or a feature update. I try not to spend hours upon hours of updating code that is "working" that I'm not actively updating for a business process reason. Follow the "Leave it better than you found it" approach as you work in this code base, and it will get better over time. Doing this will allow you to improve the code base while also getting features and bug fixes out the door, while also keeping business owners/project managers happy because you're keeping things moving.
about a) : The first thing I do, to ensure none of my refactoring is breaking anything, is covering the code with, at least, unitary tests (doing TDD ensures the most optimal coverage). It's easier when, like #DavidY says, you respect principles like SRP (does my class try to answer too many problems ?). With test, you'll feel safer when you'll need to refactor, and the tests will tell you exactly where it broke.
about b) : Do not optimize until you need it. And optimize only when you know what cost you the most. It's the best way to know what pattern you need, otherwise you may try to force the wrong solution into the wrong problem.

How to use LINQ to Entities to make a left join using a static value

I've got a few tables, Deployment, Deployment_Report and Workflow. In the event that the deployment is being reviewed they join together so you can see all details in the report. If a revision is going out, the new workflow doesn't exist yet new workflow is going into place so I'd like the values to return null as the revision doesn't exist yet.
Complications aside, this is a sample of the SQL that I'd like to have run:
DECLARE #WorkflowID int
SET #WorkflowID = 399 -- Set to -1 if new
SELECT *
FROM Deployment d
LEFT JOIN Deployment_Report r
ON d.FSJ_Deployment_ID = r.FSJ_Deployment_ID
AND r.Workflow_ID = #WorkflowID
WHERE d.FSJ_Deployment_ID = 339
The above in SQL works great and returns the full record if viewing an active workflow, or the left side of the record with empty fields for revision details which haven't been supplied in the event that a new report is being generated.
Using various samples around S.O. I've produced some Entity to SQL based on a few multiple on statements but I feel like I'm missing something fundamental to make this work:
int Workflow_ID = 399 // or -1 if new, just like the above example
from d in context.Deployments
join r in context.Deployment_Reports.DefaultIfEmpty()
on
new { d.Deployment_ID, Workflow_ID }
equals
new { r.Deployment_ID, r.Workflow_ID }
where d.FSJ_Deployment_ID == fsj_deployment_id
select new
{
...
}
Is the SQL query above possible to create using LINQ to Entities without employing Entity SQL? This is the first time I've needed to create such a join since it's very confusing to look at but in the report it's the only way to do it right since it should only return one record at all times.
The workflow ID is a value passed in to the call to retrieve the data source so in the outgoing query it would be considered a static value (for lack of better terminology on my part)
First of all don't kill yourself on learning the intricacies of EF as there are a LOT of things to learn about it. Unfortunately our deadlines don't like the learning curve!
Here's examples to learn over time:
http://msdn.microsoft.com/en-us/library/bb397895.aspx
In the mean time I've found this very nice workaround using EF for this kind of thing:
var query = "SELECT * Deployment d JOIN Deployment_Report r d.FSJ_Deployment_ID = r.Workflow_ID = #WorkflowID d.FSJ_Deployment_ID = 339"
var parm = new SqlParameter(parameterName="WorkFlowID" value = myvalue);
using (var db = new MyEntities()){
db.Database.SqlQuery<MyReturnType>(query, parm.ToArray());
}
All you have to do is create a model for what you want SQL to return and it will fill in all the values you want. The values you are after are all the fields that are returned by the "Select *"...
There's even a really cool way to get EF to help you. First find the table with the most fields, and get EF to generated the model for you. Then you can write another class that inherits from that class adding in the other fields you want. SQL is able to find all fields added regardless of class hierarchy. It makes your job simple.
Warning, make sure your filed names in the class are exactly the same (case sensitive) as those in the database. The goal is to make a super class model that contains all the fields of all the join activity. SQL just knows how to put them into that resultant class giving you strong typing ability and even more important use-ability with LINQ
You can even use dataannotations in the Super Class Model for displaying other names you prefer to the User, this is a super nice way to keep the table field names but show the user something more user friendly.

orientdb sql update edge?

I have been messing around with orientdb sql, and I was wondering if there is a way to update an edge of a vertex, together with some data on it.
assuming I have the following data:
Vertex: Person, Room
Edge: Inside (from Person to Room)
something like:
UPDATE Persons SET phone=000000, out_Inside=(
select #rid from Rooms where room_id=5) where person_id=8
obviously, the above does not work. It throws exception:
Error: java.lang.ClassCastException: com.orientechnologies.orient.core.id.ORecordId cannot be cast to com.orientechnologies.orient.core.db.record.ridbag.ORidBag
I tried to look at the sources at github searching for a syntax for bag with 1 item,
but couldn't find any (found %, but that seems to be for serialization no for SQL).
(1) Is there any way to do that then? how do I update a connection? Is there even a way, or am I forced to create a new edge, and delete the old one?
(2) When writing this, it came to my mind that perhaps edges are not the way to go in this case. Perhaps I should use a LINK instead. I have to say i'm not sure when to use which, or what are the implications involved in using any of them. I did found this though:
https://groups.google.com/forum/#!topic/orient-database/xXlNNXHI1UE
comment 3 from the top, of Lvc#, where he says:
"The suggested way is to always create an edge for relationships"
Also, even if I should use a link, please respond to (1). I would be happy to know the answer anyway.
p.s.
In my scenario, a person can only be at one room. This will most likely not change in the future. Obviously, the edge has the advantage that in case I might want to change it (however improbable that may be), it will be very easy.
Solution (partial)
(1) The solution was simply to remove the field selection. Thanks for Lvca for pointing it out!
(2) --Still not sure--
CREATE EDGE and DELETE EDGE commands have this goal: avoid the user to fight with underlying structure.
However if you want to do it (a little "dirty"), try this one:
UPDATE Persons SET phone=000000, out_Inside=(
select from Rooms where room_id=5) where person_id=8
update EDGE Custom_Family_Of_Custom
set survey_status = '%s',
apply_source = '%s'
where #rid in (
select level1_e.#rid from (
MATCH {class: Custom, as: custom, where: (custom_uuid = '%s')}.bothE('Custom_Family_Of_Custom') {as: level1_e} .bothV('Custom') {as: level1_v, where: (custom_uuid = '%s')} return level1_e
)
)
it works well

Endeca UrlENEQuery java API search

I'm currently trying to create an Endeca query using the Java API for a URLENEQuery. The current query is:
collection()/record[CONTACT_ID = "xxxxx" and SALES_OFFICE = "yyyy"]
I need it to be:
collection()/record[(CONTACT_ID = "xxxxx" or CONTACT_ID = "zzzzz") and
SALES_OFFICE = "yyyy"]
Currently this is being done with an ERecSearchList with CONTACT_ID and the string I'm trying to match in an ERecSearch object, but I'm having difficulty figuring out how to get the UrlENEQuery to generate the or in the correct fashion as I have above. Does anyone know how I can do this?
One of us is confused on multiple levels:
Let me try to explain why I am confused:
If Contact_ID and Sales_Office are different dimensions, where Contact_ID is a multi-or dimension, then you don't need to use EQL (the xpath like language) to do anything. Just select the appropriate dimension values and your navigation state will reflect the query you are trying to build with XPATH. IE CONTACT_IDs "ORed together" with SALES_OFFICE "ANDed".
If you do have to use EQL, then the only way to modify it (provided that you have to modify it from the returned results) is via string manipulation.
ERecSearchList gives you ability to use "Search Within" functionality which functions completely different from the EQL filtering, though you can achieve similar results by using tricks like searching only specified field (which would be separate from the generic search interface") I am still not sure what's the connection between ERecSearchList and the EQL expression above?
Having expressed my confusion, I think what you need to do is to use String manipulation to dynamically build the EQL expression and add it to the Query.
A code example of what you are doing would be extremely helpful as well.