How to design generic filtering operators in the query string of an API?

I'm building a generic API with content and a schema that can be user-defined. I want to add filtering logic to API responses, so that users can query for specific objects they've stored in the API. For example, if a user is storing event objects, they could do things like filter on:
Array contains: Whether properties.categories contains Engineering
Greater than: Whether properties.created_at is older than 2016-10-02
Not equal: Whether properties.address.city is not Washington
Equal: Whether properties.name is Meetup
etc.
I'm trying to design filtering into the query string of API responses, and coming up with a few options, but I'm not sure which syntax for it is best...
1. Operator as Nested Key
/events?properties.name=Harry&properties.address.city.neq=Washington
This example uses just a nested object key to specify the operator (like the neq shown). This is nice in that it is very simple and easy to read.
But in cases where the properties of an event can be defined by the user, it runs into an issue where there is a potential clash between a property named address.city.neq using a normal equal operator, and a property named address.city using a not equal operator.
Example: Stripe's API
2. Operator as Key Suffix
/events?properties.name=Harry&properties.address.city+neq=Washington
This example is similar to the first one, except it uses a + delimiter (which is equivalent to a space) for the operator instead of ., so that there is no confusion, since keys in my domain can't contain spaces.
One downside is that it is slightly harder to read, although that's arguable since it might be construed as more clear. Another might be that it is slightly harder to parse, but not that much.
3. Operator as Value Prefix
/events?properties.name=Harry&properties.address.city=neq:Washington
This example is very similar to the previous one, except that it moves the operator syntax into the value of the parameter instead of the key. This has the benefit of eliminating a bit of the complexity in parsing the query string.
But this comes at the cost of no longer being able to differentiate between an equal operator checking for the literal string neq:Washington and a not equal operator checking for the string Washington.
Example: Sparkpay's API
4. Custom Filter Parameter
/events?filter=properties.name==Harry;properties.address.city!=Washington
This example uses a single top-level query parameter, filter, to namespace all of the filtering logic under. This is nice in that you never have to worry about the top-level namespace colliding. (Although in my case everything custom is nested under properties, so this isn't an issue in the first place.)
But this comes at a cost of having a harder query string to type out when you want to do basic equality filtering, which will probably result in having to check the documentation most of the time. And relying on symbols for the operators might lead to confusion for non-obvious operations like "near" or "within" or "contains".
Example: Google Analytics's API
5. Custom Verbose Filter Parameter
/events?filter=properties.name eq Harry; properties.address.city neq Washington
This example uses a similar top-level filter parameter as the previous one, but it spells out the operators with words instead of defining them with symbols, and has spaces between them. This might be slightly more readable.
But this comes at the cost of a longer URL, and a lot of spaces that will need to be encoded.
Example: OData's API
6. Object Filter Parameter
/events?filter[1][key]=properties.name&filter[1][eq]=Harry&filter[2][key]=properties.address.city&filter[2][neq]=Washington
This example also uses a top-level filter parameter, but instead of creating a completely custom syntax that mimics programming, it builds up an object definition of filters using a more standard query string syntax. This has the benefit of being slightly more "standard".
But it comes at the cost of being very verbose to type and hard to parse.
Example: Magento's API
Given all of those examples, or a different approach, which syntax is best? Ideally it would be easy to construct the query parameter, so that playing around in the URL bar is doable, but also not pose problems for future interoperability.
I'm leaning towards #2 since it seems like it is legible, but also doesn't have some of the downsides of other schemes.

I might not answer the "which one is best" question, but I can at least give you some insights and other examples to consider.
First, you are talking about a "generic API with content and a schema that can be user-defined".
That sounds a lot like Solr / Elasticsearch, which are both high-level wrappers over Apache Lucene, which basically indexes and aggregates documents.
Those two took totally different approaches to their REST APIs; I happened to work with both of them.
Elasticsearch:
They made an entire JSON-based Query DSL, which currently looks like this:
GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Search" }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [
        { "term": { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}
Taken from their current docs. I was surprised that you can actually put data in a GET request...
It actually looks better now; in earlier versions it was much more hierarchical.
From my personal experience, this DSL was powerful, but rather hard to learn and use fluently (especially the older versions). And to actually get some results you need more than just playing with the URL, starting with the fact that many clients don't even support data in a GET request.
Solr:
They put everything into query params, which basically looks like this (taken from the doc):
q=*:*&fq={!cache=false cost=5}inStock:true&fq={!frange l=1 u=4 cache=false cost=50}sqrt(popularity)
Working with that was more straightforward. But that's just my personal taste.
Now about my experience: we were implementing another layer above those two, and we took approach #4. Actually, I think #4 and #5 should be supported at the same time. Why? Because whatever you pick, people will complain, and since you will have your own "micro-DSL" anyway, you might as well support a few more aliases for your keywords.
Why not #2? Having a single filter param with the query inside gives you total control over the DSL. Half a year after we made our resource, we got a "simple" feature request: logical OR and parentheses (). Query parameters are basically a list of AND operations, and a logical OR like city=London OR age>25 doesn't really fit there. On the other hand, parentheses introduce nesting into the DSL structure, which would also be a problem in a flat query string.
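For illustration only (a made-up syntax, not a concrete proposal), a request like
/events?filter=(properties.address.city eq London or properties.age gt 25) and properties.name eq Harry
maps naturally onto a single filter parameter, but has no clean equivalent as a flat list of key=value pairs.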
Well, those were the problems we stumbled upon; your case might be different. But it is still worth considering what the future expectations for this API will be.

Matomo Analytics takes another approach to its segment filter, and its syntax seems more readable and intuitive, e.g.:
developer.matomo.org/api-reference/reporting-api-segmentation
Operator | Behavior | Example
== | Equals | &segment=countryCode==IN Return results where the country is India
!= | Not equals | &segment=actions!=1 Return results where the number of actions (page views, downloads, etc.) is not 1
<= | Less than or equal to | &segment=actions<=4 Return results where the number of actions (page views, downloads, etc.) is 4 or less
< | Less than | &segment=visitServerHour<12 Return results where the Server time (hour) is before midday
=# | Contains | &segment=referrerName=#piwik Return results where the Referer name (website domain or search engine name) contains the word "piwik"
!# | Does not contain | &segment=referrerKeyword!#yourBrand Return results where the keyword used to access the website does not contain the word "yourBrand"
=^ | Starts with | &segment=referrerKeyword=^yourBrand Return results where the keyword used to access the website starts with "yourBrand" (requires at least Matomo 2.15.1)
=$ | Ends with | &segment=referrerKeyword=$yourBrand Return results where the keyword used to access the website ends with "yourBrand" (requires at least Matomo 2.15.1)
and you can have a close look at how they parse the segment filter here: https://github.com/matomo-org/matomo/blob/4.x-dev/core/Segment/SegmentExpression.php
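If you just want the general idea without reading the PHP, here is a minimal Java sketch of my own (not Matomo's actual code; it assumes ';' separates AND conditions and ignores URL decoding, OR conditions and escaping) that splits a segment string into conditions using the operators from the table above:
import java.util.ArrayList;
import java.util.List;

public class SegmentExpressionSketch {

    /** One parsed condition: field, operator, value. */
    public record Condition(String field, String operator, String value) {}

    // Longest operators first so that e.g. "<=" is not matched as "<".
    private static final String[] OPERATORS = { "==", "!=", "<=", "=^", "=$", "=#", "!#", "<" };

    /** Parses something like "countryCode==IN;actions<=4", treating ';' as AND. */
    public static List<Condition> parse(String segment) {
        List<Condition> conditions = new ArrayList<>();
        for (String part : segment.split(";")) {
            conditions.add(parseCondition(part));
        }
        return conditions;
    }

    private static Condition parseCondition(String part) {
        for (String op : OPERATORS) {
            int idx = part.indexOf(op);
            if (idx > 0) {
                return new Condition(part.substring(0, idx), op, part.substring(idx + op.length()));
            }
        }
        throw new IllegalArgumentException("No operator found in: " + part);
    }
}
For example, parse("countryCode==IN;actions<=4") would yield the conditions (countryCode, ==, IN) and (actions, <=, 4).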

#4
I like how the Google Analytics filter API looks: it's easy to use and easy to understand from a client's point of view.
They use a URL-encoded form, for example:
Equals: %3D%3D filters=ga:timeOnPage%3D%3D10
Not equals: !%3D filters=ga:timeOnPage!%3D10
Although you need to check the documentation, it still has its advantages. If you think the users can get accustomed to this, then go for it.
#2
Using operators as key suffixes also seems like a good idea (according to your requirements).
However, I would recommend encoding the + sign so that it isn't parsed as a space. Also, it might be slightly harder to parse, as mentioned, but I think you can write a custom parser for this one. I stumbled across this gist by jlong some time back. Perhaps you'll find it useful for writing your parser.
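If it helps, here is a rough sketch of such a parser in Java (names are my own; it assumes the parameter map is already URL-decoded, e.g. the one returned by HttpServletRequest#getParameterMap()):
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SuffixOperatorParser {

    /** One parsed filter: field path, operator name, raw value. */
    public record Filter(String field, String operator, String value) {}

    /**
     * Turns an already-decoded parameter map into filters. A key like
     * "properties.address.city neq" splits into field + operator;
     * a key without a separator defaults to the "eq" operator.
     */
    public static List<Filter> parse(Map<String, String[]> params) {
        List<Filter> filters = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            String key = entry.getKey();
            // An unencoded '+' decodes to a space; an encoded one (%2B) stays a literal '+'.
            int sep = Math.max(key.lastIndexOf(' '), key.lastIndexOf('+'));
            String field = sep < 0 ? key : key.substring(0, sep);
            String operator = sep < 0 ? "eq" : key.substring(sep + 1);
            for (String value : entry.getValue()) {
                filters.add(new Filter(field, operator, value));
            }
        }
        return filters;
    }
}
With that, properties.name=Harry parses as (properties.name, eq, Harry) and properties.address.city+neq=Washington as (properties.address.city, neq, Washington); mapping the operator names onto actual predicates is left to the caller.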

You could also try Spring Expression Language (SpEL)
All you need to do is stick to the format described in the docs; the SpEL engine takes care of parsing the query and executing it on a given object. Similar to your requirement of filtering a list of objects, you could write the query as:
properties.address.city == 'Washington' and properties.name == 'Harry'
It supports all kinds of relational and logical operators that you would need. The REST API could just take this query as the filter string and pass it to the SpEL engine to run on an object.
Benefits: it's readable, easy to write, and execution is well taken care of.
So, the URL would look like:
/events?filter="properties.address.city == 'Washington' and properties.name == 'Harry'"
Sample code using org.springframework:spring-core:4.3.4.RELEASE:
The main function of interest:
/**
 * Filter the list of objects based on the given query.
 *
 * @param query
 * @param objects
 * @return
 */
private static <T> List<T> filter(String query, List<T> objects) {
    ExpressionParser parser = new SpelExpressionParser();
    Expression exp = parser.parseExpression(query);
    return objects.stream().filter(obj -> {
        return exp.getValue(obj, Boolean.class);
    }).collect(Collectors.toList());
}
Complete example with helper classes and other non-interesting code:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.expression.Expression;
import org.springframework.expression.ExpressionParser;
import org.springframework.expression.spel.standard.SpelExpressionParser;

public class SpELTest {

    public static void main(String[] args) {
        String query = "address.city == 'Washington' and name == 'Harry'";
        Event event1 = new Event(new Address("Washington"), "Harry");
        Event event2 = new Event(new Address("XYZ"), "Harry");
        List<Event> events = Arrays.asList(event1, event2);
        List<Event> filteredEvents = filter(query, events);
        System.out.println(filteredEvents.size()); // 1
    }

    /**
     * Filter the list of objects based on the query.
     *
     * @param query
     * @param objects
     * @return
     */
    private static <T> List<T> filter(String query, List<T> objects) {
        ExpressionParser parser = new SpelExpressionParser();
        Expression exp = parser.parseExpression(query);
        return objects.stream().filter(obj -> {
            return exp.getValue(obj, Boolean.class);
        }).collect(Collectors.toList());
    }

    public static class Event {
        private Address address;
        private String name;

        public Event(Address address, String name) {
            this.address = address;
            this.name = name;
        }

        public Address getAddress() {
            return address;
        }

        public void setAddress(Address address) {
            this.address = address;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }

    public static class Address {
        private String city;

        public Address(String city) {
            this.city = city;
        }

        public String getCity() {
            return city;
        }

        public void setCity(String city) {
            this.city = city;
        }
    }
}

I decided to compare the approaches #1/#2 (1) and #3 (2), and concluded that (1) is preferred (at least for a Java server side).
Assume some parameter a must equal 10 or 20. Our URL query in this case must look like ?a.eq=10&a.eq=20 for (1) and ?a=eq:10&a=eq:20 for (2). In Java, HttpServletRequest#getParameterMap() will return the following values: { a.eq: [10, 20] } for (1) and { a: [eq:10, eq:20] } for (2). Later we must convert the returned maps to, for example, an SQL where clause, and we should get where a = 10 or a = 20 for both (1) and (2). Briefly, it looks something like this:
1) ?a.eq=10&a.eq=20 -> { a.eq: [10, 20] } -> where a = 10 or a = 20
2) ?a=eq:10&a=eq:20 -> { a: [eq:10, eq:20] } -> where a = 10 or a = 20
So we get the following rule: when two parameters with the same name are passed through the URL query, we must use the OR operator in SQL.
But let's assume another case: the parameter a must be greater than 10 and less than 20. Applying the rule above, we get the following conversion:
1) ?a.gt=10&a.lt=20 -> { a.gt: [10], a.lt: [20] } -> where a > 10 and a < 20
2) ?a=gt:10&a=lt:20 -> { a: [gt:10, lt:20] } -> where a > 10 or(?!) a < 20
As you can see, in (1) we have two parameters with different names: a.gt and a.lt. This means our SQL query will have an AND operator. But for (2) we still have the same name, and it must be converted to SQL with an OR operator!
This means that for (2), instead of using #getParameterMap(), we must directly parse the URL query and analyze the repeated parameter names.
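To make the rule concrete, here is a rough Java sketch (a simplification of my own: no validation, no escaping, and values are inlined instead of bound as parameters) that turns a getParameterMap()-style map for scheme (1) into a where clause, joining repeated values of one key with OR and different keys with AND:
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WhereClauseBuilder {

    /** Same key repeated -> conditions joined with OR; different keys -> joined with AND. */
    public static String build(Map<String, String[]> params) {
        List<String> andParts = new ArrayList<>();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            List<String> orParts = new ArrayList<>();
            for (String value : entry.getValue()) {
                orParts.add(condition(entry.getKey(), value));
            }
            andParts.add("(" + String.join(" or ", orParts) + ")");
        }
        return String.join(" and ", andParts);
    }

    // "a.gt" + "10" -> "a > 10". NOTE: values are inlined here only for illustration;
    // real code should use bind parameters to avoid SQL injection.
    private static String condition(String key, String value) {
        int dot = key.lastIndexOf('.');
        if (dot < 0) {
            // No operator suffix: treat a bare key as an equality check.
            return key + " = " + value;
        }
        String column = key.substring(0, dot);
        return column + " " + sqlOperator(key.substring(dot + 1)) + " " + value;
    }

    private static String sqlOperator(String suffix) {
        switch (suffix) {
            case "eq":  return "=";
            case "neq": return "<>";
            case "gt":  return ">";
            case "lt":  return "<";
            default: throw new IllegalArgumentException("Unknown operator: " + suffix);
        }
    }
}
For { a.eq: [10, 20] } this yields (a = 10 or a = 20), and for { a.gt: [10], a.lt: [20] } it yields (a > 10) and (a < 20), matching the rule above.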

I know this is old school, but how about a sort of operator overloading?
It would make the query parsing a lot harder (and not standard CGI), but would resemble the contents of an SQL WHERE clause.
/events?properties.name=Harry&properties.address.city+neq=Washington
would become
/events?properties.name=='Harry'&&properties.address.city!='Washington'||properties.name=='Jack'&&properties.address.city!=('Paris','New Orleans')
Parentheses would start a list. Keeping strings in quotes would simplify parsing.
So the above query would be for events for Harrys not in Washington, or for Jacks not in Paris or New Orleans.
It would be a ton of work to implement... and the database optimization to run those queries would be a nightmare, but if you're looking for a simple and powerful query language, just imitate SQL :)
-k

Related

Spock Extension - Extracting variable names from Data Tables

In order to extract data table values to use in a reporting extension for Spock, I am using the following code:
@Override
public void beforeIteration(IterationInfo iteration) {
    Object[] values = iteration.getDataValues();
}
This returns the references to the objects in the data table. However, I would like to get the name of the variable that references each value.
For example, in the following test:
private static User userAge15 = instantiateUserByAge(15);
private static User userAge18 = instantiateUserByAge(18);
private static User userAge19 = instantiateUserByAge(19);
private static User userAge40 = instantiateUserByAge(40);
def "Should show popup if user is 18 or under"(User user, Boolean shouldShowPopup) {
given: "the user <user>"
when: "the user do whatever"
...something here...
then: "the popup is shown is <showPopup>"
showPopup == shouldShowPopup
where:
user | shouldShowPopup
userAge15 | true
userAge18 | true
userAge19 | false
userAge40 | false
}
Is there a way to receive the string “userAge15”, “userAge18”, “userAge19”, “userAge40” instead of their values?
The motivation for this is that the object User is complex with lots of information as name, surname, etc, and its toString() method would make the where clause unreadable in the report I generate.
You can use specificationContext.currentFeature.dataVariables. It returns a list of strings containing the data variable names. This should work both in Spock 1.3 and 2.0.
Edit: Oh sorry, you do not want the data variable names ["a", "b", "expected"] but ["test1", "test1", "test2"]. Sorry, I cannot help you with that and would not if I could because that is just a horrible way to program IMO. I would rather make sure the toString() output gets shortened or trimmed in an appropriate manner, if necessary, or to (additionally or instead) print the class name and/or object ID.
Last but not least, writing tests is a design tool uncovering potential problems in your application. You might want to ask yourself why toString() results are not suited to print in a report and refactor those methods. Maybe your toString() methods use line breaks and should be simplified to print a one-line representation of the object. Maybe you want to factor out the multi-line representation into other methods and/or have a set of related methods like toString(), toShortString(), toLongString() (all seen in APIs before) or maybe something specific like toMultiLineString().
Update after OP significantly changed the question:
If the user of your extension feels that the report is not clear enough, she could add a column userType to the data table, containing values like "15 years old".
Or maybe simpler, just add an age column with values like 15, 18, 19, 40 and instantiate users directly via instantiateUserByAge(age) in the user column or in the test's given section instead of creating lots of static variables. The age value would be reported by your extension. In combination with an unrolled feature method name using #age this should be clear enough.
Is creating users so super expensive you have to put them into static variables? You want to avoid statics if not really necessary because they tend to bleed over side effects to other tests if those objects are mutable and their internal state changes in one test, e.g. because someone conveniently uses userAge15 in order to test setAge(int). Try to avoid premature optimisations via static variables which often just save microseconds. Even if you do decide to pre-create a set of users and re-use them in all tests, you could put them into a map with the age being the key and conveniently retrieve them from your feature methods, again just using the age in the data table as an input value for querying the map, either directly or via a helper method.
Bottom line: I think you do not have to change your extension in order to cater to users writing bad tests. Those users ought to learn how to write better tests. As a side effect, the reports will also look more comprehensive. 😀

Common return type for all ANTLR visitor methods

I'm writing a parser for an old proprietary report specification with ANTLR, and I'm currently trying to implement a visitor of the generated parse tree by extending the autogenerated abstract visitor class.
I have little experience both with ANTLR (which I learned only recently) and with the visitor pattern in general, but if I understood it correctly, the visitor should encapsulate one single operation on the whole data structure (in this case the parse tree), thus sharing the same return type between each Visit*() method.
Taking an example from The Definitive ANTLR 4 Reference book by Terence Parr, to visit a parse tree generated by a grammar that parses a sequence of arithmetic expressions, it feels natural to choose the int return type, as each node of the tree is actually part of the arithmetic operation that contributes to the final result computed by the calculator.
Considering my current situation, I don't have a common type: my grammar parses the whole document, which is actually split in different sections with different responsibilities (variable declarations, print options, actual text for the rows, etc...), and I can't find a common type between the result of the visit of so much different nodes, besides object of course.
I tried to think to some possible solutions:
I first tried implementing a stateless visitor using object as the common type, but the amount of type casts needed sounds like a big red flag to me. I was considering the usage of JSON, but I think the problem remains, potentially adding some extra overhead in the serialization process.
I was also thinking about splitting the visitor into smaller visitors with a specific purpose (get all the variables, get all the rows, etc.), but with this solution each visitor would implement only a small subset of the methods of the autogenerated interface (as it is meant to support visiting the whole tree), because each visiting operation would probably focus only on a specific subtree. Is that normal?
Another possibility could be to redesign the data structure so that it could be used at every level of the tree or, better, to define a generic specification of the nodes that can be used later to build the data structure. This solution sounds good, but I think it is difficult to apply in this domain.
A final option could be to switch to a stateful visitor, which encapsulates one or more builders for the different sections that each Visit*() method could use to build the data structure step by step. This solution seems clean and doable, but I have difficulties thinking about how to scope the result of each visit operation in the parent scope when needed.
What solution is generally used to visit complex ANTLR parse trees?
ANTLR4 parse trees are often complex because of recursion.
I would define a class ParsedDocumentModel whose properties would be added to or modified as your project evolves (which is normal, no program is set in stone).
Assuming your grammar is called Parser in the file Parser.g4, here is sample C# code:
public class ParsedDocumentModel {
    public string Title { get; set; }
    // other properties ...
}

public class ParserVisitor : ParserBaseVisitor<ParsedDocumentModel>
{
    public override ParsedDocumentModel VisitNounz(NounzContext context)
    {
        var res = "unknown";
        var s = context.GetText();
        if (s == "products")
            res = "<<products>>"; // for example
        var model = new ParsedDocumentModel();
        model.Title = res; // add more info...
        return model;
    }
}

Are extensible records useless in Elm 0.19?

Extensible records were one of Elm's most amazing features, but since v0.16 adding and removing fields is no longer available. And this puts me in an awkward position.
Consider an example. I want to give a name to a random thing t, and extensible records provide me a perfect tool for this:
type alias Named t = { t | name: String }
„Okay,“ says the compiler. Now I need a constructor, i.e. a function that equips a thing with a specified name:
equip : String -> t -> Named t
equip name thing = { thing | name = name } -- Oops! Type mismatch
Compilation fails, because the { thing | name = ... } syntax assumes thing to be a record with a name field, but the type system can't assure this. In fact, with Named t I've tried to express the opposite: t should be a record type without its own name field, and the function adds this field to the record. Anyway, field addition is necessary to implement the equip function.
So it seems impossible to write equip in a polymorphic manner, but it's probably not such a big deal. After all, any time I'm going to give a name to some concrete thing I can do this by hand. Much worse, the inverse function extract : Named t -> t (which erases the name of a named thing) requires a field removal mechanism, and thus is not implementable either:
extract : Named t -> t
extract thing = thing -- Error: No implicit upcast
It would be an extremely important function, because I have tons of routines that accept old-fashioned unnamed things, and I need a way to use them for named things. Of course, massive refactoring of those functions is not an acceptable solution.
At last, after this long introduction, let me state my questions:
Does modern Elm provide some substitute for the old deprecated field addition/removal syntax?
If not, is there some built-in function like equip and extract above? For every custom extensible record type I would like to have a polymorphic analyzer (a function that extracts its base part) and a polymorphic constructor (a function that combines the base part with the addition and produces the record).
Negative answers for both (1) and (2) would force me to implement Named t in a more traditional way:
type Named t = Named String t
In this case, I can't see the purpose of extensible records. Is there a positive use case, a scenario in which extensible records play a critical role?
Type { t | name : String } means a record that has a name field. It does not extend the t type but, rather, extends the compiler’s knowledge about t itself.
So in fact the type of equip is String -> { t | name : String } -> { t | name : String }.
What is more, as you noticed, Elm no longer supports adding fields to records so even if the type system allowed what you want, you still could not do it. { thing | name = name } syntax only supports updating the records of type { t | name : String }.
Similarly, there is no support for deleting fields from record.
If you really need to have types from which you can add or remove fields you can use Dict. The other options are either writing the transformers manually, or creating and using a code generator (this was recommended solution for JSON decoding boilerplate for a while).
And regarding the extensible records, Elm does not really support the “extensible” part much any more – the only remaining part is the { t | name : u } -> u projection so perhaps it should be called just scoped records. Elm docs itself acknowledge the extensibility is not very useful at the moment.
You could just wrap the t type with a name, but it wouldn't make a big difference compared to the approach with a custom type:
type alias Named t = { val: t, name: String }
equip : String -> t -> Named t
equip name thing = { val = thing, name = name }
extract : Named t -> t
extract thing = thing.val
Is there a positive use case, a scenario in which extensible records play critical role?
Yes, they are useful when your application Model grows too large and you face the question of how to scale out your application. Extensible records let you slice up the model in arbitrary ways, without committing to particular slices long term. If you sliced it up by splitting it into several smaller nested records, you would be committed to that particular arrangement - which might tend to lead to nested TEA and the 'out message' pattern; usually a bad design choice.
Instead, use extensible records to describe slices of the model, and group functions that operate over particular slices into their own modules. If you later need to work across different areas of the model, you can create a new extensible record for that.
It's described by Richard Feldman in his Scaling Elm Apps talk:
https://www.youtube.com/watch?v=DoA4Txr4GUs&ab_channel=ElmEurope
I agree that extensible records can seem a bit useless in Elm, but it is a very good thing they are there to solve the scaling issue in the best way.

Skipping terms of MUST_NOT clauses during term extraction

I am using Lucene 3.6.1. I have a BooleanQuery, some clauses of which are marked as Occur.MUST_NOT. When I extract terms from this query, it happily extracts the terms that must not occur as well. This is so because of the following code in BooleanQuery.java:
@Override
public void extractTerms(Set<Term> terms) {
    for (BooleanClause clause : clauses) {
        clause.getQuery().extractTerms(terms);
    }
}
I am using these terms to present the user with a set of terms that can be added or removed from the query. If the user has explicitly specified that some term or phrase is not desired (e.g., by adding -"foo bar" to a query), I don't want to show these terms to him. What might make more sense is code like this:
@Override
public void extractTerms(Set<Term> terms) {
    for (BooleanClause clause : clauses) {
        if (!clause.isProhibited())
            clause.getQuery().extractTerms(terms);
    }
}
What is the design rationale for the existing implementation? When does it make sense? What's the best way to get around this problem, assuming I don't want negated terms, but don't know where in the query tree they occur?
Gene: maybe you can open a LUCENE Jira ticket for this?
I actually think extractTerms should do as you suggest. For example, if I make a simple highlighter that uses this method (which I've done before), I don't want the negative portions either. I'm guessing this is the expected behavior for most uses of this method.
At the very least it's currently inconsistent; e.g., SpanNotQuery is in the same boat and excludes its "negative" portions from extractTerms.

Implement LINQ to SQL expressions for a database with custom date/time format

I'm working with an MS-SQL database with tables that use a customized date/time format stored as an integer. The format maintains time order, but is not one-to-one with ticks. Simple conversions are possible from the custom format to hours / days / months / etc. - for example, I could derive the month with the SQL statement:
SELECT ((CustomDateInt / 60 / 60 / 24) % 13) AS Month FROM HistoryData
From these tables, I need to generate reports, and I'd like to do this using LINQ-to-SQL. I'd like to have the ability to choose from a variety of grouping methods based on these dates (by month / by year / etc.).
I'd prefer to use the group command in LINQ that targets one of these grouping methods. For performance, I would like the grouping to be performed in the database, rather than pulling all my data into POCO objects first and then custom-grouping them afterwards. For example:
var results = from row in myHistoryDataContext.HistoryData
              group row by CustomDate.GetMonth(row.CustomDateInt) into grouping
              select new int?[] { grouping.Key, grouping.Count() };
How do I implement my grouping functions (like CustomDate.GetMonth) so that they will be transformed into SQL commands automatically and performed in the database? Do I need to provide them as Func<int, int> objects or Expression<> objects, or by some other means?
You can't write a method and expect L2S to automatically know how to take your method and translate it to SQL. L2S knows about some of the more common methods provided as part of the .NET framework for primitive types. Anything beyond that and it will not know how to perform the translation.
If you have to keep your db model as is:
You can define methods for interacting with the custom format and use them in queries. However, you'll have to help L2S with the translation. To do this, you would look for calls to your methods in the expression tree generated for your query and replace them with an implementation L2S can translate. One way to do this is to provide a proxy IQueryProvider implementation that inspects the expression tree for a given query and performs the replacement before passing it off to the L2S IQueryProvider for translation and execution. The expression tree L2S will see can be translated to SQL because it only contains the simple arithmetic operations used in the definitions of your methods.
If you have the option to change your db model:
You might be better off using a standard DateTime column type for your data. Then you could model the column as System.DateTime and use its methods (which L2S understands). You could achieve this by modifying the table itself, or by providing a view that performs the conversion and having L2S interact with the view.
Update:
Since you need to keep your current model, you'll want to translate your methods for L2S. Our objective is to replace calls to some specific methods in a L2S query with a lambda L2S can translate. All other calls to these methods will of course execute normally. Here's an example of one way you could do that...
static class DateUtils
{
    public static readonly Expression<Func<int, int>> GetMonthExpression = t => (t / 60 / 60 / 24) % 13;

    static readonly Func<int, int> GetMonthFunction;

    static DateUtils()
    {
        GetMonthFunction = GetMonthExpression.Compile();
    }

    public static int GetMonth(int t)
    {
        return GetMonthFunction(t);
    }
}
Here we have a class that defines a lambda expression for getting the month from an integer time. To avoid defining the math twice, you could compile the expression and then invoke it from your GetMonth method as shown here. Alternatively, you could take the body of the lambda and copy it into the body of the GetMonth method. That would skip the runtime compilation of the expression and likely execute faster -- up to you which you prefer.
Notice that the signature of the GetMonthExpression lambda matches the GetMonth method exactly. Next we'll inspect the query expression using System.Linq.Expressions.ExpressionVisitor, find calls to GetMonth, and replace them with our lambda, having substituted t with the value of the first argument to GetMonth.
class DateUtilMethodCallExpander : ExpressionVisitor
{
    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        LambdaExpression Substitution = null;

        // Check if the method call is one we should replace.
        if (node.Method.DeclaringType == typeof(DateUtils))
        {
            switch (node.Method.Name)
            {
                case "GetMonth":
                    Substitution = DateUtils.GetMonthExpression;
                    break;
            }
        }

        if (Substitution != null)
        {
            // We'd like to replace the method call; we'll need to wire up the method call
            // arguments to the parameters of the lambda.
            var Replacement = new LambdaParameterSubstitution(Substitution.Parameters, node.Arguments)
                .Visit(Substitution.Body);
            return Replacement;
        }

        return base.VisitMethodCall(node);
    }
}

class LambdaParameterSubstitution : ExpressionVisitor
{
    IList<ParameterExpression> Parameters;
    IList<Expression> Replacements;

    public LambdaParameterSubstitution(IList<ParameterExpression> parameters, IList<Expression> replacements)
    {
        Parameters = parameters;
        Replacements = replacements;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        // See if the parameter is one we should replace.
        int p = Parameters.IndexOf(node);
        if (p >= 0)
        {
            return Replacements[p];
        }
        return base.VisitParameter(node);
    }
}
The first class here will visit the query expression tree and find references to GetMonth (or any other method requiring substitution) and replace the method call. The replacement is provided in part by the second class, which inspects a given lambda expression and replaces references to its parameters.
Having transformed the query expression, L2S will never see calls to your methods, and it can now execute the query as expected.
In order to intercept the query before it hits L2S in a convenient way, you can create your own IQueryable provider that is used as a proxy in front of L2S. You would perform the above replacements in your implementation of Execute and then pass the new query expression to the L2S provider.
I think you can register your custom function in the DataContext and use it in the LINQ query. It's explained very well in this post: http://msdn.microsoft.com/en-us/library/bb399416.aspx
Hope it helps.
Found a reference to some existing code which implements an IQueryable provider as Michael suggests.
http://tomasp.net/blog/linq-expand.aspx
I think assuming that code works, the other lingering issue is that you would have to have an Expression property for each type which contained the date.
The resulting code for avoiding doing that appears to be a bit cumbersome (though it would avoid the sort of errors you're trying to avoid by putting the calculation in a method):
Group Expression:
group row by CustomDate.GetMonth(row, x => x.customdate).Compile().Invoke(row)
Method to Return Group Expression:
public class CustomDate
{
    public static Expression<Func<TEntity, int>> GetMonth<TEntity>(TEntity entity, Func<TEntity, int> func)
    {
        return x => ((func.Invoke(entity) / 60 / 60 / 24) % 13);
    }
}
I'm not entirely sure whether that nested .Invoke would cause problems with the Expandable expression or whether the concept would have to be tweaked a bit more, but that code seems to supply an alternative to building a custom IQueryProvider for simple mathematical expressions.
There doesn't appear to be any way to instruct LINQ-to-SQL to call your SQL UDF. However, I believe you can encapsulate a reusable C# implementation in System.Linq.Expressions.Expression trees...
public class CustomDate {
    public static readonly Expression<Func<int, int>> GetMonth =
        customDateInt => (customDateInt / 60 / 60 / 24) % 13;
}

var results = from row in myHistoryDataContext.HistoryData
              group row by CustomDate.GetMonth(row.CustomDateInt) into grouping
              select new int?[] { grouping.Key, grouping.Count() };