I'm using a SPARQL query tool called Twinkle that doesn't seem to support functions like AVG() and SUM(). So far, only the COUNT() ARQ function works. Is there an alternative way to sum numbers so that I can at least compute an average by dividing by COUNT()?
I don't think so. Switch to a tool that supports SPARQL 1.1, which includes AVG() and SUM() as built-in aggregates.
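For reference, SPARQL 1.1 supports aggregates directly; a minimal sketch (the prefix and predicate here are hypothetical):

    PREFIX ex: <http://example.org/>
    SELECT (SUM(?v) AS ?total) (AVG(?v) AS ?mean)
    WHERE { ?item ex:score ?v }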
For example, I want to write window functions like SUM() OVER (window).
Since the OVER clause is not supported by Druid, how do I achieve the same result using the Druid native query API or the SQL API?
You should use a GroupBy query. As Druid is a time-series database, you have to specify the interval (your window) that you want to query data from. You can apply aggregation methods over this data, for example a SUM() aggregation.
If you want, you can also do extra filtering within an aggregation, like "only sum records where city=paris". You could also apply the SUM aggregation only to records that exist in a certain time window within your selected interval.
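As a sketch, a native GroupBy query combining a plain SUM with a filtered SUM might look like this (the datasource, interval, dimension, and field names are assumptions):

    {
      "queryType": "groupBy",
      "dataSource": "visits",
      "intervals": ["2023-01-01/2023-02-01"],
      "granularity": "all",
      "dimensions": ["country"],
      "aggregations": [
        { "type": "doubleSum", "name": "total_revenue", "fieldName": "revenue" },
        { "type": "filtered",
          "filter": { "type": "selector", "dimension": "city", "value": "paris" },
          "aggregator": { "type": "doubleSum", "name": "paris_revenue", "fieldName": "revenue" } }
      ]
    }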
If you are a PHP user, then maybe this package is handy for you: https://github.com/level23/druid-client#sum
We have tried to implement an easy way to query such data.
I'm looking for a list of pre-defined aggregation functions in Spark SQL. I have in mind something analogous to Presto Aggregate Functions.
I Ctrl+F'd around a little in the SQL API docs to no avail... it's also hard to tell at a glance which functions are for aggregation vs. not. For example, if I didn't know avg is an aggregation function I'd be hard pressed to tell it is one (in a way that's actually scalable to the full set of functions):
avg - avg(expr) - Returns the mean calculated from values of a group.
If such a list doesn't exist, can someone at least confirm that there's no pre-defined function like any/bool_or or all/bool_and to determine whether any or all values of a boolean column in a group are true (or false)?
For now, my workaround is:
select grp_col, count(if(bool_col, true, NULL)) > 0 any_agg
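Spelled out against a hypothetical table my_table, that is:

    SELECT grp_col,
           count(if(bool_col, true, NULL)) > 0 AS any_agg
    FROM my_table
    GROUP BY grp_col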
Just take a look at the aggregate functions section of the Spark docs.
The list of functions is here, under RelationalGroupedDataset - specifically the APIs that return DataFrame (not RelationalGroupedDataset):
https://spark.apache.org/docs/latest/api/scala/index.html?org/apache/spark/sql/RelationalGroupedDataset.html#org.apache.spark.sql.RelationalGroupedDataset
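For example, a minimal sketch of that API in Java (df is assumed to be an existing DataFrame; the column names are hypothetical):

    import static org.apache.spark.sql.functions.*;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // groupBy returns a RelationalGroupedDataset; agg returns a DataFrame
    Dataset<Row> result = df.groupBy("grp_col").agg(avg("x"), sum("y"), max("y"));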
I'm trying to get a list of Change Requests that match certain conditions; some of these conditions are expressed using functions like has_attr().
Is this at all possible? For instance, I need to use a function such as has_associated_task(cvtype="task"); can that be done?
For queries I'm using the following pattern:
http://ip[:port]/change/oslc/db/dbURI/role/User/cr?oslc_cm.query=change:cvtype="problem" and request_type="Change_Request" and has_associated_task(cvtype="task")&oslc_cm.properties=problem_synopsis
This does work without the function term, but I would like to extend the search criteria further. Is there any other way besides setting up a predefined query in Change? And is there a list of terms somewhere, like change:cvtype? (I've tried to look at http://www.ibm.com/xmlns/prod/rational/change/1.0/ but I got a "whoops" from the web server.)
There are some ways you could solve this:
OSLC Resource Shapes - some OSLC providers associate shapes (like schemas) that describe what you can expect from an OSLC Query Capability.
There isn't a way in the simple query syntax to test for null (or not null), assuming you want a condition such as (cvtype="task" and linkedTask != NULL). To get around this you can query based on cvtype="task" alone and locally filter the results using tools such as XPath or Jena. Alternatively, you can look for extensions to the tool you are working with to see whether it provides any additions to the query syntax that support your use case; I don't have that information off hand.
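As a sketch of the local-filtering approach with Jena (the query URL, the linked-task property URI, and the response format are assumptions, not verified against your server):

    import org.apache.jena.rdf.model.*;

    // Read the RDF returned by the OSLC query capability
    Model model = ModelFactory.createDefaultModel();
    model.read("http://ip:port/change/oslc/db/dbURI/role/User/cr?oslc_cm.query=...");

    // Keep only resources that actually carry a linked-task property
    Property linkedTask = model.createProperty("http://example.com/ns#linkedTask");
    ResIterator withTasks = model.listSubjectsWithProperty(linkedTask);
    while (withTasks.hasNext()) {
        System.out.println(withTasks.nextResource());
    }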
Thank you so much for the window functions!!!
I'm curious if some more "basic" aggregates will be supported:
Sum()
Average()
Min()
Max()
Current result of trying to use Sum():
Error: Unrecognized Analytic Function: SUM cannot be used with an OVER() clause.
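For reference, the standard SQL form I'm trying to use is something like this (column names hypothetical):

    SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date)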
These are currently on the feature roadmap, but we do not have an ETA at this time. Anyone interested in these functions should vote up this question, and we'll use that to prioritize the feature request.
I'm doing some complex querying using Lucene 4.0, and my information-retrieval-theory buddy has told me that anywhere that I can use a filter instead of a query, I should, in order to improve performance. Therefore, I decided to take one particularly hairy component of the query and transform it into a filter. This is relatively straightforward, as there are Filter equivalents of BooleanQuery and NumericRangeQuery, but there doesn't seem to be a TermFilter equivalent of TermQuery. There is a FieldValueFilter, but that seems only to filter on the presence of a given field, not a particular value in that field.
What filter should I use for this?
I believe TermsFilter is what you are looking for.
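A minimal sketch using the Lucene 4.0 TermsFilter from the lucene-queries module (the field and value are hypothetical; note that later 4.x releases replaced addTerm() with constructor arguments):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.queries.TermsFilter;
    import org.apache.lucene.search.FilteredQuery;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Query;

    // Match documents whose "status" field contains the exact term "published"
    TermsFilter filter = new TermsFilter();
    filter.addTerm(new Term("status", "published"));

    // Constrain any existing query with the filter
    Query filtered = new FilteredQuery(new MatchAllDocsQuery(), filter);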