What is the use case for the Merge function in a SQL CLR aggregate?

I am writing a CLR user-defined aggregate function to implement median. I understand all the other functions I have to implement, but I cannot understand what the Merge function is for.
I have a vague idea that if the aggregate is evaluated in parts (i.e. some rows of a group are aggregated in one instance and the remaining rows in another), then those partial results need to be combined. If that is the case, is there a way to test this?
Please let me know if any of the above is not clear or if you need any further information.

Your vague idea is correct.
From "Requirements for CLR User-Defined Aggregates":
This method can be used to merge another instance of this aggregate
class with the current instance. The query processor uses this method
to merge multiple partial computations of an aggregation.
The parameter to Merge is another instance of your aggregate, and you should merge the aggregated data in that instance into your current instance.
Have a look at the sample string concatenation aggregate. Its Merge method appends the concatenated strings from the parameter to the current instance of the aggregate class.
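You can't easily force the query processor to split a group across instances. One practical way to test Merge is to drive the aggregate class directly and check that "split, then merge" gives the same answer as a single pass. The Python sketch below is only a model of that idea. The class and its lowercase `init`/`accumulate`/`merge`/`terminate` methods are hypothetical stand-ins for the C# `Init`/`Accumulate`/`Merge`/`Terminate` methods. A real C# median aggregate that keeps a list like this would also need user-defined serialization (`Format.UserDefined` with `IBinarySerialize`), which the sketch ignores.

```python
from statistics import median


class MedianAggregate:
    """Model of a CLR user-defined aggregate computing a median."""

    def init(self):
        # Called before any rows of a (possibly partial) group arrive.
        self.values = []

    def accumulate(self, value):
        # Called once per input row; None stands in for SQL NULL and is skipped.
        if value is not None:
            self.values.append(value)

    def merge(self, other):
        # Fold another partial instance's state into this one.
        self.values.extend(other.values)

    def terminate(self):
        # Final result for the group; None stands in for SQL NULL.
        return median(self.values) if self.values else None


def aggregate_in_one_pass(rows):
    agg = MedianAggregate()
    agg.init()
    for r in rows:
        agg.accumulate(r)
    return agg.terminate()


def aggregate_in_two_parts(rows, split):
    # Simulate a partial evaluation: each instance sees part of the rows,
    # then one is merged into the other before Terminate.
    left, right = MedianAggregate(), MedianAggregate()
    left.init()
    right.init()
    for r in rows[:split]:
        left.accumulate(r)
    for r in rows[split:]:
        right.accumulate(r)
    left.merge(right)
    return left.terminate()


rows = [7, 1, None, 4, 9, 3, 8]
for split in range(len(rows) + 1):
    assert aggregate_in_two_parts(rows, split) == aggregate_in_one_pass(rows)
print(aggregate_in_one_pass(rows))  # prints 5.5
```

The same check carries over to a C# unit test. Create two instances, call Init on both, feed each a different slice of rows, call Merge, and compare the Terminate result with a single-instance run.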

Related

Qlik Sense: How to aggregate strings into a single row in script

I am trying to aggregate strings that belong to the same product code into one row. Which Qlik Sense aggregation function should I use?
I am able to aggregate integers in this example, but string aggregation fails.
Have you tried MaxString()? It is a string aggregation function.
As x3ja mentioned, you can use an aggregation function in charts that will work for strings, including:
MaxString()
Only()
Concat()
Any of these can produce the kind of single-row result you're looking for.
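The Python sketch below models roughly what a Concat()-style aggregation does per product code. It groups the strings by key, ignores NULLs and joins the rest with a delimiter. The function name and sample data are hypothetical, and output order here is simply input order. Qlik's own Concat() takes an optional sort-weight argument to control ordering, so its default order may differ.

```python
from collections import defaultdict


def concat_by_key(rows, delimiter=", "):
    # rows: iterable of (product_code, text) pairs; None plays the role of NULL.
    groups = defaultdict(list)
    for key, text in rows:
        if text is not None:
            groups[key].append(text)
    # One output "row" per product code, strings joined in input order.
    return {key: delimiter.join(texts) for key, texts in groups.items()}


rows = [("P1", "red"), ("P2", "blue"), ("P1", "large"), ("P1", None), ("P2", "small")]
print(concat_by_key(rows))  # {'P1': 'red, large', 'P2': 'blue, small'}
```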
It's worth noting, though, that this sort of problem is almost always an issue with the underlying data model. Depending on what your source data looks like, you should consider investigating your use of Join and/or Concatenate. You can see more info on how to use those functions on this Qlik Help page.
Here's a very basic example of using a Join to combine the data properly, so that all of it shows up in a single record without needing any aggregations in the table chart:

List of aggregation functions in Spark SQL

I'm looking for a list of pre-defined aggregation functions in Spark SQL. I have in mind something analogous to Presto Aggregate Functions.
I Ctrl+F'd around a little in the SQL API docs to no avail... it's also hard to tell at a glance which functions are for aggregation vs. not. For example, if I didn't know avg is an aggregation function I'd be hard pressed to tell it is one (in a way that's actually scalable to the full set of functions):
avg - avg(expr) - Returns the mean calculated from values of a group.
If such a list doesn't exist, can someone at least confirm to me that there's no pre-defined function like any/bool_or or all/bool_and to determine if any or all of a boolean column in a group are true (or false)?
For now, my workaround is:
select grp_col, count(if(bool_col, true, NULL)) > 0 as any_agg
from my_table group by grp_col  -- my_table is a placeholder name
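The counting trick generalizes to "all". A group is all-true when `count(if(NOT bool_col, true, NULL)) = 0`. Spark 3.0 and later also ship built-in `bool_or`/`bool_and` (with `any`/`some`/`every`) aggregates, so the trick is mainly useful on older versions. The Python sketch below models both counts per group, with None standing in for NULL. The function name and sample rows are hypothetical. Because count() skips NULLs, a group containing only NULLs comes out as any=False, all=True.

```python
def any_all_by_group(rows):
    # rows: iterable of (grp_col, bool_col) pairs.
    counts = {}
    for grp, flag in rows:
        c = counts.setdefault(grp, [0, 0])  # [true_count, false_count]
        if flag is True:
            c[0] += 1
        elif flag is False:
            c[1] += 1
        # flag is None (SQL NULL): counted by neither, as count() skips NULLs
    # any_agg: at least one true; all_agg: no false values
    return {grp: (t > 0, f == 0) for grp, (t, f) in counts.items()}


rows = [("a", True), ("a", False), ("b", True), ("b", None), ("c", False), ("d", None)]
print(any_all_by_group(rows))
# {'a': (True, False), 'b': (True, True), 'c': (False, False), 'd': (False, True)}
```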
Just take a look at the Aggregate functions section of the Spark docs.
The list of functions is under RelationalGroupedDataset, specifically the APIs that return a DataFrame (not a RelationalGroupedDataset):
https://spark.apache.org/docs/latest/api/scala/index.html?org/apache/spark/sql/RelationalGroupedDataset.html#org.apache.spark.sql.RelationalGroupedDataset

How can I use the new UDF functionality to create a "Dynamic SQL statement"?

Is there a way to use UDF in order to construct SQL statement based on template and input variables, and later run this query?
The documentation https://cloud.google.com/bigquery/user-defined-functions?hl=en says:
A UDF is similar to the "Map" function in a MapReduce: it takes a
single row as input and produces zero or more rows as output. The
output can potentially have a different schema than the input.
So your UDF receives just a single row.
Therefore, no: a UDF is not meant for the purpose you describe in your question.
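To make the "Map" analogy concrete, the Python sketch below models a map-style UDF. The engine calls it once per row, and it emits zero or more output rows, possibly with a different schema. `split_tags`, `apply_udf` and the sample table are hypothetical. In this model the UDF only ever sees its own row and has no handle for building or running a new query.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def split_tags(row: Row) -> Iterator[Row]:
    # Hypothetical map-style UDF: one input row in, zero or more rows out,
    # with a different schema ("tag" instead of "tags").
    for tag in str(row.get("tags") or "").split(","):
        tag = tag.strip()
        if tag:
            yield {"id": row["id"], "tag": tag}


def apply_udf(udf, table: Iterable[Row]) -> List[Row]:
    # The "engine": call the UDF once per row and concatenate its output.
    return [out for row in table for out in udf(row)]


table = [
    {"id": 1, "tags": "a, b"},
    {"id": 2, "tags": ""},
    {"id": 3, "tags": "c"},
]
print(apply_udf(split_tags, table))
# [{'id': 1, 'tag': 'a'}, {'id': 1, 'tag': 'b'}, {'id': 3, 'tag': 'c'}]
```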
You might take a look at views - maybe that will suit you better:
https://cloud.google.com/bigquery/querying-data#views

Is it possible to spy on or mock SQL Server user-defined functions?

Is it possible to mock or spy on functions in T-SQL? I couldn't find anything mentioning it. I was thinking of creating my own implementation using tSQLt's SpyProcedure as a guideline (if no implementation exists). Has anyone had any success with this?
Thanks.
In SQL Server, functions cannot have side effects. That means that in your test you can replace the inner function with one that returns a fixed result, but there is no way to record the parameters that were passed into the function.
There is one exception: If the function returns a string and the string does not have to follow a specific format, you could concatenate the passed-in parameters and then assert later on that the value coming back out contained all the correct values, but that is a very special case and not generally possible.
To fake a function, just drop or rename the original and create your own within the test. I would put this code into a helper function, as it probably will be called from more than one test.
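Here is a Python model of the faking approach, including the string-echo exception. In SQL Server the swap happens by dropping or renaming the real function inside the test. Python passes the fake in as a parameter instead, which is only an analogy. All names are hypothetical. The fake keeps the same signature and string return type, and it echoes its arguments so the caller's output reveals what was passed in.

```python
def format_name(first: str, last: str) -> str:
    # Hypothetical "real" inner function returning a string.
    return f"{last.upper()}, {first}"


def greeting(first: str, last: str, name_fn=format_name) -> str:
    # Hypothetical code under test that depends on the inner function.
    return "Dear " + name_fn(first, last)


def fake_format_name(first: str, last: str) -> str:
    # Fake with the same signature and return type; it cannot record calls
    # (no side effects), but it encodes its parameters in the returned string.
    return f"[first={first}|last={last}]"


result = greeting("Ada", "Lovelace", name_fn=fake_format_name)
# The test asserts on the output to check which parameters were passed in.
assert "first=Ada" in result and "last=Lovelace" in result
print(result)  # Dear [first=Ada|last=Lovelace]
```

As the answer notes, this only works when the function returns a free-form string; a function returning a number or date has no room to carry its arguments back out.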

How to pass an entire row (in SQL, not PL/SQL) to a stored function?

I am having the following (pretty simple) problem. I would like to write an (Oracle) SQL query, roughly like the following:
SELECT count(*), MyFunc(MyTable.*)
FROM MyTable
GROUP BY MyFunc(MyTable.*)
Within PL/SQL, one can use a RECORD type (and/or %ROWTYPE), but to my knowledge, these tools are not available within SQL. The function expects the complete row, however. What can I do to pass the entire row to the stored function?
Thanks!
I don't think you can.
Either create the function with all the arguments you need, or pass the id of the row and do a SELECT within the function.
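The first option (one argument per column the function needs) can be sketched with Python's built-in `sqlite3` module as a runnable stand-in for Oracle. In Oracle you would instead declare the stored function with the corresponding parameters. `my_func`, `my_table` and the bucketing logic are hypothetical. The point is that the same multi-argument call appears in both SELECT and GROUP BY, just as `MyFunc(MyTable.*)` did in the question.

```python
import sqlite3


def my_func(category: str, amount: float) -> str:
    # Hypothetical row classifier that receives the columns it needs as
    # separate arguments instead of "the whole row".
    size = "big" if amount >= 100 else "small"
    return f"{category}-{size}"


conn = sqlite3.connect(":memory:")
conn.create_function("my_func", 2, my_func)  # register a 2-argument SQL function
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO my_table (category, amount) VALUES (?, ?)",
    [("a", 50), ("a", 150), ("a", 20), ("b", 300)],
)
rows = conn.execute(
    "SELECT count(*), my_func(category, amount) AS bucket "
    "FROM my_table GROUP BY my_func(category, amount) ORDER BY bucket"
).fetchall()
print(rows)  # [(1, 'a-big'), (2, 'a-small'), (1, 'b-big')]
```

Listing the columns explicitly does mean the function signature has to change whenever the function needs another column. The second option (passing the id and querying inside the function) avoids that, at the cost of an extra lookup per row.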