How can I call query module procedures in Memgraph? - cypher

I have loaded the custom query modules that I've created. How can I call them within Memgraph?

Once the MAGE query modules or any custom modules you developed have been loaded into Memgraph, you can call them within queries using the following Cypher syntax:
CALL module.procedure(arg1, "string_argument", ...) YIELD res1, res2, ...;
Each procedure returns zero or more records, where each record contains named fields. The YIELD clause is used to select the fields you are interested in, or all of them (*). If you are not interested in any fields, omit the YIELD clause; the procedure will still run, but the record fields will not be stored in variables. If you try to YIELD fields that are not part of the produced record, the query will result in an error.
Procedures can be standalone, as in the example above, or part of a larger query when we want the procedure to work on data the query is producing.
For example:
MATCH (node) CALL module.procedure(node) YIELD result RETURN *;
When the CALL clause is a part of a larger query, results from the query are returned using the RETURN clause. If the CALL clause is followed by a clause that only updates the data and doesn't read it, RETURN is unnecessary. It is the Cypher convention that read-only queries need to end with a RETURN, while queries that update something don't need to RETURN anything.
Also, if the procedure itself writes to the database, the rest of the clauses in the query can only read from the database, and the CALL clause can only be followed by the YIELD and/or RETURN clauses.
If a procedure returns a record with the same field name as some variable we already have in the query, that field name can be aliased with some other name using the AS sub-clause:
MATCH (result) CALL module.procedure(42) YIELD result AS procedure_result RETURN *;

SQL update scalar function

When I run this query:
UPDATE myTable SET x = (SELECT round(random()*100));
All records have the same value for x. This makes sense.
When I run this query:
UPDATE myTable SET x = round(random()*100);
It updates all records in the table and for each record the value of x is different.
I would like to know what's happening in the background for the second query. Is it running an update query for each record id (1...n)?
I am guessing it works similarly to triggers, where for each row, before updating:
- the trigger intercepts
- calls the function and sets the value of x
- executes the query
What is actually happening?
It's rather simple, really. The function random() is defined VOLATILE.
If you put it into a subquery, you generate a derived table with a single row. Postgres can materialize the result and reuse it in the outer query many times.
Otherwise Postgres calls the function for every row. That's how VOLATILE functions have to be treated. Per documentation on function volatility:
A VOLATILE function can do anything, including modifying the database. It can return different results on successive calls with the same arguments. The optimizer makes no assumptions about the behavior of such functions. A query using a volatile function will re-evaluate the function at every row where its value is needed.
Bold emphasis mine.
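To see the two behaviors side by side, here is a minimal sketch (the demo table and its contents are hypothetical):
-- Hypothetical five-row demo table.
CREATE TABLE myTable (id int, x numeric);
INSERT INTO myTable SELECT g, 0 FROM generate_series(1, 5) AS g;

-- Subquery form: the single-row derived table is materialized once,
-- so every row receives the same value.
UPDATE myTable SET x = (SELECT round(random() * 100));

-- Direct call: random() is VOLATILE, so it is re-evaluated for every
-- row and each row gets a different value.
UPDATE myTable SET x = round(random() * 100);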

How to optimize an XQuery in a SELECT statement?

I am using an Oracle database. I have a table in which one of the columns is of XMLTYPE. Now, the problem statement is that I need to extract the count of those records that have an XML with a particular root element and one more condition. Suppose the XMLs stored are of the following formats:
<ns1:Warehouse whNo="102" xmlns:ns1="xyz">
<ns1:Building></ns1:Building>
</ns1:Warehouse>
and
<ns1:Warehouse whNo="102" xmlns:ns1="xyz">
<ns1:Building>Owned</ns1:Building>
</ns1:Warehouse>
and there are other XMLs with root elements other than Warehouse.
Now, I need to fetch those records which have:
- Root element as Warehouse
- Building element as empty
I wrote the following SQL query:
select count(XMLQuery('declare namespace ns1="xyz.com";
for $i in /*
where fn:local-name($i) eq "Warehouse"
and fn:string-length($i/ns1:Building ) = 0
return <Test>{$i/ns1:Building}</Test>'
PASSING xml_response RETURNING CONTENT)) countOfWareHouses
from test
Here, test is the name of the table and xml_response is the name of the XMLTYPE column in the table test.
This query works fine when the records are less. I have tested it for around 500 records in the table and the time it takes is around 0.1s. But as you increase the number of records in the table, the time increases. When I increased the number of records to 5000, the time it took was ~11s. And for a production table, where the number of records currently stored are 185000, this query never completes.
Please help me to optimize this query or the xquery.
Edit 1:
I tried using this:
select count(XMLQuery(
'declare namespace ns1 = "xyz";
for $i in /
return /ns1:Warehouse[not(ns1:Building/text())]'
PASSING xml_response RETURNING CONTENT))
from test
and
select count(XMLQuery(
'declare namespace ns1 = "xyz";
return /ns1:Warehouse[fn:string-length(ns1:Building)=0]'
PASSING xml_response RETURNING CONTENT))
from test
But this is not working.
When I try to run these, it asks for binding values for Building and Warehouse.
Instead of a where clause you should use predicates, which should work faster, like:
ns1:Warehouse[string-length(ns1:Building)=0]
Do not use local-name(...) if not necessary. Node tests will probably be faster and enable index use. You're also able to remove the string-length(...) call.
Search for <Warehouse/> elements which do not have text nodes below their <Building/> node. If you also want to scan for arbitrary subnodes (including attributes!), use node() instead of text(). If you just want to make sure there's text somewhere, possibly as a child of other nodes, use ns1:Building//text() instead, for example in cases like this: <ns1:Building><foo>bar</foo></ns1:Building>.
This simple XPath expression is doing what you need:
/ns1:Warehouse[not(ns1:Building/text())]
If you need to construct those <Test/> elements, use
for $warehouse in /ns1:Warehouse[not(ns1:Building/text())]
return <Test>{$warehouse/ns1:Building}</Test>
which should be a real drop-in replacement to your XQuery.
I just realized that all you want to know is the number, in which case it is better to count inside the XQuery (though I cannot tell you how to read back the single result then):
count(/ns1:Warehouse[not(ns1:Building/text())])
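As a sketch of one way to get that count back into SQL (assuming your Oracle version supports XMLExists), push the predicate into the WHERE clause and let SQL do the counting:
SELECT COUNT(*) AS countOfWareHouses
FROM test
WHERE XMLExists(
  'declare namespace ns1="xyz";
   /ns1:Warehouse[not(ns1:Building/text())]'
  PASSING xml_response);
This avoids constructing XML fragments per row entirely; each row is merely tested against the XPath predicate.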

What is the difference between a Result Set and a Return Value in a SQL procedure? What do they signify?

I know that writing:
SELECT * FROM <ANY TABLE>
in a stored procedure will output a result set... so why do we have a return value separately in a stored procedure? Where do we use it?
If any error comes, then the result set will be null, right?
First of all, you have two distinct ways to return something. You may return a result set (i.e. a table) as the result of the operation, as well as a return value indicating either some sort of error or the status of the result set.
Also, a return value is limited to a single 32-bit integer, whereas a result set can have as many rows and columns as the RDBMS allows.
My personal opinion is to use a stored procedure mainly to execute a task, and not to create a result set. But that is a matter of taste. However, using this paradigm, an action should inform the caller about the success and, in case of a failure, about the reason. Some RDBMS allow using exceptions, but if there is nothing exceptional to throw, i.e. you just want to return a status (e.g. 0, 1, 2 for 'data was new and had to be inserted', 'data existed and was updated', 'data could not be updated', etc.), the return value is the natural place for it.
There is a third way to pass information back to the caller: by using output parameters. So you have three different possibilities for passing information back to the caller.
This is one more than with a 'normal' programming language. They usually have the choice of either returning a value (e.g. int Foo()) or using an output/ref parameter (void Foo(ref int bar)). But SQL introduces a new and very powerful way of returning data (i.e. tables).
In fact, you may return more than one table, which makes this feature even more powerful.
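To make the three channels concrete, here is a minimal T-SQL sketch (the table and procedure names are hypothetical):
CREATE PROCEDURE dbo.GetOrders
    @CustomerId int,
    @OrderCount int OUTPUT          -- (3) output parameter
AS
BEGIN
    IF @CustomerId IS NULL
        RETURN 1;                   -- (2) return value as an error status

    SELECT *                        -- (1) result set: the actual data
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;

    SET @OrderCount = @@ROWCOUNT;
    RETURN 0;                       -- (2) return value as success
END
The caller reads each channel separately:
DECLARE @count int, @status int;
EXEC @status = dbo.GetOrders @CustomerId = 42, @OrderCount = @count OUTPUT;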
Because if you use return values you can have more fine-grained control over the execution status and what the errors (if any) were, and you can return different error codes for malformed or invalid parameters etc., and hence add error control/checking on the calling side too.
If you just check for an empty result set, you really don't know why the set might be empty (maybe you called the procedure with an invalid parameter).
The main difference between a result set and a return value is that the result set stores the data returned (if any) and the return code holds some kind of status information about the execution itself.
You can use return value to return additional information from a stored procedure. This can be error codes, validation results or any other custom information you may want to return. It gives you additional flexibility when coding stored procedures.
why do we have a return value separately in a stored procedure?
A stored procedure may return 0 or more resultsets. Insert, update, and delete normally don't produce a resultset, and a stored procedure may call select many times. In all cases, the resultset is data.
I suggest the best way to think of the "return value" is as status information: it indicates how the stored procedure worked out. You could return @@ROWCOUNT for an update. Sometimes it can be something simple, like the number of rows meeting some criteria, saving you the work of binding a variable to a single row to get the same answer. Or you could return 0 for success and nonzero for error; it's often easier to check the return status inline than in an error handler.
There's an analogy on the lines of the Unix cat utility that might help: it produces data on standard output, and returns an exit status to let the caller know whether or not it succeeded.

How to cope with null results in SQL Tasks that return single rows in SSIS 2005?

In a data flow task, I can slip a row count into the processing flow and place the count into a variable. I can later use that variable to conditionally perform some other work if the row count was > 0. This works well for me, but I have no corresponding strategy for SQL tasks expected to return a single row. In that event, I'm returning those values into variables. If the lookup produces no rows, the SQL task fails when assigning values into those variables. I can branch on that component failing, but there's a side effect: if I'm running the job as a SQL Server Agent job step, the step returns DTSER_FAILURE, causing the step to fail. I can tell the SQL Agent to disregard the step failure, but then I won't know if I have a legitimate error in that step. This seems harder than it should be.
The only strategy I can think of is to run the same query with a count(*) aggregate and test if that returns a number > 0 and if so running the query again without the count. That's ugly because I have the same query in two places that I need to keep in sync.
Is there a better way?
In that same condition you can have additional logic (&& or ||). I would take one of the variables for your single statement and say something to this effect:
If @User::rowcount > 0 || @User::single_record_var != Default
That should help.
What kind of SQL statement? Can you change it to still return a single row with all NULLs instead of no rows?
What stops it from returning more than one row? The package would fail if it ended up returning more than one row, right?
You could also change it to call a stored procedure, and then call the stored procedure in two places without code duplication. Or you could change it to be a view or a user-defined function (if parameters are needed): SELECT COUNT(*) FROM udf() to check if there is data, SELECT * FROM udf() to get the row.
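For the "still return a single row with all NULLs" suggestion above, one sketch (table and column names are hypothetical; ? is the OLE DB parameter marker) is to left-join the lookup onto a one-row derived table:
SELECT x.SomeValue, x.OtherValue
FROM (SELECT 1 AS dummy) AS d
LEFT JOIN (SELECT SomeValue, OtherValue
           FROM dbo.Lookup
           WHERE KeyCol = ?) AS x
    ON 1 = 1;
If the lookup finds nothing, the task still receives one row of NULLs, so the variable assignment succeeds and you can branch on the NULL value instead of on a task failure.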

Incorrect Results when calling a Python UDF in Redshift multiple times within a single column inside a select statement

I am encountering an issue in Redshift where calling a UDF more than once per column inside a select statement is returning the same result as the first call to that UDF.
Bit of Background
I have a very simple Python UDF that calculates an md5 hash. The reason for this function is to be able to handle UTF-16/UTF-8 conversion before doing the hash so it is consistent with SQL server. Now the syntax or logic inside the function does not seem to be the issue as we have tried creating even simpler functions that produce the same behavior.
The Problem
My function is named MD5_UTF16 and is called by doing MD5_UTF16(yourvalue), and returns a hash string / hexdigest of the value you pass into the argument.
In my query I need to be able to do this (postgresql syntax):
SELECT MD5_UTF16(column1) || MD5_UTF16(column2) || MD5_UTF16(column3) AS concatenatedhash
FROM MyTable
i.e. I need to calculate each hash and concatenate them in a single column. If I calculate each of those hashes separately in its own column, the function generates the correct hashes for those columns. However, in my example above I have called each function and concatenated the results with the results of the other calls. In this scenario, all the calls to the functions return the hash for the first call, i.e. MD5_UTF16(column1).
To clarify a bit further using example hash values. Let's pretend these are the hashes for each of the columns above:
Column 1: 275AB169CBEE4550F752C634B9335AE0
Column 2: B2214041A94F50B027FE1DEEC4C8474C
Column 3: 91050DAEFFEE20CDA2FC9914B6E4EBE9
My expected result for the concatenatedhash column would be a simple concatenation of the strings above (275AB169CBEE4550F752C634B9335AE0B2214041A94F50B027FE1DEEC4C8474C91050DAEFFEE20CDA2FC9914B6E4EBE9)
Instead, what I am getting is a concatenation of column 1's hash 3 times:
(275AB169CBEE4550F752C634B9335AE0275AB169CBEE4550F752C634B9335AE0275AB169CBEE4550F752C634B9335AE0)
In my SELECT statement if I had called the function on column 2 (instead of column 1) first, then it would be the hash for column 2 that is repeated.
Has anyone encountered this before?
NOTE: You can only replicate this behavior if you are selecting data out of a table. So doing a:
SELECT MD5_UTF16('hard-coded value 1') || MD5_UTF16('hard-coded value 2')
with no table source will not replicate this behavior.
Work-arounds I am aware of
I do know of a possible workaround, but I still would have expected my method above to work, so this question is not about applying the following workaround, but more about understanding why the above method is not working.
- Workaround: Calculate each hash in a separate column first, then concatenate them (a sketch follows below). This will have potential performance implications on our end, among other things.
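A sketch of that workaround, using the names from the question: compute each hash once in a subquery, then concatenate the already-computed columns in the outer query:
SELECT h1 || h2 || h3 AS concatenatedhash
FROM (
    SELECT MD5_UTF16(column1) AS h1,
           MD5_UTF16(column2) AS h2,
           MD5_UTF16(column3) AS h3
    FROM MyTable
) t;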
EDIT 1
I have found that the issue I've described only happens when there is a join in my query, even if none of the column data from the joined table is being used in my UDF calls, i.e.:
SELECT ...concatenated hashes..
FROM table1
JOIN table2 ...
Removing the join seems to cause the hashes to be calculated correctly. Will attempt a workaround using this new knowledge. Not sure if it has anything to do with the execution plan running the UDFs differently when a join is involved, even though none of the column data from the joined table is being used for the UDF calls.