How to use Flink 1.9 LAST_VALUE?

I am trying to use Flink 1.9's LAST_VALUE. Unlike in the Alibaba docs, it does not accept a second argument for ordering, and it does not like the OVER(...) clause, so I am not sure how to give LAST_VALUE a criterion.
I was hoping that with event-time processing, LAST_VALUE would return the latest value based on event time, but instead it returns the latest value read.

The LAST_VALUE function is only supported by the Blink planner when running SQL on Flink. You need to explicitly enable the Blink planner:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
// select the Blink planner in streaming mode
EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(bsEnv, bsSettings);
Only then, you should be able to run SQL queries containing the LAST_VALUE function.
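For example, once the Blink planner is active, LAST_VALUE can be used as a regular grouped aggregate. A minimal sketch, assuming a registered table events with columns user_id and status (all three names are made up):
// uses org.apache.flink.table.api.Table
// "events", "user_id" and "status" are hypothetical names
Table result = bsTableEnv.sqlQuery(
    "SELECT user_id, LAST_VALUE(status) AS last_status FROM events GROUP BY user_id");
Note that as a grouped aggregate, LAST_VALUE keeps the value of the most recently processed row per key rather than ordering by event time, which matches the behavior observed in the question.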

Related

How to add allowDiskUse to the MongoOperations.find method in Spring Boot

I want to add allowDiskUse: true to the query below. I am able to add allowDiskUse to an aggregation query, but I cannot find a solution for the find method.
Query query = new Query().with(pageable).addCriteria(criteria).collation(collation);
mongoOperations.find(query, RuntimeApplication.class, RUNTIME_APP);
I want to know: is it possible to add allowDiskUse to the find method, or do I need to change the query to an aggregation?
I found the query below on the internet, but I don't see how to convert it to Spring Boot:
db.collection.find(<match>).sort(<sort>).allowDiskUse()
Which spring-data version are you using? You should be able to add allowDiskUse to the Query object like this:
val query = Query()
    .with(Pageable.ofSize(1))
    .addCriteria(Criteria())
    .allowDiskUse(true)
Then use the query with find, for instance via mongoTemplate.
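A minimal sketch of that call, reusing the RuntimeApplication class and RUNTIME_APP collection name from the question:
// run the find with allowDiskUse enabled on the query built above
val results = mongoTemplate.find(query, RuntimeApplication::class.java, RUNTIME_APP)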
Here is the documentation: https://javadoc.io/doc/org.springframework.data/spring-data-mongodb/latest/org/springframework/data/mongodb/core/query/Query.html

Is there a way to execute a text Gremlin query with PartitionStrategy?

I'm looking for an implementation to run a text query, e.g. "g.V().limit(1).toList()", while using the PartitionStrategy in Apache TinkerPop.
I'm attempting to build a REST interface to run queries on selected graph partitions only. I know how to run a raw query using Client, but I'm looking for an implementation where I can create a multi-tenant graph (https://tinkerpop.apache.org/docs/current/reference/#partitionstrategy) and query only selected tenants using a raw text query instead of a GLV. I'm able to query only selected partitions using gremlin-python, but I could not find a reference implementation for running a text query against a tenant.
Here is the tenant query implementation:
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.strategies import PartitionStrategy

connection = DriverRemoteConnection('ws://megamind-ws:8182/gremlin', 'g')
g = traversal().withRemote(connection)
partition = PartitionStrategy(partition_key="partition_key",
                              write_partition="tenant_a",
                              read_partitions=["tenant_a"])
partitioned_g = g.withStrategies(partition)
x = partitioned_g.V().limit(1).next()  # query on the partition only
Here is how I execute a raw query on the entire graph, but I'm looking for an implementation to run text-based queries on only selected partitions:
from gremlin_python.driver import client

client = client.Client('ws://megamind-ws:8182/gremlin', 'g')
results = client.submitAsync("g.V().limit(1).toList()").result().one()  # runs on the entire graph
print(results)
client.close()
Any suggestions appreciated. TIA
It depends on how the backend store handles text-mode queries, but for the query itself you essentially just need to use the Groovy/Java-style formulation. This will work with Gremlin Server and Amazon Neptune; for other backends you will need to make sure this syntax is supported. From Python you would use something like:
client.submit("""
    g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
                                           writePartition: "b",
                                           readPartitions: ["b"])).V().count()""")
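Putting it together with the Client from the question, a minimal sketch (the endpoint comes from the question; the partition key and partition names are placeholders):
from gremlin_python.driver import client

c = client.Client('ws://megamind-ws:8182/gremlin', 'g')
# the script is plain Gremlin-Groovy text, so the strategy is applied server-side
results = c.submit("""
    g.withStrategies(new PartitionStrategy(partitionKey: "_partition",
                                           writePartition: "b",
                                           readPartitions: ["b"])).V().count()""").all().result()
print(results)
c.close()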

PyPika: controlling the order of WITH clauses

I am using PyPika (version 0.37.6) to create queries to be used in BigQuery. I am building up a query that has two WITH clauses, and one clause is dependent on the other. Due to the dynamic nature of my application, I do not have control over the order in which those WITH clauses are added to the query.
Here is example working code:
from pypika import AliasedQuery, Query
from pypika.terms import Term

a_alias = AliasedQuery("a")
b_alias = AliasedQuery("b")
a_subq = Query.select(Term.wrap_constant("1").as_("z")).select(Term.wrap_constant("2").as_("y"))
b_subq = Query.from_(a_alias).select("z")
q = Query.with_(a_subq, "a").from_(a_alias).select(a_alias.y)
q = q.with_(b_subq, "b").from_(b_alias).select(b_alias.z)
sql = q.get_sql(quote_char=None)
That generates a working query:
WITH a AS (SELECT '1' z,'2' y) ,b AS (SELECT a.z FROM a) SELECT a.y,b.z FROM a,b
However, if I add the b WITH clause first, then since a is not yet defined, the resulting query:
WITH b AS (SELECT a.z FROM a), a AS (SELECT '1' z,'2' y) SELECT a.y,b.z FROM a,b
does not work. Since BigQuery does not support WITH RECURSIVE, that is not an option for me.
Is there any way to control the order of the WITH clauses? I see the _with list in the QueryBuilder (the type of variable q), but since that's a private variable, I don't want to rely on that, especially as new versions of PyPika may not operate the same way.
One way I tried to do this is to always insert the first WITH clause at the beginning of the _with list, like this:
q._with.insert(0, q._with.pop())
Although this works, I'd like to use a PyPika supported way to do that.
A related question: is there a supported way within PyPika to see what has already been added to the select list or other parts of the query? I noticed the q.selects member variable, but selects is not part of the public documentation. Using q.selects did not actually work for me on our project's Python version (3.6), even though it did work in Python 3.7. The code I was trying to use is:
if any(field.name == "date" for field in q.selects if isinstance(field, Field))
The error I got was as follows:
    def __getitem__(self, item: slice) -> "BetweenCriterion":
        if not isinstance(item, slice):
>           raise TypeError("Field' object is not subscriptable")
Thank you in advance for your help.
I could not figure out how to control the order of the WITH clauses after calling query.with_() (except for the hack already noted). As a result, I restructured my application to get around this problem. I am now calling query.with_() before building up the rest of the query.
This also made my related question moot, because I no longer need to see what I've already added to the query.
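A sketch of that restructuring, reusing the names from the question: both WITH clauses are registered first, in dependency order, and only then is the rest of the query built.
from pypika import AliasedQuery, Query
from pypika.terms import Term

a_alias = AliasedQuery("a")
b_alias = AliasedQuery("b")
a_subq = Query.select(Term.wrap_constant("1").as_("z")).select(Term.wrap_constant("2").as_("y"))
b_subq = Query.from_(a_alias).select("z")

# register the WITH clauses first, in dependency order...
q = Query.with_(a_subq, "a").with_(b_subq, "b")
# ...then build up the rest of the query
q = q.from_(a_alias).select(a_alias.y).from_(b_alias).select(b_alias.z)
sql = q.get_sql(quote_char=None)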

Apache Flink Error Handling and Conditional Processing

I am new to Flink and have gone through the site(s)/examples/blogs to get started. I am struggling with the correct use of operators. Basically, I have two questions.
Question 1: Does Flink support declarative exception handling? I need to handle parse/validate/... errors.
Can I use org.apache.flink.runtime.operators.sort.ExceptionHandler or similar to handle errors, or is a Rich/FlatMap function my best option?
If Rich/FlatMap is the only option, is there a way to get a handle to the stream inside a Rich/FlatMap function so that sink(s) could be attached for error processing?
Question 2: Can I conditionally attach different sink(s)?
Based on certain field(s) in the keyed split streams, I need to select different sink(s). Do I split the stream again, or use a Rich/FlatMap to handle that?
I am using Flink 1.3.2. Here is the relevant portion of my job
.....
.....
DataStream<String> eventTextStream = env.addSource(messageSource);

KeyedStream<EventPojo, Tuple> eventPojoStream = eventTextStream
        // parse, transform or enrich
        .flatMap(new MyParseTransformEnrichFunction())
        .assignTimestampsAndWatermarks(new EventAscendingTimestampExtractor())
        .keyBy("eventId");

// split stream based on eventType as different reduce and windowing functions need to be applied
SplitStream<EventPojo> splitStream = eventPojoStream
        .split(new EventStreamSplitFunction());

// need to apply reduce function
DataStream<EventPojo> event1TypeStream = splitStream.select("event1Type");
// need to apply reduce function
DataStream<EventPojo> event2TypeStream = splitStream.select("event2Type");
// need to apply time based windowing function
DataStream<EventPojo> event3TypeStream = splitStream.select("event3Type");
....
....
env.execute("Event Processing");
Am I using the correct operators here?
Update 1:
Tried using the ProcessFunction as suggested by @alpinegizmo, but that didn't work, as it depends upon a keyed stream, which I don't have until I parse/validate the input. I get "InvalidProgramException: Field expression must be equal to '*' or '_' for non-composite types.".
It's such a common use case where you first parse/validate the input and don't yet have a keyed stream, so how do you solve it?
Thanks for your patience and help.
There's one key building block that you've overlooked. Take a look at side outputs.
This mechanism provides a typesafe way to produce any number of additional output streams. This can be a clean way to report errors, among other uses. In Flink 1.3 side outputs can only be used with ProcessFunction, but 1.4 will add side outputs to ProcessWindowFunction.
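A minimal sketch of the pattern against the question's eventTextStream (EventPojo.parse and errorSink are placeholders, and this assumes a Flink version where process() is available on a non-keyed DataStream):
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

final OutputTag<String> parseErrors = new OutputTag<String>("parse-errors") {};

SingleOutputStreamOperator<EventPojo> parsed = eventTextStream
        .process(new ProcessFunction<String, EventPojo>() {
            @Override
            public void processElement(String value, Context ctx, Collector<EventPojo> out) {
                try {
                    out.collect(EventPojo.parse(value)); // hypothetical parser
                } catch (Exception e) {
                    ctx.output(parseErrors, value);      // route bad records to the side output
                }
            }
        });

// attach a dedicated sink to the error stream
parsed.getSideOutput(parseErrors).addSink(errorSink); // errorSink is a placeholder
The same mechanism also speaks to the conditional-sink question: emit each event type to its own OutputTag and attach a different sink to each side output.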

From within Grails HQL, how would I use a (non-aggregate) Oracle function?

If I were retrieving the data I wanted from a plain SQL query, the following would suffice:
select * from stvterm where stvterm_code > TT_STUDENT.STU_GENERAL.F_Get_Current_term()
I have a grails domain set up correctly for this table, and I can run the following code successfully:
def a = SaturnStvterm.findAll("from SaturnStvterm as s where id > 201797") as JSON
a.render(response)
return false
In other words, I can hardcode the result of the Oracle function and have the HQL run correctly, but it chokes on every way I can figure to call the function itself. I have read through some of the Hibernate documentation on using procs and functions, but I'm having trouble making much sense of it. Can anyone give me a hint as to the proper way to handle this?
Also, since I think it is probably relevant, there aren't any synonyms in place that would allow the function to be called without qualifying it as schema.package.function(). I'm sure that'll make things more difficult. This is all for Grails 1.3.7, though I could use a later version if needed.
To call a function in HQL, the SQL dialect must be aware of it. You can add your function at runtime in BootStrap.groovy like this:
import org.hibernate.dialect.function.SQLFunctionTemplate
import org.hibernate.Hibernate

def dialect = applicationContext.sessionFactory.dialect
// map the HQL name F_Get_Current_term to the fully qualified Oracle call
def getCurrentTerm = new SQLFunctionTemplate(Hibernate.INTEGER, "TT_STUDENT.STU_GENERAL.F_Get_Current_term()")
dialect.registerFunction('F_Get_Current_term', getCurrentTerm)
Once registered, you should be able to call the function in your queries by its registered name:
def a = SaturnStvterm.findAll("from SaturnStvterm as s where id > F_Get_Current_term()")