I use some configuration logic to generate SPARQL queries with RDF4J and the SparqlBuilder.
// prepare selectVariables, prefixes and whereCondition according to configuration
SelectQuery mainQuery = Queries.SELECT(selectVariables)
        .prefix(prefixes)
        .where(whereCondition);
Now I wish to allow users to configure custom WHERE conditions to be used as subselects and composed with the rest of the query logic.
Since the configuration is YAML and the users are trained in SPARQL, I want to let them specify custom patterns as YAML multiline strings, as in this example:
customQuery: |
  ?_ wdt:P31 wd:Q5;
     wdt:P19/wdt:P131* wd:Q60.
This way I can let users freely customize the different queries that I will generate based on the configured condition.
The problem
I already managed to parse the query fragment using the RDF4J SPARQLParser:
SPARQLParserFactory PARSER_FACTORY = new SPARQLParserFactory();
QueryParser parser = PARSER_FACTORY.getParser();
ParsedQuery parsed = parser.parseQuery(query, null);
ProjectionVisitor projectionVisitor = new ProjectionVisitor();
parsed.getTupleExpr().visit(projectionVisitor);
TupleExpr parsedExpression = projectionVisitor.getProjectionArg();
but I can't pass parsedExpression to the SparqlBuilder methods; the parser's node representation looks incompatible with the fluent builder's.
Is there any way to use parsed expressions inside the SparqlBuilder?
No, it is not possible to use parsed expressions in the SparqlBuilder. What you could probably do instead, though (freewheeling here), is use the SparqlBuilder to generate a query with a placeholder pattern of some sort, parse that, and then use a parse-tree visitor to find the placeholder pattern and replace it with the custom parsed expression you got from the user.
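For illustration, here is a rough sketch of that idea against RDF4J's query algebra API, assuming the builder query contains a marker triple pattern whose subject variable is named _placeholder_ (a convention invented here, not an RDF4J feature):

import org.eclipse.rdf4j.query.algebra.StatementPattern;
import org.eclipse.rdf4j.query.algebra.helpers.AbstractQueryModelVisitor;
import org.eclipse.rdf4j.query.parser.ParsedQuery;

// mainQuery was built with SparqlBuilder and contains the pattern
// "?_placeholder_ ?_p_ ?_o_" where the user's fragment should go
ParsedQuery parsedMain = parser.parseQuery(mainQuery.getQueryString(), null);

parsedMain.getTupleExpr().visit(new AbstractQueryModelVisitor<RuntimeException>() {
    @Override
    public void meet(StatementPattern node) {
        // splice the user's parsed expression in place of the marker pattern
        if ("_placeholder_".equals(node.getSubjectVar().getName())) {
            node.replaceWith(parsedExpression.clone());
        }
    }
});

Note that this leaves you with a query algebra tree rather than a builder object, so you would still need to execute it programmatically or render it back to a query string; treat it as a starting point, not tested code.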
I am trying to use AWS Config Advanced Query to generate a report against a specific rule I have created.
SELECT
configuration.targetResourceId,
configuration.targetResourceType,
configuration.complianceType,
configuration.configRuleList
WHERE
configuration.configRuleList.configRuleName = 'aws_config-requiredtags-rule'
AND configuration.complianceType = 'NON_COMPLIANT'
Results look similar to this:
[
  {
    "configRuleName": "aws_config-requiredtags-rule",
    "configRuleArn": "arn:aws:config:us-east-2:123456789:config-rule/config-rule-dl6gsy",
    "configRuleId": "config-rule-dl6gsy",
    "complianceType": "COMPLIANT"
  },
  {
    "configRuleName": "aws_config-instanceinvpc-rule",
    "configRuleArn": "arn:aws:config:us-east-2:123456789:config-rule/config-rule-dc4f1x",
    "configRuleId": "config-rule-dc4f1x",
    "complianceType": "NON_COMPLIANT"
  }
]
While this query produces results, it matches the config rule name and the compliance type independently, so I am not getting only the results where 'aws_config-requiredtags-rule' itself is NON_COMPLIANT.
I am pretty much a novice with SQL, but I hope there is a way to specify that I only want to see NON_COMPLIANT results for a specific rule.
thanks,
This is a limitation of the AWS Config Service - and a pretty big one IMO. When you filter on properties within arrays, those filters are treated like OR operations instead of AND. There doesn't seem to be a good way of performing meaningful queries for individual rules.
From the docs:
When querying against multiple properties within an array of objects, matches are computed against all the array elements
...
The first condition configuration.configRuleList.complianceType = 'non_compliant' is applied to ALL elements in R.configRuleList, because R has a rule (rule B) with complianceType = ‘non_compliant’, the condition is evaluated as true. The second condition configuration.configRuleList.configRuleName is applied to ALL elements in R.configRuleList, because R has a rule (rule A) with configRuleName = ‘A’, the condition is evaluated as true. As both conditions are true, R will be returned.
I am new to Flink and have gone through the site(s)/examples/blogs to get started. I am struggling with the correct use of operators. Basically I have two questions:
Question 1: Does Flink support declarative exception handling? I need to handle parse/validate/... errors.
Can I use org.apache.flink.runtime.operators.sort.ExceptionHandler or similar to handle errors, or is a Rich/FlatMap function my best option?
If Rich/FlatMap is the only option, is there a way to get a handle to the stream inside the Rich/FlatMap function so that sink(s) could be attached for error processing?
Question 2: Can I conditionally attach different Sink(s)?
Based on certain field(s) in the keyed split streams I need to select different sink(s); do I split the stream again, or use a Rich/FlatMap to handle that?
I am using Flink 1.3.2. Here is the relevant portion of my job:
.....
.....
DataStream<String> eventTextStream = env.addSource(messageSource);
KeyedStream<EventPojo, Tuple> eventPojoStream = eventTextStream
// parse, transform or enrich
.flatMap(new MyParseTransformEnrichFunction())
.assignTimestampsAndWatermarks(new EventAscendingTimestampExtractor())
.keyBy("eventId");
// split stream based on eventType as different reduce and windowing functions need to be applied
SplitStream<EventPojo> splitStream = eventPojoStream
.split(new EventStreamSplitFunction());
// need to apply reduce function
DataStream<EventPojo> event1TypeStream = splitStream.select("event1Type");
// need to apply reduce function
DataStream<EventPojo> event2TypeStream = splitStream.select("event2Type");
// need to apply time based windowing function
DataStream<EventPojo> event3TypeStream = splitStream.select("event3Type");
....
....
env.execute("Event Processing");
Am I using the correct operators here?
Update 1:
Tried using the ProcessFunction as suggested by @alpinegizmo, but that didn't work, as it depends upon a keyed stream, which I don't have until I parse/validate the input. I get "InvalidProgramException: Field expression must be equal to '*' or '_' for non-composite types."
It's such a common use case where you first parse/validate input and don't yet have a keyed stream, so how do you solve it?
Thanks for your patience and help.
There's one key building block that you've overlooked. Take a look at side outputs.
This mechanism provides a typesafe way to produce any number of additional output streams. This can be a clean way to report errors, among other uses. In Flink 1.3 side outputs can only be used with ProcessFunction, but 1.4 will add side outputs to ProcessWindowFunction.
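For example, here is a minimal sketch against the 1.3 APIs of routing parse failures to a side output; parseEvent and ErrorSink are hypothetical stand-ins for your own parsing logic and error sink:

import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// declare a tag for the error side output (the anonymous subclass captures the type)
final OutputTag<String> parseErrors = new OutputTag<String>("parse-errors") {};

SingleOutputStreamOperator<EventPojo> events = eventTextStream
        .process(new ProcessFunction<String, EventPojo>() {
            @Override
            public void processElement(String value, Context ctx, Collector<EventPojo> out) {
                try {
                    out.collect(parseEvent(value)); // your parse/validate logic
                } catch (Exception e) {
                    // bad records go to the side output instead of failing the job
                    ctx.output(parseErrors, value);
                }
            }
        });

// attach a dedicated sink to the error stream; the main stream continues
// through keyBy/split as before
events.getSideOutput(parseErrors).addSink(new ErrorSink());

Since each side output is itself a DataStream, the same mechanism can also feed different sinks, which hints at an answer to your second question.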
I am trying to avoid SQL injection. This topic has been dealt with in Java (How to prevent query injection on Google Big Query) and PHP.
How is this accomplished in Apps Script? I did not find a way to add a parameter to a SQL statement. Here is what I had hoped to do:
var sql = 'SELECT [row],etext,ftext FROM [hcd.hdctext] WHERE (REGEXP_MATCH(etext, esearch = ?) AND REGEXP_MATCH(ftext, fsearch = ?));';
var queryResults;
var resource = {
  query: sql,
  timeoutMs: 1000,
  esearch: 'r"[^a-zA-Z]comfortable"',
  fsearch: 'r"[a-z,A-Z]confortable"'
};
queryResults = BigQuery.Jobs.query(resource,projectNumber);
And then have esearch and fsearch filled in with the values (which could be set elsewhere).
That does not work, according to the doc.
Any suggestions on how to get a parameter in an SQL query? (I could not find a setString function...)
Thanks!
Unfortunately, BigQuery doesn't support this type of parameter substitution. It is on our list of features to consider, and I'll bump the priority since it seems like this is a common request.
The only suggestion that I can make in the meantime is that if you are building query strings by hand, you will need to make sure you escape them carefully (which is a non-trivial operation).
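To illustrate the flavor of the problem, here is a deliberately minimal sketch of escaping a value before splicing it into a single-quoted legacy SQL literal (shown in Java, like the linked question above); it handles only backslashes and single quotes, which is far from a complete treatment:

// WARNING: illustrative only; hand-rolled escaping is easy to get wrong,
// which is exactly why it is called non-trivial above
static String escapeSqlLiteral(String value) {
    // escape backslashes first, then single quotes
    return value.replace("\\", "\\\\").replace("'", "\\'");
}

String sql = "SELECT [row],etext,ftext FROM [hcd.hdctext] "
        + "WHERE REGEXP_MATCH(etext, '" + escapeSqlLiteral(esearch) + "')";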
I've read up on how to use the PerFieldAnalyzerWrapper, but I can't get it to work with a custom analyzer of mine. I can't even get the analyzer to run its constructor, which makes me believe I'm actually calling the per-field analyzer incorrectly.
Here's what I'm doing:
Create the per-field analyzer:
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("<special field>", dta);
Add all the fields to the document as usual, including a special field that we analyze differently.
And add the document using the analyzer like this:
iw.AddDocument(doc, perFieldAnalyzer);
Am I on the right track?
The problem was related to my reliance on the CMS's (Kentico) built-in Lucene helper classes. Basically, using those classes you need to specify the custom analyzer at index level through the CMS, and I did not wish to do that. So I ended up using Lucene.Net directly almost everywhere, gaining the flexibility of using any custom analyzer I want.
I also made some changes to how I structure the data and ended up using the tried-and-true KeywordAnalyzer to analyze document tags. Previously I was trying to do some custom tokenization magic on comma-separated values like [tag1, tag2, tag with many parts] and could not get it working reliably with multi-part tags. I still kept that field, but started adding multiple "tag" fields to the document, each storing one tag. So now I have N "tag" fields for N tags, each analyzed as a keyword, meaning each tag (one word or many) is a single token.
I think I overthought it with my initial approach.
Here is what I ended up with.
On Indexing:
KeywordAnalyzer ka = new KeywordAnalyzer();
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("documenttags_t", ka);
// Some procedure to compile all documents by reading from the DB and putting them into Lucene docs
foreach(var doc in docs)
{
iw.AddDocument(doc, perFieldAnalyzer);
}
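The document-building step is elided above; for the tag fields specifically, it would look roughly like this. A hedged sketch in Java Lucene syntax (the Lucene.Net equivalents are the PascalCase methods used elsewhere in this answer), where tagsForThisDocument is a placeholder for however you load the tags:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
// one "documenttags_t" field per tag; with the KeywordAnalyzer registered
// above, each field value (one word or many) becomes a single token
for (String tag : tagsForThisDocument) {
    doc.add(new Field("documenttags_t", tag, Field.Store.YES, Field.Index.ANALYZED));
}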
On Searching:
KeywordAnalyzer ka = new KeywordAnalyzer();
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("documenttags_t", ka);
string baseQuery = "documenttags_t:\"" + tagName + "\"";
Query query = _parser.Parse(baseQuery);
var results = _searcher.Search(query, sortBy);
I'm currently trying to create an Endeca query using the Java API for a UrlENEQuery. The current query is:
collection()/record[CONTACT_ID = "xxxxx" and SALES_OFFICE = "yyyy"]
I need it to be:
collection()/record[(CONTACT_ID = "xxxxx" or CONTACT_ID = "zzzzz") and
SALES_OFFICE = "yyyy"]
Currently this is being done with an ERecSearchList, with CONTACT_ID and the string I'm trying to match in an ERecSearch object, but I'm having difficulty figuring out how to get the UrlENEQuery to generate the OR in the correct fashion, as I have above. Does anyone know how I can do this?
One of us is confused on multiple levels. Let me try to explain why:
If CONTACT_ID and SALES_OFFICE are different dimensions, where CONTACT_ID is a multi-or dimension, then you don't need to use EQL (the XPath-like language) to do anything. Just select the appropriate dimension values and your navigation state will reflect the query you are trying to build with XPath, i.e. the CONTACT_IDs ORed together and ANDed with SALES_OFFICE.
If you do have to use EQL, then the only way to modify it (provided that you have to modify it from the returned results) is via string manipulation.
The ERecSearchList gives you the ability to use the "search within" functionality, which works completely differently from EQL filtering, though you can achieve similar results with tricks like searching only a specified field (which would be separate from the generic search interface). I am still not sure what the connection is between the ERecSearchList and the EQL expression above.
Having expressed my confusion: I think what you need to do is use string manipulation to dynamically build the EQL expression and add it to the query.
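For instance, a minimal sketch of building that expression with plain string concatenation (the ID list and office value are placeholders taken from your example):

import java.util.Arrays;
import java.util.List;

// OR the alternative CONTACT_IDs together, then AND with SALES_OFFICE
List<String> contactIds = Arrays.asList("xxxxx", "zzzzz");
StringBuilder eql = new StringBuilder("collection()/record[(");
for (int i = 0; i < contactIds.size(); i++) {
    if (i > 0) {
        eql.append(" or ");
    }
    eql.append("CONTACT_ID = \"").append(contactIds.get(i)).append("\"");
}
eql.append(") and SALES_OFFICE = \"yyyy\"]");
// eql.toString() now matches the target expression from the question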
A code example of what you are doing would be extremely helpful as well.