I use some configuration logic to generate SPARQL queries with RDF4J and the SparqlBuilder.
// prepare selectVariables, prefixes and whereCondition according to configuration
SelectQuery mainQuery = Queries.SELECT(selectVariables)
    .prefix(prefixes)
    .where(whereCondition);
Now I want to let users configure custom WHERE conditions that are used as sub-selects and composed with the rest of the query logic.
Since the configuration is YAML and the users are trained in SPARQL, I would like to let them specify custom patterns as YAML multiline strings, as in this example:
customQuery: |
  ?_ wdt:P31 wd:Q5;
     wdt:P19/wdt:P131* wd:Q60.
This way I can let the users customize freely the different queries that I will generate based on the configured condition.
The problem
I already managed to parse the query fragment using the RDF4J SPARQL parser:
SPARQLParserFactory PARSER_FACTORY = new SPARQLParserFactory();
QueryParser parser = PARSER_FACTORY.getParser();
ParsedQuery parsed = parser.parseQuery(query, null);
ProjectionVisitor projectionVisitor = new ProjectionVisitor();
parsed.getTupleExpr().visit(projectionVisitor);
TupleExpr parsedExpression = projectionVisitor.getProjectionArg();
However, I can't pass parsedExpression to the SparqlBuilder methods; the node representation used by the parser looks incompatible with the one used by the fluent builder.
Is there any way to use parsed expressions inside the SparqlBuilder?
No, it is not possible to use parsed expressions in the SparqlBuilder. What you could probably do instead though (freewheeling here) is use the SparqlBuilder to generate a query with a placeholder pattern of some sort, parse that, and then use a parse tree visitor to find that placeholder pattern and replace it with the custom parsed expression you got from the user.
I want to extract 4 values out of one field, called msg, in a Splunk query. The msg has the form:
msg: "Service call successful k1=v1 k2=v2 k3=v3 k4=v4 k5=v5 something else can be ignored"
The keys are always static but the values are not. For instance, v2 could be XXX or XXYYZZ; similarly, the possible values for v3 have unpredictable length.
I ran a query to get some sample results and hoped to use the Field Extractor to generate a regex, but the generated regex can't get all the values out. I guess that's because the values don't all have the same length?
Do I need to change my logging format by separating each key=value pair with a comma? Or am I not using the Field Extractor correctly?
[Update1]: A few sample data:
msg:Service call successful k1=XXX k2=BBBB k3=Something I made up k4=YYYNNN k5=do not need to retrieve this value
msg:Service call successful k1=SSSSSS k2=AAA k3=This could contain space and comma, like this one k4=YYYNNM k5=can be ignored
I could change the logging format if that makes it easier to query and extract fields. Will adding a separator like a dot or a pipe help?
Normally Splunk will pull key-value pairs out automatically.
However, when it doesn't, try your regular expression(s) on regex101. The field extractor is often a good[ish] start, but it rarely creates efficient (or complete) regular expressions.
An inline version of this would be as follows (presuming the "value" half of the key-value pair is contiguous characters):
| rex field=_raw "k1=(?<k1>\S+)\s+k2=(?<k2>\S+)\s+k3=(?<k3>\S+)\s+k4=(?<k4>\S+)\s+k5=(?<k5>\S+)"
Normally I prefer to do sequential rex calls, in case something's out of order or missing, but if your data's consistent, this will work.
Once you have it the way you want it, update your props.conf and transforms.conf as appropriate for the sourcetype.
EDIT for updated sample data / comment response:
...
| rex field=_raw "k3=(?<k3>.+)\s+k4="
| rex field=_raw "k4=(?<k4>.+)\s+k5="
...
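To try the sequential approach outside Splunk first, here is a minimal Python sketch run against one of the sample lines from the question. It is only an illustration: Python writes named groups as (?P<name>...) where rex uses (?<name>...), the values are matched lazily up to the next known key, and the names PATTERNS and extract_fields are made up for this example.

```python
import re

# Sample line taken from the question's update.
LINE = ("msg:Service call successful k1=SSSSSS k2=AAA "
        "k3=This could contain space and comma, like this one "
        "k4=YYYNNM k5=can be ignored")

# One pattern per wanted key (hypothetical helper table).
# Each value runs lazily until whitespace followed by the next known key.
PATTERNS = {
    "k1": r"k1=(?P<k1>.+?)\s+k2=",
    "k2": r"k2=(?P<k2>.+?)\s+k3=",
    "k3": r"k3=(?P<k3>.+?)\s+k4=",
    "k4": r"k4=(?P<k4>.+?)\s+k5=",
}


def extract_fields(line):
    """Run each pattern independently, similar to sequential rex calls."""
    fields = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, line)
        if m:  # a key that is missing simply yields no field
            fields[name] = m.group(name)
    return fields


if __name__ == "__main__":
    print(extract_fields(LINE))
```

Because every key is matched on its own, one missing or malformed pair does not stop the other fields from being extracted.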
I am trying to get specific data between two strings, which are an opening and a closing tag. Normally I would just parse it using XmlParse, but the problem is that the dataset contains a lot of other junk.
Here is an example of the large string:
test of data need to parse:<?xml version="1.0" encoding="UTF-8"?><alert xmlns="urn:oasis:names:tc::cap:1.2"><identifier>_2020-12-16T17:32:5620201116173256</identifier><sender>683</sender><sent>2020-12-16T17:32:56-05:00</sent><status>Test</status><msgType>Alert</msgType><source>test of data need to parse</source><scope>Public</scope><addresses/><code>Test1.0</code><note>WENS IPAWS</note><info><language>en-US</language></info>
<capsig:Signature xmlns:capsig="http://www.w3.org/2000/09/xmldsig">
<capsig:Info>
<capsig:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n"/>
<capsig:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-morersa-sha256"/>
<capsig:Referrer URI="">
<capsig:Trans>
<capsig:Trans Algorithm="http://www.w3.org/2000/09/xmldsigenveloped-signature"/>
</capsig:Trans>
<capsig:DMethod Algorithm="http://www.w3.org/2001/04/xmlencsha256"/>
<capsig:DigestValue>wjL4tqltJY7m/4=</capsig:DigestValue>
</capsig:Referrer>
</capsig:Info>
test of data need to parse:<?xml version="1.0" encoding="UTF-8"?><alert xmlns="urn:oasis:names:tc::cap:1.2"><identifier>_2020-12-16T17:32:5620201116173256</identifier><sender>683</sender><sent>2020-12-16T17:32:56-05:00</sent><status>Test</status><msgType>Alert</msgType><source>test of data need to parse</source><scope>Public</scope><addresses/><code>Test1.0</code><note>WENS IPAWS</note><info><language>en-US</language></info>
So what I need to do is just extract the following:
<capsig:Info>
<capsig:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n"/>
<capsig:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-morersa-sha256"/>
<capsig:Referrer URI="">
<capsig:Trans>
<capsig:Trans Algorithm="http://www.w3.org/2000/09/xmldsigenveloped-signature"/>
</capsig:Trans>
<capsig:DMethod Algorithm="http://www.w3.org/2001/04/xmlencsha256"/>
<capsig:DigestValue>wjL4tqltJY7m/4=</capsig:DigestValue>
</capsig:Referrer>
</capsig:Info>
I have searched everywhere and found ways to do this with character positions and counts, but none of them really worked. I tried doing it with SQL, but the string changes constantly, which causes issues. So my plan was to get everything after "<capsig:Info>" and before "</capsig:Info>" and then insert it into a table.
Is there a way to do this with ColdFusion?
Any suggestions would be appreciated.
Thanks!
Yes, you can use a regular expression to extract the substring between the <capsig:Info> ... </capsig:Info> tags. The ColdFusion function reMatch() returns an array of all substrings that match the specified pattern, so it can be done with the line of code below.
<!--- Use reMatch to extract all pattern matches into an array --->
<cfset parsedXml = reMatch("<capsig:Info>(.*?)</capsig:Info>", xmlToParse)>
<!--- parsedXml is an array of strings. The result will be found in the first array element as such --->
<cfdump var="#parsedXml[1]#" label="parsedXml">
You can see this using the demo here.
https://trycf.com/gist/00be732d93ef49b2427768e18e371527/lucee5?theme=monokai
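For comparison, the same lazy-match idea can be sketched in Python. This is only an illustration on a shortened stand-in for the string in the question, and the names TEXT, INFO_BLOCK and extract_info are made up here. Note that Python needs re.DOTALL for "." to cross line breaks, and that the whole match still includes the surrounding tags while the capture group holds only the inner content.

```python
import re

# Shortened stand-in for the question's string: junk, the wanted block, junk.
TEXT = """test of data need to parse:<alert><note>WENS IPAWS</note></alert>
<capsig:Signature xmlns:capsig="http://www.w3.org/2000/09/xmldsig">
<capsig:Info>
<capsig:DigestValue>wjL4tqltJY7m/4=</capsig:DigestValue>
</capsig:Info>
test of data need to parse:<alert><note>WENS IPAWS</note></alert>"""

# re.DOTALL lets "." match newlines; ".*?" stops at the first closing tag.
INFO_BLOCK = re.compile(r"<capsig:Info>(.*?)</capsig:Info>", re.DOTALL)


def extract_info(text):
    """Return (block including tags, inner content only), or None if absent."""
    m = INFO_BLOCK.search(text)
    if m is None:
        return None
    return m.group(0), m.group(1).strip()


if __name__ == "__main__":
    whole, inner = extract_info(TEXT)
    print(whole)
    print(inner)
```

If the tags themselves should go into the table, use the whole match; if only the content between them is wanted, use the capture group.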
I have a few large fixed-width text files that have multiple specification formats in them. I need to parse the text files based on an identifier at a set location in each record. That identifier can be at a different position depending on the specification.
I have written a query for each of the different specifications (95 of them), with the start position and length hard-coded using the Mid() function and a WHERE clause that filters on the [Record Identifier] of the specification. As you can see below, the two specifications use different positions in their WHERE clauses.
```
SELECT Mid([AllData],1,5) AS PlanNumber, Mid([AllData],6,4) AS Spaces1, Mid([AllData],10,3) AS Filler1, Mid([AllData],13,11) AS SSN, Mid([AllData],24,1) AS AccountIdentifier, Mid([AllData],25,5) AS Filler2, Mid([AllData],30,2) AS RecordIdentifier, Mid([AllData],32,1) AS FieldType, Mid([AllData],33,4) AS Filler3, Mid([AllData],37,8) AS HireDate, Mid([AllData],45,8) AS ParticipationDate, Mid([AllData],53,8) AS VestinDate, Mid([AllData],61,8) AS DateOfBirth, Mid([AllData],77,1) AS Spaces2, Mid([AllData],78,1) AS Reserved1, Mid([AllData],79,1) AS Reserved2, Mid([AllData],80,1) AS Spaces3
FROM TBL_Company1
WHERE (((Mid([AllData],30,2))="02") AND ((Mid([AllData],32,1))="D"));
```
Or
```
SELECT Mid([AllData],1,5) AS PlanNumber, Mid([AllData],6,4) AS Spaces1, Mid([AllData],10,3) AS Filler1, Mid([AllData],13,11) AS SSN, Mid([AllData],24,1) AS AccountIdentifier, Mid([AllData],25,7) AS RecordIdentifier, Mid([AllData],32,22) AS StreetAddressForBank, Mid([AllData],54,20) AS CityForBank, Mid([AllData],74,2) AS StateForBank, Mid([AllData],76,5) AS ZipCodeForBank
FROM TBL_Company1
WHERE (((Mid([AllData],25,7))="49EFTAD"));
```
Is there a way to parse this without having to hard-code every position and length into the queries?
I was thinking of having a table with all of the specifications in it, and an import function that looks at the specification table and parses the data accordingly into a new table, or maybe something else.
What I have done is not very scalable, and if a format changes even a little I have to go back and change each affected query.
Any help is greatly appreciated.
I think in your situation, I'd want to be able to generate the SQL statement dynamically, as you suggest.
I'd have a table something like:
Format#,Position,OutColName,FromPos,Length,WhereValue
1,1,"PlanNumber",1,5,
1,2,"Spaces1",6,4,
...
1,n,,30,2,"02"
1,n+1,,32,1,"D"
and then some VBA to process it and build and execute the SQL string(s). The SELECT clause entries would be recognized by having a value in the OutColName field, and the WHERE clause entries by having a value in the WhereValue column.
Of course this is only more "efficient" in the sense that it's a bit easier to code up new formats or fix/modify existing ones.
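To make the idea concrete, here is a minimal sketch of the string-building step. It is written in Python rather than VBA purely to show the logic; the SPEC rows are a cut-down, hypothetical version of the table above, build_query is a made-up name, and the table and column names are taken from the queries in the question.

```python
# Hypothetical spec rows mirroring the table above:
# (format_no, position, out_col_name, from_pos, length, where_value)
SPEC = [
    (1, 1, "PlanNumber", 1, 5, None),
    (1, 2, "Spaces1", 6, 4, None),
    (1, 3, None, 30, 2, "02"),
    (1, 4, None, 32, 1, "D"),
]


def build_query(spec, format_no, table="TBL_Company1", column="AllData"):
    """Build one Access-style SELECT statement for the given format number."""
    rows = sorted((r for r in spec if r[0] == format_no), key=lambda r: r[1])
    selects, wheres = [], []
    for _, _, name, start, length, where_value in rows:
        expr = f"Mid([{column}],{start},{length})"
        if name:  # rows with an output column name feed the SELECT list
            selects.append(f"{expr} AS {name}")
        if where_value is not None:  # rows with a value feed the WHERE clause
            wheres.append(f'{expr}="{where_value}"')
    sql = f"SELECT {', '.join(selects)} FROM {table}"
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql + ";"


if __name__ == "__main__":
    print(build_query(SPEC, 1))
```

With this layout, adding or changing a format only means editing rows in the specification table; the code that builds the statement stays the same.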
Is there a way to extract only IPv4 addresses from a JSON file using VB.NET?
For example, when I open a JSON file from VB, I would like to filter only the IPv4 addresses out of text such as this: https://pastebin.com/raw/S7Vnnxqa
I expect results like this: https://pastebin.com/raw/8L8Ckrwi
I found a website that offers a tool to do this ( https://www.toolsvoid.com/extract-ip-addresses/ ), and I link it here to show what I mean. However, I don't want to use an external tool; I want to do the extraction directly in VB. Thanks in advance for your help.
Your "text" is JSON. Load it using the JSON parser of your choice (google VB.NET parse JSON), loop over the matches array and read the IP address from the http.host property of each element.
Here is an example of how to do it using the Newtonsoft.Json package (see it working here on DotNetFiddle):
' Requires: Imports Newtonsoft.Json.Linq
' Assume that the variable myJsonString contains the original string
Dim myJObject = JObject.Parse(myJsonString)
For Each match In myJObject("matches")
    Console.WriteLine(match("http")("host"))
Next
Output:
62.176.84.198
197.214.169.59
46.234.76.75
122.136.141.67
219.73.94.83
2402:800:621b:33f1:d1e3:5544:4fcf:526e
178.136.75.125
188.167.212.252
...
If you want to extract only IPv4 and not IPv6, you can use a regular expression to check whether it matches:
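' Requires: Imports System.Text.RegularExpressions
Dim IPV4Regex = New Regex("^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$")
For Each match In myJObject("matches")
    Dim ip = match("http")("host").ToString()
    ' Print the host only if the whole value is an IPv4 address
    If IPV4Regex.IsMatch(ip) Then
        Console.WriteLine(ip)
    End If
Next
Output: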
62.176.84.198
197.214.169.59
46.234.76.75
122.136.141.67
219.73.94.83
178.136.75.125
188.167.212.252
...
Of course it's always recommended to parse the input data in a structured way, to avoid surprises such as false positives. But if you just want to match anything that looks like an IP address, regardless of the input format (even if you just put hello 1.2.3.4 world in the textbox), then you could use just the regular expression and skip the structured approach (see it working here on DotNetFiddle):
Dim IPV4RegexWithWordBoundary = New Regex("\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b")
Dim match = IPV4RegexWithWordBoundary.Match(myJsonString)
Do While match.Success
    Console.WriteLine(match.Value)
    match = match.NextMatch()
Loop
Here I modified the regular expression to use \b...\b instead of ^...$ so that it matches word boundaries instead of start/end of string. Note however that now we get IP addresses twice with the input that you provided, because the addresses exist more than once:
62.176.84.198
62.176.84.198
197.214.169.59
197.214.169.59
46.234.76.75
46.234.76.75
...
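If the repeated addresses are a problem, one option is to keep only the first occurrence of each match. Below is a minimal sketch of that idea, written in Python only to illustrate the logic (the same can be done in VB.NET with a HashSet or a Distinct() call). The regular expression is the word-boundary one from above, and the function name unique_ipv4 is made up for this example.

```python
import re

# Same octet pattern as the word-boundary regular expression above.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")


def unique_ipv4(text):
    """Return every IPv4-looking match once, in order of first appearance."""
    # dict.fromkeys drops duplicates and keeps insertion order.
    return list(dict.fromkeys(IPV4.findall(text)))


if __name__ == "__main__":
    sample = ("a 62.176.84.198 b 197.214.169.59 c 62.176.84.198 "
              "d 2402:800:621b:33f1:d1e3:5544:4fcf:526e")
    print(unique_ipv4(sample))
```

The IPv6 address in the sample is skipped because it contains no dotted groups of four octets.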