regular expression to pull words beginning with # - vb.net

Trying to parse an SQL string and pull out the parameters.
Ex: "select * from table where [Year] between #Yr1 and #Yr2"
I want to pull out "#Yr1" and "#Yr2"
I have tried many patterns, but none has worked, such as:
matches = Regex.Matches(sSQL, "\b#\w*\b")
and
matches = Regex.Matches(sSQL, "\b\#\w*\b")
Any help?

You're trying to put a word boundary after the #, rather than before. Maybe this:
\w(#[A-Z0-9a-z]+)
or
\w(#[^\s]+)

I would have gone with
/^|\s(#\w+)\s|$/
or if you didn't want to include the #
/^|\s#(\w+)\s|$/
though I also like joel's above, so maybe one of these
/^|\s(#[^\s]+)\s|$/
/^|\s#([^\s]+)\s|$/

Related

Open Refine: Exporting nested XML with templating

I have a question regarding the templating option for XML in Open Refine. Is it possible to export data from two columns in a nested XML-structure, if both columns contain multiple values, that need to be split first?
Here's an example to illustrate better what I mean. My columns look like this:
Column1
Column2
https://d-nb.info/gnd/119119110;https://d-nb.info/gnd/118529889
Grützner, Eduard von;Elisabeth II., Großbritannien, Königin
https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660
Müller, Jakob;Meier, Anina
Each value separated by semicolon in Column1 has a corresponding value in Column2 in the right order and my desired output would look like this:
<rootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/1037554086">
<skos:prefLabel xml:lang="zxx">Müller, Jakob</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/1245873660">
<skos:prefLabel xml:lang="zxx">Meier, Anina</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
<rootElement>
(note: in my initial posting, the position of the root element was not indicated and it looked like this:
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
)
I managed to split the values separated by ";" for both columns like this
{{forEach(cells["Column1"].value.split(";"),v,"<edm:Agent rdf:about=\""+v+"\">"+"\n"+"</edm:Agent>")}}
{{forEach(cells["Column2"].value.split(";"),v,"<skos:prefLabel xml:lang=\"zxx\">"+v+"</skos:prefLabel>")}}
but I can't find out how to nest the splitted skos:prefLabel into the edm:Agent element. Is that even possible? If not, I would work with seperate columns or another workaround, but I wanted to make sure, if there's a more direct way before.
Thank you!
Kristina
I am going to expand the answer from RolfBly using the Templating Exporter from OpenRefine.
I do have the following assumptions:
There is some other column left of Column1 acting as record identifying column (see first screenshot).
The columns actually have some proper names
The columns URI and Name are the only columns with multiple values. Otherwise we might produce empty XML elements with the following recipe.
We will use the information about records available via GREL to determine whether to write a <recordRootElement> or not.
Recipe:
Split first Name and then URI on the separator ";" via "Edit cells" => "Split multi-valued cells".
Go to "Export" => "Templating..."
In the prefix field use the value
<?xml version="1.0" encoding="utf-8"?>
<rootElement>
Please note that I skipped the namespace imports for edm, skos, rdf and xml.
In the row template field use the value:
{{if(row.index - row.record.fromRowIndex == 0, '<recordRootElement>', '')}}
<edm:Agent rdf:about="{{escape(cells['URI'].value, 'xml')}}">
<skos:prefLabel xml:lang="zxx">{{escape(cells['Name'].value, 'xml')}}</skos:prefLabel>
</edm:Agent>
{{if(row.index - row.record.fromRowIndex == row.record.rowCount - 1, '</recordRootElement>', '')}}
The row separator field should just contain a linebreak.
In the suffix field use the value:
</rootElement>
Disclaimer: If you're keen on using only OpenRefine, this won't be the answer you were hoping for. There may be ways in OR that I don't know of. That said, here's how I would do it.
Edit The trick is to keep URL and literal side by side on one line. b2m's answer below does just that: go from right to left splitting, not from left to right. You can then skip steps 2 and 3, to get the result in the image.
split each column into 2 columns by separator ;. You'll get 4 columns, 1 and 3 belong together, and 2 and 4 belong together. I'm assuming this will be the case consistently in your data.
export 1 and 3 to a file, and export 2 and 4 to another file, of any convenient format, using the custom tabular exporter.
concatenate those two files into one single file using an editor (I use Notepad++), or any other method you may prefer. Several ways to Rome here. Result in OR would be something like this.
You then have all sorts of options to put text strings in front, between and after your two columns.
In OR, you could use transform on column URL to build your XML using the below code
(note the \n for newline, that's probably just a line feed, you may want to use \r\n for carriage return + line feed if you're using Windows).
'<edm:Agent rdf:about="' + value + '">\n<skos:prefLabel xml:lang="zxx">' + cells.Name.value + '</skos:prefLabel>\n</edm:Agent>'
to get your XML in one column, like so
which you can then export using the custom tabular exporter again. Or instead you could use Add column based on this column in a similar manner, if you want to retain your URL column.
You could even do this in the editor without re-importing the file back into OR, but that's beyond the scope of this answer.

Querying full and sub-strings via multi-valued parameter using SQL

I am building a report with Microsoft SSRS (2012) having a multi-value parameter #parCode for the user to filter for certain codes. This works perfectly fine. Generally, my query looks like this:
SELECT ...
FROM ...
WHERE
TblCode.Code IN (#Code)
ORDER BY...
The codes are of following type (just an excerpt):
C73.0
C73.1
...
C79.0
C79.1
C79.2
Now, in additon to filtering for multiple of these codes I would like to als be able to filter for sub-strings of the codes. Meaning, when the user enters (Example 1)
C79
for #parCodes The output should be
C79.0
C79.1
C79.2
So eventually the user should be able to enter (Example 2)
C73.0
C79
for #parCodes and the output would be
C73.0
C79.0
C79.1
C79.2
I managed to implement both functionalities seperately, so either filtering for multiple "complete" codes or filterting for sub-string of code, but not both simultaneously.
I tried to do something like
...
WHERE
TblCode.Code IN (#parCode +'%')
ORDER BY...
but this screws up the Example 2. On the other hand, if I try to work with LIKE or = instead of IN statement, then I won't be able to make the parameter multi-valued.
Does anyone have an idea how to realize such functionality or whether IN statement pared with multi-valued parameters simply doesn't allow for it?
Thank you very much!
Assuming you are using SQL server
WHERE (
TblCode.Code IN (#parCode)
OR
CASE
WHEN CHARINDEX('.', Code)>0 THEN LEFT(TblCode.Code, CHARINDEX('.', TblCode.Code)-1)
ELSE TblCode.Code
END IN (#parCode)
)
The first clause makes exact match so for your example matches C73.0
The second clause matches characters before the dot character so it would get values C79.0, C79.1, C79.2 etc
Warning: Filtering using expressions would invalidate the use of an index on TblCode.Code

Using regexp in Big Query to extract URLs

I've been trying to extract any URL present within my 'Text' column in Big Query. The column contains a mixture of text and URLs dotted throughout (a cell might contain more than one URL) I'm trying to use this regexp:
SELECT
REGEXP_EXTRACT (Text, r'(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9%_:?\+.~#&//=]*')
FROM
Data.Text_Files
I currently get 'failed to parse regular expression' when I try to run the query. I've tried modifying it but to no avail.
The regexp works in an online builder but I'm just not sure how to incorporate it into Big Query.
Any help would be much appreciated - or at least pointers on how to incorporate regular expressions into Big Query!
Try below - it is for BigQuery Standard SQL (see Enabling Standard SQL and Migrating from legacy SQL)
WITH YourTable AS (
SELECT 1 AS id, 'What have you tried so far? Please edit your question to show a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) of the code that you are having problems with, then we can try to help with the specific problem. You can also read [How to Ask](http://stackoverflow.com/help/how-to-ask). ' AS Text UNION ALL
SELECT 2 AS id, 'Important on SO, you can mark accepted answer by using the tick on the left of the posted answer, below the voting. see http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235 for why it is important. There are more ... You can check about what to do when someone answers your question - http://stackoverflow.com/help/someone-answers.' AS Text UNION ALL
SELECT 3 AS id, 'If an answer has helped you solve your problem and you accept it you should also consider voting it up. See more at http://stackoverflow.com/help/someone-answers and Upvote section in http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235' AS Text
)
SELECT
id,
REGEXP_EXTRACT_ALL(Text, r'(?i:(?:(?:(?:ftp|https?):\/\/)(?:www\.)?|www\.)(?:[\da-z-_\.]+)(?:[a-z\.]{2,7})(?:[\/\w\.-_\?\&]*)*\/?)') AS URL
FROM YourTable
This gives you output with id field, and repeated field with all respective URLs
If you need flattened result - you can use below variation
WITH YourTable AS (
SELECT 1 AS id, 'What have you tried so far? Please edit your question to show a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) of the code that you are having problems with, then we can try to help with the specific problem. You can also read [How to Ask](http://stackoverflow.com/help/how-to-ask). ' AS Text UNION ALL
SELECT 2 AS id, 'Important on SO, you can mark accepted answer by using the tick on the left of the posted answer, below the voting. see http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235 for why it is important. There are more ... You can check about what to do when someone answers your question - http://stackoverflow.com/help/someone-answers.' AS Text UNION ALL
SELECT 3 AS id, 'If an answer has helped you solve your problem and you accept it you should also consider voting it up. See more at http://stackoverflow.com/help/someone-answers and Upvote section in http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235' AS Text
)
SELECT
id, URL
FROM (
SELECT id, REGEXP_EXTRACT_ALL(Text, r'(?i:(?:(?:(?:ftp|https?):\/\/)(?:www\.)?|www\.)(?:[\da-z-_\.]+)(?:[a-z\.]{2,7})(?:[\/\w\.-_\?\&]*)*\/?)') AS URL
FROM YourTable
), UNNEST(URL) as URL
Note: you can use here any regexp that you will be able to find on web - but what a must is - there is only one matching group is allowed! so all inner matching group should be escaped with ?: as you can see it in above examples. So the ONLY group that you expect to see in output should be left as is - w/o ?:
Your regex has an incomplete capturing group, and has 2 unescaped characters. I don't know which online regex builder you're using, but maybe you forgot to put your new regex into it?
The problems are as follows:
(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9%_:?\+.~#&//=]*
POINTERS TO PROBLEMS ON THIS LINE ---> ^1 ^^2
This is the start of a capturing group with no end. You probably want the ) right before the *.
All slashes need to be escaped. This should probably be \/ or maybe even \/\\.
Here is an example with both of my suggestions implemented: https://regex101.com/r/pt1hqS/1
Good luck fixing it!

REGEXP Oracle SQL

I am having a clob field rq_dev_comments which should replace the username with "anonymous"
Update <TABLE>.req
Set rq_dev_comments = regexp_REPLACE(rq_dev_comments,
'\<[bB]\>.*gt;,', '<b>anonymous ')
where length(rq_dev_comments) > ...
Now my question is, if there is a way to check before wheather "anonymous" is already set or not and how to reduce the datasets?
Example:
rq_dev_comments = "<html><b>HendrikHeim</b>: I found an error....</html>"
Desired: "<html><b>Anonymous</b>: I found an error....</html>"
The following solution will not catch cases where "username" may appear more than once, and some but not all occurrences have already been replaced with "anonymous". So think twice before you use it. (The same would apply to ANY solutions along the lines of what you asked!)
Add the following to your WHERE clause:
... where length(...) ....
and dbms_lob.instr(rq_dev_comments, '<b>Anonymous') = 0
"= 0" means the search pattern wasn't found in the input string.
Another thing: In the example you show "anonymous" capitalized (with upper case A), but in your code you have it all lower case. Decide one way or another and be consistent. Good luck!

Detailed information in Lucene/Solr results

After having performed a search in Lucene/Solr without having specified a field, how can I know in which fields of a result document the search string was found (and how often)?
You could use Query Highlighting.
Try setting debugQuery=on. See this example.
As mentioned, use debugQuery=true. The response will then include an "explain" section. By default, this will give you some awful formatted text that looks like this:
0.69102794 = (MATCH) weight(body:arrai^1.5 in 6357), product of:
0.46610788 = queryWeight(body:arrai^1.5), product of:
1.5 = boost
5.591044 = idf(docFreq=55709, maxDocs=5492855)
0.055577915 = queryNorm
1.4825494 = (MATCH) fieldWeight(body:arrai in 6357), product of:
2.828427 = tf(termFreq(body:arrai)=8)
5.591044 = idf(docFreq=55709, maxDocs=5492855)
0.09375 = fieldNorm(field=body, doc=6357)
For each match in each field, you will get a block like this that explains how SOLR computed the relevancy of this document to your query. What you're asking about (how many matches in this document's field) SOLR calls term frequency "tf". You can see this on the 7th line of the output i pasted above. In this line, SOLR is telling you that it found 8 matches for arrai in the field called "body".
The other lines stand for things like inverse document frequency-"idf" (how rare the matched term is) and fieldNorm, which relates to how short the document's field is relative to the match. You can learn about these here: http://wiki.apache.org/solr/SolrRelevancyFAQ
FYI if you need this "explain" information in a structured format instead of clumsy text you can pass this parameter with your query: debug.explain.structured=true However, its still pretty hard to use = )