Split string with [] brackets - google-bigquery

I have a string 'attributes.inquiry_result[{name: tran type}]' and want to split it so that it returns the array ['attributes', 'inquiry_result', '{name: tran type}'].
I tried 'attributes.inquiry_result[{name: tran type}]'.split(/\[([^[\]]*)\]/);, but it does not split on the dot and returns ['attributes.inquiry_result', '{name: tran type}'].
I tried adding logic for the dot, but that produced something else.

You could use the split function with a regex. filter is used to remove the trailing empty value, which appears because your string ends with the character "]".
const string = 'attributes.inquiry_result[{name: tran type}]';
const result = string.split(/\.|]|\[/).filter(item => item);
console.log(result);
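As a hedged sketch of the same idea, the snippet below wraps the split in a small helper (the name splitPath is made up for illustration) and uses a character class instead of alternation. It assumes the bracketed part contains no dots or brackets of its own; those would be split as well.

```javascript
// Hypothetical helper: split a path like 'a.b[c]' on '.', '[' and ']'.
function splitPath(path) {
  // The character class matches any single delimiter character.
  // filter(Boolean) drops the empty string left by the trailing ']'.
  return path.split(/[.\[\]]/).filter(Boolean);
}

console.log(splitPath('attributes.inquiry_result[{name: tran type}]'));
// → [ 'attributes', 'inquiry_result', '{name: tran type}' ]
```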

Related

Split by delimiter which is contained in a record

I have a column which I am splitting in Snowflake.
The format is as follows:
I have been using split_to_table(A, ',') inside my query, but as you can probably tell, this also incorrectly splits the Scooter > Sprinting, Jogging and Walking record.
Perhaps the delimiter could apply only when there is no space on either side of it? I cannot see a different condition that could work.
I have been researching online but haven't found a suitable workaround yet. Has anyone encountered a similar problem in the past?
Thanks
This is a custom rule for the split to table, so we can use a UDTF to apply a custom rule:
create or replace function split_to_table2(STR string, DELIM string, ROW_MUST_CONTAIN string)
returns table (VALUE string)
language javascript
strict immutable
as
$$
{
initialize: function (argumentInfo, context) {
},
processRow: function (row, rowWriter, context) {
var buffer = "";
var i;
const s = row.STR.split(row.DELIM);
for(i=0; i<s.length-1; i++) {
buffer += s[i];
if(s[i+1].includes(row.ROW_MUST_CONTAIN)) {
rowWriter.writeRow({VALUE: buffer});
buffer = "";
} else {
buffer += row.DELIM
}
}
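// Emit the last piece together with anything still buffered,
// so a trailing piece without ROW_MUST_CONTAIN is not separated from its row.
rowWriter.writeRow({VALUE: buffer + s[i]})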
},
}
$$;
select VALUE from
table(split_to_table2('Car > Bike,Bike > Scooter,Scooter > Sprinting, Jogging and Walking,Walking > Flying', ',', '>'))
;
Output:
VALUE
Car > Bike
Bike > Scooter
Scooter > Sprinting, Jogging and Walking
Walking > Flying
This UDTF takes one more parameter than the two in the built-in table function split_to_table. The third parameter, ROW_MUST_CONTAIN, is the string each row must contain. The function splits the string on DELIM, but if the next piece does not contain the ROW_MUST_CONTAIN string, that piece is appended to the current row instead of starting a new one. In this case we just specify , for the delimiter and > for ROW_MUST_CONTAIN.
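To trace the buffering outside Snowflake, here is a rough plain-JavaScript sketch of the same logic. The function name splitRows is only illustrative, and this version always appends the final piece to whatever is still buffered.

```javascript
// Plain-JavaScript sketch of the buffering logic (not a Snowflake function).
function splitRows(str, delim, mustContain) {
  const pieces = str.split(delim);
  const rows = [];
  let buffer = '';
  for (let i = 0; i < pieces.length - 1; i++) {
    buffer += pieces[i];
    if (pieces[i + 1].includes(mustContain)) {
      // The next piece starts a new row, so emit the current one.
      rows.push(buffer);
      buffer = '';
    } else {
      // The next piece belongs to the current row; put the delimiter back.
      buffer += delim;
    }
  }
  // Emit the final piece together with anything still buffered.
  rows.push(buffer + pieces[pieces.length - 1]);
  return rows;
}

console.log(splitRows(
  'Car > Bike,Bike > Scooter,Scooter > Sprinting, Jogging and Walking,Walking > Flying',
  ',', '>'));
```

With the sample input this should give the same four rows as the output table above.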
We can get a little clever with regexp_replace by replacing the actual delimiters with something else before the table split. I am using double pipes '||', but you can change that to something else. The '\|\|\\1' replacement uses a back-reference: \\1 re-inserts the captured group, so only the comma in front of it is replaced by the double pipes.
set str='car>bike,bike>car,truck, and jeep,horse>cat,truck>car,truck, and jeep';
select $str, *
from table(split_to_table(regexp_replace($str,',([^>,]+>)','\|\|\\1'),'||'))
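For anyone who wants to see what the back-reference does step by step, this is a small JavaScript sketch of the same replace-then-split idea. It is not Snowflake code: JavaScript writes the back-reference as $1, and the helper name splitOnRealDelimiters is made up here.

```javascript
// Sketch: mark only the commas that are followed by "something>" and split on the marker.
function splitOnRealDelimiters(str) {
  // ,([^>,]+>) matches a comma followed by text up to the next '>',
  // with no other comma in between; $1 puts the captured text back.
  const marked = str.replace(/,([^>,]+>)/g, '||$1');
  return marked.split('||');
}

const sample = 'car>bike,bike>car,truck, and jeep,horse>cat,truck>car,truck, and jeep';
console.log(splitOnRealDelimiters(sample));
// → [ 'car>bike', 'bike>car,truck, and jeep', 'horse>cat', 'truck>car,truck, and jeep' ]
```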
Yes, you are right. The only pattern, which I can see, is the one with the whitespace after the comma.
It's a small workaround, but we can make use of this pattern. In the code below I replace the commas that are followed by whitespace. Then I apply the split_to_table function and convert the replacement back.
It's not super pretty and would break if your string contains "my_replacement" or any other new pattern, but it's working for me:
select replace(t.value, 'my_replacement', ', ')
from table(
split_to_table(replace('Car > Bike,Bike > Scooter,Scooter > Sprinting, Jogging and Walking,Walking > Flying', ', ', 'my_replacement'),',')) t
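The "comma with no space after it" rule can also be written directly as a regex. The sketch below is plain JavaScript (for example, the kind of logic that could live inside a JavaScript UDF) and uses a negative lookahead; treat it as an illustration of the rule rather than something to paste into a Snowflake regex function. The name splitOnTightCommas is hypothetical.

```javascript
// Sketch: split only on commas that are NOT followed by whitespace.
function splitOnTightCommas(str) {
  // (?!\s) is a negative lookahead: the comma matches only if the next
  // character is not whitespace, so ", Jogging" stays inside its record.
  return str.split(/,(?!\s)/);
}

console.log(splitOnTightCommas(
  'Car > Bike,Bike > Scooter,Scooter > Sprinting, Jogging and Walking,Walking > Flying'));
// → [ 'Car > Bike', 'Bike > Scooter', 'Scooter > Sprinting, Jogging and Walking', 'Walking > Flying' ]
```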

String comparison in TCL

I have a variable nodename = v445 and another variable nodepatch = v445-sctpsv.
I want to compare as
if { $nodename == ???????} {
}
Here I want to compare only the part before the first -. The second variable could contain more than one - in its name, so I want to extract the string equivalent to nodename to compare with.
After string manipulation, the second part should come out like this:
if { $nodename == "v445"} {
proceed } else {
}
Try
set nodename v445
set nodepatch v445-sctpsv
if {[string match $nodename* $nodepatch]} {
proceed
} else {
}
string match does a glob-style match against a string, in this case the string that starts with the value in $nodename and contains zero or more characters is matched against the string $nodepatch.
If you need to ensure that the dash occurs, use string match $nodename-* $nodepatch instead.
Documentation: if, set, string
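One thing worth noting: a glob prefix match also accepts a shorter nodename such as v44. If an exact comparison of the part before the first dash is wanted, the logic looks roughly like the sketch below. It is written in JavaScript only to show the steps, and samePrefix is a made-up name.

```javascript
// Sketch: compare nodename with the part of nodepatch before the first '-'.
function samePrefix(nodename, nodepatch) {
  const dash = nodepatch.indexOf('-');
  // If there is no dash at all, compare against the whole string.
  const prefix = dash === -1 ? nodepatch : nodepatch.slice(0, dash);
  return prefix === nodename;
}

console.log(samePrefix('v445', 'v445-sctpsv')); // true
console.log(samePrefix('v44', 'v445-sctpsv'));  // false, although 'v44*' would glob-match
```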

Is there a built-in function to extract all characters in a string up until the first occurrence of a space?

Is there a built-in function to extract all characters in a string up until the first occurrence of a space?
Say the string is:
Methicillin-resistant staphylococcus aureus
I want to be able to get the substring:
Methicillin-resistant
You can do it in two functions:
newstring = mystring.Substring(0, mystring.IndexOf(" "))
Although that will fail if there's no space in mystring.
So you could pull out mystring.IndexOf(" ") into a variable and check whether it's -1 (no space found) before you try to use it in Substring.
The first solution you can use is a simple IndexOf
string GetFirstWord(string source)
{
int index = source.IndexOf(" ");
if (index == -1) return source;
else return source.Substring(0, index);
}
The second solution can be used if you want to keep all words into a string array.
string[] GetWords(string source)
{
return source.Split(' ');
}
If you only want the first word, you can use it like this:
string word = GetWords("Methicillin-resistant staphylococcus aureus")[0];
And a VB.NET solution. No, it can't be done with one built-in method; you need two:
Left(myString, InStr(myString, " ") - 1)
And like the other solutions you need to check InStr doesn't return 0 if myString may not contain a space.
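The "find the index, check for not found, then cut" pattern is the same in most languages. As a hedged illustration, here it is in JavaScript, where indexOf returns -1 when there is no space. The name firstWord is just for this example.

```javascript
// Sketch: return everything before the first space, or the whole string if there is none.
function firstWord(source) {
  const index = source.indexOf(' ');
  // -1 means no space was found, so there is nothing to cut off.
  return index === -1 ? source : source.slice(0, index);
}

console.log(firstWord('Methicillin-resistant staphylococcus aureus')); // Methicillin-resistant
console.log(firstWord('aureus'));                                      // aureus
```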

How to filter out some vulnerability causing characters in query string?

I need to filter out characters like /?-^%{}[];$=*`#|&#'\"<>()+,\. I need to replace these with an empty string if they appear in the query string. Please help me out. I am using this in ASP pages.
Best idea would be to use a function something along the lines of:
Public Function MakeSQLSafe(ByVal sql As String) As String
'first i'd avoid putting quote chars in as they might be valid? just double them up.
Dim strIllegalChars As String = "/?-^%{}[];$=*`#|&#\<>()+,\"
'replace single quotes with double so they don't cause escape character
If sql.Contains("'") Then
sql = sql.Replace("'", "''")
End If
'need to double up double quotes from what I remember to get them through
If sql.Contains("""") Then
sql = sql.Replace("""", """""")
End If
'remove illegal chars
For Each c As Char In strIllegalChars
If sql.Contains(c.ToString) Then
sql = sql.Replace(c.ToString, "")
End If
Next
Return sql
End Function
This hasn't been tested and it could probably be made more efficient, but it should get you going. Wherever you execute your sql in your app, just wrap the sql in this function to clean the string before execution:
ExecuteSQL(MakeSQLSafe(strSQL))
Hope that helps
As with any string sanitisation, you're much better off working with a whitelist that dictates which characters are allowed, rather than a blacklist of characters that aren't.
This question about filtering HTML tags resulted in an accepted answer suggesting the use of a regular expression to match against a whitelist: How do I filter all HTML tags except a certain whitelist? - I suggest you do something very similar.
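A minimal sketch of the whitelist idea, in JavaScript for illustration only: instead of listing bad characters, list the allowed ones and drop everything else. The allowed set here (letters, digits, underscore, space) and the name keepAllowed are assumptions; pick whatever set your query-string values actually need.

```javascript
// Sketch: keep only characters from an explicit allow-list.
function keepAllowed(value) {
  // [^...] negates the class, so every character NOT in the allow-list is removed.
  return value.replace(/[^A-Za-z0-9_ ]/g, '');
}

console.log(keepAllowed("user_1'; DROP TABLE x--"));
// → user_1 DROP TABLE x
```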
I'm using URL Routing and I found this works well, pass each part of your URL to this function. It's more than you need as it converts characters like "&" to "and", but you can modify it to suit:
public static string CleanUrl(this string urlpart) {
// convert accented characters to regular ones
string cleaned = urlpart.Trim().anglicized();
// do some pretty conversions
cleaned = Regex.Replace(cleaned, " ", "-");
cleaned = Regex.Replace(cleaned, "#", "no.");
cleaned = Regex.Replace(cleaned, "&", "and");
cleaned = Regex.Replace(cleaned, "%", "percent");
cleaned = Regex.Replace(cleaned, "@", "at");
// strip all illegal characters like punctuation
cleaned = Regex.Replace(cleaned, "[^A-Za-z0-9- ]", "");
// convert spaces to dashes
cleaned = Regex.Replace(cleaned, " +", "-");
// If we're left with nothing after everything is stripped and cleaned
if (cleaned.Length == 0)
cleaned = "no-description";
// return lowercased string
return cleaned.ToLower();
}
// Convert accented characters to standardized ones
private static string anglicized(this string urlpart) {
string beforeConversion = "àÀâÂäÄáÁéÉèÈêÊëËìÌîÎïÏòÒôÔöÖùÙûÛüÜçÇ’ñ";
string afterConversion = "aAaAaAaAeEeEeEeEiIiIiIoOoOoOuUuUuUcC'n";
string cleaned = urlpart;
for (int i = 0; i < beforeConversion.Length; i++) {
// Replace each accented character with its plain counterpart, accumulating in cleaned
cleaned = Regex.Replace(cleaned, beforeConversion[i].ToString(), afterConversion[i].ToString());
}
return cleaned;
// Spanish : ÁÉÍÑÓÚÜ¡¿áéíñóúü"
}

Length Cannot be zero vb.net

Hi, is there a way to detect the length before I get the error message:
Length cannot be less than zero. Parameter name: length
I get the error on this line:
new_username = new_username.Substring(0, new_username.IndexOf(" Joined "))
I am removing the "Joined" from the string I get. How can I skip this if "Joined" isn't in the data?
Thanks
I would test to see what IndexOf returned before using it in this context:
If new_username.IndexOf(" Joined ") >= 0 Then
new_username = new_username.Substring(0, new_username.IndexOf(" Joined "))
End If
Try this:
new_username = new_Username.Replace(" Joined ", "")
Be warned that this will remove all occurrences of the "Joined" substring rather than just the first.
It looks like new_username.IndexOf(" Joined ") is returning -1, meaning the string " Joined " was not found by IndexOf. I would break this out into two statements: one that stores the result of IndexOf, and one that calls Substring only when that result is not -1.
The error you are seeing occurs because you are effectively making this call:
new_username = new_username.Substring(0, -1)
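The guard itself is the important part. As a rough sketch of the two-step version, here it is in JavaScript (used only for illustration; JavaScript's slice does not throw on a negative length the way .NET's Substring does, so the check matters even more in VB.NET). The name stripJoined is hypothetical.

```javascript
// Sketch: cut the string at " Joined " only when the marker is actually present.
function stripJoined(username) {
  const marker = ' Joined ';
  // Step 1: look the marker up once and keep the result.
  const index = username.indexOf(marker);
  // Step 2: only cut when the marker was found (index is -1 otherwise).
  return index === -1 ? username : username.slice(0, index);
}

console.log(stripJoined('alice Joined 2010')); // alice
console.log(stripJoined('bob'));               // bob
```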