I have the following function to convert any string to title case:
CREATE OR REPLACE FUNCTION udf.title_case(str STRING)
RETURNS STRING
LANGUAGE js AS """
return str
.replace(/([^\\W_]+[^\\s-]*) */g, function(txt){return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();})
""";
UPDATE:
I fixed chartAt to charAt and still get the same error
and it produces the following error:
"project.dataset.charAt" is not a function at UDF$1(STRING) line 3, columns 110-111
I can bypass this error by using [] notation, which is not ideal; however, I then hit the same error with substr.
I normally test my functions in JSBin or similar and they work fine, but when I translate them to BigQuery I need to escape \ in the regex and then deal with these out-of-the-blue errors.
It makes life harder for those who are not experienced in the arts of JS programming.
Thanks in advance for your help.
Consider using the INITCAP function instead of a JS UDF.
It takes a STRING and returns it with the first character in each word in uppercase and all other characters in lowercase.
For example,
SELECT INITCAP('I have the following function to convert any string to title case:')
produces the output below:
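I Have The Following Function To Convert Any String To Title Case: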
You wrote the wrong function name, "char[t]At" instead of "charAt".
I used a temp function to test:
CREATE TEMP FUNCTION tempFunc(str STRING)
RETURNS STRING
LANGUAGE js AS """
return str
.replace(/([^\\W_]+[^\\s-]*) */g, function(txt){return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();})
""";
select tempFunc("abce")
and the result is "Abce".
You can try this query in your BigQuery editor.
I have names in my table for which I need to replace certain characters with others.
I have the following code, but I get an error
REPLACE(REPLACE(TRIM(name),NCHAR(0x2019),NCHAR(0x0027)),NCHAR(0x200B),'')
However, I get the following error message. Can somebody help me rewrite the code without using the NCHAR function?
Undefined function: 'NCHAR'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'
Instead of the NCHAR function you can use Unicode literals (\uXXXX) to represent a character, as described in the Spark documentation. In your case it would be:
REPLACE(REPLACE(TRIM(name),'\u2019','\u0027'),'\u200B','')
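For instance, applied in a query (the table name and alias here are just placeholders):
SELECT REPLACE(REPLACE(TRIM(name), '\u2019', '\u0027'), '\u200B', '') AS cleaned_name
FROM some_table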
I have, in a database, records that are serialized PHP strings in which I must obfuscate emails if there are any. The simplest record is like {s:20:"pika.chu@pokemon.com"}. It basically says: this is a string of length 20, which is pika.chu@pokemon.com. This field can be kilobytes long with lots of emails (or none), and sometimes it is empty.
I wish I could use a SQL regular expression function to obfuscate the user part of the email while preserving the length of the string, in order not to break the PHP serialization. The example email above should be turned into {s:20:"xxxxxxxx@pokemon.com"}, where the number of x's matches the length of pika.chu.
Any thoughts?
Here is a more complete example of what can be found as serialized PHP:
a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"john#something.com";s:7:"authors";a:2:{i:0;s:21:"william#something.com";i:1;s:19:"debbie#software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}
I tried to do it using native functions, but it didn't work because functions like REGEXP_REPLACE don't let you manipulate the match to get its size, for example.
Instead, I've created a UDF to do that:
CREATE TEMP FUNCTION hideEmail(str STRING)
RETURNS STRING
LANGUAGE js AS """
return str
.replace(/([a-zA-Z.0-9_\\+-:]*)@/g, function(txt){return '*'.repeat(txt.length-1)+"@";})
""";
select hideEmail('a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"john@something.com";s:7:"authors";a:2:{i:0;s:21:"william@something.com";i:1;s:19:"debbie@software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}')
Result:
a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"****#something.com";s:7:"authors";a:2:{i:0;s:21:"*******#something.com";i:1;s:19:"******#software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}
What is the substitute for the CONTAINS function in QuickSight? Or is there any function that can be used in the backend (in Redshift)?
I think you are looking for locate. If a substring is not contained in a string, locate returns 0.
locate(expression, substring, start)
Documentation
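For example, as a QuickSight calculated field (the field name here is just a made-up example):
ifelse(locate(product_name, 'Pro') > 0, 'contains', 'does not contain')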
I need to execute a SQL query which converts a String column to an Array and then validates the size of that array.
I was able to do it easily with PostgreSQL:
e.g.
select
cardinality(string_to_array('a$b','$')),
cardinality(string_to_array('a$b$','$')),
cardinality(string_to_array('a$b$$$$$','$'))
But for some reason, converting a String to an array on Vertica is not that simple. I saw these links:
https://www.vertica.com/blog/vertica-quick-tip-dynamically-split-string/
https://forum.vertica.com/discussion/239031/how-to-create-an-array-in-vertica
And many more, but none of them helped.
I also tried using:
select REGEXP_COUNT('a$b$$$$$','$')
But I get an incorrect value: 1.
How can I convert a String to an array on Vertica and get its length?
$ has a special meaning in a regular expression. It represents the end of the string.
Try escaping it:
select REGEXP_COUNT('a$b$$$$$', '[$]')
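If what you ultimately want is the number of fields (including the empty ones, matching the PostgreSQL cardinality results above), counting the delimiters and adding 1 should work:
select REGEXP_COUNT('a$b$$$$$', '[$]') + 1
which returns 7, the same as cardinality(string_to_array('a$b$$$$$','$')) in PostgreSQL.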
You could create a UDx scalar function (UDSF) in Java, C++, R or Python. The input would be a string and the output would be an integer. https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/ExtendingVertica/UDx/ScalarFunctions/ScalarFunctions.htm
This will allow you to use language-specific array logic on the strings passed in. For example, in Python you could include this logic:
input_list = input.split("$")
filtered_input_list = list(filter(None, input_list))
list_count = len(filtered_input_list)
These examples are a good starting point for writing UDx's for Vertica. https://github.com/vertica/UDx-Examples
I wasn't able to convert to an array, but I am able to get the number of values.
What I do is convert to rows and use count. It's not the best performance-wise,
but this way I'm also able to do manipulations like filtering each value between delimiters, and I don't need to use [] for characters like $.
select (select count(1)
from (select StringTokenizerDelim('a$b$c','$') over ()) t)
Returns 3.
I've written a PL/pgSQL script which generates an array of JSON objects in a string, but after I use the to_json() method, passing a variable with that string to it, it returns a result which is double-quoted and in which every double-quote character is escaped. But I need that string as-is.
The initial content of the jsonResult variable is:
[{"key":{04949429911,"Code":"400"},"value":"20000.00"},{"key":{"InsuranceNumber":"04949429911","Code":"403"},"value":"10000.00"},...]
but after to_json() it looks like this:
"[{\"key\":{04949429911,\"Code\":\"400\"},\"value\":\"20000.00\"},{\"key\":{\"InsuranceNumber\":\"04949429911\",\"Code\":\"403\"},\"value\":\"10000.00\"}...]"
This is the place where everything stored in jsonResult breaks:
UPDATE factor_value SET params = to_json(jsonResult) WHERE id = _id;
What am I doing wrong?
This answer points out that simply casting to json should suffice:
UPDATE factor_value SET params = jsonResult::json WHERE id = _id;
The weird escaping you see is probably due to PostgreSQL not realizing you already have valid JSON, so your varchar is just converted to a plain JavaScript/JSON string.
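A minimal illustration of the difference, assuming the text already holds valid JSON:
SELECT to_json('[1, 2]'::text);  -- produces the JSON string "[1, 2]"
SELECT '[1, 2]'::json;           -- produces the JSON array [1, 2]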
I needed to convert a bytea column to jsonb, and I ended up with escaped characters when using to_json. I also tried bytea_data_column::jsonb and Postgres said it cannot cast to jsonb. Here is how I worked around it:
bytea_data_column::text::jsonb