I have a bunch of data that contains a phone number and a birthday as well as other data.
{1997-06-28,07742367858}
{07791100873,1996-07-14}
{30/01/1997,07974335488}
{1997-04-04,07701003703}
{1996-03-11,07480227283}
{1998-06-20,07713817233}
{1996-09-13,07435148920}
{"21 03 2000",07548542539,1st}
{1996-03-09,07539248008}
{07484642432,1996-03-01}
I am trying to extract the phone number from this, but I'm unsure how to get it out when the data is not always in the same order.
I would expect one column that returns the phone number, the next returning the birthday, and a third returning whatever arbitrary value sits in the third slot.
You can try sorting the parts of each string by the number of digits they contain. The digit count can be computed with this expression:
select length(regexp_replace('1997-06-28', '\D', '', 'g'))
length
--------
8
(1 row)
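For comparison, a phone number from the sample data keeps more digits than a date, which is why ordering by this count puts the phone number first. A minimal sketch using values from the question:
select length(regexp_replace('07742367858', '\D', '', 'g')) as phone_digits,
       length(regexp_replace('1997-06-28', '\D', '', 'g')) as date_digits;
 phone_digits | date_digits
--------------+-------------
           11 |           8
(1 row)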
The query removes curly brackets from strings, splits them by comma, sorts elements by the number of digits and aggregates back to arrays:
with my_data(str) as (
values
('{1997-06-28,07742367858}'),
('{07791100873,1996-07-14}'),
('{30/01/1997,07974335488}'),
('{1997-04-04,07701003703}'),
('{1996-03-11,07480227283}'),
('{1998-06-20,07713817233}'),
('{1996-09-13,07435148920}'),
('{"21 03 2000",07548542539,1st}'),
('{1996-03-09,07539248008}'),
('{07484642432,1996-03-01}')
)
select id, array_agg(elem order by length(regexp_replace(elem, '\D', '', 'g')) desc)
from (
select id, trim(unnest(string_to_array(str, ',')), '"') as elem
from (
select trim(str, '{}') as str, row_number() over () as id
from my_data
) s
) s
group by id
Result:
id | array_agg
----+--------------------------------
1 | {07742367858,1997-06-28}
2 | {07791100873,1996-07-14}
3 | {07974335488,30/01/1997}
4 | {07701003703,1997-04-04}
5 | {07480227283,1996-03-11}
6 | {07713817233,1998-06-20}
7 | {07435148920,1996-09-13}
8 | {07548542539,"21 03 2000",1st}
9 | {07539248008,1996-03-09}
10 | {07484642432,1996-03-01}
(10 rows)
If you also want to normalize the dates, see this answer: Looking for solution to swap position of date format DMY to YMD. Its function needs to be modified like this:
create or replace function iso_date(text)
returns date language sql immutable as $$
select case
when $1 like '__/__/____' then to_date($1, 'DD/MM/YYYY')
when $1 like '____/__/__' then to_date($1, 'YYYY/MM/DD')
when $1 like '____-__-__' then to_date($1, 'YYYY-MM-DD')
when trim($1, '"') like '__ __ ____' then to_date(trim($1, '"'), 'DD MM YYYY')
end
$$;
and use it:
select id, a[1] as phone, iso_date(a[2]) as birthday, a[3] as comment
from (
select id, array_agg(elem order by length(regexp_replace(elem, '\D', '', 'g')) desc) as a
from (
select id, trim(unnest(string_to_array(str, ',')), '"') as elem
from (
select trim(str, '{}') as str, row_number() over () as id
from my_data
) s
) s
group by id
) s
id | phone | birthday | comment
----+-------------+------------+---------
1 | 07742367858 | 1997-06-28 |
2 | 07791100873 | 1996-07-14 |
3 | 07974335488 | 1997-01-30 |
4 | 07701003703 | 1997-04-04 |
5 | 07480227283 | 1996-03-11 |
6 | 07713817233 | 1998-06-20 |
7 | 07435148920 | 1996-09-13 |
8 | 07548542539 | 2000-03-21 | 1st
9 | 07539248008 | 1996-03-09 |
10 | 07484642432 | 1996-03-01 |
(10 rows)
I'm trying to split a column that contains strings separated by commas into rows (the easy part), but also to divide the second column by the number of items in the comma-separated string.
Input -
+--------------------+----+
|11710, 11830 | 10 |
+--------------------+----+
|11711, 11015, 10020 | 9 |
+--------------------+----+
Expected result
+------+---+
|11710 | 5 |
+------+---+
|11830 | 5 |
+------+---+
|11711 | 3 |
+------+---+
|11015 | 3 |
+------+---+
|10020 | 3 |
+------+---+
Query:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '11710, 11830' id, 10 hours UNION ALL
SELECT '11711, 11015, 10020', 9
)
SELECT * EXCEPT(uniq_id) REPLACE(uniq_id AS id)
FROM `project.dataset.table`,
UNNEST(SPLIT(id)) uniq_id
Try this
WITH data_ AS (
SELECT [11710, 11830] id, 10 hours UNION ALL
SELECT [11711, 11015, 10020], 9
)
select itm,
       cast(hours / array_length(id) as int64) as div,
       hours
from data_, unnest(id) as itm
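Note that this version assumes the ids are stored as an ARRAY<INT64> rather than as a comma-separated string, so it skips the SPLIT step; against the sample values it should return something like:
itm   | div | hours
------+-----+------
11710 |   5 |    10
11830 |   5 |    10
11711 |   3 |     9
11015 |   3 |     9
10020 |   3 |     9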
Based on the script you shared, you might also consider the approach below.
WITH `project.dataset.table` AS (
SELECT '11710, 11830' id, 10 hours UNION ALL
SELECT '11711, 11015, 10020', 9
)
SELECT
TRIM(VALUE) AS id,
CAST(HOURS/ARRAY_LENGTH(SPLIT(id,',')) AS INT64) AS HOURS
FROM `project.dataset.table`,UNNEST(split(id,',')) AS VALUE
I have a table with a column that contains a list of strings like below:
EXAMPLE:
STRING User_ID [...]
"[""null"",""personal"",""Other""]" 2122213 ....
"[""Other"",""to_dos_and_thing""]" 2132214 ....
"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]" 4342323 ....
QUESTION:
I want to get a count of the number of times each unique string appears (the strings are separated within the STRING column by commas), but I only know how to do the following:
SELECT u.STRING, count(u.USERID) as cnt
FROM table u
group by u.STRING
order by cnt desc;
However, the above method doesn't work, as it only counts the number of user IDs that use a specific grouping of strings.
The ideal output using the example above would look like this:
DESIRED OUTPUT:
STRING COUNT_Instances
"null" 1223
"personal" 543
"Other" 324
"to_dos_and_thing" 221
"getting_things_done" 146
"Work!!!!!" 22
Based on your description, here is my sample table:
create table u (user_id number, string varchar);
insert into u values
(2122213, '"[""null"",""personal"",""Other""]"'),
(2132214, '"[""Other"",""to_dos_and_thing""]"'),
(2132215, '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"' );
I used SPLIT_TO_TABLE to split each string into rows, and then REGEXP_SUBSTR to clean up the values. Here's the query and output:
select REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 ) extracted, count(*) from u,
lateral SPLIT_TO_TABLE( string , ',' ) s
GROUP BY extracted
order by count(*) DESC;
+---------------------+----------+
| EXTRACTED | COUNT(*) |
+---------------------+----------+
| Other | 2 |
| null | 1 |
| personal | 1 |
| to_dos_and_thing | 1 |
| getting_things_done | 1 |
| TO_dos_and_thing | 1 |
| Work!!!!! | 1 |
+---------------------+----------+
SPLIT_TO_TABLE https://docs.snowflake.com/en/sql-reference/functions/split_to_table.html
REGEXP_SUBSTR https://docs.snowflake.com/en/sql-reference/functions/regexp_substr.html
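The desired output in the question seems to treat to_dos_and_thing and TO_dos_and_thing as the same value; if that is the intent (an assumption on my part), wrapping the extracted value in LOWER() merges them:
select lower(REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 )) extracted, count(*) from u,
lateral SPLIT_TO_TABLE( string , ',' ) s
GROUP BY extracted
order by count(*) DESC;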
I have 2 string columns (thousands of rows) with ordered numbers in each string (there can be zero to ten numbers in each string). Example:
+------------------+------------+
| ColString1       | ColString2 |
+------------------+------------+
| 1;3;5;12;        | 4;6;       |
+------------------+------------+
| 1;5;10           | 2;26;      |
+------------------+------------+
| 4;7;             | 3;         |
+------------------+------------+
The end result is to combine these 2 columns, sort the numbers in
ascending order and then put each number into individual columns (smallest, 2nd smallest etc).
e.g. ColString1 is 1;3;5;12; and ColString2 is 4;6;, which needs to return 1;3;4;5;6;12;, which I then use XML to allocate into columns.
Everything works fine using XML apart from the step to order the numbers (i.e. I'm getting 1;3;5;12;4;6; when I combine the strings, so not in ascending order).
I've tried putting them into a JSON array first to order them, thinking I could do a top[1] etc., but that did not work.
Any help on how to combine the two columns and order the numbers before inserting them into columns would be appreciated.
Steps so far:
Example data:
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, ColString1 VARCHAR(50), ColString2 VARCHAR(50));
INSERT INTO @tbl (ColString1, ColString2)
VALUES
('1;3;5;12;', '4;6;'),
('1;5;10;', '2;26;'),
('14;', '3;8;');
XML Approach (Combines strings and puts into columns but not in the correct order):
;WITH Split_Numbers (xmlname)
AS
(
SELECT
CONVERT(XML,'<Names><name>'
+ REPLACE ( LEFT(ColString1+ColString2,LEN(ColString1+ColString2) - 1),';', '</name><name>') + '</name></Names>') AS xmlname
FROM @tbl
)
SELECT
xmlname.value('/Names[1]/name[1]','int') AS Number1,
xmlname.value('/Names[1]/name[2]','int') AS Number2,
xmlname.value('/Names[1]/name[3]','int') AS Number3,
xmlname.value('/Names[1]/name[4]','int') AS Number4,
xmlname.value('/Names[1]/name[5]','int') AS Number5
--etc for additional columns
FROM Split_Numbers
Current output: the numbers are not in the correct order.
+---------+---------+---------+---------+---------+
| Number1 | Number2 | Number3 | Number4 | Number5 |
+---------+---------+---------+---------+---------+
| 1 | 3 | 5 | 12 | 4 |
| 1 | 5 | 10 | 2 | 26 |
| 14 | 3 | 8 | NULL | NULL |
+---------+---------+---------+---------+---------+
Desired Output: numbers in ascending order.
+---------+---------+---------+---------+---------+
| Number1 | Number2 | Number3 | Number4 | Number5 |
+---------+---------+---------+---------+---------+
| 1 | 3 | 4 | 5 | 6 |
| 1 | 2 | 5 | 10 | 26 |
| 3 | 8 | 14 | NULL | NULL |
+---------+---------+---------+---------+---------+
JSON Approach: combines the columns into a JSON array but I still can't order correctly when in JSON format.
REPLACE ( CONCAT('[', LEFT(ColString1+ColString2,LEN(ColString1+ColString2) - 1), ']') ,';',',')
Any help would be greatly appreciated, whether there is a way to order the XML or the JSON string prior to entry. I'm happy to consider an alternative approach if there is an easier solution.
You can use string_split() and string_agg() (SQL Server 2017 or later):
select t.*, s.newstring
from @tbl t cross apply
     (select string_agg(v.value, ';') within group (order by cast(v.value as int)) as newstring
      from (select s1.value
            from string_split(t.ColString1, ';') s1
            union all
            select s2.value
            from string_split(t.ColString2, ';') s2
           ) v
      where v.value <> ''   -- the trailing ';' produces empty elements
     ) s;
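With the question's sample rows, this should return the combined, sorted string per row (a sketch of the expected result); the existing XML step or a PIVOT is still needed to spread newstring into the NumberN columns:
ID | ColString1 | ColString2 | newstring
---+------------+------------+--------------
 1 | 1;3;5;12;  | 4;6;       | 1;3;4;5;6;12
 2 | 1;5;10;    | 2;26;      | 1;2;5;10;26
 3 | 14;        | 3;8;       | 3;8;14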
That said, you should probably put your effort into fixing the data model. Storing numbers in strings is bad. Storing multiple values in a string is bad, bad. If the numbers are foreign references to other tables, that is bad, bad, bad, bad, bad.
While waiting for a DDL and sample data population, etc., here is a conceptual example for you. It is using XQuery and its FLWOR expression.
The CTE does most of the heavy lifting:
It concatenates both columns' values into one string; the CONCAT() function protects against NULL values.
It converts the result into the XML data type.
It sorts the XML elements by converting their values to the int data type in the FLWOR expression.
It filters out XML elements with no legitimate values.
The rest is trivial.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, col1 VARCHAR(100), col2 VARCHAR(100));
INSERT INTO @tbl (col1, col2)
VALUES
('1;3;5;12;', '4;6;'),
('1;5;10;', '2;26;');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = ';';
;WITH rs AS
(
SELECT *
, CAST('<root><r><![CDATA[' +
REPLACE(CONCAT(col1, col2), @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML).query('<root>
{
for $x in /root/r[text()]
order by xs:int($x)
return $x
}
</root>') AS sortedXML
FROM @tbl
)
SELECT ID
, c.value('(r[1]/text())[1]','INT') AS Number1
, c.value('(r[2]/text())[1]','INT') AS Number2
, c.value('(r[3]/text())[1]','INT') AS Number3
-- continue with the rest of the columns
FROM rs CROSS APPLY sortedXML.nodes('/root') AS t(c);
Output
+----+---------+---------+---------+
| ID | Number1 | Number2 | Number3 |
+----+---------+---------+---------+
| 1 | 1 | 3 | 4 |
| 2 | 1 | 2 | 5 |
+----+---------+---------+---------+
I need to create a function that replaces characters in a string with characters from another table. What I've tried returns exactly the same string that was passed in. Table t_symbols is:
+-------------------+-------------------------+
| Symbol_to_replace | Symbol_in_return_string |
+-------------------+-------------------------+
| K | Ќ |
| k | ќ |
| X | Ћ |
| x | ћ |
| A | Є |
| a | є |
| H | Њ |
| h | њ |
| O | ¤ |
| o | µ |
| U | ¦ |
| u | ± |
| Y | ‡ |
| y | ‰ |
| I | І |
| i | і |
| G | Ѓ |
| g | ѓ |
+-------------------+-------------------------+
I need to use a cursor and take the characters from this table, not just nest multiple REPLACE calls:
create or replace function f_replace(text in varchar2) return varchar2 is
  ResultText varchar2(2000);
begin
  for cur in (select t.symbol_to_replace, t.symbol_in_return_string
              from t_symbols t) loop
    ResultText := Replace(text, cur.symbol_to_replace, cur.symbol_in_return_string);
  end loop;
  return(ResultText);
end f_replace;
SQL has a function exactly for this. It is not REPLACE (where indeed you would need multiple iterations); it's the TRANSLATE function.
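As a minimal illustration of the semantics (a sketch with literal arguments taken from the first rows of your table, before wiring anything to the table itself): each character in the second argument is replaced by the character at the same position in the third argument.
select translate('Kaxk', 'KkXxAa', 'ЌќЋћЄє') as replaced_str from dual;
REPLACED_STR
------------
Ќєћќ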
If the table contents may change, and you need to write a function that looks things up in the table at the time it is called, you could do something like the function I show below.
I am showing a complete example: First I create a table that will store the required substitutions. I only include the first few substitutions, because I want to show how the behavior of the function changes as the table is being modified - without needing to change anything about the function. (Which is the whole point of this.)
Then I show the function definition, and I demonstrate how it works. Then I insert two more rows in the substitutions table and I run exactly the same query; the result will now reflect the longer "list" of substitutions, as needed.
create table character_substitutions ( symbol_to_replace, symbol_in_return_string )
as
select 'K', 'Ќ' from dual union all
select 'k', 'ќ' from dual union all
select 'X', 'Ћ' from dual union all
select 'x', 'ћ' from dual union all
select 'A', 'Є' from dual union all
select 'a', 'є' from dual
;
create or replace function my_character_substitutions ( input_str varchar2 )
return varchar2
deterministic
as
symbols_to_replace varchar2(4000);
symbols_to_return varchar2(4000);
begin
select listagg(symbol_to_replace ) within group (order by rownum),
listagg(symbol_in_return_string) within group (order by rownum)
into symbols_to_replace, symbols_to_return
from character_substitutions;
return translate(input_str, symbols_to_replace, symbols_to_return);
end;
/
select 'Kags' as input_str, my_character_substitutions('Kags') as replaced_str
from dual;
INPUT_STR REPLACED_STR
---------- ------------
Kags Ќєgs
OK, so now let's insert a couple more rows into the table and run the same query. Notice how the g is now substituted as well.
insert into character_substitutions ( symbol_to_replace, symbol_in_return_string )
select 'G', 'Ѓ' from dual union all
select 'g', 'ѓ' from dual
;
select 'Kags' as input_str, my_character_substitutions('Kags') as replaced_str
from dual;
INPUT_STR REPLACED_STR
---------- ------------
Kags Ќєѓs
I have a table that contains patterns for phone numbers, where x can match any digit.
+----+--------------+----------------------+
| ID | phone_number | phone_number_type_id |
+----+--------------+----------------------+
| 1 | 1234x000x | 1 |
| 2 | 87654311100x | 4 |
| 3 | x111x222x | 6 |
+----+--------------+----------------------+
Now, I might have 511132228, which will match row 3, and it should return its type. So it's kind of like SQL wildcards, but the other way around, and I'm confused about how to achieve this.
Give this a go:
select * from my_table
where '511132228' like replace(phone_number, 'x', '_')
select *
from yourtable
where '511132228' like (replace(phone_number, 'x','_'))
Try the query below:
SELECT ID,phone_number,phone_number_type_id
FROM TableName
WHERE '511132228' LIKE REPLACE(phone_number,'x','_');
Query with test data:
With TableName as
(
SELECT 3 ID, 'x111x222x' phone_number, 6 phone_number_type_id from dual
)
SELECT 'true' value_available
FROM TableName
WHERE '511132228' LIKE REPLACE(phone_number,'x','_');
The above query returns a row when a pattern match exists and returns no rows when there is no match.
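For instance, with a hypothetical value whose length doesn't fit the pattern, nothing comes back, because '_' matches exactly one character and LIKE (without %) requires the whole string to match:
With TableName as
(
SELECT 3 ID, 'x111x222x' phone_number, 6 phone_number_type_id from dual
)
SELECT 'true' value_available
FROM TableName
WHERE '51113222' LIKE REPLACE(phone_number,'x','_');
no rows selected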