Split a string in BigQuery? (Split on character positions instead of a delimiter?) - google-bigquery

I have the following dataset:
| Column1 |
| 100BB7832036 B120501|
I would like the output to look like:
|column 1 | column 2 | column 3 | column 4 |
|100BB7832036| B | 1205 | 01 |
I am having trouble splitting this string, as the only delimiter is ' ', and I am not sure whether it is possible to split it based on character positions (e.g. positions 0-11 would give 100BB7832036, position 13 would give B, positions 14-17 would give 1205, and positions 18-19 would give 01).
So far I have tried:
split(column, ' ')[offset(0)] as Column1
split(column, ' ')[offset(1)] as Column2
however this results in
| Column 1 | Column 2 |
| 100BB7832036| |
where column 2 is blank
Any help or suggestions would be greatly appreciated!
Thanks!

You can use the SUBSTR function to split the string into columns with this syntax:
SUBSTR(column, start, [length])
The start position is 1-based; feel free to adjust the start index and length for your use case.
with example as (
  select "100BB7832036 B120501" as column1
)
select
  substr(column1, 1, 12) as col1,
  substr(column1, 14, 1) as col2,
  substr(column1, 15, 4) as col3,
  substr(column1, 19) as col4
from example
Output:
| col1 | col2 | col3 | col4 |
| 100BB7832036 | B | 1205 | 01 |
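If you prefer pattern matching over fixed offsets, an alternative is REGEXP_EXTRACT. This is only a sketch and assumes the layout from the question never changes (12 characters, a space, 1 character, 4 characters, 2 characters):
with example as (
  select "100BB7832036 B120501" as column1
)
select
  regexp_extract(column1, r'^(\S+) ') as col1,   -- everything before the space
  regexp_extract(column1, r' (.)') as col2,      -- first character after the space
  regexp_extract(column1, r' .(.{4})') as col3,  -- the next four characters
  regexp_extract(column1, r'(.{2})$') as col4    -- the last two characters
from example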

Related

Sort each character in a string from a specific column in Snowflake SQL

I am trying to alphabetically sort each value in a column with Snowflake. For example I have:
| NAME |
| ---- |
| abc |
| bca |
| acb |
and want
| NAME |
| ---- |
| abc |
| abc |
| abc |
how would I go about doing that? I've tried using SPLIT and then ordering the rows, but that doesn't seem to work without a specific delimiter.
Using REGEXP_REPLACE to introduce a separator between each character, STRTOK_SPLIT_TO_TABLE to get the individual letters as rows, and LISTAGG to combine them again as a sorted string:
SELECT tab.col, LISTAGG(s.value) WITHIN GROUP (ORDER BY s.value) AS result
FROM tab
, TABLE(STRTOK_SPLIT_TO_TABLE(REGEXP_REPLACE(tab.col, '(.)', '\\1~'), '~')) AS s
GROUP BY tab.col;
For sample data:
CREATE OR REPLACE TABLE tab
AS
SELECT 'abc' AS col UNION
SELECT 'bca' UNION
SELECT 'acb';
Output:
| COL | RESULT |
| abc | abc |
| acb | abc |
| bca | abc |
A similar implementation to Lukasz's, but using regexp_extract_all to extract the individual characters as an array that we later split into rows using flatten. The listagg then stitches everything back together in the order we specify in the within group clause.
with cte (col) as
(select 'abc' union
select 'bca' union
select 'acb')
select col, listagg(b.value) within group (order by b.value) as col2
from cte, lateral flatten(regexp_extract_all(col,'.')) b
group by col;

Replace values in a column for all rows

I have a column with entries like:
column:
156781
234762
780417
and would like to have the following:
column:
0000156781
0000234762
0000780417
For this I use the following query:
Select isnull(replicate('0', 10 - len(column)),'') + rtrim(column) as a from table
However, I don't know how to replace the values in the whole column.
I already tried with:
UPDATE table
SET column= (
Select isnull(replicate('0', 10 - len(column)),'') + rtrim(column) as column from table)
But I get the following error.
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The answer to your question is going to depend on the data type of your column. If it is a text column, for example VARCHAR, then you can modify the stored value in the table. If it is a number type such as INT, it is the value and not the characters that is stored.
We can also express this by saying that "0" + "1" = "01" whilst 0 + 1 = 1.
In either case we can format the value in a query.
create table numberz(
val1 int,
val2 varchar(10));
insert into numberz values
(156781,'156781'),
(234762,'234762'),
(780417,'780417');
/* required format
0000156781
0000234762
0000780417
*/
select * from numberz;
GO
val1 | val2
-----: | :-----
156781 | 156781
234762 | 234762
780417 | 780417
UPDATE numberz
SET val1 = isnull(
replicate('0',
10 - len(val1)),'')
+ rtrim(val1),
val2 = isnull(
replicate('0',
10 - len(val2)),'')
+ rtrim(val2);
GO
3 rows affected
select * from numberz;
GO
val1 | val2
-----: | :---------
156781 | 0000156781
234762 | 0000234762
780417 | 0000780417
select isnull(
replicate('0',
10 - len(val1)),'')
+ rtrim(val1)
from numberz
GO
| (No column name) |
| :--------------- |
| 0000156781 |
| 0000234762 |
| 0000780417 |
db<>fiddle here
Usually, when we need to show values in a specific format, this is done with the CASE command or other functions in the select list, i.e. without updating the table. That way the format can be changed to anything, at any time, just by changing the functions; the formatted value behaves as a dynamic field.
For example:
select id, lpad(id::text, 6, '0') as format_id from test.test_table1
order by id
Result:
id format_id
-------------
1 000001
2 000002
3 000003
4 000004
5 000005
Maybe you really do need an UPDATE, so I wrote a sample query for an UPDATE command too. (Keep in mind that, as noted above, the leading zeros only survive if id is a text column; an integer column will store the value 1, not '000001'.)
update test.test_table1
set
id = lpad(id::text, 6, '0');

In SQL, rows to columns conversion for one person having 2 or more emails in rows; want output in a single column

Hey guys, I have a small issue in SQL. I have a table shown below.
Each name has at least 2 emails in the table; I want an output in which each name is on one row with all of its emails side by side.
Any help with a solution would be appreciated.
Table
col1 | col2
__________________________
abhi | xyz#email
abhi | abc#email
abhi | rst#email
ragu | str#email
ragu | pqr#email
expected output:
col1 | col2
abhi | xyz#email,abc#email,rst#email
ragu | str#email,pqr#email
You are looking for listagg():
select col1, listagg(col2, ',') within group (order by col2)
from t
group by col1;
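A quick way to try it against the sample data; the dual-based CTE below is only a mock-up of your table and assumes a database where LISTAGG is available (e.g. Oracle 11g+ or Snowflake). The order of the emails inside the list follows the ORDER BY in the WITHIN GROUP clause:
with t (col1, col2) as (
  select 'abhi', 'xyz#email' from dual union all
  select 'abhi', 'abc#email' from dual union all
  select 'abhi', 'rst#email' from dual union all
  select 'ragu', 'str#email' from dual union all
  select 'ragu', 'pqr#email' from dual
)
select col1, listagg(col2, ',') within group (order by col2) as col2
from t
group by col1;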

PostgreSQL - Query a single column table containing 5 element array per row to return array split into 5 columns per row

I need to take the contents of this single column table, containing a 5 element array per row:
Col1
----
{a,b,c,d,e}
{f,g,h,i,j}
{a,f,r,y,t}
...
and achieve this:
colA | ColB | ColC | ColD | ColE
a | b | c | d | e
f | g | h | i | j
a | f | r | y | t
I have tried using unnest(string_to_array):
SELECT unnest(string_to_array('a,b,c,d,e', ',')) AS splitvals
but end up with
splitvals
a
b
c
d
e
But I need each array element split into a separate column without having to resort to split_part(). In other contexts I may have a 7 element array per row so it would be nice to have a more 'generic' query to achieve it for any array size, without just adding split_part() for each additional element.
Any ideas please?
Found answer using:
SELECT NULLIF(p[1],'') AS Col1,
NULLIF(p[2],'') AS Col2,
NULLIF(p[3],'') AS Col3,
NULLIF(p[4],'') AS Col4,
NULLIF(p[5],'') AS Col5
FROM (
SELECT string_to_array('a,b,c,d,e', ',') AS p
) x
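To apply the same idea to the actual table instead of a literal, here is a minimal sketch; the table name t is made up (the question does not name it), and it assumes Col1 is stored as text such as '{a,b,c,d,e}'. If Col1 is already a real text[] array, you can index col1[1] .. col1[5] directly and skip the string_to_array step:
SELECT p[1] AS colA,
       p[2] AS colB,
       p[3] AS colC,
       p[4] AS colD,
       p[5] AS colE
FROM (
  -- strip the surrounding braces, then split on commas into a text[] array
  SELECT string_to_array(trim(both '{}' from col1), ',') AS p
  FROM t
) x;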

SELECT only Unique values from Multiple Columns in SQL

I have to concatenate around 35 Columns in a table into a single string. The data within a column can be repetitive with different case, as per the below.
COL_1
apple | ORANGE | APPLE | Orange
COL_2
GRAPE | grape | Grape
The data in each column is pipe separated and I am trying to concatenate each column by separating with '|'. I expect the final output to be "apple | orange | grape" (All in lower case is fine)
But currently I am getting
apple | ORANGE | APPLE | Orange | GRAPE | grape | Grape
My current SQL is
SELECT COL_1 || '|' || COL_2 from TABLE_X;
Can someone explain how to extract the unique values from each column? This will reduce my string length drastically. My current SQL is exceeding Oracle's 4000-character limit.
I tried doing this
WITH test AS
( SELECT 'Test | test | Test' str FROM dual
)
SELECT *
FROM
(SELECT DISTINCT(LOWER(regexp_substr (str, '[^ | ]+', 1, rownum))) split
FROM test
CONNECT BY level <= LENGTH (regexp_replace (str, '[^ | ]+')) + 1
)
WHERE SPLIT IS NOT NULL;
This query produces only 'test'.
Somehow it produces unique values after splitting the string separated by ' | ' within a column, but doing this for 35+ columns in a single SQL query would be cumbersome. Could someone suggest a better approach?
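One possible direction, offered only as a rough, untested sketch built from your own query plus LISTAGG: concatenate the columns once (as you already do), split the combined string on the pipes, de-duplicate case-insensitively, and stitch the result back together. TABLE_X, COL_1 and COL_2 are the names from your question; the CONNECT BY trick assumes one row at a time, as in your WITH test example:
WITH all_vals AS (
  SELECT COL_1 || '|' || COL_2 AS str   -- extend the concatenation for the remaining columns
  FROM TABLE_X
)
SELECT LISTAGG(val, ' | ') WITHIN GROUP (ORDER BY val) AS result
FROM (
  -- split on '|', trim the spaces, lower-case, and keep each value once
  SELECT DISTINCT LOWER(TRIM(REGEXP_SUBSTR(str, '[^|]+', 1, LEVEL))) AS val
  FROM all_vals
  CONNECT BY LEVEL <= REGEXP_COUNT(str, '[^|]+')
)
WHERE val IS NOT NULL;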