Splitting the string into columns to extract values using BigQuery - sql

How can I split a string on a specific character and extract each value? I need to extract every word between the pipes, including those at the start and end of the string, because each one represents something. Is there a regex pattern for this, or a way to split the values into columns?
Name
A|B|C|D|E|F|G
Name col1 col2 col3 col4 col5 col6 col7
A|B|C|D|E|F|G A B C D E F G
I am using BigQuery and couldn't find a way to get all of the values. The regex code I tried only works when the string has three parts, as in A|B|C.
I need to compare each column value and then build conditions with CASE WHEN.
CODE:
select
regexp_extract(name, "\\w+\\S(x|y)") as c2, -- gives either x or y
left(regexp_substr(name, "\\w+\\S\\w+\\S\\w+"),1) as c1,
right(regexp_extract(name, "\\w+\\S\\w+\\S\\w+"),1) as c3
from Table

Consider the approach below:
select * from (
select *
from your_table, unnest(split(name, '|')) value with offset
)
pivot(any_value(value) as col for offset in (0,1,2,3,4,5,6))
If applied to the sample data in your question, the output has one column per segment (col_0 through col_6).

This seems like a use case for SPLIT().
select split(name,"|")[safe_offset(0)] as c1, split(name,"|")[safe_offset(1)] as c2, ..
from table
see https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#split
Edit: switched to safe_offset instead of offset, per the question "Array index 74 is out of bounds (overflow) google big query".
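To illustrate why safe_offset matters, here is a minimal Python sketch of the same idea, not BigQuery code. The helper name safe_offset and the c1..c8 keys are assumptions made for this illustration; the point is only that an out-of-range index yields a null instead of an error.

```python
def safe_offset(parts, index):
    """Return parts[index], or None when the index is out of range."""
    return parts[index] if 0 <= index < len(parts) else None


name = "A|B|C|D|E|F|G"
parts = name.split("|")

# Ask for one more column than the string has segments.
columns = {f"c{i + 1}": safe_offset(parts, i) for i in range(8)}
```

Here c1 through c7 hold the seven segments and c8 is None, which mirrors how a row with fewer segments than expected would yield NULL rather than fail.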

Related

How to split string based on column length and insert into table

I have a string that I need to split and create a table from.
00001 00000009716496000000000331001700000115200000000000
I know the exact length of each column:
Col1 = 5
Col2 = 7
Col3 = 23
etc...
I need something like this (empty values are NULLs)
Can you direct me to the right way of doing that?
Use substring():
select substring(col, 1, 5) as col1,
substring(col, 6, 7) as col2,
. . .
You can use a computed column to improve performance (see https://www.sqlservertutorial.net/sql-server-basics/sql-server-computed-columns/).
Use the function below to fill the column:
SUBSTRING(string, start, length)
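The start position of each SUBSTRING call is just the running total of the preceding widths plus one. The following Python sketch shows that bookkeeping; the function name split_fixed and the sample record are hypothetical, and treating a blank field as None is an assumption that mirrors the "empty values are NULLs" requirement.

```python
from itertools import accumulate


def split_fixed(record, widths):
    """Slice record into fields of the given widths; blank fields become None."""
    ends = list(accumulate(widths))   # e.g. [5, 12, 35]
    starts = [0] + ends[:-1]          # e.g. [0, 5, 12]
    fields = []
    for start, end in zip(starts, ends):
        chunk = record[start:end].strip()
        fields.append(chunk or None)
    return fields


# Hypothetical record: 5 digits, 7 blanks, then 23 characters.
record = "00001" + " " * 7 + "A" * 23
fields = split_fixed(record, [5, 7, 23])
```

In SQL terms the zero-based starts 0, 5 and 12 correspond to SUBSTRING start positions 1, 6 and 13.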

Select from column where value starts with 1

I have a table T
id val1
a 199.87
b 166.56
c 100.67
d 233.45
e 177.23
I want to select the rows where val1 starts with 1, and separately the rows where val1 starts with 19.
Is there any way of doing this in SQL Server?
The data type of val1 is float.
1) I want to select those rows where val1 starts with 1
SELECT
*
FROM Table_1
WHERE val1 LIKE '1%'
2) rows where val1 starts with 19.
SELECT
*
FROM Table_1
WHERE val1 LIKE '19%'
You can use the CONVERT or CAST function in the WHERE clause to cast the float to a character data type. The queries below will generate your desired result.
select * from table1
where convert(varchar,val) like '1%';
select * from table1
where cast(val as char(10)) like '1%';
You can use either of the above.
Result
id val
-----------
a 199,87
b 166,56
c 100,67
e 177,23
To return val like 19, just replace 1 with 19.
If your data values will never exceed 200, it is preferable not to convert them to another data type just to answer this query. Instead, leave them as floats and use a range comparison:
SELECT * FROM t WHERE val1 >= 100 AND val1 < 200
Similarly, for 19x.xx values, use val1 >= 190 AND val1 < 200.
Avoid converting data wherever possible: it adds overhead, and it means indexes on the column cannot be used, which can massively degrade query performance.
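A small Python sketch can show when the range comparison and the text-prefix test agree. The function names are made up for this illustration, and str() stands in for the SQL conversion to text, so this only demonstrates the logic, not SQL Server's exact formatting.

```python
def starts_with_text(value, prefix):
    """Prefix test on the text form of the number (the LIKE '1%' idea)."""
    return str(value).startswith(prefix)


def in_range(value, low, high):
    """Range test on the number itself (the >= low AND < high idea)."""
    return low <= value < high


values = [199.87, 166.56, 100.67, 233.45, 177.23]

by_text = [v for v in values if starts_with_text(v, "1")]
by_range = [v for v in values if in_range(v, 100, 200)]

# A value outside the 100-999 band shows where the two tests diverge.
small = 15.3
diverges = starts_with_text(small, "1") and not in_range(small, 100, 200)
```

For the sample data both lists contain the same four values. The two tests only diverge for numbers such as 15.3, which begin with 1 as text but fall outside the 100 to 200 range, so the range form relies on knowing the spread of the data.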
You can simply use a LIKE condition with % when filtering the rows you select from the table, like this:
SELECT * FROM TABLE_NAME WHERE LTRIM(STR(VAL1, 10, 5)) LIKE '19%'
This works for me whenever I need to select rows where a column's value contains a specific character or number.
EDIT: Used the STR() function to convert numeric data into character data. STR() right-justifies its result and pads it with leading spaces, so LTRIM() is needed for the '19%' prefix to match.
Use the LIKE operator to get your results. There is no need to convert the data type of val1.
rows where val1 starts with 1:
SELECT * FROM T WHERE val1 LIKE '1%';
rows where val1 starts with 19:
SELECT * FROM T WHERE val1 LIKE '19%';
This question already has many good answers; the one below takes a different approach.
create table table1(
id char(10),
val float
);
insert into table1 values('a',199.87);
insert into table1 values('b',166.56);
insert into table1 values('c',100.67);
insert into table1 values('d',233.45);
insert into table1 values('e',177.23);
Query
select * from table1 where (CHARINDEX( '1',val, 1))=1;
select * from table1 where (CHARINDEX( '19',val, 1))=1;
output
id val
a 199,87
b 166,56
c 100,67
e 177,23
id val
a 199,87

SQL Strings - Filter by Hyphen (x number)

I am trying to formulate a query that will allow me to find all records from a single column with 3 hyphens. An example of a record would be like XXXX-RP-XXXAS1-P.
I need to be able to sort through 1000s of records with either 2 or 3 hyphens.
You can REPLACE the hyphens in the string with an empty string and compute the difference of the length of original string and the replaced string to check for the number of hyphens.
select *
from yourtable
where len(column_name)-len(replace(column_name,'-',''))=3
and substring(column_name,9,1) not like '%[0-9]%'
If your records have 2 or 3 hyphens, then just do:
where col like '%-%-%-%'
This will get 3 or more hyphens. For exactly 3:
where col like '%-%-%-%' and col not like '%-%-%-%-%'
Try this:
declare @t table(col1 varchar(50))
insert into @t values ('A-B'),('A-B-C-D-E'),('A-B-C-D')
select * from
(SELECT *
,(len(col1) - len(replace(col1, '-', '')))
/ len('-') col2
FROM @t)t4
where col2=3
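The length-difference trick used in these answers can be checked quickly outside the database. This is a Python sketch of the same arithmetic, not SQL; the function name count_hyphens is an assumption for illustration, and the sample values are taken from the question and the answer above.

```python
def count_hyphens(value):
    """Count hyphens as original length minus length with hyphens removed."""
    return len(value) - len(value.replace("-", ""))


samples = ["A-B", "A-B-C-D-E", "A-B-C-D", "XXXX-RP-XXXAS1-P"]

# Keep only the values with exactly three hyphens.
exactly_three = [s for s in samples if count_hyphens(s) == 3]
```

Only 'A-B-C-D' and 'XXXX-RP-XXXAS1-P' are kept. Dividing by the length of the search string, as the SQL above does, only matters when the separator is longer than one character.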

SQL Server: How to display a specific character based on position in a column

I'm attempting to display a single character based on its position in a string from one column. Since this is grid data, there is simple math to it. The grid has 24 rows, 'A-X', and 44 columns.
Let's say I want to see the value in D9. I already know the expected value should be 'A1', so the character length is 2. If I do the math (rows A, B and C come first, so 3 x 44, plus 9), that two-character value for D9 starts at the 141st position of the string in Col2. I attempted to use SUBSTRING with no success:
SELECT
Col1 , SUBSTRING('Col2',141,2)
FROM Table1
The query result displays data for Col1, but for Col2 it's just blank. What am I missing?
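Asked too soon. Figured out I had to remove the ' from the column name: in quotes, 'Col2' is a four-character string literal rather than a column reference, so there is nothing at position 141.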
SELECT
Col1 , SUBSTRING('Col2',141,2)
FROM Table1
Didn't work
SELECT
Col1 , SUBSTRING(Col2,141,2)
FROM Table1
Works
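The position arithmetic from the question can be written out as a small helper. This is only a Python sketch: the function name cell_start and its parameters are hypothetical, and the width argument is an assumption, since the question does not say whether every cell occupies the same number of characters.

```python
def cell_start(row_letter, column, columns_per_row=44, width=1):
    """Return the 1-based start position of a grid cell in the flat string."""
    row_index = ord(row_letter.upper()) - ord("A")          # A -> 0, D -> 3
    cell_index = row_index * columns_per_row + (column - 1)  # cells before this one
    return cell_index * width + 1


d9_single = cell_start("D", 9)            # one character per cell
d9_double = cell_start("D", 9, width=2)   # two characters per cell
```

With one character per cell, D9 starts at position 141, matching the figure in the question. If every cell were two characters wide, the same cell would start at position 281, so the start argument passed to SUBSTRING depends on the cell width.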

pgsql parse string to get a string after certain position

I have a table column that has data like
NA_PTR_51000_LAT_CO-BOGOTA_S_A
NA_PTR_51000_LAT_COL_M_A
NA_PTR_51000_LAT_COL_S_A
NA_PTR_51000_LAT_COL_S_B
NA_PTR_51000_LAT_MX-MC_L_A
NA_PTR_51000_LAT_MX-MTY_M_A
I want to parse each column value so that I get the values in column_B. Thank you.
COLUMN_A COLUMN_B
NA_PTR_51000_LAT_CO-BOGOTA_S_A CO-BOGOTA
NA_PTR_51000_LAT_COL_M_A COL
NA_PTR_51000_LAT_COL_S_A COL
NA_PTR_51000_LAT_COL_S_B COL
NA_PTR_51000_LAT_MX-MC_L_A MX-MC
NA_PTR_51000_LAT_MX-MTY_M_A MX-MTY
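I'm not sure about PostgreSQL, and I can't get SQL Fiddle to accept the schema build...
The substring position and length may vary, but here 17 is the length of the NA_PTR_51000_LAT_ prefix and 4 is the length of the trailing part such as _S_A:
Select Column_A, substr(Column_A,18,length(Column_A)-17-4) from tableName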
Ok how about this then:
http://sqlfiddle.com/#!15/ad0dd/56/0
Select column_A, b
from (
Select Column_A, b, row_number() OVER (ORDER BY column_A) AS k
FROM (
SELECT Column_A
, regexp_split_to_table(Column_A, '_') b
FROM test
) I
) X
Where k%7=5
Inside out:
The innermost select simply splits the data into multiple rows on _.
The middle select adds a row number so that we can use the mod operator to find every row whose number leaves a remainder of 5.
This ASSUMES that the section of data you're after is always the 5th segment AND that there are always 7 segments...
Use regexp_matches() with a search pattern like 'NA_PTR_51000_LAT_([^_]+)_'
This returns everything after NA_PTR_51000_LAT_ up to the next underscore, which is the value you are looking for. A greedy (.+) would instead run on to the last underscore in the string.
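The difference between a greedy group and a group that excludes underscores is easy to verify on the sample rows. The sketch below uses Python's re module rather than PostgreSQL; the helper name extract is an assumption for illustration, though greedy matching behaves the same way for these two patterns.

```python
import re

rows = [
    "NA_PTR_51000_LAT_CO-BOGOTA_S_A",
    "NA_PTR_51000_LAT_COL_M_A",
    "NA_PTR_51000_LAT_COL_S_A",
    "NA_PTR_51000_LAT_COL_S_B",
    "NA_PTR_51000_LAT_MX-MC_L_A",
    "NA_PTR_51000_LAT_MX-MTY_M_A",
]

# [^_]+ stops at the first underscore after the prefix.
STOP_AT_UNDERSCORE = re.compile(r"NA_PTR_51000_LAT_([^_]+)_")
# .+ is greedy and backtracks only as far as the last underscore.
GREEDY = re.compile(r"NA_PTR_51000_LAT_(.+)_")


def extract(value, pattern=STOP_AT_UNDERSCORE):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.search(value)
    return match.group(1) if match else None


column_b = [extract(v) for v in rows]
greedy_example = extract("NA_PTR_51000_LAT_COL_M_A", GREEDY)
```

The [^_]+ version yields the COLUMN_B values from the question, while the greedy version returns COL_M for the second row.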