Splitting comma separated values to multiple columns - sql

I am trying to split a comma separated column to multiple columns using SQL such that each separated value is under its own column on Snowflake
This is a sample of my table
"2015-01-01","00:52:44",161144,"3.1.0","x86_64","mingw32","Formula","1.1-2","US",1
This is what I have tried:
SELECT * FROM "TUTORIAL"."PUBLIC"."CRAN_LOGS" STRING_SPLIT('date','time','size','r_version','r_arch','r_os','package','version','country','ip_id',';')
This is giving me the error message:
SQL compilation error: syntax error line 1 at position 59 unexpected
''date''. syntax error line 1 at position 149 unexpected ')'.

You are looking for split_part
SPLIT_PART(<string>, <delimiter>, <partNumber>)
Just be careful with column names though. Some of them could be reserved keywords in SQL and might throw an error
with your_table as
(select 'date,time,size,r_version,r_arch,r_os,package,version,country,ip_id' as col)
select split_part(col,',',1) as date,
split_part(col,',',2) as time,
split_part(col,',',3) as size,
split_part(col,',',4) as r_version,
split_part(col,',',5) as r_arch,
split_part(col,',',6) as r_os,
split_part(col,',',7) as package,
split_part(col,',',8) as version,
split_part(col,',',9) as country,
split_part(col,',',10) as ip_id
from your_table;

Related

Why do I keep getting this Syntax Error in BigQuery

I keep getting this syntax error:
Syntax error: SELECT list must not be empty at [2:3]
on my SQL query:
Select
FROM 'tfdalissabuck.assessment_01.experiment'
WHERE experiment_cohort = 'treatment' AND minutes_listening_during_experiment > '60'
You need something between SELECT AND FROM, a single or multiple column names.
Also I believe you are using single quotes around the table name, you should be using a tick mark, and if minutes_listening_during_experiment is an integer you can remove the quotes around it
SELECT *
FROM `tfdalissabuck.assessment_01.experiment`
WHERE experiment_cohort = 'treatment'
AND minutes_listening_during_experiment > 60

Pulling data out of a text field

I have a notetext field that will have 2 fields in it: price and desc
The price is never the same amount of characters and is in text format
Example: (51445 text description) or (9801 text description)
How do I set it up to pull the amount out and then have it formatted with 2 decimals?
I have tried: LEFT(note_text,CHARINDEX(' ', note_text) - 1) AS Incoming_Cust_Prc but this isn't working. I am getting an error
ORA-00904 LEFT invalid identifier
The error message suggests that the DBMS is Oracle rather than SQL Server in which there are functions such as LEFT() and CHARINDEX(). Thus, use Oracle functions as
SELECT TO_CHAR(SUBSTR(note_text,1,INSTR(note_text,' ')), 'fm999G999D00') AS Incoming_Cust_Prc,
SUBSTR(note_text,INSTR(note_text,' ')+1,LENGTH(note_text)) AS Incoming_Cust_Desc
FROM t
in order to get two seperate columns price and description.
Demo

Getting an error: Argument data type varchar is invalid for argument 2 of substring function

I'm trying to get a substring from the value of a column and I'm getting the following error Argument data type varchar is invalid for argument 2 of substring function.
The column type is NvarChar(50) and is a system column for an application, so I can't modify it.
Ideally I'd just be able to select the substring as part of the query without having to alter the table, or create a view or another table.
Here's my query
SELECT SUBSTRING(INVOICE__, ':', 1)
FROM dwsystem.dbo.DWGroup
Im trying to select only everything in the string after a specific character. In this case the : character.
Use charindex with : as the first argument
select substring(invoice__,charindex(':',invoice__)+1,len(invoice__))
from dwsystem.dbo.dwgroup
SUBSTRING parameter is start position and end position so both parameter will be number like below
SELECT SUBSTRING(INVOICE__, 1, 1)
FROM dwsystem.dbo.DWGroup
you can use SUBSTRING_INDEX as you used mysql
SELECT SUBSTRING_INDEX(INVOICE__,':',-1);
example
SELECT SUBSTRING_INDEX('mytestpage:info',':',-1); it will return
info

Invalid digits on Redshift

I'm trying to load some data from stage to relational environment and something is happening I can't figure out.
I'm trying to run the following query:
SELECT
CAST(SPLIT_PART(some_field,'_',2) AS BIGINT) cmt_par
FROM
public.some_table;
The some_field is a column that has data with two numbers joined by an underscore like this:
some_field -> 38972691802309_48937927428392
And I'm trying to get the second part.
That said, here is the error I'm getting:
[Amazon](500310) Invalid operation: Invalid digit, Value '1', Pos 0,
Type: Long
Details:
-----------------------------------------------
error: Invalid digit, Value '1', Pos 0, Type: Long
code: 1207
context:
query: 1097254
location: :0
process: query0_99 [pid=0]
-----------------------------------------------;
Execution time: 2.61s
Statement 1 of 1 finished
1 statement failed.
It's literally saying some numbers are not valid digits. I've already tried to get the exactly data which is throwing the error and it appears to be a normal field like I was expecting. It happens even if I throw out NULL fields.
I thought it would be an encoding error, but I've not found any references to solve that.
Anyone has any idea?
Thanks everybody.
I just ran into this problem and did some digging. Seems like the error Value '1' is the misleading part, and the problem is actually that these fields are just not valid as numeric.
In my case they were empty strings. I found the solution to my problem in this blogpost, which is essentially to find any fields that aren't numeric, and fill them with null before casting.
select cast(colname as integer) from
(select
case when colname ~ '^[0-9]+$' then colname
else null
end as colname
from tablename);
Bottom line: this Redshift error is completely confusing and really needs to be fixed.
When you are using a Glue job to upsert data from any data source to Redshift:
Glue will rearrange the data then copy which can cause this issue. This happened to me even after using apply-mapping.
In my case, the datatype was not an issue at all. In the source they were typecast to exactly match the fields in Redshift.
Glue was rearranging the columns by the alphabetical order of column names then copying the data into Redshift table (which will
obviously throw an error because my first column is an ID Key, not
like the other string column).
To fix the issue, I used a SQL query within Glue to run a select command with the correct order of the columns in the table..
It's weird why Glue did that even after using apply-mapping, but the work-around I used helped.
For example: source table has fields ID|EMAIL|NAME with values 1|abcd#gmail.com|abcd and target table has fields ID|EMAIL|NAME But when Glue is upserting the data, it is rearranging the data by their column names before writing. Glue is trying to write abcd#gmail.com|1|abcd in ID|EMAIL|NAME. This is throwing an error because ID is expecting a int value, EMAIL is expecting a string. I did a SQL query transform using the query "SELECT ID, EMAIL, NAME FROM data" to rearrange the columns before writing the data.
Hmmm. I would start by investigating the problem. Are there any non-digit characters?
SELECT some_field
FROM public.some_table
WHERE SPLIT_PART(some_field, '_', 2) ~ '[^0-9]';
Is the value too long for a bigint?
SELECT some_field
FROM public.some_table
WHERE LEN(SPLIT_PART(some_field, '_', 2)) > 27
If you need more than 27 digits of precision, consider a decimal rather than bigint.
If you get error message like “Invalid digit, Value ‘O’, Pos 0, Type: Integer” try executing your copy command by eliminating the header row. Use IGNOREHEADER parameter in your copy command to ignore the first line of the data file.
So the COPY command will look like below:
COPY orders FROM 's3://sourcedatainorig/order.txt' credentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter '\t' IGNOREHEADER 1;
For my Redshift SQL, I had to wrap my columns with Cast(col As Datatype) to make this error go away.
For example, setting my columns datatype to Char with a specific length worked:
Cast(COLUMN1 As Char(xx)) = Cast(COLUMN2 As Char(xxx))

Error on query with "Replace"

I Have an error on a oracle server but I don't understand why is not work.
I use the software oracle sql developer.
The query is:
SELECT * FROM TestView WHERE REPLACE(TestView.Row2, '.', ',') > 0 ;
The value who is in TestVue.Row2 : '46.08','-46.47','1084.05','66500',...
"TestView" is a view who check to return a row without empty value
When I Execute the query I have always an error who says:
ORA-01722: invalid number
01722. 00000 - "invalid number"
*Cause:
*Action:
Thanks for your help
Zoners
You have a row in your table, which cannot be parsed into number format. Most likely it is a blank column.
SELECT * FROM TestVue WHERE TestVue.Row2 IS NULL ;
Use a NVL Function to avoid errors caused by null field and a TO_NUMBER function to cast into a number before comparision with zero
SELECT * FROM TestView WHERE TO_NUMBER(REPLACE(NVL(TestView.Row2,'0'), '.', ',')) > 0 ;
You should be able to find the row with something like:
select Row2
from TestVuw
where length(trim(translate(Row2, '+-.,0123456789', ' '))) = 0
Of course, this allows multiple +, -, ., and ,. It can be further refined using a regular expression, but this usually works to find the bad ones.