How do I load <file name>.csv.gz from a Snowflake stage into a Snowflake table?

I have successfully loaded 1,000 files into a Snowflake stage named MT_STAGE.
Every file has the exact same schema.
Every file follows the exact same naming convention: (filename).csv.gz.
Every file is about 50 MB (+/- a couple of MB).
Every file has between 115k and 120k records.
Every file has 184 columns.
I have created a Snowflake table named MT_TABLE.
I keep getting errors trying to do a COPY INTO to move the files from the stage into that single table.
I've tried countless variations of the command, with and without different options. I've spent three days reading documentation and watching videos, and I have failed. Can anyone help?
copy into MT_TABLE from @MT_STAGE;
Copy executed with 0 files processed
copy into MT_TABLE from @MT_STAGE (type=csv field_delimiter=”,” skip_header=1);
Syntax error: unexpected '('. (line 1)
copy into MT_TABLE from @MT_STAGE type=csv field_delimiter=”,” skip_header=1;
Syntax error: unexpected '”,'. (line 1)
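For what it's worth, both syntax errors point at the same two problems: the format options have to be wrapped in a FILE_FORMAT = ( ... ) clause (hence the unexpected '('), and the delimiter has to be a straight single-quoted string rather than the curly quotes ” that the last error is complaining about. A minimal sketch, assuming the stage and table names above and that each file's header row should be skipped; gzip compression is detected automatically for .csv.gz files:

copy into MT_TABLE
from @MT_STAGE
file_format = (type = csv field_delimiter = ',' skip_header = 1);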

So, as per Mike's statement, if there are commas in your data, consider three columns with these intended values:

col_a: no comma
col_b: one, comma
col_c: two,, commas

Written out as a CSV header line and data line, that becomes:

col_a, col_b, col_c
no comma, one, comma, two,, commas

How can anything tell which value belongs to which column? The data line could be split into three columns in any of these ways:

[no comma]       [one, comma]        [two,, commas]
[no comma, one]  [, comma]           [two,, commas]
[no comma]       [one, comma, two]   [, commas]
[no comma, one]  [, comma, two]      [, commas]
[no comma]       [one, comma, two,]  [commas]
[no comma, one]  [, comma, two,]     [commas]

Which is the correct split?
So you either change the field delimiter from , to pipe |, or you quote the data.
Pipe delimiter:
no comma| one, comma| two,, commas
Double quotes:
"no comma","one, comma"," two,, commas"
Single quotes:
'no comma','one, comma',' two,, commas'
The thing is, if you change your column delimiter, it must not appear in the data OR the data has to be quoted.
And if you switch to quoting, the quote character must not appear in the field OR it has to be escaped.
OR you can encode the values in some safe representation like base64; it takes more space, but now it's transport safe:
bm8gY29tbWE,IG9uZSwgY29tbWE,IHR3bywsIGNvbW1hcw
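Tying that back to the original COPY INTO question: if the files quote the fields that contain commas, Snowflake can be told about it with FIELD_OPTIONALLY_ENCLOSED_BY. A minimal sketch, assuming double-quoted fields and the stage/table names from the question:

copy into MT_TABLE
from @MT_STAGE
file_format = (type = csv
               field_delimiter = ','
               skip_header = 1
               field_optionally_enclosed_by = '"');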

Related

How to break a string apart by a character that occurs multiple times in SQL

I am looking to break a column into two columns by a character. Some of the rows have the character occurring multiple times, and I need to key on the last occurrence.
Example:
400000007_MOD-HUD_1-1.jpg
I want to break into two columns
Column 1: 400000007_MOD-HUD_1
Column 2: -1.jpg
The data looks like this,
200000297_R-1_1-1.jpg
400000007_MOD-HUD_1-1.jpg
500000334_R-1_1-1.jpg
500000334_R-2_MOD_HUD_1-1.jpg
500000342_MOD-HUD_1-1.jpg
1200000177_MOD-HUD_1-1.jpg
1300000433_C-1-EQSHED_1-1.jpg
1300000433_C-3-UB_1-1.jpg
2100000375_C_1-5_Barn_1-1.jpg
The character I want to split them by is "-". This character occurs multiple times in some of these file names, and I want to key on the last occurrence.
Here's a possible solution you can try, assuming (as the ssms tag implies) SQL Server and that you want to break on the last - character:
select Left(col,Len(col)-p) Col1, Right(col,p) Col2
from t
cross apply (values(CharIndex('-',Reverse(col))))x(p)
Demo DB<>Fiddle
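A minimal sketch to try this out, assuming a table named t with a single varchar column col (placeholder names carried over from the answer above):

-- sample rows from the question
create table t (col varchar(100));
insert into t values
  ('400000007_MOD-HUD_1-1.jpg'),
  ('1300000433_C-1-EQSHED_1-1.jpg');

-- CharIndex on the reversed string gives the position of the last '-' counted from the right
select Left(col, Len(col) - p) as Col1, Right(col, p) as Col2
from t
cross apply (values (CharIndex('-', Reverse(col)))) x(p);
-- Col1 = 400000007_MOD-HUD_1,     Col2 = -1.jpg
-- Col1 = 1300000433_C-1-EQSHED_1, Col2 = -1.jpg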

Delete specific pattern between commas in text file

I have thousands of SQL queries written in Notepad++, one query per line. Every SQL query contains a list of columns to be selected from the database as comma-separated values. Now we want certain columns that follow a specific pattern/regular expression to be removed from that list. The SQL queries follow a specific pattern:
A trimmed column is selected with the alias 'PK'.
Every query has a 'dated' where condition at the end of it.
Sometimes the pattern we wish to remove exists in the PK expression, in the where clause, or in both. We don't want to remove that column/pattern from those places, just from the column selection list.
Below is the example of a SQL query :
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA,TAE_RID_OWNER,TAE_FID_OWNER,TAE_CID_OWNER,TAE_TSP_REC_UPDATE from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
After removal of columns/patterns query should look like below :
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
We want to remove the patterns below from each and every query, between the commas:
.FID.
.RID.
.CID.
.TSP.
If the pattern exists within the TRIM/DATE function, it should not be touched. It should only be removed from the column selection list.
Could somebody please help me with the above? Thanks in advance.
You may use
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$))(?:(?!\sfrom\s).)*?\K,?\s*[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+
Details
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$)) - two alternatives:
\G(?!^) - the end of the previous match, but not a position at the start of the line
| - or
\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$) - an as surrounded with single whitespace chars that is followed with any 0+ chars other than line break chars and then ', 2 digits, /, 2 digits, /, 4 digits and ' at the end of the line
(?:(?!\sfrom\s).)*? - consumes any char other than a line break char, 0 or more repetitions, as few as possible, as long as it does not start a whitespace + from + whitespace sequence
\K - a match reset operator discarding all text matched so far
,?\s* - an optional comma followed with 0+ whitespace chars
[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+ - uppercase ASCII letters or/and _, 1 or more occurrences, followed with _, then F, R or C followed with ID, or TSP, then _, and again 1 or more occurrences of uppercase ASCII letters or/and _.
See the regex demo.
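If it helps, applying this in Notepad++ (assuming the replacement is done there, as the question implies) means opening Search > Replace, setting Search Mode to Regular expression, pasting the pattern above into "Find what", leaving "Replace with" empty, and clicking Replace All.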

SQL find all rows that do not have certain characters

I want to find all rows for which the values in a string column do not contain certain characters (to be specific, [a-Z0-9 \t\n]). How can I do it in SQL?
I tried to do it with the LIKE operator:
SELECT ***
where column like '%[^ a-Z0-9 \t\n]%'
However, it does not work, and I get rows that contain letters and numbers.
To fetch all records that contain any characters other than letters, numbers, spaces, tabs, and newlines:
SELECT ***
WHERE column like '%[^A-Za-z0-9 \t\n]%'
Note that [^A-Za-z0-9 \t\n] represents anything other than alphanumeric characters, spaces, tabs, and newlines.
Your logic is inverted. I think you want:
where column not like '%[^ a-Z0-9 \t\n]%'
I don't think that SQL Server interprets \t and \n as special characters. You may need to insert the actual values for the characters. (See here.)
SELECT ***
WHERE column like '%[^A-Za-z0-9 \t\n]%'
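Since SQL Server does not expand \t and \n inside a LIKE pattern, one option (a sketch, using placeholder table and column names) is to build the bracket expression with CHAR():

SELECT *
FROM my_table                 -- placeholder table name
WHERE my_column LIKE '%[^A-Za-z0-9 ' + CHAR(9) + CHAR(10) + CHAR(13) + ']%';
-- CHAR(9) = tab, CHAR(10) = line feed, CHAR(13) = carriage return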

Big Query Table Creation Confusion

I have to create a BigQuery table with the following schema:
snippet:STRING,comment_date:TIMESTAMP
And I have data as follows:
"Love both of these brands , but the "" buy a $100k car , get or give a pair of $40 shoes "" message seems .",2015-06-22 00:00:00
"All Givens Best Commercial Ever",2015-06-22 00:00:00
I was confused because both rows were accepted and inserted into the table, although in the first line the whole snippet string is between double quotes and it also contains double quotes and commas inside.
Why does BigQuery not get confused there?
When parsing CSV, BigQuery splits only on unquoted commas, and it treats double quotes "" as a single escaped quote character " when encountered inside a quoted string. So your input is valid CSV according to BigQuery.
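Concretely, the first line parses into exactly two fields; the doubled quotes inside the quoted snippet collapse to single quote characters:

snippet      = Love both of these brands , but the " buy a $100k car , get or give a pair of $40 shoes " message seems .
comment_date = 2015-06-22 00:00:00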

SQL loader to load data into specific column of a table

I recently started working with SQL*Loader and am enjoying the way it works.
We are stuck with a problem where we have to load all the columns from a CSV file (say, 10 columns in Excel), but the destination table contains around 15 fields.
FILLER works when you want to skip columns in the source file, but I'm unsure what to do here.
Using a staging table helps, but is there any alternative?
Any help is really appreciated.
Thanks.
You have to specify the columns in the control file.
Recommended reading: SQL*Loader Control File Reference
The remainder of the control file contains the field list, which provides information about column formats in the table being loaded. See Chapter 6 for information about that section of the control file.
Excerpt from Chapter 6:
Example 6-1 Field List Section of Sample Control File
1 (hiredate SYSDATE,
2 deptno POSITION(1:2) INTEGER EXTERNAL(2)
NULLIF deptno=BLANKS,
3 job POSITION(7:14) CHAR TERMINATED BY WHITESPACE
NULLIF job=BLANKS "UPPER(:job)",
mgr POSITION(28:31) INTEGER EXTERNAL
TERMINATED BY WHITESPACE, NULLIF mgr=BLANKS,
ename POSITION(34:41) CHAR
TERMINATED BY WHITESPACE "UPPER(:ename)",
empno POSITION(45) INTEGER EXTERNAL
TERMINATED BY WHITESPACE,
sal POSITION(51) CHAR TERMINATED BY WHITESPACE
"TO_NUMBER(:sal,'$99,999.99')",
4 comm INTEGER EXTERNAL ENCLOSED BY '(' AND '%'
":comm * 100"
)
In this sample control file, the numbers that appear to the left would not appear in a real control file. They are keyed in this sample to the explanatory notes in the following list:
1 SYSDATE sets the column to the current system date. See Setting a Column to the Current Date.
2 POSITION specifies the position of a data field. See Specifying the Position of a Data Field.
INTEGER EXTERNAL is the datatype for the field. See Specifying the Datatype of a Data Field and Numeric EXTERNAL.
The NULLIF clause is one of the clauses that can be used to specify field conditions. See Using the WHEN, NULLIF, and DEFAULTIF Clauses.
In this sample, the field is being compared to blanks, using the BLANKS parameter. See Comparing Fields to BLANKS.
3 The TERMINATED BY WHITESPACE clause is one of the delimiters it is possible to specify for a field. See TERMINATED Fields.
4 The ENCLOSED BY clause is another possible field delimiter. See Enclosed Fields.
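Back to the question: when the CSV has 10 columns but the table has 15, one option is simply to list only the fields that are actually in the file; table columns not named in the field list are left NULL (or take their defaults). A minimal sketch with placeholder file, table, and column names:

LOAD DATA
INFILE 'mt_data.csv'
APPEND INTO TABLE target_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  col01, col02, col03, col04, col05,
  col06, col07, col08, col09, col10,
  load_date SYSDATE   -- a table column not present in the file can still be filled from an expression
)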