I'm using a stored procedure to bulk insert a large CSV file into a table, all into one field defined as varchar(8000). I've had to do it this way because some of the data is enclosed in quotation marks and some is not. In SQL Server 2008, to be usable as a data file for bulk import, a CSV file must comply with the following restrictions:
Data fields never contain the field terminator.
Either none or all of the values in a data field are enclosed in quotation marks ("").
My data is thus:
Field1
"data", "data2", "data3", "data4", 123, 567, 354, 5,64,4565,54
This is now in a temp table in SQL Server. How do I now clean the data and insert it into a table so it looks like the below? (I already have this new table set up with the correct headings.)
Field1    Field2    Field3
data      data2     data3
And so on.
Ultimately it all needs to be performed in a stored procedure, because it will be used from Reporting Services. I've been looking at the string functions, but how do I make them work when some of the fields do not have double quotes? Is the comma alone enough of a delimiter? Also, is the XML function the best approach?
I take the following approach to importing data:
Load the data into a staging table where all the columns are character strings.
Copy the data from there into the final table, with the column types and structure that I want.
In your case, the second part of the code would be:
insert into FinalTable(col1, . . . )
    select (case when left(col1, 1) = '"' then replace(col1, '"', '')
                 else col1
            end),
           (case when isnumeric(col2) = 1 then cast(col2 as float)
            end),
           . . .
    from StagingTable;
There are, no doubt, solutions in SSIS or using a format file. I prefer a staging table approach because I find it easier to debug data issues from the database using the staging table.
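For reference, the first step (loading each raw line into the staging table, which the question has already done) can be sketched as a plain BULK INSERT into a one-column staging table. The file path and table name here are assumptions:

```sql
-- Hypothetical staging table: one wide varchar column holds each raw CSV line
CREATE TABLE StagingTable (col1 VARCHAR(8000));

-- Bulk-load the file, one full line per row; the path is an assumption
BULK INSERT StagingTable
FROM 'C:\data\import.csv'
WITH (ROWTERMINATOR = '\n');
```

With no FIELDTERMINATOR specified, each line lands in col1 intact, sidestepping the mixed-quoting restriction; the cleanup then happens in the second-step insert shown above.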
I am new to SQL, and I was wondering if there is any way to search a table for rows matching values from the columns of an external table (a CSV file). To explain more clearly, this is what I'm working on: one column in the CSV file contains latitudes and another contains longitudes; I want to search a table that contains information about several locations, with their latitudes and longitudes specified in the table.
I want to retrieve the information about a particular location, with the latitudes and longitudes as input from the CSV file.
Would it look something like this? :
CREATE TABLE MyTable(
    latitude DECIMAL(5, 2) NOT NULL,
    longitude DECIMAL(5, 2) NOT NULL
);

LOAD DATA INFILE 'C:\Users\Admin\Desktop\Catalog.csv'
INTO TABLE MyTable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

SELECT
    main.object_id,
    main.latitude,
    main.longitude,
    main.Some_Information
FROM
    location_info AS main,
    MyTable AS temp
WHERE
    main.latitude = temp.latitude AND
    main.longitude = temp.longitude;
I also tried using psql's \copy like:
\copy MyTable FROM 'C:\Users\Admin\Desktop\Catalog.csv' WITH CSV;
as described here: http://postgresguide.com/utilities/copy.html.
But this didn't work either: there was a syntax error at or near "\". That could be because of an older version of psql.
Also I am not a Superuser, hence the use of \copy and not COPY FROM.
I also tried using a temporary table and using \copy alongside it. It gave the same error as above.
PostgreSQL does not support the LOAD DATA syntax you're using. You'll need to use COPY instead.
Your workflow should look more like:
CREATE TABLE MyTable(
    latitude DECIMAL(5, 2) NOT NULL,
    longitude DECIMAL(5, 2) NOT NULL
);

COPY MyTable(latitude, longitude)
FROM 'C:\Users\Admin\Desktop\Catalog.csv' WITH CSV;

SELECT
    main.object_id,
    main.latitude,
    main.longitude,
    main.Some_Information
FROM
    location_info AS main
    JOIN MyTable AS temp
        ON main.latitude = temp.latitude
        AND main.longitude = temp.longitude;
There are three main things to notice here.
I've removed the \ from the COPY command.
I've specified the columns you're inserting into in the COPY command. If the columns are in a different order in the CSV file, simply reorder them in the COPY column list.
I've changed your join to a standard ANSI join. The logic is the same, but this is the better standard for readability and compatibility.
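One caveat worth noting: server-side COPY ... FROM 'file' reads the file on the database server's filesystem and typically requires superuser privileges. Since you mentioned you are not a superuser, the client-side psql equivalent of the COPY step would be the \copy meta-command (entered inside psql on one line, not sent as SQL):

```sql
-- psql meta-command: reads the file on the *client* machine,
-- so no superuser privilege is required
\copy MyTable(latitude, longitude) FROM 'C:\Users\Admin\Desktop\Catalog.csv' WITH CSV
```

The SELECT with the join then runs unchanged, since \copy populates the same table.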
I need to create an external table for an HDFS location. The data has the string null instead of an empty space for a few fields. For such fields, if the field length is less than 4, an error is thrown when selecting the data. Is there a way to replace all such nulls with an empty space when creating the table itself?
I am trying this in Greenplum; I just tagged hive to see what can be done for such cases in Hive.
You could use the serialization property to map the NULL string to an empty string:
CREATE TABLE IF NOT EXISTS abc (
    -- column definitions go here
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
TBLPROPERTIES ("serialization.null.format" = "");
In this case, when you query it from Hive you will get an empty value for that field, while HDFS will have "\N".
Or
If you want an empty string represented instead of '\N', you can use the COALESCE function:
INSERT OVERWRITE TABLE tabname SELECT NULL, COALESCE(NULL, "") FROM data_table;
The answer to the problem is using the NULL AS 'null' clause in the CREATE TABLE syntax for Greenplum. As I mentioned, I wanted to get input from people who have faced such issues in Hive, so I tagged hive as well. Greenplum's external table syntax supports a NULL AS phrase in which you can specify the form of NULL that you want to keep.
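As a sketch of that NULL AS phrase (the table name, columns, and gpfdist location below are made-up assumptions), a Greenplum readable external table might look like:

```sql
-- Hypothetical example: the NULL AS phrase tells Greenplum which token
-- in the data file represents NULL, instead of the default '\N'
CREATE EXTERNAL TABLE ext_abc (
    col1 text,
    col2 text
)
LOCATION ('gpfdist://etlhost:8081/data.txt')
FORMAT 'TEXT' (DELIMITER '|' NULL AS 'null');
```

With NULL AS 'null', the literal string null in the file is treated as NULL on load rather than as a four-character value.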
I am importing a CSV file into a Hive table.
About the CSV file: column values are enclosed in double quotes and separated by commas.
Sample record from csv
"4","good"
"3","not bad"
"1","very worst"
I created a Hive table with the following statement:
create external table currys (review_rating string, review_comment string) row format delimited fields terminated by ',';
The table was created.
Then I loaded the data using LOAD DATA LOCAL INPATH, and it succeeded.
when I query the table,
select * from currys;
The result is :
"4" "good"
"3" "not bad"
"1" "very worst"
instead of
4 good
3 not bad
1 very worst
The records are stored with the double quotes, which they shouldn't be.
Please let me know how to get rid of these double quotes. Any help or guidance is highly appreciated.
Thanks beforehand!
Are you using any SerDe? If so, you can write a regex in the SERDEPROPERTIES to strip the quotes.
Or you can use the csv-serde from here and define the quote character.
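Using the CSV SerDe shipped with recent Hive versions (OpenCSVSerde), the table definition might look like this sketch, which keeps the question's column names:

```sql
-- The SerDe strips the configured quote character on read,
-- so "4","good" is returned as 4 and good
CREATE EXTERNAL TABLE currys (
    review_rating  STRING,
    review_comment STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = ",",
    "quoteChar"     = "\""
);
```

After loading the same file into this table, select * from currys should return the values without the surrounding quotes.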
I have written a SQL script containing the query below. The query works fine.
update partner set is_seller_buyer=1 where id in (select id from partner
where names in
(
'A','B','C','D','E',... -- Around 100 names.
));
But now, instead of hard-coding around 100 names in the query itself, I want to fetch all the names from a CSV file. I read about SQL*Loader on the Internet, but I did not find much about using it with an update query.
My CSV file contains only names.
I have tried
load data
infile 'c:\data\mydata.csv'
into table partner set is_wholesaler_reseller=1
where id in (select id from partner
where names in
(
'A','B','C','D','E',... -- Around 100 names.
));
fields terminated by "," optionally enclosed by '"'
( names, sal, deptno )
How can I achieve this?
SQL*Loader does not perform updates, only inserts. So, you should insert your names into a separate table, say names, and run your update from that:
update partner set is_seller_buyer=1 where id in (select id from partner
where names in
(
select names from names
));
Your loader script can be changed to:
load data
infile 'c:\data\mydata.csv'
into table names
fields terminated by "," optionally enclosed by '"'
( names )
An alternative is to use external tables, which let Oracle treat a flat file as if it were a table. An example to get you started can be found here.
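As a minimal sketch of that external-table approach (the directory path, directory object, and table name are assumptions, not from the original):

```sql
-- Hypothetical directory object pointing at the folder holding the CSV
CREATE OR REPLACE DIRECTORY data_dir AS 'c:\data';

-- External table: Oracle reads mydata.csv on the fly, no load step needed
CREATE TABLE names_ext (
    names VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    )
    LOCATION ('mydata.csv')
);
```

The update can then select directly from names_ext in its subquery, with no separate SQL*Loader run.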
We need to store a select statement in a table:
select * from table where col = 'col'
But the single quotes mess the insert statement up.
Is it possible to do this somehow?
From Oracle 10g onwards there is an alternative to doubling up the single quotes:
insert into mytable (mycol) values (q'"select * from table where col = 'col'"');
I used a double-quote character ("), but you can specify a different one e.g.:
insert into mytable (mycol) values (q'#select * from table where col = 'col'#');
The syntax of the literal is:
q'<special character><your string><special character>'
It isn't obviously more readable in a small example like this, but it pays off with larger quantities of text, e.g.:
insert into mytable (mycol) values (
q'"select empno, ename, 'Hello' message
from emp
where job = 'Manager'
and name like 'K%'"'
);
How are you performing the insert? If you are using any sort of provider on the front end, then it should format the string for you so that quotes aren't an issue.
Basically, create a parameterized query and assign the value of the SQL statement to the parameter class instance, and let the db layer take care of it for you.
You can either use two quotes '' to represent a single quote ', or (with 10g+) you can use the new quoting notation:
SQL> select ' ''foo'' ' txt from dual;
TXT
-------
'foo'
SQL> select q'$ 'bar' $' txt from dual;
TXT
-------
'bar'
If you are using a programming language such as Java or C#, you can use prepared (parameterized) statements to pass your values in and retrieve them.
If you are in SQL*Plus, you can escape the apostrophe like this:
insert into my_sql_table (sql_command)
values ('select * from table where col = ''col''');
Single quotes are escaped by duplicating them:
INSERT INTO foo (sql) VALUES ('select * from table where col = ''col''')
However, most database libraries provide bind parameters so you don't need to care about these details:
INSERT INTO foo (sql) VALUES (:sql)
... and then you assign a value to :sql.
Don't store SQL statements in a database!
Store SQL views in the database instead. Put them in a separate schema if you have to keep things clean. Nothing good will ever come of storing SQL statements in a database; short of logging, this is categorically a bad idea.
Also, if you're using 10g and you must do this, do it right! Per the FAQ, use the 10g quoting mechanism:
Syntax
q'[QUOTE_CHAR]Text[QUOTE_CHAR]'
Make sure that the QUOTE_CHAR doesn't exist in the text.
SELECT q'{This is Orafaq's 'quoted' text field}' FROM DUAL;