I have a CSV file, but I can't use this file - SQL

I have a CSV file, but when I try to import it into the database I can't: the data shifts into the wrong columns. I'm using SQL Server Management Studio and I have a lot of data. How can I import my file? This is my CSV file, and this is the view after import.

Rows that can't be loaded into the table because of invalid data or format can be handled with BULK INSERT (https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver16): specify an error file name, and any rows with errors will be written to that file. The code should look like this:
BULK INSERT SchoolsTemp
FROM 'C:\CSVData\Schools.csv'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',  -- CSV field delimiter
    ROWTERMINATOR = '\n',   -- used to move to the next row
    ERRORFILE = 'C:\CSVDATA\SchoolsErrorRows.csv',
    TABLOCK
)
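If the shifting is caused by commas inside quoted values (a common reason columns end up misaligned), SQL Server 2017 and later also let BULK INSERT parse quoted CSV natively via FORMAT = 'CSV' and FIELDQUOTE. A minimal sketch, assuming the same file and staging table as above:

-- Sketch: treat the file as quoted CSV so commas inside quoted values
-- do not push data into the wrong columns (SQL Server 2017+)
BULK INSERT SchoolsTemp
FROM 'C:\CSVData\Schools.csv'
WITH
(
    FORMAT = 'CSV',        -- parse as RFC 4180 style CSV
    FIELDQUOTE = '"',      -- default quote character, shown explicitly
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    ERRORFILE = 'C:\CSVDATA\SchoolsErrorRows.csv',
    TABLOCK
)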

Related

Trying to import CSV file to table in SQL

I have 4 CSV files, each having 500,000 rows. I am trying to import the CSV data into my Exasol database, but there is an error with the date column, and I have a problem with the unwanted first column in the files.
Here is an example CSV file:
unnamed:0 , time, lat, lon, nobs_cloud_day
0, 2006-03-30, 24.125, -119.375, 22.0
1, 2006-03-30, 24.125, -119.125, 25.0
The table I created to import csv to is
CREATE TABLE cloud_coverage_CONUS (
index_cloud DECIMAL(10,0)
,"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
The command to import is
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv';
But I get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3050: [Column=0 Row=0] [Transformation of value='Unnamed: 0' failed - invalid character value for cast; Value: 'Unnamed: 0'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:59205' FILE 'e12a96a6-a98f-4c0a-963a-e5dad7319fd5' ;'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
Alternatively I use this table (without the first column):
CREATE TABLE cloud_coverage_CONUS (
"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
And use this import code:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'(2 FORMAT='YYYY-MM-DD', 3 .. 5);
But I still get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3052: [Column=0 Row=0] [Transformation of value='time' failed - invalid value for YYYY format token; Value: 'time' Format: 'YYYY-MM-DD'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:60350' FILE '22c64219-cd10-4c35-9e81-018d20146222' (2 FORMAT='YYYY-MM-DD', 3 .. 5);'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
(I actually do want to ignore the first column in the files.)
How can I solve this issue?
Solution:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv' (2 .. 5) ROW SEPARATOR = 'CRLF' COLUMN SEPARATOR = ',' SKIP = 1;
I did not realise that MySQL syntax is different from Exasol's.
Looking at the first error message, a few things stand out. First we see this:
[Column=0 Row=0]
This tells us the problem is with the very first value in the file. This brings us to the next thing, where the message even tells us what value was read:
Transformation of value='Unnamed: 0' failed
So it's failing to convert Unnamed: 0. You also provided the table definition, where we see the first column in the table is a decimal type.
This makes sense. Unnamed: 0 is not a decimal. For this to work, the CSV data MUST align with the data types for the columns in the table.
But we also see this looks like a header row. Assuming everything else matches we can fix it by telling the database to skip this first row. I'm not familiar with Exasol, but according to the documentation I believe the correct code will look like this:
IMPORT INTO cloud_coverage_CONUS
FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'
(2 FORMAT='YYYY-MM-DD', 3 .. 5)
ROW SEPARATOR = 'CRLF'
COLUMN SEPARATOR = ','
SKIP = 1;
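If you would rather log bad rows than have the whole import abort, Exasol's IMPORT statement also documents an error clause (ERRORS INTO ... REJECT LIMIT ...). The sketch below is an assumption based on that documentation, and the error table name cloud_coverage_errors is made up here, so verify the exact syntax against the Exasol manual:

-- Rough sketch (verify against the Exasol docs): collect rejected rows
-- in a separate table and tolerate up to 100 of them before failing
IMPORT INTO cloud_coverage_CONUS
FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'
(2 FORMAT='YYYY-MM-DD', 3 .. 5)
ROW SEPARATOR = 'CRLF'
COLUMN SEPARATOR = ','
SKIP = 1
ERRORS INTO cloud_coverage_errors (CURRENT_TIMESTAMP)
REJECT LIMIT 100;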

Partitioning Data in SQL On-Demand with Blob Storage as Data Source

In Amazon Redshift there is a way to create a partition key when using your S3 bucket as a data source. Link.
I am attempting to do something similar in Azure Synapse using the SQL On-Demand service.
Currently I have a storage account that is partitioned such that it follows this scheme:
- Sales (folder)
  - 2020-10-01 (folder)
    - File 1
    - File 2
  - 2020-10-02 (folder)
    - File 3
    - File 4
To create a view and pull in all 4 files I ran the command:
CREATE VIEW testview3 AS
SELECT *
FROM OPENROWSET (
BULK 'Sales/*/*.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
DATA_SOURCE = 'AzureBlob',
FIELDTERMINATOR = ',',
FIRSTROW = 2
) AS tv1;
If I run a query of SELECT * FROM [myview] I receive data from all 4 files.
How can I go about creating a partition key so that I could run a query such as
SELECT * FROM [myview] WHERE folderdate > 2020-10-01
so that I can only analyze data from Files 3 and 4?
I know I can edit my OPENROWSET BULK statement but I want to be able to get all the data from my container at first and then constrain searches as needed.
Serverless SQL can parse partitioned folder structures using the filename function (when you want to load a specific file or files) and the filepath function (when you want to load all files in a given path). More information on syntax and usage is available in the online documentation.
In your case, you can read all files from '2020-10-01' onwards using the filepath syntax, for example filepath(1) > '2020-10-01'.
To expand on the answer from Raunak, I ended up with the following syntax for my query.
DROP VIEW IF EXISTS testview6
GO
CREATE VIEW testview6 AS
SELECT *,
r.filepath(1) AS [date]
FROM OPENROWSET (
BULK 'Sales/*/*.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
DATA_SOURCE = 'AzureBlob',
FIELDTERMINATOR = ',',
FIRSTROW = 2
) AS [r]
WHERE r.filepath(1) IN ('2020-10-02');
You can adjust the granularity of your partitioning by adding extra wildcards (*) and r.filepath(x) expressions.
For instance, you can create your query like this:
DROP VIEW IF EXISTS testview6
GO
CREATE VIEW testview6 AS
SELECT *,
r.filepath(1) AS [year],
r.filepath(2) as [month]
FROM OPENROWSET (
BULK 'Sales/*-*-01/*.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
DATA_SOURCE = 'AzureBlob',
FIELDTERMINATOR = ',',
FIRSTROW = 2
) AS [r]
WHERE r.filepath(1) IN ('2020')
AND r.filepath(2) IN ('10');
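If you prefer to keep the view unfiltered and constrain at query time (which matches the original goal of pulling in everything first and narrowing later), you can expose the filepath value as a column and filter on that column instead; serverless SQL can use such filters to limit which folders it actually reads. A sketch along those lines, with the view name testview_all made up here and the same data source and folder layout assumed:

DROP VIEW IF EXISTS testview_all
GO
CREATE VIEW testview_all AS
SELECT *,
r.filepath(1) AS [folderdate]   -- folder name, e.g. '2020-10-01'
FROM OPENROWSET (
BULK 'Sales/*/*.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
DATA_SOURCE = 'AzureBlob',
FIELDTERMINATOR = ',',
FIRSTROW = 2
) AS [r];
GO
-- Constrain the search as needed at query time
SELECT * FROM testview_all WHERE [folderdate] > '2020-10-01';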

Reading file containing multiple JSON objects using MS SQL Server

I need help reading a JSON file in MS SQL Server.
I have JSON in a format like this:
{
  "_id": {
    "$oid": "5c6ceb395916c77f71d9f531"
  },
  "uuid": "8337df01-7d98-4cdd-b5eb-7fafa88d3740",
  "firstName": "TESTFN",
  "lastName": "TESTLN",
  "middleName": "TESTMN",
  "gender": "MALE",
  "nationality": "",
  "mobileNumber": "",
  "card": "TACC-00001",
  "claims": {
    "$numberLong": "1"
  },
  "hisId": {
    "$numberLong": "0"
  },
  "hospitals": [
    {
      "$ref": "hospital",
      "$id": {
        "$oid": "5bd80e4e36ec8c600f3e750a"
      }
    }
  ]
}
I'm trying to read the JSON using the code below, and it works fine.
But when I use the actual source file, which repeats the format above and has a lot of fields, I only retrieve one row.
SELECT *
FROM OPENROWSET (BULK 'C:\demo_json\claims.json', SINGLE_CLOB) as j
CROSS APPLY OPENJSON(BulkColumn)
I searched and found that it is because of the multi-nested JSON.
So I used the code below:
SELECT *
FROM OPENROWSET (BULK 'C:\demo_json\claims.json', SINGLE_CLOB) as j
CROSS APPLY OPENJSON('['+BulkColumn+']')
But I'm getting this error: JSON text is not properly formatted. Unexpected character '{' is found at position 2760.
Note: I have no control over the source JSON file and can't edit it.
What can I do to make sure the JSON file is in the correct format, or to correct it so that I can read it?
Source file : claims
It seems that your input file has lines which are each a valid JSON object. One possible approach in this situation is to import the file using BULK INSERT with an appropriate ROWTERMINATOR:
Text file (claims.json):
{"_id":{"$oid":"5c6ceb395916c77f71d9f536"},"_class":"stash.ph.data.model.mongo.claims.Claims","uuid":"02735305-220e-4399-9217-0b99939797d2","submittedAt":{"$date":{"$numberLong":"1550641993123"}},"submittedByUuid":"c60d326e-0893-4718-a0e0-f64ac697dd2e","status":"SUBMITTED","billingDesignation":"Hospital","createdAt":{"$date":{"$numberLong":"1550641977093"}},"updatedAt":{"$date":{"$numberLong":"1550641977093"}},"particulars":[{"_id":{"$oid":"5c6ceb395916c77f71d9f530"},"uuid":"b2b3cc59-b9a8-49f1-8f92-30021d2c7abb","particular":"ER_CONSULTATION","procedure":[],"doctors":[{"_id":{"$oid":"5c6ceb395916c77f71d9f52f"},"uuid":"494e1bf9-a3f9-4c3a-bdc0-3ef15b4760a2","type":"PRIMARY","professionalFee":{"$numberDouble":"500.0"},"doctor":{"$ref":"doctor","$id":{"$oid":"5bd8156736ec8c600f3e7604"}}}]}],"patient":{"_id":{"$oid":"5c6ceb395916c77f71d9f531"},"uuid":"8337df01-7d98-4cdd-b5eb-7fafa88d3740","firstName":"TESTFN","email":"","lastName":"TESTLN","middleName":"TESTMN","suffix":"","prefix":"","birthday":"","age":"","gender":"MALE","nationality":"","mobileNumber":"","card":"TACC-00001","cardExpiration":"","company":"","claims":{"$numberLong":"1"},"phicNo":"","hisId":{"$numberLong":"0"},"hospitals":[{"$ref":"hospital","$id":{"$oid":"5bd80e4e36ec8c600f3e750a"}}],"hmos":[]},"files":[{"_id":{"$oid":"5c6ceb395916c77f71d9f533"},"uuid":"8731cb71-e424-4147-8158-38558df56ad5","filename":"8731cb71-e424-4147-8158-38558df56ad5.pdf","description":"files desc"}],"medicalData":{"_id":{"$oid":"5c6ceb395916c77f71d9f534"},"uuid":"617e193b-2280-491b-b34f-adf6eaeb2244","complain":[""],"diagnosis":["A90 | Dengue fever [classical dengue]"],"remarks":"","dateExamined":{"$date":{"$numberLong":"1546272000000"}}},"loa":{"_id":null,"uuid":"b4bc96d1-554b-4cc7-9511-c56d325cf5de","referenceId":""},"payment":{"_id":{"$oid":"5c6ceb395916c77f71d9f535"},"uuid":"9714fdf1-311f-4898-8070-f3fd1ddd0658","phicBill":{"$numberDouble":"0.0"},"hospitalBill":{"$numberDouble":"0.0"},"total":{"$numberDouble":"500.0"},"balance":{"$numberDouble":"500.0"},"createdAt":{"$date":{"$numberLong":"1550641977101"}},"updatedAt":{"$date":{"$numberLong":"1550641977101"}}},"hmoRepresentative":"","remarks":"","registryNumber":{"$numberLong":"0"},"soaNumber":"MBC-MCD-OUT-190000001","soaUuid":"2366abe1-672b-456b-bfd9-5584bf9e7f5a","batchName":"BATCH-190000001","claimLogs":[{"_id":{"$oid":"5c6ceb395916c77f71d9f532"},"uuid":"e2079956-0fbd-4565-9a93-0791a898fbcd","status":"PENDING","date":{"$date":{"$numberLong":"1550641977093"}},"userUuid":"c60d326e-0893-4718-a0e0-f64ac697dd2e","firstName":"Medicard","middleName":"","lastName":"Hospital"}],"messages":[],"hospital":{"$ref":"hospital","$id":{"$oid":"5bd80e4e36ec8c600f3e750a"}},"hmo":{"$ref":"hmo","$id":{"$oid":"598615f970d8a672a291132e"}}}
{"_id":{"$oid":"5c6ceda95916c77f71d9f574"},"_class":"stash.ph.data.model.mongo.claims.Claims","uuid":"73e6627e-187a-4f9f-a141-e4e01666c7c4","submittedAt":{"$date":{"$numberLong":"1550642786142"}},"submittedByUuid":"c60d326e-0893-4718-a0e0-f64ac697dd2e","status":"RETURN","billingDesignation":"Hospital","createdAt":{"$date":{"$numberLong":"1550642601507"}},"updatedAt":{"$date":{"$numberLong":"1550642601507"}},"particulars":[{"_id":{"$oid":"5c6ceda95916c77f71d9f56e"},"uuid":"8c774847-e0ca-4732-9669-6bf6a4579cc6","particular":"ER_CONSULTATION","procedure":[],"doctors":[{"_id":{"$oid":"5c6ceda95916c77f71d9f56d"},"uuid":"0520ffdf-ceb9-4413-8cec-c5eb728b8972","type":"PRIMARY","professionalFee":{"$numberDouble":"500.0"},"doctor":{"$ref":"doctor","$id":{"$oid":"5bd8156736ec8c600f3e7604"}}}]}],"patient":{"_id":{"$oid":"5c6ceda95916c77f71d9f56f"},"uuid":"e71116f6-710e-4e13-b806-c88cf172f5ca","firstName":"FGSDF","email":"","lastName":"FDGDS","middleName":"FGHDSF","suffix":"","prefix":"","birthday":"","age":"","gender":"MALE","nationality":"","mobileNumber":"","card":"TACC-00002","cardExpiration":"","company":"","claims":{"$numberLong":"1"},"phicNo":"","hisId":{"$numberLong":"0"},"hospitals":[{"$ref":"hospital","$id":{"$oid":"5bd80e4e36ec8c600f3e750a"}}],"hmos":[]},"files":[{"_id":{"$oid":"5c6ceda95916c77f71d9f571"},"uuid":"cc53c964-9440-413e-87b6-24fa86ccaa39","filename":"cc53c964-9440-413e-87b6-24fa86ccaa39.pdf","description":"files desc"}],"medicalData":{"_id":{"$oid":"5c6ceda95916c77f71d9f572"},"uuid":"d683a97b-f440-45a3-adef-b3df81be2fdc","complain":[""],"diagnosis":["A90 | Dengue fever [classical dengue]"],"remarks":"","dateExamined":{"$date":{"$numberLong":"1546358400000"}}},"loa":{"_id":null,"uuid":"3e929137-44c3-45f0-8cb5-fef1d83ff866","referenceId":""},"payment":{"_id":{"$oid":"5c6ceda95916c77f71d9f573"},"uuid":"6b0ed0d3-2993-4605-b4cd-421d84656921","phicBill":{"$numberDouble":"0.0"},"hospitalBill":{"$numberDouble":"0.0"},"total":{"$numberDouble":"500.0"},"balance":{"$numberDouble":"500.0"},"createdAt":{"$date":{"$numberLong":"1550642601512"}},"updatedAt":{"$date":{"$numberLong":"1550642601512"}}},"hmoRepresentative":"","remarks":"lack of documents","registryNumber":{"$numberLong":"0"},"approvedByUuid":"6fe1a2df-2a87-4b1e-a9b7-85a8b46eec19","soaNumber":"MBC-ILE-OUT-190000001","soaUuid":"6198678b-4254-4c89-a3d1-54e0366e751c","batchName":"BATCH-190000002","claimLogs":[{"_id":{"$oid":"5c6ceda95916c77f71d9f570"},"uuid":"96077dba-166b-47bd-a8d9-b06bee38e763","status":"PENDING","date":{"$date":{"$numberLong":"1550642601507"}},"userUuid":"c60d326e-0893-4718-a0e0-f64ac697dd2e","firstName":"Medicard","middleName":"","lastName":"Hospital"}],"messages":[],"hospital":{"$ref":"hospital","$id":{"$oid":"5bd80e4e36ec8c600f3e750a"}},"hmo":{"$ref":"hmo","$id":{"$oid":"5985c32870d8a672a291129d"}}}
Statement:
CREATE TABLE #Data (
BulkColumn nvarchar(max)
)
BULK INSERT #Data
FROM 'C:\demo_json\claims.json'
WITH (ROWTERMINATOR = '0x0A')
SELECT *
FROM #Data d
CROSS APPLY OPENJSON(d.BulkColumn) j
Notes:
You may try to use basic string transformations to build a valid JSON array, but reading the file line by line should be your first option:
SELECT *
FROM OPENROWSET (BULK 'C:\demo_json\claims.json', SINGLE_CLOB) d
CROSS APPLY OPENJSON('[' + REPLACE(d.BulkColumn, '}' + CHAR(10) + '{', '},{') + ']') j
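Once each claim lands in its own row, you can also project specific fields into typed columns with an explicit WITH clause instead of the default key/value/type output of OPENJSON. A minimal sketch against the #Data table built above; the JSON paths come from the sample data, while the column names and types are assumptions:

-- Project a few fields from each claim object into typed columns
SELECT j.*
FROM #Data d
CROSS APPLY OPENJSON(d.BulkColumn)
WITH (
claimUuid        nvarchar(50)  '$.uuid',
claimStatus      nvarchar(20)  '$.status',
patientFirstName nvarchar(100) '$.patient.firstName',
patientLastName  nvarchar(100) '$.patient.lastName',
patientCard      nvarchar(20)  '$.patient.card'
) AS j;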

SQL import 10000+ .csv files

Unfortunately I had issues with my storage and was forced to reacquire the data. However, it came in many .csv files, and I don't know how to import all of them without doing it one by one. I would like to load the 10,000+ .csv files into one table and would like help scripting all of the imports in one go.
All of the files have the same schema:
'Symbol' (varchar(15))
'Date' (Date)
'Open' (Float)
'High' (Float)
'Low' (Float)
'Close' (Float)
'Volume' (Int)
Also: All files will have the same structure for their naming:
XXXXXX_YYYYMMDD
(XXXXXX is the name of the market; I have 7 unique names)
Create Table [investment data 1].dbo.AA
(
Symbol varchar(15),
[Date] Date,
[Open] Float,
High Float,
Low Float,
[Close] Float,
Volume Int
)
At this point I do not know how to generate a loop that will look at all files in the "Investment Data" folder; the below example is the sample code for one .csv file. If there is a better way than "bulk insert" then I will modify the statement below.
bulk insert [investment data 1].dbo.AA
from 'R:\Investment Data\NASDAQ_20090626.csv'
with
(
firstrow=2
,rowterminator = '\n'
,fieldterminator = ','
)
Any help is appreciated; if I can be more clear please let me know. Thanks for your time.
Does what you wrote (for that one file) work?
Great.
Open a dos prompt
Navigate to the folder with your 10,000 files
type DIR /b >c:\temp\files.txt
Now install a decent text editor, like Notepad++ (these instructions are for notepad ++)
Open c:\temp\files.txt in that editor
Open the find/replace dialog, place a tick next to "Extended (\n, \r..." - this makes it match newlines, and support newlines in replacements
Put this in Find: \r\n
Put this in Replace: ' with(firstrow=2,rowterminator = '\\n',fieldterminator = ',');\r\nbulk insert [investment data 1].dbo.AA from 'R:\Investment Data\
This will make your list of files that used to look like this:
a.txt
b.txt
c.txt
d.txt
Look like this:
a.txt' with(firstrow=2,rowterminator = '\n',fieldterminator = ',')
bulk insert [investment data 1].dbo.AA from 'R:\Investment Data\b.txt' with(firstrow=2,rowterminator = '\n',fieldterminator = ',');
bulk insert [investment data 1].dbo.AA from 'R:\Investment Data\c.txt' with(firstrow=2,rowterminator = '\n',fieldterminator = ',');
bulk insert [investment data 1].dbo.AA from 'R:\Investment Data\d.txt' with(firstrow=2,rowterminator = '\n',fieldterminator = ',');
bulk insert [investment data 1].dbo.AA from 'R:\Investment Data\
Now just clean up the first and last lines so it's proper SQL, then paste and run it in SSMS.
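For clarity, after that cleanup the generated script should look roughly like this (using the placeholder file names from the example above):

bulk insert [investment data 1].dbo.AA from 'R:\Investment Data\a.txt' with(firstrow=2,rowterminator = '\n',fieldterminator = ',');
bulk insert [investment data 1].dbo.AA from 'R:\Investment Data\b.txt' with(firstrow=2,rowterminator = '\n',fieldterminator = ',');
bulk insert [investment data 1].dbo.AA from 'R:\Investment Data\c.txt' with(firstrow=2,rowterminator = '\n',fieldterminator = ',');
bulk insert [investment data 1].dbo.AA from 'R:\Investment Data\d.txt' with(firstrow=2,rowterminator = '\n',fieldterminator = ',');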

Exporting data containing line feeds as CSV from PostgreSQL

I'm trying to export data from PostgreSQL to CSV.
First I created the query and tried exporting from pgAdmin with File -> Export to CSV. The resulting CSV is wrong; for example, it contains:
The header: Field1;Field2;Field3;Field4
The rows start out fine, except that the last field is put on another line:
Example:
Data1;Data2;Data3;
Data4;
The problem is that I get an error when trying to import the data into another server.
The data is from a view I created.
I also tried
COPY view(field1,field2...) TO 'C:\test.csv' DELIMITER ',' CSV HEADER;
It exports the same file.
I just want to export the data to another server.
Edit:
When trying to import the CSV I get the error:
ERROR : Extra data after the last expected column. Context Copy
actions, line 3: <<"Data1, data2 etc.">>
So the first line is the header, the second line is the first row with data minus the last field, which is on the 3rd line, alone.
In order to export the file to another server, you have two options:
Creating a shared folder between the two servers, so that the database also has access to this directory:
COPY (SELECT field1,field2 FROM your_table) TO '[shared directory]' DELIMITER ',' CSV HEADER;
Triggering the export from the target server using the STDOUT of COPY. Using psql you can achieve this by running the following command:
psql yourdb -c "COPY (SELECT * FROM your_table) TO STDOUT" > output.csv
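If the end goal is simply to land the data in a table on the other server, you can also pipe the two psql calls together and use CSV format, which quotes embedded line feeds so they survive the transfer intact. A sketch, assuming a source database yourdb, a target database targetdb, and an existing table your_table on both sides:

# Stream the data straight from one server to the other; CSV quoting keeps line feeds intact
psql yourdb -c "COPY (SELECT * FROM your_table) TO STDOUT WITH (FORMAT csv, HEADER)" \
  | psql targetdb -c "COPY your_table FROM STDIN WITH (FORMAT csv, HEADER)"

This avoids writing an intermediate file on either server.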
EDIT: Addressing the issue of fields containing line feeds (\n)
In case you want to get rid of the line feeds, use the REPLACE function.
Example:
SELECT E'foo\nbar';
 ?column?
----------
 foo     +
 bar
(1 row)
Removing the line feed:
SELECT REPLACE(E'foo\nbaar',E'\n','');
 replace
---------
 foobaar
(1 row)
So your COPY should look like this:
COPY (SELECT field1,REPLACE(field2,E'\n','') AS field2 FROM your_table) TO '[shared directory]' DELIMITER ',' CSV HEADER;
The export procedure described above is OK, e.g.:
t=# create table so(i int, t text);
CREATE TABLE
t=# insert into so select 1,chr(10)||'aaa';
INSERT 0 1
t=# copy so to stdout csv header;
i,t
1,"
aaa"
t=# create table so1(i int, t text);
CREATE TABLE
t=# copy so1 from stdout csv header;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> i,t
1,"
aaa"
>> >> >> \.
COPY 1
t=# select * from so1;
 i |  t
---+-----
 1 |    +
   | aaa
(1 row)