Import multiple txt files continuously by filename and DateStamp - SQL

I would like to set up an automated job to continuously insert multiple txt files into a table.
I have multiple dataloggers that output .txt files every minute, named by their datestamp, i.e. 20130921_1755.txt (YYYYMMDD_HHMM.txt). They also have a field named DateStamp, which contains a to-the-second date value for each record.
I know what I want my query to do:
insert all filenames from directory into table ALLFILENAMES
select maximum date in final table TBLMEASUREMENTS
convert maximum date to filename (2013-09-22 17:53:00 to "20130922_1753.txt")
bulk insert all filenames > max date from table ALLFILENAMES
I have already started the process by using a post found here:
Import Multiple CSV Files to SQL Server from a Folder
I am having trouble trying to sort out how to select which specific files I need imported to the table. Especially since this is an ongoing job, I need to constantly look to the DB to see which files have not been imported, and then import them. Here's my code so far which works to import multiple files.
--some variables
declare @filename varchar(255),
        @path     varchar(255),
        @sql      varchar(8000),
        @cmd      varchar(1000)

--get the list of files to process:
SET @path = 'C:\SQL_txt\1_hr\'
SET @cmd = 'dir ' + @path + '*.txt /b'
INSERT INTO ALLFILENAMES(WHICHFILE)
EXEC Master..xp_cmdShell @cmd
UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

--cursor loop
declare c1 cursor for SELECT WHICHPATH,WHICHFILE FROM ALLFILENAMES where WHICHFILE like '%.txt%'
open c1
fetch next from c1 into @path,@filename
While @@fetch_status <> -1
begin
    --bulk insert won't take a variable name, so make a sql and execute it instead:
    set @sql = 'BULK INSERT Temp FROM ''' + @path + @filename + ''' '
        + ' WITH (
                FIELDTERMINATOR = ''\t'',
                ROWTERMINATOR = ''\n''
            ) '
    print @sql
    exec (@sql)
    fetch next from c1 into @path,@filename
end
close c1
deallocate c1
I have been playing around with LEFT, LEN and REPLACE to try to convert the max datestamp into a filename, but have had no luck. Any help or suggestions would be useful. Am I going about this wrong? Thanks
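For the convert-max-date-to-filename step, CONVERT's date styles get you there without the LEFT/LEN gymnastics. A minimal sketch, assuming DateStamp in TBLMEASUREMENTS is a DATETIME column:
DECLARE @maxdate DATETIME, @maxfile VARCHAR(255)

SELECT @maxdate = MAX(DateStamp) FROM TBLMEASUREMENTS

-- style 112 yields YYYYMMDD, style 108 yields hh:mm:ss
SET @maxfile = CONVERT(VARCHAR(8), @maxdate, 112) + '_'
             + REPLACE(LEFT(CONVERT(VARCHAR(8), @maxdate, 108), 5), ':', '')
             + '.txt'

-- 2013-09-22 17:53:00 becomes '20130922_1753.txt'
Because YYYYMMDD_HHMM names sort chronologically, a plain string comparison (WHERE WHICHFILE > @maxfile) in the cursor query would then pick out only the files that still need importing.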

I would do this with SSIS/Data Tools.
Import a file from your 'watch' folder, then move the file to a different folder, using:
ForEach Loop Container
Data Flow Task
File System Task
Derived Column (optional but recommended for source tracking)
You can use the mapped filepath variable in a derived column to indicate the source file, and unless there's a danger of the same files being added to the watch folder multiple times, there's little need to run the 'has this been imported already?' check each time.
Many tutorials on SSIS are available; here are two:
SSIS - Loop through Flat Files
SSIS - Move and Rename Files

Related

SQL SERVER bulk load

I am working on SQL Server 2017. I need to import 20 text files into one table. Every text file has the same data types and column names, and I have checked that the data is in the same order as well.
I need to import them into a SQL table and create a new column, the last column, saying that
rows 1 to 150 come from "textfile-1",
rows 151 to 300 come from "textfile-2",
rows 301 to 400 come from "textfile-3".
We don't have any packages like SSIS.
Can we do it with an advanced SQL query? If so, can someone please guide me?
SQL BULK INSERT
First of all you have to make sure that the table structure is identical to the file structure.
You can store the text file paths inside a table, loop over these values using a cursor, build the command dynamically, then execute the command:
DECLARE @strQuery VARCHAR(4000)
DECLARE @FileName VARCHAR(4000)

DECLARE file_cursor CURSOR
FOR SELECT FilePath FROM FilesTable

OPEN file_cursor
FETCH NEXT FROM file_cursor INTO @FileName;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @strQuery = 'BULK INSERT SchoolsTemp
    FROM ''' + @FileName + '''
    WITH
    (
        FIELDTERMINATOR = '','', --Columns delimiter
        ROWTERMINATOR = ''\n'',  --Rows delimiter
        TABLOCK
    )'

    EXEC(@strQuery)

    FETCH NEXT FROM file_cursor INTO @FileName;
END

CLOSE file_cursor
DEALLOCATE file_cursor
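BULK INSERT alone can't add the source-file column the question asks for, but inside the same loop you can stage each file and then copy it onward with the file name attached. A sketch, assuming SchoolsTemp is a staging table and the final table (here called Schools, with columns Col1, Col2 and SourceFile, all assumed names) is otherwise identical; both statements go inside the cursor loop, right after the EXEC(@strQuery):
-- tag every staged row with the file it came from
INSERT INTO Schools (Col1, Col2, SourceFile)
SELECT Col1, Col2, @FileName
FROM SchoolsTemp;

TRUNCATE TABLE SchoolsTemp;  -- empty the staging table for the next file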
More information
BULK INSERT (Transact-SQL)
C# approach: SchemaMapper class library
If you are familiar with C#, I recently started a new project on GitHub: a class library developed in C#. You can use it to import tabular data from Excel, Word, PowerPoint, text, CSV, HTML, JSON and XML into a unified SQL Server table. Check it out at:
SchemaMapper: C# Schema mapping class library
You can follow this Wiki page for a step-by-step guide:
Import data from multiple files into one SQL table step by step guide

MS-SQL: Changing the FileGrowth parameters of a database generically

In our software the user can create databases as well as connect to databases that were not created by our software. The DBMS is Microsoft SQL Server.
Now I need to update the databases that we use and set the FileGrowth parameter of all the files of all the databases to a certain value.
I know how to get the logical file names of the files of the current database from a query:
SELECT file_id, name as [logical_file_name], physical_name FROM sys.database_files
And I know how to set the desired FileGrowth value, once I know the logical file name:
ALTER DATABASE MyDB MODIFY FILE (Name='<logical file name>', FileGrowth=10%)
But I don't know how to combine these two steps into one script.
Since there are various databases I can't hard code the logical file names into the script.
And for the update process (right now) we only have the possibility to get the connection of a database and execute sql scripts on this connection, so a "pure" script solution would be best, if that's possible.
The following script receives a database name as a parameter and uses two pieces of dynamic SQL: one for a cursor to cycle through the database files of the chosen database, and another to apply the proper ALTER DATABASE command, since you can't use a variable for the file name in MODIFY FILE.
The EXEC is commented out on both occasions and there's a PRINT instead, so you can review before executing. I've just tested it on my sandbox and it's working as expected.
DECLARE @DatabaseName VARCHAR(100) = 'DBName'

DECLARE @DynamicSQLCursor VARCHAR(MAX) = '
USE ' + @DatabaseName + ';
DECLARE @FileName VARCHAR(100)
DECLARE FileCursor CURSOR FOR
SELECT S.name FROM sys.database_files AS S
OPEN FileCursor
FETCH NEXT FROM FileCursor INTO @FileName
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @DynamicSQLAlterDatabase VARCHAR(MAX) = ''
ALTER DATABASE ' + @DatabaseName + ' MODIFY FILE (Name = '''''' + @FileName + '''''', FileGrowth = 10%)''
-- EXEC (@DynamicSQLAlterDatabase)
PRINT (@DynamicSQLAlterDatabase)
FETCH NEXT FROM FileCursor INTO @FileName
END
CLOSE FileCursor
DEALLOCATE FileCursor '

-- EXEC (@DynamicSQLCursor)
PRINT (@DynamicSQLCursor)
You might want to check for the usual dynamic SQL caveats, like making sure the values being concatenated won't break the SQL, and also add error handling.
As for how to apply this to several databases, you can create a stored procedure and execute it several times, or wrap a database-name cursor / while loop around this.
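For example, a minimal sketch of that wrapper, assuming the script above has been turned into a stored procedure named dbo.SetFileGrowth that takes the database name as a parameter (the procedure and parameter names are hypothetical):
DECLARE @name SYSNAME

DECLARE DbCursor CURSOR FOR
    SELECT name FROM sys.databases WHERE database_id > 4  -- skip the system databases

OPEN DbCursor
FETCH NEXT FROM DbCursor INTO @name

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.SetFileGrowth @DatabaseName = @name
    FETCH NEXT FROM DbCursor INTO @name
END

CLOSE DbCursor
DEALLOCATE DbCursor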

SSIS OPENROWSET query flat file

I currently have a variable named InvoiceFileName that is creating .csv files through a foreach loop. A list of .csv files is then outputted to a folder.
I then need to query each .csv file to select the header and the first row of data.
I believe I need to use OPENROWSET to query the .csv. I have 2 questions:
What is the syntax to query off of the variable name InvoiceFileName?
Is it possible to select the header field and first row of data with OPENROWSET without inserting into a table?
Below is a simple OPENROWSET that only provides the header of the file.
SELECT
top 1 *
FROM OPENROWSET(BULK N'\\myservername\f$\reports\Invoices\CokeFiles\54ASBSd.csv', SINGLE_CLOB) AS Report
What kind of privs do you have on the database? If you have, or can get, slightly elevated privs, you can use BULK INSERT and xp_cmdShell to accomplish this, but like @scsimon said, you will have to use dynamic SQL. Here's a quick example:
-----------------------------------------------------------------------------------------------------
-- Set up your variables
-----------------------------------------------------------------------------------------------------
DECLARE
    @folderPath AS VARCHAR(100) = '\\some\folder\path\here\',
    @cmd AS VARCHAR(150), -- Will populate this with a command to get a list of files in a directory
    @InvoiceFileName AS VARCHAR(100), -- Will be used in cursor loop
    @sql AS VARCHAR(8000), -- Will hold the dynamically built BULK INSERT statement
    @targetTable AS VARCHAR(50) = 'SomeTable',
    @fieldTerminator AS CHAR(1) = ',',
    @rowTerminator AS CHAR(2) = '\n'
-----------------------------------------------------------------------------------------------------
-- Create a temp table to store the file names
-----------------------------------------------------------------------------------------------------
IF OBJECT_ID('tempdb..#FILE_LIST') IS NOT NULL
    DROP TABLE #FILE_LIST
--
CREATE TABLE #FILE_LIST(FILE_NAME VARCHAR(255))
-----------------------------------------------------------------------------------------------------
-- Get a list of the files and store them in the temp table:
-- NOTE: this DOES require elevated permissions
-----------------------------------------------------------------------------------------------------
SET @cmd = 'dir "' + @folderPath + '" /b'
--
INSERT INTO #FILE_LIST(FILE_NAME)
EXEC Master..xp_cmdShell @cmd
--------------------------------------------------------------------------------
-- Here we remove any null values
--------------------------------------------------------------------------------
DELETE #FILE_LIST WHERE FILE_NAME IS NULL
-----------------------------------------------------------------------------------------------------
-- Set up our cursor and loop through the files
-----------------------------------------------------------------------------------------------------
DECLARE c1 CURSOR FOR SELECT FILE_NAME FROM #FILE_LIST
OPEN c1
FETCH NEXT FROM c1 INTO @InvoiceFileName
WHILE @@FETCH_STATUS <> -1
BEGIN -- Begin WHILE loop
    BEGIN TRY
        -- Bulk insert won't take a variable name, so dynamically generate the
        -- SQL statement and execute it instead (prepend the folder path, since
        -- dir /b returns bare file names):
        SET @sql = 'BULK INSERT ' + @targetTable + ' FROM ''' + @folderPath + @InvoiceFileName + ''' '
            + ' WITH (
                    FIELDTERMINATOR = ''' + @fieldTerminator + ''',
                    ROWTERMINATOR = ''' + @rowTerminator + ''',
                    FIRSTROW = 1,
                    LASTROW = 2
                ) '
        EXEC (@sql)
    END TRY
    BEGIN CATCH
        -- Handle errors here
    END CATCH
    -- Continue your loop
    FETCH NEXT FROM c1 INTO @InvoiceFileName
END -- End WHILE loop
CLOSE c1
DEALLOCATE c1
-- Do what you need to do here with the data in your target table
A few disclaimers:
I have not tested this code, only copied it from a slightly more complex proc I've used in the past that works for exactly this kind of scenario.
You will need elevated privs for BULK INSERT and xp_cmdShell.
I know people frown on using xp_cmdShell (and for good reason) but this is a quick and dirty solution making a lot of assumptions about what your environment is like.
This is assuming you're not grabbing the data as you get each file in your variable. If you are, you can skip the first part of this code.
This code also assumes you are doing your own error handling in places other than the one try/catch block you see. I've omitted a lot of that for simplicity.
For doing this through SSIS, ideally you'd probably need to use a format file for the bulk operation, but then you'd have to have consistently formatted files, and you'd have to remove the SINGLE_CLOB option as well. A really hacky, non-ideal alternative would be something like this:
Let's say your file contains this data:
Col1,Col2,Col3,Col4
Here's,The,First,Line
Here's,The,Second,Line
Here's,The,Third,Line
Here's,The,Fourth,Line
Then you could basically just parse the data doing something like this:
SELECT SUBSTRING(OnlyColumn, 0, CHARINDEX(CHAR(10), OnlyColumn, CHARINDEX(CHAR(10), OnlyColumn, 0)+1) )
FROM OPENROWSET(BULK '\\location\of\myFile.csv', SINGLE_CLOB) AS Report (OnlyColumn)
And your result would be this:
Col1,Col2,Col3,Col4 Here's,The,First,Line
This is obviously dependent on your line endings being consistent, but if you want the results in a single column and single row (as is the behavior of the bulk operation with the SINGLE_CLOB option), that should get you what you need.
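As for querying off the variable (question 1): OPENROWSET won't accept a variable for the file path, so the statement has to be built dynamically, the same way as with BULK INSERT above. A sketch, assuming @InvoiceFileName holds the full UNC path:
DECLARE @InvoiceFileName VARCHAR(260) = '\\myservername\f$\reports\Invoices\CokeFiles\54ASBSd.csv'
DECLARE @sql NVARCHAR(MAX)

-- concatenate the path into the statement, then execute it
SET @sql = N'SELECT SUBSTRING(OnlyColumn, 0,
       CHARINDEX(CHAR(10), OnlyColumn, CHARINDEX(CHAR(10), OnlyColumn, 0) + 1))
FROM OPENROWSET(BULK ''' + @InvoiceFileName + ''', SINGLE_CLOB) AS Report (OnlyColumn)'

EXEC sp_executesql @sql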
You can take a look at the solution on this SO post for info on how to pass the SSIS variable value as a parameter to your query.
Use a Foreach Loop container to query all files in a folder. You can use wildcards for the file name, or use the variables in your DTS to set the properties of the components.
Inside the loop container you place a Data Flow Task with your source file connection, your transformations, and your destination.
You can modify the file names and paths of all these objects by setting their properties to variables in your DTS.
With an Expression Task inside the loop, you can change the path of the CSV file connection.

How do I use bulk insert to import a file just based on its file extension?

I have a folder where new log files get created every hour. Each time the file name is different. How do I bulk insert any file that has the extension .log? Here is my code:
select * from [data_MaximusImport_t]
BULK
INSERT Data_MaximusImport_t
FROM 'C:\Program Files (x86)\DataMaxx\*.log'
WITH
(FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
Right now I get the error "C:\Program Files (x86)\DataMaxx\*.log" could not be opened. Operating system error code 123 (The filename, directory name, or volume label syntax is incorrect.).
*** This is an edit to my original question. I was able to figure out the file names with this code:
DECLARE @Path varchar(256) = 'dir C:\datamaxx\*.log'
DECLARE @Command varchar(1024) = @Path + ' /A-D /B'

INSERT INTO myFileList
EXEC MASTER.dbo.xp_cmdshell @Command

SELECT * FROM myFileList
Now I just need to figure out how to stick that name in the path. Should I declare the file name as a variable?
You'll need dynamic SQL for this.
Assuming that the file names are already in myFileList, then this is how I would do it:
DECLARE @sql AS VARCHAR(MAX);
SET @sql = '';

SELECT @sql = @sql + REPLACE('
BULK INSERT Data_MaximusImport_t
FROM ''C:\Program Files (x86)\DataMaxx\*''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'' );
', '*', myFileName)
FROM myFileList
WHERE myFileName != '';

PRINT @sql;
EXEC(@sql);
You unfortunately can't use wildcards in the file path with SQL Server bulk inserts.
Possible workarounds are scripting a loop to get the filenames from the system and inserting them one at a time, or using SSIS.

Delete multiple files from folder using T-SQL without using cursor

I am writing a cleanup script. This script will run on weekends and clean up the db. The tables are related to emails, and the paths of the attachments are stored in a table. As part of the cleanup I also have to delete the files from the folder.
The paths of the files look like the following:
\\xxx.xxx.xxx.xxx\EmailAttachments\Some Confirmation for xyz Children Centre_9FW4ZE1C57324B70EC79WZ15FT9FA19E.pdf
I can delete multiple files like the following:
xp_cmdshell 'del c:\xyz.txt, abc.txt'
BUT when I create a CSV list from the table using FOR XML PATH('') the string gets cut off at the end. There might be thousands of rows to delete, so I don't want to use a cursor to delete the files from the folder.
How can I delete the files from the folder without using a cursor?
What permissions do I need on the network folder to delete files using T-SQL from SQL Server?
EDIT:
I have used a cursor and it looks OK, not taking much time. One problem I am facing:
SQL Server considers a file name with a space as two files, so the following statement
xp_cmdshell 'del E:\Standard Invite.doc'
throws the error:
Could Not Find E:\Standard
Could Not Find C:\Windows\system32\Invite.doc
NULL
Thanks.
Personally, I wouldn't worry too much about using a cursor here. Cursors are only 'mostly evil'; as your task isn't a set-based operation a cursor may be the most effective solution.
Although you have a comment stating that it will take an "awful lot of time" to use a cursor, in this case the biggest overhead is the actual delete of the file (not the cursor).
Note: The file deletion is done by the Operating System, not by the RDBMS.
As the delete is being done by calling xp_cmdshell, and because it is a procedure (not a function, etc.), you can't call it and pass in a table's contents.
What you could do is build up a string and execute that. But note, you are limited to a maximum of 8000 characters in this string. As you have already said that you may have thousands of files, you will certainly not fit them within 8000 characters.
This means that you are going to need a loop no matter what.
DECLARE
    @command   VARCHAR(8000),
    @next_id   INT,
    @next_file VARCHAR(8000),
    @total_len INT

SELECT
    @command = 'DEL ',
    @total_len = 4

SELECT TOP 1
    @next_id = id,
    @next_file = file_name + ', '
FROM
    table_of_files_to_delete
ORDER BY
    id DESC

WHILE (@next_file IS NOT NULL)
BEGIN
    WHILE ((@total_len + LEN(@next_file)) <= 8000) AND (@next_file IS NOT NULL)
    BEGIN
        SELECT
            @command = @command + @next_file,
            @total_len = @total_len + LEN(@next_file)

        SELECT
            @next_file = NULL

        SELECT TOP 1
            @next_id = id,
            @next_file = file_name + ', '
        FROM
            table_of_files_to_delete
        WHERE
            id < @next_id
        ORDER BY
            id DESC
    END

    SET @command = SUBSTRING(@command, 1, @total_len - 2) -- remove the last ', '

    EXEC xp_cmdshell @command

    SELECT
        @command = 'DEL ',
        @total_len = 4
END
Not pretty, huh?
What you may be able to do, depending on what needs deleting, is to use wild-cards. For example:
EXEC xp_cmdshell 'DELETE C:\abc\def\*.txt'
To delete files with spaces in the name, you need to enclose the filename in double quotes:
xp_cmdshell 'del "E:\Standard Invite.doc"'
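Combining the quoting with the FOR XML PATH idea from the question: the truncation usually comes from not casting the XML result back to a string with .value(). Note that xp_cmdshell's command argument is still capped at 8000 characters, so with thousands of files you would still batch as in the loop above. A sketch, assuming a table AttachmentsToDelete with a FilePath column (both assumed names):
DECLARE @cmd VARCHAR(8000)

-- build one space-separated, double-quoted list of files
-- (only works if the whole command stays under 8000 characters)
SELECT @cmd = 'del' +
    (SELECT ' "' + FilePath + '"'
     FROM AttachmentsToDelete
     FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')

EXEC xp_cmdshell @cmd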
DECLARE @deleteSql varchar(500)
       ,@myPath varchar(500) = '\\DestinationFolder\'

SET @deleteSql = 'EXEC master..xp_cmdshell ''del ' + @myPath + '*.csv'''
EXEC(@deleteSql)