Delete multiple files from folder using T-SQL without using cursor

I am writing a cleanup script that will run on weekends and clean up the DB. The tables are related to emails, and the paths of the attachments are stored in a table. As part of the table cleanup I also have to delete the files from a folder.
The path of files is like following.
\\xxx.xxx.xxx.xxx\EmailAttachments\Some Confirmation for xyz Children Centre_9FW4ZE1C57324B70EC79WZ15FT9FA19E.pdf
I can delete multiple files like the following:
xp_cmdshell 'del c:\xyz.txt, abc.txt'
BUT when I build a comma-separated list from the table using FOR XML PATH(''), the string gets cut off at the end. There might be thousands of rows to delete, so I don't want to use a cursor to delete the files from the folder.
How can I delete files from a folder without using a cursor?
What permissions do I need on the network folder to delete files using T-SQL from SQL Server?
EDIT:
I have used a cursor and it looks OK; it is not taking very long. One problem I am facing is that
SQL Server considers a file name with a space to be two files, so the following statement
xp_cmdshell 'del E:\Standard Invite.doc'
throws this error:
Could Not Find E:\Standard
Could Not Find C:\Windows\system32\Invite.doc
NULL
Thanks.

Personally, I wouldn't worry too much about using a cursor here. Cursors are only 'mostly evil'; as your task isn't a set-based operation, a cursor may be the most effective solution.

Although you have a comment stating that it will take an "awful lot of time" to use a cursor, in this case the biggest overhead is the actual delete of the file (not the cursor).
Note: the file deletion is done by the operating system, not by the RDBMS.
As the delete is being done by calling xp_cmdshell, and because it is a procedure (not a function, etc.), you can't call it and pass in a table's contents.
What you could do is build up a string and execute that. But note, you are limited to a maximum of 8000 characters in that string. As you have already said that you may have thousands of files, you will certainly not fit them all within 8000 characters.
This means that you are going to need a loop no matter what.
DECLARE
    @command   VARCHAR(8000),
    @next_id   INT,
    @next_file VARCHAR(8000),
    @total_len INT

SELECT
    @command   = 'DEL ',
    @total_len = 4

SELECT TOP 1
    @next_id   = id,
    @next_file = file_name + ', '
FROM
    table_of_files_to_delete
ORDER BY
    id DESC

WHILE (@next_file IS NOT NULL)
BEGIN
    -- DATALENGTH rather than LEN throughout: LEN ignores the trailing space in ', '
    WHILE ((@total_len + DATALENGTH(@next_file)) <= 8000) AND (@next_file IS NOT NULL)
    BEGIN
        SELECT
            @command   = @command + @next_file,
            @total_len = @total_len + DATALENGTH(@next_file)
        SELECT
            @next_file = NULL
        SELECT TOP 1
            @next_id   = id,
            @next_file = file_name + ', '
        FROM
            table_of_files_to_delete
        WHERE
            id < @next_id
        ORDER BY
            id DESC
    END
    SET @command = SUBSTRING(@command, 1, @total_len - 2) -- remove the last ', '
    EXEC xp_cmdshell @command
    SELECT
        @command   = 'DEL ',
        @total_len = 4
END
Not pretty, huh?
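If you are on SQL Server 2017 or later, STRING_AGG can shoulder the string building, though the 8000-character limit on the xp_cmdshell command still forces batching. A minimal sketch of the same idea, assuming the same table_of_files_to_delete(id, file_name) and a conservative 25 files per batch to stay under the limit:

DECLARE @cmd VARCHAR(8000);
DECLARE @batch INT = 0;

WHILE 1 = 1
BEGIN
    -- quote each name so paths with spaces survive
    SELECT @cmd = 'DEL ' + STRING_AGG('"' + file_name + '"', ', ')
    FROM (
        SELECT file_name
        FROM table_of_files_to_delete
        ORDER BY id
        OFFSET @batch * 25 ROWS FETCH NEXT 25 ROWS ONLY
    ) AS b;

    IF @cmd IS NULL BREAK;  -- STRING_AGG over an empty batch returns NULL: done

    EXEC xp_cmdshell @cmd;
    SET @batch = @batch + 1;
END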
What you may be able to do, depending on what needs deleting, is to use wild-cards. For example:
EXEC xp_cmdshell 'DELETE C:\abc\def\*.txt'
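Similarly, if the cleanup rule can be expressed as file age rather than a table-driven list, the Windows forfiles utility can select and delete in one call. A sketch, assuming a 30-day cutoff and that the attachments folder is reachable as a local path (forfiles does not accept UNC paths):

EXEC xp_cmdshell 'forfiles /p "E:\EmailAttachments" /m *.pdf /d -30 /c "cmd /c del @file"'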

To delete files with a space in the name, you need to enclose the file name in double quotes:
xp_cmdshell 'del "E:\Standard Invite.doc"'

DECLARE @deleteSql VARCHAR(500)
      , @myPath VARCHAR(500) = '\\DestinationFolder\'
SET @deleteSql = 'EXEC master..xp_cmdshell ''del ' + @myPath + '*.csv'''
EXEC (@deleteSql)

Related

SQL export every row item to xml file

I'm having an issue exporting each row from a SQL Server database into its own XML file.
None of the questions I've read on this site quite solve it for me; I have one or two specific gaps in knowledge letting me down. Basically, for each row (~350 rows in a table called BPAProcess, which exists across 6 databases; each database is a different version and they're independent of each other, so ideally I can execute this query manually once on each DB to extract all the rows from the table) I want to take the XML and save it locally. The XML contents of the cell are already well-formed; they just need exporting.
So far I've tried creating a loop mechanism over a temp table I created, and then for each row building a SQL command to save the details out. I've gotten as far as BCP failing because it can't see the temporary table I've created. After reading up on this, I think it's going to fail in some other way once I resolve that issue, because there is an 8k character limit on file output but these files are ~300k long each.
USE [BP6.8.1]
GO
-- Save XML records to a file:
DECLARE @fileName VARCHAR(175)
DECLARE @filePath VARCHAR(175)
DECLARE @sqlStr VARCHAR(1000)
DECLARE @sqlCmd VARCHAR(1000)
DECLARE @MaxRowsCount INT
DECLARE @Iter INT

SELECT ROW_NUMBER() OVER (ORDER BY processid) row_num, name, processxml
INTO #Process
FROM BPAProcess

SET @MaxRowsCount = (SELECT MAX(row_num) FROM #Process)
SET @Iter = 1

WHILE @Iter <= @MaxRowsCount
BEGIN
    SET @fileName = (SELECT name FROM #Process WHERE row_num = @Iter)
    SET @filePath = 'C:\Temp\sql queries\' + @fileName + '.xml'
    SET @sqlStr = 'select processxml from #Process where row_num = ' + CAST(@Iter AS VARCHAR(3))
    SET @sqlCmd = 'bcp "' + @sqlStr + '" queryout "' + @filePath + '" -w -T'
    --EXEC xp_cmdshell @sqlCmd
    SET @Iter = @Iter + 19090
    -- +19090 just to execute the first iteration only
END

DROP TABLE #Process
I feel what I need is either some way to loop over the #Process table and, for each row_num, export that row via stdout or similar, or a looping mechanism that runs a bcp command with the SQL pointing at the XML cell I want; but I don't know how to make bcp see the temp table I'm creating.
A few caveats: this is not my DB, it's a DB used by an application, and changing anything is not an option. The task is to take each individual XML and store it on a file drive saved as [name].xml, where name is in the table. I know (from reading comments all over) this isn't the correct way to use SQL; admittedly I'm not a SQL developer, nor do we have one to hand, but I've been tasked with exporting this data, and the manual way through the GUI would take several weeks as it's a long and laborious process, whereas this would be much faster. The XML contents are quite long, like I said, 300k-500k in some instances.
Any help is appreciated, and if that help is 'this is not appropriate for SQL to be executing', that would be fine; I could go explore it in C# or some other language if this really isn't the way it should be done.
When you call someone rude, should you expect an actual answer? A developer with any significant experience (you) should be able to find possible solutions and evaluate their usability. As an example, searching for "sql server export one row to file" turns up a cursor-based approach as the first result.
Quite frankly, you seem to have significant experience in different languages, so I question why you chose a T-SQL-based solution rather than one involving whatever language is your strong suit. But that's a very different issue. No matter - RBAR is still RBAR regardless of language / development platform.
Your code converted to a simple cursor is below. There are some things you assume, and I again suggest you use a language that you are proficient in.
declare @sql varchar(500);
declare @name varchar(20);

declare c1 cursor FAST_FORWARD for
    select name from dbo.mytable order by name;
open c1;
fetch next from c1 into @name;
while @@FETCH_STATUS = 0
begin
    --print @name;
    set @sql = 'select processxml from dbo.mytable where name = ''' + @name + '''';
    set @sql = 'bcp "' + @sql + '" queryout "' + 'C:\Temp\sql queries\' + @name + '.xml" -w -T';
    print @sql;
    fetch next from c1 into @name;
end;
close c1;
deallocate c1;
You might need to adjust this to connect to the correct instance and database using a specific login. The documentation has cursor examples - always a good starting point. Perhaps the first thing you should try is to use BCP from a command prompt to export that column in a single specific row to a specific file to validate your assumptions and expectations first.
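That validation step might look something like this from a command prompt (the server name and row value are placeholders; -w writes Unicode, -T uses a trusted connection):

bcp "select processxml from [BP6.8.1].dbo.BPAProcess where name = 'SomeProcessName'" queryout "C:\Temp\sql queries\SomeProcessName.xml" -w -T -S YourServerName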
With just a little work you could add some additional logic to run this for multiple databases within a given instance. Doing that would require handling name collisions in the set of files generated.
Lastly, here is your code as-is, but with the BCP command no longer relying on the temp table. Note the small adjustment because of the simplified table in the fiddle.
--select ROW_NUMBER() OVER (order by processid) row_num, name, processxml
select ROW_NUMBER() OVER (order by name) row_num, name
into #Process
from dbo.mytable

SET @MaxRowsCount = (SELECT MAX(row_num) FROM #Process)
SET @Iter = 1

WHILE @Iter <= @MaxRowsCount
BEGIN
    SET @fileName = (select name from #Process where row_num = @Iter)
    SET @filePath = 'C:\Temp\sql queries\' + @fileName + '.xml'
    -- filter on name rather than row_num, so the bcp query never touches the temp table
    SET @sqlStr = 'select processxml from dbo.mytable where name = ''' + @fileName + ''''
    SET @sqlCmd = 'bcp "' + @sqlStr + '" queryout "' + @filePath + '" -w -T'
    print @sqlCmd;
    --EXEC xp_cmdshell @sqlCmd
    SET @Iter = @Iter + 1
END

Drop TABLE #Process
fiddle to demonstrate both.
This is really not a job for T-SQL; it is not a general-purpose scripting language, it is only meant for querying.
Instead, use PowerShell to extract the XML as separate rows, then write them out into files:
Invoke-Sqlcmd -ServerInstance "YourServer" `
    -Database "YourDB" -Username "user" -Password "pass" `
    -Query "select name, processxml from dbo.mytable;" |
ForEach-Object {
    Out-File -FilePath ("Downloads\temp\" + $_.name + ".xml") -InputObject $_.processxml
}
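One caveat worth checking (an assumption based on Invoke-Sqlcmd's defaults, so verify against your module version): long character columns are truncated at 4000 characters unless you raise -MaxCharLength, so with 300k+ XML payloads you would likely need something like -MaxCharLength 1000000 on the call above.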
You could also use any other client app to do the same, far more easily than in T-SQL.

MS-SQL: Changing the FileGrowth parameters of a database generically

In our software the user can create databases as well as connect to databases that were not created by our software. The DBMS is Microsoft SQL-Server.
Now I need to update the databases that we use and set the FileGrowth parameter of all the files of all the databases to a certain value.
I know how to get the logical file names of the files of the current database from a query:
SELECT file_id, name as [logical_file_name], physical_name FROM sys.database_files
And I know how to set the desired FileGrowth value, once I know the logical file name:
ALTER DATABASE MyDB MODIFY FILE (Name='<logical file name>', FileGrowth=10%)
But I don't know how to combine these two steps into one script.
Since there are various databases I can't hard code the logical file names into the script.
And for the update process (right now) we only have the possibility to get the connection of a database and execute sql scripts on this connection, so a "pure" script solution would be best, if that's possible.
The following script receives a database name as a parameter and uses dynamic SQL twice: once for a cursor to cycle through the database files of the chosen database, and again to apply the proper ALTER DATABASE command, since you can't use a variable for the file name in MODIFY FILE.
The EXEC is commented out on both occasions and there's a PRINT instead, so you can review before executing. I've just tested it on my sandbox and it's working as expected.
DECLARE @DatabaseName VARCHAR(100) = 'DBName'

DECLARE @DynamicSQLCursor VARCHAR(MAX) = '
USE ' + @DatabaseName + ';
DECLARE @FileName VARCHAR(100)
DECLARE FileCursor CURSOR FOR
    SELECT S.name FROM sys.database_files AS S
OPEN FileCursor
FETCH NEXT FROM FileCursor INTO @FileName
WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE @DynamicSQLAlterDatabase VARCHAR(MAX) = ''
        ALTER DATABASE ' + @DatabaseName + ' MODIFY FILE (Name = '''''' + @FileName + '''''', FileGrowth = 10%)''
    -- EXEC (@DynamicSQLAlterDatabase)
    PRINT (@DynamicSQLAlterDatabase)
    FETCH NEXT FROM FileCursor INTO @FileName
END
CLOSE FileCursor
DEALLOCATE FileCursor '

-- EXEC (@DynamicSQLCursor)
PRINT (@DynamicSQLCursor)
You might want to check for the usual dynamic SQL caveats, like making sure the values being concatenated won't break the SQL, and also add error handling.
As for how to apply this to several databases: you can create a stored procedure and execute it several times, or wrap a database-name cursor / WHILE loop around it.
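A minimal sketch of that wrapper, assuming the script above has been packaged into a stored procedure named dbo.SetFileGrowth that takes the database name as a parameter (the procedure name is hypothetical):

DECLARE @name SYSNAME;

DECLARE DbCursor CURSOR FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE database_id > 4;  -- skip the four system databases

OPEN DbCursor;
FETCH NEXT FROM DbCursor INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.SetFileGrowth @DatabaseName = @name;  -- hypothetical wrapper SP
    FETCH NEXT FROM DbCursor INTO @name;
END
CLOSE DbCursor;
DEALLOCATE DbCursor;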

SSIS OPENROWSET query flat file

I currently have a variable named InvoiceFileName that is used to create .csv files through a foreach loop. A list of .csv files is then output to a folder.
I will then need to query off of each .csv file to select the header and the first row of data for each .csv.
I believe I need to use OPENROWSET to query the .csv files. I have 2 questions:
What is the syntax to query off of the variable name InvoiceFileName?
Is it possible to select the header field and first row of data with OPENROWSET without inserting into a table?
Below is a simple OPENROWSET that only provides the header of the file.
SELECT TOP 1 *
FROM OPENROWSET(BULK N'\\myservername\f$\reports\Invoices\CokeFiles\54ASBSd.csv', SINGLE_CLOB) AS Report
What kind of privs do you have on the database? If you have, or can get, slightly elevated privs, you can use BULK INSERT and xp_cmdShell to accomplish this, but like @scsimon said, you will have to use dynamic SQL. Here's a quick example:
-----------------------------------------------------------------------------------------------------
-- Set up your variables
-----------------------------------------------------------------------------------------------------
DECLARE
    @folderPath AS VARCHAR(100) = '\\some\folder\path\here\',
    @cmd AS VARCHAR(150),             -- Will populate this with a command to get a list of files in a directory
    @InvoiceFileName AS VARCHAR(100), -- Will be used in cursor loop
    @targetTable AS VARCHAR(50) = 'SomeTable',
    @fieldTerminator AS CHAR(1) = ',',
    @rowTerminator AS CHAR(2) = '\n',
    @sql AS VARCHAR(2000)             -- Will hold the dynamically built BULK INSERT statement
-----------------------------------------------------------------------------------------------------
-- Create a temp table to store the file names
-----------------------------------------------------------------------------------------------------
IF OBJECT_ID('tempdb..#FILE_LIST') IS NOT NULL
    DROP TABLE #FILE_LIST
--
CREATE TABLE #FILE_LIST (FILE_NAME VARCHAR(255))
-----------------------------------------------------------------------------------------------------
-- Get a list of the files and store them in the temp table:
-- NOTE: this DOES require elevated permissions
-----------------------------------------------------------------------------------------------------
SET @cmd = 'dir "' + @folderPath + '" /b'
--
INSERT INTO #FILE_LIST (FILE_NAME)
EXEC Master..xp_cmdShell @cmd
--------------------------------------------------------------------------------
-- Here we remove any null values
--------------------------------------------------------------------------------
DELETE #FILE_LIST WHERE FILE_NAME IS NULL
-----------------------------------------------------------------------------------------------------
-- Set up our cursor and loop through the files
-----------------------------------------------------------------------------------------------------
DECLARE c1 CURSOR FOR SELECT FILE_NAME FROM #FILE_LIST
OPEN c1
FETCH NEXT FROM c1 INTO @InvoiceFileName
WHILE @@FETCH_STATUS <> -1
BEGIN -- Begin WHILE loop
    BEGIN TRY
        -- Bulk insert won't take a variable name, so dynamically generate the
        -- SQL statement and execute it instead. Prepend the folder path,
        -- since dir /b returns bare file names:
        SET @sql = 'BULK INSERT ' + @targetTable + ' FROM ''' + @folderPath + @InvoiceFileName + ''' '
            + ' WITH (
                    FIELDTERMINATOR = ''' + @fieldTerminator + ''',
                    ROWTERMINATOR = ''' + @rowTerminator + ''',
                    FIRSTROW = 1,
                    LASTROW = 2
                ) '
        EXEC (@sql)
    END TRY
    BEGIN CATCH
        -- Handle errors here
    END CATCH
    -- Continue your loop
    FETCH NEXT FROM c1 INTO @InvoiceFileName
END -- End WHILE loop
CLOSE c1
DEALLOCATE c1
-- Do what you need to do here with the data in your target table
A few disclaimers:
I have not tested this code. Only copied from a slightly more complex proc I've used in the past that works for exactly this kind of scenario.
You will need elevated privs for BULK INSERT and xp_cmdShell.
I know people frown on using xp_cmdShell (and for good reason) but this is a quick and dirty solution making a lot of assumptions about what your environment is like.
This is assuming you're not grabbing the data as you get each file in your variable. If you are, you can skip the first part of this code.
This code also assumes you are doing your own error handling in places other than the one try/catch block you see. I've omitted a lot of that for simplicity.
For doing this through SSIS, ideally you'd probably need to use a format file for the bulk operation, but you'd have to have consistently formatted files and remove the SINGLE_CLOB option as well. A really hacky and non-ideal way to do this would be to do something like this:
Let's say your file contains this data:
Col1,Col2,Col3,Col4
Here's,The,First,Line
Here's,The,Second,Line
Here's,The,Third,Line
Here's,The,Fourth,Line
Then you could basically just parse the data doing something like this:
SELECT SUBSTRING(OnlyColumn, 0, CHARINDEX(CHAR(10), OnlyColumn, CHARINDEX(CHAR(10), OnlyColumn, 0)+1) )
FROM OPENROWSET(BULK '\\location\of\myFile.csv', SINGLE_CLOB) AS Report (OnlyColumn)
And your result would be this:
Col1,Col2,Col3,Col4 Here's,The,First,Line
This is obviously dependent on your line endings being consistent, but if you want the results in a single column and single row (as is the behavior of the bulk operation with the SINGLE_CLOB option), that should get you what you need.
You can take a look at the solution on this SO post for info on how to pass the SSIS variable value as a parameter to your query.
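Since OPENROWSET(BULK ...) only accepts a literal file path, the value coming from the SSIS variable has to be spliced in with dynamic SQL. A minimal sketch, assuming @InvoiceFileName holds the full path from the question:

DECLARE @InvoiceFileName NVARCHAR(260) = N'\\myservername\f$\reports\Invoices\CokeFiles\54ASBSd.csv';
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT SUBSTRING(OnlyColumn, 0,
                CHARINDEX(CHAR(10), OnlyColumn, CHARINDEX(CHAR(10), OnlyColumn, 0) + 1))
      FROM OPENROWSET(BULK N''' + @InvoiceFileName + N''', SINGLE_CLOB) AS Report(OnlyColumn);';

EXEC sp_executesql @sql;  -- returns the header plus first data row as one value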
Use a Foreach Loop container to query all files in a folder. You can use wildcards for the file name, or use the variables in your DTS to set the properties of the components.
Inside the loop container you place a Data Flow Task with your source file connection, your transformations, and your destination.
You can modify the file names and paths of all these objects by setting their properties to variables in your DTS.
With an Expression Task inside the loop, you can change the path of the CSV file connection.

Executing SQL query on multiple databases

I know my post has a very similar title to other ones in this forum, but I really couldn't find the answer I need.
Here is my problem, I have a SQL Server running on my Windows Server. Inside my SQL Server, I have around 30 databases. All of them have the same tables, and the same stored procedures.
Now, here is the problem, I have this huge script that I need to run in all of these databases. I wish I could do it just once against all my databases.
I tried a couple things like go to "view" >> registered servers >> local server groups >> new server registration. But this solution is for many servers, not many databases.
I know I could do it by typing the database name, but the query is really huge, so it would take too long to run in all databases.
Does anybody have any idea if that is possible?
You can use a WHILE loop over all database names and, inside the loop, execute the query with EXECUTE. I think the SET @dbname = ... statement could be written better, but this works too.
DECLARE @rn INT = 1, @dbname VARCHAR(MAX) = '';

WHILE @dbname IS NOT NULL
BEGIN
    SET @dbname = (SELECT name FROM (SELECT name, ROW_NUMBER() OVER (ORDER BY name) rn
        FROM sys.databases WHERE name NOT IN ('master', 'tempdb')) t WHERE rn = @rn);

    IF @dbname <> '' AND @dbname IS NOT NULL
        EXECUTE ('use ' + QUOTENAME(@dbname) + ';
            /* Your script code here */
            UPDATE some_table SET ... ;
        ');

    SET @rn = @rn + 1;
END;
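Another option (with the caveat that sp_MSforeachdb is undocumented and unsupported, though widely used) is to let SQL Server do the iteration, with ? standing in for each database name:

EXEC sp_MSforeachdb N'IF DB_ID(''?'') > 4  -- database_id 1-4 are the system databases
BEGIN
    USE [?];
    PRINT ''Running in '' + DB_NAME();  -- replace with your script
END';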
Consider running the script in SQLCMD Mode from SSMS (Query--SQLCMD Mode). This way, you can save the script to a file and run it in the context of each of the desired databases easily:
USE DB1;
:r C:\SqlScript\YourLargeScript.sql
GO
USE DB2;
:r C:\SqlScript\YourLargeScript.sql
GO
USE DB3;
:r C:\SqlScript\YourLargeScript.sql
GO
This technique can also be used to run the script against databases on other servers with the addition of a :CONNECT command. The connection reverts back to initial server/database after execution of the entire script:
:CONNECT SomeServer
USE DB4;
:r C:\SqlScript\YourLargeScript.sql
GO
:CONNECT SomeOtherServer
USE DB5;
:r C:\SqlScript\YourLargeScript.sql
GO
Important gotcha: note that GO batch separators are needed for :CONNECT to work as expected. I recommend including GO in the invoking script like the above example, but GO as the last line in the :r script file will also provide the desired results. Without GO in this example (or at the end of the script file), the script would run twice on SomeServer and not run against SomeOtherServer at all.
ApexSQL Propagate is a tool which can help in this situation. It is used for executing single or multiple scripts on multiple databases, even multiple servers. Simply select the script, then select all databases against which you want to execute it. When the scripts and databases are loaded, just click the "Execute" button and wait for the results.
You can write a script like this:
DECLARE @DB_NAME VARCHAR(100)

DECLARE CURSOR_ALLDB_NAMES CURSOR FOR
    SELECT name
    FROM Sys.Databases
    WHERE name NOT IN ('master', 'tempdb')

OPEN CURSOR_ALLDB_NAMES
FETCH NEXT FROM CURSOR_ALLDB_NAMES INTO @DB_NAME
WHILE @@Fetch_Status = 0
BEGIN
    EXEC ('UPDATE ' + @DB_NAME + '..SameTableNameAllDb SET Status = 1')
    FETCH NEXT FROM CURSOR_ALLDB_NAMES INTO @DB_NAME
END
CLOSE CURSOR_ALLDB_NAMES
DEALLOCATE CURSOR_ALLDB_NAMES
This is the normal way of doing it.
Suppose you want to do a select on database DBOther; then it would be:
select * from DBOther..TableName
Also check whether the table or view is on the dbo schema; if not, you should add the schema as well. Notice there is now only one dot after the database name:
select * from DBOther.dbo.ViewName
If any of the databases is on another server on another machine, then make sure that database is set up as a linked server.
Then you can access the table or view on that database via:
SELECT * FROM [AnotherServerName].[DB].[dbo].[Table]
Here is another way, which does not require prefixing each table with the database name:
use DB1
go
select * from table1
go
use DB2
go
select * from table1
go
Note that this will only work if the tables and fields are exactly the same in each database.
You can use the following script to run the same script on a set of databases. Just change the filter in the insert line.
declare @dbs table (
    dbName varchar(100),
    done bit default 0
)

insert @dbs select [name], 0 FROM master.dbo.sysdatabases WHERE [Name] like 'targets_%'

while (exists (select 1 from @dbs where done = 0))
begin
    declare @db varchar(100);
    select top 1 @db = dbName from @dbs where done = 0;

    exec ('
        use [' + @db + '];
        update table1 set
            col1 = '''',
            col2 = 1
        where id = ''45b6facb-510d-422f-a48c-687449f08821''
    ');

    print @db + ' updated!';
    update @dbs set done = 1 where dbName = @db;
end
If your SQL Server version does not support table variables, use temp tables instead, but don't forget to drop them at the end of the script.
Depending on the requirement, you can do this:
declare @dbName nvarchar(100)
declare @script nvarchar(max)
declare @dbIndex bigint = 0
declare @dbCount bigint = (
    select count(*) from sys.databases
)

declare crs_databases cursor for
    select [name] from sys.databases

open crs_databases
fetch next from crs_databases into @dbName

while @@FETCH_STATUS = 0
begin
    set @dbIndex = @dbIndex + 1
    set @script = concat(@script,
        ' select Id from [' + @dbName + ']..YourTableName ',
        case
            when @dbIndex = @dbCount then ''
            else 'union'
        end)
    fetch next from crs_databases into @dbName
end

select @script

close crs_databases
deallocate crs_databases
Please note that the double dotted notation assumes that the schema is dbo. Otherwise, you need to explicitly write down the schema.
select Id from [' + @dbName + '].schema.YourTableName
When you need to execute stored procedures in each database instead, the @script variable will hold different content.
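For instance (a sketch; YourProcName is a placeholder), the concat line inside the loop above would build EXEC calls rather than UNIONed selects, with no need for the trailing-union bookkeeping:

set @script = concat(@script, ' exec [' + @dbName + ']..YourProcName;')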

import multiple txt files continuously by filename and DateStamp

I would like to set up an automated job to continuously insert multiple txt files into a table.
I have multiple dataloggers that output multiple (every minute) .txt files and are named by their datestamp, i.e. 20130921_1755.txt (YYYYMMDD_HHMM.txt). They also have a field named DateStamp, which contains date values by the second for each record.
I know what I want my query to do....
insert all filenames from directory into table ALLFILENAMES
select maximum date in final table TBLMEASUREMENTS
convert maximum date to filename (2013-09-22 17:53:00 to "20130922_1753.txt")
bulk insert all filenames > max date from table ALLFILENAMES
I have already started the process by using a post found here:
Import Multiple CSV Files to SQL Server from a Folder
I am having trouble trying to sort out how to select which specific files I need imported to the table. Especially since this is an ongoing job, I need to constantly look to the DB to see which files have not been imported, and then import them. Here's my code so far which works to import multiple files.
--some variables
declare @filename varchar(255),
        @path     varchar(255),
        @sql      varchar(8000),
        @cmd      varchar(1000)

--get the list of files to process:
SET @path = 'C:\SQL_txt\1_hr\'
SET @cmd = 'dir ' + @path + '*.txt /b'
INSERT INTO ALLFILENAMES (WHICHFILE)
EXEC Master..xp_cmdShell @cmd
UPDATE ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

--cursor loop
declare c1 cursor for SELECT WHICHPATH, WHICHFILE FROM ALLFILENAMES where WHICHFILE like '%.txt%'
open c1
fetch next from c1 into @path, @filename
While @@fetch_status <> -1
begin
    --bulk insert won't take a variable name, so make a sql and execute it instead:
    set @sql = 'BULK INSERT Temp FROM ''' + @path + @filename + ''' '
        + ' WITH (
                FIELDTERMINATOR = ''\t'',
                ROWTERMINATOR = ''\n''
            ) '
    print @sql
    exec (@sql)
    fetch next from c1 into @path, @filename
end
close c1
deallocate c1
I have been playing around with LEFT, LEN and REPLACE to try to convert the max DateStamp into a file name, but have had no luck. Any help or suggestions would be useful. Am I going about this wrong? Thanks
I would do this with SSIS/Data Tools.
Import a file from your 'watch' folder, then move the file to a different folder, using:
ForEach Loop Container
Data Flow Task
File System Task
Derived Column (optional but recommended for source tracking)
You can use the mapped filepath variable in a derived column to indicate source file, and unless there's danger of the same files being added to the watch folder multiple times, there's little need to run the 'has this been imported already' check each time.
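If you do end up needing that check, the max DateStamp can be converted back into a file name in T-SQL. A sketch against the question's tables (untested against the real schema; it relies on the names being zero-padded, so plain string comparison sorts chronologically):

DECLARE @maxDate DATETIME, @maxFile VARCHAR(255);

SELECT @maxDate = MAX(DateStamp) FROM TBLMEASUREMENTS;

SET @maxFile = CONVERT(VARCHAR(8), @maxDate, 112)                   -- '20130922'
             + '_'
             + REPLACE(CONVERT(VARCHAR(5), @maxDate, 108), ':', '') -- '1753'
             + '.txt';

-- only the files newer than the last imported measurement
SELECT WHICHPATH, WHICHFILE
FROM ALLFILENAMES
WHERE WHICHFILE LIKE '%.txt%' AND WHICHFILE > @maxFile;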
Many tutorials on SSIS available, here are two:
SSIS - Loop through Flat Files
SSIS - Move and Rename Files