Retrieve filename of file being uploaded in SQL - sql

I've been searching for a solution on how to get the filename of a file using SQL Server. I know that it's possible if you're using C#. But how is it done in SQL?
For example, I have a file (example: uploadfile.txt) located in C:\ that is about to be uploaded. I have a table which has a field "filename". How do I get the filename of this file?
This is the script that I have at the moment.
-- Insert to table
BULK INSERT Price_Template_Host
FROM 'C:\uploadfile.txt'
WITH
(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
)
-- Insert into transaction log table filename and datetime()

To the best of my knowledge, there is no direct method in T-SQL to locate a file on the file system. After all, this is not what the language is intended for. As the script you have provided shows, BULK INSERT requires that the fully qualified file name already be known at the time the statement is called.
There are of course a whole variety of ways you could identify/locate a file outside of T-SQL: for example using SSIS, xp_cmdshell (which has security caveats), or a managed code module within SQL Server that performs this task.
To provide you with specific guidance, it may help if you could provide us all with details of the business process that you are trying to implement.
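For illustration, here is a minimal sketch of the xp_cmdshell route mentioned above (it assumes xp_cmdshell is enabled, that the folder contains exactly one matching file, and the table variable name is invented for the example):
DECLARE @files TABLE (id INT IDENTITY(1,1), fileName NVARCHAR(260))
DECLARE @fileName NVARCHAR(260)
-- /b prints bare file names only; the result set is captured into the table variable
INSERT INTO @files (fileName)
EXEC xp_cmdshell 'dir C:\*.txt /b'
-- dir output ends with a NULL row, so filter it out and take the first real name
SELECT TOP 1 @fileName = fileName
FROM @files
WHERE fileName IS NOT NULL
SELECT @fileName AS FileToLoad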

I would personally attack this problem with an SSIS package, which would give you much more flexibility in terms of the load and subsequent logging. However, if you're set on doing this through T-SQL, consider exec'ing dynamically constructed SQL:
declare @cmd nvarchar(max), @filename nvarchar(255)
set @filename = 'C:\uploadfile.txt'
set @cmd =
'BULK INSERT Price_Template_Host
FROM '''+@filename+'''
WITH
(
FIELDTERMINATOR = ''\t'',
ROWTERMINATOR = ''\n''
)'
-- Debug only
print @cmd
-- Insert to table
exec(@cmd)
-- Insert into transaction log table filename and datetime()
insert into dbo.LoadLog (filename, TheTime)
values (@filename, getdate())
If I understand your question correctly, this parameterizes the filename so that you can capture it further down in the script.
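If this pattern gets reused, one option (sketched here with an invented procedure name, and reusing the dbo.LoadLog table from above) is to wrap it in a stored procedure that takes the file name as a parameter, so the load and the logging always happen together:
CREATE PROCEDURE dbo.LoadPriceTemplate
    @filename nvarchar(255)
AS
BEGIN
    DECLARE @cmd nvarchar(max)
    -- build and run the BULK INSERT for the supplied file
    SET @cmd =
        'BULK INSERT Price_Template_Host
         FROM ''' + @filename + '''
         WITH (FIELDTERMINATOR = ''\t'', ROWTERMINATOR = ''\n'')'
    EXEC (@cmd)
    -- log which file was loaded and when
    INSERT INTO dbo.LoadLog (filename, TheTime)
    VALUES (@filename, GETDATE())
END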

Related

Using single stored procedure for different data imports to different tables

I am in the design stage of an application that has a large piece of functionality for importing data into a SQL Server database. As there are numerous tables in the database, I want to avoid the conventional approach of creating models and writing stored procedures for each import. Is there a way I can create a single stored procedure for different tables and insert data into them?
Note: columns will vary from table to table.
Thanks in advance
Well, I would stick with the comments discouraging it, but on the other hand, if this procedure will be super simple and maintenance will be transferred to the JSON creator, you can do it like this:
declare @tablename as nvarchar(max)
declare @json as nvarchar(max)
declare @query as nvarchar(max)
set @tablename = (SELECT TableName FROM YourAllowedTableNamesList WHERE TableName = @tablename)
Set @query =
'Insert into ' + @tablename +
' SELECT * FROM OPENJSON(''' + @json + ''')'
Exec (@query)
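For example (the procedure, table and column names here are purely illustrative, and the JSON is assumed to line up with the target table's columns), the fragment above could be wrapped as a procedure that passes the JSON as a real parameter rather than splicing it into the string:
CREATE PROCEDURE dbo.ImportJson
    @tablename sysname,
    @json nvarchar(max),
    @withClause nvarchar(max)   -- e.g. '[id] int, [Price] decimal(10,2)'
AS
BEGIN
    DECLARE @query nvarchar(max)
    -- only accept table names that are on the allow-list
    IF NOT EXISTS (SELECT 1 FROM YourAllowedTableNamesList WHERE TableName = @tablename)
        RETURN
    SET @query = N'INSERT INTO ' + QUOTENAME(@tablename)
               + N' SELECT * FROM OPENJSON(@json) WITH (' + @withClause + N')'
    -- the JSON travels as a parameter, so quotes inside it cannot break the statement
    EXEC sp_executesql @query, N'@json nvarchar(max)', @json = @json
END
-- hypothetical call
EXEC dbo.ImportJson
    @tablename  = N'SomeTargetTable',
    @json       = N'[{"id":1,"Price":9.99}]',
    @withClause = N'[id] int, [Price] decimal(10,2)'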
Yes, I have done something like this at my current shop. Your question is too broad, so I will give you only a broad overview of what we have done.
We wrote a console app that gets a SQL Command from a meta table and executes it on the source into an in-memory DataTable. It then bulk-inserts that data into a staging table on the destination database.
Then we run a generic merge proc that looks at the system tables to get the primary keys and datatypes of the final destination table and constructs INSERT and UPDATE statements using dynamic SQL.
Despite the well-meaning warnings of others, it's working well for us, though it does have some limitations, such as an inability to handle BLOB datatypes in a generic way. There may be other limitations that we just haven't encountered yet as well.
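As a rough idea of the system-table lookup involved (the target table name here is just a placeholder), the primary key columns and their datatypes can be read like this and then strung together into the ON clause of a dynamic MERGE or UPDATE:
DECLARE @target sysname = N'dbo.FinalDestinationTable'   -- placeholder name
-- primary key columns of the destination table, in key order
SELECT c.name AS KeyColumn, t.name AS DataType
FROM sys.indexes i
JOIN sys.index_columns ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
JOIN sys.types t
    ON t.user_type_id = c.user_type_id
WHERE i.object_id = OBJECT_ID(@target)
  AND i.is_primary_key = 1
ORDER BY ic.key_ordinal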

SQL Server loop openrowset performance

I have the following stored procedure to loop through hundreds of different JSON files that are downloaded to the server every day.
The issue is that the query takes a good 15 minutes to run. I will need to create something similar soon for a larger number of JSON files. Is somebody able to point me in the right direction with regard to increasing the performance of the query?
DECLARE @json VARCHAR(MAX) = ''
DECLARE @Int INT = 1
DECLARE @Union INT = 0
DECLARE @sql NVARCHAR(MAX)
DECLARE @PageNo INT = 300
WHILE (@Int < @PageNo)
BEGIN
SET @sql = (
'SELECT
    @cnt = value
FROM
    OPENROWSET (BULK ''C:\JSON\tickets' + CONVERT(varchar(10), @Int) + '.json'', SINGLE_CLOB) as j
    CROSS APPLY OPENJSON(BulkColumn)
WHERE
    [key] = ''tickets''
')
EXECUTE sp_executesql @sql, N'@cnt nvarchar(max) OUTPUT', @cnt = @json OUTPUT
IF NOT EXISTS (SELECT * FROM OPENJSON(@json) WITH ([id] int) j JOIN tickets t on t.id = j.id)
BEGIN
INSERT INTO
    tickets (id, Field1)
SELECT
    *
FROM OPENJSON(@json)
WITH ([id] int, Field1 int)
END
-- advance to the next file
SET @Int += 1
END
It seems the bulk read (OPENROWSET BULK) inside the loop is the bottleneck. Generally a bulk load is the fastest way to retrieve data; anyway, here it seems the number of files is your problem.
To make things faster you would want to read the JSON files in parallel. You could do that by first creating the complete dynamic SQL query for all files, or maybe for groups of files, and reading them together, as sketched below.
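A rough sketch of that "one combined statement" idea, using the file naming from the question (this only shows how the combined read could be built; the OPENJSON parsing and insert would then run once over the combined result instead of once per file):
DECLARE @sql NVARCHAR(MAX) = N''
DECLARE @i INT = 1
DECLARE @PageNo INT = 300
-- build one statement that reads every file, instead of one statement per file
WHILE @i < @PageNo
BEGIN
    SET @sql = @sql
        + CASE WHEN @i > 1 THEN N' UNION ALL ' ELSE N'' END
        + N'SELECT BulkColumn FROM OPENROWSET (BULK ''C:\JSON\tickets'
        + CONVERT(NVARCHAR(10), @i) + N'.json'', SINGLE_CLOB) AS j'
    SET @i += 1
END
EXEC sp_executesql @sql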
I would rather advise using Integration Services with a script component as a source in parallel data flow tasks. First read all files from your destination folder, split them into, say, 4 groups, and for each group have a loop container that runs in parallel. Depending on the executing machine, you can use as many parallel flows as possible. Already 2 data flows should make up for the overhead of Integration Services.
Another option would be to write a CLR (common language runtime) stored procedure and deserialize the JSON in parallel using C#.
It also depends on the machine doing the job. You would want enough RAM and free CPU power, so consider doing the import while the machine is not busy.
One method I've had success with when loading data into tables from lots of individual XML files, which you might be able to apply to this problem, is the FileTable feature of SQL Server.
The way it worked was to set up a FileTable in the database, then allow access to the FILESTREAM share that was created on the server for the process that was uploading the XML files. XML files were then dropped into the share and were immediately available in the database for querying using XPath.
A process would then run XPath queries to load the required data from the XML into the required tables and keep track of which files had been loaded; when the next schedule came along, it would only load data from the newest files.
A scheduled task on the machine would then remove files when they were no longer required.
Have a read up on FileTable here:
FileTables (SQL Server)
It's available in all SQL Server editions.
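As a very rough sketch of that setup (the directory and table names are invented, and it assumes FILESTREAM is already enabled at the instance level and that the database already has a FILESTREAM filegroup):
-- allow non-transactional access so files can be dropped straight into the share
ALTER DATABASE MyDb
    SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'MyDbFiles')
GO
-- a FileTable: every file copied into its share becomes a row in this table
CREATE TABLE dbo.JsonDropFolder AS FILETABLE
    WITH (FILETABLE_DIRECTORY = 'JsonDrop')
GO
-- file content lives in file_stream; name and last_write_time help track which files are new
SELECT name,
       last_write_time,
       CAST(file_stream AS VARCHAR(MAX)) AS FileContents
FROM dbo.JsonDropFolder
WHERE name LIKE '%.json'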

Finding and Reading an XML file using MS SQL

I have an SQL script which I want to run on multiple databases. The script runs a couple of update and insert statements, and I also want to open and parse an XML file located at different paths.
The issue I have is that I know where the file I want to open is (the directory) but I don't know its name (the name is random). However, the extension is always .profile (it is an XML file) and there is only one file in each directory.
I wonder how I can open an XML/.profile file without knowing its exact name using MS SQL.
If I understand your question correctly:
declare @files table (ID int IDENTITY, fileName varchar(max))
insert into @files (fileName) execute xp_cmdshell 'dir <yourdirectoryhere>\*.profile /b'
declare @fileName varchar(max)
select top 1 @fileName = fileName from @files
does what you want, but it is based on calling xp_cmdshell, and it's usually a very bad idea to use it.
Try something along the lines of this:
DECLARE @output NVARCHAR(MAX)
CREATE TABLE #OUTPUT
(
OUTPUT VARCHAR(255) NULL
)
INSERT #OUTPUT
EXEC @output = XP_CMDSHELL
'DIR "C:\temp\*.profile" /B '
SELECT *
FROM #OUTPUT
DROP TABLE #OUTPUT
As explained here (and that's just one way), you can access disk contents from SQL Server, provided your permissions are working fine.
IIRC, the following options need to be enabled. However, you need them anyway to access files from SQL Server.
EXEC sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
EXEC sp_configure 'xp_cmdshell', 1
GO
RECONFIGURE
GO

SQL calling SSIS package in stored procedure char limit

I have a stored procedure that has an XML variable as input parameter.
Each node of the XML variable matches with a SQL column of a certain database table.
The stored procedure simply iterates over the XML nodes and inserts them into the table.
I have been using the "OPENXML" functionality in my stored procedure to do this, but I am having performance issues (the query takes up to 40 sec) => SQL process XML performance: Insert into columns in a table.
I want to boost the performance but I'm not sure where to begin. The standard XML functionality doesn't seem to be getting any faster, so I am looking into an alternative.
I have tried to do this functionality using SSIS packages. So from my stored procedure, I call an SSIS package that has a string variable as input parameter.
I pass the XML I receive from the stored procedure into the package as a string.
But I have come across an issue with the xp_cmdshell functionality.
It only allows a command of at most 8000 characters.
This is what the code looks like:
--Execution SSIS Package
DECLARE @Command varchar(8000)
, @PackageLocation varchar(1000)
, @PackageName varchar(1000)
, @ExitCode int
SET @PackageLocation = 'C:\SSIS\Package.dtsx'
SET @Command = 'dtexec /f "' + @PackageLocation + '" /set \package.Variables[Xmldata].Value;"' + @datastring + '"'
EXEC @ExitCode = xp_cmdshell @Command
The problem is that the @datastring variable can be longer than 8000 characters, so then the command fails.
Any idea how I can solve this?
Or perhaps other alternatives to accomplish this functionality?
Thanks.
Best regards,
I will store the XML I receive in the stored procedure in a temporary "buffer" table, so I can retrieve it in the SSIS package and process it further.
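A minimal sketch of that buffer-table idea (the table name, column names and package variable are placeholders): the procedure parks the XML in a staging row and only passes a small key on the dtexec command line, and the package reads the XML back by that key:
-- hypothetical buffer table
CREATE TABLE dbo.XmlBuffer
(
    BufferId  INT IDENTITY(1,1) PRIMARY KEY,
    XmlData   XML NOT NULL,
    CreatedAt DATETIME NOT NULL DEFAULT (GETDATE())
)
GO
-- inside the stored procedure: park the XML, pass only the id to dtexec
DECLARE @BufferId INT, @Command VARCHAR(8000), @ExitCode INT
INSERT INTO dbo.XmlBuffer (XmlData) VALUES (@datastring)
SET @BufferId = SCOPE_IDENTITY()
SET @Command = 'dtexec /f "C:\SSIS\Package.dtsx" /set \package.Variables[BufferId].Value;"'
             + CAST(@BufferId AS VARCHAR(10)) + '"'
EXEC @ExitCode = xp_cmdshell @Command
-- the package then reads XmlData from dbo.XmlBuffer for that BufferId and deletes the row when done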

SQL 2008 FILESTREAM - How to Check if DB Has Filestream Enabled Before Altering It

In my situation, there is a possibility of a script being run many times (don't ask why).
So I want to make my script bullet proof before it runs in prod.
I have a pretty big change where I add FILESTREAM capabilities. I have already got it to work in my dev environment with the necessary scripts by enabling FILESTREAM, altering the database to add a filestream file group, and then adding a file location to that file group, and lastly creating a table with filestream on the data column (varbinary(MAX)).
That is all good. But running the ALTER DATABASE script many times can produce errors if a file group is already there. So I did this...
IF NOT EXISTS (SELECT * FROM sys.filegroups WHERE [name]='NewFileGroup')
BEGIN
ALTER DATABASE [MyDb]
ADD FILEGROUP [NewFileGroup] CONTAINS FILESTREAM
END
GO
But the next part of the code is the part that I don't want running many times...
DECLARE @Path NVARCHAR(MAX)
SET @Path = (SELECT REPLACE(filename, 'MyDb.mdf', 'NewFileGroup') FROM sysfiles WHERE Name = 'PrimaryFileName')
DECLARE @SQL NVARCHAR(MAX)
SET @SQL =
'ALTER DATABASE [MyDb]
ADD FILE
(NAME = ''NewFileGroup''
, FILENAME = ' + QUOTENAME(@Path, '''')
+ ')
TO FILEGROUP [NewFileGroup]'
EXEC(@SQL)
That code works fine, but how do I check if the FILENAME / file path already exists in that file group? Please, somebody help. I just want to make another IF statement around it.
To see if you have already a filegroup for FILESTREAM look in sys.data_spaces:
select * from sys.data_spaces where type='FD';
To see if the filegroup already has any file for FILESTREAM, look in sys.database_files:
select * from sys.database_files where type = 2;
Whatever you do, do not rely on the object names.
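Putting that together with the dynamic SQL from the question, the ADD FILE step could be guarded by checking sys.database_files for an existing FILESTREAM file (type = 2) rather than by name, for example:
IF NOT EXISTS (SELECT * FROM sys.database_files WHERE type = 2)
BEGIN
    DECLARE @Path NVARCHAR(MAX), @SQL NVARCHAR(MAX)
    SET @Path = (SELECT REPLACE(filename, 'MyDb.mdf', 'NewFileGroup') FROM sysfiles WHERE Name = 'PrimaryFileName')
    SET @SQL =
        'ALTER DATABASE [MyDb]
         ADD FILE (NAME = ''NewFileGroup'', FILENAME = ' + QUOTENAME(@Path, '''') + ')
         TO FILEGROUP [NewFileGroup]'
    EXEC (@SQL)
END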
Or, from SQL Server 2012 and above, this query will list all the databases which have non-transactional access enabled on them, i.e., FILESTREAM.
SELECT DB_NAME(database_id) [DB_Name],directory_name [FileStream_DirectoryName]
FROM sys.database_filestream_options
WHERE non_transacted_access != 0;
Refer to http://msdn.microsoft.com/en-us/library/gg492071.aspx for more info