I'm trying to build a process where I can dump a list of IDs into an XML document, which can then be used in a SQL statement to return information for only those IDs. I feel as though I'm very close, but the BFILENAME function I need to use to open the XML file requires a directory object, and the CREATE DIRECTORY statement fails because I have read-only access and cannot create objects. Is there something else I can do to create the directory used in the BFILENAME function?
I'm experienced in building SQL statements, but have never had to pull data from an external source in this way.
Here is the script I'm trying to run as a proof-of-concept test. Ultimately this will join to another table and spool its output to a CSV file.
CREATE DIRECTORY temp_dir AS 'C:\Users\MyDude\Desktop\';

DECLARE
  acct_doc     xmltype := xmltype( bfilename('TEMP_DIR','TestXML.xml'), nls_charset_id('AL32UTF8') );
  vis_PersonID varchar2(100);
BEGIN
  select PersonID
    into vis_PersonID
    from xmltable(
           '/Root/E'
           passing acct_doc
           columns PersonID VARCHAR2(400) PATH '.'
         );
END;
/
This fails on line 1 (the CREATE DIRECTORY statement) because I have read-only access. I have only two IDs in the file; if this were working properly, I'd expect to see those two IDs in the output.
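For what it's worth, the XMLTABLE part of this can be proved out without any directory object by building the XMLTYPE from a string literal instead of a BFILE. This is only a sketch: the two-ID payload below is made up, and because SELECT ... INTO a single variable raises TOO_MANY_ROWS once there are two rows, it loops and prints instead (run with SERVEROUTPUT ON).
-- Sample payload is an assumption about what TestXML.xml contains
DECLARE
  acct_doc xmltype := xmltype('<Root><E>123</E><E>456</E></Root>');
BEGIN
  FOR r IN (
    SELECT PersonID
      FROM xmltable('/Root/E'
                    passing acct_doc
                    columns PersonID VARCHAR2(400) PATH '.')
  ) LOOP
    dbms_output.put_line(r.PersonID);
  END LOOP;
END;
/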
Related
My question is somewhat similar to the post below. I want to download some data from a Hive table using a select query, but because the data is large, I want to write it as an external table at a given path so that I can create a CSV file. I use the code below:
create external table output(col1 STRING, col2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '{outdir}/output';

INSERT OVERWRITE TABLE output
SELECT col1, col2 FROM atable LIMIT 1000;
This works fine and creates a file in the 0000_ format, which can be copied as a CSV file.
But my question is: how do I ensure that the output will always be a single file? If no partition is defined, will it always be a single file? What rule does it use to split files?
I saw a few similar questions, like the one below, but they discuss HDFS file access.
How to point to a single file with external table
I know about the alternative below, but I use a Hive connection object to execute queries from a remote node.
hive -e ' selectsql; ' | sed 's/[\t]/,/g' > outpathwithfilename
You can set the below property before doing the overwrite
set mapreduce.job.reduces=1;
Note: if the Hive engine doesn't allow this parameter to be modified at runtime, whitelist it by setting the following property in hive-site.xml:
hive.security.authorization.sqlstd.confwhitelist.append=|mapreduce.job.*|mapreduce.map.*|mapreduce.reduce.*
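Putting the two together, the whole sequence would look roughly like this (the table, column, and source names are the ones from the question; whether a given query actually spawns reducers depends on its plan, so treat this as a sketch):
-- Force a single reducer so the INSERT OVERWRITE produces one output file
set mapreduce.job.reduces=1;

INSERT OVERWRITE TABLE output
SELECT col1, col2 FROM atable LIMIT 1000;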
I'm trying to create a stored procedure that will access a file in an Azure blob storage container, store the first line of the file in a temporary table, use this data to create a table (effectively using the header fields in the file as the column titles), and then populate that table with the rest of the data.
I've tried the basic process in a local SQL database, using a local source file on my machine, and the procedure itself works as I want it to, creating a new table from the supplied file.
However, when I set it up within an Azure SQL database and amend the procedure to use a 'datasource' rather than pointing it at a local file, it produces the following error:
Cannot bulk load because the file "my_example_file" could not be opened. Operating system error code 12007(failed to retrieve text for this error. Reason: 317).
My stored procedure contains the following:
CREATE TABLE [TempColumnTitleTable] ([ColumnTitles] [nvarchar](max) NULL);

DECLARE @Sql NVARCHAR(MAX) = 'BULK INSERT [dbo].[TempColumnTitleTable] FROM ''' + @fileName + ''' WITH
    (DATA_SOURCE = ''Source_File_Blob'', DATAFILETYPE = ''char'',
     FIRSTROW = 1, LASTROW = 1, ROWTERMINATOR = ''0x0a'')';

EXEC(@Sql);
The above should create a single-column table containing all the text from the header row, which I can then interrogate and use for the column titles in my permanent table.
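For context, the "interrogate" step might look roughly like this; this is a sketch rather than the asker's code, the permanent table name and the NVARCHAR(MAX) column type are assumptions, and a real header may also need a trailing carriage return trimmed off:
DECLARE @HeaderLine NVARCHAR(MAX);
SELECT TOP (1) @HeaderLine = [ColumnTitles] FROM [TempColumnTitleTable];

-- Turn "ColA,ColB,ColC" into CREATE TABLE [dbo].[MyPermanentTable] ([ColA] NVARCHAR(MAX), ...)
DECLARE @CreateSql NVARCHAR(MAX) =
    N'CREATE TABLE [dbo].[MyPermanentTable] ([' +
    REPLACE(@HeaderLine, N',', N'] NVARCHAR(MAX), [') +
    N'] NVARCHAR(MAX))';

EXEC(@CreateSql);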
I've set up the DataSource as follows:
CREATE EXTERNAL DATA SOURCE Source_File_Blob
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'location_url',
CREDENTIAL = AzureBlobCredential
);
with an appropriate credential in place!
I'm expecting it to populate my temporary column title table (and then go on to do the other populating that I haven't shown code for above), but it just returns the error mentioned above.
I've had a Google, but the error code seems to be related to other 'path' type issues that I don't think apply here.
We've got similar processes that use blob storage with the same credentials, and they all seem to work ok, but the problem is that the person who wrote them is no longer at our company, so I can't actually consult them!
So basically, what would be causing that error? I don't think it's access, since I am able to run similar processes on other blobs, and as far as I can tell access levels are the same on these.
Yep - I used the wrong URL as the prefix. It was only when I finally got access to the blob storage that I realised.
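For anyone hitting the same 12007 error: the LOCATION should be just the https prefix of the storage account and container, with no trailing slash and no file name, and the file is then referenced relative to it in the BULK INSERT. A sketch with placeholder account, container, and file names:
CREATE EXTERNAL DATA SOURCE Source_File_Blob
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://myaccount.blob.core.windows.net/mycontainer',  -- prefix only
    CREDENTIAL = AzureBlobCredential
);

-- The file name is relative to that prefix:
BULK INSERT [dbo].[TempColumnTitleTable]
FROM 'my_example_file.csv'
WITH (DATA_SOURCE = 'Source_File_Blob', DATAFILETYPE = 'char',
      FIRSTROW = 1, LASTROW = 1, ROWTERMINATOR = '0x0a');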
I wrote an sql statement like this:
CREATE TABLE TableA
( "Col1" VARCHAR2(5 BYTE),
  "Col2" VARCHAR2(2 BYTE)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "DIR"
ACCESS PARAMETERS
( records delimited by newline
fields terminated by ','
optionally enclosed by '"'
)
LOCATION
( 'TableA.csv'
)
)
REJECT LIMIT UNLIMITED ;
There are 120 .csv files in my 'DIR' directory. I need to read each file and create a separate external table for it; for example, TableA has to be created for fileA.
Is it possible to write a procedure for this, reading each filename in a loop, substituting the filename variable into the SQL statement above, and running the statement within the loop?
I had a very similar problem a few years ago, and I used this procedure provided by Tom Kyte on his world-famous asktom.oracle.com:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:439619916584
Basically, you have to create a Java procedure that reads the directory, and then a PL/SQL stored procedure that uses that Java procedure to read the files in the directory.
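Once the file names are available inside the database, the generation step itself is just dynamic SQL. A sketch only: it assumes a helper table DIR_FILES(filename) has already been populated (for example by that Java routine) and that each table is named after its file (TableA.csv becomes TABLEA); the column list is copied from the question.
DECLARE
  v_table_name VARCHAR2(30);
BEGIN
  FOR f IN (SELECT filename FROM dir_files) LOOP
    -- derive the table name from the file name
    v_table_name := REGEXP_REPLACE(UPPER(f.filename), '\.CSV$', '');
    EXECUTE IMMEDIATE
      'CREATE TABLE ' || v_table_name || ' (
         "Col1" VARCHAR2(5 BYTE),
         "Col2" VARCHAR2(2 BYTE)
       )
       ORGANIZATION EXTERNAL (
         TYPE ORACLE_LOADER
         DEFAULT DIRECTORY "DIR"
         ACCESS PARAMETERS (
           records delimited by newline
           fields terminated by '',''
           optionally enclosed by ''"''
         )
         LOCATION (''' || f.filename || ''')
       )
       REJECT LIMIT UNLIMITED';
  END LOOP;
END;
/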
Take a look at the AskTom thread and tell me how it goes... I'm looking for the code I wrote for this years ago, but can't seem to find it; it was at a different job.
Good luck.
Hi, I often have to insert a lot of data into a table. For example, I would have data from Excel or a text file in the form of:
1,a
3,bsdf
4,sdkfj
5,something
129,else
Then I often construct 6 insert statements in this example and run the SQL script. I found this is slow when I have to send thousands of small packets to the server, and it also adds extra overhead to the network.
What's your best way of doing this?
Update: I'm using ORACLE 10g.
Use Oracle external tables.
See also e.g.
OraFaq about external tables
What Tom thinks about external tables
René Nyffenegger's notes about external tables
A simple example that should get you started
You need a file located in a server directory (get familiar with directory objects):
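(If a directory object doesn't exist yet, a suitably privileged user creates one roughly like this; the path and grantee below are examples.)
-- Requires the CREATE ANY DIRECTORY privilege; the OS path must exist on the database server
CREATE DIRECTORY jtest AS 'c:\data\jtest';
GRANT READ, WRITE ON DIRECTORY jtest TO scott;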
SQL> select directory_path from all_directories where directory_name = 'JTEST';
DIRECTORY_PATH
--------------------------------------------------------------------------------
c:\data\jtest
SQL> !cat ~/.gvfs/jtest\ on\ 192.168.xxx.xxx/exttable-1.csv
1,a
3,bsdf
4,sdkfj
5,something
129,else
Create an external table:
create table so13t (
id number(4),
data varchar2(20)
)
organization external (
type oracle_loader
default directory jtest /* jtest is an existing directory object */
access parameters (
records delimited by newline
fields terminated by ','
missing field values are null
)
location ('exttable-1.csv') /* the file located in jtest directory */
)
reject limit unlimited;
Now you can use all the powers of SQL to access the data:
SQL> select * from so13t order by data;
ID DATA
---------- ------------------------------------------------------------
1 a
3 bsdf
129 else
4 sdkfj
5 something
I'm not sure if this works in Oracle, but in SQL Server you can use the BULK INSERT statement to upload data from a txt or a csv file.
BULK INSERT [TableName]
FROM 'c:\FileName.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
Just make sure that the table columns correctly match what's in the txt file. For a more complicated solution you may want to use a format file; see the following:
http://msdn.microsoft.com/en-us/library/ms178129.aspx
There are a lot of ways to speed this up.
1) Do it in a single transaction. This will speed things up by avoiding connection opening / closing.
2) Load directly as a CSV file. If you load the data as a CSV file, the "SQL" statements aren't required at all. In MySQL the "LOAD DATA INFILE" operation accomplishes this very intuitively and simply (see the sketch after this list).
3) You can also simply dump the whole file as text into a table called "raw" and then let the database parse the data on its own using triggers. This is a hack, but it will simplify your application code and reduce network usage.
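A minimal LOAD DATA INFILE sketch for option 2 (MySQL syntax, as the answer mentions; the file path and table name are made up, and the comma-separated layout from the question is assumed; for Oracle, the external-table answer above plays the same role):
-- Loads the whole file in one server-side operation instead of thousands of INSERTs
LOAD DATA INFILE '/tmp/data.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, data);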
I have a table with a column of type xml. I also have a directory that can hold 0 to n XML documents. For each XML document, I need to insert a new row in the table and put the XML into the xml column.
To fit with our client's needs, I need to perform this operation using an SSIS package. I plan to use a stored procedure to insert the XML, passing in the file path.
I've created the stored procedure and tested it; it functions as expected.
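(For readers following along, a procedure of roughly this shape would do the job; this is a sketch rather than the asker's actual code, and the procedure, table, and column names (dbo.InsertXmlFile, dbo.XmlDocuments, XmlData) are made up. OPENROWSET(BULK ..., SINGLE_BLOB) needs a literal file path, hence the dynamic SQL.)
CREATE PROCEDURE dbo.InsertXmlFile
    @FilePath NVARCHAR(260)
AS
BEGIN
    -- Read the whole file as a single blob and cast it to XML for the insert
    DECLARE @Sql NVARCHAR(MAX) =
        N'INSERT INTO dbo.XmlDocuments (XmlData)
          SELECT CAST(BulkColumn AS XML)
          FROM OPENROWSET(BULK ''' + @FilePath + N''', SINGLE_BLOB) AS x;';
    EXEC (@Sql);
END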
My question is: how do I execute the stored procedure from an SSIS package for each XML document in a specific directory?
Thanks in advance for any help.
Basically you just need to loop through the files and get the full file paths to pass to the stored proc. This can be done easily using a For Each Loop and the ForEach File Enumerator. This page has a good description of how to set that up:
http://www.sqlis.com/post/Looping-over-files-with-the-Foreach-Loop.aspx
Within the loop you just access the variable that is populated each time the loop executes (an XML file is found) and send it as a parameter into an Execute SQL Task (residing inside your For Each Loop container) to call your stored procedure. Here is an example of passing variables as parameters:
http://geekswithblogs.net/stun/archive/2009/03/05/mapping-stored-procedure-parameters-in-ssis-ole-db-source-editor.aspx
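With an OLE DB connection, the SQLStatement in that Execute SQL Task can be as simple as the line below, with the "?" mapped to the loop's file-path variable on the Parameter Mapping tab (dbo.InsertXmlFile is the hypothetical procedure sketched with the question above):
EXEC dbo.InsertXmlFile ?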
You don't need to use a stored procedure for this. You can do all of this within an SSIS package. Here's how:
Have a For-Each Loop task read all available files in the folder. Put the full path of the file into a variable called XMLFileName
Inside the For-Each loop, use a Data Flow task to read the contents.
The OLE_SRC reads from the same SQL Server and its statement is SELECT GetDate() as CurrentDateTime
The DerivedColumn component creates a column called XMLFilePath with the full path of the XML file
The ImportColumn component is the one that does the magic. It will take the XMLFilePath as an input column, give it the LineageId of a new output column you create and it will import the full XML for you. Read more on how to set it up here:
http://www.bimonkey.com/2009/09/the-import-column-transformation/
Use the OleDB Destination to write to the table.