I am trying to find a way to quickly load a lot of data into a database, and someone suggested using Firebird external tables. I would like to know more about this method, but I've tried searching online and I'm not getting much useful information. How do they really work? Do the tables have to be exactly the same? And what if you are loading data from more than one database?
Use external tables like this:
CREATE TABLE ext1 EXTERNAL 'c:\myfile.txt'
(
field1 char(20),
field2 smallint
);
To do a quick import into a regular table, do something like this:
INSERT INTO realtable1 (field1, field2)
SELECT field1, field2 FROM ext1;
Remember to disable triggers and indexes (if possible) before loading, and reactivate them after.
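A minimal sketch of that deactivation, assuming hypothetical index and trigger names on realtable1:
-- Deactivate before the bulk load (names are placeholders)
ALTER INDEX idx_realtable1_field1 INACTIVE;
ALTER TRIGGER trg_realtable1_bi INACTIVE;

-- ... run the INSERT INTO ... SELECT shown above ...

-- Reactivate afterwards; activating an index also rebuilds it
ALTER INDEX idx_realtable1_field1 ACTIVE;
ALTER TRIGGER trg_realtable1_bi ACTIVE;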
This information is from Firebird FAQ: http://www.firebirdfaq.org/faq209/
Here's more information about using external tables, including information about file format: http://www.delphiman.de/Bin/UsingExternalFilesAsTables.pdf
Using an external file as a table is a great way to get lots of data into Firebird quickly. However, the sample, which is from the Firebird FAQ, seems to me to be either unnecessarily complex or incorrect because of the use of smallint in the table definition. As the FB 2.5 documentation points out, "for most purposes, only columns of CHAR types would be useful."
The external file must be a text file of fixed-length records (so a .csv file won't work). The external table def should then use CHAR fields with sizes that match the lengths of the fields in each record.
Any variation in the length of the records in the text file will lead to misery (from bitter experience). I suppose the example might work if all of the smallints had the same number of digits, but more generally, things will go more smoothly if other formats (date, numeric) are simply expressed as CHAR in the text file, padded with spaces.
For example, if the raw data looked like this:
Canada 37855702
Central African Republic 4829764
Chad 16425859
Chile 19116209
China 1404676330
Then the text file should look like this:
Canada                     37855702
Central African Republic    4829764
Chad                       16425859
Chile                      19116209
China                    1404676330
Countries are right-padded to twenty-five characters and the (big) integers are left-padded to ten characters, so the records are 35 characters, plus one for a line feed (*nix) or two for Windows-style CRLF. (Note that things get more complicated if the file uses a Unicode encoding.)
The table def would look like this:
CREATE TABLE ext_test EXTERNAL '/home/dave/fbtest.txt'
(
COUNTRY CHAR(25),
POPULATION CHAR(10),
LF CHAR(1)
);
Make sure that the file resides on the same file system as the FB server process, that the server process has rights to the file (maybe through a FB group) and that the ExternalFileAccess parameter in firebird.conf is set appropriately - see the 2.5 documentation for details.
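For example, the relevant firebird.conf line might look like this (the directory is only an illustration; Restrict limits external files to the listed path or paths):
# firebird.conf
ExternalFileAccess = Restrict /home/dave/fb_external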
There are some limited things you can do with an external table, but it's most useful as a temporary transfer table, as a source for the ultimate FB table. INSERT each row from the external table into the ultimate target, casting the CHAR fields to the appropriate data types. For data of any real volume, the process runs much faster than, say, some Python code to read and feed each line individually.
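Continuing the country example above, the transfer into a regular table could look something like this (the destination table and its column types are assumptions for the sketch):
-- Hypothetical destination table
CREATE TABLE country_population (
    COUNTRY    VARCHAR(25) NOT NULL,
    POPULATION BIGINT
);

-- Move the data over, casting the fixed-width CHAR fields to real types
INSERT INTO country_population (COUNTRY, POPULATION)
SELECT TRIM(TRAILING FROM COUNTRY),
       CAST(TRIM(POPULATION) AS BIGINT)
FROM ext_test;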
If you are using an older version of FB, don't forget to DROP the external table when you're done with it to free up file locks, as outlined in the FAQs. Newer versions do this automatically. There's lots more on external tables in the 2.5 documentation at the above link.
PS - I have emailed the above to the Firebird documentation team.
With Firebird 2.5.8, and a table with a dozen of blob fields, I have this weird behavior querying this way:
SELECT *
FROM TABLE
WHERE BLOBFIELD4 LIKE '%SOMETEXT%'
and I get results even though SOMETEXT is actually in a different column and not in BLOBFIELD4 (this happens with every blob column).
What am I missing?
Thanks for the data. I made a few quick tests using the latest IB Expert with Firebird 2.5.5 (what I had on hand).
It seems that you actually have much more data than you might think you have.
First of all, it is a bad, dangerous practice to keep text data in columns marked as CHARSET NONE! Make sure that your columns are marked with some reasonable charset, like Windows 1250 or UTF8. Also make sure that the very CONNECTION of all your applications (including development tools) to the database server has some explicitly defined character set that suits your textual data.
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
Or, if you want those BLOBs to be seen as binary, then explicitly create them as SUB_TYPE BINARY, not SUB_TYPE TEXT.
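A sketch of the difference in DDL (table and column names are placeholders):
-- Text BLOB with an explicit character set
CREATE TABLE notes_text (
    BODY BLOB SUB_TYPE TEXT CHARACTER SET UTF8
);

-- Binary BLOB: no character set, no text semantics
CREATE TABLE notes_binary (
    BODY BLOB SUB_TYPE 0
);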
Anyway, here is a simple script to run on your database:
alter table comm
add NF_VC VARCHAR(4000) CHARACTER SET UTF8,
add NF_BL BLOB SUB_TYPE 1 SEGMENT SIZE 4096 CHARACTER SET UTF8
then
update comm
set nf_vc = '**' || com1 || '**'
then
update comm
set nf_bl = '##' || nf_vc || '##'
Notice that I intentionally force Firebird to do the conversion BLOB -> VARCHAR -> BLOB, just to be on the safe side.
Now check some data.
select id_comm, nf_vc
from comm where
nf_vc containing 'f4le dans 2 ans'
and
select id_comm, nf_bl
from comm where
nf_bl containing 'f4le dans 2 ans'
What do you see now?
In the first picture we see that very mystery: the row is selected, but we cannot see your search pattern, "f4le dans 2 ans", in it.
BUT!
Can you see the marks, the double asterisks (**)?
Yes, you can see them at the beginning, but you cannot see them at the end!
That means you do NOT see the whole text, only the first part of it.
In the second picture you see the very same row (ID=854392), re-converted back into a BLOB and additionally marked with ## at both ends.
Can you see the marks on both start and end?
Can you see your search pattern?
Yes and yes, if you look at the grid row (white).
No and no, if you look at the tooltip (yellow).
So, again, the data you are searching for DOES exist; you just fail to see it for some reason.
Now, what might be a typical reason for a string not being displayed completely?
It can be a zero-value byte (or several bytes for a Unicode code point), the way the C language marks the end of a string, a convention widely used in Windows and in many libraries and programs. Or maybe some other unusual value (EOF, EOT, -1, etc.) that makes the programs you use falsely detect the end of the text where it has not actually ended yet.
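If you want to check for embedded zero bytes directly on the server, something along these lines should work in Firebird 2.5 (COM1 is the column from your query; I have not tested this against a CHARSET NONE blob, so treat it as a sketch):
-- Rows whose BLOB appears to contain an embedded zero byte
SELECT id_comm
FROM comm
WHERE POSITION(ASCII_CHAR(0) IN com1) > 0;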
Look at the two screenshots again: where do the lines start to differ? It is after \viewkind4 ... \par} and before pard. Notice the weird anomaly: that pard should start with a backslash (\) to be a valid RTF command, but instead it is preceded by something invisible, something blank. What can it be?...
Let us go back to your original query in your comments.
Also, it is bad practice to put important details into comments! They are hard to find there for anyone who was not following the story from the very start, and the more comments are added, the harder it gets. The proper avenue would have been to EDIT the question, adding the new data into the question body, and then to add a comment (for notification's sake) saying the question was edited. Please add new data that way in the future.
select id_comm, COM1
from comm where
COM1 containing 'f4le dans 2 ans'
At first glance our fishing ended with nothing: we see text that does not contain your pattern, ending at that very \par}.
But is it so? Switch into binary view, and...
Voila! What is there right before the found-lost-found-again pard? That very ZERO BYTE I talked about earlier.
So, to wrap it up, here is what happened.
Firebird is correct, the data is found because the data is really there, in the BLOBs.
Your applications, reading the data, are not correct: confused by the zero byte, they show you only part of the data, not all of it.
The application writing the data might not be correct either, or the data itself might be corrupt.
How did that zero byte end up there? Why was the RTF structure corrupt, lacking the backslash before pard? Was the data size you passed to the server when inserting that data larger than it should have been, so that some garbage was passed after the meaningful data? Or was the data size correct, but the data contents corrupt before inserting?
Something is fishy there. I do not think the RTF specification explicitly prohibits a zero byte, but having one is very atypical, because it triggers bugs like this in far too many applications and libraries.
P.S. The design of a table with MANY BLOB columns seems poor.
"Wide" tables often lead to problems in future development and maintenance.
While it is not the essence of your question, please do think about remaking this table into a narrow one and saving your data as a number of one-BLOB rows.
It will cost you some fixed extra work now, but will probably save you from snowballing problems in the future.
As is commonly known, SAP does not recommend using fields longer than 255 characters in transparent tables. One should use several 255-character fields instead, wrap the text in LCHR, LRAW or STRING, or use SO10 texts, etc.
However, while maintaining legacy (and ugly) developments, the problem often arises: how do you view what is stored in a char500 or char1000 field in the database?
The real life scenario:
we have a development where some structure is written to and read from a char1000 field in a transparent table
we know the field structure, and parsing the field through CL_ABAP_CONTAINER_UTILITIES=>FILL_CONTAINER_C or SO_STRUCT_TO_CHAR works fine; all fields are filled correctly
displaying the field via SE11/SE16/SE16n gives nothing, as the field is truncated to 255 characters, and to 132 in the debugger, AFAIR
Is there any standard tool, transaction or FM we can use to display such a long field?
In the DBA cockpit (transaction ST04), there is an SQL command line where you can enter "native" SQL commands directly and display the result as an ALV view. With a substring function you can split a field into several sections, for example:
select substr(sql_text,1,100) s1,
       substr(sql_text,101,100) s2,
       substr(sql_text,201,100) s3,
       substr(sql_text,301,100) s4
from dba_hist_sqltext
where sql_id = '0cuyjatkcmjf0'
PS: every ALV cell is 128 characters maximum.
Not sure whether this tool is available for all supported database systems.
There is also an equivalent program named RSDU_EXEC_SQL (in all ABAP-based systems?).
Unfortunately, neither will work for SAP's ersatz tables (cluster tables and so on), as those can be queried only with ABAP Open SQL.
If you have an ERP system at hand, check out transaction PP01 with infotype 1002. Basically, they store text in tables HRP1002 and HRT1002 and create a special view with a text editor. It looks like this: http://www.sapfunctional.com/HCM/Positions/Page1.13.jpg
In the debugger you can switch the view to, e.g., HTML and you should see the whole string, but editing is limited, as far as I know, to a certain number of characters.
I have a table that stores a tree-like structure of file names. There are currently 8 million records in this table. I am working on a way to quickly find a list of files that have a specific serial number embedded in the name.
FS_NODES
-----------------------------------
NODE_ID bigint PK
ROOT_ID bigint
PARENT_ID bigint
NODE_TYPE tinyint
NODE_NAME nvarchar(250)
REC_MODIFIED_UTC datetime
REC_DELETION_BIT bit
Example file name (as stored in the node_name):
scriptname_SomeSerialNumber_201205240730.xml
As expected, the LIKE statement to find the files takes several minutes to scan the entire table, and I would like to improve this. There are no consistent patterns for the names, as each developer likes to create their own naming convention.
I tried using Full-Text Search and really love the idea, but I was not able to get it to find files based on keywords in the name. I believe the problem is due to the underscores.
Any suggestions on how I can get this to work? I am using a neutral language for the catalog.
@@VERSION:
Microsoft SQL Server 2005 - 9.00.4035.00 (Intel X86)
Nov 24 2008 13:01:59
Copyright (c) 1988-2005 Microsoft Corporation
Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)
Is there a way to alter the catalog and split the keywords out manually?
Thank you!
Full-text search is not the answer. It is used for words, not partial string matching. What you should do is, when inserting or updating data in this table, extract the parts of the filename that are relevant for future searching into their own column(s) which you can index. After all, they are separate pieces of data the way you are using them. You could also consider enforcing a more predictable naming convention instead of just letting the developers do whatever they want.
EDIT per user request:
Add a computed column that is REPLACE(filename, '_', ' '). Or, instead of a computed column, just a column you manually populate for existing data, with your insert procedure changed to keep it maintained going forward. Or even break those parts out into separate rows in a related table.
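For illustration, a minimal sketch of both variants; every column name except NODE_NAME is made up for the example:
-- Computed column with underscores replaced by spaces, persisted so it is
-- stored with the row and can be indexed or searched by whole words
ALTER TABLE FS_NODES
    ADD NODE_NAME_WORDS AS REPLACE(NODE_NAME, '_', ' ') PERSISTED;

-- Or a plain column holding just the extracted serial number, populated by
-- your insert/update procedure, which an ordinary index can then serve
ALTER TABLE FS_NODES
    ADD SERIAL_NUMBER nvarchar(50) NULL;
CREATE INDEX IX_FS_NODES_SERIAL_NUMBER ON FS_NODES (SERIAL_NUMBER);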
I have a text file (~100,000+ rows) where each column is a fixed length, and I need to get it into a SQL Server database table. Each one of our clients is required to get this data, but each text file is slightly different, so we have to manually go in and adjust the character spacing in a SQL stored procedure.
I was wondering if there is a way that we can use XML/XSD/XSLT instead. This way, I would not have to go in and manually edit the stored procedures.
What we do currently is this:
1.) A SQL Server stored procedure reads a text file from disk
2.) Each record is split into an XML element and dumped into a temporary table
3.) Using SQL Server's string manipulation, each element is parsed
4.) Each parsed column is dumped into the destination table
For clarification, here are a couple of examples...
One client's text file would have the following:
Name [12 Characters]
Employer [20 Characters]
Income [7 Characters]
Year-Qtr [5 Characters]
JIM JONES   HOMERS HOUSE OF HOSE100000 20113
Another client's text file would have the following:
Year-Qtr [5 Characters]
Income [7 Characters]
Name [12 Characters]
Employer [20 Characters]
20113100000 JIM JONES   HOMERS HOUSE OF HOSE
They basically all have the same fields (some may have a couple more or a couple fewer), just in different orders.
Using SQL Server xml processing functions to import a fixed length text file seems like a backwards way of doing things (no offense).
You don't need to build your own application; Microsoft has already built one for you. It's ingeniously called the BCP utility. If needed, you can create a format file that tells BCP how to import your data. The best part is that it's ridiculously fast, and you can import the data to SQL Server from a remote machine (as in, the file doesn't have to be located on the SQL Server box to import it).
To address the fact that you need to be able to change the column widths, I don't think editing the format file would be too bad.
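For illustration, a non-XML format file for the first client's layout might look roughly like this (format version 9.0 matches SQL Server 2005; the table, column names, and the CRLF row terminator are assumptions). For fixed-width fields, the prefix length is 0, the field width is the data length, and the field terminator is empty, with the row terminator on the last field:
9.0
4
1   SQLCHAR   0   12   ""       1   Name       SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0   20   ""       2   Employer   SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   7    ""       3   Income     ""
4   SQLCHAR   0   5    "\r\n"   4   YearQtr    ""
It would then be used with something like: bcp ImportDb.dbo.ClientData in client1.txt -f client1.fmt -S yourserver -T (database, table, and file names are placeholders). Keeping one small format file per client means the column-width changes stay out of the stored procedure.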
Ideally you would be able to use a delimited format instead of an ever-changing fixed length format, that would make things much easier. It might be quick and easy for you to import the data into excel and save it in a delimited format and then go from there.
Excel, Access, all the flavors of VB and C# have easy-to-use drivers for treating text files as virtual database tables, usually with visual aids for mapping the columns. And reading and writing to SQL Server is of course cake. I'd start there.
100K rows should not be a problem unless maybe you're doing it hourly for several clients.
I'd come across File Helpers a while back when I was looking for a CSV parser. The example I've linked to shows you how you can use basic POCOs decorated with attributes to represent the file you are trying to parse. Therefore you'd need a Customer specific POCO in order to parse their files.
I haven't tried this myself, but it could be worth a look.
I am trying to insert some data into a table whose structure is:
Column name Type Nulls
crs_no char(12) no
cat char(4) no
pr_cat char(1) yes
pr_sch char(1) yes
abstr text yes
The type of the last field reads 'text', but when trying to insert into this table, I get this error:
insert into crsabstr_rec values ("COMS110","UG09","Y","Y","CHEESE");
617: A blob data type must be supplied within this context.
Error in line 1
Near character position 66
So this field is apparently some sort of blob, but it won't take inserts (or updates). Normally these records are inserted through a GUI form, and C code handles the insertions. Is there a way around this?
There are no blob (BYTE or TEXT) literals in Informix Dynamic Server (IDS), nor for the CLOB or BLOB types in IDS 9.00 and later. It is an ongoing source of frustration to me; I've had the feature request in the system for years, but it never reaches the pain threshold internally that would get it fixed; other things are given a higher priority.
Nevertheless, it bites people all the time.
In IDS 7.3 (which you should aim to upgrade from; it goes out of service in September 2009 after a decade or so), you are pretty much stuck with using C to get the data into a TEXT field of the database. You have to use the approved C type 'loc_t' to store the information about the BYTE or TEXT data, and pass that to the server.
If you need examples in ESQL/C, look at the International Informix User Group web site, and especially the Software Repository. Amongst other things, you'll find the original SQLCMD program (Microsoft's program of the same name is a Johnny-Come-Lately) in source form. It also includes a set of programs that I dub 'vignettes'; they manipulate blobs in various ways, and are designed to show how to use 'loc_t' structures in various scenarios.
In iSQL:
LOAD FROM "desc.txt" INSERT INTO crsabstr_rec;
3 row(s) loaded.
desc.txt is a | (pipe) delimited text file, and the number of fields in the text file has to match the number of fields in the table.
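For example, a desc.txt matching the crsabstr_rec table from the question could look like this (Informix's LOAD/UNLOAD format puts a delimiter after every field, including the last; the second row is an invented illustration):
COMS110|UG09|Y|Y|CHEESE|
COMS111|UG09|N|Y|Another abstract, loaded straight into the TEXT column|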