Fixed Length Text File to SQL Data Table

I have a text file (~100,000+ rows), where each column is a fixed length, and I need to get it into a SQL Server database table. Each one of our clients is required to get this data, but each text file is slightly different, so we have to manually go in and adjust the character spacing in a SQL stored procedure.
I was wondering if there is a way that we can use XML/XSD/XSLT instead. This way, I would not have to go in and manually edit the stored procedures.
What we do currently is this:
1.) A SQL Server stored procedure reads a text file from disk
2.) Each record is split into an XML element and dumped into a temporary table
3.) Using SQL Server's string manipulation, each element is parsed
4.) Each column is dumped into the destination table
For clarification, here are a couple of examples...
One client's text file would have the following:
Name [12 Characters]
Employer [20 Characters]
Income [7 Characters]
Year-Qtr [5 Characters]
JIM JONES   HOMERS HOUSE OF HOSE100000 20113
Another client's text file would have the following:
Year-Qtr [5 Characters]
Income [7 Characters]
Name [12 Characters]
Employer [20 Characters]
20113100000 JIM JONES   HOMERS HOUSE OF HOSE
They basically all have the same fields (some may have a couple more or a couple fewer), just in different orders.
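To make the maintenance pain concrete, the kind of per-client parsing the stored procedure ends up doing might look roughly like this (the table and column names here are hypothetical); the hard-coded offsets are exactly what has to be re-spaced for every client:
-- Hypothetical staging and destination tables; offsets match the first client's layout.
INSERT INTO dbo.IncomeRecords (Name, Employer, Income, YearQtr)
SELECT
    RTRIM(SUBSTRING(RawLine,  1, 12)),        -- Name     [12 characters]
    RTRIM(SUBSTRING(RawLine, 13, 20)),        -- Employer [20 characters]
    CAST(SUBSTRING(RawLine, 33,  7) AS INT),  -- Income   [7 characters]
    CAST(SUBSTRING(RawLine, 40,  5) AS INT)   -- Year-Qtr [5 characters]
FROM dbo.RawImport;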

Using SQL Server's XML processing functions to import a fixed-length text file seems like a backwards way of doing things (no offense).
You don't need to build your own application; Microsoft has already built one for you. It's ingeniously called the BCP Utility. If needed, you can create a format file that tells bcp how to import your data. The best part is that it's ridiculously fast, and you can import the data into SQL Server from a remote machine (as in, the file doesn't have to be located on the SQL Server box to import it).
To address the fact that you need to be able to change the column widths, I don't think editing the format file would be too bad.
Ideally you would be able to use a delimited format instead of an ever-changing fixed-length format; that would make things much easier. It might be quick and easy for you to import the data into Excel, save it in a delimited format, and go from there.
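As a rough sketch (the table name, file path, and format file name are hypothetical), a non-XML format file listing this client's widths (12, 20, 7, 5) and target columns would be the only thing that changes per client. T-SQL's BULK INSERT accepts the same format files as bcp, if you would rather stay inside a stored procedure:
-- Loads the fixed-width file using the column widths described in client_a.fmt.
BULK INSERT dbo.IncomeRecords
FROM 'C:\imports\client_a.txt'
WITH (FORMATFILE = 'C:\imports\client_a.fmt');
From the command line, bcp's -f switch points at the same format file.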

Excel, Access, all the flavors of VB and C# have easy-to-use drivers for treating text files as virtual database tables, usually with visual aids for mapping the columns. And reading and writing to SQL Server is of course cake. I'd start there.
100K rows should not be a problem unless maybe you're doing it hourly for several clients.

I'd come across FileHelpers a while back when I was looking for a CSV parser. The example I've linked to shows how you can use basic POCOs decorated with attributes to represent the file you are trying to parse. You'd therefore need a customer-specific POCO in order to parse their files.
I haven't tried this myself, but it could be worth a look.

Related

Firebird External Tables

I am trying to find a way to quickly load a lot of data into a database, and someone suggested using Firebird external tables. I would like to know more about this method; I've tried searching online but I'm not finding much useful information. How do they really work? Do the tables have to be exactly the same? And what if you are loading data from more than one database?
Use external tables like this:
CREATE TABLE ext1 EXTERNAL 'c:\myfile.txt'
(
field1 char(20),
field2 smallint
);
To do quick import into regular table, do something like this:
INSERT INTO realtable1 (field1, field2)
SELECT field1, field2 FROM ext1;
Remember to disable triggers and indexes (if possible) before loading, and reactivate them after.
This information is from Firebird FAQ: http://www.firebirdfaq.org/faq209/
Here's more information about using external tables, including information about file format: http://www.delphiman.de/Bin/UsingExternalFilesAsTables.pdf
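To make the advice above about triggers and indexes concrete, here is a minimal sketch (the trigger and index names are made up):
-- Deactivate before the bulk load
ALTER TRIGGER trg_realtable1_bi INACTIVE;
ALTER INDEX idx_realtable1_field1 INACTIVE;

-- ... INSERT INTO realtable1 ... SELECT ... FROM ext1; ...

-- Reactivate afterwards (ACTIVE rebuilds the index)
ALTER TRIGGER trg_realtable1_bi ACTIVE;
ALTER INDEX idx_realtable1_field1 ACTIVE;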
Using an external file as a table is a great way to get lots of data into Firebird quickly. However, the sample, which is from the Firebird FAQ, seems to me to be either unnecessarily complex or incorrect, because of the use of SMALLINT in the table definition. As the FB 2.5 documentation points out, "for most purposes, only columns of CHAR types would be useful."
The external file must be a text file of fixed-length records (so a .csv file won't work). The external table definition should then use CHAR fields with sizes that match the lengths of the fields in each record.
Any variation in the length of the records in the text file will lead to misery (from bitter experience). I suppose the example might work if all of the SMALLINT values had the same number of digits, but more generally, things will go more smoothly if other formats (date, numeric) are simply expressed as CHAR in the text file, padded with spaces.
For example, if the raw data looked like this:
Canada 37855702
Central African Republic 4829764
Chad 16425859
Chile 19116209
China 1404676330
Then the text file should look like this:
Canada                     37855702
Central African Republic    4829764
Chad                       16425859
Chile                      19116209
China                    1404676330
Countries are right-padded to twenty-five characters and the (big) integers are left-padded to 10 characters, so the records are 35 characters, plus one for a line feed (*nix) or two for Windows' CRLF. (Note that things get more complicated if the file uses a Unicode encoding.)
The table def would look like this:
CREATE TABLE ext_test EXTERNAL '/home/dave/fbtest.txt'
(
COUNTRY CHAR(25),
POPULATION CHAR(10),
LF CHAR(1)
);
Make sure that the file resides on the same file system as the FB server process, that the server process has rights to the file (maybe through a FB group) and that the ExternalFileAccess parameter in firebird.conf is set appropriately - see the 2.5 documentation for details.
There are some limited things you can do with an external table, but it's most useful as a temporary transfer table, as a source for the ultimate FB table. INSERT each row from the external table into the ultimate target, casting the CHAR fields to the appropriate data types. For data of any real volume, the process runs much faster than, say, some Python code to read and feed each line individually.
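Continuing the country/population example, the transfer step might look something like this (country_population is a hypothetical target table with properly typed columns):
-- Cast the fixed-width CHAR fields into real data types on the way in.
INSERT INTO country_population (country, population)
SELECT TRIM(COUNTRY),
       CAST(POPULATION AS INTEGER)
FROM ext_test;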
If you are using an older version of FB, don't forget to DROP the external table when you're done with it to free up file locks, as outlined in the FAQs. Newer versions do this automatically. There's lots more on external tables in the 2.5 documentation at the above link.
PS - I have emailed the above to the Firebird documentation team.

Text was truncated or one or more characters had no match in the target code page including the primary key in an unpivot

I'm trying to import a flat file into an OLE DB target SQL Server database.
(Screenshots of the problem field and of its flat file connection properties are omitted.)
Here's the error message:
[Source - 18942979103_txt [424]] Error: Data conversion failed. The data conversion for column "recipient-name" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
What am I doing wrong?
Here is what fixed the problem for me; I did not have to convert to Excel. I just modified the DataType to "text stream" when choosing the data source (Figure 1). You can also check the "Edit Mappings" dialog to verify the change to the size (Figure 2).
(Figure 1 and Figure 2: screenshots omitted.)
After failing by increasing the length or even changing to data type text, I solved this by creating an XLSX file and importing. It accurately detected the data type instead of setting all columns as varchar(50). Turns out nvarchar(255) for that column would have done it too.
I solved this problem by ordering my source data (xls, csv, whatever) so that the longest text values are at the top of the file. Excel is great: use the LEN() function on your challenging column, order by that length value with the longest value at the top of your dataset, save, and try the import again.
SQL Server may be able to suggest the right data type for you (even when it does not choose the right type by default) - clicking the "Suggest Types" button (shown in your screenshot above) allows you to have SQL Server scan the source and suggest a data type for the field that's throwing an error. In my case, choosing to scan 20000 rows to generate the suggestions, and using the resulting suggested data type, fixed the issue.
While an approach proposed above (#chookoos' answer here: convert to an Excel workbook and import) resolves those kinds of issues, this solution in another Q&A is excellent because you can stay with your csv, tsv, or txt file and perform the necessary fine-tuning without creating a Microsoft-product-related solution.
I've resolved it by checking the 'UNICODE' checkbox in the data source settings.
You need to increase the column length for the particular column while importing the data.
Choose a data source >> Advanced >> increase the column width from the default 50 to 200 or more.
Not really a technical solution, but SQL Server 2017's flat file import is totally revamped; it imported my large-ish file in five clicks and handled encoding and field length issues without any input from me.
SQL Server Management Studio's data import looks at the first few rows to determine the source data specs.
Shift your records around so that the longest text is at the top.
None of the above worked for me. I solved my problem by saving my source data (Save As) as a single-worksheet Excel 5.0/95 .xls file and importing it without column headings. Also, I created the table in advance and mapped the columns manually instead of letting SQL Server create the table.
I had a similar problem against two different databases (DB2 and SQL Server); I finally solved it by using CAST in the source query from DB2. I also took advantage of the query to adapt the source column to VARCHAR and strip the useless blank spaces:
CAST(RTRIM(LTRIM(COLUMN_NAME)) AS VARCHAR(60) CCSID UNICODE
FOR SBCS DATA) COLUMN_NAME
The important issue here is the CCSID conversion.
It's usually because the column in the connection manager may still be set to 50 characters. I resolved the problem by going to Connection Manager --> Advanced and changing the width to 100, or maybe 1000 if the data is big enough.

How to replace an extremely high occurrence of the same character quickly in a CLOB field (Oracle 10g)?

Due to a bug in one of our applications, a certain character was duplicated 2^n times in many CLOB fields, where n is anywhere between 1 and 24. For the sake of simplicity, let's say the character would be X. It is safe to assume that any adjacent occurrence of two or more of these characters identifies broken data.
We've thought of running over every CLOB field in the database and replacing the value where necessary. We quickly found out that you can easily replace the value by using REGEXP_REPLACE, e.g. like this (might contain syntax errors, I typed this from memory):
SELECT REGEXP_REPLACE( clob_value, 'XX*', 'X' )
FROM someTable
WHERE clob_value LIKE '%XX%';
However, even when changing the WHERE part to WHERE primary_key = 1234, for a data set which contains around four million characters in two locations within its CLOB field, this query takes more than fifteen minutes to execute (we aborted the attempt after that time, not sure how long it would actually take).
As a comparison, reading the same value into a C# application, fixing it there using a similar regular expression approach, and writing it back into the database only takes 3 seconds.
We could write such a C# application and execute it, but due to security restrictions it would be a lot easier to send our customer a database script which they could execute themselves.
Is there any way to do a replacement like this much faster on an Oracle 10g (10.2.0.3) database?
Note: There are two configurations, one running the database on a Windows 2003 Server with Windows XP clients, and another running both the database and the client on a standalone Windows XP notebook. Both configurations are affected.
How does your client access the Oracle server? If it is via a Unix environment (which is most likely the case), then maybe you can write a shell script to extract the value from the database, fix it using sed, and write it back to the database. Replacing it in Unix should be really quick.
Maybe you are facing a problem with LOB segment space fragmentation. In fact, each of your LOBs will be shorter than before. Try to create a new table and copy the modified CLOBs into this new table.
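A minimal sketch of that idea (the new table name is made up, and any other columns would need to be carried across too); note that it still runs the same REGEXP_REPLACE, so it addresses the LOB fragmentation point rather than raw replacement speed:
-- Write the cleaned CLOBs into a fresh table so the new LOB segment is compact.
CREATE TABLE someTable_fixed AS
SELECT primary_key,
       REGEXP_REPLACE( clob_value, 'XX+', 'X' ) AS clob_value
FROM someTable;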
As we didn't find any way to make it faster on the database, we delivered the C# tool within an executable patch.

SSIS flat file with string larger than 50

SSIS by default makes the data type String with length 50. What if the string in a certain column is longer than 50? Also, I can't use Suggest Types (it sucks!).
Is there a way to fix this, rather than manually increasing the sizes (i.e., manually editing the column lengths/data types in the flat file connection manager's Advanced tab), ideally by setting the data types based on the destination (SQL Server) columns' data types?
You can set data types in the flat file connection manager, in the Advanced section.
I've heard good things about BIDS Helper, but haven't used it myself.
I haven't found a way to change the default length, or to stop it from resetting when changing the connection manager. I was pleased to find that you can select all columns at once in the advanced editor and change them simultaneously; that's something...
The best way I could do this was to write C# code that modifies the SSIS package XML file and increases the string length values by looking at the column lengths of the destination table (using an INFORMATION_SCHEMA query).
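The destination-length lookup that code relies on can be as simple as this (the table name is hypothetical):
-- Column widths of the destination table, used to patch the string lengths in the package XML.
SELECT COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'MyDestinationTable'
ORDER BY ORDINAL_POSITION;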

Outputting long fixed-width records to a flat file using SSIS

I have to build an SSIS package for work that takes the contents of a table, all columns, and outputs it to a flat file. The problem is, one of the columns is a varchar(5100) and that puts the total row size at about 5200 characters. It seems the flat file connection manager editor won't let me define a fixed-width row beyond 483 characters.
I've tried going at this from several directions. The ragged right option doesn't appear to work for me, as the columns themselves don't have delimiters in them (no CR/LF for instance). They truly are fixed width. But I can't figure out how to tell the flat file connection manager to go past 483 characters. Does anyone know what I can do here?
Thanks!
Personally I would use a delimited text file as my destination. SSIS will put in the column delimiter and the record delimiters for you. These are simpler to import as well, which the people you are sending it to should appreciate. We use | as the delimiter for most of our exports and CR/LF as the record delimiter.