Querying text file with SQL converts large numbers to NULL

I am importing data from a text file and have hit a snag. I have a numeric field which occasionally has very large values (10 billion+), and some of these values are being converted to NULLs.
Upon further testing I have isolated the problem as follows - the first 25 rows of data are used to determine the field size, and if none of the first 25 values are large then it throws out any value >= 2,147,483,648 (2^31) which comes after.
I'm using ADO and the following connection string:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FILE_ADDRESS;Extended Properties="text;HDR=YES;FMT=Delimited"
Therefore, can anyone suggest how I can get around this problem without having to sort the source data descending on the large-value column? Is there some way I could define the data types of the recordset prior to importing, rather than letting it decide for itself?
Many thanks!

You can use an INI file (Schema.ini) placed in the directory you are connecting to, which describes the column types.
See here for details:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx
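A minimal Schema.ini sketch, assuming a hypothetical file called sales.csv with made-up column names; the file must be named Schema.ini, sit in the same folder as the text file, and have a section header matching the file name. Declaring the large column as Currency or Double (rather than letting the driver guess Long from the first rows) should keep values above 2^31 from being dropped:

[sales.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=ID Long
Col2=BigAmount Currency
Col3=Description Text

Once the types are declared per column, the driver no longer has to sample the leading rows to guess them, so the 25-row scan stops mattering for that field.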

Related

HANA: Unknown Characters in Database column of datatype BLOB

I need help on how to resolve characters of unknown type from a database field into a readable format, because I need to overwrite this value on database level with another valid value (in the exact format the application stores it in) to automate system copy activities.
I have a proprietary application that also allows users to configure it via the frontend. This configuration data gets stored in a table, and the values of a configuration property are stored in a column of type "BLOB". For the desired value, I provide a valid URL in the application frontend (like http://myserver:8080). However, what gets stored in the database is not readable (some square characters). I tried all sorts of HANA conversion functions (HEX, binary), both simple and cascaded (e.g. first to binary, then to varchar), to make it readable. I also tried it the other way around and tried to make the value that I want to insert appear in the correct format (conversion to BLOB over hex or binary), but this does not work either. I copied the value to the clipboard and compared it to all sorts of character set tables (although I am not sure if this can work at all).
My conversion tries look somewhat like this:
SELECT TO_ALPHANUM('') FROM DUMMY;
where the quotes would contain the characters in question. I can't even print them here.
How can one approach this and maybe find out the character set that is used by this application? I would be grateful for some more ideas.
What you have in your BLOB column is a series of bytes. As you mentioned, these bytes have been written by an application that uses an unknown character set.
In order to interpret those bytes correctly, you need to know the character set as this is literally the mapping of bytes to characters or character identifiers (e.g. code points in UTF).
Now, HANA doesn't come with a whole lot of options to work on LOB data in the first place and for C(haracter)LOB data most manipulations implicitly perform a conversion to a string data type.
So, what I would recommend is to write a custom application that is able to read out the BLOB bytes and perform the conversion in that custom app. Once successfully converted into a string, you can store the data in a new NCLOB field that keeps it in UTF-8 encoding.
You will have to know the character set in the first place, though. No way around that.
I assume you are on Oracle. You can convert BLOB to CLOB as described here.
http://www.dba-oracle.com/t_convert_blob_to_clob_script.htm
In case of your example try this query:
select UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(<your_blob_value>)) from dual;
Obviously this only works for values below 32767 characters.
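For instance, a hedged variant with DBMS_LOB.SUBSTR's amount and offset made explicit (the table and column names here are just placeholders), reading the first 2000 bytes of the LOB:
select UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(your_blob_column, 2000, 1)) from your_table;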

SSIS convert exponent number to real (DT_R4)

I have a flat CSV file and some fields contain values like "1.8e-5, 8.139717345049093e-39" (numbers in exponent/scientific notation). I need to store these values in a SQL real data type field (not float), but the smallest exponent supported by real is roughly e-38.
I need a mechanism to convert this string field to a real number through SSIS. Basically, values of e-39 or smaller should be replaced with 0, and the rest should be stored properly.
I tried setting the data type to DT_R4 in the flat file connection's field mapping, and that didn't help. I tried casting it to DT_R4 through a derived column, and that didn't help either. When I check through the Data Viewer, the value still has the unsupported exponent and it fails when I insert it into the SQL table.
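Not an SSIS-native fix, but if you can land the column in a staging table as a string first, the clamping logic itself can be sketched in T-SQL (the table and column names below are made up, and TRY_CAST needs SQL Server 2012+): anything smaller in magnitude than real's minimum of roughly 1.18E-38 goes to 0, and the rest is cast normally.
SELECT CASE
         WHEN ABS(TRY_CAST(raw_value AS float)) < 1.18E-38 THEN CAST(0 AS real)
         ELSE CAST(TRY_CAST(raw_value AS float) AS real)
       END AS real_value
FROM dbo.StagingTable;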

Import PostgreSQL dump into SQL Server - data type errors

I have some data which was dumped from a PostgreSQL database (allegedly using pg_dump) and needs to be imported into SQL Server.
While the data types are ok, I am running into an issue where there seems to be a placeholder for a NULL: I see a backslash followed by an uppercase N in many fields. Below is a snippet of the data, as viewed from within Excel. The left column has a Boolean data type, and the right one has an integer data type.
Some of these are supposed to be of the Boolean datatype, and having two characters in there is most certainly not going to fly.
Here's what I tried so far:
Importing via a dirty read - keeping whatever data types SSIS decided each field had - to no avail. There were error messages about truncation on all of the Boolean fields.
Creating a table for the data based on the correct data types, though this was more fun... I needed to do the same as in the dirty read, as the source would otherwise not load properly. There was also a need to transform the data into the correct data type for insertion into the destination; yet I am getting truncation issues when there most certainly shouldn't be any.
Here is a sample expression in my derived column transformation editor:
(DT_BOOL)REPLACE(observation,"\\N","")
The data type should be Boolean.
Any suggestion would be really helpful!
Thanks!
Since I was unable to circumvent the SSIS rules in order to get my data into my tables without an error, I took the quick-and-dirty approach.
The solution which worked for me was to have the source read each column as if it were a string, and to have all fields in the destination table be of the data type VARCHAR. This destination table is used as a staging table; once the data is in SQL Server, I can manipulate it as needed.
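As a sketch of the follow-up step out of that staging table (the table name and the second column name are made up, and the exact Boolean mapping depends on how the dump writes true/false), the '\N' placeholders can be turned into real NULLs on the way into the typed table:
INSERT INTO dbo.FinalTable (observation, reading_count)
SELECT TRY_CAST(NULLIF(observation, '\N') AS bit),
       TRY_CAST(NULLIF(reading_count, '\N') AS int)
FROM dbo.Staging;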
Thank you #cha for your input.

Text was truncated or one or more characters had no match in the target code page including the primary key in an unpivot

I'm trying to import a flat file into an OLE DB target SQL Server database.
Here's the field that's giving me trouble:
Here are the properties of that flat file connection, specifically the field:
Here's the error message:
[Source - 18942979103_txt [424]] Error: Data conversion failed. The data conversion for column "recipient-name" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
What am I doing wrong?
Here is what fixed the problem for me. I did not have to convert to Excel. I just modified the DataType to "text stream" when choosing the data source (Figure 1). You can also check the "Edit Mappings" dialog to verify the change to the size (Figure 2).
Figure 1
Figure 2
After failing by increasing the length or even changing to data type text, I solved this by creating an XLSX file and importing. It accurately detected the data type instead of setting all columns as varchar(50). Turns out nvarchar(255) for that column would have done it too.
I solved this problem by ORDERING my source data (xls, csv, whatever) such that the longest text values are at the top of the file. Excel is great: use the LEN() function on your challenging column, order by that length value with the longest value at the top of your dataset, save, and try the import again.
SQL Server may be able to suggest the right data type for you (even when it does not choose the right type by default) - clicking the "Suggest Types" button (shown in your screenshot above) allows you to have SQL Server scan the source and suggest a data type for the field that's throwing an error. In my case, choosing to scan 20000 rows to generate the suggestions, and using the resulting suggested data type, fixed the issue.
While an approach proposed above (#chookoos, here in this Q&A: convert to an Excel workbook and import) resolves those kinds of issues, this solution in another Q&A is excellent because you can stay with your csv, tsv or txt file and perform the necessary fine-tuning without creating a Microsoft-product-related solution.
I've resolved it by checking the 'Unicode' checkbox.
You need to increase the column length for the particular column while importing the data.
Choose the data source >> Advanced >> increase the column length from the default 50 to 200 or more.
Not really a technical solution, but the SQL Server 2017 flat file import is totally revamped; it imported my large-ish file in 5 clicks and handled encoding / field length issues without any input from me.
SQL Management Studio data import looks at the first few rows to determine the source data specs.
Shift your records around so that the longest text is at the top.
None of the above worked for me. I solved my problem by saving my source data (Save As) as a single-worksheet Excel 5.0/95 xls file and importing it without column headings. Also, I created the table in advance and mapped manually instead of letting SQL create the table.
I had a similar problem against 2 different databases (DB2 and SQL); finally I solved it by using CAST in the source query from DB2. I also took advantage of the query to adapt the source column to varchar and avoid the useless blank spaces:
CAST(RTRIM(LTRIM(COLUMN_NAME)) AS VARCHAR(60) CCSID UNICODE
FOR SBCS DATA) COLUMN_NAME
The important issue here is the CCSID conversion.
It is usually because, in the connection manager, the column may still be 50 characters; I resolved the problem by going to Connection Manager --> Advanced and changing the length to 100, or maybe 1000 if the data is big enough.

Why am I getting a "[SQL0802] Data conversion of data mapping error" exception?

I am not very familiar with iSeries/DB2. However, I work on a website that uses it as its primary database.
A new column was recently added to an existing table. When I view it via AS400, I see the following data type:
Type: S
Length: 9
Dec: 2
This tells me it's a numeric field with 7 digits before the decimal point and 2 digits after the decimal point.
When I query the data with a simple SELECT (SELECT MYCOL FROM MYTABLE), I get back all the records without a problem. However, when I try using a DISTINCT, GROUP BY, or ORDER BY on that same column I get the following exception:
[SQL0802] Data conversion of data mapping error
I've deduced that at least one record has invalid data - what my DBA calls "blanks" or "4 O". How is this possible though? Shouldn't the database throw an exception when invalid data is added to that column?
Is there any way I can get around this, such as filtering out those bad records in my query?
"4 O" means 0x40 which is the EBCDIC code for a space or blank character and is the default value placed into any new space in a record.
Legacy programs / operations can introduce the decimal data error. For example if the new file was created and filled using the CPYF command with the FMTOPT(*NOCHK) option.
The easiest way to fix it is to write an HLL program (RPG) to read the file and correct the records.
The only solution I could find was to write a script that checks for blank values in the column and then updates them to zero when they are found.
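A hedged sketch of that kind of cleanup, assuming MYCOL is the 9,2 zoned column from the question and that HEX() can look at the raw bytes without tripping the mapping error (nine EBCDIC blanks are 0x40 each):
UPDATE MYTABLE
   SET MYCOL = 0
 WHERE HEX(MYCOL) = '404040404040404040';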
If the file has record format level checking turned off [i.e. LVLCHK(*NO)], or is overridden to that, then an HLL program (e.g. RPG, COBOL, etc.) that was not recompiled with the new record format might write out records with invalid data in this column, especially if the new column is not at the end of the record.
Make sure that all programs that use native I/O to write or update records on this file are recompiled.
I was able to solve this error by force-casting the key columns to integer. I changed the join from this...
FROM DAILYV INNER JOIN BXV ON DAILYV.DAITEM=BXV.BXPACK
...to this...
FROM DAILYV INNER JOIN BXV ON CAST(DAILYV.DAITEM AS INT)=CAST(BXV.BXPACK AS INT)
...and I didn't have to make any corrections to the tables. This is a very old, very messy database with lots of junk in it. I've made many corrections, but it's a work in progress.