"String or binary data would be truncated" - how to find the field (SQL)

I understand the error and how to fix it; I am just interested in finding the field to fix.
Let me start from the top. I am running a scheduled task daily which executes a process that, at certain points, runs some sprocs in SQL which run insert statements. Unfortunately, after checking my logs I am getting the error in question, and therefore my sprocs aren't working. I could update every field to a bigger length and this would probably fix it, but I'd rather not. Is there any way of knowing (without manually checking, as there are many fields and thousands of rows) which field contains the value that is too big for the field it is being inserted into?

Import the data into a new table using VARCHAR(MAX) as the datatype for the columns. Then you can use DATALENGTH to get the maximum size of each column.
SELECT MAX(DATALENGTH(col1)) AS col1, MAX(DATALENGTH(col2)) AS col2, etc.
FROM newTable
This will tell you which column(s) contain values that exceed the size of the corresponding column(s) in your target table.
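To compare those maximums against the sizes actually defined on the target table, here is a minimal T-SQL sketch, assuming the target is dbo.targetTable (a hypothetical name):
SELECT c.name, c.max_length   -- max_length is in bytes (nvarchar uses 2 bytes per character)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.targetTable')
ORDER BY c.column_id;
Any column whose MAX(DATALENGTH(...)) from the query above exceeds its defined max_length is a candidate for the truncation error.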

"Numeric value '' is not recognized" - what column?

I am trying to insert data from a staging table into the master table. The table has nearly 300 columns and is a mix of data types: Varchars, Integers, Decimals, Dates, etc.
Snowflake gives the unhelpful error message of "Numeric value '' is not recognized".
I have gone through and cut out various parts of the query to try and isolate where it is coming from. After several hours and cutting every column, it is still happening.
Does anyone know of a Snowflake diagnostic query (like Redshift has) which can tell me a specific column where the issue is occurring?
Unfortunately not at the point you're at. If you went back to the COPY INTO that loaded the data, you'd be able to use the VALIDATE() function to get better information, down to the record and byte-offset level.
I would query your staging table for just the numeric fields and look for blanks, or you can wrap all of the fields destined for numeric columns in try_to_number() functions. A bit tedious, but it might not be too bad if you don't have a lot of numbers.
https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html
As a note, when you stage, you should try to use the NULL_IF options to get rid of bad characters and/or try to load the data into the stage table using the actual datatypes, so you can leverage the VALIDATE() function to make sure the data types are correct before loading into Snowflake.
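For reference, a minimal sketch of checking the most recent COPY INTO for a table with VALIDATE, assuming the staging table is MYSTAGE_TABLE (a hypothetical name):
SELECT * FROM TABLE(VALIDATE(MYSTAGE_TABLE, JOB_ID => '_last'));
This returns the errors encountered during that load, including the file, row and byte offset of each problem.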
Query your staging table using try_to_number() and/or try_to_decimal() for the number and decimal fields of the table, and then use MINUS to get the difference:
Select $1, $2, ... $300 from #stage
minus
Select $1, try_to_number($2), ... $300 from #stage
If any number field contains a string that cannot be converted, it will become NULL, and the MINUS will then return the rows which have a problem. Once you have those rows, analyze the columns in the result set for errors.
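Alternatively, a minimal sketch that flags the offending rows directly, assuming a staging table STG with hypothetical numeric-destined columns AMOUNT and QTY:
-- TRY_TO_NUMBER returns NULL when the text (including the empty string '') cannot be converted
SELECT *
FROM STG
WHERE (AMOUNT IS NOT NULL AND TRY_TO_NUMBER(AMOUNT) IS NULL)
   OR (QTY IS NOT NULL AND TRY_TO_NUMBER(QTY) IS NULL);
Repeating the pattern for each numeric-destined column narrows the error down to both the row and the column.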

Split multiple points in text format and switch coordinates in postgres column

I have a PostgreSQL column of type text that contains data like shown below
(32.85563, -117.25624)(32.855470000000004, -117.25648000000001)(32.85567, -117.25710000000001)(32.85544, -117.2556)
(37.75363, -121.44142000000001)(37.75292, -121.4414)
I want to convert this into another column of type text like shown below
(-117.25624, 32.85563)(-117.25648000000001, 32.855470000000004)(-117.25710000000001, 32.85567)(-117.2556, 32.85544)
(-121.44142000000001, 37.75363)(-121.4414, 37.75292)
As you can see, the values inside the parentheses have been switched around. Also note that I have shown two records here to indicate that not all fields have the same number of parenthesized pairs.
What I've tried
I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I run out of memory. I also cannot do this in batches due to time constraints.
What I want
A SQL query or a sequence of SQL queries that will achieve the result that I have mentioned above.
I am using PostgreSQL 9.4 with pgAdmin III as the client.
This is the type of problem that would usually not be solved in SQL, but you are lucky to be using Postgres.
I suggest the following steps in defining your algorithm.
The first part will be turning your strings into structured data; the second will transform the structured data back into a string in the format that you require.
From string to data
First, you need to turn your bracketed values into an array, which can be done with string_to_array function.
Now you can turn this array into rows with unnest function, which will return a row per bracketed value.
Finally you need to split the values in each row into two fields.
From data to string
You need to group the results of the first query, wrapping them in the string_agg function, which will combine the values from the rows back into a single string.
You will need to experiment with brackets to achieve exactly what you want.
PS: I am not providing a query here. Once you have some code that you have tried, let me know.
Assuming you also have a PK or some unique column, and possibly other columns, you can do as follows:
SELECT id, (...), string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
SELECT id, (...), unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
FROM my_table) sub
GROUP BY id; -- assuming id is PK or no other columns
PostgreSQL has the point type which you can use here. First you need to make sure you can properly divide the long string into individual points (insert ';' between the parentheses), then turn that into an array of individual points in text format, unnest the array into individual rows, and finally cast those rows to the point data type:
unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
You can then create a new point from the point you just created, but with the coordinates reversed, turn that into a string and aggregate into your desired output:
string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
But you might also move away from the text format and make an array of point values as that will be easier and faster to work with:
array_agg(point(pt[1], pt[0])) AS pt_reversed
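Putting those pieces together into an UPDATE that fills a new text column, here is a minimal sketch assuming the table is my_table(id, col) and a text column col_reversed has already been added (hypothetical names); WITH ORDINALITY (available in 9.4) keeps the points in their original order inside the aggregate:
UPDATE my_table AS m
SET col_reversed = sub.agg
FROM (
  SELECT t.id,
         string_agg(point(pt[1], pt[0])::text, '' ORDER BY ord) AS agg
  FROM (
    SELECT mt.id, u.pt::point AS pt, u.ord AS ord
    FROM my_table AS mt,
         unnest(string_to_array(replace(mt.col, ')(', ');('), ';'))
           WITH ORDINALITY AS u(pt, ord)
  ) AS t
  GROUP BY t.id
) AS sub
WHERE m.id = sub.id;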
As I put in the question, I tried extracting the column to Java and performing my operations there. But due to the sheer number of records I have, I ran out of memory. I also could not do this in batches due to time constraints.
I ran out of memory because I was putting everything into a HashMap of
<my_primary_key, the_newly_formatted_text>. As the text was sometimes very long, and given the sheer number of records I had, it wasn't surprising that I got an OOM.
Solution that I used:
As suggested by many folks here, this problem was better solved with code. I wrote a small script that formatted the text to my liking and wrote the primary key and the newly formatted text to a file in TSV format. Then I imported the TSV into a new table and updated the original table from the new one.

SqlBulkCopy exception, find the colid

Using SqlBulkCopy and getting this exception:
Received an invalid column length from the bcp client for colid 30.
I've been banging my head against this one for hours. I know which row is having the issue, but I don't know which column "colid 30" is. There are 178 columns in the data table. All values seem to be correct, and I don't see any that are longer than the corresponding column data types in my database.
This database holds property listings and currently has over 3 million records, all of which are just fine.
Is there a way to pinpoint what colid 30 is? Or is there a way to view the actual SQL that the bcp is submitting to the database?
I hope this helps solve someone else's issues as well.
The error was because one of the string/varchar fields in the datatable had a semicolon ";" in its value. Apparently you need to manually escape these before doing the insert!
I looped through all rows/columns and replaced the value in each string field (note that string.Replace returns a new string, which must be assigned back):
value = value.Replace(";", "CHAR(59)");
After that, everything inserted smoothly.
Check the size of the columns in the table you are doing the bulk insert into. The varchar or other string columns might need to be extended.
E.g., increase the size from 30 to 50:
ALTER TABLE [dbo].[TableName]
ALTER COLUMN [ColumnName] Varchar(50)
In my case, the error message reported colid 53, but the actual issue was in column 54. So you should also compare the size of the very next column against the target table. I hope it helps.
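To map a colid back to a column name, one option (a sketch, assuming the destination table is dbo.TableName, a hypothetical name) is to list the destination columns in ordinal order; the colid in the error is 1-based and generally follows the order of the destination columns / column mappings, with the possible off-by-one noted above:
SELECT column_id, name, max_length
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.TableName')
ORDER BY column_id;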

Why am I getting a "[SQL0802] Data conversion of data mapping error" exception?

I am not very familiar with iSeries/DB2. However, I work on a website that uses it as its primary database.
A new column was recently added to an existing table. When I view it via AS400, I see the following data type:
Type: S
Length: 9
Dec: 2
This tells me it's a numeric field with 7 digits before the decimal point and 2 digits after the decimal point.
When I query the data with a simple SELECT (SELECT MYCOL FROM MYTABLE), I get back all the records without a problem. However, when I try using a DISTINCT, GROUP BY, or ORDER BY on that same column I get the following exception:
[SQL0802] Data conversion of data mapping error
I've deduced that at least one record has invalid data - what my DBA calls "blanks" or "4 O". How is this possible though? Shouldn't the database throw an exception when invalid data is attempted to be added to that column?
Is there any way I can get around this, such as filtering out those bad records in my query?
"4 O" means 0x40 which is the EBCDIC code for a space or blank character and is the default value placed into any new space in a record.
Legacy programs / operations can introduce the decimal data error. For example if the new file was created and filled using the CPYF command with the FMTOPT(*NOCHK) option.
The easiest way to fix it is to write an HLL program (RPG) to read the file and correct the records.
The only solution I could find was to write a script that checks for blank values in the column and then updates them to zero when they are found.
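For reference, a sketch of that kind of cleanup in DB2 for i SQL, assuming MYCOL is the zoned decimal (9,2) column described above, so that all-blank storage shows up as nine 0x40 bytes (adjust the hex string to the column's actual length):
-- HEX() exposes the raw bytes, so the comparison avoids converting the bad value
UPDATE MYTABLE
SET MYCOL = 0
WHERE HEX(MYCOL) = '404040404040404040';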
If the file has record format level checking turned off [i.e. LVLCHK(*NO)] or is overridden to that, then an HLL program (e.g. RPG, COBOL, etc.) that was not recompiled with the new record format might write out records with invalid data in this column, especially if the new column is not at the end of the record.
Make sure that all programs that use native I/O to write or update records on this file are recompiled.
I was able to solve this error by force-casting the key columns to integer. I changed the join from this...
FROM DAILYV INNER JOIN BXV ON DAILYV.DAITEM=BXV.BXPACK
...to this...
FROM DAILYV INNER JOIN BXV ON CAST(DAILYV.DAITEM AS INT)=CAST(BXV.BXPACK AS INT)
...and I didn't have to make any corrections to the tables. This is a very old, very messy database with lots of junk in it. I've made many corrections, but it's a work in progress.

T-SQL: How to log erroneous entries during import

I do an import and convert columns to other datatypes. During the import I check whether I can convert the values stored in the columns into the target datatypes. If not, I put in a default value; e.g. when converting a string to int and the string cannot be converted, I insert -1.
At this point it would be great if I could log the erroneous entries into a separate table: e.g. instead of a parsable string like '1234', 'xze' arrives, so I put -1 into the target table and 'xze' into the log table.
Is this possible with (T-)SQL?
Cheers,
Andreas
This is also very easy to do with SSIS data flow tasks. Any of the data conversion or lookup steps you might try have a "success" output and an "error" output. You simply direct all the error outputs to a "union" transform, and from there into a common error table. The result of all the successes go into your "success" table. You can even add extra details into the error table, to give you clear error messages.
The nice thing about this is that you still get high performance, as entire buffers move through the system. You'll eventually have buffers full of valid data being bulk written to the success table, and small buffers full of errors being written to the error table. When errors happen on a row, that row will simply be moved from one buffer of rows into another.
If you have a staging table, you can filter the good stuff into the final table and do a join back (NOT EXISTS) to find the rubbish for the log table; a variant of that idea is sketched below.
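A minimal T-SQL sketch of the staging-table idea, using TRY_CONVERT (SQL Server 2012+) instead of a join back, and assuming hypothetical tables dbo.Staging(Id, RawValue), dbo.Target(Id, IntValue) and dbo.ImportErrors(Id, RawValue):
-- Convertible values go to the target table; failures fall back to -1
INSERT INTO dbo.Target (Id, IntValue)
SELECT s.Id, COALESCE(TRY_CONVERT(int, s.RawValue), -1)
FROM dbo.Staging AS s;

-- Values that could not be converted are logged separately
INSERT INTO dbo.ImportErrors (Id, RawValue)
SELECT s.Id, s.RawValue
FROM dbo.Staging AS s
WHERE s.RawValue IS NOT NULL
  AND TRY_CONVERT(int, s.RawValue) IS NULL;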