I am trying to insert data from a staging table into the master table. The table has nearly 300 columns with a mix of data types: VARCHARs, integers, decimals, dates, etc.
Snowflake gives the unhelpful error message "Numeric value '' is not recognized".
I have gone through and cut out various parts of the query to try and isolate where it is coming from. After several hours and cutting every column, it is still happening.
Does anyone know of a Snowflake diagnostic query (like Redshift has) which can tell me a specific column where the issue is occurring?
Unfortunately not at the point you're at. If you went back to the COPY INTO that loaded the data, you could use the VALIDATE() function to get better information, down to the record and byte-offset level.
I would query your staging table for just the numeric fields and look for blanks, or you can wrap all of your fields destined for numeric fields with try_to_number() functions. A bit tedious, but might not be too bad if you don't have a lot of numbers.
https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html
As a note, when you stage, you should try and use the NULL_IF options to get rid of bad characters and/or try to load them into stage using the actual datatypes in your stage table, so you can leverage the VALIDATE() function to make sure the data types are correct before loading into Snowflake.
Query your staging table using try_to_number() and/or try_to_decimal() for the number and decimal fields, then use MINUS to get the difference:
Select $1,$2,...$300 from #stage
minus
Select $1,try_to_number($2)...$300 from #stage
If any number field holds a string that cannot be converted, it becomes NULL, so the MINUS should return the rows that have a problem. Once you have those rows, analyze the columns in the result set for errors.
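With nearly 300 columns, writing those checks by hand is tedious, so one option is to generate the diagnostic query. The following is a minimal sketch, not a built-in Snowflake diagnostic: the table name `stage_table` and the column names are hypothetical, and it assumes the staging columns are VARCHAR, since TRY_TO_NUMBER only accepts string input. It builds a single SELECT that counts, per column, the values that are present but cannot be parsed as numbers (empty strings included).

```python
def build_bad_numeric_query(table, numeric_columns):
    """Build one SELECT that counts, per column, values that are present but not numeric."""
    checks = [
        # A non-NULL input that TRY_TO_NUMBER turns into NULL is an unparseable value.
        f"COUNT_IF({col} IS NOT NULL AND TRY_TO_NUMBER({col}) IS NULL) AS {col}_bad"
        for col in numeric_columns
    ]
    return "SELECT\n  " + ",\n  ".join(checks) + f"\nFROM {table}"


# Hypothetical table and column names, for illustration only.
query = build_bad_numeric_query("stage_table", ["units", "amount"])
print(query)
```

Any column whose count comes back greater than zero is a candidate for the "Numeric value '' is not recognized" error.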
Let me explain why I want to do this... I have built a Tableau dashboard that allows a user to browse/search all of the tables & columns in our warehouse by schema, object type (table,view,materialized view), etc. I want to add a column that pulls a sample of the data from each column in each table - this is also done, but with this problem...:
The resulting column is comprised of data of different types (VARCHAR2, LONG, etc.). I can get every type of data to conform to a single data type except for LONG: it will not let me convert it to anything compatible with the rest. I simply need all data types to coexist in a single column. I've tried many different things and have been reading up on the subject for about a week now. It sounds like it just can't be done, but in my experience there is always a way, so I figured I'd check with the gurus here before admitting defeat.
One of the things I've tried:
--Here, from two different tables, I'm pulling a single piece of data from a single column and attempting to merge into a single column called SAMPLE_DATA
--OTHER is LONG data type
--ORGN_NME is VARCHAR2 data type
select 'PLAN','OTHER', cast(substr(OTHER,1,2) as varchar2(4000)) as SAMPLE_DATA from sde.PLAN union all
select 'BUS_ORGN','ORGN_NME', cast(substr(ORGN_NME,1,2) as varchar2(4000)) as SAMPLE_DATA from sde.BUS_ORGN;
Resulting error:
Lookup Error
ORA-00932: inconsistent datatypes: expected CHAR got LONG
How can I achieve this?
Thanks in advance
Long datatypes are basically unusable by most applications. I made something similar where I wanted to search the contents of packages. The solution is to convert the LONG into CLOB using a pipelined function. Adrian Billington's source code can be found here:
https://github.com/oracle-developer/dla
You end up with a view that you can query. I did not see any performance hit even when looking at large packages so it should work for you.
I have a PostgreSQL column of type text that contains data like shown below
(32.85563, -117.25624)(32.855470000000004, -117.25648000000001)(32.85567, -117.25710000000001)(32.85544, -117.2556)
(37.75363, -121.44142000000001)(37.75292, -121.4414)
I want to convert this into another column of type text like shown below
(-117.25624, 32.85563)(-117.25648000000001,32.855470000000004 )(-117.25710000000001,32.85567 )(-117.2556,32.85544 )
(-121.44142000000001,37.75363 )(-121.4414,37.75292 )
As you can see, the values inside the parentheses have switched around. Also note that I have shown two records here to indicate that not all fields have same number of parenthesized figures.
What I've tried
I tried extracting the column to Java and performing my operations there, but due to the sheer number of records I have, I run out of memory. I also cannot do this in batches due to time constraints.
What I want
A SQL query or a sequence of SQL queries that will achieve the result that I have mentioned above.
I am using PostgreSQL 9.4 with pgAdmin III as the client.
This is the type of problem that should not be solved with SQL, but you are lucky to be using Postgres.
I suggest the following steps in defining your algorithm.
The first part turns your strings into structured data; the second transforms the structured data back into a string in the format that you require.
From string to data
First, you need to turn your bracketed values into an array, which can be done with the string_to_array function.
Now you can turn this array into rows with the unnest function, which will return a row per bracketed value.
Finally, you need to split the values in each row into two fields.
From data to string
You need to group the results of the first query and wrap them in the string_agg function, which will combine all the numbers in the rows back into a single string.
You will need to experiment with brackets to achieve exactly what you want.
PS. I am not providing query here. Once you have some code that you tried, let me know.
Assuming you also have a PK or some unique column, and possibly other columns, you can do as follows:
SELECT id, (...), string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
FROM (
SELECT id, (...), unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
FROM my_table) sub
GROUP BY id; -- assuming id is PK or no other columns
PostgreSQL has the point type which you can use here. First you need to make sure you can properly divide the long string into individual points (insert ';' between the parentheses), then turn that into an array of individual points in text format, unnest the array into individual rows, and finally cast those rows to the point data type:
unnest(string_to_array(replace(col, ')(', ');('), ';'))::point AS pt
You can then create a new point from the point you just created, but with the coordinates reversed, turn that into a string and aggregate into your desired output:
string_agg(point(pt[1], pt[0])::text, '') AS col_reversed
But you might also move away from the text format and make an array of point values as that will be easier and faster to work with:
array_agg(point(pt[1], pt[0])) AS pt_reversed
As I put in the question, I tried extracting the column to Java and performing my operations there, but due to the sheer number of records I have, I ran out of memory. I also could not do this in batches due to time constraints.
I ran out of memory because I was putting everything in a HashMap of <my_primary_key, the_newly_formatted_text>. As the text was sometimes very long, and given the sheer number of records, it wasn't surprising that I got an OOM.
Solution that I used:
As suggested by many folks here, this problem was better solved with code. I wrote a small script that formatted the text to my liking and wrote the primary key and the newly formatted text to a file in TSV format. Then I imported the TSV into a new table and updated the original table from the new one.
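The script itself was not posted, so the following is only a sketch of what such a script could look like, written in Python rather than Java. The function names and sample rows are hypothetical. It swaps the two numbers inside each pair of parentheses with a regular expression and writes one row at a time, so nothing accumulates in memory the way the HashMap did.

```python
import csv
import io
import re

# Matches "(a, b)" and captures a and b, tolerating spaces around each number.
_POINT = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")


def swap_points(text):
    """Return text with every "(a, b)" rewritten as "(b, a)"."""
    return _POINT.sub(r"(\2, \1)", text)


def write_swapped_tsv(rows, out):
    """Stream (primary_key, text) pairs to out as TSV, one row at a time."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for key, text in rows:
        writer.writerow([key, swap_points(text)])


# Hypothetical sample rows; in practice these would come from a database cursor.
rows = [
    (1, "(32.85563, -117.25624)(32.85544, -117.2556)"),
    (2, "(37.75292, -121.4414)"),
]
buffer = io.StringIO()
write_swapped_tsv(iter(rows), buffer)
print(buffer.getvalue())
```

In a real run, `out` would be an open file and `rows` a server-side cursor, so only one record is held in memory at a time.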
I have to create a second header line and am using the first record of the query to do this. I am using a UNION ALL: the first part creates this header record and the second part extracts the required data.
I have one issue on one column.
,'Active Energy kWh'
UNION ALL
,SUM(cast(invc.UNITS as Decimal (15,0)))
There are 11 lines on each side of the UNION, and I have tried all sorts of combinations, but it always results in an error message.
The above gives me "Error converting data type varchar to numeric."
Any help would be much appreciated.
The error message indicates that one of your values in the INVC table UNITS column is non-numeric. I would hazard a guess that it's either a string (VARCHAR or similar) column or something else - and one of the values has ended up in a state where it cannot be parsed.
Unfortunately there is no way other than checking small ranges of the table to gradually locate the 'bad' row (e.g. try running the query for a few million rows at a time, then reduce the number until you home in on the bad data). SQL Server 2012 and later, if you can get the database restored to one of those versions, have the TRY_CONVERT function, which returns NULL when a conversion fails and so enables a more direct check, but you'll need to play with this on another system.
(I'm assuming that an upgrade for this feature is out of the question, so your best bet is likely just looking for the bad row.)
The problem is that you are trying to mix header information with data information in a single query.
Obviously, all your header columns will be strings. But not all your data columns will be strings, and SQL Server is unhappy when you mix data types this way.
What you are doing is equivalent to this:
select 'header1' as col1 -- string
union all
select 123.5 -- decimal
The above query produces the following error:
Error converting data type varchar to numeric.
...which makes sense, because you are trying to mix a string (the header) with a decimal field.
So you have 2 options:
Remove the header columns from your query, and deal with header information outside your query.
Accept the fact that you'll need to convert the data type of every column to a string type. So when you have numeric data, you'll need to cast the column to varchar(n) explicitly.
In your case, it would mean adding the cast like this:
,'Active Energy kWh'
UNION ALL
,CAST(SUM(cast(invc.UNITS as Decimal (15,0))) AS VARCHAR(50)) -- Change 50 to appropriate value for your case
EDIT: Based on comment feedback, changed the cast to varchar to have an explicit length (varchar(n)) to avoid relying on the default length, which may or may not be long enough. OP knows the data, so OP needs to pick the right length.
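For the first option (handling the header outside the query), here is a minimal sketch in Python. It assumes the report is written to CSV by client code; the function name, header texts, and sample data are hypothetical. The query then returns only typed data rows, and both header lines are added at export time.

```python
import csv
import io


def write_report(out, header_rows, data_rows):
    """Write the header lines first, then the data rows exactly as the query returned them."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(header_rows)
    writer.writerows(data_rows)


# Hypothetical headers and data; data_rows would normally come from the query result.
header_rows = [["Account", "Units"], ["", "Active Energy kWh"]]
data_rows = [("A1", 1250), ("A2", 980)]

buffer = io.StringIO()
write_report(buffer, header_rows, data_rows)
print(buffer.getvalue())
```

This keeps the numeric column numeric inside the database, so no CAST to VARCHAR is needed in the query.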
I understand the error and how to fix it; I am just interested in finding the field that needs fixing.
Let me start from the top. I run a daily scheduled task that executes a process which, at some points, runs stored procedures in SQL that run insert statements. Unfortunately, after checking my logs, I am getting the error in question, and therefore my stored procedures aren't working. I could widen every field and this would probably fix it, but I'd rather not. Is there any way of knowing (without checking manually, as there are many fields and thousands of rows) which field contains the value that is too big for the field it is being inserted into?
Import the data into a new table using VARCHAR(MAX) as the datatype for the columns. Then you can use DATALENGTH to get the maximum size of each column.
SELECT MAX(DATALENGTH(col1)) AS col1, MAX(DATALENGTH(col2)) AS col2, etc.
FROM newTable
Compare each maximum with the defined size of the corresponding target column; any column whose maximum is larger is one that overflows.
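If the data is available to client code before the insert, the same comparison can be sketched outside the database. This is an illustration only: the column names and limits are hypothetical, and `len` counts characters, whereas DATALENGTH returns bytes, so the two can differ for multi-byte data.

```python
def oversized_columns(rows, limits):
    """Return {column: longest_length} for columns whose longest value exceeds its limit."""
    longest = {col: 0 for col in limits}
    for row in rows:
        for col in limits:
            value = row.get(col)
            if value is not None:
                longest[col] = max(longest[col], len(str(value)))
    return {col: n for col, n in longest.items() if n > limits[col]}


# Hypothetical rows and target column sizes.
rows = [
    {"name": "Al", "city": "Rome"},
    {"name": "Alexander", "city": "Oslo"},
]
limits = {"name": 5, "city": 10}
print(oversized_columns(rows, limits))
```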
I am not very familiar with iSeries/DB2. However, I work on a website that uses it as its primary database.
A new column was recently added to an existing table. When I view it via AS400, I see the following data type:
Type: S
Length: 9
Dec: 2
This tells me it's a numeric (zoned decimal) field with 9 digits in total: 7 before the decimal point and 2 after it.
When I query the data with a simple SELECT (SELECT MYCOL FROM MYTABLE), I get back all the records without a problem. However, when I try using a DISTINCT, GROUP BY, or ORDER BY on that same column I get the following exception:
[SQL0802] Data conversion or data mapping error
I've deduced that at least one record has invalid data - what my DBA calls "blanks" or "4 O". How is this possible though? Shouldn't the database throw an exception when invalid data is attempted to be added to that column?
Is there any way I can get around this, such as filtering out those bad records in my query?
"4 O" means 0x40 which is the EBCDIC code for a space or blank character and is the default value placed into any new space in a record.
Legacy programs or operations can introduce the decimal data error, for example if the new file was created and filled using the CPYF command with the FMTOPT(*NOCHK) option.
The easiest way to fix it is to write an HLL program (RPG) to read the file and correct the records.
The only solution I could find was to write a script that checks for blank values in the column and then updates them to zero when they are found.
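To illustrate why blanks are invalid in this column, here is a small sketch of a zoned-decimal validity check over raw field bytes. It is an assumption-laden illustration, not the script mentioned above: it treats only F, C, and D as acceptable sign zones in the last byte, which is a simplification, and the sample values are made up.

```python
# In EBCDIC (code page 037) a space is the single byte 0x40.
EBCDIC_BLANK = " ".encode("cp037")


def is_valid_zoned(raw):
    """Check that raw bytes look like a zoned decimal: zone F plus a digit, sign zone on the last byte."""
    if not raw:
        return False
    *body, last = raw
    for byte in body:
        # High nibble must be the F zone and the low nibble a digit 0-9.
        if byte >> 4 != 0xF or byte & 0x0F > 9:
            return False
    # Simplifying assumption: only F and C (positive) and D (negative) sign zones are accepted.
    return last >> 4 in (0xF, 0xC, 0xD) and last & 0x0F <= 9


good = bytes.fromhex("F0F0F0F0F1F2F3F4F5")  # the 9 digits 000012345
blank = EBCDIC_BLANK * 9                    # nine 0x40 bytes, as left in a new, unfilled column

print(is_valid_zoned(good), is_valid_zoned(blank))
```

A field full of 0x40 bytes fails because the zone nibble is 4 rather than F, which is why the value cannot be mapped to a number.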
If the file has record format level checking turned off [i.e. LVLCHK(*NO)] or is overridden to that, then an HLL program (e.g. RPG, COBOL, etc.) that was not recompiled with the new record format might write out records with invalid data in this column, especially if the new column is not at the end of the record.
Make sure that all programs that use native I/O to write or update records on this file are recompiled.
I was able to solve this error by force-casting the key columns to integer. I changed the join from this...
FROM DAILYV INNER JOIN BXV ON DAILYV.DAITEM=BXV.BXPACK
...to this...
FROM DAILYV INNER JOIN BXV ON CAST(DAILYV.DAITEM AS INT)=CAST(BXV.BXPACK AS INT)
...and I didn't have to make any corrections to the tables. This is a very old, very messy database with lots of junk in it. I've made many corrections, but it's a work in progress.