I've imported a somewhat large set of data, with, at times, an odd number format (e.g., 12,345.01- and 1,945.001-), and I am trying to 'fix' it.
The data was imported as VARCHAR(20)
My solution:
to_number(BadNumCol, 'S999G999G999D999')
input: 10426.95 ;261.000 ;33.93-
outputs:42695.00 ;261.000 ; 3.93
the output is NUMERIC(12,3)
desired output: 10426.95 ; 261.000 ; -33.93
What's going on here? What am I missing/not understanding in my ignorance?
And, how do I fix these ~400 Million data elements?
Ther are two issues that I see here.
In the case of the sample input data from the "solution" section of you data, none of the sample input data has group separators, so the group separators you are specifying in your TO_NUMBER function is mismatched.
Also, you are specifyng a sign character anchored to the beginning of the string, when your data has a trailing minus instead.
The most appropriate conversion format string I can infer from your sample input data is: '999999999D999MI'
select * from num_test;
BADNUMCOL
-----------
261.000
10426.95
33.93-
(3 rows)
select to_number(BadNumCol,'999999999D999MI')::NUMERIC(12,3) GoodNumCol from num_test;
GOODNUMCOL
------------
10426.950
-33.930
261.000
(3 rows)
Related
I am using SAS to pull data in a Teradata environment. I am counting the rows in the Teradata table, but want the output to be in a comma format (i.e. 1,000,000). I was able to use the code below to display the value as a comma, but when I try to add the column in SAS, I can't since the output is in a character format. Does anyone have any suggestions on how to format the number value as comma, so that it can be used for calculation purposes in SAS? Thanks.
CAST(Count(*) as (format 'Z,ZZZ,ZZ9')) as char(10)) as rowCount,
Assuming you're using pass through, pull it in as numeric and format it on the SAS side. You've now converted it to character (char10) and SAS doesn't do math on character variables which makes logical sense.
select rowCount format=comma12. from con
(select
count(*) as rowCount ....
)
If you have a select * you can always format it later in a data step or via PROC DATASETS. SAS separates the display and storage layers so the format controls the appearance but the underlying data still remains numeric.
I have a big problem right now and I really need your help, because I can't find the right answer.
I am currently writing a script that triggers a migration process from a table with raw data (data we received from an excel file) to a new normalized schema.
My problem is that there is a column PRICE (varchar2 datatype) with a bunch of traps. For example: 540S, 25oo , I200 , S000 .
And I need to convert it to the correct NUMBER(9,2) format so I can get: 5405, 2500, 1200, 5000 as NUMBER for the previous examples and INSERT INTO my_new_table.
Is there any way I can parse every CHAR of these strings that verify certain conditions?
Or others better way?
Thank you :)!
One of the wonderful things about Oracle that some other DBs lack, is the TRANSLATE function:
SELECT TRANSLATE(number, 'SsIilOoxyz', '5511100') FROM t
This will convert:
S, s to 5
I, i and l to 1
O, o to 0
Remove any x, y or z from the number
The second and third arguments to translate define what characters are to be mapped. If the first string is longer than the second then anything over the length of the second is deleted from the resulting string. Mapping is direct based on position:
'SsIilOoxyz',
'5511100'
Look at the columns of the characters; the character above is mapped to the character below:
S->5,
s->5,
I->1,
i->1,
l->1,
O->0,
o->0,
x->removed,
y->removed,
z->removed`
You can use translate() and along with to_number(). Your rules are not exactly clear, but something like this:
select to_number(translate(price, '0123456789IoS', '012345678910'))
from t;
This replaces I with 1, o with 0, and removes S.
Initial situation
I have a relatively large table (ca. 0.7 Mio records) where an nvarchar field "MediaID" contains largely media IDs in proper hexadecimal notation (as they should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these Media IDs do contain non-hex characters - most probably because there was some typing errors by the people which have added them or through import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) because the conversion hits an error.
For my current purpose, such rows must be skipped/ignored as they are clearly wrong and cannot be used (there are no medias / data carriers in use with the current business system which can have non-hex character IDs).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I did upgrade the compatibility level of the SQL instance to SQL2016 (it was below 2012 before) I could use try_convert with same syntax as the original convert function as donPablo has pointed out. With that the query could run fully through and every MediaID which is not a correct hex value gets nicely converted into a null value - really, really nice.
Exactly what I needed.
Unfortunately, the solution of ALICE... didn't work out for me as this was also (strangely) returning records which had the "+" character within them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
works also with SQL Instances with compatibility level < SQL 2012.
Great addition for people which still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in the MediaID through a case statement and regular expression
Hexadecimals cannot contain characters other than A-F and numbers between 0 and 9
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that can be used to evaluate MediaID first and checks if it is hexadecimal and then running the query for conversion
I have a problem when I am trying to move a varbinary(max) field from one DB to another.
If I insert like this:
0xD0CF11E0A1B11AE10000000
It results the beginning with an additional '0':
0x0D0CF11E0A1B11AE10000000
And I cannot get rid of this. I've tried many tools, like SSMS export tool or BCP, but without any success. And it would be better fro me to solve it in a script anyway.
And don't have much kowledge about varbinary (a program generates it), my only goal is to copy it:)
0xD0CF11E0A1B11AE10000000
This value contains an odd number of characters. Varbinary stores bytes. Each byte is represented by exactly two hexadecimal characters. You're either missing a character, or your not storing bytes.
Here, SQL Server is guessing that the most significant digit is a zero, which would not change the numeric value of the string. For example:
select 0xD0C "value"
,cast(0xD0C as int) "as_integer"
,cast(0x0D0C as int) "leading_zero"
,cast(0xD0C0 as int) "trailing_zero"
value 3_char leading_zero trailing_zero
---------- --------- --------------- ----------------
0d0c 3340 3340 53440
Or:
select 1 "test"
where 0xD0C = 0x0D0C
test
-------
1
It's just a difference of SQL Server assuming that varbinary always represents bytes.
I am looking for a way to take data from one table and manipulate it and bring it to another table using an SQL query.
I have a Column called NumberStuff that has data like this in it:
INC000000315482
I need to cut off the INC portion of the number and convert it into an integer and store it into a Column in another table so that it ends up looking like this:
315482
Any help would be much appreciated!
Another approach is to use the Replace function. Either in TSQL or as a Derived Column Expression in SSIS.
TSQL
SELECT REPLACE(T.MyColumn, 'INC', '') AS ReplacedINC
SSIS
REPLACE([MyColumn], "INC", "")
This removes the character based data. It then becomes an optional exercise in converting to a numeric type before storing it to the target table or letting the implicit conversion happen.
Simplest version of what you need.
select cast(right(column,6) as int) from table
Are you doing this in a SSIS statement, or?...is it always the last 6 or?...
This is a little less dependant on your formatting...removes 0's and can be any length (will trim the first 3 chars and the leading 0's).
select cast(SUBSTRING('INC000000315482',4,LEN('INC000000315482') - 3) as int)