Extract Number from VARCHAR - sql

I have a [Comment] column of type VARCHAR(255) in a table that I'm trying to extract numbers from. The numbers will always be 12 digits, but aren't usually in the same place. Some of them will also have more than one 12 digit number, which is fine, but I only need the first.
I've tried using PATINDEX('%[0-9]%',[Comment]), but I can't figure out how to set a requirement of 12 digits.
An example of the data I'm working with is below:
Combined 4 items for $73.05 with same claim no. 123456789012 as is exceeding financial limits
Consolidated remaining amount of claim numbers, 123456789013, 123456789014, 123456789015, 123456789016 due to financial limits

You can just use 12 [0-9]'s in a row:
PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%',[Comment])
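For checking the pattern logic outside the database, here is a minimal Python sketch of the same idea: find the first run of 12 consecutive digits. The function name `first_claim_number` is made up for illustration. The sample comments are the two from the question plus one invented comment with no number.

```python
import re

# Twelve consecutive ASCII digits, mirroring twelve [0-9] classes in a row.
TWELVE_DIGITS = re.compile(r"[0-9]{12}")

def first_claim_number(comment):
    """Return the first run of 12 digits in comment, or None if there is none."""
    match = TWELVE_DIGITS.search(comment)
    return match.group(0) if match else None

comments = [
    "Combined 4 items for $73.05 with same claim no. 123456789012 as is exceeding financial limits",
    "Consolidated remaining amount of claim numbers, 123456789013, 123456789014, "
    "123456789015, 123456789016 due to financial limits",
    "No claim number here",  # invented sample, not from the question
]
results = [first_claim_number(c) for c in comments]
print(results)
```

Like the PATINDEX pattern, this also matches the first 12 digits of a longer digit run, so it assumes longer runs do not occur in the data.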

Related

Why does Oracle seem to add extra decimal places when converting number using to_char?

I have a large Oracle table (millions of rows) with two columns of type NUMBER. I am trying to write a query to get the max number of decimal places on pl_pred (I expect this to be around 7 or 8). When I do a to_char on the column, there are extra decimal values showing up and it says the max is 18 when I'm only seeing around 4-7 when selecting on that column. Anyone know why as well as how to accurately assess what the max amount of decimal places is? I need to transfer this data to SQL Server and was trying to come up with the right precision and scale values for the numeric data type.
select
pl.pl_pred as pred_number,
to_char(pl_pred) as pred_char,
length(to_char(pl_pred)) as pred_len
from pollution pl
where length(to_char(pl_pred)) > 15
Results:
PRED_NUMBER PRED_CHAR PRED_LEN
4.6328 "4.6327999999999987" 18
5.8767 "5.8766999999999996" 18
11.19625 "11.196250000000001" 18
13.566375 "13.566374999999997" 18
Table:
CREATE TABLE RHT.POLLUTION
(
LOCATION_GROUP VARCHAR2(20 BYTE),
PL_DATE DATE,
PL_PRED NUMBER,
PL_SE NUMBER
)
Update (again): I ran the same query in SQL Developer and got the results below, where the two columns show exactly the same values. So that's interesting. I went back and looked at the raw data, though, and it does not match: 4.6328 and 5.8767 are what I see there. There are some longer values, like 10.4820321428571. It's as if NUMBER is being treated like a float, but I thought NUMBER in Oracle was exact.
"PRED_NUMBER" "PRED_CHAR" "PRED_LEN"
4.6327999999999987 "4.6327999999999987" 18
5.8766999999999996 "5.8766999999999996" 18
10.4820321428571 "10.4820321428571" 15
Raw data:
4.6328
5.8767
10.4820321428571
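One way to size the target numeric type is to measure the digits in the `to_char` text itself. The Python sketch below does that for the four PRED_CHAR strings shown in the results. It does not explain why the extra digits appear. The function name `precision_and_scale` is hypothetical, and the result is conservative, since it counts at least one integer digit per value.

```python
from decimal import Decimal

def precision_and_scale(values):
    """Return (precision, scale) large enough to hold every decimal string in values."""
    max_int_digits = 0
    max_scale = 0
    for text in values:
        _sign, digits, exponent = Decimal(text).as_tuple()
        scale = max(0, -exponent)                    # digits after the decimal point
        int_digits = max(1, len(digits) + exponent)  # digits before the decimal point
        max_scale = max(max_scale, scale)
        max_int_digits = max(max_int_digits, int_digits)
    return max_int_digits + max_scale, max_scale

# PRED_CHAR values copied from the query results in the question.
pred_char = [
    "4.6327999999999987",
    "5.8766999999999996",
    "11.196250000000001",
    "13.566374999999997",
]
as_stored = precision_and_scale(pred_char)

# The raw values quoted in the question need far fewer digits.
as_raw = precision_and_scale(["4.6328", "5.8767", "10.4820321428571"])
print(as_stored, as_raw)
```

The gap between the two results suggests deciding first whether the trailing digits are meaningful, and only then fixing the precision and scale on the SQL Server side.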

How do you do multiple substrings for a field in Teradata?

I have a field to pull account numbers which have different lengths and I want to pass the last four digits of the account number. The dilemma I am having is that since they are different lengths I am having trouble in substringing the fields. The standard length is 11 digits but there are accounts with 9 digits and 7 digits.
How do I substring those values in multiple substrings to capture all the account last 4 digits in one query?
This is currently what I have:
SELECT SUBSTRING(ACCT_NBR,7,4) AS BNK_ACCT_NBR
FROM NAMEOFTABLE;
I want to have additional substrings to capture the account numbers that don't have 11 digits similar to
SUBSTRING(ACCT_NBR,5,4)
SUBSTRING(ACCT_NBR,4,4)
The results should look like:
76587990891 - 0891
654378908 - 8908
45643456 - 3456
Can you please help me in figuring out how I can do that?
Thanks.
Is ACCT_NBR a VarChar or an INT?
VarChar:
Right(ACCT_NBR,4)
Substr(ACCT_NBR,Char_Length(ACCT_NBR)-3)
INT:
ACCT_NBR MOD 10000
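The two variants behave differently when the last four digits start with a zero, which the first sample account does. A small Python sketch of both, using the account numbers from the question (the function names are hypothetical):

```python
def last_four_from_text(acct_nbr):
    """Character column: take the last four characters, whatever the length."""
    return acct_nbr[-4:]

def last_four_from_int(acct_nbr):
    """Integer column: the remainder mod 10000, zero-padded back to four digits."""
    return f"{acct_nbr % 10000:04d}"

samples = ["76587990891", "654378908", "45643456"]
from_text = [last_four_from_text(s) for s in samples]
from_int = [last_four_from_int(int(s)) for s in samples]
print(from_text, from_int)

# Without padding, the integer remainder drops the leading zero.
unpadded = 76587990891 % 10000
```

So with an integer column, the MOD result would need to be left-padded with zeros to display as 0891 rather than 891.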

Generate a progressive number when new record are inserted (some record need to have the same number)

The title may be a little confusing, so let me explain the problem. I have a pipeline that loads new records daily. These records contain sales, and the key is <date, location, ticket, line>. The data is loaded into a Redshift table and then exposed through a view that is read by another system. That system has a limitation: its ticket column is a varchar(10), but the ticket is a 30-character string. If the system takes only the first 10 characters, it will generate duplicates. The ticket number can be a "fake" number; it doesn't matter if it differs from the real one. So I'm thinking of adding a new column to the Redshift table that contains a progressive number. The problem is that I cannot use an identity column, because records belonging to the same ticket must have the same "progressive number". I would then expose this new column (ticket_id) instead of the original one.
That is what I want:
day         location  ticket    line  amount  ticket_id
12/12/2020  67        123...GH  1     10      1
12/12/2020  67        123...GH  2     5       1
12/12/2020  67        123...GH  3     23      1
12/12/2020  23        123...GB  1     13      2
12/12/2020  23        123...GB  2     45      2
...         ...       ...       ...   ...     ...
12/12/2020  78        123...AG  5     100     153
The next day, when new data is loaded, I want to start with ticket_id 154, and so on.
Each row has a column that specifies the instant at which it was inserted. Rows inserted on the same day have the same insert_time.
My solution is:
insert the records with ticket_id computed as a dense_rank. But each time I load new records (so each day) the ticket_id starts from one, so...
... update the rows just inserted, setting ticket_id = ticket_id + the max value found in the ticket_id column where insert_time != max(insert_time)
Do you think there is a better solution? It would be very nice if a hash function existed that takes <day, location, ticket> as input and returns a number of at most 10 characters.
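The two-step approach in the question (dense rank within the new batch, then shift by the previous maximum) can be checked on a toy batch. This is only a Python sketch of the logic, not Redshift code. The function name, the row layout, and the ticket strings are made up for illustration.

```python
def assign_ticket_ids(new_rows, previous_max):
    """Give every row a ticket_id; rows sharing (day, location, ticket) share one id."""
    # Dense rank: one consecutive number per distinct key, in sorted key order.
    keys = sorted({(r["day"], r["location"], r["ticket"]) for r in new_rows})
    # Shift the ranks so numbering continues after the ids already in the table.
    rank = {key: previous_max + i for i, key in enumerate(keys, start=1)}
    return [
        dict(r, ticket_id=rank[(r["day"], r["location"], r["ticket"])])
        for r in new_rows
    ]

# Hypothetical next-day batch; 153 is the last ticket_id in the question's example.
batch = [
    {"day": "2020-12-13", "location": 67, "ticket": "T-A", "line": 1, "amount": 10},
    {"day": "2020-12-13", "location": 67, "ticket": "T-A", "line": 2, "amount": 5},
    {"day": "2020-12-13", "location": 23, "ticket": "T-B", "line": 1, "amount": 13},
]
numbered = assign_ticket_ids(batch, previous_max=153)
ids = [row["ticket_id"] for row in numbered]
print(ids)
```

Both lines of the same ticket receive the same id, and numbering resumes at 154.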
From the comments, it sounds like you cannot add a dimension table to look up the number or 10-character string that identifies each ticket, as this would be a data model change. A dimension table is likely the best and most accurate way to do this.
You asked about a hash function to do this, and there are several. But first let's talk about hashes: they take strings of varying length and make a signature out of them. Since this process can significantly reduce the number of characters, there is a possibility that two different strings will generate the same hash. The longer the hash value, the lower the odds of such a collision, but the odds are never zero. Since you can only have 10 characters, this sets the odds of a hash collision.
The md5() function on Redshift will take a string and make a 32-character string (base 16 characters) out of it. md5(day::text || location || ticket::text) will make such a hash out of the columns you mentioned. This process can produce 16^32 possible different strings, which is a big number.
But you only want a string of 10 characters. The good news is that hash functions like md5() spread the differences between strings across the whole output, so you can just pick any 10 characters to use. Doing this will reduce the number of unique values to 16^10, or about 1.1 trillion. That is still a big number, but if you have billions of rows you could see a collision. One way to improve this would be to base64 encode the md5() output and then truncate to 10 characters. Doing this will require a UDF but would increase the number of possible hashes to about 1.1E18, roughly a million times larger. If you want the output to be an integer, you can convert hex strings to integers with strtol(), but a 10-digit number only has 10 billion possible values.
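The arithmetic behind these key-space sizes, and the truncation itself, can be illustrated in Python. This is a sketch, not the Redshift implementation. The function names are hypothetical, and the "|" separator between the columns is an assumption added here so that adjacent values cannot run together.

```python
import base64
import hashlib

def ticket_hash_hex(day, location, ticket, length=10):
    """First `length` hex characters of the MD5 of the combined key."""
    key = f"{day}|{location}|{ticket}"  # "|" separator is an assumption
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:length]

def ticket_hash_base64(day, location, ticket, length=10):
    """First `length` base64 characters of the raw 16-byte MD5 digest."""
    key = f"{day}|{location}|{ticket}"
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")[:length]

# Number of distinct 10-character values for each encoding.
HEX_SPACE = 16 ** 10      # 1,099,511,627,776 (about 1.1 trillion)
BASE64_SPACE = 64 ** 10   # about 1.15e18
DECIMAL_SPACE = 10 ** 10  # 10 billion

hex_id = ticket_hash_hex("2020-12-12", 67, "hypothetical-ticket")
b64_id = ticket_hash_base64("2020-12-12", 67, "hypothetical-ticket")
print(hex_id, b64_id, BASE64_SPACE // HEX_SPACE)
```

The same key always produces the same id, so rows of one ticket stay together, but two different tickets can still collide.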
So if you are sure you want to use a hash this is quite possible. Just remember what a hash does.

How to write a SQL query to find number of accounts tied to a particular value?

I want to find the number of companies tied to a specific membership field. This is somewhat complex, as many different companies can belong to the same membership number. Companies are identified by a 6 character string (account number), but in this table they can be either 6 or 8 characters long. I am only interested in accounts that are 6 characters, or 8 characters where the last two characters are 00. To make matters worse, there are many duplicate account numbers in this table -- the field is not unique.
The membership field is numeric, but can also be null. I care only about non-zero, non-null membership numbers. Unfortunately it's random whether or not the user input a null, a zero, or an actual number.
The end result is to display all the membership numbers with a count of how many account numbers belong.
All the information is from the same table. Each row will have an account number and membership number among many other fields.
Think of it like a franchise situation. Let's pretend Burger Joint is our customer, so we have a membership number for them. Then, all the franchisees with different account numbers, use the membership number to do business with us. I'm trying to find out how much REAL business is done with each membership.
I realize this may be a multi-step process. Below is my attempt to try to get back the correct account number types. AQT didn't seem to like it though.
where (length(account) > 6 and substring(account, (length(account) -2), length(account)) ='00')
Companies are identified by a 6 character string (account number), but in this table they can be either 6 or 8 characters long. I am only interested in accounts that are 6 characters, or 8 characters where the last two characters are 00
WHERE (accountnumber LIKE '______' or accountnumber LIKE '______00')
I care only about non-zero, non-null membership numbers
AND membershipnumber > 0
You don't need to specifically check for null member numbers; requiring the number be greater than 0 excludes nulls implicitly
The end result is to display all the membership numbers with a count of how many account numbers belong
SELECT membershipnumber, COUNT(DISTINCT accountnumber) as count_unique_accs
FROM ...
WHERE ...
GROUP BY membershipnumber
It doesn't matter if accountnumber is duplicated: COUNT(DISTINCT ...) only counts the unique occurrences. If 123456 and 12345600 are the same account, take the substring inside the count.
Edit:
By this last sentence I mean:
COUNT(DISTINCT SUBSTRING(accountnumber, 1, 6)) --sql server syntax
where membership is not null and membership <> 0 and account is not null
and (length(account) = 6
or (length(account) = 8 and right(account,2) = '00'))
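To sanity-check the filtering and counting rules on a handful of rows, here is a Python sketch of the same logic. It assumes that 123456 and 12345600 are the same account, which is the substring variant above. The function name and sample rows are made up for illustration.

```python
from collections import defaultdict

def accounts_per_membership(rows):
    """rows: iterable of (account, membership) pairs; either value may be None."""
    accounts = defaultdict(set)
    for account, membership in rows:
        # Skip null accounts and null, zero, or negative membership numbers.
        if account is None or membership is None or membership <= 0:
            continue
        # Keep 6-character accounts, or 8-character accounts ending in "00".
        if len(account) == 6 or (len(account) == 8 and account.endswith("00")):
            accounts[membership].add(account[:6])  # count by the 6-character stem
    return {membership: len(stems) for membership, stems in accounts.items()}

rows = [
    ("123456", 10), ("12345600", 10), ("123456", 10),  # same account three times
    ("654321", 10),
    ("65432101", 10),  # 8 characters, not ending in 00: ignored
    ("111111", 0),     # zero membership: ignored
    ("222222", None),  # null membership: ignored
    ("333333", 20),
    ("1234567", 20),   # 7 characters: ignored
]
counts = accounts_per_membership(rows)
print(counts)
```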

CHECK constraint for string to contain a certain amount of digits as well as certain digits. (Oracle SQL)

I have a column, number, where I need a length constraint (say 11 digits) and also need to assert the presence of certain digits. Let us say the first four digits need to be '1234' and the fifth must be in the range 6-9. I am using a varchar type, so I also need to assert that every character is a digit. After some research, here is what I have been able to come up with:
CHECK (REGEXP_LIKE(number, '^1234\d{7}$'))
In this way I have been able to check the number of digits (11), the first 4 starting numbers and number values. However, I cannot fit the fifth number which needs to be between 6 and 9 into this expression.
Thanks in advance
Try this.
CHECK (REGEXP_LIKE(number, '^1234[6-9]\d{6}$'))
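The pattern can be tried against a few values before it goes into the constraint. Below is a Python sketch using the same regular expression with `[0-9]` in place of `\d`, because Python's `\d` also matches non-ASCII digits. The name `is_valid` is hypothetical.

```python
import re

# "1234", then one digit from 6 to 9, then six more digits: 11 characters in total.
PATTERN = re.compile(r"1234[6-9][0-9]{6}")

def is_valid(value):
    """True when the whole string matches the pattern, like ^...$ in the CHECK."""
    return PATTERN.fullmatch(value) is not None

checks = {
    "12346000000": is_valid("12346000000"),    # fifth digit 6: accepted
    "12349123456": is_valid("12349123456"),    # fifth digit 9: accepted
    "12345000000": is_valid("12345000000"),    # fifth digit 5: rejected
    "1234600000": is_valid("1234600000"),      # only 10 digits: rejected
    "123460000000": is_valid("123460000000"),  # 12 digits: rejected
    "1234600000a": is_valid("1234600000a"),    # non-digit character: rejected
}
print(checks)
```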