How to add a character to the last third place of a string? - sql

I have a column with numbers with various lengths such as 50055, 1055,155 etc. How can I add a decimal before the last 2nd place of each so that it would be 500.55, 10.55, and 1.55?
I tried using replace by finding the last 2 numbers and replace it with .||last 2 number. That doesn't always work because of a possibility of multiple repetition of the same sequence in the same string.
replace(round(v_num/2),substr(round(v_num/2),-2),'.'||substr(round(v_num/2),-2))

You would divide by 100:
select v_num / 100
You can convert this into a string, if you want.

Related

Select rows which contain numeric substrings in Pandas

I need to delete rows from a dataframe in which a particular column contains string which contains numeric substrings. See the shaded column of my dataframe.
rows with values like 0E as prefix or 21 (any two digit number) as suffix or 24A (any two digit number with a letter) as suffix should be deleted.
Any suggestions?
Thanks in advance.
You can use boolean indexing with a str.contains() regex:
^0E - starts with 0E
\d{2}$ - ends with 2 digits
\d{2}[A-Z]$ - ends with 2 digits and 1 capital letter
col = ... # target column
mask = df[col].str.contains(r'^0E|\d{2}$|\d{2}[A-Z]$')
df = df.loc[~mask]
#tdy gave a good answer, but only one place need to be modified if I understand it correctly.
For value ends with two digits or two digits and a capital character, the regex should be:
.*\d{2}[A-Z]?$

How to extract just numeric value with REGEXP_EXTRACT in BigQuery?

I am trying to extract just the numbers from a particular column in BigQuery.
The fields concerned have this format: value = "Livraison_21J|Relais_19J" or "RELAIS_15 DAY"
I am trying to extract the number of days for each value preceeded by the keyword "Relais".
The days range from 1 to 100.
I used this to do so:
SELECT CAST(REGEXP_EXTRACT(delivery, r"RELAIS_([0-9]+J)") as string) as relayDay
FROM TABLE
I want to be able to extract just the number of days regardless of the the string that comes after the numbers, be it "J" or "DAY".
Sample data :
RETRAIT_2H|LIVRAISON_5J|RELAIS_5J | 5J
LIVRAISON_21J|RELAIS_19J | 19J
LIVRAISON_21J|RELAIS_19J | 19J
RETRAIT_2H|LIVRAISON_3J|RELAIS_3J | 3J
You may use
REGEXP_EXTRACT(delivery, r"(?:.*\D)?(\d+)\s*(?:J|DAY)")
See the regex demo
Details
(?:.*\D)? - an optional non-capturing group that matches 0+ chars other than line break chsrs as many as possible and then a non-digit char (this pattern is required to advance the index to the location right before the last sequence of digits, not the last digit)
(\d+) - Group 1 (just what the REGEXP_EXTRACT returns): one or more digits
\s* - 0+ whitespaces
(?:J|DAY) - J or DAY substrings.

Python: Remove exponential in Strings

I have been trying to remove the exponential in a string for the longest time to no avail.
The column involves strings with alphabets in it and also long numbers of more than 24 digits. I tried converting the column to string with .astype(str) but it just reads the line as "1.234123E+23". An example of the table is
A
345223423dd234324
1.234123E+23
how do i get the table to show the full string of digits in pandas?
b = "1.234123E+23"
str(int(float(b)))
output is '123412299999999992791040'
no idea how to do it in pandas with mixed data type in column

SQL - Create Unique AlphaNumeric based on a 10-digit integer stored as VARCHAR

I'm trying to emulate a function in SQL that a client has produced in Excel. In effect, they have a unique, 10-digit numeric value (VARCHAR) as the primary key in one of their enterprise database systems. Within another database, they require a unique, 5-digit alphanumeric identifier. They want that 5-digit alphanumeric value to be a representation of the 10-digit number. So what they did in excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together.
The EXCEL equation is:
=IF(VALUE(MID(A2,1,4))>0,DEC2HEX(VALUE(MID(A2,3,2)))&DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX(VALUE(MID(A2,9,2))),DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX((VALUE(MID(A2,9,2)))))
I need the SQL equivalent of this. Of course, should someone out there know a better way to accomplish their goal of "a 5-digit alphanumeric identifier" based off the 10-digit number, I'm all ears.
ADDED 8/2/2011
First of all, thank you to everyone for the replies. Nice to see folks willing to help and even enjoying it! Based on all the responses, I'm apt to tell my client they're intent is sound, only their method is off kilter. I'd also like to recommend a solution. So the challenge remains, just modified slightly:
CHALLENGE: Within SQL, take a 10 digit, unique NUMERIC string and represent it ALPHANUMERICALLY in as few characters as possible. The resulting string must also be unique.
Note that the first 3-4 characters in the 10-digit string are likely to be zeros, and that they could be stripped to shorten the resulting alphanumeric string. Not required, but perhaps helpful.
This problem is inherently impossible. You have a 10 digit numeric value that you want to convert to a 5 digit alphanumeric value. Since there are 10 numeric characters, this means that there are 10^10 = 10 000 000 000 unique values for your 10 digit number. Since there are 36 alphanumeric characters (26 letters + 10 numbers), there are 36^5 = 60 466 176 unique values for your 5 digit number. You cannot map a set of 10 billion elements into a set with around 60 million.
Now, lets take a closer look at what your client's code is doing:
So what they did in excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together.
This isn't 100% accurate. The excel code never uses the first 2 digits, but performs this operation on the remaining 8. There are two main problems with this algorithm which may not be intuitively obvious:
Two 10 digit numbers can map to the same 5 digit number. Consider the numbers 1000000117 and 1000001701. The last four digits of 1000000117 get mapped to 1 11, where the last four digits of 1000001701 get mapped to 11 1. This causes both to map to 00111.
The 5 digit number may not even end up being 5 digits! For example, 1000001616 gets mapped to 001010.
So, what is a possible solution? Well, if you don't care if that 5 digit number is unique or not, in MySQL you can use something like:
hex(<NUMERIC VALUE> % 0xFFFFF)
The log of 10^10 base 2 is 33.219280948874
> return math.log(10 ^ 10) / math.log(2)
33.219280948874
> = 2 ^ 33.21928
9999993422.9114
So, it takes 34 bits to represent this number. In hex this will take 34/4 = 8.5 characters, much more than 5.
> return math.log(10 ^ 10) / math.log(16)
8.3048202372184
The Excel macro is ignoring the first 4 (or 6) characters of the 10 character string.
You could try encoding in base 36 instead of 16. This will get you to 7 characters or less.
> return math.log(10 ^ 10) / math.log(36)
6.4254860446923
The popular base 64 encoding will get you to 6 characters
> return math.log(10 ^ 10) / math.log(64)
5.5365468248123
Even Ascii85 encoding won't get you down to 5.
> return math.log(10 ^ 10) / math.log(85)
5.1829075929158
You need base 100 to get to 5 characters
> return math.log(10 ^ 10) / math.log(100)
5
There aren't 100 printable ASCII characters, so this is not going to work, as zkhr explained as well, unless you're willing to go beyond ASCII.
I found your question interesting (although I don't claim to know the answer) - I googled a bit for you out of interest and found this which may help you http://dpatrickcaldwell.blogspot.com/2009/05/converting-decimal-to-hexadecimal-with.html

vb.net what is a good way of displaying a decimal with a given maximum length

I am writing a custom totaling method for a grid view. I am totaling fairly large numbers so I'd like to use a decimal to get the total. The problem is I need to control the maximum length of the total number. To solve this problem I started using float but it doesn't seem to support large enough numbers, I get this in the totals column(1.551538E+07). So is there some formating string I can use in .ToString() to guarentee that I never get more then X characters in the total field? Keep in mind I'm totaling integers and decimals.
If you're fine with all numbers displaying in scientific notation, you could go with "E[numberOfDecimalPlaces]" as your format string.
For example, if you want to cap your strings at, say, 12 characters, then, accounting for the one character for the decimal point and five characters needed to display the exponential part, you could do:
Function FormatDecimal(ByVal value As Decimal) As String
If value >= 0D Then
Return value.ToString("E5")
Else
' negative sign eats up another character '
Return value.ToString("E4")
End If
End Function
Here's a simple demo of this function:
Dim d(5) As Decimal
d(0) = 1.203D
d(1) = 0D
d(2) = 1231234789.432412341239873D
d(3) = 33.3218403820498320498320498234D
d(4) = -0.314453908342094D
d(5) = 000032131231285432940D
For Each value As Decimal in d
Console.WriteLine(FormatDecimal(value))
Next
Output:
1.20300E+000
0.00000E+000
1.23123E+009
3.33218E+001
-3.1445E-001
3.21312E+016
You could use Decimal.Round, but I don't understand the exact question, it sounds like you're saying that if the total adds up to 12345.67, you might only want to show 4 digits and would then show 2345 or do you just mean that you want to remove the decimals?