Remove all but two trailing zeros AND thousand separator - sql

Postgres Version 11.7
I am trying to remove trailing zeros beyond two decimal places, while also adding a thousand separator to my result set.
Example 1 produces the desired result; all trailing zeros were removed:
SELECT to_char(54354.0010, 'FM99 999 999 990.999999'); --> returns 54 354.001
Example 2 illustrates the problem:
SELECT to_char(54354.0000, 'FM99 999 999 990.999999'); --> returns 54 354.
In Example 2 all zeroes after the decimal are removed.
But the desired result would be:
54 354.00
The result should always have a minimum of two decimal places, regardless of whether they are zero or not.

This produces your desired result:
SELECT to_char(54354.0000, 'FM99 999 999 990.009999'); --> returns 54 354.00
Use two 0s instead of 9s in the first two positions after the decimal point.
The manual:
9 digit position (can be dropped if insignificant)
0 digit position (will not be dropped, even if insignificant)
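For readers who want to check the rule outside the database, here is a minimal Python sketch of the same formatting behaviour: at least two decimals, at most six, trailing zeros dropped beyond the second, and a space as the thousand separator. The function name format_amount and its parameters are hypothetical, and this is only an analogue of the to_char template, not how Postgres implements it.

```python
from decimal import Decimal


def format_amount(value: str, min_decimals: int = 2, max_decimals: int = 6) -> str:
    """Format a decimal string with a space thousand separator and 2 to 6 decimals."""
    # Render with the maximum number of decimals and ',' as a temporary separator.
    text = f"{Decimal(value):,.{max_decimals}f}"
    whole, frac = text.split(".")
    # Drop insignificant zeros (the '9' positions), then pad back to the
    # mandatory minimum (the '0' positions).
    frac = frac.rstrip("0").ljust(min_decimals, "0")
    return whole.replace(",", " ") + "." + frac


print(format_amount("54354.0010"))  # 54 354.001
print(format_amount("54354.0000"))  # 54 354.00
```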


Split a vector into parts separated by zeros and cumulatively sum the elements in each part

I want to split a vector into several parts separated by the numeric value 0. For each part, cumulatively calculate the sum of the elements encountered so far. Negative numbers do not participate in the calculation.
For example, with the input [0,1,1,1,0,1,-1,1,1], I expect the result to be [0,1,2,3,0,1,1,2,3].
How to implement this in DolphinDB?
Use the DolphinDB built-in function cumPositiveStreak(X). Treat the negative elements in X as NULL values.
Script:
a = 1 2 -1 0 1 2 3
cumPositiveStreak(iif(a<0,NULL,a))
Execution result:
1 3 3 0 1 3 6
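To make the expected behaviour concrete, the following plain-Python sketch reproduces the rule described in the question: reset the running sum at each 0 and skip negative values. The function name cum_positive_streak is hypothetical and the code illustrates the semantics only; it is not DolphinDB's implementation.

```python
def cum_positive_streak(values):
    """Running sum that resets at 0 and ignores negative elements."""
    total = 0
    result = []
    for v in values:
        if v == 0:
            total = 0      # a zero starts a new part
        elif v > 0:
            total += v     # positive values accumulate
        # negative values leave the running total unchanged
        result.append(total)
    return result


print(cum_positive_streak([0, 1, 1, 1, 0, 1, -1, 1, 1]))  # [0, 1, 2, 3, 0, 1, 1, 2, 3]
print(cum_positive_streak([1, 2, -1, 0, 1, 2, 3]))        # [1, 3, 3, 0, 1, 3, 6]
```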

Need a way to split a string in pandas into columns with numbers

Hi, I have a string in one column:
s='123. 125. 200.'
I want to split it into 3 columns (or as many columns as there are numbers ending with a '.').
Each column should hold a number, not a string.
From what I understand, you can use:
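import pandas as pd
s = '123. 125. 200.'
# Drop the trailing '.', split on the remaining '.', then convert each piece to a number
pd.Series(s).str.rstrip('.').str.split('.', expand=True).apply(pd.to_numeric, errors='coerce')
     0    1    2
0  123  125  200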

Unable to identify strange whitespace character in MSSQL table

We have a process that reads an XML file into our database and inserts any rows that aren't currently in another table to that table.
This process also has a trigger to write to an audit table and a nightly snapshot is also held in another table.
In the XML holding table a field looks like 1234567890123456 but it exists on our live table as 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6. Those spaces will not be removed by any combination of REPLACE functions. We have tried all CHAR values and it does not recognise the character. The audit table and nightly snapshot, however, contain the correct values.
Similarly, if we run a comparison between SELECT CASE WHEN '1234567890123456' = '1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 ' THEN 1 ELSE 0 END, this returns 1, so they match. However LEN('1234567890123456') is 16 and LEN('1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 ') is 32.
We have run some queries to loop through the characters in the field and output the ASCII and Unicode values for the characters. The digits return the correct ASCII/Unicode values, but this random whitespace character does not return a value.
An example of the incorrectly displayed one is 0x35000000320000003800000036000000380000003300000039000000370000003800000037000000330000003000000035000000340000003000000033000000 and a correct one is 0x3500320038003600380033003200300030003000360033003600380036003000. Both were added by the same means on the same day. One has the extra bytes, the other is fine.
How can we identify this character and get rid of it? Is there a reason this would have been inserted originally? How can we avoid this in future?
Data entry
It looks like some null (i.e. Char(0)) characters have got into the data.
If the data was supposed to be ASCII when it was entered but was handled as UTF-16 data, then it could be:
Entered character codes: 48 00
Sent to database: 48 00 00 00
To avoid that, remove disallowed characters as the first step in processing the input, say by using a regex to replace [\x00-\x1F] with an empty string.
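As a rough illustration of that clean-up step, here is a minimal Python sketch that strips the [\x00-\x1F] range before the value is stored. The function name strip_control_chars is hypothetical, and the import process in the question may be written in a different language. Note that this range also removes tabs and line breaks, which is fine for a digits-only field but may not be for free text.

```python
import re

# Control characters from NUL (0x00) through unit separator (0x1F).
CONTROL_CHARS = re.compile(r"[\x00-\x1F]")


def strip_control_chars(text: str) -> str:
    """Remove control characters, including embedded NULs, from the input."""
    return CONTROL_CHARS.sub("", text)


print(repr(strip_control_chars("1\x002\x003\x00")))  # '123'
```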
Data repair
Search for entries which have a Char(0) in them to confirm that they can be found that way.
If so, replace the Char(0) with an empty string.
If that doesn't work, you could convert the data to the format '0x35000000320000003800000036000000380000003300000039000000370000003800000037000000330000003000000035000000340000003000000033000000', replace '000000' with '00', and then convert back.
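The byte pattern in the question (each digit followed by three zero bytes instead of one) matches what you would get if text encoded as UTF-32 were read as UTF-16. The Python sketch below reproduces that effect on a short sample and shows the repair. The function name strip_nul_padding is hypothetical, and whether the XML import really used UTF-32 is an assumption, not something the question confirms.

```python
def strip_nul_padding(raw: bytes) -> str:
    """Decode UTF-16LE bytes (the NVARCHAR layout) and drop embedded NUL characters."""
    return raw.decode("utf-16-le").replace("\x00", "")


good = "1234".encode("utf-16-le")  # 31 00 32 00 33 00 34 00
bad = "1234".encode("utf-32-le")   # 31 00 00 00 32 00 00 00 ...

# Read as UTF-16, the bad value has a NUL after every digit, doubling its length.
print(len(good.decode("utf-16-le")), len(bad.decode("utf-16-le")))  # 4 8
print(strip_nul_padding(bad))                                       # 1234
```

This would also explain why LEN reports 32 for a 16-digit value: 16 digits plus 16 NUL characters.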

regex - match exactly 10 digits with at least one symbol or space between them

I'm trying to write a query in Oracle SQL to get rows which have invalid 10-digit numbers, i.e. with other symbols in between the digits.
For example:
(111) 111-1111 #10 digit number with some symbols and spaces in between
111-111-1111
(111)111-1111
111)111-1111
(111) 11 1-1111
i.e., it should match exactly 10-digit numbers which are non-consecutive because they have some symbols in them.
So it should not match the following example:
111 #consecutive 3 digit number
11 1 #3 digit number with spaces
11-1 #3 digit number with symbol in between
1111111111 #consecutive 10 digit number
And I'm using REGEXP_LIKE, something like this
select * from table where REGEXP_LIKE(column, ?)
Any help is much appreciated. Thanks.
You could use a combination of a regex and length; the latter to exclude a pure 10-digit number without other characters:
regexp_like(col, '^[ .()-]*(\d[ .()-]*){10}$') and length(col) > 10
In the [ .()-] class you would list all the characters that you would allow as symbols among the digits. Note that - needs to be the last in that list or else be escaped.
If you would allow any non-digit to occur among the 10 digits, you can use \D:
regexp_like(col, '^\D*(\d\D*){10}$') and length(col) > 10
So: the string should have length greater than 10, and the total number of digits must be exactly 10. This can be done without regular expressions (which should make it faster):
... where length(str) > 10 and
length(str) = 10 + length(translate(str, 'z0123456789', 'z'))
translate will translate the letter z to itself and all the other characters (digits) to nothing. Having to include the z is annoying, but unavoidable; translate will return NULL if any of its arguments is NULL. The second condition says the length of the input str is exactly 10 more than the length of the string with all digits removed - so there are exactly 10 digits.
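Both conditions (length greater than 10, exactly 10 digits) can be checked against the sample values from the question outside the database. The Python sketch below mirrors the \D-based regex and the digit-counting alternative. The function names are hypothetical, and Python's re module is used here only as a stand-in for Oracle's REGEXP_LIKE, so small differences in regex dialect are possible.

```python
import re

# Any non-digits, then exactly ten digits, each optionally followed by non-digits.
TEN_DIGITS = re.compile(r"\D*(?:\d\D*){10}", re.ASCII)


def matches_with_regex(text: str) -> bool:
    """Regex variant: exactly 10 digits and at least one other character."""
    return len(text) > 10 and TEN_DIGITS.fullmatch(text) is not None


def matches_without_regex(text: str) -> bool:
    """Counting variant: same rule, implemented by counting ASCII digits."""
    digit_count = sum(ch in "0123456789" for ch in text)
    return len(text) > 10 and digit_count == 10


for sample in ["(111) 111-1111", "111)111-1111", "11-1", "1111111111"]:
    print(sample, matches_with_regex(sample), matches_without_regex(sample))
```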

SQL - Create Unique AlphaNumeric based on a 10-digit integer stored as VARCHAR

I'm trying to emulate a function in SQL that a client has produced in Excel. In effect, they have a unique, 10-digit numeric value (VARCHAR) as the primary key in one of their enterprise database systems. Within another database, they require a unique, 5-digit alphanumeric identifier. They want that 5-digit alphanumeric value to be a representation of the 10-digit number. So what they did in Excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together.
The EXCEL equation is:
=IF(VALUE(MID(A2,1,4))>0,DEC2HEX(VALUE(MID(A2,3,2)))&DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX(VALUE(MID(A2,9,2))),DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX((VALUE(MID(A2,9,2)))))
I need the SQL equivalent of this. Of course, should someone out there know a better way to accomplish their goal of "a 5-digit alphanumeric identifier" based off the 10-digit number, I'm all ears.
ADDED 8/2/2011
First of all, thank you to everyone for the replies. Nice to see folks willing to help and even enjoying it! Based on all the responses, I'm apt to tell my client their intent is sound, only their method is off kilter. I'd also like to recommend a solution. So the challenge remains, just modified slightly:
CHALLENGE: Within SQL, take a 10 digit, unique NUMERIC string and represent it ALPHANUMERICALLY in as few characters as possible. The resulting string must also be unique.
Note that the first 3-4 characters in the 10-digit string are likely to be zeros, and that they could be stripped to shorten the resulting alphanumeric string. Not required, but perhaps helpful.
This problem is inherently impossible. You have a 10 digit numeric value that you want to convert to a 5 digit alphanumeric value. Since there are 10 numeric characters, this means that there are 10^10 = 10 000 000 000 unique values for your 10 digit number. Since there are 36 alphanumeric characters (26 letters + 10 numbers), there are 36^5 = 60 466 176 unique values for your 5 digit number. You cannot map a set of 10 billion elements into a set with around 60 million.
Now, let's take a closer look at what your client's code is doing:
"So what they did in excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together."
This isn't 100% accurate. The Excel code never uses the first 2 digits, but performs this operation on the remaining 8 (or only the last 6, when the first four digits are all zero). There are two main problems with this algorithm which may not be intuitively obvious:
Two 10 digit numbers can map to the same 5 digit number. Consider the numbers 1000000117 and 1000001701. The last four digits of 1000000117 get mapped to 1 11, whereas the last four digits of 1000001701 get mapped to 11 1. This causes both to map to 00111.
The 5 digit number may not even end up being 5 digits! For example, 1000001616 gets mapped to 001010.
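Both problems are easy to reproduce. The following Python sketch mirrors the Excel formula from the question (the function name excel_id is hypothetical) and runs it on the three numbers mentioned above.

```python
def excel_id(key: str) -> str:
    """Mirror the Excel formula: hex-encode digit pairs, skipping the leading digits."""
    # If the first four digits are non-zero, use the pairs starting at
    # (1-based) positions 3, 5, 7, 9; otherwise only those at 5, 7, 9.
    starts = (2, 4, 6, 8) if int(key[:4]) > 0 else (4, 6, 8)
    # DEC2HEX gives unpadded upper-case hex, e.g. 17 -> "11" and 1 -> "1".
    return "".join(format(int(key[i:i + 2]), "X") for i in starts)


print(excel_id("1000000117"))  # 00111
print(excel_id("1000001701"))  # 00111  (collision)
print(excel_id("1000001616"))  # 001010 (six characters, not five)
```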
So, what is a possible solution? Well, if you don't care if that 5 digit number is unique or not, in MySQL you can use something like:
hex(<NUMERIC VALUE> % 0xFFFFF)
The log of 10^10 base 2 is 33.219280948874
> return math.log(10 ^ 10) / math.log(2)
33.219280948874
> = 2 ^ 33.21928
9999993422.9114
So, it takes 34 bits to represent this number. In hex this will take 34/4 = 8.5 characters, much more than 5.
> return math.log(10 ^ 10) / math.log(16)
8.3048202372184
The Excel formula is ignoring the first 2 (or 4) characters of the 10 character string.
You could try encoding in base 36 instead of 16. This will get you to 7 characters or less.
> return math.log(10 ^ 10) / math.log(36)
6.4254860446923
The popular base 64 encoding will get you to 6 characters
> return math.log(10 ^ 10) / math.log(64)
5.5365468248123
Even Ascii85 encoding won't get you down to 5.
> return math.log(10 ^ 10) / math.log(85)
5.1829075929158
You need base 100 to get to 5 characters
> return math.log(10 ^ 10) / math.log(100)
5
There aren't 100 printable ASCII characters, so this is not going to work, as zkhr explained as well, unless you're willing to go beyond ASCII.
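For the revised challenge (a unique alphanumeric representation in as few characters as possible), the base 36 option can be sketched as follows. This is only an illustration in Python, not the SQL the question asks for; the names to_base36 and from_base36 and the digits-then-uppercase alphabet are assumptions made for the example.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(key: str) -> str:
    """Encode a numeric string as base 36; leading zeros in the key are dropped."""
    n = int(key)
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def from_base36(code: str, width: int = 10) -> str:
    """Decode back to the zero-padded numeric string."""
    return str(int(code, 36)).zfill(width)


print(len(to_base36("9999999999")))          # 7
print(from_base36(to_base36("0000012345")))  # 0000012345
```

Because the mapping is a plain change of base, distinct keys always give distinct codes, and keys with several leading zeros encode to fewer than 7 characters.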
I found your question interesting (although I don't claim to know the answer) - I googled a bit for you out of interest and found this which may help you http://dpatrickcaldwell.blogspot.com/2009/05/converting-decimal-to-hexadecimal-with.html