Azure Data Factory: Reading doubles with a comma as decimal separator instead of a dot - formatting

I'm currently trying to read a CSV file (separated with semicolons, ';') in which decimal numbers use a comma (,) as the decimal separator instead of a dot (.).
E.g., the number 12356.12 is stored as 12356,12.
In the source's projection, what would be the correct format to read the value correctly?
The format should be in Java Decimal Format.

If your CSV file's columnDelimiter is a comma (','), your first concern is making sure your number data isn't split into separate columns. Since your numbers are stored as 12356,12, my suggestions are as follows:
1. Change the columnDelimiter to | or another special character.
2. Set an escape character. Please see this description:
In addition, 12356,12 can't be identified as a Decimal automatically in ADF, and there is no mechanism to turn , into . for you. So I think you need to transfer the data as a string temporarily, then convert it into a Decimal at your destination with Java code.

The true answer is in the comments: in the copy job the culture can be defined, which influences the decimal separator. Go to "mapping" > "Type conversion settings" > "culture" and choose en-us, de-de or whatever works for you. Be aware that this will also influence other types like dates.
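If the values are instead carried through as strings, as the first answer suggests, the conversion at the destination is a small text substitution. A minimal sketch in Python rather than Java (the helper name parse_decimal_comma is hypothetical, and it assumes that any '.' in the input is a thousands separator):

```python
from decimal import Decimal


def parse_decimal_comma(text: str) -> Decimal:
    """Parse a number written with ',' as the decimal separator.

    Assumption: any '.' in the input is a thousands separator and is dropped.
    """
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


print(parse_decimal_comma("12356,12"))
```

Using Decimal rather than float keeps the value exact, which usually matters for amounts read from CSV exports.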

Related

rxTextToXdf to read commas as decimals

I have a large text file that uses commas instead of periods to indicate decimals.
Is there a way to get the rxTextToXdf function in the RevoScaleR package to view commas as periods?
I suspect I'm going to get so much flak for this post as it seems really simple
Edit:
I am currently using a workaround that involves importing the numeric columns as character type, followed by stripping the comma and replacing it with a period and then converting to numeric
library(dplyrXdf)
imported_data %>%                            # dataset with character types
  mutate_if(is.character,
            funs(gsub(",", ".", .))) %>%     # replace commas with periods
  mutate_if(is.character, as.numeric) %>%    # convert character to numeric
  persist(cleaned_file)                      # cleaned_file being a file path
It feels like there are much cleaner ways of doing this
RxTextData has a decimalPoint argument for just this purpose.
Assuming your text file is European csv (columns are ; separated, , is the decimal point):
txt <- RxTextData("your/file.txt", decimalPoint=",", delimiter=";")
xdf <- rxDataStep(txt, "imported.xdf")
# do stuff with xdf
In general, it's a good idea to use data source objects to refer to files, rather than filenames. You can also use rxDataStep for just about everything.

Fortran 90: How to correctly read an integer among other real

I have written a Fortran 90 program to filter the text output of another program and convert it to CSV form. The file contains a table with columns of various types (character, real, integer). There is a column that generally contains decimal values (probability values). BUT, in some rows, where the value should be the decimal "1.000", the value is actually the integer "1".
I use the "F5.3" specifier to read this column, and I have the same format statement for every row of the table. So, when the code finds "1", it reads ".001", because it does not find a decimal point.
What ways could I use to correctly (and generally) read integers among other decimals?
Could I specify "unformatted" input only for a number of "spaces"?
The data edit descriptor Fw.d for floating point input is normally used with d set to zero (d cannot be omitted). A nonzero d is only needed in the rare case where the floating point data is stored as scaled integers, or where you want some unit conversion applied to integer values.
You could try using list-directed input: use a * instead of a format specifier. This would be for the entire read, not selected items. Or you could read the lines into a string and test their contents to decide how to read them. If the substring has a decimal point: read (string(M:N), '(F5.3)') value. If it doesn't, use a different format, e.g., perhaps read it as F5.0.
P.S. "unformatted" means reading binary data without conversion: it is a direct copy of the data from the file to the data item. "List-directed" is the Fortran term for reading and converting data without using a format specification.
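To see why F5.3 turns a bare 1 into .001 while F5.0 does not, here is a rough Python emulation of how Fw.d input treats a field. It is only a sketch: read_fw_d is a hypothetical helper, blanks are simply ignored, and exponent forms are not handled.

```python
def read_fw_d(field: str, d: int) -> float:
    """Roughly emulate Fortran Fw.d input for a single field.

    If the field contains a decimal point, d is ignored.
    Otherwise the rightmost d digits are taken as the fractional part,
    i.e. the integer is scaled by 10**-d.
    """
    text = field.replace(" ", "")  # assumption: blanks are ignored
    if not text:
        return 0.0
    if "." in text:
        return float(text)
    return int(text) / 10 ** d


print(read_fw_d("1.000", 3))  # explicit decimal point: d has no effect
print(read_fw_d("    1", 3))  # no decimal point: scaled by 10**-3
print(read_fw_d("    1", 0))  # d = 0: the integer is read unchanged
```

With d set to zero, fields with and without a decimal point both come out as intended, which is why F5.0 is the safer choice for mixed data.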
Well, here's something new to me: F90 allows a mix of comma and space delimiters for a simple list-directed read:
read(unit,*)v1,v2,v3,v4
with input
1.222 2 , 3.14 , 4
yields
1.222000 2.000000 3.140000 4.000000

Check if character field contains only digits

I read data from an Excel file.
The columns of the internal table are all char128, and two of them contain only digits with a decimal point. So I need to check that these fields contain only digits, or digits with a decimal point.
The function module NUMERIC_CHECK can only check for digits; it is useless if the digits include a decimal point.
You may use CO (contains only):
IF value CO '1234567890.'.
  " OK
ELSE.
  " Error
ENDIF.
Maybe you also need a space in your IF ... CO statement.
This check does not detect multiple decimal points (e.g. 123.45.67.89).
Newer versions of ABAP support regular expressions.
If you also have spaces in your string, you may add them to the CO value: IF value CO '1234567890 .'.
You might try to use REGEX. The report DEMO_REGEX_TOY lets you input strings and test regular expressions against them.
Someone more experienced with REGEX in general might be able to make this a little more versatile but here's what I came up with:
-?[0-9]+(\.[0-9]*)?
-? optionally matches the '-' character (allows negative or non-negative numbers)
[0-9]+ matches digits (the + makes it match one or more)
\. matches the '.' character (the \ is needed because '.' is an operator)
[0-9]* matches any digits after the decimal point
(...)? makes the whole decimal part optional
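To compare the two approaches, here is a small sketch in Python rather than ABAP. It assumes Python's re engine treats this simple pattern the same way as the ABAP one, and the helper names contains_only and is_decimal are hypothetical. It shows that a character-set check like CO accepts 123.45.67.89, while the regular expression, matched against the whole string, rejects it.

```python
import re

# Same pattern as in the answer above.
DECIMAL_RE = re.compile(r"-?[0-9]+(\.[0-9]*)?")


def contains_only(text: str, allowed: str) -> bool:
    """Rough equivalent of a CO check: every character is in the allowed set."""
    return all(ch in allowed for ch in text)


def is_decimal(text: str) -> bool:
    """True only if the whole string matches the decimal pattern."""
    return DECIMAL_RE.fullmatch(text) is not None


for candidate in ["123.45", "-12", "123.45.67.89", ".5"]:
    print(candidate, contains_only(candidate, "1234567890."), is_decimal(candidate))
```

Note that the pattern requires at least one digit before the decimal point, so a value written as .5 is rejected even though it passes the character-set check.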
If you want to check whether the string is a valid decimal, you could use the function module 'HRCM_STRING_TO_AMOUNT_CONVERT', which converts a string to its numeric counterpart, given the string, the decimal separator and the thousands separator.
Another way to check for number:
data: float_value type f.
try.
float_value = <your string value here>.
catch cx_sy_conversion_no_number.
" not a valid number
endtry.

Hsqldb - how to remove the padding on char fields

I'm finding that Char fields are being padded.
Is there any way to stop this happening?
I've tried using the property
SET PROPERTY "sql.enforce_strict_size" FALSE
but doesn't seem to help.
Indeed, the MySQL docs specify that "When CHAR values are retrieved, trailing spaces are removed." This is odd, as other databases seem to always keep the padding (I can confirm that for Oracle). The SQL-92 standard indicates that right-padded spaces are part of the char, for example in the definition of the CAST function on p. 148. With SV the source value, TV the target value, and LTD the length of the target data type:
ii) If the length in characters of SV is larger than LTD, then
TV is the first LTD characters of SV. If any of the remaining
characters of SV are non-<space> characters, then a
completion condition is raised: warning-string data, right
truncation.
iii) If the length in characters M of SV is smaller than LTD,
then TV is SV extended on the right by LTD-M <space>s.
Maybe that's just another one of MySQL's many oddities and gotchas.
And to answer your question: if you don't want the trailing spaces, you should use VARCHAR instead.
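The two quoted rules can be illustrated with a short Python sketch. This is not how any particular database implements CHAR; cast_to_char is a hypothetical helper that only mirrors the truncate-or-pad behaviour described in the excerpt (it does not raise the truncation warning).

```python
def cast_to_char(sv: str, ltd: int) -> str:
    """Mimic the quoted CAST rules for a fixed-length CHAR(ltd) target."""
    if len(sv) > ltd:
        # Rule ii: keep only the first LTD characters.
        return sv[:ltd]
    # Rule iii: extend on the right with LTD - M spaces.
    return sv.ljust(ltd)


padded = cast_to_char("ab", 5)
print(repr(padded))           # value as stored in a CHAR(5) column
print(repr(padded.rstrip()))  # what a VARCHAR, or trimming on read, would give
```

Trimming on read, as in the last line, is a workaround when the column type cannot be changed to VARCHAR.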
I thought 'char' fields are by definition space padded to fill the field. They are considered fixed length and will be space padded to that fixed length.
The data type 'varchar' is defined as variable char where they are not space padded to fill the field.
I could be wrong though since I normally work on SQL Server.

Correct terminology for documentation [closed]

The documentation below is for a module which has now been "decommissioned",
and I'm writing its replacement.
Before I write the replacement I want to get my terms right.
I know the terms are wrong in the documentation; it was hacked together quickly
so I could instruct a colleague working on the hardware side of this project on how to use a program I'd made.
The full documentation can be found here for any who are interested (in so much as has been written and added to our wiki). The website may only be available to certain IPs (it depends on your ISP; university internet connections are most likely to work), and the SVN repo is private.
So there are a lot of terms that are wrong,
such as:
deliminators
formatted string containing value expressions (might not be wrong but is hard to say)
What are the correct terms for these?
And what other mistakes have I made?
==== formatted string containing value expressions ====
Before I start on actual programs an explanation of:
"formatted string containing value expressions" and how to encode values in them.
The ''formatted string containing value expressions'' is at the core of doing low level transmission.
We know the decimal 65, hex 41, binary 0100 0001, and the ascii character 'A' all have the same binary representation, so to tell which we are using we have a series of deliminators - numbers preceded by:
# are decimal
$ are Hex
# are binary
No deliminator, then ascii.
Putting a sign indicator after the deliminator is optional. It is required if you want to send a negative number.
You may put muliple values in the same string.
eg: "a#21#1001111$-0F"
All values in a ''formatted string containing value expressions'' must be in the range -128 to 255 (inclusive) as they must fit in 8bytes (other values will cause an error). Negative numbers have the compliment of 2 representation for their binary form.
There are some problems with ascii - characters that can't be sent (in future versions this will be fixed by giving ascii a delineator and some more code to make that deliminator work, I think).
Characters that can't be sent:
* The delineator characters: $##
* Numbers written immediately after a value that could have contained those digits:
* 0,1,2,3,4,5,6,7,8,9 for decimal
* 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,A,B,C,D,E,F for hex
* 0,1 for binary
For a start, deliminator would probably be delimiter, although I notice your text has both delineator and deliminator in it - perhaps deliminator is a special delimiter/terminator combination :-)
However, a delimiter is usually used to separate fields and is usually present no matter what. What you have is an optional prefix which dictates the following field type. So I would probably call that a "prefix" or "type prefix" instead.
The "formatted string containing value expressions" I would just call a "value expression string" or "value string" to change it to a shorter form.
One other possible problem:
must be in the range -128 to 255 (inclusive) as they must fit in 8bytes
I think you mean 8 bits.
Try something like the following:
==== Value string encoding ====
The value string is at the core of the data used for low level
transmissions.
Within the value string the following prefixes are used:
# decimal
$ Hex
# binary
No prefix - ASCII.
An optional sign may be included after the prefix for negative numbers.
Negative numbers are represented using twos complement.
The value string may contain multiple values:
eg: "a#21#1001111$-0F"
All elements of the value string must represent an 8-bit value and must
be in the range -128 to 255.
When using ASCII representation the following characters can't be sent:
* The prefix characters: $## (use prefixed hex value.)
* Numbers written immediately after a value that could have
contained those digits:
* 0,1,2,3,4,5,6,7,8,9 for decimal
* 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,A,B,C,D,E,F for hex
* 0,1 for binary
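As an illustration of the encoding described above, here is a minimal Python sketch of a parser for such a value string. It is not the original module's code. The text lists # for both decimal and binary, so this sketch assumes a distinct % prefix for binary purely so the two can be told apart; that choice, the function name parse_value_string and the error handling are all hypothetical.

```python
# Prefix -> numeric base. '%' for binary is an assumption (see the note above).
PREFIX_BASES = {"#": 10, "$": 16, "%": 2}
DIGITS = {10: "0123456789", 16: "0123456789abcdefABCDEF", 2: "01"}


def parse_value_string(text: str) -> list:
    """Turn a value string into a list of byte values (0..255)."""
    values = []
    i = 0
    while i < len(text):
        base = PREFIX_BASES.get(text[i])
        if base is None:
            # No prefix: a plain ASCII character.
            values.append(ord(text[i]))
            i += 1
            continue
        i += 1  # skip the prefix
        start = i
        if i < len(text) and text[i] in "+-":
            i += 1  # optional sign
        while i < len(text) and text[i] in DIGITS[base]:
            i += 1  # consume every digit valid in this base
        number = int(text[start:i], base)  # ValueError if no digits follow
        if not -128 <= number <= 255:
            raise ValueError(f"value {number} does not fit in 8 bits")
        values.append(number & 0xFF)  # negatives become two's complement
    return values


print(parse_value_string("a#21%1001111$-0F"))
```

The digit loop also shows why digits cannot directly follow a prefixed value as ASCII: they would be consumed as part of the number.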