rxTextToXdf to read commas as decimals - microsoft-r

I have a large text file that uses commas instead of periods to indicate decimals.
Is there a way to get the rxTextToXdf function in the RevoScaleR package to treat commas as decimal points?
I suspect I'm going to get a lot of flak for this post, as it seems really simple.
Edit:
I am currently using a workaround: I import the numeric columns as character type, replace each comma with a period, and then convert the columns to numeric:
library(dplyrXdf)
imported_data %>%  # xdf dataset with the numeric columns imported as character
  mutate_if(is.character,
            funs(gsub(",", ".", ., fixed = TRUE))) %>%  # replace commas with periods
  mutate_if(is.character, as.numeric) %>%  # convert the character columns to numeric
  persist(cleaned_file)  # cleaned_file is the output file path
It feels like there must be a much cleaner way of doing this.

RxTextData has a decimalPoint argument for just this purpose.
Assuming your text file is a European-style CSV (columns are separated by ";" and "," is the decimal point):
txt <- RxTextData("your/file.txt", decimalPoint=",", delimiter=";")
xdf <- rxDataStep(txt, "imported.xdf")
# do stuff with xdf
In general, it's a good idea to use data source objects to refer to files, rather than filenames. You can also use rxDataStep for just about everything.
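The same delimiter/decimal-point idea can be illustrated outside RevoScaleR. This is a minimal Python standard-library sketch, not RevoScaleR code: it reads a ";"-separated file and converts the named columns from decimal-comma text. The helper names, column names and sample data are hypothetical, and it assumes the numbers contain no thousands separators.

```python
import csv
import io
from typing import Iterable


def parse_decimal_comma(text: str) -> float:
    """Convert a number written with a decimal comma (e.g. "12,5") to float.

    Assumes the field contains no thousands separators.
    """
    return float(text.strip().replace(",", "."))


def read_european_csv(source: Iterable[str], numeric_columns: set[str]) -> list[dict]:
    """Read ';'-delimited rows, converting the named columns from decimal-comma text."""
    rows = []
    for row in csv.DictReader(source, delimiter=";"):
        for name in numeric_columns:
            row[name] = parse_decimal_comma(row[name])
        rows.append(row)
    return rows


# Hypothetical sample data: 'id' stays text, 'value' uses a decimal comma.
sample = io.StringIO("id;value\na;1,5\nb;12356,12\n")
rows = read_european_csv(sample, {"value"})
print(rows)  # [{'id': 'a', 'value': 1.5}, {'id': 'b', 'value': 12356.12}]
```

Only the listed columns are converted, which avoids the side effect of the mutate_if(is.character, ...) workaround in the question, where every character column, including genuine text, would be turned into numbers (or NA).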

Related

Azure Data Factory: Reading doubles with a comma as decimal separator instead of a dot

I'm trying to read CSV files (separated with semicolons ';') whose decimal numbers use a comma (,) as the decimal separator instead of a dot (.).
For example, the number 12356.12 is stored as 12356,12.
In the source's projection, what format should I use to read the value correctly?
The format has to be given as a Java DecimalFormat pattern.
If your CSV file's columnDelimiter is a comma (','), your first concern is to prevent your numbers from being split into separate columns. Since your numbers are stored as 12356,12, I suggest the following:
1. Change the columnDelimiter to | or another special character.
2. Set an escape character (see the description of the escape character setting).
In addition, ADF can't automatically recognize 12356,12 as a Decimal, and there is no built-in mechanism to turn , into . So I think you need to transfer the data as a string temporarily, then convert it to Decimal in your destination with Java code.
The real answer is in the comments: in the copy job you can define the culture, which determines the decimal separator. Go to "Mapping" > "Type conversion settings" > "Culture" and choose en-us, de-de or whatever works for you. Be aware that this also affects other types, such as dates.
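To see why the culture setting matters, here is a small Python sketch (not ADF code) that parses the same string under two separator conventions. The CULTURES table is a hypothetical, hand-written simplification covering only the grouping and decimal separators of en-US and de-DE; real culture data covers much more, including date formats.

```python
from decimal import Decimal

# Hypothetical, simplified culture table: (group separator, decimal separator).
CULTURES = {
    "en-us": (",", "."),
    "de-de": (".", ","),
}


def parse_number(text: str, culture: str) -> Decimal:
    """Parse a number string using the separators of the given culture."""
    group_sep, decimal_sep = CULTURES[culture]
    # Drop grouping separators first, then normalize the decimal separator to ".".
    normalized = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


print(parse_number("12356,12", "de-de"))  # 12356.12
print(parse_number("12356,12", "en-us"))  # 1235612
```

In this sketch, the wrong culture does not raise an error: the comma is silently read as a grouping separator, so the value comes out 100 times too large. Decimal is used instead of float so the parsed value stays exact.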

Kotlin: Printing string with array elements that cuts off left side of answers

I am writing a small text-based game to familiarize myself with Kotlin. I am building two strings that print the multiple-choice options. I have confirmed that all four array elements are read correctly, but when the strings print, the a) and c) options are cut off. I have tried \t, spaces, etc., and the same thing happens. I have also tried using print() with a \n at the end.
println(menuList[0])
println(menuList[1])
println(menuList[2])
println(menuList[3])
println("a) ${menuList[0]} b) ${menuList[1]}")
println("c) ${menuList[2]} d) ${menuList[3]}")
Output:
(screenshot: erroneous output of the multiple-choice text)
The source text came from a file that separated lines with \r\n, but the code reading it split on \n only, so every entry ended with a stray \r. When printed to a terminal, the \r moves the cursor back to the start of the line, so the text after it overwrote the beginning of the line, including the a) and c) labels.
The solution is to split on \r\n rather than \n when reading the file (Kotlin's lines() function also handles \r\n line endings correctly).
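The same bug is easy to reproduce in Python, which makes the stray \r visible. This sketch is an illustration, not the asker's Kotlin code: the menu entries are hypothetical, and str.splitlines() plays the role of Kotlin's lines().

```python
# Hypothetical menu file contents saved with Windows line endings (\r\n).
raw = "Paris\r\nLondon\r\nRome\r\nBerlin\r\n"

# Bug: splitting on "\n" alone leaves a trailing "\r" on every entry.
broken = raw.split("\n")
# In a terminal, the "\r" returns the cursor to column 0, so " b) London"
# overwrites the start of the line, hiding the "a) " label.
broken_line = f"a) {broken[0]} b) {broken[1]}"

# Fix 1: split on the full "\r\n" sequence (after trimming the final line ending).
fixed = raw.rstrip("\r\n").split("\r\n")

# Fix 2: splitlines() treats \r\n, \n and \r all as line boundaries.
menu = raw.splitlines()
print(f"a) {menu[0]} b) {menu[1]}")  # a) Paris b) London
print(f"c) {menu[2]} d) {menu[3]}")  # c) Rome d) Berlin
```

Fix 2 is the more robust choice when the same code may read files with either Windows or Unix line endings.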

I was wondering if there is any way to treat delimiters inside quotes as merely characters and not delimiters

I have a large number of files that all use the same schema. They are space-delimited. A sample file row looks like this:
1 2 abc def "g h" 3
When I try to load them with the schema INT, INT, STRING, STRING, STRING, INT, the load fails because of the space inside the quotation marks.
I know this is where the error is, because if I make a sample tab-separated instead of space-separated, no such error occurs, but converting all of my data that way is not feasible. Is there any way to indicate in a file upload that delimiters inside quotes should be treated as ordinary characters rather than as delimiters? (That is, all quoted text should be treated as one string.)
I know this feature exists for newline characters, so I was wondering whether it also exists for delimiters.
Thank you!
I figured it out. The error was caused by an extra delimiter character at the end of the file. Now I just need to trim each line of the file before uploading.
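The loader in the question isn't named, so this is only a Python standard-library illustration of both points: a CSV parser with a quote character keeps a quoted space as text, while a trailing delimiter adds an extra empty field that breaks a six-column schema. The helper parse_space_delimited is hypothetical.

```python
import csv


def parse_space_delimited(lines):
    """Return rows of fields from space-delimited lines, keeping quoted text as one field."""
    # csv.reader treats a delimiter inside double quotes as an ordinary character.
    return list(csv.reader(lines, delimiter=" ", quotechar='"'))


good = parse_space_delimited(['1 2 abc def "g h" 3'])
print(good)  # [['1', '2', 'abc', 'def', 'g h', '3']]

# A trailing delimiter produces a seventh, empty field.
bad = parse_space_delimited(['1 2 abc def "g h" 3 '])
print(bad)  # [['1', '2', 'abc', 'def', 'g h', '3', '']]

# Trimming each line first, as in the answer, restores the expected six fields.
trimmed = parse_space_delimited([line.rstrip() for line in ['1 2 abc def "g h" 3 ']])
```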

Access VBA, importing a CSV file via TransferText with comma as decimal separator and semicolon as delimiter

I'm having some problems importing double numbers from csv files. The files have a semicolon delimiter and comma as decimal separator.
I can't set up import specs, since the order of the fields in the CSV often changes and it would be a disaster if the data went into the wrong field.
Also, the CSV files have to be written to a temporary table first. Don't hate me for it, but since I have to process the data and set some information fields for later processing, this is by far the easiest, fastest and safest way to achieve it.
Here is the problem itself:
When I use TransferText, it imports the file but, of course, interprets the comma as a delimiter. Not good...
When I replace commas with full stops and semicolons with commas, the import works, but the full stops are ignored: 1.2 becomes 12 and 1.333 becomes 1333, even though the field is of type Double.
I've tested numerous things. Besides TransferText, I've tried:
DoCmd.RunSQL ("INSERT INTO Tabelle1 SELECT cdbl(a1) as aa FROM[TEXT;FMT=Delimited;HDR=YES;CharacterSet=437;DATABASE=C:\SPOT].[test.csv]")
But nothing seems to work: even when I create a new table with a field of type DOUBLE before using TransferText, the decimals are still ignored.
So I'd be happy if you could tell me either how to use TransferText (with or without first replacing semicolons and commas) or how to make the INSERT INTO approach work.
Thank you very much!
Ok, I think I got it!
The problem was the regional settings: my Access uses a comma as the decimal separator. I was also unable to create an import spec via a manual import, since that requires defining which fields are to be imported.
What I did now was this:
Open the table MSysIMEXSpecs, which contains the import specs, via a query:
select * from MSysIMEXSpecs
Then add a new row and set SpecName = "Whatever", DecimalPoint = "," and FieldSeparator = ";", plus any other settings you need.
Given that this workaround exists, isn't there an easier way to do this?

Fortran 90: How to correctly read an integer among other real

I have written a Fortran 90 program to filter the text output of another program and convert it to CSV. The file contains a table with columns of various types (character, real, integer). One column generally contains decimal values (probabilities). BUT, in some rows where the value should be the decimal "1.000", it is actually the integer "1".
I use the "F5.3" edit descriptor to read this column, with the same format statement for every row of the table. So when the code finds "1", it reads ".001", because there is no decimal point.
How can I correctly (and generally) read integers mixed in with decimal values?
Could I specify "unformatted" input for just a certain number of character positions?
For input, the data edit descriptor Fw.d is normally used with d = 0 (d cannot be omitted). A nonzero d is meant for the rare case where the floating-point data is stored as scaled integers, or where you want a unit conversion from the integer values. If the input field contains an explicit decimal point, that point overrides d, so F5.0 reads both "1" and "1.000" as 1.0.
You could try list-directed input: use a * instead of a format specifier. This applies to the entire read, not to selected items. Alternatively, you could read each line into a string and test its contents to decide how to read it. If the substring has a decimal point: read (string(M:N), '(F5.3)') value. If it doesn't, use a different format, e.g. F5.0.
P.S. "Unformatted" means reading binary data without conversion: it is a direct copy of the data from the file to the data item. "List-directed" is the Fortran term for reading and converting data without a format specification.
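As a language-neutral illustration (Python, not Fortran), this sketch mimics how Fw.d treats a field with and without a decimal point, plus the test-then-choose strategy from the answer. The helpers read_fw_d and read_field are hypothetical, and the emulation only covers plain digits with an optional sign and decimal point, assuming the default BLANK='NULL' blank handling.

```python
from decimal import Decimal


def read_fw_d(field: str, d: int) -> Decimal:
    """Roughly mimic Fortran Fw.d input for a plain numeric field.

    Blanks are ignored (as with the default BLANK='NULL'), and an all-blank
    field reads as zero. An explicit decimal point overrides d; otherwise the
    digits are treated as a scaled integer and multiplied by 10**-d.
    Exponent forms (e.g. 1.0E2) are not handled in this sketch.
    """
    text = field.replace(" ", "")
    if not text:
        return Decimal(0)
    if "." in text:
        return Decimal(text)
    return Decimal(text).scaleb(-d)


def read_field(field: str) -> Decimal:
    """Test for a decimal point first, then pick F5.3 or F5.0 accordingly."""
    return read_fw_d(field, 3 if "." in field else 0)


print(read_fw_d("    1", 3))  # 0.001
print(read_field("    1"))    # 1
print(read_field("0.250"))    # 0.250
```

Because an explicit decimal point overrides d, calling read_fw_d(field, 0) alone would already handle both "1" and "1.000"; the branch in read_field only matters if some fields genuinely hold scaled integers.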
Well, here's something new to me: F90 allows a mix of comma and space delimiters in a simple list-directed read:
read(unit,*)v1,v2,v3,v4
with input
1.222 2 , 3.14 , 4
yields
1.222000 2.000000 3.140000 4.000000
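A rough Python emulation of the separator rule at work here: a comma with optional blanks around it, or blanks alone, separates values, and two commas in a row mark a null value. The regex and the helper split_list_directed are illustrative assumptions, not Fortran's implementation; the sketch ignores repeat counts (r*c), the slash terminator and quoted character values.

```python
import re
from typing import Optional

# A list-directed value separator: a comma with optional surrounding blanks,
# or one or more blanks on their own.
SEPARATOR = re.compile(r"\s*,\s*|\s+")


def split_list_directed(line: str) -> list[Optional[str]]:
    """Split a line on list-directed separators; None marks a null value."""
    return [tok or None for tok in SEPARATOR.split(line.strip())]


values = split_list_directed("1.222 2 , 3.14 , 4")
print([float(v) for v in values])  # [1.222, 2.0, 3.14, 4.0]
```

In real list-directed input, a null value leaves the corresponding variable unchanged; here it simply shows up as None.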