Exporting amounts using space as '000 delimiter - pandas

I would like all amounts exported to Excel to use space as '000 delimiter and ',' for decimal. E.g: "3 257 132,54" (common number format in Europe)
I tried to adapt the example provided on xlsxwriter.readthedocs.io:
format1 = workbook.add_format({'num_format': '#,##0.00'})
as follows:
format1 = workbook.add_format({'num_format': '# ##0,00'})
I am using the code from the xlsxwriter doc. I just modified the '000 delimiter and the decimal point:
# Add some cell formats.
format1 = workbook.add_format({'num_format': '# ##0,00'})
# Set the column width and format.
worksheet.set_column('B:B', 18, format1)
I obtain a very surprising result. The example above appears in Excel as: 3257 132,54.
Almost right: the '000 separator is applied once, for thousands, but not for millions or billions. (NB: the comma as decimal separator works fine.)
Is there a trick I missed?

You just need to use whatever number format that you would use in Excel for this. Probably something like ### ### ###.00 (although it doesn't have a comma for a decimal):
import xlsxwriter
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
format1 = workbook.add_format({'num_format': '### ### ###.00'})
worksheet.set_column('B:B', 18, format1)
worksheet.write(0, 1, 123.123)
worksheet.write(1, 1, 1234.123)
worksheet.write(2, 1, 12345.123)
worksheet.write(3, 1, 123456.123)
worksheet.write(4, 1, 1234567.123)
worksheet.write(5, 1, 12345678.123)
workbook.close()
You can find the exact number format you need by setting it in Excel and then checking what it is in the custom section of the number format dialog.
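To know what the target rendering should look like before checking it in Excel, the expected text can be produced in plain Python. This is only a sketch: `format_amount` is a hypothetical helper, not part of xlsxwriter or pandas, and it returns a string rather than a numeric cell, so it is useful for comparison, not as a replacement for a number format.

```python
def format_amount(value):
    """Render a number as text with a space per thousands group and ',' as decimal mark."""
    # Start from the built-in US-style grouping, e.g. '3,257,132.54'.
    us_style = f"{value:,.2f}"
    # Swap the separators: ',' becomes ' ' first, then '.' becomes ','.
    return us_style.replace(",", " ").replace(".", ",")


print(format_amount(3257132.54))
```

The order of the two `replace` calls matters: replacing the decimal point first would turn it into a comma that the second call then converts to a space.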


The as.Date() Function does not work, my characters remain characters

I have an Excel file with a column containing the date and hour of each measurement in the format 01.01.2018 01:00.
The first 3 rows contain characters, and the whole column is formatted as "Number" (in Excel/LibreOffice).
If I try to read the xlsx file with readxl:
NO2_2018 <- read_excel("NO2_2018.xlsx", sheet = "Seite 1",
range = "A2:AU8762", col_types = c("date",
"numeric", ....)
I get NA values (the column is POSIXct) and the warning
Expecting date in .... / .....: got '03.01.2018 02:00'
Then I thought I would read it as text and convert it with the as.Date() function:
as.Date(NO2_2018$Zeitpunkt,format = "%d.%m.%Y% H:%M", tz="CEST")
However, it does not change the class
class(NO2_2018$Zeitpunkt)
[1] "character"
Have you tried replacing the dots in the date and then calling as.Date on the transformed variable?
# fixed = TRUE treats "." as a literal dot rather than a regex wildcard
gsub(".", "/", date, fixed = TRUE)
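The format string is the part most likely to go wrong here. The directives `%d`, `%m`, `%Y`, `%H` and `%M` mean the same thing in Python's `datetime.strptime`, so the pattern can be checked against the value from the warning with a short Python sketch. This is an illustration only: it is not R, and the pattern `"%d.%m.%Y %H:%M"` is an assumption based on the sample value.

```python
from datetime import datetime

# Day.month.year, a space, then hour:minute (assumed from '03.01.2018 02:00').
FORMAT = "%d.%m.%Y %H:%M"

parsed = datetime.strptime("03.01.2018 02:00", FORMAT)
print(parsed.isoformat())
```

In R, the converted value also has to be assigned back (for example `NO2_2018$Zeitpunkt <- ...`); calling a conversion function without assigning the result leaves the original character column unchanged.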

Sqldf in R - error with first column names

Whenever I use read.csv.sql I cannot select from the first column, and any output from the code places an unusual character (A(tilde)-..) at the beginning of the first column's name.
So suppose I create a df.csv file in Excel that looks something like this
df = data.frame(
a = 1,
b = 2,
c = 3,
d = 4)
Then if I use sqldf to query the csv which is in my working directory I get the following error:
> read.csv.sql("df.csv", sql = "select * from file where a == 1")
Error in result_create(conn@ptr, statement) : no such column: a
If I query a different column than the first, I get a result but with the output of the unusual characters as seen below
df <- read.csv.sql("df.csv", sql = "select * from file where b == 2")
View(df)
Any idea how to prevent these characters from being added to the first column name?
Presumably the file is larger than R can handle, so you only want to read a subset of rows into R. The condition that selects those rows has to refer to the first column, whose name is mangled, so you can't use it.
Here are two alternative approaches. The first one involves a bit more code but has the advantage that it is 100% R. The second one is only one statement and also uses R, but additionally makes use of an external utility.
1) Skip header. Read the file, skipping over the header. That will cause the columns to be labelled V1, V2, etc., so use V1 in the condition.
# write out a test file - BOD is a data frame that comes with R
write.csv(BOD, "BOD.csv", row.names = FALSE, quote = FALSE)
# read file skipping over header
DF <- read.csv.sql("BOD.csv", "select * from file where V1 < 3",
skip = 1, header = FALSE)
# read in header, assign it to DF and fix first column
hdr <- read.csv.sql("BOD.csv", "select * from file limit 0")
names(DF) <- names(hdr)
names(DF)[1] <- "TIME" # suppose we want TIME instead of Time
DF
##   TIME demand
## 1    1    8.3
## 2    2   10.3
2) Filter. Another way to proceed is to use the filter= argument. Here we assume we know that the column name ends in ime but is preceded by other characters that we don't know. This assumes that sed is available and on your path. If you are on Windows, install Rtools to get sed. The quoting might need to be changed depending on your shell.
When trying this on Windows I noticed that sed from Rtools changed the line endings, so below we specify eol= to ensure correct processing. You may not need that.
DF <- read.csv.sql("BOD.csv", "select * from file where TIME < 3",
filter = 'sed -e "1s/.*ime,/TIME,/"' , eol = "\n")
DF
##   TIME demand
## 1    1    8.3
## 2    2   10.3
So I figured it out by reading through the above comments.
I'm on a Windows 10 machine using Excel for Office 365. The special characters go away when I save the file as "CSV (Comma delimited)" instead of "CSV UTF-8 (Comma Delimited)".
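The fix is consistent with a UTF-8 byte order mark (BOM): Excel's "CSV UTF-8" option writes one at the start of the file, and a reader that does not strip it treats it as part of the first column name. The effect can be reproduced with a short Python sketch (an illustration of the mechanism, not of sqldf itself; the `header` helper is hypothetical):

```python
import csv
import io

# Encode a small CSV the way "CSV UTF-8" does: 'utf-8-sig' prepends a BOM.
raw = "a,b,c,d\n1,2,3,4\n".encode("utf-8-sig")


def header(raw_bytes, encoding):
    """Decode the bytes with the given encoding and return the header row."""
    text = io.StringIO(raw_bytes.decode(encoding))
    return next(csv.reader(text))


with_bom = header(raw, "utf-8")         # BOM survives as '\ufeff' in the first name
without_bom = header(raw, "utf-8-sig")  # BOM is stripped while decoding

print(repr(with_bom[0]), repr(without_bom[0]))
```

Only the first column is affected, which matches the symptom that a query on column b works while a query on column a fails.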

Changing the order of columns in a CSV file in VB.NET

I have CSV files, output by a piece of software, without headers.
I need to change the order of the columns based on a config file:
initial-column Final-Column
1 5
2 3
3 1
Any ideas how to go about this?
There is very little to go on, such as how the config file works and what the data looks like.
Note that with the layout {1, 5, 2, 3, 3, 1} you aren't just reordering the columns: it drops one (4) and duplicates columns 1 and 3.
Using some fake random data left over from this answer, this reads it in, then writes it back out in a different order. You will have to modify it to take the config file into consideration.
Sample data:
Ndxn fegy n, 105, Imaypfrtzghkh, -1, red, 1501
Mfyze, 1301, Kob dlfqcqtkoccxwbd, 0, blue, 704
Xe fnzeifvpha, 328, Mnarhrlselxhcyby hq, -1, red, 1903
Imports System.IO   ' at the top of the file; needed for File and StreamWriter

Dim csvFile As String = "C:\Temp\mysqlbatch.csv"
Dim lines = File.ReadAllLines(csvFile)
Dim outFile As String = "C:\Temp\mysqlbatch2.csv"
Dim data As String()
Dim format As String = "{0}, {4}, {1}, {2}, {2}, {0}"
Using fs As New StreamWriter(outFile, False)
    For Each s As String In lines
        ' not the best way to split a CSV,
        ' no OP data to know if it will work
        data = s.Split(","c)
        ' specify the columns to write in
        ' the order desired
        fs.WriteLine(String.Format(format,
                                   data(0),
                                   data(1),
                                   data(2),
                                   data(3),
                                   data(4),
                                   data(5)
                                   )
                     )
    Next
End Using
This approach uses the format string and placeholder ({N}) to control the order. The placeholders and array elements are all zero based, so {1, 5, 2, 3, 3, 1} becomes {0, 4, 1, 2, 2, 0}. Your config file contents could simply be a collection of these format strings. Note that you can have more args to String.Format() than there are placeholders but not fewer.
Output:
Ndxn fegy n, red, 105, Imaypfrtzghkh, Imaypfrtzghkh, Ndxn fegy n
Mfyze, blue, 1301, Kob dlfqcqtkoccxwbd, Kob dlfqcqtkoccxwbd, Mfyze
Xe fnzeifvpha, red, 328, Mnarhrlselxhcyby hq, Mnarhrlselxhcyby hq, Xe fnzeifvpha
Splitting the incoming data on the comma (s.Split(","c)) will work in many cases, but not all. If the data contains commas (as in some currencies, "1,23") it will fail. In that case the separator char is usually ";" instead, but the data can have commas for other reasons ("Jan 22, 2016" or "garden hose, green"). The data may have to be split differently.
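For comparison, the same reordering idea can be sketched in Python, where the standard csv module handles quoted fields that contain commas. This is a sketch under assumptions, not a translation of the VB.NET answer: `reorder_rows` is a hypothetical helper, the order is given as zero-based indices, and fields with embedded commas are assumed to be quoted.

```python
import csv
import io


def reorder_rows(csv_text, order):
    """Return csv_text with each row's columns rearranged per zero-based `order`."""
    # skipinitialspace lets a quoted field start after ', ' as in the sample data.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # An index may repeat (duplicating a column) or be absent (dropping one).
        writer.writerow([row[i] for i in order])
    return out.getvalue()


sample = 'Mfyze, 1301, "garden hose, green", 0, blue, 704\n'
result = reorder_rows(sample, [0, 4, 1, 2, 2, 0])
print(result)
```

The index list plays the role of the format string: it could be read from the config file instead of being hard-coded.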
Note: All the OPs previous posts are vba related. The title includes VB.NET and is tagged vb.net, so this is a VB.NET answer

How to load 2D array from a text(csv) file into Octave?

Consider the following text(csv) file:
1, Some text
2, More text
3, Text with comma, more text
How to load the data into a 2D array in Octave? The number can go into the first column, and all text to the right of the first comma (including other commas) goes into the second text column.
If necessary, I can replace the first comma with a different delimiter character.
AFAIK you cannot put strings of different sizes into an array. You need to create a so-called cell array.
A possible way to read the data from your question stored in a file Test.txt into a cell array is
t1 = textread("Test.txt", "%s", "delimiter", "\n");
for i = 1:length(t1)
  j = findstr(t1{i}, ",")(1);
  T{i,1} = t1{i}(1:j - 1);
  T{i,2} = strtrim(t1{i}(j + 1:end));
end
Now
T{3,1} gives you 3 and
T{3,2} gives you Text with comma, more text.
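The core of the approach above is splitting each line at the first comma only. For readers comparing languages, the same step can be sketched in Python (an illustration, not Octave code; `split_first_comma` is a hypothetical helper and the lines are inlined rather than read from Test.txt):

```python
def split_first_comma(line):
    """Split a line into (number, text) at the first comma only."""
    # maxsplit=1 leaves any later commas inside the text part.
    number, text = line.split(",", 1)
    return int(number), text.strip()


lines = ["1, Some text", "2, More text", "3, Text with comma, more text"]
table = [split_first_comma(line) for line in lines]
print(table[2])
```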
After many long hours of searching and debugging, here's how I got it to work on Octave 3.2.4. Using | as the delimiter (instead of comma).
The data file now looks like:
1|Some text
2|More text
3|Text with comma, more text
Here's how to call it: data = load_data('data/data_file.csv', NUMBER_OF_LINES);
Limitation: You need to know how many lines you want to get. If you want to get all, then you will need to write a function to count the number of lines in the file in order to initialize the cell_array. It's all very clunky and primitive. So much for "high level languages like Octave".
Note: After the unpleasant exercise of getting this to work, it seems that Octave is not very useful unless you enjoy wasting your time writing code to do the simplest things. Better choices seem to be R, Python, or C#/Java with a Machine Learning or Matrix library.
function all_messages = load_data(filename, NUMBER_OF_LINES)
  fid = fopen(filename, "r");
  all_messages = cell (NUMBER_OF_LINES, 2 );
  counter = 1;
  line = fgetl(fid);
  % fgetl returns -1 (a number, not a string) at end of file
  while ischar(line)
    separator_index = index(line, '|');
    all_messages {counter, 1} = substr(line, 1, separator_index - 1); % Up to the separator
    all_messages {counter, 2} = substr(line, separator_index + 1, length(line) - separator_index); % After the separator
    counter++;
    line = fgetl(fid);
  endwhile
  fprintf("Processed %i lines.\n", counter -1);
  fclose(fid);
end

Converting dynamic, nicely formatted tabular data in Python to str.format()

I have the following Python 2.x code, which generates a header row for tabular data:
headers = ['Name', 'Date', 'Age']
maxColumnWidth = 20 # this is just a placeholder
headerRow = "|".join( ["%s" % k.center(maxColumnWidth) for k in headers] )
print(headerRow)
This code outputs the following:
        Name        |        Date        |        Age         
Which is exactly what I want - the data is nicely formatted and centered in columns of width maxColumnWidth. (maxColumnWidth is calculated earlier in the program)
According to the Python docs, you should be able to do the same thing in Python3 with curly brace string formatting, as follows:
headerRow = "|".join( ["{:^maxColumnWidth}".format(k) for k in headers] )
However, when I do this, I get the following:
ValueError: Invalid conversion specification
But, if I do this:
headerRow = "|".join( ["{:^30}".format(k) for k in headers] )
Everything works fine.
My question is: how do I use a variable in the format string instead of a hard-coded integer?
headerRow = "|".join( ["{:^maxColumnWidth}".format(k) for k in headers] )
headers = ['Name', 'Date', 'Age']
maxColumnWidth=21
headerRow = "|".join( "{k:^{m}}".format(k=k,m=maxColumnWidth) for k in headers )
print(headerRow)
yields
        Name         |        Date         |         Age         
You can represent the width maxColumnWidth as {m}, and then substitute the value through a format parameter.
No need to use brackets (list comprehension) inside the join. A generator expression (without brackets) suffices.
As it says, your conversion specification is invalid. "maxColumnWidth" is not a valid conversion specification.
>>> "{:^{maxColumnWidth}}".format('foo', maxColumnWidth=10)
'   foo    '
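On Python 3.6 or later, the same nested-field idea also works inside an f-string, which saves passing the width to .format(). A minimal sketch, assuming a hypothetical `header_row` helper that is not part of the original code:

```python
def header_row(headers, width):
    """Centre each header in a column of `width` characters, joined by '|'."""
    # The inner {width} is evaluated first and becomes part of the format spec.
    return "|".join(f"{name:^{width}}" for name in headers)


row = header_row(["Name", "Date", "Age"], 20)
print(row)
```

When the padding is odd, the ^ alignment puts the extra space on the right, so the result can differ by one position from str.center() for some width and string combinations.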