I want to read in data from multiple files that I want to use for plotting (matplotlib).
I found a function loadtxt() that I could use for this purpose. However, I only want to read in one column from each file.
How would I do this?
The following command works for me if I read in at least 2 columns, for example:
numpy.loadtxt('myfile.dat', usecols=(2,3))
But
numpy.loadtxt('myfile.dat', usecols=(3))
would throw an error.
You need a comma after the 3 in order to tell Python that (3,) is a tuple. Python interprets (3) to be the same value as the int 3, and loadtxt wants a sequence-type argument for usecols.
numpy.loadtxt('myfile.dat', usecols=(3,))
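To see the difference for yourself, here is a minimal sketch. The in-memory sample data is made up and only stands in for 'myfile.dat'. As far as I know, newer NumPy releases also accept a plain integer for usecols, but the tuple form works either way.

```python
import io
import numpy as np

# (3) is just the int 3 in parentheses; (3,) is a one-element tuple.
print(type((3)).__name__, type((3,)).__name__)  # int tuple

# Hypothetical stand-in for 'myfile.dat': two rows of four whitespace-separated columns.
sample = io.StringIO("1 2 3 4\n5 6 7 8\n")

# Columns are zero-indexed, so 3 selects the fourth column.
column = np.loadtxt(sample, usecols=(3,))
print(column)  # [4. 8.]
```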
I need to be able to parse 2 different types of CSVs with read_csv, the first has ;-separated values and the second has ,-separated values. I need to do this at the same time.
That is, the CSV can have this format:
some;csv;values;here
or this:
some,csv,values,here
or even mixed:
some;csv,values;here
I tried many things like the following regex but nothing worked:
data = pd.read_csv(csv_file, sep=r'[,;]', engine='python')
Am I doing something wrong with the regex?
Instead of reading from a file, I ran your code sample
reading from a string:
import io
import pandas as pd

txt = '''C1;C2,C3;C4
some;csv,values;here
some1;csv1,values1;here1'''
data = pd.read_csv(io.StringIO(txt), sep='[,;]', engine='python')
and got a proper result:
C1 C2 C3 C4
0 some csv values here
1 some1 csv1 values1 here1
Note that the sep parameter can even be an ordinary (not raw) string,
because it does not contain any backslashes.
So your idea to specify multiple separators as a regex pattern is OK.
The reason that your code failed is probably an "inconsistent" division of
lines into fields. Make sure that each line contains the same number of
separators (commas and semicolons combined), or at least no more than the header line.
Look thoroughly at your stack trace. It should include some information
about which line of the source file caused the problem.
Then look at the indicated line and correct it.
Edit
To see what happens in a "failure case", I changed the source string to:
txt = '''C1;C2,C3;C4
some;csv,values;here
some1;csv1,values1;here1
some2;csv2,values2;here2,xxxx'''
i.e. I added one line with 5 fields (one too many).
Executing the above code then results in an error message:
ParserError: Expected 4 fields in line 4, saw 5. ...
Note the words "in line 4", which precisely indicate the offending input line
(line numbers start from 1).
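If you would rather locate such lines before handing the data to read_csv, a rough pre-check could look like the sketch below. The helper name find_bad_lines is my own, not part of pandas, and it assumes the fields contain no quoted separators.

```python
import re

def find_bad_lines(text, sep_pattern=r"[,;]"):
    """Return (line_number, field_count) for rows whose field count differs from the header's."""
    lines = text.splitlines()
    expected = len(re.split(sep_pattern, lines[0]))
    bad = []
    # Line numbers are 1-based and the header is line 1, so data rows start at 2.
    for number, line in enumerate(lines[1:], start=2):
        count = len(re.split(sep_pattern, line))
        if count != expected:
            bad.append((number, count))
    return bad

txt = """C1;C2,C3;C4
some;csv,values;here
some1;csv1,values1;here1
some2;csv2,values2;here2,xxxx"""

print(find_bad_lines(txt))  # [(4, 5)]
```

The reported line number matches the one in the ParserError message, so you can use either to find the row to fix.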
I have a csv file, the one from https://www.kaggle.com/jolasa/waves-measuring-buoys-data-mooloolaba/downloads/waves-measuring-buoys-data-mooloolaba.zip/1. The first entries look like this:
The first column has dates which I'm trying to read with this command:
matrix = dlmread ('waves-measuring-buoys-data/WavesMooloolabaJan2017toJun2019.csv',',',1,0);
(If referring to file on Kaggle, note that I slightly modified the directory and file names for ease of reading)
Then when I check a date by printing matrix(2,1), I get 1 instead of 01/01/2017 00:00.
How do I get the correct format?
csvread (and dlmread, which it wraps) is only for numeric inputs.
Use csv2cell from the io package instead to obtain your data as strings, and then perform any necessary string operations and conversions.
I want to create a mathematical calculator in Objective-C. I need it to run
from the command line. The user will enter an expression such as 4 + 2 * 12. The program should evaluate 2 * 12 first because multiplication takes precedence over addition. How would I create a command-line program that produces its output according to mathematical order of precedence? For example, it should solve whatever is in the brackets first, then anything that is multiplied or divided, and so on.
There are multiple programs available online for this; here is just one: http://www.wikihow.com/Make-a-Command-Prompt-Calculator. On the CMD command line you can specify precedence simply by using parentheses, for example C:> set /a ((2*12)+4), replacing the hard-coded values with values passed in through variables.
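If you want to implement the precedence rules yourself, one common approach is a small recursive-descent evaluator with one function per precedence level. The sketch below is in Python rather than Objective-C, purely to show the structure. The function names are my own, and it only handles numbers, + - * /, unary minus and parentheses.

```python
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")

def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        # Highest precedence: numbers, unary minus, parenthesised expressions.
        token = take()
        if token == "(":
            value = expression()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token == "-":
            return -factor()
        return float(token)  # raises ValueError for a stray operator

    def term():
        # Middle precedence: multiplication and division, left to right.
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def expression():
        # Lowest precedence: addition and subtraction, left to right.
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("4 + 2 * 12"))    # 28.0
print(evaluate("(4 + 2) * 12"))  # 72.0
```

Because expression() calls term() and term() calls factor(), the tighter-binding operators are always evaluated first. The same three-function layout should carry over to Objective-C more or less directly.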
I am trying to read a binary file with Python. This is the code I use:
fb = open(Bin_File, "r")
a = numpy.fromfile(fb, dtype=numpy.float32)
However, I get zero values at the end of the array. For example, in a case where nrows=296 and ncol=439, so that the expected length is 296*439, I get zero values for a[-922:]. I know from a trusted piece of R code that these values should be noData (-9999 in this example). Does anybody know why I am getting these nonsense zeros?
P.S.: I am not sure whether it is related or not, but len(a) is nrows*ncols+2! I have to get rid of these two using a = a[0:-2] so that I don't get an error when I reshape the array into rows and columns using a_reshape = a.reshape(nrows, ncols).
When opening a file for reading as binary you should use the mode "rb" instead of "r".
Here is some background from the docs. On Linux machines you don't need the "b", but it won't hurt. On Windows machines you must use "rb" for binary files.
Also note that the two extra entries you're getting are a common bug/feature of the "unformatted" binary output format of Fortran. Each write statement issued in this mode produces a record that is surrounded by two 4-byte blocks.
These blocks hold integers giving the number of bytes in the block of unformatted data. For example, [223] [223 bytes of data] [223].
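As an illustration of that record layout, here is a minimal sketch that reads one such record with the standard struct module. It assumes 4-byte little-endian markers and float32 data, and the helper name read_fortran_record is my own. Your file may use a different byte order or marker size.

```python
import io
import struct

def read_fortran_record(stream, marker_format="<i"):
    """Read one [length][payload][length] record and return the payload bytes."""
    marker_size = struct.calcsize(marker_format)
    header = stream.read(marker_size)
    if len(header) < marker_size:
        raise EOFError("no complete record marker left in the stream")
    (length,) = struct.unpack(marker_format, header)
    payload = stream.read(length)
    (trailer,) = struct.unpack(marker_format, stream.read(marker_size))
    if trailer != length:
        raise ValueError("leading and trailing record markers disagree")
    return payload

# Build a fake record in memory instead of reading a real file (made-up values).
values = (1.5, -9999.0, 3.25)
payload = struct.pack("<3f", *values)
marker = struct.pack("<i", len(payload))
stream = io.BytesIO(marker + payload + marker)

data = read_fortran_record(stream)
print(struct.unpack(f"<{len(data) // 4}f", data))  # (1.5, -9999.0, 3.25)
```

Read as float32, the two 4-byte markers would show up as exactly two extra array entries, which matches the nrows*ncols+2 length in the question.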
I am trying to read a file into my code.
There are 2 subroutines: one writes a file and the other reads it.
The writing part was:
write(*,*)'entered refile, shall make file'
ileunitA=int(presentstep)
write(fname,1012)ileunitA
1012 format('DATA_',i6.6,'.dat')
write(fnam,1112)index
1112 format('pp',i3.3)
open(UNIT=ileunitA,FILE=fname)
!variables from module global
write(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
write(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
write(ileunita,*) scon,smomu,smomv,smomw
...
The reading part was as follows (in another subroutine):
ileunita=25;
open(unit=ILEUNITA,file='DATA_010500.dat')
!variables from module global
read(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
read(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
read(ileunita,*) scon,smomu,smomv,smomw
...
When I run the code, it shows the following error:
At line 3682 of file bub2.f90 (unit = 25, file = 'DATA_000001.dat')
Fortran runtime error: Bad repeat count in item 1 of list input
Can anyone help me figure out what the problem could be? Also, what is a 'repeat count', and what makes a repeat count 'bad'? Thanks
Guessing a little (you could show the text in the problematic line in your question...), but you are using list directed input (and output) with the * as the second specifier in the read (and write) statements. List directed input allows multiple fields that have the same value to be represented using the syntax r*c, where r is a numeric repeat count and c is the value to be repeated.
If any of your output items generate a field that contains a * then that could be confusing the processing of input.
(It is permissible (though rare) for a processor to represent multiple output fields that have the same value using a repeat count, for example WRITE (unit,*) 23, 23, 23, 23 could result in an input file that contains the text 4*23.)
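To make the r*c notation concrete, here is a rough Python sketch of how such fields expand. It is only an illustration of the syntax, not how any Fortran runtime actually implements it, and it ignores quoted strings, null values and the / terminator.

```python
def expand_list_directed(line):
    """Expand r*c fields: '4*23' becomes four copies of '23'."""
    values = []
    for field in line.replace(",", " ").split():
        if "*" in field:
            count, _, value = field.partition("*")
            # Whatever precedes the * must be a positive integer repeat count.
            if not count.isdigit() or int(count) == 0:
                raise ValueError(f"bad repeat count in field {field!r}")
            values.extend([value] * int(count))
        else:
            values.append(field)
    return values

print(expand_list_directed("4*23 7"))  # ['23', '23', '23', '23', '7']

try:
    expand_list_directed("ab*c 1.0")  # hypothetical field that merely contains a *
except ValueError as error:
    print(error)  # bad repeat count in field 'ab*c'
```

The second call shows the kind of situation the error message complains about: the text before the * is treated as a repeat count, and it is not a valid one.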
List directed input also has some other features, such as the handling of delimiter characters, the / character causing input processing to terminate, and the possibility and handling of null values. Some of these features may surprise those not familiar with the rules (which are inspired by typical shortcuts taken when input was submitted via punched cards), which is why it is often better to avoid list directed input and output and use an explicit format instead.
If any of your data fields are of type character you should consider using a non-default DELIM mode to avoid any special characters within the character variable value from confusing the input processing.