Let's say you only want to parse the start of a large file using a Perl 6 grammar. To avoid reading the whole file into a string and then calling subparse on that string, is it possible to do a subparse while reading the file?
I could not find any subparsefile() method in the Grammar class, so I guess this is difficult to implement. But it should be possible in theory, see for example How do I search a file for a multiline pattern without reading the whole file into memory?
Currently you can't. Parsing anything at the moment requires the entire string to exist in memory.
Having said that, if you know the maximum number of lines your pattern may span, you could do something like:
my $max = 3; # maximum number of lines
for "textfile".IO.lines(:!chomp).rotor( $max => -$max + 1 ) -> @lines {
    $grammar.subparse( @lines.join )
# and whatever you would like to do
}
It wouldn't be the fastest way of doing it, but it would not have to read the whole file into memory.
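For readers who don't know rotor, here is a rough Python sketch of the same overlapping-window idea. It is not Perl 6 and not a grammar: sliding_windows and find_in_file are made-up names, and a compiled regular expression anchored at the start of each window stands in for subparse.

```python
from collections import deque
from itertools import islice


def sliding_windows(lines, size):
    """Yield every run of `size` consecutive lines, joined into one string.

    Only `size` lines are held in memory at a time. Like rotor's default
    behaviour, a file with fewer than `size` lines yields nothing.
    """
    lines = iter(lines)
    window = deque(islice(lines, size), maxlen=size)
    if len(window) == size:
        yield "".join(window)
    for line in lines:
        window.append(line)  # the oldest line falls out automatically
        yield "".join(window)


def find_in_file(path, pattern, size):
    """Return the first match of `pattern` anchored at the start of a window."""
    with open(path, newline="") as handle:  # newline="" keeps line endings
        for text in sliding_windows(handle, size):
            match = pattern.match(text)
            if match:
                return match
    return None
```

Each window advances by one line, so a pattern spanning up to size lines is seen in full no matter which line it starts on.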
I'm trying to store my results in an NSArray and save it as a CSV file using Objective-C, but I can't seem to find a relevant solution. Please find the sample code below:
int a=5,b=10;
int c=b-a;
double d=4.5,e=3.0;
double h=d-e;
NSLog(@"host_port:%f", c);
NSLog(@"host_size:%d", h;
I would like to store my values c and h in an array and write that to a CSV file. Any advice on this would be helpful.
Thanks in advance.
When you ask a question on SO you need to show effort: code you've tried, details of what you've read. If you don't, you'll get down votes and close votes (you have one of each as I write this). The code you have included has nothing to do with CSV or arrays, and is not even valid code (the formats are wrong).
That said, let's see if we can give you something to get you going.
A CSV file is just plain text. You don't need any packages to write one; the standard I/O routines will do the job. You also do not need to store all the values in an array and then output the array, or build up a string version of the whole CSV file and output that. You can output items as they are generated if you wish, and it may be more efficient to do so. Your code fragment has only two values; maybe you intend it to be the core of a loop. Given those values, we assume you want the CSV file:
host_port,host_size
5,1.5
Your values have basic types, int and double; they are not Objective-C object types. Given this, you can use the standard C I/O operations to produce your file.
First you may need to obtain the destination file name from the user. Assuming this is a GUI app, look up NSSavePanel for this. That will give you an NSURL from which you can obtain the file path as an NSString, and you can convert that into a C string using NSString methods.
Now you can enter the C I/O world. To find the documentation on the following functions, open the Terminal and use the man command, e.g. man fopen.
To create the file and open it for writing, use fopen(), passing it the C string pathname you obtained above.
To write the headers and each row of data use fprintf(). This takes a format string just like NSLog(), but you must remember to explicitly include the line breaks by using \n in the format.
When you've finished close the file with fclose().
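The open, write and close sequence looks roughly like this. This is a Python sketch of the idea, not the C stdio calls named above: write_csv and the file name hosts.csv are made up for illustration, and the column names come from the example file shown earlier.

```python
def write_csv(path, rows):
    """Write a header line, then one line per (host_port, host_size) pair.

    Rows are written as they are produced, so `rows` can be a generator
    and nothing has to be collected in an array first.
    """
    # newline="" writes "\n" exactly as given on every platform
    with open(path, "w", newline="") as out:
        out.write("host_port,host_size\n")  # line breaks must be explicit
        for host_port, host_size in rows:
            out.write(f"{host_port:d},{host_size:g}\n")
    # leaving the with-block closes the file (the fclose() step)


if __name__ == "__main__":
    a, b = 5, 10
    d, e = 4.5, 3.0
    write_csv("hosts.csv", [(b - a, d - e)])
```

In C the same three steps map onto fopen(), fprintf() and fclose().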
Now go read the documentation and write your CSV file!
HTH
Please note that I'm not asking how to append texts at the end of the file. I'm asking how to prepend texts to the beginning of file.
let handle = try FileHandle(forWritingTo: someFile)
//handle.seekToEndOfFile() // This is for appending
handle.seek(toFileOffset: 0) // Me trying to seek to the beginning of file
handle.write(content)
handle.closeFile()
It seems like my content is being written at the beginning of the file, but it replaces the existing content as well... Thanks!
One reasonable solution is to write the new content to a temporary file, then append the existing contents to the end of the temporary file. Then move the temporary file over the old file.
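A minimal sketch of that temporary-file approach, in Python rather than Swift. The name prepend_via_temp_file is made up, and the sketch assumes the temporary file can be created in the same directory as the target so the final move stays on one file system.

```python
import os
import shutil
import tempfile


def prepend_via_temp_file(path, new_content):
    """Put `new_content` (bytes) in front of the existing contents of `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as temp, open(path, "rb") as original:
            temp.write(new_content)             # new data first
            shutil.copyfileobj(original, temp)  # then the old data, in chunks
        os.replace(temp_path, path)             # move the temp file over the old one
    except BaseException:
        os.unlink(temp_path)                    # do not leave a stray temp file behind
        raise
```

Note that the replacement file gets the temporary file's permissions, not the original's, so a real implementation may want to copy those across as well.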
When you seek to a point in an existing file and then perform a write, the existing contents are overwritten from that point. This is why your current approach fails.
In general, most file systems don't have built-in support for prepending data to files. Likewise, most file I/O APIs don't either.
In order to prepend data, you first have to shift all of the existing data further along the file to make room for the new data at the beginning. You typically do this by starting near the end, reading a chunk of data, writing that data to its original position plus the length of the data you intend to prepend, and then repeating with the next chunk closer to the beginning of the file. In this way, you gradually shift everything down. Only after you've done all of that can you safely write the new data at the beginning of the file.
Frankly, if there's any way to avoid this, you should try to. The performance is likely to be terrible if the file is large and/or you're doing it frequently.
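For completeness, the shifting procedure described above can be sketched as follows. This is Python rather than Swift; prepend_in_place and the default chunk_size are made up for illustration. If the process is interrupted part-way through, the file is left corrupted.

```python
import os


def prepend_in_place(path, new_content, chunk_size=64 * 1024):
    """Shift the file's contents forward, then write `new_content` (bytes) at offset 0."""
    shift = len(new_content)
    if shift == 0:
        return
    with open(path, "r+b") as handle:
        end = handle.seek(0, os.SEEK_END)  # current file length
        # Work backwards from the end so no chunk is overwritten
        # before it has been read.
        while end > 0:
            start = max(0, end - chunk_size)
            handle.seek(start)
            chunk = handle.read(end - start)
            handle.seek(start + shift)
            handle.write(chunk)
            end = start
        # Everything has moved down by `shift` bytes; the gap at the
        # start can now be filled.
        handle.seek(0)
        handle.write(new_content)
```

Every byte of the file is read and rewritten once, which is why this gets expensive for large files.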
I've written several IDL programs to analyse some data. To keep it simple, the programs read in some time-varying data and calculate the Fourier spectrum. This spectrum is written to file using this code:
openw,3,filename
printf,3,[transpose(freq),transpose(power)],format='(e,e)'
close,3
The file is then read by another program using this code:
rdfloat,filename,freq,power,/double
The rdfloat procedure can be found here: http://idlastro.gsfc.nasa.gov/
The error I get when trying to read a file is: "Input conversion error. Unit: 101"
When I delve into the file being read, I notice several types of unrecognised characters. I don't know if these are a result of writing to the file or something else related to the number of files being created (over 300 files).
These symbols/characters are in the place of a single number:
<dle> <dc1> <dc2> <dc3> <dc4> <can> <nak> <em> <soh> <syn>
Example of what appears in the file being read (note that these are not consecutive lines):
7.7346<dle>18165493007e+01 8.4796811549010105e+00
7.7354408697119453e+01 1.04459538071<dc2>1749e+01
7.7360701595839<can>28e+01 3.0447318983094189e+00
Whenever I run the procedures that write the files, there is always at least one file that has some or all of these characters. The affected file or files are different every time.
Can anyone explain what these symbols are and what I might be doing to create them as well as how to ensure they are not written to file?
I see two things that may be causing a problem. But first, I want to suggest a few tips.
When you open a file, it is useful to use the /GET_LUN keyword because it allows IDL to find and use a logical unit number (LUN) that is available (e.g., in case you left LUN 3 open somewhere else). When you print formatted data, you should specify the total width and number of decimal places. It will make things easier because then you need not worry about changing spacings between numbers in a file.
So I would change your first set of code to the following (or some variant of the following):
OPENW,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
PRINTF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine
The first potential issue I see is that if your variable power was calculated using something like FFT.pro, then it will be complex (single or double precision, depending on the input and keywords used) rather than a real array.
The second potential issue may be an incorrect format statement. You did not tell PRINTF how many columns or rows to expect. It might not know how to handle the input properly, so it guesses, which may produce the characters you show. Those characters may be spacing characters caused by the vague format statement, or they may come from the software you are using to look at the files (e.g., I would not recommend using Word to open text files; use a text editor).
Side Note: You can open and read the file you just wrote in a similar fashion to what I showed above, but changed to the following:
n = FILE_LINES(filename[0])
freq = DBLARR(n)
power = DBLARR(n)
OPENR,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
READF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine
I have an XML file that contains only a single line, but the line is very long, so it seems that I can't store it in a variable.
What I want is this:
given tag1, tag2.....tag900, I want to put each tag on its own line, as follows:
tag1
tag2
tag3
......
tag900
Do not attempt to do this using native batch. It will be extremely difficult, and any solution will be very slow.
The problem is native batch cannot read lines > 8k, and batch does not have a good way to read partial lines.
There is a method that creates a test file that has size >= your file that consists of a single repeated character. A binary file compare ( FC /B ) is then done and the results are parsed character by character expressed as hex codes. It's a bit more complex than that, but I don't think you want to go there.
The only other option is to use SET /P to read in 1021 chars at a time, and then parse and piece things together. But this is unproven, and again, I don't think it is worth the effort.
If you want to use a native scripting language, then I suggest VBScript or JScript. (Perhaps PowerShell, but I don't really know much about its capabilities.)
You could download a Unix text processing tool like sed that has been ported to Windows.
I don't do much with XML, but I've got to believe there is a free tool geared specifically for XML that would make your job fairly easy.
Basically, use anything except batch! (this is coming from someone whose hobby is solving problems with batch)
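To show how little code this takes outside batch, here is a sketch in Python (not one of the tools named above). It assumes the tags sit directly next to each other as "...><..." and that the file is UTF-8. split_tags is a made-up name, and this is plain text processing, not a real XML parser.

```python
def split_tags(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src to dst, starting a new line wherever '>' is followed by '<'.

    The file is read in fixed-size chunks, so the single huge line never
    has to fit in memory at once.
    """
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        previous = ""  # last character of the previous chunk
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Handle a "><" pair that straddles two chunks.
            if previous == ">" and chunk[0] == "<":
                dst.write("\n")
            dst.write(chunk.replace("><", ">\n<"))
            previous = chunk[-1]
```

Nested tags such as <a><b> are split too, so adjust the rule if only certain boundaries should start a new line.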
I am looking for ways to read in a PDF file with SAS. Apparently this is not basic functionality, and there is very little to be found on the internet. (It doesn't help that a Google search containing "PDF" also returns links to PDF documents about other things.)
All I can find are people looking for ways to import data into datasets from a PDF. For me, that is not even necessary. I would like to be able to read the contents of the PDF file into one big character variable. If possible, it would be even better to be able to read in the file's binary data.
Is this possible with SAS and how? (I got it to work in Access VBA, but can't find any similar ways in SAS.)
(In the end, the purpose is to convert this to base64 and put that base64-string into an XML document.)
You probably will not be able to read the entire file into one character variable, since the maximum length of a character variable is 32,767 bytes. A simple way to read in one line at a time, though, is something like the following:
%let pdfFileName = Test.pdf;
%let lineSize = 2000;
data base;
format text_line $&lineSize..;
infile "&pdfFileName" lrecl=&lineSize;
input text_line $;
run;
This requires that you have a general idea of the maximum record length ahead of time, but you could write additional code to determine the maximum record size prior to reading in the file. In this example each line of text is read into one character variable named "text_line". From there, you could use a RETAIN statement or the double trailing at sign (@@) in the INPUT statement to process multiple lines at a time. The SAS website has plenty of documentation on how to read and process text from various types of input files.