Monadic File I/O - file-io

There are many examples of how to read from and write to files, but many posts seem out of date, are too complicated, or are not 'safe' (1, 2) (they throw/raise exceptions). Coming from Rust, I'd like to explicitly handle all errors with something monadic like result.
Below is an attempt that is 'safer' because an open and read/write will not throw/raise. But I'm not sure whether the close can fail. Is there a more concise and potentially safer way to do this?
(* opam install core batteries *)
open Stdio
open Batteries
open BatResult.Infix
let read_safe (file_path: string): (string, exn) BatPervasives.result =
(try let chan = In_channel.create file_path in Ok(chan)
with (e: exn) -> Error(e))
>>= fun chan ->
let res_strings =
try
let b = In_channel.input_lines chan in
Ok(b)
with (e: exn) -> Error(e) in
In_channel.close chan;
BatResult.map (fun strings -> String.concat "\n" strings) res_strings
let write_safe (file_path: string) (text: string) : (unit, exn) BatPervasives.result =
(try
(let chan = Out_channel.create file_path in Ok(chan))
with (e: exn) -> Error(e))
>>= fun chan ->
let res =
(try let b = Out_channel.output_string chan text in Ok(b)
with (e: exn) -> Error(e)) in
Out_channel.close chan;
res
let () =
let out =
read_safe "test-in.txt"
>>= fun str -> write_safe "test-out.txt" str in
BatResult.iter_error (fun e -> print_endline (Base.Exn.to_string e)) out

The Stdio library, which is part of Jane Street's industrial-strength standard library, already provides such functions. They are safe in the sense that the channel is always closed, even if reading or writing fails (they still raise an exception on error). For example, In_channel.read_all reads the contents of a file into a string, and the corresponding Out_channel.write_all writes a string to a file, so we can implement a cp utility as follows:
(* file cp.ml *)
open Base
open Stdio
let () = match Sys.get_argv () with
| [|_cp; src; dst |] ->
Out_channel.write_all dst
~data:(In_channel.read_all src)
| _ -> invalid_arg "Usage: cp src dst"
To build and run the code, put it in the cp.ml file (ideally in a fresh directory), and run
dune init exe cp --libs=base,stdio
This command bootstraps your project using dune. Then you can run your program with
dune exec ./cp.exe cp.ml cp.copy.ml
Here is the link to the OCaml Documentation Hub that will make it easier for you to find interesting libraries in OCaml.
Also, if you want to turn a function that raises an exception into one that returns an error instead, you can use Result.try_with, e.g.,
let safe_read file = Result.try_with @@ fun () ->
In_channel.read_all file

You can read and write files in OCaml without needing alternative standard libraries. Everything you need is already built into Stdlib which ships with OCaml.
Here's an example of reading a file while ensuring the file descriptor gets closed safely in case of an exception: https://stackoverflow.com/a/67607879/20371 . From there you can write a similar function to write a file using the corresponding functions open_out, out_channel_length, and output.
These read and write file contents as OCaml's bytes type, i.e. mutable bytestrings. However, they may throw exceptions. This is fine. In OCaml exceptions are cheap and easy to handle. Nevertheless, sometimes people don't like them for whatever reason. So it's a bit of a convention nowadays to suffix functions which throw exceptions with _exn. So suppose you define the above-mentioned two functions as such:
val get_contents_exn : string -> bytes
val set_contents_exn : string -> bytes -> unit
Now it's easy for you (or anyone) to wrap them and return a result value, like in Rust. And since we have polymorphic variants in OCaml, we can take advantage of them to compose functions that return result values, as described here: https://keleshev.com/composable-error-handling-in-ocaml
So you can wrap them like this:
let get_contents filename =
try Ok (get_contents_exn filename) with exn -> Error (`Exn exn)
let set_contents filename contents =
try Ok (set_contents_exn filename contents) with exn -> Error (`Exn exn)
Now these have the types:
val get_contents : string -> (bytes, [> `Exn of exn]) result
val set_contents : string -> bytes -> (unit, [> `Exn of exn]) result
And they can be composed together with each other and other functions which return result values with a polymorphic variant error channel.
One point I am trying to make here is that you should offer your users both, so they can choose whichever way (exceptions or results) makes sense for them.

Here's the full safe solution based on @ivg's answer, using only the Base and Stdio libraries.
open Base
open Base.Result
open Stdio
let read_safe (file_path: string) =
Result.try_with @@ fun () ->
In_channel.read_all file_path
let write_safe (file_path: string) (text: string) =
Result.try_with @@ fun () ->
Out_channel.write_all ~data:text file_path
let () =
let out =
read_safe "test-in.txt"
>>= fun str ->
write_safe "test-out.txt" str in
iter_error out ~f:(fun e -> print_endline (Base.Exn.to_string e))

Related

Distinguish functions with lambda argument by lambda's return type?

I have a function timeout(...) (extension function that returns this) which accepts an argument that is either String, Date or Long. What I am trying to do is to make it accept any lambda that also returns one of these three types.
Kotlin finds the below functions ambiguous and can't decide which one to call when I type, for example, timeout { "something" }.
@JvmName("timeoutString")
fun <CR: CachableResponse> CR.timeout(timeLambda: CR.()->String): CR = timeout(timeLambda())
@JvmName("timeoutLong")
fun <CR: CachableResponse> CR.timeout(timeLambda: CR.()->Long): CR = timeout(timeLambda())
@JvmName("timeoutDate")
fun <CR: CachableResponse> CR.timeout(timeLambda: CR.()->Date): CR = timeout(timeLambda())
The error I'm getting is Cannot choose among the following candidates without completing type inference.
Of course, one way to work around this is to have one function instead of three, like this:
fun <CR: CachableResponse, Type> CR.timeout(timeLambda: CR.()->Type): CR =
timeLambda().let { when (it) {
is String -> timeout(it)
is Date -> timeout(it)
is Long -> timeout(it)
else -> this
} }
In this case, though, the developer won't have any clue what its lambda will have to return without reading the description or checking the source code.
Is there any more elegant solution?
Actually, your solution is rather elegant.
I would only suggest inlining the CR generic parameter and capturing the when subject in a variable:
fun <Type> CachableResponse.timeout(timeLambda: CachableResponse.() -> Type) =
when (val it = timeLambda()) {
is String -> timeout(it)
is Date -> timeout(it)
is Long -> timeout(it)
else -> this
}
In this case, though, the developer won't have any clue what its lambda will have to return without reading the description or checking the source code.
IDE comes to the rescue:

The signature for this packaged module couldn't be inferred in recursive function

I'm still trying to figure out how to split code when using Mirage and its myriad of first-class modules.
I've put everything I need in a big ugly Context module, to avoid having to pass ten modules to all my functions; one is painful enough.
I have a function to receive commands over TCP:
let recvCmds (type a) (module Ctx : Context with type chan = a) nodeid chan = ...
After hours of trial and error, I figured out that I needed to add (type a) and the "explicit" type chan = a to make it work. It looks ugly, but it compiles.
But if I want to make that function recursive:
let rec recvCmds (type a) (module Ctx : Context with type chan = a) nodeid chan =
Ctx.readMsg chan >>= fun res ->
... more stuff ...
|> OtherModule.getStorageForId (module Ctx)
... more stuff ...
recvCmds (module Ctx) nodeid chan
I pass the module twice. The first time is no problem, but I get an error on the recursion line:
The signature for this packaged module couldn't be inferred.
and if I try to specify the signature I get
This expression has type a but an expression was expected of type 'a
The type constructor a would escape its scope
And it seems like I can't use the whole (type chan = a) thing.
If someone could explain what is going on, and ideally a way to work around it, it'd be great.
I could just use a while loop, of course, but I'd rather finally understand these damn modules. Thanks!
The practical answer is that recursive functions should universally quantify their locally abstract types with let rec f: type a. .... = fun ... .
More precisely, your example can be simplified to
module type T = sig type t end
let rec f (type a) (m: (module T with type t = a)) = f m
which yields the same error as yours:
Error: This expression has type (module T with type t = a)
but an expression was expected of type 'a
The type constructor a would escape its scope
This error can be fixed with an explicit forall quantification, which can be written with
the shorthand notation (for universally quantified locally abstract types):
let rec f: type a. (module T with type t = a) -> 'never = fun m -> f m
The reason behind this behavior is that locally abstract types should not escape
the scope of the function that introduced them. For instance, this code
let ext_store = ref None
let store x = ext_store := Some x
let f (type a) (x:a) = store x
should clearly fail because it tries to store a value of type a, which is a nonsensical type outside of the body of f.
As a consequence, values with a locally abstract type can only be used by polymorphic functions. For instance, this example
let id x = x
let f (type a) (x:a) : a = id x
is fine because id x works for any x.
The problem with a function like
let rec f (type a) (m: (module T with type t = a)) = f m
is then that the type of f is not yet generalized inside its body, because type generalization in ML happens at let definitions. The fix is therefore to explicitly tell the compiler that f is polymorphic in its argument:
let rec f: 'a. (module T with type t = 'a) -> 'never =
fun (type a) (m:(module T with type t = a)) -> f m
Here, 'a. ... is a universal quantification that should be read as forall 'a. ....
The first line tells the compiler that the function f is polymorphic in its first argument, whereas the second line explicitly introduces the locally abstract type a to refine the packed module type. Splitting these two declarations is quite verbose, so the shorthand notation combines both:
let rec f: type a. (module T with type t = a) -> 'never = fun m -> f m

Chaining iterators of different types

I get type errors when chaining different types of Iterator.
let s = Some(10);
let v = (1..5).chain(s.iter())
.collect::<Vec<_>>();
Output:
<anon>:23:20: 23:35 error: type mismatch resolving `<core::option::Iter<'_, _> as core::iter::IntoIterator>::Item == _`:
expected &-ptr,
found integral variable [E0271]
<anon>:23 let v = (1..5).chain(s.iter())
^~~~~~~~~~~~~~~
<anon>:23:20: 23:35 help: see the detailed explanation for E0271
<anon>:24:14: 24:33 error: no method named `collect` found for type `core::iter::Chain<core::ops::Range<_>, core::option::Iter<'_, _>>` in the current scope
<anon>:24 .collect::<Vec<_>>();
^~~~~~~~~~~~~~~~~~~
<anon>:24:14: 24:33 note: the method `collect` exists but the following trait bounds were not satisfied: `core::iter::Chain<core::ops::Range<_>, core::option::Iter<'_, _>> : core::iter::Iterator`
error: aborting due to 2 previous errors
But it works fine when zipping:
let s = Some(10);
let v = (1..5).zip(s.iter())
.collect::<Vec<_>>();
Output:
[(1, 10)]
Why is Rust able to infer the correct types for zip but not for chain and how can I fix it? n.b. I want to be able to do this for any iterator, so I don't want a solution that just works for Range and Option.
First, note that the iterators yield different types. I've added an explicit u8 to the numbers to make the types more obvious:
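fn main() {
let s = Some(10u8);
let mut r = 1..5u8; // next() needs a mutable iterator
// These annotations compile, which confirms the two item types differ
let _a: Option<&u8> = s.iter().next(); // Option<&u8>
let _b: Option<u8> = r.next(); // Option<u8>
}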
When you chain two iterators, both iterators must yield the same type. This makes sense as the iterator cannot "switch" what type it outputs when it gets to the end of one and begins on the second:
fn chain<U>(self, other: U) -> Chain<Self, U::IntoIter>
where U: IntoIterator<Item=Self::Item>
// ^~~~~~~~~~~~~~~ This means the types must match
So why does zip work? Because it doesn't have that restriction:
fn zip<U>(self, other: U) -> Zip<Self, U::IntoIter>
where U: IntoIterator
// ^~~~ Nothing here!
This is because zip returns a tuple with one value from each iterator; a new type, distinct from either source iterator's type. One iterator could be an integral type and the other could return your own custom type for all zip cares.
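To make the difference concrete, here is a small sketch (zip_and_chain is a hypothetical name chosen only for illustration). zip pairs an i32 range with string slices without complaint, while chain only accepts a second iterator whose Item is also i32. It assumes Rust 1.53 or later, where arrays implement IntoIterator by value.

```rust
/// Hypothetical demo: zip pairs items of different types; chain needs matching Item types.
fn zip_and_chain() -> (Vec<(i32, &'static str)>, Vec<i32>) {
    let numbers = 1..4; // Range<i32>, yields i32
    let words = ["one", "two", "three"]; // passed by value, yields &'static str

    // zip: Item = (i32, &str); the two sides may have unrelated Item types
    let zipped: Vec<(i32, &'static str)> = numbers.clone().zip(words).collect();

    // chain: Option<i32> is IntoIterator<Item = i32>, which matches the range's Item
    let chained: Vec<i32> = numbers.chain(Some(10)).collect();

    (zipped, chained)
}
```

Here zipped is [(1, "one"), (2, "two"), (3, "three")] and chained is [1, 2, 3, 10]. Passing `Some(&10)` to chain instead would reproduce the E0271 mismatch from the question.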
Why is Rust able to infer the correct types for zip but not for chain
There is no type inference happening here; that's a different thing. This is just plain-old type mismatching.
and how can I fix it?
In this case, your inner iterator yields a reference to an integer, a Clone-able type, so you can use cloned to make a new iterator that clones each value and then both iterators would have the same type:
fn main() {
let s = Some(10);
let v: Vec<_> = (1..5).chain(s.iter().cloned()).collect();
}
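Since Rust 1.36 there is also Iterator::copied, which does the same thing for Copy types and states the intent (a cheap bitwise copy) a bit more precisely. A minimal sketch, where chain_copied is a hypothetical helper name:

```rust
/// Hypothetical helper: chain a range with the contents of an Option<i32>.
fn chain_copied(s: Option<i32>) -> Vec<i32> {
    // s.iter() yields &i32; copied() turns each &i32 into an i32,
    // so both halves of the chain have Item = i32
    (1..5).chain(s.iter().copied()).collect()
}
```

With `Some(10)` this yields [1, 2, 3, 4, 10]; with `None` the second half is simply empty.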
If you are done with the option, you can also use a consuming iterator with into_iter:
fn main() {
let s = Some(10);
let v: Vec<_> = (1..5).chain(s.into_iter()).collect();
}
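The question asks for something that works for any iterator. When the two item types differ but both convert into a common type, one option is to map each side into that type before chaining. This is a sketch under that assumption (both item types implement Into<T>); chain_as is a hypothetical name, not a standard-library function.

```rust
/// Hypothetical helper: chain two iterators whose Item types differ,
/// by converting every item into a common type `T` first.
fn chain_as<A, B, T>(a: A, b: B) -> impl Iterator<Item = T>
where
    A: IntoIterator,
    A::Item: Into<T>,
    B: IntoIterator,
    B::Item: Into<T>,
{
    // Both map adapters produce Item = T, so chain's
    // `U: IntoIterator<Item = Self::Item>` bound is satisfied.
    a.into_iter()
        .map(Into::<T>::into)
        .chain(b.into_iter().map(Into::<T>::into))
}
```

For example, `chain_as::<_, _, u32>(1u8..5, Some(10u16))` yields 1, 2, 3, 4, 10. Iterators over references, such as `s.iter()`, need `.copied()` or `.cloned()` first so that their items are owned values.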

What's the de-facto way of reading and writing files in Rust 1.x?

With Rust being comparatively new, I've seen far too many ways of reading and writing files. Many are extremely messy snippets someone came up with for their blog, and 99% of the examples I've found (even on Stack Overflow) are from unstable builds that no longer work. Now that Rust is stable, what is a simple, readable, non-panicking snippet for reading or writing files?
This is the closest I've gotten to something that works in terms of reading a text file, but it's still not compiling even though I'm fairly certain I've included everything I should have. This is based off of a snippet I found on Google+ of all places, and the only thing I've changed is that the old BufferedReader is now just BufReader:
use std::fs::File;
use std::io::BufReader;
use std::path::Path;
fn main() {
let path = Path::new("./textfile");
let mut file = BufReader::new(File::open(&path));
for line in file.lines() {
println!("{}", line);
}
}
The compiler complains:
error: the trait bound `std::result::Result<std::fs::File, std::io::Error>: std::io::Read` is not satisfied [--explain E0277]
--> src/main.rs:7:20
|>
7 |> let mut file = BufReader::new(File::open(&path));
|> ^^^^^^^^^^^^^^
note: required by `std::io::BufReader::new`
error: no method named `lines` found for type `std::io::BufReader<std::result::Result<std::fs::File, std::io::Error>>` in the current scope
--> src/main.rs:8:22
|>
8 |> for line in file.lines() {
|> ^^^^^
To sum it up, what I'm looking for is:
brevity
readability
covers all possible errors
doesn't panic
None of the functions I show here panic on their own, but I am using expect because I don't know what kind of error handling will fit best into your application. Go read The Rust Programming Language's chapter on error handling to understand how to appropriately handle failure in your own program.
Rust 1.26 and onwards
If you don't want to care about the underlying details, there are one-line functions for reading and writing.
Read a file to a String
use std::fs;
fn main() {
let data = fs::read_to_string("/etc/hosts").expect("Unable to read file");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs;
fn main() {
let data = fs::read("/etc/hosts").expect("Unable to read file");
println!("{}", data.len());
}
Write a file
use std::fs;
fn main() {
let data = "Some data!";
fs::write("/tmp/foo", data).expect("Unable to write file");
}
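The question also asks for code that doesn't panic. Here is a minimal sketch of the same one-line functions with errors propagated via the `?` operator instead of expect. copy_text is a hypothetical helper, and the caller decides how to handle the returned io::Error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: copy a UTF-8 text file, returning errors instead of panicking.
fn copy_text(src: &Path, dst: &Path) -> io::Result<usize> {
    // `?` returns early with the io::Error (e.g. file not found, invalid UTF-8)
    let data = fs::read_to_string(src)?;
    fs::write(dst, &data)?;
    Ok(data.len())
}
```

Since Rust 1.26, main itself may return `io::Result<()>`, so `fn main() -> io::Result<()>` lets you use `?` all the way up.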
Rust 1.0 and onwards
These forms are slightly more verbose than the one-line functions that allocate a String or Vec for you, but are more powerful in that you can reuse allocated data or append to an existing object.
Reading data
Reading a file requires two core pieces: File and Read.
Read a file to a String
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = String::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = Vec::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_end(&mut data).expect("Unable to read data");
println!("{}", data.len());
}
Write a file
Writing a file is similar, except we use the Write trait and we always write out bytes. You can convert a String / &str to bytes with as_bytes:
use std::fs::File;
use std::io::Write;
fn main() {
let data = "Some data!";
let mut f = File::create("/tmp/foo").expect("Unable to create file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
Buffered I/O
I felt a bit of a push from the community to use BufReader and BufWriter instead of reading straight from a file
A buffered reader (or writer) uses a buffer to reduce the number of I/O requests. For example, it's much more efficient to access the disk once to read 256 bytes instead of accessing the disk 256 times.
That being said, I don't believe a buffered reader/writer will be useful when reading the entire file. read_to_end seems to copy data in somewhat large chunks, so the transfer may already be naturally coalesced into fewer I/O requests.
Here's an example of using it for reading:
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
let mut data = String::new();
let f = File::open("/etc/hosts").expect("Unable to open file");
let mut br = BufReader::new(f);
br.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
And for writing:
use std::fs::File;
use std::io::{BufWriter, Write};
fn main() {
let data = "Some data!";
let f = File::create("/tmp/foo").expect("Unable to create file");
let mut f = BufWriter::new(f);
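f.write_all(data.as_bytes()).expect("Unable to write data");
// Flush explicitly: BufWriter also flushes on drop, but ignores any error there
f.flush().expect("Unable to flush data");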
}
A BufReader is more useful when you want to read line-by-line:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
let f = File::open("/etc/hosts").expect("Unable to open file");
let f = BufReader::new(f);
for line in f.lines() {
let line = line.expect("Unable to read line");
println!("Line: {}", line);
}
}
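If you want the line-by-line version without expect, the per-line results can be collected into a single io::Result. A minimal sketch, where read_lines is a hypothetical helper name:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical helper: read every line of a file without panicking.
fn read_lines(path: &Path) -> io::Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?);
    // lines() yields io::Result<String> (line endings stripped); collecting
    // into io::Result<Vec<_>> stops at the first error and returns it
    reader.lines().collect()
}
```

This keeps the whole file in memory as a Vec; for very large files, iterate over `lines()` directly as above and handle each Result in the loop.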
For anybody who is writing to a file, the accepted answer is good but if you need to append to the file you have to use the OpenOptions struct instead:
use std::io::Write;
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let mut f = OpenOptions::new()
.append(true)
.create(true) // Optionally create the file if it doesn't already exist
.open("/tmp/foo")
.expect("Unable to open file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
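The same append can be wrapped so that errors are returned rather than unwrapped. A minimal sketch, assuming one line per call is what you want; append_line is a hypothetical helper name:

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: append one line, creating the file if needed.
fn append_line(path: &Path, line: &str) -> io::Result<()> {
    let mut f = OpenOptions::new()
        .append(true) // every write goes to the current end of the file
        .create(true) // create the file if it does not exist yet
        .open(path)?;
    // writeln! returns io::Result<()>, so the error is propagated, not ignored
    writeln!(f, "{}", line)
}
```

Calling it with "first" and then "second" on a fresh path leaves the file containing "first\nsecond\n".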
Buffered writing still works the same way:
use std::io::{BufWriter, Write};
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let f = OpenOptions::new()
.append(true)
.open("/tmp/foo")
.expect("Unable to open file");
let mut f = BufWriter::new(f);
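f.write_all(data.as_bytes()).expect("Unable to write data");
// Flush explicitly: BufWriter also flushes on drop, but ignores any error there
f.flush().expect("Unable to flush data");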
}
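With buffered I/O you can copy a file whose size is greater than the available memory, because only one buffer's worth of data is held at a time:
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, BufWriter, Write};
fn main() {
let read = File::open(r#"E:\1.xls"#).expect("Unable to open source file");
// truncate(true) discards stale bytes if the target already exists and is longer
let write = OpenOptions::new()
.write(true)
.create(true)
.truncate(true)
.open(r#"E:\2.xls"#)
.expect("Unable to open destination file");
let mut reader = BufReader::new(read);
let mut writer = BufWriter::new(write);
loop {
// Borrow the reader's internal buffer; an empty slice means end of file
let buffer = reader.fill_buf().expect("Unable to read data");
let length = buffer.len();
if length == 0 {
break;
}
// write_all, unlike write, keeps writing until the whole chunk is out
writer.write_all(buffer).expect("Unable to write data");
reader.consume(length);
}
writer.flush().expect("Unable to flush data");
}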
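The standard library can also run that fill/write/consume loop for you: std::io::copy streams from any Read into any Write. A minimal sketch with errors propagated instead of unwrapped; stream_copy is a hypothetical helper name.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: stream `src` into `dst` without loading the whole file.
fn stream_copy(src: &Path, dst: &Path) -> io::Result<u64> {
    let mut reader = BufReader::new(File::open(src)?);
    // File::create truncates an existing destination, so no stale bytes remain
    let mut writer = BufWriter::new(File::create(dst)?);
    // io::copy moves the data in chunks until EOF and returns the byte count
    let copied = io::copy(&mut reader, &mut writer)?;
    // Flush explicitly: BufWriter's Drop also flushes but ignores errors
    writer.flush()?;
    Ok(copied)
}
```

If you only need a plain file-to-file copy, std::fs::copy(src, dst) does it in one call and also returns the number of bytes copied.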

Testing functions in Haskell that do IO

Working through Real World Haskell right now. Here's a solution to a very early exercise in the book:
-- | 4) Counts the number of characters in a file
numCharactersInFile :: FilePath -> IO Int
numCharactersInFile fileName = do
  contents <- readFile fileName
  return (length contents)
My question is: How would you test this function? Is there a way to make a "mock" input instead of actually needing to interact with the file system to test it out? Haskell places such an emphasis on pure functions that I have to imagine that this is easy to do.
You can make your code testable by using a type-class-constrained type variable instead of IO.
First, let's get the imports out of the way.
{-# LANGUAGE FlexibleInstances #-}
import qualified Prelude
import Prelude hiding(readFile)
import Control.Monad.State
The code we want to test:
class Monad m => FSMonad m where
  readFile :: FilePath -> m String
-- | 4) Counts the number of characters in a file
numCharactersInFile :: FSMonad m => FilePath -> m Int
numCharactersInFile fileName = do
  contents <- readFile fileName
  return (length contents)
Later, we can run it:
instance FSMonad IO where
  readFile = Prelude.readFile
And test it too:
data MockFS = SingleFile FilePath String
instance FSMonad (State MockFS) where
  -- ^ Reader would be enough in this particular case though
  readFile pathRequested = do
    (SingleFile pathExisting contents) <- get
    if pathExisting == pathRequested
      then return contents
      -- State has no MonadFail instance in modern GHC, so use error instead of fail
      else error "file not found"
testNumCharactersInFile :: Bool
testNumCharactersInFile =
  evalState
    (numCharactersInFile "test.txt")
    (SingleFile "test.txt" "hello world")
  == 11
This way your code under test needs very little modification.
As Alexander Poluektov already pointed out, the code you are trying to test can easily be separated into a pure and an impure part.
Nevertheless, I think it is good to know how to test such impure functions in Haskell.
The usual approach to testing in Haskell is to use QuickCheck, and that's what I also tend to use for impure code.
Here is an example of how you might achieve what you are trying to do, which gives you a kind of mock behavior *:
import Test.QuickCheck
import Test.QuickCheck.Monadic (monadicIO, run, assert)
import System.Directory (removeFile, getTemporaryDirectory)
import System.IO
import Control.Exception (IOException, bracket, catch, finally)
numCharactersInFile :: FilePath -> IO Int
numCharactersInFile fileName = do
  contents <- readFile fileName
  return (length contents)
Now provide an alternative function (testing against a model):
numAlternative :: FilePath -> IO Integer
numAlternative p = bracket (openFile p ReadMode) hClose hFileSize
Provide an Arbitrary instance for the test environment:
data TestFile = TestFile String deriving (Eq, Ord, Show)
instance Arbitrary TestFile where
  arbitrary = do
    n <- choose (0, 2000)
    testString <- vectorOf n $ elements ['a'..'z']
    return $ TestFile testString
Property testing against the model (using QuickCheck for monadic code):
prop_charsInFile (TestFile string) =
  length string > 0 ==> monadicIO $ do
    (res, alternative) <- run $ createTmpFile string $
      \p _h -> do
        alternative <- numAlternative p
        testRes <- numCharactersInFile p
        return (testRes, alternative)
    assert $ res == fromInteger alternative
And a little helper function:
createTmpFile :: String -> (FilePath -> Handle -> IO a) -> IO a
createTmpFile content func = do
    -- fall back to the current directory if no temporary directory is available
    tempdir <- getTemporaryDirectory `catch` fallback
    (tempfile, temph) <- openTempFile tempdir ""
    hPutStr temph content
    hFlush temph
    hClose temph
    finally (func tempfile temph)
            (removeFile tempfile)
  where
    fallback :: IOException -> IO FilePath
    fallback _ = return "."
This will let quickCheck create some random files for you and test your implementation against a model function.
$ quickCheck prop_charsInFile
+++ OK, passed 100 tests.
Of course, you could also test other properties depending on your use case.
* Note about my usage of the term "mock behavior":
The term mock in the object-oriented sense is perhaps not the best fit here. But what is the intention behind a mock?
It lets you test code that needs access to a resource that usually is
either not available at testing time
or is not easily controllable and thus not easy to verify.
By shifting the responsibility of providing such a resource to quickcheck, it suddenly becomes feasible to provide an environment for the code under test that can be verified after a test run.
Martin Fowler describes this nicely in an article about mocks:
"Mocks are ... objects pre-programmed with expectations which form a specification of the calls they are expected to receive."
For the QuickCheck setup, I'd say that the files generated as input are "pre-programmed" in the sense that we know their size (== expectation), and they are then verified against our specification (== property).
For that you will need to modify the function such that it becomes:
numCharactersInFile :: (FilePath -> IO String) -> FilePath -> IO Int
numCharactersInFile reader fileName = do
  contents <- reader fileName
  return (length contents)
Now you can pass any mock function that takes a file path and returns an IO String, such as:
fakeFile :: FilePath -> IO String
fakeFile fileName = return "Fake content"
and pass this function to numCharactersInFile.
The function consists of two parts: an impure one (reading the file content as a String) and a pure one (calculating the length of the String).
The impure part cannot be "unit"-tested by definition. The pure part is just a call to a library function (and of course you can test it if you want :) ).
So there is nothing to mock and nothing to unit-test in this example.
Put another way: suppose you had an equivalent C++ or Java implementation (*) that reads the content and then calculates its length. What would you really want to mock, and what would remain to be tested afterwards?
(*) which is of course not how you would do it in C++ or Java, but that's off-topic.
Based on my layman's understanding of Haskell, I've come to the following conclusions:
If a function makes use of the IO monad, mock testing is going to be impossible. Avoid hard-coding the IO monad in your function.
Make a helper version of your function that takes in other functions that may do IO. The result will look like this:
numCharactersInFile' :: Monad m => (FilePath -> m String) -> FilePath -> m Int
numCharactersInFile' f filePath = do
  contents <- f filePath
  return (length contents)
numCharactersInFile' is now testable with mocks! (Identity comes from Data.Functor.Identity, so import that module.)
mockFileSystem :: FilePath -> Identity String
mockFileSystem "fileName" = return "mock file contents"
Now you can verify that numCharactersInFile' returns the expected result without IO:
18 == (runIdentity . numCharactersInFile' mockFileSystem $ "fileName")
Finally, export a version of your original function signature for use with IO
numCharactersInFile :: FilePath -> IO Int
numCharactersInFile = numCharactersInFile' readFile
So, at the end of the day, numCharactersInFile' is testable with mocks. numCharactersInFile is just a variation of numCharactersInFile'.