Lightyear requireFailure does not do backtracking - idris

I want to parse a series of any 4 chars. However, these chars shouldn't contain a specific string ("bb" in the example below). So "aaaa" and "abcd" are okay, but neither "bbcd" nor "abbc" should match.
I wrote the following parser:
ntimes 4 (requireFailure (string "bb") *> anyChar)
However, I noticed that it "eats" single b chars. For example,
parse (ntimes 4 (requireFailure (string "bb") *> anyToken)) "abcde"
results in ['a', 'c', 'd', 'e'] (it fails, however, on "bbcd" and "abbc" as expected).
As a workaround I used my own implementation of requireFailure:
requireFailure' : Parser a -> Parser ()
requireFailure' p = do
  isP <- p *> pure True <|> pure False
  if isP then fail "argument parser to fail"
         else pure ()
So
parse (ntimes 4 (requireFailure' (string "bb") *> anyToken)) "abcde"
gives ['a', 'b', 'c', 'd'] as I expect.
Apparently Lightyear parsers backtrack by default unless one calls commitTo.
So my question is: why does the library implementation of requireFailure not backtrack when its argument fails, and is this expected behavior?

If you look at the implementation of requireFailure, you can see that it calls the "success" continuation us with the state s it gets after running its argument, rather than with the state ST i pos tw it received before running it. So whatever input the argument consumed before failing stays consumed.
requireFailure : ParserT str m tok -> ParserT str m ()
requireFailure (PT f) = PT $ \r, us, cs, ue, ce, (ST i pos tw) =>
  f r
    (\t, s => ue [Err pos "argument parser to fail"] s)
    (\t, s => ce [Err pos "argument parser to fail"] s)
    (\errs, s => us () s)
    (\errs, s => cs () s)
    (ST i pos tw)
The documentation says that requireFailure corresponds to Parsec's notFollowedBy, which does not consume any input, so you could argue this is a bug in Lightyear.
You could open a bug report suggesting that the current code be replaced with something like the following (I don't know whether Idris supports @ patterns):
requireFailure : ParserT str m tok -> ParserT str m ()
requireFailure (PT f) = PT $ \r, us, cs, ue, ce, s@(ST i pos tw) =>
  f r
    (\t, s => ue [Err pos "argument parser to fail"] s)
    (\t, s => ce [Err pos "argument parser to fail"] s)
    (\errs, _ => us () s)
    (\errs, _ => cs () s)
    s
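To make the idea concrete outside of Lightyear's continuation-passing internals, here is a minimal sketch in Haskell. It is not Lightyear's code: the Parser type, anyChar, string, ntimes and fourChars are toy definitions invented for illustration, and the toy parser has no notion of committed input. It only shows what "resume from the original state when the argument fails" looks like.

```haskell
import Data.List (stripPrefix)

-- Toy backtracking parser over String (an assumption for illustration,
-- not Lightyear's ParserT). Nothing means failure.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
    fmap f (Parser p) = Parser $ \s -> case p s of
        Nothing        -> Nothing
        Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
    pure a = Parser $ \s -> Just (a, s)
    Parser pf <*> Parser pa = Parser $ \s -> case pf s of
        Nothing        -> Nothing
        Just (f, rest) -> case pa rest of
            Nothing         -> Nothing
            Just (a, rest') -> Just (f a, rest')

-- Consume exactly one character.
anyChar :: Parser Char
anyChar = Parser $ \s -> case s of
    []       -> Nothing
    (c : cs) -> Just (c, cs)

-- Match a literal prefix of the input.
string :: String -> Parser String
string lit = Parser $ \s -> fmap (\rest -> (lit, rest)) (stripPrefix lit s)

-- Succeeds only if p fails, and always resumes from the ORIGINAL input s,
-- so nothing that p looked at is consumed.
requireFailure :: Parser a -> Parser ()
requireFailure (Parser p) = Parser $ \s -> case p s of
    Nothing -> Just ((), s)
    Just _  -> Nothing

-- Run a parser n times and collect the results.
ntimes :: Int -> Parser a -> Parser [a]
ntimes n p = sequenceA (replicate n p)

-- The parser from the question, built from the toy combinators.
fourChars :: Parser String
fourChars = ntimes 4 (requireFailure (string "bb") *> anyChar)

main :: IO ()
main = do
    print (runParser fourChars "abcde")  -- Just ("abcd","e")
    print (runParser fourChars "abbc")   -- Nothing
```

The single b in "abcde" survives because requireFailure hands back the untouched input s rather than whatever was left after its argument ran.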

Related

How do I solve this ambiguous type variable error in Haskell?

I couldn't find an answer to my question among several ambiguous type variable error questions.
I'm currently trying to get this code I found to work. (https://gist.github.com/kirelagin/3886243)
My code:
import Control.Arrow
import Data.List
import qualified Data.Map as M
import Data.Function
main = do
    putStrLn "Start test"
    let foo = "Hello World"
    let freqTest = freqList foo
    putStrLn "Frequentie list"
    print freqTest
    putStrLn "Done.."
    let treeTest = buildTree freqTest
    putStrLn "Huffman Tree"
    print treeTest
    putStrLn "Done.."
    let codeMaphTest = buildCodemap treeTest
    putStrLn "Codemap ding"
    -- print codeMaphTest
    putStrLn "Done.."
--This typeclass is supposed to make life _a bit_ easier.
class Eq a => Bits a where
    zer :: a
    one :: a
instance Bits Int where
    zer = 0
    one = 1
instance Bits Bool where
    zer = False
    one = True
-- Codemap is generated from a Huffman tree. It is used for fast encoding.
type Codemap a = M.Map Char [a]
-- Huffman tree is a simple binary tree. Each leaf contains a Char and its weight.
-- Fork (node with children) also has weight = sum of weights of its children.
data HTree = Leaf Char Int
           | Fork HTree HTree Int
           deriving (Show)
weight :: HTree -> Int
weight (Leaf _ w) = w
weight (Fork _ _ w) = w
-- The only useful operation on Huffman trees is merging, that is we take
-- two trees and make them children of a new Fork-node.
merge t1 t2 = Fork t1 t2 (weight t1 + weight t2)
-- `freqList` is an utility function. It takes a string and produces a list
-- of pairs (character, number of occurences of this character in the string).
freqList :: String -> [(Char, Int)]
freqList = M.toList . M.fromListWith (+) . map (flip (,) 1)
-- `buildTree` builds a Huffman tree from a list of character frequencies
-- (obtained, for example, from `freqList` or elsewhere).
-- It sorts the list in ascending order by frequency, turns each (char, freq) pair
-- into a one-leaf tree and keeps merging two trees with the smallest frequencies
-- until only one tree is remaining.
buildTree :: [(Char, Int)] -> HTree
buildTree = bld . map (uncurry Leaf) . sortBy (compare `on` snd)
    where bld (t:[]) = t
          bld (a:b:cs) = bld $ insertBy (compare `on` weight) (merge a b) cs
-- The next function traverses a Huffman tree to obtain a list of codes for
-- all characters and converts this list into a `Map`.
buildCodemap :: Bits a => HTree -> Codemap a
buildCodemap = M.fromList . buildCodelist
    where buildCodelist (Leaf c w) = [(c, [])]
          buildCodelist (Fork l r w) = map (addBit zer) (buildCodelist l) ++ map (addBit one) (buildCodelist r)
            where addBit b = second (b :)
-- Simple functions to get a Huffman tree or a `Codemap` from a `String`.
stringTree :: String -> HTree
stringTree = buildTree . freqList
stringCodemap :: Bits a => String -> Codemap a
stringCodemap = buildCodemap . stringTree
-- Time to do the real encoding and decoding!
-- Encoding function just represents each character of a string by corresponding
-- sequence of `Bit`s.
encode :: Bits a => Codemap a -> String -> [a]
encode m = concat . map (m M.!)
encode' :: Bits a => HTree -> String -> [a]
encode' t = encode $ buildCodemap t
-- Decoding is a little trickier. We have to traverse the tree until
-- we reach a leaf which means we've just finished reading a sequence
-- of `Bit`s corresponding to a single character.
-- We keep doing this to process the whole list of `Bit`s.
decode :: Bits a => HTree -> [a] -> String
decode tree = dcd tree
    where dcd (Leaf c _) [] = [c]
          dcd (Leaf c _) bs = c : dcd tree bs
          dcd (Fork l r _) (b:bs) = dcd (if b == zer then l else r) bs
Output:
huffmancompress.hs:17:24: error:
* Ambiguous type variable `a0' arising from a use of `buildCodemap'
prevents the constraint `(Bits a0)' from being solved.
Relevant bindings include
codeMaphTest :: Codemap a0 (bound at huffmancompress.hs:17:9)
Probable fix: use a type annotation to specify what `a0' should be.
These potential instances exist:
instance Bits Bool -- Defined at huffmancompress.hs:35:10
instance Bits Int -- Defined at huffmancompress.hs:31:10
* In the expression: buildCodemap treeTest
In an equation for `codeMaphTest':
codeMaphTest = buildCodemap treeTest
In the expression:
do putStrLn "Start test"
let foo = "Hello World"
let freqTest = freqList foo
putStrLn "Frequentie list"
....
|
17 | let codeMaphTest = buildCodemap treeTest
| ^^^^^^^^^^^^^^^^^^^^^
I've tried several things I found on the internet, but nothing worth mentioning, to be honest.
Maybe one of you can help me out!
On line 17, where the error points you:
let codeMaphTest = buildCodemap treeTest
What type is codeMaphTest? Should it be Codemap Int? Or Codemap String? Or, perhaps, Codemap Bool? The function buildCodemap can return a Codemap a for any type a, as long as a has an instance of Bits. So which type should it be?
The compiler doesn't know. There is nowhere to glean this information from. It's ambiguous.
And this is exactly what the compiler is telling you: "ambiguous type variable".
One way to fix this is to provide a type annotation (exactly as the error message says, by the way):
let codeMaphTest = buildCodemap treeTest :: Codemap Int
Note that I chose Int just as an example, because I don't know which type you meant (I'm somewhat like the compiler in that respect). Please substitute your own type - the one you actually wanted there.
Your code is indeed ambiguous. buildCodemap treeTest has the polymorphic type Bits a => Codemap a, so it can be used as a Codemap Int, a Codemap Bool, or even as another type if you define further instances of Bits.
This is not a problem, on its own, but later on you try to use this value (e.g., to print it), so we really need to pick a concrete type a.
You could pick a at the definition point:
let codeMaphTest :: Codemap Int
    codeMaphTest = buildCodemap treeTest
Or, alternatively, you could choose a later on, where you use it:
print (codeMaphTest :: Codemap Int)
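Both options can be tried in isolation with a cut-down program. This is only a sketch: tinyCodemap is a made-up stand-in for buildCodemap treeTest, and the Bits class and Codemap type are copied from the question so that the file compiles on its own.

```haskell
import qualified Data.Map as M

-- Copied from the question so the example is self-contained.
class Eq a => Bits a where
    zer :: a
    one :: a

instance Bits Int where
    zer = 0
    one = 1

instance Bits Bool where
    zer = False
    one = True

type Codemap a = M.Map Char [a]

-- Hypothetical stand-in for `buildCodemap treeTest`: polymorphic in the
-- bit type, so the caller has to decide what `a` is.
tinyCodemap :: Bits a => Codemap a
tinyCodemap = M.fromList [('x', [zer]), ('y', [one, zer])]

main :: IO ()
main = do
    -- Option 1: fix `a` where the value is defined.
    let asInts = tinyCodemap :: Codemap Int
    print asInts                         -- fromList [('x',[0]),('y',[1,0])]
    -- Option 2: fix `a` where the value is used.
    print (tinyCodemap :: Codemap Bool)  -- fromList [('x',[False]),('y',[True,False])]
```

Removing both annotations brings back an ambiguous type variable error, because nothing else in main tells the compiler which instance of Bits to use.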

type error with CamlinternalFormatBasics.fmt

I am writing a loop using recursion and I have a problem:
let isRectangleIn a b c d =
  if (a > c && b > d) || (a>d && b>c)
  then
    "TAK"
  else
    "NIE";;
let rec loop k =
  if k = 0 then 0 else
    let a = read_int () in
    let b = read_int () in
    let c = read_int () in
    let d = read_int () in
    Printf.printf "%s \n" (isRectangleIn a b c d)
    loop (k-1);;
let i = read_int ();;
let result = loop i;;
Compiler says that
This expression has type
('a -> 'b -> 'c, out_channel, unit, unit, unit, 'a -> 'b -> 'c)
CamlinternalFormatBasics.fmt
but an expression was expected of type
('a -> 'b -> 'c, out_channel, unit, unit, unit, unit)
CamlinternalFormatBasics.fmt
Type 'a -> 'b -> 'c is not compatible with type unit
but I don't understand what I am doing wrong. Can somebody help me?
Whenever you see an error mentioning CamlinternalFormatBasics.fmt, it means that a printf function is involved. Moreover, if there is a function type (here 'a -> 'b -> 'c) in the first parameter of the format, the error is that printf has been given more arguments than the format string expects.
In your case, the format string is "%s \n", which requires one argument, but you are applying it to three arguments:
Printf.printf "%s \n" (isRectangleIn a b c d) loop (k-1)
(Notice that the number of surplus arguments in this function application, two, equals the number of arguments in the function type 'a -> 'b -> 'c from the error message.)
The root issue here is a missing ; between the printf expression and loop (k-1):
Printf.printf "%s \n" (isRectangleIn a b c d);
loop (k-1)
To avoid this kind of issue, it is generally advised to use ocp-indent (or ocamlformat) to indent code automatically and avoid deceitful indentation. For instance, ocp-indent would have indented your code as
    Printf.printf "%s \n" (isRectangleIn a b c d)
      loop (k-1);;
making it apparent that printf and loop are not at the same level.

SML converting a string to an int with error catching

So what I want to do is convert a string into an int and do some error catching on it. I would also like to know where to put the code that should run if the conversion fails.
I know how to convert, but I am not sure how to catch the error or where the code will jump to after it.
I believe the function for converting is Int.fromString(x).
Thank you.
SML has two approaches to error handling. One, based on raise to raise errors and handle to catch the error, is somewhat similar to how error handling works in languages like Python or Java. It is effective, but the resulting code tends to lose some of its functional flavor. The other method is based on the notion of options. Since the return type of Int.fromString is
string -> int option
it makes the most sense to use the option-based approach.
An int option is either SOME n, where n is an integer, or it is NONE. The function Int.fromString returns the latter if it fails in its attempt to convert the string to an integer. The function which calls Int.fromString can explicitly test for NONE and use valOf to extract the value in the case that the return value is of the form SOME n. Alternatively, and somewhat more idiomatically, you can use pattern matching in a case expression. Here is a toy example:
fun squareString s =
    case Int.fromString(s) of
        SOME n => Int.toString (n * n)
      | NONE => s ^ " isn't an integer";
This function has type string -> string. Typical output:
- squareString "4";
val it = "16" : string
- squareString "Bob";
val it = "Bob isn't an integer" : string
Note that the clause which starts NONE => is basically an error handler. If the function that you are defining isn't able to handle such errors, it could pass the buck. For example:
fun squareString s =
    case Int.fromString(s) of
        SOME n => SOME (Int.toString (n * n))
      | NONE => NONE;
This has type string -> string option with output now looking like:
- squareString "4";
val it = SOME "16" : string option
- squareString "Bob";
val it = NONE : string option
This would make it the responsibility of the caller to figure out what to do with the option.
The approach to error handling that John explains is elaborated in the StackOverflow question 'Unpacking' the data in an SML DataType without a case statement. The use-case there is a bit different, since it also involves syntax trees, but the same convenience applies for smaller cases:
fun squareString s = Int.fromString s >>= (fn i => SOME (i*i))
Assuming you defined the >>= operator as:
infix 3 >>=
fun NONE >>= _ = NONE
| (SOME a) >>= f = f a
The drawback of using 'a option for error handling is that you have to take into account, every single time you use a function that has this return type, whether it errored. This is not unreasonable. It's like mandatory null-checking. But it comes at the cost of not being able to easily compose your functions (using e.g. the o operator) and a lot of nested case-ofs:
fun inputSqrt () =
    case TextIO.inputLine TextIO.stdIn of
        NONE => NONE
      | SOME s => (case Real.fromString s of
                       NONE => NONE
                     | SOME x => SOME (Math.sqrt x) handle Domain => NONE)
A workaround is that you can build this constant error handling into your function composition operator, as long as all your functions share the same way of expressing errors, e.g. using 'a option:
fun safeSqrt x = SOME (Math.sqrt x) handle Domain => NONE
fun inputSqrt () =
TextIO.inputLine TextIO.stdIn >>=
(fn s => Real.fromString s >>=
(fn x => safeSqrt x))
Or even shorter by applying Eta conversion:
fun inputSqrt () = TextIO.inputLine TextIO.stdIn >>= Real.fromString >>= safeSqrt
This function could fail either because of a lack of input, or because the input didn't convert to a real, or because it was negative. Naturally, this error handling isn't smart enough to say what the error was, so you might want to extend your functions from using an 'a option to using an ('a, 'b) either:
datatype ('a, 'b) either = Left of 'a | Right of 'b
infix 3 >>=
fun (Left msg) >>= _ = Left msg
| (Right a) >>= f = f a
fun try (SOME x) _ = Right x
| try NONE msg = Left msg
fun inputLine () =
try (TextIO.inputLine TextIO.stdIn) "Could not read from stdIn."
fun realFromString s =
try (Real.fromString s) "Could not derive real from string."
fun safeSqrt x =
try (SOME (Math.sqrt x) handle Domain => NONE) "Square root of negative number"
fun inputSqrt () =
inputLine () >>= realFromString >>= safeSqrt
And trying this out:
- inputSqrt ();
9
> val it = Right 3.0 : (string, real) either
- inputSqrt ();
~42
> val it = Left "Square root of negative number" : (string, real) either
- inputSqrt ();
Hello
> val it = Left "Could not derive real from string." : (string, real) either
- (TextIO.closeIn TextIO.stdIn; inputSqrt ());
> val it = Left "Could not read from stdIn." : (string, real) either

Matching bytestrings in Parsec

I am currently trying to use the Full CSV Parser presented in Real World Haskell. To do so, I tried to modify the code to use ByteString instead of String, but the string combinator only works with String.
Is there a Parsec combinator similar to string that works with ByteString, without having to do conversions back and forth?
I've seen there is an alternative parser that handles ByteString: attoparsec, but I would prefer to stick with Parsec, since I'm just learning how to use it.
I'm assuming you're starting with something like
import Prelude hiding (getContents, putStrLn)
import Data.ByteString
import Text.Parsec.ByteString
Here's what I've got so far. There are two versions. Both compile. Probably neither is exactly what you want, but they should aid discussion and help you to clarify your question.
Something I noticed along the way:
If you import Text.Parsec.ByteString then this uses uncons from Data.ByteString.Char8, which in turn uses w2c from Data.ByteString.Internal, to convert all read bytes to Chars. This enables Parsec's line and column number error reporting to work sensibly, and also enables you to use string and friends without problem.
Thus, the easy version of the CSV parser, which does exactly that:
import Prelude hiding (getContents, putStrLn)
import Data.ByteString (ByteString)
import qualified Prelude (getContents, putStrLn)
import qualified Data.ByteString as ByteString (getContents)
import Text.Parsec
import Text.Parsec.ByteString
csvFile :: Parser [[String]]
csvFile = endBy line eol
line :: Parser [String]
line = sepBy cell (char ',')
cell :: Parser String
cell = quotedCell <|> many (noneOf ",\n\r")
quotedCell :: Parser String
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return content
quotedChar :: Parser Char
quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return '"')
eol :: Parser String
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"
parseCSV :: ByteString -> Either ParseError [[String]]
parseCSV = parse csvFile "(unknown)"
main :: IO ()
main =
    do c <- ByteString.getContents
       case parse csvFile "(stdin)" c of
            Left e  -> do Prelude.putStrLn "Error parsing input:"
                          print e
            Right r -> mapM_ print r
But this was so trivial to get working that I assume it cannot possibly be what you want. Perhaps you want everything to remain a ByteString or [Word8] or something similar all the way through? Hence my second attempt below. I am still importing Text.Parsec.ByteString, which may be a mistake, and the code is hopelessly riddled with conversions.
But, it compiles and has complete type annotations, and therefore should make a sound starting point.
import Prelude hiding (getContents, putStrLn)
import Data.ByteString (ByteString)
import Control.Monad (liftM)
import qualified Prelude (getContents, putStrLn)
import qualified Data.ByteString as ByteString (pack, getContents)
import qualified Data.ByteString.Char8 as Char8 (pack)
import Data.Word (Word8)
import Data.ByteString.Internal (c2w)
import Text.Parsec ((<|>), (<?>), parse, try, endBy, sepBy, many)
import Text.Parsec.ByteString
import Text.Parsec.Prim (tokens, tokenPrim)
import Text.Parsec.Pos (updatePosChar, updatePosString)
import Text.Parsec.Error (ParseError)
csvFile :: Parser [[ByteString]]
csvFile = endBy line eol
line :: Parser [ByteString]
line = sepBy cell (char ',')
cell :: Parser ByteString
cell = quotedCell <|> liftM ByteString.pack (many (noneOf ",\n\r"))
quotedCell :: Parser ByteString
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return (ByteString.pack content)
quotedChar :: Parser Word8
quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return (c2w '"'))
eol :: Parser ByteString
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"
parseCSV :: ByteString -> Either ParseError [[ByteString]]
parseCSV = parse csvFile "(unknown)"
main :: IO ()
main =
    do c <- ByteString.getContents
       case parse csvFile "(stdin)" c of
            Left e  -> do Prelude.putStrLn "Error parsing input:"
                          print e
            Right r -> mapM_ print r
-- replacements for some of the functions in the Parsec library
noneOf :: String -> Parser Word8
noneOf cs = satisfy (\b -> b `notElem` [c2w c | c <- cs])
char :: Char -> Parser Word8
char c = byte (c2w c)
byte :: Word8 -> Parser Word8
byte c = satisfy (==c) <?> show [c]
satisfy :: (Word8 -> Bool) -> Parser Word8
satisfy f = tokenPrim (\c -> show [c])
                      (\pos c _cs -> updatePosChar pos c)
                      (\c -> if f (c2w c) then Just (c2w c) else Nothing)
string :: String -> Parser ByteString
string s = liftM Char8.pack (tokens show updatePosString s)
Your main efficiency concern is probably the two ByteString.pack calls in the definitions of cell and quotedCell. You might try to replace the Text.Parsec.ByteString module so that, instead of “making strict ByteStrings an instance of Stream with Char token type,” you make ByteStrings an instance of Stream with Word8 token type. That won't help with efficiency, though; it will just give you a headache as you reimplement all the sourcePos functions that keep track of your position in the input for error messages.
No, the way to make it more efficient would be to change the types of cell, quotedCell and string to Parser [Word8], and the types of line and csvFile to Parser [[Word8]] and Parser [[[Word8]]] respectively. You could even change the type of eol to Parser (). The necessary changes would look something like this:
cell :: Parser [Word8]
cell = quotedCell <|> many (noneOf ",\n\r")
quotedCell :: Parser [Word8]
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return content
string :: String -> Parser [Word8]
string s = liftM (map c2w) (tokens show updatePosString s)
You don't need to worry about all the calls to c2w as far as efficiency is concerned, because they cost nothing.
If this doesn't answer your question, please say what would.
I don't believe so. You will need to create one yourself using tokens. Although the documentation for it is a bit... nonexistent, the first two arguments are a function to use to show the expected tokens in an error message and a function to update the source position that will be printed in errors.
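As a rough sketch of what such a combinator could look like: byteString below is a hypothetical helper, not something Parsec provides. It assumes the Stream instance from Text.Parsec.ByteString, whose tokens are Chars, so the literal pattern is unpacked once when the parser is built, while the input itself stays a ByteString.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as Char8
import Text.Parsec (parse)
import Text.Parsec.ByteString (Parser)
import Text.Parsec.Pos (updatePosString)
import Text.Parsec.Prim (tokens)

-- Hypothetical helper (not part of Parsec): match a literal ByteString.
-- tokens takes a function that shows the expected tokens in error messages,
-- a function that advances the source position, and the tokens to match.
byteString :: ByteString -> Parser ByteString
byteString bs = bs <$ tokens show updatePosString (Char8.unpack bs)

main :: IO ()
main = do
    -- The literal "ab" is a prefix of the input, so this succeeds.
    print (parse (byteString (Char8.pack "ab")) "(demo)" (Char8.pack "abc"))
    -- The literal "xy" is not, so this prints a parse error.
    print (parse (byteString (Char8.pack "xy")) "(demo)" (Char8.pack "abc"))
```

On success the parser returns the literal it was given, which avoids packing the matched Chars back into a ByteString.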

GHC rejects ST monad code as unable to unify type variables?

I wrote the following function:
(.>=.) :: Num a => STRef s a -> a -> Bool
r .>=. x = runST $ do
    v <- readSTRef r
    return $ v >= x
but when I tried to compile I got the following error:
Could not deduce (s ~ s1)
from the context (Num a)
bound by the type signature for
.>=. :: Num a => STRef s a -> a -> Bool
at test.hs:(27,1)-(29,16)
`s' is a rigid type variable bound by
the type signature for .>=. :: Num a => STRef s a -> a -> Bool
at test.hs:27:1
`s1' is a rigid type variable bound by
a type expected by the context: ST s1 Bool at test.hs:27:12
Expected type: STRef s1 a
Actual type: STRef s a
In the first argument of `readSTRef', namely `r'
In a stmt of a 'do' expression: v <- readSTRef r
Can anyone help?
This is exactly as intended. An STRef is valid only within the run of runST that created it, and you are trying to use an STRef from outside inside a new run of runST. That is not allowed, because it would permit arbitrary side effects in pure code.
So what you are trying to do is impossible, by design!
You need to stay within the ST context:
(.>=.) :: Ord a => STRef s a -> a -> ST s Bool
r .>=. x = do
    v <- readSTRef r
    return $ v >= x
(And as hammar points out, to use >= you need the Ord typeclass, which Num doesn't provide.)
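Here is a small sketch of how the ST-returning version might be used. The function sumReaches is invented for illustration; the point is that the STRef is created, updated and compared inside a single runST, so only the pure Bool escapes.

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, modifySTRef')

-- The comparison from the answer: it stays inside ST instead of calling runST.
(.>=.) :: Ord a => STRef s a -> a -> ST s Bool
r .>=. x = do
    v <- readSTRef r
    return $ v >= x

-- Hypothetical caller: sums the list into an STRef and checks the total
-- against a limit, all within one runST.
sumReaches :: [Int] -> Int -> Bool
sumReaches xs limit = runST $ do
    acc <- newSTRef 0
    mapM_ (\x -> modifySTRef' acc (+ x)) xs
    acc .>=. limit

main :: IO ()
main = do
    print (sumReaches [1, 2, 3] 6)  -- True, since 1 + 2 + 3 = 6
    print (sumReaches [1, 2, 3] 7)  -- False
```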