functional programming : understand parser combinator - elm

I was trying to solve a problem using a parser combinator. I tried the following:
Note: the below code uses the combinator library
styleParserItalic : Bool -> Parser (List (List Char, Style))
styleParserItalic bolded =
    let
        style =
            if bolded then Italic else Unstyled
    in
    (end `andThen` always (succeed []))
        <|> (string "(!ITALIC!)" `andThen` \_ -> styleParserItalic (not bolded))
        <|> (anyChar `andThen` \c -> styleParserItalic bolded `andThen` \cs -> succeed ((c :: [], style) :: cs))
I am struggling to understand how this parser runs since the styleParserItalic parser is called before the parser succeeds.
Could someone explain how the parser works when it is given a string of characters?
If someone is interested in the purpose of the parser and the full code, here is my previous question.
Here is what I have understood thus far:
The parser first checks whether it is at the end of the line. If not, it tries to parse the string (!ITALIC!); if that succeeds, it calls the parser again with the parameter flipped (True becomes False and vice versa).
If the parser does not find the string (!ITALIC!), it tries to parse any character and then calls the parser again.
What confuses me is that the parser keeps calling itself as long as it succeeds in parsing any character!
Edit: NOTE THE BELOW IS NOT A PART OF THE QUESTION, JUST TO SHARE THE CODE IF SOMEONE IS INTERESTED
Thanks for all the responses. I have updated the parser to handle bold, italic, underline, etc., as per the screenshot below.
type Style = Bold| Unstyled | Italic | Coded | Lined | Titled | Marked | Underline
styleParser : Bool ->Bool ->Bool ->Bool-> Bool-> Bool->Bool
-> Parser ( List (List Char , (Style,Style,Style,Style,Style,Style,Style)))
--(bold,italic ,code,line ,Titled,mark)
styleParser bolded italiced coded lined titled marked underlined=
let
style = (
if bolded then Bold else Unstyled
,if italiced then Italic else Unstyled
,if coded then Coded else Unstyled
,if lined then Lined else Unstyled
,if titled then Titled else Unstyled
,if marked then Marked else Unstyled
,if underlined then Underline else Unstyled
)
in
(end `andThen` always ( succeed ( [] )))
<|> (string "//" `andThen` \_ -> styleParser bolded italiced coded lined titled marked (not underlined))
<|> (string "**" `andThen` \_ -> styleParser (not bolded) italiced coded lined titled marked underlined)
<|> (string "*" `andThen` \_ -> styleParser bolded (not italiced) coded lined titled marked underlined)
<|> (string "`" `andThen` \_ -> styleParser bolded italiced (not coded) lined titled marked underlined)
<|> (string "/br" `andThen` \_ -> styleParser bolded italiced coded (not lined) titled marked underlined)
<|> (string "/*" `andThen` \_ -> styleParser bolded italiced coded lined (not titled) marked underlined)
<|> (string "{-" `andThen` \_ -> styleParser bolded italiced coded lined titled (not marked) underlined)
<|> ( anyChar `andThen` \c -> styleParser bolded italiced coded lined titled marked underlined `andThen` \cs -> succeed ((c :: [],style) :: cs) )
foldStyleHtml : List ( List Char , ( Style,Style,Style,Style,Style,Style,Style) ) -> List (Html Msg)
foldStyleHtml lst =
List.map styleToHtml lst
styleToHtml : ( List Char, (Style ,Style,Style,Style,Style,Style,Style)) -> Html Msg
styleToHtml (a,b) =
case b of
(Bold,Italic,_,_,_,_,Unstyled) -> strong [] [em [][ text (String.fromList a)]]
(Bold,Italic,_,_,_,_,Underline) -> u[][ strong [] [em [][ text (String.fromList a)]]]
(Bold,Unstyled,_,_,_,_,Underline) -> u[][ strong [] [text (String.fromList a)]]
(Unstyled,Italic,_,_,_,_,Underline) -> u[][ em [] [text (String.fromList a)]]
(Unstyled,Italic,_,_,_,_,_) -> em[] [text (String.fromList a)]
(Bold,Unstyled,_,_,_,_,_) -> strong [][ text (String.fromList a)]
(_,_,Coded,_,_,_,_) -> code [codeStyle ][text (String.fromList a)]
(_,_,_,Lined,_,_,_) -> br [][text " "]
-- (_,_,_,_,Titled,_,_) -> div [][text (String.fromList a)]
(_,_,_,_,_,Marked,_) -> mark [][text (String.fromList a)]
(_,_,_,_,_,_,Underline) -> u [][text (String.fromList a)]
(_,_,_,_,_,_,_) -> text (String.fromList a)
htmlParser : Parser (List (Html Msg))
htmlParser =
styleParser False False False False False False False `andThen` (succeed << foldStyleHtml )
runParser : Parser (List (Html Msg)) -> String -> Html Msg
runParser parser str =
case parse parser str of
(Ok htmls,_)-> div [] htmls
(Err err, _) -> div [ style [("color", "red")] ] [ text <| toString <| err]

Parser combinators (generally) consume input as they succeed. In this library, if string "(!ITALIC!)" fails, it will not consume any input. Since the <|> combinator is used, it then tries to use the next part of the code that starts with anyChar.
When anyChar succeeds, it consumes that single character and captures it in c after andThen. The remaining string (everything but the character captured by anyChar) is then "crawled" when the recursive call to styleParserItalic bolded is made. The second andThen captures the output of the recursive parser in cs, and the captured character is prepended to the list of results from the recursive call.
I think the important part to remember is that the combinators consume input as they succeed and (generally) don't consume input when they fail.
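To make the control flow described above concrete, here is a minimal sketch of the same idea in Python rather than Elm. It is not how the combinator library is implemented; the helper names (succeed, end, any_char, string, and_then, or_else) are made up for illustration, and a parser is assumed to be just a function from the input string to a (value, remaining input) pair, or None on failure.

```python
# Assumption: a parser is a function  str -> (value, rest)  or  None on failure.

def succeed(value):
    # Always succeeds without consuming input.
    return lambda s: (value, s)

def end(s):
    # Succeeds only when no input is left.
    return ((), s) if s == "" else None

def any_char(s):
    # Consumes exactly one character, fails on empty input.
    return (s[0], s[1:]) if s else None

def string(expected):
    # Consumes `expected` if the input starts with it; otherwise fails
    # and leaves the input untouched.
    return lambda s: (expected, s[len(expected):]) if s.startswith(expected) else None

def and_then(parser, make_next):
    # Run `parser`; only if it succeeds, call `make_next` with its value
    # to obtain the parser for the remaining input.
    def run(s):
        result = parser(s)
        if result is None:
            return None
        value, rest = result
        return make_next(value)(rest)
    return run

def or_else(first, second):
    # Try `first`; if it fails, try `second` on the same (unconsumed) input.
    def run(s):
        result = first(s)
        return result if result is not None else second(s)
    return run

def style_parser_italic(italic):
    style = "Italic" if italic else "Unstyled"
    return or_else(
        and_then(end, lambda _: succeed([])),
        or_else(
            and_then(string("(!ITALIC!)"), lambda _: style_parser_italic(not italic)),
            and_then(any_char, lambda c:
                and_then(style_parser_italic(italic), lambda cs:
                    succeed([(c, style)] + cs))),
        ),
    )

print(style_parser_italic(False)("a(!ITALIC!)b"))
# prints ([('a', 'Unstyled'), ('b', 'Italic')], '')
```

Note that every recursive call to style_parser_italic sits inside a lambda handed to and_then, so it only runs after the parser before it has succeeded and consumed its input. That is why the recursion terminates: each step works on a strictly shorter string until end succeeds.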

First, a few simplifications...
(1) Every element of the inner List in the signature:
styleParserItalic : Bool -> Parser ( List (List Char , Style))
                                           ^^^^^^^^^
is just a single character. By just removing :: [] from the last line,
<|> ... `andThen` \cs -> succeed ((c ,style) :: cs) )
                                    ^^^
                                    removed `:: []`
you can simplify the result type to List (Char, Style).
(2) Note that the bolded argument only affects the Style; it has no effect on control flow. It really should be called italic, since the style appearing in the output is Italic if the argument is True and Unstyled otherwise.
Also note that each (!ITALIC!) marker flips this parameter (the recursive call passes not bolded), so the first marker turns italics on and the next one turns them off again.
So now the algorithm is:
If at the end of the line, return the empty list.
If at (!ITALIC!), toggle the style used for the remainder of the parse.
Otherwise, parse a character, parse the remainder of the line and prepend the character to the results.
An approximate Python algorithm would be something like:
def parse_line(line, italic=False):
    marker = '(!ITALIC!)'
    result = []
    while line:
        if line.startswith(marker):
            # skip the marker and toggle the style
            line = line[len(marker):]
            italic = not italic
        else:
            result.append((line[0], 'Italic' if italic else 'Unstyled'))
            line = line[1:]
    return result

Related

Format a string in Elm

I have a list of strings and render each one to HTML dynamically in an li tag. I want to use the string as the id attribute as well, but the problem is that the items contain special characters like :, ', é, ... I want the output to include only digits (0-9) and letters (a-z).
// Input:
listStr = ["Pop & Suki", "PINK N' PROPER", "L'Oréal Paris"]
// Output:
result = ["pop_suki", "pink_n_proper", "loreal_paris"] ("loral_paris" is also good)
Currently, I just lowercase the string and replace " " with _, but I don't know how to eliminate the special characters.
Many thanks!
Instead of thinking of it as eliminating special characters, consider the permitted characters – you want just lower-case alphanumeric characters.
Elm provides Char.isAlphaNum to test for alphanumeric characters, and Char.toLower to transform a character to lower case. It also provides the higher-order function String.foldl, which you can use to process a String one Char at a time.
So for each character:
check if it's alphanumeric
if it is, transform it to lower case
if not and it is a space, transform it to an underscore
else drop the character
Putting this together, we create a function that processes a character and appends it to the string processed so far, then apply that to all characters in the input string:
transformNextCharacter : Char -> String -> String
transformNextCharacter nextCharacter partialString =
    if Char.isAlphaNum nextCharacter then
        partialString ++ String.fromChar (Char.toLower nextCharacter)
    else if nextCharacter == ' ' then
        partialString ++ "_"
    else
        partialString

transformString : String -> String
transformString inputString =
    String.foldl transformNextCharacter "" inputString
Note: This answer simply drops special characters and thus produces "loral_paris" which is acceptable as per the OP.
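One thing to watch with the fold above: it emits one underscore per space, so "Pop & Suki" (a space on each side of the dropped &) comes out as "pop__suki" rather than "pop_suki". Below is a small sketch of the same fold in Python with one extra check that skips an underscore when the partial result already ends in one. The function names are just translations of the ones above, and the isascii test is an assumption meant to mirror an ASCII-only alphanumeric check, so that é is dropped rather than kept.

```python
from functools import reduce

def transform_next_character(partial, ch):
    # Keep ASCII letters and digits, lower-cased.
    if ch.isascii() and ch.isalnum():
        return partial + ch.lower()
    # Turn a space into an underscore, but never emit two in a row.
    if ch == " " and not partial.endswith("_"):
        return partial + "_"
    # Drop everything else.
    return partial

def transform_string(text):
    # Left fold over the characters, starting from the empty string.
    return reduce(transform_next_character, text, "")

print([transform_string(s) for s in ["Pop & Suki", "PINK N' PROPER", "L'Oréal Paris"]])
# prints ['pop_suki', 'pink_n_proper', 'loral_paris']
```

The same extra condition can be added to the else-if branch of the fold shown above.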
The accepted answer is a lot more efficient than the code below, but I want to add it as an alternative.
If you want to convert accented characters to their unaccented equivalents, you can install the elm-community/string-extra package, which provides String.Extra.removeAccents.
The code below is inefficient because each library call makes its own pass over the string, one character at a time.
Also note that removing the & from the first item leaves two adjacent spaces, which become a double underscore, so you have to replace the double underscore with a single one.
import Html exposing (text)
import String
import List
import String.Extra
import Char
listStr = ["Pop & Suki", "PINK N' PROPER", "L'Oréal Paris"]

-- True if alpha or digit or space, otherwise, False.
isDigitAlphaSpace : Char -> Bool
isDigitAlphaSpace c =
    if Char.isAlpha c || Char.isDigit c || c == ' ' then
        True
    else
        False

main =
    List.map
        (\x ->
            String.Extra.removeAccents x -- Remove accents first
                |> String.filter isDigitAlphaSpace -- Remove anything that is not a digit, letter or space
                |> String.replace " " "_" -- Replace space with _
                |> String.replace "__" "_" -- Replace double __ with _
                |> String.toLower -- Turn the string to lower case
        )
        listStr
        |> Debug.toString
        |> Html.text

Print elements in list in OCaml

I want to write a tail recursive function to print elements in a string list in separate lines like this:
# printlist ["a";"b";"c"];;
a
b
c
- : unit = ()
# printlist ["hello";"thanks"];;
hello
thanks
- : unit = ()
I was able to get it to work using print_endline with no problem:
let rec printlist strlist =
  match strlist with
  | [] -> print_endline ""
  | hd::[] -> print_endline hd
  | hd :: tl -> print_endline hd ; printlist tl;;
However, as soon as I switch to printf, it doesn't work anymore. What's wrong with my printf version?
let rec printlist strlist =
  match strlist with
  | [] -> printf ""
  | hd::[] -> printf hd
  | hd :: tl -> printf "%s\n" hd ; printlist tl;;
Error: This expression has type
(unit, out_channel, unit) format =
(unit, out_channel, unit, unit, unit, unit)
CamlinternalFormatBasics.format6
but an expression was expected of type string
In essence, you're trying to use printf without a format. The first argument of printf has to be a constant string. So you should have this:
printf "%s" hd
rather than this:
printf hd
To see why this is required, imagine what would happen if some of the strings in your input contained percent characters. Things would get out of control (type-wise) quite quickly.
In addition to Jeffrey's answer, I would suggest you use the standard library more in order to write more concise code.
List.iter, for example, calls a given function on all the elements of the list:
let print_list l = List.iter (fun e -> Printf.printf "%s\n" e) l
Using partial application smartly, you can make this line even shorter and more readable:
let print_list = List.iter (Printf.printf "%s\n")
The only difference with your function is the newline after the last element.
On the other hand, instead of printing elements one after another, a more functional and idiomatic approach would be to build the whole string first, and then print it.
Luckily, the standard library has you covered. String.concat joins the elements of a string list together into one big string. You can also specify a string to use as a separator, and you don't have to worry about the newline after the last element.
let print_list l = print_string (String.concat "\n" l)

Getting syntax error in ocaml

I am trying to make a module that would allow creating a table in OCaml. It would support a query called project to limit the table's values. However, on the last line of the definition of the function chooser I am getting a syntax error.
module type TABLE =
sig
type database
type table
val create_table: string list * string list* (string list) list -> table
val printTable : table -> string
val listToString : string list -> string
val project : string list * table -> table
val chooser : string list * string list-> string list
end;;
module UsingTable : TABLE =
struct
type table = (string list * string list* (string list) list)
type database = table list
let create_table (a,b,c) = (a,b,c)
let chooser inputList = (
for i = 0 to (List.length trueFalseList-1) do
if List.nth trueFalseList i = "True"
then
(List.nth inputList i)::ans
done
List.rev ans;;)
let project (conditions, aTable)= (
let rec innerProc tmp= function
n,[],v->List.rev tmp
|n,cH::cT,v-> if List.mem cH conditions
then innerProc (["True"]::tmp) (n,cT,v)
else innerProc (["False"]::tmp) (n,cT,v)
in
let trueFalseList = innerProc [] aTable
let rec finalListCreator = match aTable with
n,[],[]->n,[],[]
|n,cH::cT,[]->n,chooser cH ::finalListCreator cT,[]
|n,c,h::t -> n,c,chooser h ::finalListCreator t
)
let rec listToString aList = match aList with
[] -> ""
| h::t -> "\t"^h^"\t"^listToString t
let rec printTable aTable = match aTable with
[],[],[] -> ""
| [],[],vH::vT -> "\n"^(listToString vH)^printTable ([],[],vT)
| [],cH::cT,v -> "\t"^cH^"\t"^printTable([],cT, v)
| n, c , v-> "\n"^(List.hd n)^"\n\n"^printTable([],c, v)
end;;
let atable =UsingTable.create_table (["Student"], ["Id";"Name";"Gender";"Course"],
[["001";"Jim";"M";"AlgoDS"];
["002";"Linnea";"F";"Databases"];
["003";"Anna";"F";"C#"];
["004";"Abby";"F";"C#"];
["005";"Arthur";"M";"JavaScript"]]);;
print_string (UsingTable.printTable atable) ;;
These lines have at least two syntax problems:
let chooser inputList = (
for i = 0 to (List.length trueFalseList-1) do
if List.nth trueFalseList i = "True"
then
(List.nth inputList i)::ans
done
List.rev ans;;)
First, the for .. done is one expression, and List.rev ans is another expression. You need a semicolon (;) between them.
Second, you should use ;; only when you want the input up to that point to be processed. But here if you process the input at the ;; you are missing a right parenthesis.
In my opinion, you should be entering ;; only at the toplevel. The best way to think of this token is as an instruction to the toplevel. It's not part of normal OCaml syntax.
These are only the first two errors. There are quite a few other errors in the code. It might be good to add one function at a time to the module so you can concentrate on a few problems at a time.
Update
The environment you're using is a little bit extra complicated because it has an Evaluate button that asks to evaluate what you've typed so far. This makes the ;; token much less useful.
It would be a good discipline to use this environment without using the ;; token at all. Just click the Evaluate button when you want an evaluation.
The main trick is needed when you want to evaluate a statement (a unit-valued expression in OCaml) at the outer level, such as Printf.printf "hello world\n". The usual idiom to avoid putting ;; before it is to turn it into a declaration, like so:
let () = Printf.printf "hello world\n"
That is the one non-obvious idiom that people use when writing source code (where the ;; almost never appears in my experience).

Elm: String.toFloat doesn't work with comma only with point - what to do?

I'm very new to Elm and I want to write a simple mileage counter app.
If I get "1.2" (POINT) from the input, String.toFloat returns the Ok branch with 1.2 as a number.
But if I get "1,2" (COMMA) from the input, String.toFloat returns the Err branch with "You can't have words, only numbers!"
This pretty much works like a real-time validator.
The code:
TypingInInput val ->
    case String.toFloat val of
        Ok success ->
            { model | inputValue = val, errorMessage = Nothing }
        Err err ->
            { model | inputValue = val, errorMessage = Just "You can't have words, or spaces, only numbers!" }
Question: how can I force String.toFloat of "1,2" to give me the number 1.2?
Unfortunately the source for toFloat is hardcoded to only respect a dot as decimal separator. You can replace the comma with a dot in the string prior to passing it to toFloat as a workaround.
String.Extra.replace can be used for the simple string replacement.
The implementation of String.toFloat only supports a dot as a separator.
You should replace commas with dots before parsing the Float.
Please see the example:
import Html exposing (text)
import String
import Regex

main =
    "1,2"
        |> Regex.replace Regex.All (Regex.regex ",") (\_ -> ".")
        |> String.toFloat
        |> toString
        |> text -- Ok 1.2
In JavaScript parseFloat doesn't support comma separator either.
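The same replace-then-parse workaround carries over to other languages whose float parser only accepts a dot. As a rough sketch in Python (whose float also rejects "1,2"), with parse_decimal being a hypothetical helper name rather than anything from a library:

```python
def parse_decimal(text):
    """Parse a number that uses either '.' or ',' as the decimal separator.

    Returns None instead of raising when the text is not a number.
    """
    try:
        # Normalise the decimal comma to a dot before parsing.
        return float(text.replace(",", "."))
    except ValueError:
        return None

print(parse_decimal("1,2"))   # prints 1.2
print(parse_decimal("1.2"))   # prints 1.2
print(parse_decimal("abc"))   # prints None
```

This deliberately does nothing clever about thousands separators: an input like "1,234.5" becomes "1.234.5" and is rejected, which is probably what a live input validator wants anyway.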

Matching bytestrings in Parsec

I am currently trying to use the full CSV parser presented in Real World Haskell. To do so, I tried to modify the code to use ByteString instead of String, but there is a string combinator which only works with String.
Is there a Parsec combinator similar to string that works with ByteString, without having to do conversions back and forth?
I've seen there is an alternative parser that handles ByteString: attoparsec, but I would prefer to stick with Parsec, since I'm just learning how to use it.
I'm assuming you're starting with something like
import Prelude hiding (getContents, putStrLn)
import Data.ByteString
import Text.Parsec.ByteString
Here's what I've got so far. There are two versions. Both compile. Probably neither is exactly what you want, but they should aid discussion and help you to clarify your question.
Something I noticed along the way:
If you import Text.Parsec.ByteString then this uses uncons from Data.ByteString.Char8, which in turn uses w2c from Data.ByteString.Internal, to convert all read bytes to Chars. This enables Parsec's line and column number error reporting to work sensibly, and also enables you to use string and friends without problem.
Thus, the easy version of the CSV parser, which does exactly that:
import Prelude hiding (getContents, putStrLn)
import Data.ByteString (ByteString)
import qualified Prelude (getContents, putStrLn)
import qualified Data.ByteString as ByteString (getContents)
import Text.Parsec
import Text.Parsec.ByteString

csvFile :: Parser [[String]]
csvFile = endBy line eol

line :: Parser [String]
line = sepBy cell (char ',')

cell :: Parser String
cell = quotedCell <|> many (noneOf ",\n\r")

quotedCell :: Parser String
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return content

quotedChar :: Parser Char
quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return '"')

eol :: Parser String
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"

parseCSV :: ByteString -> Either ParseError [[String]]
parseCSV = parse csvFile "(unknown)"

main :: IO ()
main =
    do c <- ByteString.getContents
       case parse csvFile "(stdin)" c of
            Left e -> do Prelude.putStrLn "Error parsing input:"
                         print e
            Right r -> mapM_ print r
But this was so trivial to get working that I assume it cannot possibly be what you want. Perhaps you want everything to remain a ByteString or [Word8] or something similar all the way through? Hence my second attempt below. I am still importing Text.Parsec.ByteString, which may be a mistake, and the code is hopelessly riddled with conversions.
But, it compiles and has complete type annotations, and therefore should make a sound starting point.
import Prelude hiding (getContents, putStrLn)
import Data.ByteString (ByteString)
import Control.Monad (liftM)
import qualified Prelude (getContents, putStrLn)
import qualified Data.ByteString as ByteString (pack, getContents)
import qualified Data.ByteString.Char8 as Char8 (pack)
import Data.Word (Word8)
import Data.ByteString.Internal (c2w)
import Text.Parsec ((<|>), (<?>), parse, try, endBy, sepBy, many)
import Text.Parsec.ByteString
import Text.Parsec.Prim (tokens, tokenPrim)
import Text.Parsec.Pos (updatePosChar, updatePosString)
import Text.Parsec.Error (ParseError)

csvFile :: Parser [[ByteString]]
csvFile = endBy line eol

line :: Parser [ByteString]
line = sepBy cell (char ',')

cell :: Parser ByteString
cell = quotedCell <|> liftM ByteString.pack (many (noneOf ",\n\r"))

quotedCell :: Parser ByteString
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return (ByteString.pack content)

quotedChar :: Parser Word8
quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return (c2w '"'))

eol :: Parser ByteString
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"

parseCSV :: ByteString -> Either ParseError [[ByteString]]
parseCSV = parse csvFile "(unknown)"

main :: IO ()
main =
    do c <- ByteString.getContents
       case parse csvFile "(stdin)" c of
            Left e -> do Prelude.putStrLn "Error parsing input:"
                         print e
            Right r -> mapM_ print r

-- replacements for some of the functions in the Parsec library

noneOf :: String -> Parser Word8
noneOf cs = satisfy (\b -> b `notElem` [c2w c | c <- cs])

char :: Char -> Parser Word8
char c = byte (c2w c)

byte :: Word8 -> Parser Word8
byte c = satisfy (==c) <?> show [c]

satisfy :: (Word8 -> Bool) -> Parser Word8
satisfy f = tokenPrim (\c -> show [c])
                      (\pos c _cs -> updatePosChar pos c)
                      (\c -> if f (c2w c) then Just (c2w c) else Nothing)

string :: String -> Parser ByteString
string s = liftM Char8.pack (tokens show updatePosString s)
Probably your concern, efficiency-wise, should be those two ByteString.pack instructions, in the definitions of cell and quotedCell. You might try to replace the Text.Parsec.ByteString module so that instead of “making strict ByteStrings an instance of Stream with Char token type,” you make ByteStrings an instance of Stream with Word8 token type, but this won't help you with efficiency, it will just give you a headache trying to reimplement all the sourcePos functions to keep track of your position in the input for error messages.
No, the way to make it more efficient would be to change the types of cell, quotedCell and string to Parser [Word8] and the types of line and csvFile to Parser [[Word8]] and Parser [[[Word8]]] respectively. You could even change the type of eol to Parser (). The necessary changes would look something like this:
cell :: Parser [Word8]
cell = quotedCell <|> many (noneOf ",\n\r")

quotedCell :: Parser [Word8]
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return content

string :: String -> Parser [Word8]
-- tokens yields a Parser String, so map c2w over its result
string s = liftM (map c2w) (tokens show updatePosString s)
You don't need to worry about all the calls to c2w as far as efficiency is concerned, because they cost nothing.
If this doesn't answer your question, please say what would.
I don't believe so. You will need to create one yourself using tokens. Although the documentation for it is a bit... nonexistent, the first two arguments are a function to use to show the expected tokens in an error message and a function to update the source position that will be printed in errors.