Format a string in Elm - elm

I have a list of strings and generate HTML from it dynamically, one li tag per item. I also want to use each value as the id attribute. The problem is that the strings contain special characters such as :, ', é, and so on. I want the output to contain only digits (0-9) and letters (a-z).
// Input:
listStr = ["Pop & Suki", "PINK N' PROPER", "L'Oréal Paris"]
// Output:
result = ["pop_suki", "pink_n_proper", "loreal_paris"] ("loral_paris" is also good)
Currently I just lowercase the string and replace " " with _, but I don't know how to eliminate the special characters.
Many thanks!

Instead of thinking of it as eliminating special characters, consider the permitted characters – you want just lower-case alphanumeric characters.
Elm provides Char.isAlphaNum to test for alphanumeric characters, and Char.toLower to transform a character to lower case. It also provides the higher-order function String.foldl, which you can use to process a String one Char at a time.
So for each character:
check if it's alphanumeric
if it is, transform it to lower case
if not and it is a space, transform it to an underscore
else drop the character
Putting this together, we create a function that processes a character and appends it to the string processed so far, then apply that to all characters in the input string:
transformNextCharacter : Char -> String -> String
transformNextCharacter nextCharacter partialString =
    if Char.isAlphaNum nextCharacter then
        partialString ++ String.fromChar (Char.toLower nextCharacter)
    else if nextCharacter == ' ' then
        partialString ++ "_"
    else
        partialString
transformString : String -> String
transformString inputString =
    String.foldl transformNextCharacter "" inputString
Note: This answer simply drops special characters and thus produces "loral_paris" which is acceptable as per the OP.

The accepted answer is a lot more efficient than the code below, but I want to add mine as an alternative.
If you want to convert accented characters to their plain equivalents, you can install the elm-community/string-extra package and use String.Extra.removeAccents.
The code below is inefficient because it chains several library functions over the same string, and each of them traverses the string one character at a time.
Also note that removing the & from the first item leaves two adjacent spaces, which become a double underscore, so you have to replace the double underscore with a single one.
import Html exposing (text)
import String
import List
import String.Extra
import Char
listStr = ["Pop & Suki", "PINK N' PROPER", "L'Oréal Paris"]
-- True if alpha or digit or space, otherwise, False.
isDigitAlphaSpace : Char -> Bool
isDigitAlphaSpace c =
    if Char.isAlpha c || Char.isDigit c || c == ' ' then
        True
    else
        False
main =
    List.map
        (\x ->
            String.Extra.removeAccents x -- Remove accents first
                |> String.filter isDigitAlphaSpace -- Remove anything that is not a digit, letter, or space
                |> String.replace " " "_" -- Replace space with _
                |> String.replace "__" "_" -- Replace double __ with _
                |> String.toLower -- Lowercase the string
        )
        listStr
        |> Debug.toString
        |> Html.text

Related

Take string if not empty not working as expected

I have a function that replaces commas with an empty string and then converts the result to a Double. The user can enter only digits and commas. After the replace, I want to keep the string for conversion only if it's not empty. But even when it is empty, takeIf {} doesn't seem to see it that way. When I enter only "," as the first character, the conversion fails with:
java.lang.NumberFormatException: empty String
Scenario when entering , as first char.
replace(",", "")
    .takeIf {
        println(it) //prints nothing
        println(it.length) //prints 0
        if (isNotEmpty()) { // docs says that it checks if length > 0, which is not, so string is empty
            println("not empty") // still prints not empty
        } else {
            println("empty")
        }
        isNotEmpty()
    }?.toDouble()) // runs toDouble on empty string
Logs:
System.out: 0
System.out: not empty
Note that replace returns a new String and does not alter the original one. Inside the takeIf lambda, the unqualified isNotEmpty() is called on the receiver String (the original, which was ","), whereas it refers to the result of the replacement, which is what you print.
So if you use it consistently, i.e. it.isNotEmpty(), it will work as you expect.
Note also that there exists toDoubleOrNull() which just returns null if no double can be extracted from the String, e.g:
replace(",", "").toDoubleOrNull() // if it is not parseable, we get null
So you can spare even more characters and conditions in your code.
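The difference between the receiver and it can be sketched as follows. This is only an illustration, assuming the original code sits inside a String extension function (or another scope where this is the input string); the function names parseCheckingReceiver, parseCheckingIt and parseOrNull are made up for the example.

```kotlin
// Hypothetical extension functions for illustration; the names are not from the question.

// Mirrors the question: the unqualified isNotEmpty() means this.isNotEmpty(),
// so it checks the original string, not the result of replace.
fun String.parseCheckingReceiver(): Double? =
    replace(",", "")
        .takeIf { isNotEmpty() }
        ?.toDouble()

// Checks the replaced string via the lambda parameter `it`.
fun String.parseCheckingIt(): Double? =
    replace(",", "")
        .takeIf { it.isNotEmpty() }
        ?.toDouble()

// Shortest variant: toDoubleOrNull returns null for unparseable input, including "".
fun String.parseOrNull(): Double? =
    replace(",", "").toDoubleOrNull()

fun main() {
    val input = ","
    // The receiver "," is not empty, so toDouble() runs on "" and throws.
    val failure = runCatching { input.parseCheckingReceiver() }.exceptionOrNull()
    println(failure is NumberFormatException)
    println(input.parseCheckingIt())
    println(input.parseOrNull())
    println("1,234".parseOrNull())
}
```

Only the first variant throws for ","; the other two return null, and parseOrNull additionally returns null for input that is not numeric at all.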

Checking if string is empty in Kotlin

In Java, we've always been reminded to use myString.isEmpty() to check whether a String is empty. In Kotlin however, I find that you can use either myString == "" or myString.isEmpty() or even myString.isBlank().
Are there any guidelines/recommendations on this? Or is it simply "anything that rocks your boat"?
Thanks in advance for feeding my curiosity. :D
Don't use myString == "". In Java this would be myString.equals(""), which also isn't recommended.
isBlank is not the same as isEmpty and it really depends on your use-case.
isBlank checks that a char sequence has a 0 length or that all indices are white space. isEmpty only checks that the char sequence length is 0.
/**
 * Returns `true` if this string is empty or consists solely of whitespace characters.
 */
public fun CharSequence.isBlank(): Boolean = length == 0 || indices.all { this[it].isWhitespace() }
/**
 * Returns `true` if this char sequence is empty (contains no characters).
 */
@kotlin.internal.InlineOnly
public inline fun CharSequence.isEmpty(): Boolean = length == 0
For String? (nullable String) datatype, I use .isNullOrBlank()
For String, I use .isBlank()
Why? Because most of the time, I do not want to allow Strings that contain only whitespace (and .isBlank() checks for whitespace as well as for the empty String). If you don't care about whitespace, use .isNullOrEmpty() and .isEmpty() for String? and String, respectively.
Use isEmpty when you want to test that a String is exactly equal to the empty string "".
Use isBlank when you want to test that a String is empty or only consists of whitespace ("", " ").
Avoid using == "".
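As a quick illustration of the isEmpty/isBlank distinction described above, here is a minimal sketch. The describe helper is hypothetical and exists only to print both results side by side.

```kotlin
// Hypothetical helper for illustration: summarizes both checks for one string.
fun describe(s: String): String =
    "isEmpty=${s.isEmpty()}, isBlank=${s.isBlank()}"

fun main() {
    // Empty string, a single space, tab plus newline, and text padded with spaces.
    for (s in listOf("", " ", "\t\n", " a ")) {
        println(describe(s))
    }
}
```

Only the empty string satisfies isEmpty, while the whitespace-only samples satisfy isBlank as well; " a " satisfies neither.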
There are two methods available in Kotlin.
isNullOrBlank()
isNullOrEmpty()
And the difference is:
val data = " " // a string containing only a space
println(data.isNullOrBlank()) // true
println(data.isNullOrEmpty()) // false
You can use isNullOrBlank() to check whether a string is null or empty. This method also treats whitespace-only strings as empty.
Here is a usage example:
val s: String? = null
println(s.isNullOrBlank())
val s1: String? = ""
println(s1.isNullOrBlank())
val s2: String? = " "
println(s2.isNullOrBlank())
val s3: String? = " a "
println(s3.isNullOrBlank())
The output of this snippet is:
true
true
true
false
As someone mentioned in the comments, you can use ifBlank, like so:
fun getSomeValue(): String {
    // ...
    val foo = someCall()
    return foo.ifBlank { "some-default" }
}
Documentation: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/if-blank.html

Elm: String.toFloat doesn't work with comma only with point - what to do?

I'm very new to Elm and I want to build a simple mileage counter app.
If I get "1.2" (point) from the input, String.toFloat returns in the Ok branch with 1.2 as a number.
But if I get "1,2" (comma) from the input, String.toFloat returns in the Err branch with "You can't have words, only numbers!"
This pretty much works like a real time validator.
The code:
TypingInInput val ->
    case String.toFloat val of
        Ok success ->
            { model | inputValue = val, errorMessage = Nothing }
        Err err ->
            { model | inputValue = val, errorMessage = Just "You can't have words, or spaces, only numbers!" }
Question: how can I force String.toFloat of "1,2" to give me the number 1.2?
Unfortunately the source for toFloat is hardcoded to only respect a dot as decimal separator. You can replace the comma with a dot in the string prior to passing it to toFloat as a workaround.
String.Extra.replace can be used for the simple string replacement.
The implementation of String.toFloat only supports a dot as a separator.
You should replace commas first before parsing the Float
Please see the example:
import Html exposing (text)
import String
import Regex
main =
    "1,2"
        |> Regex.replace Regex.All (Regex.regex ",") (\_ -> ".")
        |> String.toFloat
        |> toString
        |> text -- displays Ok 1.2, because String.toFloat returns a Result here
In JavaScript parseFloat doesn't support comma separator either.

NSXMLNode textWithStringValue with entity

Creating NSXMLNode with string:
NSXMLNode *node1 = [NSXMLNode textWithStringValue:@"<"];
NSLog(@"node1=%@",node1);
NSXMLNode *node2 = [NSXMLNode textWithStringValue:@">"];
NSLog(@"node2=%@",node2);
produces the following output:
node1=&lt;
node2=>
Why is the "<" character escaped (i.e. converted into "&lt;") while the ">" character is not?
Is this a bug?
Which node is handled correctly?
To quote the XML Spec:
The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. [...] The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
In short, there are circumstances in which > does not have to be escaped, such as if it appears in an attribute.
No, this is not a bug.
Both nodes are handled correctly.
If you ask for the string in canonical format, both characters will be escaped:
NSXMLNode *node3 = [NSXMLNode textWithStringValue:@">"];
NSLog(@"node3=%@",[node3 canonicalXMLStringPreservingComments:NO]);
Output:
node3=&gt;

Matching bytestrings in Parsec

I am currently trying to use the full CSV parser presented in Real World Haskell. I tried to modify the code to use ByteString instead of String, but the string combinator only works with String.
Is there a Parsec combinator similar to string that works with ByteString, without having to do conversions back and forth?
I've seen there is an alternative parser that handles ByteString: attoparsec, but I would prefer to stick with Parsec, since I'm just learning how to use it.
I'm assuming you're starting with something like
import Prelude hiding (getContents, putStrLn)
import Data.ByteString
import Text.Parsec.ByteString
Here's what I've got so far. There are two versions. Both compile. Probably neither is exactly what you want, but they should aid discussion and help you to clarify your question.
Something I noticed along the way:
If you import Text.Parsec.ByteString then this uses uncons from Data.ByteString.Char8, which in turn uses w2c from Data.ByteString.Internal, to convert all read bytes to Chars. This enables Parsec's line and column number error reporting to work sensibly, and also enables you to use string and friends without problem.
Thus, the easy version of the CSV parser, which does exactly that:
import Prelude hiding (getContents, putStrLn)
import Data.ByteString (ByteString)
import qualified Prelude (getContents, putStrLn)
import qualified Data.ByteString as ByteString (getContents)
import Text.Parsec
import Text.Parsec.ByteString
csvFile :: Parser [[String]]
csvFile = endBy line eol
line :: Parser [String]
line = sepBy cell (char ',')
cell :: Parser String
cell = quotedCell <|> many (noneOf ",\n\r")
quotedCell :: Parser String
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return content
quotedChar :: Parser Char
quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return '"')
eol :: Parser String
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"
parseCSV :: ByteString -> Either ParseError [[String]]
parseCSV = parse csvFile "(unknown)"
main :: IO ()
main =
    do c <- ByteString.getContents
       case parse csvFile "(stdin)" c of
            Left e -> do Prelude.putStrLn "Error parsing input:"
                         print e
            Right r -> mapM_ print r
But this was so trivial to get working that I assume it cannot possibly be what you want. Perhaps you want everything to remain a ByteString or [Word8] or something similar all the way through? Hence my second attempt below. I am still importing Text.Parsec.ByteString, which may be a mistake, and the code is hopelessly riddled with conversions.
But, it compiles and has complete type annotations, and therefore should make a sound starting point.
import Prelude hiding (getContents, putStrLn)
import Data.ByteString (ByteString)
import Control.Monad (liftM)
import qualified Prelude (getContents, putStrLn)
import qualified Data.ByteString as ByteString (pack, getContents)
import qualified Data.ByteString.Char8 as Char8 (pack)
import Data.Word (Word8)
import Data.ByteString.Internal (c2w)
import Text.Parsec ((<|>), (<?>), parse, try, endBy, sepBy, many)
import Text.Parsec.ByteString
import Text.Parsec.Prim (tokens, tokenPrim)
import Text.Parsec.Pos (updatePosChar, updatePosString)
import Text.Parsec.Error (ParseError)
csvFile :: Parser [[ByteString]]
csvFile = endBy line eol
line :: Parser [ByteString]
line = sepBy cell (char ',')
cell :: Parser ByteString
cell = quotedCell <|> liftM ByteString.pack (many (noneOf ",\n\r"))
quotedCell :: Parser ByteString
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return (ByteString.pack content)
quotedChar :: Parser Word8
quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return (c2w '"'))
eol :: Parser ByteString
eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"
parseCSV :: ByteString -> Either ParseError [[ByteString]]
parseCSV = parse csvFile "(unknown)"
main :: IO ()
main =
    do c <- ByteString.getContents
       case parse csvFile "(stdin)" c of
            Left e -> do Prelude.putStrLn "Error parsing input:"
                         print e
            Right r -> mapM_ print r
-- replacements for some of the functions in the Parsec library
noneOf :: String -> Parser Word8
noneOf cs = satisfy (\b -> b `notElem` [c2w c | c <- cs])
char :: Char -> Parser Word8
char c = byte (c2w c)
byte :: Word8 -> Parser Word8
byte c = satisfy (==c) <?> show [c]
satisfy :: (Word8 -> Bool) -> Parser Word8
satisfy f = tokenPrim (\c -> show [c])
                      (\pos c _cs -> updatePosChar pos c)
                      (\c -> if f (c2w c) then Just (c2w c) else Nothing)
string :: String -> Parser ByteString
string s = liftM Char8.pack (tokens show updatePosString s)
Probably your concern, efficiency-wise, should be those two ByteString.pack instructions, in the definitions of cell and quotedCell. You might try to replace the Text.Parsec.ByteString module so that instead of “making strict ByteStrings an instance of Stream with Char token type,” you make ByteStrings an instance of Stream with Word8 token type, but this won't help you with efficiency, it will just give you a headache trying to reimplement all the sourcePos functions to keep track of your position in the input for error messages.
No, the way to make it more efficient would be to change the types of cell, quotedCell and string to Parser [Word8] and the types of line and csvFile to Parser [[Word8]] and Parser [[[Word8]]] respectively. You could even change the type of eol to Parser (). The necessary changes would look something like this:
cell :: Parser [Word8]
cell = quotedCell <|> many (noneOf ",\n\r")
quotedCell :: Parser [Word8]
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return content
string :: String -> Parser [Word8]
-- tokens yields the matched String inside the parser, so map c2w over its result
string s = liftM (map c2w) (tokens show updatePosString s)
You don't need to worry about all the calls to c2w as far as efficiency is concerned, because they cost nothing.
If this doesn't answer your question, please say what would.
I don't believe so. You will need to create one yourself using tokens. Although the documentation for it is a bit... nonexistent, the first two arguments are a function to use to show the expected tokens in an error message and a function to update the source position that will be printed in errors.