Split lines in Clojure while reading from file

I am learning clojure at school and I have an exam coming up. I was just working on a few things to make sure I get the hang of it.
I am trying to read from a file line by line and as I do, I want to split the line whenever there is a ";".
Here is my code so far
(defn readFile []
(map (fn [line] (clojure.string/split line #";"))
(with-open [rdr (reader "C:/Users/Rohil/Documents/work.txt.txt")]
(doseq [line (line-seq rdr)]
(clojure.string/split line #";")
(println line)))))
When I do this, I still get the output:
"I;Am;A;String;"
Am I missing something?

I'm not sure if you need this at school, but since Gary already gave an excellent answer, consider this as a bonus.
You can do elegant transformations on lines of text with transducers. The ingredient you need is something that allows you to treat the lines as a reducible collection and which closes the reader when you're done reducing:
(import '[java.io BufferedReader]) ; needed for the ^BufferedReader type hint

(defn lines-reducible [^BufferedReader rdr]
  (reify clojure.lang.IReduceInit
    (reduce [this f init]
      (try
        (loop [state init]
          (if (reduced? state)
            @state ; unwrap the early-terminated result
            (if-let [line (.readLine rdr)]
              (recur (f state line))
              state)))
        (finally
          (.close rdr))))))
Now you're able to do the following, given input work.txt:
I;am;a;string
Next;line;please
Count the length of each 'split'
(require '[clojure.string :as str])
(require '[clojure.java.io :as io])
(into []
(comp
(mapcat #(str/split % #";"))
(map count))
(lines-reducible (io/reader "/tmp/work.txt")))
;;=> [1 2 1 6 4 4 6]
Sum the length of all 'splits'
(transduce
(comp
(mapcat #(str/split % #";"))
(map count))
+
(lines-reducible (io/reader "/tmp/work.txt")))
;;=> 24
Sum the length of all words until we find a word that is longer than 5
(transduce
(comp
(mapcat #(str/split % #";"))
(map count))
(fn
([] 0)
([sum] sum)
([sum l]
(if (> l 5)
(reduced sum)
(+ sum l))))
(lines-reducible (io/reader "/tmp/work.txt")))
or with take-while:
(transduce
(comp
(mapcat #(str/split % #";"))
(map count)
(take-while #(>= 5 %))) ; keep lengths up to and including 5, matching the reducing fn above
+
(lines-reducible (io/reader "/tmp/work.txt")))
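Because a transducer is independent of its source, the same pipeline can be tried at the REPL without touching the file system. A minimal sketch, assuming the two sample lines are held in a plain vector (the names `sample-lines` and `split-lengths` are made up for this illustration):

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for the contents of work.txt.
(def sample-lines ["I;am;a;string" "Next;line;please"])

;; The same transducer as in the examples above, given a name for reuse.
(def split-lengths
  (comp
    (mapcat #(str/split % #";"))
    (map count)))

(into [] split-lengths sample-lines)
;;=> [1 2 1 6 4 4 6]

(transduce split-lengths + sample-lines)
;;=> 24
```

Swapping the vector for `(lines-reducible (io/reader ...))` leaves the transducer itself unchanged.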
Read https://tech.grammarly.com/blog/building-etl-pipelines-with-clojure for more details.

TL;DR embrace the REPL and embrace immutability
Your question was "what am I missing?" and to that I'd say you're missing one of the best features of Clojure, the REPL.
Edit: you might also be missing that Clojure uses immutable data structures so
consider this code snippet:
(doseq [x [1 2 3]]
(inc x)
(prn x))
This code does not print "2 3 4"
it prints "1 2 3" because x isn't a mutable variable.
During the first iteration (inc x) gets called, returns 2, and that gets thrown away because it wasn't passed to anything, then (prn x) prints the value of x which is still 1.
Now consider this code snippet:
(doseq [x [1 2 3]] (prn (inc x)))
During the first iteration the inc passes its return value to prn so you get 2
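The same point can be checked at the REPL by looking at return values rather than printed output. A small sketch (the var names `xs`, `from-doseq`, and `from-mapv` are only for illustration): `doseq` exists for side effects and returns nil, while `mapv` returns a new vector and leaves the original untouched.

```clojure
(def xs [1 2 3])

;; doseq runs its body for side effects; the result of (inc x) is discarded.
(def from-doseq (doseq [x xs] (inc x)))

;; mapv collects the result of calling inc on each element into a new vector.
(def from-mapv (mapv inc xs))

from-doseq ;;=> nil
from-mapv  ;;=> [2 3 4]
xs         ;;=> [1 2 3], unchanged
```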
Long example:
I don't want to rob you of the opportunity to solve the problem yourself so I'll use a different problem as an example.
Given the file "birds.txt"
with the data "1chicken\n 2duck\n 3Larry"
you want to write a function that takes a file and returns a sequence of bird names
Let's break this problem down into smaller chunks:
first, let's read the file and split it up into lines
(slurp "birds.txt") will give us the whole file as a string
clojure.string/split-lines will give us a collection with each line as an element in the collection
(clojure.string/split-lines (slurp "birds.txt")) gets us ["1chicken" "2duck" "3Larry"]
At this point we could map some function over that collection to strip out the number like (map #(clojure.string/replace % #"\d" "") birds-collection)
or we could just move that step up the pipeline when the whole file is one string.
Now that we have all of our pieces we can put them together in a functional pipeline where the result of one piece feeds into the next
In Clojure there is a nice macro to make this more readable, the -> macro
It takes the result of one computation and injects it as the first argument to the next
so our pipeline looks like this:
(-> "C:/birds.txt"
slurp
(clojure.string/replace #"\d" "")
clojure.string/split-lines)
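To try the pipeline without creating a file, the same steps can be run on an in-memory string. This is only a sketch: `birds-text` and `bird-names` are names invented here, and the sample data omits the spaces that follow the newlines in the data above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical in-memory stand-in for the contents of birds.txt.
(def birds-text "1chicken\n2duck\n3Larry")

(defn bird-names
  "Strips all digits from text, then splits it into a vector of lines."
  [text]
  (-> text
      (str/replace #"\d" "")
      str/split-lines))

(bird-names birds-text)
;;=> ["chicken" "duck" "Larry"]
```

Calling `(bird-names (slurp "birds.txt"))` would apply the same function to the real file.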
One last note on style: Clojure function names conventionally use kebab-case, so readFile should be read-file.

I would keep it simple, and code it like this:
(ns tst.demo.core
(:use tupelo.test)
(:require [tupelo.core :as t]
[clojure.string :as str] ))
(def text
"I;am;a;line;
This;is;another;one
Followed;by;this;")
(def tmp-file-name "/tmp/lines.txt")
(dotest
(spit tmp-file-name text) ; write it to a tmp file
(let [lines (str/split-lines (slurp tmp-file-name))
result (for [line lines]
(for [word (str/split line #";")]
(str/trim word)))
result-flat (flatten result)]
(is= result
[["I" "am" "a" "line"]
["This" "is" "another" "one"]
["Followed" "by" "this"]])
Notice that result is a doubly-nested (2D) matrix of words. The simplest way to undo this is the flatten function to produce result-flat:
(is= result-flat
["I" "am" "a" "line" "This" "is" "another" "one" "Followed" "by" "this"])))
You could also use apply concat as in:
(is= (apply concat result) result-flat)
If you want to avoid building up a 2D matrix in the first place, you can use a generator function (a la Python) via lazy-gen and yield from the Tupelo library:
(dotest
(spit tmp-file-name text) ; write it to a tmp file
(let [lines (str/split-lines (slurp tmp-file-name))
result (t/lazy-gen
(doseq [line lines]
(let [words (str/split line #";")]
(doseq [word words]
(t/yield (str/trim word))))))]
(is= result
["I" "am" "a" "line" "This" "is" "another" "one" "Followed" "by" "this"])))
In this case, lazy-gen creates the generator function.
Notice that for has been replaced with doseq, and the yield function places each word into the output lazy sequence.
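For comparison, a flat lazy sequence can also be built with core functions only, with no nested result to flatten afterwards. A sketch, assuming the same sample text is held in a string (`sample-text` and `words` are illustrative names, not part of Tupelo):

```clojure
(require '[clojure.string :as str])

;; Same three lines as the text written to the temp file above.
(def sample-text "I;am;a;line;\nThis;is;another;one\nFollowed;by;this;")

(defn words
  "Returns a lazy seq of the trimmed words on every line of s."
  [s]
  (->> (str/split-lines s)
       (mapcat #(str/split % #";"))
       (map str/trim)))

(words sample-text)
;;=> ("I" "am" "a" "line" "This" "is" "another" "one" "Followed" "by" "this")
```

The trailing ";" on a line produces no empty word because `str/split` drops trailing empty strings when no limit is given.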

Related

Can't get Clojure macro to execute without expansion error

I'm writing a macro that looks through the metadata on a given symbol and removes any entries that are not keywords, i.e. the key name doesn't start with a ":" e.g.
(meta (var X)) ;; Here's the metadata for testing...
=>
{:line 1,
:column 1,
:file "C:\\Users\\Joe User\\AppData\\Local\\Temp\\form-init11598934441516564808.clj",
:name X,
:ns #object[clojure.lang.Namespace 0x12ed80f6 "thic.core"],
OneHundred 100,
NinetyNine 99}
I want to remove the entries "OneHundred" and "NinetyNine" and leave the rest of the metadata untouched.
So I have a bit of code that works:
(let [Hold# (meta (var X))] ;;Make a copy of the metadata to search.
(map (fn [[kee valu]] ;;Loop through each metadata key/value.
(if
(not= \: (first (str kee))) ;; If we find a non-keyword key,
(reset-meta! (var X) (dissoc (meta (var X)) kee)) ;; remove it from X's metadata.
)
)
Hold# ;;map through this copy of the metadata.
)
)
It works. The entries for "OneHundred" and "NinetyNine" are gone from X's metadata.
Then I code it up into a macro. God bless REPL's.
(defmacro DelMeta! [S]
`(let [Hold# (meta (var ~S))] ;; Hold onto a copy of S's metadata.
(map ;; Scan through the copy looking for keys that DON'T start with ":"
(fn [[kee valu]]
(if ;; If we find metadata whose keyname does not start with a ":"
(not= \: (first (str kee)))
(reset-meta! (var ~S) (dissoc (meta (var ~S)) kee)) ;; remove it from S's metadata.
)
)
Hold# ;; Loop through the copy of S's metadata so as to not confuse things.
)
)
)
Defining the macro with defmacro works without error.
macroexpand-1 on the macro, e.g.
(macroexpand-1 '(DelMeta! X))
expands into the proper code. Here:
(macroexpand-1 '(DelMeta! X))
=>
(clojure.core/let
[Hold__2135__auto__ (clojure.core/meta (var X))]
(clojure.core/map
(clojure.core/fn
[[thic.core/kee thic.core/valu]]
(if
(clojure.core/not= \: (clojure.core/first (clojure.core/str thic.core/kee)))
(clojure.core/reset-meta! (var X) (clojure.core/dissoc (clojure.core/meta (var X)) thic.core/kee))))
Hold__2135__auto__))
BUT!!!
Actually invoking the macro at the REPL with a real parameter blatzes out the most incomprehensible error message:
(DelMeta! X) ;;Invoke DelMeta! macro with symbol X.
Syntax error macroexpanding clojure.core/fn at (C:\Users\Joe User\AppData\Local\Temp\form-init11598934441516564808.clj:1:1).
([thic.core/kee thic.core/valu]) - failed: Extra input at: [:fn-tail :arity-1 :params] spec: :clojure.core.specs.alpha/param-list
(thic.core/kee thic.core/valu) - failed: Extra input at: [:fn-tail :arity-n :params] spec: :clojure.core.specs.alpha/param-list
Oh, all-powerful and wise Clojuregods, I beseech thee upon thy mercy.
Whither is my sin?
You don't need a macro here. Also, you are misunderstanding the nature of a Clojure keyword, and the complications of a Clojure Var vs a local variable.
Keep it simple to start by using a local "variable" in a let block instead of a Var:
(ns tst.demo.core
(:use tupelo.core tupelo.test))
(dotest
  (let [x  (with-meta [1 2 3] {:my "meta"})
        x2 (vary-meta x assoc :your 25 'abc :def)
        x3 (vary-meta x2 dissoc 'abc)]
    (is= x  [1 2 3])
    (is= x2 [1 2 3])
    (is= x3 [1 2 3])
    (is= (meta x)  {:my "meta"})
    (is= (meta x2) {:my "meta", :your 25, 'abc :def})
    (is= (meta x3) {:my "meta", :your 25})))
So we see that x, x2, and x3 all have the same value: changing the metadata does not change the value itself. That is the purpose of metadata. The second set of tests shows the effect of vary-meta, which is the best way to change the metadata of a value.
When we use a Var, it is not only a global value, but it is like a double-indirection of pointers in C. Please see this question:
When to use a Var instead of a function?
This answer also clarifies the difference between a string, a symbol, and a keyword. This is important.
Consider this code
(def ^{:my "meta"} data [1 2 3])
(spyx data)
(spyx-pretty (meta (var data)))
and the result:
data => [1 2 3]
(meta (var data)) =>
{:my "meta",
:line 19,
:column 5,
:file "tst/demo/core.cljc",
:name data,
:ns #object[clojure.lang.Namespace 0x4e4a2bb4 "tst.demo.core"]}
(is= data [1 2 3])
(is= (set (keys (meta (var data))))
#{:my :line :column :file :name :ns})
So we have added the key :my to the metadata as desired. How can we alter it? For a Var, use the function alter-meta!
(alter-meta! (var data) assoc :your 25 'abc :def)
(is= (set (keys (meta (var data))))
#{:ns :name :file 'abc :your :column :line :my})
So we have added 2 new entries to the metadata map. One has the keyword :your as key with value 25, the other has the symbol abc as key with value :def (a keyword).
We can also use alter-meta! to remove a key/val pair from the metadata map:
(alter-meta! (var data) dissoc 'abc )
(is= (set (keys (meta (var data))))
#{:ns :name :file :your :column :line :my})
Keyword vs Symbol vs String
A string literal in a source file has double quotes at each end, but they are not characters in the string. Similarly a keyword literal in a source file needs a leading colon to identify it as such. However, neither the double-quotes of the string nor the colon of the keyword are a part of the name of that value.
Thus, you can't identify a keyword by the colon. You should use these functions to identify different data types:
string?
keyword?
symbol?
the above are from the Clojure CheatSheet. So, the code you really want is:
(defn remove-metadata-symbol-keys
  [var-obj]
  (assert (var? var-obj)) ; verify it is a Var
  (doseq [k (keys (meta var-obj))]
    (when (not (keyword? k))
      (alter-meta! var-obj dissoc k))))
with a sample:
(def ^{:some "stuff" 'other :things} myVar [1 2 3])
(newline) (spyx-pretty (meta (var myVar)))
(remove-metadata-symbol-keys (var myVar))
(newline) (spyx-pretty (meta (var myVar)))
and result:
(meta (var myVar)) =>
{:some "stuff",
other :things, ; *** to be removed ***
:line 42,
:column 5,
:file "tst/demo/core.cljc",
:name myVar,
:ns #object[clojure.lang.Namespace 0x9b9155f "tst.demo.core"]}
(meta (var myVar)) => ; *** after removing non-keyword keys ***
{:some "stuff",
:line 42,
:column 5,
:file "tst/demo/core.cljc",
:name myVar,
:ns #object[clojure.lang.Namespace 0x9b9155f "tst.demo.core"]}
The above code was all run using this template project.
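As for the expansion error in the question itself: inside a syntax-quote, plain symbols such as kee and valu are namespace-qualified (thic.core/kee in the expansion shown in the question), and a qualified symbol is not a legal fn parameter. Appending # to such names asks for auto-generated, unqualified symbols instead. A hedged sketch of both effects follows; the macro `non-keyword-keys` and the other names are made up for this illustration and are not part of the answer above.

```clojure
;; Syntax-quote qualifies a plain symbol with the current namespace...
(def qualified-param (first (second `(fn [kee] kee))))
(namespace qualified-param) ;; non-nil, e.g. "user" at a default REPL

;; ...whereas an auto-gensym (trailing #) stays unqualified.
(def gensym-param (first (second `(fn [kee#] kee#))))
(namespace gensym-param)
;;=> nil

;; Hypothetical macro that uses auto-gensyms for its fn parameters.
;; It returns the keys of map m that are not keywords.
(defmacro non-keyword-keys [m]
  `(keep (fn [[k# v#]] (when-not (keyword? k#) k#)) ~m))

(non-keyword-keys {:a 1, 'b 2, "c" 3})
;;=> (b "c")
```

The function-based approach shown earlier is still simpler; the sketch only shows why the original macro failed to compile when invoked.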

How to read just two integers from file in common lisp

(defun read-file (filename)
  (with-open-file (stream filename)
    (loop for line = (read-line stream nil)
          while line
          collect line)))
I'm totally new to Lisp. I want to read the file integer by integer, but all I have is this line-by-line piece of code, and I couldn't find out how to do that.
For example, my file contains:
10 20
I need help. Thanks.
Not sure exactly what you're asking, so here's a short proposition to read a file into a string, split it by whitespace, and parse each number with parse-integer:
(mapcar #'parse-integer (str:words (uiop:read-file-string "foo.txt")))
uiop comes from ASDF and is included in all major implementations, str is a library to quickload.
uiop also has read-file-lines.
OP goal seems to be to read lines from a file, each line containing a pair of integers, and to return a list containing all of the integers read from the file.
Given an input file, numbers.dat:
10 20
30 40
50 60
70 80
90 100
the file can be read into a list using read-line:
CL-USER> (with-open-file (in "numbers.dat" :direction :input)
(loop :for line := (read-line in nil)
:while line
:collect line))
("10 20" "30 40" "50 60" "70 80" "90 100" "")
But now the list contains strings, and each string corresponds to a pair of integers. We need a function to extract the integers from the strings. Common Lisp has the function read-from-string, which parses a string containing a printed representation of an object and returns that object and the first unread position of the input string. This function can be used in a loop to extract the integers from an input string:
CL-USER> (loop :with num
:and pos := 0
:do (setf (values num pos)
(read-from-string "10 20" t nil :start pos))
:collect num
:until (= pos 5))
(10 20)
If this code is moved into a function that can handle strings of varying lengths and empty strings, and mapped over an input list like ("10 20" "30 40" "50 60" "70 80" "90 100" ""), we will be close to the goal with ((10 20) (30 40) (50 60) (70 80) (90 100) ()). The contents of such a list could be appended together to obtain the desired list of all integers from the file:
;;; Expects well-behaved input
(defun get-numbers-from-string (string)
(if (string= string "") ; empty string returns empty list
'()
(let ((*read-eval* nil) ; small precaution to guard against malicious input
(length (length string)))
(loop :with num ; number from STRING
:and pos := 0 ; next position in STRING to read from
:do (setf (values num pos)
(read-from-string string t nil :start pos))
:collect num
:until (= pos length)))))
(defun read-integer-pairs (filename)
(with-open-file (in filename
:direction :input)
(apply #'append ; combine sublists into a single list of numbers
(mapcar #'get-numbers-from-string ; transform strings to number pairs
(loop :for line := (read-line in nil)
:while line
:collect line)))))
Sample interaction:
CL-USER> (read-integer-pairs "numbers.dat")
(10 20 30 40 50 60 70 80 90 100)
"Read just 2 integers from a file"
leaves some room for interpretation.
I will start by answering in the most literal sense:
Really just 2 integers in the file
(defun read-2-integers (stream)
(let ((i0 (read stream nil))
(i1 (read stream nil)))
(list i0 i1)))
If you write your functions in terms of a stream instead of a filename, you have an easier time testing interactively. For example:
CL-USER> (with-input-from-string (stream "10 20")
(read-2-integers stream))
(10 20)
2 integers per line in the file
This is where the code from your question comes in, as you would want to read the file line by line (in case there are so many lines that the whole file might not fit into memory, or for other reasons).
(defun read-lines-with-2-integers (stream)
(loop
for line = (read-line stream nil)
while line
collecting (with-input-from-string (line-stream line)
(read-2-integers line-stream))))
Let's test if it works:
CL-USER> (with-input-from-string (stream "10 20
30 40
50 60")
(read-lines-with-2-integers stream))
((10 20) (30 40) (50 60))
Lines with integers in the file
;; first, we replace our function "read-2-integers" with something a bit more general, so we can apply it to a single line:
(defun read-integers (stream)
(loop
for c = (peek-char t stream nil) ;; skips whitespace
while c
collecting (read stream nil)))
;; next, we use this function just like we used "read-2-integers" before:
(defun read-lines-with-integers (stream)
(loop
for line = (read-line stream nil)
while line
collecting
(with-input-from-string (line-stream line)
(read-integers line-stream))))
which we can test like so:
CL-USER> (with-input-from-string (stream "10 20
30 40 50
60 70
80")
(read-lines-with-integers stream))
((10 20) (30 40 50) (60 70) (80))
File with integers - lines do not matter
In this case, we can just use our read-integers function from above:
CL-USER> (with-input-from-string (stream "10
20
30 40")
(read-integers stream))
(10 20 30 40)

How to bind var's name and value in the clojure macro?

Assume I have some (more than 20) variables that I want to save to files. I don't want to repeat the same code 20 times.
I wrote a macro, but it gave me an error.
My test case:
;-----------------------------------------------
(defn processor [ some-parameters ]
(let [
;after some operation ,got these data:
date-str ["JN01","JN02","JN03","JN04"];length 8760
date-temperature (map #(str %2 "," %1) [3.3,4.4,5.5,6.6] date-str) ; all vector's length are 8760
date-ws (map #(str %2 "," %1) [0.2,0.1,0.3,0.4] date-str) ;
;... many variables such like date-relative-humidity,date-pressure, name starts with "date-",
; all same size
]
;(doseq [e date-temperature]
; (println e))
(spit "output-variable_a.TXT"
(with-out-str
(doseq [e date-temperature]
(println e))))
;same 'spit' part will repeat many times
))
(processor 123)
; I NEED to output other variables(ws, wd, relative-humidity, ...)
; Output example:
;JN01,3.3
;JN02,4.4
;JN03,5.5
;JN04,6.6
;-----------------------------------------------
what I want is a macro/function I can use this way:
(write-to-text temperature,ws,wd,pressure,theta-in-k,mixradio)
and this macro/function will do the work.
I don't know how to write such a macro/function.
My macro is posted here, but it doesn't work:
(defmacro write-array [& rest-variables ]
`(doseq [ vname# '~rest-variables ]
;(println vname# vvalue#)
(println "the vname# is" (symbol vname#))
(println "resolve:" (resolve (symbol (str vname# "-lines"))))
(println "resolve2:" (resolve (symbol (str "ws-lines"))))
(let [ vvalue# 5] ;(var-get (resolve (symbol vname#)))]
;----------NOTE: commented out cause '(symbol vname#)' won't work.
;1(spit (str "OUT-" vname# ".TXT" )
;1 (with-out-str
;1 (doseq [ l (var-get (resolve (symbol (str vname# "-lines"))))]
;1 (println l))))
(println vname# vvalue#))))
I found that the problem is the (symbol vname#) part: this method only works for a GLOBAL variable and cannot be bound to date-temperature in the LET form, where it returns nil.
It looks like you want to write a file of delimited values using binding names and their values from inside a let. Macros transform code during compilation and so they cannot know the run-time values that the symbols you pass are bound to. You can use a macro to emit code that will be evaluated at run-time:
(defmacro to-rows [& args]
  (let [names (mapv name args)]
    `(cons ~names (map vector ~@args))))
(defn get-stuff []
(let [nums [1 2 3]
chars [\a \b \c]
bools [true false nil]]
(to-rows nums chars bools)))
(get-stuff)
=> (["nums" "chars" "bools"]
[1 \a true]
[2 \b false]
[3 \c nil])
Alternatively you could produce a hash map per row:
(defmacro to-rows [& args]
  (let [names (mapv name args)]
    `(map (fn [& vs#] (zipmap ~names vs#)) ~@args)))
=> ({"nums" 1, "chars" \a, "bools" true}
{"nums" 2, "chars" \b, "bools" false}
{"nums" 3, "chars" \c, "bools" nil})
You would then need to write that out to a file, either using data.csv or similar code.
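If pulling in data.csv is more than is needed, the vector rows produced by the first to-rows variant can be joined by hand. A minimal sketch; `rows->csv` is a hypothetical helper that does no quoting or escaping, so it only suits values that contain no commas or newlines:

```clojure
(require '[clojure.string :as str])

(defn rows->csv
  "Joins the cells of each row with commas and the rows with newlines.
  No quoting or escaping is performed."
  [rows]
  (str/join "\n" (map #(str/join "," %) rows)))

(rows->csv [["nums" "chars" "bools"]
            [1 \a true]
            [2 \b false]
            [3 \c nil]])
;;=> "nums,chars,bools\n1,a,true\n2,b,false\n3,c,"
```

The resulting string can then be handed to spit. Note that nil becomes an empty cell.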
To see what to-rows expands to, you can use macroexpand. This is the code being generated at compile-time that will be evaluated at run-time. It does the work of getting the symbol names at compile-time, but emits code that will work on their bound values at run-time.
(macroexpand '(to-rows x y z))
=> (clojure.core/cons ["x" "y" "z"] (clojure.core/map clojure.core/vector x y z))
As an aside, I'm assuming you aren't typing thousands of literal values into let bindings. I think this answers the question as asked but there could likely be a more direct approach than this.
I think you are looking for the function name. To demonstrate:
user=> (defmacro write-columns [& columns]
         (let [names (map name columns)]
           `(str ~@names)))
#'user/write-columns
user=> (write-columns a b c)
"abc"
You can first capture the variable names and their values into a map:
(defmacro name-map
[& xs]
(let [args-list# (cons 'list (map (juxt (comp keyword str) identity) xs))]
`(into {} ~args-list#)))
If you pass the var names to the macro,
(let [aa 11
bb 22
cc 33]
(name-map aa bb cc))
It gives you a map which you can then use for any further processing:
=> {:aa 11, :bb 22, :cc 33}
(def result *1)
(run!
(fn [[k v]] (println (str "spit file_" (name k) " value: " v)))
result)
=>
spit file_aa value: 11
spit file_bb value: 22
spit file_cc value: 33
Edit: Just noticed it's similar to Taylor's macro. The difference is this one works with primitive types as well, while Taylor's works for the original data (vars resolving to collections).
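Once names and values are available as data, the repeated spit calls from the question collapse into a loop. A sketch under that assumption: `file-contents` and `write-all!` are hypothetical names, the input map is written by hand here rather than produced by a macro, and the OUT-<name>.TXT pattern is borrowed from the commented-out code in the question.

```clojure
(require '[clojure.string :as str])

(defn file-contents
  "Returns a map from output file name to the text that should go in it."
  [named-lines]
  (into {}
        (for [[k lines] named-lines]
          [(str "OUT-" (name k) ".TXT")
           (str/join "\n" lines)])))

(defn write-all!
  "Writes every entry of named-lines to its own file in the working directory."
  [named-lines]
  (doseq [[file-name text] (file-contents named-lines)]
    (spit file-name text)))

(file-contents {:temperature ["JN01,3.3" "JN02,4.4"]
                :ws          ["JN01,0.2" "JN02,0.1"]})
;;=> {"OUT-temperature.TXT" "JN01,3.3\nJN02,4.4"
;;    "OUT-ws.TXT"          "JN01,0.2\nJN02,0.1"}
```

Keeping the file names and text as a plain map makes the result easy to inspect at the REPL before `write-all!` touches the disk.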

Scheme Help - File Statistics

So I have to finish a project in Scheme and I'm pretty stuck. Basically, the program opens a file and outputs statistics about it. Right now I am able to count the number of characters, but I also need to count the number of lines and words. I'm just trying to tackle this part for now, but eventually I also have to take in two files: the first is a text file, like a book, and the second is a list of words, and I have to count how many times those words appear in the first file. Obviously I'll have to work with lists, but I would love some help on where to begin. Here is the code that I have so far (and it works):
(define filestats
(lambda (srcf wordcount linecount charcount )
(if (eof-object? (peek-char srcf ) )
(begin
(close-port srcf)
(display linecount)
(display " ")
(display wordcount)
(display " ")
(display charcount)
(newline) ()
)
(begin
(read-char srcf)
(filestats srcf 0 0 (+ charcount 1))
)
)
)
)
(define filestatistics
(lambda (src)
(let ((file (open-input-file src)))
(filestats file 0 0 0)
)
)
)
How about 'tokenizing' the file into a list of lines, where a line is a list of words, and a word is a list of characters.
(define (tokenize file)
(with-input-from-file file
(lambda ()
(let reading ((lines '()) (words '()) (chars '()))
(let ((char (read-char)))
(if (eof-object? char)
(reverse lines)
(case char
((#\newline) (reading (cons (reverse (cons (reverse chars) words)) lines) '() '()))
((#\space) (reading lines (cons (reverse chars) words) '()))
(else (reading lines words (cons char chars))))))))))
once you've done this, the rest is trivial.
> (tokenize "foo.data")
(((#\a #\b #\c) (#\d #\e #\f))
((#\1 #\2 #\3) (#\x #\y #\z)))
The word count algorithm using Scheme has been explained before in Stack Overflow, for example in here (scroll up to the top of the page to see an equivalent program in C):
(define (word-count input-port)
(let loop ((c (read-char input-port))
(nl 0)
(nw 0)
(nc 0)
(state 'out))
(cond ((eof-object? c)
(printf "nl: ~s, nw: ~s, nc: ~s\n" nl nw nc))
((char=? c #\newline)
(loop (read-char input-port) (add1 nl) nw (add1 nc) 'out))
((char-whitespace? c)
(loop (read-char input-port) nl nw (add1 nc) 'out))
((eq? state 'out)
(loop (read-char input-port) nl (add1 nw) (add1 nc) 'in))
(else
(loop (read-char input-port) nl nw (add1 nc) state)))))
The procedure receives an input port as a parameter, so it's possible to apply it to, say, a file. Notice that for counting words and lines you'll need to test if the current char is either a new line character or a white space character. And an extra flag (called state in the code) is needed for keeping track of the start/end of a new word.

How to get the map function to not return something?

In SML/NJ, if you use the map function, you're basically saying: for each element x in a list, apply the function f to it, and return the list of the new values. But let's say f returns a string, and a comparison is done in f: if the comparison is true, it returns the string, but if it's false, it doesn't return anything, and nothing gets put into the list that map is currently building.
Is this possible to do?
Instead of using map, use one of the variants of fold (either foldl or foldr). Another option is, of course, to simply do a filter before you do the map.
As a simple example, imagine that you want to return a list of squared integers, but only if the original integers are even numbers. A filter-then-map approach might look like:
fun square_evens xs =
(List.map (fn x => x * x)) (List.filter (fn x => x mod 2 = 0) xs)
Or, you could use a foldr approach.
fun square_evens xs =
List.foldr (fn (x, xs') =>
if x mod 2 = 0
then (x * x) :: xs'
else xs') [] xs
Slightly longer, but arguably clearer, and probably more efficient.