I'm trying to get a grip on Clojure. As an exercise, I set out to build a function that returns a lazy sequence of a given subreddit's entries.
To make my aim clear, I put together the following Ruby code using lazy enumerators.
    require 'open-uri'
    require 'nokogiri'

    class Reddit
      def initialize(subreddit)
        @url = "http://www.reddit.com/r/" + subreddit.downcase
        @entries = []
      end

      # Infinite lazy enumerator: refills the buffer from the next page
      # whenever it runs dry, otherwise hands out one buffered entry.
      def entries
        Enumerator::Lazy.new(1..Float::INFINITY) do |yielder|
          if @entries.empty?
            parse
          else
            yielder << @entries.shift
          end
        end
      end

      # Start over from the first page.
      def reset
        @url.gsub!(/\?.*/, '')
        @entries = []
      end

      private

      # Fetch the current page, remember the "next" link, buffer its entries.
      def parse
        page = Nokogiri::HTML(URI.open(@url))
        @url = page.css('p.nextprev a[rel="nofollow next"]').first['href']
        page.css('div.thing').each do |thing|
          title = thing.css('a.title').text
          points = thing.css('div.score.unvoted').text.to_i
          @entries << { :title => title, :points => points }
        end
      end
    end

(I welcome remarks on the Ruby code too, but bear in mind that I'm interested in lazy sequences rather than in the object-oriented boilerplate.)
Coming to Clojure, after some effort and a few unintelligible curses, I ended up with the following code.
    (ns playground.experiments.lazy-html
      (:require [net.cgrand.enlive-html :as html]))

    (defn subreddit-url [name]
      (str "http://www.reddit.com/r/" name))

    (defn fetch-page [url]
      (html/html-resource (java.net.URL. url)))

    ;; Parse a score string, falling back to 0 for non-numeric text.
    (defn make-integer [n]
      (try (Integer/parseInt n)
           (catch Exception e 0)))

    (defn page-entries [url]
      (let [page (fetch-page url)
            things (html/select page [:div.thing])]
        (map #(hash-map :title (-> % (html/select [:a.title]) first html/text)
                        :score (-> % (html/select [:div.score.unvoted]) first html/text make-integer))
             things)))

    ;; Note: this fetches the same page a second time.
    (defn next-url [url]
      (let [page (fetch-page url)]
        (-> page
            (html/select [:p.nextprev (html/attr-has :rel "next")])
            first :attrs :href)))

    (defn entries [url]
      (lazy-cat (page-entries url)
                (entries (next-url url))))

    (defn subreddit [name]
      (-> name subreddit-url entries))

(Comments, criticism and improvement suggestions on any aspect of the code are eagerly awaited. I posted a gist to tinker with the code.)
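One improvement worth sketching: page-entries and next-url each call fetch-page, so every page is downloaded twice, and nothing stops entries from being called with a nil URL once the last page is reached. The sketch below runs offline under stated assumptions: fake-pages, fetch-count and the stubbed fetch-page are hypothetical stand-ins for the HTTP request and Enlive parsing. It fetches each page once, takes both the entries and the next URL from that single fetch, and ends the seq when there is no next page.

```clojure
;; Hypothetical offline stand-in for Reddit: each "URL" maps to a page
;; holding some entries and the URL of the next page (nil on the last one).
(def fake-pages
  {"/r/clojure"         {:entries [{:title "a" :score 1} {:title "b" :score 2}]
                         :next    "/r/clojure?after=b"}
   "/r/clojure?after=b" {:entries [{:title "c" :score 3}]
                         :next    nil}})

;; Counts page fetches so the laziness is observable.
(def fetch-count (atom 0))

(defn fetch-page
  "Stub for the real HTTP request plus Enlive parsing."
  [url]
  (swap! fetch-count inc)
  (get fake-pages url))

(defn entries
  "Lazy seq of entries starting at url. Each page is fetched once, only
  when the seq is realized that far, and the seq ends when there is no
  next page."
  [url]
  (lazy-seq
    (when url
      (let [{items :entries, next-page :next} (fetch-page url)]
        (concat items (entries next-page))))))

(def s (entries "/r/clojure"))
@fetch-count         ;=> 0  (nothing fetched yet)
(:title (first s))   ;=> "a" (first page fetched)
(count s)            ;=> 3  (both pages fetched, fetch-count is now 2)
```

Swapping the stub back for a real fetch-page that returns {:entries ... :next ...} from one parsed page keeps the same shape.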
The thing works… to an extent. But it has a huge problem: the recursion in entries doesn't occur in tail position. That means that if I were willing to poll tens of thousands of pages (well, not on Reddit), the stack would blow right away, wouldn't it?
I wasn't able to find a way to build tail-recursive (optimisation-wise) lazy sequences. I've read a couple of threads dedicated to Clojure lazy sequences, to no avail. I guess I'm missing the point somewhere. Below are two of my silly attempts: one of them seems to make no sense to the Clojure compiler, the other is endless.
    ;; Attempt 1: rejected by the compiler (recur is not allowed here)
    (defn subreddit [name]
      (loop [url (subreddit-url name)]
        (lazy-seq
          (concat (page-entries url)
                  (recur (next-url url))))))

    ;; Attempt 2: endless (the loop has no exit condition)
    (defn subreddit [name]
      (loop [url (subreddit-url name)
             old-entries []]
        (recur (next-url url)
               (lazy-cat (page-entries url) old-entries))))

The question is: how should it be done? How does one build lazy sequences from chunks of IO data in Clojure? Is it possible that lazy sequences are not the right tool here? (In Ruby, laziness was, as it should be, about saving memory.) Or does LazySeq resort to some kind of optimisation magic (caching plus flattening the stack?) in such a way that the first bunch of code above happens to be stack-overflow-safe?
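On the stack question: the recursive call in entries is not in tail position, but it does not need to be. lazy-cat wraps (entries (next-url url)) in a lazy-seq, so that call only happens when a consumer walks past the current page, and by then the outer call has already returned. The sketch below assumes hypothetical offline "pages" (page n holds the ten integers 10n to 10n+9) in place of Reddit, and keeps exactly the lazy-cat shape of the question's entries.

```clojure
;; Hypothetical offline "page": page n holds the integers 10n .. 10n+9.
(defn page-entries [n]
  (range (* 10 n) (* 10 (inc n))))

;; Same shape as the question's entries: the recursive call is not in
;; tail position, but lazy-cat delays it until the consumer needs it.
(defn entries [n]
  (lazy-cat (page-entries n) (entries (inc n))))

;; Walks 100,000 pages (a million items) without a StackOverflowError.
(nth (entries 0) 999999)   ;=> 999999
```

The stack stays flat because each realized step hands back an unrealized lazy seq for the next page rather than a deeper call, and LazySeq unwraps such nested lazy seqs in a loop.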
A side question now: the Ruby code above has state, which means one can consume part of the infinite sequence in a first call and the next chunk in a second call. How can one achieve something similar in Clojure? I tried closures, alas unsuccessfully.
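Clojure seqs are immutable values, so the usual approach is simply to keep hold of the unconsumed remainder (for instance with split-at or drop) instead of mutating anything. If a stateful cursor like the Ruby object is really wanted, one possible design (a sketch, not the only option) is an atom holding the rest of the seq. make-cursor and take-next! are hypothetical names, and swap-vals! requires Clojure 1.9 or later.

```clojure
(defn make-cursor
  "Wraps a (possibly infinite) lazy seq in an atom that tracks the part
  not consumed yet."
  [s]
  (atom s))

(defn take-next!
  "Returns the next n items as a vector and advances the cursor past
  them. swap-vals! returns [old new], so the items returned are exactly
  the ones this call removed, even if several threads share the cursor."
  [cursor n]
  (let [[old _] (swap-vals! cursor #(nthrest % n))]
    (vec (take n old))))

(def c (make-cursor (range)))
(take-next! c 3)   ;=> [0 1 2]
(take-next! c 3)   ;=> [3 4 5]
```

With (subreddit "clojure") in place of (range), each call would only fetch the pages it needs, and pages already fetched stay cached in the seq.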
Nota bene: I'm a complete newcomer to Clojure. I started with The Joy of Clojure, a nice, dense, well-written and insightful read, but the part on lazy stuff, for instance, fell a little short. What advice would Clojurians give for getting a grip on Clojure?
I'd like to add to Alex's answer by stressing what I think is the key point:
Functions producing lazy seqs should not be tail recursive.
The reason is that a function like

    (defn foo [& args]
      (lazy-seq ...))
    ;;  ^- or lazy-cat etc.;
    ;;     the ... typically has (cons ... (foo)) in tail position

returns a lazy seq object to its caller, and then its stack frame is popped off the stack.
If the caller (or whoever else the lazy seq gets handed to) asks for the actual items of the lazy seq, they are produced by calling a function stored inside the lazy seq object, whose body corresponds to the ... in the snippet above. If that body is (something functionally similar to) (cons ... (foo)), the recursive call behaves just like the outer call: it returns a lazy seq object and has its frame popped off the stack. When that part of the seq is needed, the process is repeated, and so on and so forth.
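A minimal concrete instance of foo makes the process visible (naturals is a hypothetical name used only for illustration): each call returns an unrealized lazy seq immediately, and the recursive call inside cons runs only when the consumer asks for the next item, so even a deep walk never piles up nested stack frames.

```clojure
(defn naturals
  "Infinite lazy seq n, n+1, n+2, ... The recursive call is wrapped in
  cons (not in tail position) inside lazy-seq, so it runs only when the
  rest of the seq is requested."
  [n]
  (lazy-seq (cons n (naturals (inc n)))))

;; Walking 100,000 elements deep does not grow the stack.
(nth (naturals 0) 100000)   ;=> 100000
```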
Notice that this means that whoever realizes the lazy seq produced by foo gets handed the return value of cons, which can be produced precisely because the inner call to foo returns, and it returns precisely because what it returns is a lazy seq.
In contrast, if foo were tail-recursive, it couldn't be lazy. In other words, its involvement in constructing the foo seq would have to end by the time it returned a value to its caller, so it would either have to produce the entire seq to be returned or delegate the "lazy work" to some other function (for which the argument above is repeated).
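For contrast, here is a sketch of the tail-recursive alternative (naturals-below is a hypothetical name): loop/recur keeps the stack flat, but the function can only return once the whole result has been accumulated, so it needs a finite limit and cannot hand out anything lazily.

```clojure
(defn naturals-below
  "Tail-recursive producer: recur keeps the stack flat, but nothing can
  be returned until the whole vector is built, so a finite limit is needed."
  [limit]
  (loop [n 0
         acc []]
    (if (< n limit)
      (recur (inc n) (conj acc n))
      acc)))

(naturals-below 5)   ;=> [0 1 2 3 4]
```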
One way of thinking about it is that lazy seq producers reify the control structure of the seq production process on the heap, whereas a tail-recursive seq producer accumulates its results in variables held on the stack (regardless of whether the stack grows, as on the JVM, or doesn't, as in Scheme).
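The "reified on the heap" point can be observed directly. In this sketch, counted-naturals and the calls atom are hypothetical instrumentation: the lazy seq returned by the first call is an ordinary object whose body has not run yet, and each body runs exactly once, when a consumer first reaches that element.

```clojure
;; Instrumentation: counts how many lazy-seq bodies have actually run.
(def calls (atom 0))

(defn counted-naturals [n]
  (lazy-seq
    (swap! calls inc)
    (cons n (counted-naturals (inc n)))))

(def s (counted-naturals 0))
(realized? s)        ;=> false (just an object on the heap so far)
@calls               ;=> 0
(doall (take 3 s))   ;=> (0 1 2)
@calls               ;=> 3
(doall (take 3 s))   ;=> (0 1 2)
@calls               ;=> 3 (already-realized parts are cached)
```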
See the question "How are lazy sequences implemented in Clojure?", which has a handful of answers going into the details behind lazy seqs. (My own answer there contains a shorter, earlier attempt at summarizing the point I've been making above.)