emacs, lisp, php, wordcount, etc.

Based partially on Steve Yegge’s advice that you should learn your tools, and partially on wanting to do some kind of coding even while busy with school, I’ve been working my way through the emacs manual and the emacs-lisp-intro, the latter of which is highly recommended.

(As an aside, one thing I’ve been working on is learning how to make modes and everything that goes into that, and because of that I was able to “fix” a bug in php-mode, which apparently somebody found useful and thanked me for. It was the first time I’d ever been thanked for fixing code, and let me tell you, it was really nice and inspired an extra few hours of learning. What I’m saying is that if you like the work somebody’s doing, saying thanks not only makes them feel good, it also improves the chances of them doing further things that will benefit you.)

One direct result of this is that I have written a short count-words-document function. All the hard work was taken care of by `karl’, who wrote the main count-words function as part of the text. I have started to do a lot of editing of LaTeX text in emacs, so that I don’t have to deal with OOo’s obsessive desire to print my document every time I hit C-p. I can’t figure it out ;)

The wordcount functions he wrote can count all the words in a region or a buffer, but neither of those is optimal for a LaTeXer, what with all the boilerplate my templates are accumulating. Sure, I could search for the beginning of the document, mark it, and then go to the end and pass the region to the function, but doesn’t that sound like too much work? Well, I realized that since all my templates begin with a “begin{document}” environment, and usually a “begin/end{singlespace}” env. as well, I could write a defun to deal with that automatically, so voila:

(defun count-words-document (&optional arg begin end)
  "Count the words in the body of a LaTeX document.
Without ARG, count from the first `end{singlespace}' (or, failing
that, the first `begin{document}') up to `end{document}'.  If
neither start marker is found, count the whole buffer.
With ARG, count the words between BEGIN and END (interactively,
the region)."
  (interactive "P\nr")
  (if arg
      ;; BEGIN and END are already buffer positions.
      (count-words-region begin end)
    (save-excursion
      (goto-char (point-min))
      ;; A failed search leaves point at the top of the buffer, so
      ;; the second search also starts from there.
      (if (or (search-forward "end{singlespace}" nil t)
              (search-forward "begin{document}" nil t))
          (let ((start (point)))
            ;; Signals an error if there is no `end{document}'.
            (search-forward "end{document}")
            (count-words-region start (match-beginning 0)))
        (count-words-region (point-min) (point-max))))))

That function depends on the `count-words-region’ defun, which was written by “karl” and can be downloaded, for convenience together with my additional function, from here. If called without an argument (e.g. `M-x count-words-document’), count-words-document automatically creates a region that begins, in descending order of preference, after `end{singlespace}’, then after `begin{document}’, then at the beginning of the buffer. In the first two cases the region ends just before an `end{document}’ string; in the last case it ends at the end of the buffer. So it has a nice descending order of most logical TeX regions. You can also, if you want, pass it any argument (e.g. `C-u M-x count-words-document’) and it will count the words in the current region instead.

This would be nicer if it searched for the contents of variables instead of hard-coded strings (search-forward matches literal text, not regexes), but for now that’s for a v0.2 or something, since I don’t know how to define custom variables yet. Also, I just realized that the documentation is horribly incomplete. Anyway:
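For what it’s worth, here is a rough sketch of what that v0.2 might look like using defcustom. Everything named my-wordcount-* is hypothetical and made up for illustration; none of it is in karl’s code or in Emacs. One assumption that differs from the function above: if `end{document}’ is missing, this sketch falls back to the end of the buffer instead of signalling an error.

```elisp
;; Sketch only: all my-wordcount-* names are hypothetical.
(defgroup my-wordcount nil
  "Word counting for LaTeX documents."
  :group 'convenience)

(defcustom my-wordcount-start-markers
  '("end{singlespace}" "begin{document}")
  "Strings tried in order; counting starts after the first one found."
  :type '(repeat string)
  :group 'my-wordcount)

(defcustom my-wordcount-end-marker "end{document}"
  "String before which counting stops."
  :type 'string
  :group 'my-wordcount)

(defun my-wordcount-document-bounds ()
  "Return (START . END) of the text to count in the current buffer.
Fall back to the whole buffer when no start marker is found."
  (save-excursion
    (let ((markers my-wordcount-start-markers)
          (found nil))
      ;; Try each start marker in turn, always from the top.
      (while (and markers (not found))
        (goto-char (point-min))
        (setq found (search-forward (car markers) nil t))
        (setq markers (cdr markers)))
      (if (not found)
          (cons (point-min) (point-max))
        ;; Point is now just after the start marker.  Stop right
        ;; before the end marker, or at the end of the buffer if
        ;; the end marker is missing.
        (cons found
              (if (search-forward my-wordcount-end-marker nil t)
                  (match-beginning 0)
                (point-max)))))))
```

The returned pair could then be handed to the counting function, e.g. (let ((b (my-wordcount-document-bounds))) (count-words-region (car b) (cdr b))), and the markers could be changed with M-x customize-group instead of editing the defun.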

To install it, stick it in your load-path. If you haven’t got one set up, adding (setq load-path (cons "~/.emacs.d" load-path)) to your .emacs file will add `~/.emacs.d’ to your load path; just make the actual directory (that is to say, `mkdir ~/.emacs.d’) and put the file inside it. Then add (load "wordcount") to your .emacs file somewhere below the load-path edit, and voila.
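Put together, the relevant part of a .emacs might look something like this. It is only a sketch: it assumes the file was saved as wordcount.el inside ~/.emacs.d, and it uses add-to-list, which does the same job as the setq/cons form but skips the entry if it is already in the list.

```elisp
;; Assumption: the downloaded file lives at ~/.emacs.d/wordcount.el.
;; Add the directory to the load path as an absolute name.
(add-to-list 'load-path (expand-file-name "~/.emacs.d"))

;; Must come after the load-path change so the file can be found.
(load "wordcount")
```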

You could also bind it to a shortcut key. For example, I have it set to `C-c d’:

(global-set-key "\C-cd" 'count-words-document)

`d’ for Document.



  • http://neverfriday.com/ Rudolf Olah

    I wrote a blog post on how to count words in Emacs, which uses the how-many function. It works with Emacs22+.

    In any case, I was thinking there was a way to handle this with the syntax table but it looks like that only deals with individual characters.

    Also, this function is flawed if you’re dealing with lists or other types of blocks. If you wanted to ignore blocks when counting words I think you could use the latex-standard-block-names variable to check if you’re in a block or not. Hm. Oh, idea, you could count all the words and then count all the blocks and then subtract their count from the total word count. That could work, though I’m not sure how quick that would be when you have a huge document…

  • http://quodlibetor.com quodlibetor

    @Rudolf: Oh, wow, your count-words function is way cleaner than mine, and I would guess much faster.