Creative Commons License
This blog is licensed under a Creative Commons License.

Applescript and UTF-8 arguments

The following tip is based on a hint by mzs found on MacOSXHints.com. Note that this article applies only to Tiger; the issue has been resolved in OS X Leopard and Applescript 2.0.

Although the Mac has been a great environment for working with UTF-8 text (an 8-bit encoding of Unicode), I’ve found a few corners where it’s rather difficult to preserve the encoding of my text. One of these is passing UTF-8 arguments to Applescripts on the command line, using the osascript utility.

For example, as a result of my work in Persian, I have files that both contain Persian text and have Persian filenames. The default setup for the Mac is pretty well suited for handling this at the Cocoa level, in applications such as the Finder and TextEdit. But on the command line, things are a bit different. For one, Terminal.app must be reconfigured to properly display Unicode characters. Then, you have to pass the -w flag to /bin/ls to get the UTF-8 bytes in filenames to render correctly.

If you want to pass a Persian filename to a script, many programs do not handle it at all. Some work transparently: they pass the encoded bytes right along to the underlying filesystem calls, which works great. But others convert the encoded filenames to their own encoding (usually MacRoman), which completely destroys UTF-8 characters. osascript is one of these.
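
To see why such a conversion is destructive, it helps to look at the raw bytes. The sketch below dumps the UTF-8 bytes of a sample Persian word; utf8_hex is a hypothetical helper name and the word is only an illustration, neither comes from the original hint. Each of the four letters occupies two bytes, so a program that treats every byte as one MacRoman character ends up with eight unrelated characters.

```shell
#!/bin/sh
# utf8_hex is a hypothetical helper, not part of the original tip.
# It prints the bytes of its argument in hexadecimal, one pair per byte.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1
}

# A sample Persian word of four letters; this file must be saved as UTF-8.
utf8_hex 'سلام'

# Count the bytes: a single-byte encoding would see this many characters.
printf '%s' 'سلام' | wc -c
```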

If you write an Applescript with an “on run” handler and call it with osascript, passing a UTF-8 encoded filename, your “on run” handler’s argument list will look nothing like what you passed in. But there is a trick for getting around this limitation. It appears that osascript does not translate data passed in via a pipe. We can use this knowledge to make the script read its arguments from standard input instead of from the “on run” handler.

Doing this requires a shell script with two forks. The data fork is a regular shell script whose job is to package the argument list into a string that can be piped directly to osascript. The resource fork is the Applescript itself, compiled to read and unpack those arguments from the other side of the pipe.

First, the script template, which is always the same:

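#!/bin/sh

# Require at least one argument.
case $# in
0)
    echo "Usage: ${0##*/} file [ file... ]" >&2
    exit 1 ;;
esac

# Join the arguments with NUL bytes and pipe them to osascript, which
# loads the compiled Applescript from this file's resource fork.
# printf is used because echo's -n and -e flags are not portable.
{   printf '%s' "$1"
    shift

    for arg in "$@"; do
        printf '\000%s' "$arg"
    done
} | /usr/bin/osascript -- "$0"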
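
The packaging step can be tried on its own, without osascript, to confirm what goes down the pipe. This is a minimal sketch: pack_args is a hypothetical function name and the sample filenames are made up. It writes the bytes with printf, whose handling of escapes is consistent across shells, and turns the NUL separators into newlines only for display.

```shell
#!/bin/sh
# pack_args is a hypothetical helper, not part of the original template.
# It writes its arguments to stdout separated by single NUL bytes,
# with no separator before the first or after the last argument.
pack_args() {
    [ $# -gt 0 ] || return 0
    printf '%s' "$1"
    shift
    for arg in "$@"; do
        printf '\000%s' "$arg"
    done
}

# Display the packed stream with each NUL replaced by a newline.
pack_args 'سلام.txt' 'two words.txt' | tr '\000' '\n'
echo
```

Because NUL cannot occur inside a filename, it is a safe separator even for names that contain spaces or newlines.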

Next, the Applescript template. After this header, refer to your argument list using the argv list:

set argv to do shell script "/bin/cat"

set AppleScript's text item delimiters to ASCII character 0
set argv to argv's text items
set AppleScript's text item delimiters to {""}

-- The rest of your script follows here...
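
The split this header performs can be previewed from the shell by feeding a NUL-separated stream to a stand-in reader. This is only a rough analogue of the Applescript side, not a replacement for it: unpack_args is a hypothetical name, and the sketch assumes an xargs that supports the -0 option (the BSD and GNU versions both do).

```shell
#!/bin/sh
# unpack_args is a hypothetical stand-in for the Applescript header.
# It reads a NUL-separated stream on stdin and prints each item
# on its own line, wrapped in brackets so that spaces stay visible.
unpack_args() {
    xargs -0 -n 1 printf '[%s]\n'
}

# Feed it a hand-built stream of three items.
printf 'one\000two words\000three' | unpack_args
```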

To bind these pieces together, we’ll assume you’ve called the shell script template.sh, and your Applescript myscript.script. First you need to compile the Applescript:

osacompile -o myscript.scpt -- myscript.script

Then bind the compiled Applescript to the resource fork of the final script:

ditto -rsrc myscript.scpt myscript

Next, copy the shell script template to the data fork of the final script:

cat -- template.sh > myscript

And finally, mark the script executable and delete the byproducts:

chmod 755 myscript
rm myscript.scpt
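
The four steps above can be collected into one function so they always run in the same order. This is a sketch under stated assumptions: build_wrapper is a hypothetical name, the function only chains the commands already shown, and it assumes Tiger's osacompile and ditto behave as described in this article.

```shell
#!/bin/sh
# build_wrapper is a hypothetical helper that chains the article's own
# commands; it stops at the first step that fails.
# usage: build_wrapper myscript.script template.sh myscript
build_wrapper() {
    src=$1 tmpl=$2 out=$3

    # Compile the Applescript source.
    osacompile -o "$out.scpt" -- "$src" || return 1
    # Copy the compiled script, resource fork included, to the final name.
    ditto -rsrc "$out.scpt" "$out" || return 1
    # Overwrite only the data fork with the shell template.
    cat -- "$tmpl" > "$out" || return 1
    # Mark the result executable and delete the intermediate file.
    chmod 755 "$out" && rm "$out.scpt"
}
```

With the file names used in this article, the call would be build_wrapper myscript.script template.sh myscript.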

Now you can run myscript and pass it a UTF-8 encoded filename, and the Applescript will see it as a properly encoded string of type “Unicode text”.

2 Comments

I just thought I’d chime in that AppleScript 2.0, in Leopard, is UTF-8 through and through, so as far as I can tell, this is unnecessary on that platform.

You are completely correct. The article relates only to Tiger.

About this Entry

This page contains a single entry by John Wiegley published on October 1, 2007 3:16 PM.
