The following tip is based on a hint by mzs found on MacOSXHints.com. Note that this article applies only to Tiger; the issue has been resolved in OS X Leopard and AppleScript 2.0.
Although the Mac has been a great environment for working with
UTF-8 text (8-bit Unicode), I’ve found a few corners where it’s
rather difficult to preserve the encoding of my text. One of these
is passing UTF-8 arguments to AppleScripts on the command line,
using the osascript utility.
To step back for a second: the reason I need UTF-8 support everywhere is that I sometimes work with Persian texts, which use an Arabic alphabet. In general, most Cocoa applications display Arabic text fairly well (though a large number of them have no clue when it comes to properly formatting right-to-left text; this means that when I type an exclamation mark, it often appears to the right of my entered text, rather than to the left). But in the non-Cocoa world, which includes Carbon apps and the command line, UTF-8 support is either nonexistent or very poor.
For example, as a result of my work in Persian, I have files
that both contain Persian text and have Persian filenames. The
default setup for the Mac is pretty well suited for handling this
at the Cocoa-level of things, such as the Finder, TextEdit, and so
on. But on the command-line, things are a bit different. For one,
Terminal.app must be reconfigured to properly display Unicode
characters. Then, you have to pass the -w flag to
/bin/ls to get Unicode bytes in filenames to render
correctly.
If you want to pass a Persian filename to a script, many programs
do not handle it at all. Some work transparently — they pass the
encoded bytes right along to the underlying filesystem calls, which
works great. But others convert the encoded filenames to their own
encoding (usually MacRoman) which completely destroys UTF-8
characters. osascript is one of these.
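To make the damage concrete, here is a small sketch of what happens when UTF-8 bytes are misread as MacRoman. It is only an illustration, not what osascript does internally: the sample word and the iconv encoding name MACINTOSH are my assumptions, and encoding names vary between iconv implementations.

```shell
#!/bin/sh
# Illustration only: UTF-8 bytes reinterpreted as MacRoman.
# Assumption: this iconv knows the encoding name "MACINTOSH".

name='فارسی'   # sample Persian word, five letters

# The raw UTF-8 bytes: each of these letters takes two bytes.
printf '%s' "$name" | od -An -tx1

# Treat those same bytes as MacRoman and convert them "back" to UTF-8.
# The result is a string of unrelated accented and symbol characters.
garbled=$(printf '%s' "$name" | iconv -f MACINTOSH -t UTF-8)
printf '%s\n' "$garbled"
```

Once a program has done this to a filename, the bytes it hands to the filesystem no longer match the file on disk.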
If you write an AppleScript with an “on run” handler, and call
it with osascript, passing a UTF-8 encoded filename,
your “on run” handler’s argument list will look nothing like what
you passed in. But there is a trick for getting around this
limitation. It appears that osascript does not
translate data passed in via pipe. We can use this knowledge to
trick osascript into reading its argument list in a
different way instead of “on run”.
Doing this requires making a shell script with two forks. The
data fork is a regular shell script whose job is to package the
argument list into a string that can be piped directly to
osascript. The resource fork is the AppleScript
itself, compiled to read and unpackage those arguments from the
other side of the pipe.
First, the script template, which is always the same:
#!/bin/sh
# Require at least one argument.
case $# in
    0)
        echo "Usage: ${0##*/} file [ file... ]" >&2
        exit 1 ;;
esac
# Write the arguments to the pipe, separated by NUL bytes.
{
    arg=$1
    echo -nE "$arg"
    shift
    for arg in "$@"; do
        echo -ne '\x00'; echo -nE "$arg"
    done
} | /usr/bin/osascript -- "$0"
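The packaging step can be tried on its own, without osascript. The sketch below is a variation on the template, not the template itself: it uses printf instead of echo because echo flags differ between shells, and the helper name pack_args and the sample filenames are made up for the example.

```shell
#!/bin/sh
# pack_args: hypothetical helper. Writes its arguments to stdout,
# separated by NUL bytes, with no trailing newline.
pack_args() {
    [ "$#" -gt 0 ] || return 0
    printf '%s' "$1"
    shift
    for arg in "$@"; do
        printf '\000%s' "$arg"
    done
}

# Make the separators visible by turning each NUL into "|".
pack_args 'نامه.txt' 'two words' | tr '\000' '|'
echo
```

NUL is a safe separator here because it is the one byte that can never appear inside a filename or a command-line argument.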
Next, the AppleScript template. After this header, refer to your
argument list using the argv list:
set argv to do shell script "/bin/cat"
set AppleScript's text item delimiters to ASCII character 0
set argv to argv's text items
set AppleScript's text item delimiters to {""}
-- The rest of your script follows here...
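To check what the AppleScript header will end up with in argv, the same split on character 0 can be done in the shell. This is only a debugging sketch; the helper name unpack_args is made up, and it assumes an xargs that supports the -0 option (both the BSD and GNU versions do).

```shell
#!/bin/sh
# unpack_args: hypothetical helper. Reads a NUL-separated stream on
# stdin and prints each item in brackets on its own line.
unpack_args() {
    xargs -0 -n 1 printf '[%s]\n'
}

# Three items, the second containing a space.
printf 'first\000second item\000third' | unpack_args
```

Each bracketed line corresponds to one text item that the AppleScript side would see after setting its text item delimiters.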
To bind these pieces together, we’ll assume you’ve called the
shell script template.sh, and your AppleScript
myscript.script. First you need to compile the
AppleScript:
osacompile -o myscript.scpt -- myscript.script
Then bind the compiled AppleScript to the resource fork of the final script:
ditto -rsrc myscript.scpt myscript
Next, copy the shell script template to the data fork of the final script:
cat -- template.sh > myscript
And finally, mark the script executable and delete the byproducts:
chmod 755 myscript
rm myscript.scpt
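If you build more than one of these, the steps above can be wrapped in a small helper. This is a sketch under assumptions: the function names bind_script and run and the DRY_RUN variable are my own inventions, and the commands and flags are simply the ones listed above. With DRY_RUN set, it only prints what it would do.

```shell
#!/bin/sh
# run: hypothetical wrapper. Prints the command when DRY_RUN is set,
# otherwise executes it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# bind_script NAME [TEMPLATE]: hypothetical helper.
# Expects NAME.script (AppleScript source) and TEMPLATE
# (default: template.sh) in the current directory.
bind_script() {
    name=$1
    template=${2:-template.sh}

    run osacompile -o "$name.scpt" -- "$name.script" &&
    run ditto -rsrc "$name.scpt" "$name" || return 1

    # Write only the data fork, so the resource fork stays in place.
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "cat -- $template > $name"
    else
        cat -- "$template" > "$name" || return 1
    fi

    run chmod 755 "$name" &&
    run rm "$name.scpt"
}

# Show the commands for the example above without running them.
( DRY_RUN=1; bind_script myscript )
```

Dropping the DRY_RUN assignment runs the commands for real, which only makes sense on a Mac where osacompile and ditto are available.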
Now you can run myscript and pass it a UTF-8
encoded filename, and the AppleScript will see it as a properly
encoded string of type “Unicode text”.