The JVM, and costs vs. benefits


In a recent entry on differences between Haskell and Lisp, one of the Lisp community’s long-time members, Daniel Weinreb, asked about my stated aversion to JVM-based languages for everyday computing (sometimes referred to as “scripting”). Specifically, he asked in relation to Clojure, and why I haven’t been immediately taken by that language, despite its having so many features I respect and admire.

I wanted to respond to Daniel’s question in a separate blog entry, since this topic has come up so often, it seems, and deserves thought. The JVM is a rich, mature platform, and you get so much for free by designing new languages on top of it. The point of debate is: what are the costs, and are they always worth the asking price?

Daniel’s question was:

In your own case, you mention “tiny” and “fast-running” executables. I am not sure why “tiny” matters these days: disk space is very cheap, and the byte code used by the JVM is compact. Common Lisp programs compiled with one of the major implementations, and programs written for the Java Virtual Machine, execute at very high speed.

The fact that you distinguish between server-side and client-side applications suggests to me that what you’re really talking about is start-up latency: you’re saying that a very small program written for the JVM nevertheless has a significant fixed overhead that causes perceived latency to the user. Is that what you have in mind?[…]

As a hypothetical question just to clarify your meaning: if there were a JVM implementation that started up instantly, so that the speed of execution of a small program would be the same as the speed of the same code appearing in the middle of a long-running server process, would that answer your objections?

Hi Daniel, thank you for your in-depth reply. As always, I enjoy reading what you’ve contributed to the Net’s compendium of thought on Lisp and related languages.

Your clarification was most accurate: When I said “scripting”, I was talking about a context of usage, not a particular language paradigm. I like that Haskell seems to be just as appropriate for tiny, throw-away scripts as it is for large, long-running programs.

When it comes to the latter, I really have no objections at all to the JVM or its startup time. I’m more than willing to wait 5 minutes for something to execute, if it will run for months at high efficiency. I face this situation all the time at work, where we have a huge EJB application hosted on JBoss. It may complicate debugging sometimes, but the benefits are worth the costs. The sheer number of things that J2EE and JBoss manage on our behalf, compared to the small amount of code necessary to take advantage of them, is quite amazing.

What the JVM takes away, at least in 2009, is the choice of what those costs will be, and when I have to pay them. I think one of C’s biggest attractions for a long time has been that most of its costs are a conscious decision. If you favor startup time, or a small memory footprint, or fast execution, you can pretty much decide. This makes it as appropriate for embedded apps, as it is for running an HTTP server, as it is for building operating systems and compilers. With Java, despite all the things you get for “free”, it comes at the cost of other freedoms. And sometimes, Java’s priorities are not mine.

So while I can and do use the JVM for server-side computation, it’s a bit heavyweight for small and simple tasks. Common Lisp’s answer to this problem was an ingenious one. Instead of building programs that you run over and over, it offers an “environment” in which code is iteratively evaluated, so that you actually grow and nurture a burgeoning set of functionality within a long-running VM. I like this model when appropriate, and enjoy it, for example, in Emacs, which I can leave running for days on end while at the same time extending its functionality by writing new functions and customizing variables.

To answer your query then: yes, if JVM startup time could be eliminated, it would “free my hand”. I very much respect the maturity and stability of the JVM libraries Groovy and Clojure have access to. Also, what you said about the JIT, and alternative VMs, can be supplemented by mentioning all the other JVM facilities that exist, like code coverage, performance and memory analysis, and live introspection; along with the ability to pick JVMs to run on phones, or satisfy real-time computing requirements. It’s a rich platform, no doubt.

But why do we never see complaints about languages that link to the C++ standard library, or Boost, or any other of the large frameworks that exist? Because in those worlds, you don’t pay for what you don’t use. It’s been a design philosophy behind C++ for years, and to good effect. We might complain about the language, or its APIs, but you hardly notice if other projects use it, because largely, one can pretend it’s not even there. Not so with the JVM. Every time I start a Java application on my system, I feel it. Run several of them at once, and even my 3Gb laptop starts swapping. Only with the JVM are such things a source of common complaint.

I’m hoping that some day, projects like the LLVM will start to abstract these two sides of development. I want to be able to pick my language for its type safety, clarity, expressiveness, and joy of use; while at the same time I’d like to pick my VM for its security, footprint, handling of parallelism and messaging, and run-time appropriateness. This would let me choose Lisp, Haskell, Python or C++, depending on the skillset of engineers available to me; and the JVM, .NET platform, or LLVM, depending on how I meant the code to be used. Wouldn’t that be a powerful set of tools at one’s disposal?

28 Comments

Thank you for the clarification. Right at the moment, I’m so busy preparing for the Lisp conference that I don’t have time to run metering experiments; perhaps someone else reading this could try out a small program like “echo” in Clojure, Java, Python, PERL, and sh, and see how long they take, on that person’s hardware and O/S.
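A measurement along these lines could be sketched as below. This is only a rough harness, not the method anyone in this thread actually used: the function names `time_command` and `summarize` and the command table are hypothetical, and the Clojure line is commented out because it assumes a `java` binary and a `clojure.jar` in the current directory. It times whole-process wall-clock duration, which for an “echo”-sized program is dominated by startup cost.

```python
import shutil
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal speed does not affect the timing.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (fastest, median) in seconds for a list of samples."""
    return min(samples), statistics.median(samples)


if __name__ == "__main__":
    # Hypothetical command table; extend it with whatever is installed.
    commands = {
        "python": [sys.executable, "-c", "print('a')"],
        "sh": ["sh", "-c", "echo a"],
        # Assumes java and clojure.jar are available:
        # "clojure": ["java", "-jar", "clojure.jar", "-e", "(java.util.Date.)"],
    }
    for name, argv in commands.items():
        if shutil.which(argv[0]) is None:
            print(f"{name:10s} skipped (not installed)")
            continue
        fastest, median = summarize(time_command(argv))
        print(f"{name:10s} min {fastest * 1000:8.1f} ms"
              f"   median {median * 1000:8.1f} ms")
```

The first run of each command is likely to be the slowest because nothing is cached yet, so reporting both the minimum and the median gives a fairer picture than a single `time` invocation.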

It would also be interesting to measure Clojure and Java under the JRockit implementation of Java. I don’t know anything about its startup time, but a few years ago, I did a computationally-intensive simulation program with overnight runs, and JRockit was twice as fast as Sun’s implementation. Obviously any measurement made that long ago is highly suspect, and does not bear on the startup time issue. I only mention it to provide a bit of enticing evidence that it’s probably worth checking JRockit.

But even knowing that isn’t the complete answer; I’d like to hear from someone familiar with the present JVM implementations, and the plans for the future.

The LLVM is very interesting. I know that at least some of the people at Clozure (the maintainers of Clozure Common Lisp, no relation to the Clojure language) have considered re-implementing their compiler to generate LLVM code.

An advantage of LLVM is that you don’t have to use anybody else’s calling convention, and there is good reason to believe that devising your own calling convention can have significant beneficial effects on performance if you do it optimally. Also, under LLVM, there would not be a big barrier to implementing Scheme-style continuations.

On the other hand, you don’t get automatic interoperation with the huge set of JVM (nor .NET) libraries.

What you suggest in the last paragraph is a paradise that I’m not sure can be feasibly achieved. What would be even better would be everything you said, plus the ability for each of these languages to interoperate with the other at high speed, and allowing things like writing a class in one language and subclassing it in another.

That kind of integration between Python and Java is present in the Jython implementation of Python, if I’m not mistaken, although if you take advantage of it, you can no longer port your program back to the “regular” Python implementation. The ability to work with the JVM is not part of the portable Python standard. (Actually, I gather that there isn’t a standards document written well enough to be clear about all this, but I don’t see any reason why there couldn’t be.)

Clojure defines the JVM to be part of the semantics of the language, and so you can only port to implementations that can call Java smoothly.

The .NET VM allows a great deal of inter-language cooperation. However, my impression is that as a result, the languages are forced to agree on certain important semantic issues. This led to the introduction of the new “managed” implementation of Visual Basic, whose semantics are so different from the original Visual Basic that translating from old to new is not easily done mechanically, and people with large code bases in the old language are being left behind.

You must be able to write a decompiler for .NET byte codes (just as decompilers have been written for the JVM, although I don’t know any that is currently maintained and up-to-date). A decompiler could presumably decompile into any of the languages that run on .NET.

(Well, that’s not quite true; it may be that although they all share the same potential set of instructions, there are sequences of byte code that only some compilers can ever produce. With that big caveat, I’ll continue.)

That would seem to mean that you could translate a program between any two .NET languages. To me, that means that they’re really all the “same” language in an important sense, with the same semantics and merely different syntax.

In the case of the JVM, does this mean that any Clojure program could be automatically translated into a Java program, and vice versa? Of course, the Java program that came out might be very weird, unlike anything that a reasonable programmer would ever write, and perhaps very hard to understand. Still, I think it’s an interesting hypothetical question and I don’t know the answer.

What you said about leaving the Lisp environment in place so that you don’t have to pay the Lisp startup cost is exactly how the Lisp machine worked. This strategy works great when everything (and I mean everything!) is written in Lisp.

If you’re using Unix/Linux, though, and you want to write a command that can participate in pipes and all that, obviously you can’t do that directly. However, assuming that Lisp took too long to start up, here’s what you could do:

Have a subroutine library that you’d use in a simple, C-coded command executable, that would perform interprocess communication with an existing Lisp process whose lifetime is that of your interactive session.

To demonstrate that such a strategy is feasible in practice, consider Slime/Swank, the popular Common Lisp interactive development environment. When you start using Slime, you get a Lisp environment in another process, running a “slave” (or “server”) called Swank. Your own program-being-debugged also is provided to you as a read-eval-print loop so that you can test out your program in the usual Lisp way. However, you can give IDE commands that respond very quickly, because they send a message to Swank, which does things in the Lisp process, and sends replies back to Slime. People do this all the time and it works fine.

Of course there would be a library/framework to make it easy to write such a command.
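The shape of that arrangement can be sketched as follows. This is an illustration under several assumptions, not Slime/Swank’s actual protocol: the Unix-socket transport, the one-line request format, and the names `serve`, `ask`, and `handle` are all made up for the example, and `handle` merely upper-cases its input as a stand-in for real work. Both halves are written in Python to keep the sketch self-contained; in the proposal above the client would be a small C program, precisely so that it starts instantly.

```python
import os
import socket
import tempfile
import threading


def handle(request):
    """Stand-in for the work done inside the resident process."""
    return request.upper()


def recv_line(conn):
    """Read bytes from conn up to the first newline (or end of stream)."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    return data.decode("utf-8").rstrip("\n")


def serve(path, ready, max_requests=None):
    """Resident side: answer one-line requests arriving on a Unix socket."""
    if os.path.exists(path):
        os.unlink(path)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen()
        ready.set()  # tell the caller the socket is accepting connections
        served = 0
        while max_requests is None or served < max_requests:
            conn, _ = server.accept()
            with conn:
                conn.sendall(handle(recv_line(conn)).encode("utf-8"))
            served += 1
    os.unlink(path)


def ask(path, request):
    """Thin-client side: send one line, read the reply until the server closes."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(request.encode("utf-8") + b"\n")
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    # Demo: run the "resident" side in a thread and query it once.
    path = os.path.join(tempfile.mkdtemp(), "resident.sock")
    ready = threading.Event()
    worker = threading.Thread(target=serve, args=(path, ready, 1))
    worker.start()
    ready.wait()
    print(ask(path, "hello"))  # prints "HELLO"
    worker.join()
```

Because the client only connects, writes, and reads, its cost is independent of how long the resident process took to start, which is the point of the design. The sketch assumes a Unix-like system where `AF_UNIX` sockets are available.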

By the way, I remember that at the first Lightweight Languages Workshop (ll1.ai.mit.edu), Jeremy Hylton (then one of the major Python developers) said that Python started up fast, but took a long time to stop! This is because at the end of the program, all the memory that had been dynamically-allocated became unused, and the (relatively slow) garbage collector (reference-count based, not a “true” GC) would run on all the memory blocks, which took a long time. This was back in 2001 and no doubt they’ve fixed this long ago. But it was pretty funny at the time.

Rich Hickey, the inventor of Clojure, says that he definitely has users who are using Clojure for scripting, in the sense that you mean.

Try this on your favorite hardware:

time java -jar clojure.jar -e "(java.util.Date.)"

On Rich’s Mac (he didn’t say what model), he’s seeing real times like .75 seconds. On my machine, Windows Vista SP1 with 4G main memory, Intel Core 2 Duo CPU (2.66 GHz each), I am seeing times like 1.2 or 1.3 seconds. If you have not run Java for a long time, it’s more like 3.3 seconds.

By comparison, running “date” takes about .1 seconds on my system.

BUT:

Regarding your paragraph above starting “What the JVM takes away, at least in 2009, is the choice of what those costs will be”: that is not quite the case. You CAN, in fact, make a choice about whether you want fast startup time versus better overall throughput.

Try running the same Clojure example, using the -client option to Java. I see times more like .07 seconds! That’s 70 milliseconds. It’s not very perceptible.

Sorry, “date” is an unfair example. It must be doing a lot more work than I know. If I time “echo a”, it’s less than 1 ms.

Still, the Clojure example is actually faster than “date”, so it’s a bit hard to say that it’s unacceptable on grounds of its startup time being too slow, at least on a contemporary machine. My new Sony Z750 laptop has the same CPU speed and 3GB main memory, which would have cost about $1,800 had I not sprung for the 128GB SSD option. This is the same price as a Mac Air (but the latter has no optical drive…). So it is not necessary to spend that much money to get the hardware needed to reproduce these results.

I have to disagree about C being a good choice for implementing compilers. The cost in developer time is way higher than with Haskell or ML, and that should count for a lot.

What you’re describing here, about a resident Lisp engine communicated with via some kind of IPC, sounds very similar to what Nailgun wanted to achieve with Java (a project that hasn’t seen movement since 2005). I’ll have to give this a try, and see if it gets JVM languages down into the C startup time-frame.

On my MacBook Pro, your Clojure Date example requires on average 0.973s, and with -client it goes down to 0.9238s. Note that I also set JAVA_OPTS=-Xverify:none in my environment, to reduce Java startup times as much as possible.

I wonder if part of the slowness for me is Apple’s port of the JDK, since using -client doesn’t seem to have much effect.

@Daniel

“Visual Basic that translating from old to new is not easily done mechanically, and people with large code bases in the old language are being left behind.”

There are translation apps (ASP2ASPX) which will do 95% of the conversion for you (VBScript to VB.NET or C#); I successfully upgraded a 7 million line website in 2 months using this approach.

What an interesting couple of articles and threads…

On my Atom N270 Ubuntu 8.04 machine, I get these real times (using -Xverify:none):

Server JVM Clojure: 0m4.382s, Client JVM Clojure: 0m2.257s, Perl 5.8.8: 0m0.023s, newLISP 10.0.2: 0m0.011s, date: 0m0.009s

I’ve included some other programs I have available for comparison, including another Lisp language. This is the Linux port of Sun’s JVM, 1.6.0_06. While the JVM is certainly fast after running for some seconds, especially after HotSpot kicks in, the difference in startup time is pretty huge: two whole orders of magnitude.

Even if Java libraries and their verbosity would allow for command line piped scripts (in one of its more compact languages) these times are ridiculous. In the same time ‘grep’ or ‘awk’ have already processed a fairly complex regex over several thousands of lines (or more, depending on how fast your IO is) to feed into your script.

Also, ‘(java.util.Date.)’ compared with e.g., ‘(date)’, more than a couple of times gets so tiring… 80 character wide screens? Mmmm…

Nevertheless, the JVM has improved a lot, and I hope the new additions planned for dynamic languages make it even better for implementing languages on top of it. LLVM and Parrot are other pretty serious competitors, but lack the huge amount of resources that the JVM has backing it…

Additionally, I’d ask of a “scripting language”, in the sense defined before, great ease of interaction with the host OS and its utilities, especially thinking of UNIX here…

Anyway, thank you very much for the interesting food-for-thought, it’s great to be able to have insights just by reading a blog post and its comments.

Charles Stewart wrote yesterday on Advogato:

johnw from yesterday, talking about his reasons for not being comfortable with languages, in particular Clojure, that target the JVM:

What the JVM takes away, at least in 2009, is the choice of what those [runtime system] costs will be, and when I have to pay them. I think one of C’s biggest attractions for a long time has been that most of its costs are a conscious decision.

I wonder why this isn’t a problem for appreciating Haskell, then, which is awful in this respect. As Norman Ramsey writes in his brief summary of research interests:

The run-time system of the Glasgow Haskell Compiler offers a pleasing collection of useful run-time services, carefully implemented. It is also an immense, bloated hog: it is four times the size of version 6 Unix, a major operating system. The size and complexity of the GHC run-time system is a serious problem for those who want to create secure software using Haskell and for those who wish to use Haskell without an underlying operating system (Hallgren et al. 2005).

Furthermore, it is difficult to get a feel for what the compiler does with one’s code. I’ve read several of the key papers on pure FP code optimisation, and I am completely puzzled by some of the experiences I’ve had with ghc.

I really don’t mind the size and complexity of GHC, as long as I can remain largely unaware of it. For example, I’m a big user of Boost, which in the category of size and complexity is an awful beast. It takes like 4 hours to compile the thing, and over 2G of disk space. C++ executables that are statically linked with it are nearly 10M or so.

However, I have 500G in my laptop, and Boost doesn’t affect startup time at all, nor runtime, except for the fancy features it uses. And what it gives me in return – a “Maybe” type, safe deletion, shared pointers, etc. – are facilities I live by now in order to write safer C++ code.

I suppose I should quantify what the various costs are, and where they lie. For me, I distinguish between two sets: costs I notice, and costs I don’t. If it requires me dropping into the shell and running “du” to notice a cost, then really I never will. But if it slaps me in the face each time I run the program, especially if it’s something I run often, then I call foul.

I feel the same way about the JVM as a platform for scripting, and I have also been ambivalent about using Clojure because it is hosted on the JVM. I don’t think Rich Hickey made a mistake in hosting on the JVM at all, either. In fact, it is a great idea, because every new language that is not hosted on a common platform means starting all over again with libraries, and libraries are hugely important to the practicality of a language. Though learning new programming languages can be useful for mind-expansion, it is difficult to spend enough time with a language to truly benefit from its world-view without being able to write useful programs, and unless you like writing your own XML parser every time (or what have you), the lack of sufficient libraries makes a language “a great place to visit, but I wouldn’t want to live there.”

More than startup time, I think the biggest detriment to using the JVM for scripting is the way it treats memory. It seems that the JVM always wants to preallocate a lot of RAM and hold onto it forever, increasing over time until it either hits the configured JVM limit and crashes or grabs so much memory that the OS starts swapping or hits the virtual memory ceiling and crashes. This is a problem with Java programs in general, but even more of a problem with scripting languages on the JVM since they tend to focus on memory efficiency less. I truly do not understand why the JVM needs so much memory, since I have written quick-and-dirty, heavily-allocating Python scripts on many occasions and never seen anywhere near the type of memory hogging I see from JVM apps. This makes me nervous to run JVM apps on any system of importance, and I have been burnt enough times from the JVM hogging memory that I avoid it almost universally, even if it means also avoiding languages like Clojure that are otherwise amazing and close to my ideals.

But this is a good point; Clojure is never going to replace Ruby completely for me because there are still a fair number of everyday tasks I perform that I need instantaneous feedback for without keeping a long-running server around. That said, for anything with nontrivial concurrency using Clojure is a no-brainer; doing that in an OO language is fraught with peril.

I think OS X defaults to using the client JVM. At least on my machine, the normal java looks like this:

andrew-hitchcocks-mac-pro:~ Andrew$ java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

However, I’ve aliased java to point to the 64 bit version of 1.6 (which forces server):

andrew-hitchcocks-mac-pro:~ Andrew$ java -d64 -version
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

For a while I was running some long lived Java programs so I was happy to trade off startup time for execution speed. Even with a startup time of a second or two, I don’t see that as a big impediment to Java’s use.

An advantage of LLVM is that you don’t have to use anybody else’s calling convention,

Can I get a link to how to do that? I discarded LLVM as a possible backend because as far as I could tell it wouldn’t let me use my own calling convention.

Every time I start a Java application on my system, I feel it. Run several of them at once, and even my 3Gb laptop starts swapping.

I have three JVMs running with 128M each (Tomcat, Resin, Glassfish), one with 256M (Eclipse), a VirtualBox with WinXP with 768M and Firefox with 200M of RAM. I don’t know what you are doing to your machine, but my 2GB aren’t swapping and I think this is pretty intense use. A default JVM takes only 64M… how on earth can that be considered heavy these days?

I’m running JBoss inside VMware, which takes 1G. I have to make it this large because unless I give the JBoss JVM 768M, my EJB application crashes due to heap exhaustion. Then I’m running IDEA, which when building this same EJB application, consumes about 1.2G of memory (observed by looking at the activity monitor).

Perhaps your JVMs are not running applications as large as what I’m using? This combination of IDEA + VMware causes my machine to swap so badly that sometimes I have to wait up to five minutes (!) before my typing is recognized. The solution so far has been to use Maven where possible (because it goes away when it’s done), and avoid long-running JVM instances.

If you are running such large applications, you will always have memory problems, no matter what language/platform you are using ;). However, perhaps you are being bitten by the fact that a JVM never returns any memory to the OS. If you allow it GB and its memory usage grows to that GB, then you will never get it back. For small ‘scripting’ applications, that is not a problem, as they end quickly and usually don’t use much memory in the first place. The memory overhead of using a JVM is minimal.

The startup time is being taken care of currently: Project Jigsaw. There is hope yet :)

If you are running such large applications, you will always have memory problems, no matter what language/platform you are using

Not necessarily, compared to the JVM’s memory consumption. I regularly have about 10-20 applications running on OS X, some of them very “large” (a drawing application, databases, design tools, etc). I only start to notice 3Gb of RAM being “too small” when IDEA and other JVM apps start coming up. I’ve learned that for Java development, 6-8G is about the right memory size, whereas for anything else, 3G has been more than enough.

It doesn’t make any sense to me. What is using the memory? The memory overhead of a JVM is minimal. The amount of memory a Java application uses (or an application in any other JVM language, for that matter) is comparable to the amount of memory that the same application would use in any other language. The only thing that could bite you is when you have long-running JVMs that max out their allowed quantities of memory and retain that memory. But that can be managed by periodically restarting applications and in general managing which applications are running. You run into the same problem when developing large applications in other languages. Am I missing something?

I don’t know where you picked those numbers from, but they seem a bit exaggerated.

I’ve been working on large projects at major bank and transportation companies. All I ever had was a standard Dell workstation with 2GB of RAM, and it was more than comfortable to work with the Eclipse IDE.

As for startup time, FYI, the Tomcat application server starts and deploys several applications in under 400ms on those workstations… So I think a small script (in Groovy, JRuby, or whatever JVM-based language) should start and run under half that time or even less : barely noticeable for any human user…

The startup time of the Java VM has been reduced a lot over the last major Java releases, and a ‘time java HelloWorld’ (a main method with System.out.println) finishes on my system in 90ms. That made me slightly annoyed reading about the Clojure startup times mentioned here, and unsure whether the right target is being blamed when the several-second run time of a benchmark like ‘java -jar clojure.jar -e “(java.util.Date.)”’ is attributed to the JVM.

Running the Clojure example on my system takes about 600ms, so comparing with the 90ms ‘Hello world!’ benchmark, it looks as if more than 5/6 of the time is spent initializing Clojure and not directly related to the JVM. Examining the JVM profiler output indicates that a lot of time is spent loading Java classes and indeed, the clojure.jar contains more than 1200 Java classes, of which most seem to be loaded when Clojure is initialized and not on demand, as is the case with ‘normal’ Java software. One possibility to reduce startup time is to unpack the jar file (the jar file in the Clojure distribution is compressed with obvious impact on class loader performance), which reduced the run time by 50ms. Disabling the bytecode verifier (as already mentioned here) brought another 200ms, so that I’m now down to 350ms. As a comparison, the latest JRockit VM requires 2.9 seconds for the uncompressed jar and with disabled byte code verifier.
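To check the class count and the effect of compression on a given jar, something like the following could be used. It is a minimal sketch that relies only on the fact that a jar is a zip archive; the function name `jar_class_stats` is hypothetical, and the path to the jar is whatever you pass on the command line.

```python
import sys
import zipfile


def jar_class_stats(jar_path):
    """Return (class_count, compressed_bytes, uncompressed_bytes)
    summed over the .class entries of the jar at jar_path."""
    count = compressed = uncompressed = 0
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.filename.endswith(".class"):
                count += 1
                compressed += info.compress_size
                uncompressed += info.file_size
    return count, compressed, uncompressed


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python jar_stats.py path/to/some.jar")
    classes, packed, unpacked = jar_class_stats(sys.argv[1])
    print(f"{classes} classes, {packed} bytes compressed, "
          f"{unpacked} bytes uncompressed")
```

If the compressed and uncompressed totals are equal, the jar is stored without compression and unpacking it is unlikely to change load time much; a large gap suggests the class loader is inflating every class it reads.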

Due to the order in which the class path is searched for a named class, for each of these 1200 Clojure classes, the JVM has to look into its own class libraries to see if it can find the class there, before getting it from clojure.jar. A last attempt to reduce startup time would be to add clojure.jar to the JVM’s boot classpath, but the Clojure implementation in RT.java doesn’t seem to cope well with a modified classloader hierarchy and breaks with a NullPointerException. Perhaps I’ll have time later to fix that bug and check how much time is saved by rearranging the classpath search order.

If the startup time of Clojure scripts is important, the Clojure team should look into their initialization procedure and consider if it’s really necessary to load all these classes when initializing the software.

The memory overhead of a JVM is minimal. The amount of memory a Java application uses (or an application in any other JVM language, for that matter) is comparable to the amount of memory that the same application would use in any other language.

Are you just hoping that that’s the case? Check out the language shootout and compare memory use by Java to C++. Take a look at the comparison to Python even! The data speaks for itself.

I have lots of applications, written in many languages, and I can tell you from first-hand evidence (running Activity Monitor and top) that unless Java programmers are simply worse at freeing memory, Java applications consume many times more system memory than the next largest thing I run. Is this Apple’s version of the JDK doing it? Now that I don’t know.

I can believe that it should, I just wonder why it doesn’t. I’ll have to run my startup tests on Linux and Windows.

These microbenchmarks are partly misleading. If you subtract the ~10M overhead of the JVM, then in 9 out of the 13 benchmarks, the JVM uses at most two times as much memory as the C++ application. In 9 out of 13 benchmarks, it then uses less memory than the Python application. The question that remains is: does that also hold for real-world applications?

However, even if it does: the applications I have built serve thousands of clients a day within 128M of memory. That an equivalent C++ app could do it in 64M really isn’t that interesting. I understand it being a problem if you have really large applications, but for the average developer and the ones that want to use Clojure for scripting, I think it is perfectly acceptable. If someone gets to the point where their Clojure app gets too heavy, I think they can afford an 8G workstation :)

Although fairly new, LLVM 1.5 claims to support custom calling conventions and tail calls.

About this Entry

This page contains a single entry by John Wiegley published on March 15, 2009 6:28 PM.

Run times for Hello, World in 2009 was the previous entry in this blog.

Journey into Haskell, Part 1 is the next entry in this blog.
