Author Topic: Optimizing LPC for JIT use  (Read 3716 times)

Offline silenus

Optimizing LPC for JIT use
« on: January 10, 2014, 09:16:57 AM »
I am curious to what extent people feel it may or may not be possible to optimize LPC for JIT use. The main issue I see is that the return value of any call_other is of type mixed, making it difficult to infer types without some sort of caching strategy and perhaps interprocedural constant-program inference. This is probably nontrivial to do even for an existing LPC driver such as DGD or FluffOS, and may involve replacing the virtual machine, or many opcodes, with typed versions rather than relying on dynamic type-field identification at runtime.

Offline FallenTree

Re: Optimizing LPC for JIT use
« Reply #1 on: January 10, 2014, 12:39:53 PM »
From the driver's point of view there is only one data structure, svalue_t, so it hardly matters. I've put some thought into implementing a JIT for LPC using LLVM, mostly recorded at https://github.com/fluffos/fluffos/issues/61

Offline silenus

Re: Optimizing LPC for JIT use
« Reply #2 on: January 11, 2014, 02:55:24 AM »
Thanks for the reply. IMHO one will not be able to maximize the benefits of using a JIT without moving towards bare integer and float types instead of svalue_t. The second problem relates to inlining. Because of how dynamic call_other is, it is difficult, given the current infrastructure, to predict the return type, and FluffOS, like DGD, just expects something akin to a mixed value. The value then needs to be type-tested via its tag before applying whatever the next function or operator is. I think much of the slowdown in LPC comes from this extra branch processing. It would be nice if it were possible to optimize some of it away.
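A minimal sketch of the tag testing being described, assuming a simplified tagged union (the tag and field names here are illustrative, not FluffOS's real svalue_t):

```c
/* Illustrative tagged value; not FluffOS's actual layout. */
typedef enum { T_NUMBER, T_REAL } sv_tag;

typedef struct {
    sv_tag type;
    union {
        long number;
        double real;
    } u;
} svalue;

/* A generic '+' must branch on both tags before it can touch the
 * payload; this branch cost is paid on every dynamically typed
 * operation, which is the overhead discussed above. */
svalue sv_add(svalue a, svalue b) {
    svalue r;
    if (a.type == T_NUMBER && b.type == T_NUMBER) {
        r.type = T_NUMBER;
        r.u.number = a.u.number + b.u.number;
    } else {
        r.type = T_REAL;
        r.u.real = (a.type == T_NUMBER ? (double)a.u.number : a.u.real)
                 + (b.type == T_NUMBER ? (double)b.u.number : b.u.real);
    }
    return r;
}
```

If the JIT could prove both operands are T_NUMBER, the whole branch collapses to a single machine add on bare integers.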

Offline FallenTree

Re: Optimizing LPC for JIT use
« Reply #3 on: January 11, 2014, 02:58:46 AM »
The primary cost of VM execution is that the opcode switch breaks a modern CPU's deep execution pipeline: branch mispredictions, cache misses, etc. We discussed implementing a jump table a bit in another bug.
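The jump-table idea can be sketched with the computed-goto extension supported by GCC and clang (the three-opcode VM here is invented for illustration; it is not FluffOS's opcode set). Each handler ends with its own indirect branch, which the branch predictor can track per opcode instead of funneling everything through one shared switch branch:

```c
/* Invented toy opcode set. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Threaded dispatch via computed goto (GCC/clang extension). */
int run(const int *code) {
    static const void *dispatch[] = { &&do_push, &&do_add, &&do_halt };
    int stack[16];
    int sp = 0;
    const int *pc = code;
#define NEXT goto *dispatch[*pc++]
    NEXT;
do_push:
    stack[sp++] = *pc++;    /* operand follows the opcode */
    NEXT;
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT;
do_halt:
    return stack[--sp];
#undef NEXT
}
```

This keeps the interpreter structure intact while removing the switch's bounds check and shared dispatch branch.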

IMO, without even a working function-level JIT implementation, discussing return types (inter-function optimization) is just too early...

Offline silenus

Re: Optimizing LPC for JIT use
« Reply #4 on: January 11, 2014, 10:56:33 PM »
Is anyone currently working on it for fluffos? I was thinking about doing something similar for dgd at one point.

Offline FallenTree

Re: Optimizing LPC for JIT use
« Reply #5 on: January 11, 2014, 11:16:36 PM »
I've been planning to refactor the code to the point where it would be possible. But given recent progress on that front, I think I will have to push the whole thing to 3.1 (I want to focus on finishing a 3.0 release within this month).

Offline silenus

Re: Optimizing LPC for JIT use
« Reply #6 on: January 12, 2014, 01:28:45 AM »
There are probably at least two ways of doing it. The simplest would be to replace eval_instruction with a pair of functions: one for compiling the bytecode to LLVM bitcode and one for executing it. I think the Java interpreter associated with the LLVM project (ladyvm) uses a similar trick, where it uses a simulated stack to convert stack bytecodes into SSA form. It is still nontrivial given the number of opcodes one needs to account for and adapt to run with the LLVM bitcode interpreter/JIT. I was actually surprised by the number of opcodes in FluffOS compared to DGD. Keeping so many optimized versions of instructions around (4-5 versions for call efun alone, plus a doubled-up version of each comparison operator for branching) makes the burden of doing a conversion quite great.
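The simulated-stack trick mentioned above can be sketched as follows: walk the stack bytecode once, keeping a compile-time stack of virtual register numbers instead of values, and emit three-address code. Since each register is assigned exactly once, the output is essentially already SSA. The bytecode and text format here are invented for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Invented three-instruction stack bytecode. */
enum { B_PUSH, B_ADD, B_RET };

/* Translate stack bytecode to three-address code by simulating the
 * operand stack at compile time: each slot holds a virtual register
 * NUMBER, not a value. */
void stack_to_tac(const int *code, int len, char *out, size_t outsz) {
    int stack[16];              /* register number per stack slot */
    int sp = 0, next_reg = 0;
    out[0] = '\0';
    for (int i = 0; i < len; i++) {
        char line[64];
        switch (code[i]) {
        case B_PUSH:            /* operand follows the opcode */
            snprintf(line, sizeof line, "r%d = const %d\n", next_reg, code[++i]);
            stack[sp++] = next_reg++;
            break;
        case B_ADD: {
            int b = stack[--sp], a = stack[--sp];
            snprintf(line, sizeof line, "r%d = add r%d, r%d\n", next_reg, a, b);
            stack[sp++] = next_reg++;
            break;
        }
        case B_RET:
            snprintf(line, sizeof line, "ret r%d\n", stack[--sp]);
            break;
        }
        strncat(out, line, outsz - strlen(out) - 1);
    }
}
```

For `PUSH 2; PUSH 3; ADD; RET` this emits one fresh register per result, which maps directly onto building LLVM IR values instead of text.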

Offline FallenTree

Re: Optimizing LPC for JIT use
« Reply #7 on: January 12, 2014, 03:30:35 AM »
I've fixed that in 3.0; there is only one EFUN opcode now.

I have only briefly looked at a few other drivers' LPC implementations; how much do you think the semantics differ? It is quite a shame that, because of the LPC differences, the different drivers/libs can't converge. It's tough in today's world where development resources are more scarce...

My actual idea for implementing the JIT is to focus on expanding the big switch directly. Refactor each opcode's C code to the point where it is mostly self-contained; then JITing a function means the driver generates C source (using the exact code the interpreter uses to execute each op), just with the switch unrolled.

Then, using this newly generated C source, use the LLVM library directly to do the compilation, leveraging all the common optimizations available for C, and produce a dynamic library that the driver calls. Of course, these are just unfinished thoughts for now. I want to avoid having to maintain two versions of the opcode code (interpreter and potential LLVM bytecode); there should just be one version around. Or maybe bite the bullet, ditch our own VM implementation, and switch to a mature interpreter (Lua? Python? Java is another choice).

Offline silenus

Re: Optimizing LPC for JIT use
« Reply #8 on: January 12, 2014, 06:35:31 AM »
Actually, LLVM is a tool that primarily compiles bitcode into machine code, either statically or dynamically, and hence can be used to build a JIT. One basically takes the LPC code, uses a parser to translate it into AST form, then converts that to bitcode instead of the rather ad hoc bytecode FluffOS uses now. The bitcode can be translated into machine code in memory or in a file, and then executed either via a function pointer or treated as its own executable once the remaining OS requirements of the binary format are met (such as with ELF).

The semantics of the different drivers differ enough that it probably isn't possible to automatically port a mud library without at least some tweaking. I have a passable understanding of three drivers: FluffOS, DGD and LDMud. DGD differs from FluffOS in that it supports full persistence but lacks function pointer support, among other things. LDMud supports lambda closures, which are like function pointers but with different syntax. Internally DGD is much cleaner than FluffOS, since it is all written by one hand: Dworkin's.

I considered writing a ground-up project for LPC in Java, leveraging the JVM to do most of the work for me, but I think getting all the little details right would take years of effort before it could run something like Dead Souls, for example, which FluffOS already does.

Offline FallenTree

Re: Optimizing LPC for JIT use
« Reply #9 on: January 12, 2014, 08:44:27 PM »
LLVM is bytecode -> machine code; I mostly mean the clang part.

The idea (maybe I wasn't clear above) is to simply convert each opcode into a macro (or a function call), and use clang + LLVM to generate optimized machine code from the same code that builds the driver interpreter.

As an example:

void eval_instructions() {
    switch (*pc++) {
        case OP_1:
            op_1(pc, sp);
            break;
        case OP_2:
            op_2(pc, sp);
            break;
    }
}

Then, for a given series of opcodes, the driver would compose a C source such as:

void function(pc, sp, some_other_arguments) {
    op_1(pc, sp);
    pc++;
    op_2(pc, sp);
}
 
Together with the embedded source code of op_1() and op_2() (the same code used when compiling the driver), one can then use clang+LLVM to generate machine code and link it as a dynamically loaded library. It's basically invoking static compilation of LPC dynamically.
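A toy end-to-end version of that pipeline, using the system C compiler as a stand-in for the clang+LLVM libraries (all names here, including lpc_fn and the op_* helpers, are invented; assumes a POSIX system with cc and dlopen available):

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Emit C source for one "LPC function" with the opcode bodies inlined
 * and the switch unrolled, compile it to a shared object with the
 * system compiler, then load it back with dlopen() and call it. */
int jit_demo(void) {
    FILE *f = fopen("gen.c", "w");
    if (!f) return -1;
    fputs("static int stack[16]; static int sp;\n"
          "static void op_push(int v) { stack[sp++] = v; }\n"
          "static void op_add(void) { sp--; stack[sp-1] += stack[sp]; }\n"
          "int lpc_fn(void) { op_push(2); op_push(3); op_add();"
          " return stack[--sp]; }\n",
          f);
    fclose(f);
    if (system("cc -shared -fPIC gen.c -o gen.so") != 0) return -1;
    void *h = dlopen("./gen.so", RTLD_NOW);
    if (!h) return -1;
    int (*fn)(void) = (int (*)(void))dlsym(h, "lpc_fn");
    return fn ? fn() : -1;
}
```

If the toolchain is available, jit_demo() returns the value computed by the freshly compiled and loaded lpc_fn().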

MudOS used to have LPC_TO_C functionality; Wodan removed it in FluffOS since it wasn't working. I really should go back and see what happened there.

Offline silenus

Re: Optimizing LPC for JIT use
« Reply #10 on: January 13, 2014, 02:29:12 AM »
The canonical approach with LLVM is not to include the whole of clang in the binary but to use just the LLVM libraries. I would suggest converting the expressions into llvm bitcode, which is LLVM's IR, and having the libraries do the rest of the work for you, rather than doubling the parsing overhead. The issue of having two IRs is an interesting one. It is possible to convert LPC directly into LLVM bitcode, obviating the need for a custom virtual machine, but that is probably a bit more work, though cleaner. I may attempt to do this once I get more comfortable with FluffOS. Is most of the compilation procedure in icode.c and icode.h?

Offline FallenTree

Re: Optimizing LPC for JIT use
« Reply #11 on: January 13, 2014, 09:33:53 AM »
Well, whether you include clang matters very little; in fact, my example doesn't even depend on LLVM, it can use whatever compiler was used to produce the driver.

The bottom line is: trying to re-implement all the opcodes in another format is a big deal, and keeping two versions lying around is the best way to introduce incompatibilities and bugs...

As for opcode generation, start by looking in compiler.cc (search for prolog(), yyparse() and epilog()). From there you will get into icode.cc.

Offline silenus

Re: Optimizing LPC for JIT use
« Reply #12 on: January 20, 2014, 07:51:20 AM »
I think ideally it would be nice if we could move the entirety of FluffOS onto llvm::bitcode and then run off the LLVM JIT. This may not be as hard as it seems, since there is already an AST generation pass in trees.h/trees.c, and one just needs to make icode.c/icode.h generate LLVM bitcode. I think a bitcode system within another isn't too bad an option; the other option is somewhat uglier, i.e. packaging the entire compiler with FluffOS as a library or something that is called externally.

I will probably make some more minor modifications to my copy of the driver to get a handle on how it works, but having the entire thing run off of LLVM is an interesting goal and probably a lot of work. The main problem I would have integrating LLVM is that it lacks native support for unions, which FluffOS uses extensively in svalue_t, and that is all over the place. I would have to do some research into how bitcasting works to get something functioning.

Offline Camlorn

Re: Optimizing LPC for JIT use
« Reply #13 on: January 24, 2014, 02:07:12 PM »
This is interesting, and I can't say I know much about it. One possible approach, though perhaps not the best, would be to reimplement a driver in RPython and translate it through PyPy. The PyPy people aren't aiming for a Python interpreter; they're aiming for a JIT-language-creation toolkit. Downsides: reimplementing the world of built-ins (but it's not actually that big, unless I'm missing something), and needing to reimplement the core language. Unfortunately, documentation is lacking (but this is somewhat true of LLVM as well). The tutorials I've seen make it look simple, at least for simple languages, and any new optimizations that PyPy's JIT gets would be given to us for free, probably without changes.

I'm not sure this is worth it from the what-we-gain perspective. LPC runs fast enough for everyone, and isn't going to get a significant number of new recruits just because it now has a JIT. If you're writing a mud that really, really, really cares about performance, you probably decided to start out with C. I suppose a few muds would benefit: those which have been around for a very long time and have found performance issues creeping up on them. Lost Souls is my example, but there are probably some others. It's worth it from the cool-project perspective, of course. Is the point of this to actually gain something important/useful, or is it just one of those fun, cool, and really interesting projects? I'm curious whether my assessment that muds wouldn't really notice or need it is accurate; I really feel like it is. And I'm not saying don't do it, I'm just not sure what the motivation is.

Offline quixadhal

Re: Optimizing LPC for JIT use
« Reply #14 on: January 25, 2014, 07:43:42 AM »
I think the benefit you'd gain from re-implementing the driver in some other fashion isn't speed (really, that can't matter unless you have 10,000 players), but the ability to more easily extend the driver.

To be honest, if you really want to move away from using your own VM and push everything into an external VM system... please just create LPC.NET and call it good.  The .NET architecture is very good, and allows you to mix and match various languages (with some restrictions).  It also allows you to use the very powerful visual studio debugger.  I think the mono project is still plugging away at keeping the linux version more-or-less in sync with the official one as well.