Ranter
Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Comments
-
@Lensflare
In a VM, you're working with bytecode, which is an instruction set not unlike assembly that instructs your program to perform certain, usualy low-level actions.
In the implementation, the VM implementation is usually just iterating a buffer of your VM's bytecode, evaluating instructions one by one and jumping to a handler for each opcode (e.g. an `add` instruction will jump to a piece of code that reads stack values or VM registers to add two numbers together and store it somewhere, whereas a `sub` instruction will jump elsewhere).
You need a conditional to perform the instruction decoding, which basically reads the byte(s) and decodes them to an instruction value that corresponds to a location of code to jump to.
In standard C, there are only a handful of options, the most common three being:
- if..elseif..elseif..else nests
- switch statements
- function pointer arrays -
Chained if..elseif..else nests have the problem that they compile to inefficient branching code, meaning the CPU is going to miss the branch quite often and incur a lot of overhead. Not cool.
A switch usually gets compiled to machine code that is slightly better than chained if..elseif..elseif..else nests, but still (usually) incurs a branch. Further, not all compilers have great optimizers for switch statements, and even if your instruction set is linear and gap-free it'll still cause branching in the CPU pipeline.
Function pointer arrays are the more maintainable option IMO. It's just an array with the arity of however many opcodes you have, and each element is a pointer to a function that handles that opcode. The obvious downside here is that function calls are relatively expensive, especially on the hottest of hot-paths (a VM). -
So what's a better solution? Well if we know our opcodes are contiguous and start from zero (ascending) then we can simply do an array lookup - like the function pointers - but ideally without the cost of a function call. After all, it's just a code-flow thing; we can jump to anywhere in the executable and be fine as long as it expects it.
Enter computed goto. It's a `goto` statement but takes an address instead of a label. Further, the extension allows you to use the double-amp operator (&&, unrelated to C++'s rvalue references) on labels, which gives you a `const void *` of the instruction directly after the label. -
That means you can do something like `const void *dispatch[] = {&&do_instruction_PUSH, &&do_instruction_POP, /*etc*/};` and then use the `goto *` variant (note the asterisk) to jump to a computed location based on the instruction byte: `goto *dispatch[current_instruction]`. In this contrived example, the byte `0x00` would thus jump to the PUSH instruction, and `0x01` would jump to the POP instruction, all without incurring a function call penalty.
It's incredibly useful but for whatever reason not specified in the C spec. I wish it were, because it's an amazing feature when you know your enums are bounded to spec. -
You can also use the same logic e.g. for sorting networks if you know that your list lengths will be short, but may vary.
Instead of a switch-case for the list length, you jump to the code part where the sorting network for that list length is handled.
Now, some compilers actually do emit such a table based jump for switch-cases - but there's no guarantee on that, no way to enforce it, and it may change upon any code update. -
@junon no I mean what’s the reason that it didn’t make it into the c standard?
But thank you for the detailed description of computed gotos 😄 -
@Voxera From one of @junon's comments above: "function calls are relatively expensive, especially on the hottest of hot-paths (a VM)."
-
I'm kinda happy it's not a standard trick, because it would be abused so, so easily. But yeah, it's pretty neat for dispatch type stuff in a hot path. I don't think I'd use it anywhere else.
-
@RememberMe
> because it would be abused so, so easily
That's not a reason why not to include something in the standard. Not being able to do the equivalent of a `JMP eax` or whatever in C is a little strange seeing as how C is just one minefield away from Assembly. -
@junon I know, I know, it's just me venting after having spent a week untangling some goto infested code that somebody thought would be "optimized" (it really wasn't, all it did was stop optimization passes and made it 200x harder to read). As somebody who loves both programming language design and systems/HPC, I think I hate goto misuse more than anything else out there.
-
Voxera113774y@Fast-Nop i missed that one, thanks for pointing it out :)
But if functions are to expensive you are doing serious low level stuff ;)
But in that case, you have to compile to byte code to begin with, either if its one specific platform compile to machine code directly.
Of if you need support for more, compile to java byte code, its supposed to be one of the most optimized vm’s there is.
Building a VM for that performance level is always stretching the limits and are likely to require special solutions. -
@Voxera The JVM is optimized because it uses the hotspot JIT engine if I understand correctly. That's a different concern altogether.
Related Rants
It's a fucking shame computed goto didn't make it into the C standard.
It's by far my favorite non-standard feature of C. Writing this VM would suck without it.
rant
c
computed goto