(Quetzalcoatl Home Page) (kdef.com) (VIC-20 Home) (Geek Home)
[History] [Things you to Do] [Future] [Design] [Development Guidelines] [Contact]

Quetzalcoatl ™
A Compiler Hackers' Guide
This Document: http://www.kdef.com/geek/vic/quetz/develop/chg.html Version 2.1.0 BETA - October 11, 2006
Author: Brendan Jones

October 8, 2006: Released Quetzalcoatl 2.1.0 BETA under GNU Public License (GPL). Never had time to fully comment the body, so this guide and the headers will have to do for now.

Legals

Before viewing, using, modifying or copying the Quetzalcoatl Source Code be sure to read the legal note in legal.txt. If this is missing, contact me. You can reach me through the kdef.com web site. I can also be reached through the VIC Denial Forums. I shan't include the e-mail address here because of spam. Alternately contact the Free Software Foundation for the GPL Legal Notice.

What is Quetzalcoatl?

The goal is for Quetzalcoatl to be a ruthless, squeeze-every-byte and waste-no-cycle compiler;
Something that would give you the absolute efficiency of assembler.
Letting you cut your code in C, without sweating on lost bytes or cycles.
Queztalcoatl - The Only Compiler with a Vision Statement.
(If Al Dunlap was a Compiler, he would be Quetzalcoatl)

Queztalcoatl is a Cross Compiler for 6502 Processors. Queztalcoatl runs under Windows, Linux, Solaris, DOS and MacOS. It can compile and link programs written in a subset of ANSI C, Assembler, the 1983 UPL language or any combination thereof. It is suitable for producing programs for the Commodore VIC-20, Commodore 64, Rockwell AIM-65 (Hey! isn't that a missile?) and Atari 7800. (You can run executables directly on emulators such as VICE, Free-64 or PCVIC). With some minor modifications Queztalcoatl will also produce code for the Apple ][, old 6502 game consoles (such as the Nintendo Entertainment System) and 6502 embedded systems.

The Quetzalcoatl Award goes to Harry Dodgson who took Quetzalcoatl in 2005 and ran with it, testing and swatting heaps of bugs, improving the code, adding new runtime targets and porting the compiler to the latest version of GCC and Solaris; Quetzalcoatl is Harry's program too.

This version of the Compiler Hacker's Guide has been expanded for the 2.1.0 BETA release. It includes a new section on optimisation which will hopefully explain much. While I never got the comments to the level I'd wanted to, with this guide it'll hopefully be enough for would-be compiler hackers to find their way around. One thing to keep in mind is Quetzalcoatl is a classic LL(1) compiler using a runtime based on the ideal stack machine. You should be able to find a discussion of this sort of compiler in any elementary compiler text.

I hope you and Queztalcoatl have fun riding together. J

History

Quetzalcoatl's roots go back to 1983, when I wrote the UPL Compiler for the VIC-20 just for fun. In 1983 Development tools were in short supply, so I ended up writing UPL in BASIC. This meant some compromises, such as restricting UPL to byte variables and no advanced features, such as local storage, pointers or structures. Despite this, UPL did run and with inline assembler and the mem[] array you write programs that were almost useful. When the VIC-20 finally lost its crown to the Commodore 64 I put my UPL and my other VIC-20 programs into storage, and chalked up writing UPL as a good educational experience. (You can download the original UPL Compiler by clicking here).

In 1998, having purchased an old second-hand VIC-20 at a garage sale, and having discovered VIC-20 emulators on the Internet, my interest in the VIC-20 was renewed. I wondered, as a dare, if I could could write a 6502 Cross Compiler equivalent to UPL in a night? It ended up taking a few nights, but I soon had something that compiled UPL. Of course UPL was kind of useless. Would it be possible to modify the compiler to read Tiny C programs as well as UPL programs? Indeed it was. In a few nights I had that running too. But the Tiny C/UPL hybird wasn't particularly useful either, so I started adding features like subroutine parameters, arrays and pointers. I'd also spent some time adding optimisation so Quetzalcoatl (as the hybrid was now named) started producing some impressively tight code. Unable to find a decent 6502 assembler for the runtime library, I wrote one of those too and integrated it into the compiler.

I had originally developed Quetzalcoatl to satisfy my own curiosity. At Linus Åkerlund's suggestion I placed its binary up on the web site and have been amazed ever since at the number of people that have downloaded it. (I mean, it's a compiler for a museum-piece CPU! Like, why would anyone care!? J)

In September 1998 I was in the middle of adding proper C data types (pointers, structures and composite types) and improving the optimisation when the real-world beckoned. (Oh to find a Landlord who thinks Retrocompilers are cool! J) I figured I'd return to Quetzalcoatl and finish it when time permitted, but as of time of writing it's been 12 months and I'm still flat out. (Heh. Make that 96 months!)

I had been thinking about open sourcing Quetzalcoatl for some time, but the source lacked all but the most basic comments explaining how everything is strung together. The Netscape experience had taught the world it's not good enough just to slap an open source label on a piece of software, and expect the world to rush in to fix it up for you. It has to be maintainable. Plus I take pride in my code; I don't want a half-assed, second-rate product with my name on it floating around out there, periodically coming back to haunt me as an example of why someone shouldn't hire me. (For my commercial quality work I'm expected to use ISO.)

Rather than let Quetzalcoatl sit around and gather bit-rot, I've been documenting it when I get the chance. As of 15 September 1999 it finally reached the stage where the basic structure is almost understandable. So Quetzalcoatl got a limited public release to people who had contacted me over the last year and expressed either a curiosity or an interest. Quetzalcoatl 2.0a5 was a limited source release to those people. Over the years various people did request it, play around with it and occasionally tweak it. Of these, Harry Dodgson wins the "Quetzalcoatl Prize" for a concerted effort testing, debugging, improving and porting the Big-Q in 2005. This lead to the GPL release in 2006

Over the years, one of the unexpected things was learning the number of Commodore 64 programmers who used Quetzalcoatl, not for the C, but for the Assembler. I had originally planned to use a third-party assembler for the runtime, but what was out there wasn't very good so I ended up writing my own. That turned out to be nice, because Quezalcoatl can compile its own runtime! The interesting thing is your program can sometimes find uses you'd never thought of. If you're lucky, you'll hear about it.

Future

As of October 8, 2006, Quetzalcoatl 2.1.0 BETA is now licensed under the GNU Public License (GPL). I've precious little time on my hands these days, so if the right person comes along I'd be happy to hand it over to someone else with the time and the interest. That may be how community projects evolve; You take something, run hard with it for as long as you can, and exhausted, you hand the batton to someone else. (Or put it on the grass and hope its glint catches someone's eye J).

I think Quetzalcoatl needs to reach a certain level of ANSI C functionality for the C-part of it to be useful. Unless it does, you'll find it kept out of the realms of "real" programs. Given how far it has already come, it'd be a shame not to go than final mile.

But that said, I don't think it needs to be 100% ANSI C compliant. For one, Quetzalcoatl uses an LL parser, where as C is a LR grammar. For two, it isn't really necessary. Very few C programs need 100% compliance, and if you must compile one with bells-and-whistles C there are already perfectly good 6502 C Cross Compilers out there. But as an LL compiler Quetzalcoatl offers some interesting opportunities, because you can really get in there and fine-tune the mechanics. For example, you can add custom optimisers to squeeze every byte and cycle out of the machine. This fits well with Quetzalcoatl's orginal goal; to produce 6502 code for a VIC-20; a machine with a scant 3.5Kb of RAM(!) In such a constrained environment, you have to make every little bit count.

To Do

Goals can be divided into four broad categories:

Arrays/Pointers.
Improvements that bring Quetzalcoatl closer to ANSI C, so that it can compile "real" programs.
Optimisation.
Versatility.

Here are some ideas for Quetzalcoatl development:

Array/Pointer Code Overhaul. Pointers and Indirection was the last code I added to Quetzalcoatl. I integrated this with the array code, but the results were less than successful. The problem was I tried to do it with the existing symbol table. This was fine for Tiny-C, but with this and the addition of structures, it became obvious the symbol table needs a rewrite so it can support complex types like "a pointer to an array of pointers to structures." The existing array/pointer code mostly works, but it isn't sufficiently flexible to be useful (e.g. you can only have one level of indirection) and I have doubts about its robustness. I think the best bet is a rewrite. At a minimum, test and make sure what is there works; (See UPL_COMP.CPP). But better yet, define a more advanced symbol table and replace that existing array/pointer code entirely. Which brings us to...
Proper C types. Quetzalcoatl sorely lacks the full range of proper C types, so it can't handle types such as int **foobar[];. It's necessary to support these so Quetzalcoatl can compile reasonably useful "real" programs. I'd already started many of these, but realised the symbol table needs a major reorganisation so it can represent these. The symbol table in its current form simply can't hold these.
- Pointers. These are partially implemented. You can use 1-level pointers as variables and parameters for call-by-reference. However you can't use them with composite types, such as int **foobar[]; which is where they become particularly useful. One level of direction isn't enough anyway, and they need to work with...
- Structures. Again, partially implemented. You might finish this off, or rewrite it from scratch.
- Maybe Classes. It's not much of a leap from structures to object-oriented classes. Virtual methods would probably be overkill, but basic methods and information hiding would be very useful. This would also let you compile basic C++ programs.
- a switch() statement. This should be easy! Check the if_statement().
- Some Operators. xor is missing. The increment/decrement should be treated as operators (so they can be used in expressions) rather than as statements.
Testing. This source release of Queztalcoatl is a BETA release; It contains bugs. We just don't know where. Unfortunately cognitive dissonance makes developers blind to their own bugs; the "Second Set of Eyes" thing. Quetzalcoatl needs a third-party testing effort, with both test cases (Harry did some of these; check the solaris directory) and "real" programs such as a demo or rewrite or sequel of your favourite 6502 classic. Personally, I'd like to do a new version of Sorcery, but existing 6502 cross-compilers lack "squeeze-every-byte" efficiency, and Quetzalcoatl lacks structures and arrays of structures which are really mandatory for a "real" program.
More Peephole optimisations. The UPL_COPT.CPP file contains a nifty peephole optimiser. This code peeks at code that has just been written, and if it knows a better or faster way of doing it (much the same way a human programmer would recognise such an opportunity), it does a rollback and replaces it. It's very easy to add new optimisations to this submodule.
A Flow Optimiser. After an executable has been linked, it'd be good to have a flow optimiser go through and do additional optimisations.
Tree Optimisations. Quetzalcoatl Rollback optimser can only optimise code at the current end of an object module. A tree-code optimiser would still be able to apply optimisation, even if the optimisable code is not at the current end of the module.
Relocatable Assembler. The Compiler only produces absolute (ie. non-relocatable) modules, and at that can only have one such (assembler) module per program. Quetzalcoatl is very fast, so this isn't necessarily a problem. But it'd be fairly easy to modify the assembler so it can produce relocatable object modules as well.
&&, ||, ! One feature I never liked about C was its el-obscuro operators. Thus Quetzalcoatl uses meaningful keywords like "and", "or", "not", etc. However for closer ANSI compliance it'd be a good idea to modify Quetzalcoatl so it can read el-obscuro as well (since added to ANSI anyway!)
Porting the Runtime Library. UPL/Quetzalcoatl can be easily ported to any 6502 platform. At time of writing, Quetzalcoatl has runtime ports to the VIC-20, C-64, Atari 7800 and Rockwell AIM-65. Despite the popularity of the Apple ][, there is no port to it. I discussed this with others in the 6502 Community, who said "the Apple ][ retro scene is quite different. They play games, but nobody writes anything or hacks." Anyway, adding new targets is quite easy; You probably don't even have to write any new code. Check the memory map, find some free memory locations (preferably zero page). Edit UPLRTIME.ASM. Assign them to the relevant pseudoregisters. Find somewhere to put the datastack. Tell it where to make it's I/O calls: call_print_ch_a call_get_ch_cooked_a call_get_ch_raw_a These respectively print or get the ASCII character in the accumulator. Cooked mode waits for them to press something. Raw mode doesn't wait, returning 0 if nothing was there. Full instuctions are in UPLRTIME.ASM.
Shrinking the Runtime Library. Anything that can be done to make the runtime library code smaller and faster is a good idea. Take a look at uplrtime.asm to see if you can squeeze any extra bytes and cycles out of it.
Granular Runtime Linking. As small as the runtime library already is, there will usually be parts of it included in an executable that are never called. The ability to selectively link only those parts of the runtime library that are needed by a particular executable would be a good idea.
Debugging Quetzalcoatl doesn't support debugging, other than the old "insert a print statement, compile and run it again" kind. Integrated debugging, somehow hooked in with a 6502 emulator, could be a good idea. To do that you'd ask the 6502 emulator developers for some sort of plug-in capability, so you can see where the virtual CPU is and cross-reference that to the EXE->OBJ->source.
That's it!

Rationale

Here are some changes I don't see as necessary, and why.

long. Quetzalcoatl doesn't support longs. You can't totally ignore your target machine if you want to code efficiently, and I'd be hard pressed to think of a 6502 program that *really* needs to count up to 4 billion. Plus the 6502 has a small number of registers. You can hold a 16-bit value in the A and the X registers. If you're working with 32-bit values you're going to have to store at least some its elements in memory, which'll slow things down dramatically, and add bloat out the runtime library. If the runtime was granular, that is less of a problem.
Templates. If you're writing large programs templates are quite useful, but I wonder if your average 6502 programmer would make much use of them? Could you implement them in a way is efficient on a processor with a small memory footprint?
Floating Point. This could go either way. Your average 6502 game wouldn't have much use for floating point. Consider the target: a 1 MHz CPU. While 6502s can and did crunch numbers, it took a big chunk of that computing power. (Even on the much faster early IBM PCs, the venerable Microprose still did all their 3D for F-19 Stealth Fighter using integer math! :-o) On the other hand, embedded processors may find the ability to do floating point math useful. Most 6502 computers with a BASIC ROM (except early Apples) already have some quite decent floating point routines working with various zero page locations. However you'd have to do some fancy juggling with subexpressions, because pushing and popping then off the datastack may prove too slow.

Design

LL vs LR

There are two broad families of compilers; LL and LR. Quetzalcoatl is an LL compiler. These are easy to write and easy to understand. The catch is they can only parse certain types of grammars. (A grammar defines the rules of a language.) C is actually an LR grammar, so I had to bend it somewhat to get Quetzalcoatal, an LL compiler, to be able to be able to parse it. (The official response is that Queztalcoatl isn't C, even if it happens to look a lot like it.)

LR compilers are internally quite complicated. Compiler writers need to use a Compiler Compiler such as YACC or Bison to build them. They do the job beautifully, but internally they're extremely boring. Quetzalcoatl was written for fun, education and as a testbed for optimisation, which is why I chose to write it as an LL compiler. If 100% ANSI compliance was my goal, LR would have won, hands down.

Architecture

Here are the major objects within Queztalcoatl:

Flex. The Lexical Parser. This bites "tokens" from the UPL/Queztalcoatl source file. Here are some example tokens: "(", "20", "+", "and".
upl_Symbols. The Symbol Table. This stores all active variables, constants and subroutines names and their attributes.
upl_Object. This object module holds generated object code/data.
upl_Segments. Each Quetzalcoatl/UPL source file has two segments; code and data. The upl_Segments object contains an object module for each.
upl_Context. This holds all the contextual information for a program being compiled; the symbol table, the object module segments, options, settings and statistics.
upl_Context_state. This remembers the compiler state. These are stored as code is generated. If the compiler later realises a better way it could have generated the code, it "rolls back" to the state, and rewrites the new, improved code.
upl_Compiler. This is where it all happens. Call program() with the Flex object and a upl_Context, and Quetzalcoatl will run away and compile your Quetzalcoatl/UPL source program. This calls other methods to parse different language elements. For example, if_statement() compiles an if-then-else statement, while_statement() a while statement, expr() an expression and so on. An example from while_statement is shown below.
upl_Assembler. This is the 6502 assembler used to compile the runtime library. It is called in the same manner as upl_Compiler.
upl_Link. This is the linker. It loads all the object modules and optionally the runtime libray to produce the final executable.

The upl.h file is a good place to look for more detail on many of these objects.

While Statement Example

The actual upl_Compiler parsing routines are contained in the file UPL_COMP.CPP. Here is a fragment from while_statement() illustrating the conceptual simplicity of an LL parser. For clarity, the address patching code is not shown in this example.


void upl_Compiler::while_statement(Flex& L, upl_Context& C)
{
  // ... patching declarations not shown

  upl_Expr_result       clause;
  upl_Expr		expr;

  L.check("while");
  if (L.matches("("))
    {
    expr(L, C, clause, 0);
    L.check(")");
    L.matches("do");
    }
  else
    {
    expr(L, C, clause, 0);
    L.check("do");
    }

  // ... statement parsing and patching code not shown
}

Also see the runtime library source; UPLRTIME.ASM. It's quite well commented, and has a good explanation of how subroutine calls work.

Optimisation Techniques

We will give an example here of optimising a piece of code.

Quetzalcoatl is a classic stack-based LL(1) compiler. Refer to any elementary compiler text to learn about these. Basically, it computes expressions by pushing intermediate values on a stack, and execution operators (e.g. multiply) by popping the arguments off the stack, doing the operation, then pushishing the result back. Compiler texts often refer to this as an "ideal stack machine"

For example:


weather = (wind + rain)

becomes


push wind
push rain
multiply		// Pops wind and rain off, multiplies, and pushes single result back
pop weather

But that's for an "ideal stack machine." We have to generate that in 6502 code, which is, shall we say, not quite so elegant? :-) While the 6502 does have a processor stack, it's very small and it already contains subroutine calls and is used by the operating system. One little slip up there, and we'll crash the machine. So the UPLRTIME.ASM defines a separate 256 byte area. We call this the datastack. We define our own routines; RUNTIME_PUSH_B to push a byte and RUNTIME_POP_B to pop it. With both of these we use the 6502 Accumulator to transfer our data. The accumulator is pushed with RUNTIME_PUSH_B, and popped with RUNTIME_POP_B.


LDA [wind]		// push wind
jsr RUNTIME_PUSH_B
LDA [rain]		// push rain
jsr RUNTIME_PUSH_B
jsr RUNTIME_MUL_BB	// multiply
jsr RUNTIME_POP_B
STA [weather]		// pop weather

The routine is equivalent to the ideal stack machine multiply operator. It does't (appear) to use the accumulator at all; it only changes what is on the datastack.

Now, actually this isn't quite right. If you multiply an 8 bit number by an 8 bit number, the result can be upto 16 bits long. This is called a "word" in 6502 parlance. Unfortunately the RUNTIME_MUL_BB routine only multiplies bytes, so we must use RUNTIME_MUL_WW instead (WW means multiply a word by a word). To do this we have to cast (convert) the byte elements on the stack into words. (The example here is straight from the .LST file you can get with the -a option. If you do this, use -f to force recomplilation since it won't generate the .LST file if the module has already been compiled.)


lda [wind]
jsr push_b	// This is RUNTIME_PUSH_B, as it appears in the LST style
lda [rain]
jsr push_b
jsr cast_b2w	// Convert the single byte on the top of the stack into a two byte word.
jsr pop_w_c	// We need to get the one underneath; 
		// temporarily store the word in the C pseudoregister (See UPLRTIME.ASM).
jsr cast_b2w	// Convert the single byte underneath into a word
jsr push_w_c	// Now push the word we'd stored in C back on the stack.
jsr umul_ww	// Unsigned multiply the two words
jsr pop_w	// Pop the result into the A and X registers.
sta [weather]	// A goes in the LSB (low-byte) of weather
stx [weather+1]	// X goes in the MSB (high-byte) of weather


How did we know the result of RUNTIME_MUL_BB was a word rather than a byte?
How do we know there are two bytes pushed on the stack, and not one?
Inside UPL_COMP.CPP there is a routine called op_result().
It takes two argument types (in this case, byte and byte) and an operator 
(in this case multiply), works out what they should be.
The normalise_binary() and normalise_unary()
routines call op_result() to find out what the result should be,
then cast/convert their arguments respectively.
(C-folks call conversion between types "casting").

That looks long for what it does... How could we optimise this.
One way is to convert the multiply to inline code,
but the 6502 lacks a multiply opcode so that wouldn't be very efficient.
For adds and subtracts though, that can work well.
See Peephole Optimsation below.

One thing that does look wasteful with that code are all those casting calls.
How could we speed this up, or given our small memory footprint, make it more compact?
Often you'll have to trade off speed and memory given the limited resources of the 6502.
In this case I'd suggest putting that casting code in the UPL Runtime,
and saving 9 bytes for every byte by byte multiply in our program; not bad!!!
We could add a routine to the Runtime called say cast_bb2ww:

                // PROCEDURE cast_bb2ww.
                //
                // Pop two bytes, cast them into words, then push them back.
                //
cast_bb2ww:
		jsr cast_b2w	// Convert the single byte on the top of the stack into a two byte word.
		jsr pop_w_c	// We need to get the one underneath; 
				// temporarily store the word in 
		jsr cast_b2w	// Convert the single byte underneath into a word
		jmp push_w_c	// Now push the word we'd stored in C back on the stack.
				// A one-way jump, so we don't need to rts.

We could then generate the above code:

lda [wind]
jsr push_b	// This is RUNTIME_PUSH_B, as it appears in the LST style
lda [rain]
jsr push_b
jsr cast_bb2ww	// Cast two bytes on stack into two word. #OUR NEW ROUTINE IS HERE#
jsr umul_ww	// Unsigned multiply the two words
jsr pop_w	// Pop the result into the A and X registers.
sta [weather]	// A goes in the LSB (low-byte) of weather
stx [weather+1]	// X goes in the MSB (high-byte) of weather

The file UPL_RUNT.H contains the runtime routine names.
We need to add our new symbol so we can call it:

#define	RUNTIME_PEEK_W_W        74

#define	RUNTIME_POKE_W_W	75



// Pop two bytes, cast them into words, then push them back.
// [bj 9oct2006]
//
#define	RUNTIME_CAST_BB2WW	76

We also need to add it to UPL_RUNT.CPP.
This must matche the name of the label in UPLRTIME.ASM:



char const *runtime[] =
{
...,

"cast_bb2ww"
}





Remember to comment what you add, both here and in the MODIFICATIONS section.

Now, here is the existing code in UPL_COMP.CPP normalise_binary():

  // Find out what the result of this operation will be,
  // and what the  and  value types are.
  // If these are different from the existing  and ,
  // then we must cast/convert them.
  //
  result = op_result(Op, Underneath, Top, New_underneath, New_top, Unsigned);


  // Convert the top operand; we can do this as an unary (one argument) cast.
  //
  if (Top != New_top)
    {
    normalise_unary(L, C, Top, New_top);

    // Normalise_unary() returns true if it generates code for the conversion,
    // or false if it generates no code (because none is necessary).
    // In either case, we don't neeed to check this. All that is imporrant 
    // is that we have the desired binary value  on the top of the stack.
    }


  // The underneath operand is harder, because we already have the top argument in our way.
  // Solution is to generate code that pops the top register, stores it in 
  // (a pseudoregister defined in the runtime library), call normalise_unary()
  // to convert the underneath argument which is temporarily on the top,
  // then push the value in  back on top of the stack.
  //
  if (Underneath != New_underneath)
    {
    C.mark(state);  // Remember in case we want to rollback.

    // Write code to pop the top value into the <C> pseudoregister.
    //
    C.code.out(ASM_JSR);
    C.code.out_word_patch(
      value_bytes(New_top) == 1 ? RUNTIME_POP_B_C : RUNTIME_POP_W_C);

    // Now the <Underneath> value is temporarily on the top of the stack!
    // Call normalise_unary() to convert it.
    // It will return true if it generated code to convert it.
    // If will return false if the conversion turned out to be unnecessary.
    //
    if (normalise_unary(L, C, Underneath, New_underneath))
      {
      // Code was generated to do the conversion.
      // Now all we need do is push <C> back on the stack.
      //
      C.code.out(ASM_JSR);
      C.code.out_word_patch(
	value_bytes(New_top) == 1 ? RUNTIME_PUSH_B_C : RUNTIME_PUSH_W_C);
      }
    else
      //
      // Normalise unary did not generate any code.
      // This happens if the conversion is unnecessary.
      // For example, converting a ushort to a short.
      // If it didn't generate any code, then we can skip this 
      // part of the conversion.
      //
      C.rollback(state);
    }


First note the Rollback Optimisation.
At the start of the if (Underneath != New_underneath) statement block,
we take a watermark with C.mark(state);.
If we later decide the code we have generated after that is unnecessary, we can rollback.
We do this by calling C.rollback(state);.
Read the above example to see how we do this.

Returning to our changes, we want to replace that entire block of code with 
a single call to our new RUNTIME_CAST_BB2WW routine.
There are quite a few ways you could do this.
We'll arbitrarily choose this one:

  // If parameters are both bytes, and we need to convert them both to words,
  // then call our RUNTIME_CAST_BB2WW routine. [bj 09oct2006]
  //
  if (value_bytes(Top)		  == 1
  &&  value_bytes(Underneath)	  == 1
  &&  value_bytes(New_underneath) == 2
  &&  value_bytes(New_top)	  == 2)
    {
    C.code.out(ASM_JSR);
    C.code.out_word_patch(RUNTIME_CAST_BB2WW);
    }
  else
    {
    ...

This would work, but it's not future-safe.
What if we add new types in the future that can't be cast in this way?
It's safer to say explicitly what types we want to convert.
This is better. It's a bit tedious type spell out everything,
but this way you can be sure you don't accidentally "optimise"
the wrong thing in the future:

  if ((Top == upl_byte		  || Top == upl_char		|| Top == upl_boolean)
  &&  (Underneath == upl_byte	  || Underneath == upl_char	|| Underneath == upl_boolean)
  &&  (New_top == upl_ushort	  || New_top == upl_short)
  &&  (New_underneath == upl_ushort || New_underneath == upl_short))
    {
    C.code.out(ASM_JSR);
    C.code.out_word_patch(RUNTIME_CAST_BB2WW);
    }
  else
    {
    ...

If our specific optimisation case doesn't work, we run the existing code.

In this example, you changed the runtime library.
The code you generate will only work with your modified runtime library.
When you make changes to the runtime library, you need to consider how useful
the changes will be. If your changes won't be used by the majority of programs,
it's adding bloat to the runtime library. A proposed extension to Quetzalcoatl
would be to make runtime library linking granular, so that parts of the runtime
which aren't used can be skipped at link time.

The code you've seen here hasn't at time of writing been added to UPL_COMP.CPP.
Rather, this is an example of one situation you might optimise.


Linking

Linking isn't opimisation, but this is a timely juncture to discuss it.

Quetzalcoatl has a full-blown linker.
A linker takes each object module (compiled separately),
and combines them into a single executable.
When each object module is generated, the addresses are all relative.
This is because we don't know what the final executable will look like.
The linker takes each object module, decides where to place it in memory,
reallocates all its relative addresses and patches its calls to the runtime library.

Note how we said C.code.out_word_patch(RUNTIME_CAST_BB2WW);.
The patch says that the address of the RUNTIME_CAST_BB2WW routine
is not yet known, and will need to be patched at linktime (when we finally tie the 
runtime library together with the code).

In other places where you generate code you will see calls like this:

  C.code.out_word(test_label, upl_code_word);

The upl_code_word enumeration says that what we are outputting 
here is a word (2 bytes) of code.
The linker will then know to reallocate it to the code segment at link time;
The code segment being where the code lives.
The data segment is where the data lives.

For example, we may assign the variable weather the relative address
of 0x0003 in our module's data segment. Later, the linker may decide to relocate
this modules data segment to the absolute address 0x2410.
Thus weather's address needs to be corrected to 0x2413.
For example this code:

jsr pop_w
sta [0003]      ; Relocate data word
stx [0004]      ; Relocate data word 

would be relocated thus:

jsr pop_w
sta [2413]
stx [2414]


Instead of writing words, you may write a high-byte or low-byte.
For example, upl_data_byte_hi says that the byte we are outputing
should be relocated to the data segment, adjusted with the high byte.
This is actually much easier on modern CPUs,
but on the 6502 we're forced to adjust the LSB and MSB individually.
To print a string, we need to load the address into the A and X registers.
Since the address is split, we can't do a word relocation.
We must do a byte relocation instead.
This example prints the string begining at byte 76 (hex) in the data segment.

lda #76       ; Relocate data low byte
ldx #00       ; Relocate data high byte
jsr print_string_ax

Relocated to 0x2410 this would become, that string would begin at 0x2486. Thus:

lda #86
ldx #24
jsr print_string_ax



Transient (Register) Optimisation

Here is a piece of code that loads a character into the variable pressed:

pressed = getch();

This compiles to:

jsr get_ch_a
jsr push_b
jsr pop_b
sta [pressed]

Note we push the value and immediately pop it.
Those two steps are redundant.
It is better if we generate this:

jsr get_ch_a
sta [pressed]

The trouble is, at the time we generate getch()
we don't know if they'll be manipulating that value before they store it.
Quetzalcoatl optimises this sort of code by combining rollbacks we saw easier
with what we call transients.
A transient declares something like
"at the moment, the value you want is a character stored in the register A".
This code from UPL_COMP.CPP does precisely that:

      C.code.out(ASM_JSR);
      C.code.out_word_patch(RUNTIME_GET_CH_A);

      // Remember the state we are in now, where the character is in <A>.
      //
      upl_Context_state	key_state;
      C.mark(key_state);

      // Remember that we have stored the character in the accumulator 'a'.
      // We call this <*transient_a>. (Later we may decide to rollback
      // to this point...)
      //
      Result.set_value_type(upl_char, 0, upl_transient_a, &key_state);

      // For now though, we push it on the datastack.
      //
      C.code.out(ASM_JSR);
      C.code.out_word_patch(RUNTIME_PUSH_B);

After we've parsed the getch() expression,
the assign_statement() parser indirectly calls 
load_reg() which makes sure the expression parsed
is loaded into registers.
The first thing does it is look to see if a transient is available.
If it is, it rolls back to the code (in this case to C.mark(key_state)).
By doing this, it knows the expression result (the character returned by getch()),
is sitting in the A register. 

void upl_Compiler::load_reg(
		upl_Context&	  C,
	const 	upl_Expr_result&  Clause)
{
  Select(Clause.is_transient() and Clause.transient != upl_transient_znc)
      C.rollback(Clause.transient_state);

      switch (Clause.transient)
	{
	case upl_transient_a:
	case upl_transient_ax:
	  break;

	case upl_transient_x:
	  C.code.out(ASM_TXA);
	  break;

	case upl_transient_y:
	  C.code.out(ASM_TYA);
	  break;

	case upl_transient_none:
	default:
	  abend(WHERE0, "Bad case");
	}

	...

assign_statement() then calls reg_var(),
which takes the register values and stores them in the variable.
The result is the optimised code we wanted:

jsr get_ch_a
sta [pressed]


Transients can include register values
A, A/X, X, Y
and also the condition flags
ZNC.
This lets us test branches by similarly avoiding a redundant push and pop.


Constant Optimisation

When expr() parses an expression, it not only generates the code
but it also tries to evaluate the expression at compile time.
If the expression includes function calls or looks up variable values,
then obviously it can't do this.
But if the expression is made of numbers and constants,
then it can.
If it finds the expression has a constant value,
it rollsback the expression code and uses the final value instead.
So this code:

rain = 12 + 6 * 2

compiles to this:

lda #24
sta [rain]



Peephole Optimisation

UPL_COPT.CPP implements a peephole optimiser.
It looks at code as it is generated, and optimises it on the fly.
For example, if we spot them multiplying or dividing by a power of 2 (2, 4, 8, etc.)
then we can use a much faster bitshift:

  // If  is a non-zero constant with a single bit (e.g. 2, 4, 8, 16, etc.),
  // then we can perform fast multiplication or division using bit shifting.
  // This is much faster than calling the runtime libraries routines.
  //
  if (Op == upl_mul || Op == upl_div)
    if (c->value > 0)
      if (c->is_constant() && upl_Utility::count_bits(c->value) == 1)
	fast_mul_div = true;

and then below:

	case upl_mul:
	case upl_div:
	  {
	  // Assertion: expression  now sits in the accumulator.
	  //

	  // How many times we have to shift the accumulator depends
	  // on the value held in the constant <c>.
	  //
	  int shift_bits = upl_Utility::log2(c->value);


	  // Generate a shift instruction for each bit we have to shift the accumulator.
	  // If we're multiplying, shift left.  If we're dividing, shift right.
	  //
	  for (int i=0; i<shift_bits; i++)
	    C.code.out(Op == upl_mul ? ASM_ASL_A : ASM_LSR_A);

	  // The result of the addition is held in the accumulator.
	  //
	  push_byte_a = true;
	  }
	  break;

See UPL_COPT.CPP for details.
This source file is well commented,
and this type of optimisation is the easiest/most modular.
It's also the module that most benefits from human eyes;
UPL_COPT.CPP optimises the same way that people dom 
effectively looking at the last few lines of code and saying "Hey! I know a better way!"



Enabling Peephole Optimisation

By default, peephole optimisation is disabled.
Turn it on using the -Op option.
For example: quetzal -Op test.c

Once sufficiently tested we should enable peephole optimisation by default.


Flow Optimisation

A Flow Optimiser would take the entire linked executable,
and optimise it in its entirity.
This is sometimes called "Whole-of-program Optimisation."
Currently Flow Optimisation is not supported.
It is a suggested future extension.



Development Guidelines

Until a suitable volunteer appears, I'm happy to act as a central
point-of-contact for people wanting to modify or otherwise extend
the code.  I'll keep a list of active developers and what they
are doing on the
Queztalcoatl web site.

Source Guidelines

If you change the source code, and want to integrate those
changes back into the Queztalcoatl primary distribution,
please:

Comment your code properly.
Keep to the naming, commenting and indentation conventions
used in the source.
Let others know what you are doing.  This'll save two or more people
all doing the same thing at the same time.
EVERY SOURCE FILE HAS A MODIFICATION SECTION.
PLEASE KEEP THIS UP TO DATE!
This is located at the start of each file.
When you make a change, make an entry in this section.
Assign the modification a sequence number (the previous modification number
plus 1, or 1 if there are no prior changes).  Enter this number, your
initials, the date and a brief description of the change.  At the point where
you make the change enter a comment describing it, along with the following
tag: [initials date MODn].
eg. [bj 15sep1999 MOD6].
If your changes are long include
comments denoting where they start and finish.
Sticking with this format will make it easier for multiple people
to make changes to a module without tripping over each other.

When appropriate, prefix a comment with the keyword:
CAUTION, NOTE, TODO or when you're unsure of something,
UNSURE.  This'll help other programmers quick identify
the relevance of your comment to their current situation.


Contact

If you're thinking of changing the code and want to check in,
or you're trying to understand what a particular piece of code
does or why it was implemented that way, then please
contact me (See Legals above for contact details).

If you're using Quetzalcoatl for a 6502 programming project,
I'm also interested to hear from you.  To date there have been
over (lost count) downloads of Quetzalcoatl and I'd love to know what
on Earth it is that people are doing with it! J



Links

The Queztalcoatl web site.

Contains the Quetzalcoatl binaries, news and links to the development pages.

The Geek VIC-20 web site.

My collection of software, links and information on the VIC-20.


The Kdef.com web site.

Where I Work.





© Brendan Jones, 1999-2006.  All Rights Reserved.
Quetzalcoatl and sachiel.com are trademarks of Brendan Jones.
Kdef.com is a trademark of Kestrel Defence.
Pussiesgalore.com.au is a trademark of Katina Balson.
VIC-20 and C-64 are probably registered trademarks of someone.

This file is part of The Quetzalcoatl Compiler.

The Quetzalcoatl Compiler is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.


The Quetzalcoatl Compiler is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with The Quetzalcoatl Compiler; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA