Chapter 1: Project Architecture and Workflow

Creating a programming language involves several key stages:

  1. High-level code authoring

  2. Compilation to an intermediate representation (IR)

  3. Assembly into machine code

  4. Object file generation

  5. Linking of external dependencies

  6. Loading of the final executable

Our current focus is the compiler, often considered the most complex part of implementing a language. We will tackle the other parts later in the series.

Compiler Pipeline

The compiler consists of two main components: the frontend and the backend.

Frontend:

  1. Lexical Analysis: Tokenization of the source code

  2. Parsing: Abstract Syntax Tree (AST) generation from tokens

Backend:

  1. Intermediate Representation (IR) Generation: Creating a machine-independent code representation

  2. Code Generation: Producing target-specific assembly from the IR
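
To make the four stages concrete, here is a minimal sketch of how they might chain together. It is not the implementation this series will build: the toy language (whitespace-separated integers joined by +), the function names lex, parse, generate_ir, generate_code and compile_source, and the pseudo-assembly output format are all assumptions made purely for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)


def lex(source: str) -> List[Token]:
    """Frontend, step 1: turn source text into tokens.

    Assumes tokens are separated by whitespace (a toy simplification).
    """
    tokens = []
    for text in source.split():
        if text.isdigit():
            tokens.append(("NUM", text))
        elif text == "+":
            tokens.append(("PLUS", text))
        else:
            raise SyntaxError(f"unexpected token {text!r}")
    return tokens


def parse(tokens: List[Token]):
    """Frontend, step 2: build an AST (nested tuples) from tokens."""

    def number(pos: int):
        if pos >= len(tokens) or tokens[pos][0] != "NUM":
            raise SyntaxError("expected a number")
        return ("num", int(tokens[pos][1])), pos + 1

    node, pos = number(0)
    while pos < len(tokens):
        if tokens[pos][0] != "PLUS":
            raise SyntaxError("expected '+'")
        right, pos = number(pos + 1)
        node = ("add", node, right)  # left-associative addition
    return node


def generate_ir(ast):
    """Backend, step 1: flatten the tree into machine-independent instructions."""
    instructions = []

    def lower(node) -> str:
        if node[0] == "num":
            return str(node[1])
        left = lower(node[1])
        right = lower(node[2])
        temp = f"t{len(instructions)}"  # fresh temporary name
        instructions.append((temp, "add", left, right))
        return temp

    result = lower(ast)
    return instructions, result


def generate_code(ir) -> str:
    """Backend, step 2: emit text for a hypothetical target (not a real ISA)."""
    instructions, result = ir
    lines = [f"  {op} {dest}, {a}, {b}" for dest, op, a, b in instructions]
    lines.append(f"  ret {result}")
    return "\n".join(lines)


def compile_source(source: str) -> str:
    """Run the whole pipeline: frontend first, then backend."""
    return generate_code(generate_ir(parse(lex(source))))


if __name__ == "__main__":
    print(compile_source("1 + 2 + 3"))
```

Each stage consumes only the output of the stage before it, which is what lets the frontend and backend be developed and replaced independently.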

In subsequent chapters, we'll dive deeper into each of these stages, exploring their roles in the language development process and examining specific implementation details.

Key Concepts and Technologies

Throughout this series, we'll encounter several important concepts and tools:

  • Lexical Analysis: The process of converting a sequence of characters into a sequence of tokens.

  • Abstract Syntax Tree (AST): A tree representation of the abstract syntactic structure of source code.

  • Intermediate Representation (IR): A data structure or code used internally by a compiler to represent source code.

  • LLVM: A collection of modular and reusable compiler and toolchain technologies.
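
As a quick illustration of the first two concepts, the sketch below tokenizes the expression 1 + 2 * 3 and shows one possible AST for it. The token kinds (NUM, OP), the Num and BinOp node classes, and the evaluate helper are hypothetical names chosen for this example, and the tree is built by hand rather than by a parser.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# One named group per token kind; BAD catches anything unrecognized.
TOKEN_PATTERN = re.compile(r"(?P<NUM>\d+)|(?P<OP>[+*])|(?P<SKIP>\s+)|(?P<BAD>.)")


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Lexical analysis: characters in, (kind, text) tokens out."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace carries no meaning in this toy language
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]

# Hand-built AST for "1 + 2 * 3": the multiplication sits deeper in the
# tree, so precedence is encoded by structure rather than by parentheses.
tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))


def evaluate(node: Node) -> int:
    """Walk the tree bottom-up to show that its shape fixes evaluation order."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return left + right if node.op == "+" else left * right


if __name__ == "__main__":
    print(tokenize("1 + 2 * 3"))
    print(evaluate(tree))  # 7
```

The token list is flat, while the tree records which operands belong to which operator. Recovering that structure from the flat list is the parser's job.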

While we'll explain each of these concepts in detail as we implement it, familiarity with basic programming concepts and language theory will be beneficial.

Stay tuned for the next chapter, where we'll start on the development phase.