[RFC] JSIR: A High-Level IR for JavaScript

phisiart · April 6, 2026, 7:40am

This RFC introduces JSIR, a high-level IR for JavaScript:

JSIR preserves all information from the AST and supports high-fidelity round-trip between source ↔ AST ↔ JSIR;
JSIR uses MLIR regions to represent control flow structures;
JSIR supports dataflow analysis.

JSIR is developed and deployed in production at Google for code analysis and transform use cases.

JSIR is open source here: GitHub - google/jsir: Next-generation JavaScript analysis tooling · GitHub.

Motivation

Industry trend of building high-level language-specific IRs

The compiler industry is moving towards building high-level language-specific IRs. For example, the Rust and Swift compilers perform certain analyses on their high-level IRs before lowering down to LLVM. There are also a number of ongoing projects in this direction, such as Clang IR, Mojo, and Carbon.

The need for a high-level JavaScript IR

Why do we need a high-level IR for JavaScript specifically? While much of JavaScript tooling relies on ASTs (like ESTree), complex analyses require a control flow graph (CFG) and dataflow analysis capabilities, which JSIR provides by using the MLIR framework.

Source-to-source transformations

Many JavaScript tooling use cases require emitting JavaScript code as output.

For example:

Transpilation: Babel converts newer versions of JavaScript to older versions of JavaScript, to maximize browser compatibility.
Optimization:
Closure compiler optimizes JavaScript into shorter and faster JavaScript, to minimize download time and maximize performance.
Bundling: Webpack bundles multiple JavaScript files into a single JavaScript file.

These tools all need to operate on a representation that allows code generation back to JavaScript. As a result, they all operate on an AST.

There are tons of AST-based open-source tools, but no IR-based ones

The JavaScript community has a lot of AST-based open-source tools. For example, Babel provides, as its public APIs, an AST with traversal and scope utils; ESLint depends on espree which also provides an AST and relevant utils.

To get a sense of how many JavaScript ASTs there are, here is a list on AST Explorer:

There is even a standard for JavaScript ASTs - ESTree.

However, there is no tool that exposes an IR instead of an AST. JSIR seeks to fill this gap.

Use cases at Google

JSIR is used at Google for code analysis and transform use cases. For example:

Decompilation

JSIR is used for decompiling the Hermes bytecode all the way to JavaScript code, by utilizing its ability to be fully lifted back to source code.
Deobfuscation:

JSIR is used for deobfuscating JavaScript by utilizing its source-to-source transformation capability.

See our latest paper on how we combine the Gemini LLM and JSIR for deobfuscation. This paper has been accepted to ICSE 2026 SEIP track and will be presented on April 15, 2026.

JSIR design goals

A public and stable IR definition

JSIR seeks to fill a gap in the JavaScript community for a public, open-source IR-based tool. To achieve this goal, the definition of JSIR is public, stable, and comprehensive. In particular, it closely follows ESTree, to the extent that most, if not all, JSIR operations have 1-1 mappings from ESTree nodes.

Captures all source-level information

JSIR is not intended to be used for low-level optimization (for example, a JIT is expected to define lower-level IRs, even though they might be lowered from JSIR). Instead, JSIR is a high-level IR that represents all source-level information, in order to support use cases like source-to-source transformation and decompilation.

One design goal for JSIR is that we can convert JSIR back to the JavaScript AST perfectly. In other words, the following round-trip should be lossless:

Source ↔ AST ↔ JSIR

Easy to use

The key benefit of an IR over an AST is that we can perform dataflow analysis. Therefore, we need to expose a dataflow analysis framework (built on top of the MLIR dataflow analysis framework). Such framework must provide an easy way of defining lattices and transfer functions.

Other considerations:

Can we provide an easy IR traversal util like @babel/traverse for AST?
Can we manipulate the IR in JavaScript / TypeScript?
Can we integrate JSIR into godbolt.org?

Why JSIR is interesting to the MLIR community

Battle-test MLIR functionalities

JSIR’s success would provide solid proof that MLIR is capable of defining IRs for general purpose languages. Currently, the core “IR definition” part has been proven to be effective, as demonstrated by JSIR and other projects like ClangIR and Mojo.

Now, we seek to battle-test more “advanced” MLIR functionalities. For example, we have made a wrapper dataflow analysis API on top of MLIR to provide ease-of-use improvements, and hope to contribute our learnings by upstreaming some of these improvements. We also seek to use and potentially improve symbol table, memory effects, etc.. All of these will make MLIR truly the go-to option for building any compiler in the future.

Use MLIR to “represent AST”

There have been discussions on whether MLIR can be used to represent the AST
(reference), and Mojo is pioneering the idea of parsing directly to MLIR.

JSIR is aiming at something even more extreme - an IR that can lift back to source. If JSIR is successful, then it really proves that MLIR can represent ASTs, and that the boundary between AST and IR is perhaps very blurry.

Eliminate the need for ASTs

If a high-level IR can preserve all information from an AST, then we can start to question whether we need ASTs at all.

The fact that Mojo and Carbon perform all analyses on IRs suggest that IRs already have all the analysis capabilities to replace ASTs. However, our experience has shown that developers (especially those unfamiliar with compilers) find ASTs much easier to understand and work with compared to IRs.

JSIR design highlights

NOTE: This section is taken from intermediate_representation_design.md in the repo.

A critical goal of JSIR is to ensure an accurate conversion of the IR back to the AST. Paired with Babel’s AST → source printer, this means we can lift the IR back to source. This “reversible” IR design enables source-to-source transformations - we perform IR transformations then lift the transformed IR to source.

Internal evaluations on billions of JavaScript samples showed that AST - IR round-trips achieved 99.9%+ success resulting in the same source.

In the following sections, we will describe important design decisions that achieve this high-fidelity round-trip.

Post-order traversal of AST

Let’s start from the simplest case - straight-line code, i.e. a list of statements with no control flow structures like if-statements.

Each of these simple expression / statement AST nodes is mapped to a corresponding JSIR operation. Therefore, JSIR for straight-line code is equivalent to a post-order traversal dump of the AST.

For example, for the following JavaScript statements:

1 + 2 + 3;
4 * 5;

The corresponding AST is as follows (see astexplorer for the full AST):

[
  ExpressionStatement {
    expression: BinaryExpression {
      op: '+',
      left: BinaryExpression {
        op: '+',
        left: NumericLiteral { value: 1 },
        right: NumericLiteral { value: 2 }
      },
      right: NumericLiteral { value: 3 }
    }
  },
  ExpressionStatement {
    expression: BinaryExpression {
      op: '*',
      left: NumericLiteral { value: 4 },
      right: NumericLiteral { value: 5 }
    }
  },
]

The corresponding JSIR is as follows:

%1 = jsir.numeric_literal {1}
%2 = jsir.numeric_literal {2}
%1_plus_2 = jsir.binary_expression {'+'} (%1, %2)
%3 = jsir.numeric_literal {3}
%1_plus_2_plus_3 = jsir.binary_expression {'+'} (%1_plus_2, %3)
jsir.expression_statement (%1_plus_2_plus_3)
%4 = jsir.numeric_literal {4}
%5 = jsir.numeric_literal {5}
%4_mult_5 = jsir.binary_expression {'*'} (%4, %5)
jsir.expression_statement (%4_mult_5)

Perhaps the one-to-one mapping from AST nodes to JSIR operations is more obvious if we add some indentations:

      %1 = jsir.numeric_literal {1}
      %2 = jsir.numeric_literal {2}
    %1_plus_2 = jsir.binary_expression {'+'} (%1, %2)
    %3 = jsir.numeric_literal {3}
  %1_plus_2_plus_3 = jsir.binary_expression {'+'} (%1_plus_2, %3)
jsir.expression_statement (%1_plus_2_plus_3)
    %4 = jsir.numeric_literal {4}
    %5 = jsir.numeric_literal {5}
  %4_mult_5 = jsir.binary_expression {'*'} (%4, %5)
jsir.expression_statement (%4_mult_5)

To convert this IR back to the AST, we cannot treat each op as a separate statement, because that would cause every SSA value (e.g. %1) to become a local variable:

// Too many local variables!
var $1 = 1;
var $2 = 2;
var $1_plus_2 = $1 + $2;
var $3 = 3;
var $1_plus_2_plus_3 = $1_plus_2 + $3;
$1_plus_2_plus_3;  // jsir.expression_statement
var $4 = 4;
var $5 = 5;
var $4_mult_5 = $4 * $5;
$4_mult_5;  // jsir.expression_statement

However, we can detect the two statement-level ops (i.e. the two jsir.expression_statement ops) and recursively traverse their use-def chains:

   1 + 2 + 3 ;
// ~                 %1 = jsir.numeric_literal {1}
//     ~             %2 = jsir.numeric_literal {2}
// ~~~~~           %1_plus_2 = jsir.binary_expression {'+'} (%1, %2)
//         ~       %3 = jsir.numeric_literal {3}
// ~~~~~~~~~     %1_plus_2_plus_3 = jsir.binary_expression {'+'} (%1_plus_2, %3)
// ~~~~~~~~~~~ jsir.expression_statement (%1_plus_2_plus_3)

   4 * 5 ;
// ~               %4 = jsir.numeric_literal {4}
//     ~           %5 = jsir.numeric_literal {5}
// ~~~~~         %4_mult_5 = jsir.binary_expression {'*'} (%4, %5)
// ~~~~~~~     jsir.expression_statement (%4_mult_5)

When we try to convert a basic block (mlir::Block) of JSIR ops we always know ahead of time what “kind” of content it holds:

If the block holds a statement, then we find the single statement-level op and traverse its use-def chain to generate a JsStatement AST node.
If the block holds a list of statements, then we find all the statement-level ops and traverse their use-def chains to generate a list of JsStatement AST nodes.
If the block holds an expression, then it always ends with a jsir.expr_region_end (%expr) op. We traverse the use-def chain of %expr to generate a JsExpression AST node.
If the block holds a list of expressions, then it always ends with a jsir.exprs_region_end (%e1, %e2, ...) op. We traverse the use-def chains of %e1, %e2, ... to generate a list of JsExpression AST nodes.

Symbols, l-values and r-values

We distinguish between l-values and r-values in JSIR. For example, consider the following assignment:

a = b;

a is an l-value, and b is an r-value.

L-values and r-values are represented in the same way in the AST:

ExpressionStatement {
  expression: AssignmentExpression {
    left: Identifier {"a"},
    right: Identifier {"b"}
  }
}

However, they are represented differently in the IR:

%a_ref = jsir.identifier_ref {"a"}  // l-value
%b = jsir.identifier {"b"}          // r-value
%assign = jsir.assignment_expression (%a_ref, %b)
jsir.expression_statement (%assign)

The reason for this distinction is to explicitly represent the different semantic meanings:

An l-value is a reference to some object / some memory location;
An r-value is some value.

NOTE: We will likely revisit how we represent symbols.

Representing control flows

As mentioned above, JSIR seeks to have a nearly one-to-one mapping from the AST. Therefore, to preserve all information about the original control flow structures, we define a separate op for each control flow structure (e.g. jshir.if_statement, jshir.while_statement, etc.). The nested code blocks are represented as MLIR regions.

Example: `if`-statement

Consider the following if-statement:

if (cond)
  a;
else
  b;

Its corresponding AST is as follows
(astexplorer):

IfStatement {
  test: Identifier { name: "cond" },
  consequent: ExpressionStatement {
    expression: Identifier { name: "a" }
  },
  alternate: ExpressionStatement {
    expression: Identifier { name: "b" }
  }
}

And, its corresponding JSIR is as follows:

%cond = jsir.identifier {"cond"}
jshir.if_statement (%cond) ({
  %a = jsir.identifier {"a"}
  jsir.expression_statement (%a)
}, {
  %b = jsir.identifier {"b"}
  jsir.expression_statement (%b)
})

Since nested structure is fully preserved, converting JSIR back to the AST is achieved by a standard recursive traversal.

Example: `while`-statement

Consider the following while-statement:

while (cond())
  x++;

Its corresponding AST is as follows
(astexplorer):

WhileStatement {
  test: CallExpression {
    callee: Identifier { name: "cond" },
    arguments: []
  },
  body: ExpressionStatement {
    expression: UpdateExpression {
      operator: "++",
      prefix: false,
      argument: Identifier { name: "x" }
    }
  }
}

Its corresponding JSIR is as follows:

jshir.while_statement ({
  %cond_id = jsir.identifier {"cond"}
  %cond_call = jsir.call_expression (%cond_id)
  jsir.expr_region_end (%cond_call)
}, {
  %x_ref = jsir.identifier_ref {"x"}
  %update = jsir.update_expression {"++"} (%x_ref)
  jsir.expression_statement (%update)
})

Note that unlike jshir.if_statement, the condition in a jshir.while_statement is represented as a region rather than a normal SSA value (%cond). This is because the condition is evaluated in each iteration within the while-statement, whereas the condition is evaluated only once before the if-statement.

Example: logical expression

Consider the following statement with a logical expression:

x = a && b;

Its corresponding AST is as follows (astexplorer):

ExpressionStatement {
  expression: AssignmentExpression {
    left: Identifier { name: "x" },
    right: LogicalExpression {
      left: Identifier { name: "a" },
      right: Identifier { name: "b" }
    }
  }
}

Its corresponding JSIR is as follows:

%x_ref = jsir.identifier_ref {"x"}
%a = jsir.identifier {"a"}
%and = jshir.logical_expression (%a) ({
  %b = jsir.identifier {"b"}
  jsir.expr_region_end (%b)
})
%assign = jsir.assignment_expression (%x_ref, %and)
jsir.expression_statement (%assign)

Note that in jshir.logical_expression, left is an SSA value, and right is a region. This is because left is always evaluated first, whereas right is only evaluated if the result of left is truthy, and omitted if left is falsy due to the short-circuit behavior.

Dataflow analysis in JSIR

JSIR provides a dataflow analysis API, built on top of the upstream MLIR dataflow analysis API, with usability improvements:

We define a class JsirStateRef that encapsulates all writes to AnalysisStates, so that dependent WorkItems are automatically pushed to the worklist.

Benefit: Unlike the upstream MLIR API, the user never has to remember to
call propagateIfChanged().
We define base classes like JsirDataFlowAnalysis and JsirConditionalForwardDataFlowAnalysis for analyses that use both sparse (attached to mlir::Values) and dense (attached to mlir::ProgramPoints) states.

Benefit: Unlike the upstream MLIR API, the user does not have to write two analyses, one deriving SparseAnalysis and one deriving DenseAnalysis.
We define a struct JsirGeneralCfgEdge to unify branches between mlir::Blocks and region branches, including early exits (break and continue statements).

Benefit: Unlike the upstream MLIR API, the user does not need to load ConstantPropagation and DeadCodeAnalysis for every analysis.

Potential next steps

As we continue to improve and scale the impact of JSIR, there are several ideas that might be interesting to the MLIR community, and we are curious about your thoughts.

Adopt more MLIR built-in functionalities

Until now, we haven’t spent too much time trying to use MLIR’s built-in dialects, ops and functionalities. For example:

We could replace jsir.identifier and jsir.identifier_ref with memref.
We could use MLIR’s built-in symbol table. This is possible now since we adopt region-based control flow and scopes are mapped to regions.

Throughout this process, we will really battle-test these built-in functionalities and see how well they work for general purpose languages.

Contribute to MLIR region-based dataflow analysis

We believe that the ease-of-use improvements in JSIR’s dataflow analysis API can be upstreamed to MLIR’s built-in dataflow analysis API. A direct port is infeasible, since our API makes certain assumptions that are only true in JSIR, but the general ideas can be adopted. We hope to write a separate RFC to discuss these ideas in more detail.

Upstream JSIR?

We would be very happy to upstream JSIR into MLIR, similar to WasmSSA. However, there are several practical issues that might make this infeasible. We are eager to see what the community thinks.

Dependency on QuickJS: We use QuickJS for folding constants. This way, we don’t need to reimplement JavaScript semantics (e.g. looking at the ECMAScript spec, even a + b involves many steps due to automatic type conversions). We are not sure if adding a dependency on a lightweight JavaScript execution engine to the LLVM repository would be acceptable.
Dependency on Babel or SWC: JSIR doesn’t come with its own parser - we currently use Babel, and we are trying to migrate to SWC. Babel is written in TypeScript, and we currently run it in QuickJS from C++; SWC is written in Rust. We are not sure if it’s acceptable to add either of those as a dependency in the MLIR codebase.

Contribution welcomed!

We welcome engagement and contributions from the community! Feel free to try it out and let us know where and how we can improve. If you are interested in any of the ideas above, let us know!

Acknowledgement

JSIR couldn’t have been possible without the help with many contributors:

Alex Petit-Bianco
Andrii Bugaiov
Cheng Zhang
David Sklar
David Tao
Elijah Kin (UMD)
Elie Bursztein
Jacques Pienaar
Jeff Niu (now at OpenAI)
Jennifer Pullman
Jianan Yue
Luke Zielinski
Matias Scharager (CMU)
Md Jabir Hossain
Mehdi Amini (now at NVIDIA)
Pavel Petrenko (now at Aegis AI)
Roy Tu
Sajjad JJ Arshad
Shan Jiang (UT Austin)
Shuofei Zhu (Penn State)
Sruthi Bandhakavi
Victor Starenky (now at Lightspeed Commerce)
Vlad Stolyarov

… and many more!

conartist6 · April 8, 2026, 11:20am

How does this format represent comments?

magic-akari · April 8, 2026, 2:13pm

jsir’s repo is easy to try out. I just checked, and it’s mounted in a separate comments field in the file structure.

conartist6 · April 8, 2026, 3:37pm

The problem with that is that it isn’t incremental. If you add a space to the start of a 10K line file, it’s likely you’ll hang the IDE for several seconds if this data structure is in the core edit loop.

Topic		Replies	Views
[RFC] An MLIR based Clang IR (CIR) Clang Frontend	76	26408	July 29, 2023
Is that possible for CSA to analyze other languages e.g. Java、 JavaScript by transpiling them to clang compatible AST Static Analyzer	9	877	January 17, 2023
[RFC] Upstreaming ClangIR Clang Frontend	60	9645	February 12, 2025
LLVM Dev meeting: Slides & Minutes from the Static Analyzer BoF Clang Frontend	13	306	November 4, 2015
[analyzer][RFC] Get info from the LLVM IR for precision Static Analyzer	18	457	September 21, 2020