
Native Parser

Compared with javascript, rust as a native language offers inherently strong performance. rollup decided to switch from the javascript-side acorn parser to the rust-side swc parser, which can parse complex asts efficiently. This is the core change of rollup v4.

Challenges

Native Interaction

Directly using swc's javascript bindings and parsing complex asts through the swc.parse javascript API incurs a huge communication overhead.

ts
import swc from '@swc/core';

const code = `
  const a = 1;
  function add(a, b) {
    return a + b;
  }
`;
swc
  .parse(code, {
    syntax: 'ecmascript',
    comments: false,
    script: true,
    target: 'es3',
    isModule: false
  })
  .then(module => {
    module.type; // file type
    module.body; // AST
  });

Looking at the swc source code, we can see that swc internally uses the serde_json crate to serialize the parsed program object into a JSON string and pass it to the javascript side.

rust
#[napi]
impl Task for ParseTask {
  type JsValue = String;
  type Output = String;

  fn compute(&mut self) -> napi::Result<Self::Output> {
    let options: ParseOptions = deserialize_json(&self.options)?;
    let fm = self
      .c
      .cm
      .new_source_file(self.filename.clone().into(), self.src.clone());

    let comments = if options.comments {
      Some(self.c.comments() as &dyn Comments)
    } else {
      None
    };

    let program = try_with(self.c.cm.clone(), false, ErrorFormat::Normal, |handler| {
      let mut p = self.c.parse_js(
        fm,
        handler,
        options.target,
        options.syntax,
        options.is_module,
        comments,
      )?;

      p.visit_mut_with(&mut resolver(
        Mark::new(),
        Mark::new(),
        options.syntax.typescript(),
      ));

      Ok(p)
    })
    .convert_err()?;

    let ast_json = serde_json::to_string(&program)?;

    Ok(ast_json)
  }

  fn resolve(&mut self, _env: Env, result: Self::Output) -> napi::Result<Self::JsValue> {
    Ok(result)
  }
}

On the javascript API side, the ast string returned by the native parser is then deserialized back into a javascript object via JSON.parse.

ts
class Compiler {
  async parse(
    src: string,
    options?: ParseOptions,
    filename?: string
  ): Promise<Program> {
    options = options || { syntax: 'ecmascript' };
    options.syntax = options.syntax || 'ecmascript';

    if (!bindings && !!fallbackBindings) {
      throw new Error(
        'Fallback bindings does not support this interface yet.'
      );
    } else if (!bindings) {
      throw new Error('Bindings not found.');
    }

    if (bindings) {
      const res = await bindings.parse(src, toBuffer(options), filename);
      return JSON.parse(res);
    } else if (fallbackBindings) {
      return fallbackBindings.parse(src, options);
    }
    throw new Error('Bindings not found.');
  }
}

Repeatedly serializing the ast on the rust side and deserializing it on the javascript side means that, when parsing complex asts, the performance advantage of switching to a native (rust) parser is almost entirely eaten up.
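
As a rough, illustrative check (not taken from the rollup codebase), you can estimate how much of a swc.parse round trip is spent on pure JSON work by stringifying and re-parsing the returned ast yourself:

ts
import swc from '@swc/core';

// Build a reasonably large input so the JSON cost is visible.
const code = 'export const add = (a, b) => a + b;\n'.repeat(10_000);

const ast = await swc.parse(code, { syntax: 'ecmascript' });

console.time('json round trip');
const json = JSON.stringify(ast); // roughly what serde_json produces on the rust side
const copy = JSON.parse(json);    // what the javascript side must do with the returned string
console.timeEnd('json round trip');
console.log(copy.body.length);    // keep the result alive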

Ast Compatibility

Even swc's estree compat module still produces a babel-style ast, not an estree ast, while rollup depends on a standard estree ast.

File Encoding

swc works with utf-8, while rollup relies on javascript's standard utf-16 encoding.

utf-8 and utf-16 are two different character encodings for representing text. Their main differences are how many bytes each character occupies and how characters are encoded.

Differences between utf-8 and utf-16

utf-8

Variable-length encoding:

utf-8 uses 1 to 4 bytes per character. ascii characters (such as English letters and digits) take 1 byte, while other characters (such as Chinese characters) may take 2 to 4 bytes.

  • 1 byte: ascii characters (U+0000 to U+007F).
  • 2 bytes: extended Latin characters (U+0080 to U+07FF).
  • 3 bytes: Basic Multilingual Plane (BMP) characters (U+0800 to U+FFFF).
  • 4 bytes: supplementary plane characters (U+10000 to U+10FFFF).

Backward compatible with ascii:

Because ascii characters occupy only 1 byte in utf-8, utf-8 is fully compatible with ascii encoding.

Encoding efficiency:

  • Efficient for English and ascii text (1 byte per character).
  • Non-Latin characters (such as Chinese and Japanese) usually need 3 bytes.
  • Supplementary plane characters (such as emoji) need 4 bytes.

Typical use cases:

  • Better suited to network transfer and storage, especially for mostly-ascii text.
  • Commonly used for web pages, json files, and similar scenarios.

utf-16:

Fixed or variable-length encoding:

utf-16 normally uses 2 bytes for most common characters, but some characters (such as emoji) need 4 bytes.

  • 2 bytes: characters within the BMP (U+0000 to U+FFFF, excluding surrogates).
  • 4 bytes: characters beyond the BMP (U+10000 to U+10FFFF), encoded as two 16-bit units (a surrogate pair).

Not compatible with ascii:

utf-16 is not compatible with ascii, because ascii characters take 2 bytes in utf-16. In both utf-8 and utf-16, however, each ascii character still counts as a single code unit.

Encoding efficiency:

  • Relatively efficient for BMP characters (such as most Chinese and Japanese text): 2 bytes per character.
  • Less efficient for ascii characters (2 bytes per character).
  • Similar to utf-8 for supplementary plane characters (4 bytes).

Typical use cases:

  • Better suited to in-memory processing, especially for text dominated by BMP characters (such as Chinese).
  • Used as the internal string representation of windows, javascript, java, and others.

Example:

For the string A你, the encodings are as follows.

utf-8 encoding:

"A": 1 byte, encoded as 0x41.

"你": 3 bytes, encoded as 0xE4 0xBD 0xA0.

utf-16 encoding:

"A": 2 bytes, encoded as 0x0041.

"你": 2 bytes, encoded as 0x4F60.

Character positions in utf-8 are therefore byte-based, while positions in utf-16 are counted in 2-byte code units.
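
A quick way to see both unit systems from javascript (a minimal sketch using the standard TextEncoder API):

ts
const text = 'A你';

// utf-16: javascript string indices count 16-bit code units.
console.log(text.length); // 2 ("A" and "你" are one code unit each)

// utf-8: byte offsets count bytes.
console.log(new TextEncoder().encode(text).length); // 4 (1 byte for "A" + 3 bytes for "你")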

Summary:

Feature                        | utf-8                            | utf-16
Encoding length                | 1-4 bytes                        | 2 or 4 bytes
ascii compatibility            | compatible                       | not compatible
Efficiency for ascii text      | high (1 byte/character)          | low (2 bytes/character)
Efficiency for non-Latin text  | lower (3 bytes/character)        | higher (2 bytes/character)
Byte order concerns            | none                             | requires a BOM marker
Typical use cases              | network protocols, file storage  | in-memory processing, large text handling

When processing text, the choice between utf-8 and utf-16 affects both file size and how character positions are calculated. This in turn affects how the character positions in an ast are determined. Consider the following example:

js
const a = "你好";

The asts parsed under the babel ast spec (swc, first JSON below) and the estree ast spec (second JSON below) then differ in their character positions.

json
{
  "type": "Module",
  "span": {
    "start": 0,
    "end": 19,
    "ctxt": 0
  },
  "body": [
    {
      "type": "VariableDeclaration",
      "span": {
        "start": 0,
        "end": 19,
        "ctxt": 0
      },
      "kind": "const",
      "declare": false,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "span": {
            "start": 6,
            "end": 18,
            "ctxt": 0
          },
          "id": {
            "type": "Identifier",
            "span": {
              "start": 6,
              "end": 7,
              "ctxt": 0
            },
            "value": "a",
            "optional": false,
            "typeAnnotation": null
          },
          "init": {
            "type": "StringLiteral",
            "span": {
              "start": 10,
              "end": 18,
              "ctxt": 0
            },
            "value": "你好",
            "hasEscape": false,
            "kind": {
              "type": "normal",
              "containsQuote": true
            }
          },
          "definite": false
        }
      ]
    }
  ],
  "interpreter": null
}
json
{
  "type": "Program",
  "start": 0,
  "end": 15,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 15,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 6,
          "end": 14,
          "id": {
            "type": "Identifier",
            "start": 6,
            "end": 7,
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "start": 10,
            "end": 14,
            "value": "你好",
            "raw": "\"你好\""
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "module"
}

As you can see, because of the different encodings, the two ast specifications report different node positions for the same special characters. The babel-style ast gives the "你好" string literal a utf-8 based position range of [10, 18), while the estree ast gives the same literal a utf-16 based range of [10, 14).
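
To make the mapping concrete, here is a minimal sketch (not rollup's actual converter) of turning a utf-8 byte offset reported by swc into the utf-16 code-unit offset that estree expects:

ts
// Walk the source code point by code point, counting utf-8 bytes and
// utf-16 code units in parallel until the utf-8 target is reached.
function utf8ToUtf16Index(source: string, utf8Index: number): number {
  const encoder = new TextEncoder();
  let utf8 = 0;
  let utf16 = 0;
  for (const character of source) { // iterates by code point
    if (utf8 >= utf8Index) break;
    utf8 += encoder.encode(character).length; // 1-4 bytes
    utf16 += character.length;                // 1 or 2 code units
  }
  return utf16;
}

const source = 'const a = "你好";';
console.log(utf8ToUtf16Index(source, 18)); // 14 — the estree "end" of the literal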

The source map chapter explains in detail how rollup generates sourcemaps internally; rollup relies on the position information provided by the estree ast to add sourcemap mapping locations.

ts
export class NodeBase extends ExpressionEntity implements ExpressionNode {
  /**
   * Override to perform special initialisation steps after the scope is
   * initialised
   */
  initialise(): void {
    this.scope.context.magicString.addSourcemapLocation(this.start);
    this.scope.context.magicString.addSourcemapLocation(this.end);
  }
}

A mismatch in encoding would therefore cause the sourcemap that rollup ultimately generates to be severely offset.

Performance

Optimize Ast Compatibility

On the rust side, rollup uses swc to parse the code into a babel-style ast.

rust
use swc_compiler_base::parse_js;

pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
  // other code omitted
  GLOBALS.set(&Globals::default(), || {
    let result = catch_unwind(AssertUnwindSafe(|| {
      let result = try_with_handler(&code_reference, |handler| {
        parse_js(
          cm,
          file,
          handler,
          target,
          syntax,
          IsModule::Unknown,
          Some(&comments),
        )
      });
      match result {
        Err(buffer) => buffer,
        Ok(program) => {
          let annotations = comments.take_annotations();
          let converter = AstConverter::new(&code_reference, &annotations);
          converter.convert_ast_to_buffer(&program)
        }
      }
    }));
    result.unwrap_or_else(|err| {
      let msg = if let Some(msg) = err.downcast_ref::<&str>() {
        msg
      } else if let Some(msg) = err.downcast_ref::<String>() {
        msg
      } else {
        "Unknown rust panic message"
      };
      get_panic_error_buffer(msg)
    })
  })
}

The converter.convert_ast_to_buffer(&program) call recursively walks the babel-style ast produced by swc and recomputes the position information of each babel ast node into the position information of the corresponding estree ast node.

rust
/// Converts the given UTF-8 byte index to a UTF-16 byte index.
///
/// To be performant, this method assumes that the given index is not smaller
/// than the previous index. Additionally, it handles "annotations" like
/// `@__PURE__` comments in the process.
///
/// The logic for those comments is as follows:
/// - If the current index is at the start of an annotation, the annotation
///   is collected and the index is advanced to the end of the annotation.
/// - Otherwise, we check if the next character is a white-space character.
///   If not, we invalidate all collected annotations.
///   This is to ensure that we only collect annotations that directly precede
///   an expression and are not e.g. separated by a comma.
/// - If annotations are relevant for an expression, it can "take" the
///   collected annotations by calling `take_collected_annotations`. This
///   clears the internal buffer and returns the collected annotations.
/// - Invalidated annotations are attached to the Program node so that they
///   can all be removed from the source code later.
/// - If an annotation can influence a child that is separated by some
///   non-whitespace from the annotation, `keep_annotations_for_next` will
///   prevent annotations from being invalidated when the next position is
///   converted.
pub(crate) fn convert(&mut self, utf8_index: u32, keep_annotations_for_next: bool) -> u32 {
  if self.current_utf8_index > utf8_index {
    panic!(
      "Cannot convert positions backwards: {} < {}",
      utf8_index, self.current_utf8_index
    );
  }
  while self.current_utf8_index < utf8_index {
    if self.current_utf8_index == self.next_annotation_start {
      let start = self.current_utf16_index;
      let (next_comment_end, next_comment_kind) = self
        .next_annotation
        .map(|a| (a.comment.span.hi.0 - 1, a.kind.clone()))
        .unwrap();
      while self.current_utf8_index < next_comment_end {
        let character = self.character_iterator.next().unwrap();
        self.current_utf8_index += character.len_utf8() as u32;
        self.current_utf16_index += character.len_utf16() as u32;
      }
      if let Annotation(kind) = next_comment_kind {
        self.collected_annotations.push(ConvertedAnnotation {
          start,
          end: self.current_utf16_index,
          kind,
        });
      }
      self.next_annotation = self.annotation_iterator.next();
      self.next_annotation_start = get_annotation_start(self.next_annotation);
    } else {
      let character = self.character_iterator.next().unwrap();
      if !(self.keep_annotations || self.collected_annotations.is_empty()) {
        match character {
          ' ' | '\t' | '\r' | '\n' => {}
          _ => {
            self.invalidate_collected_annotations();
          }
        }
      }
      self.current_utf8_index += character.len_utf8() as u32;
      self.current_utf16_index += character.len_utf16() as u32;
    }
  }
  self.keep_annotations = keep_annotations_for_next;
  self.current_utf16_index
}

At the same time, the information needed to build each estree ast node also has to be collected.

rust
pub(crate) fn convert_statement(&mut self, statement: &Stmt) {
  match statement {
    Stmt::Break(break_statement) => self.store_break_statement(break_statement),
    Stmt::Block(block_statement) => self.store_block_statement(block_statement, false),
    Stmt::Continue(continue_statement) => self.store_continue_statement(continue_statement),
    Stmt::Decl(declaration) => self.convert_declaration(declaration),
    Stmt::Debugger(debugger_statement) => self.store_debugger_statement(debugger_statement),
    Stmt::DoWhile(do_while_statement) => self.store_do_while_statement(do_while_statement),
    Stmt::Empty(empty_statement) => self.store_empty_statement(empty_statement),
    Stmt::Expr(expression_statement) => self.store_expression_statement(expression_statement),
    Stmt::For(for_statement) => self.store_for_statement(for_statement),
    Stmt::ForIn(for_in_statement) => self.store_for_in_statement(for_in_statement),
    Stmt::ForOf(for_of_statement) => self.store_for_of_statement(for_of_statement),
    Stmt::If(if_statement) => self.store_if_statement(if_statement),
    Stmt::Labeled(labeled_statement) => self.store_labeled_statement(labeled_statement),
    Stmt::Return(return_statement) => self.store_return_statement(return_statement),
    Stmt::Switch(switch_statement) => self.store_switch_statement(switch_statement),
    Stmt::Throw(throw_statement) => self.store_throw_statement(throw_statement),
    Stmt::Try(try_statement) => self.store_try_statement(try_statement),
    Stmt::While(while_statement) => self.store_while_statement(while_statement),
    Stmt::With(_) => unimplemented!("Cannot convert Stmt::With"),
  }
}

Using the structure of each babel ast node, the converter extracts the information the estree ast node needs and recomputes the position information under the estree ast spec in utf-16 code units.

rust
pub(crate) fn convert_item_list_with_state<T, S, F>(
    &mut self,
    item_list: &[T],
    state: &mut S,
    reference_position: usize,
    convert_item: F,
  ) where
  F: Fn(&mut AstConverter, &T, &mut S) -> bool,
{
  // for an empty list, we leave the referenced position at zero
  if item_list.is_empty() {
    return;
  }
  self.update_reference_position(reference_position);
  // store number of items in first position
  self
    .buffer
    .extend_from_slice(&(item_list.len() as u32).to_ne_bytes());
  let mut reference_position = self.buffer.len();
  // make room for the reference positions of the items
  self
    .buffer
    .resize(self.buffer.len() + item_list.len() * 4, 0);
  for item in item_list {
    let insert_position = (self.buffer.len() as u32) >> 2;
    if convert_item(self, item, state) {
      self.buffer[reference_position..reference_position + 4]
        .copy_from_slice(&insert_position.to_ne_bytes());
    }
    reference_position += 4;
  }
}
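
On the javascript side, reading such a list back out of the buffer is straightforward. A sketch of the idea (not rollup's actual bufferParsers code), assuming the buffer is viewed as a Uint32Array and positions are expressed in 32-bit words:

ts
// At `position` the buffer holds the item count, followed by one u32 per item
// containing that item's start position in the buffer (0 means the item was skipped).
function readItemListPositions(buffer: Uint32Array, position: number): number[] {
  const count = buffer[position];
  const itemPositions: number[] = [];
  for (let index = 0; index < count; index++) {
    const itemPosition = buffer[position + 1 + index];
    if (itemPosition !== 0) itemPositions.push(itemPosition);
  }
  return itemPositions;
}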

Comment nodes are also collected here, in preparation for rollup's tree shaking. Note that the babel ast spec includes comment nodes, while the estree ast spec does not. Comment information is nevertheless crucial for rollup's tree shaking and strengthens what tree shaking can do.

rollup collects this comment information into the estree ast and stores it on a _rollupAnnotations property. In other words, the ast that is eventually returned carries a _rollupAnnotations property while remaining structurally compatible with estree.
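
For a source like /*@__PURE__*/ foo(); the returned call node might look roughly like this (an illustrative sketch of the shape, not an exact dump of rollup's output):

ts
const annotatedCallNode = {
  type: 'CallExpression',
  start: 14,
  end: 19,
  // the /*@__PURE__*/ comment that precedes the call, as collected on the rust side
  _rollupAnnotations: [{ start: 0, end: 13, type: 'pure' }]
  // callee, arguments, ... omitted
};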

rust
pub(crate) fn take_collected_annotations(
  &mut self,
  kind: AnnotationKind,
) -> Vec<ConvertedAnnotation> {
  let mut relevant_annotations = Vec::new();
  for annotation in self.collected_annotations.drain(..) {
    if annotation.kind == kind {
      relevant_annotations.push(annotation);
    } else {
      self.invalid_annotations.push(annotation);
    }
  }
  relevant_annotations
}
impl<'a> AstConverter<'a> {
  pub(crate) fn store_call_expression(
    &mut self,
    span: &Span,
    is_optional: bool,
    callee: &StoredCallee,
    arguments: &[ExprOrSpread],
    is_chained: bool,
  ) {
  // annotations
  let annotations = self
    .index_converter
    .take_collected_annotations(AnnotationKind::Pure);
  }
}
impl SequentialComments {
  pub(crate) fn add_comment(&self, comment: Comment) {
    if comment.text.starts_with('#') && comment.text.contains("sourceMappingURL=") {
      self.annotations.borrow_mut().push(AnnotationWithType {
        comment,
        kind: CommentKind::Annotation(AnnotationKind::SourceMappingUrl),
      });
      return;
    }
    let mut search_position = comment
      .text
      .chars()
      .nth(0)
      .map(|first_char| first_char.len_utf8())
      .unwrap_or(0);
    while let Some(Some(match_position)) = comment.text.get(search_position..).map(|s| s.find("__"))
    {
      search_position += match_position;
      // Using a byte reference avoids UTF8 character boundary checks
      match &comment.text.as_bytes()[search_position - 1] {
        b'@' | b'#' => {
          let annotation_slice = &comment.text[search_position..];
          if annotation_slice.starts_with("__PURE__") {
            self.annotations.borrow_mut().push(AnnotationWithType {
              comment,
              kind: CommentKind::Annotation(AnnotationKind::Pure),
            });
            return;
          }
          if annotation_slice.starts_with("__NO_SIDE_EFFECTS__") {
            self.annotations.borrow_mut().push(AnnotationWithType {
              comment,
              kind: CommentKind::Annotation(AnnotationKind::NoSideEffects),
            });
            return;
          }
        }
        _ => {}
      }
      search_position += 2;
    }
    self.annotations.borrow_mut().push(AnnotationWithType {
      comment,
      kind: CommentKind::Comment,
    });
  }

  pub(crate) fn take_annotations(self) -> Vec<AnnotationWithType> {
    self.annotations.take()
  }
}

What is finally returned to the rollup side is an arraybuffer holding the estree-compatible ast; the rollup side then needs to walk this arraybuffer structure to instantiate the ast class nodes implemented inside rollup.

ts
export default class Module {
  async setSource({
    ast,
    code,
    customTransformCache,
    originalCode,
    originalSourcemap,
    resolvedIds,
    sourcemapChain,
    transformDependencies,
    transformFiles,
    ...moduleOptions
  }: TransformModuleJSON & {
    resolvedIds?: ResolvedIdMap;
    transformFiles?: EmittedFile[] | undefined;
  }): Promise<void> {
    // Measuring asynchronous code does not provide reasonable results
    timeEnd('generate ast', 3);
    const astBuffer = await parseAsync(
      code,
      false,
      this.options.jsx !== false
    );
    timeStart('generate ast', 3);
    this.ast = convertProgram(astBuffer, programParent, this.scope);
  }
}

This is how rollup bootstraps the nodes at the buffer level:

ts
function convertNode(
  parent: Node | { context: AstContext; type: string },
  parentScope: ChildScope,
  position: number,
  buffer: AstBuffer
): any {
  const nodeType = buffer[position];
  const NodeConstructor = nodeConstructors[nodeType];
  /* istanbul ignore if: This should never be executed but is a safeguard against faulty buffers */
  if (!NodeConstructor) {
    console.trace();
    throw new Error(`Unknown node type: ${nodeType}`);
  }
  const node = new NodeConstructor(parent, parentScope);
  node.type = nodeTypeStrings[nodeType];
  node.start = buffer[position + 1];
  node.end = buffer[position + 2];
  bufferParsers[nodeType](node, position + 3, buffer);
  node.initialise();
  return node;
}

Optimize Native Interaction

As shown above, using the javascript bindings exposed by swc means the ast is repeatedly serialized and deserialized between rust and javascript. When handling complex asts, this almost entirely erodes the performance advantage of switching to a native (rust) parser. The solutions are as follows:

Use an arraybuffer to transfer the parsed ast between rust and javascript.

Instead of going through swc's javascript bindings, use swc directly as a rust crate on the rust side.

rust
use swc_compiler_base::parse_js;

pub fn parse_ast(code: String, allow_return_outside_function: bool, jsx: bool) -> Vec<u8> {
  GLOBALS.set(&Globals::default(), || {
    let result = catch_unwind(AssertUnwindSafe(|| {
      let result = try_with_handler(&code_reference, |handler| {
        parse_js(
          cm,
          file,
          handler,
          target,
          syntax,
          IsModule::Unknown,
          Some(&comments),
        )
      });
      match result {
        Err(buffer) => buffer,
        Ok(program) => {
          let annotations = comments.take_annotations();
          let converter = AstConverter::new(&code_reference, &annotations);
          converter.convert_ast_to_buffer(&program)
        }
      }
    }));
  });
}

At the same time, rollup converts the babel-style ast produced by swc into an estree-compatible binary format on the rust side and then passes it to javascript as an (array) buffer.

rust
match result {
  Err(buffer) => buffer,
  Ok(program) => {
    let annotations = comments.take_annotations();
    let converter = AstConverter::new(&code_reference, &annotations);
    converter.convert_ast_to_buffer(&program)
  }
}

Transferring an arraybuffer is essentially a zero-copy operation, so we only need to teach the javascript side how to work with the arraybuffer. In addition, the arraybuffer is only about one third the size of the stringified json. Finally, this makes it easy to pass the ast in arraybuffer form between threads: for example, parsing can happen in a WebWorker and the resulting arraybuffer can then be handed to the main thread without copying.
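
A minimal Node.js worker_threads sketch (independent of rollup's internals) showing how such a buffer can be handed between threads via the transfer list instead of being copied:

ts
import { Worker, isMainThread, parentPort } from 'node:worker_threads';

if (isMainThread) {
  const worker = new Worker(new URL(import.meta.url));
  worker.on('message', (astBuffer: ArrayBuffer) => {
    console.log(`received ${astBuffer.byteLength} bytes without copying`);
  });
} else {
  // Pretend this buffer came from the native parser running in the worker.
  const astBuffer = new ArrayBuffer(1024);
  // The second argument is the transfer list: ownership of the buffer moves
  // to the receiving thread instead of the data being structured-cloned.
  parentPort!.postMessage(astBuffer, [astBuffer]);
}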

On the Node.js side, napi-rs is used to interact with the rust code; for the browser, the build is produced with wasm-pack.
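
A hypothetical loader illustrating that split; the module names here are made up for illustration, and rollup's real loading logic differs:

ts
export async function loadNativeParser() {
  const isNode = typeof process !== 'undefined' && !!process.versions?.node;
  if (isNode) {
    // napi-rs build: a platform-specific native addon
    const { createRequire } = await import('node:module');
    const require = createRequire(import.meta.url);
    return require('./parser.node');
  }
  // wasm-pack build for the browser: initialize the wasm module first
  const wasm = await import('./parser_wasm.js');
  await wasm.default();
  return wasm;
}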

Optimize Syntax Analysis

Calling swc_compiler_base::parse_js directly on the rust side does not perform any syntax analysis, which means the following code parses into a babel-style ast without complaint:

js
const a = 1;
const a = 2;

This differs from acorn, which performs syntax analysis while generating the ast and, for the example above, reports an error. The relevant kind of check in acorn's source looks like this:

js
while (this.type !== tt.braceR) {
  const element = this.parseClassElement(node.superClass !== null);
  if (element) {
    classBody.body.push(element);
    if (
      element.type === 'MethodDefinition' &&
      element.kind === 'constructor'
    ) {
      if (hadConstructor)
        this.raiseRecoverable(
          element.start,
          'Duplicate constructor in the same class'
        );
      hadConstructor = true;
    } else if (
      element.key &&
      element.key.type === 'PrivateIdentifier' &&
      isPrivateNameConflicted(privateNameMap, element)
    ) {
      this.raiseRecoverable(
        element.key.start,
        `Identifier '#${element.key.name}' has already been declared`
      );
    }
  }
}

The error message:

Line 2: Identifier 'a' has already been declared.
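
This behaviour is easy to reproduce against acorn directly (a minimal sketch):

ts
import { parse } from 'acorn';

try {
  parse('const a = 1;\nconst a = 2;', { ecmaVersion: 'latest' });
} catch (error) {
  // Something like: Identifier 'a' has already been declared (2:6)
  console.log((error as Error).message);
}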

rollup therefore needs the swc_ecma_lints crate to implement syntax analysis.

rust
use swc_ecma_lints::{rule::Rule, rules, rules::LintParams};

let result = HANDLER.set(&handler, || op(&handler));

match result {
  Ok(mut program) => {
    let unresolved_mark = Mark::new();
    let top_level_mark = Mark::new();
    let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark);
    let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark);

    program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false));

    let mut rules = rules::all(LintParams {
      program: &program,
      lint_config: &Default::default(),
      unresolved_ctxt,
      top_level_ctxt,
      es_version,
      source_map: cm.clone(),
    });

    HANDLER.set(&handler, || match &program {
      Program::Module(m) => {
        rules.lint_module(m);
      }
      Program::Script(s) => {
        rules.lint_script(s);
      }
    });

    if handler.has_errors() {
      let buffer = create_error_buffer(&wr, code);
      Err(buffer)
    } else {
      Ok(program)
    }
  }
}

However, as the following PR discussion shows:

Testing found that the checks performed via swc_ecma_lints are not particularly efficient.

To address this, the rollup native parser decided, for the time being, to remove the rust-side syntax analysis of the ast until scope analysis is implemented on the rust side.

rust
let result = HANDLER.set(&handler, || op(&handler));

match result { 
  Ok(mut program) => { 
    let unresolved_mark = Mark::new(); 
    let top_level_mark = Mark::new(); 
    let unresolved_ctxt = SyntaxContext::empty().apply_mark(unresolved_mark); 
    let top_level_ctxt = SyntaxContext::empty().apply_mark(top_level_mark); 
    program.visit_mut_with(&mut resolver(unresolved_mark, top_level_mark, false)); 
    let mut rules = rules::all(LintParams { 
      program: &program, 
      lint_config: &Default::default(), 
      unresolved_ctxt, 
      top_level_ctxt, 
      es_version, 
      source_map: cm.clone(), 
    }); 
    HANDLER.set(&handler, || match &program { 
      Program::Module(m) => { 
        rules.lint_module(m); 
      } 
      Program::Script(s) => { 
        rules.lint_script(s); 
      } 
    }); 
    if handler.has_errors() { 
      let buffer = create_error_buffer(&wr, code); 
      Err(buffer) 
    } else { 
      Ok(program) 
    } 
  } 
} 
result.map_err(|_| { 
  if handler.has_errors() { 
    create_error_buffer(&wr, code) 
  } else { 
    panic!("Unexpected error in parse") 
  } 
}) 

The task of syntax analysis is handed over to the javascript side instead.

rollup performs the syntax analysis during the backtracking phase of instantiating its ast class nodes. Testing shows that this javascript-side analysis is more efficient than going through swc_ecma_lints, so syntax analysis has little impact on rollup's performance.

The syntax analysis mainly covers the following checks:

  1. const_assign

    Example:

    ts
    export default class AssignmentExpression extends NodeBase {
      initialise(): void {
        super.initialise();
        if (this.left instanceof Identifier) {
          const variable = this.scope.variables.get(this.left.name);
          if (variable?.kind === 'const') {
            this.scope.context.error(
              logConstVariableReassignError(),
              this.left.start
            );
          }
        }
        this.left.setAssignedValue(this.right);
      }
    }
  2. duplicate_bindings

    ts
    export function logRedeclarationError(name: string): RollupLog {
      return {
        code: REDECLARATION_ERROR,
        message: `Identifier "${name}" has already been declared`
      };
    }
    export default class Module {
      private addImport(node: ImportDeclaration): void {
        const source = node.source.value;
        this.addSource(source, node);
    
        for (const specifier of node.specifiers) {
          const localName = specifier.local.name;
          if (
            this.scope.variables.has(localName) ||
            this.importDescriptions.has(localName)
          ) {
            this.error(
              logRedeclarationError(localName),
              specifier.local.start
            );
          }
    
          const name =
            specifier instanceof ImportDefaultSpecifier
              ? 'default'
              : specifier instanceof ImportNamespaceSpecifier
                ? '*'
                : specifier.imported instanceof Identifier
                  ? specifier.imported.name
                  : specifier.imported.value;
          this.importDescriptions.set(localName, {
            module: null as never, // filled in later
            name,
            source,
            start: specifier.start
          });
        }
      }
    }
  3. duplicate_exports

    ts
    export default class Module {
      private assertUniqueExportName(name: string, nodeStart: number) {
        if (this.exports.has(name) || this.reexportDescriptions.has(name)) {
          this.error(logDuplicateExportError(name), nodeStart);
        }
      }
    }
  4. no_dupe_args

    ts
    export default class ParameterScope extends ChildScope {
      /**
       * Adds a parameter to this scope. Parameters must be added in the correct
       * order, i.e. from left to right.
       */
      addParameterDeclaration(
        identifier: Identifier,
        argumentPath: ObjectPath
      ): ParameterVariable {
        const { name, start } = identifier;
        const existingParameter = this.variables.get(name);
        if (existingParameter) {
          return this.context.error(
            logDuplicateArgumentNameError(name),
            start
          );
        }
        const variable = new ParameterVariable(
          name,
          identifier,
          argumentPath,
          this.context
        );
        this.variables.set(name, variable);
        // We also add it to the body scope to detect name conflicts with local
        // variables. We still need the intermediate scope, though, as parameter
        // defaults are NOT taken from the body scope but from the parameters or
        // outside scope.
        this.bodyScope.addHoistedVariable(name, variable);
        return variable;
      }
    }

As you can see, the syntax analysis phase relies heavily on module scope. Once rollup implements scope analysis on the rust side, the syntax analysis work will be handed back to the rust side.

Optimize Ast Parsing

rollup exposes this.parse on the plugin context so that user plugins can use the native swc-based parser to turn code into an ast; rollup then reuses the ast parsed this way instead of parsing the code again.

Even with native parsing available, generating a complex ast still takes time. In watch mode, rollup caches the estree ast (see the Rollup Incremental Build chapter) to skip the native swc parsing step, and instead walks the cached estree ast structure to instantiate rollup's internal ast class nodes.
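
A sketch of how a plugin can lean on this (assuming a simple string replacement as the transform):

ts
import type { Plugin } from 'rollup';

export function replaceBuildTime(): Plugin {
  return {
    name: 'replace-build-time',
    transform(code) {
      const transformed = code.replace(/__BUILD_TIME__/g, JSON.stringify(Date.now()));
      return {
        code: transformed,
        map: null,
        // parse with the native swc-based parser and hand the ast back to rollup
        // so the transformed code does not need to be parsed again
        ast: this.parse(transformed)
      };
    }
  };
}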

Performance Comparison

We compared the parsing performance of rollup 4.28.1 and 3.29.5:

version 4.28.1 uses the native swc parser and passes the estree-compatible ast from the rust side to the javascript side as an arraybuffer;

version 3.29.5 uses acorn to parse the ast.

Each case was run 5 times and the results averaged.
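
A measurement of this shape can be reproduced with something like the sketch below (not the exact script used here; rollup 3 bundles acorn internally, so calling acorn directly only approximates the 3.29.5 column):

ts
import { performance } from 'node:perf_hooks';
import { parseAstAsync } from 'rollup/parseAst'; // rollup >= 4: native swc-based parser
import { parse } from 'acorn';                   // the parser used internally by rollup 3

async function measure(code: string, runs = 5): Promise<void> {
  let swcTotal = 0;
  let acornTotal = 0;
  for (let run = 0; run < runs; run++) {
    let start = performance.now();
    await parseAstAsync(code);
    swcTotal += performance.now() - start;

    start = performance.now();
    parse(code, { ecmaVersion: 'latest', sourceType: 'module' });
    acornTotal += performance.now() - start;
  }
  console.log(`swc: ${(swcTotal / runs).toFixed(2)} ms, acorn: ${(acornTotal / runs).toFixed(2)} ms`);
}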

Code Length (Characters) | SWC Avg. Time (ms) | Acorn Avg. Time (ms)
312,373                  | 13.47              | 73.92
624,746                  | 21.78              | 83.80
1,249,492                | 36.03              | 124.82
2,498,984                | 68.88              | 182.45
4,997,968                | 136.52             | 272.53
9,995,936                | 266.87             | 608.72
19,991,872               | 578.00             | 1178.82
159,934,976              | 4155.64            | 7276.24
319,869,952              | 10081.40           | -

Testing found that when the parsed input reaches 319,869,952 characters, acorn fails with an error:

bash
<--- Last few GCs --->

[69821:0x120078000]    15364 ms: Mark-sweep 4062.9 (4143.2) -> 4059.0 (4143.2) MB, 703.2 / 0.0 ms  (average mu = 0.293, current mu = 0.102) allocation failure; scavenge might not succeed
[69821:0x120078000]    16770 ms: Mark-sweep 4075.3 (4143.2) -> 4071.5 (4169.0) MB, 1383.5 / 0.0 ms  (average mu = 0.143, current mu = 0.016) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

The test results show that the native parser is clearly faster than acorn.

  1. Overall:

    • With the native parser (built-in swc), average parse times are relatively short and grow only moderately as code length increases.

    • With the non-native parser (built-in acorn), parse times grow sharply for large inputs, showing a much higher performance cost.

  2. Data comparison:

    • Small input (312,373 characters): the gap is pronounced, about 5.5x (13.47 ms vs 73.92 ms).
    • Medium input (9,995,936 characters): the gap is about 2.28x (266.87 ms vs 608.72 ms).
    • Large input (159,934,976 characters): the gap is 1.75x (4155.64 ms vs 7276.24 ms).

    For reference, module character counts:

    module     | Code Length (Characters)
    rollup.js  | 312,373

  3. Trend analysis:

    • The native parser (built-in swc) shows only a small growth in parse time and is well suited to parsing very large modules.
    • The non-native parser (built-in acorn) shows a large growth in parse time and clearly falls short for very large modules.

