simdjson Contribution: Implement Pretty Print of DOM
Preface
Years ago, after making contributions to sse2neon, I was looking for other SIMD-related projects as a change of pace. At the time I was working at a big tech company, and one of my colleagues, a senior C++ developer, told me about the magic of simdjson, which can parse gigabytes of JSON per second [1].
Wanting to leave a mark on the world, I started looking for an issue I could take on to sharpen my SIMD and C++ skills. Luckily, I spotted the issue “Pretty print for DOM” [2], and I thought it was a great starting point.
A Brief Introduction to simdjson
simdjson is a C++ library that parses gigabytes of JSON per second. It also performs strict JSON and UTF-8 validation, and it uses a two-stage algorithm built on SIMD instructions [3]. Because of its performance, it has been adopted by many projects, including the Node.js runtime.
What I’m Going to Do
- Create an indented JSON string.
- Treat performance as the first priority.
The Evolution of My Implementation
At first, I implemented the formatters with inheritance:
class mini_formatter {
public:
  mini_formatter() = default;
  virtual ~mini_formatter() = default;
};
class pretty_formatter : public mini_formatter {
public:
  pretty_formatter() = default;
  ~pretty_formatter() override = default;
};
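Following a reviewer's suggestion [4], I changed my implementation to a CRTP [5] solution:
template<class formatter>
class base_formatter {
public:
  // Forward each call to the derived formatter; the cast is resolved at compile time.
  simdjson_inline void call_print_newline() {
    static_cast<formatter *>(this)->print_newline();
  }
  simdjson_inline void call_print_indents(size_t depth) {
    static_cast<formatter *>(this)->print_indents(depth);
  }
  simdjson_inline void call_print_space() {
    static_cast<formatter *>(this)->print_space();
  }
};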
class mini_formatter : public base_formatter<mini_formatter> {
public:
  simdjson_inline void print_newline();
  simdjson_inline void print_indents(size_t depth);
  simdjson_inline void print_space();
};
class pretty_formatter : public base_formatter<pretty_formatter> {
public:
  simdjson_inline void print_newline();
  simdjson_inline void print_indents(size_t depth);
  simdjson_inline void print_space();
protected:
  int indent_step = 4;
};
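After switching to CRTP, my implementation ran up to 10x faster on the test dataset (from 2 s to 0.2 s). CRTP resolves the formatter calls at compile time, so the compiler can inline them instead of dispatching through a virtual table.
The full PR is available at https://github.com/simdjson/simdjson/pull/2026.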
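To make the pattern concrete, here is a minimal, self-contained sketch of CRTP formatters. It is not the code from the PR: the `key_value` helper, the `buffer_` member, and the `self()` accessor are hypothetical names added for illustration, and `simdjson_inline` is dropped so the snippet compiles without the library.

```cpp
#include <cstddef>
#include <string>

// Simplified illustration only; not the simdjson implementation.
template <class Derived>
class base_formatter {
public:
  // Each hook forwards to the derived class through a static_cast,
  // so the target is known at compile time and no vtable is needed.
  void newline() { self().print_newline(); }
  void indents(std::size_t depth) { self().print_indents(depth); }
  void space() { self().print_space(); }

  // Hypothetical helper: serialize {"key": value} to show where the hooks fire.
  std::string key_value(const std::string &key, const std::string &value) {
    buffer_.clear();
    buffer_ += '{';
    newline();
    indents(1);
    buffer_ += '"';
    buffer_ += key;
    buffer_ += "\":";
    space();
    buffer_ += value;
    newline();
    indents(0);
    buffer_ += '}';
    return buffer_;
  }

protected:
  std::string buffer_;

private:
  Derived &self() { return static_cast<Derived &>(*this); }
};

// Minified output: every formatting hook is a no-op.
class mini_formatter : public base_formatter<mini_formatter> {
public:
  void print_newline() {}
  void print_indents(std::size_t) {}
  void print_space() {}
};

// Indented output: hooks emit newlines, indentation, and a space after ':'.
class pretty_formatter : public base_formatter<pretty_formatter> {
public:
  void print_newline() { buffer_ += '\n'; }
  void print_indents(std::size_t depth) { buffer_.append(depth * indent_step, ' '); }
  void print_space() { buffer_ += ' '; }

protected:
  std::size_t indent_step = 4;
};
```

Calling `key_value("a", "1")` on a `mini_formatter` yields the minified form, while the same call on a `pretty_formatter` yields the object spread over three lines with a four-space indent. Because the hooks of `mini_formatter` are empty and visible to the compiler, they can be optimized away entirely.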
Conclusion and Future Work
In this PR, I not only implemented the JSON indentation feature, but also saw first-hand that C++ templates can improve performance.
As future work, I plan to make the indentation width configurable (e.g., letting the user choose 3-space indentation).
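As a rough picture of what a configurable indentation width could look like from the caller's side, the sketch below re-indents an already minified JSON string with a caller-chosen step. This is a standalone, string-based illustration and not the approach simdjson takes, which serializes from the parsed DOM; the function name `indent_json` and its parameters are hypothetical, and the input is assumed to be valid, minified JSON.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: re-indent valid, minified JSON using `indent_step` spaces per level.
std::string indent_json(const std::string &minified, std::size_t indent_step) {
  std::string out;
  std::size_t depth = 0;
  bool in_string = false;
  // Start a new line at the current nesting depth.
  auto newline = [&] {
    out += '\n';
    out.append(depth * indent_step, ' ');
  };
  for (std::size_t i = 0; i < minified.size(); ++i) {
    const char c = minified[i];
    if (in_string) {
      out += c;
      if (c == '\\' && i + 1 < minified.size()) {
        out += minified[++i];  // copy the escaped character verbatim
      } else if (c == '"') {
        in_string = false;
      }
      continue;
    }
    switch (c) {
      case '"':
        in_string = true;
        out += c;
        break;
      case '{':
      case '[': {
        out += c;
        const char closer = (c == '{') ? '}' : ']';
        if (i + 1 < minified.size() && minified[i + 1] == closer) {
          out += closer;  // keep empty containers on one line
          ++i;
        } else {
          ++depth;
          newline();
        }
        break;
      }
      case '}':
      case ']':
        if (depth > 0) --depth;  // guard against malformed input
        newline();
        out += c;
        break;
      case ',':
        out += c;
        newline();
        break;
      case ':':
        out += ": ";
        break;
      default:
        out += c;  // digits, literals, and anything else pass through
    }
  }
  return out;
}
```

With this shape, `indent_json(text, 3)` gives 3-space indentation and `indent_json(text, 4)` matches the current fixed default. In the CRTP formatter itself, the equivalent change would presumably be to make `indent_step` settable rather than hard-coded.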
References
1. https://github.com/simdjson/simdjson
2. https://github.com/simdjson/simdjson/issues/1329
3. https://branchfree.org/2019/02/25/paper-parsing-gigabytes-of-json-per-second/
4. https://github.com/simdjson/simdjson/pull/2026#discussion_r1252196548
5. https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern