Yeah, on this note: first, JSON decoding of multiple events can be trivially parallelized, and second, the rapidjson C++ library is incredibly fast.
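Here is a minimal sketch of the parallel part. It assumes each event arrives as its own JSON string and that events don't depend on each other. `decode_event_id` and `decode_all_parallel` are hypothetical names. `decode_event_id` is a toy stand-in for a real decoder such as rapidjson: it only reads a top-level integer `"id"` field and does not parse general JSON.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a real JSON decoder (e.g. rapidjson): extracts the
// integer value of a top-level "id" key from one flat event, or -1 if absent.
// It only handles this simple shape, not general JSON.
long long decode_event_id(const std::string& event) {
    const std::string key = "\"id\":";
    std::size_t pos = event.find(key);
    if (pos == std::string::npos) return -1;
    // std::stoll skips leading whitespace and stops at the first non-digit.
    return std::stoll(event.substr(pos + key.size()));
}

// Events are independent, so each thread decodes a strided slice of the input
// and writes only to its own slots of the output; no locking is needed.
std::vector<long long> decode_all_parallel(const std::vector<std::string>& events,
                                           unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<long long> ids(events.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < events.size(); i += num_threads)
                ids[i] = decode_event_id(events[i]);
        });
    }
    for (auto& w : workers) w.join();  // all writes are done before we return
    return ids;
}
```

Each thread touches disjoint elements of `ids`, which is why no synchronization beyond `join` is needed. In a long-running service you would probably reuse a thread pool instead of spawning threads per batch.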
Finally, the biggest performance hit in this kind of work is the heap allocator, which gets called to allocate fresh memory for every object being decoded.
If you pre-allocate a large block of memory for the decoder output and use a library like rapidjson, where the entire result can be allocated out of one contiguous memory pool, it'll be blazing fast without any need to make architectural changes to the protocol.
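The pre-allocation idea can be sketched with just the C++17 standard library. `std::pmr::monotonic_buffer_resource` hands out memory from a buffer you allocate once, roughly the role rapidjson's `MemoryPoolAllocator` plays when you give it a user-supplied buffer. The `decode_fields` "decoder" below is a hypothetical toy that splits comma-separated `key=value` fields. It is not a JSON parser, and the buffer sizes are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <string>
#include <string_view>
#include <vector>

// Toy "decoder" (hypothetical, not a JSON parser): splits an event such as
// "a=1,b=2" into its comma-separated fields. All storage for the result (the
// vector and any non-SSO strings) comes from the memory resource passed in.
std::pmr::vector<std::pmr::string> decode_fields(std::string_view event,
                                                 std::pmr::memory_resource* mr) {
    assert(mr != nullptr);
    std::pmr::vector<std::pmr::string> fields(mr);
    std::size_t start = 0;
    while (start <= event.size()) {
        std::size_t end = event.find(',', start);
        if (end == std::string_view::npos) end = event.size();
        // pmr containers propagate the allocator, so each string uses `mr` too.
        fields.emplace_back(event.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}

// Decodes every event out of one pre-allocated arena. The upstream resource is
// null_memory_resource(), so an undersized arena throws std::bad_alloc instead
// of silently falling back to the heap.
std::size_t count_fields_in_arena(const std::vector<std::string>& events,
                                  std::size_t arena_bytes) {
    std::vector<std::byte> buffer(arena_bytes);  // the single up-front allocation
    std::pmr::monotonic_buffer_resource arena(buffer.data(), buffer.size(),
                                              std::pmr::null_memory_resource());
    std::size_t total = 0;
    for (const auto& e : events) total += decode_fields(e, &arena).size();
    return total;  // everything in the arena is released at once when it dies
}
```

A monotonic resource makes each allocation a pointer bump and turns individual deallocations into no-ops, which is where the speedup over a general-purpose heap comes from. Passing the default resource as upstream instead of `std::pmr::null_memory_resource()` would let an undersized arena fall back to the heap rather than fail.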