The per-Niagara-system cost of translucent shadows, smoky bullet-impact spam, and emitter aggregation

Edit: I posted this on the UE feedback forums: Subpar Performance of Volumetric Translucent Shadows & Arbitrary Constraints - Feedback for Unreal Engine team - Unreal Engine Forums
It’d be great if this gets more votes!

(this post only concerns translucent shadows)

I noticed that spamming shadow-casting particle systems with one particle each makes the GPU cost of shadow projection (or shadow depth, depending on which anti-aliasing method you’re using) skyrocket by several milliseconds. Unloading an entire 30-round magazine takes me from about 3 ms of total GPU time to 8 or 9 ms.

I then noticed that if I spawned 30 particles in a single system and shot a single bullet (so there are still 30 particles on screen), the GPU cost was near zero, yet the shadows displayed perfectly!
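For anyone who wants to reproduce the comparison, here’s a minimal sketch of the two spawn paths I tested. The ImpactSystem asset and the “SpawnCount” user parameter are placeholders for whatever your own impact effect exposes, so treat this as an illustration rather than drop-in code:

```cpp
// Repro sketch: N one-particle systems vs. one N-particle system.
// "ImpactSystem" and the "SpawnCount" user parameter are placeholders.
#include "NiagaraComponent.h"
#include "NiagaraFunctionLibrary.h"
#include "NiagaraSystem.h"

void SpawnManySmallSystems(UWorld* World, UNiagaraSystem* ImpactSystem, const TArray<FVector>& Hits)
{
    // Case A: one component per impact -> one scene proxy per impact,
    // so the shadow passes pay a per-proxy cost for every single hit.
    for (const FVector& Hit : Hits)
    {
        UNiagaraFunctionLibrary::SpawnSystemAtLocation(World, ImpactSystem, Hit);
    }
}

void SpawnOneBigSystem(UWorld* World, UNiagaraSystem* ImpactSystem, const FVector& Center, int32 Count)
{
    // Case B: a single component that emits all the particles -> one scene
    // proxy, and near-zero extra shadow cost in my tests. Assumes the emitter
    // reads a user float parameter to drive its burst count.
    if (UNiagaraComponent* Comp =
            UNiagaraFunctionLibrary::SpawnSystemAtLocation(World, ImpactSystem, Center))
    {
        Comp->SetVariableFloat(FName("SpawnCount"), static_cast<float>(Count));
    }
}
```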

I don’t know why; I suppose the explanation for this engine quirk is buried deep in the implementation details (the engine has substantial per-system overhead for some reason). But it raises an important question: how can this be used or worked around?

Could aggregating nearby particle emitters into the same particle system work? There are several concerns, such as keeping the system’s visual bounds small, etc. A rough sketch of what I have in mind is below.
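To make that concrete, here’s one shape such an aggregator could take. The “ImpactPositions” array parameter, and the assumption that the emitter reads it through a position-array data interface to burst one particle per entry, are my own setup choices for the sketch, not anything the engine prescribes:

```cpp
// Aggregation sketch: route every nearby impact into one shared Niagara
// component instead of spawning a new system per hit. Assumes the system
// exposes a position-array user parameter named "ImpactPositions" that the
// emitter reads to spawn one particle per entry.
#include "NiagaraComponent.h"
#include "NiagaraDataInterfaceArrayFunctionLibrary.h"

class FImpactAggregator
{
public:
    explicit FImpactAggregator(UNiagaraComponent* InSharedComponent)
        : SharedComponent(InSharedComponent) {}

    void AddImpact(const FVector& WorldPosition)
    {
        // Accumulate hits for this frame; you could also reject hits that are
        // too far away so the shared system's bounds don't balloon.
        PendingImpacts.Add(WorldPosition);
    }

    void Flush()
    {
        if (SharedComponent && PendingImpacts.Num() > 0)
        {
            UNiagaraDataInterfaceArrayFunctionLibrary::SetNiagaraArrayVector(
                SharedComponent, FName("ImpactPositions"), PendingImpacts);
            SharedComponent->Activate(/*bReset=*/true);
            PendingImpacts.Reset();
        }
    }

private:
    UNiagaraComponent* SharedComponent = nullptr;
    TArray<FVector> PendingImpacts;
};
```

The idea is simply that all impacts in a given area end up as particles inside one system (one scene proxy), instead of one system per impact.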

I’m just thinking out loud here, but I’m also very interested in other perspectives: did you notice the same thing? Did you work around this issue or ignore it? Do you have a neat solution for reducing the per-system shadow cost in a use case like spamming bullet-impact particles?

I don’t have shadows enabled on a single effect in our game because of the cost. And yes, instances will kill your performance; second after that is the number of emitters.
Particles themselves are dead cheap.

This is a shame, because the constraint is arbitrary. I really feel like this needs to be improved somehow. I don’t want to turn this into a C++ discussion, but isn’t VFX also about optimization? And it’s all C++ behind the curtain anyway, isn’t it? I’m going to think out loud, and if anyone feels like tuning in to help fix this arbitrary constraint, please do.

Looking at the engine’s source code in ShadowSetup.cpp, inside the FProjectedShadowInfo::GatherDynamicMeshElementsArray method, some logging reveals that each system instance does indeed appear as its own FPrimitiveSceneProxy, and the loop inside that method iterates over those instances. Each instance is then added as a separate primitive to the renderer’s mesh collector, which, to my understanding, batches meshes of a similar type, as the code inside NiagaraRendererSprites.cpp reveals. Meshes are loaded into an FMeshBatch, whose documentation reads: “A batch of mesh elements, all with the same material and vertex buffer”

I assume each batch is then rendered in its own pass into the shadow buffer (or whatever the target is), so more systems means more passes. The ridiculous part is that at the end of the day our main tool is sprites, which share the same vertex buffer and, in most situations, the same material as well.
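To put rough numbers on it: 30 impact systems with 1 particle each means 30 separate proxies, so the shadow gather ends up with 30 mesh batches to process, while 1 system with 30 particles produces just 1. The per-particle work is the same in both cases; it’s the per-system, per-batch overhead that gets multiplied by 30, which lines up with the 3 ms to 8-9 ms jump I measured.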

What boggles my mind is why Epic didn’t merge these batches, or add some special-case code to batch sprites that share a material. This is exactly the kind of thing that can take a game from feasible to impossible.

I might work up the courage to tinker with this and post a solution
