JVM Settings

I'm using Oracle GraalVM, the enterprise-grade one. Finding the download sucks. GraalVM 21.0.7 download

You can swap whatever the current version is into that download URL and it should work.

I have built up the following set of arguments myself.

-Xms12G -Xmx12G -XX:SoftMaxHeapSize=11G -XX:ConcGCThreads=12 -XX:ParallelGCThreads=4 -XX:ReservedCodeCacheSize=400M -XX:ProfiledCodeHeapSize=194M -XX:NonProfiledCodeHeapSize=194M -XX:NonNMethodCodeHeapSize=12M -XX:MaxNodeLimit=240000 -XX:NodeLimitFudgeFactor=8000 -XX:InlineSmallCode=1000 -XX:FreqInlineSize=100 -XX:NmethodSweepActivity=1  -XX:ThreadPriorityPolicy=1 -XX:UseAVX=2 -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+UseZGC -XX:+ZUncommit -XX:+ZProactive -XX:+DisableExplicitGC -XX:+UseDynamicNumberOfGCThreads -XX:+ParallelRefProcEnabled -XX:+AlwaysPreTouch -XX:AllocatePrefetchStyle=3 -XX:AllocatePrefetchDistance=512 -XX:AllocatePrefetchLines=4 -XX:AllocatePrefetchStepSize=128 -XX:AllocateInstancePrefetchLines=2 -XX:AllocatePrefetchInstr=1 -XX:+PerfDisableSharedMem -XX:+OptimizeFill -XX:+OptimizeStringConcat -XX:+UseCodeCacheFlushing -XX:+UseOnStackReplacement -XX:+UseStringDeduplication -XX:+UseLoopPredicate -XX:+UseCharacterCompareIntrinsics -XX:+UseCopySignIntrinsic -XX:+UseFastUnorderedTimeStamps -XX:+UseCriticalJavaThreadPriority -XX:+UseInlineCaches -XX:+TrustFinalNonStaticFields -XX:+EnableVectorSupport -XX:+EnableVectorAggressiveReboxing -XX:+UseVectorStubs -XX:+UseVectorCmov -XX:+UseFMA -XX:+UseUnalignedAccesses -XX:+AlignVector -XX:-UseSubwordForMaxVector -XX:-SuperWordLoopUnrollAnalysis -XX:+AlwaysActAsServerClassMachine -XX:+TieredCompilation -XX:TieredStopAtLevel=4 -XX:+AlwaysCompileLoopMethods -XX:-DontCompileHugeMethods -XX:+OptoScheduling -XX:+OptoBundling -XX:+UseNUMA -XX:+UseJVMCICompiler -XX:+EnableJVMCIProduct -XX:+EagerJVMCI -Djdk.graal.CompilerConfiguration=enterprise -Djdk.graal.AlwaysInlineIntrinsics=true -Djdk.graal.InlineGraalStubs=true -Djdk.graal.LoopVectorizationKeepPostLoop=true -Djdk.graal.SIMDVectorizationDirectLoadStore=true -Djdk.graal.SIMDVectorizationSingletons=true -Djdk.graal.UnrollEmptyLoops=true -Djdk.graal.FullUnroll=false -Djdk.graal.FullUnrollCodeSizeBudgetFactorForSmallGraphs=2.0 
-Djdk.graal.GraphCompressionThreshold=70 -Djdk.graal.VectorizeSIMD=true -Djdk.graal.OptEliminatePartiallyRedundantGuards=true -Djdk.graal.ExplicitNullChecks=true -Djdk.graal.EnableEscapeAnalysis=true -Dgraal.TuneInlinerExploration=1 -Dgraal.LoopRotation=true -Dgraal.OptWriteMotion=true -Dgraal.CompilerConfiguration=enterprise --add-modules=jdk.incubator.vector -XX:-CreateCoredumpOnCrash -XX:+DoEscapeAnalysis -XX:MaxInlineSize=96 -XX:MaxInlineLevel=20 -XX:MaxRecursiveInlineLevel=2 -XX:LiveNodeCountInliningCutoff=40000 -XX:LoopUnrollLimit=60 -XX:LoopUnrollMin=4 -XX:+InlineObjectHash -XX:+InlineThreadNatives -XX:+InlineUnsafeOps -XX:+OptimizeExpensiveOps -XX:+SpecialStringEquals -XX:+SpecialArraysEquals -XX:+UseMathExactIntrinsics -XX:+UseMulAddIntrinsic -XX:+UseSquareToLenIntrinsic -XX:+UseVectorizedMismatchIntrinsic -XX:+OmitStackTraceInFastThrow -XX:+RewriteBytecodes -XX:+RewriteFrequentPairs -XX:+UseFastJNIAccessors -XX:+UseFPUForSpilling -XX:+SegmentedCodeCache -XX:+UseThreadPriorities
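
With an argument list this long, it helps to confirm which values the running JVM actually ended up with. A minimal sketch, assuming a HotSpot-based JVM (GraalVM is one) that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the class name `FlagCheck` and the handful of flags it prints are illustrative choices, not part of the original setup:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration: reads VM flags back at runtime.
final class FlagCheck {
    // Returns "Name=value (ORIGIN)", or a note when this JVM does not know the flag.
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = bean.getVMOption(flagName);
            return option.getName() + "=" + option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return flagName + " is not a flag known to this JVM";
        }
    }

    public static void main(String[] args) {
        String[] flags = {"UseZGC", "MaxHeapSize", "SoftMaxHeapSize", "ReservedCodeCacheSize"};
        for (String flag : flags) {
            System.out.println(describe(flag));
        }
    }
}
```

Run it with the same arguments as the game. Flags that were set on the command line are normally reported with the origin `VM_CREATION`, while untouched ones show `DEFAULT` or `ERGONOMIC`.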

Here are the same settings scaled up for big ol' modded server use on an EPYC 9654. NOTE: chunk generation throughput etc. might be a bit lower than with other arg sets, but core items like load-in and max TPS should be superb.

-Xms32G -Xmx32G -XX:SoftMaxHeapSize=28G -XX:ConcGCThreads=64 -XX:ParallelGCThreads=24 -XX:ReservedCodeCacheSize=800M -XX:ProfiledCodeHeapSize=390M -XX:NonProfiledCodeHeapSize=390M -XX:NonNMethodCodeHeapSize=20M -XX:MaxNodeLimit=480000 -XX:NodeLimitFudgeFactor=16000 -XX:InlineSmallCode=2000 -XX:FreqInlineSize=100 -XX:NmethodSweepActivity=1  -XX:ThreadPriorityPolicy=1 -XX:UseAVX=2 -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+UseZGC -XX:+ZUncommit -XX:+ZProactive -XX:+DisableExplicitGC -XX:+UseDynamicNumberOfGCThreads -XX:+ParallelRefProcEnabled -XX:+AlwaysPreTouch -XX:AllocatePrefetchStyle=3 -XX:AllocatePrefetchDistance=512 -XX:AllocatePrefetchLines=4 -XX:AllocatePrefetchStepSize=128 -XX:AllocateInstancePrefetchLines=2 -XX:AllocatePrefetchInstr=1 -XX:+PerfDisableSharedMem -XX:+OptimizeFill -XX:+OptimizeStringConcat -XX:+UseCodeCacheFlushing -XX:+UseOnStackReplacement -XX:+UseStringDeduplication -XX:+UseLoopPredicate -XX:+UseCharacterCompareIntrinsics -XX:+UseCopySignIntrinsic -XX:+UseFastUnorderedTimeStamps -XX:+UseCriticalJavaThreadPriority -XX:+UseInlineCaches -XX:+TrustFinalNonStaticFields -XX:+EnableVectorSupport -XX:+EnableVectorAggressiveReboxing -XX:+UseVectorStubs -XX:+UseVectorCmov -XX:+UseFMA -XX:+UseUnalignedAccesses -XX:+AlignVector -XX:-UseSubwordForMaxVector -XX:-SuperWordLoopUnrollAnalysis -XX:+AlwaysActAsServerClassMachine -XX:+TieredCompilation -XX:TieredStopAtLevel=4 -XX:+AlwaysCompileLoopMethods -XX:-DontCompileHugeMethods -XX:+OptoScheduling -XX:+OptoBundling -XX:+UseNUMA -XX:+UseJVMCICompiler -XX:+EnableJVMCIProduct -XX:+EagerJVMCI -Djdk.graal.CompilerConfiguration=enterprise -Djdk.graal.AlwaysInlineIntrinsics=true -Djdk.graal.InlineGraalStubs=true -Djdk.graal.LoopVectorizationKeepPostLoop=true -Djdk.graal.SIMDVectorizationDirectLoadStore=true -Djdk.graal.SIMDVectorizationSingletons=true -Djdk.graal.UnrollEmptyLoops=true -Djdk.graal.FullUnroll=false -Djdk.graal.FullUnrollCodeSizeBudgetFactorForSmallGraphs=2.0 
-Djdk.graal.GraphCompressionThreshold=70 -Djdk.graal.VectorizeSIMD=true -Djdk.graal.OptEliminatePartiallyRedundantGuards=true -Djdk.graal.ExplicitNullChecks=true -Djdk.graal.EnableEscapeAnalysis=true -Dgraal.TuneInlinerExploration=1 -Dgraal.LoopRotation=true -Dgraal.OptWriteMotion=true -Dgraal.CompilerConfiguration=enterprise --add-modules=jdk.incubator.vector -XX:-CreateCoredumpOnCrash -XX:+DoEscapeAnalysis -XX:MaxInlineSize=96 -XX:MaxInlineLevel=20 -XX:MaxRecursiveInlineLevel=2 -XX:LiveNodeCountInliningCutoff=40000 -XX:LoopUnrollLimit=60 -XX:LoopUnrollMin=4 -XX:+InlineObjectHash -XX:+InlineThreadNatives -XX:+InlineUnsafeOps -XX:+OptimizeExpensiveOps -XX:+SpecialStringEquals -XX:+SpecialArraysEquals -XX:+UseMathExactIntrinsics -XX:+UseMulAddIntrinsic -XX:+UseSquareToLenIntrinsic -XX:+UseVectorizedMismatchIntrinsic -XX:+OmitStackTraceInFastThrow -XX:+RewriteBytecodes -XX:+RewriteFrequentPairs -XX:+UseFastJNIAccessors -XX:+UseFPUForSpilling -XX:+SegmentedCodeCache -XX:+UseThreadPriorities
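
The two argument sets are nearly identical, so it is easier to see what was scaled by diffing them. A rough sketch of such a diff; `ArgDiff` and its methods are hypothetical names made up for this example, and the strings in `main` are only shortened excerpts of the two sets above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration: lists flags whose value differs between two arg sets.
final class ArgDiff {
    // Reduces a token to a comparable key, e.g. "-XX:ConcGCThreads=12" -> "-XX:ConcGCThreads".
    static String key(String token) {
        if (token.startsWith("-Xms") || token.startsWith("-Xmx")) {
            return token.substring(0, 4);
        }
        int eq = token.indexOf('=');
        if (eq >= 0) {
            return token.substring(0, eq);
        }
        if (token.startsWith("-XX:+") || token.startsWith("-XX:-")) {
            return "-XX:" + token.substring(5);
        }
        return token;
    }

    // Returns "old -> new" for every flag present in both sets with a different value.
    static List<String> changed(String base, String scaled) {
        Map<String, String> baseTokens = new HashMap<>();
        for (String token : base.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                baseTokens.put(key(token), token);
            }
        }
        List<String> changes = new ArrayList<>();
        for (String token : scaled.trim().split("\\s+")) {
            String old = baseTokens.get(key(token));
            if (old != null && !old.equals(token)) {
                changes.add(old + " -> " + token);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        String base = "-Xms12G -Xmx12G -XX:SoftMaxHeapSize=11G -XX:ConcGCThreads=12 -XX:+UseZGC";
        String scaled = "-Xms32G -Xmx32G -XX:SoftMaxHeapSize=28G -XX:ConcGCThreads=64 -XX:+UseZGC";
        changed(base, scaled).forEach(System.out::println);
    }
}
```

Pasting the full sets in place of the excerpts shows that only the heap, GC thread, code cache, and node/inline limit values differ.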

I do not use large page memory arguments (-XX:+UseLargePages -XX:+UseLargePagesIndividualAllocation -XX:+UseTransparentHugePages -XX:+UseHugeTLBFS) because they:

- are OS specific (the huge* flags are Linux only)
- are only supported under root on Linux
- are only supported under freak conditions on Windows
- do not show any improvement I can detect

I do not include things like -XX:+UseCompressedOops because I don't care about compressing object pointers and other junk that only results in a minimal memory saving while costing me CPU cycles. ZGC does not support compressed oops in any case.

I have the JVM requesting AVX2 instructions, not AVX-512. Many CPUs don't support AVX-512, and there is a high potential for hurting cache performance when code optimized for these instructions does not execute efficiently. I may investigate enabling it on my 9654 CPU.

DO NOT give Minecraft a ton more memory than it needs. This will actually make performance WORSE, since the garbage collector now has to fly around that much more memory, and it will also make startup time worse. I find that you can give optimized, HEAVILY modded Minecraft ~6GB of memory and it'll be super happy. I run with 12GB.
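
To judge how much heap a given setup really needs, it helps to look at actual usage rather than guess. A small sketch using the standard `Runtime` API; `HeapReport` is a made-up name, and the number is only a snapshot that still includes garbage not yet collected, so treat it as an upper bound:

```java
// Hypothetical helper for illustration: prints a snapshot of heap usage.
final class HeapReport {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();                       // roughly the -Xmx value
        long used = rt.totalMemory() - rt.freeMemory();  // live objects plus uncollected garbage
        return "heap used " + toMiB(used) + " MiB of max " + toMiB(max) + " MiB";
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

The same two `Runtime` calls could be logged periodically from a mod or agent to see where usage settles after the pack has fully loaded.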

Here is the jvm arg optimization strategy:

Optimize the GC with ZGC
- ZGC is concurrent and scales excellently with threads
- has an extremely minimal impact on "spikes". Low pause times and good allocation spike handling
- Generational ZGC (-XX:+ZGenerational) cannot be used with the Graal compiler in JDK 21, so this still uses non-generational ZGC
- ZGC is more CPU and memory intensive, but is much better for latency and a stutter-free experience
Compile time optimize a bunch of stuff
- Run the C2 compiler everywhere (large methods, etc)
- Compile a variety of code segments to SIMD
- Run extra C2 features like
  - CPU pipeline optimization
  - Copy operations
Careful considerations
- Avoid optimizing / vectorizing / unrolling code that will yield minimal or no performance boost, as this will cost us l1/l2 cache space without any benefits.
- Avoid over optimizing in general which will result in poor performance in certain areas, which will cause big stuttering problems in certain scenarios.

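-XX:-OmitStackTraceInFastThrow
Preserves stack traces even on frequently thrown exceptions. Note that the argument sets above pass -XX:+OmitStackTraceInFastThrow, which lets the JIT drop those traces; swap the + for a - if you need the full traces.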
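
The behaviour behind this flag can be observed with a small experiment: throw the same implicit exception from a hot loop and watch for the point where the stack trace becomes empty. This is a sketch under assumptions; `FastThrowDemo` is a made-up name, and whether (and after how many iterations) a traceless exception shows up depends on the JIT in use and on the flag setting:

```java
// Hypothetical experiment for illustration: detects when the JIT starts omitting stack traces.
final class FastThrowDemo {
    static String text = null; // deliberately null so text.length() throws

    // Returns the first iteration whose NullPointerException has no stack trace, or -1 if none.
    static int firstTracelessThrow(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                text.length();
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int hit = firstTracelessThrow(200_000);
        System.out.println(hit < 0
                ? "every exception carried a stack trace"
                : "stack trace omitted from iteration " + hit);
    }
}
```

Running it once with -XX:+OmitStackTraceInFastThrow and once with -XX:-OmitStackTraceInFastThrow should make the difference visible; with the minus form the method is expected to return -1.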

JVM Arg Descriptions

The following list was formatted by ChatGPT. Enjoy the emojis.

🧱 Memory & Heap Settings

-Xms12G
Initial heap size. Pre-allocates memory to avoid costly resizing and ensures Minecraft has immediate access to memory. I generally set this equal to -Xmx to flat out prevent heap resizing, because why would I run something with RAM I don't have? The only downside is that reservation at startup takes longer.
-Xmx12G
Maximum heap size. Matches -Xms for predictable GC behavior and fewer memory reallocations.
-XX:SoftMaxHeapSize=11G
ZGC heap usage guidance. Tells ZGC to try to stay within this size for better memory pressure control. I set it close to, but not at, the heap max so that it actually engages. Setting this a lot lower than the heap max will make the GC go nuts and is wasteful.
-XX:ReservedCodeCacheSize=400M
JIT code storage. Prevents code eviction under load from mod-heavy codebases.
-XX:ProfiledCodeHeapSize=194M
Profiled code segment. Holds JIT-compiled methods that still carry profiling instrumentation (the lower tiers), which are usually short-lived.
-XX:NonProfiledCodeHeapSize=194M
Non-profiled code segment. Holds fully optimized JIT-compiled methods, which tend to stay around.
-XX:NonNMethodCodeHeapSize=12M
Native stubs and other non-method code.
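
The three segment sizes partition the reserved code cache, and in both argument sets they add up exactly to ReservedCodeCacheSize (194 + 194 + 12 = 400 and 390 + 390 + 20 = 800). A tiny sketch for re-checking that arithmetic when changing the numbers; the class and method names are made up for illustration, and how a particular JVM build reacts to a mismatch (error or silent adjustment) is something to verify rather than assume:

```java
// Hypothetical helper for illustration: checks that code cache segments fill the reserved size.
final class CodeCacheSegments {
    static boolean sumsToReserved(int reservedMb, int profiledMb, int nonProfiledMb, int nonNMethodMb) {
        return profiledMb + nonProfiledMb + nonNMethodMb == reservedMb;
    }

    public static void main(String[] args) {
        // Values taken from the 12G and 32G argument sets.
        System.out.println("12G set consistent: " + sumsToReserved(400, 194, 194, 12));
        System.out.println("32G set consistent: " + sumsToReserved(800, 390, 390, 20));
    }
}
```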

🧵 Threading Configuration

-XX:ParallelGCThreads=4
Stop-the-world GC worker threads. ZGC only uses these briefly, during its short pause phases. Kept low to reduce thread contention.
-XX:ConcGCThreads=12
Concurrent GC threads. Handles background GC phases like marking. More threads = faster background cleanup.
-XX:NmethodSweepActivity=1
Code cache cleanup aggressiveness. Keeps stale methods from bloating memory.
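
When porting the thread counts to a different machine, it can help to express them as a share of the logical CPUs that the JVM sees. A rough sketch; `GcThreadShare` is a hypothetical helper, and what counts as a sensible share is a judgment call rather than anything the JVM enforces:

```java
// Hypothetical helper for illustration: expresses GC thread counts as a share of logical CPUs.
final class GcThreadShare {
    static int percentOfCpus(int gcThreads, int logicalCpus) {
        if (logicalCpus <= 0) {
            throw new IllegalArgumentException("logicalCpus must be positive");
        }
        return Math.round(100f * gcThreads / logicalCpus);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // 12 and 4 are the ConcGCThreads and ParallelGCThreads values from the 12G set.
        System.out.println("ConcGCThreads=12 is " + percentOfCpus(12, cpus) + "% of " + cpus + " CPUs");
        System.out.println("ParallelGCThreads=4 is " + percentOfCpus(4, cpus) + "% of " + cpus + " CPUs");
    }
}
```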

⚙️ Compiler & Code Generation

-XX:MaxNodeLimit=240000
C2 IR node cap. Prevents compiler bailouts on large methods.
-XX:NodeLimitFudgeFactor=8000
Extra node buffer. Allows a slight overrun of the C2 node limit.
-XX:InlineSmallCode=1000
Inlining bytecode limit. Controls inlining aggressiveness for small methods.
-XX:FreqInlineSize=100
Inlining threshold for hot methods.
-XX:AllocatePrefetchStyle=3
Prefetch strategy. Reduces L1/L2 cache stress during allocation bursts.
-XX:ThreadPriorityPolicy=1
Thread priority mode. Enables Java thread priority enforcement.
-XX:UseAVX=2
AVX2 instruction set. Enables advanced vector instructions (safe on modern CPUs).

🔓 Unlock Flags (Must be Early)

-XX:+UnlockExperimentalVMOptions
Enables use of experimental JVM flags.
-XX:+UnlockDiagnosticVMOptions
Enables diagnostic flags and debugging capabilities.

🧹 ZGC Garbage Collection

-XX:+UseZGC
Enables the Z Garbage Collector for low-latency GC behavior.
-XX:+ZUncommit
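Returns unused memory to the OS automatically. ZGC never uncommits below -Xms, so with -Xms equal to -Xmx (as in the sets above) this is effectively a no-op.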
-XX:+ZProactive
Allows ZGC to start collection before memory is tight, improving consistency.
-XX:+ParallelRefProcEnabled
Parallel reference processing (soft/weak/phantom refs).
-XX:+AlwaysPreTouch
Touches every heap page at startup so the memory is committed up front instead of faulting in during play. This is where the longer startup comes from.
-XX:+DisableExplicitGC
Disables System.gc() calls to prevent mods from triggering full GCs.
-XX:+UseDynamicNumberOfGCThreads
Lets ZGC scale thread use adaptively.
-XX:+PerfDisableSharedMem
Disables perfdata memory mapping to reduce native memory pressure.
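
To confirm that ZGC is really the active collector (and not a silent fallback), the collector beans can be listed at runtime. A minimal sketch using the standard `java.lang.management` API; the class name `ZgcCheck` is made up, and the assumption that ZGC's bean names start with "ZGC" (for example "ZGC Cycles" and "ZGC Pauses") should be checked against the output on your own JVM:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration: reports which garbage collector beans are registered.
final class ZgcCheck {
    static List<String> activeCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // Assumption: ZGC registers beans whose names begin with "ZGC".
    static boolean looksLikeZgc(List<String> collectorNames) {
        for (String name : collectorNames) {
            if (name.startsWith("ZGC")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = activeCollectors();
        System.out.println("collectors: " + names);
        System.out.println("ZGC active: " + looksLikeZgc(names));
    }
}
```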

🔧 Runtime Optimizations

-XX:+OptimizeFill
Speeds up object array filling.
-XX:+OptimizeStringConcat
Improves performance of String concatenation.
-XX:+UseCodeCacheFlushing
Flushes unused JIT code to avoid code cache overflow.
-XX:+UseOnStackReplacement
Enables OSR to optimize long-running interpreted loops.
-XX:+UseStringDeduplication
Saves memory by deduplicating identical strings.
-XX:+UseLoopPredicate
Helps eliminate bounds checks in loops.
-XX:+UseCharacterCompareIntrinsics
Optimizes character array comparisons.
-XX:+UseCopySignIntrinsic
Optimizes Math.copySign().
-XX:+UseFastUnorderedTimeStamps
Faster and lower-overhead timestamp logic.
-XX:+UseCriticalJavaThreadPriority
Tries to keep key Java threads from being preempted.
-XX:+UseInlineCaches
Speeds up virtual calls by caching resolved methods.
-XX:+TrustFinalNonStaticFields
Enables compiler optimizations on final fields (common in Minecraft data classes).

🧮 Vector & SIMD Optimizations

-XX:+EnableVectorSupport
Enables vector API and hardware acceleration.
-XX:+EnableVectorAggressiveReboxing
Helps with boxed vector types (can affect mod internals).
-XX:+UseVectorStubs
Enables native vector helper methods.
-XX:+UseVectorCmov
Uses conditional moves to reduce branching in SIMD code.
-XX:+UseFMA
Enables fused multiply-add for improved math precision/speed.
-XX:+UseUnalignedAccesses
Allows more efficient memory reads on modern CPUs.
-XX:+AlignVector
Aligns vector loads/stores for performance.
-XX:-UseSubwordForMaxVector
Stops the compiler from using subword (byte/short) analysis to pick the maximum vector size.
-XX:-SuperWordLoopUnrollAnalysis
Disables analysis that might over-unroll loops.
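
For context, this is the kind of code these flags are aimed at: a simple counted loop over arrays with a fused multiply-add in the body. A minimal sketch; `FmaLoop` is an illustrative name, and whether the JIT actually turns the loop into SIMD instructions depends on the CPU, the compiler in use, and the flags, none of which this example can guarantee:

```java
// Illustrative example: a loop shape that JIT compilers can often auto-vectorize.
final class FmaLoop {
    // Computes out[i] = a[i] * b[i] + c[i] using a fused multiply-add per element.
    static void multiplyAdd(float[] a, float[] b, float[] c, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.fma(a[i], b[i], c[i]);
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f};
        float[] b = {3f, 4f};
        float[] c = {5f, 6f};
        float[] out = new float[2];
        multiplyAdd(a, b, c, out);
        System.out.println(out[0] + ", " + out[1]);
    }
}
```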

🔥 C2/HotSpot Backend Tuning

-XX:+AlwaysActAsServerClassMachine
Forces server-optimized JIT behavior.
-XX:+TieredCompilation
Enables both C1 and C2 JIT levels (default, but explicit here).
-XX:TieredStopAtLevel=4
Lets tiered compilation go all the way to level 4, the top-tier optimizing compiler. This is the default; lower values would cap compilation at C1.
-XX:+AlwaysCompileLoopMethods
Forces JIT to always compile loop-containing methods.
-XX:-DontCompileHugeMethods
Allows very large methods to be compiled (mod-heavy logic).
-XX:+OptoScheduling
Enables instruction scheduling in the optimizer.
-XX:+OptoBundling
Improves instruction bundling for modern CPUs.
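
The JIT setup that results from these flags can be inspected from inside the JVM. A small sketch using the standard `CompilationMXBean`; `JitInfo` is a made-up name, and the exact compiler name string it prints varies between JVM builds:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration: reports the JIT compiler name and time spent compiling.
final class JitInfo {
    static String describe() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported";
        }
        String time = jit.isCompilationTimeMonitoringSupported()
                ? jit.getTotalCompilationTime() + " ms"
                : "n/a";
        return jit.getName() + ", total compile time " + time;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Comparing the total compile time between argument sets gives a rough feel for how much extra JIT work the more aggressive settings cause.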

🚀 GraalVM & JVMCI Configuration

-XX:+UseJVMCICompiler
Uses the JVMCI compiler (Graal) as the top-tier JIT in place of C2.
-XX:+EnableJVMCIProduct
Treats JVMCI as a product feature rather than an experimental one, so its flags work without the experimental unlock.
-Djdk.graal.CompilerConfiguration=enterprise
Uses the full Graal Enterprise optimizer if supported.
-Djdk.graal.AlwaysInlineIntrinsics=true
Forces inlining of intrinsic methods for speed.
-Djdk.graal.InlineGraalStubs=true
Improves inlining of Graal stubs (low-level helpers).
-Djdk.graal.LoopVectorizationKeepPostLoop=true
Retains post-loop cleanup for safety/performance.
-Djdk.graal.SIMDVectorizationDirectLoadStore=true
Enables direct memory access for SIMD ops.
-Djdk.graal.SIMDVectorizationSingletons=true
Optimizes singleton use in vectorized code.
-Djdk.graal.UnrollEmptyLoops=true
Unrolls loops with no internal body logic.
-Djdk.graal.FullUnroll=false
Disables aggressive full-loop unrolling.
-Djdk.graal.FullUnrollCodeSizeBudgetFactorForSmallGraphs=2.0
Allows more budget for unrolling small hot methods.
-Djdk.graal.GraphCompressionThreshold=70
Triggers compression of large compiler graphs.
-Djdk.graal.VectorizeSIMD=true
Enables SIMD auto-vectorization in Graal.
-Djdk.graal.OptEliminatePartiallyRedundantGuards=true
Removes duplicate safety guards in compiled code.
-Djdk.graal.ExplicitNullChecks=true
Makes null checks more efficient (with C2/Graal fusion).
-Djdk.graal.EnableEscapeAnalysis=true
Enables escape analysis, so short-lived objects that never leave a method can be replaced by plain values instead of being allocated on the heap.
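
As an illustration of what escape analysis targets, the sketch below allocates a tiny object per loop iteration that never leaves the method. The names `EscapeDemo` and `Vec` are made up for this example, and whether the allocation is actually eliminated depends on the JIT and its inlining decisions:

```java
// Illustrative example: a temporary object that never escapes the method creating it.
final class EscapeDemo {
    private static final class Vec {
        final int x;
        final int y;

        Vec(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int lengthSquared() {
            return x * x + y * y;
        }
    }

    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Vec v = new Vec(i, i + 1); // only used inside this iteration
            total += v.lengthSquared();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3)); // 1 + 5 + 13 = 19
    }
}
```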