Recently we ran into an issue with our edge VMs on NSX-T 3.1.0.0: the datapath mempool usage for pfstate3 was at a critical level (100%) and the edge VMs were dropping packets. Failing over to another edge by putting the affected edge into NSX Maintenance Mode was only a quick fix.
This is a known VMware issue caused by a bug in version 3.1.0.0: the firewall service on the Edge leaks memory. The edge syslog shows entries like these:
syslog.9:2022-02-16T08:49:01.306Z edgeVM01 NSX 4451 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewalldp" level="ERROR"] Memory resource from hugepage exhausted in the firewall service size=912(0M)
syslog.9:2022-02-16T08:49:01.307387+00:00 edgeVM01 f16e2e57db2b 3179 - - 2022-02-16T08:49:01Z datapathd 4451 firewalldp [ERROR] Memory resource from hugepage exhausted in the firewall service size=912(0M)
syslog.9:2022-02-16T08:49:01.307708+00:00 edgeVM01 datapath-systemd-helper 4332 - - 2022-02-16T08:49:01Z datapathd 4451 firewalldp [ERROR] Memory resource from hugepage exhausted in the firewall service size=912(0M)
This can happen even if the gateway firewall on the Edge is not utilized.
Increasing the size of the edge makes the issue less frequent, but the permanent fix was released in version 3.1.3.6. However, after we upgraded to 3.1.3.6, that version was withdrawn because of a similar datapath memory leak.
https://kb.vmware.com/s/article/87806
So we started a new upgrade to 3.1.3.7...
After the upgrade to 3.1.3.7, the issue was still present. So the next step is resizing the edge to a larger form factor.