Hitting the PF state table limit

I recently had an issue with an OpenBSD firewall where the number of state table entries was hitting the default limit of 10,000. When this limit is reached, no new state entries can be created. If your rules use "keep state", "modulate state" or "synproxy state", or if you're running OpenBSD 4.1 or newer (where "keep state" is the default on all rules), this could mean that:

You cannot make new connections through the firewall
You cannot make new connections to the firewall
You cannot make new connections from the firewall

So... if you hit the state table limit, it's kinda bad, mmmkay?

The default state table size limit is 10,000 entries. You can confirm the limit on your own firewall with "pfctl -sm" and look at the "states" line.

# pfctl -sm
states        hard limit    10000   <<<< state limit <<<<
src-nodes     hard limit    10000
frags         hard limit     5000
tables        hard limit     1000
table-entries hard limit   200000

To see what your current state table usage is, use the "pfctl -si" command and look for "current entries" under the state table section.

# pfctl -si
[...]
State Table                     Total             Rate
  current entries                3134
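
To see the two numbers side by side, a small shell sketch can pull them out and turn them into a percentage. The function names (state_limit, state_count, state_usage) are made up for illustration, and the awk field positions assume the "pfctl -sm" and "pfctl -si" layouts shown above.

```shell
#!/bin/sh
# Sketch: report state table utilisation as a percentage.
# Field positions are assumed from the pfctl output shown in this post.

# Print the states hard limit from "pfctl -sm" style text on stdin.
state_limit() {
    awk '$1 == "states" && $2 == "hard" { print $4; exit }'
}

# Print the current entry count from "pfctl -si" style text on stdin.
state_count() {
    awk '$1 == "current" && $2 == "entries" { print $3; exit }'
}

# Print "count/limit (pct%)" given the count and the limit.
state_usage() {
    awk -v n="$1" -v max="$2" \
        'BEGIN { printf "%d/%d (%.1f%%)\n", n, max, 100 * n / max }'
}

# On a live firewall (as root), something like:
#   state_usage "$(pfctl -si | state_count)" "$(pfctl -sm | state_limit)"
```

With the numbers shown above, state_usage 3134 10000 prints "3134/10000 (31.3%)".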

Keep in mind that the current entries figure is an instantaneous value and will change constantly on a busy firewall. You can get an idea of whether you've ever bumped up against the limit by looking at the memory counter:

# pfctl -si | grep memory
  memory                            423433            0.0/s

As you can see, the counter reads over 400k, meaning this firewall has hit the limit quite a few times. The second number is the average rate (hits per second) at which the memory limit has been hit since the stats were last cleared. Another good counter to check is the number of failed allocations from the pfstatepl memory pool.

# vmstat -m | grep -E 'pfstatepl|Fail'
Name        Size Requests Fail    InUse Pgreq Pgrel  ...
pfstatepl    204 1130451081 424797 2833   527     0  ...

Here we can also see over 400k failed memory allocations for state table entries. This happens because the pfstatepl memory pool is sized according to the state table size limit. Once there are 10,000 states, the pool is 100% full, so when PF tries to create the 10,001st entry, the memory allocation fails.
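
Both counters are cumulative, so a large number only says the limit was hit at some point. One way to tell whether it is still happening is to take two samples and compare them. The following is a rough sketch of that idea: the function names (memory_hits, pool_fails, delta) are hypothetical, and the column positions are assumed from the output shown above.

```shell
#!/bin/sh
# Sketch: extract the two cumulative failure counters so that two samples
# taken some time apart can be compared.

# "memory" counter from "pfctl -si" style text on stdin.
memory_hits() {
    awk '$1 == "memory" { print $2; exit }'
}

# "Fail" column (4th field) for the pfstatepl pool from "vmstat -m" style text.
pool_fails() {
    awk '$1 == "pfstatepl" { print $4; exit }'
}

# Difference between an earlier sample ($1) and a later sample ($2).
delta() {
    echo $(( $2 - $1 ))
}

# On a live firewall (as root), something like:
#   before=$(pfctl -si | memory_hits); sleep 60
#   after=$(pfctl -si | memory_hits)
#   echo "memory limit hits in the last minute: $(delta "$before" "$after")"
```

A delta of zero over a busy period suggests the old hits are history; a delta that keeps growing means the limit is still being reached.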

In order to get a historical view of your state table or of the memory counter, consider installing the PF SNMP MIBs and monitoring them using SNMP.

So I knew I was hitting the 10,000 limit regularly on this firewall, but I didn't know why. The firewall was busy, but it shouldn't have been anywhere near 10,000 states. I could view the state table entries with pfctl, but it has no reporting or even sorting ability, so it wasn't going to do me much good on its own.

Instead I pulled together a quick Perl script to show me the top 10 source IP addresses across all state entries. The script parses the output of "pfctl -ss" to find the top IPs. I limited my search to a specific interface by passing "-i vlan123" to pfctl, which lists only the states associated with interface vlan123. Depending on how your PF ruleset is written, you could end up double counting your states if you do not specify an interface.

# pfctl -ss -i vlan123 | perl top-pf-states.pl
192.168.33.6 (3615)
192.168.33.21 (131)
192.168.33.18 (124)
192.168.33.73 (102)
192.168.33.94 (56)
192.168.1.24 (42)
10.15.254.3 (34)
192.168.1.136 (34)
10.0.0.11 (31)
10.1.5.67 (28)

It's easy to see now that the device at 192.168.33.6 is taking up over a third of the possible 10,000 state table entries. Now the investigation can shift to that device to find out what it's up to.

You can download the perl script I used here: top-pf-states.pl
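
If you just want the general idea without Perl, the same kind of count can be approximated with awk and sort. To be clear, this is not the script above, only a minimal sketch under some assumptions: the function name top_states is made up, it handles IPv4 only, and it assumes each "pfctl -ss" line looks like "IFACE PROTO ADDR:PORT DIRECTION ADDR:PORT ... STATE", where the source is the left address for "->" and the right address for "<-". States with extra NAT addresses on the line are not handled specially.

```shell
#!/bin/sh
# Sketch: count states per source IP from "pfctl -ss" style text on stdin
# and print the top N (default 10) as "IP (count)".
top_states() {
    awk '
        # Strip a trailing ":port" from an IPv4 address.
        function ip(s) { sub(/:[0-9]+$/, "", s); return s }

        $4 == "->" { count[ip($3)]++ }   # outbound: left address is the source
        $4 == "<-" { count[ip($5)]++ }   # inbound: right address is the source

        END { for (a in count) printf "%s (%d)\n", a, count[a] }
    ' | sort -t '(' -k2,2nr | head -n "${1:-10}"
}

# On a live firewall (as root), something like:
#   pfctl -ss -i vlan123 | top_states 10
```

The order of addresses with equal counts is whatever sort happens to produce, so treat ties as unordered.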
