Burn down your DFW onboarding with vRealize Log Insight

Currently I'm working on an NSX project involving a large number of networks and a complicated firewall setup that needs to be migrated to NSX. Using tools such as vRealize Network Insight for traffic flow insight helps a lot, but ultimately the DFW onboarding process needs to be SMART, and specifically it needs to do the following:

  • Meet a set deadline for each network as the network cannot be migrated before all firewall rules are created.
  • Provide evidence that no firewall rules have been missed.

Note: I strongly advise against involving any flammables in your DFW onboarding process as this may void your NSX warranty. What we are doing here is using the agile concept of burndown charts to map the progress of your distributed firewall onboarding.

As mentioned before, each network has a large number of firewall rules, and to meet the above two objectives we need to migrate each network individually before moving on to the next.

To track this, we are using a burndown chart to monitor firewall rule implementation, which would look something like this:

[Figure: example burndown chart of firewall rule hits for one network]

Unfortunately I can't show you a production chart due to restrictions from the customer, but the general idea should be clear: we're showing the total hits on the distributed firewall fallthrough rule for a specific network over time, as well as the hits on specific rulesets for that network. As more and more rules are implemented, the orange line (the fallthrough rule) should drop to 0 while the other lines should grow.

For this purpose, we've created the following rulesets:

  • For each network, create a fallthrough rule which is applied to that logical switch, allows all traffic and logs with a specific tag containing the network ID and the firewall ruleset ID. An example of this tag would be DFW.NETID=123.RULESET=456. For fallthrough rules we are using the special ID "FALLTHROUGH".
  • For each network, create a default rule that allows all inter-VLAN traffic so that this traffic does not reach the fallthrough rule.
  • For each network, create your firewall rules, which are applied to that logical switch only and allow specific traffic to and from that network. This traffic is also logged and tagged with the network ID and the firewall ruleset ID.
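To make the tag convention concrete, here is a minimal Python sketch that builds and parses such tags. It only assumes the layout shown in the example tag above; the function names are made up for illustration and are not part of NSX or Log Insight.

```python
import re

FALLTHROUGH = "FALLTHROUGH"  # special ruleset ID used for fallthrough rules

# Assumed tag layout, taken from the example DFW.NETID=123.RULESET=456
TAG_PATTERN = re.compile(r"DFW\.NETID=(?P<netid>\w+)\.RULESET=(?P<ruleset>\w+)")


def make_tag(network_id, ruleset_id=FALLTHROUGH):
    """Build the log tag for a rule belonging to the given network."""
    return f"DFW.NETID={network_id}.RULESET={ruleset_id}"


def parse_tag(text):
    """Return (network_id, ruleset_id) from text containing a tag, or None."""
    match = TAG_PATTERN.search(text)
    if match is None:
        return None
    return match.group("netid"), match.group("ruleset")
```

Generating the tags from one helper like this keeps them consistent across rules, which matters because everything downstream relies on matching this exact pattern.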

Now that we have these firewall tags we can easily filter on these properties in Log Insight. However, as we don't want to manually query and filter every time we're building a dashboard, we've created some extracted fields as can be seen below.

[Screenshot: filtering the network ID]

[Screenshot: filtering the request ID]

Now that we have both these IDs, we can start filtering. In Log Insight, we can now create dashboards that show a stacked graph for a specific network ID, grouped by request ID.
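The grouping behind such a stacked graph can be sketched outside Log Insight as well. The Python snippet below is an illustration only: it assumes each log line starts with an ISO date and contains the tag, and the sample lines are invented rather than real NSX output.

```python
import re
from collections import Counter

TAG_PATTERN = re.compile(r"DFW\.NETID=(\w+)\.RULESET=(\w+)")


def hits_per_day(lines, network_id):
    """Count hits per (day, ruleset ID) for one network ID."""
    counts = Counter()
    for line in lines:
        match = TAG_PATTERN.search(line)
        if match is None or match.group(1) != str(network_id):
            continue  # untagged line, or a different network
        day = line[:10]  # assumes the line starts with YYYY-MM-DD
        counts[(day, match.group(2))] += 1
    return counts


# Invented sample lines, not real NSX output.
sample = [
    "2017-09-01T10:00:00Z PASS TCP 10.0.0.1->10.0.1.5 DFW.NETID=123.RULESET=FALLTHROUGH",
    "2017-09-01T10:00:05Z PASS TCP 10.0.0.2->10.0.1.5 DFW.NETID=123.RULESET=456",
    "2017-09-01T10:00:09Z PASS TCP 10.0.0.2->10.0.1.5 DFW.NETID=123.RULESET=456",
    "2017-09-02T08:30:00Z PASS TCP 10.0.9.1->10.0.9.2 DFW.NETID=999.RULESET=FALLTHROUGH",
]
counts = hits_per_day(sample, 123)
```

Each key in `counts` corresponds to one segment of one bar in the stacked graph: a day and the ruleset that was hit.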

Note that currently we are not blocking anything. We are allowing all traffic and are purely using the DFW to classify traffic and create these dashboards. The concept behind this is that as long as a network is not migrated, all traffic is filtered by the physical firewall. Whenever a rule is imported into the distributed firewall for a specific network, that traffic will start hitting that rule and not the fallthrough rule.

Over time, more and more rules are added for a specific network and less traffic should hit the fallthrough rule for that network. At some point, the traffic to the fallthrough rule should hit zero and the fallthrough rule can be switched from allow all to deny all, at which point the rule migration is considered complete. To prove this in a management-friendly form, we are using a percentage-stacked graph over the last 30 days with the following filters:

  • dfw_network_id = "network id of the migrated network".
  • dfw_rcec_id exists
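What the percentage-stacked graph expresses can be written down as a small calculation. The following Python sketch assumes the hit counts have already been exported as a dictionary per day; the function names and the idea of a fixed number of "quiet" days are assumptions for illustration, not something Log Insight provides.

```python
def fallthrough_share(daily_hits):
    """Map each day to the percentage of hits that landed on the fallthrough rule.

    daily_hits: {day: {ruleset_id: hit_count}}
    """
    shares = {}
    for day, per_ruleset in daily_hits.items():
        total = sum(per_ruleset.values())
        fallthrough = per_ruleset.get("FALLTHROUGH", 0)
        shares[day] = 100.0 * fallthrough / total if total else 0.0
    return shares


def ready_to_switch(daily_hits, quiet_days):
    """True if the most recent `quiet_days` days saw no fallthrough hits.

    Expects quiet_days >= 1 and ISO-formatted day keys (they sort chronologically).
    Days without any log entries are simply absent from daily_hits.
    """
    recent = sorted(daily_hits)[-quiet_days:]
    return len(recent) == quiet_days and all(
        daily_hits[day].get("FALLTHROUGH", 0) == 0 for day in recent
    )


# Invented numbers for illustration.
daily = {
    "2017-09-01": {"FALLTHROUGH": 75, "456": 25},
    "2017-09-02": {"FALLTHROUGH": 0, "456": 80, "457": 20},
    "2017-09-03": {"456": 90},
}
shares = fallthrough_share(daily)
```

With these numbers, `ready_to_switch(daily, 2)` is true and `ready_to_switch(daily, 3)` is false, because the first day still had fallthrough hits.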

When the fallthrough rule for a network is switched, we can still use the logging to generate alerts. As there may be a few irregular but legitimate traffic flows which have not been imported (for whatever reason), hits on the fallthrough rule will generate an alert so that an operator can inspect the flows and either create a rule or inspect the involved systems.
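As a rough sketch of what such an alert could hand to an operator, the Python snippet below groups fallthrough hits by flow. The line layout (source, destination and port directly before the tag) is a simplified assumption, and the sample lines are invented; in practice Log Insight's own alerting would do this work.

```python
import re
from collections import Counter

# Hypothetical line layout: "... <src>-><dst>/<port> DFW.NETID=<id>.RULESET=FALLTHROUGH"
FLOW_PATTERN = re.compile(
    r"(?P<src>[\d.]+)->(?P<dst>[\d.]+)/(?P<port>\d+)\s+"
    r"DFW\.NETID=(?P<netid>\w+)\.RULESET=FALLTHROUGH\b"
)


def fallthrough_flows(lines):
    """Count fallthrough hits per (network ID, source, destination, port)."""
    flows = Counter()
    for line in lines:
        match = FLOW_PATTERN.search(line)
        if match:
            flows[match.group("netid", "src", "dst", "port")] += 1
    return flows


def alerts(lines, threshold=1):
    """Return flows with at least `threshold` fallthrough hits, busiest first."""
    return [
        (flow, hits)
        for flow, hits in fallthrough_flows(lines).most_common()
        if hits >= threshold
    ]


# Invented sample lines, not real NSX output.
lines = [
    "2017-10-01T02:00:00Z DROP TCP 10.0.0.7->10.0.1.5/8443 DFW.NETID=123.RULESET=FALLTHROUGH",
    "2017-10-01T02:00:01Z DROP TCP 10.0.0.7->10.0.1.5/8443 DFW.NETID=123.RULESET=FALLTHROUGH",
    "2017-10-01T02:00:02Z PASS TCP 10.0.0.2->10.0.1.5/443 DFW.NETID=123.RULESET=456",
]
```

Grouping by flow rather than alerting per log line means an operator sees one entry per missing rule candidate instead of one per packet.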

So by smart usage of vRealize Log Insight's extracted fields and graphing options, we've managed to meet the above two requirements:

  • Show the progress of firewall rule implementation by graphing the decrease of fallthrough rule hits.
  • Show proof of firewall rule completion by monitoring fallthrough rule hits.

When the migration is complete, the above tags aren't obsolete. During production we can use them for a variety of purposes. Think of use cases such as:

  • Monitoring whether old firewall rules are still in use or can be safely removed.
  • Sending alerts when a large number of fallthrough rule hits occur.
  • Monitoring changes to firewall rule requests (which should not happen without prior approval).
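The first of these use cases boils down to comparing the rulesets that were implemented with the rulesets that actually show up in the logs. A minimal Python sketch, assuming both lists have been exported by some means (the function name and inputs are hypothetical):

```python
def unused_rulesets(implemented, hits):
    """Return ruleset IDs that were implemented but logged no hits, sorted.

    implemented: iterable of ruleset IDs that exist in the firewall
    hits: {ruleset_id: hit_count} observed over the period under review
    """
    used = {ruleset_id for ruleset_id, count in hits.items() if count > 0}
    return sorted(set(implemented) - used)
```

For example, `unused_rulesets(["456", "457", "458"], {"456": 10, "457": 0})` returns `["457", "458"]`. A rule without hits is only a candidate for removal; infrequent but legitimate traffic may simply not have occurred in the period reviewed.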

All this could be done with normal queries in Log Insight, but using extracted fields makes it simpler and reproducible. It also lets us hand the fields to different users and create multiple dashboards without having to rebuild queries every time.