Manageable micro-segmentation with the consumer-provider architecture

Since NSX was released, I’ve done a significant amount of work on distributed firewall designs, and over the years I’ve learned what does and does not work. For the last 5 years, my work with the NSX DFW has mainly focused on a specific model of creating and managing firewall rules: the “consumer-provider” model. In the last few weeks I’ve had the chance to optimise this model and design it for micro-segmentation of an entire VCF environment, including some applications. So today I wanted to talk a bit about how this model works, its strengths and weaknesses, and why you might want to consider using it for your future micro-segmentation strategies.

What is the consumer-provider micro-segmentation model? #

The concept of the consumer-provider micro-segment is relatively straightforward, and is built on 4 main premises:

Firewall rules always grant access from a single set of sources (the consumer) to a single set of destinations (the provider).
Firewall rules are always service-specific. A specific firewall rule in this model will always grant you access to a specific service. This service can be available on multiple systems or even across environments, but you will generally never create a rule that allows all access to all services within an environment (with some exceptions).
Firewall rule sources and destinations are always static as long as the rule does not change. Effective sources and destinations are changed through inclusion of members in the consumer or provider groups. This reduces the number of changes required to your rule set to a minimum and reduces the risk of misconfiguration.
Consumer and provider groups never directly include any members; instead, membership is granted through membership of another group. This promotes re-usability, keeps your group structure consistent, and reduces the risk of misconfiguration.

That said, how does the consumer-provider micro-segmentation model actually work? First off, let’s talk about rule structure: a rule in this model is always built in the following way:

The source is always a single consumer group, and the destination is always a single provider group. The service is always a single service related to this rule, which can consist of multiple ports/protocols, but they all need to be related.
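To make the rule structure concrete, here is a minimal Python sketch of a rule as a data structure. It is only a model for reasoning about the shape of a rule, not NSX code: the class names, the port notation, and the check on the `consumes.`/`provides.` prefixes (borrowed from the naming convention described later in this post) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """One logical service; it may span several related ports/protocols."""
    name: str
    ports: tuple  # illustrative notation, e.g. ("tcp/389", "tcp/636")


@dataclass(frozen=True)
class Rule:
    """A firewall rule with exactly one source and one destination group."""
    name: str
    source: str       # a single consumer group
    destination: str  # a single provider group
    service: Service

    def __post_init__(self):
        # Assumed convention: group names carry a consumes./provides. prefix.
        if not self.source.startswith("consumes."):
            raise ValueError(f"source must be a consumer group: {self.source}")
        if not self.destination.startswith("provides."):
            raise ValueError(f"destination must be a provider group: {self.destination}")


ldap = Service("ldap", ("tcp/389", "tcp/636"))
rule = Rule("ad-ldap", "consumes.ldap.dc.ad.prod", "provides.ldap.dc.ad.prod", ldap)
print(rule.source, "->", rule.destination, "on", ", ".join(rule.service.ports))
```

Because the rule only ever names one consumer group and one provider group, nothing in it needs to change when systems are added or removed; only group membership does.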

In addition, depending on the level of micro-segmentation that you are doing, you create a rule that allows intra-tier or intra-application traffic. Note that even this rule is based on the consumer-provider model (the application is both the provider and its own consumer in this case).

This will give you a rule set like the one below. I prefer to group application rules by inbound traffic (that is, by provider). This seems more logical to me, and it generally reduces the amount of clutter in a ruleset.

The entire VCF and VVD ruleset can ultimately be condensed down to the following ruleset (excluding common infrastructure rules), which showcases some of the strengths of this model:

Now, let’s talk about grouping structure. Ultimately this is up to you, but what I like to do is the following:

Consumer and provider groups have **groups** as direct members. These groups can be structured in the following way:

Application groups: These are security groups focused on a specific application and - if required - a specific tier of the application. This can be achieved through multiple means, as described below.
Segment groups: These are groups with segments as direct members, which can be used to allow an entire network access to a service, or as an alternative to application groups if you have per-application segments.
Network groups: These are groups with direct IP members for a more broad access to services. Examples of this would be a network range for your end users, admins, internet access, etc.

While segment and network groups are pretty clear, application groups might need some distinction. Application groups with direct IP members are relatively straightforward. For example, for your physical AD infrastructure you would create a group app.dc.ad.prod for the domain controller tier of your AD application in your production environment, which then contains the IP addresses of your specific domain controllers.

Tag-based application groups are where the real power of this grouping lies. With tags, using the same example, you can assign your DC VMs the following 3 tags:

scope: environment, tag: production
scope: application, tag: ad
scope: tier, tag: dc

These tags can subsequently be used as criteria for group membership.
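As a rough illustration of what tag criteria do, the sketch below models group membership as "the VM carries every required scope/tag pair". NSX evaluates membership itself, so this is only a model; the VM names are hypothetical, and combining the three criteria with a logical AND is an assumption about how you would configure the group.

```python
def matches(vm_tags: set, criteria: set) -> bool:
    """A VM is a member when it carries every (scope, tag) pair in the criteria."""
    return criteria <= vm_tags


# Membership criteria for the domain controller application group.
criteria = {("environment", "production"), ("application", "ad"), ("tier", "dc")}

# Hypothetical inventory: VM name -> set of (scope, tag) pairs.
vms = {
    "dc01": {("environment", "production"), ("application", "ad"), ("tier", "dc")},
    "dc-dev01": {("environment", "development"), ("application", "ad"), ("tier", "dc")},
    "web01": {("environment", "production"), ("application", "shop"), ("tier", "web")},
}

members = sorted(name for name, tags in vms.items() if matches(tags, criteria))
print(members)
```

Only `dc01` carries all three tags, so it is the only VM that lands in the group; tagging a new domain controller the same way would add it without touching any group or rule definition.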

What this allows us to do is to tag VMs when provisioned in NSX and have them automatically gain access to a specific set of services, which can be done manually or through automation tools such as vRealize Automation, Terraform, Ansible, Puppet, etc.

To summarise the typical structure set of the consumer-provider model:

Consumer group which *contain*
    Multiple Application groups which *contain*
        tag criteria *or* IP addresses *or* segments
        
Provider group which *contain*
    Multiple Application groups which *contain*
        tag criteria *or* IP addresses *or* segments

Firewall rules use a single consumer group as their source and a single provider group as their destination, for a single set of common ports/protocols related to that provider.
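The nesting described above can be sketched as a small recursive lookup. This is a model of the structure, not how NSX computes effective membership internally; the dictionary layout is made up for this example, and the addresses come from the reserved documentation ranges.

```python
def effective_members(name: str, groups: dict, seen: frozenset = frozenset()) -> set:
    """Collect direct members of a group plus those of all nested groups."""
    if name in seen:  # guard against accidental group loops
        return set()
    group = groups[name]
    members = set(group.get("members", []))
    for child in group.get("groups", []):
        members |= effective_members(child, groups, seen | {name})
    return members


# Hypothetical group inventory following the consumer -> member group structure.
groups = {
    "consumes.ldap.dc.ad.prod": {"groups": ["app.collector.vrops.prod", "net.admins.prod"]},
    "app.collector.vrops.prod": {"members": ["192.0.2.11", "192.0.2.12"]},
    "net.admins.prod": {"members": ["198.51.100.0/24"]},
}

print(sorted(effective_members("consumes.ldap.dc.ad.prod", groups)))
```

The consumer group itself holds no addresses at all; everything it effectively contains arrives through the application and network groups nested inside it.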

An example of how this structure would work in practice can be seen below:

Advantages and Disadvantages #

So now that we know how all of this works, let’s talk about the why.

First off, one of the major strengths of this model is the consistency. Your rulesets are pretty much going to be immutable, since you never have to touch your rules directly, only the indirect membership. This is a significant benefit for consistency, standard operational changes, documentation, and manageability of your firewall rules. Having a single immutable set of rules for an application that will always be consistent for the lifetime of that application is a massive benefit over standard firewall rules, and if that’s not entirely clear I would recommend drawing out the ruleset required for a complex environment with this methodology compared to standard rulesets.

Repeatability becomes significantly simpler by using this model as well. If you have multiple copies of a specific application (for example, a generic 3-tier application with the same services), you can straight up create the new consumer/provider groups, copy the ruleset and apply it, which allows you to create actual ruleset templates of your applications instead of manually modifying rules every time. You could even automate this process through tools such as vRealize Automation or Terraform.
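A ruleset template could look something like the sketch below. The template contents (service names and tiers for a generic 3-tier application) and the application name are hypothetical, and the group names simply follow the naming convention described later in this post.

```python
# Hypothetical template for a generic 3-tier application: (service, provider tier).
THREE_TIER_TEMPLATE = [
    ("https", "web"),
    ("api", "app"),
    ("sql", "db"),
]


def instantiate(application: str, environment: str, template: list) -> list:
    """Stamp out one rule per template entry for a new copy of the application."""
    rules = []
    for service, tier in template:
        suffix = f"{service}.{tier}.{application}.{environment}"
        rules.append({
            "source": f"consumes.{suffix}",
            "destination": f"provides.{suffix}",
            "service": service,
        })
    return rules


shop_rules = instantiate("shop", "prod", THREE_TIER_TEMPLATE)
for r in shop_rules:
    print(r["source"], "->", r["destination"], f"({r['service']})")
```

Every copy of the application gets the same three rules with only the group names changed, which is what makes the ruleset reusable as a template.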

Ease of automation is another massive benefit of this system. Not only can we automatically include VMs in all the rulesets they require by tagging them upon provisioning, but the real power lies elsewhere: Security as a Service (as described below). By having a consistent set of consumer and provider groups, we can now easily add other applications to these groups and have them consume another application without needing to know exactly which rules need to be modified. This is a massive benefit over directly automating firewall rules, which is not only risky but also significantly more complex, as you need to be aware of which rules have been created. It also allows for a perfect hybrid management model where firewall rules themselves are managed manually, but effective membership can be handled both manually and automatically.

That said, there are some disadvantages as well. For starters, the barrier to get started is higher than normal. You can’t just start writing rules and implement them, as you really have to think about your grouping structure first before even touching a firewall rule. This means that when you start with this model you need to be disciplined enough to apply it everywhere. Mixing this with normal rulesets is going to cause confusion, chaos, and ultimately something spontaneously self-combusting. This is a highly prescriptive model that everyone needs to adhere to, which can be a good or a bad thing, but everyone in your organisation needs to understand this. The other disadvantage is related to visibility.

Due to how NSX works, while you can see both effective members and membership definition of a group, you cannot see how those effective members were added to a group. That means that you need to be very clear on how member groups were defined, and you need to have a consistent naming convention for all your groups. My personal preference (but this is obviously up to you) is the following:

Application groups: app.tier.application.environment (example: app.collector.vrops.prod)

Network groups: net.description.environment (example: net.admins.prod)

Segment groups: seg.description.environment (example: seg.sddc.prod)

Consumer groups: consumes.service.tier.application.environment (example: consumes.ldap.dc.ad.prod)

Provider groups: provides.service.tier.application.environment (example: provides.ldap.dc.ad.prod)

The reasoning behind this naming convention is that it allows 2 things:

identification of what a group is intended for
simple group selection and creation in automation through consistent naming

Obviously you can adjust this naming to your liking, but I would highly recommend that there’s consistent information between your consumer/provider groups and your application groups. If your provider groups care about the environment, include it in your application groups as well.
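A consistent convention like this is easy to parse in automation. The sketch below splits a group name into its fields; the function and field names are mine, and it assumes that no individual field contains a dot.

```python
# Field layout per prefix, following the naming convention above.
FIELDS = {
    "app": ("tier", "application", "environment"),
    "net": ("description", "environment"),
    "seg": ("description", "environment"),
    "consumes": ("service", "tier", "application", "environment"),
    "provides": ("service", "tier", "application", "environment"),
}


def parse_group_name(name: str) -> dict:
    """Split a group name into named fields, rejecting names that do not fit."""
    prefix, *parts = name.split(".")
    fields = FIELDS.get(prefix)
    if fields is None or len(parts) != len(fields):
        raise ValueError(f"unexpected group name: {name}")
    return {"kind": prefix, **dict(zip(fields, parts))}


def consumer_for(provider_name: str) -> str:
    """Derive the matching consumer group name from a provider group name."""
    parsed = parse_group_name(provider_name)
    if parsed["kind"] != "provides":
        raise ValueError(f"not a provider group: {provider_name}")
    return "consumes." + provider_name.split(".", 1)[1]


print(parse_group_name("app.collector.vrops.prod"))
print(consumer_for("provides.ldap.dc.ad.prod"))
```

Rejecting names that do not fit the convention is deliberate: in a prescriptive model, a group that cannot be parsed is usually a group that someone created outside the process.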

Alternatives #

There are of course alternatives to this model, and over time I’ve worked on some of them. One example is tagging the VM directly with a consumes and provides tag, and including those directly into the group. The problem with this is that you run into scalability issues related to the maximum number of tags, and generally tagging is harder to automate than adding groups to another group in NSX. The other problem is that if you create a new service, you have to create new tags and potentially tag a lot of VMs, as opposed to just adding their application groups to the new groups.

Another alternative methodology is to directly add applications to the source and destination of a rule. While this absolutely works for simpler environments and applications with single destination groups, it can become a bit more confusing when we have multiple destination groups. This would for example apply to the vROps endpoint rule, where we can add different application groups to be an https/ssh/snmp endpoint for vROps. As your environment scales and your number of applications grows, you would have more and more groups in this rule, to the point where it becomes unmanageable. In addition, it defeats the point of having an immutable ruleset if you still have to touch the source and destination groups. While having nested groups does add some overhead and complexity, the benefits of rule consistency and never having to touch your rules after provisioning far outweigh the complexity, especially as your environment grows.

Security as a service #

As mentioned above, one of the powers of this model is enabling security as a service. If you’ve ingested the above information, you should already have an idea why this model is so powerful for automation, but to reiterate, some of the benefits are:

The consumer and provider groups allow self-service for both publishing a provider and consuming a provider. By creating a rule for a service and allowing users to add their application group to the provider list of this service automatically, you can now allow application owners to publish their service in your microsegmentation model. In addition, you can allow users to subscribe to a published service by allowing them to add their application to a consumer group. As an example:

Joe has a set of internal APIs that he wants to make available to the company; however, due to the sensitivity of this application, he doesn’t want everyone to have access. In order to do so, Joe has the NSX admin create a firewall rule for his application and through automation adds his application groups to the provider group of this service. Now, whenever he creates a new service, all he has to do is request the new group to be added to the consumer.

Matt wants to subscribe to Joe’s set of APIs. In order to do so, all he has to do is add his application groups to the consumer group of the rule created earlier, be it through automation during deployment time, a day 2 operation, a script or some other means.

The infrastructure team has a set of services they want certain systems to have access to, separately for both their production and development environment. When provisioning a new application, they can automatically add its application group to either the production or the development consumer group for these services, depending on variables the user chose during provisioning.
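The subscribe step in these examples can be sketched as a single membership change. This is an in-memory model rather than an NSX API call; the group names for Joe’s API and Matt’s application are hypothetical, and a real implementation would make the equivalent change through whatever automation tooling you use.

```python
def subscribe(groups: dict, application_group: str, consumer_group: str) -> None:
    """Add an application group to an existing consumer group; rules stay untouched."""
    if not consumer_group.startswith("consumes."):
        raise ValueError(f"not a consumer group: {consumer_group}")
    if consumer_group not in groups:
        raise KeyError(f"service is not published: {consumer_group}")
    groups[consumer_group].add(application_group)


# Hypothetical group inventory: group name -> set of member group names.
groups = {
    "provides.https.api.inventory.prod": {"app.api.inventory.prod"},  # Joe's service
    "consumes.https.api.inventory.prod": set(),
}

# Matt subscribes his application to Joe's APIs.
subscribe(groups, "app.web.portal.prod", "consumes.https.api.inventory.prod")
print(sorted(groups["consumes.https.api.inventory.prod"]))
```

The automation only needs to know the name of the consumer group, which the naming convention makes predictable; it never has to look up or modify a firewall rule.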

Conclusion #

While this is only a very high-level overview of how this model works, once you get the hang of it, it is incredibly simple and elegant in practice. It takes a bit of time to get used to the grouping structure and how to build the rulesets, but once implemented it becomes an incredibly powerful tool to keep your complex microsegmentation ruleset manageable, functional and scalable.

Don’t forget that the architecture described above is just an example of how to apply this model to VCF and VVD management infrastructure. It can be adjusted to anything you like, and it can absolutely serve as a baseline for your workload microsegmentation model. In addition, you can play around with basing rules on groups containing entire environments, groups spanning multiple applications of the same type (don’t forget, if your application tagging is sensible you can use “STARTSWITH” criteria to include all applications of a common type), or you can apply this to your federated cross-region distributed firewall to create a unified DFW policy across your global environment. This is only intended as a baseline architecture to follow; what you do with it is ultimately up to you.