In some recent engagements I've been involved in getting my customers started with a microsegmentation strategy, and one of the more obvious requests from people new to microsegmentation is to start with their infrastructure. Unfortunately, there is no real guidance out there for VCF + vRealize; a lot of the information is hard to find, and the rules that are available generally boil down to IP-based rules.
So in order to save myself and everyone else time in the future, I decided to build my own reference microsegmentation architecture for VCF + VVD, which is also well suited to being adjusted for customer-specific applications. The document itself can be found at https://vxsancom-my.sharepoint.com/:x:/g/personal/s_robroek_vxsan_com/EY7_Q0daSahPnWS_iSlqvSkBs7voTFNrv4nbuRBS-cJVlg?e=sRVMn5; all of the information was retrieved by reverse engineering the API behind https://ports.vmware.com (if you see me, ask me about that ;)).
You are free to adjust and modify this for your own use with attribution, respecting the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license attached to this document.
The framework provides two models: inter-app trust segmentation and zero trust microsegmentation. Pick whichever model fits your situation and look at the relevant traffic matrix, which contains all the required ports between different applications, your VCF infrastructure, and any external services. Note that while the inter-app trust model is relatively simple, zero trust microsegmentation is significantly more complex to set up, manage, and troubleshoot. So choose wisely before going all "we must have zero trust microsegmentation!" gung-ho.
Next, there is the ruleset. This ruleset is built on four main principles:
- Rules are always defined as inbound rules, unless an outbound rule is required (for example, when communicating with an external service). This allows rules to be grouped per application, creates a consistent model for rules, and simplifies the ruleset in general.
- Rules are built on a consumer-provider model, with nested groups as the source and destination. Consumer and provider groups contain application groups, which can be defined as tagged VMs, tagged ports, or IP addresses. By using a consumer-provider model, you gain the following benefits:
- Adding or removing applications or scaling the environment does not require you to adjust the firewall rules, which reduces the risk of operator error.
- The model allows for simple automation as the only thing required to allow inter-application communication is to add a new or existing application to the security group.
- Application rules are consistent and can be templated for multiple versions or multiple environments.
- Each application ruleset is self-contained. All rules related to an application are stored in the same section, and no rules related to that application are stored elsewhere. This provides more consistency when managing or troubleshooting the ruleset. In addition, it increases DFW performance as it allows termination of the rule chain after hitting the application rules instead of going through the entire chain.
- Applications are defined using a naming model following service.role.application.environment. This allows for consistency in scripting, automation, reporting, and rule configuration in general. By doing so, it becomes incredibly simple to create a new application for a different environment, add VMs to an application group, create a tagging strategy, etc.
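The service.role.application.environment naming model can be sketched in a few lines. The helper function and group names below are illustrative examples, not an official convention from the document:

```python
# Minimal sketch of the service.role.application.environment naming model.
# The helper and the group names are illustrative, not an official convention.

def group_name(service: str, application: str, environment: str, role: str = "all") -> str:
    """Build a consistent security-group name from the naming model."""
    return f"{service}.{role}.{application}.{environment}"

# Creating the same application for a different environment is a one-liner:
prod = group_name("https", "vrni", "prod", role="platform")
test = group_name("https", "vrni", "test", role="platform")

print(prod)  # https.platform.vrni.prod
print(test)  # https.platform.vrni.test
```

Because every group name is generated from the same four fields, scripting, reporting, and templating across environments all reduce to string manipulation.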
As you can see, the ruleset for zero trust is significantly more complicated. However, we'll review both scenarios to understand what is going on here. Let's take vRealize Network Insight as an example:
In this example, you can see that first off we have a rule that allows consumption of the HTTPS service on both the platform and the collector. We also have a rule for SSH, which targets all the nodes, since there is no need to split this up. In addition, we have a rule that allows NetFlow traffic to ESXi hosts, again defined as providers and consumers; however, since NetFlow in vRNI is bidirectional per the documentation for some reason, both the ESXi hosts and the collectors are simultaneously providers and consumers, allowing two-way traffic.
In addition, we see rules that allow the customer to define endpoints for SSH, HTTPS, and SNMP, so that adding an external network device is as simple as creating an application group and adding it to the right providers group.
The default intra-application rule allows unlimited traffic between the application components, and any other traffic is dropped. If a load balancer was involved, it would be added to the sources and destinations as well.
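As a sketch, the vRNI ruleset described above could be encoded as data. The group names and ports here are assumptions for illustration, so verify them against the traffic matrix before using them:

```python
# Illustrative encoding of the vRNI ruleset described above as data.
# Group names and ports are assumptions; verify against the traffic matrix.

RULES = [
    # consumers -> providers on the listed services
    {"name": "vrni-https",   "src": ["consumers.https.vrni"], "dst": ["providers.https.vrni"], "services": ["TCP/443"]},
    {"name": "vrni-ssh",     "src": ["consumers.ssh.vrni"],   "dst": ["providers.ssh.vrni"],   "services": ["TCP/22"]},
    # NetFlow is bidirectional, so both sides appear as source and destination
    {"name": "vrni-netflow", "src": ["esxi.hosts", "vrni.collectors"],
                             "dst": ["esxi.hosts", "vrni.collectors"], "services": ["UDP/2055"]},
    # default intra-application rule: any traffic between application components
    {"name": "vrni-intra",   "src": ["app.vrni"], "dst": ["app.vrni"], "services": ["any"]},
]

def rules_for(group: str):
    """Return the names of all rules in which a group participates."""
    return [r["name"] for r in RULES if group in r["src"] or group in r["dst"]]

print(rules_for("vrni.collectors"))  # ['vrni-netflow']
```

Treating the ruleset as data like this is what makes the self-contained, per-application sections easy to template and audit.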
Now let's take a look at another application, vRealize Operations, but this time for zero trust segmentation.
As you can see, this ruleset is becoming significantly more complicated. We still have the consumer-provider model for services; however, as vRealize Operations consists of multiple tiers, we've split them up into separate rulesets. Since the tiers are static within an application, we do not need to follow the consumer-provider model here; instead, we use the application groups directly to allow traffic between tiers and within a tier. Even with this complexity, the ruleset is still very repeatable and consistent between applications.
Applications are grouped, as mentioned earlier, by creating either consumer or provider groups, which contain application groups. Application groups are defined by IP addresses for non-NSX-backed services such as your vCenter and NSX Managers, as well as external services such as DNS or NTP, or by tags. In order to allow the granularity for this kind of firewall model, we follow the following grouping strategy:
- VMs are tagged with an environment tag. While this is not relevant now, it allows segregation of separate environments such as DMZ, Production, DTAP, etc.
- VMs are then tagged with the application they belong to, which allows us to either create a group for all components of the application or create more granular groups.
- If required, VMs are tagged with a role tag in order to allow segregation between application tiers. Note that a VM is not limited to a single role; in the case of vROps, you will often have your analytics nodes serving as data nodes as well.
- Load balancers are added by including the autogenerated VIP group in NSX. Unfortunately, I have not found a way to tag the logical port belonging to a load balancer, as it seems these ports cannot be tagged through the UI.
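The tagging strategy above can be sketched as follows; the tag and VM names are hypothetical examples, and the subset check stands in for NSX's tag-based group criteria:

```python
# Sketch of the tagging strategy: environment, application, and optional role
# tags determine group membership. Tag and VM names are hypothetical examples.

from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    tags: set = field(default_factory=set)

vms = [
    # a vROps analytics node can hold multiple role tags at once
    VM("vrops-analytics-01", {"env:prod", "app:vrops", "role:analytics", "role:data"}),
    VM("vrops-remote-01",    {"env:prod", "app:vrops", "role:collector"}),
    VM("vrni-platform-01",   {"env:prod", "app:vrni",  "role:platform"}),
]

def members(required: set):
    """VMs carrying all required tags, like an NSX group membership criterion."""
    return [vm.name for vm in vms if required <= vm.tags]

print(members({"app:vrops"}))               # whole application
print(members({"app:vrops", "role:data"}))  # one tier within it
```

Requiring only the application tag yields the coarse whole-application group, while adding a role tag narrows membership to a single tier, which is exactly the granularity the zero trust model needs.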
With that in mind, we now have a logical firewall model which allows for simple grouping of applications, adding multiple copies of the same application and segmenting them, scaling out applications or creating a multitenancy model, or simply implementing this as a default as part of your VCF or VVD projects. The ultimate goal would be that something like this excel sheet can be fed into a tool such as Terraform or PowerCLI for the standup of NSX microsegmentation, or it can be used in day 2 operations to automate your microsegmentation for both infrastructure as well as workloads. The current model only has two networks that allow access to tools such as vRA, WS1, VRLI, etc. but obviously you can model that however you want, such as limiting specific people to specific tools using AD groups, creating more granular access rules for DTAP environments, segregating your management infrastructure and SDDC infrastructure, you name it. By creating a consistent microsegmentation model you are ensuring that your ruleset will always be consistent and manageable.
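As a hypothetical sketch of that automation goal, a sheet exported to CSV could be turned into rule definitions like this. The column names are assumptions about the sheet layout, not its actual schema, and a real implementation would drive the NSX policy API, Terraform, or PowerCLI rather than building strings:

```python
# Hypothetical sketch: turning a traffic-matrix export into rule definitions.
# The CSV columns are assumed, not the sheet's actual schema; a real version
# would call the NSX policy API / Terraform / PowerCLI instead of formatting text.

import csv
import io

matrix = io.StringIO("""source,destination,service,port
consumers.https.vrops,providers.https.vrops,HTTPS,TCP/443
consumers.ssh.vrops,providers.ssh.vrops,SSH,TCP/22
""")

rules = []
for row in csv.DictReader(matrix):
    rules.append(
        f"allow {row['service']} ({row['port']}) "
        f"from {row['source']} to {row['destination']}"
    )

print("\n".join(rules))
```

Because the naming model keeps group names predictable, the same loop works for day 0 standup and for day 2 additions of new applications.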
That said, some caveats:
- This list is far from complete, and a lot of ports for specific product functionality are not in here. This is by design, as it is intended as a base ruleset for VCF and VVD. Examples are specific rules related to Workspace ONE regarding mobile devices, app management, etc., as this is usually not in the scope of VCF. Products such as the Tanzu suite are not in the list simply to limit the complexity, but obviously this is all extensible.
- This design is not made for any kind of multisite deployment and currently does not take into account things such as using vROps to monitor SDDC components outside of the management NSX environment. Until VCF supports federation, you will have to make do with IP addresses.
- A lot of choices in this design are based on personal experience and preferences. Some people like to use tags for everything, some people prefer to statically include VMs; both have their advantages and disadvantages. Again, you are free to adjust this as you see fit; all I ask is that you credit the original source and honour the original CC license.
- This list is not guaranteed to be complete; it was built as a hobby project and has not been validated in a live environment, so keep that in mind when applying it. I have included a monitoring rule both to prevent people from shooting themselves in the foot and to allow you to monitor whether any rules were missed. However, as there are no per-application monitoring rules, you could still lock yourself out of NSX-backed workloads.
- As usual, I am not responsible for any kind of downtime, datacenters burning down, interdimensional rifts, COVID-20 pandemics, election shenanigans, or hurt feelings as a result of this information. Buyer beware.