
Move Firewalls from VM to Subnet #1443

Merged
merged 4 commits into from May 2, 2024

Conversation

furkansahin
Contributor

@furkansahin furkansahin commented Apr 9, 2024

Move Firewalls from VM to Subnet model migration
We made the decision to attach Firewalls to the whole subnet
instead of individual VMs. This commit implements the migration file.

Create Firewalls and attach to Subnet instead of VM
This commit implements the Firewalls move from VMs to Subnets.
Therefore, there are multiple changes to model relationships,
Vm::Nexus.assemble, Vnet::SubnetNexus.assemble, and finally the
routes. The changes are not very interesting as they mostly involve
semaphore increments being performed on subnets instead of individual
VMs, or entity creations referring to subnets instead of VMs.

One additional small but interesting change is the cidr validation. It
involves two changes:

  1. Validate IPv6 as well.
  2. Return the parsed cidr and use its string representation while
    creating the record. This is necessary because while NetAddr can
    parse a cidr like "1.1.1.1/8" without an issue, the db insert fails
    because the valid form of that cidr is actually "1.0.0.0/8". This
    used to cause a 500 error in the console.
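The normalization in point 2 can be illustrated with Ruby's stdlib IPAddr (a sketch for illustration only; the PR itself uses the NetAddr gem, whose API differs):

```ruby
require "ipaddr"

# "1.1.1.1/8" parses without error, but the canonical network address
# masks the host bits away, yielding 1.0.0.0/8.
cidr = IPAddr.new("1.1.1.1/8")
puts cidr.to_s                 # => "1.0.0.0"
puts "#{cidr}/#{cidr.prefix}"  # => "1.0.0.0/8"
```

Persisting the canonical string from the parser, rather than the user's raw input, is what keeps the db insert from rejecting an address like "1.1.1.1/8".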

Move Postgres Firewalls from VMs to Subnets

@furkansahin furkansahin force-pushed the fw_to_subnet branch 4 times, most recently from 91df86a to 5263f4d Compare April 10, 2024 11:11
@furkansahin furkansahin marked this pull request as ready for review April 10, 2024 11:16
@velioglu
Contributor

With this change, a Firewall belongs to a single PrivateSubnet, so the same Firewall can't be associated with multiple different PrivateSubnets at the same time. My understanding from the discussion we had was that we would have a many-to-many relationship instead. @fdr can you please chime in if my understanding is wrong?

@fdr
Collaborator

fdr commented Apr 12, 2024

I am somewhat flexible on this one. I initially thought we'd limit it, and maybe make it slightly annoying when the user wants to re-use Firewall objects, rather than opening the multiple-association problem today.

But sometimes such limitations are a hassle (and add code complications), and if someone has a pretty good idea how multiple-association should work, I'm okay with attacking the generalization too.

By default, though, my general thought is "make people complain a little first"

@fdr
Collaborator

fdr commented Apr 12, 2024

By the way, I think I'm obliged to elaborate...why make Firewalls so coupled to Subnets? I mentioned this to a couple of people in fragments, but it deserves a written theory. Mostly because, if the theory is violated, we have to change direction.

On Abstraction

Once you peel back all the layers, firewall rules are run using comparisons in a little virtual machine (such as Linux netfilter's), ASIC, or native code. This is why ranges of addresses and even ports have a smaller footprint than itemizing a large number of either individually. This is the "natural" way network hardware and software works, and I'm trying to align the customer's usage with it.
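A toy illustration of why ranges have a smaller footprint than itemized addresses (stdlib IPAddr, not the Ubicloud implementation):

```ruby
require "ipaddr"

# One CIDR containment check is a single mask-and-compare
# that covers all 256 addresses in the range at once.
allowed = IPAddr.new("10.0.0.0/24")
allowed.include?(IPAddr.new("10.0.0.37"))  # => true

# Itemizing addresses instead means up to one comparison per entry.
itemized = ["10.0.0.1", "10.0.0.2", "10.0.0.3"].map { |s| IPAddr.new(s) }
itemized.any? { |ip| ip == IPAddr.new("10.0.0.2") }  # => true
```

This is the same trade-off netfilter, ASICs, and native filtering code make: a range rule stays one comparison no matter how many hosts it covers.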

To couple Firewalls in the customer's mind with Subnets is to give Subnets a purpose: their purpose is to provide contiguous addressing, and one of the prominent applications for that is firewalls.

As-is, we have no prescription on subnet usage, partially because there are so few features...and Firewalls are going to change that.

This may sound pithy, but let's consider the AWS situation, as it is often copied, whether it's a good idea or not.

Some History

Back in the days of ec2-classic, there was ~= no virtual networking. You could have security groups, they could refer to other security groups or IP literals, but in the end, everyone partied in the class A private address space, 10.0.0.0/8.

There was no address contiguity, so instead you could establish references between security groups. This looked simple to a user, but it would necessarily have to expand to a large cohort of comparisons, one address at a time, if you wanted many instances to talk to one another. This abstraction, where one innocent-looking security group reference could blow up into a big list to evaluate, is a bit leaky, but hey, what were you gonna do?

Fast forward to the VPC era, and now there are subnets. Aha! A solution, you might think: now VMs/vnics/"eni" can live in contiguous ranges.

Alas, it was awkward: alone among hyperscalers, AWS Subnets are Zonal. Thus, subnets have a secondary purpose: to allow you to provision any kind of network-apparent zonal resource.

Thus, it is common to create one AWS Subnet per AZ and then put computers or ENIs that have the same job and security definition in all three for balancing. This obscures the connection between subnets and usable perimeters for firewalls.

I'm not sure why the other providers, the ones without zonal subnets, did not try to be more opinionated about this. I can speculate on a few reasons, but none of them indicate that we shouldn't give this a try.

On degradation / fallback

If this theory is wrong, we might add per-NIC (~= per-VM) micro-managing of firewall rules. I don't think what we're doing here is out-of-sample for the other providers, it's more like, a chosen subset.

Omissions

An omission in power in the ubicloud implementation becomes more obvious with this feature: there is no coupling between subnets. We've avoided broaching the issue because we'd like to identify a way to unify peering and service endpoint considerations.

Why Unify?

AWS has three features that address similar concerns. I've been asked to integrate with all three, and each basically requires novel code and product management: Peering, Transit Gateway, Private Link. And they're the easiest ones to work with compared to Azure and GCP, I found...it's all downhill from there :(

I think...think...it is possible to make within-project communication and inter-account/project subnet communication and/or service delivery symmetric. The fundamental ingredients are peering, firewall rules, and host name resolution as a package, designed to work together.

The "zero trust" networking idea has finally gotten some mindshare, and having the user couple packet filtering to joining networks together lines up neatly with this welcome change in norms.

A side note, on subnets and perimeters

Often subnets are interpreted as a perimeter, and the nodes within are trusted, i.e. the filtering happens "outside" the subnet somehow. Is this the right thing? I don't think so. Consider the common case of a subnet that holds web application servers that accept traffic from the internet on port 443 for HTTPS. Should it really be the case that they can attack each other via SSH because they sit in the same subnet? I think not. If the customer wants that behavior, I think the firewall on the subnet should have to read: "accept from $ownrange all" explicitly.
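The stance above can be sketched in a few lines (a hypothetical rule format for illustration, not the actual Firewall model): with only a public HTTPS rule, peers in the same subnet cannot reach each other over SSH until an explicit ownrange rule is added.

```ruby
require "ipaddr"

# Hypothetical rule: a source CIDR plus an allowed port range.
Rule = Struct.new(:cidr, :ports)

def allowed?(rules, src_ip, port)
  rules.any? { |r| r.cidr.include?(src_ip) && r.ports.cover?(port) }
end

subnet = IPAddr.new("10.0.1.0/24")
rules  = [Rule.new(IPAddr.new("0.0.0.0/0"), 443..443)]  # HTTPS from anywhere

allowed?(rules, IPAddr.new("203.0.113.9"), 443)  # => true: web traffic
allowed?(rules, IPAddr.new("10.0.1.5"), 22)      # => false: no implicit intra-subnet SSH

# Only an explicit "accept from $ownrange all" opens peer-to-peer traffic:
rules << Rule.new(subnet, 0..65535)
allowed?(rules, IPAddr.new("10.0.1.5"), 22)      # => true
```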

Comparison to GCP

Of the hyperscalers, our system is superficially the most like GCP, but without the "VPC" data model. VPCs in GCP are very different from the constructs of the same name in AWS and the similar Azure VNet. GCP VPCs are a global entity that manages routing rules between "Subnetworks" scattered around the globe, more like a collection of peerings that has no equivalent on the other platforms. These subnetworks are a mash-up of Subnet and AWS VPC+Subnet/Azure VNet+subnet features, collapsing two levels of hierarchy into one. It is in this respect that our subnets are similar to GCP's conception.

I'm not sure the GCP VPC is something to replicate just because our subnets and their subnetworks share a similarity; I didn't exactly love it. There was a lot going on, and yet less than what I needed, mostly in how it interacted with peering & DNS as a service provider. So I would rather expend our complications on making the "glue" between subnets fit a modern, zero-trust network architecture, where peering, hostnames, and firewalls come together.

Review threads (all resolved; some on migrate/20240409_move_firewall_to_subnet.rb outdated):
migrate/20240409_move_firewall_to_subnet.rb
routes/web/project/location/vm.rb
model/firewall.rb
prog/postgres/postgres_server_nexus.rb
model/private_subnet.rb
@enescakir
Member

Bro, did you check the latest commit's diff? 😅

@enescakir enescakir left a comment

LGTM

@furkansahin furkansahin merged commit e56a496 into main May 2, 2024
6 checks passed
@furkansahin furkansahin deleted the fw_to_subnet branch May 2, 2024 18:04
@github-actions github-actions bot locked and limited conversation to collaborators May 2, 2024

5 participants