Right, so you’ve got your clusters humming along nicely, isolated and secure in their own little worlds. Great. Now you need them to actually talk to each other. You can’t just wave a magic wand and expect ServiceA in cluster-a to reach ServiceB in cluster-b; the network overlords of each cluster have no idea the other exists. This is where Submariner swims in. It’s essentially a VPN and service discovery mechanism that stitches your clusters together across clouds, on-prem, or anywhere else. It doesn’t require you to expose every service with a public LoadBalancer, which is a security nightmare waiting to happen. Instead, it builds encrypted tunnels between dedicated gateway nodes in each cluster and routes cross-cluster traffic through them.

Think of it as a sophisticated networking diplomat. It establishes secure tunnels (using IPsec or WireGuard) between gateway nodes in each cluster, and then it takes care of the incredibly tedious job of advertising routes. “Hey, cluster-b, if you get a packet for 10.100.5.0/24, send it my way. I’m cluster-a.” It’s like BGP for your Kubernetes clusters, but without requiring you to be a network engineer who enjoys that sort of pain.

The Core Components: Who Does What

Submariner has a few key players you need to understand. It’s not just one monolithic container.

  • Gateway Engine: This is the workhorse. Deployed as a DaemonSet on gateway nodes, it establishes the encrypted tunnels to other clusters. It’s the one doing the actual packet forwarding.
  • Route Agent: Also a DaemonSet, it runs on every worker node. Its job is to make sure each node knows how to route traffic to the remote clusters via the local gateway node. It messes around with the node’s routing table, which is why it needs NET_ADMIN capabilities.
  • Service Discovery: This is the magic sauce for finding services. It’s provided by Lighthouse, which implements the Kubernetes Multi-Cluster Services API: it watches for Services you have explicitly exported (via ServiceExport objects) and answers DNS queries for them across the whole cluster set. This means you can use a predictable DNS name to reach a service in another cluster.
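
If you want to see these pieces on a live cluster, a quick listing is usually enough. This is a minimal sketch that assumes Submariner was installed into the submariner-operator namespace (the default for subctl, as far as I know); the function name list_submariner_workloads is made up for this example.

```shell
# Hypothetical helper: list Submariner's DaemonSets and Deployments.
# Assumes the install namespace is "submariner-operator".
# $1 is the kubectl context to inspect, e.g. cluster-a.
list_submariner_workloads() {
  kubectl --context "$1" --namespace submariner-operator \
    get daemonsets,deployments -o wide
}
```

Calling list_submariner_workloads cluster-a should show the gateway and route agent as DaemonSets, with the Lighthouse pieces alongside them.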

Installing and Joining: The Initial Handshake

First, you install the subctl CLI tool, which is your Swiss Army knife for all this. Then you install Submariner into your clusters. The first cluster is your “broker” – it holds the central information about all the other clusters that join. There’s no magic central server; the broker is just a standard cluster with some CRDs acting as a meeting point.

Let’s install on two clusters. Assume your contexts are cluster-a and cluster-b.

# Download the subctl binary (the script installs the latest release into ~/.local/bin)
curl -Ls https://get.submariner.io | bash
export PATH=$PATH:~/.local/bin

# First, set up the broker on cluster-a (your designated first cluster).
# This writes a broker-info.subm file into the current directory.
subctl deploy-broker --kubeconfig cluster-a-kubeconfig.yaml

# Now, join cluster-a to the broker (it's both broker and member)
subctl join --kubeconfig cluster-a-kubeconfig.yaml broker-info.subm --clusterid cluster-a --natt=false

# Finally, join cluster-b
subctl join --kubeconfig cluster-b-kubeconfig.yaml broker-info.subm --clusterid cluster-b --natt=false

Notice the --natt=false? If you’re testing on a local environment like Kind, you often need to disable NAT traversal. In a real cloud environment with public IPs on your gateway nodes, you’d omit that. This is a classic “it works on my machine” pitfall.
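
Before exporting anything, it’s worth confirming the tunnels actually came up. The sketch below wraps two subctl subcommands, show connections and diagnose all, in a small function; the name check_submariner is hypothetical, and the kubeconfig file names are the same placeholders used in the join commands.

```shell
# Hypothetical helper: report tunnel status, then run subctl's built-in diagnostics.
# $1 is the path to the kubeconfig of the cluster to check.
check_submariner() {
  local kubeconfig="$1"
  # Lists the connections this cluster's gateway has to remote clusters.
  subctl show connections --kubeconfig "$kubeconfig" &&
    # Runs the bundled checks (CNI, firewall, deployment health, ...).
    subctl diagnose all --kubeconfig "$kubeconfig"
}
```

Run it once per cluster, for example check_submariner cluster-a-kubeconfig.yaml and then check_submariner cluster-b-kubeconfig.yaml. If the connection to the other cluster isn’t reported as connected, fix that before touching service discovery.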

Actually Connecting a Service: The Payoff

Now for the fun part. You have a service in cluster-b in the namespace my-app that you want to access from cluster-a. First, you need to export the service from its home cluster. This tells Submariner, “Hey, this service is available for cross-cluster traffic.”

# From cluster-b's context, export the service
kubectl config use-context cluster-b
subctl export service --namespace my-app my-service
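
Under the hood, the export is just a small Kubernetes object: a ServiceExport from the Multi-Cluster Services API with the same name and namespace as the Service. If you’d rather keep it in Git than run subctl by hand, something like the following sketch should be equivalent. The function name render_service_export is invented for this example, and the API version shown (multicluster.x-k8s.io/v1alpha1) is the one I’d expect; check what your installed CRD actually serves.

```shell
# Hypothetical helper: print a ServiceExport manifest for one Service.
# $1 = namespace, $2 = Service name.
render_service_export() {
  local namespace="$1" name="$2"
  cat <<EOF
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: ${name}
  namespace: ${namespace}
EOF
}
```

To apply it, pipe the output into kubectl: render_service_export my-app my-service | kubectl --context cluster-b apply -f -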

That’s it. No, really. Over in cluster-a, you can now discover this service using a fully qualified domain name (FQDN) that Lighthouse creates automatically. The pattern is: <service-name>.<namespace>.svc.clusterset.local
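
If you script against several exported services, it can help to build that name in one place. A tiny sketch (the function name clusterset_fqdn is made up here):

```shell
# Hypothetical helper: build the cross-cluster DNS name for an exported Service.
# $1 = Service name, $2 = namespace.
clusterset_fqdn() {
  printf '%s.%s.svc.clusterset.local\n' "$1" "$2"
}
```

For the example service, clusterset_fqdn my-service my-app gives my-service.my-app.svc.clusterset.local, whereas the familiar my-service.my-app.svc.cluster.local only ever resolves inside the local cluster.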

Let’s run a test from a pod in cluster-a:

kubectl config use-context cluster-a
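# --command makes sure we get a shell even if the image's entrypoint is curl
kubectl run test-pod --image=alpine/curl --rm -it --restart=Never --command -- sh
# Then, inside the pod's shell:
curl -v http://my-service.my-app.svc.clusterset.local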

If everything is wired up correctly, your request will seamlessly route through the encrypted tunnel, hit the service in cluster-b, and return the response. It’s borderline sorcery when it works.

Common Pitfalls and The “It’s Not Working” Checklist

Time for some straight talk. Submariner is powerful, but the networking gods demand tribute. Here’s what usually goes wrong:

  1. Firewalls: This is the number one culprit. The gateway nodes must be able to reach each other on the tunnel’s UDP ports: by default 4500 for the IPsec tunnel (NAT-T) and 4490 for NAT discovery. If you use the WireGuard cable driver, check which port it is actually configured to listen on rather than assuming WireGuard’s usual 51820. If your cloud provider’s security group or your on-prem firewall is blocking this, you’re dead in the water. Always check connectivity first.
  2. Overlapping CIDRs: Out of the box, this is a hard stop. If your cluster Pod CIDRs (--cluster-cidr) or Service CIDRs (--service-cidr) overlap, Submariner can’t route between them. How would it know where to send a packet for 10.96.0.10 if two clusters claim it? The clean fix is to plan a non-overlapping addressing scheme across clusters up front. If that ship has sailed, Submariner’s Globalnet feature (enabled with subctl deploy-broker --globalnet) assigns each cluster a non-overlapping global CIDR and translates addresses for you, at the cost of extra moving parts.
  3. Gateway Node Selection: If no node carries the gateway label, subctl join chooses one for you (or asks you to pick), and it may not be the node you want. In a cloud environment, the gateway needs an address the other clusters can reach, typically a public IP. You can label a specific node so Submariner uses one with the right network setup: kubectl label node <node-name> submariner.io/gateway=true.
  4. The NAT Problem: As mentioned earlier, if you’re in a restricted environment, NAT traversal can be tricky. The --natt=false flag is a common workaround for lab setups.
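
The overlap check in item 2 is easy to get wrong by eye, so here is a minimal sketch that does the arithmetic for IPv4. It is plain shell with no Submariner involvement, the function names are invented for this example, and it does no input validation.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # split the address into its four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# First address of a CIDR (e.g. 10.96.0.0/12) as an integer.
cidr_first() {
  local ip="${1%/*}" prefix="${1#*/}" mask
  mask=$(( 0xFFFFFFFF ^ (0xFFFFFFFF >> prefix) ))
  echo $(( $(ip_to_int "$ip") & mask ))
}

# Last address of a CIDR as an integer.
cidr_last() {
  local prefix="${1#*/}"
  echo $(( $(cidr_first "$1") | (0xFFFFFFFF >> prefix) ))
}

# Succeeds (exit 0) when the two CIDRs share at least one address.
cidrs_overlap() {
  local a_first a_last b_first b_last
  a_first=$(cidr_first "$1"); a_last=$(cidr_last "$1")
  b_first=$(cidr_first "$2"); b_last=$(cidr_last "$2")
  [ "$a_first" -le "$b_last" ] && [ "$b_first" -le "$a_last" ]
}
```

For example, cidrs_overlap 10.96.0.0/12 10.100.5.0/24 succeeds because 10.100.5.0/24 sits inside 10.96.0.0/12, while cidrs_overlap 10.0.0.0/16 10.1.0.0/16 fails. Run it against every Pod and Service CIDR pair across your clusters before you join them.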

The key insight is that Submariner isn’t abstracting away the network; it’s orchestrating it. You still need to have a fundamentally sound network setup between your clusters. It removes the Kubernetes-specific pain points, but it can’t fix broken underlying connectivity. When it works, it feels like magic. When it doesn’t, you get to learn a lot about IP routing and firewall rules. Fun, right?