Network Debugging¶

Troubleshooting guide for network-related issues including connectivity, DNS, ingress, and load balancing.

Quick Diagnostics¶

# Check network pods
kubectl get pods -n kube-system | grep -E "(coredns|flannel)"
kubectl get pods -n traefik
kubectl get pods -n metallb-system

# Check services and endpoints
kubectl get svc,endpoints -A | grep <service>

# Check ingress routes
kubectl get ingressroute -A

# Test DNS
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- nslookup kubernetes.default

DNS Resolution Issues¶

Internal DNS Not Working¶

Symptoms: - Pods can't resolve service names - nslookup <service>.<namespace>.svc.cluster.local fails

Diagnosis:

# Check CoreDNS is running
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

# Test DNS from pod
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- nslookup kubernetes.default.svc.cluster.local

Common Fixes:

CoreDNS pods down: Restart them

kubectl rollout restart deployment coredns -n kube-system

Wrong DNS server in pod:

kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- cat /etc/resolv.conf
# Should show: nameserver 10.43.0.10 (or similar cluster IP)

External DNS Not Resolving¶

Symptoms: Can't reach external websites from pods

Fix: Check CoreDNS forward configuration:

kubectl edit configmap coredns -n kube-system
# Ensure: forward . /etc/resolv.conf

Pod-to-Pod Communication Issues¶

Pods Can't Communicate Across Nodes¶

Diagnosis:

# Test from pod
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- ping <other-pod-ip>

# Check CNI (Flannel)
kubectl get pods -n kube-system -l app=flannel
kubectl logs -n kube-system -l app=flannel --tail=100

Fix:

# Restart Flannel
kubectl rollout restart daemonset kube-flannel -n kube-system

Service Access Issues¶

See Common Issues - Service Not Accessible

MetalLB Issues¶

MetalLB Not Assigning IPs¶

See Common Issues - MetalLB Not Assigning IPs

MetalLB Layer 2 ARP Issues¶

Symptoms: External IP assigned but not reachable

Diagnosis:

# Check MetalLB speaker logs
kubectl logs -n metallb-system -l component=speaker --tail=100

# Check ARP table (on a client machine)
arp -a | grep <external-ip>

Fix: Ensure network allows ARP, verify speaker pods run on all nodes

Traefik Ingress Issues¶

IngressRoute Not Creating Route¶

Diagnosis:

# Check IngressRoute
kubectl get ingressroute <name> -n <namespace>
kubectl describe ingressroute <name> -n <namespace>

# Check Traefik logs
kubectl logs -n traefik -l app.kubernetes.io/name=traefik --tail=100 | grep <your-domain>

# Check Traefik dashboard
kubectl port-forward -n traefik svc/traefik 9000:9000
# Open http://localhost:9000/dashboard/

Common Issues:

Wrong service name or port:

services:
  - name: my-service  # Must match Service name
    port: 80           # Must match Service port

Missing entryPoint:

entryPoints:
  - websecure  # For HTTPS

Can't Access Traefik Dashboard¶

Fix:

# Port-forward to access
kubectl port-forward -n traefik svc/traefik 9000:9000
# Open http://localhost:9000/dashboard/

Connectivity Testing¶

Debug Container¶

Run a debug container with network tools:

kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- /bin/bash

# Inside container:
# ping <ip>
# nslookup <domain>
# curl <url>
# traceroute <ip>

Test Service Connectivity¶

From within cluster:

# Test service by name
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://<service>.<namespace>.svc.cluster.local

# Test service by IP
kubectl get svc <service> -n <namespace>
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://<cluster-ip>:<port>

From external:

# Test LoadBalancer service
curl http://<external-ip>:<port>

# Test Ingress
curl -k https://<domain>

Firewall and Security¶

Port Requirements¶

k3s requires: - 6443: Kubernetes API - 10250: kubelet - 2379-2380: etcd (control plane nodes) - 8472: Flannel VXLAN - 51820-51821: Flannel Wireguard

Check firewall (on nodes):

ssh <node>
sudo iptables -L -n -v | grep -E "(6443|10250|8472)"

DNS Record Management¶

For external access:

Verify DNS points to MetalLB IP:

nslookup <your-domain>
# Should resolve to your MetalLB external IP

Update DNS (example with CloudFlare):
Add A record: *.example.com → <metallb-ip>
Or specific: app.example.com → <metallb-ip>

Wait for propagation:

# Check from multiple locations
dig <your-domain> @8.8.8.8

Performance Issues¶

High Network Latency¶

Diagnosis:

# Test latency between nodes
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- ping <node-ip>

# Check network interface stats
ssh <node>
ifconfig
netstat -i

Causes: - Network congestion - Faulty network hardware - Wi-Fi (use wired for k8s nodes!)

Quick Reference¶

# Check all network components
kubectl get pods -A | grep -E "(coredns|flannel|traefik|metallb)"

# Test DNS
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- nslookup kubernetes.default

# Test connectivity to service
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://<service>.<namespace>.svc.cluster.local

# Check service endpoints
kubectl get endpoints <service> -n <namespace>

# Traefik dashboard
kubectl port-forward -n traefik svc/traefik 9000:9000

# Check Traefik logs
kubectl logs -n traefik -l app.kubernetes.io/name=traefik --tail=200

# MetalLB status
kubectl get svc -A | grep LoadBalancer
kubectl logs -n metallb-system -l component=speaker --tail=100

[CSI]: Container Storage Interface
[IOMMU]: Input-Output Memory Management Unit. Used to virualize memory access for devices. See Wikipedia