OpenVPN Production Reference Architecture (No-Surprises Hardened Design)

 

Purpose

This document defines a production-ready OpenVPN architecture with explicit operational failure modes, high availability design, and safe default configurations. It is designed to eliminate common misinterpretations that lead to outages, security gaps, and false assumptions in enterprise deployments.

This is not a compliance certification. It is a hardened engineering reference implementation aligned with Zero Trust principles and with the control objectives of NIST SP 800-53 and SP 800-207 as they are commonly interpreted.


1. High-Level Architecture

                 ┌──────────────────────────┐
                 │   Active Directory (AD)  │
                 └─────────────┬────────────┘
                               │
                         ┌─────▼─────┐
                         │   NPS     │  (RADIUS Authentication)
                         └─────┬─────┘
                               │
                               │ RADIUS
                               ▼
┌──────────────┐     TLS     ┌──────────────┐
│  VPN Client  │────────────►│ OpenVPN Pool │◄────────────┐
└──────────────┘             │ (HA Nodes)   │             │
        │                    └──────┬───────┘             │
        │ Cert + MFA                │                     │
        ▼                          ▼                     │
   MFA Provider              Internal Network            │
        │                                              PKI (Step-CA HA)
        ▼                                                   │
   Authentication Layer                                     ▼
                                              Certificate Issuance / Renewal

2. OpenVPN Production Baseline Configuration (Hardened)

Server Configuration (Validated Safe Baseline)

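port 1194
proto udp
dev tun

# Tunnel addressing (assumed example: a /22 covering the section 7 pools)
topology subnet
server 10.8.0.0 255.255.252.0

# PKI
ca ca.crt
cert server.crt
key server.key
# No finite-field DH parameters; key exchange uses ECDHE (OpenVPN 2.4+)
dh none
# Reject revoked client certificates (path is a placeholder)
crl-verify crl.pem

# Cryptography baseline
tls-version-min 1.2
auth SHA256
cipher AES-256-GCM
# On OpenVPN 2.5+, ncp-ciphers is deprecated in favor of data-ciphers
ncp-ciphers AES-256-GCM:AES-128-GCM

# TLS protection (mandatory)
tls-crypt ta.key

# Identity enforcement
remote-cert-tls client
# Accept only client certificates whose CN starts with "client"
verify-x509-name client name-prefix

# Privilege drop and restart persistence
user nobody
group nogroup
persist-key
persist-tun

# Stability
keepalive 10 60
explicit-exit-notify 1

# Logging
verb 3
status /var/log/openvpn-status.log
log-append /var/log/openvpn.log

# Auth integration (choose one)
# PAM
# plugin /usr/lib/openvpn/openvpn-plugin-auth-pam.so openvpn

# RADIUS (NPS)
# plugin /usr/lib/openvpn/radiusplugin.so /etc/openvpn/radiusplugin.cnf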

Client Configuration (Hardened)

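client
dev tun
proto udp
remote vpn.example.com 1194
resolv-retry infinite
nobind
persist-key
persist-tun

# PKI (file names are placeholders)
ca ca.crt
cert client.crt
key client.key

# Same key file the server loads with tls-crypt
tls-crypt ta.key

remote-cert-tls server
# Username/password prompt, needed when the server enables PAM or RADIUS auth
auth-user-pass

cipher AES-256-GCM
auth SHA256
verb 3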
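
Because a tls-crypt mismatch prevents any handshake, it can help to compare the server and client profiles before rollout. The following is a minimal sketch, not an OpenVPN tool: `parse_directives` and `profile_mismatches` are hypothetical helper names, only full-line comments are handled, and only the presence of `tls-crypt` is compared (the key files themselves still have to be identical).

```python
def parse_directives(text):
    """Map each directive name to its argument list (last occurrence wins)."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank line or full-line comment
            continue
        name, *args = line.split()
        directives[name] = args
    return directives


def profile_mismatches(server_text, client_text):
    """Return a list of differences worth reviewing before deployment."""
    server = parse_directives(server_text)
    client = parse_directives(client_text)
    problems = []

    # tls-crypt must be configured on both sides or on neither.
    if ("tls-crypt" in server) != ("tls-crypt" in client):
        problems.append("tls-crypt is set on only one side")

    # Directives that are expected to agree in this baseline.
    for name in ("proto", "dev", "cipher", "auth"):
        if name in server and name in client and server[name] != client[name]:
            problems.append(
                f"{name} differs: server={server[name]} client={client[name]}"
            )

    # The port in the client's "remote" line should match the server port.
    remote = client.get("remote", [])
    port = server.get("port", ["1194"])[0]
    if len(remote) >= 2 and remote[1] != port:
        problems.append(f"client remote port {remote[1]} != server port {port}")

    return problems
```

An empty list means the compared directives agree; it does not prove the profile will connect.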

3. High Availability Dependency Chain

Critical Path

Client → DNS → Load Balancer → OpenVPN Node → Auth (NPS/PAM) → AD/MFA
                                      │
                                      ▼
                                   PKI (Step-CA HA)

HA Targets (Operational Objectives)

Metric              Target
VPN RTO             < 15 minutes
VPN RPO             0 (no session persistence required)
Auth availability   99.9%+
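
To make these targets concrete, the sketch below converts an availability figure into a downtime budget and multiplies availabilities along a serial dependency chain such as the critical path. It assumes component failures are independent, which is a simplification, and the per-component figures in the usage note are hypothetical, not measurements from this design.

```python
def downtime_minutes(availability, days=30):
    """Allowed downtime in minutes over a period of `days` days."""
    return (1.0 - availability) * days * 24 * 60


def serial_availability(components):
    """Availability of a chain in which every component must be up.

    `components` maps a component name to its availability (0.0 to 1.0).
    Assumes failures are independent.
    """
    total = 1.0
    for availability in components.values():
        total *= availability
    return total
```

For example, `downtime_minutes(0.999)` is about 43.2 minutes per 30 days. With hypothetical figures such as `{"dns": 0.9999, "lb": 0.9995, "vpn": 0.999, "auth": 0.999}`, `serial_availability` comes out below 99.9%, which is why each link in the chain needs its own redundancy.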

PKI High Availability Design

  • Primary Step-CA node

  • Secondary standby CA (hot-standby or replicated DB)

  • Offline root CA (air-gapped)

  • Automated backup of the CA state every 24h


NPS Redundancy

  • At least 2 NPS servers

  • Load-balanced via DNS round-robin or RADIUS failover list

  • AD replication is required across sites


4. Failure Mode & Impact Table

Failure                Impact               Behavior             Recovery
NPS down               Auth failure         Immediate denial     Failover NPS
MFA down               Auth failure         Immediate denial     Break-glass only
PKI down               No new certs         Existing users OK    Restore CA
CRL stale              Revocation delayed   Risk window exists   Reload required
OpenVPN node failure   Partial outage       Failover required    LB reroute
DNS failure            Full outage          No connection        DNS restore
TLS-crypt mismatch     Total failure        No handshake         Redeploy key

5. Certificate & MFA Failure Scenarios

Certificate

Expiration Failure

  • Bulk outage at expiry boundary

  • Prevented via automated renewal monitoring

Revocation Delay

  • CRL is NOT real-time

  • A revoked certificate keeps working until the server has loaded the updated CRL and the client reconnects or renegotiates
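
The size of that risk window can be estimated by adding up the delays involved. This is a rough sketch under stated assumptions: the three inputs (CRL publish interval, distribution delay, and the interval at which the server re-checks a connected client) are hypothetical parameters that should be replaced with values measured in your own environment.

```python
from datetime import timedelta


def worst_case_revocation_window(publish_interval, distribution_delay,
                                 recheck_interval):
    """Upper bound on how long a revoked certificate may keep working.

    publish_interval:   time between CRL generations on the CA
    distribution_delay: time to copy the new CRL to every VPN node
    recheck_interval:   time until a connected client is re-verified
                        (reconnect or TLS renegotiation)
    """
    return publish_interval + distribution_delay + recheck_interval


# Example inputs (assumed): daily CRL, 5 minute sync, hourly renegotiation.
window = worst_case_revocation_window(
    timedelta(hours=24), timedelta(minutes=5), timedelta(hours=1)
)
print(window)
```

Shortening the publish interval usually reduces the window the most, since it is the largest term in this example.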

MFA

Provider outage

  • Full authentication failure

  • Requires break-glass access

Time drift

  • TOTP failures globally

  • Prevented via strict NTP enforcement
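
To see why clock drift breaks TOTP, the sketch below implements the standard TOTP calculation (RFC 6238, HMAC-SHA1) with the Python standard library and checks a code against a server clock. The acceptance window of one step on either side is an assumption; the MFA provider in use may apply a different tolerance.

```python
import hashlib
import hmac
import struct


def totp(secret, unix_time, step=30, digits=6):
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) for the given time."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def server_accepts(secret, client_time, server_time, window=1, step=30):
    """True if the client's code matches any step within +/- `window`."""
    code = totp(secret, client_time, step)
    return any(
        totp(secret, server_time + delta * step, step) == code
        for delta in range(-window, window + 1)
    )
```

With a 30 second step and a window of one step, a client that is 20 seconds off is still accepted, while a clock that is two minutes off falls outside the window and every login fails until time is corrected.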


6. Break-Glass Access (FULL DESIGN)

Method 1: Offline Emergency Account

  • Disabled by default

  • Stored in a secure vault (offline or PAM vault)

  • Requires console activation

Method 2: Certificate-only VPN profile

  • No MFA dependency

  • Restricted IP range

  • Firewall-limited access

Method 3: Out-of-band access

  • iDRAC / iLO / hypervisor console

  • Independent of the VPN stack

Controls

  • All break-glass usage logged

  • Time-bound activation (auto-expire)

  • Requires dual approval in enterprise environments
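
The time-bound and dual-approval controls can be modeled as a small record that is only valid inside its activation window. This is an illustrative sketch: the class name, field names, and the one-hour default lifetime are assumptions, not part of any vault or PAM product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BreakGlassActivation:
    account: str
    approvers: frozenset          # distinct people who approved
    activated_at: datetime        # timezone-aware activation time
    ttl: timedelta = timedelta(hours=1)  # assumed default lifetime

    def is_valid(self, now, required_approvals=2):
        """Valid only with enough approvers and before auto-expiry."""
        enough = len(self.approvers) >= required_approvals
        in_window = self.activated_at <= now < self.activated_at + self.ttl
        return enough and in_window

    def audit_line(self, now):
        """Single log line recording who approved and the current state."""
        state = "valid" if self.is_valid(now) else "invalid"
        names = ",".join(sorted(self.approvers))
        return f"{now.isoformat()} break-glass {self.account} {state} approvers={names}"
```

Using a set for approvers means the same person approving twice still counts once, which is the point of dual approval.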


7. Identity-Based Segmentation (Correct Model)

In tun mode (used in this design), OpenVPN does NOT support VLAN tagging.

Enforcement Methods

  • CCD (Client Config Directory)

  • Static IP pools per identity group

  • Firewall policy enforcement

Example

Group         IP Range       Policy
Admins        10.8.0.0/24    Full
Users         10.8.1.0/24    Limited
Contractors   10.8.2.0/24    Restricted
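
One way to tie the example ranges to CCD files is to generate the `ifconfig-push` line for each client from its group. The sketch below is an assumption-heavy illustration: it presumes `topology subnet`, a server netmask that covers all three pools (255.255.252.0 in the usage note is hypothetical), and the helper names `ccd_line` and `group_for` are invented here.

```python
import ipaddress

# Example pools from the table above.
POOLS = {
    "Admins": "10.8.0.0/24",
    "Users": "10.8.1.0/24",
    "Contractors": "10.8.2.0/24",
}


def ccd_line(group, host_index, server_netmask):
    """Build the CCD line that pins a client to an address in its group pool.

    Assumes "topology subnet"; server_netmask must equal the netmask of
    the server's VPN subnet.
    """
    pool = ipaddress.ip_network(POOLS[group])
    address = pool.network_address + host_index
    reserved = (pool.network_address, pool.broadcast_address)
    if address not in pool or address in reserved:
        raise ValueError(f"host index {host_index} is outside {pool}")
    return f"ifconfig-push {address} {server_netmask}"


def group_for(ip):
    """Return the group whose pool contains `ip`, or None (for firewall rules)."""
    address = ipaddress.ip_address(ip)
    for group, cidr in POOLS.items():
        if address in ipaddress.ip_network(cidr):
            return group
    return None
```

For instance, `ccd_line("Users", 10, "255.255.252.0")` yields `ifconfig-push 10.8.1.10 255.255.252.0`, and the firewall can then apply the Limited policy to everything `group_for` maps to Users.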

8. Operational Runbooks

VPN Node Failure

  1. Remove node from LB

  2. Redirect traffic to the healthy node

  3. Investigate logs

Auth Failure (NPS)

  1. Failover to secondary NPS

  2. Check AD replication

  3. Validate RADIUS health

PKI Failure

  1. Stop issuance

  2. Restore from backup

  3. Restart CA services


9. Monitoring & Health Checks

Required Metrics

  • VPN active sessions

  • Auth success/failure rate

  • CPU/memory on VPN nodes

  • Disk usage (log safety)

  • Certificate expiration window
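
The active-session metric can be read from the file written by the `status` directive. The sketch below assumes the default status format (version 1), where the client list sits between the `Common Name,...` header and the `ROUTING TABLE` line; the function name is hypothetical, and the format should be confirmed against the OpenVPN version in use.

```python
def connected_clients(status_text):
    """Parse the client list section of an OpenVPN status file (format v1)."""
    clients = []
    in_client_list = False
    for line in status_text.splitlines():
        if line.startswith("Common Name,"):
            in_client_list = True       # header row of the client list
            continue
        if line.startswith("ROUTING TABLE"):
            break                       # end of the client list
        if not in_client_list:
            continue
        fields = line.split(",")
        if len(fields) < 5:
            continue                    # skip malformed or empty rows
        clients.append({
            "common_name": fields[0],
            "real_address": fields[1],
            "bytes_received": int(fields[2]),
            "bytes_sent": int(fields[3]),
        })
    return clients
```

`len(connected_clients(text))` gives the session count to export to the monitoring system.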

Alert Thresholds

  • Auth failure rate > 10% in 5 min → critical

  • Disk usage > 80% → warning

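  • Certificate expiry < 14 days → alert (for server and CA certificates; client certificates with a 7–30 day lifetime need a renewal-failure alert instead, since they are always near expiry)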
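
The thresholds above can be expressed as one small evaluation function. This is a sketch of the rule logic only: the function name and input parameters are assumptions, and collecting the inputs (auth counters over the last 5 minutes, disk usage, days to expiry) is left to the monitoring stack.

```python
def evaluate_alerts(auth_failures, auth_total, disk_used_fraction,
                    days_to_cert_expiry):
    """Return (severity, message) tuples for every threshold that is crossed.

    auth_failures / auth_total: counts from the last 5 minutes
    disk_used_fraction:         0.0 to 1.0
    days_to_cert_expiry:        days until the monitored certificate expires
    """
    alerts = []
    if auth_total > 0 and auth_failures / auth_total > 0.10:
        alerts.append(("critical", "auth failure rate above 10%"))
    if disk_used_fraction > 0.80:
        alerts.append(("warning", "disk usage above 80%"))
    if days_to_cert_expiry < 14:
        alerts.append(("alert", "certificate expires in under 14 days"))
    return alerts
```

A window with no authentication attempts produces no failure-rate alert, which avoids a division by zero but also means a total outage needs a separate check on session or attempt counts.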


10. Production Security Baseline Checklist (FULL)

This checklist defines the minimum enforceable controls required for a production OpenVPN deployment. Every item must be implemented and validated in staging before production release.


Identity

  • Client certificate authentication is mandatory for all VPN connections (no password-only authentication permitted)

  • The certificate authority is stored in a restricted administrative environment with an offline backup of the root CA

  • MFA is enforced for all interactive authentication flows via PAM or RADIUS (NPS)

  • AD or a centralized identity provider is the source of truth for all user lifecycle management

  • Group membership is used for authorization decisions (not local OpenVPN config logic)

  • Break-glass accounts exist, are stored offline, and are tested quarterly

  • Identity deprovisioning removes VPN access within 24 hours of account disablement


Cryptography

  • TLS-crypt is enabled for all VPN connections to protect the control channel traffic

  • TLS minimum version is enforced at 1.2 or higher; TLS 1.0 and 1.1 are disabled

  • Cipher suite is restricted to AES-256-GCM (preferred) or AES-128-GCM (fallback)

  • HMAC authentication uses SHA256 or stronger

  • Forward secrecy is enabled through modern TLS negotiation

  • Weak ciphers, static keys, and legacy renegotiation are disabled

  • Certificate key strength is a minimum of RSA 2048-bit or ECDSA P-256


PKI

  • Certificate authority architecture includes an offline root CA and an online intermediate CA

  • Certificate issuance is automated via Step-CA or an equivalent secure CA system

  • Certificate validity periods are short-lived (recommended 7–30 days for clients)

  • The automated renewal process is tested and monitored for failure conditions

  • CRL generation and distribution are automated and validated at least every 24 hours

  • The revocation process is documented and tested with measured propagation delay

  • CA private keys are stored in encrypted storage or hardware-backed security modules (HSM, where possible)

  • Backup and disaster recovery procedures for PKI are tested quarterly


Network

  • VPN IP pools are explicitly defined and segmented per identity group

  • No overlapping IP ranges exist across VPN pools or internal subnets

  • CCD (Client Config Directory) or equivalent policy-based assignment is used for segmentation

  • Firewall rules enforce least-privilege access between VPN user groups

  • DNS resolution for VPN clients is restricted to internal resolvers only

  • Split tunneling is explicitly defined and disabled unless the business justifies it

  • Routing tables are explicitly documented and validated for each VPN segment


High Availability (HA)

  • At least two OpenVPN nodes are deployed in active/active or active/passive configuration

  • A load balancer or failover mechanism (HAProxy or keepalived) is implemented and tested

  • NPS/RADIUS authentication servers are deployed in a redundant configuration

  • Active Directory replication is validated across authentication sites

  • PKI system includes a recovery or standby capability for certificate issuance

  • DNS redundancy is configured for VPN endpoint resolution

  • Failure of any single VPN node does not result in a full service outage

  • HA failover has been tested under simulated real-world outage conditions


Logging & Monitoring

  • Authentication success and failure events are logged and forwarded to SIEM

  • VPN session start, stop, and duration logs are collected centrally

  • MFA success and failure events are logged and correlated with authentication attempts

  • Certificate issuance, renewal, and revocation events are logged

  • Logs are retained for a minimum of 90 days in hot storage and longer in archive storage per policy

  • Log storage is protected against tampering using append-only or immutable storage mechanisms

  • Time synchronization (NTP) is enforced across all infrastructure components

  • Disk usage monitoring is implemented to prevent log exhaustion outages

  • Alerts are configured for authentication anomalies, brute-force attempts, and MFA failures


Operations

  • All configuration changes are managed through formal change control processes

  • Infrastructure-as-code or version-controlled configuration management is enforced where possible

  • OpenVPN configuration changes are tested in staging before production deployment

  • Restart and reload behavior has been validated for the specific OpenVPN version in use

  • Rollback procedures are documented, tested, and executable within the defined RTO

  • Dependency failures (MFA, NPS, PKI, DNS) are documented with known impact behavior

  • Patch management for VPN servers is scheduled and tested for compatibility

  • Administrative access to the VPN infrastructure is restricted and audited


Recovery

  • Break-glass access mechanisms are defined, tested, and secured offline

  • Out-of-band management access (iDRAC, iLO, hypervisor console) is available for all VPN nodes

  • The full VPN outage recovery procedure is documented step-by-step

  • Recovery procedures include DNS restoration, authentication failover, and PKI restoration steps

  • Recovery drills are performed at least annually to validate operational readiness

  • Backup restoration procedures for PKI and VPN configuration are tested regularly

  • Emergency access usage is logged, reviewed, and time-restricted


11. Key Engineering Warnings (DO NOT IGNORE)

  • OpenVPN does NOT enforce AD policies directly

  • CRL is NOT real-time revocation

  • TLS-crypt failure causes total outage

  • MFA dependency failure causes a full lockout

  • IP segmentation requires a CCD or firewall enforcement

  • Reload behavior is environment-dependent


Final Statement

This architecture eliminates ambiguity by explicitly defining:

  • What fails

  • How it fails

  • How to recover

  • What dependencies exist

It is intended as a production engineering standard, not a theoretical security model.
