This post pits four Docker multi-host network solutions against each other, comparing both features and performance.
If you only want the results, jump directly to the Conclusion chapter.
Docker started out with simple single-host networking. Unfortunately, this prevents Docker clusters from scaling out to multiple hosts. A number of projects focus on this problem, such as Calico, Flannel and Weave, and since November 2015 Docker has supported multi-host overlay networking itself.
What these projects have in common is that they take control of the container's networking configuration in order to capture and inject network packets. Consequently, containers located on different hosts can get IPs in the same subnet and communicate with each other as if they were connected to the same L2 switch. In this way, containers can spread across multiple hosts, even across multiple data centers.
Still, there are many differences between them in technical model, network topology and features. This post focuses on those differences between Calico, Flannel, Weave and Docker Overlay Network, so you can choose the solution that best fits your requirements.
According to the features these Big Four support, I will compare them in the following aspects:
Now let’s see more details of these aspects on Calico, Flannel, Weave and Docker Overlay Network.
Multi-host networking means aggregating containers on different hosts into one virtual network, and the networking providers themselves (Calico, etc.) are organized as a cluster as well. This cluster organization is called the network model in this post. Technically, the four solutions use different network models to organize their topologies.
Calico implements a pure Layer 3 approach to achieve simpler, more scalable, better-performing and more efficient multi-host networking, so Calico cannot be treated as an overlay network. The pure Layer 3 approach avoids the packet encapsulation associated with Layer 2 solutions, which simplifies diagnostics, reduces transport overhead and improves performance. Calico also implements the BGP protocol for routing combined with a pure IP network, which allows Internet-scale virtual networks.
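To make the pure Layer-3 model concrete, here is a hedged sketch of what a host's routing table can look like under Calico; the interface names and addresses are illustrative, not taken from a real deployment:

```bash
# Illustrative only: local containers get /32 routes on their veth devices,
# while containers on other hosts are reached via routes learned over BGP.
$ ip route
192.168.100.1 dev cali0 scope link
192.168.100.2 dev cali1 scope link
192.168.100.64/26 via 192.168.236.131 dev eth0
```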
Flannel has two network models to choose from. One is the UDP backend, a simple IP-over-IP solution that uses a TUN device to encapsulate every IP fragment in a UDP packet, thus forming an overlay network; the other is the VxLAN backend, which works the same way as Docker Overlay Network. I ran a simple test of these two models: VxLAN is much faster than the UDP backend. The reason, I suggest, is that VxLAN is well supported by the Linux kernel, while the UDP backend implements a pure software-layer encapsulation. Flannel requires an Etcd cluster to store the network configuration, allocate subnets and hold auxiliary data (such as each host's IP), and packet routing also requires the cooperation of the Etcd cluster. Besides, Flannel runs a separate process, flanneld, on the host to support packet switching. Apart from Docker, Flannel can also be used for traditional VMs.
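As a concrete illustration, mirroring the Flannel configuration shown in detail later in this post, the backend is selected in the network config stored in Etcd, so switching from the UDP backend to VxLAN is a one-key change:

```bash
# Select the VxLAN backend instead of "Type": "udp".
./etcdctl set /coreos.com/network/config \
  '{"Network": "10.0.0.0/8", "Backend": {"Type": "vxlan"}}'
```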
Weave also has two connection modes. One is called sleeve, which implements a UDP channel to carry IP packets from containers. The main difference between Weave's sleeve mode and Flannel's UDP backend is that Weave merges multiple containers' packets into one packet before transferring it over the UDP channel, so sleeve mode is a bit faster than Flannel's UDP backend in most cases. The other connection mode of Weave is called fastdp, which also implements a VxLAN solution. Though there are no official documents clarifying the VxLAN usage, we can still find it in the Weave source code. Weave runs a Docker container that performs the same role as flanneld.
Docker Overlay Network implements a VxLAN-based solution with the help of libnetwork and libkv, and, of course, is integrated into Docker itself without any separate process or container.
A brief conclusion of the network models is in the following table:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Network Model | Pure Layer-3 Solution | VxLAN or UDP Channel | VxLAN or UDP Channel | VxLAN |
Since containers are connected to each other, we need a method to put containers into different groups and isolate the containers in different groups.
Flannel, Weave and Docker Overlay Network use the same application isolation scheme: traditional CIDR isolation. CIDR isolation uses the netmask to identify different subnets, and machines in different subnets cannot talk to each other. For example, say w1/w2/w3 have IPs 192.168.0.2/24, 192.168.0.3/24 and 192.168.1.2/24 respectively. w1 and w2 can talk to each other since they are in the same subnet 192.168.0.0/24, but w3 cannot talk to w1 or w2.
Calico implements another type of application isolation scheme: profiles. You can create a batch of profiles and assign containers on the Calico network to different profiles. Only containers in the same profile can talk to each other; containers in different profiles cannot reach each other even if they are in the same CIDR subnet.
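For example (the profile and container names are illustrative; the full walkthrough appears in the Calico chapter later in this post):

```bash
calicoctl profile add PROF_1
calicoctl profile add PROF_2
# worker-1 and worker-3 can now reach each other; worker-2 cannot reach them,
# even if all three sit in the same CIDR subnet.
calicoctl container worker-1 profile append PROF_1
calicoctl container worker-3 profile append PROF_1
calicoctl container worker-2 profile append PROF_2
```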
Brief conclusion:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Application Isolation | Profile Schema | CIDR Schema | CIDR Schema | CIDR Schema |
Since Calico is a pure Layer-3 solution, not all Layer-3 or Layer-4 protocols are supported. On the official GitHub forum, Calico developers state that only TCP, UDP, ICMP and ICMPv6 are supported. It does make sense that supporting other protocols is harder in such a Layer-3 solution.
The other solutions support all protocols. This is easy for them to achieve because both UDP encapsulation and VxLAN encapsulate L2 packets over L3, so it doesn't matter what protocol the packet carries.
Brief conclusion:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Protocol Support | TCP, UDP, ICMP & ICMPv6 | All | All | All |
Weave supports a name service between containers. When you create a container, Weave registers it in a DNS name service with the format {hostname}.weave.local. Thus you can reach any container via {hostname}.weave.local or simply {hostname}. The suffix (weave.local) can be changed to another string, and the DNS lookup service can also be turned off.
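For example, assuming two containers named w1 and w2 were started through Weave:

```bash
docker exec w1 ping -c4 w2.weave.local   # full DNS name
docker exec w1 ping -c4 w2               # the short hostname also resolves
```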
The others don't have such a feature.
Brief conclusion:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Name Service | No | No | Yes | No |
For Calico, Flannel and Docker Overlay Network, a distributed storage such as Etcd or Consul is required to exchange routing and host information. Docker Overlay Network can also cooperate with Docker Swarm's discovery services to build a cluster.
Weave, however, doesn't need a distributed storage because Weave itself has a node discovery service based on a rumor (gossip) protocol. This design decouples Weave from an external distributed storage system, but it introduces complexity and consistency concerns for IP allocation, as well as IPAM performance concerns as the cluster grows larger.
Brief conclusion:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Distributed Storage Requirements | Yes | Yes | No | Yes |
Flannel supports a TLS-encrypted channel between Flannel and Etcd, as well as on the data path between Flannel peers. You can see more details in `flanneld --help` under the `-etcd-certfile` and `-remote-certfile` parameters.
Weave can be configured to encrypt both control data passing over TCP connections and the payloads of UDP packets sent between peers. This is accomplished with the NaCl crypto libraries, employing Curve25519, XSalsa20 and Poly1305 to encrypt and authenticate messages. Weave protects against injection and replay attacks for traffic forwarded between peers.
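Encryption is enabled by launching every peer with the same `--password` (the value below is a placeholder); peers that don't know the password cannot join:

```bash
weave launch --password wfvAwt7sj
```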
Calico and Docker Overlay Network don't support any kind of encryption, neither on the Calico-Etcd channel nor on the data path between Calico peers. But Calico achieves the best performance among the four solutions, so it is a better fit for an internal environment or when data safety is not a concern.
Brief conclusion:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Encryption Channel | No | TLS | NaCl Library | No |
Weave can be deployed in a partially connected network; a brief example follows:
There are four peers, with peers 1-3 connected to each other and peer 4 connected only to peer 3. Weave can be deployed on peers 1-4, and any traffic from containers on peer 1 to containers on peer 4 will be relayed via peer 3.
This feature allows Weave to connect hosts separated by a firewall, and thus to connect hosts with internal IP addresses across different data centers.
The others don't have such a feature.
Brief conclusion:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Partially Connected Network Support | No | No | Yes | No |
Since Weave and Docker Overlay Network create a bridge device on the host and a veth pair into each container, they give containers a separate vNIC. The container's routing table is also changed so that all cluster-network packets go through this newly created NIC, while other connections, such as to google.com, are routed through the original vNIC.
Calico can use a unified vNIC per container because it is a pure Layer-3 solution. Calico configures NAT for outgoing requests and forwards subnet packets to other Calico peers. Calico can also use the Docker bridge NIC for outgoing requests with some manual configuration inside containers; in this case, you need to add the `--cap-add=NET_ADMIN` parameter when executing `docker run`.
Flannel directly uses the local Docker bridge `docker0` to handle all transport, so containers on a Flannel network see only one vNIC inside.
Brief conclusion:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Separate vNIC for Container | No | No | Yes | Yes |
Technically, for VxLAN-based solutions, tenant networks can have overlapping internal IP addresses, though the IP addresses assigned to hosts must be unique. According to the VxLAN specification, Weave, Flannel and Docker Overlay Network should be able to support IP overlap for containers. But in my testing environment, I could not configure any of the three to support IP overlap, so I can only say they have the potential to support it.
Calico cannot support IP overlap technically, but Calico's official documents emphasize that they can carry overlapping IPv4 containers' packets over an IPv6 network. Although this is an alternative for IPv4 networks, I prefer to treat Calico as not supporting IP overlap.
Brief conclusion:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| IP Overlap Support | No | Maybe | Maybe | Maybe |
This section focuses on whether the container subnet can overlap with the host network.
Flannel creates a real bridge network on the host with the subnet address and uses the host's Linux routing table to forward container packets to this bridge device. So Flannel's container subnet cannot overlap with the host network, or the host's routing table will be confused.
Calico is a pure Layer-3 implementation, and packets from containers to the outside world traverse the NAT table. So Calico has the same restriction: the container subnet cannot overlap with the host network.
Weave doesn't use the host routing table to differentiate container packets, but uses pcap to deliver packets to the right place. So Weave doesn't need to obey the subnet restriction, and it is free to allocate a container the same IP address as a host. Besides, you can also change the IP configuration inside a container, and the container can still be reached at the new IP.
Docker Overlay Network allows containers and hosts in the same subnet and achieves isolation between them. But Docker Overlay Network relies on Etcd to record routing information, so changing a container's IP address manually will mess up the routing process and can leave the container unreachable.
Brief conclusion:
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Container Subnet Restriction | No | No | Yes, configurable after start | Yes, not configurable after start |
So let's put a final conclusion of all the aspects into one table. This table is one of the best references for choosing the right multi-host networking solution.
| | Calico | Flannel | Weave | Docker Overlay Network |
|---|---|---|---|---|
| Network Model | Pure Layer-3 Solution | VxLAN or UDP Channel | VxLAN or UDP Channel | VxLAN |
| Application Isolation | Profile Schema | CIDR Schema | CIDR Schema | CIDR Schema |
| Protocol Support | TCP, UDP, ICMP & ICMPv6 | All | All | All |
| Name Service | No | No | Yes | No |
| Distributed Storage Requirements | Yes | Yes | No | Yes |
| Encryption Channel | No | TLS | NaCl Library | No |
| Partially Connected Network Support | No | No | Yes | No |
| Separate vNIC for Container | No | No | Yes | Yes |
| IP Overlap Support | No | Maybe | Maybe | Maybe |
| Container Subnet Restriction | No | No | Yes, configurable after start | Yes, not configurable after start |
My future plan is to test the performance of these four multi-host network solutions. Since there is already too much content in this post, I will create a new post with the details of the performance test.
Weave creates a virtual network that connects Docker containers deployed across multiple hosts and provides DNS-based discovery for them. Containers on different hosts can communicate with each other just as if they were in the same LAN, and broadcast is also well supported in such a network. Besides, containers can discover each other by hostname via Weave's DNS discovery module, which is not supported by the other multi-host network solutions.
Weave can also traverse firewalls and operate in partially connected networks. Packets travel via the shortest path to the destination host holding the container, even when that host hides behind a firewall and the sending host cannot reach it directly. Traffic can also be encrypted, allowing hosts to connect to each other over untrusted networks.
Weave also cooperates with Docker's existing single-host or overlay network, so there is a separate NIC for Weave inside each container, as well as a Weave virtual NIC on the host to capture all the packets sent from containers.
Two or more hosts (VMs or physical machines) are needed to set up a Docker cluster via Weave. Here I use two Ubuntu 15.10 VMs running on my Mac. Let's name these two hosts node1 and node2, with IPs 10.156.75.101 and 10.156.75.102 respectively. Please ensure you are running Linux (kernel 3.8 or later) and have Docker (version 1.3.1 or later) installed. `curl` or an alternative (e.g. wget) is also necessary to download the Weave binary.
Then run the following command to finish the Weave installation:
```bash
sudo curl -L git.io/weave -o /usr/local/bin/weave
```
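One step the snippet above leaves implicit: the downloaded binary must be made executable before use:

```bash
sudo chmod a+x /usr/local/bin/weave
```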
Thus Weave is installed successfully. It's that easy, right? The most important part of Weave is not the binary itself: when Weave starts, two containers, weaveworks/weaveexec and weaveworks/weave, run to handle all the network configuration and the network discovery service.
Run this on node1 to start the Weave service:
```
root@node1:~# weave launch
```
You can see the two Weave containers here. Then on node2, launch Weave with its partner node1 (10.156.75.101):
```
root@node2:~# weave launch 10.156.75.101
```
To confirm that the Weave cluster started successfully, run the following command on node1:
```
root@node1:~# weave status connections
```
Now you have successfully set up a Weave connection between node1 and node2.
After the Weave cluster has started, you can run containers on node1 and node2:
```
root@node1:~# weave run -itd --name=w1 ubuntu
```
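The ping test below assumes a second container, w2, started the same way on node2; a minimal sketch:

```
root@node2:~# weave run -itd --name=w2 ubuntu
```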
Then these two containers can communicate with each other. Test with a simple `ping`:
```
root@node1:~# docker exec w1 ping -c4 w2
```
After setting up the Weave network, I used iperf to run a simple performance test between containers on the same/different hosts, and compared the results with native network performance.
Here is the native performance between two hosts:
```
root@node1:~# iperf -c node2
```
And the performance between containers on different hosts:
```
root@w3:/# iperf -c w1
```
The performance between containers on the same host:
```
root@a1:/# iperf -c w1
```
You can see the network between containers on the same host is much faster. The reason is that Weave uses pcap to identify whether a packet's destination is on the same host or not; for communication between containers on the same host, Weave can directly forward the packets to the right destination.
This is only a simple performance test. I will perform a detailed test in a following blog post comparing Weave, Calico, Flannel and Docker Overlay Network.
The main difference between Weave and other Docker multi-host network solutions is that Weave uses a number of peers that act as routers residing on different hosts. These routers form a network among the hosts and send or route packets to the right destination. Each peer has a human-friendly nickname and a unique identifier that is different on each run.
Weave routers establish TCP connections to each other to perform initial handshakes and exchange topology at runtime. Peers also establish UDP tunnels to carry encapsulated network packets. These packets can traverse firewalls with the help of other routers.
Weave creates a network bridge on each host, and each container is connected to this bridge. After you start a container on a Weave network, you can find the created bridge via `ifconfig`.
```
root@node1:~# ifconfig
```
This bridge forwards packets to and from the containers. Besides, each container connected to this bridge also gets a veth NIC; the container side of the veth pair is given an IP address and netmask by Weave's IPAM module. The Weave router captures Ethernet packets from the bridge interface using pcap. Packets traveling between local containers typically bypass the router, which gives better networking performance between local containers. For packets between different hosts, the Weave router chooses the best route and sends the packet to the next hop.
Differing from the other solutions, Weave doesn't rely on distributed storage (e.g. Etcd or Consul) to exchange routing information. Weave peers build a routing network among themselves and implement a rumor (gossip) protocol to exchange network topology when peers join and leave. Weave can also operate on a partially connected network and exchange packets with the help of other peers. Consider a partially connected network as follows:
Peers 1/2/3 are connected to each other, while peer 4 only connects to peer 3. If containers on peer 1 want to talk to containers on peer 4, the packets will first be sent to peer 3 and then forwarded to peer 4. Connections between two directly connected hosts can use a fastdp connection, while indirect connections can only use sleeve connections, and these two connection types differ hugely in speed. I ran a simple test with iperf on three containers: w1 & w2 and w1 & w3 sit on directly connected hosts, but w2 & w3 sit on indirectly connected hosts. From the indirectly connected host node-pub-1, you can run `weave status connections` to retrieve the connections:
```
root@node-pub-1:~# weave status connections
```
The speed test results are as follows:
```
root@w3:/# iperf -c w2
```
We can see the directly connected w1 and w3 achieve a quite high throughput of 1.52 Gbits/sec, while the indirectly connected w2 and w3 only get about 10% of that bandwidth. This could be a bottleneck for the Weave developers to overcome.
Weave provides a Docker API proxy so you can control Weave-attached containers the same way you control ordinary Docker containers, instead of using `weave run`. This allows you to use the ordinary Docker command-line interface or remote API to create and manage containers on the Weave network.
In the previous chapters, we used `weave launch` to run Weave directly, and we can see the Weave-related containers created on the host:
```
root@node1:~# docker ps
```
Of these two Weave services, weave performs the main functions of the Weave network, such as network configuration and DNS lookup, while weaveproxy acts as a proxy between the Docker client (command line or API) and the Docker daemon, intercepting the communication between these two components.
Actually, `weave launch` performs `weave launch-router` and `weave launch-proxy` in a batch; you can run `weave launch-router` and `weave launch-proxy` separately with different parameters. For example, if you want to control Weave via a TCP port instead of a Unix socket file, you just need to add the `-H` parameter to `weave launch-proxy`. You can run `weave stop-proxy` first if you already used `weave launch` to launch both the router and the proxy.
```
root@node1:~# weave stop-proxy
```
From `weave env`, you can see the DOCKER_HOST currently intercepted by Weave is tcp://127.0.0.1:9999, and you can use `docker -H 127.0.0.1:9999 <command>` to control Docker with Weave network support.
You can also use the following command to put tcp://127.0.0.1:9999 into the DOCKER_HOST environment variable, so you can use `docker` directly without specifying the API address.
```
root@node1:~# eval $(weave env)
```
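After `eval $(weave env)`, a plain `docker run` goes through the proxy and the container is attached to the Weave network automatically; for example (the container name is illustrative):

```bash
docker run -itd --name=w4 ubuntu
```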
For more details about the Weave proxy, see the official Weave proxy documentation page.
Some more parameters can be set when launching Weave to make IP allocation more flexible, which achieves application isolation via the CIDR network isolation scheme. From `weave help` you can see more detailed parameters for `weave launch`. The bad thing is that there are no more details on these parameters beyond listing them directly, but from their names you can guess what they do:
```
root@node1:~# weave help
```
For these params:

- `--password`: password for the Weave cluster; a new Weave node must supply this password to join
- `--nickname`: alias of the Weave node instead of its hostname
- `--ipalloc-range`: IP range allocated for containers
- `--ipalloc-default-subnet`: default subnet allocated to containers; you can use `-e WEAVE_CIDR=net:${CIDR}` when running a container to pick another subnet (see the next chapter for more details)
- `--no-discovery`: don't use the DNS discovery service
- `--init-peer-count <count>`: start the service only after `<count>` peers connect to the cluster

So if you want more flexible IP allocation, run the following commands on node1 and node2:
```
root@node1:~# weave launch --ipalloc-range 10.2.0.0/16 --ipalloc-default-subnet 10.2.1.0/24
```
```
root@node2:~# weave launch --ipalloc-range 10.2.0.0/16 --ipalloc-default-subnet 10.2.1.0/24 $node1
```
This delegates the entire 10.2.0.0/16 subnet to Weave and instructs it to allocate from 10.2.1.0/24 within that range if no specific subnet is specified. Now we can launch some containers in the default subnet:
```
root@node1:~# docker run --name a1 -ti ubuntu
```
And some more containers in a different subnet:
```
root@node1:~# docker run -e WEAVE_CIDR=net:10.2.2.0/24 --name b1 -ti ubuntu
```
A quick `ping` test illustrates the network connections between a1-a2 and b1-b2:
```
root@node1:~# docker exec a1 ping -c 4 a2
```
While there are no connections between a1-b2 or b1-a2:
```
root@node1:~# docker exec a1 ping -c 4 b2
```
Weave is a good network management tool for Docker and provides the most features compared with the other solutions. You can find more feature details in its official feature documentation.
[1] Weaveworks homepage, http://weave.works/
[2] Weave GitHub homepage, https://github.com/weaveworks/weave
[3] Weave features, http://docs.weave.works/weave/latest_release/features.html
[4] Weave proxy reference, http://docs.weave.works/weave/latest_release/proxy.html
Multi-host networking was announced as part of an experimental release in June 2015, and became part of the stable Docker Engine release this month. There are already several multi-host networking solutions for Docker, such as Calico and Flannel. Docker multi-host networking uses a VXLAN-based solution with the help of the libnetwork and libkv libraries, so the overlay network requires a valid key-value store service to exchange information between the different Docker Engines. Docker implements a built-in VXLAN-based overlay network driver in the libnetwork library to support wide-ranging virtual networks across multiple hosts.
Before using Docker overlay networking, check the Docker version with `docker -v` to confirm it is at least v1.9. In this blog I prepare an environment with two Linux nodes (node1/node2) with IPs 192.168.236.130/131, connected physically or virtually; confirm they have network access to each other.
Download and run Etcd, replacing {node} with node0/1 respectively. We need at least two Etcd nodes since the new version of Etcd cannot run on a single node.

```bash
curl -L https://github.com/coreos/etcd/releases/download/v2.2.1/etcd-v2.2.1-linux-amd64.tar.gz -o etcd-v2.2.1-linux-amd64.tar.gz
```
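The block above only shows the download step. A minimal sketch of how the two-node Etcd v2 cluster might be started follows; the node names, data layout and static-bootstrap flags are my assumptions based on Etcd's standard v2 clustering options (swap the name and IPs on node2):

```bash
tar xzvf etcd-v2.2.1-linux-amd64.tar.gz && cd etcd-v2.2.1-linux-amd64
# On node1 (192.168.236.130); use --name node2 and .131 addresses on node2.
./etcd --name node1 \
  --initial-advertise-peer-urls http://192.168.236.130:2380 \
  --listen-peer-urls http://192.168.236.130:2380 \
  --listen-client-urls http://192.168.236.130:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.236.130:2379 \
  --initial-cluster node1=http://192.168.236.130:2380,node2=http://192.168.236.131:2380 \
  --initial-cluster-state new
```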
The Docker Engine daemon should be started with the cluster parameters `--cluster-store` and `--cluster-advertise`, so that the Docker Engines running on different nodes can communicate and cooperate with each other. Here we set `--cluster-store` to the Etcd service host and port, and `--cluster-advertise` to the node's IP and Docker daemon port. Stop the current Docker daemon and start it with the new parameters.
On node1:

```bash
sudo service docker stop
sudo /usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-store=etcd://192.168.236.130:2379 --cluster-advertise=192.168.236.130:2375
```
On node2:

```bash
sudo service docker stop
sudo /usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --cluster-store=etcd://192.168.236.131:2379 --cluster-advertise=192.168.236.131:2375
```
That completes the preparation.
On either node, we can execute `docker network ls` to see Docker's network configuration. Here's the output on node1:

```
docker@node1:~# sudo docker network ls
NETWORK ID          NAME                DRIVER
80a36a28041f        bridge              bridge
6b7eab031544        none                null
464fe03753fb        host                host
```
Then we use the `docker network` command to create a new overlay network:

```
docker@node1:~# sudo docker network create -d overlay myapp
904f9dc335b0f91fe155b26829287c7de7c17af5cfeb9c386a1ccf75c42cd3eb
```
Wait a moment; the output of this command is the ID of the new overlay network. Then execute `docker network ls` on either node:

```
docker@node1:~# sudo docker network ls
NETWORK ID          NAME                DRIVER
904f9dc335b0        myapp               overlay
80a36a28041f        bridge              bridge
6b7eab031544        none                null
464fe03753fb        host                host
52e9119e18d5        docker_gwbridge     bridge
```
52e9119e18d5 docker_gwbridge bridge
On both node1 and node2, two networks, myapp and docker_gwbridge, have been added, with types overlay and bridge respectively. myapp represents the overlay network, associated with eth0 inside containers, while docker_gwbridge represents the bridge network that provides Internet access, associated with eth1 inside containers.
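You can verify which containers are attached to each network and which subnets they received with `docker network inspect` (output omitted here):

```bash
sudo docker network inspect myapp
sudo docker network inspect docker_gwbridge
```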
On node1:

```
docker@node1:~# sudo docker run -itd --name=worker-1 --net=myapp ubuntu
```
And on node2:

```
docker@node2:~# sudo docker run -itd --name=worker-2 --net=myapp ubuntu
```
Then check the network devices inside the container. On node1, execute:

```
docker@node1:~# sudo docker exec worker-1 ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:0a:00:00:02
          inet addr:10.0.0.2  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::42:aff:fe00:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:5475264 errors:0 dropped:0 overruns:0 frame:0
          TX packets:846008 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7999457912 (7.9 GB)  TX bytes:55842488 (55.8 MB)

eth1      Link encap:Ethernet  HWaddr 02:42:ac:12:00:02
          inet addr:172.18.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe12:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:12452 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6883 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:22021017 (22.0 MB)  TX bytes:376719 (376.7 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
```
Here we can see two NICs in the container, with IPs 10.0.0.2 and 172.18.0.2. eth0 connects to the overlay network and eth1 connects to docker_gwbridge, so the container has access both to containers on the other host and to the Internet. Run the same command on node2, and we can see the IP of eth0 in worker-2 is 10.0.0.3, assigned consecutively.
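For completeness, the command on node2 is the same:

```
docker@node2:~# sudo docker exec worker-2 ifconfig
```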
Then test the connection between worker-1 and worker-2. On node1, execute:

```
docker@node1:~# sudo docker exec worker-1 ping -c 4 10.0.0.3
PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=0.735 ms
64 bytes from 10.0.0.3: icmp_seq=2 ttl=64 time=0.581 ms
64 bytes from 10.0.0.3: icmp_seq=3 ttl=64 time=0.444 ms
64 bytes from 10.0.0.3: icmp_seq=4 ttl=64 time=0.447 ms

--- 10.0.0.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.444/0.551/0.735/0.122 ms
```
I did a simple performance test between the two containers with iperf; here are the results.
First I tested the native network performance between node1 and node2:
```
docker@node2:~# iperf -c 192.168.236.130
------------------------------------------------------------
Client connecting to 192.168.236.130, TCP port 5001
TCP window size:  136 KByte (default)
------------------------------------------------------------
[  3] local 192.168.236.131 port 36910 connected with 192.168.236.130 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.59 GBytes  2.22 Gbits/sec
```
Then the network performance between worker-1 and worker-2:

```
root@3f8bc51fb458:~# iperf -c 10.0.0.2
------------------------------------------------------------
Client connecting to 10.0.0.2, TCP port 5001
TCP window size: 81.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.3 port 48096 connected with 10.0.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.84 GBytes  1.58 Gbits/sec
```
The overlay network performance is a bit worse than native. It's also a little worse than Calico, which is almost the same as native. Since Calico uses a pure Layer-3 approach while Docker Multi-host Overlay Network uses a VXLAN solution (MAC over UDP), it makes sense that Calico gains better performance.
Virtual Extensible LAN (VXLAN) is a network virtualization technology that attempts to ameliorate the scalability problems associated with large cloud computing deployments. It uses a VLAN-like encapsulation technique to encapsulate MAC-based OSI Layer 2 Ethernet frames within Layer 4 UDP packets. Open vSwitch was an earlier implementation of VXLAN, but Docker Engine implements its own built-in VXLAN driver in libnetwork.
For more VXLAN details, see the official RFC and a white paper from Emulex. I plan to post another blog with a more detailed discussion of VXLAN technology.
[1] Docker Multi-host Networking Post: http://blog.docker.com/2015/11/docker-multi-host-networking-ga/
[2] Docker Network Docs: http://docs.docker.com/engine/userguide/networking/dockernetworks/
[3] Get Started Overlay Network for Docker: https://docs.docker.com/engine/userguide/networking/get-started-overlay/
[4] Docker v1.9 Announcement: https://blog.docker.com/2015/11/docker-1-9-production-ready-swarm-multi-host-networking/
[5] VXLAN Official RFC: https://datatracker.ietf.org/doc/rfc7348/
[6] VXLAN White Paper: https://www.emulex.com/artifacts/d658610a-d3b6-457c-bf2d-bf8d476c6a98/elx_wp_all_VXLAN.pdf
Flannel, similar to Calico, Weave and Docker's VXLAN-based overlay, provides a configurable virtual overlay network for Docker. Flannel runs an agent, flanneld, on each host, which is responsible for allocating a subnet lease out of a preconfigured address space. Flannel uses Etcd to store network configurations. I copied this architecture image from the Flannel GitHub page to illustrate the path a packet takes as it traverses the overlay network.
Since Flannel depends on Etcd, you need to download, run and configure Etcd before starting flanneld. Assume you have two Linux VMs (or physical machines) with hostnames node1/node2 and IPs 192.168.236.130/131 respectively. On each node, download and run Etcd as follows:

```bash
curl -L https://github.com/coreos/etcd/releases/download/v2.2.1/etcd-v2.2.1-linux-amd64.tar.gz -o etcd-v2.2.1-linux-amd64.tar.gz
```
Flannel reads its configuration from Etcd. By default, it reads the configuration from /coreos.com/network/config (this can be overridden via --etcd-prefix). You need to use the `etcdctl` utility to set values in Etcd. In the directory where you downloaded Etcd previously, run the following command:
```bash
./etcdctl set /coreos.com/network/config '{
  "Network": "10.0.0.0/8",
  "SubnetLen": 20,
  "SubnetMin": "10.10.0.0",
  "SubnetMax": "10.99.0.0",
  "Backend": {
    "Type": "udp",
    "Port": 7890
  }
}'
```
To build Flannel locally, install the build dependencies first. On Ubuntu/Debian, run `sudo apt-get install linux-libc-dev golang gcc`; on Fedora/RedHat, run `sudo yum install kernel-headers golang gcc`. If the Flannel build fails in your local environment, you can also build Flannel inside a Docker container. Confirm that Docker is installed with `docker -v`, then execute:
```bash
cd flannel
```
After Etcd is set up, run flanneld on both nodes:

```bash
sudo ./bin/flanneld &
```
Use `ifconfig` to confirm that the flannel0 network device was set up successfully; the output should look something like this:
```
flannel0  Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:10.15.240.0  P-t-P:10.15.240.0  Mask:255.0.0.0
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1472  Metric:1
          RX packets:606921 errors:0 dropped:0 overruns:0 frame:0
          TX packets:308311 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:893358516 (893.3 MB)  TX bytes:16225380 (16.2 MB)
```
After Flannel is running, you need to configure the network for docker0 and restart the Docker daemon with the Flannel network configuration. Execute the following commands:
```bash
service docker stop
```
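The block above is truncated after stopping Docker. The remaining steps, as a sketch based on Flannel's standard Docker integration: flanneld writes its leased subnet into `/run/flannel/subnet.env` as `FLANNEL_SUBNET` and `FLANNEL_MTU`, which are then passed to the Docker daemon:

```bash
# Sketch: flanneld writes FLANNEL_SUBNET and FLANNEL_MTU here after it starts.
source /run/flannel/subnet.env
sudo docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} &
```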
After Flannel is set up, just start your containers exactly as you would without Flannel. Run the following command on node1:

```bash
sudo docker run -itd --name=worker-1 ubuntu
```
Then run a container on node2:

```bash
sudo docker run -itd --name=worker-3 ubuntu
```
Then use `sudo docker exec worker-N ifconfig` to get the IPs of these workers (e.g. 10.15.240.2, 10.15.240.3 and 10.10.160.2 for worker-1/2/3). On node1, test connectivity to worker-3:

```bash
sudo docker exec worker-1 ping -c4 10.10.160.2
```
All these pings should return successfully.
Up to now, Flannel is set up for Docker and all the workers are connected to each other. Then I did a simple performance test with iperf between two containers on different/same hosts.
First, let's see the native network performance between the two hosts:
```
flannel@node2:~# iperf -c 192.168.236.130
------------------------------------------------------------
Client connecting to 192.168.236.130, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.236.131 port 54584 connected with 192.168.236.130 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.57 GBytes  2.21 Gbits/sec
```
Then containers on different hosts:
```
root@93c451432761:~# iperf -c 10.10.160.2
------------------------------------------------------------
Client connecting to 10.10.160.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.15.240.2 port 57496 connected with 10.10.160.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  418 MBytes  351 Mbits/sec
```
The performance between containers on the same host is pretty good:
```
root@93c451432761:~# iperf -c 10.15.240.3
------------------------------------------------------------
Client connecting to 10.15.240.3, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.15.240.2 port 38099 connected with 10.15.240.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  39.2 GBytes  33.7 Gbits/sec
```
The cross-host performance is awful compared with native! I couldn't figure out why performance degrades so much with Flannel. Since Calico and Docker Multi-host Network both achieve more than 80% of native performance, Flannel apparently does a poor job here. If anyone knows why, please email me or comment under this blog.
After reading through Flannel's configuration documents, I found that Flannel supports two backends: a UDP backend and a VxLAN backend. Trying the VxLAN backend, the speed is much faster and close to native performance.
There are two different backends supported by Flannel. The previous configuration in this blog uses the UDP backend, which is a pretty slow solution because all the packets are encapsulated in userspace. The VxLAN backend uses the Linux kernel's VxLAN support, as well as some hardware features, to achieve a much faster network.
It's easy to use the VxLAN backend: when configuring Etcd, just define the Backend block with type vxlan.
```bash
./etcdctl set /coreos.com/network/config '{
  "Network": "10.0.0.0/8",
  "SubnetLen": 20,
  "SubnetMin": "10.10.0.0",
  "SubnetMax": "10.99.0.0",
  "Backend": {
    "Type": "vxlan"
  }
}'
```
With the VxLAN backend, the iperf result between two containers on different hosts is as follows:
```
root@93c451432761:~# iperf -c 10.15.240.3
------------------------------------------------------------
Client connecting to 10.15.240.3, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.15.240.2 port 38099 connected with 10.15.240.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.80 GBytes  1.56 Gbits/sec
```
This is an acceptable result, with about 80% of native network performance.
[1] Flannel code base, https://github.com/coreos/flannel
[2] Using coreos flannel for docker networking, http://www.slideshare.net/lorispack/using-coreos-flannel-for-docker-networking
Calico is a pure Layer-3 solution that supports multi-host network communication for OpenStack VMs and Docker containers. Calico does not use an overlay network like Flannel or the libnetwork overlay driver; it is a pure Layer 3 approach with a vRouter implementation instead of a vSwitch. Each vRouter propagates workload reachability information (routes) to the rest of the data center using the BGP protocol.
This post focuses on how to set up multi-host networking for Docker containers with calico-docker, plus some advanced features.
Set up two Linux nodes with IPs 192.168.236.130/131, connect them physically or virtually, and confirm they can ping each other successfully. Set up a Docker bridge (default docker0) on both nodes, and give the two Docker bridges different networks. The network configuration details are as follows:
Node1
Node2
Install Docker; there should be no errors here.

```bash
sudo apt-get install docker.io
```
Download and run Etcd, replacing {node} with node0/1 respectively. We need at least two Etcd nodes since the new version of Etcd cannot run on a single node.

```bash
curl -L https://github.com/coreos/etcd/releases/download/v2.2.1/etcd-v2.2.1-linux-amd64.tar.gz -o etcd-v2.2.1-linux-amd64.tar.gz
```
Download calicoctl:

```bash
wget https://github.com/projectcalico/calico-docker/releases/download/v0.10.0/calicoctl
```
Calico services in a Docker environment run as a Docker container using the host network configuration. All containers configured with Calico services use calico-node to communicate with each other and the Internet.
Run the following command on node1/2 to start calico-node:

```bash
sudo calicoctl node --ip={host_ip}
```
You should see output like this on each node:

```
calico@node1:~# docker ps
CONTAINER ID        IMAGE                COMMAND           CREATED          STATUS          PORTS    NAMES
40b177803c97        calico/node:v0.9.0   "/sbin/my_init"   27 seconds ago   Up 27 seconds            calico-node
```
Before starting any containers, we need to configure an IP pool with the `ipip` and `nat-outgoing` options, so that containers with a valid profile have access to the Internet. Run the following command on either node:

```bash
calicoctl pool add 192.168.100.0/24 --ipip --nat-outgoing
```
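You can confirm the pool was registered with `calicoctl pool show`, which is used again later in this post:

```bash
calicoctl pool show
```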
First, run a few containers on each host.
On node1:

```bash
docker run --net=none --name worker-1 -tid ubuntu
docker run --net=none --name worker-2 -tid ubuntu
```
On node2:

```bash
docker run --net=none --name worker-3 -tid ubuntu
```
Now all the containers are running without any network devices. Use Calico to assign network devices to these containers. Notice that the IPs assigned to containers should be in the range of the IP pool.
On node1:

```bash
sudo calicoctl container add worker-1 192.168.100.1
sudo calicoctl container add worker-2 192.168.100.2
```
On node2:

```bash
sudo calicoctl container add worker-3 192.168.100.3
```
Once containers have Calico networking, they gain a network device with the corresponding IP address. At this point they have access neither to each other nor to the Internet, since no profiles have been created and assigned to them.
Create some profiles on either node:

```bash
calicoctl profile add PROF_1
calicoctl profile add PROF_2
```
Then assign profiles to containers. Containers in the same profile have access to each other, and containers in the IP pool created before won't have access to the Internet until they are added to a profile.
On node1:

```bash
calicoctl container worker-1 profile append PROF_1
calicoctl container worker-2 profile append PROF_2
```
On node2:

```bash
calicoctl container worker-3 profile append PROF_1
```
Now all configuration is done, and we will test the network connections of these containers.
Now check the connectivity of each container. At this point every container should have access to the Internet; try to ping google.com:

```bash
docker exec worker-1 ping -c 4 www.google.com
docker exec worker-2 ping -c 4 www.google.com
```
Then check connections between containers in the same profile:

```bash
docker exec worker-1 ping -c 4 192.168.100.3
```
And containers not in the same profile cannot ping each other:

```bash
docker exec worker-1 ping -c 4 192.168.100.2
```
If we add worker-2 to profile PROF_1, then worker-2 can ping worker-1 and worker-3.
On node1:

```bash
calicoctl container worker-2 profile append PROF_1
docker exec worker-2 ping -c 4 192.168.100.1
docker exec worker-2 ping -c 4 192.168.100.3
```
I performed a simple performance test using `iperf` to evaluate the network between two Calico containers. Run `iperf -s` on worker-1 and `iperf -c 192.168.100.1` on worker-3. We get this result:
```
root@39fdb1701da4:~# ./iperf -c 192.168.101.2
------------------------------------------------------------
Client connecting to 192.168.101.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.101.1 port 39187 connected with 192.168.101.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.08 GBytes  927 Mbits/sec
```
Then run the same test on the native hosts (node1 and node2):
```
calico@node2:~# iperf -c 192.168.236.130
------------------------------------------------------------
Client connecting to 192.168.236.130, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.236.131 port 54584 connected with 192.168.236.130 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.57 GBytes  2.21 Gbits/sec
```
From the result we can see there's a big gap between the Calico network and the native network. But according to the official documents and evaluations, Calico's performance should be similar to native. Why?
To find out the reason for the slow network, I first tested the network performance between worker-1 and worker-2, which are on the same host. The result is as follows:
```
root@51b78d9e6153:/# iperf -c 192.168.100.2
------------------------------------------------------------
Client connecting to 192.168.100.3, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.100.2 port 36476 connected with 192.168.100.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  47.3 GBytes  40.6 Gbits/sec
```
Since my NIC's speed is only 1 Gbit/sec, it seems that containers on the same host connect to each other directly without going through any physical network device. That makes sense.
Then I dived deep into Calico's documents and configuration and noticed this IP pool configuration:

```bash
calicoctl pool add 192.168.100.0/24 --ipip --nat-outgoing
```
We used the `--ipip` option when creating the IP pool, which means "use IP-over-IP encapsulation across hosts". This option enforces another layer of IP-over-IP encapsulation on packets traveling across hosts. Since our hosts node1 and node2 are in the same network (192.168.236.0/24), we can drop this option, and the speed should increase as expected.
If your hosts are located in different L2 networks, i.e. they can only reach each other via an IP network, you need to keep the `--ipip` option when starting Calico.
Run the following commands on either node to override the previous IP pool configuration:

```bash
calicoctl pool add 192.168.100.0/24 --nat-outgoing
calicoctl pool show
```
Then test the network between worker-1 and worker-3 again:
```
root@39fdb1701da4:~# ./iperf -c 192.168.101.2
------------------------------------------------------------
Client connecting to 192.168.101.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.101.1 port 39187 connected with 192.168.101.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.74 GBytes  2.35 Gbits/sec
```
Hurray!!! That’s the native speed!
Calico can be integrated into Docker networking since Docker released its v1.9 Engine. Calico runs another container as a Docker network plug-in and integrates into the `docker network` commands.
Integrated Calico needs the Docker Engine running in cluster mode. Stop the original Docker daemon on node1/2 and run it with cluster parameters:

```
root@node1:~# sudo service docker stop
```
Then run Calico with the `--libnetwork` parameter:

```
root@node1:~# calicoctl node --libnetwork --ip={NODE_IP}
```
The `docker network` command, introduced in Docker Engine v1.9, can be used to create a logical network. With the support of the calico-libnetwork container, `docker network` can create a network with the Calico network driver as follows:

```
root@node1:~# docker network create --driver=calico --subnet=192.168.0.0/24 net1
```
You can see network net1 with driver type calico.
If you are running in a cloud environment (AWS, DigitalOcean, GCE), you will need to configure the network with the `--ipip` and `--nat-outgoing` options. On either host, run:

```bash
docker network create --driver=calico --opt nat-outgoing=true --opt ipip=true --subnet=192.168.0.0/24 net1
```
Note that we use the Calico driver, calico, which runs within the calico-node container. We explicitly choose an IP pool for each network to avoid IP conflicts. Then run containers directly with the `--net=net1` option, without any other auxiliary configuration:

```
root@node1:~# docker run --net net1 --name worker-1 -tid ubuntu
```
A cali0 veth is created in the container to communicate with other containers connected to the same net1. There is a small difference compared with the previous configuration: another eth1 veth is created to act as a normal NIC.
Calico implements a pure Layer-3 solution that carries L3 packets over an IP or broadcast network. Though the pure Layer-3 solution brings great performance, it also introduces limitations: for example, the `--ipip` option is a must in a public data center connected via an IP network.

Q: What is the `--ipip` option used for when configuring a Calico pool?
A: The `--ipip` option means IP-over-IP mode. By default, Calico broadcasts IP packets to all hosts through the L2 switch and filters them by the host's routing table. For hosts connected via an IP network, Calico needs to encapsulate the containers' IP packets in outer IP packets to transfer them to the remote host. So if you use Calico in a public data center, you'd better add the `--ipip` option.
Q: How do I set the Etcd address instead of using the default value "localhost:4001"?
A: Run `export ETCD_AUTHORITY={ETCD_HOST}:{ETCD_PORT}` in the shell before running `calicoctl node`.
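With the Etcd endpoint used elsewhere in this post, that would be, for example:

```bash
export ETCD_AUTHORITY=192.168.236.130:2379
```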
Q: What if I don't want to use distributed storage such as Etcd?
A: Choose an alternative solution: Weave, which has an internal routing mechanism.
[1] Project Calico: https://github.com/projectcalico/calico
[2] Calico Docker: https://github.com/projectcalico/calico-docker
[3] Demonstration on calico-docker: https://github.com/projectcalico/calico-docker
[4] Calico-docker in Yixin: Paper URL
Before setting up the L2TP/IPSec environment, you need to enable PPP support for the VPS. See the section "Enable PPP Support of VPS" in my previous post "Setup PPTP Server on a VPS" for how to enable PPP support on a RamNode VPS.
When I first installed xl2tpd and openswan, I got the following error and my iPhone's VPN connection was refused:

```
May 19 05:48:46 xxx xl2tpd[1343]: result_code_avp: result code endianness fix for buggy Apple client. network=768, le=3
```
If you get the same error message, just follow these steps with me to set up the L2TP+IPSec VPN.
Here I use Openswan as my IPSec server. Use the following commands to install xl2tpd and openswan:

```bash
sudo apt-get install openswan ppp xl2tpd
```
We need to configure two files for xl2tpd: /etc/xl2tpd/xl2tpd.conf and /etc/ppp/options.xl2tpd. Here's an example of /etc/xl2tpd/xl2tpd.conf:
```
[global]
listen-addr = 106.186.127.239

[lns default]
ip range = 10.20.0.2-10.20.0.100
local ip = 10.20.0.1
assign ip = yes
length bit = yes
refuse pap = yes
require authentication = yes
pppoptfile = /etc/ppp/options.xl2tpd
```
"ip range" defines the IPs distributed to the client side, and "local ip" is assigned to the server side. pppoptfile points to the detailed config file for xl2tpd.
Then create the file /etc/ppp/options.xl2tpd and add:
```
ms-dns 8.8.8.8
ms-dns 8.8.4.4
noccp
asyncmap 0
auth
crtscts
lock
hide-password
modem
mru 1200
nodefaultroute
debug
mtu 1200
proxyarp
lcp-echo-interval 30
lcp-echo-failure 4
ipcp-accept-local
ipcp-accept-remote
noipx
idle 1800
connect-delay 5000
```
IPSec provides a secure channel for transferring data, and Openswan is a good choice for setting up a simple IPSec server. Note that there are many IPSec implementations, and they must be installed exclusively in your system; whichever IPSec server you install, the command to invoke it is simply "ipsec". Use the following command to identify which IPSec service you're using now:

```bash
ipsec --version
```
The config file for Openswan is /etc/ipsec.conf. This file name is actually identical for all IPSec services, although the content differs. If you install another IPSec service with apt-get, you need to change the format and contents of this file.
Here’s an example of this file:
```
version 2.0

config setup
    dumpdir=/var/run/pluto/
    nat_traversal=yes
    virtual_private=%v4:10.0.0.0/8,%v4:192.168.0.0/16,%v4:172.16.0.0/12,%v4:25.0.0.0/8,%v6:fd00::/8,%v6:fe80::/10
    protostack=netkey
    force_keepalive=yes
    keep_alive=60

conn l2tp-psk
    authby=secret
    pfs=no
    auto=add
    keyingtries=3
    type=transport
    left=1.2.3.4 # change to your own IP
    leftprotoport=17/1701
    right=%any
    rightprotoport=17/%any
```
The "virtual_private" line lists which networks can use this IPSec channel; leave it as it is. The only line you need to change is "left", which should be your VPS IP address.
Then we need to create and edit the file /etc/ipsec.secrets:

```
: PSK "sharedpassword"
```
Note that there are spaces before and after the colon! "sharedpassword" will be used as the "shared secret" when you connect to L2TP.
Edit the file /etc/ppp/chap-secrets, which works the same as for the PPTP server. Use a format like this:

```
yourname * yourpassword *
```
It's also the same as for the PPTP server: edit the file /etc/sysctl.conf and add (or change) the following line:

```
net.ipv4.ip_forward=1
```
Then return to the shell and execute:

```bash
sudo sysctl -p
```
To add the iptables rules, add the following lines to /etc/rc.local:

```bash
iptables -t nat -A POSTROUTING -s 10.20.0.0/24 -o venet0 -j MASQUERADE
iptables -A FORWARD -p tcp --syn -s 10.20.0.0/24 -j TCPMSS --set-mss 1356
```
Note that "-s 10.20.0.0/24" should match the network range defined by the "ip range" setting in /etc/xl2tpd/xl2tpd.conf.
Finally, restart xl2tpd and ipsec:

```bash
sudo service xl2tpd restart
sudo service ipsec restart
```
Enjoy your surfing! ;)
The traditional VPN solutions include PPTP and L2TP+IPSec, the most popular VPN solutions supported by almost any smart device. PPTP and L2TP both maintain a control connection between the two ends to keep the VPN session alive; data loss or interruption on this connection will terminate the VPN. Besides, these two VPNs are unable to change their well-known ports, which is why PPTP and L2TP are easy to detect and block at a firewall: PPTP uses TCP port 1723 and L2TP uses port 1701. The two differ in data transfer: PPTP carries data in GRE packets (IP protocol 47), while L2TP uses UDP packets via ports 500 and 4500 and may also utilize ESP packets (IP protocol 50).
It's very easy to set up a PPTP VPN on any VPS running a Linux distro. I take Ubuntu 14.04 on a RamNode OpenVZ container VPS as an example (the same environment will be used in the following articles); you just need to:

```bash
sudo apt-get install pptpd
```
Then configure pptpd.conf:

```bash
sudo nano /etc/pptpd.conf
```
and change the server IP and client IP range:

```
localip 192.168.0.1
remoteip 192.168.0.100-200
```
This assigns the PPTP server IP 192.168.0.1 to its ppp device and distributes 192.168.0.100-200 to the client-side ppp devices. You can change these to any values you like, but you'd better stay within the ranges 192.168.0.0/16 and 10.0.0.0/8, since IPs in these two ranges are reserved for LANs. IPs in other ranges may be used by public servers, and the NAT mechanism (discussed below) might then confuse traffic from public servers with traffic from VPN clients. localip and remoteip should be in the same network.
Then uncomment the ms-dns lines and add Google DNS as below, or OpenDNS:

```
ms-dns 8.8.8.8
ms-dns 8.8.4.4
```
Now add a VPN user in the /etc/ppp/chap-secrets file:

```bash
sudo nano /etc/ppp/chap-secrets
```
There are four columns in this file. The first is the username; choose your favorite one. The second column is the service name, such as pptpd or l2tpd; you can use * to allow all services for this config line. The third column is your password, stored in plain text (which is awful :-( ). The fourth column lists the IPs allowed to use this config line; leave it as * if you want to connect to the VPN from anywhere. Here's an example:

```
yourname * yourpassword *
```
That finishes the configuration of the PPTP server; now restart it:

```bash
sudo /etc/init.d/pptpd restart
```

or

```bash
sudo service pptpd restart
```
Besides the configuration above, we need to enable IPv4 forwarding and set up iptables rules for SNAT. To enable IPv4 forwarding permanently, edit the file /etc/sysctl.conf and add (or change) the following line:

```
net.ipv4.ip_forward=1
```
Then return to the shell and execute:

```bash
sudo sysctl -p
```
To add the rules to iptables, add the following lines to /etc/rc.local:

```bash
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o venet0 -j MASQUERADE
iptables -A FORWARD -p tcp --syn -s 192.168.0.0/24 -j TCPMSS --set-mss 1356
```
The first line SNATs all traffic from the 192.168.0.0/24 network to the IP of the local network interface venet0. If you set up the PPTP server on a real machine, it may be eth0 or em0; check with the `ifconfig` command. If you adjusted 192.168.0.0/24 to your own IPs in the localip and remoteip sections above, you should replace -s 192.168.0.0/24 with the same IP range here.
The second line is a little trivial but interesting: it tells iptables to change the MSS field of all TCP packets with SYN in the header to 1356. MSS (Maximum Segment Size) defines the maximum size of a TCP segment; the default may be 1500 in some networks (1500 bytes is the maximum frame payload on many Ethernet links). Since the VPN consumes some space in the packet headers, the final size of a packet may exceed the maximum size the Ethernet link can carry.
There can be some weird problems without the second line: without it, I could ping/traceroute some websites successfully but could not load their pages in a browser.
Now you can use the username and password set in /etc/ppp/chap-secrets to use the PPTP VPN. Remember to enable MPPE encryption for the connection.
PPP support is disabled by default by some VPS providers, so you need to enable it manually. For a RamNode VPS, log in to its VPS control panel (https://vpscp.ramnode.com/login.php), choose the "Settings" tab at the bottom of the page and turn PPP on.
PPTP and L2TP use the kernel's PPP support, while other VPNs such as AnyConnect, OpenVPN and ShadowVPN utilize TUN/TAP support, so enable TUN/TAP as well.
In the following post, I will introduce how to set up an L2TP+IPSec VPN on an OpenVZ VPS.