This post walks through exploiting CVE-2014-4943, a type confusion bug in the Linux kernel's L2TP subsystem. I found this bug with my old colleague @vegard and turned his initial denial of service PoC into a local privilege escalation. Many, many exploitation mitigations have been added to the kernel since then, so a lot of the techniques used here no longer work.
The vulnerability
The upstream fix for this vulnerability simply removed the functionality because it never ever worked:
net/l2tp: don't fall back on UDP [get|set]sockopt
The l2tp [get|set]sockopt() code has fallen back to the UDP functions for socket option levels != SOL_PPPOL2TP since day one, but that has never actually worked, since the l2tp socket isn't an inet socket.
As David Miller points out:
"If we wanted this to work, it'd have to look up the tunnel and then use tunnel->sk, but I wonder how useful that would be"
Since this can never have worked so nobody could possibly have depended on that functionality, just remove the broken code and return -EINVAL.
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 950909f04ee6a..13752d96275e8 100644
     int err;
     if (level != SOL_PPPOL2TP)
-        return udp_prot.setsockopt(sk, level, optname, optval, optlen);
+        return -EINVAL;
     if (optlen < sizeof(int))
         return -EINVAL;
...
     struct pppol2tp_session *ps;
     if (level != SOL_PPPOL2TP)
-        return udp_prot.getsockopt(sk, level, optname, optval, optlen);
+        return -EINVAL;
     if (get_user(len, optlen))
         return -EFAULT;
The commit description and patch are fairly opaque but contain a hint: L2TP sockets aren't INET sockets. Looking at the code before the fix was applied, we can see that any socket option set on an L2TP socket with a level other than SOL_PPPOL2TP is forwarded to the UDP subsystem:
net/l2tp/l2tp_ppp.c
static int
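To make the fallback concrete, here is a small userspace model of the dispatch before and after the fix. It is only a sketch: the MODEL_* constants, model_udp_setsockopt and the two dispatch functions are hypothetical stand-ins I made up for illustration, not the kernel's code.

```c
#include <errno.h>

/* Hypothetical option levels used only by this model. */
enum model_level { MODEL_SOL_UDP = 17, MODEL_SOL_PPPOL2TP = 273 };

/* Counts how often the stand-in UDP handler was reached. */
int udp_handler_calls;

/* Stand-in for the UDP option handler, which assumes it was handed a UDP socket. */
int model_udp_setsockopt(void *sk, int optname)
{
    (void)sk;
    (void)optname;
    udp_handler_calls++;
    return 0;
}

/* Before the fix: every level other than the L2TP one is forwarded to UDP. */
int model_setsockopt_before(void *sk, int level, int optname)
{
    if (level != MODEL_SOL_PPPOL2TP)
        return model_udp_setsockopt(sk, optname);
    return 0; /* L2TP-specific handling elided */
}

/* After the fix: foreign levels are rejected and never reach UDP code. */
int model_setsockopt_after(void *sk, int level, int optname)
{
    (void)sk;
    (void)optname;
    if (level != MODEL_SOL_PPPOL2TP)
        return -EINVAL;
    return 0; /* L2TP-specific handling elided */
}
```

In this model, only model_setsockopt_before ever lets a caller-chosen level reach the UDP handler with a socket that is not a UDP socket.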
struct sock defines generic data for sockets in the kernel. Each specific type of socket (UDP, TCP, SCTP, L2TP, etc.) defines its own structure that embeds a struct sock as its first member. This allows generic networking code to handle sock structures then "upcast" them when needing to handle protocol specific logic. In pppol2tp_setsockopt, the sock parameter actually points to a struct pppox_sock allocation:
include/linux/if_pppox.h
;
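The embedding pattern is easier to see in miniature. The sketch below uses hypothetical model_* structures of my own (they are not the kernel definitions) to show how generic code sees only the first member and how protocol code casts back to the containing type.

```c
#include <stddef.h>

/* Hypothetical stand-in for the generic socket structure. */
struct model_sock {
    int family;
};

/* Two protocol-specific sockets that embed the generic part first. */
struct model_udp_sock {
    struct model_sock sk;   /* generic part, must be the first member */
    int encap_type;
};

struct model_pppox_sock {
    struct model_sock sk;   /* same generic part, different tail */
    void *chan;
};

/* Generic code only ever sees the embedded struct model_sock. */
struct model_sock *model_generic_view(struct model_udp_sock *up)
{
    return &up->sk;
}

/* Protocol code recovers the containing object. This is only valid if sk
 * really lives inside a struct model_udp_sock; nothing checks that. */
struct model_udp_sock *model_udp_sk(struct model_sock *sk)
{
    return (struct model_udp_sock *)sk;
}
```

The cast in model_udp_sk is exactly as trustworthy as its caller: handing it the sk of a struct model_pppox_sock would compile without complaint.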
When we call setsockopt on an L2TP socket, we can specify an arbitrary "level" which allows us to easily call into the UDP socket options handler. If we choose SOL_UDP we call into udp_lib_setsockopt, which blindly casts the sk pointer to a struct udp_sock via udp_sk:
net/ipv4/udp.c
int
net/ipv4/udp.c
int
include/linux/udp.h
static inline struct udp_sock *
This is a form of type confusion, which is a relatively rare class of kernel bug. We can use this bug to either trigger an out-of-bounds read/write (because a struct udp_sock is larger than a struct pppox_sock) or corrupt a member of the underlying struct pppox_sock allocation. I explored both these options.
Arbitrary read/write
For example, the encap_type field in struct udp_sock has an offset greater than the size of struct pppox_sock. We can use the UDP_ENCAP socket option to read and write this field:
net/ipv4/udp.c
int
net/ipv4/udp.c
int
Unfortunately, on the kernel we're targeting, generic allocations are rounded up to the next power of two. As a result, reading or writing the encap_type field does not allow us to manipulate other objects on the heap. Rather, we can only read or write the extra padding added by the allocator, which is always zeroed on allocation. Instead, I used type confusion to get code execution.
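The rounding argument can be checked with a little arithmetic. The following is a simplified model, assuming a pure power-of-two size-class allocator; the function names are hypothetical, and real allocators may have extra size classes.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a request up to the next power of two, as a simplified model of a
 * size-class allocator. Stops growing before the shift could overflow. */
size_t model_bucket_size(size_t request)
{
    size_t bucket = 1;
    while (bucket < request && bucket <= SIZE_MAX / 2)
        bucket <<= 1;
    return bucket;
}

/* Returns 1 if an access of len bytes at offset lies past the end of the
 * object but still inside its bucket, i.e. it only touches padding. */
int model_hits_padding(size_t object_size, size_t offset, size_t len)
{
    size_t bucket = model_bucket_size(object_size);
    return offset >= object_size && offset + len <= bucket;
}
```

With made-up numbers, a 1300-byte object lands in a 2048-byte bucket, so a two-byte access at offset 1500 stays inside that object's own padding and never reaches a neighbouring object.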
Type confusion
Comparing the two structures we can see that reading or writing through the struct udp_sock pointer allows us to read or write elements of the struct pppox_sock allocation. For example, writing to the pending field (1) would actually corrupt the underlying chan field (2):
include/net/inet_sock.h
include/linux/udp.h
include/linux/if_pppox.h
Digging some more into the layout of the two structures, we find that there is a promising candidate for our type confusion: the ppp field:
$ gdb -q net/ipv4/udp.o
Reading symbols from net/ipv4/udp.o...done.
(gdb) p/x &((struct udp_sock *) 0)->inet.inet_opt
$1 = 0x510
...
$ gdb -q net/l2tp/l2tp_ppp.o
Reading symbols from net/l2tp/l2tp_ppp.o...done.
(gdb) p/x &((struct pppox_sock *) 0)->chan.ppp
$1 = 0x510
The inet_opt field embedded in the UDP socket structure contains the set of IP options which can be directly controlled via the IP_OPTIONS socket option. The ppp field is a chunk of opaque data, represented as a void * in the structure. At runtime, though, it points to a struct ppp allocation.
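Here is a minimal userspace sketch of what matching offsets mean in practice. The two model_* layouts are hypothetical and much smaller than the real structures; they only share the property that a pointer-sized field sits at the same offset in both.

```c
#include <stddef.h>
#include <string.h>

/* Two hypothetical layouts with a common header and a pointer-sized field
 * at the same offset, mirroring the inet_opt/ppp overlap in spirit. */
struct model_udp_view {
    long header[4];
    void *inet_opt;     /* reachable through a socket option */
};

struct model_pppox_view {
    long header[4];
    void *ppp;          /* trusted by the owning subsystem */
};

/* Store value where the UDP layout keeps inet_opt, without checking which
 * layout the object really has. memcpy keeps this model free of aliasing
 * problems; the kernel bug is a plain pointer cast. */
void model_set_inet_opt(void *object, void *value)
{
    memcpy((unsigned char *)object + offsetof(struct model_udp_view, inet_opt),
           &value, sizeof(value));
}
```

Calling model_set_inet_opt on a struct model_pppox_view replaces its ppp pointer, which is the same effect the type confusion gives us in the kernel.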
These Point-to-Point Protocol (PPP) structures are used when sending or receiving data tunneled via an L2TP socket. For this exploit we will target the optional compression feature in PPP, which is implemented as a series of function pointers embedded deeply in the struct ppp allocation. We can trigger the use of this corrupted structure deep in the call stack when receiving data from a UDP socket. The packet is first received by the generic UDP processing, which calls into the L2TP module, which finally calls down into the PPP decompression logic. The call stack at this point looks like this:
__udp4_lib_rcv
udp_queue_rcv_skb
l2tp_udp_encap_recv
l2tp_udp_recv_core
l2tp_recv_common
l2tp_recv_dequeue
l2tp_recv_dequeue_skb
pppol2tp_recv
ppp_input
ppp_do_recv
ppp_receive_frame
ppp_decompress_frame
When we finally reach the decompression code, we fully control the contents of the ppp argument (1) and we can choose whether the packet we send is compressed or not (2), which will trigger a call through a function pointer we control (3):
drivers/net/ppp/ppp_generic.c
static struct sk_buff *
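As a rough illustration of why controlling the ppp object is enough, here is a simplified userspace model of dispatching through a table of compressor callbacks. The names rcomp and incomp come from this post; everything else (the model_* types, the rc_state and decompress fields, the counting callbacks) is an assumption made for the sketch.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the PPP structures. */
struct model_compressor {
    int  (*decompress)(void *state, const unsigned char *in, size_t len);
    void (*incomp)(void *state, const unsigned char *in, size_t len);
};

struct model_ppp {
    const struct model_compressor *rcomp;  /* receive-side callbacks */
    void *rc_state;                        /* opaque callback state */
};

struct model_counters {
    int decompress_calls;
    int incomp_calls;
};

/* Example callbacks that only record that they were reached. */
int model_count_decompress(void *state, const unsigned char *in, size_t len)
{
    (void)in;
    ((struct model_counters *)state)->decompress_calls++;
    return (int)len;
}

void model_count_incomp(void *state, const unsigned char *in, size_t len)
{
    (void)in;
    (void)len;
    ((struct model_counters *)state)->incomp_calls++;
}

/* Whoever controls ppp controls which function runs for either frame type. */
int model_receive_frame(struct model_ppp *ppp, const unsigned char *frame,
                        size_t len, int is_compressed)
{
    if (is_compressed)
        return ppp->rcomp->decompress(ppp->rc_state, frame, len);
    if (ppp->rcomp->incomp)
        ppp->rcomp->incomp(ppp->rc_state, frame, len);
    return (int)len;
}
```

Nothing in model_receive_frame validates where the callback table came from, so a forged struct model_ppp redirects control to whatever its table points at.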
Top-half vs bottom-half
When using this codepath in our exploit, we need to be aware of how the kernel handles different types of sockets.
For packets received from a physical/virtual network adapter, the adapter fires an interrupt letting the kernel know that a packet has arrived. While running in the interrupt handler, we cannot process the packet directly because the kernel is in an indeterminate state. This is the so-called "top half" routine, which is fired by an interrupt and is solely responsible for adding the packet to a queue. The queue is then processed later, outside the interrupt handler, in what is termed the "bottom half". This has the effect of clearly separating the send logic from the receive logic:
    bottom half:
           `recv(2)`
            called
               |
               |
               v
               |
               \---->---- packet processed
                             from queue
                                  |
                                  ^
                                  |
                              +---+---+
                              | queue |
                              +---^---+
                                  |
    ------------------------------+--------------------
    top half:                     |
                                  |
                   /---->--- add packet --->---\
                   |          to queue         |
               interrupt                       |
                handler                        |
                   |                           v
                   ^                           |
                   |                           |
               interrupt                   interrupt
              fired when                    handler
            packet received                completes

The kernel has different behavior for sockets bound to localhost.
When sending a packet to a localhost socket, however, no interrupts are involved and there is no longer a clear separation of sending and receiving data. Instead, when a packet is destined for a local interface it is parsed and routed immediately in the same kernel context as the send operation. Why is this important for our exploit? It means that after corrupting the L2TP structure in the kernel, we can send a packet on the L2TP socket which will directly trigger the decompression function pointer in the context of our exploit process. This allows us to construct shellcode in our exploit process, which can then be called directly from kernelspace because we are in the same context. In diagram form:
          kernelspace:            |  userspace:
                                  |
                                  |
                /-----------------|----- write(l2tp_fd, packet, sizeof(packet))
                v                 |
           send logic             |
                |                 |
                v                 |
           recv logic             |
                |                 |
                v                 |
        PPP decompression         |
              logic               |
                |                 |
                |                 |
    sk->ppp->rcomp->incomp()      |
                |                 |
                |                 |
                \-----------------|-----------> fake
                                  |          compressor
                                  |              |
                                  |              |
                                  |              v
                                  |          shellcode
                                  |
Cleanup
When we gain execution back in userspace we can elevate our privileges by simply setting the process's uid and gid to zero. But what happens after our shellcode has run? Looking again at ppp_decompress_frame, there is still packet handling logic that runs after our shellcode has triggered that we need to navigate:
drivers/net/ppp/ppp_generic.c
static struct sk_buff *
We can bypass most of this logic by returning DECOMP_FATALERROR from our shellcode, which will set the PPP socket into an error state (4). As noted before, we are pretty deep in the call stack at this point and still need to return back through the L2TP and UDP logic, but this is mostly a case of ensuring that the various fields in our fake compressor object are set up correctly.
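The effect of returning an error code can be modelled in a few lines. This is a sketch under my own assumptions: the MODEL_* values, the flag bits and model_finish_decompress are hypothetical and only mimic the general shape of "negative result means drop the frame and mark the error".

```c
/* Hypothetical result codes for this model; the kernel's values may differ. */
#define MODEL_DECOMP_ERROR      (-1)
#define MODEL_DECOMP_FATALERROR (-2)

/* Hypothetical state flags for this model. */
#define MODEL_STATE_ERROR 0x1u
#define MODEL_STATE_FATAL 0x2u

struct model_rx_state {
    unsigned int flags;
    int frames_delivered;
};

/* Apply the outcome of a decompressor call: a negative result records the
 * error and drops the frame, a non-negative one delivers the frame. */
void model_finish_decompress(struct model_rx_state *st, int result)
{
    if (result < 0) {
        st->flags |= MODEL_STATE_ERROR;
        if (result == MODEL_DECOMP_FATALERROR)
            st->flags |= MODEL_STATE_FATAL;
        return; /* skip all further per-frame processing */
    }
    st->frames_delivered++;
}
```

The early return is the point: once the callback reports a fatal error, none of the later per-frame logic gets a chance to touch the forged structures.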
Once we finally return to userspace from the write(2) syscall, our process is now root. 🍨