openbsd-src

mirror of https://github.com/openbsd/src.git synced 2024-12-22 16:42:56 -08:00

Author	SHA1	Message	Date
jsg	0f9e9ec23b	remove prototypes with no matching function ok mpi@	2024-05-13 01:15:50 +00:00
bluhm	93536db294	Split single TCP inpcb table into IPv4 and IPv6 parts. With two separate TCP hash tables, each one becomes smaller. When we remove the exclusive net lock from TCP, contention on internet PCB table mutex will be reduced. UDP has been split earlier into IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with assertions. OK mvs@	2024-04-12 16:07:09 +00:00
bluhm	94c0e2bd60	Merge struct route and struct route_in6. Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/route.h. Struct route has to be bsd visible for userland as netstat kvm code inspects inp_route. Internet PCB and TCP SYN cache can use a plain struct route now. All specific sockaddr types for inet and inet6 are embedded there. OK claudio@	2024-02-13 12:22:09 +00:00
bluhm	82b5c162c3	Declare address parameter in TCP SYN cache const. tcp6_ctlinput() cast a constant sockaddr_sin6 to non-const sockaddr. sa6_src may be &sa6_any which lives in read-only data section. Better pass down the const addresses to syn_cache_lookup(). They are needed for hash lookup and are not modified. OK mvs@	2024-01-27 21:13:46 +00:00
bluhm	6285ef2327	Fix white spaces in TCP.	2024-01-11 13:49:49 +00:00
bluhm	0f086867d6	Run TCP syn cache timer without kernel lock. As syn_cache_timer() uses syn cache mutex and exclusive net lock, it does not need kernel lock. OK mvs@	2023-11-29 19:19:25 +00:00
bluhm	0bfbfbe7cd	Run TCP SYN cache timer logic without net lock. Introduce global TCP SYN cache mutex. Divide timer function in parts protected by mutex and sending with netlock. Split the flags field in dynamic flags protected by mutex and fixed flags set during initialization. Document whether fields of struct syn_cache are protected by net lock or mutex. input and OK sashan@	2023-11-16 18:27:48 +00:00
bluhm	94687c0059	Fix netstat output of uses of current SYN cache left. TCP syn cache variable scs_use is basically counting packet insertions into syn cache. Prefer type long to exclude overflow on fast machines. Due to counting downwards from a limit, it can become negative. Copy it out as tcps_sc_uses_left via sysctl, and print it as signed long long integer. OK mvs@	2023-09-04 23:00:36 +00:00
bluhm	a1744ce2a1	Introduce reference counting for TCP syn cache entries. The syn_cache_reaper() is a hack to serialize timeouts. Unfortunately it has a race and panics sometimes with pool_do_get: syncache free list modified. Add a reference counter for timeout and list of syn cache entries. Currently list refcount is not strictly necessary due to exclusive netlock, but will be needed when we continue unlocking. Checking timeout_initialized() is not MP friendly, better do proper initialization during object allocation. Refcount in btrace helps to find leaks. bug reported and fix tested by Peter J. Philipp OK claudio@	2023-08-28 14:50:01 +00:00
bluhm	9e96aff02f	Convert tcp_now() time counter to 64 bit. After changing tcp now tick to milliseconds, 32 bits will wrap around after 49 days of uptime. That may be a problem in some places of our stack. Better use a 64 bit counter. As timestamp option is 32 bit in TCP protocol, use the lower 32 bit there. There are casts to 32 bits that should behave correctly. Start with random 63 bit offset to avoid uptime leakage. 2^63 milliseconds result in 2.9*10^8 years of possible uptime. OK yasuoka@	2023-07-06 09:15:23 +00:00
bluhm	a3c0391fc7	Use TSO and LRO on the loopback interface to transfer TCP faster. If tcplro is activated on lo(4), ignore the MTU with TCP packets. They are passed along with the information that they have to be chopped in case they are forwarded later. New netstat(1) counter shows that software LRO is in effect. The feature is currently turned off by default. tested by jan@; OK claudio@ jan@	2023-07-02 19:59:15 +00:00
jan	a5a54c4aaf	New counters for LRO packets from hardware TCP offloading. With tweaks from patrick@ and bluhm@. OK bluhm@	2023-05-23 09:16:16 +00:00
jan	cd396c9863	Use TSO offloading in ix(4). With a lot of tweaks, improvements and testing from bluhm. Thanks to Hrvoje Popovski from the University of Zagreb for his great testing effort to make this happen. ok bluhm	2023-05-18 08:22:37 +00:00
bluhm	510f4386e1	Implement the TCP/IP layer for hardware TCP segmentation offload. If the driver of a network interface claims to support TSO, do not chop the packet in software, but pass it down to the interface layer. Precalculate parts of the pseudo header checksum, but without the packet length. The length of all generated smaller packets is not known yet. Driver and hardware will use the mbuf packet header field ph_mss to calculate it and update checksum. Introduce separate flags IFCAP_TSOv4 and IFCAP_TSOv6 as hardware might support only one protocol family. The old flag IFXF_TSO is only relevant for large receive offload. It is misnamed, but keep that for now. Note that drivers do not set TSO capabilities yet. Also the ifconfig flags and pseudo interfaces capabilities will be done separately. So this commit should not change behavior. heavily based on the work from jan@; OK sashan@	2023-05-15 16:34:56 +00:00
bluhm	c06845b1c3	Implement TCP send offloading, for now in software only. This is meant as a fallback if network hardware does not support TSO. Driver support is still work in progress. TCP output generates large packets. In IP output the packet is chopped to TCP maximum segment size. This reduces the CPU cycles used by pf. The regular output could be assisted by hardware later, but pf route-to and IPsec need the software fallback in general. For performance comparison or to work around possible bugs, sysctl net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows TSO counter with chopped and generated packets. based on work from jan@ tested by jmc@ jan@ Hrvoje Popovski OK jan@ claudio@	2023-05-10 12:07:16 +00:00
yasuoka	b95875753e	To avoid misunderstanding, keep variables for tcp keepalive in milliseconds, which is the same unit of tcp_now(). However, keep the unit of sysctl variables in seconds and convert their unit in tcp_sysctl(). Additionally revert TCPTV_SRTTDFLT back to 3 seconds, which was mistakenly changed to 1.5 seconds by tcp_timer.h 1.19. ok claudio	2023-03-14 00:24:05 +00:00
claudio	d88352743c	In tcp_now() switch from getnsecuptime() to getnsecruntime(). The tcp timer is not supposed to run during suspend but getnsecuptime() does and because of this sessions with TCP_KEEPALIVE on reset after a few hours of sleep. Problem noticed by mlarkin@, investigation by yasuoka@ additional testing jca@ OK yasuoka@ jca@ cheloha@	2022-12-13 18:10:55 +00:00
yasuoka	00007ca37e	Modify TCP receive buffer size auto scaling to use the smoothed RTT (SRTT) instead of the timestamp option. The timestamp option is disabled on some OSs (e.g. Windows) or dropped by some firewalls/routers; in such cases the window size had been fixed at 16KB, which limits throughput to very low values on high latency networks. Also replace "tcp_now" from 2HZ tick counter to binuptime in milliseconds to calculate the SRTT better. tested by krw matthieu jmatthew dlg djm stu stsp ok claudio	2022-11-07 11:22:55 +00:00
mvs	280d7fb53c	Change pru_abort() return type to the type of void and make pru_abort() optional. We have no interest in pru_abort() return value. We call it only from soabort() which is dummy pru_abort() wrapper and has no return value. Only the connection oriented sockets need to implement (*pru_abort)() handler. Such sockets are tcp(4) and unix(4) sockets, so remove existing code for all others, it is never called. ok guenther@	2022-10-17 14:49:01 +00:00
bluhm	62440853d9	System calls should not fail due to temporary memory shortage in malloc(9) or pool_get(9). Pass down a wait flag to pru_attach(). During syscall socket(2) it is ok to wait, this logic was missing for internet pcb. Pfkey and route sockets were already waiting. sonewconn() must not wait when called during TCP 3-way handshake. This logic has been preserved. Unix domain stream socket connect(2) can wait until the other side has created the socket to accept. OK mvs@	2022-10-03 16:43:52 +00:00
mvs	61cf011a4e	Change pru_rcvd() return type to the type of void. We have no interest in pru_rcvd() return value. Drop "pru_rcvd != NULL" check within pru_rcvd() wrapper. We only call it if the socket's protocol has PR_WANTRCVD flag set. Such sockets are route domain, tcp(4) and unix(4) sockets. ok guenther@ bluhm@	2022-09-13 09:05:47 +00:00
mvs	c3a3d6092d	Move PRU_PEERADDR request to (*pru_peeraddr)(). Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets, except tcp(4) case. Also remove *_usrreq() handlers. ok bluhm@	2022-09-03 22:43:38 +00:00
bluhm	8c664ca542	Use a mutex to update tcp_maxidle, tcp_iss, and tcp_now. This removes pressure from the exclusive netlock in tcp_slowtimo(). Reading is done atomically. Ensure that the tcp_now value is read only once per function to provide consistent time. OK yasuoka@	2022-09-03 19:22:19 +00:00
mvs	0dc53d81fb	Move PRU_SOCKADDR request to (*pru_sockaddr)(). Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4) inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability. The key management and route domain sockets return EINVAL error for PRU_SOCKADDR request, so keep this behaviour for a while instead of making pru_sockaddr handler optional and returning EOPNOTSUPP. ok bluhm@	2022-09-03 18:48:49 +00:00
mvs	3f68dcd30a	Move PRU_CONTROL request to (*pru_control)(). The 'proc *' arg is not used for PRU_CONTROL request, so remove it from pru_control() wrapper. Split out {tcp,udp}6_usrreqs from {tcp,udp}_usrreqs and use them for inet6 case. ok guenther@ bluhm@	2022-09-02 13:12:31 +00:00
mvs	f0a6a678cb	Move PRU_SENDOOB request to (*pru_sendoob)(). PRU_SENDOOB request always consumes passed `top' and `control' mbufs. To avoid dummy m_freem(9) handlers for all protocols release passed mbufs in the pru_sendoob() EOPNOTSUPP error path. Also fix `control' mbuf(9) leak in the tcp(4) PRU_SENDOOB error path. ok bluhm@	2022-08-31 21:23:02 +00:00
mvs	52454f7046	Move PRU_RCVOOB request to (*pru_rcvoob)(). ok bluhm@	2022-08-29 08:08:17 +00:00
mvs	4024125e33	Move PRU_SENSE request to (*pru_sense)(). ok bluhm@	2022-08-28 21:35:11 +00:00
mvs	fbc11c672c	Move PRU_ABORT request to (*pru_abort)(). We abort only the sockets which are linked to `so_q' or `so_q0' queues of listening socket. Such sockets have no corresponding file descriptor and are not accessed from userland, so PRU_ABORT is used to destroy them on listening socket destruction. Currently all our sockets support PRU_ABORT request, but actually it is required only for tcp(4) and unix(4) sockets, so it should be optional. However, they will be removed with separate diff, and this time PRU_ABORT requests were converted as is. Also, the socket should be destroyed on PRU_ABORT request, but route and key management sockets leave it alive. This was also converted as is, because this wrong code is never called. ok bluhm@	2022-08-28 18:44:16 +00:00
mvs	90b3510cbf	Move PRU_SEND request to (*pru_send)(). The former PRU_SEND error path of gre_usrreq() had `control' mbuf(9) leak. It was fixed in new gre_send(). The former pfkeyv2_send() was renamed to pfkeyv2_dosend(). ok bluhm@	2022-08-27 20:28:01 +00:00
mvs	cc9f6b974a	Move PRU_RCVD request to (*pru_rcvd)(). ok bluhm@	2022-08-26 16:17:38 +00:00
mvs	86e05c9404	Move PRU_SHUTDOWN request to (*pru_shutdown)(). ok bluhm@	2022-08-22 21:18:48 +00:00
mvs	e00787e64c	Move PRU_DISCONNECT request to (*pru_disconnect). ok bluhm@	2022-08-22 13:23:06 +00:00
mvs	92a454d9ed	Move PRU_ACCEPT request to (*pru_accept)(). ok bluhm@	2022-08-22 08:08:46 +00:00
mvs	074c83889e	Move PRU_CONNECT request to (*pru_connect)() handler. ok bluhm@	2022-08-21 22:45:55 +00:00
mvs	cfab0d997b	Move PRU_LISTEN request to (*pru_listen)() handler. ok bluhm@	2022-08-21 17:30:21 +00:00
mvs	121fc5cf68	Move PRU_BIND request to (*pru_bind)() handler. For the protocols which don't support request, leave handler NULL. Do the NULL check within corresponding pru_*() wrapper and return EOPNOTSUPP in such case. This will be done for all upcoming user request handlers. ok bluhm@ guenther@	2022-08-20 23:48:57 +00:00
mvs	7985bfd0a0	Introduce 'pr_usrreqs' structure and move existing user-protocol handlers into it. We want to split existing (*pr_usrreq)() to multiple short handlers for each PRU_* request as it was already done for PRU_ATTACH and PRU_DETACH. This is the preparation step, (*pr_usrreq)() split will be done with the following diffs. Based on reverted diff from guenther@. ok bluhm@	2022-08-15 09:11:38 +00:00
claudio	ced6d44d3b	Add TCP_INFO support to getsockopt for tcp sessions. TCP_INFO provides a lot of information about the TCP session of this socket. Many processes like to peek at the rtt of a connection but this also provides a lot more specialized info for use by e.g. tcpbench(1). While the basic minimal info is available all the time the more specific data is only populated for privileged processes. This is done to not share data back to userland that may allow to attack a session. TCP_INFO is available to pledge "inet" since pledged processes like chrome tend to use TCP_INFO when available. OK bluhm@	2022-08-11 09:13:21 +00:00
guenther	532245610f	Reported-by: syzbot+1b5b209ce506db4d411d@syzkaller.appspotmail.com Revert the pr_usrreqs move: syzkaller found a NULL pointer deref and I won't be available to monitor for followup issues for a bit	2022-02-25 23:51:03 +00:00
guenther	80ceac1983	Move pr_attach and pr_detach to a new structure pr_usrreqs that can then be shared among protosw structures, following the same basic direction as NetBSD and FreeBSD for this. Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the proper prototype to eliminate the previously necessary casts. ok mvs@ bluhm@	2022-02-25 08:36:01 +00:00
bluhm	e446fa2bdd	Define all TCP TF_ flags as unsigned numbers. They are stored in u_int t_flags. Shifting TF_TIMER with TCPT_DELACK can touch the sign bit. found by kubsan; suggested by deraadt@; OK miod@	2022-01-23 21:44:31 +00:00
visa	ed4744ab1e	Drop tcp_trace() from SMALL_KERNEL builds to make room on amd64 floppy OK deraadt@	2021-01-28 14:53:20 +00:00
gnezdo	7c72bba2fa	Convert tcp_sysctl to sysctl_bounded_args. This introduces bounds checks for many net.inet.tcp sysctl variables. Folded some fitting cases into the framework: tcp_do_sack, tcp_do_ecn. ok deraadt@	2020-08-18 05:21:21 +00:00
bluhm	0c04e3b6d5	Count the number of TCP SACK options that were dropped due to the sack hole list length or pool limit. OK claudio@	2019-07-12 19:43:51 +00:00
bluhm	4e64d49b29	The output from tcp debug sockets was incomplete. After detach tp was NULL and nothing was traced. So save the old tcpcb and use that to retrieve some information. Note that otb may be freed and must not be dereferenced. Use a heuristic for cases where the address family is in the IP header but not provided in the PCB. OK visa@	2018-06-11 07:40:26 +00:00
bluhm	5bcca80f6b	Historically there were slow and fast tcp timeouts. That is why the delack timer had a different implementation. Use the same mechanism for all TCP timers. OK mpi@ visa@	2018-05-08 15:10:33 +00:00
bluhm	772b5f4ea9	Historically TCP timeouts were implemented with pr_slowtimo and pr_fasttimo. That is the reason why we have two timeout mechanisms with complicated ticks calculation. Move the delay ACK timeout to milliseconds and remove some ticks and hz mess from the others. This makes it easier to see the actual values. OK florian@ dhill@ dlg@	2018-02-07 00:31:10 +00:00
bluhm	ded1556fba	There was a race in the TCP timers. As they may sleep to grab the netlock, timers may still run after they have been disarmed. Deleting the timeout is not sufficient to cancel them, but the code from 4.4 BSD is assuming this. The solution is to add a flag for every timer to see whether it has been armed or canceled. Remove the TF_DEAD check as tcp_canceltimers() is called before the reaper timer is fired. Cancelation works reliably now. OK mpi@	2018-02-06 15:13:08 +00:00
bluhm	9acddc59a5	The TCP reaper timeout was still implemented as soft timeout. So it could run immediately and was not synchronized with the TCP timeouts, although that was the intention when it was introduced in revision 1.85. Convert the reaper to an ordinary TCP timeout so it is scheduled on the same timeout thread after all timeouts have finished. A net lock is not necessary as the process calling tcp_close() will not access the tcpcb after arming the reaper timeout. OK mikeb@	2018-01-23 21:41:17 +00:00


178 Commits
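
The figures quoted in commit 9e96aff02f (a 32 bit millisecond counter wrapping after 49 days, and 2^63 milliseconds giving about 2.9*10^8 years) can be checked with a few lines of arithmetic. The following is only a sketch in Python, not kernel code: the helper names wrap_days, wrap_years, tcp_timestamp and ts_elapsed are hypothetical and do not exist in the OpenBSD tree, and the year length of 365.25 days is an assumption.

```python
# Illustrative arithmetic only; none of these names come from the kernel.
MS_PER_DAY = 24 * 60 * 60 * 1000
MS_PER_YEAR = 365.25 * MS_PER_DAY  # assumed average year length


def wrap_days(bits):
    """Days until a millisecond counter of the given width wraps."""
    return 2 ** bits / MS_PER_DAY


def wrap_years(bits):
    """Years until a millisecond counter of the given width wraps."""
    return 2 ** bits / MS_PER_YEAR


def tcp_timestamp(now64):
    """Lower 32 bits of a 64 bit counter, as used for the timestamp option."""
    return now64 & 0xFFFFFFFF


def ts_elapsed(later, earlier):
    """Difference of two 32 bit timestamps, computed modulo 2^32."""
    return (later - earlier) & 0xFFFFFFFF


if __name__ == "__main__":
    print(f"32 bit wrap: {wrap_days(32):.1f} days")
    print(f"63 bit wrap: {wrap_years(63):.2e} years")
```

Taking the difference modulo 2^32 is one way truncated timestamps can still yield a correct interval across a wrap, which is presumably what the commit means by casts to 32 bits behaving correctly.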