jessLAND notes - tcp
                    TCP - Transmission Control Protocol 
		    ***********************************

1. General Info.
2. TCP Segment Format
3. TCP Connection States
4. TCP stimulus - response
5. Resetting a connection
6. Valid and Invalid Flag combinations
7. Explicit Congestion Notification (ECN)
10. Passive Fingerprinting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


1. General Info.
================

  - RFCs:
      RFC  793: TCP Protocol Specification
      RFC  879: TCP maximum segment size and related topics
      RFC 1072: TCP extensions for long-delay paths
      RFC 1106: TCP big window and NAK options
      RFC 1110: Problem with the TCP big window option
      RFC 1323: TCP Extensions for High Performance (Obsoletes RFC1072)
      RFC 1644: T/TCP -- TCP Extensions for Transactions Functional
                Specification
      RFC 1693: An Extension to TCP : Partial Order Service

  - TCP is IP protocol number 6.

  - TCP Ports: 0 -> 65535   [0 -> 1023: Privileged (not in Windows)]

  - Characteristics:
      + Reliable
      + Connection oriented (sequence numbers)

  - TCP Session (3 way handshake):          [ISN: Initial Seq. Number]

        --------- SYN (ISNa) ------->
        <- SYN (ISNb) - ACK (ISNa+1) --
        --------- ACK (ISNb+1) ------>
                  < ..... >
        -- FIN-ACK ->
        <--- ACK ----
        <- FIN-ACK --
        ---- ACK --->

  - Sending data on a SYN pkt is valid, although not typical. This data
    will be considered part of the stream once the three way handshake is
    completed. Typical uses are round-trip time measurement or NID
    evasion/insertion attacks.

  - TCP Retries: src port, dst port and seq. numbers persist in all retries.
    IP IDs change (incrementally).

  - Timeout: Depending on the TCP/IP implementation, the timeout of a TCP
    connection can be somewhere between 2 and 30 mins.

...............................................................................
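The handshake arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name handshake() and the tuple layout are made up for this note, and the ISN values are the ones seen in the trace of section 2. The one detail worth showing is that sequence numbers are 32-bit and wrap around.

```python
MOD = 2 ** 32  # seq/ack numbers are 32-bit values and wrap around


def handshake(isn_a, isn_b):
    """Return the (flags, seq, ack) triples of a three-way handshake.

    Hypothetical helper: isn_a is the client ISN, isn_b the server ISN.
    A SYN consumes one sequence number, hence the +1 in each ack.
    """
    syn = ("S", isn_a, None)
    syn_ack = ("SA", isn_b, (isn_a + 1) % MOD)
    ack = ("A", (isn_a + 1) % MOD, (isn_b + 1) % MOD)
    return [syn, syn_ack, ack]


for flags, seq, ack in handshake(4153139971, 1308019873):
    print(flags, seq, ack)
```

With an ISN of 0xffffffff the ack number of the reply wraps to 0, which is the one case where an ack of zero is legitimate.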
2. TCP Segment Format
=====================

  ( TCP HEADER: min. 20 bytes, max. 60 bytes - DATA: var. )

     0                 8                 16               24              31
     +-----------------------------------+---------------------------------+
  0  |           # source port           |          # dest. port           |
     |-----------------------------------+---------------------------------|
 32  |                           sequence number                           |
     |---------------------------------------------------------------------|
 64  |                             ack number                              |
     |----------+------+--+--+-+-+-+-+-+-+---------------------------------|
 96  |hdr.len(4)|RSV(4)|R1|R2|U|A|P|R|S|F|           window size           |
     |----------+------+--+--+-+-+-+-+-+-+---------------------------------|
128  |           TCP checksum            |         urgent pointer          |
     |-----------------------------------+---------------------------------|
160  |    options field (var.length - max. 40b - 0 padding to 4b mult.)    |
     |---------------------------------------------------------------------|
192  ~                                DATA                                 ~
     +---------------------------------------------------------------------+

  - src port: 0-65535 ; >1023 ephemeral (only unix)
              Each new connection not being a retry will use a different
              src port.
    dst port:

  - seq. #: Each new connection not being a retry will use a different seq #.
            ISN - Initial seq. number: first seq. number in TCP exchange.
            Win 95, 98, NT use a trivial time dependency formula to generate
            ISNs.

  - ack:
      + Must be > 0 (a TCP SYN consumes 1 seq. #), except when the peer's
        ISN is 0xffffffff (the ack# then wraps around to 0).
      + The RFC does not force the ack# of an initial SYN to be zero, but it
        usually is (except for some OSs).
      + All pkts in an established connection must have the ack bit set.
      + The ack# is the next seq.# expected from the peer, i.e. the seq.# of
        the last byte received + 1 (consecutive pkts can have the same ack
        number if no data is sent by the peer).
        A SYN carries no actual data (0) but still consumes one seq.#, so
        the ack# will be the seq# received + 1.

        clnt.52930 > SRV.discard: S 4153139971:4153139971(0)
        SRV.discard > clnt.52930: S 1308019873:1308019873(0) ack 4153139972
        clnt.52930 > SRV.discard: . ack 1308019874                          <-
        clnt.52930 > SRV.discard: P 4153139972:4153139975(3) ack 1308019874 <-
        SRV.discard > clnt.52930: . ack 4153139975
        clnt.52930 > SRV.discard: P 4153139975:4153139978(3) ack 1308019874 <-
        SRV.discard > clnt.52930: . ack 4153139978
        clnt.52930 > SRV.discard: P 4153139978:4153139995(17) ack 1308019874 <-
        SRV.discard > clnt.52930: . ack 4153139995

      + If a pkt is lost, and the receiving host receives a seq# higher than
        expected, it will send back to the sending host an ACK carrying the
        seq# it is expecting. After several pkts of this kind, the sending
        host will understand that pkts are lost and will retransmit them.
        If the last pkt of a stream is lost, this mechanism would never take
        place. Instead, a timeout mechanism will make the sending host
        retransmit the pkt if it does not receive an ack after a certain
        time. This time changes between sessions, or even inside a session,
        depending on network conditions and distances.

  - RSV - Reserved (4 bits)

  - Flags:
      + SYN  Synchronize the sequence numbers to establish a connection
      + FIN  Sender is finished sending data -- initiates a half close
      + RST  Reset - Abort the connection
      + PSH  Tells receiver not to buffer the data before passing it to the
             application (interactive applications use this).
             It's possible to send data without the PSH flag set.
      + ACK  Acknowledgement number is valid
      + URG  Urgent pointer is valid (often from an interrupt, e.g. CTRL-c)
             Rarely used. Intended to elevate the priority of the pkt.
      + R2   - Non ECN hosts: Reserved bit. Must be zero.
             - ECN hosts: ECN-echo - Turned on when a pkt has both the
               "ECN-capable" and "Congestion Experienced" bits set in the IP
               hdr. Informs the sender to reduce the rate.
      + R1   - Non ECN hosts: Reserved bit. Must be zero.
             - ECN hosts: CWR - Congestion Window Reduced - Upon reception
               of ECN-echo, sender will reduce congestion window by half,
               and will set CWR.

  - Window Size: size of TCP buffer for incoming data in current connection.
    It is a flow control mechanism which allows the sender to transmit
    multiple pkts before stopping to wait for acks.
      - If the sender has "a lot" of data to send, it will split it among
        several pkts (none of which should exceed the MSS) and will send
        them one after the other without waiting for ACKs from the receiving
        host. It will stop sending pkts once their aggregated size reaches
        the WS limit. It will then wait for the ACK from the receiving host
        before continuing to send more pkts.
      - The WS can be changed by the receiving host during the connection
        (e.g. if the receiving process hasn't processed all the data
        received, it will send an ACK with a modified window having the
        value of the available incoming space).
        If the receiver has no more space for new data, it will ACK with a
        window size of 0. When it has freed resources, it will send a
        "window update" ACK announcing its new available window size (it's
        not a "real" ACK since it's not acking new data).
      - Should be set to: network bandwidth x round trip time (ping)
        ( e.g. 100 Mbps net, 5 ms rtt = 100 Mbps x 5x10^-3 s = 0.5 Mbits
          ~= 512 kbits = 64 kbytes ).
      - Many architectures have limits on the size of the socket buffer and
        hence the TCP window size (typically a megabyte).
      - 4096 is not optimal for ethernet; 16384 is much better.
      - Default values by OS: check the passive fingerprinting section.
      - Should not be 0 in an initial SYN.

  - TCP checksum:
      + The algorithm divides the data to be checksummed into 16-bit fields.
      + The 16-bit fields are added using 1's complement arithmetic, and the
        1's complement of that sum is stored as the checksum.
      + Validated by the destination host only.
      + If checksum is wrong, the segment is discarded silently.
      + The checksum is calculated over a pseudo hdr, the TCP hdr (with the
        checksum field set to 0) and the payload (padded to a 2-byte
        boundary):

        [ src IP (4b) | dst IP (4b) | 0x00 (1b), proto (1b), tcp-length (2b) ]
        [ TCP hdr (checksum field = 0) ]
        [ data (2-byte padded) ]

      + Similar method for UDP

  - Urgent Ptr: way for the sender to transmit emergency data to the other
    end; it's up to the receiving end to decide what to do with it.
      + Only valid if the URG flag is set
      + Rarely used. Intended to elevate the priority of the pkt.
      + Positive offset that must be added to the seq. number of the segment
        to yield the seq. number of the last byte of urgent data. There's no
        way to know where the data begins.
      + Most common application: C-c sent in a rlogin/telnet session.

  - TCP options: (max. 40 bytes)

                            +-----------+------------+--------
      + GENERAL STRUCTURE:  | Kind (1b) | Length (1b)| ....
                            +-----------+------------+--------
        (Length includes Kind + Length bytes (2))

      + Types:
        ------
        0. EOL - 1b - End of Option List - RFC 793 - [ 00 ]
           If necessary, used as padding to form 4-byte fields at the end of
           the option list.

        1. NOP - 1b - No-Option - RFC 793 - [ 01 ]
           Although not compulsory, usually used as word padding at the
           beginning or end of each option.

        2. MSS - 4b (1+1+2) - Max.Segment Size - RFC793/879 - [02 04 <MSS>]
           The MSS is the largest amount of data that the host announcing it
           is willing to receive in a single TCP segment. This size refers to
           the payload; you still have to add the headers size.
           It can only appear in a SYN (set once and not readjusted).
           Values:
             + 536b - Default and usual if destination is not local.
             + x512 - Many BSDs require MSS to be a multiple of 512.
             + 1460 (0x05b4): usual value (solaris, aix, ...) when both ends
               are ethernet (proven more efficient than 1024).
             + Max: Outgoing interface's MTU - TCP hdr - IP hdr
               (ethernet/802.3: MSS<=1460 ; 802.2<=1452)
           Fragmentation:
             + With no fragmentation, the bigger the MSS, the better.
             + If both machines announce an MSS bigger than the maximum
               supported by any of the intermediate networks, fragmentation
               will occur. The path MTU discovery mechanism is the only way
               around it.

        3. WSCALE - 3b (1+1+1) - Window Scale - RFC1072 - [03 03 <shift.cnt>]
           Multiplicative factor that allows receiving buffers to be > 65535.
           WSCALE designates the number of bits that WSIZE should be shifted
           left in order to compute the actual WSIZE.
           Eg. WSIZE=55808, WSCALE=2 -> actual WSIZE=223,232

        4. SACKOK - 2b (1+1) - Selective ACK Permitted - RFC1072/2018 - [04 02]
           Selective acknowledgement is a method allowing the data receiver
           to tell the sender which segments arrived successfully. This lets
           the sender retransmit only lost pkts, in an attempt to improve
           upon TCP's cumulative acknowledgement process.

        5. SACK - var length (1+1+2+2+...) - RFC1072
           [05 <length(1b)> <Relative Origin(2b)> <Block Size(2b)> ... ]

        6. ECHO - 6b (1+1+4) - (RFC1072) - [06 06 <info to be echoed>]

        7. ECHOREPLY - 6b (1+1+4) - (RFC1072) - [07 06 <echoed info>]

        8. TIMESTAMP - 10b (1+1+4+4) - (RFC1323)
           [08 0a <TS Value (TSval)> <TS Echo Reply (TSecr)> ]
           Used to compute the retransmission timer (helps to recover from
           pkt loss) through round trip time calculation, and to make sure
           an old, reused sequence number does not accidentally get included
           in a current exchange.

        9. POC-perm - 2b (1+1) - Partial Order Service Permitted - RFC1693
           [09 02]

       10. POC-service-profile - 3b (1+1+1) - RFC1693
                        1 bit        1 bit    6 bits
           [ 0a 03 <Start_flag | End_flag | Filler >]

       11. CC - 6b (1+1+4) - (RFC1644) - [0b 06 <Connection Count: SEG.CC>]

       12. CCNEW - 6b (1+1+4) - (RFC1644) - [0c 06 <Connection Count: SEG.CC>]

       13. CCECHO - 6b (1+1+4) - (RFC1644) - [0d 06 <Connection Count: SEG.CC>]

      + PADDING: variable
        The TCP header padding is used to ensure that the TCP header ends and
        data begins on a 32 bit boundary. The padding is composed of zeros.

      + KINDS EXPLAINED:

        0. EOL - 1b - End of Option List - RFC 793 - [ 00 ]
           This option code indicates the end of the option list. This might
           not coincide with the end of the TCP header according to the Data
           Offset field. This is used at the end of all options, not the end
           of each option, and need only be used if the end of the options
           would not otherwise coincide with the end of the TCP header.

        3. WSCALE - 3b (1+1+1) - Window Scale - RFC1072 - [03 03 <shift.cnt>]
           May be sent in a SYN segment by a TCP:
             (1) to indicate that it is prepared to do both send and receive
                 window scaling
             (2) to communicate a scale factor to be applied to its receive
                 window.
           The scale factor is encoded logarithmically, as a power of 2
           (presumably to be implemented by binary shifts).
           Note: the window in the SYN segment itself is never scaled.
           Here shift.cnt is the number of bits by which the receiver
           right-shifts the true receive-window value, to scale it into a
           16-bit value to be sent in the TCP header (this scaling is
           explained below). The value shift.cnt may be zero (offering to
           scale, while applying a scale factor of 1 to the receive window).

        4. SACKOK - 2b (1+1) - Sack-Permitted - (RFC1072) - [04 02]
           May be sent in a SYN by a TCP that has been extended to receive
           (and presumably process) the SACK option once the connection has
           opened.

...............................................................................
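As an illustration of the TCP checksum described in this section, here is a minimal Python sketch. The function names, the 192.0.2.x addresses and the sample header values are assumptions made for this example only; the computation itself is the usual 16-bit one's complement sum over pseudo hdr + TCP hdr + data, with the checksum field zeroed while computing.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit big-endian words with end-around carry."""
    if len(data) % 2:              # pad to a 2-byte boundary
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudo hdr + TCP segment (header and payload)."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))   # 0x00, proto 6, length
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


# Hypothetical bare SYN: ports 52930 -> 9 (discard), hdr.len 5 words,
# SYN flag, window 32767, checksum field left at 0, no options, no data.
header = struct.pack("!HHIIBBHHH",
                     52930, 9, 4153139971, 0, 5 << 4, 0x02, 32767, 0, 0)
csum = tcp_checksum("192.0.2.1", "192.0.2.2", header)
filled = header[:16] + struct.pack("!H", csum) + header[18:]

# A receiver repeats the computation over the segment as received
# (checksum field included); a result of 0 means the segment is intact.
print(hex(csum), tcp_checksum("192.0.2.1", "192.0.2.2", filled))
```

The receiver-side check works because adding a value and its one's complement always yields 0xffff, whose complement is 0.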
3. TCP Connection States
========================

  - LISTEN        Waiting for connection request from remote host
  - SYN-SENT      Waiting for SYN ACK after sending SYN
  - SYN-RECEIVED  Waiting for confirming connection request ACK
  - ESTABLISHED   State after the 3 way handshake is completed. Data received
                  can be delivered to process (normal state for data transfer
                  phase)
  - FIN-WAIT-1    Waiting for FIN from remote host or FIN sent, waiting for
                  ACK
  - FIN-WAIT-2    Waiting for connection termination request from the remote
                  host
  - CLOSE-WAIT    Waiting for connection termination request from local
                  process
  - CLOSING       Waiting for connection termination request ACK from remote
                  host
  - LAST-ACK      Waiting for ACK of connection termination request
                  previously sent (includes ACK of its connection termination
                  request).
  - TIME-WAIT     Waiting for enough time to pass to be sure that a remote
                  TCP process receives the ACK to its connection termination
                  request.
  - CLOSED        No connection state

...............................................................................
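The states above can be tied together with a small transition table. The sketch below is a simplification: the event names (active_open, rcv_fin, ...) are labels invented for this note, and only the common transitions of the RFC 793 state diagram are included (no simultaneous open, no resets).

```python
# (current state, event) -> next state; event names are hypothetical labels
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN-SENT",         # SYN sent
    ("LISTEN", "rcv_syn"): "SYN-RECEIVED",         # SYN ACK sent
    ("SYN-SENT", "rcv_syn_ack"): "ESTABLISHED",    # ACK sent
    ("SYN-RECEIVED", "rcv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN-WAIT-1",        # FIN sent
    ("ESTABLISHED", "rcv_fin"): "CLOSE-WAIT",      # ACK sent
    ("FIN-WAIT-1", "rcv_ack"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "rcv_fin"): "CLOSING",          # ACK sent
    ("FIN-WAIT-2", "rcv_fin"): "TIME-WAIT",        # ACK sent
    ("CLOSING", "rcv_ack"): "TIME-WAIT",
    ("CLOSE-WAIT", "close"): "LAST-ACK",           # FIN sent
    ("LAST-ACK", "rcv_ack"): "CLOSED",
    ("TIME-WAIT", "timeout"): "CLOSED",
}


def run(state, events):
    """Apply a list of events, returning the list of states visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]   # KeyError: transition not modelled
        path.append(state)
    return path


# Client side of a normal session: open, then close first.
print(run("CLOSED", ["active_open", "rcv_syn_ack", "close",
                     "rcv_ack", "rcv_fin", "timeout"]))
```

The server side of the same session would go LISTEN, SYN-RECEIVED, ESTABLISHED, CLOSE-WAIT, LAST-ACK, CLOSED.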
4. TCP stimulus - response
==========================

  [c = client] [s = server] [r = router]
  [S = SYN] [R = RESET] [A = ACK] [F = FIN]

  - Open port:
        S [c->s]  /  S A [s->c]
        F [c->s]  /  -

      + Linux:
        127.0.0.1.49900 > 127.0.0.1.http: S 489924623:489924623(0) win 32767
            <mss 16396,sackOK,timestamp 80581855 0,nop,wscale 0> (DF)
        127.0.0.1.http > 127.0.0.1.49900: S 497843559:497843559(0)
            ack 489924624 win 32767
            <mss 16396,sackOK,timestamp 80581855 80581855,nop,wscale 0> (DF)
        127.0.0.1.49900 > 127.0.0.1.http: . ack 1 win 32767
            <nop,nop,timestamp 80581855 80581855> (DF)

      + Anomalies: Linux 2.4 & Windows send RA in response to F.

  - Closed port:
        S [c->s]  /  R A [s->c]
        F [c->s]  /  R A [s->c]

      + Linux:
        127.0.0.1.49899 > 127.0.0.1.1234: S 439514276:439514276(0) win 32767
            <mss 16396,sackOK,timestamp 80576963 0,nop,wscale 0> (DF)
        127.0.0.1.1234 > 127.0.0.1.49899: R 0:0(0) ack 439514277 win 0 (DF)

  - Unrecognized connection:
        [UPSF] A [c->s]  /  R [s->c]

      + Linux: (SA pkt has been forged; server responds with a R without ACK)
        127.0.0.1.2640 > 127.0.0.1.0: S 1022999587:1022999587(0)
            ack 1416218398
        127.0.0.1.0 > 127.0.0.1.2640: R 1416218398:1416218398(0)

  - Response to Reset:
        R  [c->s]  /  -
        RA [c->s]  /  -

      + A R should never elicit a response (neither R alone nor RA).

  - Aborting a connection:
        127.0.0.1.49900 > 127.0.0.1.http: S 489924623:489924623(0)
        127.0.0.1.http > 127.0.0.1.49900: S 497843559:497843559(0)
            ack 489924624
        127.0.0.1.49900 > 127.0.0.1.http: . ack 1
        [...]
        127.0.0.1.http > 127.0.0.1.49900: R

  - Host does not exist:
        S [c->s]  /  ICMP host unreachable [r->c]

  - Port blocked by router:
        S [c->s]  /  ICMP host unreachable - admin prohibited filter [r->c]

  - Port blocked by silent router:
        S [c->s]  /  -

...............................................................................
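The stimulus/response pairs for a SYN probe can be turned into a small lookup, sketched below in Python. The response labels ("SA", "icmp-host-unreachable", ...) and the function name are invented for this note; a real scanner would also need timeouts and retries, which are left out.

```python
# Response to a single SYN probe -> interpretation, following the list above.
# The keys are hypothetical labels, not the output of any real tool.
SYN_PROBE_MEANING = {
    "SA": "open port",
    "RA": "closed port",
    "icmp-host-unreachable": "host does not exist",
    "icmp-admin-prohibited": "port blocked by router",
    None: "port blocked by silent router (or probe/answer lost)",
}


def classify_syn_probe(response):
    """Map the observed response to a SYN (None = no answer) to a verdict."""
    return SYN_PROBE_MEANING.get(response, "unexpected response")


for resp in ("SA", "RA", None, "icmp-admin-prohibited", "F"):
    print(repr(resp), "->", classify_syn_probe(resp))
```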
5. Resetting a connection
=========================

  - As a general rule, reset (RST) must be sent whenever a segment arrives
    which apparently is not intended for the current connection. A reset
    must not be sent if it is not clear that this is the case.

    There are three groups of states:

    1. If the connection does not exist (CLOSED):
       R is sent in response to any incoming segment except another R. In
       particular, SYNs addressed to a non-existent connection are rejected
       by this means.
       If the incoming segment has an ACK field, the reset takes its
       sequence number from the ACK field of the segment, otherwise the
       reset has sequence number zero and the ACK field is set to the sum of
       the sequence number and segment length of the incoming segment. The
       connection remains in the CLOSED state.

    2. If the connection is in any non-synchronized state (LISTEN, SYN-SENT,
       SYN-RECEIVED), and the incoming segment acks something not yet sent
       (the segment carries an unacceptable ACK), or if an incoming segment
       has a security level or compartment which does not exactly match the
       level and compartment requested for the connection:
       R is sent
         + If the incoming segment has an ACK field, the reset takes its seq
           number from the ACK field of the segment
         + Otherwise the reset has seq number zero and the ACK field is set
           to the sum of the seq number and segment length of the incoming
           segment.
       The connection remains in the same state.

    3. If the connection is in a synchronized state (ESTABLISHED,
       FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT),
       any unacceptable segment (out of window sequence number or
       unacceptable ack number) must elicit only an empty acknowledgment
       segment containing the current send-sequence number and an ack
       indicating the next sequence number expected to be received, and the
       connection remains in the same state...
       The R takes its seq number from the ACK field of the incoming
       segment.

...............................................................................
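Rule 1 (connection does not exist) can be checked against the Linux traces of section 4 with a short Python sketch. The function name and the dictionary layout are made up for this note; seg_len is the segment length in sequence space, so a bare SYN counts as 1.

```python
MOD = 2 ** 32  # sequence space is 32 bits wide


def reset_for_closed(seg_seq, seg_len, has_ack, seg_ack=0):
    """Build the RST answering a segment sent to a CLOSED connection.

    Hypothetical helper. seg_len counts SYN and FIN as one unit each,
    plus the number of data bytes.
    """
    if has_ack:
        # Reset takes its seq number from the ACK field; no ACK flag set.
        return {"flags": "R", "seq": seg_ack}
    # Otherwise seq is zero and the ACK field is seq + segment length.
    return {"flags": "RA", "seq": 0, "ack": (seg_seq + seg_len) % MOD}


# SYN to a closed port (trace: "R 0:0(0) ack 439514277")
print(reset_for_closed(439514276, 1, has_ack=False))

# Forged SYN ACK (trace: "R 1416218398:1416218398(0)")
print(reset_for_closed(1022999587, 1, has_ack=True, seg_ack=1416218398))
```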
6. Valid and Invalid Flag combinations
======================================

  - Valid:   { S | SA | A | F | FA | R | RA }
    Invalid:

...............................................................................
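A flag string can be checked against the valid set listed above with a few lines of Python. Two assumptions are made in this sketch and are not stated in the notes: P (PSH) and U (URG) are treated as modifiers and ignored before the comparison, and anything not in the list is reported as invalid.

```python
# Combinations listed as valid in the notes; order of letters is irrelevant.
VALID = {frozenset(combo) for combo in ("S", "SA", "A", "F", "FA", "R", "RA")}

# ASSUMPTION (not from the notes): PSH and URG only modify a segment and
# are stripped before the lookup.
MODIFIERS = {"P", "U"}


def is_valid_combination(flags: str) -> bool:
    """flags is a string of flag letters, e.g. 'SA' or 'FPA'."""
    core = frozenset(flags.upper()) - MODIFIERS
    return core in VALID


for combo in ("S", "SA", "PA", "SF", "SR", ""):
    print(repr(combo), is_valid_combination(combo))
```

SYN+FIN and SYN+RST come out as invalid, as does a segment with no flags at all.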
7. Explicit Congestion Notification (ECN)
=========================================

  - RFC 3168

  - Mechanism to reduce the congestion of a network by notifying sending
    hosts of congestion, so they can reduce their transmission rate.

...............................................................................
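The two ECN bits sit on top of the six classic flags in the TCP flags byte (R1 = CWR = 0x80, R2 = ECN-echo = 0x40). The Python sketch below decodes that byte and models the sender reaction described in section 2 (halve the congestion window, set CWR). The function names and the window unit (segments) are assumptions of this example.

```python
# Bit masks of the TCP flags byte, most significant bit first.
FLAG_BITS = [(0x80, "CWR"), (0x40, "ECE"), (0x20, "URG"), (0x10, "ACK"),
             (0x08, "PSH"), (0x04, "RST"), (0x02, "SYN"), (0x01, "FIN")]


def decode_flags(flags_byte: int):
    """Return the names of the flags set in the flags byte."""
    return [name for bit, name in FLAG_BITS if flags_byte & bit]


def react_to_segment(flags_byte: int, cwnd: int):
    """Hypothetical sender reaction: returns (new cwnd, set CWR next?).

    On ECN-echo the congestion window is halved (never below 1 here,
    an arbitrary floor for this sketch) and CWR is set on the next pkt.
    """
    if flags_byte & 0x40:
        return max(cwnd // 2, 1), True
    return cwnd, False


print(decode_flags(0x52))            # SYN ACK with ECN-echo
print(react_to_segment(0x50, 8))     # ACK carrying ECN-echo
print(react_to_segment(0x10, 8))     # plain ACK
```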
10. Passive Fingerprinting
==========================

  [ From: http://project.honeynet.org/papers/finger/traces.txt ]

  - See below
  - Other values:
                   WINDOW
                   ------
        4.2BSD     2048
        4.3BSD     4096

  # OS          VERSION      PLATFORM        TTL  WINDOW       DF  TOS
  #---          -------      --------        ---  -----------  --  ---
  DC-OSx        1.1-95       Pyramid/NILE     30  8192         n     0
  Windows       9x/NT        Intel            32  5000-9000    y     0
  NetApp        OnTap        5.1.2-5.2.2      54  8760         y     0
  HPJetDirect   ?            HP_Printer       59  2100-2150    n     0
  AIX           4.3.x        IBM/RS6000       60  16000-16100  y     0
  AIX           4.2.x        IBM/RS6000       60  16000-16100  n     0
  Cisco         11.2         7507             60  65535        y     0
  DigitalUnix   4.0          Alpha            60  33580        y    16
  IRIX          6.x          SGI              60  61320        y    16
  OS390         2.6          IBM/S390         60  32756        n     0
  Reliant       5.43         Pyramid/RM1000   60  65534        n     0
  FreeBSD       3.x          Intel            64  17520        y    16
  JetDirect     G.07.x       J3113A           64  5804-5840    n     0
  Linux         2.2.x        Intel            64  32120        y     0
  OpenBSD       2.x          Intel            64  17520        n    16
  OS/400        R4.4         AS/400           64  8192         y     0
  SCO           R5           Compaq           64  24820        n     0
  Solaris       8            Intel/Sparc      64  24820        y     0
  FTX(UNIX)     3.3          STRATUS          64  32768        n     0
  Unisys        x            Mainframe        64  32768        n     0
  Netware       4.11         Intel           128  32000-32768  y     0
  Windows       9x/NT        Intel           128  5000-9000    y     0
  Windows       2000         Intel           128  17000-18000  y     0
  Cisco         12.0         2514            255  3800-5000    n   192
  Solaris       2.x          Intel/Sparc     255  8760         y     0
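A lookup against the table above can be sketched as follows. Only a handful of rows are copied in, the function name is made up, and the TTL is compared as listed in the table; how to map an observed (already decremented) TTL back to the listed value is left out of this sketch.

```python
# (os, version, ttl, window_min, window_max, df) - a few rows of the table
SIGNATURES = [
    ("Windows", "9x/NT", 32, 5000, 9000, True),
    ("FreeBSD", "3.x", 64, 17520, 17520, True),
    ("Linux", "2.2.x", 64, 32120, 32120, True),
    ("OpenBSD", "2.x", 64, 17520, 17520, False),
    ("Solaris", "8", 64, 24820, 24820, True),
    ("Windows", "9x/NT", 128, 5000, 9000, True),
    ("Windows", "2000", 128, 17000, 18000, True),
    ("Solaris", "2.x", 255, 8760, 8760, True),
]


def guess_os(ttl: int, window: int, df: bool):
    """Return every 'OS version' whose signature matches the SYN values."""
    return [f"{name} {version}"
            for name, version, sig_ttl, wmin, wmax, sig_df in SIGNATURES
            if ttl == sig_ttl and wmin <= window <= wmax and df == sig_df]


print(guess_os(64, 17520, True))     # DF set
print(guess_os(64, 17520, False))    # same TTL/window, DF clear
print(guess_os(128, 17520, True))
```

FreeBSD 3.x and OpenBSD 2.x share TTL and window in the table, so the DF bit is what tells them apart here.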

Last Updated: 02/09/2003-13:22:25 - © Copyright 2004, Jess García