jessLAND notes - tcp
TCP - Transmission Control Protocol
***********************************
1. General Info.
2. TCP Segment Format
3. TCP Connection States
4. TCP stimulus - response
5. Resetting a connection
6. Valid and Invalid Flag combinations
7. Explicit Congestion Notification (ECN)
10. Passive Fingerprinting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. General Info.
================
- RFC 793: TCP Protocol Specification
RFC 879: TCP maximum segment size and related topics
RFC 1072: TCP extensions for long-delay paths
RFC 1106: TCP big window and NAK options
RFC 1110: Problem with the TCP big window option
RFC 1323: TCP Extensions for High Performance (Obsoletes RFC1072)
RFC 1644: T/TCP -- TCP Extensions for Transactions Functional
Specification
RFC 1693: An Extension to TCP : Partial Order Service
- TCP is IP protocol number 6.
- TCP Ports: 0 -> 65535 [0 -> 1023: Privileged (not in Windows)]
- Characteristics: + Reliable + Connection oriented (sequence numbers)
- TCP Session (3 way handshake): [ISN: Initial Seq. Number]
--------- SYN (ISNa) ------->
<- SYN (ISNb) - ACK (ISNa+1) --
--------- ACK (ISNb+1) ------>
< ..... >
-- FIN-ACK ->
<--- ACK ----
<- FIN-ACK --
---- ACK --->
- Sending data on a SYN pkt is valid, although not typical. This data will be
considered part of the stream once the three way handshake is completed.
Typical uses are round-trip time measurement or NIDS evasion/insertion attacks.
- TCP Retries: src port, dst port, seq. numbers persist in all retries.
IP IDs change (incrementally).
- Timeout: Depending on the TCP/IP implementation, timeout of a TCP connection
can be somewhere between 2 and 30 mins.
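The seq/ack arithmetic of the three way handshake above can be sketched as follows. This is a minimal illustration, not an implementation: the function name and the tuple layout are made up for this note, and the two ISNs are the ones from the tcpdump trace in section 2.

```python
MOD = 2**32  # sequence numbers are 32-bit values and wrap around

def three_way_handshake(isn_a, isn_b):
    """Return the (flags, seq, ack) triples of a plain three way handshake.

    Hypothetical helper for illustration; ack is None when the ACK flag is off.
    """
    syn = ("S", isn_a, None)                            # a -> b
    syn_ack = ("SA", isn_b, (isn_a + 1) % MOD)          # b -> a, acks ISNa+1
    ack = ("A", (isn_a + 1) % MOD, (isn_b + 1) % MOD)   # a -> b, acks ISNb+1
    return [syn, syn_ack, ack]

for flags, seq, ack in three_way_handshake(4153139971, 1308019873):
    print(flags, seq, ack)
```

The modulo matters only for an ISN of 0xffffffff, where the ack wraps to 0.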
...............................................................................
2. TCP Segment Format
=====================
( TCP HEADER: min. 20b, max. 60b - DATA: var. )
    0                 8                16               24               31
    +-----------------------------------+---------------------------------+
    |           # source port           |          # dest. port           |
    |-----------------------------------+---------------------------------|
 32 |                           sequence number                           |
    |---------------------------------------------------------------------|
 64 |                             ack number                              |
    |----------+------+--+--+-+-+-+-+-+-+---------------------------------|
 96 |hdr.len(4)|RSV(4)|R1 R2 U A P R S F|           window size           |
    |----------+------+--+--+-+-+-+-+-+-+---------------------------------|
128 |           TCP checksum            |         urgent pointer          |
    |-----------------------------------+---------------------------------|
160 |    options field (var.length - max. 40b - 0 padding to 4b mult.)    |
    |---------------------------------------------------------------------|
192 ~                                DATA                                 ~
    +---------------------------------------------------------------------+
- src port: 0-65535 ; >1023 ephemeral (only unix)
Each new connection not being a retry will use a different src port.
dst port:
- seq. #: Each new connection not being a retry will use a different seq #.
ISN - Initial seq. number: first seq. number in TCP exchange.
Win 95, 98, NT use a trivial time dependency formula to generate ISNs.
- ack: + Must be > 0 (the TCP SYN consumes 1 seq. #), except when acking ISN
0xffffffff, which wraps around to 0.
+ The spec (RFC 793) does not force the ack# of an initial SYN to be
zero, but it usually is (except for some OSs).
+ All pkts in an established connection must have the ACK bit set.
+ The ack# is the next seq.# expected from the peer: the seq.# of the
last pkt received plus its data length (consecutive pkts can carry the
same ack# if the peer sends no data). SYN and FIN carry no data (0) but
consume one seq.# each, so they are acked with the seq# received + 1.
clnt.52930 > SRV.discard: S 4153139971:4153139971(0)
SRV.discard > clnt.52930: S 1308019873:1308019873(0) ack 4153139972
clnt.52930 > SRV.discard: . ack 1308019874 <-
clnt.52930 > SRV.discard: P 4153139972:4153139975(3) ack 1308019874 <-
SRV.discard > clnt.52930: . ack 4153139975
clnt.52930 > SRV.discard: P 4153139975:4153139978(3) ack 1308019874 <-
SRV.discard > clnt.52930: . ack 4153139978
clnt.52930 > SRV.discard: P 4153139978:4153139995(17) ack 1308019874 <-
SRV.discard > clnt.52930: . ack 4153139995
+ If a pkt is lost, and the receiving host receives a seq# higher than
expected, it will send back to the sending host a pkt acking the seq#
it is expecting (a duplicate ack). After several pkts of this kind, the
sending host will understand that pkts are lost and will retransmit them.
If the last pkt of a stream is lost, this mechanism would never take
place. Instead, a timeout mechanism will make the sending host
retransmit the pkt if it does not receive an ack after a certain time.
This time changes through sessions, or even inside a session, depending
on network conditions and distances.
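The cumulative ack behaviour described above can be sketched from the receiver side. This is a simplified model under stated assumptions: the function name is hypothetical, segments are (seq, length) pairs, and the receiver is assumed to buffer out-of-order segments (a real stack may also discard them). The numbers are taken from the tcpdump trace above.

```python
MOD = 2**32  # sequence numbers wrap at 32 bits

def receiver_acks(expected, segments):
    """Return the ack# sent back for each arriving (seq, length) segment.

    Hypothetical helper: 'expected' is the next seq# the receiver waits for.
    """
    buffered = {}
    acks = []
    for seq, length in segments:
        buffered[seq] = length
        # Advance over every segment that is now contiguous with the stream.
        while expected in buffered:
            expected = (expected + buffered.pop(expected)) % MOD
        acks.append(expected)
    return acks

in_order = [(4153139972, 3), (4153139975, 3), (4153139978, 17)]
print(receiver_acks(4153139972, in_order))

# Second segment delayed: the receiver repeats the ack# it is still expecting.
reordered = [(4153139972, 3), (4153139978, 17), (4153139975, 3)]
print(receiver_acks(4153139972, reordered))
```

In the reordered case the repeated ack# is the duplicate ack that tells the sender a pkt is missing.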
- RSV - Reserved (4b)
- Flags: + SYN Synchronize the sequence numbers to establish a connection
+ FIN Sender is finished sending data -- initiates a half close
+ RST Reset - Abort the connection
+ PSH tells receiver not to buffer the data before passing it to the
application (interactive applications use this)
It's possible to send data without the PSH flag set.
+ ACK Acknowledgement number is valid
+ URG Urgent pointer is valid (often from an interrupt, e.g. CTRL-c)
Rarely used. Intended to elevate the priority of the pkt.
+ R2 - Non ECN hosts: Reserved bit. Must be zero.
- ECN hosts: ECN-echo - Turned on when a pkt has both the
"ECN-capable" and "Congestion Experienced" bits set
in the IP hdr.
Informs the sender to reduce the rate
+ R1 - Non ECN hosts: Reserved bit. Must be zero.
- ECN hosts: CWR - Congestion Window Reduced -
Upon reception of ECN-echo, sender will reduce
congestion window by half, and will set CWR.
- Window Size: size of TCP buffer for incoming data in current connection.
It is a flow control mechanism which allows the sender to
transmit multiple pkts before stopping to wait for acks.
- If the sender has "a lot" of data to send, it will split it
among several pkts (none of which should exceed the MSS) and
will send them one after the other without waiting ACKs from
the receiving host. It will stop sending pkts after the
aggregated size of them all reaches the WS limit. It will then
wait for the ACK from the receiving host before continuing to
send more pkts.
- The WS can be changed by the receiving host during the
connection (e.g. if the receiving process hasn't processed all
the data received, it will send an ACK with a modified window
having the value of the incoming available space).
If the receiver has no more space for new data, it will ACK with
a window size of 0. When it has freed resources, it will send a
"window update" ACK announcing its new available window size
(it's not a "real" ACK since it's not acking new data).
- Should be set to: network bandwidth x round trip time (ping)
( e.g. 100 Mbps net, 5ms rt = 100 Mbps x 5x10^-3 s = 0.5 Mbits =
500 kbits = 62.5 kbytes, i.e. roughly 64 kbytes).
- Many architectures have limits on the size of the socket
buffer and hence the TCP window size (Typically a megabyte)
- 4096 is not optimal for ethernet; 16384 is much better.
- Default values by OS: check the passive fingerprinting section.
- Should not be 0 in an initial SYN
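The "bandwidth x round trip time" rule above is simple arithmetic. A minimal sketch, with a hypothetical function name and the RTT given in milliseconds to keep the numbers exact:

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_ms):
    """Bytes that can be in flight on the path: bandwidth x round trip time."""
    bits_in_flight = bandwidth_bps * rtt_ms / 1000
    return bits_in_flight / 8

# 100 Mbps network, 5 ms round trip time
print(bandwidth_delay_product_bytes(100_000_000, 5))
```

A window smaller than this value leaves the link idle while the sender waits for acks.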
- TCP checksum:
+ The algorithm divides the data to be checksummed into 16-bit fields.
+ The 16-bit fields are added using 1's complement arithmetic (carries are
wrapped around), and the checksum is the 1's complement of that sum.
+ Validated by the destination host only.
+ If checksum is wrong, the segment is discarded silently.
+ The checksum is calculated over a pseudo hdr, the TCP hdr and the payload
(padded to 2-byte boundary):
[ src IP (4b) | dst IP (4b) | 0x00 (1b), proto (1b), tcp-length (2b) ]
[ TCP hdr: src port (2b), dst port (2b), ..., chksum (2b) taken as 0 ]
[ data (2-byte padded) ]
+ Similar method for UDP
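A sketch of the checksum computation for IPv4. It assumes the sum covers the 12-byte pseudo header followed by the whole TCP segment (header with the checksum field zeroed, then data). The function names and the sample addresses are made up for this note.

```python
import socket
import struct

def ones_complement_sum(data):
    """Add the 16-bit words of data with end-around carry."""
    if len(data) % 2:
        data += b"\x00"              # pad to a 2-byte boundary
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:               # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total

def tcp_checksum(src_ip, dst_ip, segment):
    """Checksum of a TCP segment whose checksum field is set to zero."""
    pseudo_hdr = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
                  + struct.pack("!BBH", 0, 6, len(segment)))  # 6 = TCP
    return ~ones_complement_sum(pseudo_hdr + segment) & 0xFFFF

# 20-byte header (no options, PSH+ACK, checksum 0) followed by 3 data bytes.
segment = struct.pack("!HHIIBBHHH", 52930, 9, 4153139972, 1308019874,
                      5 << 4, 0x18, 32767, 0, 0) + b"abc"
print(hex(tcp_checksum("10.0.0.1", "10.0.0.2", segment)))
```

The receiver can run the same sum over the segment as received: with a correct checksum in place, `tcp_checksum` returns 0.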
- Urgent Ptr: way for the sender to transmit emergency data to the other end;
it's up to the receiving end to decide what to do with it.
+ Only valid if the URG flag is set
+ Rarely used. Intended to elevate the priority of the pkt.
+ Positive offset that must be added to the seq. number of the
segment to yield the seq.number of the last byte of urgent data.
There's no way to know where the data begins.
+ Most common application: C-c sent in a rlogin/telnet session.
- TCP options: (max. 40 bytes)
+-----------+------------+--------
+ GENERAL STRUCTURE: | Kind (1b) | Length (1b)| ....
+-----------+------------+--------
(Length includes Kind + Length bytes (2))
+ Types:
------
0. EOL - 1b - End of Option List - RFC 793 - [ 00 ]
If necessary, used as padding to form 4-byte fields at the end
of the option list.
1. NOP - 1b - No-Option - RFC 793 - [ 01 ]
Although not compulsory, usually used as word padding at
the beginning or end of each option.
2. MSS - 4b (1+1+2) - Max.Segment Size - RFC793/879- [02 04 <MSS>]
The MSS is the largest amount of data that the host announcing it is
willing to receive in a single TCP segment. This size refers
to the payload; you still have to add headers size.
It can only appear in a SYN (set once and not readjusted).
Values:
+ 536b - Default and usual if destination is not local.
+ x512 - Many BSDs require MSS to be a multiple of 512.
+ 1460 (0x05b4): usual value (solaris, aix, ...) when both
ends are ethernet (proven more efficient than 1024).
+ Max: Outgoing interface's MTU - TCP hdr - IP hdr
(ethernet/802.3: MSS<=1460 ; 802.2<=1452)
Fragmentation:
+ With no fragmentation, then the bigger MSS, the better.
+ In case of both machines announcing an MSS bigger than the
maximum of any of the intermediate networks, fragmentation
will occur.
Path MTU discovery is the only way around it.
3. WSCALE - 3b (1+1+1) - Window Scale - RFC1072- [03 03 <shift.cnt>]
Multiplicative factor that allows receive buffers to be > 65535
WSCALE designates the number of bits that WSIZE should be
shifted in order to compute the actual WSIZE.
Eg. WSIZE=55808, WSCALE=2 -> actual WSIZE=223,232
4. SACKOK - 2b (1+1) - Selective ACK Permitted - RFC1072/2018- [04 02]
Selective acknowledgement is a method allowing the data receiver
to tell the sender which segments arrived successfully. This
lets the sender retransmit only lost pkts, in an attempt to
improve upon TCP's cumulative acknowledgement process.
5. SACK - var length (1+1+2+2+...) - RFC1072
[05 <length(1b)> <Relative Origin(2b)> <Block Size(2b)> ... ]
6. ECHO - 6b (1+1+4) - (RFC1072)- [06 06 <info to be echoed>]
7. ECHOREPLY - 6b (1+1+4) - (RFC1072)- [07 06 <echoed info>]
8. TIMESTAMP - 10b (1+1+4+4) - (RFC1323)-
[08 0a <TS Value (TSval)> <TS Echo Reply (TSecr)> ]
Used to compute retransmission timer (helps to recover from pkt
loss) through round trip time calculation, and to make sure a
reused and old sequence number does not accidentally get
included with a current exchange.
9. POC-perm - 2b(1+1)-Partial Order Service Permitted- RFC1693-[09 02]
10. POC-service-profile - 3b (1+1+1) - RFC1693
1 bit 1 bit 6 bits
[ 0a 03 <Start_flag | End_flag | Filler >]
11. CC - 6b (1+1+4) - (RFC1644)- [0b 06 <Connection Count: SEG.CC>]
12. CCNEW - 6b (1+1+4) - (RFC1644)- [0c 06 <Connection Count: SEG.CC>]
13. CCECHO - 6b (1+1+4) -(RFC1644)- [0d 06 <Connection Count: SEG.CC>]
- PADDING: variable
The TCP header padding is used to ensure that the TCP header ends and data
begins on a 32 bit boundary. The padding is composed of zeros.
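The Kind/Length structure above can be walked with a few lines of code. A minimal sketch with a hypothetical function name. It returns (kind, value) pairs, treats EOL and NOP as the only 1-byte options and stops at EOL. The sample bytes are hand-built for this note (MSS 1460, SACKOK, a timestamp, NOP, WSCALE 2).

```python
def parse_tcp_options(raw):
    """Split the raw options field into (kind, value_bytes) pairs."""
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:                  # EOL: nothing but padding follows
            options.append((0, b""))
            break
        if kind == 1:                  # NOP: single byte, no length field
            options.append((1, b""))
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]            # includes the Kind and Length bytes
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options

raw = bytes.fromhex("020405b4" "0402" "080a0000000100000000" "01" "030302")
for kind, value in parse_tcp_options(raw):
    print(kind, value.hex())
```

The length check matters: a Length below 2 would otherwise make the loop spin forever on malformed input.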
- KINDS EXPLAINED:
0. EOL - 1b - End of Option List - RFC 793 - [ 00 ]
This option code indicates the end of the option list. This
might not coincide with the end of the TCP header according to
the Data Offset field. This is used at the end of all options,
not the end of each option, and need only be used if the end of
the options would not otherwise coincide with the end of the
TCP header.
3. WSCALE - 3b (1+1+1) - Window Scale - RFC1072- [03 03 <shift.cnt>]
May be sent in a SYN segment by a TCP:
(1) to indicate that it is prepared to do both send and
receive window scaling
(2) to communicate a scale factor to be applied to its receive
window.
The scale factor is encoded logarithmically, as a power of 2
(presumably to be implemented by binary shifts).
Note: the window in the SYN segment itself is never scaled.
Here shift.cnt is the number of bits by which the receiver
right-shifts the true receive-window value, to scale it into a
16-bit value to be sent in TCP header (this scaling is
explained below). The value shift.cnt may be zero (offering to
scale, while applying a scale factor of 1 to the receive
window).
4. SACKOK - 2b (1+1) - Sack-Permitted - (RFC1072)- [04 02]
May be sent in a SYN by a TCP that has been extended to
receive (and presumably process) the SACK option once the
connection has opened.
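The window scale arithmetic from the WSCALE notes above can be checked with two shifts. A minimal sketch with hypothetical function names. Capping the advertised value at 65535 is an assumption of this example, made so that the result always fits the 16-bit header field.

```python
def actual_window(wsize, wscale):
    """Window the peer may use: 16-bit WSIZE shifted left by WSCALE bits."""
    return wsize << wscale

def advertised_window(true_window, wscale):
    """16-bit value put in the TCP header for a given real receive window."""
    return min(true_window >> wscale, 0xFFFF)

print(actual_window(55808, 2))
print(advertised_window(223232, 2))
```

A shift count of 0 leaves the window unchanged, which matches "offering to scale, while applying a scale factor of 1".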
...............................................................................
3. TCP Connection States
========================
- LISTEN        Waiting for connection request from remote host
- SYN-SENT      Waiting for SYN ACK after sending SYN
- SYN-RECEIVED  Waiting for the ACK confirming the connection request, after
                having both received and sent a SYN
- ESTABLISHED   State after the 3 way handshake is completed. Data received can
                be delivered to process (normal state for data transfer phase)
- FIN-WAIT-1    FIN sent; waiting for a FIN from the remote host or for the ACK
                of the FIN sent
- FIN-WAIT-2    Waiting for connection termination request from the remote host
- CLOSE-WAIT    Waiting for connection termination request from local process
- CLOSING       Waiting for connection termination request ACK from remote host
- LAST-ACK      Waiting for ACK of connection termination request previously
                sent (includes ACK of its connection termination request).
- TIME-WAIT     Waiting for enough time to pass to be sure that a remote TCP
                process receives the ACK to its connection termination request.
- CLOSED        No connection state
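The states above can be tied together with a small transition table. This is a sketch of the main paths only (no resets, no simultaneous close corner cases beyond CLOSING), and the event names are hypothetical labels chosen for this note.

```python
# (current state, event) -> next state. Event names are made up for this sketch.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN-SENT",        # SYN sent
    ("LISTEN", "rcv_syn"): "SYN-RECEIVED",        # SYN ACK sent
    ("SYN-SENT", "rcv_syn_ack"): "ESTABLISHED",   # ACK sent
    ("SYN-SENT", "rcv_syn"): "SYN-RECEIVED",      # simultaneous open
    ("SYN-RECEIVED", "rcv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN-WAIT-1",       # FIN sent
    ("ESTABLISHED", "rcv_fin"): "CLOSE-WAIT",     # ACK sent
    ("FIN-WAIT-1", "rcv_ack"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "rcv_fin"): "CLOSING",         # both sides closing
    ("FIN-WAIT-2", "rcv_fin"): "TIME-WAIT",
    ("CLOSING", "rcv_ack"): "TIME-WAIT",
    ("CLOSE-WAIT", "close"): "LAST-ACK",          # FIN sent
    ("LAST-ACK", "rcv_ack"): "CLOSED",
    ("TIME-WAIT", "timeout"): "CLOSED",
}

def walk(events, state="CLOSED"):
    """Return the list of states visited while applying the events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path

# Client that opens the connection and is the first one to close it.
print(walk(["active_open", "rcv_syn_ack", "close", "rcv_ack", "rcv_fin", "timeout"]))
```

An event that is not valid in the current state raises KeyError, which is enough for a sketch.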
...............................................................................
4. TCP stimulus - response
==========================
[c = client] [s = server] [r = router] [S = SYN] [R = RESET] [A = ACK]
- Open port: S [c->s] / S A [s->c]
F [c->s] / -
+ Linux:
127.0.0.1.49900 > 127.0.0.1.http: S 489924623:489924623(0)
win 32767 <mss 16396,sackOK,timestamp 80581855 0,nop,wscale 0> (DF)
127.0.0.1.http > 127.0.0.1.49900: S 497843559:497843559(0) ack 489924624
win 32767 <mss 16396,sackOK,timestamp 80581855 80581855,nop,wscale 0> (DF)
127.0.0.1.49900 > 127.0.0.1.http: . ack 1
win 32767 <nop,nop,timestamp 80581855 80581855> (DF)
+ Anomalies: Linux 2.4 & Windows send RA in response to F.
- Closed port: S [c->s] / R A [s->c]
F [c->s] / R A [s->c]
+ Linux:
127.0.0.1.49899 > 127.0.0.1.1234: S 439514276:439514276(0)
win 32767 <mss 16396,sackOK,timestamp 80576963 0,nop,wscale 0> (DF)
127.0.0.1.1234 > 127.0.0.1.49899: R 0:0(0) ack 439514277 win 0 (DF)
- Unrecognized connection: [UPSF] A [c->s] / R [s->c]
+ Linux: (SA pkt has been forged; server responds with a R without ACK)
127.0.0.1.2640 > 127.0.0.1.0: S 1022999587:1022999587(0) ack 1416218398
127.0.0.1.0 > 127.0.0.1.2640: R 1416218398:1416218398(0)
- Response to Reset: R [c->s] / -
RA [c->s] / -
+ A R should never elicit a response (neither R alone or RA).
- Aborting a connection:
127.0.0.1.49900 > 127.0.0.1.http: S 489924623:489924623(0)
127.0.0.1.http > 127.0.0.1.49900: S 497843559:497843559(0) ack 489924624
127.0.0.1.49900 > 127.0.0.1.http: . ack 1
[...]
127.0.0.1.http > 127.0.0.1.49900: R
- Host does not exist: S [c->s] / ICMP host unreachable [r->c]
- Port blocked by router: S [c->s] / ICMP host unreachable - admin
prohibited filter [r->c]
- Port blocked by silent router: S [c->s] / -
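The SYN rows of the stimulus/response list above can be turned into a small lookup. A minimal sketch: the function name, the return strings and the way a response is encoded (None for silence, "ICMP" for any ICMP error, otherwise the TCP flag letters) are assumptions of this example.

```python
def classify_syn_response(response):
    """Guess what a host or router is telling us in reply to a SYN."""
    if response is None:
        return "no answer: port blocked by silent router"
    if response == "ICMP":
        return "host does not exist or port blocked by router"
    flags = set(response)
    if {"S", "A"} <= flags:
        return "open port"
    if "R" in flags:
        return "closed port"
    return "unexpected response"

for reply in ("SA", "RA", "ICMP", None):
    print(reply, "->", classify_syn_response(reply))
```

Telling "host unreachable" from "admin prohibited" would need the ICMP code, which this sketch does not model.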
...............................................................................
5. Resetting a connection
=========================
- As a general rule, reset (RST) must be sent whenever a segment arrives
which apparently is not intended for the current connection. A reset
must not be sent if it is not clear that this is the case. There are
three groups of states:
1. If the connection does not exist (CLOSED):
R is sent in response to any incoming segment except another R.
In particular, SYNs addressed to a non-existent connection are rejected
by this means. If the incoming segment has an ACK field, the reset
takes its sequence number from the ACK field of the segment, otherwise
the reset has sequence number zero and the ACK field is set to the sum
of the sequence number and segment length of the incoming segment. The
connection remains in the CLOSED state.
2. If the connection is in any non-synchronized state (LISTEN, SYN-SENT,
SYN-RECEIVED), and the incoming segment acks something not yet sent
(the segment carries an unacceptable ACK), or if an incoming segment has
a security level or compartment which does not exactly match the level and
compartment requested for the connection:
R is sent
+ If the incoming segment has an ACK field, the reset takes its seq
number from the ACK field of the segment
+ Otherwise the reset has seq number zero and the ACK field is set to
the sum of the seq number and segment length of the incoming segment.
The connection remains in the same state.
3. If the connection is in a synchronized state (ESTABLISHED, FIN-WAIT-1,
FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), any unacceptable
segment (out of window sequence number or unacceptable ack number) must
elicit only an empty acknowledgment segment containing the current
send-sequence number and an ack indicating the next sequence number
expected to be received, and the connection remains in the same state...
The R takes its seq number from the ACK field of the incoming segment.
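The seq/ack rule for a reset sent from the CLOSED state (group 1 above) can be sketched as follows. The function name and tuple layout are hypothetical. The segment length is taken to count SYN and FIN as one each, which is how the ack 439514277 in the closed port trace of section 4 comes out.

```python
MOD = 2**32  # sequence numbers wrap at 32 bits

def reset_for(seq, seg_len, ack=None):
    """Return (flags, seq, ack) of the RST answering an incoming segment.

    ack is the ack# of the incoming segment, or None if its ACK bit is off.
    seg_len is the data length plus 1 for a SYN and 1 for a FIN.
    """
    if ack is not None:
        # Incoming segment has an ACK field: the reset takes its seq# from it.
        return ("R", ack, None)
    # Otherwise: seq# zero, and the ACK field acks the whole incoming segment.
    return ("RA", 0, (seq + seg_len) % MOD)

# SYN to a closed port (closed port trace in section 4)
print(reset_for(439514276, 1))
# Forged SYN ACK (unrecognized connection trace in section 4)
print(reset_for(1022999587, 1, ack=1416218398))
```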
...............................................................................
6. Valid and Invalid Flag combinations
======================================
- Valid: { S | SA | A | F | FA | R | RA }
Invalid:
...............................................................................
7. Explicit Congestion Notification (ECN)
=========================================
- RFC 3168
- Mechanism to reduce the congestion condition of a network by notifying sending
hosts of congestion, so they can reduce the transmission rate.
...............................................................................
10. Passive Fingerprinting
==========================
[ From: http://project.honeynet.org/papers/finger/traces.txt ] - See below
- Other values:
            WINDOW
            ------
4.2BSD      2048
4.3BSD      4096

# OS        VERSION PLATFORM       TTL WINDOW      DF TOS
#---        ------- --------       --- ----------- -- ---
DC-OSx      1.1-95  Pyramid/NILE   30  8192        n  0
Windows     9x/NT   Intel          32  5000-9000   y  0
NetApp      OnTap   5.1.2-5.2.2    54  8760        y  0
HPJetDirect ?       HP_Printer     59  2100-2150   n  0
AIX         4.3.x   IBM/RS6000     60  16000-16100 y  0
AIX         4.2.x   IBM/RS6000     60  16000-16100 n  0
Cisco       11.2    7507           60  65535       y  0
DigitalUnix 4.0     Alpha          60  33580       y  16
IRIX        6.x     SGI            60  61320       y  16
OS390       2.6     IBM/S390       60  32756       n  0
Reliant     5.43    Pyramid/RM1000 60  65534       n  0
FreeBSD     3.x     Intel          64  17520       y  16
JetDirect   G.07.x  J3113A         64  5804-5840   n  0
Linux       2.2.x   Intel          64  32120       y  0
OpenBSD     2.x     Intel          64  17520       n  16
OS/400      R4.4    AS/400         64  8192        y  0
SCO         R5      Compaq         64  24820       n  0
Solaris     8       Intel/Sparc    64  24820       y  0
FTX(UNIX)   3.3     STRATUS        64  32768       n  0
Unisys      x       Mainframe      64  32768       n  0
Netware     4.11    Intel          128 32000-32768 y  0
Windows     9x/NT   Intel          128 5000-9000   y  0
Windows     2000    Intel          128 17000-18000 y  0
Cisco       12.0    2514           255 3800-5000   n  192
Solaris     2.x     Intel/Sparc    255 8760        y  0
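A lookup over a few rows of the table above shows how such signatures can be used. This is a toy sketch: only five rows are copied in, the function name is hypothetical, and the TTL is matched exactly, which assumes the observed TTL equals the value in the table (no allowance for hops in between).

```python
# (os, version, ttl, window_low, window_high, df) copied from the table above
SIGNATURES = [
    ("Linux", "2.2.x", 64, 32120, 32120, True),
    ("FreeBSD", "3.x", 64, 17520, 17520, True),
    ("OpenBSD", "2.x", 64, 17520, 17520, False),
    ("Windows", "2000", 128, 17000, 18000, True),
    ("Solaris", "2.x", 255, 8760, 8760, True),
]

def candidates(ttl, window, df):
    """Return the 'OS version' strings whose signature matches the packet."""
    return [
        f"{name} {version}"
        for name, version, sig_ttl, low, high, sig_df in SIGNATURES
        if ttl == sig_ttl and low <= window <= high and df == sig_df
    ]

print(candidates(64, 17520, True))
print(candidates(64, 17520, False))
print(candidates(128, 17500, True))
```

The DF bit is what separates FreeBSD 3.x from OpenBSD 2.x here, since TTL and window are identical.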
Last Updated: 02/09/2003-13:22:25 - © Copyright 2004, Jess García