Introduction
This article is an introduction to socket (TCP/IP sockets) concepts. It's not meant to provide complete coverage
of all socket topics; it's meant as a primer that brings the reader to a level at which socket programming can be
easily communicated. I've also chosen not to cover higher-level protocols, such as FTP and HTTP (the World Wide Web), as
I assume you're familiar with these (after all, you are on the Internet using the World Wide Web as you read this).
There are several concepts that must be introduced first. As much as possible, the following concepts will be
likened to a real world concept you are likely familiar with: a phone system.
Winsock ("Windows Sockets")
Winsock is a defined and documented standard API for programming network protocols. Most commonly it is used
to program TCP/IP, but can also be used to program Novell (IPX/SPX) and other network protocols. Winsock is accessible
as a DLL that is part of Win32.
TCP/IP
TCP/IP stands for Transmission Control Protocol and Internet Protocol. TCP/IP can mean many things, but in most
cases, it refers to the network protocol itself.
Client
A client is a process that initiates a connection. Typically, clients talk to one server at a time. If a process
needs to talk to multiple servers, it creates multiple clients.
Likening it to a phone call, a client would be the person who makes a call.
Server
A server is a process that answers incoming requests. A typical server handles numerous requests from many clients
simultaneously. Each connection from the server to the client, however, is a separate socket.
Likening it to a phone call, the server would be the person (or voice mail, interactive system, etc.) who answers
the phone when it rings. A server is typically set up so that it can handle multiple incoming phone calls. This
is similar to how a call center might handle many calls by having hundreds of operators and routing each incoming
call to an available operator.
IP Address
Each computer on a TCP/IP network has a unique address associated with it. Some computers may have more than
one address associated with them. An IPv4 address is a 32-bit number and is usually represented in dot notation,
e.g. 192.168.0.1. Each of the four sections represents one byte of the 32-bit address.
An IP address is like a phone number; a location (residence, business, etc.) can have one or more phone numbers.
To talk to someone at that location, a connection attempt is initiated (a call is placed and the dialed location's
phone rings) by dialing the phone number for that location. The party at the ringing end of the phone can then
decide whether or not to answer the phone.
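The dot notation described above maps directly onto the 32-bit value. As a rough sketch (Python is used here purely for illustration; the helper names dotted_to_int and int_to_dotted are made up for this example), the standard ipaddress module can convert between the two forms:

```python
import ipaddress

def dotted_to_int(dotted: str) -> int:
    """Return the 32-bit integer behind a dotted IPv4 address."""
    return int(ipaddress.IPv4Address(dotted))

def int_to_dotted(value: int) -> str:
    """Return the dotted form of a 32-bit integer address."""
    return str(ipaddress.IPv4Address(value))

if __name__ == "__main__":
    n = dotted_to_int("192.168.0.1")
    print(n)                 # 3232235521
    print(int_to_dotted(n))  # 192.168.0.1
```

Each of the four numbers contributes one byte: 192 is the most significant byte and 1 the least significant.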
Ports
A port is an integer number that identifies which application or service the client wishes to contact.
A port is much like a phone extension. Calling a phone number will get you to a location, but with TCP/IP, every
location also has an extension. Unlike a residential phone, there is no default extension; the caller must always specify one.
When an application (a server) is ready to accept incoming requests, it begins to listen on a port. This is
why the words application and protocol are sometimes used interchangeably with the word port. When a client wants to talk to
a server, it must know where the application is (the IP address/phone number), and which port (extension) it's
listening (answering) on.
Typically, applications have a fixed port so that no matter where they run, the port is fixed for that type
of application. For example, HTTP (Web) uses port 80, and FTP uses port 21. So when you want to retrieve a Web
page, you only need to know the location of the computer you wish to retrieve it from, as you know HTTP uses port
80.
Port numbers below 1024 are reserved, and should only be used if you're talking to or implementing a known protocol
that has such a port reserved. Most popular protocols use reserved port numbers.
Sockets
All references to sockets in this article are references to TCP/IP. A socket is the combination of an IP address
and a port number. A socket is also a virtual communication conduit between two processes. These processes may
be local (residing on the same computer) or remote.
A socket is like a phone connection that carries a conversation. To have a conversation, you must first make
the call, and have the other party answer; otherwise, no connection (socket) will be established.
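To make the client, server, port, and socket roles concrete, here is a minimal sketch in Python (chosen only for illustration). It assumes everything runs on one computer; the function names serve_once and echo_roundtrip are hypothetical, and the server answers a single call rather than many.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Answer one incoming call and echo back whatever the caller sends."""
    conn, _addr = listener.accept()   # a separate socket for this connection
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

def echo_roundtrip(message: bytes) -> bytes:
    """Start a local server, connect a client to it, and return the echo."""
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        host, port = listener.getsockname()
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        # The client "dials" the IP address and port the server listens on.
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply

if __name__ == "__main__":
    print(echo_roundtrip(b"hello"))
```

The listening socket plays the part of the phone waiting to ring; the socket returned by accept() is the conversation itself. A real server would keep calling accept() in a loop so it can answer many callers.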
Host Names
Host names are "human-readable" names for IP addresses. An example host name is www.hower.org. Every
host name has an equivalent IP address, e.g. www.hower.org = 207.65.96.71.
Host names are used both to make it easier on us humans, and to allow a computer to change its IP address without
causing all of its potential clients (callers) to lose track of it.
A host name is like a person's name or a business name. A person or business can change their phone number,
but we can still contact them.
DNS
DNS stands for Domain Name System. DNS is the service that translates host names into IP addresses. To establish
a connection, an IP address must be used, so DNS is used to look up the IP address first.
To make a phone call, you must dial by using the phone number. You cannot dial using a person's name. If you don't
have the person's phone number, or it has changed, you would look up the person's phone number in the phone book,
or call directory assistance. Thus, DNS is the phone book/directory assistance for the Internet.
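The lookup step can be sketched in a few lines of Python (used here only for illustration; the wrapper name resolve is hypothetical). The standard socket.gethostbyname call asks the system resolver for the IPv4 address of a host name, and returns the input unchanged if it is already an IPv4 address.

```python
import socket

def resolve(host: str) -> str:
    """Look up the IPv4 address for a host name, as a dotted string."""
    return socket.gethostbyname(host)
```

Calling resolve("localhost") usually returns 127.0.0.1. Resolving a public host name needs a working network connection, and the address returned can change over time, which is exactly why clients look it up rather than remember it.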
More Topics
Now that the basics have been covered, you should have a basic understanding of sockets and related topics.
Further topics can now be covered, and programming tasks communicated.
The following topics are not essential for a basic understanding, but can be useful.
TCP
TCP (Transmission Control Protocol) is sometimes also referred to as a stream protocol. TCP/IP includes many protocols
and many ways to communicate. The most common transports are TCP and UDP. TCP is a connection-based protocol -
that is, you must connect to a server before you can send data - that guarantees delivery and accuracy of the data
sent and received on the connection. TCP also guarantees that data will arrive in the order that it's sent. Most
things that use TCP/IP use TCP for their transport.
TCP connections are like placing a phone call to carry on a conversation.
UDP
UDP (User Datagram Protocol) is for datagrams and is connectionless. UDP allows "lightweight" packets
to be sent to a host without first having to connect to that host. UDP packets are not guaranteed to arrive
at their destination, and may not arrive in the same order they're sent. A UDP packet is sent as
one block. Therefore, you must not exceed the maximum packet size specified by your TCP/IP stack or component.
The Windows TCP/IP stack's maximum UDP packet size is typically 32KB.
Because of these factors, many people assume UDP is utterly useless. This is not the case. Many streaming protocols,
such as RealAudio, use UDP. (The term "streaming" can be easily confused with "stream" connection,
which is TCP. When you see these terms, you need to determine the context in which they're used to determine their
proper meaning.) The reliability of UDP packets depends on the reliability of the network. UDP packets are also
often used in applications that run on a LAN, as a LAN is typically very reliable. UDP packets sent across the Internet
generally arrive as well, and are therefore often used there too. This, however, cannot be guaranteed - so don't assume your data
will always arrive at your destination.
Because UDP doesn't have delivery confirmation, a packet is not guaranteed to arrive. So if you send a UDP packet to another
host, you have no way of knowing if it arrived. Winsock will not - and cannot - determine this, and thus will not
provide an error. If you need this information, you'll need to send some sort of return notification back from
the remote host.
UDP is like sending someone a message on their pager. You know you sent it, but you don't know if they received
it. The pager may not exist, may be out of range, may not be on, or may not be functioning. In addition, the pager
network may lose the page. Unless the person pages you back, you don't know if your message was delivered. In addition,
if you send multiple pages, it's possible for them to arrive out of order.
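The connectionless nature of UDP shows up clearly in code. The following is a minimal sketch in Python (for illustration only; the function name udp_roundtrip is made up), which assumes both ends live on the same computer so that delivery is very likely but still not guaranteed.

```python
import socket

def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # do not wait forever for a lost packet
        # No connect step: the datagram is simply addressed and sent.
        sender.sendto(payload, receiver.getsockname())
        data, _sender_addr = receiver.recvfrom(2048)
        return data

if __name__ == "__main__":
    print(udp_roundtrip(b"page me back"))
```

Note that sendto succeeds whether or not anyone is listening. The only way the sender learns that a packet arrived is for the receiver to send something back, which is the "page me back" step in the pager analogy.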
ICMP
ICMP stands for Internet Control Message Protocol. ICMP is a control and maintenance protocol. Typically, you
won't need to use ICMP directly; it's mostly used to communicate with routers and other network devices. It allows nodes
to share IP status and error information and is used by PING, TRACEROUTE, and other such utilities.
HOSTS
HOSTS is a text file that exists somewhere in your Windows directory or a sub-directory (its location varies
depending on which version of Windows you're using). By default, many installations don't have such a file, but
have a HOSTS.SAM (.SAM = Sample), which you can use to create a HOSTS file. A HOSTS file contains a local host
lookup table. When Winsock attempts to resolve a host name to an IP, it first looks in the HOSTS file. If a matching
entry exists, it will use that entry. If an entry doesn't exist, it will proceed to use DNS.
Here is an example HOSTS file:
# This is a sample HOSTS file
192.168.0.4 caesar # Server computer
192.168.0.5 augustus # Firewall computer
Each entry lists the IP address first, followed by the host name; the two can be separated by spaces or the tab
character. An optional comment can follow the # character.
HOSTS can be used to fake entries, or override DNS entries. The HOSTS file is often used on computers on a small
LAN that have no DNS server. The HOSTS file is also useful for overriding host IPs for debugging. You don't need
to read the HOSTS file; Winsock will take care of this detail for you transparently whenever name resolution occurs.
SERVICES
A SERVICES file is similar to a HOSTS file, but instead of resolving host names into IP addresses, it resolves
service names into the ports they're assigned to.
The following is a partial SERVICES file. You can look on your computer to see a complete file, or obtain RFC 1700,
which lists assigned and reserved port numbers:
echo 7/tcp
echo 7/udp
discard 9/tcp sink null
discard 9/udp sink null
systat 11/tcp users #Active users
systat 11/udp users #Active users
daytime 13/tcp
daytime 13/udp
qotd 17/tcp quote #Quote of the day
qotd 17/udp quote #Quote of the day
chargen 19/tcp ttytst source #Character generator
chargen 19/udp ttytst source #Character generator
ftp-data 20/tcp #FTP, data
ftp 21/tcp #FTP. control
telnet 23/tcp
smtp 25/tcp mail #Simple Mail Transfer Protocol
The format of each entry is:
<service name> <port number>/<protocol> [aliases...] [#<comment>]
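To illustrate the entry format above, here is a small parsing sketch in Python (for illustration only). The names ServiceEntry and parse_services_line are invented for this example; Winsock does its own parsing, so a program would not normally need this.

```python
from typing import NamedTuple, Optional

class ServiceEntry(NamedTuple):
    name: str
    port: int
    protocol: str
    aliases: list
    comment: str

def parse_services_line(line: str) -> Optional[ServiceEntry]:
    """Parse one SERVICES line; return None for blank or comment-only lines."""
    body, _, comment = line.partition("#")   # everything after # is a comment
    fields = body.split()                    # spaces or tabs both separate fields
    if len(fields) < 2:
        return None
    port_text, _, protocol = fields[1].partition("/")
    return ServiceEntry(fields[0], int(port_text), protocol,
                        fields[2:], comment.strip())

if __name__ == "__main__":
    print(parse_services_line("smtp 25/tcp mail #Simple Mail Transfer Protocol"))
```

For the smtp line, this yields the name smtp, port 25, protocol tcp, the single alias mail, and the comment text.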
You don't need to read the SERVICES file; Winsock will also take care of this detail for you. The SERVICES file
is read by certain function calls in Winsock; however, most programs don't call these functions, and therefore
ignore its values. For example, most FTP programs default to 21 without ever using Winsock to look up the port
for the 'ftp' entry.
Normally, you should never modify this file. Some programs, however, add an entry to this file and actually use
it. You can then change this entry to tell those programs to use another port. One such program is Interbase. Interbase
makes the following entry:
gds_db 3050/tcp
You can change this entry to make Interbase use a different port. While it's not common practice to do this,
it's a good practice and should be considered if you write socket applications, especially servers. It's also good
practice for clients to use Winsock to look up the value in SERVICES, especially for non-standard protocols. If
no entry is found, a default should be used.
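The "look it up, then fall back to a default" practice can be sketched as follows in Python (for illustration only; the helper name lookup_port is hypothetical). Python's socket.getservbyname wraps the same kind of service lookup that Winsock offers and raises OSError when no entry is found.

```python
import socket

def lookup_port(service: str, protocol: str, default: int) -> int:
    """Return the port listed for a service, or the default if none is listed."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        # No matching entry in the services database: use the built-in default.
        return default

if __name__ == "__main__":
    # 3050 is the port from the gds_db entry shown above.
    print(lookup_port("gds_db", "tcp", 3050))
```

With this pattern, an administrator can move the service to another port by editing one SERVICES entry, while machines without the entry keep working on the default.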
LOCALHOST/Loopback
LOCALHOST is similar to "Self" in Delphi, or "this" in C++. LOCALHOST refers to the computer
you're working on. It's a loopback name that resolves to the IP address 127.0.0.1. If you use 127.0.0.1 in any client,
it will always loop back and look for a server on the computer the client is on.
This is useful for debugging and can also be used to contact any service running on your computer. If you have
a local Web server, instead of needing to know the IP of the computer or have each developer change it in test
scripts, you can specify 127.0.0.1.
Ping
Ping is a utility, built on ICMP echo messages, that verifies whether a host is reachable by the local computer.
Ping is usually used in a diagnostic capacity.
Windows has a command-line utility to perform a Ping. Its usage is:
ping <host name or IP>
The following is sample output of a successful Ping:
D:\>ping 127.0.0.1
Pinging 127.0.0.1 with 32 bytes of data:
Reply from 127.0.0.1: bytes=32 time<10ms TTL=128
Reply from 127.0.0.1: bytes=32 time<10ms TTL=128
Reply from 127.0.0.1: bytes=32 time<10ms TTL=128
Reply from 127.0.0.1: bytes=32 time<10ms TTL=128
Ping statistics for 127.0.0.1:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
If a host cannot be reached, the output will look similar to this:
D:\>ping CAESAR
Pinging 192.168.0.4 with 32 bytes of data:
Destination host unreachable.
Destination host unreachable.
Destination host unreachable.
Destination host unreachable.
Ping statistics for 192.168.0.4:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
TraceRoute
TCP/IP packets don't travel directly from one host to another. They are routed, much like a car drives from
one house to another. Typically, the car must travel on more than one road to reach its destination. TCP/IP packets
travel much in the same way. Each time a packet changes "roads" at an "interchange," it travels
through a node. By obtaining a list of nodes or "interchanges" that a packet must travel through between
hosts, you can determine its path. This is quite useful in determining why a host cannot be reached.
Windows has a command-line utility to perform TraceRoutes, called TraceRt. TraceRt displays a list of IP routers
(nodes) that are used in delivering a packet from your computer to the specified host, and how long each trip (hop)
took. These times can be useful in determining bottlenecks. TraceRt can also display the last router that successfully
handled your packet in the case of a failed transfer. TraceRt is used to further diagnose problems detected with
Ping.
The following is sample output of a successful TraceRt:
D:\>tracert www.borland.com
Tracing route to www.borland.com [207.105.83.51]
over a maximum of 30 hops:
1 <10 ms <10 ms <10 ms LANmodem [192.168.0.1]
2 40 ms 40 ms 50 ms knoxmax4.planetc.com [207.65.110.5]
3 50 ms 50 ms 50 ms knox-uunet.planetc.com [207.65.110.9]
4 70 ms 71 ms 60 ms 588.Hssi3-0-0.GW1.ATL1.ALTER.NET [157.130.67.189]
5 60 ms 130 ms 91 ms 103.ATM2-0.XR2.ATL1.ALTER.NET [146.188.232.46]
6 60 ms 70 ms 60 ms 194.ATM2-0.TR2.ATL1.ALTER.NET [146.188.232.98]
7 120 ms 100 ms 70 ms 109.ATM6-0.TR2.CHI4.ALTER.NET [146.188.136.10]
8 110 ms 70 ms 141 ms 103.ATM2-0.XR2.ATL1.ALTER.NET [146.188.208.225]
9 90 ms 80 ms 90 ms 194.ATM9-0-0.BR1.CHI1.ALTER.NET [146.188.208.13]
10 101 ms 80 ms 90 ms 198.ATM6-0.XR2.CHI4.ALTER.NET [165.87.164.150]
11 131 ms 160 ms 90 ms scha1sr2-2-0.il.us.prserv.net [165.87.34.161]
12 290 ms 211 ms 140 ms scha1br1-ge-1-0-0-0.il.us.prserv.net [165.87.13.61]
13 141 ms 170 ms 160 ms sfra1br1-t3-2-1-0-1.ca.us.prserv.net [165.87.13.30]
14 210 ms 130 ms 170 ms sfra1br1-t3-2-1-0-1.ca.us.prserv.net [165.87.161.5]
15 130 ms 151 ms 140 ms sfra1sr3-so-0-0-0-0.ca.us.prserv.net [209.232.130.70]
16 180 ms 160 ms 151 ms ded2-fa12-1-0.snfc21.pbi.net [206.13.8.26]
17 140 ms 210 ms 180 ms 207.105.83.51
18 151 ms 150 ms 140 ms 207.105.83.51
Trace complete.
IETF
The IETF (Internet Engineering Task Force) is an open community that promotes the operation, stability, and
evolution of the Internet. The IETF works much like Open Source software development teams. The IETF can be found
at http://www.ietf.org/.
RFC
RFCs (Request For Comments) are the official documents of the IETF that describe and detail protocols of the
Internet.
Conclusion
My goal for this article was to provide the reader with a basic understanding of socket concepts necessary to
move to the next level, and to liken them to real-world concepts. I hope that I have met this goal.
More!
This article is an extract from the book Indy in Depth. Indy in Depth is an e-book which you can subscribe to and receive the complete book by e-mail. Also check out the Atozed Indy Portal at www.atozedsoftware.com
About the Author
Chad Z. Hower, a.k.a. "Kudzu" is the original author and project coordinator for Internet Direct (Indy). Indy consists of over 110 components and is included as a part of Delphi, Kylix and C++ Builder. Chad's background includes work in the employment, security, chemical, energy, trading, telecommunications, wireless, and insurance industries. Chad's area of specialty is TCP/IP networking and programming, inter-process communication, distributed computing, Internet protocols, and object-oriented programming. When not programming, he likes to cycle, kayak, hike, downhill ski, drive, and do just about anything outdoors. Chad, whose motto is "Programming is an art form that fights back", also posts free articles, programs, utilities and other oddities at Kudzu World at http://www.Hower.org/Kudzu/. Chad is an American expatriate who currently spends his summers in St. Petersburg, Russia and his winters in Limassol, Cyprus.
Chad works as a Senior Developer for Atozed Software.