Perl Practicum: Network Wiles
(Part I)

by Hal Pomeranz

This multipart article examines network programming using Perl. Network programming with Perl is very much like network programming with C, but Perl's language constructs make it much easier to focus on the actual work of setting up a network connection, rather than issues like exception handling and data reformatting. People who have always been mystified by network applications can often use Perl to spin themselves up; veteran programmers can use Perl to prototype network applications rapidly.

Thinking About Network Programming

Discussions of network programming always seem to devolve into mazes of twisty little acronyms, but the basic concepts are very simple. The easiest network relationship is between two hosts: a "server" that possesses some collection of data and a "client" that has a question it needs to ask the server to answer (hence "client-server computing"). For example, your Web browser is a client that can ask the Web server at my organization, "What information is stored at http://www.netmarket.com/?"

One useful analogy is the in-flight music system found on airliners. The server is the airplane's music system: it has a collection of data (movie soundtracks and different types of music) that it can supply to the passengers (the clients). The clients have to ask explicitly for the information, however, by plugging in a pair of headphones and dialing a little number to get the exact sounds they want to listen to.

The headphones in the above analogy stand for a concept that network programmers call a "socket." Clients establish socket connections to servers by connecting one end of a logical pipe to the server at a well-known address (the little hole in your airline seat) with a specific port number (dialing the number on your seat to get rock music) while holding onto the other end of the pipe (keeping the headphones on your ears).

As in my airline example, properly designed network servers can handle several client connections simultaneously. Unlike my analogy, network clients can connect to several different servers (or multiple times to the same server) simultaneously.

Doing It

Easily the most complicated part of setting up a socket is preparing the binary data structure that tells the operating system which server to connect to. This stuff doesn't look very Perl-like because we are preparing a C data structure: the Perl networking functions map directly onto the C socket library.
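     use Socket;
     $server = "www.netmarket.com";
     $port = 80;

     # Translate the server's name into a packed 4-byte network address
     $server_addr = (gethostbyname($server))[4]
          || die "Unknown host $server\n";
     # Build the C address structure: family, port, address, zero padding
     $server_struct = pack("S n a4 x8", AF_INET, $port, $server_addr);
     # Look up the TCP protocol number rather than hard-coding it
     $proto = (getprotobyname('tcp'))[2];
     socket(MYSOCK, PF_INET, SOCK_STREAM, $proto)
          || die "Failed to initialize socket: $!\n";
     connect(MYSOCK, $server_struct)
          || die "Failed to connect() to server: $!\n";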
The first line of this example pulls in the Perl Socket module, which defines a number of useful constants that are used later in the program. Next come the name of the server that this client will contact and the network port on which to talk to the server; port 80 happens to be the port that Web servers listen on by default. In practice you might get these values as command line arguments, or this code might become part of a function that receives them as arguments.

To connect to the server, the program has to translate the server's human-readable name (www.netmarket.com) into a network address. The gethostbyname() function looks up the server name and returns a list of information; the fifth element of that list (index 4, hence the [4] in the code) is the server's first network address in packed binary form (don't worry about the other values right now).
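To see everything gethostbyname() hands back, the following minimal sketch unpacks the whole list and converts each packed address to dotted-quad form with inet_ntoa() from the Socket module. The helper name lookup_host() is hypothetical, and the demonstration looks up the numeric address 127.0.0.1 so that it does not depend on a name server being reachable.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# Hypothetical helper: return the host's official name followed by every
# address in dotted-quad form, or an empty list if the lookup fails.
# gethostbyname() returns ($name, $aliases, $addrtype, $length, @addrs);
# the article's (gethostbyname($server))[4] is the first of @addrs.
sub lookup_host {
    my ($host) = @_;
    my ($name, $aliases, $addrtype, $length, @addrs) = gethostbyname($host)
        or return;
    return ($name, map { inet_ntoa($_) } @addrs);
}

my ($name, @addrs) = lookup_host("127.0.0.1")
    or die "lookup failed\n";
print "$name: @addrs\n";
```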

The C structure is created from this address with the pack() function. The structure has three fields: the type of network address in the rest of the structure, the port to connect to, and the server address to connect to; the rest of the structure is just filled with zeroes. In the pack() template, "S" is the address type as an unsigned short, "n" is the port as a 16-bit number in network (big-endian) byte order, "a4" is the 4-byte address, and "x8" is eight bytes of zero padding. AF_INET is a constant defined in the Perl Socket module that stands for an Internet Protocol (IP) type address (unfortunate people who have to use other types of networks, like AppleTalk, DECnet, or X.25, will find their own AF_* constants in Socket.pm). Unless the programmer specifies the type of network address at the front of the structure, the operating system will not be able to interpret the address information in the rest of the structure, and the attempt to set up the socket will fail.
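The pack() template can be checked by taking the structure apart again. This sketch, using the loopback address 127.0.0.1 purely for illustration, packs the structure as the article does, decodes it with unpack(), shows that "n" stores port 80 high byte first, and compares the fields with those produced by the Socket module's sockaddr_in() helper. The hand-built layout matches struct sockaddr_in on systems such as Linux; BSD-derived systems begin the structure with a one-byte length field, which is one reason sockaddr_in() is the more portable way to build it.

```perl
use strict;
use warnings;
use Socket qw(AF_INET inet_aton inet_ntoa sockaddr_in unpack_sockaddr_in);

my $port      = 80;
my $packed_ip = inet_aton("127.0.0.1");    # 4-byte binary address

# The article's hand-built structure: family ("S"), port in network
# byte order ("n"), address ("a4"), eight bytes of zero padding ("x8").
my $struct = pack("S n a4 x8", AF_INET, $port, $packed_ip);

# "n" always stores the high byte first, whatever the host's byte order.
my $port_bytes = unpack("H*", pack("n", $port));    # hex dump of 2 bytes

# Take the hand-built structure apart again, field by field.
my ($family, $port_back, $ip_back) = unpack("S n a4", $struct);

# Socket's sockaddr_in() builds the structure for the local system.
my ($port2, $ip2) = unpack_sockaddr_in(sockaddr_in($port, $packed_ip));

printf "length=%d family=%d port=%d (bytes %s) addr=%s\n",
    length($struct), $family, $port_back, $port_bytes, inet_ntoa($ip_back);
printf "sockaddr_in: port=%d addr=%s\n", $port2, inet_ntoa($ip2);
```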

With that messy pack() business out of the way, we can start setting up the actual socket. First, the client initializes its end of the socket as a Perl file handle, MYSOCK. The other arguments to the socket() function specify the type of network connection, how the socket will be used, and the transmission protocol. PF_INET is another constant from Socket.pm; it is related to AF_INET and specifies that this socket will be an IP socket (indeed, in the early days, AF_INET was used both in the C structure and in the socket() call; avoid and abhor this practice). SOCK_STREAM is another constant, which says that the client and server will talk over a connection similar to a telephone call: both parties can talk back and forth, and the connection stays up until one party hangs up. (SOCK_STREAM is the most common communications method, but other methods exist, such as SOCK_DGRAM, which is more like smoke signalling: client and server can send out messages, but there is no guarantee that the other party will receive them.)
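One way to see the difference between socket types is to create unconnected sockets and ask the operating system what it made. This sketch uses getsockopt() with SO_TYPE. It passes 0 as the protocol, which asks the system for the default protocol for each type (TCP for SOCK_STREAM, UDP for SOCK_DGRAM), whereas the article's client looks the TCP number up explicitly. The helper name socket_type_of() is hypothetical.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM SOCK_DGRAM SOL_SOCKET SO_TYPE);

# Hypothetical helper: create an unconnected IP socket of the requested
# type and return the type the operating system reports for it.
sub socket_type_of {
    my ($type) = @_;
    socket(my $sock, PF_INET, $type, 0)    # 0 = default protocol for $type
        or die "socket() failed: $!\n";
    my $packed = getsockopt($sock, SOL_SOCKET, SO_TYPE)
        or die "getsockopt() failed: $!\n";
    close($sock);
    return unpack("i", $packed);           # option value is a C int
}

printf "SOCK_STREAM socket reports type %d (SOCK_STREAM is %d)\n",
    socket_type_of(SOCK_STREAM), SOCK_STREAM;
printf "SOCK_DGRAM socket reports type %d (SOCK_DGRAM is %d)\n",
    socket_type_of(SOCK_DGRAM), SOCK_DGRAM;
```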

Finally, the transmission protocol is specified. The discussion of TCP versus UDP is beyond the scope of this article, but TCP is always the right thing to use unless you are very sure that it isn't. Always use getprotobyname() to look up the TCP protocol number. Lazy programmers frequently hard-code this value (6) because it happens to be the same on nearly every UNIX variant out there, and people like me curse them when we have to port the code to non-UNIX systems or strange UNIX variants.
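Here is a sketch of the lookup the article recommends, assuming a system whose protocol database (on UNIX, usually /etc/protocols) has an entry for tcp. In scalar context getprotobynumber() returns just the protocol name, so the lookup can be checked in both directions. The helper name protocol_number() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: look up a protocol number by name, as the article's
# client does, instead of hard-coding it. Returns undef if the system's
# protocol database has no entry for that name.
sub protocol_number {
    my ($name) = @_;
    my $number = (getprotobyname($name))[2];
    return $number;
}

my $tcp = protocol_number('tcp');
if (defined $tcp) {
    my $back = getprotobynumber($tcp);    # scalar context: the name
    print "tcp is protocol $tcp; getprotobynumber($tcp) gives '$back'\n";
} else {
    print "no 'tcp' entry in this system's protocol database\n";
}
```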

With one end of the socket firmly in hand (again, as the file handle MYSOCK), the client calls connect() to actually contact the server. The connect() function takes as arguments the file handle and the C structure created earlier. If connect() succeeds, the client has established a session with the server.
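The steps so far can be collected into a function, as suggested earlier. This minimal sketch, with the hypothetical name open_client(), takes the server name and port as arguments and returns a connected socket. It uses a lexical file handle instead of the global MYSOCK so that each call gets its own socket, and Socket's sockaddr_in() in place of the hand-written pack(). If the protocol lookup finds nothing it falls back to 0, the system default for SOCK_STREAM; that fallback is an assumption added here, not part of the original program.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM sockaddr_in);

# Hypothetical helper: connect to $server on $port and return the socket
# handle, dying with a message if any step fails.
sub open_client {
    my ($server, $port) = @_;

    # Packed address of the server (first address gethostbyname() returns)
    my $server_addr = (gethostbyname($server))[4]
        or die "Unknown host $server\n";

    # TCP protocol number from the system database; 0 means "default"
    my $proto = (getprotobyname('tcp'))[2];
    $proto = 0 unless defined $proto;

    socket(my $sock, PF_INET, SOCK_STREAM, $proto)
        or die "Failed to initialize socket: $!\n";
    connect($sock, sockaddr_in($port, $server_addr))
        or die "Failed to connect() to $server port $port: $!\n";
    return $sock;
}
```

A caller would then write something like open_client("www.netmarket.com", 80) and use the returned handle wherever the example uses MYSOCK; a lexical handle is also closed automatically once nothing refers to it any more.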

Using It

MYSOCK can now be treated just like any Perl file handle, except that you can both read from and write to the same socket. To save network and system resources, it is particularly important to remember to close() sockets when you are done with them.

Because this client has connected to the Web server (port 80, remember?) on www.netmarket.com, the client program can request an HTML document using the HTTP protocol:

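     select(MYSOCK);           # make MYSOCK the currently selected handle
     $| = 1;                   # turn off buffering on the selected handle
     select(STDOUT);           # go back to the default handle
     print MYSOCK "GET /\n\n"; # request the document at the root
     while (<MYSOCK>) {        # read until the server hangs up (EOF)
          print;               # copy each line to standard output
     }
     close(MYSOCK);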
The first three lines turn off standard I/O buffering on the socket. When reading from and writing to a file, it is usually most efficient to do large reads and writes (read more data than is immediately needed, or save up many small writes and do them all at once), and Perl and the standard I/O library take care of this automatically. On a network socket where the client and server pass short messages back and forth, however, buffering gets in the way: the request could sit unsent in the client's buffer while the client waits for a reply. The Perl mechanism for turning off buffering is to set the $| variable to a nonzero value (it is zero by default). Setting this variable affects only the currently selected file handle (STDOUT is selected by default), so you have to select(MYSOCK), set the variable, and then go back to the default of STDOUT.
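The select() sequence can also be written so that it restores whatever handle was selected before, rather than assuming STDOUT. In this sketch (unbuffer() is a hypothetical name), select() with an argument returns the previously selected handle, which is saved and reselected afterward.

```perl
use strict;
use warnings;

# Hypothetical helper: turn off output buffering on $fh, then reselect
# whichever handle was selected before the call (not necessarily STDOUT).
sub unbuffer {
    my ($fh) = @_;
    my $previous = select($fh);    # select(FH) returns the old selection
    $| = 1;                        # affects only the now-selected $fh
    select($previous);
    return;
}
```

Perl's IO::Handle module also provides an autoflush() method, so MYSOCK->autoflush(1), after use IO::Handle;, has the same effect.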

That done, the client requests a file from the Web server using the GET command in the HTTP protocol. The argument to GET is the name of the file requested (in this case, the client is asking for the file at the root of the document tree, but it could just as easily have asked for /some/other/file.html). The GET request is followed by two newlines.
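The request in the example is the bare "GET path" form from the earliest version of HTTP. Many present-day servers no longer accept it; they expect a request line that names the protocol version and ends in CR-LF, followed by optional header lines and a blank line. This sketch builds both forms. The helper names are hypothetical, and the Host header is included because servers that host several sites under different names rely on it.

```perl
use strict;
use warnings;

# Hypothetical helper: the bare request form used in the article's client.
sub simple_request {
    my ($path) = @_;
    return "GET $path\n\n";
}

# Hypothetical helper: an HTTP/1.0 request line plus a Host header, with
# CR-LF line endings and a blank line ending the header section.
sub http10_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "\r\n";
}

print http10_request("www.netmarket.com", "/some/other/file.html");
```

With HTTP/1.0 the server normally closes the connection after sending its response (preceded by a status line and headers), so a reading loop like the one above still ends at end-of-file.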

Once the client makes its request, the server sends the contents of the requested file back down the socket (or an error message if the file was not found or some other error occurred). The HTTP protocol specifies that when the server finishes sending the file, it hangs up its end of the connection. A client reading from the socket sees this event just as if it had been reading from a file and reached end-of-file, so the while loop ends. In the program above, the HTML document is simply printed to standard output.
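The end-of-file behavior can be tried without a network by using a connected pair of local sockets. In this sketch, socketpair() stands in for the client and server (an assumption made only for the demonstration): the "server" end writes a small document and closes, and the hypothetical read_until_eof() helper collects lines until the read returns undef.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper: read everything from a socket until the other side
# hangs up, just as the while (<MYSOCK>) loop in the article does.
sub read_until_eof {
    my ($sock) = @_;
    my $data = '';
    while (my $line = <$sock>) {    # Perl tests this for definedness
        $data .= $line;
    }
    return $data;                   # <$sock> returned undef: peer closed
}

# A local socket pair stands in for the client and the server.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair() failed: $!\n";
print {$server} "<html>\n<body>hello</body>\n</html>\n";
close($server);                     # the "server" hangs up
my $doc = read_until_eof($client);
close($client);
print $doc;
```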

Practicing It

The above example covers the basics of writing a network client program. There is a good deal of additional lore surrounding this subject, but there are a lot of people out there earning huge salaries who don't know anything more than what you have seen here. In the next article I will explore server programming by writing a simple Web server.

In the meantime, practice these concepts by taking the example above and writing a program that will take the server name, port number (default to port 80), and file name as command line arguments and fetch that file from the remote Web server. Impress your friends (and increase your productivity) by building a Web robot that surfs the Web for you by looking for HREF tags in the documents you download and then fetches those documents as well (making sure that you don't download the same document twice!). Now make sure the robot stops at some point, or you'll download the entire Web.
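For the robot exercise, the bookkeeping is the tricky part. This rough sketch extracts HREF values with a regular expression (a crude stand-in for a real HTML parser), keeps a %seen hash so that no URL is queued twice, and stops after a fixed number of fetches. The fetch routine is passed in as a code reference; in a real robot it would be built on the socket client above, and relative links would have to be resolved against the page's URL, which this sketch does not do. All names here, and the fake pages used for the demonstration, are hypothetical.

```perl
use strict;
use warnings;

# Crude link extraction: returns the values of href="..." or href='...'
# attributes, matched case-insensitively.
sub extract_hrefs {
    my ($html) = @_;
    return $html =~ /href\s*=\s*["']([^"']+)["']/gi;
}

# Breadth-first crawl that never queues the same URL twice and stops after
# $limit successful fetches. $fetch returns a page's text, or undef.
sub crawl {
    my ($start, $limit, $fetch) = @_;
    my %seen  = ($start => 1);
    my @queue = ($start);
    my @fetched;
    while (@queue && @fetched < $limit) {
        my $url  = shift @queue;
        my $html = $fetch->($url);
        next unless defined $html;
        push @fetched, $url;
        for my $link (extract_hrefs($html)) {
            push @queue, $link unless $seen{$link}++;
        }
    }
    return @fetched;
}

# Fake pages stand in for a Web server in this demonstration.
my %pages = (
    '/'  => '<a href="/a">A</a> <a href="/b">B</a>',
    '/a' => '<A HREF="/">home</A> <a href="/b">B</a>',
    '/b' => 'no links here',
);
my @fetched = crawl('/', 10, sub { $pages{ $_[0] } });
print "fetched: @fetched\n";
```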


Reproduced from ;login: Vol. 21 No. 4, August 1996.
