555 lines
20 KiB
Plaintext
555 lines
20 KiB
Plaintext
=head1 NAME
|
|
|
|
perlfaq9 - Networking ($Revision: 1.24 $, $Date: 1999/01/08 05:39:48 $)
|
|
|
|
=head1 DESCRIPTION
|
|
|
|
This section deals with questions related to networking, the internet,
|
|
and a few on the web.
|
|
|
|
=head2 My CGI script runs from the command line but not the browser. (500 Server Error)
|
|
|
|
If you can demonstrate that you've read the following FAQs and that
|
|
your problem isn't something simple that can be easily answered, you'll
|
|
probably receive a courteous and useful reply to your question if you
|
|
post it on comp.infosystems.www.authoring.cgi (if it's something to do
|
|
with HTTP, HTML, or the CGI protocols). Questions that appear to be Perl
|
|
questions but are really CGI ones that are posted to comp.lang.perl.misc
|
|
may not be so well received.
|
|
|
|
The useful FAQs and related documents are:
|
|
|
|
CGI FAQ
|
|
http://www.webthing.com/tutorials/cgifaq.html
|
|
|
|
Web FAQ
|
|
http://www.boutell.com/faq/
|
|
|
|
WWW Security FAQ
|
|
http://www.w3.org/Security/Faq/
|
|
|
|
HTTP Spec
|
|
http://www.w3.org/pub/WWW/Protocols/HTTP/
|
|
|
|
HTML Spec
|
|
http://www.w3.org/TR/REC-html40/
|
|
http://www.w3.org/pub/WWW/MarkUp/
|
|
|
|
CGI Spec
|
|
http://www.w3.org/CGI/
|
|
|
|
CGI Security FAQ
|
|
http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
|
|
|
|
=head2 How can I get better error messages from a CGI program?
|
|
|
|
Use the CGI::Carp module. It replaces C<warn> and C<die>, plus the
|
|
normal Carp modules C<carp>, C<croak>, and C<confess> functions with
|
|
more verbose and safer versions. It still sends them to the normal
|
|
server error log.
|
|
|
|
use CGI::Carp;
|
|
warn "This is a complaint";
|
|
die "But this one is serious";
|
|
|
|
The following use of CGI::Carp also redirects errors to a file of your choice,
|
|
placed in a BEGIN block to catch compile-time warnings as well:
|
|
|
|
BEGIN {
|
|
use CGI::Carp qw(carpout);
|
|
open(LOG, ">>/var/local/cgi-logs/mycgi-log")
|
|
or die "Unable to append to mycgi-log: $!\n";
|
|
carpout(*LOG);
|
|
}
|
|
|
|
You can even arrange for fatal errors to go back to the client browser,
|
|
which is nice for your own debugging, but might confuse the end user.
|
|
|
|
use CGI::Carp qw(fatalsToBrowser);
|
|
die "Bad error here";
|
|
|
|
Even if the error happens before you get the HTTP header out, the module
|
|
will try to take care of this to avoid the dreaded server 500 errors.
|
|
Normal warnings still go out to the server error log (or wherever
|
|
you've sent them with C<carpout>) with the application name and date
|
|
stamp prepended.
|
|
|
|
=head2 How do I remove HTML from a string?
|
|
|
|
The most correct way (albeit not the fastest) is to use HTML::Parse
|
|
from CPAN (part of the HTML-Tree package on CPAN).
|
|
|
|
Many folks attempt a simple-minded regular expression approach, like
|
|
C<s/E<lt>.*?E<gt>//g>, but that fails in many cases because the tags
|
|
may continue over line breaks, they may contain quoted angle-brackets,
|
|
or HTML comment may be present. Plus folks forget to convert
|
|
entities, like C<<> for example.
|
|
|
|
Here's one "simple-minded" approach, that works for most files:
|
|
|
|
#!/usr/bin/perl -p0777
|
|
s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
|
|
|
|
If you want a more complete solution, see the 3-stage striphtml
|
|
program in
|
|
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz
|
|
.
|
|
|
|
Here are some tricky cases that you should think about when picking
|
|
a solution:
|
|
|
|
<IMG SRC = "foo.gif" ALT = "A > B">
|
|
|
|
<IMG SRC = "foo.gif"
|
|
ALT = "A > B">
|
|
|
|
<!-- <A comment> -->
|
|
|
|
<script>if (a<b && a>c)</script>
|
|
|
|
<# Just data #>
|
|
|
|
<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
|
|
|
|
If HTML comments include other tags, those solutions would also break
|
|
on text like this:
|
|
|
|
<!-- This section commented out.
|
|
<B>You can't see me!</B>
|
|
-->
|
|
|
|
=head2 How do I extract URLs?
|
|
|
|
A quick but imperfect approach is
|
|
|
|
#!/usr/bin/perl -n00
|
|
# qxurl - tchrist@perl.com
|
|
print "$2\n" while m{
|
|
< \s*
|
|
A \s+ HREF \s* = \s* (["']) (.*?) \1
|
|
\s* >
|
|
}gsix;
|
|
|
|
This version does not adjust relative URLs, understand alternate
|
|
bases, deal with HTML comments, deal with HREF and NAME attributes in
|
|
the same tag, or accept URLs themselves as arguments. It also runs
|
|
about 100x faster than a more "complete" solution using the LWP suite
|
|
of modules, such as the
|
|
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz
|
|
program.
|
|
|
|
=head2 How do I download a file from the user's machine? How do I open a file on another machine?
|
|
|
|
In the context of an HTML form, you can use what's known as
|
|
B<multipart/form-data> encoding. The CGI.pm module (available from
|
|
CPAN) supports this in the start_multipart_form() method, which isn't
|
|
the same as the startform() method.
|
|
|
|
=head2 How do I make a pop-up menu in HTML?
|
|
|
|
Use the B<E<lt>SELECTE<gt>> and B<E<lt>OPTIONE<gt>> tags. The CGI.pm
|
|
module (available from CPAN) supports this widget, as well as many
|
|
others, including some that it cleverly synthesizes on its own.
|
|
|
|
=head2 How do I fetch an HTML file?
|
|
|
|
One approach, if you have the lynx text-based HTML browser installed
|
|
on your system, is this:
|
|
|
|
$html_code = `lynx -source $url`;
|
|
$text_data = `lynx -dump $url`;
|
|
|
|
The libwww-perl (LWP) modules from CPAN provide a more powerful way to
|
|
do this. They work through proxies, and don't require lynx:
|
|
|
|
# simplest version
|
|
use LWP::Simple;
|
|
$content = get($URL);
|
|
|
|
# or print HTML from a URL
|
|
use LWP::Simple;
|
|
getprint "http://www.sn.no/libwww-perl/";
|
|
|
|
# or print ASCII from HTML from a URL
|
|
# also need HTML-Tree package from CPAN
|
|
use LWP::Simple;
|
|
use HTML::Parse;
|
|
use HTML::FormatText;
|
|
my ($html, $ascii);
|
|
$html = get("http://www.perl.com/");
|
|
defined $html
|
|
or die "Can't fetch HTML from http://www.perl.com/";
|
|
$ascii = HTML::FormatText->new->format(parse_html($html));
|
|
print $ascii;
|
|
|
|
=head2 How do I automate an HTML form submission?
|
|
|
|
If you're submitting values using the GET method, create a URL and encode
|
|
the form using the C<query_form> method:
|
|
|
|
use LWP::Simple;
|
|
use URI::URL;
|
|
|
|
my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
|
|
$url->query_form(module => 'DB_File', readme => 1);
|
|
$content = get($url);
|
|
|
|
If you're using the POST method, create your own user agent and encode
|
|
the content appropriately.
|
|
|
|
use HTTP::Request::Common qw(POST);
|
|
use LWP::UserAgent;
|
|
|
|
$ua = LWP::UserAgent->new();
|
|
my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
|
|
[ module => 'DB_File', readme => 1 ];
|
|
$content = $ua->request($req)->as_string;
|
|
|
|
=head2 How do I decode or create those %-encodings on the web?
|
|
|
|
Here's an example of decoding:
|
|
|
|
$string = "http://altavista.digital.com/cgi-bin/query?pg=q&what=news&fmt=.&q=%2Bcgi-bin+%2Bperl.exe";
|
|
$string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;
|
|
|
|
Encoding is a bit harder, because you can't just blindly change
|
|
all the non-alphanumeric characters (C<\W>) into their hex escapes.
|
|
It's important that characters with special meaning like C</> and C<?>
|
|
I<not> be translated. Probably the easiest way to get this right is
|
|
to avoid reinventing the wheel and just use the URI::Escape module,
|
|
which is part of the libwww-perl package (LWP) available from CPAN.
|
|
|
|
=head2 How do I redirect to another page?
|
|
|
|
Instead of sending back a C<Content-Type> as the headers of your
|
|
reply, send back a C<Location:> header. Officially this should be a
|
|
C<URI:> header, so the CGI.pm module (available from CPAN) sends back
|
|
both:
|
|
|
|
Location: http://www.domain.com/newpage
|
|
URI: http://www.domain.com/newpage
|
|
|
|
Note that relative URLs in these headers can cause strange effects
|
|
because of "optimizations" that servers do.
|
|
|
|
$url = "http://www.perl.com/CPAN/";
|
|
print "Location: $url\n\n";
|
|
exit;
|
|
|
|
To be correct to the spec, each of those C<"\n">
|
|
should really each be C<"\015\012">, but unless you're
|
|
stuck on MacOS, you probably won't notice.
|
|
|
|
=head2 How do I put a password on my web pages?
|
|
|
|
That depends. You'll need to read the documentation for your web
|
|
server, or perhaps check some of the other FAQs referenced above.
|
|
|
|
=head2 How do I edit my .htpasswd and .htgroup files with Perl?
|
|
|
|
The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a
|
|
consistent OO interface to these files, regardless of how they're
|
|
stored. Databases may be text, dbm, Berkley DB or any database with a
|
|
DBI compatible driver. HTTPD::UserAdmin supports files used by the
|
|
`Basic' and `Digest' authentication schemes. Here's an example:
|
|
|
|
use HTTPD::UserAdmin ();
|
|
HTTPD::UserAdmin
|
|
->new(DB => "/foo/.htpasswd")
|
|
->add($username => $password);
|
|
|
|
=head2 How do I make sure users can't enter values into a form that cause my CGI script to do bad things?
|
|
|
|
Read the CGI security FAQ, at
|
|
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html, and the
|
|
Perl/CGI FAQ at
|
|
http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html.
|
|
|
|
In brief: use tainting (see L<perlsec>), which makes sure that data
|
|
from outside your script (eg, CGI parameters) are never used in
|
|
C<eval> or C<system> calls. In addition to tainting, never use the
|
|
single-argument form of system() or exec(). Instead, supply the
|
|
command and arguments as a list, which prevents shell globbing.
|
|
|
|
=head2 How do I parse a mail header?
|
|
|
|
For a quick-and-dirty solution, try this solution derived
|
|
from page 222 of the 2nd edition of "Programming Perl":
|
|
|
|
$/ = '';
|
|
$header = <MSG>;
|
|
$header =~ s/\n\s+/ /g; # merge continuation lines
|
|
%head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header );
|
|
|
|
That solution doesn't do well if, for example, you're trying to
|
|
maintain all the Received lines. A more complete approach is to use
|
|
the Mail::Header module from CPAN (part of the MailTools package).
|
|
|
|
=head2 How do I decode a CGI form?
|
|
|
|
You use a standard module, probably CGI.pm. Under no circumstances
|
|
should you attempt to do so by hand!
|
|
|
|
You'll see a lot of CGI programs that blindly read from STDIN the number
|
|
of bytes equal to CONTENT_LENGTH for POSTs, or grab QUERY_STRING for
|
|
decoding GETs. These programs are very poorly written. They only work
|
|
sometimes. They typically forget to check the return value of the read()
|
|
system call, which is a cardinal sin. They don't handle HEAD requests.
|
|
They don't handle multipart forms used for file uploads. They don't deal
|
|
with GET/POST combinations where query fields are in more than one place.
|
|
They don't deal with keywords in the query string.
|
|
|
|
In short, they're bad hacks. Resist them at all costs. Please do not be
|
|
tempted to reinvent the wheel. Instead, use the CGI.pm or CGI_Lite.pm
|
|
(available from CPAN), or if you're trapped in the module-free land
|
|
of perl1 .. perl4, you might look into cgi-lib.pl (available from
|
|
http://cgi-lib.stanford.edu/cgi-lib/ ).
|
|
|
|
Make sure you know whether to use a GET or a POST in your form.
|
|
GETs should only be used for something that doesn't update the server.
|
|
Otherwise you can get mangled databases and repeated feedback mail
|
|
messages. The fancy word for this is ``idempotency''. This simply
|
|
means that there should be no difference between making a GET request
|
|
for a particular URL once or multiple times. This is because the
|
|
HTTP protocol definition says that a GET request may be cached by the
|
|
browser, or server, or an intervening proxy. POST requests cannot be
|
|
cached, because each request is independent and matters. Typically,
|
|
POST requests change or depend on state on the server (query or update
|
|
a database, send mail, or purchase a computer).
|
|
|
|
=head2 How do I check a valid mail address?
|
|
|
|
You can't, at least, not in real time. Bummer, eh?
|
|
|
|
Without sending mail to the address and seeing whether there's a human
|
|
on the other hand to answer you, you cannot determine whether a mail
|
|
address is valid. Even if you apply the mail header standard, you
|
|
can have problems, because there are deliverable addresses that aren't
|
|
RFC-822 (the mail header standard) compliant, and addresses that aren't
|
|
deliverable which are compliant.
|
|
|
|
Many are tempted to try to eliminate many frequently-invalid
|
|
mail addresses with a simple regexp, such as
|
|
C</^[\w.-]+\@([\w.-]\.)+\w+$/>. It's a very bad idea. However,
|
|
this also throws out many valid ones, and says nothing about
|
|
potential deliverability, so is not suggested. Instead, see
|
|
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz ,
|
|
which actually checks against the full RFC spec (except for nested
|
|
comments), looks for addresses you may not wish to accept mail to
|
|
(say, Bill Clinton or your postmaster), and then makes sure that the
|
|
hostname given can be looked up in the DNS MX records. It's not fast,
|
|
but it works for what it tries to do.
|
|
|
|
Our best advice for verifying a person's mail address is to have them
|
|
enter their address twice, just as you normally do to change a password.
|
|
This usually weeds out typos. If both versions match, send
|
|
mail to that address with a personal message that looks somewhat like:
|
|
|
|
Dear someuser@host.com,
|
|
|
|
Please confirm the mail address you gave us Wed May 6 09:38:41
|
|
MDT 1998 by replying to this message. Include the string
|
|
"Rumpelstiltskin" in that reply, but spelled in reverse; that is,
|
|
start with "Nik...". Once this is done, your confirmed address will
|
|
be entered into our records.
|
|
|
|
If you get the message back and they've followed your directions,
|
|
you can be reasonably assured that it's real.
|
|
|
|
A related strategy that's less open to forgery is to give them a PIN
|
|
(personal ID number). Record the address and PIN (best that it be a
|
|
random one) for later processing. In the mail you send, ask them to
|
|
include the PIN in their reply. But if it bounces, or the message is
|
|
included via a ``vacation'' script, it'll be there anyway. So it's
|
|
best to ask them to mail back a slight alteration of the PIN, such as
|
|
with the characters reversed, one added or subtracted to each digit, etc.
|
|
|
|
=head2 How do I decode a MIME/BASE64 string?
|
|
|
|
The MIME-tools package (available from CPAN) handles this and a lot
|
|
more. Decoding BASE64 becomes as simple as:
|
|
|
|
use MIME::base64;
|
|
$decoded = decode_base64($encoded);
|
|
|
|
A more direct approach is to use the unpack() function's "u"
|
|
format after minor transliterations:
|
|
|
|
tr#A-Za-z0-9+/##cd; # remove non-base64 chars
|
|
tr#A-Za-z0-9+/# -_#; # convert to uuencoded format
|
|
$len = pack("c", 32 + 0.75*length); # compute length byte
|
|
print unpack("u", $len . $_); # uudecode and print
|
|
|
|
=head2 How do I return the user's mail address?
|
|
|
|
On systems that support getpwuid, the $E<lt> variable and the
|
|
Sys::Hostname module (which is part of the standard perl distribution),
|
|
you can probably try using something like this:
|
|
|
|
use Sys::Hostname;
|
|
$address = sprintf('%s@%s', getpwuid($<), hostname);
|
|
|
|
Company policies on mail address can mean that this generates addresses
|
|
that the company's mail system will not accept, so you should ask for
|
|
users' mail addresses when this matters. Furthermore, not all systems
|
|
on which Perl runs are so forthcoming with this information as is Unix.
|
|
|
|
The Mail::Util module from CPAN (part of the MailTools package) provides a
|
|
mailaddress() function that tries to guess the mail address of the user.
|
|
It makes a more intelligent guess than the code above, using information
|
|
given when the module was installed, but it could still be incorrect.
|
|
Again, the best way is often just to ask the user.
|
|
|
|
=head2 How do I send mail?
|
|
|
|
Use the C<sendmail> program directly:
|
|
|
|
open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
|
|
or die "Can't fork for sendmail: $!\n";
|
|
print SENDMAIL <<"EOF";
|
|
From: User Originating Mail <me\@host>
|
|
To: Final Destination <you\@otherhost>
|
|
Subject: A relevant subject line
|
|
|
|
Body of the message goes here after the blank line
|
|
in as many lines as you like.
|
|
EOF
|
|
close(SENDMAIL) or warn "sendmail didn't close nicely";
|
|
|
|
The B<-oi> option prevents sendmail from interpreting a line consisting
|
|
of a single dot as "end of message". The B<-t> option says to use the
|
|
headers to decide who to send the message to, and B<-odq> says to put
|
|
the message into the queue. This last option means your message won't
|
|
be immediately delivered, so leave it out if you want immediate
|
|
delivery.
|
|
|
|
Or use the CPAN module Mail::Mailer:
|
|
|
|
use Mail::Mailer;
|
|
|
|
$mailer = Mail::Mailer->new();
|
|
$mailer->open({ From => $from_address,
|
|
To => $to_address,
|
|
Subject => $subject,
|
|
})
|
|
or die "Can't open: $!\n";
|
|
print $mailer $body;
|
|
$mailer->close();
|
|
|
|
The Mail::Internet module uses Net::SMTP which is less Unix-centric than
|
|
Mail::Mailer, but less reliable. Avoid raw SMTP commands. There
|
|
are many reasons to use a mail transport agent like sendmail. These
|
|
include queueing, MX records, and security.
|
|
|
|
=head2 How do I read mail?
|
|
|
|
Use the Mail::Folder module from CPAN (part of the MailFolder package) or
|
|
the Mail::Internet module from CPAN (also part of the MailTools package).
|
|
|
|
# sending mail
|
|
use Mail::Internet;
|
|
use Mail::Header;
|
|
# say which mail host to use
|
|
$ENV{SMTPHOSTS} = 'mail.frii.com';
|
|
# create headers
|
|
$header = new Mail::Header;
|
|
$header->add('From', 'gnat@frii.com');
|
|
$header->add('Subject', 'Testing');
|
|
$header->add('To', 'gnat@frii.com');
|
|
# create body
|
|
$body = 'This is a test, ignore';
|
|
# create mail object
|
|
$mail = new Mail::Internet(undef, Header => $header, Body => \[$body]);
|
|
# send it
|
|
$mail->smtpsend or die;
|
|
|
|
Often a module is overkill, though. Here's a mail sorter.
|
|
|
|
#!/usr/bin/perl
|
|
# bysub1 - simple sort by subject
|
|
my(@msgs, @sub);
|
|
my $msgno = -1;
|
|
$/ = ''; # paragraph reads
|
|
while (<>) {
|
|
if (/^From/m) {
|
|
/^Subject:\s*(?:Re:\s*)*(.*)/mi;
|
|
$sub[++$msgno] = lc($1) || '';
|
|
}
|
|
$msgs[$msgno] .= $_;
|
|
}
|
|
for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
|
|
print $msgs[$i];
|
|
}
|
|
|
|
Or more succinctly,
|
|
|
|
#!/usr/bin/perl -n00
|
|
# bysub2 - awkish sort-by-subject
|
|
BEGIN { $msgno = -1 }
|
|
$sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
|
|
$msg[$msgno] .= $_;
|
|
END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }
|
|
|
|
=head2 How do I find out my hostname/domainname/IP address?
|
|
|
|
The normal way to find your own hostname is to call the C<`hostname`>
|
|
program. While sometimes expedient, this has some problems, such as
|
|
not knowing whether you've got the canonical name or not. It's one of
|
|
those tradeoffs of convenience versus portability.
|
|
|
|
The Sys::Hostname module (part of the standard perl distribution) will
|
|
give you the hostname after which you can find out the IP address
|
|
(assuming you have working DNS) with a gethostbyname() call.
|
|
|
|
use Socket;
|
|
use Sys::Hostname;
|
|
my $host = hostname();
|
|
my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost'));
|
|
|
|
Probably the simplest way to learn your DNS domain name is to grok
|
|
it out of /etc/resolv.conf, at least under Unix. Of course, this
|
|
assumes several things about your resolv.conf configuration, including
|
|
that it exists.
|
|
|
|
(We still need a good DNS domain name-learning method for non-Unix
|
|
systems.)
|
|
|
|
=head2 How do I fetch a news article or the active newsgroups?
|
|
|
|
Use the Net::NNTP or News::NNTPClient modules, both available from CPAN.
|
|
This can make tasks like fetching the newsgroup list as simple as:
|
|
|
|
perl -MNews::NNTPClient
|
|
-e 'print News::NNTPClient->new->list("newsgroups")'
|
|
|
|
=head2 How do I fetch/put an FTP file?
|
|
|
|
LWP::Simple (available from CPAN) can fetch but not put. Net::FTP (also
|
|
available from CPAN) is more complex but can put as well as fetch.
|
|
|
|
=head2 How can I do RPC in Perl?
|
|
|
|
A DCE::RPC module is being developed (but is not yet available), and
|
|
will be released as part of the DCE-Perl package (available from
|
|
CPAN). The rpcgen suite, available from CPAN/authors/id/JAKE/, is
|
|
an RPC stub generator and includes an RPC::ONC module.
|
|
|
|
=head1 AUTHOR AND COPYRIGHT
|
|
|
|
Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
|
|
All rights reserved.
|
|
|
|
When included as part of the Standard Version of Perl, or as part of
|
|
its complete documentation whether printed or otherwise, this work
|
|
may be distributed only under the terms of Perl's Artistic Licence.
|
|
Any distribution of this file or derivatives thereof I<outside>
|
|
of that package require that special arrangements be made with
|
|
copyright holder.
|
|
|
|
Irrespective of its distribution, all code examples in this file
|
|
are hereby placed into the public domain. You are permitted and
|
|
encouraged to use this code in your own programs for fun
|
|
or for profit as you see fit. A simple comment in the code giving
|
|
credit would be courteous but is not required.
|
|
|