html2pl.pl - HTML to Perl file converter

This is a simple Perl program that converts HTML files into Perl print statements. It's great for quick-and-dirty projects where you need to add a little Perl to a lot of HTML, or when you want to keep using your graphical HTML editor.

You can download the latest file here, or copy it from the listing below.

It's Perl, so it's easy to customize. If you find a character that doesn't convert correctly, you can add it to the @char list of characters to be backslash-escaped.
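
The escaping idea can be sketched as follows. This is a minimal illustration, not the script's own code: it uses a single-pass character class instead of one substitution per character, the helper names escape_line and to_print_statement are hypothetical, and the extra | in the list is only there to show that quotemeta makes regex metacharacters safe to add.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Characters to backslash in the generated print statement.
# Only \ " $ @ are strictly required inside a double-quoted Perl string;
# the others mirror the script's list. '|' is a hypothetical addition.
my @char = ('\\', '"', '$', '@', '/', "'", '`', ';', '%', '&', '*', '|');

# Backslash every listed character in one pass, so a backslash that was
# just inserted is never escaped a second time.
sub escape_line {
    my ($line, @chars) = @_;
    return $line unless @chars;
    # quotemeta makes each character safe to place inside [...]
    my $class = join '', map { quotemeta } @chars;
    $line =~ s/([$class])/\\$1/g;
    return $line;
}

# Wrap an escaped line in a print statement that re-emits it with a newline.
sub to_print_statement {
    my ($line, @chars) = @_;
    return 'print "' . escape_line($line, @chars) . '\n";';
}

my $html = '<a href="mailto:me@example.com">$5 | 50% off</a>';
print to_print_statement($html, @char), "\n";
```

Escaping in one pass avoids any ordering concern between the backslash and the other characters, which the per-character loop has to handle by treating the backslash first.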

I've tested it on over a megabyte of web pages, one page each from several dozen popular sites. For each page, I converted the line endings from Windows CR/LF to Unix style, converted the file to Perl using html2pl.pl, ran the resulting Perl file with its output redirected to an HTML file, and diffed that file against the Unix-style original. The only difference I ever found was the occasional odd end-of-file character in the original HTML.
If you come up with a web page that doesn't convert correctly, please send it to me.
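
The round-trip check described above can be approximated in memory, without temporary files or diff. This is a sketch under stated assumptions: it needs Perl 5.8 or later for in-memory filehandles (the script itself was tested on 5.6.1), it escapes only the four characters that are special in a double-quoted string, and the helper names to_print_statement, run_captured and round_trips are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one line of HTML into a print statement that re-emits it.
sub to_print_statement {
    my ($line) = @_;
    $line =~ s/([\\"\$\@])/\\$1/g;    # escape \ " $ @
    return 'print "' . $line . '\n";';
}

# Evaluate generated Perl code and return everything it printed.
sub run_captured {
    my ($code) = @_;
    my $buffer = '';
    open(my $capture, '>', \$buffer) or die "Can't open in-memory handle: $!";
    my $previous = select($capture);  # bare print now writes to $buffer
    my $ok    = eval "$code; 1";
    my $error = $@;
    select($previous);                # restore the old default handle
    close($capture);
    die "Generated code failed: $error" unless $ok;
    return $buffer;
}

# True if converting the lines and running the result reproduces them.
sub round_trips {
    my (@lines) = @_;
    my $original = join '', map { "$_\n" } @lines;
    my $program  = join "\n", map { to_print_statement($_) } @lines;
    return run_captured($program) eq $original;
}

my @lines = (
    q{<body bgcolor="#ffffff" text="#000000">},
    q{<p>not ok     \ / ' &quot; ` ; @ $ % &amp; * ;</p>},
);
print round_trips(@lines) ? "round trip OK\n" : "round trip FAILED\n";
```

Because every generated statement appends a newline, a source file whose last line lacks one will differ by that final newline, which is one possible source of an end-of-file difference.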

Bruce Kives


 

#!/usr/bin/perl -w
###########################################################
#
#
# html2pl.pl
# HTML file to Perl print file converter
# Version 0.2
# Copyleft: 4/12/2002 by Bruce Kives
# bdkives@yahoo.com
# Distributed under the Perl Clarified Artistic License
# Tested on Perl 5.6.1. test.htm is included at the bottom.
#
#
###########################################################

#
# Characters that need no escaping: ( ) { } [ ] < > ! # ^ _ - + = | : , ? ~ and tabs
# Characters that get backslashed:  \ / ' " ` ; @ $ % & *
#
# @char contains the characters to be backslashed.
# It can be modified as needed. Ideally it would be
# @char = qw ( \ / " ' ` ; @ $ % & *);
# but \ $ and * are special inside an s/// pattern,
# so those three characters are handled individually below.
#
@char = qw (/ " ' ` ; @ % &);

#
# Get the HTML filename
#
$file = $ARGV[0];
if ($#ARGV == -1)
   {
    print"\n html2pl.pl - HTML to Perl file converter\n\nWhich file do you want converted: ";
    $file = <STDIN>;
   }
chomp($file);
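# Strip only the final extension, so names such as my.page.html or ./page.htm
# work; $ext is left empty when the name has no extension.
$newfile = $file;
$ext = ($newfile =~ s/\.([^.\/\\]*)$//) ? $1 : '';
if (lc($ext) eq 'pl')
   { die "\nInput file $file appears to be a .pl instead of a .htm or .html file.\n"; }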
$newfile .= ".pl";

#
# Open and read the .htm file into an array
#
open(INPUT, '<', $file) || die "Can't open input file $file: $!\n";
@lines = <INPUT>;
close(INPUT) || die "can't close $file: $!"; # close the .htm file

#
# Kill the return and linefeed characters
#
chomp(@lines);
$temp = $/;
$/ = "\r";
chomp(@lines);
$/ = $temp;

#
# Open the new .pl file for writing
#
open (OUTPUT, '>', $newfile) || die "Can't open $newfile for writing: $!\n";
print "\nConverting $file to $newfile\n";

#
# Comment out the next three lines if you don't want the perl header.
#
print OUTPUT "#!/usr/bin/perl -w\n#\n";
print OUTPUT "# HTML file converted to Perl by html2pl.pl\n#\n";
print OUTPUT "# $file converted to $newfile\n#\n\n";

#
# Start the substitutions
#
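foreach $line (@lines)
  {
    $line =~ s/\\/\\\\/g; # special case for \ (must come first)
    $line =~ s/\$/\\\$/g; # special case for $
    $line =~ s/\*/\\\*/g; # special case for *
    foreach $char (@char)
     {
     # \Q...\E quotes regex metacharacters, so characters such as | or .
     # can be added to @char without breaking the pattern
     $line =~ s/\Q$char\E/\\$char/g;
     }
    print OUTPUT "print\"$line\\n\"\; \n";
   }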
close(OUTPUT) || die "can't close output file $newfile: $!"; # Close the .pl file

#
# Change the Unix permissions as needed. Delete this for Windows.
#
chmod 0775, $newfile;

#
# DONE
#
exit (0);


###################
#
# test.htm
#
###################
#
#<!doctype html public "-//w3c//dtd html 3.2//en">
#<html>
#<head>
#<title>Test .htm to .pl converter</title>
#</head>
#
#<body bgcolor="#ffffff" text="#000000" link="#0000ff" vlink="#800080" alink="#ff0000">
#<b><font FACE="System">
#<p>OK     (){}[]&lt;&gt;! # ^ _ - + = | : , ? ~</p>
#<p>not ok     \ / ' &quot; ` ; @ $ % &amp; * ;</p>
#<! The &lt; &gt; &quot; &amp; characters are found elsewhere in the page >
#</font></b>
#</body>
#</html>
#