source: wiki-toolkit-formatter-usemod/trunk/lib/Wiki/Toolkit/Formatter/UseMod.pm @ 404

Last change on this file since 404 was 404, checked in by Dominic Hargreaves, 12 years ago

update copyright year

  • Property svn:eol-style set to native
  • Property svn:keywords set to Author Date Id Revision
File size: 21.9 KB
Line 
1package Wiki::Toolkit::Formatter::UseMod;
2
3use strict;
4
5use vars qw( $VERSION @_links_found );
6$VERSION = '0.20';
7
8use URI::Escape;
9use Text::WikiFormat as => 'wikiformat';
10use HTML::PullParser;
11use URI::Find::Delimited;
12
13=head1 NAME
14
15Wiki::Toolkit::Formatter::UseMod - UseModWiki-style formatting for Wiki::Toolkit
16
17=head1 DESCRIPTION
18
19A formatter backend for L<Wiki::Toolkit> that supports UseMod-style formatting.
20
21=head1 SYNOPSIS
22
23  use Wiki::Toolkit::Formatter::UseMod;
24
25  # Instantiate - see below for parameter details.
26  my $formatter = Wiki::Toolkit::Formatter::UseMod->new( %config );
27
28  # Format some text.
29  my $cooked = $formatter->format($raw);
30
31  # Find out which other nodes that text would link to.
32  my @links_to = $formatter->find_internal_links($raw);
33
34=head1 METHODS
35
36=over 4
37
38=item B<new>
39
40  my $formatter = Wiki::Toolkit::Formatter::UseMod->new(
41                 extended_links      => 0, # $FreeLinks
42                 implicit_links      => 1, # $WikiLinks
43                 force_ucfirst_nodes => 1, # $FreeUpper
44                 use_headings        => 1, # $UseHeadings
45                 allowed_tags        => [qw(b i)], # defaults to none
46                 macros              => {},
47                 pass_wiki_to_macros => 0,
48                 node_prefix         => 'wiki.pl?',
49                 node_suffix         => '',
50                 edit_prefix         => 'wiki.pl?action=edit;id=',
51                 edit_suffix         => '',
52                 munge_urls          => 0,
53  );
54
55Parameters will default to the values shown above (apart from
56C<allowed_tags>, which defaults to allowing no tags).
57
58=over 4
59
60=item B<Internal links>
61
62C<node_prefix>, C<node_suffix>, C<edit_prefix> and C<edit_suffix>
63allow you to control the URLs generated for links to other wiki pages.
64So for example with the defaults given above, a link to the Home node
65will have the URL C<wiki.pl?Home> and a link to the edit form for the
66Home node will have the URL C<wiki.pl?action=edit;id=Home>
67
68(Note that of course the URLs that you wish to have generated will
69depend on how your wiki application processes its CGI parameters - you
70can't just put random stuff in there and hope it works!)
71
72=item B<Internal links - advanced options>
73
74If you wish to have greater control over the links, you may use the
75C<munge_node_name> parameter.  The value of this should be a
76subroutine reference.  This sub will be called on each internal link
77after all other formatting and munging I<except> URL escaping has been
78applied.  It will be passed the node name as its first parameter and
79should return a node name.  Note that this will affect the URLs of
80internal links, but not the link text.
81
82Example:
83
84  # The formatter munges links so node names are ucfirst.
85  # Ensure 'state51' always appears in lower case in node names.
86  munge_node_name => sub {
87                         my $node_name = shift;
88                         $node_name =~ s/State51/state51/g;
89                         return $node_name;
90                     }
91
92B<Note:> This is I<advanced> usage and you should only do it if you
93I<really> know what you're doing.  Consider in particular whether and
94how your munged nodes are going to be treated by C<retrieve_node>.
95
96=item B<URL munging>
97
98If you set C<munge_urls> to true, then your URLs will be more
99user-friendly, for example
100
101  http://example.com/wiki.cgi?Mailing_List_Managers
102
103rather than
104
105  http://example.com/wiki.cgi?Mailing%20List%20Managers
106
107The former behaviour is the actual UseMod behaviour, but requires a
108little fiddling about in your code (see C<node_name_to_node_param>),
109so the default is to B<not> munge URLs.
110
111=item B<Macros>
112
113Be aware that macros are processed I<after> filtering out disallowed
114HTML tags and I<before> transforming from wiki markup into HTML.  They
115are also not called in any particular order.
116
117The keys of macros should be either regexes or strings. The values can
118be strings, or, if the corresponding key is a regex, can be coderefs.
119The coderef will be called with the first nine substrings captured by
120the regex as arguments. I would like to call it with all captured
121substrings but apparently this is complicated.
122
123You may wish to have access to the overall wiki object in the subs
124defined in your macro.  To do this:
125
126=over
127
128=item *
129
130Pass the wiki object to the C<< ->formatter >> call as described below.
131
132=item *
133
134Pass a true value in the C<pass_wiki_to_macros> parameter when calling
135C<< ->new >>.
136
137=back
138
139If you do this, then I<all> coderefs will be called with the wiki object
140as the first parameter, followed by the first nine captured substrings
141as described above.  Note therefore that setting C<pass_wiki_to_macros>
142may cause backwards compatibility issues.
143
144=back
145
146Macro examples:
147
148  # Simple example - substitute a little search box for '@SEARCHBOX'
149
150  macros => {
151
152      '@SEARCHBOX' =>
153                qq(<form action="wiki.pl" method="get">
154                   <input type="hidden" name="action" value="search">
155                   <input type="text" size="20" name="terms">
156                   <input type="submit"></form>),
157  }
158
159  # More complex example - substitute a list of all nodes in a
160  # category for '@INDEX_LINK [[Category Foo]]'
161
162  pass_wiki_to_macros => 1,
163  macros              => {
164      qr/\@INDEX_LINK\s+\[\[Category\s+([^\]]+)]]/ =>
165          sub {
166                my ($wiki, $category) = @_;
167                my @nodes = $wiki->list_nodes_by_metadata(
168                        metadata_type  => "category",
169                        metadata_value => $category,
170                        ignore_case    => 1,
171                );
172                my $return = "\n";
173                foreach my $node ( @nodes ) {
174                    $return .= "* "
175                            . $wiki->formatter->format_link(
176                                                       wiki => $wiki,
177                                                       link => $node,
178                                                           )
179                            . "\n";
180                 }
181                 return $return;
182               },
183  }
184
185
186=cut
187
188sub new {
189    my ($class, @args) = @_;
190    my $self = {};
191    bless $self, $class;
192    $self->_init(@args) or return undef;
193    return $self;
194}
195
196sub _init {
197    my ($self, %args) = @_;
198
199    # Store the parameters or their defaults.
200    my %defs = ( extended_links      => 0,
201                 implicit_links      => 1,
202                 force_ucfirst_nodes => 1,
203                 use_headings        => 1,
204                 allowed_tags        => [],
205                 macros              => {},
206                 pass_wiki_to_macros => 0,
207                 node_prefix         => 'wiki.pl?',
208                 node_suffix         => '',
209                 edit_prefix         => 'wiki.pl?action=edit;id=',
210                 edit_suffix         => '',
211                 munge_urls          => 0,
212                 munge_node_name     => undef,
213               );
214
215    my %collated = (%defs, %args);
216    foreach my $k (keys %defs) {
217        $self->{"_".$k} = $collated{$k};
218    }
219    return $self;
220}
221
222=item B<format>
223
224  my $html = $formatter->format($submitted_content, $wiki);
225
226Escapes any tags which weren't specified as allowed on creation, then
227interpolates any macros, then translates the raw Wiki language
228supplied into HTML.
229
230A L<Wiki::Toolkit> object can be supplied as an optional second parameter.
231This object will be used to determine whether a linked-to node exists
232or not, and alter the presentation of the link accordingly. This is
233only really in here for use when this method is being called from
234within L<Wiki::Toolkit>.
235
236=cut
237
238sub format {
239    my ($self, $raw, $wiki) = @_;
240    $raw =~ s/\r\n/\n/sg; # CGI newline is \r\n not \n
241    my $safe = "";
242
243    my %allowed = map {lc($_) => 1, "/".lc($_) => 1} @{$self->{_allowed_tags}};
244
245    # Parse the HTML - even if we're not allowing any tags, because we're
246    # using a custom escaping routine rather than CGI.pm
247    my $parser = HTML::PullParser->new(doc   => $raw,
248                                       start => '"TAG", tag, text',
249                                       end   => '"TAG", tag, text',
250                                       text  => '"TEXT", tag, text');
251    while (my $token = $parser->get_token) {
252        my ($flag, $tag, $text) = @$token;
253        if ($flag eq "TAG" and !defined $allowed{lc($tag)}) {
254            $safe .= $self->_escape_HTML($text);
255        } else {
256            $safe .= $text;
257        }
258    }
259
260    # Now do any inline links.
261    my $callback = sub {
262        my ($open, $close, $url, $title, $whitespace) = @_;
263        $title ||= $url;
264        if ( $open && $close ) {
265            return $self->make_external_link( title => $title, url => $url );
266        } else {
267            return $open
268                   . $self->make_external_link( title => $title, url => $url )
269                   . $close;
270        }
271    };
272 
273    my $finder = URI::Find::Delimited->new( ignore_quoted => 1, callback => $callback );
274    $finder->find(\$safe);
275
276    # Now process any macros.
277    my %macros = %{$self->{_macros}};
278    foreach my $key (keys %macros) {
279        my $value = $macros{$key};
280        if ( ref $value && ref $value eq 'CODE' ) {
281            if ( $self->{_pass_wiki_to_macros} and $wiki ) {
282                $safe=~ s/$key/$value->($wiki, $1, $2, $3, $4, $5, $6, $7, $8, $9)/eg;
283            } else {
284                $safe=~ s/$key/$value->($1, $2, $3, $4, $5, $6, $7, $8, $9)/eg;
285            }
286        } else {
287          $safe =~ s/$key/$value/g;
288        }
289    }
290
291    # Finally set up config and call Text::WikiFormat.
292    my %format_opts = $self->_format_opts;
293    my %format_tags = (
294        # chromatic made most of the regex below.  I will document it when
295        # I understand it properly.
296        indent   => qr/^(?:\t+|\s{4,}|\s*\*?(?=\**\*+))/,
297        newline => "", # avoid bogus <br />
298        paragraph       => [ "<p>", "</p>\n", "", "\n", 1 ], # no bogus <br />
299        extended_link_delimiters => [ '[[', ']]' ],
300        blocks                   => {
301                         ordered         => qr/^\s*([\d]+)\.\s*/,
302                         unordered       => qr/^\s*\*\s*/,
303                         definition      => qr/^:\s*/,
304                         pre             => qr/^\s+/,
305                         table           => qr/^\|\|/,
306                                    },
307        definition               => [ "<dl>\n", "</dl>\n", "<dd>&nbsp;", "</dd>\n" ],
308        pre                      => [ "<pre>\n", "</pre>\n", "", "\n" ],
309        table                    => [ qq|<table class="user_table">\n|, "</table>\n",
310                                       sub {
311                                           my $line = shift;
312                                           $line =~ s/\|\|$/<\/td>/;
313                                           $line =~ s/\|\|/<\/td><td>/g;
314                                           return ("<tr>","<td>$line","</tr>");
315                                       },
316                                    ],
317        # we don't label unordered lists as "not indented" so we can nest them.
318        indented   => {
319                        definition => 0,
320                        ordered    => 0,
321                        pre        => 0,
322                        table      => 0,
323                       }, 
324        blockorder => [ qw( header line ordered unordered code definition pre table paragraph )],
325        nests      => { map { $_ => 1} qw( ordered unordered ) },
326        link => sub {
327                      my $link = shift;
328                      return $self->format_link(
329                                                 link => $link,
330                                                 wiki => $wiki,
331                                               );
332        },
333    );
334
335    return wikiformat($safe, \%format_tags, \%format_opts );
336}
337
338sub _format_opts {
339    my $self = shift;
340    return (
341             extended       => $self->{_extended_links},
342             prefix         => $self->{_node_prefix},
343             implicit_links => $self->{_implicit_links}
344           );
345}
346
347=item B<format_link>
348
349  my $string = $formatter->format_link(
350                                        link => "Home Node",
351                                        wiki => $wiki,
352                                      );
353
354An internal method exposed to make it easy to go from eg
355
356  * Foo
357  * Bar
358
359to
360
361  * <a href="index.cgi?Foo">Foo</a>
362  * <a href="index.cgi?Bar">Bar</a>
363
364See Macro Examples above for why you might find this useful.
365
366C<link> should be something that would go inside your extended link
367delimiters.  C<wiki> is optional but should be a L<Wiki::Toolkit> object.
368If you do supply C<wiki> then the method will be able to check whether
369the node exists yet or not and so will call C<< ->make_edit_link >>
370instead of C<< ->make_internal_link >> where appropriate.  If you don't
371supply C<wiki> then C<< ->make_internal_link >> will be called always.
372
373This method used to be private so may do unexpected things if you use
374it in a way that I haven't tested yet.
375
376=cut
377
378sub format_link {
379    my ($self, %args) = @_;
380    my $link = $args{link};
381    my %opts = $self->_format_opts;
382    my $wiki = $args{wiki};
383
384    my $title;
385    ($link, $title) = split(/\|/, $link, 2) if $opts{extended};
386    $title =~ s/^\s*// if $title; # strip leading whitespace
387    $title ||= $link;
388
389    if ( $self->{_force_ucfirst_nodes} ) {
390        $link = $self->_do_freeupper($link);
391    }
392    $link = $self->_munge_spaces($link);
393
394    $link = $self->{_munge_node_name}($link)
395        if $self->{_munge_node_name};
396
397    if (!$link) {
398        return "[Undefined link '$title']";
399    }
400
401    my $editlink_not_link = 0;
402    # See whether the linked-to node exists, if we can.
403    if ( $wiki && !$wiki->node_exists( $link ) ) {
404        $editlink_not_link = 1;
405    }
406
407    $link =~ s/ /_/g if $self->{_munge_urls};
408    $link = uri_escape( $link );
409
410    if ( $editlink_not_link ) {
411        my $prefix = $self->{_edit_prefix};
412        my $suffix = $self->{_edit_suffix};
413        return $self->make_edit_link(
414                                      title => $title,
415                                      url   => $prefix.$link.$suffix,
416                                    );
417    } else {
418        my $prefix = $self->{_node_prefix};
419        my $suffix = $self->{_node_suffix};
420        return $self->make_internal_link(
421                                          title => $title,
422                                          url   => $prefix.$link.$suffix,
423                                        );
424    }
425}
426
427# CGI.pm is sometimes awkward about actually performing CGI::escapeHTML
428# if there's a previous instantiation - in the calling script, for example.
429# So just do it here.
430sub _escape_HTML {
431    my ($self, $text) = @_;
432    $text =~ s{&}{&amp;}gso;
433    $text =~ s{<}{&lt;}gso;
434    $text =~ s{>}{&gt;}gso;
435    $text =~ s{"}{&quot;}gso;
436    return $text;
437}
438
439=item B<find_internal_links>
440 
441  my @links_to = $formatter->find_internal_links( $content );
442 
443Returns a list of all nodes that the supplied content links to.
444 
445=cut 
446 
447sub find_internal_links { 
448    my ($self, $raw) = @_;
449 
450    @_links_found = (); 
451 
452    my %format_opts = $self->_format_opts;
453
454    my %format_tags = ( extended_link_delimiters => [ '[[', ']]' ],
455                        link => sub {
456                            my $link = shift;
457                            my %opts = $self->_format_opts;
458                            my $title;
459                            ($link, $title) = split(/\|/, $link, 2)
460                              if $opts{extended};
461                            if ( $self->{_force_ucfirst_nodes} ) {
462                                $link = $self->_do_freeupper($link);
463                            }
464                            $link = $self->{_munge_node_name}($link)
465                              if $self->{_munge_node_name};
466                            $link = $self->_munge_spaces($link);
467                            push @Wiki::Toolkit::Formatter::UseMod::_links_found,
468                                                                         $link;
469                            return ""; # don't care about output
470                                     }
471    );
472
473    my $foo = wikiformat($raw, \%format_tags, \%format_opts);
474
475    my @links = @_links_found;
476    @_links_found = ();
477    return @links;
478}
479
480
481=item B<node_name_to_node_param>
482
483  use URI::Escape;
484  $param = $formatter->node_name_to_node_param( "Recent Changes" );
485  my $url = "wiki.pl?" . uri_escape($param);
486
487In usemod, the node name is encoded prior to being used as part of the
488URL. This method does this encoding (essentially, whitespace is munged
489into underscores). In addition, if C<force_ucfirst_nodes> is in action
490then the node names will be forced ucfirst if they weren't already.
491
492Note that unless C<munge_urls> was set to true when C<new> was called,
493this method will do nothing.
494
495=cut
496
497sub node_name_to_node_param {
498    my ($self, $node_name) = @_;
499    return $node_name unless $self->{_munge_urls};
500    my $param = $node_name;
501    $param = $self->_munge_spaces($param);
502    $param = $self->_do_freeupper($param) if $self->{_force_ucfirst_nodes};
503    $param =~ s/ /_/g;
504
505    return $param;
506}
507
508=item B<node_param_to_node_name>
509
510  my $node = $q->param('node') || "";
511  $node = $formatter->node_param_to_node_name( $node );
512
513In usemod, the node name is encoded prior to being used as part of the
514URL, so we must decode it before we can get back the original node name.
515
516Note that unless C<munge_urls> was set to true when C<new> was called,
517this method will do nothing.
518
519=cut
520
521sub node_param_to_node_name {
522    my ($self, $param) = @_;
523    return $param unless $self->{_munge_urls};
524
525    # Note that this might not give us back exactly what we started with,
526    # since in the encoding we collapse and trim whitespace; but this is
527    # how usemod does it (as of 0.92) and usemod is what we're emulating.
528    $param =~ s/_/ /g;
529
530    return $param;
531}
532
533sub _do_freeupper {
534    my ($self, $node) = @_;
535
536    # This is the FreeUpper usemod behaviour, slightly modified from
537    # their regexp, as we need to do it before we check whether the
538    # node exists ie before we substitute the spaces with underscores.
539    $node = ucfirst($node);
540    $node =~ s|([- _.,\(\)/])([a-z])|$1.uc($2)|ge;
541
542    return $node;
543}
544
545sub _munge_spaces {
546    my ($self, $node) = @_;
547
548    # Yes, we really do only munge spaces, not all whitespace. This is
549    # how usemod does it (as of 0.92).
550    $node =~ s/ +/ /g;
551    $node =~ s/^ //;
552    $node =~ s/ $//;
553
554    return $node
555}
556
557=head1 SUBCLASSING
558
559The following methods can be overridden to provide custom behaviour.
560
561=over
562
563=item B<make_edit_link>
564
565    my $link = $self->make_edit_link(
566        title => "Home Page",
567        url   => "http://example.com/?id=Home",
568                                   );
569
570This method will be passed a title and a url and should return an HTML
571snippet.  For example, you can add a C<title> attribute to the link
572like so:
573
574  sub make_edit_link {
575      my ($self, %args) = @_;
576      my $title = $args{title};
577      my $url = $args{url};
578      return qq|[$title]<a href="$url" title="create">?</a>|;
579  }
580
581=cut
582
583sub make_edit_link {
584    my ($self, %args) = @_;
585    return qq|[$args{title}]<a href="$args{url}">?</a>|;
586}
587
588=item B<make_internal_link>
589
590    my $link = $self->make_internal_link(
591        title => "Home Page",
592        url   => "http://example.com/?id=Home",
593                                        );
594
595This method will be passed a title and a url and should return an HTML
596snippet.  For example, you can add a C<class> attribute to the link
597like so:
598
599  sub make_internal_link {
600      my ($self, %args) = @_;
601      my $title = $args{title};
602      my $url = $args{url};
603      return qq|<a href="$url" class="internal">$title</a>|;
604  }
605
606=cut
607
608sub make_internal_link {
609    my ($self, %args) = @_;
610    return qq|<a href="$args{url}">$args{title}</a>|;
611}
612
613=item B<make_external_link>
614
615    my $link = $self->make_external_link(
616        title => "London Perlmongers",
617        url   => "http://london.pm.org",
618                                        );
619
620This method will be passed a title and a url and should return an HTML
621snippet.  For example, you can add a little icon after each external
622link like so:
623
624  sub make_external_link {
625      my ($self, %args) = @_;
626      my $title = $args{title};
627      my $url = $args{url};
628      return qq|<a href="$url">$title</a> <img src="external.gif">|;
629  }
630
631=cut
632
633sub make_external_link {
634    my ($self, %args) = @_;
635    my ($open, $close) = ( "[", "]" );
636    if ( $args{title} eq $args{url} ) {
637        ($open, $close) = ( "", "" );
638    }
639    return qq|$open<a href="$args{url}">$args{title}</a>$close|;
640}
641
642=back
643
644=head1 AUTHOR
645
646Kake Pugh (kake@earth.li) and the Wiki::Toolkit team.
647
648=head1 COPYRIGHT
649
650     Copyright (C) 2003-2004 Kake Pugh.  All Rights Reserved.
651     Copyright (C) 2006-2008 the Wiki::Toolkit team. All Rights Reserved.
652
653This module is free software; you can redistribute it and/or modify it
654under the same terms as Perl itself.
655
656=head1 CREDITS
657
658The OpenGuides London team (L<http://openguides.org/london/>) sent
659some very helpful bug reports. A lot of the work of this module is
660done within chromatic's module, L<Text::WikiFormat>.
661
662=head1 CAVEATS
663
664This doesn't yet support all of UseMod's formatting features and
665options, by any means.  This really truly I<is> a 0.* release. Please
666send bug reports, omissions, patches, and stuff, to me at
667C<kake@earth.li>.
668
669=head1 NOTE ON USEMOD COMPATIBILITY
670
671UseModWiki "encodes" node names before making them part of a URL, so
672for example a node about Wombat Defenestration will have a URL like
673
674  http://example.com/wiki.cgi?Wombat_Defenestration
675
676So if we want to emulate a UseModWiki exactly, we need to munge back
677and forth between node names as titles, and node names as CGI params.
678
679  my $formatter = Wiki::Toolkit::Formatter::UseMod->new( munge_urls => 1 );
680  my $node_param = $q->param('id') || $q->param('keywords') || "";
681  my $node_name = $formatter->node_param_to_node_name( $node_param );
682
683  use URI::Escape;
684  my $url = "http://example.com/wiki.cgi?"
685    . uri_escape(
686       $formatter->node_name_to_node_param( "Wombat Defenestration" )
687                 );
688
689=head1 SEE ALSO
690
691=over 4
692
693=item * L<Wiki::Toolkit>
694
695=item * L<Text::WikiFormat>
696
697=item * UseModWiki (L<http://www.usemod.com/cgi-bin/wiki.pl>)
698
699=back
700
701=cut
702
7031;
Note: See TracBrowser for help on using the repository browser.