25.3. Regular expressions

Figure 25-4. Pattern matching with a regular expression.


Regular expressions are used as "templates" that match patterns of text. For example, the regular expression for the pattern matching in Figure 25-4 is[1]

\([Ii]f \|and \)*\(<i>[AC]\+<\/i>.\)\(and\)\?
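As a rough illustration, the pattern above can be tried out in Python. This is only a sketch under stated assumptions: Python's `re` module writes grouping, alternation and repetition without backslashes, so the pattern is transliterated, and the sample string is a hypothetical stand-in for the HTML code behind Figure 25-4, which is not reproduced here.

```python
import re

# Transliteration of the pattern above into Python's `re` syntax:
#   \( \) -> ( )    \| -> |    \+ -> +    \? -> ?    \/ -> /
FIGURE_PATTERN = re.compile(r"([Ii]f |and )*(<i>[AC]+</i>.)(and)?")

# Hypothetical sample text (an assumption, not the actual text of the figure).
sample = "If <i>A</i> and <i>C</i> then"


def find_matches(text):
    """Return every substring of `text` matched by the pattern."""
    return [m.group(0) for m in FIGURE_PATTERN.finditer(text)]


print(find_matches(sample))
# -> ['If <i>A</i> and', '<i>C</i> ']
```

The first match takes the leading "If ", the italic A and the trailing "and"; the second match has no prefix and no trailing "and", which the `*` and `?` operators allow.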

To understand any URL manipulation solution to the problem of non-search-engine-friendly URLs, you have to get acquainted with Regular Expressions. To get you started, read Using Regular Expressions and Matching Patterns in Text. We can only touch the basics here, for which we use material taken from A Brief Introduction to Regular Expressions:

An expression is a string of characters. Those characters that have an interpretation above and beyond their literal meaning are called metacharacters. A quote symbol, for example, may denote speech by a person, ditto, or a meta-meaning for the symbols that follow. Regular Expressions are sets of characters and/or metacharacters that UNIX endows with special features.

The main uses for Regular Expressions (REs) are text searches and string manipulation. An RE matches a single character or a set of characters (a substring or an entire string).

  • The asterisk -- * -- matches any number of repeats of the character string or RE preceding it, including zero.

    "1133*" matches 11 + one or more 3's + possibly other characters: 113, 1133, 111312, and so forth.
    
  • The dot -- . -- matches any one character, except a newline. [2]

    "13." matches 13 + at least one of any character (including a space): 1133, 11333, but not 13 (additional character missing).
    
  • The caret -- ^ -- matches the beginning of a line, but sometimes, depending on context, negates the meaning of a set of characters in an RE.

  • The dollar sign -- $ -- at the end of an RE matches the end of a line.

  • "^$" matches blank lines.

  • Brackets -- [...] -- enclose a set of characters to match in a single RE.

    • "[xyz]" matches the characters x, y, or z.

    • "[c-n]" matches any of the characters in the range c to n.

    • "[B-Pk-y]" matches any of the characters in the ranges B to P and k to y.

    • "[a-z0-9]" matches any lowercase letter or any digit.

    • "[^b-d]" matches all characters except those in the range b to d. This is an instance of ^ negating or inverting the meaning of the following RE (taking on a role similar to ! in a different context).

    • Combined sequences of bracketed characters match common word patterns. "[Yy][Ee][Ss]" matches yes, Yes, YES, yEs, and so forth. "[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]" matches any Social Security number.

  • The backslash -- \ -- escapes a special character, which means that character gets interpreted literally.

    • A "\$" reverts back to its literal meaning of "$", rather than its RE meaning of end-of-line. Likewise a "\\" has the literal meaning of "\".

    • Escaped "angle brackets" -- \<...\> -- mark word boundaries. The angle brackets must be escaped, since otherwise they have only their literal character meaning:

      "\<the\>" matches the word "the", but not the words "them", "there", "other", etc.
      
  • The question mark -- ? -- matches zero or one of the previous RE. It is generally used for matching single characters.

  • The plus -- + -- matches one or more of the previous RE. It serves a role similar to the *, but does not match zero occurrences.

  • Escaped "curly brackets" -- \{ \} -- indicate the number of occurrences of a preceding RE to match. It is necessary to escape the curly brackets since they have only their literal character meaning otherwise.

    "[0-9]\{5\}" matches exactly five digits (characters in the range of 0 to 9).
    
  • Parentheses -- ( ) -- enclose groups of REs. They are useful with the following "|" operator and in substring extraction using expr.

  • The -- | -- "or" RE operator matches any of a set of alternate characters.
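The operators listed above can be tried out with a short Python sketch. Python's `re` module is an assumption of this example, not the tool the list was written for, and it spells a few things differently: word boundaries are written `\b` rather than `\<...\>`, and curly brackets and parentheses are used unescaped. The `found` helper and the "colou?r" example are illustrative additions.

```python
import re


def found(pattern, text):
    """Return True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text) is not None


# Asterisk: "1133*" is 113 followed by zero or more 3's.
assert found(r"1133*", "113") and found(r"1133*", "111312")

# Dot: "13." needs one more character after 13.
assert found(r"13.", "1133") and not found(r"13.", "13")

# Anchors: "^$" matches only an empty (blank) line.
assert found(r"^$", "") and not found(r"^$", "text")

# Brackets, including a negated set.
assert found(r"[Yy][Ee][Ss]", "yEs")
assert found(r"[^b-d]", "a") and not found(r"[^b-d]", "c")

# Backslash: "\$" is a literal dollar sign.
assert found(r"\$", "cost: $5") and not found(r"\$", "cost: 5")

# Word boundaries: Python writes \b where the list uses \< and \>.
assert found(r"\bthe\b", "see the cat") and not found(r"\bthe\b", "other")

# Repetition count: exactly five digits (braces are not escaped in Python).
assert re.fullmatch(r"[0-9]{5}", "12345") and not re.fullmatch(r"[0-9]{5}", "1234")

# Question mark, plus, and the "or" operator inside parentheses.
assert re.fullmatch(r"colou?r", "color") and re.fullmatch(r"colou?r", "colour")
assert found(r"3+", "13") and not found(r"3+", "12")
assert re.fullmatch(r"(yes|no)", "no")
```

If every assertion holds, the script finishes silently; a failed assertion would point to a pattern that does not behave as described.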

What does the above tell us when we encounter a cryptic mod_rewrite directive that looks like the following?

RewriteEngine on
RewriteRule ^page1\.html$ page2.html [R=301,L]

Of course, the first line is easy: mod_rewrite is not enabled by default, so this line starts the "Rewrite Engine". The second directive is a "Rewrite Rule" that instructs mod_rewrite to translate whatever URL is matched by the regular expression "^page1\.html$" to "page2.html".

What URLs does the regular expression "^page1\.html$" match?

In this example, adapted from An Introduction to Redirecting URLs on an Apache Server, we have a caret at the beginning of the pattern, and a dollar sign at the end. These are regex special characters called anchors. The caret anchors the match to the start of the string, so the first character matched must be the one that immediately follows it, in this case a "p". The dollar sign anchor tells regex that this is the end of the string we want to match. For the URL "page1.html" itself, "page1\.html" and "^page1\.html$" are interchangeable: both match it. However, "page1\.html" also matches any string that contains "page1.html" anywhere in the URL ("apage1.html", for example), whereas "^page1\.html$" matches only a string that is exactly equal to "page1.html". In a more complex redirect, anchors (and other special regex characters) are often essential.
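The effect of the anchors can be checked with a small Python sketch. It uses Python's `re` module as a stand-in for the regex engine inside mod_rewrite, which is an assumption made for illustration only; the `matches` helper and the sample URLs other than "page1.html" and "apage1.html" are hypothetical.

```python
import re

unanchored = re.compile(r"page1\.html")
anchored = re.compile(r"^page1\.html$")


def matches(pattern, url):
    """Return True if the compiled pattern matches somewhere in `url`."""
    return pattern.search(url) is not None


# Print, for each URL, whether the unanchored and the anchored pattern match.
for url in ["page1.html", "apage1.html", "page1.html.bak"]:
    print(url, matches(unanchored, url), matches(anchored, url))
```

Only "page1.html" is accepted by both patterns; the other two URLs merely contain it, so they pass the unanchored pattern and fail the anchored one.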

Putting all the above together, we can see that "^page1\.html$" matches URLs that start (the caret -- ^ --) with "page1", immediately followed by a literal dot (escaped dot --\.--, as opposed to a simple dot, which is a metacharacter that matches any single character except newline), immediately followed by "html" and the end of the URL (dollar sign --$--).
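To see why the dot is escaped, the following Python sketch compares the pattern with and without the backslash. Python's `re` module is used here as an assumption for illustration, and "page1xhtml" is a hypothetical URL chosen to expose the difference.

```python
import re

escaped = re.compile(r"^page1\.html$")   # \. is a literal dot
unescaped = re.compile(r"^page1.html$")  # . matches any single character


def accepts(pattern, url):
    """Return True if the compiled pattern matches `url`."""
    return pattern.search(url) is not None


print(accepts(escaped, "page1.html"), accepts(unescaped, "page1.html"))
print(accepts(escaped, "page1xhtml"), accepts(unescaped, "page1xhtml"))
```

Both patterns accept "page1.html", but only the unescaped one also accepts "page1xhtml", which is almost never what a rewrite rule intends.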

In our example, we also have an "[R=301,L]". These are called flags in mod_rewrite, and they are optional parameters. "R=301" instructs Apache to redirect the client with a 301 (Moved Permanently) status code; when the code is omitted, as in [R,L], it defaults to 302. The "L" flag tells Apache that this is the last rule that it needs to process, IF the RewriteRule pattern is matched. Experts suggest that you get in the habit of including the "L" flag with every RewriteRule to avoid unpleasant surprises.
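The way the flag list is read can be sketched with a toy parser in Python. This is not Apache's code: `parse_flags` is a hypothetical helper that only understands the R and L flags discussed here, and it simply encodes the default of 302 described above.

```python
def parse_flags(flag_string):
    """Parse a flag list such as "[R=301,L]" (toy model: R and L only)."""
    flags = [f.strip() for f in flag_string.strip("[]").split(",") if f.strip()]
    result = {"redirect_status": None, "last": False}
    for flag in flags:
        name, _, value = flag.partition("=")
        if name == "R":
            # R without an explicit code defaults to 302.
            result["redirect_status"] = int(value) if value else 302
        elif name == "L":
            result["last"] = True
    return result


print(parse_flags("[R=301,L]"))
# -> {'redirect_status': 301, 'last': True}
print(parse_flags("[R,L]"))
# -> {'redirect_status': 302, 'last': True}
```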

One powerful option in creating search patterns is specifying that a subexpression that was matched earlier in a regular expression is matched again later in the expression. We do this using backreferences. Backreferences are named by the numbers 1 through 9, preceded by the backslash (escape) character when used in this manner. In mod_rewrite you have to use the dollar sign instead of the backslash, while in PHP you use the backslash; do not get confused, it simply depends on the context in which the regular expression is used. These backreferences refer to each successive group in the match pattern, as in /(one)(two)(three)/\1\2\3/ (or $1, $2 and $3 for mod_rewrite). Each numbered backreference refers to the group whose position corresponds to that number: \1 is "one", \2 is "two" and \3 is "three".
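Both uses of backreferences can be sketched in Python, whose `re` module writes them with a backslash, as PHP does. The module choice, the reordered replacement and the `has_doubled_word` helper are assumptions made for illustration.

```python
import re

# Replacement side: \1, \2, \3 refer to the three groups, in order.
# The groups are emitted in reverse here so that the effect is visible.
reordered = re.sub(r"(one)(two)(three)", r"\3\2\1", "onetwothree")
print(reordered)
# -> threetwoone

# Pattern side: \1 must repeat exactly what group 1 matched earlier.
doubled_word = re.compile(r"\b(\w+) \1\b")


def has_doubled_word(text):
    """Return True if some word is immediately repeated in `text`."""
    return doubled_word.search(text) is not None


print(has_doubled_word("this is is a test"), has_doubled_word("this is a test"))
# -> True False
```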

Thus the following URL translation:

# Your Account
RewriteRule ^userinfo-([a-zA-Z0-9_-]*)\.html modules.php?name=Your_Account&op=userinfo&username=$1

in the .htaccess file (Section 25.4) will match any URL that starts (caret --^--) with "userinfo-", immediately followed by any number (star --*--) of characters belonging to the alphanumeric class (a-z, A-Z, 0-9), including underscores (_) and dashes (-), followed by a literal dot (an escaped dot --\.--) and "html". The Rewrite Rule instructs mod_rewrite to translate the URL to

modules.php?name=Your_Account&op=userinfo&username=$1

where $1 is a backreference, referring to the first matched subexpression, the one inside the parentheses (). Since inside the parentheses is a regular expression that matches "any number of characters belonging to the alphanumeric class, including underscores and dashes", $1 will contain whatever alphanumeric characters were between "userinfo-" and ".html" (including underscores and dashes). In PHP-Nuke, this is the username, so that the URL returned by mod_rewrite will be

modules.php?name=Your_Account&op=userinfo&username=(some matched username)

thus completing the transformation of a static URL (that PHP-Nuke does not understand) into a dynamic one that makes perfect sense to PHP-Nuke (see Section 25.5.1.3 for the complete picture).
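The whole translation can be imitated in a few lines of Python. This is a sketch of the idea, not of how Apache implements it: Python's `re` module stands in for mod_rewrite, the backreference is written `\1` instead of `$1`, and the `rewrite` helper and the username "some_user" are hypothetical.

```python
import re

# Same pattern as in the RewriteRule; the target uses \1 where
# mod_rewrite uses $1.
USERINFO = re.compile(r"^userinfo-([a-zA-Z0-9_-]*)\.html")
TARGET = r"modules.php?name=Your_Account&op=userinfo&username=\1"


def rewrite(url):
    """Return the dynamic URL for a matching static URL, else `url` unchanged."""
    match = USERINFO.match(url)
    if match is None:
        return url
    # expand() fills the backreference with the text captured by group 1.
    return match.expand(TARGET)


print(rewrite("userinfo-some_user.html"))
# -> modules.php?name=Your_Account&op=userinfo&username=some_user
```

A URL that does not start with "userinfo-" is returned untouched, which mirrors a rule whose pattern does not match.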

Notes

[1]

The regular expression matches the HTML code for the text shown in Figure 25-4, where the capital letters A and C were enclosed in <i> tags. This makes it look more formidable than it actually is.


Chris Karakas, Maintainer PHP-Nuke HOWTO