
Quick References

The Email Example: PHP uses the mail() function to send email (we'll discuss mail protocols later). Its syntax is bool mail (string to, string subject, string message [, string additional_headers [, string additional_parameters]]);. Parameters to, subject, and message are required. The additional_headers include optional fields such as "From:", "Cc:", or "Reply-To:" (see RFC822: Standard for ARPA Internet Text Messages for W3 documentation).

Formatting matters! Text is made up of words, symbols, and numbers. Interpreting, scanning, or modifying text is a significant part of form-driven interactive web sites. This chapter's example uses information from a form to create an email. If the email addresses aren't right, the email won't work; an address has to have an @ sign and a dot, for example. String formatting and regular expressions are all about tearing apart information to examine how it is put together, sometimes a character at a time.

Whitespace trimming: A form will send exactly what the user types. Extra spaces at the beginning or end of a string should be trimmed off. Function trim() removes whitespace from the beginning and end of a string; by default this includes newlines and carriage returns (\n and \r), horizontal and vertical tabs (\t and \x0B), NUL bytes (\0), and simple spaces. You can also pass it a list of other characters to remove. Variations ltrim() and rtrim() remove whitespace from the left or right side only.
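The trimming functions above can be sketched as follows. This is a minimal illustration; the variable names and the sample form value are made up for the example.

```php
<?php
// Hypothetical raw form value with stray whitespace at both ends.
$raw = "  \tjane@example.com \n";

$both  = trim($raw);   // strips whitespace from both ends
$left  = ltrim($raw);  // strips the left side only
$right = rtrim($raw);  // strips the right side only

// Optional second argument: the list of characters to strip instead.
$unquoted = trim('"quoted"', '"');

var_dump($both, $left, $right, $unquoted);
```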

Preformatted line-breaks: Line breaks in information returned from a form's textarea, or in any other preformatted text, can be preserved by nl2br() (n-letter el-2br), the newline-to-break converter, which inserts an XHTML "<br />" tag before each newline character (coming from the textarea). This effect (used with echo) is similar to the HTML <pre> preformatted text tag, preserving line breaks.
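A short sketch of nl2br() at work, assuming a hypothetical two-line value submitted from a form:

```php
<?php
// Hypothetical multi-line text as it might arrive from a form.
$comment = "Line one\nLine two";

// nl2br() puts a <br /> tag in front of each newline; the newline itself stays.
$html = nl2br($comment);

echo $html;
```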

Formatted strings: Functions echo() and print() merely copy text as received to the output HTML stream (print also returns 1; see page 108). Functions sprintf() and printf() respectively return and print formatted strings. They take a format string and a set of mixed arguments as parameters; the string includes embedded conversion specifications, with syntax %['padding_character][-][width][.precision]type, that tell how to format, and the mixed arguments are the data to be substituted for the embedded conversions. Padding (optional) adds a leading character; I might want to add a leading zero to the zip code for Kingston if the form value was 2881, for instance. The data in the field is right-justified by default; adding the "-" to the conversion specification makes it left-justified. Width specifies how much room to leave for the variable that is going to be substituted, and precision (which begins with a decimal point) tells how many places to use after the decimal point. Type is a one-letter code, for example s for a string, d for a decimal integer, or f for a floating-point number.

Here's an example:
printf ("Hello, %s. Your credit card will be charged %.2f ", $firstname, $charge_total);

In the example, the two arguments follow the string, in the order in which they are substituted into the string. Since PHP 4.0.6, you can specify the order in which the arguments enter the string by placing an escaped number in the conversion, as in %2\... to indicate that the second argument should be substituted. If you already have the arguments in an array, use the variants vprintf() and vsprintf(), which take a string and the array as arguments.
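The formatting options described above can be tried out with sprintf(), which returns the string so it is easy to inspect. This is a sketch with made-up values; the format strings are in single quotes so that the $ in a numbered argument is not treated as the start of a variable.

```php
<?php
// Hypothetical values standing in for form data.
$firstname    = 'Pat';
$charge_total = 19.5;
$zip          = 2881;

// %s substitutes a string, %.2f a float with two decimal places.
$line = sprintf('Hello, %s. Your credit card will be charged %.2f', $firstname, $charge_total);

// Padding character 0, width 5: 2881 becomes 02881.
$zip_text = sprintf('%05d', $zip);

// "-" left-justifies the value in a 6-character field.
$left = sprintf('[%-6s]', 'ab');

// Numbered arguments: %2$s takes the second argument, %1$s the first.
$swapped = sprintf('%2$s, %1$s', 'first', 'second');

// vsprintf() takes the arguments as an array.
$from_array = vsprintf('%s is %d years old', ['Pat', 30]);

echo $line, "\n", $zip_text, "\n", $left, "\n", $swapped, "\n", $from_array, "\n";
```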

String case: You can make a string all uppercase (strtoupper()) or all lowercase (strtolower()). You can even capitalize only the first letter of the string (ucfirst()).
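For instance, with a made-up place name:

```php
<?php
$name = 'kingston, rhode island';   // hypothetical input

$upper = strtoupper($name);     // every letter uppercase
$lower = strtolower('HELLO');   // every letter lowercase
$first = ucfirst($name);        // only the first character of the string

echo $upper, "\n", $lower, "\n", $first, "\n";
```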

Trouble-making Strings in Databases (SQL-related): As we work with databases, we'll write instructions using Structured Query Language (SQL); we'll be referring to strings and variables, and often these contain single quotes or backslashes, which confuse SQL (which uses such characters as part of its own syntax). You have to mark these characters with escape sequences so the SQL engine will know what to do with them. These escapes include single quotes (\'), double quotes (\"), and backslashes (\\). Functions addslashes() and stripslashes() add and remove the single backslashes that form these escape sequences. (See also chapter 23.)
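A small round trip shows what the two functions do to a string containing a single quote. The name is a made-up example, and this only illustrates the escaping itself, not a complete approach to building queries.

```php
<?php
$name = "O'Brien";   // hypothetical form value containing a single quote

// addslashes() puts a backslash before quotes, backslashes, and NUL bytes.
$escaped = addslashes($name);

// stripslashes() removes those backslashes again.
$restored = stripslashes($escaped);

echo $escaped, "\n", $restored, "\n";
```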

Splitting or Joining Strings: Function explode() creates an array from a string, using a specified delimiter. This allows you to unravel a string of comma- or tab-delimited data, a common output from spreadsheets, for example. Syntax is array explode (separator, string); an example is $my_array = explode(',', $my_string);. The opposite is implode(), which is the same as join().
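A brief sketch of splitting and rejoining, using a made-up comma-delimited value:

```php
<?php
$my_string = 'red,green,blue';   // hypothetical comma-delimited data

// Split on the comma into an array of three strings.
$my_array = explode(',', $my_string);

// Join the pieces back together with a different separator.
$rejoined = implode(' | ', $my_array);

// join() is an alias of implode() and gives the same result.
$same = join(' | ', $my_array);

print_r($my_array);
echo $rejoined, "\n";
```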

Function strtok() splits a string into tokens based on a character or string of characters. Syntax is string strtok(string, separator); an example is $my_word = strtok ($my_sentence, ' ');, which could be used to peel off a word in a sentence. Notice that this will produce a single string, returning only the piece that precedes the separator (here, the space). Page 115 illustrates the peculiar nature of using this to parse (break into pieces) a string by repeatedly invoking the function; after the first use, subsequent calls do not include the string itself, but only the separator. PHP keeps a pointer in the string, advancing it with each call; re-including the string in the call will reset the pointer to the beginning of the string.
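The repeated-call pattern described above might look like this; the sentence and variable names are illustrative only.

```php
<?php
$my_sentence = 'The quick brown fox';   // hypothetical input
$words = [];

// First call: pass both the string and the separator.
$token = strtok($my_sentence, ' ');

// strtok() returns false once no tokens are left.
while ($token !== false) {
    $words[] = $token;
    // Later calls: pass only the separator so PHP continues from its pointer.
    $token = strtok(' ');
}

print_r($words);
```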

Function substr() returns the part of a string between specified start and end points. Syntax is string substr(string, int start [, int length]); an example is echo substr($my_string, 5, -9);, which would return the characters from the 6th (the first character is position 0) up to, but not including, the last 9 characters of the string. See page 116 for more examples.
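To make the offsets concrete, here is a sketch using the alphabet as a hypothetical input, so each position is easy to count:

```php
<?php
$my_string = 'abcdefghijklmnopqrstuvwxyz';   // 26 characters, positions 0 to 25

$head   = substr($my_string, 0, 3);    // 3 characters starting at position 0
$tail   = substr($my_string, -3);      // negative start counts from the end
$middle = substr($my_string, 5, -9);   // from position 5, leaving off the last 9

echo $head, "\n", $tail, "\n", $middle, "\n";
```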

String Comparison: We could compare strings using the comparison operator (==), but PHP functions can do more for us. Function strcmp() compares two strings and returns an integer: 0 if they are identical, a positive number if the first string would come after the second in a dictionary listing, and a negative number if it would come before. It is case sensitive (remember that upper case letters come before lower case). Function strcasecmp() performs the same comparison, also returning an integer, but it is not case sensitive.

Functions strnatcmp() and case-insensitive strnatcasecmp() compare strings in a more "natural order," recognizing that 2 is less than 12, for instance.
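The comparison functions can be contrasted with a few made-up strings. Only the sign of a nonzero result is meaningful, since the exact value differs between PHP versions.

```php
<?php
$same   = strcmp('apple', 'apple');       // 0: identical
$after  = strcmp('banana', 'apple');      // positive: "banana" sorts after "apple"
$case   = strcmp('Apple', 'apple');       // negative: "A" sorts before "a"
$nocase = strcasecmp('Apple', 'apple');   // 0: case is ignored

// Character-by-character order puts "img12" before "img2" ...
$plain   = strcmp('img12', 'img2');       // negative
// ... while natural order compares 12 with 2 as numbers.
$natural = strnatcmp('img12', 'img2');    // positive

var_dump($same, $after, $case, $nocase, $plain, $natural);
```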

String length: Function strlen() returns an integer indicating length of the string.

String Matching

Finding Strings: You could use string tokens and comparison operators to search for words or phrases in a string, but there are better ways, using string-matching and regular expression-matching functions.

Strings within Strings: The generic function strstr() looks for the needle in the haystack, using syntax string strstr(string haystack, string needle); the function returns either false (needle not found) or the rest of haystack beginning at the first occurrence of the needle (the needle itself is included). Variant stristr() is case insensitive but otherwise the same as strstr(); strrchr() is like strstr() but returns the haystack from the last occurrence of the needle.
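A quick sketch of the three functions, with illustrative inputs:

```php
<?php
$email = 'jane@example.com';   // hypothetical address

$domain  = strstr($email, '@');   // from the first "@" to the end, "@" included
$missing = strstr($email, '#');   // false: needle not found

$nocase = stristr('Hello World', 'WORLD');   // case-insensitive search

$last = strrchr('a/b/c.txt', '/');   // from the last "/" to the end

var_dump($domain, $missing, $nocase, $last);
```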

To find the position of a string within a string, strpos() and strrpos() resemble strstr() but return the numerical position of the needle within the haystack (or false if the needle isn't there—see page 119 about the need to test for false using the identity operator (===) to distinguish from a numerical 0 return, as when the needle is found at the first character); strpos() runs faster than strstr(), making it a preferred way to look for the needle. The integer returned is the first position of needle; an optional third parameter allows you to start the search a specified number of characters into the string. The position of the last occurrence of needle is returned by function strrpos().
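The need for the identity operator shows up as soon as the needle sits at position 0. A minimal sketch with a made-up haystack:

```php
<?php
$haystack = 'Hello world';

$pos_h  = strpos($haystack, 'H');      // 0: found at the very first character
$pos_o  = strpos($haystack, 'o');      // 4: first "o"
$next_o = strpos($haystack, 'o', 5);   // 7: search starts at position 5
$last_o = strrpos($haystack, 'o');     // 7: last "o"
$none   = strpos($haystack, 'z');      // false: not found

// A loose comparison treats position 0 as "not found" ...
$found_loose = ($pos_h != false);
// ... while the identity comparison tells 0 and false apart.
$found_strict = ($pos_h !== false);

var_dump($found_loose, $found_strict);
```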

Replacing Strings: You can search a string and substitute one sub-string for another using str_replace(). Syntax is mixed str_replace(mixed needle, mixed replacement_needle, mixed haystack [, int &count]); count indicates the number of replacements made.
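For example, with made-up text; the second call shows why the parameters are typed as mixed, since arrays of needles and replacements are accepted too.

```php
<?php
$text = 'The cat sat on the cat mat';   // hypothetical input

// Replace every "cat" with "dog"; $count receives the number of replacements.
$result = str_replace('cat', 'dog', $text, $count);

// Arrays pair each needle with the replacement at the same index.
$paired = str_replace(['red', 'blue'], ['warm', 'cool'], 'red and blue');

echo $result, " ($count replacements)\n", $paired, "\n";
```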

Function substr_replace() replaces part of a string based on position. Syntax is string substr_replace (string string, string replacement, int start [, int length]); Parameter start is an offset into the string marking where replacement should begin; zero or positive is an offset from the beginning of the string and negative is an offset from the end. Optional length gives the number of characters to replace; if omitted, everything from start to the end of the string is replaced. Example: echo substr_replace('Hello world!', 'Joe!', 6); prints Hello Joe!
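A few variations on the start and length parameters, as a sketch:

```php
<?php
$greeting = 'Hello world!';

// No length: everything from position 6 to the end is replaced.
$a = substr_replace($greeting, 'Joe!', 6);

// Length 5: only the five characters of "world" are replaced.
$b = substr_replace($greeting, 'there', 6, 5);

// Negative start: counts back from the end of the string.
$c = substr_replace($greeting, '?', -1);

// Length 0: nothing is removed, so the replacement is inserted.
$d = substr_replace($greeting, 'big ', 6, 0);

echo $a, "\n", $b, "\n", $c, "\n", $d, "\n";
```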

Introduction to Regular Expressions

Matching Patterns: If I want to check a value to see whether it is an email address, a telephone number, or a social security number, there are certain patterns that I look for, regardless of the words in the address or the values in the number. That is, I recognize 123-45-6789 as the pattern of a social security number, whether it is an actual number or not. Patterns include characters that may occur at the beginning or end of a string, in repeated groups, or in characters of a specific type (numbers, upper case letters, etc.). Regular expressions are tools that can help us look for patterns.

Character Sets and Classes: Character sets are used to match any character of a particular type. The dot ( . ) character can be a wildcard to represent any other single character except a newline (\n) character. You can be more specific, limiting a character to be any lower case letter by inserting (in brackets) [a-z], any upper case letter [A-Z], any letter at all [a-zA-Z], or any list of letters [aeiou]; note that the latter is a list of possible single characters, not a string of 5. You can negate the character list by placing a caret inside of the brackets; i.e., [^a-z] excludes any lower case letter.

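Similarly, you can use several predefined character classes inside brackets, such as [[:alnum:]] (letters and digits), [[:alpha:]] (letters), [[:digit:]] (decimal digits), and [[:space:]] (whitespace).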

Repetition: Embedded in a regular expression, directly after whatever part of the expression it applies to, * means that the pattern can be repeated zero or more times and + means it can be repeated one or more times.

Subexpressions: Expressions can be grouped into subexpressions using parentheses, so that a repetition applies to the whole group. You can specify how many times something can be repeated using a number in curly braces, e.g., {3}, or {2,3}, which means repeated 2 or 3 times, or {3,}, which is open-ended and means "at least 3 repetitions".

Use the caret (^) at the start of a regular expression to mean that it must appear at the beginning of a searched string, and $ at the end of a regular expression to mean that it must appear at the end of the searched string. For example, ^H searches for an uppercase H as the first character in a string, ing$ looks for "ing" at the end of the string, and ^Hello$ looks for the capitalized word Hello and nothing else. A vertical pipe ( | ) separates options in a search, as in cat|dog (written without spaces, since a space in a pattern is matched literally).
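The pattern pieces above can be tried out with a short script. One caveat: this sketch uses preg_match() rather than the ereg() family covered in these notes, because ereg() was removed in PHP 7.0. preg_match() requires delimiters around the pattern (the slashes here), but the character sets, repetition, anchors, and alternation shown behave the same way for these simple cases. The test strings are made up.

```php
<?php
// Anchors: ^ ties the match to the start, $ to the end.
$starts_h = preg_match('/^H/', 'Hello');
$ends_ing = preg_match('/ing$/', 'testing');
$exact    = preg_match('/^Hello$/', 'Hello there');   // extra text, so no match

// Character sets: a list of possible single characters, or its negation.
$vowel     = preg_match('/[aeiou]/', 'rhythm');   // no vowel present
$not_lower = preg_match('/[^a-z]/', 'abcD');      // "D" is not a lowercase letter

// Counted repetition: the social security number pattern 123-45-6789.
$ssn = preg_match('/^[0-9]{3}-[0-9]{2}-[0-9]{4}$/', '123-45-6789');

// Alternation, and + applied to a parenthesized subexpression.
$pet    = preg_match('/cat|dog/', 'hot dog');
$repeat = preg_match('/^(very )+large$/', 'very very large');

// preg_match() returns 1 for a match and 0 for no match.
var_dump($starts_h, $ends_ing, $exact, $vowel, $not_lower, $ssn, $pet, $repeat);
```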

Trouble-makers:

Regular expression characters are included in table 4.4 (page 125):

Here are a few practice exercises using these characters.

Functions that Use Regular Expressions: Functions ereg() and eregi() use regular expressions to search for match patterns. Syntax for ereg() is int ereg (string pattern, string search, array [matches]); eregi() is identical but not case sensitive.

Function ereg_replace() and case-insensitive eregi_replace() search and replace substrings. Syntax is string ereg_replace (string pattern, string replacement, string search); where pattern is a regular expression in the search string, to be replaced by replacement.

Function split() splits the string into substrings wherever the pattern matches. Syntax is array split (string pattern, string search [, int max]);. This returns an array containing at most max substrings. It is useful in splitting up email addresses, domain names, or dates. See examples on page 127.
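Since ereg(), ereg_replace(), and split() were removed in PHP 7.0, the following sketch uses their PCRE counterparts preg_match(), preg_replace(), and preg_split() to show the same three operations. The argument order matches the syntaxes given above; the patterns need delimiters, and the sample values are made up.

```php
<?php
$email = 'jane@example.com';   // hypothetical address

// Match with subexpressions: $matches[1] and $matches[2] hold the two groups.
$found = preg_match('/^([^@]+)@(.+)$/', $email, $matches);

// Search and replace: every digit becomes "#".
$masked = preg_replace('/[0-9]/', '#', 'Call 555-1234');

// Split wherever the pattern matches (here, at a dot or an @ sign).
$parts = preg_split('/[.@]/', $email);

// The optional third argument caps the number of pieces returned.
$date = preg_split('/-/', '2004-10-15', 2);

print_r($matches);
echo $masked, "\n";
print_r($parts);
print_r($date);
```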

References