WebFund 2014W Lecture 20
The video from the lecture given on March 26, 2014 is available:
- Small from screencast.com: http://www.screencast.com/t/PTqsRvHFX9Gd
- Large from screencast.com: http://www.screencast.com/t/i5kHY4YQKzd
- Original from CUOL: http://dl.cuol.ca/capture/Anil.Somayaji/COMP_2406_Lecture_20_-_20140326_142525_27.mp4
Latest revision as of 20:28, 26 March 2014
Notes
Basics of Regular Expressions
- a pattern starts and ends with /, e.g. /abc/
- . represents any single character
- * is 0 or more repeats, + is one or more repeats
- Thus .* matches any number of characters (including none)
- () denote groups, normally for extraction or later substitution
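- Each group is numbered from left to right, so the first () is $1 (in JavaScript, $1 in a replacement string, or index 1 of the match array)
- can include character ranges in [], e.g. [a-z]
- An all-lowercase word with at least one character is matched by /[a-z]+/ (anchor it as /^[a-z]+$/ to require that the whole string be such a word)
- | means or (as usual); "and" (concatenation) is implicit, so ab means a followed by b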
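The basics above can be tried out in JavaScript. This is a minimal sketch: the sample strings and variable names are made up for illustration, and only the patterns come from the notes.

```javascript
// Hypothetical sample strings; only the patterns follow the notes.
const word = /[a-z]+/;       // one or more lowercase letters
const anything = /.*/;       // any number of characters, including none

console.log(word.test("hello"));   // true
console.log(word.test("12345"));   // false: no lowercase letter anywhere
console.log(anything.test(""));    // true: .* also matches the empty string

// () captures a group; groups are numbered from 1, left to right.
const pair = /([a-z]+)=([a-z]+)/;
const m = "colour=red".match(pair);
console.log(m[1], m[2]);           // colour red

// In a replacement string, $1 and $2 refer to the captured groups.
const swapped = "colour=red".replace(pair, "$2=$1");
console.log(swapped);              // red=colour

// | means "or".
const pet = /cat|dog/;
console.log(pet.test("hotdog"));   // true: "dog" appears in the string
```

Note that test() succeeds if the pattern matches anywhere in the string, which is why "hotdog" passes.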
Apparently there are regular expression decoders online somewhere
Escaped characters
- \ is used to treat special characters as literals
- in URLs, % followed by two hex digits denotes a character code, e.g. %20 is a space
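A short sketch of both kinds of escaping, assuming the % notation in the notes refers to URL percent-encoding (the notes do not say so explicitly). The sample strings are hypothetical.

```javascript
// Unescaped, . matches any single character.
const loose = /^a.c$/;
// Escaped with \, the dot matches only a literal ".".
const strict = /^a\.c$/;

console.log(loose.test("abc"));    // true
console.log(strict.test("abc"));   // false
console.log(strict.test("a.c"));   // true

// Percent-encoding: % followed by two hex digits is a character code.
const decoded = decodeURIComponent("hello%20world%21");
console.log(decoded);              // hello world!

// Encoding goes the other way: & is character code 0x26.
const encoded = encodeURIComponent("a&b");
console.log(encoded);              // a%26b
```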
Chomsky hierarchy
Regular expressions are used for input sanitization
- make sure externally provided input is "safe"
- two strategies: whitelists and blacklists
  - whitelist: explicit list of "good" things
  - blacklist: explicit list of "bad" things
- use whitelists when possible
- whitelists are more work: if you leave anything out, legitimate input is rejected and things break
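The two strategies can be sketched as follows. The function names, the username rule, and the list of "bad" characters are all assumptions made for illustration, not something given in the lecture.

```javascript
// Whitelist (hypothetical rule): a username is one or more lowercase
// letters or digits, and nothing else. ^ and $ anchor the whole string.
function isValidUsername(input) {
  return /^[a-z0-9]+$/.test(input);
}

// Blacklist (hypothetical rule): reject input containing any of a few
// characters known to be dangerous in HTML.
function passesBlacklist(input) {
  return !/[<>"']/.test(input);
}

console.log(isValidUsername("alice42"));               // true
console.log(isValidUsername("<script>"));              // false
console.log(passesBlacklist("<script>"));              // false

// The blacklist only stops what its author thought of.
console.log(passesBlacklist("javascript:alert(1)"));   // true (slips through)
console.log(isValidUsername("javascript:alert(1)"));   // false

// The whitelist breaks legitimate input that was left out of the list.
console.log(isValidUsername("anne-marie"));            // false: "-" not listed
```

The last two cases show the trade-off from the notes: the blacklist fails open on input nobody anticipated, while the whitelist fails closed and needs more work to cover every legitimate case.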