Regular Expressions in VB
I sat down to write a function that takes a string (which, in fact, contains the complete code of an HTML page) and parses it to return the URLs of all pages, images, and CSS files referenced from that HTML. The given conditions were:
1. Pages may be linked using href="<url>"
2. There may or may not be quotes around the URL
3. The equals sign may have zero or more spaces on either or both sides
4. CSS/JS files or images may be linked using src="<url>"
5. Conditions 2-3 apply to these files and images too
6. CSS files may be pulled in using @import "<url>"
7. Condition 2 applies to the above
8. Inline CSS declarations may reference images using url("<url>")
9. Condition 2 applies to the above
That’s all…
Thanks to regular expressions, it took me only a few lines of code to implement all of this.
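Dim re As New RegExp
Dim m As Match
re.IgnoreCase = True
' VB strings do not treat the backslash as an escape character, so \s is written with a single backslash.
' The hyphen sits at the end of the character class so that it is a literal rather than a range,
' and = is listed explicitly so that query strings still match.
re.Pattern = "((src|href)\s*=\s*|@import\s*|url\s*\()""*\s*[a-z0-9/_%:.&?+=-]+\s*""*\s*\)*"
re.Global = True
For Each m In re.Execute(rtext)
    ' m contains the URL text along with the href/src/@import/url prefix.
    ' These may be filtered again to get a clean URL.
    ' The values may be returned in an array or as required, for example:
    ' MsgBox m
Next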
The code assumes that:
1. The project references "Microsoft VBScript Regular Expressions 5.5"
2. The variable rtext contains the HTML code to extract URLs from
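The filtering step mentioned in the code comments could look roughly like the sketch below. It is only one possible approach, not the original implementation: the function name ExtractUrls is hypothetical, the pattern is a reworked variant that puts the URL in a capture group, and it additionally assumes that single quotes and the @import url(...) form should be accepted, which the conditions above do not state. It relies on the same "Microsoft VBScript Regular Expressions 5.5" reference.

```vb
' Hypothetical helper (not from the original post): returns the bare URLs
' found in rtext, without the href/src/@import/url prefix or the quotes.
Public Function ExtractUrls(ByVal rtext As String) As Collection
    Dim re As RegExp
    Dim m As Match
    Dim urls As Collection

    Set re = New RegExp
    Set urls = New Collection

    re.IgnoreCase = True
    re.Global = True
    ' The prefixes are in non-capturing groups (?:...), so the only
    ' capture group is the URL itself, available as SubMatches(0).
    ' Assumption: a URL ends at a quote, whitespace, ">" or ")".
    re.Pattern = "(?:\b(?:src|href)\s*=|@import(?:\s*url\s*\()?|\burl\s*\()" & _
                 "\s*[""']?\s*([^""'\s>)]+)"

    For Each m In re.Execute(rtext)
        urls.Add m.SubMatches(0)
    Next

    Set ExtractUrls = urls
End Function
```

Called from any procedure, it can be tried out like this; for the sample string it should print page.html and then a.png in the Immediate window:

```vb
Dim u As Variant
For Each u In ExtractUrls("<a href = page.html><img src=""a.png"">")
    Debug.Print u
Next
```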


