Jalaj P. Jha

Technical & Miscellaneous Ramblings

Regular Expressions in VB


I sat down to write a function that takes a string (containing the complete HTML source of a page) and parses it to return the URLs of all pages, images and CSS files referenced from that HTML. The conditions were:

1. Pages may be linked using href="<url>"
2. There may or may not be quotes around the URL
3. The equals sign may have one or more spaces on either side, on both sides, or none at all
4. CSS/JS files or images may be linked using src="<url>"
5. Conditions 2-3 apply to these files and images too
6. CSS files may be included using @import "<url>"
7. Condition 2 applies to the above
8. Inline CSS declarations may reference images using url("<url>")
9. Condition 2 applies to the above
That's all…

Thanks to regular expressions, it took me only a few lines of code to implement all of this.

Dim re As New RegExp
Dim m As Match

re.IgnoreCase = True
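' VB string literals have no backslash escapes, so each regex backslash is written once;
' "" inside the string stands for a single double-quote character.
' The hyphen sits last in the character class so that it is a literal, not a range.
re.Pattern = "((src|href)\s*=\s*|@import\s*|url\s*\()""*\s*[a-z0-9/_%:.&?+=-]+\s*""*\s*\)*"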
re.Global = True

For Each m In re.Execute(rtext)
    ' m.Value contains the URL text along with its href/src/@import/url( prefix.
    ' It may be filtered again to get the clean URL.
    ' The values may be returned in an array or as required. For example:
    ' MsgBox m.Value
Next

The code assumes that:
1. The project references "Microsoft VBScript Regular Expressions 5.5"
2. The variable rtext contains the HTML code to extract URLs from
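
The comments in the loop mention filtering each match down to a clean URL. One way to do that, as a minimal sketch rather than the code actually used here, is to wrap the URL part of the pattern in its own pair of parentheses and read it from the match's SubMatches collection. The names ExtractUrls and DemoExtractUrls, the Collection return type and the slightly adjusted pattern are assumptions made for illustration.

```vb
' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5".
' ExtractUrls is a hypothetical helper name, not part of the original post.
Public Function ExtractUrls(ByVal rtext As String) As Collection
    Dim re As New RegExp
    Dim m As Match
    Dim urls As New Collection

    re.IgnoreCase = True
    re.Global = True
    ' Parentheses 1 and 2 cover the prefix (href=, src=, @import, url( ).
    ' Parentheses 3 cover only the URL characters.
    re.Pattern = "((src|href)\s*=\s*|@import\s*|url\s*\(\s*)""*\s*([a-z0-9/_%:.&?+=-]+)"

    For Each m In re.Execute(rtext)
        ' SubMatches is zero-based, so index 2 is the third pair of parentheses.
        urls.Add m.SubMatches(2)
    Next

    Set ExtractUrls = urls
End Function

' Hypothetical usage with a small inline HTML fragment.
Public Sub DemoExtractUrls()
    Dim u As Variant
    For Each u In ExtractUrls("<a href = ""page.html"">x</a><img src=logo.png>")
        Debug.Print u   ' prints page.html, then logo.png
    Next
End Sub
```

Because the surrounding quotes and the closing bracket fall outside the third pair of parentheses, no further string trimming is needed. This sketch does not treat the combined form @import url(...) specially.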

Written by Jalaj

December 17, 2006 at 6:23 am
