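A search for regular expressions that turn URLs into links (href) returns plenty of results, but few of them work well.
I later found a very usable one on Matthew O'Riordan's blog. It is written in JavaScript and shown here in its annotated form: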
(
  ( // brackets covering match for protocol (optional) and domain
    ([A-Za-z]{3,9}:(?:\/\/)?) // match protocol, allow in format http:// or mailto:
    (?:[-;:&=+$,\w]+@)? // allow something@ for email addresses
    [A-Za-z0-9.-]+ // anything looking at all like a domain, non-unicode domains
    | // or instead of above
    (?:www\.|[-;:&=+$,\w]+@) // starting with something@ or www.
    [A-Za-z0-9.-]+ // anything looking at all like a domain
  )
  ( // brackets covering match for path, query string and anchor
    (?:\/[+~%\/.\w-]*)? // allow optional /path
    \??(?:[-+=&;%@.\w]*) // allow optional query string starting with ?
    #?(?:[.!\/\\\w]*) // allow optional anchor #anchor
  )? // make URL suffix optional
)
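To make the pattern above easier to try out, here is a minimal sketch that collapses it into a single-line JavaScript regex literal and uses it for replacement. The helper name `linkify`, the added `g` flag, and the choice to prefix bare `www.` matches with `http://` and bare addresses with `mailto:` are assumptions for illustration, not part of the original post. The sketch also assumes the input text is already HTML-safe.

```javascript
// Single-line form of the annotated pattern (comments and whitespace removed).
// The g flag is added here so that every match in the text is replaced.
const URL_PATTERN = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=+$,\w]+@)?[A-Za-z0-9.-]+|(?:www\.|[-;:&=+$,\w]+@)[A-Za-z0-9.-]+)((?:\/[+~%\/.\w-]*)?\??(?:[-+=&;%@.\w]*)#?(?:[.!\/\\\w]*))?)/g;

// linkify is a hypothetical helper name used only for this sketch.
function linkify(text) {
  // Capture group 3 is the protocol; it is undefined for "www." and "name@" matches.
  return text.replace(URL_PATTERN, (match, _whole, _origin, protocol) => {
    let href = match;
    if (!protocol) {
      // Assumption: bare www. hosts become http links, anything else is an email address.
      href = match.startsWith("www.") ? "http://" + match : "mailto:" + match;
    }
    return `<a href="${href}" target="_blank">${match}</a>`;
  });
}

console.log(linkify("see http://example.com/path?x=1 now"));
```

Because the protocol part accepts any 3 to 9 letters followed by a colon, ordinary text such as `time:12` would also match, which is one reason to narrow the pattern for a specific use case.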
For our use case (converting only URLs that start with http or https), we simplified it and rewrote it in C#:
using System;
using System.Text.RegularExpressions;

public static class ContentFormatter
{
    // With IgnorePatternWhitespace, an unescaped '#' starts a comment, so the literal '#' of the anchor is written as \#.
    private static readonly Regex Url_To_Link = new Regex(@"(?<url>
        (https?:(?://)?)          # match protocol, allow in format http:// or https://
        [A-Za-z0-9.-]+            # anything looking at all like a domain, non-unicode domains
        (                         # brackets covering match for path, query string and anchor
        (?:/[+~%/.\w-]*)?         # allow optional /path
        \??(?:[-+=&;%@.\w]*)      # allow optional query string starting with ?
        \#?(?:[.!/\\\w-]*)        # allow optional anchor #anchor
        )?                        # make URL suffix optional
        )",
        RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace,
        TimeSpan.FromMilliseconds(100));
    public static string UrlToLink(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;
        // ${url} refers to the named group in the pattern; the inner quotes are escaped.
        return Url_To_Link.Replace(text, "<a href=\"${url}\" target=\"_blank\">${url}</a>");
    }
}