IT Anawer: How can I write a regular expression to capture links with no link text?

How can I write a regular expression to replace links with no link text like this:

<a href="http://www.somesite.com"></a>

with

<a href="http://www.somesite.com">http://www.somesite.com</a>

This is what I was trying to do to capture the matches, and it isn't catching any. What am I doing wrong?

string pattern = "<a\\s+href\\s*=\\s*\"(?<href>.*)\">\\s*</a>";

From stackoverflow

I could be wrong, but I think you simply need to change the quantifier within the href group to be lazy rather than greedy.
```
string pattern = @"<a\s+href\s*=\s*""(?<href>.*?)"">\s*</a>";
```
(I've also changed the type of the string literal to use @, for better readability.)

The rest of the regex appears fine to me. That you're not capturing any matches at all makes me think otherwise, but there could be a problem in the rest of the code (or even the input data - have you verified that?).
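To illustrate the difference, here is a minimal sketch. The input string, the class name `EmptyLinkDemo`, and the helper `FillEmptyLinks` are made up for this example; only the two patterns come from the question and the answer above. With two empty links on one line, the greedy version swallows both into a single match, while the lazy version finds each one.

```csharp
using System;
using System.Text.RegularExpressions;

class EmptyLinkDemo
{
    // Lazy pattern from the answer above.
    const string Lazy = @"<a\s+href\s*=\s*""(?<href>.*?)"">\s*</a>";
    // Greedy pattern from the question, written as a verbatim string.
    const string Greedy = @"<a\s+href\s*=\s*""(?<href>.*)"">\s*</a>";

    // Hypothetical helper: copies the href into the empty link body.
    public static string FillEmptyLinks(string html)
    {
        return Regex.Replace(html, Lazy, @"<a href=""${href}"">${href}</a>");
    }

    static void Main()
    {
        // Made-up input: two empty links on the same line.
        string html = @"<a href=""http://a.example""></a> <a href=""http://b.example""></a>";

        Console.WriteLine(Regex.Matches(html, Greedy).Count); // 1 (one match spanning both links)
        Console.WriteLine(Regex.Matches(html, Lazy).Count);   // 2
        Console.WriteLine(FillEmptyLinks(html));
    }
}
```

Note that a lazy `.*?` can still run past the closing quote when a link with text precedes an empty one, so `[^""]*` in place of `.*?` is probably the safer choice for the href group.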

I would suggest
```
string pattern = "(<a\\b[^>]*href=\"([^\"]+)\"[^>]*>)[\\s\\r\\n]*(</a>)";
```
This way, links whose href attribute is not the first attribute in the tag are captured as well.

Replace with
```
"$1$2$3"
```
The usual word of warning: HTML and regex are essentially incompatible. Use with caution, this might blow up.
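Put together, the pattern and replacement above might be used as in the following sketch. The class name `HrefAnywhereDemo`, the helper `FillEmptyLinks`, and the sample input are assumptions made for illustration; the pattern and the `$1$2$3` replacement are the ones from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

class HrefAnywhereDemo
{
    // Pattern from the answer: group 1 is the opening tag, group 2 the href value,
    // group 3 the closing tag.
    const string Pattern = "(<a\\b[^>]*href=\"([^\"]+)\"[^>]*>)[\\s\\r\\n]*(</a>)";

    // Hypothetical helper: re-emits the opening tag, then the href as link text,
    // then the closing tag.
    public static string FillEmptyLinks(string html)
    {
        return Regex.Replace(html, Pattern, "$1$2$3");
    }

    static void Main()
    {
        // Made-up input where href is not the first attribute.
        string html = "<a class=\"x\" href=\"http://www.somesite.com\"></a>";
        Console.WriteLine(FillEmptyLinks(html));
        // <a class="x" href="http://www.somesite.com">http://www.somesite.com</a>
    }
}
```

Links that already have text are left untouched, because only whitespace is allowed between the opening and closing tags.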

I wouldn't use a regex. I'd use the Html Agility Pack, with a query like:
```
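// Assumes doc is an already loaded HtmlAgilityPack.HtmlDocument.
var links = doc.DocumentNode.SelectNodes("//a[.='']");
if (links != null) // SelectNodes returns null when nothing matches
    foreach (HtmlNode link in links)
        link.InnerHtml = link.GetAttributeValue("href", "");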
```
womp : +1 for my daily dose of learning something new.

Tomalak : +1 for avoiding regex shallows.

Marc Gravell has the right answer: regexes are fundamentally bad at parsing HTML. See the question "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for the reasons, and "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
