Hello, I am parsing some emails. Mobile Mail, the iPhone and (I assume) the iPod touch append the signature as a separate MIME boundary, which makes it simple to remove. Not all mail clients do this; many just use '--' as a signature delimiter.
I need to chop off the '--' (and the signature after it) from a string, but only at its last occurrence.
Sample copy:
hello, this is some email copy-- check this out
--
Tom Foolery
I thought about splitting on '--' and removing the last part, which would give me what I want, but neither explode() nor split() seems to return a value that tells me whether it actually did anything when there is no match.
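For what it's worth, explode() does report a miss: when a non-empty delimiter is not found, it returns a one-element array holding the whole string, so count() tells you whether anything was split. (split() was deprecated in PHP 5.3 and removed in PHP 7, so explode() or preg_split() are the ones to reach for.) A minimal sketch of the split-and-drop-the-last-part idea; drop_last_segment() is a hypothetical helper, and "\n--" as the delimiter is an assumption that keeps "copy--" in the middle of a line from counting.

```php
<?php
// Hypothetical helper: drop everything after the LAST occurrence of $delim.
// Assumes line endings are already normalized to "\n".
function drop_last_segment(string $body, string $delim = "\n--"): string
{
    $parts = explode($delim, $body);
    // One element means the delimiter was never found: nothing to remove.
    if (count($parts) === 1) {
        return $body;
    }
    array_pop($parts);               // discard the text after the last delimiter
    return implode($delim, $parts);  // re-join any earlier segments unchanged
}
```

Because only the final segment is popped, any earlier "\n--" lines in the body survive the re-join.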
I cannot get preg_replace to match across more than one line. I have standardized all line endings to \n.
What is the best way to end up with "hello, this is some email copy-- check this out", bearing in mind that some emails will have no signature, and that there will of course be cases I cannot cover?
-
Actually, the correct signature delimiter is "-- \n" (note the space before the newline), so the delimiter regexp should be '^-- $'. You might also consider using '^--\s*$', so that it also works with OE (Outlook Express), which gets it wrong.
John Saunders : I was unaware there was a standard for signature format. Can you cite?
vartec : RFC3676 section 4.3
Tomalak : Which would be http://tools.ietf.org/html/rfc3676#section-4.3. As the RFC states, it's more a widely accepted convention than a real standard.
Kibbee : Good information, but I highly doubt that you could expect it to be consistent.
vartec : @Kibbee: most mailers follow this RFC. Some (e.g. OE) strip *all* trailing whitespace; '^--\s*$' works in both cases.
vartec : @scott: true, but then there's nothing that can be done about signatures that don't comply.
-
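To use the '^-- $' / '^--\s*$' delimiter from the RFC 3676 answer above in PHP, the pattern needs the /m modifier, so that ^ and $ match at every line break instead of only at the ends of the string. A minimal sketch that cuts the body at the last delimiter line; strip_signature() is a hypothetical name, and [ \t]* is a slight variation on '\s*' (assumed here) that keeps the match from running onto the next line.

```php
<?php
// Hypothetical helper: remove the last RFC 3676 style signature block.
// Assumes line endings are already normalized to "\n".
function strip_signature(string $body): string
{
    // /m: ^ and $ match at each line boundary.
    // [ \t]* accepts both "-- " (RFC) and "--" (clients that strip trailing spaces).
    if (!preg_match_all('/^--[ \t]*$/m', $body, $matches, PREG_OFFSET_CAPTURE)) {
        return $body; // no delimiter line (or a PCRE error): leave the body as-is
    }
    // Each match is [matched text, byte offset]; take the last one.
    $last = end($matches[0]);
    // Keep everything before the delimiter line, minus trailing blank lines.
    return rtrim(substr($body, 0, $last[1]), "\n");
}
```

In the sample, "copy-- check this out" is left alone because its "--" is not on a line of its own; only the bare "--" line before "Tom Foolery" counts.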
Try this:
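preg_replace('/--[\r\n]+.*/s', '', $body)
This will remove everything from the first occurrence of -- followed by one or more line break characters. If you just want to remove the last occurrence, use preg_replace('/^(.*)--[\r\n]+.*$/s', '$1', $body) instead: the greedy (.*) backtracks only as far as the last match, and '$1' puts the text before it back.
Piskvor : Just to clarify: the final /s makes the regex treat the whole string as a [S]ingle line, so . also matches line breaks.
-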
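A runnable sketch of the preg_replace approach above, with both variants wrapped in hypothetical helpers, cut_at_first() and cut_at_last(). The /s modifier lets . match line breaks so the trailing .* swallows the signature; for the last occurrence, a greedy (.*) prefix is captured and restored with '$1'.

```php
<?php
// Hypothetical helper: cut at the FIRST "--" directly followed by a line break.
function cut_at_first(string $body): string
{
    // /s: "." also matches "\n", so ".*" runs to the end of the body.
    // preg_replace returns null on a PCRE error; fall back to the input then.
    return preg_replace('/--[\r\n]+.*/s', '', $body) ?? $body;
}

// Hypothetical helper: cut at the LAST "--" directly followed by a line break.
function cut_at_last(string $body): string
{
    // The greedy (.*) grabs as much as possible and backtracks to the final
    // "--" + line break; "$1" writes that prefix back into the result.
    return preg_replace('/^(.*)--[\r\n]+.*$/s', '$1', $body) ?? $body;
}
```

Unlike the RFC 3676 delimiter, this pattern does not require -- to start its own line, so a line that merely ends in "--" also counts as a cut point. Both helpers leave the line break before the -- in place.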
Instead of just chopping off everything after --, could you not cache the last few emails sent by that user or service and compare them? The part at the bottom that matches the others can then be safely removed, leaving the actual message intact.
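The caching idea above comes down to finding the run of trailing lines that a new message shares with an earlier one from the same sender. A minimal line-based sketch, assuming a single cached previous message and "\n" line endings; strip_common_tail() is a hypothetical name, and how messages are cached is left out.

```php
<?php
// Hypothetical helper: strip the trailing lines $body shares with $previous.
// Assumes "\n" line endings; compares whole lines only.
function strip_common_tail(string $body, string $previous): string
{
    $lines = explode("\n", rtrim($body, "\n"));
    $prev  = explode("\n", rtrim($previous, "\n"));
    $i = count($lines) - 1;
    $j = count($prev) - 1;
    // Walk backwards while both messages end with identical lines.
    while ($i >= 0 && $j >= 0 && $lines[$i] === $prev[$j]) {
        $i--;
        $j--;
    }
    // Keep only the lines before the shared tail.
    return implode("\n", array_slice($lines, 0, $i + 1));
}
```

With several cached messages the same comparison could be run against each one. A shared tail is only a heuristic, though: if the new message happens to end with exactly the same text as an old one, that text is stripped too.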
-
I think, in the interest of being more bulletproof, I will take the non-regex route:
$pos = strrpos($body, "\n--"); // false when the body has no signature
echo $pos === false ? $body : substr($body, 0, $pos);