erasePunctuation
Erase punctuation from text and documents
Syntax
newDocuments = erasePunctuation(documents)
newDocuments = erasePunctuation(documents,'TokenTypes',types)
Description
newDocuments = erasePunctuation(documents) erases punctuation and symbols from documents. If a word is empty after removing punctuation and symbol characters, then the function removes it. For tokenized document input, the function erases punctuation only from tokens with type 'punctuation' and 'other'. For example, the function does not erase punctuation and symbol characters from URLs and email addresses.
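The following is a minimal sketch of the default behavior for both input kinds. The sample text, URL, and email address are made up for illustration, and the sketch assumes that tokenizedDocument, with its default settings, assigns the URL and the email address their own token types rather than 'punctuation' or 'other'.

```matlab
% String input: punctuation and symbol characters are erased everywhere.
str = "it's one and/or two.";
newStr = erasePunctuation(str)

% Tokenized input: only tokens of type 'punctuation' and 'other' are affected,
% so the URL and email address tokens should keep their punctuation.
documents = tokenizedDocument("Visit https://www.mathworks.com, or email someone@example.com!");
newDocuments = erasePunctuation(documents)

% Inspect the remaining tokens and their types.
tdetails = tokenDetails(newDocuments)
```

Comparing the Type column returned by tokenDetails before and after the call is one way to check which tokens were changed or removed.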
newDocuments = erasePunctuation(documents,'TokenTypes',types) erases punctuation and symbols from only the specified token types.
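As a hedged illustration of the 'TokenTypes' option, the sketch below restricts erasure to one token type and then widens it to include hashtags. The sample sentence is made up, and the type name 'hashtag' is assumed to match the values that tokenDetails reports in its Type column for your documents.

```matlab
% Sample tokenized document containing a hashtag and ordinary punctuation.
documents = tokenizedDocument("Great results today! #text-analytics (more soon).");

% Erase punctuation only from tokens of type 'punctuation';
% tokens of type 'other' and the hashtag token are left as they are.
newDocuments1 = erasePunctuation(documents,'TokenTypes','punctuation')

% Also erase punctuation and symbol characters inside hashtag tokens.
newDocuments2 = erasePunctuation(documents,'TokenTypes',{'punctuation','hashtag'})
```

Checking tokenDetails(documents) first shows which type names actually occur in the input, which helps when choosing the value of types.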
Tips
For string input, erasePunctuation removes punctuation characters from URLs and HTML tags. This behavior can prevent the functions eraseTags, eraseURLs, and decodeHTMLEntities from working as expected. If you want to use these functions to preprocess your text, then use them before using erasePunctuation.
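The ordering described in this tip can be sketched as follows. The input string is invented for illustration, and the relative order of eraseTags, eraseURLs, and decodeHTMLEntities is one reasonable choice rather than a requirement; the point is only that erasePunctuation runs last.

```matlab
% Raw string containing an HTML tag, an HTML entity, and a URL.
str = "<p>Caf&eacute; menu: see https://example.com/menu today, it's great!</p>";

% Run the markup-aware functions first, while the tags, URL,
% and entity still have the punctuation these functions rely on.
str = eraseTags(str);           % remove HTML and XML tags
str = eraseURLs(str);           % remove HTTP and HTTPS URLs
str = decodeHTMLEntities(str);  % convert entities such as &eacute; to characters

% Erase the remaining punctuation as the final step.
newStr = erasePunctuation(str)
```

If erasePunctuation ran first, characters such as <, >, /, &, and ; would already be gone, so the later functions would have no tags, URLs, or entities left to recognize.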