Rule based search

Overview

Camphr provides three rule-based matching pipelines: PatternSearcher, RegexRuler, and MultipleRegexRuler. These pipelines are character-based: they match against the raw text rather than against tokens. This makes them more robust to tokenization differences, but they can be more susceptible to false positives than spaCy's token-based pipelines Matcher and PhraseMatcher.
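The difference between character-based and token-based matching can be illustrated without spaCy or Camphr. The sketch below uses only the standard `re` module, and a plain whitespace split stands in for a real tokenizer (an assumption made for illustration; spaCy's tokenizer is more sophisticated). It is not how Camphr is implemented.

```python
import re

text = "Contact: tel:012-2345-6666."
pattern = re.compile(r"[\d-]+")

# Character-based: scan the raw string, ignoring token boundaries.
char_matches = [m.group() for m in pattern.finditer(text)]

# Token-based (simplified): split on whitespace and require that a
# whole token matches the pattern.
tokens = text.split()
token_matches = [t for t in tokens if pattern.fullmatch(t)]

print(char_matches)   # ['012-2345-6666']
print(token_matches)  # []
```

The character-based scan finds the number even though it is glued to `tel:` and a trailing period, while the simplified token-based check misses it.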

Usage: RegexRuler

  1. Create a pipe

    >>> import spacy
    >>> from camphr.pipelines import RegexRuler
    >>> nlp = spacy.blank("en")
    >>> pattern = r"[\d-]+"
    >>> pipe = RegexRuler(pattern, label="PHONE_NUMBER")
    >>> nlp.add_pipe(pipe)
    
  2. Parse a text

    >>> text = "My phone number is 012-2345-6666"
    >>> doc = nlp(text)
    >>> print(doc.ents)
    (012-2345-6666,)
    >>> print(doc.ents[0].label_)
    PHONE_NUMBER
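Because the pattern `[\d-]+` is very permissive, it may be worth checking what it matches before passing it to a pipe. The following sketch uses only the standard `re` module. The stricter pattern is a hypothetical example chosen for this sample text, not a recommended phone number format.

```python
import re

text = "Call 012-2345-6666 before 2020-01-31 - thanks"

# The pattern from the example above: any run of digits and hyphens.
loose = re.compile(r"[\d-]+")
loose_matches = [m.group() for m in loose.finditer(text)]

# A stricter, purely illustrative pattern: three hyphen-separated digit
# groups, the last one exactly four digits long.
strict = re.compile(r"\d{2,4}-\d{2,4}-\d{4}")
strict_matches = [m.group() for m in strict.finditer(text)]

print(loose_matches)   # ['012-2345-6666', '2020-01-31', '-']
print(strict_matches)  # ['012-2345-6666']
```

The loose pattern also matches the date and the lone hyphen, which is the kind of false positive a character-based rule can produce.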
    

Usage: MultipleRegexRuler

MultipleRegexRuler lets you use several patterns at once, each with its own label.

  1. Create a pipe

    >>> import spacy
    >>> from camphr.pipelines import MultipleRegexRuler
    >>> nlp = spacy.blank("en")
    >>> patterns = {"PHONE_NUMBER": r"[\d-]+", "EMAIL": r"[\w.]+@[\w.]+"}
    >>> pipe = MultipleRegexRuler(patterns)
    >>> nlp.add_pipe(pipe)
    
  2. Parse a text

    >>> text = "Phone: 012-2345-6666, email: bob@foomail.com"
    >>> doc = nlp(text)
    >>> print(doc.ents)
    (012-2345-6666, bob@foomail.com)
    >>> print([e.label_ for e in doc.ents])
    ['PHONE_NUMBER', 'EMAIL']
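The idea of a label-to-pattern mapping can be sketched with the standard `re` module alone. This is only an illustration of the concept, not Camphr's implementation, and it makes no attempt to resolve overlapping matches.

```python
import re

patterns = {"PHONE_NUMBER": r"[\d-]+", "EMAIL": r"[\w.]+@[\w.]+"}
text = "Phone: 012-2345-6666, email: bob@foomail.com"

# Collect (start, end, label, matched text) for every pattern.
found = []
for label, pattern in patterns.items():
    for m in re.finditer(pattern, text):
        found.append((m.start(), m.end(), label, m.group()))

# Sort by position so results follow the order of the text.
found.sort()
labeled = [(label, matched) for _, _, label, matched in found]

print(labeled)
# [('PHONE_NUMBER', '012-2345-6666'), ('EMAIL', 'bob@foomail.com')]
```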
    

Usage: PatternSearcher

PatternSearcher is useful when you want to look up words from a large dictionary; it is built on pyahocorasick. It searches for words at the character level, whereas spaCy's similar pipeline, PhraseMatcher, is a token-based searcher.

  1. Create a pipe

    >>> import spacy
    >>> from camphr.pipelines import PatternSearcher
    >>> nlp = spacy.blank("en")
    >>> pipe = PatternSearcher.from_words(["text", "pattern searcher"]) # add words
    >>> nlp.add_pipe(pipe)
    
  2. Parse a text

    >>> text = "This is a test text for pattern searcher."
    >>> doc = nlp(text)
    >>> doc.ents
    (text, pattern searcher)
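To give a feel for the kind of search pyahocorasick performs, here is a minimal pure-Python sketch of the Aho-Corasick algorithm. The function names `build_automaton` and `search` are hypothetical helpers written for this illustration. They are not part of Camphr or pyahocorasick, and the real library is implemented differently and far more efficiently.

```python
from collections import deque


def build_automaton(words):
    """Build a trie with failure links (hypothetical helper)."""
    goto, fail, out = [{}], [0], [[]]  # state 0 is the root
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: children of the root keep fail = 0,
    # deeper states inherit from their parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start, end, word) for every dictionary word in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, i + 1, word))
    return hits


automaton = build_automaton(["text", "pattern searcher"])
hits = search("This is a test text for pattern searcher.", automaton)
print([word for _, _, word in hits])  # ['text', 'pattern searcher']

# Character-based search also finds dictionary words inside longer words.
inner = search("contextual", automaton)
print(inner)  # [(3, 7, 'text')]
```

The text is scanned once regardless of how many words the dictionary holds, which is why this approach suits large dictionaries. The second call shows a match inside a longer word; how Camphr treats matches that do not align with token boundaries is not covered by this sketch.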
    

©2020, tamuhey.