Sort by Occurrences

Example

>>> # TEST PREPARATION
>>> import sys
>>> from io import StringIO
>>> from techminer2.thesaurus.user import CreateThesaurus, SortByOccurrences
>>> # Redirecting stderr to avoid messages during doctests
>>> original_stderr = sys.stderr
>>> sys.stderr = StringIO()
>>> # Reset the thesaurus to initial state
>>> CreateThesaurus(thesaurus_file="demo.the.txt", field="raw_descriptors",
...     root_directory="example/", quiet=True).run()
>>> # Create, configure, and run the sorter
>>> sorter = (
...     SortByOccurrences()
...     .with_thesaurus_file("demo.the.txt")
...     .with_field("raw_descriptors")
...     .where_root_directory_is("example/")
... )
>>> sorter.run()
>>> # Capture and print stderr output to test the code using doctest
>>> output = sys.stderr.getvalue()
>>> sys.stderr = original_stderr
>>> print(output)
Reducing thesaurus keys
  File : example/thesaurus/demo.the.txt
  Keys reduced from 1729 to 1729
  Keys reduction completed successfully

Sorting thesaurus by occurrences
  File : example/thesaurus/demo.the.txt
  Thesaurus sorting completed successfully

Printing thesaurus header
  File : example/thesaurus/demo.the.txt

    FINTECH
      FINTECH; FINTECHS
    FINANCE
      FINANCE
    INNOVATION
      INNOVATION; INNOVATIONS
    TECHNOLOGIES
      TECHNOLOGIES; TECHNOLOGY
    FINANCIAL_SERVICE
      FINANCIAL_SERVICE; FINANCIAL_SERVICES
    FINANCIAL_TECHNOLOGIES
      FINANCIAL_TECHNOLOGIES; FINANCIAL_TECHNOLOGY
    BANKS
      BANKS
    THE_FINANCIAL_INDUSTRY
      THE_FINANCIAL_INDUSTRY
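
The ordering shown in the header above can be illustrated with a minimal, standalone sketch. It does not reproduce techminer2's implementation: the in-memory dictionary layout, the `sort_by_occurrences` helper, the tie-breaking rule, and the occurrence counts are all hypothetical and chosen only for illustration. The sketch assumes each key's weight is the sum of the occurrences of its member terms.

```python
from collections import Counter


def sort_by_occurrences(thesaurus, occurrences):
    """Return a new dict whose keys are ordered by descending total occurrences.

    Hypothetical helper: the total for a key is the sum of the counts of its
    member terms; ties are broken alphabetically by key.
    """

    def total(key):
        return sum(occurrences.get(term, 0) for term in thesaurus[key])

    ordered_keys = sorted(thesaurus, key=lambda key: (-total(key), key))
    return {key: thesaurus[key] for key in ordered_keys}


# Illustrative data only; these counts are invented, not taken from demo.the.txt.
thesaurus = {
    "BANKS": ["BANKS"],
    "FINTECH": ["FINTECH", "FINTECHS"],
    "INNOVATION": ["INNOVATION", "INNOVATIONS"],
}
occurrences = Counter(
    {"FINTECH": 30, "FINTECHS": 2, "INNOVATION": 10, "INNOVATIONS": 3, "BANKS": 5}
)

sorted_thesaurus = sort_by_occurrences(thesaurus, occurrences)
print(list(sorted_thesaurus))  # ['FINTECH', 'INNOVATION', 'BANKS']
```

Sorting on the tuple `(-total, key)` keeps the result deterministic when two keys have the same total, which matters when the sorted file is compared against expected doctest output.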