Cutoff Fuzzy Merging

Smoke tests:
>>> from techminer2.thesaurus.user import InitializeThesaurus
>>> (
...     InitializeThesaurus()
...     .with_thesaurus_file("demo.the.txt")
...     .with_field("raw_descriptors")
...     .where_root_directory("examples/fintech/")
...     .using_colored_output(False)
...     .run()
... )
INFO: Thesaurus initialized successfully.
  Success : True
  File    : examples/fintech/data/thesaurus/demo.the.txt
  Status  : 1721 keys found
  Header  :
    A_A_THEORY
      A_A_THEORY
    A_BASIC_RANDOM_SAMPLING_STRATEGY
      A_BASIC_RANDOM_SAMPLING_STRATEGY
    A_BEHAVIOURAL_PERSPECTIVE
      A_BEHAVIOURAL_PERSPECTIVE
    A_BETTER_UNDERSTANDING
      A_BETTER_UNDERSTANDING
    A_BLOCKCHAIN_IMPLEMENTATION_STUDY
      A_BLOCKCHAIN_IMPLEMENTATION_STUDY
    A_CASE_STUDY
      A_CASE_STUDY
    A_CHALLENGE
      A_CHALLENGE
    A_CLUSTER_ANALYSIS
      A_CLUSTER_ANALYSIS
>>> from techminer2.thesaurus.user import ReduceKeys
>>> (
...     ReduceKeys()
...     .with_thesaurus_file("demo.the.txt")
...     .using_colored_output(False)
...     .where_root_directory("examples/fintech/")
...     .run()
... )
INFO: Thesaurus keys reduced successfully.
  Success : True
  File    : examples/fintech/data/thesaurus/demo.the.txt
  Status  : 0 changed keys
  Header  :
    A_A_THEORY
      A_A_THEORY
    A_BASIC_RANDOM_SAMPLING_STRATEGY
      A_BASIC_RANDOM_SAMPLING_STRATEGY
    A_BEHAVIOURAL_PERSPECTIVE
      A_BEHAVIOURAL_PERSPECTIVE
    A_BETTER_UNDERSTANDING
      A_BETTER_UNDERSTANDING
    A_BLOCKCHAIN_IMPLEMENTATION_STUDY
      A_BLOCKCHAIN_IMPLEMENTATION_STUDY
    A_CASE_STUDY
      A_CASE_STUDY
    A_CHALLENGE
      A_CHALLENGE
    A_CLUSTER_ANALYSIS
      A_CLUSTER_ANALYSIS
>>> from techminer2.thesaurus.user import CutoffFuzzyMerging
>>> r = (
...     CutoffFuzzyMerging(tqdm_disable=True)
...     .with_thesaurus_file("demo.the.txt")
...     .with_field("raw_descriptors")
...     .where_root_directory("examples/fintech/")
...     .using_cutoff_threshold(85)
...     .using_match_threshold(95)
...     .run()
... ).to_string()
>>> print(r)
                                                  lead                                                 candidate  fuzzy  cutoff
0                            SYSTEMIC_INNOVATION_MODEL                               A_SYSTEMIC_INNOVATION_MODEL  100.0    96.0
1               COMPETITIVE_AND_COOPERATIVE_MECHANISMS                THE_COMPETITIVE_AND_COOPERATIVE_MECHANISMS  100.0    95.0
2              ECONOMIC_AND_TECHNOLOGICAL_DETERMINANTS               THE_ECONOMIC_AND_TECHNOLOGICAL_DETERMINANTS  100.0    95.0
3                                  GROWING_COMPETITION                                     A_GROWING_COMPETITION  100.0    95.0
4                                 MULTI_LEVEL_ANALYSIS                                    A_MULTI_LEVEL_ANALYSIS  100.0    95.0
5                                THEORETICAL_FRAMEWORK                                   A_THEORETICAL_FRAMEWORK  100.0    95.0
6                                     CLUSTER_ANALYSIS                                        A_CLUSTER_ANALYSIS  100.0    94.0
7                                    HYBRID_MCDM_MODEL                                       A_HYBRID_MCDM_MODEL  100.0    94.0
8   UNIFIED_THEORY_OF_ACCEPTANCE_AND_USE_OF_TECHNOLOGY  UNIFIED_THEORY_OF_ACCEPTANCE_AND_USE_OF_TECHNOLOGY_MODEL  100.0    94.0
9                          A_SYSTEMIC_INNOVATION_MODEL                           A_NEW_SYSTEMIC_INNOVATION_MODEL  100.0    93.0
10                        ELABORATION_LIKELIHOOD_MODEL                          THE_ELABORATION_LIKELIHOOD_MODEL  100.0    93.0
11                         TECHNOLOGY_ACCEPTANCE_MODEL                           THE_TECHNOLOGY_ACCEPTANCE_MODEL  100.0    93.0
12                             A_THEORETICAL_FRAMEWORK                               A_NEW_THEORETICAL_FRAMEWORK  100.0    92.0
13                                         COMPETITION                                             A_COMPETITION  100.0    92.0
14                              HISTORICAL_DEVELOPMENT                                THE_HISTORICAL_DEVELOPMENT  100.0    92.0
15                              INFORMATION_TECHNOLOGY                                THE_INFORMATION_TECHNOLOGY  100.0    92.0
16                             SUSTAINABLE_DEVELOPMENT                               THE_SUSTAINABLE_DEVELOPMENT  100.0    92.0
17                            SYSTEMIC_CHARACTERISTICS                              THE_SYSTEMIC_CHARACTERISTICS  100.0    92.0
18                                          DEFINITION                                              A_DEFINITION  100.0    91.0
19                                DEVELOPING_COUNTRIES                                  THE_DEVELOPING_COUNTRIES  100.0    91.0
20                               INNOVATION_MECHANISMS                                 THE_INNOVATION_MECHANISMS  100.0    91.0
21                                STRATEGIC_CAPABILITY                                  THE_STRATEGIC_CAPABILITY  100.0    91.0
22                               THEORETICAL_FRAMEWORK                                 THE_THEORETICAL_FRAMEWORK  100.0    91.0
23                                            TAXONOMY                                                A_TAXONOMY  100.0    89.0
24                                       ORGANIZATIONS                                           AN_ORGANIZATION   96.0    86.0
25                                 RESEARCH_FRAMEWORKS                                      A_RESEARCH_FRAMEWORK   95.0    92.0
26                               CONCEPTUAL_FRAMEWORKS                                  THE_CONCEPTUAL_FRAMEWORK   95.0    89.0
27                                          CHALLENGES                                               A_CHALLENGE   95.0    86.0
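
The two score columns above can be approximated with the standard library. The sketch below is an illustration inferred from the printed table, not techminer2's implementation. It assumes that `cutoff` behaves like a whole-string similarity ratio and `fuzzy` like a best-substring ("partial") ratio, and that `using_match_threshold(95)` bounds `fuzzy` while `using_cutoff_threshold(85)` bounds `cutoff`. The helper names `plain_ratio`, `partial_ratio`, and `keeps_pair` are hypothetical.

```python
from difflib import SequenceMatcher


def plain_ratio(a: str, b: str) -> int:
    """Whole-string similarity on a 0-100 scale: 100 * 2*M / (len(a) + len(b))."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio(a: str, b: str) -> int:
    """Simplified partial ratio (assumption): best plain_ratio of the shorter
    string against every equal-length window of the longer string."""
    short, long_ = sorted((a, b), key=len)
    width = len(short)
    return max(
        plain_ratio(short, long_[i : i + width])
        for i in range(len(long_) - width + 1)
    )


def keeps_pair(lead: str, candidate: str,
               match_threshold: int = 95, cutoff_threshold: int = 85) -> bool:
    """Hypothetical filter: keep a pair only if both scores reach their threshold."""
    return (
        partial_ratio(lead, candidate) >= match_threshold
        and plain_ratio(lead, candidate) >= cutoff_threshold
    )


# Pairs copied from the table above.
pairs = [
    ("SYSTEMIC_INNOVATION_MODEL", "A_SYSTEMIC_INNOVATION_MODEL"),
    ("CLUSTER_ANALYSIS", "A_CLUSTER_ANALYSIS"),
    ("TAXONOMY", "A_TAXONOMY"),
]
for lead, candidate in pairs:
    print(lead, candidate,
          partial_ratio(lead, candidate),  # 100: lead is a substring of candidate
          plain_ratio(lead, candidate),    # 96, 94 and 89 respectively
          keeps_pair(lead, candidate))
```

For these article-prefix rows the sketch reproduces the `fuzzy` and `cutoff` values shown in the table. For near matches such as singular/plural pairs, the simplified window search may score lower than the `fuzzy` column, since a production fuzzy matcher can also compare against shorter windows at the string edges.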