Context Free Grammar
pyformlang.cfg
This submodule implements functions related to context-free grammars.
Available Classes
- CFG
The main context-free grammar class
- Production
A class to represent a production in a CFG
- Variable
A context-free grammar variable
- Terminal
A context-free grammar terminal
- Epsilon
The epsilon symbol (special terminal)
- class pyformlang.cfg.CFG(variables: AbstractSet[Variable | str] = None, terminals: AbstractSet[Terminal | str] = None, start_symbol: Variable | str = None, productions: Iterable[Production] = None)[source]
A class representing a context free grammar
- Parameters:
- variablesset of
Variable, optional The variables of the CFG
- terminalsset of
Terminal, optional The terminals of the CFG
- start_symbol
Variable, optional The start symbol
- productionsset of
Production, optional The productions or rules of the CFG
- variablesset of
- concatenate(other: CFG) CFG[source]
Makes the concatenation of two CFGs
- Equivalent to:
>> cfg0 + cfg1
- contains(word: Iterable[Terminal | str]) bool[source]
Gives the membership of a word to the grammar
- Parameters:
- worditerable of
Terminal The word to check
- worditerable of
- Returns:
- containsbool
Whether word if in the CFG or not
- eliminate_unit_productions() CFG[source]
Eliminate all the unit production in the CFG
- Returns:
- new_cfg
CFG A new CFG equivalent without unit productions
- new_cfg
- classmethod from_text(text, start_symbol=Variable(S))[source]
Read a context free grammar from a text. The text contains one rule per line. The structure of a production is: head -> body1 | body2 | … | bodyn where | separates the bodies. A variable (or non terminal) begins by a capital letter. A terminal begins by a non-capital character Terminals and Variables are separated by spaces. An epsilon symbol can be represented by epsilon, $, ε, ϵ or Є. If you want to have a variable name starting with a non-capital letter or a terminal starting with a capital letter, you can explicitly give the type of your symbol with “VAR:yourVariableName” or “TER:yourTerminalName” (with the quotation marks). For example: S -> “TER:John” “VAR:d” a b
- Parameters:
- textstr
The text of transform
- start_symbolstr, optional
The start symbol, S by default
- Returns:
- cfg
CFG A context free grammar.
- cfg
- generate_epsilon()[source]
Whether the grammar generates epsilon or not
- Returns:
- generate_epsilonbool
Whether epsilon is generated or not by the CFG
- get_closure() CFG[source]
Gets the closure of the CFG (*)
- Returns:
- new_cfg
CFG The closure of the current CFG
- new_cfg
- get_cnf_parse_tree(word)[source]
Get a parse tree of the CNF of this grammar
- Parameters:
- worditerable of
Terminal The word to look for
- worditerable of
- Returns:
- derivation
ParseTree The parse tree
- derivation
- get_generating_symbols() AbstractSet[CFGObject][source]
Gives the objects which are generating in the CFG
- Returns:
- generating_symbolsset of
CFGObject The generating symbols of the CFG
- generating_symbolsset of
- get_nullable_symbols() AbstractSet[CFGObject][source]
Gives the objects which are nullable in the CFG
- Returns:
- nullable_symbolsset of
CFGObject The nullable symbols of the CFG
- nullable_symbolsset of
- get_positive_closure() CFG[source]
Gets the positive closure of the CFG (+)
- Returns:
- new_cfg
CFG The positive closure of the current CFG
- new_cfg
- get_prefix_language()[source]
Generates the prefix language of the CFL, i.e., the language of all prefixes of valid words Based on https://cs.uwaterloo.ca/~s4bendav/files/CS360S21Lec10.pdf
- get_reachable_symbols() AbstractSet[CFGObject][source]
Gives the objects which are reachable in the CFG
- Returns:
- reachable_symbolsset of
CFGObject The reachable symbols of the CFG
- reachable_symbolsset of
- get_suffix_language()[source]
Generates the suffix language of the CFL, i.e., the language containing all suffixes of valid words
- get_unit_pairs() AbstractSet[Tuple[Variable, Variable]][source]
Finds all the unit pairs
- Returns:
- unit_pairsset of tuple of
Variable The unit pairs
- unit_pairsset of tuple of
- get_words(max_length: int = -1)[source]
Get the words generated by the CFG
- Parameters:
- max_lengthint
The maximum length of the words to return
- intersection(other: Any) CFG[source]
Gives the intersection of the current CFG with an other object
- Equivalent to:
>> cfg and regex
- Parameters:
- otherany
The other object
- Returns:
- new_cfg
CFG A CFG representing the intersection with the other object
- new_cfg
- Raises:
- NotImplementedError
When trying to intersect with something else than a regex or a finite automaton
- is_empty() bool[source]
Says whether the CFG is empty or not
- Returns:
- is_emptybool
Whether the CFG is empty or not
- is_finite() bool[source]
Tests if the grammar is finite or not
- Returns:
- is_finitebool
Whether the grammar is finite or not
- is_normal_form()[source]
Tells is the current grammar is in Chomsky Normal Form or not
- Returns:
- is_normal_formbool
If the current grammar is in CNF
- property productions: AbstractSet[Production]
Gives the productions
- Returns:
- productionsset of
Production The productions of the CFG
- productionsset of
- remove_epsilon() CFG[source]
Removes the epsilon of a cfg
- Returns:
- new_cfg
CFG The CFG without epsilons
- new_cfg
- remove_useless_symbols() CFG[source]
Removes useless symbols in a CFG
- Returns:
- new_cfg
CFG The CFG without useless symbols
- new_cfg
- reverse() CFG[source]
Reverse the current CFG
- Equivalent to:
>> ~cfg
- Returns:
- new_cfg
CFG Reverse the current CFG
- new_cfg
- property start_symbol: Variable
Gives the start symbol
- Returns:
- start_variable
Variable The start symbol of the CFG
- start_variable
- substitute(substitution: Dict[Terminal, CFG]) CFG[source]
Substitutes CFG to terminals in the current CFG
- property terminals: AbstractSet[Terminal]
Gives the terminals
- Returns:
- terminalsset of
Terminal The terminals of the CFG
- terminalsset of
- to_normal_form() CFG[source]
Gets the Chomsky Normal Form of a CFG
- Returns:
- new_cfg
CFG A new CFG equivalent in the CNF form
- new_cfg
Warning
As described in Hopcroft’s textbook, a normal form does not generate the epsilon word. So, the grammar generated by this function is equivalent to the original grammar except if this grammar generates the epsilon word. In that case, the language of the generated grammar contains the same word as before, except the epsilon word.
- to_pda() PDA[source]
Converts the CFG to a PDA that generates on empty stack an equivalent language
- Returns:
- new_pda
PDA The equivalent PDA when accepting on empty stack
- new_pda
- to_text()[source]
Turns the grammar into its string representation. This might lose some type information and the start_symbol. :returns: text – The grammar as a string. :rtype: str
- property variables: AbstractSet[Variable]
Gives the variables
- Returns:
- variablesset of
Variable The variables of the CFG
- variablesset of
- class pyformlang.cfg.LLOneParser(cfg)[source]
A LL(1) parser
- Parameters:
- cfg
CFG A context-free Grammar
- cfg
- get_llone_parse_tree(word)[source]
Get LL(1) parse Tree
- Parameters:
- wordlist
The word to parse
- Returns:
- parse_tree
ParseTree The parse tree
- parse_tree
- Raises:
- NotParsableException
When the word cannot be parsed
- get_llone_parsing_table()[source]
Get the LL(1) parsing table From: https://www.slideshare.net/MahbuburRahman273/ll1-parser-in-compilers
- class pyformlang.cfg.Production(head: Variable, body: List[CFGObject], filtering=True)[source]
A production or rule of a CFG
- Parameters:
- head
Variable The head of the production
- bodyiterable of
CFGObject The body of the production
- head