Context Free Grammar¶
pyformlang.cfg
¶
This submodule implements functions related to context-free grammars.
Available Classes¶
- CFG
The main context-free grammar class
- Production
A class to represent a production in a CFG
- Variable
A context-free grammar variable
- Terminal
A context-free grammar terminal
- Epsilon
The epsilon symbol (special terminal)
- class pyformlang.cfg.CFG(variables: Optional[AbstractSet[pyformlang.cfg.variable.Variable]] = None, terminals: Optional[AbstractSet[pyformlang.cfg.terminal.Terminal]] = None, start_symbol: Optional[pyformlang.cfg.variable.Variable] = None, productions: Optional[Iterable[pyformlang.cfg.production.Production]] = None)[source]¶
A class representing a context free grammar
- Parameters
variables (set of
Variable
, optional) – The variables of the CFGterminals (set of
Terminal
, optional) – The terminals of the CFGstart_symbol (
Variable
, optional) – The start symbolproductions (set of
Production
, optional) – The productions or rules of the CFG
- concatenate(other: pyformlang.cfg.cfg.CFG) pyformlang.cfg.cfg.CFG [source]¶
Makes the concatenation of two CFGs
- Equivalent to:
>> cfg0 + cfg1
- contains(word: Iterable[pyformlang.cfg.terminal.Terminal]) bool [source]¶
Gives the membership of a word to the grammar
- eliminate_unit_productions() pyformlang.cfg.cfg.CFG [source]¶
Eliminate all the unit production in the CFG
- Returns
new_cfg – A new CFG equivalent without unit productions
- Return type
- classmethod from_text(text, start_symbol=Variable(S))[source]¶
Read a context free grammar from a text. The text contains one rule per line. The structure of a production is: head -> body1 | body2 | … | bodyn where | separates the bodies. A variable (or non terminal) begins by a capital letter. A terminal begins by a non-capital character Terminals and Variables are separated by spaces. An epsilon symbol can be represented by epsilon, $, ε, ϵ or Є. If you want to have a variable name starting with a non-capital letter or a terminal starting with a capital letter, you can explicitly give the type of your symbol with “VAR:yourVariableName” or “TER:yourTerminalName” (with the quotation marks). For example: S -> “TER:John” “VAR:d” a b
- generate_epsilon()[source]¶
Whether the grammar generates epsilon or not
- Returns
generate_epsilon – Whether epsilon is generated or not by the CFG
- Return type
- get_closure() pyformlang.cfg.cfg.CFG [source]¶
Gets the closure of the CFG (*)
- Returns
new_cfg – The closure of the current CFG
- Return type
- get_cnf_parse_tree(word)[source]¶
Get a parse tree of the CNF of this grammar
- Parameters
word (iterable of
Terminal
) – The word to look for- Returns
derivation – The parse tree
- Return type
ParseTree
- get_generating_symbols() AbstractSet[pyformlang.cfg.cfg_object.CFGObject] [source]¶
Gives the objects which are generating in the CFG
- Returns
generating_symbols – The generating symbols of the CFG
- Return type
set of
CFGObject
- get_nullable_symbols() AbstractSet[pyformlang.cfg.cfg_object.CFGObject] [source]¶
Gives the objects which are nullable in the CFG
- Returns
nullable_symbols – The nullable symbols of the CFG
- Return type
set of
CFGObject
- get_positive_closure() pyformlang.cfg.cfg.CFG [source]¶
Gets the positive closure of the CFG (+)
- Returns
new_cfg – The positive closure of the current CFG
- Return type
- get_reachable_symbols() AbstractSet[pyformlang.cfg.cfg_object.CFGObject] [source]¶
Gives the objects which are reachable in the CFG
- Returns
reachable_symbols – The reachable symbols of the CFG
- Return type
set of
CFGObject
- get_unit_pairs() AbstractSet[Tuple[pyformlang.cfg.variable.Variable, pyformlang.cfg.variable.Variable]] [source]¶
Finds all the unit pairs
- Returns
unit_pairs – The unit pairs
- Return type
set of tuple of
Variable
- get_words(max_length: int = - 1)[source]¶
Get the words generated by the CFG
- Parameters
max_length (int) – The maximum length of the words to return
- intersection(other: Any) pyformlang.cfg.cfg.CFG [source]¶
Gives the intersection of the current CFG with an other object
- Equivalent to:
>> cfg and regex
- Parameters
other (any) – The other object
- Returns
new_cfg – A CFG representing the intersection with the other object
- Return type
- Raises
NotImplementedError – When trying to intersect with something else than a regex or a finite automaton
- is_empty() bool [source]¶
Says whether the CFG is empty or not
- Returns
is_empty – Whether the CFG is empty or not
- Return type
- is_finite() bool [source]¶
Tests if the grammar is finite or not
- Returns
is_finite – Whether the grammar is finite or not
- Return type
- is_normal_form()[source]¶
Tells is the current grammar is in Chomsky Normal Form or not
- Returns
is_normal_form – If the current grammar is in CNF
- Return type
- property productions: AbstractSet[pyformlang.cfg.production.Production]¶
Gives the productions
- Returns
productions – The productions of the CFG
- Return type
set of
Production
- remove_epsilon() pyformlang.cfg.cfg.CFG [source]¶
Removes the epsilon of a cfg
- Returns
new_cfg – The CFG without epsilons
- Return type
- remove_useless_symbols() pyformlang.cfg.cfg.CFG [source]¶
Removes useless symbols in a CFG
- Returns
new_cfg – The CFG without useless symbols
- Return type
- reverse() pyformlang.cfg.cfg.CFG [source]¶
Reverse the current CFG
- Equivalent to:
>> ~cfg
- Returns
new_cfg – Reverse the current CFG
- Return type
- property start_symbol: pyformlang.cfg.variable.Variable¶
Gives the start symbol
- Returns
start_variable – The start symbol of the CFG
- Return type
- substitute(substitution: Dict[pyformlang.cfg.terminal.Terminal, pyformlang.cfg.cfg.CFG]) pyformlang.cfg.cfg.CFG [source]¶
Substitutes CFG to terminals in the current CFG
- property terminals: AbstractSet[pyformlang.cfg.terminal.Terminal]¶
Gives the terminals
- Returns
terminals – The terminals of the CFG
- Return type
set of
Terminal
- to_normal_form() pyformlang.cfg.cfg.CFG [source]¶
Gets the Chomsky Normal Form of a CFG
- Returns
new_cfg – A new CFG equivalent in the CNF form
- Return type
Warning
As described in Hopcroft’s textbook, a normal form does not generate the epsilon word. So, the grammar generated by this function is equivalent to the original grammar except if this grammar generates the epsilon word. In that case, the language of the generated grammar contains the same word as before, except the epsilon word.
- to_pda() pyformlang.pda.pda.PDA [source]¶
Converts the CFG to a PDA that generates on empty stack an equivalent language
- Returns
new_pda – The equivalent PDA when accepting on empty stack
- Return type
- to_text()[source]¶
Turns the grammar into its string representation. This might lose some type information and the start_symbol. :returns: text – The grammar as a string. :rtype: str
- union(other: pyformlang.cfg.cfg.CFG) pyformlang.cfg.cfg.CFG [source]¶
Makes the union of two CFGs
- Equivalent to:
>> cfg0 or cfg1
- class pyformlang.cfg.LLOneParser(cfg)[source]¶
A LL(1) parser
- Parameters
cfg (
CFG
) – A context-free Grammar
- get_llone_parse_tree(word)[source]¶
Get LL(1) parse Tree
- Parameters
word (list) – The word to parse
- Returns
parse_tree – The parse tree
- Return type
ParseTree
- Raises
NotParsableException – When the word cannot be parsed
- get_llone_parsing_table()[source]¶
Get the LL(1) parsing table From: https://www.slideshare.net/MahbuburRahman273/ll1-parser-in-compilers
- class pyformlang.cfg.Production(head: pyformlang.cfg.variable.Variable, body: List[pyformlang.cfg.cfg_object.CFGObject], filtering=True)[source]¶
A production or rule of a CFG
- Parameters
head (
Variable
) – The head of the productionbody (iterable of
CFGObject
) – The body of the production
- property body: List[pyformlang.cfg.cfg_object.CFGObject]¶
Get the body objects
- property head: pyformlang.cfg.variable.Variable¶
Get the head variable