Context Free Grammar

pyformlang.cfg

This submodule implements functions related to context-free grammars.

Available Classes

CFG

The main context-free grammar class

Production

A class to represent a production in a CFG

Variable

A context-free grammar variable

Terminal

A context-free grammar terminal

Epsilon

The epsilon symbol (special terminal)

class pyformlang.cfg.CFG(variables: AbstractSet[Variable | str] = None, terminals: AbstractSet[Terminal | str] = None, start_symbol: Variable | str = None, productions: Iterable[Production] = None)[source]

A class representing a context free grammar

Parameters:
variablesset of Variable, optional

The variables of the CFG

terminalsset of Terminal, optional

The terminals of the CFG

start_symbolVariable, optional

The start symbol

productionsset of Production, optional

The productions or rules of the CFG

concatenate(other: CFG) CFG[source]

Makes the concatenation of two CFGs

Equivalent to:

>> cfg0 + cfg1

Parameters:
otherCFG

The other CFG to concatenate with

Returns:
new_cfgCFG

The CFG resulting of the concatenation of the two CFGs

contains(word: Iterable[Terminal | str]) bool[source]

Gives the membership of a word to the grammar

Parameters:
worditerable of Terminal

The word to check

Returns:
containsbool

Whether word if in the CFG or not

eliminate_unit_productions() CFG[source]

Eliminate all the unit production in the CFG

Returns:
new_cfgCFG

A new CFG equivalent without unit productions

classmethod from_text(text, start_symbol=Variable(S))[source]

Read a context free grammar from a text. The text contains one rule per line. The structure of a production is: head -> body1 | body2 | … | bodyn where | separates the bodies. A variable (or non terminal) begins by a capital letter. A terminal begins by a non-capital character Terminals and Variables are separated by spaces. An epsilon symbol can be represented by epsilon, $, ε, ϵ or Є. If you want to have a variable name starting with a non-capital letter or a terminal starting with a capital letter, you can explicitly give the type of your symbol with “VAR:yourVariableName” or “TER:yourTerminalName” (with the quotation marks). For example: S -> “TER:John” “VAR:d” a b

Parameters:
textstr

The text of transform

start_symbolstr, optional

The start symbol, S by default

Returns:
cfgCFG

A context free grammar.

generate_epsilon()[source]

Whether the grammar generates epsilon or not

Returns:
generate_epsilonbool

Whether epsilon is generated or not by the CFG

get_closure() CFG[source]

Gets the closure of the CFG (*)

Returns:
new_cfgCFG

The closure of the current CFG

get_cnf_parse_tree(word)[source]

Get a parse tree of the CNF of this grammar

Parameters:
worditerable of Terminal

The word to look for

Returns:
derivationParseTree

The parse tree

get_generating_symbols() AbstractSet[CFGObject][source]

Gives the objects which are generating in the CFG

Returns:
generating_symbolsset of CFGObject

The generating symbols of the CFG

get_nullable_symbols() AbstractSet[CFGObject][source]

Gives the objects which are nullable in the CFG

Returns:
nullable_symbolsset of CFGObject

The nullable symbols of the CFG

get_positive_closure() CFG[source]

Gets the positive closure of the CFG (+)

Returns:
new_cfgCFG

The positive closure of the current CFG

get_prefix_language()[source]

Generates the prefix language of the CFL, i.e., the language of all prefixes of valid words Based on https://cs.uwaterloo.ca/~s4bendav/files/CS360S21Lec10.pdf

get_reachable_symbols() AbstractSet[CFGObject][source]

Gives the objects which are reachable in the CFG

Returns:
reachable_symbolsset of CFGObject

The reachable symbols of the CFG

get_suffix_language()[source]

Generates the suffix language of the CFL, i.e., the language containing all suffixes of valid words

get_unit_pairs() AbstractSet[Tuple[Variable, Variable]][source]

Finds all the unit pairs

Returns:
unit_pairsset of tuple of Variable

The unit pairs

get_words(max_length: int = -1)[source]

Get the words generated by the CFG

Parameters:
max_lengthint

The maximum length of the words to return

intersection(other: Any) CFG[source]

Gives the intersection of the current CFG with an other object

Equivalent to:

>> cfg and regex

Parameters:
otherany

The other object

Returns:
new_cfgCFG

A CFG representing the intersection with the other object

Raises:
NotImplementedError

When trying to intersect with something else than a regex or a finite automaton

is_empty() bool[source]

Says whether the CFG is empty or not

Returns:
is_emptybool

Whether the CFG is empty or not

is_finite() bool[source]

Tests if the grammar is finite or not

Returns:
is_finitebool

Whether the grammar is finite or not

is_normal_form()[source]

Tells is the current grammar is in Chomsky Normal Form or not

Returns:
is_normal_formbool

If the current grammar is in CNF

property productions: AbstractSet[Production]

Gives the productions

Returns:
productionsset of Production

The productions of the CFG

remove_epsilon() CFG[source]

Removes the epsilon of a cfg

Returns:
new_cfgCFG

The CFG without epsilons

remove_useless_symbols() CFG[source]

Removes useless symbols in a CFG

Returns:
new_cfgCFG

The CFG without useless symbols

reverse() CFG[source]

Reverse the current CFG

Equivalent to:

>> ~cfg

Returns:
new_cfgCFG

Reverse the current CFG

property start_symbol: Variable

Gives the start symbol

Returns:
start_variableVariable

The start symbol of the CFG

substitute(substitution: Dict[Terminal, CFG]) CFG[source]

Substitutes CFG to terminals in the current CFG

Parameters:
substitutiondict of Terminal to CFG

A substitution

Returns:
new_cfgCFG

A new CFG recognizing the substitution

property terminals: AbstractSet[Terminal]

Gives the terminals

Returns:
terminalsset of Terminal

The terminals of the CFG

to_normal_form() CFG[source]

Gets the Chomsky Normal Form of a CFG

Returns:
new_cfgCFG

A new CFG equivalent in the CNF form

Warning

As described in Hopcroft’s textbook, a normal form does not generate the epsilon word. So, the grammar generated by this function is equivalent to the original grammar except if this grammar generates the epsilon word. In that case, the language of the generated grammar contains the same word as before, except the epsilon word.

to_pda() PDA[source]

Converts the CFG to a PDA that generates on empty stack an equivalent language

Returns:
new_pdaPDA

The equivalent PDA when accepting on empty stack

to_text()[source]

Turns the grammar into its string representation. This might lose some type information and the start_symbol. :returns: text – The grammar as a string. :rtype: str

union(other: CFG) CFG[source]

Makes the union of two CFGs

Equivalent to:

>> cfg0 or cfg1

Parameters:
otherCFG

The other CFG to unite with

Returns:
new_cfgCFG

The CFG resulting of the union of the two CFGs

property variables: AbstractSet[Variable]

Gives the variables

Returns:
variablesset of Variable

The variables of the CFG

class pyformlang.cfg.Epsilon[source]

An epsilon terminal

to_text() str[source]

Turns the object into a text format

class pyformlang.cfg.LLOneParser(cfg)[source]

A LL(1) parser

Parameters:
cfgCFG

A context-free Grammar

get_first_set()[source]

Used in LL(1)

get_follow_set()[source]

Get follow set

get_llone_parse_tree(word)[source]

Get LL(1) parse Tree

Parameters:
wordlist

The word to parse

Returns:
parse_treeParseTree

The parse tree

Raises:
NotParsableException

When the word cannot be parsed

get_llone_parsing_table()[source]

Get the LL(1) parsing table From: https://www.slideshare.net/MahbuburRahman273/ll1-parser-in-compilers

is_llone_parsable()[source]

Checks whether the grammar can be parse with the LL(1) parser.

Returns:
is_parsablebool
class pyformlang.cfg.Production(head: Variable, body: List[CFGObject], filtering=True)[source]

A production or rule of a CFG

Parameters:
headVariable

The head of the production

bodyiterable of CFGObject

The body of the production

property body: List[CFGObject]

Get the body objects

property head: Variable

Get the head variable

is_normal_form()[source]

Tells is the production is in Chomsky Normal Form

Returns:
is_normal_formbool

If the production is in CNF

class pyformlang.cfg.Terminal(value: Any)[source]

A terminal in a CFG

to_text() str[source]

Turns the object into a text format

class pyformlang.cfg.Variable(value)[source]

An variable in a CFG

Parameters:
valueany

The value of the variable

to_text() str[source]

Turns the object into a text format