Context Free Grammar

pyformlang.cfg

This submodule implements functions related to context-free grammars.

Available Classes

CFG

The main context-free grammar class

Production

A class to represent a production in a CFG

Variable

A context-free grammar variable

Terminal

A context-free grammar terminal

Epsilon

The epsilon symbol (special terminal)

class pyformlang.cfg.CFG(variables: Optional[AbstractSet[pyformlang.cfg.variable.Variable]] = None, terminals: Optional[AbstractSet[pyformlang.cfg.terminal.Terminal]] = None, start_symbol: Optional[pyformlang.cfg.variable.Variable] = None, productions: Optional[Iterable[pyformlang.cfg.production.Production]] = None)[source]

A class representing a context free grammar

Parameters
  • variables (set of Variable, optional) – The variables of the CFG

  • terminals (set of Terminal, optional) – The terminals of the CFG

  • start_symbol (Variable, optional) – The start symbol

  • productions (set of Production, optional) – The productions or rules of the CFG

concatenate(other: pyformlang.cfg.cfg.CFG) pyformlang.cfg.cfg.CFG[source]

Makes the concatenation of two CFGs

Equivalent to:

>> cfg0 + cfg1

Parameters

other (CFG) – The other CFG to concatenate with

Returns

new_cfg – The CFG resulting of the concatenation of the two CFGs

Return type

CFG

contains(word: Iterable[pyformlang.cfg.terminal.Terminal]) bool[source]

Gives the membership of a word to the grammar

Parameters

word (iterable of Terminal) – The word to check

Returns

contains – Whether word if in the CFG or not

Return type

bool

eliminate_unit_productions() pyformlang.cfg.cfg.CFG[source]

Eliminate all the unit production in the CFG

Returns

new_cfg – A new CFG equivalent without unit productions

Return type

CFG

classmethod from_text(text, start_symbol=Variable(S))[source]

Read a context free grammar from a text. The text contains one rule per line. The structure of a production is: head -> body1 | body2 | … | bodyn where | separates the bodies. A variable (or non terminal) begins by a capital letter. A terminal begins by a non-capital character Terminals and Variables are separated by spaces. An epsilon symbol can be represented by epsilon, $, ε, ϵ or Є. If you want to have a variable name starting with a non-capital letter or a terminal starting with a capital letter, you can explicitly give the type of your symbol with “VAR:yourVariableName” or “TER:yourTerminalName” (with the quotation marks). For example: S -> “TER:John” “VAR:d” a b

Parameters
  • text (str) – The text of transform

  • start_symbol (str, optional) – The start symbol, S by default

Returns

cfg – A context free grammar.

Return type

CFG

generate_epsilon()[source]

Whether the grammar generates epsilon or not

Returns

generate_epsilon – Whether epsilon is generated or not by the CFG

Return type

bool

get_closure() pyformlang.cfg.cfg.CFG[source]

Gets the closure of the CFG (*)

Returns

new_cfg – The closure of the current CFG

Return type

CFG

get_cnf_parse_tree(word)[source]

Get a parse tree of the CNF of this grammar

Parameters

word (iterable of Terminal) – The word to look for

Returns

derivation – The parse tree

Return type

ParseTree

get_generating_symbols() AbstractSet[pyformlang.cfg.cfg_object.CFGObject][source]

Gives the objects which are generating in the CFG

Returns

generating_symbols – The generating symbols of the CFG

Return type

set of CFGObject

get_nullable_symbols() AbstractSet[pyformlang.cfg.cfg_object.CFGObject][source]

Gives the objects which are nullable in the CFG

Returns

nullable_symbols – The nullable symbols of the CFG

Return type

set of CFGObject

get_positive_closure() pyformlang.cfg.cfg.CFG[source]

Gets the positive closure of the CFG (+)

Returns

new_cfg – The positive closure of the current CFG

Return type

CFG

get_reachable_symbols() AbstractSet[pyformlang.cfg.cfg_object.CFGObject][source]

Gives the objects which are reachable in the CFG

Returns

reachable_symbols – The reachable symbols of the CFG

Return type

set of CFGObject

get_unit_pairs() AbstractSet[Tuple[pyformlang.cfg.variable.Variable, pyformlang.cfg.variable.Variable]][source]

Finds all the unit pairs

Returns

unit_pairs – The unit pairs

Return type

set of tuple of Variable

get_words(max_length: int = - 1)[source]

Get the words generated by the CFG

Parameters

max_length (int) – The maximum length of the words to return

intersection(other: Any) pyformlang.cfg.cfg.CFG[source]

Gives the intersection of the current CFG with an other object

Equivalent to:

>> cfg and regex

Parameters

other (any) – The other object

Returns

new_cfg – A CFG representing the intersection with the other object

Return type

CFG

Raises

NotImplementedError – When trying to intersect with something else than a regex or a finite automaton

is_empty() bool[source]

Says whether the CFG is empty or not

Returns

is_empty – Whether the CFG is empty or not

Return type

bool

is_finite() bool[source]

Tests if the grammar is finite or not

Returns

is_finite – Whether the grammar is finite or not

Return type

bool

is_normal_form()[source]

Tells is the current grammar is in Chomsky Normal Form or not

Returns

is_normal_form – If the current grammar is in CNF

Return type

bool

property productions: AbstractSet[pyformlang.cfg.production.Production]

Gives the productions

Returns

productions – The productions of the CFG

Return type

set of Production

remove_epsilon() pyformlang.cfg.cfg.CFG[source]

Removes the epsilon of a cfg

Returns

new_cfg – The CFG without epsilons

Return type

CFG

remove_useless_symbols() pyformlang.cfg.cfg.CFG[source]

Removes useless symbols in a CFG

Returns

new_cfg – The CFG without useless symbols

Return type

CFG

reverse() pyformlang.cfg.cfg.CFG[source]

Reverse the current CFG

Equivalent to:

>> ~cfg

Returns

new_cfg – Reverse the current CFG

Return type

CFG

property start_symbol: pyformlang.cfg.variable.Variable

Gives the start symbol

Returns

start_variable – The start symbol of the CFG

Return type

Variable

substitute(substitution: Dict[pyformlang.cfg.terminal.Terminal, pyformlang.cfg.cfg.CFG]) pyformlang.cfg.cfg.CFG[source]

Substitutes CFG to terminals in the current CFG

Parameters

substitution (dict of Terminal to CFG) – A substitution

Returns

new_cfg – A new CFG recognizing the substitution

Return type

CFG

property terminals: AbstractSet[pyformlang.cfg.terminal.Terminal]

Gives the terminals

Returns

terminals – The terminals of the CFG

Return type

set of Terminal

to_normal_form() pyformlang.cfg.cfg.CFG[source]

Gets the Chomsky Normal Form of a CFG

Returns

new_cfg – A new CFG equivalent in the CNF form

Return type

CFG

Warning

As described in Hopcroft’s textbook, a normal form does not generate the epsilon word. So, the grammar generated by this function is equivalent to the original grammar except if this grammar generates the epsilon word. In that case, the language of the generated grammar contains the same word as before, except the epsilon word.

to_pda() pyformlang.pda.pda.PDA[source]

Converts the CFG to a PDA that generates on empty stack an equivalent language

Returns

new_pda – The equivalent PDA when accepting on empty stack

Return type

PDA

to_text()[source]

Turns the grammar into its string representation. This might lose some type information and the start_symbol. :returns: text – The grammar as a string. :rtype: str

union(other: pyformlang.cfg.cfg.CFG) pyformlang.cfg.cfg.CFG[source]

Makes the union of two CFGs

Equivalent to:

>> cfg0 or cfg1

Parameters

other (CFG) – The other CFG to unite with

Returns

new_cfg – The CFG resulting of the union of the two CFGs

Return type

CFG

property variables: AbstractSet[pyformlang.cfg.variable.Variable]

Gives the variables

Returns

variables – The variables of the CFG

Return type

set of Variable

class pyformlang.cfg.Epsilon[source]

An epsilon terminal

class pyformlang.cfg.LLOneParser(cfg)[source]

A LL(1) parser

Parameters

cfg (CFG) – A context-free Grammar

get_first_set()[source]

Used in LL(1)

get_follow_set()[source]

Get follow set

get_llone_parse_tree(word)[source]

Get LL(1) parse Tree

Parameters

word (list) – The word to parse

Returns

parse_tree – The parse tree

Return type

ParseTree

Raises

NotParsableException – When the word cannot be parsed

get_llone_parsing_table()[source]

Get the LL(1) parsing table From: https://www.slideshare.net/MahbuburRahman273/ll1-parser-in-compilers

is_llone_parsable()[source]

Checks whether the grammar can be parse with the LL(1) parser.

Returns

is_parsable

Return type

bool

class pyformlang.cfg.Production(head: pyformlang.cfg.variable.Variable, body: List[pyformlang.cfg.cfg_object.CFGObject], filtering=True)[source]

A production or rule of a CFG

Parameters
  • head (Variable) – The head of the production

  • body (iterable of CFGObject) – The body of the production

property body: List[pyformlang.cfg.cfg_object.CFGObject]

Get the body objects

property head: pyformlang.cfg.variable.Variable

Get the head variable

is_normal_form()[source]

Tells is the production is in Chomsky Normal Form

Returns

is_normal_form – If the production is in CNF

Return type

bool

class pyformlang.cfg.Terminal(value: Any)[source]

An terminal in a CFG

Parameters

value (any) – The value of the terminal

class pyformlang.cfg.Variable(value)[source]

An variable in a CFG

Parameters

value (any) – The value of the variable