Modules Reference

Renderer

Module containing the render engine

class renderer.ModuleMap

The main mapping from module names to module classes. Modules are looked up by name through the get method.

(Essentially an enum or dictionary)

classmethod add_module(module_name, module)

Set a new module in the mapping.

Parameters:
  • module_name (str) – Name of the new module
  • module (BaseModule) – Module class to add to the mapping

classmethod get(module_type)

Returns the module class registered under the name passed as argument.

Parameters: module_type (str) – The desired module type.
Return type: Type[BaseModule]
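
The mapping described above can be pictured with a minimal sketch. It is not the actual implementation: ModuleMapSketch, BaseModuleSketch and CsvImporterSketch are hypothetical stand-ins, and the KeyError raised for an unknown name is an assumption, since the behavior of get in that case is not documented here.

```python
from typing import Dict, Type


class BaseModuleSketch:
    """Stand-in for BaseModule (hypothetical; the real class takes constructor arguments)."""


class ModuleMapSketch:
    """Stand-in registry mirroring the documented add_module/get classmethods."""

    # Class-level dict shared by all callers: module name -> module class.
    _modules: Dict[str, Type[BaseModuleSketch]] = {}

    @classmethod
    def add_module(cls, module_name: str, module: Type[BaseModuleSketch]) -> None:
        cls._modules[module_name] = module

    @classmethod
    def get(cls, module_type: str) -> Type[BaseModuleSketch]:
        # Assumption: an unknown name raises KeyError.
        return cls._modules[module_type]


class CsvImporterSketch(BaseModuleSketch):
    """Hypothetical module class used only for this example."""


ModuleMapSketch.add_module("csvImporter", CsvImporterSketch)
found = ModuleMapSketch.get("csvImporter")  # the class itself, not an instance
```

Because both methods are classmethods, no instance of the mapping is needed; callers register and look up module classes on the class directly.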

class renderer.Renderer(module_list, template_dir)

The render engine that can build the DAG, check the integrity of the operation graph and generate the rendered Scala code.

Parameters:
  • module_list (List[Dict[str, Any]]) – The list of module specifications to be parsed and added to the operation graph.
  • template_dir (str) – The path to the template directory.

check_integrity()

Check the integrity of the graph. Should be called after all the modules have been added to the graph (i.e. after initialization).

get_rendered()

Get the rendered code from the module list.

render_pdf_graph()

Create the graphviz Digraph and render the graph as a PDF.

Base Module

The module containing the abstract base class for all operation modules used throughout the rest of the code.

class modules.base_module.BaseModule(module, env, named_modules)

The abstract base class for modules. All modules are subclasses of BaseModule.

Every module object passed to the constructor must contain the moduleType and name fields.
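
As an illustration of the requirement above, the following sketch reports which of the two required fields are absent from a module specification. The helper missing_fields and the example values are hypothetical and are not part of the documented API.

```python
from typing import Any, Dict, List

# The two fields that every module specification must contain.
REQUIRED_FIELDS = ("moduleType", "name")


def missing_fields(module: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent from a module specification."""
    return [field for field in REQUIRED_FIELDS if field not in module]


complete = {"moduleType": "csvImporter", "name": "edges"}  # hypothetical values
incomplete = {"name": "edges"}

complete_missing = missing_fields(complete)
incomplete_missing = missing_fields(incomplete)
```

A check of this kind could run before a module is constructed, so that a malformed specification is reported by field name.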

All modules expose the following common API.

Parameters:
  • module (dict) – The dict containing the specification of the module. Every module takes this parameter, which should contain the fields required by the module itself and by all its parent classes.
  • env (jinja2.Environment) – The jinja environment where the templates can be retrieved.
  • named_modules (Dict[str, Type[BaseModule]]) – A mapping from name to module for all the other modules of the DAG.

add_to_graph(graph)

A method for adding the module to a graphviz graph instance.

Parameters: graph (graphviz.dot.Digraph) – A graphviz Digraph object

check_integrity()

Performs checks on the upstream modules and their types, where necessary, to ensure the integrity of the DAG.

get_out_type()

Returns the output type of the module as a list of strings.

Return type: List[str]

rendered_result()

Returns a pair of strings containing the rendered lines of code and the definitions of any external classes or objects.

Return type: Tuple[str, str]

to_graph_repr()

Generates the representation of the node in the form:

Name Type: $moduleType

Used for PDF graph generation.

Return type: str

Adding Modules

To create new modules with new functionalities, one can subclass any of the following base classes:

  • BaseModule: Works for any new module.
  • FileImporter: For extractor modules with files as input.
  • UnaryOperation: For modules that operate on a single input data flow.
  • BinaryOperation: For modules that implement an operation on two separate inputs.
  • FileOutput: For output modules with files as output.

When implementing a new module, one should use the following template:

from typing import List, Tuple

from jinja2 import Environment

from modules.base_module import BaseModule


class MyModule(BaseModule):  # Or any of the other base classes
    """Documentation of the module.

    Args:
        module (dict): Description of the module dict to
            be passed as argument.
    """

    def __init__(self, module, env: Environment, named_modules):
        super().__init__(module, env, named_modules)

        # Placeholder: replace with the path to the module's template.
        self.template_path = "path/to/template"
        self.template = self.env.get_template(self.template_path)

    def rendered_result(self) -> Tuple[str, str]:
        return self.template.render(
            name=self.name,
            # Other arguments
        ), ''  # or the rendered ext template if applicable

    def get_out_type(self) -> List[str]:
        # This function should return the output type of the module
        # as a list of strings.
        raise NotImplementedError

    def check_integrity(self):
        # This function performs integrity checks if applicable.
        pass

The module should have an associated Scala template, which is used to generate the corresponding code:

// ===== My module {{name}} =====

// Insert code here
val {{name}} = // The Flink Dataset

Once the module is defined, it can be registered with the rendering engine, for example by adding it directly to the ModuleMap class.
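
To show what registration makes possible, here is a sketch of how a list of module specifications might be turned into module instances once a class is registered. Everything in it is an assumption made for illustration: ModuleSketch stands in for a BaseModule subclass, a plain dict stands in for ModuleMap, the helper build_modules is hypothetical, and the real renderer may proceed differently.

```python
from typing import Any, Dict, List, Type


class ModuleSketch:
    """Stand-in for a BaseModule subclass; stores only what the example needs."""

    def __init__(self, module: Dict[str, Any], env: Any,
                 named_modules: Dict[str, "ModuleSketch"]):
        self.name = module["name"]
        self.module_type = module["moduleType"]
        self.named_modules = named_modules


# Plain dict standing in for ModuleMap: module type name -> module class.
registry: Dict[str, Type[ModuleSketch]] = {}
registry["myModule"] = ModuleSketch  # the role played by ModuleMap.add_module


def build_modules(module_list: List[Dict[str, Any]]) -> Dict[str, ModuleSketch]:
    """Instantiate one module per specification, keyed by its name."""
    named_modules: Dict[str, ModuleSketch] = {}
    for spec in module_list:
        module_cls = registry[spec["moduleType"]]  # the role played by ModuleMap.get
        # env is None here because no jinja2 environment is needed for the sketch.
        named_modules[spec["name"]] = module_cls(spec, None, named_modules)
    return named_modules


modules = build_modules([{"moduleType": "myModule", "name": "step1"}])
```

The moduleType field selects the class and the name field identifies the resulting instance, which is why both fields are required in every specification.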