Extractors
The package containing all the extractor operation modules.
File Importer
The file importer operation module base class.
class modules.extractors.file_importer.FileImporter(module, env, named_modules)
File Importer is an abstract class used for building modules that read files on disk.
It cannot be used as is because it is an abstract class.
Parameters: module (dict) – The module dict must have a path field that contains the path to the file to be read by the module (Ex: ~/project/file.csv).
add_to_graph(graph)
Adds the module to a graphviz graph instance.
Parameters: graph (graphviz.dot.Digraph) – A graphviz Digraph object.
check_integrity()
Performs checks on the upstream modules and types when necessary to ensure the integrity of the DAG.
rendered_result()
Returns a pair of strings containing the rendered lines of code and the external class or object definitions.
Return type: Tuple[str, str]
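The abstract-class contract described above can be sketched with Python's standard `abc` module. This is a minimal illustration, not the actual `FileImporter` source: the class names `FileImporterSketch` and `LineImporterSketch`, the simplified constructor, and the rendered string are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class FileImporterSketch(ABC):
    """Hypothetical stand-in for an abstract file importer (not the real class)."""

    def __init__(self, module: dict):
        # Documented contract: the module dict carries a "path" field.
        if "path" not in module:
            raise ValueError("module dict must have a 'path' field")
        self.path = module["path"]

    @abstractmethod
    def rendered_result(self) -> Tuple[str, str]:
        """Return (rendered lines of code, external definitions)."""


class LineImporterSketch(FileImporterSketch):
    """Hypothetical concrete subclass; the rendered text is made up."""

    def rendered_result(self) -> Tuple[str, str]:
        code = f'read_file("{self.path}")'
        external_definitions = ""
        return code, external_definitions


importer = LineImporterSketch({"path": "~/project/file.csv"})
code, definitions = importer.rendered_result()
```

Instantiating `FileImporterSketch` directly raises a `TypeError`, which mirrors the note that the base class cannot be used as is.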
CSV Importer
The CSV loader operation module.
class modules.extractors.csv_importer.CsvImporter(module, env, named_modules)
Bases: modules.extractors.file_importer.FileImporter
Main CSV loader operation module class.
Parameters: module (dict) – The module dict must have a dataType field that contains the input types as a list of strings (Ex: ["String", "Int", "Double"]).
Other optional fields are:
- fieldDelimiter (CSV delimiter if other than comma, Ex: "|")
- quoteCharacter (don’t separate within quoted fields, Ex: "\"")
- namedFields (for selecting only some of the columns by their name, Ex: ["name", "age"])
get_out_type()
Returns the output type of the module as a list of strings.
rendered_result()
Returns a pair of strings containing the rendered lines of code and the external class or object definitions.
Return type: Tuple[str, str]
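To make the meaning of the optional CSV fields concrete, here is a small sketch that applies the same three options with Python's standard `csv` module. It only illustrates the semantics of `fieldDelimiter`, `quoteCharacter` and `namedFields`; it is not how `CsvImporter` reads data. The sample content is made up, and treating `dataType` as describing the three columns of the file is an assumption.

```python
import csv
import io

# Hypothetical module dict using the fields documented above.
csv_module = {
    "path": "~/project/file.csv",
    # Assumption: one type per column in the file.
    "dataType": ["String", "String", "Int"],
    "fieldDelimiter": "|",
    "quoteCharacter": '"',
    "namedFields": ["name", "age"],
}

# Made-up sample data: the city value contains the delimiter inside quotes.
sample = 'name|city|age\nAlice|"Paris|FR"|30\n'

reader = csv.DictReader(
    io.StringIO(sample),
    delimiter=csv_module["fieldDelimiter"],
    quotechar=csv_module["quoteCharacter"],
)
full_rows = list(reader)

# Keep only the columns listed in namedFields.
rows = [
    {name: row[name] for name in csv_module["namedFields"]}
    for row in full_rows
]
```

The quoted `"Paris|FR"` value stays in one field even though it contains the delimiter, and `rows` holds only the `name` and `age` columns.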
JSON Importer
Warning: The JSON importer has limited extraction capabilities; MongoDB should be used instead when possible.
The JSON loader operation module.
class modules.extractors.json_importer.JsonImporter(module, env, named_modules)
Bases: modules.extractors.file_importer.FileImporter
Main JSON loader operation module class.
get_out_type()
Returns the output type of the module as a list of strings.
rendered_result()
Returns a pair of strings containing the rendered lines of code and the external class or object definitions.
Return type: Tuple[str, str]
DB Importer
The database loader operation module.
class modules.extractors.db_importer.DbImporter(module, env, named_modules)
Bases: modules.base_module.BaseModule
Main database loader operation module class.
Parameters: module (dict) – The module dict must have:
- A dbUrl field with the database entry point for JDBC (e.g. for a Postgres db named test running on localhost: "jdbc:postgresql://localhost/test").
- A dataType field with the input data types (Ex: ["String", "Int", "Double"]).
- The names of the desired columns in fieldNames (Ex: ["age", "date", "name"]).
- The query to be interpreted by the db.
Other optional fields are:
- filterNull, a boolean value for filtering null values from the db output.
add_to_graph(graph)
Adds the module to a graphviz graph instance.
Parameters: graph (graphviz.dot.Digraph) – A graphviz Digraph object.
check_integrity()
Performs checks on the upstream modules and types when necessary to ensure the integrity of the DAG.
get_out_type()
Returns the output type of the module as a list of strings.
rendered_result()
Returns a pair of strings containing the rendered lines of code and the external class or object definitions.
Return type: Tuple[str, str]
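A module dict for the database loader, built from the fields listed above, might look like the following sketch. The helper `missing_db_fields`, the SQL text and the type list are hypothetical and only serve to illustrate the required and optional fields; the real class may validate its input differently.

```python
# Field names taken from the parameter list above.
REQUIRED_DB_FIELDS = ("dbUrl", "dataType", "fieldNames", "query")


def missing_db_fields(module: dict) -> list:
    """Hypothetical helper: list the required fields absent from the dict."""
    return [name for name in REQUIRED_DB_FIELDS if name not in module]


db_module = {
    "dbUrl": "jdbc:postgresql://localhost/test",
    "dataType": ["Int", "String", "String"],  # made-up types for the columns
    "fieldNames": ["age", "date", "name"],
    "query": "SELECT age, date, name FROM people",  # made-up query
    "filterNull": True,  # optional field
}

problems = missing_db_fields(db_module)
```

Running a check like this before building the DAG gives an early, readable error instead of a failure later in code generation.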
Mongo Importer
The Mongo loader operation module.
class modules.extractors.mongo_importer.MongoImporter(module, env, named_modules)
Bases: modules.base_module.BaseModule
Main Mongo loader operation module class.
The Mongo loader retrieves an arbitrary number of fields from MongoDB documents and converts them into a Flink DataSet.
Parameters: module (dict) – The module dict must have a dbName field with the name of the DB (ex: "hatvpDb"), a collection field with the name of the desired collection (ex: "publications"), and the requiredFields of the obtained documents (ex: ["age", "name"]).
add_to_graph(graph)
Adds the module to a graphviz graph instance.
Parameters: graph (graphviz.dot.Digraph) – A graphviz Digraph object.
check_integrity()
Performs checks on the upstream modules and types when necessary to ensure the integrity of the DAG.
get_out_type()
Returns the output type of the module as a list of strings.
rendered_result()
Returns a pair of strings containing the rendered lines of code and the external class or object definitions.
Return type: Tuple[str, str]
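The idea of keeping only the requiredFields of each document can be illustrated in plain Python. This is a sketch under assumptions: the real loader targets a Flink DataSet, the documents below are made up, the `project` helper is hypothetical, and returning `None` for a missing field is a choice made for this example only.

```python
# Hypothetical module dict using the fields documented above.
mongo_module = {
    "dbName": "hatvpDb",
    "collection": "publications",
    "requiredFields": ["age", "name"],
}


def project(document: dict, required_fields: list) -> tuple:
    """Hypothetical helper: keep only the required fields, in order."""
    # Assumption: a missing field becomes None.
    return tuple(document.get(field) for field in required_fields)


# Made-up documents standing in for the content of a collection.
documents = [
    {"_id": 1, "name": "Alice", "age": 30, "city": "Paris"},
    {"_id": 2, "name": "Bob"},
]

records = [project(doc, mongo_module["requiredFields"]) for doc in documents]
```

Each record is a tuple whose positions follow the order of requiredFields, which is roughly the shape a typed dataset of tuples would need.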