MIDACO Parallelization in Python (Spark)

MIDACO parallelization with Apache Spark is especially useful for (massive) parallelization on cluster and cloud computing systems that consist of individual virtual machines (VMs). Such systems are provided, for example, by the Amazon EC2 service, Digital Ocean and many industrial and academic institutions. In the future, detailed and easy-to-follow step-by-step instructions will be provided here, explaining:

 

  • How to install and set up a Spark cluster of any size
  • How to run MIDACO in parallel on such a Spark cluster
  • How to couple such an approach with any programming language
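
Until those instructions are available, the general idea behind running an optimizer in parallel on Spark can be sketched as follows: a block of P candidate solutions is handed to the cluster, each worker evaluates some of them, and the P objective values are collected on the driver. This is a minimal sketch and not MIDACO's own interface. The names objective, evaluate_block and spark_demo are hypothetical, and the sum-of-squares objective is only a placeholder for a real, expensive function. The only Spark calls used are the standard PySpark operations parallelize, map and collect.

```python
from typing import Callable, List, Sequence


def objective(x: Sequence[float]) -> float:
    """Hypothetical stand-in for an expensive objective function."""
    return sum(v * v for v in x)


def evaluate_block(
    points: List[List[float]],
    f: Callable[[Sequence[float]], float] = objective,
    sc=None,
) -> List[float]:
    """Evaluate a block of P candidate points and return P objective values.

    Without a SparkContext the block is evaluated serially on the driver.
    With one, each point is placed in its own partition so that the
    workers can evaluate the points in parallel.
    """
    if sc is None or not points:
        return [f(p) for p in points]
    # parallelize / map / collect are standard PySpark RDD operations;
    # collect() returns the results in the same order as the input list.
    return sc.parallelize(points, len(points)).map(f).collect()


def spark_demo(p: int = 8) -> List[float]:
    """Evaluate P illustrative points on Spark (requires pyspark)."""
    from pyspark import SparkContext  # imported lazily, only needed here

    sc = SparkContext.getOrCreate()
    try:
        block = [[float(i), float(i + 1)] for i in range(p)]
        return evaluate_block(block, sc=sc)
    finally:
        sc.stop()
```

Calling evaluate_block without a SparkContext gives a serial reference result, which is convenient for checking the objective before moving to a cluster. Calling spark_demo() from a script started on the Spark driver would distribute the same evaluation over the workers.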

 

Note that such a Spark / MIDACO cluster uses Python as the programming language for the Spark driver, but it is powerful enough to be coupled with applications that are written in any programming language or that are available only as an executable, library or third-party software. Below is a screenshot of MIDACO running with a parallelization factor of P=1000 on a cluster of 32 quad-core CPUs using Apache Spark.

 

 

 Please feel free to contact us for more information and support on setting up a Spark / MIDACO cluster.
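
As noted above, the Python driver can be coupled with applications that exist only as an executable. One common way to do this, shown here as a sketch under stated assumptions rather than as the MIDACO method, is to wrap the external program in a Python function that passes the candidate solution as command-line arguments and reads the objective value from standard output. The name evaluate_external and the command EXTERNAL_CMD are hypothetical. To keep the example runnable anywhere, a one-line Python program stands in for the external application.

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical stand-in for an external application: it reads the
# variables from its command-line arguments and prints the sum of squares.
# In practice this would be the path to a real executable.
EXTERNAL_CMD: List[str] = [
    sys.executable,
    "-c",
    "import sys; print(sum(float(a) ** 2 for a in sys.argv[1:]))",
]


def evaluate_external(x: Sequence[float], cmd: Sequence[str] = EXTERNAL_CMD) -> float:
    """Run the external program on one candidate point and parse its output.

    Assumes the program accepts the variables as arguments and prints a
    single number (the objective value) to standard output.
    """
    args = list(cmd) + [repr(float(v)) for v in x]
    # check=True raises CalledProcessError if the program exits with an error.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return float(result.stdout.strip())
```

A wrapper of this kind is an ordinary Python function, so it could be passed to a Spark map operation, provided that the external program is installed on every worker node.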