MIDACO Parallelization in Python (Spark)


MIDACO 6.0 Python Spark Gateway




The above gateway uses Spark instead of multiprocessing in order to execute several solution candidates in parallel.
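For comparison, the standard (non-Spark) gateway evaluates candidates with Python's multiprocessing module. The following is a minimal sketch of that baseline pattern, assuming a hypothetical objective function f standing in for the user's problem function:

```python
from multiprocessing import Pool

def f(x):
    # Hypothetical objective (sphere function), standing in for the
    # problem function that the MIDACO gateway evaluates
    return sum(xi * xi for xi in x)

if __name__ == "__main__":
    # A is the list of p solution candidates proposed by MIDACO
    # in one iteration (illustrative values)
    A = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
    p = len(A)
    with Pool(p) as pool:
        # Evaluate all p candidates in parallel, one worker each
        B = pool.map(f, A)
    print(B)
```

With Spark, the same map-over-candidates step is distributed across cluster nodes instead of local processes.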


MIDACO parallelization with Apache Spark is especially useful for (massive) parallelization on cluster- and cloud-computing systems that consist of individual virtual machines (VMs). Such systems are provided, for example, by Amazon EC2, Google Cloud, IBM Cloud, Digital Ocean and many academic institutions.



Set up a Spark Cluster (Linux)


Step 0.1

 Set up several virtual machines (VMs), each with its own IP address, for example:

 IP-VM1 =, IP-VM2 =, IP-VM3 =

 Ensure that the VMs can access each other via SSH keys

Step 0.2

 Download Spark:  https://spark.apache.org/downloads.html

 For example:  spark-2.3.0-bin-hadoop2.7.tgz (pre-built for Apache Hadoop)

Step 0.3  Store a copy of the unzipped spark folder on every VM. Name it for example "spark"
Step 0.4

 Select one VM as the master node and start it by executing the command:

 ./spark/sbin/start-master.sh --host

Step 0.5

 Set up each remaining VM as a slave node by executing the command:

 ./spark/sbin/start-slave.sh spark://

Step 0.6

 The Spark cluster should now be up and running. Visiting the address in a web browser should show something like this


Running MIDACO on the Spark Cluster


Step 1  Download the above MIDACO Python Spark gateway and remove the .txt extension
Step 2  Download the appropriate library file (midacopy.dll or midacopy.so) here
Step 3  Download an example (e.g. example.py) and remove the .txt extension
Step 4

 Execute MIDACO on the Spark cluster with a command like this:

  ./spark/bin/spark-submit --master spark:// example.py


Note: The advanced Text-I/O examples are particularly well suited for use with Spark.



 Screenshot of MIDACO running on a Spark cluster with 32 quad-core CPUs



A comprehensive step-by-step MIDACO Spark instruction is  [ under construction ]

These are some preliminary bash scripts for a 36-machine Spark cluster:



Note that the Spark-relevant commands inside midaco.py.txt itself are minimal:

[Line  24]   from pyspark import SparkContext
[Line 237]   sc = SparkContext(appName="MIDACO-SPARK-PARALLEL")
[Line 254]   rdd = sc.parallelize( A , p ).map(lambda x: problem_function(x))
[Line 256]   B = rdd.take(p)
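Conceptually, these lines distribute the p solution candidates in A across the cluster, apply the problem function to each, and collect the p results into B. The following pure-Python sketch shows the same map-and-collect pattern without a live Spark cluster; problem_function is a hypothetical stand-in for the user's objective evaluation, and the pyspark equivalent appears in the comments:

```python
# Sketch of the parallelize/map/take pattern used in midaco.py,
# runnable without Spark. problem_function is a hypothetical
# stand-in for the user's objective evaluation.

def problem_function(x):
    # e.g. a simple sphere objective
    return sum(xi * xi for xi in x)

# A: list of p candidate solutions proposed by MIDACO (illustrative values)
A = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
p = len(A)

# With Spark (as in midaco.py):
#   rdd = sc.parallelize(A, p).map(lambda x: problem_function(x))
#   B = rdd.take(p)
# Serial equivalent of that map-and-collect:
B = [problem_function(x) for x in A][:p]
print(B)  # one evaluated objective per candidate
```

In the Spark version, parallelize(A, p) splits the candidate list into p partitions so each candidate can be evaluated on a different worker, and take(p) gathers the first p results back to the driver.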