What toolboxes do I need to integrate with Hadoop?

Hi, I am currently looking into integrating MATLAB with a Hadoop cluster. I have looked all over the website, but it isn't clear which toolboxes are actually necessary to do this. I know that MATLAB Compiler, Parallel Computing Toolbox, and MATLAB Distributed Computing Server (MDCS) are related, but the website does not make clear whether all, none, or some of these are actually necessary. Thanks

Accepted Answer

Esther on 18 Nov 2015
Hi Adam,
To integrate MATLAB with a cluster (whether a Hadoop cluster or some other generic cluster), you need MATLAB Distributed Computing Server (MDCS).
Then to send mapreduce jobs to that Hadoop cluster from MATLAB, you'll need at minimum Parallel Computing Toolbox.
MATLAB Compiler is only required if you wish to package MapReduce-based algorithms for deployment to production Hadoop systems.
Required:
  • MATLAB, MDCS, Parallel Computing Toolbox
Optional:
  • MATLAB Compiler
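
As a rough illustration of the "Required" setup above, here is a minimal sketch of pointing a MATLAB session at a Hadoop cluster. It assumes Parallel Computing Toolbox on the client and MDCS on the cluster. Both folder paths are placeholders invented for this example and must be replaced with the values for your own installation.

```matlab
% Sketch only: both paths below are hypothetical placeholders.
% Describe the Hadoop installation that the client should talk to.
cluster = parallel.cluster.Hadoop('HadoopInstallFolder', '/path/to/hadoop');

% Where MATLAB (MDCS) is installed on the cluster nodes (placeholder path).
cluster.ClusterMatlabRoot = '/path/to/matlab/on/cluster';

% Make this cluster the execution environment for later mapreduce calls.
mr = mapreducer(cluster);
```

Any later `mapreduce` call that is given `mr` as its last argument is then submitted to the cluster instead of running in the local session.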
  1 Comment
Adam Neufeldt on 18 Nov 2015
I actually ended up contacting them and had a phone call with one of their engineers. Here are the notes from that meeting:
There are two methods:
  • Method 1: Parallel Computing Toolbox (installed locally on each of our machines) together with MATLAB Distributed Computing Server (installed on the Hadoop cluster)
- This runs interactively in a live session. You can write and test code and run it right away. It is almost identical to how you normally use MATLAB, except that you have the additional computing power of all of the cluster's cores and you write MapReduce algorithms.
  • Method 2: MATLAB Compiler
- Can compile analytics into a Hadoop-specific executable, which can then run on the cluster (so it is not interactive). With no toolboxes at all, you can still download data from the Hadoop cluster, and write and test MapReduce algorithms on a small section of the cluster.
You can of course combine these two methods: test and debug your code on the entire cluster interactively with MDCS and Parallel Computing Toolbox, and then compile the code.
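
To make the "write and test MapReduce algorithms" step concrete, here is a minimal sketch that runs entirely in a local MATLAB session with no extra toolboxes. It assumes the `airlinesmall.csv` sample file that ships with MATLAB. The function names `meanDelayMapper` and `meanDelayReducer` and the key strings are made up for this example. Local functions inside a script need R2016b or later; on older releases, save each function in its own file.

```matlab
% Run mapreduce in the local MATLAB session (no cluster involved).
mapreducer(0);

% Sample data set shipped with MATLAB; only the arrival delay is read.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';

% Compute the mean arrival delay with a map step and a reduce step.
result = mapreduce(ds, @meanDelayMapper, @meanDelayReducer);
readall(result)

function meanDelayMapper(data, ~, intermKVStore)
    % Emit one partial [sum, count] pair per chunk of the datastore.
    delays = data.ArrDelay(~isnan(data.ArrDelay));
    add(intermKVStore, 'partialSumAndCount', [sum(delays), numel(delays)]);
end

function meanDelayReducer(~, intermValIter, outKVStore)
    % Combine the partial pairs into a single overall mean.
    total = [0, 0];
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, 'meanArrDelay', total(1) / total(2));
end
```

The same map and reduce functions should carry over to the interactive cluster workflow: replace `mapreducer(0)` with a `mapreducer` object created from a Hadoop cluster object, and point the datastore at an `hdfs://` location.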
