
Getting Started


This portal allows anyone with an Amazon AWS account to launch their own Thor cluster at the push of a button. A Thor cluster is a high-performance computing cluster designed specifically for data-intensive computing. With a single click, this site will configure a new security group on your Amazon AWS account, create a security key specifically for your cluster, provision cluster nodes as EC2 instances, install the necessary software on those instances, test them for existing hardware issues, and configure the instances as a Thor cluster. The site also provides additional information to help you manage your cluster and gives you a simple button to terminate it.

Some individuals may want to follow a manual process to learn more about cluster setup and configuration. Instructions for manual setup can be found here. Continue reading for the automated setup.

Logging In

1.  Click the Login link at the top of the page.
2.  Enter your Access Key ID and Secret Access Key.
     NOTE: This information is passed to Amazon and is NOT retained.
     You can find your Amazon Access Keys here
3.  Read the Terms of Service, and click the checkbox to acknowledge that you are responsible for AWS charges
     incurred and that you agree to all the Terms of Service.
4.  Click the Login button to complete the login process.
5.  Immediately after logging in, you will be taken to the View Clusters page.

Launching Your Thor Cluster

1.  Click the Launch Cluster link at the top.
2.  Enter the total number of Thor nodes to launch. The system will allocate the necessary support nodes and display the
     total number of nodes to be launched.
3.  Verify the node numbers are correct, and then press the Launch Cluster button.
4.  The Cluster Launch Log displays and updates frequently to show what has been completed.
5.  When finished, the log says Done and the Status above the log should indicate Ready.

Congratulations! Your HPCC Systems Thor cluster is now running and ready to perform data analytics.

The View Clusters Page

At any time after logging in, you can click the View Clusters link to view your active clusters, if any.

This page includes many useful links:

     ·   ESP – Launch the ECL Watch page for that cluster
     ·   Log – View the Cluster Launch log
     ·   Config – View the configuration file for the cluster
     ·   IPs – View the list of IP addresses of the nodes used in your cluster
     ·   Key – View the SSH key information for accessing your cluster’s nodes

It also has a link that allows you to Terminate the cluster.
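The Key link above provides the private SSH key for logging in to your cluster's nodes directly. As a rough sketch (the key file name, node IP, and login user below are placeholder assumptions; take the real IP from the IPs link and the key from the Key link), the snippet builds and prints the connection command for you to adapt:

```shell
# Placeholder values -- substitute the node IP shown by the IPs link and
# the key file you saved from the Key link. The login user depends on the
# AMI; ec2-user is typical for Amazon Linux instances.
KEY_FILE="$HOME/thorcluster.pem"
NODE_IP="10.0.0.12"

# ssh rejects keys with loose permissions, so first run: chmod 600 "$KEY_FILE"
# The connection command itself:
echo "ssh -i $KEY_FILE ec2-user@$NODE_IP"
```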

Installing the ECL IDE

The ECL IDE installs to your Windows workstation. Once you install it, you can use it for any cluster.

1.  From the View Clusters page, click on the ESP link to launch the ECL Watch page for a cluster.
     Take note of the IP address for the ESP server running ECL Watch; you will use it later.
2.  Click on the Resources/Browse link in the left side menu.
3.  Download and save (do not run from your browser) both ECL IDE Installer and Graph Control Installer.
     Install both the ECL IDE and Graph Control. When installation is complete, launch the ECL IDE.
4.  When you open the ECL IDE for the first time, supply the IP address of the ESP server. If this is not the first time,
     you may edit the ESP server IP address by clicking the Preferences button on the Login dialog.
5.  Log in using HPCCDemo as both the username and password.

You will use the ECL IDE to create and edit your ECL code, run and monitor workunits, view results, and more. There are also many other resources that you may wish to install or investigate. These are available where you found the ECL IDE.

Thor Cluster Readiness and System Status

The ECL Watch page includes links that allow you to view the status of the nodes that make up your Thor cluster environment. Depending upon how many nodes you elected to include when you launched, system support nodes may be separate from or spread across your Thor nodes. In either case, you can check the status of both the system support and Thor nodes from ECL Watch.

From the menu on the left, under Topology, click System Servers to see a list of all the system support nodes in your Thor cluster. Select any or all of the checkboxes, click the Submit button, and detailed information about those nodes will be shown.

Also under Topology, click Cluster Processes to see a list of the Thor master and slave nodes. Select any or all of the checkboxes, click Submit, and information about these nodes will be shown.

Among other information, you should see a Condition of “Normal,” a State of “Ready,” and nothing under “Processes Down.” Problems, if any, are designated by a yellow/orange highlight. Note, however, that a workunit failure or abort may result in ThorMaster or ThorSlave processes being down temporarily while the system resets itself; this is normal.

“Hello, HPCC” & Other Testing

1.  In the ECL IDE, open a new Builder window. You can do this via the menu or the icon at the top left.
2.  Type this ECL code into the new window:
     OUTPUT('Hello, HPCC');
3.  Click the Submit button.
4.  You should see a new workunit at the bottom of that window. Click it to view results and other workunit details.
5.  You may also view the workunit in the ECL Watch web page. Click Browse in the Workunits section on the left.

Once you have done that successfully, especially if this is your first venture into Thor or One-Click Thor™, we recommend you visit the Code Samples page. There you will find a number of self-contained projects, including the ECL code and instructions necessary to test with real data.

For a larger, self-contained test, copy and paste the following ECL into a new Builder window in the ECL IDE. When you submit it to run, be sure to target your Thor cluster (see the Target dropdown list at the top right of the window).

// Sample ECL to build a simple dataset from scratch

// The total number of records to create per node
UNSIGNED8   RecordsPerNode := 1000000;

// The record layout of our dataset, two fields
TestLayout := RECORD
  UNSIGNED4  NodeNumber;
  UNSIGNED4  Field1;
  UNSIGNED4  Field2;
END;

// A one-record source dataset from which we will produce our output
TestInput := DATASET([{0, 0, 0}], TestLayout);

TestLayout AddNodeNumber(TestLayout L, UNSIGNED4 C) := TRANSFORM
  SELF.NodeNumber := C - 1;                    // DISTRIBUTE is zero-based, COUNTER is one-based
  SELF            := L;                        // we will set the other field values later
END;

// Create one record for each node in the cluster (CLUSTERSIZE is the node count)
OneRecordPerNode := NORMALIZE(TestInput, CLUSTERSIZE, AddNodeNumber(LEFT, COUNTER));

// Now distribute by NodeNumber so each node has one record, and NORMALIZE below is parallel
OneRecordOnEachNode := DISTRIBUTE(OneRecordPerNode, NodeNumber);

// The transform function, from which our result records will be produced
TestLayout MakeRecord(TestLayout L) := TRANSFORM
  SELF.Field1 := RANDOM() % 1000000;  // random numbers between 0 and 999,999
  SELF.Field2 := RANDOM() % 1000000;
  SELF        := L;
END;

// Normalize the one-record-per-node dataset to many records (RecordsPerNode) in parallel
AllRecords := NORMALIZE(OneRecordOnEachNode, RecordsPerNode, MakeRecord(LEFT));

// For our right-side join input, use only those records where Field2 is greater than Field1
Field2GTField1 := AllRecords(Field2 > Field1);

// Join to produce dataset containing records that have their reverse match
HasReverseMatches := JOIN(AllRecords, Field2GTField1,
                          LEFT.Field1 = RIGHT.Field2 AND LEFT.Field2 = RIGHT.Field1,
                          TRANSFORM(TestLayout, SELF := LEFT));  // Simple transform: take LEFT

// Write result to cluster; add workunit ID to filename so that subsequent tests do not overwrite
OUTPUT(AllRecords, , '~Test::Random_Output_' + WORKUNIT);

// Write the result of the join to the cluster, too
OUTPUT(HasReverseMatches, , '~Test::Has_Reverse_Matches_' + WORKUNIT);

// End of Sample ECL

It generates a dataset from scratch, normalizing and distributing it across all nodes in the cluster, fills a couple of fields in each record with random numbers, joins that dataset against a filtered copy of itself to find records whose two values appear reversed in another record, and writes those results to your cluster. You can change the RecordsPerNode value (currently 1,000,000) to change the size of the datasets produced.

It provides a number of things you can then look at in the ECL IDE and in ECL Watch, including the workunit itself. Take a look at the graph, timings, results, and more. Click the links to explore the metadata available to you, including the filenames (in Results) to look at their metadata.
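Once the sample completes, you can also read one of the files it wrote back into a new Builder window. A minimal sketch (the workunit ID in the filename below is a placeholder; use the actual logical filename shown in your workunit's Results):

```ecl
// Hypothetical example: substitute the logical filename from your own workunit
TestLayout := RECORD
  UNSIGNED4  NodeNumber;
  UNSIGNED4  Field1;
  UNSIGNED4  Field2;
END;

// Read the file written by the sample and report how many records it holds
MyData := DATASET('~Test::Random_Output_W20120101-120000', TestLayout, THOR);
OUTPUT(COUNT(MyData));
```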

Terminating Your AWS Thor Cluster

As described under View Clusters, we provide a link to terminate each of your clusters. We also recommend you utilize the AWS Management Console directly to confirm your EC2 instances have terminated. Be sure to change the console region in the upper left corner to match the region you launched your cluster in.

Be sure to terminate your clusters when finished with them to prevent incurring additional and ongoing AWS charges from Amazon. Closing our web page, ECLWatch, or the ECL IDE does not terminate your AWS clusters. Keep in mind, however, that all data on the cluster will be destroyed when the cluster is terminated. You will want to de-spray and store any data you wish to keep.
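The de-spray itself can be done from ECL Watch or from the command line with the dfuplus client tool. As a hedged sketch (the server IP, logical filename, and destination path below are placeholders; dfuplus is installed with the HPCC client tools), the snippet builds and prints the command for you to adapt:

```shell
# Placeholder values -- substitute your ESP server IP (from the View Clusters
# page) and the logical file written by your workunit.
ESP_IP="203.0.113.10"
LOGICAL_FILE="~Test::Random_Output_W20120101-120000"

# Despray (export) the logical file to a flat file on the landing zone.
echo "dfuplus server=http://$ESP_IP:8010 username=HPCCDemo password=HPCCDemo \
action=despray srcname=$LOGICAL_FILE dstip=$ESP_IP \
dstfile=/var/lib/HPCCSystems/mydropzone/random_output.d00"
```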

Remember, you are responsible for all AWS charges incurred.

For more detail on using an HPCC Thor cluster on AWS, see Thor on AWS. For more examples and tutorials, see the Documentation pages on the HPCC Systems portal.