Frequently Asked Questions
What is this site?
This site allows users to launch, configure, and terminate HPCC Systems Thor clusters at the push of a button in the Amazon Web Services (AWS) environment. It also gathers and organizes additional information you may need to manage these clusters.
You can learn more about the HPCC Platform at hpccsystems.com.
Who pays for this usage?
Users are responsible for all Amazon charges. HPCC Systems is just providing an easier way of starting a cluster on the Amazon AWS environment.
Is this cloud provided by HPCC Systems?
No. Amazon.com provides this cloud service. HPCC Systems is just facilitating the creation of a cluster.
Is this equivalent to the Amazon EMR?
No. That will come later. HPCC Systems is using the Amazon AWS (Amazon Web Services) platform to help you create Thor Clusters. This is not the ideal environment but it will allow users to start testing and exploring the HPCC Systems Thor platform with relative ease and low cost.
When will an EMR equivalent be available?
HPCC Systems is working with Amazon to create a similar service to EMR (Elastic MapReduce). As the HPCC Systems platform does not use the limited data model of MapReduce, the name should change. Internally, HPCC Systems is using "EECL" to represent Elastic ECL. ECL is the data centric, cluster aware, big data declarative language that the HPCC Systems platform uses. Stay tuned for an announcement on when HPCC Systems and Amazon will have this ready.
What are some of the limitations of this implementation (EC2 AWS) vs an EMR equivalent?
AWS, and specifically EC2, are general purpose computing clouds, while EMR was designed to provide a simple way to provision a Hadoop cluster. Our “One-Click” AWS provisioning system for the HPCC Systems Thor platform is in some ways equivalent to EMR; it provides the power of HPCC Thor with a simple, easy-to-use deployment tool. That said, there are differences. We cannot control the proximity of nodes within a particular EC2 availability zone. Also, because EC2 is commodity hardware, you may see hardware errors beyond our control. For that reason, we are currently favoring newer Amazon regions when we stress test very large clusters on Amazon.
Where can I get some ECL examples to try in my test cluster?
Visit the Code Samples page which provides a list of example files suitable to your level of expertise. In addition, HPCC Systems has a growing set of ECL examples on the Web site: Contributions.
Your contributions are also welcome.
What about security?
The clusters you launch through this portal are built on top of AWS. You can read more about Amazon’s security processes here: Amazon Web Services Overview of Security Processes.
Regarding our One-Click Thor™ solution, all traffic is encrypted in flight via HTTPS. Cookies are cryptographically signed to prevent tampering. To access instances launched through this portal, we use the key-exchange process required of most Amazon AMIs to launch and manage clusters in their Data Centers. We recommend the following precautions:
- Create an AWS Access Key specifically for use on this site.
- Deactivate your AWS Access Key when not in use.
- Download and delete your clusters' ssh key from this site once each cluster is ready.
- Log out of this site when not in use.
- Clear your cookies when accessing this site from an unsecure computer.
Manage Your AWS Access Keys
What Amazon Machine Image (AMI) is being used?
HPCC Systems is using a large instance (m1.large). We are now using our own AMIs that are built from Canonical's Ubuntu 11.10 AMI.
How can I get support?
Visit HPCC Systems Support Forums for free community support. Enterprise level support is also available. Please contact us for details.
What if I need additional support beyond the "free" forums?
Enterprise level support is also available. Please contact us for details.
Will it work from my Mac?
Yes. The provisioning interface is web based, so it will work on any computer that can run a web browser. The client-side tools, ECL Integrated Development Environment (IDE) or Eclipse for ECL, can run on a Mac under Parallels or a Windows virtual machine. A native Mac port of the ECL IDE and Eclipse plugin allowing you to develop and run ECL code on a Mac will be released soon.
Will it work from my Linux workstation?
A plugin for Eclipse allowing you to develop and run ECL code on a Linux workstation is available now. You can also run the ECL IDE under Wine. Command-line tools are also available for Linux.
How big of a cluster can I build? Any recommendations?
We have tested systems from 1 node up to 1000 nodes on Amazon. We recommend you begin with a cluster of 10 nodes or fewer until you become accustomed to how the environment works. Please note that the HPCC Systems Thor platform is significantly more efficient than an equivalent Hadoop cluster, so only a fraction of the nodes (possibly 1/3 or 1/4) is needed to perform a similar computation. You may be surprised how much more you can get done on a Thor cluster with fewer nodes.
We do not currently recommend clusters larger than 200 nodes on Amazon. While we regularly run production clusters much larger than 200 nodes outside of Amazon, certain hardware errors present themselves more readily on large clusters on Amazon. We are working on dynamic solutions to these Amazon hardware issues.
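As a rough illustration of the sizing guidance above, the sketch below converts a Hadoop node count into a starting range for a Thor cluster. It uses the 1/3 and 1/4 ratios quoted in this answer and caps the result at the 200-node recommendation. The function name and the choice to round up are assumptions made for this example; they are not part of the One-Click Thor service.

```python
import math

RECOMMENDED_MAX_NODES = 200  # upper bound suggested in this FAQ for Amazon


def estimate_thor_nodes(hadoop_nodes):
    """Return a (low, high) Thor node estimate for a given Hadoop cluster size."""
    # 1/4 and 1/3 are the rough ratios quoted in the FAQ, not measured values.
    low = max(1, math.ceil(hadoop_nodes / 4))
    high = max(1, math.ceil(hadoop_nodes / 3))
    return (min(low, RECOMMENDED_MAX_NODES), min(high, RECOMMENDED_MAX_NODES))


print(estimate_thor_nodes(30))  # (8, 10)
```

Treat the result as a first guess only; the actual ratio depends on your workload.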
Is there information available around the "support nodes" that get created?
The One-Click Thor site calculates the number of support nodes needed given the cluster size you use. The Total Nodes in your cluster is the Thor nodes plus the Support nodes.
What will I see in my Amazon AWS Management Console?
When you launch a new cluster, you will see one new EC2 instance for each node in your cluster. You will also see a new security group configured specifically for your new cluster and a new key pair. Both will have the same name as your cluster.
When you terminate a cluster, the site will terminate those EC2 instances that have a security group and key pair with the same name as your cluster. The site will then delete the security group and key pair from your Amazon AWS account.
From the View Clusters window, why is the link listed under the ESP Page column not working?
This is most likely due to your internal firewall blocking this IP address. Amazon Web Services releases a new range of IP addresses periodically which might be blocked depending on your firewall settings. Visit the AWS Developer forum for a recent list of IP addresses.
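If you have your firewall's allowed ranges in CIDR notation, a quick check like the sketch below can show whether the ESP address of your cluster falls inside any of them. The addresses and ranges here are placeholder documentation values, and the function name is made up for this example; substitute the IP address shown on your View Clusters page and your own firewall rules.

```python
import ipaddress


def is_allowed(ip, allowed_ranges):
    """Return True if `ip` falls inside any of the CIDR ranges given."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


# Placeholder ranges standing in for your firewall's allow list.
firewall_ranges = ["198.51.100.0/24", "203.0.113.0/25"]

print(is_allowed("203.0.113.10", firewall_ranges))   # True
print(is_allowed("203.0.113.200", firewall_ranges))  # False
```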
What is the difference between One-Click cloud and the AWS support for HPCC previously announced?
“One-Click” provides an effective way to provision an entire HPCC Systems Thor cluster on AWS from a simple web interface. In addition, the HPCC Systems platform can be provisioned on AWS manually when more flexibility is required around deployment options or when Roxie is needed as part of the platform.
Can I still use the Amazon console and other AWS tools to manage the One-Click created Thor? How?
Yes, you can still manage the individual nodes independently from your Amazon console and other AWS tools, but this will usually have a detrimental impact on your cluster, so it’s not recommended. Please note that destroying a node instance from your Amazon console will, in most cases, stop a running job and can lead to data loss, depending on your redundancy settings.
What happens to my clusters when I log out of this site?
Nothing. This system interacts with your Amazon AWS account. When you log out of this site, your clusters will continue to run on Amazon AWS until you terminate them, either from this site or from your Amazon AWS Management Console.
What if I navigate away from the log page while it's busy?
The site will continue to fulfill your request and accept new requests. This is true even if you log out. You can navigate back to this page by going to the `View Clusters` page and clicking on the `View Log` link for the cluster in question.
Why did my cluster fail to configure?
The most common reason is that you have a new Amazon AWS account. Even though you have credentials from Amazon, it still takes a while for Amazon to fully enable your account. You're almost there. Try again in an hour or two.
The second most common reason is that you requested a cluster size that would cause you to exceed the default limits imposed by Amazon. At the time of this writing, Amazon sets that default limit to no more than 20 total EC2 instances per account. Don't worry. Many people launch significantly larger clusters on Amazon. Use the link AWS Limit Increase Request to request a limit increase. You will need to know that this site currently launches `Large` `Linux` instances in the `US East` region by default. You will also need your account number, found in the upper right hand corner of the AWS Access Keys page.
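The limit check described above can be sketched as simple arithmetic. The example assumes you already know how many support nodes the site will add (the site calculates that number itself, so here it is just an input) and how many EC2 instances your account is already running. The function and parameter names are hypothetical.

```python
DEFAULT_EC2_INSTANCE_LIMIT = 20  # default per-account limit quoted in this FAQ


def exceeds_default_limit(thor_nodes, support_nodes, running_instances=0):
    """Return True if launching the cluster would pass the default EC2 limit."""
    total_nodes = thor_nodes + support_nodes  # Total Nodes = Thor + Support
    return running_instances + total_nodes > DEFAULT_EC2_INSTANCE_LIMIT


print(exceeds_default_limit(thor_nodes=18, support_nodes=3))  # True
print(exceeds_default_limit(thor_nodes=10, support_nodes=2))  # False
```

If the check returns True, request a limit increase before launching.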
It is possible that Amazon is running slow. Check the link AWS Health Dashboard to see if Amazon AWS is experiencing issues.
If none of these apply, please reach out via the Forum link.
Why was I charged for more nodes than I requested?
EC2 instances don’t always launch perfectly. You may not see this very often, but you will run across it when launching very large clusters. Each EC2 instance is launched in an identical way, and we run a number of tests on each one. For reasons outside of our control, there are occasional issues. These might include our software not installing properly, disk I/O errors, packet errors, or other system errors. When we detect a bad instance during cluster configuration, we terminate it and launch a replacement, so your bill can include the replaced instances as well as the ones in your final cluster.
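The launch-and-replace behavior described above can be pictured with the simulation below. It is a sketch only: `launch_instance` and `is_healthy` are hypothetical callables standing in for the real launch and health-test steps, and the retry cap is an assumption. The point is that the number of instances launched can be higher than the number of nodes requested.

```python
def launch_cluster(requested_nodes, launch_instance, is_healthy, max_attempts=5):
    """Launch instances until `requested_nodes` healthy ones exist.

    Returns (healthy_instances, total_launched). total_launched can exceed
    requested_nodes because every replacement is a new instance launch.
    """
    healthy = []
    total_launched = 0
    for _ in range(requested_nodes):
        for _attempt in range(max_attempts):
            instance = launch_instance()
            total_launched += 1
            if is_healthy(instance):
                healthy.append(instance)
                break
            # A bad instance would be terminated here before relaunching.
        else:
            raise RuntimeError("instance kept failing health checks")
    return healthy, total_launched


# Simulated launcher: instance IDs are integers and instance 2 is "bad".
counter = iter(range(1, 100))
healthy, launched = launch_cluster(3, lambda: next(counter), lambda i: i != 2)
print(healthy, launched)  # [1, 3, 4] 4
```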
What can I do when I try to start a cluster and it hangs or parts of the process (per log) fail?
Use the Abort Cluster control/button (currently in development) from our main page, and/or
go to the Amazon AWS Management EC2 Console (log in to the AWS system), find the cluster in question, and terminate it.
Will this tool monitor my running clusters?
No. The One-Click Thor service is designed to launch clusters, terminate clusters, and provide additional information needed to manage your clusters. It does not participate in any other activities.
What is the proper way to shut down my cluster through the AWS management console?
The proper way to shut down your cluster is to click terminate on the cluster view. When you click terminate, the One-Click Thor™ portal does the following:
- Terminates any instances in the region specified that contain a Security Group and Key with names matching your cluster (Thor-xxxx)
- Deletes the Security Group
- Deletes the Key
- Marks the cluster record on the One-Click Thor™ site as terminated.
If you manually delete any nodes, security groups or keys through the AWS Management Console, you can still click “terminate” in the “View Clusters” page to complete this action.
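The matching rule in the first step above can be sketched over plain data as follows. The dictionary keys, the instance IDs, and the cluster name `Thor-0001` are hypothetical values chosen for this example; the sketch does not call AWS and is not the portal's actual code. It also shows why an instance launched manually with the cluster's security group and key pair would be selected for termination.

```python
def plan_termination(cluster_name, region, instances):
    """Pick the instance IDs a terminate request for `cluster_name` would target.

    `instances` is a list of dicts with hypothetical keys: "id", "region",
    "security_groups" (list of names) and "key_name".
    """
    return [
        inst["id"]
        for inst in instances
        if inst["region"] == region
        and cluster_name in inst["security_groups"]
        and inst["key_name"] == cluster_name
    ]


instances = [
    {"id": "i-1", "region": "us-east-1",
     "security_groups": ["Thor-0001"], "key_name": "Thor-0001"},
    {"id": "i-2", "region": "us-east-1",
     "security_groups": ["default"], "key_name": "my-key"},
    # Manually launched with the cluster's group and key: it matches too.
    {"id": "i-3", "region": "us-east-1",
     "security_groups": ["Thor-0001"], "key_name": "Thor-0001"},
]

print(plan_termination("Thor-0001", "us-east-1", instances))  # ['i-1', 'i-3']
```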
Can I use a Thor Cluster’s AWS security group (or key pair) to manually launch other EC2 instances?
No, we do not recommend you do this unless absolutely necessary. The instances will not be a configured part of the Thor cluster and if you click "terminate" inside the One-Click Thor service, all instances using that security group and key pair will be terminated.
How can I store my data in this environment?
You can “spray” your data from a landing zone. For more information see the HPCC Data Tutorial (Page 8 - Spray the Data File to your THOR Cluster).
Is S3 supported? If so, how can I access it?
Yes. S3 is supported, but you must mount S3 to your landing zone. You can do this by connecting via SSH to the node that contains your landing zone and using existing FUSE drivers and other tools to connect an S3 bucket. There is documentation on the web on how to do this. We have received requests to automate this process and may do so in a future release.
Can I change some of the defaults used to run this cool One-Click Thor script? If so, how?
Generally, no. “One-Click” is meant to simplify HPCC Thor cluster deployments on AWS. If additional flexibility is required, please review the Running the HPCC Systems Thor Platform within AWS documentation. You can make a limited number of choices through the Advanced Launch link.
Can I have an API or programmatic interface to the cluster?
Yes. The HPCC Systems Thor platform offers different interface options. You can publish ECL code to a QuerySet, and then call it via SOAP or JSON so that the compilation of the query only needs to be done once.
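As a sketch of the JSON option, the example below builds (but does not send) an HTTP POST for a published query. The port, the URL layout, the body shape, and the query and parameter names are all assumptions made for illustration; confirm the real endpoint and input format on your cluster's ESP pages before relying on it. The host is a placeholder address.

```python
import json
import urllib.request


def build_query_request(esp_host, target, query_name, params, port=8002):
    """Build (but do not send) a JSON POST for a published query.

    The port and URL layout are assumptions; confirm them against your
    cluster's ESP pages before use.
    """
    url = f"http://{esp_host}:{port}/WsEcl/submit/query/{target}/{query_name}/json"
    body = json.dumps({query_name: params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("203.0.113.10", "thor", "myquery", {"name": "SMITH"})
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. Because the query is already published, the ECL is not recompiled on each call.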
Regarding the HPCC Systems Roxie, will One-Click Roxie be available soon?
Yes, our team of engineers is working to make this available soon.
Can I leverage a Thor cluster in Amazon with an in-house Roxie?
Yes, of course! In all respects, your Thor cluster in Amazon is a normal Thor cluster. Query packages deployed to Roxie, if configured correctly, will locate and retrieve data from your AWS Thor clusters automatically. For more information, see the HPCC Data Tutorial documentation (page 23 - Compile and Publish the Roxie Query).
Can I run ECL code without installing the IDE?
Yes. There is a primitive interface to run ECL code directly from the web browser as long as there are no external dependencies.
- Click on your ESP Page from the ‘View Clusters’ page. This will launch your ECL Watch page.
- Click on ‘System Servers’ on the left
- Click on ‘myesp’
- Click on ‘myecldirect’
- Click on ‘RunEcl’
There are a number of tools in the IDE not present in ECLDirect, but you will be able to run code.
How can I store my data before terminating the cluster?
You can attach an EBS Volume to your landing zone by specifying an EBS Snapshot ID. You can use this to spray and de-spray data to and from the cluster. Before you terminate your cluster, de-spray data to the snapshot directory in your landing zone, then create a new EBS snapshot via your AWS Management Console. You need to create a new snapshot because the One-Click Thor portal will attempt to terminate any EBS volumes attached to the cluster.
Attaching a Snapshot Causes Errors During Launch
Some people have reported a file system error when connecting snapshots. We are aware of the problem. For now, make sure that your snapshot's file system is ext3.