LSF DRM and the SIMULIA Execution Engine

You can use Platform LSF distributed resource management (DRM) to control the distribution and execution of your workload in a SIMULIA Execution Engine environment.

The SIMULIA Execution Engine can be configured to use LSF, a third-party distributed resource management system, to optimize the utilization of compute resources for high-performance computing tasks. Once the system is configured, you can set LSF-specific options for individual Isight components (via the DRM Settings tab on the Properties dialog box). For more information on these component settings, see Configuring the LSF DRM Settings in the Isight Component Guide.

Enabling the LSF DRM feature can significantly enhance the scheduling capabilities of the SIMULIA Execution Engine, particularly for workflows with time-consuming, resource-intensive work items. When using the Fiper DRM option, the SIMULIA Execution Engine requires that stations be running and awaiting work items sent from the SIMULIA Execution Engine server. When using the LSF DRM option, the SIMULIA Execution Engine instead uses LSF to launch SIMULIA Execution Engine station processes as needed on LSF compute nodes. Each process connects to the SIMULIA Execution Engine server, runs a single work item, and then terminates. Each work item dispatched with the LSF DRM corresponds to a single LSF job. This configuration gives LSF direct control over the station processes that are actually doing work, for both resource management and accounting purposes, and allows the SIMULIA Execution Engine server to use LSF’s sophisticated scheduling capabilities to select the optimal node for each piece of work.

Unlike the Fiper DRM, the LSF DRM imposes some scheduling and process-launching overhead on each SIMULIA Execution Engine work item. However, for compute-intensive, long-running work items, the improved scheduling and job management that the LSF DRM provides greatly outweigh this overhead. For workflows composed of large numbers of small, short-running work items, the LSF DRM can reduce SIMULIA Execution Engine job throughput when used exclusively. Mixed-mode DRM, where both the Fiper and LSF distributed resource managers are enabled on the SIMULIA Execution Engine server, can be used to manage this scenario. For more information, see Mixed-Mode DRM and the SIMULIA Execution Engine.
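The trade-off described above can be illustrated with simple arithmetic. The following is a minimal sketch, not a measurement: the 30-second per-item overhead and the two work-item durations are hypothetical values chosen only to show why a fixed overhead matters for short work items and is negligible for long ones.

```python
def useful_fraction(work_seconds: float, overhead_seconds: float) -> float:
    """Return the fraction of elapsed time one work item spends on useful work."""
    if work_seconds < 0 or overhead_seconds < 0:
        raise ValueError("durations must be non-negative")
    total = work_seconds + overhead_seconds
    return work_seconds / total if total > 0 else 0.0


# Hypothetical per-item cost of LSF scheduling plus station process start-up.
ASSUMED_OVERHEAD_S = 30.0

# Hypothetical work-item durations: one short item, one long-running item.
for label, work_s in [("short item", 5.0), ("long item", 3600.0)]:
    frac = useful_fraction(work_s, ASSUMED_OVERHEAD_S)
    print(f"{label}: {frac:.1%} of elapsed time is useful work")
```

Under these assumed numbers, most of the elapsed time for the short item is overhead, while the long item is almost unaffected. Actual overhead depends on the cluster and its LSF configuration.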

Using LSF with the LSF Grid Plug-in

Isight and the SIMULIA Execution Engine can access the LSF system through the LSF Grid plug-in. The OS Command, Simcode, and Abaqus components provide this functionality.

The LSF Grid plug-in option allows an Isight component to submit command-line codes to an LSF cluster directly from an Isight installation or a SIMULIA Execution Engine station. For more information on using the Grid plug-in with LSF or other distributed resource management systems (DRMs), see the OS Command Component, Abaqus Component, and Simcode Component sections in the Isight Component Guide.

Using the LSF Grid plug-in is a distinct but often complementary scenario compared to the LSF DRM available with the SIMULIA Execution Engine. The plug-in addresses a specific use case: running command line–based, compute-intensive codes on an LSF cluster. SIMULIA Execution Engine stations are typically not installed on these cluster nodes, so the LSF Grid plug-in is used to reach the back-office LSF systems. Furthermore, these back-office clusters may be using a DRM other than LSF, such as PBS/Pro or Torque. In that case you can use the Grid plug-in appropriate to that DRM to access those nodes, for command-line codes only.
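For orientation, the kind of LSF submission involved can be sketched by hand. The example below only assembles a bsub command line using standard bsub options (-q for the queue, -J for the job name, -o for the output file); it does not show how the Grid plug-in itself builds or issues its submissions, which this section does not describe. The function name, the queue name, and the solver command are hypothetical.

```python
import shlex
from typing import List, Optional


def build_bsub_command(
    command: List[str],
    queue: Optional[str] = None,
    job_name: Optional[str] = None,
    output_file: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list that submits `command` as one LSF batch job."""
    if not command:
        raise ValueError("command must not be empty")
    argv = ["bsub"]
    if queue:
        argv += ["-q", queue]        # target queue
    if job_name:
        argv += ["-J", job_name]     # job name shown by bjobs
    if output_file:
        argv += ["-o", output_file]  # file that receives the job's output
    return argv + command


# Hypothetical queue name and solver command, for illustration only.
argv = build_bsub_command(
    ["abaqus", "job=beam"],
    queue="normal",
    job_name="beam_run",
    output_file="beam.%J.out",  # LSF replaces %J with the job ID
)
print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids shell-quoting problems if the list is later passed to subprocess.run on a host where the LSF commands are available.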

If LSF is installed on all of the nodes (both clusters and individual systems), you will most likely want to use the LSF DRM and limit the usage of the LSF Grid plug-in.

Using LSF Clusters

Whether you have LSF installed as a back-office computer cluster devoted to high-performance computing or you have LSF installed on every system on your network, you can take advantage of the available computing power.

If the SIMULIA Execution Engine server is installed outside of the LSF cluster, you can run a SIMULIA Execution Engine station, using the Fiper DRM option, on the LSF head node of the compute cluster as a gateway to the cluster (or run several stations on several LSF nodes for redundancy). Once these gateway stations are started, you can send compute-intensive, command-line work items to them using the affinity matching capability available in Isight. The gateway stations use the LSF Grid plug-in to submit LSF jobs to the compute cluster. This approach is limited to command-line codes, such as those used by the OS Command and Abaqus components. A more comprehensive overall scheduling capability can therefore be achieved if LSF is available on all of the nodes and the LSF DRM option is used.

Understanding LSF Version Support and Prerequisites

Make sure that you install a supported version of LSF and that the necessary prerequisites are in place.

For SIMULIA Execution Engine 2017, only LSF version 9.1.1 is supported.

In addition, the following prerequisites are necessary for installing LSF with the SIMULIA Execution Engine:

  • Linux: WebSphere must be installed and started by a user who has the credentials needed to submit jobs to the LSF cluster. This user cannot be root unless the Run-As feature is enabled; if Run-As is enabled, WebSphere can be installed and run as root.

  • You must install the SIMULIA Execution Engine normally and deploy it on WebSphere.

  • Do not start the SIMULIA Execution Engine in your application server before you have installed and configured LSF (as described in Verifying the SIMULIA Execution Engine Configuration and in Configuring LSF for the SIMULIA Execution Engine). For example, on Linux you should source the profile.lsf file in the command shell before starting the SIMULIA Execution Engine. The SIMULIA Execution Engine must be able to find the LSF binary files in the system executable path.

  • You must install the SIMULIA Execution Engine station software on all computers that will run a SIMULIA Execution Engine station.
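The requirement that the LSF binaries be on the system executable path can be checked before the SIMULIA Execution Engine is started. The following is a minimal sketch to run in the same shell environment (for example, after sourcing profile.lsf on Linux) that will start the application server. The list of commands checked (bsub, bjobs, lsid) is an assumption: these are standard LSF commands, but this section does not state which binaries the SIMULIA Execution Engine actually calls.

```python
import shutil
from typing import List, Optional

# Assumed set of LSF commands to look for; adjust to your site's needs.
LSF_COMMANDS = ("bsub", "bjobs", "lsid")


def missing_lsf_commands(path: Optional[str] = None) -> List[str]:
    """Return the LSF commands that cannot be found on the executable path.

    When `path` is None, the PATH of the current environment is searched.
    """
    return [cmd for cmd in LSF_COMMANDS if shutil.which(cmd, path=path) is None]


missing = missing_lsf_commands()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All checked LSF commands were found on PATH.")
```

If any command is reported as not found, the LSF environment has probably not been set up in that shell.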

Limitations of LSF with the SIMULIA Execution Engine

The use of LSF distributed resource management (DRM) with the SIMULIA Execution Engine is subject to the following limitations:

  • By default, SIMULIA Execution Engine work items execute with a common dedicated user ID (the ID used to start the SIMULIA Execution Engine station service). The security context of the executing code will be that of the dedicated ID, which in general is created specifically for the execution of the SIMULIA Execution Engine workload. To run SIMULIA Execution Engine work items under the submitter's security credentials, you must enable the SIMULIA Execution Engine Run-As security feature as described in Configuring Station (Run-As) Security.

  • By default, a SIMULIA Execution Engine user cannot specify general LSF resource requirements for work items dispatched with the LSF DRM. However, standard SIMULIA Execution Engine affinities can be used. To specify more advanced LSF resource requirements for components within a model using the Properties dialog box, see Configuring the LSF DRM Settings in the Isight Component Guide.

  • LSF preemptive scheduling and suspension of in-progress SIMULIA Execution Engine work items are not possible.