/******************************************************************************
File     : SuperMUSE.h
Author   : L. Shawn Matott
Copyright: 2007, L. Shawn Matott

The SuperMUSE class interfaces Ostrich with the EPA SuperMUSE cluster computing 
system, specifically via the Java-based RepeatTasker program. The two programs 
communicate through a simple file-based strategy in which the presence or 
absence of certain files serves as a handshaking mechanism that proceeds as 
follows:

(1) When Ostrich requires model evaluations, it will create a text file 
    (via the SuperMUSE class) that lists each model evaluation. Each line of 
    the task file must contain a script, batch file, or command line that 
    will execute the desired modeling program and perform any necessary pre- 
    and post-execution file staging. The script (or batch file or command line)
    must accept three command line arguments: (1) the host name of the Ostrich
    program, (2) a task id, and (3) the name of an 'Arguments' file that contains
    a line-by-line list of parameter values for each model evaluation. It is up to 
    the client-side script to retrieve this file and extract the appropriate 
    parameter values (based on the task id command line argument).

    The existence of the task file will trigger the RepeatTasker to process 
    the tasks listed therein via the "CPU Allocator" and the client-side 
    "Tasker Client" programs. In this regard, the RepeatTasker will assign
    tasks to the first available compute nodes (clients).

[if no errors occur]
(2) Once RepeatTasker detects that all tasks requested by Ostrich have been
    successfully completed, it will create a 'success' output file. The 
    creation of this file signals the Ostrich program to resume operation 
    (i.e. process the output generated by the various clients and decide what 
    model configurations should be evaluated next).

[if known errors occur]
(3) If RepeatTasker detects that one or more tasks failed or timed out, 
    it will attempt to automatically retry these tasks. If failed and/or 
    incomplete tasks remain after some maximum number of retries, 
    RepeatTasker will create an 'error' output file to signal that an 
    unrecoverable error occurred. Ostrich will note the presence of this 
    file and will revert to serial (single-processor) execution.

[if unknown errors occur]
(4) In case some unexpected error occurs such that RepeatTasker hangs or 
    crashes, Ostrich (via the SuperMUSE class) maintains an internal timer 
    that will time out if some maximum amount of time elapses. If this 
    internal job timer 'goes off' then Ostrich will abandon the use of 
    SuperMUSE and will revert to serial (single-processor) execution.
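
As an illustration of step (1), a task file for a job of three model 
evaluations might look like the lines below. The script name, host name, and 
Arguments file name are hypothetical, and both the exact layout expected by 
RepeatTasker and the numbering of task ids from 0 are assumptions; only the 
three arguments and their order come from the description above.

    run_model.bat ostrich-host 0 OstArgs.txt
    run_model.bat ostrich-host 1 OstArgs.txt
    run_model.bat ostrich-host 2 OstArgs.txt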

Configuration parameters for the SuperMUSE class can be specified by the 
user via the SuperMUSE section of the Ostrich input file. Relevant arguments 
are as follows:
   (1) Ostrich_Tasker_Hostname - The host name of the Ostrich Tasker.
       Also doubles as the host name of the Ostrich program, meaning that
       Ostrich must be run on the same server as the Ostrich Tasker.

   (2) TaskFile - The file that, when present, contains a list of tasks that 
       can be completed in any order by any available clients. Collectively,
       these tasks constitute a SuperMUSE 'job'.

   (3) TempFile - A temporary filename for the SuperMUSE class to use while 
       constructing a new TaskFile instance.

   (4) SuccessFile - The file that RepeatTasker should create when all tasks 
       listed in 'TaskFile' have been completed successfully.

   (5) ErrorFile - The file that RepeatTasker should create when one or more
       tasks listed in 'TaskFile' failed to complete successfully, even after 
       some number of retries.

   (6) MaxJobTime - The maximum amount of time (in minutes) to wait for RepeatTasker
       to complete a given job (i.e. set of tasks). If MaxJobTime elapses for a 
       given job, Ostrich will abandon the use of SuperMUSE and revert to single-
       processor execution.

   (7) ScriptFile - The name of a script file, batch file, or command line that
       will be executed on the SuperMUSE clients and that should execute the
       external model being optimized/calibrated. The file/command should
       accept command line arguments that represent a list of parameter values to
       use in the model.

   (8) ArgumentsFile - The file that "Tasker Client" should read to determine
       parameter values for a given model evaluation.

Version History
07-13-07    lsm   added copyright information and initial comments.
******************************************************************************/
#ifndef SUPER_MUSE_H
#define SUPER_MUSE_H

#include "MyHeaderInc.h"

// forward declarations
class ModelABC;
class ParameterGroup;

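// Node of a singly linked list that maps an environment variable name (pVar)
// to its value (pVal); pNxt points to the next node in the list.
typedef struct ENV_VAR_LIST
{
  char pVar[1000];
  char pVal[1000];
  struct ENV_VAR_LIST * pNxt;
}EnvVarList;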

/******************************************************************************
class SuperMUSE

Interfaces Ostrich with the SuperMUSE Tasker/Client job-management and job-
scheduling utilities.
******************************************************************************/
class SuperMUSE
{   
   public:
      SuperMUSE(FILE * pFile, ModelABC * pModel);
     ~SuperMUSE(void){ DBG_PRINT("SuperMUSE::DTOR"); Destroy(); }
      void Destroy(void);
      void WriteTask(ParameterGroup * pGroup);
      void FinishTaskFile(void);
      bool WaitForTasker(void);
      double GatherResult(int taskid);
      void WriteSetup(FILE * pFile);     
      void EnvVarCleanup(void);

   private:
      //mapping of environment variables
      void LoadEnvVars(char * pEnvVarFile);
      void UnloadEnvVars(void);
      void ReplaceEnvVars(char * pTarget);      
      EnvVarList * m_pEnvVars;

      char m_Server[1000];
      char m_TaskFile[1000];
      char m_TempFile[1000];
      char m_SuccessFile[1000];
      char m_ErrorFile[1000];
      char m_ScriptFile[1000];
      char m_ArgsFile[1000];
      char m_ClientDir[1000];
      char m_ServerDir[1000];
      int m_MaxJobTime; 
      int m_TaskID;  
      ModelABC * m_pModel;
}; /* end class SuperMUSE */
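
// Usage sketch (illustrative only; not taken from the Ostrich sources). It
// assumes that WriteTask() is called once per candidate parameter set, that
// WaitForTasker() returns true when the success file appears and false when
// the error file appears or MaxJobTime elapses, and that the task ids passed
// to GatherResult() start at 0 in the order the tasks were written.
// pInputFile, pGroups, nTasks, and results are hypothetical caller-side
// variables.
//
//    SuperMUSE * pSMUSE = new SuperMUSE(pInputFile, pModel);
//    for(int i = 0; i < nTasks; i++){ pSMUSE->WriteTask(pGroups[i]); }
//    pSMUSE->FinishTaskFile();
//    if(pSMUSE->WaitForTasker() == true)
//    {
//       for(int i = 0; i < nTasks; i++){ results[i] = pSMUSE->GatherResult(i); }
//    }
//    else
//    {
//       // fall back to serial (single-processor) evaluation of each task
//    }
//    delete pSMUSE;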

#endif /* SUPER_MUSE_H */


