DataGrid

 

WP1 - WMS Software Administrator and User Guide

 

 

 

 

 

 

Document identifier:

DataGrid-01-TEN-0118-0_98

 

Date:

03/12/2002

 

Work package:

WP1

 

Partner:

Datamat SpA

 

 

 

 

Document status

 

 

 

 

 

Deliverable identifier:

 

 

 

Abstract: This note provides the administrator and user guide for the WP1 WMS software.

 


Delivery Slip

              Name               Partner        Date         Signature
From          Fabrizio Pacini    Datamat SpA    03/12/2002
Verified by   Stefano Beco       Datamat SpA    03/12/2002
Approved by

Document Log

Issue   Date         Comment       Author
0_0     21/12/2001   First draft   Fabrizio Pacini
0_1     14/01/2002   Draft         Fabrizio Pacini
0_2     24/01/2002   Draft         Fabrizio Pacini
0_3     05/02/2002   Draft         Fabrizio Pacini
0_4     15/02/2002   Draft         Fabrizio Pacini
0_5     08/04/2002   Draft         Fabrizio Pacini
0_6     13/05/2002                 Fabrizio Pacini
0_7     19/07/2002                 Fabrizio Pacini
0_8     16/09/2002                 Fabrizio Pacini
0_9     03/12/2002                 Fabrizio Pacini

Document Change Record

Issue

Item

Reason for Change

0_1

General update

-        Take into account changes in the rpm generation procedure.

-        Add missing info about daemons (RB/JSS/CondorG) starting accounts

-        Some general corrections

0_2

General Update

-        Add Cancelling and Cancel Reason information.

-        Add OUTPUTREADY job state.

-        Add new profile rpms.

-        Remove /etc/workload* shell scripts.

-        Add summary map table (user / daemon).

-        Add CEId format check.

-        Add new job cancel notification.

0_3

General Update

-        Modified RB/JSS start-up procedure

-        Add gridmap-file users/groups issues

-        Add proxy certificate usage by daemons

-        Job attribute CEId changed to SubmitTo

-        Add DGLOG_TIMEOUT setting

-        Add workload-profile and userinterface-profile rpms

0_4

General Update

-        Add configure option --enable-wl for system configuration files

-        Add installation checking option --with-globus for Globus to the Workload configure

-        Add new Information Index configure options

-        Remove edg-profile and edg-user-env rpms from II and UI dependencies

-        Add security configuration rpms for all the Certificate Authorities to UI dependencies

-        Add new parameters to RB configuration file

-        Add new Job Exit Code field to the returned job status info

-        Remove dependence from SWIG in the userinterface binary rpm

0_5

General Update

-        Modify command options syntax (getopt-like style)

-        Add MyProxy server and client package installation/utilisation

-        Modify job cancel notification

-        Add Userguide rpm

0_6

General Update

-        Modify configure options for the various components

-        UI commands modified to use python2 executable

-        Clarify myproxy usage

-        Explain how RB/LB addresses in the UI config file are used by the commands

-        Add --logfile option to the UI commands

0_7

General Update

-        Modify configure options for the various components

-        Clarify UI commands --notify option usage

-        Add make test target for UI

0_8

General Update

 

-        Specified dependencies of profile rpms

-        Update needed env vars for UI

-        Explain how to include default constraints in the job requirements

-        Explain that the lc field in the ReplicaCatalog address is now mandatory

-        Explain how to specify wildcards and special chars in "Arguments" in the JDL expression

     

0_9

General Update

-        Defaults for Rank and Requirements in the UI config file

-        Added reference to the “.BrokerInfo” file document

-        other.CEId in Requirements vs --resource option

-        Explain MyProxy Server configuration

-        Added description of new parameters in RB configuration file

-        RB/JSS databases clean-up procedure added

-        Explain usage of RetryCount JDL attribute

-        Better explain how to specify wildcards and special chars in "Arguments" in the JDL expression

-        Updated reference to JDL Attributes note

-        Added Annex on Submission failures analysis

 

 

 

Files

Software Products        User files
Word 97                  DataGrid-01-TEN-0118-0_9_Document.doc
Acrobat Exchange 4.0     DataGrid-01-TEN-0118-0_98-Document.pdf

 


Content

1. Introduction
1.1. Objectives of this document
1.2. Application area
1.3. Applicable documents and reference documents
1.4. Document evolution procedure
1.5. Terminology
2. Executive summary
3. Build Procedure
3.1. Required Software
3.2. Build Instructions
3.2.1. Environment Variables
3.2.2. Compiling the code
3.3. RPM Installation
4. Installation and Configuration
4.1. Logging and Bookkeeping services
4.1.1. Required software
4.1.1.1. LB local-logger
4.1.1.2. LB Server
4.1.2. RPM installation
4.1.3. The installation tree structure
4.1.3.1. LB local-logger
4.1.3.2. LB Server
4.1.4. Configuration
4.1.5. Environment Variables
4.2. RB and JSS
4.2.1. Required software
4.2.1.1. PostgreSQL installation and configuration
4.2.1.2. Condor-G installation and configuration
4.2.1.3. ClassAd installation and configuration
4.2.1.4. ReplicaCatalog installation and configuration
4.2.2. RPM installation
4.2.3. The Installation Tree structure
4.2.4. Configuration
4.2.4.1. RB configuration
4.2.4.2. JSS configuration
4.2.5. Environment variables
4.2.5.1. RB
4.2.5.2. JSS
4.3. Information Index
4.3.1. Required software
4.3.2. RPM installation
4.3.3. The Installation tree structure
4.3.4. Configuration
4.3.5. Environment Variables
4.4. User Interface
4.4.1. Required software
4.4.2. RPM installation
4.4.3. The tree structure
4.4.4. Configuration
4.4.5. Environment variables
4.5. DOCUMENTATION
5. Operating the System
5.1. LB local-logger
5.1.1. Starting and stopping daemons
5.1.2. Troubleshooting
5.2. LB Server
5.2.1. Starting and stopping daemons
5.2.2. Purging the LB database
5.2.3. Troubleshooting
5.3. RB and JSS
5.3.1. Starting PostgreSQL
5.3.2. Starting and stopping JSS and RB daemons
5.3.3. RB and JSS databases clean-up
5.3.4. RB troubleshooting
5.3.5. JSS troubleshooting
5.4. Information Index
5.4.1. Starting and stopping daemons
6. User Guide
6.1. User interface
6.1.1. Security
6.1.1.1. MyProxy
6.1.2. Common behaviours
6.1.3. Commands description
7. Annexes
7.1. JDL Attributes
7.2. Job Status Diagram
7.3. Job Event Types
7.4. Submission Failures Analysis
7.4.1. Job OutputReady
7.4.2. Job Cleared
7.4.3. Job Aborted (no matching resources - II not reachable)
7.4.4. Job Aborted (Standard output of job wrapper does not contain useful data)
7.4.5. Job Aborted (CondorG failure)
7.5. Wildcard patterns
7.6. The Match Making Algorithm
7.6.1. Direct Job Submission
7.6.2. Job submission without data-access requirements
7.6.3. Job submission with data-access requirements
7.7. Process/User Mapping Table

1. Introduction

This document provides a guide to the building, installation and usage of the WP1 WMS software released within the DataGrid project.

1.1. Objectives of this document

The goal of this document is to describe the complete process by which the WP1 WMS software can be installed and configured on the DataGrid test-bed platforms.

Guidelines for operating the whole system and accessing the available functionality are also provided.

1.2. Application area

Administrators can use this document as a basis for installing, configuring and operating the WP1 WMS software. Users can refer to the User Guide chapter to learn how to access the provided services through the User Interface.

 

1.3. Applicable documents and reference documents

Applicable documents

[A1]

Job Description Language HowTo – DataGrid-01-TEN-0102-02 – 17/12/2001

(http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2.pdf)

[A2]

DATAGRID WP1 Job Submission User Interface for PM9 (revised presentation) – 23/03/2001

 (http://www.infn.it/workload-grid/docs/20010320-JS-UI-datamat.pdf)

[A3]

WP1 meeting - CESNET presentation in Milan – 20-21/03/2001

(http://www.infn.it/workload-grid/docs/20010320-L_B-matyska.pdf)

[A4]

Logging and Bookkeeping Service – 07/05/2001

(http://www.infn.it/workload-grid/docs/20010508-lb_draft-ruda.pdf)

[A5]

Results of Meeting on Workload Manager Components Interaction – 09/05/2001

(http://www.infn.it/workload-grid/docs/20010508-WM-Interactions-pacini.pdf)

[A6]

Resource Broker Architecture and APIs – 13/06/2001

(http://www.infn.it/workload-grid/docs/20010613-RBArch-2.doc)

[A7]

JDL Attributes - DataGrid-01-NOT-0101-0_7 – 03/12/2002

(http://www.infn.it/workload-grid/docs/DataGrid-01-NOT-0101-0_7.{doc,pdf})

 

Reference documents

[R1]

The Resource Broker Info file – DataGrid-01-TEN-0135-0_0

(http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0135-0_0.{doc,pdf})


 

1.4. Document evolution procedure

The content of this document will be subject to modification upon the following events:

·          Comments received from Datagrid project members,

·          Changes/evolutions/additions to the WMS components.

 

1.5. Terminology

Definitions

Condor

Condor is a High Throughput Computing (HTC) environment that can manage very large collections of distributively owned workstations.

Globus

The Globus Toolkit is a set of software tools and libraries aimed at the building of computational grids and grid-based applications.

 

Glossary

class-ad

Classified advertisement

CE

Computing Element

DB

Data Base

FQDN

Fully Qualified Domain Name

GDMP

Grid Data Management Pilot Project

GIS

Grid Information Service, aka MDS

GSI

Grid Security Infrastructure

job-ad

Class-ad describing a job

JDL

Job Description Language

JSS

Job Submission Service

LB

Logging and Bookkeeping Service

LRMS

Local Resource Management System

MDS

Metacomputing Directory Service, aka GIS

MPI

Message Passing Interface

PID

Process Identifier

PM

Project Month

RB

Resource Broker

RC

Replica Catalogue

SE

Storage Element

SI00

Spec Int 2000

SMP

Symmetric Multi Processor

TBC

To Be Confirmed

TBD

To Be Defined

UI

User Interface

UID

User Identifier

WMS

Workload Management System

WP

Work Package

2. Executive summary

This document comprises the following main sections:

Section 3: Build Procedure

Outlines the software required to build the system and the actual process for building it and generating rpms for the WMS components; a step-by-step guide is included.

Section 4: Installation and Configuration

Describes changes that need to be made to the environment and the steps to be performed for installing the WMS software on the test-bed target platforms. The resulting installation tree structure is detailed for each system component.

Section 5: Operating the System

Provides the actual procedures for starting/stopping the WMS component processes and related utilities.

Section 6: User Guide

Describes, in Unix man-page style, all the User Interface commands that allow the user to access the services provided by the WMS.

Section 7: Annexes

Expands on topics introduced in the User Guide section that help the user better understand system behaviour.

 

 

3. Build Procedure

In the following section we give detailed instructions for the installation of the WP1 WMS software package. We provide a source code distribution as well as a binary distribution and explain installation procedures for both cases.

3.1. Required Software

The WP1 software runs and has been tested on platforms running Globus Toolkit 2.0 Beta Release 21 on top of Linux RedHat 6.2.

The software packages, apart from the WP1 software version 1.0 itself, that have to be installed locally on a given site in order to build the WP1 WMS on it are listed below (a quick way to check which of them are already installed is sketched after the list):

 

-        Globus Toolkit 2.0 Beta 21 or higher (download at http://datagrid.in2p3.fr/distribution/globus/beta-21)

 

-        Python 2.1.1 (download at http://datagrid.in2p3.fr/distribution/config/external.html)

 

-        Swig 1.3.9 (download at http://datagrid.in2p3.fr/distribution/config/external.html)

 

-        Expat 1.95.1 (download at http://datagrid.in2p3.fr/distribution/config/external.html)

 

-        Expat-devel 1.95.1 (download at http://datagrid.in2p3.fr/distribution/config/external.html)

 

-        MySQL Version 9.38 Distribution 3.22.32, for pc-linux-gnu (i686)  (download at http://datagrid.in2p3.fr/distribution/config/external_services.html)

 

-        MySQL  Version 11.15 Distribution 3.23.42, for pc-linux-gnu (i686)

(download at http://datagrid.in2p3.fr/distribution/external/RPMS/). The needed rpms are:

MySQL-shared-3.23.42-1

MySQL-client-3.23.42-1

MySQL-3.23.42-1

MySQL-devel-3.23.42-1

 

-        Postgresql 7.1.3 (http://datagrid.in2p3.fr/distribution/config/external_services.html)

 

-        Classads library (download at http://datagrid.in2p3.fr/distribution/external/RPMS/classads-0.0-edg2.i386.rpm)

 

-        CondorG 6.3.1 for INTEL-LINUX-GLIBC21 (download at

      http://datagrid.in2p3.fr/distribution/external/RPMS/CondorG-6.3.1-edg5.i386.rpm)

 

-        Perl IO Stty 0.02, Perl IO Tty 0.04 (download at http://datagrid.in2p3.fr/distribution/config/external.html )

 

-        MyProxy-0.4.4 (download at http://datagrid.in2p3.fr/distribution/external/RPMS/). The needed rpms are:

myproxy-server-0.4.4-edg6.i386.rpm (for the MyProxy Server machine)

myproxy-client-0.4.4-edg6.i386.rpm (for the UI machine)

 

-        Perl 5 (download at http://datagrid.in2p3.fr/distribution/config/external.html)   

 

-        gcc version 2.95.2

 

-        GNU make version 3.78.1 or higher

 

-        GNU autoconf version 2.13

 

-        GNU libtool 1.3.5

 

-        GNU automake 1.4

 

-        GNU m4 1.4 or higher

 

-        RPM 3.0.5

 

-        sendmail 8.11.6
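
Before starting the build it can be useful to verify which of these packages are already present. A minimal sanity check, assuming the binary packages were installed through rpm (the package names below are indicative and may differ slightly in your distribution), is:

# query a few of the required packages; a missing one is reported as "not installed"
rpm -q expat expat-devel MySQL MySQL-devel postgresql CondorG classads myproxy-client
gcc --version        # should report 2.95.2
make --version       # should report GNU make 3.78.1 or higher
autoconf --version   # should report 2.13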

 

 

3.2. Build Instructions

The following instructions deal with the building of the WMS software and hence apply to the source code distribution.

3.2.1. Environment Variables

Before starting the compilation, some environment variables related to the WMS components can be set or configured by means of the configure script. This is needed only if package defaults are not suitable. Involved variables are listed below:

 

-          GLOBUS_LOCATION              base directory of the Globus installation. The default path is /opt/globus.

-          MYSQL_INSTALL_PATH           base directory of the MySQL installation. The default path is /usr.

-          EXPAT_INSTALL_PATH           base directory of the Expat installation. The default path is /usr.

-          GDMP_INSTALL_PATH            base directory of the GDMP installation. The default path is /opt/edg.

-          PGSQL_INSTALL_PATH           base directory of the PostgreSQL installation. The default path is /usr.

-          CLASSAD_INSTALL_PATH         base directory of the ClassAd library installation. The default path is /opt/classads.

-          CONDORG_INSTALL_PATH         base directory of the Condor installation. The default path is /opt/CondorG.

-          PYTHON_INSTALL_PATH          base directory of the Python installation. The default path is /usr.

-          SWIG_INSTALL_PATH            base directory of the Swig installation. The default path is /usr/local.

-          MYPROXY_INSTALL_PATH         base directory of the MyProxy installation. The default path is /usr/local.

 

In order to build the whole WP1 package, all the environment variables in the previous list must be set. To build only the User Interface module, the environment variables that need to be set are the following (an example export sequence is sketched after these lists):

 

-          GLOBUS_LOCATION

-          CLASSAD_INSTALL_PATH

-          PYTHON_INSTALL_PATH

-          SWIG_INSTALL_PATH

-          EXPAT_INSTALL_PATH

 

If you plan to build the Job Submission and Resource Broker modules, the variables to set are:

 

-          GLOBUS_LOCATION

-          MYSQL_INSTALL_PATH

-          EXPAT_INSTALL_PATH

-          GDMP_INSTALL_PATH

-          PGSQL_INSTALL_PATH

-          CLASSAD_INSTALL_PATH

-          CONDORG_INSTALL_PATH

 

If you plan to build the Proxy module, variables to set are:

-          GLOBUS_LOCATION

-          MYPROXY_INSTALL_PATH

The LB server and Local Logger modules instead need the following environment variables in order to be built:

 

-          GLOBUS_LOCATION

-          MYSQL_INSTALL_PATH

-          EXPAT_INSTALL_PATH

 

Finally, the LB library module needs:

 

-          GLOBUS_LOCATION

-          EXPAT_INSTALL_PATH

 

and the Information Index module only:

 

-          GLOBUS_LOCATION
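
For example, to prepare the environment for a build of the User Interface module only, the variables could be exported as follows (bash syntax; the paths shown are simply the package defaults listed above and have to be adapted to the actual installation):

export GLOBUS_LOCATION=/opt/globus
export CLASSAD_INSTALL_PATH=/opt/classads
export PYTHON_INSTALL_PATH=/usr
export SWIG_INSTALL_PATH=/usr/local
export EXPAT_INSTALL_PATH=/usr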

 

3.2.2. Compiling the code

After unpacking the WP1 source distribution tar file, or downloading the code directly from the CVS repository, change your working directory to the WP1 base directory, i.e. the Workload directory, and run the following command:

 

   ./recursive-autogen.sh

 

At this point the configure command can be run. The configure script has to be invoked as follows:

 

   ./configure <options>

 

The list of options recognised by configure is reported hereafter:

 

   --help

 

   --prefix=<installation path>

      It is used to specify the Workload installation dir. The default

      installation dir is /opt/edg.

 

   --enable-all

It is used to enable the build of the whole WP1 package. By default this option is turned on.

 

   --enable-userinterface

 It is used to enable the build of the User Interface module with Logging/Client, Broker/Client, Broker/Socket++ and ThirdParty/trio/src sub modules.  By default this option is turned off.

 

 --enable-userinterface_profile

It is used to enable the installation of the User Interface profile. By default this option is turned off.

 

   --enable-jss_rb

It is used to enable the build of the Job Submission and Resource Broker modules with Logging/Client, Common, test, Proxy/Dgpr, and ThirdParty/trio/src submodules. By default this option is turned off.

 

 --enable-jss_profile

It is used to enable the installation of the Job Submission and Resource Broker profile with JobSubmission/utils, and Broker/utils sub modules. By default this option is turned off.

 

   --enable-lbserver

It is used to enable the build of the LB Server service with Logging/Client, Logging/etc, Logging/Server, Logging/InterLogger/Net, Logging/InterLogger/SSL, Logging/InterLogger/Error, Logging/InterLogger/Lbserver and ThirdParty/trio/src sub modules. By default this option is turned off.

 

   --enable-locallogger

It is used to enable the build of the LB Local Logger service with Logging/Client, Logging/InterLogger/Net, Logging/InterLogger/SSL, Logging/InterLogger/Error, Logging/InterLogger/InterLogger, Logging/LocalLogger, man and ThirdParty/trio/src sub modules. By default this option is turned off.

 

 --enable-locallogger_profile

It is used to enable the installation of the LB LocalLogger profile. By default this option is turned off.

 

   --enable-logging_dev

It is used to enable the build of the LB Client Library with Logging/Client and ThirdParty/trio/src sub modules. By default this option is turned off.

 

    --enable-information

It is used to enable the build of the Information Index module. By default this option is turned off.

 

  --enable-information_profile

It is used to enable the installation of the Information Index profile with InformIndex/utils sub module. By default this option is turned off.

 

  --enable-wl

It is used to enable the installation of system configuration files that are in the Workload/etc directory. By default this option is turned off.

 

  --enable-proxy

It is used to enable the build of the Proxy module. By default this option is turned off.

 

 

   --with-globus-install=<dir>

It allows specifying the Globus installation directory without setting the environment variable GLOBUS_LOCATION.

 

   --with-pgsql-install=<dir>

It allows specifying the Pgsql installation directory without setting the environment variable PGSQL_INSTALL_PATH.

 

 --with-gdmp-install=<dir>

It allows specifying the GDMP installation directory without setting the environment variable GDMP_INSTALL_PATH.

 

 --with-expat-install=<dir>

It allows specifying the Expat installation directory without setting the environment variable EXPAT_INSTALL_PATH.

 

  --with-mysql-install=<dir>

It allows specifying the MySQL installation directory without setting the environment variable MYSQL_INSTALL_PATH.

 

--with-myproxy-install=<dir>

It allows specifying the MyProxy installation directory without setting the environment variable MYPROXY_INSTALL_PATH.

 

 

During the configure step, 12 spec files (i.e. wl-userinterface.spec, wl-locallogger.spec, wl-lbserver.spec, wl-logging_dev.spec, wl-jss_rb.spec, wl-information.spec, wl-userinterface-profile.spec, wl-jss_rb-profile.spec, wl-information-profile.spec, wl-lbserver-profile.spec, wl-locallogger-profile.spec and wl-workload-profile.spec) are created in the following source sub-directories to produce a flavour-specific version:

 

-          Workload/UserInterface

-          Workload/Proxy

-          Workload/Logging

-          Workload/JobSubmission

-          Workload/InformIndex

-          Workload

 

Once the configure script has terminated its execution, check that GNU make is in your path and then, still in the Workload source code directory, run:

 

make

 

then:

 

make apidoc

 

and then:

 

make check

 

to build the test code. If the two previous steps complete successfully, the installation of the software can be performed. In order to install the package in the installation directory specified either by the --prefix option of the configure script or by the default value (i.e. /opt/edg), you can now issue the command:

 

make install

 

It is possible to run "make clean" to remove object files, executable files, library files and all the other files that are created during "make" and "make check". The command:

 

make -i dist

 

can be used to produce, in the workload-X.Y.Z directory located in the Workload base directory, a binary gzipped tar ball of the Workload distribution. This tar ball can be both transferred to other platforms and used as source for the RPM creation.

For creating the RPMs for Workload 1.0 (according to the configure options you have used), make sure that your PATH is set so that the GNU autotools, make and the gcc compiler can be used, and edit the file $HOME/.rpmmacros (creating it if it does not exist in your home directory) to set the following entry:

 

%_topdir         <your home dir>/rpm/redhat
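
For example, assuming the RPM build area is to be kept under $HOME/rpm/redhat, the entry can be added with:

echo "%_topdir $HOME/rpm/redhat" >> $HOME/.rpmmacros

(if $HOME/.rpmmacros already defines %_topdir, edit the existing entry rather than appending a second one).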

 

Then you can issue the command:

 

make rpm

 

that generates the RPMs in $(HOME)/rpm/redhat/RPMS.

For example if before building the package you have used the configure as follows:

 

./configure --enable-all

 

then the make rpm command creates the directories:

 

$(HOME)/rpm/redhat/SOURCES

$(HOME)/rpm/redhat/SPECS

$(HOME)/rpm/redhat/BUILD

$(HOME)/rpm/redhat/RPMS

$(HOME)/rpm/redhat/SRPMS

 

and copies the previously created tar ball workload-X.Y.Z/Workload.tar.gz in $(HOME)/rpm/redhat/SOURCES. Moreover it copies the generated spec files:

 

JobSubmission/wl-jss_rb.spec

JobSubmission/wl-jss_rb-profile.spec

UserInterface/wl-userinterface.spec

UserInterface/wl-userinterface-profile.spec

InformIndex/wl-information.spec

InformIndex/wl-informationpthr.spec

InformIndex/wl-information-profile.spec

Logging/wl-lbserver.spec

Logging/wl-lbserver-profile.spec

Logging/wl-locallogger.spec

Logging/wl-locallogger-profile.spec

Logging/wl-logging_dev.spec

Proxy/wl-proxy.spec

Workload/wl-workload-profile.spec

Workload/wl-userguide.spec

 

 

in $(HOME)/rpm/redhat/SPECS and finally executes the following commands:

 

rpm -ba wl-userinterface.spec

rpm -ba wl-userinterface-profile.spec

rpm -ba wl-locallogger.spec

rpm -ba wl-locallogger-profile.spec

rpm -ba wl-lbserver.spec

rpm -ba wl-lbserver-profile.spec

rpm -ba wl-logging_dev.spec

rpm -ba wl-jss_rb.spec

rpm -ba wl-jss_rb-profile.spec

rpm -ba wl-information.spec

rpm -ba wl-informationpthr.spec

rpm -ba wl-information-profile.spec

rpm -ba wl-proxy.spec

rpm -ba wl-workload-profile.spec

rpm -ba wl-userguide.spec

 

generating respectively the following rpms in the $(HOME)/rpm/redhat/RPMS directory:

 

-          userinterface-X.Y.Z-K.i386.rpm

-          userinterface-profile-X.Y.Z-K.i386.rpm

-          locallogger-X.Y.Z-K.i386.rpm

-          locallogger-profile-X.Y.Z-K.i386.rpm

-          lbserver-X.Y.Z-K.i386.rpm

-          lbserver-profile-X.Y.Z-K.i386.rpm

-          logging_dev-X.Y.Z-K.i386.rpm

-          jobsubmission-X.Y.Z-K.i386.rpm

-          jobsubmission-profile-X.Y.Z-K.i386.rpm

-          informationindex-X.Y.Z-K.i386.rpm

-          informationindexpthr-X.Y.Z-K.i386.rpm

-          informationindex-profile-X.Y.Z-K.i386.rpm

-          proxy-X.Y.Z-K.i386.rpm

-          workload-profile-X.Y.Z-K.i386.rpm

-          userguide-X.Y.Z-K.i386.rpm

 

where X.Y.Z-K indicates the rpm release.

If you have instead built only the User Interface, i.e. used:

 

./configure --disable-all --enable-userinterface

 

the make rpm command will copy only the file UserInterface/wl-userinterface.spec and the file UserInterface/wl-userinterface-profile.spec in $(HOME)/rpm/redhat/SPECS and will create only the User Interface rpms (userinterface-X.Y.Z-K.i386.rpm and userinterface-profile-X.Y.Z-K.i386.rpm).

The User Interface has an additional make target to install the userinterface test suite, which allows performing unit tests (i.e. without contacting any external component). You have to run the following commands in Workload/UserInterface:

 

./autogen.sh

./configure --disable-all --enable-tests

make tests

 

and you will find the commands ready to run together with the test files in Workload/UserInterface/test.

An alternative procedure can be followed to build the II and Logging packages. To do this, move into the Workload/InformIndex directory and run the following commands:

 

./autogen.sh

./configure [option]

 

where the recognised options are:

 

--prefix=<install path>

It is used to specify the Information Index installation dir. The default installation dir is /opt/edg

 

--with-globus-install=<dir>

It allows specifying the Globus install directory without setting the environment variable GLOBUS_LOCATION.

    

Then issue:

make

make install

 

Afterwards move into the Workload/Logging directory and run the following commands:

 

./autogen.sh

./configure [option]

 

where the recognised options are:

     --enable-all

It is used to enable the build of the Logging and Bookkeeping package.

By default this option is turned on.

 

--enable-userinterface

It is used to enable the build of the Client sub module. By default this option is turned off.

 

--enable-graphical_userinterface

It is used to enable the build of the Client sub module. By default this option is turned off.

 

--enable-jss_rb

It is used to enable the build of the Client sub module. By default this option is turned off.

 

--enable-lbserver

It is used to enable the build of the Logging And Bookkeeping Server service with Client, etc, Server,  InterLogger/Net, InterLogger/SSL,  InterLogger/Error, InterLogger/Lbserver and ThirdParty/trio/src sub modules. By default this option is turned off.

 

--enable-lbserver_profile

It is used to enable the installation of the LB Server profile with Logging/utils sub module. By default this option is turned off.

 

--enable-locallogger

It is used to enable the build of the Logging And Bookkeeping Local Logger service with Client, InterLogger/Net, InterLogger/SSL, InterLogger/Error, InterLogger/InterLogger, LocalLogger, Apidoc, and ThirdParty/trio/src sub modules. By default this option is turned off.

 

--enable-logging_dev

It is used to enable the build of the Logging And Bookkeeping Client Library with Client and ThirdParty/trio/src sub modules. By default this option is turned off.

 

--prefix=<install path>

It is used to specify the Logging installation dir. The default installation dir is /opt/edg  

 

--with-globus-install=<dir>

It allows specifying the Globus install directory without setting the environment variable GLOBUS_LOCATION.    

  

--with-expat-install=<dir>

It allows specifying the Expat install directory without setting the environment variable EXPAT_INSTALL_PATH

 

--with-mysql-install=<dir>

It allows specifying the MySQL install directory without setting the environment variable MYSQL_INSTALL_PATH.

 

 

Then issue:

make

make apidoc

 

make check

make install

 

Summarising, depending on the WMS module you want to build, the configure script has to be run with the following options:

 

-          all

./configure

 

-          userinterface

./configure --disable-all --enable-userinterface

 

-          information

./configure --disable-all --enable-information

 

-          lbserver

./configure --disable-all --enable-lbserver

 

-          locallogger

./configure --disable-all --enable-locallogger

 

-          logging for developers

./configure --disable-all --enable-logging_dev

 

-          jobsubmission and broker

./configure --disable-all --enable-jss_rb

 

-          wl

./configure --disable-all --enable-wl

 

-          proxy

./configure --disable-all --enable-proxy

 

-          userinterface profile

        ./configure --disable-all --enable-userinterface_profile

 

-          information profile

         ./configure --disable-all --enable-information_profile

 

-          information pthread

./configure --disable-all --enable-information --with-globus-flavor=gcc32dbgpthr

 

-          lbserver profile

          ./configure --disable-all --enable-lbserver_profile

 

-          locallogger profile

         ./configure --disable-all --enable-locallogger_profile

 

-          jobsubmission and broker profile

             ./configure --disable-all --enable-jss_profile

 

 

3.3. RPM Installation

In order to install the WP1 RPMs on the target platforms, the following commands have to be executed as root:

 

rpm -ivh workload-profile-X.Y.Z-K.i386.rpm

rpm -ivh userinterface-profile-X.Y.Z-K.i386.rpm

rpm -ivh userinterface-X.Y.Z-K.i386.rpm

rpm -ivh informationindex-profile-X.Y.Z-K.i386.rpm

rpm -ivh informationindex-X.Y.Z-K.i386.rpm

rpm -ivh informationindexpthr-X.Y.Z-K.i386.rpm

rpm -ivh jobsubmission-profile-X.Y.Z-K.i386.rpm

rpm -ivh jobsubmission-X.Y.Z-K.i386.rpm

rpm -ivh locallogger-profile-X.Y.Z-K.i386.rpm

rpm -ivh locallogger-X.Y.Z-K.i386.rpm

rpm -ivh lbserver-profile-X.Y.Z-K.i386.rpm

rpm -ivh lbserver-X.Y.Z-K.i386.rpm

rpm -ivh logging_dev-X.Y.Z-K.i386.rpm

rpm -ivh proxy-X.Y.Z-K.i386.rpm

rpm -ivh userguide-X.Y.Z-K.i386.rpm

 

By default all the rpms install the software in the /opt/edg directory, except the profile rpms (i.e. informationindex-profile, jobsubmission-profile, locallogger-profile and lbserver-profile), which install instead in /etc/rc.d/init.d.

All the profile rpms depend on the workload-profile rpm, which in turn only depends on the bash rpm (whose version should be less than 2). Each component's rpm then depends on the corresponding profile rpm (e.g. userinterface x.y.z depends on userinterface-profile-x.y.z, which in turn depends on workload-profile-x.y.z).
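
The dependencies of a binary rpm can be inspected before installation with the standard rpm query options, for example (the file name is indicative):

rpm -qp --requires userinterface-X.Y.Z-K.i386.rpm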

If you install one of the following rpms:

-          jobsubmission-X.Y.Z-K.i386.rpm

-          locallogger-X.Y.Z-K.i386.rpm

-          lbserver-X.Y.Z-K.i386.rpm

-          informationindex-X.Y.Z-K.i386.rpm

-          informationindexpthr-X.Y.Z-K.i386.rpm

you will have all the needed files installed in /opt/edg; to also get the configuration and start-up files installed in /etc/rc.d/init.d, the corresponding profile rpms have to be installed as well. Namely, using the rpms:

-          jobsubmission-profile-X.Y.Z-K.i386.rpm

-          locallogger-profile-X.Y.Z-K.i386.rpm

-          lbserver-profile-X.Y.Z-K.i386.rpm

-          informationindex-profile-X.Y.Z-K.i386.rpm

the following scripts are respectively installed in /etc/rc.d/init.d

-          broker and jobsubmission

-          locallogger

-          lbserver

-          information_index

The administrator (with root privileges) has then to issue from /etc/rc.d/init.d the command:

$ <script> start

to start the desired component. All start-up scripts accept the start, stop, restart and status options, except information_index, which only supports start and stop.
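
For example, to start the LB local-logger daemons and then verify that they are running, one can issue:

cd /etc/rc.d/init.d
./locallogger start
./locallogger status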

The workload-profile-X.Y.Z-K.rpm installs some scripts common to all services of the workload management:

 

/etc/sysconfig/edg_workload

/etc/sysconfig/edg_workload.csh

<install-path>/etc/workload.sh

<install-path>/etc/workload.csh

 

They are needed to define and export some variables for the start-up script environment, above all the PATH and the LD_LIBRARY_PATH needed to correctly run all the software.

As mentioned above, the jobsubmission-profile-X.Y.Z-K.i386.rpm additionally installs the wl-jss_rb-env.sh configuration file in /opt/edg/etc, which is read by the broker and jobsubmission start-up scripts when they are launched as root. The /opt/edg/etc/wl-jss_rb-env.sh file contains settings for the following variables (an illustrative example is sketched after the list):

-        CONDORG_INSTALL_PATH        the CondorG installation path. The default value is /home/dguser/CondorG.

-        CONDOR_IDS                  needed by Condor to know which user it has to run under. The value has to be set in the format uid.gid, where uid is the user identifier and gid is the group identifier. This value has to be set by the system administrator.

-        JSSRB_USER                  the user running the RB and JSS processes. Generally the value of this variable is the user name corresponding to the uid.gid set for the CONDOR_IDS variable.
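
For illustration only, a possible /opt/edg/etc/wl-jss_rb-env.sh could look like the following, where the uid.gid value 501.501 and the account name dguser are example values that the system administrator has to replace with the real ones:

CONDORG_INSTALL_PATH=/home/dguser/CondorG
CONDOR_IDS=501.501
JSSRB_USER=dguser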

Details on the installation and configuration of each of the listed rpms are provided in section 4 of this document. For further information about RPM please consult the man pages or http://www.rpm.org.

 

 

4. Installation and Configuration

This section deals with the procedures for installing and configuring the WP1 WMS components on the target platforms. For each of them, the list of dependencies, i.e. the software required on the same machine for the component to run, is reported before the installation procedure, which is described through step-by-step examples. Moreover, a description of the needed configuration items and environment variable settings is also provided. It is important to remark that, since the rpms are generated using gcc 2.95.2 and RPM 3.0.5, the same configuration is expected to be found on the target platforms.

4.1. Logging and Bookkeeping services

From the installation point of view, the LB services can be split into two main components:

The LB local-logger services must be installed on all the machines hosting processes that push information into the LB system, i.e. the machines running RB and JSS and the gatekeeper machine of the CE. An exception is the submitting machine (i.e. the machine running the User Interface), on which this component can be installed but is not mandatory.

The LB server services need instead to be installed only on a server machine that usually coincides with the RB server one.

 

4.1.1. Required software

4.1.1.1. LB local-logger

For the installation of the LB local-logger the only software required is the Globus Toolkit 2.0 (actually only GSI rpms are needed).  Globus 2 rpms are available at http://datagrid.in2p3.fr/distribution/globus under the directory beta-xx/RPMS (recommended beta is 21 or higher). All rpms can be downloaded with the command 

wget -nd -r <URL>/<rpm name>

and installed with

rpm -ivh <rpm name>

 

4.1.1.2. LB Server

For the installation of the LB server, the Globus Toolkit 2.0 is required (actually only GSI rpms are needed). Globus 2 rpms are available at http://datagrid.in2p3.fr/distribution/globus under the directory beta-xx/RPMS (recommended beta is 21 or higher). All rpms can be downloaded with the command

wget -nd -r <URL>/<rpm name>

and installed with

rpm -ivh <rpm name>

 

Besides the Globus Toolkit 2.0, for the LB server to work properly it is also necessary to install MySQL Distribution 3.22.31 or higher.

Instructions about MySQL installation can be found at the following URLs:

 http://www.redhat.com/support/resources/faqs/RH-apache-FAQ/MySQL/mysql-install.htm

Packages and more general documentation can be found at:

http://www.mysql.org/listcats3.php?menu=21&page_id=9.

Anyway the rpm of MySQL Ver 9.38 Distribution 3.22.32, for pc-linux-gnu (i686) is available at http://datagrid.in2p3.fr/distribution/config/external_services.html.

At least the packages MySQL-3.22.32 and MySQL-client-3.22.32 have to be installed for creating and configuring the LB database.

LB server stores the logging data in a MySQL database that must hence be created. The following assumes the database and the server daemons (bkserver and ileventd) run on the same machine, which is considered to be secure, i.e. no database authentication is used. In a different set-up the procedure has to be adjusted accordingly as well as a secure database connection (via ssh tunnel etc.) established.

The action list below contains the placeholders DB_NAME and USER_NAME, for which real values have to be substituted. They form the database connection string required when invoking some LB daemons. The suggested value for both DB_NAME and USER_NAME is `lbserver'; this value is also the compiled-in default (i.e. when used, the database connection string need not be specified at all).

The following needed steps require MySQL root privileges:

1)   Create the database:

       mysqladmin -u root -p create DB_NAME

where DB_NAME is the name of the database.

 

2)      Create a dedicated LB database user:

       mysql -u root -p -e 'grant create,drop,select,insert,update,delete on DB_NAME.* to USER_NAME@localhost'

where USER_NAME is the name of the user running the LB server daemons.

 

3)      Create the database tables:

        mysql -u USER_NAME DB_NAME < server.sql

where server.sql is a file containing the SQL commands for creating the needed tables. server.sql can be found in the <install path>/etc directory created by the LB server rpm installation.
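
Putting the three steps together with the suggested default name lbserver for both DB_NAME and USER_NAME, and assuming the default installation path /opt/edg, the sequence becomes:

mysqladmin -u root -p create lbserver
mysql -u root -p -e 'grant create,drop,select,insert,update,delete on lbserver.* to lbserver@localhost'
mysql -u lbserver lbserver < /opt/edg/etc/server.sql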

 

4.1.2. RPM installation

In order to install the LB local-logger and the LB server services, the following commands have to be respectively issued with root privileges:

 

rpm -ivh workload-profile-X.Y.Z-K.i386.rpm

rpm -ivh locallogger-X.Y.Z-K.i386.rpm

rpm -ivh locallogger-profile-X.Y.Z-K.i386.rpm

rpm -ivh lbserver-X.Y.Z-K.i386.rpm

rpm -ivh lbserver-profile-X.Y.Z-K.i386.rpm

 

By default the locallogger-X.Y.Z-K.i386.rpm and lbserver-X.Y.Z-K.i386.rpm rpms install the software in the /opt/edg directory, whilst the locallogger-profile and lbserver-profile rpms install in /etc/rc.d/init.d.

 

4.1.3. The installation tree structure

4.1.3.1. LB local-logger

When the LB local-logger RPMs are installed, the following directory tree is created:

 

<install-path>/info
<install-path>/info/interlogger.info
<install-path>/lib
<install-path>/man
<install-path>/man/man1
<install-path>/man/man1/interlogger.1
<install-path>/man/man3
<install-path>/man/man3/_dgLBJobStat.3
<install-path>/man/man3/_dgLBQueryRec.3
<install-path>/man/man3/dgLBEvent.3
<install-path>/man/man3/dglbevents.3
<install-path>/man/man3/dglog.3
<install-path>/man/man3/dgssl.3
<install-path>/man/man3/dgxferlog.3
<install-path>/man/man3/escape.3
<install-path>/man/man3/lbapi.3
<install-path>/sbin
<install-path>/sbin/dglogd
<install-path>/sbin/interlogger
<install-path>/sbin/locallogger
<install-path>/share
<install-path>/share/doc
<install-path>/share/doc/Workload
<install-path>/share/doc/Workload/Logging
<install-path>/share/doc/Workload/Logging/html
<install-path>/share/doc/Workload/Logging/html/annotated.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBJobStat-include.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBJobStat-members.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBJobStat.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBQueryRec-include.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBQueryRec-members.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBQueryRec.html
<install-path>/share/doc/Workload/Logging/html/class_dgLBEvent-include.html
<install-path>/share/doc/Workload/Logging/html/class_dgLBEvent.html
<install-path>/share/doc/Workload/Logging/html/doxygen.gif
<install-path>/share/doc/Workload/Logging/html/files.html
<install-path>/share/doc/Workload/Logging/html/functions.html
<install-path>/share/doc/Workload/Logging/html/globals.html
<install-path>/share/doc/Workload/Logging/html/headers.html
<install-path>/share/doc/Workload/Logging/html/index.html
<install-path>/share/doc/Workload/Logging/html/null.gif
<install-path>/share/doc/Workload/Logging/refman.ps

 

/etc/rc.d/init.d

/etc/rc.d/init.d/locallogger

 

The sbin directory contains all the LB local-logger daemon executables. The locallogger script installed in /etc/rc.d/init.d has to be used for starting the daemons. The man directory contains the man page for the inter-logger daemon.

4.1.3.2. LB Server

When the LB server RPMs are installed, the following directory tree is created:

 

<install-path>/etc
<install-path>/etc/server.sql
<install-path>/lib
<install-path>/sbin
<install-path>/sbin/bkpurge
<install-path>/sbin/bkserver
<install-path>/sbin/ileventd
<install-path>/sbin/lbserver

 

/etc/rc.d/init.d

/etc/rc.d/init.d/lbserver

           

where the sbin directory contains all the LB server daemon executables. The lbserver script installed in /etc/rc.d/init.d has to be used for starting the daemons.

 

4.1.4. Configuration

Neither the LB local-logger nor the LB server has a configuration file, so no action is needed for this task.

 

4.1.5. Environment Variables

All LB components need the following environment variables to be set:

-        X509_USER_KEY              the user private key file path

-        X509_USER_CERT             the user certificate file path

-        X509_CERT_DIR              the trusted certificate and ca-signing-policy directory

-        X509_USER_PROXY            the user proxy certificate file path

as required by GSI.

However, in the case of the LB daemons, the recommended way of specifying the security file locations is to use the --cert, --key and --CAdir options explicitly.

The Logging library, i.e. the library that is linked into the UI, RB, JSS and Jobmanager, reads its immediate logging destination from the variable DGLOG_DEST.

It defaults to x-dglog://localhost:15830, which is the correct value, hence it normally does not need to be set except on the submitting machine. The correct format for this variable is:

DGLOG_DEST=x-dglog://HOST:PORT

where as already mentioned HOST defaults to localhost and PORT defaults to 15830.

On the submitting machine if the variable is not set, it is dynamically assigned by the UI with the value:

DGLOG_DEST=x-dglog://<LB_CONTACT>:15830

where LB_CONTACT is the hostname of the machine running the LB server currently associated with the RB used for submitting jobs.

The timeout of the Logging library functions is read from the environment variable DGLOG_TIMEOUT. It defaults to 2 seconds, which is the correct value for local logging. On the submitting machine the value for this variable is set dynamically by the UI to 10 seconds (the recommended value for non-local logging is 10 to 15 seconds) and it is anyway configurable through the UI configuration.

Finally there is LBDB, the environment variable needed by the LB Server daemons (ileventd, bkserver and bkpurge). LBDB represents the MySQL database connect-string; it defaults to

"lbserver/@localhost:lbserver" and in the recommended set-up (see section 4.1.1.2) does not need to be set. Otherwise it should be set as follows:

LBDB=USER_NAME/PASSWORD@DB_HOSTNAME:DB_NAME

where

                - USER_NAME is the name of the database user,

                - PASSWORD is the user's password for the database,

                - DB_HOSTNAME is the hostname of the host where the database is located,

                - DB_NAME is the name of the database.
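
As an example (bash syntax; lb.example.org, dbhost.example.org and the password are placeholders), one could set on the submitting machine:

export DGLOG_DEST=x-dglog://lb.example.org:15830

while on an LB server machine using a non-default database set-up one could set:

export LBDB=lbserver/secret@dbhost.example.org:lbserver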

       

4.2. RB and JSS

The Resource Broker and the Job Submission Services are the WMS components allowing the submission of jobs to the CEs. They are dealt with together since they always reside on the same host and consequently are distributed by means of a single rpm.

4.2.1. Required software

For the installation of RB and JSS the Globus Toolkit 2.0 rpms available at http://datagrid.in2p3.fr/distribution/globus under the directory beta-xx/RPMS (recommended beta is 21 or higher) are required to be installed on the target platform. All needed rpms can be downloaded with the command 

wget -nd -r <URL>/<rpm name>

and installed with

rpm -ivh <rpm name>

The Globus gridftp server package must also be installed and configured on the same host (see http://marianne.in2p3.fr/datagrid/documentation/EDG-Install-HOWTO.html for details).

It is important to recall that the Globus grid-mapfile located in /etc/grid-security on the RB server machine must be filled with the certificate subjects of all the users allowed to use the Resource Broker functionality. Each user mapped in the grid-mapfile has to belong to a group with the same name as the user. At the same time, the dedicated user dguser has to belong to all these groups.
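
For illustration only, a grid-mapfile entry maps a certificate subject to a local account (the subject and the account name below are placeholders):

"/C=IT/O=SomeCA/OU=Personnel/CN=Jane Doe" janedoe

The account janedoe must then belong to a group also named janedoe, and dguser has to be added to that group as well, e.g. with:

gpasswd -a dguser janedoe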

Moreover on the same platform the following products are expected to be installed:

-        LB local-logger services (see section 4.1.1.1)

-        PostgreSQL  (RB and JSS)

-        Condor-G  (JSS)

-        ClassAd library (RB and JSS)

-        ReplicaCatalog from the WP2 distribution (RB)

 

4.2.1.1. PostgreSQL installation and configuration

Both RB and JSS use PostgreSQL database for implementing the internal job queue. The installation kit and the documentation for PostgreSQL can be found at the following URL:

http://www3.us.postgresql.org/sites.html

The required PostgreSQL version is 7.1.3 or higher. The following packages need to be installed (respecting the order in which they are listed): postgresql-libs, postgresql-devel, postgresql, postgresql-server, postgresql-tcl, postgresql-tk and postgresql-docs.

PostgreSQL also needs packages cyrus-sasl-1-5-11 (or higher), openssl-0.9.5a and openssl-devel-0.9.5a (or higher). All of them can be found at the following URL:

http://datagrid.in2p3.fr/distribution/external/RPMS

Hereafter are reported the configuration options that must be used when installing the package:

--with-CXX

--with-tcl

  --enable-odbc

Postgresql 7.1.3 is also available in rpm format (to be installed as root) at the URL :

http://datagrid.in2p3.fr/distribution/external/RPMS

Once PostgreSQL has been installed, you need as root to create a new system account dguser using the (RH specific) command

adduser -r -m dguser

This command indeed creates a system account having a home directory. Then follow the steps reported below to create an empty database for JSS:

 

su - postgres                  (become the postgres user)

createuser -d -A dguser        (create the new database user dguser)

su - dguser                    (become the user dguser)

createdb <DBNAME>              (create the new database for JSS)

 

The name of the created database must be the same as the one assigned to the Database_name attribute in file jss.conf (see section 4.2.4.2 for more details), otherwise JSS will use as default the "template1" database. Avoiding use of the template database is anyway strongly recommended.

The RB server uses instead another database named "rb", which is created by RB itself.
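
A quick way to verify that the JSS database exists (and, once the RB has been started, that the rb database has been created too) is to list the available databases as dguser:

su - dguser -c 'psql -l'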

4.2.1.1.1. Upgrading from a previous version

When upgrading from version 1.1.x to version 1.2.y, administrators must remember to completely remove the table containing the old version's database registry. This is because the 1.2.x JSS uses a new field inside the PostgreSQL database to store the proxy file path.

The commands that have to be issued as root are:

 

psql template1 postgres (to connect to the database)

 

where template1 has to be replaced by the database name contained in the jss.conf file.

Once inside the psql client do:

 

DROP TABLE condor_submit;   (to remove the table)

 

where condor_submit has to be replaced by the table name contained in the jss.conf file.

 

4.2.1.2. Condor-G installation and configuration

Condor-G release required by JSS is CondorG 6.3.1 for INTEL-LINUX-GLIBC21. The Condor-G installation toolkit can be found at the following URL:

http://www.cs.wisc.edu/condor/downloads/condorg.license.html.

whilst it is available in rpm format (to be installed as root) at:

http://datagrid.in2p3.fr/distribution/external/RPMS 

Installation and configuration are quite straightforward and for details the reader can refer to the README file included in the Condor-G package. Main steps to be performed after having unpacked the package as root are:

-        become dguser    (su - dguser)

-        make sure the directory where you are going to install CondorG is owned by dguser

-        make sure the Globus Toolkit 2.0 has been installed on the platform

-        run the /opt/CondorG/setup.sh installation script 

-        remove the link  ~dguser/.globus/certificates created by the installation script

Moreover some additional configuration steps have to be performed in the Condor configuration file pointed to by the CONDOR_CONFIG environment variable set during installation. In the $CONDOR_CONFIG file the following attributes need to be modified:

RELEASE_DIR        = $(CONDORG_INSTALL_PATH)
CONDOR_ADMIN       = <a valid e-mail address of the Condor-G administrator>
UID_DOMAIN         = <the domain of the machine (e.g. pd.infn.it)>
FILESYSTEM_DOMAIN  = <the domain of the machine (e.g. pd.infn.it)>
HOSTALLOW_WRITE    = *
CRED_MIN_TIME_LEFT = 0
GLOBUSRUN          = $(GLOBUS_LOCATION)/bin/globusrun

 

and the following entries need to be added:

 

SKIP_AUTHENTICATION              = YES
AUTHENTICATION_METHODS           = CLAIMTOBE
DISABLE_AUTH_NEGOTIATION         = TRUE
GRIDMANAGER_CHECKPROXY_INTERVAL  = 600
GRIDMANAGER_MINIMUM_PROXY_TIME   = 180

 

The environment variable CONDORG_INSTALL_PATH is also set during installation and points to the path where the Condor-G package has been installed.

In order to work properly, the current version of Condor-G requires the file /etc/grid-security/certificates/ca-signing-policy.conf, which has instead been eliminated from the Globus Toolkit 2.0 distribution and must hence be created by the administrator. This requirement will be removed with the next release of Condor-G, which will be fully Globus Toolkit 2.0 compliant.

4.2.1.3. ClassAd installation and configuration

The ClassAd release required by JSS and RB is classads-0.9 (or higher). The ClassAd library documentation can be found at the following URL:

http://www.cs.wisc.edu/condor/classad.

whilst it is available in rpm format (to be installed as root) at:

http://datagrid.in2p3.fr/distribution/external/RPMS 

4.2.1.4. ReplicaCatalog installation and configuration

The ReplicaCatalog release required by RB is ReplicaCatalogue-gcc32dbg-2.0 (or higher) that is available in rpm format (to be installed as root) at:

http://datagrid.in2p3.fr/distribution/wp2/RPMS 

 

4.2.2. RPM installation

In order to install the Resource Broker and the Job Submission services, the following commands have to be issued with root privileges:

 

rpm -ivh workload-profile-X.Y.Z-K.i386.rpm

rpm -ivh proxy-X.Y.Z-K.i386.rpm

rpm -ivh jobsubmission-X.Y.Z-K.i386.rpm

rpm -ivh jobsubmission-profile-X.Y.Z-K.i386.rpm

 

By default the jobsubmission-X.Y.Z-K.i386.rpm and the proxy-X.Y.Z-K.i386.rpm rpms install the software in the /opt/edg directory, whilst jobsubmission-profile-X.Y.Z-K.i386.rpm installs in /etc/rc.d/init.d and /etc/sysconfig.

4.2.3. The Installation Tree structure

When the jobsubmission rpms have been installed, the following directory tree is created:

 

<install-path>/bin

<install-path>/bin/RBserver

<install-path>/bin/jssparser

<install-path>/bin/jssserver

<install-path>/etc

<install-path>/etc/jss.conf

<install-path>/etc/rb.conf

<install-path>/etc/wl-jss_rb-env.sh

<install-path>/lib

<install-path>/man

<install-path>/man/man3

<install-path>/man/man3/BROKER_INFOstruct.3

<install-path>/man/man3/CannotConfigure.3

<install-path>/man/man3/CannotReadFile.3

<install-path>/man/man3/ConfSchema.3

<install-path>/man/man3/DeletePointer.3

<install-path>/man/man3/GDMP_ReplicaCatalog.3

<install-path>/man/man3/InvalidURL.3

<install-path>/man/man3/JSSConfiguration.3

<install-path>/man/man3/JobWrapper.3

<install-path>/man/man3/JssClient.3

<install-path>/man/man3/LDAPConnection.3

<install-path>/man/man3/LDAPSynchConnection.3

<install-path>/man/man3/LogManager.3

<install-path>/man/man3/MalformedFile.3

<install-path>/man/man3/RBJobRegistry.3

<install-path>/man/man3/RBMaster.3

<install-path>/man/man3/RBReplicaCatalog.3

<install-path>/man/man3/RBReplicaCatalogEx.3

<install-path>/man/man3/RBjob.3

<install-path>/man/man3/URL.3

<install-path>/man/man3/brokerinfo.3

<install-path>/man/man3/do_CloseSEs_supply_CE_with_nfiles.3

<install-path>/man/man3/jsscommon.3

<install-path>/man/man3/jssthreads.3

<install-path>/man/man3/matchmaking.3

<install-path>/man/man3/rbargs_t.3

<install-path>/man/man3/rbhandlers.3

<install-path>/man/man3/rbthreads.3

<install-path>/man/man3/select_CE_on_files.3

<install-path>/sbin

<install-path>/sbin/broker

<install-path>/sbin/jobsubmission

<install-path>/share

<install-path>/share/doc

<install-path>/share/doc/Workload

<install-path>/share/doc/Workload/Broker

<install-path>/share/doc/Workload/Broker/COPYING

<install-path>/share/doc/Workload/Broker/NEWS

<install-path>/share/doc/Workload/Broker/README

<install-path>/share/doc/Workload/Broker/html

<install-path>/share/doc/Workload/Broker/html/annotated.html

<install-path>/share/doc/Workload/Broker/html/class_BROKER_INFOstruct-include.html

<install-path>/share/doc/Workload/Broker/html/class_BROKER_INFOstruct-members.html

<install-path>/share/doc/Workload/Broker/html/class_BROKER_INFOstruct.html

<install-path>/share/doc/Workload/Broker/html/class_ConfSchema-include.html

<install-path>/share/doc/Workload/Broker/html/class_ConfSchema-members.html

<install-path>/share/doc/Workload/Broker/html/class_ConfSchema.html

<install-path>/share/doc/Workload/Broker/html/class_GDMP_ReplicaCatalog-include.html

<install-path>/share/doc/Workload/Broker/html/class_GDMP_ReplicaCatalog-members.html

<install-path>/share/doc/Workload/Broker/html/class_GDMP_ReplicaCatalog.gif

<install-path>/share/doc/Workload/Broker/html/class_GDMP_ReplicaCatalog.html

<install-path>/share/doc/Workload/Broker/html/class_LDAPConnection-include.html

<install-path>/share/doc/Workload/Broker/html/class_LDAPConnection-members.html

<install-path>/share/doc/Workload/Broker/html/class_LDAPConnection.gif

<install-path>/share/doc/Workload/Broker/html/class_LDAPConnection.html

<install-path>/share/doc/Workload/Broker/html/class_LDAPSynchConnection-include.html

<install-path>/share/doc/Workload/Broker/html/class_LDAPSynchConnection-members.html

<install-path>/share/doc/Workload/Broker/html/class_LDAPSynchConnection.gif

<install-path>/share/doc/Workload/Broker/html/class_LDAPSynchConnection.html

<install-path>/share/doc/Workload/Broker/html/class_RBJobRegistry-include.html

<install-path>/share/doc/Workload/Broker/html/class_RBJobRegistry-members.html

<install-path>/share/doc/Workload/Broker/html/class_RBJobRegistry.html

<install-path>/share/doc/Workload/Broker/html/class_RBMaster-include.html

<install-path>/share/doc/Workload/Broker/html/class_RBMaster-members.html

<install-path>/share/doc/Workload/Broker/html/class_RBMaster.html

<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalog-include.html

<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalog-members.html

<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalog.gif

<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalog.html

<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalogEx-include.html

<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalogEx.html

<install-path>/share/doc/Workload/Broker/html/class_RBjob-include.html

<install-path>/share/doc/Workload/Broker/html/class_RBjob-members.html

<install-path>/share/doc/Workload/Broker/html/class_RBjob.html

<install-path>/share/doc/Workload/Broker/html/class_do_CloseSEs_supply_CE_with_nfiles-include.html

<install-path>/share/doc/Workload/Broker/html/class_do_CloseSEs_supply_CE_with_nfiles-members.html

<install-path>/share/doc/Workload/Broker/html/class_do_CloseSEs_supply_CE_with_nfiles.gif

<install-path>/share/doc/Workload/Broker/html/class_do_CloseSEs_supply_CE_with_nfiles.html

<install-path>/share/doc/Workload/Broker/html/class_select_CE_on_files-include.html

<install-path>/share/doc/Workload/Broker/html/class_select_CE_on_files-members.html

<install-path>/share/doc/Workload/Broker/html/class_select_CE_on_files.gif

<install-path>/share/doc/Workload/Broker/html/class_select_CE_on_files.html

<install-path>/share/doc/Workload/Broker/html/doxygen.gif

<install-path>/share/doc/Workload/Broker/html/files.html

<install-path>/share/doc/Workload/Broker/html/functions.html

<install-path>/share/doc/Workload/Broker/html/globals.html

<install-path>/share/doc/Workload/Broker/html/group_ReplicaCatalog.html

<install-path>/share/doc/Workload/Broker/html/headers.html

<install-path>/share/doc/Workload/Broker/html/hierarchy.html

<install-path>/share/doc/Workload/Broker/html/index.html

<install-path>/share/doc/Workload/Broker/html/modules.html

<install-path>/share/doc/Workload/Broker/html/null.gif

<install-path>/share/doc/Workload/Broker/refman.ps

<install-path>/share/doc/Workload/Common

<install-path>/share/doc/Workload/Common/html

<install-path>/share/doc/Workload/Common/html/annotated.html

<install-path>/share/doc/Workload/Common/html/class_DeletePointer-include.html

<install-path>/share/doc/Workload/Common/html/class_DeletePointer-members.html

<install-path>/share/doc/Workload/Common/html/class_DeletePointer.html

<install-path>/share/doc/Workload/Common/html/class_InvalidURL-include.html

<install-path>/share/doc/Workload/Common/html/class_InvalidURL.html

<install-path>/share/doc/Workload/Common/html/class_URL-include.html

<install-path>/share/doc/Workload/Common/html/class_URL-members.html

<install-path>/share/doc/Workload/Common/html/class_URL.html

<install-path>/share/doc/Workload/Common/html/doxygen.gif

<install-path>/share/doc/Workload/Common/html/files.html

<install-path>/share/doc/Workload/Common/html/functions.html

<install-path>/share/doc/Workload/Common/html/group_Common.html

<install-path>/share/doc/Workload/Common/html/headers.html

<install-path>/share/doc/Workload/Common/html/index.html

<install-path>/share/doc/Workload/Common/html/modules.html

<install-path>/share/doc/Workload/Common/html/null.gif

<install-path>/share/doc/Workload/Common/refman.ps

<install-path>/share/doc/Workload/JobSubmission

<install-path>/share/doc/Workload/JobSubmission/AUTHORS

<install-path>/share/doc/Workload/JobSubmission/COPYING

<install-path>/share/doc/Workload/JobSubmission/NEWS

<install-path>/share/doc/Workload/JobSubmission/README

<install-path>/share/doc/Workload/JobSubmission/html

<install-path>/share/doc/Workload/JobSubmission/html/annotated.html

<install-path>/share/doc/Workload/JobSubmission/html/class_CannotConfigure-include.html

<install-path>/share/doc/Workload/JobSubmission/html/class_CannotConfigure-members.html

<install-path>/share/doc/Workload/JobSubmission/html/class_CannotConfigure.html

<install-path>/share/doc/Workload/JobSubmission/html/class_CannotReadFile-include.html

<install-path>/share/doc/Workload/JobSubmission/html/class_CannotReadFile-members.html

<install-path>/share/doc/Workload/JobSubmission/html/class_CannotReadFile.html

<install-path>/share/doc/Workload/JobSubmission/html/class_JSSConfiguration-include.html

<install-path>/share/doc/Workload/JobSubmission/html/class_JSSConfiguration-members.html

<install-path>/share/doc/Workload/JobSubmission/html/class_JSSConfiguration.html

<install-path>/share/doc/Workload/JobSubmission/html/class_JobWrapper-include.html

<install-path>/share/doc/Workload/JobSubmission/html/class_JobWrapper-members.html

<install-path>/share/doc/Workload/JobSubmission/html/class_JobWrapper.html

<install-path>/share/doc/Workload/JobSubmission/html/class_JssClient-include.html

<install-path>/share/doc/Workload/JobSubmission/html/class_JssClient-members.html

<install-path>/share/doc/Workload/JobSubmission/html/class_JssClient.html

<install-path>/share/doc/Workload/JobSubmission/html/class_LogManager-include.html

<install-path>/share/doc/Workload/JobSubmission/html/class_LogManager-members.html

<install-path>/share/doc/Workload/JobSubmission/html/class_LogManager.html

<install-path>/share/doc/Workload/JobSubmission/html/class_MalformedFile-include.html

<install-path>/share/doc/Workload/JobSubmission/html/class_MalformedFile-members.html

<install-path>/share/doc/Workload/JobSubmission/html/class_MalformedFile.html

<install-path>/share/doc/Workload/JobSubmission/html/class_rbargs_t-include.html

<install-path>/share/doc/Workload/JobSubmission/html/class_rbargs_t-members.html

<install-path>/share/doc/Workload/JobSubmission/html/class_rbargs_t.gif

<install-path>/share/doc/Workload/JobSubmission/html/class_rbargs_t.html

<install-path>/share/doc/Workload/JobSubmission/html/doxygen.gif

<install-path>/share/doc/Workload/JobSubmission/html/files.html

<install-path>/share/doc/Workload/JobSubmission/html/functions.html

<install-path>/share/doc/Workload/JobSubmission/html/globals.html

<install-path>/share/doc/Workload/JobSubmission/html/group_JobWrapper.html

<install-path>/share/doc/Workload/JobSubmission/html/group_JssClient.html

<install-path>/share/doc/Workload/JobSubmission/html/group_JssConfigure.html

<install-path>/share/doc/Workload/JobSubmission/html/group_JssError.html

<install-path>/share/doc/Workload/JobSubmission/html/group_JssParser.html

<install-path>/share/doc/Workload/JobSubmission/html/group_JssThreads.html

<install-path>/share/doc/Workload/JobSubmission/html/headers.html

<install-path>/share/doc/Workload/JobSubmission/html/hierarchy.html

<install-path>/share/doc/Workload/JobSubmission/html/index.html

<install-path>/share/doc/Workload/JobSubmission/html/modules.html

<install-path>/share/doc/Workload/JobSubmission/html/null.gif

<install-path>/share/doc/Workload/JobSubmission/refman.ps

 

/etc/rc.d/init.d

/etc/rc.d/init.d/broker

/etc/rc.d/init.d/jobsubmission

 

 

The directory bin contains all the RB and JSS server process executables: Rbserver, jssserver and jssparser. The configuration files are stored in etc (see sections 4.2.4.1 and 4.2.4.2 below). The scripts to start and stop the RB and JSS processes are contained in “/etc/rc.d/init.d”.

 

4.2.4. Configuration

Once the rpm has been installed, the RB and JSS services must be properly configured. This can be done by editing the two files rb.conf and jss.conf that are stored in <install-path>/etc. The actions to be performed to configure the Resource Broker and the Job Submission Service are described in the following two sections.

 

4.2.4.1. RB configuration

Configuration of the Resource Broker is accomplished by editing the file “<install-path>/etc/rb.conf” to set the contained attributes appropriately. They are listed hereafter, grouped according to the functionality they relate to:

-        MDS_contact, MDS_port and MDS_timeout refer to the II service and respectively represent the hostname where this service is running, the port number, and the timeout in seconds when the RB queries the II. E.g.:

  MDS_contact = "grid001f.cnaf.infn.it";

 MDS_port = 2170;

 MDS_timeout = 60;

 

-        MDS_gris_port refers to the port to be used by the RB to contact the GRISes. E.g.:

 MDS_gris_port = 2135;

 

-        MDS_multi_attributes defines the list of the attributes that are multi-valued in the MDS (i.e. that can assume multiple values). It is recommended not to modify the default value of this parameter, which is currently:

  MDS_multi_attributes = {

     "AuthorizedUser",

     "RunTimeEnvironment",

     "CloseCE"

  };

 

-        MDS_basedn defines the base DN, i.e. the distinguished name (DN) to use as a starting point for searches in the Information Index. It is recommended not to modify the default value of this parameter, which is currently set to:

MDS_basedn = "o=Grid"

 

-        LB_contact and LB_port refer to the LB server service and represent respectively the hostname and the port where the LB server is listening for connections. E.g.:

LB_contact = "grid004f.cnaf.infn.it";

LB_port = 7846;

The Logging library, i.e. the library (linked into the RB) providing the APIs for logging job events to the LB, reads its immediate logging destination from the environment variable DGLOG_DEST (see section 4.1.5), hence this destination is not dealt with in the configuration file. DGLOG_DEST defaults to x-dglog://localhost:15830, which is normally the correct value since the LB local-logger services should run on the same host as the RB server; it therefore does not usually need to be set. The logging function timeout is instead read from the environment variable DGLOG_TIMEOUT, which defaults to 2 seconds.

 

-        JSS_contact and JSS_client_port refer to the JSS and represent respectively the hostname (it must be the same host as the RB server) and the port number where the JSS server is listening; the latter must match the JSS_server_port parameter in the jss.conf file (see section 4.2.4.2). Moreover, JSS_server_port represents the port used by the RB to listen for JSS communications; its value must match the RB_client_port parameter in the jss.conf file (see section 4.2.4.2). An example for these parameters is reported hereafter:

  JSS_contact = "grid004f.cnaf.infn.it";

 JSS_client_port = 8881;

 JSS_server_port = 9991;

 

-        JSS_backlog and UI_backlog define the maximum number of simultaneous connections from the JSS and the UI supported by the corresponding listening sockets. Default values are:

 

  JSS_backlog = 5;

 UI_backlog  = 5;

 

-        UI_server_port is the port used by the RB server to listen for requests coming from the User Interface. Default value for this parameter is:

 

  UI_server_port = 7771;

 

-        RB_pool_size represents the maximum number of requests managed simultaneously by the RB server. Default value for this parameter is:

 

  RB_pool_size = 16;

 

-        RB_purge_threshold defines the threshold age in seconds for RBRegistry information. The RB purges all the information about a job and frees the corresponding storage space (input/output sandboxes) when the last update of the job entry in the internal information database is older than RB_purge_threshold seconds. Default value for this parameter is about one week:

 

  RB_purge_threshold = 600000;

 

-        RB_cleanup_threshold represents the span of time (expressed in seconds) between two consecutive clean-ups of the job registry. During the registry clean-up the RB removes all the entries of those jobs classified as ABORTED. At the end of the clean-up, if needed (see RB_purge_threshold), the purging of the registry is performed as well. The default value for this configuration parameter is:

   

      RB_cleanup_threshold = 3600;

 

The administrator must in any case tailor this value according to the estimated amount of job input/output sandbox files in the given period, in order not to fill up the disk space of the RB machine.

 

-        RB_sandbox_path represents the pathname of the root sandboxes directory, i.e. the complete pathname of the directory where the RB creates both the input/output sandboxes directories and stores the “.Brokerinfo” file. Default value for this parameter is the temporary directory:

 

RB_sandbox_path = "/tmp";

 

-        RB_logfile defines the name of the file used by the RB for recording its various events. The default value for this parameter is:

 

RB_logfile = "/var/tmp/RBserver.log";

 

-        RB_logfile_size. This parameter limits the RB log file to the specified size, expressed in bytes. Each time the log file grows beyond this maximum, the RB flushes its content into a new file with the same name as the original but having .old as extension. Default value for this parameter is:

 

RB_logfile_size = 5120000;

 

-         RB_logfile_level. This parameter allows the user to specify the verbosity of the information the RB records in its log file. Possible values are: 0 (none), 1 (verylow), 2 (low), 3 (medium), 4 (high), 5 (veryhigh) and 6 (ugly).  The default value for this configuration parameter is:

 

RB_logfile_level = 3;

 

-        RB_submission_retries. This parameter allows the user to specify the number of times the RB has to try to re-schedule and re-submit the job to the JSS in case the submission to the CE fails (e.g. Globus down on the CE, network problems, etc.). The resubmission is tried for all the CEs satisfying the job requirements. When a job is submitted specifying the RetryCount attribute in the JDL, the RB performs a number of submission retries equal to the minimum between RetryCount and RB_submission_retries. The default value for this configuration parameter is:

 

RB_submission_retries = 3;

 

-        MyProxyServer. This parameter allows the user to specify the server host name of the MyProxy credential repository system to be contacted for periodic credential renewal.  An example for this configuration parameter is provided hereafter:

 

MyProxyServer = "skurut.cesnet.cz";

 

-        SkipJobSubmission. If this parameter is set to true, the Resource Broker skips the actual job submission, aborting the job at the end of the match-making algorithm, and notifies the Logging and Bookkeeping service by issuing a dgLogAbort with a text specifying the matching CE where the job would have been sent if the JSS interaction had not been disabled. The default value for this configuration parameter is:

 

SkipJobSubmission = false;

 

-        RB_notification_queue_size. This parameter represents the maximum number of notifications that the RB can handle. The default value for this configuration parameter is:

 

RB_notification_queue_size = 32

 

Note that no semicolon must be put at the end of the last field in the rb.conf file.
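For reference, the fragment below sketches how an rb.conf file could look when the example values quoted in this section are put together. It is only an illustration: the hostnames are the placeholder ones used in the examples above and must be replaced with the actual site values, and the remaining parameters described in this section would normally be present as well, with the values discussed above.

MDS_contact = "grid001f.cnaf.infn.it";
MDS_port = 2170;
MDS_timeout = 60;
MDS_gris_port = 2135;
MDS_basedn = "o=Grid";
LB_contact = "grid004f.cnaf.infn.it";
LB_port = 7846;
JSS_contact = "grid004f.cnaf.infn.it";
JSS_client_port = 8881;
JSS_server_port = 9991;
UI_server_port = 7771;
RB_sandbox_path = "/tmp";
RB_logfile = "/var/tmp/RBserver.log";
RB_logfile_level = 3;
RB_notification_queue_size = 32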

4.2.4.2. JSS configuration

Configuration of the Job Submission Service is accomplished by editing the file “<install-path>/etc/jss.conf” to set the contained parameters appropriately. They are listed hereafter together with their meanings:

-        Condor_submit_file_prefix defines the prefix for the CondorG submission file (the job identifier dg_jobId is then appended to this prefix to build the actual submission file name). Default value for this parameter is:

 

Condor_submit_file_prefix  = "/var/tmp/CondorG.sub";

 

-        Condor_log_file defines the absolute path name of the CondorG log file, i.e. the file where the events for the submitted jobs are recorded. Default value for this parameter is:

 

Condor_log_file = "/var/tmp/CondorG.log";

 

-        Condor_stdoe_dir defines the directory where the standard output and standard error files of CondorG are temporarily saved. Default value is:

 

Condor_stdoe_dir = "/var/tmp";

 

-        Job_wrapper_file_prefix is the prefix for the Job Wrapper file name (i.e. the script wrapping the actual job, which is submitted on the CE). As before, the job identifier dg_jobId is appended to this prefix to build the actual file name. Default value for this parameter is:

 

Job_wrapper_file_prefix     = "/var/tmp/Job_wrapper.sh";

-        Database_name is the name of the Postgres database where the JSS registers information about submitted jobs. This name must correspond to an existing database (how to create it is briefly described in section 4.2.1.1). Default value for the database name is that of the database automatically created when installing Postgres, i.e.:

 

Database_name  = "template1";

 

-        Database_table_name is the name of the table in the previous database. This table is created by the JSS itself if not found. Default value for this parameter is:

 

Database_table_name = "condor_submit";   

 

-        JSS_server_port and RB_client_port represent respectively the port used by JSS to listen for RB communication and to communicate to the RB server (e.g. for sending notifications). The two mentioned parameters have to match respectively with the JSS_client_port and JSS_server_port parameters in the rb.conf file (see section 4.2.4.1). Default values are:

 

JSS_server_port = 8881;

RB_client_port  = 9991;

 

-        Condor_log_file_size indicates the size in bytes at which the CondorG.log log file has to be split. Default value is:

Condor_log_file_size = 64000;
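Putting together the example values above, a complete jss.conf could hence look like the following sketch (to be adapted to the actual installation; in particular the ports must stay consistent with the rb.conf file as explained above):

Condor_submit_file_prefix = "/var/tmp/CondorG.sub";
Condor_log_file = "/var/tmp/CondorG.log";
Condor_stdoe_dir = "/var/tmp";
Job_wrapper_file_prefix = "/var/tmp/Job_wrapper.sh";
Database_name = "template1";
Database_table_name = "condor_submit";
JSS_server_port = 8881;
RB_client_port = 9991;
Condor_log_file_size = 64000;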

 

4.2.5. Environment variables

4.2.5.1. RB

Environment variables that have to be set for the RB are listed hereafter:

-        PGSQL_INSTALL_PATH the Postgres database installation path. Default value is

/usr/local/pgsql

-        PGDATA                                      the path where the Postgres database data files are stored. Default value is /usr/local/pgsql/data

-        GDMP_INSTALL_PATH              the gdmp installation path. Default value is /opt/edg.

 

Setting PGSQL_INSTALL_PATH and PGDATA is only needed if the installation is not performed from rpm. Moreover, $GDMP_INSTALL_PATH/lib has to be added to LD_LIBRARY_PATH (a shell sketch of these settings is provided at the end of this section). Finally, there are other environment variables needed at run-time by the RB. They are:

-        EDG_WL_RB_CONFIG_DIR      the RB configuration directory

-        X509_HOST_CERT                    the host certificate file path

-        X509_HOST_KEY                       the host private key file path

-        X509_USER_PROXY                  the user proxy certificate file path

-       GRIDMAP                                                location of the Globus grid-mapfile that translates X509 certificate subjects into local Unix usernames. The default is /etc/grid-security/grid-mapfile.

 

Anyway, all variables in the latter group are set by the broker start-up script.
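As an illustration only, the following shell lines sketch how the above variables could be set by hand for a non-rpm installation (the paths are the defaults quoted above and may differ on a real machine; when installing from rpm, the run-time variables of the latter group are handled by the broker start-up script):

export PGSQL_INSTALL_PATH=/usr/local/pgsql
export PGDATA=/usr/local/pgsql/data
export GDMP_INSTALL_PATH=/opt/edg
export LD_LIBRARY_PATH=$GDMP_INSTALL_PATH/lib:$LD_LIBRARY_PATH
# EDG_WL_RB_CONFIG_DIR, X509_HOST_CERT, X509_HOST_KEY, X509_USER_PROXY and GRIDMAP
# are normally set by the broker start-up script.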

4.2.5.2. JSS

Environment variables that have to be set for the JSS are listed hereafter:

-        PGSQL_INSTALL_PATH            the Postgres database installation path. Default value is

/usr/local/pgsql

-        PGDATA                                      the path where the Postgres database data files are stored. Default value is /usr/local/pgsql/data

-        PGUSER                                     the user that has been used to start postgres services.

Default value is  postgres

-        CONDOR_CONFIG                    The CondorG configuration file path. Default value is

/home/dguser/CondorG/etc/condor_config

-        CONDORG_INSTALL_PATH      the CondorG installation path. Default value is

/home/dguser/CondorG

 

Setting the former variables is only needed if the installation is not performed from rpms. However, don't forget to check them in the file /opt/edg/etc/wl-jss_rb-env.sh when you install the rpms. Moreover:

-        $CONDORG_INSTALL_PATH/bin

-        $CONDORG_INSTALL_PATH/sbin

-        $PGSQL_INSTALL_PATH/bin                (only if installation is not performed from rpm)

must be included in the PATH environment variable and

-        $CONDORG_INSTALL_PATH/lib,

-        $PGSQL_INSTALL_PATH/lib                  (only if installation is not performed from rpm)

have to be added to LD_LIBRARY_PATH (a shell sketch of these settings is provided at the end of this section). Finally, there are other environment variables needed at run-time by the JSS. They are:

-        EDG_WL_JSS_CONFIG_DIR    the JSS configuration directory

-        X509_HOST_CERT                    the host certificate file path

-        X509_HOST_KEY                       the host private key file path

-        X509_USER_PROXY                  the user proxy certificate file path

-       GRIDMAP                                                location of the Globus grid-mapfile that translates X509 certificate subjects into local Unix usernames. The default is /etc/grid-security/grid-mapfile.

Anyway, all variables in the latter group are set by the jobsubmission start-up script.
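A possible shell sketch of the PATH and LD_LIBRARY_PATH additions described above, using the default paths quoted in this section (adapt as needed; the run-time variables of the latter group are handled by the jobsubmission start-up script):

export CONDORG_INSTALL_PATH=/home/dguser/CondorG
export CONDOR_CONFIG=$CONDORG_INSTALL_PATH/etc/condor_config
export PATH=$CONDORG_INSTALL_PATH/bin:$CONDORG_INSTALL_PATH/sbin:$PATH
export LD_LIBRARY_PATH=$CONDORG_INSTALL_PATH/lib:$LD_LIBRARY_PATH
# Only for installations not performed from rpm:
export PGSQL_INSTALL_PATH=/usr/local/pgsql
export PGDATA=/usr/local/pgsql/data
export PATH=$PGSQL_INSTALL_PATH/bin:$PATH
export LD_LIBRARY_PATH=$PGSQL_INSTALL_PATH/lib:$LD_LIBRARY_PATH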

 


 

4.3. Information Index

The Information Index (II) is the service queried by the Resource Broker to get information about resources for the submitted jobs during the matchmaking process. An II must hence be deployed for each RB/JSS instance.

This section describes steps to be performed to install and configure the Information Index service.

4.3.1. Required software

For installing the II, apart from the informationindex and the informationindex-profile rpms (see section 4.3.2 for details), the following Globus Toolkit 2.0 and Datagrid rpms are needed:

-        globus_ssl_utils-gcc32dbg_rtl             version >= 2.1

-        globus_gram_reporter-noflavor_data        version >= 2.0

-        globus_gss_assist-gcc32dbg_rtl            version >= 2.0

-        globus_libtool-gcc32dbgpthr_rtl                version >= 1.4

-        globus_openssl-gcc32dbg_rtl               version >= 0.9.6b

-        globus_openldap-gcc32dbg_pgm              version >= 2.0.14

-        globus_libtool-gcc32dbg_rtl               version >= 1.4

-        globus_openssl-gcc32dbgpthr_rtl                version >= 0.9.6b

-        globus_openldap-gcc32dbg_rtl              version >= 2.0.14

-        globus_mds_back_giis-gcc32dbg_pgm         version >= 0.3

-        globus_mds_gris-noflavor_data             version >= 2.2

-        globus_cyrus_sasl-gcc32dbg_rtl            version >= 1.5.27

-        globus_cyrus_sasl-gcc32dbgpthr_rtl        version >= 1.5.27

-        globus_gssapi_gsi-gcc32dbg_rtl            version >= 2.0

-        globus_openldap-gcc32dbgpthr_rtl          version >= 2.0.14

-        edg-info-main                             version >= 1.0.0

 

The above listed rpms are available at http://datagrid.in2p3.fr/distribution/globus under the directory beta-xx/RPMS (recommended beta is 21 or higher) and at http://datagrid.in2p3.fr/distribution/datagrid/wp6.

All the needed packages can be downloaded with the command 

wget -nd -r <URL>/<rpm name>

and installed with

rpm -ivh <rpm name>

 

4.3.2. RPM installation

In order to install the Information Index service, the following commands have to be issued with root privileges:

 

rpm -ivh workload-profile.X.Y.Z-K.i386.rpm

rpm -ivh informationindex.X.Y.Z-K.i386.rpm

rpm -ivh informationindex-profile.X.Y.Z-K.i386.rpm

 

By default the informationindex rpm installs the software in the “/opt/edg” directory, whilst the informationindex-profile rpm installs the start-up script in “/etc/rc.d/init.d”.

4.3.3. The Installation tree structure

When the informationindex rpms have been installed, the following directory tree is created:

 

<install-path>/etc

<install-path>/etc/grid-info-site-giis.conf

<install-path>/etc/grid-info-slapd-giis.conf

<install-path>/sbin

<install-path>/sbin/information_index

<install-path>/share

<install-path>/share/doc

<install-path>/share/doc/Workload

<install-path>/share/doc/Workload/InformIndex

<install-path>/share/doc/Workload/InformIndex/COPYING

<install-path>/share/doc/Workload/InformIndex/NEWS

<install-path>/share/doc/Workload/InformIndex/README

<install-path>/var

 

/etc/rc.d/init.d

/etc/rc.d/init.d/information_index

 

Under the installation path, etc stores the configuration files, while var (initially empty) is used by the II to store files created at start-up, containing the args and the pid of the II process. The information_index script can be used both from /etc/rc.d/init.d and from <install-path>/sbin to start the II.

 

 

4.3.4. Configuration

The II has two configuration files that are located in <install-path>/etc and are named:

-        grid-info-slapd-giis.conf

-        grid-info-site-giis.conf

In grid-info-slapd-giis.conf are specified the schema file locations and the database type, whilst in grid-info-site-giis.conf are listed the entries for the GRISes that are registered to this II. Each entry has the following format:

dn: service=register, dc=mi, dc=infn, dc=it, o=grid

objectclass: GlobusTop

objectclass: GlobusDaemon

objectclass: GlobusService

objectclass: GlobusServiceMDSResource

Mds-Service-type: ldap

Mds-Service-hn: bbq.mi.infn.it

Mds-Service-port: 2135

Mds-Service-Ldap-sizelimit: 20

Mds-Service-Ldap-ttl: 200

Mds-Service-Ldap-cachettl: 50

Mds-Service-Ldap-timeout: 30

Mds-Service-Ldap-suffix: o=grid

 

The field Mds-Service-hn specifies the GRIS address; Mds-Service-port specifies the GRIS port (2135 is strongly recommended), whilst the other entries are related to the LDAP sizelimit and the LDAP TTLs. To add a new GRIS to the given II, it suffices to add a new entry like the one just shown to the grid-info-site-giis.conf file.

Another file that can be used to configure the II is the start-up script information_index. This file specifies the number of the port used by the II to listen for requests, whose default is 2170. This value can be changed to make the II listen on another port, provided it matches the value of the MDS_port attribute in the RB configuration file rb.conf (see section 4.2.4.1).
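Once the II is running, a quick way to verify that it answers on the configured port is to perform an LDAP query against the o=Grid base DN, for instance with a command like the one sketched below (it assumes an OpenLDAP 2.x ldapsearch client is available on the machine; <II hostname> is a placeholder for the actual host):

ldapsearch -x -h <II hostname> -p 2170 -b "o=Grid" "(objectclass=*)"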

4.3.5. Environment Variables

The only environment variable needed by the II to run is the Globus installation path GLOBUS_LOCATION that is anyway set by the start-up script information_index.

4.4. User Interface

This section describes the steps needed to install and configure the User Interface, which is the software module of the WMS allowing the user to access main services made available by the components of the scheduling sub-layer.

4.4.1. Required software

In order to install the UI, the following rpms are needed (see section 4.4.2 for installation details):

-        workload-profile.X.Y.Z-K.i386.rpm

-        userinterface-profile.X.Y.Z-K.i386.rpm

-        userinterface-X.Y.Z-K.i386.rpm

 

Moreover, the following Globus Toolkit 2.0 and Datagrid rpms, available respectively at http://datagrid.in2p3.fr/distribution/globus and http://datagrid.in2p3.fr/distribution/datagrid/wp6, are needed:

 

-        globus_gss_assist-gcc32dbgpthr_rtl-2.0-21

-        globus_gssapi_gsi-gcc32dbgpthr_rtl-2.0-21

-        globus_ssl_utils-gcc32dbgpthr_rtl-2.1-21

-        globus_gass_transfer-gcc32dbg_rtl-2.0-21

-        globus_openssl-gcc32dbgpthr_rtl-0.9.6b-21

-        globus_ftp_control-gcc32dbg_rtl-1.0-21

-        globus_user_env-noflavor_data-2.1-21

-        globus_gss_assist-gcc32dbg_rtl-2.0-21

-        globus_gssapi_gsi-gcc32dbg_rtl-2.0-21

-        globus_ftp_client-gcc32dbg_rtl-1.1-21

-        globus_ssl_utils-gcc32dbg_rtl-2.1-21

-        globus_ssl_utils-gcc32dbg_pgm-2.1-21

-        globus_gass_copy-gcc32dbg_rtl-2.0-21

-        globus_gass_copy-gcc32dbg_pgm-2.0-21

-        globus_openssl-gcc32dbg_rtl-0.9.6b-21

-        globus_common-gcc32dbg_rtl-2.0-21

-        globus_profile-edgconfig-0.9-1

-        globus_io-gcc32dbg_rtl-2.0-21

-        globus_core-edgconfig-0.6-2

-        obj-globus-1.0-4.edg

-        globus_cyrus_sasl-gcc32dbgpthr_rtl-1.5.27-21

-        globus_libtool-gcc32dbgpthr_rtl-1.4-21

-        globus_mds_common-gcc32dbg_pgm-2.2-21

-        globus_openldap-gcc32dbg_pgm-2.0.14-21

-        globus_openldap-gcc32dbgpthr_rtl-2.0.14-21

-        globus_core-gcc32dbg_pgm-2.1-21

 

Moreover, the set of security configuration rpms for all the Certificate Authorities in Testbed1, available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/, has to be installed, together with the rpm to be used for renewing your certificate for your CA, which is available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/local/.

The Python interpreter, version 2.1.1, also has to be installed on the submitting machine. The rpm for this package is available at http://datagrid.in2p3.fr/distribution/external/RPMS as:

-        python-2.1.1-3.i386.rpm

Information about python and the package sources can be found at www.python.org.

Since the Linux RH 6.2 and RH 7.2 distributions already ship with Python 1.5, and the recent standard Python2 rpms from RedHat and from python.org avoid conflicts with previous versions by only creating python2* binaries, the UI scripts use the “python2” executable as Python interpreter. Before using the UI commands it is hence important to check that the “python2” executable is available on the submission platform; if it is not, the necessary symbolic link should be created (a possible sketch is shown below).
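For instance, if the Python 2.1.1 rpm listed above installed the interpreter as /usr/bin/python2.1 (an assumption to be verified on the actual machine) and no python2 executable exists, the link could be created as root with:

ln -s /usr/bin/python2.1 /usr/bin/python2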

All the needed packages can be downloaded with the command 

wget -nd -r <URL>/<rpm name>

and installed with

rpm -ivh <rpm name>

 

4.4.2. RPM installation

In order to install the User Interface, the following commands have to be issued with root privileges:

 

rpm -ivh workload-profile.X.Y.Z-K.i386.rpm

rpm -ivh userinterface-profile.X.Y.Z-K.i386.rpm

rpm -ivh userinterface-X.Y.Z-K.i386.rpm

 

By default the rpms install the software in the “/opt/edg” directory.

 

4.4.3. The tree structure

After the userinterface* and the workload rpms have been installed, the following directory tree is created:

<install-path>/bin
<install-path>/bin/JobAdv.py
<install-path>/bin/JobAdv.pyc
<install-path>/bin/UIchecks.py
<install-path>/bin/UIchecks.pyc
<install-path>/bin/UIutils.py
<install-path>/bin/UIutils.pyc
<install-path>/bin/dg-job-cancel
<install-path>/bin/dg-job-get-logging-info
<install-path>/bin/dg-job-get-output
<install-path>/bin/dg-job-id-info
<install-path>/bin/dg-job-list-match
<install-path>/bin/dg-job-status
<install-path>/bin/dg-job-submit
<install-path>/bin/libRBapi.py
<install-path>/bin/libRBapi.pyc
<install-path>/etc
<install-path>/etc/UI_ConfigENV.cfg
<install-path>/etc/UI_Errors.cfg

<install-path>/etc/UI_Help.cfg

<install-path>/etc/job_template.tpl

<install-path>/lib
<install-path>/lib/libLBapi.a
<install-path>/lib/libLBapi.la
<install-path>/lib/libLBapi.so
<install-path>/lib/libLBapi.so.0
<install-path>/lib/libLBapi.so.0.0.0
<install-path>/lib/libLOGapi.a
<install-path>/lib/libLOGapi.la
<install-path>/lib/libLOGapi.so
<install-path>/lib/libLOGapi.so.0
<install-path>/lib/libLOGapi.so.0.0.0
<install-path>/lib/libRBapic.a
<install-path>/lib/libRBapic.la
<install-path>/lib/libRBapic.so
<install-path>/lib/libRBapic.so.0
<install-path>/lib/libRBapic.so.0.0.0

<install-path>/share

<install-path>/share/doc

<install-path>/share/doc/Workload

<install-path>/share/doc/Workload/UserInterface

<install-path>/share/doc/Workload/UserInterface/COPYING

<install-path>/share/doc/Workload/UserInterface/NEWS

<install-path>/share/doc/Workload/UserInterface/README

 

/etc/profile.d/wl-ui-env.sh

/etc/profile.d/wl-ui-env.csh

The bin directory contains all the UI Python scripts, including the commands made available to the user. In lib are installed the shared libraries wrapping the APIs, while in etc can be found the configuration and error files UI_ConfigENV.cfg and UI_Errors.cfg, plus the help file (UI_Help.cfg) and a template of a job description in JDL (job_template.tpl).

4.4.4. Configuration

Configuration of the User Interface is accomplished by editing the file “<install-path>/etc/UI_ConfigENV.cfg” to set the contained parameters appropriately. They are listed hereafter together with their meanings:

 

-        DEFAULT_STORAGE_AREA_IN defines the path of the directory where files coming from the RB (i.e. the job Output Sandbox files) are stored if not specified by the user through command options. Default value for this parameter is:

 

DEFAULT_STORAGE_AREA_IN = /tmp

 

-        requirements, rank represent the values that are assigned by the UI to the corresponding job attributes (mandatory attributes) if these have not been provided by the user in the JDL file describing the job. Default values are:

 

requirements = TRUE

rank = - other.EstimatedTraversalTime

 

If the user has provided an expression for the requirements attribute in the JDL, the one specified in the configuration file is added (in AND) to the existing one. E.g. if in the configuration file there is:

requirements = other.Active

and in the JDL file the user has specified:

requirements = other.LRMSType == "PBS";

then the  job description that is passed to the RB will contain

requirements = other.LRMSType == "PBS" && other.Active ;

Obviously the value TRUE for the requirements in the configuration file does not have any impact on the evaluation of job requirements:

 requirements = other.LRMSType == "PBS" && TRUE ;

It is also possible to disable the default for the rank attribute by setting it to 0 (i.e. rank = 0) in the configuration file. Indeed, with such a default, if no rank is specified in the JDL then all matching resources are assigned an equal rank (i.e. 0), which is equivalent to no ranking.

-        ErrorStorage represents the path of the location where the UI creates log files. Default location is:

 

ErrorStorage = /tmp

 

-        RetryCountLB and RetryCountJobId are the numbers of UI retries on fatal errors, respectively when opening the connection with an LB and when querying the LB for information about a given job. Default values for these parameters are:

 

RetryCountLB = 1

RetryCountJobId = 1

 

-        LoggingTimeout represents the timeout of the dgLogTransfer LB API called by the UI for logging the JobTransfer event. This parameter makes the UI set the environment variable DGLOG_TIMEOUT accordingly. If not provided in the configuration file, it defaults to 2 seconds (UI and logging services on the same host). The recommended value for UIs that are non-local to the logging services is 10 to 15 seconds. An example value for this parameter in the UI configuration file is:

 

LoggingTimeout = 10

 

Moreover, there are two sections reserved for the addresses of the LBs and RBs that are accessible to the UI from the machine where it is installed.

Special markers (e.g. %%beginLB%%), which must not be modified, indicate the beginning and the end of each section. Hereafter is reported an example of the two mentioned sections:

 

%%beginLB%%

https://grid013g.cnaf.infn.it:7846

https://grid004f.cnaf.infn.it:7846

https://skurut.cesnet.cz:7846

%%endLB%%

 

%%beginRB%%

grid013g.cnaf.infn.it:7771

grid004f.cnaf.infn.it:7771

%%endRB%%

 

LB addresses must be in the format:

[<protocol>://]<hostname>:<port>         

where if not provided, default for <protocol> is “https” and for <port> is 7846.

RB addresses must instead be in the format:

<hostname>:<port>

i.e. no protocol is admitted. If not provided, default for <port> is 7771.

 

The LB addresses are used by the User Interface to know which LB servers have to be contacted when querying for job information. They are used only when the issued command pertains to “all jobs owned by a user” (e.g. see dg-job-status --all in section 6.1.3): in this case all the listed LBs are queried, whilst when a job identifier (dg_jobId) is specified the LB address is taken directly from the dg_jobId (see section 6.1.3 for details on the job identifier format).

The RB addresses are used by the User Interface to know which Resource Brokers can be accessed for job submission. When the user submits a job, the first RB in the list is considered; if it is not available for some reason, the connection to the second one is tried, and so on until an available RB is found. The same happens when asking for the list of matching CEs for a job (see the dg-job-submit and dg-job-list-match commands in section 6.1.3).

The RB addresses are instead used in a way similar to the LB ones when the user asks for cancellation of all his/her jobs: in this case all the listed RBs are asked to delete the jobs owned by the requesting user (see dg-job-cancel --all in section 6.1.3).

 

4.4.5. Environment variables

Environment variables that have to be set for the User Interface are listed hereafter:

 

-        X509_USER_KEY                       the user private key file path. Default value is

$HOME/.globus/userkey.pem

-        X509_USER_CERT                    the user certificate file path. Default value is

$HOME/.globus/usercert.pem

-        X509_CERT_DIR                        the trusted certificate directory and ca-signing-policy

directory. Default value is /etc/grid-security/certificates

-        X509_USER_PROXY                  the user proxy certificate file path. Default value is

/tmp/x509up_u<UID> where UID is the user identifier on the machine as required by GSI.

Moreover there are:

 

-        EDG_WL_UI_CONFIG_PATH    Non standard location of the UI configuration file

UI_ConfigENV.cfg. This variable points to the file absolute path.

 

-        EDG_WL_LOCATION                 UI install path. It has to be set only if installation has

been made in a non default location. It defaults to /opt/edg

 

-        GLOBUS_LOCATION                 The Globus rpms installation path.

 

The two latter variables are anyway set automatically once the userinterface-profile rpm is installed.

The Logging library, i.e. the library that is linked into the UI for logging the job transfer events, reads its immediate logging destination from the variable DGLOG_DEST. The correct format for this variable is:

DGLOG_DEST=x-dglog://HOST:PORT

where HOST defaults to localhost and PORT defaults to 15830. On the submitting machine, if the variable is not set, it is dynamically assigned by the UI with the value:

DGLOG_DEST=x-dglog://<LB_CONTACT>:15830

where  LB_CONTACT is the hostname of the machine where the LB server currently associated to the RB used for submitting jobs is running.

4.5. Documentation

The userguide documentation package (see section 3.2.2 for more details) provides all the information needed to download, configure, install and use the Datagrid software. Once you have installed the userguide rpm, the following directory tree is created:

 

<install-path>/share
<install-path>/share/doc
<install-path>/share/doc/DataGrid_01_TEN_0118_0_X_Document.pdf

 

 

 

 

 

 

 

 

 

 

 

5. Operating the System

For security purposes all the WMS daemons run with proxy certificates. These certificates are generated by the start-up scripts described in the following sections, before the applications are started. The lifetime of the proxies created by the start-up scripts is 24 hours. In order to provide the daemons with valid proxies for their whole lifetime, the administrator needs to ensure regular generation of new proxies. This can be achieved by adding the following lines to the machine's /etc/crontab:

 

57 2,8,14,20 * * * root service locallogger proxy

57 2,8,14,20 * * * root service lbserver proxy

57 2,8,14,20 * * * root service broker proxy

57 2,8,14,20 * * * root service jobsubmission proxy

 

This makes cron regenerate the proxies four times a day.

5.1. LB local-logger

5.1.1. Starting and stopping daemons

To run the LB local-logger services, it suffices to issue as root the following command:

 

/etc/rc.d/init.d/locallogger start

 

if the locallogger-profile rpm has been installed. Otherwise you can use

 

<install path>/sbin/locallogger start

 

This makes both the dglogd and the interlogger processes start.

The same can be done issuing the following commands:

 

<install path>/sbin/dglogd <options>

<install path>/sbin/interlogger <options>

 

Both daemons recognize a common set of options:

--key=<keyfile>              host certificate private key file (this option overrides the value of the environment variable X509_USER_KEY). Here below is an example of option usage:

--key=/etc/grid-security/hostkey.pem

 

--cert=<certfile>         host certificate file (this option overrides the value of the environment variable X509_USER_CERT). Here below is an example of option usage:

--cert=/etc/grid-security/hostcert.pem

 

--CAdir=<certdir>         trusted certificate and ca-signing-policy directory (this option overrides the value of the environment variable X509_CERT_DIR). Here below is an example of option usage:

--CAdir=/etc/grid-security/certificates

 

--file-prefix=<file path> absolute path of the file where the logged events are stored locally. The default value is /tmp/dglog, which can result in a risk of data loss in case of reboot. Note that the same value must be specified for dglogd and interlogger.

 

--debug                                                       make the process run in foreground to produce diagnostics

 

Using the options explicitly is recommended rather than relying on the corresponding environment variables.
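For example, a manual start of the two daemons with all options set explicitly could look like the sketch below, where the host credential paths are the usual ones shown above and /var/tmp/dglog is an arbitrary non-/tmp event file prefix chosen for illustration (note that the same prefix is passed to both daemons):

<install path>/sbin/dglogd --cert=/etc/grid-security/hostcert.pem --key=/etc/grid-security/hostkey.pem --CAdir=/etc/grid-security/certificates --file-prefix=/var/tmp/dglog

<install path>/sbin/interlogger --cert=/etc/grid-security/hostcert.pem --key=/etc/grid-security/hostkey.pem --CAdir=/etc/grid-security/certificates --file-prefix=/var/tmp/dglog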

The LB local-logger services can be stopped using the locallogger script with the stop option.

5.1.2. Troubleshooting

If the LB local-logger services are started in debug mode (i.e. using the --debug option), the daemons log fatal failures with syslog().


 

5.2. LB Server

5.2.1. Starting and stopping daemons

To run the LB server services, it suffices to issue as root the following command:

 

/etc/rc.d/init.d/lbserver start

 

if the lbserver-profile rpm has been installed. Otherwise you can use

 

<install path>/sbin/lbserver start

 

This makes both the bkserver and the ileventd processes start.

The same can be done issuing the following commands:

 

<install path>/sbin/ileventd <options>

<install path>/sbin/bkserver <options>

 

Both daemons recognize a common set of options:

 

--key=<keyfile>              host certificate private key file (this option overrides the value of the environment variable X509_USER_KEY). Here below is an example of option usage:

--key=/etc/grid-security/hostkey.pem

 

--cert=<certfile>         host certificate file (this option overrides the value of the environment variable X509_USER_CERT). Here below is an example of option usage:

--cert=/etc/grid-security/hostcert.pem

 

--CAdir=<certdir>         trusted certificate and ca-signing-policy directory (this option overrides the value of the environment variable X509_CERT_DIR). Here below is an example of option usage:

--CAdir=/etc/grid-security/certificates

 

--debug                               make the process run in foreground to produce diagnostics

 

Using the options explicitly is recommended rather than relying on the corresponding environment variables.
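Analogously to the local-logger case, an explicit manual start of the LB server daemons could be sketched as follows (credential paths as in the examples above):

<install path>/sbin/ileventd --cert=/etc/grid-security/hostcert.pem --key=/etc/grid-security/hostkey.pem --CAdir=/etc/grid-security/certificates

<install path>/sbin/bkserver --cert=/etc/grid-security/hostcert.pem --key=/etc/grid-security/hostkey.pem --CAdir=/etc/grid-security/certificates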

The LB server services can be stopped using the lbserver script with the stop option.

 

5.2.2. Purging the LB database

The bkpurge process, whose executable is installed in <install path>/sbin, is not a daemon but a utility which should be run periodically (e.g. using a cron job) in order to remove inactive jobs (i.e. those that entered the Cleared status more than a certain amount of time ago) from the LB database. This utility recognizes the following set of options:

 

--log                                                           data being purged from database are dumped on the stdout

--outfile=<file>                                   data being purged from database are dumped in the file named <file>

--mysql=<database>                   name of the database to be purged. It must be the same used by bkserver (this option is not required in the standard set-up)

--timeout=<timeout>[smhd]   removes data for all jobs that entered the “Cleared” status more than <timeout> [seconds/minutes/hours/days] ago.

--debug                                           print diagnostics on the stderr

--nopurge                                                  dry run mode. It doesn't really purge (useful for debugging purposes)

--aborted,  -a                                       also delete from the database the data of jobs that have entered the “Aborted” status

If --log is specified, the data in ULM format are dumped to stdout (or <file>). Normally information is appended to the file. The file is locked with flock (_LOCK_EX) to prevent race conditions, e.g. rotating logs.

An example of usage of this utility could be issuing once a day, via a cron job, a bkpurge command like:

 

bkpurge --log --outfile=/var/log/dglb-data.log --timeout=14d
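A possible /etc/crontab entry implementing this daily purge could be the following sketch (the 03:00 schedule and the output file name are arbitrary choices):

0 3 * * * root <install path>/sbin/bkpurge --log --outfile=/var/log/dglb-data.log --timeout=14d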

 

5.2.3. Troubleshooting

If the LB server services are started in debug mode (that is, using the --debug option), the daemons log fatal failures with syslog().


 

5.3. RB and JSS

5.3.1. Starting PostgreSQL

Both the RB and the JSS use the services offered by the PostgreSQL database, which must be started before either of these daemons, using its own start-up script:

 

/etc/rc.d/init.d/postgresql start

 

or using RedHat service command:

 

service postgresql start

 

Stopping is achieved by the same commands with the stop parameter:

 

/etc/rc.d/init.d/postgresql stop

or

service postgresql stop

 

 

5.3.2. Starting and stopping JSS and RB daemons

The packages *-profile.X.Y.Z.rpm provide the SysV RedHat-like scripts that allow starting these daemons. In particular, start-up of the RB or the JSS can be achieved by issuing directly:

 

/etc/rc.d/init.d/broker start

/etc/rc.d/init.d/jobsubmission start

 

or, indirectly, using RedHat dedicated commands:

 

service broker start

service jobsubmission start

 

In the same way stopping is achieved by:

 

/etc/rc.d/init.d/broker stop

/etc/rc.d/init.d/jobsubmission stop

 

or

 

service broker stop

service jobsubmission stop

 

The start-up script for the JSS also starts and stops the underlying CondorG service. If the configuration steps described in section 4.2 have been followed, these scripts start the daemons with the correct selected users (see also Table 2 in section 7.7). However, do not forget to put the right files (hostkey.pem and hostcert.pem) in the locations pointed to respectively by the variables X509_HOST_KEY and X509_HOST_CERT (these must be located in the subdirectory hostcert of the home directory of the dguser account).

Startup scripts can also be used to know the current status of the daemons using the status option:

 

service broker status

service jobsubmission status

 

Moreover, it is strongly recommended to configure the machine in such a way that all these services (PostgreSQL, RB and JSS) are started at system boot. To do this, refer to the RedHat chkconfig SysV script manager command.
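For instance, on a RedHat machine the services could be enabled at boot time with commands like the following sketch (assuming the corresponding scripts installed in /etc/rc.d/init.d are known to chkconfig):

chkconfig postgresql on
chkconfig broker on
chkconfig jobsubmission on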

 

5.3.3. RB and JSS databases clean-up

Hereafter are reported the instructions for cleaning up the PostgreSQL databases used by the RB and the JSS to store persistent information about the handled jobs. They can be useful when a restart in a clean context is needed, or in case the content of the databases has been corrupted following a serious failure of some component.

 

Resource Broker

psql -U postgres <RB_db_name>

delete from job;

"\q" (to quit)

 

RB_db_name is the name of the database used by the Resource Broker (usually set to rb)

 

Job Submission Service

psql -U postgres template1

delete from condor_submit;

"\q" (to quit)

 

template1 is the default name of the database used by the JSS. It is configurable through the Database_name parameter of the jss.conf file.
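The same clean-up can also be performed non-interactively, for instance with one-line commands like the following sketch (it assumes the standard psql client and the database names mentioned above):

psql -U postgres -c "delete from job;" <RB_db_name>
psql -U postgres -c "delete from condor_submit;" template1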

 

5.3.4. RB troubleshooting

The RB provides a log file recording its various events. This file can be used to debug abnormal behaviours of the service. The RB log file name and other properties can be changed by directly modifying the rb.conf configuration file: you can change the name of the file, the debug level and the maximum file size in bytes as well.

 

5.3.5. JSS troubleshooting

The script responsible for starting the JSS also includes the definition of the JSS log files. There are two of them and their pathnames are set respectively to /var/tmp/JSSserver.log and /var/tmp/JSSparser.log. As before, modifying these locations implies a modification of the /etc/rc.d/init.d/jobsubmission script in the following two lines:

 

SERVERLOG=/var/tmp/JSSserver.log

PARSERLOG=/var/tmp/JSSparser.log

 

5.4. Information Index

5.4.1. Starting and stopping daemons

To start/stop the II, the following command has to be used as root:

 

/etc/rc.d/init.d/information_index {start | stop}

 

6. User Guide

The software module of the WMS allowing the user to access the main services made available by the components of the scheduling sub-layer is the User Interface, which hence represents the entry point to the whole system.

Sections 6.1.1 and 6.1.2 provide a general description of the UI, dealing with the security management, common behaviours, environment variables to be set etc. Section 6.1.3 describes the Job Submission User Interface commands in a Unix man-page style.

6.1. User interface

The Job Submission UI is the module of the WMS allowing the user to access main services made available by the components of the scheduling sub-layer. The user interaction with the system is assured by means of a JDL and a command-driven user interface providing commands to perform a certain set of basic operations. Main operations made possible by the UI are:

-          Submit a job for execution on a remote Computing Element, also encompassing:

§         automatic resource discovery and selection

§         staging of the application sandbox (input sandbox)

-          Find the list of resources suitable to run a specific job

-          Cancel one or more submitted jobs

-          Retrieve the output files of a completed job (output sandbox)

-          Retrieve and display bookkeeping information about submitted jobs

-          Retrieve and display logging information about submitted jobs.

The User Interface depends on two other Workload Management System components:

-          the Resource Broker that provides support for the job control functionality

-          the Logging and Bookkeeping Service that provides support for the job monitoring functionality.

6.1.1. Security

For the DataGrid to be an effective framework for largely distributed computation, users, user processes and grid services must work in a secure environment.

Due to this, all interactions between WMS components, especially those that are network-separated, will be mutually authenticated: depending on the specific interaction, an entity authenticates itself to the other peer using either its own credential or a delegated user credential or both. For example when the User Interface passes a job to the Resource Broker, the UI authenticates using a delegated user credential (a proxy certificate) whereas the RB uses its own service credential. The same happens when the UI interacts with the Logging and Bookkeeping service. The UI uses a delegated user credential to limit the risk of compromising the original credential in the hands of the user.

The user or service identity and their public key are included in an X.509 certificate signed by a DataGrid trusted Certification Authority (CA), whose purpose is to guarantee the association between that public key and its owner.

Given the above, to take advantage of the UI commands the user has to possess a valid X.509 certificate on the submitting machine, consisting of two files: the certificate file and the private key file. The location of the two mentioned files is assumed to be pointed to respectively by “$X509_USER_CERT” and “$X509_USER_KEY”, or to be “$HOME/.globus/usercert.pem” and “$HOME/.globus/userkey.pem” if these X509 environment variables are not set. The user certificate and private key files are needed for the creation of the delegated user credentials. Indeed, as explained hereafter, what is really needed is the user proxy certificate.

All UI commands, when started, check for the existence and expiration date of a user proxy certificate in the location pointed to by “$X509_USER_PROXY” or in “/tmp/x509up_u<UID>” (<UID> is the user identifier in the submitting machine OS) if the X509 environment variable is not set. If the proxy certificate does not exist or has expired a new one with default duration of 24 hours is automatically created by the UI using the GSI services (grid-proxy-init and grid-proxy-info). The user proxy certificate is created either as “$X509_USER_PROXY” or as “/tmp/x509up_u<UID>”.

Once a job has been submitted by the UI, it passes through several components of the WMS (e.g. the RB, the JSS, etc.) before it completes its execution. At each step, operations related to the job could require authentication by a certificate. For example, during the scheduling phase the RB needs to get some information about the user who wants to schedule a job, and the certificate of the user could be needed to access this information. Similarly, a valid user certificate is needed by the JSS to submit a job to the CE. Moreover, the JSS has to be able to repeat this process, e.g. in case the CE on which the job is running crashes; therefore a valid user certificate is needed for the whole job lifetime.

A job gets a valid proxy certificate when it is submitted by the UI to the RB. The validity of such a certificate is usually set to 12 hours, hence problems could occur if the job spends more time on the CE (in a queue or running) than the lifetime of its proxy certificate.

The UI dg-job-submit command (see the description later in this document) supplies an option (--hours H) allowing the specification of the duration in hours of the proxy certificate that is created on behalf of the user. Therefore, while the certificate file search paths remain as described above, the proxy checking mechanism for this command differs slightly from that of the other commands, i.e.:

-        If the “--hours H” option has not been specified, the proxy certificate check is done as explained before

-        If the “--hours H” option has been specified, then a new proxy certificate having a duration of H hours is created both when no existing proxy is found and when the existing proxy lifetime is less than H. In the latter case the existing proxy certificate is destroyed before creating the new one.

This allows the user to submit jobs running longer than the default proxy duration (12 hours).
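For instance, to submit a job with a proxy lasting two days one could issue something like the line below; this is only a sketch, where the JDL file is assumed to be passed as the last argument (see section 6.1.3 for the exact dg-job-submit syntax):

dg-job-submit --hours 48 <job.jdl>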

Another, more secure, way of achieving this is to exploit the features of the MyProxy package. The underlying idea is that the user registers in a MyProxy server a valid long-term proxy credential that will be used by the JSS to perform periodic credential renewal for the submitted job; in this way the user is no longer obliged to create very long lifetime proxies when submitting long-lasting jobs. A more detailed description of this mechanism is provided in the following paragraph.

6.1.1.1.  MyProxy

The MyProxy credential repository system consists of a server and a set of client tools that can be used to delegate and retrieve credentials to and from a server. Normally, a user would start by using the myproxy_init client program along with the permanent credentials necessary to contact the server and delegate a set of proxy credentials to the server along with authentication information and retrieval restrictions.

The MyProxy Toolkit is available at the following URL:

http://lindir.ics.muni.cz/dg_public/myproxy-0.4.4-edg.tar.gz

In order to compile the package you'll have to follow the common Unix/Linux configure/make commands:

 

./configure --with-gsi=/opt/globus --with-globus-flavor=gcc32dbg \

--disable-anonymous-auth --prefix=/opt/myproxy

 

Type ./configure --help for all the detailed options (such as binaries, server configuration paths, etc.).

Once you have successfully launched the configure script you can compile the source and install the package launching 'make' and 'make install'.

Before using the MyProxy tools, you have to restrict which users are allowed to store credentials within the myproxy server and, more importantly, which clients are allowed to retrieve credentials from it. To do that, just follow the instructions reported hereafter (MyProxy Server).

 

MyProxy Server

myproxy-server is a daemon that runs on a trusted, secure host and manages a database of proxy credentials for use from remote sites. Proxies have a lifetime that is controlled by the myproxy-init program. When a proxy is requested from the myproxy-server via the myproxy-get-delegation command, further delegation ensures that the lifetime of the new proxy is less than that of the original one, to enforce greater security.

A configuration file is responsible for maintaining a list of trusted portals and users that can access this service. To configure a proxy server, you need to execute the following steps:

 

cd /opt/edg/etc

cp edg-myproxy.conf edg-myproxy.conf.orig

cp myproxy.conf edg-myproxy.conf

 

Edit this file and substitute the present lines with similar lines containing the subject name of the local Resource Broker, then start the service:

 

/etc/rc.d/init.d/myproxy start       (this creates the file /etc/myproxy-server.config)

chkconfig --level 2345 myproxy on

 

The myproxy.conf file looks as follows:

 

# #################################################

# Add to this file all of the subject names of resources

# who may renew credentials, i.e. the issuer names of

# recognized resource brokers.

#

# Add lines like the following one (without the #)

#/O=Grid/O=CERN/OU=cern.ch/CN=host/testbed013.cern.ch

###################################################

/O=Grid/O=CERN/OU=cern.ch/CN=host/testbed013.cern.ch

/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0383.cern.ch

/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0382.cern.ch

/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch

/C=IT/O=INFN/OU=Resource Broker/L=CNAF/CN=grid012f.cnaf.infn.it/Email=elisabetta.ronchieri@cnaf.infn.it

 

i.e. it contains the subject names of all resources that are allowed to renew credentials (the recognized Resource Brokers).

In order to launch the daemon you have to run the binary '<prefix>/sbin/myproxy-server'. The program starts up and puts itself in the background. It accepts connections on TCP port 7512, forking off a separate child to handle each incoming connection, and logs information via the syslog service.
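
As a minimal launch sketch, assuming the --prefix=/opt/myproxy value used in the configure example above and assuming that on your system syslog writes daemon messages to /var/log/messages:

$> /opt/myproxy/sbin/myproxy-server
$> tail /var/log/messages      # check the start-up messages logged by myproxy-server via syslog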

 

MyProxy Client

The set of binaries provided for the client is made of the following files:

myproxy-init

myproxy-info

myproxy-destroy

myproxy-get-delegation

 

The myproxy-init command allows you to create and send a delegated proxy to a MyProxy server for later retrieval. In order to launch it you have to make sure you are able to execute the grid-proxy-init GLOBUS command (i.e. the binary is visible from your $PATH environment and the required certificate files are either stored in the common path or specified with the X509 variables). You can use the command as follows (you will be asked for your PEM passphrase):

 

myproxy-init -s <host name> -t <hours> -d -n

 

The myproxy-init command stores a user proxy in the repository specified by <host name> (the -s option). The default lifetime of proxies retrieved from the repository is set to <hours> (the -t option) and no password authorisation is permitted when fetching the proxy from the repository (the -n option). The proxy is stored under the same username as the subject of your certificate (the -d option).
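
A concrete invocation sketch, assuming the server host skurut.cesnet.cz (the same host name used in the MyProxyServer JDL example later in this document) and a purely illustrative lifetime of 168 hours:

$> myproxy-init -s skurut.cesnet.cz -t 168 -d -n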

The myproxy-info command returns the remaining lifetime of the proxy in the repository along with the subject name of the proxy owner (in our case it is the same as in your proxy certificate). So if you want to get information about the stored proxies you can issue:

 

myproxy-info -s <host name> -d

 

where the -s and -d options have already been explained for the myproxy-init command.

The myproxy-destroy command simply destroys any existing proxy stored in the myproxy server. You can use it as follows:

 

myproxy-destroy  -s <host name> -d

 

where the -s and -d options have already been explained for the myproxy-init command.

The myproxy-get-delegation command is used to retrieve a delegated proxy from the MyProxy server. You can use it as follows:

 

myproxy-get-delegation -s <host name> -d -t <hours> \

-o <output file> -a <user proxy>

 

You should end up with a retrieved proxy in <output file>, which is valid for <hours> hours.

It is worth noting that the environment variable MYPROXY_SERVER can be set to tell all these programs the hostname where the MyProxy server is running.
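
For example (bash syntax; the host name is the same illustrative one used above), setting the variable once should allow the -s option to be omitted in the subsequent client invocations:

$> export MYPROXY_SERVER=skurut.cesnet.cz
$> myproxy-info -d
$> myproxy-destroy -d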

  

6.1.2.  Common behaviours

A User Interface installation mainly consists of three directories, bin, lib and etc, created under the UI installation path, which is usually pointed to by the EDG_WL_LOCATION environment variable. If this variable is not set or its value is not correct, the default value “/opt/edg” is assumed.

bin contains the command executables and hence it is recommended to add it to the user PATH environment variable, to allow her/him to use the UI commands from any location. lib contains the shared libraries (wrappers of the RB/LB APIs) implementing the functionality for accessing the RB and LB services, whereas etc is the UI configuration area.

The UI configuration area etc contains the job description template file job_template.jdl, the file containing the mapping between error codes and error messages, UI_Errors.cfg, and the actual configuration file UI_ConfigEnv.cfg. The latter is the only one that might need to be edited and tailored according to the user/platform characteristics and needs. It contains the following information, which is read by the commands and influences their behaviour (see section 4.4.4 for details):

-          address and port of accessible RBs ordered by priority,

-          address and port of accessible LBs ordered by priority,

-          default location of the local storage areas for the Input/Output sandbox files,

-          default values for the JDL mandatory attributes,

-          default number of retrials on fatal errors when connecting to the LB.

When started, the UI commands first check if EDG_WL_LOCATION is set and then search for the etc directory containing their configuration files in the following locations, in order of precedence: “$EDG_WL_LOCATION”, “/”, /usr/local and /opt/edg. If none of these locations contains the needed files, an error is returned to the user.

Since several users on the same machine can use a single installation of the UI, people concurrently issuing UI commands share the same configuration files. However, for users (or groups of users) with particular needs it is possible to “customise” the UI configuration through the --config option supported by each UI command.

Indeed, every command launched specifying “--config file_path” reads its configuration settings from the file “file_path” instead of the default configuration file. Hence the user only needs to create such a file according to her/his needs and to use the --config option to work under “private” settings.

Moreover, if the user wants to make this change permanent, avoiding the use of the --config option for each issued command, she/he can set the environment variable EDG_WL_UI_CONFIG_PATH to point to the non-standard path of the configuration file. Indeed, if that variable is set, commands read their settings from the file “$EDG_WL_UI_CONFIG_PATH”. In any case the --config option takes precedence over all other settings.
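
As an illustration (bash syntax; the $HOME/ui_conf path is purely hypothetical), both of the following achieve the same effect, with the --config option taking precedence if both are used:

$> dg-job-submit --config $HOME/ui_conf/UI_ConfigEnv.cfg myjob1.jdl

$> export EDG_WL_UI_CONFIG_PATH=$HOME/ui_conf/UI_ConfigEnv.cfg
$> dg-job-submit myjob1.jdl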

It is important to note that, since the job identifier dg_jobId (see section 6.1.3 – dg-job-submit) implicitly holds the information about the RB and the LB that are managing the corresponding job, all the commands taking the dg_jobId as input parameter do not take into account the RB and LB addresses listed in the configuration file when performing the requested operation, even if the --config option has been specified.

Hereafter are listed the options that are common to all UI commands (with the exception of dg-job-id-info that is a local utility):

-          --config  file_path

-          --noint

-          --debug

-          --logfile file_path

-          --version

-          --help

The --config option, as explained above, makes the command read its configuration settings from the file given as option argument instead of the default configuration file.

The --noint option skips all interactive questions to the user and goes ahead with the command execution. All warning messages and errors (if any) are written to the file <command_name>_<UID>_<PID>.log in the “/tmp” directory instead of the standard output. It is important to note that when --noint is specified some checks on “dangerous actions” are skipped. For example, if job cancellation is requested with this option, the action is performed without asking the user for any confirmation. The same applies if the command output would overwrite an existing file, so it is recommended to use the --noint option only in a safe context.

The --debug option is mainly intended for testing and debugging purposes; indeed it makes the commands print additional information while running. Every time an external API function call is encountered during the command execution, the values of the parameters passed to the API are printed to the user. The info messages are displayed on the standard output and are also written, together with possible errors, to the <command_name>_<UID>_<PID>.log file in the /tmp directory. An example of the debug message format is as follows:

#### Debug API #### - The function 'dgLBJobStatus' has been called with the following parameter(s):

>>Struct 'dgLBContext':

   -> 0

   -> 0

>>Struct 'dgJobId':

   -> lx01.hep.ph.ic.ac.uk/124445102160554

   -> grid004f.cnaf.infn.it

   -> 7846

   -> grid013g.cnaf.infn.it:7771

>> 0

If the --noint option is specified together with the --debug option, the debug messages are not printed on the standard output.

The --logfile <file_path> option allows relocating the command log files to the location pointed to by file_path.

The --version and --help options respectively make the commands display the UI current version and the command usage.

Two further options that are common to almost all commands are --input and --output. The latter one makes the commands redirect the outcome to the file specified as option argument whilst the former reads a list of input items from the file given as option argument. The only exception is the dg-job-list-match command that does not have the --input option.

For all commands, the file given as argument to the --input option shall contain a list of job identifiers in the following format: one dg_jobId per line, with comments beginning with a “#” or a “*” character. If the input file contains only one dg_jobId (see the description of the dg-job-submit command later in this document for details about the dg_jobId format), then the request is directly processed taking that dg_jobId as input; otherwise a menu is displayed to the user listing all the contained items, i.e. something like:

------------------------------------------------------------------------------------------------------------------------------------------

1 : https://grid013g.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/133711137156527?grid013g.cnaf.infn.it:7781

2 : https://grid013g.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/133747137833158?grid013g.cnaf.infn.it:7781

3 : https://grid004f.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/133957138124219?grid004f.cnaf.infn.it:7771

4 : https://grid013g.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/134030138239274?grid013g.cnaf.infn.it:7771

5 : https://grid001f.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/140706140477638?grid013g.cnaf.infn.it:7771

a : all

q : quit

-------------------------------------------------------------------------------------------------------------------------------------------

Choose one or more dg_jobId(s) in the list - [1-10]all:

 

The user can choose one or more jobs from the list entering the corresponding numbers. E.g.:

-        2         makes the command take the second listed dg_jobId as input

-        1,4      makes the command take the first and the fourth listed dg_jobIds as input

-        2-5     makes the command take listed dg_jobIds from 2 to 5 (ends included) as input

-        all       makes the command take all listed dg_jobIds as input

-        q         makes the command quit

The default value for the choice is all. If the --input option is used together with --noint, then all the dg_jobIds contained in the input file are taken into account by the command.
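
A sketch of a possible input file (here called jobs.in, a hypothetical name; the dg_jobIds are taken from the listing above and the comment line is illustrative) and of its non-interactive use with dg-job-status is:

# jobs submitted from lx01.hep.ph.ic.ac.uk
https://grid013g.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/133711137156527?grid013g.cnaf.infn.it:7781
https://grid004f.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/133957138124219?grid004f.cnaf.infn.it:7771

$> dg-job-status --input jobs.in --noint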

The only command whose --input behaviour differs from the one just described is dg-job-submit. First of all, the input file in this case contains CEIds instead of dg_jobIds; moreover, since only one CE at a time can be the target of a submission, the user is allowed to choose one and only one CEId. The default value for the choice is “1”, i.e. the first CEId in the list. This is also the choice automatically made by the command when the --input option is used together with the --noint one.


 

6.1.3. Commands description

In this section we describe syntax and behavior of the commands made available by the UI to allow job submission, monitoring and control.

In the command synopses the mandatory arguments are shown between angle brackets (<arg>) whilst the optional ones are shown between square brackets ([arg]).

 

       dg-job-submit

Allows the user to submit a job for execution on remote resources in a grid.

 

 

SYNOPSIS

dg-job-submit  [options]  <jdl file>

Options:

   --help

   --version

   --template

 

   --input, -i     <input_file>

   --resource, -r  <ce_id>

   --notify, -n    <e-mail_address(es)>

   --hours, -h     <hours_number>

   --nomsg

   --config, -c    <config_file>

   --output, -o    <output_file>

   --noint

   --debug

       --logfile <log_file>

 

DESCRIPTION

dg-job-submit is the command for submitting jobs to the DataGrid and hence allows the user to run a job at one or several remote resources. dg-job-submit requires as input a job description file in which job characteristics and requirements are expressed by means of Condor class-ad-like expressions. While the order of the other arguments does not matter, the job description file has to be the last argument of this command.

The job description file given in input to this command is syntactically checked and default values are assigned to some of the mandatory attributes that have not been provided, in order to create a meaningful class-ad. The resulting job-ad is sent to the Resource Broker, which finds the resource best matching the job (match-making) and submits the job to it. The match-making algorithm is described in detail in Annex 7.6.

Upon successful completion this command returns to the user the submitted job identifier dg_jobId (a string that unambiguously identifies the job in the whole DataGrid), generated by the User Interface, which can later be used as a handle to perform monitoring and control operations on the job (e.g. see dg-job-status described later in this document). The format of the dg_jobId is as follows:

<LBname>/<UIaddress>/<time><PID><RND>?<RBname>

where:

-          LBname is the LB server name and port

-          UIaddress is the UI machine IP address (or FQDN)

-          time is the current UTC time on the submitting machine in hhmmss format

-          PID is the command process identifier

-          RND is a random number generated at each job submission

-          RBname is the RB server hostname and port

The structure of the dg_jobId, which could appear somewhat complex and not easily readable, has been conceived in order to assure uniqueness and at the same time contain the information needed by the components of the WMS to fulfil user requests.
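
For instance, the dg_jobId returned in the dg-job-submit example later in this document,

https://grid004f.cnaf.infn.it:7846/155.198.211.205/161251122764136?grid004f.cnaf.infn.it:7771

decomposes (apart from the leading https:// prefix) as follows:

-          LBname:          grid004f.cnaf.infn.it:7846

-          UIaddress:       155.198.211.205

-          time, PID, RND:  161251122764136

-          RBname:          grid004f.cnaf.infn.it:7771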

The --resource option can be used to target the job submission to a specific known resource, identified by the provided Computing Element identifier ce_id (returned by dg-job-list-match, described later in this document). A resource is either a queue of an underlying LRMS, assuming that this queue represents a set of “homogeneous” resources, or a “single” node. The CE identifier is a string, assigned by WP4 and published in the GIS (the CEId field), that univocally identifies a resource belonging to the Grid. The CEId is obtained by “combining” the GlobusResourceContactString and QueueName attributes, e.g. if lxde01.pd.infn.it:2119 is the Globus resource contact string and grid01 is the queue name then it looks like lxde01.pd.infn.it:2119/jobmanager-lsf-grid01. In other words the admitted format for the CEId is:

<full-hostname>:<port-number>/jobmanager-<service>-<queue-name>

where <service> can be lsf, pbs or bqs.

When the --resource option is specified, the Resource Broker completely skips the match-making process and directly submits the job to the requested CE. It is important to note that in this case the RB does not generate the “.BrokerInfo” file, even if data requirements have been specified in the JDL, so jobs submitted using this option should not rely on the .BrokerInfo file information when running on the CE. The “.BrokerInfo” file is a file generated by the RB during the match-making and contains information about the locations where the input data specified in the JDL are physically stored, the SEs that are “close” to the CE chosen for submitting the job, etc. It is shipped within the InputSandbox to the CE where the job is going to run, so that it can be used at run-time to get information for accessing data. Details about the “.BrokerInfo” file can be found in [R1].

A way of performing direct submission to a given CE while still having the “.BrokerInfo” file generated by the RB and shipped to the CE is to not use the --resource option and instead specify the following requirement in the JDL:

Requirements = other.CEId == <Ce_identifier>;

(e.g.  Requirements = other.CEId == “lxde01.pd.infn.it:2119/jobmanager-lsf-grid01”;)

 

It is also possible to specify the target CE to which the job is submitted using the --input option. With the --input option an input_file must be supplied containing a list of target CE ids. In this case the dg-job-submit command parses the input_file and displays on the standard output the list of CE Ids written in it. The user is then asked to choose one CEId among the listed ones. The command then behaves exactly as already explained for the --resource option. The basic idea is to use as input_file the output file generated by the dg-job-list-match command when used with the --output option (see dg-job-list-match), which contains the list of CE Ids (if any) matching the requirements specified in the jobad.jdl file. An example of a possible sequence of commands is:

>$ dg-job-list-match --output CEList.out jobad.jdl

>$ dg-job-submit --input CEList.out jobad.jdl

If CEList.out contains more than one CEId then the user is prompted to choose one Id from the list.

When dg-job-submit is used with the --notify option, the following schema is used to notify the user about job status changes:

-          an e-mail notification is sent to the specified e_mail_address when the match-making process has finished and the job is ready to be submitted to JSS (READY status)

-          an e-mail notification is sent to the specified e_mail_address when the job starts running on the CE (RUNNING status)

-          an e-mail notification is sent to the specified e_mail_address when the job has finished (ABORTED or DONE status).

The notification message will contain basic information about the job such as the job identifier, the Id of the assigned CE and a brief description of its status.

Notification to multiple contacts can be requested by specifying the corresponding e-mail addresses separated by commas and without blanks.

It is possible to redirect the returned dg_jobId to an output file using the --output option. If the file already exists, a check is performed: if the file was previously created by the dg-job-submit command (i.e. it contains a well defined header), the returned dg_jobId is appended to the existing file every time the command is launched. If the file was not created by the dg-job-submit command, the user is prompted to choose whether to overwrite the file or not. If the answer is no, the command aborts.

The dg-job-submit command has a particular behaviour when the job description file contains the InputSandbox attribute whose value is a list of file paths on the UI machine local disk. The purpose of the introduction of the InputSandbox attribute is to stage, from the UI to the CE, files that are not available in any SE and are not published in any Replica Catalogue.

To better understand this, let us suppose we have a job that needs for its execution a certain set of small files available on the submitting machine. Let us also suppose that for performance reasons it is preferable not to go through the WP2 data transfer services for the staging of these files on the executing node. In that case the user can use the InputSandbox attribute to specify the files that have to be staged from the submitting machine to the executing CE. All of them are transferred at job submission time, together with the job class-ad, to the RB, which stores them temporarily on its local disk. The JSS then performs the staging of these files on the executing node. The size of the files to be transferred to the RB should be small, since filling up the RB local storage means that no more jobs of this type can be submitted.

This mechanism can also be used to stage a job executable available locally on the UI machine to the executing CE. In this case the user has to include this file in the InputSandbox list (specifying its absolute path in the file system of the UI machine) and only has to specify the file name as the Executable attribute value. On the contrary, if the executable is already available in the file system of the executing machine, the user has to specify as Executable an absolute path name for this file (if necessary using environment variables). The same applies to the standard input file, which is specified through the StdInput JDL attribute.

Since the InputSandbox expression can consist of a great number of file names, the use of wildcards and environment variables is admitted when specifying the value of this attribute. The syntax and the allowed wildcards are described in Annex 7.5.

It is important to note that, since globus-url-copy (the Globus command used for staging the InputSandbox files) in general does not preserve the x flag, the script specified as Executable in the JDL (on which chmod +x is performed automatically by the WP1 JobWrapper) should perform a chmod +x on all the files transferred within the InputSandbox of the job that need execution permission, as shown in the sketch below.
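
A minimal sketch of such an executable script, assuming a hypothetical helper file helper.sh shipped in the same InputSandbox, is:

#!/bin/sh
# Executable script staged via the InputSandbox: the JobWrapper makes this file
# executable, but the x flag of the other staged files is in general not preserved.
chmod +x ./helper.sh
./helper.sh "$@"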

For the standard output and error of the job the user shall instead always specify just file names (without any directory path) through the StdOutput and StdError JDL attributes. To have them staged back to the UI machine it suffices to list them in the OutputSandbox and to use, after job completion, the dg-job-get-output command described later in this document.
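
A sketch of a job description fragment putting the above together (the file names and local paths are purely illustrative and mirror those used in the dg-job-list-match example later in this document) is:

Executable    = "sum.exe";
StdInput      = "data.in";
StdOutput     = "data.out";
StdError      = "sum.err";
InputSandbox  = {"/home_firefox/fpacini/exe/sum.exe", "/home1/data.in"};
OutputSandbox = {"data.out", "sum.err"};

Had the executable already been installed on the CE, the Executable attribute would instead contain an absolute path on the executing machine and no corresponding entry would appear in the InputSandbox.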

The list of data specification JDL attributes is completed by the InputData attribute, which refers to data used as input by the job that are not subject to staging, being stored in one or more storage elements and published in replica catalogues. For this reason, when the user specifies the InputData attribute she/he also has to provide the name of the replica catalogue (ReplicaCatalog attribute) where these data are published and the protocol her/his application is able to “speak” for accessing the data (DataAccessProtocol attribute). The InputData attribute should normally contain a list of logical and/or physical file names. If InputData only contains PFNs then the ReplicaCatalog attribute specification is no longer mandatory.

The ReplicaCatalog address must be provided in the following format

ldap://<host>:<port>/<Replica Catalogue DN>

where the Replica Catalogue DN also comprises the mandatory logical collection field lc.

I.e. it is something like:

lc=<Logical collection>, rc=<replica catalogue>, dc=....

Hereafter is reported an example of a Replica Catalogue address:

ldap://sunlab2f.cnaf.infn.it:2010/lc=test0, rc=WP2 INFN Test Replica Catalog, dc=sunlab2g, dc=cnaf, dc=infn, dc=it

The Arguments attribute in the JDL allows the user to specify all the command line arguments needed to start the job. They have to be specified as a single string, e.g. the job sum that is started with:

$ sum  N1 N2 -out result.out

is described by:

Executable = “sum”;

Arguments = “N1 N2 -out result.out”;

If you want to specify a quoted string inside the Arguments then you have to escape quotes with the  \ character. E.g. when describing a job like:

$ grep -i “my name” *.txt

you will have to specify:

Executable = “/bin/grep”;

Arguments = “-i \”my name\” *.txt”;

Analogously, if the job takes as argument a string containing a special character (e.g. the job is the tail command issued on a file whose name contains the ampersand character, say file1&file2), since on the shell line you would have to write:

$ tail -f file1\&file2

in the JDL you’ll have to write:

Executable = “/usr/bin/tail”;

Arguments = “-f file1\\\&file2”;

i.e. three \ characters in the JDL for each special character.

In general, special characters such as &, |, >, < are only allowed if specified inside a quoted string or preceded by triple \.

The character “`” cannot be specified in the Arguments attribute of the JDL.

The RetryCount attribute allows setting the number of submission retries for a job upon failure due to some grid component (i.e. not due to the job itself). RetryCount has to be a positive number and the actual number of submission retries for a job is given by the minimum between RetryCount itself and the value of the RB_submission_retries parameter in the RB configuration file (see 4.2.4.1). The resubmission is tried on all the CEs satisfying the job requirements.
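
For example, to ask for at most three resubmissions one would add to the job description (the actual number of retries is in any case capped by the RB_submission_retries parameter):

RetryCount = 3;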

The --hours option allows the user to specify the user proxy duration H, in hours, needed for submitting the job. This option should be used for long-lasting jobs; indeed a job, when submitted, needs to be accompanied by a valid proxy certificate during its whole life-time, and the default duration of the user proxy created by the UI commands is 12 hours, which could in some cases not be enough.

It is recalled that in any case a safer way of submitting long-running jobs is to use the myproxy-init command (see section 6.1.1.1) before dg-job-submit. The myproxy-init command indeed registers with a MyProxy server a valid long-term proxy certificate that will be used by the JSS to perform periodic credential renewal for the submitted job.

When using the myproxy-init command, the hostname of the MyProxy server where the proxy certificate is stored has to be specified. If the server host name used is different from the default one used for the credential renewal, reported in the RB configuration file (rb.conf), it has to be specified within the JDL job description through the MyProxyServer attribute. An example is provided hereafter:

MyProxyServer = “skurut.cesnet.cz”;

Note that the port number must not be provided.

Lastly, the --nomsg option makes the command display neither messages nor errors on the standard output. Only the dg_jobId assigned to the job is printed to the user if the command was successful; otherwise the location of the generated log file containing the error messages is printed on the standard output. This option has been provided to make it easier to use the dg-job-submit command inside scripts, as an alternative to the --output option.

It is important to note that dg-job-submit is a sort of fire-and-forget command, i.e. it exits successfully once the JDL has been passed to the RB and does not care about what happens to the job afterwards. Understanding the reason for a job abort can however be accomplished by using dg-job-status (especially looking at the “Status Reason” field) and dg-job-get-logging-info on the job identifier returned by the submission.
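
A typical follow-up sequence after a submission, using the returned identifier (shown here as the placeholder <dg_jobId>), is therefore:

$> dg-job-status <dg_jobId>
$> dg-job-get-logging-info <dg_jobId>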

 

Job Description File

A job description file contains a description of job characteristics and constraints in a class-ad style. Details on the class-ad language are reported in the document [A1] also available at the following URL:

http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2.pdf.

The job description file must be edited by the user to insert relevant information about the job that is later needed by the RB to perform the match-making. A template of the job description file, containing a basic set of attributes can be obtained by calling the dg-job-submit command with the --template option. Job description file entries are strings having the format attribute = expression and are terminated by the semicolon character. If the entry spans more than one line, the end of line has to be indicated with a backslash (\) character. Comments must be preceded by a sharp character (#) at the beginning of each line.

Since the class-ad is an extensible language, there does not exist a fixed set of admitted attributes, i.e. the user can insert in the job description file whatever attribute she/he believes meaningful to describe her/his jobs; anyway only the attributes that can be in some way related to the resource ones published in the GIS are taken into account by the Resource Broker for the match-making process. Unrelated attributes are simply ignored, except when they are used to build the Requirements expression. In the latter case they are indeed evaluated and could affect the match-making result. The attributes taken into account by the RB, together with their meaning, are reported in document [A7].

There is a small subset of class-ad attributes that are compulsory, i.e. that have to be present in a job class-ad before it is sent to the Resource Broker, in order to make the match-making process possible.

They can be grouped in two categories: some of them must be provided by the user, whilst some others, if not provided, are filled in by the UI with configurable default values. Table 1 summarises this.

 

Attribute            Mandatory                                                    Mandatory with default value (default value)

Executable           yes

Requirements                                                                      yes (TRUE)

Rank                                                                              yes (-other.EstimatedTraversalTime)

InputData            yes (only if the ReplicaCatalog and/or the
                     DataAccessProtocol attributes have been specified)

ReplicaCatalog       yes (only if the InputData attribute has been specified)

DataAccessProtocol   yes (only if the InputData attribute has been specified)

 

Table 1 Mandatory Attributes

In  Table 1 the default values for Requirements and Rank can be interpreted respectively as follows:

-          if the user has not provided job constraints then Requirements is set to TRUE, i.e. the characteristics of the computing element where the job has to be executed do not matter: the RB will take into account all sites where the user is authorised to run her/his application.

-          since in the JDL the greater the value of Rank, the better the match is considered, if no expression for Rank has been provided then the resources where jobs wait a shorter time to pass from the SCHEDULED to the RUNNING status are preferred.

The default values for the Requirements and Rank attributes can be set in the UI_ConfigEnv.cfg file.  See section 4.4.4 for details on how to use these defaults.

As the classad language (and hence the JDL) is an extensible language, it allows the user to freely include new attributes within the job description. These attributes are ignored by the RB/JSS for the scheduling but are passed through by the UI (if their syntax is correct), since they could be relevant for the submitter or for some other component processing the JDL.

However, if the job description file contains attributes that are unknown to the RB/JSS, the UI prints a warning (when used with the --debug option) listing all of them.

 

OPTIONS

--help

            displays command usage.

 

--version

            displays UI version.

 

--resource ce_id

-r ce_id

if the command is launched with this option, the job-ad sent to the RB contains a line of the type SubmitTo = ce_id  and the job is submitted by the Resource Broker to the resource identified by ce_id without going through the match-making process. Accepted format for the CEId is:

<full hostname>:<port number>/jobmanager-<service>-<queue name>

where the currently valid values for the <service> field are: lsf, pbs and bqs.

Note that when this option is used the RB does not generate the “.BrokerInfo” file.

 

--input input_file

-i input_file

if this option is specified the user will be asked to choose a CEId from the list of CEs contained in the input_file. Once a CEId has been selected the command behaves as explained for the --resource option. If this option is used together with the --noint one and the input file contains more than one CEId, then the first CEId in the list is taken into account for submitting the job.

 

--notify e_mail_address

-n e_mail_address

when a job is submitted with this option, an e-mail message containing basic information pertaining to the job identification and status is sent to the specified e_mail_address when the job enters one of the following states:

-          READY

-          RUNNING

-          ABORTED or DONE                           

Notification to multiple contacts can be requested by specifying the corresponding e-mail addresses separated by commas and without blanks.

 

--config path_name

-c path_name

if the command is launched with this option, the configuration file pointed to by path_name is used instead of the standard configuration file.

 

--output out_file

-o out_file

writes the generated dg_jobId assigned to the submitted job in the file specified by out_file. out_file can be either a simple name or an absolute path (on the submitting machine). In the former case the file out_file is created in the current working directory.

 

--hours H

-h H

allows the user to specify the user proxy duration H, in hours, needed for submitting the job. When used with this option the dg-job-submit command behaves as follows:

-        the command checks for user proxy existence and if the proxy does not exist a new proxy with H hours duration is created

-        if the proxy exists then its duration is checked against the value specified with the --hours option. If proxy duration is greater than H hours then the job is submitted with the existing proxy, otherwise the old proxy is destroyed and a new one with H hours duration is created and used for submitting the job.

This mechanism allows the user to create, before submission, a proxy with a duration suitable for her/his job; moreover the user is not obliged to enter the PEM pass-phrase at each submission, i.e. in all those cases where the existing proxy has a validity long enough for the job.
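
For instance (the 48 hours value is purely illustrative):

$> dg-job-submit --hours 48 myjob1.jdl

creates a new 48-hour proxy (unless the existing one already lasts longer than 48 hours) before submitting the job described by myjob1.jdl.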

 

--nomsg

this option makes the command print on the standard output only the dg_jobId generated for the job if the submission was successful; otherwise the location of the log file containing messages and diagnostics is printed.

 

--noint

if this option is specified every interactive question to the user is skipped; moreover only the dg_jobId is returned on the standard output. All warning messages and errors (if any) are written to the file dg-job-submit_<UID>_<PID>.log under the /tmp directory. Log file location is configurable.

 

--debug

when this option is specified, information about the parameters used for the API function calls inside the command is displayed on the standard output and is also written to the dg-job-submit_<UID>_<PID>.log file under the /tmp directory. Log file location is configurable.

 

--logfile log_file

when this option is specified, the command log file is relocated to the location pointed to by log_file.

 

job_description_file

this is the file containing the classad describing the job to be submitted. It must be the last argument of the command.

 

 

Exit Status

dg-job-submit exits with a status value of 0 (zero) upon success, and 1 (one) upon failure.

 


 

Examples

1.      $> dg-job-submit myjob1.jdl

where myjob1.jdl is as follows:

##############################################  

#                                                

# -------- Job description file ----------

#                       

##############################################        

Executable     = "$(CMS)/fpacini/exe/sum.exe";

InputData      = "LF:testbed0-00019";

ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf,

dc=infn, dc=it";

DataAccessProtocol = "gridftp";

Rank           = other.MaxCpuTime;

Requirements   = other.LRMSType == "Condor" && \

  (!(RegExp("*nikhef*",other.CEId)));

 

 

submits sum.exe to a resource (supposed to contain the executable file) whose LRMS is Condor and whose CE identifier does not contain the string “nikhef”. The command returns the following output to the user, containing the job handle (dg_jobId):

 

================= dg-job-submit Success ===================================

The job has been successfully submitted to the Resource Broker. Your job is identified by  (dg_jobId):

https://grid004f.cnaf.infn.it:7846/155.198.211.205/161251122764136?grid004f.cnaf.infn.it:7771

Use dg-job-status command to display current job status.

======================================================================

 

2.      $> dg-job-submit myjob2.jdl --notify fpacini@datamat.it

submits the job described by myjob2.jdl, returns the same output as above to the user and sends an e-mail notification to fpacini@datamat.it at well-defined job status changes.

 

See also

[A1], [A2], dg-job-list-match.


 

       dg-job-get-output

This command requests the job output files (specified by the OutputSandbox attribute of the job-ad) from the RB and stores them on the local disk of the submitting machine.

 

 

SYNOPSIS

dg-job-get-output  [options]  <job Id(s)>

Options:

   --help

   --version

 

   --input, -i     <input_file>

   --dir           <directory_path>

   --config, -c    <config_file>

   --noint

   --debug

   --logfile <log_file>

 

DESCRIPTION

The dg-job-get-output command can be used to retrieve the output files of a job that has been submitted through the dg-job-submit command with a job description file including the OutputSandbox attribute. After the submission, when the job has terminated its execution, the user can retrieve the files generated by the job and temporarily stored on the RB machine, as specified by the OutputSandbox attribute, by issuing dg-job-get-output with the dg_jobId returned by dg-job-submit as input. It is also possible to specify a list of job identifiers when calling this command, or an input file containing dg_jobIds by means of the --input option. When --input is used, the user is requested to choose all, one or a subset of the job identifiers contained in the input file.

It is important to note that the OutputSandbox of a submitted job can only be retrieved when the job has reached the OutputReady status (see Annex 7.2) indicating that the job is done and the OutputSandbox files are ready for retrieval on the RB machine. dg-job-get-output  will always fail for jobs that are not yet in the OutputReady status.

The user can decide the local directory path on the UI machine where these files have to be stored by means of the --dir option; otherwise the retrieved files are put in a default location specified in the UI_ConfigEnv.cfg configuration file (DEFAULT_STORAGE_AREA_IN parameter). In both cases a sub-directory is appended to the supplied path. The name of this sub-directory is the “<time><PID><RND>” unique number of the dg_jobId identifier (see the dg-job-submit command for details on the dg_jobId structure).

If the user wants to use his “private” configuration file, this can be done using option --config path_name. As a consequence the dg-job-get-output command looks for the file “path_name” instead of the standard configuration file. If this file does not exist the user is notified with an error message and the command is aborted.

 


 

OPTIONS

--help

            displays command usage.

 

--version

            displays UI version.

 

--dir directory_path

retrieved files (previously listed by the user through the OutputSandbox attribute of the job description file)  are stored in the location indicated by directory_path/<dg_jobId unique string>.

 

--config path_name

-c path_name

if the command is launched with this option, the configuration file pointed to by path_name is used instead of the standard configuration file.

 

 

--noint

if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file dg-job-get-output_<UID>_<PID>.log under the /tmp directory. Location of the log file is configurable.

 

--debug

when this option is specified, information about the parameters used for the API function calls inside the command is displayed on the standard output and is also written to the dg-job-get-output_<UID>_<PID>.log file under the /tmp directory. Location of the log file is configurable.

 

--logfile log_file

when this option is specified, the command log file is relocated to the location pointed to by log_file.

 

dg_jobId

job identifier returned by dg-job-submit. If a list of one or more job identifiers is specified, the dg_jobIds have to be separated by a blank. Job identifiers must be the last argument of the command.

 

--input input_file

-i input_file

this option makes the command return the OutputSandbox files for each dg_jobId contained in the input_file. This option cannot be used if one (or more) dg_jobIds have already been specified. The format of the input file must be as follows: one dg_jobId per line, and comment lines must begin with a “#” or a “*” character.

 

Exit Status

dg-job-get-output exits with a status value of 0 (zero) upon success, >0 upon failure and <0 upon partial failure. An example of partial failure is when more than one job identifier has been specified and the OutputSandbox could be retrieved only for some of them.

 


 

Examples

Let us consider the following command:

 

$> dg-job-get-output https://grid004.it:2234/124.75.74.12/12354732109721?firefox.esrin.esa.it:4577 --dir /home/data

It retrieves the files listed in the OutputSandbox attribute of job identified by https://grid004.it:2234/124.75.74.12/12354732109721?firefox.esrin.esa.it:4577  from the RB and stores them locally in /home/data/12354732109721.


 

 

       dg-job-list-match

Returns the list of resources fulfilling job requirements.

 

 

SYNOPSIS

dg-job-list-match  [options]  <jdl file>

Options:

   --help

   --version

 

   --verbose

   --config, -c    <config_file>

   --output, -o    <output_file>

   --noint

   --debug

   --logfile <log_file>

 

DESCRIPTION

dg-job-list-match displays the list of identifiers of the resources accessible by the user and satisfying the job requirements included in the job description file. The CE identifiers are returned either on the standard output or in a file according to the chosen command options and are strings univocally identifying the CEs published in the GIS. 

dg-job-list-match requires as input a job description file, in which job characteristics and requirements are expressed by means of a Condor class-ad, passed as the last command-line argument. The job description file is first syntactically checked; the Resource Broker is then contacted only to find resources compatible with the job, and the job is never submitted. See the dg-job-submit section and in particular Table 1 for the general rules for building the job description file.

If the user wants to use his “private” configuration file, this can be done using option --config path_name.

The option --verbose of the dg-job-list-match command can be used to obtain on the standard output the class-ad sent to the RB generated from the job description.

The --output option makes the command save the list of compatible resources into the specified file. If the provided file name is not an absolute path, then the output file is created in the current working directory.

The CEId attribute of the JDL, being a resource attribute, is only taken into account by the dg-job-list-match command if present in the Requirements expression and prefixed by “other.”. On the other hand, setting the job attribute SubmitTo is reserved to the UI and it is hence discarded if provided directly in the JDL file by the user.

 

Job Description File

See dg-job-submit for details.

 


 

OPTIONS

 

--help

            displays command usage.

 

--version

            displays UI version.

 

--verbose

-v

displays on the standard output the job class-ad that is sent to the Resource Broker generated from the job description file. This differs from the content of the job description file since the UI adds to it some attributes that cannot be directly inserted by the user (e.g. CertificateSubject, defaults for Rank and Requirements if not provided).

 

--config path_name

-c path_name

if the command is launched with this option, the configuration file pointed to by path_name is used instead of the standard configuration file.

           

 

--output output_file

-o output_file

returns the CEIds list in the file specified by output_file. output_file can be either a simple name or an absolute path (on the submitting machine). In the former case the file output_file is created in the current working directory.

 

--noint

if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file dg-job-list-match_<UID>_<PID>.log under the /tmp directory. Location of the log file is configurable.

 

--debug

when this option is specified, information about the API functions called inside the command is displayed on the standard output and is also written to the file dg-job-list-match_<UID>_<PID>.log under the /tmp directory. Location of the log file is configurable.

 

 

 

--logfile log_file

when this option is specified, the command log file is relocated to the location pointed to by log_file.

 

job_description_file

this is the file containing the classad describing the job to be submitted. It must be the last argument of the command.

 

Exit Status

dg-job-list-match exits with a status value of 0 (zero) upon success, and a non-zero value upon failure.

 

Examples

Let us consider the following command:

$> dg-job-list-match myjob.jdl

where the job description file myjob.jdl looks like:

 

#########################################   

#                                                

# ---- Sample Job Description File  ----

#                       

#########################################        

Executable   = "sum.exe";

StdInput    = "data.in";

InputSandbox = {"/home_firefox/fpacini/exe/sum.exe","/home1/data.in"};

OutputSandbox = {"data.out","sum.err"};

Rank         = other.MaxCpuTime;

Requirements = other.LRMSType == "Condor" &&

               other.Architecture == "INTEL" && other.OpSys== "LINUX" &&

   other.FreeCpus >= 2;

 

 

In this case the job requires CEs being Condor Pools of INTEL LINUX machines with at least 2 free Cpus.  Moreover the Rank expression states that queues with higher maximum Cpu time allowed for jobs are preferred.

The response of such a command is something as follows:

***************************************************************************

                         Computing Element IDs LIST

The following CE(s) matching your job requirements have been found:

- bbq.mi.infn.it:2119/jobmanager-pbs-dque

- skurut.cesnet.cz:2119/jobmanager-pbs-wp1

***************************************************************************

 $>

 

See also

[A1],[A2], dg-job-submit.

       dg-job-cancel

Cancels one or more submitted jobs.

 

 

SYNOPSIS

dg-job-cancel  [options]  <job Id(s)>

 

Options:

   --help

   --version

 

   --all

   --input, -i     <input_file>

   --notify, -n    <e-mail_address(es)>

   --config, -c    <config_file>

   --output, -o    <output_file>

   --noint

   --debug

   --logfile <log_file>

 

 

DESCRIPTION

This command cancels a job previously submitted using dg-job-submit. Before cancellation, it prompts the user for confirmation. The cancel request is sent to the Resource Broker, which forwards it to the JSS, which fulfils it.

dg-job-cancel can remove one or more jobs: the jobs to be removed are identified by their job identifiers (dg_jobIds returned by dg-job-submit) provided as arguments to the command and separated by a blank space. The result of the cancel operation is reported to the user for each specified dg_jobId.

If the --all option is specified, all the jobs owned by the user issuing the command are removed. When the command is launched with the --all option, no dg_jobId can be specified. It has to be remarked that only the owner of a job can remove it. When the --all option is specified the dg-job-cancel command contacts every Resource Broker listed in the UI_ConfigEnv.cfg file and asks for the cancellation of all jobs owned by the user identified by her/his certificate subject.

If the user wants to use his “private” configuration file, this can be done using option --config path_name.

The --input option permits specifying a file (input_file) that contains the dg_jobIds to be removed. The format of the file must be as follows: one dg_jobId per line, and comment lines must begin with a “#” or a “*” character. When using this option the user is asked to choose among all, one or a subset of the listed job identifiers. If the input_file is not an absolute path, the file is searched for in the current working directory.

Possible job cancellation notifications are:

-        Cancel SUCCESS                      i.e. the job has been successfully marked for removal.

-        Cancel GENERIC_FAILURE     i.e. the user is not the owner of the job or the cancellation request has reached the JSS but has failed for some unknown reason.

-        Cancel CONDOR_FAILURE      i.e. the cancellation request has failed due to a CondorG problem.

-        Cancel GLOBUS_FAILURE       i.e. the cancellation request has failed due to a Globus job-manager problem.

-        Cancel NOENT_FAILURE         i.e. the job has not been found by JSS, by CondorG or by the Resource Broker.

The --notify option can be used to receive jobs cancellation notifications by e-mail. When this option is used the UI does not wait for the cancel notifications from the RB and returns control to the user immediately after the RB has accepted the cancellation request. This can be useful when a great number of jobs to cancel have been specified and the user wants to be able to perform other operations without waiting for the command results.

Notification to multiple contacts can be requested by specifying the corresponding e-mail addresses separated by commas and without blanks.

 


 

OPTIONS

 

--help

            displays command usage.

 

--version

            displays UI version.

 

--all

cancels all jobs owned by the user submitting the command. This option cannot be used if one or more dg_jobIds have been specified explicitly, nor together with the --input option.

 

--input input_file

-i input_file

cancels the dg_jobIds contained in the input_file. This option cannot be used if one or more dg_jobIds have been specified, nor together with the --all option.

 

--notify e_mail_address

-n e_mail_address

when a cancel request is submitted with this option, an e-mail message is returned to the specified e_mail_address. The message reports on the cancellation success/failure of the job specified in input. When the --all option has been specified or the cancellation involves more than one job, an e-mail message is sent to the user for each RB that has performed cancellations on behalf of the UI.

Notification to multiple contacts can be requested by specifying the corresponding e-mail addresses separated by commas and without blanks.

 

--config path_name

-c path_name

if the command is launched with this option, the configuration file pointed to by path_name is used instead of the standard configuration file.

                                                                    

 

--output output_file

-o output_file

writes the cancel results in the file specified by output_file instead of the standard output. output_file can be either a simple name or an absolute path (on the submitting machine). In the former case the file output_file is created in the current working directory.

 

 

--noint

if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file dg-job-cancel_<UID>_<PID>.log under the /tmp directory. Location of the log file is configurable.

 

--debug

when this option is specified, information about the API functions called inside the command is displayed on the standard output and is also written to the file dg-job-cancel_<UID>_<PID>.log under the /tmp directory. Location of the log file is configurable.

 

--logfile log_file

when this option is specified, the command log file is relocated to the location pointed to by log_file.

 

dg_jobId

            job identifier returned by dg-job-submit. The job identifier list must be the last argument of this command.

 

Exit Status

dg-job-cancel exits with a status value of 0 if all the specified jobs were cancelled successfully, >0 if errors occurred for each specified job id and <0 in case of partial failure. An example of partial failure is when more than one job has been specified: some jobs could be successfully removed while others could not.

 

Examples

1.      $> dg-job-cancel  dg_jobId1 dg_jobId2

 

displays the following confirmation message:

Are you sure you want to remove all jobs specified? [y/n]n: y

 

**********************************************

              JOBS CANCEL OUTCOME

Cancel SUCCESS for job:

 - dg_jobId1

The job has been successfully marked for removal

------

Cancel NOENT_FAILURE for job:

    - dg_jobId2

   Job not found by the Resource Broker

**********************************************

      $>

In this case the command exit code is -1.

 

2.      $> dg-job-cancel --all

 

displays the following confirmation message:

Are you sure you want to remove all jobs owned by user Fabrizio Pacini? [y/n]n: y

 

**********************************************

              JOBS CANCEL OUTCOME

Cancel SUCCESS for job:

 - dg_jobId1

   The job has been successfully marked for removal

   ------

   Cancel SUCCESS for job:

    - dg_jobId2

   The job has been successfully marked for removal

********************************************** 

      $>

The exit code in this case is 0

 

See also

[A2], dg-job-submit.


       dg-job-status

Displays bookkeeping information about submitted jobs.

 

 

SYNOPSIS

dg-job-status  [options]  <job Id(s)>

Options:

   --help

   --version

 

   --all

   --input, -i     <input_file>

   --full, -f

   --config, -c    <config_file>

   --output, -o    <output_file>

   --noint

   --debug

   --logfile <log_file>

 

 

DESCRIPTION

This command prints the status of a job previously submitted using dg-job-submit. The job status request is sent to the LB that provides the requested information. This can be done during the whole job life.

dg-job-status can monitor one or more jobs: the jobs to be checked are identified by one or more job identifiers (dg_jobIds returned by dg-job-submit) provided as arguments to the command and separated by a blank space.

If the --all option is specified, information about all the jobs owned by the user issuing the command is printed on the standard output. When the command is launched with the --all option, neither a dg_jobId nor the --input option can be specified.

The --input option permits specifying a file (input_file) that contains the dg_jobIds to monitor. The format of the file must be as follows: one dg_jobId per line, and comment lines have to begin with a “#” or a “*” character. When using this option the user is asked to choose among all, one or a subset of the listed job identifiers. If the input_file is not an absolute path, it is searched for in the current working directory.

If the user wants to use his “private” configuration file, this can be done using option --config path_name.

The job information displayed to the user encompasses (bookkeeping information):

-          dg_jobId                      (the job unique identifier)

-          Status                                    (the job current status)

-          Job Exit Code             (the job exit code, if ≠ 0)

-          Job Owner                 (User Certificate Subject)

-          Location                     (Id of RB, JSS or CE)

-          Destination                 (Id of CE where the job will be transferred to)

-          Status Enter Time      (when the job entered actual state)

-          Last Update Time      (last known event timestamp)

-          Status Reason           (reason for being in this state)

 

If the --full option is specified, dg-job-status displays a long description of the queried jobs by printing in addition the following information:

-          CE Node                                (id of cluster(s) node where the job is running)

-          JssId                                      (job identifier in the JSS)

-          GlobusId                                 (job identifier in the Globus job-manager)

-          LocalId                                   (id in the CE queue (PBS, LSF, ..))

-          Job Description (JDL)            (complete JDL description of the job)

-          JSS Job Description (JDL)    (complete JDL job description as sent to the JSS)

-          Job Description                      (job description for Condor-G built from the JDL one)

-         Moving                                                (intermediate state: JobTransfer but neither JobAccepted nor JobRefused has been logged yet; in this case ‘state’ and ‘location’ refer to the source of job transfer.)

-          Cancelling                              (whether job cancellation is in progress)

-          Cancel Reason                      (cancellation status message)

 

Information fields that are not available (i.e. not returned by the LB) are not printed at all to the user.

The job Status possible values are reported in Annex 7.2. Details on the Job Status Diagram can be found in [A4].

 

OPTIONS

--help

            displays command usage.

 

--version

            displays UI version.

 

--all

displays status information about all the jobs owned by the user submitting the command. This option cannot be used if one or more dg_jobIds have been specified, nor if the --input option has been specified. All the LBs listed in the UI configuration file UI_ConfigEnv.cfg are contacted to fulfil this request.

 

--input input_file

-i input_file

displays bookkeeping info about the dg_jobIds contained in the input_file. When using this option the user is asked to choose among all, one or a subset of the listed job identifiers. This option cannot be used if one or more dg_jobIds have been specified, nor if the --all option has been specified.

 

 

--full

            displays a long description of the queried jobs

 

--config path_name

-c path_name

if the command is launched with this option, the configuration file pointed to by path_name is used instead of the standard configuration file.