DataGrid
WP1 - WMS Software Administrator and User Guide

Document identifier: DataGrid-01-TEN-0118-0_9
Date: 03/12/2002
Work package: WP1
Partner: Datamat SpA
Document status:
Deliverable identifier:
Abstract: This note provides the administrator and user guide for the WP1 WMS software.
Delivery Slip
            | Name            | Partner     | Date       | Signature
From        | Fabrizio Pacini | Datamat SpA | 03/12/2002 |
Verified by | Stefano Beco    | Datamat SpA | 03/12/2002 |
Approved by |                 |             |            |
Document Log
Issue | Date       | Comment     | Author
0_0   | 21/12/2001 | First draft | Fabrizio Pacini
0_1   | 14/01/2002 | Draft       | Fabrizio Pacini
0_2   | 24/01/2002 | Draft       | Fabrizio Pacini
0_3   | 05/02/2002 | Draft       | Fabrizio Pacini
0_4   | 15/02/2002 | Draft       | Fabrizio Pacini
0_5   | 08/04/2002 | Draft       | Fabrizio Pacini
0_6   | 13/05/2002 |             | Fabrizio Pacini
0_7   | 19/07/2002 |             | Fabrizio Pacini
0_8   | 16/09/2002 |             | Fabrizio Pacini
0_9   | 03/12/2002 |             | Fabrizio Pacini
Document Change Record
Issue | Item | Reason for Change
0_1 | General update |
- Take into account changes in the rpm generation procedure.
- Add missing info about daemons (RB/JSS/CondorG) starting accounts.
- Some general corrections.
0_2 | General update |
- Add Cancelling and Cancel Reason information.
- Add OUTPUTREADY job state.
- Add new profile rpms.
- Remove /etc/workload* shell scripts.
- Add summary map table (user / daemon).
- Add CEId format check.
- Add new job cancel notification.
0_3 | General update |
- Modified RB/JSS start-up procedure.
- Add gridmap-file users/groups issues.
- Add proxy certificate usage by daemons.
- Job attribute CEId changed to SubmitTo.
- Add DGLOG_TIMEOUT setting.
- Add workload-profile and userinterface-profile rpms.
0_4 | General update |
- Add configure option --enable-wl for system configuration files.
- Add installation checking option --with-globus for Globus to the Workload configure.
- Add new Information Index configure options.
- Remove edg-profile and edg-user-env rpms from II and UI dependencies.
- Add security configuration rpms for all the Certificate Authorities to UI dependencies.
- Add new parameters to RB configuration file.
- Add new Job Exit Code field to the returned job status info.
- Remove dependency on SWIG in the userinterface binary rpm.
0_5 | General update |
- Modify command options syntax (getopt-like style).
- Add MyProxy server and client package installation/utilisation.
- Modify job cancel notification.
- Add Userguide rpm.
0_6 | General update |
- Modify configure options for the various components.
- UI commands modified to use python2 executable.
- Clarify myproxy usage.
- Explain how RB/LB addresses in the UI config file are used by the commands.
- Add --logfile option to the UI commands.
0_7 | General update |
- Modify configure options for the various components.
- Clarify UI commands --notify option usage.
- Add make test target for UI.
0_8 | General update |
- Specified dependencies of profile rpms.
- Update needed env vars for UI.
- Explain how to include default constraints in the job requirements.
- Explain that the lc field in the ReplicaCatalog address is now mandatory.
- Explain how to specify wildcards and special chars in "Arguments" in the JDL expression.
0_9 | General update |
- Defaults for Rank and Requirements in the UI config file.
- Added reference to the ".BrokerInfo" file document.
- other.CEId in Requirements vs --resource option.
- Explain MyProxy Server configuration.
- Added description of new parameters in RB configuration file.
- RB/JSS databases clean-up procedure added.
- Explain usage of RetryCount JDL attribute.
- Better explain how to specify wildcards and special chars in "Arguments" in the JDL expression.
- Updated reference to JDL Attributes note.
- Added Annex on Submission failures analysis.
Files
Software Products    | User files
Word 97              | DataGrid-01-TEN-0118-0_9_Document.doc
Acrobat Exchange 4.0 | DataGrid-01-TEN-0118-0_9
Content
1.1. Objectives of this document
1.3. Applicable documents and reference documents
1.4. Document evolution procedure
4. Installation and Configuration
4.1. Logging and Bookkeeping services
4.1.3. The installation tree structure
4.2.1.1. PostgreSQL installation and configuration
4.2.1.2. Condor-G installation and configuration
4.2.1.3. ClassAd installation and configuration
4.2.1.4. ReplicaCatalog installation and configuration
4.2.3. The Installation Tree structure
4.3.3. The Installation tree structure
5.1.1. Starting and stopping daemons
5.2.1. Starting and stopping daemons
5.2.2. Purging the LB database
5.3.2. Starting and stopping JSS and RB daemons
5.3.3. RB and JSS databases clean-up
5.4.1. Starting and stopping daemons
7.4. Submission Failures Analysis
7.4.3. Job Aborted (no matching resources - II not reachable)
7.4.4. Job Aborted (Standard output of job wrapper does not contain useful data)
7.4.5. Job Aborted (CondorG failure)
7.6. The Match Making Algorithm
7.6.2. Job submission without data-access requirements
7.6.3. Job submission with data-access requirements
7.7. Process/User Mapping Table
This document provides a guide to building, installing and using the WP1 WMS software released within the DataGrid project.
The goal of this document is to describe the complete process by which the WP1 WMS software can be installed and configured on the DataGrid test-bed platforms.
Guidelines for operating the whole system and accessing the provided functionality are also given.
Administrators can use this document as a basis for installing, configuring and operating the WP1 WMS software. Users can refer to the User Guide chapter to access the provided services through the User Interface.
Applicable documents
[A1] Job Description Language HowTo – DataGrid-01-TEN-0102-02 – 17/12/2001 (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2.pdf)
[A2] DATAGRID WP1 Job Submission User Interface for PM9 (revised presentation) – 23/03/2001 (http://www.infn.it/workload-grid/docs/20010320-JS-UI-datamat.pdf)
[A3] WP1 meeting - CESNET presentation in Milan – 20-21/03/2001 (http://www.infn.it/workload-grid/docs/20010320-L_B-matyska.pdf)
[A4] Logging and Bookkeeping Service – 07/05/2001 (http://www.infn.it/workload-grid/docs/20010508-lb_draft-ruda.pdf)
[A5] Results of Meeting on Workload Manager Components Interaction – 09/05/2001 (http://www.infn.it/workload-grid/docs/20010508-WM-Interactions-pacini.pdf)
[A6] Resource Broker Architecture and APIs – 13/06/2001 (http://www.infn.it/workload-grid/docs/20010613-RBArch-2.doc)
[A7] JDL Attributes – DataGrid-01-NOT-0101-0_7 – 03/12/2002 (http://www.infn.it/workload-grid/docs/DataGrid-01-NOT-0101-0_7.{doc,pdf})
Reference documents
[R1] The Resource Broker Info file – DataGrid-01-TEN-0135-0_0 (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0135-0_0.{doc,pdf})
The content of this document is subject to modification according to the following events:
· Comments received from DataGrid project members,
· Changes/evolutions/additions to the WMS components.
Definitions
Condor | Condor is a High Throughput Computing (HTC) environment that can manage very large collections of distributively owned workstations.
Globus | The Globus Toolkit is a set of software tools and libraries aimed at the building of computational grids and grid-based applications.
Glossary
class-ad | Classified advertisement
CE | Computing Element
DB | Data Base
FQDN | Fully Qualified Domain Name
GDMP | Grid Data Management Pilot Project
GIS | Grid Information Service, aka MDS
GSI | Grid Security Infrastructure
job-ad | Class-ad describing a job
JDL | Job Description Language
JSS | Job Submission Service
LB | Logging and Bookkeeping Service
LRMS | Local Resource Management System
MDS | Metacomputing Directory Service, aka GIS
MPI | Message Passing Interface
PID | Process Identifier
PM | Project Month
RB | Resource Broker
RC | Replica Catalogue
SE | Storage Element
SI00 | Spec Int 2000
SMP | Symmetric Multi Processor
TBC | To Be Confirmed
TBD | To Be Defined
UI | User Interface
UID | User Identifier
WMS | Workload Management System
WP | Work Package
This document comprises the following main sections:
Section 3: Build Procedure
Outlines the software required to build the system and the actual process for building it and generating rpms for the WMS components; a step-by-step guide is included.
Section 4: Installation and Configuration
Describes the changes that need to be made to the environment and the steps to be performed to install the WMS software on the test-bed target platforms. The resulting installation tree structure is detailed for each system component.
Section 5: Operating the System
Provides the procedures for starting/stopping the WMS component processes and utilities.
Section 6: User Guide
Describes, in the style of Unix man pages, all the User Interface commands that allow the user to access the services provided by the WMS.
Section 7: Annexes
Expands on topics introduced in the User Guide section that help the user better understand the behaviour of the system.
In the following section we give detailed instructions for the installation of the WP1 WMS software package. We provide a source code distribution as well as a binary distribution and explain the installation procedures for both cases.
The WP1 software runs and has been tested on platforms running Globus Toolkit 2.0 Beta Release 21 on top of Linux RedHat 6.2.
The following software packages, in addition to WP1 software version 1.0, must be installed locally on a site in order to build the WP1 WMS on it:
- Globus Toolkit 2.0 Beta 21 or higher (download at http://datagrid.in2p3.fr/distribution/globus/beta-21)
- Python 2.1.1 (download at http://datagrid.in2p3.fr/distribution/config/external.html)
- Swig 1.3.9 (download at http://datagrid.in2p3.fr/distribution/config/external.html)
- Expat 1.95.1 (download at http://datagrid.in2p3.fr/distribution/config/external.html)
- Expat-devel 1.95.1 (download at http://datagrid.in2p3.fr/distribution/config/external.html)
- MySQL Version 9.38 Distribution 3.22.32, for pc-linux-gnu (i686) (download at http://datagrid.in2p3.fr/distribution/config/external_services.html)
- MySQL Version 11.15 Distribution 3.23.42, for pc-linux-gnu (i686) (download at http://datagrid.in2p3.fr/distribution/external/RPMS/). The needed rpms are:
MySQL-shared-3.23.42-1
MySQL-client-3.23.42-1
MySQL-3.23.42-1
MySQL-devel-3.23.42-1
- PostgreSQL 7.1.3 (download at http://datagrid.in2p3.fr/distribution/config/external_services.html)
- Classads library (download at http://datagrid.in2p3.fr/distribution/external/RPMS/classads-0.0-edg2.i386.rpm)
- CondorG 6.3.1 for INTEL-LINUX-GLIBC21 (download at http://datagrid.in2p3.fr/distribution/external/RPMS/CondorG-6.3.1-edg5.i386.rpm)
- Perl IO Stty 0.02, Perl IO Tty 0.04 (download at http://datagrid.in2p3.fr/distribution/config/external.html)
- MyProxy-0.4.4 (download at http://datagrid.in2p3.fr/distribution/external/RPMS/). The needed rpms are:
myproxy-server-0.4.4-edg6.i386.rpm (for the MyProxy Server machine)
myproxy-client-0.4.4-edg6.i386.rpm (for the UI machine)
- Perl 5 (download at http://datagrid.in2p3.fr/distribution/config/external.html)
- gcc version 2.95.2
- GNU make version 3.78.1 or higher
- GNU autoconf version 2.13
- GNU libtool 1.3.5
- GNU automake 1.4
- GNU m4 1.4 or higher
- RPM 3.0.5
- sendmail 8.11.6
The following instructions deal with the building of the WMS software and hence apply to the source code distribution.
Before starting the compilation, some environment variables related to the WMS components can be set, or configured by means of the configure script. This is needed only if the package defaults are not suitable. The variables involved are listed below:
- GLOBUS_LOCATION: base directory of the Globus installation. The default path is /opt/globus.
- MYSQL_INSTALL_PATH: base directory of the MySQL installation. The default path is /usr.
- EXPAT_INSTALL_PATH: base directory of the Expat installation. The default path is /usr.
- GDMP_INSTALL_PATH: base directory of the GDMP installation. The default path is /opt/edg.
- PGSQL_INSTALL_PATH: base directory of the PostgreSQL installation. The default path is /usr.
- CLASSAD_INSTALL_PATH: base directory of the ClassAd library installation. The default path is /opt/classads.
- CONDORG_INSTALL_PATH: base directory of the Condor installation. The default path is /opt/CondorG.
- PYTHON_INSTALL_PATH: base directory of the Python installation. The default path is /usr.
- SWIG_INSTALL_PATH: base directory of the SWIG installation. The default path is /usr/local.
- MYPROXY_INSTALL_PATH: base directory of the MyProxy installation. The default path is /usr/local.
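The defaults above can be made explicit in a small sh fragment that is sourced before running configure. This is a sketch only: the file name wl-build-env.sh is hypothetical, and the `: "${VAR:=default}"` idiom assigns the documented default only when a variable is unset or empty, so values already exported by the administrator are kept.

```shell
# wl-build-env.sh (hypothetical name): export the WP1 build variables,
# using the documented defaults only for variables that are unset or empty.
: "${GLOBUS_LOCATION:=/opt/globus}"
: "${MYSQL_INSTALL_PATH:=/usr}"
: "${EXPAT_INSTALL_PATH:=/usr}"
: "${GDMP_INSTALL_PATH:=/opt/edg}"
: "${PGSQL_INSTALL_PATH:=/usr}"
: "${CLASSAD_INSTALL_PATH:=/opt/classads}"
: "${CONDORG_INSTALL_PATH:=/opt/CondorG}"
: "${PYTHON_INSTALL_PATH:=/usr}"
: "${SWIG_INSTALL_PATH:=/usr/local}"
: "${MYPROXY_INSTALL_PATH:=/usr/local}"

# Export them so that configure (a child process) can see them.
export GLOBUS_LOCATION MYSQL_INSTALL_PATH EXPAT_INSTALL_PATH GDMP_INSTALL_PATH \
       PGSQL_INSTALL_PATH CLASSAD_INSTALL_PATH CONDORG_INSTALL_PATH \
       PYTHON_INSTALL_PATH SWIG_INSTALL_PATH MYPROXY_INSTALL_PATH
```

Typical use would be `. ./wl-build-env.sh` followed by `./configure <options>` in the same shell.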
To build the whole WP1 package, all the environment variables in the previous list must be set. To build only the User Interface module, the following environment variables need to be set:
- GLOBUS_LOCATION
- CLASSAD_INSTALL_PATH
- PYTHON_INSTALL_PATH
- SWIG_INSTALL_PATH
- EXPAT_INSTALL_PATH
To build the Job Submission and Resource Broker modules, the variables to set are:
- GLOBUS_LOCATION
- MYSQL_INSTALL_PATH
- EXPAT_INSTALL_PATH
- GDMP_INSTALL_PATH
- PGSQL_INSTALL_PATH
- CLASSAD_INSTALL_PATH
- CONDORG_INSTALL_PATH
To build the Proxy module, the variables to set are:
- GLOBUS_LOCATION
- MYPROXY_INSTALL_PATH
To build the LB server and Local Logger modules, the following environment variables are needed:
- GLOBUS_LOCATION
- MYSQL_INSTALL_PATH
- EXPAT_INSTALL_PATH
Finally, the LB library module needs:
- GLOBUS_LOCATION
- EXPAT_INSTALL_PATH
and the Information Index module needs only:
- GLOBUS_LOCATION
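The per-module requirements listed above can be checked before running configure. The following sh sketch simply encodes those lists; the module keywords (all, userinterface, jss_rb, proxy, ...) and the function names are hypothetical labels chosen for this example, not configure options.

```shell
# Print the variables required to build a WP1 module (lists from the text above).
wl_required_vars() {
  case "$1" in
    all) echo "GLOBUS_LOCATION MYSQL_INSTALL_PATH EXPAT_INSTALL_PATH GDMP_INSTALL_PATH PGSQL_INSTALL_PATH CLASSAD_INSTALL_PATH CONDORG_INSTALL_PATH PYTHON_INSTALL_PATH SWIG_INSTALL_PATH MYPROXY_INSTALL_PATH" ;;
    userinterface) echo "GLOBUS_LOCATION CLASSAD_INSTALL_PATH PYTHON_INSTALL_PATH SWIG_INSTALL_PATH EXPAT_INSTALL_PATH" ;;
    jss_rb) echo "GLOBUS_LOCATION MYSQL_INSTALL_PATH EXPAT_INSTALL_PATH GDMP_INSTALL_PATH PGSQL_INSTALL_PATH CLASSAD_INSTALL_PATH CONDORG_INSTALL_PATH" ;;
    proxy) echo "GLOBUS_LOCATION MYPROXY_INSTALL_PATH" ;;
    lbserver|locallogger) echo "GLOBUS_LOCATION MYSQL_INSTALL_PATH EXPAT_INSTALL_PATH" ;;
    logging_dev) echo "GLOBUS_LOCATION EXPAT_INSTALL_PATH" ;;
    information) echo "GLOBUS_LOCATION" ;;
    *) echo "unknown module: $1" >&2; return 1 ;;
  esac
}

# Print each required variable that is unset or empty.
# Returns 1 if at least one is missing (or the module is unknown).
wl_missing_vars() {
  vars=$(wl_required_vars "$1") || return 1
  missing=0
  for v in $vars; do
    eval "val=\${$v-}"       # indirect lookup of the variable named in $v
    if [ -z "$val" ]; then
      echo "$v"
      missing=1
    fi
  done
  return $missing
}
```

For example, `wl_missing_vars jss_rb || echo "set the variables above before running configure"`.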
After unpacking the WP1 source distribution tar file, or downloading the code directly from the CVS repository, change your working directory to the WP1 base directory, i.e. the Workload directory, and run the following command:
./recursive-autogen.sh
At this point the configure command can be run. The configure script has to be invoked as follows:
./configure <options>
The list of options that are recognized by configure is reported hereafter:
--help
--prefix=<installation path>
It is used to specify the Workload installation dir. The default installation dir is /opt/edg.
--enable-all
It is used to enable the build of the whole WP1 package. By default this option is turned on.
--enable-userinterface
It is used to enable the build of the User Interface module with Logging/Client, Broker/Client, Broker/Socket++ and ThirdParty/trio/src sub modules. By default this option is turned off.
--enable-userinterface_profile
It is used to enable the installation of the User Interface profile. By default this option is turned off.
--enable-jss_rb
It is used to enable the build of the Job Submission and Resource Broker modules with Logging/Client, Common, test, Proxy/Dgpr, and ThirdParty/trio/src submodules. By default this option is turned off.
--enable-jss_profile
It is used to enable the installation of the Job Submission and Resource Broker profile with JobSubmission/utils, and Broker/utils sub modules. By default this option is turned off.
--enable-lbserver
It is used to enable the build of the LB Server service with Logging/Client, Logging/etc, Logging/Server, Logging/InterLogger/Net, Logging/InterLogger/SSL, Logging/InterLogger/Error, Logging/InterLogger/Lbserver and ThirdParty/trio/src sub modules. By default this option is turned off.
--enable-locallogger
It is used to enable the build of the LB Local Logger service with Logging/Client, Logging/InterLogger/Net, Logging/InterLogger/SSL, Logging/InterLogger/Error, Logging/InterLogger/InterLogger, Logging/LocalLogger, man and ThirdParty/trio/src sub modules. By default this option is turned off.
--enable-locallogger_profile
It is used to enable the installation of the LB LocalLogger profile. By default this option is turned off.
--enable-logging_dev
It is used to enable the build of the LB Client Library with Logging/Client and ThirdParty/trio/src sub modules. By default this option is turned off.
--enable-information
It is used to enable the build of the Information Index module. By default this option is turned off.
--enable-information_profile
It is used to enable the installation of the Information Index profile with InformIndex/utils sub module. By default this option is turned off.
--enable-wl
It is used to enable the installation of system configuration files that are in the Workload/etc directory. By default this option is turned off.
--enable-proxy
It is used to enable the build of the Proxy module. By default this option is turned off.
--with-globus-install=<dir>
It allows specifying the Globus installation directory without setting the environment variable GLOBUS_LOCATION.
--with-pgsql-install=<dir>
It allows specifying the Pgsql installation directory without setting the environment variable PGSQL_INSTALL_PATH.
--with-gdmp-install=<dir>
It allows specifying the GDMP installation directory without setting the environment variable GDMP_INSTALL_PATH.
--with-expat-install=<dir>
It allows specifying the Expat installation directory without setting the environment variable EXPAT_INSTALL_PATH.
--with-mysql-install=<dir>
It allows specifying the MySQL installation directory without setting the environment variable MYSQL_INSTALL_PATH.
--with-myproxy-install=<dir>
It allows specifying the MyProxy installation directory without setting the environment variable MYPROXY_INSTALL_PATH.
During the configure step, 12 spec files (i.e. wl-userinterface.spec, wl-locallogger.spec, wl-lbserver.spec, wl-logging_dev.spec, wl-jss_rb.spec, wl-information.spec, wl-userinterface-profile.spec, wl-jss_rb-profile.spec, wl-information-profile.spec, wl-lbserver-profile.spec, wl-locallogger-profile.spec and wl-workload-profile.spec) are created in the following source sub-directories to produce a flavour-specific version:
- Workload/UserInterface
- Workload/Proxy
- Workload/Logging
- Workload/JobSubmission
- Workload/InformIndex
- Workload
Once the configure script has completed, check that GNU make is in your PATH and then, still in the Workload source code directory, run:
make
then:
make apidoc
and then:
make check
to build the test code. If the previous steps complete successfully, the software can be installed. To install the package in the installation directory specified either by the --prefix option of the configure script or by the default value (i.e. /opt/edg), issue the command:
make install
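The build sequence above can be collected into a small sh sketch. The helper names run and wl_build, and the DRY_RUN and PREFIX variables, are hypothetical; with DRY_RUN=1 the commands are only printed, which is handy for reviewing the sequence before executing it. The sequence stops at the first failing step, so make install runs only after the previous steps have succeeded.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build, test and install WP1 from the Workload source directory.
# Extra arguments are passed to configure; PREFIX defaults to /opt/edg.
wl_build() {
  prefix=${PREFIX:-/opt/edg}
  run ./recursive-autogen.sh || return 1
  run ./configure --prefix="$prefix" "$@" || return 1
  run make || return 1
  run make apidoc || return 1
  run make check || return 1
  run make install
}
```

For example, `DRY_RUN=1 wl_build --disable-all --enable-userinterface` prints the six commands for a User Interface build without executing them.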
Running "make clean" removes the object files, executable files, library files and all the other files created during "make" and "make check". The command:
make -i dist
can be used to produce, in the workload-X.Y.Z directory located in the Workload base directory, a binary gzipped tar ball of the Workload distribution. This tar ball can be transferred to other platforms and also used as the source for RPM creation.
To create the RPMs for Workload 1.0 (according to the configure options you have used), make sure that your PATH gives access to the GNU autotools, make and the gcc compiler, and edit the file $HOME/.rpmmacros (creating it in your home directory if it does not exist) to set the following entry:
%_topdir <your home dir>/rpm/redhat
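The ~/.rpmmacros entry can also be added from a script. In this sh sketch (the function name is hypothetical) the entry is appended only if no %_topdir line is already present, so running it twice leaves the file unchanged; an existing %_topdir pointing elsewhere is left untouched and has to be fixed by hand.

```shell
# Hypothetical helper: make sure <home>/.rpmmacros sets %_topdir to
# <home>/rpm/redhat. The home directory defaults to $HOME.
wl_set_rpm_topdir() {
  home=${1:-$HOME}
  macros="$home/.rpmmacros"
  touch "$macros" || return 1          # create the file if it does not exist
  if ! grep -q '^%_topdir' "$macros"; then
    printf '%%_topdir %s\n' "$home/rpm/redhat" >> "$macros"
  fi
}
```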
Then you can issue the command:
make rpm
that generates the RPMs in $(HOME)/rpm/redhat/RPMS.
For example if before building the package you have used the configure as follows:
./configure --enable-all
then the make rpm command creates the directories:
$(HOME)/rpm/redhat/SOURCES
$(HOME)/rpm/redhat/SPECS
$(HOME)/rpm/redhat/BUILD
$(HOME)/rpm/redhat/RPMS
$(HOME)/rpm/redhat/SRPMS
and copies the previously created tar ball workload-X.Y.Z/Workload.tar.gz into $(HOME)/rpm/redhat/SOURCES. Moreover it copies the generated spec files:
JobSubmission/wl-jss_rb.spec
JobSubmission/wl-jss_rb-profile.spec
UserInterface/wl-userinterface.spec
UserInterface/wl-userinterface-profile.spec
InformIndex/wl-information.spec
InformIndex/wl-informationpthr.spec
InformIndex/wl-information-profile.spec
Logging/wl-lbserver.spec
Logging/wl-lbserver-profile.spec
Logging/wl-locallogger.spec
Logging/wl-locallogger-profile.spec
Logging/wl-logging_dev.spec
Proxy/wl-proxy.spec
Workload/wl-workload-profile.spec
Workload/wl-userguide.spec
into $(HOME)/rpm/redhat/SPECS and finally executes the following commands:
rpm -ba wl-userinterface.spec
rpm -ba wl-userinterface-profile.spec
rpm -ba wl-locallogger.spec
rpm -ba wl-locallogger-profile.spec
rpm -ba wl-lbserver.spec
rpm -ba wl-lbserver-profile.spec
rpm -ba wl-logging_dev.spec
rpm -ba wl-jss_rb.spec
rpm -ba wl-jss_rb-profile.spec
rpm -ba wl-information.spec
rpm -ba wl-informationpthr.spec
rpm -ba wl-information-profile.spec
rpm -ba wl-proxy.spec
rpm -ba wl-workload-profile.spec
rpm -ba wl-userguide.spec
generating respectively the following rpms in the $(HOME)/rpm/redhat/RPMS directory:
- userinterface-X.Y.Z-K.i386.rpm
- userinterface-profile-X.Y.Z-K.i386.rpm
- locallogger-X.Y.Z-K.i386.rpm
- locallogger-profile-X.Y.Z-K.i386.rpm
- lbserver-X.Y.Z-K.i386.rpm
- lbserver-profile-X.Y.Z-K.i386.rpm
- logging_dev-X.Y.Z-K.i386.rpm
- jobsubmission-X.Y.Z-K.i386.rpm
- jobsubmission-profile-X.Y.Z-K.i386.rpm
- informationindex-X.Y.Z-K.i386.rpm
- informationindexpthr-X.Y.Z-K.i386.rpm
- informationindex-profile-X.Y.Z-K.i386.rpm
- proxy-X.Y.Z-K.i386.rpm
- workload-profile-X.Y.Z-K.i386.rpm
- userguide-X.Y.Z-K.i386.rpm
where X.Y.Z-K indicates the rpm version and release.
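After make rpm, a quick check that all the expected packages were produced can save a failed installation later. This sh sketch assumes an --enable-all build and takes the package names from the list above; the function name is hypothetical. Since rpm usually writes binary packages into an architecture sub-directory (RPMS/i386), both RPMS and RPMS/i386 are searched.

```shell
# Report which expected binary rpms are missing for version-release $1
# (the X.Y.Z-K string). $2 is the RPMS directory (default under $HOME).
wl_check_rpms() {
  version=$1
  rpmsdir=${2:-$HOME/rpm/redhat/RPMS}
  rc=0
  for name in userinterface userinterface-profile locallogger locallogger-profile \
              lbserver lbserver-profile logging_dev jobsubmission jobsubmission-profile \
              informationindex informationindexpthr informationindex-profile \
              proxy workload-profile userguide; do
    f="$name-$version.i386.rpm"
    if [ ! -f "$rpmsdir/$f" ] && [ ! -f "$rpmsdir/i386/$f" ]; then
      echo "missing: $f"
      rc=1
    fi
  done
  return $rc
}
```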
If you have instead built only the User Interface, i.e. used:
./configure --disable-all --enable-userinterface
the make rpm command will copy only the files UserInterface/wl-userinterface.spec and UserInterface/wl-userinterface-profile.spec into $(HOME)/rpm/redhat/SPECS and will create only the User Interface rpms (userinterface-X.Y.Z-K.i386.rpm and userinterface-profile-X.Y.Z-K.i386.rpm).
The User Interface has an additional make target that installs the userinterface test suite, allowing unit tests to be performed (i.e. without contacting any external component). Run the following commands in Workload/UserInterface:
./autogen.sh
./configure --disable-all --enable-tests
make tests
and you will find the commands ready to run together with the test files in Workload/UserInterface/test.
An alternative procedure can be followed to build the II and Logging packages. To do this, move into the Workload/InformIndex directory and run the following commands:
./autogen.sh
./configure [option]
where the recognised options are:
--prefix=<install path>
It is used to specify the Information Index installation dir. The default installation dir is /opt/edg.
--with-globus-install=<dir>
It allows specifying the Globus install directory without setting the environment variable GLOBUS_LOCATION.
Then issue:
make
make install
Afterwards move into the Workload/Logging directory and run the following commands:
./autogen.sh
./configure [option]
where the recognised options are:
--enable-all
It is used to enable the build of the Logging and Bookkeeping package. By default this option is turned on.
--enable-userinterface
It is used to enable the build of the Client sub module. By default this option is turned off.
--enable-graphical_userinterface
It is used to enable the build of the Client sub module. By default this option is turned off.
--enable-jss_rb
It is used to enable the build of the Client sub module. By default this option is turned off.
--enable-lbserver
It is used to enable the build of the Logging And Bookkeeping Server service with Client, etc, Server, InterLogger/Net, InterLogger/SSL, InterLogger/Error, InterLogger/Lbserver and ThirdParty/trio/src sub modules. By default this option is turned off.
--enable-lbserver_profile
It is used to enable the installation of the LB Server profile with Logging/utils sub module. By default this option is turned off.
--enable-locallogger
It is used to enable the build of the Logging And Bookkeeping Local Logger service with Client, InterLogger/Net, InterLogger/SSL, InterLogger/Error, InterLogger/InterLogger, LocalLogger, Apidoc, and ThirdParty/trio/src sub modules. By default this option is turned off.
--enable-logging_dev
It is used to enable the build of the Logging And Bookkeeping Client Library with Client and ThirdParty/trio/src sub modules. By default this option is turned off.
--prefix=<install path>
It is used to specify the Logging installation dir. The default installation dir is /opt/edg
--with-globus-install=<dir>
It allows specifying the Globus install directory without setting the environment variable GLOBUS_LOCATION.
--with-expat-install=<dir>
It allows specifying the Expat install directory without setting the environment variable EXPAT_INSTALL_PATH.
--with-mysql-install=<dir>
It allows specifying the MySQL install directory without setting the environment variable MYSQL_INSTALL_PATH.
Then issue:
make
make apidoc
make check
make install
Summarising, depending on the WMS module you want to build, the configure script has to be run with the following options:
- all
./configure
- userinterface
./configure --disable-all --enable-userinterface
- information
./configure --disable-all --enable-information
- lbserver
./configure --disable-all --enable-lbserver
- locallogger
./configure --disable-all --enable-locallogger
- logging for developers
./configure --disable-all --enable-logging_dev
- jobsubmission and broker
./configure --disable-all --enable-jss_rb
- wl
./configure --disable-all --enable-wl
- proxy
./configure --disable-all --enable-proxy
- userinterface profile
./configure --disable-all --enable-userinterface_profile
- information profile
./configure --disable-all --enable-information_profile
- information pthread
./configure --disable-all --enable-information --with-globus-flavor=gcc32dbgpthr
- lbserver profile
./configure --disable-all --enable-lbserver_profile
- locallogger profile
./configure --disable-all --enable-locallogger_profile
- jobsubmission and broker profile
./configure --disable-all --enable-jss_profile
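The summary above maps naturally onto a lookup function. In this sh sketch the module keywords and the function name are hypothetical; the option strings are copied from the list. The result is meant to be expanded unquoted, so that the shell splits it into separate configure arguments.

```shell
# Print the configure options for a WP1 module keyword (hypothetical labels).
wl_configure_opts() {
  case "$1" in
    all)                   opts="" ;;
    userinterface)         opts="--disable-all --enable-userinterface" ;;
    information)           opts="--disable-all --enable-information" ;;
    lbserver)              opts="--disable-all --enable-lbserver" ;;
    locallogger)           opts="--disable-all --enable-locallogger" ;;
    logging_dev)           opts="--disable-all --enable-logging_dev" ;;
    jss_rb)                opts="--disable-all --enable-jss_rb" ;;
    wl)                    opts="--disable-all --enable-wl" ;;
    proxy)                 opts="--disable-all --enable-proxy" ;;
    userinterface_profile) opts="--disable-all --enable-userinterface_profile" ;;
    information_profile)   opts="--disable-all --enable-information_profile" ;;
    information_pthread)   opts="--disable-all --enable-information --with-globus-flavor=gcc32dbgpthr" ;;
    lbserver_profile)      opts="--disable-all --enable-lbserver_profile" ;;
    locallogger_profile)   opts="--disable-all --enable-locallogger_profile" ;;
    jss_profile)           opts="--disable-all --enable-jss_profile" ;;
    *) printf 'unknown module: %s\n' "$1" >&2; return 1 ;;
  esac
  printf '%s\n' "$opts"
}
```

For example, `./configure $(wl_configure_opts lbserver)`.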
In order to install the WP1 RPMs on the target platforms, the following commands have to be executed as root:
rpm -ivh workload-profile-X.Y.Z-K.i386.rpm
rpm -ivh userinterface-profile-X.Y.Z-K.i386.rpm
rpm -ivh userinterface-X.Y.Z-K.i386.rpm
rpm -ivh informationindex-profile-X.Y.Z-K.i386.rpm
rpm -ivh informationindex-X.Y.Z-K.i386.rpm
rpm -ivh informationindexpthr-X.Y.Z-K.i386.rpm
rpm -ivh jobsubmission-profile-X.Y.Z-K.i386.rpm
rpm -ivh jobsubmission-X.Y.Z-K.i386.rpm
rpm -ivh locallogger-profile-X.Y.Z-K.i386.rpm
rpm -ivh locallogger-X.Y.Z-K.i386.rpm
rpm -ivh lbserver-profile-X.Y.Z-K.i386.rpm
rpm -ivh lbserver-X.Y.Z-K.i386.rpm
rpm -ivh logging_dev-X.Y.Z-K.i386.rpm
rpm -ivh proxy-X.Y.Z-K.i386.rpm
rpm -ivh userguide-X.Y.Z-K.i386.rpm
By default all the rpms install the software in the /opt/edg directory, except the profile rpms (i.e. informationindex-profile, jobsubmission-profile, locallogger-profile and lbserver-profile), which install in /etc/rc.d/init.d instead.
All the profile rpms depend on the workload-profile rpm, which in turn only depends on the bash rpm (whose version should be less than 2). Each component's rpm then depends on the corresponding profile rpm (e.g. userinterface-x.y.z depends on userinterface-profile-x.y.z, which depends on workload-profile-x.y.z).
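Given the dependency chain described above (workload-profile, then the component's profile rpm, then the component), the install commands for one component can be generated in order. This sh sketch only covers the five components for which this document lists a profile rpm; the function name is hypothetical. Since rpm -ivh fails for a package that is already installed, lines for packages already present (typically workload-profile) should be dropped before running them.

```shell
# Print, in dependency order, the rpm commands needed for one component.
# $1 = component name as used in the rpm file names, $2 = X.Y.Z-K string.
wl_install_order() {
  comp=$1
  version=$2
  case "$comp" in
    userinterface|locallogger|lbserver|jobsubmission|informationindex) ;;
    *) printf 'no profile rpm known for: %s\n' "$comp" >&2; return 1 ;;
  esac
  for pkg in workload-profile "$comp-profile" "$comp"; do
    printf 'rpm -ivh %s-%s.i386.rpm\n' "$pkg" "$version"
  done
}
```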
If you install one of the following rpms:
- jobsubmission-X.Y.Z-K.i386.rpm
- locallogger-X.Y.Z-K.i386.rpm
- lbserver-X.Y.Z-K.i386.rpm
- informationindex-X.Y.Z-K.i386.rpm
- informationindexpthr-X.Y.Z-K.i386.rpm
all the needed files are installed in /opt/edg, but the configuration and start-up files must also be installed in /etc/rc.d/init.d by additionally installing the corresponding profile rpms. Namely, by installing the rpms:
- jobsubmission-profile-X.Y.Z-K.i386.rpm
- locallogger-profile-X.Y.Z-K.i386.rpm
- lbserver-profile-X.Y.Z-K.i386.rpm
- informationindex-profile-X.Y.Z-K.i386.rpm
the following scripts are respectively installed in /etc/rc.d/init.d:
- broker and jobsubmission
- locallogger
- lbserver
- information_index
The administrator (with root privileges) then has to issue, from /etc/rc.d/init.d, the command:
./<script> start
to start the desired component. All start-up scripts accept the start, stop, restart and status options, except information_index, which only supports start and stop.
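A thin wrapper around the start-up scripts can reject actions that a script does not support, such as status for information_index. The function name and the DRY_RUN switch in this sh sketch are hypothetical; the script location and the supported actions are those given above.

```shell
# Run /etc/rc.d/init.d/<script> <action> after checking that the action is
# supported. With DRY_RUN=1 the command is printed instead of executed.
wl_service() {
  script=$1
  action=$2
  case "$script:$action" in
    information_index:start|information_index:stop) ;;
    information_index:*)
      printf '%s supports only start and stop\n' "$script" >&2; return 1 ;;
    *:start|*:stop|*:restart|*:status) ;;
    *) printf 'unsupported action: %s\n' "$action" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "/etc/rc.d/init.d/$script $action"
  else
    "/etc/rc.d/init.d/$script" "$action"
  fi
}
```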
The workload-profile-X.Y.Z-K.rpm installs some scripts common to all the workload management services:
/etc/sysconfig/edg_workload
/etc/sysconfig/edg_workload.csh
<install-path>/etc/workload.sh
<install-path>/etc/workload.csh
They are needed to define and export some variables for the start-up script environment, above all the PATH and the LD_LIBRARY_PATH needed to run all the software correctly.
As mentioned above, the jobsubmission-profile-X.Y.Z-K.i386.rpm additionally installs the wl-jss_rb-env.sh configuration file in /opt/edg/etc, which is read by the broker and jobsubmission start-up scripts when they are launched as root. The /opt/edg/etc/wl-jss_rb-env.sh file contains settings for the following variables:
- CONDORG_INSTALL_PATH: the CondorG installation path. The default value is /home/dguser/CondorG.
- CONDOR_IDS: needed by Condor to know under which user it has to run. Its value has to be set in the format uid.gid, where uid is the user identifier and gid is the group identifier. This value has to be set by the system administrator.
- JSSRB_USER: the user running the RB and JSS processes. Generally its value is the user name corresponding to the uid.gid set for the CONDOR_IDS variable.
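Because JSSRB_USER and CONDOR_IDS are expected to refer to the same account, the uid.gid value can be derived rather than typed by hand. This sh sketch uses the standard id command; the function name and the dguser account shown in the comment are illustrative assumptions, not values required by the WMS.

```shell
# Print "uid.gid" (the CONDOR_IDS format) for the given account,
# using the account's primary group. Fails if the account does not exist.
condor_ids_for_user() {
  uid=$(id -u "$1") || return 1
  gid=$(id -g "$1") || return 1
  printf '%s.%s\n' "$uid" "$gid"
}

# Possible use in wl-jss_rb-env.sh (hypothetical account name):
#   JSSRB_USER=dguser
#   CONDOR_IDS=$(condor_ids_for_user "$JSSRB_USER")
#   export JSSRB_USER CONDOR_IDS
```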
Details on the installation and configuration of each of the listed rpms are provided in Section 4 of this document. For further information about RPM please consult the man pages or http://www.rpm.org.
This section deals with the procedures for installing and configuring the WP1 WMS components on the target platforms. For each component, the list of dependencies (i.e. the software required on the same machine for the component to run) is given before the installation procedure, which is described through step-by-step examples. A description of the needed configuration items and environment variable settings is also provided. Note that since the rpms are generated using gcc 2.95.2 and RPM 3.0.5, the same configuration is expected on the target platforms.
From the installation point of view LB services can be split in two main components:
The LB local-logger services must be installed on all the machines hosting processes that push information into the LB system, i.e. the machines running RB and JSS, and the gatekeeper machine of the CE. An exception is the submitting machine (i.e. the machine running the User Interface), on which this component can be installed but is not mandatory.
The LB server services, instead, need to be installed only on a server machine, which usually coincides with the RB server machine.
For the installation of the LB local-logger the only software required is the Globus Toolkit 2.0 (actually only the GSI rpms are needed). Globus 2 rpms are available at http://datagrid.in2p3.fr/distribution/globus under the directory beta-xx/RPMS (recommended beta is 21 or higher). All rpms can be downloaded with the command
wget -nd -r <URL>/<rpm name>
and installed with
rpm -ivh <rpm name>
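The download and install steps can be wrapped for a list of Globus rpms. In this sh sketch the function name and the GLOBUS_BETA and DRY_RUN variables are hypothetical, the URL layout (.../globus/beta-xx/RPMS) follows the text above, and the rpm file names must be supplied by the caller, since this document does not list them.

```shell
# Download each named rpm from the Globus beta-xx/RPMS directory and install it.
# GLOBUS_BETA defaults to 21; DRY_RUN=1 prints the commands instead of running them.
globus_fetch_install() {
  beta=${GLOBUS_BETA:-21}
  url="http://datagrid.in2p3.fr/distribution/globus/beta-$beta/RPMS"
  for rpmfile in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'wget -nd -r %s/%s\n' "$url" "$rpmfile"
      printf 'rpm -ivh %s\n' "$rpmfile"
    else
      wget -nd -r "$url/$rpmfile" && rpm -ivh "$rpmfile" || return 1
    fi
  done
}
```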
For the installation of the LB server the Globus Toolkit 2.0 is required (actually only the GSI rpms are needed). Globus 2 rpms are available at http://datagrid.in2p3.fr/distribution/globus under the directory beta-xx/RPMS (recommended beta is 21 or higher). All rpms can be downloaded with the command
wget -nd -r <URL>/<rpm name>
and installed with
rpm -ivh <rpm name>
Besides the Globus Toolkit 2.0, the LB server also requires MySQL Distribution 3.22.31 or higher in order to work properly.
Instructions about MySQL installation can be found at the following URLs:
http://www.redhat.com/support/resources/faqs/RH-apache-FAQ/MySQL/mysql-install.htm
Packages and more general documentation can be found at:
http://www.mysql.org/listcats3.php?menu=21&page_id=9.
In any case, the rpm of MySQL Ver 9.38 Distribution 3.22.32 for pc-linux-gnu (i686) is available at http://datagrid.in2p3.fr/distribution/config/external_services.html.
At least the packages MySQL-3.22.32 and MySQL-client-3.22.32 must be installed in order to create and configure the LB database.
The LB server stores the logging data in a MySQL database, which must therefore be created. The following assumes that the database and the server daemons (bkserver and ileventd) run on the same machine, which is considered to be secure, i.e. no database authentication is used. In a different set-up, the procedure has to be adjusted accordingly and a secure database connection (via an ssh tunnel, etc.) must be established.
The action list below contains the placeholders DB_NAME and USER_NAME, for which real values have to be substituted. Together they form the database connection string required when invoking some of the LB daemons. The suggested value for both DB_NAME and USER_NAME is 'lbserver'; this value is also the compiled-in default (i.e. when it is used, the database connection string need not be specified at all).
The following steps require MySQL root privileges:
1) Create the database:
mysqladmin -u root -p create DB_NAME
where DB_NAME is the name of the database.
2) Create a dedicated LB database user:
mysql -u root -p -e 'grant create,drop,select,insert,update,delete on DB_NAME.* to USER_NAME@localhost'
where USER_NAME is the name of the user running the LB server daemons.
3) Create the database tables:
mysql -u USER_NAME DB_NAME < server.sql
where server.sql is a file containing the SQL commands for creating the needed tables. server.sql can be found in the directory “<install-path>/etc” created by the LB server rpm installation.
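The three set-up steps above can be chained in a small shell script. What follows is a minimal sketch, assuming the default install prefix /opt/edg (so that server.sql is /opt/edg/etc/server.sql) and the suggested value 'lbserver' for both placeholders; the functions run and lb_db_setup and the DRY_RUN switch are hypothetical helpers introduced only for illustration, not part of the LB distribution.

```shell
#!/bin/bash
# Sketch: chain the three LB database set-up steps described above.
# Hypothetical helpers: run() and lb_db_setup(); DRY_RUN=1 only prints commands.
DB_NAME=${DB_NAME:-lbserver}                   # suggested/compiled-in default
USER_NAME=${USER_NAME:-lbserver}               # suggested/compiled-in default
SQL_FILE=${SQL_FILE:-/opt/edg/etc/server.sql}  # assumes the default /opt/edg prefix

run() {
    # Echo the command, then execute it unless DRY_RUN is 1.
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

lb_db_setup() {
    # 1) Create the database (mysqladmin prompts for the MySQL root password).
    run mysqladmin -u root -p create "$DB_NAME" || return 1
    # 2) Grant the LB user the privileges it needs on this database only.
    run mysql -u root -p -e \
        "grant create,drop,select,insert,update,delete on ${DB_NAME}.* to ${USER_NAME}@localhost" \
        || return 1
    # 3) Create the tables; handled outside run() because of the input redirection.
    echo "+ mysql -u $USER_NAME $DB_NAME < $SQL_FILE"
    [ "${DRY_RUN:-0}" = 1 ] || mysql -u "$USER_NAME" "$DB_NAME" < "$SQL_FILE"
}
```

Running DRY_RUN=1 lb_db_setup first prints the three commands for review; running lb_db_setup without it executes them, prompting for the MySQL root password at steps 1 and 2.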
In order to install the LB local-logger and the LB server services, the following commands have to be issued with root privileges (the locallogger rpms on the machines hosting the local-logger, the lbserver rpms on the LB server machine):
rpm -ivh workload-profile-X.Y.Z-K.i386.rpm
rpm -ivh locallogger-X.Y.Z-K.i386.rpm
rpm -ivh locallogger-profile-X.Y.Z-K.i386.rpm
rpm -ivh lbserver-X.Y.Z-K.i386.rpm
rpm -ivh lbserver-profile-X.Y.Z-K.i386.rpm
By default, the locallogger-X.Y.Z-K.i386.rpm and lbserver-X.Y.Z-K.i386.rpm rpms install the software in the “/opt/edg” directory, whilst the two profile rpms (locallogger-profile and lbserver-profile) install their start-up scripts in “/etc/rc.d/init.d”.
When the LB local-logger RPMs are installed, the following directory tree is created:
<install-path>/info
<install-path>/info/interlogger.info
<install-path>/lib
<install-path>/man
<install-path>/man/man1
<install-path>/man/man1/interlogger.1
<install-path>/man/man3
<install-path>/man/man3/_dgLBJobStat.3
<install-path>/man/man3/_dgLBQueryRec.3
<install-path>/man/man3/dgLBEvent.3
<install-path>/man/man3/dglbevents.3
<install-path>/man/man3/dglog.3
<install-path>/man/man3/dgssl.3
<install-path>/man/man3/dgxferlog.3
<install-path>/man/man3/escape.3
<install-path>/man/man3/lbapi.3
<install-path>/sbin
<install-path>/sbin/dglogd
<install-path>/sbin/interlogger
<install-path>/sbin/locallogger
<install-path>/share
<install-path>/share/doc
<install-path>/share/doc/Workload
<install-path>/share/doc/Workload/Logging
<install-path>/share/doc/Workload/Logging/html
<install-path>/share/doc/Workload/Logging/html/annotated.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBJobStat-include.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBJobStat-members.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBJobStat.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBQueryRec-include.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBQueryRec-members.html
<install-path>/share/doc/Workload/Logging/html/class__dgLBQueryRec.html
<install-path>/share/doc/Workload/Logging/html/class_dgLBEvent-include.html
<install-path>/share/doc/Workload/Logging/html/class_dgLBEvent.html
<install-path>/share/doc/Workload/Logging/html/doxygen.gif
<install-path>/share/doc/Workload/Logging/html/files.html
<install-path>/share/doc/Workload/Logging/html/functions.html
<install-path>/share/doc/Workload/Logging/html/globals.html
<install-path>/share/doc/Workload/Logging/html/headers.html
<install-path>/share/doc/Workload/Logging/html/index.html
<install-path>/share/doc/Workload/Logging/html/null.gif
<install-path>/share/doc/Workload/Logging/refman.ps
/etc/rc.d/init.d
/etc/rc.d/init.d/locallogger
The sbin directory contains all the LB local-logger daemon executables. The script locallogger contained in “/etc/rc.d/init.d” has to be used for starting the daemons. The man directory contains the man page for the inter-logger daemon (man1) and the man pages of the logging APIs (man3).
When the LB server RPMs are installed, the following directory tree is created:
<install-path>/etc
<install-path>/etc/server.sql
<install-path>/lib
<install-path>/sbin
<install-path>/sbin/bkpurge
<install-path>/sbin/bkserver
<install-path>/sbin/ileventd
<install-path>/sbin/lbserver
/etc/rc.d/init.d
/etc/rc.d/init.d/lbserver
where the sbin directory contains all the LB server daemon executables. The script lbserver contained in “/etc/rc.d/init.d” has to be used for starting the daemons.
Neither the LB local-logger nor the LB server has configuration files, so no action is needed for this task.
All LB components need the following environment variables to be set:
- X509_USER_KEY the user private key file path
- X509_USER_CERT the user certificate file path
- X509_CERT_DIR the trusted certificate directory and ca-signing-policy directory
- X509_USER_PROXY the user proxy certificate file path
as required by GSI.
However, in the case of the LB daemons, the recommended way of specifying the security file locations is to use the --cert, --key and --CAdir options explicitly.
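Before starting an LB component by hand, it may help to confirm that the GSI variables listed above actually point to existing files. The sketch below assumes bash (it uses indirect expansion); check_gsi_env is a hypothetical helper, and for the LB daemons the explicit options mentioned above remain the recommended route.

```shell
#!/bin/bash
# Sketch: verify that the GSI environment variables are set and point to
# existing files/directories. check_gsi_env is a hypothetical helper.
check_gsi_env() {
    local status=0 var
    # These variables must name regular files.
    for var in X509_USER_KEY X509_USER_CERT X509_USER_PROXY; do
        if [ -z "${!var}" ]; then
            echo "$var is not set" >&2; status=1
        elif [ ! -f "${!var}" ]; then
            echo "$var=${!var} is not a file" >&2; status=1
        fi
    done
    # The trusted certificate location must be a directory.
    if [ -z "$X509_CERT_DIR" ] || [ ! -d "$X509_CERT_DIR" ]; then
        echo "X509_CERT_DIR is unset or not a directory" >&2; status=1
    fi
    return $status
}
```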
The Logging library, i.e. the library that is linked into UI, RB, JSS and Jobmanager, reads its immediate logging destination from the variable DGLOG_DEST.
It defaults to “x-dglog://localhost:15830”, which is the correct value, hence it normally does not need to be set except on the submitting machine. The correct format for this variable is:
DGLOG_DEST=x-dglog://HOST:PORT
where, as already mentioned, HOST defaults to localhost and PORT defaults to 15830.
On the submitting machine if the variable is not set, it is dynamically assigned by the UI with the value:
DGLOG_DEST=x-dglog://<LB_CONTACT>:15830
where LB_CONTACT is the hostname of the machine running the LB server currently associated with the RB used for submitting jobs.
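To make the DGLOG_DEST format concrete, the following sketch splits a value into host and port and applies the documented defaults. It assumes that either part may be left out and then falls back to localhost or 15830; dglog_dest_parse is a hypothetical helper, not part of the Logging library.

```shell
#!/bin/bash
# Sketch: split DGLOG_DEST (x-dglog://HOST:PORT) into host and port,
# applying the defaults localhost and 15830. Hypothetical helper.
dglog_dest_parse() {
    local dest=${1:-${DGLOG_DEST:-x-dglog://localhost:15830}}
    local hostport=${dest#x-dglog://}   # strip the scheme prefix
    local host=${hostport%%:*}          # text before the first colon
    local port=
    case "$hostport" in
        *:*) port=${hostport#*:} ;;     # text after the first colon
    esac
    echo "${host:-localhost} ${port:-15830}"
}
```

For example, dglog_dest_parse x-dglog://lb.example.org prints "lb.example.org 15830".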
The timeout of the Logging library functions is read from the environment variable DGLOG_TIMEOUT. It defaults to 2 seconds, which is the correct value for local logging. On the submitting machine the UI sets this variable dynamically to 10 seconds (the recommended value for non-local logging is 10 to 15 seconds); the value is in any case configurable through the UI configuration.
Finally, there is LBDB, the environment variable needed by the LB server daemons (ileventd, bkserver and bkpurge). LBDB represents the MySQL database connect string; it defaults to “lbserver/@localhost:lbserver” and in the recommended set-up (see section 4.1.1.2) does not need to be set. Otherwise it should be set as follows:
LBDB=USER_NAME/PASSWORD@DB_HOSTNAME:DB_NAME
where
- USER_NAME is the name of the database user,
- PASSWORD is the user's password for the database,
- DB_HOSTNAME is the hostname of the host where the database is located,
- DB_NAME is the name of the database.
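The structure of the LBDB connect string can be illustrated by splitting it back into its four fields. This is a sketch for checking a value before exporting it; lbdb_parse is a hypothetical helper, and it assumes the user name contains no "/" and the host name no "@" or ":".

```shell
#!/bin/bash
# Sketch: split an LBDB connect string USER_NAME/PASSWORD@DB_HOSTNAME:DB_NAME
# into its fields; uses the compiled-in default when nothing is given.
lbdb_parse() {
    local cs=${1:-${LBDB:-lbserver/@localhost:lbserver}}
    local user=${cs%%/*}        # before the first "/"
    local rest=${cs#*/}         # after the first "/"
    local password=${rest%@*}   # before the last "@" (may be empty)
    local hostdb=${rest##*@}    # after the last "@"
    local host=${hostdb%%:*}
    local db=${hostdb#*:}
    printf 'user=%s password=%s host=%s db=%s\n' "$user" "$password" "$host" "$db"
}
```

With no argument and LBDB unset, it prints the fields of the default string, showing that the default uses an empty password.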
The Resource Broker and the Job Submission Services are the WMS components allowing the submission of jobs to the CEs. They are dealt with together since they always reside on the same host and consequently are distributed by means of a single rpm.
For the installation of RB and JSS, the Globus Toolkit 2.0 rpms available at http://datagrid.in2p3.fr/distribution/globus under the directory beta-xx/RPMS (recommended beta is 21 or higher) must be installed on the target platform. All needed rpms can be downloaded with the command
wget -nd -r <URL>/<rpm name>
and installed with
rpm -ivh <rpm name>
The Globus gridftp server package must also be installed and configured on the same host (see http://marianne.in2p3.fr/datagrid/documentation/EDG-Install-HOWTO.html for details).
It is important to recall that the Globus grid-mapfile located in /etc/grid-security on the RB server machine must be filled with the certificate subjects of all the users allowed to use the Resource Broker functionalities. Users mapped in the grid-mapfile have to belong to a group having the same name as the user itself. At the same time, the dedicated user dguser has to belong to all these groups.
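The group requirement above can be checked against the grid-mapfile itself. The sketch below lists the local accounts mapped in the file and prints (without running) one command per account adding dguser to the group named after it. It assumes one mapping per line in the form "<certificate subject>" account, with a single account per line; gridmap_accounts and dguser_group_commands are hypothetical helpers, and gpasswd -a should be replaced by the equivalent command if your system lacks it.

```shell
#!/bin/bash
# Sketch: list mapped local accounts and print the commands that would add
# dguser to each account's group. Review the output, then run it as root.
gridmap_accounts() {
    # Skip blank and comment lines; the account is the last field on the line.
    awk 'NF && $1 !~ /^#/ { print $NF }' "${1:-/etc/grid-security/grid-mapfile}" | sort -u
}

dguser_group_commands() {
    local acct
    for acct in $(gridmap_accounts "$1"); do
        echo "gpasswd -a dguser $acct"
    done
}
```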
Moreover, the following products are expected to be installed on the same platform:
- LB local-logger services (see section 4.1.1.1)
- PostgreSQL (RB and JSS)
- Condor-G (JSS)
- ClassAd library (RB and JSS)
- ReplicaCatalog from the WP2 distribution (RB)
Both RB and JSS use a PostgreSQL database for implementing the internal job queue. The installation kit and the documentation for PostgreSQL can be found at the following URL:
http://www3.us.postgresql.org/sites.html
The required PostgreSQL version is 7.1.3 or higher. The following packages need to be installed (respecting the order in which they are listed): postgresql-libs, postgresql-devel, postgresql, postgresql-server, postgresql-tcl, postgresql-tk and postgresql-docs.
PostgreSQL also needs the packages cyrus-sasl-1-5-11 (or higher), openssl-0.9.5a and openssl-devel-0.9.5a (or higher). All of them can be found at the following URL:
http://datagrid.in2p3.fr/distribution/external/RPMS
The following configuration options must be used when installing the package from source:
--with-CXX
--with-tcl
--enable-odbc
PostgreSQL 7.1.3 is also available in rpm format (to be installed as root) at the URL:
http://datagrid.in2p3.fr/distribution/external/RPMS
Once PostgreSQL has been installed, you need, as root, to create a new system account dguser using the (RH-specific) command
adduser -r -m dguser
This command creates a system account with a home directory. Then follow the steps reported below to create an empty database for JSS:
su - postgres (become the postgres user)
createuser -d -A dguser (create the new database user dguser)
su - dguser (become the user dguser)
createdb <DBNAME> (create the new database for JSS)
The name of the created database must be the same as the one assigned to the Database_name attribute in the file jss.conf (see section 4.2.4.2 for more details); otherwise JSS will use the "template1" database by default. In any case, using the template database is strongly discouraged.
The RB server instead uses another database, named "rb", which is created by the RB itself.
When upgrading from version 1.1.x to version 1.2.y, administrators must remember to completely remove the table containing the old version of the database registry. This is because the 1.2.y JSS uses a new field inside the PostgreSQL database to store the proxy file path.
The commands that have to be issued as root are:
psql template1 postgres (to connect to the database)
replacing template1 with the database name contained in the jss.conf file.
Once inside the psql client, issue:
DROP TABLE condor_submit; (to remove the table)
replacing condor_submit with the table name contained in the jss.conf file.
The Condor-G release required by JSS is CondorG 6.3.1 for INTEL-LINUX-GLIBC21. The Condor-G installation toolkit can be found at the following URL:
http://www.cs.wisc.edu/condor/downloads/condorg.license.html
whilst it is available in rpm format (to be installed as root) at:
http://datagrid.in2p3.fr/distribution/external/RPMS
Installation and configuration are quite straightforward; for details the reader can refer to the README file included in the Condor-G package. The main steps to be performed after having unpacked the package as root are:
- become dguser (su - dguser)
- make sure the directory where you are going to install CondorG is owned by dguser
- make sure the Globus Toolkit 2.0 has been installed on the platform
- run the /opt/CondorG/setup.sh installation script
- remove the link ~dguser/.globus/certificates created by the installation script
Moreover, some additional configuration steps have to be performed in the Condor configuration file pointed to by the CONDOR_CONFIG environment variable, which is set during installation. In the $CONDOR_CONFIG file the following attributes need to be modified:
RELEASE_DIR = $(CONDORG_INSTALL_PATH)
CONDOR_ADMIN = <a valid e-mail address of the Condor-G administrator>
UID_DOMAIN = < the domain of the machine (e.g. pd.infn.it)>
FILESYSTEM_DOMAIN = < the domain of the machine (e.g. pd.infn.it)>
HOSTALLOW_WRITE = *
CRED_MIN_TIME_LEFT = 0
GLOBUSRUN = $(GLOBUS_LOCATION)/bin/globusrun
and the following entries need to be added:
SKIP_AUTHENTICATION = YES
AUTHENTICATION_METHODS = CLAIMTOBE
DISABLE_AUTH_NEGOTIATION = TRUE
GRIDMANAGER_CHECKPROXY_INTERVAL = 600
GRIDMANAGER_MINIMUM_PROXY_TIME = 180
The environment variable CONDORG_INSTALL_PATH is also set during installation and points to the path where the Condor-G package has been installed.
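The entries to be added can be appended to $CONDOR_CONFIG with a small script that does not duplicate entries already present. This is a sketch under the assumption that each entry sits on a single "NAME = VALUE" line; add_condor_entry and add_jss_condor_entries are hypothetical helpers, and an entry that already exists with a different value is reported rather than overwritten, so that it can be reviewed by hand.

```shell
#!/bin/bash
# Sketch: append the extra Condor-G entries required by JSS to the Condor
# configuration file, skipping any entry that is already defined.
add_condor_entry() {
    local file=$1 name=$2 value=$3
    if grep -q "^[[:space:]]*$name[[:space:]]*=" "$file"; then
        echo "$name already set in $file; check its value by hand" >&2
    else
        echo "$name = $value" >> "$file"
    fi
}

add_jss_condor_entries() {
    local file=${1:-$CONDOR_CONFIG}
    [ -n "$file" ] || { echo "CONDOR_CONFIG is not set" >&2; return 1; }
    add_condor_entry "$file" SKIP_AUTHENTICATION YES
    add_condor_entry "$file" AUTHENTICATION_METHODS CLAIMTOBE
    add_condor_entry "$file" DISABLE_AUTH_NEGOTIATION TRUE
    add_condor_entry "$file" GRIDMANAGER_CHECKPROXY_INTERVAL 600
    add_condor_entry "$file" GRIDMANAGER_MINIMUM_PROXY_TIME 180
}
```

Running it twice leaves the file unchanged the second time, which makes it safe to re-run after an upgrade.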
To work properly, the current version of Condor-G requires the file /etc/grid-security/certificates/ca-signing-policy.conf, which has been removed from the Globus Toolkit 2.0 distribution and must hence be created by the administrator. This requirement will be removed with the next release of Condor-G, which will be fully Globus Toolkit 2.0 compliant.
The ClassAd release required by JSS and RB is classads-0.9 (or higher). The ClassAd library documentation can be found at the following URL:
http://www.cs.wisc.edu/condor/classad
whilst it is available in rpm format (to be installed as root) at:
http://datagrid.in2p3.fr/distribution/external/RPMS
The ReplicaCatalog release required by RB is ReplicaCatalogue-gcc32dbg-2.0 (or higher), which is available in rpm format (to be installed as root) at:
http://datagrid.in2p3.fr/distribution/wp2/RPMS
In order to install the Resource Broker and the Job Submission services, the following commands have to be issued with root privileges:
rpm -ivh workload-profile-X.Y.Z-K.i386.rpm
rpm -ivh proxy-X.Y.Z-K.i386.rpm
rpm -ivh jobsubmission-X.Y.Z-K.i386.rpm
rpm -ivh jobsubmission-profile-X.Y.Z-K.i386.rpm
By default, the jobsubmission-X.Y.Z-K.i386.rpm and proxy-X.Y.Z-K.i386.rpm rpms install the software in the “/opt/edg” directory, whilst jobsubmission-profile-X.Y.Z-K.i386.rpm installs its files in “/etc/rc.d/init.d” and “/etc/sysconfig”.
When the jobsubmission rpms have been installed, the following directory tree is created:
<install-path>/bin
<install-path>/bin/RBserver
<install-path>/bin/jssparser
<install-path>/bin/jssserver
<install-path>/etc
<install-path>/etc/jss.conf
<install-path>/etc/rb.conf
<install-path>/etc/wl-jss_rb-env.sh
<install-path>/lib
<install-path>/man
<install-path>/man/man3
<install-path>/man/man3/BROKER_INFOstruct.3
<install-path>/man/man3/CannotConfigure.3
<install-path>/man/man3/CannotReadFile.3
<install-path>/man/man3/ConfSchema.3
<install-path>/man/man3/DeletePointer.3
<install-path>/man/man3/GDMP_ReplicaCatalog.3
<install-path>/man/man3/InvalidURL.3
<install-path>/man/man3/JSSConfiguration.3
<install-path>/man/man3/JobWrapper.3
<install-path>/man/man3/JssClient.3
<install-path>/man/man3/LDAPConnection.3
<install-path>/man/man3/LDAPSynchConnection.3
<install-path>/man/man3/LogManager.3
<install-path>/man/man3/MalformedFile.3
<install-path>/man/man3/RBJobRegistry.3
<install-path>/man/man3/RBMaster.3
<install-path>/man/man3/RBReplicaCatalog.3
<install-path>/man/man3/RBReplicaCatalogEx.3
<install-path>/man/man3/RBjob.3
<install-path>/man/man3/URL.3
<install-path>/man/man3/brokerinfo.3
<install-path>/man/man3/do_CloseSEs_supply_CE_with_nfiles.3
<install-path>/man/man3/jsscommon.3
<install-path>/man/man3/jssthreads.3
<install-path>/man/man3/matchmaking.3
<install-path>/man/man3/rbargs_t.3
<install-path>/man/man3/rbhandlers.3
<install-path>/man/man3/rbthreads.3
<install-path>/man/man3/select_CE_on_files.3
<install-path>/sbin
<install-path>/sbin/broker
<install-path>/sbin/jobsubmission
<install-path>/share
<install-path>/share/doc
<install-path>/share/doc/Workload
<install-path>/share/doc/Workload/Broker
<install-path>/share/doc/Workload/Broker/COPYING
<install-path>/share/doc/Workload/Broker/NEWS
<install-path>/share/doc/Workload/Broker/README
<install-path>/share/doc/Workload/Broker/html
<install-path>/share/doc/Workload/Broker/html/annotated.html
<install-path>/share/doc/Workload/Broker/html/class_BROKER_INFOstruct-include.html
<install-path>/share/doc/Workload/Broker/html/class_BROKER_INFOstruct-members.html
<install-path>/share/doc/Workload/Broker/html/class_BROKER_INFOstruct.html
<install-path>/share/doc/Workload/Broker/html/class_ConfSchema-include.html
<install-path>/share/doc/Workload/Broker/html/class_ConfSchema-members.html
<install-path>/share/doc/Workload/Broker/html/class_ConfSchema.html
<install-path>/share/doc/Workload/Broker/html/class_GDMP_ReplicaCatalog-include.html
<install-path>/share/doc/Workload/Broker/html/class_GDMP_ReplicaCatalog-members.html
<install-path>/share/doc/Workload/Broker/html/class_GDMP_ReplicaCatalog.gif
<install-path>/share/doc/Workload/Broker/html/class_GDMP_ReplicaCatalog.html
<install-path>/share/doc/Workload/Broker/html/class_LDAPConnection-include.html
<install-path>/share/doc/Workload/Broker/html/class_LDAPConnection-members.html
<install-path>/share/doc/Workload/Broker/html/class_LDAPConnection.gif
<install-path>/share/doc/Workload/Broker/html/class_LDAPConnection.html
<install-path>/share/doc/Workload/Broker/html/class_LDAPSynchConnection-include.html
<install-path>/share/doc/Workload/Broker/html/class_LDAPSynchConnection-members.html
<install-path>/share/doc/Workload/Broker/html/class_LDAPSynchConnection.gif
<install-path>/share/doc/Workload/Broker/html/class_LDAPSynchConnection.html
<install-path>/share/doc/Workload/Broker/html/class_RBJobRegistry-include.html
<install-path>/share/doc/Workload/Broker/html/class_RBJobRegistry-members.html
<install-path>/share/doc/Workload/Broker/html/class_RBJobRegistry.html
<install-path>/share/doc/Workload/Broker/html/class_RBMaster-include.html
<install-path>/share/doc/Workload/Broker/html/class_RBMaster-members.html
<install-path>/share/doc/Workload/Broker/html/class_RBMaster.html
<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalog-include.html
<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalog-members.html
<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalog.gif
<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalog.html
<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalogEx-include.html
<install-path>/share/doc/Workload/Broker/html/class_RBReplicaCatalogEx.html
<install-path>/share/doc/Workload/Broker/html/class_RBjob-include.html
<install-path>/share/doc/Workload/Broker/html/class_RBjob-members.html
<install-path>/share/doc/Workload/Broker/html/class_RBjob.html
<install-path>/share/doc/Workload/Broker/html/class_do_CloseSEs_supply_CE_with_nfiles-include.html
<install-path>/share/doc/Workload/Broker/html/class_do_CloseSEs_supply_CE_with_nfiles-members.html
<install-path>/share/doc/Workload/Broker/html/class_do_CloseSEs_supply_CE_with_nfiles.gif
<install-path>/share/doc/Workload/Broker/html/class_do_CloseSEs_supply_CE_with_nfiles.html
<install-path>/share/doc/Workload/Broker/html/class_select_CE_on_files-include.html
<install-path>/share/doc/Workload/Broker/html/class_select_CE_on_files-members.html
<install-path>/share/doc/Workload/Broker/html/class_select_CE_on_files.gif
<install-path>/share/doc/Workload/Broker/html/class_select_CE_on_files.html
<install-path>/share/doc/Workload/Broker/html/doxygen.gif
<install-path>/share/doc/Workload/Broker/html/files.html
<install-path>/share/doc/Workload/Broker/html/functions.html
<install-path>/share/doc/Workload/Broker/html/globals.html
<install-path>/share/doc/Workload/Broker/html/group_ReplicaCatalog.html
<install-path>/share/doc/Workload/Broker/html/headers.html
<install-path>/share/doc/Workload/Broker/html/hierarchy.html
<install-path>/share/doc/Workload/Broker/html/index.html
<install-path>/share/doc/Workload/Broker/html/modules.html
<install-path>/share/doc/Workload/Broker/html/null.gif
<install-path>/share/doc/Workload/Broker/refman.ps
<install-path>/share/doc/Workload/Common
<install-path>/share/doc/Workload/Common/html
<install-path>/share/doc/Workload/Common/html/annotated.html
<install-path>/share/doc/Workload/Common/html/class_DeletePointer-include.html
<install-path>/share/doc/Workload/Common/html/class_DeletePointer-members.html
<install-path>/share/doc/Workload/Common/html/class_DeletePointer.html
<install-path>/share/doc/Workload/Common/html/class_InvalidURL-include.html
<install-path>/share/doc/Workload/Common/html/class_InvalidURL.html
<install-path>/share/doc/Workload/Common/html/class_URL-include.html
<install-path>/share/doc/Workload/Common/html/class_URL-members.html
<install-path>/share/doc/Workload/Common/html/class_URL.html
<install-path>/share/doc/Workload/Common/html/doxygen.gif
<install-path>/share/doc/Workload/Common/html/files.html
<install-path>/share/doc/Workload/Common/html/functions.html
<install-path>/share/doc/Workload/Common/html/group_Common.html
<install-path>/share/doc/Workload/Common/html/headers.html
<install-path>/share/doc/Workload/Common/html/index.html
<install-path>/share/doc/Workload/Common/html/modules.html
<install-path>/share/doc/Workload/Common/html/null.gif
<install-path>/share/doc/Workload/Common/refman.ps
<install-path>/share/doc/Workload/JobSubmission
<install-path>/share/doc/Workload/JobSubmission/AUTHORS
<install-path>/share/doc/Workload/JobSubmission/COPYING
<install-path>/share/doc/Workload/JobSubmission/NEWS
<install-path>/share/doc/Workload/JobSubmission/README
<install-path>/share/doc/Workload/JobSubmission/html
<install-path>/share/doc/Workload/JobSubmission/html/annotated.html
<install-path>/share/doc/Workload/JobSubmission/html/class_CannotConfigure-include.html
<install-path>/share/doc/Workload/JobSubmission/html/class_CannotConfigure-members.html
<install-path>/share/doc/Workload/JobSubmission/html/class_CannotConfigure.html
<install-path>/share/doc/Workload/JobSubmission/html/class_CannotReadFile-include.html
<install-path>/share/doc/Workload/JobSubmission/html/class_CannotReadFile-members.html
<install-path>/share/doc/Workload/JobSubmission/html/class_CannotReadFile.html
<install-path>/share/doc/Workload/JobSubmission/html/class_JSSConfiguration-include.html
<install-path>/share/doc/Workload/JobSubmission/html/class_JSSConfiguration-members.html
<install-path>/share/doc/Workload/JobSubmission/html/class_JSSConfiguration.html
<install-path>/share/doc/Workload/JobSubmission/html/class_JobWrapper-include.html
<install-path>/share/doc/Workload/JobSubmission/html/class_JobWrapper-members.html
<install-path>/share/doc/Workload/JobSubmission/html/class_JobWrapper.html
<install-path>/share/doc/Workload/JobSubmission/html/class_JssClient-include.html
<install-path>/share/doc/Workload/JobSubmission/html/class_JssClient-members.html
<install-path>/share/doc/Workload/JobSubmission/html/class_JssClient.html
<install-path>/share/doc/Workload/JobSubmission/html/class_LogManager-include.html
<install-path>/share/doc/Workload/JobSubmission/html/class_LogManager-members.html
<install-path>/share/doc/Workload/JobSubmission/html/class_LogManager.html
<install-path>/share/doc/Workload/JobSubmission/html/class_MalformedFile-include.html
<install-path>/share/doc/Workload/JobSubmission/html/class_MalformedFile-members.html
<install-path>/share/doc/Workload/JobSubmission/html/class_MalformedFile.html
<install-path>/share/doc/Workload/JobSubmission/html/class_rbargs_t-include.html
<install-path>/share/doc/Workload/JobSubmission/html/class_rbargs_t-members.html
<install-path>/share/doc/Workload/JobSubmission/html/class_rbargs_t.gif
<install-path>/share/doc/Workload/JobSubmission/html/class_rbargs_t.html
<install-path>/share/doc/Workload/JobSubmission/html/doxygen.gif
<install-path>/share/doc/Workload/JobSubmission/html/files.html
<install-path>/share/doc/Workload/JobSubmission/html/functions.html
<install-path>/share/doc/Workload/JobSubmission/html/globals.html
<install-path>/share/doc/Workload/JobSubmission/html/group_JobWrapper.html
<install-path>/share/doc/Workload/JobSubmission/html/group_JssClient.html
<install-path>/share/doc/Workload/JobSubmission/html/group_JssConfigure.html
<install-path>/share/doc/Workload/JobSubmission/html/group_JssError.html
<install-path>/share/doc/Workload/JobSubmission/html/group_JssParser.html
<install-path>/share/doc/Workload/JobSubmission/html/group_JssThreads.html
<install-path>/share/doc/Workload/JobSubmission/html/headers.html
<install-path>/share/doc/Workload/JobSubmission/html/hierarchy.html
<install-path>/share/doc/Workload/JobSubmission/html/index.html
<install-path>/share/doc/Workload/JobSubmission/html/modules.html
<install-path>/share/doc/Workload/JobSubmission/html/null.gif
<install-path>/share/doc/Workload/JobSubmission/refman.ps
/etc/rc.d/init.d
/etc/rc.d/init.d/broker
/etc/rc.d/init.d/jobsubmission
The directory bin contains the executables of the RB and JSS server processes: RBserver, jssserver and jssparser. The configuration files are stored in etc (see sections 4.2.4.1 and 4.2.4.2 below). The scripts to start and stop the RB and JSS processes are contained in “/etc/rc.d/init.d”.
Once the rpms have been installed, the RB and JSS services must be properly configured. This is done by editing the two files rb.conf and jss.conf, which are stored in <install-path>/etc. The actions to be performed to configure the Resource Broker and the Job Submission Service are described in the following two sections.
The Resource Broker is configured by editing the file “<install-path>/etc/rb.conf” and setting the contained attributes appropriately. They are listed hereafter, grouped according to the functionality they relate to:
- MDS_contact, MDS_port and MDS_timeout refer to the II service and respectively represent the hostname where this service is running, the port number, and the timeout in seconds when the RB queries the II. E.g.:
MDS_contact = "grid001f.cnaf.infn.it";
MDS_port = 2170;
MDS_timeout = 60;
- MDS_gris_port refers to the port to be used by the RB to contact the GRISes. E.g.:
MDS_gris_port = 2135;
- MDS_multi_attributes defines the list of the attributes that are multi-valued in the MDS (i.e. that can assume multiple values). It is recommended not to modify the default value for this parameter, which is currently:
MDS_multi_attributes = {
"AuthorizedUser",
"RunTimeEnvironment",
"CloseCE"
};
- MDS_basedn defines the basedn, i.e. the distinguished name (DN) to be used as the starting point for searches in the information index. It is recommended not to modify the default value for this parameter, which is currently set to:
MDS_basedn = "o=Grid";
- LB_contact and LB_port refer to the LB Server service and represent respectively the hostname and the port where the LB server is listening for connections. E.g.:
LB_contact = "grid004f.cnaf.infn.it";
LB_port = 7846;
The Logging library, i.e. the library providing the APIs for logging job events to the LB (which is linked into the RB), reads its immediate logging destination from the environment variable DGLOG_DEST (see section 4.1.5), hence it is not dealt with in the configuration file. DGLOG_DEST defaults to “x-dglog://localhost:15830”, which is the correct value, hence it normally does not need to be set; this implies that the LB local-logger services should normally run on the same host as the RB server. The logging function timeout is instead read from the environment variable DGLOG_TIMEOUT, which defaults to 2 seconds.
- JSS_contact and JSS_server_port refer to the JSS and represent respectively the hostname (it must be the same host as the RB server) and the port number (it must match the RB_client_port parameter in the jss.conf file, see section 4.2.4.2) on which the JSS server is listening. Moreover, JSS_client_port represents the port used by the RB to listen for JSS communications. The value of the latter parameter must match the JSS_server_port parameter in the jss.conf file (see section 4.2.4.2). An example for these parameters is reported hereafter:
JSS_contact = "grid004f.cnaf.infn.it";
JSS_client_port = 8881;
JSS_server_port = 9991;
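Because the two port pairs must agree across rb.conf and jss.conf, a mismatch is easy to introduce when editing only one file. The sketch below reads the relevant attributes from both files and compares them as described above. It assumes every attribute sits on a single "Name = value;" line (multi-line values such as MDS_multi_attributes are not handled); conf_get, port_pair_ok and check_rb_jss_ports are hypothetical helpers.

```shell
#!/bin/bash
# Sketch: cross-check the port pairs that must agree between rb.conf and jss.conf.
conf_get() {
    # Print the value of attribute $2 in file $1, without quotes or ';'.
    awk -v attr="$2" '
        {
            n = index($0, "=")
            if (n == 0) next
            name = substr($0, 1, n - 1)
            gsub(/[ \t]/, "", name)       # attribute name without blanks
            if (name != attr) next
            val = substr($0, n + 1)
            gsub(/[ \t";]/, "", val)      # drop blanks, quotes and semicolons
            print val
            exit
        }' "$1"
}

port_pair_ok() {
    # $1/$2: file and attribute on the RB side; $3/$4: file and attribute on the JSS side
    local a b
    a=$(conf_get "$1" "$2")
    b=$(conf_get "$3" "$4")
    if [ -z "$a" ] || [ "$a" != "$b" ]; then
        echo "mismatch: $2=$a in $1, $4=$b in $3" >&2
        return 1
    fi
}

check_rb_jss_ports() {
    local rb=$1 jss=$2 status=0
    # Port on which the JSS listens, as configured on both sides.
    port_pair_ok "$rb" JSS_server_port "$jss" RB_client_port || status=1
    # Port on which the RB listens for JSS communications.
    port_pair_ok "$rb" JSS_client_port "$jss" JSS_server_port || status=1
    return $status
}
```

A typical call is check_rb_jss_ports /opt/edg/etc/rb.conf /opt/edg/etc/jss.conf, assuming the default install prefix.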
- JSS_backlog and UI_backlog define the maximum number of simultaneous connections from JSS and UI, respectively, supported by the socket. Default values are:
JSS_backlog = 5;
UI_backlog = 5;
- UI_server_port is the port used by the RB server to listen for requests coming from the User Interface. The default value for this parameter is:
UI_server_port = 7771;
- RB_pool_size represents the maximum number of requests managed simultaneously by the RB server. The default value for this parameter is:
RB_pool_size = 16;
- RB_purge_threshold defines the threshold age in seconds for RBRegistry information. The RB purges all the information and frees the storage space of a job (input/output sandboxes) when more than RB_purge_threshold seconds have elapsed since the last update of the internal information database. The default value for this parameter is about one week:
RB_purge_threshold = 600000;
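As a quick check of the "about one week" figure, converting the default value to days gives:

```latex
\frac{600\,000~\mathrm{s}}{86\,400~\mathrm{s/day}} \approx 6.94~\mathrm{days},
\qquad 7~\mathrm{days} = 604\,800~\mathrm{s}
```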
- RB_cleanup_threshold represents the span of time (expressed in seconds) between two consecutive cleanups of the job registry. During the registry cleanup the RB removes all the entries of the jobs classified as ABORTED. At the end of the cleanup, if needed (see RB_purge_threshold), the purging of the registry is performed as well. The default value for this configuration parameter is:
RB_cleanup_threshold = 3600;
The administrator must in any case tailor this value according to the estimated amount of job input/output sandbox files in the given period, in order not to fill up the RB machine disk space.
- RB_sandbox_path represents the pathname of the root sandbox directory, i.e. the complete pathname of the directory where the RB creates the input and output sandbox directories and stores the “.Brokerinfo” file. The default value for this parameter is the temporary directory:
RB_sandbox_path = "/tmp";
- RB_logfile defines the name of the file used by the RB for recording its various events. The default value for this parameter is:
RB_logfile = "/var/tmp/RBserver.log";
- RB_logfile_size limits the size of the RB log file to the specified value: each time the file grows beyond this maximum, the RB flushes its content into a new file with the same name as the original but with the .old extension. The size is expressed in bytes. The default value for this parameter is:
RB_logfile_size = 5120000;
- RB_logfile_level. This parameter allows the user to specify the verbosity of the information the RB records in its log file. Possible values are: 0 (none), 1 (verylow), 2 (low), 3 (medium), 4 (high), 5 (veryhigh) and 6 (ugly). The default value for this configuration parameter is:
RB_logfile_level = 3;
- RB_submission_retries allows the user to specify the number of times the RB has to try to re-schedule and re-submit the job to JSS in case the submission to the CE fails (e.g. Globus down on the CE, network problems, etc.). The resubmission is tried for all the CEs satisfying the job requirements. When a job is submitted specifying the RetryCount attribute in the JDL, the RB performs a number of submission retries equal to the minimum of RetryCount and RB_submission_retries. The default value for this configuration parameter is:
RB_submission_retries = 3;
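The retry rule above amounts to taking the minimum of the two values. The following one-function sketch only restates that rule; effective_retries is a hypothetical helper, it assumes both values are non-negative integers, and it does not model jobs submitted without RetryCount, which this section does not describe.

```shell
#!/bin/bash
# Sketch: number of submission retries performed for a job that specifies
# RetryCount in its JDL, i.e. min(RetryCount, RB_submission_retries).
effective_retries() {
    # $1 = RetryCount from the JDL, $2 = RB_submission_retries (default 3)
    local retry_count=$1 rb_retries=${2:-3}
    if [ "$retry_count" -lt "$rb_retries" ]; then
        echo "$retry_count"
    else
        echo "$rb_retries"
    fi
}
```

With the default RB_submission_retries = 3, a job asking for RetryCount = 5 is still retried at most 3 times.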
- MyProxyServer. This parameter allows the user to specify the server host name of the MyProxy credential repository system to be contacted for periodic credential renewal. An example for this configuration parameter is provided hereafter:
MyProxyServer = "skurut.cesnet.cz";
- SkipJobSubmission. If this parameter is set to true, the Resource Broker skips the actual job submission: it aborts the job at the end of the match-making algorithm and notifies the Logging and Bookkeeping service by issuing a dgLogAbort with a text specifying the matching CE to which the job would have been sent if the JSS interaction had not been disabled. The default value for this configuration parameter is:
SkipJobSubmission = false;
- RB_notification_queue_size. This parameter represents the maximum number of notifications that the RB can handle. The default value for this configuration parameter is 32:
RB_notification_queue_size = 32
The last field in the rb.conf file must not be terminated by a semicolon.
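The semicolon rule and the RetryCount/RB_submission_retries interaction can be made concrete with a small Python sketch. It assumes one `name = value` field per line (lines without "=" are ignored) and only handles quoted strings, integers and true/false; it is an illustration of the rules stated above, not the RB's actual configuration parser, and the function names are hypothetical.

```python
def parse_rb_conf_fields(lines):
    """Parse rb.conf-style `name = value` fields, enforcing that every field
    ends with ';' except the last one (simplified sketch, one field per line)."""
    fields = [line.strip() for line in lines if "=" in line]
    conf = {}
    for i, field in enumerate(fields):
        is_last = i == len(fields) - 1
        # error if the last field has ';' or a non-last field lacks it
        if field.endswith(";") == is_last:
            where = "last field" if is_last else "field"
            raise ValueError(f"bad semicolon placement in {where}: {field}")
        name, _, value = field.rstrip(";").partition("=")
        value = value.strip()
        if value.startswith('"') and value.endswith('"'):
            value = value[1:-1]            # quoted string
        elif value in ("true", "false"):
            value = value == "true"        # boolean
        else:
            try:
                value = int(value)         # integer
            except ValueError:
                pass                       # leave anything else as a string
        conf[name.strip()] = value
    return conf


def effective_retries(conf, retry_count=None):
    """Submission retries for one job: min(RetryCount, RB_submission_retries).
    Assumption: without a JDL RetryCount, RB_submission_retries applies as is."""
    limit = conf.get("RB_submission_retries", 3)
    return limit if retry_count is None else min(retry_count, limit)


sample = [
    'RB_sandbox_path = "/tmp";',
    "RB_submission_retries = 3;",
    "SkipJobSubmission = false;",
    "RB_notification_queue_size = 32",
]
conf = parse_rb_conf_fields(sample)
print(effective_retries(conf, retry_count=5))  # prints 3
```

A job whose JDL asks for RetryCount = 5 therefore gets only 3 retries with the default configuration, while RetryCount = 1 would lower it to 1.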
The Job Submission Service is configured by editing the file “<install-path>/etc/jss.conf” and setting the parameters it contains as appropriate. They are listed hereafter together with their meanings:
- Condor_submit_file_prefix defines the prefix for the CondorG submission file name. The job identifier dg_jobId is appended to this prefix to build the actual submission file name. The default value for this parameter is:
Condor_submit_file_prefix = "/var/tmp/CondorG.sub";
- Condor_log_file defines the absolute path name of the CondorG log file, i.e. the file where the events for the submitted jobs are recorded. Default value for this parameter is:
Condor_log_file = "/var/tmp/CondorG.log";
- Condor_stdoe_dir defines the directory where the standard output and standard error files of CondorG are temporarily saved. Default value is:
Condor_stdoe_dir = "/var/tmp";
- Job_wrapper_file_prefix is the prefix for the Job Wrapper file name (i.e. the script, submitted to the CE, that wraps the actual job). As before, the job identifier dg_jobId is appended to this prefix to build the actual file name. Default value for this parameter is:
Job_wrapper_file_prefix = "/var/tmp/Job_wrapper.sh";
- Database_name is the name of the Postgres database where the JSS registers information about submitted jobs. This name must correspond to an existing database (how to create it is briefly described in section 4.2.1.1). The default value is the name of the database created automatically when Postgres is installed, i.e.:
Database_name = "template1";
- Database_table_name is the name of the table in the previous database. This table is created by the JSS itself if not found. Default value for this parameter is:
Database_table_name = "condor_submit";
- JSS_server_port and RB_client_port are, respectively, the port on which the JSS listens for communication from the RB and the port it uses to communicate with the RB server (e.g. for sending notifications). These two parameters must match, respectively, the JSS_client_port and JSS_server_port parameters in the rb.conf file (see section 4.2.4.1). Default values are:
JSS_server_port = 8881;
RB_client_port = 9991;
- Condor_log_file_size indicates the size in bytes at which the CondorG.log file has to be split. Default value is:
Condor_log_file_size = 64000;
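Because the two port pairs must mirror each other across jss.conf and rb.conf, a quick consistency check can catch a mismatch before the daemons are started. The following Python sketch assumes both files have already been parsed into dictionaries (parsing is not shown); the rb.conf values in the example are hypothetical, chosen simply to match the jss.conf defaults.

```python
def check_jss_rb_ports(jss_conf, rb_conf):
    """Return mismatches between the port parameters of jss.conf and rb.conf.

    Both arguments are dicts of already-parsed parameters (an assumption of
    this sketch)."""
    pairs = [
        # (jss.conf name, rb.conf name)
        ("JSS_server_port", "JSS_client_port"),  # JSS listens, RB connects
        ("RB_client_port", "JSS_server_port"),   # JSS contacts the RB server
    ]
    problems = []
    for jss_key, rb_key in pairs:
        jss_val, rb_val = jss_conf.get(jss_key), rb_conf.get(rb_key)
        if jss_val != rb_val:
            problems.append(f"jss.conf {jss_key}={jss_val} "
                            f"does not match rb.conf {rb_key}={rb_val}")
    return problems


jss = {"JSS_server_port": 8881, "RB_client_port": 9991}  # jss.conf defaults
rb = {"JSS_client_port": 8881, "JSS_server_port": 9991}  # hypothetical rb.conf
print(check_jss_rb_ports(jss, rb))  # prints []
```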
Environment variables that have to be set for the RB are listed hereafter:
- PGSQL_INSTALL_PATH the Postgres database installation path. Default value is /usr/local/pgsql
- PGDATA the path where the Postgres database data files are stored. Default value is /usr/local/pgsql/data
- GDMP_INSTALL_PATH the gdmp installation path. Default value is /opt/edg.
Setting PGSQL_INSTALL_PATH and PGDATA is only needed if the installation is not performed from rpm. Moreover, $GDMP_INSTALL_PATH/lib has to be added to LD_LIBRARY_PATH. Finally, there are other environment variables needed at run-time by the RB. They are:
- EDG_WL_RB_CONFIG_DIR the RB configuration directory
- X509_HOST_CERT the host certificate file path
- X509_HOST_KEY the host private key file path
- X509_USER_PROXY the user proxy certificate file path
- GRIDMAP location of the Globus grid-mapfile that translates X509 certificate subjects into local Unix usernames. The default is /etc/grid-security/grid-mapfile.
In any case, all the variables in the latter group are set by the broker start-up script.
Environment variables that have to be set for the JSS are listed hereafter:
- PGSQL_INSTALL_PATH the Postgres database installation path. Default value is /usr/local/pgsql
- PGDATA the path where the Postgres database data files are stored. Default value is /usr/local/pgsql/data
- PGUSER the user that has been used to start the Postgres services. Default value is postgres
- CONDOR_CONFIG the CondorG configuration file path. Default value is /home/dguser/CondorG/etc/condor_config
- CONDORG_INSTALL_PATH the CondorG installation path. Default value is /home/dguser/CondorG
Setting the former variables is only needed if the installation is not performed from rpms. However, when installing from rpms, do not forget to check them in the file /opt/edg/etc/wl-jss_rb-env.sh. Moreover:
- $CONDORG_INSTALL_PATH/bin
- $CONDORG_INSTALL_PATH/sbin
- $PGSQL_INSTALL_PATH/bin (only if installation is not performed from rpm)
must be included in the PATH environment variable and
- $CONDORG_INSTALL_PATH/lib,
- $PGSQL_INSTALL_PATH/lib (only if installation is not performed from rpm)
have to be added to LD_LIBRARY_PATH. Finally, there are other environment variables needed at run-time by JSS. They are:
- EDG_WL_JSS_CONFIG_DIR the JSS configuration directory
- X509_HOST_CERT the host certificate file path
- X509_HOST_KEY the host private key file path
- X509_USER_PROXY the user proxy certificate file path
- GRIDMAP location of the Globus grid-mapfile that translates X509 certificate subjects into local Unix usernames. The default is /etc/grid-security/grid-mapfile.
In any case, all the variables in the latter group are set by the jobsubmission start-up script.
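The PATH and LD_LIBRARY_PATH rules for the JSS can be summarised in a short Python sketch. It only reproduces the rules listed above (CondorG directories always, Postgres directories only for non-rpm installations) using the documented default paths; the function name is hypothetical, and in a real deployment these settings come from the start-up scripts and /opt/edg/etc/wl-jss_rb-env.sh.

```python
def jss_paths(env, condorg="/home/dguser/CondorG",
              pgsql="/usr/local/pgsql", from_rpm=True):
    """Return a copy of env with the JSS directories prepended to PATH and
    LD_LIBRARY_PATH, following the rules listed above."""
    bin_dirs = [condorg + "/bin", condorg + "/sbin"]
    lib_dirs = [condorg + "/lib"]
    if not from_rpm:  # Postgres dirs are only needed for non-rpm installs
        bin_dirs.append(pgsql + "/bin")
        lib_dirs.append(pgsql + "/lib")
    new_env = dict(env)
    for var, dirs in (("PATH", bin_dirs), ("LD_LIBRARY_PATH", lib_dirs)):
        existing = new_env.get(var)
        # keep any existing value after the new directories
        new_env[var] = ":".join(dirs + ([existing] if existing else []))
    return new_env


env = jss_paths({"PATH": "/usr/bin"})
print(env["PATH"])
```

The result could, for instance, be passed as the env argument of subprocess.run() when launching a JSS-related command by hand.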
The Information Index (II) is the service queried by the Resource Broker to get information about resources for the submitted jobs during the matchmaking process. An II must hence be deployed for each RB/JSS instance.
This section describes steps to be performed to install and configure the Information Index service.
For installing the II, apart from the informationindex and informationindex-profile rpms (see section 4.3.2 for details), the following Globus Toolkit 2.0 and Datagrid rpms are needed:
- globus_ssl_utils-gcc32dbg_rtl version >= 2.1
- globus_gram_reporter-noflavor_data version >= 2.0
- globus_gss_assist-gcc32dbg_rtl version >= 2.0
- globus_libtool-gcc32dbgpthr_rtl version >= 1.4
- globus_openssl-gcc32dbg_rtl version >= 0.9.6b
- globus_openldap-gcc32dbg_pgm version >= 2.0.14
- globus_libtool-gcc32dbg_rtl version >= 1.4
- globus_openssl-gcc32dbgpthr_rtl version >= 0.9.6b
- globus_openldap-gcc32dbg_rtl version >= 2.0.14
- globus_mds_back_giis-gcc32dbg_pgm version >= 0.3
- globus_mds_gris-noflavor_data version >= 2.2
- globus_cyrus_sasl-gcc32dbg_rtl version >= 1.5.27
- globus_cyrus_sasl-gcc32dbgpthr_rtl version >= 1.5.27
- globus_gssapi_gsi-gcc32dbg_rtl version >= 2.0
- globus_openldap-gcc32dbgpthr_rtl version >= 2.0.14
- edg-info-main version >= 1.0.0
The above listed rpms are available at http://datagrid.in2p3.fr/distribution/globus under the directory beta-xx/RPMS (recommended beta is 21 or higher) and at http://datagrid.in2p3.fr/distribution/datagrid/wp6.
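The requirements above are minimum versions, some with alphabetic suffixes (e.g. 0.9.6b). The Python sketch below checks an installed version against such a minimum using a simplified segment-by-segment comparison; it is not rpm's exact rpmvercmp algorithm, which has further rules. The installed version itself could be read with rpm -q --qf '%{VERSION}' <rpm name>.

```python
import re


def version_key(version):
    """Split a version such as '0.9.6b' into comparable segments.

    Simplified assumption: numeric segments compare as integers, alphabetic
    ones as strings, and a numeric segment sorts after an alphabetic one."""
    segments = re.findall(r"\d+|[A-Za-z]+", version)
    return [(1, int(s)) if s.isdigit() else (0, s) for s in segments]


def satisfies(installed, required):
    """True if the installed version is >= the required minimum version."""
    return version_key(installed) >= version_key(required)


# Minimum versions taken from the II requirement list above (excerpt)
II_MINIMUMS = {
    "globus_ssl_utils-gcc32dbg_rtl": "2.1",
    "globus_openssl-gcc32dbg_rtl": "0.9.6b",
    "globus_openldap-gcc32dbg_pgm": "2.0.14",
    "edg-info-main": "1.0.0",
}

print(satisfies("0.9.6b", II_MINIMUMS["globus_openssl-gcc32dbg_rtl"]))  # True
print(satisfies("2.0", II_MINIMUMS["globus_ssl_utils-gcc32dbg_rtl"]))   # False
```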
All the needed packages can be downloaded with the command
wget -nd -r <URL>/<rpm name>
and installed with
rpm -ivh <rpm name>
In order to install the Information Index service, the following commands have to be issued with root privileges:
rpm -ivh workload-profile.X.Y.Z-K.i386.rpm
rpm -ivh informationindex.X.Y.Z-K.i386.rpm
rpm -ivh informationindex-profile.X.Y.Z-K.i386.rpm
By default the informationindex rpm installs the software in the “/opt/edg” directory, whilst the informationindex-profile rpm installs the start-up script in “/etc/rc.d/init.d”.
When the informationindex rpms have been installed, the following directory tree is created:
<install-path>/etc
<install-path>/etc/grid-info-site-giis.conf
<install-path>/etc/grid-info-slapd-giis.conf
<install-path>/sbin
<install-path>/sbin/information_index
<install-path>/share
<install-path>/share/doc
<install-path>/share/doc/Workload
<install-path>/share/doc/Workload/InformIndex
<install-path>/share/doc/Workload/InformIndex/COPYING
<install-path>/share/doc/Workload/InformIndex/NEWS
<install-path>/share/doc/Workload/InformIndex/README
<install-path>/var
/etc/rc.d/init.d
/etc/rc.d/init.d/information_index
Under the installation path, etc holds the configuration files, while var (initially empty) is used by the II to store the files created at start-up, which contain the arguments and pid of the II process. The information_index script can be used from either /etc/rc.d/init.d or <install-path>/sbin to start the II.
The II has two configuration files, located in <install-path>/etc and named:
- grid-info-slapd-giis.conf
- grid-info-site-giis.conf
grid-info-slapd-giis.conf specifies the schema file locations and the database type, whilst grid-info-site-giis.conf lists the entries for the GRISes that are registered to this II. Each entry has the following format:
dn: service=register, dc=mi, dc=infn, dc=it, o=grid
objectclass: GlobusTop
objectclass: GlobusDaemon
objectclass: GlobusService
objectclass: GlobusServiceMDSResource
Mds-Service-type: ldap
Mds-Service-hn: bbq.mi.infn.it
Mds-Service-port: 2135
Mds-Service-Ldap-sizelimit: 20
Mds-Service-Ldap-ttl: 200
Mds-Service-Ldap-cachettl: 50
Mds-Service-Ldap-timeout: 30
Mds-Service-Ldap-suffix: o=grid
The field Mds-Service-hn specifies the GRIS host address and Mds-Service-port the GRIS port (2135 is strongly recommended), whilst the remaining Mds-Service-Ldap-* entries set LDAP parameters such as the size limit and the time-to-live (ttl). To add a new GRIS to the given II, it suffices to add a new entry like the one just shown to the grid-info-site-giis.conf file.
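Adding a GRIS amounts to appending one more entry of the format above. The following Python sketch generates such an entry from its variable parts, using the values of the example as defaults. The dc components are passed in explicitly: in the example they happen to match the GRIS host's domain, but this sketch does not assume that is required. Separating entries with a blank line is likewise an assumption.

```python
def gris_entry(host, dc_components, port=2135, sizelimit=20, ttl=200,
               cachettl=50, timeout=30):
    """Return a grid-info-site-giis.conf entry registering one GRIS."""
    dc = ", ".join(f"dc={c}" for c in dc_components)
    lines = [
        f"dn: service=register, {dc}, o=grid",
        "objectclass: GlobusTop",
        "objectclass: GlobusDaemon",
        "objectclass: GlobusService",
        "objectclass: GlobusServiceMDSResource",
        "Mds-Service-type: ldap",
        f"Mds-Service-hn: {host}",
        f"Mds-Service-port: {port}",
        f"Mds-Service-Ldap-sizelimit: {sizelimit}",
        f"Mds-Service-Ldap-ttl: {ttl}",
        f"Mds-Service-Ldap-cachettl: {cachettl}",
        f"Mds-Service-Ldap-timeout: {timeout}",
        "Mds-Service-Ldap-suffix: o=grid",
    ]
    return "\n".join(lines) + "\n"


entry = gris_entry("bbq.mi.infn.it", ["mi", "infn", "it"])
print(entry)
```

The text could then be appended (preceded by a blank line) to <install-path>/etc/grid-info-site-giis.conf with open(path, "a").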
Another file that can be used to configure the II is the start-up script information_index. This file specifies the port on which the II listens for requests, whose default is 2170. This value can be changed to make the II listen on another port, provided it matches the value of the MDS_port attribute in the RB configuration file rb.conf (see section 4.2.4.1).
The only environment variable needed by the II to run is the Globus installation path GLOBUS_LOCATION, which is in any case set by the start-up script information_index.
This section describes the steps needed to install and configure the User Interface, which is the software module of the WMS allowing the user to access the main services made available by the components of the scheduling sub-layer.
In order to install the UI you will need the following packages (see section 4.4.2 for details):
- workload-profile.X.Y.Z-K.i386.rpm
- userinterface-profile.X.Y.Z-K.i386.rpm
- userinterface-X.Y.Z-K.i386.rpm
In addition, the following Globus Toolkit 2.0 and Datagrid rpms, available respectively at http://datagrid.in2p3.fr/distribution/globus and http://datagrid.in2p3.fr/distribution/datagrid/wp6, are needed:
- globus_gss_assist-gcc32dbgpthr_rtl-2.0-21
- globus_gssapi_gsi-gcc32dbgpthr_rtl-2.0-21
- globus_ssl_utils-gcc32dbgpthr_rtl-2.1-21
- globus_gass_transfer-gcc32dbg_rtl-2.0-21
- globus_openssl-gcc32dbgpthr_rtl-0.9.6b-21
- globus_ftp_control-gcc32dbg_rtl-1.0-21
- globus_user_env-noflavor_data-2.1-21
- globus_gss_assist-gcc32dbg_rtl-2.0-21
- globus_gssapi_gsi-gcc32dbg_rtl-2.0-21
- globus_ftp_client-gcc32dbg_rtl-1.1-21
- globus_ssl_utils-gcc32dbg_rtl-2.1-21
- globus_ssl_utils-gcc32dbg_pgm-2.1-21
- globus_gass_copy-gcc32dbg_rtl-2.0-21
- globus_gass_copy-gcc32dbg_pgm-2.0-21
- globus_openssl-gcc32dbg_rtl-0.9.6b-21
- globus_common-gcc32dbg_rtl-2.0-21
- globus_profile-edgconfig-0.9-1
- globus_io-gcc32dbg_rtl-2.0-21
- globus_core-edgconfig-0.6-2
- obj-globus-1.0-4.edg
- globus_cyrus_sasl-gcc32dbgpthr_rtl-1.5.27-21
- globus_libtool-gcc32dbgpthr_rtl-1.4-21
- globus_mds_common-gcc32dbg_pgm-2.2-21
- globus_openldap-gcc32dbg_pgm-2.0.14-21
- globus_openldap-gcc32dbgpthr_rtl-2.0.14-21
- globus_core-gcc32dbg_pgm-2.1-21
Moreover, the set of security configuration rpms for all the Certificate Authorities in Testbed1, available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/, has to be installed together with the rpm used for renewing your certificate with your CA. The latter is available at http://datagrid.in2p3.fr/distribution/datagrid/security/RPMS/local/.
The Python interpreter, version 2.1.1, also has to be installed on the submitting machine. The rpm for this package is available at http://datagrid.in2p3.fr/distribution/external/RPMS as:
- python-2.1.1-3.i386.rpm
Information about Python and the package sources can be found at www.python.org.
Since the Linux RH 6.2 and RH 7.2 distributions already include Python 1.5, and the recent standard Python 2 rpms from RedHat and python.org avoid conflicts with previous versions by only creating python2* binaries, the UI scripts use the “python2” executable as Python interpreter. Before using the UI commands it is therefore important to check that the “python2” executable is available on the submission platform and, if it is not, to create the necessary symbolic link.
All the needed packages can be downloaded with the command
wget -nd -r <URL>/<rpm name>
and installed with
rpm -ivh <rpm name>
In order to install the User Interface, the following commands have to be issued with root privileges:
rpm -ivh workload-profile.X.Y.Z-K.i386.rpm
rpm -ivh userinterface-profile.X.Y.Z-K.i386.rpm
rpm -ivh userinterface-X.Y.Z-K.i386.rpm
By default the rpm installs the software in the “/opt/edg” directory.
After the userinterface* and the workload rpms have been installed, the following directory tree is created:
<install-path>/bin
<install-path>/bin/JobAdv.py
<install-path>/bin/JobAdv.pyc
<install-path>/bin/UIchecks.py
<install-path>/bin/UIchecks.pyc
<install-path>/bin/UIutils.py
<install-path>/bin/UIutils.pyc
<install-path>/bin/dg-job-cancel
<install-path>/bin/dg-job-get-logging-info
<install-path>/bin/dg-job-get-output
<install-path>/bin/dg-job-id-info
<install-path>/bin/dg-job-list-match
<install-path>/bin/dg-job-status
<install-path>/bin/dg-job-submit
<install-path>/bin/libRBapi.py
<install-path>/bin/libRBapi.pyc
<install-path>/etc
<install-path>/etc/UI_ConfigENV.cfg
<install-path>/etc/UI_Errors.cfg
<install-path>/etc/UI_Help.cfg
<install-path>/etc/job_template.tpl
<install-path>/lib
<install-path>/lib/libLBapi.a
<install-path>/lib/libLBapi.la
<install-path>/lib/libLBapi.so
<install-path>/lib/libLBapi.so.0
<install-path>/lib/libLBapi.so.0.0.0
<install-path>/lib/libLOGapi.a
<install-path>/lib/libLOGapi.la
<install-path>/lib/libLOGapi.so
<install-path>/lib/libLOGapi.so.0
<install-path>/lib/libLOGapi.so.0.0.0
<install-path>/lib/libRBapic.a
<install-path>/lib/libRBapic.la
<install-path>/lib/libRBapic.so
<install-path>/lib/libRBapic.so.0
<install-path>/lib/libRBapic.so.0.0.0
<install-path>/share
<install-path>/share/doc
<install-path>/share/doc/Workload
<install-path>/share/doc/Workload/UserInterface
<install-path>/share/doc/Workload/UserInterface/COPYING
<install-path>/share/doc/Workload/UserInterface/NEWS
<install-path>/share/doc/Workload/UserInterface/README
/etc/profile.d/wl-ui-env.sh
/etc/profile.d/wl-ui-env.csh
The bin directory contains all the UI Python scripts, including the commands made available to the user. The lib directory contains all the API wrapper shared libraries, while etc contains the error and configuration files UI_Errors.cfg and UI_ConfigENV.cfg, plus the help file (UI_Help.cfg) and a template of a job description in JDL (job_template.tpl).
The User Interface is configured by editing the file “<install-path>/etc/UI_ConfigENV.cfg” and setting the parameters it contains as appropriate. They are listed hereafter together with their meanings:
- DEFAULT_STORAGE_AREA_IN defines the path of the directory where the files coming from the RB (i.e. the jobs' Output Sandbox files) are stored if no other location is specified by the user through the command options. Default value for this parameter is:
DEFAULT_STORAGE_AREA_IN = /tmp
- requirements and rank are the values assigned by the UI to the corresponding (mandatory) job attributes if these have not been provided by the user in the JDL file describing the job. Default values are:
requirements = TRUE
rank = - other.EstimatedTraversalTime
If the user has provided an expression for the requirements attribute in the JDL, the one specified in the configuration file is added (in logical AND) to the existing one. E.g. if the configuration file contains:
requirements = other.Active
and in the JDL file the user has specified:
requirements = other.LRMSType == "PBS";
then the job description that is passed to the RB will contain:
requirements = other.LRMSType == "PBS" && other.Active ;
Obviously, the value TRUE for the requirements in the configuration file does not have any impact on the evaluation of the job requirements, since it yields:
requirements = other.LRMSType == "PBS" && TRUE ;
It is also possible to disable the default for the rank attribute by setting it to 0 (i.e. rank = 0) in the configuration file. With such a default, if no rank is specified in the JDL, all matching resources are assigned an equal rank (i.e. 0), which is equivalent to no ranking.
- ErrorStorage represents the path of the location where the UI creates log files. Default location is:
ErrorStorage = /tmp
- RetryCountLB and RetryCountJobId are the number of UI retries on fatal errors when, respectively, opening a connection with an LB and querying the LB for information about a given job. Default values for these parameters are:
RetryCountLB = 1
RetryCountJobId = 1
- LoggingTimeout represents the timeout of the dgLogTransfer LB API called by the UI for logging the JobTransfer event. The UI sets the environment variable DGLOG_TIMEOUT accordingly. If not provided in the configuration file, it defaults to 2 seconds (UI and logging services on the same host). The recommended value for UIs that are not local to the logging services is 10 to 15 seconds. The value for this variable in the UI configuration file is:
LoggingTimeout = 10
Moreover, there are two sections reserved for the addresses of the LBs and RBs that are accessible to the UI from the machine where it is installed. Special markers (e.g. %%beginLB%%), which must not be modified, indicate the beginning and end of each section. An example of the two sections is reported hereafter:
%%beginLB%%
https://grid013g.cnaf.infn.it:7846
https://grid004f.cnaf.infn.it:7846
https://skurut.cesnet.cz:7846
%%endLB%%
%%beginRB%%
grid013g.cnaf.infn.it:7771
grid004f.cnaf.infn.it:7771
%%endRB%%
LB addresses must be in the format:
[<protocol>://]<hostname>:<port>
where, if not provided, <protocol> defaults to “https” and <port> to 7846.
RB addresses must instead be in the format:
<hostname>:<port>
i.e. no protocol is admitted. If not provided, default for <port> is 7771.
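Reading the LB and RB sections and applying the address defaults can be sketched in Python as below. The marker handling and default values follow the rules just described; the function names are hypothetical, and the sketch is not the UI's own parsing code.

```python
def parse_sections(text):
    """Collect the lines between %%beginXX%% and %%endXX%% markers."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("%%begin") and line.endswith("%%"):
            current = line[len("%%begin"):-2]   # e.g. "LB" or "RB"
            sections[current] = []
        elif line.startswith("%%end") and line.endswith("%%"):
            current = None
        elif current is not None and line:
            sections[current].append(line)
    return sections


def parse_lb_address(addr):
    """[<protocol>://]<hostname>[:<port>] -> (protocol, host, port)."""
    protocol, sep, rest = addr.partition("://")
    if not sep:  # no protocol given
        protocol, rest = "https", addr
    host, _, port = rest.partition(":")
    return protocol, host, int(port) if port else 7846


def parse_rb_address(addr):
    """<hostname>[:<port>] -> (host, port); a protocol is not allowed."""
    if "://" in addr:
        raise ValueError(f"RB address must not contain a protocol: {addr}")
    host, _, port = addr.partition(":")
    return host, int(port) if port else 7771


cfg_text = """%%beginLB%%
https://grid013g.cnaf.infn.it:7846
skurut.cesnet.cz
%%endLB%%
%%beginRB%%
grid013g.cnaf.infn.it:7771
%%endRB%%"""
sections = parse_sections(cfg_text)
lbs = [parse_lb_address(a) for a in sections["LB"]]
rbs = [parse_rb_address(a) for a in sections["RB"]]
print(lbs, rbs)
```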
The LB addresses are used by the User Interface to know which LB servers have to be contacted when querying for job information. They are used only when the issued command pertains to “all jobs owned by a user” (e.g. see dg-job-status –all in section 6.1.3): in this case all the listed LBs are queried, whilst when a job identifier (dg_jobId) is specified the LB address is taken directly from the dg_jobId (see section 6.1.3 for details on the job identifier format).
The RB addresses are used by the User Interface to know which Resource Brokers can be accessed for job submission. When the user submits a job, the first RB in the list is tried; if it is not available for some reason, the connection to the second one is tried, and so on until an available RB is found. The same happens when asking for the list of matching CEs for a job (see the dg-job-submit and dg-job-list-match commands in section 6.1.3).
The RB addresses are instead used in a similar way as the LB addresses when the user asks for the cancellation of all of their jobs: in this case all the listed RBs are asked to delete the jobs owned by the requesting user (see dg-job-cancel –all in section 6.1.3).
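The ordered fail-over over the configured RBs can be sketched in Python as follows. The connect argument is a hypothetical stand-in for the UI's real connection code and is assumed to raise OSError (e.g. ConnectionRefusedError) when an RB cannot be reached; the sketch only illustrates the "first available RB in list order" rule described above.

```python
from typing import Callable, Iterable, Tuple


def first_available(rbs: Iterable[str],
                    connect: Callable[[str], object]) -> Tuple[str, object]:
    """Try the RBs in configuration order and return (address, connection)
    for the first one that accepts the connection."""
    failures = []
    for rb in rbs:
        try:
            return rb, connect(rb)
        except OSError as exc:  # RB down, unreachable, refusing, ...
            failures.append(f"{rb}: {exc}")
    raise ConnectionError("no RB available; tried " + "; ".join(failures))
```

The same loop would serve both job submission and the list-match request, since both follow the same ordering rule.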
Environment variables that have to be set for the User Interface are listed hereafter:
- X509_USER_KEY the user private key file path. Default value is $HOME/.globus/userkey.pem
- X509_USER_CERT the user certificate file path. Default value is $HOME/.globus/usercert.pem
- X509_CERT_DIR the trusted certificate and ca-signing-policy directory. Default value is /etc/grid-security/certificates
- X509_USER_PROXY the user proxy certificate file path. Default value is /tmp/x509up_u<UID>, where UID is the user identifier on the machine, as required by GSI.
Moreover there are:
- EDG_WL_UI_CONFIG_PATH non-standard location of the UI configuration file UI_ConfigENV.cfg. This variable points to the absolute path of the file.
- EDG_WL_LOCATION UI install path. It has to be set only if the installation has been made in a non-default location. It defaults to /opt/edg
- GLOBUS_LOCATION the Globus rpms installation path.
The latter two variables are in any case set automatically once the userinterface-profile rpm is installed.
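How these variables resolve to actual paths, with the defaults listed above, can be sketched in Python. The fallback for EDG_WL_UI_CONFIG_PATH (<EDG_WL_LOCATION>/etc/UI_ConfigENV.cfg) is an assumption inferred from the installation tree shown earlier; the function name is hypothetical.

```python
import os


def ui_paths(env=None, uid=None):
    """Resolve the UI credential and configuration paths with their defaults."""
    env = dict(os.environ) if env is None else env
    uid = os.getuid() if uid is None else uid
    home = env.get("HOME", "")
    location = env.get("EDG_WL_LOCATION", "/opt/edg")
    return {
        "X509_USER_KEY": env.get("X509_USER_KEY", f"{home}/.globus/userkey.pem"),
        "X509_USER_CERT": env.get("X509_USER_CERT", f"{home}/.globus/usercert.pem"),
        "X509_CERT_DIR": env.get("X509_CERT_DIR", "/etc/grid-security/certificates"),
        "X509_USER_PROXY": env.get("X509_USER_PROXY", f"/tmp/x509up_u{uid}"),
        # assumed from the <install-path>/etc layout shown earlier
        "EDG_WL_UI_CONFIG_PATH": env.get(
            "EDG_WL_UI_CONFIG_PATH", f"{location}/etc/UI_ConfigENV.cfg"),
    }


paths = ui_paths({"HOME": "/home/alice"}, uid=500)
print(paths["X509_USER_PROXY"])  # prints /tmp/x509up_u500
```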
The Logging library, i.e. the library linked into the UI for logging the job transfer events, reads its immediate logging destination from the variable DGLOG_DEST. The correct format for this variable is:
DGLOG_DEST=x-dglog://HOST:PORT
where HOST defaults to localhost and PORT defaults to 15830. On the submitting machine, if the variable is not set, the UI dynamically assigns it the value:
DGLOG_DEST=x-dglog://<LB_CONTACT>:15830
where LB_CONTACT is the hostname of the machine where the LB server currently associated to the RB used for submitting jobs is running.
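The DGLOG_DEST handling just described (user setting if present, otherwise the LB host on port 15830, with localhost/15830 defaults when parsing) can be illustrated with a short Python sketch. The function names are hypothetical and the sketch is not the Logging library's own code.

```python
def dglog_dest(env, lb_contact):
    """DGLOG_DEST as used by the UI: the user's setting if present,
    otherwise x-dglog://<LB_CONTACT>:15830."""
    return env.get("DGLOG_DEST", f"x-dglog://{lb_contact}:15830")


def parse_dglog_dest(value):
    """Split x-dglog://HOST:PORT into (host, port), applying the
    localhost / 15830 defaults described above."""
    prefix = "x-dglog://"
    if not value.startswith(prefix):
        raise ValueError(f"DGLOG_DEST must start with {prefix}: {value}")
    host, _, port = value[len(prefix):].partition(":")
    return host or "localhost", int(port) if port else 15830


print(parse_dglog_dest(dglog_dest({}, "grid013g.cnaf.infn.it")))
# prints ('grid013g.cnaf.infn.it', 15830)
```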
The userguide documentation package (see section 3.2.2 for more details) provides all the information needed to download, configure, install and use the Datagrid software. Once you have installed the userguide rpm, the following directory tree is created:
<install-path>/share
<install-path>/share/doc
<install-path>/share/doc/DataGrid_01_TEN_0118_0_X_Document.pdf
For security purposes, all the WMS daemons run with proxy certificates. These certificates are generated by the start-up scripts described in the following section, before the applications are started. The lifetime of the proxies created by the start-up scripts is 24 hours. In order to provide the daemons with valid proxies for their whole lifetime, administrators need to ensure that new proxies are generated regularly. This can be achieved by adding the following lines to the machine's /etc/crontab:
57 2,8,14,20 * * * root service locallogger proxy
57 2,8,14,20 * * * root service lbserver proxy
57 2,8,14,20 * * * root service broker proxy
57 2,8,14,20 * * * root service jobsubmission proxy
With these entries, cron regenerates the proxies four times a day (at 02:57, 08:57, 14:57 and 20:57).
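The margin this schedule leaves can be checked with a little arithmetic, sketched here in Python: the longest interval between two consecutive runs (including the one across midnight) is compared with the 24-hour proxy lifetime stated above. Only the hour field of the crontab entries is considered, since the minute is the same for every run.

```python
def max_gap_hours(run_hours):
    """Largest interval, in hours, between consecutive daily cron runs,
    including the wrap-around from the last run to the next day's first."""
    hours = sorted(run_hours)
    gaps = [later - earlier for earlier, later in zip(hours, hours[1:])]
    gaps.append(24 - hours[-1] + hours[0])  # across midnight
    return max(gaps)


PROXY_LIFETIME_HOURS = 24    # lifetime stated above
CRON_HOURS = [2, 8, 14, 20]  # hour field of the crontab entries
gap = max_gap_hours(CRON_HOURS)
print(f"longest gap {gap} h, proxy lifetime {PROXY_LIFETIME_HOURS} h")
# prints: longest gap 6 h, proxy lifetime 24 h
```

With a 6-hour renewal interval, a proxy would still be valid even if a few consecutive cron runs failed.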
To run the LB local-logger services, it suffices to issue as root the following command:
/etc/rc.d/init.d/locallogger start
if the locallogger-profile rpm has been installed. Otherwise you can use
<install path>/sbin/locallogger start
This makes both the dglogd and the interlogger processes start.
The same can be done issuing the following commands:
<install path>/sbin/dglogd <options>
<install path>/sbin/interlogger <options>
Both daemons recognize a common set of options:
--key=<keyfile> host certificate private key file (this option overrides the value of the environment variable X509_USER_KEY). Example of usage:
--key=/etc/grid-security/hostkey.pem
--cert=<certfile> host certificate file (this option overrides the value of the environment variable X509_USER_CERT). Example of usage:
--cert=/etc/grid-security/hostcert.pem
--CAdir=<certdir> trusted certificate and ca-signing-policy directory (this option overrides the value of the environment variable X509_CERT_DIR). Example of usage:
--CAdir=/etc/grid-security/certificates
--file-prefix=<file path> absolute path of the file where the logged events are stored locally. The default value is /tmp/dglog, which entails a risk of data loss in case of reboot. Note that the same value must be specified for dglogd and interlogger.
--debug makes the process run in the foreground to produce diagnostics
Using the options explicitly is recommended rather than relying on the corresponding environment variables.
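Since the options must be passed explicitly and --file-prefix must be identical for both daemons, it can help to build both command lines from one set of values. The Python sketch below does this using the example paths above and /opt/edg as install path; the file prefix in the example call is a hypothetical value. Each resulting list could be passed to subprocess.run(); shlex.join requires Python 3.8 or later.

```python
import shlex


def locallogger_commands(file_prefix, install_path="/opt/edg",
                         key="/etc/grid-security/hostkey.pem",
                         cert="/etc/grid-security/hostcert.pem",
                         ca_dir="/etc/grid-security/certificates"):
    """Build dglogd and interlogger argument lists sharing one --file-prefix."""
    common = [f"--key={key}", f"--cert={cert}", f"--CAdir={ca_dir}",
              f"--file-prefix={file_prefix}"]
    return [[f"{install_path}/sbin/{daemon}"] + common
            for daemon in ("dglogd", "interlogger")]


cmds = locallogger_commands("/var/tmp/dglog")  # hypothetical prefix
for argv in cmds:
    print(shlex.join(argv))
```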
The LB local-logger services can be stopped using the locallogger script with the stop option.
If the LB local-logger services are started in debug mode (i.e. using the --debug option), the daemons log fatal failures with syslog().
To run the LB server services, it suffices to issue as root the following command:
/etc/rc.d/init.d/lbserver start
if the lbserver-profile rpm has been installed. Otherwise you can use
<install path>/sbin/lbserver start
This makes both the bkserver and the ileventd processes start.
The same can be done issuing the following commands:
<install path>/sbin/ileventd <options>
<install path>/sbin/bkserver <options>
Both daemons recognize a common set of options:
--key=<keyfile> host certificate private key file (this option overrides the value of the environment variable X509_USER_KEY). Example of usage:
--key=/etc/grid-security/hostkey.pem
--cert=<certfile> host certificate file (this option overrides the value of the environment variable X509_USER_CERT). Example of usage:
--cert=/etc/grid-security/hostcert.pem
--CAdir=<certdir> trusted certificate and ca-signing-policy directory (this option overrides the value of the environment variable X509_CERT_DIR). Example of usage:
--CAdir=/etc/grid-security/certificates
--debug makes the process run in the foreground to produce diagnostics
Using the options explicitly is recommended rather than relying on the corresponding environment variables.
The LB server services can be stopped using the lbserver script with the stop option.
The bkpurge process, whose executable is installed in <install path>/sbin, is not a daemon but a utility which should be run periodically (e.g. using a cron job) in order to remove inactive jobs (i.e. those that entered the Cleared status more than a certain amount of time ago) from the LB database. This utility recognizes the following set of options:
--log data being purged from database are dumped on the stdout
--outfile=<file> data being purged from database are dumped in the file named <file>
--mysql=<database> name of the database to be purged. It must be the same one used by bkserver (this option is not required in the standard set-up)
--timeout=<timeout>[smhd] removes data for all jobs that entered the “Cleared” status more than <timeout> [seconds/minutes/hours/days] ago.
--debug print diagnostics on the stderr
--nopurge dry-run mode: it does not actually purge (useful for debugging purposes)
--aborted, -a also delete from the database the data for jobs that have entered the “Aborted” status
If --log is specified, the data are dumped in ULM format to stdout (or to <file>). Information is normally appended to the file, which is locked with flock(LOCK_EX) to prevent race conditions with, e.g., log rotation.
A possible usage of this utility is to run, once a day from a cron job, a command like:
bkpurge --log --outfile=/var/log/dglb-data.log --timeout=14d
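The --timeout=<timeout>[smhd] syntax and the purge criterion can be illustrated with a Python sketch. It assumes that a bare number means seconds (the unit suffix is shown as optional above) and expresses the "entered Cleared more than <timeout> ago" rule as a comparison on epoch seconds; it is an illustration, not bkpurge's own code.

```python
import re
import time

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_timeout(spec):
    """Convert a bkpurge --timeout value such as '14d' into seconds.

    A bare number is assumed to mean seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", spec)
    if match is None:
        raise ValueError(f"invalid timeout: {spec!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit or "s"]


def is_purgeable(cleared_at, timeout_spec, now=None):
    """True if a job that entered Cleared at `cleared_at` (epoch seconds)
    did so more than the given timeout ago."""
    now = time.time() if now is None else now
    return now - cleared_at > parse_timeout(timeout_spec)


print(parse_timeout("14d"))  # prints 1209600
```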
If the LB server services are started in debug mode (i.e. using the --debug option), the daemons log fatal failures with syslog().
Both the RB and the JSS use the service offered by the database, which must be started before either of these daemons using its own start-up script:
/etc/rc.d/init.d/postgresql start
or using the RedHat service command:
service postgresql start
Stopping is achieved by the same commands with the stop parameter:
/etc/rc.d/init.d/postgresql stop
or
service postgresql stop
The packages *-profile.X.Y.Z.rpm provide the SysV RedHat-like scripts that allow starting these daemons. In particular, the RB or the JSS can be started by issuing directly:
/etc/rc.d/init.d/broker start
/etc/rc.d/init.d/jobsubmission start
or, indirectly, using the dedicated RedHat commands:
service broker start
service jobsubmission start
In the same way, stopping is achieved by:
/etc/rc.d/init.d/broker stop
/etc/rc.d/init.d/jobsubmission stop
or
service broker stop
service jobsubmission stop
The start-up script for the JSS also starts and stops the underlying CondorG service. If the configuration steps described in section 4.2 have been followed, these scripts start the daemons with the correct selected users (see also Table 2 in section 7.7). However, do not forget to put the right files (hostkey.pem and hostcert.pem) in the locations pointed to respectively by the variables X509_HOST_KEY and X509_HOST_CERT (these must be located in the hostcert subdirectory of the home directory of the dguser account).
Start-up scripts can also be used to know the current status of the daemons, using the status option:
service broker status
service jobsubmission status
Moreover, it is strongly recommended to configure the machine so that all these services (PostgreSQL, RB and JSS) are started at system start-up. For this, refer to the RedHat chkconfig SysV script manager command.
Hereafter are reported the instructions for cleaning up the PostgreSQL databases used by the RB and the JSS to store persistent information about handled jobs. They can be useful when a restart in a clean context is needed, or in case the content of the databases has been corrupted following a serious failure of some component.
Resource Broker
psql -U postgres <RB_db_name>
delete from job;
\q (to quit)
RB_db_name is the name of the database used by the Resource Broker (usually set to rb)
Job Submission Service
psql -U postgres template1
delete from condor_submit;
\q (to quit)
template1 is the default name of the database used by the JSS. It is configurable through the Database_name parameter of the jss.conf file.
The RB provides a log file recording its various events. This file can be used to debug abnormal behaviours of the service. The RB log-file name and other properties can be changed by directly modifying the rb.conf configuration file: you can change the name of the file, the debug level and the maximum file size in bytes as well.
The script responsible for starting the JSS also includes the definition of the JSS log files. There are two of them, and their pathnames are set respectively to /var/tmp/JSSserver.log and /var/tmp/JSSparser.log. Modifying these locations requires modifying the following two lines of the /etc/rc.d/init.d/jobsubmission script:
SERVERLOG=/var/tmp/JSSserver.log
PARSERLOG=/var/tmp/JSSparser.log
To start/stop the II, the following command has to be used as root:
/etc/rc.d/init.d/information_index {start | stop}
The software module of the WMS allowing the user to access the main services made available by the components of the scheduling sub-layer is the User Interface, which hence represents the entry point to the whole system.
Sections 6.1.1 and 6.1.2 provide a general description of the UI, dealing with the security management, common behaviours, environment variables to be set etc. Section 6.1.3 describes the Job Submission User Interface commands in a Unix man-page style.
The Job Submission UI is the module of the WMS allowing the user to access the main services made available by the components of the scheduling sub-layer. User interaction with the system is ensured by means of a JDL and a command-driven user interface providing commands to perform a certain set of basic operations. The main operations made possible by the UI are:
- Submit a job for execution on a remote Computing Element, also encompassing:
§ automatic resource discovery and selection
§ staging of the application sandbox (input sandbox)
- Find the list of resources suitable to run a specific job
- Cancel one or more submitted jobs
- Retrieve the output files of a completed job (output sandbox)
- Retrieve and display bookkeeping information about submitted jobs
- Retrieve and display logging information about submitted jobs.
The User Interface depends on two other Workload Management System components:
- the Resource Broker, which provides support for the job control functionality
- the Logging and Bookkeeping Service, which provides support for the job monitoring functionality.
For the DataGrid to be an effective framework for large-scale distributed computation, users, user processes and grid services must work in a secure environment. For this reason, all interactions between WMS components, especially network-separated ones, are mutually authenticated: depending on the specific interaction, an entity authenticates itself to the other peer using its own credential, a delegated user credential, or both. For example, when the User Interface passes a job to the Resource Broker, the UI authenticates using a delegated user credential (a proxy certificate) whereas the RB uses its own service credential. The same happens when the UI interacts with the Logging and Bookkeeping service. The UI uses a delegated user credential to limit the risk of compromising the original credential held by the user.
The identity of a user or service and its public key are included in an X.509 certificate signed by a DataGrid trusted Certification Authority (CA), whose purpose is to guarantee the association between that public key and its owner.
Consequently, to use the UI commands the user must possess a valid X.509 certificate on the submitting machine, consisting of two files: the certificate file and the private key file. The two files are expected at the locations pointed to by “$X509_USER_CERT” and “$X509_USER_KEY” respectively or, if these X509 environment variables are not set, at “$HOME/.globus/usercert.pem” and “$HOME/.globus/userkey.pem”. The user certificate and private key files are needed to create the delegated user credentials; indeed, as explained hereafter, what is really needed is the user proxy certificate.
When started, all UI commands check for the existence and expiration date of a user proxy certificate in the location pointed to by “$X509_USER_PROXY” or, if this X509 environment variable is not set, in “/tmp/x509up_u<UID>” (<UID> is the user identifier in the submitting machine OS). If the proxy certificate does not exist or has expired, a new one with the default duration of 12 hours is automatically created by the UI using the GSI services (grid-proxy-init and grid-proxy-info). The user proxy certificate is created either as “$X509_USER_PROXY” or as “/tmp/x509up_u<UID>”.
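The search rules of the two paragraphs above can be summarised by the following minimal Python sketch. It is only an illustration: the helper name resolve_credentials is hypothetical (it is not part of the UI), it only computes paths, and the existence and expiration checks that the UI performs through the GSI commands are not reproduced.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_credentials(env: Optional[Mapping[str, str]] = None,
                        uid: Optional[int] = None) -> Dict[str, str]:
    """Return the paths where the UI looks for the user certificate,
    private key and proxy, following the search rules described above.

    Hypothetical helper: it only computes paths; it does not check that
    the files exist or that the proxy is still valid."""
    env = os.environ if env is None else env
    uid = os.getuid() if uid is None else uid
    home = env.get("HOME", "")
    return {
        # $X509_USER_CERT / $X509_USER_KEY take precedence over ~/.globus
        "cert": env.get("X509_USER_CERT",
                        os.path.join(home, ".globus", "usercert.pem")),
        "key": env.get("X509_USER_KEY",
                       os.path.join(home, ".globus", "userkey.pem")),
        # $X509_USER_PROXY takes precedence over /tmp/x509up_u<UID>
        "proxy": env.get("X509_USER_PROXY", f"/tmp/x509up_u{uid}"),
    }
```

For example, resolve_credentials() called in a shell where none of the X509 variables is set returns the $HOME/.globus paths and /tmp/x509up_u<UID> for the current user.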
Once a job has been submitted by the UI, it passes through several components of the WMS (e.g. the RB, the JSS, etc.) before it completes its execution. At each step, operations related to the job may require authentication by a certificate. For example, during the scheduling phase the RB needs to get some information about the user who wants to schedule a job, and the user certificate may be needed to access this information. Similarly, a valid user certificate is needed by the JSS to submit a job to the CE. Moreover, the JSS must be able to repeat this process, e.g. if the CE on which the job is running crashes; therefore a valid user certificate is needed for the whole lifetime of the job.
A job gets a valid proxy certificate when it is submitted by the UI to the RB. The validity of such a certificate is usually set to 12 hours, hence problems may occur if the job spends more time on the CE (queued or running) than the lifetime of its proxy certificate.
The UI dg-job-submit command (see its description later in this document) supplies an option (--hours H) for specifying the duration, in hours, of the proxy certificate that is created on behalf of the user. For this reason, while the search paths for the certificate files remain as before, the proxy checking mechanism of this command differs slightly from that of the other commands:
- If the “--hours H” option has not been specified, the proxy certificate check is done as explained before.
- If the “--hours H” option has been specified, a new proxy certificate with a duration of H hours is created both when no existing proxy is found and when the lifetime of the existing proxy is less than H. In the latter case the existing proxy certificate is destroyed before the new one is created.
This allows the user to submit jobs running longer than the default proxy duration (12 hours).
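The proxy handling of dg-job-submit described above can be sketched as a small decision function. This is an illustration only: proxy_action and its return values are hypothetical names, and the remaining lifetime of the existing proxy is assumed to be already known (the UI obtains it through grid-proxy-info).

```python
from typing import Optional


def proxy_action(existing_lifetime_s: Optional[int],
                 hours: Optional[int]) -> str:
    """Decide what dg-job-submit does with the user proxy.

    existing_lifetime_s: remaining lifetime of the current proxy in
    seconds, or None if no proxy file was found.
    hours: value of --hours, or None if the option was not given.
    Returns "keep", "create" or "replace" (destroy, then create).
    """
    if hours is None:
        # No --hours: same check as the other UI commands, i.e. a new
        # proxy only if none exists or the existing one has expired.
        if existing_lifetime_s is None or existing_lifetime_s <= 0:
            return "create"
        return "keep"
    if existing_lifetime_s is None:
        return "create"  # no proxy found: create one lasting H hours
    if existing_lifetime_s < hours * 3600:
        return "replace"  # existing proxy is destroyed first
    return "keep"
```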
A more secure way of achieving this is to use the features of the MyProxy package. The underlying idea is that the user registers in a MyProxy server a valid long-term proxy certificate, which the JSS then uses to periodically renew the credentials of the submitted job; in this way the user is no longer obliged to create very long-lived proxies when submitting long-running jobs. This mechanism is described in more detail in the following paragraph.
The MyProxy credential repository system consists of a server and a set of client tools that can be used to delegate credentials to a server and to retrieve them from it. Normally, a user would start by using the myproxy-init client program, along with the permanent credentials necessary to contact the server, to delegate a set of proxy credentials to the server together with authentication information and retrieval restrictions.
The MyProxy
Toolkit is available at the following URL:
http://lindir.ics.muni.cz/dg_public/myproxy-0.4.4-edg.tar.gz
To compile the package, follow the usual Unix/Linux configure/make procedure:
./configure --with-gsi=/opt/globus \
    --with-globus-flavor=gcc32dbg \
    --disable-anonymous-auth --prefix=/opt/myproxy
Type ./configure --help for all the detailed options (such as binaries, server configuration paths, etc.).
Once the configure script has completed successfully, compile the source and install the package by running 'make' and 'make install'.
Before using the MyProxy tools, you have to restrict which users are allowed to store credentials in the myproxy server and, more importantly, which clients are allowed to retrieve credentials from it. To do that, follow the instructions reported hereafter (MyProxy Server).
MyProxy Server
myproxy-server is a daemon that runs on a trusted, secure host and manages a database of proxy credentials for use from remote sites. The lifetime of the stored proxies is controlled by the myproxy-init program. When a proxy is requested from the myproxy-server via the myproxy-get-delegation command, further delegation ensures that the lifetime of the new proxy is less than that of the original, to enforce greater security.
A configuration file maintains the list of trusted portals and users that can access this service. To configure a proxy server, execute the following steps:
cd /opt/edg/etc
cp edg-myproxy.conf edg-myproxy.conf.orig
cp myproxy.conf edg-myproxy.conf
Edit edg-myproxy.conf, replacing the subject lines it contains with similar lines containing the subject name of the local Resource Broker, then run:
/etc/rc.d/init.d/myproxy start (this creates the file /etc/myproxy-server.config)
chkconfig --level 2345 myproxy on
The myproxy.conf
file looks as follows:
#
#################################################
# Add to this file all of the subject names of resources
# who may renew credentials, i.e. the issuer names of
# recognized resource brokers.
#
# Add lines like the following one (without the #)
#/O=Grid/O=CERN/OU=cern.ch/CN=host/testbed013.cern.ch
###################################################
/O=Grid/O=CERN/OU=cern.ch/CN=host/testbed013.cern.ch
/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0383.cern.ch
/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0382.cern.ch
/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
/C=IT/O=INFN/OU=Resource Broker/L=CNAF/CN=grid012f.cnaf.infn.it/Email=elisabetta.ronchieri@cnaf.infn.it
i.e. it contains the subject names of all the resources that are allowed to renew credentials (the recognized Resource Brokers).
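A minimal Python sketch of how the list of recognized Resource Brokers can be read from a file with the layout shown above. It assumes one subject name per line and "#" comment lines; the helper names are hypothetical and this is not the parser used by myproxy-server itself.

```python
from typing import Iterable, List


def trusted_subjects(lines: Iterable[str]) -> List[str]:
    """Return the subject names listed in a myproxy.conf-style file:
    every non-blank line that does not start with '#'."""
    subjects = []
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith("#"):
            subjects.append(line)  # DNs may contain blanks, keep whole line
    return subjects


def may_renew(subject: str, lines: Iterable[str]) -> bool:
    """True if the given Resource Broker subject is listed in the file."""
    return subject in trusted_subjects(lines)
```

For instance, opening /opt/edg/etc/edg-myproxy.conf and passing the file object to may_renew() tells whether a given RB subject has been registered.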
To launch the daemon, run the binary '<prefix>/sbin/myproxy-server'. The program starts up and puts itself in the background. It accepts connections on TCP port 7512, forking off a separate child to handle each incoming connection, and logs information via the syslog service.
MyProxy Client
The set of binaries provided for the client consists of the following files:
myproxy-init
myproxy-info
myproxy-destroy
myproxy-get-delegation
The myproxy-init command allows you to create a delegated proxy and send it to a myproxy server for later retrieval. To launch it, make sure you are able to execute the Globus grid-proxy-init command (i.e. the binary is in your $PATH and the required certificate files are either stored in the default location or specified through the X509 variables). Use the command as follows (you will be asked for your PEM passphrase):
myproxy-init -s <host name> -t <hours> -d -n
The myproxy-init command stores a user proxy in the repository specified by <host name> (the -s option). The default lifetime of proxies retrieved from the repository is set to <hours> (the -t option), and no password authorization is permitted when fetching the proxy from the repository (the -n option). The proxy is stored under a username equal to the subject of your certificate (the -d option).
The myproxy-info command returns the remaining lifetime of the proxy stored in the repository, along with the subject name of the proxy owner (in our case it is the same as in your proxy certificate). So, to get information about the stored proxies, issue:
myproxy-info -s <host name> -d
where the -s and -d options have the same meaning as for the myproxy-init command.
The
myproxy-destroy command simply destroys any existing proxy stored in the
myproxy server. You can use it as follows:
myproxy-destroy -s <host name> -d
where the -s and -d options have the same meaning as for the myproxy-init command.
The myproxy-get-delegation command is used to retrieve a delegated proxy from the myproxy server. Use it as follows:
myproxy-get-delegation -s <host name> -d -t <hours> \
    -o <output file> -a <user proxy>
You should
end up with a retrieved proxy in <output file>, which is valid for
<hours> hours.
It is worth noting that the environment variable MYPROXY_SERVER can be set to tell all these programs the hostname where the myproxy server is running.
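The MyProxy client commands above can also be driven from a script. The following Python sketch builds the myproxy-init command line shown in this section, using only the options documented here (-s, -t, -d, -n); the helper name is hypothetical, and relying on MYPROXY_SERVER when -s is omitted reflects the statement above rather than a tested behaviour.

```python
import os
from typing import List, Mapping, Optional


def myproxy_init_argv(hours: int, host: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build: myproxy-init -s <host name> -t <hours> -d -n

    If host is None, -s is omitted and the server is expected to be
    given by the MYPROXY_SERVER environment variable."""
    env = os.environ if env is None else env
    argv = ["myproxy-init"]
    if host is not None:
        argv += ["-s", host]
    elif not env.get("MYPROXY_SERVER"):
        raise ValueError("no MyProxy server: pass host or set MYPROXY_SERVER")
    argv += ["-t", str(hours), "-d", "-n"]
    return argv
```

The resulting list can be passed to subprocess.run(..., check=True); the command then prompts for the PEM passphrase on the terminal as usual.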
A User Interface installation mainly consists of three directories, bin, lib and etc, created under the UI installation path, which is usually pointed to by the EDG_WL_LOCATION environment variable. If this variable is not set or its value is not correct, the default value “/opt/edg” is assumed.
bin contains the command executables, so it is recommended to add it to the user's PATH environment variable so that UI commands can be used from any location. lib contains the shared libraries (wrappers of the RB/LB APIs) implementing the functionality for accessing the RB and LB services, whereas etc is the UI configuration area.
The UI configuration area etc contains the job description template file job_template.jdl, the file UI_Errors.cfg mapping error codes to error messages, and the actual configuration file UI_ConfigEnv.cfg. The latter is the only file that may need to be edited and tailored to the user/platform characteristics and needs. It contains the following information, which is read by the commands and affects their behaviour (see section 4.4.4 for details):
- address and port of accessible RBs ordered by priority,
- address and port of accessible LBs ordered by priority,
- default location of the local storage areas for the Input/Output sandbox files,
- default values for the JDL mandatory attributes,
- default number of retrials on fatal errors when connecting to the LB.
When started, UI commands first check whether EDG_WL_LOCATION is set and then search for the etc directory containing their configuration files in the following locations, in order of precedence: “$EDG_WL_LOCATION”, “/”, “/usr/local” and “/opt/edg”. If none of these locations contains the needed files, an error is returned to the user.
Since several users on the same machine can share a single installation of the UI, people concurrently issuing UI commands share the same configuration files. However, users (or groups of users) with particular needs can “customise” the UI configuration through the --config option supported by every UI command.
Indeed, every command launched with “--config file_path” reads its configuration settings from the file “file_path” instead of the default configuration file. Hence the user only needs to create such a file according to her/his needs and use the --config option to work with “private” settings.
Moreover, to make this change permanent without having to add the --config option to every command, the user can set the environment variable EDG_WL_UI_CONFIG_PATH to point to the non-standard path of the configuration file: when that variable is set, commands read their settings from the file “$EDG_WL_UI_CONFIG_PATH”. In any case the --config option takes precedence over all other settings.
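The resolution order for the UI configuration described in these paragraphs can be sketched in Python as follows. The sketch assumes that, for each search location, the default configuration file is <location>/etc/UI_ConfigEnv.cfg; the function name is hypothetical and the sketch does not reproduce how the UI actually reads the file.

```python
import os
from typing import Callable, Mapping, Optional

CONFIG_NAME = "UI_ConfigEnv.cfg"


def ui_config_path(cli_config: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None,
                   exists: Callable[[str], bool] = os.path.isfile) -> str:
    """Return the configuration file a UI command would read.

    Precedence: --config, then $EDG_WL_UI_CONFIG_PATH, then the first
    <location>/etc/UI_ConfigEnv.cfg found in $EDG_WL_LOCATION, /,
    /usr/local and /opt/edg (in this order)."""
    env = os.environ if env is None else env
    if cli_config:
        return cli_config
    if env.get("EDG_WL_UI_CONFIG_PATH"):
        return env["EDG_WL_UI_CONFIG_PATH"]
    for loc in (env.get("EDG_WL_LOCATION"), "/", "/usr/local", "/opt/edg"):
        if not loc:
            continue  # EDG_WL_LOCATION not set
        candidate = os.path.join(loc, "etc", CONFIG_NAME)
        if exists(candidate):
            return candidate
    raise FileNotFoundError(f"{CONFIG_NAME} not found in any search location")
```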
It is important to note that, since the job identifier dg_jobId (see section 6.1.3, dg-job-submit) implicitly holds the information about the RB and the LB that are managing the corresponding job, all the commands taking a dg_jobId as input parameter ignore the RB and LB addresses listed in the configuration file when performing the requested operation, even if the --config option has been specified.
Hereafter are listed the options that are common to all UI commands (with the exception of dg-job-id-info, which is a local utility):
- --config file_path
- --noint
- --debug
- --logfile file_path
- --version
- --help
The --config option has been described above.
The --noint option skips all interactive questions to the user and goes ahead with the command execution. All warning messages and errors (if any) are written to the file <command_name>_<UID>_<PID>.log in the “/tmp” directory instead of the standard output. It is important to note that when --noint is specified some checks on “dangerous actions” are skipped. For example, if job cancellation is requested with this option, the action is performed without asking the user for confirmation. The same applies when the command output would overwrite an existing file, so it is recommended to use the --noint option only in a safe context.
The --debug option is mainly intended for testing and debugging purposes: it makes the commands print additional information while running. Every time an external API function call is encountered during the command execution, the values of the parameters passed to the API are printed to the user. These info messages are displayed on the standard output and are also written, together with any errors, to the <command_name>_<UID>_<PID>.log file in the /tmp directory. An example of the debug message format is as follows:
#### Debug API #### - The function 'dgLBJobStatus' has been called with the following parameter(s):
>>Struct 'dgLBContext':
-> 0
-> 0
>>Struct 'dgJobId':
-> lx01.hep.ph.ic.ac.uk/124445102160554
-> grid004f.cnaf.infn.it
-> 7846
-> grid013g.cnaf.infn.it:7771
>> 0
If the --noint option is specified together with the --debug option, the debug messages are not printed on the standard output.
The --logfile <file_path> option allows the command log file to be relocated to the location given by file_path.
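The naming rule for the command log file can be expressed as a one-function Python sketch. The function name is hypothetical; only the default name pattern /tmp/<command_name>_<UID>_<PID>.log and the --logfile override come from this section.

```python
import os
from typing import Optional


def ui_log_path(command: str, logfile: Optional[str] = None,
                uid: Optional[int] = None, pid: Optional[int] = None) -> str:
    """Path of the log file written under --noint / --debug.

    A --logfile value overrides the default
    /tmp/<command_name>_<UID>_<PID>.log."""
    if logfile:
        return logfile
    uid = os.getuid() if uid is None else uid
    pid = os.getpid() if pid is None else pid
    return os.path.join("/tmp", f"{command}_{uid}_{pid}.log")
```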
The --version and --help options respectively make the commands display the UI current version and the command usage.
Two further options common to almost all commands are --input and --output. The latter makes the command redirect its outcome to the file given as option argument, whilst the former makes it read a list of input items from the file given as option argument. The only exception is the dg-job-list-match command, which does not have the --input option.
For all commands, the file given as argument to the --input option shall contain a list of job identifiers in the following format: one dg_jobId per line, with comment lines beginning with a “#” or a “*” character. If the input file contains only one dg_jobId (see the description of the dg-job-submit command later in this document for details about the dg_jobId format), the request is submitted directly with that dg_jobId as input; otherwise a menu listing all the contained items is displayed to the user, i.e. something like:
------------------------------------------------------------------------------------------------------------------------------------------
1 : https://grid013g.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/133711137156527?grid013g.cnaf.infn.it:7781
2 : https://grid013g.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/133747137833158?grid013g.cnaf.infn.it:7781
3 : https://grid004f.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/133957138124219?grid004f.cnaf.infn.it:7771
4 : https://grid013g.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/134030138239274?grid013g.cnaf.infn.it:7771
5 : https://grid001f.cnaf.infn.it:7846/lx01.hep.ph.ic.ac.uk/140706140477638?grid013g.cnaf.infn.it:7771
a : all
q : quit
-------------------------------------------------------------------------------------------------------------------------------------------
Choose one or more dg_jobId(s) in the list - [1-10]all:
The user can choose one or more jobs from the list by entering the corresponding numbers. E.g.:
- 2 makes the command take the second listed dg_jobId as input
- 1,4 makes the command take the first and the fourth listed dg_jobIds as input
- 2-5 makes the command take listed dg_jobIds from 2 to 5 (ends included) as input
- all makes the command take all listed dg_jobIds as input
- q makes the command quit
The default choice is all. If the --input option is used together with --noint, all the dg_jobIds contained in the input file are taken into account by the command.
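The --input file format and the menu choices above can be sketched in Python as follows. This is a hypothetical illustration, not the UI code: the helper names are invented, an empty answer is treated as the default all, and q is represented by an empty selection that the caller would turn into quitting.

```python
from typing import Iterable, List


def read_job_ids(lines: Iterable[str]) -> List[str]:
    """One dg_jobId per line; lines starting with '#' or '*' are comments."""
    ids = []
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith(("#", "*")):
            ids.append(line)
    return ids


def select(choice: str, items: List[str]) -> List[str]:
    """Apply a menu answer such as '2', '1,4', '2-5', 'all', 'a' or 'q'.

    An empty answer selects all items (the default); 'q' returns []."""
    choice = choice.strip()
    if choice in ("", "a", "all"):
        return list(items)
    if choice == "q":
        return []
    picked: List[int] = []
    for part in choice.split(","):
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            picked.extend(range(first, last + 1))  # ends included
        else:
            picked.append(int(part))
    if any(n < 1 or n > len(items) for n in picked):
        raise ValueError(f"choice out of range 1-{len(items)}: {choice}")
    return [items[n - 1] for n in picked]
```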
The only command whose --input behaviour differs from the one just described is dg-job-submit. First of all, in this case the input file contains CEIds instead of dg_jobIds; moreover, since only one CE at a time can be the target of a submission, the user is allowed to choose one and only one CEId. The default choice is “1”, i.e. the first CEId in the list. This is also the choice automatically made by the command when the --input option is used together with --noint.
In this section we describe syntax and behavior of the commands made
available by the UI to allow job submission, monitoring and control.
In the command synopses, mandatory arguments are shown between angle brackets (<arg>) whilst optional ones are shown between square brackets ([arg]).
Allows the user to submit a job for execution on
remote resources in a grid.
SYNOPSIS
dg-job-submit [options] <jdl file>
Options:
--help
--version
--template
--input, -i <input_file>
--resource, -r <ce_id>
--notify, -n <e-mail_address(es)>
--hours, -h <hours_number>
--nomsg
--config, -c <config_file>
--output, -o <output_file>
--noint
--debug
--logfile <log_file>
DESCRIPTION
dg-job-submit is the command for submitting jobs to the DataGrid, and hence allows the user to run a job at one or several remote resources. dg-job-submit requires as input a job description file in which job characteristics and requirements are expressed by means of Condor class-ad-like expressions. While the order of the other arguments does not matter, the job description file has to be the last argument of this command.
The job description file given as input to this command is syntactically checked, and default values are assigned to some of the mandatory attributes that have not been provided, in order to create a meaningful class-ad. The resulting job-ad is sent to the Resource Broker, which finds the resource best matching the job (match-making) and submits the job to it. The match-making algorithm is described in detail in Annex 7.6.
Upon successful completion this command returns to the user the submitted job identifier dg_jobId (a string that unambiguously identifies the job in the whole DataGrid), generated by the User Interface, which can later be used as a handle to perform monitoring and control operations on the job (e.g. see dg-job-status described later in this document). The format of the dg_jobId is as follows:
<LBname>/<UIaddress>/<time><PID><RND>?<RBname>
where:
- LBname is the LB server name and port
- UIaddress is the UI machine IP address (or FQDN)
- time is the current UTC time on the submitting machine in hhmmss format
- PID is the command process identifier
- RND is a random number generated at each job submission
- RBname is the RB server hostname and port
Although the structure of the dg_jobId may look complex and not easily readable, it has been conceived to assure uniqueness and, at the same time, to contain the information needed by the WMS components to fulfil user requests.
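As an illustration, the following Python sketch splits a dg_jobId into the fields listed above. It assumes, as in the identifiers shown earlier in this document, that the LB part may carry an https:// prefix, and that the first six digits of the unique part are the hhmmss time; since the lengths of PID and RND are not specified, they are returned together. All names are hypothetical.

```python
from typing import NamedTuple


class JobId(NamedTuple):
    lb: str           # LB server name and port
    ui_address: str   # UI machine IP address or FQDN
    time_hhmmss: str  # UTC submission time on the UI machine
    pid_rnd: str      # command PID followed by the random number
    rb: str           # RB server host name and port


def parse_job_id(job_id: str) -> JobId:
    """Split <LBname>/<UIaddress>/<time><PID><RND>?<RBname>."""
    rest, sep, rb = job_id.partition("?")
    if not sep or not rb:
        raise ValueError("missing ?<RBname> part")
    prefix = "https://"
    if rest.startswith(prefix):
        rest = rest[len(prefix):]
    parts = rest.split("/")
    if len(parts) != 3:
        raise ValueError("expected <LBname>/<UIaddress>/<unique part>")
    lb, ui, unique = parts
    if len(unique) <= 6 or not unique.isdigit():
        raise ValueError("unique part must be hhmmss followed by PID and RND")
    return JobId(lb, ui, unique[:6], unique[6:], rb)
```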
The --resource option can be used to target the job submission to a specific known resource identified by the provided Computing Element identifier ce_id (returned by dg-job-list-match, described later in this document). A resource is either a queue of an underlying LRMS, assuming that this queue represents a set of “homogeneous” resources, or a “single” node. The CE identifier is a string, assigned by WP4 and published in the GIS (the CEId field), that uniquely identifies a resource belonging to the Grid. The CEId is obtained by “combining” the GlobusResourceContactString and QueueName attributes: e.g. if lxde01.pd.infn.it:2119 is the Globus resource contact string and grid01 is the queue name, then it looks like lxde01.pd.infn.it:2119/jobmanager-lsf-grid01. In other words the admitted format for the CEId is:
<full-hostname>:<port-number>/jobmanager-<service>-<queue-name>
where <service> can be lsf, pbs or bqs.
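The CEId format above can be checked with a regular expression. The following Python sketch is a hypothetical check derived only from the stated format; it takes <full-hostname> to mean a dotted host name and accepts any queue name without blanks or slashes.

```python
import re
from typing import Dict

CEID_RE = re.compile(
    r"^(?P<host>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"  # full host name
    r":(?P<port>\d+)"                                # port number
    r"/jobmanager-(?P<service>lsf|pbs|bqs)"          # admitted services
    r"-(?P<queue>[^\s/]+)$"                          # queue name
)


def parse_ce_id(ce_id: str) -> Dict[str, str]:
    """Return host, port, service and queue of a CEId, or raise ValueError."""
    m = CEID_RE.match(ce_id)
    if m is None:
        raise ValueError(f"not a valid CEId: {ce_id!r}")
    return m.groupdict()
```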
When the --resource option is specified, the Resource Broker completely skips the match-making process and directly submits the job to the requested CE. It is important to note that in this case the RB does not generate the “.BrokerInfo” file even if data requirements have been specified in the JDL, so jobs submitted using this option should not rely on the .BrokerInfo file information when running on the CE. The “.BrokerInfo” file is generated by the RB during match-making and contains information about the location where the input data specified in the JDL are physically stored, the SEs that are “close” to the CE chosen for submitting the job, etc. It is shipped within the InputSandbox to the CE where the job is going to run, so that it can be used at run-time to get information for accessing data. Details about the “.BrokerInfo” file can be found in [R1].
To perform direct submission to a given CE while still having the “.BrokerInfo” file generated by the RB and shipped to the CE, do not use the --resource option; instead specify the following requirement in the JDL:
Requirements = other.CEId == <Ce_identifier>;
(e.g. Requirements = other.CEId == "lxde01.pd.infn.it:2119/jobmanager-lsf-grid01";)
It is also possible to specify the target CE for the job through the --input option. With --input, an input_file containing a list of target CE Ids must be supplied. In this case dg-job-submit parses the input_file and displays on the standard output the list of CE Ids it contains. The user is then asked to choose one CEId among the listed ones, and the command then behaves exactly as already explained for the --resource option. The idea behind this option is to use as input_file the output file generated by the dg-job-list-match command with the --output option (see dg-job-list-match), which contains the list of CE Ids (if any) matching the requirements specified in the jobad.jdl file. An example of a possible sequence of commands is:
>$ dg-job-list-match --output CEList.out jobad.jdl
>$ dg-job-submit --input CEList.out jobad.jdl
If CEList.out contains more than one CEId then the user is prompted for choosing one Id from the list.
When dg-job-submit is used with the --notify option, the following schema is used to notify the user about job status changes:
- an e-mail notification is sent to the specified e_mail_address when the match-making process has finished and the job is ready to be submitted to JSS (READY status)
- an e-mail notification is sent to the specified e_mail_address when the job starts running on the CE (RUNNING status)
- an e-mail notification is sent to the specified e_mail_address when the job has finished (ABORTED or DONE status).
The notification message will contain basic information about the job such as the job identifier, the Id of the assigned CE and a brief description of its status.
Notification to multiple contacts can be requested by specifying the corresponding e-mail addresses separated by commas and without blanks.
The returned dg_jobId can be redirected to an output file using the --output option. If the file already exists, a check is performed: if the file was previously created by the dg-job-submit command (i.e. it contains a well-defined header), the returned dg_jobId is appended to the existing file every time the command is launched. If the file was not created by dg-job-submit, the user is prompted to choose whether to overwrite it or not; if the answer is no, the command aborts.
The dg-job-submit command has a particular behaviour when the job description file contains the InputSandbox attribute, whose value is a list of file paths on the local disk of the UI machine. The InputSandbox attribute has been introduced to stage, from the UI to the CE, files that are not available in any SE and are not published in any Replica Catalogue.
To better understand this, suppose that a job needs for its execution a certain set of small files available on the submitting machine, and that for performance reasons it is preferable not to go through the WP2 data transfer services to stage these files on the executing node. The user can then use the InputSandbox attribute to specify the files that have to be staged from the submitting machine to the executing CE. All of them are transferred at job submission time, together with the job class-ad, to the RB, which stores them temporarily on its local disk. The JSS then performs the staging of these files on the executing node. The files transferred to the RB should be small, since once the RB local storage is full no more jobs of this type can be submitted.
This mechanism can also be used to stage a job executable available locally on the UI machine to the executing CE. In this case the user has to include this file in the InputSandbox list (specifying its absolute path in the file system of the UI machine) and specify only the file name as the Executable attribute value. Conversely, if the executable is already available in the file system of the executing machine, the user has to specify an absolute path name for this file as Executable (using environment variables if necessary). The same applies to the standard input file, which is specified through the StdInput JDL attribute.
Since the InputSandbox expression can consist of a great number of file names, wildcards and environment variables may be used to specify the value of this attribute. The syntax and the allowed wildcards are described in Annex 7.5.
It is important to note that globus-url-copy (the Globus command used for staging the InputSandbox files) in general does not preserve the x flag. Therefore the script specified as Executable in the JDL (on which chmod +x is done automatically by the WP1 JobWrapper) should perform a chmod +x on all the files transferred within the InputSandbox of the job that need execution permission.
For the standard output and error of the job, the user shall instead always specify just file names (without any directory path) through the StdOutput and StdError JDL attributes. To have them staged back on the UI machine, it suffices to list them in the OutputSandbox and, after job completion, use the dg-job-get-output command described later in this document.
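To illustrate the sandbox rules above, the following Python sketch writes the JDL entries for a job whose executable is staged from the UI machine. The helper names and the example paths are hypothetical; the sketch assumes the usual class-ad list notation {"a", "b"} for the sandbox attributes and does not cover wildcards or StdInput.

```python
import os
from typing import List


def _q(value: str) -> str:
    """Wrap a value in JDL double quotes (plain paths need no escaping)."""
    return f'"{value}"'


def sandbox_jdl(local_executable: str, other_inputs: List[str],
                stdout: str, stderr: str) -> str:
    """Return JDL entries for a job whose executable is staged from the UI.

    - the executable and the other inputs go into InputSandbox with their
      absolute paths on the UI machine;
    - Executable holds only the file name;
    - StdOutput/StdError are bare file names, listed in OutputSandbox so
      that dg-job-get-output can retrieve them.
    """
    for name in (stdout, stderr):
        if os.path.basename(name) != name:
            raise ValueError(f"StdOutput/StdError must be plain file names: {name!r}")
    inputs = [os.path.abspath(p) for p in [local_executable, *other_inputs]]
    entries = [
        f"Executable = {_q(os.path.basename(local_executable))};",
        f"StdOutput = {_q(stdout)};",
        f"StdError = {_q(stderr)};",
        "InputSandbox = {" + ", ".join(_q(p) for p in inputs) + "};",
        "OutputSandbox = {" + ", ".join(_q(n) for n in (stdout, stderr)) + "};",
    ]
    return "\n".join(entries)
```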
The list of data specification JDL attributes is completed by the InputData attribute, which refers to data used as input by the job that are not subject to staging and are stored in one or more storage elements and published in replica catalogues. For this reason, when the user specifies the InputData attribute, he/she also has to provide the name of the replica catalogue where these data are published (ReplicaCatalog attribute) and the protocol his/her application is able to “speak” for accessing the data (DataAccessProtocol attribute). The InputData attribute should normally contain a list of logical and/or physical file names. If InputData only contains PFNs, the ReplicaCatalog attribute is no longer mandatory.
The ReplicaCatalog address must be provided in the following format:
ldap://<host>:<port>/<Replica Catalogue DN>
where the Replica Catalogue DN also comprises the mandatory logical collection field lc, i.e. it is something like:
lc=<Logical collection>, rc=<replica catalogue>, dc=....
An example of a Replica Catalog address is reported hereafter:
ldap://sunlab2f.cnaf.infn.it:2010/lc=test0, rc=WP2 INFN Test ReplicaCatalog, dc=sunlab2g, dc=cnaf, dc=infn, dc=it
The Arguments attribute in the JDL allows the user to specify all the command line arguments needed to start the job. They have to be specified as a single string; e.g. the job sum, which is started with:
$ sum N1 N2 -out result.out
is described by:
Executable = "sum";
Arguments = "N1 N2 -out result.out";
To specify a quoted string inside the Arguments, escape the quotes with the \ character. E.g. when describing a job like:
$ grep -i "my name" *.txt
you will have to specify:
Executable = "/bin/grep";
Arguments = "-i \"my name\" *.txt";
Analogously, if the job takes as argument a string containing a special character (e.g. the job is the tail command issued on a file whose name contains the ampersand character, say file1&file2), since on the shell command line you would have to write:
$ tail -f file1\&file2
in the JDL you will have to write:
Executable = "/usr/bin/tail";
Arguments = "-f file1\\\&file2";
i.e. a \ for each special character.
In general, special characters such as &, |, >, < are only allowed
if specified inside a quoted string or preceded by triple \.
The character “`” cannot be specified in the Arguments attribute of the JDL.
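The escaping rules above can be summarised in a short Python sketch that converts a shell-style argument string into the value to place between the quotes of the Arguments attribute. It is derived from the three examples in this section and assumes that a \ is needed before ", before \ and before &, |, <, > outside double-quoted strings; the function name is hypothetical and this is not the UI's own code.

```python
SPECIAL = set("&|<>")


def jdl_arguments(shell_args: str) -> str:
    """Escape a shell-style argument string for Arguments = "...";"""
    if "`" in shell_args:
        raise ValueError("the ` character cannot be used in Arguments")
    out = []
    in_quotes = False
    for ch in shell_args:
        if ch == '"':
            in_quotes = not in_quotes
            out.append('\\"')        # quotes are always escaped
        elif ch == "\\":
            out.append("\\\\")       # a shell backslash gets its own \
        elif ch in SPECIAL and not in_quotes:
            out.append("\\" + ch)    # &, |, <, > outside quoted strings
        else:
            out.append(ch)
    return "".join(out)
```

For example, f'Arguments = "{jdl_arguments(line)}";' produces the Arguments entry for a given shell command line.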
The RetryCount attribute allows setting the number of
submission retries for a job upon failure due to some grid component (i.e. not
to the job itself). RetryCount has to be a positive number and the
actual number of submission retries for a job is represented by the minimum
value between RetryCount itself and the value of the RB_submission_retries parameter in the RB configuration file (see 4.2.4.1). The resubmission is tried for all the CEs
satisfying the job requirements.
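The rule for the actual number of resubmissions amounts to taking a minimum, as in this Python sketch (the function and parameter names are hypothetical; the value of RB_submission_retries comes from rb.conf):

```python
def effective_retries(retry_count: int, rb_submission_retries: int) -> int:
    """Number of submission retries actually attempted for a job:
    the minimum of the JDL RetryCount and the RB configuration value."""
    if retry_count <= 0:
        raise ValueError("RetryCount has to be a positive number")
    return min(retry_count, rb_submission_retries)
```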
The --hours option allows the user to specify the duration H, in hours, of the user proxy needed for submitting the job. This option has to be used for long-lasting jobs: a submitted job needs to be accompanied by a valid proxy certificate during its whole lifetime, and the default duration of the user proxy created by UI commands, 12 hours, may in some cases not be enough.
Note, however, that a safer way of submitting long-running jobs is to use the myproxy-init command (see section 6.1.1.1) before dg-job-submit. The myproxy-init command registers in a MyProxy server a valid long-term proxy certificate that will be used by the JSS to periodically renew the credentials of the submitted job.
When using the myproxy-init command, you have to specify the host name of the MyProxy server where the proxy certificate is to be stored. The RB configuration file (rb.conf) reports a default server host name for credential renewal. If you use a different server, specify it in the JDL job description through the MyProxyServer attribute. For example:
MyProxyServer = "skurut.cesnet.cz";
Note that the port number must not be provided.
Finally, the --nomsg option makes the command print neither messages nor errors on the standard output. If the command succeeds, only the dg_jobId assigned to the job is printed. Otherwise, only the location of the generated log file containing the error messages is printed. This option makes it easier to use dg-job-submit inside scripts, as an alternative to the --output option.
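The following Python sketch shows a script-side wrapper of this kind. The function name submit_job is hypothetical. It relies only on the --nomsg behaviour and on the exit status documented for dg-job-submit (0 on success, 1 on failure). The command parameter merely allows another executable to be substituted, for example in tests.

```python
import subprocess


def submit_job(jdl_path: str, command: str = "dg-job-submit") -> str:
    """Submit a job with --nomsg and return the dg_jobId it prints.

    With --nomsg, stdout holds only the dg_jobId on success and only the
    log file location on failure.
    """
    result = subprocess.run(
        [command, "--nomsg", jdl_path],  # the JDL file is the last argument
        capture_output=True,
        text=True,
    )
    printed = result.stdout.strip()
    if result.returncode != 0:
        raise RuntimeError(f"dg-job-submit failed, see log file {printed}")
    return printed


# Usage (requires the UI to be installed):
# job_id = submit_job("myjob1.jdl")
```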
Note that dg-job-submit is a fire-and-forget command. It exits successfully once the JDL has been passed to the RB and does not track what happens to the job afterwards. To find the reason for a job abort, run dg-job-status (looking in particular at the “Status Reason” field) and dg-job-get-logging-info on the job identifier returned by the submission.
Job Description File
A job description file describes the job characteristics and constraints in class-ad style. Details on the class-ad language are given in document [A1], also available at the following URL:
http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2.pdf.
The user must edit the job description file to insert the relevant information about the job that the RB later needs to perform the match-making. A template of the job description file, containing a basic set of attributes, can be obtained by calling the dg-job-submit command with the --template option. Job description file entries are strings having the format attribute = expression and are terminated by the semicolon character. If an entry spans more than one line, each line end has to be marked with a backslash (\) character. Comment lines must begin with a sharp character (#).
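The following Python sketch illustrates these syntax rules by splitting a job description file into attribute/expression pairs. It is not the actual UI parser, which relies on the class-ad library. The sketch assumes that semicolons and quotes need special treatment only inside double-quoted strings, and it keeps expressions as unparsed text. The function name parse_jdl is hypothetical.

```python
def parse_jdl(text: str) -> dict:
    """Split a job description file into {attribute: expression} pairs.

    Sketch only: expressions are returned as unparsed text.
    """
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue                     # comment line
        if stripped.endswith("\\"):
            stripped = stripped[:-1]     # drop the continuation backslash
        lines.append(stripped)
    body = " ".join(lines)

    entries, current = [], []
    in_string = escaped = False
    for ch in body:
        if ch == ";" and not in_string:  # end of an entry
            entries.append("".join(current))
            current = []
            continue
        current.append(ch)
        if escaped:
            escaped = False              # character after \ inside a string
        elif in_string and ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
    if "".join(current).strip():
        raise ValueError("last entry is not terminated by ';'")

    attributes = {}
    for entry in entries:
        if not entry.strip():
            continue
        # split on the first '=' only, so '==' in expressions is preserved
        name, sep, expression = entry.partition("=")
        if not sep:
            raise ValueError(f"malformed entry: {entry.strip()!r}")
        attributes[name.strip()] = expression.strip()
    return attributes
```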
Since the class-ad language is extensible, there is no fixed set of admitted attributes. The user can insert in the job description file any attribute she/he considers meaningful to describe her/his jobs. However, the Resource Broker takes into account for the match-making process only the attributes that can be related to the resource attributes published in the GIS. Unrelated attributes are simply ignored, except when they are used in the Requirements expression. In that case they are evaluated and can affect the match-making result. The attributes taken into account by the RB, together with their meaning, are reported in document [A7].
A small subset of class-ad attributes is compulsory. These attributes must be present in a job class-ad before it is sent to the Resource Broker, so that the match-making process can be performed.
They fall into two categories. Some must be provided by the user, while the others, if not provided, are filled in by the UI with configurable default values. Table 1 summarises these rules.
Attribute | Mandatory | Mandatory with default value (default value)
Executable | Yes |
Requirements | | Yes (TRUE)
Rank | | Yes (-other.EstimatedTraversalTime)
InputData | Yes (only if the ReplicaCatalog and/or the DataAccessProtocol attributes have been specified) |
ReplicaCatalog | Yes (only if the InputData attribute has been specified) |
DataAccessProtocol | Yes (only if the InputData attribute has been specified) |
Table 1 Mandatory Attributes
In Table 1 the default values for Requirements and Rank can be interpreted respectively as follows:
- if the user has not provided job constraints, Requirements is set to TRUE. The characteristics of the computing element where the job is executed then do not matter, and the RB takes into account all the sites where the user is authorised to run her/his application;
- in the JDL a greater Rank value means a better match. If no Rank expression has been provided, the default -other.EstimatedTraversalTime makes the RB prefer the resources where jobs wait a shorter time to pass from the SCHEDULED to the RUNNING status.
The default values for the Requirements and Rank attributes can be set in the UI_ConfigEnv.cfg file. See section 4.4.4 for details on how to use these defaults.
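The rules of Table 1 can be summarised by the following Python sketch. It checks a job description already split into {attribute: expression} pairs. The function is hypothetical and assumes exact-case attribute names. The defaults are hard-coded from Table 1, whereas the UI reads its actual defaults from UI_ConfigEnv.cfg.

```python
# Defaults from Table 1; the UI takes the real ones from UI_ConfigEnv.cfg.
DEFAULTS = {
    "Requirements": "TRUE",
    "Rank": "-other.EstimatedTraversalTime",
}
# Each of these is mandatory as soon as any of the others is present.
DATA_ATTRIBUTES = ("InputData", "ReplicaCatalog", "DataAccessProtocol")


def complete_job_ad(attributes: dict) -> dict:
    """Check the mandatory attributes of Table 1 and fill in the defaults."""
    if "Executable" not in attributes:
        raise ValueError("Executable is mandatory")
    present = [name for name in DATA_ATTRIBUTES if name in attributes]
    if present and len(present) != len(DATA_ATTRIBUTES):
        missing = [name for name in DATA_ATTRIBUTES if name not in attributes]
        raise ValueError("missing attributes: " + ", ".join(missing))
    completed = dict(DEFAULTS)
    completed.update(attributes)  # user-supplied values override the defaults
    return completed
```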
Because the class-ad language (and hence the JDL) is extensible, the user can freely include new attributes in the job description. These attributes are ignored by the RB/JSS for scheduling. The UI still passes them through (if their syntax is correct), since they could be relevant to the submitter or to some other component processing the JDL.
However, if the job description file contains attributes unknown to the RB/JSS, the UI prints a warning listing all of them (when used with the --debug option).
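The following Python sketch shows one possible shape of this check. The set of known attributes is a placeholder for those documented in [A7], and the function name is hypothetical. The sketch compares attribute names case-insensitively, which is an assumption. Unknown attributes are only reported and never removed, since the UI passes them through.

```python
import sys


def warn_unknown_attributes(attributes, known):
    """Report (on stderr) and return the attributes unknown to the RB/JSS.

    'known' stands for the attributes documented in [A7]; the job
    description itself is left untouched.
    """
    known_lower = {name.lower() for name in known}
    unknown = [name for name in attributes if name.lower() not in known_lower]
    if unknown:
        print("Warning: attributes unknown to RB/JSS: " + ", ".join(unknown),
              file=sys.stderr)
    return unknown
```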
OPTIONS
--help
displays command usage.
--version
displays UI version.
--resource ce_id
-r ce_id
if the command is launched with this option, the job-ad sent to the RB contains a line of the type SubmitTo = ce_id. The Resource Broker then submits the job to the resource identified by ce_id without going through the match-making process. The accepted format for the CEId is:
<full hostname>:<port number>/jobmanager-<service>-<queue name>
where the currently valid values for the <service> field are lsf, pbs and bqs.
Note that when this option is used the RB does not generate the “.BrokerInfo” file.
--input input_file
-i input_file
if this option is specified, the user is asked to choose a CEId from a list of CEs contained in input_file. Once a CEId has been selected, the command behaves as explained for the --resource option. If this option is used together with the --noint option and the input file contains more than one CEId, the first CEId in the list is used for submitting the job.
--notify e_mail_address
-n e_mail_address
when a job is submitted with this option, an e-mail message is sent to the specified e_mail_address when the job enters one of the following states. The message contains basic information pertaining to the job identification and status.
- READY
- RUNNING
- ABORTED or DONE
Notification to multiple contacts can be requested by specifying the corresponding e-mail addresses separated by commas and without blanks.
--config path_name
-c path_name
if the command is launched with this option, the
configuration file pointed to by path_name
is used instead of the standard configuration file.
--output out_file
-o out_file
writes the generated dg_jobId assigned to the submitted job in the file specified by out_file. out_file can be either a simple name or an absolute path (on the submitting machine). In the former case the file out_file is created in the current working directory.
--hours H
-h H
allows the user to specify the user proxy duration H, in hours, needed for submitting the job. When used with this option the dg-job-submit command behaves as follows:
- the command checks for user proxy existence and if the proxy does not exist a new proxy with H hours duration is created
- if the proxy exists then its duration is checked against the value specified with the --hours option. If proxy duration is greater than H hours then the job is submitted with the existing proxy, otherwise the old proxy is destroyed and a new one with H hours duration is created and used for submitting the job.
This mechanism lets the user create, before submission, a proxy with a suitable duration for her/his job. Moreover, the user does not have to enter the PEM pass-phrase at each submission whenever the existing proxy's validity is long enough for the job.
--nomsg
this option makes the command print on the standard output only the dg_jobId generated for the job if the submission was successful. Otherwise, it prints only the location of the log file containing messages and diagnostics.
--noint
if this option is specified every interactive question to the user is skipped, moreover only the dg_jobId is returned on the standard output. All warning messages and errors (if occurred) are written to the file dg-job-submit_<UID>_<PID>.log under the /tmp directory. Log file location is configurable.
--debug
when this option is specified, information about the parameters used for the API function calls inside the command is displayed on the standard output. It is also written to the dg-job-submit_<UID>_<PID>.log file under the /tmp directory. The location of the log file is configurable.
--logfile log_file
when this option is specified, the command log file is relocated to the location pointed to by log_file.
job_description_file
this is the file containing the classad describing the job to be submitted. It must be the last argument of the command.
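The following Python sketch summarises the decision dg-job-submit takes about the user proxy when --hours H is given. The function is hypothetical. It assumes that the proxy "duration" compared with H is the remaining validity of the existing proxy.

```python
from typing import Optional


def proxy_action(remaining_hours: Optional[float], required_hours: float) -> str:
    """Decide what dg-job-submit --hours H does with the user proxy.

    remaining_hours is the remaining validity of the existing proxy, or
    None if no proxy exists. Returns "create", "reuse" or "replace".
    """
    if remaining_hours is None:
        return "create"   # no proxy: create one lasting H hours
    if remaining_hours > required_hours:
        return "reuse"    # long enough: submit with the existing proxy
    return "replace"      # destroy the old proxy and create a new one
```

Only the "create" and "replace" outcomes require the user to type the PEM pass-phrase.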
Exit Status
dg-job-submit exits with a status value of 0 (zero) upon success, and 1 (one) upon failure.
Examples
1. $> dg-job-submit myjob1.jdl
where myjob1.jdl is as follows:
##############################################
#
# -------- Job description file ----------
#
##############################################
Executable = "$(CMS)/fpacini/exe/sum.exe";
InputData = "LF:testbed0-00019";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
Rank = other.MaxCpuTime;
Requirements = other.LRMSType == "Condor" && \
    (!(RegExp("*nikhef*",other.CEId)));
submits sum.exe to a resource (assumed to contain the executable file) whose LRMS is Condor and whose CE identifier does not contain the string “nikhef”. The command returns the following output, which includes the job handle (dg_jobId):
================= dg-job-submit Success ===================================
The job has been successfully submitted to the Resource Broker. Your job is identified by (dg_jobId):
https://grid004f.cnaf.infn.it:7846/155.198.211.205/161251122764136?grid004f.cnaf.infn.it:7771
Use dg-job-status command to display current job status.
======================================================================
2. $> dg-job-submit --notify fpacini@datamat.it myjob2.jdl
submits the job described by myjob2.jdl, returns the same output as above to the user and sends an e-mail notification to fpacini@datamat.it at well-defined job status changes.
See also
[A1], [A2], dg-job-list-match.
– dg-job-get-output
This command requests from the RB the job output files (specified by the OutputSandbox attribute of the job-ad) and stores them on the local disk of the submitting machine.
SYNOPSIS
dg-job-get-output [options] <job Id(s)>
Options:
--help
--version
--input, -i <input_file>
--dir <directory_path>
--config, -c <config_file>
--noint
--debug
--logfile <log_file>
DESCRIPTION
The dg-job-get-output command can be used to retrieve the output files of a job that has been submitted through the dg-job-submit command with a job description file including the OutputSandbox attribute. When the job has terminated its execution, the user can download the files generated by the job and temporarily stored on the RB machine (as specified by the OutputSandbox attribute). To do so, the user issues dg-job-get-output with the dg_jobId returned by dg-job-submit as input. It is also possible to specify a list of job identifiers, or an input file containing dg_jobIds by means of the --input option. When --input is used, the user is asked to choose all, one or a subset of the job identifiers contained in the input file.
It is important to note that the OutputSandbox of a submitted job can only be retrieved when the job has reached the OutputReady status (see Annex 7.2) indicating that the job is done and the OutputSandbox files are ready for retrieval on the RB machine. dg-job-get-output will always fail for jobs that are not yet in the OutputReady status.
The user can choose the local directory on the UI machine where these files are stored by means of the --dir option. Otherwise, the retrieved files are put in a default location specified in the UI_ConfigENV.cfg configuration file (DEFAULT_STORAGE_AREA_IN parameter). In both cases a sub-directory is appended to the supplied path. Its name is the “<time><PID><RND>” unique string of the dg_jobId (see command dg-job-submit for details on the dg_jobId structure).
If the user wants to use his “private” configuration file, this can be done using option --config path_name. As a consequence the dg-job-get-output command looks for the file “path_name” instead of the standard configuration file. If this file does not exist the user is notified with an error message and the command is aborted.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--dir directory_path
retrieved files (previously
listed by the user through the OutputSandbox
attribute of the job description file)
are stored in the location indicated by directory_path/<dg_jobId unique string>.
--config path_name
-c path_name
if the command is launched with this option, the
configuration file pointed to by path_name
is used instead of the standard configuration file.
--noint
if this option is specified every interactive question to the user is skipped. All warning messages and errors (if occurred) are written to the file dg-job-get-output_<UID>_<PID>.log under the /tmp directory. Location of log file is configurable.
--debug
when this option is specified, information about the parameters used for the API function calls inside the command is displayed on the standard output. It is also written to the dg-job-get-output_<UID>_<PID>.log file under the /tmp directory. The location of the log file is configurable.
--logfile log_file
when this option is specified, the command log file is relocated to the location pointed to by log_file.
dg_jobId
job identifier returned by dg-job-submit. If a list of one or more job identifiers is specified, the dg_jobIds have to be separated by a blank. Job identifiers must be the last arguments of the command.
--input input_file
-i input_file
this option makes the command return the OutputSandbox files for each dg_jobId contained in input_file. This option cannot be used if one or more dg_jobIds have already been specified. The input file must contain one dg_jobId per line, and comment lines must begin with a “#” or a “*” character.
Exit Status
dg-job-get-output exits with a status value of 0 (zero) upon success, >0 upon failure and <0 upon partial failure. For example, a partial failure occurs when more than one job identifier has been specified and the OutputSandbox could be retrieved only for some of them.
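Keep one detail in mind when a script checks these exit statuses. A POSIX process passes only the low 8 bits of its exit value to its parent, so a -1 partial-failure code is reported as 255 (for instance in the shell variable $?). The Python sketch below assumes that values above 127 are negative codes and reads them back as signed bytes. The same 0 / >0 / <0 convention is documented for dg-job-cancel and dg-job-status. The function name is hypothetical.

```python
def classify_exit_status(returncode: int) -> str:
    """Interpret the exit status of dg-job-get-output, dg-job-cancel or
    dg-job-status: 0 success, >0 failure, <0 partial failure.

    Only the low 8 bits of an exit value reach the parent process, so a
    command exiting with -1 is seen as 255. This sketch assumes values
    above 127 are negative codes and converts them back to signed bytes.
    """
    if returncode < 0:
        # Python's subprocess reports "terminated by signal N" as -N
        return "killed by signal"
    signed = returncode - 256 if returncode > 127 else returncode
    if signed == 0:
        return "success"
    if signed > 0:
        return "failure"
    return "partial failure"
```

The argument can be, for example, the returncode attribute of the object returned by subprocess.run.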
Examples
Let us consider the following command:
$> dg-job-get-output https://grid004.it:2234/124.75.74.12/12354732109721?firefox.esrin.esa.it:4577 --dir /home/data
It retrieves the files listed in the OutputSandbox attribute of job identified by https://grid004.it:2234/124.75.74.12/12354732109721?firefox.esrin.esa.it:4577 from the RB and stores them locally in /home/data/12354732109721.
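The following Python sketch derives the sub-directory name of this example from the dg_jobId. The helper is hypothetical. It assumes the dg_jobId layout shown in the examples of this guide (https://<host>:<port>/<address>/<unique string>?<host>:<port>).

```python
import os
from urllib.parse import urlsplit


def output_directory(dg_job_id: str, base_dir: str) -> str:
    """Directory where dg-job-get-output stores a job's OutputSandbox files.

    base_dir is the --dir argument (or DEFAULT_STORAGE_AREA_IN); the
    sub-directory is the unique <time><PID><RND> string of the dg_jobId,
    i.e. the last path component before the '?'.
    """
    path = urlsplit(dg_job_id).path  # e.g. /124.75.74.12/12354732109721
    unique = path.rstrip("/").rsplit("/", 1)[-1]
    if not unique:
        raise ValueError(f"cannot extract the unique string from {dg_job_id!r}")
    return os.path.join(base_dir, unique)


print(output_directory(
    "https://grid004.it:2234/124.75.74.12/12354732109721"
    "?firefox.esrin.esa.it:4577",
    "/home/data",
))  # prints /home/data/12354732109721 on a POSIX system
```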
– dg-job-list-match
Returns the list of resources fulfilling job requirements.
SYNOPSIS
dg-job-list-match [options] <jdl file>
Options:
--help
--version
--verbose
--config, -c <config_file>
--output, -o <output_file>
--noint
--debug
--logfile <log_file>
DESCRIPTION
dg-job-list-match displays the list of identifiers of the resources accessible by the user and satisfying the job requirements included in the job description file. Depending on the chosen command options, the CE identifiers are returned either on the standard output or in a file. They are strings uniquely identifying the CEs published in the GIS.
dg-job-list-match requires a job description file in which job characteristics and requirements are expressed by means of a Condor class-ad. The job description file is first syntactically checked and then used as the main command-line argument to dg-job-list-match. The Resource Broker is only contacted to find job compatible resources; the job is never submitted. See the dg-job-submit section and in particular Table 1 for general rules for building the job description file.
If the user wants to use his “private” configuration file, this can be done using the option --config path_name.
The option --verbose of the dg-job-list-match command can be used to obtain on the standard output the class-ad sent to the RB generated from the job description.
The --output option makes the command save the list of compatible resources into the specified file. If the provided file name is not an absolute path, the output file is created in the current working directory.
The CEId attribute of the JDL is a resource attribute. The dg-job-list-match command therefore takes it into account only if it appears in the Requirements expression prefixed by “other.”. On the other hand, the job attribute SubmitTo is reserved to the UI and is therefore discarded if provided directly by the user in the JDL file.
Job Description File
See dg-job-submit for details.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--verbose
-v
displays on the standard output the job class-ad that is sent to the Resource Broker generated from the job description file. This differs from the content of the job description file since the UI adds to it some attributes that cannot be directly inserted by the user (e.g. CertificateSubject, defaults for Rank and Requirements if not provided).
--config path_name
-c path_name
if the command is launched with this option, the
configuration file pointed to by path_name
is used instead of the standard configuration file.
--output output_file
-o output_file
returns the CEIds list in the file specified by output_file. output_file can be either a simple name or an absolute path (on the submitting machine). In the former case the file output_file is created in the current working directory.
--noint
if this option is specified, every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file dg-job-list-match_<UID>_<PID>.log under the /tmp directory. The location of the log file is configurable.
--debug
when this option is specified, information about the API functions called inside the command is displayed on the standard output. It is also written to the file dg-job-list-match_<UID>_<PID>.log under the /tmp directory. The location of the log file is configurable.
--logfile log_file
when this option is specified, the command log file is relocated to the location pointed to by log_file.
job_description_file
this is the file containing the classad describing the job to be submitted. It must be the last argument of the command.
Exit Status
dg-job-list-match exits with a status value of 0 (zero) upon success, and a non-zero value upon failure.
Examples
Let us consider the following command:
$> dg-job-list-match myjob.jdl
where the job description file myjob.jdl looks like:
#########################################
#
# ---- Sample Job Description File ----
#
#########################################
Executable = "sum.exe";
StdInput = "data.in";
InputSandbox = {"/home_firefox/fpacini/exe/sum.exe","/home1/data.in"};
OutputSandbox = {"data.out","sum.err"};
Rank = other.MaxCpuTime;
Requirements = other.LRMSType == "Condor" && other.Architecture == "INTEL" && other.OpSys == "LINUX" && other.FreeCpus >= 2;
In this case the job requires CEs that are Condor pools of INTEL LINUX machines with at least 2 free CPUs. Moreover, the Rank expression states that queues allowing a higher maximum CPU time for jobs are preferred.
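The following Python sketch illustrates how Requirements and Rank combine during match-making. It is a simplification and not the RB algorithm. The CE identifiers and attribute values are invented, and the class-ad expressions of myjob.jdl are hand-translated into Python functions whose argument plays the role of "other". Real match-making also takes into account where the user is authorised to run.

```python
from typing import Callable, Dict, List

CE = Dict[str, object]  # hypothetical CE attributes, keyed as in the GIS


def match(ces: List[CE], requirements: Callable[[CE], bool],
          rank: Callable[[CE], float]) -> List[str]:
    """Return the ids of the CEs satisfying requirements, best rank first."""
    compatible = [ce for ce in ces if requirements(ce)]
    compatible.sort(key=rank, reverse=True)  # greater Rank = better match
    return [ce["CEId"] for ce in compatible]


# Requirements and Rank of myjob.jdl, written as Python functions.
def requirements(other: CE) -> bool:
    return (other["LRMSType"] == "Condor" and other["Architecture"] == "INTEL"
            and other["OpSys"] == "LINUX" and other["FreeCpus"] >= 2)


def rank(other: CE) -> float:
    return other["MaxCpuTime"]


ces = [  # invented example resources
    {"CEId": "ce-a", "LRMSType": "Condor", "Architecture": "INTEL",
     "OpSys": "LINUX", "FreeCpus": 4, "MaxCpuTime": 600},
    {"CEId": "ce-b", "LRMSType": "Condor", "Architecture": "INTEL",
     "OpSys": "LINUX", "FreeCpus": 8, "MaxCpuTime": 2880},
    {"CEId": "ce-c", "LRMSType": "PBS", "Architecture": "INTEL",
     "OpSys": "LINUX", "FreeCpus": 16, "MaxCpuTime": 9999},
]
print(match(ces, requirements, rank))  # ['ce-b', 'ce-a']
```

ce-c is excluded because its LRMS is not Condor, even though its Rank would be the highest.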
The response of such a command is something like the following:
***************************************************************************
Computing Element IDs LIST
The following CE(s) matching your job requirements have been found:
- bbq.mi.infn.it:2119/jobmanager-pbs-dque
- skurut.cesnet.cz:2119/jobmanager-pbs-wp1
***************************************************************************
$>
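The identifiers in this list follow the CEId format accepted by the --resource option of dg-job-submit. A script can therefore pick one of them and validate it before passing it to dg-job-submit --resource. The following Python sketch assumes the output layout shown above and the service values listed for --resource (lsf, pbs and bqs). Both function names are hypothetical.

```python
import re

# <full hostname>:<port number>/jobmanager-<service>-<queue name>
CEID_PATTERN = re.compile(
    r"^(?P<host>[A-Za-z0-9.-]+):(?P<port>\d+)"
    r"/jobmanager-(?P<service>lsf|pbs|bqs)-(?P<queue>\S+)$"
)


def parse_ceid(ce_id: str) -> dict:
    """Split a CEId into its fields, rejecting ids with an invalid format."""
    m = CEID_PATTERN.match(ce_id)
    if m is None:
        raise ValueError(f"invalid CEId: {ce_id!r}")
    fields = m.groupdict()
    fields["port"] = int(fields["port"])
    return fields


def ceids_from_list_match(output: str) -> list:
    """Collect the '- <CEId>' lines printed by dg-job-list-match."""
    return [line.strip()[2:].strip() for line in output.splitlines()
            if line.strip().startswith("- ")]
```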
See also
[A1],[A2], dg-job-submit.
– dg-job-cancel
Cancels one or more submitted jobs.
SYNOPSIS
dg-job-cancel [options] <job Id(s)>
Options:
--help
--version
--all
--input, -i <input_file>
--notify, -n <e-mail_address(es)>
--config, -c <config_file>
--output, -o <output_file>
--noint
--debug
--logfile <log_file>
DESCRIPTION
This command cancels a job previously submitted using dg-job-submit. Before cancellation, it prompts the user for confirmation. The cancel request is sent to the Resource Broker that forwards it to the JSS that fulfils it.
dg-job-cancel can remove one or more jobs: the jobs to be removed are identified by their job identifiers (dg_jobIds returned by dg-job-submit) provided as arguments to the command and separated by a blank space. The result of the cancel operation is reported to the user for each specified dg_jobId.
If the --all option is specified, all the jobs owned by the user submitting the command are removed. When the command is launched with the --all option, no dg_jobId can be specified. Note that only the owner of a job can remove it. With the --all option, the dg-job-cancel command contacts every Resource Broker listed in the UI_ConfigEnv.cfg file. It asks each of them to cancel all the jobs owned by the user, identified by her/his certificate subject.
If the user wants to use his “private” configuration file, this can be done using the option --config path_name.
The --input option allows the user to specify a file (input_file) containing the dg_jobIds to be removed. The file must contain one dg_jobId per line, and comment lines must begin with a “#” or a “*” character. When using this option, the user is asked to choose among all, one or a subset of the listed job identifiers. If input_file is not an absolute path, it is searched for in the current working directory.
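Scripts can easily produce or read the input file format shared by dg-job-cancel, dg-job-status and dg-job-get-output. The following Python sketch shows a reader. The helper is hypothetical. It also skips blank lines and ignores leading blanks, which the format description does not specify.

```python
def read_job_ids(text: str) -> list:
    """Return the dg_jobIds listed in an --input file.

    One dg_jobId per line; lines starting with '#' or '*' are comments.
    Blank lines and leading blanks are ignored (an assumption).
    """
    ids = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry[0] in "#*":
            continue
        ids.append(entry)
    return ids
```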
Possible job cancellation notifications are:
- Cancel SUCCESS i.e. the job has been successfully marked for removal.
- Cancel GENERIC_FAILURE i.e. the user is not the owner of the job or the cancellation request has reached the JSS but has failed for some unknown reason.
- Cancel CONDOR_FAILURE i.e. the cancellation request has failed due to a CondorG problem.
- Cancel GLOBUS_FAILURE i.e. the cancellation request has failed due to a Globus job-manager problem.
- Cancel NOENT_FAILURE i.e. the job has not been found by JSS, by CondorG or by the Resource Broker.
The --notify option can be used to receive job cancellation notifications by e-mail. When this option is used, the UI does not wait for the cancel notifications from the RB. It returns control to the user as soon as the RB has accepted the cancellation request. This is useful when a large number of jobs has to be cancelled and the user wants to perform other operations without waiting for the command results.
Notification to multiple contacts can be requested by specifying the corresponding e-mail addresses separated by commas and without blanks.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--all
cancels all the jobs owned by the user submitting the command. This option cannot be used if one or more dg_jobIds have been specified explicitly, nor together with the --input option.
--input input_file
-i input_file
cancels the dg_jobIds contained in input_file. This option cannot be used if one or more dg_jobIds have been specified, nor together with the --all option.
--notify e_mail_address
-n e_mail_address
when a cancel request is submitted with this option, an e-mail message is sent to the specified e_mail_address. The message reports the cancellation success or failure of the jobs specified in input. When the --all option has been specified or the cancellation involves more than one job, an e-mail message is sent to the user for each RB that has performed cancellations on behalf of the UI.
Notification to multiple contacts can be requested by specifying the corresponding e-mail addresses separated by commas and without blanks.
--config path_name
-c path_name
if the command is launched with this option, the
configuration file pointed to by path_name
is used instead of the standard configuration file.
--output output_file
-o output_file
writes the cancel results in the file specified by output_file instead of the standard output. output_file can be either a simple name or an absolute path (on the submitting machine). In the former case the file output_file is created in the current working directory.
--noint
if this option is specified every interactive question to the user is skipped. All warning messages and errors (if occurred) are written to the file dg-job-cancel_<UID>_<PID>.log under the /tmp directory. Location of the log file is configurable.
--debug
when this option is specified, information about the API functions called inside the command is displayed on the standard output. It is also written to the file dg-job-cancel_<UID>_<PID>.log under the /tmp directory. The location of the log file is configurable.
--logfile log_file
when this option is specified, the command log file is relocated to the location pointed to by log_file.
dg_jobId
job identifier returned by dg-job-submit.
The job identifier list must be the last argument of this command.
Exit Status
dg-job-cancel exits with a status value of 0 if all the specified jobs were cancelled successfully, >0 if errors occurred for each specified job id and <0 in case of partial failure. For example, a partial failure occurs when more than one job has been specified and only some of the jobs could be removed.
Examples
1. $> dg-job-cancel dg_jobId1 dg_jobId2
displays the following confirmation message:
Are you sure you want to remove all jobs specified? [y/n]n: y
**********************************************
JOBS CANCEL OUTCOME
Cancel SUCCESS for job:
- dg_jobId1
The job has been successfully marked for removal
------
Cancel NOENT_FAILURE for job:
- dg_jobId2
Job not found by the Resource Broker
**********************************************
$>
In this case the command exit code is -1.
2. $> dg-job-cancel --all
displays the following confirmation message:
Are you sure you want to remove all jobs owned by user Fabrizio Pacini? [y/n]n: y
**********************************************
JOBS CANCEL OUTCOME
Cancel SUCCESS for job:
- dg_jobId1
The job has been successfully marked for removal
------
Cancel SUCCESS for job:
- dg_jobId2
The job has been successfully marked for removal
**********************************************
$>
The exit code in this case is 0.
See also
[A2], dg-job-submit.
– dg-job-status
Displays bookkeeping information about submitted jobs.
SYNOPSIS
dg-job-status [options] <job Id(s)>
Options:
--help
--version
--all
--input, -i <input_file>
--full, -f
--config, -c <config_file>
--output, -o <output_file>
--noint
--debug
--logfile <log_file>
DESCRIPTION
This command prints the status of a job previously submitted using dg-job-submit. The job status request is sent to the LB, which provides the requested information. The status can be queried at any time during the job lifetime.
dg-job-status can monitor one or more jobs: the jobs to be checked are identified by one or more job identifiers (dg_jobIds returned by dg-job-submit) provided as arguments to the command and separated by a blank space.
If the --all option is specified, information about all the jobs owned by the user submitting the command is printed on the standard output. When the command is launched with the --all option, neither can a dg_jobId be specified nor can the --input option be specified.
The --input option allows the user to specify a file (input_file) containing the dg_jobIds to monitor. The file must contain one dg_jobId per line, and comment lines have to begin with a “#” or a “*” character. When using this option, the user is asked to choose among all, one or a subset of the listed job identifiers. If input_file is not an absolute path, it is searched for in the current working directory.
If the user wants to use his “private” configuration file, this can be done using option --config path_name.
The job information displayed to the user (bookkeeping information) encompasses:
- dg_jobId (the job unique identifier)
- Status (the job current status)
- Job Exit Code (the job exit code, if different from 0)
- Job Owner (User Certificate Subject)
- Location (Id of RB, JSS or CE)
- Destination (Id of the CE where the job will be transferred to)
- Status Enter Time (when the job entered its current state)
- Last Update Time (last known event timestamp)
- Status Reason (reason for being in this state)
If the --full option is specified, dg-job-status displays a long description of the queried jobs by printing in addition the following information:
- CE Node (id of the cluster node(s) where the job is running)
- JssId (job identifier in the JSS)
- GlobusId (job identifier in the Globus job-manager)
- LocalId (id in the CE queue (PBS, LSF, ..))
- Job Description (JDL) (complete JDL description of the job)
- JSS Job Description (JDL) (complete JDL job description as sent to the JSS)
- Job Description (job description for Condor-G built from the JDL one)
- Moving (intermediate state: JobTransfer has been logged but neither JobAccepted nor JobRefused has been logged yet; in this case ‘state’ and ‘location’ refer to the source of the job transfer)
- Cancelling (whether job cancellation is in progress)
- Cancel Reason (cancellation status message)
Information fields that are not available (i.e. not returned by the LB) are not printed at all to the user.
The job Status possible values are reported in Annex 7.2. Details on the Job Status Diagram can be found in [A4].
OPTIONS
--help
displays command usage.
--version
displays UI version.
--all
displays status information about all the jobs owned by the user submitting the command. This option cannot be used if one or more dg_jobIds have been specified, nor if the --input option has been specified. All the LBs listed in the UI configuration file UI_ConfigENV.cfg are contacted to fulfil this request.
--input input_file
-i input_file
displays bookkeeping info about the dg_jobIds contained in input_file. When using this option, the user is asked to choose among all, one or a subset of the listed job identifiers. This option cannot be used if one or more dg_jobIds have been specified, nor if the --all option has been specified.
--full
displays a long description of the queried jobs
--config path_name
-c path_name
if the command is launched with this option, the
configuration file pointed to by path_name
is used instead of the standard configuration file.
--output output_file
-o output_file
writes the bookkeeping information to the file specified by output_file instead of the standard output. output_file can be either a simple name or an absolute path (on the submitting machine). In the former case the file output_file is created in the current working directory.
--noint
if this option is specified every interactive question to the user is skipped. All warning messages and errors (if any) are written to the file dg-job-status_<UID>_<PID>.log under the /tmp directory. Location of log file is configurable.
--debug
when this option is specified, information about the API functions called inside the command is displayed on the standard output. It is also written to the file dg-job-status_<UID>_<PID>.log under the /tmp directory. The location of the log file is configurable.
--logfile log_file
when this option is specified, the command log file is relocated to the location pointed to by log_file.
dg_jobId
job identifier returned by dg-job-submit. Job identifiers must always be provided as the last arguments of the command.
Exit Status
dg-job-status exits with a value of 0 if the status of all the specified jobs is retrieved correctly, >0 if errors occurred for every specified job id and <0 in case of partial failure. An example of partial failure is when more than one job is specified: status info could be successfully retrieved for some jobs and not for others.
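When dg-job-status is driven from a script, the three outcomes above must be told apart from the numeric exit status. The following sketch assumes that a negative exit value -n reaches the shell as 256-n (the usual 8-bit truncation of process exit codes) and that positive failure codes stay below 128; both are assumptions, not documented properties of the UI commands. classify_dg_exit is a hypothetical helper name, and the same mapping would apply to dg-job-get-logging-info and dg-job-id-info, which document the same convention.

```sh
# classify_dg_exit STATUS: map the exit status of dg-job-status (or of
# dg-job-get-logging-info / dg-job-id-info) to "ok", "failed" or "partial".
# Assumption: a negative exit value -n is seen by the shell as 256-n, while
# positive failure codes stay below 128. A process killed by signal N is
# also reported as 128+N, so it would show up here as "partial".
classify_dg_exit() {
  if [ "$1" -eq 0 ]; then
    echo ok
  elif [ "$1" -ge 128 ]; then
    echo partial
  else
    echo failed
  fi
}

# Example use (JOBID1 and JOBID2 are placeholders):
#   dg-job-status --noint "$JOBID1" "$JOBID2" > status.txt
#   rc=$?
#   case $(classify_dg_exit "$rc") in
#     ok)      echo "status retrieved for every job" ;;
#     partial) echo "status retrieved for some jobs only" ;;
#     failed)  echo "status retrieved for no job" ;;
#   esac
```

Capturing $? into rc right after the command keeps the status from being overwritten by any later command.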
Examples
$> dg-job-status dg_jobId2
displays the following lines:
********************************************************************
BOOKKEEPING INFORMATION
Printing status for the job: https://grid004f.cnaf.infn.it:7846/155.198.211.205/085936117861491?grid004f.cnaf.infn.it:7771
---
dg_JobId = https://grid004f.cnaf.infn.it:7846/155.198.211.205/085936117861491?grid004f.cnaf.infn.it:7771
Job Owner = /C=IT/O=ESA/OU=ESRIN/CN=Fabrizio Pacini/Email=fpacini@datamat.it
Status = Ready
Location = grid004f.cnaf.infn.it
Job Destination = skurut.cesnet.cz:2119/jobmanager-pbs-wp1
Status Enter Time = Wed Sep 19 08:20:53 2001
Last Update Time = Wed Sep 19 08:35:13 2001
********************************************************************
$>
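To reuse this output in a script, a single field can be extracted from the "Key = value" lines. The sketch below assumes that each field is printed on one line (the line breaks in printed listings come from the page layout, not from the command); dg_field is a hypothetical helper name.

```sh
# dg_field KEY: print the value of every "KEY = value" line read from
# standard input, e.g. the "Status" or "Status Reason" field of the
# dg-job-status output. KEY must not contain "/" or regex metacharacters
# such as "." or "[". "Status" does not match "Status Enter Time", because
# the "=" must follow the key directly (apart from blanks).
dg_field() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p"
}
```

For instance, piping the output of dg-job-status --noint <job_Identifier> into dg_field Status would print only the job state, and dg_field "Status Reason" would print the failure reason when the command reports one.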
See also
[A1], [A2], [A4], dg-job-submit.
– dg-job-get-logging-info
Displays logging information about submitted jobs.
SYNOPSIS
dg-job-get-logging-info [options] <job Id(s)>
Options:
--help
--version
--all
--input, -i <input_file>
--from <T1>
--to <T2>
--full
--level, -l
--config, -c <config_file>
--output, -o <output_file>
--noint
--debug
--logfile <log_file>
DESCRIPTION
This command queries the LB persistent DB for logging information about jobs previously submitted using dg-job-submit. The job logging information is stored permanently by the LB service and can be retrieved even after the job has terminated its life-cycle, unlike the bookkeeping information, which is in some way “consumed” by the user during the job existence.
The dg-job-get-logging-info request is sent to the LB service, which queries the DB and returns the retrieved information. The logging information contains:
- Event Type (possible event types are listed in Annex 7.3)
- dg_jobId
- Logging Level
- Date (UTC)
- Job Transfer Destination
- Host Name
- Job Run Node
- Source Program
- Job Owner
If the command is issued with the --full option, additional information is printed to the user: the job description in JDL, in the Condor-G format, or both, depending on the WMS component that logged the event.
Data on several jobs can be queried by specifying a list of job identifiers, separated by blank spaces, as arguments of the command. Moreover, the --input option allows the user to specify a file (input_file) containing the dg_jobIds whose information is requested. The file must have the following format: one dg_jobId per line, with comment lines beginning with a “#” or a “*” character. When using this option the user is prompted to choose among all, one or a subset of the listed job identifiers. If input_file is not an absolute path, it is searched for in the current working directory.
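Before passing a hand-edited file to --input, it can help to see which job identifiers it will actually supply. The following sketch applies the comment rule described above (first character “#” or “*”); dropping blank lines and trimming surrounding blanks are additional assumptions of the sketch, not documented UI behaviour, and list_job_ids is a hypothetical helper name.

```sh
# list_job_ids FILE: print the dg_jobIds that FILE would supply to --input,
# one per line. Lines whose first character is "#" or "*" are comments;
# blank lines are dropped and surrounding blanks trimmed (assumption).
list_job_ids() {
  grep -v '^[#*]' "$1" |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    grep -v '^$'
}
```

The same file can then be given unchanged to dg-job-status or dg-job-get-logging-info through -i.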
If the --all option is used, logging information about all the jobs owned by the user submitting the command is printed on the standard output. When the --all option is used, no dg_jobId can be specified and the --input option is not allowed.
To perform more complex queries, the user can restrict the request to a time range of interest by using the --from (T1) and --to (T2) options. These options take as input timestamps in the format hhmmssDDMMYYYY (UTC) and make the command retrieve job logging information only for the specified time interval. If these options are not specified, the default values are the Unix Epoch (for T1) and the current time, i.e. the time the command is issued (for T2).
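The hhmmssDDMMYYYY layout is easy to get wrong by hand, since it puts the time before the date and the day before the month. The following sketch builds it from a more familiar “YYYY-MM-DD” date and “hh:mm:ss” time, both in UTC; the input layout and the name lb_timestamp are assumptions of the sketch, only the output layout comes from this guide.

```sh
# lb_timestamp DATE TIME: convert a UTC date "YYYY-MM-DD" and time "hh:mm:ss"
# into the hhmmssDDMMYYYY layout expected by --from and --to.
lb_timestamp() {
  case $1 in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "lb_timestamp: bad date '$1' (want YYYY-MM-DD)" >&2; return 1 ;;
  esac
  case $2 in
    [0-9][0-9]:[0-9][0-9]:[0-9][0-9]) ;;
    *) echo "lb_timestamp: bad time '$2' (want hh:mm:ss)" >&2; return 1 ;;
  esac
  # Split the date and the time with parameter expansion (no external tools).
  year=${1%%-*}; rest=${1#*-}; month=${rest%%-*}; day=${rest#*-}
  hh=${2%%:*};   rest=${2#*:}; mm=${rest%%:*};    ss=${rest#*:}
  printf '%s%s%s%s%s%s\n' "$hh" "$mm" "$ss" "$day" "$month" "$year"
}

# Example 1 of this section, rebuilt with the helper:
#   dg-job-get-logging-info --all \
#     --from "$(lb_timestamp 2001-05-05 12:15:00)" \
#     --to "$(lb_timestamp 2001-05-06 10:00:00)" --output mylog.txt
# The current UTC time in the same layout: date -u +%H%M%S%d%m%Y
```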
Each event logged in the LB has an associated log level according to the “Universal Format for Logger Messages” (see draft-abela-ulm-05.txt, available at http://www-didc.lbl.gov/NetLogger/draft-abela-ulm-05.txt). The default log level used by WMS components is System; however, in special situations where problem investigation is needed, additional events are logged with the Debug log level. The --level option of the dg-job-get-logging-info command makes the LB also return the events having the Debug log level. If the --level option is not used, only events with the System log level are returned.
The --output option can be used to have the retrieved information written in the file identified by output_file instead of the standard output. output_file can be either a simple name or an absolute path (on the submitting machine). In the former case the file output_file is created in the current working directory.
A user who wants to use a “private” configuration file can do so with the option --config path_name.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--all
retrieves logging information about all jobs owned by the user submitting the command. If used, this option must be provided as the first command argument.
--input input_file
-i input_file
retrieves logging info for all the dg_jobIds contained in input_file. This option cannot be used if one or more dg_jobIds are specified or if the --all option is used.
--from T1
gets job events logged since the specified date T1. T1 must be in the form hhmmss[DDMMYYYY]. If DDMMYYYY is not provided, the time is taken to refer to the current day.
--to T2
gets job events logged up to the specified date T2. T2 must be in the form hhmmss[DDMMYYYY]. If DDMMYYYY is not provided, the time is taken to refer to the current day.
--full
makes the command display additional job information fields (i.e. the job description in JDL and/or the one for Condor-G).
--level
makes the command retrieve information for events having either the System or the Debug log level. Otherwise only events with the System log level are returned.
--config path_name
-c path_name
if the command is launched with this option, the configuration file pointed to by path_name is used instead of the standard configuration file.
--output output_file
-o output_file
writes the logging information in the file specified by output_file instead of the standard output. output_file can be either a simple name or an absolute path (on the submitting machine). In the former case the file output_file is created in the current working directory.
--noint
if this option is specified, all interactive questions to the user are skipped. All warning messages and errors (if any) are written to the file dg-job-logging_<UID>_<PID>.log in the /tmp directory. The location of the log file is configurable.
--debug
when this option is specified, information about the API functions called inside the command is displayed on the standard output and is also written to the file dg-job-logging_<UID>_<PID>.log in the /tmp directory. The location of the log file is configurable.
--logfile log_file
when this option is specified, the command log file is relocated to the location pointed to by log_file.
dg_jobId
job identifier returned by dg-job-submit. Job identifiers must always be provided as the last arguments of this command.
Exit Status
dg-job-get-logging-info exits with a value of 0 if the logging information of all the specified jobs is retrieved correctly, >0 if errors occurred for every specified job and <0 in case of partial failure. An example of partial failure is when more than one job is specified: logging info could be successfully retrieved for some jobs and not for others.
Examples
1. $> dg-job-get-logging-info --all --from 12150005052001 --to 10000006052001 --output mylog.txt
writes to the file mylog.txt, in the current working directory, the logging information about the user’s jobs from 12:15 on 5 May 2001 up to 10:00 on 6 May 2001 (UTC).
2. $> dg-job-get-logging-info --from 113500 dg_jobId1
where
dg_jobId1 = https://grid004f.cnaf.infn.it:7846/131.154.99.104/14010479391529?grid004f.cnaf.infn.it:7771
displays the following output:
*******************************************************************
LOGGING INFORMATION:
Printing info for the Job : dg_jobId1
For the Event : JobTransfer
---
Event Type = JobTransfer
dg_jobId = dg_jobId1
Logging Level = System
Date(UTC) = Tue Sep 4 16:12:56 2001
Job Destination = ResourceBroker/grid004f.cnaf.infn.it:7771
Host Name = lx01
Source Program = UserInterface
Job Owner = /O=Grid/O=UKHEP/OU=hep.ph.ic.ac.uk/CN=Fabrizio Pacini
Job Descr (JDL) =
[
- InputSandboxPath = "/tmp/datamat_161251122764136"
- CertificateSubject = "/O=Grid/O=UKHEP/OU=hep.ph.ic.ac.uk/CN=Fabrizio Pacini"
- Rank = other.FreeCPUs * other.AverageSI00 - other.EstimatedTraversalTime
- Executable = "WP1testA"
- UserContact = "fpacini@datamat.it"
- StdInput = "sim.dat"
- InputSandbox = {"/home/datamat/HandsOn-0409/file*","/home/datamat/DATA/*"}
- StdOutput = "sim.out"
- StdError = "sim.err"
- requirements = other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2"
- OutputSandbox = {"/tmp/sim.err","/tmp/test.out"}
- dg_jobId = dg_jobId1
]
********************************************************************
$>
See also
[A2], [A4], dg-job-submit.
- dg-job-id-info
This is a simple utility for the user: it just parses the dg_jobId string and displays the information contained in the job identifier in a formatted way. It is a “local” command, since it does not require any interaction between the UI and the other WMS components.
SYNOPSIS
dg-job-id-info [options] <job Id(s)>
Options:
--help
--version
--input, -i <input_file>
--output, -o <output_file>
DESCRIPTION
This command displays formatted information extracted from the dg_jobId of a previously submitted job. One or more dg_jobIds can be supplied as input to this command; moreover, the dg_jobIds listed in a file can be parsed using the --input option. The parsed information is printed on the standard output; it can be redirected to a file through the --output option.
Since this command does not interact with any external component, it does not need any certificate to work.
OPTIONS
--help
displays command usage.
--version
displays UI version.
--input input_file
-i input_file
parses the dg_jobIds listed in input_file. This option cannot be used if one or more dg_jobIds are specified.
--output output_file
-o output_file
writes the formatted information in the file specified by output_file instead of the standard output. output_file can be either a simple name or an absolute path (on the submitting machine). In the first case the file output_file is created in the current working directory.
dg_jobId
job identifier returned by dg-job-submit. Job identifiers must be the last arguments of this command.
Exit Status
dg-job-id-info exits with a value of 0 if no error occurs, >0 if errors occurred for every specified job identifier and <0 in case of partial failure.
Examples
$> dg-job-id-info https://grid001f.pd.infn.it:2234/124.75.74.12/134534534534234?http://grid004f.cnaf.infn.it:4577
displays the following output:
********************************************************************
JOB ID INFO
Printing info for the Job ID : https://grid001f.pd.infn.it:2234/124.75.74.12/134534534534234?http://grid004f.cnaf.infn.it:4577
Logging and Bookkeeping Server Address = https://grid001f.pd.infn.it
Logging and Bookkeeping Server Port = 2234
Resource Broker Server Address = http://grid004f.cnaf.infn.it
Resource Broker Server Port = 4577
Submission Time (hh:mm:ss) = 13:45:34 (UTC)
User Interface Machine IP Address = 124.75.74.12
User Interface Process Identifier = 53453
Randomly Generated Number (0000-9999) = 4234
********************************************************************
$>
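Since dg-job-id-info works purely on the string, the address parts of a dg_jobId can also be split inside a shell script. The sketch below infers the layout <LB URL>:<LB port>/<UI IP address>/<unique string>?<RB address>:<RB port> from the example above; it deliberately does not decode the unique string (submission time, UI process identifier, random number), because the field widths are not documented here. parse_dg_jobid is a hypothetical helper name.

```sh
# parse_dg_jobid ID: split a dg_jobId into LB address/port, UI IP address,
# unique string and RB address/port (layout inferred from the example).
parse_dg_jobid() {
  id=$1
  case $id in
    *://*/*/*[?]*:*) ;;
    *) echo "parse_dg_jobid: unexpected format: $id" >&2; return 1 ;;
  esac
  lb_part=${id%%[?]*}        # everything before the first "?"
  rb_part=${id#*[?]}         # everything after the first "?"
  scheme=${lb_part%%://*}    # e.g. "https"
  rest=${lb_part#*://}       # host:port/ip/unique
  lb_hostport=${rest%%/*}
  path=${rest#*/}            # ip/unique
  echo "LB address = ${scheme}://${lb_hostport%:*}"
  echo "LB port = ${lb_hostport##*:}"
  echo "UI IP address = ${path%%/*}"
  echo "Unique string = ${path#*/}"
  echo "RB address = ${rb_part%:*}"
  echo "RB port = ${rb_part##*:}"
}
```

Applied to the job identifier of the example, it would print the same LB and RB addresses and ports reported by dg-job-id-info; like that command, it needs no certificate and contacts no service.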

The JDL is a fully extensible language (i.e. it does not rely on a fixed schema), hence the user is allowed to use any attribute to describe a job without incurring errors. However, only a certain set of attributes (referred to as “supported” attributes) can be taken into account by the WMS components for scheduling a submitted job. Indeed, to be actually used for selecting a resource, an attribute used in a job class-ad needs to correspond to some characteristic of the resources that are published in the GIS (aka MDS).
The “supported” attributes, their meaning and the way to use them to describe a job are dealt with in detail in document [A7], also available at the following URL:
http://www.infn.it/workload-grid/docs/
Figure 1 shows the states that a job can assume during its life cycle.
Figure 1 Job Life Cycle
The job states in Figure 1 are briefly described hereafter (see [A4] for further details):
STATUS:
- SUBMITTED: job is submitted but not yet received by the RB (i.e. it is waiting in the UI).
- WAITING: job is waiting in the queue in the RB for various reasons (e.g. no appropriate CE (cluster) found, required dataset not available, dependency on another job, etc.).
- READY: appropriate CE found; job is transferred to the CE.
- SCHEDULED: job is waiting in the queue on the CE.
- RUNNING: job is running.
- CHKPT: job is check-pointed and is waiting for restart; this is a system checkpointing of jobs running on a CE, independent of Application Checkpointing.
- DONE: job exited.
- OUTPUTREADY: job exited and the RB is ready to return the output sandbox.
- ABORTED: job was aborted for various reasons (e.g. waiting in the queue in the RB, JSS or CE for too long, over-use of quotas, expiration of user credentials, etc.).
- CLEARED: output files were transferred to the user; job is removed from the bookkeeping database.
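From the state descriptions above, a script waiting for a job can stop polling once the job is OUTPUTREADY (the output sandbox can be retrieved) or has reached ABORTED or CLEARED (no further change is expected). The sketch below assumes that dg-job-status prints a single “Status = <State>” line, as in its example, and that the printed state matches the names above once case and blanks are ignored. wait_for_job is a hypothetical helper, and DG_STATUS_CMD is a hypothetical hook that only exists so a stub can replace the real command.

```sh
# wait_for_job JOBID [INTERVAL]: poll dg-job-status every INTERVAL seconds
# (default 60) until the job is OUTPUTREADY, ABORTED or CLEARED, then print
# that state. DG_STATUS_CMD (hypothetical) can name a substitute command.
wait_for_job() {
  jobid=$1
  interval=${2:-60}
  status_cmd=${DG_STATUS_CMD:-dg-job-status}
  while :; do
    # Keep only the value of the "Status = ..." line, upper-cased and with
    # blanks removed, so that "OutputReady" and "OUTPUTREADY" compare equal.
    state=$($status_cmd --noint "$jobid" |
      sed -n 's/^[[:space:]]*Status[[:space:]]*=[[:space:]]*//p' |
      tr '[:lower:]' '[:upper:]' | tr -d ' ')
    case $state in
      OUTPUTREADY|ABORTED|CLEARED)
        echo "$state"
        return 0 ;;
      "")
        echo "wait_for_job: no status returned for $jobid" >&2
        return 1 ;;
    esac
    sleep "$interval"
  done
}
```

A generous polling interval keeps the load on the LB service low; the loop gives up only when no status line is returned at all.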
Hereafter is the list of job event types that can be returned to the user by the dg-job-get-logging-info command. They are organized in several categories:
JobTransfer A component generates this event when it tries to transfer a job to some other component. This event contains the identification of the receiver and possibly the job description expressed in the language accepted by the receiver. The result of the transfer, i.e. success or failure, as seen by the sender is also included.
JobAccept A component generates this event when it receives a job from another WMS component. This event also contains the locally assigned job identifier.
JobRefuse A component generates this event when the transfer of a job to some other component fails. The source of this event, which also includes the reason for the failure, can be either the receiver or the sender, e.g. when the receiver is not available.
JobAbort The job processing is stopped due to system conditions or user request. The reason is included in the event.
JobRun The job is started on a CE.
JobChkpt The job is check-pointed on a CE. The reason is included in the event.
JobDone The job has completed. The process exit status is included in the event.
JobClear The user has successfully retrieved the job results, e.g. the output files specified in the output sandbox (see Section 6.1.3); the job will be removed from the bookkeeping database in the near future.
JobScheduled The job has been successfully submitted to the appropriate CE, i.e. passed to the LRMS.
JobCancel The job has been successfully marked for removal.
JobFail The job failed during its execution on the CE.
JobMatch An appropriate match between a job and a Computing Element has been found. The event contains the identifier of the selected CE.
JobPending A match between a job and a suitable Computing Element was not found, so the job is kept pending by the RB. The event contains the reason why no match was found.
JobStatus It contains information about resources consumed by the job. This event is generated periodically by the CE and eliminates the need for direct communication from the LB Service to the CE. Two types of information should be considered: cumulative information (e.g. CPU time) and non-cumulative information (e.g. memory consumption). For cumulative properties only the most recent value is kept in the database. For non-cumulative properties, on the other hand, all the values are stored in order to allow for example …
More details on job event types can be found in [A4].
The state of a failed job can be analysed by checking the consistency and completeness of the job-related events returned by the dg-job-get-logging-info command. If needed, a further verification should then be performed on the retrieved output files produced by the job (if any) and by inspecting the log files and debugging information traced by the various system components.
As explained in section 6.1.3, to get the logging information about a job you need to issue the following command:
dg-job-get-logging-info <job_Identifier>
Since the output of the command can be copious, we advise using the --output option as well, to redirect it to a given file:
dg-job-get-logging-info --output <my_file> <job_Identifier>
The --full option then provides more detailed information (the job descriptions at the various stages are also included):
dg-job-get-logging-info --full --output <my_file> <job_Identifier>
Before using the dg-job-get-logging-info command, it is sometimes useful to check the dg-job-status output, which can contain information about the cause of a job failure.
As explained in section 6.1.3, to get the status information about a job you need to issue the following command:
dg-job-status <job_Identifier>
The following is an example of the status information for a job that failed because a CondorG daemon was unavailable (see the Status Reason field):
dg_JobId = https://grid010g.cnaf.infn.it:7846/137.138.18………
Status = Aborted
Last Update Time (UTC) = Thu Oct 31 07:41:44 2002
Job Destination = tbn09.nikhef.nl:2119/jobmanager-pbs-qshort
Status Reason = Condor Failure: ERROR: Can't find address of local schedd - condor command failed
Job Owner = /O=Grid/O=CERN/OU=cern.ch/CN=Erwin Laure
Status Enter Time (UTC) = Thu Oct 31 07:41:44 2002
As said at the beginning of this section, another way of analysing submission failures is to inspect the standard output and standard error of the job, generated on the Worker Node and retrieved on the UI machine through the dg-job-get-output command. A typical error that can be detected in this way occurs when the user submits a script that in turn tries to start another script or an executable. For example, the submitted script is as follows:
#!/bin/sh
# Use the coincidence file to compare the measurements
curdir=`pwd`
${curdir}/lecture_new_gome_V2_sel1_PT_10
idl appli.pro
Upon job abortion, the error message received through the OutputSandbox retrieval is:
./demo_june: /home/eo004/3042/lecture_new_gome_V2_sel1_PT_10: Permission denied
The reason for this error is that globus-url-copy (used for staging the InputSandbox files) in general does not preserve the execute (x) flag. Therefore the script specified as Executable in the JDL (on which chmod +x is done automatically by the WP1 JobWrapper) should itself perform a chmod +x on all the executable files (lecture_new_gome_V2_sel1_PT_10 in this example) transferred within the InputSandbox of the job.
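The fix described above can be packaged as a small function placed at the top of the Executable script. This is a sketch, not part of the WP1 software: restore_exec_bits is a hypothetical helper name, and it assumes the sandbox files have been staged into the current working directory, as the pwd-based path in the example script suggests.

```sh
# restore_exec_bits FILE...: re-apply the execute permission that the
# InputSandbox transfer may have dropped. FILE names are relative to the
# current directory; the function fails on the first missing file so that
# the job stops with a clear message instead of "Permission denied".
restore_exec_bits() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "restore_exec_bits: $f not found in $(pwd)" >&2
      return 1
    fi
    chmod +x "$f" || return 1
  done
}
```

In the script above, the call restore_exec_bits lecture_new_gome_V2_sel1_PT_10 || exit 1 would go just before the line that runs ${curdir}/lecture_new_gome_V2_sel1_PT_10.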
The following subsections report the event sequences of a job that completed successfully, before and after its output was retrieved (OutputReady and Cleared status), and then the events of 3 jobs aborted for different reasons.
Each event in the logging information of a job (see example below) is described by its type (Event Type), the date and time it occurred (Date), the component that was handling the job (Source Program) and, in some cases (especially the non-nominal ones), the reason that caused the event (e.g. Job Fail Reason). These are the information fields to consider first when investigating a job failure.
For example, the following event shows that the job submission failed because of a problem in the RSL that the JSS passed to the gatekeeper:
Event Type = JobFail
Job Fail Action = 0
dg_jobId = https://grid012g.cnaf.infn.it:7846/155.198.211.205……
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid012g.cnaf.it ………
Logging Level = System
Date (UTC) = Fri Oct 25 08:12:16 2002
Job Fail Reason = the provided RSL could not be properly parsed
Host Name = grid012f.cnaf.infn.it
Source Program = JobSubmissionService
The following one, instead, shows a submission failure caused by the gatekeeper at the target CE being temporarily down:
Event Type = JobFail
Job Fail Action = 0
dg_jobId = https://grid012g.cnaf.infn.it:7846/155.198.211.20 ……
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid012g.cnaf………
Logging Level = System
Date (UTC) = Fri Oct 25 08:22:16 2002
Job Fail Reason = Globus down/Failed submission
Host Name = grid012f.cnaf.infn.it
Source Program = JobSubmissionService
The following event shows that a problem occurred while transferring the InputSandbox files from the UI to the RB:
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.21 ……
Certificate Subject = /C=IT/O=INFN/OU=Personal Certificate/L=Datamat/CN= …
Logging Level = System
Date (UTC) = Fri May 10 12:05:22 2002
Job Transfer Dest = ResourceBroker/grid013g.cnaf.infn.it:7771
Job Transfer Result = FAIL
Job Transfer Reason = Unable to send all input files
Host Name = lx01
Source Program = UserInterface
Lastly, the following event shows a problem related to a firewall misconfiguration at the fabric where the job was sent for execution:
Event Type = JobRefuse
dg_jobId = https://atlfarm010.mi.infn.it:7846/193 ………
Certificate Subject = /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN= ……
Logging Level = System
Date (UTC) = Thu Nov 21 00:29:49 2002
Job Refuse Reason = the job manager timed out while waiting for a commit signal
Job Refuse Source = atlfarm010.mi.infn.it
Host Name = polgrid1
Source Program = GlobusJobmanager
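The fields singled out for failure analysis (Event Type, Date, Source Program and any Reason) can also be extracted mechanically from a saved dg-job-get-logging-info output, for instance the file written with --output. The following awk-based sketch prints one line per event. It assumes that events are separated by “---” or asterisk lines and that each “Key = value” pair is printed on a single line; summarize_events is a hypothetical helper name.

```sh
# summarize_events FILE: print one line per logged event:
#   date | source program | event type [| reason(s)]
# Every key ending in "Reason" (Job Fail Reason, Job Refuse Reason, ...)
# is appended; other keys are ignored.
summarize_events() {
  awk '
    function flush() {
      if (ev_type != "")
        print ev_date " | " ev_src " | " ev_type (ev_reason != "" ? " | " ev_reason : "")
      ev_type = ev_date = ev_src = ev_reason = ""
    }
    /^[ \t]*(---|\*+)[ \t]*$/ { flush(); next }
    /=/ {
      key = $0
      sub(/[ \t]*=.*/, "", key)      # text before the first "="
      sub(/^[ \t]+/, "", key)
      val = $0
      sub(/^[^=]*=[ \t]*/, "", val)  # text after the first "="
      if (key == "Event Type") ev_type = val
      else if (key ~ /^Date/) ev_date = val
      else if (key == "Source Program") ev_src = val
      else if (key ~ /Reason$/) ev_reason = (ev_reason == "" ? val : ev_reason "; " val)
    }
    END { flush() }
  ' "$1"
}
```

For the JobRefuse event above, this would print its date, GlobusJobmanager, JobRefuse and the refuse reason on a single line, which makes gaps or unexpected events in a long sequence easier to spot.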
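Complete examples of job logging information for both successful and failed job submissions are reported hereafter. The first two listings refer to the same successfully completed job, before its output is retrieved (OutputReady status) and after it (Cleared status, marked by the final JobClear event).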
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job : https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:34 2002
Job Accept New Id = RB assigned ID
Job Accept Source = UserInterface
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=Personal Certificate/L=Datamat/CN=Fabrizio Pacini/Email=fpacini@datamat.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:37 2002
Job Transfer Dest = ResourceBroker/grid013g.cnaf.infn.it:7771
Job Transfer Result = OK
Host Name = lx01
Source Program = UserInterface
---
Event Type = JobMatch
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:40 2002
Job Match Destination = lxde01.pd.infn.it:2119/jobmanager-lsf-grid01
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:40 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = START
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:41 2002
Job Accept New Id = 5247.
Job Accept Source = ResourceBroker
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:41 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:30:01 2002
Job Transfer Dest = lxde01.pd.infn.it:2119/jobmanager-lsf
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:30:35 2002
Job Run Node = lxde01.pd.infn.it
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:30:35 2002
Job Run Node = (nil)
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobDone
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:32:12 2002
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobDone
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:32:13 2002
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
**********************************************************************
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job : https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:34 2002
Job Accept New Id = RB assigned ID
Job Accept Source = UserInterface
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=Personal Certificate/L=Datamat/CN=Fabrizio Pacini/Email=fpacini@datamat.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:37 2002
Job Transfer Dest = ResourceBroker/grid013g.cnaf.infn.it:7771
Job Transfer Result = OK
Host Name = lx01
Source Program = UserInterface
---
Event Type = JobMatch
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:40 2002
Job Match Destination = lxde01.pd.infn.it:2119/jobmanager-lsf-grid01
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:40 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = START
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:41 2002
Job Accept New Id = 5247.
Job Accept Source = ResourceBroker
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:29:41 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:30:01 2002
Job Transfer Dest = lxde01.pd.infn.it:2119/jobmanager-lsf
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:30:35 2002
Job Run Node = lxde01.pd.infn.it
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:30:35 2002
Job Run Node = (nil)
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobDone
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:32:12 2002
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobDone
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:32:13 2002
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobClear
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/122930130349224?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:37:53 2002
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
**********************************************************************
Here the job was aborted because the RB could not contact the Information Index during matchmaking, so it was not possible to find resources suitable for job submission.
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job :
https://grid013g.cnaf.infn.it:7846/155.198.211.205/124447132309703?grid013g.cnaf.infn.it:7771
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/124447132309703?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:44:51 2002
Job Accept New Id = RB assigned ID
Job Accept Source = UserInterface
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/124447132309703?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=Personal Certificate/L=Datamat/CN=Fabrizio Pacini/Email=fpacini@datamat.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:44:54 2002
Job Transfer Dest = ResourceBroker/grid013g.cnaf.infn.it:7771
Job Transfer Result = OK
Host Name = lx01
Source Program = UserInterface
---
Event Type = JobAbort
Job Abort Reason = No matching resource found
Warning : brokerinfo creation reported:
- Search on InformationIndex failed: while looking for StorageElementProtocol (SE=testbed002.cern.ch)
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/124447132309703?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:45:00 2002
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
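When many jobs have to be checked, the abort reason can be extracted from the logging information programmatically instead of being read by eye. The Python sketch below is illustrative only: it assumes that events are separated by `---` lines and that each field starts on its own line as `Name = value`, with any line not of that form (such as the `Warning :` lines above) treated as a continuation of the previous field. The names `parse_events` and `abort_reasons` are hypothetical helpers, not part of the WMS software.

```python
import re

# Hypothetical helper: assumes each field starts on its own line as
# "Field Name = value" (field names may contain spaces and parentheses).
FIELD = re.compile(r"^([A-Za-z_][\w ()]*?) =(?: (.*))?$")


def parse_events(text):
    """Split logging-info text into a list of dicts, one per event.

    Events are assumed to be separated by lines containing only '---'.
    Lines that do not look like 'Name = value' (e.g. the 'Warning :'
    lines of an abort reason) are appended to the previous field.
    """
    events, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "---":
            current, last_key = {}, None
            events.append(current)
            continue
        if current is None or not line:
            continue  # header lines before the first event, or blank lines
        match = FIELD.match(line)
        if match:
            last_key = match.group(1).strip()
            current[last_key] = (match.group(2) or "").strip()
        elif last_key is not None:
            previous = current[last_key]
            current[last_key] = previous + "\n" + line if previous else line
    return events


def abort_reasons(events):
    """Return the 'Job Abort Reason' of every JobAbort event, in order."""
    return [event.get("Job Abort Reason", "") for event in events
            if event.get("Event Type") == "JobAbort"]
```

Applied to the job above, the single JobAbort event would yield "No matching resource found" followed by the brokerinfo warning lines, which is where the failed Information Index search shows up.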
**********************************************************************
In this case the failure reason (i.e. what the middleware is able to detect) is “Standard output of Job Wrapper does not contain useful data”, but many different causes can lead to this kind of failure. The most commonly detected are the following, although there may be many other failure modes:
- The home directory of the WN where the job is running is not
- Problems related to the account of the local user who runs the job on the WN (e.g. shell not available, file access permissions, etc.)
- Exhausted resources on the CE head node
- Race conditions for file updates between the worker node and the CE node
- Glitches in the gass_transfer (i.e. the final commit of the GASS transfer for std{out,err} can occasionally take a long time and then die)
- The “no-qstat = done-job” behaviour in Globus (i.e. Globus considers a job that is not found by 'qstat' to be a 'done' job)
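To narrow down which of these causes applies at a given site, one option is to submit a trivial job that reports on the worker node environment. The Python sketch below illustrates the idea under stated assumptions: `probe_worker_node` is a hypothetical helper (not part of the WMS), it only covers the home-directory, shell and disk-space items of the list above, and it assumes a Unix node where the `pwd` module is available.

```python
import os
import pwd
import shutil


def probe_worker_node(path=None):
    """Report local conditions related to some common job failure causes.

    Hypothetical diagnostic, not part of the WMS: checks whether the given
    directory (default: the user's home) exists and is writable, whether
    the account's login shell exists, and how much disk space is free.
    """
    home = path or os.path.expanduser("~")
    report = {
        "home": home,
        "home_exists": os.path.isdir(home),
        "home_writable": os.access(home, os.W_OK),
    }
    try:
        shell = pwd.getpwuid(os.getuid()).pw_shell
    except KeyError:  # the uid has no passwd entry on this node
        shell = None
    report["shell"] = shell
    report["shell_exists"] = bool(shell) and os.path.exists(shell)
    if report["home_exists"]:
        report["free_bytes"] = shutil.disk_usage(home).free
    return report


if __name__ == "__main__":
    # Print one "key = value" line per check, e.g. into the job's stdout.
    for key, value in probe_worker_node().items():
        print(f"{key} = {value}")
```

Race conditions, GASS transfer glitches and the Globus qstat behaviour cannot be detected from inside the job this way; they show up only on the CE or RB side.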
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job :
https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:50:04 2002
Job Accept New Id = RB assigned ID
Job Accept Source = UserInterface
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=Personal Certificate/L=Datamat/CN=Fabrizio Pacini/Email=fpacini@datamat.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:50:08 2002
Job Transfer Dest = ResourceBroker/grid013g.cnaf.infn.it:7771
Job Transfer Result = OK
Host Name = lx01
Source Program = UserInterface
---
Event Type = JobMatch
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:50:16 2002
Job Match Destination = bbq.mi.infn.it:2119/jobmanager-pbs-dque
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:50:16 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = START
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:50:16 2002
Job Accept New Id = 5248.
Job Accept Source = ResourceBroker
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:50:17 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:50:34 2002
Job Transfer Dest = bbq.mi.infn.it:2119/jobmanager-pbs
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:11 2002
Job Run Node = bbq.mi.infn.it
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobFail
Job Fail Action = 0
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:11 2002
Job Fail Reason = Standard output of Job Wrapper does not contain useful data
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:11 2002
Job Transfer Dest = ResourceBroker
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:11 2002
Job Run Node = bbq.mi.infn.it:2119/jobmanager-pbs-dque
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:12 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:27 2002
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:31 2002
Job Match Destination = bbq.mi.infn.it:2119/jobmanager-pbs-dque
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:31 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = START
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:31 2002
Job Accept New Id = 5249.
Job Accept Source = ResourceBroker
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:31 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:51:52 2002
Job Transfer Dest = bbq.mi.infn.it:2119/jobmanager-pbs
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:29 2002
Job Run Node = bbq.mi.infn.it
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobFail
Job Fail Action = 0
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:29 2002
Job Fail Reason = Standard output of Job Wrapper does not contain useful data
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:29 2002
Job Transfer Dest = ResourceBroker
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:30 2002
Job Run Node = bbq.mi.infn.it:2119/jobmanager-pbs-dque
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:31 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:46 2002
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:49 2002
Job Match Destination = bbq.mi.infn.it:2119/jobmanager-pbs-dque
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:49 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = START
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:49 2002
Job Accept New Id = 5250.
Job Accept Source = ResourceBroker
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:52:49 2002
Job Transfer Dest = JobSubmissionService
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:53:11 2002
Job Transfer Dest = bbq.mi.infn.it:2119/jobmanager-pbs
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:53:46 2002
Job Run Node = bbq.mi.infn.it
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobFail
Job Fail Action = 0
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:53:46 2002
Job Fail Reason = Standard output of Job Wrapper does not contain useful data
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobTransfer
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = anonymous
Logging Level = System
Date (UTC) = Thu Jun 6 12:53:46 2002
Job Transfer Dest = ResourceBroker
Job Transfer Result = OK
Host Name = grid013g.cnaf.infn.it
Source Program = JobSubmissionService
---
Event Type = JobRun
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:53:47 2002
Job Run Node = bbq.mi.infn.it:2119/jobmanager-pbs-dque
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobAccept
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:53:48 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:54:03 2002
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
---
Event Type = JobAbort
Job Abort Reason = No matching resource found
Warning : matchmaking process reported:
- Submission to bbq.mi.infn.it:2119/jobmanager-pbs-dque failed.
Warning : brokerinfo creation reported:
- Search on InformationIndex failed: while looking for StorageElementProtocol (SE=testbed002.cern.ch)
dg_jobId = https://grid013g.cnaf.infn.it:7846/155.198.211.205/125000132963246?grid013g.cnaf.infn.it:7771
Certificate Subject = /C=IT/O=INFN/OU=host/L=CNAF/CN=grid013g.cnaf.infn.it/Email=sysop@cnaf.infn.it
Logging Level = System
Date (UTC) = Thu Jun 6 12:54:05 2002
Host Name = grid013g.cnaf.infn.it
Source Program = ResourceBroker
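The job above was matched to bbq.mi.infn.it:2119/jobmanager-pbs-dque three times, failed each time with the same Job Wrapper error, was resubmitted after each failure, and was finally aborted. In a long log this resubmission pattern is easier to spot by counting event types and listing the match destinations. The Python sketch below is an illustration only, assuming each field is printed on a single line as `Name = value`; `summarize_attempts` is a hypothetical name, not a WMS tool.

```python
from collections import Counter


def summarize_attempts(text):
    """Tally event types and matched CEs in logging-info text.

    Hypothetical helper, assuming each field is printed on a single
    line as 'Name = value'. Returns (event_counts, destinations,
    last_event), where destinations lists every 'Job Match Destination'
    in the order in which it appears.
    """
    counts, destinations, last_event = Counter(), [], None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(" = ")
        if not sep:
            continue  # not a 'Name = value' line
        value = value.strip()
        if key == "Event Type":
            counts[value] += 1
            last_event = value
        elif key == "Job Match Destination":
            destinations.append(value)
    return counts, destinations, last_event
```

A JobFail (or JobRefuse) count close to the JobMatch count, together with a final JobAbort, suggests that the job was resubmitted repeatedly before being aborted, rather than hitting a single transient failure.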
**********************************************************************
**********************************************************************
LOGGING INFORMATION:
Printing info for the Job :
https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 20:44:23 2002
Job Accept New Id = RB assigned ID
Job Accept Source = UserInterface
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobTransfer
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /C=IT/O=INFN/OU=Personal Certificate/L=Bologna/CN=Alessandra Fanfani/Email=Alessandra.Fanfani@bo.infn.it
Logging Level = System
Date (UTC) = Wed Nov 13 20:44:32 2002
Job Transfer Dest = ResourceBroker/lxshare0381.cern.ch:7771
Job Transfer Result = OK
Host Name = testbed002.cern.ch
Source Program = UserInterface
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 20:44:33 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-short
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:53:23 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:53:23 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:53:38 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:53:40 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-medium
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:53:43 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:53:44 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:53:59 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:01 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-long
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:01 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:01 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:17 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:19 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-infinite
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:19 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:19 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:35 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:36 2002
Job Match Destination = gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-S
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:37 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:37 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:52 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:53 2002
Job Match Destination = gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-M
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:54 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:54:54 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:09 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:11 2002
Job Match Destination = gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-L
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:11 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:11 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:27 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:28 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-short
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:29 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:29 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:44 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:47 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-medium
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:47 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:55:47 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:03 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:04 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-long
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:04 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:05 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:20 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:22 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-infinite
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:23 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:23 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:38 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:39 2002
Job Match Destination = gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-S
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:40 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:40 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:55 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:57 2002
Job Match Destination = gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-M
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:57 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:56:57 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:13 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:15 2002
Job Match Destination = gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-L
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:15 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:15 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:31 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:32 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-short
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:33 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:33 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:49 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:50 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-medium
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:51 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:57:51 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:06 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:08 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-long
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:08 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:09 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:24 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:26 2002
Job Match Destination = lxshare0223.cern.ch:2119/jobmanager-pbs-infinite
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:26 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:27 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:42 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:44 2002
Job Match Destination = gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-S
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:47 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:58:47 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:02 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:04 2002
Job Match Destination = gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-M
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:05 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:05 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:20 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobMatch
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:22 2002
Job Match Destination = gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-L
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobRefuse
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:23 2002
Job Refuse Reason = Submitting job(s) - condor command failed
Job Refuse Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = JobSubmissionService
---
Event Type = JobAccept
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:23 2002
Job Accept New Id = Sent back by JSS
Job Accept Source = JobSubmissionService
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobPending
Job Pending Reason = Resubmitting.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:38 2002
Host Name = lxshare0381.cern.ch
Source Program = ResourceBroker
---
Event Type = JobAbort
Job Abort Reason = No matching resource found
Warning : matchmaking process reported:
- Submission to lxshare0223.cern.ch:2119/jobmanager-pbs-short failed.
- Submission to lxshare0223.cern.ch:2119/jobmanager-pbs-medium failed.
- Submission to lxshare0223.cern.ch:2119/jobmanager-pbs-long failed.
- Submission to lxshare0223.cern.ch:2119/jobmanager-pbs-infinite failed.
- Submission to gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-S failed.
- Submission to gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-M failed.
- Submission to gppce06.gridpp.rl.ac.uk:2119/jobmanager-pbs-L failed.
dg_jobId = https://lxshare0381.cern.ch:7846/137.138.181.214/204422259933560?lxshare0381.cern.ch:7771
Certificate Subject = /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
Logging Level = System
Date (UTC) = Wed Nov 13 21:59:39 2002
Host Name = lxshare0381.cern.ch
Program
**********************************************************************
The UI uses the wildcard patterns that can be included in the InputSandbox attribute expression to perform file name “globbing”, in a fashion similar to the UNIX csh shell. The result of the “globbing” is the list of files whose names match any of the specified patterns.
The supported special characters and their meanings are listed below:
- * wildcard for any string
- ? wildcard for any single character
- [chars] matches any one of the enclosed characters. If chars contains a sequence of the form a-b, any character between a and b (inclusive) matches. The expression can be negated with the special character “!”: [!chars] matches any character not in chars.
Examples
Consider a directory where “ls -F” gives:
1file   a1.f     apple.o  bob.o  h4374.f  john.o
2files  ab       apps/    foo.c  h4374.o  mydir/
ABS     ab.f     bob      foo.f  john     stuff/
a1      apple.f  bob.f    gh     john.f
That is, a mix of files and directories (ls -F marks directories with a trailing “/”). The examples below show how the wildcards are expanded (=> indicates the output of the command).
1) Every two-character file name:
echo ?? => a1 ab gh
2) Every two-character name starting with “a”:
echo a? => a1 ab
3) Every file starting with j, o, h, or n:
echo [john]* => h4374.f h4374.o john john.f john.o
4) Include a range, e.g. everything starting with an upper case letter or a digit:
echo [A-Z0-9]* => 1file 2files ABS
5) Negate a character set, e.g. every .f file not starting with j, o, h or n:
echo [!john]*.f => a1.f ab.f apple.f bob.f foo.f
6) Every file starting with “a” and ending in .f:
echo a*.f => a1.f ab.f apple.f
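The expansion rules above can be tried out with a short Python sketch. It uses the standard fnmatch module, whose *, ?, [chars], [a-b] and [!chars] patterns behave like the ones listed here; it is an approximation for illustration only, not the UI's own implementation (for instance, fnmatch lets * match a leading dot, which csh does not). The directory contents are hard-coded from the ls -F listing, without the trailing “/” that ls -F appends to directory names.

```python
from fnmatch import fnmatchcase

# Entries of the example directory (names only: the "/" shown by ls -F
# after apps, mydir and stuff is not part of the names).
ENTRIES = [
    "1file", "2files", "ABS", "a1", "a1.f", "ab", "ab.f", "apple.f",
    "apple.o", "apps", "bob", "bob.f", "bob.o", "foo.c", "foo.f", "gh",
    "h4374.f", "h4374.o", "john", "john.f", "john.o", "mydir", "stuff",
]


def glob_expand(pattern, names=ENTRIES):
    """Return, sorted, the names matching a csh-like pattern.

    fnmatchcase is used so that matching is case-sensitive, as in csh.
    """
    return sorted(name for name in names if fnmatchcase(name, pattern))


if __name__ == "__main__":
    for pattern in ["??", "a?", "[john]*", "[A-Z0-9]*", "[!john]*.f", "a*.f"]:
        print(pattern, "=>", " ".join(glob_expand(pattern)))
```

Running it prints the same six expansions as the examples above.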
The main task of the RB is to find the most suitable Computing Element (CE) on which to execute the job. To accomplish this, the RB interacts with the other WMS components. In particular, the Replica Catalogue (RC) and the Information Index (II) are the two main WMS components that supply the RB with the information required to match job requirements against Computing Element capabilities (i.e. runtime environments, data access features, processing resources, etc.).
The following sections describe the matchmaking algorithm performed by the RB. To this end, it is worth distinguishing three scenarios, to be dealt with separately:
- direct job submission,
- job submission without data-access requirements,
- job submission with data-access requirements.
The simplest scenario is the one where the JDL submitted by the UI already identifies the resource the job is to be submitted to, i.e. it contains the Computing Element identifier (CEId). In this case the RB performs no matchmaking at all: it simply delegates the submission request to the JSS, which carries out the actual submission.

Figure 2 - Submission with CEId known
Note that when the CEId is specified, the RB neither checks whether the job owner is authorised to access the given CE nor contacts the RC to resolve any file requirements. The only check the RB performs is the JDL syntax check, done while converting the JDL into a ClassAd.
Consider now the scenario where the user specifies a job with execution requirements but without data-file requirements. Once the RB has received the JDL and successfully converted it into a ClassAd (the job-ad), it starts the actual matchmaking algorithm to determine whether the characteristics and status of the Grid resources match the job requirements.
The matchmaking algorithm consists of two phases: the requirements check and the rank computation.
During the requirements check phase the RB contacts the II to build the set of candidate CEs for the job, i.e. those compliant with both the user requirements and the user certificate subject. Since the CE attributes involved in the JDL requirements (which the user defines to express his/her needs) are nearly constant over time (a CE is unlikely to change its operating system or runtime environment in the very short term, e.g. every half hour), the information cached in the II is a good source for testing matches between job requirements and CE features, and using it is more efficient than contacting each CE for the same information.

Figure 3 - Requirements checking phase
Once the RB has built the set of suitable CEs, it performs the second phase of the matchmaking algorithm, in which it acquires information about the “quality” of those CEs. If, on the other hand, no suitable CE has been found, the RB sends an e-mail notification to the user address specified in the JDL.
In the ranking phase the RB directly contacts the LDAP server of each suitable CE to obtain the values of the attributes that appear in the rank expression of the received JDL. Unlike in the previous phase, it is better to contact each CE than to use the II as the source of information, because rank attributes change very frequently over time (e.g. FreeCPUs, FreeMemory).
Currently, if all the suitable CEs are assigned the same rank value, the RB makes a “random” choice: the first CE in the list of suitable ones receives the job. A more sophisticated method for equally ranked CEs would be desirable, relieving the user of the need to define meaningful rank expressions. One possibility is a post-ranking selection that, based on a performance factor still to be defined, picks the optimal CE for the actual submission of a given job. Rank computation is depicted in Figure 4.

Figure 4 – Rank computation phase
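The two phases can be summarised in a minimal Python sketch. It is not the RB's actual code: the CE record, the authorised_subjects list and the query_dynamic callback (standing in for the LDAP query the RB sends to each CE) are hypothetical, and the Requirements and Rank expressions, which the RB evaluates as ClassAd expressions, are modelled here as plain Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CE:
    """Hypothetical CE record: static attributes as cached in the II."""
    ce_id: str
    static_attrs: Dict[str, object]
    authorised_subjects: List[str] = field(default_factory=list)


def matchmake(ces: List[CE],
              requirements: Callable[[Dict[str, object]], bool],
              user_subject: str,
              rank: Callable[[Dict[str, float]], float],
              query_dynamic: Callable[[CE], Dict[str, float]]) -> Optional[CE]:
    # Phase 1: requirements check on the slowly changing, II-cached
    # attributes, keeping only CEs the user's certificate subject may use.
    suitable = [ce for ce in ces
                if user_subject in ce.authorised_subjects
                and requirements(ce.static_attrs)]
    if not suitable:
        return None  # the RB would instead notify the user by e-mail
    # Phase 2: rank computation on fresh, fast-changing values (e.g. FreeCPUs)
    # fetched from each suitable CE. max() returns the first of several
    # equally ranked CEs, mirroring the tie behaviour described above.
    return max(suitable, key=lambda ce: rank(query_dynamic(ce)))
```

With rank = lambda d: d["FreeCPUs"], the sketch returns the CE reporting the most free CPUs among those passing the requirements check, or the first listed one when several report the same value.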
The Resource Broker interacts with the Replica Management services to find the most suitable Computing Element, taking into account both the Storage Elements where the input data sets are physically stored and those where the output data sets should be staged on completion of the job.
Before describing the actions the RB takes upon receiving a JDL containing both data-access and computing requirements, it is worth recalling the JDL attributes that express data requirements on the RB side: OutputSE, InputData and DataAccessProtocol. They represent, respectively, the Storage Element (SE) where the output files should be staged, the input files (LFNs, PFNs) required for the job execution, and the protocol(s) “spoken” by the application to access those files.
The two main phases of the matchmaking algorithm remain unchanged, but the RB executes the requirements check and ranking on each class of CEs satisfying the data-access requirements. In addition, the RB performs a pre-match processing step to find and classify the CEs satisfying both the data-access and the user authorisation requirements.
During the pre-match processing phase the RB contacts the RC specified through the ReplicaCatalog JDL attribute in order to resolve the logical file names and collect all the information about the SEs containing at least one input data file. This information is used to write the broker-info-file, which is sent to the JSS within the input sandbox for the actual submission.
At this point the RB starts the CE classification procedure, during which it contacts the II to find the CEs that both satisfy the authorisation requirements and have the OutputSE within their own LAN (CloseSE). Using the information retrieved during file name resolution, the RB then classifies these CEs according to the number of input files stored in storage element(s) that are close to the CE and speak at least one of the protocols specified in the DataAccessProtocol attribute of the JDL.

Figure 5 - CEs classification procedure
Once the CE classification is complete, the RB starts the actual matchmaking with the requirements check for each CE belonging to the first non-empty class, i.e. the class of CEs that can access the highest number of distinct input files. A CE that does not satisfy the user requirements is removed from its class. The requirements check is repeated, class by class, until at least one CE matching the user requirements is found.
When the requirements check completes, either the RB has a set of CEs that satisfy both data-access and computing requirements and have access to the maximum number of distinct input files, or no suitable CE matching these requirements exists. In the first case the RB starts the ranking phase to find the best CE to which to submit the job; in the second, it sends an e-mail notification to the user address specified in the JDL.

Figure 6 - Match-Making algorithm
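The pre-match classification and the class-by-class requirements check can be sketched in the same hypothetical style. All inputs are assumptions standing in for information the RB obtains from the RC (replicas: each input LFN mapped to the SEs holding a copy) and from the II (close SEs of each CE, protocols spoken by each SE, CEs the user is authorised on). CEs that reach no input file are kept here as a last-resort class 0, a detail the text does not specify.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set, Tuple


def classify_ces(ces: List[str],
                 close_ses: Dict[str, Set[str]],
                 replicas: Dict[str, Set[str]],
                 se_protocols: Dict[str, Set[str]],
                 data_access_protocols: Set[str],
                 output_se: str,
                 authorised: Set[str]) -> List[Tuple[int, List[str]]]:
    """Group CEs by the number of distinct input files reachable through a
    close SE that speaks one of the DataAccessProtocol values."""
    classes: Dict[int, List[str]] = defaultdict(list)
    for ce in ces:
        ce_close = close_ses.get(ce, set())
        # Keep only CEs the user is authorised on and that have the OutputSE close by.
        if ce not in authorised or output_se not in ce_close:
            continue
        usable = {se for se in ce_close
                  if se_protocols.get(se, set()) & data_access_protocols}
        # Count each LFN once, however many usable replicas it has.
        n_files = sum(1 for holders in replicas.values() if holders & usable)
        classes[n_files].append(ce)
    # Classes ordered by decreasing number of accessible distinct files.
    return [(n, classes[n]) for n in sorted(classes, reverse=True)]


def first_matching_class(classes: List[Tuple[int, List[str]]],
                         requirements: Callable[[str], bool]) -> Optional[List[str]]:
    """Requirements check class by class; CEs failing it are dropped."""
    for _n_files, members in classes:
        matching = [ce for ce in members if requirements(ce)]
        if matching:
            return matching  # these CEs go on to the ranking phase
    return None  # no suitable CE: the RB would notify the user by e-mail
```

The CEs returned by first_matching_class would then go through the ranking phase described earlier.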
Table 2 lists, for each daemon process the WMS needs to accomplish its tasks, the user identifier under which it must run. To comply with this table, it is strongly recommended to install the profile rpms and to start the WMS components (as root) using the scripts they install in /etc/rc.d/init.d.
| Service Name       | User Name |
| mysqld             | mysql     |
| postgres           | postgres  |
| condor_gridmanager | dguser    |
| condor_master      | dguser    |
| condor_schedd      | dguser    |
| jssparser          | dguser    |
| jssserver          | dguser    |
| RBserver           | dguser    |
| interlogger        | root      |
| dglogd             | root      |
| bkserver           | root      |
| ileventd           | root      |
| slapd              | root      |
Table 2 - Process/User mapping
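Compliance with Table 2 can be checked on a running node. The sketch below is a hypothetical helper, not part of the WMS. It parses “user command” lines, such as those printed on Linux by ps -eo user=,comm=, and reports daemons running under an unexpected account. It assumes each daemon appears in ps under its service name from the table. Because Linux truncates the command name reported by comm to 15 characters, names are compared on that prefix (condor_gridmanager shows up as condor_gridmana).

```python
from typing import List, Tuple

# Expected owner of each WMS daemon, as listed in Table 2.
EXPECTED_OWNER = {
    "mysqld": "mysql", "postgres": "postgres",
    "condor_gridmanager": "dguser", "condor_master": "dguser",
    "condor_schedd": "dguser", "jssparser": "dguser",
    "jssserver": "dguser", "RBserver": "dguser",
    "interlogger": "root", "dglogd": "root", "bkserver": "root",
    "ileventd": "root", "slapd": "root",
}

COMM_LEN = 15  # Linux reports at most 15 characters of the command name


def owner_mismatches(ps_output: str) -> List[Tuple[str, str, str]]:
    """Return (service, actual_user, expected_user) for every listed daemon
    whose owner differs from Table 2; unknown processes are ignored."""
    by_comm = {name[:COMM_LEN]: (name, user)
               for name, user in EXPECTED_OWNER.items()}
    mismatches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user, comm = parts[0], parts[1].strip()
        if comm in by_comm:
            service, expected = by_comm[comm]
            if user != expected:
                mismatches.append((service, user, expected))
    return mismatches
```

For example, pass it the output of ps -eo user=,comm= captured on the RB node. An empty list means that every listed daemon currently running matches Table 2.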