A Computational and Data Challenge for Future INFN Experiments; a GRID approach

OUTLINE OF THE INFN-GRID Project

V65.0102 11 May 2000

This document presents the outcome of many meetings of the INFN-GRID project, of the INFN LHC collaborations, Virgo and APE and has been prepared by

M. Mazzucato Coordinator

F.Ruggieri INFN representative in the European project DATAGRID

A.Ghiselli CNAF, technical parts

A.Masoni INFN ALICE Computing Coordinator

L.Perini INFN ATLAS Computing Coordinator

P.Capiluppi INFN CMS Computing Coordinator

D.Galli INFN LHC-B Computing Coordinator

F.Ricci Virgo Computing Coordinator

E.Onofri, F.Rapuano APE

 

Abstract

 

This document describes the INFN Project for the future Experiment Computing and data challenge. The project is strongly related to GRID concepts to be developed in the near future either in a common EU proposal (in preparation) or in a standalone INFN program. Coordination between the HEP community in Europe and the other countries involved in future Experiments will be pursued.

The document is dedicated to the specific Italian (INFN) activities and plans, in order to test, implement and commit the resources needed for coming Experiments' data Analysis and Processing.

 

 

 

 

Index

    1. Objectives
    2. Relationship with EU-GRID
    3. The LHC Collaboration and the GRID Projects
    4. INFN-GRID additions to the tools and services developed in EU-GRID
    5. The role of INFN-GRID in the INFN Computing
    6. The role of INFN-GRID for the Computing of LHC Experiments
    7. LHC Regional Center Prototypes, activities foreseen for 2001-2003 and resources needed
    8. Computing for VIRGO and resources needed
    9. Computing for APE and resources needed
    10. INFN-GRID management structure

 

 

Sites and Authors (Preliminary)

BARI: Marcello Castellano, Maria D’Amato, Domenico Di Bari, Rosanna Fini, Alfredo Loconsole, Giorgio Maggi, Emanuele Magno, Vito Manzari, Sergio Natali, Giacomo Piscitelli, Lucia Silvestris, Giuseppe Zito.

BOLOGNA: F. Anselmo, G. Cara Romeo, Marisa Luvisetto, Paolo Capiluppi, Franco Semeria, Claudio Grandi, Umberto Marconi, Domenico Galli, Paolo Mazzanti, GianPiero Siroli.

CAGLIARI: Alessandro De Falco, Alberto Masoni, Giovanna Puddu, Gianluca Usai, Antonio Silvestri

CATANIA: Franco Barbanera, Roberto Barbera, Patrizia Belluomo, Ernesto Cangiano, Enrico Commis, Aurelio La Corte, Lucia Lo Bello, Armando Palmeri, Carlo Rocca, Vladimiro Sassone, Orazio Tomarchio, Santo Vanadia, Lorenzo Vita, Salvatore Cavalieri, Salvatore Monforte, Orazio Mirabella, Salvatore Costa, Alessia Tricomi.

CNAF: Federico Ruggieri, Antonia Ghiselli, Cristina Vistoli, Luca Dell’Agnello, Roberto Cucchi, Giulia Vita Finzi, Luigi Fonti, Andrea Chierici, Tiziana Ferrari, Pietro Matteuzzi

FIRENZE: Raffaello D’Alessandro, Marco Pieri, Leonardo Bellucci, Michela Lenzi, Leonardo Fabbroni, Piero Dominici (Urbino)

GENOVA: Alessandro Brunengo, Mauro Dameri, Bianca Osculati, Carlo Maria Becchi.

L. N. FRASCATI: Halina Bilokon, Vitaliano Chiarella, Elisabetta Pace, Agnese Martini.

L. N. LEGNARO: Gaetano Maron, Luciano Berti, Massimo Biasotto, Michele Gulmini, Nicola Toniolo, Luigi Vannucci.

LECCE: Giovanni Aloisio, Massimo Cafaro, Enrico M.V. Fasanelli, Edoardo Gorini, Cataldi Gabriella, Martello Daniele, Surdo Antonio, Franco Tommasi, Salvatore Campeggio, Lucio Depaolis.

MILANO: Laura Perini, Francesco Prelz, Giuseppe Lo Biondo, Silvia Resconi, G. Battistoni

NAPOLI: Alessandra Doria, Domenico Della Volpe, Paolo Mastroserio, Gianpaolo Carlino, Leonardo Merola, Fabio Garufi, Fabrizio Barone, Ketevan Qipani

PADOVA: Mirco Mazzucato, Michele Michelotto, Massimo Sgaravatto, Marco Bellato, Sandro Ventura, Ivano Lippi, Ugo Gasparini, Paolo Ronchese, Maurizio Morando.

PARMA: Enrico Onofri

PAVIA: Claudio Conta, Giacomo Polesello, Adele Rimoldi, Valerio Vercesi

PERUGIA: Leonello Servoli, Paolo Lariccia, Maurizio Biasini, Michele Punturo, Ciro Cattuto

PISA: Davide Costanzo, Giuseppe Bagliesi, Andrea Sciaba’, Alessandro Giassi, Vitaliano Ciulli, Zhen Xie, Flavia Donno, Silvia Arezzini, Fabrizio Palla, Raffaele Tripiccione, Stefano Cortese (Cascina)

ROMA: Luciano Maria Barone, Giovanni Organtini, Lamberto Luminari, Enzo Valente, Alessandro De Salvo, Speranza Falciano, Francesco Marzano, Alessandro Nisati, Riccardo Paramatti, Davide Rossetti, Federico Rapuano, Nicola Cabibbo, Fulvio Ricci, Cristiano Palomba, Sergio Frasca.

ROMA II: Paolo Camarri, Anna Di Ciaccio.

ROMA III: Ada Farilla, Cristian Stanescu.

SALERNO: M. Guida, T. Virgili, A. Seganti, D. Vicinanza, G. Grella, C. D’Apolito

TORINO: Luciano Gaido, Mauro Gallio, Massimo Masera, Luciano Ramello, Enrico Scomparin, Ada Solano, M. Sitta, Werrborouck.

 

    1. Objectives

High Energy Physics experiments have always required state-of-the-art computing facilities to analyse large data samples efficiently. The previous generation of experiments at the electron-positron collider LEP proved that computing farms based on commodity components provide low-cost solutions to the experiments' needs; farms of this type are now very popular and are deployed at most INFN sites.

The objectives of the INFN GRID project are to develop and deploy for INFN a prototype computational and data GRID capable of efficiently managing, and providing effective usage of, the large clusters based on commodity components and the supercomputers distributed in the INFN nodes of the Italian research network GARR-B.

These geographically distributed resources, so far normally used by a single site, will be integrated using the GRID technology to form a coherent high throughput computing facility transparently accessible to all INFN users.

The INFN national GRID will be integrated with the similar European and worldwide infrastructures being established by ongoing parallel activities in all major European countries, in the US and in Japan.

The scale of the computational, storage and networking capacity of the prototype INFN GRID will be determined by the needs of the LHC experiments. These include the experimental activities for physics, trigger and detector studies, and the running of applications at a scale sufficient to test the scalability of the possible computing solutions to very large amounts of distributed data (PBytes), very large numbers of CPUs (thousands) and very large numbers of users.

The development of the new components of GRID technology will be done by the INFN GRID project, whenever possible, in collaboration with international partners through specific European or international projects. A proposal for the European Union IST program EU-RN2 is currently being submitted, asking for 10 MEuro of funding.

The MONARC Phase-3 project current activities are another important component of the INFN GRID project, as they will provide guidance for the developments of the computing models of the LHC experiments. The INFN GRID will investigate the current ideas for the computing models, based on the MONARC proposed hierarchical architecture of Tier1-TierN Regional Centers. The investigation will be performed using real applications on real Center prototypes, fulfilling at the same time the real computing needs of the experiments.

The project will encourage the diffusion of the GRID technologies in other Italian scientific research sectors (ESA/ESRIN, CNR, Universities), addressing the following points:

 

    2. Relationship with EU-GRID

The INFN-GRID Project is the framework for efficiently connecting:

The current EU-GRID proposal, to be submitted by May 10th, requests EU funds only for middleware development. The "testbed" and "HEP application" aspects are covered by two separate workpackages. These workpackages, however, request no EU funds for hardware and are intended only to provide the human resources needed to integrate the collaboration-wide aspects of these activities. Further EU-GRID proposals are currently being considered to support the full GRID integration of the HEP applications and the full deployment of the GRID distributed LHC computing system. The INFN GRID will be the natural framework for Italian participation in any such project, and will provide the hardware needed and the extra manpower required for setting up and running the local prototypes and physics applications.

 

 

    3. The LHC Collaborations and the GRID Projects

      The LHC collaborations have stated that they strongly support the GRID project. They recognize that the functionality promised by the GRID middleware is needed and represents a promising technology for deploying the efficient distributed computing system that they are planning and that their Computing Technical Proposals assume.

      There is a general consensus among the LHC experiments on the advantages of a common project where the prototyping of each experiment's computing architecture is discussed within a common framework. This concept has been expressed at the Computing Model Panel of the LHC Computing Review. The Italian LHC groups share the same opinion.

      There is also a general consensus that all the members of a collaboration will be allowed to use the GRID tools for running the applications of their experiment. All the Italian members of the LHC collaborations will of course be granted full access to the GRID tools, the Italian RC prototypes and the local GRID infrastructure, regardless of whether they subscribe to this INFN GRID project.

       

    4. INFN-GRID additions to the tools and services developed in EU-GRID

The INFN-GRID project will be planned so as to fully integrate the services and tools developed in EU-GRID, to meet the requirements of the INFN Experiments and to harmonize its activities with them. In addition, INFN-GRID will develop specific workplans concerning:

Testbed activities will include the workload management for individual analysis tasks, the data management in the distributed Italian scenario, the monitoring of the INFN resources, the standardization of INFN Farms distributed in different INFN sites and mass storage tests suited for the INFN site distributed activities. The network layout is foreseen to be provided by GARR in collaboration with the present project.

 

    5. The role of INFN-GRID in the INFN Computing

The INFN-GRID project will provide the middleware and the testbeds for running the physics applications of the large experiments that foresee the distributed analysis of huge amounts of data: not only the LHC experiments, but also VIRGO and APE fall in this category and are fully participating in the project. All the INFN experiments that are in a position to benefit from the INFN-GRID project are of course welcome and will be encouraged to join for the specific activities of interest to them: ARGO and COMPASS have already expressed their interest.

The project will also be beneficial for the other INFN computing activities and for the Servizi di Calcolo. In fact some of the most innovative techniques under evaluation for the GRID middleware were already considered by the INFN projects developed within the framework of the Commissione Calcolo (CC). Large synergies will be possible between the INFN Servizi di Calcolo and GRID-related activities: the internal developments will profit from state-of-the-art technology evaluated and selected by large international collaborations that include top-level computing scientists, also in the US and Japan.

Examples of activities where large synergies are possible include:

 

    6. The role of INFN-GRID for the Computing of LHC Experiments

The computing needs of the LHC experiments for the years 2001-2003 will be almost fully satisfied within the framework of the INFN-GRID project. The successful development of the GRID tools will be verified step by step using the widest possible variety of physics applications of the experiments, which will be run in production style, aiming both at producing the physics results the experiments want and at using the available middleware as much as possible. Thus in principle all the computing activities of the experiments will be connected to the GRID, which will ultimately provide the infrastructure for running all of them in a distributed way.

Only the development of detector-related software and of the algorithmic parts of the experiment software will stay separate from the GRID project. The development of CORE software will instead have many points of contact with the GRID developments.

Hence we propose that all the computing hardware of the experiments, except that related to standard software development and to on-line on-site applications, be funded in the framework of the INFN-GRID project for the years 2001-2003. Some on-line activities (typically remote monitoring) will also be covered by INFN-GRID.

The activities performed in the framework of the MONARC project for its Phase-3 also fall naturally within INFN-GRID, given their character of a common project of the LHC experiments aimed at defining the common aspects of their computing models. The MONARC simulation program could provide a valuable tool for complementing the prototypes in planning the LHC GRID implementation.

The computing people of the LHC experiments involved in the INFN-GRID project are working on the GRID because it is the instrument they are using to implement the computing applications of their experiments. Therefore the fraction of their time devoted to the INFN-GRID project is to be counted as fully contained within the fraction of their time devoted to their LHC experiment.

 

 

    7. LHC Regional Center Prototypes, activities foreseen for 2001-2003 and resources needed

The Computing Model that the LHC Experiments are developing is based on the MONARC studies, as well as on their own evaluations. The key elements of the architecture are a hierarchy of computing resources and functionalities (Tier0, Tier1, Tier2, etc.), a hierarchy of data samples (RAW, ESD, AOD, TAG, DPD) and a hierarchy of applications (reconstruction, re-processing, selection, analysis), both for real and for simulated data.

The hierarchy of applications requires a policy of authorization, authentication and restriction based, again, on a hierarchy of entities such as the "Collaboration", the "Analysis Groups" and the "final users".

The LHC Experiments' Computing Design requires the applications and the data to be distributed among the distributed resources. In this respect the correlation with the GRID initiatives is clearly very strong. Grid-like applications are applications that run over the wide area network, i.e. that involve Regional Centres (Tier-n) and use several computing and data resources connected by wide-area links, where each single computing resource can itself be a network of computers.

Within this framework the Computing Project is seen as a detector component: the final goal of the "Computing" detector is to provide all the LHC physicists with efficient means for analysis and for potential physics discovery.

In the discussions within the Computing Model Panel of the LHC Computing Review, all the LHC Collaborations agreed that it is mandatory to reach, by 2003, a prototype scale of the order of 10% of the final size foreseen for operation in 2006. Reducing the prototype scale could be a serious risk for the system: experience has clearly shown that scaling by an order of magnitude can raise unexpected problems that leave the system unable to fulfil its requirements. This would be a disaster for the LHC physics program.

All the Italian LHC groups therefore target a prototype size for 2003 of about 10% of the 2006 one. At the same time each group has an activity program for the next three years which will involve considerable resources for simulation, reconstruction and analysis of the simulated data. The prototype will provide these resources.

The motivation for the testbed-planned capacities is therefore twofold:

Moreover the simulation process, involving event generation, reconstruction and analysis will represent a very suitable environment to reproduce the system in its full operation activity.

 

    7.1 ALICE

      Based on the latest figures from ALICE simulation and reconstruction codes, it is estimated that the overall computing needs of ALICE are around 2100 KSPECint95 of CPU and 1600 TB of disk space. The global computing resources foreseen to be installed in Italy are 450 KSI95 of CPU and 400 TB of disk space. In spite of the fact that these numbers are affected by a very large error because neither the code nor the algorithms are in their final form we are confident that the figures presented are not far from the final ones. They are certainly adequate to estimate our computing needs in the next three years.

      These numbers take into account that the contribution to the ALICE computing infrastructure will be shared mainly among France, Germany and Italy, where the Italian participation in ALICE, in terms of people, is close to that of France and Germany together.

      The plan is to reach by 2003 a capacity of 45 KSI95 and 40 TB of disk (10% of the final size). These resources will allow performing the prototype tests at a realistic scale and, at the same time, will provide the adequate resources for the simulation and analysis tasks planned by the Italian groups of ALICE in the next three years.

      Two different ALICE projects will be involved in the GRID activity. The first is connected with the Physics Performance Report. This is a complete report on the physics capabilities and objectives of the ALICE detector, which will be assessed through a virtual experiment involving the simulation, reconstruction and analysis of a large sample of events. The first milestone for the Physics Performance Report, by which the data will have to be simulated and reconstructed, is due by the end of 2001. This exercise will be repeated regularly with a larger number of events to test the progress of the simulation and reconstruction software. The simulations will be devoted to studying the physics performance of the detector. Careful studies of the dielectron and dimuon triggers are necessary, for instance, to fully understand and optimise their efficiency and rejection power. Great emphasis will be given to interactive distributed data analysis, for which special codes will be developed and which is expected to use a very wide spectrum of the services offered by the GRID.

      A second activity is linked with the ALICE Mass Storage project. In this context two data challenges have already been run, and a third one is foreseen in the first quarter of 2001, involving quasi-online transmission of raw data to remote centres and remote reconstruction. Other data challenges aiming at higher data rates and more complicated data duplication schemas are planned at the rate of one per year.

      The current estimate of the CPU work needed to simulate a central event is about 2250 KSI95·s, while 90 KSI95·s are needed to reconstruct it. The storage required for a simulated event before digitisation is 2.5 GB and the storage required for a reconstructed event is 4 MB.

      In order to optimise the utilisation of the resources, signal and background events will be simulated separately and then combined, so that producing a sample of 10^6 events will require the full simulation of only 10^3 central events. The reconstruction will be performed on the full data sample. The table below reports the CPU and storage needs to simulate and reconstruct the required number of events. The CPU figure is the capacity that, if used for a full year at 100% efficiency, would provide the needed computing work. The corresponding amount of disk space is also indicated.

      ALICE Simulation     2001       2002       2003
      Events               1.0E+06    3.0E+06    7.0E+06
      Needed CPU (SI95)    3000       9000       21000
      Needed disk (TB)     6.5        19.5       45.5
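The figures in the ALICE simulation table can be cross-checked from the per-event costs quoted in the text. The sketch below is only an illustration: the function and constant names are hypothetical, the per-event CPU costs are read as CPU work in KSI95·s, one fully simulated central event is assumed per 1000 events in every year, and decimal storage units are assumed. Under these assumptions it gives about 2,900, 8,800 and 20,500 SI95, which the table rounds to 3000, 9000 and 21000, and reproduces the 6.5, 19.5 and 45.5 TB of disk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a full calendar year at 100% efficiency

SIM_COST_KSI95_S = 2250.0  # CPU work to simulate one central event (from the text)
REC_COST_KSI95_S = 90.0    # CPU work to reconstruct one event (from the text)
SIM_SIZE_GB = 2.5          # simulated event before digitisation (from the text)
REC_SIZE_MB = 4.0          # reconstructed event (from the text)


def alice_needs(n_events, central_fraction=1e-3):
    """Return (CPU in SI95, disk in TB) needed for a sample of n_events.

    central_fraction is the assumed share of events that are fully
    simulated (10^3 central events per 10^6 events in the text).
    """
    n_central = n_events * central_fraction
    work_ksi95_s = n_central * SIM_COST_KSI95_S + n_events * REC_COST_KSI95_S
    cpu_si95 = work_ksi95_s * 1000.0 / SECONDS_PER_YEAR
    # Convert everything to MB, then to TB (decimal units assumed).
    disk_tb = (n_central * SIM_SIZE_GB * 1e3 + n_events * REC_SIZE_MB) / 1e6
    return cpu_si95, disk_tb


for year, n_events in [(2001, 1.0e6), (2002, 3.0e6), (2003, 7.0e6)]:
    cpu, disk = alice_needs(n_events)
    print(year, round(cpu), "SI95,", round(disk, 1), "TB")
```

The reconstruction term dominates the CPU need, while simulated and reconstructed events contribute comparable amounts of disk.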

      Adding a 50% factor for the analysis and taking into account a 30% inefficiency related to the availability of the computers, we believe that the prototype scale proposed in the table below is just adequate for the needs of the foreseen simulation activity.

       

      Table 7.1.1 - ALICE's Testbed capacity target

      Capacity targets for the INFN Testbed ALICE (preliminary)

       

                               units     2001     2002      2003
      CPU capacity             SI95      7,000    22,000    45,000
      Disk capacity            TBytes    6        18        40
      Tape storage capacity    TBytes    12       36        80
      Total cost               ML        1000     2020      3200

      The consumable and manpower costs have not been evaluated yet.

      Each column represents the integral achieved in that year.
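The CPU targets of Table 7.1.1 can be related to the simulation needs by applying the two corrections mentioned in the text. The sketch below is a hedged reading, not the authors' stated formula: it assumes that "a 50% factor for the analysis" means multiplying by 1.5 and that "a 30% inefficiency" means dividing by 0.7; the function name is hypothetical. Under this reading the 2003 simulation need of 21000 SI95 reproduces the 45,000 SI95 target, while the 2001 and 2002 results come out somewhat below the 7,000 and 22,000 SI95 quoted in the table.

```python
def prototype_cpu(sim_cpu_si95, analysis_factor=0.5, inefficiency=0.3):
    """Scale a simulation CPU need up for analysis load and machine availability.

    Both corrections are assumptions about how the text combines them:
    analysis adds analysis_factor on top of simulation, and only a
    fraction (1 - inefficiency) of the installed capacity is usable.
    """
    return sim_cpu_si95 * (1.0 + analysis_factor) / (1.0 - inefficiency)


for year, sim_cpu in [(2001, 3000), (2002, 9000), (2003, 21000)]:
    print(year, round(prototype_cpu(sim_cpu)), "SI95")
```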

    7.2 ATLAS

The estimates currently being made in the ATLAS collaboration put the computing power needed outside the Tier0 in 2006 at ~1000 KSI95. These estimates are of course affected by a large error; however, they are the best ATLAS can provide at this time and are the ones presented at the LHC Computing Review currently taking place. The secondary storage needs of a Tier1 Regional Center are estimated at ~200 TB in 2006.

The Milestones for ATLAS computing include:

The activities of specific interest for the Italian ATLAS community from 2001 include:

The detailed ATLAS milestones for 2001-2003 for physics and trigger studies have still to be agreed by the collaboration: the activities of the Italian groups in 2002-2003 will also depend on these agreements; the participation in MDC2, however, is already decided.

The table below provides a first preliminary evaluation of the target capacity to be reached each year by the ATLAS Regional Center prototypes. The breakdown of the CPU acquisitions between the different years could be somewhat revised, subject to the final setting of the ATLAS computing Milestones.

Table 7.2.1 - ATLAS Testbed capacity target

Capacity targets for the INFN Testbed ATLAS (preliminary)

 

                          units     2001     2002     2003
CPU capacity              SI95      4,000    4,000    20,000
Disk capacity             TBytes    4        10       20
Tape storage (backups)    TBytes    10       20       40
Cost                      Ml        740      990      1890

Each column represents the integral achieved in that year.

The ATLAS baseline model for implementing the Tier1 Regional Center is based on manpower outsourcing from the available Consorzi di Calcolo: a first evaluation of the cost of this outsourcing is ~600 Ml for the 3 years. The additional cost of housing is estimated at ~250 Ml for the 3 years. These ~850 Ml are estimated assuming that the Tier1 prototypes are implemented by dividing the capacity between two different sites: a single-site option would allow a saving roughly estimated at ~100 Ml/year. The consumable costs (electrical power etc.) for keeping this computing capacity up and running for the full three years are estimated at ~350 Ml.

The total ATLAS costing for 2001-2003 amounts to ~3 Gl.
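The ~3 Gl total is consistent with adding the individual ATLAS cost items quoted above. The short sketch below assumes that the total is the sum of the cumulative 2003 hardware cost of Table 7.2.1 and the three-year outsourcing, housing and consumable costs, and that 1 Gl equals 1000 Ml; the dictionary keys are only descriptive labels.

```python
# All figures in Ml, taken from the text and from Table 7.2.1.
costs_ml = {
    "hardware (cumulative, end 2003)": 1890,
    "manpower outsourcing (3 years)": 600,
    "housing (3 years)": 250,
    "consumables (3 years)": 350,
}

total_ml = sum(costs_ml.values())
total_gl = total_ml / 1000  # assumption: 1 Gl = 1000 Ml

print(total_ml, "Ml =", total_gl, "Gl")
```

With these inputs the sum is 3090 Ml, i.e. about 3 Gl.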

 

    7.3 CMS

CMS has also adopted the twofold motivation of the testbed capacities:

Milestones settled by CMS are related to both the above approaches:

Some of the global CMS Collaboration resources expected by year 2006 are as follows:

The plans of the current INFN-Grid project for CMS Italy include the realization, by 2003, of Tier-1 and Tier-2 Centres at 10% of their final scale. To reach this goal CMS has adopted a gradual-growth approach: procuring and putting into operation some 20% of the resources needed for the 10% prototype in 2001, 30% in 2002 and 50% in 2003.

On the other hand, the Italian CMS collaboration is strongly involved in the simulation and analysis of the data needed to define the High Level Trigger algorithms and the physics studies.

CMS is already exploiting now (2000) the simulation and reconstruction software for the off-line analysis of the Trigger and physics channels. In the near term, the in-depth study and optimization of the Trigger algorithms is the driving effort.

The rejection factor that the CMS Trigger has to provide against background reactions is as large as 10^6-10^7. The High Level Trigger (levels 2 and 3) has to provide a rejection factor of the order of 10^3 in the shortest possible time and with the highest possible efficiency, without suppressing the signal being searched for.
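The two rejection factors quoted above fix what the trigger level preceding the High Level Trigger must deliver. The sketch below assumes that the rejection factors of successive trigger levels simply multiply, which the text does not state explicitly; the function name is hypothetical.

```python
def required_upstream_rejection(total_rejection, hlt_rejection):
    """Rejection left to the trigger level(s) before the HLT.

    Assumes the overall rejection is the product of the rejection
    factors of the successive trigger levels.
    """
    return total_rejection / hlt_rejection


low = required_upstream_rejection(1e6, 1e3)
high = required_upstream_rejection(1e7, 1e3)
print(f"rejection needed before the HLT: {low:.0e} to {high:.0e}")
```

Under this assumption the earlier trigger stage has to reject background by a factor between 10^3 and 10^4.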

The simulation and study process follows a planned schedule that will improve the statistical accuracy over the next three years. The number of events to be simulated increases by an order of magnitude during the three-year program, but the requested resources only increase by a factor of two per year, taking into account the improved efficiency and management of the resources themselves.

CMS Italy (INFN) is strongly involved in this process of simulation and study, in many fields with a leading role, contributing at least 20% of the effort (both in computing resources and in physics analysis expertise).

Both approaches to the prototyping phase lead to the same amount of resources needed. This scenario has the benefit of building the prototypes of the Regional Centres (Tier-1 and Tier-2s) while using the resources for "REAL" activities and physics results.

A preliminary deliverable plan for the two approaches can be summarized as follows:

Table 7.3.1 - CMS Prototype deliverables

2001
–Tier-n first implementation
–Network connection
–Main centers in the Grid
–HLT studies as of 10^6-10^7 events/channel
–Coordination of OO databases
–Network tests on "ad hoc" links and "data mover" tests
–All disk option storage

2002
–Tier1 dimension is doubled
–Tier0 Centre is connected (Network and first functionalities)
–Grid extension including the EU project
–Physics studies in preparation of the TDR and 5% mock data challenge for Computing TDR
–OO distributed databases (remote access)
–Performance Tests
–Mass storage studies

2003
–Infrastructure completion
–Functionality tests and CMS Integration
–Final studies for Physics TDR and preparation for 2004 mock data challenge
–Mock data challenge
–Full integration with CMS Analysis and Production

 

A preliminary table of the most demanding resources required to build the Grid-like testbed prototypes follows:

Table 7.3.2 - CMS Testbed capacity target

Capacity targets for the INFN Testbed CMS (preliminary)

 

                          Units        2001     2002      2003
CPU capacity              SI95         8,000    16,000    32,000
Disk capacity             TBytes       20       40        80
Tape library (backups)    PetaBytes    0.1      0.2       0.3
Total cost                ML           2300     3860      5840

Each column represents the integral achieved in that year.

It should be stressed that personnel and consumables needs are not included in the previous table, nor are the local support and LAN requirements. The LAN hardware costs are estimated at 350 ML.

 

 

    7.4 LHCb

The current baseline LHCb computing model is based on a distributed multi-tier regional centre model following that described by MONARC. We assume in this scenario that the regional centres (Tier 1) have significant CPU resources and the capability to archive and serve large data sets for all production activities, both for the analysis of real data and for the production and analysis of simulated data. At present we also assume that the production centre will be responsible for all production processing phases, i.e. for the generation of data and for the reconstruction and production analysis of these data.

The Italian regional center requirements for supporting user analysis and simulation have been estimated to be of the order of 110 KSI95 of CPU and 400 TB of disk. The plan is to reach of the order of 10% of the final capacity by 2003, and therefore to have 10 KSI95 and 10 TB of disk. These resources will allow the prototype tests to be performed at a significant scale, the on-going detector and low-level trigger optimization to continue, and the signal and background Monte Carlo event simulations for physics studies to begin.

The table below reports the estimated data volumes and CPU requirements to generate, store and reconstruct a data sample of 10^6 events in one year.

 

Table 7.4.1 - Capacity targets for the Testbed LHCb

 

                                       units      2001    2002    2003
CPU capacity                           SI95       2000    2000    10000
Estd. number of CPUs                   #          40      40      100
Disk capacity                          TBytes     2       5       10
Tape storage capacity                  TBytes     5       10      20
WAN link to CERN                       Mbits/s    155     622     1000
WAN link between Tier 1                Mbits/s    34      155     1000
WAN links between Tier 1 and Tier 2    Mbits/s    34      155     155

The preliminary total cost for LHCb for the years 2001-2003 is ~1 Glire.

 

    8. Computing for VIRGO and resources needed

 

The scientific goal of the Virgo project is the detection of gravitational waves in a frequency band from a few hertz up to 10 kHz. Virgo is an observatory that has to operate and take data continuously. The data production rate of the instrument is ~4 Mbyte/s, including the control signals of the interferometers, the data from the environmental monitoring system and the main interferometer signals. Raw data will be available for off-line processing from the Cascina archive and from a mirror archive at the IN2P3 computer center in Lyon (France). The reconstructed data h(t), plus a reduced set of data from the monitoring system, define the so-called reduced data set. The production rate of the reduced data is expected to be of the order of 200 kbyte/s.
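To give a feeling for what these rates imply, the sketch below converts them into yearly data volumes and sustained network rates. It is only an illustration: it assumes uninterrupted data taking over a calendar year, decimal units (1 TB = 10^6 Mbyte) and no protocol overhead, and the function names are hypothetical. Under these assumptions the raw stream amounts to about 126 TB per year and 32 Mbit/s, and the reduced data set to about 6.3 TB per year and 1.6 Mbit/s.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: continuous data taking for one year


def yearly_volume_tb(rate_mbyte_s):
    """Data volume accumulated in one year, in TB (1 TB = 1e6 Mbyte assumed)."""
    return rate_mbyte_s * SECONDS_PER_YEAR / 1e6


def line_rate_mbit_s(rate_mbyte_s):
    """Sustained network rate needed to ship the stream, ignoring overhead."""
    return rate_mbyte_s * 8


# Rates from the text: ~4 Mbyte/s raw, ~200 kbyte/s reduced.
for label, rate in [("raw", 4.0), ("reduced", 0.2)]:
    print(label, round(yearly_volume_tb(rate), 1), "TB/year,",
          round(line_rate_mbit_s(rate), 1), "Mbit/s")
```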

The off-line analysis cases that are most demanding in terms of computational cost and storage are the following:

  1. the search for signals from coalescing binaries
  2. the search for continuous gravitational wave signals

We propose to approach these problems in the framework of the Grid project.

The inspiraling binary search

The search for the gravitational-wave signal from inspiraling binaries is based on the well-established technique of matched filtering.

This method correlates the detector output with a bank of templates chosen to represent possible signals with a sufficient accuracy over the desired range of parameter space.

The computing load is distributed over different processors, each dealing with a different family of templates. The output is then collected in order to perform the statistical tests and extract the candidates.

The computational cost and the storage requirements are nonlinear functions of the coordinates of the parameter space, so the optimization of the sharing of the computational resources and the efficiency of the job fragmentation will be the crucial point of the test.
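The template-bank search described above can be illustrated with a toy example. This is a minimal sketch and not the Virgo analysis code: the templates are simple rising-frequency sinusoids rather than physical inspiral waveforms, the noise is assumed white and Gaussian, the correlation is computed directly in the time domain, and all names and parameter values are hypothetical. It does show the structure that matters for job fragmentation: each template is filtered independently, so each call could run as a separate job, and the results are collected at the end to pick the best candidate.

```python
import math
import random


def chirp_template(f0, n=200, rate=5e-5):
    """Toy unit-norm template: a sinusoid starting at frequency f0
    (cycles/sample) whose frequency rises linearly at `rate`."""
    raw = [math.sin(2 * math.pi * (f0 * t + 0.5 * rate * t * t)) for t in range(n)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def matched_filter(data, template):
    """Slide the template over the data; return (best_offset, peak |correlation|)."""
    n = len(template)
    best_offset, best_value = 0, 0.0
    for offset in range(len(data) - n + 1):
        value = abs(sum(data[offset + i] * template[i] for i in range(n)))
        if value > best_value:
            best_offset, best_value = offset, value
    return best_offset, best_value


def search_bank(data, bank):
    """Filter the data with every template in the bank and collect the results.
    Each template is independent of the others, so each could be a separate job."""
    results = {name: matched_filter(data, tpl) for name, tpl in bank.items()}
    best_name = max(results, key=lambda name: results[name][1])
    return best_name, results


# Synthetic data: white Gaussian noise plus one injected toy signal.
random.seed(1)
bank = {f0: chirp_template(f0) for f0 in (0.03, 0.05, 0.08)}
data = [random.gauss(0.0, 0.25) for _ in range(1000)]
injected_f0, injected_offset, amplitude = 0.05, 300, 8.0
for i, x in enumerate(bank[injected_f0]):
    data[injected_offset + i] += amplitude * x

best_f0, results = search_bank(data, bank)
print("best template f0:", best_f0, "at offset", results[best_f0][0])
```

In this toy all templates cost the same to filter; in the real search the cost depends on the template parameters, which is why the partitioning of the bank among processors needs to be optimized.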

The goal of the test bed is the application of the analysis procedure to the data generated by the Virgo central interferometer. The output of this test is essential in order to implement on-line triggers for the binary signal search.

The preliminary target is the search for signals generated by binary star systems with a minimum mass of 0.25 solar masses, at 91% of the Signal to Noise Ratio (SNR).

The pulsar search

The pulsar signal search is more demanding in terms of computational resources. The signal search is performed by analyzing the data stored in a relational database of short FFTs derived from h(t). The search method is basically hierarchical, so the definition and control of the job granularity involves a higher level of complexity.

The goal of the Grid test bed is to optimize the full analysis procedure on the data generated by the Virgo central interferometer. We plan to limit the search to the frequency interval 10 Hz - 1.25 kHz, and the computation will cover 2 months of data taking.

 

      8.1 Computational resources and links

Let us now list the needs for the total computational power and storage to be hierarchically distributed in the Virgo INFN sections and laboratories of the collaboration and structured in the classes of computer centers

Tier 0: Virgo-Cascina

(Tier 1: Lyon, IN2P3)

Tier 2: Roma1 and Napoli

Tier 3: Perugia and Firenze

Concerning the network links we notice that

  1. the laboratory in Cascina has to become a node of the network backbone
  2. we need dedicated bandwidths for the transmission of the reduced data set to the Tiers and to perform intensive distributed analysis
  3. the international link should permit the reception and transmission of the reduced set of data to Europe, USA and Japan, and, at least in the future, the transfer of a relevant fraction of the raw data to France.

In conclusion, for the test beds, dedicated links connecting Virgo-Cascina, Roma1, Napoli, Perugia and Firenze must be foreseen.

Finally, Table 8.1 reports the total capacity needed for the Virgo test beds at the end of 2001 and the final target for the off-line analysis of Virgo.

 

 

Table 8.1 - VIRGO Testbed capacity target

Capacity targets for the INFN Testbed VIRGO (preliminary)

                        units         end 2001    end 2003
CPU capacity            SI95             8,000      80,000
disk capacity           TBytes              10         100
disk I/O rate           GBytes/sec           5           5
sustained data rate     MBytes/sec         250         250
WAN links to Cascina    Mbits/sec          155       2,500
WAN links to labs       Mbits/sec           34         622
WAN links to France     Mbits/sec          622
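As a rough cross-check of how the disk and WAN figures in the table relate to each other, the following sketch computes the time needed to move the full disk capacity over the Cascina link. It assumes decimal units (1 TByte = 10^12 bytes, 1 Mbit = 10^6 bits) and a link used at its full nominal rate, which is an idealization; pairing these two rows of the table is a choice made here for illustration only.

```python
def transfer_days(terabytes, megabits_per_sec):
    """Days needed to move `terabytes` over a link of `megabits_per_sec`,
    assuming decimal units and 100% link utilization (idealized)."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_sec * 1e6)
    return seconds / 86400.0

# end 2001: 10 TBytes over a 155 Mbits/sec link to Cascina
days_2001 = transfer_days(10, 155)
# end 2003: 100 TBytes over a 2,500 Mbits/sec link to Cascina
days_2003 = transfer_days(100, 2500)
print(round(days_2001, 2), round(days_2003, 2))
```

Under these assumptions the transfer takes about 6 days at the end-2001 figures and a little under 4 days at the end-2003 figures, so the planned bandwidth grows slightly faster than the planned disk capacity.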

 

The preliminary total cost for LHC-B for the years 2001-2003 is ~7 Glire.

    9. Computing for APE and resources needed

      A test case for the GRID project is the analysis of Lattice QCD data.

      At the end of the year 2000, several APEmille installations will be up and running at sites in Italy, Germany, France and the UK, for an integrated peak performance of about 1.5 Tflops.

      These machines will be used for numerical simulations of QCD on the lattice, with the physics goals of an accurate analysis of the low-lying hadrons in full QCD and an investigation of the phenomenology of the weak interactions of heavy-quark systems.

      In the following years, a new generation of machines will gradually become available. A major player in this field will presumably be the new generation APE project, apeNEXT, which will give a factor of ten increase in performance, reaching the level of 5 to 10 Tflops. The coordinated use of several machines running on the same physical problem will help accumulate large statistics, made available to a wide community of users to perform independent investigations.

      We shall then see a trend towards regarding a large computer installation running Lattice QCD code as the analogue of a particle accelerator. In this analogy, the lattice field configurations are equivalent to the collision events collected and analyzed by experiments. The relevant point for GRID is that the amount of data to be stored and analyzed is comparable in size to that of a large experiment. Typical figures for the database size range from 25 TByte in the years 2000-2003 for APEmille up to 1 PByte in the years 2003-2006 for apeNEXT.

      The activities will then consist of operating a storage system supporting such a database, presumably distributed over a small number of sites. A farm of processors will be set up to retrieve configurations from the database and process them. This also means that a network infrastructure and the middleware to allow physicists to perform remote analysis will have to be developed. All these points may represent, within GRID, a Lattice QCD specific test bed to be set up quite rapidly, based on a couple of APEmille sites with about 5 TByte of data, customizing the relevant middleware (presumably Globus based) to allow Lattice QCD analysis.
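      The retrieve-and-process pattern described above can be sketched as follows. The in-memory dictionary stands in for the distributed configuration database, the "observable" is a toy quantity (the mean square of random site values) rather than a real lattice measurement, and the function names are hypothetical. The point is only the structure: independent workers each fetch one configuration, measure it, and the results are combined into an average with a statistical error.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, stdev
import random

def make_store(n_configs, n_sites, seed=0):
    """Stand-in for the configuration database: id -> list of site values."""
    rng = random.Random(seed)
    return {cid: [rng.uniform(-1.0, 1.0) for _ in range(n_sites)]
            for cid in range(n_configs)}

def measure(store, config_id):
    """Retrieve one configuration and compute a toy observable on it."""
    field = store[config_id]
    return sum(x * x for x in field) / len(field)

def run_farm(store, n_workers=4):
    """Process every stored configuration in a worker pool and return
    (ensemble average, standard error of the mean)."""
    ids = sorted(store)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        values = list(pool.map(lambda cid: measure(store, cid), ids))
    return mean(values), stdev(values) / len(values) ** 0.5

store = make_store(n_configs=50, n_sites=1000)
average, error = run_farm(store)
print(round(average, 3), round(error, 4))
```

      Because each configuration is measured independently, the same structure carries over when the store is remote and the workers run at different sites; only the retrieval step changes.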

      This preliminary program should heavily leverage the investment made in GRID-like techniques by our experimental colleagues and concentrate on the specific features of Lattice QCD analysis. The experience gained in this effort would be precious to sustain a smooth analysis of apeNEXT configurations as they become available.

    10. INFN-GRID management structure

For the organizational and managerial aspects of the project the general idea is to have something similar to the well consolidated managerial organization of the HEP experiments.

The following organization can be proposed.

The role of the Collaboration Board is to endorse Political, Administrative, Organizational and Strategic decisions proposed by the Executive Board. The Collaboration Board meets about twice per year, depending on the activities, and usually after each Collaboration Meeting (see next).

The Executive Board has Political and Administrative responsibilities regarding the normal management of the project. In case of need the Board can also take urgent strategic decisions, subject to later approval by the Collaboration Board.

The Executive Board meets whenever necessary, and at least once every 2 months.

We now describe the role and duties of the Key Persons.