Northgrid Status Alessandra Forti Gridpp21 Swansea 4 September 2008.


Northgrid Status

Alessandra Forti
GridPP21, Swansea
4 September 2008


Layout

• General status
• General news
• Site news
• VOMS and sysadmin repos
• Conclusions


General Status (1)

Site        Middleware  OS   SRM2.2  Space Tokens  SRM brand      CPU (kSI2K)  Storage (TB)  Used Storage (TB)  Average availability
Lancaster   gLite 3.1   SL4  yes     yes           DPM            476.2        80            39.6               90%
Liverpool   gLite 3.1   SL4  yes     yes           dCache -> DPM  592          13            2.1                91%
Manchester  gLite 3.1   SL4  yes     yes           dCache/DPM     2160         162           36.2               93%
Sheffield   gLite 3.1   SL4  yes     yes           DPM            300          13            2.8                96%


General Status (2)


General Status (3)


General news

• Manpower changes:
  – Liverpool:
    • GridPP post will start in the last week of September. The 0.75 FTE for 3 years has been converted to 1 FTE for 2 years.
  – Manchester:
    • EGEE Deputy coordinator will start on 1 November.
• Technical Board Meetings:
  – Increased frequency from 1 per quarter to 1 per month.
• NorthGrid and ATLAS
  – It seems it’s the only UK region to supply people for ATLAS shifts.
  – Good level of ATLAS production
• NorthGrid VO used by local groups in Manchester


Lancaster news

• Not much to report (it seems!)
• All the data have been moved from dCache to DPM and dCache has been decommissioned.
  – There wasn’t much to move
• There have been a few problems with power cuts.
• New cluster with 126 job slots and 100 TB storage is on the way
  – There have been some delays
  – Old cluster will remain. Setting up two CEs.
• Recently had problems with accounting caused by an update of Tomcat
  – Needed to be removed and reinstalled
• Most of the errors reported for Lancaster in the monitoring pages are due to external sources.
  – They should be flagged directly in the monitoring system.


Liverpool News

• dCache grievances:
  – Ease of dCache maintenance is a big issue; the initial installation was painful and every single update we've done since has broken something. dCache is just way too complicated for what we need from an SE and we don't have the time or manpower to justify it.
• Moving from dCache to DPM
  – A test DPM instance has already been installed; waiting for the new hardware to arrive to complete the operation.
  – 54 TB should be added in the near future
• Working to use the University cluster
• Minimum availability 83% due to gLite/dCache upgrade, network configuration problems and university DNS server
• Also had some problems with SAM tests due to university firewall.
  – Difficult to remove a service from SAM tests once inserted in the GOCDB. Procedure is contorted.


Manchester News

• dCache upgrade grievances
  – Resilience manager no longer started
  – Maximum number of jobs before it started to time out was only 200
• Problems eventually resolved thanks to some serious digging from the developers, who got direct access to the system
  – Turns out that a static parameter hadn’t been changed in the configuration files for the resilience manager
• Resilience is incompatible with space tokens anyway
• DPM instance with 6 TB installed for ATLAS production
  – Eventually new storage will be added to DPM
• DPM will be dedicated to ATLAS
  – dCache on WN for all the other VOs that don’t have as many requirements
• ATLAS split Manchester into two sites in their configuration
  – This massively improved the efficiency in production
• Minimum availability 79% due to dCache upgrade and collateral problems.


Sheffield news

• Problems with university DNS
• Bought new hardware for the services (SE, CE and Mon box).
  – Spent July tuning them and this has affected the availability (still 90%)
• Already increased storage space to 13 TB
  – This is online
• Further 16 TB are on the way.
  – Hardware is there but the fans are missing
• CPU increased from 182 to 300 kSI2k
• Very good productivity for ATLAS.
• Availability was never below 90% in the past 6 months.


VOMS and Repos

• VOMS –skipcacheck option has been enabled on the GridPP VOMS.
  – Should avoid future problems for users with CA rollover
• Sergey is also testing new VOMS version
• New YUM repository has been enabled on www.sysadmin.hep.ac.uk (other face of www.gridpp.ac.uk): egee-SA1
  – EGEE-SA1 is now distributing system monitoring and management tools (following on from the WLCG monitoring WG work with Nagios). There is a single repository for this (monitoring clients+servers, messaging clients+servers). This will eventually also include user-donated system management tools (e.g. FTSMon, WMSMon) that are approved by the EGEE Operations Automation Team.
• Manchester people are also using the UKI-NORTHGRID-MAN-HEP svn repository.


Conclusions

• Storage looking good
  – All the sites have SRM 2.2 and space tokens enabled
  – dCache relegated to a lesser role (or completely eliminated) should increase stability
  – All sites are bidding for additional storage or have already bought it
  – Manchester's numerous problems with dCache and ATLAS's way of representing it have been solved.
• The sites are really active in ATLAS and level of productivity is high
• Just in time


Additional slide

Region      March     April     May       June      July      August    Total       Percent
LondonT2    374,992   188,528   247,166   687,534   709,447   842,656   3,050,323   20.22%
NorthGrid   682,418   968,329   602,501   552,569   547,586   426,727   3,780,130   25.05%
ScotGrid    201,081   215,512    84,543   501,927   228,452   353,552   1,585,067   10.51%
SouthGrid   654,322   583,317   414,119   404,081   330,235   477,366   2,863,440   18.98%
Tier1A      447,145   585,081   571,793   891,354   228,291   113,531   2,837,195   18.80%
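
The totals and percentages in the table above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check under stated assumptions: the variable names are illustrative, the units of the monthly figures are not given on the slide, and the quoted percentages add up to 93.56%, so they appear to be taken against a larger total than the five rows listed here. What makes up the remainder is not stated in the slides.

```python
# Monthly figures per region, copied from the table above
# (March to August 2008; units are not stated on the slide).
usage = {
    "LondonT2":  [374_992, 188_528, 247_166, 687_534, 709_447, 842_656],
    "NorthGrid": [682_418, 968_329, 602_501, 552_569, 547_586, 426_727],
    "ScotGrid":  [201_081, 215_512, 84_543, 501_927, 228_452, 353_552],
    "SouthGrid": [654_322, 583_317, 414_119, 404_081, 330_235, 477_366],
    "Tier1A":    [447_145, 585_081, 571_793, 891_354, 228_291, 113_531],
}

# "Percent" column as quoted on the slide.
quoted_percent = {
    "LondonT2": 20.22,
    "NorthGrid": 25.05,
    "ScotGrid": 10.51,
    "SouthGrid": 18.98,
    "Tier1A": 18.80,
}

# Recompute each region's six-month total from the monthly columns.
totals = {region: sum(months) for region, months in usage.items()}

# Sum over the rows that are actually listed in the table.
listed_sum = sum(totals.values())

# Sum of the quoted percentages (less than 100 for the listed rows).
percent_sum = round(sum(quoted_percent.values()), 2)

# Grand total implied by each row: total / (percent / 100).
# If the Percent column is self-consistent, these should roughly agree.
implied_total = {
    region: totals[region] / (quoted_percent[region] / 100)
    for region in totals
}

for region in totals:
    share_of_listed = 100 * totals[region] / listed_sum
    print(
        f"{region:10s} total={totals[region]:>10,} "
        f"quoted={quoted_percent[region]:5.2f}% "
        f"share of listed rows={share_of_listed:5.2f}% "
        f"implied grand total={implied_total[region]:,.0f}"
    )
print(f"Sum of listed rows: {listed_sum:,}")
print(f"Sum of quoted percentages: {percent_sum}%")
```

The recomputed row totals match the "Total" column, and the implied grand totals from the five rows agree with each other to within rounding of the percentages, which suggests the table is internally consistent even though its denominator is not shown.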