KFA71/79 Consolidation to avoid LS3 manpower overload and early learning 

By Christophe Boucly, Luis Feliciano and Gael Bellotto (SY/ABT)

Introduction

The KFA71 kicker system, installed in CERN’s Proton Synchrotron (PS) for beam extraction, has been a fundamental component for over five decades. A step-by-step consolidation plan was initiated in 2020 to ensure its continued performance. This article outlines the progress in modernising the system, addressing key challenges such as terminator stack failures, magnet module improvements, timing synchronisation, and control system upgrades. Furthermore, it provides a comprehensive PROS/CONS analysis of the phased approach, emphasising maintenance and operational criteria. The insights and lessons from this process will serve as valuable guidelines for future large-scale system upgrades at CERN.


The KFA71/79 kicker system has long been essential to beam extraction from CERN’s Proton Synchrotron. As the system aged, a step-by-step consolidation process was initiated in 2020 to ensure its continued reliability, efficiency, and safety. This article outlines the consolidation progress and explores how early learning from the project has informed strategies for avoiding LS3 manpower overload. It also analyses the system’s availability and fault data to highlight the challenges faced and the improvements achieved.

Avoiding LS3 Manpower Overload

Between 2018 and 2020, the kicker team lost significant knowledge through retirements, which made it harder to maintain, repair, and upgrade the KFA71 system. With LS3 (Long Shutdown 3) approaching, the team had to plan carefully to avoid manpower bottlenecks. Early learnings from the phased consolidation enabled the team to distribute the workload more efficiently and minimise the impact on operational availability.

The team reduced on-site interventions by upgrading the control system, incorporating automated diagnostic tools (such as Python-based analytics for NXCALS data), and developing a custom GUI for real-time kicker monitoring. This allowed operators to adjust kick parameters dynamically and proactively address issues before they escalated, thus avoiding unscheduled downtime. This approach ensured that vital manpower resources could be allocated strategically to address critical areas during LS3.

Early Learnings from the Consolidation

One key early learning was the importance of improving data acquisition and fault analysis to reduce downtime. According to the 2023 system availability data, the overall availability of the KFA71/79 kicker system was 99.987%, with 43 faults recorded, totalling 11.2 hours. These faults primarily stemmed from known issues with the terminator stack, insulation failures in magnet modules, and synchronisation problems. The team’s early focus on upgrading signal acquisition and introducing fast fault detection allowed for real-time adjustments and minimised the impact of these issues.
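The fault statistics quoted above can be turned into simple indicators. The Python sketch below is illustrative only: the function names are hypothetical and are not taken from the team's tools. It derives the mean fault duration from the two 2023 figures given in the text. The availability function is shown with placeholder numbers, because the reference operating period behind the quoted availability is not stated here.

```python
def availability_percent(downtime_hours: float, scheduled_hours: float) -> float:
    """Share of the scheduled operating time without a fault, in percent."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    if not 0 <= downtime_hours <= scheduled_hours:
        raise ValueError("downtime_hours must lie between 0 and scheduled_hours")
    return 100.0 * (1.0 - downtime_hours / scheduled_hours)


def mean_fault_duration_minutes(fault_count: int, downtime_hours: float) -> float:
    """Average duration of a single fault, in minutes."""
    if fault_count <= 0:
        raise ValueError("fault_count must be positive")
    return 60.0 * downtime_hours / fault_count


# Figures quoted in the article for 2023.
FAULT_COUNT_2023 = 43
FAULT_HOURS_2023 = 11.2

mean_minutes = mean_fault_duration_minutes(FAULT_COUNT_2023, FAULT_HOURS_2023)
print(f"Mean fault duration in 2023: {mean_minutes:.1f} min")

# Placeholder values (NOT KFA71 data): 1 h of faults in 1000 h of scheduled time.
example_availability = availability_percent(1.0, 1000.0)
print(f"Example availability: {example_availability:.2f} %")
```

With the quoted figures, the mean fault lasts roughly a quarter of an hour, which suggests that most 2023 faults were short interruptions rather than long repairs.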

Additionally, the loss of experienced personnel necessitated focusing on knowledge transfer and documentation. The new team developed a comprehensive plan to consolidate historical knowledge while implementing new training programs to fill expertise gaps. Early consolidation of magnet modules and control systems ensured the team was well-prepared for LS3, avoiding potential overloads during the critical shutdown period.

Efficiency and Availability Trends (2014–2024)

During the initial years (2014–2016), the system appeared to function without significant issues, but this was due to the limitations in fault tracking tools rather than actual performance. At that time, the tools were not fully understood or widely used, leading to underreported faults. The focus during these years was on establishing better tracking mechanisms.

By 2017, fault tracking had improved, and more faults were systematically reported, revealing that the aging system was experiencing increasing failures. This marked the beginning of more accurate fault data collection.

From 2018 to 2022, system efficiency decreased significantly due to two main factors: the equipment was aging, requiring more frequent interventions, and the new team responsible for system maintenance faced a steep learning curve. This period saw an increase in fault durations, such as the notable downtime in 2021, where faults led to over 164 hours of interruptions. This reflected the challenges of maintaining older systems and the team’s gradual adaptation to the complex equipment.

However, in 2023 and 2024, there was a marked improvement. The consolidation efforts enhanced system hardware, while the team’s expertise increased significantly. As a result, the system now operates with fewer faults and reduced fault durations, highlighting the impact of the consolidation project and the improved fault monitoring and tracking tools.
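A year-by-year view of this kind can be built from a list of fault records. The sketch below is a minimal illustration, assuming each record carries only a year and a duration; the `FaultRecord` type and the sample entries are hypothetical placeholders, not actual KFA71 fault data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultRecord:
    """One logged fault (hypothetical structure)."""
    year: int
    duration_hours: float


def summarise_by_year(records):
    """Return {year: (fault_count, total_hours)} sorted by year."""
    totals = defaultdict(lambda: [0, 0.0])
    for record in records:
        totals[record.year][0] += 1
        totals[record.year][1] += record.duration_hours
    return {year: (count, hours) for year, (count, hours) in sorted(totals.items())}


# Placeholder records for illustration only.
example_records = [
    FaultRecord(2021, 2.5),
    FaultRecord(2021, 0.5),
    FaultRecord(2023, 0.25),
]

summary = summarise_by_year(example_records)
for year, (count, hours) in summary.items():
    print(f"{year}: {count} faults, {hours:.2f} h")
```

Keeping the fault count and the total duration side by side helps separate years with many short faults from years dominated by a few long ones.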

Technical Challenges and Solutions

Terminator Stack and Kick Length Issues

The step-by-step HV consolidation confronted the kicker team with one of its early challenges: the terminator resistor stack failed destructively because of excessive currents during short kick lengths. To address this, engineers redesigned the main and dump switches. A market study and R&D, carried out in line with the functional specification, led to improved materials and cooling mechanisms that withstand the higher current loads and extend the operational life of the stack.


Magnet Module Upgrades

Since 2021, three HV generators and their associated magnet modules have been upgraded during the YETS periods. Now at the midpoint of the programme, the remaining work appears more challenging. Building on this first experience, the team had to adapt its consolidation strategy after Module 3 suffered a short circuit caused by insufficient insulation. This event highlighted the importance of re-evaluating the design and insulation of all magnet modules. After the short circuit was addressed, improved insulation protocols were implemented across the system, reducing the risk of future malfunctions.

Timing Synchronisation

Synchronising the timing across the 12 modules was a challenge. Variations in cable lengths and hardware ageing introduced timing jitter that could lead to operational failures. By introducing the Kicker Internal Timing System (KiTS), the team achieved sub-nanosecond precision in synchronising module delays, ensuring the system could meet the demands of complex beam cycles such as the LHC beam schemes. A significant improvement followed in 2021 with the integration of the Internal Post-Operational Check (IPOC), which provides fast acquisition and analysis of the kick waveform shape from the pickup installed on the resistive terminator after the magnet.
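The idea of compensating per-module delays can be illustrated with a short Python sketch. It is not the KiTS implementation, whose internals are not described here. It assumes that each module's pulse arrival time has been measured against a common trigger and that a programmable delay with a fixed step is available per module. The function names, the 100 ps step and the arrival times are all hypothetical. Integer picoseconds are used to avoid floating-point rounding.

```python
def alignment_delays_ps(arrival_ps, step_ps):
    """Delay to add to each module so that all pulses line up with the latest one.

    Each delay is rounded to the nearest multiple of the delay step
    (exact halves round up).
    """
    if not arrival_ps:
        raise ValueError("at least one arrival time is required")
    if step_ps <= 0:
        raise ValueError("step_ps must be positive")
    latest = max(arrival_ps)
    return [((latest - t + step_ps // 2) // step_ps) * step_ps for t in arrival_ps]


def spread_ps(arrival_ps, delays_ps=None):
    """Peak-to-peak spread of the arrival times, optionally after applying delays."""
    if delays_ps is None:
        delays_ps = [0] * len(arrival_ps)
    aligned = [t + d for t, d in zip(arrival_ps, delays_ps)]
    return max(aligned) - min(aligned)


# Hypothetical arrival times for four modules, in picoseconds.
measured_ps = [1000, 1250, 1125, 1180]
delays = alignment_delays_ps(measured_ps, step_ps=100)

print("Delays to apply (ps):", delays)
print("Spread before (ps):", spread_ps(measured_ps))
print("Spread after (ps):", spread_ps(measured_ps, delays))
```

In this sketch the residual spread is bounded by the delay step, so a sub-nanosecond target implies a correspondingly fine delay resolution.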

Control System Consolidation

The existing control system, based on outdated RS232 and analogue interfaces, was replaced by a modern control architecture. As an intermediate upgrade, KFA71 was integrated into the CERN controls framework through the Front-End Software Architecture (FESA), commissioned in 20218. The Kicker Generator Controller (KGC), an in-house design based on the sbRIO platform, is intended to increase signal acquisition capacity, improve system maintainability, and log data for background analysis, in line with the first control specification. The partial consolidation reached a further milestone in 2023 with the introduction of a failsafe control that allows the merge into the Slow Control System and Surveillance System (SCSS). This upgrade improved diagnostic capabilities, remote monitoring, and the flexibility needed for real-time adjustments during operations.

New Discharge Protection with AVT

YETS 2023 brought a significant safety enhancement: the consolidation of the primary capacitor unit, which allows the LV discharge to the HV transformer. The improvement also included a new safety power charger with an Absence of Voltage Tester (AVT) system, which ensures the system is fully discharged before maintenance activities begin. This system conforms to modern safety standards and significantly reduces the risk of accidental high-voltage exposure for maintenance personnel.
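The logic of an absence-of-voltage check can be sketched in a few lines of Python. This is a conceptual illustration only, not the AVT's actual behaviour, which is implemented in dedicated safety hardware. The threshold value, the `VoltageReading` type and the measurement point names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold, for illustration only; not a value from the KFA71 system.
SAFE_VOLTAGE_THRESHOLD_V = 50.0


@dataclass(frozen=True)
class VoltageReading:
    """One measurement point checked before an intervention (hypothetical)."""
    point: str
    volts: float
    self_test_ok: bool  # the tester confirmed that it can actually measure


def maintenance_permitted(readings, threshold_v=SAFE_VOLTAGE_THRESHOLD_V):
    """Permit access only if every point is verified to be below the threshold.

    The check fails safe: no readings, or a failed self-test, means no permit.
    """
    if not readings:
        return False
    return all(r.self_test_ok and abs(r.volts) < threshold_v for r in readings)


discharged = [
    VoltageReading("primary capacitor", 0.4, True),
    VoltageReading("HV transformer", 1.2, True),
]
still_charged = [
    VoltageReading("primary capacitor", 0.4, True),
    VoltageReading("HV transformer", 800.0, True),
]

print("Discharged system:", maintenance_permitted(discharged))
print("Charged system:", maintenance_permitted(still_charged))
```

The key design point is that a missing or unverified measurement is treated the same way as a dangerous one.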

PROS/CONS Analysis of Step-by-Step Consolidation

PROS:

  • Minimised Downtime

    The step-by-step approach allowed the kicker system to operate continuously during each consolidation phase. This ensured that critical experiments at CERN could continue uninterrupted while the system was being upgraded.

  • Incremental Risk Management

    By addressing one module or subsystem at a time, the team could test and validate each upgrade in a controlled manner. This approach helped identify unexpected issues, such as the short circuit in Module 3, before they became system-wide problems.

  • Adaptive Upgrades

    The phased method provided the flexibility to adjust project scope and focus based on real-time findings. For instance, after discovering issues with terminator stacks, the team could immediately prioritise redesigning the main and dump switches.

  • Improved Safety

    Introducing modern safety systems, such as AVT discharge protection, at each phase of the project enhanced the working environment for maintenance personnel throughout the consolidation.

CONS:

  • Extended Project Duration

    The phased approach extended the project timeline, pushing the completion of the final consolidation to 2027. Each phase required design, testing, and validation time, which delayed completion of the full system.

  • Interim Failures

    During the phased consolidation, specific components continued to operate using outdated systems, leading to occasional failures, such as recurring synchronisation issues, before the timing system was fully upgraded.

  • Operational Complexity

    Managing a mix of old and new systems during the transition phases introduced operational complexity. Maintenance and operational teams had to navigate between outdated systems and newly consolidated modules, increasing the potential for errors.

Maintenance and Operational Criteria

The consolidation required establishing new maintenance and operational protocols to ensure smooth transitions between phases.

The key criteria include:

  • Regular Diagnostics

    Each phase introduced new diagnostic tools, such as the IPOC and Analogue Fanout integrator systems, which require regular calibration and monitoring to detect faults and ensure optimal performance.

  • Modular Inspections

    These inspections focused on individual modules, with emphasis on checking insulation, cooling systems, and synchronisation settings to prevent failures similar to the short circuit in Module 3 in 2023.

  • Safety Protocols

    The AVT discharge protection system was integrated into all operational and maintenance workflows to ensure safe interventions when working with high-voltage components.

Availability and Efficiency

Consolidating the HV generator and installing new pickup systems improved signal acquisition and the quality of the collected data, enabling the development of advanced tools to analyse system availability and efficiency. Python-based tools were developed that use CERN’s NXCALS data platform to monitor kicker performance continuously. Combined with a custom GUI built with the CERN GUI tools, they provide real-time insights into kicker waveform performance and timing distribution.

These tools enhance the system’s availability and operational resilience, allowing the kicker system to operate with greater reliability and minimal downtime.
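One waveform indicator that tools of this kind can compute is the rise time of the kick pulse. The sketch below is a generic illustration, not the team's code. It assumes a sampled waveform with a rising edge and uses the common 10% to 90% definition with linear interpolation between samples. The function names and the synthetic waveform are hypothetical, and no NXCALS call is shown.

```python
def crossing_time(times, values, level):
    """First time the waveform rises through `level`, linearly interpolated."""
    for i in range(1, len(values)):
        v0, v1 = values[i - 1], values[i]
        if v0 < level <= v1:
            fraction = (level - v0) / (v1 - v0)
            return times[i - 1] + fraction * (times[i] - times[i - 1])
    raise ValueError("waveform never rises through the requested level")


def rise_time(times, values, low=0.1, high=0.9):
    """Time between the `low` and `high` fractions of the pulse amplitude."""
    if len(times) != len(values) or len(values) < 2:
        raise ValueError("times and values must have equal length of at least 2")
    baseline, top = min(values), max(values)
    if top == baseline:
        raise ValueError("waveform is flat")
    amplitude = top - baseline
    t_low = crossing_time(times, values, baseline + low * amplitude)
    t_high = crossing_time(times, values, baseline + high * amplitude)
    return t_high - t_low


# Synthetic waveform for illustration: linear ramp from 10 ns to 30 ns.
sample_times_ns = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
sample_values = [0.0, 0.0, 0.5, 1.0, 1.0, 1.0]

measured_rise_ns = rise_time(sample_times_ns, sample_values)
print(f"10-90% rise time: {measured_rise_ns:.1f} ns")
```

Tracking such an indicator pulse after pulse is one way to spot gradual degradation before it turns into a fault.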

Conclusion

The step-by-step consolidation of the KFA71 system has presented technical and logistical challenges, but it has allowed CERN to maintain critical operations while modernising an ageing infrastructure. The phased approach revealed several key weaknesses, including the terminator stack and magnet module issues, which were addressed through targeted upgrades. Implementing new timing and control systems, alongside a proactive fault tracking mechanism, significantly improved system reliability, performance, and overall availability.

Between 2014 and 2016, fault-tracking tools were limited, leading to underreported issues. As fault tracking became more systematic from 2017 onwards, the system’s performance degradation became evident, particularly as the equipment aged. The period from 2018 to 2022 saw a decline in system efficiency, driven by the ageing equipment and by the knowledge acquisition challenges the new maintenance team faced. During this time, fault counts and durations increased, most notably in 2021, when the system experienced major downtimes due to extended faults, which exposed vulnerabilities in the ageing infrastructure and the lack of experienced manpower.

However, as the team gained experience and the consolidation efforts took effect, the system showed marked improvement by 2023 and 2024. Enhanced diagnostics, real-time monitoring tools, and refined fault detection all contributed to fewer faults and shorter fault durations.