The Situation:
The financial services customer was facing persistent disruptions to critical client-facing IT services, which hindered the development of new business. Their complex architecture and infrastructure made troubleshooting difficult, a problem exacerbated by the sheer volume of data sources. Without data consolidation, multiple teams were unnecessarily drawn into investigations and handovers, reducing efficiency, and the absence of consolidated business views limited end-to-end visibility of the health of critical services and their dependencies.
The Process:
AC3 initially focused on observability for one specific business portal service, establishing a robust foundation that conformed to best practices and relied on reusable components and repeatable processes. Leveraging Splunk Core, IT Service Intelligence (ITSI), and Splunk Observability Cloud (O11y), the team combined the unique strengths of each offering into a holistic and proactive solution. Splunk Core supports tailored, highly customisable dashboards built from business-wide data sources, including O11y, as well as advanced log analysis. ITSI provides conceptual entity visualisations and business-oriented impact analysis, with high-level health summaries for both technical and non-technical audiences. Splunk O11y enables powerful real-time monitoring of IT infrastructure and applications, accelerated root cause analysis, and highly effective alerting.
"Partnering with AC3 has been transformative for our IT services. They tackled our challenges head-on, developing a proactive and efficient solution while prioritising our organisational objectives."
Key Challenges:
- The Customer was already ingesting significant volumes of metric data for Infrastructure and Application Performance Monitoring, but lacked consolidation of trace data and standardised instrumentation processes.
- The Customer had begun implementing Real User Monitoring (RUM) and Synthetic Monitoring tests; however, these required significant refinement and standardisation.
- Formal implementation of Alert Management in O11y was crucial to reduce alert fatigue and move toward a more proactive troubleshooting model.
The Solution:
AC3 implemented a comprehensive solution using Splunk Core, ITSI, and Splunk O11y. For end-to-end monitoring, a dashboard in Splunk Core consolidated the Customer’s operational workflow and addressed instrumentation gaps. AC3 also delivered an ITSI service model that visualised the conceptual components and dependencies of the Customer’s portal service, incorporating KPIs from synthetic monitoring, error rates, and logged errors. This service model was designed to be consistent with other portals in the same technology stack, significantly reducing future effort and implementation time. Using Splunk O11y, the team worked with the Customer to consolidate trace data across their entire technology stack and standardise the instrumentation process so that data collection could scale. A formal alert management framework was established, and intelligent alert logic, such as anomaly detection and adaptive thresholding, was applied where appropriate. The resulting solution adhered to best practices and promoted standardised, reusable templates for future implementations.
"The innovative approach in integrating Splunk solutions not only resolved our immediate challenges, but it has laid the groundwork for a scalable observability framework. The seamless implementation of the tailored solution has significantly improved our troubleshooting processes and provided a solid foundation for future operational growth."
The Outcome:
AC3 delivered a tailored and efficient solution that improved observability and troubleshooting efficiency. The Customer gained enhanced visibility into critical IT services, with optimised alert management and reduced Mean Time to Resolution (MTTR). The integrated solution of Splunk Core, ITSI, and Splunk O11y not only addressed the Customer’s immediate concerns, but also laid a solid foundation for future scalability and standardisation across diverse portal services. The Customer is now exploring the more advanced log analysis capabilities of Splunk Core to further improve operational efficiency and excellence.