Cloud systems are worldwide leaders in designing, manufacturing, and selling IT-related services. They provide a broad line of products for transporting transactional and analytical data, voice, and video across wide distances.

The Cloud Production Operations team spans multiple product offerings and sites within its Business Groups. It provides industry-leading support and control for distributed enterprises.

The team is responsible for managing the site’s performance and capacity, quickly finding the root cause of issues, and fixing problems fast to ensure uninterrupted service.

Challenges

Cloud Production Operations handles hundreds of database performance issues per day. The team’s main challenge was actioning the thousands of alerts generated annually by its existing database monitoring tools. Critical time and resources were being wasted, as over 90% of these alerts were false positives. The sheer volume of data processed across dozens of global locations raised operational and business challenges for the team. With thousands of virtual servers and serverless services supporting more than 150 applications in total, the organization required a scalable database performance management solution able to support its complex infrastructure.

 

At the same time, the organization has seen a constant and significant increase in users year over year. Constantly adding cloud resources to meet this demand was not sustainable. The company needed a way to scale effectively and manage growing activity while ensuring SLA-compliant execution.

Solution

Cloud Production Operations required a robust, next-generation Database Performance Management solution able to cope with the scale of its database infrastructure and high volumes of daily activity.

The organization first engaged Enteros in May 2018, deciding to run a proof-of-concept across its database systems, using the product in a performance environment to spot issues.

The technical operations manager of the Cloud Production Operations Group said, “From our initial pilot, it was clear that Enteros was able to provide the functionality we required to handle the growing volume of database performance incidents. We evaluated other products on the market, but for us, Enteros provided true database problem visibility combined with ease of deployment. Enteros was best positioned to enable us to scale for future growth.”

After a successful proof-of-concept, Enteros was rolled out across the organization’s data centers in just a number of days, deploying hundreds of agents, all reporting into a central SaaS controller.

Benefits: Increased problem visibility and speed of performance analysis

Previously, the organization received thousands of alerts a year from standard monitoring tools, of which only a few hundred were genuine early alerts that left enough time for remedial action. In many cases, alerts came too late or were false positives. By implementing Enteros, the organization has been able to dramatically reduce false positives.

“Prior to Enteros, our production operations center team was working overtime looking in the wrong places, at the wrong problems. Even if the NOC team only spent an average of five to ten minutes looking at each erroneous alert, that amounts to many thousands of man-hours every year of effort that could be better spent either working on critical issues or automating critical tasks. Since introducing the Enteros platform, we have been able to identify problems we weren’t even looking for. Enteros has enabled us to move towards data-driven database performance troubleshooting rather than generic ‘best practices’ or reactive ‘fire-fighting’. The solution gives us visibility when we need it and the database application intelligence to know when things aren’t functioning optimally.”

Since introducing Enteros, the organization has identified unique slow business transactions affecting services and has addressed multiple configuration errors. For example, the group manager explained, “We spotted a configuration issue which meant a huge number of requests (millions) were generated due to bad pieces of code. With Enteros, it took just five minutes to find and fix this issue. We were not even looking for this problem and only realized how resource-consuming it was when it was identified.” Among other benefits, this has resulted in a significant decrease in CPU and memory utilization across the entire platform.

“Enteros provides a common language between operations, development, business analysis, and data scientists. Introducing the Enteros platform has helped us move towards a better DevOps model, which in turn has had a positive impact on employee collaboration.”

“The better visibility has made employees more empowered to achieve problem resolution and enabled us to improve business outcomes.

“Because we release application changes frequently, for us, it was critical to have database performance management in production. Enteros gives us great visibility into what is happening, helping us to significantly reduce the number of escalations and to deliver SLA service to our customers.”

Future plans

The organization sees runbook automation for production database problems as a critical part of the monitoring solution.

“The goal is to minimize the alerts that humans have to interact with directly, ensuring database performance issues are automatically classified and routed to operations and the proper development teams,” said the production operations team manager.
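
The classify-and-route goal described above can be illustrated with a minimal sketch. This is not the organization's or Enteros's implementation: the alert fields, category names, keyword rules, and team names below are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass

# Hypothetical mapping from alert category to owning team (illustration only).
ROUTES = {
    "configuration": "operations",
    "slow_query": "development",
}
# Assumption: anything that cannot be classified falls back to operations.
DEFAULT_ROUTE = "operations"


@dataclass
class Alert:
    database: str
    message: str


def classify(alert: Alert) -> str:
    """Assign a category using simple keyword rules (a stand-in for real analysis)."""
    text = alert.message.lower()
    if "config" in text or "parameter" in text:
        return "configuration"
    if "slow" in text or "timeout" in text:
        return "slow_query"
    return "unclassified"


def route(alert: Alert) -> str:
    """Return the team that should receive the alert."""
    return ROUTES.get(classify(alert), DEFAULT_ROUTE)
```

For instance, `route(Alert("orders", "Slow query on checkout"))` would send the alert to the development team, while an alert no rule recognizes would fall back to operations. A production system would presumably replace the keyword rules with the performance data the monitoring platform already collects.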
