GROUP A - Problem Management

July 11, 2023

As service disruptions rise in organizations, resulting in loss or potential loss of the availability or performance of services, the task of detecting, reporting, analyzing, tracking and correcting the service disruptions falls to the Problem Management team.

 

The Problem Management process is both a reactive process, initiated at the conclusion of a Major Incident, and a proactive process, executed after analysis of the environment.

 

The ultimate goal of the Problem Management process is to prevent problems and the resulting incidents from occurring. For those incidents that cannot be avoided, it is designed to minimize the impact. The process manages the lifecycle of problems to ensure that each one is handled as expected.

 

  • Ensure Service Level Agreements are achieved
  • Reduce the number of service disruptions
  • Minimize problem life cycles
  • Own and administer the problem management processes and documentation
  • Review problems on a regular basis and provide regular status via the Problem Management meeting
  • Ensure Support Team attendance at all required meetings and that action is taken in a timely manner
  • Provide trend analysis and develop action plans to resolve problems; review findings with the team
  • Provide a set of standard problem reports to the Management Team
  • Update the knowledge base as needed
  • Update information in problem records as required (Activity Updates)
  • Prepare and submit necessary Requests for Change
  • Audit all internal information and recovery steps to ensure consistency, accuracy, and proper documentation
  • Own accountability for any internal communication required
  • Ensure process compliance
  • Act as the liaison between Problem Management and the business
  • Act as the liaison between vendors, service desk, service partners, operations managers, divisional IT, and other customer integration points
  • Act as an escalation point
  • Conduct Root Cause Analysis

 

The objective of the process should be as follows:

 

The objective of the Problem Management process is to contribute to the timely identification and closure of Problems. The Problem Coordinator will work to ensure that all meetings are attended, root cause is found in a timely manner, and actions are taken to prevent the problems from recurring.

Sample list of benefits:

 

  • The Process should focus on the most efficient steps to take to coordinate root cause identification efforts with the Enterprise Problem Management Team.
  • The Process should allow for the collection of Problem data and should facilitate and encourage thorough analysis of that data to identify risks and recurring problems.
  • The Process should encourage proper documentation of all relevant information in the ticketing system.
  • The Process should encourage the right level of quality in all the documentation.
  • The Process should encourage the proper use of the ticketing tool.
  • The Process should encourage the proper integration with the Enterprise Problem Management Team.

 

Sample list of observations:

 

  • Not all problems are captured.
  • Problem metrics are not accurate.
  • Problem Management training is not provided.
  • Roles and responsibilities related to Problem Management are not defined or shared.
  • The PM team does not receive performance feedback.
  • Time is not allocated to do effective RCAs, so they are rushed.
  • The team is unaware of the Problem Management SLAs and metrics.
  • The team is unaware of who collects and provides PM metrics.

 

Sample list of recommendations:

 

  • Develop the Problem Manager/Coordinator Roles and Responsibilities and deliver training to primary and secondary resources.
  • Document all procedures related to Problem Management and provide training to primary and secondary resources.
  • Document all meetings related to Problem Management (internal and external) and ensure the team is made aware of PM performance.
  • Document all tools needed to perform the PM role and provide documentation and training.
  • Develop Problem Management Service Level and Operational reports and metrics and ensure the team is made aware of the performance. Use standard templates to standardize the measurement criteria.

Assessment Questions:

  • At the conclusion of an MI, is an outage review conducted so that all the relevant information can be handed to the Problem Manager?
  • Is the root cause analysis (RCA) completed within the timeline and with the expected level of quality? Are SLAs being met? What is the RCA acceptance rate? Are there any client escalations/complaints regarding the timelines and quality?
  • Are the Problem Managers progressing the problem records, including tracking and closing problem tasks? Are there any challenges with the delivery teams participating in the problem calls or working the problem records? Do the delivery teams have the skill to work the problem records? Are there any challenges with the skill of the problem managers in managing the teams to meet the requirements?
  • Are teams opening problem records for repeat/chronic issues? Of the total problem records opened, how many are proactive and how many are reactive?
  • Are problem records opened for failed changes or changes implemented with problems?
  • Do you obtain customer concurrence before closing problem records?
  • Is Management ensuring that Problem Tickets are being managed (e.g., meetings held, time allowed to conduct root cause analysis, root cause actions implemented, all problem owners joining the meetings)?
  • How often is performance feedback provided to the Problem Management Team members and those who provide input to and take the output from the Problem Management team?
  • Is the Change/Release Management process functioning as designed? Are there any issues? Is the process causing MIs?
  • Is the Event Management process functioning as designed? Are there any issues? Are any of the outages taking place because people have not responded to the events that have been generated alerting teams of impending issues?
  • Is the Monitoring & Alerting process functioning as designed? Are there any issues? For the application/service/infrastructure components that are failing, were they monitored and were the thresholds appropriately set?
  • Is the Performance & Capacity Management process functioning as designed? Are there any issues? For the application/service/infrastructure components that are failing, were the performance and capacity requirements being monitored to ensure that there were sufficient resources to meet the demand?
  • Is the Firmware/Patch/Currency Management process functioning as designed? Are there any issues? For the application/service/infrastructure components that are failing, were they failing due to firmware or patch levels not being at the recommended levels? Were they failing due to the age/EOL status of the systems?
  • Is the Business Controls Management process functioning as designed? Are there any issues? For the application/service/infrastructure components that are failing, were they failing due to unauthorized access issues?
  • Do you meet with management to provide a monthly status update regarding Problems? Do you document action plans to resolve identified trends and improve Problem service levels (2-4 hrs/Friday for analysis)?
  • Are there any issues with team attendance at RCA Meetings?
  • Do you meet with other teams to review the status of problems and share observations or lessons learned?
  • Please provide any other observations/comments you may have regarding the Problem Management process.
  • Are you given time to diagnose the problem and implement solutions?
  • Are you pressured by the business or senior management?
  • Do you receive problem tickets?
  • When Problems occur, are the right people notified (IT Management, Business Owners, etc.)?
  • Are there any issues with RCA Action Plans?
  • Is the Outage Review document used as a basis to start the RCA process? If not, what is the basis for a problem record?
  • Do you have any issues with the ticket records related to the Problems? How is the quality? Is there sufficient detail?
  • Does the tool allow you to identify and log the Problem, provide full details of the Problem description, and classify the Problem with ease?
  • Do you know or have access to the topology of all CIs in the environment? If there is a product outage, do you know what Applications, Middleware, Databases, Storage, Servers and Network components support that Product?
  • Are the Problem SLAs tracked and shared with the teams?
  • Are there Problem reports stored in a central location? Who has access to them? Who uses them?
  • What are the Problem SLAs? Do you know what they are?
  • What are the top 10 drivers for Problems for the current and previous year?
  • What are the top 10 teams who generate Problems for the current and previous year?
  • What is the average MTTR for the current year and previous year by month?
  • What is the average resolution time for the Problems for the current and previous year?
  • What is the average response time for the Problems for the current and previous year?
  • What is the volume of records for the current year and previous year by month?
  • For any missed Problem SLAs, are there SLA reviews taking place with action plans?
  • How often are the Problem SLAs reviewed with the team?
  • Are there Problem Management SLAs?
  • What are the Problem Volumes by month?
  • What are the top 10 groups generating Problem Records?
  • What are the top 10 RCAs?
  • In the last year, how many RCA actions have been taken? How many are still open?
  • What quality management frameworks do you use on your team to improve the Problem Management process? DPP? PBA?
  • Do you partner with the Quality Management Coordinator – Ticket Quality Management to review a random sample of Problems and assess for quality?
  • Do you provide feedback to the Service Desk and other teams on areas of improvement?
  • Do you share findings/observations with the Communications Coordinator via emails, newsletters or team meetings on trends and improvement areas?
  • Does anyone on the team conduct any trend analysis of the Problems?
  • Does anyone on your team review the Problems on a regular basis for quality, accuracy, and completeness of information?
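
Several of the questions above (volume by month, average resolution time, top groups generating records, proactive versus reactive counts) reduce to simple aggregations over an export of problem records. The following is a minimal Python sketch of those calculations, not a definitive implementation. It assumes a hypothetical export: the `ProblemRecord` fields, the function names, and the sample data are all made up for illustration and do not come from any particular ticketing tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProblemRecord:
    """Hypothetical export row; field names are illustrative only."""
    record_id: str
    opened: date
    closed: Optional[date]  # None while the record is still open
    owning_group: str
    proactive: bool  # True if raised from trend analysis, False if raised after an incident


def monthly_volume(records):
    """Count records opened per (year, month)."""
    return Counter((r.opened.year, r.opened.month) for r in records)


def average_resolution_days(records):
    """Mean days from open to close, over closed records only."""
    durations = [(r.closed - r.opened).days for r in records if r.closed is not None]
    return sum(durations) / len(durations) if durations else None


def top_groups(records, n=10):
    """The n groups with the most problem records, as (group, count) pairs."""
    return Counter(r.owning_group for r in records).most_common(n)


def proactive_share(records):
    """Fraction of records that were opened proactively."""
    return sum(r.proactive for r in records) / len(records) if records else None


# Made-up sample data, for illustration only.
SAMPLE = [
    ProblemRecord("PRB001", date(2023, 1, 5), date(2023, 1, 15), "Network", False),
    ProblemRecord("PRB002", date(2023, 1, 20), date(2023, 2, 9), "Database", True),
    ProblemRecord("PRB003", date(2023, 2, 3), None, "Network", False),
    ProblemRecord("PRB004", date(2023, 2, 10), date(2023, 2, 25), "Storage", False),
]

if __name__ == "__main__":
    print("Volume by month:", dict(monthly_volume(SAMPLE)))
    print("Average resolution (days):", average_resolution_days(SAMPLE))
    print("Top group:", top_groups(SAMPLE, 1))
    print("Proactive share:", proactive_share(SAMPLE))
```

Open records are excluded from the resolution average here; whether to count them (for example, as of the report date) is a measurement choice that should be written into the standard report template so that every team calculates the figure the same way.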
