At-a-Glance: Ensure oilfield equipment reliability by combining criticality-based maintenance (RCM), tight operating-envelope control, condition-based monitoring, disciplined work management, and spares/QA rigor. Track availability, MTBF/MTTR, and “bad actor” elimination to raise uptime and reduce OPEX.
I. Objective Definition and Key KPIs
- I.1 Objective: Maximize safe, continuous throughput by preventing functional failures of critical equipment across drilling, production, injection, power, and utility systems.
- I.2 Primary KPIs:
- Facility availability (A): target ≥ 97.0% (estimated); drilling rig technical uptime ≥ 95.0% (estimated).
- Reliability (MTBF): rotating assets ≥ 12,000–24,000 hours (estimated); ESP run life ≥ 18–36 months (field dependent).
- Maintainability (MTTR): critical rotating equipment corrective MTTR ≤ 8–24 hours (estimated); planned changeouts ≤ 12 hours where practical.
- OEE: target ≥ 85% on bottleneck trains (estimated).
- PM/PdM compliance: ≥ 95% on-time with ≤ 5% deferral.
- Bad-actor frequency: failures from the top 10 contributors reduced by ≥ 50% within two quarters.
- Condition-monitoring coverage: ≥ 90% of critical rotating equipment on vibration/oil PdM.
- Emissions/energy KPIs: flaring due to equipment trips ≤ 0.5% of production; powertrain efficiency ≥ 38–42% for gas turbines (site dependent).
- Maintenance cost intensity: ≤ 2.5–5.0 $/BOE (onshore) or benchmarked per asset (estimated).
- I.3 Core formulas:
Availability: \( A = \dfrac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \)
Reliability (exponential): \( R(t) = e^{-t/\text{MTBF}} \), failure rate \( \lambda = 1/\text{MTBF} \)
Weibull reliability: \( R(t) = e^{-(t/\eta)^\beta} \)
OEE: \( \text{OEE} = A \times \text{Performance} \times \text{Quality} \)
FMEA risk priority number: \( \text{RPN} = \text{Severity} \times \text{Occurrence} \times \text{Detection} \)
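Reorder point: \( \text{ROP} = D_{LT} + SS \), Safety stock: \( SS = z \sigma_{LT} \), where \( D_{LT} \) is expected demand over the lead time, \( z \) is the service-level factor, and \( \sigma_{LT} \) is the standard deviation of lead-time demand.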
- I.4 Assumptions (estimated): conventional oilfield with mixed rotating equipment, ESPs/gas lift, surface facilities, and power utilities; no extreme HPHT or sour service beyond standard mitigation.
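
The core formulas in I.3 can be sketched as small helper functions. This is a minimal illustration, not a validated reliability library; the function names and the example inputs (12,000 h MTBF, 24 h MTTR) are assumptions chosen from within the KPI ranges above.

```python
import math


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Inherent availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def reliability_exponential(t_h: float, mtbf_h: float) -> float:
    """R(t) = exp(-t / MTBF), i.e. constant failure rate lambda = 1 / MTBF."""
    return math.exp(-t_h / mtbf_h)


def reliability_weibull(t_h: float, eta_h: float, beta: float) -> float:
    """R(t) = exp(-(t / eta)^beta); beta = 1 reduces to the exponential case."""
    return math.exp(-((t_h / eta_h) ** beta))


def oee(avail: float, performance: float, quality: float) -> float:
    """OEE = A x Performance x Quality, all expressed as fractions."""
    return avail * performance * quality


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """FMEA risk priority number = Severity x Occurrence x Detection."""
    return severity * occurrence * detection


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not field data).
    print(f"A = {availability(12_000, 24):.4f}")
    print(f"R(8760 h) = {reliability_exponential(8_760, 12_000):.3f}")
    print(f"RPN = {rpn(8, 3, 4)}")
```

Setting `beta = 1` in `reliability_weibull` gives the same result as `reliability_exponential` with `eta = MTBF`, which is a quick consistency check on fitted parameters.
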
II. Critical Parameters and Target Ranges
| Asset Group | Key Parameters | Typical Targets (estimated) | Reliability Rationale |
|---|---|---|---|
| Centrifugal pumps/compressors | Vibration RMS, bearing temp, lube oil cleanliness, NPSH margin, surge margin | Vib ≤ 4.5 mm/s; bearing ≤ 90 °C; oil ≤ ISO 19/17/14; NPSH margin ≥ 1.5 m; compressor surge margin ≥ 10–15% | Controls rotor dynamics, avoids lubrication and surge-induced failures |
| Reciprocating compressors | Crosshead temp, rod drop, frame vibration, valve ΔP, lube rate | Rod drop within OEM tolerance; frame vib ≤ 7 mm/s; valve ΔP trend stable; oil delivery as per OEM | Manages wear, detects valve/reed and rider-band issues early |
| ESPs | Motor current/load, intake temp, discharge pressure, VSD harmonics | Amps within ±10% of design; intake ≤ 120 °C; ΔP stable; THD ≤ 5–8% | Prevents thermal overload and electrical insulation damage |
| Gas turbines/engines | Exhaust gas temp spread, vibration, fuel quality, inlet ΔP | EGT spread ≤ 15–25 °C; vib within OEM; filter ΔP within limits; fuel S/W within spec | Protects hot section, prevents surge/combustion instabilities |
| Gearboxes | Oil ISO code, water ppm, particle metals, temperature | ISO ≤ 20/18/15; water ≤ 500 ppm; temp ≤ 85 °C | Extends bearing/gear life, prevents micropitting |
| Hydraulics/BOP control | Fluid cleanliness, accumulator precharge, leak-off | ISO ≤ 18/16/13; precharge within ±10%; leak-off minimal | Assures actuation reliability under demand |
| Separators/vessels | Level control stability, DP across internals, PSV set/test | Stable LC within ±3%; DP trends; PSV inspection as per plan | Prevents carryover/carryunder and overpressure trips |
| Flowlines/pipelines | Corrosion rate, wall loss, inhibitor residual, pigging DP | CR ≤ 0.1–0.5 mm/y; residual per chemistry; pig DP within trend | Mitigates leaks/ruptures; maintains throughput |
| Power systems | Voltage THD, frequency stability, UPS autonomy, ground faults | THD ≤ 5%; freq 49.8–50.2 or 59.8–60.2 Hz; UPS ≥ 15–30 min | Prevents nuisance trips and electronics damage |
| Instrumentation | Loop health, calibration drift, voting integrity | Drift within spec; proof-test intervals met; 2oo3 logic healthy | Reduces spurious trips and undetected demand failure |
| Water/chem injection | Pump vib, stroke count, filter DP, chemical residual | Vib within limits; DP ≤ setpoint; residuals per design | Assures injection targets; protects metallurgy/flow |
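
One way to apply the table is as a machine-readable limit set checked against live readings. The sketch below covers only the centrifugal pump row and assumes the vibration and bearing-temperature targets are upper limits and the NPSH margin is a lower limit; the tag names, `Limit` class, and `check_envelope` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Limit:
    """Operating limit; either bound may be omitted."""
    low: Optional[float] = None
    high: Optional[float] = None

    def violated(self, value: float) -> bool:
        if self.low is not None and value < self.low:
            return True
        if self.high is not None and value > self.high:
            return True
        return False


# Assumed encoding of the centrifugal pump targets from the table.
PUMP_LIMITS: Dict[str, Limit] = {
    "vibration_rms_mm_s": Limit(high=4.5),
    "bearing_temp_c": Limit(high=90.0),
    "npsh_margin_m": Limit(low=1.5),
}


def check_envelope(readings: Dict[str, float], limits: Dict[str, Limit]) -> List[str]:
    """Return the sorted names of parameters outside their limits.

    Parameters with no reading are skipped rather than flagged.
    """
    return sorted(
        name
        for name, limit in limits.items()
        if name in readings and limit.violated(readings[name])
    )


if __name__ == "__main__":
    sample = {"vibration_rms_mm_s": 5.1, "bearing_temp_c": 82.0, "npsh_margin_m": 1.2}
    print(check_envelope(sample, PUMP_LIMITS))
```

Keeping the limits in a plain dictionary makes it straightforward to place them under the same MOC control as other setpoints.
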
III. Step-by-Step Procedure / Workflow / Checklist
- III.1 Establish criticality and failure modes
- 3.1.1 Build an equipment criticality matrix (HSE, production impact, repair cost, lead time).
- 3.1.2 Conduct RCM/FMEA on A and B critical equipment; quantify RPN and define functional failures.
- 3.1.3 Create a “bad actor” list from 12–24 months of failure data; prioritize the Pareto top 20% of contributors that cause roughly 80% of losses.
- III.2 Define maintenance strategy (PM/PdM/Run-to-Failure)
- 3.2.1 Convert calendar PMs to condition-based where feasible (vibration, oil, thermography, ultrasound).
- 3.2.2 Set task intervals using Weibull analysis; for on-condition tasks, choose an inspection interval ≤ 1/2 of the P–F interval.
- 3.2.3 Lock tasks and intervals in CMMS with job plans, tools, torque values, and acceptance criteria.
- III.3 Implement condition monitoring program
- 3.3.1 Online sensors on critical machines (vibration, temperature, pressure, speed, electrical signature).
- 3.3.2 Route-based data every 2–4 weeks; alarms set at Alert/Trip bands (e.g., 1.5× and 2.5× baseline).
- 3.3.3 Oil analysis (viscosity, TAN/TBN, PQ index, ICP metals, water Karl Fischer, particle count).
- 3.3.4 Thermography on MCCs/bus ducts quarterly; ultrasonic leak surveys for pneumatics.
- III.4 Control the operating envelope
- 3.4.1 Map pump/compressor curves; maintain Best Efficiency Point (BEP) ± 10–20% flow.
- 3.4.2 Anti-surge control validation on compressors; prove trip logic and valve stroking quarterly.
- 3.4.3 Soft starts/ramp rates via VFD/VSD; avoid frequent starts; enforce min-run/min-stop timers.
- III.5 Lubrication and contamination control
- 3.5.1 Specify lubricant by duty; set filtration β-ratio; install breathers/desiccants.
- 3.5.2 Flush new systems to target ISO code; baseline oil analysis after commissioning.
- 3.5.3 Grease practices: right type, volume, intervals; prevent overgreasing.
- III.6 Spares, MRP, and kitting
- 3.6.1 Determine ROP/SS using demand and lead-time variability: \( \text{ROP} = D_{LT} + z\sigma_{LT} \).
- 3.6.2 Dual-source or frame-agree critical spares; hold N+1 for single-point failures.
- 3.6.3 Kit PMs with gaskets, fasteners, shims; use barcode/QR for traceability.
- III.7 QA/QC and precision maintenance
- 3.7.1 Precision alignment (laser), balance, proper fits; document as-left data.
- 3.7.2 Torque-to-yield/angle for critical joints; use calibrated tools.
- 3.7.3 OEM part verification; material certs for pressure-retaining items.
- III.8 Commissioning, proof tests, and SAT
- 3.8.1 FAT/SAT with acceptance criteria; baseline vibration, thermals, electrical.
- 3.8.2 Function test ESD/PSV/reliefs; record SIL proof-test results.
- III.9 Competency, procedures, and human factors
- 3.9.1 Role-based competency matrix; cert-to-task linkage in CMMS.
- 3.9.2 Clear SOPs/LOTO; JSA embedded in work orders.
- 3.9.3 Pre-job briefs and post-job debriefs feed continuous improvement.
- III.10 Work management discipline
- 3.10.1 Backlog control: ready backlog 2–4 weeks; aged backlog (older than 90 days) < 10% of total.
- 3.10.2 Schedule compliance ≥ 80%; wrench time ≥ 55–65%.
- III.11 Management of Change (MOC) and obsolescence
- 3.11.1 MOC for setpoint, hardware, software changes; cyber/functional safety review.
- 3.11.2 Obsolescence register; planned migrations and stocking strategy.
- III.12 Failure investigation and reliability growth
- 3.12.1 RCFA on high RPN/production-impacting events within 5 business days.
- 3.12.2 Implement corrective actions; verify risk reduction; update RCM.
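
The reorder-point rule in 3.6.1 can be sketched as follows. This is a rough illustration that assumes lead-time demand is approximately normally distributed, so that \( z \) can be taken from the standard normal quantile for a chosen service level; the 95% service level and the example quantities are assumptions, not recommendations.

```python
from statistics import NormalDist


def safety_stock(sigma_lt: float, service_level: float) -> float:
    """SS = z * sigma_LT, with z the standard normal quantile of the service level."""
    z = NormalDist().inv_cdf(service_level)
    return z * sigma_lt


def reorder_point(demand_lt: float, sigma_lt: float, service_level: float = 0.95) -> float:
    """ROP = D_LT + z * sigma_LT."""
    return demand_lt + safety_stock(sigma_lt, service_level)


if __name__ == "__main__":
    # Assumed example: 4 mechanical seals expected to be consumed over the
    # lead time, with a standard deviation of 1.5 seals.
    rop = reorder_point(demand_lt=4.0, sigma_lt=1.5, service_level=0.95)
    print(f"Reorder point: {rop:.2f} units")
```

For slow-moving critical spares, demand is often too sparse for the normal assumption to hold, so a result like this is better treated as a starting point for review than as a stocking decision.
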
IV. Risk & Mitigation (HSE, Reliability, Redundancy)
- IV.1 HSE-critical risks
- 4.1.1 Pressure/energy release: enforce LOTO, pressure tests, and calibrated relief devices.
- 4.1.2 Ignition sources: classify areas, maintain Ex integrity, verify bonding/grounding.
- 4.1.3 Confined space and SIMOPS: permits, gas tests, continuous monitoring, rescue readiness.
- 4.1.4 Dropped objects and rotating parts: guards, exclusion zones, lift plans.
- IV.2 Reliability risks
- 4.2.1 Single-point failures: design/select N+1 on bottlenecks; install bypasses where practical.
- 4.2.2 Power quality: VFD harmonics; install filters/12–18 pulse rectifiers; monitor THD.
- 4.2.3 Solids/contaminants: upstream strainers/filters, pigging program, chemical treatment.
- 4.2.4 Environmental extremes: heat/cold derates, enclosures/insulation, winterization.
- IV.3 Mitigation controls
- 4.3.1 Proof testing of SIFs; maintain achieved SIL; document PFDavg.
- 4.3.2 Spare capacity and quick-disconnects for rapid swaps; pre-commissioned spares.
- 4.3.3 Condition-based shutdown permissives with degraded-mode operation where safe.
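
The benefit of N+1 redundancy in 4.2.1 can be estimated with a k-out-of-n availability calculation. The sketch below assumes identical units that fail and are repaired independently, which ignores common-cause failures and switchover reliability; the function name and the 97% unit availability are illustrative assumptions.

```python
from math import comb


def k_out_of_n_availability(unit_availability: float, k: int, n: int) -> float:
    """Availability of a system needing at least k of n identical, independent units.

    Sums the binomial probabilities of having i = k..n units available.
    """
    a = unit_availability
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    single = 0.97  # assumed availability of one pump
    print(f"1 of 1: {k_out_of_n_availability(single, 1, 1):.4f}")
    print(f"1 of 2: {k_out_of_n_availability(single, 1, 2):.4f}")
    print(f"2 of 3: {k_out_of_n_availability(single, 2, 3):.4f}")
```

Because common-cause events (shared power, shared suction, shared lube system) are excluded, the figures from this calculation should be read as an upper bound.
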
V. Optimization Levers (Analytics, Maintenance, Debottlenecking)
- V.1 Data and analytics
- 5.1.1 Set up a historian with high-resolution tags on critical assets; calculate KPIs in near-real time.
- 5.1.2 Predictive models: anomaly detection on vibration spectra, ESP current signature, compressor surge proximity.
- 5.1.3 Weibull analysis of failures to optimize PM intervals (shape β > 1 indicates wear-out).
- V.2 Debottlenecking and control
- 5.2.1 Re-rate pump impellers, trim recycle valves, tune anti-surge PID to reduce hunting and trips.
- 5.2.2 APC/MPC for separators and trains to dampen disturbances and stay within limits.
- V.3 Maintenance strategy and TARs
- 5.3.1 Shift low-value PMs to on-condition; extend intervals with evidence from PdM data.
- 5.3.2 Risk-based inspection (RBI) for static equipment to reduce intrusive work while managing integrity.
- 5.3.3 Turnaround readiness index: scope freeze, materials readiness ≥ 95%, critical path float ≥ 10%.
- V.4 Parts and lifecycle
- 5.4.1 Standardize spares across sites; use interchangeable skids where possible.
- 5.4.2 Lifecycle cost optimization: compare rebuild vs replace using NPV of failure risk and efficiency gains.
- V.5 Quantifying business impact
- 5.5.1 Downtime cost per hour: \( C_d = Q \times P \times \pi \), where Q = production loss (BOE/h), P = price ($/BOE), \( \pi \) = netback fraction.
- 5.5.2 Prioritize actions by highest avoided \( C_d \) per invested dollar.
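
The prioritization in V.5 can be sketched as a simple ranking. This example assumes each candidate action comes with an estimate of avoided downtime hours; the `Action` fields, names, and all numeric inputs are hypothetical.

```python
from dataclasses import dataclass
from typing import List


def downtime_cost_per_hour(q_boe_h: float, price_usd_boe: float, netback: float) -> float:
    """C_d = Q x P x pi, in $/h."""
    return q_boe_h * price_usd_boe * netback


@dataclass
class Action:
    name: str
    avoided_downtime_h: float     # estimated hours of downtime avoided
    production_loss_boe_h: float  # Q for the affected train
    cost_usd: float               # investment required


def rank_actions(actions: List[Action], price_usd_boe: float, netback: float) -> List[Action]:
    """Sort actions by avoided downtime cost per invested dollar, highest first."""
    def benefit_per_dollar(action: Action) -> float:
        c_d = downtime_cost_per_hour(action.production_loss_boe_h, price_usd_boe, netback)
        return c_d * action.avoided_downtime_h / action.cost_usd

    return sorted(actions, key=benefit_per_dollar, reverse=True)


if __name__ == "__main__":
    candidates = [
        Action("Upgrade seal flush plan", 10, 500, 50_000),
        Action("Add online vibration monitoring", 40, 200, 100_000),
    ]
    for action in rank_actions(candidates, price_usd_boe=70, netback=0.6):
        print(action.name)
```
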
VI. Verification & Monitoring Plan
- VI.1 What to measure
- 6.1.1 Availability, MTBF, MTTR by asset class; OEE on bottlenecks.
- 6.1.2 Condition indices: vibration overall/spectrum KPIs, oil health, thermography exceptions.
- 6.1.3 Alarm/Trip KPI: spurious trip rate, stale alarm count, alarm flood occurrences.
- 6.1.4 Work management: PM compliance, schedule compliance, backlog health, wrench time.
- 6.1.5 Spares: stockouts, ROP adherence, lead-time variance.
- 6.1.6 Integrity: corrosion rate, thickness trends, leak frequency, proof-test success rate.
- VI.2 How often
- 6.2.1 Daily: critical alarms, asset health dashboard, production deferment log.
- 6.2.2 Weekly: bad-actor review, PM/PdM completion, spares status for A-critical assets.
- 6.2.3 Monthly: reliability scorecard, Weibull/RCFA updates, integrity KPI rollup.
- 6.2.4 Quarterly: RCM refresh on bad actors; proof tests of protection systems; MOC audit.
- 6.2.5 Annually: strategy benchmarking, TAR post-mortem, budget alignment to reliability risks.
- VI.3 Acceptance thresholds
- 6.3.1 Sustain A ≥ 97% and OEE ≥ 85% for three consecutive months.
- 6.3.2 Reduce top-10 bad-actor failures by ≥ 50% within two quarters.
- 6.3.3 PdM early-detection hit rate ≥ 70% (predicted vs. actual functional failures).
- VI.4 Feedback loop
- 6.4.1 Close RCFA actions in CMMS; verify reduced \( \lambda \) via MTBF trend improvement.
- 6.4.2 Update PM tasks/intervals using data; document changes via MOC.
- 6.4.3 Publish reliability learnings to all crews to reinforce precision practices.
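
The acceptance test in 6.3.1 can be automated against the monthly scorecard. The sketch below assumes both thresholds are minimums (A of at least 97%, OEE of at least 85%) and that "three consecutive months" means the three most recent months; the function names and sample series are hypothetical.

```python
from typing import Sequence


def sustained(monthly_values: Sequence[float], threshold: float, months: int = 3) -> bool:
    """True if the most recent `months` values are all at or above the threshold."""
    recent = list(monthly_values)[-months:]
    return len(recent) == months and all(value >= threshold for value in recent)


def acceptance_met(availability_by_month: Sequence[float], oee_by_month: Sequence[float]) -> bool:
    """Check the availability and OEE acceptance thresholds together."""
    return sustained(availability_by_month, 0.97) and sustained(oee_by_month, 0.85)


if __name__ == "__main__":
    availability_series = [0.960, 0.975, 0.980, 0.971]  # oldest to newest
    oee_series = [0.84, 0.86, 0.85, 0.87]
    print(acceptance_met(availability_series, oee_series))
```

With fewer than three months of data the check returns `False`, so a newly commissioned asset is not reported as meeting the threshold prematurely.
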

