import numpy as np
import matplotlib.pyplot as plt
# Sample DMU data: Inputs (lower is better), Outputs (higher is better)
inputs = np.array([8, 6, 5, 7, 9, 4])
outputs = np.array([40, 38, 36, 30, 35, 34])
names = ['DMU1', 'DMU2', 'DMU3', 'DMU4', 'DMU5', 'DMU6']
# Plot DMUs
plt.figure(figsize=(8, 6))
plt.scatter(inputs, outputs, color='blue', label='DMUs')
# Annotate DMUs
for i, name in enumerate(names):
    plt.annotate(name, (inputs[i] + 0.1, outputs[i]))
# Efficient frontier (manually connecting the efficient DMUs)
# Let's assume DMU6, DMU3, and DMU1 are efficient
frontier_inputs = [4, 5, 8]
frontier_outputs = [34, 36, 40]
plt.plot(
    frontier_inputs,
    frontier_outputs,
    color='green',
    linestyle='--',
    linewidth=2,
    label='Efficient Frontier')
plt.title('DEA Efficient Frontier Example')
plt.xlabel('Input (lower is better)')
plt.ylabel('Output (higher is better)')
plt.legend()
plt.grid(True)
plt.show()
Efficiency measurement is about quantifying how well an entity, such as a company, department, or individual, uses its resources (inputs) to achieve its goals (outputs). The aim is to maximize outputs while minimizing inputs, ensuring optimal use of available resources.
Data Envelopment Analysis (DEA) is a non-parametric mathematical method used to measure the relative efficiency of a set of similar decision-making units (DMUs). DEA evaluates each DMU by comparing its inputs and outputs, using a ratio of the weighted sum of outputs to the weighted sum of inputs. This makes it ideal for assessing efficiency when multiple inputs and outputs are involved.
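To make the ratio concrete, here is a minimal Python sketch of the weighted-output over weighted-input calculation. The function name `ratio_efficiency`, the data, and the weights are all hypothetical. In actual DEA the weights are not fixed in advance: they are chosen separately for each DMU, as favourably as possible, subject to no DMU scoring above 1.

```python
def ratio_efficiency(inputs, outputs, input_weights, output_weights):
    """Weighted sum of outputs divided by weighted sum of inputs."""
    weighted_outputs = sum(u * y for u, y in zip(output_weights, outputs))
    weighted_inputs = sum(v * x for v, x in zip(input_weights, inputs))
    return weighted_outputs / weighted_inputs


# Hypothetical unit with two inputs and one output (illustrative numbers only)
score = ratio_efficiency(
    inputs=[2, 4],
    outputs=[12],
    input_weights=[1.0, 0.5],
    output_weights=[0.25],
)
print(score)  # 0.75
```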
DEA has numerous real-world applications, including:
- Healthcare: Evaluating hospitals based on inputs like the number of doctors, nurses, and beds, versus outputs such as the number of patients treated or survival rates, with the goal of identifying underperforming facilities
- Banking: Assessing bank branches by comparing inputs like staff size and operating costs against outputs such as new accounts opened or loans processed, aiming to optimize branch operations
- Education: Measuring schools’ efficiency by comparing inputs like teachers, classroom space, and funding against outputs like test scores and graduation rates, to improve educational outcomes and resource allocation
- Energy Sector: Analyzing the efficiency of power plants by comparing fuel and labor inputs to energy output and environmental impact, with the goal of minimizing waste and enhancing sustainability
- Public Sector: Evaluating local governments or public services (e.g., waste management, policing) to ensure effective and efficient use of public funds
Types of DEA Models
There are several DEA models, each suited to different scenarios:
- The CCR model (named after Charnes, Cooper, and Rhodes) assumes constant returns to scale: a proportional increase in inputs results in a proportional increase in outputs. It is often used as a baseline for efficiency comparisons
- The BCC model (named after Banker, Charnes, and Cooper) assumes variable returns to scale: outputs do not necessarily change proportionally with inputs, so it accommodates increasing, constant, or decreasing returns to scale
DEA Orientations
DEA models can also be classified by their orientation:
- Input-oriented models focus on minimizing the inputs required to produce a given level of outputs
- Output-oriented models aim to maximize the outputs achieved with a given set of inputs
- Non-oriented models are used when both input reductions and output increases are desired simultaneously
Other advanced DEA variants include Two-Stage DEA, Slack-Based Measure (SBM), Network DEA, Super-Efficiency DEA, Fuzzy DEA, and Stochastic DEA, each addressing specific analytical needs.
Basic DEA Model: CCR Input-Oriented
The basic DEA model (CCR, input-oriented) minimizes input usage while maintaining at least the current level of outputs, under the assumption of constant returns to scale. For each decision-making unit (DMU), it asks how far all inputs could be reduced proportionally without reducing any output, given what the other DMUs, or combinations of them, achieve with their own inputs. The model outputs an efficiency score ranging from 0 to 1:
- A score of 1.0 signifies full efficiency.
- A score below 1 indicates potential for improvement.
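The model described above is commonly written as a linear program solved once per DMU. The formulation below is the standard envelopment form, given here as a sketch. The notation is an assumption, not taken from this article: $x_{ij}$ is input $i$ of DMU $j$, $y_{rj}$ is output $r$ of DMU $j$, $o$ is the DMU being evaluated, and $\lambda_j$ are the weights that build its reference unit from the observed DMUs.

```latex
\begin{aligned}
\min_{\theta,\,\lambda}\quad & \theta \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, && i = 1,\dots,m \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, && r = 1,\dots,s \\
& \lambda_j \ge 0, && j = 1,\dots,n
\end{aligned}
```

The optimal $\theta$ is the efficiency score. A value of 1 means no combination of peers can produce DMU $o$'s outputs with proportionally fewer inputs.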
Visualizing DEA: The Efficient Frontier
A graphical illustration can help visualize DEA by plotting DMUs with input on one axis and output on the other. The efficient frontier is formed by connecting the most efficient DMUs, representing those producing the most output for the least input. DMUs below the efficient frontier are considered inefficient as they use more input than necessary for their output level. This visualization helps identify best and lagging units.
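With a single input and a single output, and assuming constant returns to scale, each DMU's score reduces to its output/input ratio divided by the best ratio in the group. The sketch below applies that shortcut to the sample data from the plotting code. `crs_efficiency` is a hypothetical helper name, and the shortcut does not extend to multiple inputs or outputs. This constant-returns check is stricter than the hand-drawn frontier in the plot, which simply assumes three efficient units.

```python
# Sample data from the plotting example in this article
names = ["DMU1", "DMU2", "DMU3", "DMU4", "DMU5", "DMU6"]
inputs = [8, 6, 5, 7, 9, 4]
outputs = [40, 38, 36, 30, 35, 34]


def crs_efficiency(inputs, outputs):
    """Single-input, single-output CRS score: own ratio / best ratio."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


scores = crs_efficiency(inputs, outputs)
for name, score in zip(names, scores):
    print(f"{name}: {score:.3f}")
```

Under this assumption only the unit with the highest output per unit of input scores 1. Every other unit is measured against it.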
Practical DEA Tools: Benchmarking with R vs. Python
While pyDEA is an open-source Python package designed for conducting DEA, in practice, it has significant limitations. Although it supports standard models like CCR and BCC and accepts CSV input, pyDEA is cumbersome to set up, lacks flexibility for interactive analysis, and is difficult to integrate into practical workflows.
R offers a more mature and practical ecosystem for DEA through packages like Benchmarking. The Benchmarking package is widely used in academia and industry for DEA and provides:
- Support for both CCR and BCC models
- Flexible handling of input- and output-oriented approaches
- Functions to compute efficiency scores, reference sets, and slacks
- Simple integration with data frames and visualization tools
- Ease of scripting and automation for batch efficiency evaluations
Because of its robustness, ease of use, and flexibility, Benchmarking with R is generally considered the best practical tool for efficiency measurement using DEA. It is especially well-suited for professionals and researchers who need accurate, transparent, and reproducible results.
Examples
Hospital Department Efficiency
You are analyzing the performance of 4 hospital departments. Each department has a different combination of medical staff and beds. You want to know: Which department is most efficient in treating patients with limited resources?
Inputs:
- Doctors: [40, 35, 45, 38]
- Nurses: [80, 70, 85, 75]
- Beds: [150, 140, 160, 145]
Outputs:
- Patients treated: [2000, 1900, 2100, 1950]
- Survival rate: [0.97, 0.96, 0.98, 0.95]
Answer
%%R
# install.packages("Benchmarking")
library(Benchmarking)
library(knitr)
# Function to create efficiency table
generate_efficiency_table <- function(names, scores, unit_label = "Unit") {
  # Calculate interpretation text
  interpretation <- ifelse(
    scores == 1,
    "Fully efficient (benchmark unit)",
    paste0("Can reduce inputs by ~", round((1 - scores) * 100, 1), "%")
  )
  # Create dataframe
  df <- data.frame(
    unit_label = names,
    Efficiency_Score = round(scores, 3),
    Interpretation = interpretation,
    stringsAsFactors = FALSE
  )
  colnames(df)[1] <- unit_label # Dynamic column name
  # Print the dataframe
  kable(df)
}
%%R
# Inputs: Doctors, Nurses, Beds
inputs <- matrix(
  c(40, 80, 150,
    35, 70, 140,
    45, 85, 160,
    38, 75, 145),
  nrow = 4,
  byrow = TRUE
)
# Outputs: Patients treated, Survival rate
outputs <- matrix(
  c(2000, 0.97,
    1900, 0.96,
    2100, 0.98,
    1950, 0.95),
  nrow = 4,
  byrow = TRUE
)
# Run DEA (input-oriented, constant returns to scale)
dea_result <- dea(X = inputs, Y = outputs, RTS = "crs", ORIENTATION = "in")
# Print efficiency scores
print(dea_result$eff)
names <- c("Dept 1", "Dept 2", "Dept 3", "Dept 4")
# Generate table
generate_efficiency_table(names, dea_result$eff, "Department")
[1] 0.9824561 1.0000000 0.9671053 0.9909256
|Department | Efficiency_Score|Interpretation |
|:----------|----------------:|:--------------------------------|
|Dept 1 | 0.982|Can reduce inputs by ~1.8% |
|Dept 2 | 1.000|Fully efficient (benchmark unit) |
|Dept 3 | 0.967|Can reduce inputs by ~3.3% |
|Dept 4 | 0.991|Can reduce inputs by ~0.9% |
Department 2 is the benchmark performer, operating fully efficiently by maximizing patient outcomes with its available staff and beds. In contrast, Department 3 is the least efficient, with potential to reduce its resource use by about 3.3% while maintaining the same output. Department 3 should be reviewed more closely to check whether it is overstaffed, underperforming, or facing operational issues. Departments 1 and 4 are close to efficiency but still have minor room for optimization.
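For readers who want to see the kind of problem being solved behind these scores, here is a rough Python sketch of the input-oriented, constant-returns model as a linear program, using SciPy's `linprog`. It assumes NumPy and SciPy are installed. `ccr_input_efficiency` is a hypothetical helper name, and this is not the Benchmarking package's own implementation. On the hospital data it should land close to the scores printed above.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_input_efficiency(X, Y):
    """Input-oriented CRS efficiency per DMU (rows = DMUs)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.zeros(n + 1)
        c[0] = 1.0  # minimise theta
        # Inputs: sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o].reshape(-1, 1), X.T])
        # Outputs: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        res = linprog(
            c,
            A_ub=np.vstack([A_in, A_out]),
            b_ub=np.concatenate([np.zeros(m), -Y[o]]),
            bounds=[(0, None)] * (n + 1),
            method="highs",
        )
        scores.append(res.fun)
    return np.array(scores)


# Hospital example: Doctors, Nurses, Beds -> Patients treated, Survival rate
hospital_inputs = [[40, 80, 150], [35, 70, 140], [45, 85, 160], [38, 75, 145]]
hospital_outputs = [[2000, 0.97], [1900, 0.96], [2100, 0.98], [1950, 0.95]]
hospital_scores = ccr_input_efficiency(hospital_inputs, hospital_outputs)
print(np.round(hospital_scores, 3))
```

Solving one small linear program per DMU keeps the sketch short. A dedicated package additionally reports reference sets and slacks, which this sketch omits.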
Retail Store Efficiency
A retail company wants to assess whether its 4 branches are efficiently turning staff and marketing investment into revenue and customers. Which store branch is the best performer given similar resource levels?
Inputs:
- Marketing Spend (USD): [$12,000, $10,000, $11,500, $9,800]
- Employees: [20, 18, 22, 19]
Outputs:
- Monthly Revenue (USD): [$110,000, $100,000, $120,000, $95,000]
- Number of Customers: [2300, 2000, 2600, 1900]
Answer
%%R
# Inputs: Marketing Spend (USD), Employees
inputs <- matrix(
  c(12000, 20,
    10000, 18,
    11500, 22,
    9800, 19),
  nrow = 4,
  byrow = TRUE
)
# Outputs: Monthly Revenue (USD), Number of Customers
outputs <- matrix(
  c(110000, 2300,
    100000, 2000,
    120000, 2600,
    95000, 1900),
  nrow = 4,
  byrow = TRUE
)
# Run DEA (input-oriented, constant returns to scale)
dea_result <- dea(X = inputs, Y = outputs, RTS = "crs", ORIENTATION = "in")
# Print efficiency scores
print(dea_result$eff)
names <- c("Store 1", "Store 2", "Store 3", "Store 4")
# Generate table
generate_efficiency_table(names, dea_result$eff, "Branch/Store")
[1] 1.0000000 1.0000000 1.0000000 0.9289966
|Branch/Store | Efficiency_Score|Interpretation |
|:------------|----------------:|:--------------------------------|
|Store 1 | 1.000|Fully efficient (benchmark unit) |
|Store 2 | 1.000|Fully efficient (benchmark unit) |
|Store 3 | 1.000|Fully efficient (benchmark unit) |
|Store 4 | 0.929|Can reduce inputs by ~7.1% |
Stores 1, 2, and 3 are fully efficient, effectively turning their marketing spend and staff into revenue and customer traffic. In contrast, Store 4 is underperforming, with efficiency at 92.9%: to match the top performers it would need to cut resource use by about 7.1% at its current output or, equivalently under constant returns to scale, raise revenue and customers by about 7.6% with its current resources. Store 4 requires operational review to identify and address gaps in marketing effectiveness or staff productivity.
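Under constant returns to scale, an input-oriented score can also be read from the output side: the proportional output expansion factor is the reciprocal of the score. The short sketch below does that arithmetic for Store 4. `output_expansion_factor` is a hypothetical helper name, and the reciprocal relationship should not be assumed for variable-returns models.

```python
def output_expansion_factor(input_score):
    """Reciprocal of an input-oriented CRS score (valid under CRS only)."""
    if not 0 < input_score <= 1:
        raise ValueError("input-oriented score must be in (0, 1]")
    return 1.0 / input_score


store4_score = 0.9289966  # Store 4's score from the results above
input_cut_pct = (1 - store4_score) * 100
output_gain_pct = (output_expansion_factor(store4_score) - 1) * 100
print(f"Cut inputs by about {input_cut_pct:.1f}%")
print(f"Or raise outputs by about {output_gain_pct:.1f}%")
```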
School Performance Evaluation
A school district monitors 4 schools to find out which one is using its resources most efficiently to produce better academic results.
Inputs:
- Teachers: [30, 25, 20, 28]
- Annual Budget (USD): [$800,000, $750,000, $700,000, $770,000]
Outputs:
- Graduation Rate (%): [85, 90, 82, 88]
- Average Test Score (out of 100): [78, 80, 76, 79]
Answer
%%R
# Inputs: Teachers, Annual Budget (USD)
inputs <- matrix(
  c(30, 800000,
    25, 750000,
    20, 700000,
    28, 770000),
  nrow = 4,
  byrow = TRUE
)
# Outputs: Graduation Rate (%), Average Test Score
outputs <- matrix(
  c(85, 78,
    90, 80,
    82, 76,
    88, 79),
  nrow = 4,
  byrow = TRUE
)
# Run DEA (input-oriented, constant returns to scale)
dea_result <- dea(X = inputs, Y = outputs, RTS = "crs", ORIENTATION = "in")
# Print efficiency scores
print(dea_result$eff)
names <- c("School 1", "School 2", "School 3", "School 4")
# Generate table
generate_efficiency_table(names, dea_result$eff, "School")
[1] 0.9017857 1.0000000 1.0000000 0.9577922
|School | Efficiency_Score|Interpretation |
|:--------|----------------:|:--------------------------------|
|School 1 | 0.902|Can reduce inputs by ~9.8% |
|School 2 | 1.000|Fully efficient (benchmark unit) |
|School 3 | 1.000|Fully efficient (benchmark unit) |
|School 4 | 0.958|Can reduce inputs by ~4.2% |
Schools 2 and 3 are fully efficient, effectively using their teachers and budgets to achieve strong graduation rates and test scores. School 4 is close to efficient but has room to improve by about 4.2%. School 1 is the least efficient, with potential to optimize resources by nearly 10%. School 1 should be prioritized for review and targeted improvements because it either uses more resources than needed or isn’t getting the best academic outcomes relative to its peers.
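An input-oriented score translates directly into radial input targets: multiply each current input by the score. The sketch below does this for School 1 using the score reported above. `radial_input_targets` is a hypothetical helper name. The targets ignore any additional input-specific slack that a full DEA solution might report, so treat them as a first approximation.

```python
def radial_input_targets(inputs, score):
    """Scale every input by the efficiency score (radial projection)."""
    return [score * x for x in inputs]


school1_inputs = {"Teachers": 30, "Annual budget (USD)": 800_000}
school1_score = 0.9017857  # School 1's score from the results above

targets = radial_input_targets(school1_inputs.values(), school1_score)
for (label, current), target in zip(school1_inputs.items(), targets):
    print(f"{label}: {current:,} -> {target:,.1f}")
```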
Citation
@misc{kabui2025,
author = {{Kabui, Charles}},
title = {Efficiency {Measurement} and {Data} {Envelopment} {Analysis}
{(DEA)}},
date = {2025-05-08},
url = {https://toknow.ai/posts/computational-techniques-in-data-science/data-envelopment-analysis/index.html},
langid = {en-GB}
}