1. Modeling Uncertainty with Probability Distributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. contents:: On this page
   :local:

1.1 Overview
~~~~~~~~~~~~

In real-world projects, durations are rarely fixed. Equipment speed varies, 
workers perform differently, and environmental factors create unpredictability. 
This tutorial shows how to model uncertainty using **probability distributions**, 
turning deterministic simulations into **stochastic** ones that reflect real-world 
variability.

1.2 Why Uncertainty Matters
~~~~~~~~~~~~~~~~~~~~~~~~~~~

A deterministic model (fixed durations) tells you "how long if everything 
goes perfectly." A stochastic model (random durations) shows you:

- **Best case, worst case, and likely case** scenarios
- **Risk assessment**: What's the probability of exceeding a deadline?
- **Buffer planning**: How much contingency time do we really need?
- **Sensitivity analysis**: Which operation's variability matters most?

Without modeling uncertainty, you miss critical risks.

1.3 Available Distributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~

SimPM includes several probability distributions. Each is suited to different 
types of operational variability:

.. code-block:: python

   from simpm.dist import norm, uniform, triang, expon, beta

   # Normal (Gaussian): Symmetric variation around a mean
   # Use for: Loading times, travel times, processing durations
   # Example: Most loads take ~7 min, rarely outside 5-9 min
   loading_dist = norm(mean=7, std=1.5)

   # Uniform: Any value in range equally likely
   # Use for: Bounded ranges with no preferred value
   # Example: Dumping takes 2-4 minutes, any time equally likely
   dumping_dist = uniform(a=2, b=4)

   # Triangular: Min, max, and mode (peak) all specified
   # Use for: Expert estimates with "most likely" value
   # Example: Optimistic 5 min, pessimistic 15 min, likely 8 min
   return_dist = triang(5, 8, b=15)

   # Exponential: Right-skewed, long tail
   # Use for: Random event times (equipment failures, arrivals)
   # Example: ~50 min average, occasional much longer
   failure_dist = expon(50)

**Which distribution to choose?**

- **Normal**: When variation is symmetric (natural randomness)
- **Uniform**: When min/max are hard bounds, no preference inside
- **Triangular**: When you have optimistic, pessimistic, and best-guess estimates
- **Exponential**: When modeling rare but long delays

1.4 Earthmoving with Uncertainty
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's extend the earthmoving scenario with realistic variability. Here's 
the basic idea:

.. code-block:: python

   import simpm
   import simpm.des as des
   from simpm.dist import norm, uniform
   import numpy as np

   # Set seed for reproducible randomness
   np.random.seed(42)

   env = des.Environment("Earthmoving with uncertainty")
   loader = des.Resource(env, "loader", init=1, capacity=1, log=True)
   dumped_dirt = des.Resource(env, "dirt", init=0, capacity=100000, log=True)

   # Create trucks with size attributes
   trucks = env.create_entities("truck", 3, log=True)
   for t, size in zip(trucks, [30, 35, 50]):
       t["size"] = size

   def truck_cycle(truck: des.Entity, loader: des.Resource, dumped_dirt: des.Resource):
       while True:
           # Loading duration varies with truck size (larger trucks take longer to load)
           # We model this as a normal distribution centered on 5 + size/20
           yield truck.get(loader, 1)
           loading_mean = 5 + truck["size"] / 20  # Expected loading time
           loading_dist = norm(loading_mean, std=0.5)  # ±0.5 min variation
           yield truck.do("loading", loading_dist.sample())
           yield truck.put(loader, 1)

           # Hauling time varies due to traffic, weather, and road conditions
           # Model as normal: mean 17 minutes, std 4 minutes (high variability)
           yield truck.do("hauling", norm(17, 4))

           # Dumping time depends on site conditions (equipment availability, queue)
           # Model as uniform: anywhere from 2 to 5 minutes equally likely
           yield truck.do("dumping", uniform(2, 5))
           yield truck.add(dumped_dirt, truck["size"])

           # Return trip also has variation (traffic, driver difference)
           # Model as normal: mean 13 minutes, std 3 minutes
           yield truck.do("return", norm(13, 3))

   # Create and run the simulation
   for truck in trucks:
       env.process(truck_cycle(truck, loader, dumped_dirt))

   simpm.run(env, dashboard=False, until=480)

   # Print results
   print(f"Project completed in {env.now:.2f} minutes")
   print(f"Total dirt moved: {dumped_dirt.level():.0f} m³")

Key Concepts
~~~~~~~~~~~~

1.4.1 **Distribution parameters** – Each distribution has parameters defining its shape:
   
   - ``norm(mean, std)`` – mean is the center, std controls spread
   - ``uniform(a, b)`` – a and b are the min and max bounds
   - ``triang(a, c, b)`` – a/b are bounds, c is the peak (mode)

1.4.2 **Sampling via do()** – In SimPM, sampling happens automatically when you 
   pass a distribution object to ``entity.do()``. Each time ``do()`` executes 
   (in each simulation replicate), it draws a fresh random sample from that 
   distribution. You don't need to call ``.sample()`` explicitly – just pass 
   the distribution object directly to ``do()``.

1.4.3 **Seeding for reproducibility** – Call ``np.random.seed(42)`` at the start 
   of your simulation. This ensures the same "random" sequence every time, 
   making your results repeatable while still showing variability.

1.4.4 **Stochastic vs. deterministic** – With a seed, results are replicable but 
   vary from the fixed-duration model. Running the same simulation again 
   produces identical output (same seed). Running with a different seed 
   produces different but realistic variability.

1.5 Comparison: Deterministic vs. Stochastic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's compare the same earthmoving project with and without uncertainty:

**Deterministic Model** (Fixed Durations):

.. code-block:: python

   # All durations fixed
   yield truck.do("loading", 6.5)      # Always 6.5 minutes
   yield truck.do("hauling", 17)       # Always 17 minutes
   yield truck.do("dumping", 3)        # Always 3 minutes
   yield truck.do("return", 13)        # Always 13 minutes

   # Run 1: 37 cycles, 1,330 m³
   # Run 2: 37 cycles, 1,330 m³ (identical)
   # Run 3: 37 cycles, 1,330 m³ (identical)

**Stochastic Model** (Random Durations):

.. code-block:: python

   # All durations vary based on distributions
   # SimPM's do() method accepts distribution objects directly
   # and samples internally on each replicate run
   yield truck.do("loading", norm(6.5, 0.5))  # 5.5-7.5 typical
   yield truck.do("hauling", norm(17, 4))     # 13-21 typical (higher variability)
   yield truck.do("dumping", uniform(2, 5))   # 2-5 equally likely
   yield truck.do("return", norm(13, 3))      # 10-16 typical (higher variability)

   # Run 1: 37 cycles, 1,330 m³ (with seed=42)
   # Run 2: 37 cycles, 1,330 m³ (identical with same seed)
   # Run 3: 36 cycles, 1,295 m³ (different with seed=99)

**Key Pattern: Direct Distribution Objects**

Note: In SimPM, you pass **distribution objects directly** to ``do()``, 
not pre-sampled values. The ``do()`` method automatically samples from the 
distribution on each simulation run:

.. code-block:: python

   # CORRECT: Pass the distribution object directly
   yield truck.do("hauling", norm(17, 4))     # SimPM samples internally

   # NOT: Don't pre-sample before passing
   # yield truck.do("hauling", norm(17, 4).sample())  # Inefficient

This approach is elegant because:
- Distribution definition is clear and concise
- Each replicate gets different random samples automatically
- Works seamlessly with ``simpm.run(factory, number_runs=10)``

**Key Differences:**

+---------------------------+--------------------+--------------------+
| Aspect                    | Deterministic      | Stochastic         |
+===========================+====================+====================+
| Durations                 | Fixed (6.5 min)    | Random (5.5-7.5)   |
+---------------------------+--------------------+--------------------+
| Repeatability             | Always same        | Same with same seed|
+---------------------------+--------------------+--------------------+
| Realism                   | Simplified         | Realistic          |
+---------------------------+--------------------+--------------------+
| Worst-case planning       | Not captured       | Can measure        |
+---------------------------+--------------------+--------------------+
| Risk assessment           | Single estimate    | Range of outcomes  |
+---------------------------+--------------------+--------------------+

1.7 Monte Carlo: Running Multiple Scenarios
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One seed gives you one random scenario. To understand the full range of 
possible outcomes, run the simulation many times with different seeds.

**Modern Approach with simpm.run():**

SimPM provides a factory pattern for elegantly running Monte Carlo experiments. 
Define an environment factory function and let simpm.run() handle multiple 
replicates:

.. code-block:: python

   import simpm
   import simpm.des as des
   from simpm.dist import norm, uniform
   import statistics

   # Global variable for truck count
   _CURRENT_TRUCK_COUNT = 3

   def env_factory() -> des.Environment:
       """Create a fresh earthmoving environment."""
       global _CURRENT_TRUCK_COUNT
       
       env = des.Environment(f"Earthmoving ({_CURRENT_TRUCK_COUNT} trucks)")
       loader = des.Resource(env, "loader", init=1, capacity=1, log=True)
       dumped_dirt = des.Resource(env, "dirt", init=0, capacity=100000, log=True)
       
       trucks = env.create_entities("truck", _CURRENT_TRUCK_COUNT, log=True)
       for i, truck in enumerate(trucks):
           truck["size"] = [30, 35, 50][i % 3]
       
       # Set up truck processes with probabilistic durations
       for truck in trucks:
           env.process(truck_cycle(truck, loader, dumped_dirt))
       
       return env

   # Run 10 replicates with 5 trucks
   _CURRENT_TRUCK_COUNT = 5
   all_envs = simpm.run(env_factory, dashboard=False, number_runs=10)
   
   # Extract completion times
   durations = [env.now for env in all_envs]
   
   # Analyze results
   print(f"Mean duration: {statistics.mean(durations):.2f} minutes")
   print(f"Std deviation: {statistics.stdev(durations):.2f} minutes")
   print(f"Min: {min(durations):.2f}, Max: {max(durations):.2f}")

This **Monte Carlo** approach reveals:

- **Best case** – Maximum cycles, minimum time
- **Worst case** – Minimum cycles, maximum time
- **Average case** – Mean across all runs
- **Confidence intervals** – E.g., "90% of runs complete in X minutes or less"

1.8 Real-World Example: Earthmoving with Variable Fleet Size
------------------------------------------------------------

A practical example combining all concepts: **earthmoving_monte_carlo.py** 
runs 10 replicates for each truck configuration (3, 5, 7, 10 trucks), collecting 
actual project completion times from each run.

**Results from Monte Carlo Analysis (10 replicates per configuration):**

With **actual distributions** norm(5+size/20, 0.5-0.8) for loading, 
norm(17, 4) for hauling, uniform(2, 5) for dumping, and norm(13, 3) for return:

+----------+---------------+----------+-----------+---------------------+
| Trucks   | Mean (hours)  | Std Dev  | Min (hrs) | Risk > 300 hours    |
+==========+===============+==========+===========+=====================+
| 3        | 593.14        | 1.50 min | 590.81    | 100% (all 10 runs)  |
+----------+---------------+----------+-----------+---------------------+
| 5        | 393.41        | 24.35min | 392.65    | 100% (all 10 runs)  |
+----------+---------------+----------+-----------+---------------------+
| 7        | 297.85        | 26.11min | 297.41    | 0% (all runs < 300) |
+----------+---------------+----------+-----------+---------------------+
| 10       | 290.31        | 21.06min | 289.60    | 0% (all runs < 300) |
+----------+---------------+----------+-----------+---------------------+

**Key Insights:**

1. **Clear Decision Point** – The 300-hour target creates meaningful risk differentiation:
   - 3 trucks: 100% overrun (all 10 replicates exceed 300 hours) – **not viable**
   - 5 trucks: 100% overrun (all 10 replicates exceed 300 hours) – **not viable**
   - 7 trucks: 0% overrun risk (all 10 replicates complete within 300 hours) – **viable**
   - 10 trucks: 0% overrun risk (all 10 replicates complete within 300 hours) – **viable**

2. **Variability Pattern** – Interestingly, more trucks doesn't always mean lower variability 
   in this case. With high variability in hauling (std=4) and return (std=3), the pattern 
   is more complex than with low-uncertainty models.

3. **Efficiency Gains Diminish** – Adding trucks shows decreasing marginal benefit:
   - 3→5 trucks: saves 199.73 hours (33.7% reduction)
   - 5→7 trucks: saves 95.56 hours (24.3% reduction)
   - 7→10 trucks: saves 7.54 hours (2.5% reduction)
   The first jump (3→5 trucks) provides the biggest scheduling improvement.

4. **Distribution Impact** – Using realistic, high-variability distributions 
   (norm(17, 4), uniform(2, 5), norm(13, 3)) creates meaningful uncertainty that 
   deterministic models miss. These values reflect real-world earthmoving conditions 
   with traffic, weather, and operational variability.

**Running the Example:**

See the complete working implementation in ``example/earthmoving_monte_carlo.py``:

.. code-block:: bash

   python example/earthmoving_monte_carlo.py

The script:
- Defines ``env_factory()`` to create fresh environments with variable truck counts
- Uses ``simpm.run(env_factory, number_runs=10)`` for each configuration
- Extracts completion times directly from ``env.now``
- Calculates percentiles, confidence intervals, and risk assessments
- Compares configurations to guide fleet sizing decisions

1.9 Real-World Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Project Planning:**
  Use Monte Carlo to determine realistic project duration with contingency buffer.
  If 90% of stochastic runs complete in 310 hours, plan for 330 hours (10% buffer).

**Resource Optimization:**
  Run stochastic simulations with different fleet sizes. What fleet minimizes 
  cost while meeting reliability targets?

**Risk Management:**
  Identify which operations' variability drives project risk. Invest in reducing 
  that variability (better equipment, training, etc.).

**Decision Support:**
  Compare alternatives (different loaders, truck sizes, schedules) using 
  stochastic simulation. Make decisions based on confidence intervals, not 
  just averages.

1.10 Complete Working Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See these complete, documented examples in the ``example/`` folder:

- **earthmoving_with_truck_sizes.py** – Deterministic with attributes (fixed durations)
- **earthmoving_probabilistic.py** – Stochastic single run with factory pattern (10 replicates)
- **earthmoving_monte_carlo.py** – Monte Carlo analysis across fleet configurations

**Key Pattern Used:**

All examples use the **factory pattern** with ``simpm.run()``::

   def env_factory() -> des.Environment:
       # Create fresh environment with current truck count
       env = des.Environment(...)
       # ... set up resources and processes ...
       return env

   all_envs = simpm.run(env_factory, number_runs=10)
   durations = [env.now for env in all_envs]

**Distributions in All Probabilistic Examples:**

- Loading: ``norm(5 + size/20, 0.5 + 0.1*(i%3))`` – varies by truck size
- Hauling: ``norm(17, 4)`` – high variability (traffic, conditions)
- Dumping: ``uniform(2, 5)`` – site conditions uncertain
- Return: ``norm(13, 3)`` – traffic and route variability

**Important:** Distributions are passed **directly** to ``do()``, not sampled::

   yield truck.do("loading", loading_dists[truck_id])  # Correct
   # NOT: yield truck.do("loading", loading_dists[truck_id].sample())

Run them yourself:

.. code-block:: bash

   python example/earthmoving_with_truck_sizes.py
   python example/earthmoving_probabilistic.py
   python example/earthmoving_monte_carlo.py

Then modify them:

- Change distribution parameters (e.g., ``norm(7, 1.5)`` to ``norm(7, 3)``)
- Try different truck counts
- Adjust the risk threshold for different project targets
- Experiment with different truck sizes and loaders

1.11 Summary
~~~~~~~~~~~~

Probability distributions transform simulations from "best case" to "real world":

1. **Use distributions** to model realistic variability in durations
2. **Choose appropriate distribution** for each operation (normal, uniform, etc.)
3. **Set random seed** (``np.random.seed``) for reproducible randomness
4. **Run Monte Carlo** (multiple runs with different seeds) to understand outcomes
5. **Use results for planning** – Confidence intervals, risk assessment, optimization

1.12 Next Steps
~~~~~~~~~~~~~~~

- Read the :doc:`hello-simpm` tutorial for basic concepts and entity attributes
- Explore the complete probabilistic examples in ``example/``
- Try adding uncertainty to your own simulations
- See also the :doc:`dashboard-guide` for analyzing simulation results visually