Internship: Machine Learning Basics

Note

In this explanation, I use the Ride Dynamic Pricing dataset from Kaggle as an example to illustrate each step of the machine learning process. The dataset is available at:
https://www.kaggle.com/datasets/arashnic/dynamic-pricing-dataset

The dataset includes columns with information such as Number_of_Riders, Number_of_Drivers, Location_Category, Customer_Loyalty_Status, Number_of_Past_Rides, Average_Ratings, Time_of_Booking, Vehicle_Type, Expected_Ride_Duration, and Historical_Cost_of_Ride.

Here are the first five entries of the dataset:

Number_of_Riders  Number_of_Drivers  Location_Category  Customer_Loyalty_Status  Number_of_Past_Rides  Average_Ratings  Time_of_Booking  Vehicle_Type  Expected_Ride_Duration  Historical_Cost_of_Ride
90                45                 Urban              Silver                   13                    4.47             Night            Premium       90                      284.26
58                39                 Suburban           Silver                   72                    4.06             Evening          Economy       43                      173.87
42                31                 Rural              Silver                   0                     3.99             Afternoon        Premium       76                      329.80
89                28                 Rural              Regular                  67                    4.31             Afternoon        Premium       134                     470.20
78                22                 Rural              Regular                  74                    3.77             Afternoon        Economy       149                     579.68

This dataset describes the features associated with ride pricing and is used here to demonstrate a machine learning workflow focused on dynamic pricing predictions.

Overview

Step 1: Collect and Prepare the Data


Step 2: Exploratory Data Analysis (EDA)


Step 3: Define the Target for Prediction


Step 4: Split the Data (Train-Test Split)


Step 5: Choose and Train Multiple Models


Step 6: Evaluate and Choose the Best Model
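
The six steps above can be sketched end to end in a few lines. This is only a minimal sketch under stated assumptions, not the workflow used on the real dataset: it generates synthetic rows instead of reading the Kaggle CSV, uses a single feature (ride duration), and relies only on the Python standard library so that it runs anywhere. The helper names (fit_mean, fit_line, mean_absolute_error) and the cost relationship of 3.5 per minute are made up for illustration. In practice a library such as scikit-learn would usually handle the split and the models.

```python
import random

random.seed(0)

# Step 1: collect and prepare the data.
# Synthetic stand-in rows (NOT the Kaggle data): cost grows roughly
# linearly with expected ride duration, plus random noise.
rows = []
for _ in range(200):
    duration = random.uniform(10, 180)            # minutes
    cost = 3.5 * duration + random.gauss(0, 20)   # hypothetical relationship
    rows.append((duration, cost))

# Step 2: a very small piece of EDA, just the range of each column.
durations = [d for d, _ in rows]
costs = [c for _, c in rows]
print("duration range:", round(min(durations), 1), "to", round(max(durations), 1))
print("cost range:", round(min(costs), 1), "to", round(max(costs), 1))

# Step 3: the target is the cost; the single feature is the duration.
# Step 4: shuffle, then split 80/20 into training and test sets.
random.shuffle(rows)
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]


def fit_mean(train_rows):
    """Baseline model: always predict the mean training cost."""
    mean_cost = sum(c for _, c in train_rows) / len(train_rows)
    return lambda duration: mean_cost


def fit_line(train_rows):
    """Simple linear regression on one feature (ordinary least squares)."""
    n = len(train_rows)
    mean_x = sum(d for d, _ in train_rows) / n
    mean_y = sum(c for _, c in train_rows) / n
    sxy = sum((d - mean_x) * (c - mean_y) for d, c in train_rows)
    sxx = sum((d - mean_x) ** 2 for d, _ in train_rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda duration: intercept + slope * duration


def mean_absolute_error(model, data):
    """Average absolute difference between predicted and actual cost."""
    return sum(abs(model(d) - c) for d, c in data) / len(data)


# Step 5: train several candidate models on the training set only.
models = {"mean_baseline": fit_mean(train), "linear_regression": fit_line(train)}

# Step 6: evaluate each model on the held-out test set and keep the best one.
scores = {name: mean_absolute_error(model, test) for name, model in models.items()}
best_model = min(scores, key=scores.get)
print(scores)
print("Best model:", best_model)
```

The test set is never used during fitting, so the reported errors estimate how each model would behave on rides it has not seen.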

So why do we use predictions instead of our own logic to implement the price update?

Consider this sample code:

# Function to apply the pricing strategies, including the demand-supply calculation
def apply_pricing_strategy_updated(row,
                                   demand_threshold=1.1,
                                   high_demand_cut_off=1.15,
                                   low_demand_cut_off=0.85,
                                   high_supply_cut_off=1.15,
                                   low_supply_cut_off=0.85):
    # Note: demand_threshold is accepted but never used in the body below.
    price = row['Historical_Cost_of_Ride']

    # Calculate Demand-Supply Ratio
    total_demand = row['Number_of_Riders']   # Total riders (demand)
    total_supply = row['Number_of_Drivers']  # Total drivers (supply)
    if total_supply > 0:  # Ensure no division by zero
        demand_supply_ratio = total_demand / total_supply
    else:
        demand_supply_ratio = float('inf')  # Set to infinity if no drivers are available

    # Adjust price based on demand-supply ratio
    if demand_supply_ratio >= high_demand_cut_off:
        price *= 1.10  # 10% increase for significantly high demand
    elif demand_supply_ratio <= low_demand_cut_off:
        price *= 0.85  # 15% decrease for significantly low demand

    # Adjust for supply considerations
    if total_supply > high_supply_cut_off * total_demand:
        price *= 0.85  # 15% decrease if supply exceeds demand significantly
    elif total_supply < low_supply_cut_off * total_demand:
        price *= 1.10  # 10% increase if demand exceeds supply significantly

    # Customer Loyalty Pricing
    if row['Customer_Loyalty_Status'] == 'Regular':
        price *= 1.05  # 5% increase for regular customers
    elif row['Customer_Loyalty_Status'] == 'Silver':
        price *= 1.02  # 2% increase for silver customers

    # Ride Duration Adjustment
    if row['Expected_Ride_Duration'] > 60:
        price *= 1.05  # 5% increase for long rides (over 60 minutes)

    # Time of Booking Factor
    if row['Time_of_Booking'] in ['Night', 'Evening']:
        price *= 1.05  # 5% increase for high-demand times (night or evening)

    # Vehicle-Based Pricing Adjustment
    if row['Vehicle_Type'] == 'Premium':
        price *= 1.05  # 5% increase for premium vehicles
    elif row['Vehicle_Type'] == 'Economy':
        price *= 1.02  # 2% increase for economy vehicles

    return price


import pandas as pd

# Assumption: the Kaggle CSV has been downloaded and saved locally under this name.
df = pd.read_csv('dynamic_pricing.csv')

# Apply the updated pricing strategy to every row of the dataset
df['new_updated_price'] = df.apply(apply_pricing_strategy_updated, axis=1)

# Calculate the total historical and new ride costs
total_historical_cost = df['Historical_Cost_of_Ride'].sum()
total_new_cost = df['new_updated_price'].sum()

# Calculate the overall percentage increase
total_percentage_increase = ((total_new_cost - total_historical_cost) / total_historical_cost) * 100

# Display total costs and percentage increase
print("Total Historical Ride Cost:", total_historical_cost)
print("Total New Updated Price:", total_new_cost)
print("Overall Percentage Increase:", total_percentage_increase)

# Optional: calculate and print the overall demand and supply totals for the dataset
total_demand = df['Number_of_Riders'].sum()
total_supply = df['Number_of_Drivers'].sum()

print("Total Demand (Number of Riders):", total_demand)
print("Total Supply (Number of Drivers):", total_supply)

if total_supply > 0:
    overall_demand_supply_ratio = total_demand / total_supply
    print("Overall Demand-Supply Ratio:", overall_demand_supply_ratio)
else:
    print("Overall Demand-Supply Ratio: Undefined (no drivers available)")
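
To see what the rules above do to a single ride, it helps to trace the first row of the dataset preview (90 riders, 45 drivers, Silver, Night, Premium, 90 minutes, historical cost 284.26) by hand. The sketch below lists the multipliers that row triggers and compounds them in the same order as the function. The list of steps is written out manually for illustration; it is not produced by the function itself.

```python
# Historical cost of the first row in the dataset preview.
historical_cost = 284.26

# Multipliers triggered by that row, in the order the rules are applied.
steps = [
    ("demand/supply ratio 90/45 = 2.0 >= 1.15", 1.10),
    ("drivers 45 < 0.85 * 90 riders = 76.5", 1.10),
    ("Silver loyalty status", 1.02),
    ("ride duration 90 > 60 minutes", 1.05),
    ("Night booking", 1.05),
    ("Premium vehicle", 1.05),
]

price = historical_cost
for reason, factor in steps:
    price *= factor  # multipliers compound, they do not add
    print(f"{reason:42s} x{factor:.2f} -> {price:.2f}")

total_multiplier = price / historical_cost
print(f"Final price: {price:.2f} (x{total_multiplier:.4f} overall)")
```

Two things stand out. The six small adjustments compound to an increase of roughly 43%, and the same rider/driver imbalance is charged twice, once by the ratio rule and once by the supply rule. Effects like these are easy to miss when multipliers are chosen by hand.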

Why Use Predictions Over Custom Logic for Pricing Updates

  1. Rule-Based, Not Predictive:

    • The provided code uses fixed rules (e.g., demand thresholds) to adjust prices. It does not learn from historical data or adapt to new trends; it applies the same predefined adjustments regardless of what the data shows.

  2. No Learning from Data:

    • There is no training phase in which a model learns patterns from data. Machine learning models analyze historical data to identify relationships and generalize them to new situations; the provided logic has no such step.

  3. Fixed Thresholds and Multipliers:

    • The thresholds (e.g., high_demand_cut_off=1.15) and multipliers (e.g., 1.10) are hard-coded and may not reflect real-world variability. A machine learning model estimates its parameters from actual data, so the size of each adjustment is backed by observed prices rather than by a guess.

  4. Deterministic Output:

    • The function returns the same price for the same input until someone edits the rules by hand, so its output is predictable but can drift away from actual market prices. A trained model is also deterministic at prediction time, but its mapping is derived from data and changes when the model is retrained.

  5. Limited Scalability:

    • Rule-based systems need manual updates as market dynamics change, and every new factor adds more rules to maintain. In contrast, machine learning models can be retrained on new data, which helps maintain accuracy as conditions and patterns shift.

  6. Inability to Capture Complex Relationships:

    • The current logic treats each factor independently and does not account for interactions between them (e.g., how the combination of Time_of_Booking and Location_Category might influence price). Machine learning models can uncover such interactions and provide more nuanced pricing.
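
Point 3 can be made concrete with a small sketch. Instead of hard-coding a 10% surge for high-demand rides, the multiplier can be estimated from observed prices. The example below is only an illustration under stated assumptions: the rides are synthetic, the "true" surge of 1.25 is invented, and the estimate is a one-parameter least-squares fit rather than a full model.

```python
import random

random.seed(1)

TRUE_SURGE = 1.25   # hypothetical effect hidden in the synthetic data
HARD_CODED = 1.10   # fixed multiplier used by the rule-based function

# Synthetic high-demand rides: (base_price, observed_price) pairs.
rides = []
for _ in range(500):
    base = random.uniform(50, 500)
    observed = TRUE_SURGE * base + random.gauss(0, 10)
    rides.append((base, observed))

# Least-squares estimate of m in the model observed ~= m * base
# (a line through the origin): m = sum(b * o) / sum(b * b).
learned = sum(b * o for b, o in rides) / sum(b * b for b, _ in rides)


def mean_abs_error(multiplier):
    """Average absolute pricing error when applying one fixed multiplier."""
    return sum(abs(multiplier * b - o) for b, o in rides) / len(rides)


print(f"learned multiplier: {learned:.3f}")
print(f"MAE with hard-coded 1.10: {mean_abs_error(HARD_CODED):.2f}")
print(f"MAE with learned value:   {mean_abs_error(learned):.2f}")
```

The errors here are measured on the same rides used for fitting, which is enough to show the idea; a fair comparison would evaluate both multipliers on held-out rides, as in the train-test split step.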

Summary

While the custom logic provides a straightforward way to adjust prices, it lacks the adaptability, learning capability, and scalability of machine learning models. Predictive models can analyze complex patterns in historical data and be retrained as conditions change, which can lead to more accurate and flexible pricing strategies.

Next Notes

1. Bias and Variance
2. Linear Regression

Resources