Getting started#

Data Splitting for Training#

Before training your model, it's crucial to split your data into different subsets. In machine learning, we typically split the data into three sets:

  • Train set: The largest portion of the data used to train the model. The model learns about patterns and objects of interest from these images.
  • Dev set: After the model trains on images from the Train set, it validates its training and predictions on images from the Dev set. LandingLens continues to fine-tune the model based on its performance on the Dev set.
  • Test set: After the model trains on the Train set and fine-tunes on the Dev set, it makes predictions on images in the Test set. The model hasn't "seen" (been trained on) images from the Test set, so the performance on the Test set provides insights into how well the model performs in real-world scenarios.

To learn more about splits and why they’re important, go to Splits.

Our API offers two ways to manage data splits for your training images: Manual Splitting and Automatic Splitting.

1. Manual Splitting:#

You can upload your images with pre-assigned splits through the API. Refer to Upload Image for details on how to specify the split during image upload.

2. Automatic Splitting:#

The API provides an endpoint to perform automatic data splitting based on the percentages you set. This is a convenient option if you don't want to manually assign splits to each image.

Here's how the automatic splitting endpoint works:

  • selectOption: This field determines which images are included in the split process. You have two options:
    • all-labeled: This option includes all labeled images in the project, regardless of their existing split assignments. Caution: This option can overwrite any prior manual splits you've assigned.
    • without-split: This option only considers labeled images that haven't been assigned a split yet. Existing splits remain unchanged.
  • splitPercentages: This object defines the percentage of images allocated to each split set.

In this example, we'll assign 75% of images to the Train set, 25% to the Dev set, and 0% to the Test set (this assumes you don't need a separate test set):

{
  "splitPercentages": {
    "train": 75,
    "dev": 25,
    "test": 0
  },
  "selectOption": "all-labeled"
}
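Since the percentages describe a full partition of the dataset, they presumably need to be non-negative and sum to 100. A small client-side check can catch mistakes before the request is sent. The helper below is a hypothetical sketch, not part of the LandingLens API:

```python
# Hypothetical helper: sanity-check a splitPercentages object before
# sending it to the autosplit endpoint.
def validate_split_percentages(split):
    """Check that train/dev/test percentages are non-negative and sum to 100."""
    values = [split.get(k, 0) for k in ("train", "dev", "test")]
    if any(v < 0 for v in values):
        raise ValueError("Split percentages must be non-negative")
    if sum(values) != 100:
        raise ValueError(f"Split percentages must sum to 100, got {sum(values)}")
    return True

# The example payload from above passes the check:
validate_split_percentages({"train": 75, "dev": 25, "test": 0})
```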

The following Python code demonstrates how to call the automatic splitting endpoint:

import requests

# Replace "YOUR_API_KEY" and "YOUR_PROJECT_ID" with the actual values
project_id = "YOUR_PROJECT_ID"
api_key = "YOUR_API_KEY"

payload = {
    "splitPercentages": {
        "train": 75,
        "dev": 25,
        "test": 0
    },
    "selectOption": "all-labeled"
}

url = f"https://api.landing.ai/v1/projects/{project_id}/autosplit"
headers = {"apikey": api_key}

response = requests.post(url, headers=headers, json=payload)

if response.status_code == 201:
    print("Splits have been assigned")
else:
    print(f"Error during autosplit request: {response.text}")

Fast Training#

Once we've successfully uploaded our image dataset, we can train our computer vision model. The LandingLens platform facilitates this process through its model training API.

The following Python script uses the requests library to call the LandingLens model training endpoint and start the training process:

import requests

# Replace "YOUR_API_KEY" and "YOUR_PROJECT_ID" with the actual values
api_key = "YOUR_API_KEY"
project_id = "YOUR_PROJECT_ID"

# Define the model training endpoint
url = f"https://api.landing.ai/v1/projects/{project_id}/train"

# Set the authorization header
headers = {"apikey": api_key}

# Send the POST request to initiate training
response = requests.post(url, headers=headers)

if response.status_code == 200:
    data = response.json()
    training_id = data["data"]["trainingId"]
    print(f"Training started successfully! Training ID: {training_id}")
    # Store the training ID for future use (e.g., status monitoring)
else:
    print(f"Error during training request: {response.text}")

Important#

If the API call successfully initiates the model training process, it returns a response that includes the trainingId. Make a note of this ID, because you can use it to check the status of the training process in the next step.

This API call starts the training process but doesn’t return the training status or confirm that the training completed. Training time can vary depending on factors like dataset size and model complexity.
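Because everything downstream depends on the trainingId, it can be worth extracting it defensively rather than indexing straight into the JSON. The helper below assumes the response shape shown in the snippet above (`{"data": {"trainingId": ...}}`) and is a hypothetical sketch, not part of the API:

```python
# Hypothetical helper: pull the training ID out of a response payload
# shaped like {"data": {"trainingId": "..."}}.
def extract_training_id(payload):
    """Return the trainingId if present, otherwise None."""
    return (payload.get("data") or {}).get("trainingId")

print(extract_training_id({"data": {"trainingId": "abc123"}}))  # abc123
print(extract_training_id({}))  # None
```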

Monitor the Training Progress#

You can use the LandingLens status API endpoint to check the status of the training process.

Here's how you can check the training progress using your trainingId:

import requests
import time

# Replace "YOUR_API_KEY", "YOUR_PROJECT_ID", and "YOUR_TRAINING_ID" with the actual values
api_key = "YOUR_API_KEY"
project_id = "YOUR_PROJECT_ID"
training_id = "YOUR_TRAINING_ID"  # Replace with the ID from the previous step

# Define the training status endpoint
url = f"https://api.landing.ai/v1/projects/{project_id}/train/{training_id}/status"

# Set the authorization header
headers = {"apikey": api_key}

# Continuously check training status until completion
training_complete = False
while not training_complete:
    time.sleep(10)  # Adjust the wait time between checks (in seconds)

    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        data = response.json()
        training_status = data["data"]["status"]
        print(f"Training status: {training_status}")

        if training_status == "SUCCEEDED":
            training_complete = True
            print("Training completed successfully!")
        elif training_status == "FAILED":
            training_complete = True
            print("Training failed. Please check the LandingLens platform for details.")
    else:
        print(f"Error during training status request: {response.text}")

This script provides a basic implementation for monitoring. To get more details about the training process, you can refer to the LandingLens interface.
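One limitation of the loop above is that it waits indefinitely: if the status endpoint keeps returning errors or the training never reaches a terminal state, the script never exits. A variant with a deadline is sketched below. It is a hypothetical refactoring, with the HTTP call factored out into a `fetch_status` callable (e.g. a wrapper around the GET request shown above) so the polling logic stands alone:

```python
import time

# Hypothetical polling loop with a deadline. `fetch_status` is any callable
# that returns the current training status string.
def wait_for_training(fetch_status, interval=10, timeout=3600):
    """Poll until the status is terminal ("SUCCEEDED" or "FAILED") or the
    timeout (in seconds) elapses. Returns the final status, or "TIMEOUT"."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(interval)
    return "TIMEOUT"

# Usage with a stub that succeeds on the third check:
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_training(lambda: next(statuses), interval=0))  # SUCCEEDED
```

In a real script, `fetch_status` would issue the GET request to the status endpoint and return `data["data"]["status"]` from the response.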