Data Management
Data Management
The landingai
library provides a set of APIs to manage dataset images and their metadata.
This section explains how to use the associated Python APIs to:
- Upload new (single) images with metadata
- Upload all images from a folder with metadata
- Set metadata for images already in LandingLens
Introduction to Metadata
Metadata is additional information you can attach to an image. Every image can be associated with multiple metadata. Each metadata is a key-value pair associated with an image, where key is the metadata name and value is a string that represents the information. For example, when you upload an image to LandingLens, you can add metadata like the country where the image was created, the timestamp when the image was created, etc.
Metadata is useful when you need to manage hundreds or thousands of images in LandingLens or you need to collaborate with other team members to label datasets. For example, you can metadata to group certain types of images together (ex: images taken last week), then change their split key or create a labeling task for those images.
Use the landingai.data_management.metadata.Metadata
API to manage metadata.
Prerequisite: You must create a metadata key in the LandingLens UI before you can update it (assign a value to it) via the API. Each metadata key is project-specific.
The following screenshot shows how to acess the Manage Metadata module.
Code Example of Metadata Management
The following code snippet shows how to assign values to the Timestamp, Country, and Labeler metadata keys.
from landingai.data_management import Metadata
# Provide the API key and project ID
YOUR_API_KEY = "land_sk_12345"
YOUR_PROJECT_ID = 1190
metadata_client = Metadata(YOUR_PROJECT_ID, YOUR_API_KEY)
# Set three metadata values ('timestamp', 'country' and 'labeler') for images with IDs 123 and 124
metadata_client.update(media_ids=[123, 124], timestamp=12345, country="us", labeler="tom")
# Output:
# {
# project_id: 1190,
# metadata: {"country": "us", "timestamp": "12345", labeler="tom"},
# media_ids: [123, 124]
# }
Update Split Key for Images
When managing hundreds or thousands of images on the platform, it can be more efficient to manage (add/update/remove) the split key programmatically. Use the update_split_key()
function in landingai.data_management.media.Media
to manage the the split value for images.
Example
>>> client = Media(project_id, api_key)
>>> client.update_split_key(media_ids=[1001, 1002], split_key="test") # assign split key 'test' for images with IDs 1001 and 1002
>>> client.update_split_key(media_ids=[1001, 1002], split_key="") # remove split key for images with IDs 1001 and 1002
Split Keys
Valid split keys are "train", "dev", or "test" (case insensitive). To remove a split key from an image, assign the split value as "". After that, the image split will be "unassigned".
Media ID
To update the split key, you need to provide a list of media ids
(image IDs). The image IDs can be found in the LandingLens UI or by using the ls()
function in landingai.data_management.media.Media
.
Example:
>>> media_client.ls()
>>> { medias: [{'id': 4358352, 'mediaType': 'image', 'srcType': 'user', 'srcName': 'Michal', 'properties': {'width': 258 'height': 176}, 'name': 'n01443537_501.JPEG', 'uploadTime': '2020-09-15T22:29:01.338Z', 'metadata': {'split': 'train' 'source': 'prod'}, 'media_status': 'raw'}, ...], num_requested: 1000, count: 300, offset: 0, limit: 1000 }
Upload Images to LandingLens
Use the landingai.data_management.media.Media
API to upload images to a specific project or list the images already in a specific project.
In addition to uploading images, the upload API supports the following features: 1. Assign a split ('train'/'dev'/'test') to images. An empty string '' represents Unassigned and is the default. 2. Upload labels along with images. The supported label files are: * Pascal VOC XML files for Object Detection projects. * Segmentation mask files for Segmentation projects. * A classification name (string) for Classification projects. 3. Attach additional metadata (key-value pairs) to images.
for more information, go here.
Upload Segmentation Masks
Use the upload()
function to upload an image and its segmentation mask (labels) together. When you upload a segmentation mask, the function requires a seg_defect_map
parameter. This parameter points to a JSON file that maps the pixel values to class names. To get this map, use the landingai.data_management.label.Label
API.
The following code snippet shows how to upload an image and its segmentation mask.
>>> client = Label(project_id, api_key)
>>> client.get_label_map()
>>> {'0': 'ok', '1': 'cat', '2': 'dog'} # then write this map to a JSON file locally
File Upload Limitations
Supported Image File Types
LandingLens supports the following image file types: bmp
, jpeg
, jpg
, png
.
Additionally, the Python upload()
API supports uploading tiff
image files. However, the upload()
API will automatically convert the tiff
to a png
file and, then upload the png
image to LandingLens.
Code Example of Image Management
The following code snippet shows how to list images currently in a project, upload a single image, and upload a folder of images.
from landingai.data_management import Media
# Provide API Key and project ID
YOUR_API_KEY = "land_sk_12345"
YOUR_PROJECT_ID = 1190
# Lists all medias with metadata from a project
media_client = Media(YOUR_PROJECT_ID, YOUR_API_KEY)
media_client.ls()
# Output:
# { medias: [{'id': 4358352, 'mediaType': 'image', 'srcType': 'user', 'srcName': 'Michal', 'properties': {'width': 258, 'height': 176}, 'name': 'n01443537_501.JPEG', 'uploadTime': '2020-09-15T22:29:01.338Z', 'metadata': {'split': 'train', 'source': 'prod'}, 'media_status': 'raw'}, ...],
# num_requested: 1000,
# count: 300,
# offset: 0,
# limit: 1000 }
# paginate with a custom page size
media_client.ls(offset=0, limit=100)
# any metadata can be used to filter the medias (server side filtering)
media_client.ls(split="train")
# upload a single media along with the label file, and assign it to the 'dev' split
media_client.upload("/Users/tom/Downloads/image.jpg", split="dev", object_detection_xml="/Users/tom/Downloads/image.xml")
# upload a folder of images and assign them to the 'train' split
media_client.upload("/Users/tom/Downloads/images", split="train")
# if you have created a metadata in the platform, for example "cloudy" (this is case sensitive), you can also upload a value for that metadata
media_client.upload("/Users/tom/Downloads/images/image1.png", metadata_dict={"cloudy": "true"})