# Data module

The data module prepares data for training. It relies heavily on the FiftyOne package and its integrations.

## Submodules

- `annotations` - send data for annotation in CVAT and fetch the results.
- `brain` - commands for the `fiftyone.brain` module.
- `display` - print dataset-related stats to the console.
- `export` - export datasets to different formats (those missing from the original fiftyone CLI).
- `tag` - tag dataset samples that meet certain criteria.
- `transforms` - perform changes on datasets.
- `zoo` - perform operations with the `fiftyone.zoo` module.
## finegrained.data.annotations

Send samples for annotation, query annotation runs, and fetch results.
### `annotate(dataset, annotation_key, label_field, backend, overwrite=False, label_type=None, project_id=None, segment_size=10, task_name=None, image_quality=75, task_asignee=None, organization=None, classes=None, **kwargs)`

Send samples for annotation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset with samples | required |
| `annotation_key` | `str` | assign this key to the annotation run | required |
| `label_field` | `str` | if it exists, upload its labels | required |
| `label_type` | `Optional[str]` | has to be specified if `label_field` does not exist | `None` |
| `backend` | `Any` | backend name or filepath to configs | required |
| `overwrite` | `bool` | overwrite an existing annotation run if True | `False` |
| `classes` | `Optional[str]` | list of classes or path to a labels.txt file | `None` |
| `image_quality` | `int` | image upload quality | `75` |
| `task_name` | `Optional[str]` | custom task name; defaults to dataset name + annotation key | `None` |
| `segment_size` | `int` | number of frames/images per job | `10` |
| `project_id` | `Optional[int]` | which CVAT project to connect to | `None` |
| `task_asignee` | `Optional[str]` | assignee for the task | `None` |
| `**kwargs` | | dataset loading filters | `{}` |

Source code in `finegrained/data/annotations.py`
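The `segment_size` parameter controls how a CVAT task is split into jobs: roughly every `segment_size` images become one job. A quick back-of-the-envelope sketch (not part of the package):

```python
import math

def estimate_num_jobs(n_samples: int, segment_size: int = 10) -> int:
    # Each CVAT job holds at most `segment_size` frames/images.
    return math.ceil(n_samples / segment_size)

print(estimate_num_jobs(95))   # 10 jobs
print(estimate_num_jobs(101))  # 11 jobs
```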
### `delete_key(dataset, key)`

Delete an annotation key.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `key` | `str` | annotation key | required |

Returns:

| Type | Description |
|---|---|
| `None` | |

Source code in `finegrained/data/annotations.py`
### `list_keys(dataset)`

List annotation keys attached to the dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |

Returns:

| Type | Description |
|---|---|
| `types.LIST_STR` | a list of keys |

Source code in `finegrained/data/annotations.py`
### `load(dataset, annotation_key, backend, dest_field=None, dataset_kwargs=None)`

Download annotations from an annotation backend.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `annotation_key` | `str` | the annotation key used when sending for annotation | required |
| `backend` | `Any` | annotation backend name or filepath with configs | required |
| `dest_field` | `str` | if given, annotations are stored in a new field | `None` |
| `dataset_kwargs` | `Optional[dict]` | dataset loading filters | `None` |

Returns:

| Type | Description |
|---|---|
| `None` | |

Source code in `finegrained/data/annotations.py`
## finegrained.data.brain

Run `fiftyone.brain` operations on a dataset.
### `compute_hardness(dataset, predictions, **kwargs)`

Estimate how difficult each sample is to predict.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `predictions` | `str` | field with predictions | required |
| `**kwargs` | | dataset filters | `{}` |

Returns:

| Type | Description |
|---|---|
| `None` | |

Source code in `finegrained/data/brain.py`
### `compute_mistakenness(dataset, predictions, gt_field='ground_truth', **kwargs)`

Estimate the probability that a ground-truth label is wrong.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `predictions` | `str` | a field that contains model predictions | required |
| `gt_field` | `str` | a field that contains ground-truth data | `'ground_truth'` |
| `**kwargs` | | dataset loading filters | `{}` |

Returns:

| Type | Description |
|---|---|
| `None` | |

Source code in `finegrained/data/brain.py`
## finegrained.data.display

Display various data about datasets.
### `compute_area(dataset, field='area', average_size=False, overwrite_metadata=False, overwrite=False, **kwargs)`

Calculate image areas from metadata.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `field` | `str` | field to which area values are assigned | `'area'` |
| `average_size` | `bool` | if True, calculate `(width + height) / 2` instead | `False` |
| `overwrite_metadata` | `bool` | whether to overwrite existing metadata | `False` |
| `overwrite` | `bool` | delete the field if it already exists | `False` |
| `**kwargs` | | dataset loading filters | `{}` |

Returns:

| Type | Description |
|---|---|
| `tuple[int, int]` | area bounds |

Source code in `finegrained/data/display.py`
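The two modes of `compute_area` come down to plain arithmetic over image metadata. An illustrative sketch (not the package implementation) mirroring the documented behaviour:

```python
def image_size_metric(width: int, height: int, average_size: bool = False) -> float:
    # Default: pixel area; with average_size=True: mean of the two dimensions.
    if average_size:
        return (width + height) / 2
    return width * height

print(image_size_metric(640, 480))                     # 307200
print(image_size_metric(640, 480, average_size=True))  # 560.0
```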
### `eval_report(dataset, predictions, gt_field='ground_truth', cmat=False, eval_kwargs={}, **kwargs)`

Print an evaluation report comparing a prediction field against a ground-truth field.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `predictions` | `str` | a field with predictions | required |
| `gt_field` | `str` | a field with ground-truth labels | `'ground_truth'` |
| `cmat` | `bool` | if True, plot a confusion matrix | `False` |
| `eval_kwargs` | `dict` | if passed, these params are forwarded to the evaluation function | `{}` |
| `**kwargs` | | dataset loading filters | `{}` |

Source code in `finegrained/data/display.py`
### `label_diff(dataset, label_field, tags_left, tags_right)`

Compute the difference between two sets of labels.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | field with labels | required |
| `tags_left` | `types.LIST_STR_STR` | tags selecting the base list of labels | required |
| `tags_right` | `types.LIST_STR_STR` | tags selecting the labels to compare against | required |

Source code in `finegrained/data/display.py`
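Conceptually, `label_diff` compares the label sets drawn from two tag selections. A self-contained sketch of that comparison (illustrative only, not the package's code):

```python
def label_diff_sets(labels_left: list, labels_right: list) -> dict:
    # Labels present in the left selection but missing from the right, and vice versa.
    left, right = set(labels_left), set(labels_right)
    return {"left_only": sorted(left - right), "right_only": sorted(right - left)}

print(label_diff_sets(["cat", "dog", "bird"], ["cat", "dog", "fish"]))
# {'left_only': ['bird'], 'right_only': ['fish']}
```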
### `print_labels(dataset, label_field, **kwargs)`

Print all classes in the dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | field that contains labels | required |
| `**kwargs` | | dataset loading filters | `{}` |

Source code in `finegrained/data/display.py`
## finegrained.data.export

Dataset converting and exporting utils.
### `to_csv(dataset, label_field, export_path, extra_fields=None, **kwargs)`

Export a dataset to CSV for uploading to external sources.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | field that contains labels (mapped to `label`) | required |
| `export_path` | `str` | where to write the CSV file | required |
| `extra_fields` | `Optional[list[str]]` | extra fields to add to the CSV | `None` |
| `**kwargs` | | dataset loading filters | `{}` |

Source code in `finegrained/data/export.py`
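The output is a plain CSV with the label field mapped to a `label` column. A minimal stdlib sketch of that shape (the column names here are assumptions for illustration, not the package's exact output):

```python
import csv
import io

rows = [
    {"filepath": "img/a.jpg", "label": "cat"},
    {"filepath": "img/b.jpg", "label": "dog"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["filepath", "label"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```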
### `to_cvat(dataset, label_field, export_dir, **kwargs)`

Export a dataset in CVAT format for annotation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | field that contains labels | required |
| `export_dir` | `str` | where to write data | required |
| `**kwargs` | | dataset loading filters | `{}` |

Source code in `finegrained/data/export.py`
### `to_yolov5(dataset, label_field, export_dir, splits, **kwargs)`

Export a dataset in YOLOv5 format for training.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | field that contains labels | required |
| `export_dir` | `str` | where to write data | required |
| `splits` | `List[str]` | which splits to export | required |
| `**kwargs` | | dataset loading filters | `{}` |

Source code in `finegrained/data/export.py`
## finegrained.data.tag

Tag or untag samples that match specific filters or conditions.
### `retag_missing_labels(dataset, label_field, from_tags, to_tags)`

Remove `from_tags` and add `to_tags` for labels that are present in `from_tags` but absent in `to_tags`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | a label field | required |
| `from_tags` | `types.LIST_STR_STR` | tags with the base list of class labels | required |
| `to_tags` | `types.LIST_STR_STR` | tags with the intersection of class labels | required |

Returns:

| Type | Description |
|---|---|
| `dict` | a count of sample tags for the subset |

Source code in `finegrained/data/tag.py`
### `split_classes(dataset, label_field, train_size=0.5, val_size=0.5, min_samples=3, split_names=('train', 'val'), overwrite=False)`

Split the classes in a dataset into train and val. Used for meta-learning.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | which field to use for classes | required |
| `train_size` | `float` | fraction of classes to tag as train | `0.5` |
| `val_size` | `float` | fraction of classes to tag as val | `0.5` |
| `min_samples` | `int` | minimum number of samples per class for a class to be included in a split | `3` |
| `split_names` | `tuple[str, str]` | splits are tagged with these names | `('train', 'val')` |
| `overwrite` | `bool` | if True, existing tags are removed | `False` |

Returns:

| Type | Description |
|---|---|
| `types.DICT_STR_FLOAT` | a dict of tag counts |

Source code in `finegrained/data/tag.py`
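The splitting logic can be pictured as: drop classes with fewer than `min_samples` samples, then partition the remaining class names by the given fractions. A deterministic, illustrative sketch (the actual implementation may shuffle classes first):

```python
def split_class_names(class_counts: dict, train_size=0.5, val_size=0.5, min_samples=3):
    # Keep only classes with enough samples, then slice by the train fraction.
    eligible = sorted(c for c, n in class_counts.items() if n >= min_samples)
    n_train = round(len(eligible) * train_size / (train_size + val_size))
    return {"train": eligible[:n_train], "val": eligible[n_train:]}

counts = {"cat": 10, "dog": 8, "fish": 2, "bird": 5, "frog": 7}
print(split_class_names(counts))
# {'train': ['bird', 'cat'], 'val': ['dog', 'frog']}  ('fish' dropped: 2 < 3)
```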
### `split_dataset(dataset, splits={'train': 0.8, 'val': 0.1, 'test': 0.1}, **kwargs)`

Create data split tags for a dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `splits` | `types.DICT_STR_FLOAT` | a dict of split names and relative sizes | `{'train': 0.8, 'val': 0.1, 'test': 0.1}` |
| `**kwargs` | | dataset loading filters | `{}` |

Returns:

| Type | Description |
|---|---|
| | a dict of split counts |

Source code in `finegrained/data/tag.py`
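A sketch of how relative split sizes translate into per-sample tags (illustrative only; the package may allocate samples differently, e.g. randomly):

```python
def assign_split_tags(n_samples, splits=None):
    # Allocate each sample index to a split by cumulative fraction.
    splits = splits or {"train": 0.8, "val": 0.1, "test": 0.1}
    bounds, cum = [], 0.0
    for name, frac in splits.items():
        cum += frac
        bounds.append((name, cum))
    counts = {name: 0 for name in splits}
    for i in range(n_samples):
        pos = (i + 0.5) / n_samples  # sample's relative position in the dataset
        for name, bound in bounds:
            if pos <= bound:
                counts[name] += 1
                break
    return counts

print(assign_split_tags(10))  # {'train': 8, 'val': 1, 'test': 1}
```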
### `tag_alignment(dataset, vertical=True, tag=None, **kwargs)`

Add a vertical/horizontal tag to each sample.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `vertical` | `bool` | if True, vertical images are tagged; if False, horizontal images are tagged | `True` |
| `tag` | `Optional[str]` | overrides the default 'vertical' or 'horizontal' tag | `None` |
| `**kwargs` | | dataset filter kwargs | `{}` |

Returns:

| Type | Description |
|---|---|
| `dict` | a dict of sample tag counts |

Source code in `finegrained/data/tag.py`
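The orientation check behind `tag_alignment` is just a comparison of image dimensions; a sketch of the idea (how square images are handled is an implementation detail, assumed here to be horizontal):

```python
def alignment_tag(width: int, height: int) -> str:
    # A sample is considered vertical when it is taller than it is wide.
    return "vertical" if height > width else "horizontal"

print(alignment_tag(480, 640))  # vertical
print(alignment_tag(640, 480))  # horizontal
```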
### `tag_labels(dataset, label_field, labels, tags)`

Tag labels with the given tags.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | a label field | required |
| `labels` | `types.LIST_STR_STR` | labels to filter; can be a txt file with labels | required |
| `tags` | `types.LIST_STR_STR` | tags to apply | required |

Returns:

| Type | Description |
|---|---|
| `dict` | a count of label tags for the subset |

Source code in `finegrained/data/tag.py`
### `tag_samples(dataset, tags, **kwargs)`

Tag each sample in the dataset with the given tags.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `tags` | `types.LIST_STR_STR` | tags to apply | required |
| `**kwargs` | | dataset loading kwargs, i.e. filters | `{}` |

Returns:

| Type | Description |
|---|---|
| `dict` | a dict of sample tag counts |

Source code in `finegrained/data/tag.py`
## finegrained.data.transforms

Data transforms on top of fiftyone datasets.
### `combine_datasets(dest_name, label_field, cfg, persistent=True, overwrite=False)`

Create a new dataset by adding samples from multiple datasets.

The list of datasets and their filters is specified in a YAML config file. Source label fields are renamed to the destination label field.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dest_name` | `str` | a new dataset name | required |
| `label_field` | `str` | a new label field | required |
| `cfg` | `str` | path to the YAML config | required |
| `persistent` | `bool` | whether to persist the destination dataset (False for testing) | `True` |
| `overwrite` | `bool` | if the dataset exists, overwrite it | `False` |

Returns:

| Type | Description |
|---|---|
| | a dataset instance |

Source code in `finegrained/data/transforms.py`
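The YAML config enumerates the source datasets and, optionally, per-dataset filters. The exact schema is defined by the package; the fragment below is only a hypothetical illustration of the idea, not a verified format:

```yaml
# Hypothetical config: keys and values are examples, not a documented schema.
datasets:
  - name: birds-train
    label_field: detections      # renamed to the destination label_field
    filters:
      tags: [train]
  - name: insects-v2
    label_field: ground_truth
```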
### `delete_field(dataset, fields)`

Delete one or more fields from a dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `fields` | `types.LIST_STR_STR` | fields to delete | required |

Returns:

| Type | Description |
|---|---|
| | a fiftyone dataset |

Source code in `finegrained/data/transforms.py`
### `delete_samples(dataset, **kwargs)`

Delete samples and their associated files from a dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `**kwargs` | | dataset filters to select samples for deletion (must be provided) | `{}` |

Returns:

| Type | Description |
|---|---|
| `None` | |

Source code in `finegrained/data/transforms.py`
### `exif_transpose(dataset, **kwargs)`

Rotate images that have an EXIF orientation tag (applied via PIL).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `**kwargs` | | dataset loading filters | `{}` |

Returns:

| Type | Description |
|---|---|
| `None` | |

Source code in `finegrained/data/transforms.py`
### `fix_filepath(src, from_dir, to_dir)`

Replace the `from_dir` part of each sample's filepath with `to_dir` in a `samples.json` file. The file is updated in place.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `src` | `str` | `samples.json` file exported by `fiftyone.types.FiftyOneDataset` | required |
| `from_dir` | `str` | relative directory to replace | required |
| `to_dir` | `str` | new relative directory | required |

Source code in `finegrained/data/transforms.py`
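The rewrite itself is a prefix substitution on every sample's filepath. A standalone sketch of the operation on a `samples.json`-like structure (illustrative only):

```python
def rewrite_filepaths(samples: dict, from_dir: str, to_dir: str) -> dict:
    # Replace the leading `from_dir` portion of each sample's filepath.
    for sample in samples["samples"]:
        if sample["filepath"].startswith(from_dir):
            sample["filepath"] = to_dir + sample["filepath"][len(from_dir):]
    return samples

data = {"samples": [{"filepath": "old/images/a.jpg"}]}
print(rewrite_filepaths(data, "old/images", "new/images"))
# {'samples': [{'filepath': 'new/images/a.jpg'}]}
```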
### `from_label_tag(dataset, label_field, label_tag, **kwargs)`

Update labels in `label_field` with their `label_tag`.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | a field that contains detection labels | required |
| `label_tag` | `str` | labels carrying this tag are renamed to it | required |
| `**kwargs` | | dataset loading filters | `{}` |

Returns:

| Type | Description |
|---|---|
| `dict` | updated label values |

Source code in `finegrained/data/transforms.py`
### `from_labels(dataset, label_field, from_field, **kwargs)`

Re-assign classification labels to detection labels.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | a field with detections to be updated | required |
| `from_field` | `str` | a field with classifications to get labels from | required |
| `**kwargs` | | dataset loading filters | `{}` |

Source code in `finegrained/data/transforms.py`
### `map_labels(dataset, from_field, to_field, label_mapping=None, overwrite=False, **kwargs)`

Create a new dataset field with mapped labels.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `from_field` | `str` | source label field | required |
| `to_field` | `str` | a new label field | required |
| `label_mapping` | `Optional[dict]` | label mapping (use `{}` or `None` to create a field copy) | `None` |
| `overwrite` | `bool` | if `to_field` already exists, overwrite it | `False` |
| `**kwargs` | | dataset loading kwargs | `{}` |

Returns:

| Type | Description |
|---|---|
| `fo.DatasetView` | dataset view |

Source code in `finegrained/data/transforms.py`
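The core of `map_labels` is a dictionary lookup over label values, where an empty or missing mapping leaves every label untouched (producing a field copy). An illustrative sketch:

```python
def map_label_values(labels, label_mapping=None):
    # Unmapped labels pass through unchanged; {} or None yields a plain copy.
    mapping = label_mapping or {}
    return [mapping.get(label, label) for label in labels]

print(map_label_values(["cat", "dog", "puma"], {"puma": "cat"}))  # ['cat', 'dog', 'cat']
print(map_label_values(["cat", "dog"]))                           # ['cat', 'dog']
```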
### `merge_diff(dataset, image_dir, tags=None, recursive=True)`

Merge new files into an existing dataset.

Existing files are skipped, and no labels are expected for the new files. Merging is based on absolute filepaths.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | existing fiftyone dataset | required |
| `image_dir` | `str` | a folder with new files | required |
| `tags` | `types.LIST_STR_STR` | tags to apply to new samples | `None` |
| `recursive` | `bool` | also search subfolders for files | `True` |

Returns:

| Type | Description |
|---|---|
| | an updated fiftyone dataset |

Source code in `finegrained/data/transforms.py`
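Because merging is keyed on absolute filepaths, the "diff" boils down to a set difference between files found on disk and filepaths already in the dataset. A sketch of that selection (illustrative only):

```python
def new_files(existing_filepaths, found_filepaths):
    # Keep only files whose absolute path is not already in the dataset.
    existing = set(existing_filepaths)
    return sorted(f for f in found_filepaths if f not in existing)

dataset_files = ["/data/a.jpg", "/data/b.jpg"]
folder_files = ["/data/b.jpg", "/data/c.jpg", "/data/d.jpg"]
print(new_files(dataset_files, folder_files))  # ['/data/c.jpg', '/data/d.jpg']
```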
### `prefix_label(dataset, label_field, dest_field, prefix)`

Prepend each label with a given prefix.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | a field with class labels | required |
| `dest_field` | `str` | a new field for the prefixed labels | required |
| `prefix` | `str` | a prefix value | required |

Returns:

| Type | Description |
|---|---|
| | a fiftyone dataset object |

Source code in `finegrained/data/transforms.py`
### `to_patches(dataset, label_field, to_name, export_dir, overwrite=False, splits=None, **kwargs)`

Crop patches out of a dataset and create a new dataset from them.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | a fiftyone dataset with detections | required |
| `label_field` | `str \| list[str]` | label field(s) with detections, classifications, or polylines | required |
| `to_name` | `str` | a new dataset name for the patches | required |
| `export_dir` | `str` | where to save the crops | required |
| `overwrite` | `bool` | if True and a dataset with that name already exists, delete it | `False` |
| `splits` | `Optional[list[str]]` | if provided, these tags are used to split patches into subsets | `None` |
| `**kwargs` | | dataset filters | `{}` |

Returns:

| Type | Description |
|---|---|
| `fo.Dataset` | fiftyone dataset object |

Source code in `finegrained/data/transforms.py`
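FiftyOne stores detection boxes as relative `[x, y, width, height]` coordinates, so cropping a patch means scaling them to pixel coordinates first. A minimal sketch of that conversion (not the package's actual code):

```python
def to_absolute_box(rel_box, img_w, img_h):
    # Convert a relative [x, y, w, h] box to absolute (left, top, right, bottom) pixels.
    x, y, w, h = rel_box
    left, top = round(x * img_w), round(y * img_h)
    right, bottom = round((x + w) * img_w), round((y + h) * img_h)
    return left, top, right, bottom

print(to_absolute_box([0.25, 0.5, 0.5, 0.25], 640, 480))  # (160, 240, 480, 360)
```

The resulting tuple can be passed directly to, e.g., `PIL.Image.crop`.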
### `transpose_images(dataset, **kwargs)`

Rotate images by 90 degrees.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `**kwargs` | | dataset loading filters | `{}` |

Returns:

| Type | Description |
|---|---|
| `fo.DatasetView` | a dataset view instance |

Source code in `finegrained/data/transforms.py`
## finegrained.data.zoo

Base constructs for using torchvision models.
### `object_detection(dataset, label_field, conf=0.25, image_size=None, device=None, **kwargs)`

Detect COCO objects with torchvision's Mask R-CNN v2.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | fiftyone dataset name | required |
| `label_field` | `str` | which field to write predictions to | required |
| `conf` | `float` | box confidence threshold | `0.25` |
| `image_size` | | if specified, a maximum image size (to save memory) | `None` |
| `**kwargs` | | dataset loading filters | `{}` |

Returns:

| Type | Description |
|---|---|
| `None` | |

Source code in `finegrained/data/zoo.py`