ndsampler.coco_regions

Maintains information about groundtruth targets. Positives are specified explicitly, and negatives are mined.

The Targets class maintains the “Positive Population”.

A Positive is a bounding box that belongs to an image or video with a class label and potentially other attributes. Negatives are similar except they are boxes that do not significantly intersect positives. A pool of positives can also be selected from the population such that only a subset of data is used per epoch.

Cases to Handle:
  • [ ] Annotations are significantly smaller than images
    • Annotations are typically very far apart

    • Annotations can be clustered tightly together

    • Annotations are at massively different scales

  • [ ] Annotations are about the same size as the images

Module Contents

Classes

Targets

Abstract API

CocoRegions

Converts Coco-Style datasets into a table for efficient on-line work

Functions

tabular_coco_targets(dset)

Transforms COCO box annotations into a tabular form

select_positive_regions(targets[, window_dims, ...])

Reduce positive example redundancy by selecting disparate positive samples

new_video_sample_grid(dset[, window_dims, ...])

Create a space-time grid to sample with

new_image_sample_grid(dset, window_dims[, ...])

Create a spatial grid to sample with

Attributes

profile

ndsampler.coco_regions.profile
exception ndsampler.coco_regions.MissingNegativePool

Bases: AssertionError

Assertion failed.

class ndsampler.coco_regions.Targets

Bases: object

Abstract API

get_negative(index=None, rng=None)
get_positive(index=None, rng=None)
abstract overlapping_aids(gid, box)
preselect(n_pos=None, n_neg=None, neg_to_pos_ratio=None, window_dims=None, rng=None, verbose=0)

Shuffle selection of positive and negative samples

Todo

  • [X] Basic, window around positive annotation algorithm

  • [ ] Sliding window algorithm from bioharn
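The signature of preselect suggests that pool sizes can be given directly or derived from a ratio. The following is a minimal sketch of that idea, not the ndsampler implementation. The helper name preselect_sketch is hypothetical, and it assumes that neg_to_pos_ratio is only consulted when n_neg is omitted and that the positive pool is a shuffled subset of population indices.

```python
import random


def preselect_sketch(n_population, n_pos=None, n_neg=None,
                     neg_to_pos_ratio=None, rng=None):
    """Hypothetical helper (not part of ndsampler)."""
    rng = random.Random(rng)
    # Assumption: with no explicit count, every positive is used.
    if n_pos is None:
        n_pos = n_population
    n_pos = min(n_pos, n_population)
    # Assumption: the ratio only matters when n_neg is not given.
    if n_neg is None:
        if neg_to_pos_ratio is None:
            n_neg = 0
        else:
            n_neg = int(round(n_pos * neg_to_pos_ratio))
    # Shuffle the population indices and keep the first n_pos of them
    # as the pool of positives for this epoch.
    order = list(range(n_population))
    rng.shuffle(order)
    return order[:n_pos], n_neg


pool, n_neg = preselect_sketch(10, n_pos=4, neg_to_pos_ratio=0.5, rng=0)
```

Calling the helper again with a different rng would give a different pool, which is how only a subset of the population ends up being used per epoch.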

class ndsampler.coco_regions.CocoRegions(dset, workdir=None, verbose=1)

Bases: Targets, ndsampler.utils.util_misc.HashIdentifiable, ubelt.NiceRepr

Converts Coco-Style datasets into a table for efficient on-line work

Perhaps rename this class to regions, and then have targets be an attribute of regions.

Parameters
  • dset (ndsampler.CocoAPI) – a dataset in coco format

  • workdir (PathLike) – a temporary directory where we can cache stuff

  • verbose (int) – verbosity level

Example

>>> from ndsampler.coco_regions import *
>>> self = CocoRegions.demo()
>>> pos_tr = self.get_positive(rng=0)
>>> neg_tr = self.get_negative(rng=0)
>>> print(ub.repr2(pos_tr, precision=2))
>>> print(ub.repr2(neg_tr, precision=2))
property catgraph
property n_negatives
property n_positives
property n_samples
property class_ids
property image_ids
property n_annots
property n_images
property n_categories
property isect_index

Lazy access to a disk-cached intersection index for this dataset

property targets

All viable positive annotations targets in a flat table.

The main idea is that this is the population of all positives that we could sample from. Often we will simply use all of them.

This property takes the subset of annotations in the coco dataset that can be considered “viable” positives. We may subsample these further, but this serves to collect the annotations that could feasibly be used by the network. Essentially, we remove annotations without bounding boxes. I’m not sure I 100% like the way this works though. Shouldn’t filtering be done before we even get here? Perhaps, but perhaps not. This design needs a bit more thought.

property neg_anchors
lookup_class_name(class_id)
lookup_class_id(class_name)
__nice__()
classmethod demo()
_make_hashid()
_lazy_isect_index(verbose=None)
overlapping_aids(gid, region, visible_thresh=0.0)

Finds the other annotations in this image that overlap a region

Parameters
  • gid (int) – image id

  • region (kwimage.Boxes) – bounding box

  • visible_thresh (float) – does not return annotations with visibility less than this threshold.

Returns

annotation ids

Return type

List[int]
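To make the role of visible_thresh concrete, here is a minimal sketch in plain Python. It is not the library code: it assumes boxes are (x1, y1, x2, y2) tuples, that "visibility" means the fraction of an annotation's area that falls inside the region, and it uses a linear scan instead of the cached intersection index. The function names are hypothetical.

```python
def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a, b):
    # Intersection box; box_area clamps it to zero when a and b are disjoint.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x1, y1, x2, y2))


def overlapping_ids_sketch(annots, region, visible_thresh=0.0):
    """annots maps annotation id -> (x1, y1, x2, y2) within one image."""
    hits = []
    for aid, box in annots.items():
        isect = intersection_area(box, region)
        if isect <= 0:
            continue  # no overlap at all
        # Assumption: visibility = fraction of the annotation inside the region.
        visibility = isect / box_area(box)
        if visibility >= visible_thresh:
            hits.append(aid)
    return hits


annots = {1: (0, 0, 10, 10), 2: (8, 8, 20, 20), 3: (50, 50, 60, 60)}
region = (5, 5, 9, 9)
all_hits = overlapping_ids_sketch(annots, region)
visible_hits = overlapping_ids_sketch(annots, region, visible_thresh=0.1)
```

Here annotation 2 only touches the region with a sliver of its area, so it is returned with the default threshold but dropped once visible_thresh is raised to 0.1.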

get_segmentations(aids)

Returns the segmentations corresponding to a set of annotation ids

Example

>>> from ndsampler.coco_regions import *
>>> from ndsampler import coco_sampler
>>> self = coco_sampler.CocoSampler.demo().regions
>>> aids = [1, 2]
>>> sseg_list = self.get_segmentations(aids)
get_negative(index=None, rng=None)

Get localization information for a negative region

Parameters
  • index (int or None) – indexes into the current negative pool or if None returns a random negative

  • rng (RandomState) – used only if index is None

Returns

tr: target info dictionary

Return type

Dict

CommandLine:

xdoctest -m ndsampler.coco_regions CocoRegions.get_negative

Example

>>> from ndsampler.coco_regions import *
>>> from ndsampler import coco_sampler
>>> rng = kwarray.ensure_rng(0)
>>> self = coco_sampler.CocoSampler.demo().regions
>>> tr = self.get_negative(rng=rng)
>>> # xdoctest: +IGNORE_WANT
>>> assert 'category_id' in tr
>>> assert 'aid' in tr
>>> assert 'cx' in tr
>>> print(ub.repr2(tr, precision=2))
{
    'aid': -1,
    'category_id': 0,
    'cx': 190.71,
    'cy': 95.83,
    'gid': 1,
    'height': 140.00,
    'img_height': 600,
    'img_width': 600,
    'width': 68.00,
}
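The target dictionary printed above appears to describe the region by its center and size. As a small illustration, assuming cx and cy are the box center in pixel coordinates, the corner coordinates can be recovered as follows. The helper name target_to_ltrb is hypothetical.

```python
def target_to_ltrb(tr):
    """Convert a center/size target dict to (x1, y1, x2, y2) corners."""
    half_w = tr['width'] / 2.0
    half_h = tr['height'] / 2.0
    return (tr['cx'] - half_w, tr['cy'] - half_h,
            tr['cx'] + half_w, tr['cy'] + half_h)


# Values copied from the example output above.
tr = {
    'aid': -1, 'category_id': 0, 'cx': 190.71, 'cy': 95.83, 'gid': 1,
    'height': 140.0, 'img_height': 600, 'img_width': 600, 'width': 68.0,
}
x1, y1, x2, y2 = target_to_ltrb(tr)
```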
get_positive(index=None, rng=None)

Get localization information for a positive region

Parameters
  • index (int or None) – indexes into the current positive pool or if None returns a random positive

  • rng (RandomState) – used only if index is None

Returns

tr: target info dictionary

Return type

Dict

Example

>>> import kwarray
>>> import ubelt as ub
>>> from ndsampler import coco_sampler
>>> rng = kwarray.ensure_rng(0)
>>> self = coco_sampler.CocoSampler.demo().regions
>>> tr = self.get_positive(0, rng=rng)
>>> print(ub.repr2(tr, precision=2))
get_item(index, rng=None)

Loads from positives and then negatives.
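One way to read "positives and then negatives" is as a single flat index space in which the negatives follow the positives. The sketch below illustrates that reading. It is an assumption about the indexing scheme rather than the actual method body, and get_item_sketch is a hypothetical name.

```python
def get_item_sketch(index, n_positives, get_positive, get_negative):
    """Dispatch a flat index onto the positive pool, then the negative pool."""
    if index < n_positives:
        return get_positive(index)
    # Assumption: negative indexes are offset by the number of positives.
    return get_negative(index - n_positives)


# Stand-in pools; in ndsampler these would be target dictionaries.
positives = ['p0', 'p1', 'p2']
negatives = ['n0', 'n1']
items = [
    get_item_sketch(i, len(positives),
                    positives.__getitem__, negatives.__getitem__)
    for i in range(len(positives) + len(negatives))
]
```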

_random_negatives(num, exact=False, neg_anchors=None, window_size=None, rng=None, thresh=0.0)

Samples multiple negatives at once for efficiency

Parameters
  • num (int) – number of negatives to sample

  • exact (bool) – if True, we will try to find exactly num negatives, otherwise the number returned is approximate.

  • neg_anchors () – prior normalized aspect ratios for negative boxes. Mutually exclusive with window_size.

  • window_size (Tuple) – absolute box size (width, height) used to sample negative regions. If not specified the relative anchor strategy will be used to randomly choose potentially non-square regions relative to the image size.

  • thresh (float) – overlap area threshold as a fraction of the negative box size. When thresh=0.0, negatives cannot overlap any positive; when thresh=1.0, there are no constraints on negative placement.

Returns

targets - contains negative target information

Return type

DataFrameArray
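The thresh and exact parameters can be illustrated with simple rejection sampling. This is only a sketch under stated assumptions: it works on one image, uses a fixed window_size given as (width, height), checks each positive individually, and returns plain tuples instead of a DataFrameArray. The names overlap_fraction, sample_negatives_sketch and max_tries are hypothetical.

```python
import random


def overlap_fraction(neg, pos):
    """Intersection area divided by the area of the negative box."""
    iw = min(neg[2], pos[2]) - max(neg[0], pos[0])
    ih = min(neg[3], pos[3]) - max(neg[1], pos[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    neg_area = (neg[2] - neg[0]) * (neg[3] - neg[1])
    return (iw * ih) / neg_area


def sample_negatives_sketch(num, img_size, window_size, positives,
                            thresh=0.0, rng=None, max_tries=1000):
    rng = random.Random(rng)
    img_w, img_h = img_size
    win_w, win_h = window_size
    negatives = []
    for _ in range(max_tries):
        if len(negatives) >= num:
            break
        # Propose a window that lies fully inside the image.
        x1 = rng.uniform(0, img_w - win_w)
        y1 = rng.uniform(0, img_h - win_h)
        cand = (x1, y1, x1 + win_w, y1 + win_h)
        # Keep the proposal only if no positive covers more than
        # `thresh` of its area.
        if all(overlap_fraction(cand, p) <= thresh for p in positives):
            negatives.append(cand)
    return negatives


positives = [(0, 0, 300, 300)]
strict = sample_negatives_sketch(20, (600, 600), (50, 50), positives,
                                 thresh=0.0, rng=0)
loose = sample_negatives_sketch(20, (600, 600), (50, 50), positives,
                                thresh=1.0, rng=0)
```

Because proposals can be rejected, the strict call may return fewer than num boxes, which mirrors the "approximate" count described for exact=False.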

Example

>>> from ndsampler.coco_regions import *
>>> from ndsampler import coco_sampler
>>> self = coco_sampler.CocoSampler.demo().regions
>>> num = 100
>>> rng = kwarray.ensure_rng(0)
>>> targets = self._random_negatives(num, rng=rng)
>>> assert len(targets) <= num
>>> targets = self._random_negatives(num, exact=True)
>>> assert len(targets) == num
new_sample_grid(task, window_dims, window_overlap=0, **kwargs)

New experimental method to replace preselect positives / negatives

Parameters
  • task (str) – can be video_detection or image_detection (video_classification and image_classification are listed in the source but commented out)

  • **kwargs – passed to new_video_sample_grid or new_image_sample_grid

Example

>>> from ndsampler.coco_regions import *
>>> from ndsampler import coco_sampler
>>> self = coco_sampler.CocoSampler.demo('vidshapes1').regions
>>> self.dset.conform()
>>> sample_grid = self.new_sample_grid('video_detection', window_dims=(2, 100, 100))
_preselect_positives(num=None, window_dims=None, rng=None, verbose=None)

Preload a bunch of positives

Example

>>> from ndsampler.coco_regions import *
>>> from ndsampler import coco_sampler
>>> self = coco_sampler.CocoSampler.demo().regions
>>> window_dims = (64, 64)
>>> self._preselect_positives(window_dims=window_dims, verbose=4)
_preselect_negatives(num, window_dims=None, thresh=0.3, rng=None, verbose=None)

Preselect a set of random regions to be used as negative examples.

Parameters
  • num (int) – number of desired negatives to preselect. In some cases achieving this number may not be possible.

  • window_dims (Tuple) – absolute dimensions (height, width) used to sample negative regions. If not specified the relative anchor strategy will be used to randomly choose potentially non-square regions relative to the image size.

  • thresh (float) – overlap area threshold as a fraction of the negative box size. When thresh=0.0, negatives cannot overlap any positive; when thresh=1.0, there are no constraints on negative placement.

  • rng (int | RandomState) – random seed / state

  • verbose (int) – verbosity level

Returns

number of negatives actually chosen

Return type

int

Example

>>> from ndsampler.coco_regions import *
>>> import ndsampler
>>> self = ndsampler.CocoSampler.demo().regions
>>> num = 100
>>> self._preselect_negatives(num, window_dims=(30, 30))
_cacher(fname, extra_deps=None, disable=False, verbose=None)

Create a cacher for a known lazy computation using a common hashid.

If self.workdir or self.hashid is None, then caches are disabled by default. Caches can be explicitly disabled by setting the appropriate value in the self._enabled_caches dictionary.

Parameters
  • fname (str) – name of the property we are caching

  • extra_deps (OrderedDict) – extra data to contribute to the hashid

  • disable (bool) – explicitly disable cache if True, otherwise do normal checks to see if enabled.

  • verbose (bool, default=None) – if specified overrides self.verbose.

Returns

cacher - if enabled this cacher will minimally depend on the self.hashid, but may also depend on extra info.

Return type

ub.Cacher
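The caching rule described above can be sketched without ubelt. The stand-in below only computes a cache file path. It assumes that caching is off whenever workdir or hashid is None, and that extra_deps are folded into the key together with the hashid. The function name, the ".pkl" suffix and the use of SHA1 are illustrative choices, not what ub.Cacher does internally.

```python
import hashlib
import json
import os
from collections import OrderedDict


def cache_path_sketch(workdir, hashid, fname, extra_deps=None, disable=False):
    """Return a cache file path, or None when caching is disabled."""
    if disable or workdir is None or hashid is None:
        return None
    # The key always depends on the dataset hashid ...
    deps = OrderedDict([('hashid', hashid)])
    # ... and optionally on extra information such as window sizes.
    deps.update(extra_deps or {})
    digest = hashlib.sha1(json.dumps(deps).encode('utf8')).hexdigest()[:16]
    return os.path.join(workdir, '{}_{}.pkl'.format(fname, digest))


path_a = cache_path_sketch('/tmp/work', 'abc123', 'isect_index')
path_b = cache_path_sketch('/tmp/work', 'abc123', 'isect_index',
                           extra_deps={'window_dims': [30, 30]})
path_off = cache_path_sketch(None, 'abc123', 'isect_index')
```

Changing either the hashid or the extra dependencies yields a different path, so stale results are never reused for a different configuration.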

ndsampler.coco_regions.tabular_coco_targets(dset)

Transforms COCO box annotations into a tabular form
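As a rough picture of what "tabular form" can mean here, the sketch below converts a plain COCO-style dictionary into a dictionary of columns. It is not the ndsampler implementation. It assumes the standard COCO bbox layout [x, y, width, height] with (x, y) at the top-left corner, borrows the column names from the target dictionaries shown elsewhere on this page, and skips annotations that have no box. The function name is hypothetical.

```python
def tabulate_boxes_sketch(coco):
    """coco: plain dict with COCO-style 'images' and 'annotations' lists."""
    img_sizes = {img['id']: (img['width'], img['height'])
                 for img in coco['images']}
    keys = ['aid', 'gid', 'category_id', 'cx', 'cy',
            'width', 'height', 'img_width', 'img_height']
    columns = {key: [] for key in keys}
    for ann in coco['annotations']:
        bbox = ann.get('bbox')
        if bbox is None:
            continue  # annotations without a box are not viable targets
        x, y, w, h = bbox
        img_w, img_h = img_sizes[ann['image_id']]
        columns['aid'].append(ann['id'])
        columns['gid'].append(ann['image_id'])
        columns['category_id'].append(ann['category_id'])
        # Store the box as center plus size.
        columns['cx'].append(x + w / 2.0)
        columns['cy'].append(y + h / 2.0)
        columns['width'].append(w)
        columns['height'].append(h)
        columns['img_width'].append(img_w)
        columns['img_height'].append(img_h)
    return columns


coco = {
    'images': [{'id': 1, 'width': 600, 'height': 600}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 2, 'bbox': [10, 20, 100, 50]},
        {'id': 2, 'image_id': 1, 'category_id': 3},  # no bbox
    ],
}
table = tabulate_boxes_sketch(coco)
```

A column layout like this makes it cheap to index, shuffle, or filter many targets at once, which is the kind of on-line work the class description refers to.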

_ = xdev.profile_now(tabular_coco_targets)(dset)

ndsampler.coco_regions.select_positive_regions(targets, window_dims=(300, 300), thresh=0.0, rng=None, verbose=0)

Reduce positive example redundancy by selecting disparate positive samples
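One simple way to pick disparate positives is a greedy pass: keep a positive only if it is not already covered by the window around a previously kept one. The sketch below shows that idea. It is an assumption about the general technique and does not reproduce the library's algorithm or its thresh parameter. It takes window_dims as (height, width), and select_disparate_sketch is a hypothetical name.

```python
def select_disparate_sketch(centers, window_dims):
    """centers: list of (cx, cy) box centers; returns the kept indexes."""
    win_h, win_w = window_dims
    selected = []
    for idx, (cx, cy) in enumerate(centers):
        # A positive is redundant if its center lies inside the window
        # centered on a positive that was already selected.
        covered = any(
            abs(cx - centers[j][0]) <= win_w / 2.0 and
            abs(cy - centers[j][1]) <= win_h / 2.0
            for j in selected
        )
        if not covered:
            selected.append(idx)
    return selected


centers = [(100, 100), (110, 105), (500, 500)]
selected = select_disparate_sketch(centers, window_dims=(300, 300))
```

The second center sits almost on top of the first, so a 300 by 300 window around the first already contains it and only two positives are kept.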

Example

>>> from ndsampler.coco_regions import *
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('shapes8')
>>> targets = tabular_coco_targets(dset)
>>> window_dims = (300, 300)
>>> selected = select_positive_regions(targets, window_dims)
>>> print(len(selected))
>>> print(len(dset.anns))
ndsampler.coco_regions.new_video_sample_grid(dset, window_dims=None, window_overlap=0.0, space_dims=None, time_dim=None, classes_of_interest=None, ignore_coverage_thresh=0.6, negative_classes={'ignore', 'background'}, use_annots=True, legacy=True, verbose=1)

Create a space-time grid to sample with

Returns

sample_grid: contains “targets”, and if use_annots=True then also contains “positives_indexes” and “negatives_indexes”, which index into “targets” and indicate which targets are positive / negative samples.

The “positives” and “negatives” lists are deprecated and will be removed.

Return type

Dict
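A space-time grid can be pictured as the product of sliding-window offsets along the time, height and width axes. The sketch below is a simplified illustration, not the library routine. It assumes window_dims is (num_frames, height, width), applies the same window_overlap fraction to every axis, and adds a final window flush with the end of each axis so that the borders are covered. Both function names are hypothetical.

```python
import itertools


def axis_starts(full, window, overlap=0.0):
    """Start offsets of windows of length `window` along an axis of length `full`."""
    if window >= full:
        return [0]
    stride = max(1, int(round(window * (1.0 - overlap))))
    starts = list(range(0, full - window + 1, stride))
    # Assumption: add one last window aligned with the end of the axis.
    if starts[-1] != full - window:
        starts.append(full - window)
    return starts


def space_time_grid_sketch(video_dims, window_dims, window_overlap=0.0):
    """video_dims and window_dims are (num_frames, height, width)."""
    per_axis = [axis_starts(full, win, window_overlap)
                for full, win in zip(video_dims, window_dims)]
    win_t, win_h, win_w = window_dims
    grid = []
    for t0, y0, x0 in itertools.product(*per_axis):
        grid.append((slice(t0, t0 + win_t),
                     slice(y0, y0 + win_h),
                     slice(x0, x0 + win_w)))
    return grid


grid = space_time_grid_sketch((5, 600, 600), (2, 224, 224))
```

For a 5-frame, 600 by 600 video and a (2, 224, 224) window, each axis yields three offsets, so this sketch produces 27 space-time windows.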

Example

>>> from ndsampler.coco_regions import *  # NOQA
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes8-multispectral', num_frames=5)
>>> dset.conform()
>>> window_dims = (2, 224, 224)
>>> sample_grid = new_video_sample_grid(dset, window_dims)
>>> print('sample_grid = {}'.format(ub.repr2(sample_grid, nl=2)))
>>> # Now try to load a sample
>>> tr = sample_grid['positives'][0]
>>> import ndsampler
>>> sampler = ndsampler.CocoSampler(dset)
>>> tr_ = sampler._infer_target_attributes(tr)
>>> print('tr_ = {}'.format(ub.repr2(tr_, nl=1)))
>>> sample = sampler.load_sample(tr)
>>> assert sample['im'].shape == (2, 224, 224, 5)

Example

>>> from ndsampler.coco_regions import *  # NOQA
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes8-multispectral', num_frames=5)
>>> dset.conform()
>>> window_dims = (2, 224, 224)
>>> sample_grid = new_video_sample_grid(dset, window_dims, use_annots=False)
Ignore:

    import timerit
    ti = timerit.Timerit(10, bestof=3, verbose=2)
    for timer in ti.reset('vid use_annots=True'):
        with timer:
            new_video_sample_grid(dset, window_dims, use_annots=True, verbose=0)
    for timer in ti.reset('vid use_annots=False'):
        with timer:
            new_video_sample_grid(dset, window_dims, use_annots=False, verbose=0)

    import timerit
    ti = timerit.Timerit(10, bestof=3, verbose=2)
    for timer in ti.reset('img use_annots=True'):
        with timer:
            new_image_sample_grid(dset, window_dims[1:], use_annots=True, verbose=0)
    for timer in ti.reset('img use_annots=False'):
        with timer:
            new_image_sample_grid(dset, window_dims[1:], use_annots=False, verbose=0)

Ignore:

    import xdev
    globals().update(xdev.get_func_kwargs(new_video_sample_grid))

ndsampler.coco_regions.new_image_sample_grid(dset, window_dims, window_overlap=0.0, classes_of_interest=None, ignore_coverage_thresh=0.6, negative_classes={'ignore', 'background'}, use_annots=True, legacy=True, verbose=1)

Create a spatial grid to sample with

Returns

sample_grid: contains “targets”, and if use_annots=True then also contains “positives_indexes” and “negatives_indexes”, which index into “targets” and indicate which targets are positive / negative samples.

The “positives” and “negatives” lists are deprecated and will be removed.

Return type

Dict
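The relationship between “targets” and the two index lists can be sketched as follows. This is a deliberately reduced illustration: it treats a grid window as positive when any annotation box overlaps it, and it ignores classes_of_interest, negative_classes and ignore_coverage_thresh. Windows and boxes are assumed to be (x1, y1, x2, y2) tuples, and the function name is hypothetical.

```python
def split_grid_by_annots_sketch(targets, annot_boxes):
    """Label each grid window by whether any annotation box overlaps it."""
    positives_indexes = []
    negatives_indexes = []
    for idx, win in enumerate(targets):
        touches = any(
            min(win[2], box[2]) > max(win[0], box[0]) and
            min(win[3], box[3]) > max(win[1], box[1])
            for box in annot_boxes
        )
        if touches:
            positives_indexes.append(idx)
        else:
            negatives_indexes.append(idx)
    return {
        'targets': targets,
        'positives_indexes': positives_indexes,
        'negatives_indexes': negatives_indexes,
    }


windows = [(0, 0, 224, 224), (224, 0, 448, 224)]
annot_boxes = [(10, 10, 50, 50)]
sample_grid = split_grid_by_annots_sketch(windows, annot_boxes)
# Look up the first positive window through its index.
first_positive = sample_grid['targets'][sample_grid['positives_indexes'][0]]
```

Storing indexes rather than copies of the targets keeps a single table of windows, which may be why the separate “positives” and “negatives” lists are being deprecated.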

Example

>>> from ndsampler.coco_regions import *  # NOQA
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('shapes8')
>>> dset = kwcoco.CocoDataset.demo('vidshapes8-multispectral')
>>> window_dims = (224, 224)
>>> sample_grid1 = new_image_sample_grid(dset, window_dims, use_annots=False)
>>> sample_grid = new_image_sample_grid(dset, window_dims)
>>> # Now try to load a sample
>>> idx = sample_grid['positives_indexes'][0]
>>> tr = sample_grid['targets'][idx]
>>> import ndsampler
>>> sampler = ndsampler.CocoSampler(dset)
>>> tr['channels'] = '<all>'
>>> tr_ = sampler._infer_target_attributes(tr)
>>> print('tr_ = {}'.format(ub.repr2(tr_, nl=1)))
>>> sample = sampler.load_sample(tr)
>>> assert sample['im'].shape == (224, 224, 5)
Ignore:

    import xdev
    globals().update(xdev.get_func_kwargs(new_image_sample_grid))