ndsampler.abstract_frames

Fast access to subregions of images.

This implements the core convert-and-cache-as-cog logic, which enables us to read from subregions of images quickly.

Todo

  • [X] Implement npy memmap backend

  • [X] Implement gdal COG.TIFF backend
    • [X] Use as COG if input file is a COG

    • [X] Convert to COG if needed

Module Contents

Classes

Frames

Abstract implementation of Frames.

SimpleFrames

Basic concrete implementation of frames objects for images where there is a strict one-file-to-one-image mapping (i.e. no auxiliary images).

AlignableImageData

Class for sampling channels / frames that are aligned with each other

Attributes

profile

ndsampler.abstract_frames.profile
class ndsampler.abstract_frames.Frames(hashid_mode='PATH', workdir=None, backend=None)[source]

Bases: object

Abstract implementation of Frames.

While this is an abstract class, it contains most of the Frames functionality. The inheriting class needs to overload the constructor and _lookup_gpath, which maps an image-id to its path on disk.

Parameters
  • hashid_mode (str, default='PATH') – The method used to compute a unique identifier for every image. It can be PATH, PIXELS, or GIVEN. TODO: Add DVC as a method (where it uses the name of the symlink)?

  • workdir (PathLike) – This is the directory where Frames can store cached results. This SHOULD be specified.

  • backend (str | Dict) – Determines the backend to use for fast subimage region lookups. This can be either the string 'cog' or 'npy', or a config dictionary for fine-grained backend control. In the dictionary form, 'type' specifies cog or npy, and 'config' holds the backend options. Only COG has additional options, which are:

    {
        'type': 'cog',
        'config': {
            'compress': <'LZW' | 'JPEG' | 'DEFLATE' | 'ZSTD' | 'auto'>,
        }
    }

Example

>>> from ndsampler.abstract_frames import *
>>> self = SimpleFrames.demo(backend='npy')
>>> file = self.load_image(1)
>>> print('file = {!r}'.format(file))
>>> assert self.load_image(1).shape == (512, 512, 3)
>>> assert self.load_region(1, (slice(-20), slice(-10))).shape == (492, 502, 3)
>>> # xdoctest: +REQUIRES(module:osgeo)
>>> self = SimpleFrames.demo(backend='cog')
>>> assert self.load_image(1).shape == (512, 512, 3)
>>> assert self.load_region(1, (slice(-20), slice(-10))).shape == (492, 502, 3)
Benchmark:
>>> from ndsampler.abstract_frames import *  # NOQA
>>> import ubelt as ub
>>> #
>>> ti = ub.Timerit(100, bestof=3, verbose=2)
>>> #
>>> self = SimpleFrames.demo(backend='cog')
>>> for timer in ti.reset('cog-small-subregion'):
>>>     self.load_image(1)[10:42, 10:42]
>>> #
>>> self = SimpleFrames.demo(backend='npy')
>>> for timer in ti.reset('npy-small-subregion'):
>>>     self.load_image(1)[10:42, 10:42]
>>> print('----')
>>> #
>>> self = SimpleFrames.demo(backend='cog')
>>> for timer in ti.reset('cog-large-subregion'):
>>>     self.load_image(1)[3:-3, 3:-3]
>>> #
>>> self = SimpleFrames.demo(backend='npy')
>>> for timer in ti.reset('npy-large-subregion'):
>>>     self.load_image(1)[3:-3, 3:-3]
>>> print('----')
>>> #
>>> self = SimpleFrames.demo(backend='cog')
>>> for timer in ti.reset('cog-loadimage'):
>>>     self.load_image(1)
>>> #
>>> self = SimpleFrames.demo(backend='npy')
>>> for timer in ti.reset('npy-loadimage'):
>>>     self.load_image(1)
DEFAULT_NPY_CONFIG
DEFAULT_COG_CONFIG
__getstate__()[source]
__setstate__(state)[source]
_update_backend(backend)[source]

Change the backend and update internals accordingly.

classmethod _coerce_backend_config(backend=None)[source]

Coerce a backend argument into a valid configuration dictionary.

Returns

a dictionary with two items: 'type', which is a string, and 'config', which is a dictionary of parameters for the specific type.

Return type

Dict
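
The coercion described above can be sketched roughly as follows. This is a minimal illustration rather than ndsampler's actual implementation: the function name `coerce_backend_config`, the `ASSUMED_DEFAULTS` table, and the 'auto' compression default are all assumptions made for this example.

```python
import copy

# Hypothetical defaults, for illustration only. The real values of
# DEFAULT_NPY_CONFIG and DEFAULT_COG_CONFIG are not shown in this document.
ASSUMED_DEFAULTS = {
    'npy': {},
    'cog': {'compress': 'auto'},
}


def coerce_backend_config(backend=None):
    """Normalize None, a string, or a dict into {'type': ..., 'config': ...}."""
    if backend is None:
        return {'type': None, 'config': {}}
    if isinstance(backend, str):
        backend = {'type': backend}
    if not isinstance(backend, dict):
        raise TypeError('backend must be None, a str, or a dict')
    backend_type = backend.get('type')
    if backend_type not in ASSUMED_DEFAULTS:
        raise KeyError('unknown backend type: {!r}'.format(backend_type))
    # Start from the assumed defaults and overlay user-specified options.
    config = copy.deepcopy(ASSUMED_DEFAULTS[backend_type])
    config.update(backend.get('config', {}))
    return {'type': backend_type, 'config': config}


npy_config = coerce_backend_config('npy')
cog_config = coerce_backend_config({'type': 'cog', 'config': {'compress': 'LZW'}})
```

Normalizing every accepted form into one dictionary shape means the rest of the class only has to handle a single representation.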

property cache_dpath

Returns the path where cached frame representations will be stored.

This will be None if there is no backend.

abstract _build_pathinfo(image_id)[source]

A user specified function that maps an image id to paths to relevant resources on disk. These resources are also indexed by channel.

SeeAlso:

_populate_chan_info for helping populate cache info in each channel.

Parameters

image_id – the image id (usually an integer)

Returns

a dictionary with the following structure:

    {
        <NotFinalized>
        'channels': {
            <channel_spec>: {'path': <abspath>, …},
            …
        }
    }

Return type

Dict
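
For a strict one-file-per-image dataset, a dictionary of this shape might be built as in the sketch below. The helper `build_simple_pathinfo`, the example path, and the channel key 'rgb' are hypothetical placeholders. The source marks the real structure as not finalized, so only the 'channels' and 'path' keys come from the description above.

```python
import os


def build_simple_pathinfo(id_to_path, image_id, channel_spec='rgb'):
    """Hypothetical helper: map an image id to a pathinfo-style dictionary.

    The channel key 'rgb' is a placeholder chosen for this example.
    """
    path = os.path.abspath(id_to_path[image_id])
    return {
        'channels': {
            channel_spec: {'path': path},
        },
    }


# Assumed mapping from image id to a (made-up) image path.
id_to_path = {1: 'images/astro.png'}
pathinfo = build_simple_pathinfo(id_to_path, 1)
```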

_lookup_pathinfo(image_id)[source]
_populate_chan_info(chan, root='')[source]

Helper to construct a path dictionary in the _build_pathinfo method based on the current hashing and caching settings.

static _build_file_hashid(root, suffix, hashid_mode)[source]

Build a hashid for a specific file given as a path root and suffix.
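
The three hashid modes named in the Frames parameters (PATH, PIXELS, GIVEN) could behave roughly like the sketch below. It is an assumption-laden illustration, not the library's code: the function `build_file_hashid`, the 16-character SHA-1 prefix, and the use of raw file bytes as a stand-in for pixel data are all choices made for this example.

```python
import hashlib
import os
import tempfile


def build_file_hashid(root, suffix, hashid_mode='PATH'):
    """Hypothetical sketch of the PATH, PIXELS, and GIVEN hashing modes."""
    fpath = os.path.join(root, suffix)
    if hashid_mode == 'PATH':
        # Hash the path string only: cheap, but changes if the file moves.
        return hashlib.sha1(fpath.encode('utf8')).hexdigest()[0:16]
    elif hashid_mode == 'PIXELS':
        # Hash the file contents (raw bytes here): slower, but stable
        # if the file is moved.
        hasher = hashlib.sha1()
        with open(fpath, 'rb') as file:
            for chunk in iter(lambda: file.read(1 << 20), b''):
                hasher.update(chunk)
        return hasher.hexdigest()[0:16]
    elif hashid_mode == 'GIVEN':
        # Trust the caller: reuse the file stem as the identifier.
        return os.path.splitext(os.path.basename(fpath))[0]
    raise KeyError(hashid_mode)


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, 'img.bin'), 'wb') as file:
        file.write(b'abc')
    hashids = {
        mode: build_file_hashid(root, 'img.bin', mode)
        for mode in ['PATH', 'PIXELS', 'GIVEN']
    }
```

The trade-off is between speed (PATH never reads the file) and robustness (content hashing survives renames).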

property image_ids
__len__()[source]
__getitem__(index)[source]
load_region(image_id, region=None, channels=ub.NoParam, width=None, height=None)[source]

Amortized O(1) image subregion loading (assuming constant region size)

Parameters
  • image_id (int) – image identifier

  • region (Tuple[slice, …]) – space-time region within an image

  • channels (str) – NotImplemented

  • width (int) – if the width of the entire image is known, specify it

  • height (int) – if the height of the entire image is known, specify it
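
The region argument is a tuple of ordinary Python slices, so negative and open-ended bounds follow standard slicing rules. The sketch below shows why `(slice(-20), slice(-10))` on a 512 x 512 image yields a 492 x 502 crop, as in the examples for this class. The helper `region_shape` is hypothetical and not part of ndsampler. It also hints at one plausible use of width and height: with the full dimensions known, relative bounds can be resolved without opening the image.

```python
def region_shape(region, full_shape):
    """Resolve a tuple of slices against full image dims; return the crop shape."""
    shape = []
    for sl, dim in zip(region, full_shape):
        # slice.indices clips and normalizes negative or missing bounds.
        start, stop, step = sl.indices(dim)
        shape.append(len(range(start, stop, step)))
    return tuple(shape)


# slice(-20) means "everything except the last 20 rows".
crop_shape = region_shape((slice(-20), slice(-10)), (512, 512))
print(crop_shape)  # (492, 502)
```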

_load_alignable(image_id, cache=True)[source]
load_image(image_id, channels=ub.NoParam, cache=True, noreturn=False)[source]

Load the image data for a particular image id

Parameters
  • image_id (int) – the id of the image to load

  • cache (bool, default=True) – ensure and return the efficient backend cached representation.

  • channels – NotImplemented

  • noreturn (bool, default=False) – if True, nothing is returned. This is useful if you simply want to ensure the cached representation.

CAREFUL: THIS NEEDS TO MAINTAIN A STABLE API. OTHER PROJECTS DEPEND ON IT.

Returns

an indexable array-like representation, possibly memmapped.

Return type

ArrayLike

load_frame(image_id)[source]

TODO: FINISHME or rename to lazy frame?

Returns a frame object that lazy loads on slice
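
Since this method is marked unfinished, the following is only a sketch of what "lazy loads on slice" could mean. The `LazyFrame` class and its `loader` callable are hypothetical, and a plain list stands in for image data.

```python
class LazyFrame:
    """Hypothetical lazy wrapper: data is loaded on the first slice access."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable returning the data
        self._data = None
        self.num_loads = 0

    def __getitem__(self, index):
        if self._data is None:
            # Defer the expensive load until someone actually indexes.
            self._data = self._loader()
            self.num_loads += 1
        return self._data[index]


frame = LazyFrame(lambda: list(range(100)))
first = frame[10:13]
second = frame[-2:]
```

Constructing the object costs nothing. The load happens once, on the first index, and later accesses reuse the loaded data.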

prepare(gids=None, workers=0, use_stamp=True)[source]

Precompute the cached frame conversions

Parameters
  • gids (List[int] | None) – specific image ids to prepare. If None prepare all images.

  • workers (int, default=0) – number of parallel threads for this io-bound task
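
The serial versus threaded behavior controlled by workers can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: `prepare_all`, the `convert` callable, and the in-memory `cache` dictionary are hypothetical stand-ins for the real on-disk conversion.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_all(image_ids, convert, workers=0):
    """Hypothetical sketch: ensure the cached form of every image exists.

    ``convert`` is an assumed callable that builds (or reuses) the cache
    entry for one image id and returns a status string.
    """
    image_ids = list(image_ids)
    if workers <= 0:
        # Serial mode: run in the calling thread.
        return [convert(image_id) for image_id in image_ids]
    # The task is IO-bound, so threads are sufficient.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert, image_ids))


cache = {}


def fake_convert(image_id):
    # Stand-in for the real conversion: 'miss' on first use, 'hit' afterwards.
    status = 'hit' if image_id in cache else 'miss'
    cache[image_id] = True
    return status


first_pass = prepare_all([1, 2, 3], fake_convert, workers=0)
second_pass = prepare_all([1, 2, 3], fake_convert, workers=3)
```

The second pass reports only hits, which mirrors the miss-then-hit pattern exercised by the examples for this method.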

Example

>>> from ndsampler.abstract_frames import *
>>> workdir = ub.ensure_app_cache_dir('ndsampler/tests/test_cog_precomp')
>>> print('workdir = {!r}'.format(workdir))
>>> ub.delete(workdir)
>>> ub.ensuredir(workdir)
>>> self = SimpleFrames.demo(backend='npy', workdir=workdir)
>>> print('self = {!r}'.format(self))
>>> print('self.cache_dpath = {!r}'.format(self.cache_dpath))
>>> #_ = ub.cmd('tree ' + workdir, verbose=3)
>>> self.prepare()
>>> self.prepare()
>>> #_ = ub.cmd('tree ' + workdir, verbose=3)
>>> _ = ub.cmd('ls ' + self.cache_dpath, verbose=3)

Example

>>> from ndsampler.abstract_frames import *
>>> import ndsampler
>>> workdir = ub.get_app_cache_dir('ndsampler/tests/test_cog_precomp2')
>>> ub.delete(workdir)
>>> # TEST NPY
>>> #
>>> sampler = ndsampler.CocoSampler.demo(workdir=workdir, backend='npy')
>>> self = sampler.frames
>>> ub.delete(self.cache_dpath)  # reset
>>> self.prepare()  # serial, miss
>>> self.prepare()  # serial, hit
>>> ub.delete(self.cache_dpath)  # reset
>>> self.prepare(workers=3)  # parallel, miss
>>> self.prepare(workers=3)  # parallel, hit
>>> #
>>> ## TEST COG
>>> # xdoctest: +REQUIRES(module:osgeo)
>>> sampler = ndsampler.CocoSampler.demo(workdir=workdir, backend='cog')
>>> self = sampler.frames
>>> ub.delete(self.cache_dpath)  # reset
>>> self.prepare()  # serial, miss
>>> self.prepare()  # serial, hit
>>> ub.delete(self.cache_dpath)  # reset
>>> self.prepare(workers=3)  # parallel, miss
>>> self.prepare(workers=3)  # parallel, hit
class ndsampler.abstract_frames.SimpleFrames(id_to_path, workdir=None, backend=None)[source]

Bases: Frames

Basic concrete implementation of frames objects for images where there is a strict one-file-to-one-image mapping (i.e. no auxiliary images).

Parameters

id_to_path (Dict) – mapping from image-id to image path

Example

>>> from ndsampler.abstract_frames import *
>>> self = SimpleFrames.demo(backend='npy')
>>> pathinfo = self._build_pathinfo(1)
>>> print('pathinfo = {}'.format(ub.repr2(pathinfo, nl=3)))
>>> assert self.load_image(1).shape == (512, 512, 3)
>>> assert self.load_region(1, (slice(-20), slice(-10))).shape == (492, 502, 3)
_lookup_gpath(image_id)[source]
image_ids()
classmethod demo(**kw)[source]

Get a simple frames object

_build_pathinfo(image_id)[source]

A user specified function that maps an image id to paths to relevant resources on disk. These resources are also indexed by channel.

SeeAlso:

_populate_chan_info for helping populate cache info in each channel.

Parameters

image_id – the image id (usually an integer)

Returns

a dictionary with the following structure:

    {
        <NotFinalized>
        'channels': {
            <channel_spec>: {'path': <abspath>, …},
            …
        }
    }

Return type

Dict

class ndsampler.abstract_frames.AlignableImageData(pathinfo, cache_backend)[source]

Bases: object

Class for sampling channels / frames that are aligned with each other

Todo

  • [ ] This is more general than the older way of accessing image data; however, there is a lot more logic that hasn't been profiled, so we may be able to find meaningful optimizations.

  • [ ] Make sure adding this didn't significantly hurt performance

  • [ ] DEPRECATE THIS IN FAVOR OF NEW KWCOCO DELAYED LOGIC

Example

>>> from ndsampler.abstract_frames import *
>>> frames = SimpleFrames.demo(backend='npy')
>>> pathinfo = frames._build_pathinfo(1)
>>> cache_backend = frames._backend
>>> print('pathinfo = {}'.format(ub.repr2(pathinfo, nl=3)))
>>> self = AlignableImageData(pathinfo, cache_backend)
>>> img_region = None
>>> prefused = self._load_prefused_region(img_region)
>>> print('prefused = {!r}'.format(prefused))
>>> img_region = (slice(0, 10), slice(0, 10))
>>> prefused = self._load_prefused_region(img_region)
>>> print('prefused = {!r}'.format(prefused))
_load_native_channel(chan_name, cache=True)[source]

Load a specific auxiliary channel, optionally caching it

_load_delayed_channel(chan_name, cache=True)[source]
_coerce_channels(channels=ub.NoParam)[source]
_load_prefused_region(img_region, channels=ub.NoParam)[source]

Loads crops from multiple channels in their native coordinate system packaged with transformation info on how to align them.

_load_fused_region(img_region, channels=ub.NoParam)[source]

Loads crops from multiple channels in aligned base coordinates.

load_region(img_region, channels=ub.NoParam, fused=True)[source]
Parameters

img_region (Tuple[slice, …]) – slice into the base image (will be warped into the auxiliary image’s frames)
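
Warping a base-image region into an auxiliary channel's frame can be pictured with the simplified sketch below. It assumes the channels differ only by a uniform scale factor and that slice bounds are explicit and non-negative. The real alignment may involve a more general transform, and `warp_slice` is a hypothetical helper.

```python
import math


def warp_slice(sl, scale, aux_dim):
    """Map a base-image slice into an auxiliary channel scaled by ``scale``.

    Hypothetical helper: assumes a uniform scale factor and a slice with
    explicit, non-negative bounds.
    """
    # Round outward so the warped region fully covers the requested one.
    start = max(0, math.floor(sl.start * scale))
    stop = min(aux_dim, math.ceil(sl.stop * scale))
    return slice(start, stop)


# An auxiliary channel stored at half the base resolution (256 x 256).
base_region = (slice(10, 42), slice(100, 201))
aux_region = tuple(warp_slice(sl, 0.5, 256) for sl in base_region)
print(aux_region)  # (slice(5, 21, None), slice(50, 101, None))
```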

__getitem__(img_region)[source]