Schema Manipulation Utilities

The scidbpy.schema_utils module contains functions useful for manipulating array schemas. Many native SciDB functions require that the schemas of input arrays obey certain properties, like having identical chunk sizes. The routines in this module help to preprocess arrays to satisfy these requirements.

The functions in this module return their inputs unchanged when no modification is necessary, so you do not have to check beforehand whether a given preprocessing step is needed.

See Robust AFL Operators for a collection of SciDB-Py analogs to AFL functions, which perform necessary array preprocessing automatically.

Functions

scidbpy.schema_utils.as_column_vector(array)

Convert a 1D array into a 2D array with a single column

scidbpy.schema_utils.as_row_vector(array)

Convert a 1D array into a 2D array with a single row

scidbpy.schema_utils.as_same_dimension(*arrays)

Coerce arrays into the same shape if possible, or raise a ValueError

Parameters:

*arrays

One or more arrays to test

Returns:

new_arrays : tuple of SciDBArrays

Raises:

ValueError

if arrays have mismatched dimensions, and cannot be coerced into the same shape.

Notes

Currently this function only checks for mismatched dimensions; it is unable to fix them.

scidbpy.schema_utils.assert_schema(arrays, zero_indexed=False, bounded=False, same_attributes=False, same_dimension=False)

Check that a set of arrays obeys a set of criteria on their schemas.

Parameters:

arrays : tuple of SciDBArrays

The arrays to check

zero_indexed : boolean, optional (default False)

If True, check that all arrays have origins at 0

bounded : boolean, optional (default False)

If True, check that all arrays are bounded (i.e., don’t have * in the dimension schema)

same_attributes : boolean, optional (default False)

If True, check that all arrays have identical attribute names, order, datatypes, and nullability

same_dimension : boolean, optional (default False)

If True, check that all arrays have the same dimensionality

Returns:

arrays : tuple of SciDBArrays

The input

Raises:

ValueError : If any test fails
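
The four checks can be pictured with plain Python data, without a SciDB connection. The sketch below is not the module's implementation: `Dim`, `FakeArray`, and `check_schema` are hypothetical stand-ins, and an upper bound of `None` is assumed to model `*`.

```python
from collections import namedtuple

# Hypothetical stand-ins for a dimension and an array schema.
Dim = namedtuple("Dim", "name low high")           # high=None models '*'
FakeArray = namedtuple("FakeArray", "attrs dims")  # attrs: (name, type, nullable) tuples


def check_schema(arrays, zero_indexed=False, bounded=False,
                 same_attributes=False, same_dimension=False):
    """Raise ValueError if a requested check fails; otherwise return the input."""
    if zero_indexed and any(d.low != 0 for a in arrays for d in a.dims):
        raise ValueError("all arrays must have origins at 0")
    if bounded and any(d.high is None for a in arrays for d in a.dims):
        raise ValueError("all arrays must be bounded")
    if same_attributes and len({a.attrs for a in arrays}) > 1:
        raise ValueError("attribute names, order, types and nullability must match")
    if same_dimension and len({len(a.dims) for a in arrays}) > 1:
        raise ValueError("all arrays must have the same dimensionality")
    return arrays


x = FakeArray((("v", "double", False),), (Dim("i", 0, 9),))
y = FakeArray((("v", "double", False),), (Dim("i", 0, None),))
```

Here `check_schema((x, y), zero_indexed=True)` returns its input, while `check_schema((x, y), bounded=True)` raises a ValueError because `y` is unbounded.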

scidbpy.schema_utils.assert_single_attribute(array)

Raise a ValueError if an array has multiple attributes

Parameters:

array : SciDBArray

The array to test

Returns:

array : SciDBArray

The input array

Raises:

ValueError

if array has multiple attributes

scidbpy.schema_utils.boundify(array, trim=False)

Redimension an array as needed so that no dimension is unbounded (i.e., ending with *)

Parameters:

array : SciDBArray

The array to bound

Returns:

array : SciDBArray

A (possibly redimensioned) version of array

Notes

This forces evaluation of lazy arrays

scidbpy.schema_utils.cast_to_integer(array, attributes)

Cast a set of attributes in an array to integer datatypes.

This is a useful preprocessing step before redimensioning attributes as dimensions

scidbpy.schema_utils.change_axis_schema(datashape, axis, start=None, stop=None, chunk=None, overlap=None, name=None)

Create a new DataShape by modifying the parameters of one axis

Parameters:

datashape : SciDBDataShape

The template data shape

axis : int

Which axis to modify

start : int (optional)

New axis lower bound

stop : int (optional)

New axis upper bound

chunk : int (optional)

New chunk size

overlap : int (optional)

New chunk overlap

name : str (optional)

New dimension name

Returns:

new_schema : SciDBDataShape

The new schema, obtained by overriding the input parameters of the template datashape along the specified axis
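
The override behaviour can be sketched in plain Python. The `Axis` tuple and `change_axis` helper below are hypothetical and much simpler than a real SciDBDataShape; the sketch only illustrates, under the assumption that `None` means "keep the template value", how one axis is modified while the template stays untouched.

```python
from collections import namedtuple

# Hypothetical, simplified model of one axis of a schema.
Axis = namedtuple("Axis", "name start stop chunk overlap")


def change_axis(axes, axis, start=None, stop=None, chunk=None,
                overlap=None, name=None):
    """Return a new list of axes; only the fields given for `axis` change."""
    overrides = {"start": start, "stop": stop, "chunk": chunk,
                 "overlap": overlap, "name": name}
    # Drop parameters left at None so the template value survives.
    overrides = {k: v for k, v in overrides.items() if v is not None}
    new_axes = list(axes)
    new_axes[axis] = axes[axis]._replace(**overrides)
    return new_axes


template = [Axis("i", 0, 99, 10, 0), Axis("j", 0, 49, 10, 0)]
modified = change_axis(template, 1, stop=199, chunk=50)
```

After this call, axis 1 of `modified` has a new upper bound and chunk size, and `template` is unchanged.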

scidbpy.schema_utils.coerced_shape(array)

Return an array shape, even if the array is unbound.

If the array is unbound, the shape is guaranteed to contain the data

Parameters:

array : SciDBArray

The array to lookup the shape for

Returns:

shape : tuple of ints

The shape

scidbpy.schema_utils.disambiguate(*arrays)

Process a list of arrays with calls to cast as needed, to avoid any name collisions in dimensions or attributes

The first array is guaranteed not to be modified

Parameters:

*arrays

One or more arrays to process

Returns:

arrays : tuple of SciDBArrays

The (possibly recast) inputs. No dimension or attribute name is shared between any two of the output arrays.

scidbpy.schema_utils.expand(*arrays)

Grow arrays to equal shape, without truncating any data

Parameters:

*arrays

One or more SciDBArrays

Returns:

arrays : tuple of SciDBArrays

The input arrays, redimensioned as needed so they all have the same domain.
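
One way to picture a common domain that truncates no data is the per-dimension union of the input bounds. The helper below is a hypothetical plain-Python sketch; the source does not state how the shared domain is actually chosen, so treat the union rule as an assumption.

```python
def union_domain(*domains):
    """Each domain is a list of (low, high) pairs, one pair per dimension.

    Returns the smallest domain that contains every input domain.
    """
    if len({len(d) for d in domains}) != 1:
        raise ValueError("all domains need the same number of dimensions")
    return [(min(lo for lo, _ in pairs), max(hi for _, hi in pairs))
            for pairs in zip(*domains)]


a = [(0, 9), (0, 4)]
b = [(2, 19), (0, 2)]
common = union_domain(a, b)
```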

scidbpy.schema_utils.left_dimension_pad(array, n)

Add dummy dimensions as needed to an array, so that it is at least n-dimensional.

Parameters:

array : SciDBArray

The array to pad

n : int

The minimum dimensionality of the output

Returns:

array : SciDBArray

A version of the input, with extra dimensions added before the old dimensions.

scidbpy.schema_utils.limits(array, names)

Compute the lower/upper bounds for a set of attributes

Parameters:

array : SciDBArray

The array to consider

names : list of strings

Names of attributes to consider

Returns:

limits : dict mapping name->(lo, hi)

Contains the minimum and maximum value for each attribute

Notes

This performs a full scan of the array
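
The shape of the return value, and the reason a full scan is needed, can be seen in a plain-Python analog. The `attribute_limits` helper and the list-of-dicts cell layout are hypothetical illustrations, not the module's code.

```python
def attribute_limits(cells, names):
    """cells: iterable of dicts (one per array cell); names: attributes to scan."""
    result = {}
    for cell in cells:                      # one full pass over the data
        for name in names:
            value = cell[name]
            lo, hi = result.get(name, (value, value))
            result[name] = (min(lo, value), max(hi, value))
    return result


cells = [{"x": 3, "y": -1.5}, {"x": 7, "y": 2.0}, {"x": 1, "y": 0.0}]
lims = attribute_limits(cells, ["x", "y"])
```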

scidbpy.schema_utils.match_attribute_names(*arrays)

Rename attributes in a set of arrays, so that all arrays have the same attribute names

Parameters:

*arrays

one or more SciDBArrays

Returns:

arrays : tuple of SciDBArrays

All output arrays have the same attribute names

Raises:

ValueError : if arrays aren’t conformable

Notes

An array’s attributes will be renamed to match an attribute name in the first array, if the association is unambiguous. For example, consider two arrays with attribute schemas <a:int32, b:float> and <a:int32, c:float>. The attribute c will be renamed to b, since the datatypes match and there is no other b attribute.
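
The renaming rule from the note can be written out in plain Python. This is one possible reading of "unambiguous" (exactly one free name of the right datatype in the first array, and exactly one attribute competing for it), not the module's actual algorithm; `match_names` is a hypothetical helper working on (name, datatype) pairs.

```python
def match_names(first, other):
    """first, other: lists of (name, dtype) pairs. Returns the renamed `other`."""
    first_names = {n for n, _ in first}
    other_names = {n for n, _ in other}
    result = []
    for name, dtype in other:
        if name in first_names:
            result.append((name, dtype))    # already matches, keep as is
            continue
        # Names in `first` with this datatype that `other` does not already use.
        candidates = [n for n, t in first if t == dtype and n not in other_names]
        # Unmatched attributes in `other` competing for those names.
        rivals = [n for n, t in other if t == dtype and n not in first_names]
        if len(candidates) != 1 or len(rivals) != 1:
            raise ValueError("cannot unambiguously rename %r" % name)
        result.append((candidates[0], dtype))
    return result


renamed = match_names([("a", "int32"), ("b", "float")],
                      [("a", "int32"), ("c", "float")])
```

With the schemas from the note, `c` becomes `b`. If the first array had two free float attributes, the sketch would raise a ValueError instead of guessing.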

scidbpy.schema_utils.match_chunk_permuted(src, target, indices, match_bounds=False)

Match chunks along a set of dimension pairs.

Parameters:

src : SciDBArray

The array to modify

target: SciDBArray

The array to match

indices: A list of tuples

Each tuple (i,j) indicates that dimension j of src should have the same chunk properties as dimension i of target

match_bounds : bool (optional, default False)

If true, match the dimension boundaries as well

Returns:

new_src, new_target : tuple of SciDBArrays

A (possibly redimensioned) version of the inputs
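
The meaning of the `indices` pairs is easy to get backwards, so here is a plain-Python sketch of the (i, j) convention stated above. `copy_chunks` is a hypothetical helper that works on lists of (chunk_size, overlap) tuples and only models the change to `src`.

```python
def copy_chunks(src_chunks, target_chunks, indices):
    """src_chunks, target_chunks: lists of (chunk_size, overlap), one per dimension."""
    new_src = list(src_chunks)
    for i, j in indices:
        # Dimension j of src takes the chunking of dimension i of target.
        new_src[j] = target_chunks[i]
    return new_src


src = [(1000, 0), (1000, 0)]
target = [(32, 0), (64, 2)]
new_src = copy_chunks(src, target, [(1, 0)])
```

The pair (1, 0) copies the chunk properties of dimension 1 of `target` onto dimension 0 of `src`; dimension 1 of `src` is left alone.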

scidbpy.schema_utils.match_chunks(*arrays)

Redimension arrays so they have identical chunk sizes and overlaps

It is assumed that all input arrays have the same dimensionality. If needed, use as_same_dimension() to ensure this.

Parameters:

*arrays

One or more arrays to match

Returns:

arrays : Tuple of SciDBArrays

The chunk sizes and overlaps will be matched to the first input.

See also

match_chunk_permuted : to match chunks along particular pairs of dimensions

scidbpy.schema_utils.match_dimensions(A, B, dims)

Match the dimension bounds along a list of dimensions in 2 arrays.

Parameters:

A : SciDBArray

First array

B : SciDBArray

Second array

dims : list of pairs of integers

Each (i,j) pair indicates that dimension i of A should have the same boundaries as dimension j of B

Returns:

Anew, Bnew : SciDBArrays

(Possibly redimensioned) versions of A and B

scidbpy.schema_utils.match_size(*arrays)

Resize all arrays in a list to the size of the first array. This requires that all arrays span a subset of the first array’s domain.

Parameters:

*arrays

One or more SciDBArrays

Returns:

arrays : tuple of SciDBArrays

The (possibly redimensioned) inputs. All arrays are resized to match the first array

Raises:

ValueError : If any arrays have a domain that is not a subset of the first array’s domain.
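
The subset requirement can be checked on bounds alone. The sketch below is a hypothetical plain-Python illustration that treats each domain as a list of (low, high) pairs; it is not the module's implementation.

```python
def check_subset(*domains):
    """Each domain is a list of (low, high) pairs, one per dimension."""
    first = domains[0]
    for dom in domains[1:]:
        for (lo, hi), (flo, fhi) in zip(dom, first):
            if lo < flo or hi > fhi:
                raise ValueError("domain %r is not inside %r" % (dom, first))
    # Every array would be resized to the first array's domain.
    return [list(first) for _ in domains]


resized = check_subset([(0, 9)], [(2, 5)])
```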

scidbpy.schema_utils.rechunk(array, chunk_size=None, chunk_overlap=None)

Change the chunk size and/or overlap

Parameters:

array : SciDBArray

The array to rechunk

chunk_size : int or list of ints (optional)

The new chunk_size. Defaults to old chunk_size

chunk_overlap : int or list of ints (optional)

The new chunk overlap. Defaults to old chunk overlap

Returns:

array : SciDBArray

A (possibly redimensioned) version of the input

scidbpy.schema_utils.redimension(array, dimensions, attributes, dim_boundaries=None)

Redimension an array as needed, swapping attributes with dimensions and dropping unlisted ones.

Parameters:

array: SciDBArray

The array to redimension

dimensions : list of strings

The dimensions or attributes in array that should be dimensions

attributes : list of strings

The dimensions or attributes in array that should be attributes

dim_boundaries : dict (optional)

A dictionary mapping dimension names to boundary tuples (lo, hi). Specifies the dimension bounds for attributes promoted to dimensions. If not provided, the bounds default to (0,*). WARNING: this will fail if promoting negatively-valued attributes to dimensions.

Returns:

result : SciDBArray

A new version of array, redimensioned as needed to ensure proper dimension/attribute schema.

Notes

  • Only integer attributes can be listed as dimensions
  • If an attribute or dimension in the original array is not explicitly provided as an input, it is dropped
  • If no attributes are specified, a new dummy attribute is added to ensure a valid schema.
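
The three rules in the notes amount to simple bookkeeping on names, sketched here in plain Python. `plan_redimension` and the placeholder name `__dummy` are hypothetical; the real function operates on SciDB arrays and may name its dummy attribute differently.

```python
def plan_redimension(schema, dimensions, attributes):
    """schema maps every name (dimension or attribute) to its datatype.

    Returns (new_dimensions, new_attributes, dropped_names).
    """
    for name in dimensions:
        if not schema[name].startswith(("int", "uint")):
            raise ValueError("%s is not an integer, cannot be a dimension" % name)
    kept = set(dimensions) | set(attributes)
    dropped = [n for n in schema if n not in kept]
    # With no attributes requested, add a placeholder so the schema stays valid.
    new_attributes = list(attributes) or ["__dummy"]
    return list(dimensions), new_attributes, dropped


schema = {"i": "int64", "x": "int32", "y": "double", "label": "string"}
dims, attrs, dropped = plan_redimension(schema, ["x"], ["y"])
```

In this example `i` and `label` are dropped because they are not listed, and asking for `y` as a dimension would raise a ValueError because it is not an integer.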

scidbpy.schema_utils.right_dimension_pad(array, n)

Add dummy dimensions as needed to an array, so that it is at least n-dimensional.

Parameters:

array : SciDBArray

The array to pad

n : int

The minimum dimensionality of the output

Returns:

array : SciDBArray

A version of the input, with extra dimensions added after the old dimensions.
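
The difference between this function and left_dimension_pad is only where the dummy dimensions go. The plain-Python sketch below shows the effect on a shape tuple, assuming each dummy dimension has length 1; both helpers are hypothetical and operate on tuples rather than SciDB arrays.

```python
def left_pad_shape(shape, n):
    """Prepend length-1 dimensions until the shape has at least n entries."""
    return (1,) * max(0, n - len(shape)) + tuple(shape)


def right_pad_shape(shape, n):
    """Append length-1 dimensions until the shape has at least n entries."""
    return tuple(shape) + (1,) * max(0, n - len(shape))


left = left_pad_shape((5,), 3)
right = right_pad_shape((5,), 3)
```

A shape that already has n or more dimensions is returned unchanged by both helpers.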

scidbpy.schema_utils.to_attributes(array, *dimensions)

Ensure that a set of attributes or dimensions are attributes

Parameters:

array : SciDBArray

The array to demote

dimensions : one or more strings

Dimension names to demote. Attribute labels are ignored

Returns:

demoted : SciDBArray

A new array

scidbpy.schema_utils.to_dimensions(array, *attributes)

Ensure that a set of attributes or dimensions are dimensions

Parameters:

array : SciDBArray

The array to promote

attributes : one or more strings

Attribute names to promote. Dimension labels are ignored

Returns:

promoted : SciDBArray

A new array

scidbpy.schema_utils.zero_indexed(array)

Redimension an array so all lower coordinates are at 0

Raises:

ValueError : if any dimension of the array starts below zero.
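
A plain-Python sketch of the bounds logic follows. It assumes that the lower bound is simply moved to 0 while the data keep their coordinates, which would explain why a negative origin cannot be handled; the source does not spell this out, and `shift_to_zero` is a hypothetical helper.

```python
def shift_to_zero(bounds):
    """bounds: list of (low, high) pairs, one per dimension."""
    if any(lo < 0 for lo, _ in bounds):
        raise ValueError("dimensions starting below zero are not supported")
    # Moving the origin to 0 widens the range downward; the upper bound stays.
    return [(0, hi) for _, hi in bounds]


zeroed = shift_to_zero([(3, 10), (0, 5)])
```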