API Reference

This is the list of classes and functions available in SciDB-py.

SciDB Array Class

class scidbpy.SciDBArray(datashape, interface, name, persistent=False)

SciDBArray class

It is not recommended to instantiate this class directly; use a convenience routine from SciDBInterface.

Attributes

T Permute the dimensions of an array.
afl An alias to the AFL namespace
att_names
chunk_overlap
chunk_size
datashape
dim_names
dtype
natt
ndim
persistent Controls whether the array is deleted when the database is reaped
query
schema Return the array schema
sdbslice
sdbtype
shape
size

Methods

aggregate(*args, **kwargs) Perform one or more aggregations over an array.
alias([name]) Return an alias of the array, optionally with a new name
all() Returns whether all elements of each attribute are true.
any() Returns whether any elements of each attribute are true.
approxdc([index, scidb_syntax]) Return the number of distinct values of the array or along an axis.
as_temp([name]) Create a SciDB TEMP array, stored in RAM
att(a) Return the attribute name of the array.
attribute(a) Return the attribute name of the array.
attribute_rename(*args) Rename a set of attributes
avg([index, scidb_syntax]) Return the average of the array or the average along an axis.
collapse() Flatten and remove all the empty cells.
compress(mask[, axis]) Extract a subset of entries along a given axis, where an input mask array is non-null
contains_nulls([attr]) Return True if the array contains null values.
contents(**kwargs) Return a string representation of the array contents
copy([new_name, persistent]) Make a copy of the array in the database
count([index, scidb_syntax]) Return the count of the array or the count along an axis.
cumprod([axis]) Return the cumulative product over the array.
cumsum([axis]) Return the cumulative sum over the array.
cumulate(expression[, dimension]) Compute running operations along data (e.g., cumulative sums)
dimension(d) Return the dimension name of the array
dimension_rename(*args) Rename a set of dimensions
eval([out, store]) If the array is backed by an unevaluated query, evaluate the query and store the result in the database
from_query(interface, query) Build a lazily-evaluated SciDB array from a query string
groupby(by) Build a groupby object from this array
head([n]) Extract and download the first few elements in the array
index_lookup(idx_array, attribute[, ...]) Wrapper around the index_lookup AFL call.
isel(**kwargs) Select a subset of the array by dimension name
issparse() Check whether array is sparse.
max([index, scidb_syntax]) Return the maximum of the array or the maximum along an axis.
mean([index, scidb_syntax]) Return the average of the array or the average along an axis.
min([index, scidb_syntax]) Return the minimum of the array or the minimum along an axis.
nonempty() Return the number of nonempty elements in the array.
nonnull([attr]) Return the number of non-empty and non-null values.
reap([ignore]) Delete this object from the database if it isn’t persistent.
regrid(size[, aggregate]) Regrid the array using the specified aggregate
relabel(renames) Relabel the attributes or dimensions in an array.
rename(new_name[, persistent]) Rename the array in the database, optionally making the new array persistent.
reshape(shape, **kwargs) Reshape data into a new array
std([index, scidb_syntax]) Return the standard deviation of the array or along an axis.
stdev([index, scidb_syntax]) Return the standard deviation of the array or along an axis.
substitute(value) Reshape data into a new array, substituting a default for any nulls.
sum([index, scidb_syntax]) Return the sum of the array or the sum along an axis.
toarray(**kwargs) Transfer data from database and store in a numpy array.
todataframe(**kwargs) Transfer array from database and store in a local Pandas dataframe
tolist(**kwargs) Download the array as a (nested) python list
tosparse([sparse_fmt]) Transfer array from database and store in a local sparse array.
transpose(*axes) Permute the dimensions of an array.
unpack([name]) Unpack with automatic dimension name disambiguation
var([index, scidb_syntax]) Return the variance of the array or the variance along an axis.
T

Permute the dimensions of an array.

Parameters:

axes : None, tuple of ints, or n ints

  • None or no argument: reverses the order of the axes.
  • tuple of ints: i in the j-th place in the tuple means a’s i-th axis becomes a.transpose()’s j-th axis.
  • n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)

Returns:

out : ndarray

Copy of a, with axes suitably permuted.

afl

An alias to the AFL namespace

aggregate(*args, **kwargs)

Perform one or more aggregations over an array.

Parameters:

*args: One or more SciDB aggregate expressions

Aggregations to perform, like ‘sum(value) as x’

by :

See also

groupby

Examples

>>> x = sdb.arange(10).reshape((5, 2))
>>> x.aggregate('count(*)').toarray()
>>> x.aggregate('max(f0)', by='i0').toarray()

alias(name=None)

Return an alias of the array, optionally with a new name

all()

Returns whether all elements of each attribute are true.

Returns:

all : SciDBArray

boolean array

any()

Returns whether any elements of each attribute are true.

Returns:

any : SciDBArray

boolean array

approxdc(index=None, scidb_syntax=False)

Return the number of distinct values of the array or along an axis.

The distinct count is an estimate only.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

as_temp(name=None)

Create a SciDB TEMP array, stored in RAM

Returns:

temp : SciDBArray

A new array, stored in-memory in the database

att(a)

Return the attribute name of the array.

Parameters:

a : int

Index of the attribute to lookup

attribute(a)

Return the attribute name of the array.

Parameters:

a : int

Index of the attribute to lookup

attribute_rename(*args)

Rename a set of attributes

Parameters:

args : (old_name, new_name, ...)

0 or more rename pairs

Returns:

renamed : SciDBArray

The new array

avg(index=None, scidb_syntax=False)

Return the average of the array or the average along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

collapse()

Flatten and remove all the empty cells.

Returns:

collapsed : SciDBArray

A new 1D dense array, containing all of the nonempty cells in this array.

compress(mask, axis=0)

Extract a subset of entries along a given axis, where an input mask array is non-null

Parameters:

array : SciDBArray

The array to filter

mask : SciDBArray

A 1-dimensional SciDBArray, whose non-null values indicate the entries to retain

axis : int

The axis of array along which to apply the mask. The shape of array along this axis must be the length of mask
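
The masking rule can be modelled on plain Python lists. This is only a sketch of the wording above, not SciDB-py code: `compress_lists` is a hypothetical helper, nulls are represented by `None`, and only 2-D nested lists are handled.

```python
def compress_lists(array, mask, axis=0):
    """Hypothetical model: keep entries along `axis` where mask is non-null."""
    length = len(array) if axis == 0 else len(array[0])
    if len(mask) != length:
        raise ValueError("mask length must match the array shape along axis")
    if axis == 0:
        # Keep whole rows whose mask entry is not null.
        return [row for row, m in zip(array, mask) if m is not None]
    if axis == 1:
        # Keep the columns whose mask entry is not null.
        return [[v for v, m in zip(row, mask) if m is not None] for row in array]
    raise ValueError("this sketch only handles axis 0 or 1")

array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [1, None, 1]
print(compress_lists(array, mask, axis=0))  # [[1, 2, 3], [7, 8, 9]]
print(compress_lists(array, mask, axis=1))  # [[1, 3], [4, 6], [7, 9]]
```

Following the wording above, the test is "non-null" rather than "truthy", so a mask value of 0 would still retain its entry in this model.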

contains_nulls(attr=None)

Return True if the array contains null values.

Parameters:

attr : None, int, or array_like

the attribute index/indices to check. If None, then check all.

Returns:

contains_nulls : boolean

contents(**kwargs)

Return a string representation of the array contents

copy(new_name=None, persistent=False)

Make a copy of the array in the database

Parameters:

new_name : string (optional)

If specified, must be a valid array name which does not already exist in the database.

persistent : boolean (optional)

specify whether the new array is persistent (default=False)

Returns:

copy : SciDBArray

return a copy of the original array

count(index=None, scidb_syntax=False)

Return the count of the array or the count along an axis.

The count is equal to the number of nonnull elements.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

cumprod(axis=None)

Return the cumulative product over the array.

Parameters:

axis : int, optional

The axis to multiply over. The default multiplies over the flattened array

Returns:

prods : SciDBArray

A new array, with the same shape (but flattened if axis=None)

See also

cumsum, cumulate

cumsum(axis=None)

Return the cumulative sum over the array.

Parameters:

axis : int, optional

The axis to sum over. The default sums over the flattened array

Returns:

sums : SciDBArray

A new array, with the same shape (but flattened if axis=None)

See also

cumprod, cumulate
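
The effect of the axis argument can be illustrated without a database. The sketch below uses nested Python lists and `itertools.accumulate`, and assumes NumPy-like axis semantics (the description above only states that the result is flattened when axis=None).

```python
from itertools import accumulate

data = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11]]

# axis=None: accumulate over the flattened values; the result is 1-D.
flat = list(accumulate(v for row in data for v in row))

# axis=0: accumulate down each column; the shape is preserved.
columns = [list(accumulate(col)) for col in zip(*data)]
axis0 = [list(row) for row in zip(*columns)]

# axis=1: accumulate along each row; the shape is preserved.
axis1 = [list(accumulate(row)) for row in data]

print(flat)   # [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66]
print(axis0)  # [[0, 1, 2, 3], [4, 6, 8, 10], [12, 15, 18, 21]]
print(axis1)  # [[0, 1, 3, 6], [4, 9, 15, 22], [8, 17, 27, 38]]
```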

cumulate(expression, dimension=0)

Compute running operations along data (e.g., cumulative sums)

Parameters:

expression: str

A valid SciDB expression

dimension : int or str (optional, default=0)

Which dimension to accumulate over

Returns:

arr : SciDBArray

A new array of the same shape.

See also

cumsum, cumprod

Examples

>>> x = sdb.arange(12).reshape((3, 4))
>>> x.cumulate('sum(f0)').toarray()
array([[ 0,  1,  2,  3],
       [ 4,  6,  8, 10],
       [12, 15, 18, 21]])

dimension(d)

Return the dimension name of the array

Parameters:

d : int

The index of the dimension to lookup

dimension_rename(*args)

Rename a set of dimensions

Parameters:

args : (old_name, new_name, ...)

0 or more rename pairs

Returns:

renamed : SciDBArray

The new array

eval(out=None, store=True, **kwargs)

If the array is backed by an unevaluated query, evaluate the query and store the result in the database

This changes array.name from a query string to a stored array name. Calling eval() on an array that is already backed by a stored array does nothing.

Parameters:

out : SciDBArray (optional)

An optional pre-existing array to store the evaluation into.

classmethod from_query(interface, query)

Build a lazily-evaluated SciDB array from a query string

Parameters:

interface : SciDBInterface

The database connection to use

query : str

The query string to wrap

Returns:

array : SciDBArray

groupby(by)

Build a groupby object from this array

Parameters:

by : string or list of strings

Names of attributes and dimensions to group by

Returns:

groups : scidbpy.aggregation.GroupBy instance

An object that can be used, e.g., to perform aggregations over each group. See scidbpy.aggregation.GroupBy documentation for more information.

head(n=5)

Extract and download the first few elements in the array

Parameters:

n : int (optional, default=5)

The number of elements to retrieve

Returns:

head : pandas DataFrame or NumPy array

The first N elements in the array, downloaded as a Pandas dataframe (if pandas is installed) or a Numpy array

index_lookup(idx_array, attribute, output_attribute=u'idx')

Wrapper around the index_lookup AFL call.

This automatically wraps the array name and attribute in an alias, as is required by AFL

Parameters:

idx_array : SciDBArray

A single-attribute array of unique values to lookup

attribute : string

The name of an attribute in this array

output_attribute : string

The attribute of the output containing the indices

Returns:

indexed : SciDBArray

The current array appended with an index attribute

isel(**kwargs)

Select a subset of the array by dimension name

Parameters:

kwargs : dimension names -> slice description

What to select from the array

Returns:

subarray : SciDBArray

The array subset

Examples

>>> x = sdb.arange(20).reshape((4, 5))
>>> print(x.schema)  # <f0:int64> [i0=0:3,1000,0,i1=0:4,1000,0]
>>> x.isel(i0=0)  # x[0]
>>> x.isel(i1=2)  # x[:, 2]
>>> x.isel(i1=slice(2,4)).toarray()  # x[:, 2:4]

issparse()

Check whether array is sparse.

max(index=None, scidb_syntax=False)

Return the maximum of the array or the maximum along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

mean(index=None, scidb_syntax=False)

Return the average of the array or the average along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

Notes

Identical to SciDBArray.avg()

min(index=None, scidb_syntax=False)

Return the minimum of the array or the minimum along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

nonempty()

Return the number of nonempty elements in the array.

Nonempty refers to the sparsity of an array, and thus includes in the count elements with values which are set to NULL.

See also

nonnull

nonnull(attr=0)

Return the number of non-empty and non-null values.

This query must be done for each attribute: the default is the first attribute.

Parameters:

attr : None, int or array_like

the attribute or attributes to query. If None, then query all attributes.

Returns:

nonnull : array_like

the nonnull count for each attribute. The returned value is the same shape as the input attr.

See also

nonempty

persistent

Controls whether the array is deleted when the database is reaped

reap(ignore=False)

Delete this object from the database if it isn’t persistent.

Parameters:

ignore : bool (default False)

If False and the array is persistent, then reap raises an error. If True and the array is persistent, reap does nothing.

Raises:

SciDBForbidden if persistent=True and ignore=False

regrid(size, aggregate=u'avg')

Regrid the array using the specified aggregate

Parameters:

size : int or tuple of ints

Specify the size of the regridding along each dimension. If a single integer, then use the same regridding along each dimension.

aggregate : string

specify the aggregation function to use when creating the new grid. Default is ‘avg’. Possible values are: [‘avg’, ‘sum’, ‘min’, ‘max’, ‘count’, ‘stdev’, ‘var’, ‘approxdc’]

Returns:

A : SciDBArray

The re-gridded version of the array. The size of dimension i is ceil(self.shape[i] / size[i])
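
The output-shape rule above can be checked with plain Python. This sketch does not call SciDB; `regrid_shape` is a hypothetical helper introduced here for illustration and is not part of SciDB-py.

```python
import math

def regrid_shape(shape, size):
    """Hypothetical helper: output shape, ceil(shape[i] / size[i]) per dimension."""
    # A single integer applies the same regridding along every dimension.
    if isinstance(size, int):
        size = (size,) * len(shape)
    if len(size) != len(shape):
        raise ValueError("size must give one value per dimension")
    return tuple(math.ceil(n / s) for n, s in zip(shape, size))

print(regrid_shape((10, 7), 3))       # (4, 3)
print(regrid_shape((10, 7), (5, 2)))  # (2, 4)
```

Dimensions that are not an exact multiple of the regridding size still get a final, partially filled cell, which is why the rule rounds up.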

relabel(renames)

Relabel the attributes or dimensions in an array.

Parameters:

renames: dict

A dictionary mapping old names to new names

Returns:

renamed : SciDBArray

A new array

rename(new_name, persistent=False)

Rename the array in the database, optionally making the new array persistent.

Parameters:

new_name : string

must be a valid array name which does not already exist in the database.

persistent : boolean (optional)

specify whether the new array is persistent (default=False)

Returns:

self : SciDBArray

return a pointer to self

reshape(shape, **kwargs)

Reshape data into a new array

Parameters:

shape : tuple or int

The shape of the new array. Must be compatible with the current shape

**kwargs :

additional keyword arguments will be passed to SciDBDatashape

Returns:

arr : SciDBArray

new array of the specified shape

schema

Return the array schema

std(index=None, scidb_syntax=False)

Return the standard deviation of the array or along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

Notes

Identical to SciDBArray.stdev()

stdev(index=None, scidb_syntax=False)

Return the standard deviation of the array or along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

substitute(value)

Reshape data into a new array, substituting a default for any nulls.

Parameters:

value : value to replace nulls (required)

Returns:

arr : SciDBArray

new non-nullable array

Notes

This is currently limited to single-attribute arrays. Use the raw AFL substitute operator for multi-attribute arrays.

sum(index=None, scidb_syntax=False)

Return the sum of the array or the sum along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array
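
The two index conventions shared by the aggregate methods (sum, avg, min, max, and so on) can be contrasted on a small 2-D example. The sketch below is a plain-Python model of the description above, not SciDB-py code; `collapse_sum` is a hypothetical helper that only handles 2-D nested lists.

```python
def collapse_sum(rows, index=None, scidb_syntax=False):
    """Hypothetical model: sum a 2-D list of lists under either convention."""
    if index is None:
        # Default: operate on the flattened input.
        return sum(v for row in rows for v in row)
    # numpy convention: collapse over axis `index`.
    # SciDB convention: keep axis `index` and collapse over the others,
    # which for a 2-D array means collapsing over the other axis.
    axis = 1 - index if scidb_syntax else index
    if axis == 0:
        return [sum(col) for col in zip(*rows)]
    return [sum(row) for row in rows]

x = [[0, 1, 2],
     [3, 4, 5]]
print(collapse_sum(x))                        # 15
print(collapse_sum(x, 0))                     # [3, 5, 7]
print(collapse_sum(x, 0, scidb_syntax=True))  # [3, 12]
```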

toarray(**kwargs)

Transfer data from database and store in a numpy array.

Parameters:

compression : None, ‘auto’ or 1-9

Whether to use compression. None disables compression. ‘auto’ uses the default_compression attribute on the SciDB interface object. 1-9 uses gzip compression at the specified level (1=fast, 9=best)

method : ‘sparse’ or ‘dense’ (optional, default sparse)

Whether the array to download is sparse or dense.

‘sparse’ works with all SciDB arrays, but is slower (it computes and transfers array indices for each cell). It is the default.

‘dense’ transfer only works for bound arrays with no empty cells. It is faster, since it doesn’t compute or transfer indices.

transfer_bytes : DEPRECATED

Unused

Returns:

arr : np.ndarray

The dense array containing the data.

Notes

If the array is backed by a query, the query is evaluated and stored in the database

todataframe(**kwargs)

Transfer array from database and store in a local Pandas dataframe

The array dimensions are assigned to the index of the output.

Parameters:

compression : ‘auto’, None, or [1-9]

Whether and how to compress the transfer.

Returns:

arr : pd.DataFrame

The dataframe object containing the data in the array.

tolist(**kwargs)

Download the array as a (nested) python list

tosparse(sparse_fmt=u'recarray', **kwargs)

Transfer array from database and store in a local sparse array.

Parameters:

sparse_fmt : string or None

Specify the sparse format to use. Available formats are:

  • ‘recarray’ : a record array containing the indices and values for each data point. This is valid for arrays of any dimension and with any number of attributes.
  • [‘coo’|‘csc’|‘csr’|‘dok’|‘lil’] : a scipy sparse matrix. These are valid only for 2-dimensional arrays with a single attribute.

compression : ‘auto’, None, or [1-9]

Whether to use compression. None disables compression. ‘auto’ uses the value from the SciDBInterface’s default_compression attribute. 1-9 specifies a gzip-compression level (1=fast, 9=best)

transfer_bytes : deprecated

Unused

Returns:

arr : ndarray or sparse matrix

The sparse representation of the data

transpose(*axes)

Permute the dimensions of an array.

Parameters:

axes : None, tuple of ints, or n ints

  • None or no argument: reverses the order of the axes.
  • tuple of ints: i in the j-th place in the tuple means a’s i-th axis becomes a.transpose()’s j-th axis.
  • n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)

Returns:

out : ndarray

Copy of a, with axes suitably permuted.

unpack(name=u'idx')

Unpack with automatic dimension name disambiguation

Unpacking flattens an array to 1D, converting all old dimensions to attributes

Parameters:

name : str (optional, default ‘idx’)

The name of the new dimension. Will be disambiguated

var(index=None, scidb_syntax=False)

Return the variance of the array or the variance along an axis.

Parameters:

index : int, optional

Axis along which to operate. By default, flattened input is used.

scidb_syntax : bool, optional (default=False)

If False, index follows the numpy convention (i.e., the array is collapsed over the index’th axis). If True, index follows the SciDB convention (i.e., the array is collapsed over all axes except index)

Returns:

A SciDB array

SciDB Interface

scidbpy.interface.connect(url=None, username=None, password=None)

Connect to a SciDB instance.

Parameters:

url : str (optional)

Connection URL. If not provided, will fall back to the SCIDB_URL environment variable (if present), or http://127.0.0.1:8080. MUST begin with http or https. Username and password are mandatory with https.

username : str (optional)

SciDB username, for authenticated communication. Defaults to the value of the SCIDB_USER environment variable. If that doesn’t exist, unauthenticated communication is used.

password : str (optional)

SciDB password, for authenticated communication. Defaults to the value of the SCIDB_PASSWORD environment variable. If that doesn’t exist, unauthenticated communication is used.

Returns:

A SciDBShimInterface connection to the database.
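
The URL fallback order described above can be sketched as follows. This is an illustration only: `resolve_url` is a hypothetical helper, not a SciDB-py function, and the real implementation may validate the URL differently.

```python
import os

DEFAULT_URL = "http://127.0.0.1:8080"

def resolve_url(url=None, environ=None):
    """Hypothetical helper mirroring the documented fallback order."""
    # Allow a custom mapping so the logic can be exercised without
    # touching the real process environment.
    environ = os.environ if environ is None else environ
    if url is None:
        # Explicit argument first, then SCIDB_URL, then the local default.
        url = environ.get("SCIDB_URL", DEFAULT_URL)
    if not url.startswith("http"):
        raise ValueError("URL must begin with http or https")
    return url

print(resolve_url(environ={}))                                   # http://127.0.0.1:8080
print(resolve_url(environ={"SCIDB_URL": "https://db.example"}))  # https://db.example
```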

Base Class

class scidbpy.interface.SciDBInterface

Attributes

afl
default_compression The default compression to use when downloading data

Methods

acos(A) Element-wise trigonometric inverse cosine
approxdc(A[, index, scidb_syntax]) Array or axis unique element estimate.
arange([start,] stop[, step,][, dtype]) Return evenly spaced values within a given interval.
asin(A) Element-wise trigonometric inverse sine
atan(A) Element-wise trigonometric inverse tangent
avg(A[, index, scidb_syntax]) Array or axis average.
ceil(A) Element-wise ceiling function
concatenate(arrays[, axis]) Concatenate several arrays along a particular dimension.
cos(A) Element-wise trigonometric cosine
count(A[, index, scidb_syntax]) Array or axis count.
cross_join(A, B, *dims) Perform a cross-join on arrays A and B.
dot(A, B) Compute the matrix product of A and B
dstack(arrays) Stack arrays in sequence depth wise (along the third axis).
exp(A) Element-wise natural exponent
floor(A) Element-wise floor function
from_array(A[, instance_id, chunk_size]) Initialize a scidb array from a numpy array
from_dataframe(A[, instance_id]) Initialize a scidb array from a pandas dataframe
from_sparse(A[, instance_id]) Initialize a scidb array from a sparse array
hstack(arrays) Stack arrays in sequence horizontally (column wise).
identity(n[, dtype, sparse]) Return a 2-dimensional square identity matrix of size n
isnan(A) Element-wise nan test function
join(*args) Perform a series of array joins on the arguments and return the result.
linspace(start, stop[, num, endpoint, retstep]) Return evenly spaced numbers over a specified interval.
list_arrays() List the arrays currently in the database
log(A) Element-wise natural logarithm
log10(A) Element-wise base-10 logarithm
ls([pattern]) List the arrays in the database, optionally matching to a pattern
max(A[, index, scidb_syntax]) Array or axis maximum.
mean(A[, index, scidb_syntax]) Array or axis mean.
merge(left, right[, on, left_on, right_on, ...]) Perform a Pandas-like join on two SciDBArrays.
min(A[, index, scidb_syntax]) Array or axis minimum.
new_array([shape, dtype, persistent, name]) Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.
normalize(A)
ones(shape[, dtype]) Return an array of ones
percentile(a, q[, att]) Compute the qth percentile of the data along the specified axis
query(query, *args, **kwargs) Perform a query on the database.
randint(shape[, dtype, lower, upper, persistent]) Return an array of random integers between lower and upper
random(shape[, dtype, lower, upper, persistent]) Return an array of random floats between lower and upper
reap() Reap all arrays created via new_array
remove(array) Remove an array from the database
sin(A) Element-wise trigonometric sine
sqrt(A) Element-wise square root
std(A[, index, scidb_syntax]) Array or axis standard deviation.
stdev(A[, index, scidb_syntax]) Array or axis standard deviation.
substitute(A, value) Replace null values in an array
sum(A[, index, scidb_syntax]) Array or axis sum.
svd(A[, return_U, return_S, return_VT]) Compute the Singular Value Decomposition of the array A
tan(A) Element-wise trigonometric tangent
toarray(A[, transfer_bytes]) Convert a SciDB array to a numpy array
todataframe(A[, transfer_bytes]) Convert a SciDB array to a pandas dataframe
tosparse(A[, sparse_fmt, transfer_bytes]) Convert a SciDB array to a sparse representation
unique(x[, is_sorted]) Store the unique elements of an array in a new array
var(A[, index, scidb_syntax]) Array or axis variance.
vstack(arrays) Stack arrays in sequence vertically (row wise).
wrap_array(scidbname[, persistent]) Create a new SciDBArray object that references an existing SciDB array
zeros(shape[, dtype]) Return an array of zeros
acos(A)

Element-wise trigonometric inverse cosine

approxdc(A, index=None, scidb_syntax=False)

Array or axis unique element estimate.

see SciDBArray.approxdc()

arange([start, ]stop, [step, ]dtype=None, **kwargs)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the behavior is equivalent to the Python range function, but returns a SciDBArray rather than a list.

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.

Parameters:

start : number, optional

Start of interval. The interval includes this value. The default start value is 0.

stop : number

End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.

step : number, optional

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified, start must also be given.

dtype : dtype

The type of the output array. If dtype is not given, it is inferred from the type of the input arguments.

**kwargs :

Additional arguments are passed to SciDBDatashape when creating the output array.

Returns:

arange : SciDBArray

Array of evenly spaced values.

For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point overflow, this rule may result in the last element of out being greater than stop.
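
The length rule can be tried out in plain Python. The sketch below only models the formula above and does not call SciDB; `arange_length` is a hypothetical helper name.

```python
import math

def arange_length(start, stop, step):
    """Number of elements implied by ceil((stop - start) / step)."""
    return math.ceil((stop - start) / step)

start, stop, step = 0.0, 1.0, 0.3
n = arange_length(start, stop, step)
# Rebuild the values the same way the spacing is defined: start + i * step.
values = [start + i * step for i in range(n)]
print(n)       # 4
print(values)  # four values, each slightly affected by float rounding
```

Because the count depends on a floating point division, steps such as 0.1 can land on either side of an integer boundary, which is the reason linspace is the safer choice for non-integer spacing.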

asin(A)

Element-wise trigonometric inverse sine

atan(A)

Element-wise trigonometric inverse tangent

avg(A, index=None, scidb_syntax=False)

Array or axis average.

see SciDBArray.avg()

ceil(A)

Element-wise ceiling function

concatenate(arrays, axis=0)

Concatenate several arrays along a particular dimension.

This behaves like numpy’s concatenate function when the number of dimensions of the input arrays is greater than the concatenation axis. It behaves differently from numpy when the number of array dimensions is less than the concatenation axis, in the following way:

  • Concatenating 1D arrays along axis=0 behaves like numpy’s vstack.
  • Concatenating 1D arrays along axis=1 behaves like numpy’s hstack.
  • Concatenating 1D or 2D arrays along axis=2 behaves like dstack.

Parameters:

arrays : Sequence of SciDBArrays

The arrays to concatenate

axis : int, optional (default 0)

The dimension to join on. Array shapes must match along all dimensions except this axis.

Returns:

stacked : SciDBArray

A stacked array

See also

hstack, vstack, dstack

cos(A)

Element-wise trigonometric cosine

count(A, index=None, scidb_syntax=False)

Array or axis count.

see SciDBArray.count()

cross_join(A, B, *dims)

Perform a cross-join on arrays A and B.

Parameters:

A, B : SciDBArray

*dims : tuples

The remaining arguments are tuples of dimension indices which should be joined.

default_compression

The default compression to use when downloading data

dot(A, B)

Compute the matrix product of A and B

Parameters:

A : SciDBArray

A must be a two-dimensional matrix of shape (n, p)

B : SciDBArray

B must be a two-dimensional matrix of shape (p, m)

Returns:

C : SciDBArray

The wrapper of the SciDB Array, of shape (n, m), consisting of the matrix product of A and B
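
The shape requirement can be illustrated with a small pure-Python matrix product. This is a sketch of the arithmetic only: `matmul` is a hypothetical helper operating on nested lists, whereas dot() itself runs inside the database.

```python
def matmul(A, B):
    """Multiply an (n, p) nested list by a (p, m) nested list."""
    n, p = len(A), len(A[0])
    if len(B) != p:
        raise ValueError("inner dimensions must match: (n, p) x (p, m)")
    m = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]   # shape (2, 3)
B = [[1, 0],
     [0, 1],
     [1, 1]]      # shape (3, 2)
C = matmul(A, B)  # shape (2, 2)
print(C)          # [[4, 5], [10, 11]]
```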

dstack(arrays)

Stack arrays in sequence depth wise (along the third axis).

Parameters:

arrays : Sequence of SciDBArrays

The arrays to join. All arrays must have the same shape along all but the third dimension.

Returns:

stacked : SciDBArray

The array formed by stacking the given arrays.

exp(A)

Element-wise natural exponent

floor(A)

Element-wise floor function

from_array(A, instance_id=0, chunk_size=1000, **kwargs)

Initialize a scidb array from a numpy array

Parameters:

A : array_like (numpy array or sparse array)

input array from which the scidb array will be created

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

chunk_size : integer or list of integers

The chunk size of the uploaded SciDBArray. Default=1000

**kwargs :

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

from_dataframe(A, instance_id=0, **kwargs)

Initialize a scidb array from a pandas dataframe

Parameters:

A : pandas dataframe

data from which the scidb array will be created.

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs :

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

from_sparse(A, instance_id=0, **kwargs)

Initialize a scidb array from a sparse array

Parameters:

A : sparse array

sparse input array from which the scidb array will be created. Note that this array will internally be converted to COO format.

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs :

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

hstack(arrays)

Stack arrays in sequence horizontally (column wise).

Parameters:

arrays : Sequence of SciDBArrays

The arrays to join. All arrays must have the same shape along all but the second dimension.

Returns:

stacked : SciDBArray

The array formed by stacking the given arrays.

identity(n, dtype=u'double', sparse=False, **kwargs)

Return a 2-dimensional square identity matrix of size n

Parameters:

n : integer

the number of rows and columns in the matrix

dtype : string or list

The data type of the array

sparse : boolean

specify whether to create a sparse array (default=False)

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray containing an [n x n] identity matrix

isnan(A)

Element-wise nan test function

join(*args)

Perform a series of array joins on the arguments and return the result.

linspace(start, stop, num=50, endpoint=True, retstep=False, **kwargs)

Return evenly spaced numbers over a specified interval.

Returns num evenly spaced samples, calculated over the interval [start, stop].

The endpoint of the interval can optionally be excluded.

Parameters:

start : scalar

The starting value of the sequence.

stop : scalar

The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.

num : int, optional

Number of samples to generate. Default is 50.

endpoint : bool, optional

If True, stop is the last sample. Otherwise, it is not included. Default is True.

retstep : bool, optional

If True, return (samples, step), where step is the spacing between samples.

**kwargs :

additional keyword arguments are passed to SciDBDataShape

Returns:

samples : SciDBArray

There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).

step : float (only if retstep is True)

Size of spacing between samples.
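The endpoint and retstep rules above can be sketched in plain Python. This is not SciDB-py's implementation, and it returns a Python list rather than a SciDBArray; the helper name linspace_values is hypothetical and only illustrates how the step size depends on endpoint.

```python
def linspace_values(start, stop, num=50, endpoint=True, retstep=False):
    """Plain-Python illustration of the sampling rule; returns a list."""
    if num < 1:
        raise ValueError("num must be at least 1")
    # With endpoint=True the interval is split into num - 1 steps so that
    # stop is the last sample; otherwise it is split into num steps and
    # stop itself is left out.
    divisions = (num - 1) if endpoint else num
    step = (stop - start) / divisions if divisions > 0 else 0.0
    samples = [start + i * step for i in range(num)]
    return (samples, step) if retstep else samples


closed, closed_step = linspace_values(0.0, 1.0, num=5, retstep=True)
half_open, open_step = linspace_values(0.0, 1.0, num=5, endpoint=False,
                                       retstep=True)
print(closed, closed_step)    # step is 0.25 and the last sample is 1.0
print(half_open, open_step)   # step is 0.2 and 1.0 is excluded
```

Both calls ask for five samples, but the step differs (0.25 versus 0.2), which is the change in step size noted for endpoint=False.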

list_arrays()

List the arrays currently in the database

Returns:

array_list : dictionary

A mapping of array name -> schema

log(A)

Element-wise natural logarithm

log10(A)

Element-wise base-10 logarithm

ls(pattern=u'*')

List the arrays in the database, optionally matching to a pattern

Parameters:

pattern : String (optional)

A glob-style pattern string. If present, only arrays whose names match the pattern are displayed. ‘*’ matches any string, ‘?’ matches any single character

Returns:

result : list

A list of SciDB array names
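To illustrate the glob-style wildcards, the sketch below filters a list of names with Python's standard fnmatch module. The array names are hypothetical, and fnmatch is used here only as a stand-in: it also understands bracket expressions such as [abc], which ls may or may not support.

```python
from fnmatch import fnmatchcase

# Hypothetical array names, standing in for what a database might hold.
array_names = ["py1001", "py1002", "temp_a", "results"]


def match_names(names, pattern="*"):
    """Return the names that match a glob-style pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(match_names(array_names, "py*"))     # names starting with "py"
print(match_names(array_names, "py100?"))  # "py100" plus exactly one character
print(match_names(array_names))            # default pattern matches everything
```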

max(A, index=None, scidb_syntax=False)

Array or axis maximum.

see SciDBArray.max()

mean(A, index=None, scidb_syntax=False)

Array or axis mean.

see SciDBArray.mean()

static merge(left, right, on=None, left_on=None, right_on=None, how=u'inner', suffixes=(u'_x', u'_y'))

Perform a Pandas-like join on two SciDBArrays.

Parameters:

left : SciDBArray

The left array to join on

right : SciDBArray

The right array to join on

on : None, string, or list of strings

The names of dimensions or attributes to join on. Either on or both left_on and right_on must be supplied. If on is supplied, the specified names must exist in both left and right

left_on : None, string, or list of strings

The names of dimensions or attributes in the left array to join on. If provided, then right_on must also be provided, and have as many elements as left_on

right_on : None, string, or list of strings

The names of dimensions or attributes in the right array to join on. See notes above for left_on

how : ‘inner’ | ‘left’ | ‘right’ | ‘outer’

The kind of join to perform. Currently, only ‘inner’ is supported.

suffixes : tuple of two strings

The suffix to add to array dimensions or attributes which are duplicated in left and right.

Returns:

joined : SciDBArray

The joined array.

Notes

When joining on attributes, a categorical index is computed for each array. This index will appear as a dimension in the output.

This function builds an AFL join or cross join query, performing preprocessing on the inputs as necessary to match chunk sizes, avoid name collisions, etc.

If none of on, left_on, or right_on is provided, then the join defaults to the overlapping dimension names.
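The rules for on, left_on, and right_on can be summarized in a small stand-alone sketch. The helper resolve_join_keys and the dict-based array descriptions are hypothetical; they only mirror the rules stated in this section and are not how merge is implemented.

```python
def resolve_join_keys(left, right, on=None, left_on=None, right_on=None):
    """Return (left_keys, right_keys) for a join.

    left and right are hypothetical descriptions of two arrays:
    dicts with "dims" and "atts" lists of names.
    """
    def as_list(value):
        return [value] if isinstance(value, str) else list(value)

    left_names = left["dims"] + left["atts"]
    right_names = right["dims"] + right["atts"]

    if on is not None:
        # A shared specification: every name must exist on both sides.
        keys = as_list(on)
        missing = [k for k in keys
                   if k not in left_names or k not in right_names]
        if missing:
            raise ValueError("not present in both arrays: %s" % missing)
        return keys, keys

    if left_on is not None or right_on is not None:
        # Separate specifications must come as a pair of equal length.
        if left_on is None or right_on is None:
            raise ValueError("left_on and right_on must be supplied together")
        left_keys, right_keys = as_list(left_on), as_list(right_on)
        if len(left_keys) != len(right_keys):
            raise ValueError("left_on and right_on must have the same length")
        return left_keys, right_keys

    # Nothing supplied: fall back to the dimension names both arrays share.
    shared = [d for d in left["dims"] if d in right["dims"]]
    return shared, shared


left = {"dims": ["i", "j"], "atts": ["price"]}
right = {"dims": ["i", "k"], "atts": ["volume"]}

print(resolve_join_keys(left, right))                             # (['i'], ['i'])
print(resolve_join_keys(left, right, on="i"))                     # (['i'], ['i'])
print(resolve_join_keys(left, right, left_on="j", right_on="k"))  # (['j'], ['k'])
```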

min(A, index=None, scidb_syntax=False)

Array or axis minimum.

see SciDBArray.min()

new_array(shape=None, dtype=u'double', persistent=False, name=None, **kwargs)

Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.

Parameters:

shape : int or tuple (optional)

The shape of the array to create. If not specified, no array will be created and a name will simply be reserved for later use. WARNING: if shape=None and persistent=False, an error will result when the array goes out of scope, unless the name is used to create an array on the server.

dtype : string (optional)

the datatype of the array. This is only referenced if shape is specified. Default is ‘double’.

persistent : boolean (optional)

whether the created array should be persistent, i.e. survive in SciDB past when the object wrapper goes out of scope. Default is False.

name : str (optional)

The name to give the array in the database. If present, persistent will be set to True.

**kwargs : (optional)

If shape is specified, additional keyword arguments are passed to SciDBDataShape. Otherwise, these will not be referenced.

Returns:

arr : SciDBArray

wrapper of the new SciDB array instance.

ones(shape, dtype=u'double', **kwargs)

Return an array of ones

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr: SciDBArray

A SciDBArray consisting of all ones.

percentile(a, q, att=None)

Compute the qth percentile of the data along the specified axis

Parameters:

a : SciDBArray

Input array

q : float in the range [0, 100] or a sequence of floats

The percentiles to compute

att : str, optional

The array attribute to compute percentiles for. Defaults to the first attribute

Returns:

qs : SciDBArray

An array with as many elements as q, listing the data value at each percentile

query(query, *args, **kwargs)

Perform a query on the database.

This wraps a query constructor which allows the creation of sophisticated SciDB queries which act on arrays wrapped by SciDBArray objects. See Notes below for details.

Parameters:

query : string

The query string, with curly-braces to indicate insertions

*args, **kwargs :

Values to be inserted (see below).
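As a rough illustration of curly-brace insertion, the sketch below assumes the substitution behaves like Python's str.format. The ArrayStandIn class, the array name, and the attribute name val are all hypothetical; how SciDB-py actually resolves SciDBArray objects inside a query string may differ.

```python
class ArrayStandIn:
    """Hypothetical stand-in for a SciDBArray; it only carries a name."""

    def __init__(self, name):
        self.name = name

    def __format__(self, spec):
        # Render the wrapper as its database-side array name.
        return self.name


def build_query(template, *args, **kwargs):
    """Fill the curly-brace placeholders of a query template."""
    return template.format(*args, **kwargs)


A = ArrayStandIn("py_example_1")
text = build_query("filter({A}, {att} > {threshold})",
                   A=A, att="val", threshold=5)
print(text)  # filter(py_example_1, val > 5)
```

Positional placeholders such as {0} would be filled from *args in the same way.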

randint(shape, dtype=u'uint32', lower=0, upper=2147483647, persistent=False, **kwargs)

Return an array of random integers between lower and upper

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

lower : float

The lower bound of the random sample (default=0)

upper : float

The upper bound of the random sample (default=2147483647)

persistent : bool

Whether the array is persistent (default=False)

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr: SciDBArray

A SciDBArray consisting of random integers, uniformly distributed between lower and upper.

random(shape, dtype=u'double', lower=0, upper=1, persistent=False, **kwargs)

Return an array of random floats between lower and upper

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

lower : float

The lower bound of the random sample (default=0)

upper : float

The upper bound of the random sample (default=1)

persistent : bool

Whether the new array is persistent (default=False)

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr: SciDBArray

A SciDBArray consisting of random floating point numbers, uniformly distributed between lower and upper.

reap()

Reap all arrays created via new_array

remove(array)

Remove an array from the database

This removes the array even if its persistent property is True!

Parameters:

array : str or SciDBArray

The array (or name of array) to remove

See also

reap, SciDBArray.reap

sin(A)

Element-wise trigonometric sine

sqrt(A)

Element-wise square root

std(A, index=None, scidb_syntax=False)

Array or axis standard deviation.

see SciDBArray.std()

stdev(A, index=None, scidb_syntax=False)

Array or axis standard deviation.

see SciDBArray.stdev()

substitute(A, value)

Replace null values in an array

See SciDBArray.substitute()

sum(A, index=None, scidb_syntax=False)

Array or axis sum.

see SciDBArray.sum()

svd(A, return_U=True, return_S=True, return_VT=True)

Compute the Singular Value Decomposition of the array A:

A = U.S.V^T

Parameters:

A : SciDBArray

The array for which the SVD will be computed. It should be a 2-dimensional array with a single value per cell. Currently, the svd routine requires non-overlapping chunks of size 32.

return_U, return_S, return_VT : boolean

if any is True, then return the associated array. All are True by default

Returns:

[U], [S], [VT] : SciDBArrays

Arrays storing the singular values and vectors of A.

tan(A)

Element-wise trigonometric tangent

toarray(A, transfer_bytes=True)

Convert a SciDB array to a numpy array

todataframe(A, transfer_bytes=True)

Convert a SciDB array to a pandas dataframe

tosparse(A, sparse_fmt=u'recarray', transfer_bytes=True)

Convert a SciDB array to a sparse representation

unique(x, is_sorted=False)

Store the unique elements of an array in a new array

Parameters:

x : SciDBArray

The array to compute unique elements of.

is_sorted : bool

Whether the array is pre-sorted. If True, x must be a 1D array.

Returns:

u : SciDBArray

The unique elements of x

var(A, index=None, scidb_syntax=False)

Array or axis variance.

see SciDBArray.var()

vstack(arrays)

Stack arrays in sequence vertically (row wise).

Parameters:

arrays : Sequence of SciDBArrays

The arrays to join. All arrays must have the same shape along all but the first dimension.

Returns:

stacked : SciDBArray

The array formed by stacking the given arrays.

wrap_array(scidbname, persistent=True)

Create a new SciDBArray object that references an existing SciDB array

Parameters:

scidbname : string

Wrap an existing scidb array referred to by scidbname. The object shape, datashape and data type values will be determined by the SciDB array; its persistent value is set by the persistent argument.

persistent : boolean

If True (default) then array will not be deleted when this variable goes out of scope. Warning: if persistent is set to False, data could be lost!

zeros(shape, dtype=u'double', **kwargs)

Return an array of zeros

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr: SciDBArray

A SciDBArray consisting of all zeros.

Shim Interface

class scidbpy.interface.SciDBShimInterface(hostname, user=None, password=None, pam=None, digest=None)

HTTP interface to SciDB via shim [1]

Parameters:

hostname : string

A URL pointing to a running shim/SciDB session

user : string (optional)

A username, for authentication

password : string (optional)

A password, for authentication

pam : bool (optional)

Whether to use PAM authentication. If True, then user and password are required. If None, will be guessed based on hostname and password values

digest : bool (optional)

Whether to use Digest authentication. If True, then user and password are required. If None, will be guessed based on hostname and password values.

[1] https://github.com/Paradigm4/shim

Attributes

afl
default_compression The default compression to use when downloading data

Methods

acos(A) Element-wise trigonometric inverse cosine
approxdc(A[, index, scidb_syntax]) Array or axis unique element estimate.
arange([start,] stop[, step,][, dtype]) Return evenly spaced values within a given interval.
asin(A) Element-wise trigonometric inverse sine
atan(A) Element-wise trigonometric inverse tangent
avg(A[, index, scidb_syntax]) Array or axis average.
ceil(A) Element-wise ceiling function
concatenate(arrays[, axis]) Concatenate several arrays along a particular dimension.
cos(A) Element-wise trigonometric cosine
count(A[, index, scidb_syntax]) Array or axis count.
cross_join(A, B, *dims) Perform a cross-join on arrays A and B.
dot(A, B) Compute the matrix product of A and B
dstack(arrays) Stack arrays in sequence depth wise (along the third axis).
exp(A) Element-wise natural exponent
floor(A) Element-wise floor function
from_array(A[, instance_id, chunk_size]) Initialize a scidb array from a numpy array
from_dataframe(A[, instance_id]) Initialize a scidb array from a pandas dataframe
from_sparse(A[, instance_id]) Initialize a scidb array from a sparse array
hstack(arrays) Stack arrays in sequence horizontally (column wise).
identity(n[, dtype, sparse]) Return a 2-dimensional square identity matrix of size n
isnan(A) Element-wise nan test function
join(*args) Perform a series of array joins on the arguments and return the result.
linspace(start, stop[, num, endpoint, retstep]) Return evenly spaced numbers over a specified interval.
list_arrays() List the arrays currently in the database
log(A) Element-wise natural logarithm
log10(A) Element-wise base-10 logarithm
login(user, password) Login using PAM authentication (e.g., over HTTPS)
logout() Logout from PAM authentication (e.g., over HTTPS)
ls([pattern]) List the arrays in the database, optionally matching to a pattern
max(A[, index, scidb_syntax]) Array or axis maximum.
mean(A[, index, scidb_syntax]) Array or axis mean.
merge(left, right[, on, left_on, right_on, ...]) Perform a Pandas-like join on two SciDBArrays.
min(A[, index, scidb_syntax]) Array or axis minimum.
new_array([shape, dtype, persistent, name]) Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.
normalize(A)
ones(shape[, dtype]) Return an array of ones
percentile(a, q[, att]) Compute the qth percentile of the data along the specified axis
query(query, *args, **kwargs) Perform a query on the database.
randint(shape[, dtype, lower, upper, persistent]) Return an array of random integers between lower and upper
random(shape[, dtype, lower, upper, persistent]) Return an array of random floats between lower and upper
reap() Reap all arrays created via new_array
remove(array) Remove an array from the database
sin(A) Element-wise trigonometric sine
sqrt(A) Element-wise square root
std(A[, index, scidb_syntax]) Array or axis standard deviation.
stdev(A[, index, scidb_syntax]) Array or axis standard deviation.
substitute(A, value) Replace null values in an array
sum(A[, index, scidb_syntax]) Array or axis sum.
svd(A[, return_U, return_S, return_VT]) Compute the Singular Value Decomposition of the array A:
tan(A) Element-wise trigonometric tangent
toarray(A[, transfer_bytes]) Convert a SciDB array to a numpy array
todataframe(A[, transfer_bytes]) Convert a SciDB array to a pandas dataframe
tosparse(A[, sparse_fmt, transfer_bytes]) Convert a SciDB array to a sparse representation
unique(x[, is_sorted]) Store the unique elements of an array in a new array
var(A[, index, scidb_syntax]) Array or axis variance.
vstack(arrays) Stack arrays in sequence vertically (row wise).
wrap_array(scidbname[, persistent]) Create a new SciDBArray object that references an existing SciDB array
zeros(shape[, dtype]) Return an array of zeros
acos(A)

Element-wise trigonometric inverse cosine

approxdc(A, index=None, scidb_syntax=False)

Array or axis unique element estimate.

see SciDBArray.approxdc()

arange([start, ]stop, [step, ]dtype=None, **kwargs)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the behavior is equivalent to the Python range function, but returns a SciDBArray rather than a list.

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.

Parameters:

start : number, optional

Start of interval. The interval includes this value. The default start value is 0.

stop : number

End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.

step : number, optional

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified, start must also be given.

dtype : dtype

The type of the output array. If dtype is not given, it is inferred from the type of the input arguments.

**kwargs :

Additional arguments are passed to SciDBDatashape when creating the output array.

Returns:

arange : SciDBArray

Array of evenly spaced values.

For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point round-off, this rule may result in the last element of out being greater than stop.
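The length rule ceil((stop - start)/step) can be tried out in plain Python. The helpers arange_length and arange_values are hypothetical and build Python lists, not SciDBArrays; they only illustrate the arithmetic.

```python
import math


def arange_length(start, stop, step=1):
    """Number of elements implied by ceil((stop - start) / step)."""
    if step == 0:
        raise ValueError("step must be non-zero")
    return max(0, math.ceil((stop - start) / step))


def arange_values(start, stop, step=1):
    """Values start, start + step, ... for the computed length."""
    return [start + i * step for i in range(arange_length(start, stop, step))]


print(arange_values(0, 10, 3))        # [0, 3, 6, 9]
print(arange_length(0.0, 1.0, 0.25))  # 4
# With a step such as 0.1 the quotient is subject to round-off, so the
# count can come out one larger than the exact arithmetic suggests.
print(arange_length(1.0, 1.3, 0.1))
```

This is why linspace, which fixes the number of samples up front, is the safer choice for non-integer steps.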

asin(A)

Element-wise trigonometric inverse sine

atan(A)

Element-wise trigonometric inverse tangent

avg(A, index=None, scidb_syntax=False)

Array or axis average.

see SciDBArray.avg()

ceil(A)

Element-wise ceiling function

concatenate(arrays, axis=0)

Concatenate several arrays along a particular dimension.

This behaves like numpy’s concatenate function when the input array dimensions are > the concatenation axis. It behaves differently than numpy when the array dimensions are less than the concatenation axis, in the following way:

Concatenating 1D arrays along axis=0 behaves like numpy’s vstack. Concatenating 1D arrays along axis=1 behaves like numpy’s hstack. Concatenating 1D or 2D arrays along axis=2 behaves like dstack.

Parameters:

arrays : Sequence of SciDBArrays

The arrays to concatenate

axis : int, optional (default 0)

The dimension to join on. Array shapes must match along all dimensions except this axis.

Returns:

stacked : SciDBArray

A stacked array

See also

hstack, vstack, dstack
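The shape requirement can be checked with a few lines of plain Python. The helper concatenated_shape is hypothetical and covers only the numpy-like case where the axis is smaller than the number of array dimensions; it works on shape tuples, not on SciDBArrays.

```python
def concatenated_shape(shapes, axis=0):
    """Shape obtained by joining arrays of the given shapes along axis."""
    first = shapes[0]
    if not 0 <= axis < len(first):
        raise ValueError("this sketch only covers axis < number of dimensions")
    total = 0
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("all arrays must have the same number of dimensions")
        for d, (n, m) in enumerate(zip(shape, first)):
            # Every dimension except the concatenation axis must agree.
            if d != axis and n != m:
                raise ValueError("shapes differ along dimension %d" % d)
        total += shape[axis]
    return first[:axis] + (total,) + first[axis + 1:]


print(concatenated_shape([(2, 3), (4, 3)], axis=0))  # (6, 3)
print(concatenated_shape([(2, 3), (2, 5)], axis=1))  # (2, 8)
```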

cos(A)

Element-wise trigonometric cosine

count(A, index=None, scidb_syntax=False)

Array or axis count.

see SciDBArray.count()

cross_join(A, B, *dims)

Perform a cross-join on arrays A and B.

Parameters:

A, B : SciDBArray

*dims : tuples

The remaining arguments are tuples of dimension indices which should be joined.
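A toy model may help picture what a cross-join on dimension pairs does. The sketch below represents each array as a dict from coordinate tuples to values and assumes that each tuple in dims is (dimension index of A, dimension index of B); that reading, the helper name cross_join_cells, and the output layout (A's dimensions followed by B's unjoined dimensions) are assumptions, not taken from this reference.

```python
def cross_join_cells(A, B, *dims):
    """Join two arrays given as {coordinate tuple: value} dicts.

    dims holds (a_dim, b_dim) pairs of dimension indices whose
    coordinates must be equal for two cells to be combined.
    """
    joined_b_dims = {b for _, b in dims}
    out = {}
    for a_coord, a_val in A.items():
        for b_coord, b_val in B.items():
            if all(a_coord[a] == b_coord[b] for a, b in dims):
                # Keep A's coordinates plus the unjoined coordinates of B.
                rest = tuple(c for i, c in enumerate(b_coord)
                             if i not in joined_b_dims)
                out[a_coord + rest] = (a_val, b_val)
    return out


A = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}  # 2 x 2 array
B = {(0,): 10.0, (1,): 20.0}                               # length-2 array

# Join A's first dimension with B's only dimension.
print(cross_join_cells(A, B, (0, 0)))
# With no dimension pairs, every cell of A is paired with every cell of B.
print(len(cross_join_cells(A, B)))  # 8
```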

default_compression

The default compression to use when downloading data

dot(A, B)

Compute the matrix product of A and B

Parameters:

A : SciDBArray

A must be a two-dimensional matrix of shape (n, p)

B : SciDBArray

B must be a two-dimensional matrix of shape (p, m)

Returns:

C : SciDBArray

The wrapper of the SciDB Array, of shape (n, m), consisting of the matrix product of A and B
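The shape rule (n, p) times (p, m) gives (n, m) is shown below with nested Python lists. The helper matmul is a plain-Python illustration, not the routine SciDB runs.

```python
def matmul(A, B):
    """Matrix product of A with shape (n, p) and B with shape (p, m)."""
    n, p = len(A), len(A[0])
    if len(B) != p:
        raise ValueError("inner dimensions do not match")
    m = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]


A = [[1, 2, 3],
     [4, 5, 6]]        # shape (2, 3)
B = [[1, 0],
     [0, 1],
     [1, 1]]           # shape (3, 2)
C = matmul(A, B)       # shape (2, 2)
print(C)               # [[4, 5], [10, 11]]
```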

dstack(arrays)

Stack arrays in sequence depth wise (along the third axis).

Parameters:

arrays : Sequence of SciDBArrays

The arrays to join. All arrays must have the same shape along all but the third dimension.

Returns:

stacked : SciDBArray

The array formed by stacking the given arrays.

exp(A)

Element-wise natural exponent

floor(A)

Element-wise floor function

from_array(A, instance_id=0, chunk_size=1000, **kwargs)

Initialize a scidb array from a numpy array

Parameters:

A : array_like (numpy array or sparse array)

input array from which the scidb array will be created

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

chunk_size : integer or list of integers

The chunk size of the uploaded SciDBArray. Default=1000

**kwargs :

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

from_dataframe(A, instance_id=0, **kwargs)

Initialize a scidb array from a pandas dataframe

Parameters:

A : pandas dataframe

data from which the scidb array will be created.

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs :

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

from_sparse(A, instance_id=0, **kwargs)

Initialize a scidb array from a sparse array

Parameters:

A : sparse array

sparse input array from which the scidb array will be created. Note that this array will internally be converted to COO format.

instance_id : integer

the instance ID used in loading (default=0; see SciDB documentation)

**kwargs :

Additional keyword arguments are passed to new_array()

Returns:

arr : SciDBArray

SciDB Array object built from the input array

hstack(arrays)

Stack arrays in sequence horizontally (column wise).

Parameters:

arrays : Sequence of SciDBArrays

The arrays to join. All arrays must have the same shape along all but the second dimension.

Returns:

stacked : SciDBArray

The array formed by stacking the given arrays.

identity(n, dtype=u'double', sparse=False, **kwargs)

Return a 2-dimensional square identity matrix of size n

Parameters:

n : integer

the number of rows and columns in the matrix

dtype : string or list

The data type of the array

sparse : boolean

specify whether to create a sparse array (default=False)

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr : SciDBArray

A SciDBArray containing an [n x n] identity matrix

isnan(A)

Element-wise nan test function

join(*args)

Perform a series of array joins on the arguments and return the result.

linspace(start, stop, num=50, endpoint=True, retstep=False, **kwargs)

Return evenly spaced numbers over a specified interval.

Returns num evenly spaced samples, calculated over the interval [start, stop].

The endpoint of the interval can optionally be excluded.

Parameters:

start : scalar

The starting value of the sequence.

stop : scalar

The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.

num : int, optional

Number of samples to generate. Default is 50.

endpoint : bool, optional

If True, stop is the last sample. Otherwise, it is not included. Default is True.

retstep : bool, optional

If True, return (samples, step), where step is the spacing between samples.

**kwargs :

additional keyword arguments are passed to SciDBDataShape

Returns:

samples : SciDBArray

There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).

step : float (only if retstep is True)

Size of spacing between samples.

list_arrays()

List the arrays currently in the database

Returns:

array_list : dictionary

A mapping of array name -> schema

log(A)

Element-wise natural logarithm

log10(A)

Element-wise base-10 logarithm

login(user, password)

Login using PAM authentication (e.g., over HTTPS)

logout()

Logout from PAM authentication (e.g., over HTTPS)

ls(pattern=u'*')

List the arrays in the database, optionally matching to a pattern

Parameters:

pattern : String (optional)

A glob-style pattern string. If present, only arrays whose names match the pattern are displayed. ‘*’ matches any string, ‘?’ matches any single character

Returns:

result : list

A list of SciDB array names

max(A, index=None, scidb_syntax=False)

Array or axis maximum.

see SciDBArray.max()

mean(A, index=None, scidb_syntax=False)

Array or axis mean.

see SciDBArray.mean()

merge(left, right, on=None, left_on=None, right_on=None, how=u'inner', suffixes=(u'_x', u'_y'))

Perform a Pandas-like join on two SciDBArrays.

Parameters:

left : SciDBArray

The left array to join on

right : SciDBArray

The right array to join on

on : None, string, or list of strings

The names of dimensions or attributes to join on. Either on or both left_on and right_on must be supplied. If on is supplied, the specified names must exist in both left and right

left_on : None, string, or list of strings

The names of dimensions or attributes in the left array to join on. If provided, then right_on must also be provided, and have as many elements as left_on

right_on : None, string, or list of strings

The names of dimensions or attributes in the right array to join on. See notes above for left_on

how : ‘inner’ | ‘left’ | ‘right’ | ‘outer’

The kind of join to perform. Currently, only ‘inner’ is supported.

suffixes : tuple of two strings

The suffix to add to array dimensions or attributes which are duplicated in left and right.

Returns:

joined : SciDBArray

The joined array.

Notes

When joining on attributes, a categorical index is computed for each array. This index will appear as a dimension in the output.

This function builds an AFL join or cross join query, performing preprocessing on the inputs as necessary to match chunk sizes, avoid name collisions, etc.

If none of on, left_on, or right_on is provided, then the join defaults to the overlapping dimension names.

min(A, index=None, scidb_syntax=False)

Array or axis minimum.

see SciDBArray.min()

new_array(shape=None, dtype=u'double', persistent=False, name=None, **kwargs)

Create a new array, either instantiating it in SciDB or simply reserving the name for use in a later query.

Parameters:

shape : int or tuple (optional)

The shape of the array to create. If not specified, no array will be created and a name will simply be reserved for later use. WARNING: if shape=None and persistent=False, an error will result when the array goes out of scope, unless the name is used to create an array on the server.

dtype : string (optional)

the datatype of the array. This is only referenced if shape is specified. Default is ‘double’.

persistent : boolean (optional)

whether the created array should be persistent, i.e. survive in SciDB past when the object wrapper goes out of scope. Default is False.

name : str (optional)

The name to give the array in the database. If present, persistent will be set to True.

**kwargs : (optional)

If shape is specified, additional keyword arguments are passed to SciDBDataShape. Otherwise, these will not be referenced.

Returns:

arr : SciDBArray

wrapper of the new SciDB array instance.

ones(shape, dtype=u'double', **kwargs)

Return an array of ones

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr: SciDBArray

A SciDBArray consisting of all ones.

percentile(a, q, att=None)

Compute the qth percentile of the data along the specified axis

Parameters:

a : SciDBArray

Input array

q : float in the range [0, 100] or a sequence of floats

The percentiles to compute

att : str, optional

The array attribute to compute percentiles for. Defaults to the first attribute

Returns:

qs : SciDBArray

An array with as many elements as q, listing the data value at each percentile

query(query, *args, **kwargs)

Perform a query on the database.

This wraps a query constructor which allows the creation of sophisticated SciDB queries which act on arrays wrapped by SciDBArray objects. See Notes below for details.

Parameters:

query : string

The query string, with curly-braces to indicate insertions

*args, **kwargs :

Values to be inserted (see below).

randint(shape, dtype=u'uint32', lower=0, upper=2147483647, persistent=False, **kwargs)

Return an array of random integers between lower and upper

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

lower : float

The lower bound of the random sample (default=0)

upper : float

The upper bound of the random sample (default=2147483647)

persistent : bool

Whether the array is persistent (default=False)

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr: SciDBArray

A SciDBArray consisting of random integers, uniformly distributed between lower and upper.

random(shape, dtype=u'double', lower=0, upper=1, persistent=False, **kwargs)

Return an array of random floats between lower and upper

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

lower : float

The lower bound of the random sample (default=0)

upper : float

The upper bound of the random sample (default=1)

persistent : bool

Whether the new array is persistent (default=False)

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr: SciDBArray

A SciDBArray consisting of random floating point numbers, uniformly distributed between lower and upper.

reap()

Reap all arrays created via new_array

remove(array)

Remove an array from the database

This removes the array even if its persistent property is True!

Parameters:

array : str or SciDBArray

The array (or name of array) to remove

See also

reap, SciDBArray.reap

sin(A)

Element-wise trigonometric sine

sqrt(A)

Element-wise square root

std(A, index=None, scidb_syntax=False)

Array or axis standard deviation.

see SciDBArray.std()

stdev(A, index=None, scidb_syntax=False)

Array or axis standard deviation.

see SciDBArray.stdev()

substitute(A, value)

Replace null values in an array

See SciDBArray.substitute()

sum(A, index=None, scidb_syntax=False)

Array or axis sum.

see SciDBArray.sum()

svd(A, return_U=True, return_S=True, return_VT=True)

Compute the Singular Value Decomposition of the array A:

A = U.S.V^T

Parameters:

A : SciDBArray

The array for which the SVD will be computed. It should be a 2-dimensional array with a single value per cell. Currently, the svd routine requires non-overlapping chunks of size 32.

return_U, return_S, return_VT : boolean

if any is True, then return the associated array. All are True by default

Returns:

[U], [S], [VT] : SciDBArrays

Arrays storing the singular values and vectors of A.

tan(A)

Element-wise trigonometric tangent

toarray(A, transfer_bytes=True)

Convert a SciDB array to a numpy array

todataframe(A, transfer_bytes=True)

Convert a SciDB array to a pandas dataframe

tosparse(A, sparse_fmt=u'recarray', transfer_bytes=True)

Convert a SciDB array to a sparse representation

unique(x, is_sorted=False)

Store the unique elements of an array in a new array

Parameters:

x : SciDBArray

The array to compute unique elements of.

is_sorted : bool

Whether the array is pre-sorted. If True, x must be a 1D array.

Returns:

u : SciDBArray

The unique elements of x

var(A, index=None, scidb_syntax=False)

Array or axis variance.

see SciDBArray.var()

vstack(arrays)

Stack arrays in sequence vertically (row wise).

Parameters:

arrays : Sequence of SciDBArrays

The arrays to join. All arrays must have the same shape along all but the first dimension.

Returns:

stacked : SciDBArray

The array formed by stacking the given arrays.

wrap_array(scidbname, persistent=True)

Create a new SciDBArray object that references an existing SciDB array

Parameters:

scidbname : string

Wrap an existing scidb array referred to by scidbname. The object shape, datashape and data type values will be determined by the SciDB array; its persistent value is set by the persistent argument.

persistent : boolean

If True (default) then array will not be deleted when this variable goes out of scope. Warning: if persistent is set to False, data could be lost!

zeros(shape, dtype=u'double', **kwargs)

Return an array of zeros

Parameters:

shape : tuple or int

The shape of the array

dtype : string or list

The data type of the array

**kwargs :

Additional keyword arguments are passed to SciDBDataShape.

Returns:

arr: SciDBArray

A SciDBArray consisting of all zeros.

Visualization and Analysis

class scidbpy.aggregation.GroupBy(array, by, columns=None)

Perform a GroupBy operation on an array

The interface of this class mimics a subset of the functionality of Pandas’ groupby.

Notes

GroupBy items can be names of attributes or dimensions, or a single-attribute array whose shape matches the input.

For each non-unsigned integer attribute used in a groupby, a new categorical index dimension is created.

Examples

>>> x = sdb.afl.build('<a:int32>[i=0:100,1000,0]', 'iif(i > 50, 1, 0)')
>>> y = sdb.afl.build('<b:int32>[i=0:100,1000,0]', 'i % 30')
>>> z = sdb.join(x, y)
>>> grp = z.groupby('a')
>>> grp.aggregate('sum(b)').todataframe()
   a  b_sum
0  0    645
1  1    715

Multiple aggregation functions can be provided with a dict:

>>> grp.aggregate({'s':'sum(b)', 'm':'max(b)'}).todataframe()
   a    s   m
0  0  645  29
1  1  715  29
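The numbers in the example output can be cross-checked without a database. The sketch below rebuilds the two attributes in plain Python (i runs from 0 to 100 inclusive, as in the schemas above) and groups b by a; it illustrates the grouping logic only and does not use SciDB-py.

```python
from collections import defaultdict

# Same data as the build() expressions above: a = iif(i > 50, 1, 0), b = i % 30.
a = [1 if i > 50 else 0 for i in range(101)]
b = [i % 30 for i in range(101)]

sums = defaultdict(int)
maxima = {}
for key, value in zip(a, b):
    sums[key] += value
    maxima[key] = max(value, maxima.get(key, value))

for key in sorted(sums):
    print(key, sums[key], maxima[key])
# 0 645 29
# 1 715 29
```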

Methods

aggregate(mappings[, unpack]) Perform an aggregation over each group
approxdc() Compute the approxdc of all attributes in each group
avg() Compute the avg of all attributes in each group
count() Compute the count of all attributes in each group
max() Compute the max of all attributes in each group
min() Compute the min of all attributes in each group
stdev() Compute the stdev of all attributes in each group
sum() Compute the sum of all attributes in each group
var() Compute the var of all attributes in each group

aggregate(mappings, unpack=True)

Perform an aggregation over each group

Parameters:

mappings : string or dictionary

If a string, a single SciDB expression to apply to each group. If a dict, a mapping from output attribute names to expression strings.

unpack : bool (optional)

If True (the default), the result will be unpacked into a dense 1D array. If False, the result will be dimensioned by each groupby item.

Returns:

agg : SciDBArray

A new SciDBArray, obtained by applying the aggregations to the groups of the input array.
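
The difference between the two unpack settings can be pictured with a small plain-Python analogy. This is a sketch of the two result layouts under assumed data, not SciDB-py code and not a description of how the library stores results: the records, the "region" and "year" group-by items, and the variable names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records with two group-by items, "region" and "year".
records = [
    {"region": 0, "year": 0, "v": 1.0},
    {"region": 0, "year": 1, "v": 2.0},
    {"region": 1, "year": 0, "v": 3.0},
    {"region": 1, "year": 0, "v": 4.0},
]

# Sum "v" within each (region, year) group.
sums = defaultdict(float)
for r in records:
    sums[(r["region"], r["year"])] += r["v"]

# Analogue of unpack=True: one flat row per non-empty group.
unpacked = [(region, year, total) for (region, year), total in sorted(sums.items())]

# Analogue of unpack=False: a grid with one axis per group-by item.
# None marks a group with no members (an empty cell).
n_region = 1 + max(key[0] for key in sums)
n_year = 1 + max(key[1] for key in sums)
grid = [[sums.get((i, j)) for j in range(n_year)] for i in range(n_region)]

print(unpacked)
print(grid)
```

In the flat layout the group keys travel with each row; in the grid layout they become the indices, and combinations that never occur stay empty.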

approxdc()

Compute the approxdc of all attributes in each group

avg()

Compute the avg of all attributes in each group

count()

Compute the count of all attributes in each group

max()

Compute the max of all attributes in each group

min()

Compute the min of all attributes in each group

stdev()

Compute the stdev of all attributes in each group

sum()

Compute the sum of all attributes in each group

var()

Compute the var of all attributes in each group

scidbpy.aggregation.histogram(X, bins=10, att=None, range=None, plot=False, **kwargs)

Build a 1D histogram from a SciDBArray.

Parameters:

X : SciDBArray

The array to compute a histogram for

att : str (optional)

The attribute of the array to consider. Defaults to the first attribute.

bins : int (optional)

The number of bins

range : [min, max] (optional)

The lower and upper limits of the histogram. Defaults to data limits.

plot : bool

If True, plot the results with matplotlib

histtype : ‘bar’ | ‘step’ (default=‘bar’)

If plotting, the kind of histogram to draw. See matplotlib.pyplot.hist for more details.

kwargs : optional

Additional keywords passed to matplotlib

Returns:

(counts, edges [, artists])

  • edges is a NumPy array of edge locations (length=bins+1)
  • counts is the number of data points in each bin, between edges[i] and edges[i+1] (length=bins)
  • artists is a list of the matplotlib artists created if plot=True
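
The relationship between counts and edges can be illustrated with a short plain-Python sketch of equal-width binning. It is not the SciDB-py implementation: the function name equal_width_histogram is hypothetical, it returns Python lists rather than NumPy arrays, and it assumes the last bin is closed on the right (the convention NumPy's histogram uses), which SciDB-py may or may not share.

```python
def equal_width_histogram(values, bins=10, value_range=None):
    """Return (counts, edges) for equal-width bins; len(edges) == bins + 1."""
    if value_range is None:
        lo, hi = min(values), max(values)  # default to the data limits
    else:
        lo, hi = value_range
    if hi <= lo:
        raise ValueError("the upper limit must exceed the lower limit")

    width = (hi - lo) / bins
    edges = [lo + k * width for k in range(bins)] + [hi]
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin (assumed convention).
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
    return counts, edges


counts, edges = equal_width_histogram(list(range(11)), bins=5)
print(counts)  # one count per bin
print(edges)   # one more edge than there are bins
```

Values outside the requested range are dropped in this sketch, so the counts sum to the number of values only when the range covers all of the data.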