Creating arrays

The following sections illustrate a number of ways to create SciDBArray objects. The examples assume that an sdb interface object has already been set up.

From a numpy array

Perhaps the simplest way to create an arbitrary SciDBArray object is to upload a numpy array into SciDB with the from_array() function. This approach is convenient, but it is not suitable for very large arrays, which may exceed the memory available on a single computer. In such cases, consider the other options described below.

The following example creates a SciDBArray object named Xsdb from a small 5x4 numpy array named X.

import numpy as np
from scidbpy import connect
sdb = connect()

# build a small 5x4 array locally, then upload it to SciDB
X = np.random.random((5, 4))
Xsdb = sdb.from_array(X)

The package takes care of naming the SciDB array in this example (use Xsdb.name to see the SciDB array name).
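To judge whether an array is small enough to upload with from_array(), it can help to estimate its dense in-memory size first. The following is a minimal sketch in plain Python; the helper name dense_nbytes and the 100000x100000 shape are illustrative assumptions and are not part of SciDB-Py.

```python
def dense_nbytes(shape, itemsize=8):
    """Bytes needed to hold a dense array of the given shape in memory.

    itemsize=8 corresponds to double-precision values.
    """
    total = itemsize
    for n in shape:
        total *= n
    return total


small = dense_nbytes((5, 4))            # the 5x4 example above
large = dense_nbytes((100000, 100000))  # a hypothetical large matrix

print(small)        # 160 (bytes)
print(large / 1e9)  # 80.0 (gigabytes)
```

An array of the second size would not fit in the memory of a typical single machine, so one of the server-side creation routes described below is the better choice.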

From a scipy sparse matrix

In a similar way, a SciDBArray can be created from a scipy sparse matrix. For example:

from scipy.sparse import coo_matrix
X = np.random.random((10, 10))
X[X < 0.9] = 0  # make array sparse
Xcoo = coo_matrix(X)
Xsdb = sdb.from_sparse(Xcoo)

This operation is most efficient for matrices stored in coordinate form (coo_matrix). Other sparse formats will be internally converted to COO form in the process of transferring the data.
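Coordinate form stores only the nonzero entries, as parallel lists of row indices, column indices, and values. The plain-Python sketch below illustrates the idea; the helper to_coo is a hypothetical name, and this is not how scipy or SciDB-Py implement the conversion.

```python
def to_coo(dense):
    """Return (rows, cols, vals) for the nonzero entries of a nested list."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


dense = [
    [0, 0, 3.0],
    [0, 0, 0],
    [1.5, 0, 0],
]
rows, cols, vals = to_coo(dense)
print(rows)  # [0, 2]
print(cols)  # [2, 0]
print(vals)  # [3.0, 1.5]
```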

Convenience array creation functions

Many standard numpy functions for creating special arrays are supported. These include:

zeros()
to create an array full of zeros:
# Create a 10x10 array of double-precision zeros:
A = sdb.zeros((10, 10))
ones()
to create an array full of ones:
# Create a 10x10 array of 64-bit signed integer ones:
A = sdb.ones((10, 10), dtype='int64')
random()
to create an array of uniformly distributed random floating-point values:
# Create a 10x10 array of numbers between -1 and 2 (inclusive)
#    sampled from a uniform random distribution.
A = sdb.random((10, 10), lower=-1, upper=2)
randint()
to create an array of uniformly distributed random integers:
# Create a 10x10 array of uniform random integers between 0 and 10
#  (inclusive of 0, non-inclusive of 10)
A = sdb.randint((10, 10), lower=0, upper=10)
arange()
to create an array with evenly spaced values given a step size:
# Create a vector of ten integers, counting up from zero
A = sdb.arange(10)
linspace()
to create an array with evenly spaced values between supplied bounds:
# Create a vector of 5 equally spaced numbers between 1 and 10,
# including the endpoints:
A = sdb.linspace(1, 10, 5)
identity()
to create a sparse or dense identity matrix:
# Create a 10x10 sparse, double-precision-valued identity matrix:
A = sdb.identity(10, dtype='double', sparse=True)

These functions should be familiar to anyone who has used NumPy, and the syntax of each function closely follows its NumPy counterpart. In each case, the array is defined and created directly in the SciDB server, and the resulting Python object is simply a wrapper of the native SciDB array. Because of this, the functions outlined here and in the following sections can be more efficient ways to generate large SciDB arrays than copying data from a numpy array.
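As a rough check on the bounds used in the arange(), linspace(), and randint() examples above, the values can be reproduced locally in plain Python. This sketch only mirrors the conventions described in the comments above (linspace endpoints included, randint upper bound excluded); it does not call SciDB, and linspace_values is a hypothetical helper name.

```python
import random


def linspace_values(start, stop, num):
    """Evenly spaced values from start to stop, both endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num)]


# counterpart of sdb.arange(10): ten integers counting up from zero
print(list(range(10)))           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# counterpart of sdb.linspace(1, 10, 5)
print(linspace_values(1, 10, 5))  # [1.0, 3.25, 5.5, 7.75, 10.0]

# counterpart of randint with lower=0, upper=10: 0 included, 10 excluded
samples = [random.randrange(0, 10) for _ in range(1000)]
print(all(0 <= s < 10 for s in samples))  # True
```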

Note

SciDB does not yet have a way to set a random seed, prohibiting reproducible results involving the random number generator.

From an existing SciDB array

Finally, SciDBArray objects may be created from existing SciDB arrays, so long as the data type restrictions outlined above are met. (It usually makes sense to load large data sets into SciDB externally from the Python package, using the SciDB parallel bulk loader or similar facility.)

The following example uses the query() function to build and store a small 10x5 SciDB array named “A” independently of Python. We then create a SciDBArray object from that SciDB array with the wrap_array() function, passing the name of the array on the SciDB server:

# remove A if it already exists
if "A" in sdb.list_arrays():
    sdb.query("remove(A)")

# create an array named 'A' on the server
sdb.query("store(build(<v:double>[i=1:10,10,0,j=1:5,5,0],i+j),A)")

# create a Python object pointing to this array
A = sdb.wrap_array("A")

Note that there are some restrictions on the types of arrays which can be wrapped by SciDB-Py. The array data must be of a compatible type, and have integer indices. Also, arrays with indices that don’t start at zero may not behave as expected for item access and slicing, discussed below.

Note also that many functions in the SciDB-Py package work on single-attribute arrays. When a SciDBArray object refers to a SciDB array with more than one attribute, only the first listed attribute is used.
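To make the build() expression in the example above concrete, the sketch below computes the same cell values, v = i + j for i in 1..10 and j in 1..5, as a nested Python list. It runs without a SciDB server and assumes the schema is read as dimension ranges 1:10 and 1:5; the variable name A_expected is illustrative.

```python
# Values defined by build(<v:double>[i=1:10,...,j=1:5,...], i+j):
# one row per i in 1..10, one column per j in 1..5.
A_expected = [[float(i + j) for j in range(1, 6)] for i in range(1, 11)]

print(len(A_expected), len(A_expected[0]))  # 10 5
print(A_expected[0][0])   # 2.0  (SciDB cell i=1, j=1)
print(A_expected[9][4])   # 15.0 (SciDB cell i=10, j=5)
```

Note that the Python list is indexed from zero while the SciDB dimensions start at one, which is the kind of offset the restriction on item access and slicing refers to.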

Persistence of SciDB-Py arrays

Each array has a persistent attribute. When persistent is set to True, the array remains in SciDB until it is explicitly removed by a remove query. When persistent is set to False, the array is removed when the SciDBInterface.reap() or SciDBArray.reap() method is invoked. (Note that SciDBInterface.reap() is automatically invoked when Python exits.)

Arrays defined from an existing SciDB array using the wrap_array() function are always persistent, while all other array creation routines set persistent=False by default:

X = sdb.random(10, persistent=False)  # default
print(X.name in sdb.list_arrays())  # True
X.reap()
print(X.name in sdb.list_arrays())  # False

When connect() is used as a context manager, non-persistent arrays are reaped at the end of the context block:

with connect(url) as sdb:
    X = sdb.random(10)
# deleted here
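The reaping behaviour described in this section can be pictured with a small toy model. The class below is a stand-in written for illustration only: ToyServer, create(), and the array names are hypothetical, and none of this is the SciDB-Py implementation. It only shows the rule that non-persistent arrays disappear when reap() runs or when the context block ends.

```python
class ToyServer:
    """Toy stand-in for a SciDB connection (not the SciDB-Py implementation)."""

    def __init__(self):
        self.arrays = {}  # array name -> persistent flag

    def create(self, name, persistent=False):
        self.arrays[name] = persistent
        return name

    def reap(self):
        # keep only the arrays marked persistent
        self.arrays = {n: p for n, p in self.arrays.items() if p}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.reap()   # non-persistent arrays are removed on exit
        return False  # do not suppress exceptions


with ToyServer() as server:
    server.create("tmp")                    # non-persistent by default
    server.create("keep", persistent=True)
    inside = sorted(server.arrays)

after = sorted(server.arrays)
print(inside)  # ['keep', 'tmp']
print(after)   # ['keep']
```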