The following sections illustrate a number of ways to create
SciDBArray objects. The examples assume that an
sdb interface object has already
been set up.
From a numpy array
Perhaps the simplest approach to creating an arbitrary
SciDBArray object is to upload a numpy array into SciDB with the
from_array() function. Although this approach is very convenient, it is not really suitable
for very big arrays (which might exceed memory availability in a single
computer, for example). In such cases, consider other options described below.
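As a rough way to judge whether an array is small enough to upload this way, the dense memory footprint can be estimated from the shape and the element size. The sketch below is plain Python; the helper name dense_nbytes is hypothetical and not part of SciDB-Py, and it assumes 8-byte (double-precision) elements by default.

```python
from math import prod


def dense_nbytes(shape, itemsize=8):
    """Bytes needed to hold a dense array of `shape` with `itemsize`-byte elements."""
    return prod(shape) * itemsize


small = dense_nbytes((5, 4))               # 5 * 4 elements, 8 bytes each
large = dense_nbytes((100_000, 100_000))   # 10**10 elements, 8 bytes each

print(small)        # 160
print(large / 1e9)  # 80.0 (gigabytes)
```

An array like the second one would not fit in the memory of most single machines, which is the situation where the other creation routes are the better choice.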
The following example creates a
SciDBArray object named Xsdb
from a small 5x4 numpy array named X:

    import numpy as np
    from scidbpy import connect

    sdb = connect()
    X = np.random.random((5, 4))
    Xsdb = sdb.from_array(X)

The package takes care of naming the SciDB array in this example (use
Xsdb.name to see the SciDB array name).
From a scipy sparse matrix
In a similar way, a
SciDBArray can be created from a scipy sparse
matrix. For example:

    import numpy as np
    from scipy.sparse import coo_matrix

    X = np.random.random((10, 10))
    X[X < 0.9] = 0  # make array sparse
    Xcoo = coo_matrix(X)
    Xsdb = sdb.from_sparse(Xcoo)

This operation is most efficient for matrices stored in coordinate form
(coo_matrix). Other sparse formats will be internally converted to
COO form in the process of transferring the data.
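For readers unfamiliar with coordinate form, the idea is to store only the non-zero entries as (row, column, value) triples. The following is a minimal plain-Python sketch of that layout; the helper to_coo is hypothetical and is not how scipy or SciDB-Py implement the conversion.

```python
def to_coo(dense):
    """Return (rows, cols, values) for the non-zero entries of a nested list."""
    rows, cols, values = [], [], []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:
                rows.append(i)
                cols.append(j)
                values.append(value)
    return rows, cols, values


dense = [
    [0.0, 0.95, 0.0],
    [0.0, 0.0, 0.0],
    [0.91, 0.0, 0.99],
]
rows, cols, values = to_coo(dense)
print(rows)    # [0, 2, 2]
print(cols)    # [1, 0, 2]
print(values)  # [0.95, 0.91, 0.99]
```

Only three of the nine entries are kept, which is why this form is a natural fit for transferring mostly-zero matrices.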
Convenience array creation functions
Many standard numpy functions for creating special arrays are supported. These include:
- zeros() to create an array full of zeros:

      # Create a 10x10 array of double-precision zeros:
      A = sdb.zeros((10, 10))

- ones() to create an array full of ones:

      # Create a 10x10 array of 64-bit signed integer ones:
      A = sdb.ones((10, 10), dtype='int64')

- random() to create an array of uniformly distributed random floating-point values:

      # Create a 10x10 array of numbers between -1 and 2 (inclusive)
      # sampled from a uniform random distribution.
      A = sdb.random((10, 10), lower=-1, upper=2)

- randint() to create an array of uniformly distributed random integers:

      # Create a 10x10 array of uniform random integers between 0 and 10
      # (inclusive of 0, non-inclusive of 10)
      A = sdb.randint((10, 10), lower=0, upper=10)

- arange() to create an array with evenly-spaced values given a step size:

      # Create a vector of ten integers, counting up from zero
      A = sdb.arange(10)

- linspace() to create an array with evenly spaced values between supplied bounds:

      # Create a vector of 5 equally spaced numbers between 1 and 10,
      # including the endpoints:
      A = sdb.linspace(1, 10, 5)

- identity() to create a sparse or dense identity matrix:

      # Create a 10x10 sparse, double-precision-valued identity matrix:
      A = sdb.identity(10, dtype='double', sparse=True)

These functions should be familiar to anyone who has used NumPy, and the syntax of each function closely follows its NumPy counterpart. In each case, the array is defined and created directly in the SciDB server, and the resulting Python object is simply a wrapper of the native SciDB array. Because of this, the functions outlined here and in the following sections can be more efficient ways to generate large SciDB arrays than copying data from a numpy array.
SciDB does not yet provide a way to set a random seed, so results that involve the random number generator are not reproducible.
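The boundary conventions stated in the comments above (arange and randint exclude the upper bound, linspace includes both endpoints) can be checked without a SciDB server. The sketch below re-implements them in plain Python purely for illustration; these local functions are stand-ins written for this example, not the SciDB-Py or NumPy implementations.

```python
import random


def arange(stop):
    """Integers 0, 1, ..., stop - 1 (the stop value is excluded)."""
    return list(range(stop))


def linspace(start, stop, num):
    """`num` evenly spaced values from start to stop, endpoints included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num)]


print(arange(10))          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(linspace(1, 10, 5))  # [1.0, 3.25, 5.5, 7.75, 10.0]

# randrange uses the same half-open convention as the randint example:
# 0 can be drawn, 10 cannot.
draws = [random.randrange(0, 10) for _ in range(1000)]
print(min(draws) >= 0 and max(draws) <= 9)  # True
```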
From an existing SciDB array
SciDBArray objects may be created from existing SciDB arrays, so
long as the data type restrictions outlined above are met. (It usually makes
sense to load large data sets into SciDB externally from the Python package,
using the SciDB parallel bulk loader or similar facility.)
The following example uses the
query() function to build
and store a small 10x5 SciDB array named “A” independently of Python.
We then create a
SciDBArray object from the SciDB array with the
wrap_array() function, passing
the name of the array identifier on the SciDB server:

    # remove A if it already exists
    if "A" in sdb.list_arrays():
        sdb.query("remove(A)")

    # create an array named 'A' on the server
    sdb.query("store(build(<v:double>[i=1:10,10,0,j=1:5,5,0],i+j),A)")

    # create a Python object pointing to this array
    A = sdb.wrap_array("A")

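The query string in this example packs several pieces of information into one line: an attribute v of type double, and two dimensions, each written as name=low:high,chunk,overlap. As a sketch of how such a string is put together, the hypothetical helper below (build_store_query is not part of SciDB-Py) assembles the same query from its parts, assuming one chunk per dimension and zero overlap as in the example.

```python
def build_store_query(name, dims, expr, attr="v", dtype="double"):
    """Compose an AFL store(build(...)) query string.

    `dims` is a list of (dim_name, low, high) tuples. Each dimension is
    written as name=low:high,chunk,overlap, with a single chunk spanning
    the whole dimension and an overlap of zero.
    """
    parts = []
    for dim_name, low, high in dims:
        chunk = high - low + 1  # one chunk covers the full dimension
        parts.append(f"{dim_name}={low}:{high},{chunk},0")
    schema = f"<{attr}:{dtype}>[{','.join(parts)}]"
    return f"store(build({schema},{expr}),{name})"


query = build_store_query("A", [("i", 1, 10), ("j", 1, 5)], "i+j")
print(query)
# store(build(<v:double>[i=1:10,10,0,j=1:5,5,0],i+j),A)
```

The resulting string could then be passed to sdb.query() in place of the literal used in the example.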
Note that there are some restrictions on the types of arrays which can be
wrapped by SciDB-Py. The array data must be of a compatible type, and
have integer indices. Also, arrays with indices that don’t start at zero
may not behave as expected for item access and slicing, discussed below.
Note also that many functions in the SciDB-Py package work on single-attribute
arrays. When a
SciDBArray object refers to a SciDB array with more
than one attribute, only the first listed attribute is used.
Persistence of SciDB-Py arrays
Each array has a persistent attribute. When
persistent is set to
True, arrays remain in SciDB
until explicitly removed by a remove query. When
persistent is set to False, the arrays are removed when the
SciDBInterface.reap() or SciDBArray.reap() methods are invoked.
interface.SciDBInterface.reap() is also invoked automatically.
Arrays defined from an existing SciDB array using the
wrap_array() function are always persistent, while
all other array creation routines set
persistent=False by default:

    X = sdb.random(10, persistent=False)  # default
    print(X.name in sdb.list_arrays())  # True
    X.reap()
    print(X.name in sdb.list_arrays())  # False

When connect() is used as a context manager, non-persistent
arrays are reaped at the end of the context block:

    with connect(url) as sdb:
        X = sdb.random(10)

    # deleted here
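The reap-on-exit behaviour can be pictured with a small stand-alone model. The class below is a toy written for this illustration only: ToyInterface and its new_array() method are hypothetical, it does not talk to SciDB, and it should not be read as the way SciDB-Py implements its interface. It simply shows non-persistent entries being dropped when the context block ends while persistent ones survive.

```python
class ToyInterface:
    """Stand-in for an interface object; it only tracks array names."""

    def __init__(self):
        self.arrays = {}  # array name -> persistent flag
        self._counter = 0

    def new_array(self, persistent=False):
        """Register a new array name and return it."""
        self._counter += 1
        name = f"toy_array_{self._counter}"
        self.arrays[name] = persistent
        return name

    def list_arrays(self):
        return sorted(self.arrays)

    def reap(self):
        """Remove every array that is not marked persistent."""
        self.arrays = {n: p for n, p in self.arrays.items() if p}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.reap()
        return False  # do not swallow exceptions


with ToyInterface() as toy:
    temp = toy.new_array()                 # persistent=False by default
    kept = toy.new_array(persistent=True)
    print(toy.list_arrays())  # ['toy_array_1', 'toy_array_2']

print(toy.list_arrays())      # ['toy_array_2']
```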