# Creating arrays

The following sections illustrate a number of ways to create `SciDBArray`
objects. The examples assume that an `sdb` interface object has already
been set up.

## From a numpy array

Perhaps the simplest way to create an arbitrary `SciDBArray` object is to
upload a numpy array into SciDB with the `from_array()` function. Although
this approach is convenient, it is not suitable for very large arrays (which
might exceed the memory available on a single computer, for example). In such
cases, consider the other options described below.

The following example creates a `SciDBArray` object named `Xsdb` from a small
5x4 numpy array named `X`.

```
import numpy as np
from scidbpy import connect

sdb = connect()
X = np.random.random((5, 4))
Xsdb = sdb.from_array(X)
```

The package takes care of naming the SciDB array in this example (use
`Xsdb.name` to see the SciDB array name).
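Because `from_array()` uploads data that already lives in local memory, it can help to estimate an array's footprint before building it. The sketch below uses only NumPy; the helper name `array_nbytes` and the 1 GiB threshold are illustrative assumptions, not part of SciDB-Py.

```python
import math

import numpy as np


def array_nbytes(shape, dtype="float64"):
    """Return the bytes a dense array of this shape and dtype would occupy."""
    return math.prod(shape) * np.dtype(dtype).itemsize


# Hypothetical threshold; pick a value that suits the client machine.
LIMIT_BYTES = 1 << 30  # 1 GiB

small = array_nbytes((5, 4))               # the 5x4 example above
large = array_nbytes((100_000, 100_000))   # a dense 100000x100000 array of doubles

print(small, small < LIMIT_BYTES)
print(large, large < LIMIT_BYTES)
```

An array that fails such a check is a candidate for the server-side creation routines or an external bulk load rather than `from_array()`.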

## From a scipy sparse matrix

In a similar way, a `SciDBArray` can be created from a scipy sparse
matrix. For example:

```
import numpy as np
from scipy.sparse import coo_matrix

X = np.random.random((10, 10))
X[X < 0.9] = 0  # make array sparse
Xcoo = coo_matrix(X)
Xsdb = sdb.from_sparse(Xcoo)
```

This operation is most efficient for matrices stored in coordinate form
(`coo_matrix`). Other sparse formats are converted internally to COO form
while the data is transferred.
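To make "coordinate form" concrete, the following NumPy-only sketch extracts one (row, column, value) triplet per nonzero entry, which is the layout a scipy `coo_matrix` exposes through its `row`, `col`, and `data` attributes. It illustrates the storage format only and makes no claim about how SciDB-Py transfers the data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 10))
X[X < 0.9] = 0  # make array sparse

# Coordinate (COO) form: one (row, column, value) triplet per nonzero entry.
rows, cols = np.nonzero(X)
vals = X[rows, cols]

# Rebuild the dense array from the triplets to confirm nothing was lost.
X_rebuilt = np.zeros_like(X)
X_rebuilt[rows, cols] = vals

print(len(vals), "stored entries out of", X.size)
```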

## Convenience array creation functions

Many standard numpy functions for creating special arrays are supported. These include:

- `zeros()` to create an array full of zeros:

```
# Create a 10x10 array of double-precision zeros:
A = sdb.zeros((10, 10))
```

- `ones()` to create an array full of ones:

```
# Create a 10x10 array of 64-bit signed integer ones:
A = sdb.ones((10, 10), dtype='int64')
```

- `random()` to create an array of uniformly distributed random floating-point values:

```
# Create a 10x10 array of numbers between -1 and 2 (inclusive)
# sampled from a uniform random distribution.
A = sdb.random((10, 10), lower=-1, upper=2)
```

- `randint()` to create an array of uniformly distributed random integers:

```
# Create a 10x10 array of uniform random integers between 0 and 10
# (inclusive of 0, non-inclusive of 10)
A = sdb.randint((10, 10), lower=0, upper=10)
```

- `arange()` to create an array with evenly spaced values given a step size:

```
# Create a vector of ten integers, counting up from zero
A = sdb.arange(10)
```

- `linspace()` to create an array with evenly spaced values between supplied bounds:

```
# Create a vector of 5 equally spaced numbers between 1 and 10,
# including the endpoints:
A = sdb.linspace(1, 10, 5)
```

- `identity()` to create a sparse or dense identity matrix:

```
# Create a 10x10 sparse, double-precision-valued identity matrix:
A = sdb.identity(10, dtype='double', sparse=True)
```

These functions should be familiar to anyone who has used NumPy, and the syntax of each function closely follows its NumPy counterpart. In each case, the array is defined and created directly in the SciDB server, and the resulting Python object is simply a wrapper of the native SciDB array. Because of this, the functions outlined here and in the following sections can be more efficient ways to generate large SciDB arrays than copying data from a numpy array.

Note: SciDB does not yet have a way to set a random seed, prohibiting reproducible results involving the random number generator.
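For comparison, the local NumPy counterparts of the calls above can be sketched as follows. The block runs without a SciDB server; the `cf.` comments point back to the `sdb` examples in this section, and the seed value is an arbitrary choice for illustration.

```python
import numpy as np

zeros = np.zeros((10, 10))                # cf. sdb.zeros((10, 10))
ones = np.ones((10, 10), dtype="int64")   # cf. sdb.ones((10, 10), dtype='int64')
counting = np.arange(10)                  # cf. sdb.arange(10)
spaced = np.linspace(1, 10, 5)            # cf. sdb.linspace(1, 10, 5)
eye = np.identity(10)                     # cf. sdb.identity(10, ...); always dense in NumPy

# NumPy takes the bounds first and the shape last, and its generator can be
# seeded locally for reproducible results.
rng = np.random.default_rng(42)
uniform = rng.uniform(-1, 2, size=(10, 10))     # cf. sdb.random((10, 10), lower=-1, upper=2)
integers = rng.integers(0, 10, size=(10, 10))   # cf. sdb.randint((10, 10), lower=0, upper=10)

print(spaced)
```

The main differences are the argument order of the random routines and the fact that these arrays live in local memory rather than on the server.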

## From an existing SciDB array

Finally, `SciDBArray` objects may be created from existing SciDB arrays, so
long as the data type restrictions outlined above are met. (It usually makes
sense to load large data sets into SciDB externally from the Python package,
using the SciDB parallel bulk loader or similar facility.)

The following example uses the `query()` function to build
and store a small 10x5 SciDB array named “A” independently of Python.
We then create a `SciDBArray` object from the SciDB array with the
`wrap_array()` function, passing the name of the array on the SciDB server:

```
# remove A if it already exists
if "A" in sdb.list_arrays():
    sdb.query("remove(A)")
# create an array named 'A' on the server
sdb.query("store(build(<v:double>[i=1:10,10,0,j=1:5,5,0],i+j),A)")
# create a Python object pointing to this array
A = sdb.wrap_array("A")
```
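To see what the `build` expression in the query above produces, the same values can be reproduced locally with NumPy broadcasting. This is a sketch for checking expectations, assuming the schema's dimension ranges `i=1:10` and `j=1:5`; the name `A_local` is illustrative.

```python
import numpy as np

# SciDB coordinates taken from the schema: i = 1..10 and j = 1..5
i = np.arange(1, 11).reshape(-1, 1)  # column vector, shape (10, 1)
j = np.arange(1, 6).reshape(1, -1)   # row vector, shape (1, 5)

# Broadcasting evaluates i + j for every (i, j) pair, like the build expression
A_local = (i + j).astype("float64")

print(A_local.shape)
print(A_local[0, 0], A_local[9, 4])
```

Here the SciDB cell at (i=1, j=1) lands at `A_local[0, 0]`, because NumPy indexing is zero-based while this schema starts at 1.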

Note that there are some restrictions on the types of arrays which can be
wrapped by `SciDB-Py`. The array data must be of a compatible type, and
the array must have integer indices. Also, arrays with indices that don’t
start at zero may not behave as expected for item access and slicing,
discussed below.

Note also that many functions in the SciDB-Py package work on single-attribute
arrays. When a `SciDBArray` object refers to a SciDB array with more
than one attribute, only the first listed attribute is used.

## Persistence of SciDB-Py arrays

Each array has a `persistent` attribute.
When `persistent` is set to `True`, arrays remain in SciDB
until explicitly removed by a `remove` query.
If `persistent` is set to `False`, the arrays are removed when the
`SciDBInterface.reap()` or `SciDBArray.reap()` methods are invoked.
(Note that `interface.SciDBInterface.reap()` is automatically invoked when
Python exits.)

Arrays defined from an existing SciDB array using the
`wrap_array()` function are always persistent, while
all other array creation routines set `persistent=False` by default:

```
X = sdb.random(10, persistent=False) # default
print(X.name in sdb.list_arrays()) # True
X.reap()
print(X.name in sdb.list_arrays()) # False
```

When `connect()` is used as a context manager, non-persistent
arrays are reaped at the end of the context block:

```
with connect(url) as sdb:
    X = sdb.random(10)
# deleted here
```
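The lifecycle described in this section can be modelled without a server. The class below is a hypothetical in-memory stand-in written for illustration; `ToyInterface`, its methods, and the generated array names are assumptions and do not reflect how SciDB-Py is implemented.

```python
class ToyInterface:
    """Hypothetical in-memory stand-in for a SciDB connection (not SciDB-Py)."""

    def __init__(self):
        self.server = set()    # names of arrays that exist on the "server"
        self._temporary = []   # names this interface should remove on reap()
        self._count = 0

    def create(self, persistent=False):
        """Create an array, record whether it should be reaped, return its name."""
        self._count += 1
        name = "py_array_%d" % self._count
        self.server.add(name)
        if not persistent:
            self._temporary.append(name)
        return name

    def reap(self):
        """Remove every non-persistent array created through this interface."""
        for name in self._temporary:
            self.server.discard(name)
        self._temporary = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.reap()
        return False  # do not suppress exceptions


with ToyInterface() as toy:
    kept = toy.create(persistent=True)
    temp = toy.create()  # persistent=False by default
    inside = set(toy.server)

print(sorted(inside))
print(sorted(toy.server))
```

Inside the block both arrays exist; after it, only the persistent one remains, which mirrors the behaviour described for `connect()` used as a context manager.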