Tutorial RaggedArray

Look at the general tutorial for Arrays first, if you haven’t done so.

What is a ragged array?

A ragged array (also called a jagged array) can be seen as a sequence of subarrays that may be multidimensional, and that may differ only in the length of their first dimension.

In the simplest case it is a sequence of variable-length one-dimensional subarrays, e.g.:

[[1,2],
 [3,4,5],
 [6],
 [7,8,9,10]]

But the variable-length subarrays may also be multidimensional, e.g.:

[[[1,2],[3,4]],
 [[5,6],[7,8],[9,10]],
 [[11,12]],
 [[13,14],[15,16]]]

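In this second example, each two-dimensional subarray has a variable-length first axis but a fixed-length second axis of length 2. The ragged array is therefore said to have an ‘atom’ shape of (2,). In the first example, where the subarrays are one-dimensional, the atom shape is ().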
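The rule that subarrays may differ only in their first dimension is easy to check mechanically. The following is a minimal NumPy sketch; the helper `atom_shape` is hypothetical and not part of Darr. It converts each subarray with `np.asarray` and returns the trailing shape they all share, raising a `ValueError` if any subarray disagrees.

```python
import numpy as np


def atom_shape(subarrays):
    """Return the shape shared by all subarrays except for their first axis.

    Hypothetical helper for illustration only; not part of the Darr API.
    """
    shape = None
    for sub in subarrays:
        sub = np.asarray(sub)
        if sub.ndim == 0:
            raise ValueError("subarrays must have at least one dimension")
        if shape is None:
            shape = sub.shape[1:]      # everything after the first axis
        elif sub.shape[1:] != shape:
            raise ValueError(f"atom shape {sub.shape[1:]} differs from {shape}")
    return shape


simple = [[1, 2], [3, 4, 5], [6], [7, 8, 9, 10]]
nested = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8], [9, 10]],
          [[11, 12]],
          [[13, 14], [15, 16]]]

print(atom_shape(simple))   # ()
print(atom_shape(nested))   # (2,)
```

Applied to the two examples above, it reports an atom shape of () for the one-dimensional subarrays and (2,) for the two-dimensional ones.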

NumPy does not natively support ragged arrays, but they are common in science, which is why formats such as HDF5 and Zarr support them, and so does Darr. Typical use cases include (multidimensional) data that are recorded intermittently: think of acoustic monitoring, where only interesting sound events of varying duration are saved, or of event-related episodes in long-term neural recordings. You could save each subarray in a separate file, but this becomes unwieldy and inefficient when the subarrays are small and very numerous.

Creating a RaggedArray

The asraggedarray function takes anything that consists of a sequence of arrays:

>>> import darr
>>> data = [[[1,2],[3,4]],
...         [[5,6],[7,8],[9,10]],
...         [[11,12]],
...         [[13,14],[15,16]]]
>>> ra1 = darr.asraggedarray('test_ra1.darr', data, dtype='uint16')
>>> ra1
RaggedArray (4 subarrays with atom shape (2,), r+)

It also accepts an iterable that produces arrays one at a time, such as a generator. This is handy when the data are produced incrementally, e.g. by a measuring device, or when the input is simply too large to fit in memory.

>>> ra2 = darr.asraggedarray('test_ra2.darr', (i*[i] for i in range(10)),
...                          dtype='float32')
>>> ra2
RaggedArray (10 subarrays with atom shape (), r+)
>>> ra2[3]
array([3., 3., 3.], dtype=float32)
>>> ra2[7]
array([7., 7., 7., 7., 7., 7., 7.], dtype=float32)
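Accepting a generator means that only one subarray needs to be held in memory at a time. The sketch below illustrates that idea with plain NumPy; it is not Darr's implementation. The hypothetical function `stream_to_file` writes the raw bytes of each subarray to a single binary file (in the machine's native byte order) as soon as the generator yields it, and records a (start, end) position pair for every subarray. It is fed the same generator as in the `ra2` example.

```python
import os
import tempfile

import numpy as np


def stream_to_file(path, subarrays, dtype):
    """Write subarrays one by one to `path` and return an (n, 2) int64 array
    of half-open [start, end) positions along the first axis.

    Hypothetical helper for illustration only; not part of the Darr API.
    """
    indices = []
    start = 0
    with open(path, "wb") as f:
        for sub in subarrays:                 # consumes the generator lazily
            sub = np.asarray(sub, dtype=dtype)
            f.write(sub.tobytes())            # raw C-order bytes, native byte order
            end = start + len(sub)
            indices.append((start, end))
            start = end
    return np.array(indices, dtype=np.int64).reshape(-1, 2)


path = os.path.join(tempfile.mkdtemp(), "values.bin")
idx = stream_to_file(path, (i * [i] for i in range(10)), dtype="float32")

values = np.fromfile(path, dtype="float32")
print(len(values))                     # 45
print(values[idx[3, 0]:idx[3, 1]])     # [3. 3. 3.]
```

The generator's first item, `0*[0]`, is an empty list, so the first subarray is empty and its start and end positions coincide.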

You can also create an empty ragged array with the create_raggedarray function and then append subarrays to it one at a time:

>>> ra3 = darr.create_raggedarray('test_ra3.darr', atom=(3,), dtype='float64',
...                               metadata={'date': "20220301"})
>>> ra3
RaggedArray (0 subarrays with atom shape (3,), r+)
>>> ra3.append([[1,2,3],[4,5,6]])
>>> ra3.append([[7,8,9],[10,11,12],[13,14,15]])
>>> ra3
RaggedArray (2 subarrays with atom shape (3,), r+)
>>> ra3[1]
array([[ 7.,  8.,  9.],
       [10., 11., 12.],
       [13., 14., 15.]])
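To make these append semantics concrete, here is a toy in-memory analogue of an appendable ragged array. The class `ToyRaggedArray` is hypothetical and written for illustration only; it is not how Darr works internally (Darr keeps its data on disk). It stores one flat values array of shape (total_length, *atom) plus an index array of (start, end) pairs, and each `append` adds exactly one subarray, whose trailing shape must equal the atom shape.

```python
import numpy as np


class ToyRaggedArray:
    """In-memory sketch of an appendable ragged array (hypothetical)."""

    def __init__(self, atom=(), dtype="float64"):
        self.atom = tuple(atom)
        # All subarrays concatenated along the first axis.
        self.values = np.empty((0, *self.atom), dtype=dtype)
        # One [start, end) row per subarray.
        self.indices = np.empty((0, 2), dtype=np.int64)

    def append(self, subarray):
        sub = np.asarray(subarray, dtype=self.values.dtype)
        if sub.shape[1:] != self.atom:
            raise ValueError(f"expected atom shape {self.atom}, got {sub.shape[1:]}")
        start = len(self.values)
        self.values = np.concatenate([self.values, sub])
        self.indices = np.vstack([self.indices, [start, start + len(sub)]])

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, k):
        start, end = self.indices[k]
        return self.values[start:end]


ra = ToyRaggedArray(atom=(3,))
ra.append([[1, 2, 3], [4, 5, 6]])
ra.append([[7, 8, 9], [10, 11, 12], [13, 14, 15]])
print(len(ra))   # 2
print(ra[1])
```

Because `np.concatenate` copies all existing values on every append, this toy version is only suitable for small examples.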

Reading code for other computing languages

Like Arrays, RaggedArrays come with a README.txt file that explains the data format and contains code to read the data in many scientific computing languages (see example). This code can also be generated on the fly, for quick copy-pasting into, say, R:

>>> print(ra2.readcode('R'))

will produce code to read the data in R:

# read array of indices to be used on values array
fileid <- file("indices/arrayvalues.bin", "rb")
i <- readBin(con=fileid, what=integer(), n=20, size=8, signed=TRUE, endian="little")
i <- array(data=i, dim=c(2, 10), dimnames=NULL)
close(fileid)
# read array of values:
fileid <- file("values/arrayvalues.bin", "rb")
v <- readBin(con=fileid, what=numeric(), n=45, size=4, signed=TRUE, endian="little")
close(fileid)
# create function to get subarrays:
get_subarray <- function(k){
    starti <- i[1,k] + 1  # R starts counting from 1
    endi <- i[2,k]        # R has inclusive end index
    if (starti > endi) {  # subarray is empty
        return (c())
    } else {
        return (v[starti:endi])
    }
}
# example to read third (k=3) subarray:
sa = get_subarray(3)

Of course, ragged arrays are more complex than simple multidimensional arrays, so the reading code is also more complex. But since you only need to copy and paste it, that is not a real concern.
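For comparison, the logic of the R code above can be written in Python with NumPy. This is a sketch under assumptions: it takes the layout implied by the R code (an `indices/arrayvalues.bin` file holding little-endian 64-bit (start, end) pairs, one pair per subarray, and a `values/arrayvalues.bin` file holding little-endian float32 values), and it first writes small stand-in files mimicking `ra2` so that it runs on its own. Darr can generate NumPy reading code itself (the `'numpymemmap'` entry in its list of supported languages), so in practice you would use that instead.

```python
import os
import tempfile

import numpy as np

# Stand-in files mimicking ra2: 10 subarrays i*[i], i.e. 45 float32 values
# and 10 (start, end) index pairs.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "indices"))
os.makedirs(os.path.join(root, "values"))
lengths = np.arange(10)
ends = np.cumsum(lengths)
starts = ends - lengths
np.stack([starts, ends], axis=1).astype("<i8").tofile(
    os.path.join(root, "indices", "arrayvalues.bin"))
np.repeat(np.arange(10), lengths).astype("<f4").tofile(
    os.path.join(root, "values", "arrayvalues.bin"))

# Read the index pairs as an (n, 2) array and memory-map the values.
i = np.fromfile(os.path.join(root, "indices", "arrayvalues.bin"),
                dtype="<i8").reshape(-1, 2)
v = np.memmap(os.path.join(root, "values", "arrayvalues.bin"),
              dtype="<f4", mode="r")


def get_subarray(k):
    """Return subarray k (0-based). Python slices are end-exclusive, so,
    unlike in R, no +1 adjustment of the start index is needed."""
    start, end = i[k]
    return v[start:end]


sa = get_subarray(2)   # third subarray, like get_subarray(3) in R
print(sa)              # [2. 2.]
```

Empty subarrays need no special case here: a slice with equal start and end simply returns an empty array.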

To see which languages are supported:

>>> ra2.readcodelanguages
('R',
 'darr',
 'idl',
 'julia',
 'maple',
 'mathematica',
 'matlab',
 'numpymemmap',
 'scilab')