Tutorial RaggedArray
====================

Look at the :doc:`general tutorial ` for Arrays first, if you haven't done so.

.. _access:

What is a ragged array?
-----------------------

A ragged array (also called a jagged array) can be seen as a sequence of
subarrays that may be multidimensional, and that may vary in the length of
their first dimension only.

In the simplest case it is a sequence of variable-length one-dimensional
subarrays, e.g.:

.. code:: python

    [[1,2], [3,4,5], [6], [7,8,9,10]]

But the subarrays may also be variable-length multi-dimensional arrays, e.g.:

.. code:: python

    [[[1,2],[3,4]], [[5,6],[7,8],[9,10]], [[11,12]], [[13,14],[15,16]]]

In the last case, the two-dimensional subarrays have a variable-length first
axis, but a fixed-length second axis of 2. Such a ragged array is said to have
an 'atom' shape of (2,).

NumPy does not natively support *ragged* arrays, but they are often used in
science, which is why formats like HDF5 and Zarr support them, and Darr does
so too. Common use cases include (multidimensional) data that has been
recorded intermittently. Think of acoustic monitoring in which only
interesting sound events of varying duration are saved, or of event-related
episodes in long-term neural recordings. You could save each subarray in a
separate file, but this becomes unwieldy and inefficient when the subarrays
are comparatively small and their number is very high.

Creating a RaggedArray
----------------------

The `asraggedarray` function takes anything that consists of a sequence of
arrays:

.. code:: python

    >>> import darr
    >>> ra1 = darr.asraggedarray('test_ra1.darr',
    ...                          [[[1,2],[3,4]], [[5,6],[7,8],[9,10]],
    ...                           [[11,12]], [[13,14],[15,16]]],
    ...                          dtype='uint16')
    >>> ra1
    RaggedArray (4 subarrays with atom shape (2,), r+)

It also takes anything that *generates* a sequence of arrays, which is handy
for large sequences generated by, e.g., a measuring device, or when the input
is simply too large to fit in memory.


.. code:: python

    >>> ra2 = darr.asraggedarray('test_ra2.darr',
    ...                          (i*[i] for i in range(10)),
    ...                          dtype='float32')
    >>> ra2
    RaggedArray (10 subarrays with atom shape (), r+)
    >>> ra2[3]
    array([3., 3., 3.], dtype=float32)
    >>> ra2[7]
    array([7., 7., 7., 7., 7., 7., 7.], dtype=float32)

You can also create an empty ragged array with the `create_raggedarray`
function and then simply append data:

.. code:: python

    >>> ra3 = darr.create_raggedarray('test_ra3.darr', atom=(3,),
    ...                               dtype='float64',
    ...                               metadata={'date': "20220301"})
    >>> ra3
    RaggedArray (0 subarrays with atom shape (3,), r+)
    >>> ra3.append([[1,2,3],[4,5,6]])
    >>> ra3.append([[7,8,9],[10,11,12],[13,14,15]])
    >>> ra3
    RaggedArray (2 subarrays with atom shape (3,), r+)
    >>> ra3[1]
    array([[ 7.,  8.,  9.],
           [10., 11., 12.],
           [13., 14., 15.]])

Reading code for other computing languages
------------------------------------------

Like Arrays, RaggedArrays have a README.txt file containing an explanation and
reading code for many scientific computing languages (see `example `__). This
code can also be produced on the fly, for a fast copy-paste into, say, R:

.. code:: python

    >>> print(ra2.readcode('R'))

This will produce code to read the data in R:

.. code:: r

    # read array of indices to be used on values array
    fileid <- file("indices/arrayvalues.bin", "rb")
    i <- readBin(con=fileid, what=integer(), n=20, size=8, signed=TRUE, endian="little")
    i <- array(data=i, dim=c(2, 10), dimnames=NULL)
    close(fileid)
    # read array of values:
    fileid <- file("values/arrayvalues.bin", "rb")
    v <- readBin(con=fileid, what=numeric(), n=45, size=4, signed=TRUE, endian="little")
    close(fileid)
    # create function to get subarrays:
    get_subarray <- function(k){
        starti <- i[1,k] + 1  # R starts counting from 1
        endi <- i[2,k]  # R has inclusive end index
        if (starti > endi) {
            return (c())  # subarray is empty
        } else {
            return (v[starti:endi])
        }
    }
    # example to read third (k=3) subarray:
    sa = get_subarray(3)

Of course, ragged arrays are more complex than simple multi-dimensional
arrays, so the code is also more complex.
But you only need to copy-paste it, so that is not a real concern.

To see which languages are supported:

.. code:: python

    >>> ra2.readcodelanguages
    ('R', 'darr', 'idl', 'julia', 'maple', 'mathematica', 'matlab', 'numpymemmap', 'scilab')
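
The R code above reads two things: a flat array of values, and an array of
start and end indices that says where each subarray sits in it. The sketch
below mimics that layout in plain NumPy, for one-dimensional subarrays only,
to make the idea concrete. It is an illustration under stated assumptions, not
Darr's implementation: the helper names ``pack_ragged`` and ``get_subarray``
are hypothetical, nothing is written to disk, and the (n, 2) index shape is an
assumption based on the ``dim=c(2, 10)`` array in the R code.

.. code:: python

    import numpy as np

    def pack_ragged(subarrays, dtype="float32"):
        """Pack 1-D subarrays into flat values plus (start, end) indices.

        Hypothetical helper for illustration; not part of the Darr API.
        """
        arrays = [np.asarray(sa, dtype=dtype) for sa in subarrays]
        lengths = np.array([len(a) for a in arrays], dtype="int64")
        ends = np.cumsum(lengths)       # exclusive end index of each subarray
        starts = ends - lengths         # zero-based start index
        indices = np.stack([starts, ends], axis=1)  # shape (n, 2)
        if arrays:
            values = np.concatenate(arrays)
        else:
            values = np.empty(0, dtype=dtype)
        return values, indices

    def get_subarray(values, indices, k):
        """Return subarray k as a slice of the flat values array."""
        start, end = indices[k]
        return values[start:end]        # empty when start == end

    # Same data as ra2 above: subarray i holds i copies of the number i.
    values, indices = pack_ragged([i * [i] for i in range(10)])
    print(values.shape)                      # (45,)
    print(indices.shape)                     # (10, 2)
    print(get_subarray(values, indices, 3))  # [3. 3. 3.]

The 45 values and 20 index numbers match the ``n=45`` and ``n=20`` arguments
in the R code. Unlike R, Python counts from 0 and uses exclusive end indices,
so no ``+ 1`` correction is needed when slicing.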