Arrays

A Vortex array is a possibly compressed ordered set of homogeneously typed values. Each array has a logical type and a physical encoding. The logical type describes the set of operations applicable to the values of this array. The physical encoding describes how this array is realized in memory, on disk, and over the wire and how to apply operations to that realization.
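To make the split between logical type and physical encoding concrete, here is a minimal plain-Python sketch. It is not Vortex code: the names `decode_plain` and `decode_run_length` are hypothetical, and run-length encoding is used only as a familiar example of an alternative physical layout.

```python
# Illustration only: two hypothetical physical encodings of one logical array.

def decode_plain(values):
    """Plain encoding: the values are stored one after another."""
    return list(values)


def decode_run_length(runs):
    """Run-length encoding: each (value, count) pair stands for a run."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


plain = [7, 7, 7, 7, 9, 9]
run_length = [(7, 4), (9, 2)]

# Both physical forms realize the same logical sequence of integers.
same_logical_values = decode_plain(plain) == decode_run_length(run_length)
```

The logical view (an ordered sequence of integers) is identical in both cases; only the stored representation, and therefore the cost of each operation, differs.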

array

The main entry point for creating Vortex arrays from other Python objects.

compress

Attempt to compress a Vortex array.

Array

An array of zero or more rows each with the same set of columns.


vortex.encoding.array(obj: Array | list | Any) → Array

The main entry point for creating Vortex arrays from other Python objects.

This function is also available as vortex.array.

Parameters:

obj (pyarrow.Array, list, pandas.DataFrame) – The elements of this array or list become the elements of the Vortex array.

Return type:

vortex.encoding.Array

Examples

A Vortex array containing the first three integers:

>>> vortex.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]

The same Vortex array with a null value in the third position:

>>> vortex.array([1, 2, None, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  null,
  3
]

Initialize a Vortex array from an Arrow array:

>>> arrow = pyarrow.array(['Hello', 'it', 'is', 'me'], type=pyarrow.string_view())
>>> vortex.array(arrow).to_arrow_array()
<pyarrow.lib.StringViewArray object at ...>
[
  "Hello",
  "it",
  "is",
  "me"
]

Initialize a Vortex array from a Pandas dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "Name": ["Braund", "Allen", "Bonnell"],
...     "Age": [22, 35, 58],
... })
>>> vortex.array(df).to_arrow_array()
<pyarrow.lib.ChunkedArray object at ...>
[
  -- is_valid: all not null
  -- child 0 type: string_view
    [
      "Braund",
      "Allen",
      "Bonnell"
    ]
  -- child 1 type: int64
    [
      22,
      35,
      58
    ]
]
vortex.encoding.compress(array)

Attempt to compress a Vortex array.

Parameters:

array (Array) – The array to compress.

Examples

Compress a constant array of integers (every element is 42):

>>> a = vortex.array([42 for _ in range(1000)])
>>> str(vortex.compress(a))
'vortex.constant(0x09)(i64, len=1000)'

Compress an array of increasing integers:

>>> a = vortex.array(list(range(1000)))
>>> str(vortex.compress(a))
'fastlanes.for(0x17)(i64, len=1000)'

Compress an array of increasing floating-point numbers and a few nulls:

>>> a = vortex.array([
...     float(x) if x % 20 != 0 else None
...     for x in range(1000)
... ])
>>> str(vortex.compress(a))
'vortex.alp(0x11)(f64?, len=1000)'
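The encoding names in the outputs above hint at what the compressor chose. As a rough illustration in plain Python (not the Vortex implementation), the sketch below shows the idea behind a constant encoding and a frame-of-reference encoding. Reading `fastlanes.for` as "frame of reference" is an assumption, and all function names here are hypothetical.

```python
def encode_constant(values):
    """Store one value plus a length when every element is identical."""
    if not values or any(v != values[0] for v in values):
        raise ValueError("not a constant array")
    return {"value": values[0], "len": len(values)}


def encode_frame_of_reference(values):
    """Store the minimum once and keep only non-negative offsets from it."""
    reference = min(values)
    return {"reference": reference, "offsets": [v - reference for v in values]}


def decode_frame_of_reference(encoded):
    """Rebuild the original values by adding the reference back."""
    return [encoded["reference"] + offset for offset in encoded["offsets"]]


constant = encode_constant([42] * 1000)
shifted = encode_frame_of_reference([1000, 1001, 1005])
```

The offsets in `shifted` are small numbers even though the original values are not, which is what lets a real encoder store them in fewer bits.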
class vortex.encoding.Array

An array of zero or more rows each with the same set of columns.

Examples

Arrays support all the standard comparison operations:

>>> a = vortex.array(['dog', None, 'cat', 'mouse', 'fish'])
>>> b = vortex.array(['doug', 'jennifer', 'casper', 'mouse', 'faust'])
>>> (a < b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  true,
  null,
  false,
  false,
  false
]
>>> (a <= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  true,
  null,
  false,
  true,
  false
]
>>> (a == b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  false,
  null,
  false,
  true,
  false
]
>>> (a != b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  true,
  null,
  true,
  false,
  true
]
>>> (a >= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  false,
  null,
  true,
  true,
  true
]
>>> (a > b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  false,
  null,
  true,
  false,
  true
]
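The outputs above follow a simple rule: a null on either side of a comparison produces a null result. A minimal plain-Python sketch of that rule, using `None` for null, is shown below. The `compare` helper is hypothetical and only models the semantics; Python compares strings by code point, which agrees with the results above for these ASCII values.

```python
import operator


def compare(left, right, op):
    """Elementwise comparison where a null on either side yields null."""
    if len(left) != len(right):
        raise ValueError("arrays must have the same length")
    return [
        None if l is None or r is None else op(l, r)
        for l, r in zip(left, right)
    ]


a = ['dog', None, 'cat', 'mouse', 'fish']
b = ['doug', 'jennifer', 'casper', 'mouse', 'faust']

less_than = compare(a, b, operator.lt)
greater_or_equal = compare(a, b, operator.ge)
```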
__len__()

Return len(self).

dtype

The data type of this array.

Return type:

vortex.dtype.DType

Examples

By default, array() uses the largest available bit-width:

>>> vortex.array([1, 2, 3]).dtype
int(64, False)

Including a None forces a nullable type:

>>> vortex.array([1, None, 2, 3]).dtype
int(64, True)

A UTF-8 string array:

>>> vortex.array(['hello, ', 'is', 'it', 'me?']).dtype
utf8(False)
fill_forward()

Fill forward non-null values over runs of nulls.

Leading nulls are replaced with the “zero” for that type. For integral and floating-point types, this is zero. For the Boolean type, this is False.

Fill forward sensor values over intermediate missing values. Note that leading nulls are replaced with 0.0:

>>> a = vortex.array([
...      None,  None, 30.29, 30.30, 30.30,  None,  None, 30.27, 30.25,
...     30.22,  None,  None,  None,  None, 30.12, 30.11, 30.11, 30.11,
...     30.10, 30.08,  None, 30.21, 30.03, 30.03, 30.05, 30.07, 30.07,
... ])
>>> a.fill_forward().to_arrow_array()
<pyarrow.lib.DoubleArray object at ...>
[
  0,
  0,
  30.29,
  30.3,
  30.3,
  30.3,
  30.3,
  30.27,
  30.25,
  30.22,
  ...
  30.11,
  30.1,
  30.08,
  30.08,
  30.21,
  30.03,
  30.03,
  30.05,
  30.07,
  30.07
]
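The fill-forward rule described above can be sketched in a few lines of plain Python. This is an illustration of the semantics only, not how Vortex implements it; the `fill_forward` function and its `zero` parameter are hypothetical names.

```python
def fill_forward(values, zero=0.0):
    """Replace each None with the most recent non-null value.

    Leading Nones have no predecessor, so they become `zero`.
    """
    out = []
    last = zero
    for value in values:
        if value is not None:
            last = value
        out.append(last)
    return out


readings = [None, None, 30.29, 30.30, None, None, 30.27]
filled = fill_forward(readings)
```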
filter(filter)

Filter an Array by another Boolean array.

Parameters:

filter (Array) – Keep all the rows in self for which the correspondingly indexed row in filter is True.

Return type:

Array

Examples

Keep only the single-digit non-negative integers:

>>> a = vortex.array([0, 42, 1_000, -23, 10, 9, 5])
>>> filter = vortex.array([True, False, False, False, False, True, True])
>>> a.filter(filter).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  0,
  9,
  5
]
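As a plain-Python sketch of the same operation (semantics only, with a hypothetical `filter_by_mask` helper), the mask used above can be derived from a predicate and then applied row by row:

```python
def filter_by_mask(values, mask):
    """Keep the rows of `values` whose corresponding mask entry is True."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [value for value, keep in zip(values, mask) if keep is True]


a = [0, 42, 1_000, -23, 10, 9, 5]

# Build the Boolean mask from the predicate "is a single non-negative digit".
mask = [0 <= value < 10 for value in a]
kept = filter_by_mask(a, mask)
```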
scalar_at(index)

Retrieve a row by its index.

Parameters:

index (int) – The index of interest. Must be greater than or equal to zero and less than the length of this array.

Returns:

If this array contains numbers or Booleans, this method returns the corresponding primitive Python type, i.e. int, float, or bool. For structures and variable-length data types, a zero-copy view of the underlying data is returned.

Return type:

one of int, float, bool, vortex.scalar.Buffer, vortex.scalar.BufferString, vortex.scalar.VortexList, vortex.scalar.VortexStruct

Examples

Retrieve the last element from an array of integers:

>>> vortex.array([10, 42, 999, 1992]).scalar_at(3)
1992

Retrieve the third element from an array of strings:

>>> array = vortex.array(["hello", "goodbye", "it", "is"])
>>> array.scalar_at(2)
<vortex.BufferString ...>

Vortex, by default, returns a view into the array’s data. This avoids copying the data, which can be expensive if done repeatedly. BufferString.into_python() forcibly copies the scalar data into a Python data structure.

>>> array.scalar_at(2).into_python()
'it'

Retrieve an element from an array of structures:

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     None,
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.scalar_at(2).into_python()
{'age': 33, 'name': <vortex.BufferString ...>}

Notice that VortexStruct.into_python() only copies one “layer” of data into Python. To ensure the entire structure is recursively copied into Python, specify recursive=True:

>>> array.scalar_at(2).into_python(recursive=True)
{'age': 33, 'name': 'Angela'}

Retrieve a missing element from an array of structures:

>>> array.scalar_at(3) is None
True

Out-of-bounds accesses are prohibited:

>>> vortex.array([10, 42, 999, 1992]).scalar_at(10)
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4
...

Unlike Python, negative indices are not supported:

>>> vortex.array([10, 42, 999, 1992]).scalar_at(-2)
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned
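The difference between a shallow and a recursive `into_python()` can be modelled in plain Python. The sketch below is an assumption-laden illustration, not the Vortex classes: `StringView` and `struct_into_python` are hypothetical stand-ins, and `memoryview` plays the role of a zero-copy view into a shared buffer.

```python
class StringView:
    """Hypothetical stand-in for a zero-copy string scalar."""

    def __init__(self, buffer, start, end):
        # A memoryview slice shares memory with `buffer`; nothing is copied.
        self._view = memoryview(buffer)[start:end]

    def into_python(self):
        # Copy the viewed bytes out into an ordinary Python str.
        return bytes(self._view).decode("utf-8")


def struct_into_python(fields, recursive=False):
    """Copy a struct's fields into a dict, one layer deep or all the way down."""
    out = {}
    for name, value in fields.items():
        if recursive and isinstance(value, StringView):
            value = value.into_python()
        out[name] = value
    return out


names = b"JosephNarendraAngela"
row = {"age": 33, "name": StringView(names, 14, 20)}

shallow = struct_into_python(row)
deep = struct_into_python(row, recursive=True)
```

In `shallow`, the `name` field is still a view into `names`; in `deep`, it has been copied into an independent `str`.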
slice(start, end)

Keep only a contiguous subset of elements.

Parameters:
  • start (int) – The start index of the range to keep, inclusive.

  • end (int) – The end index, exclusive.

Return type:

Array

Examples

Keep only the second through third elements:

>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> a.slice(1, 3).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "b",
  "c"
]

Keep none of the elements:

>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> a.slice(3, 3).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[]

Unlike Python, it is an error to slice outside the bounds of the array:

>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> a.slice(2, 10).to_arrow_array()
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4

Or to slice with a negative value:

>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> a.slice(-2, -1).to_arrow_array()
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned
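For contrast with Python's forgiving list slicing, the bounds rules shown above can be sketched as a strict slice in plain Python. The `strict_slice` helper is hypothetical and mirrors only the error cases documented here; how a range with start greater than end is treated is not covered above and is not modelled.

```python
def strict_slice(values, start, end):
    """Slice `values[start:end]`, raising instead of clamping or wrapping."""
    for index in (start, end):
        if index < 0:
            # Indices are treated as unsigned, so negatives are rejected.
            raise OverflowError("can't convert negative int to unsigned")
    for index in (start, end):
        if index > len(values):
            raise ValueError(
                f"index {index} out of bounds from 0 to {len(values)}"
            )
    return values[start:end]


letters = ['a', 'b', 'c', 'd']
middle = strict_slice(letters, 1, 3)
empty = strict_slice(letters, 3, 3)
```

A plain Python list would silently return `['c', 'd']` for `letters[2:10]`, whereas the strict version raises.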
take(indices)

Filter, permute, and/or repeat elements by their index.

Parameters:

indices (Array) – An array of indices to keep.

Return type:

Array

Examples

Keep only the first and third elements:

>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> indices = vortex.array([0, 2])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "c"
]

Permute and repeat the first and second elements:

>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> indices = vortex.array([0, 1, 1, 0])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "b",
  "b",
  "a"
]
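A plain-Python sketch of the same gather operation is shown below. The `take` function here is a hypothetical model of the semantics; rejecting negative and out-of-range indices is an assumption made for consistency with the other index-based methods on this page, not something stated above.

```python
def take(values, indices):
    """Gather `values[i]` for each i in `indices`, in the order given."""
    out = []
    for i in indices:
        # Assumption: indices must lie in [0, len(values)); no wrap-around.
        if not 0 <= i < len(values):
            raise IndexError(f"index {i} out of bounds for length {len(values)}")
        out.append(values[i])
    return out


letters = ['a', 'b', 'c', 'd']
selected = take(letters, [0, 2])       # filter
shuffled = take(letters, [0, 1, 1, 0])  # permute and repeat
```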
to_arrow_array()

Convert this array to an Arrow array.

See also

to_arrow_table()

Return type:

pyarrow.Array

Examples

Convert a Vortex array built from a Python list to an Arrow array:

>>> vortex.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]
to_arrow_table() → Table

Construct an Arrow table from this Vortex array.

See also

to_arrow_array()

Warning

Only struct-typed arrays can be converted to Arrow tables.

Return type:

pyarrow.Table

Examples

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]
to_numpy(*, zero_copy_only: bool = True) → numpy.ndarray

Construct a NumPy array from this Vortex array.

This is an alias for self.to_arrow_array().to_numpy(zero_copy_only).

Parameters:

zero_copy_only (bool) – When True, this method will raise an error unless a NumPy array can be created without copying the data. This is only possible when the array is a primitive array without nulls.

Return type:

numpy.ndarray

Examples

Construct an immutable ndarray from a Vortex array:

>>> array = vortex.array([1, 0, 0, 1])
>>> array.to_numpy()
array([1, 0, 0, 1])
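The zero-copy distinction behind `zero_copy_only` can be illustrated with the standard library alone. In this sketch `array.array` and `memoryview` stand in for the Vortex buffer and the NumPy array; it does not call Vortex or NumPy, and the variable names are illustrative.

```python
from array import array

# A contiguous buffer of 64-bit signed integers.
data = array("q", [1, 0, 0, 1])

view = memoryview(data)   # zero-copy: shares the underlying buffer
copied = data.tolist()    # copy: an independent Python list

# Writing through the original buffer is visible in the view, not the copy.
data[0] = 99
view_first = view[0]
copied_first = copied[0]
```

A view like this is only possible when the target type can describe the buffer as it is. NumPy's fixed-width integer dtypes have no slot for a missing value, which is consistent with the restriction above to primitive arrays without nulls.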
to_pandas_df() → DataFrame

Construct a Pandas dataframe from this Vortex array.

Warning

Only struct-typed arrays can be converted to Pandas dataframes.

Return type:

pandas.DataFrame

Examples

Construct a dataframe from a Vortex array:

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_pandas_df()
   age      name
0   25    Joseph
1   31  Narendra
2   33    Angela
3   57   Mikhail
to_polars_dataframe()

Construct a Polars dataframe from this Vortex array.

Warning

Only struct-typed arrays can be converted to Polars dataframes.

Return type:

polars.DataFrame

Examples

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_dataframe()
shape: (4, 2)
┌─────┬──────────┐
│ age ┆ name     │
│ --- ┆ ---      │
│ i64 ┆ str      │
╞═════╪══════════╡
│ 25  ┆ Joseph   │
│ 31  ┆ Narendra │
│ 33  ┆ Angela   │
│ 57  ┆ Mikhail  │
└─────┴──────────┘
to_polars_series()

Construct a Polars series from this Vortex array.

Return type:

polars.Series

Examples

Convert a numeric array with nulls to a Polars Series:

>>> vortex.array([1, None, 2, 3]).to_polars_series()  
shape: (4,)
Series: '' [i64]
[
    1
    null
    2
    3
]

Convert a UTF-8 string array to a Polars Series:

>>> vortex.array(['hello, ', 'is', 'it', 'me?']).to_polars_series()  
shape: (4,)
Series: '' [str]
[
    "hello, "
    "is"
    "it"
    "me?"
]

Convert a struct array to a Polars Series:

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_series()  
shape: (4,)
Series: '' [struct[2]]
[
    {25,"Joseph"}
    {31,"Narendra"}
    {33,"Angela"}
    {57,"Mikhail"}
]
to_pylist() → list[Any]

Deeply copy an Array into a Python list.

Return type:

list

Examples

>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
... ])
>>> array.to_pylist()
[{'age': 25, 'name': 'Joseph'}, {'age': 31, 'name': 'Narendra'}, {'age': 33, 'name': 'Angela'}]
tree_display()

Internal technical details about the encoding of this Array.

Warning

The format of the returned string may change without notice.

Return type:

str

Examples

Uncompressed arrays have straightforward encodings:

>>> arr = vortex.array([1, 2, None, 3])
>>> print(arr.tree_display())
root: vortex.primitive(0x03)(i64?, len=4) nbytes=36 B (100.00%)
  metadata: PrimitiveMetadata { validity: Array }
  buffer: 32 B
  validity: vortex.bool(0x02)(bool, len=4) nbytes=3 B (8.33%)
    metadata: BoolMetadata { validity: NonNullable, first_byte_bit_offset: 0 }
    buffer: 1 B

Compressed arrays often have more complex, deeply nested encoding trees.