Arrays
A Vortex array is a possibly compressed ordered set of homogeneously typed values. Each array has a logical type and a physical encoding. The logical type describes the set of operations applicable to the values of this array. The physical encoding describes how this array is realized in memory, on disk, and over the wire and how to apply operations to that realization.
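One way to picture the split between logical type and physical encoding is the plain-Python sketch below. The classes PlainInts and ConstantInts are hypothetical and are not part of Vortex; they only illustrate how two different physical realizations can expose the same logical values and the same operations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlainInts:
    """Hypothetical physical encoding: every value is stored explicitly."""
    values: List[int]

    def __len__(self) -> int:
        return len(self.values)

    def scalar_at(self, index: int) -> int:
        if not 0 <= index < len(self.values):
            raise IndexError(index)
        return self.values[index]


@dataclass
class ConstantInts:
    """Hypothetical physical encoding: one value plus a length."""
    value: int
    length: int

    def __len__(self) -> int:
        return self.length

    def scalar_at(self, index: int) -> int:
        if not 0 <= index < self.length:
            raise IndexError(index)
        return self.value


plain = PlainInts([42] * 1000)
constant = ConstantInts(42, 1000)

# Same logical contents, different physical realizations.
same_values = all(plain.scalar_at(i) == constant.scalar_at(i) for i in range(1000))
print(len(plain) == len(constant), same_values)  # True True
```

Both objects answer len() and scalar_at() identically, yet the second stores a single integer instead of a thousand.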
vortex.encoding.array – The main entry point for creating Vortex arrays from other Python objects.
vortex.encoding.compress – Attempt to compress a Vortex array.
vortex.encoding.Array – An array of zero or more rows each with the same set of columns.
- vortex.encoding.array(obj: Array | list | Any) → Array
The main entry point for creating Vortex arrays from other Python objects.
This function is also available as vortex.array.
- Parameters:
obj (pyarrow.Array, list, pandas.DataFrame) – The elements of this array or list become the elements of the Vortex array.
- Return type:
Array
Examples
A Vortex array containing the first three integers:
>>> vortex.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]
The same Vortex array with a null value in the third position:
>>> vortex.array([1, 2, None, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  null,
  3
]
Initialize a Vortex array from an Arrow array:
>>> arrow = pyarrow.array(['Hello', 'it', 'is', 'me'], type=pyarrow.string_view())
>>> vortex.array(arrow).to_arrow_array()
<pyarrow.lib.StringViewArray object at ...>
[
  "Hello",
  "it",
  "is",
  "me"
]
Initialize a Vortex array from a Pandas dataframe:
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "Name": ["Braund", "Allen", "Bonnell"],
...     "Age": [22, 35, 58],
... })
>>> vortex.array(df).to_arrow_array()
<pyarrow.lib.ChunkedArray object at ...>
[
  -- is_valid: all not null
  -- child 0 type: string_view
    [
      "Braund",
      "Allen",
      "Bonnell"
    ]
  -- child 1 type: int64
    [
      22,
      35,
      58
    ]
]
- vortex.encoding.compress(array)
Attempt to compress a Vortex array.
- Parameters:
array (Array) – The array.
Examples
Compress an array in which every element is the same integer:
>>> a = vortex.array([42 for _ in range(1000)])
>>> str(vortex.compress(a))
'vortex.constant(0x09)(i64, len=1000)'
Compress an array of increasing integers:
>>> a = vortex.array(list(range(1000)))
>>> str(vortex.compress(a))
'fastlanes.bitpacked(0x15)(i64, len=1000)'
Compress an array of increasing floating-point numbers and a few nulls:
>>> a = vortex.array([
...     float(x) if x % 20 != 0 else None
...     for x in range(1000)
... ])
>>> str(vortex.compress(a))
'vortex.alp(0x11)(f64?, len=1000)'
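To build intuition for why the first two examples end up with a constant and a bit-packed encoding, here is a toy heuristic in plain Python. It is only a sketch under stated assumptions: pick_encoding and bit_width are hypothetical helpers, the returned labels are illustrative, and none of this reflects how the Vortex compressor actually chooses an encoding.

```python
from typing import List


def bit_width(values: List[int]) -> int:
    """Smallest number of bits that can represent every non-negative value."""
    return max(v.bit_length() for v in values)


def pick_encoding(values: List[int]) -> str:
    """Toy heuristic; the labels are illustrative, not Vortex encoding IDs."""
    if not values:
        return "plain"
    if len(set(values)) == 1:
        # One distinct value: storing (value, length) is enough.
        return "constant"
    if min(values) >= 0 and bit_width(values) < 64:
        # Every value fits in fewer than 64 bits, so narrower slots suffice.
        return f"bitpacked({bit_width(values)} bits)"
    return "plain"


print(pick_encoding([42] * 1000))        # constant
print(pick_encoding(list(range(1000))))  # bitpacked(10 bits)
print(pick_encoding([-5, 7, 12]))        # plain
```

In this model the integers 0 through 999 need 10 bits each rather than 64, which is the kind of saving bit-packing targets.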
- class vortex.encoding.Array
An array of zero or more rows each with the same set of columns.
Examples
Arrays support all the standard comparison operations:
>>> a = vortex.array(['dog', None, 'cat', 'mouse', 'fish'])
>>> b = vortex.array(['doug', 'jennifer', 'casper', 'mouse', 'faust'])
>>> (a < b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  true,
  null,
  false,
  false,
  false
]
>>> (a <= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  true,
  null,
  false,
  true,
  false
]
>>> (a == b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  false,
  null,
  false,
  true,
  false
]
>>> (a != b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  true,
  null,
  true,
  false,
  true
]
>>> (a >= b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  false,
  null,
  true,
  true,
  true
]
>>> (a > b).to_arrow_array()
<pyarrow.lib.BooleanArray object at ...>
[
  false,
  null,
  true,
  false,
  true
]
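The null in the second position of every result above suggests that comparisons propagate nulls element by element. A minimal plain-Python model of that behavior, using None for null, might look like the following; compare is a hypothetical helper written for illustration, not a Vortex function.

```python
import operator
from typing import Any, Callable, List, Optional


def compare(
    op: Callable[[Any, Any], bool],
    left: List[Optional[Any]],
    right: List[Optional[Any]],
) -> List[Optional[bool]]:
    """Apply op pairwise; a null on either side yields a null result."""
    if len(left) != len(right):
        raise ValueError("arrays must have the same length")
    return [
        None if a is None or b is None else op(a, b)
        for a, b in zip(left, right)
    ]


a = ['dog', None, 'cat', 'mouse', 'fish']
b = ['doug', 'jennifer', 'casper', 'mouse', 'faust']

less_than = compare(operator.lt, a, b)
equal = compare(operator.eq, a, b)
print(less_than)  # [True, None, False, False, False]
print(equal)      # [False, None, False, True, False]
```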
- __len__()
Return len(self).
- dtype
The data type of this array.
- Return type:
Examples
By default, array() uses the largest available bit-width:
>>> vortex.array([1, 2, 3]).dtype
int(64, False)
Including a None forces a nullable type:
>>> vortex.array([1, None, 2, 3]).dtype
int(64, True)
A UTF-8 string array:
>>> vortex.array(['hello, ', 'is', 'it', 'me?']).dtype
utf8(False)
- fill_forward()
Fill forward non-null values over runs of nulls.
Leading nulls are replaced with the “zero” for that type. For integral and floating-point types, this is zero. For the Boolean type, this is False.
Examples
Fill forward sensor values over intermediate missing values. Note that leading nulls are replaced with 0.0:
>>> a = vortex.array([
...     None, None, 30.29, 30.30, 30.30, None, None, 30.27, 30.25,
...     30.22, None, None, None, None, 30.12, 30.11, 30.11, 30.11,
...     30.10, 30.08, None, 30.21, 30.03, 30.03, 30.05, 30.07, 30.07,
... ])
>>> a.fill_forward().to_arrow_array()
<pyarrow.lib.DoubleArray object at ...>
[
  0,
  0,
  30.29,
  30.3,
  30.3,
  30.3,
  30.3,
  30.27,
  30.25,
  30.22,
  ...
  30.11,
  30.1,
  30.08,
  30.08,
  30.21,
  30.03,
  30.03,
  30.05,
  30.07,
  30.07
]
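The fill-forward rule described above can be modeled in a few lines of plain Python. This is a sketch of the semantics only, with None standing in for null; the fill_forward function here is a hypothetical stand-alone helper, and the caller supplies the type's “zero” explicitly.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def fill_forward(values: List[Optional[T]], zero: T) -> List[T]:
    """Replace each null with the most recent non-null value.

    Nulls that appear before any non-null value become `zero`.
    """
    result: List[T] = []
    last = zero
    for value in values:
        if value is not None:
            last = value
        result.append(last)
    return result


readings = [None, None, 30.29, 30.30, None, None, 30.27]
filled = fill_forward(readings, 0.0)
print(filled)  # [0.0, 0.0, 30.29, 30.3, 30.3, 30.3, 30.27]

flags = fill_forward([None, True, None, False, None], False)
print(flags)   # [False, True, True, False, False]
```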
- filter(filter)
Filter an Array by another Boolean array.
- Parameters:
filter (Array) – Keep all the rows in self for which the correspondingly indexed row in filter is True.
- Return type:
Examples
Keep only the single-digit non-negative integers:
>>> a = vortex.array([0, 42, 1_000, -23, 10, 9, 5])
>>> filter = vortex.array([True, False, False, False, False, True, True])
>>> a.filter(filter).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  0,
  9,
  5
]
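In plain Python, the same selection can be sketched by pairing each value with its mask entry. The filter_rows helper is hypothetical, and it assumes a mask without nulls that has the same length as the data; how Vortex treats null mask entries or mismatched lengths is not covered here.

```python
from typing import Any, List


def filter_rows(values: List[Any], mask: List[bool]) -> List[Any]:
    """Keep values[i] wherever mask[i] is True (assumes a non-null mask)."""
    if len(values) != len(mask):
        raise ValueError("mask must have the same length as the values")
    return [value for value, keep in zip(values, mask) if keep]


data = [0, 42, 1_000, -23, 10, 9, 5]

# Build the mask from a predicate instead of writing it by hand.
mask = [0 <= x < 10 for x in data]
kept = filter_rows(data, mask)
print(mask)  # [True, False, False, False, False, True, True]
print(kept)  # [0, 9, 5]
```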
- scalar_at(index)
Retrieve a row by its index.
- Parameters:
index (int) – The index of interest. Must be greater than or equal to zero and less than the length of this array.
- Returns:
If this array contains numbers or Booleans, this method returns the corresponding primitive Python type, i.e. int, float, or bool. For structures and variable-length data types, a zero-copy view of the underlying data is returned.
- Return type:
one of int, float, bool, vortex.scalar.Buffer, vortex.scalar.BufferString, vortex.scalar.VortexList, vortex.scalar.VortexStruct
Examples
Retrieve the last element from an array of integers:
>>> vortex.array([10, 42, 999, 1992]).scalar_at(3)
1992
Retrieve the third element from an array of strings:
>>> array = vortex.array(["hello", "goodbye", "it", "is"])
>>> array.scalar_at(2)
<vortex.BufferString ...>
Vortex, by default, returns a view into the array’s data. This avoids copying the data, which can be expensive if done repeatedly. BufferString.into_python() forcibly copies the scalar data into a Python data structure:
>>> array.scalar_at(2).into_python()
'it'
Retrieve an element from an array of structures:
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     None,
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.scalar_at(2).into_python()
{'age': 33, 'name': <vortex.BufferString ...>}
Notice that VortexStruct.into_python() only copies one “layer” of data into Python. To ensure the entire structure is recursively copied into Python, specify recursive=True:
>>> array.scalar_at(2).into_python(recursive=True)
{'age': 33, 'name': 'Angela'}
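The difference between a one-layer copy and a recursive copy can be sketched with two small stand-in classes. StringView and StructView below are hypothetical and are not the Vortex scalar types; they assume a view is a slice of a shared byte buffer and that copying means decoding those bytes into a Python str.

```python
from typing import Any, Dict


class StringView:
    """Hypothetical zero-copy view of UTF-8 bytes inside a shared buffer."""

    def __init__(self, buffer: bytes, start: int, end: int) -> None:
        # Slicing a memoryview references the buffer without copying it.
        self._view = memoryview(buffer)[start:end]

    def into_python(self) -> str:
        # The copy happens only here, when a Python str is requested.
        return self._view.tobytes().decode("utf-8")


class StructView:
    """Hypothetical struct scalar whose fields may themselves be views."""

    def __init__(self, fields: Dict[str, Any]) -> None:
        self._fields = fields

    def into_python(self, recursive: bool = False) -> Dict[str, Any]:
        if not recursive:
            # One layer only: nested views are returned as views.
            return dict(self._fields)
        return {name: _copy_deep(value) for name, value in self._fields.items()}


def _copy_deep(value: Any) -> Any:
    if isinstance(value, StructView):
        return value.into_python(recursive=True)
    if isinstance(value, StringView):
        return value.into_python()
    return value


names = b"JosephNarendraAngela"
row = StructView({"age": 33, "name": StringView(names, 14, 20)})

shallow = row.into_python()
deep = row.into_python(recursive=True)
print(type(shallow["name"]).__name__)  # StringView
print(deep)                            # {'age': 33, 'name': 'Angela'}
```

Deferring the copy keeps repeated scalar access cheap; the recursive form trades that for a fully materialized Python value.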
Retrieve a missing element from an array of structures:
>>> array.scalar_at(3) is None
True
Out of bounds accesses are prohibited:
>>> vortex.array([10, 42, 999, 1992]).scalar_at(10)
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4
...
Unlike Python, negative indices are not supported:
>>> vortex.array([10, 42, 999, 1992]).scalar_at(-2)
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned
- slice(start, end)
Keep only a contiguous subset of elements.
- Parameters:
- Return type:
Examples
Keep only the second through third elements:
>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> a.slice(1, 3).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "b",
  "c"
]
Keep none of the elements:
>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> a.slice(3, 3).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[]
Unlike Python, it is an error to slice outside the bounds of the array:
>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> a.slice(2, 10).to_arrow_array()
Traceback (most recent call last):
...
ValueError: index 10 out of bounds from 0 to 4
Or to slice with a negative value:
>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> a.slice(-2, -1).to_arrow_array()
Traceback (most recent call last):
...
OverflowError: can't convert negative int to unsigned
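Because Python's own slicing silently clamps out-of-range bounds and accepts negative ones, it can help to see the stricter rules from the examples above written out. The slice_values helper below is a hypothetical plain-Python model; it assumes start is inclusive and end is exclusive, as the examples suggest, and its rejection of start greater than end is an assumption rather than documented behavior.

```python
from typing import Any, List


def slice_values(values: List[Any], start: int, end: int) -> List[Any]:
    """Strict slice: no negative bounds and no clamping past the end."""
    for bound in (start, end):
        if bound < 0:
            raise OverflowError("can't convert negative int to unsigned")
        if bound > len(values):
            raise ValueError(
                f"index {bound} out of bounds from 0 to {len(values)}"
            )
    if start > end:
        # Assumption: an inverted range is treated as an error.
        raise ValueError(f"start {start} is greater than end {end}")
    return values[start:end]


letters = ['a', 'b', 'c', 'd']
middle = slice_values(letters, 1, 3)
empty = slice_values(letters, 3, 3)
print(middle)         # ['b', 'c']
print(empty)          # []
print(letters[2:10])  # ['c', 'd'] (built-in slicing clamps instead of raising)
```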
- take(indices)
Filter, permute, and/or repeat elements by their index.
Examples
Keep only the first and third elements:
>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> indices = vortex.array([0, 2])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "c"
]
Permute and repeat the first and second elements:
>>> a = vortex.array(['a', 'b', 'c', 'd'])
>>> indices = vortex.array([0, 1, 1, 0])
>>> a.take(indices).to_arrow_array()
<pyarrow.lib.StringArray object at ...>
[
  "a",
  "b",
  "b",
  "a"
]
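Filtering, permuting, and repeating are all the same operation once it is viewed as a gather by index. The take helper below is a hypothetical plain-Python model of that idea; raising IndexError for an out-of-range index is an assumption made for the sketch, not a statement about the error Vortex raises.

```python
from typing import Any, List


def take(values: List[Any], indices: List[int]) -> List[Any]:
    """Gather values[i] for each i in indices, in the order given."""
    gathered = []
    for i in indices:
        if not 0 <= i < len(values):
            # Assumption: out-of-range indices are rejected.
            raise IndexError(f"index {i} out of bounds for length {len(values)}")
        gathered.append(values[i])
    return gathered


letters = ['a', 'b', 'c', 'd']
subset = take(letters, [0, 2])          # filter
repeated = take(letters, [0, 1, 1, 0])  # permute and repeat
reversed_letters = take(letters, [3, 2, 1, 0])
print(subset)            # ['a', 'c']
print(repeated)          # ['a', 'b', 'b', 'a']
print(reversed_letters)  # ['d', 'c', 'b', 'a']
```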
- to_arrow_array()
Convert this array to an Arrow array.
See also
- Return type:
Examples
Convert a Vortex array built from a Python list into an Arrow array:
>>> vortex.array([1, 2, 3]).to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
  1,
  2,
  3
]
- to_arrow_table() → Table
Construct an Arrow table from this Vortex array.
See also
Warning
Only struct-typed arrays can be converted to Arrow tables.
- Return type:
Examples
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]
- to_numpy(*, zero_copy_only: bool = True) → numpy.ndarray
Construct a NumPy array from this Vortex array.
This is an alias for self.to_arrow_array().to_numpy(zero_copy_only).
- Parameters:
zero_copy_only (bool) – When True, this method will raise an error unless a NumPy array can be created without copying the data. This is only possible when the array is a primitive array without nulls.
- Return type:
numpy.ndarray
Examples
Construct an immutable ndarray from a Vortex array:
>>> array = vortex.array([1, 0, 0, 1])
>>> array.to_numpy()
array([1, 0, 0, 1])
- to_pandas_df() → DataFrame
Construct a Pandas dataframe from this Vortex array.
Warning
Only struct-typed arrays can be converted to Pandas dataframes.
- Return type:
Examples
Construct a dataframe from a Vortex array:
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_pandas_df()
   age      name
0   25    Joseph
1   31  Narendra
2   33    Angela
3   57   Mikhail
- to_polars_dataframe()
Construct a Polars dataframe from this Vortex array.
See also
Warning
Only struct-typed arrays can be converted to Polars dataframes.
- Returns:
A Polars DataFrame. (Polars excludes the DataFrame class from its Intersphinx index; see pola-rs/polars#7027.)
Examples
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_dataframe()
shape: (4, 2)
┌─────┬──────────┐
│ age ┆ name     │
│ --- ┆ ---      │
│ i64 ┆ str      │
╞═════╪══════════╡
│ 25  ┆ Joseph   │
│ 31  ┆ Narendra │
│ 33  ┆ Angela   │
│ 57  ┆ Mikhail  │
└─────┴──────────┘
- to_polars_series()
Construct a Polars series from this Vortex array.
See also
- Returns:
A Polars Series. (Polars excludes the Series class from its Intersphinx index; see pola-rs/polars#7027.)
Examples
Convert a numeric array with nulls to a Polars Series:
>>> vortex.array([1, None, 2, 3]).to_polars_series()
shape: (4,)
Series: '' [i64]
[
    1
    null
    2
    3
]
Convert a UTF-8 string array to a Polars Series:
>>> vortex.array(['hello, ', 'is', 'it', 'me?']).to_polars_series()
shape: (4,)
Series: '' [str]
[
    "hello, "
    "is"
    "it"
    "me?"
]
Convert a struct array to a Polars Series:
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
...     {'name': 'Mikhail', 'age': 57},
... ])
>>> array.to_polars_series()
shape: (4,)
Series: '' [struct[2]]
[
    {25,"Joseph"}
    {31,"Narendra"}
    {33,"Angela"}
    {57,"Mikhail"}
]
- to_pylist() → list[Any]
Deeply copy an Array into a Python list.
- Return type:
Examples
>>> array = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': 'Narendra', 'age': 31},
...     {'name': 'Angela', 'age': 33},
... ])
>>> array.to_pylist()
[{'age': 25, 'name': 'Joseph'}, {'age': 31, 'name': 'Narendra'}, {'age': 33, 'name': 'Angela'}]
- tree_display()
Internal technical details about the encoding of this Array.
Warning
The format of the returned string may change without notice.
- Return type:
Examples
Uncompressed arrays have straightforward encodings:
>>> arr = vortex.array([1, 2, None, 3])
>>> print(arr.tree_display())
root: vortex.primitive(0x03)(i64?, len=4) nbytes=36 B (100.00%)
  metadata: PrimitiveMetadata { validity: Array }
  buffer: 32 B
  validity: vortex.bool(0x02)(bool, len=4) nbytes=3 B (8.33%)
    metadata: BoolMetadata { validity: NonNullable, first_byte_bit_offset: 0 }
    buffer: 1 B
Compressed arrays often have more complex, deeply nested encoding trees.