Guide#
Rustaceans
See the Vortex Rust documentation, for details on Vortex in Rust.
Python#
Construct a Vortex array from lists of simple Python values:
>>> import vortex
>>> vtx = vortex.array([1, 2, 3, 4])
>>> vtx.dtype
int(64, False)
Python’s None
represents a missing or null value and changes the dtype of the array from
non-nullable 64-bit integers to nullable 64-bit integers:
>>> vtx = vortex.array([1, 2, None, 4])
>>> vtx.dtype
int(64, True)
A list of dict
is converted to an array of structures. Missing values may appear at any
level:
>>> vtx = vortex.array([
... {'name': 'Joseph', 'age': 25},
... {'name': None, 'age': 31},
... {'name': 'Angela', 'age': None},
... {'name': 'Mikhail', 'age': 57},
... {'name': None, 'age': None},
... None,
... ])
>>> vtx.dtype
struct({"age": int(64, True), "name": utf8(True)}, True)
Array.to_pylist()
converts a Vortex array into a list of Python values.
>>> vtx.to_pylist()
[{'age': 25, 'name': 'Joseph'}, {'age': 31, 'name': None}, {'age': None, 'name': 'Angela'}, {'age': 57, 'name': 'Mikhail'}, {'age': None, 'name': None}, {'age': None, 'name': None}]
Arrow#
The array()
function constructs a Vortex array from an Arrow one without any
copies:
>>> import pyarrow as pa
>>> arrow = pa.array([1, 2, None, 3])
>>> arrow.type
DataType(int64)
>>> vtx = vortex.array(arrow)
>>> vtx.dtype
int(64, True)
Array.to_arrow_array()
converts back to an Arrow array:
>>> vtx.to_arrow_array()
<pyarrow.lib.Int64Array object at ...>
[
1,
2,
null,
3
]
If you have a struct array, use Array.to_arrow_table()
to construct an Arrow table:
>>> struct_vtx = vortex.array([
... {'name': 'Joseph', 'age': 25},
... {'name': 'Narendra', 'age': 31},
... {'name': 'Angela', 'age': 33},
... {'name': 'Mikhail', 'age': 57},
... ])
>>> struct_vtx.to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]
Pandas#
Array.to_pandas_df()
converts a Vortex array into a Pandas DataFrame:
>>> df = struct_vtx.to_pandas_df()
>>> df
age name
0 25 Joseph
1 31 Narendra
2 33 Angela
3 57 Mikhail
array()
converts from a Pandas DataFrame into a Vortex array:
>>> vortex.array(df).to_arrow_table()
pyarrow.Table
age: int64
name: string_view
----
age: [[25,31,33,57]]
name: [["Joseph","Narendra","Angela","Mikhail"]]
Query Engines#
VortexDataset
implements the pyarrow.dataset.Dataset
API which
enables many Python-based query engines to pushdown row filters and column projections on Vortex
files. All the query engine examples use the same Vortex file:
>>> import vortex
>>> import pyarrow.parquet as pq
>>> vtx = vortex.array(pq.read_table("_static/example.parquet"))
>>> vortex.io.write_path(vtx, 'example.vortex')
>>> ds = vortex.dataset.from_path(
... 'example.vortex'
... )
Polars#
>>> import polars as pl
>>> lf = pl.scan_pyarrow_dataset(ds)
>>> lf = lf.select('tip_amount', 'fare_amount')
>>> lf = lf.head(3)
>>> lf.collect()
shape: (3, 2)
┌────────────┬─────────────┐
│ tip_amount ┆ fare_amount │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞════════════╪═════════════╡
│ 0.0 ┆ 61.8 │
│ 5.1 ┆ 20.5 │
│ 16.54 ┆ 70.0 │
└────────────┴─────────────┘
DuckDB#
>>> import duckdb
>>> duckdb.sql('select ds.tip_amount, ds.fare_amount from ds limit 3').show()
┌────────────┬─────────────┐
│ tip_amount │ fare_amount │
│ double │ double │
├────────────┼─────────────┤
│ 0.0 │ 61.8 │
│ 5.1 │ 20.5 │
│ 16.54 │ 70.0 │
└────────────┴─────────────┘