Input and Output
Vortex arrays support reading from and writing to local and remote file systems, including plain-old HTTP, S3, Google Cloud Storage, and Azure Blob Storage.
vortex.io.read_path | Read a vortex struct array from the local filesystem.
vortex.io.read_url | Read a vortex struct array from a URL.
vortex.io.write_path | Write a vortex struct array to the local filesystem.
- vortex.io.read_path(path, *, projection=None, row_filter=None, indices=None)
Read a vortex struct array from the local filesystem.
- Parameters:
Examples
Read an array with a structured column and nulls at multiple levels and in multiple columns.
>>> a = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': None, 'age': 31},
...     {'name': 'Angela', 'age': None},
...     {'name': 'Mikhail', 'age': 57},
...     {'name': None, 'age': None},
... ])
>>> vortex.io.write_path(a, "a.vortex")
>>> b = vortex.io.read_path("a.vortex")
>>> b.to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: int64
  [
    25,
    31,
    null,
    57,
    null
  ]
-- child 1 type: string_view
  [
    "Joseph",
    null,
    "Angela",
    "Mikhail",
    null
  ]
Read just the age column:
>>> c = vortex.io.read_path("a.vortex", projection=["age"])
>>> c.to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: int64
  [
    25,
    31,
    null,
    57,
    null
  ]
Read just the name column, by its index:
>>> d = vortex.io.read_path("a.vortex", projection=[1])
>>> d.to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: string_view
  [
    "Joseph",
    null,
    "Angela",
    "Mikhail",
    null
  ]
Keep rows with an age above 35. This will read O(N_KEPT) rows, when the file format allows.
>>> e = vortex.io.read_path("a.vortex", row_filter=vortex.expr.column("age") > 35)
>>> e.to_arrow_array()
<pyarrow.lib.StructArray object at ...>
-- is_valid: all not null
-- child 0 type: int64
  [
    57
  ]
-- child 1 type: string_view
  [
    "Mikhail"
  ]
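The projection and row_filter examples above can be modelled in plain Python to make their semantics explicit. This is only an illustrative sketch and not Vortex code: the names ROWS, COLUMNS, project and filter_age_above are hypothetical, the column order (age at index 0, name at index 1) is copied from the to_arrow_array() output shown above, and dropping rows with a null age mirrors the filter result shown above rather than any documented rule.

```python
# Pure-Python model of `projection` and `row_filter`; this is NOT Vortex code.
ROWS = [
    {"name": "Joseph", "age": 25},
    {"name": None, "age": 31},
    {"name": "Angela", "age": None},
    {"name": "Mikhail", "age": 57},
    {"name": None, "age": None},
]

# Column order as reported by to_arrow_array() above: age is child 0, name is child 1.
COLUMNS = ["age", "name"]


def project(rows, projection):
    """Keep only the requested columns, given by name or by position."""
    names = [COLUMNS[p] if isinstance(p, int) else p for p in projection]
    return [{n: row[n] for n in names} for row in rows]


def filter_age_above(rows, threshold):
    """Keep rows whose age is known (not None) and greater than threshold."""
    return [r for r in rows if r["age"] is not None and r["age"] > threshold]
```

In this model, project(ROWS, ["age"]) and project(ROWS, [1]) correspond to the two projection examples, and filter_age_above(ROWS, 35) corresponds to the row filter on age.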
TODO(DK): Repeating a column in a projection does not work
Read the age column by name, twice, and the name column by index, once:
>>> # e = vortex.io.read_path("a.vortex", projection=["age", 1, "age"])
>>> # e.to_arrow_array()
TODO(DK): Top-level nullness does not work.
>>> a = vortex.array([
...     {'name': 'Joseph', 'age': 25},
...     {'name': None, 'age': 31},
...     {'name': 'Angela', 'age': None},
...     None,
...     {'name': 'Mikhail', 'age': 57},
...     {'name': None, 'age': None},
... ])
>>> vortex.io.write_path(a, "a.vortex")
>>> # b = vortex.io.read_path("a.vortex")
>>> # b.to_arrow_array()
- vortex.io.read_url(url, *, projection=None, row_filter=None, indices=None)
Read a vortex struct array from a URL.
See also
- Parameters:
Examples
Read an array from an HTTPS URL:
>>> a = vortex.io.read_url("https://example.com/dataset.vortex")
Read an array from an S3 URL:
>>> a = vortex.io.read_url("s3://bucket/path/to/dataset.vortex")
Read an array from an Azure Blob File System URL:
>>> a = vortex.io.read_url("abfss://my_file_system@my_account.dfs.core.windows.net/path/to/dataset.vortex")
Read an array from an Azure Blob Storage URL:
>>> a = vortex.io.read_url("https://my_account.blob.core.windows.net/my_container/path/to/dataset.vortex")
Read an array from a Google Cloud Storage URL:
>>> a = vortex.io.read_url("gs://bucket/path/to/dataset.vortex")
Read an array from a local file URL:
>>> a = vortex.io.read_url("file:/path/to/dataset.vortex")
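When a location may be either a local path or one of the URL forms above, a small dispatcher can pick between read_path and read_url. The following is a minimal sketch under stated assumptions: URL_SCHEMES, reader_name and read_any are hypothetical names, the scheme list is copied from the examples on this page and may not be exhaustive, and read_any assumes that vortex.io is reachable after a plain import of vortex, as in the examples here.

```python
from urllib.parse import urlsplit

# Schemes taken from the read_url examples above; the real list of
# supported schemes may be longer (assumption).
URL_SCHEMES = {"http", "https", "s3", "abfss", "gs", "file"}


def reader_name(location: str) -> str:
    """Return "read_url" for a recognised URL scheme, else "read_path"."""
    scheme = urlsplit(location).scheme.lower()
    return "read_url" if scheme in URL_SCHEMES else "read_path"


def read_any(location: str, **kwargs):
    """Hypothetical wrapper: forward to vortex.io.read_url or read_path."""
    # Imported lazily so reader_name can be used without Vortex installed.
    import vortex

    reader = getattr(vortex.io, reader_name(location))
    return reader(location, **kwargs)
```

Because both readers accept the same keyword arguments (projection, row_filter, indices), a call such as read_any("a.vortex", projection=["age"]) forwards them unchanged.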
- vortex.io.write_path(array, f, *, compress=True)
Write a vortex struct array to the local filesystem.
- Parameters:
Examples
Write the array a to the local file a.vortex.
>>> a = vortex.array([
...     {'x': 1},
...     {'x': 2},
...     {'x': 10},
...     {'x': 11},
...     {'x': None},
... ])
>>> vortex.io.write_path(a, "a.vortex")