Commit fbaaebd

Add README and docstrings for Julia API (#17)
* Add docstrings to Julia API
* WIP README
* WIP README design notes
* Update test to more specific exception types
* More readme
* fixup! More readme
* Update README.md
1 parent d68e690 commit fbaaebd

5 files changed: 266 additions and 26 deletions

Project.toml

Lines changed: 2 additions & 0 deletions
@@ -3,10 +3,12 @@ uuid = "1b5eed3d-1f46-4baa-87f3-a4a892b23610"
 version = "0.1.0"

 [deps]
+DocStringExtensions = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
 object_store_ffi_jll = "0e112785-0821-598c-8835-9f07837e8d7b"

 [compat]
 CloudBase = "1"
+DocStringExtensions = "0.9"
 HTTP = "1"
 ReTestItems = "1"
 Sockets = "1"

README.md

Lines changed: 120 additions & 0 deletions
@@ -1 +1,121 @@
 # RustyObjectStore.jl
+
+[![CI](https://github.com/RelationalAI/RustyObjectStore.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/RelationalAI/RustyObjectStore.jl/actions/workflows/CI.yml)
+
+RustyObjectStore.jl is a Julia package for getting and putting data in cloud object stores, such as Azure Blob Storage and AWS S3.
+It is built on top of the Rust [object_store crate](https://docs.rs/object_store/).
+It provides a minimal API and focusses on high throughput.
+
+_The package is under active development. Currently only Azure Blob Storage is supported._
+
+## Usage
+
+The object_store runtime must be started before any requests are sent.
+
+```julia
+using RustyObjectStore
+init_object_store()
+```
+
+Requests are sent by calling `blob_put` or `blob_get!`, providing the location of the object to put/get, either the data to send or a buffer that will receive data, and credentials.
+For `blob_put` the data must be a vector of bytes (`UInt8`).
+For `blob_get!` the buffer must be a vector into which bytes (`UInt8`) can be written.
+```julia
+credentials = AzureCredentials("my_account", "my_container", "my_key")
+input = "1,2,3,4,5,6,7,8,9,0\n" ^ 5  # 100 B
+
+nbytes_written = blob_put("path/to/example.csv", codeunits(input), credentials)
+@assert nbytes_written == 100
+
+buffer = Vector{UInt8}(undef, 1000)  # 1000 B
+@assert sizeof(buffer) > sizeof(input)
+
+nbytes_read = blob_get!("path/to/example.csv", buffer, credentials)
+@assert nbytes_read == 100
+@assert String(buffer[1:nbytes_read]) == input
+```
+
+## Design
+
+#### Packaging
+
+The Rust [object_store](https://github.com/apache/arrow-rs/tree/master/object_store) crate does not provide a C API, so we have defined a C API in [object_store_ffi](https://github.com/relationalAI/object_store_ffi).
+RustyObjectStore.jl depends on [object_store_ffi_jll.jl](https://github.com/JuliaBinaryWrappers/object_store_ffi_jll.jl) to provide a pre-built object_store_ffi library, and calls into the native library via `@ccall`.
+
+#### Rust/Julia Interaction
+
+Julia calls into the native library providing a libuv condition variable and then waits on that variable.
+In the native code, the request from Julia is passed into a queue that is processed by a Rust spawned task.
+Once the request to cloud storage is complete, Rust signals the condition variable.
+In this way, the requests are asynchronous all the way up to Julia and the network processing is handled in the context of a native thread pool.
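The hand-off described above can be sketched in pure Julia. This is only an illustration of the pattern, not the package's implementation: a `Channel` stands in for the native request queue, a spawned Julia task stands in for the Rust worker, and `Base.Event` stands in for the libuv condition variable. The names `queue`, `worker`, and `send_request` are hypothetical.

```julia
# Illustrative stand-ins only: none of these names exist in RustyObjectStore.jl.
queue = Channel{Any}(32)

# Plays the role of the Rust task that drains the request queue.
worker = Threads.@spawn for req in queue
    req.response[] = length(req.payload)  # pretend this was a network round trip
    notify(req.done)                      # signal the waiting caller
end

function send_request(queue, payload::Vector{UInt8})
    # Each request carries its own response slot and its own event to wait on.
    req = (payload = payload, response = Ref(0), done = Base.Event())
    put!(queue, req)
    wait(req.done)  # only this task blocks; other Julia tasks keep running
    return req.response[]
end

nbytes = send_request(queue, Vector{UInt8}("hello"))

# Closing the channel ends the worker's loop.
close(queue)
wait(worker)
```

Giving every request its own event means a completion can never wake the wrong caller, which is the property the per-request condition variable provides in the design above.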
+
+For a GET request, Julia provides a buffer for the native library to write into.
+This requires Julia to know a suitable size beforehand and requires the native library to do an extra memory copy, but the upside is that Julia controls the lifetime of the memory.
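The caller-owned buffer contract can be illustrated without touching cloud storage. The sketch below uses a hypothetical `mock_get!` (not part of RustyObjectStore.jl) that mimics the documented behaviour of `blob_get!`: it writes into a buffer supplied by the caller, returns the byte count, never resizes the buffer, and throws if the buffer is too small.

```julia
# `mock_get!` is a hypothetical stand-in for `blob_get!`: it reads from a Dict rather
# than from Azure Blob Storage, so the example runs offline.
function mock_get!(store::Dict{String,Vector{UInt8}}, path::String, buffer::AbstractVector{UInt8})
    data = store[path]
    # The buffer is never resized; a too-small buffer is an error
    # (the real API documents `GetException` for this case).
    length(buffer) >= length(data) || error("buffer too small for $(length(data)) bytes")
    copyto!(buffer, 1, data, 1, length(data))
    return length(data)
end

store = Dict("path/to/example.csv" => Vector{UInt8}("1,2,3\n"))
buffer = Vector{UInt8}(undef, 64)   # the caller chooses the size up front
nbytes = mock_get!(store, "path/to/example.csv", buffer)

payload = view(buffer, 1:nbytes)    # no copy; valid for as long as `buffer` is reachable
text = String(copy(payload))        # copy only when an owned String is needed
```

Reading through a `view` avoids a second copy on the Julia side, and the same buffer can be reused for the next request.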
+
+#### Threading Model
+
+Rust object_store uses the [tokio](https://docs.rs/tokio) async runtime.
+
+TODO
+
+#### Rust Configuration
+
+TODO
+
+## Development
+
+When working on RustyObjectStore.jl you can either use [object_store_ffi_jll.jl](https://github.com/JuliaBinaryWrappers/object_store_ffi_jll.jl) or use a local build of [object_store_ffi](https://github.com/relationalAI/object_store_ffi).
+Using object_store_ffi_jll.jl is just like using any other Julia package.
+For example, you can change the object_store_ffi_jll.jl version by updating the Project.toml `compat` entry and running `Pkg.update` to get the latest compatible release,
+or `Pkg.develop` to use an unreleased version.
+
+Alternatively, you can use a local build of the object_store_ffi library by setting the `OBJECT_STORE_LIB` environment variable to the location of the build.
+For example, if you have the object_store_ffi repository at `~/repos/object_store_ffi` and build the library by running `cargo build --release` from the base of that repository,
+then you could use that local build by setting `OBJECT_STORE_LIB="~/repos/object_store_ffi/target/release"`.
+
+The `OBJECT_STORE_LIB` environment variable is intended to be used only for local development.
+The library path is set at package precompile time, so if the environment variable is changed, RustyObjectStore.jl must recompile for the change to take effect.
+You can check the location of the library in use by inspecting `RustyObjectStore.rust_lib`.
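Because the path is fixed at precompile time, it can help to check a candidate location before setting the variable. A minimal sketch, assuming the build output is named `libobject_store_ffi` with the platform's shared-library extension (an assumption taken from the JLL product name; `local_lib_path` is a hypothetical helper, not part of the package):

```julia
using Base.Libc.Libdl: dlext

# Hypothetical helper: the file name `libobject_store_ffi` is an assumption
# and is not confirmed by the README.
function local_lib_path(dir::AbstractString)
    # `expanduser` resolves a leading `~`, which a shell leaves untouched inside quotes.
    return joinpath(expanduser(dir), "libobject_store_ffi." * dlext)
end

path = local_lib_path("/opt/object_store_ffi/target/release")
println(path, isfile(path) ? " (found)" : " (not built yet)")
```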
+
+Since RustyObjectStore.jl is the primary user of object_store_ffi, the packages should usually be developed alongside one another.
+For example, update object_store_ffi and then test out the changes in RustyObjectStore.jl.
+A new release of object_store_ffi should usually be followed by a new release of object_store_ffi_jll.jl, and then a new release of RustyObjectStore.jl.
+
+#### Testing
+
+Tests use the [ReTestItems.jl](https://github.com/JuliaTesting/ReTestItems.jl) test framework.
+
+Run tests using the package manager Pkg.jl like:
+```sh
+$ julia --project -e 'using Pkg; Pkg.test()'
+```
+or, in a Julia session started with `julia --project`:
+```julia
+julia> # press ] to enter the Pkg REPL mode
+
+(RustyObjectStore) pkg> test
+```
+Alternatively, tests can be run using ReTestItems.jl directly, which supports running individual tests.
+For example:
+```julia
+julia> using ReTestItems
+
+julia> runtests("test/azure_api_tests.jl"; name="AzureCredentials")
+```
+
+If `OBJECT_STORE_LIB` is set, then running tests locally will use the specified local build of the object_store_ffi library, rather than the version installed by object_store_ffi_jll.jl.
+This is useful for testing out changes to object_store_ffi.
+
+Adding new tests is done by writing test code in a `@testitem` in a file suffixed `*_tests.jl`.
+See the existing [tests](./test) or the [ReTestItems documentation](https://github.com/JuliaTesting/ReTestItems.jl/#writing-tests) for examples.
+
+#### Release Process
+
+New releases of RustyObjectStore.jl can be made by incrementing the version number in the Project.toml file following [Semantic Versioning](https://semver.org),
+and then commenting on the commit that should be released with `@JuliaRegistrator register`
+(see [example](https://github.com/RelationalAI/RustyObjectStore.jl/commit/1b1ba5a198e76afe37f75a1d07e701deb818869c#comments)).
+The [JuliaRegistrator](https://github.com/JuliaRegistries/Registrator.jl) bot will reply to the comment and automatically open a PR to the [General](https://github.com/JuliaRegistries/General/) package registry, which should then automatically be merged within a few minutes.
+Once that PR to General is merged the new version of RustyObjectStore.jl is available, and the TagBot GitHub Action will add a Git tag and a GitHub release for the new version.
+
+RustyObjectStore.jl uses the object_store_ffi library via depending on object_store_ffi_jll.jl which installs pre-built binaries.
+So when a new release of object_store_ffi is made, we need there to be a new release of object_store_ffi_jll.jl before we can make a release of RustyObjectStore.jl that uses the latest object_store_ffi.

src/RustyObjectStore.jl

Lines changed: 124 additions & 15 deletions
@@ -2,8 +2,10 @@ module RustyObjectStore

 export init_object_store, blob_get!, blob_put, AzureCredentials, ObjectStoreConfig

-using object_store_ffi_jll
 using Base.Libc.Libdl: dlext
+using Base: @kwdef, @lock
+using DocStringExtensions
+using object_store_ffi_jll

 const rust_lib = if haskey(ENV, "OBJECT_STORE_LIB")
     # For development, e.g. run `cargo build --release` and point to `target/release/` dir.
@@ -20,38 +22,91 @@ else
     object_store_ffi_jll.libobject_store_ffi
 end

-struct ObjectStoreConfig
+"""
+    $TYPEDEF
+
+Global configuration for the object store requests.
+
+# Keywords
+$TYPEDFIELDS
+"""
+@kwdef struct ObjectStoreConfig
+    "The maximum number of times to retry a request."
     max_retries::Culonglong
+    "The number of seconds from the initial request after which no further retries will be attempted."
     retry_timeout_sec::Culonglong
 end

-const OBJECT_STORE_STARTED = Ref(false)
+function Base.show(io::IO, config::ObjectStoreConfig)
+    print(io, "ObjectStoreConfig("),
+    print(io, "max_retries=", Int(config.max_retries), ", ")
+    print(io, "retry_timeout_sec=", Int(config.retry_timeout_sec), ")")
+end
+
+const DEFAULT_CONFIG = ObjectStoreConfig(max_retries=15, retry_timeout_sec=150)
+
+const _OBJECT_STORE_STARTED = Ref(false)
 const _INIT_LOCK::ReentrantLock = ReentrantLock()
-function init_object_store(config::ObjectStoreConfig = ObjectStoreConfig(15, 150))
-    Base.@lock _INIT_LOCK begin
-        if OBJECT_STORE_STARTED[]
-            return
+
+struct InitException <: Exception
+    msg::String
+    return_code::Cint
+end
+
+"""
+    init_object_store()
+    init_object_store(config::ObjectStoreConfig)
+
+Initialise object store.
+
+This starts a `tokio` runtime for handling `object_store` requests.
+It must be called before sending a request e.g. with `blob_get!` or `blob_put`.
+The runtime is only started once and cannot be re-initialised with a different config,
+subsequent `init_object_store` calls have no effect.
+
+# Throws
+- `InitException`: if the runtime fails to start.
+"""
+function init_object_store(config::ObjectStoreConfig=DEFAULT_CONFIG)
+    @lock _INIT_LOCK begin
+        if _OBJECT_STORE_STARTED[]
+            return nothing
         end
         res = @ccall rust_lib.start(config::ObjectStoreConfig)::Cint
         if res != 0
-            error("Failed to init_object_store")
+            throw(InitException("Failed to initialise object store runtime.", res))
         end
-        OBJECT_STORE_STARTED[] = true
+        _OBJECT_STORE_STARTED[] = true
     end
+    return nothing
 end

+"""
+    $TYPEDEF
+
+# Arguments
+$TYPEDFIELDS
+"""
 struct AzureCredentials
+    "Azure account name"
     account::String
+    "Azure container name"
     container::String
+    "Azure access key"
     key::String
+    "(Optional) Alternative Azure host. For example, if using Azurite."
     host::String
+    function AzureCredentials(account::String, container::String, key::String, host::String="")
+        return new(account, container, key, host)
+    end
 end
 function Base.show(io::IO, credentials::AzureCredentials)
     print(io, "AzureCredentials("),
-    print(io, repr(credentials.account), ", ")
-    print(io, repr(credentials.container), ", ")
-    print(io, "\"*****\", ") # don't print the secret key
-    print(io, repr(credentials.host), ")")
+    print(io, repr(credentials.account), )
+    print(io, ", ", repr(credentials.container))
+    print(io, ", ", "\"*****\"") # don't print the secret key
+    !isempty(credentials.host) && print(io, ", ", repr(credentials.host))
+    print(io, ")")
 end

 const _AzureCredentialsFFI = NTuple{4,Cstring}
@@ -67,6 +122,7 @@ function Base.cconvert(::Type{Ref{AzureCredentials}}, credentials::AzureCredenti
     # safely in the unsafe_convert call.
     return credentials_ffi, Ref(credentials_ffi)
 end
+
 function Base.unsafe_convert(::Type{Ref{AzureCredentials}}, x::Tuple{T,Ref{T}}) where {T<:_AzureCredentialsFFI}
     return Base.unsafe_convert(Ptr{_AzureCredentialsFFI}, x[2])
 end
@@ -79,6 +135,37 @@ struct Response
     Response() = new(-1, 0, C_NULL)
 end

+abstract type RequestException <: Exception end
+struct GetException <: RequestException
+    msg::String
+end
+struct PutException <: RequestException
+    msg::String
+end
+
+# TODO: this should be `blob_get!(buffer, path, credentials)` i.e. mutated argument first.
+"""
+    blob_get!(path, buffer, credentials) -> Int
+
+Send a get request to Azure Blob Storage.
+
+Fetches the data bytes at `path` and writes them to the given `buffer`.
+
+# Arguments
+- `path::String`: The location of the data to fetch.
+- `buffer::AbstractVector{UInt8}`: The buffer to write the blob data to.
+  The contents of the buffer will be mutated.
+  The buffer must be at least as large as the data.
+  The buffer will not be resized.
+- `credentials::AzureCredentials`: The credentials to use for the request.
+
+# Returns
+- `nbytes::Int`: The number of bytes read from Blob Storage and written to the buffer.
+  That is, `buffer[1:nbytes]` will contain the blob data.
+
+# Throws
+- `GetException`: If the request fails for any reason, including if the `buffer` is too small.
+"""
 function blob_get!(path::String, buffer::AbstractVector{UInt8}, credentials::AzureCredentials)
     response = Ref(Response())
     size = length(buffer)
@@ -109,13 +196,35 @@ function blob_get!(path::String, buffer::AbstractVector{UInt8}, credentials::Azu
         if response.result == 1
             err = "failed to process get with error: $(unsafe_string(response.error_message))"
             @ccall rust_lib.destroy_cstring(response.error_message::Ptr{Cchar})::Cint
-            error(err)
+            throw(GetException(err))
         end

         return Int(response.length)
     end
 end

+# TODO: this should be `blob_put(buffer, path, credentials)` to match `blob_get!`
+# when that is changed to put its mutated argument first.
+"""
+    blob_put(path, buffer, credentials) -> Int
+
+Send a put request to Azure Blob Storage.
+
+Atomically writes the data bytes in `buffer` to `path`.
+
+# Arguments
+- `path::String`: The location to write data to.
+- `buffer::AbstractVector{UInt8}`: The data to write to Blob Storage.
+  This buffer will not be mutated.
+- `credentials::AzureCredentials`: The credentials to use for the request.
+
+# Returns
+- `nbytes::Int`: The number of bytes written to Blob Storage.
+  Is always equal to `length(buffer)`.
+
+# Throws
+- `PutException`: If the request fails for any reason.
+"""
 function blob_put(path::String, buffer::AbstractVector{UInt8}, credentials::AzureCredentials)
     response = Ref(Response())
     size = length(buffer)
@@ -146,7 +255,7 @@ function blob_put(path::String, buffer::AbstractVector{UInt8}, credentials::Azur
         if response.result == 1
             err = "failed to process put with error: $(unsafe_string(response.error_message))"
             @ccall rust_lib.destroy_cstring(response.error_message::Ptr{Cchar})::Cint
-            error(err)
+            throw(PutException(err))
         end

         return Int(response.length)

test/azure_api_tests.jl

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+@testitem "AzureCredentials" begin
+    # access key is obscured when printing
+    @test repr(AzureCredentials("a", "b", "c", "d")) == "AzureCredentials(\"a\", \"b\", \"*****\", \"d\")"
+    # host is optional
+    @test AzureCredentials("a", "b", "c") == AzureCredentials("a", "b", "c", "")
+    # host is not shown if not set
+    @test repr(AzureCredentials("a", "b", "c")) == "AzureCredentials(\"a\", \"b\", \"*****\")"
+end
