Commit 462cbac

More docs and test
1 parent 4bdc16c commit 462cbac

3 files changed: 66 additions, 21 deletions

README.md

Lines changed: 20 additions & 10 deletions
@@ -2,21 +2,30 @@

 Extremely efficient string interning solution for Rust crates.

+*String interning:* The technique of representing all strings which are equal by
+a pointer or ID that is unique to the *contents* of that string, such that an O(n)
+string equality check becomes an O(1) pointer equality check.
+
+Interned strings in Stringleton are called "symbols", in the tradition of Ruby.
+
 ## Distinguishing characteristics

 - Ultra fast: Getting the string representation of a `Symbol` is a lock-free
   memory load. No reference counting or atomics involved.
 - Symbol literals (`sym!(...)`) are "free" at the call-site. Multiple
   invocations with the same string value are eagerly reconciled on program
-  startup, using link-time tricks.
+  startup using linker tricks.
 - Symbols are tiny. Just a single pointer - 8 bytes on 64-bit platforms.
+- Symbols are trivially copyable - no reference counting.
+- No size limit - symbol strings can be arbitrarily long (i.e., this is not a
+  "small string optimization" implementation).
 - Debugger friendly: If your debugger is able to display a plain Rust `&str`, it
   is capable of displaying `Symbol`.
 - Dynamic library support: Symbols can be passed across dynamic linking
   boundaries (terms and conditions apply - see the documentation of
   `stringleton-dylib`).
 - `no_std` support: `std` synchronization primitives used in the symbol registry
-  can be replaced with `once_cell` and `spin`. _See below for caveats._
+  can be replaced with `once_cell` and `spin`. *See below for caveats.*
 - `serde` support - symbols are serialized/deserialized as strings.
 - Fast bulk-insertion of symbols at runtime.

@@ -33,6 +42,7 @@ Extremely efficient string interning solution for Rust crates.
   of memory leaks, which is a denial-of-service hazard.
 - You need a bit-stable representation of symbols that does not change between
   runs.
+- Consider if `smol_str` or `cowstr` is a better fit for such use cases.

 ## Usage

@@ -58,18 +68,18 @@ assert_eq!(message.as_str().as_ptr(), message2.as_str().as_ptr());

 ## Crate features

-- **std** _(enabled by default)_: Use synchronization primitives from the
+- **std** *(enabled by default)*: Use synchronization primitives from the
   standard library. Implies `alloc`. When disabled, `critical-section` and
-  `spin` must both be enabled _(see below for caveats)_.
-- **alloc** _(enabled by default)_: Support creating symbols from `String`.
+  `spin` must both be enabled *(see below for caveats)*.
+- **alloc** *(enabled by default)*: Support creating symbols from `String`.
 - **serde**: Implements `serde::Serialize` and `serde::Deserialize` for symbols,
   which will be serialized/deserialized as plain strings.
 - **debug-assertions**: Enables expensive debugging checks at runtime - mostly
   useful to diagnose problems in complicated linker scenarios.
 - **critical-section**: When `std` is not enabled, this enables `once_cell` as a
   dependency with the `critical-section` feature enabled. Only relevant in
-  `no_std` environments. _[See `critical-section` for more
-  details.](https://docs.rs/critical-section/latest/critical_section/)_
+  `no_std` environments. *[See `critical-section` for more
+  details.](https://docs.rs/critical-section/latest/critical_section/)*
 - **spin**: When `std` is not enabled, this enables `spin` as a dependency,
   which is used to obtain global read/write locks on the symbol registry. Only
   relevant in `no_std` environments (and is a pessimization in other
@@ -103,7 +113,7 @@ are deduplicated when the program starts. Any theoretically faster solution
 would need fairly deep cooperation from the compiler aimed at this specific use
 case.

-Also, symbol literals are _always_ a memory load. The compiler cannot perform
+Also, symbol literals are *always* a memory load. The compiler cannot perform
 optimizations based on the contents of symbols, because it doesn't know how they
 will be reconciled until link time. For example, while `sym!(a) != sym!(a)` is
 always false, the compiler cannot eliminate code paths relying on that.
@@ -126,10 +136,10 @@ broadly compatible with dynamic libraries, but there are a few caveats:
    the dependency graph, the "host" crate must be prevented from linking
    statically to `stringleton`, because it would either cause duplicate symbol
    definitions, or worse, the host and client binaries would disagree about
-   which `Registry` to use. To avoid this, the _host_ binary can use
+   which `Registry` to use. To avoid this, the *host* binary can use
    `stringleton-dylib` explicitly instead of `stringleton`, which forces dynamic
    linkage of the symbol registry.
-4. Dynamically _unloading_ libraries is extremely risky (`dlclose()` and
+4. Dynamically *unloading* libraries is extremely risky (`dlclose()` and
    similar). Unloading a library that has any calls to the `sym!(..)` or
    `static_sym!(..)` macros is instant UB. Such a library can in principle use
    `Symbol::new()`, but probably not `Symbol::new_static()`.
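
The README's definition of string interning (equal strings share one pointer, so comparison is a pointer check) can be illustrated with a small standalone sketch. This is not Stringleton's implementation: `Sym`, `intern`, and `table` are hypothetical names, and the sketch takes a lock on every call, whereas the README describes `sym!(...)` literals as being reconciled once at program startup. It only shows why a symbol can be a single thin pointer that is `Copy` and compared by address.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for a symbol: one thin pointer to a leaked `&str` slot.
#[derive(Clone, Copy, Debug)]
pub struct Sym(&'static &'static str);

impl PartialEq for Sym {
    /// Equality compares slot addresses only, never the string contents.
    fn eq(&self, other: &Self) -> bool {
        std::ptr::eq(self.0, other.0)
    }
}
impl Eq for Sym {}

impl Sym {
    /// Reading the string is a plain pointer dereference.
    pub fn as_str(self) -> &'static str {
        *self.0
    }
}

/// Maps string contents to the unique slot for those contents.
type Table = HashMap<&'static str, &'static &'static str>;

fn table() -> &'static Mutex<Table> {
    static TABLE: OnceLock<Mutex<Table>> = OnceLock::new();
    TABLE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Return the unique `Sym` for `s`, leaking a copy the first time it is seen.
pub fn intern(s: &str) -> Sym {
    let mut map = table().lock().unwrap();
    if let Some(&slot) = map.get(s) {
        return Sym(slot);
    }
    // Leaking is deliberate: interned strings live for the rest of the program.
    let text: &'static str = Box::leak(s.to_owned().into_boxed_str());
    let slot: &'static &'static str = Box::leak(Box::new(text));
    map.insert(text, slot);
    Sym(slot)
}
```

Because entries are never freed, interning untrusted input this way grows memory without bound, which is the same leak hazard the README warns about.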

stringleton-registry/symbol.rs

Lines changed: 19 additions & 0 deletions
@@ -398,10 +398,29 @@ mod tests {

     #[test]
     fn new_static() {
+        static UNIQUE_SYMBOL: &str =
+            "This is a globally unique string that exists nowhere else in the test binary.";
+
         let a = Symbol::new_static(&"a");
         let b = Symbol::new_static(&"b");
         let a2 = Symbol::new_static(&"a");
         assert_eq!(a, a2);
         assert_ne!(a, b);
+
+        let unique = Symbol::new_static(&UNIQUE_SYMBOL);
+        assert_eq!(
+            std::ptr::from_ref(unique.inner()),
+            std::ptr::from_ref(&UNIQUE_SYMBOL)
+        );
+    }
+
+    #[test]
+    fn address() {
+        let a = Symbol::new_static(&"a");
+        let a2 = Symbol::new(String::from("a"));
+        assert_eq!(a, a2);
+        assert_eq!(a.to_ffi(), a2.to_ffi());
+        let a3 = Symbol::try_from_ffi(a.to_ffi()).unwrap();
+        assert_eq!(a3, a);
     }
 }
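
The new `address` test round-trips a symbol through `to_ffi()` and `try_from_ffi()`. The sketch below shows one way such a round trip could be made safe, under stated assumptions: the `u64` representation, the `Option` return type, and the `Registry` layout here are guesses for illustration, and `Sym` is a hypothetical type, not the crate's `Symbol`. The idea is that an integer coming back from foreign code is only accepted if it matches the address of a slot the registry already owns.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical symbol type: a thin pointer to a leaked `&str` slot.
#[derive(Clone, Copy, Debug)]
pub struct Sym(&'static &'static str);

impl PartialEq for Sym {
    fn eq(&self, other: &Self) -> bool {
        std::ptr::eq(self.0, other.0)
    }
}
impl Eq for Sym {}

/// Hypothetical registry: slots indexed by contents and by address.
#[derive(Default)]
struct Registry {
    by_text: HashMap<&'static str, &'static &'static str>,
    by_addr: HashMap<u64, &'static &'static str>,
}

fn registry() -> &'static Mutex<Registry> {
    static REGISTRY: OnceLock<Mutex<Registry>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(Registry::default()))
}

impl Sym {
    /// Intern `s`, recording the address of its slot for later validation.
    pub fn new(s: &str) -> Sym {
        let mut reg = registry().lock().unwrap();
        if let Some(&slot) = reg.by_text.get(s) {
            return Sym(slot);
        }
        let text: &'static str = Box::leak(s.to_owned().into_boxed_str());
        let slot: &'static &'static str = Box::leak(Box::new(text));
        reg.by_text.insert(text, slot);
        reg.by_addr.insert(slot as *const &'static str as usize as u64, slot);
        Sym(slot)
    }

    /// Expose the slot address as a plain integer (assumed FFI representation).
    pub fn to_ffi(self) -> u64 {
        self.0 as *const &'static str as usize as u64
    }

    /// Accept only integers that are addresses of known slots; no `unsafe` needed.
    pub fn try_from_ffi(raw: u64) -> Option<Sym> {
        let reg = registry().lock().unwrap();
        let slot = reg.by_addr.get(&raw).copied();
        slot.map(Sym)
    }
}
```

Looking the address up instead of casting it back to a pointer keeps the sketch free of `unsafe`, at the cost of a lock and a hash lookup on every conversion.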

stringleton/lib.rs

Lines changed: 27 additions & 11 deletions
@@ -4,10 +4,10 @@ pub use stringleton_registry::{Registry, StaticSymbol, Symbol};

 /// Create a literal symbol from a literal identifier or string
 ///
-/// Symbols created with the [`sym!(...)`](sym) macro are statically allocated
-/// and deduplicated on program startup. This means that there is no discernible
-/// overhead at the point of use, making them suitable even in long chains of
-/// `if` statements and inner loops.
+/// Symbols created with the [`sym!(...)`](crate::sym) macro are statically
+/// allocated and deduplicated on program startup. This means that there is no
+/// discernible overhead at the point of use, making them suitable even in long
+/// chains of `if` statements and inner loops.
 ///
 /// **IMPORTANT:** For this macro to work in a particular crate, the
 /// [`enable!()`](crate::enable) macro must appear exactly once in the crate's
@@ -78,13 +78,13 @@ macro_rules! sym {

 /// Create a static location for a literal symbol.
 ///
-/// This macro works the same as [`sym!(...)`](sym), except that it produces a
-/// [`StaticSymbol`] instead of a [`Symbol`]. [`StaticSymbol`] implements
-/// `Deref<Target = Symbol>`, so it can be used in most places where a `Symbol`
-/// is expected.
+/// This macro works the same as [`sym!(...)`](crate::sym), except that it
+/// produces a [`StaticSymbol`] instead of a [`Symbol`]. [`StaticSymbol`]
+/// implements `Deref<Target = Symbol>`, so it can be used in most places where
+/// a `Symbol` is expected.
 ///
-/// This macro also requires the presence of a call to the [`enable!()`](enable)
-/// macro at the crate root.
+/// This macro also requires the presence of a call to the
+/// [`enable!()`](crate::enable) macro at the crate root.
 ///
 /// This macro can be used in the initialization of a `static` or `const` variable:
 ///
@@ -144,7 +144,7 @@ macro_rules! static_sym {
     }}
 }

-/// Enable the [`sym!(...)`](sym) macro in the calling crate.
+/// Enable the [`sym!(...)`](crate::sym) macro in the calling crate.
 ///
 /// Put a call to this macro somewhere in the root of each crate that uses the
 /// `sym!(...)` macro.
@@ -163,6 +163,22 @@ macro_rules! static_sym {
 /// **CAUTION:** Using the second variant is discouraged, because it will not
 /// work when the other crate is being loaded as a dynamic library. However, it
 /// is very slightly more efficient.
+///
+/// ## Why?
+///
+/// The reason that this macro is necessary is dynamic linking. Under "normal"
+/// circumstances where all dependencies are statically linked, all crates could
+/// share a single symbol table. But dynamic libraries are linked independently
+/// of their host binary, so they have no access to the host's symbol table, if
+/// it even has one.
+///
+/// On Unix-like platforms, there is likely a solution for this based on "weak"
+/// linkage, but:
+///
+/// 1. Weak linkage is not a thing in Windows (DLLs need to explicitly request
+///    functions from the host binary using `GetModuleHandle()`, which is more
+///    brittle).
+/// 2. The `#[linkage]` attribute is unstable in Rust.
 #[macro_export]
 macro_rules! enable {
     () => {
