
 Extremely efficient string interning solution for Rust crates.

+*String interning:* The technique of representing all strings which are equal by
+a pointer or ID that is unique to the *contents* of that string, such that an O(n)
+string equality check becomes an O(1) pointer equality check.
+
+Interned strings in Stringleton are called "symbols", in the tradition of Ruby.
+
 ## Distinguishing characteristics

 - Ultra fast: Getting the string representation of a `Symbol` is a lock-free
   memory load. No reference counting or atomics involved.
 - Symbol literals (`sym!(...)`) are "free" at the call-site. Multiple
   invocations with the same string value are eagerly reconciled on program
-  startup, using link-time tricks.
+  startup using linker tricks.
 - Symbols are tiny. Just a single pointer - 8 bytes on 64-bit platforms.
+- Symbols are trivially copyable - no reference counting.
+- No size limit - symbol strings can be arbitrarily long (i.e., this is not a
+  "small string optimization" implementation).
 - Debugger friendly: If your debugger is able to display a plain Rust `&str`, it
   is capable of displaying `Symbol`.
 - Dynamic library support: Symbols can be passed across dynamic linking
   boundaries (terms and conditions apply - see the documentation of
   `stringleton-dylib`).
 - `no_std` support: `std` synchronization primitives used in the symbol registry
-  can be replaced with `once_cell` and `spin`. _See below for caveats._
+  can be replaced with `once_cell` and `spin`. *See below for caveats.*
 - `serde` support - symbols are serialized/deserialized as strings.
 - Fast bulk-insertion of symbols at runtime.

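As a reading aid for the definition of string interning in the hunk above, here is a minimal sketch of the general technique using only the standard library. It is not Stringleton's implementation: the names `Sym`, `intern`, and `registry` are hypothetical, and the sketch uses a mutex-guarded map and leaked allocations where the README describes a lock-free design with linker support. It only illustrates how equal strings can share one pointer-sized, copyable handle so that equality becomes a pointer comparison.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical symbol handle: a thin pointer to a leaked `&'static str`,
/// so the handle itself is one pointer wide and trivially copyable.
#[derive(Clone, Copy, Debug)]
pub struct Sym(&'static &'static str);

impl PartialEq for Sym {
    /// O(1): compares the addresses of the slots, never the string contents.
    fn eq(&self, other: &Self) -> bool {
        std::ptr::eq(self.0, other.0)
    }
}

impl Eq for Sym {}

impl Sym {
    /// Reading the string back is a plain pointer dereference.
    pub fn as_str(self) -> &'static str {
        *self.0
    }
}

/// Hypothetical global registry mapping string contents to their unique slot.
fn registry() -> &'static Mutex<HashMap<&'static str, &'static &'static str>> {
    static REGISTRY: OnceLock<Mutex<HashMap<&'static str, &'static &'static str>>> =
        OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the unique handle for `text`, creating it on first use.
/// The string and its slot are leaked on purpose so they live for `'static`.
pub fn intern(text: &str) -> Sym {
    let mut map = registry().lock().unwrap();
    if let Some(&slot) = map.get(text) {
        return Sym(slot);
    }
    let leaked: &'static str = Box::leak(text.to_owned().into_boxed_str());
    let slot: &'static &'static str = Box::leak(Box::new(leaked));
    map.insert(leaked, slot);
    Sym(slot)
}
```

Calling `intern("hello")` twice yields handles that compare equal without touching the string bytes, while `intern("world")` yields a different handle. Because every new string is leaked, this sketch shares the unbounded-growth hazard that the README warns about for untrusted input.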
@@ -33,6 +42,7 @@ Extremely efficient string interning solution for Rust crates.
   of memory leaks, which is a denial-of-service hazard.
 - You need a bit-stable representation of symbols that does not change between
   runs.
+- Consider if `smol_str` or `cowstr` is a better fit for such use cases.

 ## Usage

@@ -58,18 +68,18 @@ assert_eq!(message.as_str().as_ptr(), message2.as_str().as_ptr());

 ## Crate features

-- **std** _(enabled by default)_: Use synchronization primitives from the
+- **std** *(enabled by default)*: Use synchronization primitives from the
   standard library. Implies `alloc`. When disabled, `critical-section` and
-  `spin` must both be enabled _(see below for caveats)_.
-- **alloc** _(enabled by default)_: Support creating symbols from `String`.
+  `spin` must both be enabled *(see below for caveats)*.
+- **alloc** *(enabled by default)*: Support creating symbols from `String`.
 - **serde**: Implements `serde::Serialize` and `serde::Deserialize` for symbols,
   which will be serialized/deserialized as plain strings.
 - **debug-assertions**: Enables expensive debugging checks at runtime - mostly
   useful to diagnose problems in complicated linker scenarios.
 - **critical-section**: When `std` is not enabled, this enables `once_cell` as a
   dependency with the `critical-section` feature enabled. Only relevant in
-  `no_std` environments. _[See `critical-section` for more
-  details.](https://docs.rs/critical-section/latest/critical_section/)_
+  `no_std` environments. *[See `critical-section` for more
+  details.](https://docs.rs/critical-section/latest/critical_section/)*
 - **spin**: When `std` is not enabled, this enables `spin` as a dependency,
   which is used to obtain global read/write locks on the symbol registry. Only
   relevant in `no_std` environments (and is a pessimization in other
@@ -103,7 +113,7 @@ are deduplicated when the program starts. Any theoretically faster solution
 would need fairly deep cooperation from the compiler aimed at this specific use
 case.

-Also, symbol literals are _always_ a memory load. The compiler cannot perform
+Also, symbol literals are *always* a memory load. The compiler cannot perform
 optimizations based on the contents of symbols, because it doesn't know how they
 will be reconciled until link time. For example, while `sym!(a) != sym!(a)` is
 always false, the compiler cannot eliminate code paths relying on that.
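The startup deduplication described in this hunk can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the crate's mechanism: `Site` and `reconcile` are hypothetical names, and a plain slice stands in for the set of call-site slots that the real crate collects with linker support. Each slot starts out pointing at its own copy of the string, and one pass rewrites the slots so that equal strings share a single address.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one symbol-literal call site: a slot that
/// initially points at the call site's own copy of the string.
pub struct Site {
    pub text: &'static str,
}

impl Site {
    /// Pointer comparison only: true when both slots refer to the same bytes
    /// in memory (same address and length), regardless of contents.
    pub fn same_as(&self, other: &Site) -> bool {
        std::ptr::eq(self.text, other.text)
    }
}

/// One pass over all slots: the first slot seen for a given string becomes
/// canonical, and every later slot with equal contents is redirected to it.
pub fn reconcile(sites: &mut [Site]) {
    let mut canonical: HashMap<&'static str, &'static str> = HashMap::new();
    for site in sites.iter_mut() {
        let first = *canonical.entry(site.text).or_insert(site.text);
        site.text = first;
    }
}
```

In this model, which address a slot ends up holding is only known after `reconcile` has run, so a comparison between two slots has to load both values at runtime. That mirrors why the text says the compiler cannot fold comparisons such as `sym!(a) != sym!(a)`.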
@@ -126,10 +136,10 @@ broadly compatible with dynamic libraries, but there are a few caveats:
    the dependency graph, the "host" crate must be prevented from linking
    statically to `stringleton`, because it would either cause duplicate symbol
    definitions, or worse, the host and client binaries would disagree about
-   which `Registry` to use. To avoid this, the _host_ binary can use
+   which `Registry` to use. To avoid this, the *host* binary can use
    `stringleton-dylib` explicitly instead of `stringleton`, which forces dynamic
    linkage of the symbol registry.
-4. Dynamically _unloading_ libraries is extremely risky (`dlclose()` and
+4. Dynamically *unloading* libraries is extremely risky (`dlclose()` and
    similar). Unloading a library that has any calls to the `sym!(..)` or
    `static_sym!(..)` macros is instant UB. Such a library can in principle use
    `Symbol::new()`, but probably not `Symbol::new_static()`.