I got a ton of value from this book. It actually pushed me to dive deeper into profiling and eventually start building Rust perf tooling of my own: https://github.com/pawurb/hotpath-rs
Is there something similar to this for rust wasm?
I have some frontend use cases for rust that I just ended up rewriting in typescript because transferring and loading the wasm rust blob is more expensive than running the program.
I imagine wasm-conscious optimizations would look a lot like targeting microcontrollers, but with weird escape hatches to high-level browser apis.
The RustWasm book has a section on profiling throughput and latency for Rust Wasm: https://rustwasm.github.io/book/reference/time-profiling.htm.... Hope it helps.
Print version (full page): https://nnethercote.github.io/perf-book/print.html
This is a great resource!
Some TILs:
Hashing
> The default hashing algorithm is not specified, but at the time of writing the default is an algorithm called SipHash 1-3. This algorithm is high quality—it provides high protection against collisions—but is relatively slow, particularly for short keys such as integers.
> An attempt to switch from fxhash back to the default hasher resulted in slowdowns ranging from 4-84%!
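For anyone wanting to try that swap, it's nearly a one-line change. A minimal sketch using the fxhash crate mentioned above:

```rust
use std::collections::HashMap;
use fxhash::FxHashMap; // type alias: HashMap<K, V, FxBuildHasher>

fn main() {
    // Default hasher: SipHash 1-3. DoS-resistant, but slow for small keys.
    let mut siphash: HashMap<u64, &str> = HashMap::new();
    siphash.insert(1, "one");

    // FxHasher: much faster for integer keys, but not collision-resistant,
    // so only use it where keys are not attacker-controlled.
    let mut fx: FxHashMap<u64, &str> = FxHashMap::default();
    fx.insert(1, "one");
    assert_eq!(fx.get(&1), Some(&"one"));
}
```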
I/O
> Rust’s print! and println! macros lock stdout on every call. If you have repeated calls to these macros it may be better to lock stdout manually.
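Concretely: take the lock once and write through the guard instead of calling println! each time. A minimal std-only sketch:

```rust
use std::io::{self, Write};

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock(); // acquire the lock once

    for i in 0..10_000 {
        // writeln! on the guard avoids re-locking stdout on every line
        writeln!(out, "line {i}")?;
    }
    Ok(())
}
```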
Build times
> If you use dev builds but don’t often use a debugger, consider disabling debuginfo. This can improve dev build times significantly, by as much as 20-40%.
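The setting lives in the [profile.dev] section of Cargo.toml:

```toml
# Cargo.toml: drop debuginfo from dev builds; re-enable when you need a debugger
[profile.dev]
debug = false
```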
Interesting std library alternatives
> If you have many short vectors, you can use the SmallVec type from the smallvec crate. SmallVec<[T; N]> is a drop-in replacement for Vec that can store N elements within the SmallVec itself, and then switches to a heap allocation if the number of elements exceeds that.
> If you have many short vectors and you precisely know their maximum length, ArrayVec from the arrayvec crate is a better choice than SmallVec. It does not require the fallback to heap allocation, which makes it a little faster.
> The SmallString type from the smallstr crate is similar to the SmallVec type.
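For reference, a sketch of what both look like in use (assuming smallvec 1.x and arrayvec 0.7+, where ArrayVec takes a const-generic capacity):

```rust
use arrayvec::ArrayVec;
use smallvec::{smallvec, SmallVec};

fn main() {
    // Up to 4 elements stored inline; the 5th push spills to the heap.
    let mut sv: SmallVec<[u32; 4]> = smallvec![1, 2, 3, 4];
    sv.push(5); // heap allocation happens here

    // Fixed capacity of 4, no heap fallback; push panics past capacity,
    // while try_push returns an error instead.
    let mut av: ArrayVec<u32, 4> = ArrayVec::new();
    av.push(1);
    assert!(av.try_push(2).is_ok());
}
```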
I doubt I'll change my use of the standard types often, but this is good information to know for cases where this might be applicable.
Advice on enums
> If an enum has an outsized variant, consider boxing one or more fields.
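A quick illustration of why boxing the big variant helps (exact sizes depend on the target; the numbers in the comments assume a typical 64-bit layout):

```rust
#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Big([u8; 1024]), // every Unboxed value pays for the largest variant
}

#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Big(Box<[u8; 1024]>), // payload moved to the heap; variant is pointer-sized
}

fn main() {
    // Typically 1025 vs 16 bytes on a 64-bit target.
    println!("{}", std::mem::size_of::<Unboxed>());
    println!("{}", std::mem::size_of::<Boxed>());
}
```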
I'm surprised I didn't see any advice about skipping proc macros or Serde for faster compile times.
Most of these compile time improvements seem to be more along the lines of drop-in changes that don't require larger refactors. Removing something like serde from a codebase that makes use of it generally is going to be a lot more work.
If you're referring to serde being brought in by a dependency when you don't need it, most well-behaved crates should already have this be something you opt into by specifying the feature rather than something you need to go out of your way to enable. That said, I've had a theory for a while now that when Rust projects end up suffering from long compile times, the most significant cause is unneeded code from dependencies getting compiled, and that the poor ergonomics around Cargo features have basically encouraged the opposite of the good behavior I described above. I've still almost never seen this discussed outside of when I bring it up, so I wrote up my thoughts on it in a blog post a while back rather than try to restate my case every time, but I don't have much hope that anyone will take it seriously enough to either convince me I'm wrong or do anything about it: https://saghm.com/cargo-features-rust-compile-times/
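For anyone unfamiliar with the opt-in pattern being described, it looks like this in the consuming crate's Cargo.toml (the dependency name here is made up):

```toml
[dependencies]
# Turn off the crate's defaults and pull in only the features you actually use.
some-crate = { version = "1", default-features = false, features = ["serde"] }
```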
> That said, I've had a theory for a while now that when Rust projects end up suffering from long compile times, the most significant cause is unneeded code from dependencies getting compiled, and that the poor ergonomics around Cargo features have basically encouraged the opposite of the good behavior I described above.
Are you talking about clean builds or incremental builds? Rust developers care far more about the latter, and dependencies should only be monomorphized, not rebuilt.
Interesting. It feels like once you have the features defined, this is basically dead-code elimination. To solve the transitive dependency issue, could you not execute a dead-code elimination pass once and cache the results?
> slow, particularly for short keys such as integers
An interesting thing about the CLR is that the default hash for integers is the identity function, hash(x) = x, which, as well as being collision-free, avoids the trap of a slow default hash.
Do you know how it then maps hashes to buckets? I'd expect the natural occurrence of integer values to be very non-uniform and heavily biased toward the small end.
But if you know the hash table implementation, it makes forcing collisions trivial for user-generated input, leading to easy denial of service attacks.
The first requirement for safe hashtable implementations is a secret key, which makes it impossible to know the hash value for an external observer. (Or even, to know the relative hash value between any two inputs.)
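For reference, this is what std's RandomState supplies for HashMap; a minimal sketch showing that the hash output isn't predictable from outside:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

fn main() {
    // Each RandomState carries randomly generated SipHash keys.
    let state = RandomState::new();
    let mut hasher = state.build_hasher();
    hasher.write_u64(42);
    // The output changes from run to run, so an attacker can't
    // precompute colliding inputs.
    println!("{:016x}", hasher.finish());
}
```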
> The first requirement for safe hashtable implementations is a secret key,
Some languages use different approaches. The buckets in a Java HashMap turn into a sorted tree if they grow too large. Then there are trivial solutions like adding an input limit for untrusted users. Either way works, is secure and doesn't depend on a secret key.
Did you reply to the wrong comment? I'm not sure it follows from what I posted.
You can't force a collision for the function f(x) = x; by definition it has no collisions. It's not just collision-resistant, it's actually collision-proof!
hash(x) = x is of course not a cryptographic hash, but it also has the advantage that it's unlikely to be confused for one if anyone looks at the output.
Nitpick: it’s still going to collide when taking the modulo to decide what bucket to put it in. If you know that the hashmap has 1000 buckets, and you spam it with multiples of 1000, you could make it put every key into the same bucket and have O(1) lookups turn into O(n). That doesn’t happen with real hash functions with random seeds.
Oh right, got you.
It's not about collisions of the hash function itself.
Every hashtable implementation will put the hash value through some sort of modulo because you generally don't want to waste memory storing the key 5726591 at index 5726591 in an array.
So if you know how the implementation works and the hash function is predictable, you can keep providing the program with values that consistently go into the same bucket, resulting in linear lookups and insertions.
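A toy model of that attack, using the identity hash and the 1000-bucket example from above (this sketches only the bucketing step, not any real map implementation):

```rust
const BUCKETS: u64 = 1000;

// Typical index derivation: reduce the hash modulo the bucket count.
fn bucket(hash: u64) -> u64 {
    hash % BUCKETS
}

fn main() {
    // With hash(x) = x and a known bucket count, multiples of 1000
    // all land in bucket 0, degrading lookups from O(1) to O(n).
    for key in (0..10).map(|i| i * BUCKETS) {
        assert_eq!(bucket(key), 0);
    }
    println!("10 attacker-chosen keys, all in bucket 0");
}
```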
The locking of stdout on every call is common across many programming languages; it's a frequent issue in multi-threaded code where every thread is allowed to print to the terminal.
Re ArrayVec: I would recommend the heapless crate instead; better code and a nicer API.
I miss SIMD...
There is this:
> CPU Specific Instructions
> If you do not care about the compatibility of your binary on older (or other types of) processors, you can tell the compiler to generate the newest (and potentially fastest) instructions specific to a certain CPU architecture, such as AVX SIMD instructions for x86-64 CPUs.
> To request these instructions from the command line, use the -C target-cpu=native flag
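With Cargo, that flag is typically passed via RUSTFLAGS, e.g.:

```sh
RUSTFLAGS="-C target-cpu=native" cargo build --release
```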