Ensure concurrent execution is race-free by (a) not storing per-inference mutable state in shared layer objects, (b) keeping OpenMP loop temporaries thread-local, and (c) using the correct synchronization primitive for the target platform.
Apply these rules:
1) State must be owned by the running instance, not by the shared model/layer.
Avoid mutable members updated in forward(). Example pattern:
// Instead of: mutable Mat hidden, cell; hidden=...; cell=...; in forward()
// Do:
int forward(const std::vector<Blob*>& bottom_blobs,
std::vector<Blob*>& top_blobs) {
const Mat& hidden_in = bottom_blobs[HIDDEN_BLOB_INDEX]->data;
const Mat& cell_in = bottom_blobs[CELL_BLOB_INDEX]->data;
Mat hidden_out, cell_out;
// compute hidden_out/cell_out...
top_blobs[HIDDEN_OUT]->data = hidden_out;
top_blobs[CELL_OUT]->data = cell_out;
return 0;
}
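To see why caller-owned state makes concurrent use safe, here is a minimal sketch of two threads sharing one layer object. The `Mat`, `Blob`, and `RecurrentLayer` types, the single `weight` parameter, and the helpers `run_sequence` and `run_concurrently` are hypothetical stand-ins invented for illustration, not the framework's real API. The point is only that `forward()` is `const` and every piece of recurrent state travels through the blobs.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the framework's types (assumption, not the real API).
using Mat = std::vector<float>;
struct Blob { Mat data; };

// Shared layer: forward() is const and writes to no member.
struct RecurrentLayer {
    float weight = 0.5f;  // read-only parameter, safe to share

    int forward(const std::vector<Blob*>& bottom_blobs,
                std::vector<Blob*>& top_blobs) const {
        assert(bottom_blobs.size() == 2 && top_blobs.size() == 1);
        const Mat& x = bottom_blobs[0]->data;          // input
        const Mat& hidden_in = bottom_blobs[1]->data;  // caller-owned state
        Mat hidden_out(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            hidden_out[i] = weight * hidden_in[i] + x[i];
        top_blobs[0]->data = hidden_out;
        return 0;
    }
};

// Runs `steps` recurrent steps; the state lives in blobs local to this call.
Mat run_sequence(const RecurrentLayer& layer, float input, int steps) {
    Blob x{Mat(4, input)};
    Blob hidden{Mat(4, 0.0f)};
    for (int s = 0; s < steps; ++s) {
        Blob next;
        std::vector<Blob*> bottoms{&x, &hidden};
        std::vector<Blob*> tops{&next};
        layer.forward(bottoms, tops);
        hidden = next;  // state is carried by the caller, not the layer
    }
    return hidden.data;
}

// Two threads share one layer object but never share state.
void run_concurrently(const RecurrentLayer& layer, Mat& a, Mat& b) {
    std::thread t1([&] { a = run_sequence(layer, 1.0f, 3); });
    std::thread t2([&] { b = run_sequence(layer, 2.0f, 3); });
    t1.join();
    t2.join();
}
```

Because the layer holds no per-inference members, the result of each thread matches what a single-threaded run with the same input would produce.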
2) In parallel loops, never let per-iteration data become shared state.
With #pragma omp parallel for, any pointer/variable that changes per iteration (e.g., pc, pa, scales, bias) must be thread-local or loop-local (declared inside the loop or listed as private variables), not shared across iterations.
3) Locking primitives must match platform semantics.
Use the synchronization primitive that is correct for the target platform (e.g., CRITICAL_SECTION vs newer primitives).
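Rule 2 can be sketched with a small per-channel kernel. The function name `scale_bias`, its parameters, and the flat channel-major layout are assumptions made for illustration; only the pattern matters: every pointer or scalar that changes per iteration is declared inside the loop body, so each thread works on its own copy. The code also compiles and behaves identically without OpenMP enabled, since the pragma is then ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: out[q][i] = in[q][i] * scales[q] + bias[q].
// Data is stored channel-major in flat vectors (an assumption of this sketch).
void scale_bias(const std::vector<float>& in, std::vector<float>& out,
                const std::vector<float>& scales,
                const std::vector<float>& bias,
                int channels, int size) {
    const std::size_t total =
        static_cast<std::size_t>(channels) * static_cast<std::size_t>(size);
    assert(in.size() == total && out.size() == total);
    assert(scales.size() == static_cast<std::size_t>(channels));
    assert(bias.size() == static_cast<std::size_t>(channels));

    // Racy variant (do not do this): declaring `const float* pa; float* pc;`
    // here, outside the loop, makes them shared by default in the parallel
    // region, so threads would overwrite each other's pointers.

    #pragma omp parallel for
    for (int q = 0; q < channels; ++q) {
        // Loop-local: each iteration (and therefore each thread) has its own.
        const std::size_t offset = static_cast<std::size_t>(q) * size;
        const float* pa = in.data() + offset;
        float* pc = out.data() + offset;
        const float scale = scales[q];
        const float b = bias[q];
        for (int i = 0; i < size; ++i)
            pc[i] = pa[i] * scale + b;
    }
}
```

If a temporary has to be declared before the loop for some reason, listing it in a `private(...)` clause on the pragma gives each thread its own uninitialized copy; declaring it inside the loop is usually the simpler and less error-prone choice.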