When implementing algorithms that need to execute efficiently across different platforms, consider both compile-time and runtime optimizations:

1. For template-heavy code, use explicit instantiations to improve compilation speed and avoid compiler limitations on specific platforms:
```cpp
// Let CMake generate the explicit template instantiations in separate
// compilation units (e.g. from a *.cpp.in template processed by configure_file)
#cmakedefine LIBND4J_TYPE_GEN
#include <ops/declarable/helpers/cpu/summaryReductions.hpp>
```
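The same idea fits in a single self-contained file. This is a minimal sketch built on a hypothetical `sumOf` helper (not a libnd4j function): the `extern template` declarations stop including translation units from instantiating the template themselves, and the explicit instantiation definitions emit code once, for exactly the listed types. In a real build the definition and the `template ...;` lines would sit in their own (possibly CMake-generated) `.cpp` file.

```cpp
#include <vector>

// --- "Header" part (hypothetical sumOf, for illustration only) ---
template <typename T>
T sumOf(const std::vector<T>& values);

// Explicit instantiation declarations: translation units that include this
// header will not instantiate sumOf<float> / sumOf<double> themselves.
extern template float sumOf<float>(const std::vector<float>&);
extern template double sumOf<double>(const std::vector<double>&);

// --- "Implementation" part, normally in its own (possibly generated) .cpp ---
template <typename T>
T sumOf(const std::vector<T>& values) {
    T total = T(0);
    for (const T& v : values) {
        total += v;
    }
    return total;
}

// Explicit instantiation definitions: code is emitted here, once, for
// exactly these types and no others.
template float sumOf<float>(const std::vector<float>&);
template double sumOf<double>(const std::vector<double>&);
```

Limiting instantiations to an explicit type list is what keeps compile time and object size predictable when a template would otherwise be instantiated in every file that uses it.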
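2. Prefer explicitly sized types over platform-dependent ones such as `long`, whose width differs between platforms (32 bits on 64-bit Windows, 64 bits on 64-bit Linux):

```cpp
#include <cstdint>

typedef std::int32_t Nd4jInt;  // exactly 32 bits on every platform
typedef std::int64_t Nd4jLong; // exactly 64 bits on every platform
```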
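Width assumptions can also be enforced at compile time, and arithmetic that may exceed 32 bits can be widened before it overflows. A minimal sketch using `<cstdint>` types; the `elementCount` helper is hypothetical:

```cpp
#include <climits>
#include <cstdint>

typedef std::int32_t Nd4jInt;
typedef std::int64_t Nd4jLong;

// Fail the build, rather than misbehave at runtime, if a platform breaks
// the width assumptions.
static_assert(sizeof(Nd4jInt) * CHAR_BIT == 32, "Nd4jInt must be exactly 32 bits");
static_assert(sizeof(Nd4jLong) * CHAR_BIT == 64, "Nd4jLong must be exactly 64 bits");

// Hypothetical helper: widen one operand first so the multiplication is done
// in 64 bits, e.g. 100000 x 100000 no longer overflows a 32-bit intermediate.
inline Nd4jLong elementCount(Nd4jInt rows, Nd4jInt cols) {
    return static_cast<Nd4jLong>(rows) * cols;
}
```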
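3. Where a convenience helper such as `currentMilliseconds()` may not be available on every platform, call the portable standard-library implementation directly:

```cpp
#include <chrono>

// Instead of:
//   if (rootSeed == 0)
//       rootSeed = currentMilliseconds();
// use the std::chrono implementation directly:
if (rootSeed == 0) {
    auto s = std::chrono::system_clock::now().time_since_epoch();
    rootSeed = std::chrono::duration_cast<std::chrono::milliseconds>(s).count();
}
```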
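The fallback can be packaged as a small helper so the seeding logic lives in one place. A sketch assuming hypothetical `millisecondsSinceEpoch` and `resolveSeed` helpers (neither is a libnd4j function), built only on `std::chrono`:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: milliseconds since the system_clock epoch (the Unix
// epoch in practice, guaranteed from C++20), with no OS-specific calls.
inline std::int64_t millisecondsSinceEpoch() {
    auto sinceEpoch = std::chrono::system_clock::now().time_since_epoch();
    return std::chrono::duration_cast<std::chrono::milliseconds>(sinceEpoch).count();
}

// Hypothetical helper: keep a caller-supplied seed, otherwise fall back to
// the current time, mirroring the `rootSeed == 0` check.
inline std::int64_t resolveSeed(std::int64_t rootSeed) {
    return rootSeed != 0 ? rootSeed : millisecondsSinceEpoch();
}
```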
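4. Wrap calls into platform-specific libraries such as cuDNN so that every status code is checked and reported:

```cpp
#include <cudnn.h>
#include <utility>

// Runs op(args...) only if no earlier cuDNN call has failed; on failure the
// status is kept in err and a readable message is printed.
// FORCEINLINE and nd4j_printf come from the libnd4j headers.
template<typename Op, typename ...Args>
FORCEINLINE void callCudnnIfNoErr(cudnnStatus_t &err, Op op, Args&&... args) {
    if (err == CUDNN_STATUS_SUCCESS) {
        err = op(std::forward<Args>(args)...);
        if (err != CUDNN_STATUS_SUCCESS) {
            nd4j_printf("cuDNN error: %s\n", cudnnGetErrorString(err));
        }
    }
}
```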
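Because cuDNN is not available on every machine, the sketch below reproduces the same check-then-call pattern against a hypothetical `Status` enum and `setSize` function that stand in for `cudnnStatus_t` and a real cuDNN call, so it compiles without CUDA:

```cpp
#include <cstdio>
#include <utility>

// Hypothetical stand-in for a C library status enum such as cudnnStatus_t.
enum Status { STATUS_SUCCESS = 0, STATUS_BAD_PARAM = 1 };

inline const char* statusString(Status s) {
    return s == STATUS_SUCCESS ? "SUCCESS" : "BAD_PARAM";
}

// Same shape as callCudnnIfNoErr: call op only while no error has occurred,
// record the first failure in err and report it once.
template <typename Op, typename... Args>
inline void callIfNoErr(Status& err, Op op, Args&&... args) {
    if (err == STATUS_SUCCESS) {
        err = op(std::forward<Args>(args)...);
        if (err != STATUS_SUCCESS) {
            std::fprintf(stderr, "library error: %s\n", statusString(err));
        }
    }
}

// Hypothetical library function used to exercise the wrapper.
inline Status setSize(int* out, int n) {
    if (n <= 0) {
        return STATUS_BAD_PARAM;
    }
    *out = n;
    return STATUS_SUCCESS;
}
```

Chaining several calls through the same `err` variable means that after the first failure every later call is skipped, so a sequence of library calls reports only its first error instead of a cascade of follow-on failures.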
Always document platform-specific considerations directly in the code to help future developers understand your optimization decisions.