The KMP_AFFINITY environment variable is set to a comma-separated string of arguments. For example, to list a machine topology map, specify KMP_AFFINITY=verbose,none to use a modifier of verbose and a type of none.

The supported arguments are described below.

modifier
String consisting of keyword and specifier. The granularity modifier takes the following specifiers: fine, thread, and core.

type
The supported values are none (default), compact, disabled, explicit, and scatter. The logical and physical types are deprecated but supported for backward compatibility:
logical (deprecated; instead use compact, but omit any permute value)
physical (deprecated; instead use scatter, possibly with an offset value)

permute and offset
Not valid with type values of explicit, none, or disabled.

type = none (default)
Does not bind OpenMP threads to particular thread contexts; however, if the operating system supports affinity, the compiler still uses the OpenMP thread affinity interface to determine machine topology. Specify KMP_AFFINITY=verbose,none to list a machine topology map.

type = compact
Specifying compact assigns OpenMP thread n+1 to a free thread context as close as possible to the thread context where OpenMP thread n was placed. For example, in a topology map, the nearer a node is to the root, the more significance the node has when sorting the threads.

type = disabled
Specifying disabled completely disables the thread affinity interfaces. This forces the OpenMP run-time library to behave as if the affinity interface was not supported by the operating system. This includes the low-level API interfaces such as kmp_set_affinity and kmp_get_affinity, which have no effect and will return a nonzero error code.

type = explicit
Specifying explicit assigns OpenMP threads to a list of OS proc IDs that have been explicitly specified by using the proclist= modifier, which is required for this affinity type.

type = scatter
Specifying scatter distributes the threads as evenly as possible across the entire system. scatter is the opposite of compact, so the leaves of the node are most significant when sorting through the machine topology map.
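The difference between compact and scatter can be illustrated with a toy model. The sketch below is not how the OpenMP run-time library is implemented; it only sorts the thread contexts of a hypothetical machine (2 packages, 2 cores per package, 2 thread contexts per core) with either the root-most or the leaf-most topology level as the most significant sort key. The function names, the machine shape, and the OS proc ID numbering are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical machine: 2 packages x 2 cores x 2 hardware thread contexts.
PACKAGES, CORES, CONTEXTS = 2, 2, 2


def thread_contexts():
    """Return every (package, core, context) triple on the toy machine."""
    return list(product(range(PACKAGES), range(CORES), range(CONTEXTS)))


def os_proc_id(ctx):
    """Assumed numbering: OS proc IDs run package-major, context-minor."""
    package, core, context = ctx
    return (package * CORES + core) * CONTEXTS + context


def placement(affinity_type, num_threads):
    """Map OpenMP thread n to an OS proc ID under this simplified model."""
    if affinity_type == "compact":
        # Levels nearest the root (package) are most significant, so
        # consecutive threads land on neighbouring thread contexts.
        significance = (0, 1, 2)
    elif affinity_type == "scatter":
        # Leaf levels (context) are most significant, so consecutive
        # threads are spread across packages first.
        significance = (2, 1, 0)
    else:
        raise ValueError("this sketch only models compact and scatter")

    ordered = sorted(
        thread_contexts(),
        key=lambda ctx: tuple(ctx[level] for level in significance),
    )
    # Wrap around if there are more threads than thread contexts.
    return [os_proc_id(ordered[n % len(ordered)]) for n in range(num_threads)]


if __name__ == "__main__":
    print("compact:", placement("compact", 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
    print("scatter:", placement("scatter", 8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

In this model, compact fills one package before touching the next, while scatter alternates between packages from the first thread onward. On a real system, the actual binding can be checked by running with the verbose modifier.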
Shell, Environment, and Other Software Settings

Open MP Tuning Flags

O2 provides intra-file interprocedural optimizations and the following capabilities for performance gain: global instruction scheduling and control speculation, strength reduction/induction variable simplification, and structure assignment lowering and optimizations.

On IA-32 Windows platforms, the options that -O2 sets include: /Og, /Oi-, /Os, /Oy, /Ob2, /GF (/Qvc7 and above), and /Gf (/Qvc6 and below).

O3 enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. These include loop unrolling, including instruction scheduling, and padding the size of certain power-of-two arrays.

On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2.

The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.

On IA-32 Windows platforms, -O3 sets the following: /GF (/Qvc7 and above), /Gf (/Qvc6 and below), and /Ob2.

The -fast option enhances execution speed across the entire program by including the following options that can improve run-time performance:
-O3 (maximum speed and high-level optimizations)
-ipo (enables interprocedural optimizations across files)
-xT (generate code specialized for Intel(R) Core(TM)2 Duo processors, Intel(R) Core(TM)2 Quad processors, and Intel(R) Xeon(R) processors with SSSE3)
Statically link in libraries at link time

Note that -prec-div improves precision of FP divides (with some speed impact).

To override one of the options set by /fast, specify that option after the /fast option on the command line.