Abstract:
It is traditionally assumed that a computation decomposed into several threads executes more efficiently on shared-memory systems (SMP or NUMA) than the same computation decomposed into several processes. In this work we hypothesize that this assumption may be false for computations over large data volumes, for two main reasons. First, maintaining a common shared address space for the threads may introduce substantially more overhead than the aggregate cost of context switching between processes. Second, even when the computation does not require intensive memory management, the limited capacity of the TLB for describing the memory working set forces frequent refreshes of that translation cache in the threaded case as well. Experiments whose results support our hypothesis are described later in the article.