Binding Threads to Cores on OSX
When I was porting the mbw to the OSX, I found that
sched_getaffinity
pthread_setaffinity_np
are missing, so that I could not bind a thread to a core. The result is the mbw threads could be running on the same core and compete for per-core resources, although the OSX kernel scheduler is unlikly to do so. I simply simulate these calls by using native OSX calls.
The first thing to do is to detect the number CPU cores available. On Linux,
the sched_getffinity
fills the cpu_set_t
data structure provided and
we can use CPU_ISSET(id, cpu_set_t)
to determine the number cores. Similarly
on OSX, the function
can be used to get various system informations. In our case, the follwoing
code gets the number of CPU cores. The cpu_set_t
is just a 32-bit variable
so that the max number cores is 32. Of course the actual
sched_getaffinity
function is much more complicated, but the current
implementation is enough to suit our purpose. The physical core number
is stored directly in the core_count
variable.
The next step is to simulate the function pthread_setaffinity_np
. This time
we use the affinity API
to bind a thread to a core. The first paramter is the thread you want to
bind. The second parameter specifies the policy you want to set, and
it is THREAD_AFFINITY_POLICY
. The third one is called the affinity tag
.
Threads in an affinity set have the same affinity tag
indicating that the
threads want to share an L2 cache. We set the tags differently
for different threads so that the kernel may try to schedule on different cores.
As you may already notice that the pthread_t
is not compatible with the first
parameter provided to thread_policy_set
, which is a thread_port_t
. The
first step we call pthread_mach_thread_np
to get a thread_port_t
form
pthread_t
. The core number that the thread should be bound to is provided
in the cpu_set
parameters. The CPU_SET(int core_num, cpu_set_t *cs)
is called to set the core number in the cpu_set_t
data structure.
Overall, the implementations for these two missing system calls are quite simple with limited power. Full implementations complied with the POSIX sematics are beyond the scope of this blog, and I do not really need all the functions. If I do, I will update them accordingly :-).