SSE2 がなくても find_ith_1_bit の分岐はなくせる

BitVector の select に SSE2 を使ってみる - やた@はてな日記 の続きです.

変更箇所が極わずかなので,説明は省略します.SSE2 を使わなくても分岐をなくすことができました.めでたしめでたし.

SSE2 命令を使うバージョン

std::size_t find_ith_1_bit(std::uint64_t bits, std::size_t i) {
  // See http://en.wikipedia.org/wiki/Hamming_weight for details.
  std::uint64_t counts;
  counts = bits - ((bits >> 1) & MASK_55);
  counts = (counts & MASK_33) + ((counts >> 2) & MASK_33);
  counts = (counts + (counts >> 4)) & MASK_0F;
  counts *= MASK_01;

  // ↓ここから
  __m128i x = _mm_cvtsi64_si128(i * MASK_01);
  __m128i y = _mm_cvtsi64_si128(counts);
  x = _mm_cmpgt_epi8(x, y);
  const std::uint8_t offset = POPCNT_TABLE[_mm_movemask_epi8(x)];
  // ↑ここまで

  bits >>= offset;
  i -= ((counts << 8) >> offset) & 0xFF;
  return offset + SELECT_TABLE[i - 1][bits & 0xFF] + 1;
}

SSE 命令を使わないバージョン

std::size_t find_ith_1_bit(std::uint64_t bits, std::size_t i) {
  // See http://en.wikipedia.org/wiki/Hamming_weight for details.
  std::uint64_t counts;
  counts = bits - ((bits >> 1) & MASK_55);
  counts = (counts & MASK_33) + ((counts >> 2) & MASK_33);
  counts = (counts + (counts >> 4)) & MASK_0F;
  counts *= MASK_01;

  // ↓ここから
  const UInt64 x = (counts | MASK_80) - (i * MASK_01);
  const int offset = ::__builtin_ctzll((x & MASK_80) >> 7);
  // ↑ここまで

  bits >>= offset;
  i -= ((counts << 8) >> offset) & 0xFF;
  return offset + SELECT_TABLE[i - 1][bits & 0xFF] + 1;
}