SSE2 がなくても find_ith_1_bit の分岐はなくせる
BitVector の select に SSE2 を使ってみる - やた@はてな日記 の続きです.
変更箇所が極わずかなので,説明は省略します.SSE2 を使わなくても分岐をなくすことができました.めでたしめでたし.
SSE2 命令を使うバージョン
std::size_t find_ith_1_bit(std::uint64_t bits, std::size_t i) { // See http://en.wikipedia.org/wiki/Hamming_weight for details. std::uint64_t counts; counts = bits - ((bits >> 1) & MASK_55); counts = (counts & MASK_33) + ((counts >> 2) & MASK_33); counts = (counts + (counts >> 4)) & MASK_0F; counts *= MASK_01; // ↓ここから __m128i x = _mm_cvtsi64_si128(i * MASK_01); __m128i y = _mm_cvtsi64_si128(counts); x = _mm_cmpgt_epi8(x, y); const std::uint8_t offset = POPCNT_TABLE[_mm_movemask_epi8(x)]; // ↑ここまで bits >>= offset; i -= ((counts << 8) >> offset) & 0xFF; return offset + SELECT_TABLE[i - 1][bits & 0xFF] + 1; }
SSE 命令を使わないバージョン
std::size_t find_ith_1_bit(std::uint64_t bits, std::size_t i) { // See http://en.wikipedia.org/wiki/Hamming_weight for details. std::uint64_t counts; counts = bits - ((bits >> 1) & MASK_55); counts = (counts & MASK_33) + ((counts >> 2) & MASK_33); counts = (counts + (counts >> 4)) & MASK_0F; counts *= MASK_01; // ↓ここから const UInt64 x = (counts | MASK_80) - (i * MASK_01); const int offset = ::__builtin_ctzll((x & MASK_80) >> 7); // ↑ここまで bits >>= offset; i -= ((counts << 8) >> offset) & 0xFF; return offset + SELECT_TABLE[i - 1][bits & 0xFF] + 1; }