The fourth topic in the "Know what it is before you use it" series is the Boost io_context execution model.

This post is based on the articles "execution model where you launch N threads for the same io_context class instance" and "execution model where you create N pairs of '1 io_context + 1 thread'", so please refer to them as well.

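#define BOOST_ASIO_NO_DEPRECATED
#include <boost/asio.hpp>
#include <thread>
#include <vector>
// ...

namespace io = boost::asio;

io::io_context io_context;
// Prepare things: schedule the asynchronous work before the threads start
std::vector<std::thread> threads;
// Unsigned, so the loop comparison below has matching signedness.
// Note: hardware_concurrency() may return 0 if the value cannot be determined.
const unsigned count = std::thread::hardware_concurrency() * 2;

// Every thread calls run() on the same io_context
for(unsigned n = 0; n < count; ++n)
{
    threads.emplace_back([&]
    {
        io_context.run();
    });
}

// Wait until every worker thread has returned from run()
for(auto& thread : threads)
{
    if(thread.joinable())
    {
        thread.join();
    }
}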

In this case io_context operates like a classic thread pool. Asynchronous operations are performed somewhere on the OS side, but their completion handlers are invoked on the threads that are running io_context::run. To be more precise: every completion handler is invoked on the first free thread that is running io_context::run.

This means that completion handlers can run in parallel, which in turn means we have reached a point where we need some synchronization.

The less you have to care about synchronization in a multithreaded environment, the better. The good news is that we don't need low-level synchronization tools like mutexes or semaphores to keep completion handlers synchronized in a multithreaded Boost.Asio environment.

The only thing you need to get your completion handlers synchronized properly is an io_context::strand instance. It works quite simply: completion handlers attached to the same io_context::strand are invoked serially. They may be invoked from different threads, but those invocations never overlap. This means that they won't run in parallel and you don't have to synchronize them yourself.

So all you need to do is decide which completion handlers operate on shared data and should be attached to the same io_context::strand, and which are independent and can run in parallel. Use the boost::asio::bind_executor function to wrap a completion handler in a strand. Let's look at an example. Assume that io_context::run is running on multiple threads:

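class session
{
public:

    session(io::io_context& io_context)
    : socket(io_context)
    , read  (io_context)
    , write (io_context)
    {
    }

    void async_read()
    {
        // The handler is bound to the "read" strand, so read handlers never overlap
        io::async_read(socket, read_buffer, io::bind_executor(read, [&] (error_code error, std::size_t bytes_transferred)
        {
            if(!error)
            {
                // ...
                async_read();
            }
        }));
    }

    void async_write()
    {
        // The handler is bound to the "write" strand, independent of the "read" strand
        io::async_write(socket, write_buffer, io::bind_executor(write, [&] (error_code error, std::size_t bytes_transferred)
        {
            if(!error)
            {
                // ...
                async_write();
            }
        }));
    }

private:

    tcp::socket socket;
    io::io_context::strand read;
    io::io_context::strand write;
    // Assumed buffer members: the original snippet uses them without declaring them
    io::streambuf read_buffer;
    io::streambuf write_buffer;
};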

In the example above we used two strands, one for read operations and one for write operations. This means that read completion handlers are serialized by one strand and write completion handlers by the other. Completion handlers of the same type therefore run serially, while read and write handlers can run in parallel with each other. That's all you need to keep your control flow synchronized. Note that as long as you assign your strands properly, you can't deadlock here or run into other common multithreading issues.

You can also use the boost::asio::post function with an io_context::strand to execute your own function objects within a given strand:

io::post(read, []
{
    std::cout << "We're inside a read sequence, it's safe to access a read-related data here!\n";
});

Execution model with N pairs of "1 io_context + 1 thread"

In the previous lesson we learned an execution model where you launch N threads for the same io_context class instance. In that case io_context does the load balancing for you, and you don't need to care which thread the next handler will run on.

There is another execution model where you create N pairs of "1 io_context + 1 thread" instead. In that case every thread has its own io_context class instance. Look at the example below. It is an io_context group wrapper that creates the requested number of io_context, work guard, and thread instances. We will discuss this execution model after the example.

#define BOOST_ASIO_NO_DEPRECATED
#include <boost/asio.hpp>
#include <atomic>
#include <memory>
#include <thread>
#include <vector>

namespace io = boost::asio;
using tcp = io::ip::tcp;
using work_guard_type = io::executor_work_guard<io::io_context::executor_type>;
using error_code = boost::system::error_code;

class io_context_group
{
public:

    io_context_group(std::size_t size)
    {
        // Create io_context and work guard pairs
        for(std::size_t n = 0; n < size; ++n)
        {
            contexts.emplace_back(std::make_shared<io::io_context>());
            guards.emplace_back(std::make_shared<work_guard_type>(contexts.back()->get_executor()));
        }
    }

    void run()
    {
        // Create threads
        for(auto& io_context : contexts)
        {
            threads.emplace_back([&]
            {
                io_context->run();
            });
        }

        // Join threads
        for(auto& thread : threads)
        {
            thread.join();
        }
    }

    // Round-robin io_context& query
    io::io_context& query()
    {
        return *contexts[index++ % contexts.size()];
    }

private:

    template <typename T>
    using vector_ptr = std::vector<std::shared_ptr<T>>;

    vector_ptr<io::io_context> contexts;
    vector_ptr<work_guard_type> guards;
    std::vector<std::thread> threads;

    std::atomic<std::size_t> index{0}; // brace initialization also compiles before C++17
};

int main()
{
    io_context_group group(std::thread::hardware_concurrency() * 2);
    tcp::socket socket(group.query());
    // Schedule some tasks
    group.run();
    return 0;
}

Things you should know about this execution model:

💡 You don't need to deal with strands or any other synchronization tools: since every io_context runs within a single thread, no data requires synchronization, as long as you access the same data only from handlers of the same io_context. That looks like a plus.

💡 Objects that work on an io_context, such as sockets and acceptors, are bound to an io_context object once. You can't rebind them to another io_context within their lifetime. This means that all objects bound to the same io_context run within a single thread. It doesn't mean that they are bound to the same hardware CPU core all the time: the operating system runs a thread on whichever core it considers most suitable and may move the thread to another core during its lifetime. However, wherever that thread is running, all objects of a given io_context always run on a single (current) core. So you may face a situation where one core runs at 100% load while the others are idle. At first glance that looks like a minus.

💡 Well, it's not really a minus of the execution model itself, but rather of the balancing algorithm chosen, or a result of using the execution model for a use case it doesn't fit. While the execution model from the previous lesson does the balancing for you, this lesson's model requires a balancing algorithm to be implemented by the application (or library) developer.

💡 The execution model from the previous lesson is a universal one. Pick it if you don't really know which balancing algorithm you should choose. The execution model from the current lesson may work faster, but it fits special cases only: when your application interacts with other applications in some special way, and that special way requires a proper balancing algorithm. In the example above we used a round-robin algorithm, and we can't really say whether that algorithm is good or bad in general; that depends on how our application interacts with other applications. For example, if our application handles a lot of random lightweight tasks, then round-robin could be a better solution than the automatic balancer. However, things may not be as obvious as they appear at first glance. Different use cases and design patterns for a custom balancer are out of scope for this lesson; we will discuss them later. Once again: if you're not sure which execution model to choose, pick the automatic one from the previous lesson.

Wrapping up

So far we have looked at the execution models of boost::io_context. This post ends with the summary below.

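An execution model of boost::io_context defines the relationship, within one application, between the threads that perform work and the io_context that drives them.
boost::io_context has a 1:N execution model and a 1:1 execution model (io_context:thread).
In the 1:N execution model, N threads run a single io_context.
In the 1:1 execution model, every thread has its own io_context.
The 1:N execution model balances load across its N threads by itself, whereas the 1:1 execution model does not (the developer has to choose and apply a suitable balancing algorithm).
If you are not sure which execution model to use, pick the 1:N model.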

Comments correcting typos or mistakes in this post are more than welcome 😎

Reference

Understanding the boost::io_context execution model

Table of contents

Execution model where N threads use one io_context

Execution model with N pairs of "1 io_context + 1 thread"

Wrapping up

첼시팬개발자
