libFuzzer: hello world basics

I'm new to libFuzzer (libFuzzer – a library for coverage-guided fuzz testing, LLVM 16.0.0git documentation). I took the time to read the docs and search online, but I feel I am still missing basic concepts when dealing with real-world examples. Maybe I missed relevant docs or examples: if so, feel free to let me know.

The doc example (fuzzing/fuzz_me.cc at master in google/fuzzing on GitHub), like other very basic examples you may find by searching, works fine: libFuzzer runs and crashes as expected.

Now, say I have real-life code with a real-life API like this:

void MyAPI(std::vector<std::string> const& msg, size_t const& idx);

How can I take the inputs provided by libFuzzer and turn them into arguments for MyAPI?

#include <cstdint>
#include <cstddef>
#include <vector>
#include <iostream>
#include <string>

void MyAPI(std::vector<std::string> const& msg, size_t const& idx) {
  std::cout << msg[idx] << std::endl; // BUG here: idx must be < msg.size().
}

bool FuzzAPI(const uint8_t *Data, size_t Size) {
  // How to call MyAPI from here?
  return false; // Placeholder so the function has a defined return value.
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  FuzzAPI(Data, Size);
  return 0;
}

I don't know what the value of Size is, nor the content of Data, since, if I understand correctly, both are generated by the fuzzer. So how can I turn them into something MyAPI needs?
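One possible way to bridge the gap, as a minimal sketch: treat Data as an opaque byte string and define your own decoding from bytes to arguments. The encoding below (first byte becomes idx, the remaining bytes are split on '\n' into msg) and the names DecodedInput and DecodeInput are hypothetical, chosen only for illustration. LLVM also ships a FuzzedDataProvider helper (fuzzer/FuzzedDataProvider.h) for this kind of splitting; the sketch avoids it so that it depends only on the standard library.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// API under test, copied from the question (the out-of-bounds bug is intentional).
void MyAPI(std::vector<std::string> const& msg, size_t const& idx) {
  std::cout << msg[idx] << std::endl; // BUG here: idx must be < msg.size().
}

// Hypothetical container for the arguments decoded from the fuzzer bytes.
struct DecodedInput {
  std::vector<std::string> msg;
  size_t idx = 0;
};

// Hypothetical encoding (an assumption, not a libFuzzer convention):
// byte 0 is idx, the remaining bytes are split on '\n' into lines.
DecodedInput DecodeInput(const uint8_t *Data, size_t Size) {
  DecodedInput in;
  if (Size == 0) return in;
  in.idx = Data[0];
  std::string current;
  for (size_t i = 1; i < Size; ++i) {
    const char c = static_cast<char>(Data[i]);
    if (c == '\n') {
      in.msg.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  if (!current.empty()) in.msg.push_back(current); // trailing line without '\n'
  return in;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  const DecodedInput in = DecodeInput(Data, Size);
  MyAPI(in.msg, in.idx); // The fuzzer is free to pick an idx that is out of range.
  return 0;
}
```

Built with something like clang++ -g -fsanitize=fuzzer,address, the fuzzer should sooner or later produce an input whose first byte is not a valid index for the decoded lines, which is exactly the bug in MyAPI. Any deterministic decoding would do; the only real requirement is that the same bytes always produce the same arguments.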

Next, what if MyAPI needs a filename as an argument, like this?

>> cat helloworld.txt 
hello world!
hello world!
hello world!
hello world!
hello world!
hello world!
hello world!
hello world!
hello world!
hello world!

#include <cstdint>
#include <cstddef>
#include <vector>
#include <iostream>
#include <string>
#include <fstream>

void MyAPI(std::string const& filename, size_t const& idx) {
  std::ifstream myfile(filename.c_str()); // filename typically is helloworld.txt
  std::string line;
  std::vector<std::string> msg;
  while (std::getline(myfile, line)) {
    msg.push_back(line);
  }
  std::cout << msg[idx] << std::endl; // BUG here: idx must be < msg.size().
}

bool FuzzAPI(const uint8_t *Data, size_t Size) {
  // How to call MyAPI from here?
  return false; // Placeholder so the function has a defined return value.
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  FuzzAPI(Data, Size);
  return 0;
}
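For an API that takes a filename, one common approach is to dump the fuzzer bytes into a temporary file and pass its path. The sketch below assumes a hypothetical encoding (first byte is idx, the rest is the file content), a hypothetical helper WriteBytesToFile, and a fixed file name fuzz_input.tmp in the current directory. A fixed name is only safe with a single fuzzing process; parallel runs would need a unique name per process.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// API under test, copied from the question (the out-of-bounds bug is intentional).
void MyAPI(std::string const& filename, size_t const& idx) {
  std::ifstream myfile(filename.c_str());
  std::string line;
  std::vector<std::string> msg;
  while (std::getline(myfile, line)) {
    msg.push_back(line);
  }
  std::cout << msg[idx] << std::endl; // BUG here: idx must be < msg.size().
}

// Hypothetical helper: write Size bytes to path, return true on success.
bool WriteBytesToFile(const std::string &path, const uint8_t *Data, size_t Size) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out.write(reinterpret_cast<const char *>(Data),
            static_cast<std::streamsize>(Size));
  out.close();
  return !out.fail();
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  if (Size < 1) return 0; // Need at least the idx byte.
  const std::string path = "fuzz_input.tmp"; // Assumed name, single process only.
  const size_t idx = Data[0];
  if (WriteBytesToFile(path, Data + 1, Size - 1)) {
    MyAPI(path, idx);
  }
  std::remove(path.c_str()); // Not reached if MyAPI crashes; the file is then left behind.
  return 0;
}
```

Writing a file on every iteration slows fuzzing down, so if the code under test can also read from memory or from a std::istream, calling that entry point directly is usually preferable.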

I understand the corpus to be the input data I would have passed to MyAPI (the way you would in CI, for instance), but it's not clear to me how to define this corpus (what is supposed to be inside it), nor how the fuzzer will handle it, and in the end what I get in Data: do I get random numbers? Lines of the file? Something different? For instance, how do I set the initial corpus to be msg.assign(10, "hello world!"), or a file helloworld.txt containing 10 "hello world!" lines?
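As far as libFuzzer is concerned, a corpus is a directory of files, and the raw bytes of each file are one input handed to LLVMFuzzerTestOneInput as Data (first unchanged, then in mutated variants). A seed therefore has to be written in whatever byte encoding the fuzz target decodes. The sketch below assumes a hypothetical encoding where the first byte is idx and the rest is newline-terminated lines; MakeSeed and WriteSeedFile are illustrative names, not part of libFuzzer.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Build the bytes of one seed under the assumed encoding:
// [idx byte] followed by `count` copies of `line`, each terminated by '\n'.
std::string MakeSeed(uint8_t idx, size_t count, const std::string &line) {
  std::string seed(1, static_cast<char>(idx));
  for (size_t i = 0; i < count; ++i) {
    seed += line;
    seed += '\n';
  }
  return seed;
}

// Write the seed bytes to a file; the target directory must already exist.
bool WriteSeedFile(const std::string &path, const std::string &seed) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out.write(seed.data(), static_cast<std::streamsize>(seed.size()));
  out.close();
  return !out.fail();
}
```

For example, WriteSeedFile("corpus/seed0", MakeSeed(3, 10, "hello world!")) would produce a seed equivalent to msg.assign(10, "hello world!") with idx 3, assuming a corpus directory exists. The fuzzer binary is then started with that directory as a positional argument (./my_fuzzer corpus/), and new interesting inputs it discovers are added to the same directory.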

This looks to me like basic usage and basic concepts, but surprisingly I had a hard time finding examples. How is one supposed to fuzz this kind of API? What should the conversion logic between LLVMFuzzerTestOneInput and MyAPI be? Can somebody provide working examples for the use cases presented above?

Note: I'm not sure this is the best place to ask. If not, let me know and, ideally, point me to the correct place.