Setting aside the fact that training an AI is insanely transformative and definitely fair use, people would not get paid in any case. The data is owned by websites and corporations.
If AI training were highly restricted, Microsoft and Google would just pay each other for the data and pay the few websites they don't own (Stack Overflow, GitHub, Reddit, Shutterstock, etc.). A bit of money would go to publishing houses and record companies, but not enough for the actual artists to see more than a few dollars.
And they would happily do it, since they would be the only players in the game and could easily overcharge for a product that will eventually replace 30% of our workforce.
Your emotional, short-sighted response kills all open source and hands our economy to Google and Microsoft. They become the sole owners of AI tech. Don't be stupid, please. They want you to be mad; it only helps them.
Most LoRAs use instruction-type datasets. I know of only one that used straight text, but that was the Unreal docs, not just a book.
From what I understand, if you want it to answer questions about the book, you need to feed paragraphs into ChatGPT and generate question-and-answer pairs from them.
If you want to generate text, you will want the input to be the previous paragraph(s) and the output to be the current paragraph. Depending on what you want, you can instead grab paragraphs, have ChatGPT write a summary, and put that summary in the input as the prompt in place of the previous paragraphs.
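A minimal sketch of that previous-paragraph / current-paragraph pairing (the splitting on blank lines and the `context` parameter are my assumptions, not a required setup):

```python
# Sketch: build (input, output) pairs from a book, where the input is the
# previous paragraph(s) and the output is the paragraph that follows them.

def paragraph_pairs(text, context=1):
    """Yield (previous_paragraphs, current_paragraph) training pairs.

    `context` controls how many preceding paragraphs go into the input.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for i in range(context, len(paragraphs)):
        prev = "\n\n".join(paragraphs[i - context:i])
        yield prev, paragraphs[i]

book = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
pairs = list(paragraph_pairs(book))
# pairs[0] == ("First paragraph.", "Second paragraph.")
```

Swapping the first element of each pair for a ChatGPT-generated summary gives you the summary-as-prompt variant instead.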
I like the Alpaca format, so I would add an instruction above all that explaining what it's supposed to do.
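For reference, an Alpaca-style record is just instruction + input + output; the instruction wording below is only an example, not a fixed template:

```python
import json

# One Alpaca-format training record. The instruction tells the model what
# the task is; the input carries the context (previous paragraphs or a
# summary); the output is the target paragraph.
record = {
    "instruction": "Continue the story in the style of the book.",
    "input": "Previous paragraph(s) or a summary go here.",
    "output": "The paragraph the model should learn to produce.",
}

# Alpaca datasets are typically a JSON list of such records.
print(json.dumps([record], indent=2))
```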
I would look into how the other fine-tunes structure their data and mirror that. I would even grab some of theirs and add it to boost the diversity of your data. I find that training on just one narrow subject makes the model a bit dumb, but I don't have all that much experience with it.