Here’s an example of a mechanistic set-up with zero judgment. This is the kind of “work” offer I get in my inbox all the time.
here are the latest jobs we think you’d be great for:
STEM PhD Qualifier Contributor | AI Training Data Already have your PhD qualifying exam materials saved? Turn them into income. Submit real quals to help train and evaluate frontier AI models. $50 per accepted submission, fully remote and flexible. $40 – $50/hr Apply →
Hardware & Systems Engineering Work Products Contributor | AI Training Data Real engineering docs? Turn them into income. Submit project artifacts to train frontier AI models. $90 to $100 per submission, remote and flexible. $90 – $100/hr. Apply →
Now, I’m not qualified for either of these things. I’m not a STEM person. The company that sends me these emails presumably knows that. They do, after all, have my resume. However, it seems as if the AI that reads resumes either cannot read or is programmed not to care about what it reads. But for just pennies an email blast, why not just send this out to EVERYBODY? Maybe someone will bite.
The more important question, however, is, “What am I biting off?” Let’s consider the first job. They want people to send in the PhD qualifying exams they were asked to take. The purpose here is clear: they want “legitimate” data on which to test their models. They want to see if their models are PhD-level. The problem here is that, in almost every case I can think of, the exam questions are not the property of the person who is being examined. They are the property of the examiner. What they are asking for here is an act of theft. It may be petty theft, but frankly, I’m not sure the professor giving the exam would feel that way. Indeed, he or she might be basing the exam question on current cutting-edge research they would rather not have shared with everyone on the internet. They certainly don’t want it run through the training data of a large language model so that it becomes, in a very diffuse form, potentially available to every person who ever uses that model. Aside from faculty feelings (and, of course, their rights), there’s the academic honesty component. At many major universities (and if they want really good data, the company presumably wants exams from major universities), distributing exam questions to people outside the exam is an act of academic dishonesty. It’s precisely the kind of thing that gets you “separated” from the institution. That is, it’s the thing that gets you kicked out. Imagine having gone through all your PhD classwork, having taken your prelims, and now you’re working on your thesis. Then it turns out that for $50, you have given away data that actually gets you thrown out of school. Talk about the most expensive $50 that you have ever earned! You won’t end up in the prison system, but you will have locked yourself out of some important career paths. And don’t think that the company that is asking for this data cares. Hey, they already paid you your $50. Let the seller beware.
The request for “real engineering docs” works on the same principle, but it is more legally fraught for everyone involved. If you are a hardware or systems engineer working for some corporation, those documents are most certainly NOT your property. Check your contract. There’s likely an entire Work Products section in there, and it’s usually pretty specific. Moreover, especially if you have worked in tech, there is also a non-disclosure agreement you have signed. So now you are being asked to sell documents that aren’t yours and violate an NDA. But hey, at least the pay is better. Double what you get for being a lowly grad student. What could possibly go wrong there? Dollars to donuts the company asking for this has a Terms of Service agreement that says that you (the submitter of such data) warrant that you have ownership over any of these documents. They think that clears them legally and leaves you on the hook for any consequences that get directed at them. But that’s like a fence saying that he will sell anything you have as long as you attest that it’s yours.
Neither you nor the company wants to end up in front of a jury on this, especially in the engineering case. Yes, the engineering docs got you more money, but that’s the tell. The company asking for the data is pricing in the risk they are taking. What risk? Oh, basically suborning theft. Depending on the exam questions or the engineering docs that get transmitted, both you and they could be violating federal laws about trade secrets and trafficking in technical data that has some impact on national security. The company may still think they are covered by their Terms of Service, but that’s a risky presumption. This is the kind of thing that is just waiting for FBI agents to stumble upon. “Hey, boss, we think we have a possible national security issue. We can just log in and send a couple of documents that look like they would violate that law and see what’s up. We will have payment logs and everything. It’ll take an hour or so.” At least your speedy trial will be super speedy. The jury will be back in 15 minutes.
Some business model, eh? Who thinks of this stuff? Who tells them, “No, that’s not a good idea”? The same people who write their contracts with the Work Product clause that threatens to sue me for everything I have if I share anything I have done for them? Was there any objection at all, or is cheap data just so addictive that nobody can just say no? They could, of course, hire elite STEM professors at their going rate to write qualifying exam questions for them. They could ask their own hardware and systems engineers to clean up the company’s own mountain of internal documents in a way that could be used to train frontier AI models. Nah, let’s just email strangers and ask them to commit a crime. Move fast. Break laws. Optimize profit.
