FNet and Pre-training
Phil,
Thanks for the reply! The FNet architecture is incredibly interesting. I am definitely going to attempt to build my own model with it and train it on a sequence classification task on large documents. The tricky part is that I am relatively new to ML/AI compared to the folks writing these papers, so I'm still fighting the learning curve. The other caveat is that FNet is poorly documented from a Python/code standpoint, so I'll have to do some digging and collaborate with some folks to understand the implementation a little better.
I did get some time to look at the limitations of some of the other models, especially the transformers, and even CUAD. CUAD’s code and implementation are interesting because they can't run the model against a whole contract at once. I’m not certain where in their code they implement the sliding window approach, and my research on the approach has been fruitless. (Not surprising.)
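For reference, here is my rough understanding of what a sliding window over a long document looks like. This is only a sketch and is not taken from CUAD's code: the function name `sliding_windows` and the `window_size` and `stride` defaults are my own assumptions.

```python
def sliding_windows(tokens, window_size=512, stride=256):
    """Split a token list into overlapping windows.

    Hypothetical helper, not CUAD's implementation.
    Returns a list of (start_offset, window_tokens) pairs so that
    per-window predictions can be mapped back to document positions.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if stride > window_size:
        raise ValueError("stride larger than window_size would skip tokens")
    if not tokens:
        return []

    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + window_size]))
        # Stop once the current window reaches the end of the document.
        if start + window_size >= len(tokens):
            break
        start += stride
    return windows
```

Each window overlaps the previous one by `window_size - stride` tokens, so a clause that straddles a window boundary should still appear whole in at least one window, as long as it is shorter than the overlap. The reassembly step I'd like to avoid is merging the predictions from those overlapping windows.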
I am trying to get around the need to write a really large/complex algorithm that splits documents into subparts, sends each subpart to a model for predictions, and then reassembles the results. Contracts aren’t always formatted so that a clause is exactly one or two paragraphs, so an algorithm that handles the segmentation without using ML/NLP would be imperfect at best.
So my thinking was to handle that by using an architecture that can take a lot more tokens, search the larger token set for clauses, and spit out predictions of which clauses are in the data. That would save the headache of reassembling them afterwards.
I think pre-training an FNet on the C4 dataset like they did will be helpful, and I have a huge 100GB corpus of contracts that I can use for that purpose too. I’m not sure whether there’s value in pre-training on both or only one. Would you stick to C4, pre-train on both, or use only the unlabeled contracts corpus?
On pre-training FNet: is pre-training done on FNet in a similar way as it is for transformers? Are you familiar enough with it to speak to how it can be done?
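To make the question concrete, this is roughly the BERT-style masked language modeling step I have in mind when I say "pre-training". It's a minimal sketch based on my own understanding, and whether FNet needs anything different is exactly what I'm asking. The function name `mask_tokens`, the 15% masking rate, the 80/10/10 split, and the `-100` ignore label are assumptions borrowed from the usual BERT recipe, not from FNet's code.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for one masked-language-modeling example.

    Hypothetical helper. Special tokens are not treated specially here;
    a real pipeline would normally exclude them from masking.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    # -100 marks positions the loss should ignore (assumed convention).
    labels = [-100] * len(token_ids)

    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: leave the original token in place
    return inputs, labels
```

If the recipe carries over, I would expect to be able to run the same masking over both C4 and my contracts corpus and only swap out the model underneath.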
My next steps are going to be to build a model using the FNet architecture, and then attempt to pre-train it on the C4 dataset, then move to my CUAD-like dataset for sequence tagging.
If you have any insights into how you might approach the task, I would appreciate them, especially in the pre-training area.
Again, thanks for your time and all the wonderful information you've provided. It's a huge help!!! Once I get a trained model, I am happy to share the outcomes with you if you’re interested. Just let me know!
Thanks,
Dane