CoNLL support #10
Comments
I think it is worth it, yes. @Evizero has support for it in MLDatasets.jl. If that is ported across and enhanced to match the CorpusLoaders style:
and it is working well, perhaps we can talk about deprecating it out of MLDatasets.jl.
I am starting with the addition of the CoNLL 2003 corpus. The original files from the shared task are freely available, but to extract the required files from them one needs to have the Reuters Corpus file. Instead of doing this, there are prebuilt CoNLL 2003 files that are openly available. I feel it will be very difficult to take care of the downloading part with the former method, so I think I should go with the latter approach. What do you suggest in this case?
Edit: I feel the latter approach will be simpler overall, as well as easier to replicate for other CoNLL datasets.
The latter sounds legit.
It'd be good to support CoNLL format in a generic sense (and then perhaps some of the more specific CoNLL formats as an offshoot). I'd be happy to work on this if this is something you think would be worth it.
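To make "generic" concrete, here is a rough sketch of what a column-agnostic CoNLL reader could look like. It is written in Python purely for illustration and is not the CorpusLoaders.jl or MLDatasets.jl API. The function name `read_conll`, the `columns` parameter, and the `SAMPLE` text are all hypothetical. It assumes the usual CoNLL conventions: one token per line, whitespace-separated columns, a blank line between sentences, and (as in CoNLL 2003) `-DOCSTART-` lines marking document boundaries.

```python
from typing import Dict, Iterable, Iterator, List, Sequence

# Hypothetical sample in the four-column CoNLL 2003 layout
# (word, POS tag, chunk tag, NER tag).
SAMPLE = """\
-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""


def read_conll(
    lines: Iterable[str],
    columns: Sequence[str] = ("word", "pos", "chunk", "ner"),
) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of {column name: value} dicts.

    The column names are supplied by the caller, so the same reader can
    serve CoNLL variants that differ only in the number and meaning of columns.
    """
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("-DOCSTART-"):
            # Document boundary marker, not a real token.
            continue
        fields = line.split()
        if len(fields) != len(columns):
            raise ValueError(
                f"expected {len(columns)} columns, got {len(fields)}: {line!r}"
            )
        sentence.append(dict(zip(columns, fields)))
    # Flush the last sentence when the input does not end with a blank line.
    if sentence:
        yield sentence


sentences = list(read_conll(SAMPLE.splitlines()))
```

A specific format such as CoNLL 2003 would then just be a particular choice of `columns`, which is roughly the "generic first, specific formats as an offshoot" split suggested above.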