GitHub - aielte-research/Diacritics_restoration: Diacritics restoration

Lightweight Diacritics Restoration
with
Dilated Convolutional Neural Networks

Try the model »

About The Project

This repository contains code for training, evaluation, and inference of our lightweight model for diacritics restoration, which employs a character-level 1D convolutional architecture. We demonstrate that solutions based on 1D dilated convolutional neural networks are competitive alternatives to models utilizing recurrent neural networks or linguistic modeling for diacritics restoration. Our proposed solution outperforms models of comparable size and is competitive with larger models. An additional advantage of our solution is that it can run locally in a web browser, as demonstrated by a working implementation. We evaluate our model on various corpora, with an emphasis on the Hungarian language. We conducted comparative analyses to assess the generalization capabilities of the model across three Hungarian corpora. Additionally, we analyzed the errors to understand the limitations of corpus-based self-supervised training. More information can be found in our paper presented at LREC 2022.
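The corpus-based self-supervised setup can be illustrated with a minimal sketch: a training pair is obtained by stripping the diacritics from correctly written text, so the stripped string serves as the input and the original as the target. The function names `strip_diacritics` and `make_pair` are illustrative and are not taken from this repository; the actual preprocessing code may differ.

```python
import unicodedata
from typing import Tuple


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_pair(sentence: str) -> Tuple[str, str]:
    """Return (model input, training target) for one sentence.

    The target is normalized to NFC so that, for precomposed letters such as
    the Hungarian accented vowels, input and target align character by character.
    """
    target = unicodedata.normalize("NFC", sentence)
    return strip_diacritics(target), target


source, target = make_pair("árvíztűrő tükörfúrógép")
print(source)  # arvizturo tukorfurogep
print(target)  # árvíztűrő tükörfúrógép
```

Because both strings have the same length in this case, a character-level model can predict one output character per input character.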

Model architecture visualization
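To make the idea of a dilated 1D convolution concrete, the following pure-Python sketch shows a single length-preserving dilated convolution and how the receptive field of a stack of such layers grows. The kernel size and the dilation values used here are assumptions chosen for illustration, not the hyperparameters of the model in this repository.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Dilated 1D convolution with zero padding that keeps the length of x.

    Assumes an odd kernel size so the window is centred on each position.
    """
    half = (len(kernel) // 2) * dilation
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= pos < len(x):  # positions outside x count as zero
                acc += w * x[pos]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input positions that can influence one output position."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical stack: kernel size 3, dilation doubling in each layer.
print(receptive_field(3, [1, 2, 4, 8]))  # 31

signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2))
# [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```

With doubling dilations the receptive field grows exponentially with depth while the parameter count grows only linearly, which is one reason such stacks can stay small.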

Built With

Getting Started

If you want to try out the model, the demo is available here.

For training the model:

Prerequisites

The project logs both locally and to neptune.ai. Logging to neptune.ai requires an account. It can be disabled for individual experiments in the experiment's config, or globally by not providing an API token in neptune_cfg.yaml.

Copy neptune_cfg_template.yaml to neptune_cfg.yaml, and fill out the appropriate details:

project_qualified_name: 
api_token: 
offline_logging_dir:
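As a quick sanity check before training, the three keys above can be inspected with a few lines of Python. This is only a sketch: `read_flat_config` and `neptune_enabled` are hypothetical helpers that handle flat `key: value` lines, and the rule "an empty api_token means neptune logging is off" mirrors the description above rather than the repository's actual code.

```python
def read_flat_config(text: str) -> dict:
    """Parse simple 'key: value' lines; this is not a general YAML parser."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()
    return config


def neptune_enabled(config: dict) -> bool:
    """Assumed rule: a missing or empty api_token disables neptune logging."""
    return bool(config.get("api_token"))


template = """project_qualified_name:
api_token:
offline_logging_dir:
"""

# Placeholder values for illustration only.
filled = """project_qualified_name: my-workspace/my-project
api_token: EXAMPLE_TOKEN
offline_logging_dir: logs
"""

print(neptune_enabled(read_flat_config(template)))  # False
print(neptune_enabled(read_flat_config(filled)))    # True
```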

Installation

Refer to the example below and install the missing packages manually, or use the environment.yml file:

conda env create -f environment.yml

Usage

The following command should work with the small example data provided in this repository.

CUDA_VISIBLE_DEVICES=0,1 python run.py -c conf/example.yaml

How to Cite

If you use this code in your research, please cite the corresponding paper:

@inproceedings{csanady-lukacs-2022-dilated,
    title = "Dilated Convolutional Neural Networks for Lightweight Diacritics Restoration",
    author = "Csan{\'a}dy, B{\'a}lint  and
      Luk{\'a}cs, Andr{\'a}s",
    editor = "Calzolari, Nicoletta  and
      B{\'e}chet, Fr{\'e}d{\'e}ric  and
      Blache, Philippe  and
      Choukri, Khalid  and
      Cieri, Christopher  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Isahara, Hitoshi  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, H{\'e}l{\`e}ne  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.452/",
    pages = "4253--4259",
}

Contributors

Bálint Csanády ([email protected])
András Lukács ([email protected])

