Does this project need to download a model from huggingface? #651

Open
neverlatetolearn0 opened this issue Dec 25, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@neverlatetolearn0

Bug

When I run the code, I encounter the following problem:
requests.exceptions.SSLError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/ds4sd/docling-models/revision/v2.1.0 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))"), '(Request ID: 17d20384-a725-4353-8425-f27cb0dc2ff6)')

Does anyone else have the same problem?
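
The `unable to get local issuer certificate` part of the error usually means Python could not build a trust chain for the certificate presented for huggingface.co. This is common behind a proxy that re-signs TLS traffic. A minimal sketch of one possible workaround, assuming you have the required root certificate as a PEM file: point `requests` at it through the `REQUESTS_CA_BUNDLE` environment variable before docling starts downloading. The helper name `use_ca_bundle` and the example path are hypothetical, and this may not apply to every network setup.

```python
import os
from pathlib import Path


def use_ca_bundle(pem_path: str) -> str:
    """Point requests (and OpenSSL defaults) at a custom CA bundle.

    pem_path is a hypothetical path to a PEM file holding the root
    certificate(s) your network requires.
    """
    bundle = Path(pem_path).expanduser().resolve()
    if not bundle.is_file():
        raise FileNotFoundError(f"CA bundle not found: {bundle}")
    # requests consults REQUESTS_CA_BUNDLE when verifying certificates.
    os.environ["REQUESTS_CA_BUNDLE"] = str(bundle)
    # Default SSL contexts created by Python pick up SSL_CERT_FILE via OpenSSL.
    os.environ["SSL_CERT_FILE"] = str(bundle)
    return str(bundle)


# Example call (hypothetical path, replace with your own bundle):
# use_ca_bundle("~/certs/corporate-root-ca.pem")
```

Call the helper at the top of your script, before creating a `DocumentConverter`. Disabling certificate verification altogether would also silence the error, but it removes the protection TLS is there to provide.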

@neverlatetolearn0 neverlatetolearn0 added the bug Something isn't working label Dec 25, 2024
@Leflak

Leflak commented Dec 25, 2024

I got the same problem. I tried updating everything, but it still does not work.

@tanaha2002

Same for me. Do you know how to fix it?

@Leflak

Leflak commented Dec 26, 2024

I was initially launching docling from a personal project and got that error. Then I ran the example script from the main page, which somehow solved my problem. I hope it helps you too:

from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"

@tanaha2002

There's still an error, but I found an alternative solution. You guys can try running it on Google Colab. In my case, installing it on my computer didn’t work, but it worked on Colab. And yes, it downloads some weights from Hugging Face. Somehow, my network blocks Hugging Face, which caused the problem.
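
Since the weights come from the `ds4sd/docling-models` repository on Hugging Face, one workaround for a blocked network is to fetch them once on a machine that can reach huggingface.co and copy the resulting directory over. A minimal sketch, assuming the `huggingface_hub` package is installed on the connected machine. The revision `v2.1.0` is taken from the URL in the error message above. The target directory and the helper names `hub_cache_folder` and `prefetch` are hypothetical.

```python
from pathlib import Path


def hub_cache_folder(repo_id: str) -> str:
    """Return the folder name the Hugging Face cache uses for a model repo.

    For example 'ds4sd/docling-models' -> 'models--ds4sd--docling-models',
    which matches the cache paths quoted in this thread.
    """
    return "models--" + repo_id.replace("/", "--")


def prefetch(repo_id: str = "ds4sd/docling-models",
             revision: str = "v2.1.0",
             local_dir: str = "./docling-models") -> Path:
    """Download a full snapshot of the repo into local_dir (needs network)."""
    # Imported lazily so hub_cache_folder works without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=repo_id, revision=revision,
                             local_dir=local_dir)
    return Path(path)


# On a machine with access to huggingface.co:
# print(prefetch())
```

How docling should be pointed at the copied directory depends on the installed version, so treat this only as a way to get the files onto the disk.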

@Leflak

Leflak commented Dec 26, 2024

This requires transformers, but you could try running the simple Python script below (taken from the "Use this model" button at https://huggingface.co/ds4sd/docling-models) and then retry the script from my previous answer:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("ds4sd/docling-models")

@trinanjan12

trinanjan12 commented Dec 28, 2024

Yes. There are two models from Hugging Face: a layout model and a table detection model.
# Example code that runs only the layout model
from docling_ibm_models.layoutmodel.layout_predictor import LayoutPredictor
# Path to the layout artifacts inside the local Hugging Face cache
test_layout_predictor = LayoutPredictor("/home/your_name/.cache/huggingface/hub/models--ds4sd--docling-models/snapshots/36bebf56681740529abd09f5473a93a69373fbf0/model_artifacts/layout/")

import pypdfium2 as pdfium
test_pdoc = pdfium.PdfDocument("./tests/data/2305.03393v1-pg9.pdf")

# Layout prediction for one page: render the first page to a PIL image
test_layout_op = test_layout_predictor.predict(test_pdoc[0].render().to_pil())
print(list(test_layout_op))

@jdbranham

I was also getting model errors with 2.14.0.

I saw this in the log:

  File "/Users/###/###/.venv/lib/python3.12/site-packages/docling_ibm_models/layoutmodel/layout_predictor.py", line 98, in __init__
    raise FileNotFoundError("Missing ONNX file: {}".format(self._onnx_fn))
FileNotFoundError: Missing ONNX file: /Users/###/.cache/huggingface/hub/models--ds4sd--docling-models/snapshots/36bebf56681740529abd09f5473a93a69373fbf0/model_artifacts/layout/beehive_v0.0.5/model.pt

I'd cleared the Hugging Face cache, but that didn't seem to fix the issue.

Downgrading to 2.13.0 seemed to work, then I went back to 2.14.0 and it works without error now. 😓
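
When a "Missing ... file" error points into the Hugging Face cache, it can help to see which files the cached snapshot actually contains before and after clearing it. A minimal sketch using only the standard library. The default location `~/.cache/huggingface/hub` is the one shown in the traceback above, and a custom `HF_HOME` or `HF_HUB_CACHE` setting would change it. The helper name `list_snapshot_files` is hypothetical.

```python
from pathlib import Path


def list_snapshot_files(repo_folder: str = "models--ds4sd--docling-models",
                        hub_dir=None):
    """Map each cached snapshot hash to the relative files it contains."""
    if hub_dir:
        hub = Path(hub_dir)
    else:
        hub = Path.home() / ".cache" / "huggingface" / "hub"
    snapshots = hub / repo_folder / "snapshots"
    found = {}
    if not snapshots.is_dir():
        return found  # nothing cached for this repo
    for snap in sorted(snapshots.iterdir()):
        if snap.is_dir():
            # is_file() follows symlinks, so dangling links are left out.
            found[snap.name] = sorted(
                p.relative_to(snap).as_posix()
                for p in snap.rglob("*") if p.is_file()
            )
    return found


# Example: print what is cached for the docling models.
# for snapshot, files in list_snapshot_files().items():
#     print(snapshot, files)
```

An empty result, or a snapshot without the file named in the error, would suggest an incomplete download rather than a code problem.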

https://ds4sd.github.io/docling/usage/#setting-up-a-documentconverter

There's a section in the docs that might be helpful, but these imports don't appear to be valid:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline

# # to explicitly prefetch:
# artifacts_path = StandardPdfPipeline.download_models_hf()

artifacts_path = "/local/path/to/artifacts"

pipeline_options = PdfPipelineOptions(artifacts_path=artifacts_path)
doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
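
To find out which of those imports fail in a given environment, the names can be probed one at a time instead of letting the first failure stop the script. A minimal sketch using only the standard library. The helper names `installed_version` and `check_names` are hypothetical, and the list simply repeats the imports quoted above.

```python
import importlib
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_names(wanted):
    """Try to import each (module, name) pair and report what resolves.

    Returns a dict mapping 'module.name' to True, or to the error text.
    """
    report = {}
    for module_name, attr in wanted:
        key = f"{module_name}.{attr}"
        try:
            module = importlib.import_module(module_name)
            getattr(module, attr)
            report[key] = True
        except (ImportError, AttributeError) as exc:
            report[key] = f"{type(exc).__name__}: {exc}"
    return report


# The imports quoted from the docs above.
DOCLING_NAMES = [
    ("docling.datamodel.base_models", "InputFormat"),
    ("docling.datamodel.pipeline_options", "PdfPipelineOptions"),
    ("docling.document_converter", "DocumentConverter"),
    ("docling.document_converter", "PdfFormatOption"),
    ("docling.pipeline.standard_pdf_pipeline", "StandardPdfPipeline"),
]

# Example:
# print(installed_version("docling"))
# for name, status in check_names(DOCLING_NAMES).items():
#     print(name, status)
```

Posting the version together with the failing names makes it easier for maintainers to tell whether the docs or the installed package is out of date.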
