Better handling of responses without correct content-type charset #1022
@Ousret Excellent! The discussion leaves one thing unclear: will that be merged into requests, or would we need to monkeypatch around it?
@Ousret I was trying the move from …:

```python
>>> import requests
>>> r = requests.get('https://zoek.officielebekendmakingen.nl/kst-34200-14/metadata_owms.xml')
>>> r.headers['Content-Type']
'text/xml'
>>> r.encoding
'ISO-8859-1' # <-- not good
# Check 1
>>> r.content[:38]
b'<?xml version="1.0" encoding="UTF-8"?>'
# Check 2
>>> requests.compat.chardet
<module 'charset_normalizer' from '/.../lib/python3.9/site-packages/charset_normalizer/__init__.py'>
# Check 3
>>> requests.compat.chardet.detect(r.content)
{'encoding': 'utf-8', 'language': 'English', 'confidence': 0.975}
```

So the new module is working fine; I had no doubt about that ;)
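Check 1 shows that the body declares its own encoding in the XML declaration even though the `Content-Type` header does not, so a content-sniffing fallback could pick it up from there. A minimal sketch, assuming the raw body is available as `bytes`; `sniff_xml_encoding` is a hypothetical helper, not part of requests, charset_normalizer or HTTPie:

```python
import codecs
import re
from typing import Optional

# Byte-order marks, checked longest first: the UTF-32-LE BOM
# starts with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# An ASCII-compatible XML declaration at the very start of the body,
# e.g. <?xml version="1.0" encoding="UTF-8"?>
_XML_DECL = re.compile(rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_xml_encoding(body: bytes) -> Optional[str]:
    """Return the encoding signalled by a BOM or XML declaration, else None."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    match = _XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return None


body = b'<?xml version="1.0" encoding="UTF-8"?><title>Kamerstuk</title>'
print(sniff_xml_encoding(body))  # utf-8
```

Checking the BOM before the declaration mirrors the usual XML autodetection order: a BOM is unambiguous, while the declaration can only be read once the bytes are known to be ASCII-compatible.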
Oh I see. Unfortunately it will not be so easy:

```
$ python -m httpie 'https://zoek.officielebekendmakingen.nl/kst-34200-14/metadata_owms.xml'
HTTP/1.1 200 OK
Cache-Control: private
Content-Disposition: inline; filename=metadata_owms.xml
Content-Encoding: gzip
Content-Length: 871
Content-Security-Policy: frame-ancestors 'self'
Content-Type: text/xml
Date: Thu, 15 Jul 2021 15:00:26 GMT
Expect-CT: enforce, max-age=30
Permissions-Policy: geolocation=(), midi=(), notifications=(), push=(), microphone=(), camera=(), magnetometer=(), gyroscope=(), speaker=(), vibrate=(), fullscreen=(), payment=()
Referrer-Policy: strict-origin-when-cross-origin
Server:
Strict-Transport-Security: max-age=31536000; includeSubDomains;
Vary: Accept-Encoding
X-AspNet-Version:
X-AspNetMvc-Version:
X-Content-Security-Policy: frame-ancestors 'self'
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Powered-By:
X-XSS-Protection: 1; mode=block
x-webkit-csp: frame-ancestors 'self'

__main__.py: error: RuntimeError: The content for this response was already consumed
```

Out of curiosity, do you see a workaround? :D
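In requests, this `RuntimeError` is raised when `Response.content` is accessed after the streamed body has already been read (for example through `Response.iter_content()`), so it appears the body was consumed before anything tried to inspect it. One common pattern, sketched here as an assumption about how a fix could look and not as HTTPie's actual code, is to buffer only the first few chunks for sniffing and chain them back in front of the unread remainder, so the body is still streamed exactly once. `peek_chunks` is a hypothetical helper:

```python
import itertools
from typing import Iterable, Iterator, Tuple


def peek_chunks(chunks: Iterable[bytes], size: int = 1024) -> Tuple[bytes, Iterator[bytes]]:
    """Read up to `size` bytes from a chunk stream for sniffing.

    Returns the peeked prefix and an iterator over the *whole* body,
    prefix included, so the caller can still stream it once.
    """
    it = iter(chunks)
    buffered = []
    total = 0
    for chunk in it:
        buffered.append(chunk)
        total += len(chunk)
        if total >= size:
            break
    prefix = b"".join(buffered)
    # Put the buffered chunks back in front of whatever is still unread.
    return prefix[:size], itertools.chain(buffered, it)


stream = iter([b'<?xml version="1.0" ', b'encoding="UTF-8"?>', b"<a/>"])
head, body_chunks = peek_chunks(stream, size=16)
print(head)                   # b'<?xml version="1'
print(b"".join(body_chunks))  # b'<?xml version="1.0" encoding="UTF-8"?><a/>'
```

The prefix could then be handed to a sniffer while the chained iterator goes to the output formatter.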
The root cause seems to be in `get_encoding_from_headers()`: it will return …
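For context: on its header-only path, requests' `requests.utils.get_encoding_from_headers()` returns the `charset` parameter when `Content-Type` carries one and falls back to `ISO-8859-1` for `text/*` types without one, which matches the `r.encoding` seen above for `text/xml`. The sketch below approximates that behaviour with the standard library's `email.message.Message` header parser; `encoding_from_content_type` is a hypothetical name, and the real requests function has additional cases not modelled here:

```python
from email.message import Message
from typing import Optional


def encoding_from_content_type(header: Optional[str]) -> Optional[str]:
    """Approximate the header-only fallback: explicit charset, else latin-1 for text/*."""
    if not header:
        return None
    msg = Message()
    msg["Content-Type"] = header
    charset = msg.get_content_charset()  # lower-cased, unquoted charset param, or None
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        # No charset parameter: the body bytes are never consulted.
        return "ISO-8859-1"
    return None


print(encoding_from_content_type("text/xml"))                  # ISO-8859-1
print(encoding_from_content_type("text/xml; charset=UTF-8"))   # utf-8
print(encoding_from_content_type("application/octet-stream"))  # None
```

The point it illustrates is that nothing on this path looks at the body, so the XML declaration from Check 1 can never win.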
psf/requests#2086 is interesting to follow.
Yes, let me try something :)
Problem
If the server doesn't provide a correct `Content-Type` charset, we default to latin1 (ISO-8859-1) because the underlying requests library does. This hurts the user experience: non-ASCII text in, for example, a UTF-8 body is displayed garbled.
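To make the cost concrete, here is what happens when a UTF-8 body is decoded with the latin1 fallback. The sample string is invented for the illustration:

```python
# UTF-8 bytes for text containing a non-ASCII character, as a server
# might send them without a charset in Content-Type.
body = "Tweede Kamer: coördinatie".encode("utf-8")

as_latin1 = body.decode("iso-8859-1")  # what the header-only fallback yields
as_utf8 = body.decode("utf-8")         # what the bytes actually encode

print(as_latin1)  # Tweede Kamer: coÃ¶rdinatie
print(as_utf8)    # Tweede Kamer: coördinatie
```

Because every byte value is valid latin1, the wrong decode never raises an error, so the corruption is silent.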
Possible solutions

- … (… `application/json`, first line or BOM for XML, meta tag in HTML)

Considerations
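One thing to weigh is the order in which such per-format hints are tried. A hypothetical sketch of one possible ordering, assuming the MIME type (lower-cased, without parameters) and any charset have already been parsed out of `Content-Type`. `guess_body_encoding` and its regex are illustrative only: it scans just the first 1024 bytes for an HTML `<meta>` charset, treats JSON as UTF-8 (RFC 8259 requires UTF-8 for JSON exchanged between systems), and uses UTF-8 rather than latin1 as the last-resort default, which is itself an assumption. An XML check based on the BOM and the declaration could slot in the same way.

```python
import re
from typing import Optional

# Matches <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def guess_body_encoding(mime: str, charset: Optional[str], body: bytes) -> str:
    """Hypothetical fallback chain; returns a codec name Python understands."""
    if charset:                      # 1. an explicit charset always wins
        return charset
    if mime == "application/json":   # 2. JSON over the wire is UTF-8
        return "utf-8"
    if mime == "text/html":          # 3. look for a <meta> declaration
        match = _META_CHARSET.search(body[:1024])
        if match:
            return match.group(1).decode("ascii").lower()
    return "utf-8"                   # 4. assumed default instead of latin1


html = b'<html><head><meta charset="windows-1252"></head></html>'
print(guess_body_encoding("text/html", None, html))          # windows-1252
print(guess_body_encoding("application/json", None, b"{}"))  # utf-8
```

Keeping the explicit header charset first means a server that does send a correct `charset` is never second-guessed.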