Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update configuration to always download HTML/JS/XML Bitstreams (no inline display) #9638

Merged
merged 6 commits into from
Jun 17, 2024

Conversation

tdonohue
Copy link
Member

@tdonohue tdonohue commented Jun 6, 2024

Description

This PR updates our default settings for webui.content_disposition_format to include HTML, JavaScript, XML and RDF by default.

The changes in this PR are the equivalent of these settings in dspace.cfg. These settings will work in any DSpace 7.6 or 7.6.1 installation to provide the same fixes as this PR.

webui.content_disposition_format = text/html
webui.content_disposition_format = text/javascript
webui.content_disposition_format = text/xml
webui.content_disposition_format = rdf

This ensures that bitstreams of these formats are always downloaded and never opened inline by a user's browser. Opening these formats inline may cause the browser to execute any JavaScript contained within them. (Execution of embedded JavaScript may not always occur for all sites, but it is possible.)

New Wildcard Option

This PR also provides sites the ability to force all formats to download only. This means that no files will ever be opened inline in a user's browser. To force all formats to download only, use a wildcard/asterisk:

webui.content_disposition_format = *

Add JavaScript to Format Registry

This PR also adds JavaScript to the list of default formats in the Bitstream Format Registry to ensure it is a recognized format (by default).

Instructions for Reviewers

  • Deploy this PR
  • Upload one or more HTML or JS or XML or RDF bitstreams to a new or existing Item
  • Attempt to download those bitstreams. Ensure do not open in your browser window. Instead, they should download directly to your machine.
  • Test the new wildcard option.
    • Set webui.content_disposition_format = * in your local.cfg or dspace.cfg
    • Ensure all file formats will now only download.
    • For example, your browser should no longer attempt to open PDFs or image files in the browser window. These will all be downloaded.

Related work

For more on webui.content_disposition_format see #8891

@tdonohue tdonohue added bug high priority interface: REST API v7+ REST API for v7 and later (dspace-server-webapp module) 1 APPROVAL pull request only requires a single approval to merge. port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release labels Jun 6, 2024
@tdonohue tdonohue added this to the 8.0 milestone Jun 6, 2024
Copy link
Member

@mwoodiupui mwoodiupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right but seems to be failing integration tests.

@tdonohue
Copy link
Member Author

tdonohue commented Jun 6, 2024

@mwoodiupui : Thanks for the note! It appears we have an IT which is still hardcoding a specific count of BitstreamFormats (sigh). Since this PR added a single new BitstreamFormat those counts all failed. So, it's really a flawed test & not a sign of an issue in this PR.

Nonetheless, I'll get the test fixed (hopefully later today) so that it succeeds. (If I can figure out a way to easily remove the hardcoded numbers, I'll do so...but not sure how much effort that'd be)

Copy link
Member

@abollini abollini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this PR except for the IT fix that I would recommend to fix keeping the number of types hardcoded

@tdonohue
Copy link
Member Author

tdonohue commented Jun 6, 2024

@mwoodiupui : I fixed the failing tests by ensure the BitstreamFormatRestRepositoryIT no longer hardcodes the number of BitstreamFormats. The updated code now loads that count from the BitstreamFormatService before tests run, ensuring it will update dynamically for the future.

Nevermind, I changed it to update the hardcoded value per @abollini 's feedback above. See 8d81c14

@tdonohue tdonohue force-pushed the avoid_inline_js branch 2 times, most recently from 1b16836 to 8d81c14 Compare June 6, 2024 21:50
@tdonohue tdonohue requested review from abollini and mwoodiupui June 6, 2024 21:51
@mwoodiupui
Copy link
Member

Perhaps, separate from this issue, we should have a discussion about why the test should depend on a magic number that has to be changed when unrelated improvements are made.

Copy link
Member

@mwoodiupui mwoodiupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 by inspection

@Xib3rR4dAr
Copy link

Xib3rR4dAr commented Jun 8, 2024

XML files (text/xml) also allow JavaScript execution.

@tdonohue tdonohue changed the title Update configuration to always download HTML/JS Bitstreams (no inline display) Update configuration to always download HTML/JS/XML Bitstreams (no inline display) Jun 10, 2024
@tdonohue
Copy link
Member Author

I've updated this PR today to include the following:

  • Added XML (text/xml) to the list of formats to always download
  • Added automated tests to BitstreamRestControllerIT to verify that HTML, JS & XML are all downloaded by default. (This is to simply protect us and other sites from removing these configs in the future)

@mwoodiupui
Copy link
Member

I hate to drag this out further, but if we list text/xml then we probably also need application/xml. https://stackoverflow.com/q/4832357/2916377

…ly. Add a new wildcard setting to allow sites to force all files to download only.
@tdonohue tdonohue requested a review from mwoodiupui June 10, 2024 20:03
@tdonohue
Copy link
Member Author

@abollini and @mwoodiupui : I've made some changes to this PR today which I hope are improvements, so I'd appreciate a re-review. See the updated PR description for all the details...but here are the highlights:

  • I've hardcoded some file formats into Java code to avoid misconfiguration. While I don't tend to like hardcoded settings, I think it may be best to do so in this situation. For example, I cannot think of a use case where these settings should NOT be in place.
  • I've added a new wildcard (asterisk: *) option to disable inline opening for all files.
  • I've added more automated tests to verify everything is working as it should.

@tdonohue
Copy link
Member Author

tdonohue commented Jun 10, 2024

@mwoodiupui : Per your comment, we don't need to include application/xml in this PR as that's not a format in our DSpace Format Registry: https://github.com/DSpace/DSpace/blob/main/dspace/config/registries/bitstream-formats.xml

Any file format that is NOT in that registry will be downloaded as application/octet-stream. Browsers seem to always download application/octet-stream by default (regardless of what the actual file format is). But, I've also added some automated tests & minor code cleanup in my last few commits to force anything that comes back as application/octet-stream to use content-disposition: attachment (just to formally tell the browser we want it to keep downloading this unrecognized format)

@tdonohue
Copy link
Member Author

Final note for today: Because everything that comes back as application/octet-stream now uses content-disposition: attachment, we could now remove text/javascript from the registry (i.e. revert those changes from this PR). I don't believe it is needed anymore. But, I'll wait to do anything like that until I have further feedback on the directions of this PR.

(We could also wait until this week's Developer's Meeting to discuss all this in greater detail together.)

@tdonohue tdonohue requested a review from kshepherd June 13, 2024 14:14
Copy link
Member

@pnbecker pnbecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a code review. It looks good to me. I have one comment: while I understand why we have a hard-coded restriction on html, javascript and xml, we do not have a way to override this. There are probably repositories that want to be able to show html after properly reviewing it or by allowing very trusted users to submit anything. I think it would be less than one hour to introduce a new configuration property that allows to switch off this hardcoded list.

Seeing the timeline for DSpace 8 and the importance of this patch I'm +1.

Copy link
Member

@abollini abollini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the current PR. Unfortunately, in DSpace 7+ we loose the possibility to serve properly html content that wourl require relative url to work to download css and additional content in the same item. So force the download is the safer and easier solution at this time. We could reconsider it when we have a full functional html support

Copy link
Member

@kshepherd kshepherd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 tested successfully and the code looks good to me.

Yes, it is a 'hard-coded' fix, but sets the safest defaults, and I think any implementer who is willing to take the risk (perhaps they have extra validation of uploaded html/xml/js) of serving this content inline could reasonably be expected to apply a patch that reverts this change for DSpace 8.x

@kshepherd kshepherd merged commit f1059b4 into DSpace:main Jun 17, 2024
22 checks passed
@dspace-bot
Copy link
Contributor

@tdonohue tdonohue removed the port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release label Jun 17, 2024
@tdonohue tdonohue deleted the avoid_inline_js branch June 17, 2024 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 APPROVAL pull request only requires a single approval to merge. bug high priority interface: REST API v7+ REST API for v7 and later (dspace-server-webapp module)
Projects
No open projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

7 participants