Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content type (de)serialization #152

Merged
merged 6 commits into from
Apr 12, 2017

Conversation

errm
Copy link
Contributor

@errm errm commented Aug 20, 2015

Ok so this is my second bash at one of the ideas mooted in #126.

This allows ContentTypes to be configured like so:

Sneakers::ContentType.register(
  content_type: 'application/json',
  deserializer: ->(payload) { JSON.parse(payload) },
  serializer: ->(payload) { JSON.dump(payload) },
)

Then when messages arrive with the given content type metadata set the payload should be deserialized neatly.

Its also possible to add content_type: 'whatever/thing' to the worker options, in order that every message will be deserialized using the given type, irrespective of what the message metadata purports.

The publishing methods now also use the serializers when content_type is present in the options.

If nothing is registered everything just falls back to a noop pair of (de)serializers that just pass through the message unchanged, so this shouldn't break anyones existing applications.

@@ -52,10 +53,11 @@ def do_work(delivery_info, metadata, msg, handler)
metrics.increment("work.#{self.class.name}.started")
Timeout.timeout(@timeout_after, Timeout::Error) do
metrics.timing("work.#{self.class.name}.time") do
deserialized_msg = ContentType[@content_type || metadata[:content_type]].deserialize(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no docs on this method indicating metadata will always be present and it's not private so you might need to handle the metadata == nil case.

Separately I think there are cases where you would want to prefer the worker's content type over the message's and cases for the inverse. Might be worth refactoring that bit of logic into a method that respects a new preference set on the worker indicating which order to prefer.

@justinmills
Copy link
Contributor

This is a nifty idea. We've done something similar but with base worker classes that delegate from #work to a new method on the subclass. This is far more elegant.

@jondot
Copy link
Owner

jondot commented Aug 20, 2015

I agree! it looks elegant and in the spirit of the existing api.
Is there a way for users to opt out of the overhead, perhaps short-circuit it, in case they do some non-standard serde? and in case they already have code that does this which they want to keep until they migrate off to this solution?

@errm
Copy link
Contributor Author

errm commented Aug 20, 2015

Ok made some changes that it short circuits if no content types are registered. This halves the overhead, but at the cost of some elegance.

Before

require 'benchmark'
require 'sneakers/content_type'

msg = "foo"
itr = 10_000_000

def dummy(something)
  something
end

Benchmark.bmbm do |x|
  x.report("content_type lookup:") { itr.times { dummy(Sneakers::ContentType[nil].serialize(msg)) } }
  x.report("no lookup:") { itr.times { dummy(msg) } }
end
Rehearsal --------------------------------------------------------
content_type lookup:   3.550000   0.000000   3.550000 (  3.542536)
no lookup:             0.990000   0.000000   0.990000 (  0.998273)
----------------------------------------------- total: 4.540000sec

                           user     system      total        real
content_type lookup:   3.610000   0.000000   3.610000 (  3.606535)
no lookup:             0.990000   0.000000   0.990000 (  0.989097)

After

require 'benchmark'
require 'sneakers/content_type'

msg = "foo"
itr = 10_000_000

def dummy(something)
  something
end

Benchmark.bmbm do |x|
  x.report("content_type lookup:") { itr.times { dummy(Sneakers::ContentType.serialize(msg, nil)) } }
  x.report("no lookup:") { itr.times { dummy(msg) } }
end
Rehearsal --------------------------------------------------------
content_type lookup:   1.740000   0.000000   1.740000 (  1.741147)
no lookup:             0.990000   0.000000   0.990000 (  0.988361)
----------------------------------------------- total: 2.730000sec

                           user     system      total        real
content_type lookup:   1.760000   0.000000   1.760000 (  1.766063)
no lookup:             0.980000   0.000000   0.980000 (  0.986241)

@errm
Copy link
Contributor Author

errm commented Aug 20, 2015

Its a difference of 0.185μs, do we care about performance that much? We are basically optimising away 2 method dispatches ...

@errm
Copy link
Contributor Author

errm commented Aug 20, 2015

re @justin-yesware
I agree there is one more use case where you might want to usually take the content/type from the metadata, but then have a default on the worker. But on the whole i think it is not going to be a widely used option.

I prefer the simplicity of:

  • set it on the worker (or globally) explicitly and have the behaviour set in stone.
    --or--
  • Have workers that can magically work with messages with a number of serialisation formats, but the producers had better set the content_type to something that you expect if you want to do that.

In my experience I do want to spend some time making sure that my producers are formatting messages in a somewhat sane way, and wrapping some tests around that stuff rather than having to have the worker work to hard to work out what it is getting.

Having said that I don't feel overly strongly about the issue and am happy to add some option to flip the priority if we can think of a good name for it.

@justinmills
Copy link
Contributor

I don't feel strongly about it, just wanted to call it out. What you've done is the hard part so it should be easy to extend it if that use case ever comes up.

@errm
Copy link
Contributor Author

errm commented Aug 21, 2015

@jondot, @justin-yesware how do we feel about this now? Is this good enough or would you like me to explore some more ways of making it simpler to optionally include the feature...

If you guys are happy, shall I squash my commits?

@justinmills
Copy link
Contributor

Overall this looks great. Looking forward to picking this up next time we're writing json/thrift messages. One question about the default values in the #register method. What's the use case for a nil content type. Also, is the expectation to allow a nil serializer/deserializer if you plan to only use the publish/subscribe feature of sneakers? Might be worth some documentation indicating that. These are just nits though 😄.

@errm
Copy link
Contributor Author

errm commented Aug 23, 2015

TBH re .register I would have preferred to have content_type as a required argument, unfortunately it's not a supported thing in ruby 2.0 to do def foo(bar:) that didn't come in until 2.1.

It might be useful to add a guard condition like:

raise ArgumentError, 'missing keyword: content_type' unless content_type

I am not sure if we should do some validation on the serializer/deserializer:

  • Clearly if neither is supplied the registration is not useful, but would it be better to fail hard, or log a warning in this case.
  • In the case where only one is supplied, but then we attempt to use the other, what should we do? Fail or try and fall back to a passthough?
  • Both of the above scenarios would be avoided if both serializer and deserializer were required arguments, perhaps that is better, and not too much of an overhead for someone to setup?

@justinmills
Copy link
Contributor

Those are all very good points. I asked because the method declaration has content_type: nil explicitly in it, not so much because it could be nil. Re-reading the change, it's not clear that that default value for content_type is required (all callers seem to pass in a content_type value), so maybe it can be dropped? I realize that's not making it required, but at least Ruby would prevent you from calling #register without any args.

@gabrieljoelc
Copy link
Collaborator

Maybe it could include a couple shortcuts for some common ones:

ContentType.register_json
ContentType.register_xml

@errm
Copy link
Contributor Author

errm commented Mar 31, 2016

Hey @jondot I noticed this PR has been rotting for a while, how do you feel about it? I am happy to do some more work on it or give up ?

@errm
Copy link
Contributor Author

errm commented Jun 6, 2016

ping . . . .

@jondot
Copy link
Owner

jondot commented Jun 6, 2016

Hi @errm thanks for the wakeup call, I find that Github email notifications suck.. I must solve that some day.
I'll start going over issues and PRs now

@jondot
Copy link
Owner

jondot commented Jun 6, 2016

@errm I'm still very happy with this work!, let's get it merged. I'm working on a new documentation website, instead of the Wiki. Just so I know how to document this, what are the implicit things that happen here? For example, if I don't specify content_type at all - does it use json by default? (looks like it..?)

@errm
Copy link
Contributor Author

errm commented Jun 8, 2016

@jondot Sounds great, give me a ping if you wan't me to help/read anything documentation wise.

As far as what should happen implicitly - nothing. It's covered here: spec/sneakers/content_type_spec.rb:33. Basically internally the hash of ContentTypes is defaulted to a passthrough content type that has ->(payload) { payload } for the serializer and deserializer.

If you want to make use of this, it is required to explicitly register a content type, thus:

Sneakers::ContentType.register(
  content_type: 'application/json',
  deserializer: ->(payload) { JSON.parse(payload) },
  serializer: ->(payload) { JSON.dump(payload) },
)

Are you still supporting ruby 2.0.x if not we can loose a lot of the cruft from lib/sneakers/content_type.rb:4-14 I think 2.0.x is EOL now so it should be fine.

@errm errm force-pushed the content_type_serialization branch from 72f94d7 to e2a4cb6 Compare January 26, 2017 17:35
errm added 6 commits January 26, 2017 17:52
I was not really happy that trying to short circuit
and save a few method calls when the feature is
not in use only to impact on the feature itself.

This makes the whole thing a little more efficient
by switching to attr_reader and imposes a constant
cost whichever execution path is followed.
Short circuiting with a guard condition like this
does not seem to impose any significant cost to
the happy path.
@errm errm force-pushed the content_type_serialization branch from e2a4cb6 to a3b73be Compare January 26, 2017 17:55
@errm
Copy link
Contributor Author

errm commented Jan 26, 2017

ping ... rebased this again

@michaelklishin
Copy link
Collaborator

@jondot @gabrieljoelc do we have a consensus that this feature is worth including (worth the overhead, primarily)? I'm somewhat on the fence.

@michaelklishin
Copy link
Collaborator

@errm thank you for keeping this up-to-date.

@gabrieljoelc
Copy link
Collaborator

@michaelklishin do you have some specifics about your concerns?

@michaelklishin
Copy link
Collaborator

@gabrieljoelc I haven't noticed that the difference is for 10M iterations. So I guess this is good to go if we are willing to have this feature.

@gabrieljoelc @jondot should I merge?

@gabrieljoelc
Copy link
Collaborator

@michaelklishin are you thinking that this should be in a separate extension library?

@michaelklishin
Copy link
Collaborator

@gabrieljoelc this can be part of Sneakers proper.

@michaelklishin
Copy link
Collaborator

@gabrieljoelc @jondot so, should we merge this?

@gabrieljoelc
Copy link
Collaborator

gabrieljoelc commented Mar 7, 2017 via email

@michaelklishin michaelklishin merged commit 56cc08e into jondot:master Apr 12, 2017
@michaelklishin
Copy link
Collaborator

This is finally in.

@errm errm deleted the content_type_serialization branch June 14, 2017 09:41
@gabrieljoelc gabrieljoelc mentioned this pull request Aug 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants