Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 28 additions & 3 deletions spec.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
- [Fetching a Service Binding](#fetching-a-service-binding)
- [Unbinding](#unbinding)
- [Deprovisioning](#deprovisioning)
- [Corrupt Service Instances](#corrupt-service-instances)
- [Orphans](#orphans)

## API Overview
Expand Down Expand Up @@ -768,7 +769,10 @@ For success responses, the following fields are defined:
| Response Field | Type | Description |
| --- | --- | --- |
| state* | string | Valid values are `in progress`, `succeeded`, and `failed`. While `"state": "in progress"`, the Platform SHOULD continue polling. A response with `"state": "succeeded"` or `"state": "failed"` MUST cause the Platform to cease polling. |

| description | string | A user-facing message that can be used to tell the user details about the status of the operation. |
| status_code | number | The HTTP status code that would have been returned if the operation would have been executed synchronously. If the state is `failed` this field SHOULD be present and the value MUST be an integer in the range of 400 to 599. This field MUST NOT be present for any other state. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the line that I don't like: The HTTP status code that would have been returned if the operation would have been executed synchronously.. That suggests that only the HTTP codes defined for a synchronous response are permitted.

What about An HTTP code that describes why the operation failed.?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, how about we explicitly call out the status codes that can be returned? This would help platforms build a better UX around this:
200 OK / 400 Bad Request / 409 Conflict seem appropriate from the Provision section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That suggests that only the HTTP codes defined for a synchronous response are permitted.

The sync case's status codes can be anything, not just what's listed in the table - I don't think we need to list all possible values since we don't in the sync case. As for your proposed alternative text in the first comment, I'm ok with it but that's because I don't see the difference :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more... I do think it's important that the spec correlate the sync and async flows - meaning the status_code isn't just some random HTTP code - if the operation were invoked twice (once sync and once async) then typically people would see the same status_code in both cases, just in different locations/messages. W/o that I fear some people will not realize what we're trying to do and come up with different codes between the two flows.

| error | string | An error code as described in the [Service Broker Errors](#service-broker-errors) section. If present, MUST be a non-empty string. If the state is `failed` and there is an error code this field SHOULD be present. This field MUST NOT be present for any other state. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which error code are we actually expecting to get back here?
AsyncRequired = not applicable
ConcurrencyError = should be returned to the original request with a 422 as outlined in the Blocking Operations section (surely the broker can decide at that time whether or not the instance is already being operated on?)
RequiresApp = not applicable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have listed the codes the spec defines and none of them might be applicable here. But a broker can make up its own error code if the spec doesn’t define one.
But the main reason for this field is for parity with an error messages of a synchronous call.


\* Fields with an asterisk are REQUIRED.

Expand Down Expand Up @@ -1155,9 +1159,13 @@ $ curl http://username:password@service-broker-url/v2/service_instances/:instanc
| 400 Bad Request | MUST be returned if the request is malformed or missing mandatory data. |
| 422 Unprocessable entity | MUST be returned if the requested change is not supported or if the request cannot currently be fulfilled due to the state of the Service Instance (e.g. Service Instance utilization is over the quota of the requested plan). Additionally, a `422 Unprocessable Entity` MUST be returned if the Service Broker only supports asynchronous update for the requested plan and the request did not include `?accepts_incomplete=true`; in this case the response body MUST contain a error code `"AsyncRequired"` (see [Service Broker Errors](#service-broker-errors)). The error response MAY include a helpful error message in the `description` field such as `"This Service Plan requires client support for asynchronous service operations."`. |

Responses with any other status code MUST be interpreted as a failure.
When the response includes a 4xx status code, the Service Broker MUST NOT
apply any of the requested changes to the Service Instance.
apply any of the requested changes to the Service Instance and the
Service Instance MUST be in an unmodified and usable state.

Responses with any other status code MUST be interpreted as a failure.
The Service Instance MUST be considered corrupt and the Platform SHOULD NOT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a section on corrupt instances that explains:

  • what the meaning of 'corrupt' is
  • when you should set instance_corrupt as a broker author
  • whether corrupt instances can be updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "Responses with any other status code..." is kind of worrisome. If it said "Any other 5xx error..." then I agree. But what about some 3xx status code? If people agree I can open a new PR to address this since this isn't really part of this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmui did you want to address these comments?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmui ^^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, does "any other" mean non-4xx or does it mean "not mentioned above in the table or previous paragraph" ? I think we need to clarify this.

Copy link
Contributor

@Samze Samze Nov 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that if a Service Broker has an intermittent failure, and returns 500s for a period of time, then instances that receive update calls during this time will be marked as corrupt, leaving them unusable.

I would prefer the broker be explicit about when an instance is corrupt rather than inferring it. It seems kinda risky.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree with @Samze. I'm also starting to question how prevalent this problem is in the wild. Do service brokers ever leave instances as corrupt today? Do they allow this to happen or have they built preventative measures?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this call into question the entire orphan mitigation strategy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Samze I agree with you, but that would be an incompatible change. The PR just clarifies the current spec, it doesn't change it. The current spec says, a status code 500 must be interpreted as a failure. Following the spec, orphan mitigation must kick in. Depending on the platform, there is shorter or longer period of time between the failure and orphan mitigation. Within this timeframe, the platform must already today consider the service instance as unusable. Otherwise, why should it trigger the orphan mitigation process later? So, the PR just spells out that platform should treat this instance as broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattmcneeney There are brokers that update in multiple steps. If one step fails, they can’t move forwards or backwards and then they wait for orphan mitigation. Sure, they could immediately clean up, but from a platform perspective this is not any different.

allow the creation of new bindings.

#### Body

Expand Down Expand Up @@ -1599,8 +1607,13 @@ $ curl 'http://username:password@service-broker-url/v2/service_instances/:instan
| 410 Gone | MUST be returned if the Service Instance does not exist. |
| 422 Unprocessable Entity | MUST be returned if the Service Broker only supports asynchronous deprovisioning for the requested plan and the request did not include `?accepts_incomplete=true`. The response body MUST contain error code `"AsyncRequired"` (see [Service Broker Errors](#service-broker-errors)). The error response MAY include a helpful error message in the `description` field such as `"This Service Plan requires client support for asynchronous service operations."`. |

When the response includes a 4xx status code other than 410 Gone, the
Service Instance MUST be in an unmodified and usable state.

Responses with any other status code MUST be interpreted as a failure and the
Platform MUST remember the Service Instance.
Platform MUST remember the Service Instance. The Service Instance MUST be
considered corrupt and the Platform SHOULD NOT allow the creation of
new bindings.

#### Body

Expand All @@ -1616,6 +1629,18 @@ For success responses, the following fields are defined:
}
```

## Corrupt Service Instances

When an update or delete operation fails, the Service Instance MAY be corrupt.
A corrupt instance MAY be misconfigured, in an invalid state, not reachable, or
not working at all.
Platforms SHOULD NOT try to create bindings for this instance anymore.
Whether or not a corrupt instance can be repaired by, for example, updating it
again, is undefined.
Deprovisioning a corrupt instance SHOULD still be possible. A Platform MUST
remember the Service Instance until it is successfully deprovisioned or it has
been cleaned up as an orphan.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really feels like we need some text here about the Platform performing orphan mitigation. What if we combine #598 into this PR so we get the text just right.


## Orphans

The Platform is the source of truth for Service Instances and Service Bindings.
Expand Down