Open
Labels: bug (Something isn't working)
Description
What is the bug?
Model deployment fails with a version conflict exception. We suspect that concurrent updates across multiple nodes step on each other, and that the update is not retried when the exception occurs.
Exception
[2025-10-21T18:09:30,853][ERROR][o.o.m.a.d.TransportDeployModelAction] [opensearch-cluster-nodes-1] Failed to deploy model ROP0B5oBBqOXgeKlR2fw
org.opensearch.OpenSearchStatusException: Document version conflict updating ROP0B5oBBqOXgeKlR2fw in index .plugins-ml-model
at org.opensearch.remote.metadata.client.impl.LocalClusterIndicesClient.lambda$updateDataObjectAsync$2(LocalClusterIndicesClient.java:245) [opensearch-remote-metadata-sdk-3.3.0.0.jar:?]
at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:90) [opensearch-core-3.3.0.jar:3.3.0]
at org.opensearch.action.support.TransportAction$1.onFailure(TransportAction.java:124) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.action.update.TransportUpdateAction.lambda$innerExecute$0(TransportUpdateAction.java:207) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:90) [opensearch-core-3.3.0.jar:3.3.0]
at org.opensearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$1.handleException(TransportInstanceSingleOperationAction.java:245) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:428) [opensearch-security-3.3.0.0.jar:3.3.0.0]
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1607) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.NativeMessageHandler.lambda$handleException$0(NativeMessageHandler.java:495) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.NativeMessageHandler.handleException(NativeMessageHandler.java:493) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.NativeMessageHandler.handlerResponseError(NativeMessageHandler.java:485) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.NativeMessageHandler.handleMessage(NativeMessageHandler.java:195) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.NativeMessageHandler.messageReceived(NativeMessageHandler.java:149) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.InboundHandler.messageReceivedFromPipeline(InboundHandler.java:152) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:144) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:804) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.InboundBytesHandler.forwardFragments(InboundBytesHandler.java:137) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.InboundBytesHandler.doHandleBytes(InboundBytesHandler.java:77) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:124) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:113) [opensearch-3.3.0.jar:3.3.0]
at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) [transport-netty4-client-3.3.0.jar:3.3.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1519) [netty-handler-4.1.125.Final.jar:4.1.125.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1377) [netty-handler-4.1.125.Final.jar:4.1.125.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1428) [netty-handler-4.1.125.Final.jar:4.1.125.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530) [netty-codec-4.1.125.Final.jar:4.1.125.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469) [netty-codec-4.1.125.Final.jar:4.1.125.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) [netty-codec-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:796) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:697) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:660) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.125.Final.jar:4.1.125.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998) [netty-common-4.1.125.Final.jar:4.1.125.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.125.Final.jar:4.1.125.Final]
at java.base/java.lang.Thread.run(Thread.java:1447) [?:?]
Caused by: org.opensearch.index.engine.VersionConflictEngineException: [ROP0B5oBBqOXgeKlR2fw]: version conflict, required seqNo [19], primary term [1]. current document has seqNo [20] and primary term [1]
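To illustrate the suspected race, here is a minimal, self-contained sketch of optimistic concurrency with a bounded retry. It is not the ml-commons or remote-metadata-sdk code: `DocStore`, `Doc`, `updateIfSeqNo`, `incrementWithRetry`, and the `deployedNodes` field are hypothetical names, and an in-memory object stands in for the model document in .plugins-ml-model. A write that carries a stale seqNo is rejected, as in the "required seqNo [19] ... current document has seqNo [20]" message above, and the writer re-reads and tries again instead of failing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class VersionConflictRetrySketch {

    /** Immutable snapshot of a document: a sequence number plus one counter field. */
    static final class Doc {
        final long seqNo;
        final int deployedNodes; // hypothetical field, for illustration only

        Doc(long seqNo, int deployedNodes) {
            this.seqNo = seqNo;
            this.deployedNodes = deployedNodes;
        }
    }

    /** Hypothetical stand-in for the engine's version conflict error. */
    static final class VersionConflictException extends RuntimeException {
        VersionConflictException(long required, long current) {
            super("version conflict, required seqNo [" + required
                    + "], current document has seqNo [" + current + "]");
        }
    }

    /** In-memory stand-in for a single document in an index. */
    static final class DocStore {
        private final AtomicReference<Doc> current = new AtomicReference<>(new Doc(0, 0));

        Doc get() {
            return current.get();
        }

        /** Writes only if the stored seqNo still matches the one the caller read. */
        void updateIfSeqNo(long expectedSeqNo, int newDeployedNodes) {
            Doc seen = current.get();
            boolean stale = seen.seqNo != expectedSeqNo;
            if (stale || !current.compareAndSet(seen, new Doc(seen.seqNo + 1, newDeployedNodes))) {
                throw new VersionConflictException(expectedSeqNo, current.get().seqNo);
            }
        }
    }

    /** Read-modify-write, retried up to maxRetries times after a conflict. */
    static boolean incrementWithRetry(DocStore store, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Doc read = store.get();
            try {
                store.updateIfSeqNo(read.seqNo, read.deployedNodes + 1);
                return true;
            } catch (VersionConflictException e) {
                // Another writer won the race; loop to re-read and try again.
            }
        }
        return false;
    }

    /** Starts `writers` concurrent updates on one document and returns its final state. */
    static Doc runConcurrently(int writers, int maxRetries) throws InterruptedException {
        DocStore store = new DocStore();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < writers; i++) {
            pool.submit(() -> {
                try {
                    start.await(); // release all writers at once to provoke conflicts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                incrementWithRetry(store, maxRetries);
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return store.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Doc result = runConcurrently(3, 10); // 3 writers, mirroring the 3 ml nodes
        System.out.println("seqNo=" + result.seqNo + " deployedNodes=" + result.deployedNodes);
    }
}
```

In this sketch, calling `incrementWithRetry(store, 0)` models the no-retry behavior suspected in this report: the first conflict is final. With a retry budget at least as large as the number of competing writers, every writer eventually succeeds, because each conflict corresponds to another writer's successful update.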
How can one reproduce the bug?
- Assign the ml role to multiple nodes. In our case, there are 3 nodes, each with the same roles.
- Attempt to deploy a model from a mounted disk.
What is the expected behavior?
Model deployment succeeds.
What is your host/environment?
Ubuntu 24.04 in minikube
Do you have any screenshots?
N/A
Do you have any additional context?
N/A