Conversation

Z1Wu
Contributor

@Z1Wu Z1Wu commented Sep 1, 2025

Why are the changes needed?

  • Some SQL queries need to access HBase, such as queries reading from Hive HBase Storage Handler tables. A proxy-user based SparkSQLEngine cannot renew the delegation token by itself, so it would be convenient if Kyuubi could renew the HBase delegation token.
  • Currently the Kyuubi server only supports renewing HDFS and HMS delegation tokens.

How was this patch tested?

unit test

Added a unit test that fetches a delegation token from an HBase cluster running in a Docker container.

manual integration test with HBase deps

compile with -Dhbase.deps.scope=compile

./build/dist --tgz --spark-provided --flink-provided --hive-provided --name spark3.5 -Pspark3.5 -Dhbase.deps.scope=compile

run Kyuubi server with HBase-related conf

hbase.zookeeper.quorum xxxx
hbase.security.authentication kerberos
hbase.master.kerberos.principal xxx
hbase.regionserver.kerberos.principal xxx

request HBase token


run sample SQL


manual integration test without HBase deps

compile without -Dhbase.deps.scope=compile

./build/dist --tgz --spark-provided --flink-provided --hive-provided --name spark3.5 -Pspark3.5

only fetches the HMS and HDFS tokens


run sample query


Was this patch authored or co-authored using generative AI tooling?

No

…xy spark applications which need to access secure hbase cluster
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>2.5.12-hadoop3</version>
Member

I remember HBase has a shaded-client; is it sufficient for requesting a DT? I am afraid pulling Hive/HBase deps into the server classpath is the beginning of a nightmare ...

Contributor Author

@Z1Wu Z1Wu Sep 1, 2025

It seems OK to use the shaded HBase client.

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-shaded-client</artifactId>
    <version>2.5.12-hadoop3</version>
</dependency>

Member

Define the version in the root pom.xml's dependencyManagement.

Member

@pan3793 pan3793 Sep 5, 2025

BTW, why is version 2.5.12 chosen?

Contributor Author

We use 2.5.12 and 1.4.9 in our production environment, and the 2.5.12 client seems able to fetch tokens from servers of both versions.

doAsProxyUser(owner) {
try {
info(s"Getting hbase token for ${owner} ...")
val conn = ConnectionFactory.createConnection(hbaseConf)
Contributor

If we use reflection to obtain tokens, similar to Spark, we can avoid introducing the dependencies directly.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/HBaseDelegationTokenProvider.scala#L45-L54
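For illustration, a hypothetical sketch of the reflection-based approach the linked Spark code takes (not the patch's actual code): `TokenUtil` is loaded at runtime, so HBase classes are only required when the feature is enabled. `TokenUtil.obtainToken(Configuration)` is the deprecated 1.x-compatible entry point.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.token.{Token, TokenIdentifier}

// Hypothetical sketch: obtain an HBase delegation token without a compile-time
// dependency on hbase-client, by reflecting on TokenUtil at runtime.
def obtainHBaseTokenViaReflection(hbaseConf: Configuration): Token[_ <: TokenIdentifier] = {
  val tokenUtil = Class.forName("org.apache.hadoop.hbase.security.token.TokenUtil")
  val obtainToken = tokenUtil.getMethod("obtainToken", classOf[Configuration])
  // Static method, so the receiver is null
  obtainToken.invoke(null, hbaseConf).asInstanceOf[Token[_ <: TokenIdentifier]]
}
```

The trade-off discussed below is readability and testability versus keeping HBase jars off the server classpath.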

Contributor Author

Introducing HBase's dependencies directly is based on the following considerations:

  • Using reflection may reduce code readability and make unit testing harder to implement.
  • It also increases the deployment complexity of the Kyuubi server, as users would need to place the related HBase jars on the Kyuubi server's classpath.
  • The shaded HBase client package seems to have clean dependencies.

Member

@Z1Wu I think we'd better align with Spark: we can add a Maven property hbase.scope with default value provided, so the official release does not ship HBase jars, and users who need HBase can build with -Dhbase.scope=compile.

If this approach is adopted, we must ensure it won't throw linkage errors when the HBase jars are absent from the Kyuubi server's classpath.
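One way to satisfy this requirement is a classpath probe, sketched below under the assumption (not confirmed by the thread) that the provider can be skipped when the probe fails. `Class.forName` on a representative HBase class avoids touching any HBase types directly.

```scala
// Hypothetical sketch of a linkage-safety guard: probe for the HBase client
// before registering the provider, so a provided-scope build degrades
// gracefully instead of throwing NoClassDefFoundError.
def hbaseClientOnClasspath: Boolean =
  try {
    Class.forName("org.apache.hadoop.hbase.client.ConnectionFactory")
    true
  } catch {
    case _: ClassNotFoundException => false
  }
```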

Contributor Author

Sounds reasonable. I used the property name hbase.deps.scope to align with Spark.
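A hedged sketch of what the root pom.xml changes might look like (the exact property and artifact names here are assumptions based on the discussion, not the merged diff):

```xml
<!-- Hypothetical sketch: version property plus a scope property defaulting
     to provided, overridable with -Dhbase.deps.scope=compile -->
<properties>
  <hbase.version>2.5.12-hadoop3</hbase.version>
  <hbase.deps.scope>provided</hbase.deps.scope>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-shaded-client</artifactId>
      <version>${hbase.version}</version>
      <scope>${hbase.deps.scope}</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```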

Member

Have you tested running the Kyuubi server without HBase deps? Will it cause a linkage error when initializing HBaseDelegationTokenProvider?

@Z1Wu
Contributor Author

Z1Wu commented Sep 2, 2025

The failed unit tests in CI look confusing. I ran them locally without failure. How can I debug the problem in CI?


@codecov-commenter

codecov-commenter commented Sep 3, 2025

Codecov Report

❌ Patch coverage is 0% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (ea75fa8) to head (3c61b4f).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
...ubi/credentials/HBaseDelegationTokenProvider.scala 0.00% 21 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #7182   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         695     696    +1     
  Lines       43433   43500   +67     
  Branches     5887    5884    -3     
======================================
- Misses      43433   43500   +67     


@Z1Wu
Contributor Author

Z1Wu commented Sep 4, 2025

All failed unit tests have been fixed. Please review this patch at your convenience. Thank you.
@cxzl25 @pan3793 @zhouyifan279

final val HBASE_KERBEROS_REALM = "TEST.ORG"
final val HBASE_KERBEROS_PRINCIPAL = "hbase/localhost"
final val HBASE_KERBEROS_KEYTAB = "/opt/hbase/conf/hbase.keytab"
final val DOCKER_IMAGE_NAME = "z1wu97/kyuubi-hbase-cluster:latest"
Contributor

Does this Docker image have a GitHub repository?

Member

This must either be built in the CI workflow or managed by the Kyuubi PMC. I can help create an image after you freeze the Dockerfile.

@pan3793 pan3793 changed the title [KYUUBI #7156] Support update hbase delegation token for Spark SQL Engine [KYUUBI #7156] Support update HBase delegation token for Spark SQL Engine Sep 5, 2025
override def initialize(hadoopConf: Configuration, kyuubiConf: KyuubiConf): Unit = {
this.kyuubiConf = kyuubiConf
this.hbaseConf = hadoopConf
this.tokenRequired = hbaseConf.get("hbase.security.authentication") == "kerberos"
Member

Is the config value case-sensitive? IIRC, hadoop.security.authentication is case-insensitive.

Contributor Author

Yes, you are right. The config hbase.security.authentication is case-insensitive.
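A minimal sketch of the corresponding fix, assuming the provider keeps the shown shape; `String.equalsIgnoreCase` avoids rejecting values like "KERBEROS" that the `== "kerberos"` comparison above would miss:

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical helper: case-insensitive check of the HBase security mode,
// defaulting to "simple" when the key is unset, as Hadoop-style configs do.
def tokenRequired(hbaseConf: Configuration): Boolean =
  "kerberos".equalsIgnoreCase(hbaseConf.get("hbase.security.authentication", "simple"))
```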

@Z1Wu
Contributor Author

Z1Wu commented Sep 10, 2025

The failed unit tests seem unrelated to this patch. Please review again at your convenience. Thank you.
@cxzl25 @pan3793 @zhouyifan279

# --build-arg HBASE_VERSION="2.5.12" \
# --build-arg ZK_VERSION="3.8.4" \
# --file build/Dockerfile.HBase \
# --tag nekyuubi/kyuubi-hbase-cluster:<tag> \
Member

I will upload the image after you fix the style

Contributor Author

@pan3793 All style-related comments have been resolved, PTAL. The failed unit tests seem unrelated to this modification.

Member

nekyuubi/kyuubi-hbase-cluster:latest is available on dockerhub

Member

@pan3793 pan3793 left a comment

LGTM, @cxzl25 @zhouyifan279 do you want to take another look?

@zhouyifan279 zhouyifan279 self-requested a review September 14, 2025 03:59

override def initialize(hadoopConf: Configuration, kyuubiConf: KyuubiConf): Unit = {
this.kyuubiConf = kyuubiConf
this.hbaseConf = hadoopConf
Contributor

hadoopConf does not contain HBase properties. HBase properties are in hbase-site.xml and need to be loaded using the HBaseConfiguration class.

Contributor Author

@Z1Wu Z1Wu Sep 15, 2025

Currently I enable this feature by adding HBase-related configurations to kyuubi-defaults.conf, and the Hadoop conf is initialized as below:

// org.apache.kyuubi.credentials.HadoopCredentialsManager#initialize
override def initialize(conf: KyuubiConf): Unit = {
    hadoopConf = KyuubiHadoopUtils.newHadoopConf(conf)
    providers = HadoopCredentialsManager.loadProviders(conf)
     .filter { case (_, provider) =>
        provider.initialize(hadoopConf, conf)
        val required = provider.delegationTokensRequired()
        if (!required) {
          warn(s"Service ${provider.serviceName} does not require a token." +
            s" Check your configuration to see if security is disabled or not." +
            s" If security is enabled, some configurations of ${provider.serviceName} " +
            s" might be missing, please check the configurations in " +
            s" https://kyuubi.readthedocs.io/en/master/security" +
            s"/hadoop_credentials_manager.html#required-security-configs")
          provider.close()
        }
        required
      }
} 
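For comparison, the reviewer's alternative can be sketched as below; it is an assumption about how the loading could work, not the patch's code. `HBaseConfiguration.create(Configuration)` overlays hbase-default.xml and hbase-site.xml found on the classpath on top of an existing Hadoop conf:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration

// Hypothetical sketch: merge classpath hbase-site.xml properties with the
// Hadoop conf Kyuubi already built, rather than relying solely on
// kyuubi-defaults.conf entries.
def buildHBaseConf(hadoopConf: Configuration): Configuration =
  HBaseConfiguration.create(hadoopConf)
```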

doAsProxyUser(owner) {
try {
info(s"Getting HBase delegation token for ${owner} ...")
val conn = ConnectionFactory.createConnection(hbaseConf)
Contributor

The Connection should be closed after obtaining the token.

Contributor

Nice catch!

I re-read Spark's implementation of calling HBase, and it does close the connection.

org.apache.hadoop.hbase.security.token.TokenUtil#obtainToken(org.apache.hadoop.conf.Configuration)

    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      return obtainToken(connection);
    }

Contributor Author

Nice catch, fixed by referring to Spark's implementation.
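A hedged sketch of what the fix might look like in the provider, assuming `hbaseConf` and a Hadoop `Credentials` object `creds` are in scope as in the snippets above:

```scala
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.security.token.TokenUtil

// Close the Connection in a finally block, mirroring HBase's own
// TokenUtil#obtainToken(Configuration) try-with-resources pattern.
val conn = ConnectionFactory.createConnection(hbaseConf)
try {
  val token = TokenUtil.obtainToken(conn)
  creds.addToken(token.getService, token)
} finally {
  conn.close()
}
```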
