Skip to content

Conversation

hgromer
Copy link
Collaborator

@hgromer hgromer commented Sep 11, 2025

No description provided.

hgromer and others added 7 commits August 7, 2025 15:20
…kup failures (will be in 2.6.4)

Signed-off-by: Ray Mattingly <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
Co-authored-by: Hernan Gelaf-Romer <[email protected]>
…adlock (apache#7084) (apache#7122) (#195)

Signed-off-by: Duo Zhang <[email protected]>
Signed-off-by: Ray Mattingly <[email protected]>
Signed-off-by: Wellington Ramos Chevreuil <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
Co-authored-by: Hernan Gelaf-Romer <[email protected]>
…ottlingExceptions … (#196)

* HBASE-29469 Add metrics with more detail for RpcThrottlingExceptions (apache#7214)

Co-authored-by: skhillon <[email protected]>
Signed-off by: cconnell <[email protected]>
Reviewed by: kgeisz <[email protected]>

* Removing unnecessary sanitization

* Remove unnecessary tests

---------

Co-authored-by: skhillon <[email protected]>

@Override
public void setup(Context context) throws IOException {
// do nothing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we should drop the do nothing comment when/if this goes upstream

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ope yeah, did so upstream but forgot to do it here

Comment on lines +90 to +92
diskBasedSortingEnabled =
HFileOutputFormat2.diskBasedSortingEnabled(context.getConfiguration());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe paranoid, but it could be worth a debug log indicating the gate status here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll add it in the underlying method so we log everytime we poll the config.

KeyValue kv = new KeyValue(key, 0, key.length, cell.getFamilyArray(),
cell.getFamilyOffset(), cell.getFamilyLength(), cell.getQualifierArray(),
cell.getQualifierOffset(), cell.getQualifierLength(), cell.getTimestamp(),
KeyValue.Type.codeToType(cell.getType().getCode()), null, 0, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth propagating the cell value rather than null here?

Copy link
Collaborator Author

@hgromer hgromer Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we actually specifically do not want to propagate the cell value to save on memory here. We want the MR to sort based on cell coordinates (key, cf, cq, ts, type), and we explicitly drop the value to incur unnecessary memory costs. Not that it matters much here if we do or don't, check out what happens when we call write on the key

@Override
    public void write(DataOutput out) throws IOException {
      int keyLen = PrivateCellUtil.estimatedSerializedSizeOfKey(kv);
      int valueLen = 0; // We avoid writing value here. So just serialize as if an empty value. <--- value dropped 
      out.writeInt(keyLen + valueLen + KeyValue.KEYVALUE_INFRASTRUCTURE_SIZE);
      out.writeInt(keyLen);
      out.writeInt(valueLen);
      PrivateCellUtil.writeFlatKey(kv, out);
    }

Additionally, a vlength of 0 (last arg in constructor) means the KeyValue will treat the value as non-existent.

public void readFields(DataInput in) throws IOException {
cell = KeyValue.create(in);
long seqId = in.readLong();
cell.setSequenceId(seqId);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid creating a new class, and wanted to use the CellWritableComparable but it write/read the sequenceId which caused issues with sorting. So this entire class exists to write the cell coordinates + the sequenceId

@hgromer hgromer merged commit a045e15 into hubspot-2.6 Sep 15, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants