-
Notifications
You must be signed in to change notification settings - Fork 6
HBASE-28440: Add support for using mapreduce sort in HFileOutputFormat2 (not yet merged upstream) #197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…kup failures (will be in 2.6.4) Signed-off-by: Ray Mattingly <[email protected]> Co-authored-by: Ray Mattingly <[email protected]> Co-authored-by: Hernan Gelaf-Romer <[email protected]>
… only one replica found
…adlock (apache#7084) (apache#7122) (#195) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Ray Mattingly <[email protected]> Signed-off-by: Wellington Ramos Chevreuil <[email protected]> Co-authored-by: Ray Mattingly <[email protected]> Co-authored-by: Hernan Gelaf-Romer <[email protected]>
…ottlingExceptions … (#196) * HBASE-29469 Add metrics with more detail for RpcThrottlingExceptions (apache#7214) Co-authored-by: skhillon <[email protected]> Signed-off by: cconnell <[email protected]> Reviewed by: kgeisz <[email protected]> * Removing unnecessary sanitization * Remove unnecessary tests --------- Co-authored-by: skhillon <[email protected]>
d3d35b3
to
f7d0b37
Compare
|
||
@Override | ||
public void setup(Context context) throws IOException { | ||
// do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we should drop the do nothing
comment when/if this goes upstream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ope yeah, did so upstream but forgot to do it here
diskBasedSortingEnabled = | ||
HFileOutputFormat2.diskBasedSortingEnabled(context.getConfiguration()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe paranoid, but it could be worth a debug log indicating the gate status here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll add it in the underlying method so we log everytime we poll the config.
KeyValue kv = new KeyValue(key, 0, key.length, cell.getFamilyArray(), | ||
cell.getFamilyOffset(), cell.getFamilyLength(), cell.getQualifierArray(), | ||
cell.getQualifierOffset(), cell.getQualifierLength(), cell.getTimestamp(), | ||
KeyValue.Type.codeToType(cell.getType().getCode()), null, 0, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth propagating the cell value rather than null here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we actually specifically do not want to propagate the cell value to save on memory here. We want the MR to sort based on cell coordinates (key, cf, cq, ts, type), and we explicitly drop the value to incur unnecessary memory costs. Not that it matters much here if we do or don't, check out what happens when we call write on the key
@Override
public void write(DataOutput out) throws IOException {
int keyLen = PrivateCellUtil.estimatedSerializedSizeOfKey(kv);
int valueLen = 0; // We avoid writing value here. So just serialize as if an empty value. <--- value dropped
out.writeInt(keyLen + valueLen + KeyValue.KEYVALUE_INFRASTRUCTURE_SIZE);
out.writeInt(keyLen);
out.writeInt(valueLen);
PrivateCellUtil.writeFlatKey(kv, out);
}
Additionally, a vlength
of 0 (last arg in constructor) means the KeyValue will treat the value as non-existent.
f7d0b37
to
5ed63d7
Compare
…t2 (not yet merged upstream)
5ed63d7
to
6dc9e9d
Compare
public void readFields(DataInput in) throws IOException { | ||
cell = KeyValue.create(in); | ||
long seqId = in.readLong(); | ||
cell.setSequenceId(seqId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to avoid creating a new class, and wanted to use the CellWritableComparable but it write/read the sequenceId which caused issues with sorting. So this entire class exists to write the cell coordinates + the sequenceId
6690644
to
78bcd70
Compare
No description provided.