@sundeepn
Contributor

Fix to ensure that table names for multi-insert/partitioned cached tables are reflected on the Shark UI.

@harveyfeng
Member

Hi Sundeep, the current Shark master doesn't include support for partitioned cached tables.
"insert into" commands that involve UnionRDDs in MemoryStoreSinkOperator are appends to a single, non-partitioned table.
It seems like this patch tracks how many sequential appends (i.e., "insert into"s) have been done to each table, but doesn't account for new RDDs created by interleaved "insert overwrite"s - those RDDs are assigned the table name.

@sundeepn
Contributor Author

Hi Harvey, the current patch is meant to let users track storage/memory usage on the Shark Storage UI per table rather than per 'rdd_###'. Inserts/overwrites to cached tables make the current Storage UI quite hard to follow.

It does not handle dropped partitions or overwrites in any special way, but it does guarantee that each block of data is identified by a unique number and has the table name associated with it on the UI.
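
To illustrate the naming guarantee described above, here is a minimal sketch, not the patch's actual code. The `BlockNamer` class and the `<table>_<n>` label format are assumptions made for illustration; the thread does not state the exact format the patch uses.

```python
from collections import defaultdict
from itertools import count


class BlockNamer:
    """Hypothetical helper: issues display names of the form '<table>_<n>'."""

    def __init__(self):
        # One independent counter per table name, each starting at 0.
        self._counters = defaultdict(count)

    def next_name(self, table_name):
        # Every call yields a label not issued before for this table,
        # and the label always carries the table name.
        return f"{table_name}_{next(self._counters[table_name])}"


namer = BlockNamer()
names = [namer.next_name("sales"), namer.next_name("sales"), namer.next_name("users")]
print(names)
```

With labels like these, two appends to the same table show up as separate entries that can still be traced back to that table.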

I am planning to submit another patch, once we have partition support, with naming conventions derived from Hive's partition information.

@AmplabJenkins

Can one of the admins verify this patch?

@harveyfeng
Member

Yeah, the storage UI is a bit confusing right now :(
Assigning unique IDs to RDDs created from "insert into" definitely helps, but is there a way to assign unique identifiers to RDDs created from "insert overwrite", and possibly to distinguish between valid and invalid RDDs? For example, right now it seems like five "insert overwrite" commands will result in five RDDs displayed under the same (table) name.
One way might be to mark overwritten RDDs with something like "stale_table-name".
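
The "stale_" idea could look roughly like the sketch below. It is only an illustration: `apply_overwrite` is a hypothetical function operating on plain label strings, not on Shark or Spark objects, and the exact prefix handling is an assumption.

```python
def apply_overwrite(display_names, table_name):
    """Return the UI labels after an "insert overwrite" on table_name.

    Labels equal to table_name get a 'stale_' prefix; the RDD created by
    the overwrite is then listed under the plain table name.
    """
    relabeled = [
        f"stale_{name}" if name == table_name else name
        for name in display_names
    ]
    return relabeled + [table_name]


labels = []
for _ in range(3):
    # Three consecutive overwrites of the same (hypothetical) table "sales".
    labels = apply_overwrite(labels, "sales")
print(labels)
```

After repeated overwrites only one entry keeps the plain table name, so the valid RDD can be told apart from the overwritten ones even though the stale entries still share a label.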

@sundeepn
Contributor Author

Based on Hive's documentation, shouldn't an "insert overwrite" on a table unpersist the existing RDDs? (For a partitioned table, it would unpersist only the overwritten partitions.) If this is the case, I can push a fix on that front.

@harveyfeng
Member

Yeah, that sounds good - created a ticket for that here: https://spark-project.atlassian.net/browse/SHARK-202.
Could you assign yourself to it? :)

@sundeepn
Contributor Author

Sure. I do not seem to have permissions to assign myself the ticket. If you can help with that, I will take on the ticket. :)

@harveyfeng
Member

Done - assigned it to you. Thx!

@harveyfeng
Member

Oh, it looks like the assignments were concurrent....

@rxin
Member

rxin commented Nov 1, 2013

What's the status of this PR?

