Skip to content

Conversation

koute
Copy link

@koute koute commented Jun 24, 2024

Initial PVM test vectors/test suite.

This is still incomplete; not every instruction is covered yet and only very simple test cases were added. I will be expanding this aggressively.

Since we will still be making some changes (e.g. 64-bit support) I'll be explicitly versioning this, with a detailed changelog so that anyone who uses these tests can easily keep up.

@sourabhniyogi
Copy link
Contributor

@koute Can you kindly add test cases for the host functions of Appendix B.6/B.7/B.8?

Here are 4 groups, in priority order:

  1. LOOKUP, READ, WRITE, SOLICIT, HISTORICAL_LOOKUP, IMPORT, EXPORT [7]
  2. NEW, MACHINE, PEEK, POKE, INVOKE, TRANSFER [6]
  3. QUIT, INFO, GAS, CHECKPOINT, FORGET [5]
  4. EMPOWER, DESIGNATE, ASSIGN, UPGRADE, EXPUNGE [5]

The first group is DA-centric, the second group is service+VM setup/invocations -- the first 2 groups are valuable to connect to code up connections to state merklization + erasure coding, whereas the latter 2 groups can be done later as they are bookkeeping oriented and are easy to get right once we solve the first 2 groups.

Thank you!

@ec2
Copy link

ec2 commented Jul 8, 2024

How are these programs supposed to be consumed? The program blob parse expects things like the program to start with BLOB_MAGIC.

@koute
Copy link
Author

koute commented Jul 9, 2024

Can you kindly add test cases for the host functions of Appendix B.6/B.7/B.8?

For these initial test vectors the priority is to get basic tests for the instruction set ready. I will also add some host call tests later, but comprehensive test suite for all host calls is probably out-of-scope, at least for pure PVM tests.

How are these programs supposed to be consumed?

Take a look at the schema to see to which parameters of the Ψ equation from the Gray Paper they correspond to, and use them accordingly to test your own PVM implementation.

The program blob parse expects things like the program to start with BLOB_MAGIC.

Yes, my PolkaVM uses its own container format for the program blobs which the GP doesn't use. PolkaVM is not the source of truth for how a PVM should work, the GP is.

@ec2
Copy link

ec2 commented Jul 9, 2024

@koute Good point on the GP being the source of truth.

Take a look at the schema to see to which parameters of the Ψ equation from the Gray Paper they correspond to, and use them accordingly to test your own PVM implementation.

For context I'm building FFI bindings to PolkaVM. So would you say that these tests in particular are for folks who are implementing the PVM from scratch?

@sourabhniyogi
Copy link
Contributor

Since we will still be making some changes (e.g. 64-bit support) I'll be explicitly versioning this, with a detailed changelog so that anyone who uses these tests can easily keep up.

We passed all the test vectors you provided so far. What is the reason for the GP needing to support 32-bit registers while the contracts pallet should definitely aim for 64-bit? After you have 64-bit PVM engineered for contracts pallet shouldn't the GP be adjusted to be 64-bit?

@koute
Copy link
Author

koute commented Jul 13, 2024

So would you say that these tests in particular are for folks who are implementing the PVM from scratch?

Yes.

What is the reason for the GP needing to support 32-bit registers while the contracts pallet should definitely aim for 64-bit?

This is only temporary. We will be migrating GP to 64-bit too; we just need to first prototype the design in PolkaVM to make sure it's solid. (Otherwise we might end up with a design that looks good on paper but is bad in practice.) We're working on it right now.

(That said, the changes when migrating to 64-bit won't be huge - the registers will be extended and there will be a couple of new instructions, but that's about it as far as major changes go.)

As far as the instruction set and the core semantics are concerned, we aim to have both PolkaJAM and pallet-contracts in alignment and we're making effort to make sure they don't diverge. (With PolkaJAM having the priority here, but I believe we can support both with the same VM.)

@ec2
Copy link

ec2 commented Jul 17, 2024

In GP(A.1), the program is defined as follows:
image

E(|c|) is the SCALE compact integer encoding on the length of c. So in this test case, |c| should be 3 (8, 135, 9), so shouldn't E(|c|) be [12] instead of 3?

Edit: Looks like I misunderstood |k| = |c|. It seems like this is talking about the bit length of the mask being equal to the the byte length of the instructions rounded to 8. Encoding question still stands though.

@koute
Copy link
Author

koute commented Jul 18, 2024

@ec2 Where did you read that these are SCALE compact integers? These are not SCALE compact integers. From the GP:

e

If you look at this equation and crosscheck it with how parity-scale-codec encodes compact integers you can see that they're not the same.

This is a slightly different varint serialization format which:

  • only supports up to 64-bits,
  • is more efficient to decode,
  • encodes numbers which fit within 7-bits as if raw-encoded,
  • is uniform and more compact when dealing with small numbers (SCALE compact encoding uses one more byte to encode integers between 64..128 and 16384..2097152, but needs one less byte to encode integers between 268435456..1073741824).

@ec2
Copy link

ec2 commented Jul 18, 2024

@koute GP(Appendix I.3) says that E() is the SCALE encode function. I also did see the screenshot you posted from the GP.

I'm not super familiar with SCALE so I assumed that the screenshot just formally describes how SCALE does variable int encoding.

@ec2
Copy link

ec2 commented Jul 19, 2024

@koute Sorry to keep hounding you here! I think I found a discrepancy between the testcases and the GP.
The test case inst_move_reg.json tests move_reg (opcode 82).

According to the GP,
image

In the test, the supplied arg to move_reg is [121]. And so, r_A = min(12, 121%16) = 9 and r_D = min(12, 121/16) = 7. So the mutation will end up being reg[7] = reg[9].

The test case has only initial-regs[7] set to 1 and 0 elsewhere. And so the expected mutation is reg[7] = reg[9] = 0.

TLDR: I think the impl of PVM that made these test cases have the arguments for move_reg flipped.

@sourabhniyogi
Copy link
Contributor

I will also add some host call tests later, but comprehensive test suite for all host calls is probably out-of-scope, at least for pure PVM tests.

Alright, we don't want to interrupt your deep work but legend has it you implemented PVM in a day =) so if its not too much to ask ... could you give us the simplest "Jam Service" byte code (for a refine+accumulate) for us to implement many of the basic host functions? Given one good example we can probably fill in the rest and provide a few more back.

My idea of the simplest "Jam Service" byte code is to compute the sum of squares for a set of integer work items, like
Work Items in a Work Package: 5, 7, 9
Refine: squares the work items, exports 25, 49, 81
Accumulate: reads the result of refine ( 25, 49, 81 ) and writes to a service's storage
We can attempt to build the byte code by hand like it is 1964 but maybe you already have something "simple" like this that you can share?

If not, do you have a better recommendation for simplest "Jam Service"? Or, a strategy that is better than hand building byte code?

This sort of baby JAM test case will help teams get baby JAM implementations blood flowing, and set up a low V (like V=6) cluster complete with QUIC, erasure coding, Patricia Merkle Trie, BMT proofs, and so on.

@xlc
Copy link
Contributor

xlc commented Jul 30, 2024

I am confused about trap vs halt vs panic in PVM. In GP, the trap instruction will exit with the black square, so does the jump to 2^32-2^16 address. To my understanding, that is exit 0. But in the pvm testvector, trap is panic and the test of the trap instruction will result trap exit status but the inst_ret_halt test results halt. There is some inconsistency.

-- (called "panic" in the Graypaper)
-- the execution ended with a trap (the `trap` instruction was executed, the execution went "out of bounds", an invalid jump was made, or an invalid instruction was executed)
trap,

Another question. jump_ind is using djump which can be used to exit the program. But how about the one using branch? e.g. jump. What happen to jump into the exit address? panic or halt?

@koute
Copy link
Author

koute commented Jul 30, 2024

Sorry to keep hounding you here! I think I found a discrepancy between the testcases and the GP.

@ec2 Yes, indeed, there is. We will fix it soon. Thanks! We highly appreciate anyone who helps crosscheck these.

Alright, we don't want to interrupt your deep work but legend has it you implemented PVM in a day =) so if its not too much to ask ... could you give us the simplest "Jam Service" byte code (for a refine+accumulate) for us to implement many of the basic host functions?

@sourabhniyogi The rumors of my exploits seem to be grossly exaggerated; it was actually two days, not one. :P

Anyway, we will most likely put up some more tests out in the future, but for now if you quickly want something to test with then your best bet would be to build one yourself.

You don't have to build a blob by hand; you could use my work-in-progress PVM assembler. For example:

$ git clone https://github.com/koute/polkavm.git
$ cd polkavm
$ cargo run -p polkatool -- assemble tools/spectool/spec/src/inst_branch_greater_or_equal_signed_ok.txt -o output.polkavm
$ cargo run -p polkatool disassemble --show-raw-bytes output.polkavm

This will output the program in a PolkaVM-specific container (which is not part of the GP), but you can extract the code blob with a simple Rust program - use polkavm_common::program::ProgramParts::from_bytes to load the blob and then the code_and_jump_table field will have the raw program bytes.

I am confused about trap vs halt vs panic in PVM. In GP, the trap instruction will exit with the black square, so does the jump to 2^32-2^16 address.

@xlc

  • "halt" is meant to be a normal termination (dynamic jump to 0xffff0000)
  • "panic" (called a "trap" here) is meant to be an abnormal termination

Hm, you're right that the trap instruction in the GP is specified to halt instead of panicking; this should have been a panic instead. I'll see about correcting this; thanks.

@clw8998
Copy link

clw8998 commented Sep 3, 2024

Hello @koute , I recently encountered some issues while using your PVM.

Here’s my code:

pub @main:
    a1 = 0

When I use the following command to compile:

cargo run -p polkatool -- assemble ./test_txt_code/test.txt -o test.pvm

The bytecode content of test.pvm is as follows:

[80, 86, 77, 0, 1, 5, 7, 1, 0, 4, 109, 97, 105, 110, 6, 6, 0, 0, 2, 4, 8, 253, 0]

My question is, how do I extract the pure program portion as defined in GP_0.36(213), because it seems the first part contains some ASCII-encoded section names.

ASCII encoded section name:

[80, 86, 77, 0, 1, 5, 7, 1, 0, 4, 109, 97, 105, 110, 6, 6]
// [80, 86, 77] "PVM" in ASCII
// [109, 97, 105, 110] "main" in ASCII

GP_0.36(213) should be:

[0, 0, 2, 4, 8, 253, 0]

@emielsebastiaan
Copy link

With the current Graypaprer spec (0.6.1) the following test vector is incorrect.
https://graypaper.fluffylabs.dev/#/4bb8fd2/2a48012a7d01

Test vector

rem_s_64 (206)
Current Current incorrect calculation: -9223372036854775791 mod 7 = 2
Current incorrect output ω_9: 18446744073709551611
Correct output ω_9: 2

Z_8(ω_A) mod Z_8(ω_B)
Current incorrect calculation: -9223372036854775791 mod 7 = -5
Correct calculation: -9223372036854775791 mod 7 = 2

Analysis

The incorrect current output is explained because some programming languages (such as: RUST en C) provide a negative output for a modulo operation on a negative number.
Python on the other hand outputs a positive number for a modulo operation on a negative number.
In 'Maths', "the usual representative is the least positive residue, the smallest non-negative integer"

In mathematics, the result of the modulo operation is an equivalence class, and any member of the class may 
be chosen as representative; however, the usual representative is the least positive residue, the smallest 
non-negative integer that belongs to that class (i.e., the remainder of the Euclidean division).[2] 

However, other conventions are possible. Computers and calculators have various ways of storing and 
representing numbers; thus their definition of the modulo operation depends on the programming language 
or the underlying hardware. 

Source: https://en.wikipedia.org/wiki/Modulo

Possible solutions

  1. Test vector is incorrect with current GP-0.6.1 specification and should be corrected.
  2. Graypaper should explicitly state that modulo operations of negative numbers should be allowed to be negative.

In summary 'maths' shoud have priority over any implementation ambiguity, therefore as is solution 1 should be the way to go.

@koute
Copy link
Author

koute commented Feb 3, 2025

The incorrect current output is explained because some programming languages (such as: RUST en C) provide a negative output for a modulo operation on a negative number. [...] In summary 'maths' shoud have priority over any implementation ambiguity, therefore as is solution 1 should be the way to go.

No, it's explained because PVM is based on RISC-V, and that's how the RISC-V's (and also coincidentally how AMD64's) modulo instruction works. It has absolutely nothing to do with how the modulo operator works in any programming language.

Two of the main design principles of PVM are:

  1. we can use upstream RISC-V compilers without any modifications, so while we have some leeway because we postprocess the RISC-V code into PVM, ultimately we must keep the original RISC-V instruction semantics intact,
  2. so that it's easy to recompile PVM into native machine code, so the semantics of instructions should, in general, match how real hardware tends implements those instructions.

So the test vector here is correct and is what we want, and changing the semantics here as you suggest is not a good idea as it will provide no practical benefit while having significant practical downsides.

Anyway, thank you for bringing this ambiguity to our attention.

@emielsebastiaan
Copy link

Sure thanks, this is fine of course. But then GP should be adjusted to explicitly state that the modulo operator on a negative number yields a negative number, and not a positive number as expected by 'Maths'.

@clw8998
Copy link

clw8998 commented Feb 4, 2025

Test Case: inst_div_signed_64

When pc = 0, calling div_s_64 (204), we have:
(All numbers here are in decimal.)

  • $ω7 = 9223372036854775824$
  • $ω8 = 7$
  • $Z_8(ω7) = -9223372036854775792$
  • $Z_8(ω8) = 7$

Performing the division and applying the floor function:

$⌊ -9223372036854775792 / 7 ⌋$ = $⌊ -1317624576693539398.8571428571429 ⌋$ = -1317624576693539398

(However, it should be -1317624576693539399 after applying the floor function.)

  • $Z_8^{-1}(-1317624576693539398) = 17129119497016012218$
  • $Z_8^{-1}(-1317624576693539399) = 17129119497016012217$

Question:

  • The expected result after applying the floor function should be -1317624576693539399, but the test vector shows -1317624576693539398.
  • Since GP does not explicitly define the floor function? (not sure). I checked the floor function definition, but I am unsure whether this applies to GP.
  • A similar issue also occurs in riscv_rv64um_div.

Did I make a mistake somewhere?

@koute
Copy link
Author

koute commented Feb 5, 2025

@clw8998 I can confirm the test vector is correct here and the expected value is -1317624576693539398 (0xedb6db6db6db6dba). In this case the fractional part of the result should always be truncated because these are integer (non-floating point) division instructions.

To illustrate why let's pick some smaller numbers to make this more obvious. Let's try to divide a positive number first:

7 / 3 = 2.333 ~= 2
2 * 3 = 6

Now let's try flipping the sign:

-7 / 3 = -2.333 ~= -3
-3 * 3 = -9

vs

-7 / 3 = -2.333 ~= -2
-2 * 3 = -6

Notice that flipping the sign of one of the inputs doesn't change the numerical value of a / b * b (just its sign) if we use truncation.

You're right that mathematically floor does the (in this case) incorrect thing; we will fix the GP.

@daiagi
Copy link

daiagi commented Feb 28, 2025

These are in sync with which gp version?

@koute
Copy link
Author

koute commented Mar 3, 2025

These are in sync with which gp version?

Unless I missed something, should be in sync with the most recent GP 0.6.2.

@sourabhniyogi
Copy link
Contributor

sourabhniyogi commented Mar 9, 2025

@koute Several implementers consider looking at this to interpret polkavm disassembly as form of collusion:

image image image

Understanding the registry map was one of the first things we did to learn how building JAM Services in Rust with polkatool since this -- Its pretty important to have a mental model of the registry map and I believe a recent comment of yours caused people to believe they should never even think about looking a single line of polkavm code.

What do you think of their "collusion" claim?

@koute
Copy link
Author

koute commented Mar 10, 2025

@sourabhniyogi

No. The claim that this is collusion is nonsense; those are just standard RISC-V register names. But you do need to be careful if you're reading PolkaVM source code as reading some parts (but not all of it, as I already tried to explain in the JAM chat) can be considered against the rules of the JAM prize.

@dakk
Copy link

dakk commented May 25, 2025

@koute is it possible that riscv tests expect that the code is filled with ones instead of zeros? (A.4 from GP)

If I fill with ones I pass all the tests, while if I put zeros I fail riscvs. But if I put ones, I fail davxy traces tests. (davxy#45)

@koute
Copy link
Author

koute commented May 26, 2025

is it possible that riscv tests expect that the code is filled with ones instead of zeros? (A.4 from GP)

I'm afraid I don't know what you mean by "the code is filled with ones instead of zeros". By "code" do you mean the blob where the instructions opcodes + arguments are encoded? Or the bitmask? And by "filled with" do you mean what happens when you read out of bounds of the physical dimensions of either of these blobs? Can you explain in more detail or give me an exact equation number?

@dakk
Copy link

dakk commented May 26, 2025

I'm referring to A.4 as I said;

image

Are tests following this rule in the graypaper? Your tests on jampy fails if ζ is extended with zeros, while are passes if ζ is extended with ones.

@koute
Copy link
Author

koute commented May 26, 2025

Are tests following this rule in the graypaper? Your tests on jampy fails if ζ is extended with zeros, while are passes if ζ is extended with ones.

Yes. For example, here you have a test which depends on this behavior (panics since the execution goes out of bounds). If you'd fill it with ones instead then that test would run forever (assuming unlimited gas) because 1 is the opcode for the fallthrough instruction which wouldn't stop the execution.

@dakk
Copy link

dakk commented May 26, 2025

Are tests following this rule in the graypaper? Your tests on jampy fails if ζ is extended with zeros, while are passes if ζ is extended with ones.

Yes. For example, here you have a test which depends on this behavior (panics since the execution goes out of bounds). If you'd fill it with ones instead then that test would run forever (assuming unlimited gas) because 1 is the opcode for the fallthrough instruction which wouldn't stop the execution.

Ok, thank you for the clarification and for pointing out a proper example.

@AKJUS
Copy link

AKJUS commented Jun 9, 2025

aki

@0xjunha
Copy link

0xjunha commented Jun 17, 2025

@koute Recently I've revisited PVM test vectors and got several questions:

[PageFault cases]
In 9 test cases with PageFault exit reason,

  1. In GP, Ψ has expected_pc=ɩ' for PageFault (otherwise case), whereas test vectors have expected_pc=0 (eq A.1)
  2. Test vectors assume PageFault cases consume 2 gas units, but shouldn't it be 1 gas unit, since it exits after only executing 1 instruction?

[Post pc value on Panic or Halt]

  1. In GP, Ψ has expected_pc=0 for Panic or Halt, (eq A.1) whereas many test vectors which end with TRAP have non-zero expected_pc values

[TRAP instruction]

  1. It seems TRAP instruction doesn't alter pc value in test vectors, but in GP it seems it should be updated to ɩ' = ɩ + 1 + skip(ɩ) after executing Ψ1 and then in Ψ expected_pc should be set 0.

Am I missing something?

@koute
Copy link
Author

koute commented Jun 20, 2025

3. In GP, Ψ has expected_pc=0 for Panic or Halt, (eq A.1) whereas many test vectors which end with TRAP have non-zero expected_pc values

4. It seems TRAP instruction doesn't alter pc value in test vectors, but in GP it seems it should be updated to ɩ' = ɩ + 1 + skip(ɩ) after executing Ψ1 and then in Ψ expected_pc should be set 0.

This is done to lessen the requirements (it can be tricky to implement grabbing the guest-side program counter in a recompiler) and/or reduce the surface area for consensus bugs - since the program counter value shouldn't really be needed on-chain we've decided to just forcefully set it to zero in the protocol in cases where the execution should not be continue. (Although, to be fair, it might actually be a better idea to set it to 0xffffffff or something like that, since '0' can still be a valid program counter but 0xffffffff will practically never be)

But, for the purpose of testing your PVM it's still useful to know where exactly it's trapping, hence the test vectors don't have those set to zero.

1. In GP, Ψ has expected_pc=ɩ' for PageFault (otherwise case), whereas test vectors have expected_pc=0 (eq A.1)

Not changing the program counter is correct, otherwise it would skip the instruction after servicing the page fault, which doesn't make any sense. But yes, unless I'm missing something it seems like the A.1 equation should be corrected to not change the instruction pointer when page faulting or running out of gas.

2. Test vectors assume PageFault cases consume 2 gas units, but shouldn't it be 1 gas unit, since it exits after only executing 1 instruction?

Yes. This behavior is not yet specced in the GP, but for the final gas cost model we will be charging gas not per instruction but per each basic block (otherwise the overhead of the gas metering will be too high); I'm planning to add this to the GP soon-ish.

@0xjunha
Copy link

0xjunha commented Jun 20, 2025

Thank you for the clarification, that's really helpful!

@dakk
Copy link

dakk commented Aug 21, 2025

@koute after updating my implementation to 0.6.7 and above, I'm unable to pass your tests. The issue arises after implementing A.5 and A.19; are your tests aligned with this?

@koute
Copy link
Author

koute commented Aug 22, 2025

@koute after updating my implementation to 0.6.7 and above, I'm unable to pass your tests. The issue arises after implementing A.5 and A.19; are your tests aligned with this?

Can you be more specific about what's actually failing? These shouldn't fail after implementing the updated eq. A.5 and A.19, so either I made a whoopsie when updating those equations in the GP, or you implemented them incorrectly.

@dakk
Copy link

dakk commented Aug 22, 2025

@koute after updating my implementation to 0.6.7 and above, I'm unable to pass your tests. The issue arises after implementing A.5 and A.19; are your tests aligned with this?

Can you be more specific about what's actually failing? These shouldn't fail after implementing the updated eq. A.5 and A.19, so either I made a whoopsie when updating those equations in the GP, or you implemented them incorrectly.

I indeed implemented them incorrectly but I was asking if your tests were aligned with the latest version of the GP. I think A.19 is missing a boundary check, I should investigate more anyway and in that case I'll open an issue there. Thank you

@koute
Copy link
Author

koute commented Aug 22, 2025

I indeed implemented them incorrectly but I was asking if your tests were aligned with the latest version of the GP.

There should be nothing to align because these test programs shouldn't trigger the corner cases which those new equations are designed to handle.

@davxy
Copy link
Contributor

davxy commented Sep 9, 2025

@koute can we merge this?

@koute
Copy link
Author

koute commented Sep 10, 2025

@koute can we merge this?

Yes, I don't see why not; I can always made a new PR for any further modifications.

@vekexasia
Copy link

Hello, i successfully pass trace 1757406516 now but I noticed i had to change how I account pageFaults in instructions.

I have no idea why it was like this but apparently my codebase before the fix accounted for the IX gasCost + Trap GasCost when the IX caused a pagefault.

By doing so i am able to pass the pvm instructions checks from @koute here in this pr.

for example inst_store_imm_indirect_u16_with_offset_nok >= https://github.com/w3f/jamtestvectors/pull/3/files#diff-ca69518f98802d38482848f24f5d3431f0fe2fa744712da6a49e76a9253e4269R57

expects 2 gas to be consumed. only one IX is executed (the one which triggers pagefault).

I inspected the graypaper and apparently there is no special handling for when pagefault happens so i guess the current fuzzer implementation is ok... tossing it here so that it happens before the merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.