Contributor

@gadial gadial commented Mar 19, 2023

Summary

This PR handles several issues related to ExperimentData:

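  1. Fixing a bug in _add_job_data.
  2. Adding multi-upload capability to ExperimentData.save().
  3. Changing provider handling to enable better data loading.
  4. Setting start_datetime and end_datetime, which were not being set at all, and updating creation_datetime and updated_datetime on save(); previously they were set only after loading the experiment from the server.
  5. Raising an error in ExperimentData.save() when no database service is available, unless errors are suppressed.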

Details and comments

  1. Currently, _add_job_data adds the result of a job without explicitly supplying its job_id. This was fine with the old qiskit-ibmq-provider, but with the new qiskit-ibm-provider the job id contained in the Result object seems to differ from the id of the actual job. Since ExperimentData keeps the original job id, every submitted job ends up with two different ids: one for a seemingly unfinished job, and another for a job that was seemingly never initiated. This PR addresses the issue by using the original job id whenever possible.

  2. ExperimentData.save() currently uploads analysis results and figures one by one, which is inefficient and already affects other projects. This is handled in qiskit-ibm-experiment, whose API was extended to allow uploading multiple analysis results and figures at once; this PR enables the use of that API in ExperimentData.

  3. ExperimentData.load() currently takes the experiment_id and an IBMExperimentService object. This has two drawbacks. First, IBMExperimentService should be as transparent to users as possible. Second, IBMExperimentService handles the resultDB data, but the job data stored by ExperimentData is handled by the IBMProvider. This can be fixed by allowing the provider to be passed as a parameter to load(), since the service can be obtained from the provider. This change also fixes #1093 (ExperimentData uses deprecated backend.retrieve_job method).

  4. start_datetime and end_datetime were set neither by ExperimentData nor by the database itself. This PR makes the experiment data set start_datetime to the time it was created, unless another value is passed on creation (currently, BaseExperiment creates the experiment data right before beginning the experiment). This PR also makes every job update end_datetime once it terminates. In addition, calls to save() now update the values of creation_datetime and updated_datetime (which are set by the server). All times are stored in UTC, but the getters return them in local time, and the setters convert from local time to UTC.

  5. ExperimentData.save() did not raise an error when no database service was available. Now it raises an error if suppress_errors is False.
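
As an illustration of the timestamp convention described in item 4 (store in UTC, return local time from the getter, convert to UTC in the setter), here is a minimal sketch. The class name TimestampedRecord is hypothetical and is not the actual ExperimentData implementation; it only assumes the standard library datetime module.

```python
from datetime import datetime, timezone


class TimestampedRecord:
    """Hypothetical stand-in for illustration; not the real ExperimentData class."""

    def __init__(self, start_datetime=None):
        # Default to "now" when no value is passed on creation.
        if start_datetime is None:
            start_datetime = datetime.now(timezone.utc)
        self._start_datetime = None
        self.start_datetime = start_datetime

    @property
    def start_datetime(self):
        # Getter: convert the stored UTC value to the system's local time zone.
        return self._start_datetime.astimezone()

    @start_datetime.setter
    def start_datetime(self, value):
        # Setter: store in UTC. astimezone() treats a naive datetime as local time.
        self._start_datetime = value.astimezone(timezone.utc)


record = TimestampedRecord(datetime(2023, 3, 19, 12, 0, tzinfo=timezone.utc))
print(record.start_datetime)  # same instant, shown in local time
```

Because both representations are timezone-aware, comparing the getter's value with the original UTC value still compares the same instant.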

@gadial gadial changed the title from "[WIP] Experiment data fixes" to "Experiment data fixes" Mar 29, 2023
@gadial gadial requested review from coruscating and yaelbh March 29, 2023 09:10
Collaborator

@coruscating coruscating left a comment

Minor comments for now, will review more closely when we decide on end_datetime.


  @staticmethod
- def get_service_from_backend(backend):
+ def get_service_from_provider(provider):
Collaborator

You should update the documentation that uses get_service_from_backend if you're changing this interface.

Contributor Author

I think I'll keep both.
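
One way keeping both helpers could look is sketched below, with the backend-based helper delegating to the provider-based one. This is only an assumption about the shape of the change: the `service` and `provider` attributes and the SimpleNamespace stand-ins are hypothetical, not the real provider or backend API.

```python
from types import SimpleNamespace


def get_service_from_provider(provider):
    """Return the experiment service attached to a provider, or None.

    The `service` attribute is a hypothetical placeholder for however the
    real provider exposes its experiment service.
    """
    return getattr(provider, "service", None)


def get_service_from_backend(backend):
    """Older entry point kept for existing callers; delegates to the provider helper."""
    provider = getattr(backend, "provider", None)
    if provider is None:
        return None
    return get_service_from_provider(provider)


# Stand-in objects for illustration only.
provider = SimpleNamespace(service="experiment-service")
backend = SimpleNamespace(provider=provider)
print(get_service_from_backend(backend) == get_service_from_provider(provider))
```

With this arrangement, documentation that still refers to get_service_from_backend stays valid while new code can pass a provider directly.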

Comment on lines 966 to 967
# if self._result_data or not self._backend:
# return
Collaborator

Why are these commented out?

Contributor Author

That's actually a nontrivial issue. _retrieve_data is called from ExperimentData.load(), which first initializes a new object via the line expdata = cls(service=service, db_data=data, provider=provider) and then calls expdata._retrieve_data(). However, when expdata is initialized, the _result_data field is also initialized to an empty thread-safe dict, so it seems _retrieve_data will never run. I don't see why this check is here.
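
Whether a guard of this form returns early for an empty container depends on how the container reports its truth value. The sketch below uses hypothetical container classes (not the actual thread-safe dict from the codebase) to show the general Python behavior: an object that defines neither `__bool__` nor `__len__` is always truthy, even when it holds nothing.

```python
class OpaqueContainer:
    """Hypothetical wrapper that defines neither __bool__ nor __len__."""

    def __init__(self):
        self._items = {}


class SizedContainer(OpaqueContainer):
    """Same wrapper, but with __len__, so an empty instance is falsy."""

    def __len__(self):
        return len(self._items)


def should_skip_retrieval(result_data, backend):
    # Mirrors the shape of the commented-out guard:
    #   if self._result_data or not self._backend: return
    return bool(result_data or not backend)


some_backend = object()  # any truthy placeholder
print(should_skip_retrieval(OpaqueContainer(), some_backend))  # always skips
print(should_skip_retrieval(SizedContainer(), some_backend))   # proceeds when empty
```

Note that the second half of the guard also skips retrieval whenever no backend is set, regardless of the container.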

@yaelbh
Collaborator

yaelbh commented Mar 29, 2023

Release notes are missing.

@coruscating
Collaborator

I ran a test experiment, but creation_datetime and updated_datetime of the experiment data object were not populated after saving:

import numpy as np
from qiskit_experiments.framework import BatchExperiment, ParallelExperiment
from qiskit_experiments.library import T1, StandardRB

# `backend` is assumed to be an already-configured backend object
child_exp1 = T1(physical_qubits=(3,), delays=np.arange(1e-6, 10e-5, 3e-5))
child_exp2 = StandardRB(physical_qubits=(0, 1), lengths=np.arange(1, 100, 20), num_samples=10)
child_exp3 = StandardRB(physical_qubits=(4, 5), lengths=np.arange(1, 100, 20), num_samples=10)
parallel_exp = ParallelExperiment([child_exp1, child_exp2], flatten_results=True)
batch_exp = BatchExperiment([parallel_exp, child_exp3], flatten_results=True)
parallel_data = batch_exp.run(backend, seed_simulator=101).block_for_results()
parallel_data.save()

# these are still None
print(parallel_data.creation_datetime)
print(parallel_data.updated_datetime)

Also, after loading the experiment data object from ResultsDB, the start_datetime has changed to be later than the start_datetime of the original experiment data object, such that creation_datetime is actually earlier than start_datetime in this test:

[image]

end_datetime is nearly the same; I assume the small deviation is due to the difference between the server and local clocks, which is fine.

@gadial
Contributor Author

gadial commented May 10, 2023

> I ran a test experiment, but creation_datetime and updated_datetime of the experiment data object were not populated after saving:

You're probably not seeing creation_datetime update because you need to upgrade qiskit-ibm-experiment to 0.3.1; I needed a minor fix there for this to work.

updated_datetime is a slightly different story: it is not returned from the server when a new experiment is created, so we'll need to change this one server-side. Try calling parallel_data.save() twice and you'll see the correct value for updated_datetime (provided you're using qiskit-ibm-experiment 0.3.1).

Collaborator

@coruscating coruscating left a comment

Thanks for fixing the issues. There are a few remaining problems when running a composite experiment with flatten_results=False:

  • The child experiments don't have an end_datetime, while the parent experiment does.
  • Only one child experiment has updated_datetime, while the other child experiments and the parent don't.

These should be addressed in a follow-up PR.

@coruscating coruscating enabled auto-merge May 11, 2023 20:38
@coruscating coruscating added this pull request to the merge queue May 11, 2023
Merged via the queue into qiskit-community:main with commit 6a732f4 May 11, 2023
Development

Successfully merging this pull request may close these issues.

ExperimentData uses deprecated backend.retrieve_job method
3 participants