Cache block fix #5

Open
wants to merge 74 commits into
base: master
7822d91
Parallel Qiskit Aer (GPU + MPI) by using cache blocking transpiler (#…
doichanj Feb 4, 2021
b475f19
Adding gates to the MPS simulator (#1088)
yaelbh Feb 4, 2021
f869a36
adding qutip copyright to mc controller (#1124)
DanPuzzuoli Feb 10, 2021
d9d4593
Fix numpy ABI incompatibility when building with numpy 1.20 (#1125)
vvilpas Feb 10, 2021
cf8edab
Add new save expectation value instructions (#1101)
chriseclectic Feb 10, 2021
600806c
Add ``SaveStatevector`` and ``SaveDensityMatrix`` instructions (#1116)
chriseclectic Feb 11, 2021
d58737a
Add `SaveProbabilities` and `SaveProbabilitiesDict` instructions (#1117)
chriseclectic Feb 11, 2021
09383a8
Fix cache blocking diagonal matrix
doichanj Feb 16, 2021
c2908f9
fix blocking diagonal matrix
doichanj Feb 17, 2021
998d532
Pass CMAKE_GENERATOR_PLATFORM thorugh scikit_build in win32 builds (#…
vvilpas Feb 17, 2021
424ae67
correct block bits after cache block transpiler
doichanj Feb 18, 2021
63e33d3
Add tests for SaveExpectationValueVariance (#1140)
chriseclectic Feb 18, 2021
43a150c
Add `SaveAmplitudes` and `SaveAmplitudesSquared` instructions. (#1129)
chriseclectic Feb 18, 2021
b274872
No numpy install before CMake runs (#1142)
vvilpas Feb 18, 2021
2da63a2
Migrate windows CI to all be in github actions (#1137)
mtreinish Feb 18, 2021
940a8a6
Add _directive attr to SaveData and Snapshot instruction (#1139)
chriseclectic Feb 18, 2021
a5e9f5c
fix save_density_matrix
doichanj Feb 19, 2021
47dba5f
bit scaling of matrix state is not multiplied to num_qubits_ and chun…
doichanj Feb 19, 2021
a0da338
fix MPI compilation
doichanj Feb 19, 2021
185b28e
change MPI tests to multi chunk tests
doichanj Feb 19, 2021
6e34272
Merge remote-tracking branch 'upstream/master' into cache-block-fix
doichanj Feb 19, 2021
12e20d3
modify contribution document
doichanj Feb 19, 2021
70dc825
Implemented save_amplitudes
doichanj Feb 22, 2021
1415180
added MPI support for save_amplitudes
doichanj Feb 22, 2021
32be341
Start updating tests to use configurable simulator backend (#1150)
chriseclectic Feb 22, 2021
2e777bd
explicitly avoid omp use for experiments in serial execution (#1147)
hhorii Feb 22, 2021
0ece80f
Add SaveUnitary and SaveStabilizer instructions (#1136)
chriseclectic Feb 22, 2021
a2f29ee
Fix noise sampling for conditional gates (#1154)
chriseclectic Feb 24, 2021
5923392
Extended stabilizer simulator expval command (#1121)
gadial Feb 24, 2021
5b5658c
Pass some json utils function args by const reference (#1151)
vvilpas Feb 24, 2021
1a6d5df
change default max_memory_mb from half to full system memory (#1152)
hhorii Feb 25, 2021
9c30458
smalle fix for CI test
doichanj Feb 25, 2021
09fd95d
debug for statevector chunk state
doichanj Feb 25, 2021
6dd4708
Fix bug in StatevectorChunk::State::vec2density
doichanj Feb 25, 2021
9749e0f
delete debug message
doichanj Feb 25, 2021
666e99f
Merge remote-tracking branch 'upstream/master' into cache-block-fix
doichanj Feb 25, 2021
96e7cf0
merged with upstream/master
doichanj Feb 25, 2021
756af11
Fix multi-chunk diagonal matrix (#1155)
doichanj Feb 25, 2021
bf92091
resolve conflict
doichanj Feb 26, 2021
857f093
fix merge failure
doichanj Feb 26, 2021
4a41d44
fix again
doichanj Feb 26, 2021
1063501
Add arm64 release wheel jobs (#1162)
mtreinish Feb 26, 2021
892a6fd
Add default save instruction labels (#1161)
chriseclectic Feb 26, 2021
cc7b1af
Add pending deprecation warnings to snapshots (#1158)
chriseclectic Mar 2, 2021
dffb1d7
Fix out of bounds array access. (#1167)
vvilpas Mar 2, 2021
71c940b
Fix memory_leak due to shared_ptr circular references. (#1168)
vvilpas Mar 2, 2021
493b558
prepare for merge upstream
doichanj Mar 3, 2021
57591ee
merge upstream
doichanj Mar 3, 2021
56b44f4
remove debug message
doichanj Mar 3, 2021
bdb7718
remove comparing weak_ptr with nullptr
doichanj Mar 3, 2021
7d1f5b8
remove reset to weak_ptr
doichanj Mar 3, 2021
7267429
Fixed bug in sample_measure_using_probabilities (#1132)
merav-aharoni Mar 3, 2021
43d8307
Merge branch 'master' into cache-block-fix
vvilpas Mar 3, 2021
f8f3bb7
reflect review comments
doichanj Mar 4, 2021
12d18bc
Merge remote-tracking branch 'refs/remotes/origin/cache-block-fix' in…
doichanj Mar 4, 2021
ae36d65
fix expval_pauli for density matrix
doichanj Mar 4, 2021
90831eb
Fix density matrix expval_pauli (#1171)
chriseclectic Mar 4, 2021
5c2f507
Disable all warnings emitted from thrust headers (#1169)
vvilpas Mar 4, 2021
ba56322
merge upstream/master
doichanj Mar 5, 2021
11c5b8a
Remove previously deprecated methods (#1160)
chriseclectic Mar 5, 2021
5a72c02
Fix grammar, capitalization, text inconsistencies (#900)
RafeyIqbalRahman Mar 5, 2021
feb9fb2
Update README.md to mention Linux-only GPU support (#1095)
amirebrahimi Mar 5, 2021
3283d37
Fix density matrix chunk expval_pauli
doichanj Mar 8, 2021
07cc9b8
Fix statevector chunk expval_pauli
doichanj Mar 8, 2021
19e846c
Merge branch 'master' into cache-block-fix
doichanj Mar 8, 2021
7fe4f9e
Fix expval tests (#1173)
chriseclectic Mar 8, 2021
d69f7e9
Fix extended stabilizer method basis gates (#1175)
chriseclectic Mar 9, 2021
3d2575a
Update CODEOWNERS (#1174)
chriseclectic Mar 9, 2021
a41d5c7
Merge branch 'master' into cache-block-fix
chriseclectic Mar 9, 2021
46b24a6
Merge remote-tracking branch 'upstream/master' into cache-block-fix
doichanj Mar 10, 2021
af41169
Merge branch 'cache-block-fix' of github.com:doichanj/qiskit-aer into…
doichanj Mar 10, 2021
acd216d
Fixes of multi-chunk State implementation (#1149)
doichanj Mar 10, 2021
1994b35
Add Fusion variations (#1110)
hhorii Mar 10, 2021
58d44d1
Merge remote-tracking branch 'upstream/master' into cache-block-fix
doichanj Mar 11, 2021
Fixes of multi-chunk State implementation (Qiskit#1149)
Co-authored-by: Victor Villar <vvilpas@gmail.com>
Co-authored-by: Christopher J. Wood <cjwood@us.ibm.com>
3 people authored Mar 10, 2021

commit acd216d040c0d9ec1161c82331820841cb13386f
3 changes: 3 additions & 0 deletions CONTRIBUTING.md
@@ -681,7 +681,10 @@ This technique allows applying quantum gates to each chunk independently without
Before the actual simulation, we apply a transpilation pass that remaps each input circuit to an equivalent circuit in which every quantum gate acts only on qubits below the chunk's number of qubits.
(Noiseless) swap gates are inserted to exchange data.
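
To illustrate why gates on qubits below the chunk size can be applied to each chunk independently, here is a minimal sketch. It is not Aer code: the names `Amp`, `Chunk`, `apply_x_in_chunk`, and `apply_x_chunked` are hypothetical, and the state is assumed to be split into equally sized chunks of 2^chunk_bits amplitudes each.

```cpp
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the Aer data structures.
using Amp = std::complex<double>;
using Chunk = std::vector<Amp>;  // holds 2^chunk_bits amplitudes

// Apply a Pauli-X gate on `qubit` inside a single chunk.
// Precondition: qubit < chunk_bits, so both amplitudes of every
// swapped pair (i, i | mask) are stored in the same chunk.
void apply_x_in_chunk(Chunk& chunk, std::size_t qubit) {
  const std::size_t mask = std::size_t{1} << qubit;
  for (std::size_t i = 0; i < chunk.size(); ++i) {
    if ((i & mask) == 0) {
      std::swap(chunk[i], chunk[i | mask]);
    }
  }
}

// Because the gate never reaches across chunks, every chunk can be
// processed on its own (for example on a different GPU or process).
void apply_x_chunked(std::vector<Chunk>& chunks, std::size_t qubit) {
  for (auto& chunk : chunks) {
    apply_x_in_chunk(chunk, qubit);
  }
}
```

A gate on a qubit at or above the chunk size would pair amplitudes from different chunks, which is the case the inserted swap gates are meant to avoid.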

Please refer to this paper (https://arxiv.org/abs/2102.02957) for a more detailed description of the algorithm and implementation of the parallel simulation.

To simulate using multiple GPUs or multiple nodes on a cluster, the following configurations should be set in the backend options.
(If there is not enough memory to simulate the input circuit, Qiskit Aer sets the following options automatically, but setting them explicitly is recommended.)

- blocking_enable

55 changes: 55 additions & 0 deletions src/controllers/controller.hpp
@@ -51,6 +51,7 @@
#include "noise/noise_model.hpp"
#include "transpile/basic_opts.hpp"
#include "transpile/truncate_qubits.hpp"
#include "transpile/cacheblocking.hpp"

namespace AER {
namespace Base {
@@ -216,8 +217,19 @@ class Controller {
set_distributed_parallelization(const std::vector<Circuit> &circuits,
const std::vector<Noise::NoiseModel> &noise);

virtual bool multiple_chunk_required(const Circuit &circuit,
const Noise::NoiseModel &noise) const;

void save_exception_to_results(Result &result,const std::exception &e);


//setting cache blocking transpiler
Transpile::CacheBlocking transpile_cache_blocking(const Circuit& circ,
const Noise::NoiseModel& noise,
const json_t& config,
const size_t complex_size,bool is_matrix) const;


// Get system memory size
size_t get_system_memory_mb();
size_t get_gpu_memory_mb();
@@ -274,6 +286,8 @@ class Controller {
//process information (MPI)
int myrank_ = 0;
int num_processes_ = 1;

uint_t cache_block_qubit_ = 0;
};

//=========================================================================
@@ -348,6 +362,11 @@ void Controller::set_config(const json_t &config) {
JSON::get_value(accept_distributed_results_, "accept_distributed_results", config);
}

//read the number of cache blocking qubits if set in the config (0 otherwise)
cache_block_qubit_ = 0;
if(JSON::check_key("blocking_qubits", config)){
JSON::get_value(cache_block_qubit_,"blocking_qubits", config);
}
}

void Controller::clear_config() {
@@ -535,6 +554,21 @@ uint_t Controller::get_distributed_num_processes(bool par_shots) const
}
}

bool Controller::multiple_chunk_required(const Circuit &circ,
const Noise::NoiseModel &noise) const
{
if(circ.num_qubits < 3)
return false;

if(num_process_per_experiment_ > 1 || Controller::get_min_memory_mb() < required_memory_mb(circ, noise))
return true;

if(cache_block_qubit_ >= 2 && cache_block_qubit_ < circ.num_qubits)
return true;

return false;
}
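
For review purposes, the decision made by `multiple_chunk_required` can be sketched as a standalone function. The struct `ChunkDecisionInputs` and its field names are hypothetical stand-ins for the controller members and the `required_memory_mb(circ, noise)` result used above; the conditions themselves mirror the diff.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bundle of the values the controller consults.
struct ChunkDecisionInputs {
  std::uint64_t num_qubits;          // circ.num_qubits
  int num_process_per_experiment;    // processes assigned to one experiment
  std::size_t min_memory_mb;         // memory available per place
  std::size_t required_memory_mb;    // memory the circuit needs
  std::uint64_t cache_block_qubit;   // "blocking_qubits" option, 0 if unset
};

bool multiple_chunk_required(const ChunkDecisionInputs& in) {
  // Very small circuits are never split into chunks.
  if (in.num_qubits < 3) return false;

  // Distributed execution, or a state that does not fit in memory.
  if (in.num_process_per_experiment > 1 ||
      in.min_memory_mb < in.required_memory_mb) {
    return true;
  }

  // The user asked for a block size smaller than the circuit.
  if (in.cache_block_qubit >= 2 && in.cache_block_qubit < in.num_qubits) {
    return true;
  }
  return false;
}
```

Note that a `blocking_qubits` value of 0 or 1, or one that is not smaller than the circuit width, does not trigger chunking by itself.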

size_t Controller::get_system_memory_mb() {
size_t total_physical_memory = 0;
#if defined(__linux__) || defined(__APPLE__)
@@ -654,6 +688,27 @@ void Controller::save_exception_to_results(Result &result,const std::exception &
}
}

Transpile::CacheBlocking Controller::transpile_cache_blocking(const Circuit& circ,
const Noise::NoiseModel& noise,
const json_t& config,
const size_t complex_size,bool is_matrix) const
{
Transpile::CacheBlocking cache_block_pass;

cache_block_pass.set_config(config);
if(!cache_block_pass.enabled()){
//if blocking is not set by config, automatically set if required
if(multiple_chunk_required(circ,noise)){
int nplace = num_process_per_experiment_;
if(num_gpus_ > 0)
nplace *= num_gpus_;
cache_block_pass.set_blocking(circ.num_qubits, get_min_memory_mb() << 20, nplace, complex_size,is_matrix);
}
}

return cache_block_pass;
}
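
The arguments passed to `set_blocking` above can be read as follows: `nplace` is the number of processes times the number of GPUs (when any are present), and `get_min_memory_mb() << 20` converts MiB to bytes. The sketch below reproduces that arithmetic and adds one purely hypothetical helper, `max_qubits_in_memory`, to show how `complex_size` and `is_matrix` could bound the state size. It is an assumption for illustration and is not the algorithm implemented in `Transpile::CacheBlocking::set_blocking`.

```cpp
#include <cstddef>
#include <cstdint>

// Number of places the state is spread over, as computed in the diff.
std::size_t num_places(int num_processes, int num_gpus) {
  std::size_t nplace = static_cast<std::size_t>(num_processes);
  if (num_gpus > 0) nplace *= static_cast<std::size_t>(num_gpus);
  return nplace;
}

// MiB to bytes, the same shift used in the call to set_blocking.
std::uint64_t mb_to_bytes(std::uint64_t mb) { return mb << 20; }

// Hypothetical helper: largest qubit count whose state fits in `bytes`.
// A state vector stores 2^n amplitudes, a density matrix 2^(2n).
std::uint64_t max_qubits_in_memory(std::uint64_t bytes,
                                   std::uint64_t complex_size,
                                   bool is_matrix) {
  const std::uint64_t capacity = bytes / complex_size;  // amplitudes that fit
  std::uint64_t qubits = 0;
  while (true) {
    const std::uint64_t shift = is_matrix ? 2 * (qubits + 1) : (qubits + 1);
    // Stop when 2^shift amplitudes would exceed the capacity.
    if (shift >= 64 || (capacity >> shift) == 0) break;
    ++qubits;
  }
  return qubits;
}
```

With 16-byte double-precision amplitudes, this estimate shows why a density matrix simulation reaches the memory limit at roughly half the qubit count of a state vector simulation.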

//-------------------------------------------------------------------------
// Qobj execution
//-------------------------------------------------------------------------
94 changes: 28 additions & 66 deletions src/controllers/qasm_controller.hpp
@@ -215,11 +215,6 @@ class QasmController : public Base::Controller {
const Operations::OpSet &opset,
const json_t& config) const;


Transpile::CacheBlocking transpile_cache_blocking(const Circuit& circ,
const Noise::NoiseModel& noise,
const json_t& config) const;

//----------------------------------------------------------------
// Run circuit helpers
//----------------------------------------------------------------
@@ -306,9 +301,6 @@ class QasmController : public Base::Controller {

// Controller-level parameter for CH method
bool extended_stabilizer_measure_sampling_ = false;

//using multiple chunks
bool multiple_qregs_ = false;
};

//=========================================================================
@@ -381,11 +373,6 @@ void QasmController::set_config(const json_t& config) {
"QasmController: initial_statevector is not a unit vector");
}
}

//enable multiple qregs if cache blocking is enabled
if(JSON::check_key("blocking_enable", config)){
JSON::get_value(multiple_qregs_,"blocking_enable", config);
}
}

void QasmController::clear_config() {
@@ -407,7 +394,7 @@ void QasmController::run_circuit(const Circuit& circ,
// Validate circuit for simulation method
switch (simulation_method(circ, noise, true)) {
case Method::statevector: {
if(multiple_qregs_){
if(Base::Controller::multiple_chunk_required(circ,noise)){
if (simulation_precision_ == Precision::double_precision) {
// Double-precision Statevector simulation
return run_circuit_helper<StatevectorChunk::State<QV::QubitVector<double>>>(
@@ -440,7 +427,7 @@ void QasmController::run_circuit(const Circuit& circ,
"QasmController: method statevector_gpu is not supported on this "
"system");
#else
if(multiple_qregs_ || (parallel_shots_ > 1 || parallel_experiments_ > 1)){
if(Base::Controller::multiple_chunk_required(circ,noise) || (parallel_shots_ > 1 || parallel_experiments_ > 1)){
if (simulation_precision_ == Precision::double_precision) {
// Double-precision Statevector simulation
return run_circuit_helper<
@@ -478,7 +465,7 @@ void QasmController::run_circuit(const Circuit& circ,
"QasmController: method statevector_thrust is not supported on this "
"system");
#else
if(multiple_qregs_){
if(Base::Controller::multiple_chunk_required(circ,noise)){
if (simulation_precision_ == Precision::double_precision) {
// Double-precision Statevector simulation
return run_circuit_helper<
@@ -511,7 +498,7 @@ void QasmController::run_circuit(const Circuit& circ,
#endif
}
case Method::density_matrix: {
if(multiple_qregs_){
if(Base::Controller::multiple_chunk_required(circ,noise)){
if (simulation_precision_ == Precision::double_precision) {
// Double-precision density matrix simulation
return run_circuit_helper<
@@ -548,7 +535,7 @@ void QasmController::run_circuit(const Circuit& circ,
"QasmController: method density_matrix_gpu is not supported on this "
"system");
#else
if(multiple_qregs_ || (parallel_shots_ > 1 || parallel_experiments_ > 1)){
if(Base::Controller::multiple_chunk_required(circ,noise) || (parallel_shots_ > 1 || parallel_experiments_ > 1)){
if (simulation_precision_ == Precision::double_precision) {
// Double-precision density matrix simulation
return run_circuit_helper<
@@ -586,7 +573,7 @@ void QasmController::run_circuit(const Circuit& circ,
"this "
"system");
#else
if(multiple_qregs_){
if(Base::Controller::multiple_chunk_required(circ,noise)){
if (simulation_precision_ == Precision::double_precision) {
// Double-precision density matrix simulation
return run_circuit_helper<
@@ -938,42 +925,6 @@ Transpile::Fusion QasmController::transpile_fusion(Method method,
return fusion_pass;
}

Transpile::CacheBlocking QasmController::transpile_cache_blocking(const Circuit& circ,
const Noise::NoiseModel& noise,
const json_t& config) const
{
Transpile::CacheBlocking cache_block_pass;

cache_block_pass.set_config(config);
if(!cache_block_pass.enabled()){
//if blocking is not set by config, automatically set if required
if(Base::Controller::num_process_per_experiment_ > 1 || Base::Controller::get_min_memory_mb() < required_memory_mb(circ, noise)){
int nplace = Base::Controller::num_process_per_experiment_;
if(Base::Controller::num_gpus_ > 0)
nplace *= Base::Controller::num_gpus_;

size_t complex_size = (simulation_precision_ == Precision::single_precision) ? sizeof(std::complex<float>) : sizeof(std::complex<double>);

switch (simulation_method(circ, noise, false)) {
case Method::statevector:
case Method::statevector_thrust_cpu:
case Method::statevector_thrust_gpu:
cache_block_pass.set_blocking(circ.num_qubits, Base::Controller::get_min_memory_mb() << 20, nplace, complex_size,false);
break;
case Method::density_matrix:
case Method::density_matrix_thrust_cpu:
case Method::density_matrix_thrust_gpu:
cache_block_pass.set_blocking(circ.num_qubits, Base::Controller::get_min_memory_mb() << 20, nplace, complex_size,true);
break;
default:
throw std::runtime_error("QasmController: Not enough memory to simulate this method on the system");
}
}
}

return cache_block_pass;
}

void QasmController::set_parallelization_circuit(
const Circuit& circ,
const Noise::NoiseModel& noise_model) {
@@ -1148,9 +1099,19 @@ void QasmController::run_circuit_helper(const Circuit& circ,
auto fusion_pass = transpile_fusion(method, opt_circ.opset(), config);
fusion_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result);

auto cache_block_pass = transpile_cache_blocking(opt_circ,noise,config);
bool is_matrix = false;
if(method == Method::density_matrix || method == Method::density_matrix_thrust_gpu || method == Method::density_matrix_thrust_cpu)
is_matrix = true;
auto cache_block_pass = transpile_cache_blocking(opt_circ,noise,config,(simulation_precision_ == Precision::single_precision) ? sizeof(std::complex<float>) : sizeof(std::complex<double>),is_matrix);
cache_block_pass.optimize_circuit(opt_circ, dummy_noise, state.opset(), result);

uint_t block_bits = 0;
if(cache_block_pass.enabled())
block_bits = cache_block_pass.block_bits();

//allocate qubit register
state.allocate(Base::Controller::max_qubits_,block_bits);

// Run simulation
run_multi_shot(opt_circ, shots, state, initial_state, method, result, rng);
}
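
The `is_matrix` and `complex_size` computations appear twice in this diff (in `run_circuit_helper` and in `run_circuit_with_sampled_noise`). One possible way to factor them out is sketched below. The enums are local mirrors of the names used in the diff, and the helpers `is_density_matrix_method` and `complex_size` are hypothetical; they are not part of this PR.

```cpp
#include <complex>
#include <cstddef>

// Local mirrors of the enumerators referenced in the diff (assumed subset).
enum class Method {
  statevector,
  statevector_thrust_gpu,
  statevector_thrust_cpu,
  density_matrix,
  density_matrix_thrust_gpu,
  density_matrix_thrust_cpu
};

enum class Precision { single_precision, double_precision };

// True for the three density matrix methods checked in the diff.
bool is_density_matrix_method(Method method) {
  return method == Method::density_matrix ||
         method == Method::density_matrix_thrust_gpu ||
         method == Method::density_matrix_thrust_cpu;
}

// Size in bytes of one amplitude for the chosen precision.
std::size_t complex_size(Precision precision) {
  return precision == Precision::single_precision
             ? sizeof(std::complex<float>)
             : sizeof(std::complex<double>);
}
```

With helpers like these, both call sites could pass `complex_size(simulation_precision_)` and `is_density_matrix_method(method)` to `transpile_cache_blocking` instead of repeating the expressions inline.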
@@ -1179,9 +1140,6 @@ void QasmController::run_multi_shot(const Circuit& circ,
// Implement measure sampler
auto pos = circ.first_measure_pos; // Position of first measurement op

//allocate qubit register
state.allocate(Base::Controller::max_qubits_);

// Run circuit instructions before first measure
std::vector<Operations::Op> ops(circ.ops.begin(),
circ.ops.begin() + pos);
@@ -1197,9 +1155,6 @@ void QasmController::run_multi_shot(const Circuit& circ,
// Add measure sampling metadata
result.metadata.add(true, "measure_sampling");
} else {
//allocate qubit register
state.allocate(Base::Controller::max_qubits_);

// Perform standard execution if we cannot apply the
// measurement sampling optimization
while (shots-- > 0) {
@@ -1225,10 +1180,10 @@ void QasmController::run_circuit_with_sampled_noise(const Circuit& circ,
measure_pass.set_config(config);
Noise::NoiseModel dummy_noise;

auto cache_block_pass = transpile_cache_blocking(circ,noise,config);

//allocate qubit register
state.allocate(Base::Controller::max_qubits_);
bool is_matrix = false;
if(method == Method::density_matrix || method == Method::density_matrix_thrust_gpu || method == Method::density_matrix_thrust_cpu)
is_matrix = true;
auto cache_block_pass = transpile_cache_blocking(circ,noise,config,(simulation_precision_ == Precision::single_precision) ? sizeof(std::complex<float>) : sizeof(std::complex<double>),is_matrix);

// Sample noise using circuit method
while (shots-- > 0) {
@@ -1238,6 +1193,13 @@ void QasmController::run_circuit_with_sampled_noise(const Circuit& circ,
fusion_pass.optimize_circuit(noise_circ, dummy_noise, state.opset(), result);
cache_block_pass.optimize_circuit(noise_circ, dummy_noise, state.opset(), result);

uint_t block_bits = 0;
if(cache_block_pass.enabled())
block_bits = cache_block_pass.block_bits();

//allocate qubit register
state.allocate(Base::Controller::max_qubits_,block_bits);

run_single_shot(noise_circ, state, initial_state, result, rng);
}
}