I am a Distributed Systems Engineer at
which I joined in
August 2014. Before that, I completed my PhD at the
College of Computer and Information Science
at Northeastern University,
where I worked with Gene Cooperman
in the High Performance Computing Lab.
Before coming to Northeastern, I received my Bachelor of Science from
Jai Narain Vyas University, Jodhpur, India.
I can be reached by email at kapil "at" ccs.neu.edu.
User-Space Process Virtualization in the Context of Checkpoint-Restart and Virtual Machines
Operating Systems, High Performance Computing and related areas.
Current Research / Projects:
- Distributed Multi-Threaded Checkpointing (DMTCP)
- DMTCP is a tool for transparently checkpointing
the state of a distributed program spread across many machines without modifying the user's program or the operating system kernel.
- The checkpoint image can later be used to restore the program after a node or process failure, or
to migrate it to another homogeneous system.
- Fast Reversible Debugger (FReD)
- FReD (Fast Reversible Debugger) is a new system that automatically
performs a temporal search over the process lifetime to rapidly
travel back in time to an earlier point of interest.
- Two important components of FReD are deterministic replay and
checkpointing. Deterministic replay is a prerequisite for such a
system. Checkpoints are used to speed up the search.
- FReD can reversibly debug multithreaded applications.
- FReD also supports reverse expression watchpoints, a form of
temporal search within a process lifetime. For example, suppose the current
value of a user-specified expression indicates a bug. The user asks
FReD/gdb to go back in time to the statement where the expression is
about to take on its current value. This uses binary search: if n
statements have been executed, FReD finds the point in time using
only about log2 n evaluations of the expression.
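The binary search behind reverse expression watchpoints can be illustrated with a small sketch. This is not FReD's implementation. It assumes the expression changes to its current value exactly once and keeps that value afterwards, and it stands in for checkpoint-restart plus deterministic replay with a hypothetical `replay(k)` function that returns the program state after the first k statements. The names `find_first_step` and `replay` are invented for this example.

```python
from typing import Callable, Tuple


def find_first_step(n: int, holds_at: Callable[[int], bool]) -> Tuple[int, int]:
    """Binary-search steps 1..n for the first step at which holds_at is True.

    Assumes holds_at is False before some step and True from that step on,
    and that holds_at(n) is True (the expression has its current value now).
    Returns (step, number_of_evaluations).
    """
    lo, hi = 1, n          # invariant: the answer lies in [lo, hi]
    evaluations = 0
    while lo < hi:
        mid = (lo + hi) // 2
        evaluations += 1
        if holds_at(mid):  # value already present at mid: look earlier
            hi = mid
        else:              # value not yet present: look later
            lo = mid + 1
    return lo, evaluations


def replay(k: int) -> int:
    """Hypothetical deterministic replay: value of x after k statements."""
    x = 0
    for i in range(1, k + 1):
        if i == 700:
            x = -1         # the assignment that introduces the bad value
    return x


if __name__ == "__main__":
    step, evals = find_first_step(1000, lambda k: replay(k) == -1)
    print(f"expression takes its current value at statement {step} "
          f"after {evals} evaluations")
```

With n = 1000 statements the search needs at most ceil(log2 1000) = 10 evaluations of the expression. In a real debugger each evaluation would restart from the nearest checkpoint rather than replaying from the beginning, which is where checkpoints speed up the search.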
System-level Checkpoint-Restart for Petascale Computing
Jiajun Cao, Kapil Arya, Gene Cooperman, Rohan Garg, Khaled Hamidouche,
Shawn Matott, D.K. Panda, Jonathan Perkins, Hari Subramoni,
IEEE International Conference on Parallel and Distributed Systems
Design and Implementation for Checkpointing of Distributed Resources
using Process-level Virtualization.
Kapil Arya, Rohan Garg, Artem Polyakov, Gene Cooperman.
IEEE International Conference on Cluster Computing, (Cluster '16).
Extended Batch Sessions and Three-Phase Debugging:
Using DMTCP to Enhance the Batch Environment
Rohan Garg, Jiajun Cao, Kapil Arya, Gene Cooperman, Jerome Vienne
Annual Conference on Extreme Science and Engineering Discovery Environment, (XSEDE '16).
Miami, Florida, USA.
DMTCP: bringing interactive checkpoint-restart to Python.
Kapil Arya and Gene Cooperman
Scientific Computing with Python (SciPy'13), Computational Science & Discovery
User-Space Process Virtualization in the Context of Checkpoint-Restart and Virtual Machines (Ph.D. Thesis)
Boston, MA. August, 2014.
- Transparent Checkpoint-Restart over InfiniBand.
Jiajun Cao, Gregory Kerr, Kapil Arya and Gene Cooperman.
ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC'14).
- Tesseract: Reconciling Guest I/O and Hypervisor Swapping in a VM.
Kapil Arya, Yury Baskakov, and Alex Garthwaite.
ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE'14).
Salt Lake City, Utah, USA.
- Explorations of the Viability of ARM and Xeon Phi for Physics Processing.
David Abdurachmanov, Kapil Arya, Josh Bendavid, Tommaso Boccali,
Gene Cooperman, Andrea Dotti,
Peter Elmer, Giulio Eulisse,
Francesco Giacomini, Christopher D. Jones, Matteo Manzali,
International Conference on Computing in High Energy and Nuclear Physics (CHEP'13).
- Use of Checkpoint-Restart for Complex HEP Software on Traditional Architectures and Intel MIC.
Kapil Arya, Gene Cooperman, Andrea Dotti and Peter Elmer.
International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT'13).
- Semi-Automated Debugging via Binary Search through a Process Lifetime.
Kapil Arya, Tyler Denniston, Ana-Maria Visan, Gene Cooperman.
Workshop on Programming Languages and Operating Systems (PLOS '13).
- Towards Fault-Tolerant Energy-Efficient High Performance Computing in the Cloud.
Kurt L. Keville, Rohan Garg, David J. Yates, Kapil Arya, Gene Cooperman.
IEEE International Conference on Cluster Computing, (Cluster '12).
- URDB: A Universal Reversible
Debugger Based on Decomposing Debugging Histories.
Ana-Maria Visan, Kapil Arya, Gene Cooperman, and Tyler Denniston.
Workshop on Programming Languages and Operating Systems (PLOS '11).
- DMTCP: Transparent
Checkpointing for Cluster Computations and the Desktop.
Jason Ansel, Kapil Arya, and Gene Cooperman.
IEEE International Parallel and Distributed Processing Symposium
Rome, Italy. May, 2009.
- Detecting and Suppressing Redundant Input-Output Operations.
Alex Garthwaite, Maxime Austruy, Kapil Arya.
- Reducing Latency of Read I/O Operations in a VM Based on Prior Write Patterns.
Kapil Arya, Yury Baskakov, Alex Garthwaite.
- VMware, Inc. (Summer 2008 - Summer 2013)
MTS Intern, Virtual Machine Monitor Group, Research & Development
- Avidyne Corporation (Jan 2007 - Aug 2007)
Software Engineering Coop, Display Technologies Engineering group