Journal Articles

Peng, J. and Williams, R. J. (1996).
Incremental multi-step Q-learning.
*Machine Learning, 22*, 283-290.

Peng, J. and Williams, R. J. (1993).
Efficient learning and planning within the Dyna framework.
*Adaptive Behavior, 2*, 437-454.

Williams, R. J. (1992).
Simple statistical gradient-following algorithms
for connectionist reinforcement learning.
*Machine Learning, 8*, 229-256.

Williams, R. J. and Peng, J. (1991).
Function optimization using connectionist
reinforcement learning algorithms.
*Connection Science, 3*, 241-268. [Figures not included]

Williams, R. J. and Peng, J. (1990).
An efficient gradient-based algorithm for on-line
training of recurrent network trajectories.
*Neural Computation, 2*, 490-501.

Williams, R. J. and Zipser, D. (1989).
A learning algorithm for continually running
fully recurrent neural networks.
*Neural Computation, 1*, 270-280.

Conference and Workshop Presentations

Al-Ansari, M. A. and Williams, R. J. (1999).
Efficient, globally-optimized reinforcement learning with the
parti-game algorithm.
*Advances in Neural Information Processing Systems 11*.

Al-Ansari, M. A. and Williams, R. J. (1998).
Modifying the parti-game algorithm for increased robustness, higher
efficiency, and better policies.
*Proceedings of the Tenth Yale Workshop on Adaptive and Learning
Systems*, June, New Haven, CT, 204-209.

Peng, J. and Williams, R. J. (1994).
Incremental multi-step Q-learning.
*Proceedings of the Eleventh International Conference on
Machine Learning*, July, New Brunswick, NJ, 226-232.

Williams, R. J. and Baird, L. C., III (1994).
Tight performance bounds on greedy policies based on imperfect
value functions.
*Proceedings of the Eighth Yale Workshop on Adaptive
and Learning Systems*, June, New Haven, CT, 108-113.

Williams, R. J. (1992).
Training recurrent networks using the extended Kalman filter.
*Proceedings of the International Joint
Conference on Neural Networks*, June, Baltimore, MD, Vol. IV, 241-246.

Williams, R. J. and Baird, L. C., III (1990).
A mathematical analysis of actor-critic architectures for learning
optimal controls through incremental dynamic programming.
*Proceedings of the Sixth Yale Workshop on Adaptive
and Learning Systems*, August, New Haven, CT, 96-101.

Williams, R. J. and Peng, J. (1989).
Reinforcement learning algorithms as function optimizers.
*Proceedings of the International Joint
Conference on Neural Networks*, Washington, DC, Vol. II, 89-95.

Book Chapters

Williams, R. J. and Zipser, D. (1995).
Gradient-based learning algorithms for recurrent networks
and their computational complexity.
In: Y. Chauvin and D. E. Rumelhart (Eds.)
*Back-propagation: Theory, Architectures and Applications*,
Hillsdale, NJ: Erlbaum.
Also appeared as:
Gradient-based learning algorithms for recurrent connectionist
networks. Technical Report NU-CCS-90-9.
Boston: Northeastern University, College of Computer Science.
[Figures not included]

Miller, S. and Williams, R. J. (1995).
Temporal difference learning: A chemical process control application.
In: A. F. Murray (Ed.)
*Applications of Artificial Neural Networks*,
Norwell, MA: Kluwer.

Williams, R. J. (1990).
Adaptive state representation and estimation using recurrent
connectionist networks.
In: W. T. Miller, R. S. Sutton, and P. J. Werbos (Eds.)
*Neural Networks for Control*,
Cambridge: MIT Press/Bradford Books.

Technical Reports

Al-Ansari, M. A. and Williams, R. J. (1998). Modifying the parti-game algorithm for increased robustness, higher efficiency, and better policies. Technical Report NU-CCS-98-13. Boston: Northeastern University, College of Computer Science.

Williams, R. J. and Baird, L. C., III (1993). Tight performance bounds on greedy policies based on imperfect value functions. Technical Report NU-CCS-93-14. Boston: Northeastern University, College of Computer Science.

Williams, R. J. and Baird, L. C., III (1993). Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems. Technical Report NU-CCS-93-11. Boston: Northeastern University, College of Computer Science.

Williams, R. J. (1992). Some observations on the use of the extended Kalman filter as a recurrent network learning algorithm. Technical Report NU-CCS-92-1. Boston: Northeastern University, College of Computer Science.

Williams, R. J. and Zipser, D. (1990). Gradient-based learning algorithms for recurrent connectionist networks. Technical Report NU-CCS-90-9. Boston: Northeastern University, College of Computer Science. [Figures not available]

Greene, R. L. and Williams, R. J. (1989). An approach to using rule-like training data in connectionist networks. Technical Report NU-CCS-89-30. Boston: Northeastern University, College of Computer Science. [Figures not available]