 |
Projects & Research |
|
The following are projects from a mix of peer-reviewed publications, thesis
work, class work, professional projects, and pet projects. For a formal
listing of work, see my resume [pdf]. The
catoragization used below is fuzzy and many of the projects involve techniques
from several areas.
Note: This listing is somewhat out of date. The last update was during
the fall semester of 2005.
|
Search and Optimization
Machine Learning and Data Mining
Visualization
Other
Presentation
|
This is my primary current research project and tentative PhD thesis topic. See my Current Status for some details. You can also refer to a set of Presentation Slides on the topic.
We propose a system for determining the right algorithm and its
parameters given a specific problem to solve. This method is
particularly useful when: there are many possible algorithm variants for
a single problem; the appropriate algorithm is dependent on the actual
problem instance being solved; there are no simple rules mapping
problem instance to algorithm variant; and the algorithms are
dependent on implementation and computing architecture. The system
involves methods from multiobjective optimization, case based
reasoning, and supervised learning. The process of
constructing the automatic algorithm selection system is highlighted with an
illustrative example: the high performance solving of linear systems of
equations. |
|
Report
|
A comparison of 19 multiobjective evolutionary algorithms (MOEAs). These algorithms are useful in optimizing several, usually conflicting, objectives simultaneously. Unlike more traditional multiobjective optimization techniques, MOEAs work well even when the Pareto frontier of optimal solutions is non-convex, discontinuous, or ortherwise poorly behaved.
The reviwed algorithms include: NPGA, NPGA II, NSGA, NSGA II, SPEA, SPEA2, SPEA2+, ISPEA, PAES, M-PAES, PESA, PESA II, Micro-GA, Micro-GA2, MPGA, CAEP, MOPSO, and ParEGO |
|
Report
|
Peter G. Anderson, Jonathan S. Arney, Samuel A. Inverso, Daniel R. Kunkle, Timothy M. Lebo, and Chadd Merrigan. A Genetic Algorithm Search for Improved Halftone Masks. Artificial Neural Networks in Engineering, 2003.
We present a genetic algorithm that automatically generates halftone masks optimized for use in specific printing systems. The search is guided by a single figure of merit based on a detailed model of the printing process and the human visual system. Our experiments show that genetic algorithms are effective in finding improved halftone masks and that two methods of reducing the search space to particular subsets of possible halftone masks greatly enhance the performance of the GA. |
|
Report
|
We employ genetic algorithms to find a set of state transition rules for cellular automata that lead to global synchronization. This is an extension of research conducted by the Evolving Cellular Automata (EvCA) group at the Santa Fe Institute. |
|
Report
|
This paper describes methods for evolving 2-D cellular automata to perform global computations. This is a difficult task because global behaviors must arise from local computations of many parallel cells. We present the results of numerous tests involving different genetic algorithm methods to perform the 2-D equivalent of classic 1-D CA tasks, including density classification and synchronization, and our own 2-D CA balanced surface minimization task. The performance of the GA was improved greatly by the use of totalistic CA rule tables, increasing the fidelity of fitness functions, and with coevolutionary techniques.
Note: this project is an extension of the above project "Searching for a Synchronizing 2-D Cellular Automata" and is co-authored by Samuel Inverso and Chadd Merrigan. |
|
Report
|
This is an extension of the Ordered Greed technique introduced by Anderson and Gustafson in [1]. The Ordered Greed algorithm (OG) is a hybrid greedy algorithm using permutations. OG is applicable to many classical optimization problems, including job assignment, graph coloring, bin packing, and satisfiability. Here, OG is applied to a variant of the N Queens problem, the N SuperQueens problem. The performance of the OG method and that of the Plain Permutation GA in solving the N SuperQueens problem are compared. The results here support those found in [1], that the OG method is far superior to the Plain Permutation GA.
[1] Peter G. Anderson and William T. Gustafson. Ordered Greed. |
|
Report
|
In general, this paper is about self-organizing systems (which are responsible for most, if not all of the complex order we see in our universe) and how the science of self-organization can be applied to computation and information systems. In specific, this paper will examine ant systems and algorithms as an example of a self-organizing computation system. Further, the ant algorithm will be applied to a classic optimization algorithm benchmark, the Traveling Salesman Problem (TSP). |
|
Report
|
We explore the effects of varying GA parameters such as mutation rate and population size. A MATLAB implementation of a generation based GA is also given. |
|
Report
|
The 8 Puzzle is a simple game, but one with a state space large enough to warrant the use of heuristic search, as opposed to an exhaustive or blind search. The A* algorithm is applied, guaranteeing that the best solution (that with the least number of moves) will be found. Heuristics are examined to allow the algorithm to find the optimal solution while examining as few states as possible (maximizing the informedness of the heuristic). |
|
Report
|
Master's Thesis in Computer Science, Rochester Institute of Technology, 2003.
Abstract: Cellular automata, a class of discrete dynamical systems, show a wide range of dynamic behavior, some very complex despite the simplicity of the system's definition. This range of behavior can be organized into six classes, according to the Li-Packard system: null, fixed point, two-cycle, periodic, complex, and chaotic. An advanced method for automatically classifying cellular automata into these six classes is presented. Seven parameters were used for automatic classification, six from existing literature and one newly presented. These seven parameters were used in conjunction with neural networks to automatically classify an average of 98.3% of elementary cellular automata and 93.9% of totalistic k=2 r=3 cellular automata. In addition, the seven parameters were ranked based on their effectiveness in classifying cellular automata into the six Li-Packard classes. |
|
Report
|
|
G. Cooperman, D. Zhang, and D. Kunkle. “On Mining Max Frequent Generalized Itemsets”, submitted.
A fundamental task of data mining is to mine frequent itemsets. Since the number of frequent itemsets may be large, a compact
representation, namely the max frequent itemsets, has been introduced.
On the other hand, the concept of generalized itemsets was proposed.
Here, the items form a taxonomy. Although the transactional database
only contains items in the leaf level of the taxonomy, a generalized itemset may contain some non-leaf generalized items. Naturally, an interesting
question arises: can we combine the two concepts and develop algorithms
to mine max frequent generalized itemsets? This is a compact representation of the set of all frequent generalized itemsets. To the best of our
knowledge, this paper is the first work that efficiently solves this new
problem. Our solution has the following components: a conceptual classification tree, the algorithm MFGI class that dynamically generates
the needed part of the conceptual classification tree by applying three
pruning techniques, an online method for checking false positives, and an
optimization technique called PHDB that batch-compute frequencies.
Besides market-basket analysis, our technique has a vast range of other
applications. For instance, identifying associations among (categories of )
diseases, identifying associations among (groups of ) occupations, and so on. Experimental results demonstrate the efficiency of our approach. |
|
Report
|
G. Cooperman, D. Zhang, and D. Kunkle. “On Mining Essential Generalized Association Rules”, submitted.
A recent direction of association rule mining is to mine
generalized association rules. Here, the items form a taxonomy, and “generalized items”, which correspond to internal nodes of the taxonomy, may appear in generalized
itemsets and association rules. Since the number of strong
generalized association rules are large, compact representations become especially important. In ordinary rule mining,
there exist good compact representations, e.g. the essential
rules. However, in the generalized environment there are
special challenges not present in the ordinary cases. Nevertheless, the new problem has important applications. As
an example, in the market-basket example, we now can find
associations among categories of items, rather than being
restricted to the associations among individual items. Besides market-basket analysis, our methods have a vast range
of other applications as well. Examples include identifying
associations among (categories of) diseases, identifying associations among (groups of) occupations, and so on. This
paper presents, for the first time, algorithms to mine essential generalized rules. A lattice-based algorithm and a
classification-based algorithm are proposed and compared
experimentally. |
|
Report
|
Pulsed neural networks are networks of spiking neurons, which represent an entirely new class of artificial neurons. Here we present an overview of pulsed neural networks, including the structure, function and available training mechanisms for networks of spiking neurons. We highlight differences between this model, “first generation” threshold gates, and “second generation” sigmoid activation gates, and we examine current research into pulsed neural networks and the use of temporal information in neural processing. Lastly, we summarize our own research toward the end of using pulsed neural networks to identify computer users by the cadence of their keystrokes. |
|
Report
|
The empirical time complexities of several algorithms for finding the Length of Longest Common Subsequence (LLCS) and the actual Longest Common Subsequence (LCS) of two sequences are examined. These algorithms include a naive recursive algorithm, a recursive method with memoization, dynamic programming, and the Hirschberg LCS algorithm. The empirical time complexities are compared to theoretical time complexities in order to find the "hidden" constant ignored by the theoretical limits. Further, the effects of sequence structure and alphabet size on empirical time complexity are examined. |
|
Report
|
|
Randomized algorithms are now common in nearly all areas of
computer science. The ability to make random choices has been shown to
both simplify algorithms and improve their running times. The success
of randomized algorithms in efficiently solving problems for which no
known deterministic algorithm exists has fueled theoretical research
in the area of probabilistic computation. We present the foundations
and some current research related to the theory of randomized computation.
Particularly, we: (1) explore the effects of randomness on
computational power; (2) present a hierarchy of randomized complexity
classes in relation to the traditional complexity classes; (3) discuss
probabilistic models of computation and (4) summarize current
research in the derandomization of randomized algorithms and complexity
classes. |
|
Report
|
Guerin, S. and Kunkle, D. (2004) Emergence of constraint in self-organizing systems. Journal of Nonlinear Dynamics, Psychology, and Life Sciences, Vol. 8, No. 2, April, 2004.
Abstract: Practitioners of agent-based modeling are often tasked to
model or design self-organizing systems while the theoretical foundation
of self-organization in science remains to be set. This paper explores
self-organization in the context of an agent-based model of ant colony
food foraging. We gather specific measures of order-creation and
constraint construction particular to leading theories of nonequilibrium
thermodynamics that purport to govern self-organizing dynamics. These
measures are used to explore three claims: (a) Constraints are
constructed from entropy-producing processes in the bootstrapping
phase of self-organizing systems; (b) positive feedback loops are critical
in the structure formation phase; and (c) constraints tend to decay. The
continued presence of far-from-equilibrium boundary conditions are
required to reinforce constraints in the maintenance phase.
Key Words: constraint, self-organization, entropy, ant simulation, agent-
based modeling. |
|
Report
|
Gambhir, M., Guerin, S., Kauffman, S., Kunkle, D. (2004) Steps toward a possible theory of organization. In: Proceedings of International Conference on Complex Systems 2004. Boston, MA.
We anticipate a theory of organization to generalize the common structuring processes
present in Rayleigh-Bénard convection, lasers, cellular slime molds, immune systems,
genetic regulatory networks, neural systems, and social insect systems. A robust theory
should also apply to systems of more commercial interest including firms, supply webs,
financial markets, transportation systems, manufacturing production lines and consumer
markets. This paper explores how Kauffman’s Autonomous Agent might be used as a
foundation for such a theory of organization and will use the context of a familiar
computational model of an ant foraging system to demonstrate how the emergence and
degradation of constraints simultaneously define the process of organization. |
|
Boyle, S., Guerin, S., and Kunkle, D. (2006) . An Application of Multi-Agent Simulation to Policy Appraisal in the Criminal Justice System. In Chen, S. H., Jain, L., and Tai, C. C. (Eds.), Computational Economics: A Perspective from Computational Intelligence. Hershey, PA : Idea Group [book chapter]
Chapter XII, An Application of Multi-Agent Simulation to Policy Appraisal in the Criminal Justice System, by Seán Boyle, Stephen Guerin, and Daniel Kunkle, illustrates an interesting case of the criminal justice system in England. In an intricate system such as the CJS in England, three distinct departments are involved in these affairs. The diverse government bodies have reciprocal influences on each other and therefore any policy taken by one of them will have complex impacts on the system. Hence Boyle et al. have developed an agent-based program to simulate the whole system, and on assessment of the impact across the whole justice system of a variety of policies is thus possible. |
|
Report
[full proceedings of Agent 2004, 47.1 MB, see pages 651-665]
|
M. Agar, S. Guerin, R. Holmes, and D. Kunkle. “DrugSim: the Structure of Agent-based
Revolutions”, Proceedings of Agent 2004: Social Dynamics: Interaction, Reflexivity and
Emergence, Chicago, October 2004.
On the basis of ethnographic work with youth in Baltimore County, an agent-based
model was developed to test the finding that the stories that circulated about a drug
(heroin) explained the rise and fall in the curves showing data on heroin epidemic
incidences. This paper features the use of design of experiment approaches to evaluate
the parameters in that model. Results of the analysis suggest that drug epidemics can be
better understood as the diffusion of a commodity rather than an infectious disease, which
is the view of the medically dominated substance abuse field. Policy implications of this
change in view are sketched in the conclusion. |
|
Report
|
Go is two-player board game that poses one of
the largest challenges to game programmers
today. Multiagent systems are those composed of
multiple autonomous, interacting computer
systems, or agents. These two seemingly diverse
areas have a number of similarities, which will
be explored here. |
|
|
|
 |
|