Experience

This paper addresses the problem of steering a swarm of autonomous agents out of an unknown graph to a goal at an unknown location. To address this task, an ε-greedy, collaborative reinforcement learning method using only local information exchanges is introduced to balance exploitation and exploration in the unknown graph and to optimize the ability of the swarm to exit from the graph. The learning and routing algorithm provides a mechanism for storing the data needed to represent the collaborative utility function, based on the experiences of previous agents visiting a node, so that routing decisions improve with time. Two theorems show the theoretical soundness of the proposed learning method and illustrate the importance of the stored information in improving decision-making for routing. Simulation examples show that the introduced simple rules of learning from past experience significantly improve performance over random search and search based on ant colony optimization, a metaheuristic algorithm.
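
As a rough illustration of the idea described in this abstract, the sketch below shows ε-greedy routing over values stored at graph nodes. It is not the paper's algorithm: the function names `epsilon_greedy_next` and `run_agent`, the `utility` dictionary, the discount factor `gamma`, and the backward update rule are all assumptions made for this example.

```python
import random


def epsilon_greedy_next(node, graph, utility, epsilon, rng=random):
    """Choose the next node: explore with probability epsilon, else exploit."""
    neighbors = graph[node]
    if rng.random() < epsilon:
        return rng.choice(neighbors)  # explore: uniformly random neighbor
    # Exploit: neighbor with the highest stored utility (unseen nodes count as 0).
    return max(neighbors, key=lambda n: utility.get(n, 0.0))


def run_agent(graph, start, goal, utility, epsilon=0.1, gamma=0.9,
              max_steps=1000, rng=random):
    """Walk one agent through the graph and, on success, update stored values.

    The update rule is a hypothetical one: every node on the successful path
    keeps the larger of its old value and a reward discounted by gamma per
    step of distance from the goal along that path.
    """
    node, path = start, [start]
    for _ in range(max_steps):
        if node == goal:
            break
        node = epsilon_greedy_next(node, graph, utility, epsilon, rng)
        path.append(node)
    if node == goal:
        value = 1.0
        for visited in reversed(path):
            utility[visited] = max(utility.get(visited, 0.0), value)
            value *= gamma
    return path
```

Later agents calling `run_agent` with the same `utility` dictionary reuse what earlier agents stored, which is the sense in which information is shared only through the nodes themselves in this sketch.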

This paper introduces a graphical coalitional game in which the internal topology of the coalition depends on a prescribed communication graph structure among the agents. The game Value Function is required to satisfy four Axioms of Value. These axioms make it possible to provide a refined study of coalition structures on graphs by defining a formal graphical game and by assigning a Positional Advantage, based on the Shapley value, to each agent in a coalition according to its connectivity properties within the graph. Using the Axioms of Value, the graphical coalitional game can be shown to satisfy properties such as convexity, fairness, cohesiveness, and full cooperativeness. Three measures of the contributions of agents to a coalition are introduced: marginal contribution, competitive contribution, and altruistic contribution. The mathematical framework given here is used to establish results regarding the dependence of these three types of contributions on the graph topology, and changes in these contributions due to changes in graph topology. Based on these different contributions, three online sequential decision games are defined on top of the graphical coalitional game, and the stable graphs under each of these sequential decision games are studied. It is shown that, under the objective of maximizing the marginal contribution, any connected graph is stable. Under the objective of maximizing the competitive contribution, the stable graph is the complete graph. Under the objective of maximizing the altruistic contribution, any tree is stable.
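
To make the Shapley value underlying the Positional Advantage concrete, the following minimal sketch computes exact Shapley values by averaging marginal contributions over all player orderings. The value function used here, the number of edges in the subgraph induced by a coalition, is a stand-in chosen for illustration; it is not the paper's Value Function, and the names `induced_edge_count` and `shapley_values` are hypothetical.

```python
from itertools import permutations
from math import factorial


def induced_edge_count(coalition, edges):
    """Illustrative value function: edges with both endpoints in the coalition."""
    return sum(1 for u, v in edges if u in coalition and v in coalition)


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    Enumerates n! orderings, so this is only practical for small player sets.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        members = set()
        for p in order:
            before = value(members)
            members.add(p)
            totals[p] += value(members) - before  # marginal contribution of p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Path graph 0 - 1 - 2: the middle agent is better connected.
edges = [(0, 1), (1, 2)]
phi = shapley_values([0, 1, 2], lambda s: induced_edge_count(s, edges))
print(phi)  # {0: 0.5, 1: 1.0, 2: 0.5}
```

Under this stand-in value function each edge is split equally between its endpoints, so an agent's share is half its degree, which shows how a Shapley-based score can reflect position in the graph.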

In heterogeneous battlefield teams, the balance between team and individual objectives forms the basis for the internal topological structure of teams. The team structure is studied by presenting a graphical coalitional game (GCG) with Positional Advantage (PA). PA is the Shapley value strengthened by the Axioms of Value. The notion of team and individual objectives is studied by defining the altruistic and competitive contributions made by an individual; these are components of the individual's total, or marginal, contribution. Moreover, the paper examines dynamic team effects by defining three online sequential decision games, based on the marginal, competitive, and altruistic contributions of the individuals towards the team. The stable graphs under these sequential decision games are studied and found to be any connected graph, the complete graph, or any tree, respectively.

This paper presents a Q-learning method to solve the discounted linear quadratic regulator (LQR) problem for continuous-time (CT) continuous-state systems. Most methods in the existing literature for solving the LQR problem for CT systems need partial or complete knowledge of the system dynamics. Q-learning is effective for unknown dynamical systems, but has generally been well understood only for discrete-time systems. The contribution of this paper is a Q-learning methodology for CT systems that solves the LQR problem without any knowledge of the system dynamics. A natural and rigorously justified parameterization of the Q-function is given in terms of the state, the control input, and its derivatives. This parameterization allows the implementation of an online Q-learning algorithm for CT systems. Simulation results supporting the theoretical development are also presented.
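
For orientation, the discounted CT LQR problem named in this abstract is commonly written in the form below. This is the standard textbook statement, not a formula taken from the paper; the symbols $A$, $B$, $M$, $R$, and $\gamma$ are assumptions introduced here, and the paper's own Q-function parameterization is not reproduced because the abstract does not state it.

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad
V\bigl(x(t)\bigr) = \int_{t}^{\infty} e^{-\gamma(\tau - t)}
  \Bigl( x(\tau)^{\top} M\, x(\tau) + u(\tau)^{\top} R\, u(\tau) \Bigr)\, d\tau,
\qquad M \succeq 0,\; R \succ 0,\; \gamma \ge 0 .
```

In this notation, "without any knowledge of the system dynamics" means that the controller is learned without access to $A$ or $B$.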

This paper addresses the problem of steering a swarm of autonomous agents out of an unknown maze to a goal at an unknown location. This problem arises in particular where no direct communication between the agents is possible and all information exchange between agents has to occur indirectly through information “deposited” in the environment. To address this task, an ε-greedy, collaborative reinforcement learning method using only local information exchanges is introduced to balance exploitation and exploration in the unknown maze and to optimize the ability of the swarm to exit from the maze. The learning and routing algorithm provides a mechanism for storing the data needed to represent the collaborative utility function, based on the experiences of previous agents visiting a node, so that routing decisions improve with time. Two theorems show the theoretical soundness of the proposed learning method and illustrate the importance of the stored information in improving decision-making for routing. Simulation examples show that the introduced simple rules of learning from past experience significantly improve performance over random search and search based on Ant Colony Optimization, a metaheuristic algorithm.

In heterogeneous battlefield teams, the balance between team and individual objectives forms the basis for the internal topological structure of teams. The stability of team structure is studied by presenting a graphical coalitional game (GCG) with Positional Advantage (PA). PA is the Shapley value strengthened by the Axioms of Value. The notion of team and individual objectives is studied by defining the altruistic and competitive contributions made by an individual; these are components of the individual's total, or marginal, contribution. Moreover, the paper examines dynamic team effects by defining three online sequential decision games based on the marginal, competitive, and altruistic contributions of the individuals towards the team. The stable graphs under these sequential decision games are studied and found to be structurally connected, complete, or tree graphs, respectively.

In this paper, the growth of the telecommunication sector in Pakistan and the consequent development of related professional education are studied. A widening gap between the telecommunication industry and the associated education sector is identified. Higher educational programs in Pakistan have grown very rapidly to meet the needs of the explosive growth in the telecommunications engineering sector, but this growth is not synchronized with the requirements of the industry because there is no collaboration or cooperation between the two. Professional education in telecommunication in Pakistan and the higher educational degree programs are focused specifically on producing quality graduates with refined technical and mathematical skills, while the telecom sector in Pakistan is in principle a service provider and a consumer market that mainly requires engineers for operation and maintenance activities. As such, the skills imparted by the education sector are rarely utilized, which results in dissatisfaction among telecommunications engineers. A survey of both the telecommunication sector and academia has been conducted, along with detailed discussions, to explore the reasons for this ever-increasing gap, ways and means to arrest this trend, and a future course of action for academia and the telecom sector. A study of other emerging technical fields, such as computer science, has also been made for comparison. On the basis of this exercise, measures are suggested to bridge the gap between education and the industrial needs of the telecom sector. By adopting these measures, not only will the education sector become more beneficial to the industry, but the industry will also gain from the immense potential of young graduates and academic research.