ISDA'03 - Tentative Final Program
Parallel Technical Sessions: Monday, August 11, 11:00 AM - 01:00 PM
Session 1.1 (Room A): Connectionist Paradigms and Machine Learning
Chair: Saratchandran P.
New Model for Time-series Forecasting using RBFS and Exogenous Data
Juan Manuel Gorriz, Carlos G. Puntonet, J. J. G. De La Rosa and Moises Salmeron
In this paper we present a new model for time-series forecasting using Radial Basis Functions (RBFs) as units of ANNs (Artificial Neural Networks), which allows the inclusion of exogenous information (EI) without additional preprocessing. We begin by summarizing the best-known ad hoc EI techniques, i.e. PCA and ICA, and analyse their advantages and disadvantages for time-series forecasting using stock data from Spanish banks and companies. We then describe a new hybrid model for time-series forecasting which combines ANNs with GAs (Genetic Algorithms), and discuss the possibilities of implementing it on parallel processing systems.
On Improving Data Fitting Procedure in Reservoir Operation using Artificial Neural Networks
S. Mohan and V. Ramani Bai
This work attempts to overcome the problem, in the artificial neural network approach, of not knowing at what least count to reduce the size of the steps taken in weight space, and by how much. The parameter estimation phase in conventional statistical models is equivalent to the process of optimizing the connection weights, known as learning. Consequently, the theory of nonlinear optimization is applicable to the training of feed-forward networks. Multilayer feed-forward (BPM & BPLM) and recurrent neural network (RNN) models are formed as inter- and intra-neuronal architectures. The aim is to find a near-global solution to what is typically a highly non-linear optimization problem, such as reservoir operation. The derivation of a reservoir operation policy has been developed as a case study on the application of neural networks, since better management of the allocation of water among the users and resources of the system is much needed. The training and testing sets in the ANN model consisted of data from water years 1969-1994; data from water years 1994-1997 were used to validate model performance as learning progressed. Improved monthly operation results were obtained when two hidden layers were used rather than one, and results obtained by BPLM are more satisfactory than those of BPM. In addition, the RNN models applied to the reservoir operation problem proved to be the fastest and produced satisfactory results among all the artificial neural network models.
Automatic Vehicle License Plate Recognition using Artificial Neural Networks
Cemil Oz and Fikret Ercal
In this study, we present an artificial neural network based computer vision system which analyzes the image of a car taken by a camera in real time, locates its license plate, and recognizes the registration number of the car. The model has four stages. In the first stage, the vehicle license plate (VLP) is located. The second stage performs the segmentation of the VLP and produces a sequence of characters. An ANN runs in the third stage of the process and tries to recognize these characters, which form the VLP.
Neural Network Predictive Control Applied to Power System Stability
In this paper we consider the problem of power system stability with an application of predictive control systems, in particular control systems which rely on neural networks to maintain system stability. The work focuses on how a hybrid control system utilizing neural networks can improve the stability of an electric power grid when used in place of traditional control systems, i.e. PID-type controllers. Testing is done by simulating the dynamical system with the different control schemes implemented, resulting in a measure of stability by which the controllers can be justified.
Identification of Residues Involved in Protein-Protein Interaction from Amino Acid Sequence - A Support Vector Machine Approach
Changhui Yan, Drena Dobbs and Vasant Honavar
We describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether or not a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface) based on the identity of the target residue and its 10 sequence neighbors. Separate classifiers were trained on proteins from two categories of complexes, antibody-antigen and protease-inhibitor. The effectiveness of each classifier was evaluated using leave-one-out (jack-knife) cross-validation. Interface and non-interface residues were classified with relatively high sensitivity (82.3% and 78.5%) and specificity (81.0% and 77.6%) for proteins in the antigen-antibody and protease-inhibitor complexes, respectively. The correlation between predicted and actual labels was 0.430 and 0.462, indicating that the method performs substantially better than chance (zero correlation). Combined with recently developed methods for identification of surface residues from sequence information, this offers a promising approach to prediction of residues involved in protein-protein interaction from sequence information alone.
From Short Term Memory to Semantics - a Computational Model
Parag C. Prasad and Subramani Arunkumar
Aphasias are disorders of language and have been shown in recent literature to be associated with deficits of Short Term Memory (STM). Physiological STM has a semantic component that is responsible for the comprehension of language. This work brings forth a new model that learns the semantics of words across an STM. The model is validated through experiments that model word comprehension at the level of several sentences which, taken together, convey semantics. Experimentally, when the model is used to assign semantic labels to a collection of sentences, it gives an accuracy comparable to or better than that obtained using Support Vector Machines (SVMs).
Session 1.2 (Room B): Fuzzy Sets, Rough Sets and Approximate Reasoning
Chair: Ronald Yager
Axiomatization of Qualitative Multicriteria Decision Making with the Sugeno Integral
D. Iourinski and F. Modave
In multicriteria decision making (MCDM), we aim at ranking multidimensional alternatives. A traditional approach is to define an appropriate aggregation operator acting over each set of values of attributes. Non-additive measures (or fuzzy measures) have been shown to be well-suited tools for this purpose. However, this was done in an ad hoc way until recently. An axiomatization of multicriteria decision making was given in a quantitative setting, using the Choquet integral as the aggregation operator. The aim of this paper is to formalize the axiomatization of multicriteria decision making in the qualitative setting, using the Sugeno integral in lieu of the Choquet integral.
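For readers unfamiliar with the Sugeno integral, its discrete form (sort the criteria scores ascending and take the maximum over min(score, measure of the remaining criteria)) can be sketched as below. The cardinality-based example measure is an illustrative assumption, not one from the paper:

```python
def sugeno_integral(scores, measure):
    """Discrete Sugeno integral of `scores` (criterion -> value in [0, 1])
    with respect to a monotone fuzzy measure on sets of criteria."""
    items = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    result = 0.0
    for i, (_, value) in enumerate(items):
        upper = frozenset(c for c, _ in items[i:])  # criteria scoring >= value
        result = max(result, min(value, measure(upper)))
    return result

# Illustrative fuzzy measure: importance grows with the size of the set.
scores = {"math": 0.9, "language": 0.6, "sport": 0.3}
measure = lambda subset: len(subset) / len(scores)
overall = sugeno_integral(scores, measure)
```

Here the aggregate is 0.6: the "language" score of 0.6 is matched by the measure 2/3 of the two criteria scoring at least that much, and no other level does better.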
A Self-learning Fuzzy Inference for Truth Discovery Framework
Alex Sim Tze Hiang, Vincent C. S. Lee, Maria Indrawan and Hee Jee Mei
Knowledge discovery from massive business data is a nontrivial research issue, and a generalized framework to guide the knowledge discovery process is necessary to improve its efficiency and effectiveness. This paper proposes a framework that relates the certainty factor to an absolute factor, hereafter called the alpha (α) factor. The alpha factor represents the magnitude of useful local knowledge extracted from raw data in a fuzzy inference system, as well as its correctness relative to global knowledge. The concept of alpha (α) is explained with a mathematical illustration and a research design. Three specific case studies illustrate the use of the proposed self-learning framework for truth discovery.
Exact Approximations for Rough Sets
Dmitry Sitnikov, Oleg Ryabov, Nataly Kravets and Olga Vilchinska
Classical topological definitions of rough approximations are based on the indiscernibility relation. Unlike classical approaches, in this paper we define rough approximations in an algebraic way: we do not use any binary indiscernibility relation, but only unary predicates in terms of which an arbitrary predicate should be described. The terms "exact upper approximation" and "exact lower approximation" have been introduced to stress the fact that, while a variety of approximations can exist, it is always possible to select the approximations that cannot be improved in terms of the approximation language. These new definitions are compared to the classical ones (which use an equivalence relation) and are shown to be more general, in the sense that the classical definitions can be deduced from them if some restrictions are put on our model. The process of generating logic rules based on the exact approximations is considered. We also introduce an algebraic definition of a local reduct (a minimal set of predicates describing a rough set) for any subset of the universe.
Correlation Coefficient Estimate for Fuzzy Data
Yongshen Ni and John Y. Cheung
The correlation coefficient reflects the co-varying relationship between random variables and has become an important measure in many applications. However, conventional methods assume the data are crisp values, which is not appropriate in some situations. In this paper, fuzzy set theory is introduced to deal with observations in a fuzzy environment, and we propose a method to calculate the correlation coefficient of fuzzy data. Several data sets are then tested to check the validity of the proposed algorithm. This method can help decision-makers gain a better knowledge of the data characteristics and is therefore very useful in many data mining applications.
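As a crude baseline for what such a method must reduce to in the crisp limit, one can defuzzify triangular fuzzy observations to their centroids and compute the ordinary Pearson coefficient. The sketch below works under that simplifying assumption and is not the authors' algorithm:

```python
def centroid(tfn):
    """Centroid of a triangular fuzzy number (a, b, c)."""
    a, b, c = tfn
    return (a + b + c) / 3.0

def pearson(xs, ys):
    """Crisp Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Each observation is a triangular fuzzy number (left, peak, right).
fuzzy_x = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
fuzzy_y = [(2, 3, 4), (4, 5, 6), (6, 7, 8)]
r = pearson([centroid(t) for t in fuzzy_x],
            [centroid(t) for t in fuzzy_y])
```

A full fuzzy treatment would instead propagate the membership functions through the correlation formula rather than collapsing them first.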
Parallel Technical Sessions: Monday, August 11, 04:00 PM - 05:30 PM
Session 1.3 (Room A): Internet Security
Chair: Andrew H. Sung
Real-time Certificate Validation Service by Client's Selective Request
Jin Kwak, Seungwoo Lee and Dongho Won
The application of PKI (Public Key Infrastructure) has grown with the development of the Internet, and services that use certificates to validate public keys are increasing accordingly. To verify a certificate, the client must first confirm the certificate's current status. Various methods to validate certificates have been proposed, most of them CRL-based, but CRL-based methods have many problems because of the CRL's periodic issuance. Therefore, a CA in a field that requires frequent modification needs to provide the latest certificate status information to the client in real time. In this paper, we propose a new model, called RCVM, which can offer timely certificate status information to the client in real time. We also define the MITP (Modified Information Transmission Protocol) and the selective request and response messages. Under RCVM, the client does not incur the overhead of certificate status validation, and the validation service provides information in response to the client's selective request.
Internet Attack Representation using a Hierarchical State Transition Graph
Cheol-Won Lee, Eul Gyu Im and Dong-Kyu Kim
Internet attack tools are becoming automated and advancing quickly, so an attacker can easily deploy attacks from distributed hosts to acquire resources as well as to disrupt the services of a target host. One of the most feasible ways to study Internet attacks and their consequences is to simulate them. In this paper, we introduce a new approach to expressing attack scenarios for simulation. Our approach allows relations between states to be expressed in graphs, so that users can identify relations between states and find new scenarios.
A Secure Patch Distribution Architecture
Cheol-Won Lee, Eul Gyu Im, Jung-Taek Seo, Tae-Shik Sohn , Jong-Sub Moon and Dong-Kyu Kim
Patch distribution is one of the important processes for fixing vulnerabilities in software and ensuring the security of systems. Since an institute or a company runs various operating systems and applications, it is not easy to apply patches promptly. In this paper, we propose a secure and consolidated patch distribution architecture with an authentication mechanism, a security assurance mechanism, a patch integrity assurance mechanism, and an automatic patch installation mechanism. We argue that the proposed architecture allows prompt updates of patches and improves the security of patch distribution processes within a domain.
Intrusion Detection Using Ensemble of Soft Computing Paradigms
Srinivas Mukkamala, Andrew H. Sung and Ajith Abraham
Soft computing techniques are increasingly being used for problem solving. This paper addresses an ensemble approach that combines different soft computing techniques for intrusion detection. Due to increasing incidents of cyber attacks, building effective intrusion detection systems (IDSs) is essential for protecting information systems security, yet it remains an elusive goal and a great challenge. Two classes of soft computing techniques are studied: Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). We show that an ensemble of ANNs and SVMs is superior to the individual approaches for intrusion detection in terms of classification accuracy.
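The simplest way to combine ANN and SVM decisions is a majority vote over their outputs. In the sketch below, the base classifiers, feature names, and thresholds are invented stand-ins for trained models, and the tie-breaking rule is a design assumption, not taken from the paper:

```python
def ensemble_predict(classifiers, record):
    """Majority vote over base classifiers; ties go to 'attack' so the
    ensemble errs on the side of raising an alert (a design assumption)."""
    votes = [clf(record) for clf in classifiers]
    return "attack" if votes.count("attack") >= votes.count("normal") else "normal"

# Toy stand-ins for a trained ANN and two trained SVMs.
ann  = lambda r: "attack" if r["failed_logins"] > 3 else "normal"
svm1 = lambda r: "attack" if r["bytes_sent"] > 10_000 else "normal"
svm2 = lambda r: "attack" if r["duration"] > 60 else "normal"

label = ensemble_predict(
    [ann, svm1, svm2],
    {"failed_logins": 5, "bytes_sent": 20_000, "duration": 10})
```

In practice the vote could also be weighted by each model's validation accuracy.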
Session 1.4 (Room B): Intelligent Web Computing
Chair: Yan-Qing Zhang
Real Time Graphical Chinese Chess Game Agents Based on the Client and Server Architecture
Peter Vo, Yan-Qing Zhang, G. S. Owen and R. Sunderraman
The client and server architecture is currently widely used in industry; therefore, it is worthwhile to investigate its usefulness in different applications further. To this end, this paper demonstrates its appropriateness by implementing an Internet-based Chinese chess game using the client and server architecture. The paper is also of value for developers who would like to build similar applications, such as Western chess, Internet Relay Chat, etc. In the game application, the server program is developed with Java technology, while the client program is implemented in C++ with the help of MFC to facilitate the development of a 2D graphical user interface. In the future, the client program can be modified to be a web application. Importantly, the Internet-based graphical Chinese chess agent system can be used to teach students about intelligent agents and game playing in an artificial intelligence class, and about networks, graphics, and other relevant techniques in other computer science classes.
DIMS: an XML-Based Information Integration Prototype Accessing Web Heterogeneous Sources
Linghua Fan, Jialin Cao and Rene Soenen
The goal of information integration is to provide a uniform interface to a multitude of distributed, autonomous, heterogeneous information sources available online (e.g., databases and XML or HTML files from the WWW). The Distributed Information Management System (DIMS) is an XML-based information integration system for accessing these web sources. It utilizes efficient tools, such as Fatdog's XQEngine, Jakarta Lucene, JTidy, and JDBC or JDBC-ODBC drivers, to wrap heterogeneous web sources into standard XML data, and uses a mediator to translate user queries for the relational wrappers and to integrate their results. Materialized views are used to speed query response, and metadata makes it easy to add and delete sources. DIMS provides an innovative and flexible way to design and implement such a system: the software is designed with UML, and the prototype is implemented in Java so as to be platform-independent.
A Frame-Work for High-Performance Web Mining in Dynamic Environments Using Honeybee Search Strategies
The knowledge discovery in databases methodology outlines possible approaches that search engines can take to improve their IR systems. The conventional approach provides the requester with query results based on the user's knowledge of the respective IR system. This paper proposes the use of an information sharing model based on the information processing methodology of honeybees and on knowledge discovery in databases, as opposed to the traditional IR models used by current search engines. The major limitation of IR-based systems is their dependency on human editors, which is reflected in static sets of query terms and the use of stemming. Experimental results are presented for the data clustering component (Web page indexer) of the Tocorime Apicu search engine, which is based on the information sharing model.
A Framework for Multiagent-Based System for Intrusion Detection
Islam M Hegazy, Taha Al-Arif , Zaki T. Fayed and Hossam M. Faheem
Networking security demands have increased considerably during the last few years. One of the critical networking security applications is the intrusion detection system. Intrusion detection systems should be fast enough to catch different types of intruders. This paper describes a framework for a multiagent-based intrusion detection system using agent-based technology. Agents are ideally qualified to play an important role in intrusion detection systems due to their reactivity, interactivity, autonomy, and intelligence. The system is implemented in a real TCP/IP LAN environment and is considered a step towards a complete multiagent-based system for networking security.
Parallel Technical Sessions: Tuesday, August 12, 10:30 AM - 01:00 PM
Session 2.1 (Room A): Agent Architectures and Distributed Intelligence
Chair: Sandip Sen
An Adaptive Platform Based Multi-Agents for Architecting Dependability
Samir Benarif, Amar Ramdane-Cherif and Nicole Levy
Research into describing software architectures with respect to their dependability properties has gained attention recently. Fault tolerance is one of the approaches used to maintain dependability; it is associated with the ability of a system to deliver services according to its specifications in spite of the presence of faults. In this paper, we use a platform based on a multi-agent system to test and evaluate components, detect faults, and recover from errors through dynamic reconfiguration of the architecture. An implementation of this platform on a client/server architecture is presented, and some scenarios addressing dependability at the architectural level are outlined. We discuss the importance of our approach, its benefits for architecting dependable systems, and how it supports improving the dependability and performance of complex systems.
Stochastic Distributed Algorithms for Target Surveillance
Luis Caffarelli, Valentino Crespi and George Cybenko
In this paper we investigate problems of target surveillance with the aim of building a general framework for evaluating the performance of a system of autonomous agents. To this purpose we propose a class of semi-distributed stochastic navigation algorithms that drive swarms of autonomous scouts in the surveillance of grounded targets, and we provide a novel approach to performance estimation based on analysing sequential observations of the system's state with information-theoretic techniques. Our goal is to achieve a deeper understanding of the interrelations between randomness, resource consumption, and ergodicity in a decentralized control system in which the decision-making process is stochastic.
What-if Planning for Military Logistics
The decision makers involved in military logistics planning need tools that can answer hypothetical ("what-if") questions, such as how a detailed logistics plan will change if the high-level operational plan is changed. Such tools must be able to generate alternative plans in response to such questions, while maintaining the original plan(s) for comparison. This paper reports on the work performed to add this capability to the multiagent planning and execution system called Cougaar. We state the what-if planning problem, describe the challenges that have to be addressed to solve it, discuss the solution that we designed, and describe the limitations of our approach.
Effects of Reciprocal Social Exchanges on Trust and Autonomy
Hexmoor Henry and Prapulla Poli
In this paper we present the effects of reciprocal exchanges on the trust and autonomy of peer-level agents. We present models of trust and autonomy and show the changes to autonomy and trust in an abstract problem domain. As social exchanges increase, average agent autonomy increases. Autonomy and trust are most sensitive at a certain number of social ties among agents, mirroring the principle of peak performance.
Session 2.3 (Room A): Computational Intelligence in Management
Chair: Raj Kumar
Using IT To Assure a Culture For Success
It is forty years since the ARPANET conceived interactive computing, but Information Technology (IT) has barely raised the capability of teams. The reason is IT's inability to manage the heavy and unpredictable workflows of daily knowledge exchange. The intelligence presented here harnesses IT to organize and anticipate the decision-maker. Prototype operations establish its thesis that all unstructured collaboration on a document is synthesized by known repeatable actions, with universal norms determining the actions possible next. Replication technology, controlled by the norms, extends this communication offline and across servers. The give-and-take culture thus induced increases the enterprise's capability for success.
Gender Differences in Performance Feedback Utilizing an Expert System: A Replication and Extension
Tim Peterson, David D. Van Fleet, Peggy C. Smith and Jon W. Beard
Expert system proponents claim that expert systems can provide managers with the knowledge they need to perform managerial tasks. One domain for expert system support is in giving performance feedback. This study replicates an earlier study that used only male managers. The result of the current laboratory study using female managers provides interesting empirical findings. The implications of these findings, limitations to the study, and future research are discussed.
Session 2.2 (Room B): Data Mining, Knowledge Management and Information Analysis
Chair: Suliman Al-Hawamdeh
NETMARK: Adding Hierarchical Object to Relational Databases
David A. Maluf, Peter B. Tran, Tracy La and Mohana Guram
An object-relational database management system is an integrated, hybrid, cooperative approach that combines the best practices of both the relational model, utilizing SQL queries, and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework called NETMARK is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical-address data types for very efficient keyword search of records spanning both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to handle the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
Academic KDD Project LISp-Miner
An academic KDD system is introduced. The system is freely available and is suitable for KDD research, teaching, and data mining tasks up to medium size. This paper describes its architecture and project management for the benefit of those interested in joining the project team, either as developers or as users. Some possible directions for future research are also mentioned.
Performance Evaluation Metrics for Link Discovery Systems
Recently there has been an explosion of work on the design of automated link discovery (LD) systems, but little work has been done to investigate methods of evaluating the performance of such systems. This paper states the link discovery system evaluation problem, explores the issues involved in evaluating the performance of LD systems by relating them to the traditional problems of evaluating classification systems, and describes the metrics that we derived and the system we designed to evaluate the LD systems being developed under the Evidence Extraction & Link Discovery program.
Generalizing Association Rules: A Theoretical Framework and an Implementation
Antonio Badia and Mehmed Kantardzic
We present a generalization of the concept of association rule and an efficient implementation to support mining of generalized rules. To give our approach a sound formal basis, we use the idea of generalized quantification, developed in logic theory. The generalization is based on allowing more complex relationships among itemsets, including negative associations. Negative associations are defined as associations among large itemsets which denote a strong degree of disjointness. We also propose stricter measures of relevance than confidence and support, in order to make sure that extracted rules (positive and negative) truly represent a pattern in the data. Finally, we show how to extend the A-priori algorithm to mine for the extended associations introduced with high efficiency and little overhead.
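One way to make the notion of "strong disjointness" concrete is to compare the observed co-occurrence of two itemsets with what statistical independence would predict. The measure below is an illustrative simplification, not the paper's generalized-quantifier formulation, and the transaction data are invented:

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def negative_disjointness(transactions, a, b):
    """A and B are negatively associated when they co-occur less often
    than independence predicts; larger values mean stronger disjointness.
    (One possible formalisation, simpler than the paper's measures.)"""
    both = support(transactions, a | b)
    expected = support(transactions, a) * support(transactions, b)
    return expected - both  # > 0: fewer co-occurrences than expected

T = [frozenset(t) for t in (["beer", "chips"], ["beer"], ["tea", "biscuits"],
                            ["tea"], ["beer", "chips"], ["tea", "biscuits"])]
score = negative_disjointness(T, {"beer"}, {"tea"})
```

Here beer and tea are each frequent (support 0.5) yet never co-occur, so the score is the full independence expectation of 0.25.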
New Geometric method for Blind Separation of Sources
Manuel Rodriguez-Alvarez, Fernando Rojas, Carlos G. Puntonet, F. Theis, E. Lang and R. M. Clemente
This work explains a new method for blind separation of a linear mixture of sources, based on geometrical considerations concerning the observation space. This new method is applied to a mixture of several sources and it obtains the estimated coefficients of the unknown mixture matrix A and separates the unknown sources. In this work, the principles of the new method and a description of the algorithm are shown.
Codifying the "Know How" Using CyKnit Knowledge Integration Tools
The increased interest in knowledge management, and in the "know how" in particular, has created the need for interactive and collaborative knowledge management tools. Integrated knowledge management tools such as knowledge portals provide access to a wide range of facilities that encourage collaboration and promote organizational learning. Enhanced portal facilities such as personalization, summarization, and ask-the-expert can be used to help users locate relevant knowledge sources and assist them in finding answers to queries rather than references to documents. In this paper, we discuss the CyKnit knowledge management integration tools that can be used to facilitate human interaction and help capture the "know how". The system also provides a set of tools for knowledge categorization, taxonomies, document management, discussion forums, content management, ask-the-expert, and advanced searching and extraction facilities.
Decision Tree Induction from Distributed Heterogeneous Autonomous Data Sources
Doina Caragea, Adrian Silvescu and Vasant Honavar
With the growing use of distributed information networks, there is an increasing need for algorithmic and system solutions for data-driven knowledge acquisition using distributed, heterogeneous and autonomous data repositories. In many applications, practical constraints require such systems to provide support for data analysis where the data and the computational resources are available. This presents us with distributed learning problems. We precisely formulate a class of distributed learning problems; present a general strategy for transforming traditional machine learning algorithms into distributed learning algorithms; and demonstrate the application of this strategy to devise algorithms for decision tree induction (using a variety of splitting criteria) from distributed data. The resulting algorithms are provably exact in that the decision tree constructed from distributed data is identical to that obtained by the corresponding algorithm when in the batch setting. The distributed decision tree induction algorithms have been implemented as part of INDUS, an agent-based system for data-driven knowledge acquisition from heterogeneous, distributed, autonomous data sources.
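The key observation behind such exactness results is that a split criterion like information gain depends on the data only through class-count statistics, which each site can compute locally and ship to a coordinator. A minimal sketch of that idea follows; the function names, attribute names, and toy data are illustrative, not from INDUS:

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy of a class-count distribution."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)

def site_counts(rows, attr):
    """Per-site sufficient statistic: class counts per value of `attr`."""
    stats = {}
    for row in rows:
        stats.setdefault(row[attr], Counter())[row["label"]] += 1
    return stats

def merge(stats_list):
    """Coordinator step: add up the counts shipped by every site."""
    merged = {}
    for stats in stats_list:
        for value, counts in stats.items():
            merged.setdefault(value, Counter()).update(counts)
    return merged

def info_gain(merged):
    """Information gain computed from merged counts only."""
    total_counts = Counter()
    for counts in merged.values():
        total_counts.update(counts)
    total = sum(total_counts.values())
    split = sum(sum(c.values()) / total * entropy(c) for c in merged.values())
    return entropy(total_counts) - split

site1 = [{"wind": "weak", "label": "yes"}, {"wind": "strong", "label": "no"}]
site2 = [{"wind": "weak", "label": "yes"}, {"wind": "strong", "label": "yes"}]
merged = merge([site_counts(site1, "wind"), site_counts(site2, "wind")])
gain = info_gain(merged)
```

Because merged counts equal the counts of the pooled data, the gain (and hence the chosen split) is identical to the batch computation.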
A Taxonomy of Data Mining Applications Supporting Software Reuse
S. Tangsripairoj and M. H. Samadzadeh
A taxonomy is a classification of items in a systematic way based on their inherent properties and relationships. In addition to serving as a descriptive facility to distinguish among existing items, a taxonomy typically contains provisions for not only predicting items not among its baseline set, but also the ability to prescribe new items. Data mining is an advanced data analysis technique whose primary function is to extract likely useful knowledge or hidden patterns from large databases. Software reuse, the development of software systems from previously constructed software components rather than from scratch, is considered a means of improving software productivity and quality. For efficient and effective reuse, a software library or repository can be built to store and organize a collection of software components. It is indispensable that a software library provide tools for software developers to locate, compare, and retrieve reusable software components that meet their requirements. Data mining tools, techniques, and approaches can be employed to acquire useful information about software components in a software library. Such information can be beneficial to software developers in searching for desired reusable software components. In this paper, we catalog the major characteristics of several existing data mining applications supporting software reuse, and propose a taxonomy based on these characteristics. The taxonomy provides a predictive framework to help identify possible new data mining applications.
Parallel Technical Sessions: Wednesday, August 13, 10:30 AM - 01:00 PM
Session 3.1 (Room A): 2003 International Workshop on Intelligence, Soft computing and the Web
Workshop Organizers: Damminda Alahakoon and Shyue-Liang Wang
Enhanced Cluster Visualization Using the Data Skeleton Model
R. Amarasiri, L. K. Wickramasinghe and D. Alahakoon
The Growing Self Organizing Map (GSOM), which is an extended version of the Self Organizing Map (SOM), has significant advantages when dealing with dynamically or incrementally changing data structures. The Data Skeleton Model (DSM) is a technique developed using the GSOM to identify clusters and relate them to the input data sequence. In this paper, we have developed the DSM as a visualization tool to automate the cluster identification process. We also demonstrate the advantage of using the Spread Factor (SF) in the GSOM for clustering data.
Generating Concept Hierarchies for Categorical Attributes
Been-Chian Chien and Su-Yu Liao
Extracting knowledge from a large amount of data is one of the important research topics in knowledge discovery. A concept hierarchy is a kind of concise and general form of concept description that organizes relationships of data and expresses knowledge as a tree-like or partial ordering structure. In this paper, we propose an approach to generate concept hierarchies automatically for a given data set with nominal attributes based on rough entropy. The proposed method reduces the number of attributes after each process of generating concept level. We give two experiments to show that the proposed algorithm is intuitive and efficient for knowledge discovery in databases.
Learning from Hierarchical Attribute Values
Tzung-Pei Hong, Chun-E Lin, Jiann-Horng Lin and Shyue-Liang Wang
The rough-set theory has been widely used in dealing with data classification problems. Most previous studies on rough sets focused on deriving certain rules and possible rules at a single concept level. Data with hierarchical attribute values are, however, commonly seen in real-world applications. This paper thus proposes a new learning algorithm based on rough sets to find cross-level certain and possible rules from training data with hierarchical attribute values. This is more complex than learning rules from training examples with single-level values, but may derive more general knowledge from the data.
Delivering Distributed Data Mining E-Services
The growing number of commercial Internet-based data mining service providers is indicative of the emerging trend of data mining application services. It validates the recognition that knowledge is a key resource in strategic organisational decision-making. The trend also establishes that the Application Service Provider (ASP) paradigm is seen as a cost-effective approach to meeting the business intelligence needs of small-to-medium organisations, which are the most constrained by the high cost of niche software technologies. An important issue in a service context is the optimisation of performance in terms of throughput and response time. The paper presents a hybrid distributed data mining (DDM) model that improves the overall response time.
Maintenance of Discovered Functional Dependencies: Incremental Deletion
Shyue-Liang Wang, Wen-Chieh Tsou, Jiann-Horng Lin and Tzung-Pei Hong
The discovery of functional dependencies (FDs) in relational databases is an important data-mining problem. Most current work assumes that the database is static, so a database update requires rediscovering all the FDs by repeatedly scanning the entire old and new database. Some work considers the incremental discovery of FDs when a new set of tuples is added to an old database. In this work, we present two incremental data mining algorithms, top-down and bottom-up, to discover all FDs when tuples are deleted from the database. Based on the monotonicity of FDs, we avoid re-scanning the database and thereby reduce computation time. The feasibility and efficiency of the two proposed algorithms are demonstrated through examples.
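As an illustration of the monotonicity such algorithms exploit (deleting tuples can never invalidate an FD that already held, only make new ones hold), a minimal FD check might look like this; the relation representation and names are assumptions, not the authors' implementation:

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in a relation.

    rows: list of dicts (tuples); lhs, rhs: tuples of attribute names.
    The FD holds iff no two tuples agree on lhs but differ on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

rows = [
    {"emp": 1, "dept": "A", "mgr": "x"},
    {"emp": 2, "dept": "A", "mgr": "y"},  # same dept, different mgr
    {"emp": 3, "dept": "B", "mgr": "z"},
]
assert not fd_holds(rows, ("dept",), ("mgr",))
# Monotonicity under deletion: removing the conflicting tuple makes the FD hold
assert fd_holds(rows[::2], ("dept",), ("mgr",))
```

An incremental algorithm therefore only needs to re-test FDs that previously failed, since those that held survive any deletion.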
Filtering Multilingual Web Content Using Fuzzy Logic
Rowena Chau and Chung-Hsing Yeh
An emerging requirement to sift through the increasing flood of multilingual textual content available electronically over the World Wide Web has led to the pressing demand for effective multilingual Web information filtering. In this paper, a content-based approach to multilingual information filtering is proposed. This approach is capable of screening and evaluating multilingual documents based on their semantic content. As such, correlated multilingual documents are disseminated according to their corresponding themes/topics to facilitate both efficient and effective content-based information access. The objective of alleviating users' burden of information overload is thus achieved. This approach is realized by incorporating fuzzy clustering and fuzzy inference techniques. To illustrate, the development of a Web-based multilingual online news filtering system applying this approach is also presented.
A Comparison of Patient Classification Using Data Mining in Acute Health Care
Eu-Gene Siew, Kate A. Smith, Leonid Churilov and Jeff Wassertheil
Patients' diagnoses are currently used as a basis for estimating resource consumption. There are alternative forms of grouping: one approach is to group patients according to common characteristics and infer their resource consumption from their group membership. In this paper, we compare the effectiveness of alternative forms of patient classification obtained from data mining with the current classification, using an objective assessment of the average difference between the inferred and the actual resource consumption. In tackling this prediction task, classification trees and neural clustering are used. Demographic and hospital admission information is used to generate the clusters and decision tree nodes. For the case study under consideration, the alternative forms of patient classification seem to reflect resource consumption better than diagnosis-related groups.
Session 3.2 (Room B): Peer-to-Peer Computing
Chairs: Prithviraj Dasgupta and Vana Kalogeraki
A Peer to Peer System Architecture for Multi-Agent Collaboration
A peer-to-peer (P2P) network comprises a collection of nodes that cooperate and collaborate with each other in a decentralized and distributed manner. A node in a P2P network can access information present in the network using peer discovery followed by a search and retrieval phase. At present, most P2P systems employ TCP/IP-based message communication to implement the operations in a P2P network. In this paper, we propose the use of mobile software agents to implement the protocols in a P2P system. Mobile software agents are autonomous, economical in terms of size and bandwidth consumption, and can operate remotely without the continuous supervision of a central server. Our research indicates that mobile software agents provide a suitable paradigm for implementing P2P systems that are both scalable and robust.
A Soft Real Time Agent Based Peer-to-Peer Architecture
Feng Chen and Vana Kalogeraki
As computers become more pervasive and communication technologies advance, a new generation of peer-to-peer (P2P) networks are increasingly becoming popular for real-time communication, ad-hoc collaboration and resource sharing in large-scale distributed systems. In this paper we present an agent-based peer-to-peer architecture that provides soft real-time guarantees to distributed tasks in peer-to-peer systems. The architecture exploits the urgency of the tasks, the objects at the peers and the resource utilization of the nodes, to dynamically determine an efficient schedule for the distributed activities. The mechanism uses only local knowledge and is entirely distributed and therefore scales well with the size of the system.
Social Networks as a Coordination Technique for Multi-Robot Systems
Daniel Rodic and Andries P. Engelbrecht
The last decade saw a renewed interest in the robotics research field and a shift in research focus. In the eighties and early nineties, the focus of robotic research was on finding optimal robot architectures, often resulting in non-cognitive, insect-like entities. In recent years, the processing power available to autonomous agents has improved, allowing for more complex robot architectures. The focus has shifted from single robots to multi-robot teams. The key to the full utilisation of multi-robot teams lies in cooperation. Although a robot is a special case of an agent, many existing multi-agent cooperation techniques cannot be directly ported to multi-robot teams. In this paper, we overview mainstream multi-robot coordination techniques and propose a new approach to coordination, based on models of organisational sociology. The proposed coordination model is not robot-specific and can be applied to any multi-agent system without modification.
Biology-Inspired Approaches to Peer to Peer Computing in BISON
Alberto Montresor and Ozalp Babaoglu
BISON is a research project funded by the European Commission that is developing new techniques and paradigms for the construction of robust, self-organizing and self-repairing information systems as ensembles of autonomous agents that mimic the behavior of some natural or biological process. In this paper we give a brief overview of BISON, discuss some preliminary results for peer-to-peer systems, describe the ongoing work and indicate future research directions that appear to be promising.
UbAgent: A Mobile Agent Middleware Infrastructure for Ubiquitous Pervasive Computing
George Samaras and Paraskevas Evripidou
Pervasive computing can be summarized as the process of automatically gathering, organizing, processing, and analyzing data in order to provide sophisticated services that enhance and improve human activities, experience, learning and knowledge anytime, anywhere and from any device. The UbAgent middleware infrastructure provides an efficient way of handling the back-end processing of pervasive computing: accessing information from distributed databases, computation processing, and location-specific services. UbAgent is a mobile agent based middleware for pervasive computing. It is built on top of commercially available mobile agent systems. The second-tier components are TRAcKER, a location management system, the Taskhandler framework, and a Unified Message System for mobile agents. At tier three we have PaCMAn, a Web-based metacomputer, the DBMS-agent system for distributed information retrieval, and the DVS system for creating personalized views. At present, UbAgent addresses the back end of pervasive applications and provides the services; it does not deal with the real-life user interface, i.e. interpreting the user's needs and wishes.
Session 3.3 (Room B): Data Mining, Knowledge Management and Information Analysis
Chair: Sugatha Sanyal
Data Mining Techniques in Materialised Projection View
Ying Wah Teh and Abu Bakar Zaitun
This paper investigates one of the important factors in decision support query processing: the attributes specified in the criteria of a query and their influence on response time. The study of the attributes is based on a materialised projection view that selects relevant attributes to form a new table or relation, in order to minimise access to irrelevant attributes. As it is very costly to form a new table for every query, top-priority users (such as production managers) and frequently accessed attributes are the two major parameters used to build a new table. We introduce data mining techniques into our study, such as using a decision tree to perform a materialised projection view on a relation/table.
Data Mining Techniques in Index Techniques
Ying Wah Teh and Abu Bakar Zaitun
Redundant data structures such as indexes have emerged as some of the improved query processing techniques for dealing with the very large data volumes and fast response time requirements of a data warehouse. This paper investigates factors such as the use of tuples specified in the criteria of a structured query language (SQL) query and their influence on the response time of a query in a data warehouse environment. The experience of pioneers in the data warehouse industry in handling queries using redundant data structures has already been well established. Since very large data storage is available nowadays, redundant data structures themselves are no longer a big issue. However, an intelligent way of managing storage for redundant data structures that can lead to fast access of data is the vital issue dealt with in this paper.
Parallel Technical Sessions: Wednesday, August 13, 04:00 PM - 06:00 PM
Session 3.3 (Room A): 2003 International Workshop on Intelligence, Soft Computing and the Web
Workshop Organizers: Damminda Alahakoon and Shyue-Liang Wang
Criteria for a Comparative Study of Visualization Techniques in Data Mining
Robert Redpath and Bala Srinivasan
The paper considers the relationship of information visualization tools to the data mining process. The types of structures in data sets that may be of interest to data mining experts are outlined. The performance of a particular visualization technique in revealing those structures and supporting the subsequent steps in the process needs to be assessed against a number of criteria. Criteria for performing such an evaluation are suggested and explained, divided into two main groups: criteria that relate to interface issues and criteria that relate to the characteristics of the data set. An example application of some of the criteria is given.
Controlling the Spread of Dynamic Self Organising Maps
The Growing Self Organising Map (GSOM) has recently been proposed as an alternative neural network architecture based on the traditional Self Organising Map (SOM). The GSOM provides the user with the ability to control the map spread by defining a parameter called the Spread Factor (SF), which results in enhanced data mining as well as hierarchical clustering opportunities. In this paper we highlight the effect of the spread factor on the GSOM and contrast this effect with grid size changes (increases and decreases) in the SOM. We also present experimental results to support our claims regarding the differences between the GSOM and the SOM.
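The spread factor enters the GSOM through its growth threshold; a minimal sketch of the commonly cited formulation GT = -D x ln(SF) follows (treating the threshold this way is an assumption based on the standard GSOM formulation, not this particular paper's code):

```python
from math import log

def growth_threshold(dim, spread_factor):
    """GSOM growth threshold GT = -D * ln(SF) for input dimensionality D.

    A node spawns new neighbours once its accumulated quantization error
    exceeds GT, so a smaller SF gives a higher threshold and a more
    compact map, while SF close to 1 lets the map spread out.
    """
    assert 0.0 < spread_factor < 1.0
    return -dim * log(spread_factor)

# Lowering the spread factor raises the threshold, restraining map growth
assert growth_threshold(10, 0.1) > growth_threshold(10, 0.9)
```

This is what distinguishes SF-based control from simply resizing a fixed SOM grid: the same SF yields comparable spread across data sets of different dimensionality.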
Session 3.4 (Room A): Image Processing and Retrieval
Chair: Phillip A. Mlsna
Image Database Query Using Shape-Based Boundary Descriptors
Nikolay M. Sirakov, Jim Swift and Phillip A. Mlsna
This paper presents an approach capable of querying an unorganized and dynamic image database. Each automatically extracted image region is represented as an ellipsoid in N-D vector feature space. The ellipsoids are used to measure the similarity between the query region and the regions found in the database. Each vector component represents a single feature of potential user interest. Features used include regularities, essential boundary points, second order B-spline control points, boundary support, the number of regions in an image, and minimum and maximum gray levels. On the basis of these theoretical concepts, an intelligent system capable of searching for multiple regions is developed. A query is formulated by constructing a feature vector using values extracted from a desired query region. Queries are then answered by comparison with feature vectors from the database. The system retrieves the stored images whose regions' features most closely match those of the query. An experiment is performed to validate the capability of the system using an image database that contains 2D sections of 3D subsurface structures. A discussion of the advantages and disadvantages of our approach is also provided.
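As a simplified illustration of matching a query feature vector against region ellipsoids in feature space (assuming, for simplicity, axis-aligned ellipsoids; the authors' actual similarity measure may differ):

```python
def ellipsoid_distance(query, center, radii):
    """Normalized distance of a query feature vector from an axis-aligned
    ellipsoid given by its center and semi-axis radii in N-D feature space.

    A value <= 1.0 places the query inside the ellipsoid; ranking database
    regions by this distance gives a simple similarity ordering.
    """
    s = sum(((q - c) / r) ** 2 for q, c, r in zip(query, center, radii))
    return s ** 0.5

center, radii = [0.0, 0.0, 0.0], [2.0, 3.0, 5.0]
assert ellipsoid_distance(center, center, radii) == 0.0      # at the center
assert ellipsoid_distance([2.0, 0.0, 0.0], center, radii) == 1.0  # on the boundary
```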
Image Retrieval by Auto Weight Regulation PCA Algorithm
W. H. Chang and M. C. Cheng
A new image retrieval method integrating color and edge features is proposed in this study. We also implement a practical interface to retrieve images relevant to the user query from an image database. The proposed system uses an auto weight regulation PCA (Principal Component Analysis) algorithm for similarity measurement and image retrieval. The proposed system has three steps: feature extraction, similarity measurement, and image retrieval. In the first step, we extract the color and edge features of a query image and save the extracted features as code words. Similarity measurement is done by the auto weight regulation PCA algorithm in the second step. The most relevant images are then retrieved from the image database by comparing the projection values in the codebooks with those of the query image. The effectiveness and practicality of the proposed method have been demonstrated by various experiments.
Improving the Initial Image Retrieval Set by Inter-Query Learning with One-Class SVMs
Iker Gondra, Douglas R. Heisterkamp and Jing Peng
Relevance Feedback attempts to reduce the semantic gap between a user's perception of similarity and a feature-based representation of an image by asking the user to provide feedback regarding the relevance or non-relevance of the retrieved images. This is intra-query learning. However, in most current systems, all prior experience is lost whenever a user generates a new query, and thus inter-query information is not used. In this paper, we focus on the possibility of incorporating prior experience (obtained from the historical interaction of users with the system) to improve the retrieval performance on future queries. We propose learning one-class SVMs from retrieval experience to represent the set memberships of users' query concepts. Using a fuzzy classification approach, this historical knowledge is then incorporated into future queries to improve the retrieval performance. In order to learn the set membership of a user's query concept, a one-class SVM maps the relevant (training) images into a nonlinearly transformed kernel-induced feature space and attempts to include most of those images in a hypersphere. The use of kernels allows the one-class SVM to deal with the non-linearity of the distribution of training images in an efficient manner, while at the same time providing good generalization. The proposed approach is evaluated against real data sets and the results obtained confirm the effectiveness of using prior experience in improving retrieval performance.
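A one-class SVM encloses most training images in a kernel-space hypersphere; the following simplified stand-in measures the distance to the kernel-space centroid instead of learning a hypersphere, just to illustrate why RBF kernels separate concept members from outliers. All names and parameters here are assumptions:

```python
from math import exp

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def kernel_center_distance(x, train, gamma=1.0):
    """Squared distance, in the RBF-induced feature space, from x to the
    centroid of the training (relevant) images: k(x,x) - 2/n * sum_i k(x,x_i)
    + 1/n^2 * sum_ij k(x_i,x_j). Small for images resembling the concept."""
    n = len(train)
    cross = sum(rbf(x, t, gamma) for t in train) / n
    gram = sum(rbf(s, t, gamma) for s in train for t in train) / (n * n)
    return rbf(x, x, gamma) - 2.0 * cross + gram

relevant = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]  # relevant-image feature vectors
near, far = [0.05, 0.05], [3.0, 3.0]
assert kernel_center_distance(near, relevant) < kernel_center_distance(far, relevant)
```

A real one-class SVM additionally down-weights outlying training images via the slack parameter, which this centroid sketch omits.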
Tongue Image Analysis Software
Dipti Prasad Mukherjee and D. Dutta Majumder
In this paper we describe application software for tongue image analysis. The color, shape and texture of the tongue are very important information for medical diagnosis. Given a tongue image, the software derives features specific to the color, shape and texture of the tongue surface. Based on the feature characteristics of a normal tongue image, the possible health disorder associated with a particular patient is ascertained. A set of conventional image processing algorithms is integrated for this purpose. The treatment management of similar past cases with respect to a given patient can also be retrieved from the images and the associated database using an image similarity measure.
2D Object Recognition Using the Hough Transform
Venu Madhav Gummadi and Thompson Sarkodie-Gyan
The objective of this paper is to identify 2D object features for object recognition, in order to ensure quality assurance. We use the Hough transform to identify the shape of an object by mapping the edge points of the image and to detect the straight lines present in the image. An edge detection algorithm is applied to find edge points, marked by sharp or sudden changes in intensity. Object recognition is usually performed using shape as a discriminator: the shape of the edges and their size, orientation, and location can be extracted from the objects and used for recognition.
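The voting scheme behind the Hough transform for lines can be sketched as follows: each edge point votes for every (theta, rho) line passing through it, and collinear points pile their votes into one accumulator cell. Bin sizes and names here are illustrative assumptions:

```python
from math import cos, sin, pi

def hough_lines(points, n_theta=180):
    """Accumulate votes in (theta, rho) parameter space using the normal
    form rho = x*cos(theta) + y*sin(theta); returns the accumulator."""
    acc = {}
    for x, y in points:
        for i in range(n_theta):
            theta = i * pi / n_theta
            rho = round(x * cos(theta) + y * sin(theta))  # 1-unit rho bins
            acc[(i, rho)] = acc.get((i, rho), 0) + 1
    return acc

# Ten collinear edge points on the vertical line x = 5 all vote for
# the cell theta = 0, rho = 5, which therefore holds a maximal vote count
acc = hough_lines([(5, y) for y in range(10)])
assert acc[(0, 5)] == 10
assert max(acc.values()) == 10
```

Peaks in the accumulator then give the detected lines; the same voting idea generalizes to other parameterized shapes.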
Session 3.5 (Room B): Optimization, Scheduling and Heuristics
Chair: Johnson P. Thomas
Adjusting Population Size of Differential Evolution Algorithm using Fuzzy Logic
Junhong Liu and Jouni Lampinen
The Differential Evolution algorithm is a floating-point encoded Evolutionary Algorithm for global optimization over continuous spaces. The objective of this study is to introduce a dynamically controlled adaptive population size for the Differential Evolution algorithm by means of a fuzzy controller. The controller's inputs incorporate the changes in objective function values and individual solution vectors between the populations of two successive generations. The fuzzy controller then uses these data to dynamically adapt the population size. The preliminary results obtained suggest that the adaptive population size may result in a higher convergence rate and reduce the number of objective function evaluations required.
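The base DE/rand/1/bin update that such a fuzzy population-size controller would wrap around can be sketched as follows (the fuzzy controller itself is not shown; parameter values and names are illustrative):

```python
import random

def de_step(pop, f, F=0.5, CR=0.9):
    """One generation of DE/rand/1/bin: for each target vector build a
    mutant a + F*(b - c) from three distinct other individuals, apply
    binomial crossover, and keep the better of trial and target."""
    dim = len(pop[0])
    nxt = []
    for i, target in enumerate(pop):
        a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
        jrand = random.randrange(dim)  # guarantees at least one mutant gene
        trial = [a[k] + F * (b[k] - c[k])
                 if (random.random() < CR or k == jrand) else target[k]
                 for k in range(dim)]
        nxt.append(trial if f(trial) <= f(target) else target)  # greedy selection
    return nxt

sphere = lambda v: sum(x * x for x in v)  # toy objective to minimize
random.seed(1)
pop = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(8)]
best0 = min(sphere(v) for v in pop)
for _ in range(50):
    pop = de_step(pop, sphere)
assert min(sphere(v) for v in pop) < best0  # the population has improved
```

An adaptive scheme would resize `pop` between calls to `de_step` based on how the best objective value and the vectors change from one generation to the next.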
Intelligent Management of QoS Requirements for Perceptual Benefit
George Ghinea, George D.Magoulas and J. P. Thomas
The vision of a new generation of network communication architectures, which deliver a Quality of Service based on intelligent decisions about the interactions that typically take place in a multimedia scenario, encourages researchers to look at novel ways of matching user-level requirements with parameters characterising underlying network performance. In this paper, we suggest an integrated architecture that makes use of the objective-technical information provided by the designer and the subjective-perceptual information supplied by the user for intelligent decision making in the construction of communication protocols. Our approach opens the possibility for such protocols to dynamically adapt based on a changing operating environment.
Integrating Random Ordering into Multi-heuristic List Scheduling Genetic Algorithm
Andy Auyeung, Iker Gondra and H. K. Dai
This paper presents an extension of the Multi-heuristic List Scheduling Genetic Algorithm. The problem is to find a schedule that minimizes the execution time of a program on a multi-processor platform. Because this problem is known to be NP-complete, many heuristics have been developed. List Scheduling is one of the classical heuristic solutions. The idea of the Multi-heuristic List Scheduling Genetic Algorithm is to find an "optimal" combination of List Scheduling heuristics that outperforms any single heuristic. However, a very important drawback of this algorithm is that it does not take the whole search space into consideration. We improve it by introducing a random ordering that allows it to cover the entire search space while still focusing on the heuristics region. An experimental comparison is then made against both the Multi-heuristic List Scheduling Genetic Algorithm and the Combined Genetic-List Algorithm.
Scheduling to be Competitive in Supply Chains
Sabyasachi Saha and Sandip Sen
Supply chain management systems play an important role in the globalized trading market, where sub-contracts and sub-tasks are awarded to other manufacturers or suppliers through competitive auction-based contracts. Suppliers can decide their scheduling strategy for completing the contracted tasks depending on their capacity, the nature of the contracts, the profit margins, and other commitments and expectations about future contracts. Such decision mechanisms can incorporate task features including the length of the task, its priority type, scheduling windows, estimated arrival probabilities, and profit margins. A supplier's task scheduling decisions should be targeted towards creating a schedule that is flexible about accommodating urgent tasks and, in general, those tasks which produce larger profits for the supplier. In previous work, different scheduling heuristics like first fit, best fit and worst fit have been evaluated. We believe that a more opportunistic, comprehensive and robust scheduling approach can significantly improve the competitiveness of the suppliers and the efficiency of the supply chain. We present an expected utility-based scheduling strategy and experimentally compare its effectiveness with previously used scheduling heuristics.
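A toy version of such an expected-utility acceptance rule might look like this; the utility form and all names are illustrative assumptions, not the authors' model:

```python
def expected_utility(profit_now, p_future, profit_future):
    """Utility of accepting the current task in a slot, discounted by the
    expected profit of a possible future task that the slot could serve:
    accept only if the immediate profit beats the expected opportunity cost."""
    return profit_now - p_future * profit_future

def accept(profit_now, p_future, profit_future):
    return expected_utility(profit_now, p_future, profit_future) >= 0

# A low-profit task is declined when a lucrative task is likely to arrive,
# but accepted when that arrival is improbable
assert not accept(profit_now=10, p_future=0.8, profit_future=50)
assert accept(profit_now=10, p_future=0.1, profit_future=50)
```

Heuristics like first fit or best fit ignore the `p_future * profit_future` term, which is what an expected-utility strategy exploits.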
Contract Net Protocol for Cooperative Optimisation and Dynamic Scheduling of Steel Production
D. Ouelhadj, P. I. Cowling and S. Petrovic
This paper describes a negotiation protocol proposed for inter-agent cooperation in a multi-agent system we developed for optimisation and dynamic integrated scheduling of steel production. The negotiation protocol is a two-level bidding mechanism based on the contract net protocol. The purpose of the protocol is to allow the agents to cooperate and coordinate their actions in order to find globally near-optimal robust schedules, which optimise the original production goals whilst minimising the disruption caused by unexpected real-time events. Experimental results show the performance of the negotiation protocol in coordinating the agents to generate good-quality robust schedules. This performance is evaluated in terms of stability and utility measures used to assess the robustness of the steel production processes in the presence of real-time events.