Web Data Mining Definition
What is Web Mining? – Definition from Techopedia
What Does Web Mining Mean?
Web mining is the process of using data mining techniques and algorithms to extract information directly from the Web, drawing on Web documents and services, Web content, hyperlinks and server logs. The goal of Web mining is to find patterns in Web data by collecting and analyzing information, in order to gain insight into trends, the industry and users in general.
Techopedia Explains Web Mining
Web mining is a branch of data mining that uses the World Wide Web as its primary data source, including all of its components, from Web content and server logs to everything in between. The data mined from the Web may be the facts that Web pages are meant to contain, which can consist of text, structured data such as lists and tables, and even images, video and audio.
Categories of Web mining:
Web content mining: This is the process of mining useful information from the contents of Web pages and Web documents, which are mostly text, images and audio/video files. Techniques used in this discipline have been drawn heavily from natural language processing (NLP) and information retrieval.
Web structure mining: This is the process of analyzing the nodes and connection structure of a website using graph theory. Two kinds of structure can be obtained: the structure of a website in terms of how it is connected to other sites, and the document structure of the website itself, in terms of how each page is connected.
Web usage mining: This is the process of extracting patterns and information from server logs to gain insight into user activity, including where users come from, how many users clicked which items on the site, and what types of activity take place on the site.
Internet Mining and its Phases – IJERT
Department of Computer Science & Engineering, Ganga Institute of Technology and Management, Kablana, Jhajjar, Haryana, India
Abstract: In this paper, we describe data warehousing and data mining. Data warehousing is the process of storing data on a large scale, and data mining is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. As massive amounts of data are continuously collected and stored, many industries are becoming interested in mining patterns (association rules, correlations, clusters, etc.) from their databases. Association rule mining is one of the important tasks used to find frequent itemsets in a customer transaction database. Each transaction consists of the items purchased by a customer in a … Internet mining is the application of data mining techniques to discover patterns from the Internet. Internet Usage Mining (IUM) is the application of data mining techniques to web data. The data sources are mainly web server logs, proxy server logs and cookies stored on the user's computer. IUM is composed of three phases, namely preprocessing, pattern discovery and pattern analysis. This paper describes these phases in detail. A necessary introduction to Internet Mining is also provided as background.
Keywords: Data warehousing and its architectures, Data Mining, Techniques of Data Mining, Internet mining.
INTRODUCTION
Data warehousing helps organizations store data. Data warehouse architecture is primarily based on the business processes of an enterprise, taking into consideration data consolidation across the enterprise with adequate security, data modeling and organization, the extent of query requirements, metadata management and application, planning of the warehouse staging area for optimum bandwidth utilization, and full technology implementation.
The Data Warehouse Architecture includes many facets. Some of these are listed as follows:
Process architecture
Data Model architecture
Technology architecture
Information architecture
Resource architecture
PROCESS ARCHITECTURE
Process architecture describes the number of stages and how data is processed to convert raw or transactional data into information for end users. The data staging process includes three main areas of concern, or sub-processes, for planning data warehouse architecture, namely Extract, Transform and Load.
These interrelated sub-processes are sometimes referred to as an ETL process.
Extract: Since data for the data warehouse can come from different sources and may be of different types, planning how to extract the data, along with appropriate compression and encryption techniques, is an important requirement.
Transform: Transformation of data with appropriate conversion, aggregation and cleaning, besides denormalization and surrogate key management, is also an important process to plan when building a data warehouse.
Load: Planning how to load data efficiently, considering the multiple areas where the data will be loaded and retrieved, is also an important part of the data warehouse architecture plan.
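A minimal Python sketch of the three ETL sub-processes follows. It assumes a tiny in-memory CSV as the source system and an in-memory SQLite database as the warehouse; the column names and the tables `dim_product` and `fact_sales` are hypothetical, and compression and encryption are omitted.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: raw transactional rows as CSV text.
RAW_CSV = """order_id,product,amount
1, Widget ,10.50
2,gadget,4.00
3,Widget,
4,GADGET,6.00
"""

def extract(text):
    """Extract: read raw rows from the (assumed) CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean values, drop incomplete rows, assign surrogate keys."""
    surrogate = {}  # natural key (normalized product name) -> surrogate key
    facts = []
    for row in rows:
        if not row["amount"].strip():
            continue  # cleaning: skip rows with a missing amount
        name = row["product"].strip().lower()  # conversion: normalize names
        key = surrogate.setdefault(name, len(surrogate) + 1)
        facts.append((int(row["order_id"]), key, float(row["amount"])))
    dims = [(key, name) for name, key in surrogate.items()]
    return dims, facts

def load(conn, dims, facts):
    """Load: write dimension and fact rows into the warehouse tables."""
    conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE fact_sales (order_id INTEGER, product_key INTEGER, amount REAL)")
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", dims)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", facts)
    conn.commit()

conn = sqlite3.connect(":memory:")
dims, facts = transform(extract(RAW_CSV))
load(conn, dims, facts)

# Aggregation query against the loaded warehouse.
totals = conn.execute(
    "SELECT d.name, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_product d USING (product_key) GROUP BY d.name ORDER BY d.name"
).fetchall()
print(totals)  # → [('gadget', 10.0), ('widget', 10.5)]
```

Keeping the three steps as separate functions mirrors the text's point that each sub-process is planned on its own; a real warehouse would replace each function with its own tooling.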
DATA MODEL ARCHITECTURE
In Data Model Architecture (also known as Dimensional Data Model), there are 3 main data modeling styles for enterprise warehouses:
3rd Normal Form – Top Down Architecture, Top Down Implementation
Federated Star Schemas – Bottom Up Architecture, Bottom Up Implementation
Data Vault – Top Down Architecture, Bottom Up Implementation
Technology Architecture
Scalability and flexibility are required in all facets. The extent of these features largely depends on organizational size, business requirements, the nature of the business, etc.
Technology (or technical) architecture primarily evolves from the process architecture, metadata management requirements based on business rules, security-level implementations, and evaluation of specific technology tools.
Besides these, the technology architecture also covers technology implementation standards in database management, database connectivity protocols (ODBC, JDBC, OLE DB, etc.), middleware (based on ORB, RMI, COM/DCOM, etc.), network protocols (DNS, LDAP, etc.) and other related technologies.
Information Architecture
It is the process of translating the information from one form to another in a step by step sequence so as to manage the storage, retrieval, modification and deletion of the data in the data warehouse.
Resource Architecture
Resource architecture is related to software architecture in that many resources come from software resources. Resources are important because they help determine performance. Workload is the other part of the equation. If you have enough resources to complete the workload in the right amount of time, then performance will be high. If there are not enough resources for the workload, then performance will be low.
DATA MINING
Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction.
Fig:1 Data Mining is the core of Knowledge Discovery process
Data mining has its own tools and techniques for mining interesting information. When these tools and techniques are applied to the World Wide Web (as is, or with some modifications and adaptations for the WWW environment), the result is called Internet Mining.
So, Internet mining refers to discovery and analysis of useful information over the World Wide Web. Internet mining can be broadly classified into three categories:
Content Mining
Structure Mining
Usage Mining
Fig:2 Types of Internet Mining (Internet Mining divided into Content Mining, Structure Mining and Usage Mining)
Content Mining:
Content Mining refers to mining the desired content on the World Wide Web. Various search engines exist for content mining, such as AltaVista, Lycos, WebCrawler and MetaCrawler.
Structure Mining:
Structure mining tries to discover the link structure of hyperlinks at the inter-document level in order to generate a structural summary of the website and its web pages.
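Treating pages as nodes and hyperlinks as directed edges, a simple structural summary can be sketched in Python. The link graph below is hypothetical, and in-degree, out-degree and the split between intra-site and inter-site links are chosen here as illustrative structural measures; they are not measures prescribed by the source.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical hyperlink graph: page URL -> list of URLs it links to.
LINKS = {
    "http://a.example/index.html": ["http://a.example/products.html",
                                    "http://a.example/about.html",
                                    "http://b.example/partner.html"],
    "http://a.example/products.html": ["http://a.example/index.html"],
    "http://a.example/about.html": ["http://a.example/index.html",
                                    "http://a.example/products.html"],
}

def structural_summary(links):
    """Return in-degree, out-degree and internal/external link counts."""
    in_degree = Counter()
    out_degree = Counter()
    internal = external = 0
    for source, targets in links.items():
        out_degree[source] = len(targets)
        for target in targets:
            in_degree[target] += 1
            # Same host: intra-site link (document structure of the site).
            # Different host: inter-site link (how the site connects to others).
            if urlparse(source).netloc == urlparse(target).netloc:
                internal += 1
            else:
                external += 1
    return in_degree, out_degree, internal, external

in_deg, out_deg, internal, external = structural_summary(LINKS)
home = "http://a.example/index.html"
print(in_deg[home], out_deg[home])  # → 2 3
print(internal, external)           # → 5 1
```

Degree counts are only a starting point; link-analysis algorithms such as HITS build on the same directed-graph representation.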
Usage Mining:
Usage Mining refers to the automatic discovery of knowledge about user access patterns from web servers. It includes:
Preprocessing
Pattern Discovery Tools
Pattern Analysis Tools
Figure 3: Types of Internet Mining
THE INTERNET USAGE MINING
Internet Usage Mining refers to the automatic discovery of knowledge about user access patterns from different web servers. It is the application of various data mining techniques to discover and analyze the usage patterns of web data.
Why Internet Usage Mining?
The Internet has been growing at an explosive rate over the last decades, and a great deal of information is available on it. Millions of websites exist, and more are published daily. Billions of users browse the Internet for different reasons, each searching for some interesting information. By interesting information, we mean the information the user is browsing for; the rest does not seem interesting to that user. How interesting a piece of information is to a particular user is quantified by interestingness measures, which are computed using data mining techniques such as clustering, classification and association. These users need tools and techniques (e.g., browsers) so that they can find the information they need in less time and with more accurate results.
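For association rules, widely used interestingness measures are support, confidence and lift. The source does not define them, so the notation below is an assumption: $D$ is a set of user sessions, $X$ is a set of pages, and $A \Rightarrow B$ is a rule between page sets.

```latex
\operatorname{supp}(X) = \frac{|\{\, s \in D : X \subseteq s \,\}|}{|D|}, \qquad
\operatorname{conf}(A \Rightarrow B) = \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)}, \qquad
\operatorname{lift}(A \Rightarrow B) = \frac{\operatorname{conf}(A \Rightarrow B)}{\operatorname{supp}(B)}
```

For example, if 20 of 100 sessions visit /products, 10 of those also visit /cart, and 25 sessions visit /cart overall, then the rule /products ⇒ /cart has support 10/100 = 0.10, confidence 0.10/0.20 = 0.5 and lift 0.5/0.25 = 2. A lift above 1 means the two pages co-occur more often than chance would predict.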
Another perspective is that of the engineers, developers, web designers and similar professionals who strive to create increasingly structured information on structured websites. They are responsible for managing the structure of websites and presenting interesting information in an interesting manner. They design tools and techniques for this purpose and use them to manage websites by content and structure.
A very different perspective is that of the companies that have invested millions in the web and web technologies. These are mostly e-commerce organizations that sell their products and services over the World Wide Web. For these organizations, it is essential to keep track of users' visit patterns, their profiles and their interestingness measures. This creates a need for client-side and server-side intelligent systems that can mine knowledge across the web.
So, techniques and tools are needed to satisfy the above requirements. Together, these requirements give rise to INTERNET MINING. The term INTERNET MINING is very broad; the work presented here focuses on a special kind of internet mining called INTERNET USAGE MINING.
A number of organizations have invested heavily in web technologies and carry out business on the web. For example, …, etc. Many people across the world access their websites and do business with them. Analyzing this access data can reveal the value of the customers to these organizations, helping them identify good, valued and bad customers based on their access patterns. This data also supports cross-marketing strategies, campaigns and other initiatives. Organizations can assess the effectiveness of their websites and of their advertisements on different websites. Web Usage Mining helps them identify market segments and target interesting customers.
Where the data comes from:
Data about users is stored primarily in the web server access logs. Other sources include referrer logs, which record the referring page from which a user arrived at a particular page. User forms and survey results are also used as input. In Internet Usage Mining, data is collected at web servers, at proxy servers, and from the organization's own databases. Methods such as cookies, CGI scripts, JavaScript, forms, session tracking, query data, click streams and page views are frequently used in web usage mining.
The data required to perform web usage mining includes web server logs, cookies, proxy server logs, surveys, registration forms filled in by users, users' access patterns (click streams), etc. The data sources can be classified into three categories:
Collection of Data from Server:
These data sources include logs from the web server. Web server logs are important because they record the main user access patterns: every request that reaches the web server is recorded in its logs. Web servers are computers running special software that fulfills user requests. The web server software may be Apache Tomcat, BEA WebLogic, IBM's WebSphere, Sun Microsystems' J2EE Application Server, etc. The logs they maintain can be in different formats.
So, care should be taken when data is collected from more than one web server. A web usage mining tool must be capable of processing logs of more than one web server software.
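Because servers can write logs in different formats, a usage mining tool typically normalizes each format into common records. The sketch below handles one such format, the NCSA Common Log Format (optionally followed by the referrer and user-agent fields of the Combined format), using Python's `re` module. The sample line is invented, and other log formats would each need their own parser.

```python
import re
from datetime import datetime

# Common Log Format, optionally followed by the two extra
# "Combined" fields (referrer and user agent).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

sample = ('192.0.2.7 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://example.com/start.html" "Mozilla/4.08"')
rec = parse_line(sample)
print(rec["host"], rec["path"], rec["status"], rec["referrer"])
# → 192.0.2.7 /index.html 200 http://example.com/start.html
```

Lines that do not match return `None`, so malformed entries can be counted and skipped rather than silently misread.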
However, the logs stored on web servers cannot be considered the complete input, because there are different levels of caching in the Internet architecture: a request is often answered by a cache and never reaches the web server. Moreover, some data is not logged by web servers, such as information passed through the POST method. Other sources include cookies. Cookies are small pieces of data, set by web servers and stored by the browser, that collect information about individual clients. Because cookies raise privacy concerns, the user must allow the web server to create them. Various scripting languages such as CGI scripts, JavaScript, VBScript and Perl are also used to handle the data that is sent back to the web server from client browsers.
Collection of Data from Clients:
Client-side collection requires user cooperation. The technologies include Java applets and various scripts, which users must enable. Data from clients can also be collected by using modified browsers, but users must be willing to use such a browser. Companies such as NetZero[9], YouMint[10] and AllAdvantage[11] offer users incentives for using modified browsers and clicking on the advertisements shown in them.
Collection of Data from Proxy Servers:
Collecting data only from web servers is not sufficient for web usage mining, because not every request reaches the web server. To speed up browsing and reduce the load on web servers, proxy servers are also used. Proxy servers therefore also act as servers and keep user access logs of their own. These logs should also be analyzed when performing web usage mining.
PROCESS OF INTERNET USAGE MINING
The process of internet usage mining is composed of three steps, as shown in Figure 4:
Pre-processing
Pattern Discovery
Pattern Analysis
Figure 4: Web usage mining process
Pre-processing:
Pre-processing prepares the data obtained from server logs, proxy server logs and other sources for the pattern discovery and pattern analysis tasks. It includes the following processes:
Data Cleaning: Involves removing log entries that do not contribute to the data mining task. These unnecessary entries may be called noise.
Identification of users: Associates each page reference with a particular user. User identification is not an easy task because (i) a single IP address can be used by multiple users, and (ii) different IP addresses can be used by a single user.
Identification of sessions: Groups the page references of each user into user/server sessions. It also involves some issues: (i) a single IP address can have multiple server sessions, as in the case of proxy servers, and (ii) multiple IP addresses can belong to a single server session.
Path Completion: Because of proxy servers and caching, it is not always possible to get complete data from web servers. The access paths recorded in the web server logs are incomplete if some page is served through a proxy server or a cache. Path completion is the process of completing those incomplete paths.
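The four pre-processing steps can be sketched in Python over records that have already been parsed. The heuristics are assumptions chosen for illustration, not the paper's prescribed method: image and style requests and non-2xx responses count as noise, a user is identified by IP address plus user agent, a session ends after 30 minutes of inactivity, and a path is completed by replaying cached "back" steps when a request's referrer is an earlier page. Referrers are given as site-relative paths for simplicity, and all records are hypothetical.

```python
from datetime import datetime, timedelta

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed noise
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold

def clean(records):
    """Data cleaning: drop embedded resources and unsuccessful requests."""
    return [r for r in records
            if not r["path"].lower().endswith(IGNORED_SUFFIXES)
            and 200 <= r["status"] < 300]

def identify_users(records):
    """User identification heuristic: same IP address and same user agent."""
    users = {}
    for r in records:
        users.setdefault((r["host"], r["agent"]), []).append(r)
    return users

def split_sessions(user_records):
    """Session identification: start a new session after a long gap."""
    sessions, current, last_time = [], [], None
    for r in sorted(user_records, key=lambda rec: rec["time"]):
        if last_time is not None and r["time"] - last_time > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(r)
        last_time = r["time"]
    if current:
        sessions.append(current)
    return sessions

def complete_path(pages_with_referrers):
    """Path completion: if the referrer is an earlier page rather than the
    previous one, assume the user went back through cached pages and
    re-insert those back-navigation steps."""
    path = []
    for page, referrer in pages_with_referrers:
        if path and referrer in path and path[-1] != referrer:
            idx = len(path) - 1 - path[::-1].index(referrer)  # last occurrence
            path.extend(reversed(path[idx:-1]))
        path.append(page)
    return path

def rec(host, minute, path, status=200, referrer=None, agent="Mozilla/4.08"):
    """Build a hypothetical, already-parsed log record."""
    return {"host": host, "agent": agent, "path": path, "status": status,
            "referrer": referrer,
            "time": datetime(2000, 1, 1, 10, 0) + timedelta(minutes=minute)}

LOG = [
    rec("192.0.2.7", 0, "/A"),
    rec("192.0.2.7", 1, "/logo.gif", referrer="/A"),  # noise: image
    rec("192.0.2.7", 2, "/B", referrer="/A"),
    rec("192.0.2.7", 3, "/C", referrer="/B"),
    rec("192.0.2.7", 5, "/D", referrer="/A"),         # went back via cache
    rec("192.0.2.7", 50, "/A"),                       # new session (45 min gap)
    rec("192.0.2.7", 51, "/missing", status=404),     # noise: error
    rec("198.51.100.3", 4, "/B"),                     # different user
]

results = []
for user, user_records in identify_users(clean(LOG)).items():
    for session in split_sessions(user_records):
        pages = [(r["path"], r["referrer"]) for r in session]
        results.append((user[0], complete_path(pages)))
        print(user[0], complete_path(pages))
```

The first session prints as `['/A', '/B', '/C', '/B', '/A', '/D']`: the two back steps to `/A` were served from cache and never logged, and path completion restores them.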
Pattern Discovery:
Once the necessary transactions have been identified, the next step is the discovery of patterns. Pattern discovery phase extensively uses data mining algorithms. Various pattern discovery methods are:
Statistical Analysis: Statistical analysis techniques are the most commonly used. They include frequency distributions, mean, mode, median, etc., computed over the web server logs. These techniques provide the basis for the IUM process: they produce statistical data that supports market decisions.
Clustering: Clustering is the division of data into groups of similar objects. A cluster contains objects that are similar to one another. From a machine learning perspective, clusters correspond to hidden patterns. Many clustering algorithms have been devised; major ones include hierarchical methods, the k-means method and grid-based clustering. In IUM, two types of clusters need to be discovered: usage clusters and page clusters. Usage clusters help identify groups of users with similar browsing patterns, and page clusters help identify groups of pages with similar content. A dynamic clustering-based model based on Markov analysis is presented in [15].
Classification: Classification is a procedure in which individual items are placed into groups based on quantitative information about one or more characteristics inherent in the items (referred to as traits, variables, characters, etc.) and on a training set of previously labeled items. Formally, the problem can be stated as follows: given training data $\{(x_1, y_1), \dots, (x_n, y_n)\}$, produce a classifier $h: \mathcal{X} \to \mathcal{Y}$ that maps any object $x \in \mathcal{X}$ to its true classification label $y \in \mathcal{Y}$, defined by some unknown mapping $g: \mathcal{X} \to \mathcal{Y}$ (the ground truth). For example, if the problem is filtering spam, then $x_i$ is some representation of an email and $y$ is either "Spam" or "Non-Spam". Statistical classification algorithms are typically used in pattern recognition systems. In web usage mining, we are interested in profiling users who belong to the same class. Classification algorithms include the K-Nearest-Neighbor (KNN) algorithm, the Naïve Bayesian (NB) algorithm, concept-vector-based algorithms, etc.
Association: Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is creating association rules, which can be used in market basket analysis; one example is the Microsoft Association algorithm. In IUM, association algorithms are used to relate web pages that were referenced by a user in a single session. Algorithms such as Apriori can be used for association rule mining.
Sequential patterns: Sequential pattern mining finds inter-transaction patterns in which one pattern is followed by another in time order. Web servers record log entries that include a timestamp for each request. Sequential patterns can help organizations predict when a user will next visit their website. They can also help determine which file or page is visited most during a given session, day, time, week or month.
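Three of the pattern discovery methods above can be sketched in Python over sessions produced by pre-processing. The sessions and the 0.5 minimum support are hypothetical; the association step is a plain level-wise Apriori search, and the sequential step is simplified to counting direct page-to-page transitions (general sequential pattern algorithms such as GSP go further). Clustering and classification are not shown.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, median

# Hypothetical sessions produced by pre-processing (ordered page lists).
SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]

def page_statistics(sessions):
    """Statistical analysis: page frequencies and session-length summary."""
    hits = Counter(page for s in sessions for page in s)
    lengths = [len(s) for s in sessions]
    return hits, mean(lengths), median(lengths)

def apriori(sessions, min_support):
    """Association: level-wise Apriori search for frequent page sets."""
    transactions = [frozenset(s) for s in sessions]
    n = len(transactions)
    level = {frozenset([p]) for t in transactions for p in t}
    frequent, k = {}, 1
    while level:
        # Count support of each candidate k-itemset and keep frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(current)
        k += 1
        # Join frequent sets into size-k candidates, then prune any
        # candidate that has an infrequent (k-1)-subset.
        level = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)):
                level.add(union)
    return frequent

def transitions(sessions):
    """Sequential patterns (simplified): count page A followed directly by B."""
    return Counter((a, b) for s in sessions for a, b in zip(s, s[1:]))

hits, avg_len, med_len = page_statistics(SESSIONS)
frequent = apriori(SESSIONS, min_support=0.5)
# Confidence of the rule /products => /cart.
confidence = (frequent[frozenset({"/products", "/cart"})]
              / frequent[frozenset({"/products"})])
trans = transitions(SESSIONS)

print(hits["/home"], avg_len, med_len)                      # → 3 2.25 2.0
print(sorted(sorted(s) for s in frequent if len(s) == 2))
print(round(confidence, 2))                                 # → 0.67
print(trans[("/home", "/products")])                        # → 2
```

The pair {/home, /cart} is dropped because it appears in only one of four sessions, so the three-page set is pruned without ever being counted, which is the saving Apriori relies on.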
Pattern Analysis:
Pattern Analysis is the last step in the IUM process. It helps organizations analyze how customers access their website and which pages they visit most. The purpose of pattern analysis is to filter out uninteresting rules and analyze the interesting rules found during the pattern discovery process. The major techniques used in this phase include:
SQL Queries
Visualization Techniques
OLAP Techniques
Usability Analysis
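As an illustration of the SQL-query technique, discovered rules can be stored in a table and filtered with a query. This sketch uses Python's built-in `sqlite3` module; the `rules` table, its columns, the rule values and the thresholds (support ≥ 0.2, confidence ≥ 0.5) are all hypothetical.

```python
import sqlite3

# Hypothetical output of the pattern discovery step:
# (antecedent page, consequent page, support, confidence).
DISCOVERED = [
    ("/products", "/cart", 0.50, 0.67),
    ("/home", "/products", 0.50, 0.67),
    ("/home", "/about", 0.25, 0.33),
    ("/cart", "/checkout", 0.10, 0.90),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (antecedent TEXT, consequent TEXT, "
             "support REAL, confidence REAL)")
conn.executemany("INSERT INTO rules VALUES (?, ?, ?, ?)", DISCOVERED)

# Filter out uninteresting rules: keep only those meeting both
# thresholds, strongest first.
interesting = conn.execute(
    "SELECT antecedent, consequent, confidence FROM rules "
    "WHERE support >= ? AND confidence >= ? "
    "ORDER BY confidence DESC, antecedent",
    (0.2, 0.5),
).fetchall()
print(interesting)
# → [('/home', '/products', 0.67), ('/products', '/cart', 0.67)]
```

Passing the thresholds as query parameters lets an analyst rerun the same filter with different cut-offs without editing the SQL.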
CONCLUSIONS
Internet Usage Mining is a special case of data mining in which the usage patterns of web pages are analyzed. Web pages can be on one or more servers and can be in different formats. Internet Usage Mining is a very useful tool for organizations that want to retain their customer base. We provided a detailed survey of research in this area. Various software packages and tools are available on the market for IUM. We also provided a demonstration of WebLogAnalyzer® by Nihuo. However, the survey is short, as the area is not yet well established. There is immense scope for research in this area to identify new methods and tools for discovering and analyzing patterns.
Web usage mining: discovery and applications of usage patterns from …
Abstract: Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest, from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system is given.
Web usage mining: discovery and applications of usage patterns from Web data
Published in: ACM SIGKDD Explorations Newsletter, Volume 1, Issue 2, January 2000 (115 pages)
Copyright © 2000 Authors. Publisher: Association for Computing Machinery, New York, NY, United States
Published: 1 January 2000
Frequently Asked Questions about web data mining definition
What is web mining, and what are the types of web mining?
Web mining methods are divided into three categories: web content mining, web structure mining and web usage mining. There are several functional areas, including e-commerce web mining, text mining, and management of customer behavior.