Posted by: Sai - 02-26-2014, 04:07 PM - Forum: Project
Project Abstract
Auto-summarization tool
Auto-summarization is a technique for generating summaries of electronic documents. Its applications include summarizing search-engine results and providing briefs of large documents that do not have an abstract. There are two categories of summarizers: linguistic and statistical. Linguistic summarizers use knowledge about the language (syntax, semantics, usage etc.) to summarize a document. Statistical summarizers find the important sentences using statistical methods (such as the frequency of a particular word) and normally do not use any linguistic information.
In this project, an auto-summarization tool is developed using statistical techniques. The techniques involve finding the frequency of words, scoring the sentences and ranking the sentences. The summary is obtained by selecting a number of sentences from the top of the ranked list; the user specifies that number when invoking the tool. The tool operates on a single document and provides a summary of it, but it can be made to work on multiple documents by choosing proper algorithms for integration. Pre-processing interfaces handle the following document types: Plain Text, HTML, Word Document.
Following is a list of the functional components of the tool.
1. Text pre-processor. This will work on the HTML or Word Documents and convert them to plain text for processing by the rest of the system.
2. Sentence separator. This goes through the document and separates the sentences based on some rules (for example, a sentence ends at a dot followed by a space). Other appropriate criteria may be added to separate the sentences.
3. Word separator. This separates the words based on some criteria (for example, a space denotes the end of a word).
4. Stop-words eliminator. This removes common English words such as ‘a’, ‘an’, ‘the’, ‘of’ and ‘from’ before further processing. These words are known as ‘stop-words’. A list of applicable stop-words for English is available on the Internet.
5. Word-frequency calculator. This calculates the number of times a word appears in the document and the number of sentences in which it appears. Stop-words have already been eliminated and do not figure in this calculation. For example, the word ‘Unix’ may appear a total of 100 times in a document, in 80 sentences (some sentences have more than one occurrence of the word). Minimum and maximum thresholds can be set for the frequencies, with the threshold values determined by trial and error.
6. Scoring algorithm. This algorithm determines the score of each sentence. Several possibilities exist. The score can be made proportional to the sum of the frequencies of the words in the sentence (i.e., if a sentence has 3 words A, B and C, then the score is proportional to the sum of how many times A, B and C occur in the document). The score can also be made inversely proportional to the number of sentences in the document in which the words of the sentence appear. Many such heuristic rules can be applied to score the sentences.
7. Ranking. The sentences will be ranked according to their scores. Other criteria, such as the position of a sentence in the document, can be used to control the ranking. For example, even if their scores are high, consecutive sentences would not be put together.
8. Summarizing. Based on the user input on the size of the summary, the sentences will be picked from the ranked list and concatenated. The resulting summary file could be stored with a name like <originalfilename>_summary.txt.
9. User Interface. The tool could use a GUI or a plain command-line interface. In either case, it should have easy and intuitive ways of getting input from the user (the document, the size of the summary needed etc.).
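Components 2 to 8 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the project's actual implementation: the function names, the tiny stop-word list and the sentence-splitting rule are all hypothetical, the score is the plain sum of word frequencies (only one of the heuristics mentioned above), and the selected sentences are put back in document order, which is a design choice the abstract does not specify.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real list is much longer.
STOP_WORDS = {"a", "an", "the", "of", "from", "in", "on", "to", "and",
              "is", "are", "it", "for", "by", "with", "that", "this"}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace.

    A simple rule; it will mis-split abbreviations such as 'e.g. this'.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Lower-case the sentence and return its words without stop-words."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_frequencies(tokenized):
    """Return (occurrences per word, number of sentences containing the word)."""
    word_freq = Counter(w for words in tokenized for w in words)
    sentence_freq = Counter(w for words in tokenized for w in set(words))
    return word_freq, sentence_freq


def summarize(text, num_sentences):
    """Pick the highest-scoring sentences and return them in document order."""
    sentences = split_sentences(text)
    tokenized = [split_words(s) for s in sentences]
    # sentence_freq is unused here; it would feed the inverse heuristic.
    word_freq, _sentence_freq = word_frequencies(tokenized)

    # Score = sum of document-wide frequencies of the words in the sentence.
    scores = [sum(word_freq[w] for w in words) for words in tokenized]

    # Rank by descending score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))

    # Restore original order so the summary reads naturally (assumption).
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling summarize(text, 2) on a plain-text document returns its two highest-scoring sentences. Thresholds on the frequencies, the inverse sentence-frequency heuristic and position-based ranking rules would be added on top of this skeleton.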
Sir, three modules are required: admin, owner and traveler (for example, flipkey.com). I am having problems with the design and with some of the coding, so please send me your work.
I am waiting for it, sir. I want this project in PHP.
I want the source code for this project. Please help me out.
Asthma is a chronic lung disease that blocks the airways, which makes the patient susceptible to irritations and allergies. Using fuzzy rule-based expert systems, we need to detect asthma and its severity level.
Hello there,
Please send me the working project for the topic "A Secure Protocol for Spontaneous Wireless Ad Hoc Networks Creation". It is an IEEE project.
Abstract:
This paper presents a secure protocol for spontaneous wireless ad hoc networks which uses a hybrid symmetric/asymmetric scheme and the trust between users in order to exchange the initial data and to exchange the secret keys that will be used to encrypt the data. Trust is based on the first visual contact between users. Our proposal is a complete self-configured secure protocol that is able to create the network and share secure services without any infrastructure. The network allows sharing resources and offering new services among users in a secure environment. The protocol includes all functions needed to operate without any external support. We have designed and developed it in devices with limited resources. Network creation stages are detailed and the communication, protocol messages, and network management are explained. Our proposal has been implemented in order to test the protocol procedure and performance. Finally, we compare the protocol with other spontaneous ad hoc network protocols in order to highlight its features and we provide a security analysis of the system.
I want a project on genetic algorithms in which two Hindustani raagas are merged using a genetic algorithm to generate music (good music). This is my final-year project, please help.
As electronic communication has become more prevalent, mobile and universal, the threat of data compromise has also grown larger. Security is a prime concern. A body sensor network (BSN) interconnects sensors placed in the human body. Biosensors are used to collect biological data from different parts of the body at different times. Security and privacy are potential problems in a BSN. The solution to this problem is a biometric approach. Conventional biometrics cannot be used in a BSN because they lack high randomness and are time-invariant. Thus novel biometrics can be used.
The human body inherently has the ability to communicate in a secure manner. Thus physiological characteristics can be captured by the sensors of the BSN to generate the entity identifier. The main goal of this paper is to secure the communication between the nodes of the BSN and to avoid interference. The Interpulse Interval (IPI) of the ECG signal can be used as a biometric trait to distribute the key in a secure manner. Since medical information is sent over the network, security plays a vital role in the BSN.
I need the source code to develop a flowchart generator for the C language, and I also need to know how to start the project. The project source code must be in Java.
Can anybody tell me what the password is? I have downloaded the source code for the college management project from this website, and it is asking for a password to open the file.