SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB SEARCH
ABSTRACT
Personalized web search (PWS) has demonstrated its effectiveness in improving
the quality of various search services on the Internet. However, evidence shows
that users’ reluctance to disclose their private information during search has
become a major barrier to the wide proliferation of PWS. We study privacy
protection in PWS applications that model user preferences as hierarchical user
profiles. We propose a PWS framework called UPS that can adaptively generalize
profiles by queries while respecting user-specified privacy requirements. Our
runtime generalization aims at striking a balance between two predictive
metrics that evaluate the utility of personalization and the privacy risk of
exposing the generalized profile. We present two greedy algorithms, namely
GreedyDP and GreedyIL, for runtime generalization. We also provide an online
prediction mechanism for deciding whether personalizing a query is beneficial.
Extensive experiments demonstrate the effectiveness of our framework. The
experimental results also reveal that GreedyIL significantly outperforms
GreedyDP in terms of efficiency.
The solutions to PWS can generally be categorized into two types:
Ø Click-log-based methods, and
Ø Profile-based methods
Click-log-based methods
Ø The click-log-based methods are straightforward: they simply bias the ranking
toward pages the user clicked in his/her query history.
Ø They only work on repeated queries from the same user, which is a strong
limitation confining their applicability.
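The click-log idea can be illustrated with a minimal sketch. The log format (a
list of query/URL pairs), the function name, and the example URLs are all
assumptions made for illustration; they are not taken from any specific system.

```python
from collections import Counter

def rerank_by_click_log(query, results, click_log):
    """Move pages the user clicked for this exact query to the front.

    click_log is a hypothetical list of (query, url) pairs for one user.
    """
    clicks = Counter(url for q, url in click_log if q == query)
    # sorted() is stable, so pages with equal click counts keep the
    # search engine's original order.
    return sorted(results, key=lambda url: -clicks[url])

click_log = [
    ("jaguar", "cars.example/jaguar"),
    ("jaguar", "cars.example/jaguar"),
    ("python", "docs.example/python"),
]
results = ["zoo.example/jaguar", "cars.example/jaguar", "wiki.example/jaguar"]

print(rerank_by_click_log("jaguar", results, click_log))
print(rerank_by_click_log("jaguar price", results, click_log))
```

The second call shows the limitation: a query the user has never issued before
has no matching log entries, so the ranking is left unchanged.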
Profile-based methods
Profile-based methods can be potentially effective for almost all sorts of
queries, but are reported to be unstable under some circumstances.
They improve the search experience with complicated user-interest models
generated from user profiling techniques.
Profile-based PWS has recently demonstrated more effectiveness in improving the
quality of web search, with increasing usage of personal and behavior
information to profile its users. This information is usually gathered
implicitly from query history, browsing history, click-through data, bookmarks,
user documents, and so forth.
EXISTING SYSTEM
Profile-based PWS
Ø A user profile is typically generalized only once, offline, and then used to
personalize all queries from the same user indiscriminately.
Ø Such a “one profile fits all” strategy certainly has drawbacks, given the
variety of queries.
Ø Profile-based personalization may not even improve the search quality for
some ad hoc queries, even though exposing the user profile to a server has
already put the user’s privacy at risk.
Ø A better approach is to decide online, at runtime, whether to personalize the
query and what part of the user profile to expose.
Customization of privacy requirements
Ø In existing work, all the sensitive topics are detected using an absolute
metric called surprisal, based on information theory, assuming that interests
with less user document support are more sensitive.
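As a minimal sketch of the surprisal idea, the block below uses the standard
information-theoretic definition, -log2(p), where p is taken to be the fraction
of the user's documents that support a topic. The topic names and document
counts are hypothetical.

```python
import math

def surprisal(topic_docs, total_docs):
    """Surprisal in bits: -log2(p), with p the share of supporting documents."""
    return -math.log2(topic_docs / total_docs)

# Hypothetical number of user documents supporting each topic.
support = {"Sports": 50, "Music": 40, "Health": 10}
total = sum(support.values())

for topic, n in support.items():
    print(topic, round(surprisal(n, total), 2))
```

Topics with less document support get a higher surprisal and would therefore be
treated as more sensitive. Because the metric is absolute, it applies the same
rule to every user and leaves no room for individual privacy preferences.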
Iterative user interactions
Ø Many personalization techniques refine the search results using metrics that
require multiple user interactions, such as rank scoring and average rank.
Ø This paradigm is, however, infeasible for runtime profiling, as it would not
only pose too much risk of privacy breach but also demand prohibitive
processing time for profiling.
Ø Thus, we need predictive metrics to measure the search quality and breach
risk after personalization, without incurring iterative user interaction.
Disadvantages
Ø Existing profile-based PWS systems do not support runtime profiling.
Ø Existing methods do not take into account the customization of privacy
requirements.
Ø Many personalization techniques require iterative user interactions when
creating personalized search results.
PROPOSED SYSTEM
Ø We propose UPS (User customizable Privacy-preserving Search), a
privacy-preserving personalized web search framework that can generalize
profiles for each query according to user-specified privacy requirements.
Ø We develop two simple but effective generalization algorithms, GreedyDP and
GreedyIL, to support runtime profiling. GreedyDP tries to maximize the
discriminating power (DP), while GreedyIL attempts to minimize the information
loss (IL).
Ø The framework assumes that the queries do not contain any sensitive
information, and aims at protecting the privacy in individual user profiles
while retaining their usefulness for PWS.
Ø UPS consists of an untrusted search engine server and a number of clients.
Each client (user) accessing the search service trusts no one but
himself/herself.
Ø The key component for privacy protection is an online profiler implemented as
a search proxy running on the client machine itself.
Ø The proxy maintains both the complete user profile, in a hierarchy of nodes
with semantics, and the user-specified (customized) privacy requirements,
represented as a set of sensitive nodes.
Ø During the offline phase, a hierarchical user profile is constructed and
customized with the user-specified privacy requirements.
Ø The online phase handles queries as follows. When a user issues a query qi on
the client, the proxy generates a user profile at runtime in light of the query
terms. The output of this step is a generalized user profile Gi satisfying the
privacy requirements. The generalization process is guided by two conflicting
metrics, namely the personalization utility and the privacy risk, both defined
for user profiles.
Ø The query and the generalized user profile are sent together to the PWS
server for personalized search.
Ø The search results are personalized with the profile and delivered back to
the query proxy.
Ø Finally, the proxy either presents the raw results to the user or reranks
them with the complete user profile.
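The generalization step performed by the proxy can be illustrated with a
simplified sketch. This is not GreedyDP or GreedyIL: it ignores the utility and
risk metrics and only shows how a query-specific profile can be cut so that no
sensitive node is exposed. The topic hierarchy, the function names, and the
assumption that query terms have already been mapped to profile topics are all
hypothetical.

```python
# Hypothetical profile hierarchy: each topic maps to its parent.
# "Root" is the top node and has no parent.
PARENT = {
    "Sports": "Root", "Health": "Root",
    "Football": "Sports", "Tennis": "Sports",
    "Diabetes": "Health", "Fitness": "Health",
}

def path_to_root(topic):
    """Return the topic followed by all of its ancestors, ending at the root."""
    path = [topic]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def generalize(query_topics, sensitive):
    """Build a query-specific profile that exposes no sensitive node.

    Only branches leading to topics relevant to the query are kept, and each
    branch is cut just above the highest sensitive node on it.
    """
    exposed = set()
    for topic in query_topics:
        # Walk from the root down and stop at the first sensitive node.
        for node in reversed(path_to_root(topic)):
            if node in sensitive:
                break
            exposed.add(node)
    return exposed

# The user marked "Diabetes" as a sensitive node during the offline phase.
print(sorted(generalize({"Football"}, {"Diabetes"})))
print(sorted(generalize({"Diabetes"}, {"Diabetes"})))
```

In the second call the sensitive topic is generalized to its parent, so the
server only learns that the query relates to "Health". The proxy would send the
resulting set together with the query and keep the complete profile on the
client.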
Advantages
Ø UPS provides runtime profiling, which in effect optimizes the personalization
utility while respecting the user’s privacy requirements.
Ø It allows for customization of privacy needs.
Ø It does not require iterative user interaction.
Ø It provides an inexpensive mechanism for the client to decide whether to
personalize a query in UPS.
Hardware requirements:
Processor : Any processor above 500 MHz.
RAM : 128 MB.
Hard Disk : 10 GB.
Compact Disk : 650 MB.
Input device : Standard keyboard and mouse.
Output device : VGA and high-resolution monitor.
Software requirements:
Operating System : Windows family.
Language : JDK 1.5
Database : MySQL 5.0
Tool : HeidiSQL 3.0