Donghui Zhang's Recent Talks
The default length of each talk is one hour
-
[abstract]
Privacy-Preserving Data Publishing
- CCIS Ph.D. seminar, February 28, 2008.
-
[abstract]
PC-CSC: Profile-based and Constrained Compressed Skycube, extended from a SIGMOD'06 paper.
- UMass Lowell, October 26, 2007.
-
[abstract]
Algorithms in Spatial Databases (a short course)
- Renmin University, China, May 26 - June 1, 2007.
-
[abstract]
The Skyline Query in Databases: Which Objects are the Most Important?
- CCIS Ph.D. seminar, March 22, 2007.
-
[abstract]
The Min-Dist Optimal-Location Query, extended from a VLDB'06 paper.
- UMass Boston, October 17, 2006.
- CCIS Ph.D. Seminar, November 30, 2006.
- UMass Lowell, December 6, 2006.
-
[abstract]
Introduction to Spatial Database Research (30 min)
- CCIS Ph.D. Open House, Mar 23, 2007.
-
[abstract]
Introduction to Spatial Databases
- NEU ACM Talk, Feburary 21, 2007.
-
[abstract]
The optimal-location query, extended from an SSTD'05 paper.
- CCIS Ph.D. seminar, November 3, 2005.
- Boston University, April 19, 2006.
- WPI, April 28, 2006.
-
[abstract]
Spatial Database Query Processing using Indices.
- CCIS Ph.D. seminar, October 21, 2004.
-
[abstract]
On Spatial-Range Closest-Pair Query, extended from an SSTD'03 paper.
- UMass Boston, March 12, 2003.
-
[abstract]
ER-Miner: A New Method to Mine Essential Rules and
Constrained Essential Rules.
- CCIS Ph.D. seminar, 2003.
Talk Abstracts
-
Privacy-Preserving Data Publishing
Data publishing has a widespread need. For instance, a medical
institute may need to publish its medical data, so as to enable
the public researchers to ask queries such as: "what fraction of
males in Massachusetts have lung cancer?" However, at the same
time the privacy of the individuals needs to be preserved. The
first attempt, of simply removing the name attributes before
publishing
the data, does not work. The values of the remaining attributes, if
unique, may reveal the identities of some individuals. This talk
discusses some existing and on-going work on this topic.
-
PC-CSC: Profile-based and Constrained Compressed Skycube
This talk introduces the SIGMOD'06 paper on the Compressed Skycube, and
then raises the algorithmic question on how to choose a set of CSCs to
build given a query profile and a resource constraint. In more
detail, the CSC (built on all attributes) supports the skyline queries on
an arbitrary set of attributes, e.g. considering (#points, #rebounds,
#assists), who are the best NBA players? Here a player is one of the
best NBA players if there does not exist another player who is better in
all dimensions. With a high dimensional dataset, the complete CSC may be
too large. It may be better to build a set of CSCs, each on a subset of
attributes. Given a query profile such as the percentage of queries on
every subset of attributes, and a constraint such as storage size, the
open question is what CSCs to build which satisfy the constraint and
which covers the maximal number of queries.
-
Algorithms in Spatial Databases (a short course)
Spatial databases have many practical applications, and have close relationship with GIS, Computational Geometry, and Operations Research. The spatial database research community has produced many interesting algorithms for solving various spatial queries. This course covers some of these algorithms. It discusses spatial index structures such as the R-tree. It introduces the concepts and algorithms of the range query, nearest neighbor query, reverse nearest neighbor query, aggregation query, fastest-path query, optimal location query, skyline query, and join query. By learning the state-of-the-art algorithms for processing these spatial queries, the students are expected to develop a foundational understanding of the spatial database research field.
-
The Skyline Query in Databases: Which Objects are the Most Important?
Recently, the database research community has witnessed a strong interest in the skyline operator. This operator can be used to suggest important objects among a set of multi-attribute objects where different attributes may not be comparable. Given a set of d-dimensional objects, the skyline is the set of objects that are not dominated by any other object. An object O1 dominates another object O2 if O1 is better than or equal to O2 in all dimensions, and is strictly better than O2 in at least one dimension. A classic example is to find important hotels in Nassau, Bahamas. A user wants to find hotels which are both cheap and close to beach. The traditional top-k query requires the user to provide a preference function which converts a pair of price and distance to a single numeric score.
However, this requirement may not be reasonable for some applications, since different attributes may not be comparable. The skyline concept provides a way of suggesting important objects, without requiring comparison among different attributes. This talk introduces the skyline concept and describes the progressive and optimal BBS algorithm. It also presents the Compressed Skycube developed by the Database Lab at NEU.
-
The Min-Dist Optimal-Location Query
This paper proposes and solves the min-dist optimal-location
query in spatial databases. Given a set S of sites, a set O
of weighted objects, and a spatial region Q, the min-dist
optimal-location query returns a location in Q which, if a
new site is built there, minimizes the average distance from
each object to its closest site. This query can help a franchise
(e.g. McDonald¡¯s) decide where to put a new store in or-
der to maximize the benefit to its customers. To solve this
problem is challenging, for there are theoretically infinite
number of locations in Q, all of which could be candidates.
This paper first provides a theorem that limits the number
of candidate locations without losing the power to find ex-
act answers. Then it provides a progressive algorithm that
quickly suggests a location, tells the maximum error it may
have, and keeps refining the result. When the algorithm
finishes, the exact answer can be found. The intermediate
result of early runs can be used to prune the search space
for later runs. Crucial to the pruning technique are novel
lower-bound estimators. The proposed algorithm, the effect
of several optimizations, and the progressiveness are exper-
imentally evaluated.
-
Introduction to Spatial Database Research (30 min)
The spatial database research is an interesting sub-field of
database research. It has close relationship with GIS, Computational
Geometry, and Operations Research. And it has practical applications.
This talk introduces the spatial database research field by discussing
several crucial spatial queries.
In particular, the talk will introduce the concepts of the fastest-path query,
range query, aggregation query,
nearest neighbor query, reverse nearest neighbor query, optimal
location query, and skyline query.
The paper then discusses one query processing scheme: the compressed skycube.
-
Introduction to Spatial Databases
The spatial database research is an interesting sub-field of
database research. It has close relationship with GIS, Computational
Geometry, and Operations Research. And it has practical applications.
This talk introduces the spatial database research field by discussing
several crucial spatial queries and their efficient processing algorithms.
In particular, the talk will introduce the range query, aggregation query,
nearest neighbor query, reverse nearest neighbor query, optimal
location query, and skyline query.
-
The Optimal-Location Query
We propose and solve the optimal-location query in spatial
databases. Given a set S of sites, a set O of weighted objects, and a
spatial region Q, the optimal-location query returns a location in Q with
maximum influence. Here the influence of a location l is the total weight
of its RNNs, i.e. the total weight of objects in O that are closer to l than
to any site in S. This new query has practical applications, but is very
challenging to solve. Existing work on computing RNNs assumes a single
query location, and thus cannot be used to compute optimal locations.
The reason is that there are infinite candidate locations in Q. If we
check a finite set of candidate locations, the result can be inaccurate,
i.e. the revealed location may not have maximum influence. This paper
proposes three methods that accurately compute optimal locations. The
first method uses a standard R*-tree. To compute an optimal location,
the method retrieves certain objects from the R*-tree and sends them as
a stream to a plane-sweep algorithm, which uses a new data structure
called the aSB-tree to ensure query efficiency. The second method is
based on a new index structure called the OL-tree, which novelly extends
the k-d-B-tree to store segmented rectangular records. The OL-tree is
only of theoretical usage for it is not space efficient. The most practical
approach is based on a new index structure called the Virtual OL-tree.
These methods are theoretically and experimentally evaluated.
-
Spatial Database Query Processing using Indices
This talk first introduces the concept of spatial databases and points some
of its applications. Then it describes two representative spatial index structures:
the R-tree and k-d-B-tree. Finally, efficient index-based algorithms are discussed
to solve the selection query, aggregation query, and nearest neighbor query.
-
On Spatial-Range Closest-Pair Query
An important query in spatial database research is to find the closest pair of objects in a given space. Existing work assumes two different data sets are both indexed by R-trees, and finds the closest pair in the whole space via an optimized R-tree join technique. However, this technique does not perform well when the two data sets are identical. In this paper, we address the closest pair problem within the same data set. Furthermore, we consider an extended version of the query, which involves a spatial range. We now are interested in finding the closest pair of objects among those inside a given range. We first extend existing techniques to solve the new problem. We then propose two index structures based on augmenting the R-tree and show that in both cases, the augmented information is easy to maintain. Experimental results show that our structures are more robust than earlier approaches.
-
ER-Miner: A New Method to Mine Essential Rules and
Constrained Essential Rules
One fundamental task in the data mining area is to find strong
association rules. That is, association rules whose support and
confidence are both beyond given minimum thresholds. Recent
research has shown that certain rules can ``imply'' others. We describe
efficient algorithm to mine
``essential'' rules, i.e. strong rules which cannot be implied by
any other strong rule. The result size is very compact, and the set of
strong rules can be derived from the essential rules without accessing
the transactional database.