Skyline WebSite Developer Manual 1.0


1. General: Skyline Website and this document

This Website is for research purpose on Skyline query in Database research field. Although it is just a course project for the CSG130 Introduction to Database System in Northeastern University, under the instruction of Professor Donghui Zhang, we plan to make it improved for the continue research on Skyline query in on-line environment.

1.0 Copyleft

This part is always the most important, and also, the least important part of a developer manual, so we entitle it to be 0. This document is under the protection of GFDL

  Copyright (C)  2006 Jarod Wen.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
A copy of the license is included in the section entitled ``GNU
Free Documentation License''.

The code of Skyline website, are under the GNU General Public License (see GPL). This site is based on .NET 2.0, but still we will try to share these code with all the researchers in this field. So all the codes are under the protection of GNU General Public License. Although it is a little funny to use GPL on a C# code, we think it is necessary to all of you who are reading this documents.

1.1 What is Skyline?

Skyline query has been well known as the Maximum Vector problem, Pareto Set, which means that to find objects in a data set who will not be dominated by any other objects. Here "dominate" means that: if object A dominate B, then all the attributions of A will have no worse value than B, and there is at least one attribution for A which is better than the corresponded attribution in B. The comparation for different attributions will be different: some may be "large-better", while others will be "smaller-better".

A classical example on 2-D skyline query is the "Hotel-near-beach" problem. Assume you will select a hotel near the beach for your holiday, and you have a dataset containing all the information you will use: for each hotel, there are their price and the distance to the beach. Your aim is to find the best one who will have lower price and short distance to the beach. The following figure, which is from The Skyline Operator by Stephan B¨orzs¨onyi, Donald Kossmann and Konrad Stocker, can give a better view on this problem.


1.2 How to compute the skyline object?

In the database research field, there are already a lot of algorithms for the skyline query for different environment. Following is a list on the most common used algorithms in the research and discussion, with the source they are from:


More and more efficient algorithms will come out for better performance on skyline query, which is also the purpose for our website.

1.3 About the platform: ASP.NET 2.0

Due to the require of the course project, the Skyline Website is constructed on the .NET Framework 2.0 using C#, edited in VS2005.

1.4 About this website and the Task List

This site is for the research aim, so with the going on research in Skyline Computation, more tests and features will be added in. Till now the performance we achieved can be far away from satisfication. So the following task list will show the direction we will plunge into in the future.

1.5 About this document

This document is developer-aimed, for the possible improvement, suggestions and modifications from any researchers and other else.

Lots of the description on the algorithms we used in our website can be found from their original papers, and the code here will not guarantee to be the best implementation, so try to give us your cool codes on them to improve the performance!

2. Roadmap: Whole structure of the website

Until Nov 29th, 2006, when it is the deadline of the project for the course, the whole website just achieved a few features on the skyline computation: basic skyline query using BNL algorithm and presorting algorithm; user defined data schema and data set.

The whole project contains the following folders and files.

3. Compute Skyline objects

In this part, the ideas used to compute the skyline objects in our website will be discussed. All the codes about these algorithms can be found in the App_code folder. In implementation of our algorithms, all the objects in data set have been considered as a row of data in a data table, which is correspond to the data table in database.

3.1 BNL(Block-nested-loops)

BNL algorithm is a straight-forward computation process for skyline query. This algorithm keep a window to store candidate skyline objects and then fix the real skyline objects by comparing objects in the window with all the objects not in the window. Finally all the objects which cannot be dominated by any other objects will be left in the window as the skyline objects.
In [BKS01], the window for the BNL algorithm is fixed, however in our implementation, it is dynamic ranged, which can improve the efficiency but heavier memory load.
In BasicSkyline.cs, we firstly defined the operator blnDominate, which will return whether one row in a data table can dominate another row. This operator will be called in the BasicBNLSkyline function, and then this function will compare all the objects for the skyline object which will not be dominated by any other objects.

3.2 SFS(Sort Filter Skyline)

SFS requires presorted data on all dimensions of the data table for each column(attribution) according to the descend or ascend sequence. It is based on the theory that the skyline represents the closure over the maximum scoring tuples of the relation with respect to all monotone scoring functions[CGGL03].

With no duplicate in the data table, all the data can be ranked according to each of the columns(attributions), and then the one with higher rank compared to the others will surely be the skyline object. Based on this idea, a sweep algorithm can be implemented on the presorted data to scan row-by-row to record the objects with higher ranks in each columns as the skyline objects. PreSort is the function to presort the date for each columns(attributions), and then SkylineQuery_Presorted is used for the skyline query by sweeping.

The actual data always have duplication, however, and the simple sweep can no longer get the correct answer due to the reason that the non-skyline object which has the same values on some columns(attributions) as one of the skyline objects will be counted as a skyline object. Now our attempt is use basic BNL algorithm for the objects gained from the sweep algorithm.

3.3 Epsilon Skyline

Epsilon skyline is a new attemption on the fixed number of skyline query, or k-skyline, which will find exactly k objects in the data table with better attributions than any others. Common skyline query always can't control the number of the objects queried out, and this method is an improvement for the user defined skyline computation.

4. User defined data schema

Users defined data schema provides easier and more customized skyline queries. Using this feature, users can upload their own data schemes and also the whole data table into the database, and then run skyline query on them. All the features are included in the UserDefined_1.aspx.cs and UserDefined_2.aspx.cs.

4.1 Database setup

These features are supported by special data table in database: Schemas, which includes three attributions: SchemaID, SchemaName, Columns.

+Schemas
|-----SchemaID                              ID of the schemes
|-----SchemaName                          Name of the schemes
|-----Columns                                 Columns contained in the scheme, divided with , between columns

4.2 User defined data schema upload

This feature uses the DataSet adapter provided by ADO.NET 2.0, which will setup a data connection between the data and the project, and then provide necessary interface to the developer for select, delete, insert.

        <asp:ObjectDataSource ID="ObjectDataSource1" runat="server" DeleteMethod="Delete"
            InsertMethod="Insert" OldValuesParameterFormatString="original_{0}" SelectMethod="GetSchemaData"
            TypeName="dsSchemaTableAdapters.SchemasTableAdapter" UpdateMethod="Update">
            <DeleteParameters>
                ...
            </DeleteParameters>
            <UpdateParameters>
                ...
            </UpdateParameters>
            <InsertParameters>
                ...
            </InsertParameters>
        </asp:ObjectDataSource>

Then we use two controls: GridView and DetailView, both of which are new controls in .NET2.0, to provide the on-the-fly data schemes upload:

<asp:GridView ID="GridView1" runat="server" DataSourceID="ObjectDataSource1" OnSelectedIndexChanged="GridView1_SelectedIndexChanged"
            Width="650px" AutoGenerateColumns="False" DataKeyNames="SchemaID">
...

<asp:DetailsView ID="DetailsView1" runat="server" AutoGenerateRows="False"
            DataSourceID="ObjectDataSource1" Height="50px" Width="649px" DataKeyNames="SchemaID" DefaultMode="Insert">
...

We also provide the feature to upload schemes from text file. The format of the text file is required to be like the following:

Column1   Column2   Column3

In the text file, there will be a tab ("\t") between two column names, and the scheme will be terminated by an enter ("\n"). The btnUpload_Click behavior then will read all the text in the text file and just turn the first line as the active scheme for storing, which means that this feature can only read one scheme from one text file each time.

4.3 Bulk upload user defined data into database

This feature will enable user to upload their data into database from a text file which contains data in the follow format:

R1C1   R1C2   R1C3   ...   R1Cn
R2C1   R2C2   R2C3   ...   R2Cn
...
RmC1   RmC2   RmC3   ...   RmCn

There is a tab ("\t") between two columns and each data row is terminated by an enter ("\n").

The upload feature uses the System.Data.SqlClient.SqlBulkCopy in blnBulk2DB, which will bulk upload data into an existing data table in database. It is new in the ADO.NET 2.0.

To avoid possible failure of upload, we use transaction to protect the integration of the data:

        SqlConnection conn = new SqlConnection(System.Configuration.ConfigurationManager.AppSettings["DB_NBAConnectionString"]);
        conn.Open();
        //Start transaction
        SqlTransaction trans = conn.BeginTransaction();
        ... ...
             //Start to bulk add data into new table
            SqlBulkCopy sqlBulkObj = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, trans);
             ... ...
            trans.Commit();

5 System Error Logger

For the purpose of research, there will be unavoidable error in the running time of the website, and always these error messages are important for developer and researcher to check their algorithms. For this purpose, we make a error logger in the file SystemLog.cs to store most of the error into the log file ErrLog.txt.

5.1 How Error Logger works

The principle of the logger is very simple: we just maintain a log file in the main directionary of the website. If there is no such a file, a new log file will be created.

        //Check whether the log file is existing.
        if (!File.Exists(FilePath))
        {
            FileStream fs = new FileStream(FilePath, FileMode.CreateNew);
            fs.Close();
        }

The length of the log file is limited in 24KB. If one error occurs, all the error message will be appended into this file.

        //Write error log.
        if (File.ReadAllBytes(FilePath).Length >= 240000)
        {
            File.WriteAllText(FilePath, DateTime.Now + "\t" + strException + "\n");
        }
        else
        {
            File.AppendAllText(FilePath, DateTime.Now + "\t" + strException + "\n");
        }

5.2 How to use this logger in your developing?

To redirect any error in your implementation to the log file, you can simply do it like this:

        try
        {
            // Your code here
        }
        catch (Exception ex)
        {
            SystemLog.WriteException(ex.ToString());
            // The terminate process
        }

5.3 The content of the ErrLog.txt

The content of the ErrLog.txt is the same as the standard format of any error threw out by .NET CLR debugger:

2006-11-17 18:05:21    System.ArgumentException: Column 'c2' does not belong to table .
   at System.Data.DataRow.GetDataColumn(String columnName)
   at System.Data.DataRow.get_Item(String columnName)
   at BasicSkyline.blnDominate(DataRow dr1, DataRow dr2, ArrayList columnList, ArrayList TypeList) in e:\Documents and Settings\LeeWinnie\My Documents\Visual Studio 2005\WebSites\SkylineWebSite\App_Code\BasicSkyline.cs:line 173
   at BasicSkyline.BasicBNLSkyline(DataTable dt, ArrayList columnList, ArrayList TypeList) in e:\Documents and Settings\LeeWinnie\My Documents\Visual Studio 2005\WebSites\SkylineWebSite\App_Code\BasicSkyline.cs:line 91

6 Reference

All the following reference paper can be found here: http://groups-beta.google.com/group/DB_Skyline , which is a discussion group on Skyline computation maintained by the author of this document.

7 LICENSE: GPL and GFDL

This document is under the protection of GFDL, and the program of the website is under GPL. Both of them can be found here: http://www.gnu.org/licenses/

8 Index





Skyline Website
Last Modified:Sun Nov 26 20:22:25 2006
Generated on Sun Nov 26 20:22:25 2006 for Skyline Website by  doxygen 1.5.1-p1