Lambda Join Demonstration Wins Award at Supercomputing 02 Conference
November 21, 2002
Baltimore, Maryland. Project DataSpace, working with researchers
from Chicago, Ottawa and Amsterdam, has won the SuperComputing '02
High Performance Bandwidth Challenge Award for Innovative, High
Speed, Data Correlation--Best Use of Emerging
Infrastructure. The group includes researchers from the National
Center for Data Mining at the University of Illinois at Chicago
(UIC), CANARIE, and SARA, who have been working together over
the past year to produce real-time merging of data over lambda
networks. At SC02, they presented the first demonstration of
the technology, with impressive results.
For the past two decades, database researchers have optimized
the ability of databases to join two tables in a database by
a common key, such as an employee or product ID. Database joins
are one of the key
technologies that make data processing practical.
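As a concrete illustration of joining two tables by a common key, here
is a minimal sketch of a classic in-memory hash join. The table
contents, the column name emp_id, and the function name hash_join are
all hypothetical and chosen only for this example; they do not come
from the demonstration itself.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict rows on a shared key column (hypothetical helper)."""
    # Build phase: index every row of the left table by its key value.
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)

    # Probe phase: look up each right-hand row and combine the matches.
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Illustrative data only.
employees = [
    {"emp_id": 1, "name": "Ada"},
    {"emp_id": 2, "name": "Grace"},
]
salaries = [
    {"emp_id": 2, "salary": 120},
    {"emp_id": 3, "salary": 95},
]
result = hash_join(employees, salaries, "emp_id")
```

Only the row whose emp_id appears in both tables ends up in result.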
As more and more data is distributed over the internet, the
ability to join data located in two different global locations
is becoming critical. There are two fundamental problems: finding
efficient protocols to move data over long distances and finding
efficient algorithms to merge two data streams.
At Supercomputing '02, significant progress was made on
both fronts.
A stream of data was moved over SURFnet connecting a cluster
of computers at SARA Computing and Networking Services in Amsterdam
and a cluster of computers at StarLight in Chicago at over 2.8
Gb/s. At the same time a stream of data was moved over Canada's
CA*net4 network connecting a computer cluster at CANARIE in
Ottawa and a UIC computer cluster at StarLight in Chicago at
over 2 Gb/s. Both streams used SABUL, a new protocol for
high-performance data transport developed by the National
Center for Data Mining/Laboratory for Advanced Computing at
the University of Illinois at Chicago.
At the same conference, using computer clusters at the StarLight
facility in Chicago, two streams of data were merged at over
500 Mb/s per node in the three-node cluster. These so-called
"lambda joins" are an important component for distributed data
mining applications.
The algorithm for joining two lambda streams was developed by
scientists at the National Center for Data Mining at the University
of Illinois at
Chicago.
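This release does not describe the NCDM algorithm itself. As a rough
illustration of what merging two data streams by key can look like,
the sketch below uses a textbook sort-merge join, not the method used
in the demonstration. It assumes each stream delivers (key, value)
pairs already sorted by key; the names merge_join, site_a and site_b
are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(stream_a, stream_b):
    """Yield (key, a_value, b_value) for keys present in both streams.

    Assumption: both inputs are iterables of (key, value) pairs that
    arrive sorted by key, so each is read once, front to back.
    """
    groups_a = groupby(stream_a, key=itemgetter(0))
    groups_b = groupby(stream_b, key=itemgetter(0))
    key_a, rows_a = next(groups_a, (None, None))
    key_b, rows_b = next(groups_b, (None, None))

    while rows_a is not None and rows_b is not None:
        if key_a < key_b:
            # Stream A is behind: skip ahead to its next key.
            key_a, rows_a = next(groups_a, (None, None))
        elif key_a > key_b:
            # Stream B is behind: skip ahead to its next key.
            key_b, rows_b = next(groups_b, (None, None))
        else:
            # Keys match: pair every A value with every B value.
            values_b = [value for _, value in rows_b]
            for _, value_a in rows_a:
                for value_b in values_b:
                    yield key_a, value_a, value_b
            key_a, rows_a = next(groups_a, (None, None))
            key_b, rows_b = next(groups_b, (None, None))


# Illustrative data standing in for two remote streams.
site_a = [(1, "a1"), (2, "a2"), (2, "a2b"), (4, "a4")]
site_b = [(2, "b2"), (3, "b3"), (4, "b4"), (4, "b4b")]
matches = list(merge_join(site_a, site_b))
```

Because the function is a generator that only ever holds the values
for one key at a time, it can in principle keep pace with incoming
data instead of waiting for either stream to finish.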
"Lambda data joins are an excellent early example of how CA*net4's
lightpath provisioning facility can be used to help build new
and innovative distributed services," according to Bill St.
Arnaud, Senior Director for Advanced Networks at CANARIE.
Many network engineers use lambda and lightpath interchangeably
to describe a low-layer, end-to-end dedicated communications
channel with effectively guaranteed bandwidth. Using protocols such
as SABUL, it is now possible to use lambdas to move large data
sets over long distances as fast as the data can be pulled from
disk. Using lambda joins, it is now possible to merge two such
streams and look for patterns.
"With lambda joins, it is now practical to look for correlation
in data even if the data is scattered around the world," said
Robert Grossman, Director of the National Center for Data Mining
at the University of Illinois at Chicago and President of the
Two Cultures Group.
This demonstration was awarded one of the three Qwest Bandwidth
Challenge Awards presented at this year's Supercomputing 02
Conference.
Shirley Connelly, Associate Director, NCDM, 312 413 2176, connelly@uic.edu.
Robert Grossman, Director, NCDM, 312 413 2176, grossman@uic.edu.