Sponsored by BMBF

Work Package IV (AP-IV): Distributed Database Access and Data Stream Management

Work package IV delivers efficient, distributed, Grid-based processing of data originating from databases as well as from data streams. Community members can subscribe to data streams that are linked to persistent data. This makes it possible to define special events that are fed from data streams and combined with persistent data, and these events can in turn trigger specific actions. Selected community applications from various fields, such as data analysis of the Millennium-Simulation, the GAIA simulation database, or robotic telescopes, support the development by providing scenarios.
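As a rough illustration of the event idea described above, the following minimal Python sketch combines items from a data stream with persistent reference data and triggers an action when they deviate. All names (`StreamItem`, `process_stream`, the catalogue keys and values) are hypothetical and chosen only for this example; they are not taken from the work package's software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class StreamItem:
    """One element of a data stream (hypothetical structure)."""
    key: str
    value: float


def process_stream(
    stream: Iterable[StreamItem],
    persistent: Dict[str, float],
    threshold: float,
    action: Callable[[StreamItem], None],
) -> List[str]:
    """Combine each stream item with persistent data and fire events.

    An event fires when the streamed value deviates from the stored
    reference value by at least `threshold`.
    """
    triggered = []
    for item in stream:
        reference = persistent.get(item.key)
        if reference is None:
            continue  # no persistent record to combine with
        if abs(item.value - reference) >= threshold:
            action(item)  # the event triggers the subscribed action
            triggered.append(item.key)
    return triggered


# Usage with made-up values: only "src-1" deviates enough to fire.
alerts: List[StreamItem] = []
catalogue = {"src-1": 12.0, "src-2": 15.5}  # stand-in for persistent data
stream = [
    StreamItem("src-1", 10.5),
    StreamItem("src-2", 15.4),
    StreamItem("src-3", 1.0),  # unknown to the catalogue, skipped
]
triggered = process_stream(stream, catalogue, threshold=1.0, action=alerts.append)
```

Here the persistent data is a plain dictionary and the action simply collects alerts; in a Grid setting both would presumably be remote services.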

  1. Disseminating, Publishing, and Processing Data in the Grid.

    Researchers increasingly intend to provide large data sets in the Grid (e.g., the Millennium-Simulation, GAIA simulation database, or large catalogues) and to process them efficiently in a distributed fashion. The means in use range from complex operations to complete process descriptions or workflows (see further developments on the Process Coordinator of the Planck project).

    By using mobile code together with sophisticated description and intelligent distribution mechanisms, we parallelise the processing efficiently and minimise network traffic. Appropriate load balancing among the computers and across the network is pivotal.

  2. Grid-based Management and Processing of Data Streams.

    Decentralised and distributed information processing is the key to enabling researchers to gain new and more thorough insights by associating and handling distributed persistent data (e.g., observations or simulations) or continuously generated data streams originating from sensors, gauging stations, telescopes, and the like.

    To reach the necessary efficiency, we propose adaptive in-network query processing, which moves the queries towards the data source and thus optimises the data flow in the network.
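The effect of moving a query towards the data source can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: the function names and the record layout are hypothetical, the "network" is only simulated by counting shipped records, and the adaptive aspect is not modelled.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Predicate = Callable[[Record], bool]


def central_query(source: List[Record], predicate: Predicate) -> Tuple[List[Record], int]:
    """Ship every record to the subscriber and filter there."""
    shipped = list(source)  # all records cross the network
    result = [r for r in shipped if predicate(r)]
    return result, len(shipped)


def in_network_query(source: List[Record], predicate: Predicate) -> Tuple[List[Record], int]:
    """Evaluate the predicate at the source and ship only the matches."""
    shipped = [r for r in source if predicate(r)]  # filtered before transfer
    return shipped, len(shipped)


# Made-up records: ten measurements, of which three satisfy the query.
records = [{"id": float(i), "flux": i * 10.0} for i in range(10)]


def bright(record: Record) -> bool:
    return record["flux"] >= 70.0


central_result, central_shipped = central_query(records, bright)
pushed_result, pushed_shipped = in_network_query(records, bright)
```

Both strategies return the same answer, but the second ships only the matching records, which is the reduction in data flow that in-network processing aims for.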

Organisational Structure

Partners: MPA, MPE, TUM, ZAH

Work Package Manager: Tobias Scholl (TUM)

Technical Contact Partners:

  • Arthur Carlson (MPE)
  • Wolfgang Hovest (MPA)
  • Tobias Scholl (TUM)
  • Wolfgang Voges (MPE)
  • Joachim Wambsganß (ZAH)

Work Schedule

  1. Requirements specification and architecture design
  2. Development of a demonstration prototype
  3. Distributed query processing on persistent data
  4. Setup of a distributed Function-Provider-Server
  5. Development of a data stream management system
  6. Integration of persistent data into the data stream management system and further enhancements
  7. Realisation of a query optimiser for distributed query processing
  8. Deployment of data stream management into the infrastructure of the grid
  9. Test of the development through adaptation of community applications