1. Open Data Hub Architecture
The architecture of the Open Data Hub is depicted in Figure 1.1, which shows its constituent elements together with its main goal: to gather data from Data Sources and make them available to Data Consumers. Data Consumers are usually third-party applications that use those data in any way they deem useful, including (but not limited to) studying the evolution of historical data or carrying out data analysis to produce Statistical Graphics.
At the core of the Open Data Hub lies bdp-core, a Java application that contains all the business logic and handles all connections with the underlying database through the Data Abstraction Layer (DAL). The Open Data Hub Core is composed of different modules: a Writer, which receives data from the Data Sources and stores them in the database using the DAL, and a Reader, which extracts data from the database and exposes them to Data Consumers through APIs on REST endpoints.
Communication with the Data Sources is handled by the Data Collectors, which are Java applications built on top of the dc-interface that use a Data Transfer Object (DTO) for each source to import the data correctly. Dual to the dc-interface, the ws-interface allows the export of DTOs to web services, which expose them to Data Consumers.
The bottom part of Figure 1.1 shows the Data Format used in the various steps of the data flow. Since the data are exposed in JSON, applications that use them can be developed in any language.
Records in the Data Sources can be stored in any format and are converted into JSON as DTOs. They are then transmitted to the Writer, which converts them and stores them in the Database using SQL. To expose data, the Reader queries the DB using SQL, transforms the results into JSON DTOs, and passes them to the Web Services, which serve the JSON to the Data Consumers.
The Elements of the Open Data Hub in Detail
As Figure 1.1 shows, the Open Data Hub is composed of a number of elements, which are described in the remainder of this section in the same order as they appear in the picture.
- Data Providers
A Data Provider is a person, company, or public body that supplies data or datasets, usually belonging to a single domain, to the Open Data Hub. Data are automatically picked up by sensors and stored under the responsibility of the Data Provider in some standard format, such as CSV or JSON.
Since a data provider may decide at some point to stop publishing its data on the Open Data Hub, and new data providers can join in the future, data providers are not an official part of the Open Data Hub. You can learn more about this, including the current list of data providers, in the dedicated section of the documentation.
- Dataset
A dataset is a collection of records that typically originate from one Data Provider, although, within the Open Data Hub, a dataset can be built from data gathered from multiple Data Providers. The underlying data format of a dataset never changes.
- Data Collector
Data collectors form a library of Java classes used to transform data gathered from Data Providers into a format that can be understood, used, and stored by the Open Data Hub Core. As a rule of thumb, one Data Collector is used for one Dataset and uses DTOs to transfer its data to the Open Data Hub Core. Data Collectors are usually created by extending the dc-interface in the bdp-core repository.
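The transformation step performed by a Data Collector can be sketched as follows. This is only an illustration under stated assumptions: the CSV layout, the `CsvCollectorSketch` class, and the `Measurement` record are hypothetical and are not part of the dc-interface, whose actual API is not described in this section.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical collector step: turns CSV lines from a provider into typed objects. */
final class CsvCollectorSketch {

    /** Illustrative stand-in for a DTO entry; not an actual bdp-core class. */
    record Measurement(String stationId, String dataType, long timestamp, double value) {}

    /** Parses lines of the assumed form "stationId,dataType,epochMillis,value". */
    static List<Measurement> parse(List<String> csvLines) {
        List<Measurement> result = new ArrayList<>();
        for (String line : csvLines) {
            if (line.isBlank() || line.startsWith("#")) {
                continue; // skip empty lines and comment lines
            }
            String[] fields = line.split(",");
            if (fields.length != 4) {
                throw new IllegalArgumentException("Expected 4 fields: " + line);
            }
            result.add(new Measurement(
                    fields[0].trim(),
                    fields[1].trim(),
                    Long.parseLong(fields[2].trim()),
                    Double.parseDouble(fields[3].trim())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Measurement> measurements = parse(List.of(
                "# station,type,timestamp,value",
                "meteo-01,air-temperature,1700000000000,12.5"));
        System.out.println(measurements);
    }
}
```

A real Data Collector would additionally hand the parsed objects over to the Open Data Hub Core as DTOs; that step is omitted here because it depends on the dc-interface.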
- Data Transfer Object (DTO)
Data Transfer Objects (DTOs) are used to translate the data format used by the Data Providers into a format that the Writer can understand and use to transfer the data into the Big Data infrastructure. The same DTO is later used by the Reader (see below) to present data. DTOs are written in JSON and are composed of three entities: Station, Data Type, and Record.
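To make the three entities more concrete, the following sketch models them as Java records and serializes them to JSON by hand. All field names, the `StationData` wrapper, and the JSON layout are assumptions made for illustration; the actual DTO classes in bdp-core may look different. The Record entity is named `DataRecord` here to avoid confusion with `java.lang.Record`.

```java
import java.util.List;

/** Illustrative DTO model with the three entities; all field names are assumptions. */
final class DtoSketch {

    record Station(String id, String name) {}
    record DataType(String name, String unit) {}
    record DataRecord(long timestamp, double value) {}

    /** One station, one data type, and the records measured for that pair. */
    record StationData(Station station, DataType dataType, List<DataRecord> records) {}

    /** Escapes backslashes and double quotes, which is enough for this sketch. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the JSON text by hand to keep the example free of external libraries. */
    static String toJson(StationData d) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"station\":{\"id\":").append(quote(d.station().id()))
          .append(",\"name\":").append(quote(d.station().name())).append("},")
          .append("\"dataType\":{\"name\":").append(quote(d.dataType().name()))
          .append(",\"unit\":").append(quote(d.dataType().unit())).append("},")
          .append("\"records\":[");
        for (int i = 0; i < d.records().size(); i++) {
            DataRecord r = d.records().get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"timestamp\":").append(r.timestamp())
              .append(",\"value\":").append(r.value()).append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        StationData data = new StationData(
                new Station("meteo-01", "Bolzano"),
                new DataType("air-temperature", "celsius"),
                List.of(new DataRecord(1700000000000L, 12.5)));
        System.out.println(toJson(data));
    }
}
```

In production code a JSON library would normally replace the hand-written serializer; it is used here only so that the sketch runs with the JDK alone.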
- Writer
With the Writer, we enter the Open Data Hub Core. The Writer’s purpose is to receive DTOs from the Data Collectors and store them in the DB. It therefore implements all the methods needed to read the DTO’s JSON format and to write to the database using SQL.
- ODH Core
The Open Data Hub Core lies at the very center of the Open Data Hub. Its main task is to keep the database updated so that it can always serve up-to-date data. To do so, it relies on the Writer to gather new or updated data from the data collectors, and it keeps a history of all the data it has ever received. It also relies on the Reader to expose data to the data consumers. Internal communication uses only SQL commands.
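The division of labor between the Writer and the Reader, and the idea of keeping a full history while still serving the most recent value, can be sketched with an in-memory stand-in for the database. The `CoreSketch` class and its methods are hypothetical; the real Core persists data in a database through SQL rather than in a map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** In-memory stand-in for the Writer/Reader split; not thread-safe, for illustration only. */
final class CoreSketch {

    record Measurement(String stationId, long timestamp, double value) {}

    /** Append-only history per station, playing the role of the database. */
    private final Map<String, List<Measurement>> history = new HashMap<>();

    /** Writer side: appends and never overwrites, so every received value is kept. */
    void write(Measurement m) {
        history.computeIfAbsent(m.stationId(), key -> new ArrayList<>()).add(m);
    }

    /** Reader side: the measurement with the highest timestamp, if any exists. */
    Optional<Measurement> latest(String stationId) {
        return history.getOrDefault(stationId, List.of()).stream()
                .max(Comparator.comparingLong(Measurement::timestamp));
    }

    /** Reader side: an unmodifiable copy of everything received for a station. */
    List<Measurement> historyOf(String stationId) {
        return List.copyOf(history.getOrDefault(stationId, List.of()));
    }

    public static void main(String[] args) {
        CoreSketch core = new CoreSketch();
        core.write(new Measurement("meteo-01", 1000L, 12.5));
        core.write(new Measurement("meteo-01", 2000L, 13.0));
        System.out.println("Latest:  " + core.latest("meteo-01"));
        System.out.println("History: " + core.historyOf("meteo-01"));
    }
}
```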
- Data Abstraction Layer (DAL)
The Data Abstraction Layer is used by both the Writer and the Reader to access the Database and exchange DTOs. It relies on Hibernate, a Java object-relational mapping framework, and contains classes that map the content of a DTO to the corresponding database tables.
- Database (DB)
The database represents the persistence layer and contains all the data sent by the Writer. Its configuration requires that two users be defined: one with full permissions, used by the Writer, and one with read-only permissions, used by the Reader.
- Reader
The Reader is the last component of the Core. It uses the DAL to retrieve DTOs from the DB and transmit them to the web services.
- Web Services
The Web Services, which extend the ws-interface in the Open Data Hub Core repository, receive data from the Reader and make them available to Data Consumers by exposing APIs and REST endpoints. They transform the DTOs they receive into JSON.
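As a rough illustration of what exposing JSON on a REST endpoint involves, the sketch below serves a fixed JSON document using the HTTP server that ships with the JDK. It does not use the ws-interface; the `WebServiceSketch` class, the `/stations` path, and the payload are assumptions chosen for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical web service: serves a JSON document on a single endpoint. */
final class WebServiceSketch {

    /** Starts a server on 127.0.0.1; port 0 lets the system pick a free port. */
    static HttpServer start(int port, String json) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/stations", exchange -> {
            byte[] body = json.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            // Assumes a non-empty body; an empty one should be sent with length -1.
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(0, "[{\"id\":\"meteo-01\"}]");
        System.out.println("Serving http://127.0.0.1:"
                + server.getAddress().getPort() + "/stations");
        // A real service would keep running; the sketch shuts down immediately.
        server.stop(0);
    }
}
```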
- Data Consumers
Data consumers are applications that use the JSON produced by the web services and manipulate it to produce useful output for the end user. As mentioned in the section Project Overview, the term application is meant in a broad sense: it can be a web site, a software application for any device, a communication channel, or any other means of using the data.
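A Data Consumer written in Java could retrieve the JSON with the standard `java.net.http` client, as in the minimal sketch below. The `ConsumerSketch` class is hypothetical and no real Open Data Hub endpoint is assumed: the URL has to be supplied as a command-line argument.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical data consumer: fetches a JSON document from a REST endpoint. */
final class ConsumerSketch {

    /** Sends a GET request and returns the body, failing on any non-200 status. */
    static String fetchJson(URI endpoint) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("Usage: java ConsumerSketch.java <endpoint-url>");
            return;
        }
        System.out.println(fetchJson(URI.create(args[0])));
    }
}
```

From here, the consumer would parse the JSON with a library of its choice and turn it into whatever output suits its users, for example a chart or a table.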
Also part of the architecture, but not pictured in the diagram, is the
persistence.xml file, which contains the credentials and
PostgreSQL configuration used by both the Reader and the Writer.