This is the website of the Open Data Hub documentation, a collection of technical resources about the Open Data Hub project. The website serves as the main resource portal for everyone interested in accessing the data or deploying apps based on datasets & APIs provided by the Open Data Hub team.
The technical content comprises:
- Catalogue of available datasets.
- How-tos, FAQs, and various tips and tricks for users.
- Links to the full API documentation.
- Resources for developers.
For non-technical information about the Open Data Hub project, please point your browser to https://opendatahub.bz.it/.
The Open Data Hub project envisions the development and setup of a portal whose primary purpose is to offer a single access point to all (Open) Data from the region of South Tyrol, Italy, that are relevant to the economic sector and its actors.
The availability of Open Data from a single source will allow everybody to utilise the Data in several ways:
- Digital communication channels. Data are retrieved from the Open Data Hub and used to provide informative services, like newsletters containing weather forecasts, or used in hotels to promote events taking place in the surroundings, along with additional information like seat availability, a description, how to reach each event, and so on.
- Applications for any device, built on top of the data. These can be either proofs of concept that explore new uses or new fields in which to apply Open Data Hub data, or novel and innovative services and software products built on top of the data.
- Internet portals and websites. Data are retrieved from the Open Data Hub and visualised within graphical charts, graphs, or maps.
There are many services and software products that rely on Open Data Hub data; they are listed in the Apps built from Open Data Hub datasets section, grouped according to their maturity: production, beta, and alpha stage.
Figure 1 gives a high-level overview of the flow of data within the Open Data Hub: at the bottom, sensors gather data from various domains, which are fed to the Open Data Hub Big Data infrastructure and made available through endpoints to (third-party) applications, web sites, and vocal assistants. A more technical and in-depth overview can be found in the next section, Open Data Hub Architecture.
All the data within the Open Data Hub will be easily accessible, preferring open interfaces and APIs built on existing standards like the Open Travel Alliance (OTA), the General Transit Feed Specification (GTFS), and AlpineBits.
The Open Data Hub team also strives to keep all data regularly updated, and to use standard exchange formats such as JSON and the Data Catalog Vocabulary (DCAT) to facilitate their dissemination and use. Depending on the development of the project and the interest of users, more standards and data formats might be supported in the future.
Open Data Hub Architecture
The architecture of the Open Data Hub is depicted in Figure 2, which shows its constituent elements together with its main goal: to gather data from Data Sources and make them available to Data Consumers, usually third-party applications that use those data in any way they deem useful, including (but not limited to) studying the evolution of historical data or carrying out data analysis to produce statistical graphics.
At the core of the Open Data Hub lies bdp-core, a Java application that contains all the business logic and handles all connections with the underlying database through the DAL. The Open Data Hub Core is composed of different modules: a Writer, which receives data from the Data Sources and stores them in the database using the DAL, and a Reader, which extracts data from the database and exposes them to Data Consumers through APIs on REST endpoints.
Communication with the Data Sources is handled by the Data Collectors, Java applications built on top of the dc-interface that use a DTO (Data Transfer Object) for each source to correctly import the data. Dual to the dc-interface, the ws-interface allows DTOs to be exported to web services, which expose them to Data Consumers.
The bottom part of Figure 2 shows the data format used in the various steps of the data flow. Since the data are exposed in JSON, applications that consume them can be developed in any language.
Records in the Data Sources can be stored in any format and are converted into JSON DTOs. These are transmitted to the Writer, which converts them and stores them in the database using SQL. To expose data, the Reader queries the database using SQL, transforms the results into JSON DTOs, and passes them to the Web Services, which serve the JSON to the Data Consumers.
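The flow just described can be sketched in simplified form. The following Python sketch is purely illustrative: the real core is the Java bdp-core application and the database is PostgreSQL, not SQLite, and all names here are hypothetical.

```python
import json
import sqlite3

# Illustrative sketch of the Open Data Hub data flow (hypothetical names):
# Data Collector -> JSON DTO -> Writer -> SQL database -> Reader -> JSON.

def collect(raw_record):
    """Data Collector: convert a source record into a JSON DTO."""
    dto = {
        "station": raw_record["id"],
        "type": raw_record["kind"],
        "record": {"value": raw_record["value"], "timestamp": raw_record["ts"]},
    }
    return json.dumps(dto)

def write(db, dto_json):
    """Writer: parse the DTO's JSON and store it in the database via SQL."""
    dto = json.loads(dto_json)
    db.execute(
        "INSERT INTO measurement(station, type, value, ts) VALUES (?, ?, ?, ?)",
        (dto["station"], dto["type"], dto["record"]["value"], dto["record"]["timestamp"]),
    )

def read(db, station):
    """Reader: query the database via SQL and expose JSON DTOs."""
    rows = db.execute(
        "SELECT station, type, value, ts FROM measurement WHERE station = ?",
        (station,),
    ).fetchall()
    return json.dumps(
        [{"station": s, "type": t, "record": {"value": v, "timestamp": ts}}
         for s, t, v, ts in rows]
    )

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE measurement(station TEXT, type TEXT, value REAL, ts TEXT)")
write(db, collect({"id": "BZ:01", "kind": "temperature", "value": 21.5,
                   "ts": "2019-07-01T10:00:00"}))
print(read(db, "BZ:01"))
```

The point of the sketch is the separation of concerns: the collector only translates formats, the Writer only writes, and the Reader only reads, mirroring the module boundaries described above.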
The Elements of the Open Data Hub in Detail
As Figure 2 shows, the Open Data Hub is composed of a number of elements, described in the remainder of this section in the same order as they appear in the picture.
- Data Providers
A Data Provider is a person, company, or public body that supplies data or datasets to the Open Data Hub, usually belonging to a single domain. Data are automatically collected by sensors and stored, under the responsibility of the Data Provider, in a standard format such as CSV or JSON.
Since a data provider may at some point decide to stop publishing its data on the Open Data Hub, and new data providers may join in the future, Data Providers are not an official part of the Open Data Hub. You can learn more about this, including the current list of data providers, in the dedicated section of the documentation.
- Dataset
A dataset is a collection of records that typically originate from one Data Provider, although, within the Open Data Hub, a dataset can be built from data gathered from multiple Data Providers. The underlying data format of a dataset never changes.
- Data Collector
Data Collectors form a library of Java classes used to transform data gathered from Data Providers into a format that can be understood, used, and stored by the Open Data Hub Core. As a rule of thumb, one Data Collector is used for one dataset and uses DTOs to transfer the data to the Open Data Hub Core. Data Collectors are usually created by extending the dc-interface in the bdp-core repository.
- Data Transfer Object (DTO)
Data Transfer Objects are used to translate the data format used by the Data Providers into a format that the Writer can understand and use to transfer the data into the Big Data infrastructure. The same DTO is later used by the Reader (see below) to present data. DTOs are written in JSON and are composed of three entities: Station, Data Type, and Record.
- Writer
With the Writer, we enter the Open Data Hub Core. The Writer's purpose is to receive DTOs from the Data Collectors and store them in the database; it therefore implements all the methods needed to read the DTOs' JSON format and to write to the database using SQL.
- ODH Core
The Open Data Hub Core lies at the very centre of the Open Data Hub. Its main task is to keep the database updated, so that it can always serve up-to-date data. To do so, it relies on the Writer to gather new or updated data from the Data Collectors, keeping a history of all data it has ever received, and on the Reader to expose data to the Data Consumers. Internal communication uses only SQL commands.
- Data Abstraction Layer (DAL)
The Data Abstraction Layer is used by both the Writer and the Reader to access the database and exchange DTOs, and relies on Java Hibernate. It contains classes that map the content of a DTO to the corresponding database tables.
- Database (DB)
The database represents the persistence layer and contains all the data sent by the Writer. Its configuration requires two users to be defined: one with full permissions, used by the Writer, and one with read-only permissions, used by the Reader.
- Reader
The Reader is the last component of the Core. It uses the DAL to retrieve DTOs from the database and to transmit them to the web services.
- Web Services
The Web Services, which extend the ws-interface in the Open Data Hub Core repository, receive data from the Reader and make them available to Data Consumers by exposing APIs and REST endpoints. They transform the DTOs they receive into JSON.
- Data Consumers
Data Consumers are applications that use the JSON produced by the web services and manipulate it to produce output that is useful for the final user. As mentioned in the section Project Overview, application is intended in a broad sense: it can be a web site, a software application for any device, a communication channel, or any other means of using the data.
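To make the three DTO entities mentioned above (Station, Data Type, and Record) more concrete, here is what a DTO might look like. All field names and values here are hypothetical, chosen only for illustration; the actual schema is defined in the bdp-core repository.

```python
import json

# Hypothetical DTO illustrating the three entities of an Open Data Hub DTO.
# Field names are invented; the real schema lives in bdp-core.
dto = {
    "station": {                      # where the measurement comes from
        "id": "charging-station-001",
        "name": "Bolzano Central",
        "coordinates": {"lat": 46.498, "lon": 11.354},
    },
    "dataType": {                     # what is measured, and its unit
        "name": "number-available",
        "unit": "count",
    },
    "record": {                       # the measurement itself
        "value": 4,
        "timestamp": "2019-07-01T10:00:00Z",
    },
}

# DTOs travel between components serialized as JSON.
serialized = json.dumps(dto)
print(serialized)
```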
Also part of the architecture, but not pictured in the diagram, is the persistence.xml file, which contains the credentials and PostgreSQL configuration used by both the Reader and the Writer.
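For readers unfamiliar with Java persistence, a persistence.xml file of this kind follows the standard JPA layout sketched below. The unit names, credentials, and connection URL are invented for illustration and do not reflect the actual Open Data Hub configuration.

```xml
<persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
  <!-- Illustrative sketch only: real unit names and settings live in bdp-core -->
  <persistence-unit name="writer">
    <properties>
      <property name="javax.persistence.jdbc.url"
                value="jdbc:postgresql://localhost:5432/bdp"/>
      <property name="javax.persistence.jdbc.user" value="bdp_writer"/>
      <property name="javax.persistence.jdbc.password" value="secret"/>
    </properties>
  </persistence-unit>
  <persistence-unit name="reader">
    <properties>
      <property name="javax.persistence.jdbc.url"
                value="jdbc:postgresql://localhost:5432/bdp"/>
      <property name="javax.persistence.jdbc.user" value="bdp_readonly"/>
      <property name="javax.persistence.jdbc.password" value="secret"/>
    </properties>
  </persistence-unit>
</persistence>
```

Keeping two persistence units, one per database user, matches the two-user setup described for the database above: the Writer gets full permissions, the Reader only read access.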
A domain is a category containing entities that are closely related. In the Open Data Hub, each domain roughly identifies one social or economic category; the domains intended as sources for data served by the Open Data Hub are depicted at the bottom of Figure 1.
Currently, the domains that can be accessed through the Open Data Hub are:
- Mobility: this domain contains data about public transportation, parking lots, charging stations, and so on.
- Tourism: data about events, accommodation, points of interest, and so on.
Each domain is composed of datasets, each of which contains data that provide useful information for the domain.
There may be no clear separation between two domains: for example, data about public transportation belong to the Mobility domain, but are also useful for the Tourism domain.
Accessing data in the Open Data Hub
There are different ways to access the data provided by the Open Data Hub; they are listed here. Currently, data from the Mobility and Tourism domains can be accessed, both from the command line and using a browser. Various dedicated tutorials are available in the List of HOWTOs section, while in section Getting Involved you can find additional ways to interact with the data and the Open Data Hub team.
Accessing data in the Open Data Hub by using a browser is useful on different levels: the casual user can have a look at the type and quality of the data provided; a developer can use the REST API implemented by the Open Data Hub, or check whether the results of their app are coherent with those retrieved via the API; and everyone can get acquainted with the various methods to retrieve data.
More in detail, these are the possibilities to interact with Open Data Hub’s data by using a browser:
- Go to the Apps built from Open Data Hub datasets section of the documentation, particularly sub-sections Production Stage Apps and Beta Stage Apps, and choose one of the web sites and portals listed there. Each of them uses the data gathered from one or more Open Data Hub datasets to display useful information. You can then see how the data are exposed and browse them.
- In the same Apps built from Open Data Hub datasets section, you can also check the list of Alpha Stage Apps and choose one that you think you could expand, then get in touch with the authors to suggest additional features or collaborate with them on its further development.
- Access the ODH Tourism data browser and search for the Open Data available in the Tourism domain. You can simply use those data for your convenience, or you might even find a novel way to exploit them in an app or portal you are going to develop. A detailed howto, How to use the Open Data Hub's Tourism Data Browser?, is available to help you get acquainted with the browser.
- Go to the Swagger interface of the datasets in the Tourism domain, located at http://tourism.opendatahub.bz.it/swagger/, to learn how the REST APIs are built and how you can script them to fetch data for your application. To get started, there is a dedicated howto: How to access Tourism Data? that will guide you in the first steps.
- Access the Swagger interface of the datasets in the Mobility domain; the link for each of them is given in section Datasets in the Mobility Domain. As with the Tourism Swagger interface, you can learn the REST API calls for that domain and fetch data for your application. There is a dedicated howto, mobility-data-howto, to learn more about interacting with this interface.
- Open the Analytics for Mobility web page at https://analytics.mobility.bz.it/. This portal uses data in the Mobility domain to display various information about the sensors, including their locations, what they measure, and current readings in near-real time. You can also retrieve the data gathered by the sensors directly from the dataset.
Unlike browser access, which provides interactive access to data with the option to incrementally refine a query, command-line access proves useful for non-interactive, one-directional, and quick data retrieval in a number of scenarios, including:
- Scripting, data manipulation, and interpolation for use in statistical analysis.
- Applications that gather data and present them to the end users.
- Automatic updates to third-party websites or kiosk systems, e.g. in hotel halls.
Command-line access to the data is usually carried out with the curl Linux utility, which retrieves information non-interactively from a remote site. It accepts a variety of options and can save the content it downloads, which can then be sent to other applications and manipulated.
The number of options required by curl to retrieve data from an Open Data Hub dataset is limited, usually no more than 3 or 4, but their syntax and content can become long and hard for a human to read, due to the number of available filters. An example is retrieving the list of all points of interest in South Tyrol.
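Such a query would take roughly the following shape. The endpoint path and parameters below are assumptions for illustration only; consult the Swagger interface for the exact URL and available filters.

```
curl -X GET "http://tourism.opendatahub.bz.it/api/Poi?pagesize=10" -H "accept: application/json"
```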
Your best opportunity to learn the correct syntax and parameters is to go to the Swagger interface of the Tourism or Mobility domain (http://ipchannels.integreen-life.bz.it/<dataset>/swagger-ui.html) and execute a query: along with the output, the corresponding curl command used to retrieve the data is shown.
Note: you need to provide the dataset name, for example http://ipchannels.integreen-life.bz.it/parking/swagger-ui.html; see Datasets in the Mobility Domain for the full links.
The authentication layer is currently intended for internal use only, therefore it is not necessary to use authentication to access data provided by the Open Data Hub.
While the Open Data Hub project strives to offer only Open Data, it relies on third-party Data Providers, which may not offer the whole content of a dataset for public use. For this reason, an authentication mechanism has been implemented; it does not, however, impact users.
Indeed, authentication in the Open Data Hub is mainly used when exposing data to the consumer, that is, by the Reader and by every web service accessing the Reader, to restrict access to the closed data in each dataset to those who are authorised, i.e., developers and members of the Open Data Hub team.
In the remainder of this section, we describe how authentication works within the Open Data Hub, because this information might be of interest to users who may become app developers for the Open Data Hub team; further information about how to use authentication can be found in the dedicated howto.
There are currently two different authentication methods available:
- Token-based Authentication, defined in RFC 6750, requires anyone who wants to access resources to supply a valid username and password and obtain a Bearer Token that must then be used to access the data. After the token expires, a new one must be obtained. This type of authentication is used for the datasets in the Tourism domain.
- OAuth2 Authentication follows RFC 6749 and is used for all the datasets in the Mobility domain.
For those not familiar with the OAuth2 mechanism, here is a quick description of the client-server interaction:
The client requests permission from the authorisation server to access restricted resources.
The authorisation server replies with a refresh token and an access token. The access token contains an expiry date.
The access token can now be used to access protected resources on the resource server. To be able to use the access token, add it as a Bearer token in the Authorization header of the HTTP call. Bearer is a means to use tokens in HTTP transactions. The complete specification can be found in RFC 6750.
If the access token has expired, you'll get an HTTP 401 Unauthorized response. In this case you need to request a new access token, passing your refresh token as a Bearer token in the Authorization header. As an example, in Open Data Hub datasets Bearer tokens can be inserted in a curl call as follows:
curl -X GET "$HTTP_URL_WITH_GET_PARAMETERS" -H "accept: */*" -H "Authorization: Bearer $TOKEN"
Here, $HTTP_URL_WITH_GET_PARAMETERS is the URL containing the API call and $TOKEN is the token string.
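The token lifecycle described above (use the access token until it expires, then use the refresh token to obtain a new one) can be sketched in Python. This is an illustrative sketch only: the class, the endpoint behaviour, and all names are hypothetical, and a real client would call the Open Data Hub authorisation server instead of the stand-in refresh function used here.

```python
import time

# Illustrative OAuth2 token-handling sketch; all names are hypothetical
# and the real payloads depend on the actual authorisation server.

class TokenStore:
    def __init__(self, access_token, refresh_token, expires_in, now=time.time):
        self.now = now
        self.refresh_token = refresh_token
        self._set_access(access_token, expires_in)

    def _set_access(self, token, expires_in):
        self.access_token = token
        self.expires_at = self.now() + expires_in

    def auth_header(self, refresh_fn):
        """Return the Authorization header, refreshing the access token first
        if it has expired (the server would otherwise answer 401 Unauthorized)."""
        if self.now() >= self.expires_at:
            new_token, expires_in = refresh_fn(self.refresh_token)
            self._set_access(new_token, expires_in)
        return {"Authorization": "Bearer " + self.access_token}

# Stand-in for the real token endpoint: returns a fresh token and lifetime.
def fake_refresh(refresh_token):
    return "new-access-token", 300

# An already-expired access token forces a refresh on the next call.
store = TokenStore("old-access-token", "my-refresh-token", expires_in=-1)
print(store.auth_header(fake_refresh)["Authorization"])
```

The header produced this way is exactly what the curl call above passes with -H "Authorization: Bearer $TOKEN".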