As a leading provider of connected car and location solution development services, our client is constantly looking to improve their technological solutions. To cut production costs and avoid the restrictions of commercially provided data, the company decided to adopt the OpenStreetMap (OSM) geographic database to start producing maps in-house. This makes our client independent from navigation platform providers and gives them full control over the cadence and quality of the maps delivered for their navigation solutions.
OSM provides a free, editable, and constantly updated database. Because the volume of incoming data is high and location solution development depends on timely releases, processing that data requires a big data platform and solid expertise in data mapping. With a proven track record in data mapping services, Intellias was invited to support our client's move from the existing system to the free, open data provided by OSM. The team built a big data process on the Spark framework for cleansing data, integrating third-party data, enriching existing data, and structuring the resulting data.
OpenStreetMap has a special tool that allows reviewers to contribute to the database. OSM maps are provided as vector data, but maps are updated only after changes are verified by recognized contributors. The project consumes updated data daily; that data becomes available as a downloadable snapshot. The snapshot is parsed, converted into Parquet, and uploaded to Hadoop, where Spark processes it. Unlike data obtained from other platforms, OSM data does not follow a fixed structure and needs to be adapted to fit the existing database.
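To make the parsing step more concrete, here is a minimal sketch of flattening OSM data into row-like records before a columnar conversion. It assumes the input is OSM XML (`node` elements carrying `tag` children with `k` and `v` attributes); the source does not say which OSM export format the project consumes, and the class `OsmNodeParser` and its `NodeRow` shape are hypothetical names used only for illustration. Writing Parquet and loading into Hadoop are left out because they depend on libraries the source does not detail.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OsmNodeParser {

    /** Flat, column-like view of one OSM node (hypothetical row shape). */
    public static final class NodeRow {
        public final long id;
        public final double lat;
        public final double lon;
        public final Map<String, String> tags;

        public NodeRow(long id, double lat, double lon, Map<String, String> tags) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
            this.tags = tags;
        }

        @Override
        public String toString() {
            return "NodeRow{id=" + id + ", lat=" + lat + ", lon=" + lon + ", tags=" + tags + "}";
        }
    }

    /** Parses an OSM XML string and returns one row per node element. */
    public static List<NodeRow> parse(String osmXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(osmXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("node");
            List<NodeRow> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element node = (Element) nodes.item(i);
                // Collect the free-form key/value tags attached to this node.
                Map<String, String> tags = new LinkedHashMap<>();
                NodeList tagElements = node.getElementsByTagName("tag");
                for (int j = 0; j < tagElements.getLength(); j++) {
                    Element tag = (Element) tagElements.item(j);
                    tags.put(tag.getAttribute("k"), tag.getAttribute("v"));
                }
                rows.add(new NodeRow(
                        Long.parseLong(node.getAttribute("id")),
                        Double.parseDouble(node.getAttribute("lat")),
                        Double.parseDouble(node.getAttribute("lon")),
                        tags));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OSM XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<osm version=\"0.6\">"
                + "<node id=\"1\" lat=\"50.45\" lon=\"30.52\">"
                + "<tag k=\"highway\" v=\"traffic_signals\"/>"
                + "</node>"
                + "<node id=\"2\" lat=\"50.46\" lon=\"30.53\"/>"
                + "</osm>";
        for (NodeRow row : parse(sample)) {
            System.out.println(row);
        }
    }
}
```

A DOM parser keeps the sketch short; for full-size snapshots a streaming parser would be the more realistic choice.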
The main challenge of the project was structuring the OSM data. Previously, the project used preprocessed source data in Relational Database Format (RDF), delivered as Oracle tables. OSM data is organized differently, so it had to be converted properly before end-to-end data processing could start.
Apart from that, OSM data is populated by people, and its model is not restrictive: contributors can add arbitrary attributes. This led to another crucial project task: formalizing the collected data so that it is compatible with the source data previously provided to the project. The Intellias team built the processing for the OpenStreetMap integration data on Spark and Java.
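As an illustration of what formalizing free-form attributes can look like, the sketch below maps loosely typed OSM tags onto a small fixed schema. The class `OsmTagNormalizer`, the `RoadClass` enum, and the grouping of `highway` values are hypothetical and not taken from the project. The only OSM convention relied on is that `maxspeed` is a number in km/h unless it carries an `mph` suffix.

```java
import java.util.Locale;
import java.util.Map;
import java.util.OptionalInt;

public class OsmTagNormalizer {

    /** Hypothetical target classification; the project's real schema is not given in the source. */
    public enum RoadClass { MOTORWAY, MAJOR, LOCAL, UNKNOWN }

    /** Maps the free-text "highway" tag onto a closed set of values. */
    public static RoadClass roadClass(Map<String, String> tags) {
        String value = tags.getOrDefault("highway", "").trim().toLowerCase(Locale.ROOT);
        switch (value) {
            case "motorway":
            case "trunk":
                return RoadClass.MOTORWAY;
            case "primary":
            case "secondary":
            case "tertiary":
                return RoadClass.MAJOR;
            case "residential":
            case "service":
            case "unclassified":
                return RoadClass.LOCAL;
            default:
                // Anything unexpected is kept, but flagged rather than guessed.
                return RoadClass.UNKNOWN;
        }
    }

    /** Converts "maxspeed" to km/h; returns empty when the value is missing or not numeric. */
    public static OptionalInt maxSpeedKmh(Map<String, String> tags) {
        String raw = tags.get("maxspeed");
        if (raw == null) {
            return OptionalInt.empty();
        }
        String value = raw.trim().toLowerCase(Locale.ROOT);
        boolean milesPerHour = value.endsWith("mph");
        if (milesPerHour) {
            value = value.substring(0, value.length() - 3).trim();
        }
        try {
            int number = Integer.parseInt(value);
            return OptionalInt.of(milesPerHour ? (int) Math.round(number * 1.609344) : number);
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        Map<String, String> tags = Map.of("highway", "Residential", "maxspeed", "30 mph");
        System.out.println(roadClass(tags) + " " + maxSpeedKmh(tags));
    }
}
```

Returning `UNKNOWN` or an empty value instead of throwing lets a batch job keep running and report unmapped attributes afterwards.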
The solution contained:
- Spark framework for scaling and parallel data processing
- Software developed using the Spark framework that can be used on both cloud and on-premises clusters
- Automatic scaling of Spark resources
- Cluster throughput managed by adding nodes to the cluster or removing them depending on the load
- Spark SQL, a distributed SQL query engine that can be used for data processing and analytics
- Data pipelines automated and visualized with various tools: Ansible scripts, Jenkins, etc.
- Interactive data analytics and collaborative documents
- Zeppelin web-based notebooks including the Spark interpreter for data-driven analytics
Experienced Intellias engineers improved the efficiency of the project, organized the development process, and created scalable solutions based on their strong experience with Java, Spark, and the overall map delivery process.
Intellias' extensive mapmaking experience, together with hands-on knowledge of the third-party solution and the source data being replaced, was a game-changer in the project. Making OpenStreetMap data compatible with the existing system, adopting it, and developing a scalable, efficient solution would not have been possible without solid domain experience.
Intellias engineers also introduced several process improvements. A nightly build system kept the codebase healthy and the build script fully automated, which in turn kept the build process documented and repeatable. A testing and validation framework, implemented as SQL queries, checked whether development results met the specified business requirements. A visualizer framework helped deliver a better user experience for customers. Together, these improvements raised the project's overall efficiency and helped cut production costs.
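The validation idea can be sketched as a set of named rules, each counting the rows that violate a requirement, where a count of zero means the rule passes. The project implemented its checks as SQL queries; this sketch swaps in plain Java predicates so that it runs without a database. The `Road` row, the rule names, and the thresholds are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class ValidationSketch {

    /** Hypothetical output row of the pipeline. */
    public static final class Road {
        public final long id;
        public final String roadClass;
        public final int maxSpeedKmh;

        public Road(long id, String roadClass, int maxSpeedKmh) {
            this.id = id;
            this.roadClass = roadClass;
            this.maxSpeedKmh = maxSpeedKmh;
        }
    }

    /** Hypothetical rules; each predicate returns true for a row that VIOLATES the rule. */
    public static Map<String, Predicate<Road>> defaultRules() {
        Map<String, Predicate<Road>> rules = new LinkedHashMap<>();
        rules.put("road_class_present", r -> r.roadClass == null || r.roadClass.isEmpty());
        rules.put("max_speed_in_range", r -> r.maxSpeedKmh < 0 || r.maxSpeedKmh > 200);
        return rules;
    }

    /** Returns rule name mapped to the number of violating rows; zero means the rule passes. */
    public static Map<String, Long> run(List<Road> rows, Map<String, Predicate<Road>> rules) {
        Map<String, Long> violations = new LinkedHashMap<>();
        rules.forEach((name, isViolation) ->
                violations.put(name, rows.stream().filter(isViolation).count()));
        return violations;
    }

    public static void main(String[] args) {
        List<Road> rows = List.of(
                new Road(1, "LOCAL", 50),
                new Road(2, "", 300));
        run(rows, defaultRules()).forEach((name, count) ->
                System.out.println(name + ": " + (count == 0 ? "PASS" : "FAIL (" + count + " rows)")));
    }
}
```

In a SQL-based setup, each rule would correspond to a query that selects or counts offending rows, which makes the checks easy to run after a nightly build.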