eARQUERO uses several core Hadoop components together with other Hadoop-based technologies.

Hadoop has evolved into a rich collection of very powerful tools and platforms that collectively provide unparalleled storage and processing capabilities.

Batch Layer
Contains all the technologies for analyzing massive amounts of data, including real-time data, using a batch-oriented approach.

Search Layer
This layer is responsible for data indexing and provides the fundamental search-based data discovery feature.

Serving Layer
It is meant to provide quick, near-real-time query access to data for multiple concurrent users and applications.
eARQUERO is classically organized in a client/server architecture. The user interface is an HTML5/JavaScript rich client running in the browser, which talks to the server through a set of REST-based APIs (JSON over HTTP). All product functionality is therefore exposed both as REST APIs and through the user interface.
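As a rough illustration of the JSON/HTTP style described above, the sketch below builds the URL and body for a hypothetical data-import call. The server address, endpoint path, and payload fields are assumptions for the example, not the actual eARQUERO API contract.

```python
import json

# Assumed server address for illustration only.
BASE_URL = "https://earquero.example.com/api"

def build_import_request(database, table, source):
    """Build the URL and JSON body for a hypothetical data-import call.

    The path layout and field names below are illustrative assumptions.
    """
    url = f"{BASE_URL}/workspace/{database}/tables/{table}/import"
    body = json.dumps({"source": source, "format": "csv"})
    return url, body

url, body = build_import_request("sales", "orders", "file:///data/orders.csv")
print(url)
print(body)
```

A real client would then POST this body to the URL with an HTTP library; building the request separately keeps the sketch self-contained.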

  • APIs for importing data from any RDBMS, or from files of any structure, into the eARQUERO workspace.
  • APIs for data indexing and searching, which collectively realize the search-based data discovery.
  • APIs for publishing data into one or more serving layers.
  • eARQUERO provides a strict and controlled way to lay out data on top of HDFS. All data is also listed and registered in the eARQUERO metadata repository, so the user is required to organize data following a well-specified pattern: everything is organized in terms of databases and tables. However, everything is stored using the standard Hadoop APIs, allowing Hadoop-native applications to access the data inside the eARQUERO workspace.
  • Besides being a Hadoop-based BI tool, eARQUERO, through its powerful REST APIs, dramatically reduces the time needed to integrate Hadoop into the enterprise.
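The databases-as-directories, tables-as-subdirectories convention described above can be pictured as a simple path mapping. The workspace root path below is an assumption for the example:

```python
from posixpath import join

# Illustrative sketch of the eARQUERO workspace layout on HDFS:
# each database is a directory, each table a subdirectory under it.
# The workspace root is an assumed path, not a documented default.
WORKSPACE_ROOT = "/user/earquero/workspace"

def database_path(database):
    """HDFS directory holding a database's tables."""
    return join(WORKSPACE_ROOT, database)

def table_path(database, table):
    """HDFS directory holding a table's data files."""
    return join(database_path(database), table)

print(table_path("sales", "orders"))
# -> /user/earquero/workspace/sales/orders
```

Because the layout is plain HDFS directories, any Hadoop-native application can read the same paths directly, as the bullet above notes.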

eARQUERO builds on Hadoop authentication and authorization, offering a flexible way of controlling access to data and operations. It is also fully integrated with LDAP/Active Directory, giving organizations and users an easy way of managing their credentials.
All traffic between the end user's browser and the eARQUERO application server is secured with HTTPS.
At the data level, eARQUERO offers a fine-grained security mechanism and a very simple way to share data across multiple users: the owner of a table can decide to make it visible to a single specific user or group.
This is the internal security model of eARQUERO; depending on the platform eARQUERO is targeting, these rights are then converted into the appropriate rights on the target platform:

A database is stored as a directory inside HDFS and its tables as subdirectories. The eARQUERO rights are converted into the corresponding rights at the HDFS level.
When the underlying HDFS supports access control lists, eARQUERO maps its fine-grained access control onto proper HDFS ACLs: giving table access to a single user is mapped to a corresponding ACL entry at the HDFS level. If eARQUERO runs under secure impersonation, the actual user owns all the data it creates.
A nice consequence is that the data is always consistent with the standard HDFS security model. Anyone trying to access the data outside of eARQUERO is stopped by HDFS itself, so the data is always safe regardless of whether it is accessed through eARQUERO or not.
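Mapping a table grant onto an HDFS ACL amounts to composing an ACL entry in the `kind:name:perms` format that `hdfs dfs -setfacl -m` accepts. The sketch below builds such a command string; the read-only permission bits and the idea of issuing the shell command are assumptions for illustration:

```python
# Sketch of converting a fine-grained table grant into an HDFS ACL spec,
# in the entry format accepted by `hdfs dfs -setfacl -m <spec> <path>`.
# The read-only "r-x" permission choice is an assumption for the example.

def acl_spec(grantee, is_group=False, perms="r-x"):
    """Build one HDFS ACL entry, e.g. 'user:alice:r-x'."""
    kind = "group" if is_group else "user"
    return f"{kind}:{grantee}:{perms}"

def setfacl_command(table_dir, grantees):
    """Compose the shell command a publisher could issue for a grant."""
    spec = ",".join(acl_spec(g, is_group) for g, is_group in grantees)
    return f"hdfs dfs -setfacl -m {spec} {table_dir}"

print(setfacl_command("/user/earquero/workspace/sales/orders",
                      [("alice", False), ("analysts", True)]))
# -> hdfs dfs -setfacl -m user:alice:r-x,group:analysts:r-x /user/earquero/workspace/sales/orders
```

Because the grant lives in HDFS itself, the same restriction applies to any client, which is why out-of-band access is blocked as described above.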

eARQUERO provides a mechanism for publishing tables to Impala. Publishing an eARQUERO table to Impala means the automatic creation of an external table inside the Impala metadata repository.
So, besides generating the DDL SQL statements for the external table creation, eARQUERO also generates the SQL statements that make the table accessible only to users who have read access to the corresponding eARQUERO table.
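The publish step described above amounts to emitting a `CREATE EXTERNAL TABLE` statement over the table's HDFS directory plus an access-control statement. A minimal sketch, assuming a Sentry-style role-based `GRANT`; the column list, storage format, and role name are illustrative assumptions:

```python
# Sketch of the SQL a publisher could emit when pushing a table to Impala:
# an external table over the existing HDFS directory, plus a Sentry-style
# GRANT restricting reads. Columns and role name are illustrative.

def publish_statements(database, table, hdfs_dir, columns, reader_role):
    """Return the DDL and grant statements for a hypothetical publish."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    ddl = (f"CREATE EXTERNAL TABLE {database}.{table} ({cols}) "
           f"STORED AS TEXTFILE LOCATION '{hdfs_dir}'")
    grant = f"GRANT SELECT ON TABLE {database}.{table} TO ROLE {reader_role}"
    return [ddl, grant]

for stmt in publish_statements("sales", "orders",
                               "/user/earquero/workspace/sales/orders",
                               [("id", "BIGINT"), ("amount", "DOUBLE")],
                               "sales_readers"):
    print(stmt)
```

Using an external table means Impala only references the data in place; dropping the Impala table would not delete the files in the eARQUERO workspace.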
