hadoop - Spark in Business Intelligence


Currently I am doing a project in the Business Intelligence and Big Data areas, two areas where, in all honesty, I am new and very green.

I was planning to build a Hive data warehouse using MongoDB and link it to a business intelligence platform like Pentaho. While researching, I came across Spark and became interested in its Shark module, because it offers in-memory processing and faster queries.

I know that I can connect Hive to Pentaho, but I was wondering whether I could run Shark queries between them for better performance. If not, does anyone know of another BI platform that would allow it?

As I said, I am very new to this area, so feel free to correct me; there is a good chance I have mixed up some concepts or said something silly.

I think you want to build either a data warehouse using Hive or a data warehouse using MongoDB. I did not understand how you are going to combine them, but I will try to answer the question.

Usually, you configure your BI tool with the JDBC driver of your choice (for example, Hive's), and the BI tool fetches data through that JDBC driver. How the driver fetches data from the DB is completely transparent to the BI tool.

This way, you can use Hive, Shark, or any other DB that comes with a JDBC driver.
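To illustrate why the backend is transparent to the BI tool, here is a minimal sketch that uses Python's DB-API (PEP 249) as a stand-in for JDBC, since both define a generic connection/cursor interface. The `run_report` function plays the role of the BI tool: it only sees the generic interface. An in-memory SQLite database is an assumption used here only because no Hive or Shark server is available; the table name `sales` and its data are hypothetical.

```python
import sqlite3
from typing import Any, List, Tuple


def run_report(conn: Any, sql: str) -> List[Tuple[Any, ...]]:
    """Run a query through any PEP 249 (DB-API) connection.

    Like a BI tool talking to a JDBC driver, this function only uses the
    generic cursor interface; it does not know which engine answers.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


# Stand-in backend (assumption): in-memory SQLite instead of Hive/Shark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)

rows = run_report(
    conn, "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)
print(rows)
```

Swapping the backend would only change how `conn` is created; `run_report` stays the same, which is the same reason a BI tool can talk to Hive or Shark through whichever JDBC driver you configure.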

I would summarize your options this way:

Hive: The most complete feature set and the most compatible. It can be used on plain data, or you can ETL the data into its own format to improve performance.

Impala: Claims to be faster than Hive, but has a less complete feature set. It can be used on plain data, or you can ETL the data into its own format to improve performance.

Shark: Cutting edge, not yet mainstream; its performance depends on what percentage of your data fits into the RAM of your cluster.
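To build intuition for why Shark's performance depends on how much of the data fits in RAM, here is a deliberately simplified toy model: the cached fraction is scanned at memory speed and the rest at disk speed. It ignores caching policy, CPU cost, network, and query shape. The throughput figures (10 GB/s from memory, 1 GB/s from disk) and the 100 GB dataset are hypothetical round numbers chosen for illustration, not measurements of any real cluster.

```python
def estimated_scan_seconds(data_gb: float, ram_fraction: float,
                           mem_gb_per_s: float, disk_gb_per_s: float) -> float:
    """Toy model: cached data is read at memory speed, the rest at disk speed."""
    if not 0.0 <= ram_fraction <= 1.0:
        raise ValueError("ram_fraction must be between 0 and 1")
    cached_gb = data_gb * ram_fraction
    spilled_gb = data_gb - cached_gb
    return cached_gb / mem_gb_per_s + spilled_gb / disk_gb_per_s


# Hypothetical cluster-wide throughputs (assumptions, not measurements).
MEM_GB_PER_S = 10.0
DISK_GB_PER_S = 1.0

for fraction in (1.0, 0.75, 0.5, 0.0):
    seconds = estimated_scan_seconds(100.0, fraction, MEM_GB_PER_S, DISK_GB_PER_S)
    print(f"{fraction:.0%} in RAM -> {seconds:.1f} s")
```

Even in this crude model, the time is dominated by the part that does not fit in memory, so the speed-up shrinks quickly as the in-RAM fraction drops.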

