Hadoop in the cloud (both open and public) is a big topic again this week. There are articles on Hortonworks' HDP in the Microsoft Azure cloud, Cloudera's new cloud provisioning tool Cloudera Director, OpenShift, and SequenceIQ's Cloudbreak. Also, there are several articles this week on Hadoop adoption, which seems to be limited by the maturity of enterprise features. Finally, Kafka released version 0.8.2-beta this week, and a new project aims to provide higher throughput from Kafka for MapReduce jobs.

Technical

As the Hadoop ecosystem of projects grows and folks use it in many different ways, integration between projects and consistency across projects are both important parts of usability. This article highlights several ways that the Hadoop ecosystem could improve along those lines. It's just the tip of the iceberg; hopefully these things get better as Hadoop matures.

In the first part of a three-part series on HBase, this post presents an introduction to HBase's data model and architecture. It also contains instructions on setting up a local HBase and interacting with it using the HBase shell.

The Cloudera Blog has a post on integrating the Kite SDK with OpenShift. Specifically, the Kite SDK has tooling for running in-process mini clusters (HDFS, Hive, Flume, HBase, ZooKeeper) for testing as well as locally via the command line. The post introduces these tools and describes work to add support for running a mini cluster via OpenShift to the command-line tools.

Hortonworks has posted a recording of and slides from a recent webinar on Apache Knox and Ranger, which are the main enterprise security products in their distribution. In addition, the post includes several questions and answers related to the offering. For anyone interested in enterprise security, this is a good overview of the current state of Hortonworks' offerings.

While not directly related to Hadoop, this post summarizes a recent paper out of Facebook on their f4 BLOB storage system. The review notes that f4 is built atop HDFS, and it describes how it gets around several HDFS limitations (namely adding cross-data center replication and using erasure coding to decrease replication factors).
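To make the f4 item's point about erasure coding lowering the replication factor concrete, here is a minimal sketch of the general arithmetic. The function names and the example parameters (3 full copies, 10 data blocks with 4 parity blocks) are illustrative assumptions, not figures taken from the post or the paper.

```python
def replication_overhead(copies: int) -> float:
    """Bytes stored per byte of user data when keeping `copies` full copies."""
    return float(copies)


def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Bytes stored per byte of user data for a (data + parity) erasure code."""
    return (data_blocks + parity_blocks) / data_blocks


# Hypothetical parameters, chosen only for illustration.
copies = 3
data_blocks, parity_blocks = 10, 4

print("replication:", replication_overhead(copies), "x, tolerates", copies - 1, "lost copies")
print("erasure code:", erasure_overhead(data_blocks, parity_blocks), "x, tolerates", parity_blocks, "lost blocks")
```

Under these assumed parameters the erasure-coded layout stores fewer bytes per byte of user data than triple replication while still surviving several lost blocks, which is the trade-off the review describes.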