The data lakehouse is multi-engine, and those engines (Spark, Flink, Trino, Arrow, Dask, etc.) all need to be tied into a cohesive architecture.
The data lakehouse has to deliver central table storage, portable compute, access control and persistent structure. That is where formats like Iceberg, Hudi and Delta Lake come into play. They are designed for the modern data lake, and each is supported in AIStor. We might have an opinion on which one wins (you can always ask us…), but we are committed to supporting them until it no longer makes sense (see Docker Swarm and Mesosphere).
The data lakehouse demands a level of performance, and more importantly, performance at scale, that legacy systems could only dream of.
AIStor has proven in multiple benchmarks that it is materially faster than Hadoop, and the migration path is clearly documented. AIStor is the fastest object store on the market on the least amount of hardware. With the support of the S3 Express API, AIStor is faster than ever, even faster than AWS S3 deployed to EKS. This means better performance for your query engines (Spark, Presto, Trino, Snowflake, Microsoft SQL Server, Teradata and more). This also includes your AI/ML platforms, from MLflow to Kubeflow.
AIStor's server binary is under 100 MB. Despite its small size, it is powerful enough to run in the datacenter, yet still small enough to live comfortably at the edge.
For enterprises, this means that S3 applications can access data anywhere, anytime, with the same API. By deploying AIStor at edge locations and using its replication capability, we can capture and filter data at the edge and ship it to the mother cluster for aggregation and further analytics.
The data lakehouse extends the disaggregation seen in the Hadoop breakup. Data lakehouses have high speed query processing engines and they have high throughput storage.
The data lakehouse is far too large and diverse to fit into a database, so the data resides on the object store. This way, the database can focus on query optimization and outsource the storage functions to a high-speed object store. By keeping a subset of the data in memory and leveraging capabilities like predicate pushdown (S3 Select) and external tables, the query engine has far more flexibility.
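To make the pushdown idea concrete, here is a minimal sketch in plain Python. It does not call S3 Select or any AIStor API. `ObjectStore`, `get_object` and `select` are hypothetical names, used only to show the difference between shipping a whole object to the query engine and filtering it where it is stored.

```python
import csv
import io


class ObjectStore:
    """Toy in-memory stand-in for an object store (hypothetical, not a real client)."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, body):
        self._objects[key] = body

    def get_object(self, key):
        # No pushdown: the whole object is returned to the caller.
        return self._objects[key]

    def select(self, key, predicate):
        # Pushdown: the filter runs next to the data, so only matching rows come back.
        reader = csv.DictReader(io.StringIO(self._objects[key]))
        return [row for row in reader if predicate(row)]


store = ObjectStore()
store.put_object(
    "events.csv",
    "region,amount\n" "eu,10\n" "us,25\n" "eu,40\n" "ap,5\n",
)

# Without pushdown the engine downloads everything and filters locally.
full = store.get_object("events.csv")
local_rows = [r for r in csv.DictReader(io.StringIO(full)) if r["region"] == "eu"]

# With pushdown the engine receives only the rows it asked for.
pushed_rows = store.select("events.csv", lambda r: r["region"] == "eu")

# Rough proxy for bytes transferred in each case.
bytes_without = len(full)
bytes_with = sum(len(",".join(r.values())) + 1 for r in pushed_rows)
```

Both paths produce the same rows. The difference is how much data has to move before the query engine can start working, which is what matters at lakehouse scale.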
Data is constantly getting generated, and that means it must constantly be ingested without incurring indigestion.
AIStor is built for this world and works out of the box with Kafka, Flink, RabbitMQ and a host of other solutions. The result is a data lakehouse that becomes the single source of truth and can expand to EBs and beyond.
AIStor has multiple clients whose data ingest exceeds 250 PB a day.
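For a sense of scale, the daily figure can be converted into a sustained average rate. This is back-of-the-envelope arithmetic only, assuming decimal petabytes and perfectly even ingest across the day. Real ingest is bursty, so peak rates would be higher.

```python
PB = 10**15  # decimal petabyte (assumption); a binary PiB would be about 12.6% larger
SECONDS_PER_DAY = 24 * 60 * 60

daily_ingest_bytes = 250 * PB
avg_bytes_per_second = daily_ingest_bytes / SECONDS_PER_DAY
avg_tb_per_second = avg_bytes_per_second / 10**12

print(f"{avg_tb_per_second:.2f} TB/s sustained average")
```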
Simplicity is hard. It takes work, discipline, and above all, commitment. AIStor's simplicity is legendary and is the result of a philosophical commitment to making our software easy to deploy, use, upgrade, and scale.
The data lakehouse does not need to be complex. There are a handful of pieces and we are committed to ensuring that AIStor is the easiest to adopt and deploy.
AIStor works with every component of the modern data stack, from every data streaming protocol to every data pipeline. Every vendor tests extensively and frequently with AIStor, so data pipelines are more resilient and available.
AIStor protects data with per-object, inline erasure coding, which is far more efficient than the HDFS alternatives, which arrived after replication and never gained adoption.
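The efficiency argument comes down to raw capacity consumed per usable byte. The sketch below compares three full copies (the HDFS default replication factor) with an erasure-coded layout. The 12 data + 4 parity split is an assumption chosen for illustration, not a statement of AIStor's defaults.

```python
def replication_overhead(copies):
    """Raw bytes stored per usable byte when keeping N full copies."""
    return float(copies)


def erasure_overhead(data_shards, parity_shards):
    """Raw bytes stored per usable byte with data + parity erasure coding."""
    return (data_shards + parity_shards) / data_shards


# Assumed for illustration: 3 full copies versus 12 data + 4 parity shards.
rep = replication_overhead(3)
ec = erasure_overhead(12, 4)

usable_pb = 1.0
print(f"3x replication: {rep * usable_pb:.2f} PB raw per PB usable, survives 2 lost copies")
print(f"12+4 erasure:   {ec * usable_pb:.2f} PB raw per PB usable, survives 4 lost shards")
```

Under these assumptions, erasure coding stores roughly 1.33 bytes per usable byte instead of 3, while tolerating more simultaneous failures.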
In addition, AIStor's bitrot detection ensures that it will never read corrupted data, capturing and healing corrupted objects on the fly. AIStor also supports cross-region, active-active replication. Finally, AIStor supports a complete object locking framework offering both Legal Hold and Retention (with Governance and Compliance modes).
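The verify-on-read and heal-on-read idea can be sketched as follows. This is a simplified illustration, not AIStor's implementation: SHA-256 stands in for whatever hash the product uses, a second full copy stands in for reconstruction from erasure-coded shards, and `Drive` and `read_with_heal` are hypothetical names.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Drive:
    """Toy drive that stores each object next to the checksum computed at write time."""

    def __init__(self):
        self.blobs = {}

    def write(self, key, data):
        self.blobs[key] = (bytes(data), checksum(data))

    def read_verified(self, key):
        data, expected = self.blobs[key]
        # Return None instead of corrupted bytes when the checksum no longer matches.
        return data if checksum(data) == expected else None


def read_with_heal(key, drives):
    """Return verified data and rewrite any drive whose copy failed verification."""
    results = [d.read_verified(key) for d in drives]
    good = next((r for r in results if r is not None), None)
    if good is None:
        raise IOError(f"no healthy copy of {key!r}")
    for drive, result in zip(drives, results):
        if result is None:
            drive.write(key, good)  # heal on the fly
    return good


a, b = Drive(), Drive()
for d in (a, b):
    d.write("report.parquet", b"important bytes")

# Simulate silent corruption on one drive: the data changes, the stored checksum does not.
_, old_sum = a.blobs["report.parquet"]
a.blobs["report.parquet"] = (b"importent bytes", old_sum)

data = read_with_heal("report.parquet", [a, b])
```

After the read, the caller has the original bytes and the damaged drive holds a verified copy again.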
Hadoop HDFS' successor isn't a hardware appliance, it is software running on standard hardware.
That is what AIStor is: software. AIStor is designed to take full advantage of standard hardware. With the ability to leverage NVMe drives and 100 GbE networking, AIStor can shrink the datacenter, improving operational efficiency and manageability. Indeed, companies that replace legacy storage software with AIStor reduce their hardware footprint by 60% or more, while improving performance and reducing the FTE required to manage it.
MinIO supports multiple, sophisticated server-side encryption schemes to protect data — wherever it may be — in flight or at rest.
MinIO’s approach assures confidentiality, integrity, and authenticity with negligible performance overhead. Server-side and client-side encryption are supported using AES-256-GCM, ChaCha20-Poly1305, and AES-CBC, ensuring application compatibility. Furthermore, MinIO supports industry-leading key management systems (KMS).