With Venio, you can set up a storage gateway (usually an Amazon EC2 instance) that uses Amazon S3 and the SMB protocol, allowing the Venio Distributed Service and Venio Web Export Service to use an S3 bucket seamlessly.
Here's a basic data flow and an explanation of where data resides during processing when an S3 gateway is used as an SMB share repository:
- Application Request:
  - The Venio service (Export Service or Distributed Service) initiates a file operation (read or write) using the SMB protocol.
- Storage Gateway:
  - The storage gateway, typically deployed as a virtual machine (VM) or hardware appliance on-premises, acts as an intermediary between Venio and Amazon S3.
  - The gateway presents file shares via SMB to Venio, making cloud storage appear as a local network share.
  - More information is available on the AWS documentation website: https://docs.aws.amazon.com/filegateway/latest/files3/file-gateway-concepts.html
- Data Transfer:
  - Write Operation:
    - The Venio service writes data to the SMB share.
    - The storage gateway receives this data and temporarily caches it locally.
    - The gateway then uploads the data from the local cache to an Amazon S3 bucket.
    - After the upload, the data resides in Amazon S3.
  - Read Operation:
    - The Venio service requests data from the SMB share.
    - The storage gateway checks its local cache for the requested data.
    - If the data is not in the local cache, the gateway retrieves it from Amazon S3 and temporarily stores it locally before serving it to Venio.
    - Once the data is in the local cache, subsequent requests for it can be served directly from the cache, improving performance.
  - Venio Ingestion:
    - The Venio service copies the data it receives from the storage gateway to the location where the service is configured to store TEMP data. From that TEMP directory, the Venio service scans, extracts, and ingests the copied data.
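The ingestion step above amounts to copying files off the share into a TEMP location before any processing happens. The following is a minimal sketch of that copy step only, not Venio's actual implementation: the function name `stage_for_ingestion` and both directories are hypothetical, and two temporary folders stand in for the SMB share and the configured TEMP path.

```python
import shutil
import tempfile
from pathlib import Path


def stage_for_ingestion(share_root: Path, temp_root: Path) -> list[Path]:
    """Copy every file under share_root into temp_root, keeping relative paths."""
    staged = []
    for source in sorted(share_root.rglob("*")):
        if source.is_file():
            target = temp_root / source.relative_to(share_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            # On a real share, this read is what pulls the file through the gateway.
            shutil.copy2(source, target)
            staged.append(target)
    return staged


if __name__ == "__main__":
    # Hypothetical stand-ins for the SMB share and the TEMP directory.
    with tempfile.TemporaryDirectory() as share, tempfile.TemporaryDirectory() as temp:
        (Path(share) / "case01").mkdir()
        (Path(share) / "case01" / "sample.txt").write_bytes(b"example")
        for path in stage_for_ingestion(Path(share), Path(temp)):
            print(path.relative_to(temp))
```

In a real deployment, `share_root` would be the UNC path of the gateway share, and the TEMP location needs enough free space to hold a full copy of the source data being ingested.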
- Data Caching:
  - The storage gateway uses local storage to cache frequently accessed data and recent writes. This cache helps reduce latency and improve performance for read and write operations.
- During Write Operations:
  - Local Cache: When Venio writes data to the SMB share, it is initially stored in the local cache of the storage gateway. This local cache resides on the storage attached to the gateway.
  - Amazon S3: The data is then uploaded from the local cache to Amazon S3, where it is permanently stored.
- During Read Operations:
  - Local Cache: If the requested data is in the local cache, it is served directly from there.
  - Amazon S3: If the data is not in the local cache, it is retrieved from Amazon S3 and temporarily stored in the local cache before being served to the Venio service.
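The write and read behavior described above can be illustrated with a small in-memory model. This is a conceptual sketch only, not how the AWS gateway is implemented: the class `GatewayCacheModel`, its method names, and the explicit `evict` step are all assumptions made for illustration, with plain dictionaries standing in for the cache disk and the S3 bucket.

```python
class GatewayCacheModel:
    """Toy model of a gateway: a local cache in front of a backing bucket."""

    def __init__(self):
        self.cache = {}       # stands in for the gateway's local cache disk
        self.bucket = {}      # stands in for the Amazon S3 bucket
        self.pending = set()  # keys cached locally but not yet uploaded

    def write(self, key, data):
        # A write lands in the local cache first.
        self.cache[key] = data
        self.pending.add(key)

    def upload_pending(self):
        # The gateway later uploads cached writes to the bucket.
        for key in self.pending:
            self.bucket[key] = self.cache[key]
        self.pending.clear()

    def read(self, key):
        # Serve from the cache when possible; otherwise fetch and cache.
        if key in self.cache:
            return self.cache[key], "cache"
        data = self.bucket[key]
        self.cache[key] = data
        return data, "s3"

    def evict(self, key):
        # Assumption: only data that is already uploaded may leave the cache.
        if key not in self.pending:
            self.cache.pop(key, None)


if __name__ == "__main__":
    gw = GatewayCacheModel()
    gw.write("export/doc1.pdf", b"data")
    gw.upload_pending()
    gw.evict("export/doc1.pdf")
    print(gw.read("export/doc1.pdf")[1])  # s3
    print(gw.read("export/doc1.pdf")[1])  # cache
```

The first read after eviction is served from the bucket and repopulates the cache, so the second read is a cache hit. This is the performance benefit the read path relies on.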
- Transfer Phase:
  - While data is being uploaded to or downloaded from S3, it is in transit. The gateway manages this transfer and uses encrypted connections to protect data integrity and security.
- Local Cache: Data resides here temporarily during write and read operations, improving performance by reducing latency.
- Amazon S3: Data is permanently stored in Amazon S3, ensuring durability, scalability, and availability.
- In Transit: Data is temporarily in transit during upload to S3 or download from S3, protected by encryption.
This setup allows on-premises applications to interact with cloud storage as if it were local, leveraging the scalability and durability of Amazon S3 while maintaining high performance through local caching.
To use this setup in a multi-tenant environment, create a separate S3 bucket dedicated to each client so that each client can access only its own data.
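One way to picture the per-client isolation is an access policy scoped to a single bucket. The sketch below builds a standard IAM policy document for one client's bucket. The naming scheme (`venio-gateway-<client>`), the helper names, and the action list are assumptions for illustration. In particular, the actions shown are not the complete set of permissions a gateway role requires, so consult the AWS documentation before using anything like this.

```python
import json


def bucket_name_for(client_id: str, prefix: str = "venio-gateway") -> str:
    """Derive a per-client bucket name (hypothetical naming scheme)."""
    return f"{prefix}-{client_id.lower()}"


def client_policy(bucket: str) -> dict:
    """Build an IAM policy document limited to a single client's bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object operations apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    bucket = bucket_name_for("ClientA")
    print(json.dumps(client_policy(bucket), indent=2))
```

Because every resource in the policy names exactly one bucket, a role holding it cannot reach another client's bucket, which is the separation the one-bucket-per-client approach is meant to provide.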