Tired of duplicating massive datasets just to add vector search? With Oracle AI Database 26ai, you can run similarity search directly on existing Apache Iceberg tables in object storage, with no data copying, no extra ETL pipelines, and no second copy to govern.
This matters most for data lakes that keep Iceberg tables as Parquet files in cloud object storage (OCI Object Storage, S3, etc.).
Why This Matters
- Avoid massive data duplication and sync issues
- Keep data in its original governed location
- Query Iceberg + Oracle tables together in the same SQL
- Create fast vector indexes without moving the source data
- Works great for RAG, semantic search, and recommendation systems
Step-by-Step: Query Iceberg Vectors in Minutes
1. Create External Table over Iceberg
CREATE TABLE ext_iceberg_vectors (
  id        VARCHAR2(100),
  content   CLOB,
  embedding VECTOR(1024, FLOAT32)  -- match your embedding dimension
)
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_BIGDATA
  DEFAULT DIRECTORY DATA_PUMP_DIR
  ACCESS PARAMETERS
  (
    com.oracle.bigdata.credential.name=OCI_CRED
    com.oracle.bigdata.fileformat=parquet
    com.oracle.bigdata.access_protocol=iceberg
  )
  LOCATION ('iceberg:https://objectstorage.<region>.oraclecloud.com/.../metadata/v1.metadata.json')
)
REJECT LIMIT UNLIMITED;
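The DDL above assumes a credential named OCI_CRED already exists. The following is a minimal sketch of creating one and sanity-checking the table afterwards. It assumes the DBMS_CLOUD package is available in your database and that you authenticate with an OCI username and auth token; both values shown are placeholders, and your environment may call for a different credential type.

```sql
-- Assumption: DBMS_CLOUD is installed and auth-token authentication is in use.
-- Run this before creating the external table.
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OCI_CRED',
    username        => 'your_oci_username',  -- placeholder
    password        => 'your_auth_token'     -- placeholder
  );
END;
/

-- After the external table exists: confirm that rows are readable and that
-- the stored vectors have the shape declared in the column definition.
SELECT VECTOR_DIMENSION_COUNT(embedding)  AS dims,
       VECTOR_DIMENSION_FORMAT(embedding) AS fmt,
       COUNT(*)                           AS row_count
FROM   ext_iceberg_vectors
GROUP  BY VECTOR_DIMENSION_COUNT(embedding),
          VECTOR_DIMENSION_FORMAT(embedding);
```

If the check returns more than one row, or a shape other than 1024 and FLOAT32, the column definition and the data in the Iceberg table disagree.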
2. Run Similarity Search (with on-the-fly embedding)
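-- embedding_model is the name of an embedding model already loaded in the database;
-- :search_query is the user's search text.
-- With no metric specified, VECTOR_DISTANCE uses cosine distance, so a smaller score is a closer match.
SELECT id,
       content,
       VECTOR_DISTANCE(embedding,
                       VECTOR_EMBEDDING(embedding_model USING :search_query AS data)) AS score
FROM   ext_iceberg_vectors
ORDER  BY score
FETCH  FIRST 10 ROWS ONLY;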
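The query above relies on a model named embedding_model being present in the database. As a rough sketch of how such a model might be loaded from an ONNX file, the block below uses DBMS_VECTOR.LOAD_ONNX_MODEL. The directory object MODEL_DIR and the file name are hypothetical, and the JSON metadata must match the input and output names of your own model, so treat those values as assumptions.

```sql
-- Assumptions: MODEL_DIR is a directory object you created (hypothetical name),
-- the ONNX file name is a placeholder, and the metadata matches your model.
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    'MODEL_DIR',                -- hypothetical directory object
    'my_embedding_model.onnx',  -- hypothetical file name
    'embedding_model',          -- name referenced by VECTOR_EMBEDDING
    JSON('{"function" : "embedding",
           "embeddingOutput" : "embedding",
           "input" : {"input" : ["DATA"]}}')
  );
END;
/

-- Check that the model's output dimension matches the VECTOR column (1024 here).
SELECT VECTOR_DIMENSION_COUNT(
         VECTOR_EMBEDDING(embedding_model USING 'hello world' AS data)
       ) AS dims
FROM   dual;
```

The stored embeddings in the Iceberg table should come from the same model. Distances between vectors produced by different models are not meaningful.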
3. Speed It Up with Vector Index
CREATE VECTOR INDEX iceberg_vec_idx
  ON ext_iceberg_vectors(embedding)
  ORGANIZATION NEIGHBOR PARTITIONS
  WITH TARGET ACCURACY 95;
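Creating the index does not change exact queries. An approximate search has to be requested with the APPROX keyword in the FETCH clause. The sketch below assumes the index uses cosine distance, which should be the case when no DISTANCE clause is given. If your index was built with another metric, use that metric in the query instead. The accuracy value of 90 is only an illustration.

```sql
-- Approximate top-10 search. The distance expression sits directly in ORDER BY,
-- and its metric is assumed to match the one the index was built with.
SELECT id,
       content
FROM   ext_iceberg_vectors
ORDER  BY VECTOR_DISTANCE(embedding,
                          VECTOR_EMBEDDING(embedding_model USING :search_query AS data),
                          COSINE)
FETCH  APPROX FIRST 10 ROWS ONLY WITH TARGET ACCURACY 90;
```

Compare the execution plans of the exact and approximate versions to confirm the index is actually being used.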
Best Practices for Production
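- Access object storage through a database credential object (such as OCI_CRED in step 1) instead of embedding keys in the table definition
- Declare the VECTOR column with the same dimension count and element type that your embedding model produces
- Create an IVF (NEIGHBOR PARTITIONS) or HNSW (INMEMORY NEIGHBOR GRAPH) vector index for large Iceberg tables
- Join the external table with regular Oracle tables in the same query to combine similarity search with relational filters
- Suits air-gapped environments, since query embeddings can be generated in-database from an ONNX model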
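To make the point about combining Iceberg data with Oracle tables concrete, here is a hedged sketch of one such query. The table doc_metadata and its columns (id, department) are hypothetical and stand in for any regular Oracle table that shares a key with the Iceberg data.

```sql
-- Assumption: doc_metadata is a regular Oracle table (hypothetical) keyed by the same id.
-- The relational filter narrows the candidates; the vector distance ranks them.
SELECT v.id,
       v.content,
       m.department,
       VECTOR_DISTANCE(v.embedding,
                       VECTOR_EMBEDDING(embedding_model USING :search_query AS data)) AS score
FROM   ext_iceberg_vectors v
JOIN   doc_metadata m
       ON m.id = v.id
WHERE  m.department = :department
ORDER  BY score
FETCH  FIRST 10 ROWS ONLY;
```

This uses an exact search. Whether an approximate, index-backed search works well with such filters depends on the data and the optimizer, so it is worth testing on your own workload.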
Real-World Use Cases
- Semantic search over data lake documents
- RAG applications using Iceberg as the knowledge base
- Real-time recommendations without data movement
- Unified analytics across structured + unstructured data
Conclusion
Oracle AI Database 26ai + Apache Iceberg gives you the best of both worlds: the governance and scale of a modern data lake with the powerful, familiar vector search capabilities of Oracle.
No more unnecessary data copies. Point an external table at the Iceberg metadata, index the vector column, and query it with SQL to get semantic search on the Iceberg tables you already have.