
Gen API — Components (L3)

Inside the Gen API container.

Gen API Components

REST controllers

| Area | Endpoints (representative) |
|---|---|
| Dashboards / widgets / filters | CRUD over dashboards, dashboard_widgets, dashboard_filters, table_filters. |
| Data sources / datasets | DataSourcesController, DatasetsController, ConnectorsController: connector-typed CRUD; credentials never leave the server in plaintext. |
| Big-data SQL | BigDataController: /bigdata/query (NDJSON rows), /bigdata/query/stream (Arrow IPC), /bigdata/query/parse (AST + table refs, no execution). |
| Maps | MapsController: PMTiles generation + status, on-demand H3 hexagons / clusters, filterIds (Arrow IPC stream of IDs matching a bbox/polygon). |
| Roles & permissions | RolesController, AccessController: system admin management, role CRUD, permission flags, role membership, role-to-resource assignments. |
| Agent surface | AgentController, ChatSessionsController, TooltipTemplatesController, OntologyController, ProxyController. |
| External APIs | ExternalApisController, ExternalApiOAuthController: third-party OAuth integrations with encrypted refresh tokens. |
| Auth & favorites | AuthController, FavoritesController. |

Controllers stay thin: each operation is injected as a [FromServices] action parameter, never as a constructor dependency. DTOs are described in the OpenAPI document at axion.gen.backend/openapi/v1.json, and the web client's code generation reads that file directly.

Auth Middleware

Authentication is OIDC with JWT bearer tokens, using the BFF callback proxy pattern documented in axion.gen.backend/docs/Oidc.md:

  • The Next.js app initiates the OIDC code flow.
  • The IdP's redirect URL points at the Gen API (not the Next.js app).
  • Gen API receives the code, exchanges it, sets an HttpOnly session cookie, and redirects back to the SPA.
  • Subsequent requests carry the cookie; the API validates it server-side and resolves the JWT internally.

This pattern keeps access tokens out of localStorage — a hard requirement for many enterprise SSO deployments.
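
The callback and cookie steps above can be sketched as follows. This is a minimal illustration in Python, not the Gen API's .NET implementation: the in-memory `SESSIONS` store, the cookie name `session`, the `SameSite=Lax` setting, and the `exchange_code` stub are all assumptions made for the example.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

# Hypothetical server-side session store: opaque session id -> tokens from the IdP.
SESSIONS: Dict[str, Dict[str, str]] = {}


def exchange_code(code: str) -> Dict[str, str]:
    """Stand-in for the token-endpoint call; a real one would POST to the IdP."""
    return {"access_token": f"jwt-for-{code}"}


def handle_callback(code: str) -> Tuple[str, str]:
    """Exchange the code, keep tokens server-side, return (Set-Cookie value, redirect)."""
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = exchange_code(code)
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True   # not readable from browser JavaScript
    cookie["session"]["secure"] = True
    cookie["session"]["samesite"] = "Lax"  # assumed policy
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString(), "/"


def resolve_jwt(cookie_header: str) -> Optional[str]:
    """Map an incoming Cookie header to the stored JWT; None if the session is unknown."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get("session")
    session = SESSIONS.get(morsel.value) if morsel else None
    return session["access_token"] if session else None
```

The browser only ever holds the opaque session id; the token itself stays in the server-side store.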

Operations layer

Same IOperation<TParam, TResult> pattern as Sense. Operations live under src/Axion.Gen.Core/Operations (one folder per domain: DataSources, Datasets, BigData, Dashboards, DashboardGroups, Roles, Map, Ontology, AgentDiscovery, Artifacts, Audit, ChatSessions, Curation, Dkb, ExternalApis, Favorites, Filters, Proxy, SemanticMetrics, TooltipTemplates, Users). They call into repositories, the connector factory, the encryption layer, and IPermissionService.

Authorization (OpenFGA)

IPermissionService is the only entry point for authorization checks and tuple writes; controllers and operations never call OpenFgaClient directly. Object hierarchy: system:axion-gen → datasource:{name} → dataset:{guid} → table:{datasetGuid}/{tableName}. Roles carry permission flags (has_admin_datasource, has_view_dataset, has_view_table) as wildcard tuples and inherit through the parent relation. Granting access to a resource takes a single tuple: assigned_role from the resource to the role. The computed can_* relations then evaluate member ∧ has_*, so the grant takes effect immediately for every member of a role that carries the matching flag.
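
To make the member ∧ has_* rule concrete, here is a toy evaluator over in-memory tuples. It is only a sketch: the real checks run inside OpenFGA against model.fga, and the tuple layout, the example users and resources, and the assumption that an assigned_role on an ancestor applies to its descendants are all illustrative.

```python
# (object, relation, subject) tuples. Relation names follow the text above;
# the concrete users, roles, and resources are made up for illustration.
TUPLES = {
    ("datasource:warehouse", "parent", "system:axion-gen"),
    ("dataset:d1", "parent", "datasource:warehouse"),
    ("table:d1/orders", "parent", "dataset:d1"),
    ("table:d2/secrets", "parent", "dataset:d2"),
    ("role:analyst", "member", "user:ana"),
    ("role:analyst", "has_view_table", "*"),          # wildcard permission flag
    ("dataset:d1", "assigned_role", "role:analyst"),  # the single grant tuple
}


def with_ancestors(obj):
    """Yield obj, then each object reachable by following parent tuples."""
    seen = set()
    while obj is not None and obj not in seen:
        seen.add(obj)
        yield obj
        obj = next((s for (o, r, s) in TUPLES if o == obj and r == "parent"), None)


def can_view_table(user, table):
    """True if some role assigned on the table or an ancestor has the user
    as a member AND carries the has_view_table flag."""
    for node in with_ancestors(table):
        for (obj, rel, role) in TUPLES:
            if obj != node or rel != "assigned_role":
                continue
            is_member = (role, "member", user) in TUPLES
            has_flag = (role, "has_view_table", "*") in TUPLES
            if is_member and has_flag:
                return True
    return False
```

In this toy data, one assigned_role tuple on dataset:d1 is enough for user:ana to view table:d1/orders, while table:d2/secrets stays inaccessible.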

Before SQL execution, QueryPermissionService extracts the table set from the parsed query and submits a single BatchCheck for can_view on every referenced table. Any denial throws UnauthorizedAccessException listing the denied tables. CTE-only queries skip authorization.
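
The pre-execution check can be sketched like this. The function name `authorize_query`, the `batch_check` callable, and its return shape are hypothetical; the sketch also assumes that a "CTE-only" query is one whose extracted table set is empty, and it uses Python's PermissionError in place of UnauthorizedAccessException.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: (user, relation, objects) -> {object: allowed}
BatchCheck = Callable[[str, str, List[str]], Dict[str, bool]]


def authorize_query(user: str, tables: Iterable[str], batch_check: BatchCheck) -> None:
    """Run one batched can_view check; raise if any referenced table is denied."""
    wanted = sorted(set(tables))
    if not wanted:
        # Assumed meaning of "CTE-only": no physical tables, so nothing to check.
        return
    results = batch_check(user, "can_view", wanted)
    denied = [t for t in wanted if not results.get(t, False)]
    if denied:
        raise PermissionError("can_view denied on: " + ", ".join(denied))
```

A table missing from the response is treated as denied, so a partial answer from the authorization backend fails closed.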

OpenFgaMigrationRunner is the startup gate. On startup it:

  • computes a SHA-256 of model.fga and compares it against the latest openfga_model_versions row;
  • uploads a new model when the hash changes (Run() in Development; Resolve() only in other environments);
  • seeds built-in roles and parent tuples;
  • stores the active store/model id in OpenFgaRuntimeOptions.
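
The hash comparison at the heart of that gate is simple enough to sketch. The helper names below are hypothetical, and the sketch assumes the stored value is a lowercase hex digest of the UTF-8 model text; the real runner may encode or store it differently.

```python
import hashlib
from typing import Optional


def model_hash(model_text: str) -> str:
    """SHA-256 of the model file contents, as a lowercase hex string."""
    return hashlib.sha256(model_text.encode("utf-8")).hexdigest()


def needs_upload(model_text: str, latest_recorded_hash: Optional[str]) -> bool:
    """True when no version is recorded yet or the model text has changed."""
    return model_hash(model_text) != latest_recorded_hash
```

Comparing hashes rather than model text keeps the version table small and makes an unchanged model a no-op on every restart.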

GenDbContext

EF Core 10 + Npgsql + PostGIS. Entities live in src/Axion.Gen.Data/Entities and are mapped directly as the EF model — no separate DTO/data-model layer. JSONB columns use JsonElement (never JsonDocument).

Domain groups: identity & roles · data org (data_sources, datasets, dataset_groups, dataset_maps) · big-data caches (table_profile_cache, external_apis, user_external_api_tokens) · dashboards & filters · curation, search & semantics · agents & SDUI · favorites · auth bookkeeping (openfga_model_versions).

DataSource encryption

Credentials and OAuth refresh tokens are stored AES-256-GCM encrypted with versioned keys via AesGcmEncryptor. Two storage modes (per DataSources.md):

  • Database (default): encrypted columns in Postgres (encrypted_credentials, nonce, tag, key_version); managed via the REST API.
  • Configuration: read from appsettings.json / env vars; CRUD operations throw.

Key rotation: add a new key version to the configuration, bump CurrentVersion, then run the rotate-keys-job Helm Job, which re-encrypts rows under the new version.

Federated data-source connectors

Connectors implement IBigDataConnector and are auto-discovered by reflection from the [Connector(...)] attribute (AddBigDataConnectors() in DI). Each handles execution (ExecuteQuery → Arrow IPC stream), catalog (GetTables, GetTableSchema, GetTableProfile), per-column expression helpers (GetGeometryExpression, GetSelectExpression), and optional H3 aggregation.

| Connector | Driver |
|---|---|
| ClickHouseConnector | Native HTTP via IHttpClientFactory (Arrow Stream IPC for execution, JSONEachRow for metadata). |
| BigQueryConnector | Apache Arrow ADBC BigQuery driver via AdbcConnectionFactory. |
| ImpalaConnector | Apache Arrow ADBC Impala driver. |
| FlightSqlConnector | Apache Arrow ADBC Flight SQL driver. |
| S3ParquetConnector | DuckDB (httpfs/S3 + spatial + h3 extensions); per-bucket secret created lazily. |

AdbcConnectionFactory is the single entry point for ADBC-backed connectors, so adding a new ADBC driver means adding a branch there. Adding an HTTP- or DuckDB-backed connector means adding a new [Connector(...)]-tagged class. Spatial bbox/polygon predicates plug in via ISpatialFilterBuilder, keyed by connector type (today only ClickHouseSpatialFilterBuilder exists).
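
Attribute-driven discovery of this kind can be illustrated with a Python analogue: a decorator tags each class, and a scan over subclasses builds the registry. This mirrors the idea only; the actual mechanism is .NET reflection over [Connector(...)], and the base class, decorator, and type keys below are hypothetical.

```python
from typing import Dict, Optional, Type


class BigDataConnector:
    """Stand-in for IBigDataConnector."""
    connector_type: Optional[str] = None


def connector(type_name: str):
    """Tag a class with its connector type, like the [Connector(...)] attribute."""
    def tag(cls):
        cls.connector_type = type_name
        return cls
    return tag


@connector("clickhouse")
class ClickHouseConnector(BigDataConnector):
    pass


@connector("s3parquet")
class S3ParquetConnector(BigDataConnector):
    pass


class UntaggedHelper(BigDataConnector):
    """Not tagged, so discovery must skip it."""


def discover() -> Dict[str, Type[BigDataConnector]]:
    """Scan direct subclasses and keep only those tagged by the decorator."""
    return {
        cls.connector_type: cls
        for cls in BigDataConnector.__subclasses__()
        if cls.__dict__.get("connector_type")
    }
```

Adding a connector in this scheme means writing one tagged class; nothing else has to be registered by hand.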

Where queries actually run

QueryExecutor has two execution paths:

  • Single-dataset fast path (one dataset, no alias) — the connector receives the user's SQL verbatim and streams Arrow IPC straight into the response. Nothing is materialized in DuckDB.
  • Federation path (multiple datasets, or a single dataset with an alias) — every query gets its own DuckDB scratch schema (q_{guid}). All-S3-Parquet queries take a fast subpath using read_parquet() views; mixed-connector queries pull each non-S3 dataset as Arrow IPC into a temp file and load it via read_arrow_ipc(...). The user's SQL then runs against the scratch schema with search_path set; the schema is dropped on exit.

flowchart LR
  user[Analyst] --> web[Gen Web]
  web -- "ask_data tool" --> bff[Next.js BFF]
  bff -- "NL→SQL via LLM" --> llm[OpenAI-compatible LLM]
  bff -- "execute SQL" --> api[Gen API]
  api -- "parse + sanitize" --> parser[SqlParser<br/>DuckDB polyglot]
  api -- "BatchCheck can_view" --> fga[(OpenFGA)]
  api --> exec{QueryExecutor}
  exec -- "single dataset" --> conn[Connector]
  exec -- "federation" --> duckdb[DuckDB<br/>scratch schema]
  duckdb --> conn
  conn --> ds[(Data source)]
  api -- "Arrow IPC / NDJSON" --> web

The LLM never sees the data — only the question and the schema description from describe_table. Results stream back through the API to the browser as Arrow IPC or NDJSON.
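
The routing rules for the two QueryExecutor paths, including the all-S3-Parquet subpath, can be summarized in a small decision function. This is a sketch under assumptions: `DatasetRef`, the connector key `"s3parquet"`, and the returned labels are hypothetical names, not identifiers from the codebase.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DatasetRef:
    connector: str               # hypothetical connector key, e.g. "clickhouse"
    alias: Optional[str] = None  # alias used for the dataset in the SQL, if any


def choose_path(refs: Sequence[DatasetRef]) -> str:
    """Pick the execution path for the datasets a query references."""
    if not refs:
        raise ValueError("query references no datasets")
    if len(refs) == 1 and refs[0].alias is None:
        # SQL goes to the connector verbatim; nothing is materialized in DuckDB.
        return "single-dataset"
    if all(r.connector == "s3parquet" for r in refs):
        # Scratch schema backed by read_parquet() views.
        return "federation-parquet-views"
    # Non-S3 datasets are pulled as Arrow IPC temp files and loaded into DuckDB.
    return "federation-arrow-ipc"
```

Note that a single dataset with an alias still takes the federation path, matching the rule stated in the list of execution paths.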

Where to go next