architecture·15 min read·3,551 words

Knowledge Graphs for Enterprise Finance — The Astral Architecture

Why finance teams need a knowledge graph, how Astral models suppliers, contracts, invoices as semantic relationships, and how it grounds AI agents.

Published 2026-05-04 by Flowie team

A knowledge graph is the data substrate that lets AI agents reason about an enterprise's finance and procurement universe — suppliers, contracts, invoices, payments, business units — as a network of semantic relationships rather than disconnected rows. Astral is Flowie's knowledge graph: a graph database layer that resolves duplicate entities across ERPs, encodes who-supplies-whom and which-contract-governs-which-invoice as first-class edges, and grounds every agent decision in verified context instead of LLM guesswork. This guide explains why a graph beats SQL and vector search for finance queries, the Astral architecture, and the use cases it unlocks — anti-fraud, compliance traceability, supplier discovery, spend visibility, ROI modeling.

Why finance teams need a knowledge graph in 2026

Finance and procurement run on relationships. An invoice is meaningless without the supplier it came from, the contract that governs its prices, the PO it references, the BU it bills, the cost center it hits, the approver who signed off, and the bank account that paid it. Traditional systems store each of these as a foreign-key reference in a relational table, and every meaningful question becomes a multi-table join.

Three pressures make this approach untenable in 2026:

  1. Multi-ERP sprawl. The median enterprise runs 3–5 ERPs after a decade of M&A: SAP S/4HANA at headquarters, Oracle EBS at a US subsidiary, NetSuite at a digital-native acquisition, Sage at a French entity, plus a long tail of vertical tools. The same supplier exists in each one, with different IDs, different name spellings and different bank accounts. Asking "what is our total exposure to ACME Corp?" requires reconciling these silos. Each new ERP has to be matched against all the others, so the reconciliation effort compounds with every system added.

  2. AI agents need grounding. Large language models hallucinate. Ask GPT-4 or Claude "which supplier shares a UBO with our top three risk-flagged vendors?" with no grounding, and you get a plausible-sounding fabrication. Ground the same agent in a knowledge graph and the question becomes a deterministic 3-hop traversal. This shift, from prompt-as-context to graph-as-context, is the architectural move that separates serious agentic platforms from demos.[1]

  3. Regulatory traceability. France's Plateforme Agréée mandate, ViDA's digital reporting requirements, Peppol's network audit logs, and SOX section 404 all demand provable lineage: this invoice maps to that contract, approved by that person, paid via that account, on that date. A knowledge graph makes the lineage queryable; a relational schema buries it under joins.

Gartner's 2025 Hype Cycle for Data Management placed knowledge graphs near the "Plateau of Productivity," noting that graph technology underpins 30% of large-enterprise AI projects, up from under 10% in 2022.[2] The category is no longer experimental.

The graph data model — nodes, edges, properties applied to finance

A knowledge graph is built from three primitives:

  • Nodes represent entities — Suppliers, Customers, Contracts, Invoices, Purchase Orders, Products, Employees, Cost Centers, Business Units, Bank Accounts, Tax IDs, UBOs (ultimate beneficial owners), Sanctions Lists.
  • Edges represent typed relationships — supplied-by, owned-by, paid-by, governed-by, parent-of, billed-to, approved-by, references, shares-bank-account-with, subject-to-sanction.
  • Properties are key-value attributes attached to nodes or edges — an Invoice node has amount, currency, issue_date; an approved-by edge has timestamp and policy_version.
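The three primitives can be made concrete with a minimal in-memory sketch. The class names (`Node`, `Edge`, `PropertyGraph`) and the sample IDs and values are illustrative assumptions. They do not describe Astral's schema or API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: str                                   # stable identifier, e.g. "INV-9931"
    label: str                                # entity type, e.g. "Invoice"
    props: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    src: str                                  # id of the source node
    rel: str                                  # typed relationship, e.g. "approved-by"
    dst: str                                  # id of the target node
    props: dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # An edge relates two existing entities, so both endpoints must be known.
        if edge.src not in self.nodes or edge.dst not in self.nodes:
            raise KeyError(f"unknown endpoint in {edge}")
        self.edges.append(edge)

    def out(self, node_id: str, rel: str) -> list[Edge]:
        """Outgoing edges of one relationship type."""
        return [e for e in self.edges if e.src == node_id and e.rel == rel]


g = PropertyGraph()
g.add_node(Node("INV-9931", "Invoice", {"amount": 12500.0, "currency": "EUR", "issue_date": "2026-03-02"}))
g.add_node(Node("EMP-17", "Employee", {"name": "A. Martin"}))
# Properties live on edges too: who approved, when, under which policy version.
g.add_edge(Edge("INV-9931", "approved-by", "EMP-17",
                {"timestamp": "2026-03-05T09:12:00Z", "policy_version": "v3"}))
```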

Here is a small finance-domain knowledge graph showing the core entities and how they connect:

```mermaid
graph LR
    Contract[Contract C-2024-117]
    Supplier[Supplier ACME GmbH]
    BU[BU France-South]
    PO[PO 4500-882]
    Invoice[Invoice INV-9931]
    BankAcct[Bank Account DE89...]
    UBO[UBO Jane Smith]

    Contract -->|governs| Supplier
    Contract -->|covers| BU
    Supplier -->|owns| BankAcct
    Supplier -->|controlled-by| UBO
    PO -->|raised-by| BU
    PO -->|issued-to| Supplier
    Invoice -->|references| PO
    Invoice -->|billed-to| BU
    Invoice -->|paid-to| BankAcct
```

What this small graph captures that a relational schema struggles with: the ability to ask, in one traversal, "starting from this Invoice, what is the governing Contract, the issuing BU, the supplier UBO, and any other suppliers controlled by the same UBO?" In SQL this is six joins across five tables; in Cypher or GQL it is a path expression of three lines.
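The traversal described above can be sketched in plain Python over the edges of the diagram, stored as (source, relationship, target) triples. This is only an illustration of the path logic; a graph database would express the same walk as one pattern match. `Supplier Beta SARL` is a hypothetical extra node, not part of the diagram, added so that the last step has something to find.

```python
# Edges from the diagram, plus one hypothetical supplier sharing the same UBO.
EDGES = [
    ("Contract C-2024-117", "governs", "Supplier ACME GmbH"),
    ("Contract C-2024-117", "covers", "BU France-South"),
    ("Supplier ACME GmbH", "owns", "Bank Account DE89..."),
    ("Supplier ACME GmbH", "controlled-by", "UBO Jane Smith"),
    ("PO 4500-882", "raised-by", "BU France-South"),
    ("PO 4500-882", "issued-to", "Supplier ACME GmbH"),
    ("Invoice INV-9931", "references", "PO 4500-882"),
    ("Invoice INV-9931", "billed-to", "BU France-South"),
    ("Invoice INV-9931", "paid-to", "Bank Account DE89..."),
    ("Supplier Beta SARL", "controlled-by", "UBO Jane Smith"),  # hypothetical
]


def out(node, rel):
    """Targets reachable from `node` over outgoing `rel` edges."""
    return [d for s, r, d in EDGES if s == node and r == rel]


def into(node, rel):
    """Sources pointing at `node` over `rel` edges (walking an edge backwards)."""
    return [s for s, r, d in EDGES if d == node and r == rel]


def invoice_context(invoice):
    po = out(invoice, "references")[0]          # Invoice -> PO
    supplier = out(po, "issued-to")[0]          # PO -> Supplier
    ubos = out(supplier, "controlled-by")       # Supplier -> UBO
    # Other suppliers controlled by any of the same UBOs.
    siblings = {s for u in ubos for s in into(u, "controlled-by")} - {supplier}
    return {
        "contract": into(supplier, "governs"),  # Contract -> Supplier, walked backwards
        "issuing_bu": out(po, "raised-by"),     # PO -> BU
        "ubo": ubos,
        "co_controlled": sorted(siblings),
    }
```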

The properties attached to each edge matter as much as the edges themselves. A paid-to edge can carry payment_date, payment_method, and clearing_reference. A shares-bank-account-with edge — derived during entity resolution — can carry a confidence_score so anti-fraud queries can filter to high-certainty matches.

Entity resolution — the unsung hero of multi-ERP environments

The most important job a finance knowledge graph performs is entity resolution: deciding that "ACME INC" in SAP, "Acme Incorporated" in NetSuite, "ACME" in Oracle, and "Acme Corp." in a Coupa supplier directory are all the same canonical Supplier entity.

Without entity resolution, every downstream question is wrong. Spend rolled up by supplier double-counts. Risk flags miss connected vendors. Contracts and invoices fail to link. Payment terms reconcile to nothing.

Astral's resolution pipeline runs in three layers:

  1. Deterministic match. Identical legal identifiers — VAT number (FR12345678901), DUNS, LEI, SIREN, EORI — collapse trivially. This is the easy 40–60% depending on data quality.

  2. Probabilistic match. Where deterministic identifiers are missing or disagree, Astral computes a similarity score across name (Levenshtein + tokenized + phonetic), address (normalized + geocoded), bank IBAN, tax ID prefix, and contact emails. Records above a tunable threshold are merged with a confidence score; records in the ambiguous middle are flagged for human review.

  3. Graph-aware match. When two records share neighbors — same UBO, same parent company, same bank account, same contracts — the probability they refer to the same entity rises. This is the step that pure-string matching misses and that gives a graph-native resolver an edge over a relational MDM tool.
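The three layers can be sketched as a single decision function. This is an illustrative assumption, not Astral's resolver. It uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in for Levenshtein similarity and a token-set Jaccard score for tokenized matching, and it leaves out phonetic and address matching. The weights and thresholds (0.85 to merge, 0.6 to send to review) are made-up values.

```python
import re
from difflib import SequenceMatcher

LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "gmbh", "sa", "sarl", "ltd", "llc"}


def norm_id(value):
    """Normalize a legal identifier (VAT, DUNS, LEI) for exact comparison."""
    return re.sub(r"[^A-Z0-9]", "", value.upper()) if value else None


def name_tokens(name):
    return {t for t in re.findall(r"[a-z0-9]+", name.lower()) if t not in LEGAL_SUFFIXES}


def name_similarity(a, b):
    ta, tb = name_tokens(a), name_tokens(b)
    jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    # Character-level ratio on suffix-stripped names (stand-in for Levenshtein).
    edit = SequenceMatcher(None, " ".join(sorted(ta)), " ".join(sorted(tb))).ratio()
    return max(jaccard, edit)


def neighbor_overlap(na, nb):
    """Jaccard overlap of graph neighbors (UBOs, bank accounts, contracts...)."""
    return len(na & nb) / len(na | nb) if na | nb else 0.0


def match(a, b, merge_at=0.85, review_at=0.6):
    # Layer 1: deterministic. Equal legal identifiers collapse immediately.
    for key in ("vat", "duns", "lei"):
        ida, idb = norm_id(a.get(key)), norm_id(b.get(key))
        if ida and ida == idb:
            return "merge", 1.0
    # Layer 2: probabilistic similarity on name and bank account.
    score = 0.7 * name_similarity(a["name"], b["name"])
    if a.get("iban") and a.get("iban") == b.get("iban"):
        score += 0.3
    # Layer 3: graph-aware. Shared neighbors raise the match probability.
    score = min(1.0, score + 0.3 * neighbor_overlap(a["neighbors"], b["neighbors"]))
    if score >= merge_at:
        return "merge", round(score, 2)
    if score >= review_at:
        return "review", round(score, 2)
    return "distinct", round(score, 2)
```

With no shared identifiers and no shared neighbors, "ACME" and "Acme Corp." score 0.7 and land in the human-review band. Adding the same UBO and contract as neighbors pushes the pair over the merge threshold, which is the effect the graph-aware layer is meant to capture.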

Every merge writes a same-as edge with provenance: the source records, the matching algorithm, the confidence score and the human reviewer (if any). A merge is never destructive. The source records remain queryable, and a merge can be split if a downstream signal contradicts it. This follows the non-destructive master data management practices described in DAMA-DMBOK.[3]

The output is a single canonical Supplier node that all ERPs map to, with edges back to each source-system identifier. Now "total exposure to ACME" is one node-property aggregation, not a six-system reconciliation.
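A non-destructive merge, a split and the resulting exposure roll-up might look like the sketch below. The `SameAs` record, the `Resolver` class, the source IDs and the spend figures are all hypothetical. The point is that provenance edges are never deleted: a split only deactivates one, so the history of every merge decision survives.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SameAs:
    source_id: str             # e.g. "SAP:100234"
    canonical_id: str          # e.g. "SUP-ACME"
    algorithm: str             # "deterministic", "probabilistic" or "graph-aware"
    confidence: float
    reviewer: Optional[str] = None
    active: bool = True        # a split deactivates the edge instead of deleting it


class Resolver:
    def __init__(self):
        self.same_as = []      # provenance edges, append-only

    def merge(self, source_id, canonical_id, algorithm, confidence, reviewer=None):
        self.same_as.append(SameAs(source_id, canonical_id, algorithm, confidence, reviewer))

    def split(self, source_id, canonical_id):
        for edge in self.same_as:
            if edge.source_id == source_id and edge.canonical_id == canonical_id:
                edge.active = False

    def canonical_of(self, source_id):
        for edge in self.same_as:
            if edge.source_id == source_id and edge.active:
                return edge.canonical_id
        return source_id       # unresolved records stand on their own

    def exposure(self, invoices):
        """Total spend per canonical supplier across every source system."""
        totals = defaultdict(float)
        for supplier_source_id, amount in invoices:
            totals[self.canonical_of(supplier_source_id)] += amount
        return dict(totals)


r = Resolver()
r.merge("SAP:100234", "SUP-ACME", "deterministic", 1.0)
r.merge("NETSUITE:77", "SUP-ACME", "probabilistic", 0.91, reviewer="j.doe")
invoices = [("SAP:100234", 120_000.0), ("NETSUITE:77", 45_000.0), ("ORACLE:A-9", 8_000.0)]
print(r.exposure(invoices))  # {'SUP-ACME': 165000.0, 'ORACLE:A-9': 8000.0}
```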

Knowledge graphs vs SQL vs vector DBs — when each one wins

A common confusion: "isn't a knowledge graph just a fancy database?" or "doesn't a vector database do the same thing?" No. They solve different problems and the best architectures combine all three.

| Dimension | Relational DB (Postgres, Snowflake) | Vector DB (Pinecone, Qdrant, pgvector) | Knowledge graph (Neo4j, TigerGraph, Astral) |
|---|---|---|---|
| Primary data model | Tables with foreign keys | High-dimensional embeddings of unstructured text | Typed nodes and edges with properties |
| Query language / model | SQL | Approximate-nearest-neighbor (ANN) search over similarity scores | Cypher, SPARQL, GQL (ISO/IEC 39075:2024)[4] |
| Strength | Aggregations, OLTP transactions, point lookups | Semantic search over documents, "find similar to" | Multi-hop relationship traversal, lineage, network analysis |
| Weakness | Multi-hop joins explode in cost; recursion is awkward | Cannot answer "who supplies whom"; no notion of typed relationships | Slower at flat aggregations than columnar warehouses |
| When to use in finance | GL postings, ledger transactions, period closes, tax tables | Searching contract clauses, classifying email content, matching free-text descriptions | Supplier networks, fraud detection, lineage queries, agent grounding |
| Failure mode | Performance cliff on 8-table joins at 10M+ rows | Returns plausible-but-wrong results when no true match exists | Schema drift if relationships are not maintained |

The honest framing: SQL is great for transactional records, vector DBs are great for semantic search over unstructured text, and neither captures rich semantic relationships well. A knowledge graph encodes those relationships as first-class edges. For AI agents this is the decisive difference. The LLM does not have to remember context (where it hallucinates), and the context does not have to be stuffed into a prompt (where it gets truncated). Instead, the agent traverses a verified graph.

A 3-hop fraud query in Cypher illustrates the gap. "Find all suppliers paid more than €100k in the last 12 months that share a bank account or a UBO with a sanctioned entity":

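```cypher
// Suppliers paid more than €100k in total over the last 12 months (amounts assumed in EUR)...
MATCH (p:Payment)-[:PAID_TO]->(s:Supplier)
WHERE p.date > date() - duration('P12M')
WITH s, sum(p.amount) AS paid_12m
WHERE paid_12m > 100000
// ...that share a bank account or a UBO (directly or one level up) with a sanctioned supplier
MATCH (s)-[:OWNS|CONTROLLED_BY*1..2]-(shared)-[:OWNS|CONTROLLED_BY*1..2]-(sanctioned:Supplier)
WHERE sanctioned.sanctions_status = 'OFAC_LISTED' AND sanctioned <> s
RETURN DISTINCT s.name AS supplier, paid_12m, sanctioned.name AS sanctioned_entity, shared
```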

That is under ten lines of Cypher. The relational equivalent joins a payments table to a suppliers table, then to a UBO bridge table and a bank-account bridge table, back to suppliers, and finally to a sanctions table. You still have to recurse on ownership chains, which SQL supports only through verbose recursive CTEs (WITH RECURSIVE). At scale, that cost cliff can be the difference between a 200 ms answer and a 4-minute query the analyst gives up on.
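The relational side of ownership-chain recursion is sketched below. It uses Python's built-in `sqlite3` module and a `WITH RECURSIVE` common table expression. The `ownership` table and its rows are hypothetical. The sketch covers only the ownership-chain part of the fraud query; the joins to payments, bank accounts and sanctions would still have to be layered on top.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ownership (child TEXT, parent TEXT);  -- child is owned by parent
    INSERT INTO ownership VALUES
        ('ACME GmbH', 'ACME Holding BV'),
        ('ACME Holding BV', 'Jane Smith'),
        ('Beta SARL', 'Jane Smith'),
        ('Gamma Ltd', 'John Roe');
""")

# Walk up the ownership chain from one supplier, at most 3 levels deep.
ANCESTORS = """
WITH RECURSIVE chain(entity, depth) AS (
    SELECT parent, 1 FROM ownership WHERE child = ?
    UNION
    SELECT o.parent, c.depth + 1
    FROM ownership AS o JOIN chain AS c ON o.child = c.entity
    WHERE c.depth < 3
)
SELECT entity, depth FROM chain ORDER BY depth
"""


def owners_up_chain(supplier):
    return [entity for entity, _ in conn.execute(ANCESTORS, (supplier,)).fetchall()]


def co_owned(a, b):
    """True if two suppliers share any owner anywhere in their chains."""
    return bool(set(owners_up_chain(a)) & set(owners_up_chain(b)))
```

Here `co_owned("ACME GmbH", "Beta SARL")` is true because both chains reach Jane Smith, one of them through an intermediate holding company. That intermediate hop is what the `*1..2` variable-length pattern handles in a single line of Cypher.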

The Astral architecture — graph storage, query layer, ingestion pipeline

Astral is built as a service, not a feature. Its architecture follows three layers: storage, query/API, ingestion.

Storage layer. Astral uses a labeled property graph as its native model, hosted on cloud infrastructure in Frankfurt and Belgium for EU data residency. The graph is sharded by tenant; every customer has an isolated subgraph with its own access policies. Properties are typed, and edge cardinalities are constrained where the schema enforces them (an Invoice can reference at most one PO; a Supplier can have many Bank Accounts). The store is ACID at the transaction level: entity merges and relationship writes are atomic.
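The cardinality rule, together with the all-or-nothing write, can be sketched as validation that runs before a batch commits. The `MAX_OUT` table, `GraphStore` class and node IDs are hypothetical. A real graph database would enforce this through schema constraints and transactions rather than application code.

```python
MAX_OUT = {
    ("Invoice", "references"): 1,   # an Invoice references at most one PO
    # ("Supplier", "owns") is absent: a Supplier may own any number of Bank Accounts
}


class CardinalityError(ValueError):
    pass


class GraphStore:
    def __init__(self):
        self.labels = {}     # node id -> label
        self.edges = set()   # (src, rel, dst)

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def write(self, new_edges):
        """Apply a batch of edges atomically: all of them or none."""
        staged = set(self.edges)
        for src, rel, dst in new_edges:
            staged.add((src, rel, dst))
            limit = MAX_OUT.get((self.labels[src], rel))
            count = sum(1 for s, r, _ in staged if s == src and r == rel)
            if limit is not None and count > limit:
                raise CardinalityError(f"{src} would exceed {limit} '{rel}' edge(s)")
        self.edges = staged  # commit only after every edge passed validation
```

If any edge in a batch violates a constraint, none of the batch is written. This mirrors the atomic merge and relationship writes described above.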

Query and API layer. Astral exposes three interfaces:

  • A Cypher-compatible query endpoint for power users and analytics teams.
  • A REST API surfacing common finance traversals (GET /supplier/{id}/contracts, GET /invoice/{id}/lineage, GET /risk/connected-suppliers).
  • A GraphQL endpoint that lets the Workflow Builder and AI agents traverse the graph in their native call patterns.

The GraphQL layer is what AI agents use. An agent constructing context for an invoice approval sends a query such as invoice(id: $id) { contract { terms } supplier { ubo, riskScore, sanctionedConnections } billedTo { approvalChain } } and receives a single JSON tree with everything it needs: no follow-up calls, no LLM-mediated joins.
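On the agent side, consuming that tree is straightforward. The sketch below builds a standard GraphQL-over-HTTP request body (a JSON object with `query` and `variables`). It then flattens a response into facts, each carrying the field path it came from. The field names mirror the query above. The operation name, the `ID!` variable type, the sample response values and the assumption that `approvalChain` is a list of strings are all illustrative. No real endpoint is called.

```python
import json

QUERY = """
query InvoiceContext($id: ID!) {
  invoice(id: $id) {
    contract { terms }
    supplier { ubo riskScore sanctionedConnections }
    billedTo { approvalChain }
  }
}
"""


def build_request(invoice_id):
    """JSON body for a GraphQL POST: the query text plus its variables."""
    return json.dumps({"query": QUERY, "variables": {"id": invoice_id}})


def flatten(tree, path=""):
    """Turn a nested JSON tree into (path, value) facts, keeping provenance."""
    if not isinstance(tree, dict):
        return [(path, tree)]
    facts = []
    for key, value in tree.items():
        facts.extend(flatten(value, f"{path}.{key}" if path else key))
    return facts


# Hypothetical response shaped like the query above.
response = {"data": {"invoice": {
    "contract": {"terms": "Net 45, indexed pricing"},
    "supplier": {"ubo": "Jane Smith", "riskScore": 0.12, "sanctionedConnections": 0},
    "billedTo": {"approvalChain": ["cost-center owner", "BU controller"]},
}}}
facts = flatten(response["data"]["invoice"])
```

Each fact's path (for example `supplier.riskScore`) records where the value sits in the tree, which makes it straightforward to trace back to the underlying node and edge.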

Ingestion pipeline. Astral subscribes to events from connected systems via Flowie's Workflow Builder. When SAP emits a new supplier record, NetSuite updates a contract, or a Peppol invoice arrives, the pipeline:

  1. Parses the source payload into the canonical schema.
  2. Runs entity resolution against the existing graph (deterministic → probabilistic → graph-aware).
  3. Writes new nodes and edges in a single transaction, or merges into existing canonical entities with same-as provenance edges.
  4. Emits change events on a topic that downstream agents and analytics subscribe to.
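The four steps can be sketched as a single event handler. Everything here is hypothetical: the event shape, the canonical schema, the `resolve` stub (deterministic VAT matching only, standing in for the three-layer entity resolution) and the `emitted` list standing in for a change-event topic. Replaying an event is a no-op because processed event IDs are recorded alongside the graph state.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)      # canonical id -> properties
    same_as: set = field(default_factory=set)      # (source id, canonical id)
    processed: set = field(default_factory=set)    # event ids already applied
    emitted: list = field(default_factory=list)    # stand-in for a change-event topic


def parse(event):
    """Step 1: map a source payload onto a (hypothetical) canonical schema."""
    p = event["payload"]
    return {"source_id": f'{event["system"]}:{p["id"]}', "name": p["name"].strip(), "vat": p.get("vat")}


def resolve(graph, record):
    """Step 2 stub: deterministic VAT match, else a new canonical id."""
    for cid, props in graph.nodes.items():
        if record["vat"] and props.get("vat") == record["vat"]:
            return cid
    return f"SUP-{record['vat'] or record['source_id']}"


def ingest(graph, event):
    """Apply one source event; replaying the same event leaves the graph unchanged."""
    if event["event_id"] in graph.processed:
        return
    record = parse(event)
    cid = resolve(graph, record)
    # Step 3: in a real store these writes would commit in one transaction.
    graph.nodes.setdefault(cid, {"name": record["name"], "vat": record["vat"]})
    graph.same_as.add((record["source_id"], cid))
    graph.processed.add(event["event_id"])
    # Step 4: publish a change event for downstream agents and analytics.
    graph.emitted.append({"canonical_id": cid, "source_id": record["source_id"]})


g = Graph()
e1 = {"event_id": "e1", "system": "SAP", "payload": {"id": "100234", "name": "ACME INC ", "vat": "FR12345678901"}}
e2 = {"event_id": "e2", "system": "NETSUITE", "payload": {"id": "77", "name": "Acme Incorporated", "vat": "FR12345678901"}}
for ev in (e1, e2, e1):   # e1 is replayed
    ingest(g, ev)
```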

The pipeline is idempotent. Replaying an event produces the same graph state. This matters for regulatory replays — auditors can ask Astral to reconstruct the graph at any historical timestamp and run their queries against that snapshot.

Astral's hosting follows Flowie's broader compliance footprint: cloud regions in Frankfurt and Belgium, ISO 27001:2022 certification, GDPR alignment, and encryption at rest (AES-256) and in transit (TLS 1.3). Tenant isolation is enforced at both the storage and query layers.

Use cases enabled by a graph substrate

A knowledge graph is not the goal; the queries it makes possible are. Five categories of use case in finance and procurement justify the architecture.

Anti-fraud — collusive supplier networks

The classic procurement fraud pattern is the shell network: an employee sets up multiple "independent" suppliers that share an address, a bank account, or a UBO, then directs business to all of them. In a relational system this is invisible — each supplier passes its individual KYC check. In a knowledge graph it is one query.

Astral runs this query continuously, not on demand. Every new supplier is matched against the graph for shared addresses, IBANs, phone numbers, beneficial owners, and even web-domain registrants. Matches above the threshold raise a flag visible to the procurement controller. The Association of Certified Fraud Examiners (ACFE) estimates that organizations lose 5% of revenue to occupational fraud each year, with collusive supplier schemes among the highest-loss categories.[5]
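A minimal sketch of that continuous check keeps an inverted index from normalized attribute values (address, IBAN, phone, UBO, web domain) to supplier IDs. When a new supplier arrives, it reports every existing supplier sharing at least two values. The normalization rules, attribute names, sample records and the two-attribute threshold are assumptions for illustration.

```python
import re
from collections import defaultdict

ATTRS = ("address", "iban", "phone", "ubo", "domain")


def normalize(attr, value):
    value = value.strip().lower()
    if attr in ("iban", "phone"):
        return re.sub(r"[^0-9a-z]", "", value)   # drop spaces, dashes, "+" and so on
    return re.sub(r"\s+", " ", value)            # collapse repeated whitespace


class ShellNetworkMonitor:
    def __init__(self, flag_threshold=2):
        self.index = defaultdict(set)            # (attr, normalized value) -> supplier ids
        self.flag_threshold = flag_threshold

    def add_supplier(self, supplier_id, attrs):
        """Index a new supplier and return existing suppliers it overlaps with."""
        keys = [(a, normalize(a, attrs[a])) for a in ATTRS if attrs.get(a)]
        shared = defaultdict(list)               # other supplier id -> shared attributes
        for key in keys:
            for other in self.index[key]:
                shared[other].append(key[0])
        for key in keys:                         # index after matching: no self-match
            self.index[key].add(supplier_id)
        return {other: sorted(found) for other, found in shared.items()
                if len(found) >= self.flag_threshold}


m = ShellNetworkMonitor()
m.add_supplier("S1", {"address": "12 Rue de la Paix, Paris",
                      "iban": "FR76 3000 6000 0112 3456 7890 189", "ubo": "Jane Smith"})
m.add_supplier("S2", {"address": "4 Main St", "phone": "+33 1 23 45 67 89"})
flags = m.add_supplier("S3", {"address": "12  rue de la paix, paris",
                              "iban": "FR7630006000011234567890189", "phone": "+33-1-23-45-67-89"})
print(flags)  # {'S1': ['address', 'iban']}
```

S3 also shares a phone number with S2, but a single shared value stays below the threshold. The threshold trades false positives against missed networks.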

Compliance — invoice-to-contract lineage in one query

Auditors arrive with a sample of 50 invoices and ask: "for each one, show me the contract that governs the price, the PO that authorized the order, the approval chain, and the payment evidence." On a relational schema this is 50 separate research tasks. On a graph it is one traversal, expressed once and run 50 times.

For the France PA mandate, lineage is not optional. The DGFiP can demand, for any invoice, the linked contract, the issuing entity, the BU and the matching payment record. Astral's lineage API returns this tree in a single call.
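The lineage traversal can be expressed once and run for every invoice in an audit sample. The relationship names (`governed-by`, `references`, `approved-by`, `paid-by`), the IDs and the storage of the approval chain as an ordered list are hypothetical. They do not describe Astral's lineage API.

```python
# Hypothetical edges: (node, relationship) -> ordered list of targets.
EDGES = {
    ("INV-1", "governed-by"): ["C-2024-117"],
    ("INV-1", "references"): ["PO-4500-882"],
    ("INV-1", "approved-by"): ["EMP-17", "EMP-42"],   # ordered approval chain
    ("INV-1", "paid-by"): ["PAY-555"],
    ("INV-2", "references"): ["PO-4500-901"],
    ("INV-2", "approved-by"): ["EMP-17"],
}


def lineage(invoice_id, edges):
    """One traversal: contract, PO, approval chain and payment for an invoice."""
    def first(rel):
        targets = edges.get((invoice_id, rel), [])
        return targets[0] if targets else None

    return {
        "invoice": invoice_id,
        "contract": first("governed-by"),
        "po": first("references"),
        "approval_chain": edges.get((invoice_id, "approved-by"), []),
        "payment": first("paid-by"),
    }


def audit(sample, edges):
    """Run the same traversal for every sampled invoice and list missing links."""
    report = {}
    for invoice_id in sample:
        tree = lineage(invoice_id, edges)
        missing = [k for k, v in tree.items() if v is None or v == []]
        report[invoice_id] = {"lineage": tree, "missing": missing}
    return report


report = audit(["INV-1", "INV-2"], EDGES)
```

The gaps become part of the output: INV-2 comes back with its contract and payment listed as missing, which is exactly what an auditor would want surfaced.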

Supplier discovery — "find me suppliers like X"

Buyers regularly ask: "we need a supplier similar to Vendor X, who served BU Y at price point Z, in category Q, certified for ISO 14001, with delivery into Western Europe." This is a hybrid query — semantic similarity on category and capabilities, structured filtering on certifications and geography, network proximity on existing supplier relationships.

Astral combines a vector search over supplier descriptions and capabilities (the unstructured-text part) with a graph traversal over certifications, prior contracts and BU relationships (the structured part). The result is a ranked list with explainable factors. This is the hybrid retrieval pattern detailed in the "Hybrid retrieval" section below.
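The ranking step can be sketched as hard filters followed by a weighted score, with each factor returned alongside the score so the ranking stays explainable. The toy 3-dimensional embeddings, the weights (0.6 semantic, 0.4 network proximity) and the field names are assumptions. A real system would use a learned embedding model and an approximate-nearest-neighbor index rather than brute-force cosine similarity.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_suppliers(query_vec, candidates, required_cert, regions, bu):
    results = []
    for c in candidates:
        # Structured filters are hard constraints, not part of the score.
        if required_cert not in c["certs"] or not regions & c["delivers_to"]:
            continue
        semantic = cosine(query_vec, c["embedding"])
        network = 1.0 if bu in c["served_bus"] else 0.0   # proximity: already serves this BU
        score = 0.6 * semantic + 0.4 * network
        results.append({"supplier": c["id"], "score": round(score, 3),
                        "factors": {"semantic": round(semantic, 3), "serves_bu": bool(network)}})
    return sorted(results, key=lambda r: r["score"], reverse=True)


candidates = [
    {"id": "S-A", "embedding": [1.0, 0.0, 0.0], "certs": {"ISO 14001"},
     "delivers_to": {"FR", "DE"}, "served_bus": {"BU France-South"}},
    {"id": "S-B", "embedding": [0.9, 0.1, 0.0], "certs": {"ISO 14001"},
     "delivers_to": {"DE"}, "served_bus": set()},
    {"id": "S-C", "embedding": [1.0, 0.0, 0.0], "certs": set(),   # fails the certification filter
     "delivers_to": {"FR"}, "served_bus": {"BU France-South"}},
]
ranked = rank_suppliers([1.0, 0.0, 0.0], candidates, "ISO 14001", {"FR", "DE", "NL"}, "BU France-South")
```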

Spend visibility — drill from category to invoice line

Spend cubes built on warehouse aggregations are good at "category total" and bad at "show me the contract that justifies this line item." A graph drill-down goes the other direction: starting from a category total, expand to BUs, then contracts, then invoices, then specific lines — each step is an edge traversal, each level can carry attribution and approval metadata.
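The drill-down can be sketched as repeated grouping along the chain category, BU, contract, invoice, line. The line records below are hypothetical and already carry their resolved path as columns. In a graph, each level would be an edge traversal, and the metadata on those edges would travel with the totals.

```python
from collections import defaultdict

LEVELS = ("category", "bu", "contract", "invoice")


def drill(lines, path=()):
    """Totals one level below `path`; drill(lines, ("IT services",)) gives per-BU totals."""
    depth = len(path)
    totals = defaultdict(float)
    for line in lines:
        if tuple(line[k] for k in LEVELS[:depth]) != path:
            continue   # the line belongs to a different branch
        key = line[LEVELS[depth]] if depth < len(LEVELS) else line["line_id"]
        totals[key] += line["amount"]
    return dict(totals)


lines = [
    {"category": "IT services", "bu": "France-South", "contract": "C-2024-117",
     "invoice": "INV-9931", "line_id": "L1", "amount": 1000.0},
    {"category": "IT services", "bu": "France-South", "contract": "C-2024-117",
     "invoice": "INV-9931", "line_id": "L2", "amount": 250.0},
    {"category": "IT services", "bu": "Germany", "contract": "C-2024-130",
     "invoice": "INV-9950", "line_id": "L1", "amount": 400.0},
    {"category": "Facilities", "bu": "France-South", "contract": "C-2023-044",
     "invoice": "INV-9001", "line_id": "L1", "amount": 90.0},
]
```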

In Flowie's internal data (2026), finance controllers cut spend-investigation cycle time by 40–60% when they replaced cube exports with graph drill-downs. The savings come from eliminating the back-and-forth with IT to pull joins.

ROI and TCO modeling — supplier swaps and consolidation

"What happens if we replace Supplier A with Supplier B for category C?" The naive answer is a price comparison. The graph-native answer also accounts for the contracts that would terminate, the BUs whose approval chains would change, the bank accounts that would deactivate, the volume discounts that would lapse on adjacent categories, and the renegotiation triggers buried in MFN clauses. A 4-hop traversal surfaces all of this; a spreadsheet does not.

The same machinery powers consolidation modeling. After an acquisition, the question "which of our suppliers does the acquired entity also use, and what is the combined volume by category?" is the first one a CPO asks. With Astral the answer is a single overlap query against the resolved supplier graph; without it, it is a six-week consultant engagement. Flowie has run this query for customers post-merger and surfaced double-digit-percent overlap that the legacy spend cubes had failed to detect because the same supplier was registered under different IDs in each company's ERP (Flowie internal data, 2026).
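Once both companies' supplier records resolve to the same canonical IDs, the overlap query reduces to a set intersection plus a grouped sum. The sketch below assumes that resolution has already happened (the `canonical` mapping). The source IDs and spend rows are made up.

```python
from collections import defaultdict


def overlap(spend_a, spend_b, canonical):
    """Suppliers used by both companies, with combined volume by category.

    spend_a, spend_b: iterables of (source supplier id, category, amount)
    canonical: source supplier id -> canonical supplier id (from entity resolution)
    """
    def by_supplier(rows):
        out = defaultdict(lambda: defaultdict(float))
        for sid, cat, amount in rows:
            out[canonical.get(sid, sid)][cat] += amount
        return out

    a, b = by_supplier(spend_a), by_supplier(spend_b)
    combined = {}
    for sup in sorted(a.keys() & b.keys()):       # suppliers both companies use
        cats = defaultdict(float)
        for side in (a[sup], b[sup]):
            for cat, amount in side.items():
                cats[cat] += amount
        combined[sup] = dict(cats)
    return combined


canonical = {"SAP:100234": "SUP-ACME", "NS:ACQ-55": "SUP-ACME", "SAP:200001": "SUP-GLOBEX"}
acquirer = [("SAP:100234", "Logistics", 300_000.0), ("SAP:200001", "IT", 50_000.0)]
acquired = [("NS:ACQ-55", "Logistics", 120_000.0), ("NS:ACQ-55", "Packaging", 30_000.0),
            ("NS:ACQ-90", "IT", 10_000.0)]
```

Without the `canonical` mapping, `SAP:100234` and `NS:ACQ-55` would look like two different suppliers and the overlap would come back empty. That is the failure mode described above for spend cubes keyed on ERP-specific IDs.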

Hybrid retrieval — graph plus vector for AI agent grounding

The pattern that makes Astral most useful to Flowie's AI agents is hybrid retrieval. Vector search and graph traversal each solve half the agent-context problem; together they solve the whole.

The pattern works in three steps:

  1. Vector search for semantic seeding. The agent embeds the user's query (or the document it is reasoning about) and runs an approximate-nearest-neighbor search over a vector store of contract clauses, supplier descriptions, prior tickets, and policy documents. The result is a small set of semantically relevant entities and chunks.

  2. Graph expansion for grounded context. Each entity returned by the vector search is a node anchor in the knowledge graph. The agent runs a bounded traversal — typically 1 to 3 hops — pulling the structured neighborhood: parent company, governing contract, recent invoices, approval chain, risk flags, related suppliers. This is where verified relationships replace LLM guesswork.

  3. Context assembly for the LLM. The combined output — semantic chunks plus graph neighborhood — is rendered as structured context for the LLM call. The LLM reasons over a tightly-scoped, citation-rich context window instead of a vague summary or a stuffed prompt.
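The three steps can be sketched end to end in a few dozen lines. This is a toy under stated assumptions: embeddings are hand-written vectors, nearest-neighbor search is brute force instead of an ANN index, the graph is a small adjacency list of outgoing edges, and the output is a JSON string in which every fact carries the edge it came from. None of the names are Astral or Flowie APIs.

```python
import json
import math
from collections import deque

# Step 1 corpus: text chunks with embeddings, each anchored to a graph node.
CHUNKS = [
    {"text": "Late-delivery penalty: 2% per week", "node": "Contract C-2024-117", "vec": [0.9, 0.1, 0.0]},
    {"text": "Office cleaning scope of work", "node": "Contract C-2023-044", "vec": [0.0, 0.2, 0.9]},
]
# Step 2 graph: outgoing typed edges per node.
GRAPH = {
    "Contract C-2024-117": [("governs", "Supplier ACME GmbH")],
    "Supplier ACME GmbH": [("controlled-by", "UBO Jane Smith"), ("owns", "Bank Account DE89...")],
}


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def seed(query_vec, k=1):
    """Step 1: top-k chunks by cosine similarity (brute force stands in for ANN)."""
    return sorted(CHUNKS, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)[:k]


def expand(anchor, max_hops=2):
    """Step 2: bounded breadth-first traversal; each fact keeps its source edge."""
    facts, seen, queue = [], {anchor}, deque([(anchor, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for rel, dst in GRAPH.get(node, []):
            facts.append({"from": node, "edge": rel, "to": dst, "hop": depth + 1})
            if dst not in seen:
                seen.add(dst)
                queue.append((dst, depth + 1))
    return facts


def build_context(query_vec):
    """Step 3: semantic chunks plus graph neighborhood, rendered for the LLM."""
    hits = seed(query_vec)
    return json.dumps({
        "chunks": [{"text": h["text"], "node": h["node"]} for h in hits],
        "graph": [f for h in hits for f in expand(h["node"])],
    }, indent=2)
```

Bounding the traversal at `max_hops` keeps the context window small and predictable. The `from`/`edge`/`to` triple on every fact is what lets an auditor trace a statement in the agent's context back to the graph.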

This hybrid is what the literature calls graph-enhanced RAG. Microsoft Research's 2024 GraphRAG paper reported substantial improvements over naive RAG in the comprehensiveness and diversity of answers to global questions that span an entire corpus,[1] and graph-enhanced retrieval is increasingly common in production agentic systems. Astral implements it as a single GraphQL field that takes a query string and returns the hybrid context tree.

The reason this matters for finance: an agent approving an invoice does not need to "remember" the contract terms — it traverses to them. It does not need to "guess" whether a supplier is sanctioned — it sees the edge. It does not need to "summarize" the approval chain — it walks it. Hallucination is replaced by traversal. Provenance is preserved because every fact in the agent's context carries the node and edge it came from, queryable by an auditor.

A practical consequence: agent prompts get shorter and more deterministic. Instead of stuffing 8,000 tokens of "background context" into a system prompt and hoping the LLM picks the right pieces, Flowie agents receive a compact, schema-shaped JSON tree from Astral with the exact entities and edges relevant to the task. Token cost drops and latency improves. Most importantly, the same agent run on the same invoice on two different days receives identical, structured input rather than a narrative summary, which makes its answers far more reproducible. Reproducibility is a precondition for audit acceptance, and hybrid retrieval is what delivers it.

FAQ

How is a knowledge graph different from a master data management system?

Master data management (MDM) tools such as Informatica, Reltio and Stibo focus on entity resolution and golden-record creation. They produce a clean Supplier table. A knowledge graph adds the relationships between entities and stores them as queryable edges. Many enterprise MDM deployments export to a graph for downstream analytics; Astral does both jobs natively, so the entity-resolution outputs feed the graph without a separate ETL step.

Do I need to learn Cypher or SPARQL to use Astral?

No. Most agents and Workflow Builder users interact with Astral through the REST and GraphQL APIs, which expose finance-shaped traversals (supplier 360, invoice lineage, connected-supplier risk) without requiring any query language. Cypher is available for analysts who want ad-hoc exploration. The new ISO/IEC 39075:2024 GQL standard[4] is supported as it stabilizes across vendors.

How does Astral handle changes — does it rewrite history?

No. Astral uses bitemporal versioning: every node and edge carries a valid-from / valid-to timestamp pair, and a separate recorded-at timestamp showing when the graph itself learned the fact. Nothing is overwritten. Querying the graph "as of last quarter" returns the state the auditor saw at that time, even if subsequent merges or corrections have happened. This is the discipline that makes graph data audit-defensible.
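A minimal bitemporal store can be sketched as append-only facts with two time axes. The field names and the sample bank-account correction are assumptions for illustration. A correction is written as new rows with a later `recorded_at`, so an as-of query that fixes both the valid time and the recording time sees exactly what was known then.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    entity: str
    attr: str
    value: str
    valid_from: date
    valid_to: Optional[date]   # None means still valid
    recorded_at: date          # when the graph learned this version


LOG: list[Fact] = []           # append-only: nothing is ever overwritten


def as_of(entity, attr, valid_at, known_at):
    """Value true at `valid_at`, according to what had been recorded by `known_at`."""
    candidates = [
        f for f in LOG
        if f.entity == entity and f.attr == attr and f.recorded_at <= known_at
        and f.valid_from <= valid_at and (f.valid_to is None or valid_at < f.valid_to)
    ]
    # The latest recording wins, so later corrections supersede earlier beliefs.
    return max(candidates, key=lambda f: f.recorded_at).value if candidates else None


# Original fact, then a correction learned on 2026-04-10: the account changed on 2026-03-01.
LOG.append(Fact("SUP-ACME", "bank_account", "IBAN-OLD", date(2025, 1, 1), None, date(2025, 1, 2)))
LOG.append(Fact("SUP-ACME", "bank_account", "IBAN-OLD", date(2025, 1, 1), date(2026, 3, 1), date(2026, 4, 10)))
LOG.append(Fact("SUP-ACME", "bank_account", "IBAN-NEW", date(2026, 3, 1), None, date(2026, 4, 10)))
```

Asked about 15 March 2026 with knowledge as of 31 March, the store answers `IBAN-OLD`: what an auditor would have seen at quarter end. Asked again with knowledge as of 30 April, it answers `IBAN-NEW`. Both answers remain reproducible indefinitely.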

Where does the vector index live — inside Astral or separately?

Both options are supported. For text-heavy properties (contract clauses, supplier descriptions, line-item descriptions), Astral maintains an embedded vector index alongside the graph, queryable in the same call. For larger document corpora (full contract libraries, policy archives), Astral integrates with external vector databases such as pgvector, Qdrant and Pinecone, and stores cross-references as edges back to the graph. The hybrid retrieval pattern works in either configuration.

What is the operational cost of running a knowledge graph at enterprise scale?

It depends on graph size and query mix. Astral's tenants typically run 10M–500M nodes and 50M–2B edges. Storage at that scale is dominated by edge count; query cost is dominated by traversal depth and result-set size. Hot workloads (agent grounding, fraud monitoring) are served from in-memory graph caches; cold queries (audit replays) hit the persistent store. Most customers see Astral as a sub-percent line item of total Flowie spend. The relevant comparison is the cost of the 8-table SQL joins it replaces, which previously demanded a dedicated DBA team.

Can the knowledge graph replace my data warehouse?

No. The warehouse is still the right place for petabyte-scale analytical aggregations — period closes, cube refreshes, board reporting. The graph complements it for relationship-heavy queries and agent grounding. The two are connected: Astral subscribes to changes from the warehouse, and the warehouse can ingest Astral edges as fact tables for downstream BI. Treat them as siblings, not as competitors.

Next steps

If you are evaluating whether a knowledge graph belongs in your finance stack, start with How AI Agents Actually Work in Finance Operations to see what agents do with the graph as their grounding layer, and Multi-ERP Orchestration vs ERP Replacement for the entity-resolution case in multi-ERP environments. For the broader category framing, Agentic Orchestration for Finance & Procurement sets the context that knowledge graphs serve.

For the platform view, see Flowie Astral and Flowie AI Agents. To map a graph against your specific entity model, contact our team.

Footnotes

  1. Edge, D. et al. "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." Microsoft Research, April 2024. https://arxiv.org/abs/2404.16130

  2. Gartner, "Hype Cycle for Data Management, 2025." Knowledge graph technology positioned at "Slope of Enlightenment" / early "Plateau of Productivity."

  3. DAMA International, "DAMA-DMBOK: Data Management Body of Knowledge," 2nd Edition, on non-destructive master data management practices.

  4. ISO/IEC 39075:2024 — Information technology — Database languages — GQL. Published April 2024 as the first ISO standard for graph query languages. https://www.iso.org/standard/76120.html

  5. Association of Certified Fraud Examiners, "Report to the Nations 2024," on the estimated 5% of revenue lost to occupational fraud each year and on supplier-collusion patterns.

Sources

Reference sources cited in this guide

  1. https://www.iso.org/standard/76120.html
  2. https://www.gartner.com/en/documents/4022219
  3. https://neo4j.com/developer/graph-database/
  4. https://www.w3.org/TR/sparql11-query/
  5. https://arxiv.org/abs/2404.16130
  6. https://research.google/pubs/pub45634/
  7. https://www.dama.org/cpages/body-of-knowledge

Want to discuss this with our team? Talk to Flowie at get-flowie.com.