Redditor Details Law Firm's $35,000 Investment in Self-Hosted LLM
Discussion of setup and costs offers insights into "private" legal AI tools
Attorneys and IP professionals who have read this blog’s posts about AI use are likely aware of concerns about privacy, confidentiality, and more. However, given what CEOs across industries are saying about AI reshaping the workforce, it would be unreasonable to proverbially throw the AI baby out with the data-collection bathwater. Other potential solutions need to be examined.
As law firms and IP-owning companies increasingly explore AI applications, the handling of sensitive legal data necessitates solutions that go beyond simply using third-party APIs to access powerful language models like those offered by OpenAI and Google. A recent online discussion on Reddit sheds light on one approach to this challenge: building a fully private, self-hosted AI setup. (NB: Be careful on Reddit, as it can get scary and/or inappropriate quickly.)
A Redditor, identified as eeko_systems, recently shared their experience closing a $35,000 deal with a mid-sized law firm to build and implement a private AI system. The core motivation for the firm was the need for control, privacy, and automation, without hiring an internal AI team.
The system was designed to function as a "full blown internal system," aiming to be their own "GPT4-tier legal analyst" capable of processing internal case law, filings, and contracts, answering complex questions, and summarizing documents. A key requirement was zero exposure to third-party LLM providers like OpenAI or Anthropic.
A New Technical Architecture for Privacy and Control?
The technical stack chosen for this private AI solution prioritizes self-hosting and control. According to eeko_systems, the setup includes:
LLaMA 3 70B: A large language model chosen because it can be self-hosted and, when deployed properly, is suitable for professional use cases like law. It is used in a quantized and accelerated form served via vLLM. Eeko_systems specifically chose LLaMA over models requiring third-party APIs, such as Anthropic's, because it allows complete control and data privacy and avoids vendor lock-in, which is crucial for handling sensitive legal information.
Private Hosting: The model is hosted privately on CoreWeave using dual A100 GPUs. This provides an isolated, locked-down instance in a secure data center, ensuring the firm's stack and data remain under their control without shared environments or third-party APIs. The estimated monthly cost for this GPU hosting is approximately $1,200. This figure reflects per-GPU A100 pricing rather than the cost of a full node, which is significantly higher, and it can be further optimized with reserved instances or smart scheduling.
ChromaDB: Utilized as the vector store to manage the embedding and retrieval of documents.
LlamaIndex: Powers the Retrieval Augmented Generation (RAG) pipeline, enabling real-time question answering over the firm's case files. Eeko_systems explains that LlamaIndex facilitates document chunking, embedding, and querying over ChromaDB, connecting the private LLaMA model to the firm's files to return accurate, context-aware answers. Embedding models like text-embedding-3-small or BAAI/bge-small-en are used to convert document chunks into vectors.
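To make the chunking step concrete, here is a minimal, illustrative sketch in pure Python. This is not eeko_systems' code; real pipelines would use a token-aware splitter such as those bundled with LlamaIndex, and the chunk size and overlap values below are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks for embedding.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries. Illustrative only; production RAG pipelines typically
    split on sentence/token boundaries, not raw characters.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each resulting chunk would then be converted to a vector by the embedding model and stored in ChromaDB for retrieval.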
n8n: Acts as the automation "glue" for eeko_systems’ entire system. Instead of building a traditional backend, n8n provides flexibility for automating tasks. This includes monitoring a shared Google Drive folder for new documents, automatically converting, chunking, and embedding documents into ChromaDB, initiating summary jobs with the LLM, routing results via Slack or email, and handling incoming staff questions.
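The "watch a folder, process new files" pattern that n8n handles in this setup can be sketched in a few lines of standard-library Python. This is a local stand-in for illustration only; the described system uses n8n's Google Drive trigger, and the `.pdf` filter here is an assumption.

```python
from pathlib import Path

def find_new_documents(folder: Path, seen: set[str]) -> list[Path]:
    """Return documents in `folder` not yet processed, marking them as seen.

    Mirrors the intake step n8n performs against a shared Drive folder:
    each new file would then be converted, chunked, and embedded.
    """
    new_files = []
    for path in sorted(folder.glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

Polling a folder like this (or reacting to a trigger, as n8n does) is what lets paralegals simply drop documents in and get summaries back minutes later.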
Eeko_systems notes that n8n is underrated for LLM-based workflow automation and makes the project modular, flexible, and fast. The system is designed such that paralegals can simply drop new documents into a folder and gain access to summaries and search capabilities within minutes. The firm's staff can also edit or extend the workflows in n8n themselves.
Streamlit: Reportedly provides a simple web UI for staff interaction with the model, allowing them to chat, ask questions, and get summaries instantly.
Security: Eeko_systems’ entire setup is wrapped in a secure configuration featuring JWT authentication, IP access controls, and full audit logging. Logging is performed for compliance, reporting, and future audits.
This comprehensive setup, costing $35,000 initially with a projected monthly hosting cost of ~$1,200, is viewed by the law firm as an investment that could “pay itself back in one quarter,” based on the expectation of saving dozens of hours per week.
According to commenters, the biggest time-saver is anticipated to be document review, where tasks previously taking hours might now be completed in minutes by asking the AI specific questions and receiving answers with source references. Presumably this would include staff work (e.g., non-billable) to process and prepare for practitioner review.
The Privacy vs. Cloud Performance Calculus
According to the Reddit post, the decision to build a private, self-hosted system highlights a crucial trade-off for firms handling sensitive data: the desire for privacy and control versus the potential performance and ease of use offered by leading cloud-based LLMs.
Eeko_systems elaborates that, for law firms, the inability to upload client data to public LLM services due to privacy and compliance concerns is a significant driver for seeking private solutions. Strict legal data policies often prohibit sending litigation data via APIs to companies with potentially loose terms of service regarding data usage, risking violations of secrecy laws and regulations.
While some commenters note that cloud providers like Azure and Google offer enterprise agreements and data residency options claimed to be GDPR-ready and potentially more secure than local networks, with contractual backing, the perceived advantage of a fully self-hosted setup is complete data sovereignty and control over processing.
The core principle for eeko_systems' client appears to be ensuring data is not fed back to the LLM provider.
Addressing Hallucinations with RAG
A significant challenge in deploying LLMs for high-stakes professional use cases like law is the risk of hallucinations – generating incorrect or fabricated information.
A commenter with experience in the legal field expressed strong skepticism about LLaMA 3 70B's accuracy for legal queries, citing an "awful rate of errors" even when used with a strong RAG pipeline. This commenter highlighted instances where lawyers have been fined for submitting briefs containing hallucinated case law, underscoring the legal field's low tolerance for mistakes. Attorneys and IP professionals should be aware of this growing problem.
Eeko_systems' technical design relies heavily on the RAG pipeline powered by LlamaIndex and ChromaDB to mitigate this risk. The RAG approach grounds the model's responses in the specific documents provided by the firm, aiming to produce accurate, context-aware answers with source references. The process involves breaking down documents into chunks, embedding them, and retrieving the most relevant chunks based on a user's query to provide context to the LLM.
Eeko_systems mentions tuning ChromaDB heavily for performance and potentially migrating to other vector databases like Weaviate or Qdrant if necessary. However, some commenters still raise concerns about how RAG handles large, complex, or conflicting documents, as well as the potential for quality degradation from using quantized models.
Insights from Others
The discussion thread accompanying the original post reveals that other practitioners are pursuing similar solutions and offering valuable insights.
One commenter, Low-Air-8542, describes a different approach using MacBook M4 Max machines as scalable, local processing units. This setup leverages Docker, Open WebUI, n8n for agents, Ollama & LM Studio, Postgres, and Qdrant. Also, when the system reaches obsolescence in a few years, the MacBooks can still be used as workstations.
A key feature of Low-Air-8542's setup is an agent orchestrator that evaluates the complexity of a task and determines whether to process it locally or send it to a cloud-based AI (like OpenAI, Claude, or Gemini) after sensitive data has been stripped and replaced with dummy information. This "search and replace" method aims to leverage the superior results of cloud models while protecting sensitive data. Low-Air-8542 notes that even large local models can yield weaker results compared to cloud-based ones.
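Low-Air-8542's "search and replace" idea can be sketched as a simple redaction layer. The patterns and placeholder format below are hypothetical illustrations, not the commenter's implementation; a production redactor would combine named-entity recognition with firm-specific dictionaries rather than relying on regexes alone.

```python
import re

# Hypothetical patterns for illustration; real deployments would cover
# names, matter numbers, addresses, and other firm-specific identifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace sensitive values with placeholder tokens before a prompt
    is sent to a cloud model; return the mapping so answers can be
    restored locally."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            token = f"[{label}_{i}]"
            mapping[token] = match
            text = text.replace(match, token)
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Swap the placeholders in a cloud model's answer back to the
    original values, entirely on the local side."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

The sensitive values never leave the local machine; only the placeholder-laden text goes to the cloud, and the mapping stays behind to rehydrate the response.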
Other commenters are building similar AI solutions in different industries, such as real estate and healthcare (specifically HIPAA-compliant versions for dental chains), indicating a growing market for specialized, private AI applications. The focus is often on automating time-consuming tasks like financial analysis, drafting documents, or handling maintenance requests.
The complexity of these custom setups also introduces challenges. Commenters highlighted the necessity of mastering multiple disciplines to build such a system. Maintenance is perceived as a significant pain point and an ongoing requirement. Cybersecurity, including potential risks like prompt injection and the need for security monitoring agents, is also mentioned as a crucial but often overlooked aspect.
The open question of liability in the case of errors or data breaches is a serious concern for practitioners building these systems. Furthermore, reliance on a single individual or small team for maintenance and support raises questions about business continuity if something happens to the builder.
The Debate Over Pricing
The $35,000 price tag for the setup sparked considerable discussion among commenters, with many arguing that it was significantly underpriced. Commenters felt the price did not fully reflect the complexity of the build, the time investment required (although eeko_systems stated it would take less than two weeks with one freelancer), or the substantial value delivered to the law firm by saving dozens of hours per week.
Some suggested that the value-based pricing, considering the significant time savings, would justify a much higher cost. Eeko_systems acknowledged that the price might have been too low, particularly given the involvement of a freelancer, but viewed the deal as a valuable case study for securing higher-paying projects in the future.
Many commenters advised adding a recurring maintenance or support fee to ensure long-term revenue and cover potential issues.
Lingering Questions Regarding "Full Privacy"
Despite the emphasis on "fully private" and "self-hosted," several commenters raised valid questions about the setup's absolute privacy, particularly concerning the use of Google Drive for document intake and hosting on CoreWeave. They questioned how data flowing through or residing on third-party platforms could be considered truly private and compliant with strict legal requirements.
Eeko_systems clarified that Google Drive serves only as a secure intake method. Files are pulled from Drive, processed by the internal system, and then cleared, with nothing being stored long-term on Drive. Only the client's team and system have access.
The hosted GPUs on CoreWeave are described as isolated instances, not typical SaaS, ensuring the client's data and rules are maintained. The primary goal is preventing data from being used by third-party LLM providers. However, skepticism remains among some commenters regarding the inherent privacy of using services like Google Drive, questioning whether Google might still access the data.
The security of the overall stack and the potential for data exfiltration, even from a local setup, are also highlighted as critical considerations. Of course, nothing is perfectly secure, so choosing any AI tool—local or in the cloud—likely depends on a degree of risk tolerance and industry norms.
The Dynamic Future of Private Legal AI
The rapid pace of AI development suggests that current private LLM solutions may have a limited lifespan before requiring significant updates or replacement. One commentor estimates that such systems might become obsolete in 1-2 years. The need for robust maintenance and ongoing development seems obvious. While the modular and containerized design chosen by eeko_systems is intended to facilitate maintenance and upgrades, the client reportedly declined optional ongoing support packages.
The market for private AI for professionals, including law, finance, and healthcare, is seen as real and growing. Practitioners are exploring productizing these solutions. However, the potential entry of major cloud providers and AI companies offering more robust or easier-to-deploy private instances in the future could impact the landscape for custom builders. Again, trust will likely be the deciding factor.
Conclusion
The case of this $35,000 private AI deal for a law firm illustrates the increasing demand for AI solutions that prioritize data privacy and control in sensitive industries.
The Reddit discussion highlights the complexities involved and the ongoing evolution of private AI solutions for legal practitioners and IP owners—no one seems to agree!
While building a self-hosted system with open-source models addresses the need to avoid sharing data with large third-party LLM providers, it introduces challenges related to accuracy, maintenance, cybersecurity, and cost.
Personally, this blogger is testing and customizing his own system on an older Dell laptop with 32 GB RAM and a dated 8 GB graphics card using Docker, ngrok, Ollama, Open WebUI, and n8n. It is private and fun to test and tweak, but its outputs are not consistent or robust enough to replace Gemini and ChatGPT.
These “private” systems, while offering significant potential for efficiency gains over manual workflows, require careful consideration of their limitations, security implications, and the need for ongoing management in a rapidly changing technological environment.
Disclaimer: This is provided for informational purposes only and does not constitute legal or financial advice. To the extent there are any opinions in this article, they are the author’s alone and do not represent the beliefs of his firm or clients. The strategies expressed are purely speculation based on publicly available information. The information expressed is subject to change at any time and should be checked for completeness, accuracy and current applicability. For advice, consult a suitably licensed attorney and/or patent professional.