This is Tau!
Tau is inspired by Pi.AI and if you havent tried Pi yet, I strongly encourage you to try.
Like Pi, Tau's conversation is on continual conversation, unlike Chat based bots which feature many conversations and threads.
This is by design - Tau has a single conversation, like speaking to a human.
This is reflected by consulting Tau in decisions made along development: Order of features, voice type, etc.
Tau is a personal fun project.
I opened it as an open source for anyone to experiment with (fork), or just follow. (A star is appreciated!)
If you fork - delete history and facts to reset their knowledge and embark the journey anew!
- System Prompt: Speech-actions speak conversation structure.
- Conversation loop: A continueous conversation with ongoing context.
- Immediate memory: Reduce context by summarizing it to key points. Inject memory to System prompt.
- Long term memory: Save the running memory to vector database.
- Speech: Voice based conversation with hearing and speaking. (Whisper and OpenAI TTS)
- Vision infra: Set up Hailo-8L as an internal vision webservice.
- Setup Hailo-8L on Raspberry Pi, validate examples work.
- Look for best practices and options for integrating Hailo in your application.
- Find a suitable, working architecture to wrap hailo as a service
- Implement and improve the wrapper
- Pending Hailo review (update, will be integrates as community-examples, confirmed by Hailo)
- Integrate in the system, allow Tau to recognize faces
- add more-than-one models to be used serially, or use different devices (Coral, Sony AI Camera x2, Jetson)
- Long term fetching: Pull from long term memory into context.
- Auto-start on device startup.
- Long term memory archiving support.
- Entity based memory: Add GraphRAG based memory.
- Learn about GraphRAG, how to implement, etc.
- Use or implement GraphRAG
- Design further split to applications, event communications
- Setup Nvidia Jetson Orin Nano Super 8GB
- Local LLM on Jetson (Llama 3.2 3:b)
- Local Speech to text (faster-whisper) on Jetson
- WebRT VAD
- Silero VAD
- Implement Text to speech
- piperTTS
- Write a setup guide for Nvidia Jetson Orin Nano Super 8GB
- Build every component as a single event-based app
- Communication infra with websocket or unix domain socket (Global)
- Configuration infra, local configuration per device (Global)
- Detect main component, connects the secondary device to main device (Global)
- LLM as a service (Jetson)
- Speech detection as a service (Jetson)
- Speech as a service (Jetson)
- Add kokoroTTS support
- Memory as a service (Jetson)
- Vision as a service (Raspberry Pi)
- Face as a service (Raspberry Pi)
- Main loop (Jetson)
- Integrate Nvidia Jetson Orin Nano Super 8GB
- Integrate Hailo 10 as inference station (Llama 3.2 3b)
- Advanced voice: Move to ElevenLabs advanced voices.
- Tool use
- Add frameqork for actions:
- Open live camera feed action
- Snap a picture
- Add aec for voice recognition from https://gist.github.com/thewh1teagle/929af1c6b05d5f96ceef01130e758471
- Introspection: Add Introspection agent for active and background thinking and processing.
- Growth: Add nightly finetuning, move to smaller model.
Tau should be able to run on any linux with internet, but was tested only on a raspberry pi 5 8GB with official OS 64bit.
Raspberry AI Kit is needed for vision (Can be disabled in code - configuration support per request/in future)
All needed keys are in .env_sample.
Copy it to .env and add your keys.
Currently, the main key is OpenAI (Chat, Speech, Whisper), and VoyageAI + Pinecone is for vectordb
I plan on moving back to Anthropic (3.5 sonnet only)
Groq was used for a fast understand action usecase
- Cloning Git repositories 1.1. Clone this repository to your Raspberry Pi:
git clone https://github.com/OriNachum/autonomous-intelligence.git
1.2. Clone this repository to your Raspberry Pi:
git clone https://github.com/OriNachum/hailo-rpi5-examples.git
I have a pending PR to integrate this to main repo.
https://github.com/hailo-ai/hailo-rpi5-examples/pull/50
If you do, set up the your machine for Hailo-8L chip per Hailo's instructions.
- Copy .env_sample to .env and add all keys:
- ANTHROPIC_API_KEY: used for Claude based text completion and vision. Currently unused.
- OPENAI_API_KEY: Used for Speech, Whisper, vision and text.
- GROQ_API_KEY: Used for a super quick action understanding, May be replaced with embeddings.
- VOYAGE_API_KEY: VoyageAI is recommended by Anthropic. They offer the best embeddings to date (of when I selected it), and offer a great option for innovators.
- PINECONE_API_KEY: API Key of pinecone. Serverless is a great option.
- PINECONE_DIMENSION: Dimension of the embeddings generated by Voyage. Used for the setup of Pinecone
- PINECONE_INDEX_NAME: Name of the index in Pinecone, for memory
There are five programs to run by this order:
- hailo-rpi5-examples:
- basic-pipelines/detection_service.py: This runs the camera and emits events on changes on detection
- autonomous-intelligence
- services/face_service.py: this starts the face app, and reacts when speech occurs
- tau.py: this is the main LLM conversation loop
- tau_speech.py: this consumes speech events, and produces actual speech
- services/microphone_listener.py this listens to your speech and emits events to tau.py as input
There are multiple people for which I want to acknowledge for this development.
Of them, these are the people who confirmed for me to mention them:
- @Sagigamil
This project is licensed under the MIT License.