Model Gallery

Discover and install AI models from our curated collection

13 models available
1 repository

latitudegames_wayfarer-large-70b-llama-3.3
We’ve heard over and over from AI Dungeon players that modern AI models are too nice, never letting them fail or die. While it may be good for a chatbot to be nice and helpful, great stories and games aren’t all rainbows and unicorns. They have conflict, tension, and even death. These create real stakes and consequences for characters and the journeys they go on. Similarly, great games need opposition. You must be able to fail, die, and may even have to start over. This makes games more fun! However, the vast majority of AI models, through alignment RLHF, have been trained away from darkness, violence, or conflict, preventing them from fulfilling this role.

To give our players better options, we decided to train our own model to fix these issues. The Wayfarer model series is a set of adventure role-play models specifically trained to give players a challenging and dangerous experience. We wanted to contribute back to the open-source community that we’ve benefitted so much from, so we open-sourced a 12B parameter version back in January. We thought people would love it, but they were even more excited than we expected. By popular request, we decided to train this larger 70B version based on Llama 3.3.

Repository: localai | License: llama3.3

calme-2.2-qwen2.5-72b-i1
This model is a fine-tuned version of the powerful Qwen/Qwen2.5-72B-Instruct, pushing the boundaries of natural language understanding and generation even further. My goal was to create a versatile and robust model that excels across a wide range of benchmarks and real-world applications.

Use Cases: This model is suitable for a wide range of applications, including but not limited to:
- Advanced question-answering systems
- Intelligent chatbots and virtual assistants
- Content generation and summarization
- Code generation and analysis
- Complex problem-solving and decision support

Repository: localai | License: apache-2.0

doctoraifinetune-3.1-8b-i1
This is a fine-tuned version of the Meta-Llama-3.1-8B-bnb-4bit model, specifically adapted for the medical field. It has been trained using a dataset that provides extensive information on diseases, symptoms, and treatments, making it ideal for AI-powered healthcare tools such as medical chatbots, virtual assistants, and diagnostic support systems.

Key Features:
- Disease Diagnosis: Accurately identifies diseases based on symptoms provided by the user.
- Symptom Analysis: Breaks down and interprets symptoms to provide a comprehensive medical overview.
- Treatment Recommendations: Suggests treatments and remedies according to medical conditions.

Dataset: The model is fine-tuned on 2,000 rows of a 272k-row dataset that includes rich information about diseases, symptoms, and their corresponding treatments. The model is continuously being updated and will be further trained on the remaining data in future releases to improve accuracy and capabilities.
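
A minimal sketch of a diagnostic-support prompt via the Hugging Face `pipeline` API. The model path below is a placeholder, not taken from this gallery; substitute the id or local path of the checkpoint you actually installed:

```python
from transformers import pipeline

# Hypothetical local path; point this at the installed
# doctoraifinetune-3.1-8b checkpoint. Output is decision support only,
# not a substitute for professional medical advice.
generator = pipeline(
    "text-generation",
    model="./doctoraifinetune-3.1-8b",
    device_map="auto",
)
messages = [{"role": "user", "content": (
    "Patient reports fever, dry cough, and fatigue for three days. "
    "Which conditions should be considered, and what follow-up questions would help?"
)}]
result = generator(messages, max_new_tokens=256, return_full_text=False)
print(result[0]["generated_text"])
```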

Repository: localai | License: llama3.1

calme-2.2-qwen2-72b
This model is a fine-tuned version of the powerful Qwen/Qwen2-72B-Instruct, pushing the boundaries of natural language understanding and generation even further. My goal was to create a versatile and robust model that excels across a wide range of benchmarks and real-world applications. The post-training process is identical to that of the calme-2.1-qwen2-72b model; however, some parameters differ, and it was trained for a longer period.

Use Cases: This model is suitable for a wide range of applications, including but not limited to:
- Advanced question-answering systems
- Intelligent chatbots and virtual assistants
- Content generation and summarization
- Code generation and analysis
- Complex problem-solving and decision support

Repository: localai | License: apache-2.0

chronos-gold-12b-1.0
Chronos Gold 12B 1.0 is a unique model that covers domains such as general chatbot functionality, roleplay, and storywriting. The model has been observed to write up to 2250 tokens in a single sequence. It was trained at a sequence length of 16384 (16k) and still nominally retains the 128k context length of Mistral-Nemo, though, as with regular Nemo, quality deteriorates at longer contexts according to the RULER test. It is therefore recommended to cap your sequence length at 16384, or you will experience performance degradation.

The base model is mistralai/Mistral-Nemo-Base-2407, which was heavily modified to produce a more coherent model, comparable to much larger models. Chronos Gold 12B-1.0 re-creates the uniqueness of the original Chronos with significantly enhanced prompt adherence (instruction following), coherence, and a modern dataset, and it supports the majority of "character card" formats in applications like SillyTavern. It went through an iterative and objective merge process, as my previous models did, and was further fine-tuned on a dataset curated for it. The specifics of that dataset will not be disclosed at this time due to dataset ownership.
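
Given that recommendation, a minimal llama-cpp-python sketch that caps the context window at 16384 rather than Nemo’s nominal 128k (the GGUF filename is a placeholder for whichever quantization you downloaded):

```python
from llama_cpp import Llama

# Hypothetical local GGUF path; the key setting is n_ctx=16384,
# per the sequence-length recommendation above.
llm = Llama(model_path="./chronos-gold-12b-1.0.Q4_K_M.gguf", n_ctx=16384)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Continue the story: the lantern guttered as the cellar door creaked open."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```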

Repository: localai | License: apache-2.0

wayfarer-12b
We’ve heard over and over from AI Dungeon players that modern AI models are too nice, never letting them fail or die. While it may be good for a chatbot to be nice and helpful, great stories and games aren’t all rainbows and unicorns. They have conflict, tension, and even death. These create real stakes and consequences for characters and the journeys they go on. Similarly, great games need opposition. You must be able to fail, die, and may even have to start over. This makes games more fun! However, the vast majority of AI models, through alignment RLHF, have been trained away from darkness, violence, or conflict, preventing them from fulfilling this role.

To give our players better options, we decided to train our own model to fix these issues. Wayfarer is an adventure role-play model specifically trained to give players a challenging and dangerous experience. We thought they would like it, but since releasing it on AI Dungeon, players have reacted even more positively than we expected. Because they loved it so much, we’ve decided to open-source the model so anyone can experience unforgivingly brutal AI adventures!

Anyone can download the model to run locally. Or if you want to easily try this model for free, you can do so at https://aidungeon.com. We plan to continue improving and open-sourcing similar models, so please share any and all feedback on how we can improve model behavior.

Repository: localai | License: apache-2.0

emo-2b
EMO-2B: Emotionally Intelligent Conversational AI

Overview: EMO-2B is a state-of-the-art conversational AI model with 2.5 billion parameters, designed to engage in emotionally resonant dialogue. Building upon the success of EMO-1.5B, this model has been further fine-tuned on an extensive corpus of emotional narratives, enabling it to perceive and respond to the emotional undertones of user inputs with exceptional empathy and emotional intelligence.

Key Features:
- Advanced Emotional Intelligence: With its increased capacity, EMO-2B demonstrates an even deeper understanding and generation of emotional language, allowing for more nuanced and contextually appropriate emotional responses.
- Enhanced Contextual Awareness: The model considers an even broader context within conversations, accounting for subtle emotional cues and providing emotionally resonant responses tailored to the specific situation.
- Empathetic and Supportive Dialogue: EMO-2B excels at active listening, validating emotions, offering compassionate advice, and providing emotional support, making it an ideal companion for users seeking empathy and understanding.
- Dynamic Persona Adaptation: The model can dynamically adapt its persona, communication style, and emotional responses to match the user's emotional state, ensuring a highly personalized and tailored conversational experience.

Use Cases: EMO-2B is well-suited for a variety of applications where emotional intelligence and empathetic communication are crucial, such as:
- Mental health support chatbots
- Emotional support companions
- Personalized coaching and motivation
- Narrative storytelling and interactive fiction
- Customer service and support (for emotionally sensitive contexts)

Limitations and Ethical Considerations: While EMO-2B is designed to provide emotionally intelligent and empathetic responses, it is important to note that it is an AI system and cannot replicate the depth and nuance of human emotional intelligence. Users should be aware that the model's responses, while emotionally supportive, should not be considered a substitute for professional mental health support or counseling. Additionally, as with any language model, EMO-2B may reflect biases present in its training data. Users should exercise caution and critical thinking when interacting with the model, and report any concerning or inappropriate responses.

Repository: localai | License: gemma

qwen-sea-lion-v4-32b-it-i1
**Model Name:** Qwen-SEA-LION-v4-32B-IT
**Base Model:** Qwen3-32B
**Type:** Instruction-tuned Large Language Model (LLM)
**Language Support:** 11 languages, including English, Mandarin, Burmese, Indonesian, Malay, Filipino, Tamil, Thai, Vietnamese, Khmer, and Lao
**Context Length:** 128,000 tokens
**Repository:** [aisingapore/Qwen-SEA-LION-v4-32B-IT](https://huggingface.co/aisingapore/Qwen-SEA-LION-v4-32B-IT)
**License:** [Qwen Terms of Service](https://qwen.ai/termsservice) / [Qwen Usage Policy](https://qwen.ai/usagepolicy)

**Overview:** Qwen-SEA-LION-v4-32B-IT is a high-performance, multilingual instruction-tuned LLM developed by AI Singapore, specifically optimized for Southeast Asia (SEA). Built on the Qwen3-32B foundation, it underwent continued pre-training on 100B tokens from the SEA-Pile v2 corpus and was further fine-tuned on ~8 million question-answer pairs to enhance instruction following and reasoning. Designed for real-world multilingual applications across the government, education, and business sectors in Southeast Asia, it delivers strong performance in dialogue, content generation, and cross-lingual tasks.

**Key Features:**
- Trained for 11 major SEA languages with high linguistic accuracy
- 128K token context for long-form content and complex reasoning
- Optimized for instruction following, multi-turn dialogue, and cultural relevance
- Available in full precision and quantized variants (4-bit/8-bit)
- Not safety-aligned; suitable for downstream safety fine-tuning

**Use Cases:**
- Multilingual chatbots and virtual assistants in SEA regions
- Cross-lingual content generation and translation
- Educational tools and public sector applications in Southeast Asia
- Research and development in low-resource language modeling

**Note:** This model is not safety-aligned. Use with caution and consider additional alignment measures for production deployment.

**Contact:** [sealion@aisingapore.org](mailto:sealion@aisingapore.org) for inquiries.
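
A minimal sketch of multilingual use via Hugging Face Transformers, assuming enough GPU memory (or one of the quantized variants) for the 32B weights; the example prompt is an arbitrary Malay-to-English request:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisingapore/Qwen-SEA-LION-v4-32B-IT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Cross-lingual request (Malay -> English), one of the 11 supported languages.
messages = [{"role": "user", "content": "Terjemahkan ke dalam bahasa Inggeris: Selamat pagi, apa khabar?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```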

Repository: localai | License: apache-2.0

simia-tau-sft-qwen3-8b
The **Simia-Tau-SFT-Qwen3-8B** is a fine-tuned version of the Qwen3-8B language model, developed by Simia-Agent and adapted for enhanced instruction-following capabilities. This model is optimized for dialogue and task-oriented interactions, making it highly effective for real-world applications requiring nuanced understanding and coherent responses.

The model is available in multiple quantized formats (GGUF), including Q4_K_S, Q5_K_M, Q8_0, and others, enabling efficient deployment across devices with varying computational resources. These quantized versions maintain strong performance while reducing memory footprint and inference latency.

While this repository hosts a quantized variant (specifically designed for GGUF-based inference via tools like llama.cpp), the original base model is **Qwen3-8B**, a large-scale open-source language model from Alibaba Cloud. The fine-tuning (SFT) process improves its alignment with human intent and enhances its ability to follow complex instructions.

> 🔍 **Note**: This is a quantized version; for the full-precision base model, refer to [Simia-Agent/Simia-Tau-SFT-Qwen3-8B](https://huggingface.co/Simia-Agent/Simia-Tau-SFT-Qwen3-8B) on Hugging Face.

**Use Case**: Ideal for chatbots, assistant systems, and interactive applications requiring strong reasoning, safety, and fluency.
**Model Size**: 8B parameters (quantized for efficiency).
**License**: See the original model's license (typically Apache 2.0 for Qwen series).

👉 Recommended for edge deployment with GGUF-compatible tools.
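
A minimal sketch of pulling one of the GGUF quantizations and running it with llama-cpp-python. The repo id and quant filename below are assumptions for illustration; use whichever files the hosting repository actually ships:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Hypothetical GGUF repo id and filename; check the actual repository for
# the available quantizations (Q4_K_S, Q5_K_M, Q8_0, ...).
path = hf_hub_download(
    repo_id="mradermacher/Simia-Tau-SFT-Qwen3-8B-GGUF",
    filename="Simia-Tau-SFT-Qwen3-8B.Q4_K_S.gguf",
)
llm = Llama(model_path=path, n_ctx=8192)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Plan a three-step checklist for migrating a small web app to HTTPS."}]
)
print(out["choices"][0]["message"]["content"])
```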

Repository: localai | License: apache-2.0

qwen3-vlto-32b-thinking
**Model Name:** Qwen3-VLTO-32B-Thinking
**Model Type:** Large Language Model (Text-Only)
**Base Model:** Qwen/Qwen3-VL-32B-Thinking (vanilla Qwen3-VL-32B with vision components removed)
**Architecture:** Transformer-based, 32-billion-parameter model optimized for reasoning and complex text generation.

### Description:
Qwen3-VLTO-32B-Thinking is a pure text-only variant of the Qwen3-VL-32B-Thinking model, stripped of its vision capabilities while preserving the full reasoning and language understanding power. It is derived by transferring the weights from the vision-language model into a text-only transformer architecture, maintaining the same high-quality behavior for tasks such as logical reasoning, code generation, and dialogue. This model is ideal for applications requiring deep linguistic reasoning and long-context understanding without image input. It retains the base model's advanced reasoning capabilities *in text form*, making it well suited for research, chatbots, and content generation.

### Key Features:
- ✅ 32B parameters, high reasoning capability
- ✅ No vision components; fully text-only
- ✅ Trained for complex thinking and step-by-step reasoning
- ✅ Compatible with Hugging Face Transformers and GGUF inference tools
- ✅ Available in multiple quantization levels (Q2_K to Q8_0) for efficient deployment

### Use Case:
Ideal for advanced text generation, logical inference, coding, and conversational AI where vision is not needed.

> 🔗 **Base Model**: [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking)
> 📦 **Quantized Versions**: Available via [mradermacher/Qwen3-VLTO-32B-Thinking-GGUF](https://huggingface.co/mradermacher/Qwen3-VLTO-32B-Thinking-GGUF)

---

*Note: The original model was created by Alibaba’s Qwen team. This variant was adapted by qingy2024 and quantized by mradermacher.*
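
Since this is a thinking-series model, output typically arrives as a reasoning block followed by the final answer. A minimal sketch, assuming the usual Qwen3 `<think>...</think>` convention (the model id is inferred from the adapter credit above and should be verified):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qingy2024/Qwen3-VLTO-32B-Thinking"  # assumed id; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Is 2**10 closer to 1000 or 1100? Think it through."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
text = tokenizer.decode(model.generate(inputs, max_new_tokens=1024)[0][inputs.shape[-1]:], skip_special_tokens=True)

# Qwen3 thinking models usually close the reasoning block with "</think>";
# keep only the final answer when that marker is present, else print it all.
_, _, answer = text.partition("</think>")
print((answer or text).strip())
```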

Repository: localai | License: apache-2.0

qwen3-nemotron-32b-rlbff-i1
**Model Name:** Qwen3-Nemotron-32B-RLBFF
**Base Model:** Qwen/Qwen3-32B
**Developer:** NVIDIA
**License:** NVIDIA Open Model License

**Description:**
Qwen3-Nemotron-32B-RLBFF is a high-performance, fine-tuned large language model built on the Qwen3-32B foundation. It is specifically optimized to generate high-quality, helpful responses in a default thinking mode through advanced reinforcement learning with binary flexible feedback (RLBFF). Trained on the HelpSteer3 dataset, this model excels at reasoning, planning, coding, and information-seeking tasks while maintaining strong safety and alignment with human preferences.

**Key Performance (as of Sep 2025):**
- **MT-Bench:** 9.50 (near GPT-4-Turbo level)
- **Arena Hard V2:** 55.6%
- **WildBench:** 70.33%

**Architecture & Efficiency:**
- 32 billion parameters, based on the Qwen3 Transformer architecture
- Designed for deployment on NVIDIA GPUs (Ampere, Hopper, Turing)
- Achieves performance comparable to DeepSeek R1 and O3-mini at less than 5% of the inference cost

**Use Case:** Ideal for applications requiring reliable, thoughtful, and safe responses, such as advanced chatbots, research assistants, and enterprise AI systems.

**Access & Usage:** Available on Hugging Face with support for Hugging Face Transformers and vLLM.

**Cite:** [Wang et al., 2025, RLBFF: Binary Flexible Feedback](https://arxiv.org/abs/2509.21319)

👉 *Note: The GGUF version (mradermacher/Qwen3-Nemotron-32B-RLBFF-i1-GGUF) is a user-quantized variant. The original model is available at nvidia/Qwen3-Nemotron-32B-RLBFF.*
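
Since vLLM support is listed, a minimal serving sketch; the GPU split and sampling values are placeholders, not recommendations from the model card:

```python
from vllm import LLM, SamplingParams

# Placeholder tensor_parallel_size; a 32B model typically needs
# multiple GPUs unless quantized.
llm = LLM(model="nvidia/Qwen3-Nemotron-32B-RLBFF", tensor_parallel_size=2)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)
messages = [{"role": "user", "content": "Outline a safe rollout plan for a database schema migration."}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```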

Repository: localai | License: apache-2.0

qwen3-4b-thinking-2507-gspo-easy
**Model Name:** Qwen3-4B-Thinking-2507-GSPO-Easy
**Base Model:** Qwen3-4B (by Alibaba Cloud)
**Fine-tuned With:** GRPO (Group Relative Policy Optimization)
**Framework:** Hugging Face TRL (Transformer Reinforcement Learning)
**License:** [MIT](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy/blob/main/LICENSE)

---

### 📌 Description:
A fine-tuned 4-billion-parameter version of **Qwen3-4B**, optimized for **step-by-step reasoning and complex problem-solving** using **GRPO**, a reinforcement learning method designed to enhance mathematical and logical reasoning in language models. This model excels at tasks requiring **structured thinking**, such as solving math problems, logical puzzles, and multi-step reasoning, making it ideal for applications in education, AI assistants, and reasoning benchmarks.

### 🔧 Key Features:
- Trained with **TRL 0.23.1** and **Transformers 4.57.1**
- Optimized for **high-quality reasoning output**
- Part of the **Qwen3-4B-Thinking** series, designed to simulate human-like thought processes
- Compatible with the Hugging Face `transformers` and `pipeline` APIs

### 📚 Use Case:
Perfect for applications demanding **deep reasoning**, such as:
- AI tutoring systems
- Advanced chatbots with explanation capabilities
- Automated problem-solving in STEM domains

### 📌 Quick Start (Python):
```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

> ✅ **Note**: This is the **original, non-quantized base model**. Quantized versions (e.g., GGUF) are available separately under the same repository for efficient inference on consumer hardware.

---

🔗 **Model Page:** [https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy](https://huggingface.co/leonMW/Qwen3-4B-Thinking-2507-GSPO-Easy)
📝 **Training Details & Visualizations:** [WandB Dashboard](https://wandb.ai/leonwenderoth-tu-darmstadt/huggingface/runs/t42skrc7)

---

*Fine-tuned using GRPO, a method proven to boost mathematical reasoning in open language models. Cite: Shao et al., 2024 (arXiv:2402.03300)*

Repository: localai | License: apache-2.0

grovemoe-base-i1
**GroveMoE-Base**
*Efficient, Sparse Mixture-of-Experts LLM with Adjugate Experts*

GroveMoE-Base is a 33-billion-parameter sparse Mixture-of-Experts (MoE) language model designed for high efficiency and strong performance. Unlike dense models, only 3.14–3.28 billion parameters are activated per token, drastically reducing computational cost while maintaining high capability.

**Key Features:**
- **Novel Architecture**: Uses *adjugate experts* to dynamically allocate computation, enabling shared processing and significant FLOP reduction.
- **Efficient Inference**: Achieves high throughput with low latency, ideal for deployment in resource-constrained environments.
- **Based on Qwen3-30B-A3B-Base**: Up-cycled through mid-training and supervised fine-tuning, preserving strong pre-trained knowledge while adding new capabilities.

**Use Cases:** Ideal for applications requiring efficient large-scale language understanding and generation, such as chatbots, content creation, and code generation, where speed and resource efficiency are critical. A generic routing sketch follows below.

**Paper:** [GroveMoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts](https://arxiv.org/abs/2508.07785)
**Model Hub:** [inclusionAI/GroveMoE-Base](https://huggingface.co/inclusionAI/GroveMoE-Base)
**GitHub:** [github.com/inclusionAI/GroveMoE](https://github.com/inclusionAI/GroveMoE)

*Note: The GGUF quantized versions (e.g., mradermacher/GroveMoE-Base-i1-GGUF) are community-quantized derivatives. The original model is the base model by inclusionAI.*
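
For intuition on the "few parameters active per token" claim, here is a generic top-k sparse-MoE routing sketch in PyTorch. It illustrates standard MoE gating only, not GroveMoE's specific adjugate-expert mechanism (see the paper for that):

```python
import torch
import torch.nn.functional as F

def topk_moe_routing(hidden: torch.Tensor, gate_weight: torch.Tensor, k: int = 2):
    """Route each token to its top-k experts (standard sparse-MoE gating).

    GroveMoE's adjugate experts add shared computation on top of routing
    like this; the sketch only shows why a 33B-parameter MoE can activate
    only ~3B parameters per token.
    """
    logits = hidden @ gate_weight                          # [tokens, n_experts]
    probs = F.softmax(logits, dim=-1)
    weights, experts = torch.topk(probs, k, dim=-1)        # keep k experts per token
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights
    return weights, experts

tokens = torch.randn(4, 512)   # 4 tokens, hidden size 512
gate = torch.randn(512, 64)    # router projection for 64 experts
w, idx = topk_moe_routing(tokens, gate)
print(idx)  # each token activates just 2 of the 64 expert MLPs
```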

Repository: localai | License: apache-2.0