Troubleshooting (v0.2.0)

Common issues and solutions when running the GibRAM server.

Server Issues

Server Won't Start

Port Already in Use

Symptom:

Error: listen tcp :6161: bind: address already in use

Cause: Another process is using port 6161.

Diagnosis:

# Check what's using the port
lsof -i :6161

# Or on Linux
netstat -tlnp | grep 6161
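
Where `lsof` or `netstat` is not installed, attempting to bind the port answers the same question. The following is a minimal Python sketch using only the standard library; the helper name `port_is_free` is illustrative and not part of GibRAM, and it only checks IPv4.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process already holds the port.
            return False
        return True
```

Calling `port_is_free(6161)` before starting the server tells you whether the bind is likely to succeed. It does not identify the owning process, so `lsof` is still needed for that.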

Solutions:

Stop the conflicting process:

# Stop the process using port 6161 (sends SIGTERM; use kill -9 only if it does not exit)
kill <PID>

Use different port:
```
gibram-server --addr :6162
```

Update SDK to match:

indexer = GibRAMIndexer(
    session_id="my-project",
    port=6162  # Match server port
)

Invalid Configuration

Symptom:

Error: Failed to load config: yaml: unmarshal errors

Cause: Syntax error in config.yaml.

Solutions:

Validate YAML syntax:

# Install yamllint
pip install yamllint

# Check syntax
yamllint config.yaml

Check indentation (YAML is whitespace-sensitive):

# ✗ Wrong (indented with a tab; YAML allows only spaces for indentation)
server:
	addr: ":6161"

# ✓ Correct (2 spaces)
server:
  addr: ":6161"
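
Tabs are hard to spot by eye, so a short script can point at the offending lines. This is a sketch using only the Python standard library; `find_tab_indented_lines` is an illustrative name, and it checks for tab indentation only, not for YAML validity in general (use `yamllint` for that).

```python
from typing import List


def find_tab_indented_lines(text: str) -> List[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is everything stripped from the left side.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


# Example with an inline string; in practice read config.yaml instead.
sample = 'server:\n\taddr: ":6161"\n'
print(find_tab_indented_lines(sample))
```

Tabs inside quoted values are not flagged, because only the indentation of each line is inspected.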

Use example config:
```
cp config.example.yaml config.yaml
```

Permission Denied

Symptom:

Error: failed to create data directory: permission denied

Cause: No write permission to data directory.

Solutions:

Check permissions:
```
ls -la ./data
```

Fix permissions:

# Create directory with correct permissions
mkdir -p ./data
chmod 755 ./data

# Or take ownership of the directory as the current user
sudo chown -R $USER:$USER ./data

Use different directory:
```
gibram-server --data ~/gibram-data
```

TLS Certificate Issues

Symptom:

Error: failed to configure TLS: tls: failed to find any PEM data

Cause: Certificate or key file is missing, empty, or not PEM-encoded.

Solutions:

Verify cert files exist:
```
ls -la /etc/gibram/certs/
```

Check file format (must be PEM):

openssl x509 -in server.crt -text -noout
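
The "failed to find any PEM data" message means the file contained no `-----BEGIN ...-----` block at all, which is typical of DER-encoded or empty files. If `openssl` is not at hand, the check can be approximated in Python. This is a rough sketch: `pem_block_types` is an illustrative helper, and it only verifies the PEM framing and base64 body, not that the certificate itself is valid.

```python
import base64
import re
from typing import List

PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\s*(.*?)\s*-----END \1-----",
    re.DOTALL,
)


def pem_block_types(data: str) -> List[str]:
    """Return the labels of well-formed PEM blocks, e.g. ['CERTIFICATE']."""
    labels = []
    for label, body in PEM_BLOCK.findall(data):
        compact = "".join(body.split())  # drop line breaks inside the body
        if not compact:
            continue
        try:
            base64.b64decode(compact, validate=True)
        except ValueError:
            continue  # body is not valid base64, so not a usable PEM block
        labels.append(label)
    return labels
```

An empty result for `server.crt` points to the wrong encoding or an empty file. A result such as `['PRIVATE KEY']` for the certificate path suggests the key and certificate paths are swapped.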

Generate new cert:

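# Self-signed cert for local use. Go TLS clients validate the SAN, not the CN,
# so include a subjectAltName (-addext requires OpenSSL 1.1.1 or newer).
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout server.key \
  -out server.crt \
  -days 365 \
  -subj "/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"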

Use auto-cert for development:
```
tls:
  auto_cert: true
```

Server Crashes or Hangs

Out of Memory

Symptom:

- Server killed by OS
- Logs: "CRITICAL: Memory usage 2048MB / 2048MB"
- Docker: Container restarted

Cause: Too much data in sessions without cleanup.

Diagnosis:

# Check memory usage
free -h

# Check server logs
tail -f /var/log/gibram/gibram.log

# Check Docker stats
docker stats gibram

Solutions:

Increase memory limit (Docker):

# docker-compose.yml
deploy:
  resources:
    limits:
      memory: 4G  # Increase from 2G

Enable session TTL (via protocol):

# Set sessions to expire after 1 hour
# (SDK support coming in future version)

Manual cleanup:

# Delete expired sessions via CLI
gibram-cli -h localhost:6161

gibram> LIST_SESSIONS
# Check session IDs

gibram> DELETE_SESSION <session_id>

Reduce data volume:
Index fewer documents
Use larger chunk_size (fewer chunks)
Delete unused sessions

Too Many Connections

Symptom:

Error: max sessions limit reached (10000)

Cause: The server has reached its session cap (10000), a limit that exists as DoS protection.

Solutions:

Check active sessions:
```
gibram-cli> LIST_SESSIONS
```

Clean up old sessions:

# Delete specific session
gibram-cli> DELETE_SESSION old-session-id

# Or restart the server (ephemeral mode only; all sessions are lost)

Increase limit (code change required):

// In pkg/engine/engine.go
const MaxSessions = 50000  // Increase from 10000

Client Connection Issues

Connection Refused

Symptom (Python):

ConnectionRefusedError: [Errno 111] Connection refused

Symptom (Go):

failed to connect: dial tcp 127.0.0.1:6161: connect: connection refused

Cause: Server not running or wrong host/port.

Solutions:

Check server is running:

# The [g] keeps grep from matching its own process
ps aux | grep '[g]ibram-server'

# Expected output:
# user  12345  0.0  0.1  gibram-server --insecure

Verify port:

# Server logs should show:
# INFO  GibRAM Protobuf Server listening on :6161

Test with CLI:

gibram-cli -h localhost:6161 -insecure true

gibram> PING
# Should return: PONG

Check firewall:

# Allow port 6161
sudo ufw allow 6161/tcp

Docker network:

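# If the SDK runs in a container and the server runs on the host, connect to host.docker.internal
# If both run in containers, attach them to the same network and connect by container name
docker network create gibram-network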
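
To separate network problems from SDK problems, a plain TCP connection attempt from the client machine is a useful first check. This is a minimal sketch with the Python standard library; `can_connect` is an illustrative helper, not part of the GibRAM SDK, and it only proves that something accepts connections on the port, not that it speaks the GibRAM protocol.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

If `can_connect("localhost", 6161)` returns False, the problem lies with the server process, port mapping, or firewall rather than with the SDK configuration.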

TLS Handshake Failed

Symptom:

tls: handshake failure: remote error: tls: bad certificate

Cause: Certificate validation failed.

Solutions:

Development: Use insecure mode:
```
gibram-server --insecure
```

# Client: no TLS config needed
indexer = GibRAMIndexer(session_id="test")

Skip cert verification (Go client):

config := client.DefaultPoolConfig()
config.TLSEnabled = true
config.TLSSkipVerify = true  // Dev only!

Production: Use valid certificate:
CA-signed certificate
Hostname matches the certificate's Subject Alternative Name (Go clients ignore the CN)
Certificate not expired

Authentication Failed

Symptom:

Error: unauthorized

Cause: Missing or invalid API key.

Solutions:

Verify server requires auth:

# Server logs should show:
# INFO  Authentication: enabled

Check API key matches config:

# config.yaml
auth:
  keys:
    - id: "app"
      key: "your-key-here"
      permissions: ["write"]

Use correct key in client (Go):

config := client.DefaultPoolConfig()
config.APIKey = "your-key-here"

Development: Disable auth:

gibram-server --insecure  # Disables both TLS and auth

Data Issues

Dimension Mismatch

Symptom:

Error: dimension mismatch: expected 1536, got 768

Cause: Server vector_dim ≠ embedding dimension.

Where it happens:

- When adding TextUnit with embedding
- When adding Entity with embedding
- When adding Community with embedding

Solutions:

Check server dimension:

gibram-cli> INFO
# Look for: VectorDim: 1536

Match SDK to server:

# If server uses 1536
indexer = GibRAMIndexer(
    embedding_dimensions=1536  # Default, matches server
)

Change server dimension (requires restart + re-index):
```
gibram-server --dim 768
```

Use compatible embedding model:

# Server: vector_dim = 1536
# SDK: Use OpenAI text-embedding-3-small (1536 dims)
indexer = GibRAMIndexer(
    embedding_model="text-embedding-3-small",
    embedding_dimensions=1536
)

⚠️ WARNING: Changing vector_dim requires re-indexing all data.
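
When embeddings come from your own code rather than the SDK defaults, checking the vector length before sending it turns the server-side error into an early, local one. This is a hedged sketch: `check_embedding_dim` is a hypothetical helper, and the expected dimension must be taken from your server (the `VectorDim` value reported by `INFO`).

```python
from typing import Sequence


def check_embedding_dim(embedding: Sequence[float], expected_dim: int) -> None:
    """Raise ValueError if the embedding length differs from the server's vector_dim."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"dimension mismatch: expected {expected_dim}, got {len(embedding)}"
        )


# Example: a 768-dimensional vector against a server configured for 1536.
try:
    check_embedding_dim([0.0] * 768, expected_dim=1536)
except ValueError as exc:
    print(exc)
```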

Session Not Found

Symptom:

Error: session not found

Cause: Session expired or never created.

Solutions:

Check session exists:
```
gibram-cli> LIST_SESSIONS
```

Session TTL expired:
Session auto-deleted after TTL
Re-index data in new session

Server restarted (ephemeral mode):
All sessions lost on restart
Re-index data

Wrong session ID:

# Check session_id spelling
indexer = GibRAMIndexer(session_id="my-project")  # Must match exactly

Duplicate Entity Error

Symptom:

Error: entity with title "EINSTEIN" already exists

Cause: Entity with same title already in session.

Behavior: SDK handles this automatically by:

1. Checking if entity exists by title
2. Reusing existing entity ID
3. Linking to text units

If you see this error:

- You're using low-level client directly
- Check for existing entity before adding
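
The check-before-add pattern can be sketched as follows. The `InMemoryEntities` class is a stand-in written for this example, not the GibRAM client; its `get_by_title` and `add` methods are assumptions, so substitute whatever lookup and insert calls your low-level client actually provides.

```python
from typing import Dict, Optional


class InMemoryEntities:
    """Hypothetical stand-in for a low-level client (NOT the GibRAM API)."""

    def __init__(self) -> None:
        self._ids_by_title: Dict[str, int] = {}

    def get_by_title(self, title: str) -> Optional[int]:
        return self._ids_by_title.get(title)

    def add(self, title: str) -> int:
        if title in self._ids_by_title:
            raise ValueError(f'entity with title "{title}" already exists')
        new_id = len(self._ids_by_title) + 1
        self._ids_by_title[title] = new_id
        return new_id


def get_or_create_entity(client: InMemoryEntities, title: str) -> int:
    """Reuse the existing entity ID for a title, or create the entity once."""
    existing = client.get_by_title(title)
    if existing is not None:
        return existing
    return client.add(title)
```

Calling `get_or_create_entity` twice with the same title returns the same ID instead of raising the duplicate error. With concurrent writers the lookup and insert can still race, so the duplicate error should also be handled.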

SDK Issues

OpenAI API Errors

Rate Limit

Symptom:

openai.RateLimitError: Rate limit exceeded

Solutions:

Reduce batch size:

stats = indexer.index_documents(
    documents,
    batch_size=5  # Slower but less likely to hit rate limit
)

Add delay between batches (custom implementation needed)
Upgrade OpenAI plan for higher rate limits
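
One possible shape for the custom delay is a small wrapper that feeds documents in batches, pauses between them, and backs off exponentially when a batch fails. This is a sketch under assumptions: `run_in_batches` is a hypothetical helper, and whether calling `index_documents` once per batch suits your workload is something to verify against the SDK.

```python
import time
from typing import Callable, List, Sequence, Tuple, Type


def run_in_batches(
    items: Sequence,
    process_batch: Callable[[List], None],
    batch_size: int = 5,
    delay_s: float = 1.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> None:
    """Process items in batches, pausing between batches and retrying failures."""
    for start in range(0, len(items), batch_size):
        batch = list(items[start:start + batch_size])
        for attempt in range(max_retries + 1):
            try:
                process_batch(batch)
                break
            except retry_on:
                if attempt == max_retries:
                    raise  # give up after the final retry
                sleep(delay_s * 2 ** attempt)  # exponential backoff
        if start + batch_size < len(items):
            sleep(delay_s)  # fixed pause before the next batch
```

With the SDK this might be called as `run_in_batches(documents, indexer.index_documents, retry_on=(openai.RateLimitError,))`, so that only rate-limit errors trigger a retry.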

Invalid API Key

Symptom:

openai.AuthenticationError: Incorrect API key provided

Solutions:

Check API key:
```
echo $OPENAI_API_KEY
```
Verify key is valid on OpenAI dashboard

Pass key explicitly:

indexer = GibRAMIndexer(
    llm_api_key="sk-...",
    session_id="test"
)

Quota Exceeded

Symptom:

openai.RateLimitError: You exceeded your current quota

Solutions:

Check quota on OpenAI billing dashboard
Add payment method or upgrade plan

Extraction Failed

Symptom:

Warning: Extraction failed for chunk: ...

Cause: LLM returned invalid JSON or timed out.

Impact: Chunk is skipped (indexing continues for other chunks).

Solutions:

Check OpenAI service status
Retry indexing (transient errors)
Check chunk content (very long/malformed text can cause issues)

Performance Issues

Slow Indexing

Symptom: Indexing takes > 10s per document.

Causes:

LLM API latency (main bottleneck)
Large documents → many chunks
Slow network to OpenAI API

Solutions:

Use faster model:

indexer = GibRAMIndexer(
    llm_model="gpt-4o-mini"  # Faster than gpt-4o
)

Increase chunk size (fewer LLM calls):

indexer = GibRAMIndexer(
    chunk_size=1024  # Larger chunks = fewer API calls
)
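
The effect of chunk size is easy to estimate. The sketch below assumes non-overlapping chunks and one LLM call per chunk, which may not match the SDK's chunker exactly; the 512 value is only a comparison point, not a documented default.

```python
import math


def estimated_chunks(total_tokens: int, chunk_size: int) -> int:
    """Estimate the chunk count for a document, assuming no chunk overlap."""
    return math.ceil(total_tokens / chunk_size)


# A 10,000-token document:
print(estimated_chunks(10_000, 512))   # 20 chunks
print(estimated_chunks(10_000, 1024))  # 10 chunks
```

Under these assumptions, doubling the chunk size roughly halves the number of extraction calls, at the cost of longer prompts per call.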

Disable community detection:

indexer = GibRAMIndexer(
    auto_detect_communities=False
)

Slow Queries

Symptom: Query takes > 1s.

Causes:

Large result set (high top_k)
Many entities in session
Deep graph traversal

Solutions:

Reduce top_k:

result = indexer.query("query", top_k=5)  # Instead of 50

Limit result types:

result = indexer.query(
    "query",
    include_entities=True,
    include_text_units=False,  # Skip if not needed
    include_communities=False
)

Docker Issues

Container Exits Immediately

Symptom:

docker ps
# gibram container not listed

Solutions:

Check logs:
```
docker logs gibram
```

Run interactively to see error:

docker run -it --rm \
  -p 6161:6161 \
  gibramio/gibram:latest

Common causes:
Invalid config mounted
Port conflict (6161 already used on host)
Memory limit too low

Can't Connect from Host

Symptom: Python SDK on host can't reach Docker container.

Solutions:

Check port mapping:

docker ps
# Should show: 0.0.0.0:6161->6161/tcp

Use correct host:

# From host machine
indexer = GibRAMIndexer(
    host="localhost",  # Or "127.0.0.1"
    port=6161
)

Check Docker network:

docker inspect gibram
# Look for "Ports" section

Getting Help

Gather Information

When reporting issues, include:

GibRAM version:
```
gibram-server --version
```

Server logs:

# Last 100 lines
tail -n 100 /var/log/gibram/gibram.log

# Or Docker logs
docker logs gibram --tail 100

Configuration (sanitized):
```
# config.yaml (remove sensitive keys)
```
Reproduction steps:
Minimal code example
Expected vs actual behavior
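
Sanitizing the configuration by hand is error-prone, so a small script can mask secret values before the file is pasted into an issue. This is a rough sketch: `redact_config` is an illustrative helper, and the list of field names it masks (`key`, `password`, `secret`, `token`) is an assumption, so review the output before sharing it.

```python
import re

# Matches lines such as `key: "abc"` or `- token: abc`; plural `keys:` is left alone.
SECRET_LINE = re.compile(
    r"^([ \t]*(?:-[ \t]*)?(?:key|password|secret|token)[ \t]*:[ \t]*)\S.*$",
    re.IGNORECASE | re.MULTILINE,
)


def redact_config(text: str) -> str:
    """Replace the values of secret-looking fields with a placeholder."""
    return SECRET_LINE.sub(r"\1<redacted>", text)


sample = 'auth:\n  keys:\n    - id: "app"\n      key: "your-key-here"\n'
print(redact_config(sample))
```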

Resources

GitHub Issues: github.com/gibram-io/gibram/issues
Documentation: This site
Examples: examples/ directory in repo

Next Steps

Configuration Basics - Prevent common issues
Python SDK Quickstart - SDK setup