Solvice Maps: Infrastructure and Deployment Guide

Infrastructure Overview

Solvice Maps runs on Google Cloud Platform (GCP) using a modern cloud-native architecture designed for high availability, scalability, and operational excellence. The infrastructure supports both real-time routing services and batch processing workloads with automatic scaling and comprehensive monitoring.

Cloud Architecture

Platform: Google Cloud Platform

Project Structure:
  • Primary Project: solver-285414
  • Primary Region: europe-west1 (Belgium)
  • Availability Zone: europe-west1-b
  • Secondary Regions: Available for multi-region deployment
Key GCP Services Used:
  • Compute: Google Compute Engine + Google Kubernetes Engine
  • Storage: Cloud Storage for large results and OSRM map data
  • Database: Cloud SQL (PostgreSQL) for request metadata
  • Messaging: Cloud Pub/Sub for event-driven processing
  • Networking: Global Load Balancer with Cloud CDN
  • Monitoring: Cloud Monitoring + Cloud Logging
  • Security: Cloud IAM + Secret Manager

Service Deployment Architecture

1. MapR Gateway Service (Primary API)

Deployment Platform: Google Cloud Run
  • Runtime: Java 17 with Quarkus, compiled ahead of time to a GraalVM native image (no JVM at runtime)
  • Container: Distroless base image for security
  • Scaling: 0-100 instances with request-based auto-scaling
  • Cold Start: < 100ms with native compilation
Resource Configuration:
resources:
  limits:
    cpu: "2"
    memory: "4Gi"
  requests:
    cpu: "0.5"
    memory: "1Gi"

concurrency: 100
timeout: 300s
Environment Variables:
# Database Connection
DATABASE_URL=postgresql://user:pass@host:5432/mapr_gateway
DB_MAX_POOL_SIZE=20

# External Service Endpoints
OSRM_SERVICE_URL=https://osrm-europe.solvice.io
TOMTOM_API_KEY=${TOMTOM_API_KEY}
GOOGLE_MAPS_API_KEY=${GOOGLE_MAPS_API_KEY}

# Pub/Sub Configuration
PUBSUB_PROJECT_ID=solver-285414
PUBSUB_TABLE_TOPIC=mapr-table-requests
PUBSUB_RESPONSE_TOPIC=mapr-table-responses

# Storage Configuration
STORAGE_BUCKET=mapr-gateway-results
STORAGE_SIGNED_URL_DURATION=3600

# Authentication
JWT_SECRET=${JWT_SECRET}
JWT_ISSUER=solvice-maps
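A service configured this way typically validates its environment at startup and fails fast on missing values. A minimal sketch (the `loadConfig`/`mustGet` helpers and the `GatewayConfig` shape are illustrative assumptions, not the gateway's actual code; only the variable names come from this guide):

```typescript
// Startup configuration loader: reject boot when required variables
// are absent, apply documented defaults otherwise.
interface GatewayConfig {
  databaseUrl: string;
  osrmServiceUrl: string;
  pubsubProjectId: string;
  signedUrlDurationSec: number;
}

function loadConfig(env: Record<string, string | undefined>): GatewayConfig {
  const mustGet = (key: string): string => {
    const value = env[key];
    if (!value) throw new Error(`Missing required environment variable: ${key}`);
    return value;
  };
  return {
    databaseUrl: mustGet('DATABASE_URL'),
    osrmServiceUrl: mustGet('OSRM_SERVICE_URL'),
    pubsubProjectId: mustGet('PUBSUB_PROJECT_ID'),
    // Defaults to one hour, matching STORAGE_SIGNED_URL_DURATION above
    signedUrlDurationSec: Number(env['STORAGE_SIGNED_URL_DURATION'] ?? '3600'),
  };
}
```

Failing at boot rather than on the first request keeps misconfigured revisions from ever receiving traffic on Cloud Run.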

2. OSRM Service (Routing Engine)

Deployment Platform: Google Kubernetes Engine (GKE)
  • Cluster: osrm-cluster (3 nodes, n1-highmem-2)
  • Node Pool: Container-Optimized OS with SSD persistent disks
  • Scaling: Horizontal Pod Autoscaler with custom metrics
Kubernetes Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodejs-mapr-europe-car
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nodejs-mapr-europe-car
  template:
    metadata:
      labels:
        app: nodejs-mapr-europe-car
    spec:
      initContainers:
      - name: map-downloader
        image: gcr.io/solver-285414/map-downloader:latest
        volumeMounts:
        - name: osrm-maps
          mountPath: /maps
        env:
        - name: MAPS_BUCKET
          value: "osrm-maps-europe"
        - name: MAP_REGION
          value: "europe"
      
      containers:
      - name: osrm-service
        image: gcr.io/solver-285414/nodejs-mapr:latest
        ports:
        - containerPort: 3000
        env:
        - name: OSRM_MAPS
          value: |
            [{
              "map": "europe",
              "vehicle": "car",
              "path": "/maps/europe-{{slice}}.osrm",
              "slices": [0,1,2,3,4,5,6,7,8,9,10,11,12],
              "mmap": true
            }]
        - name: PUBSUB_TABLE_SUBSCRIPTIONS
          value: |
            [{
              "id": "europe-car-subscription",
              "weight": 10,
              "maxMessages": 2
            }]
        volumeMounts:
        - name: osrm-maps
          mountPath: /maps
          readOnly: true
        resources:
          requests:
            memory: "8Gi"
            cpu: "2"
          limits:
            memory: "12Gi" 
            cpu: "4"
        livenessProbe:
          httpGet:
            path: /v1/health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /v1/health/ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
          
      volumes:
      - name: osrm-maps
        persistentVolumeClaim:
          claimName: osrm-maps-pvc
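The OSRM_MAPS entry above points each pod at a set of pre-sliced .osrm files via the {{slice}} placeholder. A sketch of how such a template could expand into per-shard paths (the helper is illustrative; the real service's parsing may differ):

```typescript
// Expand the {{slice}} placeholder in an OSRM_MAPS-style entry into one
// shard path per slice index. Illustrative helper, not the actual parser.
interface OsrmMapEntry {
  map: string;
  vehicle: string;
  path: string;      // e.g. "/maps/europe-{{slice}}.osrm"
  slices: number[];
  mmap: boolean;
}

function expandSlicePaths(entry: OsrmMapEntry): string[] {
  return entry.slices.map(slice => entry.path.replace('{{slice}}', String(slice)));
}
```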
Auto-Scaling Configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: osrm-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nodejs-mapr-europe-car
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: External
    external:
      metric:
        name: pubsub.googleapis.com/subscription/num_undelivered_messages
        selector:
          matchLabels:
            resource.labels.subscription_id: "europe-car-subscription"
      target:
        type: AverageValue
        averageValue: "30"
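For the External metric above, the HPA targets an average of 30 undelivered messages per pod: it computes ceil(total backlog / 30), then clamps the result between minReplicas and maxReplicas. As a worked sketch of that sizing rule:

```typescript
// HPA sizing for an External metric with an AverageValue target:
// desired = ceil(metricTotal / targetAverage), clamped to [min, max].
function desiredReplicas(
  undeliveredMessages: number, // current Pub/Sub backlog
  targetAveragePerPod: number, // averageValue ("30" above)
  minReplicas: number,
  maxReplicas: number
): number {
  const raw = Math.ceil(undeliveredMessages / targetAveragePerPod);
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

// e.g. a backlog of 150 messages yields ceil(150 / 30) = 5 replicas
```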

Infrastructure as Code (Terraform)

Terraform Configuration Structure

terraform/
├── backend.tf          # Remote state configuration
├── provider.tf         # GCP provider configuration
├── variables.tf        # Input variables
├── container.tf        # GCE + Container configuration
├── instance_template.tf # VM instance template
├── instance_group.tf   # Managed instance group
├── autoscaler.tf       # Auto-scaling configuration
├── loadbalancer.tf     # Global load balancer
├── health_check.tf     # Health check configuration
├── bucket.tf           # Cloud Storage buckets
└── pubsub.tf           # Pub/Sub topics and subscriptions

Key Terraform Resources

Instance Template:
resource "google_compute_instance_template" "osrm_template" {
  name_prefix  = "osrm-template-"
  description  = "Template for OSRM service instances"
  machine_type = var.machine_type # n1-highmem-2

  disk {
    source_image = "cos-cloud/cos-stable"
    disk_type    = "pd-ssd"
    disk_size_gb = 280
    auto_delete  = true
    boot         = true
  }

  disk {
    source_image = var.data_disk_image # Custom image with OSRM data
    disk_type    = "pd-ssd"
    disk_size_gb = 1400
    auto_delete  = false
    boot         = false
  }

  network_interface {
    network = "default"
    access_config {
      nat_ip = null # Ephemeral IP
    }
  }

  metadata = {
    "gce-container-declaration" = module.gce-container.metadata_value
    "google-logging-enabled"    = "true"
    "enable-guest-attributes"   = "TRUE"
  }

  service_account {
    email  = var.service_account_email
    scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }

  tags = ["http-server", "https-server"]

  lifecycle {
    create_before_destroy = true
  }
}
Global Load Balancer:
resource "google_compute_global_forwarding_rule" "default" {
  name       = "osrm-global-forwarding-rule"
  target     = google_compute_target_http_proxy.default.id
  port_range = "80"
  ip_address = google_compute_global_address.default.address
}

resource "google_compute_target_http_proxy" "default" {
  name    = "osrm-target-proxy"
  url_map = google_compute_url_map.default.id
}

resource "google_compute_url_map" "default" {
  name            = "osrm-url-map"
  default_service = google_compute_backend_service.default.id
}

resource "google_compute_backend_service" "default" {
  name                  = "osrm-backend-service"
  protocol              = "HTTP"
  timeout_sec           = 30
  enable_cdn            = true
  load_balancing_scheme = "EXTERNAL"

  backend {
    group           = google_compute_instance_group_manager.default.instance_group
    balancing_mode  = "UTILIZATION"
    max_utilization = 0.8
  }

  health_checks = [google_compute_health_check.default.id]
}
Pub/Sub Configuration:
# Dynamic topic creation from JSON configuration
locals {
  pubsub_config = jsondecode(var.pubsub_subscriptions_json)
}

resource "google_pubsub_topic" "table_topics" {
  for_each = { for sub in local.pubsub_config : sub.id => sub }
  name     = "mapr-table-${each.value.id}"
  
  message_retention_duration = "604800s" # 7 days
}

resource "google_pubsub_topic" "dead_letter_topics" {
  for_each = { for sub in local.pubsub_config : sub.id => sub }
  name     = "mapr-table-${each.value.id}-dead-letter"
}

resource "google_pubsub_subscription" "table_subscriptions" {
  for_each = { for sub in local.pubsub_config : sub.id => sub }
  name     = "mapr-table-${each.value.id}-subscription"
  topic    = google_pubsub_topic.table_topics[each.key].name

  ack_deadline_seconds       = 600
  message_retention_duration = "604800s"
  retain_acked_messages      = false

  retry_policy {
    minimum_backoff = "10s"
    maximum_backoff = "600s"
  }

  dead_letter_policy {
    dead_letter_topic     = google_pubsub_topic.dead_letter_topics[each.key].id
    max_delivery_attempts = 5
  }
}
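The retry_policy above bounds Pub/Sub's exponential backoff between 10s and 600s; after 5 failed delivery attempts the message moves to the dead-letter topic. A doubling schedule within those bounds looks like the following (the base-2 growth is an assumption for illustration; Pub/Sub only guarantees delays within the configured range):

```typescript
// Illustrative exponential backoff bounded by the retry_policy above.
// The doubling factor is an assumption; Pub/Sub only guarantees delays
// between minimum_backoff and maximum_backoff.
function backoffSeconds(attempt: number, minSec = 10, maxSec = 600): number {
  return Math.min(maxSec, minSec * 2 ** attempt);
}
```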

Deployment Processes

1. CI/CD Pipeline (GitLab CI)

Pipeline Structure:
stages:
  - validate
  - test
  - build
  - deploy-staging
  - integration-test
  - deploy-production

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

# Terraform Validation
validate:
  stage: validate
  image: hashicorp/terraform:1.9
  script:
    - cd terraform
    - terraform init -backend=false
    - terraform validate
    - terraform fmt -check

# Application Testing
test:
  stage: test
  image: node:22
  script:
    - cd osrm-service
    - npm ci
    - npm run test:unit
    - npm run test:integration

# Container Build
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

# Staging Deployment
deploy-staging:
  stage: deploy-staging
  image: google/cloud-sdk:alpine
  script:
    - gcloud auth activate-service-account --key-file $GOOGLE_APPLICATION_CREDENTIALS
    - gcloud config set project $GCP_PROJECT_ID
    - cd terraform
    - terraform init
    - terraform workspace select staging
    - terraform plan -var="image_tag=$CI_COMMIT_SHA"
    - terraform apply -auto-approve -var="image_tag=$CI_COMMIT_SHA"
  environment:
    name: staging
    url: https://staging-api.solvice.io

# Production Deployment (Manual)
deploy-production:
  stage: deploy-production
  image: google/cloud-sdk:alpine
  script:
    - gcloud auth activate-service-account --key-file $GOOGLE_APPLICATION_CREDENTIALS
    - gcloud config set project $GCP_PROJECT_ID
    - cd terraform
    - terraform init
    - terraform workspace select production
    - terraform plan -var="image_tag=$CI_COMMIT_SHA"
    - terraform apply -auto-approve -var="image_tag=$CI_COMMIT_SHA"
  environment:
    name: production
    url: https://routing.solvice.io
  when: manual
  only:
    - main

2. Zero-Downtime Deployment Strategy

Rolling Update Process:
  1. Health Check: Ensure all current instances are healthy
  2. New Instance Launch: Launch new instances with updated configuration
  3. Health Validation: Wait for new instances to pass health checks
  4. Traffic Migration: Gradually shift traffic to new instances
  5. Old Instance Termination: Terminate old instances after validation
  6. Rollback Plan: Automated rollback if health checks fail
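Steps 4-6 above can be sketched as a small decision function that advances through canary steps while the new instances stay healthy and otherwise rolls back; the traffic percentages are illustrative assumptions, not values from this guide:

```typescript
// Decide the next traffic split during a rolling update: advance through
// canary steps while new instances pass health checks, otherwise send
// all traffic back to the old instances. Step percentages are illustrative.
function nextTrafficSplit(
  currentNewPct: number,
  newInstancesHealthy: boolean,
  steps: number[] = [5, 25, 50, 100]
): { newPct: number; rollback: boolean } {
  if (!newInstancesHealthy) {
    return { newPct: 0, rollback: true }; // step 6: automated rollback
  }
  const next = steps.find(s => s > currentNewPct) ?? 100;
  return { newPct: next, rollback: false };
}
```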
Blue-Green Deployment for Critical Updates:
#!/bin/bash
# Blue-Green deployment script

# Deploy to blue environment
terraform workspace select blue
terraform apply -var="image_tag=$NEW_VERSION"

# Run health checks on blue; abort before shifting any traffic if they fail
./scripts/health-check.sh blue || exit 1

# Switch traffic to blue
gcloud compute url-maps set-default-service $URL_MAP \
  --default-service=$BLUE_BACKEND_SERVICE

# Monitor for 10 minutes
sleep 600

# Re-run the health checks after the bake period; checking $? directly
# after sleep would only test sleep's (always-zero) exit status
if ./scripts/health-check.sh blue; then
  # Cleanup green environment
  terraform workspace select green
  terraform destroy -auto-approve
  echo "Deployment successful"
else
  # Rollback to green
  gcloud compute url-maps set-default-service $URL_MAP \
    --default-service=$GREEN_BACKEND_SERVICE
  echo "Deployment failed, rolled back"
  exit 1
fi

3. Map Data Deployment

OSRM Map Update Process:
#!/bin/bash
# Map data update script

# Build new map data
./build-osrm-maps.sh $REGION $VERSION

# Create disk image
gcloud compute images create osrm-$REGION-$VERSION \
  --source-disk=osrm-build-disk \
  --source-disk-zone=europe-west1-b

# Update Terraform variable
export TF_VAR_data_disk_image="osrm-$REGION-$VERSION"

# Deploy with rolling update
terraform plan -var="data_disk_image=$TF_VAR_data_disk_image"
terraform apply -auto-approve

Monitoring and Alerting

1. Infrastructure Monitoring

Cloud Monitoring Metrics:
# Custom metric for OSRM request latency
- name: "osrm/request_duration_seconds"
  description: "OSRM request processing time"
  type: "histogram"
  labels: ["method", "status", "region"]

# Custom metric for queue depth
- name: "pubsub/queue_depth"
  description: "Number of undelivered messages"
  type: "gauge"
  labels: ["subscription", "topic"]

# Infrastructure metrics
- name: "compute/cpu_utilization"
- name: "compute/memory_utilization"
- name: "compute/disk_utilization"
Alerting Policies:
alertPolicy:
  displayName: "High Response Time"
  conditions:
    - displayName: "Response time > 100ms"
      conditionThreshold:
        threshold: 0.1
        comparison: COMPARISON_GREATER_THAN
        metric: "osrm/request_duration_seconds"
        aggregations:
          - alignmentPeriod: "300s"
            perSeriesAligner: ALIGN_PERCENTILE_99  # p99 of the latency distribution; ALIGN_RATE would measure request rate, not latency
  notificationChannels:
    - "projects/solver-285414/notificationChannels/slack-alerts"
    - "projects/solver-285414/notificationChannels/pager-duty"

2. Application-Level Monitoring

Health Check Endpoints:
// Comprehensive health checks
@Get('/health')
async getHealth(): Promise<HealthStatus> {
  const services = {
    database: await this.checkDatabase(),
    osrm: await this.checkOSRM(),
    pubsub: await this.checkPubSub(),
    storage: await this.checkStorage()
  };
  return {
    // Derive overall status from the dependency checks rather than
    // hardcoding 'healthy' (assumes each check reports a healthy flag)
    status: Object.values(services).every(s => s.healthy) ? 'healthy' : 'degraded',
    timestamp: new Date().toISOString(),
    services,
    metrics: {
      activeConnections: this.getActiveConnections(),
      queueDepth: await this.getQueueDepth(),
      memoryUsage: process.memoryUsage()
    }
  };
}
Performance Metrics:
// Custom metrics collection
@Histogram('request_duration_seconds', ['method', 'status'])
private requestDuration: Histogram;

@Counter('requests_total', ['method', 'status'])
private requestsTotal: Counter;

@Gauge('active_requests', [])
private activeRequests: Gauge;

Security Configuration

1. Network Security

VPC Configuration:
resource "google_compute_network" "solvice_vpc" {
  name                    = "solvice-maps-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "private_subnet" {
  name          = "private-subnet"
  ip_cidr_range = "10.0.1.0/24"
  region        = "europe-west1"
  network       = google_compute_network.solvice_vpc.id
  
  private_ip_google_access = true
}

resource "google_compute_firewall" "allow_internal" {
  name    = "allow-internal"
  network = google_compute_network.solvice_vpc.name

  allow {
    protocol = "tcp"
    ports    = ["80", "443", "3000"]
  }

  source_ranges = ["10.0.0.0/8"]
}
SSL/TLS Configuration:
resource "google_compute_managed_ssl_certificate" "default" {
  name = "solvice-maps-ssl-cert"

  managed {
    domains = [
      "routing.solvice.io",
      "api.solvice.io"
    ]
  }
}

resource "google_compute_target_https_proxy" "default" {
  name             = "solvice-https-proxy"
  url_map          = google_compute_url_map.default.id
  ssl_certificates = [google_compute_managed_ssl_certificate.default.id]
}

2. IAM and Access Control

Service Account Configuration:
resource "google_service_account" "osrm_service_account" {
  account_id   = "osrm-service"
  display_name = "OSRM Service Account"
  description  = "Service account for OSRM compute instances"
}

resource "google_project_iam_member" "osrm_storage_access" {
  project = var.project_id
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:${google_service_account.osrm_service_account.email}"
}

resource "google_project_iam_member" "osrm_pubsub_access" {
  project = var.project_id
  role    = "roles/pubsub.subscriber"
  member  = "serviceAccount:${google_service_account.osrm_service_account.email}"
}
Secret Management:
resource "google_secret_manager_secret" "api_keys" {
  secret_id = "external-api-keys"
  
  replication {
    user_managed {
      replicas {
        location = "europe-west1"
      }
    }
  }
}

resource "google_secret_manager_secret_version" "api_keys_version" {
  secret      = google_secret_manager_secret.api_keys.id
  secret_data = jsonencode({
    tomtom_api_key = var.tomtom_api_key
    google_maps_api_key = var.google_maps_api_key
  })
}

Disaster Recovery and Backup

1. Data Backup Strategy

Database Backups:
# Automated PostgreSQL backups
gcloud sql backups create \
  --instance=mapr-gateway-db \
  --description="Daily automated backup $(date +%Y-%m-%d)"

# Point-in-time recovery enabled
gcloud sql instances patch mapr-gateway-db \
  --backup-start-time=02:00 \
  --enable-bin-log
Configuration Backups:
# Terraform state backup
gsutil cp gs://terraform-state-bucket/terraform.tfstate \
  gs://disaster-recovery-bucket/terraform-$(date +%Y%m%d).tfstate

# Container images backup
gcloud container images list-tags gcr.io/solver-285414/nodejs-mapr \
  --limit=10 --format='get(digest)' | \
  xargs -I {} gcloud container images add-tag \
    gcr.io/solver-285414/nodejs-mapr@{} \
    gcr.io/backup-project/nodejs-mapr:backup-$(date +%Y%m%d)

2. Multi-Region Deployment

Regional Failover Configuration:
# Primary region: europe-west1
# Secondary region: us-central1

resource "google_compute_instance_group_manager" "osrm_primary" {
  name     = "osrm-primary"
  location = "europe-west1-b"
  # ... primary configuration
}

resource "google_compute_instance_group_manager" "osrm_secondary" {
  name     = "osrm-secondary" 
  location = "us-central1-b"
  # ... secondary configuration (standby)
}

resource "google_compute_health_check" "regional_failover" {
  name = "regional-failover-check"
  
  http_health_check {
    port         = 80
    request_path = "/health"
  }
  
  check_interval_sec  = 10
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 3
}
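The healthy_threshold and unhealthy_threshold values above mean a backend changes state only after consecutive probe results in the opposite direction, which suppresses flapping during brief blips. A sketch of that hysteresis:

```typescript
// Track health-check hysteresis: flip to unhealthy only after
// unhealthyThreshold consecutive failures, and back to healthy only
// after healthyThreshold consecutive successes (2 and 3 above).
class HealthState {
  private consecutive = 0;
  private lastProbeOk = true;
  public healthy = true;

  constructor(private healthyThreshold = 2, private unhealthyThreshold = 3) {}

  probe(ok: boolean): boolean {
    this.consecutive = ok === this.lastProbeOk ? this.consecutive + 1 : 1;
    this.lastProbeOk = ok;
    if (ok && !this.healthy && this.consecutive >= this.healthyThreshold) this.healthy = true;
    if (!ok && this.healthy && this.consecutive >= this.unhealthyThreshold) this.healthy = false;
    return this.healthy;
  }
}
```

With check_interval_sec = 10, a backend is marked unhealthy roughly 30 seconds after it starts failing, and recovers about 20 seconds after probes start succeeding again.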

Cost Optimization

1. Resource Optimization

Preemptible Instances:
resource "google_compute_instance_template" "preemptible_template" {
  name = "osrm-preemptible-template"
  
  scheduling {
    preemptible = true
    automatic_restart = false
    on_host_maintenance = "TERMINATE"
  }
  
  # Use preemptible instances for batch processing
  machine_type = "n1-highmem-2"
}
Auto-Scaling Configuration:
resource "google_compute_autoscaler" "osrm_autoscaler" {
  name   = "osrm-autoscaler"
  target = google_compute_instance_group_manager.default.id

  autoscaling_policy {
    max_replicas    = 10
    min_replicas    = 1  # Floor of one warm instance; set to 0 to allow true scale-to-zero
    cooldown_period = 300

    cpu_utilization {
      target = 0.7
    }

    scaling_schedules {
      name                  = "scale-down-nights"
      description           = "Scale down during off-hours"
      schedule              = "0 22 * * *"  # 10 PM
      time_zone             = "Europe/Brussels"
      min_required_replicas = 0
      duration_sec          = 28800  # 8 hours
    }
  }
}
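Note that a scaling schedule's min_required_replicas raises the floor while the schedule is active; it never lowers the autoscaler's own min_replicas. How the effective minimum combines, as a one-line sketch:

```typescript
// Effective minimum = max of the autoscaler's min_replicas and the
// min_required_replicas of every currently active scaling schedule.
function effectiveMinReplicas(minReplicas: number, activeScheduleMins: number[]): number {
  return Math.max(minReplicas, ...activeScheduleMins);
}
```

So with min_replicas = 1, a nightly schedule with min_required_replicas = 0 still leaves one instance running; genuine scale-to-zero requires min_replicas = 0.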
This infrastructure provides a robust, scalable, and cost-effective foundation for the Solvice Maps platform, with comprehensive monitoring, security, and disaster recovery capabilities.