Tokyo, Japan / Geospatial Big Data Engineer

I build geospatial data platforms that turn mobility signals into reliable decisions.

Big data engineer focused on GPS-scale pipelines, spatial analytics, and production data products for smart cities, mobility, and urban planning teams.

I design systems that ingest high-volume location data, validate it, enrich it with spatial context, and ship it as analytics-ready datasets, dashboards, maps, and APIs.

Python, Spark, Kafka, AWS S3, AWS Glue, ClickHouse, Kubernetes, H3, GeoPandas, ETL/ELT, REST APIs, OpenADR

[Image: Urban aerial map visualization with orange GPS movement arcs, blue road intensity corridors, and magenta footfall hotspots.]

8+ yrs: data / ML systems

GPS-scale: mobility analytics

AWS + Spark: cloud data platforms

Tokyo: Japan-based

About

I solve the messy middle between raw mobility data and decision-ready geospatial products.

My work sits at the intersection of distributed data engineering and geospatial analytics. I care about the parts that make data products dependable: ingestion contracts, partitioning strategy, spatial indexing, reproducible transformations, cost-aware cloud execution, and clear outputs that technical and non-technical teams can trust.

Systems thinking

I start with data contracts, lineage, failure modes, and operational cost instead of treating pipelines as scripts.

Spatial fluency

I work with coordinates, grids, joins, time windows, OD movement, road volume, and POI footfall as first-class data concerns.

Product mindset

I care about the final consumer: analysts, researchers, consultants, dashboards, APIs, and map-based workflows.

Case studies

Systems work framed by problem, solution, stack, and impact.

These are public-safe summaries of real production patterns: ingestion, spatial enrichment, serving layers, optimization, and operational workflows.

LocationMind / GPS-scale geospatial pipelines

Mobility Data Foundation

Raw GPS -> Validation -> Spatial enrichment -> Feature tables -> Analytics outputs

Problem

Raw mobility inputs are noisy, high-volume, and difficult to reuse across research, consulting, analytics, and product workflows.

Solution

Designed reproducible ingestion and transformation layers that validate, normalize, aggregate, enrich, and prepare location records for downstream analytics.

Impact

  • Reduced repeated preprocessing work by standardizing analytical datasets.
  • Improved confidence in outputs through validation, documentation, and repeatable transformations.
  • Created foundations for maps, dashboards, CSV exports, and model features.

Python, Spark, AWS S3, AWS Glue, GeoPandas, H3, Data quality
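
As a rough illustration of the validate-and-normalize step described above, here is a minimal sketch in plain Python. The field names (`device_id`, `lat`, `lon`, `epoch_s`) and the `GpsPoint` type are hypothetical, chosen only for the example; the production pipelines run on Spark and are not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class GpsPoint:
    device_id: str
    ts: datetime  # timezone-aware UTC after normalization
    lat: float
    lon: float


def normalize_record(raw: dict) -> Optional[GpsPoint]:
    """Return a cleaned GpsPoint, or None if the record fails validation."""
    try:
        device_id = str(raw["device_id"]).strip()
        lat = float(raw["lat"])
        lon = float(raw["lon"])
        ts = datetime.fromtimestamp(float(raw["epoch_s"]), tz=timezone.utc)
    except (KeyError, TypeError, ValueError, OverflowError, OSError):
        return None  # missing field or unparseable value
    if not device_id:
        return None
    # Reject coordinates outside the valid latitude/longitude range (NaN fails too).
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    # (0, 0) usually signals a missing fix rather than a real position.
    if lat == 0.0 and lon == 0.0:
        return None
    # Six decimal places is roughly 0.1 m, enough for consumer GPS.
    return GpsPoint(device_id, ts, round(lat, 6), round(lon, 6))
```

Returning `None` instead of raising keeps the function easy to map over a large batch; a real pipeline would more likely route rejects to a quarantine table with a reason code.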

LocationMind / Urban planning and location intelligence

People-Flow Analytics Products

GPS events -> Spatial indexing -> OD / volume / footfall -> Maps -> Stakeholder delivery

Problem

Urban and mobility teams need reliable views of movement patterns without manually interpreting raw GPS trajectories.

Solution

Built analytical outputs for OD movement, road-volume intensity, and POI-footfall patterns using spatial aggregation and map-ready data products.

Impact

  • Translated movement signals into decision-ready geospatial outputs.
  • Supported stakeholder workflows across research, consulting, and product teams.
  • Made repeated mobility analysis easier to reproduce and compare over time.

Python, H3, GeoPandas, ClickHouse, Maps, CSV products
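
The OD aggregation idea can be sketched in a few lines. To keep the example dependency-free, a simple square latitude/longitude grid stands in for the H3 index; that substitution, the `grid_cell` helper, and the trip tuple layout are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int]
Trip = Tuple[float, float, float, float]  # (o_lat, o_lon, d_lat, d_lon), hypothetical layout


def grid_cell(lat: float, lon: float, size_deg: float = 0.01) -> Cell:
    """Snap a coordinate to a square grid cell (a stand-in for an H3 cell id)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def od_counts(trips: Iterable[Trip]) -> Counter:
    """Count trips per (origin cell, destination cell) pair."""
    counts: Counter = Counter()
    for o_lat, o_lon, d_lat, d_lon in trips:
        counts[(grid_cell(o_lat, o_lon), grid_cell(d_lat, d_lon))] += 1
    return counts
```

Swapping `grid_cell` for a real H3 lookup would leave `od_counts` unchanged, which is the point of indexing first and aggregating second.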

GridSolutions / Optimization and platform integration

Energy Pricing Optimization System

Business inputs -> Optimization module -> API layer -> Client systems -> Operations

Problem

Pricing workflows needed optimization logic that could be integrated into operational systems and maintained over time.

Solution

Implemented optimization modules, API interfaces, and OpenADR-related integrations while improving architecture reliability.

Impact

  • Moved optimization research closer to production workflows.
  • Created integration surfaces for client platforms.
  • Reduced operational risk by refactoring away single points of failure.

Python, MINLP, REST APIs, OpenADR, Optimization, System refactoring

Smart Data Solutions / OCR and operational automation

Document Intelligence Pipeline

Scanned claims -> OCR -> Classification -> Entity extraction -> Review interface

Problem

Scanned claims required structured extraction and classification before they could move efficiently through operational workflows.

Solution

Combined OCR extraction, text classification, rule-based extraction, NER, and interface improvements to support faster manual review.

Impact

  • Improved the path from scanned documents to structured operational data.
  • Reduced friction for manual keying workflows.
  • Connected ML extraction with practical back-office usability.

OCR, Tesseract, FineReader, WEKA, Random Forest, NER

System design

Architecture patterns for geospatial big-data systems.

The patterns below make the architecture thinking explicit instead of leaving it implied by tool lists.

Mobility Data Lakehouse

A reusable pattern for turning high-volume GPS data into analytics-ready spatial features and product outputs.

Sources -> Kafka / batch ingest -> S3 raw zone -> Spark + Glue -> H3 feature marts -> ClickHouse / APIs

Schema contracts, Partition-aware processing, Spatial indexing, Cost-aware compute, Observable outputs
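
One concrete piece of partition-aware processing is deciding the object key layout in the raw zone. The sketch below builds a Hive-style `key=value` path, a layout that Spark and Glue can read as partitions. The `raw/` prefix, the partition names (`source`, `dt`, `hour`), and the function name are assumptions for illustration, not the layout of any specific system.

```python
from datetime import datetime, timedelta, timezone


def raw_zone_key(source: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style partitioned object key from the event's UTC time."""
    if event_time.tzinfo is None:
        # Naive timestamps are ambiguous, so refuse them at the boundary.
        raise ValueError("event_time must be timezone-aware")
    t = event_time.astimezone(timezone.utc)
    return f"raw/source={source}/dt={t:%Y-%m-%d}/hour={t:%H}/{filename}"


# Usage: a 23:30 JST event lands in the 14:00 UTC partition.
JST = timezone(timedelta(hours=9))
key = raw_zone_key("gps", datetime(2025, 1, 15, 23, 30, tzinfo=JST), "part-0001.parquet")
# key == "raw/source=gps/dt=2025-01-15/hour=14/part-0001.parquet"
```

Partitioning on UTC rather than local time avoids duplicated or missing hours around time-zone changes and keeps jobs able to prune by date and hour.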

Geospatial Product Layer

A serving layer designed for teams that need fast comparison across movement metrics, time windows, and geographic cells.

Feature tables -> Aggregation jobs -> ClickHouse -> Map tiles / CSV -> Dashboards -> Decision workflows

Low-latency reads, Reproducible exports, Map-ready geometry, Clear lineage, Stakeholder trust

Skills

Grouped by the work they enable, not by tool names.

The stack matters because it supports large-scale movement data, spatial context, reliable pipelines, and fast serving layers.

Data Engineering

Designing batch and streaming pipelines with explicit data contracts, validation, partitioning, orchestration, and observability.

Python, Spark, Kafka, ETL/ELT, Data quality, Time-series features

Cloud & Infra

Shipping data products on cloud infrastructure with cost-aware execution, containerized services, and production deployment habits.

AWS S3, AWS Glue, EC2, Kubernetes, ClickHouse, Monitoring

Geospatial

Turning raw mobility traces into spatially indexed, map-ready analytics for people-flow, OD movement, road volume, and POI footfall.

H3, GeoPandas, Spatial joins, Mobility data, Maps, Location intelligence

Experience timeline

A career arc from ML systems to geospatial big-data platforms.

The through-line is production work: data quality, integration, cloud systems, operational reliability, and useful outputs.

Jan 2025 - Present

Big Data Engineer (GeoSpatial)

LocationMind / Permanent

Tokyo, Japan

Building mobility and geospatial data pipelines that transform raw GPS-scale inputs into people-flow analytics, map-ready outputs, and decision products.

  • Design ingestion, validation, normalization, aggregation, and geospatial enrichment workflows for mobility datasets.
  • Build time-series and spatial features for people-flow analysis, dashboards, maps, and CSV data products.
  • Improve reproducibility, monitoring, performance, documentation, and governance across the data lifecycle.

Python, Spark, AWS, Geospatial joins, Time-series features, Data quality

Nov 2021 - Jan 2025

AI Engineer

GridSolutions Inc / Permanent

Tokyo, Japan

Delivered optimization and integration systems for energy pricing workflows, including business logic, APIs, and OpenADR VEN projects.

  • Implemented optimization modules using Mixed-Integer Nonlinear Programming for pricing strategy workflows.
  • Built API interfaces for client-platform integration and maintained OpenADR VEN projects.
  • Refactored architecture to reduce single points of failure and improve operational reliability.

Python, Optimization, MINLP, OpenADR, API design, Reliability

Nov 2019 - Nov 2021

Machine Learning Engineer

Bottle / Permanent

Kathmandu, Nepal

Led delivery of machine-learning-backed APIs and cloud services, bridging implementation architecture with client-facing product needs.

  • Led end-to-end project lifecycles from planning through production delivery.
  • Collaborated with architects to define implementation plans aligned with client constraints.
  • Developed REST APIs and cloud integrations that exposed ML functionality to applications.

Machine learning, REST APIs, AWS Lambda, Amazon EC2, Team leadership

Jul 2018 - Nov 2019

Machine Learning and OCR Engineer

Smart Data Solutions / Full-time

Kathmandu, Nepal

Built OCR and document-intelligence workflows for scanned claims, combining extraction, classification, and operational interface improvements.

  • Extracted characters from scanned claims using Tesseract, FineReader, and Cartouche.
  • Classified extracted text with Random Forest models and extracted entities using rules and NER.
  • Improved manual keying interfaces to reduce operational friction.

OCR, Tesseract, FineReader, WEKA, Random Forest, NER

Tech blog / insights

Technical themes behind the work.

Areas I keep sharpening through production work, architecture decisions, and practical geospatial data-system design.

Spatial indexing

Designing H3 Feature Tables for Mobility Analytics

How to choose resolution, partitioning, and aggregation boundaries when GPS events need to become reusable spatial features.
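
A back-of-the-envelope helper can frame the resolution choice. H3 has 2 + 120 * 7^res cells at resolution res (0 to 15), so an average cell area follows from dividing the Earth's surface area by that count. The sketch below uses a spherical Earth with a 6371 km radius, which is an approximation; the function names are hypothetical and the results are averages, not the area of any particular cell.

```python
import math

EARTH_AREA_KM2 = 4 * math.pi * 6371.0 ** 2  # spherical approximation


def h3_cell_count(res: int) -> int:
    """Number of H3 cells at a resolution: 2 + 120 * 7**res."""
    if not 0 <= res <= 15:
        raise ValueError("H3 resolutions run from 0 to 15")
    return 2 + 120 * 7 ** res


def avg_cell_area_km2(res: int) -> float:
    """Approximate average cell area at a resolution, in square kilometres."""
    return EARTH_AREA_KM2 / h3_cell_count(res)


def coarsest_res_under(max_area_km2: float) -> int:
    """Coarsest resolution whose average cell area is at most max_area_km2."""
    for res in range(16):
        if avg_cell_area_km2(res) <= max_area_km2:
            return res
    raise ValueError("no H3 resolution is that fine")
```

For example, asking for cells of at most 1 km² selects resolution 8 (about 0.74 km² on average), while resolution 9 averages roughly 0.1 km². Each step finer multiplies the row count per snapshot by about seven, which is the cost side of the trade-off.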

Data reliability

Data Quality Checks That Matter for GPS Pipelines

Practical validation patterns for timestamp consistency, coordinate sanity, duplicate trajectories, and downstream trust.
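
At the trajectory level, the checks named above might look like the following sketch for a single device. The fix layout `(epoch_seconds, lat, lon)` and the 70 m/s speed ceiling are assumptions for illustration; the right threshold depends on the transport modes in the data.

```python
import math
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (epoch_seconds, lat, lon), hypothetical layout


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def clean_trajectory(fixes: List[Fix], max_speed_mps: float = 70.0) -> List[Fix]:
    """Sort by time, drop exact duplicates, and drop fixes that imply an implausible speed."""
    kept: List[Fix] = []
    for fix in sorted(set(fixes)):  # set() removes exact duplicates
        if kept:
            t0, lat0, lon0 = kept[-1]
            dt = fix[0] - t0
            if dt <= 0:
                continue  # same timestamp with a conflicting position
            if haversine_m(lat0, lon0, fix[1], fix[2]) / dt > max_speed_mps:
                continue  # jump too large for the elapsed time
        kept.append(fix)
    return kept
```

Comparing each candidate against the last kept fix, rather than the previous raw fix, stops a single outlier from also invalidating the good point that follows it.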

Serving layer

ClickHouse for Geospatial Data Products

Where columnar serving fits after Spark transformations, and how to think about query shape, rollups, and product latency.
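
To make the rollup idea concrete, here is a small Python sketch of an hourly per-cell rollup. In ClickHouse this logic would more likely live in the table design or a materialized view; the Python version only illustrates the shape of the aggregate, and the event layout `(cell_id, epoch_seconds, device_id)` is a hypothetical one.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, int, str]  # (cell_id, epoch_seconds, device_id), hypothetical layout
Key = Tuple[str, int]         # (cell_id, hour_start_epoch)


def hourly_rollup(events: Iterable[Event]) -> Dict[Key, Tuple[int, int]]:
    """Map (cell, hour start) to (event count, unique device count)."""
    counts: Dict[Key, int] = defaultdict(int)
    devices: Dict[Key, Set[str]] = defaultdict(set)
    for cell_id, epoch_s, device_id in events:
        key = (cell_id, epoch_s - epoch_s % 3600)  # truncate to the hour
        counts[key] += 1
        devices[key].add(device_id)
    return {k: (counts[k], len(devices[k])) for k in counts}
```

Event counts can be summed across hours, but unique-device counts cannot, so a daily unique figure needs its own rollup or a mergeable sketch. That distinction is the kind of query-shape question worth settling before choosing rollup granularity.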

Education

Computer science foundation for systems, data, and applied ML.

Formal computer science training supports the later career arc across software systems, data platforms, machine learning, and cloud applications.

Nepal

BSc CSIT

Deerwalk Institute of Technology, Tribhuvan University

Bachelor of Science in Computer Science and Information Technology from Deerwalk Institute of Technology, affiliated with Tribhuvan University, grounding later work in software systems, data platforms, and applied machine learning.

Contact

Open to geospatial data platforms, mobility analytics, and systems-heavy engineering work.

Reach out if the problem involves high-volume data, spatial analytics, distributed pipelines, or turning complex data into something a product team can operate.