ongteckwu/rozes
🌹 Rozes - The Fastest DataFrame Library for TypeScript/JavaScript/Zig
Blazing-fast data analysis powered by WebAssembly. Rozes brings pandas-like analytics to TypeScript/JavaScript with native performance, columnar storage, and zero-copy operations.
npm install rozes (Please wait for full version)
const { Rozes } = require("rozes");
// CommonJS has no top-level await, so the async calls run inside a function
async function main() {
  const rozes = await Rozes.init();
  const df = rozes.DataFrame.fromCSV(
    "name,age,score\nAlice,30,95.5\nBob,25,87.3"
  );
  console.log(df.shape); // { rows: 2, cols: 3 }
  const ages = df.column("age"); // Float64Array [30, 25] - zero-copy!
}
main();
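For readers new to the term, "zero-copy" in JavaScript generally means a TypedArray that is a view over an existing buffer rather than a freshly allocated copy. The sketch below shows the difference with a plain ArrayBuffer standing in for WebAssembly memory. It makes no Rozes calls, and the variable names are illustrative only.

```typescript
// Generic TypedArray semantics; a plain ArrayBuffer stands in for WASM memory.
const buffer = new ArrayBuffer(2 * Float64Array.BYTES_PER_ELEMENT);

// A view shares the buffer's bytes: constructing it copies nothing.
const view = new Float64Array(buffer, 0, 2);
view[0] = 30;
view[1] = 25;

// A second view over the same buffer observes the same bytes.
const alias = new Float64Array(buffer);

// slice() allocates a new buffer and copies the elements into it.
const snapshot = view.slice();

// Writing through one view is visible through the other, but not in the copy.
view[0] = 31;

console.log(alias[0]);    // 31
console.log(snapshot[0]); // 30
```

If a returned array does alias WebAssembly memory, taking a `slice()` is the usual way to keep a stable copy, since growing a WebAssembly memory detaches previously created buffers.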
| Operation | Rozes | Papa Parse | csv-parse | Speedup |
|---|---|---|---|---|
| Parse 100K rows | 53.67ms | 207.67ms | 427.48ms | 3.87-7.96× |
| Parse 1M rows | 578ms | ~2-3s | ~5s | 3.5-8.7× |
| Filter 1M rows | 13.11ms | ~150ms | N/A | 11.4× |
| Sort 100K rows | 6.11ms | ~50ms | N/A | 8.2× |
| GroupBy 100K rows | 1.76ms | ~30ms | N/A | 17× |
| SIMD Sum 200K rows | 0.04ms | ~5ms | N/A | 125× |
| SIMD Mean 200K rows | 0.04ms | ~6ms | N/A | 150× |
| Radix Join 100K×100K | 5.29ms | N/A | N/A | N/A |
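The Speedup column is the competitor's time divided by the Rozes time. A small sketch that reproduces the "Parse 100K rows" range from the numbers in the table (the `speedup` helper is hypothetical, not part of Rozes):

```typescript
// Speedup = baseline time / candidate time (both in the same unit).
function speedup(baselineMs: number, candidateMs: number): number {
  return baselineMs / candidateMs;
}

// "Parse 100K rows" row: Rozes 53.67ms, Papa Parse 207.67ms, csv-parse 427.48ms.
const vsPapaParse = speedup(207.67, 53.67);
const vsCsvParse = speedup(427.48, 53.67);

console.log(vsPapaParse.toFixed(2)); // 3.87
console.log(vsCsvParse.toFixed(2));  // 7.96
```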
| Library | Bundle Size | Gzipped | vs Rozes |
|---|---|---|---|
| Rozes | 103KB | 52KB | 1× |
| Papa Parse | 206KB | 57KB | 2.0× larger |
| Danfo.js | 1.2MB | ~400KB | 12× larger |
| Polars-WASM | 2-5MB | ~1MB | 19-49× larger |
| DuckDB-WASM | 15MB | ~5MB | 146× larger |
Future Package Sizes (v1.3.0):
- rozes/csv (CSV-only): 40KB gzipped
- rozes (universal): 120KB gzipped
- rozes/web (with WebGPU): 180KB gzipped

npm install rozes
Requirements:
Add to your build.zig.zon:
.dependencies = .{
.rozes = .{
.url = "https://github.com/ongteckwu/rozes/archive/v1.0.0.tar.gz",
.hash = "...",
},
},
Then in your build.zig:
const rozes = b.dependency("rozes", .{
.target = target,
.optimize = optimize,
});
exe.root_module.addImport("rozes", rozes.module("rozes"));
Requirements:
import { Rozes } from "rozes";
const rozes = await Rozes.init();
const df = rozes.DataFrame.fromCSV(csvText);
console.log(df.shape);
import { Rozes, DataFrame } from "rozes";
const rozes: Rozes = await Rozes.init();
const df: DataFrame = rozes.DataFrame.fromCSV(csvText);
// Full autocomplete support
const shape = df.shape; // { rows: number, cols: number }
const columns = df.columns; // string[]
const ages = df.column("age"); // Float64Array | Int32Array | BigInt64Array | null
const { Rozes } = require("rozes");
const std = @import("std");
const DataFrame = @import("rozes").DataFrame;
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
const csv = "name,age,score\nAlice,30,95.5\nBob,25,87.3";
var df = try DataFrame.fromCSVBuffer(allocator, csv, .{});
defer df.free();
std.debug.print("Rows: {}, Cols: {}\n", .{ df.rowCount, df.columns.len });
}
<!DOCTYPE html>
<html>
<head>
<script type="module">
import { Rozes } from "./node_modules/rozes/dist/index.mjs";
const rozes = await Rozes.init();
const df = rozes.DataFrame.fromCSV(csvText);
console.log(df.shape);
</script>
</head>
</html>
Rozes provides a comprehensive DataFrame API for Node.js and browser environments through WebAssembly bindings.
// Parse CSV from string
const df = rozes.DataFrame.fromCSV(
"name,age,score\nAlice,30,95.5\nBob,25,87.3"
);
// Parse CSV from file (Node.js only)
const df2 = rozes.DataFrame.fromCSVFile("data.csv");
// Shape and metadata
df.shape; // { rows: 2, cols: 3 }
df.columns; // ["name", "age", "score"]
df.length; // 2
// Numeric columns - returns TypedArray (zero-copy!)
const ages = df.column("age"); // Float64Array [30, 25]
const scores = df.column("score"); // Float64Array [95.5, 87.3]
// String columns - returns array of strings
const names = df.column("name"); // ["Alice", "Bob"]
// Boolean columns - returns Uint8Array (0 = false, 1 = true)
const active = df.column("is_active"); // Uint8Array [1, 0]
// Select columns
const subset = df.select(["name", "age"]);
// Head and tail
const first5 = df.head(5);
const last5 = df.tail(5);
// Sort
const sorted = df.sort("age", false); // ascending
const descending = df.sort("score", true); // descending
Blazing-fast statistical functions with SIMD acceleration (2-6 billion rows/sec)
// Sum - 4.48 billion rows/sec
const totalScore = df.sum("score"); // 182.8
// Mean - 4.46 billion rows/sec
const avgAge = df.mean("age"); // 27.5
// Min/Max - 6.5-6.7 billion rows/sec
const minAge = df.min("age"); // 25
const maxScore = df.max("score"); // 95.5
// Variance and Standard Deviation
const variance = df.variance("score");
const stddev = df.stddev("score");
// Note: SIMD is used automatically on x86_64 with AVX2 and falls back to scalar code on other platforms
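A scalar reference implementation is handy for cross-checking aggregation results on small inputs. The sketch below is plain TypeScript with no Rozes calls. This section does not say whether `variance()` and `stddev()` divide by n or by n - 1, so the hypothetical `ddof` parameter exposes both (1 gives the sample variance, 0 the population variance).

```typescript
// Scalar reference aggregations over a Float64Array (not the Rozes implementation).
function scalarSum(values: Float64Array): number {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i];
  return total;
}

function scalarMean(values: Float64Array): number {
  return scalarSum(values) / values.length; // NaN for an empty array
}

// ddof = 1 gives the sample variance, ddof = 0 the population variance.
function scalarVariance(values: Float64Array, ddof: number = 1): number {
  const m = scalarMean(values);
  let acc = 0;
  for (let i = 0; i < values.length; i++) {
    const d = values[i] - m;
    acc += d * d;
  }
  return acc / (values.length - ddof);
}

const ages = new Float64Array([30, 25]);
console.log(scalarMean(ages));        // 27.5
console.log(scalarVariance(ages));    // 12.5 (sample)
console.log(scalarVariance(ages, 0)); // 6.25 (population)
```

Comparing both variants against `df.variance()` on a two-row column is a quick way to find out which convention a given build uses.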
const df = rozes.DataFrame.fromCSV(largeCSV);
console.log(df.shape);
import { Rozes, DataFrame } from "rozes";
const rozes: Rozes = await Rozes.init();
const df: DataFrame = rozes.DataFrame.fromCSV(csvText);
// Full autocomplete and type checking
const shape: { rows: number; cols: number } = df.shape;
const columns: string[] = df.columns;
const ages: Float64Array | Int32Array | null = df.column("age");
const total: number = df.sum("price");
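Because the annotated type of `column()` above includes `null`, callers need to narrow the result before using it. A minimal sketch of that narrowing, using the union from the example above; the helper names are hypothetical and the condition under which `null` is actually returned is not specified in this section.

```typescript
// Mirrors the annotation in the example above (an assumption about the real typings).
type NumericColumn = Float64Array | Int32Array;

// Turns a nullable column into a non-null one, failing loudly otherwise.
function requireNumeric(col: NumericColumn | null, label: string): NumericColumn {
  if (col === null) {
    throw new Error(`column "${label}" is unavailable as a numeric array`);
  }
  return col;
}

// instanceof narrows the union to a concrete TypedArray type.
function describeColumn(col: NumericColumn): string {
  const kind = col instanceof Float64Array ? "float64" : "int32";
  return `${kind}[${col.length}]`;
}

console.log(describeColumn(requireNumeric(new Float64Array([30, 25]), "age"))); // float64[2]
console.log(describeColumn(requireNumeric(new Int32Array([1, 2, 3]), "id")));   // int32[3]
```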
| Category | Methods | Status |
|---|---|---|
| CSV I/O | fromCSV(), fromCSVFile() | ✅ Available |
| Properties | shape, columns, length | ✅ Available |
| Column Access | column() - numeric, string, boolean | ✅ Available |
| Selection | select(), head(), tail() | ✅ Available |
| Sorting | sort() | ✅ Available |
| SIMD Aggregations | sum(), mean(), min(), max(), variance(), stddev() | ✅ Available (1.2.0) |
| Advanced Operations | filter(), groupBy(), join() | ⏳ Coming in 1.3.0 |
| CSV Export | toCSV(), toCSVFile() | ⏳ Coming in 1.3.0 |
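Until `filter()` and `groupBy()` are available in the JavaScript API, simple cases can be approximated in userland over arrays of the shape `column()` is documented to return. The sketch below uses plain arrays and makes no Rozes calls; `filterIndices` and `groupSum` are hypothetical helpers, not library functions.

```typescript
// Indices of the rows whose value satisfies the predicate.
function filterIndices(values: Float64Array, predicate: (v: number) => boolean): number[] {
  const out: number[] = [];
  for (let i = 0; i < values.length; i++) {
    if (predicate(values[i])) out.push(i);
  }
  return out;
}

// Sum of `values` per distinct key, in first-seen key order.
function groupSum(keys: string[], values: Float64Array): Map<string, number> {
  if (keys.length !== values.length) throw new Error("keys and values differ in length");
  const totals = new Map<string, number>();
  for (let i = 0; i < keys.length; i++) {
    totals.set(keys[i], (totals.get(keys[i]) ?? 0) + values[i]);
  }
  return totals;
}

const ages = new Float64Array([30, 25, 41]);
console.log(filterIndices(ages, (age) => age > 28)); // [ 0, 2 ]

const categories = ["a", "b", "a"];
const amounts = new Float64Array([10, 5, 2.5]);
const totals = groupSum(categories, amounts);
console.log(totals.get("a"), totals.get("b")); // 12.5 5
```

For large inputs the Zig API remains the better route; this is only a stopgap for small in-memory data.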
// CSV I/O
var df = try DataFrame.fromCSVBuffer(allocator, csv, .{});
var df2 = try DataFrame.fromCSVFile(allocator, "data.csv", .{});
const csv_out = try df.toCSV(allocator, .{});
// Data Access & Metadata
df.rowCount; // u32
df.columns.len; // usize
const col = df.column("age");
const row = df.row(0);
// Selection & Filtering
const selected = try df.select(&[_][]const u8{"name", "age"});
const filtered = try df.filter(myFilterFn);
const head = try df.head(10);
const tail = try df.tail(10);
// Sorting
const sorted = try df.sort("age", .Ascending);
const multi = try df.sortMulti(&[_][]const u8{"age", "score"}, &[_]SortOrder{.Ascending, .Descending});
// GroupBy Aggregations
const grouped = try df.groupBy("category");
const sum_result = try grouped.sum("amount");
const mean_result = try grouped.mean("score");
const min_result = try grouped.min("age");
const max_result = try grouped.max("age");
const count_result = try grouped.count();
// Joins (inner, left, right, outer, cross)
const joined = try df.join(df2, "id", "id", .Inner);
const left = try df.join(df2, "key", "key", .Left);
// Statistical Operations
const corr = try df.corr("age", "score");
const cov = try df.cov("age", "score");
const ranked = try df.rank("score");
const counts = try df.valueCounts("category");
// Missing Values
const filled = try df.fillna(0.0);
const dropped = try df.dropna();
const nulls = df.isNull("age");
// Reshape Operations
const pivoted = try df.pivot("date", "product", "sales");
const melted = try df.melt(&[_][]const u8{"id"}, &[_][]const u8{"val1", "val2"});
const transposed = try df.transpose();
const stacked = try df.stack();
const unstacked = try df.unstack("level");
// Combine DataFrames
const concatenated = try DataFrame.concat(allocator, &[_]DataFrame{df1, df2}, .Rows);
const merged = try df.merge(df2, &[_][]const u8{"key"});
const appended = try df.append(df2);
const updated = try df.update(df2);
// Window Operations
const rolling = try df.rolling(3).mean("price");
const expanding = try df.expanding().sum("quantity");
// Functional Operations
const mapped = try df.map("age", mapFn);
const applied = try df.apply(applyFn);
// String Operations (10+ functions)
const upper = try df.strUpper("name");
const lower = try df.strLower("name");
const len = try df.strLen("name");
const contains = try df.strContains("name", "Alice");
const startsWith = try df.strStartsWith("name", "A");
const endsWith = try df.strEndsWith("name", "e");
Node.js/Browser API (1.2.0) - Production-ready DataFrame library:
- … free() calls required
- Column access (column()) - all types supported
- Selection: select(), head(), tail()
- Sorting: sort() (single column, ascending/descending)
- SIMD aggregations: sum(), mean(), min(), max(), variance(), stddev()
- shape, columns, length properties
- … fromCSVFile)
- Coming in 1.3.0: filter(), groupBy(), join(), toCSV()

Zig API (1.2.0) - Full DataFrame operations (50+ operations):
- GroupBy aggregations: sum(), mean(), min(), max(), count()
- Window operations: rolling(), expanding()
- Reshape operations: pivot(), melt(), transpose(), stack(), unstack()
- Combine DataFrames: concat(), merge(), append(), update()
- Functional operations: apply(), map() with type conversion
- Missing values: fillna(), dropna(), isNull()
- Statistical operations: corr(), cov(), rank(), valueCounts()

25+ Major Optimizations Across 10 Categories (Milestone 1.2.0):
- … .collect()

| Operation | Dataset | Rozes | Target | Grade | vs Target |
|---|---|---|---|---|---|
| CSV Parse | 1M rows | 578ms | <3000ms | A+ | 81% faster |
| Filter | 1M rows | 13.11ms | <100ms | A+ | 87% faster |
| Sort | 100K rows | 6.11ms | <100ms | A+ | 94% faster |
| GroupBy | 100K rows | 1.76ms | <300ms | A+ | 99% faster! |
| Join (pure algorithm) | 10K × 10K | 0.44ms | <10ms | A+ | 96% faster |
| Join (full pipeline) | 10K × 10K | 588.56ms | <500ms | A | 18% slower |
| SIMD Sum | 200K rows | 0.04ms | <1ms | A+ | 96% faster |
| SIMD Mean | 200K rows | 0.04ms | <2ms | A+ | 98% faster |
| SIMD Min/Max | 200K rows | 0.03ms | <1ms | A+ | 97% faster |
| SIMD Variance | 200K rows | 0.09ms | <3ms | A+ | 97% faster |
| Radix Join SIMD Probe | 10K rows | 0.07ms | <0.5ms | A+ | 85% faster |
| Bloom Filter Rejection | 10K probes | 0.01ms | <0.2ms | A+ | 95% faster |
| Radix vs Standard Join | 100K×100K | 5.29ms | N/A | N/A | 1.65× speedup |
| Head | 100K rows | 0.01ms | N/A | A+ | 14B rows/sec |
| DropDuplicates | 100K rows | 656ms | N/A | N/A | 152K rows/sec |
Benchmarks run on macOS (Darwin 25.0.0), Zig 0.15.1, ReleaseFast mode, averaged over multiple runs
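The "vs Target" and rows/sec figures follow directly from the measured time, the target, and the row count. A short sketch of the arithmetic, checked against three rows of the table (both helpers are hypothetical):

```typescript
// Positive result: faster than target. Negative result: slower than target.
function percentFasterThanTarget(measuredMs: number, targetMs: number): number {
  return (1 - measuredMs / targetMs) * 100;
}

// Throughput in rows per second from a row count and elapsed milliseconds.
function rowsPerSecond(rows: number, elapsedMs: number): number {
  return rows / (elapsedMs / 1000);
}

console.log(Math.round(percentFasterThanTarget(578, 3000)));  // 81  (CSV Parse)
console.log(Math.round(percentFasterThanTarget(588.56, 500))); // -18 (Join, full pipeline)
console.log(Math.round(rowsPerSecond(100000, 656)));           // 152439 (DropDuplicates, ~152K rows/sec)
```

Sub-millisecond rows such as Head are presumably rounded for display, so recomputing their throughput from the printed time gives only a rough figure.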
| Browser | Version | Status | Notes |
|---|---|---|---|
| Chrome | 90+ | ✅ Tier 1 | Full WebAssembly support |
| Firefox | 88+ | ✅ Tier 1 | Full WebAssembly support |
| Safari | 14+ | ✅ Tier 1 | Full WebAssembly support |
| Edge | 90+ | ✅ Tier 1 | Chromium-based |
| IE 11 | N/A | ❌ Not Supported | No WebAssembly |
⚠️ Missing Value Representation (MVP Limitation)
Current Behavior:
- Int64 columns: 0 represents missing values
  - [0, 1, 2] with fillna(99) becomes [99, 1, 2] (zero incorrectly replaced)
- Float64 columns: NaN represents missing values
  - [NaN, 1.5, 2.0] with fillna(0.0) becomes [0.0, 1.5, 2.0]

Workarounds:
- Avoid fillna(), dropna(), isna() operations on Int64 columns with legitimate zeros

Planned Fix (v1.4.0):
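The details of the planned fix are not given in this section. Purely as an illustration of the problem and of one common general technique (a separate validity mask), and not as a description of what Rozes will ship, the sketch below contrasts sentinel-based filling with mask-based filling. An Int32Array stands in for an Int64 column, and all function names are hypothetical.

```typescript
// Sentinel approach as described above: 0 doubles as "missing" for integers.
function fillnaSentinel(values: Int32Array, fill: number): Int32Array {
  return values.map((v) => (v === 0 ? fill : v));
}

// Float columns use NaN as the sentinel, which cannot collide with real data.
function fillnaNaN(values: Float64Array, fill: number): Float64Array {
  return values.map((v) => (Number.isNaN(v) ? fill : v));
}

// Alternative: an explicit validity mask (1 = present, 0 = missing).
function fillnaMasked(values: Int32Array, valid: Uint8Array, fill: number): Int32Array {
  return values.map((v, i) => (valid[i] === 1 ? v : fill));
}

const data = new Int32Array([0, 1, 2]);     // the 0 here is a legitimate value
const allValid = new Uint8Array([1, 1, 1]); // nothing is actually missing

console.log(Array.from(fillnaSentinel(data, 99)));                      // [ 99, 1, 2 ]
console.log(Array.from(fillnaMasked(data, allValid, 99)));              // [ 0, 1, 2 ]
console.log(Array.from(fillnaNaN(new Float64Array([NaN, 1.5, 2]), 0))); // [ 0, 1.5, 2 ]
```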
Node.js API limitations (coming in future releases):
- toCSV(), toCSVFile() - WASM export not yet implemented
- filter(), groupBy(), join() - Use Zig API for now

What's Available (1.2.0):
- fromCSV(), fromCSVFile() - Fully implemented with parallel parsing
- column() - All types (Int64, Float64, String, Bool) supported
- select(), head(), tail(), sort() - Fully functional
- sum(), mean(), min(), max(), variance(), stddev() - Production ready

Future features (1.3.0+):
- … rozes/web, rozes/node, rozes/csv)

Completed optimizations (Milestone 1.2.0):
See CHANGELOG.md for full list.
Built with Zig + WebAssembly:
Project Structure:
rozes/
├── src/ # Zig source code
│ ├── core/ # DataFrame engine
│ ├── csv/ # CSV parser (RFC 4180 compliant)
│ └── rozes.zig # Main API
├── dist/ # npm package
│ ├── index.js # CommonJS entry point
│ ├── index.mjs # ESM entry point
│ └── index.d.ts # TypeScript definitions
├── docs/ # Documentation
│ ├── NODEJS_API.md # Node.js API reference
│ ├── ZIG_API.md # Zig API reference
│ ├── MIGRATION.md # Migration guide
│ └── CHANGELOG.md # Version history
└── examples/ # Example programs
└── node/ # Node.js examples
# Prerequisites: Zig 0.15.1+
git clone https://github.com/ongteckwu/rozes.git
cd rozes
# Build WASM module
zig build
# Run tests (461/463 passing)
zig build test
# Run conformance tests (125/125 passing)
zig build conformance
# Run benchmarks (6/6 passing)
zig build benchmark
# Run memory leak tests (5/5 suites passing, ~5 minutes)
zig build memory-test
# Run nodejs tests
npm run test:api
We welcome contributions! Please:
- Run zig fmt before committing

| Feature | Rozes | Papa Parse | Danfo.js | Polars-WASM | DuckDB-WASM |
|---|---|---|---|---|---|
| Performance | ⚡ 3-10× faster | Baseline | ~Same as Papa | 2-5× faster | 5-10× faster |
| Bundle Size | 📦 103KB | 206KB | 1.2MB | 2-5MB | 15MB |
| Zero-Copy | ✅ TypedArray | ❌ | ❌ | ✅ | ✅ |
| RFC 4180 | ✅ 100% | ⚠️ ~95% | ⚠️ Basic | ✅ | ✅ |
| DataFrame Ops | ✅ 50+ | ❌ | ✅ | ✅ | ✅ SQL |
| Memory Safe | ✅ Zig | ❌ JS | ❌ JS | ✅ Rust | ✅ C++ |
| Node.js | ✅ | ✅ | ✅ | ✅ | ✅ |
| Browser | ✅ | ✅ | ✅ | ✅ | ✅ |
| TypeScript | ✅ Full | ⚠️ Basic | ✅ | ✅ | ✅ |
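To make the RFC 4180 row concrete: compliance mostly comes down to quoted fields, which may contain commas, line breaks, and doubled quotes. The sketch below is a deliberately small TypeScript illustration of those rules. It is not the Rozes parser (which lives in the Zig sources) and is not conformance-grade.

```typescript
// Minimal RFC 4180-style parser: quoted fields, "" escapes, LF or CRLF records.
function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  let recordOpen = false; // true once the current record has consumed any input
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i += 2; continue; } // escaped quote
      if (ch === '"') { inQuotes = false; i++; continue; }                        // closing quote
      field += ch; i++; continue;                                                 // literal, incl. , and \n
    }
    if (ch === "\r" && text[i + 1] === "\n") { i++; continue; } // CRLF: let the LF end the record
    if (ch === "\n") {
      row.push(field); rows.push(row);
      row = []; field = ""; recordOpen = false; i++; continue;
    }
    recordOpen = true;
    if (ch === '"') { inQuotes = true; i++; continue; }
    if (ch === ",") { row.push(field); field = ""; i++; continue; }
    field += ch; i++;
  }
  // Flush the final record when the text does not end with a line break.
  if (recordOpen) { row.push(field); rows.push(row); }
  return rows;
}

const parsed = parseCsv('name,quote\r\n"Smith, J.","said ""hi"""\n"multi\nline",x');
console.log(parsed.length); // 3
console.log(parsed[1]);     // [ 'Smith, J.', 'said "hi"' ]
```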
When to use Rozes:
When to use alternatives:
MIT License - see LICENSE for details.
Status: 1.2.0 Advanced Optimizations Release (11/12 benchmarks passing - 92%)
Last Updated: 2025-11-01
Try it now: npm install rozes