Commit 97a7606

Merge pull request isamplesorg#16 from rdhyee/issue-13-parquet-duckdb
make clearer that iSamples central (and its API) is down right now
2 parents b4fb67a + e0a6f3b commit 97a7606

6 files changed

+55
-58
lines changed

.claude/settings.local.json

Lines changed: 9 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,9 @@
1+
{
2+
"permissions": {
3+
"allow": [
4+
"Bash(git branch:*)"
5+
],
6+
"deny": [],
7+
"ask": []
8+
}
9+
}

_quarto.yml

Lines changed: 0 additions & 2 deletions
@@ -9,8 +9,6 @@ website:
99
search: true
1010
logo: assets/isampleslogopetal.png
1111
tools:
12-
- icon: table
13-
href: https://hyde.cyverse.org/isamples_central/ui/
1412
- icon: github
1513
href: https://github.com/isamplesorg
1614
- icon: slack

about.qmd

Lines changed: 9 additions & 1 deletion
@@ -4,10 +4,18 @@ title: "About iSamples"
44

55
# Project Objectives
66

7-
1. Design and develop iSamples infrastructure (iSamples in a Box and iSamples Central);
7+
1. Design and develop iSamples infrastructure (iSamples in a Box and distributed data systems);
88
2. Build four initial implementations of iSamples for adoption and use case testing (Open Context, GEOME, SESAR, and Smithsonian Institution);
99
3. Conduct outreach and community engagement to developers, individual researchers, and international organizations concerned with material samples.
1010

11+
## Current Data Access
12+
13+
**Note**: iSamples Central is currently unavailable. The project has transitioned to a **geoparquet-based approach** for data access and analysis:
14+
15+
- **Primary Data Source**: Comprehensive geoparquet files containing millions of sample records
16+
- **Analysis Platform**: Browser-based tools using DuckDB-WASM and Observable
17+
- **Coverage**: Complete datasets from SESAR, OpenContext, GEOME, and Smithsonian collections
18+
1119
![iSamples diagram](assets/iSamplesArchitecture.png)
1220

1321

design/requirements.md

Lines changed: 1 addition & 1 deletion
@@ -337,7 +337,7 @@ Components
337337
## 15 All content sources should be assumed to be dynamic and attached components should facilitate efficient synchronization of subscribed content.
338338

339339

340-
iSamples central will need to continually update the catalog and promote dissemination of the content to subscribers (e.g. iSB instances).
340+
With the transition to geoparquet-based data access, content synchronization now occurs through periodic updates of parquet files rather than real-time API synchronization. This approach provides better performance and reliability for analytical workloads.
341341

342342
Derived from:
343343

index.qmd

Lines changed: 9 additions & 0 deletions
@@ -6,6 +6,15 @@ subtitle: "Toward an Interdisciplinary Cyberinfrastructure for Material Samples
66

77
The Internet of Samples (iSamples) is a multi-disciplinary and multi-institutional project funded by the National Science Foundation to design, develop, and promote service infrastructure to uniquely, consistently, and conveniently identify material samples, record metadata about them, and persistently link them to other samples and derived digital content, including images, data, and publications.
88

9+
## Current Data Access: Geoparquet-Based Approach
10+
11+
**Note**: iSamples Central is currently unavailable. The project now uses **geoparquet files** for efficient, browser-based data access and analysis:
12+
13+
- 📊 **[Interactive Tutorials](/tutorials/)** - Modern browser-based analysis with DuckDB-WASM
14+
- 🗺️ **Comprehensive Coverage** - Complete datasets from SESAR, OpenContext, GEOME, and Smithsonian
15+
- 🚀 **High Performance** - Fast, efficient data access with minimal memory usage
16+
- 🌐 **Universal Access** - Works in any modern browser without software installation
17+
918
**Resources**
1019

1120
* [Recording of project presentation at the 2020 SPNHC & ICOM NATHIST Conference](https://youtu.be/eRUw5NMksFo?t=105)

tutorials/index.qmd

Lines changed: 27 additions & 54 deletions
@@ -2,66 +2,39 @@
22
title: "Tutorials: Overview"
33
---
44

5-
Here's where we park our various tutorials!
5+
Welcome to the iSamples tutorials! These tutorials demonstrate how to work with sample data using modern browser-based tools and geoparquet files.
66

7-
Get the OpenAPI spec.
7+
## Available Data Sources
88

9-
```{ojs}
10-
//| echo: true
9+
With iSamples Central currently unavailable, all tutorials now use **geoparquet files** as the primary data source:
1110

12-
// Get the OpenAPI specification and display detailed endpoint information
13-
viewof apiEndpointDetails = {
14-
// Show loading indicator
15-
const loadingElement = html`<div>Loading API endpoints...</div>`;
16-
document.body.appendChild(loadingElement);
11+
### Primary Data Sources
12+
- **Zenodo Complete Dataset**: ~300MB, 6+ million records from all iSamples sources
13+
- **OpenContext Parquet**: Curated archaeological sample data
14+
- **Domain-specific Collections**: Specialized datasets for focused analysis
1715

18-
try {
19-
const OPENAPI_URL = 'https://central.isample.xyz/isamples_central/openapi.json';
16+
### Tutorial Categories
2017

21-
// Fetch the OpenAPI spec
22-
const response = await fetch(OPENAPI_URL);
23-
if (!response.ok) throw new Error(`Failed to fetch API spec: ${response.status}`);
18+
**🗺️ Geographic Analysis**
19+
- Interactive mapping and spatial exploration
20+
- Regional distribution analysis
21+
- Cesium-based 3D visualizations
2422

25-
const apiSpec = await response.json();
23+
**📊 Data Analysis**
24+
- Statistical analysis with DuckDB-WASM
25+
- Material category distributions
26+
- Cross-collection comparisons
2627

27-
// Extract detailed information about each endpoint
28-
const endpointDetails = [];
28+
**🚀 Performance Demonstrations**
29+
- Browser-based big data analysis
30+
- Efficient sampling and visualization techniques
31+
- HTTP range request optimization
2932

30-
for (const [path, pathMethods] of Object.entries(apiSpec.paths)) {
31-
for (const [method, details] of Object.entries(pathMethods)) {
32-
endpointDetails.push({
33-
endpoint: path,
34-
method: method.toUpperCase(),
35-
summary: details.summary || '',
36-
operationId: details.operationId || '',
37-
tags: (details.tags || []).join(', '),
38-
parameters: (details.parameters || [])
39-
.map(p => `${p.name} (${p.required ? 'required' : 'optional'})`)
40-
.join(', ')
41-
});
42-
}
43-
}
33+
## Why Geoparquet?
4434

45-
// Create a table with the detailed endpoint information
46-
return Inputs.table(
47-
endpointDetails,
48-
{
49-
label: "iSamples API Endpoints Details",
50-
width: {
51-
endpoint: 150,
52-
method: 80,
53-
summary: 200,
54-
operationId: 200,
55-
tags: 100,
56-
parameters: 300
57-
}
58-
}
59-
);
60-
} catch (error) {
61-
return html`<div style="color: red">Error fetching API endpoints: ${error.message}</div>`;
62-
} finally {
63-
// Remove loading indicator
64-
loadingElement.remove();
65-
}
66-
}
67-
```
35+
Our tutorials showcase how **geoparquet + DuckDB-WASM** enables:
36+
- **Universal access**: No software installation required
37+
- **Fast analysis**: 5-10x faster than traditional approaches (e.g., downloading full CSV datasets and analyzing them locally). [See benchmark](https://duckdb.org/2023/05/10/duckdb-wasm.html)
38+
- **Memory efficient**: Analyze 300MB datasets using <100MB browser memory
39+
- **Minimal data transfer**: Only download what you need
40+
- **Interactive exploration**: Real-time parameter adjustment
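
The "minimal data transfer" point in the new tutorials page depends on HTTP range requests: a Parquet file ends with an 8-byte trailer (a 4-byte little-endian footer length followed by the magic `PAR1`), so a reader can fetch the metadata first and only then the column chunks it needs. The sketch below illustrates that first step in plain JavaScript. It is not taken from this commit and is not DuckDB-WASM's actual implementation; the function names and the `url` argument are hypothetical, and it assumes the server honours `Range` headers.

```javascript
// Illustrative sketch only: names are hypothetical, not part of the iSamples code.
const PARQUET_MAGIC = "PAR1"; // trailing magic bytes defined by the Parquet format
const TRAILER_BYTES = 8;      // 4-byte footer length + 4-byte magic

// Range header value requesting only the last `n` bytes of a resource.
function suffixRange(n) {
  return `bytes=-${n}`;
}

// Read the footer metadata length from the 8 trailer bytes (a Uint8Array).
function parseFooterLength(trailer) {
  if (trailer.length !== TRAILER_BYTES) {
    throw new Error(`expected ${TRAILER_BYTES} trailer bytes, got ${trailer.length}`);
  }
  const magic = String.fromCharCode(...trailer.slice(4, 8));
  if (magic !== PARQUET_MAGIC) {
    throw new Error("trailer does not end with PAR1; not a Parquet file");
  }
  const view = new DataView(trailer.buffer, trailer.byteOffset, trailer.byteLength);
  return view.getUint32(0, true); // little-endian
}

// Range header value covering exactly the footer metadata, which sits
// immediately before the 8-byte trailer.
function footerRange(fileSize, footerLength) {
  const end = fileSize - TRAILER_BYTES - 1;
  const start = end - footerLength + 1;
  return `bytes=${start}-${end}`;
}

// Fetch only the trailer of a remote file and return its footer length.
// Assumes a runtime with a global fetch (modern browsers, Node 18+).
async function fetchFooterLength(url) {
  const response = await fetch(url, { headers: { Range: suffixRange(TRAILER_BYTES) } });
  if (response.status !== 206) {
    throw new Error(`server did not return partial content (status ${response.status})`);
  }
  return parseFooterLength(new Uint8Array(await response.arrayBuffer()));
}
```

Calling `fetchFooterLength` on a hosted parquet file would transfer 8 bytes rather than the whole file; a second request built with `footerRange` would then retrieve the schema and row-group metadata.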
