
# Data Management

This page describes how to organize production simulation results that are intended for publication, so that they meet sponsor requirements and support reproducibility and good scientific practice.

## Data Organization Approach

The group uses a paper-oriented data organization strategy. Simulation data should be associated with the publication in which it will eventually appear.

This keeps results next to their best available documentation, the paper itself, and minimizes the time lost when work passes between group members.

## Paper Git Repository

Every paper should be under Git version control through Overleaf or GitHub. This applies even if the manuscript itself is written in Word for collaborator or journal reasons.

When you are ready to generate potentially publishable results, create the paper repository.

Before publication, use this naming convention:

```
PaperDescriptiveTitleMixCapsNoSpaces
```

Examples:

```
PaperElasticSolver
PaperMicrostructureEvolution
```

Counterexamples:

```
ElasticSolverPaper
Paper_ElasticSolver
paperelasticsolver
Paper Elastic Solver
```
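The convention above can be checked mechanically. The following is a minimal sketch, assuming the rule reduces to "the literal prefix `Paper`, then a capital letter, then only letters and digits"; the regular expression and the function name `is_valid_paper_repo_name` are illustrative and not part of any group tooling.

```python
import re

# Assumed pattern: literal "Paper", then an uppercase letter, then only
# letters and digits (no spaces, underscores, or hyphens).
PAPER_REPO_PATTERN = re.compile(r"Paper[A-Z][A-Za-z0-9]*")


def is_valid_paper_repo_name(name: str) -> bool:
    """Return True if `name` fits the pre-publication naming convention."""
    return PAPER_REPO_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    candidates = [
        "PaperElasticSolver",
        "ElasticSolverPaper",
        "Paper_ElasticSolver",
        "paperelasticsolver",
        "Paper Elastic Solver",
    ]
    for candidate in candidates:
        print(candidate, is_valid_paper_repo_name(candidate))
```

Under this assumed pattern, only the first candidate passes; the four counterexamples are rejected.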

After publication, rename the repository to match the Google Scholar BibTeX ID, such as `authorlastnameYYYYfirstword`.
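As a rough illustration of the `authorlastnameYYYYfirstword` shape, the sketch below assembles a key from a last name, a year, and a title. It assumes that leading articles ("a", "an", "the") are skipped when choosing the title word; that rule, the function name `scholar_style_key`, and the sample title are assumptions made for illustration. The authoritative key is whatever Google Scholar actually exports, so copy it from there rather than relying on this.

```python
import re

# Assumption: leading articles are skipped when picking the title word.
SKIPPED_WORDS = {"a", "an", "the"}


def scholar_style_key(last_name: str, year: int, title: str) -> str:
    """Build a key shaped like authorlastnameYYYYfirstword."""
    # Split the lowercased title into alphanumeric words.
    words = re.findall(r"[a-z0-9]+", title.lower())
    # Take the first word that is not a skipped article (empty if none).
    first_word = next((w for w in words if w not in SKIPPED_WORDS), "")
    # Keep only the letters of the last name, lowercased.
    name = re.sub(r"[^a-z]", "", last_name.lower())
    return f"{name}{year}{first_word}"


if __name__ == "__main__":
    # Hypothetical title, chosen only to reproduce the key shape.
    print(scholar_style_key("Smith", 2015, "A novel damage model"))
```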

## Results Directory Structure

Create a `results` directory in the paper repository. Put all simulation data and postprocessing results there.

```
PaperDescriptiveName/
    main.tex
    main.bib
    figures/
        graphic.svg
        graphic.pdf
    results/
        TypeOne/
            README
            input
            hpc_batch.sh
            plotTypeOneStats.py
            plotTypeOneStats.pdf
            output_202310250824/
                metadata
                diff.patch
                plot_eta.pdf
                plot_temp.pdf
                00000cell/
                00001cell/
                output.tar.gz
            output_202310250825/
            output_202310250826/
        TypeTwo/
```
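The `output_202310250824` names in the layout above look like `output_` followed by a `YYYYMMDDHHMM` timestamp. If the simulation code does not create these directories itself, a helper along the following lines could do so. This is a sketch under that assumed timestamp format; `make_output_dir` is a hypothetical name.

```python
from datetime import datetime
from pathlib import Path


def make_output_dir(paper_root: Path, study: str, when: datetime) -> Path:
    """Create results/<study>/output_<YYYYMMDDHHMM>/ and return its path."""
    stamp = when.strftime("%Y%m%d%H%M")  # assumed format, e.g. 202310250824
    out_dir = paper_root / "results" / study / f"output_{stamp}"
    # exist_ok=False: fail loudly instead of silently reusing a run directory.
    out_dir.mkdir(parents=True, exist_ok=False)
    return out_dir


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        created = make_output_dir(Path(tmp), "TypeOne", datetime.now())
        print(created.relative_to(tmp))
```

Refusing to reuse an existing directory keeps two runs from overwriting each other's metadata.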

Files in `figures` should be illustrations only, not simulation results. If a figure is editable, include both the editable source, such as an Inkscape SVG, and the rendered file used in the paper.

Files in `results` are simulation results. Any figure that contains simulation data must be stored in `results`, as close as possible to the data that generated it.

For example:

```latex
\includegraphics{results/TypeOne/output_202310250824/plot_eta.pdf}
```

The path itself then shows which simulation run produced the figure.
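One way to audit a manuscript against this rule is to list every `\includegraphics` path and flag those that live outside `figures/` and `results/`. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the regular expression only handles the plain `\includegraphics{...}` and `\includegraphics[...]{...}` forms, without accounting for commented-out lines or `\graphicspath`.

```python
import re

# Matches \includegraphics{path} and \includegraphics[options]{path}.
INCLUDEGRAPHICS = re.compile(
    r"\\includegraphics\s*(?:\[[^\]]*\])?\s*\{([^}]+)\}"
)
# Top-level directories a graphic is expected to live in.
ALLOWED_PREFIXES = ("figures/", "results/")


def graphics_paths(tex_source: str) -> list[str]:
    """Return every path passed to \\includegraphics, in order of appearance."""
    return [path.strip() for path in INCLUDEGRAPHICS.findall(tex_source)]


def misplaced_graphics(tex_source: str) -> list[str]:
    """Return the graphics paths that are outside figures/ and results/."""
    return [
        path for path in graphics_paths(tex_source)
        if not path.startswith(ALLOWED_PREFIXES)
    ]


if __name__ == "__main__":
    sample = r"""
    \includegraphics{results/TypeOne/output_202310250824/plot_eta.pdf}
    \includegraphics[width=\linewidth]{figures/graphic.pdf}
    \includegraphics{myplot.pdf}
    """
    print(misplaced_graphics(sample))
```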

## Metadata

Each simulation directory must contain enough metadata to reproduce the result.

For Alamo, include the standard metadata outputs. For other codes, include whatever metadata is needed to recreate the calculation. Metadata files should be version controlled.

Raw simulation outputs may be too large for Git. They should still live in the same logical directory, but they should be excluded from version control and moved to archival storage as soon as practical.
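To decide which raw outputs to keep out of Git, it can help to list the files above some size threshold. The following is a minimal sketch; the function name `files_over_limit` and the 50 MB threshold in the usage example are assumptions, not group policy.

```python
from pathlib import Path


def files_over_limit(root: Path, limit_bytes: int) -> list[Path]:
    """Return files under `root` larger than `limit_bytes`, sorted by path."""
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.stat().st_size > limit_bytes
    )


if __name__ == "__main__":
    # Hypothetical threshold of 50 MB; adjust to the hosting service's limits.
    for path in files_over_limit(Path("results"), 50 * 1024 * 1024):
        print(path)
```

The files it reports are candidates for a `.gitignore` entry and for transfer to archival storage.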

## Postprocessing Data

Most visualization and postprocessing is done in Python. All postprocessing scripts must be stored in version control. Raw Python scripts and Jupyter notebooks are both acceptable.

Postprocessing data should be stored as close as possible to the data being processed:

* If a visualization uses one simulation, store it in that simulation directory.
* If a visualization uses multiple simulations, store it in the lowest-level directory that contains all relevant simulations.
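Both rules above amount to taking the deepest directory shared by all simulations involved, which the standard library's `os.path.commonpath` computes directly. A small sketch follows; the function name `visualization_home` is hypothetical.

```python
import os


def visualization_home(simulation_dirs: list[str]) -> str:
    """Return the deepest directory that contains every given simulation.

    With a single simulation this is that simulation's own directory;
    with several it is their lowest common parent.
    """
    return os.path.commonpath(simulation_dirs)


if __name__ == "__main__":
    runs = [
        "results/TypeOne/output_202310250824",
        "results/TypeOne/output_202310250825",
    ]
    print(visualization_home(runs))
```

All paths passed in should be of the same kind (all relative or all absolute), since `os.path.commonpath` raises `ValueError` when they are mixed.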

Name scripts so their outputs are obvious. For example, `plot_eta.py` should generate `plot_eta.pdf`.

Avoid generic script names such as:

```
analysis.py
get_x.py
randomfunctions.py
```

Avoid disconnected output names such as:

```
eta.pdf
myplot.pdf
output.pdf
```
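The naming rule can be spot-checked by looking for PDFs that have no script or notebook of the same name beside them. This is a first-pass sketch with a hypothetical function name; it assumes outputs are PDFs and that each script sits in the same directory as its output, so a script that writes several files or lives one level up would need a looser check.

```python
from pathlib import Path


def orphaned_outputs(directory: Path) -> list[str]:
    """List PDFs in `directory` with no same-named .py or .ipynb beside them."""
    # Base names of every script or notebook in the directory.
    script_stems = {
        path.stem for path in directory.iterdir()
        if path.suffix in {".py", ".ipynb"}
    }
    # PDFs whose base name matches no script are reported.
    return sorted(
        path.name for path in directory.iterdir()
        if path.suffix == ".pdf" and path.stem not in script_stems
    )


if __name__ == "__main__":
    print(orphaned_outputs(Path(".")))
```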

If a project cannot follow the standard organization, add a `README` explaining the structure. Even then, the results directory must contain everything needed to regenerate the results, along with the outputs derived from them.

## Journal Submission

Some journals do not support complex LaTeX directory structures. In those cases, a submission-specific fork may be used to move or rename files for journal requirements.

The archival research structure should remain intact.

After acceptance, rename the repository to match the paper's BibTeX tag, following the `authorlastnameYYYYfirstword` pattern.

## Archiving

Most sponsored projects require data archiving. The group uses ISU Large Scale Storage for long-term archival storage.

Before archiving, compress large simulation data when possible. For example:

```bash
# Bundle every *cell directory into one gzip-compressed archive
tar -czvf output.tar.gz *cell
```

Store the compressed output in the same simulation folder before transferring it to archival storage.
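When compression is part of a Python postprocessing workflow, the same step can be done with the standard `tarfile` module. The sketch below assumes the raw data sits in directories matching `*cell` inside the simulation folder; the function name `compress_cell_dirs` is hypothetical.

```python
import tarfile
from pathlib import Path


def compress_cell_dirs(sim_dir: Path, archive_name: str = "output.tar.gz") -> Path:
    """Pack every `*cell` directory in `sim_dir` into one gzip-compressed tarball."""
    archive_path = sim_dir / archive_name
    with tarfile.open(archive_path, "w:gz") as archive:
        for cell_dir in sorted(sim_dir.glob("*cell")):
            # arcname stores paths relative to sim_dir, e.g. "00000cell/..."
            archive.add(cell_dir, arcname=cell_dir.name)
    return archive_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sim = Path(tmp)
        (sim / "00000cell").mkdir()
        (sim / "00000cell" / "data.txt").write_text("example")
        print(compress_cell_dirs(sim).name)
```

The function leaves the original directories in place, so they can be removed only after the archive has been verified and transferred.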

## Example Workflow

Suppose you are working on `PaperLargeDeformationElastic`, which builds on Smith et al. 2015.

1. Find the BibTeX tag for the earlier paper, such as `smith2015novel`.
2. Check out the `smith2015novel` repository from Overleaf or GitHub.
3. In the LaTeX source, find where the relevant figure is included:

```latex
\includegraphics{results/DamageModeling/output.20192930103/plot.pdf}
```

4. Go to `results/DamageModeling/output.20192930103/` and inspect the metadata.
5. If rerunning the simulation is too expensive, locate the archived file:

```
smith2015novel/results/DamageModeling/output.20192930103/output.tar.gz
```

6. Copy the archive into the local clone, extract it, regenerate the plot using the local postprocessing script, and confirm that it matches the published figure.
7. Copy the needed input files and results into the corresponding folder in your new paper repository.
8. Add a note in the `README` explaining that the results derive from the previous work.

Good data organization prevents the original author from becoming a future bottleneck and makes prior work easier to understand, reproduce, and extend.

