Developer Tools 12 min read February 19, 2025

File Types Every Developer Should Know — And How AI is Changing the Landscape

A practical guide to the file types that power modern software development, from source code to config files to data formats, and how the AI era is introducing new ones you need to know.

DevForge Team

AI Development Educators

Files Are the Language of Software

Every piece of software you build is a collection of files. Source code, configuration, assets, data, documentation — all of it lives in files. Understanding what those files are, what they contain, and how they relate to each other is foundational knowledge that accelerates everything else you learn.

This guide covers the file types you'll encounter in modern development, organized by purpose, with a look at how the AI era is introducing new formats and changing how we work with existing ones.

Source Code Files

Source code files contain human-readable instructions in a programming language. The file extension is a naming convention that signals to editors, build tools, and (on some platforms) the operating system which language the file contains.

Common Source Code Extensions

| Extension | Language | Notes |
|-----------|----------|-------|
| .py | Python | Interpreted, runs via the Python interpreter |
| .js | JavaScript | Runs in browsers and Node.js |
| .ts | TypeScript | Compiled to JavaScript before running |
| .tsx / .jsx | TypeScript/JavaScript with JSX | React component files |
| .java | Java | Compiled to .class bytecode, runs on JVM |
| .kt | Kotlin | JVM language, modern alternative to Java |
| .cs | C# | .NET language, compiled to CIL bytecode |
| .c / .cpp | C / C++ | Compiled to machine code |
| .rs | Rust | Systems language, compiled to machine code |
| .go | Go | Compiled to machine code, single binary output |
| .swift | Swift | Apple platforms |
| .rb | Ruby | Interpreted, popular for web (Rails) |
| .php | PHP | Server-side web scripting |
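
As a rough illustration of how a tool might use extensions, the sketch below maps a file's suffix to a language name. The `LANGUAGE_BY_EXTENSION` table and the `guess_language` helper are hypothetical names that cover only a few rows of the table above; real editors and build tools use much larger mappings and often inspect file contents as well.

```python
from pathlib import Path

# Hypothetical lookup table covering a few rows of the table above.
LANGUAGE_BY_EXTENSION = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".rs": "Rust",
    ".go": "Go",
}


def guess_language(filename: str) -> str:
    """Return a language name based only on the file extension."""
    suffix = Path(filename).suffix.lower()
    return LANGUAGE_BY_EXTENSION.get(suffix, "unknown")


print(guess_language("src/main.rs"))  # Rust
print(guess_language("README"))       # unknown
```

A file with no extension falls through to "unknown", which is a reminder that the extension is a hint rather than a guarantee.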

How AI is Changing Source Files

AI coding tools now generate, modify, and review source files on your behalf. This is shifting what "writing code" means — developers increasingly write natural language instructions and review AI-generated code rather than typing every character.

More significantly, AI has given rise to new file formats designed for human-AI collaboration. A .cursorrules file, for example, gives Cursor's AI assistant project-specific instructions (newer Cursor versions also read rules from a .cursor/rules directory). .github/copilot-instructions.md customizes GitHub Copilot's behavior for your codebase. In projects that use these tools, such "AI instruction files" are becoming as routine as .gitignore.

Web Asset Files

Web development involves managing files across several categories:

Markup and Style

  • .html — HyperText Markup Language, the structure of web pages
  • .css — Cascading Style Sheets, the presentation layer
  • .scss / .sass — CSS preprocessors with variables and nesting, compiled to .css
  • .less — Another CSS preprocessor

JavaScript Module Formats

  • .mjs — ES Module JavaScript (explicit)
  • .cjs — CommonJS JavaScript (Node.js style)
  • .min.js — Minified JavaScript (a naming convention rather than a module format: whitespace removed and variable names shortened for production)

Template Files

  • .jsx / .tsx — React templates mixing JavaScript and HTML-like syntax
  • .vue — Vue.js single-file components (template + script + style in one file)
  • .svelte — Svelte components

Configuration Files

Configuration files control the behavior of your tools, build system, and runtime environment. Every serious project has a collection of them.

Universal Config Formats

.json (JavaScript Object Notation) — The universal data format for configuration and APIs. Human-readable, machine-parseable, and supported by virtually every language. Note that standard JSON does not allow comments.

```json
{
  "name": "my-project",
  "version": "1.0.0",
  "dependencies": {
    "react": "^18.0.0"
  }
}
```
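
To show what "machine-parseable" means in practice, here is a minimal sketch that loads the same document with Python's standard `json` module. The `raw` string stands in for the contents of a file on disk.

```python
import json

# The same document as above, held in a string for the sake of the example.
raw = """
{
  "name": "my-project",
  "version": "1.0.0",
  "dependencies": {
    "react": "^18.0.0"
  }
}
"""

config = json.loads(raw)  # JSON object -> Python dict
print(config["name"])                   # my-project
print(config["dependencies"]["react"])  # ^18.0.0

# Serializing back to text; indent=2 keeps the output human-readable.
print(json.dumps(config, indent=2))
```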

.yaml / .yml (YAML Ain't Markup Language) — Popular for DevOps configuration (Docker Compose, Kubernetes, GitHub Actions, and other CI/CD pipelines). Often easier to read than JSON for deeply nested structures because it uses indentation rather than brackets and allows comments.

```yaml
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
  database:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: secret
```

.toml (Tom's Obvious Minimal Language) — Used by Rust's Cargo, Python's pyproject.toml, and increasingly in other tools. Similar in purpose to YAML, but with explicit value types and no reliance on significant indentation.

.env — Environment variable files. Key-value pairs that configure an application for a specific environment (development, staging, production). They should never be committed to version control because they often contain secrets such as API keys and database passwords; list them in .gitignore instead.
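
The format is simple enough to sketch a reader in a few lines. The `parse_env` function below is a hypothetical helper that handles only plain KEY=VALUE lines, blank lines, and `#` comments; it ignores details such as `export` prefixes, inline comments, and multi-line values, which is why real projects usually rely on a library such as python-dotenv. All keys and values in the sample are invented.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical development settings; none of these values are real.
sample = """
# Local development only
DATABASE_URL=postgres://localhost:5432/app
API_KEY="not-a-real-key"
DEBUG=true
"""

settings = parse_env(sample)
print(settings["DATABASE_URL"])  # postgres://localhost:5432/app
print(settings["API_KEY"])       # not-a-real-key
```

Every value comes back as a string, so a setting like DEBUG still needs explicit conversion before use.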

Language-Specific Config

  • package.json — Node.js project configuration, dependencies, scripts
  • tsconfig.json — TypeScript compiler configuration
  • pyproject.toml / requirements.txt — Python project configuration and dependencies
  • Cargo.toml — Rust project configuration
  • go.mod — Go module definition
  • pom.xml — Maven (Java) project configuration
  • .eslintrc / .eslintrc.json — JavaScript/TypeScript linting rules (newer ESLint versions default to eslint.config.js)
  • .prettierrc — Code formatter configuration

Data Files

Data files store structured information consumed by applications.

JSON

The dominant format for web APIs and configuration. When you fetch data from a REST API, the response body is almost always JSON.

CSV (Comma-Separated Values)

The workhorse of data exchange. Spreadsheets export to CSV. Data scientists read CSV files. Databases export to CSV. Simple, universal, works everywhere.
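
Reading CSV takes very little code with Python's standard `csv` module. In this sketch the data is invented and `io.StringIO` stands in for an open file.

```python
import csv
import io

# Illustrative data; io.StringIO stands in for an open file.
raw = "name,language,stars\nalpha,Python,120\nbeta,Rust,85\n"

# DictReader uses the first line as column names.
rows = list(csv.DictReader(io.StringIO(raw)))

# CSV has no types: every value is read as a string, so convert explicitly.
total_stars = sum(int(row["stars"]) for row in rows)

print(rows[0]["name"])  # alpha
print(total_stars)      # 205
```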

XML (Extensible Markup Language)

The dominant interchange format before JSON took over. Still prevalent in enterprise systems, Android resources, Microsoft Office documents (.docx and .xlsx files are zipped XML internally), and SOAP web services.
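
For a sense of what working with XML looks like, the sketch below parses a small Android-style string resource document with Python's standard `xml.etree.ElementTree` module. The resource names and values are invented.

```python
import xml.etree.ElementTree as ET

# An Android-style string resource file, invented for illustration.
raw = """<resources>
    <string name="app_name">My App</string>
    <string name="greeting">Hello</string>
</resources>"""

root = ET.fromstring(raw)

# Attributes are read with .get(), element text with .text.
strings = {el.get("name"): el.text for el in root.findall("string")}

print(root.tag)             # resources
print(strings["app_name"])  # My App
```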

SQLite (.db / .sqlite)

A self-contained database in a single file. Used by mobile apps (both iOS and Android ship with SQLite), desktop applications, and many development tools. No server required: the entire database is one file you can copy, move, or back up like any other.
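
The "one file, no server" point can be sketched with Python's built-in `sqlite3` module. The `notes.db` file name and the `notes` table are hypothetical, and the file is created in a temporary directory so the example leaves your project untouched.

```python
import os
import sqlite3
import tempfile

# Hypothetical database path inside a temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "notes.db")

conn = sqlite3.connect(db_path)  # creates the file if it does not exist
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("first note",))
conn.commit()

rows = conn.execute("SELECT id, body FROM notes").fetchall()
conn.close()

print(rows)                     # [(1, 'first note')]
print(os.path.exists(db_path))  # True: the whole database is this one file
```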

Parquet

A columnar data format used in data engineering and analytics. Stores data in columns rather than rows, making it highly efficient for queries that only need certain columns from large datasets. You'll encounter it in data lakes on AWS, in Google BigQuery imports and exports, and in most large-scale data pipelines.
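
The difference between row and column layouts can be sketched in plain Python. This is not Parquet itself (reading real Parquet files needs a library such as pyarrow); it only illustrates why a column layout helps when a query needs one field. The sample records are invented.

```python
# Row-oriented layout: one record per entry, as in CSV or JSON Lines.
rows = [
    {"user": "ana", "country": "PT", "spend": 12.5},
    {"user": "ben", "country": "US", "spend": 40.0},
    {"user": "cho", "country": "KR", "spend": 7.25},
]

# Column-oriented layout: one list per field, the idea behind Parquet.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Summing one field reads only that column; "user" and "country" are never touched.
total_spend = sum(columns["spend"])

print(columns["country"])  # ['PT', 'US', 'KR']
print(total_spend)         # 59.75
```

On a real dataset with hundreds of columns, skipping the unused ones is where most of the savings come from.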

AI-Specific File Types: The New Landscape

The AI era is generating a wave of new file formats. Understanding these is increasingly important.

Model Weights: .gguf, .safetensors, .pt, .bin

AI model files store the learned parameters (weights) of neural networks. These are binary files, often gigabytes in size.

  • .pt / .pth — PyTorch's native checkpoint format (PyTorch being the dominant framework for AI research); it is based on Python's pickle, so only load files from sources you trust
  • .safetensors — A safer, faster alternative to pickle-based formats that stores raw tensors without executable code, now standard on Hugging Face
  • .gguf — Format used by llama.cpp for running large language models locally on consumer hardware; it is what tools such as Ollama, LM Studio, and Jan.ai load
  • .bin — Generic binary extension used by older Hugging Face models, typically holding pickled PyTorch weights

Embeddings and Vector Data

Embeddings — numerical representations of text, images, or other data used in semantic search — are stored in various formats:

  • .npy / .npz — NumPy array formats, standard for storing embedding matrices
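  • Vector stores use their own storage: Faiss writes binary index files, Chroma persists to a local directory, and Pinecone is a managed service that keeps vectors server-side rather than in files you handle directly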
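
To make "semantic search" concrete, here is a minimal sketch of the comparison at its core: cosine similarity between a query vector and stored vectors. The three-dimensional vectors and their labels are made up; real embeddings come from a model and have hundreds or thousands of dimensions, and real systems use NumPy or a vector index rather than plain lists.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for three documents.
documents = {
    "cats": [0.9, 0.1, 0.0],
    "cooking": [0.1, 0.9, 0.1],
    "tax law": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # hypothetical embedding of a query about kittens

# The best match is the document whose vector points most nearly the same way.
best_match = max(documents, key=lambda name: cosine_similarity(query, documents[name]))
print(best_match)  # cats
```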

Prompt and Agent Configuration Files

A growing set of new file types exists specifically to configure AI systems:

  • .md (Markdown) used as prompts — System prompts stored as Markdown files for version control and reuse
  • JSONL (.jsonl) — JSON Lines format, where each line is a separate JSON object. The standard format for training data and fine-tuning datasets: each conversation example for fine-tuning an LLM is typically stored as one line of a JSONL file.

```jsonl
{"messages": [{"role": "user", "content": "What is Python?"}, {"role": "assistant", "content": "Python is a programming language..."}]}
{"messages": [{"role": "user", "content": "Explain loops"}, {"role": "assistant", "content": "Loops repeat code..."}]}
```
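
Because every line is an independent JSON document, reading and writing JSONL needs nothing beyond Python's standard `json` module. The sketch below uses shortened versions of the records above, held in a string rather than a file.

```python
import json

# Two records in the same shape as the example above, shortened for brevity.
raw = (
    '{"messages": [{"role": "user", "content": "What is Python?"}]}\n'
    '{"messages": [{"role": "user", "content": "Explain loops"}]}\n'
)

# Reading: parse each non-empty line on its own.
records = [json.loads(line) for line in raw.splitlines() if line.strip()]

# Writing: one compact JSON object per line, with no enclosing array.
serialized = "".join(json.dumps(record) + "\n" for record in records)

print(len(records))                               # 2
print(records[1]["messages"][0]["content"])       # Explain loops
```

Since no line depends on another, large datasets can be streamed or appended to without loading the whole file.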

Documentation and Markup Files

Markdown (.md)

The near-universal format for documentation in software projects. README files, wikis, documentation sites, and now AI instruction files are all written in Markdown. GitHub renders Markdown automatically, making it the default for project documentation.

```markdown
# Project Name

A brief description of what this project does.

## Installation

    npm install my-package
```

MDX (.mdx)

Markdown with embedded JSX (React components). Used by documentation sites built with Next.js, Gatsby, and Docusaurus to create interactive documentation with live code examples.

reStructuredText (.rst)

An older alternative to Markdown, common in the Python ecosystem. Python's official documentation is written in .rst.

Build and Compiled Output Files

These are files generated by your build process — you don't edit them directly.

  • .class — Compiled Java bytecode
  • .o / .obj — Compiled object files (intermediate step before linking)
  • .exe / .dll (Windows) / .so (Linux) / .dylib (Mac) — Compiled executables and libraries
  • .wasm — WebAssembly, the compiled binary format that runs in browsers at near-native speed
  • .map (source maps) — Files that map minified/compiled JavaScript back to original source for debugging
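
Because these are binary formats, tools usually recognize them by their leading "magic" bytes rather than by extension. The sketch below checks a few well-known signatures; the `MAGIC_BYTES` table and `sniff` helper are hypothetical names, and the list is far from complete (for instance, the Java class signature is also used by macOS universal binaries).

```python
# Leading "magic" bytes for a few of the formats listed above.
MAGIC_BYTES = {
    b"\x00asm": "WebAssembly module (.wasm)",
    b"\xca\xfe\xba\xbe": "Java class file (.class)",
    b"\x7fELF": "ELF binary (Linux executable or .so)",
    b"MZ": "Windows PE file (.exe / .dll)",
}


def sniff(header: bytes) -> str:
    """Identify a file from its first bytes, ignoring its extension."""
    for magic, description in MAGIC_BYTES.items():
        if header.startswith(magic):
            return description
    return "unknown"


# A .wasm file starts with the magic number followed by version 1.
print(sniff(b"\x00asm\x01\x00\x00\x00"))  # WebAssembly module (.wasm)
print(sniff(b"plain text"))               # unknown
```

In practice you would pass in the first few bytes read from a file opened in binary mode.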

Version Control Files

  • .gitignore — Lists files and directories Git should not track (node_modules, .env, build output)
  • .gitattributes — Configures how Git handles specific file types (line endings, diff display, merge behavior)
  • .git/ — The hidden directory containing the entire Git history for a repository

The Shift Ahead: AI-Native File Formats

The AI era is not just adding new file types; it is also changing the meaning of existing ones. A .md file used to be documentation. Now it might be an AI system prompt. A .json file used to be configuration. Now its line-delimited variant, .jsonl, might hold a fine-tuning dataset.

More broadly, the relationship between files and meaning is shifting. As AI systems become capable of reading and writing code, the file becomes an interface between human intent and machine action in a way it never was before. Your .cursorrules file communicates your preferences to an AI. Your system prompt in a .md file defines the personality of a deployed AI assistant.

Understanding file types is no longer just about knowing what application opens them. It's about understanding how they participate in a broader system that now includes AI as an active participant in the development process.

For a deeper look at how these tools work together in practice, explore our tutorials on prompt engineering, working with AI APIs, and building RAG pipelines to see these file types in action.
