File Types Every Developer Should Know — And How AI is Changing the Landscape
A practical guide to the file types that power modern software development, from source code to config files to data formats, and how the AI era is introducing new ones you need to know.

DevForge Team
AI Development Educators

Files Are the Language of Software
Every piece of software you build is a collection of files. Source code, configuration, assets, data, documentation — all of it lives in files. Understanding what those files are, what they contain, and how they relate to each other is foundational knowledge that accelerates everything else you learn.
This guide covers the file types you'll encounter in modern development, organized by purpose, with a look at how the AI era is introducing new formats and changing how we work with existing ones.
Source Code Files
Source code files contain human-readable instructions in a programming language. The file extension tells the operating system and tools what language the file contains.
Common Source Code Extensions
| Extension | Language | Notes |
|-----------|----------|-------|
| .py | Python | Interpreted, runs via the Python interpreter |
| .js | JavaScript | Runs in browsers and Node.js |
| .ts | TypeScript | Compiled to JavaScript before running |
| .tsx / .jsx | TypeScript/JavaScript with JSX | React component files |
| .java | Java | Compiled to .class bytecode, runs on JVM |
| .kt | Kotlin | JVM language, modern alternative to Java |
| .cs | C# | .NET language, compiled to CIL bytecode |
| .c / .cpp | C / C++ | Compiled to machine code |
| .rs | Rust | Systems language, compiled to machine code |
| .go | Go | Compiled to machine code, single binary output |
| .swift | Swift | Compiled; primarily Apple platforms, also supported on Linux |
| .rb | Ruby | Interpreted, popular for web (Rails) |
| .php | PHP | Server-side web scripting |
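Tools generally decide how to treat a file by looking at its extension, not its contents. A minimal Python sketch of that dispatch, assuming a small hypothetical lookup table (`EXTENSION_TO_LANGUAGE` and `detect_language` are illustrative names, not part of any real tool):

```python
from pathlib import Path

# Hypothetical lookup table mirroring a few rows of the table above.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".mjs": "JavaScript",
    ".cjs": "JavaScript",
    ".ts": "TypeScript",
    ".tsx": "TypeScript (JSX)",
    ".jsx": "JavaScript (JSX)",
    ".rs": "Rust",
    ".go": "Go",
}


def detect_language(filename: str) -> str:
    """Guess a file's language from its final extension only."""
    # Path.suffix returns only the last extension, e.g. ".js" for "app.min.js".
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix, "unknown")


print(detect_language("main.py"))     # Python
print(detect_language("app.min.js"))  # JavaScript
print(detect_language("Button.TSX"))  # TypeScript (JSX)
print(detect_language("README"))      # unknown
```

Because only the final suffix is checked, `app.min.js` resolves to JavaScript, and renaming a file changes what tools think it is without changing a single byte inside it.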
How AI is Changing Source Files
AI coding tools now generate, modify, and review source files on your behalf. This is shifting what "writing code" means — developers increasingly write natural language instructions and review AI-generated code rather than typing every character.
More significantly, AI has given rise to entirely new kinds of files designed for human-AI collaboration. A .cursorrules file (or, in newer versions, rule files in a .cursor/rules/ directory), for example, tells Cursor's AI assistant how to understand your project. .github/copilot-instructions.md customizes GitHub Copilot's behavior for your codebase. These "AI instruction files" are mostly plain text or Markdown, and they are quickly becoming as common in repositories as .gitignore.
Web Asset Files
Web development involves managing files across several categories:
Markup and Style
- .html — HyperText Markup Language, the structure of web pages
- .css — Cascading Style Sheets, the presentation layer
- .scss / .sass — The two syntaxes of the Sass preprocessor, which adds variables, nesting, and mixins; compiled to .css
- .less — Another CSS preprocessor
JavaScript Module Formats
- .mjs — ES Module JavaScript (explicit)
- .cjs — CommonJS JavaScript (Node.js style)
- .min.js — Minified JavaScript (whitespace removed, variable names shortened for production)
Template Files
- .jsx / .tsx — React templates mixing JavaScript and HTML-like syntax
- .vue — Vue.js single-file components (template + script + style in one file)
- .svelte — Svelte components
Configuration Files
Configuration files control the behavior of your tools, build system, and runtime environment. Every serious project has a collection of them.
Universal Config Formats
.json (JavaScript Object Notation) — The universal data format for configuration and APIs. Human-readable, machine-parseable, and supported by every language.
```json
{
  "name": "my-project",
  "version": "1.0.0",
  "dependencies": {
    "react": "^18.0.0"
  }
}
```

.yaml / .yml (YAML Ain't Markup Language) — Popular for DevOps configuration (Docker Compose, Kubernetes, GitHub Actions, CI/CD pipelines). More readable than JSON for complex nested structures because it uses indentation rather than brackets.
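```yaml
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
  database:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: secret  # placeholder; load real secrets from the environment or an .env file
```

.toml (Tom's Obvious, Minimal Language) — Used by Rust's Cargo, Python's pyproject.toml, and increasingly in other tools. Like YAML it is easy to read, but it has explicit types and does not depend on indentation at all; structure comes from [section] headers.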
.env — Environment variable files. Key-value pairs that configure an application for a specific environment (development, staging, production). They should never be committed to version control because they often contain secrets like API keys and database passwords; list them in .gitignore instead.
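Libraries such as python-dotenv normally load .env files for you, but the format itself is simple. A minimal parser sketch, assuming only plain `KEY=value` lines, `#` comments, and optional surrounding quotes (`parse_env` is a hypothetical helper, and the sample values are fake):

```python
def parse_env(text: str) -> dict[str, str]:
    """Minimal .env parser: KEY=value lines, '#' comments, optional quotes.

    A sketch only: it ignores variable expansion, multi-line values, and
    'export' prefixes that real loaders may support.
    """
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            continue  # not a KEY=value line
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


SAMPLE = """
# Local development settings (fake values)
DATABASE_URL=postgres://localhost:5432/dev
API_KEY="sk-test-not-a-real-key"
DEBUG=true
"""

env = parse_env(SAMPLE)
print(env["API_KEY"])  # sk-test-not-a-real-key
```

Every value comes back as a string (`DEBUG` is `"true"`, not a boolean), so convert types where the application reads them.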
Language-Specific Config
- package.json — Node.js project configuration, dependencies, scripts
- tsconfig.json — TypeScript compiler configuration
- pyproject.toml / requirements.txt — Python project configuration and dependencies
- Cargo.toml — Rust project configuration
- go.mod — Go module definition
- pom.xml — Maven (Java) project configuration
- .eslintrc / .eslintrc.json — JavaScript/TypeScript linting rules (the legacy format; ESLint 9 uses eslint.config.js by default)
- .prettierrc — Code formatter configuration
Data Files
Data files store structured information consumed by applications.
JSON
The dominant format for web APIs and configuration. When you fetch data from a REST API, you're almost always receiving JSON.
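Python's built-in `json` module converts between JSON text and native objects. A short sketch with a made-up response body:

```python
import json

# A made-up API response body, as it might arrive over HTTP.
response_body = (
    '{"user": {"id": 42, "name": "Ada"}, "active": true, '
    '"tags": ["admin", "beta"], "manager": null}'
)

data = json.loads(response_body)  # JSON text -> Python objects

# JSON types map onto Python types: object->dict, array->list,
# true/false->True/False, null->None.
print(data["user"]["name"])  # Ada
print(data["active"])        # True
print(data["manager"])       # None

# Serialize back to JSON text, e.g. for a request body or a config file.
print(json.dumps({"id": 7, "done": False}))  # {"id": 7, "done": false}
```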
CSV (Comma-Separated Values)
The workhorse of data exchange. Spreadsheets export to CSV. Data scientists read CSV files. Databases export to CSV. Simple, universal, works everywhere.
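"Simple" hides one trap: fields may be quoted and contain commas, so splitting lines on `,` breaks on real data. A sketch using Python's standard `csv` module on an inline sample (the data is invented):

```python
import csv
import io

# Inline CSV standing in for a spreadsheet export; note the quoted field
# that itself contains a comma.
raw = """name,city,score
Ada,London,91
Grace,"Arlington, VA",88
"""

reader = csv.DictReader(io.StringIO(raw))  # first row becomes the keys
rows = list(reader)

print(rows[1]["city"])  # Arlington, VA

# Every value comes back as a string; convert types yourself.
scores = [int(row["score"]) for row in rows]
print(sum(scores) / len(scores))  # 89.5
```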
XML (Extensible Markup Language)
Before JSON took over, XML was the dominant format for structured data exchange. It is still prevalent in enterprise systems, Android resources, modern Microsoft Office documents (.docx, .xlsx, and .pptx files are zipped XML internally), and SOAP web services.
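Python ships an XML parser in `xml.etree.ElementTree`. The sketch below walks a small, invented order document. The Python documentation warns that this module is not secure against maliciously constructed data, so untrusted XML calls for a hardened parser such as defusedxml.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document resembling an enterprise data feed.
doc = """
<orders>
  <order id="1001" status="shipped">
    <item sku="A-1" qty="2"/>
    <item sku="B-7" qty="1"/>
  </order>
  <order id="1002" status="pending">
    <item sku="A-1" qty="5"/>
  </order>
</orders>
"""

root = ET.fromstring(doc.strip())

for order in root.findall("order"):
    # Attributes are read with .get(); child elements with findall().
    total_qty = sum(int(item.get("qty")) for item in order.findall("item"))
    print(order.get("id"), order.get("status"), total_qty)
# 1001 shipped 3
# 1002 pending 5
```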
SQLite (.db / .sqlite)
A self-contained database in a single file. Used by mobile apps (both iOS and Android ship with SQLite, and many apps keep their local data in it), desktop applications, and many development tools. No server required: the entire database is one file you can copy, move, or version control.
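The single-file model is easy to see with Python's built-in `sqlite3` module. A sketch that writes to a temporary file (the `notes` table is hypothetical), closes the connection, and reopens the same file:

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# The entire database lives in this one file (a throwaway temp path here).
db_path = os.path.join(tempfile.mkdtemp(), "notes.db")

with closing(sqlite3.connect(db_path)) as conn:  # creates the file if missing
    with conn:  # transaction: commits on success, rolls back on error
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
        conn.executemany(
            "INSERT INTO notes (body) VALUES (?)",  # placeholders, never string formatting
            [("buy milk",), ("ship v1.0",)],
        )

# Reopen the same file later: the data persisted with no server process involved.
with closing(sqlite3.connect(db_path)) as conn:
    rows = conn.execute("SELECT id, body FROM notes ORDER BY id").fetchall()

print(rows)  # [(1, 'buy milk'), (2, 'ship v1.0')]
```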
Parquet (.parquet)
A columnar data format used in data engineering and analytics. Stores data in columns rather than rows, making it highly efficient for queries that only need certain columns from large datasets. You'll encounter it in AWS services such as Athena, in Google BigQuery, and in most large-scale data pipelines.
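A toy illustration of why column layout helps, in plain Python. This is not Parquet itself, which adds compression, encodings, row groups, and file metadata on top of the idea; the data is invented:

```python
# Toy row vs. column layout; NOT the Parquet file format.
rows = [
    {"user_id": 1, "country": "DE", "amount": 30.0},
    {"user_id": 2, "country": "US", "amount": 12.5},
    {"user_id": 3, "country": "DE", "amount": 7.5},
]

# Row-oriented (like CSV): each record is stored together, so on disk,
# reading one field means reading every whole record.
row_total = sum(r["amount"] for r in rows)

# Column-oriented (like Parquet): each field is stored contiguously.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# A query over one column reads just that list and never touches the others.
col_total = sum(columns["amount"])

print(row_total, col_total)  # 50.0 50.0
```

Values in a column also share a type and often repeat (`"DE"` appears twice), which is a large part of why columnar files compress well.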
AI-Specific File Types: The New Landscape
The AI era is generating a wave of new file formats. Understanding these is increasingly important.
Model Weights: .gguf, .safetensors, .pt, .bin
AI model files store the learned parameters (weights) of neural networks. These are binary files, often gigabytes in size.
- .pt / .pth — PyTorch model format, the dominant framework for AI research
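- .safetensors — A safer, faster alternative to PyTorch's pickle-based format: the file holds a JSON header plus raw tensor bytes, so loading it cannot execute arbitrary code. Now standard on Hugging Face
- .gguf — Format used by llama.cpp for running large language models locally on consumer hardware, usually in quantized form; what you download when running Ollama, LM Studio, or Jan.ai
- .bin — Generic extension used by older Hugging Face checkpoints (e.g., pytorch_model.bin, saved with PyTorch's pickle-based serialization)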
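The "safer" in safetensors comes from its layout: an 8-byte little-endian header length, a JSON header describing each tensor, then raw bytes, with no pickled Python objects to run. A stdlib-only sketch that builds a tiny payload in memory and reads it back (the tensor name `w` and both helper names are illustrative; in practice you would use the `safetensors` package):

```python
import json
import struct


def build_tiny_safetensors() -> bytes:
    """Hand-build a one-tensor payload: a 2x2 float32 tensor named "w"."""
    data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # 16 raw little-endian bytes
    header = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    # Layout: 8-byte little-endian header length, JSON header, tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data


def read_safetensors_header(blob: bytes) -> tuple[dict, int]:
    """Return (tensor descriptions, offset where the data buffer starts)."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional string-to-string metadata
    return header, 8 + header_len


blob = build_tiny_safetensors()
tensors, data_start = read_safetensors_header(blob)

# data_offsets are relative to the start of the data buffer.
begin, end = tensors["w"]["data_offsets"]
values = struct.unpack("<4f", blob[data_start + begin : data_start + end])

print(tensors["w"]["shape"], tensors["w"]["dtype"])  # [2, 2] F32
print(values)  # (1.0, 2.0, 3.0, 4.0)
```

Reading the header alone is cheap, which is how tools can list a multi-gigabyte model's tensors without loading the weights.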
Embeddings and Vector Data
Embeddings — numerical representations of text, images, or other data used in semantic search — are stored in various formats:
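- .npy / .npz — NumPy array formats, a common way to store embedding matrices on disk
- Vector stores use their own on-disk formats: Faiss writes binary index files, Chroma persists to a local directory (SQLite plus index files), and hosted services such as Pinecone keep the data on their servers rather than in files you manage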
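Semantic search over stored embeddings usually comes down to comparing vectors with cosine similarity. A brute-force sketch in plain Python, assuming tiny invented 3-dimensional vectors (real embeddings come from a model, are loaded from a .npy file or a vector store, and have hundreds or thousands of dimensions):

```python
import math

# Hypothetical 3-dimensional "embeddings"; the numbers are invented.
documents = {
    "How do I install Python?": [0.9, 0.1, 0.0],
    "Best pasta recipes": [0.0, 0.2, 0.95],
    "Setting up a Python virtualenv": [0.85, 0.25, 0.05],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_vec: list[float], k: int = 2) -> list[str]:
    """Brute-force nearest neighbours: score every document, keep the top k."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, documents[doc]),
        reverse=True,
    )
    return ranked[:k]


query = [0.88, 0.15, 0.02]  # pretend embedding of "python setup help"
print(search(query))  # ['How do I install Python?', 'Setting up a Python virtualenv']
```

Vector databases exist because this linear scan gets slow at millions of vectors; they build indexes that find near neighbours approximately.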
Prompt and Agent Configuration Files
A growing set of new file types exists specifically to configure AI systems:
- .md (Markdown) used as prompts — System prompts stored as Markdown files for version control and reuse
- JSONL (.jsonl) — JSON Lines format, where each line is a separate JSON object. A common format for training data and fine-tuning datasets: each example, such as one conversation, occupies a single line, so an entire dataset is one JSONL file.
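Because each record is one line, a JSONL file can be streamed and appended to without parsing the whole file. A sketch of a reader that reports which line is malformed, using two invented examples in the same `messages` layout as the sample lines in this section (`read_jsonl` is a hypothetical helper):

```python
import io
import json

# Two made-up training examples, one JSON object per line.
jsonl_text = (
    '{"messages": [{"role": "user", "content": "What is Python?"}, '
    '{"role": "assistant", "content": "Python is a programming language."}]}\n'
    '{"messages": [{"role": "user", "content": "Explain loops"}, '
    '{"role": "assistant", "content": "Loops repeat code."}]}\n'
)


def read_jsonl(stream):
    """Yield one parsed object per non-blank line."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the line number so bad records are easy to find.
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc


examples = list(read_jsonl(io.StringIO(jsonl_text)))
print(len(examples))  # 2

# Appending a record means writing one more line; existing lines stay untouched.
buffer = io.StringIO()
new_example = {"messages": [{"role": "user", "content": "Hi"},
                            {"role": "assistant", "content": "Hello!"}]}
buffer.write(json.dumps(new_example) + "\n")
print(buffer.getvalue().count("\n"))  # 1
```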
```jsonl
{"messages": [{"role": "user", "content": "What is Python?"}, {"role": "assistant", "content": "Python is a programming language..."}]}
{"messages": [{"role": "user", "content": "Explain loops"}, {"role": "assistant", "content": "Loops repeat code..."}]}
```

Documentation and Markup Files
Markdown (.md)
The near-universal format for documentation in software projects. README files, wikis, documentation sites, and now AI instruction files are all written in Markdown. GitHub renders Markdown automatically, making it the default for project documentation.
````markdown
# Project Name

A brief description of what this project does.

## Installation

```bash
npm install my-package
```
````
MDX (.mdx)
Markdown with embedded JSX (React components). Used by documentation sites built with Next.js, Gatsby, and Docusaurus to create interactive documentation with live code examples.
reStructuredText (.rst)
Markdown's older alternative, common in the Python ecosystem. Python's official documentation is written in .rst.
Build and Compiled Output Files
These are files generated by your build process — you don't edit them directly.
- .class — Compiled Java bytecode
- .o / .obj — Compiled object files (intermediate step before linking)
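- .exe (Windows executables) and .dll (Windows) / .so (Linux) / .dylib (Mac) shared libraries — Compiled programs and libraries; Linux and macOS executables usually have no extension
- .wasm — WebAssembly, the compiled binary format that runs in browsers at near-native speed, and also in runtimes such as Node.js and Wasmtime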
- .map (source maps) — Files that map minified/compiled JavaScript back to original source for debugging
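Source maps are JSON files (version 3 of the format is the one in common use) whose `mappings` string packs positions as Base64 VLQ numbers. A sketch of the decoding step, using a hand-written map (the file names and mappings are invented):

```python
import json

BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_vlq(segment: str) -> list[int]:
    """Decode one Base64 VLQ segment from a source map's "mappings" string."""
    values, value, shift = [], 0, 0
    for ch in segment:
        digit = BASE64.index(ch)
        value += (digit & 0b11111) << shift  # low 5 bits carry data
        if digit & 0b100000:                 # 6th bit set: more digits follow
            shift += 5
            continue
        # The lowest bit of the assembled number is the sign.
        values.append(-(value >> 1) if value & 1 else value >> 1)
        value, shift = 0, 0
    return values


# A minimal, hand-written version 3 source map for a one-file bundle.
source_map = json.loads(
    '{"version": 3, "file": "app.min.js", "sources": ["src/app.ts"],'
    ' "names": [], "mappings": "AAAA,SAAS;AACA"}'
)

# ';' separates generated lines, ',' separates segments within a line.
# Segment fields: [generated column, source index, source line, source column].
# They are deltas: generated column resets at each ';', the others carry over.
for gen_line, line in enumerate(source_map["mappings"].split(";")):
    for seg in line.split(","):
        print(gen_line, decode_vlq(seg))
# 0 [0, 0, 0, 0]
# 0 [9, 0, 0, 9]
# 1 [0, 0, 1, 0]
```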
Version Control Files
- .gitignore — Lists files and directories Git should not track (node_modules, .env, build output)
- .gitattributes — Configures how Git handles specific file types (line endings, diff display, merge behavior)
- .git/ — The hidden directory containing the entire Git history for a repository
The Shift Ahead: AI-Native File Formats
The AI era is not just adding new file types — it's changing the meaning of existing ones. A .md file used to be documentation. Now it might be an AI system prompt. A .json file used to be configuration. Now it might be a fine-tuning dataset.
More broadly, the relationship between files and meaning is shifting. As AI systems become capable of reading and writing code, the file becomes an interface between human intent and machine action in a way it never was before. Your .cursorrules file communicates your preferences to an AI. Your system prompt in a .md file defines the personality of a deployed AI assistant.
Understanding file types is no longer just about knowing what application opens them. It's about understanding how they participate in a broader system that now includes AI as an active participant in the development process.
For a deeper look at how these tools work together in practice, explore our tutorials on prompt engineering, working with AI APIs, and building RAG pipelines to see these file types in action.