Advanced ChatGPT Workflows

Data Analysis with ChatGPT Code Interpreter

Code Interpreter lets ChatGPT run Python, analyze files, generate charts, and perform computations, turning it from a code generator into an interactive data analysis environment.

What Code Interpreter Actually Is

Code Interpreter is a sandboxed Python execution environment attached to ChatGPT. When Code Interpreter is enabled, ChatGPT can:

  • Write and execute Python code
  • Read files you upload (CSV, Excel, JSON, text, images)
  • Generate charts and visualizations using matplotlib
  • Compute numeric results by running code instead of predicting them (unlike plain conversation)
  • Create files and offer them for download

The practical difference is iteration: ChatGPT can run the code, observe the output, and refine it, rather than only predicting what working code should look like.

Enabling Code Interpreter

Code Interpreter is available to ChatGPT Plus users. To use it with a file:

  1. Start a new conversation
  2. Click the "+" icon in the prompt area
  3. Select "Attach files"; Code Interpreter activates automatically when you attach a file

Without a file, type your request and ChatGPT will invoke Code Interpreter when the task needs it.

Alternatively, use a GPT with Code Interpreter enabled in its configuration.

Data Analysis Workflows

CSV Analysis

Upload a CSV and ask directly:

```text
Analyze this CSV file.

1. Show me the shape (rows, columns) and data types
2. Check for missing values; report count and % per column
3. Show basic statistics for numeric columns
4. Identify any obvious data quality issues
5. Summarize what this data appears to represent
```

ChatGPT will run pandas operations and report findings. The code it ran is visible and can be copied.
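The code behind a request like this tends to be short. The following is a minimal sketch of the kind of pandas profiling it might run, not a transcript of what ChatGPT actually generates. The sample data, the column names, and the `profile` helper are all hypothetical stand-ins for an uploaded file.

```python
import io

import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing count, and missing percentage per column."""
    missing = df.isna().sum()
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": missing,
        "missing_pct": (missing / len(df) * 100).round(1),
    })


# Hypothetical sample standing in for an uploaded CSV
sample_csv = io.StringIO(
    "order_id,amount,region\n"
    "1,10.5,north\n"
    "2,,south\n"
    "3,7.25,\n"
    "4,3.0,north\n"
)
df = pd.read_csv(sample_csv)

print(df.shape)        # (rows, columns)
report = profile(df)
print(report)          # data types and missing values per column
print(df.describe())   # basic statistics for numeric columns
```

Reading the generated code against a sketch like this is a quick way to confirm that the reported numbers come from the whole file rather than a sample.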

Exploratory Analysis

```text
For this sales dataset:
1. Plot monthly revenue over time (line chart)
2. Show revenue by category as a bar chart
3. Identify the top 10 customers by lifetime value
4. Calculate month-over-month growth rate for the last 12 months
5. Flag any months where revenue dropped more than 15% from the prior month

Use matplotlib for charts. Export a summary CSV with the monthly metrics.
```
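Steps 4 and 5 of this prompt reduce to a group-by and a percentage change. A minimal sketch of that logic follows, assuming hypothetical `order_date` and `revenue` columns; the charts and the CSV export are left out.

```python
import pandas as pd

# Hypothetical sales rows; the column names are assumptions
sales = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-10", "2024-03-03", "2024-03-28"]
    ),
    "revenue": [100.0, 100.0, 150.0, 60.0, 60.0],
})

# Total revenue per calendar month
monthly = (
    sales.groupby(sales["order_date"].dt.to_period("M"))["revenue"]
    .sum()
    .to_frame("revenue")
)

# Month-over-month growth in percent; the first month has no prior month
monthly["mom_growth_pct"] = monthly["revenue"].pct_change() * 100

# Flag months where revenue fell more than 15% from the prior month
monthly["drop_over_15pct"] = monthly["mom_growth_pct"] < -15

print(monthly)
```

The first month's growth is undefined, so it is never flagged as a drop.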

Data Cleaning

```text
Clean this dataset:
1. Standardize all date columns to ISO 8601 format (YYYY-MM-DD)
2. Normalize email addresses to lowercase and trim whitespace
3. Remove rows where [key column] is null
4. Deduplicate based on [identifier column], keeping the most recent row
5. Validate phone numbers and flag rows with non-standard formats

Show me a before/after comparison of affected rows.
Download the cleaned dataset as a new CSV.
```
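As a rough illustration of what steps 1 through 4 look like in pandas, here is a sketch under stated assumptions: the `customer_id`, `email`, and `updated` columns are hypothetical, and the input dates are assumed to be in US-style `MM/DD/YYYY` format. Phone validation and the before/after comparison are omitted for brevity.

```python
import pandas as pd

# Hypothetical messy input; all column names are assumptions
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, None],
    "email": ["  Ann@Example.COM ", "ann@example.com", "BOB@example.com", "x@example.com"],
    "updated": ["01/05/2024", "03/01/2024", "02/10/2024", "02/11/2024"],
})

# Step 3: remove rows where the key column is null
clean = raw.dropna(subset=["customer_id"]).copy()

# Step 2: trim whitespace and lowercase email addresses
clean["email"] = clean["email"].str.strip().str.lower()

# Step 1 (first half): parse dates using the assumed input format
clean["updated"] = pd.to_datetime(clean["updated"], format="%m/%d/%Y")

# Step 4: keep only the most recent row per identifier
clean = (
    clean.sort_values("updated")
    .drop_duplicates(subset="customer_id", keep="last")
    .sort_values("customer_id")
    .reset_index(drop=True)
)

# Step 1 (second half): write dates back out as ISO 8601 strings
clean["updated"] = clean["updated"].dt.strftime("%Y-%m-%d")

print(clean)
```

Dates are parsed before deduplication so that "most recent" is decided by real dates rather than by string ordering.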

Statistical Analysis

```text
Run a cohort retention analysis on this user activity data.

The data has: user_id, event_date, event_type.

Calculate:
- Week-0 cohort size (users who first appeared each week)
- Retention % at weeks 1, 2, 4, 8, 12 for each cohort
- Display as a heatmap with week-0 as rows, retention weeks as columns
- Highlight cells below 20% retention in red
```
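It helps to know roughly what a cohort computation involves so you can check the generated code. The sketch below shows one common way to build the retention table in pandas. The event rows are hypothetical, weeks are assumed to run Monday to Sunday, and the heatmap rendering is left out.

```python
import pandas as pd

# Hypothetical activity data using the prompt's column names
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "event_date": pd.to_datetime(
        ["2024-01-01", "2024-01-09", "2024-01-16",
         "2024-01-02", "2024-01-10", "2024-01-08"]
    ),
})

# Start of the week for each event (pandas weekly periods end on Sunday by default)
events["week"] = events["event_date"].dt.to_period("W").dt.start_time

# A user's cohort is the week in which they first appeared
events["cohort"] = events.groupby("user_id")["week"].transform("min")

# Whole weeks elapsed between the cohort week and the event week
events["weeks_since"] = (events["week"] - events["cohort"]).dt.days // 7

# Distinct active users per cohort and week offset
active = (
    events.groupby(["cohort", "weeks_since"])["user_id"]
    .nunique()
    .unstack(fill_value=0)
)

# Retention as a percentage of the week-0 cohort size
retention = active.div(active[0], axis=0) * 100

print(retention.round(1))
```

Each row of `retention` is a cohort and each column a week offset, which is the layout the prompt asks for in the heatmap.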

Precise Calculations

For arithmetic and calculations that require precision, always use Code Interpreter rather than relying on plain conversation:

```text
Calculate: what is the compound annual growth rate of a $50,000
investment that grew to $183,700 over 13 years?

Then build a table showing the value at each year if it continued
growing at the same CAGR for 10 more years.
```

Code Interpreter computes this by executing the formula, so the result is accurate to floating-point precision. Plain conversation predicts a plausible-sounding answer that may be wrong.
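For reference, the calculation in this prompt uses the standard formula CAGR = (end / start)^(1 / years) - 1. A minimal sketch of the code that answers it follows; the `cagr` function name is an illustrative choice, not something Code Interpreter is guaranteed to produce.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    return (end / start) ** (1 / years) - 1


rate = cagr(50_000, 183_700, 13)
print(f"CAGR: {rate:.2%}")

# Project 10 more years at the same rate, starting from the year-13 value
value = 183_700.0
projection = []
for year in range(14, 24):
    value *= 1 + rate
    projection.append((year, round(value, 2)))
    print(f"Year {year}: ${value:,.2f}")
```

The rate comes out at roughly 10.5% per year. Compounding the starting amount at that rate for 13 years should reproduce the ending amount, which is an easy check to ask for.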

Working With Non-CSV Files

Excel Files

```text
This Excel file has multiple sheets. List all sheet names and
show me the first 5 rows of each. Then combine all sheets into
a single dataset and export as CSV.
```
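A sketch of the combining step is shown here. It assumes every sheet shares the same columns, and it uses an in-memory dictionary of hypothetical sheets in place of a real workbook. With an actual file, `pd.read_excel(path, sheet_name=None)` returns a dictionary of this shape, keyed by sheet name.

```python
import pandas as pd


def combine_sheets(sheets: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack all sheets into one table, tagging each row with its sheet name."""
    tagged = [df.assign(sheet=name) for name, df in sheets.items()]
    return pd.concat(tagged, ignore_index=True)


# Hypothetical stand-in for pd.read_excel(path, sheet_name=None)
sheets = {
    "Q1": pd.DataFrame({"region": ["north", "south"], "revenue": [10, 20]}),
    "Q2": pd.DataFrame({"region": ["north"], "revenue": [15]}),
}

# List sheet names and preview the first 5 rows of each
for name, df in sheets.items():
    print(name)
    print(df.head(5))

combined = combine_sheets(sheets)
csv_text = combined.to_csv(index=False)
print(csv_text)
```

Keeping the sheet name as a column means the combined CSV can still be split or filtered by its source sheet.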

Text Files and Logs

```text
Parse this log file. Count error occurrences by type.
Show me the 10 most frequent errors with timestamps of
first and last occurrence. Plot error frequency over time.
```
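Log formats vary widely, so the generated code depends on your file. As one illustration, the sketch below assumes a hypothetical format of `DATE TIME LEVEL ErrorType: message` and counts errors by type with first and last timestamps. The plot is omitted.

```python
import re
from collections import defaultdict

# Assumed line format: "2024-05-01 10:00:01 ERROR TimeoutError: message"
LINE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<kind>\w+): ")

# Hypothetical log content standing in for an uploaded file
log_text = """\
2024-05-01 10:00:01 ERROR TimeoutError: upstream took 30s
2024-05-01 10:00:05 INFO Startup: ready
2024-05-01 10:02:11 ERROR ValueError: bad payload
2024-05-01 10:07:45 ERROR TimeoutError: upstream took 31s
"""

stats = defaultdict(lambda: {"count": 0, "first": None, "last": None})
for line in log_text.splitlines():
    match = LINE.match(line)
    if not match or match["level"] != "ERROR":
        continue  # skip non-error lines and lines that do not fit the format
    entry = stats[match["kind"]]
    entry["count"] += 1
    entry["first"] = entry["first"] or match["ts"]  # set once, on first sighting
    entry["last"] = match["ts"]                     # overwritten on every sighting

# The 10 most frequent error types
top = sorted(stats.items(), key=lambda kv: kv[1]["count"], reverse=True)[:10]
for kind, entry in top:
    print(kind, entry)
```

If your logs use a different layout, pasting a few sample lines into the prompt helps ChatGPT write a pattern that matches them.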

JSON Data

```text
This JSON file contains nested API response data.
Flatten the nested structure into a table with these columns:
[list the fields you want].
Handle missing fields with null. Export as CSV.
```
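Flattening of this kind is often done with `pandas.json_normalize`. The sketch below uses a small hypothetical payload; the field names and the `wanted` column list are assumptions standing in for the fields you would list in the prompt.

```python
import json

import pandas as pd

# Hypothetical nested API response
payload = json.loads("""
[
  {"id": 1, "user": {"name": "Ann", "address": {"city": "Oslo"}}},
  {"id": 2, "user": {"name": "Bob"}}
]
""")

# Nested keys become column names joined with "_"
flat = pd.json_normalize(payload, sep="_")

# Keep only the requested columns; fields missing from a record become NaN
wanted = ["id", "user_name", "user_address_city"]
flat = flat.reindex(columns=wanted)

# NaN values are written as empty fields in the CSV
print(flat.to_csv(index=False))
```

Using `reindex` rather than plain column selection means a requested column that appears in no record still shows up, filled with nulls, instead of raising an error.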

Iteration Workflow

Code Interpreter supports real iteration. Rather than asking once, you refine the result over several turns:

```text
[Initial prompt with uploaded file]
> Analyze the sales data

[After seeing the chart]
> The x-axis labels are overlapping. Rotate them 45 degrees
  and increase figure width to 14 inches.

[After seeing the fix]
> Export this chart as a PNG at 300 DPI for a presentation.
```

Each refinement runs new code against the same data. You are not regenerating from scratch; you are iterating on a working analysis.
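In code terms, the two refinements in this exchange are small changes to the plotting script. Here is a sketch of what the final version might look like in matplotlib, using hypothetical monthly figures. The figure height of 5 inches is an assumption, and the image is written to memory here, where Code Interpreter would save it to a downloadable file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Hypothetical monthly revenue figures
months = [f"2024-{m:02d}" for m in range(1, 13)]
revenue = [120, 135, 128, 150, 162, 158, 171, 169, 180, 195, 188, 210]

# Refinement 1: wider figure (14 inches) and x labels rotated 45 degrees
fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(months, revenue, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
ax.tick_params(axis="x", labelrotation=45)

# Refinement 2: export as PNG at 300 DPI
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300, bbox_inches="tight")
plt.close(fig)
png_bytes = buffer.getvalue()
```

Naming the exact parameter you want changed, such as rotation angle, width, or DPI, usually gets a one-line edit rather than a rewritten chart.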

Limitations to Know

  • Session-scoped: Files uploaded and results generated exist only in the current conversation. Download anything you need before closing.
  • Python only: Code Interpreter runs Python, not JavaScript, SQL, or other languages.
  • No internet access: It cannot fetch live data. Upload files with the data you need.
  • Execution timeout: Very long-running computations will time out. Break large analyses into steps.
  • Memory limits: Very large files (>100MB) may fail or produce degraded results.
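One way to break a large analysis into steps is to process the file in chunks and combine partial results. The sketch below illustrates the idea with `pandas.read_csv` and its `chunksize` parameter. The tiny in-memory CSV and its `category` and `amount` columns are hypothetical, and a real file would use a far larger chunk size.

```python
import io

import pandas as pd

# Hypothetical data standing in for a large uploaded CSV
rows = "".join(f"{'a' if i % 2 else 'b'},{i}\n" for i in range(10))
csv_data = io.StringIO("category,amount\n" + rows)

# Accumulate per-category totals one chunk at a time
totals = pd.Series(dtype="float64")
for chunk in pd.read_csv(csv_data, chunksize=4):
    partial = chunk.groupby("category")["amount"].sum()
    totals = totals.add(partial, fill_value=0)

print(totals)
```

Only one chunk is held in memory at a time, so peak memory use depends on the chunk size rather than the file size.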

Key Takeaways

  • Code Interpreter executes Python: use it for calculations, analysis, and file processing, not just code generation
  • Upload your actual data files and describe the analysis in plain language
  • Use Code Interpreter whenever arithmetic needs to be correct; plain conversation is unreliable for precise calculations
  • Iterate: charts and analysis results can be refined step by step
  • Download all outputs before closing the session, because they do not persist

---

Try It Yourself: Export any dataset you have access to as a CSV (even a small one — a spreadsheet export works). Upload it to ChatGPT with Code Interpreter enabled. Ask: "Give me a complete analysis of this dataset: shape, data types, missing values, basic statistics, and one visualization showing the most interesting trend." Observe what it produces without you writing a single line of Python.