Analyzing the Iris dataset with ChatGPT's Noteable plugin

The OpenCALM data I wrote about earlier was far too small, so I tried asking Noteable to analyze the Iris dataset, the classic dataset that everyone loves and that has probably been analyzed a hundred million times. The results were about what I expected, but it also quickly wrote code for plotting graphs and trying several algorithms, which felt very convenient. This is a short note about that.


The standard summary statistics were unsurprising, but the pair plot was nice. I usually end up reading the documentation whenever I write a pair plot, but it generated a clean seaborn plot and colored the points by target, that is, by species.

My only instruction was roughly: "I want to build a model that predicts target using data other than target. What algorithms would be good for building the prediction model? Please answer using Noteable." From that, it split the data into training and test sets at an 8:2 ratio, wrote implementations for five sklearn algorithms, and displayed the actual results. That was convenient, because writing this by hand every time is tedious. This time all five models reached 100% accuracy on the test set, but if the accuracy had differed by algorithm, I could probably have asked why and gotten an explanation.

The notebook code automatically created by Noteable looked like this:

from sklearn.model_selection import train_test_split

X = iris_df.drop('target', axis=1)
y = iris_df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

models = [
    ('Logistic Regression', LogisticRegression()),
    ('Decision Tree', DecisionTreeClassifier()),
    ('Random Forest', RandomForestClassifier()),
    ('SVM', SVC()),
    ('KNN', KNeighborsClassifier())
]

results = []

for name, model in models:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results.append((name, accuracy))

results

Output:

[('Logistic Regression', 1.0),
 ('Decision Tree', 1.0),
 ('Random Forest', 1.0),
 ('SVM', 1.0),
 ('KNN', 1.0)]

ChatGPT then returned an easy-to-understand explanation of these results.
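With an 8:2 split of 150 rows, the test set has only 30 samples, so a single 100% score is a fairly coarse measurement. One way to compare the same five models a little more carefully is k-fold cross-validation. The following is a minimal sketch under my own assumptions, not something Noteable produced: it reloads Iris from scikit-learn, uses 5 stratified folds, and sets `max_iter=1000` on logistic regression to avoid convergence warnings.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: same features and labels as iris_df, loaded directly from scikit-learn
X, y = load_iris(return_X_y=True)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),  # max_iter raised (my choice)
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

# 5 stratified folds: every sample is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

The per-fold scores in `cv_scores` show how much each model's accuracy varies across splits, which a single train/test split cannot reveal.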


Next was clustering and visualization with dimensionality reduction. This is another thing that is quietly annoying to write yourself because you end up checking the documentation, but it generated the code quickly. It first used PCA for dimensionality reduction, and when I asked what would happen with t-SNE, the graph appeared right away.
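As a rough idea of what this step involves, here is a minimal sketch of clustering followed by 2-D projections with PCA and t-SNE. It is not the generated notebook code. The choice of k-means with three clusters, the feature standardization, and the `perplexity=30` setting are all my assumptions, since I did not record which settings Noteable picked.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize features so no single measurement dominates the distances
X_scaled = StandardScaler().fit_transform(X)

# Assumption: k-means with 3 clusters (one per species)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X_scaled)

# Project the 4-D data down to 2-D with each method
X_pca = PCA(n_components=2, random_state=42).fit_transform(X_scaled)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_scaled)

# Side-by-side scatter plots, colored by cluster assignment
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, embedding, title in zip(axes, (X_pca, X_tsne), ("PCA", "t-SNE")):
    ax.scatter(embedding[:, 0], embedding[:, 1], c=cluster_labels, cmap="viridis", s=15)
    ax.set_title(f"{title} (colored by k-means cluster)")
fig.tight_layout()
```

Passing `c=y` instead of `c=cluster_labels` colors the points by true species, which makes it easy to compare the clusters against the labels.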

(Figure: clustering and dimensionality-reduction plots with PCA and t-SNE.)


ChatGPT cannot properly analyze unknown data it has never seen, although few-shot examples can help. A service that lets it execute a notebook, observe the results, and then continue the conversation through ChatGPT complements that weakness well. It made me feel again that Noteable is impressive.

People who do data analysis probably see Iris and think, "Ah, another Iris tutorial", and do not feel like analyzing it again. I was surprised that a day came when I voluntarily wanted to analyze the Iris dataset again.
