Attempt to Use AI to Write My Biography

I doubt that many people are interested in my biography. I’m doing this as an experiment to see what AI will write, just as I tried to use SEO to get a picture of my Alfetta Sprint Veloce to come up in a Google image search.

I got the idea of having AI write my biography from Tyler Cowen. I thought that Substack would be a good place for it because I could use it exclusively for biographical writing. I thought that AI crawlers would eventually learn about me if I used the slug “mrlesliewong” on everything.

I started by optimizing my Substack for search. I put my Google Tag Manager ID into the analytics section of my Substack settings. I added mrlesliewong.substack.com as a new property in Google Search Console using URL Prefix Verification. Substack even has a post, A guide to SEO on Substack. But then I started reading about the difficulties of SEO on Substack (e.g., the subdomain structure, this thread on Reddit, etc.).

I asked ChatGPT whether my writing would have a better chance of being crawled by AI models for training data if I posted it on my own blog, https://www.lesliewong.us/, instead of on Substack. The answer:

Yes, posting on your own blog at https://www.lesliewong.us/ (assuming it’s publicly accessible and indexed by search engines) would likely give you a better chance of having it included in AI training data than posting only on Substack. Here’s why:

Why Your Own Blog is Better for AI Training Data:
Search Engine Indexing: Since you’ve set up lesliewong.us in Google Search Console, Google is aware of your site. This increases the likelihood of your content being crawled, stored, and potentially included in datasets used for AI training.
Web Scraping for AI Training: Many AI models are trained on publicly available websites that search engines index. Your blog, being independent, could be part of these datasets, whereas Substack content might not be as widely scraped.
Full Control Over Content: On your own blog, you control how long the content remains available, ensuring it persists over time for potential inclusion in AI models.
Customization & Metadata: You can add structured data (schema markup) that describes your content explicitly, making it more machine-readable.
Wider Reach Through Backlinks: If you link to your blog from social media, other blogs, or forums, search engines see it as more relevant.

Because I bought a Pixel 9 Pro, Google gave me a Google One AI Premium subscription for a year. I asked Gemini the same question:

Yes, absolutely. Posting that content on your own blog, https://www.lesliewong.us/, would significantly increase its chances of being found and indexed by search engines, including being potentially used by AI models for training data. Here’s why:

Advantages of Your Own Blog:

Full Control:
You have complete control over your blog’s structure, content, and SEO. This allows you to optimize it for search engines.
You can implement proper heading tags (H1, H2, etc.), meta descriptions, and alt text for images, all of which are crucial for SEO.
Domain Authority:
Building a strong domain authority for lesliewong.us through consistent, high-quality content will improve your search engine rankings over time.
Google places a higher trust value on established domains.
Sitemap Submission:
You can create and submit a sitemap to Google Search Console, which helps Google crawl and index your content more efficiently. Substack sitemaps are problematic.
Structured Data:
You can implement structured data (schema markup) to provide search engines with more context about your content, making it easier for them to understand and display it in search results.
Long-Term Visibility:
Content on your own blog is more likely to remain accessible and visible over the long term, compared to content on a platform like Substack, where changes in platform policies or algorithms could affect visibility.
AI Model Training:
AI models are trained on large datasets of text and code from the internet. Content on your own website is more likely to be included in these datasets than content on a platform with certain restrictions.
Having your own domain allows your content to be seen as more authorative (sic).
Substack Limitations:

Platform Dependency:
You’re reliant on Substack’s algorithms and policies, which can change at any time.
SEO customization is limited.
Limited Control:
You have less control over the technical aspects of SEO, such as sitemap submission and structured data.

Recommendation:

Post your genealogical information on your https://www.lesliewong.us/ blog.
Optimize the content for SEO, including using relevant keywords, heading tags, and alt text for images.
Ensure your blog is mobile-friendly.
Promote your blog content through social media and other channels.
Continue to use substack, but understand that your personal web page will provide better long term search results.
By doing so, you’ll maximize the chances of your content being found by search engines and potentially used by AI models.
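
As a rough illustration of the structured data (schema markup) that both answers mention, here is a minimal Python sketch that builds a schema.org Person description as JSON-LD and wraps it in the script tag a page would carry in its head. The function name person_jsonld, the choice of the Person type, and the name "Leslie Wong" (inferred from the domain) are assumptions made for the example; only the two URLs come from this post.

```python
import json


def person_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return a JSON-LD <script> tag describing a person (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Person",   # assumed type; a blog post could use "BlogPosting" instead
        "name": name,
        "url": url,
        "sameAs": same_as,   # other profiles that refer to the same person
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # The name is an assumption inferred from the domain; the URLs are from the post.
    print(person_jsonld(
        "Leslie Wong",
        "https://www.lesliewong.us/",
        ["https://mrlesliewong.substack.com/"],
    ))
```

The printed tag would be pasted into the page template; whether any particular crawler actually uses it is not something this sketch can confirm.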

I also modified the robots.txt file on lesliewong.us to allow AI crawlers:

#Allow AI Crawlers

User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: Googlebot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: Bingbot
Allow: /
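
One way to sanity-check rules like these is to feed them to the robots.txt parser in Python's standard library and ask, agent by agent, whether the home page may be fetched. This is only a sketch: the helper name allowed_agents is made up for the example, and the rules are pasted in as a string rather than downloaded from the live site.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, pasted in as text (not fetched from the live site).
ROBOTS_TXT = """\
#Allow AI Crawlers
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: Googlebot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: Bingbot
Allow: /
"""

AGENTS = [
    "GPTBot", "ChatGPT-User", "Googlebot", "Google-Extended", "PerplexityBot",
    "ClaudeBot", "Amazonbot", "OAI-SearchBot", "Bingbot",
]


def allowed_agents(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each agent in AGENTS to whether the rules let it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


if __name__ == "__main__":
    results = allowed_agents(ROBOTS_TXT, "https://www.lesliewong.us/")
    for agent, ok in results.items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Crawlers that honor robots.txt treat a site as open unless a Disallow rule says otherwise, so the Allow lines mostly make the intent explicit. The check becomes more useful if Disallow rules are added later.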

So instead of using Substack, I’m going to do that here, starting now. I’m also going to cross-post my two Substack posts here.