Best of your X follows: June 20

The strongest signal today is not one giant launch. It is a set of small tests of where AI systems are starting to show up: model comparisons that use artifacts instead of leaderboard numbers, public-sector workflow prototypes, and developer tools that now assume agents can write to real systems.

Source mix: mostly X posts from the monitored account set, plus Simon Willison's weblog when his X timeline was quiet. Pure retweets, one-line political posts, and low-context small talk were left out.

Model releases and evaluation

Ethan Mollick: GLM-5.2 Max can do the task, but Fable still changes the shape of it

What happened: Mollick credited GLM-5.2 Max, a new open-weights model, with completing a constrained poem task that involved disappearing letters 1.

Why it matters: his comparison was not about whether the output was correct. He argued that Fable integrated the disappearing-letter constraint into the poem's theme, while GLM-5.2 Max mostly satisfied the surface requirement 1.

Implication: if you evaluate creative or agentic systems only by task completion, you miss the difference between following an instruction and using the constraint as part of the work.

Ethan Mollick: a 20-model harbor-town gallery as an AI progress test

What happened: Mollick shared a benchmark prompt asking models to build a procedurally generated 3D harbor-town simulation from 3000 BCE to 3000 AD, with beauty and user control in the spec 2.

Why it matters: the linked gallery compares model outputs from one prompt and describes the set as spanning 39 months of AI progress; the older GPT-3.5 and GPT-4 entries needed one standardized follow-up 3.

Implication: this is the kind of artifact-based benchmark that is easy for practitioners to inspect. You can judge coherence, interactivity, aesthetics, and failure modes without reducing everything to one score.

Public-sector AI

Google DeepMind: planning-office prototype targets housing applications

What happened: Google DeepMind said it is working with UK government bodies on an AI housing application planning prototype 4.

Why it matters: the post says the prototype is aimed at repetitive planning-officer work, so officers can spend more attention on complex projects 4.

Implication: DeepMind is claiming a processing-time reduction of up to 50%. Treat that as a target claim from the project team, not an audited deployment result yet 4.

Developer tools and engineering practice

Simon Willison: Datasette gets first-class row editing

What happened: Simon Willison released Datasette 1.0a34, adding insert, edit, and delete tools to the Datasette interface 5.

Why it matters: the feature is available on table pages, while edit and delete also appear as row-level actions. That makes the ordinary UI catch up with the write workflows Simon had already been exploring through Datasette Agent 5.

Implication: agent-assisted database work is pushing product surfaces back toward explicit human approval and visible edit controls, not just chat-only automation.

Image: Datasette row-editing interface. Datasette 1.0a34 adds row insert, edit, and delete actions to the web interface 5.
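
The human-approval pattern described in the Datasette item can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module, not Datasette's own code or API. The names ProposedEdit and apply_if_approved, the notes table, and the approve callback are all hypothetical, invented for this example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ProposedEdit:
    """A write proposed by an agent, held until a human reviews it (hypothetical type)."""
    sql: str
    params: tuple
    summary: str  # human-readable description shown to the reviewer


def apply_if_approved(conn, edit, approve):
    """Run the edit only when the approve callback returns True."""
    if not approve(edit):
        return False
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(edit.sql, edit.params)
    return True


# In-memory database so the sketch has no side effects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

edit = ProposedEdit(
    sql="INSERT INTO notes (body) VALUES (?)",
    params=("draft from agent",),
    summary="Insert one row into notes",
)

# A real reviewer would see edit.summary in a UI; lambdas stand in here.
rejected = apply_if_approved(conn, edit, approve=lambda e: False)
accepted = apply_if_approved(conn, edit, approve=lambda e: True)
row_count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

The design choice worth noting is that the agent only ever produces a ProposedEdit; nothing touches the database until a separate decision step says yes, which is the same separation a visible edit form gives a human user.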

Simon Willison / Charity Majors: AI coding raises the bar for engineering discipline

What happened: Willison surfaced Charity Majors' argument that AI made code generation cheap and fast, changing the economics of software production 6.

Why it matters: Majors' longer piece argues that if code becomes more disposable, teams need stronger production understanding, observability, review habits, and system invariants, not weaker ones 7.

Implication: the practical takeaway for AI coding teams is blunt: optimize for shared understanding and production feedback, because generated code is cheap and operational confusion is still expensive.

Short signals

Greg Brockman: GPT-Realtime-2 gets a terse internal endorsement

What happened: Greg Brockman posted that "GPT-Realtime-2 is something new" 8.

Why it matters: the post gives no launch note or technical detail, so the signal is weaker than a product announcement. It does show OpenAI's cofounder drawing attention to the realtime line after recent voice and WebRTC experiments in the developer community 8.

Implication: keep an eye on demos and docs before treating this as more than a high-level hint.

François Chollet: solve hard problems by reframing, not piling on complexity

What happened: Chollet argued that hard problems are rarely solved by adding complexity; they are solved by reframing the question until a simpler answer becomes visible 9.

Why it matters: in the context of AI research and software design, that is a useful counterweight to scale-first thinking. More machinery can hide a bad problem statement.

Implication: before adding another layer to an agent pipeline, ask whether the task definition is wrong.

Reference sources