Add core/read-content ability#739
Conversation
✅ WordPress Plugin Check Report
📊 ReportAll checks passed! No errors or warnings found. 🤖 Generated by WordPress Plugin Check Action • Learn more about Plugin Check |
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
d1db766 to
88df83c
Compare
|
Nice work so far! A few things to iron out:
|
|
For reference, I'm sharing WP CLI commands related to read-only operations on posts:
|
|
@jorgefilipecosta looks like some code review feedback and merge conflicts to clean up to help move this along towards merge and inclusion in the next AI plugin release (to help get some usage testing & feedback before a parallel PR is landed for core in 7.1) |
This is an unrealistic and IMO unwanted end-goal. I think we need to accept that no new core abilities will ship in 7.1, even if they make it into [email protected] < 3 weeks is not enough time to get actual API design feedback for core. It's not even a full AI Plugin release cycle. By all means if we can get this cleaned up in time for the next plugin release, great, but considering that they're not even behind an Experiment toggle, I wouldn't want y'all rushing it in before you or @dkotter think its viable because of a desire to squeeze it into beta1. |
3a7df4a to
b3748e6
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #739 +/- ##
=============================================
+ Coverage 75.50% 76.80% +1.30%
- Complexity 2086 2261 +175
=============================================
Files 99 100 +1
Lines 8626 9196 +570
=============================================
+ Hits 6513 7063 +550
- Misses 2113 2133 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Hi @gziolo, your feedback was applied. |
peterwilsoncc
left a comment
There was a problem hiding this comment.
I've added a few notes inline.
To avoid burying the lead: I think the key question I have is why you don't include the rendered content so bots can be set up with a read-only account?
|
After a closer inspection, I want to echo the point that @peterwilsoncc raised regarding the data formatting. We need to decide whether we return raw data or rendered/filtered data. This is what I see now:
What is needed will largely depend on the consumer. If they only want to present the data, then the preferred option would be using dates in the site's timezone, title, and content as rendered HTML. If they want to edit content, then maybe fetching raw data would make more sense. There are two ways to go about it:
@justlevine, thank you for the reminder about the timeline for the WP 7.1 release. Let's see how much work is left to iron out all the feedback raised so far. Either way, we want to get some early testing through the AI plugin. All the feedback is appreciated and will help shape this essential ability. |
| 'content_rendered' => array( | ||
| 'type' => 'string', | ||
| 'description' => __( 'The rendered post content. Present when the post type supports the editor. Empty when withheld for a password-protected post.', 'ai' ), | ||
| ), |
There was a problem hiding this comment.
If we return both content_raw and content_rendered, this will fill up the context window with duplicated content.
Suggestion: have a scope input field(similar to REST API)
editwill return the raw content that an agent can modify and update on a new requestreadit will return the "content_rendered". Probably best if we can strip html tags also (or transformed to MD :D [nice to have]) (I have no strong opinion on this one but with stripped tags we will be nice with the context window)
There was a problem hiding this comment.
readit will return the "content_rendered". Probably best if we can strip html tags also (or transformed to MD :D [nice to have]) (I have no strong opinion on this one but with stripped tags we will be nice with the context window)
Warning
Opinionated 😸
I'd be reluctant to add the required code to core given it would add quite a bit of work to each request.
Dries Buytaert wrote up his experience of providing a markdown version of his content for LLMs and finding that the agents simply crawled both versions, so I think it's pretty safe to assume that they're happy with HTML.
There was a problem hiding this comment.
IMO that article has little to do with Markdown vs HTML and is rather about sources of truth/context rot (LLMs.txt is ineffective) and misunderstandings about how agentic discovery works (nothings going to visit your MD endpoints if you don't explicitly encourage it to).
That aside , there are other implications with exposing raw vs rendered content, including security and end-user case. As abilities are meant to be a primitive and not a MCP-specific transport, I think the way to address that in MCP (or upstream in Abilities) is an arg to filter the output.
If in the short term this is a concern, then I'd say keep content_rendered without content_raw, since even if LLMs do prefer markdown to HTML, they'd want the markdown version of the rendered html, not the one fill with custom block markup or unprocessed synced patterns etc. But longer term, IMO the solution belongs in a different layer.
There was a problem hiding this comment.
That aside , there are other implications with exposing raw vs rendered content,
@jasonbahl any chance you have a few minutes to share your thoughts from on over in WPGraphQL-land? With how much influence we took in design language, it would make sense for the output ergonomics to wind up being similar, just maybe a bit flatter. Same considerations, at least 🤔
There was a problem hiding this comment.
In my earlier #739 (comment), I referenced WP CLI as a good example of where good defaults solve the issue you discuss. It could be replicated here by returning the same set: ID, post_title, post_name, post_date, and post_status. All other fields remain accessible, but the consumer needs to explicitly list them in the input using the fields array.
There was a problem hiding this comment.
👋 @justlevine I left some notes here addressing the raw/rendered concerns along with some broader concerns: #739 (review)
There was a problem hiding this comment.
I think the whole discussion missed my point. It's not about HTML vs Markdown, that was just a side note.
The real point: we need both content_raw and content_rendered, but never both at once.
- editing → agent needs
content_raw(block markup, shortcodes) so it can change it and write it back. Rendered HTML is useless here. - reading → agent needs
content_rendered(what a human sees). Raw markup is just noise.
Any single call is one job or the other. So the field it doesn't need is always 100% redundant, the full post body duplicated on every call.
So I'd go with one content field that returns one or the other based on context: raw when editing is needed, rendered when it's read-only. If we want a default, probably raw only.
@jasonbahl this is basically what ?context=edit (REST) and format: RAW (WPGraphQL) already do, force the caller to declare intent. Same idea, just collapsed into a single field instead of two you opt into separately.
One more thing: if we go this way, the ability description needs updating too, so the agent knows when to pick each one (raw when it intends to edit, rendered when it just needs to read).
There was a problem hiding this comment.
The real point: we need both content_raw and content_rendered, but never both at once.
This is not true. Putting aside yet again that abilities aren't meant to serve agents they're meant to be a underlying primitive to serve all last-mile APIs, an agent
- doing migrations,
- building patterns that match existing site designs,
- or frontend generative ui work,
are all cases that would benefit from having both the preprocessed _raw and the full-fat _rendered data beyond the well-established non-AI ones like page builders and headless previews.
But also, worse than that, this would pollute the API with state: you need to know the context in order to be able to accomodate for the data. I've got PostObject.content but no idea whether or not it needs parsing or rendering or data sanitization anymore.
There was a problem hiding this comment.
Yah I agree with @justlevine point I can think some cases where having rendered and non rendered content may be useful for an agent. Currently we don't return content by default, but we can request rendered or raw, or both, same for any other field.
Co-authored-by: Peter Wilson <[email protected]>
Co-authored-by: Peter Wilson <[email protected]>
f5136db to
75365de
Compare
|
Thank you all for the discussions and insights. It feels like the remaining discussion point is when to return rendered, raw, or both. By default the fields we return are very lean: id, post_type, status, date, slug, title_rendered (if title is supported). This allows us to keep responses lean while still giving agents access to either rendered versions (e.g., a summarization agent), raw versions (e.g., an edit agent), or both (e.g., some migration or SEO/GEO analysis agent). It is very flexible and agents can decide exactly what they need, but it also makes the code more complex, and forces agent workflows to "think" about the fields needed instead of just asking for edit fields or display fields. I'm not opposed to the introduction of a format like @gziolo said at #739 (comment), format being display, raw and I guess full when the agent wants both?. Format would essentially pre-select a group of fields. Another option could also be to make fields support the current array structure but also allow fields to be a string (display, raw, full); when it is a string, it essentially maps to a fixed array, e.g., display is equal to id, post_type, status, date, slug, title_rendered. And would avoid the format concept. I guess we have three options we need to decide on:
|
galatanovidiu
left a comment
There was a problem hiding this comment.
Did another pass on the read/query paths. Direction is good and the mode split reads well now. A cluster of things need work before this is solid, most of them in format_post() / execute_get_content() and the date helpers. Leaving the detail inline, short version here:
- Query
totalcan disagree withposts. Afieldssubset that empties every row gets those rows dropped fromposts, buttotalis read before the drop, so you can gettotal: 16withposts: []. date_gmt/modified_gmtare wrong on any non-UTC site. Both the normal path and the draft fallback mislabel the offset or the instant. Detail inline onformat_gmt_date().excerpt_renderedhides the real excerpt from editors on password-protected posts, even thoughcontent_renderedcorrectly shows them the content. Inconsistent, and it contradicts the field's own schema description.- A post whose slug is literally
"0"can't be fetched by slug —! empty()trap, it falls through to query mode. content_not_foundis dead code on every transport, becausecheck_permission()already resolves the object and denies first. Worth a decision: keep the defense-in-depth and drop the unreachable 404, or move per-object checks intoexecute()so agents get a real 404 vs 403.total/total_pagesdescriptions claimX-WP-Totalheaders that I don't see on/run. Question inline, might be thepaginationmeta not being honored.
A few non-blocking nits also inline (openWorldHint, nullable GMT types, sibling ai/get-post-details overlap, a missing test).
Query mode dropped rows whose requested-field projection was empty while total/total_pages still came from the unfiltered found_posts count, so a fields subset that applied to no row could report e.g. total 16 with an empty posts list. Keep those rows as empty objects instead, and document that total can still exceed the returned posts when row-level permission checks withhold entries (matching REST posts controller behavior).
The abilities run controller returns ability results verbatim and never emits X-WP-Total / X-WP-TotalPages headers, and no consumer reads the pagination meta flag, so drop the flag and the header claims from the total/total_pages descriptions. Query totals stay in the response body. Also trim the empty-projection note from the posts description.
normalize_fields() no longer string-filters the requested names or intersects them with the supported set: the input schema's enum already rejects unknown field names as ability_invalid_input before execution, so the extra normalization only masked invalid requests. Add a test asserting an unknown field name fails validation.
Make get_post_properties() the single source of truth for the ability's post fields. It returns the field definitions keyed by name in output order; the output schema uses the definitions directly and the input schema fields enum uses the keys. This removes the parallel fields list, so adding or removing a field is a single edit and the input enum and output schema can no longer drift. The edit-context list stays separate: it drives permission decisions in has_explicit_edit_fields(), not the schema shape.
Apply the open review feedback on the core/read-content ability: - Make check_permission() the authoritative permission decision for single-post modes and drop the duplicate per-post re-checks from the execute callback. Query mode keeps row-level filtering because rows are unknown until the query runs. not_found_error() is documented as the direct-invocation contract of the execute callback. - Treat the literal slug "0" as a valid single-post lookup instead of falling through to query mode. - Resolve post type capabilities through post_type_cap(), failing closed with do_not_allow when a capability name cannot be resolved. - Suspend the cookie-based password gate while rendering for users who can edit a password-protected post, so get_the_excerpt() returns the real excerpt instead of the protected-post placeholder. - Fix GMT date handling: read the stored GMT columns directly (deriving them from the local date for drafts) instead of get_post_datetime(), which reprojects GMT dates into the site timezone on non-UTC sites; simplify the local-date fallback and document the empty-string sentinel in the date field descriptions. - Declare the ability closed-world (open_world: false) for MCP clients and state in the description that it requires authentication and is exact-match only, not full-text search. - Cover each point with integration tests.
Summary
Part of: #40
Adds the read-only
core/read-contentability to the plugin, mirroring the companion WordPress Core implementation so the two stay in sync. It overrides any core-provided copy.The ability has explicit modes:
id, with optionalpost_typeguard, returns the post directlypost_typeandslugreturns the post directly{ posts, total, total_pages }and supportsstatus,author,parent,include,fields,page, andper_pageDefaults are lean:
id,post_type,status,date,slug, andtitle_rendered. Heavier fields are opt-in viafields; raw fields require edit access, while rendered fields can be requested for readable posts.Companion Core PR: WordPress/wordpress-develop#12195
Security
Uses a coarse capability gate plus per-post read/edit checks. Missing, unreadable, mismatched, and unexposed posts return the same not-found response.
Tests
PHPUnit integration tests cover permissions, fields, single-post modes, query mode, and
include. The E2E spectests/e2e/specs/abilities/core-read-content.spec.jsexercises the client-side ability modes andinclude.Manual testing
Paste this in the browser console on an editor/admin page after enabling AI. It creates two published posts and one draft, then exercises query mode, draft query mode, query
include, single-post byid, and single-post bypost_type+slug. Each ability call logs{ input, output }.