Add core/read-content ability #739

Open
jorgefilipecosta wants to merge 46 commits into develop from add/core-content-ability
Conversation

@jorgefilipecosta

@jorgefilipecosta jorgefilipecosta commented Jun 16, 2026

Member

Summary

Part of: #40

Adds the read-only core/read-content ability to the plugin, mirroring the companion WordPress Core implementation so the two stay in sync. It overrides any core-provided copy.

The ability has explicit modes:

  • single post by id, with optional post_type guard, returns the post directly
  • single post by post_type and slug returns the post directly
  • query mode returns { posts, total, total_pages } and supports status, author, parent, include, fields, page, and per_page

Defaults are lean: id, post_type, status, date, slug, and title_rendered. Heavier fields are opt-in via fields; raw fields require edit access, while rendered fields can be requested for readable posts.

Companion Core PR: WordPress/wordpress-develop#12195

Security

Uses a coarse capability gate plus per-post read/edit checks. Missing, unreadable, mismatched, and unexposed posts return the same not-found response.
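
To make the uniform not-found behavior concrete, here is a minimal JavaScript sketch; it is not the plugin's PHP. The in-memory `posts` map, the `canRead` check, the `readPost` helper, and the error message are all hypothetical. Only the `content_not_found` code is borrowed from the review discussion on this PR.

```javascript
// Hypothetical in-memory stand-ins for the post store and the per-post read check.
const posts = new Map( [
	[ 1, { id: 1, post_type: 'post', status: 'publish' } ],
	[ 2, { id: 2, post_type: 'post', status: 'draft' } ],
] );
const canRead = ( user, post ) => post.status === 'publish' || user.canEdit;

// One shared error for every failure path, so a caller cannot tell
// "does not exist" apart from "exists, but you may not see it".
const NOT_FOUND = Object.freeze( {
	code: 'content_not_found',
	message: 'Content not found.', // Assumed wording.
} );

function readPost( user, { id, post_type } ) {
	const post = posts.get( id );
	if ( ! post ) {
		return { error: NOT_FOUND }; // Missing.
	}
	if ( post_type !== undefined && post.post_type !== post_type ) {
		return { error: NOT_FOUND }; // Mismatched post_type guard.
	}
	if ( ! canRead( user, post ) ) {
		return { error: NOT_FOUND }; // Unreadable.
	}
	return { post };
}

console.log( readPost( { canEdit: false }, { id: 2 } ) );
console.log( readPost( { canEdit: false }, { id: 99 } ) );
```

Both calls log the same error object, which is the property the description above relies on.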

Tests

PHPUnit integration tests cover permissions, fields, single-post modes, query mode, and include. The E2E spec tests/e2e/specs/abilities/core-read-content.spec.js exercises the client-side ability modes and include.

Manual testing

Paste this in the browser console on an editor/admin page after enabling AI. It creates two published posts and one draft, then exercises query mode, draft query mode, query include, single-post by id, and single-post by post_type + slug. Each ability call logs { input, output }.

const { ready } = await import( '@wordpress/core-abilities' );
if ( ready ) {
	await ready;
}

const { executeAbility } = await import( '@wordpress/abilities' );
const apiFetch =
	window.wp?.apiFetch || ( await import( '@wordpress/api-fetch' ) ).default;

const run = async ( input ) => {
	try {
		const output = await executeAbility( 'core/read-content', input );
		console.log( { input, output } );
		return output;
	} catch ( error ) {
		const output = {
			error: {
				code: error?.code,
				message: error?.message,
			},
		};
		console.log( { input, output } );
		return output;
	}
};

const suffix = Date.now();
const seededPosts = await Promise.all(
	[ 'one', 'two' ].map( ( label ) =>
		apiFetch( {
			path: '/wp/v2/posts',
			method: 'POST',
			data: {
				title: `core/read-content manual test ${ label }`,
				slug: `core-read-content-manual-${ suffix }-${ label }`,
				status: 'publish',
				content: `<!-- wp:paragraph --><p>Manual test ${ label } content.</p><!-- /wp:paragraph -->`,
			},
		} )
	)
);

const draftPost = await apiFetch( {
	path: '/wp/v2/posts',
	method: 'POST',
	data: {
		title: 'core/read-content manual test draft',
		slug: `core-read-content-manual-${ suffix }-draft`,
		status: 'draft',
		content: '<!-- wp:paragraph --><p>Manual test draft content.</p><!-- /wp:paragraph -->',
	},
} );

console.log( {
	input: { seed_posts: true },
	output: { published: seededPosts, draft: draftPost },
} );

// Query mode: readable posts with the default lean fields.
await run( { post_type: 'post' } );

// Query mode with include: specific posts, rendered content, and raw content.
await run( {
	post_type: 'post',
	include: seededPosts.map( ( post ) => post.id ),
	fields: [
		'id',
		'post_type',
		'status',
		'date',
		'slug',
		'title_rendered',
		'content_rendered',
		'content_raw',
	],
} );

// Query mode with drafts: requires a user that can edit/read drafts.
await run( {
	post_type: 'post',
	status: [ 'draft' ],
	include: [ draftPost.id ],
	fields: [
		'id',
		'post_type',
		'status',
		'date',
		'slug',
		'title_rendered',
		'content_rendered',
		'content_raw',
	],
} );

// Single-post mode: lookup by ID, with an optional post_type guard.
await run( {
	id: seededPosts[ 0 ].id,
	post_type: 'post',
	fields: [
		'id',
		'post_type',
		'slug',
		'title_rendered',
		'content_rendered',
		'content_raw',
	],
} );

// Single-post mode: lookup by post_type + slug.
await run( {
	post_type: 'post',
	slug: seededPosts[ 1 ].slug,
	fields: [
		'id',
		'post_type',
		'slug',
		'title_rendered',
		'content_rendered',
		'content_raw',
	],
} );

@github-actions

github-actions Bot commented Jun 16, 2026

✅ WordPress Plugin Check Report

✅ Status: Passed

📊 Report

All checks passed! No errors or warnings found.


🤖 Generated by WordPress Plugin Check Action • Learn more about Plugin Check

@jorgefilipecosta jorgefilipecosta changed the title from "[in progress] Add a core/content ability" to "Add core/content ability" Jun 18, 2026
@jorgefilipecosta jorgefilipecosta marked this pull request as ready for review June 18, 2026 09:25
@github-actions

github-actions Bot commented Jun 18, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: jorgefilipecosta <[email protected]>
Co-authored-by: gziolo <[email protected]>
Co-authored-by: peterwilsoncc <[email protected]>
Co-authored-by: galatanovidiu <[email protected]>
Co-authored-by: justlevine <[email protected]>
Co-authored-by: jasonbahl <[email protected]>
Co-authored-by: jeffpaul <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@jorgefilipecosta jorgefilipecosta force-pushed the add/core-content-ability branch 2 times, most recently from d1db766 to 88df83c Compare June 18, 2026 10:12
Comment thread includes/Main.php Outdated
@gziolo

gziolo commented Jun 18, 2026

Member

Nice work so far! A few things to iron out:

  • It looks like some files from the core/settings work (Add a core/settings ability #691) slipped into this branch: Settings.php, SettingsTest.php, the core-settings.spec.js e2e, and the e2e-sample-settings plugin. Since this PR is meant to stand on its own, those should come out (worth double-checking the .gitignore additions too).
  • Input schema needs more thought. The options are really mutually exclusive, but the schema doesn't say so. If you pass id, none of the other params (status, author, parent, page, per_page) fit at all — they're just silently ignored. Same story with slug. The one combination that does make sense is slug together with post_type. So this should be modeled as distinct modes rather than a single flat object where anything goes.
  • I see there is ai/get-post-details in the repo doing basically a subset of this. This work should supersede it, so let's plan to drop the old one rather than ship two overlapping "read a post" abilities.
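
One way to picture the "distinct modes" idea from the second point is a check that accepts an input only when it fits exactly one mode. This is a rough JavaScript sketch, not the schema used in this PR: `MODES` and `matchMode` are hypothetical, the key lists are taken from the PR description, and whether `fields` is accepted in every mode is an assumption.

```javascript
// Hypothetical mode table: each mode lists the keys it requires and the keys it tolerates.
const MODES = {
	'single-by-id': {
		required: [ 'id' ],
		allowed: [ 'id', 'post_type', 'fields' ],
	},
	'single-by-slug': {
		required: [ 'post_type', 'slug' ],
		allowed: [ 'post_type', 'slug', 'fields' ],
	},
	query: {
		required: [],
		allowed: [
			'post_type', 'status', 'author', 'parent',
			'include', 'fields', 'page', 'per_page',
		],
	},
};

// Returns the name of the single mode the input fits, or throws if it
// fits none (for example id combined with status) or more than one.
function matchMode( input ) {
	const keys = Object.keys( input );
	const matches = Object.entries( MODES )
		.filter(
			( [ , mode ] ) =>
				mode.required.every( ( key ) => keys.includes( key ) ) &&
				keys.every( ( key ) => mode.allowed.includes( key ) )
		)
		.map( ( [ name ] ) => name );

	if ( matches.length !== 1 ) {
		throw new Error(
			`Input matches ${ matches.length } modes; expected exactly one.`
		);
	}
	return matches[ 0 ];
}

console.log( matchMode( { id: 12, post_type: 'post' } ) );
console.log( matchMode( { post_type: 'page', slug: 'about' } ) );
```

With this shape, `{ id: 12, status: [ 'draft' ] }` is rejected instead of having `status` silently ignored.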

@gziolo

gziolo commented Jun 23, 2026

Member

For reference, I'm sharing WP CLI commands related to read-only operations on posts:

  • wp post get – the post can be found only by post ID; all fields are returned by default, and fields can be filtered with --fields or --field
  • wp post list – returns ID, post_title, post_name, post_date, and post_status by default; fields can be filtered with --fields or --field, and --<field>=<value> filters results by one or more arguments supported by WP_Query.

@jeffpaul jeffpaul added this to the 1.1.0 milestone Jun 23, 2026
@jeffpaul jeffpaul moved this from Triage to In progress in WordPress AI Roadmap Jun 23, 2026
@jeffpaul

Member

@jorgefilipecosta looks like some code review feedback and merge conflicts to clean up to help move this along towards merge and inclusion in the next AI plugin release (to help get some usage testing & feedback before a parallel PR is landed for core in 7.1)

@justlevine

justlevine commented Jun 23, 2026

Contributor

(to help get some usage testing & feedback before a parallel PR is landed for core in 7.1)

This is an unrealistic and IMO unwanted end-goal. I think we need to accept that no new core abilities will ship in 7.1, even if they make it into [email protected]

< 3 weeks is not enough time to get actual API design feedback for core. It's not even a full AI Plugin release cycle.

By all means if we can get this cleaned up in time for the next plugin release, great, but considering that they're not even behind an Experiment toggle, I wouldn't want y'all rushing it in before you or @dkotter think it's viable because of a desire to squeeze it into beta1.

cc @gziolo @jmarx

@jorgefilipecosta jorgefilipecosta force-pushed the add/core-content-ability branch 2 times, most recently from 3a7df4a to b3748e6 Compare June 23, 2026 15:22
@codecov

codecov Bot commented Jun 23, 2026

Codecov Report

❌ Patch coverage is 95.26316% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.80%. Comparing base (9db55ee) to head (b0827ef).

Files with missing lines | Patch % | Lines
includes/Abilities/Content/Content.php | 95.10% | 27 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop     #739      +/-   ##
=============================================
+ Coverage      75.50%   76.80%   +1.30%     
- Complexity      2086     2261     +175     
=============================================
  Files             99      100       +1     
  Lines           8626     9196     +570     
=============================================
+ Hits            6513     7063     +550     
- Misses          2113     2133      +20     
Flag | Coverage Δ
unit | 76.80% <95.26%> (+1.30%) ⬆️

@jorgefilipecosta

Member Author

Hi @gziolo, your feedback was applied.
Regarding "I see there is ai/get-post-details in the repo doing basically a subset of this. This work should supersede it, so let's plan to drop the old one rather than ship two overlapping "read a post" abilities." I plan to do that as a follow up in order to keep the scope of this PR smaller.

@peterwilsoncc peterwilsoncc left a comment

I've added a few notes inline.

To avoid burying the lead: I think the key question I have is why you don't include the rendered content so bots can be set up with a read-only account?

Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread tests/Integration/Includes/Abilities/Content/ContentTest.php
Comment thread includes/Abilities/Content/Content.php
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread tests/Integration/Includes/Abilities/Content/ContentTest.php
Comment thread tests/Integration/Includes/Abilities/Content/ContentTest.php
@gziolo

gziolo commented Jun 24, 2026

Member

After a closer inspection, I want to echo the point that @peterwilsoncc raised regarding the data formatting. We need to decide whether we return raw data or rendered/filtered data. This is what I see now:

  • Dates are in GMT format.
  • Title is currently filtered.
  • Content is in raw format.

What is needed will largely depend on the consumer. If they only want to present the data, then the preferred option would be using dates in the site's timezone, title, and content as rendered HTML. If they want to edit content, then maybe fetching raw data would make more sense. There are two ways to go about it:

  • Make room for both values as suggested in review.
  • Add a high-level flag that controls whether fields contain raw or post-processed fields.

@justlevine, thank you for the reminder about the timeline for the WP 7.1 release. Let's see how much work is left to iron out all the feedback raised so far. Either way, we want to get some early testing through the AI plugin. All the feedback is appreciated and will help shape this essential ability.

Comment thread includes/Abilities/Content/Content.php
Comment thread includes/Abilities/Content/Content.php
@gziolo gziolo mentioned this pull request Jun 24, 2026
3 tasks
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread includes/Abilities/Content/Content.php Outdated
'content_rendered' => array(
	'type'        => 'string',
	'description' => __( 'The rendered post content. Present when the post type supports the editor. Empty when withheld for a password-protected post.', 'ai' ),
),

If we return both content_raw and content_rendered, this will fill up the context window with duplicated content.
Suggestion: have a scope input field (similar to the REST API):

  • edit returns the raw content, which an agent can modify and send back in a new request
  • read returns content_rendered. It's probably best if we can also strip HTML tags (or transform to MD :D [nice to have]). I have no strong opinion on this one, but stripped tags would be kinder to the context window.

  • read returns content_rendered. It's probably best if we can also strip HTML tags (or transform to MD :D [nice to have]). I have no strong opinion on this one, but stripped tags would be kinder to the context window.

Warning

Opinionated 😸

I'd be reluctant to add the required code to core given it would add quite a bit of work to each request.

Dries Buytaert wrote up his experience of providing a markdown version of his content for LLMs and finding that the agents simply crawled both versions, so I think it's pretty safe to assume that they're happy with HTML.

Contributor

IMO that article has little to do with Markdown vs HTML and is rather about sources of truth/context rot (LLMs.txt is ineffective) and misunderstandings about how agentic discovery works (nothing's going to visit your MD endpoints if you don't explicitly encourage it to).

That aside, there are other implications with exposing raw vs rendered content, including security and end-user case. As abilities are meant to be a primitive and not an MCP-specific transport, I think the way to address that in MCP (or upstream in Abilities) is an arg to filter the output.

If in the short term this is a concern, then I'd say keep content_rendered without content_raw, since even if LLMs do prefer markdown to HTML, they'd want the markdown version of the rendered HTML, not the one filled with custom block markup or unprocessed synced patterns etc. But longer term, IMO the solution belongs in a different layer.

Contributor

That aside, there are other implications with exposing raw vs rendered content,

@jasonbahl any chance you have a few minutes to share your thoughts from on over in WPGraphQL-land? With how much influence we took in design language, it would make sense for the output ergonomics to wind up being similar, just maybe a bit flatter. Same considerations, at least 🤔

Member

In my earlier #739 (comment), I referenced WP CLI as a good example of where good defaults solve the issue you discuss. It could be replicated here by returning the same set: ID, post_title, post_name, post_date, and post_status. All other fields remain accessible, but the consumer needs to explicitly list them in the input using the fields array.

👋 @justlevine I left some notes here addressing the raw/rendered concerns along with some broader concerns: #739 (review)

I think the whole discussion missed my point. It's not about HTML vs Markdown, that was just a side note.

The real point: we need both content_raw and content_rendered, but never both at once.

  • editing → agent needs content_raw (block markup, shortcodes) so it can change it and write it back. Rendered HTML is useless here.
  • reading → agent needs content_rendered (what a human sees). Raw markup is just noise.

Any single call is one job or the other. So the field it doesn't need is always 100% redundant, the full post body duplicated on every call.

So I'd go with one content field that returns one or the other based on context: raw when editing is needed, rendered when it's read-only. If we want a default, probably raw only.

@jasonbahl this is basically what ?context=edit (REST) and format: RAW (WPGraphQL) already do, force the caller to declare intent. Same idea, just collapsed into a single field instead of two you opt into separately.

One more thing: if we go this way, the ability description needs updating too, so the agent knows when to pick each one (raw when it intends to edit, rendered when it just needs to read).

@justlevine justlevine Jul 2, 2026

Contributor

The real point: we need both content_raw and content_rendered, but never both at once.

This is not true. Putting aside yet again that abilities aren't meant to serve agents (they're meant to be an underlying primitive to serve all last-mile APIs), an agent

  • doing migrations,
  • building patterns that match existing site designs,
  • or frontend generative ui work,

are all cases that would benefit from having both the preprocessed _raw and the full-fat _rendered data beyond the well-established non-AI ones like page builders and headless previews.

But also, worse than that, this would pollute the API with state: you need to know the context in order to be able to accommodate the data. I've got PostObject.content but no idea whether or not it needs parsing or rendering or data sanitization anymore.

Member Author

Yeah, I agree with @justlevine's point; I can think of some cases where having both rendered and non-rendered content may be useful for an agent. Currently we don't return content by default, but callers can request rendered, raw, or both, and the same goes for any other field.

@jorgefilipecosta jorgefilipecosta force-pushed the add/core-content-ability branch from f5136db to 75365de Compare July 2, 2026 14:17
@jorgefilipecosta jorgefilipecosta requested a review from a team July 2, 2026 14:17
@jorgefilipecosta

jorgefilipecosta commented Jul 2, 2026

Member Author

Thank you all for the discussions and insights. It feels like the remaining discussion point is when to return rendered, raw, or both.

By default the fields we return are very lean: id, post_type, status, date, slug, title_rendered (if title is supported).
Then we have the following optional fields that are returned when requested (can always be requested): date_gmt, modified, modified_gmt, and link.
And the following fields that may be requested but may or may not be returned, depending on what the post type supports and what permissions the user has: excerpt_rendered, excerpt_protected, excerpt_raw, content_rendered, content_protected, content_raw, author, parent, title_raw.

This allows us to keep responses lean while still giving agents access to either rendered versions (e.g., a summarization agent), raw versions (e.g., an edit agent), or both (e.g., some migration or SEO/GEO analysis agent).

It is very flexible and agents can decide exactly what they need, but it also makes the code more complex, and forces agent workflows to "think" about the fields needed instead of just asking for edit fields or display fields.
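
As a rough illustration of the current behavior described above (lean defaults, everything else opt-in), here is a small JavaScript sketch. `resolveFields` is hypothetical, and two simplifications are assumptions rather than the PR's confirmed behavior: every `_raw` field is treated as needing edit access, and raw fields are silently dropped for users without it (the actual ability may reject such a request instead). Post-type support checks are left out.

```javascript
// Default lean set, as listed in the comment above.
const DEFAULT_FIELDS = [
	'id', 'post_type', 'status', 'date', 'slug', 'title_rendered',
];

// Assumption: any field ending in "_raw" needs edit access to the post.
const needsEdit = ( field ) => field.endsWith( '_raw' );

// Returns the fields to output for one post: the requested list (or the
// defaults when nothing was requested), minus fields the user may not see.
function resolveFields( requested, { canEdit } ) {
	const fields =
		requested && requested.length ? requested : DEFAULT_FIELDS;
	return fields.filter( ( field ) => canEdit || ! needsEdit( field ) );
}

console.log( resolveFields( undefined, { canEdit: false } ) );
console.log(
	resolveFields( [ 'id', 'content_rendered', 'content_raw' ], { canEdit: false } )
);
```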

I'm not opposed to introducing a format, as @gziolo suggested at #739 (comment), with format being display, raw, and I guess full when the agent wants both. Format would essentially pre-select a group of fields.
What happens when format is display but fields includes content_raw? I guess format takes precedence and fields can only select fields from that group (correct me if I'm wrong).
Another question I have is whether format changes the return, e.g., if I set the raw format, do I automatically get content_raw? In that case, for consistency, shouldn't the display format also include content_rendered?
Would a format with four possible values address this: "default", the lean format; "display", fields useful for display, such as content_rendered; "raw", raw fields; "full", all possible fields? Fields would be available on display, edit, or both, and the fields array would extend or reduce the default set from the format. But if I am on display, I cannot request edit fields even if the fields array passes them.

Another option could be to make fields support the current array structure but also allow fields to be a string (display, raw, full); when it is a string, it essentially maps to a fixed array, e.g., display is equal to id, post_type, status, date, slug, title_rendered. That would avoid the format concept.

I guess we have three options we need to decide on:

  • The current approach: a lean return, with anything beyond it requested through an explicit fields array (be it rendered, raw, or both).
  • A format flag that controls the set of available fields (display, raw, full), plus a fields array to control which fields appear; format takes precedence.
  • Make fields an array or a string, the string being an alias for a set of fields.
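
The third option could look roughly like the sketch below. `FIELD_ALIASES` and `expandFields` are hypothetical names. Only the `display` mapping comes from the comment above; the `raw` list is invented purely for illustration, and `full` is omitted because its contents were not specified.

```javascript
// Alias table: a string value for `fields` expands to a fixed array.
const FIELD_ALIASES = new Map( [
	// Mapping given in the comment above.
	[ 'display', [ 'id', 'post_type', 'status', 'date', 'slug', 'title_rendered' ] ],
	// Hypothetical mapping, for illustration only.
	[ 'raw', [ 'id', 'post_type', 'status', 'date', 'slug', 'title_raw', 'content_raw' ] ],
] );

// Accepts either an explicit array (returned as-is) or an alias string.
function expandFields( fields ) {
	if ( Array.isArray( fields ) ) {
		return fields;
	}
	const expanded = FIELD_ALIASES.get( fields );
	if ( ! expanded ) {
		throw new Error( `Unknown fields alias: ${ fields }` );
	}
	return [ ...expanded ]; // Copy so callers cannot mutate the table.
}

console.log( expandFields( 'display' ) );
console.log( expandFields( [ 'id', 'content_rendered' ] ) );
```

The trade-off against a separate format flag is that there is no precedence rule to define: a caller passes either an alias or an explicit list, never both.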

@galatanovidiu galatanovidiu left a comment

Did another pass on the read/query paths. Direction is good and the mode split reads well now. A cluster of things need work before this is solid, most of them in format_post() / execute_get_content() and the date helpers. Leaving the detail inline, short version here:

  • Query total can disagree with posts. A fields subset that empties every row gets those rows dropped from posts, but total is read before the drop, so you can get total: 16 with posts: [].
  • date_gmt / modified_gmt are wrong on any non-UTC site. Both the normal path and the draft fallback mislabel the offset or the instant. Detail inline on format_gmt_date().
  • excerpt_rendered hides the real excerpt from editors on password-protected posts, even though content_rendered correctly shows them the content. Inconsistent, and it contradicts the field's own schema description.
  • A post whose slug is literally "0" can't be fetched by slug: it hits the ! empty() trap (PHP treats "0" as empty) and falls through to query mode.
  • content_not_found is dead code on every transport, because check_permission() already resolves the object and denies first. Worth a decision: keep the defense-in-depth and drop the unreachable 404, or move per-object checks into execute() so agents get a real 404 vs 403.
  • total / total_pages descriptions claim X-WP-Total headers that I don't see on /run. Question inline, might be the pagination meta not being honored.

A few non-blocking nits also inline (openWorldHint, nullable GMT types, sibling ai/get-post-details overlap, a missing test).

Comment thread includes/Abilities/Content/Content.php
Comment thread includes/Abilities/Content/Content.php
Comment thread includes/Abilities/Content/Content.php
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread includes/Abilities/Content/Content.php
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread includes/Abilities/Content/Content.php
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread includes/Abilities/Content/Content.php Outdated
Comment thread tests/Integration/Includes/Abilities/Content/ContentTest.php
Query mode dropped rows whose requested-field projection was empty while
total/total_pages still came from the unfiltered found_posts count, so a
fields subset that applied to no row could report e.g. total 16 with an
empty posts list. Keep those rows as empty objects instead, and document
that total can still exceed the returned posts when row-level permission
checks withhold entries (matching REST posts controller behavior).
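
The fix described in this commit message can be sketched in JavaScript as follows. This is an illustration of the idea, not the PHP in `Content.php`; `project` and `buildQueryResult` are hypothetical names, and `total` stands in for the unfiltered found-posts count.

```javascript
// Projects a row onto the requested fields; fields the row lacks are omitted.
function project( row, fields ) {
	return Object.fromEntries(
		fields
			.filter( ( field ) => field in row )
			.map( ( field ) => [ field, row[ field ] ] )
	);
}

function buildQueryResult( rows, fields, total, perPage ) {
	return {
		// Rows are kept even when their projection is empty, so the number
		// of returned posts no longer drops to zero while total stays high.
		posts: rows.map( ( row ) => project( row, fields ) ),
		total,
		total_pages: Math.ceil( total / perPage ),
	};
}

const rows = [ { id: 1, slug: 'a' }, { id: 2, slug: 'b' } ];
console.log( buildQueryResult( rows, [ 'content_rendered' ], 16, 10 ) );
```

Here neither row has `content_rendered`, so `posts` holds two empty objects rather than being an empty list.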

The abilities run controller returns ability results verbatim and never
emits X-WP-Total / X-WP-TotalPages headers, and no consumer reads the
pagination meta flag, so drop the flag and the header claims from the
total/total_pages descriptions. Query totals stay in the response body.
Also trim the empty-projection note from the posts description.

normalize_fields() no longer string-filters the requested names or
intersects them with the supported set: the input schema's enum already
rejects unknown field names as ability_invalid_input before execution,
so the extra normalization only masked invalid requests. Add a test
asserting an unknown field name fails validation.

Make get_post_properties() the single source of truth for the ability's
post fields. It returns the field definitions keyed by name in output
order; the output schema uses the definitions directly and the input
schema fields enum uses the keys.

This removes the parallel fields list, so adding or removing a field is
a single edit and the input enum and output schema can no longer drift.
The edit-context list stays separate: it drives permission decisions in
has_explicit_edit_fields(), not the schema shape.
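
The single-source-of-truth idea in this commit message can be sketched in JavaScript like so. `getPostProperties` here is a trimmed, hypothetical counterpart of the PHP `get_post_properties()`; the four fields and their types are illustrative, not the full list from the PR.

```javascript
// Hypothetical, trimmed-down field definitions, keyed by name in output order.
function getPostProperties() {
	return {
		id: { type: 'integer' },
		post_type: { type: 'string' },
		title_rendered: { type: 'string' },
		content_raw: { type: 'string' },
	};
}

const properties = getPostProperties();

// The output schema uses the definitions directly...
const outputSchema = { type: 'object', properties };

// ...and the input schema's fields enum is derived from the same keys,
// so the two cannot drift apart when a field is added or removed.
const inputSchema = {
	type: 'object',
	properties: {
		fields: {
			type: 'array',
			items: { type: 'string', enum: Object.keys( properties ) },
		},
	},
};

console.log( inputSchema.properties.fields.items.enum );
```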

Apply the open review feedback on the core/read-content ability:

- Make check_permission() the authoritative permission decision for
  single-post modes and drop the duplicate per-post re-checks from the
  execute callback. Query mode keeps row-level filtering because rows
  are unknown until the query runs. not_found_error() is documented as
  the direct-invocation contract of the execute callback.
- Treat the literal slug "0" as a valid single-post lookup instead of
  falling through to query mode.
- Resolve post type capabilities through post_type_cap(), failing
  closed with do_not_allow when a capability name cannot be resolved.
- Suspend the cookie-based password gate while rendering for users who
  can edit a password-protected post, so get_the_excerpt() returns the
  real excerpt instead of the protected-post placeholder.
- Fix GMT date handling: read the stored GMT columns directly (deriving
  them from the local date for drafts) instead of get_post_datetime(),
  which reprojects GMT dates into the site timezone on non-UTC sites;
  simplify the local-date fallback and document the empty-string
  sentinel in the date field descriptions.
- Declare the ability closed-world (open_world: false) for MCP clients
  and state in the description that it requires authentication and is
  exact-match only, not full-text search.
- Cover each point with integration tests.
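
The GMT date rule from the list above can be sketched as follows. This is a JavaScript illustration under stated assumptions, not the PHP from the PR: `localToGmt` and `gmtDate` are hypothetical helpers, the site timezone is modeled as a fixed UTC offset in hours (ignoring DST), and drafts are assumed to store an all-zero GMT date.

```javascript
const ZERO_DATE = '0000-00-00 00:00:00';

// Converts a site-local "YYYY-MM-DD HH:MM:SS" string to GMT, given the
// site's UTC offset in hours (assumed fixed; DST is ignored here).
function localToGmt( localDate, offsetHours ) {
	// Parse the wall-clock time as if it were UTC, then shift by the offset.
	const asUtc = Date.parse( localDate.replace( ' ', 'T' ) + 'Z' );
	const gmt = new Date( asUtc - offsetHours * 3600 * 1000 );
	return gmt.toISOString().slice( 0, 19 ).replace( 'T', ' ' );
}

function gmtDate( post, offsetHours ) {
	if ( post.post_date_gmt && post.post_date_gmt !== ZERO_DATE ) {
		return post.post_date_gmt; // Stored GMT column, used verbatim.
	}
	if ( ! post.post_date || post.post_date === ZERO_DATE ) {
		return ''; // Empty-string sentinel: no usable date.
	}
	return localToGmt( post.post_date, offsetHours ); // Draft fallback.
}

const draft = { post_date: '2026-06-16 10:00:00', post_date_gmt: ZERO_DATE };
console.log( gmtDate( draft, 2 ) ); // 2026-06-16 08:00:00
```

The key point is that a stored GMT value is never re-projected through the site timezone; only the draft fallback does any conversion.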
Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

8 participants