### Feature Description
The GoogleGenAI LLM integration should expose token usage metadata for the structured prediction methods (`structured_predict`, `astructured_predict`, `stream_structured_predict`, `astream_structured_predict`).

Token tracking works correctly for `chat()`, `achat()`, `complete()`, and `acomplete()` via the `chat_from_gemini_response()` utility function, which extracts `usage_metadata` and populates `additional_kwargs` with `prompt_tokens`, `completion_tokens`, and `total_tokens`.

However, the structured prediction methods bypass this utility and return only the parsed Pydantic model, discarding all token usage information from the API response.

Expected behavior: token usage should be accessible for all LLM methods, including structured predictions.
| Method | Returns | Token Tracking? | Raw Response Access? |
|---|---|---|---|
| `chat()` | `ChatResponse` | ✅ Yes | ✅ `response.raw` |
| `achat()` | `ChatResponse` | ✅ Yes | ✅ `response.raw` |
| `complete()` | `CompletionResponse` | ✅ Yes | ✅ `response.raw` |
| `acomplete()` | `CompletionResponse` | ✅ Yes | ✅ `response.raw` |
| `stream_chat()` | `ChatResponseGen` | ✅ Yes | ✅ `response.raw` |
| `astream_chat()` | `ChatResponseAsyncGen` | ✅ Yes | ✅ `response.raw` |
| `structured_predict()` | Model (Pydantic) | ❌ No | ❌ No |
| `astructured_predict()` | Model (Pydantic) | ❌ No | ❌ No |
| `stream_structured_predict()` | Model (yielded) | ❌ No | ❌ No |
| `astream_structured_predict()` | Model (yielded) | ❌ No | ❌ No |
#### 📁 Code reference

Working implementation: `chat_from_gemini_response()` in `utils.py`, lines 167-178:

```python
if response.usage_metadata:
    raw["usage_metadata"] = response.usage_metadata.model_dump()
    additional_kwargs["prompt_tokens"] = response.usage_metadata.prompt_token_count
    additional_kwargs["completion_tokens"] = response.usage_metadata.candidates_token_count
    additional_kwargs["total_tokens"] = response.usage_metadata.total_token_count
```

Missing implementation: `structured_predict()` in `base.py`, lines 584-644:

```python
# response.usage_metadata exists but is discarded
if isinstance(response.parsed, BaseModel):
    return response.parsed  # No token metadata attached
```

### Reason
**What is stopping LlamaIndex from supporting this feature today?**

The `structured_predict()` implementation calls `self._client.models.generate_content()` directly and returns `response.parsed` without extracting `usage_metadata` from the response object. The data exists in the response, but is discarded.
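A fix could extract that metadata before the parsed model is returned. The helper below is a hypothetical sketch, not the repo's actual code (`extract_usage` and the `Fake*` mocks are invented for illustration); it mirrors the field names already used by `chat_from_gemini_response()` and also picks up `thoughts_token_count` for thinking models:

```python
def extract_usage(response) -> dict:
    """Hypothetical helper: pull token counts out of a Gemini-style response
    before structured_predict() discards the raw object."""
    usage = getattr(response, "usage_metadata", None)
    if usage is None:
        return {}
    counts = {
        "prompt_tokens": getattr(usage, "prompt_token_count", None),
        "completion_tokens": getattr(usage, "candidates_token_count", None),
        "total_tokens": getattr(usage, "total_token_count", None),
        # thinking models also report reasoning tokens
        "thoughts_tokens": getattr(usage, "thoughts_token_count", None),
    }
    return {k: v for k, v in counts.items() if v is not None}


class FakeUsage:
    prompt_token_count = 10
    candidates_token_count = 5
    total_token_count = 18
    thoughts_token_count = 3


class FakeResponse:
    parsed = {"ok": True}
    usage_metadata = FakeUsage()


print(extract_usage(FakeResponse()))
```

Where the extracted dict ends up (dispatched to callbacks, stashed on the LLM instance, or attached to the returned model) is a design choice for maintainers; the point is only that the extraction must happen before `response.parsed` is returned.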
**What existing approaches have not worked for you?**

- **Phoenix/Arize observability** - Transactions using `structured_predict()` appear in traces without token counts, making it impossible to benchmark across methods.
- **TokenCountingHandler** - Cannot count tokens for structured predictions, breaking cost analysis.
- **Workaround using `chat()`** - While it is technically possible to pass `generation_config={"response_schema": MyModel}` to `chat()`, this requires manual JSON parsing, loses the convenience of `structured_predict()` returning typed objects, is undocumented, and creates API inconsistency.
- **Thinking models** - Gemini 3.1, 3, and 3 Flash Lite have reasoning/thinking capabilities with `thoughts_token_count`. Engineers are unaware that structured predictions have untracked thinking tokens.
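To illustrate the manual JSON parsing the `chat()` workaround forces on callers, here is a stdlib-only sketch (`fake_chat` and `Person` are invented stand-ins; a real call would go through the GoogleGenAI LLM with `generation_config={"response_schema": ...}`):

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


def fake_chat(prompt: str) -> dict:
    # Stand-in for chat(): raw JSON text plus token counts in additional_kwargs
    return {
        "text": '{"name": "Ada", "age": 36}',
        "additional_kwargs": {"prompt_tokens": 9, "completion_tokens": 11, "total_tokens": 20},
    }


def structured_via_chat(prompt: str):
    """The workaround: token counts survive, but the typed object
    must be rebuilt by hand instead of arriving pre-parsed."""
    response = fake_chat(prompt)
    # The manual step that structured_predict() is supposed to hide:
    parsed = Person(**json.loads(response["text"]))
    return parsed, response["additional_kwargs"]


person, tokens = structured_via_chat("Extract the person.")
print(person)                  # Person(name='Ada', age=36)
print(tokens["total_tokens"])  # 20
```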
### Value of Feature

- **Observability parity** - Phoenix OpenInference traces for structured predictions currently lack token counts, making it difficult for engineers to benchmark across different methods.
- **Cost tracking** - Teams allocating costs by token usage cannot accurately track structured prediction usage.
- **Thinking/reasoning models** - Modern Gemini models (3.1, 3, 3 Flash Lite) perform reasoning with `thoughts_token_count`. Without tracking, engineers cannot measure reasoning token efficiency or optimize prompt strategies.
- **API consistency** - Users expect all LLM methods to return consistent metadata. The current gap creates confusion: why does `chat()` show token counts but `structured_predict()` doesn't?
- **Downstream tool compatibility** - Tools like MLflow, Phoenix, and custom callback handlers expect token counts in `additional_kwargs`. Structured predictions break this contract.
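As a concrete illustration of that contract, here is a toy aggregator (a pure-Python stand-in, not llama-index's actual `TokenCountingHandler`) that can only count what `additional_kwargs` actually carries:

```python
class TokenTally:
    """Toy aggregator standing in for TokenCountingHandler-style tools:
    it can only count what additional_kwargs actually contains."""

    def __init__(self):
        self.total = 0

    def on_llm_response(self, additional_kwargs: dict) -> None:
        self.total += additional_kwargs.get("total_tokens", 0)


tally = TokenTally()
tally.on_llm_response({"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20})  # chat()
tally.on_llm_response({})  # structured_predict(): nothing to count
print(tally.total)  # 20 -- the structured call silently vanishes from cost reports
```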
Impact if not fixed:
- Engineers may not realize structured predictions aren't being tracked
- Production systems have incomplete observability data
- Cost allocation for structured workflows is impossible
- Comparison benchmarks between methods are incomplete
### Related Issues

- [Bug] #20218 - Missing token usage information in GoogleGenAI metadata for MLFlow Tracing (Closed - fixed for `chat()`/`achat()` only; the `structured_predict()` methods were not addressed)
- [Feature Request] #17736 - StructuredLLM - Add raw completion response alongside structured output (Open - broader request covering all LLMs and all raw response fields)
- [Bug] #19293 - No Input/Output Token count for Gemini 2.5 models (Open - may be related; reports missing token counts for Gemini 2.5 in instrumentation)
- [Feature Request] #19662 - Get thoughts_token_count from gemini response (Closed - fixed in `chat_from_gemini_response()`, but `thoughts_token_count` is still not extracted in `structured_predict()`)

#20218 is the closest predecessor: it was closed after fixing token tracking for the `chat()` methods, but the fix did not extend to the `structured_predict()` methods. This issue effectively completes the work started in #20218.