Out of curiosity, (a) did you receive this error at the start of a session or in the middle of it, and (b) did you manage to find/confirm valid findings within the scope/codebase 4.7 was auditing with Sonnet/yourself later on?
I just gave 4.7 a run over a codebase I have been heavily auditing with 4.6 the past few days. Things began soothly so I left it for 10-15 minutes. When I checked back in I saw it had died in the middle of investigating one of the paths I recommended exploring.
I was curious as to why the block occurred when my instructions and explicitly stated intent had not changed at all - I provided no further input after the first prompt. This would mean that its own reasoning output or tool call results triggered the filter. This is interesting, especially if you think of typical vuln research workflows and stages; it’s a lot of code review and tracing, things which likely look largely similar to normal engineering work, code reviews, etc. Things begin to get more explicitly “offensive” once you pick up on a viable angle or chain, and increase as you further validate and work the chain out, reaching maximum “offensiveness” as you write the final PoC, etc.
So, one would then have to wonder if the activity preceding the mid-session flagging only resulted in the flag because it finally found something seemingly viable and started shifting reasoning from generic-ish bug hunting to over exploitation.
So, I checked the preceding tool calls, and sure enough…
What a strange world we’re living in. Somebody should try making a joke AUP violation-based fuzzer, policy violations are the new segfaults…
I view this paradox as just an effect of poor framing. We should not look at it as “I am against intolerance/hatred/XYZ”, but “I want to minimize intolerance/hatred/XYZ.” The first focuses on local, case-by-case contexts, the latter in aggregate. Some XYZs, in some contexts, have properties that make them effective local tools to mitigate themselves in an aggregate context, which is probably a better candidate paradox here.
the claim is that it moved sales forward in time, but it'll have a corresponding dip in sales later, whereas a good sales campaign increases total volume (virtually no dip, brings in new customers, etc)
If Maduro stole the election from someone else, and the US does not put that someone else in power, then what does that mean? If the US exercises their own decision making and judgement when installing someone in Maduro’s place and overrides or eclipses the will of the Venezuelan people, then how is this in support of democracy?
Seems silly to ignore that the last date in your list had an event closer to what OP is referring to than any other year, no? Considering he was already crying election fraud in 2016 you could certainly view this as a line with upwards slope…
Alrighty then, in a few years you can test your model’s accuracy against my prediction based on history and an understanding of how our laws and civics actually work.
Can you truly not see the fundamental difference here? Taking drugs is voluntary and the risk of drugs being laced is known by effectively everyone. Comparing THAT to people getting incinerated in their office place is nothing short of daft and insulting.
I just gave 4.7 a run over a codebase I have been heavily auditing with 4.6 the past few days. Things began soothly so I left it for 10-15 minutes. When I checked back in I saw it had died in the middle of investigating one of the paths I recommended exploring.
I was curious as to why the block occurred when my instructions and explicitly stated intent had not changed at all - I provided no further input after the first prompt. This would mean that its own reasoning output or tool call results triggered the filter. This is interesting, especially if you think of typical vuln research workflows and stages; it’s a lot of code review and tracing, things which likely look largely similar to normal engineering work, code reviews, etc. Things begin to get more explicitly “offensive” once you pick up on a viable angle or chain, and increase as you further validate and work the chain out, reaching maximum “offensiveness” as you write the final PoC, etc.
So, one would then have to wonder if the activity preceding the mid-session flagging only resulted in the flag because it finally found something seemingly viable and started shifting reasoning from generic-ish bug hunting to over exploitation.
So, I checked the preceding tool calls, and sure enough…
What a strange world we’re living in. Somebody should try making a joke AUP violation-based fuzzer, policy violations are the new segfaults…