I'm trying to use computer use and browser use (via playwright MCP) in my work.
Computer use is hit and miss (mostly miss), but Playwright MCP often works very well.
The downside is it takes a lot of time to complete even easy tasks.
For example, to automate processing emails, it needs to
1. go to Gmail
2. log in to Google if necessary (this often requires two-step verification, so it's hard to automate completely, but possible)
3. read the latest mail
4. check the content and choose the action
- if needed, reply to the email
- if it mentions tasks, add them to the todo list
- if it mentions schedules, add them to the calendar
5. repeat for all emails based on specified conditions.
And each step requires dozens of DOM (a11y tree) analyses and actions (filling the username/password inputs, checking "keep me signed in", clicking the submit button, etc.).
Depending on the model used, each step can take ~100s.
So easy tasks can easily add up to tens of minutes or even hours.
For frequently used tasks, I write skills like /logging-in and /read-latest-emails using Playwright scripts and let the agent choose them.
Then, based on the email content, the agent chooses other tools like /write-reply, /add-todo, /add-event, etc., so that the model can focus only on the core tasks that require thinking.
It reduces the execution time drastically.
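The dispatch step above can be sketched as a pure function; the skill names are from this comment, but the keyword rules are hypothetical stand-ins for whatever the agent actually decides:

```python
# Minimal sketch of the dispatch step: given an email's text, pick which
# skill the agent should invoke next. The keyword heuristics here are
# illustrative only; in practice the model makes this call.

def choose_skill(email_body: str) -> str:
    text = email_body.lower()
    if any(kw in text for kw in ("todo", "task", "action item")):
        return "/add-todo"
    if any(kw in text for kw in ("meeting", "schedule", "calendar")):
        return "/add-event"
    return "/write-reply"  # default: the email just needs a response

print(choose_skill("Please add this task to the backlog"))   # /add-todo
print(choose_skill("Can we schedule a meeting on Friday?"))  # /add-event
```

The point is that only this decision needs the model; everything before and after it can run as a plain script.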
But this can bury important business logic in the Playwright scripts instead of the agent's instructions.
For example, the simplified steps to add TODO items look like this:
1. read the email
2. check whether it's about todos, and if so decide to add them to Asana
3. extract and summarize the title, content, priority, due date, tags, etc.
4. access Asana (log in if necessary)
5. check if there are similar tasks
6. if not, add the tasks
This can take tens of minutes, and each step can carry important business logic, like:
- how to decide the priority and due date
- how to choose tags based on the content
- how to decide if two tasks are similar
This information should be readable and updatable not only by developers, but also by managers and other teams.
And if I write those steps as skills with Playwright scripts, it improves the speed, but all that business logic ends up buried in the code, inaccessible to non-technical people.
It's also error-prone, because websites often tweak their UI and the scripts can stop working.
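To make the problem concrete, here is one of those buried rules ("how to decide if two tasks are similar") written out as code. This is a hypothetical sketch, not the actual rule; the argument is precisely that this logic should live in the agent's instructions, where a manager can read and edit it:

```python
# A hypothetical "are these two tasks similar?" rule. Once it lives in a
# script like this, non-technical people can no longer review or change it.

def normalize(title: str) -> set[str]:
    # Compare on lowercase word sets, ignoring trivial stop words.
    stop = {"the", "a", "an", "to", "for", "of"}
    return {w for w in title.lower().split() if w not in stop}

def is_similar(task_a: str, task_b: str, threshold: float = 0.6) -> bool:
    a, b = normalize(task_a), normalize(task_b)
    if not a or not b:
        return False
    overlap = len(a & b) / len(a | b)  # Jaccard similarity of word sets
    return overlap >= threshold
```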
So it would be very convenient if the agent processed these steps once, then decided whether it's worth writing a Playwright script so that next time those mundane processes can be executed instantly.
With automatic skill generation, the agent decides by itself whether there are workflows worth turning into skills backed by Playwright scripts, like /log-in, /extract-information, /check-similar-tasks, /add-tasks.
Like a just-in-time compiler, the skills become a byproduct of the agent's instructions: all the business logic stays written in the instructions, and the scripts don't need to be updated manually or tracked in a version control system.
This can cut a lot of execution time and API cost, and it can be applied beyond browser automation, to computer use or any other agentic task where it's possible to write automation scripts for the steps that don't require thinking.
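The JIT analogy can be sketched as a tiny cache: the first run goes through the slow, reasoning-heavy path; if it succeeds, a script is stored and reused on later runs. Everything here is a toy illustration, not a real agent framework:

```python
# Toy sketch of "JIT skill generation": run a workflow step by step the
# first time, cache a generated script, and take the fast path afterwards.

skills: dict[str, str] = {}  # skill name -> generated script (stub string)

def run_workflow(name: str, slow_steps) -> str:
    if name in skills:
        # Fast path: a script was already generated for this workflow.
        return f"ran cached script for {name}"
    # Slow path: execute each step with full model reasoning.
    results = [step() for step in slow_steps]
    skills[name] = "generated playwright script"  # cache for next time
    return f"executed {len(results)} steps for {name}"
```

A real implementation would also need invalidation (e.g., rerun the slow path when the cached script fails because the site's UI changed).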
So I skimmed several articles, and the reasons the Theranos CEO was sentenced to 11 years are:
1. The scale of the fraud was too big
2. From her emails, it seemed she intentionally tricked investors
3. The product, medical equipment, endangered patients.
I think this can be applied to Tesla too (though I'm not sure there is enough evidence of 2). Shouldn't someone in charge be sentenced to at least a few years?
Right, I've also heard your (1) above expressed as "she basically stole from the wrong set of people -- rich and powerful".
Kinda-sorta off-topic (but not really), it reminds me of Charlie Javice. She sold a database of college loan applicants to JP Morgan for $175 million -- it later turned out that she had fabricated most of that data.
I think the big difference is that criminal wire fraud depends on a "clear scheme to defraud with intent". Tesla/Musk can argue that they thought they would deliver: they've been making claims that FSD was coming for years and have been slowly making progress toward FSD; it's just that it's harder and has taken longer than expected, and without a smoking gun (email chains like in the Holmes case) it would be very hard to prove.
They may have committed false advertising or "failed to deliver on contract", but those are civil matters, which could still involve big payouts, but not prison time.
There's a corpus of work that could help there. Tesla was forced to add disclaimers like "Elon's statements are aspirational and do not necessarily represent engineering reality", and there are quotes from him on investor calls where he described (in 2009, I believe) FSD as a "solved problem, we're just implementing", and then, five years later, "Our highest priority is solving the problem of FSD". It seems possible that at some point an ambitious prosecuting attorney or attorney general pushes for this and the discovery that comes with it (though I have near-zero confidence that even then, that discovery won't already be thoroughly crippled by document-retention policies or outright fuckery by Elon).
It's exciting to see that the new CEO of Apple is a hardware guy.
I was just thinking about what has avoided enshittification, and Apple's hardware was the only thing I came up with.
Everything else -- all the products from Google, MS, Facebook, Twitter, and even Nvidia (though the performance improved) -- has gone downhill.
It's not only tech companies, but fast food, car manufacturers, real estate, and many others, if they weren't shit from the start like consulting, healthcare, and marketing.
They have flaws, like not allowing users to repair the hardware, but well, at least it's consistent.
I really hope Apple (hardware at least) will remain free from enshittification.
Do you just give your Google account to OpenClaw, or create a separate account with limited permissions? I'm worried that OpenClaw would decide to create an entire website on a GCP project without asking if it sees a message like "have you already developed and deployed the management dashboard?"
In my case, I often find life goals and enjoy the journey when I'm mentally healthy, not vice versa.
I can't control my mood, but when I'm positive, I start a new hobby like dancing or playing an instrument, cook healthy meals, lift, sleep well, study new things, etc.
But when I'm depressed, I lose all interest in my life goals, eat junk food, skip exercise, and browse the Internet all night. I can't even enjoy my hobbies anymore.
It's always my mood that comes first, then I can find life goals and naturally do all healthy stuff.
Funnily, when I'm mentally healthy I also visit Hacker News frequently, but when I'm depressed all I do is infinitely scroll Reddit/TikTok.
It could be a bit like that elusive thing called motivation. "Just do it" seems so annoying when people say it, but in my case sometimes it's the only way to start building momentum. What I'm saying is don't wait for the mood; perhaps the mood will develop once you gain momentum on a goal or task.
Yeah, I do hold this attitude and try to brush aside "mood", which I believe is a mere suggestion, not a command. However, I've noticed a completely different "mode" when I'm operating IN a project versus BETWEEN projects.
When I'm IN a project, "just do it" works very well. But when I'm BETWEEN projects (this is when I've completed a project and dived into the next one, but found out that I didn't enjoy it, or got lost, or whatever), "just do it" only gets me to do the immediate task -- it doesn't really create the focus needed to move the projects forward. What frustrates me is that this IN-BETWEEN period can take multiple months to get out of, which is a huge waste of time. If only I could figure it out as soon as I completed the previous project, I'd achieve so much more -- you see, for the type of projects I'm working on, I can't afford to wait several months, or even several weeks, because this isn't my day job; if I wait too long, I lose the knowledge. It's like muscle training: you can't stop for several weeks and hope the muscle remains.
I use the “just do it” trick a lot myself. It was something I discovered when I was in my early 20s, hiking the Appalachian Trail, which requires you to get up and move every day. My partner and I did not have much money, so if we failed to finish, we would not have a second chance. I remember waking up one morning after a rainstorm, realizing that I left my boots outside the tent. They were cold and wet. Putting them on was going to be unpleasant. I thought “I just need to do it so that I can have other things that I want in my life.” Something clicked in that moment. Now, whenever I don’t want to do something I ask “does it help me do or be what I want?” It helps a lot. From big goals (eg, earn a degree, get the job I want, etc) to little tasks (take out the garbage, clean the toilets, etc). Oddly I find the little jobs to be the hardest, probably because although I recognize that living in a house with clean toilets is something I want, it’s not obviously connected with a motivating goal. This mental trick is very helpful for the little tasks.
The worst blunder I made was when I explored cloud resources to improve the product's performance.
I created a GCP project (my-app-dev) for exploring how to scale up the cloud service.
I added several resources to mock production, like compute instances, Cloud SQL, etc., then populated the data and ran several benchmarks.
I changed the specs, number of instances and replicas, and configs through gcloud command.
But for some reason, at one point Codex asked to list all projects.
I couldn't understand why, but it seemed harmless, so I approved the command:
$ gcloud projects list
PROJECT_ID NAME PROJECT_NUMBER
my-app-test my app 123456789012
my-app-dev my app 234567890123 <- the dev project I was working on
my-app my app 345678901234 <- the production (I know it's a bad name)
And after this, for whatever reason, it changed the target project from dev (my-app-dev) to production (my-app) without asking and without me realizing.
Of course I checked every command.
I couldn't YOLO while working on cloud resources, even in the dev environment.
But I focused on the subcommands and their content and didn't even notice it had changed the project ID along the way.
It continued to suggest more and more aggressive commands for testing, and I approved them mindlessly...
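One cheap mitigation is to mechanically check the target project before approving anything. Here is a minimal sketch; the project IDs come from the story above, but the guard itself is hypothetical:

```python
# Refuse any gcloud command whose effective target project is production.
# Checks both an explicit --project flag and the currently configured project.
import shlex

PRODUCTION_PROJECTS = {"my-app"}  # the badly named production project above

def check_command(cmd: str, active_project: str) -> bool:
    """Return True if the command is safe to approve."""
    tokens = shlex.split(cmd)
    target = active_project  # default: whatever gcloud config is set to
    for i, tok in enumerate(tokens):
        if tok == "--project" and i + 1 < len(tokens):
            target = tokens[i + 1]
        elif tok.startswith("--project="):
            target = tok.split("=", 1)[1]
    return target not in PRODUCTION_PROJECTS
```

It wouldn't have caught everything, but it would have flagged the silent switch from my-app-dev to my-app.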
This is part of the reason deployments to production cloud environments should:
1. Only be allowed via CI/CD
2. Have all infra defined as code
3. Be a delayed process with at least one human-approval step in the workflow
(Exactly where that review step is placed depends on your organisation - culture, size, etc.)
And anyone that does need to touch production should do so from an isolated VM with temporary credentials. Developers shouldn't routinely have production access from their terminal. This last aspect is easy and cheap to set up on AWS. I presume it's also possible in Google Cloud.
They're far from facts, but have an important advantage over most other sources: the bettors are motivated to predict truth.
News sources are motivated to get clicks, to appeal to certain audiences, and to retain tribal customers. None of these create incentives for truth. You can seek out smart, well-informed and principled journalists who will prioritize truth-seeking over money-making. There are some. But the fact remains you are relying on character to override incentives. With prediction markets, incentives and truth are naturally aligned. This makes them a powerful and valuable resource imo, even if there is a lot of scumminess that comes along for the ride. The insiders, more than anyone, are contributing to the truth signal.
On the other hand, similar to that old assassination market idea, where you bet on people's death dates, it might encourage someone to make an event happen, thus fabricating the insider knowledge, if the price is high enough.
So the feedback from the prediction market reverses: you can essentially buy events if you put in enough money.
They are motivated to pick what they believe is most likely to happen. They develop their idea of what is most likely to happen from the news. The reporters then use their bets to write stories predicting what will happen.
First, you have inside traders. Then, among legitimate bettors, you have smart people using multiple data sources (not just the "news") and doing sophisticated analysis that most journalists cannot do, and are not motivated to do -- again, because their incentives are different.
Smart people cannot predict things by 'research'. "Will the US strike Iran by X date" going from 20% likelihood to 80%+ within hours points simply to insiders.
You can do research to know the US would strike; there's no other point in moving multiple carriers over there. But exactly WHEN is not researchable. This applies to most other bets. So let's stop pretending there's anything more than two cohorts: insiders and degenerate gamblers.
It's an empirical fact that smart people can predict things by doing research. See Tetlock's book Superforecasting.
I've been doing it profitably myself for almost 10 years now. I have zero special inside knowledge, and no access to any other non-public information.
> Will the US strike Iran by X date
Last year I did think the market for a strike on Iran was significantly underpriced given the information and conditions within a specific frame of time.
I don't think every smart person can just pop into prediction markets and print money, but I know many smart people who are long-term winners. I also don't try to knock people as degenerate when they have genuine talent.
You haven't been profitable for 10 years on prediction markets, and your being profitable doesn't mean anything with regard to insiders or the rigging of a market.
I use C-q for the prefix key because it doesn't conflict with common zsh and vim bindings.
Because the author suggested swapping Caps Lock and Control, I also recommend mapping Escape onto the Control key and changing the behavior based on whether another key is pressed: if you press Control+A, it sends C-a, but if you press and release Control alone, it sends Escape. It makes your vim life (and life in general) a lot easier. You don't have to compete for the most valuable real estate on the keyboard, right next to the A key.
For most bindings, like moving, resizing, and splitting, I emulate vim bindings.
Also, the -r flag for the bind-key command is important, because it lets you repeat commands like changing the pane size or moving focus without pressing the prefix key each time.
If you want a fancy look with minimal configuration, use plugins like the Nord tmux theme.
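A minimal ~/.tmux.conf sketch of the setup described above (the vim-style pane bindings are one possible mapping, not the only one):

```tmux
# Use C-q as the prefix instead of the default C-b
unbind C-b
set -g prefix C-q
bind C-q send-prefix

# Vim-style pane movement
bind h select-pane -L
bind j select-pane -D
bind k select-pane -U
bind l select-pane -R

# -r makes the binding repeatable: press the prefix once,
# then tap H/J/K/L repeatedly to keep resizing
bind -r H resize-pane -L 5
bind -r J resize-pane -D 5
bind -r K resize-pane -U 5
bind -r L resize-pane -R 5
```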