In numerical analysis, elementary function membership, like special function membership, is ambiguous. In many circumstances, it’s entirely reasonable to describe the natural logarithm as a special function.
> Remember the whole point of fair use is to benefit society by allowing reuse of material in ways that don't directly copy large portions of the material verbatim.
If you care exclusively about numerical stability and performance, why _this_ set of operators? (There are plenty of good reasons to include expm1 or log1p, and certainly the trigonometric functions.) It’d be an interesting research problem to measure and identify the minimal subset of operators (and I suspect it’d look different from what you’d expect from an FPU).
If you care exclusively about minimalism, why not limit yourself to the Meijer-G function (or some other general-purpose alternative)?
I was at Bell during the options “debate.” I think something this otherwise wonderful article misses is that some believed commands were never intended to be the only way to use the system: the shell was meant to be just one of the many user interfaces that Research Unix would provide. From that perspective, it was entirely reasonable to believe that if a command was so complex that it needed options, then it was likely more appropriate for one of the other user interfaces. Unfortunately, I believe some now read this as a claim that options shouldn’t exist at all. But because of the seemingly instantaneous popularity of the shell and of pipelining text, the shell became synonymous with Unix. It’s a shame, because McIlroy had a lot of interesting ideas around pipelining audio and graphics that, as far as I know, never materialized.
> From that perspective, it was entirely reasonable to believe that if a command was so complex that it needed options, then it was likely more appropriate for one of the other user interfaces
What would the non-shell interface to commands for text processing pipelining (e.g. sort, cut, grep, etc., all of which absolutely need options to function) have looked like? Some people to this day believe that any text processing more complicated than a simple grep or cut should be done in a stand-alone script written in a domain-specific language (e.g. Perl or awk), rather than piping shell commands to each other.
Personally, I’m glad we got the best of both worlds—having command line tools with a dizzying array of options doesn’t preclude accomplishing the same tasks with a more verbose stand-alone script. It is often far faster to write a shell pipeline for a fairly involved task than to write a stand-alone script. The classic example is the (in)famous McIlroy vs. Knuth exchange: six lines of McIlroy’s shell pipeline accomplished the same thing as dozens of lines of Knuth’s “literate program,” while being equally understandable [0].
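For context, McIlroy’s pipeline was six stages; reconstructed roughly from the published review (with a sample input added here so the snippet runs on its own, and exact flags treated as approximate):

```shell
# Print the k most frequent words, one "count word" pair per line.
k=3
printf 'the cat and the hat and the bat\n' | # sample input (stand-in for a file)
tr -cs 'A-Za-z' '\n' |  # split input into one word per line
tr 'A-Z' 'a-z' |        # normalize case
sort |                  # bring identical words together
uniq -c |               # count each run of identical words
sort -rn |              # order by count, most frequent first
sed "${k}q"             # keep only the top k
```

Every stage streams, which is part of why the pipeline holds up so well against a bespoke program.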
>It’s a shame because McIlroy had a lot of interesting ideas around pipelining audio and graphics that, as far as I know, never materialized.
I would love to hear more about this. The UNIX shell is amazing for pipelining (most) things that are easily represented as text but really falls flat for pipelining everything else.
How is it a misconception? My overall point was that shell one-liners are often much faster to bang out for a one-off use case than writing a full program from the ground up to accomplish the same thing. This is demonstrated to a very exaggerated degree in the Knuth vs. McIlroy example, but it also holds true for non-exaggerated real-world use cases. (I had a coworker who was totally shell illiterate and would write a Python script every time they had to do a simple task like count the number of unique words in a file. This took at least 10 times longer than it would take someone proficient at the shell, which one could argue is itself a highly optimized domain-specific language for text processing.)
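For what it’s worth, the “count the unique words in a file” task is a one-liner at the shell (a sketch, assuming whitespace-separated words and no case folding; sample input stands in for the file):

```shell
# Count the distinct whitespace-separated words in the input.
printf 'a b a\nc b\n' |   # sample input (stand-in for the file)
tr -s '[:space:]' '\n' |  # one word per line
sort -u |                 # deduplicate
wc -l                     # count what's left
```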
If your point is that the shell script isn't really the same thing as Knuth's program: sure, the approaches aren't algorithmically identical. Assuming constant-time insertions on average, Knuth's custom trie yields an O(N) solution, which is faster than McIlroy's O(N log N) sort (though this point is moot if you use awk's hash tables to tally words rather than `sort | uniq -c`). But both approaches accomplish the exact same end result, and both fail to handle the exact same edge cases (e.g. accent marks).
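Concretely, the awk variant replaces the full sort over all words with a single-pass hash-table tally, leaving only the final sort over the unique words (a sketch, using the same tokenization as the tr-based pipeline, with a sample input added so it runs on its own):

```shell
printf 'the cat and the hat and the bat\n' |  # sample input (stand-in for a file)
tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' |       # same tokenization as McIlroy's version
awk 'NF { count[$0]++ }                        # one-pass tally in an associative array
     END { for (w in count) print count[w], w }' |
sort -rn                                       # sort only the unique words
```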
The task Knuth was given was to illustrate his literate programming system (WEB) on the problem Bentley posed, which meant writing a Pascal program "from the ground up" and, ideally, producing a program that contained something of interest to read.
If instead of writing a full program as asked, he had given some cop-out like “actually, instead of writing this in WEB as you asked, I propose you just go to Bell Labs or some place where Unix is available, where it so happens that other people have written some programs like 'tr' and 'sort', then you can combine them in the following way”, that would have been an inappropriate reply, hardly worth publishing in the CACM column. (McIlroy, as reviewer, had the freedom to spend a section of his review advertising Unix and his invention of Unix pipelines, then not yet as well-known to the general CACM reader.)
So while of course shell one-liners are faster to bang out for a one-off use-case, they obviously cannot accomplish the task that was given (of demonstrating WEB). (BTW, I don't want to too much repeat the earlier discussion, but see https://codegolf.stackexchange.com/questions/188133/bentleys... — on that input, the best trie-based approach is 8x faster than awk and about 200x faster than the tr-sort script.)
The shell's advantage is that the pipeline components don't need to suck the whole file in, so the pipeline can potentially operate on much larger files without running out of memory. I think only "sort" is problematic, and at least it's a merge sort.
In Python you could use a generator, but it would get a little more complicated, and you'd still have to add all the words to a set(); hopefully the number of distinct words is not that great.
The trie approach is quite memory efficient and that can matter.
I'm fairly sure the file object returned by `open` is a lazy iterator over lines and doesn't load the whole file into memory. So you wouldn't hit a memory error unless, like you said, the number of unique words is high enough.
Hah, I have a friend who spent a large chunk of an undergraduate summer internship at Google porting a >50k line bash script (that was used in production!) to Python. It was not their favorite summer, to say the least.
It's only possible if you can identify large portions of the original 50k lines as having been previously implemented by other components (Python modules, microservices, etc.), or if large portions deal with cases that are guaranteed to no longer arise (so you either produce different results or error out if you detect them).
> What would the non-shell interface to commands for text processing pipelining (e.g. sort, cut, grep, etc., all of which absolutely need options to function) have looked like? Some people to this day believe that any text processing more complicated than a simple grep or cut should be done in a stand-alone script written in a domain-specific language (e.g. Perl or awk), rather than piping shell commands to each other.
I have no idea what the original intention was, but I could see the interface being Emacs or Vi(m).
A workflow I use a lot in Emacs with eshell is to pipe command output to a buffer, manipulate the buffer using Emacs editing commands (including find/replace, macros, indent-region, etc.), and then run another shell command on it, write it to a file, or copy/paste it somewhere else.
It's not for every situation but it's a lot faster than coming up with a complicated shell "one-liner".
The problem there is that you have to rethink your solution if you decide you want to turn your buffer manipulation into a reusable command. I like Emacs, but the easy transition from pipeline to shell script is a big point in pipelines' favor.
It's not really a problem, though. Nine times out of ten, shell one-liners are single use, and when they're not, I want something more readable than a one-liner anyway.
I haven't looked at this in years, but IIRC Knuth's program could be built and run on almost any OS that had a Pascal (?) compiler, whereas McIlroy's solution obviously required a Unix-like shell, piping, and the necessary commands/tools.
Interesting, any chance you could expand on these 'other user interfaces'? I'm not really familiar with Unix itself, but I've always considered Linux a shell-first OS (as opposed to Windows (NT), which I consider a GUI-first OS).
The other environment that is still popular today is the “statistical environment” that Rick Becker, Allan Wilks, and John Chambers created. It eventually became “S” and John would essentially recreate it as “R.” It’s a very nice environment for performing statistics and graphics together.
I like to see it this way: A shell /wraps/ the kernel. You cannot issue system calls directly, but a program that handles user input generically (and ideally dynamically), can do this for you. A desktop environment, Emacs, and to an increasing degree web browsers are all different "shells".
A shell is a program dedicated to allowing an operator to launch other programs. It can be as simple as a menu or as complex as a COM-interfaceable GUI with a graphical desktop metaphor. It's often configured to launch automatically on user login, but that isn't strictly required.
Any user-space executable can issue system calls, the shell isn't special there.
The most popular today outside of the shell environment is the statistical environment “S.” John Chambers would recreate it as “R” and I understand that it’s very popular and does a nice job of performing statistics and graphics together.
It was very primitive. It was essentially a mechanism for composing operations on vectors. I don’t know for certain but I would guess that it was inspired by IBM and their work on APL.