GSoC Midterms, CPAN and a British Tar

First of all, all our GSoC students have passed their midterms! That is really awesome!

That means we already have something that works when it comes to HTTP::UserAgent, and I guess the other projects have something deliverable as well.
When you install HTTP::UserAgent today, you not only get modules that provide the HTTP::Request, HTTP::Response and HTTP::Message classes, you also get http-download, http-request and http-dump, very nice command line tools as known from Perl 5’s LWP::UserAgent distro.

In the past weeks HTTP::UserAgent was already capable of fetching simple sites, even if that was slow: it took about 20 to 30 seconds to fetch a few hundred bytes. We also had problems with chunked transfer encoding, where you don’t get the entire site at once. And finally, when you fetch text like HTML you want to get a string in the right encoding, but we always tried to decode it as utf8.

The first thing we fixed was actually the chunked transfer encoding, since my pet project P6CPAN needs to fetch the MIRRORED.BY file, which is just too big to fit into a single chunk. Fortunately, I already had a prototype in panda that does exactly that, which made it easy to port it to the sane setup in HTTP::UserAgent.
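To illustrate what handling chunked transfer encoding involves, here is a minimal Python sketch. It is not the code from panda or HTTP::UserAgent; the function name decode_chunked is made up for this example, and it assumes the whole chunked body is already in memory. Each chunk is a hexadecimal size line, CRLF, that many bytes of data, and another CRLF; a chunk of size 0 ends the body.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Reassemble a body sent with Transfer-Encoding: chunked.

    Hypothetical helper for illustration; assumes `raw` holds the
    complete chunked body (everything after the response headers).
    """
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        line_end = raw.index(b"\r\n", pos)
        # Drop optional chunk extensions after a ';'.
        size_field = raw[pos:line_end].split(b";", 1)[0]
        size = int(size_field.strip().decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; optional trailers are ignored here
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


if __name__ == "__main__":
    print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```

A real client would read from the socket as it goes instead of waiting for the complete body, but the framing is the same.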

After that we took a look at the encoding of strings. If we get a meaningful Content-Type header, we pull out the charset and use that to decode the body into a string, and we fall back to utf8 when the charset or the header is missing.
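The charset lookup with its utf8 fallback can be sketched like this in Python. The helper names charset_from_content_type and decode_body are assumptions made for this example and not part of HTTP::UserAgent.

```python
def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical helper; `content_type` may be None when the header is absent.
    """
    if not content_type:
        return default
    # Parameters follow the media type, separated by ';'.
    for param in content_type.split(";")[1:]:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    return default


def decode_body(body: bytes, content_type) -> str:
    """Decode the raw body using the announced charset, falling back to utf-8."""
    return body.decode(charset_from_content_type(content_type), errors="replace")


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=ISO-8859-1"))
    print(decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1"))
```

Note that this sketch does not guard against a charset name the runtime does not know; a robust version would catch that case and fall back as well.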

Fixing the poor performance was in fact not really tricky. I always suspected the IO::Socket.lines method, because it does a lot of magic to find the next delimiter in the buffer we have received so far. And since we do not care about lines when we want to receive the document, we do not need that kind of overhead.
The answer was simple and shows the sanity and power of Perl 6. We turned the call to .lines and the line-based regex matches into a grammar, which can match on lines like a gazillion times faster. The grammar is only in charge of parsing the headers, which meant that we needed a way to split our buffer at CRLFCRLF. I hope that this will land in the spec and rakudo’s guts as Blob.split in the near future.
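The split at CRLFCRLF is easy to picture with a small Python sketch. It uses plain string splitting for the header block instead of a grammar, so it only shows the idea and not what HTTP::UserAgent actually does; the name split_response is made up for this example.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into status line, headers and body.

    Hypothetical helper: the header block ends at the first CRLFCRLF,
    everything after it is left untouched as binary data.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete response: no blank line after headers")
    # Only the header block is treated as text.
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


if __name__ == "__main__":
    raw = (b"HTTP/1.1 200 OK\r\n"
           b"Content-Type: text/plain\r\n"
           b"Content-Length: 5\r\n"
           b"\r\n"
           b"hello")
    print(split_response(raw))
```

The point is that the body never gets scanned line by line; only the small header block is parsed as text.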

So what about CPAN? As of today we can fetch the list of mirrors using the latest work on HTTP::UserAgent. We can also fetch the gzipped p6dists.json, p6provides.json and p6binaries.json files to know what is up there on CPAN:

$ panda --github --cpan search NativeCall
Resources on github:
NativeCall * Call native libraries
GTK::Simple * Simple GTK 3 binding using NativeCall
Crypt::Bcrypt 0.5.0 An implementation of bcrypt using
NativeCall
Resources on CPAN:
NativeCall v1

Now that is basically it. When we fetch a tarball of, say, NativeCall, we get a tar.gz file. We can unzip this tarball, but then what? There is no module in the Perl 6 world that can peek into tar files.
That is my current project: porting Archive::Tar from Perl 5 to Perl 6 at: https://github.com/FROGGS/p6-Archive-Tar
This is interesting in several ways. First, this is my first attempt to port a widely used Perl 5 module. Second, it will help v5 to pass more tests and it will help to satisfy some dependencies of Perl 5 dists running on v5. And last but not least: it shows where mangling Bufs is not ideal in rakudo. A sub called subbuf-rw got implemented and specced last week, which does the same as substr-rw for strings, or the substr sub in Perl 5 when used in lvalue context. Buf.split will hopefully land, and perhaps more. The mix of strings and meant-to-be binary data in the Perl 5 version of Archive::Tar is the major problem when porting it, since in Perl 6 these are two different types that do not mix well. They probably should not, which I hope will lead to a bit more sanity there.
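To show why tar handling is mostly about slicing binary buffers, here is a small Python sketch that peeks at one tar header block. It is not taken from Archive::Tar; the function names are made up for this example. It relies only on the ustar header layout: the file name sits in bytes 0 to 99 and the size, as an octal number, in bytes 124 to 135. The demo archive is built in memory with Python's tarfile module so the sketch is self-contained.

```python
import io
import tarfile


def peek_tar_header(block: bytes):
    """Read name and size from one 512-byte tar header block (hypothetical helper)."""
    if len(block) < 512:
        raise ValueError("a tar header block is 512 bytes")
    # The name field is NUL-padded text.
    name = block[0:100].rstrip(b"\0").decode("utf-8")
    # The size field is an octal number, padded with NULs or spaces.
    size_field = block[124:136].rstrip(b"\0 ")
    size = int(size_field.decode("ascii") or "0", 8)
    return name, size


def make_demo_archive() -> bytes:
    """Build a tiny uncompressed tar archive in memory for the demo."""
    data = b"use v6;\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name="lib/Example.pm6")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    print(peek_tar_header(make_demo_archive()[:512]))
```

Here the header stays a byte buffer and only the individual fields are turned into text or numbers, which is the separation between binary data and strings that Perl 6 asks for.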

Filed under Uncategorized

Google Summer of Code

I am proud that I’m allowed to be a mentor this year for a not only very interesting but also very important project: LWP+TLS

Filip has now been officially working for a week on the basic modules that are needed in order to use LWP::UserAgent.
He also writes about it on his own blog in more detail than I am going to here. I just want to state that I do not have any concerns about the success of his project and its benefit to the Perl 6 community.

He is well within his timeframe, and a bunch of modules are already ready for testing, like HTTP::Headers, HTTP::Message, HTTP::Request and so on. You can already clone these from his repositories and install them via panda by running panda install . in the repository folders. Today we talked about idiomatic Perl 6 and also about the performance characteristics of different approaches that in the end produce the same result. The very next step is to finish Net::HTTP and then carry on to build up LWP::Simple and its toolset lwp-request, lwp-dump and lwp-download.

I am also looking forward to playing with an asynchronous prototype of LWP. I think this will not only be very handy, but will also reduce overhead, for example when you want to track the progress of a download.

 

HERE: Implement Labels; redo HERE

Wow, how the time flies :o)

Last time I blogged here was like ten months ago, and at least according to my github profile I was not really inactive.
The truth is that blogging is really hard for me, harder than, say, spending several nights in a row fixing a single bug in a foreign project.

So, what happened in all this time? For one thing I helped to get the MoarVM backend in place, by implementing file operations in the beginning. And I think v5 has now been running on MoarVM for almost six months, and in the last months very, very fast. This is quite important for me, because when I try to implement barewords or typeglobs, I *have to* run the entire spectest suite to discover regressions early.
And clearly it makes a difference to spectest in less than 10 minutes compared to something like 35 minutes as it was before.

Speaking of tests

The last time I blogged, v5 passed 1209 tests. Now we are at 8030 passing tests, and it parses about 42000 tests of Perl 5’s spectest suite.
There is more to come: last week rakudo Perl 6 got label support for loop statements, and since both Perl 6 and Perl 5 on rakudo share the same architecture, v5 will get label support soon. That means that we should be able to next, redo and last labeled loops with a 10-line patch to v5, running on all supported backends: MoarVM, JVM and Parrot.

Module Versioning and Installation

I also spent a lot of time getting rakudo Perl 6 closer to CPAN. We are not there yet, but very, very close. I will report about that in the coming weeks; by then I can hopefully state that you can install Perl 6 distributions from CPAN as you do from the ecosystem on github.
This also made me work on v5 as a distribution. The goal is that you can just install it using panda. The problem here is that v5’s parser is written in NQP, and panda is not capable of installing stuff into NQP’s lib dirs right now. The solution to this might not be an easy one, but it could be a solution for slangs in general.

Hack hack hack

Hi, just wanted to post that v5 is passing 1209 tests now. That is about 400 more passes within a few days.

If you are curious, this is how it looks when I hack. The green and red numbers show progressions and regressions since the last test run:
[Screenshot: hack]

Tales of version five

Hi, this is my first post on my first blog, so please be kind. :)

So, what is this all about? It is about programming. It is not about programming in general and not about one single language. It is about programming in 2.5 languages. (I’ll explain that in a bit.)

If you have not yet seen the v5 module, you might have a look at its repository. v5 lets you write Perl 5 code in Perl 6, call Perl 5 subs from Perl 6 and vice versa, and pass variables between these two languages.

You know the Inline modules? Well, it is not quite like that. When using Inline modules you usually need to put the foreign language in a quote, which then gets sort of evaluated, and accessors are created that you call from the main language. But since that sounds a bit icky, it is not what v5 is for.

The set goals of v5:

  • write Perl 5 code directly in Perl 6 code, usually as a closure
  • allow Perl 6 lexical blocks inside Perl 5 ones
  • make it easy to use variables declared in an outer block (outer means the other language here)
  • provide the behaviour of Perl 5 operators and built-ins for v5 blocks only, nested Perl 6 blocks should not be affected
  • and of course: make subs, packages, regexes, etc available to the other language

And it turns out that all of the above is not just “somehow possible”, it is there*.
* for a certain definition of “there”.

And how does that work?

Most importantly, there is Perl 6’s architecture, which allows one to define slangs (sub languages). And there is the Perl 5 grammar written by Larry Wall. The Perl 6 compiler rakudo is based on a similar Perl 6 grammar (similar in structure), so it wasn’t too hard to get these two pieces together.

After a bit of trial and error and help from pmurias in making it a module, there was something that allows you to do:

use v6;
{
    use v5;
    print "Hello from Perl 5 on a $^O box!"
}

That was quite nice for a bit more than a week of hacking on v5. The next steps were to add thought, sanity and some sort of structure.

So at this point the compiler is able to switch to the Perl 5 grammar when it sees the ‘use v5’ token. The AST it builds is pretty similar to the Perl 6 one (in many cases even identical). Every variable you declare in a v5 block is a proper (typed) Perl 6 variable, which makes it easy to pass these variables around. This way you don’t need to think about how to call into Perl 5 or Perl 6 land, you just do.

In the early weeks I cared about control structures. This wasn’t too bad, since at first glance the Perl 6 ones look pretty much the same, but the devil is in the details: in Perl 5 a closure is a loop. So you can ‘redo’ or ‘last’ it, or wire it to a continue block. After having that, the number of passing tests was in the mid 50s, and I recognized that I needed a proper test suite. So for one I took the tests of Perlito, but since Perlito’s goal is not the same as v5’s, I had to take the tests from the Perl 5 repository. The problem with that is that the test design is bad, really bad. Perl 6’s test suite is written from a language specification’s view, while Perl 5’s is written from an implementation’s view. It is there for one already existing compiler, with a mature (and huge) test-internal infrastructure that makes it almost useless for a new implementation. But since there is no other option, I set up the roast5 repository, started to write test-helper scripts, and started to fudge the 500 test files to be somewhat nice to the parser as it was at that time.

The next important step was to support the Perl 5 special variables, so that more test files are parseable. After that I cared about the operators, which have some interesting properties. For example, ‘ 1 + “42Foo” ’ behaves differently in Perl 6 than in Perl 5. In Perl 6 it would just die because “42Foo” can’t be transformed to a number; Perl 5 however is fine with just taking the numeric part from the front and ignoring the rest. Perl 6 handles the string-to-number transformation in the method Str.Numeric. Clearly a v5 block needs something else. So I could augment class Str and place a Perl 5-ish Numeric method there that wouldn’t die but do The Right Thing, but hold on, then it would pollute the Perl 6 world too. No, that is not what we want. The solution is that the operator ‘+’ is implemented as infix:<P5+>, which calls a method called P5Numeric on its arguments. This dispatch from operator ‘+’ to infix:<P5+> is made in Perl5::Actions, which is the AST-producing step right after parsing the code. And since Perl5::Actions is only in charge of the statements of a v5 block and not of nested Perl 6 blocks, the ‘+’ operator really has two faces, one for Perl 5 and one for Perl 6. (tbh, these are two different operators now.)
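The Perl 5 behaviour of taking the numeric part from the front of a string can be sketched in a few lines of Python. The names p5_numeric and p5_add only mirror P5Numeric and infix:<P5+> from the text; this is not v5’s implementation, and it deliberately ignores special cases such as hex strings, Inf and NaN.

```python
import re

# Optional leading whitespace, optional sign, digits with optional fraction,
# optional exponent. Everything after that prefix is ignored.
_NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")


def p5_numeric(value: str):
    """Perl 5-style numification: use the leading number, else 0 (simplified)."""
    match = _NUMERIC_PREFIX.match(value)
    if not match:
        return 0
    text = match.group(1)
    return float(text) if any(c in text for c in ".eE") else int(text)


def p5_add(left, right):
    """Addition that numifies both operands the Perl 5 way instead of dying."""
    return p5_numeric(str(left)) + p5_numeric(str(right))


if __name__ == "__main__":
    print(p5_add(1, "42Foo"))
    print(p5_numeric("Foo"))
```

Because the lenient conversion lives in its own function and its own operator, the strict conversion used everywhere else stays untouched, which is the same reason v5 uses infix:<P5+> instead of augmenting Str.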

Where are we now?

That is a good question, and I love to answer it.
To date I can proudly say that control structures work, and quite a bunch of special variables and built-ins are supported. The operators behave more and more Perl 5-ish, and modules/pragmas like Config, overload, warnings and English are more than stubs. Heredocs are working (I would really like to send a dung-bomb to their inventor), and indirect object syntax and subroutine signatures are supported to a small but useful degree.

If you want to monitor the progress, there is a STATUS file in the repository. Right now it says that v5 passes 797 out of 40451 tests, and I’m quite happy with that. (Note that pack.t gives us 14704 failing tests.)

So, what about the 2.5 languages mentioned at the top?

Ohh yes, almost forgot. I think the ‘2’ is obvious now, it is about Perl 5 and 6. The 0.5 is about NQP, “Not Quite Perl”, which is the language that, for example, the grammars and actions of Perl 6 and v5 are written in. It looks like a subset of Perl 6 and that’s why it only gets 0.5 points, even though it is Quite Awesome(tm).
So if you stay tuned, I’ll report every now and then about the v5 project, and maybe about other (related) programming issues too.


FROGGS
