Diff for "DEV/Google Summer of Code 2013"

Differences between revisions 26 and 27

GNU Mailman is hoping to participate in Google Summer of Code in 2013. This page captures some ideas for student projects, links to setup guides, etc.

Table of Contents:

Required Skills
Getting Started
Getting in touch with the developers
Project Ideas
Not-fully-defined Project Ideas

Required Skills

Incoming students will need the following skills:

Intermediate Python programming
Familiarity with source control (Mailman core and Postorius use bzr, HyperKitty uses git)
Ability to set up your own development environment for Mailman

Additional desired skills may be listed with specific projects.

We're happy to help you get up to speed, but the more you are able to demonstrate ability in advance, the more likely we are to choose your application!

Getting Started

Interested in getting started with Mailman development? Here are some links for potential students and mentors:

Mailman is written in Python. For developers, the path of least resistance is probably to use Python 2.7+ (sadly, not all of our requirements support Python 3 yet) and to work under Ubuntu Linux, although it is definitely possible to develop on other platforms. We encourage you to run your development environment in a VM. To build a development VM, follow the instructions on Build Mailman3 Virtual Machine with Vagrant and Chef.

Development work on Mailman 2.1 has been frozen for some time, so all new project ideas should be related to Mailman 3.

Getting in touch with the developers

If you're interested in participating in GSoC 2013 as a student, mentor, or interested community member, you should join the Mailman Developers mailing list: http://mail.python.org/mailman/listinfo/mailman-developers

Post any questions, comments, etc. to mailman-developers@python.org.

In addition you may be able to find us on IRC at #mailman on irc.freenode.org. If no one is available to answer your question, please be patient and post it to the mailing list as well. (We *are* the developers for Mailman -- unsurprisingly, most of us prefer to communicate via email.)

Project Ideas

Here are a few ideas for projects we would love to see completed. If you have another idea you'd like to propose (either as a prospective student, mentor, or interested community member), please send it to mailman-developers@python.org for discussion!

Web Posting Interface

One of the hopes for Mailman 3 is that the archives can function seamlessly from the web for people who prefer a "web bulletin board" style of interface. This means that Mailman could be used as a replacement for something like phpBB. To achieve this goal, we need to create an interface that allows logged-in users to post messages from the web, and integrate it into the appropriate pages (e.g. the HyperKitty archives, and possibly the Postorius list settings pages).

The interface itself may not take the whole summer, in which case you should be prepared to build other features to make the Mailman web interface usable as a web forum. Talk to the developers/users about what might be useful and suggest a few of these as backup ideas when you write your GSoC application.

Task Level: Intermediate

Desired skills (in addition to those named above): experience with Django (not necessary, but helpful), web design (CSS/HTML/JavaScript)

RSS and/or NNTP access to Mailman archives

Not everyone wants to read mailing lists in their mail client. Sometimes you just want to ask a single question and look for answers; sometimes you just want to browse in a different way. While we're hoping that the new HyperKitty archiver will cover many of these use cases, we also want to encourage interoperability with other clients, such as Gmane and RSS readers, to make Mailman archives available in more ways.

Note: The NNTP access was a GSoC project last year that sadly was not finished; we're hoping someone can step in and take over to make sure the feature is completed this year!

Task Level: Intermediate

Desired skills (in addition to those named above): None

OpenPGP Integration

Add GPG/OpenPGP keys to a user record and base posting permissions on authenticated (signed) messages.

On the surface, this is simple, but PGP issues for mailing lists can be different than PGP for individual users, so you may run into some edge cases. This project will require a student who is very good at the details, since it's often all too easy to slightly mess up a privacy technology and as a result violate the privacy you'd hoped to provide!

Task Level: Intermediate

Desired skills (in addition to those named above): Understanding of / experience with PGP.

Full anonymization

When a user posts to a fully anonymized list, the From field gets rewritten to <bighash>@<list.dom.ain>. Replies sent to that address would then be forwarded to the original poster.
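As a rough illustration of the rewriting idea, here is a minimal sketch. It assumes a keyed hash (HMAC-SHA256) and an in-memory reverse table; the names `anonymize_sender` and `ReplyMap`, the per-list secret, and the 32-character truncation are all hypothetical choices, not anything Mailman currently provides.

```python
import hashlib
import hmac


def anonymize_sender(sender, list_domain, secret):
    """Return a stable pseudonymous address for `sender` on one list domain."""
    normalized = sender.strip().lower().encode("utf-8")
    # A keyed hash (HMAC) rather than a bare hash, so that an outsider cannot
    # confirm a guessed address simply by hashing it themselves.
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    return "{}@{}".format(digest[:32], list_domain)


class ReplyMap(object):
    """Hypothetical in-memory table mapping pseudonyms back to real senders."""

    def __init__(self, list_domain, secret):
        self.list_domain = list_domain
        self.secret = secret
        self._real_address = {}

    def rewrite(self, sender):
        """Return the pseudonym to put in From, remembering the real sender."""
        alias = anonymize_sender(sender, self.list_domain, self.secret)
        self._real_address[alias] = sender
        return alias

    def resolve(self, alias):
        """Return the real address for a pseudonym, or None if unknown."""
        return self._real_address.get(alias)
```

A real implementation would need persistent storage for the reverse table, and would have to decide how that table interacts with the privacy goals of the list.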

Task Level: Intermediate

Desired skills (in addition to those named above): None.

No-logging mode

Sometimes, people may want to use Mailman for mailing lists, but don't want anyone to know they're using it. For example, they may be using it for an activist list where there is real concern about prosecution, or they may simply want to maintain privacy for their users at a greater depth.

To aid with this, we want to develop plugins and modes that support no-logging operation, so that a mailing list keeps no records that could be subpoenaed.

The goal of this GSoC project could be a suite of privacy features and functions that allow setting them all at once (or in groups) for improved usability.

Task Level: Intermediate

Desired skills (in addition to those named above): None.

Authenticated REST-API in Postorius/Django

Implement a public-facing REST API with authentication/authorization, possibly using OAuth.

Task Level: Intermediate

Desired skills (in addition to those named above): OAuth and Django experience would be helpful

Scripts for migrating from Mailman 2.1 to Mailman 3

Mailman 3 is nearing release, but we need someone to write scripts to migrate from Mailman 2.1 to Mailman 3, including all list settings, user settings and archives. Some of these will be easy choices because the settings are similar, but some scripts may be more complex since the way users are dealt with has changed considerably and we may want to merge and normalize settings.

Task level: Advanced beginner

Desired skills (in addition to those named above): none

Better User Settings management

In Mailman 3, users have the ability to change settings for all their lists, for individual lists, and for individual managed email addresses, but handling all of these settings could potentially be a user interface nightmare.

For example,

a list member might want to subscribe an additional address so she can post using multiple identities, but not receive email there.
a list member might go on vacation and want to temporarily suspend mail from some mailing lists, but not others
a list member might want to receive digests of some lists but not all lists
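One way to think about these cases is as a lookup that falls through from the most specific setting to the most general one. The sketch below assumes that order (per-list membership, then address, then user, then site defaults) and uses plain dictionaries; the function name and the preference names are hypothetical.

```python
def effective_preference(name, membership, address, user, defaults):
    """Return the first value set for `name`, most specific scope first."""
    for scope in (membership, address, user, defaults):
        value = scope.get(name)
        if value is not None:
            return value
    raise KeyError(name)


# Site-wide defaults (hypothetical preference names).
defaults = {"receive_mail": True, "digest": False}
# The user has not changed anything globally.
user = {}
# A second address used only for posting: no mail delivered there.
posting_only_address = {"receive_mail": False}
# One particular list is read as a digest.
digest_membership = {"digest": True}
```

With these values, `effective_preference("receive_mail", {}, posting_only_address, user, defaults)` is `False`, while the same lookup for an address with no overrides falls through to the site default.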

We need someone to expose these options without making it unbearably confusing for users. This project will involve not only creating new settings pages, and probably some summary pages where users can see all their settings, but also doing some basic user testing to ensure that the result is actually usable.

Task level: Intermediate

Desired skills (in addition to those named above): User interface design, Django (not necessary, but helpful)

Design interface "themes" for specific types of list

The Mailman admin interface exposes a huge number of settings, but many lists will never use them. We would like to have a variety of simplified interfaces specialized to various uses of Mailman.

Some examples:

An announcement list might require member management, management of authorized posters to the list, and moderation of list postings.
A "simple" discussion list might require member management, moderation of list postings, changes to the list description, and changes to things that might trigger moderation.

Although it is possible to build entirely separate new interfaces to Mailman, we realize that many people will want to just use Postorius and configure it to suit their needs, so ideally we would have a way of creating these themes from the main interface. We might also want to use this theme system for more traditional theme elements such as colours and icons.

The best way to do this is not yet clear. We might be able to achieve it using nothing more than CSS templates, which would be very lightweight and convenient in many circumstances: a fast way for anyone to see an interface customized for their most often used options. However, there may also be cases where the site admin wants to disable options in the web interface for some users in a way that they cannot circumvent.

We want someone to take a look at Postorius, the current Mailman web interface, and develop several proof of concept interface themes or views as well as a more general templating system to create more such themes in the future.

Task level: Intermediate

Desired skills (in addition to those named above): intermediate Python, experience with Django (not necessary, but helpful), web design (CSS/HTML/JavaScript), familiarity with common uses of mailing lists, and the ability to move past decision paralysis and make executive decisions. (We expect some community argument about the best choice when designing the architecture of this system; you need to be prepared to just choose something and do it.)

Enhance List Style Capabilities

There are lots of different ways to use a mailing list: discussion lists, announcement-only lists, anonymous lists, etc. Mailman 3 supports these as list "styles" and you can see the IStyle interface in src/mailman/interfaces/styles.py for the API and src/mailman/styles/default.py for the two built-in styles, both essentially legacy styles.

Right now, styles are applied only when a list is created, and the list doesn't record which style was applied. However, administrators might like to change a style and have the relevant lists update. This project is to do basically that: find a way to maintain styles and update them in a useful way, create a web interface for administrators to modify and apply styles, and create a stable of popular styles for new lists. We may want a way to update styles without overriding admin-set options, or even to apply multiple styles to a list. An experienced student may be able to complete this project early, so applicants may want to suggest other features they would like to work on after mid-terms.
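To make "update styles without overriding admin-set options" concrete, here is one possible rule, sketched with plain dictionaries rather than the real IStyle interface: a setting is updated only if the list still has the value the old style gave it. The function name and the setting names are hypothetical, and this assumes the list remembers which style it was created with, which it currently does not.

```python
_MISSING = object()


def reapply_style(current, old_style, new_style):
    """Return updated list settings, keeping values the admin changed by hand."""
    updated = dict(current)
    for key, new_value in new_style.items():
        current_value = current.get(key, _MISSING)
        # Update when the list never had the setting, or when it still holds
        # the value the old style assigned (so the admin has not touched it).
        if current_value is _MISSING or current_value == old_style.get(key, _MISSING):
            updated[key] = new_value
    return updated


old_style = {"moderate_new_members": False, "max_message_size_kb": 40}
new_style = {"moderate_new_members": False, "max_message_size_kb": 100,
             "digests_enabled": True}
# The admin switched moderation on by hand after the list was created.
list_settings = {"moderate_new_members": True, "max_message_size_kb": 40}
```

Here `reapply_style(list_settings, old_style, new_style)` keeps the admin's moderation choice, raises the size limit to the new style's value, and adds the new digest setting.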

Task level: Intermediate

Desired skills (in addition to those named above): understanding of how mailing lists are used in practice. UI design and Django experience would be helpful for the interface changes, but should not be necessary.

Log monitor

(Suggested by rsk)

Write Python scripts that monitor all of Mailman's logs, integrate what they find, and complain to the appropriate -owner addresses when bad things start happening. Bad things could be any of:

- repeated attempts to subscribe from same IP address/network block

- sudden increase in inbound traffic to otherwise-quiet list

- spike in unsubscriptions (maybe someone is forging them)

- etc.

Bonus points: integrate with HTTP and SMTP logs from whichever web server and whichever MTA are in play. (But this might be better as a follow-on the next year. Given the diversity of the software in play and thus the diversity of the logs, it's a lot. So maybe the goal should be "don't design or code in a way that rules out doing this in the future". OTOH, often the best way to accomplish that is to try to implement one!)
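As an example of the first "bad thing" above, the following sketch flags IP addresses that make many subscription attempts within a short window. It assumes the log lines have already been parsed into (timestamp, IP) pairs sorted by time; the function name, threshold, and window are hypothetical.

```python
from collections import defaultdict, deque


def find_repeat_subscribers(events, threshold=5, window_seconds=600):
    """Return the set of IPs with `threshold` or more attempts inside the window.

    `events` is an iterable of (unix_timestamp, ip) pairs, sorted by time.
    """
    recent = defaultdict(deque)
    flagged = set()
    for timestamp, ip in events:
        attempts = recent[ip]
        attempts.append(timestamp)
        # Forget attempts that have fallen out of the sliding window.
        while timestamp - attempts[0] > window_seconds:
            attempts.popleft()
        if len(attempts) >= threshold:
            flagged.add(ip)
    return flagged
```

The same sliding-window shape would work for the other signals (traffic to a quiet list, unsubscription spikes) by keying on the list name instead of the IP address.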

Task level: Intermediate

Desired skills (in addition to those named above): shell scripting, experience with simple visualizations might be helpful

Boilerplate stripper

(Suggested by rsk)

An increasing number of devices and services are starting to forcibly append boilerplate to users' email messages. This is annoying.

I'm talking about:

        Sent from my Android
        Sent from my Droid
        Sent from my HTC
        Sent from my Nexus

as well as the stuff appended by some freemail services.
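A minimal sketch of such a stripper, assuming the boilerplate appears as the last non-blank line of a plain-text body. The device list is taken from the examples above; the function name is hypothetical, and a real filter would need a configurable pattern list.

```python
import re

# Matches a whole line such as "Sent from my Android".
SENT_FROM = re.compile(r"^\s*Sent from my (Android|Droid|HTC|Nexus)\b.*$",
                       re.IGNORECASE)


def strip_boilerplate(body):
    """Remove trailing "Sent from my ..." lines (and trailing blank lines)."""
    lines = body.splitlines()
    # Only the tail of the message is examined, so quoted text and ordinary
    # sentences that merely mention a device are left alone.
    while lines and (not lines[-1].strip() or SENT_FROM.match(lines[-1])):
        lines.pop()
    return "\n".join(lines)
```

For example, `strip_boilerplate("Hello\n\nSent from my Android\n")` returns `"Hello"`.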

This project itself may not take the whole summer, so any student proposing it should be prepared to build other features. Think about a few bugs or features that interest you and propose them as part of your GSoC application. For example, you might want to consider other types of content filters that mailing list users might like.

Task level: Intermediate

Desired skills (in addition to those named above): regular expressions would probably be helpful.

Anti-spam/anti-abuse in Mailman

(Suggested by rsk)

One of the general principles of anti-spam work is that abuse control is best done as close to the origination point as possible: thus, stopping spam at your network perimeter is better than stopping it in your firewall is better than stopping it in your MTA is better than stopping it in Mailman.

However, Mailman has localized knowledge that none of the others do, so it's time to craft some serious anti-abuse clue into it -- not just to defend against incoming spam, but against *outgoing* spam, so that list operators don't find their operations blacklisted. What's needed is a flexible piece of code that holds dubious messages for moderation.

An example: freemail account hijackers are sending out messages where the entire body consists solely of a URL. That's an easy test and will stop a lot of spam that will otherwise pass, even on subscriber-only lists (because it's coming from a subscriber's account). Another example: "test" messages. They're either spam or annoying. Another example: messages whose Subject line matches certain key phrases (which I won't include here, in case your incoming mail is content filtered -- mine's not, I don't use it). Another example: any message from often-forged bogus addresses like fbi@gmail.com.

So basically the idea is: have a laundry list of tests, run them sequentially, if there are any hits, hold the message and tell the right -owner about it. The good news is that this is simple and fast: most tests are a regexp match. Also the false positive rate on some of these is zero, as in 0.000. The bad news is that once in a while there will be a false positive, but if it's rare enough and if the message need only be owner-approved to go through, then no harm. (Note: We probably need to do better than this. None of the suggested examples actually requires knowledge of the mailing list, or even that it be a mailing list. Even with that information, MTA-based filters such as SpamAssassin allow per-mailbox rules in addition to the global rules. Much of this overlaps with content filtering, which is needed for other reasons. Rejection based on list membership can actually be done at the MTA level with a milter that accesses Mailman's REST interface.)
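The "laundry list of tests" could look something like the sketch below, which uses only the standard library `email` package. The three checks mirror examples given above; the function names, the reason strings, and the exact regular expressions are hypothetical, and only the text body of non-multipart messages is examined.

```python
import re
from email import message_from_string
from email.utils import parseaddr

URL_ONLY = re.compile(r"^\s*https?://\S+\s*$")
TEST_ONLY = re.compile(r"^\s*test(ing)?[\s.!]*$", re.IGNORECASE)
FORGED_SENDERS = {"fbi@gmail.com"}


def check_url_only(msg):
    body = msg.get_payload()
    if isinstance(body, str) and URL_ONLY.match(body):
        return "body is a single URL"
    return None


def check_test_message(msg):
    body = msg.get_payload()
    if isinstance(body, str) and TEST_ONLY.match(body):
        return "body is just a test message"
    return None


def check_forged_sender(msg):
    address = parseaddr(msg.get("From", ""))[1].lower()
    if address in FORGED_SENDERS:
        return "sender is an often-forged address"
    return None


CHECKS = [check_url_only, check_test_message, check_forged_sender]


def hold_reasons(msg):
    """Run every check in order; a non-empty result means hold for the owner."""
    reasons = []
    for check in CHECKS:
        reason = check(msg)
        if reason is not None:
            reasons.append(reason)
    return reasons
```

A message would be held, and the right -owner address notified, whenever `hold_reasons` returns a non-empty list.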

Why *not* do this in the MTA?

The MTA doesn't know about lists and subscribers and patterns.
Some servers mix mailing lists and end user traffic, and different kinds of anti-spam are appropriate for each.
Messages going *out* need much, much more scrutiny than messages coming in -- it is much more important not to be a spam source/relay than to protect your own addresses.
As Mailman gains mindshare, it will come under increasing scrutiny and attack. Now would be a good time to get in front of it. Waiting until the inevitable happens will be too late.
Mailing list owners often don't run the MTA.

Task level: Intermediate

Desired skills (in addition to those named above): familiarity with antispam work

Administrative email message log

(Suggested by rsk)

Add a configurable set of options that will instruct Mailman to keep a log of

(a) all incoming administrative messages and

(b) all outgoing administrative messages.

(a) can be kinda sorta done by configuring the MTA to send second copies elsewhere, but (b) can't...and this really should be done inside Mailman.

The log should consist of full copies of the messages in mbox format. This is useful for debugging, for dealing with attacks, AND for proving, when necessary, that someone really did opt-in or really did unsubscribe. (By administrative messages I mean anything to -request, -owner, etc., as well as outbound subscription confirmation requests.)
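Python's standard library already knows how to write mbox files, so the core of such a log could be quite small. This is only a sketch: the function name and the idea of one mbox file per log are assumptions, and the hooks inside Mailman that would call it are not shown.

```python
import mailbox


def log_admin_message(mbox_path, msg):
    """Append a full copy of `msg` (an email.message.Message) to an mbox file."""
    box = mailbox.mbox(mbox_path)  # the file is created if it does not exist
    box.lock()  # guard against other processes appending at the same time
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
```

Separate files for incoming and outgoing administrative mail would cover cases (a) and (b) above.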

Task level: Intermediate

Desired skills (in addition to those named above): none

Better content-filtering/handling error messages

(Suggested by rsk)

When messages have content stripped, make sure that the log includes the Message-ID, the MIME type, the file extension, the rule that fired, etc. so that if this was not what was intended, it's much easier to debug.
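For instance, a single log line per stripped part might carry all of those fields. The line format and the function name below are hypothetical, and the rule name passed in is only an example value.

```python
import os


def describe_stripped_part(message_id, mime_type, filename, rule):
    """Build one log line describing a MIME part removed by content filtering."""
    # "-" stands in for parts that have no filename or no extension.
    extension = os.path.splitext(filename or "")[1].lower() or "-"
    return "stripped part: message-id={} type={} ext={} rule={}".format(
        message_id, mime_type, extension, rule)
```

Keeping every field on one line makes it easy to search the log for a Message-ID and see exactly which rule fired.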

Task level: Advanced beginner (the actual technical work should not be too difficult, but you will need to be adept at reading code to find the correct places to make changes)

Desired skills (in addition to those named above): none

Convenient access to logs via the admin interface.

(Suggested by sjt)

Many problems that occur with mailing lists involve messages that "just disappear" somewhere along the way. There are many places where a message can be held up or even discarded, and not all of them are under control of Mailman or even the administrator of the Mailman host. However, host MTAs and Mailman itself keep detailed logs of many events in the life of a post (receipt by the MTA, receipt by Mailman, transmission by Mailman, filtering, approval by moderator, and so on). These logs are often useful in pinpointing the failure (whether it's a poster PEBKAC, a site misconfiguration, or an actual bug in Mailman).

So, when debugging delivery failures, the first thing we ask for is information from the logs. However, many list owners don't know where their logs are, and it's not unusual to find that they don't have the necessary access, and so need to go begging to the host administrator for any information.

Postorius should provide some or all of the following views of logs:

Each current log (including "smtp", "post", "vette", and so on) should be browsable as one (perhaps large) page. Users can use their browser's scrolling and text search features.
It should be possible to select only specific lists for viewing.
It should be possible to select specific periods of time for viewing.
It may be useful to select certain events (eg, "discard") for viewing (many logs only record one kind of event, but some have multiple possibilities).
It may be useful to select on more general criteria (regular expressions, keyword search, time of day – I don't know, be creative!)
Filtered access to views of MTA logs would be wonderful! However, many host administrators don't like to provide such access for privacy reasons. Also, the locations and formats of such logs are customizable at each site. Thus, this task is likely to be quite difficult to implement well.
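The filtering views listed above could share one small helper. This sketch assumes each log line starts with a timestamp such as "Apr 19 22:20:02 2013" and mentions the list name as a separate word; both the line format and the function name are assumptions, and the list-name match is deliberately crude.

```python
import re
from datetime import datetime

TIMESTAMP_FORMAT = "%b %d %H:%M:%S %Y"  # assumed format, e.g. "Apr 19 22:20:02 2013"
TIMESTAMP_LENGTH = 20


def filter_log(lines, list_name=None, start=None, end=None, pattern=None):
    """Yield the log lines matching every criterion that was given."""
    regex = re.compile(pattern) if pattern else None
    for line in lines:
        try:
            when = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if start is not None and when < start:
            continue
        if end is not None and when > end:
            continue
        if list_name is not None and list_name not in line.split():
            continue
        if regex is not None and not regex.search(line):
            continue
        yield line
```

Because it is a plain generator with no Django dependency, a helper like this could sit behind the REST API and be reused by other admin UIs.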

Bonus points for making the facility more or less independent of Postorius. Although Postorius is shaping up to be an excellent general-purpose UI, Mailman 3 is intentionally designed so that other specialized or restricted UIs can easily be used (even simultaneously). Allowing other admin UIs to use at least the REST API is important, and those implemented in Django might be able to use pretty much the whole thing.

Task level: Advanced Beginner

Desired skills (in addition to those named above): Django (you can learn as you go). At least one open-source MTA's log format (Postfix or Exim suggested; you can learn as you go).

Mailman Milter

Because Mailman provides access to administrative data (including subscriber list and posting policy) via a RESTful web interface, this information can actually be made available to other processes such as MTAs. Many MTAs implement the so-called "Sendmail milter" API for modular implementation of filters and mail acceptance policy. A "Mailman policy milter" could access Mailman's databases and add list-based spam filtering and other acceptance policy at the MTA level. This has the advantage that the host need not accept (and store and forward) a message when it has access to information that the message will be rejected by the addressee. (In some jurisdictions such as Germany, an MTA that accepts a message becomes legally responsible for delivery, either to the addressee or to another MTA that will accept responsibility for delivery.)
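The decision logic of such a milter might be sketched as follows. The membership lookup is passed in as a callable, standing in for a request to Mailman's REST interface; that callable, the function name, and the verdict strings are all hypothetical, and the milter protocol itself is not shown.

```python
ACCEPT, REJECT, TEMPFAIL = "accept", "reject", "tempfail"


def posting_verdict(sender, list_address, is_member, members_only=True):
    """Decide at SMTP time whether to take a message addressed to a list.

    `is_member(list_address, sender)` is a caller-supplied lookup that returns
    True or False, or raises if Mailman cannot be reached.
    """
    if not members_only:
        return ACCEPT
    try:
        allowed = is_member(list_address, sender)
    except Exception:
        # Mailman is unreachable: ask the sending MTA to retry later instead
        # of accepting responsibility for, or wrongly refusing, the message.
        return TEMPFAIL
    return ACCEPT if allowed else REJECT
```

Returning a temporary failure when the lookup fails keeps the host from accepting a message it might later have to reject.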

Task level: Intermediate

Desired skills (in addition to those named above): Ability to implement externally-specified protocols/APIs (milter) accurately.

Not-fully-defined Project Ideas

These are some ideas that are not as well-defined, kept here for inspiration. If you want to discuss any of these ideas, please send email to mailman-developers@python.org.

Fixing bugs/adding features from the bug queues
All of the components of Mailman have reasonably-maintained bug queues. It would be possible to build a GSoC project around a few wishlist items or desired bug fixes if none of the above projects appeals to you. Do make sure to talk to the mailman-developers list to find out if any given set of bugs would be suitable.
- The bug queue for Mailman core is here: https://bugs.launchpad.net/mailman/
- The bug queue for Postorius (the web UI) is here: https://bugs.launchpad.net/postorius/
- The bug queue for Hyperkitty (the archiver) is here: https://fedorahosted.org/hyperkitty/
Make your own project!
Got a feature you've always wanted to see in Mailman? We're always open to suggestions. Just stop by the mailman-developers list to discuss your project idea before you submit an application, so we can help you make sure it's the right size for GSoC and that we have the mentoring support available to help you build it!

MailmanWiki: DEV/Google Summer of Code 2013 (last edited 2015-03-04 06:03:22 by msapiro)

Revision 26 as of 2013-04-19 22:20:02 (editor: xwuang; size: 23575; no comment)
Revision 27 as of 2013-04-19 22:22:00 (editor: xwuang; size: 23575; comment: use build a VM wiki page in "Getting Started" section)