I'm interested to see people's responses and learn more, although I don't have any particularly specific technical knowledge in this area..
-I always separate my sending server and click tracking server - I don't want to risk the mail server getting clogged and not letting some clicks through...
Well sometimes I use a million different hardware configurations...
So what works best for you?.
I am not convinced yet that SSHD is the way to go. I still get errors and instability issues with the OS where I have SSHD in use. (with Linux, not Windows).
SATA2 still seems fast enough and far more stable....
How about RAM vs processor? Am I better off putting my $$ into faster processors or more/faster RAM? Is a dual-processor mailer any more effective than a single processor/multi core? Anyone here give a shit besides me?
Well if you really want to know, here's my specs:
1.4 GHz P4
256 MB RAM (this is really important)
160 GB IDE
1.5 Mb T1 line
Cranks out 200,000 emails a min - can't beat that!!!
This is a very sweeping question. Some of my machines are p4's with 2 gigs of ram and 100 gig ide hdd..
Other machines are dual quad cores with 16 gigs of ram, SAS drives 1066mhz ram etc..
Some machines use 32 ips others use 500. it really depends on the application of the application heh..
Some use windows which I hate, others use linux which I love...
Nice.. this is the conversation I am talking about..
Personally I stay away from Windows mail servers for one simple reason. CentOS (linux) is free. One less license and $$ to deal with when you have 60+ servers. Linux web/mail servers are very easy to manage.. What benefit do you find with windows or is there a specific application you like in the windows environment?.
I have my own 50Meg and 20Meg fiber connections. Might be overkill but it's nice to have.. How many machines do you have on the T1 sending mail, TJEWSKY? I was considering adding a T1 location to compare machine limits side by side with the fibers...
Hey DTran, do you find that you really need that much machine? dual quad cores? what performance difference do you see? Is that more for managing large lists or pushing high volume?..
Are you sure? This number seems pretty impossible given those stats, unless you have about 100 of these machines on separate lines..
Well, aside from the Win2K, if it was a linux box you could still do some damage with this configuration.. Probably not 200k/min but maybe almost 200k/hr. It's fun taking those old machines and turning them into work horses...
Why not throw in a SATA2 drive for the OS and put everything else that's disk I/O intensive on the SSHD drives - make sure the SSHD drives support trim!.
Here is a scream of a machine for a robomailer:.
CentOS 5.4 64 bit.
2 x Quad Cores or 4 x Quad Cores.
8 to 12GB DDR3 (the fastest memory you can get on the MB).
1 x 36 GB SAS Drive / 15K RPM (for OS - smaller disks are actually better for performance, and it's just the OS.).
Mount 2 x 60GB SSHD drives in RAID 0 (stripe them; now you have 120GB for the db - the largest db I've seen on robomail is under 120GB).
Mount 1 x 500GB SATA2 drive for crap/logs/data.
Without SSHD drives, use SAS drives (or even old SCSI drives), and stay CLEAR of any "GREEN" drives..
Optimize the .conf DB after a few weeks of intense mailings. Lots of scripts out here for that..
Optimize .conf web server to handle tons of requests...
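For the db side, the knobs those tuning scripts usually touch look something like this - values here are purely illustrative starting points, not recommendations; size them to your own RAM and workload:

```ini
# /etc/my.cnf fragment - illustrative values only
[mysqld]
key_buffer_size         = 1G     # MyISAM index cache
innodb_buffer_pool_size = 8G     # keep hot InnoDB data/indexes in memory
table_open_cache        = 2048
max_connections         = 500
tmp_table_size          = 256M
max_heap_table_size     = 256M
```

The general idea is the same on the web-server side: raise connection/worker limits until you run out of RAM, not before.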
Nice post Robo.. You and I are on the same wavelength...
Agreeeeeed.. AVOID GREEN.. Lost so many hard drives to the environment!.
My question is this, Robo: is the multi-processor config necessary? I have many servers running quad core but only single processor and they kick ass, especially with the SSHD, so long as the RAM is maxed out...
What difference will I see with a Dual or Quad Processors setup? Why spend the extra $$?.
Only bottleneck in this system is the disks. I personally would up the RAM to 32GB or 64GB, get a 1TB SATA drive for the OS and logs, and set up a RAM disk with the balance of the RAM so your mail queue is stupid fast and you don't have any disk bottlenecks...
If your application(s) for mailing only use single-core processing then there is little advantage to multiple cores..
Most DB's can be optimized for multi-core usage although I rarely see full utilization of all cores, usually 1 process chews heavily on 1 core and the other processes nibble and chew on the other cores.
If you're limited to single-core performance, then focus on increasing BUS speed all around (MB/MEM/CPU) and you'll probably see more gains..
SSHD's are extremely fast, especially for random disk I/O in which you're using some type of db on the mailing machine. Suggestion: if you're using MySQL MyISAM - switch to InnoDB and test, you'll be pleasantly surprised.
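If you want to try that MyISAM-to-InnoDB switch without committing, converting a copy lets you benchmark the two engines side by side (the table name here is made up):

```sql
-- clone the table, flip the clone's engine, then load it with the same data
CREATE TABLE recipients_innodb LIKE recipients;
ALTER TABLE recipients_innodb ENGINE=InnoDB;
INSERT INTO recipients_innodb SELECT * FROM recipients;
```

Then run your usual write-heavy queries against both and compare.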
If you're using mail queues - that is an entirely different beast; you're going to be eating up disks all day long, and the larger they grow, the slower you're going to get - until you choke...
Most stock MTA's (postfix,sendmail,qmail, etc) write at minimum 5 times to disk for every ATTEMPTED delivery. You'd have to heavily modify the source to change that..
Is Volo really that good? It seems like a lot of you mailers use it. I've heard good and bad...
Sweet system ! I have the same, except using windows ME !..
This thread is pointless. A PMTA should be able to send millions of messages per hour on a $500 supermicro box.
Making money sending email is 95% delivery, 4% list management software and 1% hardware...
You should try it out - I've heard good things and bad things, but different mailing solutions will behave differently for everyone..
[disclosure: I work w/ robomail]..
Can you provide server specs for this $500 supermicro box? and where to purchase the parts?..
Let's go Dell:.
PowerEdge 11G R210 Rack Server Details | Dell.
They don't even sell 1950's anymore, which is what we run. Looks like this replaced them...
Sorry to hijack thread but with purchased hardware are we usually talking like getting your own ip space and camping on it for a long time or are there other methods to utilize such hardware with shall we say more volatile methods?..
I don't recommend anyone hosting their own box. Use software and lease an off the shelf machine from a colo place.
We should be talking about specs for a machine like that..
Holy shit, I didn't realize these rackers dropped in price so much... NICE jboogie!.
Aside from the 1GB ram and minimal processor, yea you can SHOOT out mail fast - and grab bounces relatively fast, but this can't possibly support a db at those speeds, unless you don't bother updating suppressions/scrub/clickers/success/def/failures/openers/etc.. and just update unsubscribers..
I guess, why bother having a db at all - you could just pound the shit out of the logs and parse them when you need data. Would make for slow queries - but hey, who needs stinking reports!..
My rig looks a bit like this:.
My db is currently on:
Two quad-core Xeons at 1.86 GHz
12GB of RAM
2 SATA II 250GB drives (I kind of assume these are going to be the bottleneck)
My crm/controller is on an identical machine.
For actual delivery, I have 2 or 4 gb of ram, everything else is fairly unimportant..
I have not really run into much of a bottleneck on the db side. I have hit a little bit of an issue on the controller side, which basically aggregates all the data and puts it into the db in batched writes... But this is more an issue of code not being quite how I want it yet...
The DB on the box you're mailing from? Blasphemy!..
Its that 1% that drives your other 99%. If you are dealing with a small list, perhaps your system specs are less important. But when you have hundreds of millions of records to manage at the same time, it's all about processing power.
Also, I am not arguing about the $500 part. Machines are pretty damn powerful these days, and for pennies.. But some of us are tech junkies too, so just give us a little slack please...
Regarding Volo, it's a very good application and we have a few hosted in our facilities, and I personally operated and managed many for years... But it's uber expensive and thus not too profitable for many amateurs. It's definitely overkill for small-list mailers. But please take your software talk to another thread so we can focus on hardware here...
Seriously, if the software was built correctly - then yes..
The entire mailing solution can be self-contained on one machine..
For a single machine, on a single network, this is ideal. Once you expand, then this becomes far from ideal. Once you send over many machines over many networks, this becomes an absolute nightmare.
Alas, this is a hardware discussion...
60+ servers, sheesh - you must have a very popular blog/newsletter..
Why did kouzmanoff get banned? He was pretty cool I talked to him for an hour today on the phone...
I thought we were talking single machine. Expanding takes quite a few turns - fer sure agree here...
No.. SG is an ESP, and some servers are run by SG and some are leased to clients.. Many machines are identical but there are also many different configurations ranging from single core to multi processor...
If you want to talk about your cloud/cluster configuration, please share! This thread is not limited to single machines...
Absolutely not correct. When your database gets big it's about the number of spindles and having enough RAM to keep indexes in memory...
Correct, the db will chew up disks more than processors or (given a reasonable amount) memory. Hardware has advanced - most 3rd-generation processors like the Intel Core 2 Quad Q6600 are pretty fast....
Remember, most db's rely on random disk i/o for performance. You can't take the manufacturers specs on speed without that specific test. I really don't care what the read/write speeds are - I'm looking at random seek times..
One way to counter this easily is to RAID 0 the disks; you lose redundancy but you gain speed..
The other way is to use smaller disks; smaller spindles usually translate to faster seek times..
Remember, from what I've read about this hardware stuff - most fast SAS drives perform well until the disks are 1/2 full, then performance drops off on a weird curve. Obviously google is your friend to read more about random disk i/o performance..
Allow me to clarify, because my statement was taken out of context. I was comparing the importance of hardware capabilities (processing) vs delivery/data.. I agree RAM quantity and speed is more important than processor.... I said processing power, not processor. I meant the overall hardware's ability to process the task, and I should have been more specific with my words. Thank you for the correction...
All you guys using your own machines - are you using opt-in subscribers or blasting bought email lists? If those are bought ones, don't you have problems with your ISPs?..
If they go bad we replace them. We always have reserves..
Some of us use bought lists / rev share and some of us mail our own data our sites generate...
Ahhh.. When one has problems with ISP, one must become own ISP...
Ram disk for the spool directory FTW. As fast as you can get...
Postfix mitigates this by reusing the same inode for each stage of the queue process; it just changes the hard link when for example a mail goes from the active queue to deferred. This limits the amount of disk i/o it uses...
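You can see that trick on any Linux box - a hard link is just a second directory entry pointing at the same inode, so "moving" a message between queue stages never rewrites the data:

```shell
# Two names, one inode: the file's data is never copied.
tmpdir=$(mktemp -d)
echo "queued message" > "$tmpdir/active_msg"
ln "$tmpdir/active_msg" "$tmpdir/deferred_msg"            # new link, same file
stat -c '%i' "$tmpdir/active_msg" "$tmpdir/deferred_msg"  # identical inode numbers
rm -r "$tmpdir"
```
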
Dell PE 2950.
6x SATAII (raid0 for data, raid0 for logs, os disk, data disk).
16 GB ram (most of it allocated to the innodb buffer pool).
Gigabit link to a few dual core atom 1U boxes (for the electricity savings) which run the mailing software -> postfix localhost -> postfix satellites out on the 'net via a T1...
Do you mail with Postfix? If that's working for you very interested in any tips on the right configuration...
I do mail w/ Postfix. I use v2.7 (to take advantage of sender_dependent_transport_maps)..
I typically relay all mail to a campaign-specific Postfix server that serves as the last hop before delivery to the recipient. I relay that mail using another postfix server located at/near my injector. The two postfix instances allow pipelining (ESMTP) between them, and I tweak the concurrency settings for this connection to accommodate whatever pipe is between them (maximize throughput)..
This method allows me to rate-limit outgoing mail using a policy server (policyd v2, 'cluebringer'), which is usually not possible - policy checks are generally used for antispam and only affect incoming mail - but now my policy server can examine the mail as it is relayed, b/c it's technically 'incoming'..
Now I have fine-grained control over rate-limiting, per destination or source domain or IP, per user, per time period, whatever, and there is a web-based admin panel that writes ratelimiting info to a mysql database that the policy server reads. It's damn flexible, and changes are read in real-time..
I have several transports defined, for different tiers of recipients for which I want to limit connection concurrency (transport_destination_concurrency_limit, pos/neg feedback), or limit/disable connection caching..
I use additional transports to bind sending domain->IP address; the mapping is done using sender_dependent_transport_maps. As long as I set the sender domain correctly in the injector, it will go out through whatever IP I've specified..
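A minimal sketch of that sender->IP binding - domains and IPs here are invented, and note the 2.7 parameter is actually spelled sender_dependent_default_transport_maps:

```
# main.cf
sender_dependent_default_transport_maps = hash:/etc/postfix/sender_transport

# /etc/postfix/sender_transport  (sender domain -> transport)
offers.example.com   smtp-ip1:
news.example.com     smtp-ip2:

# master.cf - one smtp client clone per outbound IP
smtp-ip1  unix  -  -  n  -  -  smtp -o smtp_bind_address=192.0.2.10
smtp-ip2  unix  -  -  n  -  -  smtp -o smtp_bind_address=192.0.2.11
```

Run `postmap /etc/postfix/sender_transport` and `postfix reload` after editing.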
There are a few more tricks, like storing a lot of config data in mysql databases, so I can change things in realtime via phpmyadmin or a custom web frontend (for some things). I forward all bounces to a central postfix server, which keeps inode usage down on the relay server (it needs those inodes for the queue spool). I set in_flow_delay to 0, which means that it'll continue to accept relayed mail even if message arrival rate exceeds the message delivery rate - this keeps the queue on my injector free for more work, and the relay fires off as fast as possible (or as fast as I've defined in the policy server). I also use header checks to do any rewriting that I feel is necessary, or to add Bulk headers for gmail , you get the idea. I keep everything in a git repository (including the scripts to provision a new box) so I can get everything going pretty quickly.
I've also written a patch for postfix which allows me to override the behavior for different smtp response codes; for example, a yahoo TS03 is a 421 response (wtf yahoo?), and postfix will happily put this in the deferred queue and try to resend it after $minimal_backoff_time; this is frickin stupid. Postfix uses a map to figure out what to do and, in this case, appends the error msg to the body and rewrites the return code to 5xx, which causes postfix to send it as an NDR to the bounce account. This is also tied into my custom stack, which will add a rule to the policy server disallowing mail to that domain for a specified period based on the error/domain. I'mma keep this one in my pocket though.
Wow that was a mouthful. HTH..
Well said, when you cant work with them, you buy them..
I should SAY SO! Nice !.
A) So, do the two instances reside locally?.
Have you considered running more than 1 instance on the machine?.
B) Writing patches for GPL/GNU MTA's is usually the answer to boost performance & add special features.
Anyone who's doing this efficiently is a skilled ninja...
Any visual nerds out there?
Setting up an email sending cluster at one colocation would look similar to this:..
Very Pretty.. Whats the price tag and specs of this arrangement Kouzmanoff?..
The two instances are not on the same box, they communicate over the WAN but I try to maximize throughput over that link by tweaking the config..
I've considered using multi-instance, but I haven't found a compelling reason to mess w/ it yet; the sender_dependent_transport_maps make it unnecessary (in my workflow)...
I just checked out your site (Inter7), and saw that you guys use qmail. I had to laugh out loud, thinking about your patch comment!.
I'm not poking fun at you, I just know qmail..
Yea, we started a very very long time ago with it... we recently took back control of some of our packages - vpopmail, qmailadmin, etc. - so we can continue to flood it with solid goodies. It's extremely reliable, so for large-scale clustering it still makes sense for us..
Thank god DJB released his software into the "free" world; now people can release versions of it that actually work out of the box. We have like 5 or 6 rpm's in the works, and I suspect Debian will have the first release in its source..
Some other development put it on the back burners for now, but it's still in queue...
This post feels like a treasure map. I have used Postfix quite a bit but never seen the detailed roadmap for really using it for true bulk mailing..
I think this thing can run into the hundreds of thousands depending on hardware manufacturer..
The storage array is a mid class EMC, the FC switches - take your pick..
The front end machines really don't need massive hardware, they are there to add / remove as needed so the front end machines can grow as business grows..
You can slice the services off onto each frontend machine, or you can run all servers on each front end machine -.
If you do the latter, then it's real easy to add new front end machines because you just build the exact same configuration, but it might not be the best allocation of services.
Let's say you need more POP servers: if you add more front end machines with all the services running on them, then you're hardware-heavy, which isn't really ideal..
How do the super expensive EMC setups compare to the less expensive but newer-concept SSD rigs? For instance, if you just plugged a few FusionIO cards into a server - you're looking at MASSIVE throughput for under 10k, right?..
SSHDs kick ass.. the good ones.. not the cheap ones though....
I'm a fan of "less is more". One of my current boxes has the capacity to queue and deliver about 60M emails+ per 24 hours. This includes:.
- Web Serving for clicks.
- Tracking & Metrics.
The delivery rate is also not for GI consisting of many domains, which inflates many advertised delivery rates. This is the type of speed I achieve with a very targeted set of domains..
The recommendations to switch to InnoDB are generally a bad idea unless your application AND database schema are designed to take advantage of InnoDB. InnoDB is not faster than MyISAM when it comes to writing to the DB. MySQL is a relatively "low performance" option when compared to all DB server possibilities available. It's popular because it is free and does the job, but you would see a substantial improvement using MS SQL. If you're sticking to open source, then check out custom MySQL builds from Percona. You can be daring and try the new stuff but personally I'll take a bit less speed in exchange for reliability when it comes to the DB..
Speed isn't everything but to maximize performance you need to minimize disk I/O. Even with SSDs, you still need to do that...so what can most of you do to boost performance?.
1) Mail Queue on RAM drive. Your server should have a lot of ram, because using a ram drive will effectively limit your queue, and therefore how much you can send per period..
2) Log to ram drive or disable logging if you're feeling lucky..
3) Use SSD in Raid 0 arrays. Seriously, don't fuck around with anything else. Use raid 0 exclusively, and with the current SATA 3.0 Gbps standard, about 2 SSDs per array is max..
4) Your DB should be on the SSD drives..
5) Make sure what you actually need is no more than about half your SSD capacity. If your shit takes up 100 GB then you want 300 GB of capacity, so you have room to grow as well as room for the controller to optimize reads and writes. Intel's SSDs are still the best and probably will be for a while..
6) Optimize your TCP/IP stack to handle "shitloads" of simultaneous connections. The default TCP/IP settings are generally inadequate..
7) Tune your DNS server accordingly. Whether you're running it locally or on a dedicated box, if you're mailing volume your DNS server is going to be busy..
8) Insist that your datacenter connects your box directly to their main managed switch. This will reduce latency and improve overall network throughput..
9) I could keep going or get into more detail but I'm not getting paid to and I'm a whore...
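For point 6, the usual Linux knobs look something like this - values are illustrative only, so test them before betting a send on them:

```
# /etc/sysctl.conf - apply with `sysctl -p`
net.ipv4.ip_local_port_range = 1024 65535   # more ephemeral ports for outbound conns
net.ipv4.tcp_fin_timeout     = 15           # free closed sockets sooner
net.ipv4.tcp_tw_reuse        = 1            # reuse TIME_WAIT sockets for new conns
net.ipv4.tcp_max_syn_backlog = 8192
net.core.somaxconn           = 4096         # bigger accept backlog for the click/web side
```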
What qualifies them as cheap ones? Are the standard intel ones considered cheap?..
If the read/write speeds are in the 200 MB/s range and they cost under $200....
You're a smart person. This is often overlooked, and also applies to other HDs.. I didn't want to say anything about RAM disks because honestly I think that's the best solution (tricky but FAST) and I didn't want to play my ace card, but since you spilled the beans: RAM disk is our current focus in development. Most basic linux systems already have RAM disks allocated at boot which can be tweaked.
If you have any suggestions in this regard, please share..
Hah...ramdisks, have been around for decades and every major OS has the software needed to make one included. I wouldn't refer to them as any kind of secret... With linux it's a simple matter of mounting a directory with the TMPFS. You can even have it auto-scale and spill over into the swap file so you don't get any "device not writeable" errors if the ram fills up. The nice thing is that proper usage of ram disk can make just about any MTA go from a 10-20/s to 1000-2000/s...
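A tmpfs spool really is a one-liner - just remember the queue contents evaporate on reboot, so only do this if you can afford to lose in-flight mail (path and size are examples):

```shell
# at runtime, as root:
mount -t tmpfs -o size=4g,mode=0700 tmpfs /var/spool/postfix

# or persistently, via a line in /etc/fstab:
#   tmpfs  /var/spool/postfix  tmpfs  size=4g,mode=0700  0 0
```

With the size cap, tmpfs refuses writes past 4g instead of eating all your RAM.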
With SSDs the spec sheets don't tell the full story. Nobody is buying them for their max sequential throughput (the big MB/s numbers you often see). The main lure of an SSD are super-fast IOPS and low random seeks..
Problem is that most brand of SSD drives get slower over time, or have shitty controllers that do not properly manage "garbage collection" resulting in a drive that dies quickly. Intel X-25M is a solid choice - it's what I use. You don't need to blow extra money on the SLC version...plus, if you run them in raid 0 you essentially double their useful life...
Will a raid configuration adversely affect speed noticeably? I am not a RAID configuration expert....
That is a cool approach to rate limiting. Thanks for sharing!.
Thanks man! I struggled for months trying to figure out how to apply policies designed for incoming mail filtering to outgoing mail; when I realized that sending mail from my injector to a local postfix instance, and relaying to the last-hop MTA (rather than communicate directly with the last-hop MTA over the internet) was actually *faster*, everything fell into place.
Postfix (and most other MTAs, probably) have many fantastic and finely-tuned filtering capabilities, but they primarily apply them to ingress traffic (to combat spam). When we have to design our rate-limiting into our injector apps, we lose out on all that great engineering...
Wouldn't it have been cheaper to just buy powermta? I love solving problems like that, but at the end of the day didn't that cost you more money than 11k?.
I forgot to add what you did was a hella cool solution to the problem...
Actually crackpot, you are better off owning your own "powerMTA" and as much of everything else in the equation as possible... This way you can run it and lease it.
But I agree in that you should not have to rely on more software to limit your rates. If your rate limiter is not built into your system, you're overcomplicating things.. I am not arguing against the effectiveness of your postfix solution, just that it seems like a lot of extra tweaking and multiple software UIs, which gets cumbersome on a large scale.. And a distraction from becoming more efficient in other areas....
Hey, I'm a developer, not a marketing guy. Working on stuff like this is like looking at porn all day, I cannot get enough of it...
Raid 0 is striping, where data that equal or larger than the stripe size is split in half and written to 2 or more drives. The main benefit of stripping 2 SSDs is to allow your system to remain responsive under high loads...that way you don't need a separate box for web, dns and mailing. One box can do it all. It also has the added benefit of extending the SSD life. I use MD to combine the drives on Linux, and LVM to partition them as needed. Don't use the raid features built into the motherboard unless it is a dedicated raid controller...
Indeed I can fully relate. I'll build stuff I don't need just for fun...
That is funny but true, I think there are like 3 of us on board with you in this thread that think the same way...
ALSO: I love that your rate limiter is actually (I think) a self-imposed throttle built from defenses designed to rate-limit inbound attacks on real email servers.. HA! +HARD REP - too funny, I love it..
So, if I get this right, the first box sends as much as it can to the relay boxes, and the relay boxes run the rate limiter for the first box... too funny..
I can think of about 1000 different ways to do this, but using these out of the box solutions is a fast way to cure a headache of a problem.
I just love it...
I just set up a RAID0 array on two SATA drives in my PE2950 and am getting almost 300MB/sec reads, compared to 75MB/sec reads for a RAID1 array with the same exact hardware, all on the same raid backplane. (using hdparm -t).
I don't like to use LVM on performance-critical partitions; I've not done extensive research, but I have read a few things indicating that in some cases LVM can reduce performance by an unacceptable margin (anything over a few % is unacceptable). If you think about it, it makes sense: LVM is all about making it possible to expand a volume by adding an additional layer of abstraction over disk geometry, so it necessarily imposes some overhead. In the real world it probably has little impact unless there is another contributing factor (an unpatched bug, for example), but I stay away from it on my need-for-speed drives (db, logs, swap etc.)..
That's just what it is - using Weitse's antispam defenses to make sure I'm not identified as a spammer. And I'm not, I'm 100% compliant/legal but still have to jump through damn hoops...
Raid 0 is the only one that really boosts overall performance, the others are either for spanning or redundancy. You should run the iozone benchmark to get a more accurate picture of your drives' performance. The main benefit of using raid 0 on a server is I/O per second, which is the number of simultaneous operations the disks can handle. Sequential transfers are not really important. You can also run the iostat command to see how much data is being written/read to your drives..
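For reference, the two commands mentioned look like this (both need the sysstat and iozone packages installed, and numbers only mean much while a real mailing is running):

```shell
iostat -x 5        # extended per-device stats (tps, await, %util), refreshed every 5s
iozone -a -g 2G    # automatic benchmark sweep with test files up to 2GB
```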
It's possible to get by without LVM, though I use it so I do not need to make multiple MD partitions. It also plays into the fact that administrative efficiency > slightly increased I/O overhead. Being able to change partition sizes (or add disks to the array) without losing data or turning off the server is a major benefit. The performance overhead of LVM is negligible and you should be setting the drive elevators to NOOP for any drive that uses NCQ or has a dedicated raid controller. Doing that should make the disks a bit more responsive...
I could probably just as easily google these questions, but for all the RAID inexperienced, do you absolutely need two of the same drive? When raiding two HDs, can one be SSHD and the other not? What are the rules for this? Or is it absolutely necessary to have identical models? or at least identical volumes?.
I don't believe there are any tools that allow RAID on unmatched-size drives. Due to the way striping works, I don't think anything allows different-sized drives.
You *could* use one SSD and one Spindle based drive.. You could also email a list of known spamhaus traps...
That's my 10k a day rev method right there. Thanks for making it public...
I hadn't considered changing the IO scheduler; to be honest, the only alternate scheduler I had ever heard of actually being used was in Con Kolivas' -ck patchset. Would you please point me to some benchmarks that show the advantage of the noop scheduler? The drives in this box (PE2950) are all SATA, so they do use NCQ, and they are on a dedi raid backplane. I'll be spending my evening reading up on this - thanks for your tips!...
If anyone can help me mail i'll be willing to trade your time for servers as I have a lot ready to rock and roll..
You can mix and match drives, but with raid 0 the maximum size of the array will be 2 times the capacity of the smallest drive in the array. If you have 3 drives, it would be 3 times the smallest size drive, and so on. Most people stripe drives in pairs because of steep diminishing returns when striping more than 2..
You would not want to stripe a standard drive and an SSD. Generally, identical drives are the way to go. I stick with Western Digital since they've proven reliable for all of my builds and their RE3 edition can basically run 24/7 for years..
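A two-drive stripe with MD is only a few commands - this needs root and two blank devices, and the device names and mount point here are just examples:

```shell
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
mkfs.ext3 /dev/md0                # CentOS 5-era default fs; use whatever you trust
mount /dev/md0 /var/lib/mysql     # put the db on the stripe
hdparm -t /dev/md0                # quick sequential-read sanity check
```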
CentOS uses the CFQ (completely fair queuing) scheduler by default. It's fine for single-drive systems, or systems where the drive does not have any onboard optimizations like NCQ enabled. Once you enable hardware-level I/O optimizations you typically want to go with NOOP, which prevents the OS from doing any kind of read/write optimization. Your other option is "deadline", but in my tests I have achieved the best sustained results with noop..
You can find various benchmarks floating around but I'd say the particular results are almost specific to your hardware. You should experiment with them and see what you get...but if your drives are SATA2 or better with NCQ enabled your best bet is probably noop...
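Switching elevators is a runtime toggle per block device (root required; sda is an example):

```shell
cat /sys/block/sda/queue/scheduler         # current choice shown in [brackets]
echo noop > /sys/block/sda/queue/scheduler

# or set it for all disks at boot with the kernel parameter: elevator=noop
```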
That's an excellent way to summarize the whole concept..
However, how are you ensuring your relay box removes the original headers to not give away the IPs / domains of your command-and-control box?.
Which Postfix settings let you do that?..
There are several ways to rewrite the headers, most directly is header_checks..
Here's an example:
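Something along these lines - standard 'regex' map syntax, and the control-box hostname here is invented:

```
# main.cf
header_checks = regex:/etc/postfix/header_checks

# /etc/postfix/header_checks
/^Received:.*control\.example\.com/  IGNORE
/^List-Unsubscribe:/                 IGNORE
/^X-Mailer:/                         IGNORE
/^Precedence:/                       IGNORE
```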
That handles your "command and control" Received: header, plus drops List-Unsubscribe:, X-Mailer: and Precedence: headers for relayed mail..
The above uses 'regex' (or 'pcre') map type, there are many different supported map types including database (mysql/pgsql) lookups..
If you want to get more control over the rewriting (getting domain-specific etc.) you can use a 'tcp table' map instead of a regex or db, which allows you to write a daemon that can customize the rewriting logic. Here's a great example that uses dnsbl lookups to customize header rewriting: Postfix header_checks using tcp_table and checkdbl.pl script KutuKupret.
The postfix-users group is an excellent source for this type of info. If you have a yahoo account, yahoo has a searchable index of the last few years of posts (it's a goldmine)...
This has to be one of my favorite all-time threads. You guys have some great ideas and knowledge being shared here..
I can't seem to give enough rep out to some of you. My little spongy brain is on fire!..
How about a little reputation boost for starting the thread? Come on guys.. Its the holidays!..
Our system operates so differently from this...
First, the whole need for a relay machine... Not sure I understand why that is necessary, along with rewriting all your headers. Our system processes each outbound email with DKIM, and that signs the full header and email body using the private and public encryption keys. In order to do that, the email must be sent from the server with the mail domain and IP bound. Does your system do that at some point?.
Are you sending in a way that your MTA's main IP, hostname is displayed in message headers and so you need to relay the traffic thru another remote system to hide the sender details?.
Question regarding RAID0. Is there a limit to the number of HDs?.
I was just giving an example of how to rewrite headers using postfix, including the specific question asked (rewriting the sender data)..
I do some header rewriting, but not that simple, and it's done before the dkim signing occurs...
Awesome stuff. I remember you could rewrite headers, but nice to see your actual code!.
If you are relaying over SMTP with something like Postfix, the delivery machine would add a Received: header naming the control machine the message came from, giving away your command center in every message. Of course, if you spool from the control box to the delivery box via FTP that wouldn't be an issue, but you'd have to write special handling code for that and wouldn't be able to use this Postfix rate-limiting solution.
What is the better solution: using a relay machine, or binding your mailing IPs to the delivery machine? We use the latter.
That would be great, however this particular network has a core of several heavy-ish servers/connections and a sh1tload of lightweight mailers spread out wherever (think VPS)..
The design developed from the constraints, it's not amazing but it does the job and lets me be flexible with my IPs...
Regarding RAID 0: is there a limit to the number of drives I can RAID? If I want to experiment with five SSHDs in a RAID 0 config, is that feasible? Practical? Necessary? What's the limit?
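For what it's worth, five drives is nowhere near any limit: Linux software RAID (mdadm) supports far more members than that, so the practical ceiling is your controller ports and bandwidth, not the stripe count. A five-drive stripe looks like this (device names are examples; assume /dev/sdb through /dev/sdf are your five drives):

```
# Create a 5-drive RAID 0 stripe with Linux software RAID.
# WARNING: this destroys any existing data on the member drives.
mdadm --create /dev/md0 --level=0 --raid-devices=5 /dev/sd[b-f]
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/stripe
```

Keep in mind RAID 0 has zero redundancy: lose any one of the five drives and the whole array is gone, so it only makes sense for data you can regenerate.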
Here's a programming question: which is faster, queuing information from an SQL database or from a text file? Can you explain it in terms of hard drive usage? I've tried both and am not sure I see a difference.
Here's a context to consider. Let's say I'm generating a list to mail to. Do I want my mailer running out of a DB or from a text file? Is it better to load the list into a temporary table or just queue the text file?
Which is optimal for HD usage? How does each affect server performance?.
Does it make any difference at all?..
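One way to answer this empirically is a quick timing sketch (SQLite standing in for a real DB server; the list size and names are made up for illustration). A flat file is one sequential streaming read, while a table read pays per-row engine overhead:

```python
# Quick-and-dirty timing sketch: flat file vs. an SQL table (SQLite
# standing in for MySQL). List size and file names are made up.
import os
import sqlite3
import tempfile
import time

N = 100_000
addrs = [f"user{i}@example.com" for i in range(N)]

# Flat file: one sequential pass, essentially pure disk streaming.
path = os.path.join(tempfile.mkdtemp(), "list.txt")
with open(path, "w") as f:
    f.write("\n".join(addrs))

t0 = time.perf_counter()
with open(path) as f:
    from_file = [line.rstrip("\n") for line in f]
file_secs = time.perf_counter() - t0

# DB table: per-row engine overhead, but you get indexes, dedupe,
# and resumable state (e.g. a 'sent' flag) for free.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (addr TEXT)")
db.executemany("INSERT INTO queue VALUES (?)", ((a,) for a in addrs))

t0 = time.perf_counter()
from_db = [row[0] for row in db.execute("SELECT addr FROM queue")]
db_secs = time.perf_counter() - t0

print(f"file: {file_secs:.3f}s  db: {db_secs:.3f}s")
```

In practice the flat file tends to win on raw sequential throughput, and the DB wins the moment you need to mark rows as sent, dedupe, or restart a crashed run, which is why a common compromise is exporting the DB to a text file right before the send.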
I am starting to get serious about implementing Postfix and have a few questions about your design..
How many transports can Postfix handle, and how many do you typically use?
It seems to me you need one per IP / sending domain / receiving-domain group / email drop; the last one so you can attribute logged messages to a specific send (or how else can you attribute them from the standard logged info?). When you multiply out all the combinations, it's conceivable to have hundreds or thousands of transports. No?
Is it possible to configure these transports in MySQL database instead of master.cf flat file? If yes, how do you set that up?.
What other configs could you move to a DB, and how do you do that? Is there anything that still requires a flat file?
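Not speaking for the poster's setup, but the general Postfix answer: master.cf, where the transports themselves are defined, has to stay a flat file, while the lookup table that maps recipients to transports can live in MySQL via transport_maps. A sketch with placeholder credentials and table names:

```
# main.cf
transport_maps = mysql:/etc/postfix/mysql-transport.cf

# /etc/postfix/mysql-transport.cf
hosts = 127.0.0.1
user = postfix
password = secret
dbname = mail
query = SELECT transport FROM transport_map WHERE domain = '%s'
```

Most other lookup tables (virtual_alias_maps, sender_dependent_default_transport_maps, etc.) can be moved to MySQL the same way; it's the daemon definitions in master.cf and the core settings in main.cf that must remain flat files.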
I could think of a few more reasons to use a separate bounce server, that probably should not be exposed in a public forum.
Any chance you could share the patches? I'd love to review them and possibly hack on them and contribute back. Did you write any patches to change how logging works? The standard Postfix log format seems pretty confusing / complex / limited / hard to process in a DB. Very interested in how you output and process your Postfix logs.
Well my favorite thread was about to fall off the main page so I am reviving it!.
Here is my tech topic for the day!.
I want to know everything you know about Python. What's a good starting point in terms of practical applications and knowledge bases? How many experienced Python programmers are here? How many of you are running MTAs primarily written in Python?
Thanks for your input!.
Sorry SuperGenii, but an MTA written in Python? Are you serious?
If I need something out the door fast, yes, I'll first write it in Perl, Python, or PHP, just to prove it works. If I want someone else to be able to see/edit/build on top of it easily, I'll leave it in that language.
But if I'm going to stress/fault-harden it in any way, then I won't leave it in PPP for any period of time.
I appreciate the response. Any other opinions on the matter?..
Isn't there a way to turn Perl/Python into compiled code? How much worse would that be than writing it in C from scratch?
EDIT: Did some quick Googling and found this benchmark: "C++ vs. Python vs. Perl vs. PHP performance benchmark" (/contrib/famzah). Any comments?
There's Cython, Unladen Swallow (which is dead), Jython (to compile to JVM bytecode), and currently the most active, PyPy. Take your pick.
Nothing beats writing it in a real language to start off with though, I completely agree with the sentiment that the Ps are for prototyping, not for production..
While I don't recommend it I've seen people pull in huge rev numbers with a perl mailer...
One cool thing I've seen with Python is that you can use inline C to process things. I haven't looked into it much in the past 5 years, but the theory is that you can rewrite the process-intensive parts of the code in lightweight C and really speed things up.
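That idea (scipy.weave was the classic inline-C route; Cython and cffi are the usual modern choices) can also be approximated with nothing but the standard library's ctypes, which calls into an already-compiled C library directly. A tiny sketch calling libm's sqrt:

```python
# Call a compiled C function (libm's sqrt) directly from Python via
# ctypes. The idea is the same as inline C: push the hot path into C
# and keep the orchestration in Python.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.argtypes = [ctypes.c_double]  # declare the C signature
libm.sqrt.restype = ctypes.c_double

result = libm.sqrt(2.0)
print(result)
```

The per-call overhead of crossing the Python/C boundary is real, so the win comes from moving whole loops into C, not single cheap calls like this one.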
That's cuz they are old-school badasses.
Today's mailer tech topic: databases! MySQL vs. your favorite brand. What's the best system for housing and querying millions and millions and millions of records?
It really comes down to personal preference. I know some people who use NoSQL simply because they can't optimize a database properly, and others use it because they like it better. I personally use MySQL and have no issues with it.
I am currently using Twisted Python (twistedmatrix.com) to set up a distributed network of multi-purpose servers to handle all of my mailing backend stuff.
For example, if a client wants a scrubbed file, they send an XML-RPC request to a local server, which locates the database service and uses AMQ messaging to proxy the request to it; the database service generates the scrubbed dataset and streams the file back to the requesting server, which maps the file data to an HTTP resource the client can click to download.
The core app is a twisted daemon that listens for client requests and spawns whatever type of service is required. This daemon runs on every server I have, and they all can communicate w/ each other via AMQ. Some instances start with defined roles, like smtp+pop3, or database, but they all have AMQ (inter-server talk) and xml-rpc (client-server talk) running. My backend mailing structure is modeled in each app, as python classes (like lists, bounce codes, campaigns etc)..
If I want to check out bounces on server X, I send a message to that server to spawn a POP3 instance and point it at whatever maildir I'm curious about. I ask the POP3 instance to scan for bounces; it can slurp the bounce codes by communicating with the server instance that stores them. Then it runs the regexes, blah blah.
The missing piece is a directory service; currently each instance needs to know at runtime where every other instance is.
Also it needs a user interface, long-polling/COMET with ajax is popular. The vendor that wrote the definitive Twisted web UI framework (Divmod) has recently gone out of business, and I'm waiting for the dust to settle to see if it's going to be maintained in the future..
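For readers who haven't touched XML-RPC: the client-facing leg of that design can be sketched with just the Python standard library. No Twisted or AMQ here, and the "scrub" service below is a made-up placeholder (a simple dedupe), not the poster's actual code:

```python
# Minimal XML-RPC request/response leg using only the stdlib.
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

# Bind to an OS-assigned free port on localhost.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
port = server.server_address[1]

def scrub(addresses):
    # Placeholder "scrub" service: dedupe (case-insensitively) and
    # drop entries that aren't even shaped like an address.
    return sorted({a.strip().lower() for a in addresses if "@" in a})

server.register_function(scrub)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: call the remote method as if it were local.
client = ServerProxy(f"http://127.0.0.1:{port}")
result = client.scrub(["A@example.com", "a@example.com", "bogus"])
print(result)
server.shutdown()
```

In the architecture described above, this request would instead be relayed over the message bus to whichever server hosts the database service, but the client-facing shape is the same.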
I wouldn't use Python for my MTA. No way.
PMTA has a Python injector on Google Code somewhere; that's as far as I would go.
MySQL all the way here. There are problems which might benefit from something like Hadoop, but I believe anything in the day-to-day mailing space is solved with MySQL.
A better question would be: what are some problems you're facing? For instance, tracking the delivery of every single email sent from a system can easily produce a huge table.
I asked these types of questions earlier but did not see much response..
The volumes of transactional / tracking data in bulk mailing can get humongous real quick. Really curious how people are organizing, summarizing, and reducing it in MySQL.
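One common MySQL pattern for exactly that (sketched with made-up table and column names): keep the raw per-message log append-only, roll it up into a compact summary table on a schedule, then prune or archive the raw rows once summarized:

```sql
-- Sketch: roll per-message delivery rows up into a daily summary.
-- Table and column names are illustrative, not from this thread.
CREATE TABLE delivery_daily (
  day         DATE NOT NULL,
  campaign_id INT  NOT NULL,
  sent        INT  NOT NULL,
  bounced     INT  NOT NULL,
  PRIMARY KEY (day, campaign_id)
);

INSERT INTO delivery_daily (day, campaign_id, sent, bounced)
SELECT DATE(sent_at), campaign_id,
       COUNT(*),
       SUM(status = 'bounce')
FROM delivery_log
WHERE sent_at >= CURDATE() - INTERVAL 1 DAY
  AND sent_at <  CURDATE()
GROUP BY DATE(sent_at), campaign_id;
```

The reporting queries then hit the small summary table instead of the raw log, which is how a "millions and millions of rows" table stays queryable.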