Hey, Russ! I'm busily working on queueing right now, as we speak. Seems like we're wasting effort on competing codebases -- any chance you'd like to re-integrate and coordinate our efforts?
Russ
· 1 year ago
Evan!
That's great to hear about the queuing!
Of course, you can have access to my changes (as per the GNU Affero license. ;-) ), but honestly, I haven't done anything except tweak a couple of lines of PHP here and there as I work through the code and figure it out. Working out where to splice in queuing would be way beyond my understanding of your code at the moment, though I'm happy to help as I get to know how things work.
-Russ
Evan Prodromou
· 1 year ago
Please, let me know if you need any help. As you can already see, there are some places for storing multimedia files (urls, mostly) in the DB already. It'd be great to take account of licenses, since one key part of Laconica is to support sharing free content.
Keith Gaughan
· 1 year ago
And the really silly thing is that there are perfectly good FOSS message queues out there like beanstalkd that they could use.
Earle Martin
· 1 year ago
Patches welcome, Keith. (I'm serious; that's why it's open source.)
lmorchard
· 1 year ago
I think part of the Laconica story, though, is getting the software installed on commodity PHP web hosts for federation. That kinda quashes a beanstalkd installation for most of that audience. Doesn't preclude it as a later bolt-on option, though
Jeff O'Hara
· 1 year ago
Quick question: do you know who on identi.ca wanted a version for their classroom? We are building a microblogging service at http://edmodo.com for teachers to use in their classroom.
Evan Prodromou
· 1 year ago
Also, where's your source?
Russ
· 1 year ago
Does it need to be in a repository right away or do diffs count? I don't mind doing an export of the code into Google Code or something, but it'd be pretty silly based on the number of lines I've tweaked (probably all of 10)...
-Russ
Evan Prodromou
· 1 year ago
Sure, let's figure out a way to share it. Once I'm over the hump on this queueing thing tonight (you can pull the code from darcs right now, by the way), I'm going to bed for about 14 hours, then I'm going to respond to all the great code contributions I've gotten over the last day and a half.
beingbrad
· 1 year ago
What is the problem again with a mid table that has the user's id and the id of everyone they are subscribed to as the limiting query for the user.all_subscribed_messages?
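For illustration, the "mid table" idea can be sketched as a subscription table joined against notices. This is only a minimal sketch using an in-memory SQLite database; the table names, column names, and the `all_subscribed_messages` function are hypothetical and are not taken from the Laconica schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not Laconica's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE notice (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL,
        content TEXT NOT NULL
    );
    -- The "mid table": one row per (subscriber, subscribed-to) pair.
    CREATE TABLE subscription (
        subscriber_id INTEGER NOT NULL,
        subscribed_id INTEGER NOT NULL,
        PRIMARY KEY (subscriber_id, subscribed_id)
    );
    CREATE INDEX notice_author_idx ON notice (author_id, id);
""")
conn.executemany("INSERT INTO subscription VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 3)])
conn.executemany("INSERT INTO notice (author_id, content) VALUES (?, ?)",
                 [(2, "hello from 2"), (3, "hello from 3"),
                  (4, "hello from 4"), (3, "again from 3")])


def all_subscribed_messages(conn, user_id, limit=20):
    """Return the newest notices written by anyone user_id subscribes to."""
    rows = conn.execute(
        """
        SELECT n.id, n.author_id, n.content
        FROM notice AS n
        JOIN subscription AS s ON s.subscribed_id = n.author_id
        WHERE s.subscriber_id = ?
        ORDER BY n.id DESC
        LIMIT ?
        """,
        (user_id, limit),
    )
    return rows.fetchall()


print(all_subscribed_messages(conn, 1))
```

The query itself is simple; whether it stays fast depends on how many subscriptions each user has and how many notices each read has to sort through, which the sketch does not measure.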
jschuman
· 1 year ago
Russ, you're right on the money. We had the very same discussion yesterday on NewGang Live. First one to actually do this as a true message bus wins.
heri
· 1 year ago
i don't get your comment about services like twitter needing a messaging architecture.
twitter uses this architecture right now and it still doesn't work.
tweets are not published automatically for instance, they use starling, a daemon whose task is to take all tweets and process them. The subscribed users then get the updates. exactly how you described it.
but this is flawed. why? because when you have thousands of followers, the message has to be copied thousands of times. imagine what happens when another user who has thousands of followers replies back. i'm sure even smtp would also choke on that -- it's not designed for users wanting to message thousands of addresses.
that's why i think identi.ca is not better or worse than twitter. but here's one thing: users can install it on their own servers, thus helping to distribute the load. which is a great architecture design.
note: i don't necessarily have the solution though.
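To make the copying cost concrete, here is a rough sketch of per-follower delivery. The `deliver` function, the follower counts, and the user names are invented for illustration; this is not how Twitter or Starling is actually implemented.

```python
from collections import defaultdict


def deliver(author, message, followers, inboxes):
    """Copy one message into every follower's inbox; return the copy count."""
    recipients = followers.get(author, [])
    for recipient in recipients:
        inboxes[recipient].append((author, message))
    return len(recipients)


# Assumed numbers: two users with a few thousand followers each.
followers = {
    "alice": [f"a{i}" for i in range(3000)],
    "bob": [f"b{i}" for i in range(5000)],
}
inboxes = defaultdict(list)

copies = deliver("alice", "hi", followers, inboxes)
copies += deliver("bob", "@alice hi back", followers, inboxes)
print(copies)  # one write per follower for each message
```

One message and one reply already produce 8000 inbox writes here, so the work grows with the follower count of whoever is posting.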
Evan Prodromou
· 1 year ago
One last thing, Russell: you may not be familiar with the XXX/FIXME commenting tradition. "XXX" means "I know this is bad, but it's good enough for now, so I'll come back and fix it later." And "FIXME" means "don't let this out the door". I don't think there are any FIXME comments there, but you can grep for XXX to find places in the code where I think optimizations are in order.
One thing in particular: I had problems with making joins work in DB_DataObject, so there are places where I have nested loops instead of much-more-efficient joins.
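As a rough illustration of the nested-loops-versus-join point, the sketch below runs both shapes against an in-memory SQLite database. It uses Python's `sqlite3` rather than PHP's DB_DataObject, and the `profile` and `notice` tables are hypothetical stand-ins, not the real schema.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE notice (id INTEGER PRIMARY KEY, profile_id INTEGER, content TEXT);
    INSERT INTO profile VALUES (1, 'evan'), (2, 'russ');
    INSERT INTO notice VALUES
        (1, 1, 'working on queues'),
        (2, 2, 'reading the code'),
        (3, 1, 'going to bed');
""")


def notices_nested(conn):
    """Nested loops: one query for notices, then one more query per notice."""
    result = []
    notices = conn.execute(
        "SELECT id, profile_id, content FROM notice ORDER BY id").fetchall()
    for notice_id, profile_id, content in notices:
        (nickname,) = conn.execute(
            "SELECT nickname FROM profile WHERE id = ?", (profile_id,)).fetchone()
        result.append((notice_id, nickname, content))
    return result


def notices_joined(conn):
    """Single query: the database matches notices to profiles itself."""
    return conn.execute(
        """
        SELECT n.id, p.nickname, n.content
        FROM notice AS n
        JOIN profile AS p ON p.id = n.profile_id
        ORDER BY n.id
        """).fetchall()


print(notices_nested(conn) == notices_joined(conn))
```

Both functions return the same rows; the nested version issues one extra query per notice, while the join issues a single query in total.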
Asbjørn Ulsberg
· 1 year ago
You're of course completely correct. I can't, in fact, see why any page on either Identi.ca or Twitter needs to be dynamically updated upon view. They can all be 100% static .html files, re-generated by the batch job you describe, every now and then. This static file regeneration could be made very effective and would be extremely easy to scale.
sandos
· 1 year ago
Queues do not replace the DB though, which is how it sounds here. You still need to store things somewhere, indexed, and databases are useful for that, although relational ones are probably not a good fit for this kind of data and "infinite" scaling. There you would probably need something like bigtable, hadoop, hypertable or whatever. I guess you could also partition a regular DB, but I am not convinced that is as easy, once you've got it all going.
andyroberts
· 1 year ago
I know that Friendfeed appears to be more about aggregation than microblogging but I'd be interested to know whether the architecture for Friendfeed suffers the same basic design flaw described in this post. Good to see Evan jumping in and trying to overcome all the obstacles by the way.
Dave Hodson
· 1 year ago
This really isn't that hard to scale large - as you point out with your mention about queues. If identi.ca implements an async message queue, they can scale quite large. The last project I worked on handled north of 1 billion msgs/month, so it can scale up quite a bit with not much effort.
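A minimal sketch of the async queue idea follows, using Python's in-process `queue.Queue` and a single worker thread as a stand-in for a real message queue. The `post_notice` and `start_worker` names are hypothetical, and a production setup would use a separate queue service and worker processes instead.

```python
import queue
import threading


def start_worker(jobs, handler):
    """Start a background thread that handles jobs until it sees None."""
    def run():
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: stop the worker
                    return
                handler(job)
            finally:
                jobs.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


delivered = []  # stands in for the slow delivery work (fan-out, SMS, XMPP, ...)
jobs = queue.Queue()
worker = start_worker(jobs, delivered.append)


def post_notice(text):
    """Request path: enqueue the notice and return without waiting for delivery."""
    jobs.put(text)
    return "accepted"


for text in ("first", "second", "third"):
    post_notice(text)

jobs.join()        # wait until the worker has drained the queue
jobs.put(None)     # ask the worker to stop
worker.join(timeout=5)
print(delivered)
```

The point of the design is that `post_notice` returns as soon as the job is queued, so the slow delivery work happens off the request path and can be spread across more workers as load grows.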