Entrepreneur, husband, Dad, and technology geek all contained within a single human being.

From lambda to kappa and dataflow paradigms.

I've spent some time coming up to speed on the state of modern data infrastructure lately, and this is an attempt to weld the notes I took along the way into a cohesive narrative. I'm quite certain there are a number of mistakes and omissions, and I'd love to hear what folks think could be expanded or fixed!

Lambda architecture with streaming and batch components.

Lambda architectures started coming into widespread awareness in 2013, thanks to work by Nathan Marz, and subsequently became a popular architecture. Their particular advantage was using real-time stream processing to calculate recent windows, and using batch processing to calculate final values for windows as they aged out. Batch is also used to backfill new or modified calculations.
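To make the split concrete, here is a minimal Python sketch of a lambda-style serving layer. It assumes hypothetical hourly windows of page-view events and in-memory counters standing in for real batch and streaming systems; the names `batch_view`, `speed_view` and `query` are illustrative only, not taken from any framework.

```python
from collections import Counter

# Hypothetical events: (window_start_hour, page) pairs, one per page view.

def batch_view(all_events, cutoff):
    """Batch layer: recompute final counts from scratch for windows that have aged out."""
    return Counter((w, p) for w, p in all_events if w < cutoff)

def speed_view(recent_events, cutoff):
    """Speed layer: incrementally count events in windows that are still open."""
    view = Counter()
    for w, p in recent_events:
        if w >= cutoff:
            view[(w, p)] += 1
    return view

def query(batch, speed, window, page, cutoff):
    """Serving layer: answer from the batch view for old windows, the speed view otherwise."""
    source = batch if window < cutoff else speed
    return source[(window, page)]  # Counter returns 0 for unseen keys

events = [(0, "home"), (0, "home"), (1, "about"), (2, "home")]
batch, speed = batch_view(events, cutoff=2), speed_view(events, cutoff=2)
print(query(batch, speed, 0, "home", cutoff=2))  # 2
print(query(batch, speed, 2, "home", cutoff=2))  # 1
```

Note that the counting logic appears twice, once per layer. In a real deployment the two copies would live in different frameworks, which is where the divergence described in the concerns that follow comes from.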

One of the principal technologies in this wave was Storm, first released in 2011, which was widely adopted as the real-time component of lambda architectures, often paired with Kafka for storing logs and the Hadoop ecosystem for batch processing.

As more companies adopted these patterns, three primary concerns started to emerge:

  1. dual implementations: the streaming and batch components tended to diverge significantly enough that they required logic to be implemented twice, in different paradigms.
  2. weak real-time correctness guarantees: not only was the batch computation necessary to handle backfills, it was also necessary to reach a high degree of data correctness, as the real-time components only supported at-most-once and at-least-once guarantees.
  3. operational toil: while the Hadoop ecosystem has had years to mature toward maintainability, these real-time components were less mature and tended to introduce significant operational toil; in particular, Storm's dependency on ZooKeeper (and, in some deployments, Mesos) frustrated some adopters.
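The correctness gap in point 2 is easy to see in miniature. The sketch below assumes each event carries a unique ID (a hypothetical `event_id` field) and contrasts a naive at-least-once consumer, which double-counts a redelivered event, with one that deduplicates by ID to get effectively-once results. It keeps the set of seen IDs in memory without bound, which a production system could not do.

```python
def count_naive(deliveries):
    """At-least-once consumer: applies every delivery, so a retried event is counted twice."""
    return sum(amount for _event_id, amount in deliveries)

def count_deduplicated(deliveries):
    """Skip any event ID already applied, so a redelivery has no further effect."""
    seen = set()
    total = 0
    for event_id, amount in deliveries:
        if event_id in seen:
            continue  # duplicate caused by a retry
        seen.add(event_id)
        total += amount
    return total

# "e2" is delivered twice, as can happen when an acknowledgement is lost and the message is retried.
deliveries = [("e1", 5), ("e2", 3), ("e2", 3)]
print(count_naive(deliveries))         # 11
print(count_deduplicated(deliveries))  # 8
```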

While lambda architectures played out in the industry's public eye, innovation was germinating behind closed doors, and two years later, in 2015, at least three interesting threads bore fruit:

  • Google published The Dataflow Model, which proposed a unified approach to streaming, micro-batch, and batch processing and, in particular, offered exactly-once processing guarantees.

  • Twitter operationalized the lambda model. They took two paths: building Heron, an operationally improved version of Storm that shared its correctness limitations, and investing in Summingbird, which allowed the same code to run on Storm (or Heron) and on Cascading (in particular, Scalding), a batch processing library that ran on Hadoop.

  • At approximately the same time, Jay Kreps' log evangelism was maturing into a Kafka implementation that would eventually offer exactly-once guarantees, the stream processing framework Samza, which offered exactly-once stream computation, and more generally the Kappa architecture (where all traffic goes through a centralized event bus rather than requiring publishers to understand the behavior and needs of their downstream consumers).
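One way to picture the Kappa idea is an append-only log that producers write to and consumers read from at their own offsets; backfilling then means replaying the log from the beginning with new logic, rather than running a separate batch job. The Python sketch below uses a hypothetical in-memory `EventLog` as a stand-in for a Kafka topic; it illustrates the pattern and is not how Kafka or Samza are actually implemented.

```python
class EventLog:
    """Append-only log standing in for a centralized event bus (hypothetical, in-memory)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset):
        """Return every record from `offset` onward; consumers track their own position."""
        return self._records[offset:]


class Consumer:
    """Builds derived state from the log; publishers never need to know it exists."""

    def __init__(self, log, fn):
        self.log, self.fn, self.offset, self.state = log, fn, 0, {}

    def poll(self):
        for record in self.log.read(self.offset):
            self.fn(self.state, record)
            self.offset += 1

    def reprocess(self, new_fn):
        """Kappa-style backfill: reset to offset 0 and replay the whole log with new logic."""
        self.fn, self.offset, self.state = new_fn, 0, {}
        self.poll()


def total(state, record):
    user, amount = record
    state[user] = state.get(user, 0) + amount

def largest(state, record):
    user, amount = record
    state[user] = max(state.get(user, 0), amount)

log = EventLog()
for record in [("alice", 3), ("bob", 2), ("alice", 4)]:
    log.append(record)

consumer = Consumer(log, total)
consumer.poll()
print(consumer.state)  # {'alice': 7, 'bob': 2}
consumer.reprocess(largest)
print(consumer.state)  # {'alice': 4, 'bob': 2}
```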

Showing Kappa architecture with all events going through a centralized event bus.

(This presentation looks at one adoption of a Kappa architecture.)

In the two years since, something of a data infrastructure renaissance has flourished into a dynamic ecosystem. The new generation of tools trends toward low-latency data pipelines, through both micro-batching (as seen in Spark Streaming) and native streaming (as seen in Flink Streaming), and toward new strategies for enforcing exactly-once event processing, which have allowed stream processing to reach levels of correctness that previously required batch processing.

While most lambda architecture technologies are seeing reduced adoption (e.g. Storm), some have managed to jump generations to become standard building blocks. In particular, Kafka, Hadoop YARN and HDFS remain entrenched. YARN and HDFS are especially interesting as technologies that originally entered the ecosystem in the even earlier MapReduce wave, and have now survived two generational shifts.

YARN appears to be the most fragile of the three: most frameworks offer standalone modes of operation, and it faces continued competition from more general schedulers like Mesos, although Mesos itself seems to be losing significant mindshare to Kubernetes (despite the two being meaningfully different tools that address different needs).

Of those technologies making the generational leap, Kafka in particular is expanding its feature set in a bid to differentiate itself and grow its mindshare. The clearest example is Kafka Streams, which is effectively an Apache Storm competitor with fewer dependencies, fewer concepts (it can be reasoned about as consuming a Kafka topic and outputting a compacted Kafka topic), and exactly-once guarantees. By design, it's not a direct competitor to the more complete streaming frameworks:

The gap we see Kafka Streams filling is less the analytics-focused domain these frameworks focus on and more building core applications and microservices that process real time data streams... building stream processing applications of this type requires addressing needs that are very different from the analytical or ETL domain of the typical MapReduce or Spark job. They need to go through the same processes that normal applications go through in terms of configuration, deployment, monitoring, etc. In short, they are more like microservices (overloaded word, I know) than MapReduce jobs. It’s just that this type of data streaming app processes asynchronous event streams from Kafka instead of HTTP requests.
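The "consume a Kafka topic, output a compacted Kafka topic" mental model mentioned above can be sketched in a few lines of Python. This assumes topics are plain lists of `(key, value)` pairs and uses hypothetical `running_counts` and `compact` helpers; it illustrates the semantics of log compaction (the latest value per key wins) rather than the Kafka Streams API.

```python
def running_counts(input_topic):
    """Consume (key, value) events and emit a changelog of (key, count_so_far) updates."""
    counts = {}
    changelog = []
    for key, _value in input_topic:
        counts[key] = counts.get(key, 0) + 1
        changelog.append((key, counts[key]))
    return changelog

def compact(log):
    """Keep only the most recent record for each key, in the order those records were written.

    A value of None is treated as a tombstone (delete marker). Kafka retains tombstones for a
    configurable period before removing them; this sketch drops them immediately.
    """
    last_offset = {}
    for offset, (key, _value) in enumerate(log):
        last_offset[key] = offset
    return [log[o] for o in sorted(last_offset.values()) if log[o][1] is not None]

changelog = running_counts([("a", "click"), ("b", "click"), ("a", "click")])
print(changelog)           # [('a', 1), ('b', 1), ('a', 2)]
print(compact(changelog))  # [('b', 1), ('a', 2)]
```

Reading the compacted output from the start yields the current table (one row per key), which is why a compacted topic can serve as the durable state of a stream processing application.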

Beyond Kafka Streams, Confluent is actively working to expand the Kafka ecosystem by plugging gaps in its tooling with efforts like Kafka Connect, which tries to address common Kafka rollout needs like schema management and abstracting away Kafka's low-level APIs.

Moving away from Kafka Streams and its intentionally narrow scope, the two analytics-focused frameworks that appear to be winning this generation's mindshare are Spark and Flink.

Both tools are seeing wide adoption, with an edge to Spark for batch/bounded processing and to Flink for streaming/unbounded processing (in particular, because Flink supports native streaming while Spark relies on micro-batching, which incurs a latency penalty that may rule out some use cases). More conceptually, Flink is an explicit inheritor of Google's Dataflow (now rebranded as Beam) model, whereas Spark's novelty comes from its resilient distributed datasets (RDDs).

Diagram of dataflow paradigm with watermarks and multiple windows.

At the same time, Google is making an interesting play to abstract away both Spark and Flink through Beam, a library for implementing dataflow-paradigm programs that run on a variety of runners (including Flink and Spark, but also Google Cloud's Cloud Dataflow product). There are some very interesting ideas in the dataflow approach, and I particularly appreciated these explanations of windowing and snapshots.
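To give a flavour of the windowing ideas, here is a minimal Python sketch of event-time tumbling windows driven by a watermark. It assumes a deliberately simple heuristic watermark (the largest timestamp seen so far minus a fixed `max_delay`), emits each window exactly once, and drops data that arrives after its window has been emitted. Beam's actual model supports much richer triggers and allowed lateness, and none of the names here are Beam APIs.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size, max_delay):
    """Count events per event-time tumbling window, emitting each window once the
    watermark passes its end.

    `events` holds event timestamps in arrival order, which may differ from event-time order.
    """
    open_windows = defaultdict(int)  # window start -> count so far
    fired = []  # (window_start, count), in the order windows were emitted
    late = []   # timestamps that arrived after their window had already been emitted
    watermark = float("-inf")
    for ts in events:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            late.append(ts)  # window already finalized; drop the event
            continue
        open_windows[start] += 1
        # Heuristic watermark: assume no event is more than max_delay behind the newest one.
        watermark = max(watermark, ts - max_delay)
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired.append((s, open_windows.pop(s)))
    # Bounded input has ended, so the watermark effectively jumps to +infinity.
    for s in sorted(open_windows):
        fired.append((s, open_windows[s]))
    return fired, late

fired, late = tumbling_window_counts([1, 4, 12, 8, 17, 25, 3, 31], window_size=10, max_delay=5)
print(fired)  # [(0, 3), (10, 2), (20, 1), (30, 1)]
print(late)   # [3]
```

The out-of-order event at timestamp 8 still lands in its window because the watermark has not yet passed 10, while the event at timestamp 3 arrives after window 0 was emitted and is reported as late.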

Overall, it's quite an interesting time to be paying attention to data infrastructure, and I'm quite excited to see how things pan out.

4 minutes ago
Waterloo, Canada

Simple visual processing exercise is the first intervention to limit dementia

Exercises that help you to quickly pick out details seem to have the biggest effect on dementia. (credit: Flickr user City Lights)

Dementia strikes many people as they age, and there's currently not much we can do about it. It would be nice to think that there could be a fix to stave it off, like a computer game or something that could do more than help you improve at that computer game. Well now, for the first time, it seems like there may be.

The Advanced Cognitive Training for Independent and Vital Elderly (ACTIVE) study was a randomized controlled trial in which thousands of healthy seniors got different kinds of cognitive training and had their cognition monitored over ten years. Importantly, the trial was registered at its outset at ClinicalTrials.gov, so even if all of the results were negative (and therefore not likely to be published in an academic journal) they would still be on record and accessible.

After five years, all of the results were in fact negative. But after ten years, one of the interventions reduced dementia risk by about 30 percent.

10 hours ago

We're All Innocently Out of Touch

Seven billion people on this planet share a flaw: They’re out of touch with almost everyone else.

I’m out of touch. You’re out of touch. Neither of us fully understands why so many other people think differently than we do. Look, we try hard. We’re well meaning and open-minded. But as Mark Twain said, “If you hold a cat by its tail you will learn something you can’t learn any other way.” The most important stuff can’t be learned vicariously. You have to experience it. And all of us have wildly different experiences.

We’re just now realizing how wide this chasm of experiences is. The coolest part of the last 10 years is the connecting of different minds through social media. But the more views we’re exposed to, the harder it is to stomach that so many different views exist. Realizing how many different views exist forces you into one of two spots: Arguing with others whose views you think are wrong, or realizing how out of touch you are with people whose experiences have led them to different views. Both are hard to deal with.

This applies to investing.

Your teens and 20s are an important time. You start learning how the economy and the stock market work, and what they’re capable of. Money, like politics, is emotional. And emotional fields tend to be rooted in views formed at a young age.

Now, scan different countries and generations, and the range of experiences people had in their youth is a mile wide:

An American born in 1970 saw stocks rise more than eightfold in their teens and 20s. An American born in 1950 saw stocks go nowhere in their teens and 20s. In Japan, the difference between back-to-back generations was losing 100% of your money vs. making 11.5x on your money.

Do you think these groups went through the rest of their lives thinking the stock market was capable of the same thing? Or posed the same risks? Or equally capable at securing a comfortable retirement?

Of course not.

Everyone knows the anecdotes of Great Depression babies never trusting the market again. But there’s hard evidence: A team of economists once crunched generations of data on how people invest. They found wildly different desires to accept investing risk across generations, and those swings coincided with the booms and busts that specific generations experienced, particularly in their youth.

One quote from the study stuck out to me: “Current [investment] beliefs depend on the realizations experienced in the past.”

That’s powerful. Think of the arguments we deal with in investing – over valuation, over expected returns, over moats, over bubbles. Two people with the same education and same data can think bitcoin is either the next tulip or the next internet. The whole reason markets work is that these gaps in opinion exist. But why do they exist, if we all have roughly the same data? Part of it is because we’ve all had different experiences, and current beliefs depend on past experiences. Which means we’re all pretty much out of touch with one another.

Frederick Lewis Allen, in his book on the 1930s, wrote that the Great Depression “marked millions of Americans – inwardly – for the rest of their lives.” But there was a range of experiences. Twenty-five years later, as he was running for president, John F. Kennedy was asked by a reporter what he remembered from the depression, and answered:

I have no first-hand knowledge of the depression. My family had one of the great fortunes of the world and it was worth more than ever then. We had bigger houses, more servants, we traveled more. About the only thing that I saw directly was when my father hired some extra gardeners just to give them a job so they could eat. I really did not learn about the depression until I read about it at Harvard.

This was a major point in the 1960 election. How, people thought, could someone with no understanding of the biggest economic story of the last generation be put in charge of the economy? It was, by my reading, overcome only by JFK’s experience in World War II. That was the other most important experience of the previous generation, and something his primary opponent, Hubert Humphrey, didn’t have.

The big point is that no amount of studying or listening lets you fully understand what it was like to experience these events. I read a lot of military history, but I will never comprehend what it’s like to be in combat. You can recreate stories, but you can’t recreate fear, adrenaline, and genuine uncertainty. So everyone who has been in combat will always have a different, more nuanced view of war than I do, no matter how hard I try to put myself in their shoes. The same is true for the business, career, and investment history that all of us study.

Four years ago the New York Times did a story on the working conditions at Foxconn, the massive Chinese electronics manufacturer. The conditions are often atrocious. Readers were rightly upset, and many demanded changes. But one fascinating response to the story came from the nephew of a Chinese worker, who wrote:

My aunt worked several years in what Americans call “sweat shops.” It was hard work. Long hours, “small” wage, “poor” working conditions. Do you know what my aunt did before she worked in one of these factories? She was a prostitute.

The idea of working in a “sweat shop” compared to that old lifestyle is an improvement, in my opinion. I know that my aunt would rather be “exploited” by an evil capitalist boss for a couple of dollars than have her body be exploited by several men for pennies.

That is why I am upset by many Americans’ thinking. We do not have the same opportunities as the West. Our governmental infrastructure is different. The country is different. Yes, factory is hard labor. Could it be better? Yes, but only when you compare such to American jobs.

I don’t know what to make of this. Part of me wants to argue, fiercely. Part of me wants to understand. But mostly it’s a blunt-force example of how different experiences can lead to vastly different views. Views we would never consider at first pass. We’re all out of touch, through no other fault than the random luck of our experiences.

Jeremy Grantham is a well-known investor who has been, with few exceptions, bearish for much of his long career. Why? Many reasons, but this description, from the book Bull!, provides context:

Fresh out of Harvard Business School, Grantham played the go-go market at its peak. By 1970, he had lost all of his money. “I like to say I got wiped out before anyone else knew the bear market started,” Grantham recalled years later.

He lost all of his money during a bull market at the start of his professional career. I didn’t experience that. You probably didn’t, either. But Grantham did. That’s his history. And it likely shaped how he thought about risk for the rest of his life in a way that I’ll never comprehend. By the same token, he won’t understand my point of view.

Three years ago Michael Sam came out as the first openly gay NFL player. Some were thrilled. Others weren’t. Sportscaster Dale Hansen gave a monologue offering his take on, and support of, Sam: “I don’t understand his world, but I understand that he’s part of mine.”

It’s a totally different issue driven by different causes, but that’s a good framework for business and investing. I don’t have to understand your investing views. But I have to understand that you’re an equally influential part of the economy, and I’ll come closer to understanding your actions by asking what you’ve experienced to make you believe your views, rather than wondering why you don’t agree with mine.

Start with the assumption that everyone is innocently out of touch and you’ll be more likely to explore what’s going on through multiple points of view, instead of cramming what’s going on into the framework of your own experiences. It’s hard to do. It’s uncomfortable when you do. But it’s the only way to get closer to figuring out why people behave like they do. Which is the puzzle we’re all trying to solve.

2 days ago

A few notes on daily blogging

I’ve been wanting to write about the habit of daily blogging I’ve taken up since Oct. 1st this year, but I’ve avoided it, because 1) there are so many other interesting things to blog about 2) I’ve worried that blogging about blogging is too recursive and it will open up some sort of evil dimension or will just jinx the good mojo I got workin’. Still, I want to give it a (hopefully quick) spin.

The idea started out from my anxiety about “stock and flow.” As Robin Sloan wrote seven years ago: flow is the feed (“It’s the posts and the tweets. It’s the stream of daily and sub-daily updates that reminds people you exist.”) and stock is the durable stuff (“It’s the content you produce that’s as interesting in two months (or two years) as it is today. It’s what people discover via search. It’s what spreads slowly but surely, building fans over time.”)

In Show Your Work!, I wrote that it was always my M.O. to turn flow into stock: tweets become blog posts that become book chapters that become books. Trouble is, I had failed to heed Robin’s warning:

I feel like flow is ascendant these days, for obvious reasons—but we neglect stock at our own peril. I mean that both in terms of the health of an audience and, like, the health of a soul. Flow is a treadmill, and you can’t spend all of your time running on the treadmill. Well, you can. But then one day you’ll get off and look around and go: oh man. I’ve got nothing here.

Not only was I not turning flow into stock, I became acutely aware that due to the slow (or fast?) decay of social media and algorithm tinkering, the flow wasn’t even doing what it used to do (“remind people you exist”), and worse, my bits were just getting sucked into a void, an archive that I could download, maybe, but probably never go back and mine for any gold. Turning flow into stock isn’t all that hard, but it gets exponentially harder the more flow you have to go back and sift through.

Also, quite frankly, Twitter turned into a cesspool almost overnight. My friend Alan Jacobs was very vocal about his split from Twitter, and after reading his vibrant blog and new book, How To Think, I just decided to give daily blogging a go again, and this time, to do it on my URL, on my old-school WordPress blog, like the old days, when blogging actually meant something to me.

So how’d it go? Well, so far, even better than I expected.

1) I had no idea how badly my writing muscles had atrophied. After a couple of weeks, I could feel the sentences coming easier.

2) After struggling to come up with a new book idea for so long, I could start to see all the connections between posts, the patterns, the idea planets I keep orbiting. Because it’s all in one place, hyperlinked together, I can see my own obsessions in a way that is much harder elsewhere. (Also: I’m owning my turf. This place has been around for a dozen years. Longer than Twitter and Tumblr and Instagram, and if I had to bet, I’d guess it will outlast them.)

3) I had forgotten how wonderful blogging is as a mode of thinking. Blogging is, for me, more about discovering what I have to say, and tweeting more about having a thought, then saying it the right way. It’s also great to be able to go as long or as short as you want to go.

4) Maybe most surprising is that my posts have gotten, in my opinion, much deeper and more interesting. I used to scramble on Thursdays, trying to come up with a good blog post so I could post it at the top of Friday’s newsletter. Often I would cop out, write something quick and pat, and move on. Once I started daily blogging, not only did I have more to link to, it’s actually better stuff: some weeks I have a tough time deciding which post gets top billing in my list of 10. (I hope you’ll subscribe, btw, if you haven’t already.)

There’s a story about perfectionism in David Bayles and Ted Orland’s excellent book, Art & Fear:

The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality. His procedure was simple: on the final day of class he would bring in his bathroom scales and weigh the work of the “quantity” group: fifty pound of pots rated an “A”, forty pounds a “B”, and so on. Those being graded on “quality”, however, needed to produce only one pot — albeit a perfect one — to get an “A”. Well, came grading time and a curious fact emerged: the works of highest quality were all produced by the group being graded for quantity. It seems that while the “quantity” group was busily churning out piles of work – and learning from their mistakes — the “quality” group had sat theorizing about perfection, and in the end had little more to show for their efforts than grandiose theories and a pile of dead clay.

With blogging, I’m not so sure it’s about quantity as much as it’s about frequency: for me, there’s something kind of magical about posting once a day. Good things happen. Something small every day leads to something big. (Seth Godin has championed daily blogging for years—he just passed his 7000th post.)

5) Maybe I’m weird, but it just feels good. It feels good to reclaim my turf. It feels good to have a spot to think out loud in public where people aren’t spitting and shitting all over the place.

Anyways, I hope I can keep it up for as long as possible. Thanks for reading.

2 days ago
I hadn't really noticed he'd moved up to daily blogging but definitely have noticed I've liked (and shared) more since he did

Look what you made me do! Here’s my first impression of Taylor Swift’s Reputation.

I am so far out of the demo, this feels maybe like an Abrictosaurus reviewing an opera, but for the six of you who have asked me if I’ve listened to the new Taylor Swift record, Reputation, (because I’m such a big dumb fan of 1989), here are my first impressions.

I just finished the first playthrough, and I like it. I haven’t paid super close attention to the lyrics, because I’ve literally listened to it one time, so this is just based on the general musical tone and pacing of the album.

Thoughts on the rest of the record:

…Ready For It? kicks off with a punch that winds me up for the rest of the record. I’m generally not a big fan of that dubstep wuuuubbbbvvvvsszzzzzz sound, but it works for me in this context.

End Game is a collaboration with Ed Sheeran and Future that left me cold. It feels out of place on this album, especially after …Ready For It? got me so pumped up to hear what comes next. The vocals are so overproduced that the whole thing is a little much for me, but I suspect that the legion of Taylor fans who love Ed Sheeran will eat it up. (See above about how I’m not in the demo for the album.)

I Did Something Bad is glorious, lyrically and musically. I love that Taylor Swift is just dropping a huge DEAL WITH IT to everyone. This is probably my favorite song on the album.

Don’t Blame Me feels like a Lorde song, which sort of made me go “Buh? Wha? Fluh? Huh?” because I listened to Melodrama right before I listened to Reputation.

Delicate didn’t do much for me.

Look What You Made Me Do didn’t floor me when it was a single, but I feel like it works so much better in the context of the album, which isn’t what I expect from a pop album that is usually designed to have a bunch of singles (notable exception is Tove Lo’s Queen of the Clouds, which is a pop concept album and damn near perfect. Also, her new record, Blue Lips, is great).

So It Goes… feels like a song that could have been on 1989, and I mean that in the very best way.

Gorgeous is another one that could have been on 1989, the emotional B-side to Blank Space. I expect it to come back around in summer.

Getaway Car feels like a song that didn’t quite make the cut for 1989. I wasn’t crazy about it.

King of My Heart has this particular beat that’s common in pop right now that isn’t my favorite thing, and the vocals are way over processed, but for some reason those two things come together to make this track the exception that proves the rule.

Dancing With Our Hands Tied feels sort of like if Imogen Heap collaborated with Everything But The Girl in like 2002. It’s lush in a way that I haven’t heard Taylor Swift before, and I really liked that.

I’m not crazy about the falsetto in Dress, but maybe that’ll change.

This is Why We Can’t Have Nice Things is a lot of fun, and feels like it would be right at home in a modern Broadway musical. (And honestly, I just don’t care who – if anyone – that song is about. Music critics just need to get over the tired trope that Taylor Swift writes songs about everyone she has dated or known or whatever. Maybe this song is about someone in particular, but why does that even matter? Maybe it’s about you, stupid music critic, you big dummy.)

New Year’s Day is a great album closer. The stripped down vocals, simple harmony, and solo piano are such a great counterpoint to the production of the rest of the album. I can feel the brief moment of darkness at the end of it, before the house lights come up, as the lights go out on the stage. I think this song is going to be in a lot of graduation videos this year.

So, overall, 4 out of 5. One track I just don’t like at all, two tracks that I can take or leave, and 12 songs I really liked. Reputation didn’t grab me on its first listen the way 1989 did, but I feel like I’m going to get into it more upon subsequent plays.

But not grabbing me right away and compelling me to restart the album right away doesn’t necessarily mean it’s not a great record; it just means that it wants me to do a little work to find my way into it. It’s like, The Bends grabbed me right away and I played it to death. OK Computer took me several listens to appreciate and love, and all these years later, I never play The Bends, and will put OK Computer on pretty much always.

Did I just compare Taylor Swift to Radiohead? You bet your face I did. Don’t @ me. I contain multitudes.

2 days ago
Pretty spot on from my own impressions. Also this last line is gold: "Did I just compare Taylor Swift to Radiohead? You bet your face I did. Don’t @ me. I contain multitudes."
1 day ago
Swift is impressing me as of late.

The Signal From Linux, That Real-Life Thing, The Eerie Possibilities Of The Future

We thought a general Big Robot update was overdue, so we’re going to give you an insight into what’s going on, and what the robot road map is for the future.

Those of you watching us from across the mists of social media will have doubtless picked up on a couple of low-key announcements. One is that we will again be working with Ian McQue, who provided us with the concept art inspirations for The Signal From Tölva. This project (or even projects!) is still in its infancy, but I can confirm that Ian will be contributing to a future video game project from BR.

We are, of course, still working on The Signal From Tölva, and have previously announced the free expansion, Ice Variant. No, it isn’t just snowy weather for the existing map, as someone suggested, it’s a completely new campaign, with new designs from Ian and new 3D art from Olly Skillman-Wilson and Jon Polti. In terms of what to expect, this is a prequel dive into events on Tölva, featuring a lost Information Broker. We think you’ll get a kick out of it. And, yes, it’s also going to have some snowy weather, too!

However, it hasn’t exactly been plain sailing for us in recent months. A number of real life events have stalled different members of the team, meaning that we’ve had to take some serious breaks here and there to deal with those issues. We can only apologise for that, but real life and real people, as ever, come first.

We’re picking up some speed again now, however, and the Linux port for TSFT is up on Steam and most of the other portals too. Dan and I will be keeping an eye on feedback for that, so let us know what you find. More updates will follow, with Ice Variant probably arriving just after Christmas. We’ve finished about 90% of the level building, but that last 10% still has a tonne of work associated with it! We’re enjoying building it, however, and we think that’s going to show in the final update.

Further out we’re beginning to look at future projects. We have a number of prototypes to choose from (at least two of which are already playable to some extent), and we’re going to be looking at them more closely when the time comes. Precisely what will come of these, and to what degree they will tie into our plans with Ian, are yet to be fully determined!

But there’s more!

Cambridge lecturer and award-winning author Robert Macfarlane has announced that he’s been talking with us about exploring the “eerier reaches” of the British landscape in a future project. This isn’t much more than a conversation at the moment, but that conversation is extremely interesting indeed, and we anticipate an exciting year or two ahead.

Big Robot grows stronger! Etc.

2 days ago
Very cool... collaboration between a favorite author and game developers for Sir You’re Being Hunted.