Steve Wilkes
Striim: The Rising Tide of Data, with Steve Wilkes, CTO and co-founder of Striim.
The topic is streaming data, and how you really need to make all data streaming in order to take advantage of this new fast world and create operational-speed applications, which is what everyone is trying to do. Striim (S-T-R-I-I-M) has a new way to solve the streaming data problem, one that everyone really should appreciate.
Where Did Striim Come From?
At GoldenGate, we had these various customer advisory councils. And the customers would often say it would be great if we could actually see the data while it’s moving and get value out of it. So when we built the Striim platform, we recognized that we wanted not only to move data from one place to another but also to do processing, transformation, enrichment, and visualization of that data as it’s moving. And that processing is important for a lot of different use cases. If you’re going from one database to another, which is a lot of GoldenGate’s business, then it’s like-to-like: the schemas look similar and you don’t need to do a lot of transformations.
But if you want to go from a database to Kafka or a message queue, or write data into big data, then you have to worry about additional things. First, you have to worry about the data formats. Maybe you want to put JSON on Kafka, or you want to put it as Avro in your big data solution. You also have to worry about whether there is enough information in this data to enable data applications to make decisions.
So if you’re doing change data capture and you’re doing it from a well-normalized database, most of what you’re going to be getting is IDs. You have updates to the customer order detail table, and what you see is order ID, customer ID, item ID, status ID. That doesn’t mean anything to a downstream application. So what you need to do in that case is some processing of that data, in memory, as it’s moving, before you land it onto Kafka or into your cloud solutions.
That enrichment requires loading reference data into memory and joining with it. That could be, for example, your customer information. It can be a large amount of reference data, potentially millions of records. But you need to have it in memory and be able to join it in a very efficient way so that you can maintain the speed of the streams. If you’re getting tens or hundreds of thousands of operations a second coming through, you can’t afford to go off to a database and ask for some reference information for every record. You have to join in memory, and that data has to be in memory.
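To make the enrichment step described above concrete, here is a minimal sketch in Python. It is not Striim's API or query language. The table contents, field names, and the `enrich` and `enrich_stream` helpers are all hypothetical, chosen only to show the idea of joining change events against reference data that is already held in memory and then emitting JSON.

```python
import json

# Hypothetical reference data, loaded into memory once (not fetched per event).
CUSTOMERS = {101: {"name": "Acme Corp", "tier": "gold"}}
ITEMS = {7: {"description": "Widget"}}
STATUSES = {2: "SHIPPED"}


def enrich(event):
    """Join one change event against in-memory lookups; no database round trip."""
    customer = CUSTOMERS.get(event["customer_id"], {})
    item = ITEMS.get(event["item_id"], {})
    return {
        **event,
        "customer_name": customer.get("name"),
        "customer_tier": customer.get("tier"),
        "item_description": item.get("description"),
        "status": STATUSES.get(event["status_id"]),
    }


def enrich_stream(events):
    """Lazily enrich each event as it flows past and serialize it as JSON."""
    for event in events:
        yield json.dumps(enrich(event), sort_keys=True)


# A change event from a normalized table carries mostly IDs.
change_events = [
    {"op": "UPDATE", "order_id": 5001, "customer_id": 101, "item_id": 7, "status_id": 2},
]
enriched = list(enrich_stream(change_events))
```

Each lookup is a dictionary access, so the cost per event stays constant no matter how many reference records are loaded. That is the property that lets the join keep up with the stream.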
When we built the Striim platform, we wanted to expand upon what we had done before in multiple ways. We wanted to be able to collect data from more than just databases, adding in message queues and files and sensors and all that kind of stuff. We wanted to be able to transform the data, process it, and analyze it while it’s moving. We now support lots of different targets, not just databases but big data, message queues, cloud technologies, all of those things. And we wanted to add new user capabilities to visualize the data: to do analytics, to build dashboards, and to see the data as it’s moving. We recognized when we started the company six years ago that this was ambitious, but we have done it, and that’s where we are with Striim today.
More Than Just Data Transfer
So when you showed me the pictures here, it really looks like, instead of just taking data from one place and stuffing it in another, or taking the data, transforming it, and putting it somewhere else, you really create a space in the middle, while the stream is running, to make an application happen. We talked about some use cases like correlation, and maybe analysis and analytics. We talked about some eventing you could do in the middle. You could do forking.
Some machine learning came up. But it really now becomes a situation where someone’s data is in motion and, while it’s in motion, we want to apply the application to it. Not necessarily even at the endpoint, which is what most people think of with something like streaming through Kafka: take it here, put it out there, put it on a message queue, get it there, and then do something with it down there. But you’re seeing this shift to: let’s actually build the value of what we are doing while the data is moving.
Yes, you can certainly move the data from one place, put it somewhere, store it, and analyze it afterwards. But the real value is in getting real-time insights, and that’s where a lot of customers are heading. They may not all be there yet. They might be solving the first problem, which is how to get to the data in a streaming fashion and move it. But those who are working on getting real-time insights recognize that the only way of doing that is by doing it in memory, at the point the data is moving.
And that’s why you can take data sources in the platform and you don’t have to put them into just one place. The same data source can be used to deliver into multiple targets and can also be used in analytics. So you can have multiple data flows, multiple pipelines, all coming off the same streaming data, combining different streaming data together, running all these complex queries and analytics, and visualizing it all in the same platform, without having to piece together lots of different bits of software to achieve a solution.
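The idea of several pipelines coming off the same streaming data can be sketched as a simple publish and subscribe pattern. This is only an illustration under assumed names: the `Stream` class and the three stand-in targets below are hypothetical and do not represent how the Striim platform is implemented.

```python
from collections import Counter


class Stream:
    """A minimal in-process stream: every subscriber sees every event."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


orders = Stream()

warehouse_rows = []        # stand-in for a database or cloud target
queue_messages = []        # stand-in for a message-queue target
status_counts = Counter()  # stand-in for a live analytics view

# Three independent pipelines reading the same source.
orders.subscribe(warehouse_rows.append)
orders.subscribe(lambda e: queue_messages.append(f"{e['order_id']}:{e['status']}"))
orders.subscribe(lambda e: status_counts.update([e["status"]]))

for event in [
    {"order_id": 1, "status": "NEW"},
    {"order_id": 2, "status": "SHIPPED"},
    {"order_id": 3, "status": "NEW"},
]:
    orders.publish(event)
```

Because each target is just another subscriber, a new delivery target or a new analytic can be added without touching the source or the existing pipelines.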
The Role of the Graph Database
What goes on in the middle is SQL-based. So it makes it very accessible to someone who is used to analytics to filter and aggregate and group things on the streaming data as if it were the final data. The other thing I liked is that, as you’re delivering the same consistent data in a real-time way, you can put it in a graph database for graph analysis. You can put it in Kudu for a drill-down query. You can put it into Cassandra. You could put it into just about anything else. And now, in real time, all those other places are updated with the same information. So any downstream applications in the whole business are looking at the same data.
Yes, and it’s actually really great for us that there are so many different technologies today and that people want to utilize the different types of analytics, because at its core, integration requires things to integrate. The more different types of sources there are, the more different types of targets there are, and the more you want to keep all of that up to date in real time, the more you need a complete streaming analytics platform that supports all your sources and targets and can stream the data.
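The SQL-style filtering, grouping, and aggregating over streaming data mentioned in this exchange can be illustrated with a small sketch. It is written in Python rather than in Striim's own query language, and the `tumbling_totals` function, the field names, and the fixed ten-second window are assumptions made purely for the example.

```python
from collections import defaultdict


def tumbling_totals(events, window_seconds):
    """Filter, group, and sum events per fixed (tumbling) time window.

    Assumes events arrive ordered by timestamp `ts`, in whole seconds.
    Loosely equivalent to: SELECT region, SUM(amount) WHERE amount > 0
    GROUP BY region, evaluated once per window.
    """
    current_window = None
    totals = defaultdict(float)
    for event in events:
        if event["amount"] <= 0:  # the WHERE clause
            continue
        window = event["ts"] // window_seconds
        if current_window is not None and window != current_window:
            # The previous window is complete: emit it and start a new one.
            yield current_window * window_seconds, dict(totals)
            totals.clear()
        current_window = window
        totals[event["region"]] += event["amount"]  # GROUP BY + SUM
    if totals:
        yield current_window * window_seconds, dict(totals)


events = [
    {"ts": 0, "region": "east", "amount": 10.0},
    {"ts": 3, "region": "west", "amount": 5.0},
    {"ts": 4, "region": "east", "amount": -1.0},  # filtered out
    {"ts": 12, "region": "east", "amount": 2.5},
]
results = list(tumbling_totals(events, 10))
```

The function is a generator, so each window's result is produced as soon as the window closes rather than after all the data has landed somewhere. The same result could then be delivered to several targets at once.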
Just for the more IT-oriented audience, what are the enterprise kinds of features that Striim brings to the table? Because you’re not just connecting something up with open source wizardry. You’ve got a bunch of good things there. And where can we find more information about this if we are interested?
For more information, visit our website, striim.com, that’s S-T-R-I-I-M dot com. There’s a lot of information there. We also have lots of videos you can watch on YouTube to get to those resources. As far as being an enterprise-grade platform, we are an inherently Java-based, clustered server platform that supports full failover and recovery scenarios, with enterprise-grade security built in, down to even individual data streams. It has the capability to integrate with everything you need.
Because it’s clustered, you can add more servers as you need them, so you can handle scale and scale out as necessary. And we can deploy our server platform almost anywhere. You can deploy on-premises, on bare metal machines, VMs, et cetera. We have a containerized version of the platform. We also have marketplace offerings in Google Cloud, Azure, and AWS, so you can spin up instances there. So we run anywhere a Java VM is available. And that can all form a cluster on which applications can move data easily from one place to another. And the core premise really is getting your data where you want it, when you want it.