Dealing With Massively Distributed Data Flows – DZone Database

July 16, 2018 - MorningStar

Dealing With Massively Distributed Data Flows

Take a look at how to deal with massively distributed data flows and how to avoid errors.

by Oren Eini, CEO RavenDB · Aug. 04, 18 · Database Zone

Imagine that you are the owner of Gary’s Shoes and that you want to get data from all of your many stores into a centralized location. You’ll use that data to make decisions, predict future trends, etc. Given that each store must operate independently, you have a server in each location that pushes its changes up to (and gets updates from) the HQ cluster. You can see an example of this kind of setup in this post.

This works quite well, but it does require the user to be aware of a potential issue. When you have a massively distributed data flow process set up, you also need to pay attention to the quiet in the noise. What do I mean by that?

One of our customers has RavenDB deployed to tens of thousands of locations worldwide. At any given time, at least some of those locations are going to be unavailable. In some locations, part of closing down for the day means literally flipping the master switch on electricity for the entire building. In others, someone might trip over the router, or there might be a local or regional network outage.

Part of the strategy for dealing with such a data set, coming from so many separate locations, is to monitor when we aren’t getting data. The fact that we have near real-time data from most of our locations is very powerful for the business. But you also need to see where you aren’t getting data from and set up proper alerts and monitoring for the missing data. From a business perspective, it is also advisable to surface that kind of detail all the way to the user. If you are going to be ordering inventory for the stores in a particular state, but the two major stores in the area have been down for two days because of a network issue, you want to be aware of that and know that you are working with out-of-date data.
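
As a rough illustration of monitoring for missing data, here is a minimal Python sketch. It assumes HQ keeps a last-received timestamp per location; the names `find_stale_locations` and `freshness_warning`, the store identifiers, and the 24-hour threshold are all hypothetical and are not part of RavenDB or of the setup described here.

```python
from datetime import datetime, timedelta, timezone


def find_stale_locations(last_seen, now, max_silence):
    """Return {location: silence} for locations quiet for longer than max_silence.

    last_seen maps a location name to the time HQ last received data from it,
    or to None if HQ has never heard from that location.
    """
    stale = {}
    for location, seen_at in last_seen.items():
        if seen_at is None:
            stale[location] = None  # never reported at all
        elif now - seen_at > max_silence:
            stale[location] = now - seen_at
    return stale


def freshness_warning(stale):
    """Build a one-line warning that can be shown next to a report."""
    if not stale:
        return "All locations are reporting."
    names = ", ".join(sorted(stale))
    return f"Data may be out of date for {len(stale)} location(s): {names}"


# Hypothetical sample data: one healthy store, one silent for two days,
# and one that has never reported.
now = datetime(2018, 8, 4, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "store-001": now - timedelta(minutes=5),
    "store-002": now - timedelta(days=2),
    "store-003": None,
}
stale = find_stale_locations(last_seen, now, max_silence=timedelta(hours=24))
print(freshness_warning(stale))
```

The point of returning the warning as text is that it can be placed directly on the report the business user reads, rather than only in an operations dashboard.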

To be honest, the issue isn’t so much two days of lag caused by a once-in-a-blue-moon type of error. In the scenario outlined above, as in pretty much every business scenario that I can think of, you won’t really see any impact on the decision making of the organization.

The killer is when you have some sort of problem that goes on for a while. A DNS update that was missed because of a bad DNS cache policy, for example. Now your updates to HQ go into the void on a consistent basis. On the other hand, everything else continues to function properly, both locally and for HQ. If this isn’t accounted for, it is easy to miss for a long period of time. I have seen such a case that was only discovered when the year-end numbers didn’t quite match up with what they were supposed to be. Because this was the second year in a row that this happened, it was investigated, and the investigation found that a network issue had indeed caused a very long-term topology failure. The failure was actually properly reported, in a log file that no one ever read.
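
One possible way to separate the harmless short outage from the long-running failure is to grade the length of the silence and escalate only the long ones to a person. The following Python sketch assumes two thresholds (24 hours and 7 days); these values, the `Severity` levels, and the function name `classify_silence` are illustrative assumptions, not something taken from the case described above.

```python
from datetime import timedelta
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WATCH = "watch"        # likely a nightly shutdown or a short outage
    ESCALATE = "escalate"  # silent long enough that a person should look


def classify_silence(
    silence,
    watch_after=timedelta(hours=24),   # assumed threshold
    escalate_after=timedelta(days=7),  # assumed threshold
):
    """Grade how long a location has been silent."""
    if silence >= escalate_after:
        return Severity.ESCALATE
    if silence >= watch_after:
        return Severity.WATCH
    return Severity.OK


# A two-day outage is only worth watching; a month of silence is escalated.
print(classify_silence(timedelta(days=2)))
print(classify_silence(timedelta(days=30)))
```

An escalation here would need to reach someone directly (a ticket, an email, a banner in the reporting tool), since a log entry alone is exactly what went unnoticed in the story above.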

Lesson learned: make sure that your data flow strategy accounts for such things and brings them to the users’ attention. Actually resolving the issue took a network configuration change that was done in minutes, and the entire dataset was synchronized within a few hours afterward. But finding out that there was a problem at all took effectively forever.

Topics: database, data flows, ravendb, clusters, dns caching, distributed

Published at DZone with permission of Oren Eini, CEO RavenDB, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
