Why open source is integral to solving the long tail of data integration problems

By Anand Sagar Rao Vala posted 12-11-2020 02:28

It is no exaggeration to say that the world runs on open-source software: the globe’s financial, manufacturing, commerce, and supply chain systems are directly or indirectly dependent on open-source code. Additionally, the majority of innovative new software is open source, with key projects driving cloud computing, artificial intelligence and machine learning, containers, and orchestration.

Data and analytics is a key sector for open-source adoption. Almost three-quarters (74%) of respondents to 451 Research's recent Voice of the Enterprise: Data & Analytics survey agreed that their organization uses, or has used, a data platform technology that is freely available via a no-cost license (such as open source, free-tier commercial, and free trials), while 93% of respondents to the Voice of the Enterprise: AI & Machine Learning survey said they are using, or plan to use, open-source tools for AI/ML projects.

Open-source software has also been a key component of the data management sector for many years. Key projects include Hadoop and Spark, numerous NoSQL databases, and Kubernetes; Pentaho and Talend for data integration; and the likes of TensorFlow and Python for data science.

There are multiple reasons why open source is particularly well suited to data management. One benefit is the availability of talent. Given the core importance of data management within organizations, the community aspect of open source provides a valuable pool of skills that – in many cases – are more transferable and adaptable than those tied to proprietary tools or platforms.

Open source also aligns very well with one of the most significant gauges for measuring the potential effectiveness of a data integration offering: the breadth of support it provides in terms of connectivity to a variety of applications and data sources.

With proprietary software development and licensing projects, a commercial vendor takes responsibility for developing integration connectors, driven by vendor partnerships and customer demand. While that means that the most popular combinations are prioritized, it also means that more unusual requirements can be deprioritized, or even ignored completely if the vendor cannot justify allocating resources to their development.

With open-source software, it is not unusual for a community of developers, users, and commercial entities to emerge around a successful open-source project. This means that rather than relying on a single vendor, users with an unusual integration requirement can either fix the problem themselves or work with a systems integrator or consulting partner to do so.

As such, open-source projects related to data integration can quickly evolve into a vibrant ecosystem of integration connectors that are developed and extended by a variety of community participants – including individual developers and users, enterprises, vendors, and systems integrators. It is this ecosystem, often combined with a commercial vendor that is eager to facilitate community engagement, that ensures that the long tail of niche or rarely used integration combinations is addressed.

Download and try the Pentaho Community Project or the 30-day trial of the enterprise-class version here, and solve your data integration problem.
