Why open source is integral to solving the long tail of data integration problems
Anand Sagar Rao Vala
It is no exaggeration to say that the world runs on open-source software: the globe's financial, manufacturing, commerce, and supply chain systems are directly or indirectly dependent on open source code. Additionally, the majority of innovative new software is open source, with key projects driving cloud computing, artificial intelligence and machine learning, containers, and orchestration.
Data and analytics is a key sector for open source adoption. Almost three-quarters (74%) of respondents to 451 Research's recent Voice of the Enterprise: Data & Analytics survey agreed that their organization uses, or has used, a data platform technology that is freely available via a no-cost license (such as open source, free-tier commercial, and free trials), while 93% of Voice of the Enterprise: AI & Machine Learning survey respondents said they are using or plan to use open-source tools for AI/ML projects.
Open-source software has also been a key component of the data management sector for many years. Key projects include Hadoop and Spark, numerous NoSQL databases, and Kubernetes, as well as Pentaho and Talend for data integration and the likes of TensorFlow and Python for data science.
There are multiple reasons why open source is particularly well suited to data management. One benefit is the availability of talent: given the core importance of data management within organizations, the community aspect of open source provides a valuable pool of skills that are, in many cases, more transferable and adaptable than those tied to proprietary tools or platforms.
Open source also aligns very well with one of the most significant gauges for measuring the potential effectiveness of a data integration offering: the breadth of support it provides in terms of connectivity to a variety of applications and data sources.
With proprietary software, a commercial vendor takes sole responsibility for developing integration connectors, prioritizing them according to vendor partnerships and customer demand. While that means the most popular combinations are addressed first, it also means that more unusual requirements can be deprioritized, or ignored completely, if the vendor cannot justify allocating resources to their development.
With open-source software, it is not unusual for a community of developers, users, and commercial entities to emerge around a successful project. Rather than relying on a single vendor, users with an unusual integration requirement can either fix the problem themselves or work with a systems integrator or consulting partner to do so.
As such, open-source projects related to data integration can quickly evolve into a vibrant ecosystem of integration connectors that are developed and extended by a variety of community participants – including individual developers and users, enterprises, vendors, and systems integrators. It is this ecosystem, often combined with a commercial vendor that is eager to facilitate community engagement, that ensures that the long tail of niche or rarely used integration combinations is addressed.
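The extensibility described above typically rests on a plugin-style architecture: the core engine dispatches to whichever connectors have been registered, so a community contributor can add support for a niche source without touching the core. The following is a minimal sketch of that pattern in Python; all names here are illustrative assumptions for this article, not Pentaho's actual plugin API.

```python
from typing import Callable, Dict, List

# Global registry mapping a source type to the function that reads it.
# (Hypothetical design for illustration only.)
CONNECTORS: Dict[str, Callable[[str], List[dict]]] = {}

def register_connector(source_type: str):
    """Decorator used by core and community code alike to add a connector."""
    def wrap(fn: Callable[[str], List[dict]]) -> Callable[[str], List[dict]]:
        CONNECTORS[source_type] = fn
        return fn
    return wrap

# A "core" connector shipped with the tool.
@register_connector("inline-csv")
def read_inline_csv(text: str) -> List[dict]:
    header, *rows = [line.split(",") for line in text.strip().splitlines()]
    return [dict(zip(header, row)) for row in rows]

# A "community" connector for a niche format, added later by a third party
# without any change to the core engine.
@register_connector("key-value")
def read_key_value(text: str) -> List[dict]:
    return [dict(pair.split("=", 1) for pair in line.split(";"))
            for line in text.strip().splitlines()]

def extract(source_type: str, payload: str) -> List[dict]:
    """Core engine entry point: dispatches to the registered connector."""
    return CONNECTORS[source_type](payload)

if __name__ == "__main__":
    print(extract("inline-csv", "id,name\n1,alice"))
    print(extract("key-value", "id=1;name=alice"))
```

Because the core only sees the registry, the long tail of formats is served by whoever needs them, which is exactly the dynamic the ecosystem argument relies on.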
Download and try the Pentaho Community Project or the 30-day trial of the enterprise-class version and solve your data integration problem.
© Hitachi Vantara LLC 2021. All Rights Reserved.