Big Data Systems Performance: The Little Shop of Horrors



The confusion around terms such as NoSQL, Big Data, Data Science, Spark, SQL, and Data Lakes often creates more fog than clarity. However, clarity about the underlying technologies is crucial to designing the best technical solution in any field that relies on huge amounts of data, including data science and machine learning, but also more traditional analytical systems such as data integration, data warehousing, reporting, and OLAP.

In my presentation, I will show that at least three dimensions are often cluttered and confused in discussions about data management: first, buzzwords (labels and terms like 'big data', 'AI', 'data lake'); second, data design patterns (principles and best practices like selection push-down, materialization, indexing); and third, software platforms (concrete implementations and frameworks like Python, DBMS, Spark, and NoSQL systems).

Only by keeping these three dimensions apart is it possible to create technically sound architectures in the field of big data analytics. I will show concrete examples that, through a simple redesign and a wise choice of tools and technologies, run up to 1000 times faster. This in turn yields tremendous savings in development time, hardware costs, and maintenance effort.