From zero to four: how to get the top machine learning badge on HackerRank

HackerRank is a company that focuses on competitive programming challenges. To drive competition among users, it awards medals and badges for achievements in domain competitions. This guide shows how to get the top 4-star badge in machine learning.

Read On →

Lookup table maintenance in Hive

Analysts often use lookup tables for data manipulation, and such tables are commonly maintained by hand. This post shows how to make that process easy yet flexible using Hadoop, Hive external tables, and Hive views.

Read On →

Beyond traditional join with Apache Spark

Examples of DataFrame joins with Spark, and why the output sometimes looks wrong. This article covers a variety of join types, including non-equi joins and slowly changing dimensions.

Read On →

Immutable heap implementation in Scala

The current heap implementation in Scala (PriorityQueue) is mutable, which means that after the heap is manipulated, its previous state is no longer accessible. This article describes how to build an immutable heap on top of Scala's Vector. First, we need to define an interface for the heap. It should have insert and extract methods. Since the data structure must be immutable, both methods should return the whole new heap in addition to the expected result.

Read On →
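The interface described in this teaser can be sketched roughly as follows. This is a minimal illustration rather than the article's actual code: the class name `Heap`, the min-heap ordering, the `Option` return type of `extract`, and the `Heap.empty` constructor are all assumptions made for the example.

```scala
// Illustrative sketch only: an immutable binary min-heap backed by Vector.
// The names Heap, insert, extract and Heap.empty are assumptions, not the
// article's API. Every operation returns a new heap; the old one is untouched.
final class Heap[A] private (private val items: Vector[A])(implicit ord: Ordering[A]) {

  def isEmpty: Boolean = items.isEmpty
  def size: Int = items.size

  /** Returns a new heap that also contains `x`. */
  def insert(x: A): Heap[A] = {
    // Move the new element up while it is smaller than its parent.
    @annotation.tailrec
    def siftUp(v: Vector[A], i: Int): Vector[A] = {
      val parent = (i - 1) / 2
      if (i > 0 && ord.lt(v(i), v(parent)))
        siftUp(v.updated(i, v(parent)).updated(parent, v(i)), parent)
      else v
    }
    new Heap(siftUp(items :+ x, items.size))
  }

  /** Returns the minimum together with the remaining heap, or None if empty. */
  def extract: Option[(A, Heap[A])] =
    if (items.isEmpty) None
    else {
      val min  = items.head
      val last = items.last
      val rest = items.init
      if (rest.isEmpty) Some((min, new Heap(rest)))
      else Some((min, new Heap(siftDown(rest.updated(0, last), 0))))
    }

  // Move the element at index `i` down while a child is smaller than it.
  @annotation.tailrec
  private def siftDown(v: Vector[A], i: Int): Vector[A] = {
    val left  = 2 * i + 1
    val right = left + 1
    var smallest = i
    if (left < v.size && ord.lt(v(left), v(smallest))) smallest = left
    if (right < v.size && ord.lt(v(right), v(smallest))) smallest = right
    if (smallest == i) v
    else siftDown(v.updated(i, v(smallest)).updated(smallest, v(i)), smallest)
  }
}

object Heap {
  /** An empty heap for any element type with an Ordering. */
  def empty[A](implicit ord: Ordering[A]): Heap[A] = new Heap(Vector.empty[A])
}
```

With this sketch, `Heap.empty[Int].insert(5).insert(2).extract` yields the minimum `2` paired with a new heap of size 1, while every intermediate heap remains usable in its earlier state.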

How to update your Maven

A how-to for installing the proper Maven version on your computer.

Read On →

Top 5 features released in Spark 1.6

Spark 1.6 was released on January 4, 2016. Compared to the previous version, it brings significant improvements. Let's cover the top 5 of them.

Read On →

Go versions, how to make updates easier

This article shows the advantages of using a version manager for Go: gvm.

Read On →

Get random lines from a file with bash

Data sampling with bash.

Read On →

Static site generator for a personal blog

Hugo as a personal site generator.

Read On →