Reshaping Dataframe using Pivot and Melt in Apache Spark and pandas

Data cleaning is one of the most important and tedious parts of the data science workflow; it is often mentioned but rarely discussed. Reflecting on my daily workflow, reshaping a DataFrame is a very common operation that I perform to get data into the desired format. Reshaping a DataFrame means transforming the table structure: removing or adding columns or rows, or aggregating certain rows and producing a new column that summarizes the result. In this post I won't cover everything about reshaping, but I will discuss the two most frequently used operations, pivot and melt. The solutions I discuss are in Spark, more specifically PySpark. I will also give a brief solution for pandas, but if you want a detailed explanation of the pandas solution, I recommend reading this post.
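To make the two operations concrete, here is a minimal plain-Python sketch of what melt (wide to long) and pivot (long to wide) do to a table. The sample data and the helper names `melt` and `pivot` are illustrative assumptions, not the pandas or PySpark APIs; in practice pandas offers `melt` and `DataFrame.pivot`, and Spark offers `groupBy(...).pivot(...)`.

```python
# Hypothetical sample data: one row per student, one column per subject (wide format).
wide = [
    {"name": "Ann", "math": 90, "physics": 85},
    {"name": "Bob", "math": 70, "physics": 65},
]


def melt(rows, id_col, var_name="variable", value_name="value"):
    """Wide -> long: emit one output row per (id, non-id column) pair."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                long_rows.append({id_col: row[id_col], var_name: col, value_name: val})
    return long_rows


def pivot(rows, index, columns, values):
    """Long -> wide: one output row per index value, one column per distinct `columns` value."""
    out = {}
    for row in rows:
        # Create the wide row on first sight of this index value, then fill in the column.
        out.setdefault(row[index], {index: row[index]})[row[columns]] = row[values]
    return list(out.values())


long_rows = melt(wide, id_col="name", var_name="subject", value_name="score")
round_trip = pivot(long_rows, index="name", columns="subject", values="score")
```

Melting the two-row table yields four rows (one per student and subject), and pivoting those rows back recovers the original wide table. This sketch assumes each (index, column) pair appears at most once; with duplicates an aggregation step would be needed.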

Data cleaning in python using pandas

Data cleaning is a very important part of any data science project, as data scientists spend 80% of their time on this step. Yet not much attention is given to the cleaning process, and not much research effort is put into creating any sort of framework for it. Recently I came across an amazing paper titled Tidy data by Hadley Wickham in the Journal of Statistical Software, in which he talks about common problems one might encounter in data cleaning and what tidy data looks like. I couldn't agree more with him. He has also created the R packages reshape and reshape2 for data cleaning. The problem was that the paper had very little to no code. I also found a code version of the paper, but it was in _R_, while most of my data cleaning work is done in pandas, so I had to translate all those R solutions to their pandas equivalents. In this post I will summarize the main ideas the author suggests in the paper and show how we can implement them in pandas.

Handling categorical features with python

As a data scientist, you will frequently encounter categorical variables in your dataset, such as location, car model, or gender. You cannot use them directly in your machine learning algorithms, as these algorithms only understand numbers. There are various techniques to convert categorical features to numerical features, but they are not the focus of this post; this post is about how to implement these techniques in Python. I will talk a little about the techniques themselves without going into too much depth, and will put more emphasis on the various ways you can implement them in Python.
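As a rough illustration of what such a conversion looks like, the sketch below applies one-hot encoding, one common technique, in plain Python. The excerpt does not say which techniques the post covers, so the choice of one-hot encoding, the helper name `one_hot`, and the car-model values are all assumptions made for this example.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 indicator vector per input value."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, encoded


# Hypothetical categorical column.
categories, encoded = one_hot(["sedan", "suv", "sedan", "hatchback"])
```

Each row of `encoded` has exactly one `1`, in the position of that row's category within `categories`.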

Visual text Analytics with python

Thanks to the growth of the internet and the accessibility of technology, incredible platforms such as social media and forums have been created for knowledge sharing. Exchanging ideas is no longer confined to a geographical area. As a result, a large volume and variety of content is generated in the form of images, video, text, etc. There is so much information that it is impossible to take it all in within a limited time, which is why text analytics has attracted the attention of people in fields such as linguistics and business. The goal of this post is to summarize a few visual text analytics techniques that could help you in the initial phase of text mining or help you create a new feature for a machine learning model. I will describe a few online and offline tools to help you get started. By offline tools, I mean Python-based software packages used to create visualizations and pre-process text. Online tools are web-browser-based applications into which you just paste the text or upload a text file to visualize the results.

Implementing K-NearestNeighbour algorithm from scratch in python

K-Nearest Neighbour is one of the simplest machine learning algorithms, and it can be very effective in some cases. The objective of this post is to implement it from scratch in Python. You need to know a fair bit of Python, plus a little NumPy for the faster version of the algorithm. Once we have implemented the algorithm, we will also see how to improve its performance. As there is no single invincible algorithm, we will look at its advantages and disadvantages, which will help us decide when to use it. Alright, let's get straight into it.
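For orientation, here is a minimal sketch of the idea in plain Python: compute the distance from a query point to every training point, keep the k closest, and take a majority vote over their labels. The function names, the Euclidean distance metric, and the toy data are assumptions made for this example and are not taken from the post's own implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label, nearest first.
    distances = sorted((euclidean(x, query), label) for x, label in zip(train_X, train_y))
    top_k_labels = [label for _, label in distances[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(train_X, train_y, (1.2, 1.4), k=3)
prediction_near_b = knn_predict(train_X, train_y, (6.4, 6.6), k=3)
```

This brute-force version scans the whole training set for every query, which is the main cost a vectorized NumPy version would reduce.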

When a chatbot meets Arduino

If you have worked on IoT projects that integrate with web apps or other types of systems, you have probably had to deal with 10 different languages and frameworks, and it's painful. It also gets chaotic to maintain a code base spread across different languages and repositories. In such troubled times, Node.js/JavaScript offers a ray of hope. In this post, we will create a chatbot that can communicate with an Arduino board and its connected sensor, and the entire project will be in Node.js, using just one language: JavaScript. Node.js is chosen because it excels at handling I/O operations.

Create a Twitter bot in 4 simple steps

Bots are usually created to automate repetitive tasks. For example, you probably visit your favourite websites regularly to read the latest posts; instead, you could create a Bot that notifies you with the URL of new content worth checking out. As an example of such a Bot, we will create one that aggregates memes from 9GAG and Reddit and posts them on the Bot's Twitter account.

What will this Bot do?

  1. Fetch images from Reddit and 9GAG and tweet them at a regular interval (say, every 15 minutes).
  2. Whenever we get a new follower, tweet them a friendly welcome message.

A simple DIY smart home IoT project

In the previous article we connected an Arduino UNO with an ESP8266-01 and tried various AT commands to create an access point, connect to a WiFi network, listen on a TCP port, and pass messages back and forth. That was just testing the waters. In this post we will step up the game: we will read sensor data from the Arduino, send it to a central server, and then display that sensor data on a mobile client.

For the purpose of demonstration, we will use a DHT11 temperature and humidity sensor and send its temperature and humidity data to the central server. For the server and mobile client, we will use Blynk, which has a server that collects sensor data from various devices, and a mobile client in which you can create a dashboard with a drag-and-drop interface and play around with visualizations of the sensor data.

Connecting ESP8266-01 with Arduino UNO via Software serial

If you have ever heard of the Internet of Things and been fascinated by its possibilities and promises, then I am sure you have heard of Arduino development boards. Arduino is a great piece of hardware for quick prototyping; once you are confident in the prototype, you can go a step further, build more of these devices, and put them to work on some sort of automation. Adding wireless capability to Arduino projects makes them even easier to install and use. In this post, I will connect the ESP8266 chip to an Arduino Uno. We will see how to create an access point and connect to wireless networks. The ESP8266 is a low-cost alternative to the Arduino WiFi shield.

Sensing current using Hall Effect Current sensor ACS712 and Arduino

Browsing through Arduino projects on the internet recently, I came across the ACS712 Hall effect current sensor, which can measure both AC and DC current. I found it pretty cool, so I ordered one. This post is about how we can measure the AC/DC current consumed by a load connected to the power supply. Some of the other applications of the sensor that come to mind are sensing whether a device is on or off, measuring instability in a power supply, motor control, and overcurrent fault protection.
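The arithmetic behind such a measurement can be sketched as follows, written in Python for readability rather than as Arduino code. The ACS712 outputs a voltage centred on half the supply voltage at zero current, and that voltage shifts in proportion to the current. The constants below are assumptions for this example: a 5 V supply and ADC reference, the Arduino's 10-bit ADC, and the 5 A sensor variant with a sensitivity of 185 mV/A (the excerpt does not say which variant is used). The function names are hypothetical.

```python
import math

# Assumed setup: 5 V supply/reference, 10-bit ADC, ACS712 5 A variant (185 mV/A).
V_REF = 5.0
ADC_MAX = 1023
SENSITIVITY_V_PER_A = 0.185
ZERO_CURRENT_V = V_REF / 2  # sensor output when no current flows


def adc_to_current(adc_reading):
    """Convert one raw ADC reading into an instantaneous current in amperes."""
    voltage = adc_reading / ADC_MAX * V_REF
    return (voltage - ZERO_CURRENT_V) / SENSITIVITY_V_PER_A


def rms_current(adc_readings):
    """Root-mean-square current over a batch of readings, as used for AC loads."""
    currents = [adc_to_current(reading) for reading in adc_readings]
    return math.sqrt(sum(i * i for i in currents) / len(currents))
```

For DC, a single averaged reading passed to `adc_to_current` is enough; for AC, the readings should span at least one full mains cycle before calling `rms_current`.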
