What Is Go Like for Data Science?

Go, or Golang, is syntactically similar to C but offers memory safety, garbage collection, and structural typing. Google designed it in 2007 to improve programming productivity in an era of multicore, networked machines and large codebases (Wikipedia).

Go is a statically typed language and is not as commonly used for Data Science as Python or R. But it is worth trying. So let’s figure out how you can do Data Science in Go, the language from Google.


Requirements

You will need to have Go installed on your machine.


Getting Started

To get started, create a new project: make a new file called main.go and enter the following starter code:

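package main

// Packages such as "fmt", "log", and "os" are imported later, once they are
// actually used: Go refuses to compile a file with unused imports.
func main() {
}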

And that’s all you need to get started. Run it with go run main.go to check that it compiles without errors.


Getting the Data

Most crash courses on Data Science in Python or R use the Iris dataset, and we will use it here as well.

Go’s standard library can open the file for us: the os package provides os.Open, which returns a file handle that we can later pass to a CSV reader. To open the CSV file, we enter the following code into our main function:

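package main

import (
	"log"
	"os"
)

func main() {
	// Open the Iris CSV file; the path is relative to where the program is run.
	iris, err := os.Open("data/iris.csv")
	if err != nil {
		log.Fatal(err)
	}
	// Close the file when main returns.
	defer iris.Close()
}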

Processing the Data

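If you’ve done any work with Python or R for data science, you are certainly familiar with the concept of a DataFrame. To use DataFrames in Go, we use the dataframe package from the gota library. The library was originally published under github.com/kniren/gota and now lives at github.com/go-gota/gota. Let’s install it:

go get github.com/go-gota/gota/dataframe

And now we can include it in our imports, using its full import path:

import (
	"log"
	"os"

	"github.com/go-gota/gota/dataframe"
)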

To actually convert the data to a DataFrame, use the following line of code:

df := dataframe.ReadCSV(iris)

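Now print df to the console. Go’s built-in print cannot handle a DataFrame, so use fmt.Println, which also requires "fmt" in the imports:

fmt.Println(df)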

This will output:

[150x5] DataFrame
sepal.length sepal.width petal.length petal.width variety
0: 5.100000 3.500000 1.400000 0.200000 Setosa
1: 4.900000 3.000000 1.400000 0.200000 Setosa
2: 4.700000 3.200000 1.300000 0.200000 Setosa
3: 4.600000 3.100000 1.500000 0.200000 Setosa
4: 5.000000 3.600000 1.400000 0.200000 Setosa
5: 5.400000 3.900000 1.700000 0.400000 Setosa
6: 4.600000 3.400000 1.400000 0.300000 Setosa
7: 5.000000 3.400000 1.500000 0.200000 Setosa
8: 4.400000 2.900000 1.400000 0.200000 Setosa
9: 4.900000 3.100000 1.500000 0.100000 Setosa
... ... ... ... ...
<float> <float> <float> <float> <string>
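
The last line of that output lists the type detected for each column. As a minimal sketch of what treating a column as floats involves without a DataFrame library, the example below converts one column of string records with strconv.ParseFloat. The helper name floatColumn and the two-column sample table are assumptions for illustration; this is not how gota works internally.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
)

// floatColumn converts the named column of a string table into float64 values.
// header and rows are assumed to come from a CSV parser such as encoding/csv.
func floatColumn(header []string, rows [][]string, name string) ([]float64, error) {
	// Find the position of the requested column.
	idx := -1
	for i, h := range header {
		if h == name {
			idx = i
			break
		}
	}
	if idx < 0 {
		return nil, fmt.Errorf("column %q not found", name)
	}

	values := make([]float64, 0, len(rows))
	for n, row := range rows {
		if idx >= len(row) {
			return nil, fmt.Errorf("row %d has no column %q", n, name)
		}
		v, err := strconv.ParseFloat(row[idx], 64)
		if err != nil {
			return nil, fmt.Errorf("column %q, row %d: %w", name, n, err)
		}
		values = append(values, v)
	}
	return values, nil
}

func main() {
	header := []string{"sepal.length", "variety"}
	rows := [][]string{{"5.1", "Setosa"}, {"4.9", "Setosa"}, {"4.7", "Setosa"}}

	lengths, err := floatColumn(header, rows, "sepal.length")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(lengths)
}
```

Asking for the variety column with this helper returns an error, since "Setosa" is not a number.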

Using the Data

Reading data from a CSV file is cool and all that, but now we really want to do something with this data frame, right?

Getting the Head of the Data

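To select rows from the top of the data frame, use the Subset method:

head := df.Subset([]int{0, 3})

Note that Subset takes a list of row indexes, not a range: []int{0, 3} selects the rows at index 0 and index 3. To get the first three rows, you would pass []int{0, 1, 2} instead.

In the console, printing head looks like this: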

[2x5] DataFrame
sepal.length sepal.width petal.length petal.width variety
0: 5.100000 3.500000 1.400000 0.200000 Setosa
1: 4.600000 3.100000 1.500000 0.200000 Setosa
<float> <float> <float> <float> <string>
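
To make the difference between a real head and an index list concrete, here is a small standard-library sketch. The function names head and pick are invented for this example and are not part of gota; the rows are the first four sepal.length values from the output earlier in this article.

```go
package main

import (
	"fmt"
	"log"
)

// head returns at most the first n rows.
func head(rows [][]string, n int) [][]string {
	if n < 0 {
		n = 0
	}
	if n > len(rows) {
		n = len(rows)
	}
	return rows[:n]
}

// pick returns the rows at the given indexes, in the order listed.
func pick(rows [][]string, indexes []int) ([][]string, error) {
	out := make([][]string, 0, len(indexes))
	for _, i := range indexes {
		if i < 0 || i >= len(rows) {
			return nil, fmt.Errorf("row index %d out of range", i)
		}
		out = append(out, rows[i])
	}
	return out, nil
}

func main() {
	rows := [][]string{
		{"5.1", "Setosa"},
		{"4.9", "Setosa"},
		{"4.7", "Setosa"},
		{"4.6", "Setosa"},
	}

	// The first three rows.
	fmt.Println(head(rows, 3))

	// Only the rows at index 0 and index 3.
	picked, err := pick(rows, []int{0, 3})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(picked)
}
```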

If Go can handle data manipulation decently, it might be worth exploring further, given the advantages it holds over Python outside of data science.

Let’s say you’re only interested in the virginica species and want to store those rows in a separate variable. Here’s how you would do that:

virginica := df.Filter(dataframe.F{
Colname: "variety",
Comparator: "==",
Comparando: "Virginica",
})
fmt.Println(virginica)

With the following output:

[50x5] DataFrame
sepal.length sepal.width petal.length petal.width variety
0: 6.300000 3.300000 6.000000 2.500000 Virginica
1: 5.800000 2.700000 5.100000 1.900000 Virginica
2: 7.100000 3.000000 5.900000 2.100000 Virginica
3: 6.300000 2.900000 5.600000 1.800000 Virginica
4: 6.500000 3.000000 5.800000 2.200000 Virginica
5: 7.600000 3.000000 6.600000 2.100000 Virginica
6: 4.900000 2.500000 4.500000 1.700000 Virginica
7: 7.300000 2.900000 6.300000 1.800000 Virginica
8: 6.700000 2.500000 5.800000 1.800000 Virginica
9: 7.200000 3.600000 6.100000 2.500000 Virginica
... ... ... ... ...
<float> <float> <float> <float> <string>
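
For comparison, the same kind of equality filter can be sketched over plain string records using only the standard library. The helper name filterEq is hypothetical, and the sample rows are taken from the outputs shown in this article.

```go
package main

import (
	"fmt"
	"log"
)

// filterEq keeps the rows whose value in column col equals want.
func filterEq(header []string, rows [][]string, col, want string) ([][]string, error) {
	// Find the position of the column to compare against.
	idx := -1
	for i, h := range header {
		if h == col {
			idx = i
			break
		}
	}
	if idx < 0 {
		return nil, fmt.Errorf("column %q not found", col)
	}

	var out [][]string
	for _, row := range rows {
		// Rows that are too short to have the column are skipped.
		if idx < len(row) && row[idx] == want {
			out = append(out, row)
		}
	}
	return out, nil
}

func main() {
	header := []string{"sepal.length", "variety"}
	rows := [][]string{
		{"5.1", "Setosa"},
		{"6.3", "Virginica"},
		{"5.8", "Virginica"},
	}

	virginica, err := filterEq(header, rows, "variety", "Virginica")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(virginica), "matching rows")
}
```

The comparison is case-sensitive, so "virginica" would not match the "Virginica" values in this dataset.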

Conclusion

To conclude: using Go for Data Science is possible, but I would recommend Python. With Python you get results much faster and with a lot less effort.

Python has a mature ecosystem built for Data Science and Go does not, so don’t use the wrong tool for the job.

If you’re just starting to learn Data Science, I would not recommend beginning with Go. The language adds difficulty of its own, since it is statically typed and Python is not.

bryan@dijkhuizenmedia.com
