Pacific-Design.com

    


Apache Cassandra - Distributed Database

Creating a Cassandra Table from a Dataset

// createCassandraTable comes from the connector's RDD/DataFrame implicits,
// cassandraFormat from the Spark SQL implicits
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._

val df = spark
  .read
  .cassandraFormat("words", "test")
  .load()

val renamed = df.withColumnRenamed("col1", "newcolumnname")
renamed.createCassandraTable(
  "test",
  "renamed",
  partitionKeyColumns = Some(Seq("user")),
  clusteringKeyColumns = Some(Seq("newcolumnname")))

renamed.write
  .cassandraFormat("renamed", "test")
  .save()

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md
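For reference, the createCassandraTable call above should generate a table roughly like the following CQL, with "user" as the partition key and "newcolumnname" as the clustering key (the column types shown here are placeholders; in practice they are inferred from the DataFrame schema):

```
-- Hypothetical DDL sketch; actual types come from the DataFrame schema
CREATE TABLE test.renamed (
    user text,
    newcolumnname text,
    PRIMARY KEY ((user), newcolumnname)
);
```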

Append Dataset to Cassandra Table

// Append the Dataset's rows to the existing table test.table3
profilesDS.write
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "test", "table" -> "table3"))
    .mode("append")
    .save()

// Alternative: profilesDS.write.mode(SaveMode.Append).insertInto("table3")

// Read the table back to verify the append
// (load() already returns a DataFrame, so no .toDF() is needed)
val df = spark.read.format("org.apache.spark.sql.cassandra")
    .options(Map("table" -> "table3", "keyspace" -> "test"))
    .load()
df.show(10, false)
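The snippet assumes profilesDS already exists. As a sketch only, it could be built from a case class whose fields match the columns of test.table3; the Profile name and its fields here are assumptions, not taken from the original page:

```scala
// Hypothetical schema matching test.table3 (column names and types are assumptions)
case class Profile(userId: String, name: String, age: Int)

// With a SparkSession in scope, the Dataset would then be built like:
// import spark.implicits._
// val profilesDS = spark.createDataset(Seq(Profile("u1", "Ada", 36)))
```

For an append to succeed, the Dataset's column names must line up with the target table's columns.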

Cleaning Bad Characters from a Data File

The tr command below deletes every byte that is not a tab (octal 11), newline (12), carriage return (15), or printable ASCII (40-176):

$ tr -cd '\11\12\15\40-\176' < input_file > output_file.txt
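The same filter can be applied in Scala before loading data into Cassandra. This is a sketch (not part of the original page) using a regex that keeps the same character set as the tr command:

```scala
// Keep only tab (\x09), LF (\x0A), CR (\x0D), and printable ASCII (\x20-\x7E),
// mirroring: tr -cd '\11\12\15\40-\176'
def stripBadChars(s: String): String =
  s.replaceAll("[^\\x09\\x0A\\x0D\\x20-\\x7E]", "")
```

For example, stripBadChars("foo\u0001bar") returns "foobar", while tabs and newlines pass through unchanged.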