Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Valued Contributor
  • 1567 Views
  • 1 replies
  • 0 kudos

Resolved! Are Delta tables able to support GDPR compliance?

I know that when deletes are made from a Delta table, the underlying files are not actually removed. For compliance reasons I need to be able to truly delete the records. How can I know which files need to be removed, and is there a way to remove them ot...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Here is a document explaining best practices for GDPR and CCPA compliance using Delta Lake. Specifically on cleaning up stale data - you can use the VACUUM function to remove files that are no longer referenced by a Delta table and are older than a s...
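
The delete-then-vacuum flow mentioned in this reply could be sketched roughly as follows. This is a minimal sketch, not the method from the linked document: the helper name `purge_records`, the table name, and the predicate are hypothetical, and `spark` is assumed to be an active SparkSession with Delta Lake available. The 168-hour value mirrors Delta Lake's default 7-day retention; a shorter window may be needed to meet a compliance deadline, but that trades away time travel for the vacuumed versions.

```python
def purge_records(spark, table_name, predicate, retain_hours=168):
    """Delete matching rows, then remove the data files that held them.

    `table_name` and `predicate` are illustrative placeholders; they are
    interpolated directly into SQL, so pass only trusted values.
    """
    # DELETE marks the affected files as removed in the transaction log,
    # but the files themselves stay in storage.
    spark.sql(f"DELETE FROM {table_name} WHERE {predicate}")
    # VACUUM physically deletes files that the table no longer references
    # and that are older than the retention threshold.
    spark.sql(f"VACUUM {table_name} RETAIN {retain_hours} HOURS")


# Hypothetical usage on a cluster:
# purge_records(spark, "customers", "customer_id = 42")
```

Until the retention window has passed and VACUUM has run, the deleted records remain readable through older table versions.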

  • 0 kudos
User16765131552
by Contributor III
  • 3160 Views
  • 0 replies
  • 0 kudos

DataFrame.write to a table containing always-generated and auto-generated columns is failing (SQL Server + sql-spark-connector)

Writing a DataFrame to a SQL Server table that contains an always-autogenerated column fails. I am using the Apache Spark Connector for SQL Server and Azure SQL. When the autogenerated fields are not included in the DataFrame, I encounter a "No key found" error. If auto-gene...

jose_gonzalez
by Databricks Employee
  • 3058 Views
  • 1 replies
  • 0 kudos

Resolved! Can I use DBConnect to connect to any DBR version?

I would like to know if I can use DBConnect to connect to any DBR version, or if only the supported versions will work.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Only the following Databricks Runtime versions are supported:
Databricks Runtime 8.1 ML, Databricks Runtime 8.1
Databricks Runtime 7.3 LTS ML, Databricks Runtime 7.3 LTS
Databricks Runtime 6.4 ML, Databricks Runtime 6.4
Databricks Runtime 5.5 LTS ML, Dat...

  • 0 kudos
MoJaMa
by Databricks Employee
  • 1259 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Currently there is no concept of "Cluster Owner" (see https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions). So you have to clone the cluster, which makes the person who cloned it the creator of the new cluster. The...

  • 0 kudos
jose_gonzalez
by Databricks Employee
  • 1472 Views
  • 1 replies
  • 0 kudos

Resolved! How can I connect my favorite IDE, like PyCharm, to a Databricks cluster?

I would like to know if there is a way to connect to a Databricks cluster using my IDE.

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Databricks Connect allows you to connect your favorite IDE to Databricks clusters. You can find more details on how to set it up and install all the libraries at https://docs.databricks.com/dev-tools/databricks-connect.html

  • 0 kudos
aladda
by Databricks Employee
  • 1437 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

There are two places to leverage GitHub for content management and version control in Databricks:
Repos for Git integration - Repos are folders whose contents are co-versioned together by syncing them to a remote Git repository. Repos can contain only Da...

  • 0 kudos
User15787040559
by Databricks Employee
  • 1594 Views
  • 1 replies
  • 0 kudos

How to translate Apache Pig FOREACH GENERATE statement to Spark?

If you have the following Apache Pig FOREACH GENERATE statement:
XBCUD_Y_TMP1 = FOREACH (FILTER XBCUD BY act_ind == 'Y') GENERATE cust_hash_key, CONCAT(brd_abbr_cd,ctry_cd) as brdCtry:chararray, updt_dt_hash_key;
the equivalent code in Apache Spark is:
XB...

Latest Reply
User15725630784
Databricks Employee
  • 0 kudos

The equivalent code in Apache Spark is:

from pyspark.sql.functions import col, concat

XBCUD_Y_TMP1_DF = (
    XBCUD_DF
    .filter(col("act_ind") == "Y")  # Pig: FILTER XBCUD BY act_ind == 'Y'
    .select(
        col("cust_hash_key"),
        concat(col("brd_abbr_cd"), col("ctry_cd")).alias("brdCtry"),
        col("updt_dt_hash_key"),
    )
)

  • 0 kudos
User15787040559
by Databricks Employee
  • 2337 Views
  • 1 replies
  • 0 kudos

What timezone is the “timestamp” value in the Databricks Usage log?

What timezone is the “timestamp” value in the Databricks Usage log? Is it UTC?
timestamp: 2020-12-01T00:59:59.000Z
Need to match this to the AWS Cost Explorer timezone for simplicity.
It's UTC. Please see timestamp under Audit Log Schema https://docs.databrick...

Latest Reply
User15725630784
Databricks Employee
  • 0 kudos

UTC
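
For anyone matching these values against another system, here is a small sketch of parsing the sample timestamp from the question as a timezone-aware UTC value. The function name `parse_usage_timestamp` and the UTC-8 offset are assumptions made for illustration, not part of the usage log schema.

```python
from datetime import datetime, timedelta, timezone


def parse_usage_timestamp(value: str) -> datetime:
    """Parse a usage-log timestamp such as 2020-12-01T00:59:59.000Z."""
    # The trailing "Z" denotes UTC, so attach the UTC timezone explicitly.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


ts = parse_usage_timestamp("2020-12-01T00:59:59.000Z")

# Illustrative only: the same instant seen from a fixed UTC-8 offset
# falls on the previous calendar day.
local = ts.astimezone(timezone(timedelta(hours=-8)))
print(ts.isoformat(), local.isoformat())
```

Keeping the value timezone-aware avoids grouping usage under the wrong calendar day when it is compared with data reported in a different timezone.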

  • 0 kudos
User16765131552
by Contributor III
  • 2633 Views
  • 1 replies
  • 1 kudos

Resolved! Create a new cluster in Databricks using databricks-cli

I'm trying to create a new cluster in Databricks on Azure using databricks-cli. I'm using the following command:
databricks clusters create --json '{ "cluster_name": "template2", "spark_version": "4.1.x-scala2.11" }'
And getting back this error: Error: ...

Latest Reply
User16765131552
Contributor III
  • 1 kudos

I found the right answer here. The correct format to run this command on Azure is:
databricks clusters create --json '{ "cluster_name": "my-cluster", "spark_version": "4.1.x-scala2.11", "node_type_id": "Standard_DS3_v2", "autoscale" : { "min_workers": ...

  • 1 kudos
User16830818524
by New Contributor II
  • 22499 Views
  • 1 replies
  • 0 kudos

Read Delta Table with Pandas

Is it possible to read a Delta table directly into a Pandas Dataframe?

Latest Reply
aladda
Databricks Employee
  • 0 kudos

You'd have to convert a Delta table to pyarrow and then use to_pandas. See https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html for details.
# Create a Pandas Dataframe by initially converting the Delta Lak...
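
The pyarrow route described in this reply might look roughly like the sketch below. It assumes the third-party `deltalake` package (the Python bindings covered in the linked blog post) together with pandas and pyarrow are installed; the helper names and the example path are hypothetical, and the snippet is not the truncated code from the reply.

```python
def delta_table_to_pandas(delta_table):
    """Convert an opened Delta table to a pandas DataFrame via pyarrow."""
    # to_pyarrow_table() loads the table's current snapshot into memory,
    # and to_pandas() converts that Arrow table into a DataFrame.
    return delta_table.to_pyarrow_table().to_pandas()


def read_delta_as_pandas(path):
    """Open the Delta table at `path` (hypothetical location) and convert it."""
    # Imported lazily so that this module loads even without deltalake.
    from deltalake import DeltaTable

    return delta_table_to_pandas(DeltaTable(path))


# Hypothetical usage:
# df = read_delta_as_pandas("/tmp/my-delta-table")
```

This path does not need a Spark cluster, but it pulls the whole table into local memory, so it fits small to medium tables best.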

  • 0 kudos
User15787040559
by Databricks Employee
  • 1741 Views
  • 1 replies
  • 1 kudos

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions?

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions? Are they required? Are EC2 tags used internally as well?

Latest Reply
User15787040559
Databricks Employee
  • 1 kudos

Yes, they're required. It’s how Databricks tracks and tags resources. The tags are used to identify the owner of clusters on the AWS side, and Databricks uses the tag information internally as well.

  • 1 kudos
MoJaMa
by Databricks Employee
  • 1777 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Yes. We can convert an existing workspace to PrivateLink on E2. So you can have one workspace that's on PL and one that's not. Please contact your Databricks Representative and we can help you make this change.

  • 0 kudos
HowardWong
by New Contributor II
  • 709 Views
  • 0 replies
  • 0 kudos

How do you handle Kafka offsets in a DR scenario?

If a Structured Streaming job with a checkpoint running in one region fails for whatever reason, DR kicks in to run the job in another region. What is the best way for the DR job to pick up the offsets and continue where the failed job stopped?

User16826994223
by Honored Contributor III
  • 1138 Views
  • 1 replies
  • 1 kudos

Does Databricks provide any isolation mechanisms when deployed in my account?

Does Databricks provide any isolation mechanisms when deployed in my account?

Latest Reply
Mooune_DBU
Valued Contributor
  • 1 kudos

If you're running on AWS: Databricks deploys Spark nodes in an Amazon Virtual Private Cloud (VPC) running in the customer’s own AWS account, giving the customer full control over their data and instances. VPCs enable customers to isolate the network ...

  • 1 kudos
