Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Lincoln_Bergeso (New Contributor II)
  • 11262 Views
  • 8 replies
  • 4 kudos

Resolved! How do I read the contents of a hidden file in a Spark job?

I'm trying to read a file from a Google Cloud Storage bucket. The filename starts with a period, so Spark assumes the file is hidden and won't let me read it. My code is similar to this: from pyspark.sql import SparkSession spark = SparkSession.build...

Latest Reply
Dan_Z
Databricks Employee
  • 4 kudos

I don't think there is an easy way to do this. You will also break very basic functionality (like being able to read Delta tables) if you were able to get around these constraints. I suggest you employ a rename job and then read.

7 More Replies
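Spark's file listing follows the Hadoop convention of skipping names that start with `.` or `_`, which is why the dot-file is ignored. The rename approach Dan_Z suggests might look like the minimal sketch below. It works on a local directory with `pathlib` purely for illustration; for a GCS bucket you would do the equivalent move with your own storage tooling (for example `dbutils.fs.mv` on Databricks), which is an assumption not shown here. The helpers `visible_name` and `rename_hidden_files` are hypothetical. Only rename your own data files: directories such as `_delta_log` depend on this hidden-name convention, which is the breakage Dan_Z warns about.

```python
from __future__ import annotations

from pathlib import Path


def visible_name(name: str) -> str:
    """Return a file name without the leading '.'/'_' that makes Spark treat it as hidden."""
    stripped = name.lstrip("._")
    # Hypothetical fallback for names made only of dots/underscores.
    return stripped if stripped else "renamed" + name


def rename_hidden_files(directory: Path) -> list[Path]:
    """Rename every hidden *file* (not directory) in `directory` and return the new paths."""
    renamed = []
    for path in sorted(directory.iterdir()):
        if not path.is_file() or not path.name.startswith((".", "_")):
            continue
        target = path.with_name(visible_name(path.name))
        if target.exists():
            # Refuse to overwrite: Path.rename silently replaces existing files on POSIX.
            raise FileExistsError(target)
        renamed.append(path.rename(target))
    return renamed
```

After the rename, the file can be read with the usual `spark.read` call on the new path.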
by patojo94 (New Contributor II)
  • 8583 Views
  • 0 replies
  • 0 kudos

Adding deduplication method to spark streaming

Hi everyone, I am having some trouble adding a deduplication step to a file stream that is already running. The code I am trying to add is this one: df = df.withWatermark("arrival_time", "20 minutes")\ .dropDuplicates(["event_id", "arrival_time"])...

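`withWatermark` plus `dropDuplicates` keeps each key in state only until the event-time watermark passes it; rows that arrive later than the watermark are dropped instead of being compared. The pure-Python model below is a simplified illustration of those semantics, not Spark's implementation. It assumes event times are plain numbers (seconds), advances the watermark once per micro-batch as `max event time seen - delay`, and treats rows strictly older than the watermark as late. The function name `dedup_with_watermark` is hypothetical. Because the key includes the watermark column (as in the post), old keys can be evicted from state.

```python
from typing import List, Sequence, Set, Tuple

Event = Tuple[str, float]  # (event_id, event_time in seconds)


def dedup_with_watermark(batches: Sequence[Sequence[Event]], delay: float) -> List[List[Event]]:
    """Simplified model of withWatermark(...).dropDuplicates([...]) across micro-batches."""
    state: Set[Event] = set()  # keys currently remembered for deduplication
    max_event_time = float("-inf")
    watermark = float("-inf")
    output: List[List[Event]] = []
    for batch in batches:
        emitted: List[Event] = []
        for event in batch:
            _, event_time = event
            if event_time < watermark:
                continue  # late row: dropped, never compared against state
            if event in state:
                continue  # duplicate of a key still held in state
            state.add(event)
            emitted.append(event)
        # Advance the watermark at the end of the batch, then evict keys it has passed.
        for _, event_time in batch:
            max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - delay
        state = {e for e in state if e[1] >= watermark}
        output.append(emitted)
    return output


result = dedup_with_watermark(
    [[("a", 100), ("a", 100), ("b", 105)], [("a", 100), ("c", 130)], [("a", 100)]],
    delay=20,
)
print(result)
```

In the third batch the repeated `a` is discarded as late data rather than as a duplicate; either way it is emitted only once.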
by Jin_Kim (New Contributor II)
  • 7906 Views
  • 2 replies
  • 4 kudos

Resolved! How to use multiple Spark Streaming jobs connecting to one job cluster

Hi, we have a scenario where we need to deploy 15 Spark Streaming applications on Databricks, reading from Kafka, to a single job cluster. We tried the following approach: 1. create job1 with a new job cluster (C1) 2. create job2 pointing to C1... 3. create job15...

Latest Reply
Jin_Kim
New Contributor II
  • 4 kudos

@Hubert Dudek, thanks a lot for responding. With a setup like this, if one task fails, it will not terminate the entire job, right? Since the job is running continuously as a streaming app, is it possible to add a new task to the job (while i...

1 More Replies
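In the Jobs API 2.1, several tasks of one job can share a cluster declared under `job_clusters` by referencing the same `job_cluster_key`. The sketch below only builds such a request body in Python; the notebook paths, Spark version, node type, and worker count are placeholders, the function name is hypothetical, and field details should be checked against the Jobs API reference for your workspace. It does not settle the follow-up question about how a failure in one task affects the others.

```python
import json
from typing import Dict, List


def build_shared_cluster_job(
    name: str, notebook_paths: List[str], cluster_key: str = "shared_streaming_cluster"
) -> Dict:
    """Build a Jobs API 2.1 create-job body in which every task runs on one job cluster."""
    return {
        "name": name,
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    # Placeholder values: choose a runtime and node type available to you.
                    "spark_version": "<spark-version>",
                    "node_type_id": "<node-type>",
                    "num_workers": 4,
                },
            }
        ],
        "tasks": [
            {
                "task_key": f"stream_{i}",
                "job_cluster_key": cluster_key,  # all tasks point at the same cluster
                "notebook_task": {"notebook_path": path},
                # No "depends_on", so the tasks are not chained to one another.
            }
            for i, path in enumerate(notebook_paths, start=1)
        ],
    }


payload = build_shared_cluster_job(
    "kafka-streams",
    [f"/Repos/example/streams/stream_{i}" for i in range(1, 16)],  # hypothetical paths
)
print(json.dumps(payload["tasks"][0], indent=2))
```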
by Carneiro (New Contributor II)
  • 8785 Views
  • 2 replies
  • 2 kudos

Resolved! Stuck in "Running Command ..."

Hi, since yesterday, for no known reason, some commands that used to run daily are stuck in a "Running command" state. Commands such as: dataframe.toPandas() dataframe.show(n=1) dataframe.describe() dataframe.write.format("csv").save(location) ge...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Luiz Carneiro, could you split your Spark actions into more cells (CMD paragraphs) and run them one at a time to check where the extra time is spent? Also, pandas only runs on your driver. Have you tried using the Python or Scala APIs instead? in ca...

1 More Replies
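The reply suggests running each action on its own to see which one stalls. A small timing wrapper like the hypothetical `timed` helper below is one way to do that inside a notebook; the lambdas in the usage lines stand in for actions such as `df.show(n=1)` or `df.toPandas()`.

```python
import time
from typing import Any, Callable, Dict


def timed(label: str, action: Callable[[], Any], timings: Dict[str, float]) -> Any:
    """Run `action`, record its wall-clock duration under `label`, and return its result."""
    start = time.perf_counter()
    try:
        return action()
    finally:
        # Recorded even if the action raises, so you see how long it ran before failing.
        timings[label] = time.perf_counter() - start


timings: Dict[str, float] = {}
# In a notebook these would be e.g. lambda: df.show(n=1) and lambda: df.toPandas().
timed("small action", lambda: sum(range(1_000)), timings)
timed("bigger action", lambda: sorted(range(100_000), reverse=True), timings)
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {seconds:.3f}s")
```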
by RengarLee (Contributor)
  • 5803 Views
  • 5 replies
  • 0 kudos

Resolved! How to improve Spark Streaming writer Input Rate and Processing rate?

Hi! I have many questions about Spark Streaming and Event Hubs. Can you help me? Q1: How can I improve the Spark Streaming writer's input rate and processing rate? I connect to Azure Event Hubs using Spark Streaming (Azure Databricks), but I found that if I use display, this ...

Latest Reply
RengarLee
Contributor
  • 0 kudos

My problem is that the setMaxEventsPerTrigger value is not equal to numInputRows.

4 More Replies
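`numInputRows` in a streaming query's progress report counts the rows read in that micro-batch. Comparing it with the configured `maxEventsPerTrigger` shows whether the limit is actually reached, and comparing `inputRowsPerSecond` with `processedRowsPerSecond` shows whether processing keeps up. The helper below is a hypothetical sketch that reads dictionaries shaped like `StreamingQuery.lastProgress`; the sample values are made up for illustration.

```python
from typing import Dict, List


def summarize_progress(progress: List[Dict], max_events_per_trigger: int) -> List[Dict]:
    """Compare each micro-batch's input with the configured rate limit and throughput."""
    summary = []
    for p in progress:
        rows = p.get("numInputRows", 0)
        in_rate = p.get("inputRowsPerSecond", 0.0)
        out_rate = p.get("processedRowsPerSecond", 0.0)
        summary.append({
            "batchId": p.get("batchId"),
            "hit_limit": rows >= max_events_per_trigger,
            "fill_ratio": rows / max_events_per_trigger if max_events_per_trigger else None,
            # Falling behind: rows arrive faster than the query processes them.
            "falling_behind": out_rate < in_rate,
        })
    return summary


# Illustrative values only; in a notebook pass e.g. query.recentProgress instead.
sample = [
    {"batchId": 1, "numInputRows": 5000, "inputRowsPerSecond": 900.0, "processedRowsPerSecond": 1200.0},
    {"batchId": 2, "numInputRows": 1250, "inputRowsPerSecond": 1500.0, "processedRowsPerSecond": 1100.0},
]
for row in summarize_progress(sample, max_events_per_trigger=5000):
    print(row)
```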
by shan_chandra (Databricks Employee)
  • 4960 Views
  • 1 replies
  • 3 kudos

Resolved! How to execute matplotlib animations in a Databricks notebook?

How to execute matplotlib animations in a Databricks notebook?

Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

Please refer to the example code below, and use displayHTML(ani.to_jshtml()) to execute matplotlib animations in a Databricks notebook: import matplotlib.pyplot as plt import matplotlib.animation import numpy as np t = np.linspace(0,2*np.pi) x = np.si...

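A self-contained variant of the approach in the reply, assuming matplotlib and NumPy are installed: build a `FuncAnimation`, render it to an HTML/JavaScript player with `to_jshtml()`, and pass the resulting string to Databricks' `displayHTML`. The sine-wave data is only an example, and the `Agg` backend line is an assumption so the snippet also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; frames are rendered to PNG
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 2 * np.pi)  # 50 points by default
x = np.sin(t)

fig, ax = plt.subplots()
ax.axis([0, 2 * np.pi, -1, 1])
(line,) = ax.plot([], [])


def update(i):
    # Draw the first i points of the curve in frame i.
    line.set_data(t[:i], x[:i])
    return (line,)


ani = animation.FuncAnimation(fig, update, frames=len(t), interval=50)
html = ani.to_jshtml()  # HTML + JavaScript player with embedded frames
plt.close(fig)

# In a Databricks notebook: displayHTML(html)
```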
by Orianh (Valued Contributor II)
  • 3933 Views
  • 2 replies
  • 2 kudos

Resolved! pyodbc read only connection.

Hey guys, is there a way to open a read-only pyodbc connection with the Simba Spark driver? At the moment, I'm able to execute queries such as select, delete, and insert into; basically every SQL statement works through pyodbc. I tried to open a pyodbc connection but ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

This readonly=True option works only with some drivers. Just create additional users that are granted read-only permissions.

1 More Replies
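A user that only has read permissions is the real safeguard, since the `readonly=True` argument of `pyodbc.connect` depends on driver support. If you also want a client-side guard, a hypothetical wrapper like the one below rejects anything that does not start with a read-only keyword. It is a heuristic, not a security boundary: it rejects leading comments and multi-statement strings, and it does not accept `WITH`, because in some SQL dialects a CTE can precede a data-modifying statement.

```python
import re

# Statement types treated as read-only; deliberately small and conservative.
READ_ONLY_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "DESC", "EXPLAIN"}


def is_read_only_statement(sql: str) -> bool:
    """Heuristic: a single statement whose first keyword is in READ_ONLY_KEYWORDS."""
    text = sql.strip().rstrip(";").strip()
    if not text or ";" in text:
        return False  # empty, or several statements in one string: reject
    # Allow leading parentheses, e.g. "(SELECT ...)"; anything else (comments, etc.) fails the match.
    match = re.match(r"\(*\s*([A-Za-z]+)", text)
    return bool(match) and match.group(1).upper() in READ_ONLY_KEYWORDS


def execute_read_only(cursor, sql: str, *params):
    """Run `sql` on a DB-API cursor (e.g. one from pyodbc) only if it looks read-only."""
    if not is_read_only_statement(sql):
        raise PermissionError(f"refusing non-read-only statement: {sql[:40]!r}")
    return cursor.execute(sql, *params)
```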
by sbahm (New Contributor III)
  • 4702 Views
  • 4 replies
  • 4 kudos

Resolved! Issue with adding GitLab credentials to Databricks for "Git integration"

Hi, we have configured our infrastructure with Terraform in Azure, and now we want to configure GitLab integration with Databricks to automate notebook and job deployment. I saw that this step is currently available only via the Databricks UI. Can you share som...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

Actually, the Repos API is already available: https://docs.databricks.com/dev-tools/api/latest/repos.html#operation/create-repo

3 More Replies
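A minimal sketch of the two REST calls involved, using only the standard library: registering a GitLab token through the Git Credentials API (`POST /api/2.0/git-credentials`) and creating a repo through the Repos API linked above (`POST /api/2.0/repos`). The workspace URL, tokens, GitLab project, and `/Repos/...` path are hypothetical placeholders, and the requests are only built here, not sent. Check the field names against the current API reference before relying on them.

```python
import json
import urllib.request


def _post(host: str, token: str, endpoint: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request to a Databricks REST endpoint."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def git_credential_request(host: str, token: str, gitlab_user: str, gitlab_pat: str):
    """Register a GitLab personal access token for the calling Databricks user."""
    return _post(host, token, "/api/2.0/git-credentials", {
        "git_provider": "gitLab",
        "git_username": gitlab_user,
        "personal_access_token": gitlab_pat,
    })


def create_repo_request(host: str, token: str, repo_url: str, workspace_path: str):
    """Clone a GitLab repository into the workspace under `workspace_path`."""
    return _post(host, token, "/api/2.0/repos", {
        "url": repo_url,
        "provider": "gitLab",
        "path": workspace_path,
    })


HOST = "https://example-workspace.cloud.databricks.com"  # hypothetical workspace URL
TOKEN = "<personal-access-token>"  # hypothetical Databricks token
cred_req = git_credential_request(HOST, TOKEN, "gitlab-user", "<gitlab-pat>")
repo_req = create_repo_request(
    HOST, TOKEN, "https://gitlab.com/example-group/example-project.git", "/Repos/ci/example-project"
)
# Sending is left to you, e.g. urllib.request.urlopen(cred_req), then urllib.request.urlopen(repo_req).
```

Since the infrastructure is already managed with Terraform, the Databricks Terraform provider's `databricks_git_credential` and `databricks_repo` resources may fit that setup better than raw REST calls.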
by alejandrofm (Valued Contributor)
  • 4548 Views
  • 4 replies
  • 1 kudos

Resolved! Can't see execution plan graph on all-purpose cluster

On a currently running all-purpose cluster, go into the Spark UI, then SQL, and into a task: you can see the details and SQL properties, but the visualization doesn't appear. That graph is very useful for debugging certain scenarios. It works fine on jobs. Any ...

Latest Reply
alejandrofm
Valued Contributor
  • 1 kudos

I'm on Chrome; sometimes it appears and sometimes it doesn't. I'll look into it more to give you something reproducible. Thanks!

3 More Replies
by Mr__E (Contributor II)
  • 2145 Views
  • 1 replies
  • 1 kudos

Resolved! SSO and cluster creation restriction

Accounts added after we turned on SSO don't allow me to restrict their cluster creation abilities. How can I undo this, so I can prevent business people from writing to ETLed data?

Latest Reply
Mr__E
Contributor II
  • 1 kudos

Never mind. It turns out someone was giving everyone admin privileges when they weren't supposed to, and I didn't notice.

by govind (New Contributor)
  • 2910 Views
  • 2 replies
  • 0 kudos

Write 160M rows with 300 columns into Delta Table using Databricks?

Hi, I am using Databricks to load data from one Delta table into another Delta table. I'm using the Simba Spark JDBC connector to pull data from a Delta table in my source instance and write it into a Delta table in my Databricks instance. The source has...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @govind@dqlabs.ai, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

1 More Replies
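One common way to pull a large table over JDBC is to split the read into several non-overlapping queries so Spark can run them in parallel; PySpark's `DataFrameReader.jdbc` accepts a `predicates` list with one WHERE clause per partition. The helper below is a hypothetical sketch that builds such predicates over a numeric column. It assumes the source table has one (the name `id` is a placeholder) and that you know its approximate minimum and maximum; it has not been tried against the Simba driver.

```python
from typing import List


def range_predicates(column: str, lower: int, upper: int, num_partitions: int) -> List[str]:
    """Split [lower, upper) on `column` into non-overlapping WHERE clauses covering all rows."""
    if num_partitions < 1 or upper <= lower:
        raise ValueError("need num_partitions >= 1 and upper > lower")
    if num_partitions == 1:
        return ["1 = 1"]
    stride = max(1, (upper - lower) // num_partitions)
    bounds = [lower + i * stride for i in range(1, num_partitions)]
    # First partition also catches NULLs and values below `lower`.
    preds = [f"{column} < {bounds[0]} OR {column} IS NULL"]
    for lo, hi in zip(bounds, bounds[1:]):
        preds.append(f"{column} >= {lo} AND {column} < {hi}")
    # Last partition is open-ended so values at or above `upper` are not lost.
    preds.append(f"{column} >= {bounds[-1]}")
    return preds


preds = range_predicates("id", 0, 100, 4)
for p in preds:
    print(p)
```

You would then call something like `spark.read.jdbc(url, "source_table", predicates=preds, properties=props)` with your own connection URL, table name, and properties.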
by Zii (New Contributor II)
  • 4378 Views
  • 0 replies
  • 1 kudos

Delta Live Tables Quality check for distinct Values

Hi all, I have been having trouble figuring out how to do a uniqueness check as a quality check. Below is an example. @dlt.expect("origin_not_dup", "origin is distinct from origin") def harmonized_data(): df=dlt.read("raw_data") for col in...

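An expectation is evaluated row by row, so `origin IS DISTINCT FROM origin` compares each row's `origin` with itself. That comparison is false for every row, including NULLs, because it is null-safe, so the check can never pass. Uniqueness needs an aggregate over all rows, for example counting rows per `origin` in a separate table or view and putting an expectation such as `count = 1` on that result. The plain-Python sketch below only illustrates the difference; `is_distinct_from` and `duplicate_counts` are hypothetical helpers, not DLT APIs, and the sample rows are made up.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def is_distinct_from(a: Any, b: Any) -> bool:
    """Null-safe inequality, modelling SQL `a IS DISTINCT FROM b`."""
    if a is None and b is None:
        return False
    if a is None or b is None:
        return True
    return a != b


def duplicate_counts(rows: Iterable[Dict[str, Any]], key: str) -> Dict[Any, int]:
    """Values of `key` occurring more than once, like groupBy(key).count() filtered to count > 1."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


rows = [{"origin": "A"}, {"origin": "B"}, {"origin": "A"}, {"origin": None}]  # made-up sample
# Row-level check from the post: each row compares origin with itself, so every row fails.
passing = [r for r in rows if is_distinct_from(r["origin"], r["origin"])]
print(passing)  # []
print(duplicate_counts(rows, "origin"))  # {'A': 2}
```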
by Vikram (New Contributor II)
  • 5116 Views
  • 4 replies
  • 4 kudos

Resolved! CVE-2022-0778

How can we update the OpenSSL version on the cluster to address this vulnerability? https://ubuntu.com/security/CVE-2022-0778 We tried this global init script to auto-update the OpenSSL version, but it does not seem to work because apt-utils is missing. apt...

Latest Reply
Atanu
Databricks Employee
  • 4 kudos

I can see the following in our internal communication. CVSSv3 score: 4.0 (Medium) AV:N/AC:H/PR:N/UI:N/S:C/C:N/I:N/A:L Reference: https://www.openssl.org/news/secadv/20220315.txt Severity: High The BN_mod_sqrt() function, which computes a modular square root, ...

3 More Replies
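The OpenSSL advisory referenced above lists 3.0.2, 1.1.1n, and 1.0.2zd as the first fixed releases for CVE-2022-0778. The hypothetical helper below compares an upstream version string against those. Treat the result only as a first-pass hint: Ubuntu backports security fixes without changing the upstream version (a patched package can still report 1.1.1f), so the package version on the Ubuntu CVE page is the authoritative check, and Python's `ssl` module reports the OpenSSL it was linked against, which may differ from the system library.

```python
import re
import ssl
from typing import Optional, Tuple

# First fixed releases per the OpenSSL security advisory of 15 March 2022.
FIXED_LETTERS = {(1, 0, 2): "zd", (1, 1, 1): "n"}


def _letter_key(letters: str) -> Tuple[int, str]:
    # OpenSSL patch letters run "", a..z, then za, zb, ...: sort by length, then alphabetically.
    return (len(letters), letters)


def fixed_for_cve_2022_0778(version_text: str) -> Optional[bool]:
    """True/False if an upstream OpenSSL version is known fixed/vulnerable, None if unknown."""
    if "libressl" in version_text.lower():
        return None  # different project and numbering
    match = re.search(r"(\d+)\.(\d+)\.(\d+)([a-z]*)", version_text)
    if not match:
        return None
    major, minor, fix = (int(g) for g in match.groups()[:3])
    letters = match.group(4)
    if major >= 3:
        return (major, minor, fix) >= (3, 0, 2)
    fixed = FIXED_LETTERS.get((major, minor, fix))
    if fixed is None:
        return None  # branch not covered here (e.g. end-of-life 1.1.0)
    return _letter_key(letters) >= _letter_key(fixed)


print(ssl.OPENSSL_VERSION, "->", fixed_for_cve_2022_0778(ssl.OPENSSL_VERSION))
```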