Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

labromb
by Contributor
  • 1685 Views
  • 0 replies
  • 0 kudos

Converting a text widget to a list

Hi, I've been working on some parallel notebook code, which I ported to Python from the example on the Databricks website and added some exception handling, and that works fine. What I would like to do is parameterise the input, but I am not succeeding as the fun...

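The question above, about parameterising parallel-notebook code from a text widget, usually comes down to splitting the widget's string value into a Python list. A minimal sketch, assuming a comma-delimited widget value and a hypothetical widget name `paths` (neither is stated in the post):

```python
def widget_to_list(raw, delimiter=","):
    """Split a widget's raw string value into a clean list of items."""
    return [item.strip() for item in raw.split(delimiter) if item.strip()]

# In a Databricks notebook you would first read the widget, e.g.:
# raw = dbutils.widgets.get("paths")  # hypothetical widget name
raw = "notebook_a, notebook_b, notebook_c"
targets = widget_to_list(raw)
print(targets)  # ['notebook_a', 'notebook_b', 'notebook_c']
```

Each entry of the resulting list can then be passed to the parallel-notebook runner as a separate argument.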
zach
by New Contributor III
  • 2389 Views
  • 5 replies
  • 1 kudos

Does Databricks have an equivalent of Google Cloud BigQuery's --dry_run to estimate costs before executing?

Databricks uses DBUs as a costing unit, whether running on top of AWS/Azure/GCP, and I want to know if Databricks has an equivalent of Google Cloud BigQuery's --dry_run for estimating costs? https://cloud.google.com/bigquery/docs/estimate-costs

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @zach welshman, We haven't heard from you since the last response from @Werner Stinckens, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Ot...

4 More Replies
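There is no direct --dry_run equivalent, because DBU consumption depends on cluster size and runtime rather than bytes scanned. A rough pre-run estimate can still be computed by hand from the cluster configuration; the rates below are made-up placeholders, not real prices:

```python
def estimate_cost_usd(num_workers, dbu_per_node_hour, hours, usd_per_dbu):
    """Back-of-envelope DBU cost: nodes * DBU rate per node-hour * hours * price."""
    nodes = num_workers + 1  # workers plus the driver node
    return nodes * dbu_per_node_hour * hours * usd_per_dbu

# Example: 4 workers + driver, 0.75 DBU/node-hour, 2-hour job, $0.30/DBU
print(estimate_cost_usd(4, 0.75, 2.0, 0.30))
```

Real per-node DBU rates and per-DBU prices vary by instance type, cloud, and SKU, so the pricing page for your workspace is the source of truth for the inputs.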
alonisser
by Contributor
  • 7927 Views
  • 14 replies
  • 7 kudos

Failing to install a library from dbfs mounted storage (adls2) with pass through credentials cluster

We've set up a premium workspace with a passthrough-credentials cluster; while it does work and can access my ADLS Gen2 storage, I can't make it install a library on the cluster from there, and keep getting "Library installation attempted on the driver no...

Latest Reply
alonisser
Contributor
  • 7 kudos

Sorry, I can't figure this out. The link you've added is irrelevant for passthrough credentials; if we add it, the cluster won't be passthrough. Is there a way to add this just for a specific folder, while keeping passthrough for the rest?

13 More Replies
Taha_Hussain
by Valued Contributor II
  • 1004 Views
  • 0 replies
  • 4 kudos

Databricks Office Hours

Databricks Office Hours: Register for Office Hours to participate in a live Q&A session with Databricks experts! Our next event is scheduled for June 22nd from 8:00 am - 9:00 am PT. This is your opportunity to connect directly with our experts to ask any...

sreedata
by New Contributor III
  • 2581 Views
  • 3 replies
  • 5 kudos

Resolved! Databricks --> Workflows --> Job Runs

In Databricks --> Workflows --> Job Runs we have a column "Run As". Where does this value come from? We are getting a user ID here but need to change it to a generic account. Any help would be appreciated. Thanks

Latest Reply
Anonymous
Not applicable
  • 5 kudos

I appreciate the information and advice you have shared.

2 More Replies
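On the "Run As" question: newer versions of the Jobs API let you set a run_as identity in the job settings, so runs are attributed to a service principal rather than the personal user who created the job. A sketch of the relevant fragment of an update payload; the job ID and service-principal application ID are placeholders, and the exact fields supported should be checked against the Jobs API reference for your workspace:

```python
import json

# Hypothetical job-settings fragment: run the job as a service principal
# instead of the user who created it.
payload = {
    "job_id": 12345,  # placeholder job ID
    "new_settings": {
        "run_as": {
            "service_principal_name": "00000000-0000-0000-0000-000000000000"
        }
    },
}
print(json.dumps(payload, indent=2))
```

The alternative, older route is to change the job owner via the job's Permissions dialog, which also changes the identity shown in the "Run As" column.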
wyzer
by Contributor II
  • 3500 Views
  • 4 replies
  • 2 kudos

Resolved! Unable to delete a DBFS folder

Hello everyone, I've created by mistake a DBFS folder named: ${env]. But when I run this command: dbutils.fs.rm("/mnt/${env]") it returns this error: java.net.URISyntaxException: Illegal character in path at index 12: /mnt/$%7Benv]. How can I delete it, please ...

Latest Reply
User16764241763
Honored Contributor
  • 2 kudos

Hello @Salah K. Can you try the below?
%sh rm -r /dbfs/mnt/$\{env\]

3 More Replies
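The %sh workaround above works because the shell sees the literal folder name instead of dbutils.fs parsing it as a URI. If you script the cleanup from Python, quoting the odd name explicitly avoids the same URISyntaxException; the path here mirrors the question and the subprocess call is left commented since it only makes sense on the cluster:

```python
import shlex
import subprocess

# The accidentally created folder name, taken as a literal string.
folder = "/dbfs/mnt/${env]"
cmd = "rm -r " + shlex.quote(folder)
print(cmd)  # rm -r '/dbfs/mnt/${env]'
# subprocess.run(cmd, shell=True, check=True)  # run only on the cluster
```

shlex.quote wraps the name in single quotes, so `$`, `{`, and `]` are passed through to rm unchanged.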
nadia
by New Contributor II
  • 1366 Views
  • 1 replies
  • 0 kudos

Resolved! Connecting Databricks to PostgreSQL

I use Databricks and I am trying to connect to PostgreSQL via the following code:
jdbcHostname = "xxxxxxx"
jdbcDatabase = "xxxxxxxxxxxx"
jdbcPort = "5432"
username = "xxxxxxx"
password = "xxxxxxxx"
jdbcUrl = "jdbc:postgresql://{0}:{1}/{2}".format(jdbcHostname, jd...

Latest Reply
Prabakar
Esteemed Contributor III
  • 0 kudos

Hi @Boumaza nadia, Please check the Ganglia metrics for the cluster. This could be a scalability issue where the cluster is overloaded. This can happen when a large partition does not fit into the given executor's memory. To fix this we recommend bump...

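Separately from the memory tuning above, the truncated snippet in the question can be completed along these lines. Hostnames and credentials are placeholders, `spark` is the notebook's session, and the cluster needs the PostgreSQL JDBC driver on its classpath:

```python
def postgres_url(host, port, database):
    """Build a JDBC URL of the form jdbc:postgresql://host:port/db."""
    return "jdbc:postgresql://{0}:{1}/{2}".format(host, port, database)

def read_postgres(spark, host, port, database, table, user, password):
    """Read one table through Spark's JDBC source."""
    return (spark.read.format("jdbc")
            .option("url", postgres_url(host, port, database))
            .option("dbtable", table)
            .option("user", user)
            .option("password", password)
            .option("driver", "org.postgresql.Driver")
            .load())

print(postgres_url("localhost", "5432", "mydb"))  # jdbc:postgresql://localhost:5432/mydb
```

Storing the username and password in a secret scope rather than in the notebook is the usual practice.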
shubhamb
by New Contributor III
  • 3387 Views
  • 4 replies
  • 6 kudos

Why does my Notebook fail if I try to load a function from another Notebook in Repos in Databricks

My function in func.py:
def lower_events(df):
    return df.withColumn("event", f.lower(f.col("event")))
My main notebook:
import pyspark.sql.functions as f
from pyspark.sql.functions import udf, col, lower
import sys
sys.path.append("..")
from folder.func...

Latest Reply
shubhamb
New Contributor III
  • 6 kudos

@Kaniz Fatma https://community.databricks.com/s/question/0D58Y00008ouo6xSAA/how-to-fetch-environmental-variables-saved-in-one-notebook-into-another-notebook-in-databricks-repos-and-notebooks Can you please look into this?

3 More Replies
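For the Repos import pattern in the question, the usual fix is to put an absolute repo path on sys.path before importing, rather than a relative ".." that depends on the notebook's working directory. A sketch under those assumptions; the module name mirrors the question, and the commented repo-root path is a placeholder:

```python
import os
import sys

# In a Databricks Repo the root can be named explicitly, e.g.:
# repo_root = "/Workspace/Repos/<user>/<repo>"  # placeholder path
repo_root = os.path.abspath("..")  # fallback: relative to the notebook

if repo_root not in sys.path:
    sys.path.append(repo_root)

# The import from the question then resolves against repo_root:
# from folder.func import lower_events  # depends on the repo layout
print(repo_root in sys.path)
```

If the module was already imported once with a stale path, `importlib.reload` (or detaching and reattaching the notebook) picks up the corrected location.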
mortenhaga
by Contributor
  • 2282 Views
  • 2 replies
  • 2 kudos

Resolved! Importing python function with spark.read.jdbc in to Repos

Hi all! Before we used Databricks Repos, we used the %run magic to run various utility Python functions from one notebook inside other notebooks, for example reading from a JDBC connection. We now plan to switch to Repos to utilize the fantastic CI/CD pos...

Latest Reply
mortenhaga
Contributor
  • 2 kudos

That's... odd. I was sure I had tried that, but now it works somehow. I guess it has to be that this time I did it with double quotation marks. Thanks anyway! Works like a charm.

1 More Replies
Marra
by New Contributor III
  • 4901 Views
  • 8 replies
  • 2 kudos

Read temporary views in SQL Analytics

I'm having issues trying to read temporary views in the SQL Analytics module. I've managed to create temporary views based on a query, but I don't know how to read from them. Just using the name of the view returns "Table or view not found".

Latest Reply
Marra
New Contributor III
  • 2 kudos

No, I'm actually having issues reading from the view in the same session that created it. Using the same view name I get a table or view not found.

7 More Replies
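Temporary views are scoped to the Spark session that creates them, which is why a lookup from a different session (or endpoint) fails. A sketch of the pattern that should work within one session, wrapped as a function so that `spark` is supplied by the notebook environment; the view name is illustrative:

```python
def roundtrip_temp_view(spark):
    """Create a temp view and read it back in the same Spark session."""
    df = spark.range(3)
    df.createOrReplaceTempView("my_temp_view")  # session-scoped name
    return spark.sql("SELECT COUNT(*) AS n FROM my_temp_view")

# In a different session, "my_temp_view" would not resolve; a global
# temporary view (global_temp.my_temp_view) or a real table is needed
# for anything that must be visible across sessions.
```

If the same name fails even within one session, the usual culprit is that the CREATE and the SELECT ran on different connections, each of which is its own session.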
AndriusVitkausk
by New Contributor III
  • 1347 Views
  • 2 replies
  • 1 kudos

Autoloader event vs directory ingestion

For a production workload containing around 15k gzip-compressed JSON files per hour, all in a YYYY/MM/DD/HH/id/timestamp.json.gz directory: what would be the better approach for ingesting this into a Delta table, in terms of not only the incremental load...

Latest Reply
AndriusVitkausk
New Contributor III
  • 1 kudos

@Kaniz Fatma So I've not found a fix for the small-file problem using Auto Loader; it seems to struggle really badly against large directories. I had a cluster running for 8h stuck on the "listing directory" part with no end; the cluster seemed completely idle to...

1 More Replies
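On event-based vs. directory-listing ingestion for that many small files: Auto Loader's file-notification mode avoids repeatedly listing the whole tree, which is exactly the "stuck on listing directory" symptom described above. A sketch of the reader options; the option names follow the cloudFiles source, the path is a placeholder, and the per-trigger limit is an illustrative choice:

```python
# Option map for a file-notification Auto Loader stream; in a notebook
# this feeds spark.readStream.format("cloudFiles").
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "true",   # event-driven, no full listing
    "cloudFiles.maxFilesPerTrigger": "1000", # bound each micro-batch
}

def build_reader(spark, path):
    """Assemble a cloudFiles streaming reader from the option map."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options.items():
        reader = reader.option(key, value)
    return reader.load(path)
```

Notification mode needs cloud-side setup (a queue/event grid subscription on the storage account), which Auto Loader can create given the right permissions.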
karthikeyanr
by New Contributor II
  • 4570 Views
  • 6 replies
  • 7 kudos

Unable to import .dbc files in Databricks for "Databricks Developer Foundation Capstone"

Hi, I am not able to import the .dbc file into the Databricks workspace for the "Databricks Developer Foundation Capstone". When I click import, an error message is displayed. Secondly, when I click the GitHub link in 'Download the capstone', error 404 is displ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 7 kudos

Hi @Karthikeyan.r3@cognizant.com R, We haven't heard from you on the last responses, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Otherwise...

5 More Replies
Constantine
by Contributor III
  • 2620 Views
  • 1 replies
  • 1 kudos

Resolved! Can we reuse checkpoints in Spark Streaming?

I am reading data from a Kafka topic, say topic_a. I have an application, app_one, which uses Spark Streaming to read data from topic_a. I have a checkpoint location, loc_a, to store the checkpoint. Now, app_one has read data till offset 90. Can I creat...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @John Constantine, It's not recommended to share the checkpoint between your queries. Every streaming query should have its own checkpoint. If you want to start at offset 90 in another query, you can define it when starting your job. You can ...

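The "start at offset 90" part of the answer maps to the Kafka source's startingOffsets option, which takes a JSON map of topic to partition to offset. A sketch using the topic name from the question; assuming partition 0 for illustration (every consumed partition must be listed), and with bootstrap servers as a parameter:

```python
import json

# Start the second query at offset 90 of partition 0 of topic_a.
starting_offsets = json.dumps({"topic_a": {"0": 90}})
print(starting_offsets)  # {"topic_a": {"0": 90}}

def kafka_reader(spark, bootstrap_servers):
    """Streaming reader for topic_a beginning at the chosen offsets."""
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", bootstrap_servers)
            .option("subscribe", "topic_a")
            .option("startingOffsets", starting_offsets)
            .load())
```

Because the new query gets its own checkpoint location, startingOffsets only applies on the first run; afterwards the checkpoint takes over.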
bhargavi1
by New Contributor II
  • 1310 Views
  • 1 replies
  • 1 kudos
Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi, could you please share more details on what you have tried?

