Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ssy
by New Contributor II
  • 1926 Views
  • 2 replies
  • 0 kudos

How to configure pip file to include libraries from a proxy location

I need to configure the pip config file to include login credentials so that libraries can be downloaded from our corporate Artifactory. I'm trying to learn how to open a config file within Databricks and add my credentials and package information. I will then have ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Samy Syed, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

  • 0 kudos
1 More Replies
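
For the pip configuration question above, here is a minimal sketch of generating a pip config file that points at a private index. The host name, repository path, user, and token are hypothetical placeholders, not values from the post. One common approach on Databricks is to have a cluster init script write such a file to a location pip reads (for example `/etc/pip.conf`), but that is an assumption about the setup rather than something the thread confirms.

```python
import configparser
import io
from urllib.parse import quote


def build_pip_conf(host: str, repo_path: str, user: str, token: str) -> str:
    """Render pip.conf text whose [global] index-url embeds basic-auth credentials."""
    # Percent-encode the credentials so characters such as '@' or ':' stay valid in a URL.
    safe_user = quote(user, safe="")
    safe_token = quote(token, safe="")
    index_url = f"https://{safe_user}:{safe_token}@{host}/{repo_path.strip('/')}"
    # interpolation=None keeps configparser from treating '%' in the encoded URL as special.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical values; in practice read credentials from a secret store, not source code.
    print(
        build_pip_conf(
            "artifactory.example.com",
            "artifactory/api/pypi/pypi-virtual/simple",
            "svc-user",
            "example-token",
        )
    )
```

Keeping the credentials out of the notebook and passing them in at render time means the same function can be reused from an init script or a deployment job.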
Jfoxyyc
by Valued Contributor
  • 1399 Views
  • 2 replies
  • 0 kudos

DLT - deduplication pattern?

Say we have an incremental append happening using autoloader, where filename is being added to the dataframe and that's all. If we want to de-duplicate this data in a rolling window, we can do something like merge into logs using dedupedLogs on ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Jordan Fox, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

  • 0 kudos
1 More Replies
Nirbhay
by New Contributor II
  • 1096 Views
  • 3 replies
  • 0 kudos

Databricks community edition login issue

I am unable to log in to Databricks Community Edition with my login ID nirbhay.singh06@gmail.com. Please help me, or send me an email with the solution if possible. I need this for my practice. What should I do? Why do I keep getting this issue her...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Nirbhay Singh​ Thank you for reaching out, and we’re sorry to hear about this log-in issue! We have this Community Edition login troubleshooting post on Community. Please take a look, and follow the troubleshooting steps. If the steps do not reso...

  • 0 kudos
2 More Replies
Akshith_Rajesh
by New Contributor III
  • 1474 Views
  • 4 replies
  • 1 kudos

Does Databricks lock a file in ADLS Gen2 before writing (appending) to it? If yes, how can we check whether the file is locked?

I have a requirement: I am running two notebooks in parallel, and both need to overwrite the same file. If the two notebooks try to overwrite the file at the same time, will I lose data because of the simultaneous overwrite? I want to overwr...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Rajesh Akshith​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Tha...

  • 1 kudos
3 More Replies
essentialDatabr
by New Contributor II
  • 1599 Views
  • 1 replies
  • 1 kudos

Confusion about {{run_id}} and {{parent_run_id}} variables for Databricks jobs (Azure)

In Databricks jobs on Azure you can use the {{run_id}} and {{parent_run_id}} variables for a specific run: https://docs.databricks.com/workflows/jobs/jobs.html For Databricks jobs with two or more tasks, {{run_id}} seems to correspond to task...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Kasper H: Yes, you are correct in your understanding that in Databricks jobs with multiple tasks, the {{run_id}} variable corresponds to the task_run_id and the {{parent_run_id}} variable corresponds to the job_run_id. For Databricks jobs with only ...

  • 1 kudos
asethia
by New Contributor
  • 2787 Views
  • 1 replies
  • 0 kudos

delta lake in Apache Spark

Hi, as per the documentation at https://docs.delta.io/latest/quick-start.html, we can configure DeltaCatalog using spark.sql.catalog.spark_catalog. Iceberg supports two Catalog implementations (https://iceberg.apache.org/docs/latest/spark-configuration/#...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Arun Sethia: Yes, Delta Lake also supports custom catalogs. Delta Lake uses the Spark Catalog API, which allows for pluggable catalog implementations. You can implement your own custom catalog to use with Delta Lake. To use a custom catalog, you can...

  • 0 kudos
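
As a reference point for the configuration mentioned in the question, the sketch below shows the two settings the Delta Lake quick start uses to enable Delta on open-source Spark, including the `spark.sql.catalog.spark_catalog` key. It assumes the Delta Lake jars are already available to the session; the function name and the app name are illustrative only, and this does not show how to write a custom catalog.

```python
# Settings from the Delta Lake quick start for enabling Delta on open-source Spark.
DELTA_SPARK_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name: str = "delta-demo"):
    """Create a SparkSession with the Delta extension and catalog configured."""
    # Imported lazily so the settings can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in DELTA_SPARK_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```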
kll
by New Contributor III
  • 3563 Views
  • 1 replies
  • 1 kudos

Resolved! OSError: Invalid argument when attempting to save a pandas dataframe to csv

I am attempting to save a pandas DataFrame as a CSV file to a directory I created in the Databricks workspace or in the `cwd`.
import pandas as pd
import os
df.to_csv("data.csv", index=False)
df.to_csv(str(os.getcwd()) + "/data.csv", index=False)
...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @Keval Shah, you can save your DataFrame as a CSV in DBFS storage. Please refer to the code below, which might help:
df = pd.read_csv(StringIO(data), sep=',')
#print(df)
df.to_csv('/dbfs/FileStore/ajay/file1.txt')

  • 1 kudos
Kaijser
by New Contributor II
  • 2674 Views
  • 4 replies
  • 1 kudos

Logging clogged up with error messages (OSError: [Errno 95] Operation not supported, --- Logging error ---)

I have encountered this issue for a while now, and it happens on each run that is triggered. I discovered 2 things: 1) If I run my script on a cluster that is not active and the cluster is activated by a scheduled trigger (not manually!) this doesn't happ...

Latest Reply
manasa
Contributor
  • 1 kudos

Hi @Aaron Kaijser, are you able to write your logfile to ADLS? If yes, could you please explain how you did it?

  • 1 kudos
3 More Replies
Retko
by Contributor
  • 4000 Views
  • 2 replies
  • 3 kudos

Resolved! How to quickly check if Delta Table is Empty

Hi, I need a quick way to return True if a Delta table is empty. I tried the following, but it is quite slow when checking many tables:
spark.read.table("table_name").count()
spark.read.table("table_name").rdd.isEmpty()
len(spark.read.table("table_name").head(1)) =...

Latest Reply
Vartika
Moderator
  • 3 kudos

Hi @Retko Okter, just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best? If not, please tell us so we can help you. Thanks!

  • 3 kudos
1 More Replies
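
One option not shown in the excerpt above is to consult the Delta table metadata first. The sketch below assumes the table is a Delta table and that `spark` is an active SparkSession; the function name is hypothetical. `DESCRIBE DETAIL` returns a `numFiles` column, and a table with zero data files has no rows, so that case can be answered without reading data. When files exist, the sketch falls back to the `head(1)` check from the post. This is a sketch, not the accepted answer from the thread.

```python
def delta_table_is_empty(spark, table_name: str) -> bool:
    """Return True if the Delta table has no rows.

    `table_name` is interpolated into SQL as-is, so it must come from trusted code.
    """
    # Metadata lookup: a Delta table with zero data files cannot contain rows.
    detail = spark.sql(f"DESCRIBE DETAIL {table_name}").first()
    if detail["numFiles"] == 0:
        return True
    # Data files exist, but they could all be empty; fetch at most one row to confirm.
    return len(spark.read.table(table_name).head(1)) == 0
```

Whether this is faster in practice depends on the table and the cluster, so it is worth timing against the three variants in the question.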
bharathi
by New Contributor
  • 871 Views
  • 2 replies
  • 1 kudos

Hive database

The Hive database and tables created in my workspace are not visible to other users when they try to access the Databricks workspace set up at our workplace.

Latest Reply
Vartika
Moderator
  • 1 kudos

Hi @bharathi vish, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedbac...

  • 1 kudos
1 More Replies
uzairm
by New Contributor III
  • 3484 Views
  • 2 replies
  • 1 kudos

My whole code is running on the driver node. I want my code to run on the worker nodes so that the driver's memory is not exhausted. Please suggest improvements to my code. Spark crashes frequently when the data pulled from S3 is huge.

I am running a process which has 4 steps.
1. Query S3 file paths from DynamoDB based on certain parameters given by the user (the function to do so is provided by the client; I just have to import it). It returns a list of files.
2. Check if those file paths have already been qu...

Latest Reply
Vartika
Moderator
  • 1 kudos

Hi @uzair mustafa, thank you for posting your question in our community! We are happy to assist you. Does @Suteja Kanuri's answer help? If it does, would you be happy to mark it as best? This will help other community members who may have similar ques...

  • 1 kudos
1 More Replies
Joao_DE
by New Contributor III
  • 1435 Views
  • 2 replies
  • 0 kudos

Run pytest inside repos and store the results in dbfs

Hi everyone! I am trying to run pytest inside a notebook in Repos and store the results in DBFS, but I am getting a "permission denied" error. Does anyone know why this happens and what the solution is? Error:

Latest Reply
Vartika
Moderator
  • 0 kudos

Hi @João Peixoto, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

  • 0 kudos
1 More Replies
pvignesh92
by Honored Contributor
  • 2290 Views
  • 4 replies
  • 2 kudos

Resolved! Please restrict spamming

Hi @Vidula Khanna, recently there has been too much spam posted in the community discussions. I'm sure you have noticed it. Is there any chance to clear all of it and maybe restrict it in some way so that the purpose of this community...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

@Suteja Kanuri, using Azure OpenAI GPT-4 we could connect it to the community and use 2 of its features:
1. Ask the question "is it spam?" and verify each user post this way.
2. Display ready answers by OpenAI so that we avoid asking over and over again duplic...

  • 2 kudos
3 More Replies
pandu
by New Contributor II
  • 1699 Views
  • 2 replies
  • 3 kudos

connect to Oracle database using JDBC and perform merge condition

I would like to connect to an Oracle database using the JDBC driver and write Python code to perform a merge.

Latest Reply
Vartika
Moderator
  • 3 kudos

Hi @Venkata Krishna Jonnalagadda, hope you are well. Just checking in. If @John Lourdu's answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? Thanks!

  • 3 kudos
1 More Replies
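
For the Oracle merge question above: Spark's JDBC writer offers save modes such as append and overwrite but no merge mode, so one possible pattern (an assumption here, not taken from the thread) is to write the DataFrame to a staging table over JDBC and then run a `MERGE` statement on the Oracle side. The sketch below only builds that statement; the table and column names in the usage line are hypothetical.

```python
from typing import Sequence


def build_oracle_merge(target: str, staging: str, keys: Sequence[str], updates: Sequence[str]) -> str:
    """Build an Oracle MERGE statement that upserts staging rows into the target table.

    Identifiers are interpolated as-is, so they must come from trusted code, not user input.
    """
    # Rows match when every key column is equal.
    on_clause = " AND ".join(f"t.{col} = s.{col}" for col in keys)
    # Matched rows get their non-key columns refreshed from staging.
    set_clause = ", ".join(f"t.{col} = s.{col}" for col in updates)
    # Unmatched rows are inserted with all key and non-key columns.
    all_cols = list(keys) + list(updates)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{col}" for col in all_cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON ({on_clause}) "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


if __name__ == "__main__":
    print(build_oracle_merge("customers", "customers_stg", ["id"], ["name", "email"]))
```

The generated statement would then be executed through whichever Oracle client or JDBC connection is available in the environment.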
William_Scardua
by Valued Contributor
  • 821 Views
  • 2 replies
  • 1 kudos

How to get executors info by SDK (Python)

Hi guys, how can I get executor information for my cluster using the SDK (Python)? Any ideas? Thank you.

Latest Reply
Vartika
Moderator
  • 1 kudos

Hi @William Scardua, we haven't heard from you since the last response from @josephk, and I was checking back to see if it helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to others. Also, Please d...

  • 1 kudos
1 More Replies