Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16835756816
by Valued Contributor
  • 1069 Views
  • 0 replies
  • 5 kudos


Learn the basics with these resources: Register for an AWS Onboarding Webinar or an Azure Quickstart Lab- Learn the fundamentals from a Customer Success Engineer & get all your onboarding questions answered live.Started using Databricks, but have que...

User16835756816
by Valued Contributor
  • 1176 Views
  • 0 replies
  • 6 kudos


Welcome to Databricks! Here you will find resources for a successful onboarding experience. In this group you can ask quick questions and have them answered by experts to unblock and accelerate your ramp up with Databricks.

magy
by New Contributor
  • 2946 Views
  • 3 replies
  • 0 kudos

Display, count and write commands stuck after 1st job

Hi, I have problems with displaying and saving a table in Databricks. A simple command can run for hours without any progress. Before that I am not doing any rocket science: the code runs in less than a minute, and I have one join at the end. I am using 7.3 ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Just Magy​, what is your data source? What kinds of lazy transformations and actions do you have in your code? Do you partition your data? Please provide more details.

2 More Replies
amitdatabricksc
by New Contributor II
  • 7983 Views
  • 2 replies
  • 0 kudos

AttributeError: 'NoneType' object has no attribute 'repartition'

I am using a framework, and I have a query where I am doing df = seg_df.select(*).write.option("compression", "gzip"), and I am getting the error below. When I don't do the write.option I am not getting the error. Why is it giving me a repartition error? Wh...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @AMIT GADHAVI​, could you provide more details? For example, what is your data source? How do you repartition?

1 More Replies
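The error in this thread usually means the object `.repartition` was called on is the return value of a write action, and write actions return None. A minimal pure-Python sketch of the failure mode (`FakeWriter` is a hypothetical stand-in, not the PySpark API):

```python
class FakeWriter:
    """Hypothetical stand-in for a DataFrameWriter: options chain,
    but the final write action returns None."""

    def option(self, key, value):
        return self  # options return the writer, so they chain

    def save(self, path):
        return None  # the write itself returns nothing


result = FakeWriter().option("compression", "gzip").save("/tmp/out")
try:
    result.repartition(4)  # anything chained after the write fails
except AttributeError as err:
    message = str(err)  # "'NoneType' object has no attribute 'repartition'"
```

The takeaway: repartition (or any transformation) must be applied to the DataFrame before the write, never to the write's result.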
eq
by New Contributor III
  • 5116 Views
  • 7 replies
  • 7 kudos

Resolved! Multi-task Jobs orchestration - simulating onComplete status

Currently, we are investigating how to effectively incorporate Databricks' latest feature for orchestration of tasks, Multi-task Jobs. The default behaviour is that a downstream task will not be executed if the previous one has failed for some reason...

Latest Reply
User16844513407
New Contributor III
  • 7 kudos

Hi @Stefan V​, my name is Jan and I'm a product manager working on job orchestration. Thank you for your question. At the moment this is not something we support directly yet; it is, however, on our radar. If you are interested in having a short conve...

6 More Replies
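While native support is pending, the behaviour asked about can be simulated with a wrapper that records the upstream status and always invokes the downstream step. A sketch with illustrative names (not a Databricks API):

```python
def run_with_on_complete(task, on_complete):
    """Run `task`, then always run `on_complete` with the resulting status,
    mimicking an onComplete trigger for a downstream step."""
    try:
        result = task()
        status = "SUCCESS"
    except Exception:
        result, status = None, "FAILED"
    on_complete(status)
    return result, status
```

For example, `run_with_on_complete(lambda: 1 / 0, handle_status)` still calls the downstream step, passing it "FAILED" instead of skipping it.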
snoeprol
by New Contributor II
  • 5522 Views
  • 3 replies
  • 2 kudos

Resolved! Unable to open files with python, but filesystem shows files exist

Dear community, I have the following problem: I have uploaded a file of an ML model and have transferred it to the directory with %fs mv '/FileStore/Tree_point_classification-1.dlpk' '/dbfs/mnt/group22/Tree_point_classification-1.dlpk'. When I now check ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

There is dbfs:/dbfs/ displayed; maybe the file is in the /dbfs/dbfs directory? Please check it and try to open it with open('/dbfs/dbfs. You can also use "Data" from the left menu to check what is in the DBFS file system more easily.

2 More Replies
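The usual source of this confusion is the difference between the `dbfs:/` URI scheme (used by %fs and dbutils) and the `/dbfs/` FUSE mount that Python's built-in open() needs. A small helper, assuming the FUSE mount is enabled on the cluster:

```python
def dbfs_to_local(path: str) -> str:
    """Translate a dbfs:/ URI into the /dbfs/ FUSE path that built-in
    open() can read. Paths without the scheme are returned unchanged."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path
```

For example, `dbfs_to_local("dbfs:/mnt/group22/Tree_point_classification-1.dlpk")` returns "/dbfs/mnt/group22/Tree_point_classification-1.dlpk", which plain Python file APIs can open.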
alonisser
by Contributor II
  • 2815 Views
  • 2 replies
  • 1 kudos

Resolved! Accessing confluent schema registry from databricks with scala fails with 401 (just for scala, not python, just in databricks)

Note, I've tested with the same connection variables:
- locally with Scala - works (via the same prod schema registry)
- in the cluster with Python - works
- in the cluster with Scala - fails with a 401 auth error
def setupSchemaRegistry(schemaRegistryUrl: String...

Latest Reply
alonisser
Contributor II
  • 1 kudos

Found the issue: it's the uber package mangling some dependency resolution, which I fixed. Another issue is that currently you can't use the 6.* branch of the Confluent schema registry client in Databricks, because the Avro version is different than the one su...

1 More Replies
kjoth
by Contributor II
  • 19581 Views
  • 5 replies
  • 5 kudos

Resolved! Databricks default python libraries list & version

We are using Databricks. How do we know which default libraries are installed in Databricks, and which versions? I have run pip list, but couldn't find pyspark in the returned list.

Latest Reply
jose_gonzalez
Databricks Employee
  • 5 kudos

Hi @karthick J​, if you would like to see all the libraries installed in your cluster and their versions, then I recommend checking the "Environment" tab. There you will be able to find all the libraries installed in your cluster. Please follow t...

4 More Replies
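Besides the cluster's "Environment" tab, you can enumerate what the Python environment itself sees with the standard library (Python 3.8+). Note this lists pip-visible distributions only; on some runtimes Spark ships outside pip, which is why pip list may not show pyspark:

```python
from importlib import metadata

def installed_packages() -> dict:
    """Return {distribution name: version} for every package visible
    to this Python interpreter."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    }
```

Running `installed_packages()` in a notebook cell gives a name-to-version mapping you can sort or diff between clusters.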
Erik
by Valued Contributor III
  • 6076 Views
  • 6 replies
  • 7 kudos

Databricks query performance when filtering on a column correlated to the partition-column

(This is a copy of a question I asked on Stack Overflow here, but maybe this community is a better fit for the question.) Setting: Delta Lake, Databricks SQL compute used by Power BI. I am wondering about the following scenario: We have a column `timest...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

In the query I would filter first by date (generated from the timestamp we want to query) and then by the exact timestamp, so it will get the partitioning benefit.

5 More Replies
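The reply's predicate rewrite can be sketched concretely: derive the date-partition bounds from the timestamp bounds and add them as a redundant filter so the engine prunes partitions first. A pure-Python illustration (the `date` and `timestamp` column names are assumptions, not from the original post):

```python
from datetime import datetime

def with_date_predicate(ts_lo: datetime, ts_hi: datetime) -> str:
    """Build a WHERE clause that adds a redundant date-partition filter
    derived from the timestamp bounds, enabling partition pruning."""
    d_lo, d_hi = ts_lo.date(), ts_hi.date()
    return (
        f"WHERE date BETWEEN '{d_lo}' AND '{d_hi}' "
        f"AND timestamp BETWEEN '{ts_lo}' AND '{ts_hi}'"
    )
```

The extra date predicate is logically redundant but lets the engine skip whole partitions before evaluating the fine-grained timestamp filter.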
BradCliQ
by New Contributor II
  • 3157 Views
  • 2 replies
  • 2 kudos

Resolved! Clean up of residual AWS resources when deleting a DB workspace

When deleting a workspace from the Databricks Accounts Console, I noticed the AWS resources (VPC, NAT, etc.) are not removed. Should they be? And if not, is there a clean/simple way of cleaning up the residual AWS resources?

Latest Reply
BradCliQ
New Contributor II
  • 2 kudos

Thank you Prabakar - that's what I figured, but I didn't know if there was documentation on resource cleanup. I'll just go through, find everything the CF stack created, and remove it. Regards, Brad

1 More Replies
omsas
by New Contributor
  • 2991 Views
  • 2 replies
  • 0 kudos

How to add Columns for Automatic Fill on Pandas Python

1. I have data x; I would like to create a new column with the condition that the values are 1, 2, or 3.
2. The name of the column is SHIFT; this SHIFT column will be filled automatically if the TIME_CREATED column meets the conditions.
3. The conditi...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

You can do something like this in pandas. Note there could be a more performant way to do this too.

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 3, 4]})
df.head()
>    a
> 0  1
> 1  2
> 2  3
> 3  4

conditions = [(df['a'] <= 2...

1 More Replies
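Since the original conditions are truncated, here is a stdlib-only sketch of the same idea: map a creation time to a SHIFT label. The three shift windows below are invented for illustration (the real boundaries are in the truncated post); np.select in the reply does the equivalent over whole columns at once:

```python
from datetime import time

def shift_for(created: time) -> int:
    """Map a TIME_CREATED value to shift 1, 2, or 3.
    Hypothetical windows: 06-14 -> 1, 14-22 -> 2, overnight -> 3."""
    if time(6) <= created < time(14):
        return 1
    if time(14) <= created < time(22):
        return 2
    return 3  # 22:00-06:00 overnight shift
```

In pandas the same decision table becomes a list of boolean conditions plus a list of choices passed to np.select, which fills the SHIFT column in one vectorized call.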
SQLArchitect
by New Contributor
  • 1663 Views
  • 1 reply
  • 1 kudos

Writing Records Failing Constraint Requirements to Separate Table when using Delta Live Tables

Are there any plans / capabilities in place or approaches people are using for writing (logging) records failing constraint requirements to separate tables when using Delta Live Tables? Also, are there any plans / capabilities in place or approaches ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

According to the language reference documentation, I do not believe quarantining records is possible right now out of the box. But there are a few workarounds under the current functionality. Create a second table with the inverse of the expectations...

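The "inverse expectation" workaround in the reply boils down to expressing the constraint predicate twice, once negated: rows that pass go to the main table, the rest to a quarantine table. A generic plain-Python sketch of that routing (not DLT syntax):

```python
def split_by_expectation(rows, predicate):
    """Partition rows into (passing, quarantined) by a constraint predicate,
    mirroring a main table plus an inverse-expectation quarantine table."""
    passing, quarantined = [], []
    for row in rows:
        (passing if predicate(row) else quarantined).append(row)
    return passing, quarantined
```

In DLT terms, the second table's expectation is simply NOT(constraint), so every record lands in exactly one of the two tables.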
Sandesh87
by New Contributor III
  • 3525 Views
  • 1 reply
  • 0 kudos

dbutils.secrets.get- NoSuchElementException: None.get

The below code executes a 'get' API method to retrieve objects from S3 and write to the data lake. The problem arises when I use dbutils.secrets.get to get the keys required to establish the connection to S3: my_dataframe.rdd.foreachPartition(partition ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Howdy @Sandesh Puligundla​ - Thank you for your question. Thank you for your patience. I'd like to give this a bit longer to see how the community responds. Hang tight!

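A common cause of NoSuchElementException: None.get in this pattern is that dbutils.secrets.get runs on the driver only, while foreachPartition executes its closure on executors. The usual fix is to resolve the secret once on the driver and capture the plain value in the closure, sketched here in plain Python:

```python
def make_partition_handler(secret_value):
    """Capture an already-resolved secret value in a closure, instead of
    calling a driver-only lookup (like dbutils.secrets.get) on executors."""
    def handle_partition(rows):
        # Executors only see the plain value, never the driver-side API.
        return [(secret_value, row) for row in rows]
    return handle_partition
```

Hypothetical usage: resolve the key on the driver, then pass the handler to my_dataframe.rdd.foreachPartition so no executor ever calls the secrets API.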
Andyfcx
by New Contributor
  • 2915 Views
  • 2 replies
  • 2 kudos

Resolved! Is it possible to clone a private repository and use it in databricks Repos?

As the title says, I need to clone code from my private Git repo and use it in my notebook. I do something like:
def cmd(command, cwd=None):
    process = subprocess.Popen(command.split(), stdout=subprocess.PIPE, cwd=cwd)
    output, error = process.communicate(...

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

Hi @Andy Huang​, yes, you can do it if it's accessible from Databricks. Please refer to https://docs.databricks.com/repos.html#repos-for-git-integration. Databricks does not support private Git servers, such as Git servers behind a VPN.

1 More Replies
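The helper from the question, completed so it actually runs (the clone URL in the comment is a placeholder; per the reply, the Git server must be reachable from Databricks, i.e. not behind a VPN):

```python
import subprocess

def cmd(command, cwd=None):
    """Run a command and return its (stdout, stderr) as text."""
    process = subprocess.Popen(
        command.split(),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        cwd=cwd,
        text=True,
    )
    output, error = process.communicate()
    return output, error

# Hypothetical usage: clone with a token embedded in the URL.
# out, err = cmd("git clone https://<token>@github.com/you/private-repo.git")
```

Note that command.split() does not handle quoted arguments with spaces; for anything more complex, pass an explicit argument list to Popen instead.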
Personal1
by New Contributor II
  • 4430 Views
  • 2 replies
  • 2 kudos

Resolved! Understanding Partitions in Spark Local Mode

I have a few fundamental questions about Spark 3 while running a simple Spark app on my local Mac machine (with 6 cores in total). Please help. local[*] runs my Spark application in local mode with all the cores present on my Mac, correct? It also means tha...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

That is a lot of questions in one topic. Let's give it a try:
[1] This all depends on the values of the relevant parameters and the program you run (think joins, unions, repartition, etc.).
[2] spark.default.parallelism is by default the number of cores *...

1 More Replies
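The reply's point [2] can be illustrated with a rough stdlib sketch of how local mode sizes parallelism from the master string: local[*] uses every core, local[N] uses N. This is an approximation of the default behavior, not Spark itself, and real clusters can override it via configuration:

```python
import os

def local_default_parallelism(master: str):
    """Approximate spark.default.parallelism for local mode:
    local[*] -> all cores, local[N] -> N, otherwise unknown (None)."""
    if master == "local[*]":
        return os.cpu_count()
    if master.startswith("local[") and master.endswith("]"):
        return int(master[len("local["):-1])
    return None
```

So on the 6-core Mac from the question, local[*] would give a default parallelism of 6, which is also the number of partitions many operations produce unless the data or the code says otherwise.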
