Data Engineering

Forum Posts

anonturtle
by New Contributor
  • 676 Views
  • 1 replies
  • 0 kudos

How does automl classify which feature is numeric or categorical?

When running AutoML from its UI, it classifies the feature "local_convenience_store" as both a numeric and a categorical column. This affects the result, because numeric columns are scaled while categorical columns are one-hot encoded. For contex...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@hr then: The approach taken by AutoML to classify features as numeric or categorical depends on the specific AutoML framework or library being used, as different implementations may use different methods or heuristics to make this determination. In ...

bluesky
by New Contributor II
  • 1354 Views
  • 2 replies
  • 1 kudos

Identity error in Spark SQL: not enough data columns; target has 3 but the inserted data has 2 (the identity column is the one missing)

While inserting into the target table I am getting the error "not enough data columns; target has 3 but the inserted data has 2", but the missing column is the identity column, which is the 8th column. insert into table A(col 1,col 2,col3) select col2,col3 from table B join t...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @sky blue, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
Gilg
by Contributor II
  • 2067 Views
  • 1 replies
  • 0 kudos

Adding column as StructType

Hi Team, just wondering, how can I add a column to an existing table? I tried the script below, but it gives me an error: ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near '<' (line 1, pos 121) ALTER TABLE table_clone ADD COLUMNS col_name1 STRUC...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Gil Gonong: In Databricks, you can add a column to an existing table using the ALTER TABLE statement in SQL. Here is an example: ALTER TABLE table_clone ADD COLUMN col_name1 STRUCT<type: STRING, values: ARRAY<STRING>> Note that you need to ...

MerelyPerfect
by New Contributor II
  • 1960 Views
  • 3 replies
  • 1 kudos

read base64 json column with Autoloader and inferschema.

I have JSON files landing in our blob storage with two fields: 1. offset (integer), 2. value (base64). The value column is JSON with Unicode, so it is sent as base64. The challenge is that this JSON is very large, with 100+ fields, so we cannot define the schema. We c...
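
A minimal sketch of the decoding step described above, in plain Python rather than Auto Loader: each record carries an integer `offset` and a base64-encoded `value` holding JSON, so decoding and parsing it recovers the field names without a hand-written schema. The helper names and the sample record are hypothetical, and the payload is assumed to be a JSON object. In Spark, the analogous first step would be `unbase64` on the column followed by JSON parsing, which is not shown here.

```python
import base64
import json


def decode_record(record):
    """Decode the base64 'value' field of one record and parse it as JSON."""
    raw = base64.b64decode(record["value"])
    payload = json.loads(raw.decode("utf-8"))
    return {"offset": record["offset"], "payload": payload}


def infer_field_names(records):
    """Collect the union of top-level JSON keys seen across decoded records."""
    names = set()
    for record in records:
        names.update(decode_record(record)["payload"].keys())
    return sorted(names)


# Hypothetical sample record, built here for illustration only.
sample_payload = {"name": "café", "count": 3}
sample = {
    "offset": 7,
    "value": base64.b64encode(
        json.dumps(sample_payload).encode("utf-8")
    ).decode("ascii"),
}
decoded = decode_record(sample)
```

Collecting keys over a sample of records, as `infer_field_names` does, is one way to see which of the 100+ fields actually occur before deciding how to model them.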

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @MerelyPerfect Per​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

2 More Replies
ramankr48
by Contributor II
  • 11709 Views
  • 5 replies
  • 8 kudos

Resolved! How to get all the table names with a specific column or columns in a database?

Let's say there is a database db containing 700 tables, and we need to find the names of all tables in which the column "project_id" is present. This is just an example for understanding the question.

Latest Reply
Anonymous
Not applicable
  • 8 kudos

databaseName = "db"
desiredColumn = "project_id"
database = spark.sql(f"show tables in {databaseName} ").collect()
tablenames = []
for row in database:
    cols = spark.table(row.tableName).columns
    if desiredColumn in cols:
        tablenames.append(row....
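
A hedged sketch of the same idea as a reusable function. The name `find_tables_with_column` is hypothetical, `spark` is assumed to be an active SparkSession, and every entry listed by `SHOW TABLES` is assumed to be a table in that database. Unlike the reply above, this version qualifies each table with the database name, so it does not rely on `db` being the current database.

```python
def find_tables_with_column(spark, database_name, desired_column):
    """Return names of tables in database_name that contain desired_column."""
    matches = []
    for row in spark.sql(f"SHOW TABLES IN {database_name}").collect():
        # Qualify the table with its database so the lookup does not depend
        # on which database is currently selected.
        columns = spark.table(f"{database_name}.{row.tableName}").columns
        if desired_column in columns:
            matches.append(row.tableName)
    return matches
```

In a notebook this would be called as `find_tables_with_column(spark, "db", "project_id")`. With 700 tables it loads the metadata of each table in turn, so it may take a while.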

4 More Replies
thushar
by Contributor
  • 1249 Views
  • 6 replies
  • 0 kudos

'GeneratedAlwaysAs' along with dataframe.write

Is it possible to use a calculated column definition (as in a Delta table using generatedAlwaysAs) while writing the data frame as a Delta file, like df.write.format("delta")? Are there any options in the dataframe.write method to achieve this...

Latest Reply
pvignesh92
Honored Contributor
  • 0 kudos

Hi @Thushar R, this option is not part of the DataFrame write API, as the GeneratedAlwaysAs feature is only applicable to the Delta format and df.write is a common API that handles writes for all formats. If you want to achieve this programmatically, you can still use...

5 More Replies
chanansh
by Contributor
  • 840 Views
  • 2 replies
  • 0 kudos

How to compute a difference over time in Spark Structured Streaming?

I have a table with a timestamp column (t) and a list of columns for which I would like to compute the difference over time (v), by some key (k): v_diff(t) = v(t) - v(t-1) for each k independently. Normally I would write: lag_window = Window.partitionBy(C...

Latest Reply
chanansh
Contributor
  • 0 kudos

I found this but could not make it work https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming.html
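
A plain-Python illustration of the state-keeping idea behind this kind of problem, not the Structured Streaming API from the linked post: remember the last value seen per key k and emit v(t) - v(t-1) as rows arrive in batches. The class `PerKeyDiff` and the sample rows are hypothetical, and the sketch assumes rows for a key never arrive later than rows with a newer timestamp for that key.

```python
class PerKeyDiff:
    """Keep the last value seen per key and emit v(t) - v(t-1)."""

    def __init__(self):
        self._last = {}  # key -> last value seen so far

    def process_batch(self, rows):
        """rows: iterable of (k, t, v) tuples; returns a list of (k, t, v_diff)."""
        out = []
        # Sort by timestamp so differences follow event order within the batch.
        for k, t, v in sorted(rows, key=lambda r: r[1]):
            previous = self._last.get(k)
            # The first value for a key has no predecessor, so its diff is None.
            diff = None if previous is None else v - previous
            out.append((k, t, diff))
            self._last[k] = v
        return out


state = PerKeyDiff()
first = state.process_batch([("a", 1, 10.0), ("b", 1, 5.0), ("a", 2, 13.0)])
second = state.process_batch([("b", 2, 4.0), ("a", 3, 13.5)])
```

The point of the second call is that the difference for each key is computed against a value from the previous batch, which a window function scoped to a single micro-batch cannot see.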

1 More Replies
rocky5
by New Contributor III
  • 1342 Views
  • 1 replies
  • 2 kudos

Cannot create delta live table

I created a simple Delta Live Table definition, something like: CREATE OR REFRESH STREAMING LIVE TABLE customers_silver AS SELECT * FROM STREAM(LIVE.customers_bronze) But I am getting an error when running a pipeline: com.databricks.sql.transaction.tahoe.De...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

You might need to execute the following on your tables to avoid this error message: ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '5', 'delta.columnMapping.mode' = 'name') Docs https...
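
A small sketch of issuing that statement from a notebook, assuming `spark` is an active SparkSession. The helper names are hypothetical, and the property values are the ones quoted in the reply above.

```python
def column_mapping_statement(table_name):
    """Build the ALTER TABLE statement that enables Delta column mapping by name."""
    properties = {
        "delta.minReaderVersion": "2",
        "delta.minWriterVersion": "5",
        "delta.columnMapping.mode": "name",
    }
    rendered = ", ".join(
        f"'{key}' = '{value}'" for key, value in properties.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({rendered})"


def enable_column_mapping(spark, table_name):
    """Run the statement through an existing SparkSession-like object."""
    return spark.sql(column_mapping_statement(table_name))
```

Building the statement separately from running it makes it easy to print and review the SQL before applying it to a table.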

sonali1996
by New Contributor
  • 663 Views
  • 2 replies
  • 0 kudos

Adding a widget value as a column and populating that column in a table on every run

Hi, I want to pass the runtime date from ADF as @utcnow() (a base parameter of the notebook activity in ADF) and read it in ADB through a widget named runtime_date. I then want that value added as a column in my table X, populated from the widget. Eve...

Latest Reply
sher
Valued Contributor II
  • 0 kudos

You can use current_timestamp() or now(). Refer to: https://docs.databricks.com/sql/language-manual/functions/current_timestamp.html

1 More Replies
data_explorer
by New Contributor II
  • 690 Views
  • 1 replies
  • 2 kudos
Latest Reply
User16753725469
Contributor II
  • 2 kudos

Please refer: https://www.databricks.com/blog/2021/05/26/introducing-databricks-unity-catalog-fine-grained-governance-for-data-and-ai-on-the-lakehouse.html

lizou
by Contributor II
  • 2020 Views
  • 4 replies
  • 6 kudos

Resolved! Identity column definition lost using save as table

I found an issue: for a table with an identity column defined, when a table column is renamed using this method, the identity definition is removed. That means using an identity column in a table requires extra attention to check whether the ide...

Latest Reply
lizou
Contributor II
  • 6 kudos

To avoid reloading the table, I found we can upgrade the table version and use the rename column command: ALTER TABLE test_id2 SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '6') ALTER TABLE ...

3 More Replies
ramankr48
by Contributor II
  • 11658 Views
  • 11 replies
  • 2 kudos

Resolved! how to add an identity column to an existing table?

I have created a database called retail, and inside it there is a table called sales_order. I want to create an identity column in the sales_order table, but while creating it I am getting an error.

Latest Reply
PriyaAnanthram
Contributor III
  • 2 kudos

My DBR

10 More Replies
auser85
by New Contributor III
  • 1339 Views
  • 2 replies
  • 1 kudos

How to reset the IDENTITY column count?

After accumulating many updates to a Delta table with a column like keyExample bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1), my identity column values are in the hundreds of millions. Is there any way that I can reset this value through vacuumi...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Andrew Fogarty​ Does @Werner Stinckens​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more help. Thanks!

1 More Replies
joel_iemma
by New Contributor III
  • 2561 Views
  • 5 replies
  • 0 kudos

Resolved! A void column was created after connecting to cosmos

Hi everyone, I have connected to Cosmos using this tutorial: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3_2-12/Samples/DatabricksLiveContainerMigration After creating a table using a simple SQL command: CREATE TA...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Joel iemma, hope all is well! Just wanted to check in if you would be happy to mark an answer as best for us, please? It would be really helpful for the other members too. Cheers!

4 More Replies
cuteabhi32
by New Contributor III
  • 26191 Views
  • 11 replies
  • 1 kudos

Resolved! Trying to check whether a column exists in a dataframe: if not, I have to give NULL; if yes, I need to give the column itself, by using a UDF

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
df1 = spark.read.form...

Latest Reply
cuteabhi32
New Contributor III
  • 1 kudos

Thanks, I modified my code as per your suggestion and it worked perfectly. Thanks again for all your inputs.
dflist = spark.createDataFrame(list(a.columns), "string").toDF("Name")
dfg = dflist.filter(col('name').isin('ref_date')).count()
if dfg == 1:
    a = a.wi...

10 More Replies